Re: [xwiki-users] Attachments lost !

19 Nov 2010

Hi Piotr,
Piotr Dziubecki wrote:
...
  Hi,
 Today I noticed that something bad has happened to some of the attachments in my XWiki.
 Here is a screenshot from one of the affected pages:
 http://i.imgur.com/p6Xs7.png
 As you can see, a couple of attachments have been uploaded, but only one is displayed in
 the attachment tab.
 The person who uploaded them says they were fine yesterday, but today they have somehow
 disappeared.
 Oddly, there is no trace of any operation on them after the upload.
 I'm using XWiki Enterprise 2.5.32127 with a MySQL database (server version 5.1.47).
 For more context: over the last few days my users have started adding more attachments to
 their pages. The database dump is currently around 200 MB.
 I also looked at the logs and found several interesting fragments (all of the log snippets
 are from the time this was noticed):
 2010-11-18 09:03:09,355 [http://apps.man.poznan.pl:28181/xwiki/bin/download/Documents/Proposals/2009…] ERROR web.XWikiAction                 - Connection aborted
 Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: Followed by 3999
 Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: Followed by 3999
 Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: Followed by 3999
 Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: Followed by 3999
 2010-11-18 13:23:53,118 [http://localhost:28181/xwiki/bin/view/Projects/Opinion+Mining] WARN  xwiki.MyPersistentLoginManager  - Login cookie validation hash mismatch! Cookies have been tampered with
 2010-11-18 13:23:53,119 [http://localhost:28181/xwiki/bin/view/Projects/Opinion+Mining] WARN  xwiki.MyPersistentLoginManager  - Login cookie validation hash mismatch! Cookies have been tampered with
 Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: Followed by 3999
 Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: Followed by 3999
 Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: Followed by 3999
 Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: Followed by 3999
 2010-11-18 13:57:55,471 [Lucene Index Updater] WARN  lucene.AttachmentData           - error getting content of attachment [2009BEinGRIDwow2greenCONTEXTREVIEW.PPT] for document [xwiki:Documents.Presentations]
 org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@72be25d1
          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:138)
          at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
          at org.apache.tika.Tika.parseToString(Tika.java:267)
          at com.xpn.xwiki.plugin.lucene.AttachmentData.getContentAsText(AttachmentData.java:161)
          at com.xpn.xwiki.plugin.lucene.AttachmentData.getFullText(AttachmentData.java:136)
          at com.xpn.xwiki.plugin.lucene.IndexData.getFullText(IndexData.java:190)
          at com.xpn.xwiki.plugin.lucene.IndexData.addDataToLuceneDocument(IndexData.java:146)
          at com.xpn.xwiki.plugin.lucene.AttachmentData.addDataToLuceneDocument(AttachmentData.java:65)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.addToIndex(IndexUpdater.java:296)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.updateIndex(IndexUpdater.java:237)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.runMainLoop(IndexUpdater.java:171)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.runInternal(IndexUpdater.java:153)
          at com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:99)
          at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: Cannot remove block[ 4209 ]; out of range[ 0 - 3804 ]
          at org.apache.poi.poifs.storage.BlockListImpl.remove(BlockListImpl.java:98)
          at org.apache.poi.poifs.storage.RawDataBlockList.remove(RawDataBlockList.java:32)
          at org.apache.poi.poifs.storage.BlockAllocationTableReader.<init>(BlockAllocationTableReader.java:99)
          at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:164)
          at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:74)
          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
          ... 13 more
 Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: Followed by 3999
 Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: Followed by 3999
 Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: Followed by 3999
 Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: Followed by 3999
 Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: Followed by 4006
 Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom: Followed by 4006
 2010-11-18 15:05:10,412 [http://apps.man.poznan.pl:28181/xwiki/bin/download/Documents/Presentations/…] ERROR web.XWikiAction                 - Connection aborted
 Unfortunately, today the situation repeated with another group of users, following the
 same scenario: after the attachments were submitted and the page was edited a few times,
 they were gone. Here is a snippet from the log for that period (there are a lot of these
 warnings):
 2010-11-19 10:43:37,199 [Lucene Index Updater] WARN  util.PDFStreamEngine            - java.io.IOException: Error: expected hex character and not  :32
 java.io.IOException: Error: expected hex character and not  :32
          at org.apache.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:316)
          at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:138)
          at org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:549)
          at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:383)
          at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:372)
          at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
          at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:552)
          at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:248)
          at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:74)
          at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:552)
          at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:248)
          at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:207)
          at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
          at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
          at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
          at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
          at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
          at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:79)
          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
          at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
          at org.apache.tika.Tika.parseToString(Tika.java:267)
          at com.xpn.xwiki.plugin.lucene.AttachmentData.getContentAsText(AttachmentData.java:161)
          at com.xpn.xwiki.plugin.lucene.AttachmentData.getFullText(AttachmentData.java:136)
          at com.xpn.xwiki.plugin.lucene.IndexData.getFullText(IndexData.java:190)
          at com.xpn.xwiki.plugin.lucene.IndexData.addDataToLuceneDocument(IndexData.java:146)
          at com.xpn.xwiki.plugin.lucene.AttachmentData.addDataToLuceneDocument(AttachmentData.java:65)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.addToIndex(IndexUpdater.java:296)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.updateIndex(IndexUpdater.java:237)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.runMainLoop(IndexUpdater.java:171)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.runInternal(IndexUpdater.java:153)
          at com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:99)
          at java.lang.Thread.run(Thread.java:662)
 One more from another user:
 2010-11-19 10:43:37,464 [Lucene Index Updater] WARN  util.PDFStreamEngine            - java.io.IOException: Error: expected hex character and not  :32
 java.io.IOException: Error: expected hex character and not  :32
          at org.apache.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:316)
          at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:138)
          at org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:549)
          at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:383)
          at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:372)
          at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:61)
          at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:552)
          at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:248)
          at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:74)
          at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:552)
          at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:248)
          at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:207)
          at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
          at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
          at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
          at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
          at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
          at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:79)
          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
          at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
          at org.apache.tika.Tika.parseToString(Tika.java:267)
          at com.xpn.xwiki.plugin.lucene.AttachmentData.getContentAsText(AttachmentData.java:161)
          at com.xpn.xwiki.plugin.lucene.AttachmentData.getFullText(AttachmentData.java:142)
          at com.xpn.xwiki.plugin.lucene.IndexData.getFullText(IndexData.java:190)
          at com.xpn.xwiki.plugin.lucene.IndexData.addDataToLuceneDocument(IndexData.java:146)
          at com.xpn.xwiki.plugin.lucene.AttachmentData.addDataToLuceneDocument(AttachmentData.java:65)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.addToIndex(IndexUpdater.java:296)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.updateIndex(IndexUpdater.java:237)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.runMainLoop(IndexUpdater.java:171)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.runInternal(IndexUpdater.java:153)
          at com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:99)
          at java.lang.Thread.run(Thread.java:662)
 2010-11-19 11:32:39,900 [Lucene Index Updater] WARN  lucene.AttachmentData           - error getting content of attachment [2008BEinGRIDdesignconceptdiagramdoneinVisio.vsd] for document [xwiki:Documents.Diagrams]
 org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@54ad9fa4
          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:134)
          at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
          at org.apache.tika.Tika.parseToString(Tika.java:267)
          at com.xpn.xwiki.plugin.lucene.AttachmentData.getContentAsText(AttachmentData.java:161)
          at com.xpn.xwiki.plugin.lucene.AttachmentData.getFullText(AttachmentData.java:136)
          at com.xpn.xwiki.plugin.lucene.IndexData.getFullText(IndexData.java:190)
          at com.xpn.xwiki.plugin.lucene.IndexData.addDataToLuceneDocument(IndexData.java:146)
          at com.xpn.xwiki.plugin.lucene.AttachmentData.addDataToLuceneDocument(AttachmentData.java:65)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.addToIndex(IndexUpdater.java:296)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.updateIndex(IndexUpdater.java:237)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.runMainLoop(IndexUpdater.java:171)
          at com.xpn.xwiki.plugin.lucene.IndexUpdater.runInternal(IndexUpdater.java:153)
          at com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:99)
          at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.IllegalArgumentException: Found a chunk with a negative length, which isn't allowed
          at org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:120)
          at org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:59)
          at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:93)
          at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
          at org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
          at org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:95)
          at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:52)
          at org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:49)
          at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:127)
          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
          ... 13 more
 I'm counting on your help, since I don't know whether this is an XWiki issue or whether I
 misconfigured something.
 Regards,
 Piotr
    
I think you could be facing two kinds of problems: one related to
memory availability (the one causing attachments to "disappear"), and
another related to Lucene and some incompatibilities with Microsoft
Office files.
Concerning the memory availability problem, please check these two
links:
http://www.xwiki.org/xwiki/bin/view/FAQ/Howtoincreasethemaximumattachmentsi…
http://www.xwiki.org/xwiki/bin/view/FAQ/HowToSolveAJavaHeapMemoryError
I'm not sure whether these issues could lead to corrupted attachments or
only to failures during processing, but I think they are worth taking
into account.
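One quick way to check whether memory is really involved is to look for
java.lang.OutOfMemoryError entries in the XWiki log. Below is a minimal
Python sketch, assuming the log can be read as plain text. The function
name and the sample text are made up for illustration, and none of the
excerpts you posted contain such an entry:

```python
import re

# java.lang.OutOfMemoryError is the standard JVM error raised when memory runs out.
OOM = re.compile(r"java\.lang\.OutOfMemoryError")


def memory_error_lines(log_text):
    """Return (line_number, line) pairs for log lines mentioning an OutOfMemoryError."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if OOM.search(line)
    ]


# Hypothetical two-line excerpt, NOT taken from the log quoted above.
sample = (
    "2010-11-18 09:03:09,355 ERROR web.XWikiAction - Connection aborted\n"
    "java.lang.OutOfMemoryError: Java heap space\n"
)
print(memory_error_lines(sample))
```

If this finds nothing in the full log, heap exhaustion is less likely to
be the cause, although it does not rule out the attachment size limit.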
There are some quite interesting recent threads on the devs list dealing
with a proposal from Caleb. Just look for "attachments" in the titles
there. Sorry if I'm repeating this proposal!
Concerning the Lucene errors: I need to solve this here as well. I've
also been seeing issues with Lucene and Office files. Do you mind if I
test here with the attachments that are causing you problems? Are they
quite big? Could you send me a couple of them, or make them available
somewhere? On Monday I can install a recent XE snapshot on my dev box
and you could upload them there, but I would like to try them on my
laptop in the meantime.
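To build the list of attachments that the Lucene indexer failed to read,
the "error getting content of attachment [...] for document [...]"
warnings can be collected from the log with a short script. Below is a
minimal Python sketch, assuming the message format shown in your
excerpts. The function name is hypothetical, and the pattern tolerates
messages that have been wrapped across lines:

```python
import re

# Matches the Lucene plugin warning quoted in the log excerpts.
# \s+ between words tolerates line wrapping added by mail clients.
FAILED_ATTACHMENT = re.compile(
    r"error\s+getting\s+content\s+of\s+attachment\s+\[([^\]]+)\]"
    r"\s+for\s+document\s+\[([^\]]+)\]"
)


def failed_attachments(log_text):
    """Return unique (document, attachment) pairs in order of first appearance."""
    seen = []
    for attachment, document in FAILED_ATTACHMENT.findall(log_text):
        pair = (document, attachment)
        if pair not in seen:
            seen.append(pair)
    return seen


# One warning copied from the excerpts, including its line wrapping.
sample = (
    "2010-11-18 13:57:55,471 [Lucene Index Updater] WARN  lucene.AttachmentData -\n"
    "error getting content\n"
    " of attachment [2009BEinGRIDwow2greenCONTEXTREVIEW.PPT] for document\n"
    "[xwiki:Documents.Presentations]\n"
)
for document, attachment in failed_attachments(sample):
    print(document, attachment)
```

Run against the whole log file, this should give a short list of files
worth sending over for testing.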
Thanks!
Cheers,
Ricardo
--
Ricardo Rodríguez
CTO
eBioTIC.
Life Sciences, Data Modeling and Information Management Systems
