Re: [xwiki-users] Attachments lost !

6 Dec 2010

Caleb James DeLisle wrote:
...
  On 11/30/2010 05:46 PM, Ricardo Rodriguez [eBioTIC.] wrote:
  Piotr Dziubecki wrote:
  Hi Sergiu,
 On 10-11-22 12:58, Sergiu Dumitriu wrote:
  On 11/22/2010 11:20 AM, Piotr Dziubecki wrote:
> Hi Ricardo,
>
> On 10-11-19 19:37, Ricardo Rodriguez [eBioTIC.] wrote:
>
>
>> Hi Piotr,
>>
>> Piotr Dziubecki wrote:
>>
>>
>>> Hi,
>>>
>>> today I noticed that something bad has happened to some of the attachments in my XWiki. Here is a
>>> screenshot from one of the affected pages:
>>>
>>> http://i.imgur.com/p6Xs7.png
>>>
>>> Take a look: a couple of attachments have been uploaded, but only one is displayed in the attachment tab.
>>> The person who uploaded them claims that yesterday they were OK, but today they have somehow disappeared.
>>>
>>> It's weird that there is no trace of any operation on them after the upload.
>>>
>>> I'm using XWiki Enterprise 2.5.32127 with a MySQL database (server version 5.1.47).
>>>
>>> To add more context: in the last few days my users started to add more attachments to their pages. Currently the
>>> database dump is around 200 MB.
>>>
>>> I also looked at the logs and found several interesting fragments (all of the log snippets are from the time
>>> this was noticed):
>>>
>>> 2010-11-18 09:03:09,355
>>>
[http://apps.man.poznan.pl:28181/xwiki/bin/download/Documents/Proposals/2009…]
>>> ERROR web.XWikiAction                 - Connection aborted
>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom:
Followed by 3999
>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom:
Followed by 3999
>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom:
Followed by 3999
>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom:
Followed by 3999
>>> 2010-11-18 13:23:53,118
[http://localhost:28181/xwiki/bin/view/Projects/Opinion+Mining] WARN
>>> xwiki.MyPersistentLoginManager  - Login cookie validation hash mismatch!
Cookies have been tampered with
>>> 2010-11-18 13:23:53,119
[http://localhost:28181/xwiki/bin/view/Projects/Opinion+Mining] WARN
>>> xwiki.MyPersistentLoginManager  - Login cookie validation hash mismatch!
Cookies have been tampered with
>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom:
Followed by 3999
>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom:
Followed by 3999
>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom:
Followed by 3999
>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom:
Followed by 3999
>>> 2010-11-18 13:57:55,471 [Lucene Index Updater] WARN  lucene.AttachmentData
- error getting content
>>> of attachment [2009BEinGRIDwow2greenCONTEXTREVIEW.PPT] for document
[xwiki:Documents.Presentations]
>>> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from
>>> org.apache.tika.parser.microsoft.OfficeParser@72be25d1
>>>             at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:138)
>>>             at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
>>>             at org.apache.tika.Tika.parseToString(Tika.java:267)
>>>             at
com.xpn.xwiki.plugin.lucene.AttachmentData.getContentAsText(AttachmentData.java:161)
>>>             at
com.xpn.xwiki.plugin.lucene.AttachmentData.getFullText(AttachmentData.java:136)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexData.getFullText(IndexData.java:190)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexData.addDataToLuceneDocument(IndexData.java:146)
>>>             at
com.xpn.xwiki.plugin.lucene.AttachmentData.addDataToLuceneDocument(AttachmentData.java:65)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.addToIndex(IndexUpdater.java:296)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.updateIndex(IndexUpdater.java:237)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.runMainLoop(IndexUpdater.java:171)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.runInternal(IndexUpdater.java:153)
>>>             at
com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:99)
>>>             at java.lang.Thread.run(Thread.java:662)
>>> Caused by: java.io.IOException: Cannot remove block[ 4209 ]; out of range[ 0
- 3804 ]
>>>             at
org.apache.poi.poifs.storage.BlockListImpl.remove(BlockListImpl.java:98)
>>>             at
org.apache.poi.poifs.storage.RawDataBlockList.remove(RawDataBlockList.java:32)
>>>             at
org.apache.poi.poifs.storage.BlockAllocationTableReader.<init>(BlockAllocationTableReader.java:99)
>>>             at
org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:164)
>>>             at
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:74)
>>>             at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
>>>             ... 13 more
>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom:
Followed by 3999
>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom:
Followed by 3999
>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom:
Followed by 3999
>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom:
Followed by 3999
>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom:
Followed by 4006
>>> Found a TextHeaderAtom not followed by a TextBytesAtom or TextCharsAtom:
Followed by 4006
>>> 2010-11-18 15:05:10,412
>>>
[http://apps.man.poznan.pl:28181/xwiki/bin/download/Documents/Presentations/…]
>>> ERROR web.XWikiAction                 - Connection aborted
>>>
>>>
>>>
>>> Unfortunately, today this situation repeated with another group of users, following the same scenario: after the
>>> attachment submission and a few edits of the page, the attachments are gone. Here is a snippet from the log from
>>> that period (there are a lot of these warnings):
>>>
>>> 2010-11-19 10:43:37,199 [Lucene Index Updater] WARN  util.PDFStreamEngine
- java.io.IOException:
>>> Error: expected hex character and not  :32
>>> java.io.IOException: Error: expected hex character and not  :32
>>>             at
org.apache.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:316)
>>>             at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:138)
>>>             at
org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:549)
>>>             at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:383)
>>>             at
org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:372)
>>>             at
org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
>>>             at
org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:552)
>>>             at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:248)
>>>             at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:74)
>>>             at
org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:552)
>>>             at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:248)
>>>             at
org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:207)
>>>             at
org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
>>>             at
org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
>>>             at
org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
>>>             at
org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
>>>             at
org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
>>>             at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:79)
>>>             at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
>>>             at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
>>>             at org.apache.tika.Tika.parseToString(Tika.java:267)
>>>             at
com.xpn.xwiki.plugin.lucene.AttachmentData.getContentAsText(AttachmentData.java:161)
>>>             at
com.xpn.xwiki.plugin.lucene.AttachmentData.getFullText(AttachmentData.java:136)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexData.getFullText(IndexData.java:190)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexData.addDataToLuceneDocument(IndexData.java:146)
>>>             at
com.xpn.xwiki.plugin.lucene.AttachmentData.addDataToLuceneDocument(AttachmentData.java:65)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.addToIndex(IndexUpdater.java:296)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.updateIndex(IndexUpdater.java:237)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.runMainLoop(IndexUpdater.java:171)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.runInternal(IndexUpdater.java:153)
>>>             at
com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:99)
>>>             at java.lang.Thread.run(Thread.java:662)
>>>
>>>
>>> One more from another user:
>>>
>>> 2010-11-19 10:43:37,464 [Lucene Index Updater] WARN  util.PDFStreamEngine
- java.io.IOException:
>>> Error: expected hex character and not  :32
>>> java.io.IOException: Error: expected hex character and not  :32
>>>             at
org.apache.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:316)
>>>             at org.apache.fontbox.cmap.CMapParser.parse(CMapParser.java:138)
>>>             at
org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:549)
>>>             at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:383)
>>>             at
org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:372)
>>>             at
org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:61)
>>>             at
org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:552)
>>>             at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:248)
>>>             at org.apache.pdfbox.util.operator.Invoke.process(Invoke.java:74)
>>>             at
org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:552)
>>>             at
org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:248)
>>>             at
org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:207)
>>>             at
org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:367)
>>>             at
org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:291)
>>>             at
org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:247)
>>>             at
org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
>>>             at
org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
>>>             at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:79)
>>>             at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
>>>             at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
>>>             at org.apache.tika.Tika.parseToString(Tika.java:267)
>>>             at
com.xpn.xwiki.plugin.lucene.AttachmentData.getContentAsText(AttachmentData.java:161)
>>>             at
com.xpn.xwiki.plugin.lucene.AttachmentData.getFullText(AttachmentData.java:142)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexData.getFullText(IndexData.java:190)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexData.addDataToLuceneDocument(IndexData.java:146)
>>>             at
com.xpn.xwiki.plugin.lucene.AttachmentData.addDataToLuceneDocument(AttachmentData.java:65)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.addToIndex(IndexUpdater.java:296)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.updateIndex(IndexUpdater.java:237)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.runMainLoop(IndexUpdater.java:171)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.runInternal(IndexUpdater.java:153)
>>>             at
com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:99)
>>>             at java.lang.Thread.run(Thread.java:662)
>>> 2010-11-19 11:32:39,900 [Lucene Index Updater] WARN  lucene.AttachmentData
- error getting content
>>> of attachment [2008BEinGRIDdesignconceptdiagramdoneinVisio.vsd] for document
[xwiki:Documents.Diagrams]
>>> org.apache.tika.exception.TikaException: Unexpected RuntimeException from
>>> org.apache.tika.parser.microsoft.OfficeParser@54ad9fa4
>>>             at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:134)
>>>             at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
>>>             at org.apache.tika.Tika.parseToString(Tika.java:267)
>>>             at
com.xpn.xwiki.plugin.lucene.AttachmentData.getContentAsText(AttachmentData.java:161)
>>>             at
com.xpn.xwiki.plugin.lucene.AttachmentData.getFullText(AttachmentData.java:136)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexData.getFullText(IndexData.java:190)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexData.addDataToLuceneDocument(IndexData.java:146)
>>>             at
com.xpn.xwiki.plugin.lucene.AttachmentData.addDataToLuceneDocument(AttachmentData.java:65)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.addToIndex(IndexUpdater.java:296)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.updateIndex(IndexUpdater.java:237)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.runMainLoop(IndexUpdater.java:171)
>>>             at
com.xpn.xwiki.plugin.lucene.IndexUpdater.runInternal(IndexUpdater.java:153)
>>>             at
com.xpn.xwiki.util.AbstractXWikiRunnable.run(AbstractXWikiRunnable.java:99)
>>>             at java.lang.Thread.run(Thread.java:662)
>>> Caused by: java.lang.IllegalArgumentException: Found a chunk with a negative
length, which isn't allowed
>>>             at
org.apache.poi.hdgf.chunks.ChunkFactory.createChunk(ChunkFactory.java:120)
>>>             at
org.apache.poi.hdgf.streams.ChunkStream.findChunks(ChunkStream.java:59)
>>>             at
org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:93)
>>>             at
org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
>>>             at
org.apache.poi.hdgf.streams.PointerContainingStream.findChildren(PointerContainingStream.java:100)
>>>             at
org.apache.poi.hdgf.HDGFDiagram.<init>(HDGFDiagram.java:95)
>>>             at
org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:52)
>>>             at
org.apache.poi.hdgf.extractor.VisioTextExtractor.<init>(VisioTextExtractor.java:49)
>>>             at
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:127)
>>>             at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
>>>             ... 13 more
>>>
>>>
>>> I'm counting on your help, since I don't know whether this is an XWiki issue or whether I misconfigured something.
>>>
>>> Regards,
>>> Piotr
>>> _______________________________________________
>>> users mailing list
>>> users(a)xwiki.org
>>> http://lists.xwiki.org/mailman/listinfo/users
>>>
>>>
>>>
>>>
>> I think you could be facing two kinds of problems: one related to
>> memory availability (the one causing attachments to "disappear") and
>> another related to Lucene and some incompatibilities with
>> Microsoft Office files.
>>
>> Concerning the problem related to memory availability, please check
>> these two links:
>>
>> http://www.xwiki.org/xwiki/bin/view/FAQ/Howtoincreasethemaximumattachmentsi…
>> http://www.xwiki.org/xwiki/bin/view/FAQ/HowToSolveAJavaHeapMemoryError
>>
>>
> I've already done that. I'm storing attachments of 20 MB without any errors during upload.
>
>
>
>> I'm not sure if these issues could lead to corrupted attachments or only
>> to failures in the process. But I think it is worth taking them into
>> account.
>>
>>
> What scares me is the fact that even if something went wrong, I get no visible warning and no transaction
> rollback. The operation ends in the middle and confuses users.
>
>
 If you're using MySQL, then it's a limitation of the default myisam
 engine, which doesn't have support for transactions. You should switch
 to innodb.
 If you're not on myisam, then there's a bug in the storage.
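
 To illustrate what transactional storage provides in the situation described above, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a transactional engine such as InnoDB. The `doc` and `attachment` tables and the `save_with_attachment` helper are hypothetical and are not XWiki's real schema or storage code.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doc (name TEXT)")
    conn.execute("CREATE TABLE attachment (doc TEXT, filename TEXT)")
    conn.commit()
    return conn


def save_with_attachment(conn, doc_name, filename, fail_midway=False):
    """Write a document row and an attachment row as one unit of work.

    Returns True if both rows were committed, False if the save failed
    and the transaction was rolled back.
    """
    try:
        # The connection context manager commits on success and
        # rolls back if an exception leaves the block.
        with conn:
            conn.execute("INSERT INTO doc (name) VALUES (?)", (doc_name,))
            if fail_midway:
                # Simulated failure between the two writes.
                raise RuntimeError("save interrupted")
            conn.execute(
                "INSERT INTO attachment (doc, filename) VALUES (?, ?)",
                (doc_name, filename),
            )
        return True
    except RuntimeError:
        return False


def count_rows(conn, table):
    """Count rows in one of the two fixed tables above."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

 In this sketch, a save that fails between the two writes leaves both tables unchanged. An engine without transactions cannot undo the first write, which would be consistent with the half-finished saves reported in this thread.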
 Thanks for that info; indeed, we had the default engine enabled. Now we've switched to InnoDB and we are
 monitoring our documents.
 I hope that will solve our problem.
 Thanks,
 Piotr
 Is this a reason to always use InnoDB as the engine when running XWiki with MySQL as the database? Thanks!

 In general, MyISAM should not be used in cases where the integrity of the data is important. This is
 because MyISAM makes no effort to repair the database if saving content fails in the middle of the
 operation.
 Caleb
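
 A minimal sketch of how the switch to InnoDB might be scripted, assuming the wiki database is named `xwiki` (an assumption; substitute your own schema name). The helper `innodb_conversion_statements` is hypothetical. It only builds `ALTER TABLE ... ENGINE=InnoDB` statements from (table, engine) pairs, such as the rows returned by the `information_schema.TABLES` query shown in the constant below. Take a database dump before running any generated statement.

```python
# Query to list tables and their storage engines.
# The schema name 'xwiki' is an assumption; adjust it to your setup.
LIST_ENGINES_SQL = (
    "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = 'xwiki'"
)


def innodb_conversion_statements(tables):
    """Build ALTER TABLE statements for every MyISAM table.

    `tables` is an iterable of (table_name, engine) pairs, for example
    the rows returned by LIST_ENGINES_SQL. Tables already on another
    engine, and rows with no engine (views), are skipped.
    """
    statements = []
    for name, engine in tables:
        if engine and engine.lower() == "myisam":
            # Double any backtick so the identifier stays safely quoted.
            quoted = name.replace("`", "``")
            statements.append(f"ALTER TABLE `{quoted}` ENGINE=InnoDB;")
    return statements
```

 For example, passing `[("xwikidoc", "MyISAM"), ("xwikiattachment", "InnoDB")]` yields a single statement for `xwikidoc`. Reviewing the generated statements by hand before applying them keeps the conversion explicit.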

Thanks, Caleb.
I was not aware of this and have always used MyISAM tables, which seem to be the
default. Don't you think there should at least be a warning asking new users to
consider whether they prefer MyISAM or InnoDB? Perhaps here...
http://platform.xwiki.org/xwiki/bin/view/AdminGuide/InstallationMySQL
I understand that RDBMS details are not an XWiki matter, but they could become
one if the decision obviously affects XWiki security and performance.
WDYT?
...
  >
>
>>>> There are some quite interesting recent threads on the devs list dealing
>>>> with a proposal from Caleb. Just look for "attachments" in the titles there.
>>>> Sorry if I'm repeating this proposal!
>>>>
>>>>
>>> Ok will do that.
>>>
>>>
>>>> Concerning the Lucene errors: I need to solve this here as well. I've also
>>>> been seeing issues with Lucene and Office files. Do you mind if I try
>>>> the attachments that are causing you problems? Are they very big?
>>>> Could you send me a couple of them or make them available somewhere?
>>>> On Monday I can install a recent XE snapshot on my dev box and you could
>>>> upload them there, but in the meantime I would try them on my laptop.
>>>>
>>>>
>>>>
>>> I need to ask whether I can share those documents with others; if so, I'll send you some examples.
>>>
>>>
>>>
>>>> Thanks!
>>>>
>>>> Cheers,
>>>>
>>>> Ricardo
>>>>
>>>>

--
Ricardo Rodríguez
CTO
eBioTIC.
Life Sciences, Data Modeling and Information Management Systems
