[xwiki-devs] Profiling: Why do attachments require so much memory

Paul Libbrecht paul at activemath.org
Tue Mar 4 10:07:53 CET 2008


Could I add yet another idea, one that I think has been floating around
for a long time: a Java Content Repository (JCR)?

It may come with licensing catches (as with any of these JCR efforts),
but I believe it is a sturdy way to expose streams of varying size.
Admittedly, it would need file-system storage, but that is surely a good
thing, isn't it?

Unfortunately I am not really an expert there, but the last time I played
with Jackrabbit it really seemed like a sturdy piece you could rely on.
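
To make the "streams of varying size" point a bit more concrete, here is a
minimal sketch of writing an upload to the file system through a small fixed
buffer, so memory use does not grow with the file size. It uses only the JDK;
the class name AttachmentFileStore and the method store are hypothetical and
are not part of XWiki or Jackrabbit.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not an XWiki or Jackrabbit API.
class AttachmentFileStore {
    // Only this many bytes of the attachment are held in memory at a time.
    private static final int BUFFER_SIZE = 8192;

    // Copies the stream to the target file and returns the number of bytes written.
    static long store(InputStream content, Path target) throws IOException {
        long total = 0;
        byte[] buffer = new byte[BUFFER_SIZE];
        try (OutputStream out = Files.newOutputStream(target)) {
            int read;
            while ((read = content.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("attachment", ".bin");
        try {
            // The in-memory array only simulates an upload; a real one would be a request stream.
            byte[] upload = new byte[1_000_000];
            long written = store(new ByteArrayInputStream(upload), target);
            System.out.println(written + " bytes stored in " + target);
        } finally {
            Files.deleteIfExists(target);
        }
    }
}
```

A content repository would presumably hide this behind its own stream API;
the sketch only shows why stream-based storage keeps the footprint constant.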

paul

On 3 March 2008, at 17:28, Sergiu Dumitriu wrote:

> Vincent Massol wrote:
>> Nice work, Sergiu. We should turn this into a JIRA issue so we don't
>> forget it.
>>
>
> We should vote for it first.
>
>> One other idea: store attachments on the file system and not in  
>> the DB.
>>
>> Thanks
>> -Vincent
>>
>> On Feb 27, 2008, at 3:48 PM, Sergiu Dumitriu wrote:
>>
>>> Hi devs,
>>>
>>> Last night I checked what happens when uploading a file, and why
>>> that action requires huge amounts of memory.
>>>
>>> So, whenever a file is uploaded, there are several places where the
>>> file content is loaded into memory:
>>> - as an XWikiAttachment, as byte[] ~= filesize
>>> - as an XWikiAttachmentArchive, as a Base64-encoded string ~= 2*4*filesize
>>> - as Hibernate tokens that are sent to the database, clones of the
>>> XWikiAttachment and XWikiAttachmentArchive data ~= 9*filesize
>>> - as cached attachment and attachment archive, clones of the same 2
>>> objects ~= 9*filesize
>>>
>>> Total: ~27*filesize bytes in memory.
>>>
>>> So, for a 10 MB file, we need at least 270 MB of memory.
>>>
>>> Worse, if this is not the first version of the attachment, then the
>>> complete attachment history is loaded in memory, so add another
>>> 24*versionsize*versions of memory needed during upload.
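
The arithmetic above can be written down as a small sketch. The multipliers
are copied from Sergiu's breakdown; the class and method names are
hypothetical, and sizes are in whatever unit you pass in (MB here).

```java
// Illustrative only: encodes the multipliers quoted above, not measured values.
class UploadMemoryEstimate {
    static final int ATTACHMENT = 1;   // XWikiAttachment byte[] ~= filesize
    static final int ARCHIVE = 2 * 4;  // XWikiAttachmentArchive Base64 string ~= 2*4*filesize
    static final int HIBERNATE = 9;    // clones sent to the database ~= 9*filesize
    static final int CACHE = 9;        // cached clones ~= 9*filesize

    // Memory needed for a first upload, in the same unit as fileSize.
    static long firstUpload(long fileSize) {
        return (ATTACHMENT + ARCHIVE + HIBERNATE + CACHE) * fileSize;
    }

    // Extra memory when earlier versions exist: 24*versionsize*versions.
    static long history(long versionSize, long versions) {
        return 24 * versionSize * versions;
    }

    public static void main(String[] args) {
        System.out.println(firstUpload(10));                  // prints 270
        System.out.println(firstUpload(10) + history(10, 3)); // prints 990
    }
}
```

So a 10 MB attachment with three earlier versions of the same size would,
by this estimate, need close to 1 GB during upload.
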
>>>
>>> After the upload is done, most of these are cleared; only the cached
>>> objects remain in memory.
>>>
>>> However, a problem still remains with the cache. It is an LRU cache
>>> with a fixed capacity, so even if the memory is full, the cached
>>> attachments will not be released.
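
One way a cache could let the garbage collector reclaim entries under memory
pressure is to hold the values through java.lang.ref.SoftReference. The
following is only a sketch of that idea on top of a LinkedHashMap-based LRU;
SoftAttachmentCache and its methods are hypothetical names, not the actual
XWiki cache API.

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: LRU cache whose values can be cleared by the GC when memory is low.
class SoftAttachmentCache {
    private final Map<String, SoftReference<byte[]>> entries;

    SoftAttachmentCache(final int maxEntries) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.entries = new LinkedHashMap<String, SoftReference<byte[]>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, SoftReference<byte[]>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    synchronized void put(String key, byte[] content) {
        entries.put(key, new SoftReference<>(content));
    }

    // Returns null if the key was never cached, was evicted, or was cleared by the GC.
    synchronized byte[] get(String key) {
        SoftReference<byte[]> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        byte[] content = ref.get();
        if (content == null) {
            entries.remove(key); // the referent was reclaimed; drop the stale entry
        }
        return content;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Callers have to treat a null result as a cache miss and reload the
attachment, since a soft reference may be cleared at any time before an
OutOfMemoryError would be thrown.
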
>>>
>>> Things we can improve:
>>> - Make the cache use References. This will allow cached attachments
>>> to be removed from memory when more memory is needed.
>>> - Build a better attachment archive system. I'm not sure it is a good
>>> idea to have diff-based versioning of attachments. In theory, it
>>> saves space when versions are much alike, but it does not really work
>>> in practice because it does a line diff, and a Base64-encoded string
>>> does not have newlines. What's more, the space gain would only pay
>>> off when there are many versions, as one version alone takes 4 times
>>> more space than a binary dump of the content.
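
The line-diff point can be illustrated with a small sketch. It uses
java.util.Base64 from the JDK as a stand-in; that is an assumption, and the
encoder XWiki actually uses may behave differently. The class name
Base64LineDiffDemo is hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

// Illustrative only: shows that an unwrapped Base64 string is a single "line".
class Base64LineDiffDemo {
    // The basic JDK encoder emits no line separators.
    static String encode(byte[] content) {
        return Base64.getEncoder().encodeToString(content);
    }

    // Number of lines a line-based diff would see in the text.
    static int lineCount(String text) {
        return text.split("\n", -1).length;
    }

    public static void main(String[] args) {
        byte[] v1 = new byte[3000];
        byte[] v2 = Arrays.copyOf(v1, v1.length);
        v2[1500] = 1; // the two versions differ in a single byte

        String e1 = encode(v1);
        String e2 = encode(v2);
        System.out.println("lines in v1: " + lineCount(e1));
        System.out.println("lines in v2: " + lineCount(e2));
        System.out.println("the single line is unchanged: " + e1.equals(e2));
    }
}
```

Both versions encode to one line, and that line differs, so a line-based
diff has to store the whole new version again even for a one-byte change.
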
>>>
>>> Suppose we switch to "one version per table row" for the attachment
>>> history, with a direct binary dump; then the memory needed for
>>> uploading would be 6*filesize, which is much less.
>
>
> -- 
> Sergiu Dumitriu
> http://purl.org/net/sergiu/
> _______________________________________________
> devs mailing list
> devs at xwiki.org
> http://lists.xwiki.org/mailman/listinfo/devs
