Could I add yet another idea that has been floating around for a long
time, I think: the Java Content Repository? It may have licensing
catches (as any of these JCR efforts do), but I believe it is a sturdy
way to expose streams of varying size. Indeed, it would need
file-system storage, but surely that's a good thing, no?

I am not really an expert there, unfortunately, but the last time I
played with Jackrabbit it really seemed like a solid piece of software
you could rely on.
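
For illustration, here is a minimal sketch of how an attachment could
be streamed into a repository through the standard JCR 1.0 API (the
nt:file/nt:resource node types are standard, but the storage layout and
class name here are only illustrative):

import java.io.InputStream;
import java.util.Calendar;
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

// Hedged sketch: store an attachment as an nt:file node. The content is
// passed as an InputStream, so the repository can spool it to its
// backing store without loading it fully into memory.
public class JcrAttachmentStore
{
    public void store(Session session, String name, InputStream content)
        throws RepositoryException
    {
        Node file = session.getRootNode().addNode(name, "nt:file");
        Node resource = file.addNode("jcr:content", "nt:resource");
        resource.setProperty("jcr:mimeType", "application/octet-stream");
        resource.setProperty("jcr:lastModified", Calendar.getInstance());
        // JCR 1.0 accepts an InputStream here and streams it
        resource.setProperty("jcr:data", content);
        session.save();
    }
}

Since jcr:data is set from a stream, an implementation like Jackrabbit
can write it to its file-system store in chunks instead of holding the
whole attachment in memory.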
paul
On Mar 3, 2008, at 5:28 PM, Sergiu Dumitriu wrote:
Vincent Massol wrote:
> Nice work Sergiu. We should turn this into a JIRA issue so we don't
> forget it.

We should vote on it first.

> One other idea: store attachments on the file system and not in the DB.
>
> Thanks
> -Vincent
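
For comparison, here is a minimal sketch of the file-system idea,
assuming a plain one-file-per-attachment layout (the layout and names
are illustrative, not a proposal for the actual storage scheme):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hedged sketch: copy an uploaded attachment straight to disk in 4 KB
// chunks, so memory use stays constant regardless of the file size.
public class FileSystemAttachmentStore
{
    private final File storageRoot;

    public FileSystemAttachmentStore(File storageRoot)
    {
        this.storageRoot = storageRoot;
    }

    public void store(String documentId, String filename, InputStream content)
        throws IOException
    {
        File target = new File(new File(storageRoot, documentId), filename);
        target.getParentFile().mkdirs();
        OutputStream out = new FileOutputStream(target);
        try {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = content.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } finally {
            out.close();
        }
    }
}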
On Feb 27, 2008, at 3:48 PM, Sergiu Dumitriu wrote:
> Hi devs,
>
> Last night I checked what happens when uploading a file, and why that
> action requires huge amounts of memory.
>
> So, whenever a file is uploaded, there are several places where the
> file content is loaded into memory:
> - as an XWikiAttachment, as a byte[] ~= filesize
> - as an XWikiAttachmentArchive, as a Base64-encoded string ~=
>   2*4*filesize
> - as Hibernate tokens that are sent to the database, clones of the
>   XWikiAttachment and XWikiAttachmentArchive data ~= 9*filesize
> - as cached attachment and attachment archive, clones of the same 2
>   objects ~= 9*filesize
>
> Total: ~27*filesize bytes in memory.
>
> So, for a 10 MB file, we need at least 270 MB of memory.
>
> Worse, if this is not the first version of the attachment, the
> complete attachment history is loaded into memory as well, so add
> another ~24*versionsize*versions bytes needed during the upload.
>
> After the upload is done, most of these are cleared; only the cached
> objects remain in memory.
>
> However, a problem still remains with the cache. It is an LRU cache
> with a fixed capacity, so even if the memory is full, the cached
> attachments will not be released.
>
> Things we can improve:
> - Make the cache use References. This will allow cached attachments to
>   be reclaimed when there is a need for more memory (see the sketch
>   after this list).
> - Build a better attachment archive system. I'm not sure it is a good
>   idea to have diff-based versioning of attachments. In theory it
>   saves space when versions are much alike, but it does not really
>   work in practice, because it does a line diff and a Base64-encoded
>   string has no newlines. What's more, the space gain would only pay
>   off with many versions, since a single version alone takes 4 times
>   more space than a binary dump of the content.
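
A minimal sketch of the References idea, assuming an access-ordered
LinkedHashMap for the LRU part and SoftReference values so that the
garbage collector may reclaim cached attachments under memory pressure
(class and method names are illustrative):

import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Hedged sketch: an LRU cache whose values are held through
// SoftReferences, so entries can be reclaimed by the GC before the
// fixed LRU capacity is ever reached.
public class SoftLruCache<K, V>
{
    private final Map<K, SoftReference<V>> map;

    public SoftLruCache(final int capacity)
    {
        // accessOrder = true turns the LinkedHashMap into an LRU list
        this.map = new LinkedHashMap<K, SoftReference<V>>(capacity, 0.75f, true)
        {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, SoftReference<V>> eldest)
            {
                return size() > capacity;
            }
        };
    }

    public synchronized void put(K key, V value)
    {
        map.put(key, new SoftReference<V>(value));
    }

    public synchronized V get(K key)
    {
        SoftReference<V> reference = map.get(key);
        if (reference == null) {
            return null;
        }
        V value = reference.get();
        if (value == null) {
            // the GC cleared the referent; drop the stale entry
            map.remove(key);
        }
        return value;
    }
}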
>
> Suppose we switch to a "one version per table row" scheme for the
> attachment history, with a direct binary dump; then the memory needed
> during an upload would be ~6*filesize, which is much less.
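
A minimal sketch of what that "one version per table row" storage could
look like over plain JDBC, with the content streamed into a BLOB column
(the table and column names are illustrative, not the actual XWiki
schema):

import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hedged sketch: one row per attachment version, with the content
// passed as a stream instead of being materialized as a byte[] or a
// Base64 string.
public class AttachmentVersionStore
{
    public void saveVersion(Connection connection, long documentId,
        String filename, String version, InputStream content, int length)
        throws SQLException
    {
        PreparedStatement statement = connection.prepareStatement(
            "INSERT INTO attachment_version (docid, filename, version, content)"
            + " VALUES (?, ?, ?, ?)");
        try {
            statement.setLong(1, documentId);
            statement.setString(2, filename);
            statement.setString(3, version);
            // the driver can send this in chunks instead of buffering it
            statement.setBinaryStream(4, content, length);
            statement.executeUpdate();
        } finally {
            statement.close();
        }
    }
}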
--
Sergiu Dumitriu
http://purl.org/net/sergiu/
_______________________________________________
devs mailing list
devs@xwiki.org
http://lists.xwiki.org/mailman/listinfo/devs