Hi Denis,
Your improvements sound really great!
I saw a patch mentioned several times but didn't manage to find it... Any
hints on where I can find it?
Thanks!
Anamaria
On Thu, Sep 17, 2009 at 7:49 PM, Denis Gervalle <dgl(a)softec.st> wrote:
 On Sep 17, 2009, at 19:10, Asiri Rathnayake wrote:
  Hi,
 1) The current implementation mostly builds a DOM in memory only to
 immediately serialize it into a stream. So I have removed the
 intermediate DOM and provided direct streaming of Element content by:
        1.1) extending org.dom4j.XMLWriter to allow direct streaming
 of Element content into the output stream, as is or Base64 encoded.
 Incidentally, my extension also ensures proper pairing of open/close
 tags.
        1.2) writing a minimal DOMXMLWriter which extends my XMLWriter
 and can be used with the same toXML() code to build a DOMDocument,
 providing the toXMLDocument() methods so the older implementation is
 supported unchanged if ever needed.
        1.3) using the above, only minimal changes to the current XML
 code were required:
                1.3.1) replacing element.add(Element) by either
 writer.writeElement(Element) or writer.writeOpen(Element)
                1.3.2) for large content, using my extensions, either
 writer.write(Element, InputStream) or writer.writeBase64(Element,
 InputStream), which use the InputStream for the element content
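 To make 1.1 concrete, here is a JDK-only sketch of the writeBase64
 idea (not the actual DOM4J-based patch; class and element names are
 made up): the content is Base64-encoded on the fly between a properly
 paired open/close tag, with no intermediate DOM node built for it.

```java
import java.io.*;
import java.util.Base64;

public class StreamingElementWriter {
    private final Writer out;

    public StreamingElementWriter(Writer out) { this.out = out; }

    // Sketch of writer.writeBase64(Element, InputStream): the tags are
    // paired here, and the body is encoded incrementally from the stream.
    public void writeBase64(String name, InputStream content) throws IOException {
        out.write("<" + name + ">");
        // Adapter so the Base64 output (pure ASCII) lands directly on the Writer.
        OutputStream ascii = new OutputStream() {
            @Override public void write(int b) throws IOException { out.write(b); }
        };
        OutputStream b64 = Base64.getEncoder().wrap(ascii);
        byte[] buf = new byte[4096];
        for (int n; (n = content.read(buf)) != -1; ) b64.write(buf, 0, n);
        b64.close(); // flushes the final Base64 quantum; ascii.close() is a no-op
        out.write("</" + name + ">");
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        new StreamingElementWriter(sw).writeBase64("content",
            new ByteArrayInputStream("hello".getBytes("UTF-8")));
        System.out.println(sw);
    }
}
```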
 
 I'm not aware of the context so I might be wrong, but can't we use
 StAX to achieve this XML streaming?
 (http://www.xml.com/pub/a/2003/09/17/stax.html?page=2)
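 For comparison, the StAX style of streaming looks like this
 (javax.xml.stream ships with the JDK; the element name here is
 invented for illustration): events are written straight to the output
 as they occur, with no in-memory tree at all.

```java
import java.io.StringWriter;
import javax.xml.stream.*;

public class StaxSketch {
    public static void main(String[] args) throws XMLStreamException {
        StringWriter sw = new StringWriter();
        // Each write* call emits directly; nothing is accumulated in a DOM.
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
        w.writeStartElement("attachment");
        w.writeCharacters("...content streamed as it is produced...");
        w.writeEndElement();
        w.close();
        System.out.println(sw);
    }
}
```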
 In fact, I have not changed the library currently in use (DOM4J),
 especially because I want 1.3).
 Minimizing code changes in the XML processing part avoids the
 introduction of new bugs there.
 Since the code has to be almost completely rewritten sooner or later,
 I prefer this approach for now.
  2) The current implementation for binary data such as attachments
 and the export zip file is mostly based on passing in-memory byte[]
 from function to function, while these data initially come from a
 request.getInputStream() or are written to a
 response.getOutputStream(). So I have changed these to pass the
 stream instead of the data:
       2.1) using IOUtils.copy when required
        2.2) using org.apache.commons.codec.binary.Base64OutputStream
 for base64 encoding when required
        2.3) using an extension of ZipInputStream to cope with
 unexpected close()
        2.4) avoiding buffer duplication in favor of stream filters
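 A JDK-only sketch of points 2.1 and 2.3 (the patch itself uses
 commons-io IOUtils.copy and commons-codec; the class names below are
 made up): a copy loop that never materializes the payload as one
 byte[], plus a ZipInputStream whose close() only closes the current
 entry, so a careless consumer cannot kill the underlying archive.

```java
import java.io.*;
import java.util.zip.*;

public class StreamingZipDemo {

    // 2.3: close() is downgraded to closeEntry(); the archive stream survives.
    static class UnclosableZipInputStream extends ZipInputStream {
        UnclosableZipInputStream(InputStream in) { super(in); }
        @Override public void close() throws IOException { closeEntry(); }
        void reallyClose() throws IOException { super.close(); }
    }

    // 2.1: stream-to-stream copy with a small fixed buffer, no full byte[].
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny two-entry zip in memory just for the demo.
        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(zipped)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("first".getBytes("UTF-8"));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("b.txt"));
            zos.write("second".getBytes("UTF-8"));
            zos.closeEntry();
        }

        UnclosableZipInputStream zin =
            new UnclosableZipInputStream(new ByteArrayInputStream(zipped.toByteArray()));
        zin.getNextEntry();
        ByteArrayOutputStream first = new ByteArrayOutputStream();
        copy(zin, first);
        zin.close(); // an unexpected close(): only the entry is closed
        zin.getNextEntry(); // the archive is still readable
        ByteArrayOutputStream second = new ByteArrayOutputStream();
        copy(zin, second);
        zin.reallyClose();
        System.out.println(first + " " + second);
    }
}
```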
 
 Sounds good. Maybe FileUploadPlugin should also pass streams instead
 of data so we do not run out of memory.
 (http://maven.xwiki.org/site/xwiki-core-parent/xwiki-core/apidocs/com/xpn/xw…)
 This is in my patch already ;)
 But, due to 3), this is not enough to prevent out-of-memory errors
 completely: it puts the limit higher for export, but not much higher
 for upload. Of course, this is no longer an issue of the plugin now.
 Storage and caching are where improvements should go later.
  3) Since the most often used large data come from the database
 through attachment content, it would be nice to have these
 attachments streamed from the database when they are too large.
 However, I feel that it is still too early to convert our binary
 into a blob, mainly because HSQLDB and MySQL still do not really
 support blobs, just an emulation. These contents are also cached in
 the document cache, and this will require improvements to support
 blobs. However, I propose to take the occasion to go in the
 direction of blobs by:
       3.1) deprecating setContent(byte[]) and byte[] getContent() in
 favor of newly created setContent(InputStream, int), InputStream
 getContentInputStream() and getSize()
       3.2) beginning to use these new functions as much as possible,
 as 2) implies
       3.3) this also opens the ability to store attachments in
 another repository that better supports the streaming aspect (i.e. a
 filesystem)
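 A sketch of what 3.1 could look like (the method names follow this
 mail, but the class shape is an assumption, not the actual XWiki
 patch): the deprecated byte[] methods become thin adapters over the
 stream-based API, so the backing store can later move to a blob or a
 file without touching callers.

```java
import java.io.*;

public class AttachmentContent {
    private byte[] data = new byte[0]; // in-memory today; a blob/file later

    // New API (3.1): content travels as a stream plus an explicit size.
    public void setContent(InputStream in, int size) throws IOException {
        byte[] buf = new byte[size];
        new DataInputStream(in).readFully(buf);
        this.data = buf;
    }

    public InputStream getContentInputStream() {
        return new ByteArrayInputStream(data);
    }

    public long getSize() { return data.length; }

    /** @deprecated use {@link #setContent(InputStream, int)} */
    @Deprecated public void setContent(byte[] content) { this.data = content.clone(); }

    /** @deprecated use {@link #getContentInputStream()} */
    @Deprecated public byte[] getContent() { return data.clone(); }

    public static void main(String[] args) throws IOException {
        AttachmentContent a = new AttachmentContent();
        byte[] payload = "attachment bytes".getBytes("UTF-8");
        a.setContent(new ByteArrayInputStream(payload), payload.length);
        System.out.println(a.getSize() + " " + new String(a.getContent(), "UTF-8"));
    }
}
```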
 
 +1
 Still, I wonder if streaming blobs from databases is a mature
 technology. (MySQL has an extension: http://blobstreaming.org/)
 
 Well, I was not thinking about direct streaming, but about using
 classical blobs with a locator, which is not well supported by open
 source databases. As far as I know, Oracle began to do it quite well
 starting with the 9.x versions, but there are still some caveats. So
 I prefer to first provide an interface that allows the switch, and
 put the switch itself in another patch when appropriate. This one is
 already large enough, and it moves an issue that is everywhere in
 the core to a single place. (Almost: the Archive is not yet fixed.)
  Keep up the good work :) 
 Thanks ;)
 Denis
 _______________________________________________
 devs mailing list
 devs(a)xwiki.org
 
http://lists.xwiki.org/mailman/listinfo/devs