On Sep 17, 2009, at 19:10, Asiri Rathnayake wrote:
Hi,
1) The current implementation mostly builds a DOM in memory only to
immediately serialize it into a stream. So I have removed the
intermediate DOM and provided direct streaming of Element content by:
1.1) extending org.dom4j.XMLWriter to allow direct streaming
of Element content into the output stream, either as-is or Base64
encoded. As a bonus, my extension also ensures proper pairing of
open/close tags.
1.2) writing a minimal DOMXMLWriter which extends my XMLWriter and
can be used with the same toXML() code to build a DOMDocument,
providing the toXMLDocument() methods so the older implementation
is supported unchanged if ever needed.
1.3) using the above, only minimal changes to the current XML code
were required:
1.3.1) replacing element.add(Element) by either
writer.writeElement(Element) or writer.writeOpen(Element)
1.3.2) for large content, using my extensions, either
writer.write(Element, InputStream) or writer.writeBase64(Element,
InputStream), which use the InputStream for the element content
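For readers who want to picture the pattern, here is a minimal JDK-only stand-in for the extended writer described in 1.1. It is a sketch, not the actual patch: dom4j is not assumed on the classpath, the class name is mine, and java.util.Base64 stands in for whatever encoder the real extension uses. What it shows is the two ideas from 1.1: content is streamed straight to the output, and a tag stack enforces open/close pairing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Base64;
import java.util.Deque;

// Illustrative stand-in for the XMLWriter extension of 1.1: element content is
// streamed straight to the output (optionally Base64 encoded), and a tag stack
// enforces proper pairing of open/close tags.
public class StreamingWriterSketch {
    private final OutputStream out;
    private final Deque<String> openTags = new ArrayDeque<>();

    public StreamingWriterSketch(OutputStream out) { this.out = out; }

    public void writeOpen(String name) throws IOException {
        out.write(("<" + name + ">").getBytes(StandardCharsets.UTF_8));
        openTags.push(name);
    }

    public void writeClose(String name) throws IOException {
        String expected = openTags.pop();
        if (!expected.equals(name)) {
            throw new IllegalStateException("expected </" + expected + ">, got </" + name + ">");
        }
        out.write(("</" + name + ">").getBytes(StandardCharsets.UTF_8));
    }

    // Analogue of writeBase64(Element, InputStream): the content is encoded on
    // the fly and never held in memory as a whole.
    public void writeBase64(String name, InputStream content) throws IOException {
        writeOpen(name);
        // Shield 'out' from the encoder's close(), which would otherwise close it too.
        OutputStream shield = new FilterOutputStream(out) {
            @Override public void close() throws IOException { flush(); }
        };
        try (OutputStream b64 = Base64.getEncoder().wrap(shield)) {
            content.transferTo(b64);
        }
        writeClose(name);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        StreamingWriterSketch w = new StreamingWriterSketch(sink);
        w.writeOpen("attachment");
        w.writeBase64("content", new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)));
        w.writeClose("attachment");
        System.out.println(sink); // <attachment><content>aGVsbG8=</content></attachment>
    }
}
```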
I'm not aware of the context so I might be wrong, but couldn't we use
StAX to achieve this XML streaming? (
http://www.xml.com/pub/a/2003/09/17/stax.html?page=2)
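For context, the StAX alternative mentioned here would look roughly like this (javax.xml.stream ships with the JDK; the element and attribute names below are made up for illustration). Like the dom4j extension, StAX writes events straight to the output, so no DOM tree accumulates in memory:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Sketch of the StAX approach: elements are serialized as they are written,
// so nothing like a DOM tree is held in memory.
public class StaxSketch {
    static String render() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement("attachment");          // analogous to writeOpen(Element)
        xml.writeAttribute("filename", "report.pdf"); // hypothetical sample data
        xml.writeCharacters("...content...");
        xml.writeEndElement();                        // analogous to writeClose(Element)
        xml.flush();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(render());
    }
}
```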
In fact, I have not changed the library currently in use (DOM4J),
especially because I want 1.3).
Minimizing code changes in the XML processing part avoids introducing
new bugs there.
Since the code has to be almost completely rewritten sooner or later,
I prefer this approach for now.
2) The current implementation for binary data such as attachments and
the export zip file is mostly based on passing in-memory byte[] from
function to function, while these data initially come from a
request.getInputStream() or are written to a
response.getOutputStream(). So I have changed these to pass the
stream instead of the data:
2.1) using IOUtils.copy when required
2.2) using org.apache.commons.codec.binary.Base64OutputStream
for Base64 encoding when required
2.3) using an extension of ZipInputStream to cope with
unexpected close()
2.4) avoiding buffer duplication in favor of stream filters
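The copy-and-encode-on-the-fly pattern of 2.1/2.2 can be sketched with JDK classes alone (the patch itself uses commons-io's IOUtils.copy and commons-codec's Base64OutputStream in the same roles; the class and method names below are mine):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;

// JDK-only sketch of the stream-to-stream pattern in 2.1/2.2.
public class StreamCopySketch {
    // Copy 'in' to 'out', Base64-encoding on the fly, without ever holding
    // the whole payload in a byte[].
    public static void copyBase64(InputStream in, OutputStream out) throws IOException {
        OutputStream b64 = Base64.getEncoder().wrap(out);
        in.transferTo(b64); // plays the role of IOUtils.copy(in, b64)
        b64.close();        // writes the final padding; note: also closes 'out'
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copyBase64(new ByteArrayInputStream("hello".getBytes()), sink);
        System.out.println(sink); // aGVsbG8=
    }
}
```

In a servlet context the same call would connect request.getInputStream() to response.getOutputStream() directly, which is the point of 2).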
Sounds good. Maybe FileUploadPlugin should also pass streams instead
of data so we do not run out of memory. (
http://maven.xwiki.org/site/xwiki-core-parent/xwiki-core/apidocs/com/xpn/xw…
)
This is in my patch already ;)
But, due to 3), this is not enough to prevent out-of-memory errors
completely: it raises the limit for export, but not by much for
upload. Of course, this is no longer an issue of the plugin itself.
Storage and caching are where improvements should go later.
3) Since the most frequently used large data comes from the database
through attachment content, it would be nice to have these
attachments streamed from the database when they are too large.
However, I feel that it is still too early to convert our binary data
into blobs, mainly because HSQLDB and MySQL still do not really
support blobs, just an emulation. These contents are also cached in
the document cache, which would require improvements to support
blobs. However, I propose to take the occasion to move in the
direction of blobs by:
3.1) deprecating setContent(byte[]) and byte[] getContent() in favor
of newly created setContent(InputStream, int), InputStream
getContentInputStream() and getSize()
3.2) beginning to use these new functions as much as possible, as 2)
implies
3.3) this also opens the ability to store attachments in another
repository that better supports the streaming aspect (e.g. a
filesystem)
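The API direction of 3.1 might look like the sketch below. The method names follow the proposal above, but everything else is illustrative, not the XWiki code: the in-memory backing is just for the sketch, and a blob-backed store would instead hand out the database stream directly from getContentInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the 3.1 direction: byte[] accessors deprecated, stream-based
// accessors added. The in-memory backing is illustrative only.
public class AttachmentContentSketch {
    private byte[] content = new byte[0];

    /** @deprecated use {@link #setContent(InputStream, int)} */
    @Deprecated
    public void setContent(byte[] data) {
        this.content = data.clone();
    }

    /** @deprecated use {@link #getContentInputStream()} */
    @Deprecated
    public byte[] getContent() {
        // A blob-backed store would have to materialize the stream here,
        // which is exactly why this accessor is being phased out.
        return content.clone();
    }

    // New stream-based accessors from 3.1.
    public void setContent(InputStream in, int length) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream(Math.max(length, 32));
        in.transferTo(buf);
        this.content = buf.toByteArray();
    }

    public InputStream getContentInputStream() {
        return new ByteArrayInputStream(content);
    }

    public int getSize() {
        return content.length;
    }

    public static void main(String[] args) throws IOException {
        AttachmentContentSketch a = new AttachmentContentSketch();
        a.setContent(new ByteArrayInputStream("data".getBytes()), 4);
        System.out.println(a.getSize()); // 4
    }
}
```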
+1
Still, I wonder if streaming blobs from databases is a mature
technology. (MySQL has an extension:
http://blobstreaming.org/)
Well, I was not thinking about direct streaming, but about using
classical blobs with a locator, which is not well supported by
open-source databases. As far as I know, Oracle began to do it quite
well starting with the 9.x versions, but there are still some
caveats. So I prefer to first provide an interface that allows the
switch, and put the switch itself in another patch when appropriate.
This one is already large enough, and it moves an issue that was
everywhere in the core to a single place. (Almost: the Archive is not
yet fixed.)
Keep up the good work :)
Thanks ;)
Denis