Hi,
1) The current implementation mostly builds a DOM in memory only to
immediately serialize it into a stream. So I have removed the
intermediate DOM and provided direct streaming of Element content by:
1.1) extending org.dom4j.XMLWriter to allow direct streaming
of Element content into the output stream, as is or Base64-encoded.
As a side benefit, my extension also ensures proper pairing of open/close tags.
1.2) writing a minimal DOMXMLWriter which extends my XMLWriter and
can be used with the same toXML() code to build a DOMDocument,
providing the toXMLDocument() methods so the older implementation is
supported unchanged if ever needed.
1.3) using the above, only minimal changes to the current XML code
were required:
1.3.1) replacing element.add(Element) by either writer.write(Element)
or writer.writeOpen(Element)
1.3.2) for large content, using my extensions, either
writer.write(Element, InputStream) or writer.writeBase64(Element,
InputStream), which use the InputStream for the element content
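The tag-pairing guard mentioned in 1.1 can be sketched with a plain stack of open element names; the class and method names below are illustrative, not the actual dom4j XMLWriter extension:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the open/close pairing idea: the writer tracks open element
// names on a stack and rejects a mismatched or stray close tag.
public class PairedTagWriter {
    private final Writer out;
    private final Deque<String> open = new ArrayDeque<>();

    public PairedTagWriter(Writer out) { this.out = out; }

    public void writeOpen(String name) {
        write("<" + name + ">");
        open.push(name);
    }

    public void writeClose(String name) {
        if (open.isEmpty() || !open.peek().equals(name)) {
            throw new IllegalStateException("unbalanced close tag: " + name);
        }
        open.pop();
        write("</" + name + ">");
    }

    private void write(String s) {
        try { out.write(s); } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}
```

The stack costs one entry per nesting level, so the guard adds no buffering of the streamed content itself.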
I'm not aware of the context so I might be wrong, but can't we use StAX to
achieve this XML streaming? (
http://www.xml.com/pub/a/2003/09/17/stax.html?page=2)
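For reference, a StAX version of this kind of streaming would look roughly like the sketch below (javax.xml.stream is part of the JDK; the element and method names are illustrative, not the actual XWiki code):

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxSketch {
    // Streams the content of a hypothetical <attachment> element chunk by
    // chunk instead of materializing it in a DOM first.
    static String writeAttachment(String filename, Iterable<String> chunks) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("attachment");
            w.writeAttribute("filename", filename);
            for (String chunk : chunks) {
                w.writeCharacters(chunk); // content flows to the writer as we go
            }
            w.writeEndElement(); // StAX enforces open/close pairing itself
            w.close();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeAttachment("a.png", java.util.List.of("AAAA", "BBBB")));
    }
}
```

In a real servlet the StringWriter would be replaced by a writer over the response stream, so nothing is buffered whole.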
2) The current implementation for binary data such as attachments and
the export zip file is mostly based on passing in-memory byte[] from
function to function, while these data initially come from
request.getInputStream() or are written to
response.getOutputStream(). So I have changed these to pass the stream
instead of the data:
2.1) using IOUtils.copy when required
2.2) using org.apache.commons.codec.binary.Base64OutputStream
for base64 encoding when required
2.3) using an extension of ZipInputStream to cope with
unexpected close()
2.4) avoiding buffer duplication in favor of stream filters
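Points 2.1 and 2.2 can be illustrated with their JDK analogues alone (InputStream.transferTo plays the role of IOUtils.copy, and Base64.getEncoder().wrap that of Base64OutputStream); this is a sketch, not the actual patch:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Base64;

public class StreamCopySketch {
    // Pipes the input straight to the output, Base64-encoding on the fly,
    // so the payload is never held whole in a byte[].
    static void copyBase64(InputStream in, OutputStream out) {
        try {
            OutputStream b64 = Base64.getEncoder().wrap(out); // ~ Base64OutputStream
            in.transferTo(b64);                               // ~ IOUtils.copy
            b64.close(); // flushes the final padding; also closes 'out'
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyBase64(new ByteArrayInputStream("hello".getBytes()), out);
        System.out.println(out); // aGVsbG8=
    }
}
```

With a request input stream and a response output stream in place of the byte-array streams, memory use stays constant regardless of attachment size.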
Sounds good. Maybe FileUploadPlugin should also pass streams instead of
data so we do not run out of memory. (
http://maven.xwiki.org/site/xwiki-core-parent/xwiki-core/apidocs/com/xpn/xw…
)
3) Since most of the frequently used large data comes from the
database through attachment content, it would be nice to have these
attachments streamed from the database when they are too large.
However, I feel it is still too early to convert our binary data into
a blob, mainly because HSQLDB and MySQL still do not really support
blobs, just an emulation. Attachments are also cached in the document
cache, which would need improvements to support blobs. However, I
propose to take the occasion to move in the direction of blobs by:
3.1) deprecating setContent(byte[]) and byte[] getContent() in favor
of newly created setContent(InputStream, int), InputStream
getContentInputStream(), and getSize()
3.2) beginning to use these new functions as much as possible, as 2)
implies
3.3) this also opens up the ability to store attachments in another
repository that better supports the streaming aspect (e.g. a filesystem)
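The API proposed in 3.1 might look roughly like this (a sketch only; the backing byte[] stands in for whatever store, blob or filesystem, ends up behind it, and the real XWiki class will differ):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Sketch of the stream-based attachment content API from 3.1.
public class AttachmentContentSketch {
    private byte[] data = new byte[0];

    /** @deprecated use {@link #setContent(InputStream, int)} instead */
    @Deprecated
    public void setContent(byte[] content) { this.data = content.clone(); }

    // The size hint lets a blob- or file-backed store preallocate instead
    // of growing a buffer; here it just sizes the in-memory buffer.
    public void setContent(InputStream in, int size) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream(Math.max(size, 0));
            in.transferTo(buf);
            this.data = buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public InputStream getContentInputStream() {
        return new ByteArrayInputStream(data);
    }

    public int getSize() { return data.length; }
}
```

Because callers only see InputStream and a size, swapping the byte[] for a database blob or a file later would not change the API again.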
+1
Still, I wonder whether streaming blobs from databases is a mature
technology. (MySQL has an extension:
http://blobstreaming.org/)
Keep up the good work :)
Thanks.
- Asiri