Re: [xwiki-devs] OfficeImporter API changes - Take 2

26 Oct 2009

Hi,

...
  OfficeImporter::officeToXHTML(byte[] officeFileData,
DocumentName
  referenceDocument, boolean
filterStyles):XHTMLOfficeDocument 
 Can you explain why you replaced InputStream with byte[]? Arrays require
 contiguous memory locations and loading large arrays into memory is bad
 IMO.

Yes, you are correct. I still have some doubts about this:

1. OpenOffice requires the input to be on a disk file -- so we need to write
the stream / byte [] into the disk. Having a stream instead of a byte[] is
good in this case because we won't be loading the whole file into memory at
once.

2. However OpenOffice will load the whole file into memory at once when
performing the conversion. Still, having a stream is kind of better because
we are not consuming any additional memory.

3. Currently file upload plugin has no method to get a stream of input. So
it's returning a byte[]. Having this byte[] converted into a stream is kind
of useless.

4. However, having an API based on streams is better because in future we
might anyway upgrade the fileupload plugin to return a stream.

5. Even after the conversion I need to load the results into memory (byte[])
so that I can attach them into wiki pages.

6. But we might change the DAB api in future so that attachments can be
saved in an streaming fashion (this might be hard).

I might be wrong here, but I don't think having streams oriented API for
officeimporter will bring any performance enhancement in near future. All it
will add is some glue code to convert between streams / byte[] so that it
can work with rest of XE infrastructure (fileupload plugin, DAB, attachment
saving).

If we base our decision solely on "having a streamable API is always wise",
I'd also go with the streams based API.

I'd like to stay away from this decision and let you all decide :)

Thanks.

- Asiri

...

 Thanks,
 Marius

  OfficeImporter::xhtmlToXDOM(XHTMLOfficeDocument
 xhtmlOfficeDocument):XDOMOfficeDocument
 OfficeImporter::officeToXDOM(byte[] officeFileData, DocumentName
 referenceDocument, boolean filterStyles):XDOMOfficeDocument
 OfficeImporter::buildPresentation(byte[]  officeFileData):XDOMOfficeDocument
  OfficeImporter::splitImport(XDOMOfficeDocument
xdomOfficeDocument, int[]
 headingLevelsToSplit, NamingCriterion namingCriterion, DocumentName
 baseDocumentName):Map<TargetPageDescriptor, XDOMOfficeDocument>
 </code>

 As you can see, these methods are more granular and the responsibilities  are
  well defined. Customizing the import process can
be done from the client
 code. For an example:

 1. Make the initial import from office to XHTMLOfficeDocument -
 OfficeImporter::officeToXHTML()

 2. Perform customizations on the XHTMLOfficeDocument - w3c DOM
 manipulations.

 3. Import the XHTMLOfficeDocument into XDOMOfficeDocument -
 OfficeImporter::xhtmlToXDOM()

 4. Perform customizations on the XDOMOfficeDocument (XDOM) - XDOM
 manipulations.

 5. Split the XDOMOfficeDocument into multiple XDOMOfficeDocument  instances -
  OfficeImporter::splitImport()

 6. Perform customizations on these child XDOMOfficeDocument instances -  XDOM
  manipulations.

 7. Render the XDOMOfficeDocument instances & save them into wiki pages -
 XWiki rendering operations.

 I think this interface will make it easy to extend & maintain  officeimporter
  functionality in the future.

 Along with this, I would also like to refactor the xwiki-refactoring  module
  a bit to get rid of string based document names
from it.

 This whole refactoring operation would take approximately one day to
 complete. And since this operation is not adding any new features, I  think
  it can be committed on both trunk and 2.0
branch.

 Here's my +1 to all of above.

 Thanks.

 - Asiri
 _______________________________________________
 devs mailing list
 devs(a)xwiki.org
 http://lists.xwiki.org/mailman/listinfo/devs 
_______________________________________________
 devs mailing list
 devs(a)xwiki.org
 http://lists.xwiki.org/mailman/listinfo/devs

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [xwiki-devs] OfficeImporter API changes - Take 2