[xwiki-devs] [Office Import] Mid Term Review and todos - xwiki-devs@xwiki.org

12 Jul 2008

Hi Wang,
I haven't run your code yet but I've looked at it and I wanted to send
you a quick review before the end of the midterm.
* General comment: good work! This is promising. Now we need to work
on making it generate xwiki syntax and in making the XHTML parser work
fine. I'd expect your help on this now that you've done the basic work
for the rest.
* Please use the following issue to notify about your progress from
now on: http://jira.xwiki.org/jira/browse/XSANDBOX-32. Also whenever
you do a commit you should now refer to this issue in your commit
comment.
* Some comments on OfficeImporter interface
public interface OfficeImporter
{
     /**
      * convert the inputStream which contain the office document to a
outputStream which contain the result html code
      *
      * @param inputStream
      * @param inputfileExtension the extension of source content
      * @param outputStream
      * @param outputfileExtension the extension of target content
      */
     void convert(InputStream inputStream, String inputfileExtension,
OutputStream outputStream,
         String outputfileExtension);
a) The javadoc should be improved. I had to look at the source code to
understand the meaning of inputfileExtension... which doesn't look
like a file extension at all :)
I think it's a file type rather than an extension. Thus I suggest to
create an OfficeDocumentType enum class to list all supported file
types and use that enum for input and output file types.
b) Also I think the output stream should be returned instead of being
passed as a parameter. It would also be more homogeneous with the
byte[] convert() method which returns the results as an array of bytes.
c) Since it's an importer the method should be named import(). However
I think it's a converter more than an importer so I'd rather you
rename it to OfficeConverter. WDYT?
* OfficeImporterFactory. What is this required for?
* We need to discuss on the list about how to best integrate your work
in XE. This includes discussing the needs for an application or a
macro (I'm not sure they are required and in any case should be left
for later IMO).
* ImporterServerConnectionFactory. IMO should be renamed to
OfficeServerConnection and should be implemented as a Plexus component
(singleton).
* OfficeImporterImp should also be implemented as a Plexus component
and should extend AbstractLogEnabled.
* OfficeImporterImp: what is this method used for: convert2HTML? It
doesn't seem to be used. Same for convert2PDF. They are also not in
the interface so I'm not sure what they are for.
* Need better exception handling. You need to create a
OfficeConverterException exception IMO.
* We're using dom4j and not jdom in xwiki so I think it would be
better if you could use it instead of jdom. WDYT? (unless there's a
good reason of course :))
* I think it would be better to transform your Util class into a set
of Filters. Basically the idea would be to implement a Filter chain
that would be executed against the generated XHTML and before passing
the XHTML to the XWiki XHTML parser. You would implement them as
components and have one Filter component per filtering needed (one for
removing PinLi, one for transforming the images elements, etc).
* BTW I need to check but it's possible that you don't need to
transform the img elements into {image} macros since this might be
done by the XHTML parser automatically. Same for removing the HTML,
HEAD or BODY elements.
* Note that xwiki has  some util method for creating temporary
directories so you shouldn't create them yourself with:
         // create the new temp folder
         String oriTempDir = System.getProperty("java.io.tmpdir");
         File tempFile = new File(new File(oriTempDir),
"xwikiOfficeImportTemp");
         if (!tempFile.exists()) {
             tempFile.mkdirs();
         }
         System.setProperty("java.io.tmpdir",
tempFile.getAbsolutePath());
* The build doesn't work right now. It's missing some artifacts in
maven remote repositories. We need to work on making it even more
simpler to use.
* The tests should be improved (better tests, no try catch, etc).
Could you send an email on the list asking for feedback from xwiki
committers for defining the integration points for this office
converter feature?
Thanks
-Vincent