Hi Wang,
In my opinion you should stick with the HTML export
for now.
Before going deeper into the OO API, you should study
the XWiki architecture a little.
If you want to test your HTML export, simply copy the
HTML into the wiki editor of a newly created page.
Be aware that both the OO and MS Office "save as HTML"
APIs produce messy code. To solve that problem you
could use the Tidy API or the tidy command line to
clean up the HTML a little.
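
To make the tidy suggestion concrete, here is a minimal
sketch in Python that shells out to the tidy command
line. The function names and file names are made up for
illustration, and it assumes a "tidy" executable is on
the PATH; adjust the options to taste.

```python
import shutil
import subprocess


def build_tidy_command(input_path, output_path, from_word=True):
    """Build an HTML Tidy command line for an office HTML export."""
    cmd = [
        "tidy",
        "-q",                      # quiet: suppress non-essential output
        "-asxhtml",                # emit well-formed XHTML
        "--force-output", "yes",   # still write output if errors remain
    ]
    if from_word:
        # Tidy's option for stripping surplus Word 2000 markup
        cmd += ["--word-2000", "yes"]
    cmd += ["-o", output_path, input_path]
    return cmd


def run_tidy(input_path, output_path, from_word=True):
    """Run tidy and return its exit code (0 ok, 1 warnings, 2 errors)."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    cmd = build_tidy_command(input_path, output_path, from_word)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode
```

Calling run_tidy("export.html", "export_clean.html")
would write the cleaned file next to the original, so
you can compare both in the wiki editor.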
Why do I recommend HTML instead of PDF or another
format? First, because it is easier to integrate with
XWiki. Second, if you look inside the HTML exported by
MS Word, for example, you will see some tags hidden
in HTML comments. These could be very important for
our goal: bringing collaboration to the next level (not
only text, but also charts, spreadsheet calculations,
diagrams, even engineering, e.g. MS Visio).
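
If you want to see those hidden tags for yourself, a
rough sketch like the following lists the conditional
comment blocks in an exported file. It assumes the Word
style "<!--[if ...]> ... <![endif]-->" comments; the
function name is made up, and the exact contents of the
blocks vary between Office versions.

```python
import re

# Any HTML comment, including ones that span several lines
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)
# A comment body of the form "[if CONDITION]> ... <![endif]"
CONDITION = re.compile(
    r"^\s*\[if\s+([^\]]+)\]>(.*?)<!\[endif\]\s*$", re.DOTALL
)


def find_conditional_comments(html):
    """Return (condition, hidden_markup) pairs found in HTML comments."""
    found = []
    for comment in COMMENT.finditer(html):
        match = CONDITION.match(comment.group(1))
        if match:
            found.append((match.group(1).strip(), match.group(2).strip()))
    return found
```

Ordinary comments are skipped, so the result contains
only the markup that Office hid from the browser.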
So, those extra tags describe the fonts and other
embedded elements in the document. These elements are
converted by default into GIF images stored in the
document's associated folder. It is best to have them
as attached files in XWiki. When importing, you should
parse the HTML and replace each "src" attribute with
the URL of the matching attachment. In the long run, if
you go further and create a "retrieve page" feature,
that will be spectacular, because it would enable the
collaboration I was talking about.
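
The "src" replacement could look roughly like this. It
is only a sketch: it uses a regular expression rather
than a real HTML parser, the function names are made
up, and the attachment base URL (here in the
/xwiki/bin/download/Space/Page form) is an assumption
you should check against your own wiki.

```python
import posixpath
import re
from urllib.parse import quote, unquote

# The src attribute of an <img> tag, with single or double quotes
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def attachment_url(base_url, src):
    """Map a local image path to a URL under the page's attachments."""
    # Keep only the file name, e.g. "report_files/image001.gif"
    # becomes "image001.gif"
    name = posixpath.basename(unquote(src).replace("\\", "/"))
    return base_url.rstrip("/") + "/" + quote(name)


def rewrite_image_sources(html, base_url):
    """Point every relative <img src> at the attachment base URL."""
    def replace(match):
        prefix, quote_char, src = match.groups()
        # Leave images that already live elsewhere untouched
        if src.lower().startswith(("http://", "https://", "data:")):
            return match.group(0)
        return prefix + quote_char + attachment_url(base_url, src) + quote_char

    return IMG_SRC.sub(replace, html)
```

The images themselves still have to be uploaded as
attachments of the page; this step only fixes the
links so the imported page finds them.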
Also for the long term, we should keep an eye on
upcoming Open XML renderers (embedded elements remain
in the XML, which should be an advantage).
It would be nice to have an OO plug-in for exporting
and retrieving the pages. I can supply the MS Office
add-ins. I developed some collaborative solutions in
the past, using .NET and MS Office, and things can
become pretty interesting.
So I think the "import document to XWiki" feature
should be complemented by an "export to XWiki" feature
in the rich client applications.
There is a market for these hybrid solutions. Take a
look at MS SharePoint, Exchange Server, Live Writer,
NewsGator + FeedDemon, etc.
I can provide some use cases for XWiki (especially
Workspaces). WDYT?
I have also attached a screenshot of an automated
"save as HTML" add-in for MS Office.