[Proposal] New Importer Architecture

Vincent Massol vincent at massol.net
Fri Apr 6 09:50:51 CEST 2007


Hi,

I had some time on the train yesterday, so I thought about what a new
Importer architecture would look like.

First, the reasons for changing the current one (which is located in
the Package plugin):

* "bad" design (everything is mixed up in one big class, not modular)  
and too complex to maintain
* cannot import HTML, plain text, etc.
* cannot convert from one wiki syntax to another

Proposal
=======

     * An Importer interface to represent the different importers
           o import(Converter converter, DocumentImportFactory factory)
           o setFilter(ImportFilter filter) : to decide which documents
to import
     * A Converter interface to convert original content before it's  
imported into a page
           o OutputStream convert(InputStream originalContent)
     * A DocumentImportFactory interface for delegating how pages are  
created. This is important as there are different strategies for  
finding out the following data from the original content:
           o Language
           o Target Space
           o Target Page name
           o Objects to attach
           o Attachments
           o Versions
           o Author
           o API:
                 + XWikiDocument createDocument(String  
originalFileName, InputStream contentAfterConversion)
                 + setMode(REPLACE || APPEND): whether to replace any
existing doc or create a new version
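
To make the three interfaces above more concrete, here is a rough Java
sketch of how they could be declared. Several details are assumptions
and not part of the proposal: "import" is a reserved word in Java, so
the method is called importDocuments here; its List return type, the
IOException clauses, the ImportMode enum name and the
ImportFilter.accept() method are all made up for illustration; and
XWikiDocument is a tiny placeholder class, not the real XWiki one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

/** Placeholder standing in for XWiki's document class; fields are illustrative only. */
class XWikiDocument {
    final String space;
    final String name;
    final String content;

    XWikiDocument(String space, String name, String content) {
        this.space = space;
        this.name = name;
        this.content = content;
    }
}

/** Hypothetical enum for the REPLACE/APPEND modes mentioned in the proposal. */
enum ImportMode { REPLACE, APPEND }

/** Decides which documents get imported; accept() is an assumed method name. */
interface ImportFilter {
    boolean accept(String originalFileName);
}

/** Converts original content before it is imported into a page. */
interface Converter {
    OutputStream convert(InputStream originalContent) throws IOException;
}

/** Delegates how pages are created (space, page name, author, etc.). */
interface DocumentImportFactory {
    XWikiDocument createDocument(String originalFileName, InputStream contentAfterConversion)
            throws IOException;

    void setMode(ImportMode mode);
}

/** One importer per kind of source (file, directory, zip, ...). */
interface Importer {
    // Named importDocuments because "import" cannot be used as a Java identifier.
    List<XWikiDocument> importDocuments(Converter converter, DocumentImportFactory factory)
            throws IOException;

    void setFilter(ImportFilter filter);
}
```

Since ImportFilter has a single method in this sketch, a filter could be
written as a lambda, e.g. name -> name.endsWith(".html").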

Examples of implementations:

     * For Importer: FileImporter, DirectoryImporter, ZipImporter,
ZipURLImporter, JARImporter
     * For Converter: PlainTextConverter, HTMLConverter,  
TWikiConverter, ConfluenceConverter, XWikiXMLConverter (for  
converting documents in XWiki XML format)
     * For DocumentImportFactory: XARDocumentImportFactory,  
ExpandedXARDocumentImportFactory, DefaultDocumentImportFactory (uses  
the file name as page name and parent directory as space, etc)
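
As an illustration of the DefaultDocumentImportFactory strategy (file
name as page name, parent directory as space), here is a minimal sketch
of the naming logic only. The DefaultNaming class, its method names and
the "Main" fallback space are assumptions for the example, not part of
the proposal.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: derives a target space and page name from a file path. */
final class DefaultNaming {
    /** Assumed fallback when the file has no named parent directory. */
    static final String FALLBACK_SPACE = "Main";

    private DefaultNaming() {
    }

    /** Page name = file name without its last extension. */
    static String pageName(Path file) {
        String fileName = file.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        // dot > 0 keeps names such as ".hidden" intact
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    /** Space = name of the parent directory, or the fallback when there is none. */
    static String space(Path file) {
        Path parent = file.getParent();
        if (parent == null || parent.getFileName() == null) {
            return FALLBACK_SPACE;
        }
        return parent.getFileName().toString();
    }
}
```

With this sketch, a file at somedir/Sandbox/WebHome.html would map to
space "Sandbox" and page "WebHome".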

Examples of using it
================

     * A XAR file
           o new ZipImporter(new File(".../.xar"), new  
XWikiXMLConverter(), new XARDocumentImportFactory(new File(".../.xar")))
     * A single HTML file
           o new FileImporter(new File(".../.html"), new HTMLConverter(),
new DefaultDocumentImportFactory())
     * A zip file containing TWiki pages
           o new ZipImporter(new File(".../.zip"), new TWikiConverter(),
new DefaultDocumentImportFactory())
     * An expanded directory of HTML files
           o new DirectoryImporter(new File(".../somedir"), new
HTMLConverter(), new DefaultDocumentImportFactory())
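
To show how the single-file case could flow end to end, here is a small
self-contained sketch in the constructor style used in the examples
above. Everything is nested in a hypothetical ImportSketch class so the
block compiles by itself. Assumptions: the importDocument() method name,
the pass-through PlainTextConverter body, the stub XWikiDocument, and
the requirement that the converter returns a ByteArrayOutputStream
(convert() returns an OutputStream while createDocument() takes an
InputStream, so the bytes have to be bridged somehow).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ImportSketch {
    /** Stub standing in for the real XWikiDocument. */
    static final class XWikiDocument {
        final String space;
        final String name;
        final String content;

        XWikiDocument(String space, String name, String content) {
            this.space = space;
            this.name = name;
            this.content = content;
        }
    }

    interface Converter {
        OutputStream convert(InputStream originalContent) throws IOException;
    }

    interface DocumentImportFactory {
        XWikiDocument createDocument(String originalFileName, InputStream contentAfterConversion)
                throws IOException;
    }

    /** Pass-through converter: copies the bytes unchanged into memory. */
    static final class PlainTextConverter implements Converter {
        @Override
        public OutputStream convert(InputStream originalContent) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copy(originalContent, out);
            return out;
        }
    }

    /** Imports one file: read, convert, then hand the result to the factory. */
    static final class FileImporter {
        private final File file;
        private final Converter converter;
        private final DocumentImportFactory factory;

        FileImporter(File file, Converter converter, DocumentImportFactory factory) {
            this.file = file;
            this.converter = converter;
            this.factory = factory;
        }

        XWikiDocument importDocument() {
            try (InputStream original = new FileInputStream(file)) {
                OutputStream converted = converter.convert(original);
                // Sketch-only restriction: converted bytes must be readable back from memory.
                if (!(converted instanceof ByteArrayOutputStream)) {
                    throw new IllegalStateException("sketch expects a ByteArrayOutputStream");
                }
                byte[] bytes = ((ByteArrayOutputStream) converted).toByteArray();
                return factory.createDocument(file.getName(), new ByteArrayInputStream(bytes));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Reads a whole stream as UTF-8 text. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        copy(in, buffer);
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
    }
}
```

A factory can be supplied as a lambda for quick experiments, e.g.
(name, in) -> new ImportSketch.XWikiDocument("Sandbox", name,
ImportSketch.readAll(in)).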

I've put all this on
http://www.xwiki.org/xwiki/bin/view/Idea/NewImporterArchitecture
but I think it's better to discuss it here, as email is better for
discussions...

Note: I'm not planning to implement this yet, as our first priority is
still the 1.0 release, but once it's released I volunteer to
implement it using a component strategy (cf. the new V2 architecture).

WDYT?

Thanks
-Vincent




