On 4/6/07, Vincent Massol <vincent@massol.net> wrote:

Hi,

I had some time on the train yesterday, so I thought about what a new
Importer architecture would look like.

First, the reasons for changing the current one (which is located in
the Package plugin):

* "bad" design (everything is mixed up in one big class, not modular)
and too complex to maintain
* cannot import HTML, plain text, etc.
* cannot convert from one wiki syntax to another

Proposal
=======

     * An Importer interface to represent the different importers
           o import(Converter converter, DocumentImportFactory factory)
           o setFilter(ImportFilter filter) : to decide which documents
to import
     * A Converter interface to convert original content before it's
imported into a page
           o OutputStream convert(InputStream originalContent)

Maybe something to configure the converter? Can we make something general enough, or do we let implementations provide custom methods? A general approach would be to provide plain get/set methods, as for a HashMap.
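
To make the generic get/set idea concrete, here is a minimal sketch. The setProperty/getProperty names, the AbstractConverter base class and the "sourceCharset" key are all assumptions of mine, not part of the proposal. The convert() signature is the proposed one, except that the sketch wraps IOException in an unchecked exception to stay short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Converter as proposed, plus hypothetical generic configuration accessors.
interface Converter {
    OutputStream convert(InputStream originalContent);
    void setProperty(String name, Object value); // hypothetical
    Object getProperty(String name);             // hypothetical
}

// Hypothetical base class that keeps the configuration in a plain map.
abstract class AbstractConverter implements Converter {
    private final Map<String, Object> properties = new HashMap<>();

    public void setProperty(String name, Object value) {
        properties.put(name, value);
    }

    public Object getProperty(String name) {
        return properties.get(name);
    }
}

// Re-encodes plain text to UTF-8, reading the input charset from the
// hypothetical "sourceCharset" property (UTF-8 when the property is unset).
class PlainTextConverter extends AbstractConverter {
    public OutputStream convert(InputStream originalContent) {
        Object configured = getProperty("sourceCharset");
        Charset source = configured == null
            ? StandardCharsets.UTF_8
            : Charset.forName(configured.toString());
        try {
            // Read the whole input into memory first.
            ByteArrayOutputStream raw = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = originalContent.read(buffer)) != -1) {
                raw.write(buffer, 0, read);
            }
            // Decode with the configured charset, then encode as UTF-8.
            String text = new String(raw.toByteArray(), source);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(text.getBytes(StandardCharsets.UTF_8));
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is the usual one: callers can configure any converter without knowing its concrete type, but property names and value types are no longer checked by the compiler.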

     * A DocumentImportFactory interface for delegating how pages are
created. This is important as there are different strategies for
finding out the following data from the original content:
           o Language
           o Target Space
           o Target Page name
           o Objects to attach
           o Attachments
           o Versions
           o Author
           o API:
                 + XWikiDocument createDocument(String
originalFileName, InputStream contentAfterConversion)
                 + setMode(REPLACE || APPEND): whether to replace any
existing doc or create a new version

I don't know if the name is good. DocumentImportFactory doesn't sound like a factory that creates documents to me (maybe it's just me), so I'd say that DocumentFactory is enough, if it resides in an import package.

Does originalFileName reflect only the file name, or the complete path?
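
For the setMode(REPLACE || APPEND) part, here is a rough sketch of how I read the two modes: APPEND adds a new version on top of an existing page, REPLACE discards what was there. The ImportMode enum and the InMemoryWiki class are hypothetical stand-ins for illustration only; the real thing would go through the XWiki store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The two modes named in the proposal, modelled as an enum (assumption).
enum ImportMode { REPLACE, APPEND }

// Hypothetical in-memory stand-in for the wiki: full page name -> versions.
class InMemoryWiki {
    private final Map<String, List<String>> versions = new HashMap<>();

    // Saves imported content according to the chosen mode.
    void save(String fullName, String content, ImportMode mode) {
        List<String> history = versions.get(fullName);
        if (history == null || mode == ImportMode.REPLACE) {
            // New page, or REPLACE: start from an empty history.
            history = new ArrayList<>();
            versions.put(fullName, history);
        }
        // APPEND on an existing page lands here with the old history intact.
        history.add(content);
    }

    // Returns the stored versions, oldest first (empty if the page is unknown).
    List<String> history(String fullName) {
        List<String> history = versions.get(fullName);
        return history == null ? new ArrayList<String>() : history;
    }
}
```

With this reading, importing the same page twice in APPEND mode leaves two versions, while a following REPLACE import leaves only the newly imported one.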

Examples of implementations:

     * For Importer: FileImporter, DirectoryImporter, ZipImporter,
ZipURLImporter, JARImporter
     * For Converter: PlainTextConverter, HTMLConverter,
TWikiConverter, ConfluenceConverter, XWikiXMLConverter (for
converting documents in XWiki XML format)
     * For DocumentImportFactory: XARDocumentImportFactory,
ExpandedXARDocumentImportFactory, DefaultDocumentImportFactory (uses
the file name as page name and parent directory as space, etc)
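
The "file name as page name, parent directory as space" rule of the default factory could look roughly like the sketch below. It assumes that originalFileName carries at least the parent directory. PageReference, DefaultNaming and the "Main" fallback space are hypothetical names for illustration; the proposed API returns an XWikiDocument instead.

```java
// Hypothetical value object holding the resolved target of an import.
final class PageReference {
    final String space;
    final String page;

    PageReference(String space, String page) {
        this.space = space;
        this.page = page;
    }
}

// Sketch of the default naming rule: file name (minus extension) becomes
// the page name, the immediate parent directory becomes the space.
final class DefaultNaming {
    // Assumption: space used when the path has no parent directory.
    static final String FALLBACK_SPACE = "Main";

    static PageReference resolve(String originalFileName) {
        // Accept both Windows and Unix separators.
        String normalized = originalFileName.replace('\\', '/');
        int lastSlash = normalized.lastIndexOf('/');
        String fileName = normalized.substring(lastSlash + 1);

        // Drop the extension, but keep names that start with a dot intact.
        int dot = fileName.lastIndexOf('.');
        String page = dot > 0 ? fileName.substring(0, dot) : fileName;

        String space = FALLBACK_SPACE;
        if (lastSlash > 0) {
            String parent = normalized.substring(0, lastSlash);
            space = parent.substring(parent.lastIndexOf('/') + 1);
        }
        return new PageReference(space, page);
    }
}
```

If only the bare file name were passed in, every page would end up in the fallback space, which is one argument for passing the complete path.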

Examples of using it
================

     * A XAR file
           o new ZipImporter(new File(".../.xar"), new XWikiXMLConverter(),
new XARDocumentImportFactory(new File(".../.xar")))
     * A single HTML file
           o new FileImporter(new File(".../.html"), new HTMLConverter(),
new DefaultDocumentImportFactory())
     * A zip file containing TWiki pages
           o new ZipImporter(new File(".../.zip"), new TWikiConverter(),
new DefaultDocumentImportFactory())
     * An expanded directory of HTML files
           o new DirectoryImporter(new File(".../somedir"), new
HTMLConverter(), new DefaultDocumentImportFactory())
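
To show how the three interfaces would cooperate, here is a small in-memory sketch. It follows the import(converter, factory) signature from the proposal and does not use the constructor style of the examples above. Several things are my own simplifications: the method is called importDocuments because import is a reserved word in Java, content is passed as String instead of streams, createDocument returns a String instead of an XWikiDocument, and InMemoryImporter is a hypothetical importer that reads from a map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Decides which source files get imported.
interface ImportFilter {
    boolean accept(String originalFileName);
}

// Simplified to String in and out; the proposal uses streams.
interface Converter {
    String convert(String originalContent);
}

// Simplified to return a String; the proposal returns an XWikiDocument.
interface DocumentImportFactory {
    String createDocument(String originalFileName, String contentAfterConversion);
}

interface Importer {
    void setFilter(ImportFilter filter);

    // "import" is reserved in Java, hence the hypothetical name.
    List<String> importDocuments(Converter converter, DocumentImportFactory factory);
}

// Hypothetical importer whose "files" are entries of a map: name -> content.
class InMemoryImporter implements Importer {
    private final Map<String, String> files;
    private ImportFilter filter = name -> true; // accept everything by default

    InMemoryImporter(Map<String, String> files) {
        this.files = files;
    }

    public void setFilter(ImportFilter filter) {
        this.filter = filter;
    }

    public List<String> importDocuments(Converter converter, DocumentImportFactory factory) {
        List<String> created = new ArrayList<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            if (!filter.accept(file.getKey())) {
                continue; // filtered out, nothing is converted or created
            }
            String converted = converter.convert(file.getValue());
            created.add(factory.createDocument(file.getKey(), converted));
        }
        return created;
    }
}
```

The importer only knows how to enumerate its source; what the content becomes and where it ends up are delegated to the converter and the factory, which is the separation the proposal is after.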

I've put all this on
http://www.xwiki.org/xwiki/bin/view/Idea/NewImporterArchitecture
but I think it's better to discuss it here as email is better for
discussions...

Note: I'm not planning to implement this yet as our first priority is
still the 1.0 release, but once it's released I volunteer to
implement it, using a component strategy (cf. the new V2 architecture).

WDYT?