On Apr 6, 2007, at 3:25 PM, Sergiu Dumitriu wrote:

On 4/6/07, Vincent Massol <vincent@massol.net> wrote:
Hi,

I had some time in the train yesterday so I thought about what a new
Importer architecture would look like.

First the reason for changing the current one (which is located in
the Package plugin):

* "bad" design (everything is mixed up in one big class, not modular)
and too complex to maintain
* cannot import HTML, plain text, etc
* cannot convert from one wiki syntax to another

Proposal
=======

     * An Importer interface to represent the different importers
           o import(Converter converter, DocumentImportFactory factory)
           o setFilter(ImportFilter filter) : to decide which documents
to import
     * A Converter interface to convert original content before it's
imported into a page
           o OutputStream convert(InputStream originalContent)

Maybe something to configure the converter? Can we make something general enough, or do we let implementations provide custom methods? One general approach is to provide plain get/set methods, as on a hashmap.

Yes. Also required parameters will be passed in the constructor. Let's defer this till we start the implementation.
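
To make the configuration idea above concrete, here is a minimal Java sketch, not a settled design. It keeps the `OutputStream convert(InputStream)` signature from the proposal, passes a required setting through the constructor, and exposes optional settings through hashmap-style get/set methods. The method names `setParameter`/`getParameter`, the `trim` option, and the behaviour of `PlainTextConverter` are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Converter as in the proposal, plus hypothetical hashmap-style configuration.
interface Converter {
    void setParameter(String name, Object value);   // assumed name

    Object getParameter(String name);               // assumed name

    OutputStream convert(InputStream originalContent) throws IOException;
}

// Illustrative implementation: re-encodes plain text to UTF-8.
class PlainTextConverter implements Converter {
    // Required setting, so it is passed to the constructor.
    private final Charset sourceCharset;
    // Optional settings live in a plain map.
    private final Map<String, Object> parameters = new HashMap<>();

    PlainTextConverter(Charset sourceCharset) {
        this.sourceCharset = sourceCharset;
    }

    @Override
    public void setParameter(String name, Object value) {
        parameters.put(name, value);
    }

    @Override
    public Object getParameter(String name) {
        return parameters.get(name);
    }

    @Override
    public ByteArrayOutputStream convert(InputStream originalContent) throws IOException {
        // Read the whole original content into memory.
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int read;
        while ((read = originalContent.read(buffer)) != -1) {
            raw.write(buffer, 0, read);
        }
        String text = new String(raw.toByteArray(), sourceCharset);
        // "trim" is a made-up optional parameter for this sketch.
        if (Boolean.TRUE.equals(parameters.get("trim"))) {
            text = text.trim();
        }
        ByteArrayOutputStream converted = new ByteArrayOutputStream();
        converted.write(text.getBytes(StandardCharsets.UTF_8));
        return converted;
    }
}
```

A caller would construct `new PlainTextConverter(StandardCharsets.ISO_8859_1)`, optionally call `setParameter("trim", Boolean.TRUE)`, and then pass the converter to an importer. The untyped map keeps the interface general at the cost of compile-time checking, which is one reason to revisit the choice at implementation time.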

     * A DocumentImportFactory interface for delegating how pages are
created. This is important as there are different strategies for
finding out the following data from the original content:
           o Language
           o Target Space
           o Target Page name
           o Objects to attach
           o Attachments
           o Versions
           o Author
           o API:
                 + XWikiDocument createDocument(String
originalFileName, InputStream contentAfterConversion)
                 + setMode(REPLACE || APPEND): whether to replace any
existing doc or create a new version

I don't know if the name is good. DocumentImportFactory doesn't sound like a factory that creates documents to me (maybe it's just me), so I'd say that DocumentFactory is enough, if it resides in an import package.

Yeah, I'm not too sure about the name. I'm fine with DocumentFactory.

originalFileName reflects only the filename, or the complete path?

Yep, good question. I was hesitating here. We do need the complete path for sure, as one strategy is to use the parent directory as the target space name, for example. I'm still not 100% clear on what exactly gets passed. For example, in the case of a Zip file, do we pass the relative path to the file inside the zip? Do we pass a full URL-like path, as in /some/path/my.zip!relative/path/some.file, or simply path/some.file? The former could be useful, as the name of the zip could be used somewhere to compute a value. The same applies to Directory importers, etc. What's important is that a DocumentFactory implementation must be able to work regardless of the importer used.

Hmmm.... Thinking more about it I think passing a URL would be the best.
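
As a rough illustration of why a URL-like location is convenient, the Java sketch below derives a target space and page name from either form discussed above (a plain path, or an archive path followed by `!` and an entry path). It is only a sketch: the class `DocumentLocation`, its fields, and the fallback space `Main` are hypothetical and not part of the proposal. Java's own `jar:` URLs use `!/` as the separator, which this parser also tolerates.

```java
// Hypothetical value object; none of these names come from the proposal.
final class DocumentLocation {
    final String archive;   // path of the enclosing archive, or null if there is none
    final String space;     // parent directory of the file, used as target space
    final String pageName;  // file name without its extension

    private DocumentLocation(String archive, String space, String pageName) {
        this.archive = archive;
        this.space = space;
        this.pageName = pageName;
    }

    static DocumentLocation parse(String location) {
        // Split "archive!entry" into its two halves when a '!' is present.
        int bang = location.lastIndexOf('!');
        String archive = bang >= 0 ? location.substring(0, bang) : null;
        String entryPath = bang >= 0 ? location.substring(bang + 1) : location;

        // Separate the file name from its parent directory.
        int slash = entryPath.lastIndexOf('/');
        String fileName = entryPath.substring(slash + 1);
        String parent = slash > 0 ? entryPath.substring(0, slash) : "";

        // Assumption: fall back to a space called "Main" when there is no parent.
        String space = parent.isEmpty()
                ? "Main"
                : parent.substring(parent.lastIndexOf('/') + 1);

        // Drop the extension, if any, to get the page name.
        int dot = fileName.lastIndexOf('.');
        String pageName = dot > 0 ? fileName.substring(0, dot) : fileName;

        return new DocumentLocation(archive, space, pageName);
    }
}
```

With this, `DocumentLocation.parse("/some/path/my.zip!relative/path/some.file")` yields the archive `/some/path/my.zip`, the space `path`, and the page name `some`, and the same code handles a location produced by a directory importer, which is the importer independence asked for above.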

Examples of implementations:

     * For Importer: FileImporter, DirectoryImporter, ZipImporter,
ZipURLImporter, JARImporter
     * For Converter: PlainTextConverter, HTMLConverter,
TWikiConverter, ConfluenceConverter, XWikiXMLConverter (for
converting documents in XWiki XML format)
     * For DocumentImportFactory: XARDocumentImportFactory,
ExpandedXARDocumentImportFactory, DefaultDocumentImportFactory (uses
the file name as page name and parent directory as space, etc)

Examples of using it
================

     * A XAR file
           o new ZipImporter(new File(".../.xar"),
new XWikiXMLConverter(),
new XARDocumentImportFactory(new File(".../.xar")))
     * A single HTML file
           o new FileImporter(new File(".../.html"),
new HTMLConverter(), new DefaultDocumentImportFactory())
     * A zip file containing TWiki pages
           o new ZipImporter(new File(".../.zip"),
new TWikiConverter(), new DefaultDocumentImportFactory())
     * An expanded directory of HTML files
           o new DirectoryImporter(new File(".../somedir"),
new HTMLConverter(), new DefaultDocumentImportFactory())
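
For a feel of how the pieces could fit together, here is a small self-contained Java sketch of a zip importer that applies a filter, a converter, and a document factory to each entry. It is deliberately simplified and hypothetical: `ImportedPage` stands in for XWikiDocument, the converter and factory work on strings rather than streams, the names `TextConverter`, `PageFactory`, and `ZipImporterSketch` are made up, and the entry point is called `importAll` because `import` is a reserved word in Java and cannot be used as a method name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical stand-in for XWikiDocument.
final class ImportedPage {
    final String sourcePath;
    final String content;

    ImportedPage(String sourcePath, String content) {
        this.sourcePath = sourcePath;
        this.content = content;
    }
}

// Reduced, string-based versions of the proposed interfaces (assumptions).
interface ImportFilter {
    boolean accept(String path);
}

interface TextConverter {
    String convert(String originalContent);
}

interface PageFactory {
    ImportedPage createDocument(String originalFileName, String contentAfterConversion);
}

final class ZipImporterSketch {
    private final InputStream zipContent;
    private ImportFilter filter = path -> true;   // import everything by default

    ZipImporterSketch(InputStream zipContent) {
        this.zipContent = zipContent;
    }

    void setFilter(ImportFilter filter) {
        this.filter = filter;
    }

    // Named importAll because "import" is a Java keyword.
    List<ImportedPage> importAll(TextConverter converter, PageFactory factory) throws IOException {
        List<ImportedPage> pages = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(zipContent)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Skip directories and anything the filter rejects.
                if (entry.isDirectory() || !filter.accept(entry.getName())) {
                    continue;
                }
                // Read the current entry fully; UTF-8 is assumed here.
                ByteArrayOutputStream raw = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int read;
                while ((read = zip.read(buffer)) != -1) {
                    raw.write(buffer, 0, read);
                }
                String original = new String(raw.toByteArray(), StandardCharsets.UTF_8);
                // Convert first, then let the factory decide how the page is built.
                pages.add(factory.createDocument(entry.getName(), converter.convert(original)));
            }
        }
        return pages;
    }
}
```

The importer knows only how to walk its container; what the content becomes and where it lands are delegated, so swapping `TWikiConverter` for `HTMLConverter`, or one factory for another, would not touch the importer code.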

I've put all this on
http://www.xwiki.org/xwiki/bin/view/Idea/NewImporterArchitecture
but I think it's better to discuss it here as email is better for
discussions...

Note: I'm not planning to implement this yet, as our first priority is
still the 1.0 release, but once it's released I volunteer to implement
it, using a component strategy (cf. the new V2 architecture).

WDYT?

Sounds great.

cool