Hello!
Just some words about what the wiki model is and what it is not.
The main goal of the WikiModel is to provide an API giving access
to, and control over, the internal structure of individual wiki
documents.
Some features of the WikiModel:
- WikiModel itself does not depend on any particular wiki syntax
- The set of possible structural elements and the order in which
they can be assembled are strictly fixed (which greatly simplifies
validation and manipulation), but the final result is almost as
expressive as XHTML (and even more expressive, taking into
account the notions of properties and embedded documents, which
can recursively contain their own embedded documents :-)).
- WikiModel works with a superset of the structural elements
available in existing wikis, and it has some features not
available in other wikis. For example, using embedded documents in
WikiModel it is possible to put a table in a list, and this table
can contain its own headers, paragraphs, and lists... Or, using
embedded documents together with the notion of properties, it is
possible to define very complex structured objects directly on a
wiki page.
- There is at least one wiki syntax (the "Common" syntax) giving
access to all features of the WikiModel. This syntax guarantees
that all structural elements of the WikiModel can be serialized/
deserialized without loss of information or structure. Using
any other syntax can lead to information loss (example: you
cannot put a table in a table in XWiki or in JSPWiki, which is
possible using the Common Syntax).
- One of the goals of the WikiModel is to give a means to *import*
information from various wiki engines without information loss.
The structure of documents can be serialized in various wiki
syntaxes as well, but there is no guarantee that some
information will not be lost. Information can be lost when a
document contains elements which have no representation in a
particular wiki syntax. Examples: properties; tables in lists;
parameters of lists, paragraphs, and tables; and so on...
- All elements managed by the WikiModel can be serialized/
deserialized using XHTML with additional annotations (microformat-
like annotations)
Some features of the CommonSyntax:
- It is the native syntax of the WikiModel. It provides full
access to all features of the WikiModel. All structures in the
WikiModel can be serialized/deserialized in this syntax without
any information loss.
- It uses markup characters available in most (ideally, in all)
keyboard layouts (including Russian :-)). So you don't have to
switch keyboard layouts to write text, tables, lists and headers.
For example, tables can be defined using pipe symbols ("|", which
is not available in many keyboard layouts) or the "::" sequence.
- If there is a choice, then the most commonly used markup is used.
The current version of the WikiModel provides just an event-
based interface to work with the structure of documents (like
SAX for XML).
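
To make "event-based" concrete, here is a minimal Java sketch of the SAX-like pattern. The listener interface and its method names (WikiEventListener, beginParagraph, onText, ...) are hypothetical and are not taken from the WikiModel API; the toy parser only splits lines into paragraphs.

```java
import java.util.ArrayList;
import java.util.List;

class EventDemo {
    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        ToyParser.parse("Hello\n\nWorld", listener);
        System.out.println(listener.events);
    }
}

// Hypothetical listener: these method names are illustrative only,
// they are not taken from the WikiModel API.
interface WikiEventListener {
    void beginDocument();
    void beginParagraph();
    void onText(String text);
    void endParagraph();
    void endDocument();
}

// Records every event as a string so the event stream can be inspected.
class RecordingListener implements WikiEventListener {
    final List<String> events = new ArrayList<>();
    public void beginDocument() { events.add("beginDocument"); }
    public void beginParagraph() { events.add("beginParagraph"); }
    public void onText(String text) { events.add("text:" + text); }
    public void endParagraph() { events.add("endParagraph"); }
    public void endDocument() { events.add("endDocument"); }
}

// Toy parser: every non-blank line becomes one paragraph. No tree is built;
// the structure is only reported through listener callbacks.
class ToyParser {
    static void parse(String source, WikiEventListener listener) {
        listener.beginDocument();
        for (String line : source.split("\n")) {
            String text = line.trim();
            if (text.isEmpty()) {
                continue;
            }
            listener.beginParagraph();
            listener.onText(text);
            listener.endParagraph();
        }
        listener.endDocument();
    }
}
```

As with SAX, the client decides what to build from the events (a DOM, XHTML output, another wiki syntax), and the parser itself keeps no document in memory.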
In previous versions of WikiModel I had a Document Object Model in
which each structural element had its own object representation.
In the current version an Object Model is not implemented (yet).
I thought of creating just a set of utility classes manipulating
the standard XML DOM. Example: the method
WikiTable#setCellContent(int row, int column, String content)
should create an XHTML table object, create the required number
of cells and columns and put the given string content in the
addressed cell node. The same for all other structural elements
(headers, lists, internal documents, properties, styles, macros...)
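
A minimal sketch of what such a utility class could look like on top of the standard org.w3c.dom API. Only the setCellContent signature comes from the description above; the create(), getCellContent() and rowCount() helpers and the plain "table"/"tr"/"td" element names are assumptions made for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class WikiTableDemo {
    public static void main(String[] args) {
        WikiTable table = WikiTable.create();
        table.setCellContent(1, 2, "hello");
        System.out.println(table.getCellContent(1, 2)); // hello
        System.out.println(table.rowCount());           // 2
    }
}

// Utility wrapper around a <table> element of a standard XML DOM.
class WikiTable {
    private final Element table;

    WikiTable(Element table) {
        this.table = table;
    }

    // Assumed helper: creates an empty document holding a single <table>.
    static WikiTable create() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element table = doc.createElement("table");
            doc.appendChild(table);
            return new WikiTable(table);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the index-th child element called `name`,
    // appending empty elements until it exists.
    private static Element childAt(Element parent, String name, int index) {
        int seen = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                if (seen == index) {
                    return (Element) n;
                }
                seen++;
            }
        }
        Element last = null;
        while (seen <= index) {
            last = parent.getOwnerDocument().createElement(name);
            parent.appendChild(last);
            seen++;
        }
        return last;
    }

    // Creates the missing rows and cells, then stores the content.
    void setCellContent(int row, int column, String content) {
        childAt(childAt(table, "tr", row), "td", column).setTextContent(content);
    }

    // Note: reading a missing cell also creates it (as an empty cell).
    String getCellContent(int row, int column) {
        return childAt(childAt(table, "tr", row), "td", column).getTextContent();
    }

    int rowCount() {
        return table.getElementsByTagName("tr").getLength();
    }
}
```

The wrapper holds no state of its own, so the underlying DOM stays usable by any other XML tool.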
On 9/14/07, Vincent Massol <vincent(a)massol.net> wrote:
+1 to all that. So let me summarize and rephrase to see if I have
understood :)
1) We have 4 types of objects:
* TextProcessors: take text and generate text
* Parsers: take text and generate an internal DOM format (pivot
format)
* DomProcessors: take DOM and generate DOM
* Renderers: take DOM and generate anything (text, PDF, RTF, HTML,
XML, etc)
Yes.
2) Document contents are stored in the database in textual format in
the main xwiki syntax (whatever we decide it is - we could
standardize on creole for example)
It can be the "Common Syntax" for the reasons mentioned
above :-). Creole syntax is one of the most restrictive syntaxes.
And I tried to use as much Creole markup as possible in the
CommonSyntax.
Another possibility is to store directly in XML or in XHTML
+ microformat enhancements (for additional structural elements).
pro:
- it can be exported/imported directly and used by external
applications which know nothing about wikis; it is just standard
XML or XHTML
- this content can be transformed with XSLT processors directly
without using the WikiModel
- it can be faster to parse XML than the CommonWiki syntax (I
have no comparisons)
con:
- it is more difficult to work with diffs (but for diffs it is
*better* to use WikiModel and to generate a specific wiki syntax,
for example the "Common syntax");
- it is not a "human readable" format; it is difficult to
understand what you load from the DB
3) Use case 1: Viewing a document
a) Get the doc from the DB --> text1 (xwiki text format)
b) Apply TextProcessors --> text2
c) Call XWikiParser --> DOM1 (transforms XWiki text syntax into an
internal DOM)
d) Apply DomProcessors --> DOM2
e) Call the required Renderer --> PDF, XML, HTML, RTF, text, etc
Yes.
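
The view flow in steps a) to e) can be sketched in Java as follows. TextProcessor.execute(String) and DomProcessor.execute(DOM) follow the interfaces proposed later in this thread; the Parser and Renderer method names, the Dom stand-in class and the concrete lambdas are assumptions made only for this example (no HTML escaping, no real Velocity or Groovy).

```java
import java.util.ArrayList;
import java.util.List;

class ViewPipelineDemo {
    public static void main(String[] args) {
        System.out.println(ViewPipeline.view("Hello $user\n\nSecond paragraph"));
    }
}

// Stand-in for the internal DOM: just an ordered list of paragraphs.
class Dom {
    final List<String> paragraphs = new ArrayList<>();
}

interface TextProcessor { String execute(String content); }
interface Parser { Dom parse(String content); }          // name assumed
interface DomProcessor { Dom execute(Dom content); }
interface Renderer { String render(Dom content); }       // name assumed

class ViewPipeline {
    static String view(String text1) {
        // b) TextProcessor: naive substitution standing in for a script engine.
        TextProcessor substitute = s -> s.replace("$user", "guest");

        // c) Parser: paragraphs are separated by blank lines.
        Parser parser = s -> {
            Dom dom = new Dom();
            for (String p : s.split("\n\n")) {
                if (!p.trim().isEmpty()) {
                    dom.paragraphs.add(p.trim());
                }
            }
            return dom;
        };

        // d) DomProcessor: builds a new DOM with one generated element in front.
        DomProcessor counter = dom -> {
            Dom out = new Dom();
            out.paragraphs.add("Paragraphs: " + dom.paragraphs.size());
            out.paragraphs.addAll(dom.paragraphs);
            return out;
        };

        // e) Renderer: one of many possible output formats.
        Renderer html = dom -> {
            StringBuilder sb = new StringBuilder();
            for (String p : dom.paragraphs) {
                sb.append("<p>").append(p).append("</p>");
            }
            return sb.toString();
        };

        String text2 = substitute.execute(text1);
        Dom dom1 = parser.parse(text2);
        Dom dom2 = counter.execute(dom1);
        return html.render(dom2);
    }
}
```

Each stage only sees the output type of the previous one, so any stage can be swapped (another parser, another renderer) without touching the others.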
4) Use case 2: Editing a document, assuming the user wants to use
the MediaWiki syntax for editing
a) Get the doc from the DB --> text1 (xwiki text format)
b) Call XWikiParser --> DOM1 (transforms XWiki text syntax into an
internal DOM)
c) Call MediaWikiRenderer --> text2 (text in MediaWiki format)
d) the user edits and hits save
e) MediaWikiParser --> DOM2 (transforms MediaWiki text syntax into
the internal DOM)
f) Call XWikiRenderer --> text3 (transforms DOM into xwiki textual
format)
g) Save text3 in the database
Yes. (text1 and text3 can be XML, as I said above)
5) In practice this means the following classes:
* TextProcessorManager: to chain several text processors
Yes. But it can be just a composite processor implementing the
same ProcessorManager interfaces.
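
One possible reading of this, as a Java sketch: the "manager" is itself a TextProcessor that delegates to a list of others, so chains can be nested and passed anywhere a single processor is expected. The class name CompositeTextProcessor is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.List;

class CompositeDemo {
    public static void main(String[] args) {
        TextProcessor trim = String::trim;
        TextProcessor exclaim = s -> s + "!";
        TextProcessor chain = new CompositeTextProcessor(trim, exclaim);
        System.out.println(chain.execute("  hi  ")); // hi!
    }
}

interface TextProcessor {
    String execute(String content);
}

// Composite: applies its delegates in order, feeding each one the
// output of the previous one.
class CompositeTextProcessor implements TextProcessor {
    private final List<TextProcessor> delegates;

    CompositeTextProcessor(TextProcessor... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    @Override
    public String execute(String content) {
        String result = content;
        for (TextProcessor processor : delegates) {
            result = processor.execute(result);
        }
        return result;
    }
}
```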
* TextProcessor
- VelocityTextProcessor
- GroovyTextProcessor
Yes.
* WikiParser: Takes wiki syntax and generates a DOM in a XWiki-
specific format (independent of the different wiki syntaxes).
- LegacyXWikiWikiParser
- XWikiWikiParser (or simply use CreoleWikiParser if we want our
internal format to be Creole)
- ConfluenceWikiParser
- MediaWikiWikiParser
- JSPWikiWikiParser
- CreoleWikiParser
- HTMLParser: I think all parsers above need to support HTML since
the wiki syntaxes can be mixed with HTML. So this HTMLParser is
probably a parent of the other parsers in some regard. Anyway we
need this one for the WYSIWYG editor which may need to transform
HTML to wiki syntax (so we may need a XWikiDomProcessor too to
transform into XWiki syntax). The alternative (much better) is to
have the WYSIWYG editor only use the internal XWiki-specific DOM
format for all its manipulations.
If you want, you can put HTML as a non-interpreted block
("verbatim blocks") and interpret it in the client code. But
internally the WikiModel does not support "embedded" (X)HTML. The
main reason: in this case I lose control of the document
structure. And this control is the main goal of the WikiModel.
* DomProcessorManager: to chain several DOM processors
* DomProcessor
- Don't know yet what we're going to use this for. TOCDomProcessor
as you say above maybe.
A DomProcessor can be used to transform the original DOM object
representing the document in the DB into a new (user- and query-
specific) DOM object which can contain new elements, generated
dynamically. Currently all dynamic page elements are interpreted
as simple Velocity or Groovy scripts, and they generate text
documents which have to be parsed using Radeox and transformed to
the final HTML document. Using the DOM representation it is
possible to interpret some nodes of this graph as Groovy scripts.
In WikiModel they will correspond to Verbatim blocks, which are
opaque to WikiModel but can be interpreted as scripts by the
DomProcessor(s). These "Groovy" nodes can be executed and will add
new DOM elements to DOM2. For example this approach can be used
to generate search results.
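
The idea above can be sketched with the standard org.w3c.dom API. Everything named here is an assumption for illustration: the "verbatim" element name, the ScriptNodes class, and a plain Java function standing in for the Groovy script that would produce search results. DOM1 is copied first, so it stays untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ScriptNodeDemo {
    public static void main(String[] args) {
        Document dom1 = ScriptNodes.sample();
        Document dom2 = ScriptNodes.expand(dom1,
                query -> Arrays.asList(query + " hit 1", query + " hit 2"));
        System.out.println(ScriptNodes.render(dom1));
        System.out.println(ScriptNodes.render(dom2));
    }
}

class ScriptNodes {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sample DOM1: a paragraph followed by an opaque verbatim block.
    static Document sample() {
        Document dom = newDocument();
        Element root = dom.createElement("doc");
        dom.appendChild(root);
        Element p = dom.createElement("p");
        p.setTextContent("Results:");
        root.appendChild(p);
        Element script = dom.createElement("verbatim");
        script.setTextContent("wiki");
        root.appendChild(script);
        return dom;
    }

    // Copies dom1 and replaces each <verbatim> in the copy with a <ul>
    // built from the script's output. dom1 itself is not modified.
    static Document expand(Document dom1, Function<String, List<String>> script) {
        Document dom2 = newDocument();
        dom2.appendChild(dom2.importNode(dom1.getDocumentElement(), true));
        NodeList found = dom2.getElementsByTagName("verbatim");
        // Copy first: the NodeList is live and shrinks while nodes are replaced.
        List<Element> blocks = new ArrayList<>();
        for (int i = 0; i < found.getLength(); i++) {
            blocks.add((Element) found.item(i));
        }
        for (Element block : blocks) {
            Element ul = dom2.createElement("ul");
            for (String item : script.apply(block.getTextContent())) {
                Element li = dom2.createElement("li");
                li.setTextContent(item);
                ul.appendChild(li);
            }
            block.getParentNode().replaceChild(ul, block);
        }
        return dom2;
    }

    // Tiny visitor-style serializer (no attributes, no escaping).
    static String render(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return node.getNodeValue();
        }
        boolean isElement = node.getNodeType() == Node.ELEMENT_NODE;
        StringBuilder sb = new StringBuilder();
        if (isElement) sb.append('<').append(node.getNodeName()).append('>');
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            sb.append(render(c));
        }
        if (isElement) sb.append("</").append(node.getNodeName()).append('>');
        return sb.toString();
    }
}
```

Because the script returns data that is turned into nodes directly, nothing has to be re-parsed after the script runs.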
The advantages of this approach:
- You can put your parsed document DOM1 in the cache, which saves
you from parsing the document for each query. Parsing is the
slowest step in the page processing, even though the current
version of WikiModel is faster than before and should be faster
than the Radeox processor.
- Your Groovy scripts will manipulate normal Java classes (DOM
nodes) and will produce DOM nodes and not plain text. It seems
especially interesting taking into account Groovy's Builders
(http://groovy.codehaus.org/Builders). It is enough to write a
very simple builder (see http://groovy.codehaus.org/BuilderSupport)
generating DOM nodes and ... voila! Your Groovy node from a wiki
page generates search results as DOM nodes!
These manipulations with DOM objects should be MUCH faster than
processing plain text for every request. And all the following
steps are fast as well: to generate an HTML page it is enough to
visit all nodes with an "XHTMLVisitor".
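
The caching point can be sketched as follows, assuming documents are identified by a name and a version number (both assumptions; the thread does not say how cache entries would be keyed). The parser is passed in as a function, and old versions are never evicted in this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class DomCacheDemo {
    public static void main(String[] args) {
        AtomicInteger parses = new AtomicInteger();
        // Stand-in parser: counts its calls and returns a fake "DOM".
        DomCache<String> cache = new DomCache<>(text -> {
            parses.incrementAndGet();
            return text.toUpperCase();
        });
        cache.get("Main.WebHome", 1, "hello");
        cache.get("Main.WebHome", 1, "hello");    // served from the cache
        cache.get("Main.WebHome", 2, "hello v2"); // new version, parsed again
        System.out.println(parses.get());         // 2
    }
}

// Caches the parsed form (DOM1) of a document, keyed by name and version,
// so that only the per-request DomProcessor and Renderer steps run each time.
class DomCache<D> {
    private final Map<String, D> cache = new ConcurrentHashMap<>();
    private final Function<String, D> parser;

    DomCache(Function<String, D> parser) {
        this.parser = parser;
    }

    D get(String name, int version, String text) {
        return cache.computeIfAbsent(name + "@" + version, key -> parser.apply(text));
    }
}
```

Since the cached DOM1 is shared between requests, per-request processing has to work on a copy or produce a new DOM2 rather than modify DOM1 in place.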
BTW: do you need Velocity at all? Using only Groovy is much
cleaner. It can be used as THE language of XWiki. It can be used
as a template *and* programming language at the same time. And if
you *really* want, it is possible to integrate the Jasper engine
(from Tomcat) to use it for pure templating. The code from Jetty
(the org.mortbay.jetty.jspc.plugin package) can be used as an
example of integration with Jasper (see
http://jetty.mortbay.org/xref/index.html).
In this case in templates it will be possible to use:
- JSP tag libraries (including standard ones)
- Multiple scripting languages (like javabeans, javascript,
jpython, jruby, groovy,...)
* Renderer
- XMLRenderer
- HTMLRenderer
- PDFRenderer
- RTFRenderer
- XWikiRenderer (or simply use CreoleRenderer if we want our
internal format to be Creole)
- ConfluenceRenderer
- MediaWikiRenderer
- JSPWikiRenderer
- CreoleRenderer
Yes. All these renderers should be written if you want to support
all these syntaxes. I think that it should not be very difficult.
WDYT? Do I have it right? :)
Best regards,
Mikhail
Thanks
-Vincent
On Sep 13, 2007, at 6:37 PM, Stéphane Laurière wrote:
> Hi Vincent, hi everyone,
>
> We discussed the WikiModel integration with Mikhail this afternoon.
> Here is our input below.
>
> Vincent Massol wrote:
>> Hi,
>>
>> I've started working on designing the new Rendering/Parsing
>> components and API for XWiki. The implementation will be based on
>> WikiModel but we need some XWiki wrapping interfaces around it.
>> Note that this is a prerequisite for the new WYSIWYG editor based
>> on GWT (see
>> http://www.xwiki.org/xwiki/bin/view/Design/NewWysiwygEditorBasedOnGwt).
>>
>> I've updated
>> http://www.xwiki.org/xwiki/bin/view/Design/WikiModelIntegration
>> with the information below, which I'm pasting here so that we can
>> have a discussion about it. I'll consolidate the results on that
>> wiki page.
>>
>> Componentize the Parsing/Rendering APIs
>> ==================================
>>
>> We need 4 main components:
>>
>> * A Scripting component to manage scripting inside XWiki
>> documents and to evaluate them.
>
> On the topic of scripting we would like to propose a distinction
> between scripts that act on text and scripts that act on the DOM.
> Typically, the processing flow for text rendering would be the
> following, for say "text1":
>
> text1 =TextProcessor=> text2 =Parser=> dom1 =DomProcessor=> dom2
> => ...
>
> - the scripts contained in text1 are processed in the context of
> user1, which results in a new text: text2
> - the parser parses text2 and converts it to a DOM tree, dom1
> - dom1 is processed by scripts that work directly on the DOM
> (example: table of contents generator), which results in dom2
> - dom2 is made available as such or is converted to XML, HTML, PDF
> etc. depending on the user request
>
> TextProcessor and DomProcessor would have the following interfaces:
> TextProcessor
> - String execute(String content)
>
> DomProcessor
> - DOM execute(DOM content)
>
> That means we should have a syntax to distinguish between scripts
> that generate text content, and scripts that manipulate the DOM.
>
>> * A Rendering component to manage rendering Wiki syntax into
>> HTML and other (PDF, RTF, etc)
>> * A Wiki Parser component to offer a typed interface to XWiki
>> content so that it can be manipulated
>> * A HTML Parser component (for the WYSIWYG editor)
>>
>> Different Syntaxes
>> ===============
>>
>> Two possible solutions:
>>
>> 1. Have a WikiSyntax Object (A simple class with one property: a
>> combo box with different syntaxes: XWiki Legacy, Creole, MediaWiki,
>> Confluence, JSPWiki, etc) that users can attach to pages to tell
>> the Renderers what syntax is used. If no such object is attached
>> then it'll default to XWiki's default syntax (XWiki Legacy or
>> Creole for example).
>> 2. Have some special syntax, independent of the wiki syntaxes to
>> tell the Renderer that such block of content should be rendered
>> with that given syntax. Again there would be a default.
>>
>
> Here's our view regarding the syntax used in wiki edit mode:
> documents requested for editing are available from the database in
> a serialized format, for instance XHTML. When entering the edit
> action, the user indicates his preferred syntax. If the text of the
> requested document contains some blocks that are not handled by the
> chosen syntax, the user gets a warning (example: the document
> contains a table as a list item, and the user tries to edit the
> document using JSPWiki syntax). If not, WikiModel converts the
> serialized format into a DOM, the user edits the DOM and the
> WikiModel serializer serializes it back when the user saves it.
>
> Note that the DOM representation of wiki documents in the latest
> version of WikiModel is still pending.
>
>>
>> XWiki Interfaces
>> =============
>>
>> * ScriptingEngineManager: Manages the different Scripting
>> Engines, calling them in turn.
>> * ScriptingEngine
>> o Method: evaluate(String content)
>> o Implementation: VelocityScriptingEngine
>> o Implementation: GroovyScriptingEngine
>> * RenderingEngineManager: Manages the different Rendering
>> Engines, calling them in turn.
>> * RenderingEngine
>> o Method: render(String content)
>> o Implementation: XWikiLegacyRenderingEngine (current
>> rendering engine)
>> o Implementation: WikiModelRenderingEngine
>> * Parser: content parsing
>> o HTMLParser: parses HTML syntax
>> o WikiParser: parses wiki syntax
>> o Implementation: WikiModelHTMLParser
>> o Implementation: WikiModelWikiParser
>>
>> Open Questions:
>>
>> * Does WikiModel support a generic syntax for macros?
>
> WikiModel generates events for blocks that are not to be parsed
> (typically because they contain scripts).
>
> For example, in the WikiModel syntax currently called
> "CommonSyntax", this looks like the following:
> ==============
> {{{macro:mymacro (String parameters)
> dothis
> dothat
>
> }}}
>
>
> $mymacro(parameters)
> ==============
>
> For each syntax, macro blocks are identified as far as possible (we
> still have to check that it's the case for all types of macro
> blocks indeed).
>
>
>> * Is the Rendering also in charge of generating PDF, RTF,
>> XML, etc?
>> o I think so, need to modify interfaces above to reflect
>> this.
>> * The WikiParser needs to recognize scripts since this is
>> needed for the WYSIWYG editor.
>
> the WikiModel parser recognizes scripts indeed.
>
>
> Mikhail and Stéphane
>
>>
>> Use cases
>> ========
>>
>> * View page
>> o ViewAction -- template -> ScriptingEngineManager.evaluate()
>> -- wiki syntax -> RenderingEngineManager.render() ---> HTML, XML,
>> PDF, RTF, etc
>> * Edit page in WYSIWYG editor
>> o Uses the WikiParser to create a "DOM" of the page content and
>> to render it accordingly. NOTE: This is required since rendering
>> in the WYSIWYG editor is different from the final rendering. For
>> example, macros need to be shown in a special way to make them
>> visible, etc.
>> o Changes done by the user are entered in HTML. Note: it would
>> be better to capture them so that they are entered in the "DOM".
>> Is that possible? If not, then the HTMLParser is used to convert
>> from HTML to Wiki Syntax but there's likely to be some loss in the
>> conversion. The advantage is the ability to take any HTML content
>> and generate wiki syntax from it.
>>
>>
>> This is my very early thinking but I wanted to make it visible to
>> give everyone the chance to 1) know what's happening and 2)
>> suggest ideas.
>>
>> I'll refine this in the coming days and post again on this thread.
>>
_______________________________________________
devs mailing list
devs(a)xwiki.org