Re: [xwiki-devs] [Discussion] Designing the new Rendering/Parsing component/API

21 Sep 2007

Hi Mikhail,
Thanks for sharing this info with us! This makes it clearer for me.
From what I understand below, you're recommending eliminating the
need for TextProcessor and instead doing the following:
* Store the documents in the database in the DOM format (XML)
* Store scripts as a verbatim block in that DOM
* Only use DOMProcessor to make transformations to the DOM. For
example have a VelocityDomProcessor and GroovyDomProcessor to modify
the script DOM elements and evaluate them. Note: Velocity or Groovy
scripts can generate wiki syntax content and thus these would need to
generate new DOM elements. Not sure how easy that would be. This
means the VelocityDomProcessor would need internally to use a Parser
to parse the result of the evaluation and generate a sub-DOM. Is that
correct?
Thus the textual format would only be used when the user enters text
or when we want to export the content.
The main advantage would be performance since there'll be no need to
go back and forth between textual format and DOM format.
Makes sense to me.
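
A minimal sketch of the DOM-only flow summarized above, assuming the document is held as a standard XML DOM and that scripts sit in `verbatim` elements carrying a `language` attribute. The class name ScriptDomProcessor, the element and attribute names, and the two function parameters (standing in for a real script engine and a real wiki parser) are all hypothetical; nothing here is WikiModel or XWiki API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical DOM processor: evaluates script blocks and splices the parsed result into the DOM. */
final class ScriptDomProcessor {
    private final String language;                        // assumed attribute value, e.g. "velocity"
    private final Function<String, String> evaluator;     // stand-in for a script engine: script -> wiki text
    private final Function<String, List<String>> parser;  // stand-in for a wiki parser: wiki text -> paragraphs

    ScriptDomProcessor(String language, Function<String, String> evaluator,
                       Function<String, List<String>> parser) {
        this.language = language;
        this.evaluator = evaluator;
        this.parser = parser;
    }

    /** Replaces every matching verbatim element with p elements built from its evaluated output. */
    Document execute(Document dom) {
        NodeList found = dom.getElementsByTagName("verbatim");
        // Copy the matches first: the NodeList is live and changes as nodes are removed.
        List<Element> scripts = new ArrayList<>();
        for (int i = 0; i < found.getLength(); i++) {
            Element candidate = (Element) found.item(i);
            if (language.equals(candidate.getAttribute("language"))) {
                scripts.add(candidate);
            }
        }
        for (Element script : scripts) {
            String wikiText = evaluator.apply(script.getTextContent());
            Node parent = script.getParentNode();
            for (String paragraph : parser.apply(wikiText)) {
                Element p = dom.createElement("p");
                p.setTextContent(paragraph);
                parent.insertBefore(p, script);
            }
            parent.removeChild(script);
        }
        return dom;
    }

    /** Creates an empty DOM document, wrapping the checked configuration exception. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the sketch is the internal parse step: the script output is wiki text, so the processor has to hand it to a parser before it can add anything to the DOM.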
Now you mention removing Velocity. This won't be possible since all
current XWiki instances use Velocity and we cannot tell our users
that they have to rewrite all their pages if they want to move to
XWiki v1.3. We'll need to continue supporting Velocity for some
time. Personally, I currently find that the Velocity syntax mixes
much better with the wiki syntax than Groovy does. If you look at
contributed code snippets you'll see that most are in Velocity, which
is what most people use.
Now you mention other stuff about Jasper and Jetty but I'm not sure I
have understood that part.
Thanks
-Vincent
See below.
On Sep 19, 2007, at 6:54 PM, Mikhail Kotelnikov wrote:
...
  Hello!
 Just some words about what the wiki model is and what it is not.
 The main goal of the WikiModel is the creation of an API giving
 access and control to the internal structure of individual wiki
 documents.
 Some features of the WikiModel:
 - WikiModel itself does not depend on any particular wiki syntax
 - The number of possible structural elements and their possible
 assembling order is strictly fixed (which greatly simplifies the
 validation and manipulation) but the final result is almost as
 expressive as XHTML (and even more expressive, taking into account
 notions of properties and embedded documents which can recursively
 contain their own embedded documents :-)).
 - WikiModel manipulates a super-set of the structural elements
 available in existing wikis. And it has some features not available
 in other wikis. For example, using embedded documents in WikiModel
 it is possible to put a table in a list, and this table can contain
 its own headers, paragraphs, and lists... Or, using embedded
 documents with the notion of properties, it is possible to define
 very complex structured objects directly on a wiki page.
 - There is at least one wiki syntax ("Common" syntax) giving access
 to all features of the WikiModel. This syntax guarantees that all
 structural elements of the WikiModel can be serialized/deserialized
 without loss of information or structure. Using any other syntax
 can lead to information loss (example: you cannot put a table in a
 table in XWiki or in JSPWiki, which is possible using the Common
 Syntax).
 - One of the goals of the WikiModel is to give a means to *import*
 information from various wiki engines without information loss. The
 structure of documents can be serialized in various wiki syntaxes
 as well, but there is no guarantee that no information will be
 lost. Information can be lost when a document contains elements
 which have no representation in a particular wiki syntax. Example:
 properties; tables in lists; parameters of lists, paragraphs, and
 tables; and so on...
 - All elements managed by the WikiModel can be serialized/
 deserialized using XHTML with additional annotations (microformat-
 like annotations)
 Some features of the CommonSyntax:
 - It is a native syntax for the WikiModel. It provides full access
 to all features of the WikiModel. All structures in the WikiModel
 can be serialized/deserialized in this syntax without any
 information loss
 - It uses markup characters available in most (ideally, in all)
 keyboard layouts (including Russian :-)). So you don't have to
 switch keyboard layouts to write text, tables, lists and headers.
 For example, tables can be defined using pipe symbols ("|", which
 is not available in many keyboard layouts) or the "::" sequence.
 - If there is a choice, then the most commonly used markup is used
 The current version of the WikiModel provides just an event-based
 interface to work with the structure of documents (like SAX for
 XML).
 In previous versions of WikiModel I had a Document Object Model in
 which each structural element had its own object representation. In
 the current version an Object Model is not implemented (yet). I
 thought of creating just a set of utility classes manipulating
 the standard XML DOM. Example: the method WikiTable#setCellContent
 (int row, int column, String content) should create an XHTML table
 object, create the required number of rows and cells, and put the
 given string content in the target cell. The same for all other
 structural elements (headers, lists, internal documents,
 properties, styles, macros...)
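
A minimal sketch of such a utility class over the standard XML DOM (org.w3c.dom). Only the name WikiTable#setCellContent and its parameters come from the description above; the constructor, the zero-based indexing, the helper methods, and the choice of plain `table`/`tr`/`td` elements are assumptions made for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical utility wrapping an XHTML-like table element in a standard DOM. */
final class WikiTable {
    private final Document dom;
    private final Element table;

    /** Creates an empty table element and appends it to the given parent. */
    WikiTable(Document dom, Element parent) {
        this.dom = dom;
        this.table = dom.createElement("table");
        parent.appendChild(table);
    }

    /** Sets the text of cell (row, column), zero-based, creating missing rows and cells. */
    void setCellContent(int row, int column, String content) {
        if (row < 0 || column < 0) {
            throw new IllegalArgumentException("row and column must be >= 0");
        }
        Element tr = childAt(table, "tr", row);
        Element td = childAt(tr, "td", column);
        td.setTextContent(content);
    }

    /** Returns the index-th child element with the given tag, appending empty ones until it exists. */
    private Element childAt(Element parent, String tag, int index) {
        int seen = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                if (seen == index) {
                    return (Element) n;
                }
                seen++;
            }
        }
        Element last = null;
        for (; seen <= index; seen++) {
            last = dom.createElement(tag);
            parent.appendChild(last);
        }
        return last;
    }

    Element element() {
        return table;
    }

    /** Creates an empty DOM document, wrapping the checked configuration exception. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Rows are padded lazily in this sketch, so earlier rows may hold fewer cells than later ones; a real implementation would probably normalize column counts.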
 On 9/14/07, Vincent Massol <vincent(a)massol.net> wrote:
 +1 to all that. So let me summarize and rephrase to see if I have
 understood :)
 1) We have 4 types of objects:
 * TextProcessors: take text and generate text
 * Parsers: take text and generate an internal DOM format (pivot
 format)
 * DomProcessors: take DOM and generate DOM
 * Renderers: take DOM and generate anything (text, PDF, RTF, HTML,
 XML, etc)
 Yes.
 2) Document contents are stored in the database in textual format in
 the main xwiki syntax (whatever we decide it is - we could
 standardize on creole for example)
 It can be the "Common Syntax" for the reasons mentioned above :-).
 Creole syntax is one of the most restrictive syntaxes. And I tried
 to use as much of the Creole markup as possible in the CommonSyntax.
 Another possibility is to store directly in XML or in XHTML
 +microformat enhancements (for additional structural elements).
 pro:
 - it can be exported/imported directly and used by external
 applications which know nothing about wikis; just standard XML
 or XHTML
 - this content can be transformed with XSLT processors directly
 without using the WikiModel
 - it can be faster to parse XML than the CommonWiki syntax (I have
 no comparisons)
 con:
 - it is more difficult to work with diffs (but for diffs it is
 *better* to use WikiModel and to generate a specific wiki syntax;
 for example "Common syntax");
 - it is not a "human readable" format; it is difficult to
 understand what you load from the DB
 3) Use case 1: Viewing a document
 a) Get the doc from the DB --> text1 (xwiki text format)
 b) Apply TextProcessors --> text2
 c) Call XWikiParser --> DOM1 (transforms XWiki text syntax into an
 internal DOM)
 d) Apply DomProcessors --> DOM2
 e) Call the required Renderer --> PDF, XML, HTML, RTF, text, etc
 Yes.
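
Steps a) to e) of use case 1 can be sketched as a chain of the four object types. The method name `execute` follows the TextProcessor/DomProcessor signatures proposed later in this thread; the interface names Parser and Renderer, the method names `parse` and `render`, the generic DOM type `D`, and the class ViewPipeline are assumptions for illustration only.

```java
/** Takes text and generates text. */
interface TextProcessor {
    String execute(String text);
}

/** Takes text and generates the internal (pivot) DOM. */
interface Parser<D> {
    D parse(String text);
}

/** Takes a DOM and generates a DOM. */
interface DomProcessor<D> {
    D execute(D dom);
}

/** Takes a DOM and generates the output (HTML, XML, text, ...), here as a String. */
interface Renderer<D> {
    String render(D dom);
}

/** Hypothetical wiring of the "view a document" use case. */
final class ViewPipeline<D> {
    private final TextProcessor textProcessor;
    private final Parser<D> parser;
    private final DomProcessor<D> domProcessor;
    private final Renderer<D> renderer;

    ViewPipeline(TextProcessor textProcessor, Parser<D> parser,
                 DomProcessor<D> domProcessor, Renderer<D> renderer) {
        this.textProcessor = textProcessor;
        this.parser = parser;
        this.domProcessor = domProcessor;
        this.renderer = renderer;
    }

    /** text1 is the content loaded from the database in step a). */
    String view(String text1) {
        String text2 = textProcessor.execute(text1); // b) apply TextProcessors
        D dom1 = parser.parse(text2);                // c) parse into the internal DOM
        D dom2 = domProcessor.execute(dom1);         // d) apply DomProcessors
        return renderer.render(dom2);                // e) render
    }
}
```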
 4) Use case 2: Editing a document, assuming the user wants to use the
 MediaWiki syntax for editing
 a) Get the doc from the DB --> text1 (xwiki text format)
 b) Call XWikiParser --> DOM1 (transforms XWiki text syntax into an
 internal DOM)
 c) Call MediaWikiRenderer --> text2 (text in MediaWiki format)
 d) the user edits and hits save
 e) MediaWikiParser --> DOM2 (transforms MediaWiki text syntax into
 the internal DOM)
 f) Call XWikiRenderer --> text3 (transforms DOM into xwiki textual
 format)
 g) Save text3 in the database
 Yes. (text1 and text3 can be XML, as I said above)
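
The round trip in use case 2 can be illustrated with a deliberately tiny pivot format. This is a toy sketch under stated assumptions: the pivot DOM is just a list of blocks, only headings and paragraphs are handled, and the heading markers are loosely modelled on XWiki legacy ("1", "1.1") and MediaWiki ("=", "=="). The names Block and ToyWikiSyntaxes are hypothetical and unrelated to WikiModel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Pivot block: level 0 is a paragraph, level n >= 1 is a heading of depth n. */
final class Block {
    final int level;
    final String text;

    Block(int level, String text) {
        this.level = level;
        this.text = text;
    }
}

/** Toy parsers and renderers covering headings and paragraphs only. */
final class ToyWikiSyntaxes {
    private static final Pattern XWIKI_HEADING = Pattern.compile("^(1(?:\\.1)*) (.+)$");
    private static final Pattern MEDIAWIKI_HEADING = Pattern.compile("^(=+) (.+) \\1$");

    static List<Block> parseXWiki(String text) {
        List<Block> dom = new ArrayList<>();
        for (String line : text.split("\n")) {
            Matcher m = XWIKI_HEADING.matcher(line);
            // "1" is level 1, "1.1" is level 2, "1.1.1" is level 3, ...
            dom.add(m.matches() ? new Block((m.group(1).length() + 1) / 2, m.group(2))
                                : new Block(0, line));
        }
        return dom;
    }

    static List<Block> parseMediaWiki(String text) {
        List<Block> dom = new ArrayList<>();
        for (String line : text.split("\n")) {
            Matcher m = MEDIAWIKI_HEADING.matcher(line);
            // The number of "=" signs is the heading level.
            dom.add(m.matches() ? new Block(m.group(1).length(), m.group(2))
                                : new Block(0, line));
        }
        return dom;
    }

    static String renderXWiki(List<Block> dom) {
        List<String> lines = new ArrayList<>();
        for (Block b : dom) {
            lines.add(b.level == 0 ? b.text : "1" + repeat(".1", b.level - 1) + " " + b.text);
        }
        return String.join("\n", lines);
    }

    static String renderMediaWiki(List<Block> dom) {
        List<String> lines = new ArrayList<>();
        for (Block b : dom) {
            String marks = repeat("=", b.level);
            lines.add(b.level == 0 ? b.text : marks + " " + b.text + " " + marks);
        }
        return String.join("\n", lines);
    }

    private static String repeat(String s, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(s);
        }
        return sb.toString();
    }
}
```

Steps a) to g) then become: parseXWiki(text1), renderMediaWiki(dom1), let the user edit, parseMediaWiki(edited text), renderXWiki(dom2), and save the result as text3.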
 5) In practice this means the following classes:
 * TextProcessorManager: to chain several text processors
 Yes. But it can be just a composite processor implementing the same
 ProcessorManager interfaces.
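
A minimal sketch of the composite idea: the "manager" is itself a processor that runs its children in order, so callers only ever see one interface. The interface shape follows the `String execute(String content)` signature proposed elsewhere in this thread; the class name CompositeTextProcessor is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Takes text and generates text. */
interface TextProcessor {
    String execute(String content);
}

/** Hypothetical composite: chains several text processors behind the same interface. */
final class CompositeTextProcessor implements TextProcessor {
    private final List<TextProcessor> processors;

    CompositeTextProcessor(TextProcessor... processors) {
        // Defensive copy so later changes to the caller's array have no effect.
        this.processors = new ArrayList<>(Arrays.asList(processors));
    }

    @Override
    public String execute(String content) {
        String result = content;
        for (TextProcessor processor : processors) {
            result = processor.execute(result); // output of one step feeds the next
        }
        return result;
    }
}
```

Because the composite implements TextProcessor, composites can be nested, for example a Velocity step followed by a Groovy step wrapped inside a larger chain.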
 * TextProcessor
    - VelocityTextProcessor
    - GroovyTextProcessor
 Yes.
 * WikiParser: Takes wiki syntax and generates a DOM in a XWiki-
 specific format (independent of the different wiki syntaxes).
    - LegacyXWikiWikiParser
    - XWikiWikiParser (or simply use CreoleWikiParser if we want our
 internal format to be Creole)
    - ConfluenceWikiParser
    - MediaWikiWikiParser
    - JSPWikiWikiParser
    - CreoleWikiParser
    - HTMLParser: I think all parsers above need to support HTML since
 the wiki syntaxes can be mixed with HTML. So this HTMLParser is
 probably a parent of the other parsers in some regard. Anyway we need
 this one for the WYSIWYG editor which may need to transform HTML to
 wiki syntax (so we may need a XWikiDomProcessor too to transform into
 XWiki syntax). The alternative (much better) is to have the WYSIWYG
 editor only use the internal XWiki-specific DOM format for all its
 manipulations.
 If you want, you can put HTML as a non-interpreted block ("verbatim
 blocks") and interpret it in the client code. But internally the
 WikiModel does not support "embedded" (X)HTML. The main reason: in
 this case I lose control of the document structure. And this
 control is the main goal of the WikiModel.
 * DomProcessorManager: to chain several DOM processors
 * DomProcessor
    - Don't know yet what we're going to use this for. TOCDomProcessor
 as you say above maybe.
 DOMProcessor can be used to transform the original DOM object
 representing the document in the DB into a new (user- and
 query-specific) DOM object which can contain new elements, generated
 dynamically. Currently all dynamic page elements are interpreted as
 simple Velocity or Groovy scripts, and they generate text documents
 which must be parsed using Radeox and transformed to the final
 HTML document. Using the DOM representation it is possible to
 interpret some nodes of this graph as Groovy scripts. In WikiModel
 they will correspond to Verbatim blocks, which are opaque to
 WikiModel but can be interpreted as scripts by the DomProcessor(s).
 These "Groovy" nodes can be executed and will add new DOM elements
 to DOM2. For example, this approach can be used to generate search
 results.
 The advantages of this approach:
 - You can put your parsed document DOM1 in the cache, which saves
 you from parsing the document for each query. Parsing is the slowest
 step in the page processing, even though the current version of
 WikiModel is faster than before and should be faster than the
 Radeox processor.
 - Your Groovy scripts will manipulate normal Java classes (DOM
 nodes) and will produce DOM nodes and not plain text. It seems
 especially interesting taking into account Groovy's Builders
 ( http://groovy.codehaus.org/Builders). It is enough to write a
 very simple builder (see http://groovy.codehaus.org/BuilderSupport)
 generating DOM nodes and ... voila! Your Groovy node from a wiki
 page generates search results as DOM nodes!  These manipulations
 with DOM objects should be MUCH faster than processing plain text
 for every request. And all following steps are fast as well - to
 generate an HTML page it is enough to visit all nodes with an
 "XHTMLVisitor".
 BTW: do you need Velocity at all? Using only Groovy is much
 cleaner. It can be used as THE language of XWiki. It can be used
 as a template *and* programming language at the same time. And if
 you *really* want, it is possible to integrate the Jasper engine
 (from Tomcat) to use it for pure templating. The code from Jetty
 (the org.mortbay.jetty.jspc.plugin package) can be used as an
 example of integration with Jasper (see
 http://jetty.mortbay.org/xref/index.html).
 In this case in templates it will be possible to use:
 - JSP tag libraries (including standard ones)
 - Multiple scripting languages (like javabeans, javascript,
 jpython, jruby, groovy,...)
 * Renderer
    - XMLRenderer
    - HTMLRenderer
    - PDFRenderer
    - RTFRenderer
    - XWikiRenderer (or simply use CreoleRenderer if we want our
 internal format to be Creole)
    - ConfluenceRenderer
    - MediaWikiRenderer
    - JSPWikiRenderer
    - CreoleRenderer
 Yes. All these renderers should be written if you want to support
 all these syntaxes. I think that it should not be very difficult.
 WDYT? Do I have it right? :)
 Best regards,
 Mikhail
 Thanks
 -Vincent
 On Sep 13, 2007, at 6:37 PM, Stéphane Laurière wrote:
  Hi Vincent, hi everyone,
 We discussed the WikiModel integration with Mikhail this afternoon.
 Our input is below.
 Vincent Massol wrote:
> Hi,
>
> I've started working on designing the new Rendering/Parsing
> components and API for XWiki. The implementation will be based on
> WikiModel but we need some XWiki wrapping interfaces around it. Note
> that this is a prerequisite for the new WYSIWYG editor based on GWT
> (see http://www.xwiki.org/xwiki/bin/view/Design/NewWysiwygEditorBasedOnGwt).
>
> I've updated http://www.xwiki.org/xwiki/bin/view/Design/WikiModelIntegration
> with the information below, which I'm pasting here so that we can
> have a discussion about it. I'll consolidate the results on that
> wiki page.
 Componentize the Parsing/Rendering APIs
 ==================================
 We need 4 main components:
 * A Scripting component to manage scripting inside XWiki documents
 and to evaluate them. 
 On the topic of scripting we would like to propose a distinction
 between scripts that act on text and scripts that act on the DOM.
 Typically, the text rendering flow would be the following, for say
 "text1":
 text1 =TextProcessor=> text2 =Parser=> dom1 =DomProcessor=> dom2
 => ...
 - the scripts contained in text1 are processed in the context of
 user1; this results in a new text: text2
 - the parser parses text2 and converts it to a DOM tree, dom1
 - dom1 is processed by scripts that work directly on the DOM (example:
 table of contents generator); this results in dom2
 - dom2 is made available as such or is converted to XML, HTML, PDF
 etc. depending on the user request
 TextProcessor and DomProcessor would have the following interfaces:
 TextProcessor
 - String execute(String content)
 DomProcessor
 - DOM execute(DOM content)
 That means we should have a syntax to distinguish between scripts that
 generate text content, and scripts that manipulate the DOM.
>      * A Rendering component to manage rendering Wiki syntax into
> HTML and other (PDF, RTF, etc)
>      * A Wiki Parser component to offer a typed interface to XWiki
> content so that it can be manipulated
>      * A HTML Parser component (for the WYSIWYG editor)
>
> Different Syntaxes
> ===============
>
> Two possible solutions:
>
>     1. Have a WikiSyntax Object (a simple class with one property: a
> combo box with different syntaxes: XWiki Legacy, Creole, MediaWiki,
> Confluence, JSPWiki, etc) that users can attach to pages to tell the
> Renderers what syntax is used. If no such object is attached then
> it'll default to XWiki's default syntax (XWiki Legacy or Creole for
> example).
>     2. Have some special syntax, independent of the wiki syntaxes, to
> tell the Renderer that such a block of content should be rendered with
> that given syntax. Again there would be a default.

 Here's our view regarding the syntax used in wiki edit mode: documents
 requested for editing are available from the database in a serialized
 format, for instance XHTML. When entering the edit action, the user
 indicates his preferred syntax. If the text of the requested document
 contains some blocks that are not handled by the chosen syntax, the
 user gets a warning (example: the document contains a table as a list
 item, and the user tries to edit the document using JSPWiki syntax).
 If not, WikiModel converts the serialized format into a DOM, the user
 edits the DOM and the WikiModel serializer serializes it back when the
 user saves it.
 Note that the DOM representation of wiki documents in the latest
 version of WikiModel is still pending.
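
The warning check described above could look roughly like this. It is only a sketch: the ElementKind values, the SyntaxCapabilities class, and the idea of describing a syntax as a set of supported element kinds are assumptions for illustration, not part of WikiModel.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical structural element kinds a document may contain. */
enum ElementKind { PARAGRAPH, HEADER, LIST, TABLE, TABLE_IN_LIST, PROPERTY }

/** Hypothetical description of what a given wiki syntax can express. */
final class SyntaxCapabilities {
    private final Set<ElementKind> supported;

    SyntaxCapabilities(Set<ElementKind> supported) {
        this.supported = EnumSet.noneOf(ElementKind.class);
        this.supported.addAll(supported);
    }

    /** Returns the kinds used by the document that this syntax cannot express. */
    Set<ElementKind> unsupported(Set<ElementKind> usedByDocument) {
        Set<ElementKind> missing = EnumSet.noneOf(ElementKind.class);
        missing.addAll(usedByDocument);
        missing.removeAll(supported);
        return missing;
    }
}
```

A non-empty result would trigger the warning before the editor opens; an empty one means the chosen syntax can represent the whole document.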

 XWiki Interfaces
 =============
      * ScriptingEngineManager: Manages the different Scripting
 Engines, calling them in turn.
      * ScriptingEngine
            o Method: evaluate(String content)
            o Implementation: VelocityScriptingEngine
            o Implementation: GroovyScriptingEngine
      * RenderingEngineManager: Manages the different Rendering
 Engines, calling them in turn.
      * RenderingEngine
            o Method: render(String content)
            o Implementation: XWikiLegacyRenderingEngine (current
 rendering engine)
            o Implementation: WikiModelRenderingEngine
      * Parser: content parsing
            o HTMLParser: parses HTML syntax
            o WikiParser: parses wiki syntax
            o Implementation: WikiModelHTMLParser
            o Implementation: WikiModelWikiParser
 Open Questions:
      * Does WikiModel support a generic syntax for macros? 
 WikiModel generates events for blocks that are not to be parsed
 (typically because they contain scripts).
 For example, in the WikiModel syntax currently called "CommonSyntax",
 this looks like the following:
 ==============
 {{{macro:mymacro (String parameters)
 dothis
 dothat
 }}}
 $mymacro(parameters)
 ==============
 For each syntax, macro blocks are identified as far as possible (we
 still have to check that it's indeed the case for all types of macro
 blocks).
      * Is the Rendering also in charge of generating PDF, RTF,
 XML, etc?
            o I think so, need to modify interfaces above to reflect
 this.
      * The WikiParser needs to recognize scripts since this is
 needed for the WYSIWYG editor.
 The WikiModel parser does recognize scripts.
 Mikhail and Stéphane
>
> Use cases
> ========
>
>      * View page
>            o ViewAction -- template ->
> ScriptingEngineManager.evaluate() -- wiki syntax ->
> RenderingEngineManager.render() ---> HTML, XML, PDF, RTF, etc
>      * Edit page in WYSIWYG editor
>            o Uses the WikiParser to create a "DOM" of the page
> content and to render it accordingly. NOTE: This is required since
> rendering in the WYSIWYG editor is different from the final
> rendering. For example, macros need to be shown in a special way to
> make them visible, etc.
>            o Changes done by the user are entered in HTML. Note: it
> would be better to capture them so that they are entered in the
> "DOM". Is that possible? If not, then the HTMLParser is used to
> convert from HTML to Wiki Syntax but there is likely to be some loss
> in the conversion. The advantage is the ability to take any HTML
> content and generate wiki syntax from it.
>
>
> This is my very early thinking but I wanted to make it visible to
> give everyone the chance to 1) know what's happening and 2) suggest
> ideas.
>
> I'll refine this in the coming days and post again on this thread.
>
