On 9/20/07, Sergiu Dumitriu <sergiu.dumitriu@gmail.com> wrote:

Sorry for the late reply, I didn't see your mail.

>
> > "HTMLParser: I think all parsers above need to support HTML since
> > the wiki syntaxes can be mixed with HTML"
> >
> > I don't understand this. What does manually entered HTML have to do
> > with wiki parsing?
>
> Because in wiki content user can introduce HTML.

Yes, but HTML should be left as-is. I don't see why it should be
parsed. Do you have a use-case for parsing HTML together with the wiki
syntax into a DOM?

If you put your HTML in a wiki page it *can* be left as-is, without any modifications, if it is in a "verbatim" block. In this case it is up to the client code whether or not to interpret the content of such verbatim blocks. It can be something like: {{{html: <h1>Hello, world</h1>}}}
In this case the content of such a block can be inserted directly into the generated HTML page.
In general, verbatim blocks can be used to insert into pages anything you want to interpret yourself. It can be Groovy scripts: {{{groovy: print "Hello, world!"}}}.

We need to parse HTML in order to transform any random web page into editable wiki content. BTW this code already exists and it works.

>
> To summarize we need to decide on the topic of whether we want to
> display wiki content in the user's preferred syntax or not. To be
> honest I also like that a lot since this is not something you find in
> other wikis but I'm worried about 2 things:
>
> A) feasibility. Aren't there always going to be lots of
> incompatibilities? Macros can be generic and work for all syntaxes so
> that's not an issue but what about links for example. XWiki's link
> syntax is richer than most other wikis' link syntax. For example if
> there's a reference to another xwiki db in the link
> (otherwiki:SomeDocument) then what's going to happen when viewed
> with, say, a mediawiki syntax?

We either have to extend all the syntaxes, or to restrict the users
only to use the common subset among all syntaxes. Or something in the
middle, extend where possible, but trim all the things that don't have
an equivalent in one of the syntaxes. The best thing would be to try
to extend all syntaxes.

The main goal of having multiple parsers for multiple wiki syntaxes is to make it possible to import any external wiki pages without losing information or formatting, and to transform them into the WikiModel. If you write the same sequence of titles, paragraphs, tables and lists using the XWiki, JSPWiki or MediaWiki syntax, they will all give exactly the same structure in the WikiModel.
If pages are simple (they don't contain any "advanced" stuff like hierarchical embedded documents or properties) then they *can* be serialized and edited using any particular simple wiki syntax (XWiki, JSPWiki, Creole, ...). WikiModel guarantees that any modifications introduced using these particular syntaxes will not be lost. If you lose something then it should be considered a bug.

> B) Complexity. Every user is going to be using his favorite syntax
> and thus when users talk together, copy/paste snippets on xwiki.org
> for example, they're all going to be in different syntaxes.
>

It does not matter which syntax you use to edit your document. All these syntaxes will produce the *same* structure. Using this structure it is possible to serialize documents in the CommonSyntax.
The editing cycle is:
- Create and submit a new text using, for example, the JSPWiki syntax.
- Parse the content using the JspWikiParser. This operation will produce a well-formed sequence of events for the listener (like: beginParagraph(...)/endParagraph(...); beginTable(...)/endTable(...)...). This step cleans up all of the user's errors, like non-closed syntactic elements and so on.
- Using the CommonSyntaxSerializer, a new wiki document is generated with exactly the same structure as the original JSPWiki document.
- This resulting document should be stored in the DB.
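To make the cycle above more concrete, here is a minimal sketch in Java. Everything in it is an assumption made for illustration: ParagraphListener, ToyParagraphParser and ToySerializer are hypothetical names, not the real WikiModel classes, and the "syntax" is reduced to paragraphs separated by blank lines. Only the begin/end event pattern is taken from the description above.

```java
// Hypothetical listener; only the beginParagraph/endParagraph naming follows the thread.
interface ParagraphListener {
    void beginParagraph();
    void onText(String text);
    void endParagraph();
}

// Toy parser: paragraphs are separated by blank lines. This is NOT the real JspWikiParser.
class ToyParagraphParser {
    static void parse(String source, ParagraphListener listener) {
        for (String chunk : source.trim().split("\\n\\s*\\n")) {
            // Collapse line wraps and repeated spaces inside one paragraph.
            String text = chunk.trim().replaceAll("\\s+", " ");
            if (text.isEmpty()) {
                continue; // sloppy input such as an empty page produces no events
            }
            listener.beginParagraph();
            listener.onText(text);
            listener.endParagraph();
        }
    }
}

// Toy serializer: rebuilds a normalized text purely from the events it receives.
class ToySerializer implements ParagraphListener {
    private final StringBuilder out = new StringBuilder();

    public void beginParagraph() {
        if (out.length() > 0) {
            out.append("\n\n"); // exactly one blank line between paragraphs
        }
    }

    public void onText(String text) {
        out.append(text);
    }

    public void endParagraph() {
        // nothing to close in this toy output format
    }

    String result() {
        return out.toString();
    }
}

class EditCycleSketch {
    // submit -> parse into events -> serialize; the result is what would be stored
    static String normalize(String submitted) {
        ToySerializer serializer = new ToySerializer();
        ToyParagraphParser.parse(submitted, serializer);
        return serializer.result();
    }
}
```

The point of the sketch is that the serializer never sees the submitted text, only the events, so whatever is stored is well-formed by construction. Running the stored text through the same cycle again should give the same text back.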

In WikiModel a document written using a particular syntax is just a reflection of the internal structure of this document. Each particular wiki syntax (JSPWiki, XWiki, Creole, ...) reflects only a part of the possible structural elements of the WikiModel. The CommonSyntax is the "native" syntax and it contains *all* possible elements available in the WikiModel (embedded documents, properties, extensions, ...). It was also designed taking into account how available and easy to type the formatting symbols are on various keyboard layouts.

Why is it important? Why do we need the CommonSyntax? Just some examples:
Ex1: Using the CommonSyntax it is possible to put 2 paragraphs, a list and a table into another table. It can be done because there is a notion of "embedded document". AFAIK no other syntax gives this possibility, not even MediaWiki, which has the most advanced (and most complicated) syntax.
Ex2: In a page containing information about a person it is possible to define properties like "firstName", "lastName", "birthDate", "address" and so on. So the document itself contains well-structured semantic information as well as normal text. In the future it can (and I think it should) replace the notion of XWiki "objects" attached to documents.
Ex3: The symbol "|" does not exist in the Russian keyboard layout. To enter this symbol you have to switch from Russian to English. Imagine now that you want to create a table with 5 columns and 5 rows. How many times do you have to switch? :-) So I use the sequence "::" as the table cell delimiter (but "|" is recognized as well). Table cell delimiters are just one example. The same goes for many other structural elements.
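As a small illustration of Ex3, a row splitter that accepts both delimiters could look roughly like the sketch below. CellSplitSketch is a hypothetical helper written for this mail, not the WikiModel grammar (which is JavaCC-based), and it deliberately ignores escaping and empty cells.

```java
import java.util.ArrayList;
import java.util.List;

class CellSplitSketch {
    // Splits one table row into trimmed cell texts.
    // Both "::" and "|" are accepted as cell delimiters.
    // Simplification: empty cells are dropped and there is no escaping.
    static List<String> splitCells(String row) {
        List<String> cells = new ArrayList<>();
        for (String part : row.split("::|\\|")) {
            String cell = part.trim();
            if (!cell.isEmpty()) {
                cells.add(cell);
            }
        }
        return cells;
    }
}
```

With this, ":: Cell 2.1 :: Cell 2.2" and "| Cell 2.1 | Cell 2.2" give the same two cells, so the choice of delimiter stays a matter of keyboard convenience only.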

C) Included documents. What syntax will they use? #include will copy
the content, and let radeox process it later, along with the includer
document. We can put the included document inside a
{syntax:$idoc.syntax} block.

As I said above, WikiModel has a notion of "embedded documents". In WikiModel each wiki document is constructed from a sequence of block elements (headers, tables, lists, paragraphs, ...). Block elements can have "embedded documents", which have exactly the same structural elements as the topmost one.
An example of a page with an embedded document (CommonSyntax):
----------------------------------------------
= Example1 =

The table below contains an embedded document.
Using such embedded documents you can insert table
in a list or a list in a table. And embedded documents
can contain their own embedded documents!!!

!! Header 1.1 !! Header 1.2
:: Cell 2.1 :: Cell 2.2 with an embedded document: (((
== This is an embedded document! ==
* list item one
* list item two
  * sub-item A
  * sub-item B
* list item three
)))
:: Cell 3.1 :: Cell 3.2

This is a paragraph after the table...
----------------------------------------------
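The recursive structure behind Example1 can be pictured with a tiny data model. This is only a sketch under my own assumptions: DocSketch and BlockSketch are hypothetical names, and an embedded document is attached directly to a block here, whereas in the example it sits inside a table cell.

```java
import java.util.ArrayList;
import java.util.List;

// A document is a sequence of blocks (hypothetical model, for illustration only).
class DocSketch {
    final List<BlockSketch> blocks = new ArrayList<>();

    DocSketch add(BlockSketch block) {
        blocks.add(block);
        return this;
    }

    // Nesting depth: 1 for a flat document, +1 for every level of embedding.
    int depth() {
        int deepest = 0;
        for (BlockSketch block : blocks) {
            for (DocSketch doc : block.embedded) {
                deepest = Math.max(deepest, doc.depth());
            }
        }
        return 1 + deepest;
    }
}

// A block may carry embedded documents, which are full documents themselves.
class BlockSketch {
    final String kind; // e.g. "header", "paragraph", "table", "list"
    final List<DocSketch> embedded = new ArrayList<>();

    BlockSketch(String kind) {
        this.kind = kind;
    }

    BlockSketch embed(DocSketch doc) {
        embedded.add(doc);
        return this;
    }
}
```

Built this way, Example1 is a top-level document (header, paragraph, table, paragraph) whose table embeds a second document (header, list), so its depth is 2. Since an embedded document is just another DocSketch, nothing prevents it from embedding further documents.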

Please note that these "embedded documents" have nothing to do with the inclusion of external documents. WikiModel works only with the content of one page. Its goal is just to recognize and manipulate individual structural elements on wiki pages. If you want to make inclusions you can use "extensions" and interpret them in your code as you wish.
It can be something like this:
----------------------------------------------
= Example2 =

The text below will be recognized by the
WikiModel as an "extension" and it can be
interpreted in the user's code for example
to include an external page in this place.

$include(http://www.google.com)

The next paragraph...
----------------------------------------------

D) A common practice was to use velocity to generate wiki syntax that
Radeox would process, or to generate (radeox) macro parameters using
velocity, like the {rss:${userobj.feed}}. What happens if we
dynamically change the wiki syntax? Velocity can't know about that.
And with such fragmented code, it will be very hard to dynamically
change the syntax to the current user's preference.

If I understand correctly, the decision was to use the WikiModel instead of Radeox. WikiModel does exactly the same as Radeox does, with some differences:
- WikiModel guarantees that documents are well-formed. It is based on real JavaCC grammars and not on regular expressions, like Radeox.
- WikiModel contains parsers for multiple syntaxes (CommonSyntax, XWiki, JSPWiki, MediaWiki, Creole, ...)
- WikiModel does not generate HTML; it just notifies listeners about the individual structural elements found in a document, and it is up to the implementors of these listeners to do something with them. For example, there is a listener which generates HTML. Another one can generate a wiki page in another wiki syntax. And so on...

So if you want to include an external document you can extend the HTML listener, override the method onExtension(String extensionContent) and perform the inclusion there.
In CommonSyntax extensions are defined as follows:
----------------------------------------------
= Example3 =
This is an {{{rss: ${userobj.feed} }}}
----------------------------------------------
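A rough sketch of such an override is below. Please treat it as an illustration only: HtmlListenerSketch is a stand-in I made up for the real HTML listener, the "include:" prefix and the loader function are assumptions, and only the onExtension(String extensionContent) name comes from the text above. No network access is involved; the loader is whatever the client code supplies.

```java
import java.util.function.Function;

// Hypothetical stand-in for the HTML listener; not the real WikiModel class.
class HtmlListenerSketch {
    protected final StringBuilder html = new StringBuilder();

    void onText(String text) {
        html.append(text);
    }

    // By default extensions are ignored; client code decides what they mean.
    void onExtension(String extensionContent) {
    }

    String result() {
        return html.toString();
    }
}

// Client-side subclass which interprets "include: <target>" extensions.
class IncludingListenerSketch extends HtmlListenerSketch {
    private static final String PREFIX = "include:"; // assumed extension format
    private final Function<String, String> loader;  // maps a target to its HTML

    IncludingListenerSketch(Function<String, String> loader) {
        this.loader = loader;
    }

    @Override
    void onExtension(String extensionContent) {
        String content = extensionContent.trim();
        if (content.startsWith(PREFIX)) {
            String target = content.substring(PREFIX.length()).trim();
            html.append(loader.apply(target));
        }
        // Any other extension (e.g. "rss: ...") is left to other handlers.
    }
}
```

The parser itself stays unaware of inclusion. It only reports that an extension was found, and the subclass decides what, if anything, to put in the output.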

About Velocity... Personally I think that it is better NOT to use Velocity at all and to use Groovy templates instead.

> Maybe these 2 points aren't going to be an issue but I'd like to make
> sure they're not since this is an important decision and what we gain
> from implementing it is not so high in my opinion when compared with
> the option of deciding the syntax at the level of the page or the
> whole wiki.
>
> Actually if there are performance issues it might even be possible to
> combine both:
> * the page is edited in the default syntax (the one configured at the
> page level or wiki level)
> * there's an export option to export in a different syntax, same as
> exporting in PDF, RTF, etc.
>

What happens after the export? The user edits that version in an
offline tool (XEclipse, for example), then he can reimport the changed
document, which will be converted back to the original syntax.

I'd rather have a "convert" button, which will try to convert the
document to another syntax. If there are things that can't be
converted, then warn about this, and offer a Yes/No choice to the
user, allowing him to force the converted syntax, or abandon the
conversion.

Hm, talking about XEclipse, maybe we can leave the "edit using XYZ
syntax" as an XEclipse feature, not present in XWiki platform. This
way we'll remove the stress from the server, as the conversion could
be performed on the client.

You can lose some information only when you transform WikiModel-specific structural elements (like embedded documents or properties) into an external format (XWiki, JSPWiki, ...). When you import from another format into WikiModel you should lose nothing. Otherwise it is considered a bug in the implementation of WikiModel's parsers.
So if you exported a wiki page to a particular syntax without warnings, you can be sure that all your modifications will be seamlessly integrated back.
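The "export without warnings" check can be thought of as a simple set difference. The sketch below is only that idea written down: ConversionCheckSketch and the element kind names are hypothetical, and a real implementation would have to collect the used kinds from the parser events.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class ConversionCheckSketch {
    // Returns the element kinds used by the document which the target
    // syntax cannot express. An empty result means: no warnings.
    static Set<String> unsupported(Set<String> usedKinds, Set<String> targetKinds) {
        Set<String> missing = new LinkedHashSet<>(usedKinds);
        missing.removeAll(targetKinds);
        return missing;
    }
}
```

If the result is empty the export can proceed silently. Otherwise each remaining kind (for example an embedded document exported to a syntax without that notion) becomes one warning to show to the user before he decides to force or abandon the conversion.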

> Thanks
> -Vincent
>

And a question on wikimodel, what if there's a feature we need that
doesn't have an equivalent in the WikiDOM?

Hmm...
- WikiModel works with a superset of the structural elements available in existing wikis (in those wikis which I know :-)) and it contains additional features like embedded documents or properties. So if you find a structural element which exists in other wikis and is not present in WikiModel (and which cannot be *easily* simulated with existing elements) then you should consider it a bug. Such a structural element should be added to the WikiModel as soon as possible.
- If you need some additional features and they cannot be "externalized" in verbatim blocks then... write to me. We will discuss :-)

I think that the WikiModel can provide the common infrastructure which works with well-known elements. If you need something specific, just put it in a verbatim block and interpret it yourself in your code.
----------------------------------------------
= Example4 =
This is a verbatim block:
{{{
This is a verbatim block.
It can be used to insert in
the final page
a <strong>junk
and <em>bad-formed</strong>
html</em>!!!
}}}

And the next block can be interpreted in
your code as a groovy script:

{{{groovy: println "Hello, world!" }}}

----------------------------------------------
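On the client side, interpreting such blocks could be as simple as dispatching on the "type:" prefix. The following is a sketch under my own assumptions: VerbatimDispatchSketch is a hypothetical class, the content passed in is assumed to be the text between {{{ and }}}, and escaping untyped blocks into a <pre> element is just one possible policy (inserting them as-is, as in Example4, is another).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class VerbatimDispatchSketch {
    // Handlers registered by the client code, keyed by block type ("html", ...).
    private final Map<String, Function<String, String>> handlers = new LinkedHashMap<>();

    void register(String type, Function<String, String> handler) {
        handlers.put(type, handler);
    }

    // content is assumed to be the text found between {{{ and }}}.
    String render(String content) {
        int colon = content.indexOf(':');
        if (colon > 0) {
            String type = content.substring(0, colon).trim();
            Function<String, String> handler = handlers.get(type);
            if (handler != null) {
                return handler.apply(content.substring(colon + 1).trim());
            }
        }
        // No known type: show the block literally (policy chosen for this sketch).
        return "<pre>" + escape(content) + "</pre>";
    }

    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

For instance, registering "html" with a handler that returns its input unchanged makes {{{html: <h1>Hello, world</h1>}}} appear in the page as real markup, while a "groovy" handler could pass the body to a script engine of your choice.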

WikiModel is written using JavaCC grammars. Modifying these grammars is not a very complicated task, but it definitely requires more work than just changing configuration files.

And about the WikiModel DOM... As I wrote above, the latest version of the WikiModel does not contain a DOM yet. Just the common infrastructure and a set of parsers for various wiki syntaxes, generating well-formed events for structural elements.

Best regards,
Mikhail

Sergiu
--
http://purl.org/net/sergiu