Hello!
I tried to respond to your questions below from the point of view of a
WikiModel developer :-)
On 9/20/07, Sergiu Dumitriu <sergiu.dumitriu(a)gmail.com> wrote:
Sorry for the late reply, I didn't see your mail.
"HTMLParser: I think all parsers above need to support HTML since
the wiki syntaxes can be mixed with HTML"
I don't understand this. What does manually entered HTML have to do
with wiki parsing?
Because users can enter HTML in wiki content.
Yes, but HTML should be left as-is. I don't see why it should be
parsed. Do you have a use-case for parsing HTML together with the wiki
syntax into a DOM?
If you put your HTML in a wiki page, it *can* be left as is, without any
modifications, if it is in a "verbatim" block. In this case it is up to the
client code whether or not to interpret the content of such verbatim blocks.
It can be something like: {{{html: <h1>Hello, world</h1>}}}
In this case the content of such a block can be directly inserted in the
resulting generated HTML page.
In general, verbatim blocks can be used to insert into pages anything you
want to interpret yourself. It can be a Groovy script:
{{{groovy: print "Hello, world!"}}}.
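A minimal sketch of how client code might interpret verbatim blocks by their prefix, assuming the `html:` / `groovy:` convention from the examples above. `VerbatimDispatcher` and its handler map are hypothetical, not part of the WikiModel API, and the Groovy handler in the demo only wraps the script in a comment instead of running it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not WikiModel API): routes the body of a verbatim block
// to a handler chosen by its "name:" prefix, e.g. "html: <h1>Hello</h1>".
final class VerbatimDispatcher {
    private final Map<String, Function<String, String>> handlers = new LinkedHashMap<>();

    VerbatimDispatcher register(String prefix, Function<String, String> handler) {
        handlers.put(prefix, handler);
        return this;
    }

    // Renders one verbatim block body (the text between {{{ and }}}).
    String render(String body) {
        int colon = body.indexOf(':');
        if (colon > 0) {
            Function<String, String> handler = handlers.get(body.substring(0, colon).trim());
            if (handler != null) {
                return handler.apply(body.substring(colon + 1).trim());
            }
        }
        // Unknown or missing prefix: show the raw text, escaped, as preformatted output.
        return "<pre>" + escape(body) + "</pre>";
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}

final class VerbatimDemo {
    public static void main(String[] args) {
        VerbatimDispatcher dispatcher = new VerbatimDispatcher()
            // HTML blocks are inserted into the generated page unchanged.
            .register("html", html -> html)
            // Placeholder: a real handler would evaluate the Groovy script here.
            .register("groovy", script -> "<!-- groovy script: " + script + " -->");
        System.out.println(dispatcher.render("html: <h1>Hello, world</h1>"));
        System.out.println(dispatcher.render("groovy: print \"Hello, world!\""));
    }
}
```

Keeping interpretation in a pluggable handler map mirrors the point above: the parser only delimits the block, and what happens to its content is the client's decision.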
We need to parse HTML to transform any random web page into editable wiki
content. BTW, this code already exists and it works.
To summarize, we need to decide whether we want to
display wiki content in the user's preferred syntax or not. To be
honest, I also like that a lot since this is not something you find in
other wikis, but I'm worried about 2 things:
A) Feasibility. Aren't there always going to be lots of
incompatibilities? Macros can be generic and work for all syntaxes, so
that's not an issue, but what about links, for example? XWiki's link
syntax is richer than most other wikis' link syntax. For example, if
there's a reference to another XWiki DB in the link
(otherwiki:SomeDocument), then what's going to happen when it is viewed
with, say, the MediaWiki syntax?
We either have to extend all the syntaxes, or restrict users to the
common subset of all syntaxes. Or something in the middle: extend where
possible, but trim everything that doesn't have an equivalent in one of
the syntaxes. The best thing would be to try to extend all syntaxes.
The main goal of having multiple parsers for multiple wiki syntaxes is the
ability to import any external wiki page into the WikiModel without losing
information or formatting. If you write the same sequence of titles,
paragraphs, tables and lists using XWiki, JSPWiki or MediaWiki syntax, they
will all produce exactly the same structure in the WikiModel.
If pages are simple (they don't contain any "advanced" stuff like
hierarchical embedded documents or properties), then they *can* be serialized
and edited using any particular simple wiki syntax (XWiki, JSPWiki, Creole,
...). WikiModel guarantees that any modifications introduced using these
particular syntaxes will not be lost. If you lose something, it should be
considered a bug.
B) Complexity. Every user is going to be using his favorite syntax
and thus when users talk together or copy/paste snippets on xwiki.org,
for example, they're all going to be in different syntaxes.
It does not matter which syntax you use to edit your document. All
these syntaxes will produce the *same* structure. Using this structure, it is
possible to serialize documents in the CommonSyntax.
The editing cycle is:
- Create and submit new text using, for example, the JSPWiki syntax.
- Parse the content using the JspWikiParser. This operation will produce a
well-formed sequence of events for the listener (like
beginParagraph(...)/endParagraph(...); beginTable(...)/endTable(...), ...).
This step cleans up all user errors, such as unclosed syntactic elements.
- Using the CommonSyntaxSerializer, generate a new wiki document with
exactly the same structure as the original JSPWiki document.
- Store the resulting document in the DB.
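The cycle above (parse into well-formed events, then serialize the events in another syntax) can be sketched with a toy parser and serializer. Everything here is hypothetical: `WikiListener` is not the real WikiModel listener interface, the parser only understands `!` headings and blank-line-separated paragraphs, and mapping `!!!` to the top heading level is my assumption about JSPWiki. The `= Title =` output follows the CommonSyntax examples in this mail.

```java
// Hypothetical event interface modelled on the begin/end callbacks described
// above; it is NOT the real WikiModel listener API.
interface WikiListener {
    void beginHeader(int level);
    void endHeader(int level);
    void beginParagraph();
    void endParagraph();
    void onText(String text);
}

// Toy parser for a tiny JSPWiki-like subset: "!" headings and plain-text
// paragraphs separated by blank lines. Assumption: "!!!" is the top level.
final class ToyJspWikiParser {
    static void parse(String source, WikiListener listener) {
        boolean inParagraph = false;
        for (String raw : source.split("\n", -1)) {
            String line = raw.trim();
            boolean heading = line.startsWith("!");
            // A blank line or a heading ends the current paragraph.
            if ((line.isEmpty() || heading) && inParagraph) {
                listener.endParagraph();
                inParagraph = false;
            }
            if (line.isEmpty()) {
                continue;
            }
            if (heading) {
                int bangs = 0;
                while (bangs < 3 && bangs < line.length() && line.charAt(bangs) == '!') {
                    bangs++;
                }
                int level = 4 - bangs;  // "!!!" -> 1, "!!" -> 2, "!" -> 3
                listener.beginHeader(level);
                listener.onText(line.substring(bangs).trim());
                listener.endHeader(level);
            } else {
                if (!inParagraph) {
                    listener.beginParagraph();
                    inParagraph = true;
                }
                listener.onText(line);
            }
        }
        // Clean-up: close a paragraph left open at the end of the input, so
        // every begin* event is matched by an end* event.
        if (inParagraph) {
            listener.endParagraph();
        }
    }
}

// Writes the events back out with CommonSyntax-style "= Title =" headers.
final class CommonSyntaxSerializer implements WikiListener {
    private final StringBuilder out = new StringBuilder();
    private boolean inParagraph;

    public void beginHeader(int level) { out.append("=".repeat(level)).append(' '); }
    public void endHeader(int level) { out.append(' ').append("=".repeat(level)).append("\n\n"); }
    public void beginParagraph() { inParagraph = true; }
    public void endParagraph() { inParagraph = false; out.append('\n'); }

    public void onText(String text) {
        out.append(text);
        if (inParagraph) {
            out.append('\n');  // keep the author's line breaks inside a paragraph
        }
    }

    String result() { return out.toString(); }
}

final class EditCycleDemo {
    public static void main(String[] args) {
        String jspWikiText = "!!! Title\nfirst line\nsecond line";
        CommonSyntaxSerializer serializer = new CommonSyntaxSerializer();
        ToyJspWikiParser.parse(jspWikiText, serializer);
        // Storing the result in the DB is outside the scope of this sketch.
        System.out.print(serializer.result());
    }
}
```

Because the serializer only sees events, swapping in another listener (for example one that emits HTML) changes the output format without touching the parser.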
In WikiModel, a document written using a particular syntax is just a
reflection of the internal structure of this document. Each particular wiki
syntax (JSPWiki, XWiki, Creole, ...) reflects only a subset of the possible
structural elements of the WikiModel. The CommonSyntax is the "native"
syntax and it contains *all* possible elements available in the WikiModel
(embedded documents, properties, extensions, ...). It was also designed with
the availability and ease of typing of formatting symbols on various
keyboard layouts in mind.
Why is this important? Why do we need the CommonSyntax? Just some examples:
Ex1: Using the CommonSyntax, it is possible to put 2 paragraphs, a list and a
table into another table. This can be done because there is a notion of
"embedded document". AFAIK no other syntax offers this possibility, not even
MediaWiki, which has the most advanced (and most complicated) syntax.
Ex2: In a page containing information about a person, it is possible to
define properties like "firstName", "lastName", "birthDate", "address" and
so on. So the document itself contains well-structured semantic information
as well as normal text. In the future it can (and I think it should)
replace the notion of XWiki "objects" attached to documents.
Ex3: The symbol "|" does not exist in the Russian keyboard layout. To enter
this symbol you have to switch from Russian to English. Now imagine that you
want to create a table with 5 columns and 5 rows. How many times do you have
to switch? :-) So I use the sequence "::" as the table cell delimiter (but
"|" is recognized as well). Table cell delimiters are just one example; the
same applies to many other structural elements.
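A minimal sketch of splitting one table row when both "::" and "|" are accepted as data-cell delimiters, and "!!" starts a header cell as in Example1 further down. `TableRowSplitter` is hypothetical and far simpler than the real JavaCC grammar: it ignores escaping, embedded documents and multi-line cells.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row splitter (not WikiModel code): "!!" opens a header cell,
// "::" or "|" opens a data cell, so users without a "|" key can use "::".
final class TableRowSplitter {
    static final class Cell {
        final boolean header;
        final String text;

        Cell(boolean header, String text) {
            this.header = header;
            this.text = text;
        }

        @Override
        public String toString() {
            return (header ? "H:" : "D:") + text;
        }
    }

    static List<Cell> split(String row) {
        List<Cell> cells = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        boolean started = false;  // true once the first delimiter has been seen
        boolean header = false;   // kind of the cell currently being collected
        int i = 0;
        while (i < row.length()) {
            int width = 0;
            boolean nextIsHeader = false;
            if (row.startsWith("!!", i)) {
                width = 2;
                nextIsHeader = true;
            } else if (row.startsWith("::", i)) {
                width = 2;
            } else if (row.charAt(i) == '|') {
                width = 1;
            }
            if (width > 0) {
                if (started) {
                    cells.add(new Cell(header, text.toString().trim()));
                }
                text.setLength(0);  // text before the first delimiter is ignored
                started = true;
                header = nextIsHeader;
                i += width;
            } else {
                text.append(row.charAt(i));
                i++;
            }
        }
        if (started) {
            cells.add(new Cell(header, text.toString().trim()));
        }
        return cells;
    }
}

final class TableRowDemo {
    public static void main(String[] args) {
        System.out.println(TableRowSplitter.split("!! Header 1.1 !! Header 1.2"));
        System.out.println(TableRowSplitter.split(":: Cell 3.1 | Cell 3.2"));
    }
}
```

Accepting both delimiters in the same position means a row typed on a Russian layout and one typed on an English layout produce identical cells.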
C) Included documents. What syntax will they use? #include will copy
the content and let Radeox process it later, along with the including
document. We can put the included document inside a
{syntax:$idoc.syntax} block.
As I said above, WikiModel has a notion of "embedded documents". In
WikiModel, each wiki document is constructed from a sequence of block
elements (headers, tables, lists, paragraphs, ...). And block elements can
contain "embedded documents", which have exactly the same structural
elements as the topmost one.
An example of a page with an embedded document (CommonSyntax):
----------------------------------------------
= Example1 =
The table below contains an embedded document.
Using such embedded documents you can insert a table
in a list or a list in a table. And embedded documents
can contain their own embedded documents!!!
!! Header 1.1 !! Header 1.2
:: Cell 2.1 :: Cell 2.2 with an embedded document: (((
== This is an embedded document! ==
* list item one
* list item two
* sub-item A
* sub-item B
* list item three
)))
:: Cell 3.1 :: Cell 3.2
This is a paragraph after the table...
----------------------------------------------
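The recursive structure behind Example1 (a document is a list of blocks, and a block may carry embedded documents of the same shape) can be modelled in a few lines. This is a hypothetical in-memory model, not the WikiModel API, and it simplifies Example1 by attaching the embedded document to the table block rather than to the individual cell.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not WikiModel API): a document is a list of blocks, and
// any block may carry embedded documents with exactly the same structure.
final class Doc {
    final List<Block> blocks = new ArrayList<>();

    Doc add(Block block) {
        blocks.add(block);
        return this;
    }

    // Nesting depth: 1 for a document without embedded documents.
    int depth() {
        int deepest = 0;
        for (Block b : blocks) {
            for (Doc child : b.embedded) {
                deepest = Math.max(deepest, child.depth());
            }
        }
        return 1 + deepest;
    }

    // Counts blocks of the given kind at every nesting level.
    int count(String kind) {
        int n = 0;
        for (Block b : blocks) {
            if (b.kind.equals(kind)) {
                n++;
            }
            for (Doc child : b.embedded) {
                n += child.count(kind);
            }
        }
        return n;
    }
}

final class Block {
    final String kind;  // e.g. "header", "paragraph", "table", "list"
    final String text;
    final List<Doc> embedded = new ArrayList<>();

    Block(String kind, String text) {
        this.kind = kind;
        this.text = text;
    }

    Block embed(Doc doc) {
        embedded.add(doc);
        return this;
    }
}

final class EmbeddedDocDemo {
    // Rough shape of Example1; the table's embedded document holds a header and a list.
    static Doc example1() {
        Doc inner = new Doc()
            .add(new Block("header", "This is an embedded document!"))
            .add(new Block("list", "list item one / two / three"));
        return new Doc()
            .add(new Block("header", "Example1"))
            .add(new Block("paragraph", "The table below contains an embedded document."))
            .add(new Block("table", "Cell 2.2").embed(inner))
            .add(new Block("paragraph", "This is a paragraph after the table..."));
    }

    public static void main(String[] args) {
        Doc doc = example1();
        System.out.println("depth=" + doc.depth() + ", headers=" + doc.count("header"));
    }
}
```

Since an embedded document is just another `Doc`, any code that walks the top-level document (rendering, searching, counting) handles nested content for free.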
Please note that these "embedded documents" have nothing to do with external
document inclusion. WikiModel works only with the content of one page. Its
goal is just to recognize and manipulate individual structural elements
on wiki pages. If you want to do inclusions, you can use "extensions" and
interpret them in your code as you wish.
It can be something like this:
----------------------------------------------
= Example2 =
The text below will be recognized by the
WikiModel as an "extension" and it can be
interpreted in the user's code for example
to include an external page in this place.
$include(http://www.google.com)
The next paragraph...
----------------------------------------------
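A rough sketch of how an extension line such as `$include(http://www.google.com)` from Example2 could be split into a name and an argument. In practice WikiModel's JavaCC grammar does the recognition and hands the extension to the listener; the regular expression here only approximates that single example, and `ExtensionLine` is a hypothetical name.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical recognizer for lines like "$include(http://www.google.com)".
// It approximates the one example above; it is not the CommonSyntax grammar.
final class ExtensionLine {
    private static final Pattern EXTENSION = Pattern.compile("\\$(\\w+)\\((.*)\\)");

    final String name;
    final String argument;

    private ExtensionLine(String name, String argument) {
        this.name = name;
        this.argument = argument;
    }

    static Optional<ExtensionLine> parse(String line) {
        Matcher m = EXTENSION.matcher(line.trim());
        return m.matches()
            ? Optional.of(new ExtensionLine(m.group(1), m.group(2)))
            : Optional.empty();
    }
}

final class ExtensionDemo {
    public static void main(String[] args) {
        String[] lines = {
            "The text below will be recognized by the",
            "$include(http://www.google.com)",
            "The next paragraph..."
        };
        for (String line : lines) {
            Optional<ExtensionLine> ext = ExtensionLine.parse(line);
            // A real handler would fetch and insert the referenced page here.
            ext.ifPresent(e -> System.out.println("extension '" + e.name + "' with argument " + e.argument));
        }
    }
}
```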
D) A common practice was to use Velocity to generate wiki syntax that
Radeox would process, or to generate (Radeox) macro parameters using
Velocity, like {rss:${userobj.feed}}. What happens if we
dynamically change the wiki syntax? Velocity can't know about that.
And with such fragmented code, it will be very hard to dynamically
change the syntax to the current user's preference.
If I understand correctly, the decision was to use WikiModel instead of
Radeox. WikiModel does the same job as Radeox, with some differences:
- WikiModel guarantees that documents are well-formed. It is based on real
JavaCC grammars, not on regular expressions like Radeox.
- WikiModel contains parsers for multiple syntaxes (CommonSyntax, XWiki,
JSPWiki, MediaWiki, Creole, ...).
- WikiModel does not generate HTML; it just notifies listeners about
individual structural elements found in a document, and it is up to the
implementors of these listeners to do something with them. For example,
there is a listener which generates HTML. Another one can generate a wiki
page in another wiki syntax. And so on...
So if you want to include an external document, you can extend the HTML
listener, override the method onExtension(String extensionContent) and
perform the inclusion there.
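A minimal sketch of that approach: subclass an HTML-generating listener and override `onExtension` so that an include extension pulls in another page. `HtmlListener` and `IncludingHtmlListener` are hypothetical stand-ins that follow the method name mentioned above, not verified WikiModel signatures. The assumption that the extension content arrives as `include(URL)` is also mine, and the page loader is injected as a function so the sketch does no network I/O.

```java
import java.util.function.Function;

// Hypothetical stand-in for an HTML-generating listener; names follow the
// description above, not a verified WikiModel class.
class HtmlListener {
    protected final StringBuilder html = new StringBuilder();

    public void onText(String text) {
        html.append(escape(text));
    }

    // Default behaviour: unknown extensions are rendered as an HTML comment.
    public void onExtension(String extensionContent) {
        html.append("<!-- extension: ").append(escape(extensionContent)).append(" -->");
    }

    public String result() {
        return html.toString();
    }

    protected static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}

// Overrides onExtension to handle "include(URL)"; everything else falls back
// to the default rendering.
class IncludingHtmlListener extends HtmlListener {
    private final Function<String, String> loader;  // hypothetical page loader

    IncludingHtmlListener(Function<String, String> loader) {
        this.loader = loader;
    }

    @Override
    public void onExtension(String extensionContent) {
        String c = extensionContent.trim();
        if (c.startsWith("include(") && c.endsWith(")")) {
            String target = c.substring("include(".length(), c.length() - 1);
            html.append(loader.apply(target));  // included content is inserted as-is
        } else {
            super.onExtension(extensionContent);
        }
    }
}

final class IncludeDemo {
    public static void main(String[] args) {
        IncludingHtmlListener listener =
            new IncludingHtmlListener(url -> "<div>content of " + url + "</div>");
        listener.onText("Before the inclusion. ");
        listener.onExtension("include(http://www.google.com)");
        System.out.println(listener.result());
    }
}
```

Falling back to `super.onExtension` keeps the default handling for every extension the subclass does not recognize.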
In CommonSyntax, extensions are defined as follows:
----------------------------------------------
= Example3 =
This is an {{{rss: ${userobj.feed} }}}
----------------------------------------------
About Velocity... Personally I think that it is better NOT to use Velocity
at all and to use Groovy templates instead.
Maybe these 2 points aren't going to be an issue, but I'd like to make
sure they're not, since this is an important decision and what we gain
from implementing it is not so high in my opinion when compared with
the option of deciding the syntax at the level of the page or the
whole wiki.
Actually if there are performance issues it might even be possible to
combine both:
* the page is edited in the default syntax (the one configured at the
page level or wiki level)
* there's an export option to export in a different syntax, same as
exporting in PDF, RTF, etc.
What happens after the export? The user edits that version in an
offline tool (XEclipse, for example), then he can reimport the changed
document, which will be converted back to the original syntax.
I'd rather have a "convert" button, which will try to convert the
document to another syntax. If there are things that can't be
converted, then warn about this, and offer a Yes/No choice to the
user, allowing him to force the converted syntax, or abandon the
conversion.
Hm, talking about XEclipse, maybe we can leave the "edit using XYZ
syntax" as an XEclipse feature, not present in the XWiki platform. This
way we'll remove the stress from the server, as the conversion could
be performed on the client.
You can lose some information only when you transform WikiModel-specific
structural elements (like embedded documents or properties) into an
external format (XWiki, JSPWiki, ...). When you import from another format
into WikiModel, you should lose nothing. Otherwise it is considered a bug in
the implementation of WikiModel's parsers.
So if you exported a wiki page to a particular syntax without warnings, you
can be sure that all your modifications will be seamlessly integrated back.
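A minimal sketch of the check that could drive such export warnings (or the "convert" button's Yes/No prompt): list the elements a document uses that the target syntax cannot express, and treat an empty list as a safe round trip. The `Feature` enum and the per-syntax capability table are hypothetical; this thread does not spell out exactly which elements each syntax supports.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical list of structural elements a document may use.
enum Feature { HEADER, PARAGRAPH, LIST, TABLE, EMBEDDED_DOCUMENT, PROPERTY }

final class ExportCheck {
    // Hypothetical capability table: CommonSyntax covers everything, the simple
    // syntaxes lack WikiModel-specific elements such as embedded documents.
    static final Map<String, Set<Feature>> SUPPORTED = Map.of(
        "common", EnumSet.allOf(Feature.class),
        "jspwiki", EnumSet.of(Feature.HEADER, Feature.PARAGRAPH, Feature.LIST, Feature.TABLE));

    // One warning per used feature that the target syntax cannot express.
    static List<String> warnings(Set<Feature> used, String targetSyntax) {
        Set<Feature> supported = SUPPORTED.getOrDefault(targetSyntax, EnumSet.noneOf(Feature.class));
        List<String> result = new ArrayList<>();
        for (Feature f : used) {
            if (!supported.contains(f)) {
                result.add(f + " cannot be expressed in " + targetSyntax);
            }
        }
        return result;
    }
}

final class ExportDemo {
    public static void main(String[] args) {
        Set<Feature> used = EnumSet.of(Feature.HEADER, Feature.TABLE, Feature.EMBEDDED_DOCUMENT);
        List<String> problems = ExportCheck.warnings(used, "jspwiki");
        if (problems.isEmpty()) {
            System.out.println("Safe to export: edits can be merged back without loss.");
        } else {
            // Here the user would be asked whether to force the conversion or abandon it.
            problems.forEach(System.out::println);
        }
    }
}
```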
Thanks
-Vincent
And a question on WikiModel: what if there's a feature we need that
doesn't have an equivalent in the WikiDOM?
Hmm...
- WikiModel works with a superset of the structural elements available in
existing wikis (in those wikis which I know :-)) and it contains additional
features like embedded documents or properties. So if you find a structural
element that exists in other wikis but not in WikiModel (and which cannot
be *easily* simulated with existing elements), then you should consider it
a bug. Such a structural element should be added to WikiModel as soon as
possible.
- If you need some additional features and they cannot be "externalized" in
verbatim blocks then... write to me. We will discuss it :-)
I think that WikiModel can provide the common infrastructure that works
with well-known elements. If you need something specific, just put it in a
verbatim block and interpret it yourself in your code.
----------------------------------------------
= Example4 =
This is a verbatim block:
{{{
This is a verbatim block.
It can be used to insert in
the final page
a <strong>junk
and <em>bad-formed</strong>
html</em>!!!
}}}
And the next block can be interpreted in
your code as a groovy script:
{{{groovy: println "Hello, world!" }}}
----------------------------------------------
WikiModel is written using JavaCC grammars. Modifying these grammars is not
a very complicated task, but it definitely requires more work than just
changing configuration files.
And about the WikiModel DOM... As I wrote above, the latest version of
WikiModel does not contain a DOM yet; just the common infrastructure and a
set of parsers for various wiki syntaxes generating well-formed events for
structural elements.
Best regards,
Mikhail
Sergiu
--
http://purl.org/net/sergiu
_______________________________________________
devs mailing list
devs(a)xwiki.org
http://lists.xwiki.org/mailman/listinfo/devs