Right approach for filtering

Zeljko Trogrlic

27 Mar 2007

6:17 a.m.

Hi,

For our department wiki, I really needed automatic links for acronyms because we have a whole bunch of them. So, after a couple of days playing with code and regular expressions, I managed to create a Radeox filter which converts acronyms to links.

My questions are:

1) Is that the right approach, or should I write a plug-in? Can you write a plug-in that processes every page, or do you have to call it explicitly?

2) What is the right place to attach it to rendering? I attached it at the end, just before the link filter, but then I had to struggle with HTML. It would probably be easier to write something that processes pure wiki syntax or an abstract syntax tree, but it seems to me that there is no such place in the wiki. If I do the processing immediately after the page is loaded, macros are not evaluated yet, so text generated by macros can't be processed (I don't know whether this is a big deal, though).

It seems to me that macros generate HTML instead of wiki syntax, which is not very clean. On the other hand, it would be hard to write even simple macros with formatted text (e.g. "warning") without generating HTML directly or introducing some kind of style tags for wiki markup.

3) Radeox status: What is your experience with the Radeox team? Is anybody still working on the project? I found some funny stuff in the source code which I would like to fix. They "swallowed" exceptions in some places, so if you misconfigure a filter, you will not know why it doesn't work.

Sergiu Dumitriu

27 Mar

9:59 a.m.

New subject: [xwiki-dev] Right approach for filtering

On 3/27/07, Zeljko Trogrlic wrote:

> Hi,
>
> For our department wiki, I really needed automatic links for acronyms because we have a whole bunch of them. So, after a couple of days playing with code and regular expressions, I managed to create a Radeox filter which converts acronyms to links.

When you finalize it, you can post it in the code zone on xwiki.org.

> My questions are:
>
> 1) Is that the right approach, or should I write a plug-in? Can you write a plug-in that processes every page, or do you have to call it explicitly?

Yes, that is the right approach (for the moment). Plugins are usually more complex; converting wiki syntax to HTML is the job of Radeox.

> 2) What is the right place to attach it to rendering? I attached it at the end, just before the link filter, but then I had to struggle with HTML. It would probably be easier to write something that processes pure wiki syntax or an abstract syntax tree, but it seems to me that there is no such place in the wiki.

There is no global right place; it all depends on how the filter influences, or is influenced by, the rest of the filters. You have to analyze and test this. For example, a bug regarding links and emphasis syntax was fixed just by reordering the filters. An abstract syntax tree will be available once we switch from Radeox to WikiModel.

> If I do the processing immediately after the page is loaded, macros are not evaluated yet, so text generated by macros can't be processed (I don't know whether this is a big deal, though).
>
> It seems to me that macros generate HTML instead of wiki syntax, which is not very clean. On the other hand, it would be hard to write even simple macros with formatted text (e.g. "warning") without generating HTML directly or introducing some kind of style tags for wiki markup.
>
> 3) Radeox status: What is your experience with the Radeox team? Is anybody still working on the project? I found some funny stuff in the source code which I would like to fix. They "swallowed" exceptions in some places, so if you misconfigure a filter, you will not know why it doesn't work.

The Radeox team decided some time ago that it is perfect, although there are some bugs, filters that should have been written differently, and invalid markup being generated. Because of this, and the problems mentioned above (finding the right ordering in order to avoid processing already generated HTML, lack of a wiki abstract tree), we decided to move to WikiModel as soon as possible. But this "ASAP" is not going to happen soon enough, so any patch you can provide before XWiki 1.0 will be much appreciated.

Sergiu
--
http://purl.org/net/sergiu
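
To make the point about filter ordering concrete, here is a minimal sketch in plain Java. It does not use the Radeox API: `FilterChain` and its two filters are hypothetical stand-ins, and the `*bold*` syntax is an assumption made only for this example. It shows how a filter that runs after another one ends up processing HTML that was already generated.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a rendering pipeline; not the Radeox API.
class FilterChain {
    // Escapes characters that are special in HTML.
    static final UnaryOperator<String> ESCAPE =
        s -> s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");

    // Turns *text* into <b>text</b> (assumed syntax for the example).
    static final UnaryOperator<String> BOLD =
        s -> s.replaceAll("\\*([^*]+)\\*", "<b>$1</b>");

    private final List<UnaryOperator<String>> filters;

    FilterChain(List<UnaryOperator<String>> filters) {
        this.filters = filters;
    }

    // Applies every filter in order; each one sees the previous one's output.
    String render(String text) {
        String result = text;
        for (UnaryOperator<String> filter : filters) {
            result = filter.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "*a < b*";
        // Escaping first leaves the generated <b> tag intact.
        System.out.println(new FilterChain(Arrays.asList(ESCAPE, BOLD)).render(source));
        // Escaping last also escapes the markup that BOLD produced.
        System.out.println(new FilterChain(Arrays.asList(BOLD, ESCAPE)).render(source));
    }
}
```

With escaping first the result is `<b>a &lt; b</b>`; with escaping last the generated tag itself is escaped. A filter that must not touch generated HTML therefore has to run before the filters that emit it, or it has to skip tags explicitly.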

Zeljko Trogrlic

11:32 a.m.

Sergiu Dumitriu <sergiu.dumitriu@...> writes:

> On 3/27/07, Zeljko Trogrlic wrote:
>
>> Hi,
>>
>> For our department wiki, I really needed automatic links for acronyms because we have a whole bunch of them. So, after a couple of days playing with code and regular expressions, I managed to create a Radeox filter which converts acronyms to links.
>
> When you finalize it, you can post it in the code zone on xwiki.org

I think I'm done for now and I will post it ASAP. I'm not sure where, though. Maybe under extensions? It doesn't fit into any category. Basically, it's a simple Java class and a small configuration change.

> The Radeox team decided some time ago that it is perfect, although there are some bugs, filters that should have been written differently, and invalid markup being generated. Because of this, and the problems mentioned above (finding the right ordering in order to avoid processing already generated HTML, lack of a wiki abstract tree), we decided to move to WikiModel as soon as possible. But this "ASAP" is not going to happen soon enough, so any patch you can provide before XWiki 1.0 will be much appreciated.
>
> Sergiu
> --
> http://purl.org/net/sergiu

OK, but where should the fix for Radeox go? Do you keep a copy of their source tree, too?

Vincent Massol

4:56 p.m.

New subject: [xwiki-dev] Re: Right approach for filtering

On Mar 27, 2007, at 11:32 AM, Zeljko Trogrlic wrote:

> Sergiu Dumitriu <sergiu.dumitriu@...> writes:
>
>> On 3/27/07, Zeljko Trogrlic wrote:
>>
>>> Hi,
>>>
>>> For our department wiki, I really needed automatic links for acronyms because we have a whole bunch of them. So, after a couple of days playing with code and regular expressions, I managed to create a Radeox filter which converts acronyms to links.
>>
>> When you finalize it, you can post it in the code zone on xwiki.org
>
> I think I'm done for now and I will post it ASAP. I'm not sure where, though. Maybe under extensions? It doesn't fit into any category. Basically, it's a simple Java class and a small configuration change.

I'd put it in the Plugin section. Can you package it as a JAR and attach it there? Users can then drop it into WEB-INF/lib and make the config change you'll describe on the plugin form.

>> The Radeox team decided some time ago that it is perfect, although there are some bugs, filters that should have been written differently, and invalid markup being generated. Because of this, and the problems mentioned above (finding the right ordering in order to avoid processing already generated HTML, lack of a wiki abstract tree), we decided to move to WikiModel as soon as possible. But this "ASAP" is not going to happen soon enough, so any patch you can provide before XWiki 1.0 will be much appreciated.
>>
>> Sergiu
>> --
>> http://purl.org/net/sergiu
>
> OK, but where should the fix for Radeox go? Do you keep a copy of their source tree, too?

Is that required for your code to work, or is it just something nice to have? If it's required for your code, then you could attach the patch to the Plugin form and also attach a modified and patched Radeox JAR.

Of course, it might be better to implement something like what I suggested in my earlier email (I'm writing this offline, so you may have already answered that email...).

Thanks
-Vincent

Zeljko Trogrlic

9:43 p.m.

Vincent Massol wrote:

>> OK, but where should the fix for Radeox go? Do you keep a copy of their source tree, too?
>
> Is that required for your code to work, or is it just something nice to have? If it's required for your code, then you could attach the patch to the Plugin form and also attach a modified and patched Radeox JAR.
>
> Of course, it might be better to implement something like what I suggested in my earlier email (I'm writing this offline, so you may have already answered that email...).

No, it is not critical, but it will be easier for all of us if exceptions are actually reported ;)

Which other email are you referring to?
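
The complaint about swallowed exceptions can be illustrated with a small sketch. This is not Radeox code: `FilterLoader` is a hypothetical helper that instantiates a filter class by name, which is roughly the situation a misconfigured filter entry would hit. The first method hides the failure, the second reports it with context.

```java
import java.util.function.UnaryOperator;

// Hypothetical loader used only to contrast the two error-handling styles.
class FilterLoader {
    // Anti-pattern: the failure is discarded, so a typo in the configuration
    // just makes the filter silently disappear.
    static UnaryOperator<String> loadSilently(String className) {
        try {
            return instantiate(className);
        } catch (ReflectiveOperationException e) {
            return null; // exception swallowed
        }
    }

    // Reporting variant: the original exception is kept as the cause and the
    // message names the class that could not be loaded.
    static UnaryOperator<String> load(String className) {
        try {
            return instantiate(className);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load filter '" + className + "'", e);
        }
    }

    @SuppressWarnings("unchecked")
    private static UnaryOperator<String> instantiate(String className)
            throws ReflectiveOperationException {
        return (UnaryOperator<String>) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) {
        System.out.println(loadSilently("no.such.Filter")); // prints "null", no hint why
        load("no.such.Filter"); // fails loudly with the class name and the cause
    }
}
```

With the second variant, a misconfigured class name shows up in the log as an `IllegalStateException` whose cause is the underlying `ClassNotFoundException`.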

Vincent Massol

9:50 p.m.

New subject: [xwiki-dev] Re: Right approach for filtering

On Mar 27, 2007, at 9:43 PM, Zeljko Trogrlic wrote:

> Vincent Massol wrote:
>
>>> OK, but where should the fix for Radeox go? Do you keep a copy of their source tree, too?
>>
>> Is that required for your code to work, or is it just something nice to have? If it's required for your code, then you could attach the patch to the Plugin form and also attach a modified and patched Radeox JAR.
>>
>> Of course, it might be better to implement something like what I suggested in my earlier email (I'm writing this offline, so you may have already answered that email...).
>
> No, it is not critical, but it will be easier for all of us if exceptions are actually reported ;)
>
> Which other email are you referring to?

The one where I said:

"
Interesting problem... I think there are several possibilities:

1) You intercept page save and parse the content to add links for acronyms.
2) You do as you suggest and play with the rendering. This means that the links will not exist in the database, though.

I prefer 1) because with 2) you'll probably not benefit from features like automatic page rename. In addition, 1) is easier to implement, as we have a notification API that gets called when the page is saved, for example.

What algorithm will you use to detect an acronym vs a standard word?
"

Thanks
-Vincent
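
The difference between the two options can be sketched in a few lines of Java. Everything here is an assumption made for illustration: `PageStore` is a hypothetical in-memory stand-in for the wiki storage, the hooks are plain functions rather than the XWiki notification API, and the linker simply wraps upper-case words in `[brackets]`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical in-memory page store with one hook per option.
class PageStore {
    // Wraps bare upper-case words in [brackets]; the lookarounds skip words
    // that are already bracketed, so applying it twice changes nothing.
    static final UnaryOperator<String> LINKER =
        s -> s.replaceAll("(?<!\\[)\\b([A-Z]{2,})\\b(?!\\])", "[$1]");

    private final Map<String, String> pages = new HashMap<>();
    private final UnaryOperator<String> onSave;   // option 1: result is stored
    private final UnaryOperator<String> onRender; // option 2: applied on each view

    PageStore(UnaryOperator<String> onSave, UnaryOperator<String> onRender) {
        this.onSave = onSave;
        this.onRender = onRender;
    }

    void save(String name, String content) {
        pages.put(name, onSave.apply(content));
    }

    String stored(String name) {
        return pages.get(name);
    }

    String render(String name) {
        return onRender.apply(pages.get(name));
    }

    public static void main(String[] args) {
        PageStore saveTime = new PageStore(LINKER, UnaryOperator.identity());
        PageStore renderTime = new PageStore(UnaryOperator.identity(), LINKER);
        saveTime.save("Net", "Uses TCP");
        renderTime.save("Net", "Uses TCP");
        System.out.println(saveTime.stored("Net"));   // link is part of the stored content
        System.out.println(renderTime.stored("Net")); // stored content is unchanged
        System.out.println(renderTime.render("Net")); // link exists only in the output
    }
}
```

With the save-time hook the link becomes part of the stored page, so features that work on stored links can see it. With the render-time hook the stored text never contains the link. A save-time hook also needs to be idempotent, because a page is saved many times.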

Zeljko Trogrlic

11:37 p.m.

Vincent Massol wrote:

>> Which other email are you referring to?
>
> The one where I said:

Was it in reply to my email? I can't see it.

> " Interesting problem... I think there are several possibilities:
>
> 1) You intercept page save and parse the content to add links for acronyms.
> 2) You do as you suggest and play with the rendering. This means that the links will not exist in the database, though.
>
> I prefer 1) because with 2) you'll probably not benefit from features like automatic page rename. In addition, 1) is easier to implement, as we have a notification API that gets called when the page is saved, for example.

I agree that 1) is better. It is also faster, and it is a step towards phase two: automatic links. Where and how do I hook in?

> What algorithm will you use to detect an acronym vs a standard word?

Two letters or more, starting with an upper-case letter, with letters in the middle and a digit or upper-case letter at the end, e.g. MS, IPv6, RADIUS. I'm writing from home, so I don't have the exact regex here. It took me 3 days to figure out how to skip HTML tags and existing links.
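
The rule described above can be approximated as follows. This is a reconstruction from the prose, not the author's regex (which is not given in the thread): the pattern, the `AcronymLinker` class and the `/acronyms/` URL prefix are all assumptions. Existing links and HTML tags are matched first and copied through unchanged, so only acronyms in plain text are wrapped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of an acronym linker that works on rendered HTML.
class AcronymLinker {
    private static final Pattern TOKEN = Pattern.compile(
        "<a\\b[^>]*>.*?</a>"                       // existing link: leave untouched
        + "|<[^>]+>"                               // any other HTML tag: leave untouched
        + "|\\b([A-Z][A-Za-z]*[A-Z0-9])\\b",       // group 1: acronym candidate
        Pattern.DOTALL);

    static String link(String html) {
        Matcher m = TOKEN.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String acronym = m.group(1);
            // Group 1 is null when a tag or an existing link matched.
            String replacement = (acronym == null)
                ? m.group()
                : "<a href=\"/acronyms/" + acronym + "\">" + acronym + "</a>";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(link("Use IPv6 and MS tools"));
        System.out.println(link("<a href=\"/view/RADIUS\">RADIUS</a> uses <b class=\"X1\">UDP</b>"));
    }
}
```

Putting the "skip" alternatives before the acronym alternative is what keeps attribute values such as `class="X1"` and the text of existing links from being linked again. Ordinary capitalized words such as "Use" do not match because they end in a lower-case letter.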

Zeljko Trogrlic

30 Mar

8:06 a.m.

Vincent Massol wrote:

> Interesting problem... I think there are several possibilities:
>
> 1) You intercept page save and parse the content to add links for acronyms.
> 2) You do as you suggest and play with the rendering. This means that the links will not exist in the database, though.
>
> I prefer 1) because with 2) you'll probably not benefit from features like automatic page rename. In addition, 1) is easier to implement, as we have a notification API that gets called when the page is saved, for example.
>
> What algorithm will you use to detect an acronym vs a standard word?

2) is done and stored in the plugin section. On second thought, 1) is not that important for acronyms because they are immutable. However, if the existing renaming concept tries to rename everything at once, and if 1) is used, then there is a race condition between automatic links and renaming.

Renaming should leave the old page in place as a redirect page for some time. Redirect pages should be detectable by link processors (using a special class?), so they can detect outdated links and eventually replace them.

Sergiu Dumitriu

9:01 a.m.

New subject: [xwiki-dev] Re: Right approach for filtering

On 3/30/07, Zeljko Trogrlic wrote:

> Renaming should leave old page as redirect page for some time. Redirect page should be detectable by link processors (using special class?), so they can detect outdated links and replace them eventually.

Very good observation. I do agree with this. Vincent, can you take care of this?

--
http://purl.org/net/sergiu

Vincent Massol

9:10 a.m.

New subject: [xwiki-dev] Re: Right approach for filtering

On Mar 30, 2007, at 9:01 AM, Sergiu Dumitriu wrote:

> On 3/30/07, Zeljko Trogrlic wrote:
>
>> Renaming should leave old page as redirect page for some time. Redirect page should be detectable by link processors (using special class?), so they can detect outdated links and replace them eventually.
>
> Very good observation. I do agree with this. Vincent, can you take care of this?

I also agree with this. We need a JIRA for it. It's probably not going to happen before 1.0 final though (unless someone submits a full working patch for it). It requires some changes to the data model I believe, in order to do it properly (because deleted pages should not be visible as normal pages).

Thanks
-Vincent

Sergiu Dumitriu

11:02 a.m.

New subject: [xwiki-dev] Re: Right approach for filtering

1. We can add a new table, xwikiredirects, which holds redirect URLs.
2. We add document.getRedirects() (maybe both ways, from and to).
3. We can change docdoesnotexist.vm to check for redirects.

On 3/30/07, Vincent Massol wrote:

> On Mar 30, 2007, at 9:01 AM, Sergiu Dumitriu wrote:
>
>> On 3/30/07, Zeljko Trogrlic wrote:
>>
>>> Renaming should leave old page as redirect page for some time. Redirect page should be detectable by link processors (using special class?), so they can detect outdated links and replace them eventually.
>>
>> Very good observation. I do agree with this. Vincent, can you take care of this?
>
> I also agree with this. We need a JIRA for it. It's probably not going to happen before 1.0 final though (unless someone submits a full working patch for it). It requires some changes to the data model I believe, in order to do it properly (because deleted pages should not be visible as normal pages).
>
> Thanks
> -Vincent

--
http://purl.org/net/sergiu
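
The three steps proposed at the top of this message can be sketched as an in-memory model. This is only an illustration under assumptions: `RedirectTable` stands in for the proposed xwikiredirects table, its methods are hypothetical (they are not an existing XWiki API), and `resolve` imitates what a "document does not exist" handler could do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical in-memory stand-in for a redirects table.
class RedirectTable {
    // Old page name -> new page name.
    private final Map<String, String> targets = new HashMap<>();

    void add(String from, String to) {
        targets.put(from, to);
    }

    // "From" direction: where does this name redirect to, if anywhere?
    Optional<String> redirectFrom(String name) {
        return Optional.ofNullable(targets.get(name));
    }

    // "To" direction: which old names redirect directly to this page?
    List<String> redirectsTo(String name) {
        List<String> sources = new ArrayList<>();
        for (Map.Entry<String, String> e : targets.entrySet()) {
            if (e.getValue().equals(name)) {
                sources.add(e.getKey());
            }
        }
        return sources;
    }

    // Follows redirects until an existing page is found; gives up on a
    // missing target or on a redirect loop.
    Optional<String> resolve(String name, Set<String> existingPages) {
        Set<String> seen = new HashSet<>();
        String current = name;
        while (!existingPages.contains(current)) {
            if (!seen.add(current)) {
                return Optional.empty(); // loop detected
            }
            current = targets.get(current);
            if (current == null) {
                return Optional.empty(); // no redirect for this name
            }
        }
        return Optional.of(current);
    }
}
```

A link processor could use `redirectFrom` to detect that a link points at a renamed page and rewrite it, which is the behavior asked for earlier in the thread.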

Zeljko Trogrlic

31 Mar

7:08 p.m.

Sergiu Dumitriu wrote:

> 1. We can add a new table, xwikiredirects, which holds redirect URLs.
> 2. We add document.getRedirects() (maybe both ways, from and to).
> 3. We can change docdoesnotexist.vm to check for redirects.

That would be great. Or you can take a slightly different approach. Use two tables:

* "dictionary"
* "document" (this is the existing table).

In the "dictionary" table, store all possible names for a document. It could be used for:

* renaming
* synonyms.

The same document can have multiple names, e.g. "IP" and "Internet Protocol". Both entries in the dictionary point to the same entry in documents.

If a document is renamed, create a new entry and put its name in the redirect field of the original entry. The original entry becomes deprecated, but it is still working.

Optionally, update all old links.

Optionally, garbage collect old synonyms.

You should probably do this after 1.0.
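
A minimal sketch of the "dictionary" idea, assuming an in-memory map instead of a database table. `NameDictionary`, `NameEntry` and the integer document ids are hypothetical names chosen for the example. Each name points at a document id, and a rename adds a new name while marking the old one as replaced.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory version of the proposed "dictionary" table.
class NameDictionary {
    // One row of the dictionary: a name pointing at a document.
    static final class NameEntry {
        final int documentId;
        String replacedBy; // the "redirect field"; null while the name is current

        NameEntry(int documentId) {
            this.documentId = documentId;
        }
    }

    private final Map<String, NameEntry> entries = new HashMap<>();

    void addName(String name, int documentId) {
        entries.put(name, new NameEntry(documentId));
    }

    // Renaming adds a new name for the same document and deprecates the old
    // one, which keeps working for existing links.
    void rename(String oldName, String newName) {
        NameEntry old = entries.get(oldName);
        if (old == null) {
            throw new IllegalArgumentException("Unknown name: " + oldName);
        }
        entries.put(newName, new NameEntry(old.documentId));
        old.replacedBy = newName;
    }

    // Returns the document id for a name, or null if the name is unknown.
    Integer lookup(String name) {
        NameEntry entry = entries.get(name);
        return entry == null ? null : Integer.valueOf(entry.documentId);
    }

    // Returns the replacement name if this name was renamed, else null.
    String replacedBy(String name) {
        NameEntry entry = entries.get(name);
        return entry == null ? null : entry.replacedBy;
    }
}
```

For example, after `addName("IP", 1)`, `addName("Internet Protocol", 1)` and `rename("IP", "IPv4")`, all three names resolve to document 1, and only "IP" reports a replacement.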

Sergiu Dumitriu

7:26 p.m.

New subject: [xwiki-dev] Re: Right approach for filtering

On 3/31/07, Zeljko Trogrlic wrote:

> Sergiu Dumitriu wrote:
>
>> 1. We can add a new table, xwikiredirects, which holds redirect URLs.
>> 2. We add document.getRedirects() (maybe both ways, from and to).
>> 3. We can change docdoesnotexist.vm to check for redirects.
>
> That would be great. Or you can take a slightly different approach. Use two tables:
>
> * "dictionary"
> * "document" (this is the existing table).
>
> In the "dictionary" table, store all possible names for a document. It could be used for:
>
> * renaming
> * synonyms.
>
> The same document can have multiple names, e.g. "IP" and "Internet Protocol". Both entries in the dictionary point to the same entry in documents.

You mean that pointing to ..../view/Doc/IP will display the Doc/Internet_Protocol document, but without redirecting? (This is what MediaWiki does.) Otherwise, I don't see the difference from the approach above. I did not describe the actual structure of the table, only how it can be used.

> If a document is renamed, create a new entry and put its name in the redirect field of the original entry. The original entry becomes deprecated, but it is still working.

"It is still working" means that the page content is preserved as it is at the moment of renaming?

> Optionally, update all old links.
>
> Optionally, garbage collect old synonyms.

Not automatically, because IP should always be a synonym for Internet Protocol (until we make a disambiguation page). But anyway, the redirects/synonyms table should be editable from the wiki interface.

> You should probably do this after 1.0.

Definitely.

Sergiu
--
http://purl.org/net/sergiu

Zeljko Trogrlic

11:47 p.m.

Sergiu Dumitriu wrote:

> On 3/31/07, Zeljko Trogrlic wrote:
>
>> Sergiu Dumitriu wrote:
>>
>>> 1. We can add a new table, xwikiredirects, which holds redirect URLs.
>>> 2. We add document.getRedirects() (maybe both ways, from and to).
>>> 3. We can change docdoesnotexist.vm to check for redirects.
>>
>> That would be great. Or you can take a slightly different approach. Use two tables:
>>
>> * "dictionary"
>> * "document" (this is the existing table).
>>
>> In the "dictionary" table, store all possible names for a document. It could be used for:
>>
>> * renaming
>> * synonyms.
>>
>> The same document can have multiple names, e.g. "IP" and "Internet Protocol". Both entries in the dictionary point to the same entry in documents.
>
> You mean that pointing to ..../view/Doc/IP will display the Doc/Internet_Protocol document, but without redirecting? (This is what MediaWiki does.) Otherwise, I don't see the difference from the approach above. I did not describe the actual structure of the table, only how it can be used.

I just described what was on my mind; I'm not sure yet whether it is workable from both a technical and a practical point of view.

I think that MediaWiki has a main name plus redirections. "Redirected from" is shown only if you arrive through a redirection, and you can actually go to the redirect page by clicking on the link. This is actually closer to your solution, and I think it is better than my idea of keeping all synonyms equal, because it would be confusing for the user if he writes one thing and gets another. This is why the "redirected from" message is important.

>> If a document is renamed, create a new entry and put its name in the redirect field of the original entry. The original entry becomes deprecated, but it is still working.
>
> "It is still working" means that the page content is preserved as it is at the moment of renaming?

In this approach, documents are independent of their names. "Original entry" means "original synonym". A document stays alive as long as it has at least one synonym attached to it. The point was that if a synonym has been renamed, XWiki should somehow discourage users from using it, but it should still work for old links.

>> Optionally, update all old links.
>>
>> Optionally, garbage collect old synonyms.
>
> Not automatically, because IP should always be a synonym for Internet Protocol (until we make a disambiguation page). But anyway, the redirects/synonyms table should be editable from the wiki interface.

Note that there are two kinds of synonyms:

* "active" synonyms
* renamed synonyms; these should be updated/garbage collected.

Example: you described IPv4 and gave it the synonyms "IP" and "Internet Protocol". Later you noticed that you have to add IPv6 and that you have to rename "IP" to "IPv4". After renaming, you have 3 synonyms:

* IPv4 - active
* IP - inactive, marked as replaced by IPv4, and pointing to the same doc
* Internet Protocol - active.

All three synonyms point to the same document.

Synonyms will allow some interesting features, like automatic linking. We could even provide different linking policies: e.g. if a term is used more than once on the page, mark just the first occurrence to avoid too many links.
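
The "first occurrence only" linking policy mentioned above could look roughly like this. It is a sketch under assumptions: `FirstOccurrenceLinker` is a hypothetical class, links are written as `[term]`, and only single-word terms are handled (a multi-word synonym such as "Internet Protocol" would need a different tokenizer).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical linker that marks only the first occurrence of each term.
class FirstOccurrenceLinker {
    private static final Pattern WORD = Pattern.compile("\\b\\w+\\b");

    static String link(String text, Set<String> terms) {
        Set<String> alreadyLinked = new HashSet<>();
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group();
            // add() returns false when the term was linked earlier on the page.
            boolean linkIt = terms.contains(word) && alreadyLinked.add(word);
            String replacement = linkIt ? "[" + word + "]" : word;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Given the terms "IPv4" and "IP", the text "IPv4 replaced IP; IPv4 and IP coexist" becomes "[IPv4] replaced [IP]; IPv4 and IP coexist". Other policies (link every occurrence, link once per section) would only change the `linkIt` condition.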

Zeljko Trogrlic

27 Mar

1:55 p.m.

Sergiu Dumitriu <sergiu.dumitriu@...> writes:

> The Radeox team decided some time ago that it is perfect, although there are some bugs, filters that should have been written differently, and invalid markup being generated. Because of this, and the problems mentioned above (finding the right ordering in order to avoid processing already generated HTML, lack of a wiki abstract tree), we decided to move to WikiModel as soon as possible. But this "ASAP" is not going to happen soon enough, so any patch you can provide before XWiki

WikiModel looks cool. Do they provide an extension mechanism for macros and stuff?
