On Oct 31, 2009, at 12:26 PM, Eduard Moraru wrote:
> Context left and context right could be an idea. However, what do you
> do about the static size of the context when, for example, you have a
> 500-character document and you make only 2 annotations? That results
> in storing 2x300 = 600 characters in just 2 annotations. That is
> already duplicating the document's content in size. If you make
> additional annotations, you duplicate the document several times.
That's a tradeoff of course...
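Just to make the overhead concrete, here is a rough sketch (made-up class and figures, not the actual annotation code) of an annotation that stores a fixed amount of context around the selection:

public class ContextAnnotation
{
    // Hypothetical figure: 150 characters on each side, i.e. 300 per annotation.
    private static final int CONTEXT_SIZE = 150;

    private final String selection;    // the annotated text itself
    private final String leftContext;  // up to CONTEXT_SIZE characters before the selection
    private final String rightContext; // up to CONTEXT_SIZE characters after the selection

    public ContextAnnotation(String content, int start, int end)
    {
        this.selection = content.substring(start, end);
        this.leftContext = content.substring(Math.max(0, start - CONTEXT_SIZE), start);
        this.rightContext = content.substring(end, Math.min(content.length(), end + CONTEXT_SIZE));
    }

    // On a 500-character document, 2 annotations can store up to
    // 2 x (150 + 150) = 600 characters of context alone, more than the document itself.
    public int storedContextSize()
    {
        return leftContext.length() + rightContext.length();
    }
}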
The underlying problem is that for many XWiki documents, the content
users finally see is generated in some way (think of a blog post, or
the Watch news coming from objects). We need to find a way to map what
the user sees to where it comes from. Now, since XWiki allows you to
display everything using several Turing-complete mechanisms (Groovy
scripts, Velocity, etc.), making this mapping implies either being able
to understand what's coded in a page (not possible), forcing the author
to somehow "mark" the source of the content in their script
(impractical), or giving the user a constrained scripting language
where this information is made explicit (limiting).
The solution is to apply heuristics in order to locate annotations in
the text the user really sees: we called this the "Canonical
Representation", which basically corresponds to the XDOM after the
transformations and before the rendering. This way we don't really care
where the annotated content comes from. As long as it's there and we
are able to recognize and locate it, we can display it as annotated
content. If we are unable to do so, then we simply don't display the
annotation.
Now the problem is: what are reasonable heuristics that work in the
most common cases (80/20 rule)? We proposed one.
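Roughly, the matching could look like this (hypothetical code, just to illustrate the idea, not the proposed implementation): try to find the saved selection together with its context in the canonical text, and give up if it cannot be located unambiguously.

import java.util.Optional;

public final class AnnotationLocator
{
    /** Returns the start offset of the annotation in the canonical text, or empty if it cannot be located. */
    public static Optional<Integer> locate(String canonicalText, String leftContext,
        String selection, String rightContext)
    {
        // Best case: the selection is found together with its full context.
        int index = canonicalText.indexOf(leftContext + selection + rightContext);
        if (index >= 0) {
            return Optional.of(index + leftContext.length());
        }
        // Fall back to the selection alone, but only if it occurs exactly once.
        int first = canonicalText.indexOf(selection);
        if (first >= 0 && canonicalText.indexOf(selection, first + 1) < 0) {
            return Optional.of(first);
        }
        // Unable to locate the annotated content: simply don't display the annotation.
        return Optional.empty();
    }
}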
> The part where annotations appear depending on user rights sounds
> cool, but how can you detect when the dynamic content changes and fix
> your annotations? (like you do for static content)
Again, heuristics.
In the case where you have no generated content (what the user sees is
all contained in a single page), you can rely on a diff from the
previous version of the page to understand what happened (and adjust
the annotations accordingly). This, imho, should work perfectly. In the
case of generated content you might not be able to do a diff (because
you don't know where the content came from, and consequently what
changed), but you can still do some smart things in order to "guess"
what happened to your annotation. And if you are unable to make this
guess, then you display in a box that there are "stale" annotations
that were there before and cannot be placed anymore.
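To illustrate the self-contained case (again a made-up sketch, the real diff machinery is more involved): once you know which portions of the old content were replaced, shifting the stored offset is straightforward, and anything that cannot be shifted safely is reported as stale.

import java.util.List;

public final class AnnotationShifter
{
    /** One change: 'length' characters at 'position' (old version coordinates) replaced by 'newLength' characters. */
    public static class Edit
    {
        public final int position;
        public final int length;
        public final int newLength;

        public Edit(int position, int length, int newLength)
        {
            this.position = position;
            this.length = length;
            this.newLength = newLength;
        }
    }

    /** Returns the adjusted offset, or -1 when the annotated text itself was touched ("stale" annotation). */
    public static int shiftOffset(int annotationOffset, List<Edit> edits)
    {
        int shifted = annotationOffset;
        for (Edit edit : edits) {
            if (edit.position + edit.length <= annotationOffset) {
                // The edit ends before the annotation: move the annotation by the size difference.
                shifted += edit.newLength - edit.length;
            } else if (edit.position <= annotationOffset) {
                // The edit overlaps the start of the annotation: don't guess, report it as stale.
                return -1;
            }
            // Edits entirely after the annotation don't affect its offset.
        }
        return shifted;
    }
}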
> While I'm not convinced about this approach, you may be right and,
> comparing with the existing one (which I did not take the time to
> understand in detail as you had), and other issues which you
> underlined in your reply, it sounds like a start.
It's surely a start. But what was clear during our discussions with
Anca was that offsets are brittle and cumbersome when content comes
from different sources: if you want to use offsets and annotate a blog
post, for example, you should be able to say that the annotation starts
at offset X of the field Y of the object Z on the page P (or any
variant of this for any possible content source). Who gives you this
information if all that you can see in the requested page is a
#include('something')? How could you encode this information in a
standard way? Far too complicated.
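Just to show how heavy that would get, an offset-based target would have to carry something like this (purely illustrative, none of these classes exist), with a new variant for every possible content source:

public class SourceAwareTarget
{
    private String page;        // P: the page that was requested
    private String objectClass; // Z: the object holding the content, if any
    private int objectNumber;
    private String field;       // Y: the field of that object
    private int offset;         // X: the offset inside that field's content
    private int length;         // how much of it is annotated
}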
Since we have a lot of use cases of this type (blog posts, Watch feeds,
and in general data coming from objects and displayed using
general-purpose languages), we should think about another, simpler
solution. The proposed one is not perfect (this is the price to pay for
having such a powerful wiki platform that allows you to do whatever is
computable), but it should work nicely in most cases. As I said before,
it should correctly cover all the cases where documents are
self-contained (i.e., all the use cases where the current annotation
system works).
Returning to your remark at the beginning about the storage... That's
a tradeoff. It's true that the more data we store, the higher the
degree of correctness in the dynamic cases.
Hope this clarifies a little bit what we are trying to achieve.
Anyway, if you have more ideas/comments, don't hesitate.
-Fabio
P.S.: Offsets could be useful in the heuristics too, and we could
continue to store them as well. In fact they could help to locate, more
or less, where the annotation was made. But they should only give a
hint, not precise information.
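For example (a made-up helper, just to illustrate): if the selection appears several times in the canonical text, the old offset can be used to prefer the occurrence closest to where the annotation used to be.

public final class OffsetHint
{
    /** Returns the occurrence of 'selection' nearest to the previously stored offset, or -1 if not found. */
    public static int locateNear(String canonicalText, String selection, int previousOffset)
    {
        int best = -1;
        int bestDistance = Integer.MAX_VALUE;
        for (int i = canonicalText.indexOf(selection); i >= 0; i = canonicalText.indexOf(selection, i + 1)) {
            int distance = Math.abs(i - previousOffset);
            if (distance < bestDistance) {
                best = i;
                bestDistance = distance;
            }
        }
        return best;
    }
}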