[xwiki-dev] Difference Engine Refactoring and Improvements

Vincent Massol vincent at massol.net
Wed May 2 10:34:24 CEST 2007


On May 2, 2007, at 10:13 AM, Ludovic Dubost wrote:

[snip]

>>> Since there is no generic diff for objects, I'd like to write a
>>> Diff plugin that produces a nice diff of any two strings passed to it.
>>
>> Are you talking about Object diffs or String diffs here? Or do you  
>> mean XML diff?
>>
>> I think we have 2 options:
>> - Diff of Objects: Difference getDifferences(Object o1, Object  
>> o2). I guess the diff could then be the difference of object  
>> fields. This would need to be implemented for each XWiki Object.
>> - XML difference. This is the XML representation of XWiki Objects.  
>> I don't think there are any good/simple XML diff frameworks so we  
>> would also need to implement that.
>>
>> I think it's better to do an Object diff as otherwise the xml diff  
>> would need to be transformed to be presented to the user and this  
>> will require extra parsing. Better to operate on Object as we  
>> already have them in our java code.
> I'm talking about Object Diff.. We already have a diff of objects  
> but it was not differentiating the text inside the fields. Now it  
> is doing this..

OK, we're in line then. I thought you were talking about some text
diff because the API below only takes strings.

>>
>>> At the same time I'd like to start a refactoring of the current  
>>> diff in the same plugin.
>>
>>> Currently I see the following APIs:
>>>
>>> DiffPlugin
>>>  // returns a list of org.suigeneris.jrcs.diff.Delta (which
>>> represent differences)
>>>  getLineDiffAsList(String content1, String content2)
>>>  // returns a list of org.suigeneris.jrcs.diff.Delta (which
>>> represent differences)
>>>  getWordDiffAsList(String content1, String content2)
>>>
>>>  // returns an HTML view of differences
>>>  getLineDiffAsList(String content1, String content2)
>>>  // returns an HTML view of differences
>>>  getWordDiffAsList(String content1, String content2)
>>>
>>>  // returns a Text view of differences
>>>  getLineDiffAsList(String content1, String content2)
>>>  // returns a Text view of differences
>>>  getWordDiffAsList(String content1, String content2)
>>>
>>
>> I don't understand. I would have used something like:
>>
>> List<Difference> getDifferences(XWikiDocument, XWikiDocument)
>> List<Difference> getDifferences(XObject, XObject)
>> List<Difference> getDifferences(String, String)
> Ok.. I can look at changing these APIs. However the return of  
> getDifferences(XWikiDocument,XWikiDocument) or getDifferences 
> (XObject, XObject) can be quite complex in terms of Java structure.

Without thinking a lot about it, I would see something like this for  
the Difference object:

- Context information (String). This would be the property for  
example when comparing 2 objects. It could be "page" when comparing  
the content of a page, etc.
- Old value (String).
- New value (String).
- Location: some information where the change appears, possibly also  
the text surrounding the difference, etc

I think we need both StringDifference and ObjectDifference which both  
implement Difference, so getDifferences(XWikiDocument, XWikiDocument)  
would return a list of both, getDifferences(XObject, XObject) would  
return a list of ObjectDifference and getDifferences(String, String)  
a list of StringDifference.
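
A minimal Java sketch of the Difference model described above. All type
and method names below are hypothetical (nothing here exists in XWiki as
written), and the sample values and location strings are made up purely
for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: the four pieces of information listed above.
interface Difference {
    String getContext();   // e.g. a property name, or "page" for page content
    String getOldValue();
    String getNewValue();
    String getLocation();  // where the change appears
}

// Shared storage for both kinds of difference (assumption: all fields are Strings).
abstract class AbstractDifference implements Difference {
    private final String context;
    private final String oldValue;
    private final String newValue;
    private final String location;

    AbstractDifference(String context, String oldValue, String newValue, String location) {
        this.context = context;
        this.oldValue = oldValue;
        this.newValue = newValue;
        this.location = location;
    }

    public String getContext() { return context; }
    public String getOldValue() { return oldValue; }
    public String getNewValue() { return newValue; }
    public String getLocation() { return location; }
}

// A change found while comparing two strings (for example page content).
final class StringDifference extends AbstractDifference {
    StringDifference(String context, String oldValue, String newValue, String location) {
        super(context, oldValue, newValue, location);
    }
}

// A change found while comparing a property of two objects.
final class ObjectDifference extends AbstractDifference {
    ObjectDifference(String context, String oldValue, String newValue, String location) {
        super(context, oldValue, newValue, location);
    }
}

class DifferenceDemo {
    // Stands in for getDifferences(XWikiDocument, XWikiDocument): a mixed list.
    static List<Difference> sampleDocumentDifferences() {
        List<Difference> result = new ArrayList<Difference>();
        result.add(new StringDifference("page", "Hello", "Hello world", "line 1"));
        result.add(new ObjectDifference("title", "Draft", "Final", "object #0"));
        return result;
    }

    public static void main(String[] args) {
        for (Difference d : sampleDocumentDifferences()) {
            System.out.println(d.getContext() + ": " + d.getOldValue()
                + " -> " + d.getNewValue() + " (" + d.getLocation() + ")");
        }
    }
}
```

Because both classes implement the same interface, a presentation layer
can iterate over a mixed list without knowing which kind of difference it
is looking at.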

> Currently we have similar functions in XWikiDocument  
> (getObjectDiff, getMetaDataDiff, getContentDiff). I'm not  
> completely sure we should move them to the DiffPlugin.
>
> I have a first prototype of the DiffPlugin (with only strings API)  
> and with that I was able to do a complete diff page for a document  
> (including Objects and MetaData)
>
> Check http://jira.xwiki.org/jira/browse/XWIKI-1162 for the  
> DiffPlugin patch..

I will... I'm trying to focus on the WYSIWYG editor today and the RC4  
release but I'll try to find some time.

>>
>> Also, I think it's critical that in an API we should only use our  
>> own classes/interfaces and no external ones, so I think we should  
>> have our own Difference class and not use JRCS'. It could possibly  
>> wrap it if necessary.
>>
> The wrapping would require quite a lot of cloning work. The result  
> of the JRCS engine is a Delta object which contains a list of  
> Chunks which contains a list of Strings. These objects are quite  
> plain.
> I would not see what to do except completely cloning them and write  
> a copy function from JRCS to XWiki.

What about the Difference object as summarily described above? If  
internally we have JRCS objects, we could always construct our own  
Difference object, no? I think it's much safer from an API
point of view, as I'm not convinced we'll stay with JRCS.
Also (and even more importantly) the interface is meant so that  
someone else can implement a different diff algorithm. I'm not sure  
it would be good to force them to use JRCS objects, especially as we  
don't control JRCS.
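
To illustrate the pluggable-algorithm point, here is a rough sketch in
which callers only ever see our own types. Everything in it is an
assumption: DiffEngine, NaiveLineDiffEngine and this trimmed-down
Difference class are hypothetical names, and the naive positional line
comparison is only a stand-in, not JRCS and not a real diff algorithm. A
JRCS-backed engine would implement the same interface and convert its
results internally:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value class owned by us; no external type leaks through it.
final class Difference {
    private final String context;
    private final String oldValue;
    private final String newValue;
    private final String location;

    Difference(String context, String oldValue, String newValue, String location) {
        this.context = context;
        this.oldValue = oldValue;
        this.newValue = newValue;
        this.location = location;
    }

    String getContext() { return context; }
    String getOldValue() { return oldValue; }
    String getNewValue() { return newValue; }
    String getLocation() { return location; }
}

// Hypothetical extension point: any diff algorithm can sit behind it.
interface DiffEngine {
    List<Difference> getDifferences(String oldText, String newText);
}

// Stand-in engine: compares lines at the same position only.
// It does not detect insertions or moves the way a real diff would.
final class NaiveLineDiffEngine implements DiffEngine {
    public List<Difference> getDifferences(String oldText, String newText) {
        String[] oldLines = oldText.split("\n", -1);
        String[] newLines = newText.split("\n", -1);
        List<Difference> result = new ArrayList<Difference>();
        int max = Math.max(oldLines.length, newLines.length);
        for (int i = 0; i < max; i++) {
            // A line missing on one side is treated as an empty string.
            String oldLine = i < oldLines.length ? oldLines[i] : "";
            String newLine = i < newLines.length ? newLines[i] : "";
            if (!oldLine.equals(newLine)) {
                result.add(new Difference("line", oldLine, newLine, "line " + (i + 1)));
            }
        }
        return result;
    }
}

class DiffEngineDemo {
    public static void main(String[] args) {
        DiffEngine engine = new NaiveLineDiffEngine();
        for (Difference d : engine.getDifferences("a\nb\nc", "a\nB\nc\nd")) {
            System.out.println(d.getLocation() + ": " + d.getOldValue()
                + " -> " + d.getNewValue());
        }
    }
}
```

Swapping the algorithm then means providing another DiffEngine
implementation; code that consumes the list of Difference objects does
not change.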

>>> Other APIs could be a function to get a complete diff of an  
>>> XWikiDocument (includes objects, attachments), however the
>>> implementation itself should probably reside in a Velocity
>>> template.
>>
>> The implementation of a document difference should be in the plugin I
>> think. The plugin should only do backend stuff though and return a
>> list of named differences. I agree it would be up to the vm files
>> to do the presentation of it (be it for the wiki, for an email to
>> be sent, etc).
>>
> Ok.. That will deprecate a few functions in XWikiDocument. So the  
> plugin functions should set a few objects in the context
> representing the diff. The template would then present these  
> different diffs in a nice way.

I haven't looked at the current API. I guess we could keep a  
XWikiDocument getDifferences(XWikiDocument) api in XWikiDocument. It  
would use the Diff plugin.

>>> There is an interesting discussion to have about how the  
>>> representation of the Text and HTML views should be.
>>
>> Yes, that's hard. I'd like to see a wiki markup diff in addition  
>> to the current HTML diff we have as I find our current diff not  
>> very good. I think we need both.
>>
> I don't think you understood exactly what I mean by Text and HTML  
> view. They are both Wiki markup diff but rendered in Text or HTML.  
> An HTML diff is more complex as there is a risk of failing to
> generate valid markup.
> I think we should stick to Wiki Markup diff.

We're talking about the same thing. I also prefer a wiki markup diff.  
I was under the impression that the current diff representation was a  
rendered version of the textual diff.

-Vincent

>>> Any ideas ?
>>>
>>> Another question is whether it is a good idea to put this as a
>>> plugin. I think yes since it could be used for other things than
>>> the wiki content.
>>
>> A plugin would be good I think as it means the implementation  
>> becomes pluggable. In the future it would be transformed into a  
>> component but that's the same idea.
>>
>> Thanks
>> -Vincent




