At the moment, we're using MyersDiff from jrcs to compute the difference between two revisions for storing a diff. However, this algorithm needs time and space O(ND) where N is the sum of the lengths of the compared document revisions and D is the length of a minimum edit script between them. Therefore, in the worst case, storing a revision can be quadratic in the size of the document. See also XWIKI-21224 for a similar issue with slow diff algorithms. When storing a revision, there is no need to store a human-readable diff. Therefore, we should instead use an algorithm that needs linear time and space like XDelta for storing the difference between two revisions. Unfortunately, I couldn't find a maintained Java implementation. Bentley-McIlroy as implemented by Google's Open VCDiff seems an alternative that has a Java port that doesn't look completely abandoned (an update a year ago). See also https://ably.com/blog/practical-guide-to-diff-algorithms for a discussion of different diff algorithms. An alternative could be to limit the time spent on computing the diff and if it takes too long, just store a full revision. |