[xwiki-dev] [Proposal] Document history storage

Ludovic Dubost ludovic at xwiki.com
Tue Jul 24 23:30:06 CEST 2007


Artem Melentyev wrote:
> Hi.
>
> I have implemented part of this proposal in XWIKI-1459.
>
> I would like to discuss some problems with it.
>
> 1) Separate diffs.
>  Sergiu proposes storing the document archive in separate fields (content, 
> metadata, objects, attachments) instead of one field.
>  But this is incompatible with the old document history system (if we only 
> have one diff for everything, it is impossible to tell which field changed).
>  If we implement this, we will either lose the old document history or 
> need to write a complicated migrator from the old history to the new one.
>
>  Do we need to keep compatibility with the 1.0 document history in xwiki-1.1?
>
>  I think separate diffs will bring more complexity than benefit and are not 
> needed, at least in xwiki-platform-1.1.
>
> WDYT?
>
I think we don't need history to be compatible but we need a migration 
path (a script to migrate the previous history).
I'm more and more thinking we should get rid of RCS as the versioning 
system. In the P2P XWiki Project we have been talking about implementing 
an "XWiki Patch" notion because we need to send it over the P2P 
network for replication. This "XWiki Patch" could be the new minimal set 
of information we need for a version.

Now I think we also need a table of versions that holds some key metadata 
directly (not as a diff) so that we can display it quickly on the 
history page. We could decide to store either the patch (less 
space) or the full XML version in this table (more space but very safe 
and faster).
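
As a rough illustration of such a table of versions, here is a minimal
sketch using Python's sqlite3 module. The table name, column names and
sample rows are all hypothetical and are not the actual XWiki schema. The
point is only that the history page can read the metadata columns without
touching the stored patch or XML.

```python
import sqlite3

# Hypothetical schema: one row per version, metadata in plain columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE doc_version (
        doc_name TEXT NOT NULL,
        version  TEXT NOT NULL,
        author   TEXT NOT NULL,
        saved_at TEXT NOT NULL,   -- ISO 8601 timestamp, sorts as text
        comment  TEXT,
        payload  TEXT NOT NULL,   -- either the patch or the full XML
        PRIMARY KEY (doc_name, version)
    )
    """
)

# Made-up sample data for the sketch.
rows = [
    ("Main.WebHome", "1.1", "XWiki.Admin", "2007-07-01T10:00:00",
     "created", "<xwikidoc>...</xwikidoc>"),
    ("Main.WebHome", "1.2", "XWiki.LudovicDubost", "2007-07-02T09:30:00",
     "fixed typo", "ins(content,6,'Hello')"),
]
conn.executemany("INSERT INTO doc_version VALUES (?, ?, ?, ?, ?, ?)", rows)


def history_page(connection, doc_name):
    """Return (version, author, saved_at, comment), newest first.

    The payload column is deliberately not selected, so listing the
    history never loads a patch or a full document.
    """
    cursor = connection.execute(
        "SELECT version, author, saved_at, comment FROM doc_version "
        "WHERE doc_name = ? ORDER BY saved_at DESC",
        (doc_name,),
    )
    return cursor.fetchall()


history = history_page(conn, "Main.WebHome")
```

Whether the payload column holds a patch or the full XML does not change
the history query, which is one reason to keep the metadata outside of it.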

>
> 2) Fetching strategy.
>
> Currently I load all version info at once and the version contents (diffs) 
> one by one on demand (fetching strategy #2).
>
> I see the following possible fetching strategies for history storage:
>
> 1. Load all content at once
>  This is as bad as the old history storage.
We already have a lazy fetching strategy, except that when we need a 
specific version we have to load the full RCS file to be able to 
retrieve it.
>
>
> 2. Load each content on demand and cache it (RCSNodeInfo contains a 
> soft reference to RCSNodeContent)
>  (code: for each needed version do getContent(context) )
>  - Many SQL requests the first time.
>
> 3. Load the list of needed contents per request
>  (hql: from NodeContent where version>=1.2)
>  One SQL request per HTTP request, but every time.
>
> 4. Cache the list of latest nodes (from some node to the latest node). Make 
> only the needed requests and recache.
>  (cache = softref to SortedMap<version, RCSNodeContent>,
>  if not found in the cache, fetch by hql (where version>=1.2 and 
> version<=2.3) )
>  I think it is the best fetching strategy in terms of performance.
>
> 5. Something else?
>
> Which fetching strategy is best for history storage?
>
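Fetching strategy 4 above can be sketched roughly as follows. This is
only an illustration in Python under simplifying assumptions: versions are
plain integers, fetch_range is a hypothetical stand-in for the HQL range
query, and the soft-reference eviction is left out. The class and method
names are invented for the sketch.

```python
from typing import Callable, Dict, List, Optional


class LatestNodesCache:
    """Caches node contents from some oldest cached version up to the latest."""

    def __init__(self, fetch_range: Callable[[int, int], Dict[int, str]],
                 latest: int) -> None:
        self._fetch_range = fetch_range      # hypothetical: runs the range query
        self._latest = latest
        self._nodes: Dict[int, str] = {}
        self._oldest: Optional[int] = None   # nothing cached yet

    def get_from(self, version: int) -> List[str]:
        """Return contents for version..latest, fetching only what is missing."""
        if self._oldest is None:
            # First use: one query for the whole requested range.
            self._nodes.update(self._fetch_range(version, self._latest))
            self._oldest = version
        elif version < self._oldest:
            # Only the gap below the cached range is requested.
            self._nodes.update(self._fetch_range(version, self._oldest - 1))
            self._oldest = version
        return [self._nodes[v] for v in range(version, self._latest + 1)]


# Usage with an in-memory stand-in for the database.
store = {v: "diff %d" % v for v in range(1, 11)}
queries = []


def fetch_range(low: int, high: int) -> Dict[int, str]:
    queries.append((low, high))
    return {v: store[v] for v in range(low, high + 1)}


cache = LatestNodesCache(fetch_range, latest=10)
first = cache.get_from(8)    # one query for 8..10
second = cache.get_from(9)   # served from the cache, no query
third = cache.get_from(5)    # one query for the gap 5..7
```
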
We could decide to store the full document every 10 versions and store 
only the patch (RCS or new XWiki Patch) for each intermediary version.
This would mean that to retrieve any version you need one full version 
plus up to 10 nodes.
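
A minimal sketch of this "full version every 10 versions" idea, in
Python. It makes several assumptions that are not part of the proposal:
versions are integers starting at 1, a patch is a list of ("ins", pos,
text) or ("del", pos, count) tuples acting on the content only, and the
History class is invented for the illustration.

```python
from typing import Dict, List, Tuple

SNAPSHOT_INTERVAL = 10  # store a full copy every 10 versions

Patch = List[Tuple]  # ("ins", pos, text) or ("del", pos, count)


def apply_patch(content: str, patch: Patch) -> str:
    """Apply insert/delete operations to the content, in order."""
    for op in patch:
        if op[0] == "ins":
            _, pos, text = op
            content = content[:pos] + text + content[pos:]
        elif op[0] == "del":
            _, pos, count = op
            content = content[:pos] + content[pos + count:]
        else:
            raise ValueError("unknown operation: %r" % (op[0],))
    return content


class History:
    """Keeps full snapshots at versions 1, 11, 21, ... and patches in between."""

    def __init__(self, initial: str) -> None:
        self.snapshots: Dict[int, str] = {1: initial}
        self.patches: Dict[int, Patch] = {}
        self.latest = 1
        self._head = initial

    def commit(self, patch: Patch) -> int:
        self.latest += 1
        self._head = apply_patch(self._head, patch)
        self.patches[self.latest] = patch
        if (self.latest - 1) % SNAPSHOT_INTERVAL == 0:
            self.snapshots[self.latest] = self._head
        return self.latest

    def get(self, version: int) -> str:
        if not 1 <= version <= self.latest:
            raise KeyError(version)
        # Nearest snapshot at or before the requested version.
        base = version - (version - 1) % SNAPSHOT_INTERVAL
        content = self.snapshots[base]
        # Replay only the patches recorded after that snapshot.
        for v in range(base + 1, version + 1):
            content = apply_patch(content, self.patches[v])
        return content


history = History("")
for i in range(1, 25):
    # Each commit appends one digit to the content.
    history.commit([("ins", i - 1, str(i % 10))])
```

With this layout, retrieving a version never needs to read more than one
snapshot and the few patches that follow it, whatever the total length of
the history.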

It would be great to work on the new "XWiki Patch" system since it is 
needed for the P2P. What we discussed at the meeting was a language like:

ins(content,6,'Hello')   =  insert the text 'Hello' in field 'content' at char 6
del(content,6,5)   =  delete 5 chars from field 'content' starting at char 6
set(author,'XWiki.LudovicDubost')   =  set the author field to XWiki.LudovicDubost
setObjectProperty('XWiki.ArticleClass',0,'propname','propvalue')
insObjectProperty('XWiki.ArticleClass',0,'propname',6,'propvalue')

etc...
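
To make the semantics of these operations concrete, here is a minimal
sketch in Python of applying ins, del and set to a document held as a
dictionary of fields. The tuple encoding, the apply_patch function and the
sample document are assumptions made for the illustration, and character
positions are taken to be 0-based offsets, which the message does not
specify. The object property operations are left out.

```python
from typing import Dict, List, Tuple

Document = Dict[str, str]
# ("ins", field, pos, text) | ("del", field, pos, count) | ("set", field, value)
Op = Tuple


def apply_patch(doc: Document, patch: List[Op]) -> Document:
    """Return a new document with every operation applied in order."""
    result = dict(doc)
    for op in patch:
        kind, field = op[0], op[1]
        if kind == "ins":
            _, _, pos, text = op
            value = result.get(field, "")
            result[field] = value[:pos] + text + value[pos:]
        elif kind == "del":
            _, _, pos, count = op
            value = result.get(field, "")
            result[field] = value[:pos] + value[pos + count:]
        elif kind == "set":
            result[field] = op[2]
        else:
            raise ValueError("unknown operation: %r" % (kind,))
    return result


doc = {"content": "Hello world", "author": "XWiki.Admin"}
patch = [
    ("del", "content", 6, 5),                  # remove "world"
    ("ins", "content", 6, "XWiki"),            # insert at offset 6
    ("set", "author", "XWiki.LudovicDubost"),  # replace the whole field
]
new_doc = apply_patch(doc, patch)
```

Because the input document is copied before the operations run, the
previous version stays untouched, which is convenient when the same patch
also has to be sent to other nodes.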

Ludovic


> Any comments about XWIKI-1459 are also welcome.
>
> Sergiu Dumitriu wrote:
>> Hi,
>>
>> Some time ago, there was a discussion regarding how the document
>> history should be stored in a better way.
>>
>> Right now, the complete history is stored as one field in the xwikidoc
>> table. From my PoV, this has some major disadvantages:
>> - loading an older version requires parsing all the history -> memory
>> inefficiency
>> - as the documents grow older, loading a document will take a lot of
>> time -> time inefficiency
>> - queries on archives cannot return just one version, but they match the
>> whole document (somewhere in the history, there was a version containing
>> "search term")
>>
>> The blocking issue with storing old version in a different table was, at
>> that time, the fact that a document archive should contain all information
>> needed for completely restoring the document, like content, metadata,
>> objects.
>>
>> I don't think that is actually an issue. We are archiving document versions,
>> but we're joining all versions in one large string. Why don't we archive the
>> complete version, but one version per row?
>>
>> So, the archive table should look like:
>> - document name
>> - version number
>> - language (for translations)
>> - content
>> - archived metadata (one field, or the same fields as in xwikidoc)
>> - archived objects (one field)
>> - attachment names and versions
>> This is not the same as storing the version the way a normal document is
>> stored, with separate objects and properties, but at least it provides a
>> better storage and retrieval mechanism, and it separates a bit the parts
>> of a wiki document: content, metadata, objects.
>


-- 
Ludovic Dubost
Blog: http://www.ludovic.org/blog/
XWiki: http://www.xwiki.com
Skype: ldubost GTalk: ldubost 
AIM: nvludo Yahoo: ludovic
