[xwiki-dev] [Proposal] Document history storage

Ludovic Dubost ludovic at xwiki.com
Thu Mar 1 22:38:44 CET 2007


Hi,

I think it's a good idea to have a versions table. One thing I'm not 
sure of is whether this table should hold the master information or just 
act as a cache for the information stored in the revisions. If it is a 
cache, it need not hold all the info, just the most important fields.
What worries me is the volume of data when there are many 
changes. Suppose we get comment spam of 500 comments. The JRCS 
revision system will only add the actual spam. If you keep the 
archived versions in the table, you get roughly 500 times the size of 
the document. And how would you export the whole document including its 
archives? Would you use RCS, or would you put the whole history inside 
an XML field in the exported document?
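
To put rough numbers on this, here is a small sketch of the two storage
models. The sizes (a 10,000-byte document, 200-byte comments) are made-up
figures for illustration, and the delta model is idealised: it assumes each
delta costs exactly the size of the comment it adds, which is not how JRCS
measures anything.

```python
def snapshot_storage(base_size: int, comment_size: int, n_comments: int) -> int:
    """Total bytes if every version is stored as a full copy of the document."""
    # Version k (1-based) holds the base document plus k comments.
    return sum(base_size + k * comment_size for k in range(1, n_comments + 1))


def delta_storage(base_size: int, comment_size: int, n_comments: int) -> int:
    """Total bytes if one full text plus one delta per change is stored."""
    # Idealised assumption: a delta is exactly as large as the added comment.
    return base_size + n_comments * comment_size


# Hypothetical figures: 10,000-byte document, 500 spam comments of 200 bytes.
full = snapshot_storage(10_000, 200, 500)
delta = delta_storage(10_000, 200, 500)
print(full, delta, full // delta)
```

With these assumed figures the one-row-per-version model stores about 30 MB
against about 110 KB for the delta model, so the gap is even larger than
500 times the original document, because the document itself keeps growing.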

One downside of RCS is that you need to parse the whole RCS document to 
get a single version. But we could solve this by cutting the RCS file 
into chunks of 50 versions so that we get faster retrieval. It's true 
that this is a little painful to code.
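
A minimal sketch of the chunking idea, to show which part is easy. It
assumes versions can be given a sequential 1-based index, and it uses plain
lists of strings as a stand-in for RCS fragments. `ChunkedArchive` and
`chunk_for` are hypothetical names, not existing XWiki or JRCS classes. The
painful part, that each chunk would need its own full-text base for its
deltas, is not modelled here.

```python
CHUNK_SIZE = 50  # versions per chunk, the figure suggested above


def chunk_for(version_index: int, chunk_size: int = CHUNK_SIZE) -> tuple[int, int]:
    """Map a 1-based sequential version index to (chunk number, offset in chunk)."""
    if version_index < 1:
        raise ValueError("version_index must be >= 1")
    return divmod(version_index - 1, chunk_size)


class ChunkedArchive:
    """Hypothetical stand-in for a document archive split into fixed-size chunks."""

    def __init__(self, chunk_size: int = CHUNK_SIZE) -> None:
        self.chunk_size = chunk_size
        self.chunks = []  # list of chunks, each a list of version texts

    def append(self, text: str) -> int:
        """Store a new version and return its 1-based index."""
        # Open a new chunk when there is none yet or the last one is full.
        if not self.chunks or len(self.chunks[-1]) == self.chunk_size:
            self.chunks.append([])
        self.chunks[-1].append(text)
        return (len(self.chunks) - 1) * self.chunk_size + len(self.chunks[-1])

    def get(self, version_index: int) -> str:
        """Fetch one version by touching only the chunk that contains it."""
        chunk_no, offset = chunk_for(version_index, self.chunk_size)
        return self.chunks[chunk_no][offset]


archive = ChunkedArchive()
for i in range(1, 121):
    archive.append(f"content of version {i}")
```

With 120 versions this gives three chunks, and reading version 51 only
needs the second one.
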
The cache table with the most important metadata (version, date, author, 
comment) would give us what we need for getting information about 
contributors and number of contributions, and for retrieving comments 
at edit time.
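
As a rough sketch of what such a cache table could look like, here is an
in-memory SQLite example. The table name `doc_version_cache`, the column
names, and the sample rows are all assumptions for illustration. This is not
the actual XWiki schema or its Hibernate mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE doc_version_cache (   -- hypothetical table name
        doc_name TEXT NOT NULL,
        version  TEXT NOT NULL,
        date     TEXT NOT NULL,        -- ISO 8601 timestamp
        author   TEXT NOT NULL,
        comment  TEXT,
        PRIMARY KEY (doc_name, version)
    )
    """
)

# Made-up sample rows; the full content stays in the RCS archive.
rows = [
    ("Main.WebHome", "1.1", "2007-03-01T10:00:00", "alice", "created page"),
    ("Main.WebHome", "1.2", "2007-03-01T11:30:00", "bob", "fixed typo"),
    ("Main.WebHome", "1.3", "2007-03-01T12:15:00", "alice", "added section"),
]
conn.executemany("INSERT INTO doc_version_cache VALUES (?, ?, ?, ?, ?)", rows)


def contributions(connection, doc_name):
    """Return (author, number of versions) pairs for one document."""
    cur = connection.execute(
        "SELECT author, COUNT(*) FROM doc_version_cache "
        "WHERE doc_name = ? GROUP BY author ORDER BY COUNT(*) DESC, author",
        (doc_name,),
    )
    return cur.fetchall()


def comment_for(connection, doc_name, version):
    """Return the edit comment of one version, or None if it is not cached."""
    cur = connection.execute(
        "SELECT comment FROM doc_version_cache WHERE doc_name = ? AND version = ?",
        (doc_name, version),
    )
    row = cur.fetchone()
    return row[0] if row else None
```

Both lookups are answered from the cache alone, without parsing any
archive.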

Ludovic

Sergiu Dumitriu wrote:
> Hi,
>
> Some time ago, there was a discussion about how the document history 
> could be stored in a better way.
>
> Right now, the complete history is stored as one field in the xwikidoc 
> table. From my PoV, this has some major disadvantages:
> - loading an older version requires parsing all the history -> memory 
> inefficiency
> - as the documents grow older, loading a document will take a lot of 
> time -> time inefficiency
> - queries on archives cannot return just one version, but they match 
> the whole document (somewhere in the history, there was a version 
> containing "search term")
>
> The blocking issue with storing old versions in a different table was, 
> at that time, the fact that a document archive should contain all the 
> information needed for completely restoring the document, such as 
> content, metadata, and objects.
>
> I don't think that is actually an issue. We are archiving document 
> versions, but we're joining all versions in one large string. Why 
> don't we archive the complete version, but one version per row?
>
> So, the archive table should look like:
> - document name
> - version number
> - language (for translations)
> - content
> - archived metadata (one field, or the same fields as in xwikidoc)
> - archived objects (one field)
> - attachment names and versions
> This is not the same as storing each version the way a normal document 
> is stored, with separate objects and properties, but at least it 
> provides a better storage and retrieval mechanism, and it separates a 
> bit the parts of a wiki document: content, metadata, objects.
>
> WDYT?
>
> -- 
> http://purl.org/net/sergiu


-- 
Ludovic Dubost
Blog: http://www.ludovic.org/blog/
XWiki: http://www.xwiki.com
Skype: ldubost GTalk: ldubost 
AIM: nvludo Yahoo: ludovic




