[xwiki-dev] [Proposal] Document history storage

Sergiu Dumitriu sergiu.dumitriu at gmail.com
Fri Mar 2 09:13:24 CET 2007


To clarify one misunderstanding: the attachments themselves are not stored
in the archive, just each attachment's name and version number. AFAIK, the
attachment history is stored separately.
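
A rough sketch of what one archived version per row could look like, using
the fields from the proposal quoted below. The table name, column names and
the JSON encoding of the attachment list are assumptions made for
illustration, not the actual XWiki schema; SQLite is used only so the
example runs on its own.

```python
import json
import sqlite3

# Hypothetical table and column names; the proposal only lists the fields.
SCHEMA = """
CREATE TABLE doc_archive (
    doc_name    TEXT NOT NULL,
    version     TEXT NOT NULL,
    language    TEXT NOT NULL DEFAULT '',
    content     TEXT,
    metadata    TEXT,   -- archived metadata, serialized in one field
    objects     TEXT,   -- archived objects, serialized in one field
    attachments TEXT,   -- JSON list of [name, version] pairs, no file data
    PRIMARY KEY (doc_name, language, version)
)
"""


def archive_version(conn, doc_name, version, language, content,
                    metadata, objects, attachments):
    """Store one complete version as a single row."""
    conn.execute(
        "INSERT INTO doc_archive VALUES (?, ?, ?, ?, ?, ?, ?)",
        (doc_name, version, language, content, metadata, objects,
         json.dumps(attachments)),
    )


def load_version(conn, doc_name, version, language=""):
    """Fetch exactly one version without touching the rest of the history."""
    row = conn.execute(
        "SELECT content, metadata, objects, attachments FROM doc_archive "
        "WHERE doc_name = ? AND language = ? AND version = ?",
        (doc_name, language, version),
    ).fetchone()
    if row is None:
        return None
    content, metadata, objects, attachments = row
    return {
        "content": content,
        "metadata": metadata,
        "objects": objects,
        # Only names and versions are kept; the files live elsewhere.
        "attachments": [tuple(pair) for pair in json.loads(attachments)],
    }


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
archive_version(conn, "Main.WebHome", "1.1", "", "Hello",
                "<meta/>", "<objects/>", [("logo.png", "1.1")])
archive_version(conn, "Main.WebHome", "1.2", "", "Hello again",
                "<meta/>", "<objects/>", [("logo.png", "1.2")])
old = load_version(conn, "Main.WebHome", "1.1")
```

The point of the sketch is that loading version 1.1 is a single keyed
lookup, and that the attachments column carries only name/version pairs.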

I know that it is not very efficient to store the complete document, with
content and objects, even for a small change such as an added comment.
But this is how it is done now too, and this change tries to do better,
not to be perfect.
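
To put a rough number on the storage cost for the comment spam case quoted
below, here is a back-of-the-envelope sketch. The document and comment
sizes are made-up assumptions, the function names are hypothetical, and the
delta figure ignores any per-revision overhead a real RCS archive would add.

```python
def full_copy_bytes(base_size, change_size, n_changes):
    """Total bytes when every version stores the complete document."""
    # Version k (1-based) holds the base text plus k accumulated changes.
    return sum(base_size + k * change_size for k in range(1, n_changes + 1))


def delta_bytes(base_size, change_size, n_changes):
    """Total bytes when one full text plus each individual change is stored."""
    return base_size + n_changes * change_size


# Assumed sizes, for illustration only: a 10 kB document, 200-byte comments.
BASE, COMMENT, N = 10_000, 200, 500
full = full_copy_bytes(BASE, COMMENT, N)
delta = delta_bytes(BASE, COMMENT, N)
ratio = full / delta
```

Under these assumptions the full-copy total is a few hundred times larger
than the delta total, so the concern is real; the trade-off is that each
version can then be read back without replaying the history.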


On 3/1/07, Ludovic Dubost <ludovic at xwiki.com> wrote:
>
>
> Hi,
>
> I think it's a good idea to have a versions table. One thing I'm not
> sure of is whether this table should hold the master information or just
> a cache for the information stored in the revision. If it is a cache, it
> need not hold all the info, just the most important parts.
> What I'm worried about is the volume of information when there are many
> changes. Suppose we get a comment spam of 500 comments. The JRCS
> revision system will only add the actual spam. If you keep the
> archived info in the table system, you get 500 times the size of the
> document. And how will you export the whole document, including archives?
> Would you use RCS, or would you have the whole history inside an XML
> field inside the document?
>
> One downside of RCS is that you need to parse the whole RCS document to
> get the version. But we could solve this by cutting the RCS file in
> chunks of 50 versions so that we get faster retrieval. It's true that
> this is a little painful to code.
> The cache table with the most important metadata (version, date, author,
> comment) would give us what we need for getting information about
> contributors and number of contributions, and for retrieving comments at
> edit time.
>
> Ludovic
>
> Sergiu Dumitriu wrote:
> > Hi,
> >
> > Some time ago, there was a discussion about how the document
> > history could be stored in a better way.
> >
> > Right now, the complete history is stored as one field in the xwikidoc
> > table. From my PoV, this has some major disadvantages:
> > - loading an older version requires parsing all the history -> memory
> > inefficiency
> > - as the documents grow older, loading a document will take a lot of
> > time -> time inefficiency
> > - queries on archives cannot return just one version, but they match
> > the whole document (somewhere in the history, there was a version
> > containing "search term")
> >
> > The blocking issue with storing old versions in a different table was,
> > at that time, the fact that a document archive should contain all
> > information needed for completely restoring the document, like
> > content, metadata, objects.
> >
> > I don't think that is actually an issue. We are archiving document
> > versions, but we're joining all versions in one large string. Why
> > don't we archive the complete version, but one version per row?
> >
> > So, the archive table should look like:
> > - document name
> > - version number
> > - language (for translations)
> > - content
> > - archived metadata (one field, or the same fields as in xwikidoc)
> > - archived objects (one field)
> > - attachment names and versions
> > This is not the same as storing the version the way a normal document
> > is stored, with separate objects and properties, but at least it
> > provides a better storage and retrieval mechanism, and it separates
> > the parts of a wiki document a bit: content, metadata, objects.
> >
> > WDYT?
> >
>

-- 
http://purl.org/net/sergiu