Re: [xwiki-dev] [Proposal] Document history storage

23 Jul 2007

Hi.
I have implemented part of this proposal in XWIKI-1459,
and I would like to discuss some problems with it.
1) Separate diffs.
  Sergiu proposes storing the document archive in separate fields (content,
metadata, objects, attachments) instead of one field.
  But this is incompatible with the old document history system: if we only
have one diff covering everything, it is impossible to tell which field changed.
  If we implement this, we will either lose the old document history or
need to write a complicated migrator from the old history to the new one.
  Do we need to keep compatibility with the 1.0 document history in xwiki-1.1?
  I think separate diffs would bring more complexity than benefit and are not
needed, at least in xwiki-platform-1.1.
WDYT?
2) Fetching strategy.
Currently I load all version infos at once and version contents (diffs) one by
one on demand (fetching strategy #2).
I see the following possible fetching strategies for history storage:
1. Load all content at once.
  This is as bad as the old history storage.
2. Load each content on demand and cache it (RCSNodeInfo contains a
soft reference to RCSNodeContent)
  (code: foreach needed versions do getContent(context) )
  - Many SQL requests the first time.
3. Load the list of needed contents per request
  (hql: from NodeContent where version>=1.2)
  One SQL request per HTTP request, but on every request.
4. Cache the list of latest nodes (from some node to the latest node). Make only
the needed requests and re-cache.
  (cache = softref to SortedMap<version, RCSNodeContent>;
  if not found in the cache, fetch by hql (where version>=1.2 and
version<=2.3) )
  I think this is the best fetching strategy in terms of performance.
5. Something else?
Which fetching strategy is best for history storage?
Any comments about XWIKI-1459 are also welcome.
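
Fetching strategy 4 could be sketched roughly as follows. This is only an
illustration under assumptions: the class NodeContentCache, the RangeLoader
callback, and the use of plain int version numbers and String contents are
hypothetical simplifications, not the code in XWIKI-1459. In the real storage
the loader would presumably run the HQL range query and return RCSNodeContent
objects.

```java
import java.lang.ref.SoftReference;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch of fetching strategy 4: a softly referenced cache of the
 * newest node contents, extended towards older versions only when needed.
 */
final class NodeContentCache {

    /** Assumed callback; a real one would run the HQL range query. */
    interface RangeLoader {
        SortedMap<Integer, String> load(int fromInclusive, int toInclusive);
    }

    private final int latestVersion;
    private final RangeLoader loader;
    // The GC may clear this under memory pressure; the cache is then rebuilt.
    private SoftReference<SortedMap<Integer, String>> cacheRef = new SoftReference<>(null);

    NodeContentCache(int latestVersion, RangeLoader loader) {
        this.latestVersion = latestVersion;
        this.loader = loader;
    }

    /** Returns the contents of versions from..latest, querying only the missing range. */
    synchronized SortedMap<Integer, String> getFrom(int from) {
        SortedMap<Integer, String> cache = cacheRef.get();
        if (cache == null || cache.isEmpty()) {
            // Nothing cached (or cache was collected): one query for the whole range.
            cache = new TreeMap<>(loader.load(from, latestVersion));
            cacheRef = new SoftReference<>(cache);
        } else if (from < cache.firstKey()) {
            // Only the versions older than the cached ones are fetched.
            cache.putAll(loader.load(from, cache.firstKey() - 1));
        }
        // Copy so callers cannot modify the shared cache.
        return new TreeMap<>(cache.tailMap(from));
    }
}
```

With this shape, a request for versions 4..5 followed by a request for 2..5
would issue two queries in total (4..5, then 2..3), and repeated requests for
an already cached range would issue none, as long as the soft reference has
not been cleared.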
Sergiu Dumitriu wrote:
...
  Hi,
 Some time ago, there was a discussion about how the document
 history could be stored in a better way.
 Right now, the complete history is stored as one field in the xwikidoc
 table. From my PoV, this has some major disadvantages:
 - loading an older version requires parsing all the history -> memory
 inefficiency
 - as the documents grow older, loading a document will take a lot of
 time -> time inefficiency
 - queries on archives cannot return just one version, but they match the
 whole document (somewhere in the history, there was a version containing
 "search term")
 The blocking issue with storing old versions in a different table was, at
 that time, the fact that a document archive should contain all the
 information needed to completely restore the document, like content,
 metadata and objects.
 I don't think that is actually an issue. We are archiving document versions,
 but we're joining all versions in one large string. Why don't we archive the
 complete version, but one version per row?
 So, the archive table should look like:
 - document name
 - version number
 - language (for translations)
 - content
 - archived metadata (one field, or the same fields as in xwikidoc)
 - archived objects (one field)
 - attachment names and versions
 This is not the same as storing the version the way a normal document is
 stored, with separate objects and properties, but at least it provides a
 better storage and retrieval mechanism, and it separates the parts of a
 wiki document a bit: content, metadata, objects.
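
 The one-version-per-row layout listed above might look roughly like this in
 Java. This is a minimal sketch under assumptions: the class and field names
 (ArchivedVersion, ArchiveTable, and so on) are illustrative and are not the
 XWiki schema, and the in-memory map only stands in for a database table
 keyed by document name, language and version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** One archived version per row. Field names are illustrative, not the XWiki schema. */
final class ArchivedVersion {
    final String documentName;
    final String version;
    final String language;          // empty for the default language (assumption)
    final String content;
    final String metadata;          // archived metadata serialized into one field
    final String objects;           // archived objects serialized into one field
    final List<String> attachments; // attachment names and versions, e.g. "logo.png@1.1"

    ArchivedVersion(String documentName, String version, String language, String content,
                    String metadata, String objects, List<String> attachments) {
        this.documentName = documentName;
        this.version = version;
        this.language = language;
        this.content = content;
        this.metadata = metadata;
        this.objects = objects;
        this.attachments = Collections.unmodifiableList(new ArrayList<>(attachments));
    }
}

/** In-memory stand-in for the archive table, keyed by (name, language, version). */
final class ArchiveTable {
    private final Map<List<String>, ArchivedVersion> rows = new HashMap<>();

    void save(ArchivedVersion row) {
        rows.put(Arrays.asList(row.documentName, row.language, row.version), row);
    }

    /** Loads exactly one version without touching any other version's data. */
    Optional<ArchivedVersion> load(String documentName, String language, String version) {
        return Optional.ofNullable(rows.get(Arrays.asList(documentName, language, version)));
    }
}
```

 The point of the sketch is only that a single version can be read or
 queried by its key, without parsing the rest of the history.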
--
   Artem Melentyev
