On Mon, Jan 9, 2012 at 10:07, Vincent Massol
<vincent(a)massol.net> wrote:
+1 with the following caveats:
* We need to guarantee that a migration cannot corrupt the DB.
Improving the migration mechanism was the first step in that direction, since
accessing a DB with an inappropriate XWiki core could have corrupted it.
For example, imagine that we change a document id, but this id is also used
in some other tables, and the migration stops before it is changed in those
other tables. The change needs to be done in a transaction per document
being changed, across all tables.
That would be nice, but MySQL does not support transactions on MyISAM
tables. I use a single transaction for the whole migration process, so on
systems that support it (Oracle?), the migration will either fully apply or
not apply at all. But I could not secure MySQL better than it is possible to.
I think we should have one transaction per document update instead. We've
had this problem in the past when upgrading very large systems: the
migration never went through in one go, for some reason I have forgotten, so
we needed several transactions so that the migration could be restarted when
it failed, and so that it could eventually complete. Said differently, the
migrator should be allowed to be ctrl-c-ed at any time; you safely restart
xwiki and the migrator will just carry on from where it was.
The migrator will restart where it left off, but the granularity is the
document. I process the updates document by document, updating all tables
for each one. If there is an issue during the migration, let's say on MySQL,
and it is restarted, it will start again, skipping documents that were
converted previously. So any corruption would be limited to a single
document.
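A minimal sketch of that restart behavior, assuming a hypothetical bookkeeping map of already-converted documents (class, method, and id scheme are mine for illustration, not the actual XWiki migrator):

```java
import java.util.Map;

public class MigrationSketch {
    /**
     * Converts each document's id to a new one, one document at a time.
     * Documents already present in `converted` (e.g. from an interrupted
     * previous run) are skipped, so on restart corruption is confined to at
     * most the single document being processed when the process was killed.
     */
    public static Map<String, Long> migrate(Map<String, Long> converted,
                                            Iterable<String> documents) {
        for (String doc : documents) {
            if (converted.containsKey(doc)) {
                continue; // already converted by a previous run: skip
            }
            long newId = computeNewId(doc);
            // Real migrator: update all tables referencing this document
            // inside one transaction here, then commit before moving on.
            converted.put(doc, newId);
        }
        return converted;
    }

    // Stand-in for the real 64-bit hash discussed later in this thread.
    static long computeNewId(String key) {
        return ((long) key.hashCode() << 32) | (key.length() & 0xFFFFFFFFL);
    }
}
```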
* OR we need to have a configuration parameter for deciding whether to run
this migration, so that users run it only when they decide to, thus ensuring
that they've done the proper backups and saves of their DBs.
This is true with the new migration procedure, but it is not as flexible as
you seem to expect. To me, supporting two hashing algorithms is not a
feature but an increased risk of causing corruption.
Now, if you use a recent core that uses the new ids while, on the other
side, you have not activated migrations and you access an old DB, you will
simply be unable to access the database: you will receive a "DB requires
migration" exception.
Anyway, migrations are disabled by default and should be enabled by an
administrator in xwiki.cfg. The release notes will mention the need to
proceed with the migration and, of course, to make a backup first. And you
are always supposed to have a backup when you upgrade, or you are not a
system admin ;)
I prefer the first option but we need to
guarantee it.
We will never be able to fully guarantee it, but I have done my best to make
it as safe as possible.
Thanks
-Vincent
On Jan 7, 2012, at 10:39 PM, Denis Gervalle wrote:
Now that the database migration mechanism has been improved, I would like to
go ahead with my patch to improve document ids.
Currently, ids are the plain Java string hashcode of a locally serialized
document reference, including the language for translated documents. The
likelihood of duplicates with Java's string hashing algorithm is really
high.
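The weakness is easy to demonstrate: Java's `String.hashCode()` is a 32-bit rolling hash with well-known colliding pairs, so two distinct serialized references can end up with the same document id. For instance:

```java
public class HashCollision {
    public static void main(String[] args) {
        // Classic colliding pair under Java's base-31 rolling hash:
        // 'A'*31 + 'a' == 'B'*31 + 'B'  (65*31+97 == 66*31+66 == 2112)
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
        // Any two serialized references differing only by such a fragment
        // (e.g. spaces "Aa" vs "BB") would share a document id.
        System.out.println("Space.Aa".hashCode() == "Space.BB".hashCode()); // true
    }
}
```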
What I propose is:
1) use an MD5 hash, which distributes particularly well.
2) truncate the hash to its first 64 bits, since the XWD_ID column is a
64-bit long.
3) use a better string representation as the source of the hash.
Based on previous discussion, points 1) and 2) have already been agreed on,
and this vote is in particular about the string used for 3).
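Points 1) and 2) boil down to very little code; a sketch of the idea (class and method names are mine, not the patch's actual API):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DocIdHash {
    /** Hashes the serialized reference with MD5 and keeps the first 64 bits. */
    public static long hash(String serializedReference) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(serializedReference.getBytes(StandardCharsets.UTF_8));
            // An MD5 digest is 128 bits (16 bytes); getLong() reads the first 8.
            return ByteBuffer.wrap(md5).getLong();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is mandatory on standard JVMs, so this should never happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```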
I propose to do it in 2 steps:
1) before locales are fully supported in document references, use this
format:
<lengthOfLastSpaceName>:<lastSpaceName><lengthOfDocumentName>:<documentName><lengthOfLanguage>:<language>
where language is an empty string for the default document. So it would look
like 7:mySpace5:myDoc0: and its French translation would be
7:mySpace5:myDoc2:fr
2) when locales are included in references, we will replace the
implementation with a reference serializer producing the same kind of
representation, but including all spaces (not only the last one), to be
prepared for the future.
While doing so, I also propose to fix the cache key issue by using the same
reference, but prefixed by <lengthOfWikiName>:<wikiName>, so the previous
examples will have the following keys in the document cache:
5:xwiki7:mySpace5:myDoc0: and 5:xwiki7:mySpace5:myDoc2:fr
Using such a key (compared to the usual serialization) has the following
advantages:
- it ensures uniqueness of the reference without requiring a complex
escaping algorithm, which is unneeded here
- it is potentially reversible
- it is faster than the usual serialization
- it supports the language
- it is independent of the current serialization, which may evolve
separately; so it will be stable over time, which is really important since
it is used as the base for the hashing algorithm producing the document ids
stored in the database
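A sketch of the proposed key construction (helper names are mine, not the eventual serializer API):

```java
public class KeySketch {
    /** Length-prefixed local key: the lengths make it unambiguous without escaping. */
    public static String localKey(String space, String doc, String language) {
        return space.length() + ":" + space
                + doc.length() + ":" + doc
                + language.length() + ":" + language;
    }

    /** Cache key: the same representation, prefixed with the wiki name. */
    public static String cacheKey(String wiki, String space, String doc, String language) {
        return wiki.length() + ":" + wiki + localKey(space, doc, language);
    }

    public static void main(String[] args) {
        System.out.println(localKey("mySpace", "myDoc", ""));            // 7:mySpace5:myDoc0:
        System.out.println(cacheKey("xwiki", "mySpace", "myDoc", "fr")); // 5:xwiki7:mySpace5:myDoc2:fr
    }
}
```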
I would like to introduce this as early as possible, which means as soon as
we are confident in the recently introduced migration mechanism.
Since the migration of ids will convert 32-bit hashes into 64-bit ones, the
risk of collision is really low; to be careful, I have nevertheless written
a migration algorithm that supports such collisions (unless they cause a
circular reference collision, but this is really unexpected). However,
changing ids again later, if we change our mind, will be much riskier and
the migration difficult to implement, so it is really important that we
agree on the way we compute these ids, once and for all.
Here is my +1,
--
Denis Gervalle
_______________________________________________
devs mailing list
devs(a)xwiki.org
http://lists.xwiki.org/mailman/listinfo/devs
--
Denis Gervalle
SOFTEC sa - CEO
eGuilde sarl - CTO