1) admin has disabled migration
2) admin upgrade XWiki, and reuse its old config
3) admin start the new version, which use changed id, and the wiki start
creating many duplicate document like in an empty DB.
4) the only option of admin, if he understand what happened is to restore
all wiki that have been accessed during 3)
Currently
I have added code that throws if migrations fails, preventing
the
wiki to be started. Should I improve further ?
could we expect user to
take
care to migration requirement, or should we have
mandatory migration and
optional one ?
I don't think that there are any optional migrations currently...
They're all important.
That was my feeling, simply some has less consequences than others when not
applied.
I am also wondering if I should target to commit
in release 3.2 or 3.3
since
the name of the migrator and the db version will
depends on this.
We're getting close to the 3.2 release, so the codebase should be moving
towards a lower entropy point. If you fully trust your changes and can
commit them during the next couple of days, you have my +0.
The time to commit depends mainly on this discussion.
I trust my code but a review could be welcome.
Hoping to
have some comments this time.
On Mon, Sep 19, 2011 at 12:54, Denis Gervalle<dgl(a)softec.lu> wrote:
>
> Hi Devs,
> Since I fall on this well-know XWiki caveat recently, I would like to
improve this.
> Currently, XWikiDocument.getId() is almost equivalent to
String.hashcode(fullName:lang). Since this is a really poor hashing
method
for small changes, the risk that two documents
with similar names of
same
length are colliding is really high. This Id is
used by the Hibernate
store
for document unicity and really needs to be
unique, and at most a
64bit-numeric at the same time. Currently we use only 32bits since java
hashcode are limited to 32bits integers.
> The ideal would be not to have this link between ids and documents
fullName:lang, but converting the current implementation is not really
easy.
This is probably why XWIKI-4396 has never been
fixed. Therefore, my
current
goal is to reduce the likeliness of a collision
by choosing a better
hashing
method and taking into account the fact that
document fullname are short
string and the number of unique ids required are very limited (since the
unicity is only expected for a given XWiki database) compared to the
64bits
integrer range.
> So we need to choose a better algorithm, and here are IMHO the
potential
options:
> A) use a simple but more efficient non-cryptographic hashing function
that
runs on 64bits, I was thinking about using the
algorithm produced by
Professor Daniel J. Bernstein (DJB) since it is well-know, wildly used,
easy
to implement algorithm with a good distribution
on small strings.
> Pro: no dependency; fast; 64bits better than hashcode
> Cons: probably more risk of collision compare to MD5 or SHA, but far
less
than now; require db migration of all document
keys
> B) use an MD5 or even stronger SHA1 or SHA256 algorithm from JCA,
truncating to the lower 64bits. Note that oldcore already use MDA5 for
hashcoding a whole XWikiDocument to provide the API with a
getVersionHashcode(), and for the validation hash used by the persistent
login manager. The first use Object.hashcode() as a fallback, which is
really bad and defeat the purpose. The second does not provide any
fallback
and may fail unexpectedly. For our case, if we
really want a fallback,
we
needs to store the hashing algorithm used in the
database at creation
time,
and anyway, fail when it was not available.
> Pro: already used in oldcore, probably less collision; with fallback,
really flexible since it would be possible to choose the algorithm at
creation time and does not require full migration for existing database.
> Cons: require at least a DB schema change to add the hashing algorithm,
probably as a column of xwikidbversion; if this config value is broken,
the
whole DB is broken
> C) use our own MD5 implementation when the JCA provider is missing it.
I
was thinking about integrating something like
http://twmacinta.com/myjava/fast_md5.php (non-native version) which is
LGPL.
This will ensure availability of the hashing
algorythm while having a
rather
strong one.
> Pro: no dependency, could also provide MD5 to getVersionHashcode and
the
persistent manager
I don't think C is needed, we depend on cryptography in many other
places, like the password hashes, not to mention the whole crypto module.
> Cons: require db migration of all document
keys
> A) is really quick to implement, simple, and the less risky, but some
may
found it insufficient. Caleb ?
> Obviously, B) with fallback is really nice, but I wonder if it is not
overkill ?
> I am worried about the B) without fallback, but maybe I want it too
flawless
> C) is rather solid, while staying simple, but maybe overkill too.
> I am really undecided.
> WDYT ?
>
--
Sergiu Dumitriu
http://purl.org/net/sergiu/
_______________________________________________
devs mailing list
devs(a)xwiki.org
http://lists.xwiki.org/mailman/listinfo/devs