Hi Sergiu,
Many thanks for your feedback, I am no longer alone in the dark ;)
On Thu, Sep 22, 2011 at 02:02, Sergiu Dumitriu <sergiu(a)xwiki.com> wrote:
On 09/20/2011 01:32 PM, Denis Gervalle wrote:
Since nobody seems to have remarks or opinions, I have started to do A), but it was still insufficient for my requirements. So I have now done B) without fallback, using MD5, and if no one stops me, I will probably do C) soon.
Could you easily switch to SHA-1 instead?
I can, but I do not feel this would be a huge improvement, since we keep only the lower 64 bits.
Since oldcore already uses MD5 for other things (login persistence and hashcoding of document content for comparison), I would prefer not to introduce a dependency on a different algorithm without a real need.
Since the storage is entirely broken by this change, I need the startup to be stopped on migration failure, and migration to be mandatory. Migration errors were ignored until now, but this is wrong IMO: it would mean that the code is both backward compatible with old data and failsafe against any migration failure. Moreover, migration can also be disabled by config, but this is not really safe IMO.
I agree that migration failure should stop the wiki from starting, provided the failure is very clear about the error, not just a dumb "database bad, good bye, and good luck!".
The failure is reported in two steps:
1) a "migration failure on database x" with the exception returned by the migrator attached
2) a "could not continue due to x migration failures"
I don't fully agree that they should be mandatory, though. If the
sysadmin is competent enough to disable migrations, since they are
enabled by default, he should be able to decide when to run them and
when to disable them. It should be known that during an upgrade
migrations should be run, but it's better for performance to disable
this extra check that could take a while on large farms.
I probably should have been clearer, but what I mean is a simple step which ensures that the database version is high enough to be supported by the running core. This means that even if migrations are disabled (meaning: I do not want my database to be migrated without my explicit authorization), we still need to check the migration state and stop if any migration has not been applied (see the sketch after the scenario below). I agree that this could be a performance issue at startup of large farms; maybe we could delay the check until a real wiki is assigned to a given database (I have not checked whether this is complex or not). What frightens me is the following:
1) the admin has disabled migrations
2) the admin upgrades XWiki and reuses the old config
3) the admin starts the new version, which uses the changed ids, and the wiki starts creating many duplicate documents, as in an empty DB
4) the only option for the admin, if he understands what happened, is to restore all wikis that have been accessed during 3)
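To make the intent concrete, here is a minimal sketch of the kind of mandatory check I have in mind; the class and method names are purely illustrative, not the actual oldcore API:

    // Hypothetical illustration only, not the actual oldcore migration API.
    public final class DatabaseVersionCheck {
        /**
         * @param database the database/wiki name
         * @param dbVersion the version recorded in the database (e.g. in xwikidbversion)
         * @param requiredVersion the minimum version supported by the running core
         */
        public static void checkDatabaseVersion(String database, int dbVersion, int requiredVersion) {
            if (dbVersion < requiredVersion) {
                // Even with automatic migrations disabled, refuse to start on an outdated schema
                // instead of silently creating duplicate documents with the new ids.
                throw new IllegalStateException("Database [" + database + "] is at version "
                    + dbVersion + " but this core requires at least version " + requiredVersion
                    + "; run the migrations before starting the wiki.");
            }
        }
    }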
Currently I have added code that throws if a migration fails, preventing the wiki from being started. Should I improve this further? Could we expect users to take care of the migration requirement, or should we have mandatory migrations and optional ones?
I don't think that there are any optional migrations currently...
They're all important.
That was my feeling; simply, some have fewer consequences than others when not applied.
I am also wondering if I should target committing in release 3.2 or 3.3, since the name of the migrator and the db version will depend on this.
We're getting close to the 3.2 release, so the codebase should be moving
towards a lower entropy point. If you fully trust your changes and can
commit them during the next couple of days, you have my +0.
The time to commit depends mainly on this discussion. I trust my code, but a review would be welcome.
Hoping to have some comments this time.
On Mon, Sep 19, 2011 at 12:54, Denis Gervalle<dgl(a)softec.lu> wrote:
Hi Devs,
Since I fell into this well-known XWiki caveat recently, I would like to improve this.
Currently, XWikiDocument.getId() is almost equivalent to String.hashcode(fullName:lang). Since this is a really poor hashing method for small changes, the risk that two documents with similar names of the same length collide is really high. This id is used by the Hibernate store for document unicity, so it really needs to be unique, while being at most a 64-bit numeric at the same time. Currently we use only 32 bits, since Java hashcodes are limited to 32-bit integers.
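As a quick illustration of how easily String.hashCode() collides on short, similar strings (the names below are made up for the example, not real reported collisions):

    public class HashCollisionDemo {
        public static void main(String[] args) {
            // "Aa" and "BB" are a classic String.hashCode() collision:
            // 'A'*31 + 'a' = 65*31 + 97 = 2112 and 'B'*31 + 'B' = 66*31 + 66 = 2112.
            System.out.println("Aa".hashCode()); // 2112
            System.out.println("BB".hashCode()); // 2112
            // The collision carries over to longer, document-like names of the same length:
            System.out.println("Main.AaAa:en".hashCode() == "Main.BBBB:en".hashCode()); // true
        }
    }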
Ideally, there would be no link at all between ids and the document fullName:lang, but converting the current implementation is not really easy. This is probably why XWIKI-4396 has never been fixed. Therefore, my current goal is to reduce the likelihood of a collision by choosing a better hashing method, and by taking into account the fact that document fullnames are short strings and that the number of unique ids required is very limited (since unicity is only expected within a given XWiki database) compared to the 64-bit integer range.
So we need to choose a better algorithm, and here are IMHO the potential options:
A) use a simple but more efficient non-cryptographic hashing function that runs on 64 bits. I was thinking about using the algorithm by Professor Daniel J. Bernstein (DJB), since it is a well-known, widely used, easy to implement algorithm with a good distribution on small strings (a sketch follows below).
Pro: no dependency; fast; 64 bits is better than hashcode
Cons: probably more risk of collision compared to MD5 or SHA, but far less than now; requires a db migration of all document keys
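For reference, here is a minimal 64-bit sketch of the DJB2-style hash I have in mind (illustrative only, not necessarily the exact code I committed):

    public final class Djb2Hash {
        /**
         * DJB2-style hash extended to 64 bits: hash = hash * 33 + c, starting from 5381.
         * Illustrative sketch; the real migrator may differ in details (seed, charset, ...).
         */
        public static long hash64(String key) {
            long hash = 5381L;
            for (int i = 0; i < key.length(); i++) {
                hash = ((hash << 5) + hash) + key.charAt(i); // hash * 33 + current char
            }
            return hash;
        }

        public static void main(String[] args) {
            // e.g. hashing the fullName:lang key of a document
            System.out.println(hash64("Main.WebHome:en"));
        }
    }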
B) use MD5 or an even stronger SHA-1 or SHA-256 algorithm from the JCA, truncating to the lower 64 bits (see the sketch below). Note that oldcore already uses MD5 for hashcoding a whole XWikiDocument to provide the API with getVersionHashcode(), and for the validation hash used by the persistent login manager. The first uses Object.hashcode() as a fallback, which is really bad and defeats the purpose. The second does not provide any fallback and may fail unexpectedly. For our case, if we really want a fallback, we need to store the hashing algorithm used in the database at creation time, and fail anyway when it is not available.
Pro: already used in oldcore; probably fewer collisions; with the fallback, really flexible since it would be possible to choose the algorithm at creation time, and it does not require a full migration of existing databases.
Cons: requires at least a DB schema change to add the hashing algorithm, probably as a column of xwikidbversion; if this config value is broken, the whole DB is broken
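A minimal sketch of what B) boils down to, assuming the id is still derived from the fullName:lang key (illustrative, not the committed code):

    import java.nio.charset.Charset;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public final class DocumentIdHash {
        /**
         * Hash the fullName:lang key with the given JCA algorithm (MD5, SHA-1, SHA-256)
         * and keep only 64 bits of the digest, here the first 8 bytes.
         */
        public static long hash64(String fullNameAndLang, String algorithm)
            throws NoSuchAlgorithmException {
            MessageDigest digest = MessageDigest.getInstance(algorithm);
            byte[] bytes = digest.digest(fullNameAndLang.getBytes(Charset.forName("UTF-8")));
            long id = 0L;
            for (int i = 0; i < 8; i++) {
                id = (id << 8) | (bytes[i] & 0xFF);
            }
            return id;
        }

        public static void main(String[] args) throws NoSuchAlgorithmException {
            System.out.println(hash64("Main.WebHome:en", "MD5"));
        }
    }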
C) use our own MD5 implementation when the JCA provider is missing it. I was thinking about integrating something like http://twmacinta.com/myjava/fast_md5.php (non-native version), which is LGPL. This would ensure availability of the hashing algorithm while keeping it rather strong.
Pro: no dependency; could also provide MD5 to getVersionHashcode() and the persistent login manager
I don't think C is needed, we depend on cryptography in many other
places, like the password hashes, not to mention the whole crypto module.
Cons: requires a db migration of all document keys
A) is really quick to implement, simple, and the least risky, but some may find it insufficient. Caleb?
Obviously, B) with fallback is really nice, but I wonder if it is not overkill?
I am worried about B) without fallback, but maybe I want it too flawless.
C) is rather solid, while staying simple, but maybe overkill too.
I am really undecided.
WDYT?
--
Sergiu Dumitriu
http://purl.org/net/sergiu/
--
Denis Gervalle
SOFTEC sa - CEO
eGuilde sarl - CTO