Hi Marc and Thomas,
I followed your discussion with great interest. I agree that Thomas's very lightweight proposal
is worth putting in place, since it has almost no negative impact and only benefits. I think
the object issue could also be mitigated with something similar (checking the
integrity of what we get, to at least detect a problem), though that is not perfect, of
course.
That said, I would like to point you to this interesting question on StackOverflow
(
https://stackoverflow.com/questions/22029012/probability-of-64bit-hash-code…)
and remind you that, based on the Birthday Paradox, the release of 4.x raised our
worrying threshold for the number of documents/objects from 65535 to more than 4 billion… and it
took a while (4 versions of XWiki) before we strongly felt the need to raise it. So,
while the worrying threshold before 4.x was really low, collisions were already rare in
practice.
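To put rough numbers on those thresholds, here is a minimal sketch of the usual birthday approximation. It assumes a uniformly distributed hash, and the function names are mine, purely for illustration; nothing here is taken from the XWiki code base.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate probability that at least two of n_items share a hash value,
    assuming a uniformly distributed hash of hash_bits bits."""
    space = 2.0 ** hash_bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))


def worrying_threshold(hash_bits: int) -> int:
    """Rule-of-thumb item count (square root of the hash space) around which
    a collision becomes likely."""
    return 2 ** (hash_bits // 2)


if __name__ == "__main__":
    for bits in (32, 64):
        n = worrying_threshold(bits)
        print(bits, n, collision_probability(n, bits))
    # A wiki with one million documents and a 64-bit hash:
    print(collision_probability(1_000_000, 64))
```

At the square-root threshold the approximation gives a collision probability of a bit under 40%, which is why 2^16 items was worrying for a 32-bit hash, while one million items under a 64-bit hash stays far below one in a million.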
In my own experience, the risk before 4.x was really high with generated names, much higher
than with names chosen by real users. When I was hit by that issue, I remember feeling really bad
about it. This is probably also why you raised this thread. The previous hash was too
small and also had a questionable distribution.
The MD5 algorithm, like many cryptographic hashes, is particularly well suited to providing a good
distribution (
http://michiel.buddingh.eu/distribution-of-hash-values). Truncating it to 64
bits may degrade this somewhat, but I doubt the effect would be significant for us. So, personally, I feel
really comfortable with the current implementation, and I think you can sleep in peace as
well.
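For illustration only, truncating MD5 to 64 bits can be sketched as below. Which 8 bytes are kept, the byte order, and the signed interpretation are my assumptions for the example, and the helper name `md5_64` is hypothetical; this is not meant to describe the actual XWiki implementation.

```python
import hashlib


def md5_64(key: str) -> int:
    """Return a signed 64-bit identifier derived from the MD5 digest of key.

    Assumption for this sketch: keep the first 8 of the 16 digest bytes and
    read them as a big-endian signed integer.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()  # 16 bytes
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


if __name__ == "__main__":
    # Two generated names that differ by a single character.
    print(md5_64("Space.Page1"), md5_64("Space.Page2"))
```

Because every output bit of MD5 depends on the whole input, any fixed 64-bit slice of the digest should be about as well distributed as the full digest, which is the property that matters for near-identical generated names.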
Just my thoughts on not raising fears when they are no longer really justified.
Regards,
--
Denis Gervalle
SOFTEC sa - CEO
On 7 Feb 2018, 16:10 +0100, Denis Gervalle <denis.gervalle(a)softec.lu> wrote: