Hi Denis
I personally think that this is not a discussion about probabilities but
about a fail-safe and robust implementation.
I understand that you do not want to add logic to the id generation
that can mitigate collisions (e.g. a collision bit), because the
probability that this logic ever solves a problem is "low" and the code
must be maintained. I agree that whether this feature is useful and
necessary is a question of probabilities.
Yet I think that if anybody hits a collision, which may still happen with
enough bad luck, then this person wants the system to behave robustly and
fail-safe, and to never ever corrupt stored data. IMHO this is therefore
not a question of probabilities but a question of fail-safe implementation,
and thus a question of software quality.
Only my 5 cents.
Regards,
Fabian
2018-02-08 9:24 GMT+01:00 Denis Gervalle <dgl(a)softec.lu>:
Hi Marc and Thomas,
I followed your discussion with great interest. I agree that Thomas's very
light proposal is good to put in place, since it has almost no negative
impact and only benefits. I think it is also possible to mitigate the
object issue with something similar (checking the integrity of what we get,
to at least detect an issue), but that's not perfect of course.
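One possible reading of "check integrity of what we get" is sketched below. This is only an illustration under assumptions: the `Store` class, `CollisionError`, and the in-memory dict are hypothetical stand-ins, not XWiki code. The idea is to keep the original key next to the data and compare it on every load and save, so that a collision is reported instead of silently returning or overwriting someone else's data.

```python
class CollisionError(Exception):
    """Raised when two different keys map to the same id (hypothetical)."""


class Store:
    """Toy in-memory store keyed by a hash-derived id (illustrative only)."""

    def __init__(self, id_func):
        self._id_func = id_func   # maps a key string to a numeric id
        self._rows = {}           # id -> (original key, data)

    def save(self, key, data):
        row_id = self._id_func(key)
        existing = self._rows.get(row_id)
        # Refuse to overwrite a row that belongs to a different key.
        if existing is not None and existing[0] != key:
            raise CollisionError(f"id {row_id} already used by {existing[0]!r}")
        self._rows[row_id] = (key, data)

    def load(self, key):
        row = self._rows.get(self._id_func(key))
        if row is None:
            return None
        stored_key, data = row
        # Integrity check: the row must really belong to the requested key.
        if stored_key != key:
            raise CollisionError(f"{key!r} collides with {stored_key!r}")
        return data
```

With a deliberately bad id function such as `lambda key: 0`, the second `save` with a different key raises `CollisionError` and the first entry stays intact. This detects the problem but does not resolve it, which matches the "not perfect" caveat above.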
That said, I would like to point you to this interesting question on
StackOverflow (
https://stackoverflow.com/questions/22029012/probability-of-64bit-hash-code-collisions
) and remind you that, based on the Birthday Paradox, with the release of
4.x we raised our worrying threshold for the number of documents/objects
from 65535 to more than 4 billion… and it took a while (4 versions of
XWiki) before we had the strong feeling that we needed to raise it. So,
while the worrying threshold was really low before 4.x, the actual
occurrence of a collision was already rare.
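To make those thresholds concrete, here is a minimal sketch of the usual Birthday Paradox approximation, p ≈ 1 - exp(-n(n-1)/(2N)), where n is the number of items and N = 2^bits is the number of possible hash values. It assumes uniformly distributed hashes, and the function name is only illustrative.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among n_items
    uniformly distributed hash values of hash_bits bits."""
    space = 2.0 ** hash_bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))


if __name__ == "__main__":
    for n, bits in [(65_536, 32), (2 ** 32, 64), (1_000_000, 64)]:
        print(f"{n} items, {bits}-bit hash: {collision_probability(n, bits):.3g}")
```

Under this approximation, about 65536 items with a 32-bit hash and about 4 billion items with a 64-bit hash both give roughly a 39% chance of at least one collision, which is where the two thresholds come from. A million items with a 64-bit hash stays below one in ten million.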
My own experience was that the risk before 4.x was really high with
generated names, much higher than with names used by real users. When I was
hit by that issue, I remember feeling really bad about it. This is probably
also why you raised this thread. The previous hash was too small and also
had a questionable distribution.
The MD5 algorithm, like many crypto hashes, is particularly well suited for
providing a good distribution (
http://michiel.buddingh.eu/distribution-of-hash-values
). Cutting it at 64 bits may degrade this, but I doubt it would be
significant for us. So, personally, I feel really comfortable with the
current implementation, and I think you can sleep in peace as well.
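For readers who want to see what "MD5 cut at 64 bits" can look like, here is a minimal sketch. It assumes the id is the first 8 bytes of the MD5 digest of the UTF-8 encoded key, read as a signed big-endian integer; which bytes the real implementation keeps and how it serializes the key are assumptions here, and `id_from_key` is a hypothetical name.

```python
import hashlib


def id_from_key(key: str) -> int:
    """Derive a signed 64-bit id from a key (illustrative sketch only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()  # 16 bytes
    # Keep the first 8 bytes and read them as a signed 64-bit integer,
    # the range a typical database BIGINT column can hold.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


if __name__ == "__main__":
    for name in ("Main.WebHome", "Sandbox.TestPage1"):
        print(name, id_from_key(name))
```

The same key always yields the same id, and any two keys collide only if their truncated digests happen to match.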
Just my thoughts about not raising fears when they are no longer really
justified.
Regards,
--
Denis Gervalle
SOFTEC sa - CEO