Hi Denis,
Thanks a lot for your answer. I know it's been a while, but I'd still like
to follow up on it, since it's quite a fundamental issue.
I understand that, by improving the hash algorithm, the size of the ids and
the quality of the hashed key, you consider the implementation safe enough
for normal usage.
Still, with enough bad luck, documents and objects may be overwritten
without a trace. This is not a stable implementation. Even worse, if hash
collisions do occur on an XWiki installation in the future (or have already
occurred since 4.x), they probably won't be traced back to this issue,
because it is nearly impossible to debug.
While I do now understand the motivation to stick with hashes, I'm still
not sure why collision detection would be difficult to introduce, or why it
would even be "impossible for some APIs". Let me briefly outline an idea:
In XWikiHibernateStore#saveXWikiDoc on L615
<https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/store/XWikiHibernateStore.java#L615>
an exists check on the doc id is already performed. If xwikidoc.fullName
were also selected in that HQL query, comparing it to
doc.getDocumentReference() could expose an imminent collision before any
data is overwritten; at the very least, an XWikiException should be thrown
in that case (see the sketch below). A similar check could be added before
saving BaseObjects on L1203
<https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/store/XWikiHibernateStore.java#L1203>
to avoid collisions on object IDs.
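To make this concrete, here is a rough, untested sketch of what such a check
could look like next to the existing exists check in saveXWikiDoc (the exact
query and the error constant used here are my assumptions, not code taken
from the class):

// Sketch only: detect an id collision before the existing row is overwritten.
// Assumes the Hibernate session already used by the surrounding exists check.
String existingFullName = (String) session.createQuery(
        "select doc.fullName from XWikiDocument as doc where doc.id = :id")
    .setParameter("id", doc.getId())
    .uniqueResult();

if (existingFullName != null && !existingFullName.equals(doc.getFullName())) {
    // Another document already owns this id: a hash collision.
    // Fail loudly instead of silently overwriting the other document.
    throw new XWikiException(XWikiException.MODULE_XWIKI_STORE,
        XWikiException.ERROR_XWIKI_STORE_HIBERNATE_SAVING_DOC,
        "Id collision detected: [" + doc.getFullName() + "] and ["
            + existingFullName + "] both map to id [" + doc.getId() + "]");
}

The same pattern would apply before saving BaseObjects, comparing the stored
object's owning document and class against the object being saved.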
I don't think a change like this would be difficult to implement; I could
provide a PR along those lines. The performance impact would have to be
measured on your systems, though, since the full name isn't indexed as far
as I know.
Regards
Marc Sladek
synventis gmbh
On 30 November 2017 at 15:21, Denis Gervalle <dgl(a)softec.lu> wrote:
Hi Marc,
Here are some answers:
1) MD5 was already a dependency of our oldcore, and using SHA-1 would have
added a dependency without bringing much benefit. Since we only used 64
bits of the MD5 anyway, I doubt that SHA-1 would have provided a better
distribution.
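To illustrate what we mean by "only 64 bits of the MD5" (this is a
simplified sketch, not the exact oldcore code; the real key serialization
and byte handling differ in the details):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class IdHashSketch
{
    // Hash the serialized reference key with MD5 (128 bits) and keep only
    // the first 64 bits of the digest as the Hibernate id.
    public static long toId(String key)
    {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest, 0, 8).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should always be available", e);
        }
    }
}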
2) Such collision detection is difficult to introduce in the existing code
base; for some APIs it is even impossible. What you experienced with the
32-bit ids was my motivation for the changes in 4.x, and I can say, based
on my long XWiki experience, that even with the poor 32-bit ids very few
users were affected. Therefore, by improving the hash algorithm, the size
of the ids, and the quality of the hashed key, we considered ourselves safe
enough for normal usage.
3) That’s the trickiest point. I cannot answer for the first decision,
since I wasn’t yet involved, but regarding the changes introduced in 4.0,
such a change had been considered. The ids are only there to satisfy
Hibernate and its loading mechanism. If we had used a counter, we would
have had to manage a conversion table between ids and entity references,
with all the additional complexity that brings (consistency issues,
caching, ...). This is because we use entity references to point directly
to documents (and even objects) everywhere in XWiki. Introducing that
behaviour while keeping all the existing APIs unchanged would have been a
huge amount of work, and it would probably have introduced a performance
penalty as well. That is why we gave up on it and went for an improved hash
solution. IMO, if we had to make such a change, we would be better off
rewriting the storage service completely, and perhaps even stop using
Hibernate, which, to be honest, does not bring XWiki much benefit through
its ORM aspects.
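To give an idea of the complexity involved, a counter-based scheme would
need something like the following service, kept cached and consistent
everywhere ids are used (purely hypothetical; nothing like it exists in
oldcore):

import org.xwiki.model.reference.DocumentReference;

// Hypothetical id <-> reference conversion service that a counter-based
// scheme would require; it does not exist in the current code base.
public interface DocumentIdConverter
{
    /** Return the id assigned to this reference, allocating a new one if needed. */
    long getOrAllocateId(DocumentReference reference);

    /** Reverse lookup, needed wherever code starts from a raw database id. */
    DocumentReference resolve(long id);
}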
But if you really want the complete answers, you can look at those threads:
http://xwiki.markmail.org/thread/fuprtrnupz2uy37f
http://xwiki.markmail.org/thread/fsd25bvft74xwgcx
Regards,
--
Denis Gervalle
SOFTEC sa - CEO
On 30 Nov 2017, 14:14 +0100, Marc Sladek <marc.sladek(a)synventis.com> wrote:
Dear XWiki devs
We are using the XWiki platform for our applications but sadly are still
stuck with 2.7.2. Lately we ran into issues on a large database and noticed
"disappearing" BaseObjects. We were able to link it to XWIKI-6990
<http://jira.xwiki.org/browse/XWIKI-6990>, where hibernate IDs collided
(hash collisions) and overwrote other objects without any trace - neither
visible in the history nor in a log file.
We analysed the solution you implemented in 4.0+ in XWikiDocument
<https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/doc/XWikiDocument.java#L841>
and BaseElement
<https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/objects/BaseElement.java#L237>
and noticed that you changed the 32-bit String#hashCode to a 64-bit MD5
hash, which makes a collision less likely. I have a few questions regarding
your solution:
1) Is there any specific reason why you chose MD5 over SHA-1 or SHA-2?
2) Collisions are still possible and would be extremely hard to notice,
since they are completely silent. Have you considered implementing
collision detection, to at least log occurring collisions - or, even
better, reserving 1-2 bits of the 64-bit id to be used as a collision
counter in case it happens?
3) To question the concept of generating a hash for an ID in general:
wouldn't a database-defined "auto increment" be a much more robust solution
for the hibernate IDs? A collision would be impossible, and
clustering/scalability is still possible with e.g. the InnoDB "interleaved"
auto-increment lock mode. Why did you choose a hash-based solution in the
first place?
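For clarity, this is the kind of database-generated id I have in mind. It
is a generic JPA-style sketch for illustration only, since XWiki's real
mappings live in hbm.xml files and the class below is purely hypothetical
(the "interleaved" mode mentioned above is innodb_autoinc_lock_mode=2 in
MySQL):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

// Hypothetical entity letting the database assign the id (auto increment)
// instead of computing it as a hash of the document reference.
@Entity
public class ExampleDocument
{
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;

    private String fullName;

    public long getId()
    {
        return this.id;
    }

    public String getFullName()
    {
        return this.fullName;
    }
}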
I'm sorry if these questions have already been answered on the dev mailing
list or in issues; please link me to them, since I couldn't find any
concrete answers.
Thanks for your time and regards
Marc Sladek
synventis gmbh