Re: [xwiki-devs] BaseObject ID collisions

2 Feb 2018

Introducing this certainly doesn't hurt, but 'm not sure how useful it is.
Firstly, it can show collisions only after a document is already
overwritten, thus the damage is already done. Secondly, loadXWikiDoc has to
be called for the document which doesn't exist anymore, I guess this
doesn't happen so often since the system won't list it anymore.
On 2 February 2018 at 13:36, Thomas Mortagne &lt;thomas.mortagne(a)xwiki.com&gt;
wrote:
...
  For document what could help a lot already without any
performance
 penalty is to compare the loaded document reference and the passed one
 in XWikiHibernateStore#loadXWikiDoc. That's because most of the code
 in XWiki apply the following logic: getDocument(), modify it,
 saveDocument().
 On Fri, Feb 2, 2018 at 12:19 PM, Marc Sladek &lt;marc.sladek(a)synventis.com&gt;
 wrote:
  Hi Denis,
 Thanks a lot for your answer. I know it's been a while, but I'd still  like
  to follow up on it since it's quite the
fundamental issue.
  Therefore, by improving the hash algorithm, the
size of the ids, and the  quality of the hashed key, we have considered ourselves to
be saved  enough
  for a normal usage.
 Still, with enough bad luck, documents and objects may be overwritten
 without a trace. This is not a stable implementation. And even worse, if  on
  any XWiki installation hash collisions will
happen in the future (or have
 already happened since 4.x), they probably won't be easily associated  with
  this issue because it's nearly impossible to
debug.
 While I do now understand the motivation to stick with hashes, I'm still
 not sure why a collision detection would be difficult to introduce and  why
  it's even "impossible for some
API". Let me briefly outline an idea:
 In XWikiHibernateStore#saveXWikiDoc on L615
 <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/ 
xwiki-platform-core/xwiki-platform-oldcore/src/main/java/
 com/xpn/xwiki/store/XWikiHibernateStore.java#L615>
  an exists check on the doc id is already
performed. If now
 xwikidoc.fullName is also selected in the HQL, a comparison to
 doc.getDocumentReference() can expose an imminent collision before data  is
  overwritten. At least an XWikiException should be
thrown in this case. A
 similar thing could be done before saving BaseObjects on L1203
 <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/ 
xwiki-platform-core/xwiki-platform-oldcore/src/main/java/
 com/xpn/xwiki/store/XWikiHibernateStore.java#L1203>
  to avoid collisions on Object IDs.
 I don't think a change like this would be difficult to implement, I could
 provide a PR of that sort. The performance penalty has to be tested for
 your systems though, since the full name isn't indexed afaik.
 Regards
 Marc Sladek
 synventis gmbh
 On 30 November 2017 at 15:21, Denis Gervalle &lt;dgl(a)softec.lu&gt; wrote:
> Hi Marc,
>
> Here are some answers:
>
> 1) MD5 was already a dependency of our oldcore and using SHA1 would have
> added a dependency without bringing much benefit. Since we only used 64
> bits of the MD5 anyway, I doubt using SHA1 would have provided a better
> distribution.
>
> 2) Such a collision detection is difficult to be introduced in the
> existing code base, for some API it is even impossible. What you  experience
 > with the 32-bit ids had been my motivation to
the changes in 4.x and I
> could say, based on my long XWiki experience, that even with the poor
> 32 bit ids, very few users had been affected. Therefore, by improving  the
 > hash algorithm, the size of the ids, and the
quality of the hashed key,  we
 > have considered ourselves to be saved enough
for a normal usage.
>
> 3) That’s the worst point. I cannot answer about the first decision, I
> wasn’t yet involve, but regarding the changes introduced in 4.0,  a  change
 > had been considered. The ids are only there
to satisfy Hibernate and its
> loading mechanism. If we had used a counter, we had to manage a  conversion
 > table between ids and entity references with
all the additional  complexity
 > (consistency issues, caching, ...). This is
so because we use entity
> reference to point directly to document (or even objects) everywhere in
> XWiki. This would have been a huge work to introduce that behaviour and  at
 > the same time keeping all the existing API
unchanged. It would probably
> have introduced a performance penalty as well. This is why we resigned  and
 > go for an improved hash solution. IMO, if we
had to make such a change,  we
 > are even better rewriting the storage service
completely, and even stop
> using Hibernate, which, to be honest, does not bring much benefit to
> XWiki with its ORM aspects.
>
> But if you really want the complete answers, you can look at those  threads:
 >
http://xwiki.markmail.org/thread/fuprtrnupz2uy37f
> http://xwiki.markmail.org/thread/fsd25bvft74xwgcx
>
> Regards,
>
> --
> Denis Gervalle
> SOFTEC sa - CEO
>
> On 30 Nov 2017, 14:14 +0100, Marc Sladek &lt;marc.sladek(a)synventis.com&gt;om>,
> wrote:
> > Dear XWiki devs
> >
> > We are using the XWiki platform for our applications but sadly are  still
 > > stuck with 2.7.2. Lately we ran into
issues on a large database and
> noticed
> > "disappearing" BaseObjects. We were able to link it to XWIKI-6990
> > <http://jira.xwiki.org/browse/XWIKI-6990>, where hibernate IDs 
collided
 > > (hash collisions) and overwrote other
objects without any trace -  neither
 > > visible in the history nor in a log
file.
> >
> > We analysed your implemented solution from 4.0+ in XWikiDocument
> > <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/x
> wiki-platform-core/xwiki-platform-oldcore/src/main/java/com/
> xpn/xwiki/doc/XWikiDocument.java#L841
> > and BaseElement
> > <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/x
> wiki-platform-core/xwiki-platform-oldcore/src/main/java/com/
> xpn/xwiki/objects/BaseElement.java#L237
> > and
> > noticed that you changed the 32bit String#hashCode to 64bit MD5, which
> > makes a collision less likely. I have a few questions regarding your
> > solution:
> >
> > 1) Is there any specific reason why you have chosen MD5 over SHA-1 or  2?
 > >
> > 2) Collisions are still possible and would be extremely hard to notice
> > since they are completely silent. Have you considered to implement a
> > collision detection to at least log occurring collisions - or even  better
 > > reserve 1-2bits of the 64bit to be used
as collision counter in the  case
 > of
> > it happening?
> >
> > 3) To question the concept of generating a hash for an ID in general:
> > Wouldn't a database defined "auto increment" be a much more robust
> solution
> > for the hibernate IDs? A collision would be impossible and
> > clustering/scalability is still possible with e.g. the InnoDB
> “interleaved”
> > autoincrement lock mode. Why have you chosen a hash based solution in  the
 > > first place?
> >
> > I'm sorry if these questions were already answered in the dev mailing
> list
> > or on issues, please link me to them since I couldn't find any 
concrete
 > > answers.
> >
> > Thanks for your time and regards
> >
> > Marc Sladek
> > synventis gmbh
> 
 --
 Thomas Mortagne

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [xwiki-devs] BaseObject ID collisions