By "most of the code" I mean most of the code I know :)
AFAIK everything in XWiki platform and in most extensions is doing
this and it includes of course everything you do trough the standard
UI.
On Fri, Feb 2, 2018 at 3:51 PM, Marc Sladek <marc.sladek(a)synventis.com> wrote:
With "most of the code", do you also mean
when new documents are being
created and stored (the scenario where collisions happen)?
Imho there is nothing preventing collisions when doing:
* new XWikiDocument(docRef)
* modify instance
* exists check and save
On 2 February 2018 at 15:28, Thomas Mortagne <thomas.mortagne(a)xwiki.com>
wrote:
> On Fri, Feb 2, 2018 at 3:07 PM, Marc Sladek <marc.sladek(a)synventis.com>
> wrote:
> > Introducing this certainly doesn't hurt, but 'm not sure how useful it
> is.
>
> I never said it was the solution to all collisions but it will cover
> most of the (very rare and never reported) overwrites at 0 cost.
>
> > Firstly, it can show collisions only after a document is already
> > overwritten, thus the damage is already done.
>
> Again keep in mind that most of the code does:
> * getDocument(DocumentReference)
> * modify the XWikiDocument instance
> * saveDocument()
>
> so if getDocument() fail you are not going to overwrite anything.
>
> > Secondly, loadXWikiDoc has to
> > be called for the document which doesn't exist anymore, I guess this
> > doesn't happen so often since the system won't list it anymore.
>
> Not sure which use case you are referring to here. Are you talking
> about document deleted the a document with a different reference but
> same hash is saved ?
>
> >
> > On 2 February 2018 at 13:36, Thomas Mortagne <thomas.mortagne(a)xwiki.com>
> > wrote:
> >
> >> For document what could help a lot already without any performance
> >> penalty is to compare the loaded document reference and the passed one
> >> in XWikiHibernateStore#loadXWikiDoc. That's because most of the code
> >> in XWiki apply the following logic: getDocument(), modify it,
> >> saveDocument().
> >>
> >> On Fri, Feb 2, 2018 at 12:19 PM, Marc Sladek <marc.sladek(a)synventis.com
> >
> >> wrote:
> >> > Hi Denis,
> >> >
> >> > Thanks a lot for your answer. I know it's been a while, but I'd
still
> >> like
> >> > to follow up on it since it's quite the fundamental issue.
> >> >
> >> >> Therefore, by improving the hash algorithm, the size of the ids,
and
> the
> >> > quality of the hashed key, we have considered ourselves to be saved
> >> enough
> >> > for a normal usage.
> >> >
> >> > Still, with enough bad luck, documents and objects may be overwritten
> >> > without a trace. This is not a stable implementation. And even worse,
> if
> >> on
> >> > any XWiki installation hash collisions will happen in the future (or
> have
> >> > already happened since 4.x), they probably won't be easily
associated
> >> with
> >> > this issue because it's nearly impossible to debug.
> >> >
> >> > While I do now understand the motivation to stick with hashes, I'm
> still
> >> > not sure why a collision detection would be difficult to introduce and
> >> why
> >> > it's even "impossible for some API". Let me briefly
outline an idea:
> >> >
> >> > In XWikiHibernateStore#saveXWikiDoc on L615
> >> > <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/
> >> xwiki-platform-core/xwiki-platform-oldcore/src/main/java/
> >> com/xpn/xwiki/store/XWikiHibernateStore.java#L615>
> >> > an exists check on the doc id is already performed. If now
> >> > xwikidoc.fullName is also selected in the HQL, a comparison to
> >> > doc.getDocumentReference() can expose an imminent collision before
> data
> >> is
> >> > overwritten. At least an XWikiException should be thrown in this
> case. A
> >> > similar thing could be done before saving BaseObjects on L1203
> >> > <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/
> >> xwiki-platform-core/xwiki-platform-oldcore/src/main/java/
> >> com/xpn/xwiki/store/XWikiHibernateStore.java#L1203>
> >> > to avoid collisions on Object IDs.
> >> >
> >> > I don't think a change like this would be difficult to implement,
I
> could
> >> > provide a PR of that sort. The performance penalty has to be tested
> for
> >> > your systems though, since the full name isn't indexed afaik.
> >> >
> >> > Regards
> >> >
> >> > Marc Sladek
> >> > synventis gmbh
> >> >
> >> > On 30 November 2017 at 15:21, Denis Gervalle <dgl(a)softec.lu>
wrote:
> >> >
> >> >> Hi Marc,
> >> >>
> >> >> Here are some answers:
> >> >>
> >> >> 1) MD5 was already a dependency of our oldcore and using SHA1
would
> have
> >> >> added a dependency without bringing much benefit. Since we only
used
> 64
> >> >> bits of the MD5 anyway, I doubt using SHA1 would have provided a
> better
> >> >> distribution.
> >> >>
> >> >> 2) Such a collision detection is difficult to be introduced in the
> >> >> existing code base, for some API it is even impossible. What you
> >> experience
> >> >> with the 32-bit ids had been my motivation to the changes in 4.x
and
> I
> >> >> could say, based on my long XWiki experience, that even with the
poor
> >> >> 32 bit ids, very few users had been affected. Therefore, by
improving
> >> the
> >> >> hash algorithm, the size of the ids, and the quality of the hashed
> key,
> >> we
> >> >> have considered ourselves to be saved enough for a normal usage.
> >> >>
> >> >> 3) That’s the worst point. I cannot answer about the first
decision,
> I
> >> >> wasn’t yet involve, but regarding the changes introduced in 4.0,
a
> >> change
> >> >> had been considered. The ids are only there to satisfy Hibernate
and
> its
> >> >> loading mechanism. If we had used a counter, we had to manage a
> >> conversion
> >> >> table between ids and entity references with all the additional
> >> complexity
> >> >> (consistency issues, caching, ...). This is so because we use
entity
> >> >> reference to point directly to document (or even objects)
everywhere
> in
> >> >> XWiki. This would have been a huge work to introduce that
behaviour
> and
> >> at
> >> >> the same time keeping all the existing API unchanged. It would
> probably
> >> >> have introduced a performance penalty as well. This is why we
> resigned
> >> and
> >> >> go for an improved hash solution. IMO, if we had to make such a
> change,
> >> we
> >> >> are even better rewriting the storage service completely, and even
> stop
> >> >> using Hibernate, which, to be honest, does not bring much benefit
to
> >> >> XWiki with its ORM aspects.
> >> >>
> >> >> But if you really want the complete answers, you can look at those
> >> threads:
> >> >>
http://xwiki.markmail.org/thread/fuprtrnupz2uy37f
> >> >>
http://xwiki.markmail.org/thread/fsd25bvft74xwgcx
> >> >>
> >> >> Regards,
> >> >>
> >> >> --
> >> >> Denis Gervalle
> >> >> SOFTEC sa - CEO
> >> >>
> >> >> On 30 Nov 2017, 14:14 +0100, Marc Sladek
<marc.sladek(a)synventis.com
> >,
> >> >> wrote:
> >> >> > Dear XWiki devs
> >> >> >
> >> >> > We are using the XWiki platform for our applications but sadly
are
> >> still
> >> >> > stuck with 2.7.2. Lately we ran into issues on a large
database and
> >> >> noticed
> >> >> > "disappearing" BaseObjects. We were able to link it
to XWIKI-6990
> >> >> > <http://jira.xwiki.org/browse/XWIKI-6990>, where
hibernate IDs
> >> collided
> >> >> > (hash collisions) and overwrote other objects without any
trace -
> >> neither
> >> >> > visible in the history nor in a log file.
> >> >> >
> >> >> > We analysed your implemented solution from 4.0+ in
XWikiDocument
> >> >> >
<https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/x
> >> >> wiki-platform-core/xwiki-platform-oldcore/src/main/java/com/
> >> >> xpn/xwiki/doc/XWikiDocument.java#L841
> >> >> > and BaseElement
> >> >> >
<https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/x
> >> >> wiki-platform-core/xwiki-platform-oldcore/src/main/java/com/
> >> >> xpn/xwiki/objects/BaseElement.java#L237
> >> >> > and
> >> >> > noticed that you changed the 32bit String#hashCode to 64bit
MD5,
> which
> >> >> > makes a collision less likely. I have a few questions
regarding
> your
> >> >> > solution:
> >> >> >
> >> >> > 1) Is there any specific reason why you have chosen MD5 over
SHA-1
> or
> >> 2?
> >> >> >
> >> >> > 2) Collisions are still possible and would be extremely hard
to
> notice
> >> >> > since they are completely silent. Have you considered to
implement
> a
> >> >> > collision detection to at least log occurring collisions - or
even
> >> better
> >> >> > reserve 1-2bits of the 64bit to be used as collision counter
in the
> >> case
> >> >> of
> >> >> > it happening?
> >> >> >
> >> >> > 3) To question the concept of generating a hash for an ID in
> general:
> >> >> > Wouldn't a database defined "auto increment" be
a much more robust
> >> >> solution
> >> >> > for the hibernate IDs? A collision would be impossible and
> >> >> > clustering/scalability is still possible with e.g. the InnoDB
> >> >> “interleaved”
> >> >> > autoincrement lock mode. Why have you chosen a hash based
solution
> in
> >> the
> >> >> > first place?
> >> >> >
> >> >> > I'm sorry if these questions were already answered in the
dev
> mailing
> >> >> list
> >> >> > or on issues, please link me to them since I couldn't find
any
> >> concrete
> >> >> > answers.
> >> >> >
> >> >> > Thanks for your time and regards
> >> >> >
> >> >> > Marc Sladek
> >> >> > synventis gmbh
> >> >>
> >>
> >>
> >>
> >> --
> >> Thomas Mortagne
> >>
>
>
>
> --
> Thomas Mortagne
>