As I said, it's not a perfect solution, but while waiting for one that does
not reduce performance a lot, it costs nothing and probably covers most
existing use cases.
On Wed, Feb 7, 2018 at 10:55 AM, Marc Sladek <marc.sladek(a)synventis.com> wrote:
Ok, I understand now and agree, thanks for clarifying.
Two caveats though:
1) As you mentioned, it applies only to document hash collisions. I'd
however expect object hash collisions to be more frequent, since the average
database contains more objects than documents.
2) It's not guaranteed that all projects using the XWiki platform call
getDocument before saveDocument on the same instance. For example our CMS
'celements' uses an XWikiDocumentCreator
<https://github.com/celements/celements-xwiki/blob/dev/celements-model/src/main/java/com/celements/model/access/DefaultXWikiDocumentCreator.java#L23>
to build XWikiDocuments from scratch without a getDocument call beforehand.
It's possible that there is more code creating documents this way; it would
not be covered by your proposed change.
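
To illustrate, code like that does roughly the following (a minimal sketch,
not our actual implementation; docRef and context are assumed to be in
scope, and exists() may be the String-based variant on older versions):

    // build a new document from scratch, no getDocument() beforehand
    XWikiDocument doc = new XWikiDocument(docRef);
    doc.setContent("some content");
    // the exists() guard is keyed on the document reference, not on the
    // generated 64-bit id, so another document whose id happens to collide
    // is not detected here before saveDocument() overwrites its row
    if (!context.getWiki().exists(docRef, context)) {
        context.getWiki().saveDocument(doc, context);
    }
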
On 2 February 2018 at 16:14, Thomas Mortagne <thomas.mortagne(a)xwiki.com>
wrote:
> By "most of the code" I mean most of the code I know :)
>
> AFAIK everything in the XWiki platform and in most extensions does this,
> and it of course includes everything you do through the standard UI.
>
> On Fri, Feb 2, 2018 at 3:51 PM, Marc Sladek <marc.sladek(a)synventis.com>
> wrote:
> > With "most of the code", do you also mean when new documents are
being
> > created and stored (the scenario where collisions happen)?
> >
> > Imho there is nothing preventing collisions when doing:
> > * new XWikiDocument(docRef)
> > * modify instance
> > * exists check and save
> >
> > On 2 February 2018 at 15:28, Thomas Mortagne <thomas.mortagne(a)xwiki.com>
> > wrote:
> >
> >> On Fri, Feb 2, 2018 at 3:07 PM, Marc Sladek <marc.sladek(a)synventis.com>
> >> wrote:
> >> > Introducing this certainly doesn't hurt, but I'm not sure how useful
> >> > it is.
> >>
> >> I never said it was the solution to all collisions but it will cover
> >> most of the (very rare and never reported) overwrites at 0 cost.
> >>
> >> > Firstly, it can show collisions only after a document is already
> >> > overwritten, so the damage is already done.
> >>
> >> Again keep in mind that most of the code does:
> >> * getDocument(DocumentReference)
> >> * modify the XWikiDocument instance
> >> * saveDocument()
> >>
> >> so if getDocument() fails you are not going to overwrite anything.
> >>
> >> > Secondly, loadXWikiDoc has to be called for the document which doesn't
> >> > exist anymore; I guess this doesn't happen so often since the system
> >> > won't list it anymore.
> >>
> >> Not sure which use case you are referring to here. Are you talking
> >> about a document being deleted and then a document with a different
> >> reference but the same hash being saved?
> >>
> >> >
> >> > On 2 February 2018 at 13:36, Thomas Mortagne <thomas.mortagne(a)xwiki.com>
> >> > wrote:
> >> >
> >> >> For documents, what could already help a lot without any performance
> >> >> penalty is to compare the loaded document reference and the passed one
> >> >> in XWikiHibernateStore#loadXWikiDoc. That's because most of the code
> >> >> in XWiki applies the following logic: getDocument(), modify it,
> >> >> saveDocument().
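> >> >>
> >> >> In code the idea boils down to something like this at the end of
> >> >> loadXWikiDoc(), once the row has been loaded (just a sketch; variable
> >> >> names are placeholders, the exact spot depends on how the method
> >> >> populates the XWikiDocument, and the error constants are only
> >> >> examples):
> >> >>
> >> >>     // requestedReference: what the caller asked for
> >> >>     // loadedReference: what the row with the same id actually contains
> >> >>     if (!requestedReference.equals(loadedReference)) {
> >> >>         // same 64-bit id, different document: refuse to return it
> >> >>         throw new XWikiException(XWikiException.MODULE_XWIKI_STORE,
> >> >>             XWikiException.ERROR_XWIKI_UNKNOWN, "Id collision: ["
> >> >>                 + requestedReference + "] and [" + loadedReference
> >> >>                 + "] resolve to the same document id");
> >> >>     }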
> >> >>
> >> >> On Fri, Feb 2, 2018 at 12:19 PM, Marc Sladek <marc.sladek(a)synventis.com>
> >> >> wrote:
> >> >> > Hi Denis,
> >> >> >
> >> >> > Thanks a lot for your answer. I know it's been a while, but I'd
> >> >> > still like to follow up on it since it's quite a fundamental issue.
> >> >> >
> >> >> >> Therefore, by improving the hash algorithm, the size of the ids,
> >> >> >> and the quality of the hashed key, we have considered ourselves to
> >> >> >> be safe enough for normal usage.
> >> >> >
> >> >> > Still, with enough bad luck, documents and objects may be overwritten
> >> >> > without a trace. This is not a stable implementation. Even worse, if
> >> >> > hash collisions happen on any XWiki installation in the future (or
> >> >> > have already happened since 4.x), they probably won't be easily
> >> >> > associated with this issue because it's nearly impossible to debug.
> >> >> >
> >> >> > While I do now understand the motivation to stick with hashes, I'm
> >> >> > still not sure why a collision detection would be difficult to
> >> >> > introduce and why it's even "impossible for some API". Let me briefly
> >> >> > outline an idea:
> >> >> >
> >> >> > In XWikiHibernateStore#saveXWikiDoc on L615
> >> >> > <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/store/XWikiHibernateStore.java#L615>
> >> >> > an exists check on the doc id is already performed. If
> >> >> > xwikidoc.fullName is also selected in the HQL, a comparison to
> >> >> > doc.getDocumentReference() can expose an imminent collision before
> >> >> > data is overwritten. At least an XWikiException should be thrown in
> >> >> > this case. A similar check could be done before saving BaseObjects on
> >> >> > L1203
> >> >> > <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/store/XWikiHibernateStore.java#L1203>
> >> >> > to avoid collisions on Object IDs.
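> >> >> >
> >> >> > A rough sketch of what I have in mind for the document case (the
> >> >> > actual query at that spot may look different; the error constants are
> >> >> > just illustrative):
> >> >> >
> >> >> >     String existing = (String) session.createQuery(
> >> >> >         "select doc.fullName from XWikiDocument as doc where doc.id = :id")
> >> >> >         .setLong("id", doc.getId())
> >> >> >         .uniqueResult();
> >> >> >     if (existing != null && !existing.equals(doc.getFullName())) {
> >> >> >         // same 64-bit id but a different full name: imminent collision
> >> >> >         throw new XWikiException(XWikiException.MODULE_XWIKI_STORE,
> >> >> >             XWikiException.ERROR_XWIKI_UNKNOWN, "Id collision: saving ["
> >> >> >                 + doc.getFullName() + "] would overwrite [" + existing + "]");
> >> >> >     }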
> >> >> >
> >> >> > I don't think a change like this would be difficult to implement; I
> >> >> > could provide a PR of that sort. The performance penalty has to be
> >> >> > tested on your systems though, since the full name isn't indexed
> >> >> > afaik.
> >> >> >
> >> >> > Regards
> >> >> >
> >> >> > Marc Sladek
> >> >> > synventis gmbh
> >> >> >
> >> >> > On 30 November 2017 at 15:21, Denis Gervalle <dgl(a)softec.lu> wrote:
> >> >> >
> >> >> >> Hi Marc,
> >> >> >>
> >> >> >> Here are some answers:
> >> >> >>
> >> >> >> 1) MD5 was already a dependency of our oldcore and using SHA1 would
> >> >> >> have added a dependency without bringing much benefit. Since we only
> >> >> >> used 64 bits of the MD5 anyway, I doubt using SHA1 would have
> >> >> >> provided a better distribution.
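> >> >> >>
> >> >> >> For reference, the id computation boils down to something like this
> >> >> >> (a simplified sketch, not the exact oldcore code; "key" stands for
> >> >> >> the serialized entity key whose real format is built in
> >> >> >> XWikiDocument/BaseElement):
> >> >> >>
> >> >> >>     // uses java.security.MessageDigest and java.nio.charset.StandardCharsets;
> >> >> >>     // NoSuchAlgorithmException handling omitted for brevity
> >> >> >>     MessageDigest md5 = MessageDigest.getInstance("MD5");
> >> >> >>     byte[] hash = md5.digest(key.getBytes(StandardCharsets.UTF_8));
> >> >> >>     // take the first 8 bytes of the MD5 as the 64-bit id
> >> >> >>     long id = 0;
> >> >> >>     for (int i = 0; i < 8; i++) {
> >> >> >>         id = (id << 8) | (hash[i] & 0xFF);
> >> >> >>     }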
> >> >> >>
> >> >> >> 2) Such a collision detection is difficult to introduce in the
> >> >> >> existing code base; for some API it is even impossible. What you
> >> >> >> experienced with the 32-bit ids had been my motivation for the
> >> >> >> changes in 4.x, and I can say, based on my long XWiki experience,
> >> >> >> that even with the poor 32-bit ids, very few users had been
> >> >> >> affected. Therefore, by improving the hash algorithm, the size of
> >> >> >> the ids, and the quality of the hashed key, we have considered
> >> >> >> ourselves to be safe enough for normal usage.
> >> >> >>
> >> >> >> 3) That’s the worst point. I cannot answer for the first decision,
> >> >> >> since I wasn’t yet involved, but regarding the changes introduced in
> >> >> >> 4.0, a change had been considered. The ids are only there to satisfy
> >> >> >> Hibernate and its loading mechanism. If we had used a counter, we
> >> >> >> would have had to manage a conversion table between ids and entity
> >> >> >> references, with all the additional complexity (consistency issues,
> >> >> >> caching, ...). This is because we use entity references to point
> >> >> >> directly to documents (or even objects) everywhere in XWiki. It
> >> >> >> would have been a huge amount of work to introduce that behaviour
> >> >> >> while keeping all the existing API unchanged, and it would probably
> >> >> >> have introduced a performance penalty as well. This is why we gave
> >> >> >> up and went for an improved hash solution. IMO, if we had to make
> >> >> >> such a change, we would be better off rewriting the storage service
> >> >> >> completely, even dropping Hibernate, which, to be honest, does not
> >> >> >> bring much benefit to XWiki with its ORM aspects.
> >> >> >>
> >> >> >> But if you really want the complete answers, you can look at those
> >> >> >> threads:
> >> >> >> http://xwiki.markmail.org/thread/fuprtrnupz2uy37f
> >> >> >> http://xwiki.markmail.org/thread/fsd25bvft74xwgcx
> >> >> >>
> >> >> >> Regards,
> >> >> >>
> >> >> >> --
> >> >> >> Denis Gervalle
> >> >> >> SOFTEC sa - CEO
> >> >> >>
> >> >> >> On 30 Nov 2017, 14:14 +0100, Marc Sladek <marc.sladek(a)synventis.com>,
> >> >> >> wrote:
> >> >> >> > Dear XWiki devs
> >> >> >> >
> >> >> >> > We are using the XWiki platform for our applications but sadly are
> >> >> >> > still stuck with 2.7.2. Lately we ran into issues on a large
> >> >> >> > database and noticed "disappearing" BaseObjects. We were able to
> >> >> >> > link it to XWIKI-6990 <http://jira.xwiki.org/browse/XWIKI-6990>,
> >> >> >> > where hibernate IDs collided (hash collisions) and overwrote other
> >> >> >> > objects without any trace - neither visible in the history nor in
> >> >> >> > a log file.
> >> >> >> >
> >> >> >> > We analysed your implemented solution from 4.0+ in XWikiDocument
> >> >> >> > <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/doc/XWikiDocument.java#L841>
> >> >> >> > and BaseElement
> >> >> >> > <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/objects/BaseElement.java#L237>
> >> >> >> > and noticed that you changed the 32-bit String#hashCode to a
> >> >> >> > 64-bit MD5 hash, which makes a collision less likely. I have a few
> >> >> >> > questions regarding your solution:
> >> >> >> >
> >> >> >> > 1) Is there any specific reason why you have chosen MD5 over
> >> >> >> > SHA-1 or SHA-2?
> >> >> >> >
> >> >> >> > 2) Collisions are still possible and would be extremely hard to
> >> >> >> > notice since they are completely silent. Have you considered
> >> >> >> > implementing a collision detection to at least log occurring
> >> >> >> > collisions - or even better, reserving 1-2 bits of the 64 bits to
> >> >> >> > be used as a collision counter in case it happens?
> >> >> >> >
> >> >> >> > 3) To question the concept of generating a hash for an ID in
> >> >> >> > general: wouldn't a database-defined "auto increment" be a much
> >> >> >> > more robust solution for the hibernate IDs? A collision would be
> >> >> >> > impossible and clustering/scalability is still possible with e.g.
> >> >> >> > the InnoDB “interleaved” autoincrement lock mode. Why have you
> >> >> >> > chosen a hash-based solution in the first place?
> >> >> >> >
> >> >> >> > I'm sorry if these questions were already answered on the dev
> >> >> >> > mailing list or in issues; please link me to them since I couldn't
> >> >> >> > find any concrete answers.
> >> >> >> >
> >> >> >> > Thanks for your time and regards
> >> >> >> >
> >> >> >> > Marc Sladek
> >> >> >> > synventis gmbh
> >> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Thomas Mortagne
> >> >>
> >>
> >>
> >>
> >> --
> >> Thomas Mortagne
> >>
>
>
>
> --
> Thomas Mortagne
>