With "most of the code", do you also mean when new documents are being
created and stored (the scenario where collisions happen)?
Imho there is nothing preventing collisions when doing (see the sketch below):
* new XWikiDocument(docRef)
* modify instance
* exists check and save
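To make that scenario concrete, here is a rough sketch (class names from oldcore, exact signatures may differ between versions):

import com.xpn.xwiki.XWiki;
import com.xpn.xwiki.XWikiContext;
import com.xpn.xwiki.XWikiException;
import com.xpn.xwiki.doc.XWikiDocument;
import org.xwiki.model.reference.DocumentReference;

public class CreateDocSketch
{
    // Creation path: the exists check is done against the reference, while the
    // row in the xwikidoc table is keyed by the hashed id, so an unrelated
    // document with a colliding id can be silently overwritten by the save.
    public void createDoc(XWiki xwiki, XWikiContext context) throws XWikiException
    {
        DocumentReference docRef = new DocumentReference("xwiki", "Some", "NewPage");
        XWikiDocument doc = new XWikiDocument(docRef);
        doc.setContent("some content");

        if (!xwiki.exists(docRef, context)) {
            // passes even if doc.getId() collides with an existing, unrelated document
            xwiki.saveDocument(doc, context);
        }
    }
}
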
On 2 February 2018 at 15:28, Thomas Mortagne <thomas.mortagne(a)xwiki.com>
wrote:
On Fri, Feb 2, 2018 at 3:07 PM, Marc Sladek <marc.sladek(a)synventis.com> wrote:
> Introducing this certainly doesn't hurt, but I'm not sure how useful it is.
I never said it was the solution to all collisions but it will cover
most of the (very rare and never reported) overwrites at 0 cost.
> Firstly, it can show collisions only after a document is already
> overwritten, thus the damage is already done.
Again keep in mind that most of the code does:
* getDocument(DocumentReference)
* modify the XWikiDocument instance
* saveDocument()
so if getDocument() fails, you are not going to overwrite anything (see the sketch below).
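For illustration, the typical pattern looks roughly like this (simplified sketch, exact signatures may differ between versions):

import com.xpn.xwiki.XWiki;
import com.xpn.xwiki.XWikiContext;
import com.xpn.xwiki.XWikiException;
import com.xpn.xwiki.doc.XWikiDocument;
import org.xwiki.model.reference.DocumentReference;

public class ModifyDocSketch
{
    // Modification path: the document is loaded first, so a store that detects
    // an id/reference mismatch at load time can fail before anything is
    // overwritten by the save below.
    public void modifyDoc(XWiki xwiki, DocumentReference docRef, XWikiContext context)
        throws XWikiException
    {
        XWikiDocument doc = xwiki.getDocument(docRef, context);
        doc.setContent("updated content");
        xwiki.saveDocument(doc, context);
    }
}
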
> Secondly, loadXWikiDoc has to be called for the document which doesn't
> exist anymore, I guess this doesn't happen so often since the system won't
> list it anymore.
Not sure which use case you are referring to here. Are you talking about a
document being deleted and then a document with a different reference but
the same hash being saved?
On 2 February 2018 at 13:36, Thomas Mortagne <thomas.mortagne(a)xwiki.com>
wrote:
For documents, what could help a lot already, without any performance
penalty, is to compare the loaded document reference with the passed one in
XWikiHibernateStore#loadXWikiDoc. That's because most of the code in XWiki
applies the following logic: getDocument(), modify it, saveDocument().
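A rough sketch of what such a guard could look like (method and variable names invented, not the actual code):

import com.xpn.xwiki.XWikiException;
import com.xpn.xwiki.doc.XWikiDocument;
import org.xwiki.model.reference.DocumentReference;

public class LoadGuardSketch
{
    // Hypothetical check after the document has been loaded by its hashed id:
    // if the row found under that id belongs to another reference, fail instead
    // of silently returning (and later overwriting) the wrong document.
    void checkLoadedReference(XWikiDocument loaded, DocumentReference requested)
        throws XWikiException
    {
        if (!loaded.isNew() && !loaded.getDocumentReference().equals(requested)) {
            throw new XWikiException(XWikiException.MODULE_XWIKI_STORE,
                XWikiException.ERROR_XWIKI_UNKNOWN,
                "Id collision: [" + requested + "] and ["
                    + loaded.getDocumentReference() + "] share the same id");
        }
    }
}
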
On Fri, Feb 2, 2018 at 12:19 PM, Marc Sladek <marc.sladek(a)synventis.com> wrote:
> > Hi Denis,
> >
> > Thanks a lot for your answer. I know it's been a while, but I'd still
> > like to follow up on it since it's quite the fundamental issue.
> >
> >> Therefore, by improving the hash algorithm, the size of the ids, and
> >> the quality of the hashed key, we have considered ourselves to be safe
> >> enough for normal usage.
> >
> > Still, with enough bad luck, documents and objects may be overwritten
> > without a trace. This is not a stable implementation. And even worse, if
> > on any XWiki installation hash collisions happen in the future (or have
> > already happened since 4.x), they probably won't be easily associated
> > with this issue because it's nearly impossible to debug.
> >
> > While I do now understand the motivation to stick with hashes, I'm
> > still not sure why a collision detection would be difficult to introduce
> > and why it's even "impossible for some API". Let me briefly outline an
> > idea:
> >
> > In XWikiHibernateStore#saveXWikiDoc on L615
> > <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/store/XWikiHibernateStore.java#L615>
> > an exists check on the doc id is already performed. If now
> > xwikidoc.fullName is also selected in the HQL, a comparison to
> > doc.getDocumentReference() can expose an imminent collision before data
> > is overwritten. At least an XWikiException should be thrown in this
> > case. A similar thing could be done before saving BaseObjects on L1203
> > <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/store/XWikiHibernateStore.java#L1203>
> > to avoid collisions on Object IDs.
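> >
> > To make this concrete, a rough sketch of the document check (the HQL,
> > helper shape and exception codes are my guesses, untested):
> >
> > import org.hibernate.Query;
> > import org.hibernate.Session;
> > import com.xpn.xwiki.XWikiException;
> > import com.xpn.xwiki.doc.XWikiDocument;
> >
> > public class SaveGuardSketch
> > {
> >     // Hypothetical helper for XWikiHibernateStore#saveXWikiDoc: extend the
> >     // existing exists check so it also fetches the full name stored under
> >     // the colliding id and fails before anything is overwritten.
> >     void checkDocumentIdCollision(Session session, XWikiDocument doc)
> >         throws XWikiException
> >     {
> >         Query query = session.createQuery(
> >             "select xwikidoc.fullName from XWikiDocument as xwikidoc"
> >                 + " where xwikidoc.id = :id");
> >         query.setLong("id", doc.getId());
> >         String existingFullName = (String) query.uniqueResult();
> >         if (existingFullName != null && !existingFullName.equals(doc.getFullName())) {
> >             throw new XWikiException(XWikiException.MODULE_XWIKI_STORE,
> >                 XWikiException.ERROR_XWIKI_UNKNOWN,
> >                 "Id collision: id [" + doc.getId() + "] of [" + doc.getFullName()
> >                     + "] is already used by [" + existingFullName + "]");
> >         }
> >     }
> > }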
> >
> > I don't think a change like this would be difficult to implement; I
> > could provide a PR of that sort. The performance penalty has to be
> > tested for your systems though, since the full name isn't indexed afaik.
> >
> > Regards
> >
> > Marc Sladek
> > synventis gmbh
> >
> > On 30 November 2017 at 15:21, Denis Gervalle <dgl(a)softec.lu> wrote:
> >
> >> Hi Marc,
> >>
> >> Here are some answers:
> >>
> >> 1) MD5 was already a dependency of our oldcore and using SHA1 would
> >> have added a dependency without bringing much benefit. Since we only
> >> used 64 bits of the MD5 anyway, I doubt using SHA1 would have provided a
> >> better distribution.
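> >>
> >> For illustration, deriving a 64-bit id from an MD5 digest looks roughly
> >> like this (a sketch of the principle, not the exact oldcore code):
> >>
> >> import java.nio.charset.StandardCharsets;
> >> import java.security.MessageDigest;
> >> import java.security.NoSuchAlgorithmException;
> >>
> >> public final class IdHashSketch
> >> {
> >>     // Take the first 8 bytes (64 bits) of the MD5 digest of the
> >>     // serialized entity key.
> >>     public static long toId(String localKey) throws NoSuchAlgorithmException
> >>     {
> >>         byte[] digest = MessageDigest.getInstance("MD5")
> >>             .digest(localKey.getBytes(StandardCharsets.UTF_8));
> >>         long id = 0;
> >>         for (int i = 0; i < 8; i++) {
> >>             id = (id << 8) | (digest[i] & 0xffL);
> >>         }
> >>         return id;
> >>     }
> >> }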
> >>
> >> 2) Such a collision detection is difficult to introduce in the
> >> existing code base; for some API it is even impossible. What you
> >> experienced with the 32-bit ids had been my motivation for the changes
> >> in 4.x, and I could say, based on my long XWiki experience, that even
> >> with the poor 32-bit ids very few users had been affected. Therefore, by
> >> improving the hash algorithm, the size of the ids, and the quality of
> >> the hashed key, we have considered ourselves to be safe enough for
> >> normal usage.
> >>
> >> 3) That’s the worst point. I cannot answer about the first decision, I
> >> wasn't yet involved, but regarding the changes introduced in 4.0, such a
> >> change had been considered. The ids are only there to satisfy Hibernate
> >> and its loading mechanism. If we had used a counter, we would have had
> >> to manage a conversion table between ids and entity references, with all
> >> the additional complexity (consistency issues, caching, ...). This is
> >> because we use entity references to point directly to documents (or even
> >> objects) everywhere in XWiki. It would have been a huge amount of work
> >> to introduce that behaviour while keeping all the existing API
> >> unchanged, and it would probably have introduced a performance penalty
> >> as well. This is why we gave up and went for an improved hash solution.
> >> IMO, if we had to make such a change, we would be better off rewriting
> >> the storage service completely, and even stop using Hibernate, which, to
> >> be honest, does not bring much benefit to XWiki with its ORM aspects.
> >>
> >> But if you really want the complete answers, you can look at those
> >> threads:
> >> http://xwiki.markmail.org/thread/fuprtrnupz2uy37f
> >> http://xwiki.markmail.org/thread/fsd25bvft74xwgcx
> >>
> >> Regards,
> >>
> >> --
> >> Denis Gervalle
> >> SOFTEC sa - CEO
> >>
> >> On 30 Nov 2017, 14:14 +0100, Marc Sladek <marc.sladek(a)synventis.com>
> >> wrote:
> >> > Dear XWiki devs
> >> >
> >> > We are using the XWiki platform for our applications but sadly are
> >> > still stuck with 2.7.2. Lately we ran into issues on a large database
> >> > and noticed "disappearing" BaseObjects. We were able to link it to
> >> > XWIKI-6990 <http://jira.xwiki.org/browse/XWIKI-6990>, where Hibernate
> >> > IDs collided (hash collisions) and overwrote other objects without any
> >> > trace - neither visible in the history nor in a log file.
> >> >
> >> > We analysed your implemented solution from 4.0+ in XWikiDocument
> >> > <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/doc/XWikiDocument.java#L841>
> >> > and BaseElement
> >> > <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/objects/BaseElement.java#L237>
> >> > and noticed that you changed the 32-bit String#hashCode to a 64-bit
> >> > MD5, which makes a collision less likely. I have a few questions
> >> > regarding your solution:
> >> >
> >> > 1) Is there any specific reason why you have chosen MD5 over SHA-1 or 2?
> >> >
> >> > 2) Collisions are still possible and would be extremely hard to notice
> >> > since they are completely silent. Have you considered implementing a
> >> > collision detection to at least log occurring collisions - or, even
> >> > better, reserve 1-2 bits of the 64-bit id to be used as a collision
> >> > counter in case it happens?
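> >> >
> >> > To illustrate that idea (a purely hypothetical layout, not an existing
> >> > API):
> >> >
> >> > public final class CollisionCounterIdSketch
> >> > {
> >> >     // Hypothetical layout: the lower 62 bits come from the hash, the
> >> >     // top 2 bits hold a collision counter (0-3), bumped when the hash
> >> >     // part is already taken by a different entity.
> >> >     public static long withCounter(long hashedId, int collisionCounter)
> >> >     {
> >> >         long hashPart = hashedId & 0x3FFFFFFFFFFFFFFFL;
> >> >         return ((long) (collisionCounter & 0x3) << 62) | hashPart;
> >> >     }
> >> > }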
> >> >
> >> > 3) To question the concept of generating a hash for an ID in general:
> >> > Wouldn't a database-defined "auto increment" be a much more robust
> >> > solution for the Hibernate IDs? A collision would be impossible, and
> >> > clustering/scalability is still possible with e.g. the InnoDB
> >> > "interleaved" auto-increment lock mode. Why have you chosen a
> >> > hash-based solution in the first place?
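> >> >
> >> > For illustration only - oldcore actually assigns the ids itself via its
> >> > hbm.xml mappings, but a database-generated id would look roughly like
> >> > this in JPA-annotation style:
> >> >
> >> > import javax.persistence.Entity;
> >> > import javax.persistence.GeneratedValue;
> >> > import javax.persistence.GenerationType;
> >> > import javax.persistence.Id;
> >> >
> >> > @Entity
> >> > public class DocumentRow
> >> > {
> >> >     // IDENTITY delegates id generation to the database, e.g. MySQL
> >> >     // AUTO_INCREMENT, so two different rows can never share an id.
> >> >     @Id
> >> >     @GeneratedValue(strategy = GenerationType.IDENTITY)
> >> >     private long id;
> >> > }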
> >> >
> >> > I'm sorry if these questions were already answered on the dev mailing
> >> > list or in issues; please link me to them since I couldn't find any
> >> > concrete answers.
> >> >
> >> > Thanks for your time and regards
> >> >
> >> > Marc Sladek
> >> > synventis gmbh
> >>
>
>
>
> --
> Thomas Mortagne
>
--
Thomas Mortagne