Re: [xwiki-devs] BaseObject ID collisions

2 Feb 2018

      By "most of the code" I mean most of the code I know :)

AFAIK everything in XWiki platform and in most extensions is doing
this and it includes of course everything you do trough the standard
UI.

On Fri, Feb 2, 2018 at 3:51 PM, Marc Sladek <[email protected]> wrote:
...
With "most of the code", do you also mean when new documents are being
created and stored (the scenario where collisions happen)?
Imho there is nothing preventing collisions when doing:
* new XWikiDocument(docRef)
* modify instance
* exists check and save
On 2 February 2018 at 15:28, Thomas Mortagne <[email protected]>
wrote:
...
On Fri, Feb 2, 2018 at 3:07 PM, Marc Sladek <[email protected]>
wrote:
...
Introducing this certainly doesn't hurt, but 'm not sure how useful it
is.
I never said it was the solution to all collisions but it will cover
most of the (very rare and never reported) overwrites at 0 cost.
...
Firstly, it can show collisions only after a document is already
overwritten, thus the damage is already done.
Again keep in mind that most of the code does:
* getDocument(DocumentReference)
* modify the XWikiDocument instance
* saveDocument()
so if getDocument() fail you are not going to overwrite anything.
...
Secondly, loadXWikiDoc has to
be called for the document which doesn't exist anymore, I guess this
doesn't happen so often since the system won't list it anymore.
Not sure which use case you are referring to here. Are you talking
about document deleted the a document with a different reference but
same hash is saved ?
...
On 2 February 2018 at 13:36, Thomas Mortagne <[email protected]>
wrote:
...
For document what could help a lot already without any performance
penalty is to compare the loaded document reference and the passed one
in XWikiHibernateStore#loadXWikiDoc. That's because most of the code
in XWiki apply the following logic: getDocument(), modify it,
saveDocument().
On Fri, Feb 2, 2018 at 12:19 PM, Marc Sladek <[email protected]
...
wrote:
...
Hi Denis,
Thanks a lot for your answer. I know it's been a while, but I'd still
like
to follow up on it since it's quite the fundamental issue.
...
Therefore, by improving the hash algorithm, the size of the ids, and
the
...
...
...
quality of the hashed key, we have considered ourselves to be saved
enough
for a normal usage.
Still, with enough bad luck, documents and objects may be overwritten
without a trace. This is not a stable implementation. And even worse,
if
on
any XWiki installation hash collisions will happen in the future (or
have
already happened since 4.x), they probably won't be easily associated
with
this issue because it's nearly impossible to debug.
While I do now understand the motivation to stick with hashes, I'm
still
not sure why a collision detection would be difficult to introduce and
why
it's even "impossible for some API". Let me briefly outline an idea:
In XWikiHibernateStore#saveXWikiDoc on L615
<https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/
xwiki-platform-core/xwiki-platform-oldcore/src/main/java/
com/xpn/xwiki/store/XWikiHibernateStore.java#L615>
an exists check on the doc id is already performed. If now
xwikidoc.fullName is also selected in the HQL, a comparison to
doc.getDocumentReference() can expose an imminent collision before
data
is
overwritten. At least an XWikiException should be thrown in this
case. A
similar thing could be done before saving BaseObjects on L1203
<https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/
xwiki-platform-core/xwiki-platform-oldcore/src/main/java/
com/xpn/xwiki/store/XWikiHibernateStore.java#L1203>
to avoid collisions on Object IDs.
I don't think a change like this would be difficult to implement, I
could
provide a PR of that sort. The performance penalty has to be tested
for
your systems though, since the full name isn't indexed afaik.
Regards
Marc Sladek
synventis gmbh
On 30 November 2017 at 15:21, Denis Gervalle <[email protected]> wrote:
...
Hi Marc,
Here are some answers:
1) MD5 was already a dependency of our oldcore and using SHA1 would
have
added a dependency without bringing much benefit. Since we only used
64
bits of the MD5 anyway, I doubt using SHA1 would have provided a
better
distribution.
2) Such a collision detection is difficult to be introduced in the
existing code base, for some API it is even impossible. What you
experience
with the 32-bit ids had been my motivation to the changes in 4.x and
I
could say, based on my long XWiki experience, that even with the poor
32 bit ids, very few users had been affected. Therefore, by improving
the
hash algorithm, the size of the ids, and the quality of the hashed
key,
we
have considered ourselves to be saved enough for a normal usage.
3) That’s the worst point. I cannot answer about the first decision,
I
wasn’t yet involve, but regarding the changes introduced in 4.0,  a
change
had been considered. The ids are only there to satisfy Hibernate and
its
loading mechanism. If we had used a counter, we had to manage a
conversion
table between ids and entity references with all the additional
complexity
(consistency issues, caching, ...). This is so because we use entity
reference to point directly to document (or even objects) everywhere
in
XWiki. This would have been a huge work to introduce that behaviour
and
at
the same time keeping all the existing API unchanged. It would
probably
have introduced a performance penalty as well. This is why we
resigned
and
go for an improved hash solution. IMO, if we had to make such a
change,
we
are even better rewriting the storage service completely, and even
stop
using Hibernate, which, to be honest, does not bring much benefit to
XWiki with its ORM aspects.
But if you really want the complete answers, you can look at those
threads:
http://xwiki.markmail.org/thread/fuprtrnupz2uy37f
http://xwiki.markmail.org/thread/fsd25bvft74xwgcx
Regards,
--
Denis Gervalle
SOFTEC sa - CEO
On 30 Nov 2017, 14:14 +0100, Marc Sladek <[email protected]
,
wrote:
> Dear XWiki devs
>
> We are using the XWiki platform for our applications but sadly are
still
> stuck with 2.7.2. Lately we ran into issues on a large database and
noticed
> "disappearing" BaseObjects. We were able to link it to XWIKI-6990
> <http://jira.xwiki.org/browse/XWIKI-6990>, where hibernate IDs
collided
> (hash collisions) and overwrote other objects without any trace -
neither
> visible in the history nor in a log file.
>
> We analysed your implemented solution from 4.0+ in XWikiDocument
> <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/x
wiki-platform-core/xwiki-platform-oldcore/src/main/java/com/
xpn/xwiki/doc/XWikiDocument.java#L841
> and BaseElement
> <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/x
wiki-platform-core/xwiki-platform-oldcore/src/main/java/com/
xpn/xwiki/objects/BaseElement.java#L237
> and
> noticed that you changed the 32bit String#hashCode to 64bit MD5,
which
> makes a collision less likely. I have a few questions regarding
your
> solution:
>
> 1) Is there any specific reason why you have chosen MD5 over SHA-1
or
2?
>
> 2) Collisions are still possible and would be extremely hard to
notice
> since they are completely silent. Have you considered to implement
a
> collision detection to at least log occurring collisions - or even
better
> reserve 1-2bits of the 64bit to be used as collision counter in the
case
of
> it happening?
>
> 3) To question the concept of generating a hash for an ID in
general:
> Wouldn't a database defined "auto increment" be a much more robust
solution
> for the hibernate IDs? A collision would be impossible and
> clustering/scalability is still possible with e.g. the InnoDB
“interleaved”
> autoincrement lock mode. Why have you chosen a hash based solution
in
the
> first place?
>
> I'm sorry if these questions were already answered in the dev
mailing
list
> or on issues, please link me to them since I couldn't find any
concrete
> answers.
>
> Thanks for your time and regards
>
> Marc Sladek
> synventis gmbh
--
Thomas Mortagne
--
Thomas Mortagne
-- 
Thomas Mortagne