Consider the following scenario:
- User A logs in and an authenticator starts adding user A to group G, loading G in version 10.
- User B logs in and an authenticator starts adding user B to group G, loading G in version 10.
- The request of user A adds the group object with A at position 10 and saves.
- The request of user B adds the group object with B at the same position 10 and saves.
Expected result: The saving in step 4 fails and the request of user B re-tries saving. Actual result: The saving in step 4 succeeds and user A is removed from G again. In some authenticators, there is a lock on the group document and the cached document instance is updated, which means that in this scenario, actually everything might be okay. However, in cluster setups, if the requests arrive on different nodes of the cluster, the lock doesn't help in any way. Further, if both requests start when the document isn't in the cache, they could also end up with different instances of the document even without a cluster. The affects version is arbitrary, this issue should exist in basically all versions of XWiki. As part of XWIKI-13473, a lock has been added that prevents that step 3 and 4 run in parallel on a non-cluster setup but it doesn't prevent saving a document based on an old version and it has no effect on a cluster setup. I think what we should do is to check inside the save transaction if the revision number of the original document corresponds to the revision in the database and fail saving if it doesn't. Then the code that tried to save can reload the document and re-try. A problem is that we don't have a group API for adding users to groups and thus every authenticator has its own copy of the code for adding a user to a group that of course doesn't include such a loop. |