Steps to reproduce: Use a lot of async content like the Jira macro (see JIRA-58) on a busy instance.

Expected result: The async content always loads.

Actual result: Sometimes the async content cannot be loaded because XWiki cannot find the job status.

I think this happens because the async jobs are stored in a cache only. This cache has a limited size, and there is no guarantee that a cache actually keeps a value it was given, so a job status can disappear before the client asks for it. That would explain the failures.

To me, the proper fix is to stop using a cache for async items and instead use a data structure with more reliable storage. I thought about this some time ago and did a quick brainstorming session with an LLM. Basically, the idea would be to:
- use a ConcurrentHashMap to store the actual data.
- use a ConcurrentSkipListSet sorted by last access time to manage time-based eviction. This makes it cheap to find and remove the oldest entries (arbitrary entries can also be found in the set through the timestamp that we would also store in the map). Evictions could be handled by a dedicated thread or by insertions. For the latter we would need some strategy to limit contention, for example a lock that each insertion tries to acquire without waiting; the insertion that gets the lock removes some entries.
- configure a maximum size. When an item is inserted while the data structure is full and no item can be evicted, the insertion would fail and async rendering would fall back to synchronous rendering, to prevent storing too much data in memory.
The maximum size wouldn't necessarily be perfectly maintained in the case of concurrent insertions but this should be okay.
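
To make the idea more concrete, here is a minimal sketch of such a data structure, not a finished implementation. Everything in it is an assumption made for illustration: the class name `BoundedJobStore`, the `minIdleMillis` threshold that decides when an entry counts as evictable, and the injectable clock are all hypothetical and are not existing XWiki APIs. Eviction is done by insertions here, guarded by a `tryLock` so that only one inserting thread evicts at a time.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a bounded store for async job results (not an XWiki class). */
final class BoundedJobStore<K, V> {
    /** Immutable index entry; a fresh one is created on every access. */
    private static final class Stamp<K> {
        final long time; final long seq; final K key;
        Stamp(long time, long seq, K key) { this.time = time; this.seq = seq; this.key = key; }
    }

    /** Value plus the stamp that currently represents it in the index. */
    private static final class Holder<K, V> {
        final V value; final Stamp<K> stamp;
        Holder(V value, Stamp<K> stamp) { this.value = value; this.stamp = stamp; }
    }

    private final ConcurrentHashMap<K, Holder<K, V>> data = new ConcurrentHashMap<>();
    // Sorted by last access time; the sequence number keeps equal timestamps distinct.
    private final ConcurrentSkipListSet<Stamp<K>> byAccess = new ConcurrentSkipListSet<>(
        Comparator.<Stamp<K>>comparingLong(s -> s.time).thenComparingLong(s -> s.seq));
    private final AtomicLong seq = new AtomicLong();
    private final ReentrantLock evictionLock = new ReentrantLock();
    private final int maxSize;
    private final long minIdleMillis; // assumption: entries idle at least this long may be evicted
    private final LongSupplier clock;

    BoundedJobStore(int maxSize, long minIdleMillis, LongSupplier clock) {
        this.maxSize = maxSize;
        this.minIdleMillis = minIdleMillis;
        this.clock = clock;
    }

    /** Returns false when the store is full and nothing can be evicted. */
    boolean put(K key, V value) {
        if (data.size() >= maxSize && !data.containsKey(key)) {
            evictIdleEntries();
            if (data.size() >= maxSize) {
                return false; // caller should fall back to synchronous rendering
            }
        }
        // compute() is atomic per key, so map and index stay consistent for that key.
        data.compute(key, (k, old) -> {
            if (old != null) {
                byAccess.remove(old.stamp);
            }
            return new Holder<>(value, newStamp(k));
        });
        return true;
    }

    /** Returns the value and refreshes its last access time, or null if absent. */
    V get(K key) {
        Holder<K, V> holder = data.computeIfPresent(key, (k, old) -> {
            byAccess.remove(old.stamp);
            return new Holder<>(old.value, newStamp(k));
        });
        return holder == null ? null : holder.value;
    }

    int size() {
        return data.size();
    }

    private Stamp<K> newStamp(K key) {
        Stamp<K> stamp = new Stamp<>(clock.getAsLong(), seq.incrementAndGet(), key);
        byAccess.add(stamp);
        return stamp;
    }

    private void evictIdleEntries() {
        if (!evictionLock.tryLock()) {
            return; // another insertion is already evicting, do not wait
        }
        try {
            long cutoff = clock.getAsLong() - minIdleMillis;
            for (Stamp<K> s : byAccess) { // weakly consistent iterator, oldest first
                if (s.time > cutoff) {
                    break; // everything from here on was accessed too recently
                }
                // Drop the entry only if it was not refreshed in the meantime.
                data.computeIfPresent(s.key, (k, h) -> h.stamp == s ? null : h);
                byAccess.remove(s);
            }
        } finally {
            evictionLock.unlock();
        }
    }
}
```

A caller would check the result of `put` and render synchronously when it returns `false`. As noted above, the size check and the insertion are not atomic, so concurrent insertions can briefly exceed `maxSize`. A dedicated eviction thread could call the same eviction logic periodically instead of relying on insertions.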