Hi devs,

Yesterday xwiki.org crashed and I had configured it to take a heap dump. I’ve done a
quick analysis that I’m sharing here (I’ll continue to analyse):
Memory retained: 1GB
Main contenders:
1) Document cache: 178MB
2) Lucene WeightedSpanTermExtractor: 166MB
3) IRCBot Threads: 165MB
4) Velocity RuntimeInstance: 38MB
5) SOLR LRUCache (Lucene Document): 38MB
6) EM DefaultCoreExtensionRepository: 38MB
7) NamespaceURLClassLoader: 23MB
I’ve started analyzing some of them below.
1 - Document Cache Analysis
=======================
* There are 3552 XWikiDocument instances in memory, for a total of 195MB
* The document cache size is 2000 on xwiki.org
* Large documents (such as Test Reports) take 6MB each (XDOM caching)
* So if we had only large documents in the wiki, the cache would need 2000*6MB = 12GB
* I don’t think this cache is memory aware, meaning it doesn’t free its entries when
memory is low
* 178MB for 2000 docs means an average of 89KB per document. There’s a huge variation
between docs with big content and docs with little or no content.
This means that when memory is low on xwiki.org, it should be enough to request a few
pages with large content to get an OOM.
4 ideas to explore:
Idea 1: Use a cache that evicts entries when some max threshold is reached
** Infinispan doesn’t support this yet:
https://issues.jboss.org/browse/ISPN-863 and
https://community.jboss.org/thread/165951?start=0&tstart=0
** Guava seems to support size-based eviction with the ability to pass a weight
function (see the sketch after this list of ideas):
http://code.google.com/p/guava-libraries/wiki/CachesExplained#Size-based_Ev…
Idea 2: Use a distributed cache such as memcached or elasticsearch. I wonder if the
overhead of the network communication would be too high to make it interesting compared
to not caching the XDOM and rendering it every time it’s needed.
Idea 3: Try to reduce even further the memory footprint of the XDOM we store
Idea 4: Don’t cache the XDOM at all: render it every time and use a dedicated title
cache for titles. Also do that for getting sections. I think these are the 2 main use
cases for getting the XDOM.
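
To make Idea 1 more concrete, here’s a minimal sketch of what a weight-based document
cache could look like with Guava. The 100MB budget, the String key and the estimateSize()
helper are made up for the example; the real difficulty would be estimating the in-memory
size of an XWikiDocument (especially its cached XDOM):

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.Weigher;
    import com.xpn.xwiki.doc.XWikiDocument;

    public class WeightedDocumentCache
    {
        // Bound the cache by estimated memory weight instead of by entry count.
        // 100MB is an arbitrary budget used for the example.
        private final Cache<String, XWikiDocument> cache = CacheBuilder.newBuilder()
            .maximumWeight(100L * 1024 * 1024)
            .weigher(new Weigher<String, XWikiDocument>()
            {
                public int weigh(String key, XWikiDocument document)
                {
                    return estimateSize(document);
                }
            })
            .build();

        // Made-up helper: a very rough estimate using the content length as a
        // proxy for the document's memory footprint.
        private int estimateSize(XWikiDocument document)
        {
            return Math.max(1, document.getContent().length());
        }
    }

The weigher would only give an approximation, but even a rough weight would already be
much better than a pure entry-count limit.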
As a short term action, I’d recommend immediately reducing the document cache size from
2000 to 1000 on xwiki.org, or doubling the heap memory.
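
If I remember correctly, the document cache size is controlled by this property in
xwiki.cfg (to be double-checked against what’s deployed on xwiki.org):

    # Number of documents kept in the document cache
    xwiki.store.cache.capacity=1000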
2 - Lucene WeightedSpanTermExtractor Analysis
=====================================
I’m not sure what this is about yet but it looks strange.
* There is 166MB stored in the Map<String,AtomicReaderContext> of
WeightedSpanTermExtractor.
* That map contains 192 entries
* Example of map items: “doccontent_pt” (2.4MB), “title_ru” (1.8MB), “title_ro” (1.8MB),
etc
Any idea Marius?
3 - IRCBot Analysis
===============
* We use 3 IRCBot threads. They take 55MB each!
* The 55MB is taken by the ExecutionContext
* More precisely, the 55MB are held in 77371 org.apache.velocity.runtime.parser.node.Node[]
arrays
I need to understand better why it’s so large, since it doesn’t look normal.
I also wonder whether it keeps increasing over time.
5 - SOLR LRUCache Analysis
=======================
* It’s a map of 512 entries (Lucene Document objects); 512 is the cache size.
* Entries are instances of DocSlice
Looks ok and normal.
6 - EM DefaultCoreExtensionRepository Analysis
======================================
* 38MB in "Map<String, DefaultCoreExtension> extensions"
* 33MB in org.codehaus.plexus.util.xml.Xpp3Dom instances (44844 instances), which I guess
mostly correspond to the parsed pom.xml of all our core extensions.
Looks normal even though 33MB is quite a lot.
7 - NamespaceURLClassLoader Analysis
================================
* 23MB in org.eclipse.jgit.storage.file.WindowCache
* So this seems related to the XWiki Git Module used by the GitHubStats application
installed on dev.xwiki.org
This looks ok and normal according to
http://download.eclipse.org/jgit/docs/jgit-2.0.0.201206130900-r/apidocs/org…
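
FTR, if we ever wanted to cap that memory, JGit’s window cache seems tunable globally.
Here’s a rough sketch based on my reading of the 2.x API (the 10MB value is an arbitrary
example and the exact API would need to be double-checked):

    import org.eclipse.jgit.storage.file.WindowCache;
    import org.eclipse.jgit.storage.file.WindowCacheConfig;

    public class JGitCacheTuning
    {
        public static void main(String[] args)
        {
            // Shrink the global pack file window cache so that it retains
            // less data in memory. 10MB is just an example value.
            WindowCacheConfig config = new WindowCacheConfig();
            config.setPackedGitLimit(10 * 1024 * 1024);
            WindowCache.reconfigure(config);
        }
    }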
Thanks
-Vincent