Re: [xwiki-users] Monitoring an Xwiki stack

1 Nov 2014

Hi Paul,
Many thanks for your contribution. I'll certainly look into Zabbix, although I must
confess to being aghast at what appears to be a large and complex tool for what I'd
hoped was quite simple. I hadn't realised these servers were so temperamental. Before
I lose myself in getting acquainted with a sophisticated new product, could you tell me
whether Zabbix (or something else) will help me identify the following:
- When are users suffering timeouts (doesn't have to be real time, happy to check
summary later)
- Where was the timeout occurring (network, Apache, Tomcat, Postgres)
- What was the cause of the timeout (too many connections, low memory, long Java
operation, long query, etc)
- What specific item (Java program, DB query) was responsible
I wonder whether all this should be discoverable in the logs, with the right
configuration.
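As a rough illustration of what "the right configuration" might give for the first
question (when users hit very slow requests): assuming Apache's access log is extended
with %D (time taken to serve the request, in microseconds), a small script could list
the slow requests after the fact. The log format, the 10-second threshold and the name
slow_requests are assumptions made for this sketch, not part of any existing setup, and
it only sees what Apache sees, not the cause inside Tomcat or Postgres.

```python
import re

# ASSUMPTION: Apache writes the common log format with %D appended, e.g.
#   LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed
# where %D is the time taken to serve the request, in microseconds.
LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ (?P<micros>\d+)$'
)


def slow_requests(lines, threshold_seconds=10.0):
    """Yield (timestamp, request, seconds) for requests at or above the threshold."""
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed format
        seconds = int(match.group("micros")) / 1_000_000
        if seconds >= threshold_seconds:
            yield match.group("time"), match.group("request"), seconds
```

It could be run over a day's access log by passing the open file object as `lines`.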
I've seen a lot of mention of JMX for Tomcat monitoring, but I've shied away from
it since I wanted to start simple, but perhaps there is no simple ... ;-(
________________________________________
From: Paul Libbrecht [paul(a)hoplahup.net]
Sent: 01 November 2014 09:41
To: XWiki Users
Subject: Re: [xwiki-users] Monitoring an Xwiki stack
Hello all,
Here's my experience at monitoring XWikis.
With i2geo.net and with my private XWiki, I use a Zabbix server.
This PHP-based monitoring tool is quite easy to configure for HTTP monitoring, and with a
few more steps you get a mail notification when, e.g., a timeout occurs in connections.
I had been using HypericHQ for a while, a Java-based monitoring tool, which was rather nice
to work with, but a machine name change broke everything, so I looked for something a tick
more modern.
At curriki.org, a site with lots of visitors, quite a few tools are used for
monitoring.
- First, for the safety and honesty of an outside system, alertsite.com is used. It is very
effective at detecting breakages, including potential ones in the internet backbones. We
monitor from three locations.
- Second, because, indeed, the XWiki servers sometimes need a push, there used to be a
regular script that checked a basic page and, if that failed, auto-restarted the app server.
For us, this is a bit unsafe because we like to control things after a restart.
- Third, for a while, we have been running a "combined monitoring", which combines a
small graphical view synced with the logs of Apache, the app server, thread dumps, and
MySQL. This has allowed us to catch "bad actions", which sometimes happen when power
users trigger queries so big that they lock out others (group deletions were one such
action).
- Finally, we also added a Zabbix instance, which collects HTTP monitoring as well as other
"classical" values (disks, memory, apache-stats, …).
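The check-and-restart script from the second point could look roughly like the sketch
below. It is not the script that ran at curriki: the function names, the retry count,
the pause and the restart command are all assumptions chosen for illustration.

```python
import subprocess
import time
import urllib.request


def page_is_healthy(url, timeout=10.0, fetch=urllib.request.urlopen):
    """Return True if the page answers with HTTP 200 within `timeout` seconds."""
    try:
        with fetch(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def check_and_restart(url, restart_cmd, attempts=3, pause=5.0, timeout=10.0,
                      fetch=urllib.request.urlopen, run=subprocess.run,
                      sleep=time.sleep):
    """Restart the app server only if `attempts` checks in a row fail.

    Returns True if the restart command was run, False otherwise.
    `restart_cmd` is a HYPOTHETICAL command list supplied by the caller.
    """
    for attempt in range(attempts):
        if page_is_healthy(url, timeout=timeout, fetch=fetch):
            return False  # page answered, nothing to do
        if attempt < attempts - 1:
            sleep(pause)  # give a busy server a moment before retrying
    run(restart_cmd, check=True)
    return True
```

Called from cron with a basic wiki page as `url`, this covers the "push" case; replacing
the restart command with a mail notification keeps a human in control after a failure.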
The rhythm at curriki is about a week… after a week, one of the two cluster nodes
(there are two currently) needs a restart because some memory gets exhausted and the GC
starts to fail. We generally get alertsite errors then.
The benefit of running a monitoring infrastructure such as Zabbix is that you can
analyze the behavior of multiple variables and see whether there is a way to predict when
things are going wrong. It remains a matter of gut feeling, but it still gives you quite
some confidence.
It would be really nice if we could converge on a set of JMX analysis "items"
for Zabbix so that we could analyze the XWiki-relevant information more concretely
(in particular the cache behavior) and start adjusting so that we run out of memory less often.
paul
