Hello all,
Here's my experience with monitoring XWikis.
For i2geo.net and for my private XWiki, I use a Zabbix server.
This PHP-based monitoring tool is quite easy to configure for HTTP monitoring, and with a
few more steps you get a mail notification when, e.g., a connection times out.
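To make the idea concrete, here is a minimal Python sketch of such a check: probe a page with a timeout and prepare a notification mail when the probe fails. This is not how Zabbix does it internally, only an illustration of the principle. The names check_url and build_alert, the example URL and the mail addresses are all made up for this sketch.

```python
import urllib.request
from email.message import EmailMessage


def check_url(url, timeout=10.0, opener=urllib.request.urlopen):
    """Probe url once and return (ok, detail).

    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, refused connections and timeouts.
        return False, f"connection problem: {exc}"
    if status == 200:
        return True, "HTTP 200"
    return False, f"HTTP {status}"


def build_alert(url, detail, sender, recipient):
    """Build (but do not send) the notification mail for a failed check."""
    message = EmailMessage()
    message["Subject"] = f"[monitor] {url} failed: {detail}"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(f"The check of {url} failed ({detail}).")
    return message
```

Sending the message is left out on purpose; with a local mail relay it could be done with smtplib.SMTP("localhost").send_message(message), assuming such a relay exists on the monitoring host.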
I had been using HypericHQ for a while, a Java-based monitoring tool, which was rather nice
to work with, but a machine name change broke everything, so I looked for something a bit
more modern.
At curriki.org, a site with lots of visitors, quite a few monitoring tools are in use.
- First, for the safety and honesty of an outside system, alertsite.com is used. It is very
effective at detecting breakages, including possible Internet backbone problems. We
monitor from three locations.
- Second, because the XWiki servers do sometimes need a push, there used to be a script
that ran regularly, checked a basic page and, if the check failed, restarted the app-server
automatically. For us, this is a bit unsafe because we like to control things after a restart.
- Third, for a while, we have been running a "combined monitoring" which combines
a small graphical view synced with the logs of Apache, the app-server,
thread dumps, and MySQL. This has allowed us to catch "bad actions", which sometimes
happen when power users perform actions that trigger queries so big that they lock out
other users (group deletions were one such action).
- Finally, we also added a Zabbix which collects HTTP monitoring data as well as other
"classical" values (disks, memory, Apache stats, …).
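The check-and-restart script from the second point can be sketched roughly as follows. This is not the script that ran at curriki, just a small Python illustration; watch, check and on_failure are names invented here. The reaction is a callback, so it can restart the app-server or merely notify someone, which fits better if you want to control things by hand after a restart.

```python
import time


def watch(check, on_failure, max_failures=3, interval=60.0,
          rounds=None, sleep=time.sleep):
    """Run check() every `interval` seconds.

    on_failure() is called once `max_failures` checks in a row have failed;
    the counter is then reset. `rounds=None` means run forever.
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        if check():
            failures = 0  # a single success clears earlier failures
        else:
            failures += 1
            if failures >= max_failures:
                on_failure()
                failures = 0
        done += 1
        sleep(interval)
```

Requiring several failures in a row avoids reacting to a single slow page, at the price of a slower reaction to a real outage.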
The rhythm at curriki is about a week: after a week, one of the cluster nodes
(there are currently two) needs a restart because memory gets exhausted and the GC
starts to fail. We generally get alertsite errors then.
The benefit of running a monitoring infrastructure such as Zabbix is that you can
analyze the behavior of multiple variables and see whether there is a way to predict that
things are going wrong. It remains a matter of gut feeling, but it still gives you quite some
confidence.
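As one example of such a prediction, one could fit a straight line through the heap usage collected over the last days and estimate when it would reach the configured maximum. The sketch below is plain Python and not a Zabbix feature; the function names and the sample numbers in the usage note are invented, and real memory curves are rarely this linear.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_limit(hours, used_mb, limit_mb):
    """Estimate the hours left after the last sample until used_mb hits limit_mb.

    Returns None when usage is flat or shrinking, i.e. no exhaustion in sight.
    """
    slope, intercept = linear_fit(hours, used_mb)
    if slope <= 0:
        return None
    crossing = (limit_mb - intercept) / slope
    return max(0.0, crossing - hours[-1])
```

For instance, with samples taken at hours 0, 24, 48 and 72 showing 1000, 1200, 1400 and 1600 MB, and a 2000 MB limit, the estimate is about 48 more hours.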
It would be really nice if we could converge on a set of JMX analysis "items"
for Zabbix so that we could analyze the XWiki-relevant information more concretely
(in particular the cache behavior) and start tuning so that we run out of memory less often.
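As a possible starting point, Zabbix JMX items use keys of the form jmx["object name","attribute"]. The small Python helper below builds such keys for a few standard JVM MBeans. The helper name and the choice of items are only a suggestion; the XWiki cache MBeans are deliberately not listed because their names would have to be looked up first (for example with jconsole), and garbage collector MBean names depend on the collector the JVM runs with.

```python
def jmx_item_key(object_name, attribute):
    """Build a Zabbix JMX item key: jmx["<object name>","<attribute>"]."""
    def quote(value):
        # Inside a quoted key parameter, double quotes are backslash-escaped.
        return '"' + value.replace('"', '\\"') + '"'
    return f"jmx[{quote(object_name)},{quote(attribute)}]"


# Standard java.lang MBeans that every JVM exposes.
STARTER_ITEMS = [
    ("java.lang:type=Memory", "HeapMemoryUsage.used"),
    ("java.lang:type=Memory", "HeapMemoryUsage.max"),
    ("java.lang:type=Threading", "ThreadCount"),
    ("java.lang:type=ClassLoading", "LoadedClassCount"),
]

STARTER_KEYS = [jmx_item_key(name, attr) for name, attr in STARTER_ITEMS]
```

The first entry of STARTER_KEYS is jmx["java.lang:type=Memory","HeapMemoryUsage.used"], which can be pasted into the key field of a JMX agent item.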
paul
On 31 oct. 2014, at 22:29, Jason Clemons <jason.clemons(a)live.com> wrote:
I'd also find any suggestions very helpful.
I've had that happen a few times and outside of monitoring CPU and RAM, I've found
logging to be difficult to use and configure, and even when I get it configured it's
not very helpful.
> On Oct 31, 2014, at 1:57 PM, Bryn Jeffries <bryn.jeffries(a)sydney.edu.au> wrote:
>
> Having made my XWiki site available to other users, I was concerned to find that the
> site became unusable at one point with client connections eventually timing out. I had no
> way to diagnose the problem, but eventually I managed to make a (slow) SSH connection to
> the server and restarted Tomcat, and things seemed to settle back to normal.
>
> The problem is I have no real sense of what happened and how to prevent it happening
> again. To that end, I'd appreciate any suggestions for monitoring the server and
> diagnosing poor performance. What do others typically use? I have an Apache2 server
> passing wiki page requests to Tomcat7 via an ajp connector, and a PostgreSQL database. My
> guess is that Tomcat is doing most of the work here so that's probably what I need to
> monitor the most.