David’s Blog

Ganglia rules!

Posted in Systems Engineering / Unix Systems Operations by david415 on February 1, 2009

Ganglia is an excellent tool. Ganglia makes me happy.
It seems to have an efficient, reliable, and highly available design.

Ganglia seems to be mostly self-configuring… except for a few modules out there that need configuration parameters. If I need to use these modules I’ll modify them to be self-configuring, like I did with multidisk.py… Anyone who maintains a fair-sized cluster knows that, with a little code, it pays to be lazy.

Ganglia seems to be way better than Cacti and the rest. Initially I was disappointed with the standard set of modules, which didn’t allow me to monitor the throughput on two different Ethernet interfaces. Perhaps the ganglia-developers don’t run multi-homed servers, or maybe they just don’t care how much throughput they use. I wrote a Python module to graph usage for an internal and an external interface, because this will help us project how much we’d pay a different data-center facility to host our cluster.
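To give an idea of what per-interface monitoring involves, here is a minimal sketch (not the actual module). It assumes a Linux host where the counters come from /proc/net/dev; the function names are made up for illustration, and counter wraparound and reboots are ignored.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev-style text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput(before, after, seconds):
    """Return {interface: (rx_bytes_per_sec, tx_bytes_per_sec)}.

    Only interfaces present in both samples are reported.
    """
    rates = {}
    for name in before:
        if name in after:
            rx = (after[name][0] - before[name][0]) / seconds
            tx = (after[name][1] - before[name][1]) / seconds
            rates[name] = (rx, tx)
    return rates
```

Reading /proc/net/dev twice, a known number of seconds apart, and passing both parsed samples to throughput() gives one rate pair per interface, so an internal and an external interface can be graphed separately.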

Gilad Raphaelli wrote a really cool embedded Python module for monitoring MySQL metrics… It monitors 100 metrics, including various InnoDB buffers. Perfect! It also comes with one custom report for queries:


[Graphs: MySQL query report, InnoDB transactions, MySQL threads]

I haven’t had time to look at it yet, but Silvan Mühlemann wrote a custom report PHP script for MySQL.

Admittedly, I don’t know how best to use gmond’s Python module interface, even though I’ve written several modules already… I think it’s supposed to make it easy to write metric monitors with some embedded Python code. But I like using gmetric more. It seems to take less code to produce the equivalent monitoring of metrics than the embedded Python interface does.
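For comparison, this is roughly the shape of a gmond embedded Python module as I understand the interface: gmond calls metric_init() once, gets back a list of metric descriptors, and then calls each descriptor's call_back with the metric name. The metric name, group, and time_max below are made up for the example; treat it as a sketch, not a reference.

```python
import os


def load_one(name):
    """Callback invoked by gmond with the metric name; returns the value."""
    return os.getloadavg()[0]


def metric_init(params):
    """Called once by gmond; returns the list of metric descriptors."""
    descriptor = {
        "name": "example_load_one",  # hypothetical metric name
        "call_back": load_one,
        "time_max": 90,
        "value_type": "float",
        "units": "load",
        "slope": "both",
        "format": "%f",
        "description": "One-minute load average (example)",
        "groups": "example",
    }
    return [descriptor]


def metric_cleanup():
    """Called by gmond on shutdown; nothing to release in this sketch."""
    pass
```

Even for a single metric most of the lines are descriptor boilerplate, which is part of why gmetric feels like less code.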

Also, I’ve noticed people trying to use the Python interface in interesting ways… e.g. spawning metric collector threads that populate a cache, etc. I was hoping to be able to write Ganglia metric modules without having to think about threads, caches, and race conditions. Couldn’t a module harness take care of these details?

I’m working on gmetric-daemon, which is a simple Python forking daemon with a modular interface that calls gmetric (via system() or popen())… It’s not very memory efficient.
Perhaps using gmetric in this way is a silly approach… But right now I think I like it.
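Calling gmetric from Python can be as small as the sketch below. This is not the gmetric-daemon code; the helper names are made up, and it assumes the gmetric binary is on the PATH and picks up its default gmond configuration.

```python
import subprocess


def gmetric_args(name, value, value_type="uint32", units=""):
    """Build the gmetric command line for one metric sample."""
    args = ["gmetric", "--name", name, "--value", str(value), "--type", value_type]
    if units:
        args += ["--units", units]
    return args


def send_metric(name, value, value_type="uint32", units=""):
    """Run gmetric once and return its exit status."""
    return subprocess.call(gmetric_args(name, value, value_type, units))
```

Something like send_metric("eth0_rx", 1234, "uint32", "bytes/sec") starts one process per sample, which is where the memory and fork overhead comes from.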

Here’s the GitHub repository for the work in progress I call gmetric-daemon.

I’m going to try to make available (e.g. open source under the Apache Software License)
a lot more of the code that I write to extend Ganglia.

I might even write modules that send “passive” Nagios alerts about metrics exceeding thresholds.
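A passive alert could be sketched like this, assuming results are submitted as PROCESS_SERVICE_CHECK_RESULT lines written to the Nagios external command file. The function names, the host and service names, and the "value=" output text are all made up for the example.

```python
import time

# Standard Nagios service states.
OK, WARNING, CRITICAL = 0, 1, 2


def check_threshold(value, warn, crit):
    """Map a metric value to a Nagios state using simple upper thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def passive_result(host, service, value, warn, crit, now=None):
    """Format one passive service check result line for the command file."""
    timestamp = int(time.time() if now is None else now)
    code = check_threshold(value, warn, crit)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;value=%s\n" % (
        timestamp, host, service, code, value)
```

The returned line would then be appended to the command file configured for the Nagios instance; that path varies between installs, so it is left out here.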
