David’s Blog

Mysql Innodb Hotcopy in Python

Posted in Systems Engineering / Unix Systems Operations by david415 on February 1, 2009

Lately for Spinn3r I’ve been writing programs in Python to automate database operations. This hotcopy script uses LVM snapshots; before we used LVM, it did an InnoDB freeze instead. The idea is simple: restore a MySQL shard replica’s data from another replica…

Look at the code here:
mysql cluster tools @ bitbucket.org

The coolest feature of this hotcopy script is an undo/restore mechanism that kicks in when an exception is caught. Basically I’ve implemented a queue of nullary functions. With each state change, I push the reverse (undo/restore) operation onto the queue. When an exception is caught, the rollback() method repeatedly pops the most recently pushed nullary off the queue and executes it until none are left, so the changes are undone in reverse order (the "queue" is really a last-in, first-out stack).

What I’m calling a nullary function is simply a function call wrapped in a closure, in the form of a lambda that takes zero arguments. (Some people say Python doesn’t have true closures, because a nested function in Python 2 can read but not rebind variables from its enclosing scope.)
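A minimal sketch of such a queue, assuming nothing about the real hotcopy code: each undo step is a zero-argument lambda, and rollback() pops from the end, so undo operations run in the reverse order of the state changes. The class name UndoQueue and the step names are hypothetical.

```python
class UndoQueue:
    """LIFO queue of zero-argument callables (nullary functions)."""

    def __init__(self):
        self._undo = []

    def push(self, fn):
        # fn must take no arguments; wrap real calls in a lambda.
        self._undo.append(fn)

    def rollback(self):
        # Pop and run the most recently pushed undo operation until empty.
        while self._undo:
            fn = self._undo.pop()
            fn()

    def __len__(self):
        return len(self._undo)


log = []
undo = UndoQueue()
for step in ("lock", "snapshot", "mount"):
    log.append("do " + step)
    # step=step binds the current value now; a bare `lambda: ... step`
    # would see the loop variable's final value when it finally runs.
    undo.push(lambda step=step: log.append("undo " + step))

undo.rollback()
print(log)
```

The default-argument trick matters here: closures capture variables, not values, so without it every undo would act on "mount".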

Python’s excellent exception handling makes this rollback() feature really useful, because there are lots of moving parts that could break and throw an exception. The hotcopy process makes several state changes on the source replica, such as: set single user mode, stop replication (if the source is a slave), take an LVM snapshot, mount the snapshot, etc.

I don’t like leaving LVM snapshots lying around to copy-on-write (COW) forever! Nor would I want a database server to remain in “single user mode”…
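Here is how the exception-driven rollback might fit together, as a sketch with hypothetical step names standing in for the real MySQL and LVM operations: each step is an (action, undo) pair, the undo is pushed only after its action succeeds, and any exception triggers a reverse-order rollback before being re-raised.

```python
events = []


def record(msg):
    """Hypothetical stand-in for a real operation: just logs what it would do."""
    return lambda: events.append(msg)


def hotcopy(steps):
    """Run (action, undo) pairs; on any exception, undo completed steps in reverse."""
    undo = []
    try:
        for action, reverse in steps:
            action()
            undo.append(reverse)  # only register undos for changes that happened
    except Exception:
        while undo:
            undo.pop()()  # pop the newest nullary and call it
        raise  # re-raise so the caller still sees the original error


def fail_mount():
    raise RuntimeError("mount failed")


steps = [
    (record("set single user mode"), record("leave single user mode")),
    (record("stop replication"), record("start replication")),
    (record("take LVM snapshot"), record("remove LVM snapshot")),
    (fail_mount, record("unmount snapshot")),
]

try:
    hotcopy(steps)
except RuntimeError as err:
    events.append("error: %s" % err)
```

Because the mount fails, its "unmount snapshot" undo is never registered; only the snapshot, replication and single-user-mode changes are reversed.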

Three ways to extend this project:

  • Patch MySQL for faster InnoDB crash recovery.
  • A distributed, highly available persistent storage mechanism for the restore queue, to allow a rollback even after the server running the hotcopy program crashes.
  • A mechanism to invoke this program/API to fully automate crash recovery. A centralized design involving a voting protocol…

I’m sure many companies like Spinn3r have similar infrastructure goals. The InnoDB modification above (faster crash recovery) is one of several Desirable Innodb Features that Spinn3r will probably throw down cash for…
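The persistent restore queue idea can be approximated on a single machine, as a sketch only: lambdas can’t be written to disk, so each undo record is stored as an operation name plus JSON arguments in an fsync’d journal file, and a registry maps names to functions. A real version would need the distributed, highly available storage described in the list; UndoJournal, the registry and the operation names are hypothetical.

```python
import json
import os


class UndoJournal:
    """Append-only JSON-lines journal of undo records that survives a crash."""

    def __init__(self, path, registry):
        self.path = path
        self.registry = registry  # maps operation name -> function

    def push(self, op, *args):
        if op not in self.registry:
            raise KeyError("unknown undo operation: %s" % op)
        with open(self.path, "a") as f:
            f.write(json.dumps({"op": op, "args": list(args)}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the record durable before continuing

    def pending(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return [json.loads(line) for line in f if line.strip()]

    def rollback(self):
        # Replay undo records newest-first, then clear the journal.
        for record in reversed(self.pending()):
            self.registry[record["op"]](*record["args"])
        if os.path.exists(self.path):
            os.remove(self.path)
```

A fresh process (say, after the hotcopy host reboots) can build a new UndoJournal on the same path and call rollback(). Since a crash during rollback() leaves the journal in place, records may be replayed, so each registered undo operation should be safe to run twice.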

Check out my MySQL InnoDB Hotcopy program at bitbucket.org.
This is not even a release candidate, but if anyone wants to look at the code, feel free.

Please leave your thoughts and comments.

Ganglia rules!

Posted in Systems Engineering / Unix Systems Operations by david415 on February 1, 2009

Ganglia is an excellent tool. Ganglia makes me happy.
It seems to have a very efficient, reliable and highly available design.

Ganglia seems to be mostly self-configuring… except for a few modules out there that need configuration parameters. If I need to use those modules, I’ll modify them to be self-configuring like I did with multidisk.py. Anyone who maintains a fair-sized cluster knows that, with a little code, it pays to be lazy.

Ganglia seems to be way better than Cacti and the rest. Initially I was disappointed with the standard set of modules, which didn’t let me monitor throughput on two different Ethernet interfaces. Perhaps the Ganglia developers don’t run multi-homed servers, or maybe they just don’t care how much bandwidth they use. I wrote a Python module to graph usage for an internal and an external interface, because this will help us project how much we’d pay a different data-center facility to host our cluster.
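A rough sketch of a self-configuring per-interface throughput module, not the module described here: it discovers interfaces from Linux’s /proc/net/dev when gmond calls metric_init(), so it needs no configuration parameters. metric_init(), metric_cleanup() and the descriptor keys follow the example module shipped with Ganglia’s Python module support; the metric names (eth0_bytes_in and so on) and the 90-second time_max are assumptions.

```python
import time

PROC_NET_DEV = "/proc/net/dev"  # Linux per-interface counters


def parse_net_dev(text):
    """Return {iface: (rx_bytes, tx_bytes)} parsed from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


class RateTracker:
    """Turn ever-increasing byte counters into bytes/sec between calls."""

    def __init__(self):
        self.prev = {}

    def rate(self, key, value, now):
        last = self.prev.get(key)
        self.prev[key] = (value, now)
        # No rate on the first sample, if time stood still, or after a counter reset.
        if last is None or now <= last[1] or value < last[0]:
            return 0.0
        return (value - last[0]) / float(now - last[1])


_tracker = RateTracker()


def _read_counters():
    with open(PROC_NET_DEV) as f:
        return parse_net_dev(f.read())


def net_bytes_handler(name):
    # Metric names look like "eth0_bytes_in" / "eth1_bytes_out".
    iface, direction = name.rsplit("_bytes_", 1)
    rx, tx = _read_counters()[iface]
    return _tracker.rate(name, rx if direction == "in" else tx, time.time())


def metric_init(params):
    # Self-configuring: one in/out pair per non-loopback interface found.
    descriptors = []
    for iface in sorted(_read_counters()):
        if iface == "lo":
            continue
        for direction in ("in", "out"):
            descriptors.append({
                "name": "%s_bytes_%s" % (iface, direction),
                "call_back": net_bytes_handler,
                "time_max": 90,
                "value_type": "float",
                "units": "bytes/sec",
                "slope": "both",
                "format": "%f",
                "description": "Bytes %s per second on %s" % (direction, iface),
                "groups": "network",
            })
    return descriptors


def metric_cleanup():
    pass
```

Computing the rate inside the module keeps the graphs in bytes/sec without relying on how gmetad stores counter-style metrics.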

Gilad Raphaelli wrote a really cool embedded Python module for monitoring MySQL metrics. It monitors 100 metrics, including various InnoDB buffers. Perfect! It also comes with a custom report for queries:


[Screenshots: MySQL query report, InnoDB transactions, MySQL threads]

I haven’t had time to look at it yet, but Silvan Mühlemann wrote a custom report PHP script for MySQL.

Admittedly, I don’t know how best to use gmond’s Python module interface, even though I’ve already written several modules. I think it’s supposed to make it easy to write metric monitors with some embedded Python code. But I prefer gmetric: it seems to take less code to monitor the same metrics than the embedded Python interface does.

I’ve also noticed people using the Python interface in interesting ways, e.g. spawning metric collector threads that populate a cache. I was hoping to write Ganglia metric modules without having to think about threads, caches and race conditions. Couldn’t a module harness take care of these details?
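One possible answer to that question, sketched under assumptions rather than taken from any existing Ganglia code: a small harness owns the thread, the lock and the cache, so a module author only writes a plain collect() function returning a dict, and gmond callbacks just read the cached value. CachedCollector is a hypothetical name.

```python
import threading
import time


class CachedCollector:
    """Run a plain collector function on a background thread and cache results.

    The collector returns {metric_name: value}; readers only touch the
    cache, so a gmond callback never blocks on slow I/O.
    """

    def __init__(self, collect, interval=15.0):
        self._collect = collect
        self._interval = interval
        self._lock = threading.Lock()
        self._cache = {}
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True  # don't keep the process alive on exit

    def start(self):
        self._refresh()  # populate once synchronously so first reads have data
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _refresh(self):
        values = self._collect()
        with self._lock:
            self._cache.update(values)

    def _run(self):
        # Event.wait() returns True once stop() sets the event, ending the loop.
        while not self._stop.wait(self._interval):
            try:
                self._refresh()
            except Exception:
                pass  # keep serving the last good values

    def get(self, name, default=0):
        with self._lock:
            return self._cache.get(name, default)
```

In a gmond module, metric_init() could create and start one CachedCollector, and each descriptor’s call_back could be `lambda name: harness.get(name)`.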

I’m working on gmetric-daemon, a simple Python forking daemon with a modular interface that calls gmetric (via system() or popen()). It’s not very memory efficient.
Perhaps using gmetric this way is a silly approach, but right now I like it.
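The gmetric-daemon idea might be sketched like this, assuming nothing about the actual repository: collector modules are plain functions returning (name, value, type, units) tuples, and the loop shells out to gmetric with subprocess.call instead of system() or popen(). Forking into the background is left out, and example_load_module, run_once and main are hypothetical names. gmetric’s --name, --value, --type and --units options are real; check `gmetric --help` for the type names your version accepts.

```python
import os
import subprocess
import time


def gmetric_argv(name, value, value_type="float", units=""):
    """Build a gmetric command line using its long options."""
    argv = ["gmetric", "--name=%s" % name, "--value=%s" % value,
            "--type=%s" % value_type]
    if units:
        argv.append("--units=%s" % units)
    return argv


def example_load_module():
    # Hypothetical collector module: returns (name, value, type, units) tuples.
    one, five, fifteen = os.getloadavg()
    return [("example_load_five", five, "float", "")]


def run_once(modules, send=subprocess.call):
    """Run every collector module once and hand each metric to gmetric."""
    sent = 0
    for collect in modules:
        try:
            metrics = collect()
        except Exception:
            continue  # one broken module should not stop the others
        for name, value, value_type, units in metrics:
            send(gmetric_argv(name, value, value_type, units))
            sent += 1
    return sent


def main(modules, interval=30):
    # Poll all modules forever; e.g. main([example_load_module]).
    while True:
        run_once(modules)
        time.sleep(interval)
```

Spawning one gmetric process per metric is what makes this approach cost more than the embedded interface, but each module stays a few lines of straight-line code with no threads.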

Here’s the GitHub repository for the work in progress I call gmetric-daemon.

I’m going to try to make much more of the code I write extending Ganglia available (e.g. as open source under the Apache Software License).

I might even write modules that send “passive” Nagios alerts about metrics exceeding thresholds.
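A hedged sketch of the passive-alert idea: compare a metric against warning and critical thresholds and format a Nagios PROCESS_SERVICE_CHECK_RESULT external command. The host name, service name and thresholds are made up; Nagios must have external commands enabled and a matching passive service defined, and from another host you would relay results with send_nsca rather than writing to the command pipe.

```python
import time

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios service return codes
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def check_threshold(value, warn, crit):
    """Return a Nagios status code for a metric where higher is worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def passive_result(host, service, metric, value, warn, crit, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    code = check_threshold(value, warn, crit)
    ts = int(time.time() if now is None else now)
    output = "%s %s - %s=%s" % (service, LABELS[code], metric, value)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        ts, host, service, code, output)


def submit(command_file, line):
    # command_file is Nagios's external command pipe (nagios.cmd);
    # its location depends on how Nagios was installed.
    with open(command_file, "w") as f:
        f.write(line)
```

For example, 120 MySQL threads against thresholds of 100 (warning) and 200 (critical) produces a WARNING result with return code 1.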