Cassandra data storage performance statistics with inotify and fincore
We wanted to know which of the Cassandra data files (index, filter and data files) were accessed (reads and seeks) the most.
I wrote a simple Python program, inotify-access, to print these read-access statistics for a given directory and time duration. Find it in my GitHub repository:
http://github.com/david415/inotify-access
This program makes use of the Linux kernel’s inotify API.
If you run Debian/Ugabuga (I mean Ubuntu) then you’ll need to install the pyinotify library: apt-get install python-pyinotify
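For readers who want to see the idea without opening the repository, here is a minimal sketch of counting read accesses with pyinotify. It is not the code of inotify-access itself; the function names count_accesses and format_report are made up for illustration, and it only watches for IN_ACCESS events.

```python
import collections
import time


def count_accesses(directory, seconds):
    """Count IN_ACCESS events per file under `directory` for `seconds` seconds."""
    # pyinotify is third-party; it is imported here so that format_report()
    # below stays usable on machines where it is not installed.
    import pyinotify

    counts = collections.Counter()

    class Handler(pyinotify.ProcessEvent):
        # pyinotify dispatches each event to the method named process_<EVENT>.
        def process_IN_ACCESS(self, event):
            counts[event.pathname] += 1

    wm = pyinotify.WatchManager()
    notifier = pyinotify.Notifier(wm, Handler(), timeout=1000)  # milliseconds
    wm.add_watch(directory, pyinotify.IN_ACCESS, rec=True)

    deadline = time.time() + seconds
    try:
        while time.time() < deadline:
            if notifier.check_events():  # waits up to `timeout` ms for events
                notifier.read_events()
                notifier.process_events()
    finally:
        notifier.stop()
    return counts


def format_report(counts):
    """Return one 'count path' line per file, most-accessed first."""
    return ["%d %s" % (n, path) for path, n in counts.most_common()]
```

Something like print("\n".join(format_report(count_accesses("/mnt/var/cassandra/data/BunnyFufu", 60)))) would then print the busiest files over one minute.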
To determine page cache usage of these files you can use fincore. I have forked linux-ftools’s fincore to make its output easier to parse; the fork is in my GitHub repository here:
http://github.com/david415/linux-ftools
Additionally I’ve written cassandra_pagecache_usage, a Python program that uses the mincore system call to report page cache usage for Cassandra data sets. This program used to parse the output of linux-ftools’s fincore, but I have since switched to the Python C extension fincore_ratio, which returns a 2-tuple (cached pages, total pages). python-ftools is a port of linux-ftools to Python C extensions; find it in my GitHub repository here:
http://github.com/david415/python-ftools
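To make the mincore approach concrete, here is a rough pure-Python sketch that computes the same kind of (cached pages, total pages) tuple using ctypes. It is not the python-ftools C extension; the name page_cache_ratio is hypothetical, and the sketch assumes a 64-bit Linux system where libc exposes mmap, mincore and munmap.

```python
import ctypes
import ctypes.util
import mmap
import os

# Assumption: 64-bit Linux, so off_t fits in c_longlong.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.mmap.restype = ctypes.c_void_p
_libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                       ctypes.c_int, ctypes.c_int, ctypes.c_longlong]
_libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                          ctypes.POINTER(ctypes.c_ubyte)]
_libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_MAP_FAILED = ctypes.c_void_p(-1).value


def page_cache_ratio(path):
    """Return (cached_pages, total_pages) for the file at `path`."""
    size = os.path.getsize(path)
    if size == 0:
        return (0, 0)  # mmap() rejects zero-length mappings
    total = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    fd = os.open(path, os.O_RDONLY)
    try:
        # Mapping the file does not read it; it only gives mincore an address range.
        addr = _libc.mmap(None, size, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0)
        if addr == _MAP_FAILED:
            raise OSError(ctypes.get_errno(), "mmap failed")
        try:
            vec = (ctypes.c_ubyte * total)()  # one status byte per page
            if _libc.mincore(addr, size, vec) != 0:
                raise OSError(ctypes.get_errno(), "mincore failed")
            # The lowest bit of each byte is set when the page is resident.
            cached = sum(b & 1 for b in vec)
        finally:
            _libc.munmap(addr, size)
    finally:
        os.close(fd)
    return (cached, total)
```

Multiplying the first element by mmap.PAGESIZE gives an approximate byte count of cached data for the file.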
I’ve written a Python version of fadvise for the command line…
fadvise example usage:
Perhaps your Cassandra node has been rebooted. You could “warm up” certain Column Families like this:
./fadvise -m willneed /mnt/var/cassandra/data/BunnyFufu/ForestActivity*
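The same kind of warm-up can be sketched with the Python standard library (Python 3.3 or later on Linux), which exposes posix_fadvise directly. This is not the fadvise tool from the repository; warm_files is a hypothetical helper name.

```python
import glob
import os


def warm_files(pattern):
    """Ask the kernel to prefetch every regular file matching `pattern`."""
    warmed = []
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue
        fd = os.open(path, os.O_RDONLY)
        try:
            # Offset 0 with length 0 means "from the start to the end of the file".
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        finally:
            os.close(fd)
        warmed.append(path)
    return warmed
```

Calling warm_files("/mnt/var/cassandra/data/BunnyFufu/ForestActivity*") would cover the same files as the command above. POSIX_FADV_WILLNEED is only a hint, so the kernel may still decide not to load everything.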
Find cassandra_pagecache_usage in my GitHub repository here:
http://github.com/david415/cassandra-pagecache-usage
Usage: cassandra_pagecache_usage [options] <cassandra-data-directory>

Options:
  -h, --help            show this help message and exit
  -c, --columnfamily-summarize
                        Summarize cached Cassandra data on a per Column Family
                        basis.
  --exclude-filter      Exclude statistics for Cassandra Filter files.
  --exclude-index       Exclude statistics for Cassandra Index files.
  --exclude-data        Exclude statistics for Cassandra Data files.
Example output:

my-cassandra-node:~/bin# PYTHONPATH=~/lib ./cassandra_pagecache_usage -c /mnt/var/cassandra/data/BunnyFufu/
Column Family    Bytes in FS page-cache
ForestActivity   3712839680
Indexes          2902822912
AnimalIndex      2369015808
AnimalCounts     1470619648
Items            786214912
Activity         264978432
Animals          133816320
Hops             127442944
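A per Column Family summary like the one above could be built by grouping per-file cached byte counts, roughly as sketched here. This is not the actual cassandra_pagecache_usage code: the helper names are hypothetical, and the sketch assumes data file names begin with the Column Family name followed by a hyphen (for example a made-up name such as ForestActivity-1-Data.db).

```python
import os
from collections import defaultdict


def column_family(filename):
    """Assumed naming scheme: everything before the first '-' is the Column Family."""
    return os.path.basename(filename).split("-", 1)[0]


def summarize(cached_bytes_by_file):
    """Sum cached bytes per Column Family, largest first."""
    totals = defaultdict(int)
    for path, nbytes in cached_bytes_by_file.items():
        totals[column_family(path)] += nbytes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

The input dictionary maps file paths to cached bytes, which could come from any page cache probe that reports cached pages per file.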
Would love to see some example output… this looks like a great toolset.
I’ve updated this blog post with example output for cassandra_pagecache_usage.