David’s Blog

running linux with no swap

Posted in Systems Engineering / Unix Systems Operations by david415 on November 21, 2009

I never want my laptop or any of the servers I maintain to swap.
For my applications swapping is never a good idea. RAM is cheap.
It is true that I don’t want the OOM killer to kill the wrong process… But I don’t worry about
that because I know how to use oom_adj to prevent MySQL or sshd from getting killed.
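
As a rough illustration of the oom_adj trick, here is a minimal Python sketch. It assumes a kernel that still exposes the legacy /proc/<pid>/oom_adj file, where -17 exempts a process from the OOM killer (later kernels replaced it with oom_score_adj). The function name and the proc_root parameter are hypothetical, added only so the sketch can be tried against a scratch directory.

```python
import os

# Legacy oom_adj value that exempts a process from the OOM killer.
OOM_DISABLE = -17


def protect_from_oom(pid, proc_root="/proc", value=OOM_DISABLE):
    """Write `value` to <proc_root>/<pid>/oom_adj and return the path written.

    proc_root is a hypothetical parameter so the function can be exercised
    against a temporary directory instead of the real /proc.
    """
    if not -17 <= value <= 15:
        raise ValueError("oom_adj must be between -17 and 15")
    path = os.path.join(proc_root, str(pid), "oom_adj")
    with open(path, "w") as f:
        f.write("%d\n" % value)
    return path
```

Run as root, something like protect_from_oom(pid_of_mysqld) would be called once after the daemon starts; child processes inherit the setting.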

According to a knowledgeable Linux kernel developer I spoke with, if you are going to disable swap (as in swapoff -a) it is beneficial to recompile the kernel with the config option CONFIG_SWAP=n. This avoids a performance problem; it’s been reported that kswapd will otherwise use lots of CPU. While you’re configuring your new kernel you should also disable tmpfs.

Dear Linux,
Please do not swap ever again!
kthx bye

Mysql Innodb Hotcopy in Python

Posted in Systems Engineering / Unix Systems Operations by david415 on February 1, 2009

Lately for Spinn3r I’ve been writing programs in Python to automate database operations. This hotcopy script utilizes LVM. Previously we didn’t use LVM and so this script used to do an Innodb Freeze. The simple idea here is to restore a mysql shard replica’s data from another replica…

Look at the code here:
mysql cluster tools @ bitbucket.org

The cool feature this hotcopy script has is an undo/restore mechanism that
kicks in when an exception is caught. Basically I’ve implemented a nullary function queue.
With each state change, I push the reverse (or undo/restore operation) onto the queue. When an exception is caught the rollback() method repeatedly pops and executes the last nullary off the queue until there are none left.

What I’m calling a nullary function is merely a function call wrapped in a closure (although maybe that’s not correct because they say Python doesn’t have true closures) in the form of a lambda with zero arguments.
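
The undo mechanism described above can be sketched roughly as follows. This is not the code from the bitbucket repository; the class and function names are made up for illustration, and the "queue" is modelled as a plain list used as a stack, since the most recent undo operation runs first.

```python
class UndoStack(object):
    """Holds zero-argument callables that reverse earlier state changes."""

    def __init__(self):
        self._undo = []

    def push(self, func):
        # func is a nullary callable, e.g. lambda: remove_snapshot(name)
        self._undo.append(func)

    def rollback(self):
        # Pop and run the most recently pushed undo operation first.
        while self._undo:
            self._undo.pop()()


def run_with_rollback(steps, undo):
    """Run (do, undo_fn) pairs in order; on any exception undo what succeeded."""
    try:
        for do, undo_fn in steps:
            do()
            undo.push(undo_fn)
    except Exception:
        undo.rollback()
        raise
```

An undo operation is pushed only after its step succeeds, so a failed step is never "undone", while every earlier step is reversed in the opposite order.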

Python’s excellent exception handling makes this rollback() feature really useful because there are lots of moving parts at work that could break and throw an exception. The hotcopy process causes several state changes on the source replica, such as: set single user mode, stop replication (if the source is a slave), take an LVM snapshot, mount the snapshot, etc.

I don’t like LVM snapshots lying around so we can COW forever! Nor would I want a database server to remain in “single user mode”…

Three ways to extend this project:

  • Patch MySQL for faster InnoDB crash recovery.
  • A distributed/highly available persistent storage mechanism for the restore queue, to allow a rollback even after the server running the hotcopy program crashes.
  • A mechanism to invoke this program/API to fully automate crash recovery. A centralized design involving a voting protocol…

I’m sure many like Spinn3r have similar infrastructure goals. The above InnoDB modification is one of several Desirable Innodb Features that Spinn3r will probably throw down cash for…

    Check it out at bitbucket.org, my Mysql Innodb Hotcopy program.
    This is not even a release candidate… but if anyone wants to look at the code… feel free.

    Please leave your thoughts and comments.

    Ganglia rules!

    Posted in Systems Engineering / Unix Systems Operations by david415 on February 1, 2009

    Ganglia is an excellent tool. Ganglia makes me happy.
    It seems to have an excellently efficient and reliable/highly available design.

    Ganglia seems to be mostly self configuring… except for a few modules out there that need configuration parameters. If I need to use these modules I’ll modify them to be self configuring like I did with multidisk.py… Anyone who maintains a fair sized cluster knows with a little code it pays to be lazy.

    Ganglia seems to be way better than Cacti and the rest. Initially I was disappointed with the standard set of modules which didn’t allow me to monitor the throughput on two different ethernet interfaces. Perhaps the ganglia-developers don’t run multi-homed servers. Or maybe they just don’t care about how much throughput they use. I wrote a python module to graph usage for an internal and external interface because this will help us project how much we’d be paying a different data-center facility to host our cluster.
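
A minimal sketch of how per-interface throughput can be derived on Linux, assuming the counters are read from /proc/net/dev. This is not the module mentioned above; the function names are hypothetical and the sample interval is left to the caller.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # field 0 is received bytes, field 8 is transmitted bytes
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput(before, after, seconds):
    """Return {interface: (rx_bytes_per_sec, tx_bytes_per_sec)} between two samples."""
    rates = {}
    for iface in after:
        if iface in before:
            rx = (after[iface][0] - before[iface][0]) / float(seconds)
            tx = (after[iface][1] - before[iface][1]) / float(seconds)
            rates[iface] = (rx, tx)
    return rates
```

Sampling /proc/net/dev twice, a known number of seconds apart, gives one rate per interface, so an internal and an external interface can be graphed separately.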

    Gilad Raphaelli wrote a really cool embedded python module for monitoring mysql metrics… It monitors 100 metrics including various Innodb buffers. Perfect! It also comes with one custom report for queries:

    [graphs: mysql query report, innodb transactions, mysql threads]

    I haven’t had time to look at it yet but Silvan Mühlemann wrote
    a custom report php script for mysql

    Admittedly I don’t know how best to use gmond’s python module interface even though I’ve written several modules already… I think it’s supposed to make it easy to write metric monitors with some embedded python code. But I like using gmetric more. It seems to take less code to produce the equivalent monitoring of metrics than the embedded python interface.

    Also I’ve noticed people trying to use the python interface in interesting ways… e.g. spawning metric collector threads that populate a cache etc. I was hoping to be able to write ganglia metric modules without having to think about threads, cache and race conditions. Couldn’t a module harness take care of these details?

    I’m working on gmetric-daemon which is a simple python forking daemon with a modular interface that calls gmetric (via system() or popen())… It’s not very memory efficient.
    Perhaps using gmetric in this way is a silly approach… But right now I think I like it.
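
For readers who have not used gmetric, here is a hedged sketch of the "call gmetric from Python" idea. It is not the gmetric-daemon code; the helper names are hypothetical, and it assumes the gmetric binary is on the PATH and accepts the usual --name, --value, --type and --units options.

```python
import subprocess


def gmetric_command(name, value, value_type="uint32", units=None, gmetric="gmetric"):
    """Build the argument list for one gmetric invocation."""
    cmd = [gmetric,
           "--name=%s" % name,
           "--value=%s" % value,
           "--type=%s" % value_type]
    if units:
        cmd.append("--units=%s" % units)
    return cmd


def send_metric(name, value, **kwargs):
    """Run gmetric once and return its exit status (0 on success)."""
    return subprocess.call(gmetric_command(name, value, **kwargs))
```

Spawning one process per metric is what makes this approach simple but not very efficient; a daemon would typically call send_metric() from a loop on a fixed interval.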

    Here’s the gitHub repository for the work in progress I call gmetric-daemon

    I’m going to try and make available (e.g. opensource via apache software license)
    lots more code that I write extending Ganglia.

    I might even write modules that send “passive” Nagios alerts about metrics exceeding thresholds.

    Bugzilla Postfix e-mail integration

    Posted in Systems Engineering / Unix Systems Operations by david415 on November 25, 2008

    I got Postfix e-mail submissions to Bugzilla (3.0.5) working properly.
    Perhaps these notes of mine could save someone some trouble when attempting this.

    Certainly postfix could accomplish e-mail submissions via a custom transport using a pipe.
    I however decided to use a pipe in the /etc/aliases file; mine contains this important line :

    bug-submit: "|/var/www/bugz/email_in.pl -vvv 2>/tmp/emailin.log"

    Note that for troubleshooting I can take a look at email_in.pl’s STDERR in /tmp/emailin.log.

    Log in via the Bugzilla admin account and go to the Email section of the Parameters page.
    Change the mailfrom to match the above e-mail alias so that Bugzilla users can add a comment to a bug by replying to Bugzilla’s e-mails.


    I’m using SPF to verify sender e-mail addresses.
    Here’s part of my /etc/postfix/main.cf containing some configuration for SPF :

    alias_maps = hash:/etc/aliases

    smtpd_recipient_restrictions =
      permit_mynetworks
      reject_unauth_destination
      check_policy_service unix:private/policy-spf

    policyd-spf_time_limit = 3600

    and part of my /etc/postfix/master.cf :

    policy-spf unix - n n - - spawn
      user=nobody argv=/usr/bin/policyd-spf

    Next I got DomainKeys working.
    SPF and DomainKeys are especially important for this setup because
    Bugzilla will not be doing any e-mail spam filtering.
    All a spammer would have to do to submit annoying bugs into our Bugzilla
    system would be to forge an e-mail from a Bugzilla user’s e-mail address
    and send it to bug-submit@xxx.xxx… This is why I want SPF and DomainKeys fully
    operational… that way many forgery attempts will be rejected.

    The DKIM filters for inbound and outbound mail are started like this :

    
    /usr/local/dkimproxy/bin/dkimproxy.in --listen=127.0.0.1:10025 --relay=127.0.0.1:10026 \
      --user=dkim --group=dkim --daemonize --pidfile=/var/run/dkimproxy.in

    /usr/local/dkimproxy/bin/dkimproxy.out --listen=127.0.0.1:10027 --relay=127.0.0.1:10028 \
      --keyfile=/usr/local/dkimproxy/etc/private.key --selector=selector1 --domain=bugzilla.spinn3r.com \
      --user=dkim --group=dkim --signature=dkim --daemonize --pidfile=/var/run/dkimproxy.out

    To filter inbound mail through DKIM, edit master.cf with something like this:

    # Before-filter SMTP server. Receive mail from the network and
    # pass it to the content filter on localhost port 10025.
    #
    smtp inet n - n - - smtpd
      -o smtpd_proxy_filter=127.0.0.1:10025
      -o smtpd_client_connection_count_limit=10
    # DKIM
    # After-filter SMTP server. Receive mail from the content filter on
    # localhost port 10026.
    127.0.0.1:10026 inet n - n - - smtpd
      -o smtpd_authorized_xforward_hosts=127.0.0.0/8
      -o smtpd_client_restrictions=
      -o smtpd_helo_restrictions=
      -o smtpd_sender_restrictions=
      -o smtpd_recipient_restrictions=permit_mynetworks,reject
      -o smtpd_data_restrictions=
      -o mynetworks=127.0.0.0/8
      -o receive_override_options=no_unknown_recipient_checks

    For outgoing DKIM, edit master.cf like this:

    ## outgoing dkim
    submission inet n - n - - smtpd
      -o smtpd_etrn_restrictions=reject
      -o content_filter=dksign:[127.0.0.1]:10027
      -o receive_override_options=no_address_mappings
      -o smtpd_recipient_restrictions=permit_mynetworks,reject
    dksign unix - - n - 10 smtp
      -o smtp_send_xforward_command=yes
      -o smtp_discard_ehlo_keywords=8bitmime
    127.0.0.1:10028 inet n - n - 10 smtpd
      -o content_filter=
      -o receive_override_options=no_unknown_recipient_checks,no_header_body_checks
      -o smtpd_helo_restrictions=
      -o smtpd_client_restrictions=
      -o smtpd_sender_restrictions=
      -o smtpd_recipient_restrictions=permit_mynetworks,reject
      -o mynetworks=127.0.0.0/8
      -o smtpd_authorized_xforward_hosts=127.0.0.0/8

    Finally, run postfix reload.
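
With four local listeners involved (10025 and 10027 for dkimproxy, 10026 and 10028 for the Postfix after-filter servers), a quick connectivity check can save some head scratching. The following is only a sketch; the function names are hypothetical and the port list simply mirrors the configuration above.

```python
import socket

# Ports taken from the dkimproxy and master.cf settings in this post.
DKIM_PORTS = (10025, 10026, 10027, 10028)


def port_is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


def check_ports(ports=DKIM_PORTS, host="127.0.0.1"):
    """Return {port: bool} for each port in the filter chain."""
    return dict((port, port_is_listening(host, port)) for port in ports)
```

Any port reported as False points at the daemon (or master.cf entry) that is not running.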

    Mysql IF_IDLE patch

    Posted in Systems Engineering / Unix Systems Operations by david415 on October 7, 2008

    I isolated Google’s IF_IDLE feature from SqlChanges; part of the Google V2 patch.
    I diffed this IF_IDLE patch for mysql-5.0.68-percona-highperf but it’ll patch mysql-5.0.67 as well.

    This simple and brilliant change allows us to
    attempt to kill idle mysql processes while avoiding race conditions.

    KILL IF_IDLE <id>

    mysql> SHOW PROCESSLIST;
    +----+-------------+-----------+-----------+---------+------+----------------------------------+
    | Id | User        | Host      | db        | Command | Time | State                            | Info
    +----+-------------+-----------+-----------+---------+------+----------------------------------+
    |  2 | system user |           | NULL      | Connect | 2192 | Waiting for master to send event | NULL
    | 13 | root        | localhost | NULL      | Query   |    0 | NULL                             | show processlist
    | 30 | root        | localhost | NULL      | Sleep   |    2 |                                  | NULL
    +----+-------------+-----------+-----------+---------+------+----------------------------------+
    4 rows in set (0.00 sec)
    mysql> KILL IF_IDLE 30;
    Query OK, 0 rows affected (0.00 sec)
    mysql> SHOW PROCESSLIST;
    +----+-------------+-----------+-----------+---------+------+----------------------------------+
    | Id | User        | Host      | db        | Command | Time | State                            | Info
    +----+-------------+-----------+-----------+---------+------+----------------------------------+
    |  2 | system user |           | NULL      | Connect | 2357 | Waiting for master to send event | NULL
    | 13 | root        | localhost | NULL      | Query   |    0 | NULL                             | show processlist
    +----+-------------+-----------+-----------+---------+------+----------------------------------+
    3 rows in set (0.00 sec)
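
My reading of the race being avoided: a script that looks at SHOW PROCESSLIST and then issues a plain KILL can hit a connection that started a query in between, whereas KILL IF_IDLE lets the server re-check at kill time. A minimal sketch of such a script's decision logic, assuming a server with this patch applied; the function name and the row format (dicts keyed by the processlist column names) are hypothetical.

```python
def idle_kill_statements(processlist, min_idle_seconds):
    """Return KILL IF_IDLE statements for connections sleeping long enough.

    processlist is a list of dicts with the SHOW PROCESSLIST columns
    "Id", "Command" and "Time". KILL IF_IDLE only exists on a server
    built with the patch described in this post.
    """
    statements = []
    for row in processlist:
        if row["Command"] == "Sleep" and row["Time"] >= min_idle_seconds:
            statements.append("KILL IF_IDLE %d" % row["Id"])
    return statements
```

The statements would then be executed through whatever MySQL client library is in use; a connection that became busy in the meantime is left alone by the server.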
    

    Innodb Freeze patch

    Posted in Systems Engineering / Unix Systems Operations by david415 on October 2, 2008

    We needed a Mysql Innodb Freeze mechanism for Spinn3r’s database operations, so we decided to isolate Google’s Innodb Freeze feature from Google’s V2 patch. The Innodb Freeze can be toggled on and off like so :

    set global innodb_disallow_writes=ON
    set global innodb_disallow_writes=OFF



    Isolating this patch was easy since it’s a very small code change.
    I simply patched Mysql with the Google V2 patch.
    Then I started up Cscope and searched for the string: “innodb_disallow_writes”
    From there I was able to trace the various C/C++ symbols; finding all the code for this feature…
    I then edited an unpatched Mysql; manually pasting in the Innodb Freeze code changes.
    Finally I diffed this manually edited Mysql against the unpatched Mysql producing
    this Innodb Freeze patch for Mysql 5.0.37; it’s 323 lines small and only modifies 7 files of the Mysql source.

    I have re-targeted the patch for Mysql 5.0.67 which will also patch Percona’s High Performance Mysql 5.0.68: Innodb Freeze patch for Mysql 5.0.67

    I’ve briefly tested the patch and verified via md5sum that after setting global innodb_disallow_writes to ON, data on disk does not change and all writes via the Innodb storage engine are blocked. However, if you have set innodb_flush_log_at_trx_commit=0 in the my.cnf then updating an existing row will succeed; the change will only exist in memory and will not be written to disk until you unfreeze (set global innodb_disallow_writes=OFF).
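
The md5sum check can be automated along these lines. This is a sketch rather than the procedure actually used; the function names are hypothetical, and the directory passed in would be the MySQL data directory on the frozen server.

```python
import hashlib
import os


def checksum_tree(root):
    """Return {relative_path: md5_hexdigest} for every file under root."""
    sums = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            digest = hashlib.md5()
            with open(path, "rb") as f:
                # read in 1 MB chunks so large tablespace files fit in memory
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            sums[os.path.relpath(path, root)] = digest.hexdigest()
    return sums


def changed_files(before, after):
    """Return the sorted paths whose checksum differs, appeared or vanished."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))
```

Take one checksum_tree() snapshot right after freezing and another after attempting some writes; an empty changed_files() result means nothing on disk moved.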

    This is a small step towards Spinn3r’s upgrade to Mysql 5.0.x. I’ll probably end up extracting another feature from the Google V2 patch; and then eventually re-target these for Percona’s High Performance patch Mysql 5.0.68.

    Build A Custom Debian Package Of The Latest OpenSSH

    Posted in Systems Engineering / Unix Systems Operations by david415 on September 10, 2008

    I needed the latest OpenSSH installed on a large cluster of Debian servers running Debian stable Etch. Since Debian isn’t going to ship a package with the latest OpenSSH for Debian stable Etch I decided to build this package myself.

    This is my recipe to build a Debian package of the latest OpenSSH which installs in prefix=/usr/local. I decided not to repackage Debian’s patched OpenSSH because I didn’t want to deal with all the dependencies. The Debian OpenSSH patch adds a lot of code that I don’t need for my production environment. Also my package is named differently than the Debian one… so it can be installed without removing the old Debian openssh package. For my situation I don’t need a post install script as part of the Debian package… because I’m performing a custom cluster-wide software deploy….


    Find a mirror and then download the OpenSSH sourcecode :

    http://www.openssh.org/portable.html
    wget ftp://ftp5.usa.openbsd.org/pub/OpenBSD/OpenSSH/portable/openssh-5.1p1.tar.gz



    Cryptographically verify the OpenSSH tarball :

    wget ftp://ftp3.usa.openbsd.org/pub/OpenBSD/OpenSSH/portable/openssh-5.1p1.tar.gz.asc
    wget ftp://ftp3.usa.openbsd.org/pub/OpenBSD/OpenSSH/portable/DJM-GPG-KEY.asc
    gpg --import DJM-GPG-KEY.asc
    gpg --fingerprint
    gpg --verify openssh-5.1p1.tar.gz.asc  openssh-5.1p1.tar.gz
    gpg --delete-keys "Damien Miller (Personal Key) "
    



    Prepare the build :

    tar xzf openssh-5.1p1.tar.gz
    mv openssh-5.1p1 openssh-5.1p1-spinn3r-r1
    cd openssh-5.1p1-spinn3r-r1
    dh_make --email david@spinn3r.com --single --native --packagename openssh-5.1p1-spinn3r-r1
    cd debian/
    



    Edit the ‘control’ file :

    vi control:

    Source: openssh-5.1p1-spinn3r-r1
    Section: unknown
    Priority: extra
    Maintainer: root
    Build-Depends: debhelper (>= 5), autotools-dev
    Standards-Version: 3.7.2
    
    Package: openssh-5.1p1-spinn3r-r1
    Architecture: any
    Depends: ${shlibs:Depends}, ${misc:Depends}
    Description: David's attempt at a custom Spinn3r OpenSSH deb package
    



    Edit the ‘rules’ file (e.g. vi rules).
    I edited this section to set the install prefix to /usr/local:

    config.status: configure
            dh_testdir
            # Add here commands to configure the package.
            ./configure --host=$(DEB_HOST_GNU_TYPE) --build=$(DEB_BUILD_GNU_TYPE) \
                --prefix=/usr/local --mandir=\$${prefix}/share/man --infodir=\$${prefix}/share/info \
                CFLAGS="$(CFLAGS)" LDFLAGS="-Wl,-z,defs"

    Later in this file there is an install line that I also edited to use the correct prefix:

            $(MAKE) prefix=$(CURDIR)/debian/openssh-5.1p1-spinn3r-r1/usr/local install
    



    Change to the parent directory:
    cd ..

    Edit Makefile.in (so that configure generates the Makefile the way we want it). We need to edit the ‘install’ target so that it does NOT generate ssh host keys; otherwise the build process will create a set of keys and the .deb package will be distributed with those keys.
    Change this:

    install: $(CONFIGFILES) ssh_prng_cmds.out $(MANPAGES) $(TARGETS) install-files install-sysconf host-key check-config

    to this :

    install: $(CONFIGFILES) ssh_prng_cmds.out $(MANPAGES) $(TARGETS) install-files install-sysconf check-config
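
Since shipping the same host keys to a whole cluster would be bad, it may be worth double-checking the finished package. The sketch below scans the output of dpkg-deb -c for host key files; the function names are hypothetical, and the file name pattern assumes OpenSSH's usual ssh_host_*key naming.

```python
import re
import subprocess

# Matches ssh_host_key, ssh_host_rsa_key, ssh_host_dsa_key and their .pub files.
HOST_KEY = re.compile(r"ssh_host_\w*key(\.pub)?$")


def host_keys_in_listing(listing):
    """Return paths in a `dpkg-deb -c` listing that look like ssh host keys."""
    found = []
    for line in listing.splitlines():
        fields = line.split()
        if fields and HOST_KEY.search(fields[-1]):
            found.append(fields[-1])
    return found


def host_keys_in_deb(deb_path):
    """List the package contents with dpkg-deb and scan them for host keys."""
    output = subprocess.check_output(["dpkg-deb", "-c", deb_path])
    return host_keys_in_listing(output.decode("utf-8", "replace"))
```

An empty list from host_keys_in_deb() on the built .deb is the expected result after removing host-key from the install target.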
    



    Build the custom debian package:
    dpkg-buildpackage -rfakeroot -uc -b

           -us, -uc
                  Do not sign the source package or the .changes file, respectively.
    
             -b indicates that no source files  are  to  be
                  built  and/or  distributed

    The above command prints a lot of output and then ends with this:

    dh_md5sums
    dh_builddeb
    dpkg-deb: building package `openssh-5.1p1-spinn3r-r1' in `../openssh-5.1p1-spinn3r-r1_5.1p1-spinn3r-r1_amd64.deb'.
     dpkg-genchanges -b
    dpkg-genchanges: binary-only upload - not including any source code
    dpkg-buildpackage: binary only upload (no source included)
    root@fu:~/builds/openssh-5.1p1-spinn3r-r1#

    The .deb file should have been written to the parent directory.

    Enjoy…


    Custom Debian Kernel Build without “make menuconfig”

    Posted in Systems Engineering / Unix Systems Operations by david415 on August 9, 2008

    I maintain a cluster of MySQL servers. I mentioned below that Linux has some memory management issues which cause our MySQL servers to swap when they really should not. So my idea was to upgrade the kernel to the Debian Etch Testing 2.6.25 kernel… but patched with Rik van Riel’s Split LRU patch (check it out: http://www.surriel.com/node/6).
    Also… I don’t feel any need to compile a super efficient static kernel with only the modules we use. I’m OK with the Debian default initrd module loading kernel.

    So what I ended up doing was first installing Debian’s kernel :

    apt-get install linux-image-2.6.25-2-amd64

    Then I grab the Debian kernel source (Linux kernel patched by Debian) :

    apt-get source linux-image-2.6.25-2-amd64

    which creates these files in the current directory :

    linux-2.6-2.6.25
    linux-2.6_2.6.25-7.diff.gz
    linux-2.6_2.6.25-7.dsc
    linux-2.6_2.6.25.orig.tar.gz
    

    Patch the kernel with the split-LRU patch :

    cd linux-2.6-2.6.25
    patch -p1 < ../linux-2.6.25-splitlru.patch

    I move the changelog so that make-kpkg doesn’t complain and quit :

    cd debian
    mv changelog changelog.fu
    cd ..

    Here’s the secret sauce… I’m lazy and don’t want to do a “make menuconfig”
    because I’m happy enough with the way Debian configures the kernel + initrd module loading.
    So I copy the Debian config from the Kernel we just installed :

    cp /boot/config-2.6.25-2-amd64 .config

    Edit the Makefile and put unique id in EXTRAVERSION; this'll show up in a 'uname -a' :
    e.g.

    EXTRAVERSION = spinn3r.r1
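
If this build is repeated often, the Makefile edit can be scripted instead of done by hand. A small sketch, assuming the top-level kernel Makefile contains a line starting with EXTRAVERSION; the function name is hypothetical.

```python
import re


def set_extraversion(makefile_text, tag):
    """Return makefile_text with the first EXTRAVERSION line set to `tag`."""
    pattern = r"(?m)^EXTRAVERSION[ \t]*=.*$"
    # A function is used as the replacement so that backslashes in `tag`
    # are not interpreted as regex escapes.
    new_text, count = re.subn(pattern, lambda m: "EXTRAVERSION = " + tag,
                              makefile_text, count=1)
    if count == 0:
        raise ValueError("no EXTRAVERSION line found")
    return new_text
```

Reading the Makefile, passing its text through set_extraversion() and writing it back does the same thing as the manual edit.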

    I was building on a dual core machine so I set CONCURRENCY_LEVEL
    to tell the Debian make-kpkg to spawn two instances of the compiler for building.
    However if I were doing this often I'd use DistCC (http://code.google.com/p/distcc/).

    export CONCURRENCY_LEVEL=2

    Build the kernel, create debian package with a custom name :

    fakeroot make-kpkg --initrd --us --uc --revision=spinn3r kernel_image
    

    After some time that'll create a .deb file in the kernel directory's parent directory,
    which you can install via dpkg -i file.deb

    Linux Swap Issue

    Posted in Systems Engineering / Unix Systems Operations by david415 on August 6, 2008

    I no longer have to use this workaround… since we patched our kernel.
    The current 2.6 Linux Kernels seem to have some swap issues.
    The Linux Kernel really likes to swap MySQL out to disk.

    If for example you do a :


    cat /proc/swaps

    Often times on MySQL servers, with a data-set which can easily fit within memory,
    swap is reported to be in use even though it should not.
    Additionally sometimes too much swap space is reported used.

    Here’s some related links on the subject :



    Here’s my work-around for this situation :

    I create two swap files :


    dd if=/dev/zero of=/swap01 bs=1MB count=34000
    dd if=/dev/zero of=/swap02 bs=1MB count=34000
    mkswap /swap01
    mkswap /swap02

    Now I can run my Perl script to rotate the active swap file between /swap01 and /swap02.
    If /swap01 is active, then the Perl script does this :


    swapon /swap02
    swapoff /swap01

    This causes the pages written to swap to be reloaded into memory and ensures I’m not using swap. MySQL shouldn’t get swapped in the first place but I feel this is a pretty good workaround. I run this little Perl script from cron every half hour. Notice the locking… :

    #!/usr/bin/perl
    use strict;
    use warnings;
    
    use LockFile::Simple qw(lock trylock unlock);
    
    # set to 1 to turn verbosity off
    my $verbose = 0;
    my $lock = '/var/lock/rotate_swap.lock';
    
    Main();
    
    sub rs_lock
    {
      die "already locked\n" unless trylock($lock);
      $verbose == 1 || print "acquired lock\n";
    }
    
    sub rs_unlock
    {
      unlock($lock);
      $verbose == 1 || print "released lock\n";
    }
    
    sub err
    {
        my $msg = shift;
        print "$msg\n";
    # remove lock
        rs_unlock();
        exit -1;
    }
    
    sub Main
    {
        my %swap;
        $swap{'/swap01'} = '/swap02';
        $swap{'/swap02'} = '/swap01';
    
    # verify that valid swap files exist
    
        my $freecmd_output = `free`;
        my $totalmem;
        if($freecmd_output =~ m/Mem:\s+([^\s]+)/)
        {
    	$totalmem = $1;
        }
        else
        {
    	err("'free' cannot determine available memory");
        }
    
        my @stat_field;
        my $swap_size;
        foreach(keys %swap)
        {
    	@stat_field = stat($_);
    	$swap_size = $stat_field[7];
    	$swap_size = $swap_size / 1024;
    
    	if($swap_size > $totalmem)
    	{
    	    if($verbose == 0)
    	    {
    		print "swap file: $_ size: $swap_size is greater than free mem size: $totalmem\n";
    	    }
    	}
    	else
    	{
    	    err("swap file: $_ size: $swap_size is not greater than free mem: $totalmem");
    	}
        }
    
        eval
        {
    # grab a mutex for this swap /var/lock/rotate_swap.lock
    	rs_lock();
    
    # make sure at least ONE swap partition is up and running...
    
    	my $status = `cat /proc/swaps`;
    	unless($status =~ /Priority\n.+/)
    	{
    	    err("No swap units available!");
    	}
    
    # determine the TARGET swap partition.
    	my $target_swap;
    	my @line;
    	@line = split(/\n/,$status);
    
    	my @field = split(/\s+/,$line[1]);
    	my $current_swap = $field[0];
    
    	if(!defined($swap{$current_swap}))
    	{
    	    $target_swap = '/swap01';
    	}
    	else
    	{
    	    $target_swap = $swap{$current_swap};
    	}
    
    	$verbose == 1 || print "currently swap $current_swap\n";
    
    # attempt to mount it
    
    	unless(system("swapon $target_swap") == 0)
    	{
    	    err("swapon failed for : $target_swap");
    	}
    
    	$verbose == 1 || print "enabled swap $target_swap\n";
    
    # attempt to umount the stable swap partition
    
    	unless(system("swapoff $current_swap") == 0)
    	{
    	    err("swapoff failed for : $current_swap");
    	}
    
    	$verbose == 1 || print "disabled swap $current_swap\n";
    
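        }; # end eval {...
        # save the eval error before rs_unlock() has a chance to overwrite $@
        my $eval_err = $@;
    
        # unlock
        rs_unlock();
    
        if ($eval_err)
        {
            ### catch block
            die "caught unexpected error: $eval_err\n";
        }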
    }
    
    __END__
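
For comparison, the core decision the script makes (which swap file to enable and which to disable) can be sketched in a few lines of Python. This is not a port of the whole script: it leaves out the locking and the size check, and the function names are hypothetical.

```python
def active_swaps(proc_swaps_text):
    """Return the swap file names listed in /proc/swaps, in file order."""
    lines = proc_swaps_text.splitlines()[1:]  # skip the header line
    return [line.split()[0] for line in lines if line.strip()]


def rotation_plan(active, pair=("/swap01", "/swap02")):
    """Return (swap_to_enable, swap_to_disable) given the active swap list."""
    if not active:
        raise RuntimeError("No swap units available!")
    current = active[0]
    # Same rule as the Perl script: flip between the pair, and fall back
    # to the first file when the current swap is not one of the pair.
    target = pair[1] if current == pair[0] else pair[0]
    return target, current
```

The order matters just as in the Perl version: swapon the target first, then swapoff the current file, so the pages being pulled back in always have somewhere to go if memory is tight.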
    

    Megaraid Nagios Plugins

    Posted in Systems Engineering / Unix Systems Operations by david415 on August 5, 2008

    I wrote three small, useful Perl scripts, Nagios NRPE plugins to be precise, each scheduled to monitor the LSI RAID controller cache policy/status, controller’s patrol read status (since we never want a patrol read to affect performance) and the controller’s Battery Backup Unit status.

    As a sys admin I often write small programs like these. I want to start writing custom stuff like this in Python. Python seems to be a very clean looking language with not a lot of syntactic sugary layers. It seems to have a clean try catch syntax like Java. Anyway here’s check_megaraid_cachepolicy.pl :

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    my $cmd = 'sudo /usr/local/sbin/MegaCli64 -LDGetProp  -Cache -Lall -aALL';
    
    my $output = `$cmd`;
    
    open(F,">/tmp/fu") || die "$!\n";
    print F "output: $output\n";
    close F;
    
    if($output =~ /Cache Policy:WriteBack, ReadAheadNone, Direct, No Write Cache if bad BBU/)
    {
    print "OK: MegaRAID cache policy is ok\n";
    exit 0;
    }
    else
    {
    print "WARNING: MegaRAID cache policy is not ok\n";
    exit 1;
    
    }
    

    check_megaraid_patrolread.pl :

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    my $cmd = 'sudo /usr/local/sbin/MegaCli64 -AdpPR -Info -aALL';
    
    my $output = `$cmd`;
    
    if($output =~ /Patrol Read Mode: Disabled/)
    {
    print "MegaRAID patrol read mode is disabled\n";
    exit 0;
    }
    else
    {
    print "WARNING: MegaRAID patrol read mode is NOT disabled\n";
    exit 1;
    }
    

    check_megaraid_bbustatus.pl:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    my $cmd = 'sudo MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL';
    my $output = `$cmd`;
    
    #  Fully Discharged        : No
    #  Fully Charged           : Yes
    
    if($output =~ /Fully Charged           : Yes/)
    {
     print "OK: BBU Fully Charged\n";
     exit 0;
    }
    
    if($output =~ /Fully Discharged        : Yes/)
    {
     print "WARNING: MegaRAID cache BBU is Fully Discharged\n";
     exit 1;
    }
    
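    if($output =~ /Fully Charged           : No/)
    {
     print "WARNING: BBU Not fully charged or discharged\n";
     exit 1;
    }
    
    # none of the expected status lines matched; report UNKNOWN (Nagios exit code 3)
    print "UNKNOWN: could not parse MegaCli64 BBU status\n";
    exit 3;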
    

    On the Nagios server I add these three entries into our service config file :

    define service{
            hostgroup_name                  megaraid_servers
            service_description             megaraid cache policy
    #        notifications_enabled           0
            check_command                   check_nrpe_1arg!check_megaraid_cachepolicy
            use                             generic-service
            }
    
    define service{
            hostgroup_name                  megaraid_servers
            service_description             megaraid patrolread
    #        notifications_enabled           0
            check_command                   check_nrpe_1arg!check_megaraid_patrolread
            use                             generic-service
            }
    
    define service{
            hostgroup_name                  megaraid_servers
            service_description             megaraid bbu status
            notifications_enabled           0
            check_command                   check_nrpe_1arg!check_megaraid_bbustatus
            use                             generic-service
            }

    The cluster nodes run the Nagios NRPE server, which is configured to run certain nagios health check plugins locally, sending the results to the server and thus waking me up at 3am with an e-mail to my cellphone.

    /etc/nagios/nrpe.cfg:

    command[check_megaraid_cachepolicy]=/usr/lib/nagios/custom-plugins/check_megaraid_cachepolicy.pl
    command[check_megaraid_patrolread]=/usr/lib/nagios/custom-plugins/check_megaraid_patrolread.pl
    command[check_megaraid_bbustatus]=/usr/lib/nagios/custom-plugins/check_megaraid_bbustatus.pl
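
Since I mentioned wanting to write this kind of plugin in Python, here is roughly what the BBU check could look like. It is a sketch, not a tested replacement: it assumes the same MegaCli64 output lines the Perl script matches, and it adds an UNKNOWN result (Nagios exit code 3) for output it cannot parse.

```python
import re
import subprocess
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, UNKNOWN = 0, 1, 3


def bbu_status(output):
    """Map MegaCli64 -GetBbuStatus output to (exit_code, message)."""
    if re.search(r"Fully Charged\s*:\s*Yes", output):
        return OK, "OK: BBU Fully Charged"
    if re.search(r"Fully Discharged\s*:\s*Yes", output):
        return WARNING, "WARNING: MegaRAID cache BBU is Fully Discharged"
    if re.search(r"Fully Charged\s*:\s*No", output):
        return WARNING, "WARNING: BBU Not fully charged or discharged"
    return UNKNOWN, "UNKNOWN: could not parse MegaCli64 output"


def main():
    """Run MegaCli64 (same command as the Perl plugin) and exit like a plugin."""
    cmd = ["sudo", "MegaCli64", "-AdpBbuCmd", "-GetBbuStatus", "-aALL"]
    try:
        output = subprocess.check_output(cmd).decode("utf-8", "replace")
    except (OSError, subprocess.CalledProcessError):
        output = ""
    code, message = bbu_status(output)
    print(message)
    sys.exit(code)
```

Keeping the parsing in bbu_status() separate from the command execution makes it easy to test the plugin against saved MegaCli64 output; a script installed under NRPE would simply call main().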