David’s Blog

Custom Debian Kernel Build without “make menuconfig”

Posted in Systems Engineering / Unix Systems Operations by david415 on August 9, 2008

I maintain a cluster of MySQL servers. I mentioned below that Linux has some memory management issues which cause our MySQL servers to swap when they really should not. So my idea was to upgrade the kernel to the Debian Etch Testing 2.6.25 kernel… but patched with Rik van Riel’s Split LRU patch (check it out: http://www.surriel.com/node/6).
Also… I don’t feel any need to compile a super efficient static kernel with only the modules we use. I’m OK with the Debian default initrd module loading kernel.

So what I ended up doing was first installing Debian’s kernel :

apt-get install linux-image-2.6.25-2-amd64

Then I grab the Debian kernel source (Linux kernel patched by Debian) :

apt-get source linux-image-2.6.25-2-amd64

which creates these files in the current directory :

linux-2.6-2.6.25
linux-2.6_2.6.25-7.diff.gz
linux-2.6_2.6.25-7.dsc
linux-2.6_2.6.25.orig.tar.gz

Patch the kernel with the split-LRU patch :

cd linux-2.6-2.6.25
patch -p1 < ../linux-2.6.25-splitlru.patch

I move the changelog so that make-kpkg doesn’t complain and quit :

cd debian
mv changelog changelog.fu
cd ..

Here’s the secret sauce… I’m lazy and don’t want to do a “make menuconfig”
because I’m happy enough with the way Debian configures the kernel + initrd module loading.
So I copy the Debian config from the Kernel we just installed :

cp /boot/config-2.6.25-2-amd64 .config

Edit the Makefile and put a unique id in EXTRAVERSION; this'll show up in 'uname -a',
e.g. :

EXTRAVERSION = spinn3r.r1
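
If you rebuild often, the Makefile edit can be scripted. Here is a minimal sketch in Python; the function name set_extraversion is made up for illustration, and it assumes the top-level Makefile contains a line that starts with EXTRAVERSION.

```python
import re


def set_extraversion(makefile_text, tag):
    """Return makefile_text with the first EXTRAVERSION line set to tag."""
    new_text, count = re.subn(
        r"^EXTRAVERSION[ \t]*=.*$",
        # A function replacement avoids backslash surprises in the tag.
        lambda match: "EXTRAVERSION = " + tag,
        makefile_text,
        count=1,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("no EXTRAVERSION line found")
    return new_text
```

Read the Makefile into a string, pass it through set_extraversion(text, "spinn3r.r1") and write the result back.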

I was building on a dual core machine so I set CONCURRENCY_LEVEL
to tell Debian's make-kpkg to spawn two parallel instances of the compiler.
However, if I were doing this often I'd use DistCC (http://code.google.com/p/distcc/).

export CONCURRENCY_LEVEL=2
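
Rather than hard-coding 2, the level could be derived from the core count of the build machine. A minimal sketch, assuming Python is available on the build box; build_env is a hypothetical helper name, and CONCURRENCY_LEVEL is the only variable it sets.

```python
import os


def build_env(base_env=None):
    """Copy an environment and set CONCURRENCY_LEVEL to the CPU count."""
    env = dict(os.environ if base_env is None else base_env)
    cores = os.cpu_count() or 1  # cpu_count() can return None
    env["CONCURRENCY_LEVEL"] = str(cores)
    return env
```

The returned dict can be handed to subprocess.run(..., env=env) when launching the build.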

Build the kernel, create debian package with a custom name :

fakeroot make-kpkg --initrd --us --uc --revision=spinn3r kernel_image

After some period of time that'll create a .deb file in the kernel directory's parent directory,
which you can install with dpkg -i file.deb

Linux Swap Issue

Posted in Systems Engineering / Unix Systems Operations by david415 on August 6, 2008

I no longer have to use this workaround… since we patched our kernel.
The current 2.6 Linux Kernels seem to have some swap issues.
The Linux Kernel really likes to swap MySQL out to disk.

If for example you do a :


cat /proc/swaps

Often on MySQL servers with a data set that easily fits within memory,
swap is reported to be in use even though it should not be.
Additionally, sometimes too much swap space is reported as used.
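
To keep an eye on this without reading the table by hand, the contents of /proc/swaps can be parsed. This is a rough sketch in Python, assuming the usual five-column layout (Filename, Type, Size, Used, Priority) with a single header row; the function names are made up for illustration.

```python
def parse_proc_swaps(text):
    """Parse /proc/swaps content into a list of dicts, one per swap area."""
    entries = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or malformed lines
        entries.append({
            "filename": fields[0],
            "type": fields[1],
            "size_kb": int(fields[2]),
            "used_kb": int(fields[3]),
            "priority": int(fields[4]),
        })
    return entries


def swap_used_kb(text):
    """Total swap in use across all areas, in the kernel's kilobyte units."""
    return sum(entry["used_kb"] for entry in parse_proc_swaps(text))
```

Feed it open("/proc/swaps").read(); a non-zero total on a box whose data set fits in RAM is the symptom described above.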


Here’s my work-around for this situation :

I create two swap files :


dd if=/dev/zero of=/swap01 bs=1MB count=34000
dd if=/dev/zero of=/swap02 bs=1MB count=34000
mkswap /swap01
mkswap /swap02
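
The rotation script later in this post refuses to run unless each swap file is larger than total memory, so the dd count has to be sized to the machine. A minimal sketch of that arithmetic in Python, assuming the MemTotal line of /proc/meminfo is given in units of 1024 bytes and that dd runs with bs=1MB (one million bytes); dd_count_for is a hypothetical name.

```python
def dd_count_for(meminfo_text, block_bytes=10**6):
    """Smallest dd block count giving a file strictly larger than MemTotal."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            mem_bytes = int(line.split()[1]) * 1024  # kB here means 1024 bytes
            return mem_bytes // block_bytes + 1
    raise ValueError("MemTotal not found")
```

Pass it open("/proc/meminfo").read() and use the result as the count= value.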

Now I can run my Perl script to rotate the active swap file between /swap01 and /swap02.
If /swap01 is active, then the Perl script does this :


swapon /swap02
swapoff /swap01

This causes the pages written to swap to be reloaded into memory and ensures I’m not using swap. MySQL shouldn’t get swapped in the first place but I feel this is a pretty good workaround. I run this little Perl script from cron every half hour. Notice the locking… :

#!/usr/bin/perl
use strict;
use warnings;

use LockFile::Simple qw(lock trylock unlock);

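# set to 1 to turn verbosity off
my $verbose = 0;
my $lock = '/var/lock/rotate_swap.lock';
# true only while this process holds the lock
my $have_lock = 0;

Main();

sub rs_lock
{
  die "already locked\n" unless trylock($lock);
  $have_lock = 1;
  $verbose == 1 || print "acquired lock\n";
}

sub rs_unlock
{
# never remove a lock held by another process
  return unless $have_lock;
  unlock($lock);
  $have_lock = 0;
  $verbose == 1 || print "released lock\n";
}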

sub err
{
    my $msg = shift;
    print "$msg\n";
# remove lock
    rs_unlock();
    exit -1;
}

sub Main
{
    my %swap;
    $swap{'/swap01'} = '/swap02';
    $swap{'/swap02'} = '/swap01';

# verify that valid swap files exist

    my $freecmd_output = `free`;
    my $totalmem;
    if($freecmd_output =~ m/Mem:\s+([^\s]+)/)
    {
	$totalmem = $1;
    }
    else
    {
	err("'free' cannot determine available memory");
    }

    my @stat_field;
    my $swap_size;
    foreach(keys %swap)
    {
	@stat_field = stat($_);
	$swap_size = $stat_field[7];
	$swap_size = $swap_size / 1024;

	if($swap_size > $totalmem)
	{
	    if($verbose == 0)
	    {
		print "swap file: $_ size: $swap_size is greater than total mem size: $totalmem\n";
	    }
	}
	else
	{
	    err("swap file: $_ size: $swap_size is not greater than total mem: $totalmem");
	}
    }

    eval
    {
# grab a mutex for this swap /var/lock/rotate_swap.lock
	rs_lock();

# make sure at least ONE swap partition is up and running...

	my $status = `cat /proc/swaps`;
	unless($status =~ /Priority\n.+/)
	{
	    err("No swap units available!");
	}

# determine the TARGET swap partition.
	my $target_swap;
	my @line;
	@line = split(/\n/,$status);

	my @field = split(/\s+/,$line[1]);
	my $current_swap = $field[0];

	if(!defined($swap{$current_swap}))
	{
	    $target_swap = '/swap01';
	}
	else
	{
	    $target_swap = $swap{$current_swap};
	}

	$verbose == 1 || print "currently swap $current_swap\n";

# attempt to enable it

	unless(system("swapon $target_swap") == 0)
	{
	    err("swapon failed for : $target_swap");
	}

	$verbose == 1 || print "enabled swap $target_swap\n";

# attempt to disable the previously active swap file

	unless(system("swapoff $current_swap") == 0)
	{
	    err("swapoff failed for : $current_swap");
	}

	$verbose == 1 || print "disabled swap $current_swap\n";

    }; # end eval {...
    # save the eval error before rs_unlock() has a chance to clobber $@
    my $eval_err = $@;

    # unlock
    rs_unlock();

    if ($eval_err)
    {
        ### catch block
        die "caught unexpected error: $eval_err";
    }
}

__END__
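
The rotation decision at the heart of the script above is small enough to sketch on its own. Here it is in Python, as an illustration only; plan_rotation and SWAP_PAIR are hypothetical names, and the function just returns the commands rather than running them.

```python
SWAP_PAIR = {"/swap01": "/swap02", "/swap02": "/swap01"}


def plan_rotation(active, pair=SWAP_PAIR, default="/swap01"):
    """Return the commands that move swap from `active` to its partner.

    Mirrors the Perl logic: an active swap area that is not one of the
    two files falls back to `default` as the target.
    """
    target = pair.get(active, default)
    # Enable the new file first so swap is never fully off, then disable
    # the old one, which makes the kernel pull its pages off that file.
    return [["swapon", target], ["swapoff", active]]
```

Each inner list can be passed to subprocess.run(cmd, check=True) by a caller that already holds the lock.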

Megaraid Nagios Plugins

Posted in Systems Engineering / Unix Systems Operations by david415 on August 5, 2008

I wrote three small, useful Perl scripts, Nagios NRPE plugins to be precise. They monitor the LSI RAID controller’s cache policy, the controller’s patrol read status (since we never want a patrol read to affect performance) and the controller’s Battery Backup Unit status.

As a sys admin I often write small programs like these. I want to start writing custom stuff like this in Python. Python seems to be a very clean looking language with not a lot of syntactic sugary layers. It seems to have a clean try catch syntax like Java. Anyway here’s check_megaraid_cachepolicy.pl :

#!/usr/bin/perl

use strict;
use warnings;

my $cmd = 'sudo /usr/local/sbin/MegaCli64 -LDGetProp  -Cache -Lall -aALL';

my $output = `$cmd`;

open(F,">/tmp/fu") || die "$!\n";
print F "output: $output\n";
close F;

if($output =~ /Cache Policy:WriteBack, ReadAheadNone, Direct, No Write Cache if bad BBU/)
{
print "OK: MegaRAID cache policy is ok\n";
exit 0;
}
else
{
print "WARNING: MegaRAID cache policy is not ok\n";
exit 1;

}
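
Since Python came up, here is roughly what the decision part of that check might look like there. This is a sketch, not a drop-in plugin: check_cache_policy is a hypothetical name, and it assumes the same MegaCli64 output string the Perl regex looks for.

```python
EXPECTED_POLICY = (
    "Cache Policy:WriteBack, ReadAheadNone, Direct, "
    "No Write Cache if bad BBU"
)


def check_cache_policy(output):
    """Map MegaCli64 output to a (exit_code, message) pair for Nagios."""
    if EXPECTED_POLICY in output:
        return 0, "OK: MegaRAID cache policy is ok"
    return 1, "WARNING: MegaRAID cache policy is not ok"
```

A real plugin would capture the command output with subprocess, print the message and call sys.exit with the code.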

check_megaraid_patrolread.pl :

#!/usr/bin/perl

use strict;
use warnings;

my $cmd = 'sudo /usr/local/sbin/MegaCli64 -AdpPR -Info -aALL';

my $output = `$cmd`;

if($output =~ /Patrol Read Mode: Disabled/)
{
print "MegaRAID patrol read mode is disabled\n";
exit 0;
}
else
{
print "WARNING: MegaRAID patrol read mode is NOT disabled\n";
exit 1;
}

check_megaraid_bbustatus.pl:

#!/usr/bin/perl

use strict;
use warnings;

my $cmd = 'sudo MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL';
my $output = `$cmd`;

#  Fully Discharged        : No
#  Fully Charged           : Yes

if($output =~ /Fully Charged           : Yes/)
{
 print "OK: BBU Fully Charged\n";
 exit 0;
}

if($output =~ /Fully Discharged        : Yes/)
{
 print "WARNING: MegaRAID cache BBU is Fully Discharged\n";
 exit 1;
}

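if($output =~ /Fully Charged           : No/)
{
 print "WARNING: BBU Not fully charged or discharged\n";
 exit 1;
}

# none of the expected lines matched: report UNKNOWN instead of a silent OK
print "UNKNOWN: could not parse BBU status\n";
exit 3;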
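
The same three-way decision can be sketched in Python with looser whitespace matching, so the check does not depend on the exact column padding. check_bbu is a hypothetical name; the field labels are the ones from the comments in the Perl script, and exit code 3 is the standard Nagios UNKNOWN state.

```python
import re


def check_bbu(output):
    """Map MegaCli64 BBU status output to a Nagios (exit_code, message)."""
    if re.search(r"Fully Charged\s*:\s*Yes", output):
        return 0, "OK: BBU Fully Charged"
    if re.search(r"Fully Discharged\s*:\s*Yes", output):
        return 1, "WARNING: MegaRAID cache BBU is Fully Discharged"
    if re.search(r"Fully Charged\s*:\s*No", output):
        return 1, "WARNING: BBU Not fully charged or discharged"
    # Nothing recognisable in the output: say so rather than report OK.
    return 3, "UNKNOWN: could not parse BBU status"
```

Returning UNKNOWN for unparseable output means a missing MegaCli64 binary or a sudo failure shows up in Nagios instead of passing silently.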

On the Nagios server I add these three entries into our service config file :

define service{
        hostgroup_name                  megaraid_servers
        service_description             megaraid cache policy
#        notifications_enabled           0
        check_command                   check_nrpe_1arg!check_megaraid_cachepolicy
        use                             generic-service
        }

define service{
        hostgroup_name                  megaraid_servers
        service_description             megaraid patrolread
#        notifications_enabled           0
        check_command                   check_nrpe_1arg!check_megaraid_patrolread
        use                             generic-service
        }

define service{
        hostgroup_name                  megaraid_servers
        service_description             megaraid bbu status
        notifications_enabled           0
        check_command                   check_nrpe_1arg!check_megaraid_bbustatus
        use                             generic-service
        }

The cluster nodes run the Nagios NRPE server, which is configured to run certain Nagios health check plugins locally and send the results to the Nagios server, thus waking me up at 3am with an e-mail to my cellphone.

/etc/nagios/nrpe.cfg:

command[check_megaraid_cachepolicy]=/usr/lib/nagios/custom-plugins/check_megaraid_cachepolicy.pl
command[check_megaraid_patrolread]=/usr/lib/nagios/custom-plugins/check_megaraid_patrolread.pl
command[check_megaraid_bbustatus]=/usr/lib/nagios/custom-plugins/check_megaraid_bbustatus.pl