Megaraid Nagios Plugins
I wrote three small, useful Perl scripts (Nagios NRPE plugins, to be precise). Each one monitors one aspect of the LSI RAID controller: the cache policy, the patrol read status (since we never want a patrol read to affect performance), and the Battery Backup Unit (BBU) status.
As a sysadmin I often write small programs like these, and I want to start writing custom stuff like this in Python. Python seems to be a very clean-looking language without many layers of syntactic sugar, and its try/except syntax looks about as clean as Java's try/catch. Anyway, here's check_megaraid_cachepolicy.pl:
#!/usr/bin/perl
use strict;
use warnings;
my $cmd = 'sudo /usr/local/sbin/MegaCli64 -LDGetProp -Cache -Lall -aALL';
my $output = `$cmd`;
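# Debug aid: dump the raw MegaCli64 output to a file; safe to remove once the check works.
open(my $fh, '>', '/tmp/fu') or die "$!\n";
print $fh "output: $output\n";
close $fh;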
if($output =~ /Cache Policy:WriteBack, ReadAheadNone, Direct, No Write Cache if bad BBU/)
{
print "OK: MegaRAID cache policy is ok\n";
exit 0;
}
else
{
print "WARNING: MegaRAID cache policy is not ok\n";
exit 1;
}
check_megaraid_patrolread.pl :
#!/usr/bin/perl
use strict;
use warnings;
my $cmd = 'sudo /usr/local/sbin/MegaCli64 -AdpPR -Info -aALL';
my $output = `$cmd`;
if($output =~ /Patrol Read Mode: Disabled/)
{
print "OK: MegaRAID patrol read mode is disabled\n";
exit 0;
}
else
{
print "WARNING: MegaRAID patrol read mode is NOT disabled\n";
exit 1;
}
check_megaraid_bbustatus.pl:
#!/usr/bin/perl
use strict;
use warnings;
my $cmd = 'sudo MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL';
my $output = `$cmd`;
# Fully Discharged : No
# Fully Charged : Yes
if($output =~ /Fully Charged : Yes/)
{
print "OK: BBU Fully Charged\n";
exit 0;
}
if($output =~ /Fully Discharged : Yes/)
{
print "WARNING: MegaRAID cache BBU is Fully Discharged\n";
exit 1;
}
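if($output =~ /Fully Charged : No/)
{
print "WARNING: BBU Not fully charged or discharged\n";
exit 1;
}
# Nothing matched (for example MegaCli64 failed to run): report UNKNOWN.
print "UNKNOWN: could not determine BBU status\n";
exit 3;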
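All three checks follow the same pattern: run MegaCli64, match its output, print a status line, and exit with a Nagios code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Below is a minimal sketch of how the matching could be pulled into a function that also returns UNKNOWN for empty or unparseable output. The helper name classify_bbu, the %EXIT table, and the whitespace-tolerant patterns are my own assumptions for illustration, and the sample text is modeled only on the two lines quoted in the script's comments, not on real controller output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard Nagios plugin exit codes.
my %EXIT = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# classify_bbu is a hypothetical helper, not part of the original scripts.
# It takes raw MegaCli64 output and returns (status name, message).
sub classify_bbu {
    my ($output) = @_;
    return ('UNKNOWN', 'no output from MegaCli64')
        unless defined $output && length $output;
    return ('OK', 'BBU fully charged')
        if $output =~ /Fully Charged\s*:\s*Yes/;
    return ('WARNING', 'BBU is fully discharged')
        if $output =~ /Fully Discharged\s*:\s*Yes/;
    return ('WARNING', 'BBU not fully charged')
        if $output =~ /Fully Charged\s*:\s*No/;
    return ('UNKNOWN', 'could not parse BBU status');
}

# Sample input (assumed format, taken from the comments in the script above).
my $sample = "Fully Discharged : No\nFully Charged : Yes\n";
my ($status, $message) = classify_bbu($sample);
my $code = $EXIT{$status};
print "$status: $message\n";
# A real plugin would finish with: exit $code;
```

Keeping the parsing in a function means it can be exercised with canned text on a machine that has no RAID controller, and the final exit stays a single line at the bottom of the plugin.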
On the Nagios server I add these three entries to our service config file:
define service{
hostgroup_name megaraid_servers
service_description megaraid cache policy
# notifications_enabled 0
check_command check_nrpe_1arg!check_megaraid_cachepolicy
use generic-service
}
define service{
hostgroup_name megaraid_servers
service_description megaraid patrolread
# notifications_enabled 0
check_command check_nrpe_1arg!check_megaraid_patrolread
use generic-service
}
define service{
hostgroup_name megaraid_servers
service_description megaraid bbu status
notifications_enabled 0
check_command check_nrpe_1arg!check_megaraid_bbustatus
use generic-service
}
The cluster nodes run the Nagios NRPE server, which is configured to run certain Nagios health check plugins locally and send the results to the Nagios server, thus waking me up at 3am with an e-mail to my cellphone.
/etc/nagios/nrpe.cfg:
command[check_megaraid_cachepolicy]=/usr/lib/nagios/custom-plugins/check_megaraid_cachepolicy.pl
command[check_megaraid_patrolread]=/usr/lib/nagios/custom-plugins/check_megaraid_patrolread.pl
command[check_megaraid_bbustatus]=/usr/lib/nagios/custom-plugins/check_megaraid_bbustatus.pl
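Since each plugin calls MegaCli64 through sudo, the account NRPE runs as needs to be allowed to run that command without a password prompt. Here is a sketch of a sudoers entry, assuming NRPE runs as a user named nagios (check nrpe_user in nrpe.cfg) and that MegaCli64 lives at the path used in the first two scripts. The file name is hypothetical.

```
# /etc/sudoers.d/nagios-megacli (hypothetical file name; edit with visudo -f)
# Assumption: NRPE runs as the user "nagios".
Defaults:nagios !requiretty
nagios ALL=(root) NOPASSWD: /usr/local/sbin/MegaCli64
```

The requiretty line only matters on systems whose sudoers enables requiretty by default, because NRPE runs the plugins without a terminal.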