Nagios Acknowledge for the Masses

I wrote this simple Perl script to help acknowledge multiple alerts at once.

In a large environment, a major maintenance can flood the user with alerts, and even with the aid of servicegroups and hostgroups the volume can be overwhelming.

The script lists every service problem that is unacknowledged and not in scheduled downtime.
This is similar to what the following link shows:

/cgi-bin/status.cgi?host=all&type=detail&servicestatustypes=29&hoststatustypes=15&serviceprops=10
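
The numeric parameters in that link are bitmasks. As a minimal sketch, the snippet below decodes them. The bit values are the ones I recall from the Nagios Core CGI headers, so treat them as assumptions and check them against your own version; the helper name `decode_mask` is made up for this illustration, and only the first four `serviceprops` bits are listed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed bit values (recalled from Nagios Core's CGI headers, not taken from this post).
my %service_status = (1 => 'PENDING', 2 => 'OK', 4 => 'WARNING', 8 => 'UNKNOWN', 16 => 'CRITICAL');
my %host_status    = (1 => 'PENDING', 2 => 'UP', 4 => 'DOWN', 8 => 'UNREACHABLE');
my %service_props  = (1 => 'IN_SCHEDULED_DOWNTIME', 2 => 'NO_SCHEDULED_DOWNTIME',
                      4 => 'ACKNOWLEDGED',          8 => 'UNACKNOWLEDGED');

# Hypothetical helper: return the names of all flags set in $mask, lowest bit first.
sub decode_mask {
    my ($mask, $names) = @_;
    return map { $names->{$_} } grep { $mask & $_ } sort { $a <=> $b } keys %$names;
}

print "servicestatustypes=29: ", join(', ', decode_mask(29, \%service_status)), "\n";
print "hoststatustypes=15:    ", join(', ', decode_mask(15, \%host_status)),    "\n";
print "serviceprops=10:       ", join(', ', decode_mask(10, \%service_props)),  "\n";
```

Under those assumed values, `serviceprops=10` decodes to "not in scheduled downtime" plus "unacknowledged", which is the same filter the script applies to status.dat.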

To set up the script, edit the paths to your Nagios status.dat and to the command FIFO file.
The user running the script must be able to write to the FIFO file.
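
Before running the script it can help to confirm that both paths are usable. The following is a sketch only: `check_paths` is a hypothetical helper that is not part of the script, and the two paths are the defaults used in the script, which may differ on your installation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a list of human-readable problems.
# An empty list means both paths look usable for the current user.
sub check_paths {
    my ($status_file, $command_file) = @_;
    my @problems;
    push @problems, "status file '$status_file' is not readable" unless -r $status_file;
    if (!-e $command_file) {
        push @problems, "command file '$command_file' does not exist";
    } else {
        push @problems, "command file '$command_file' is not a named pipe" unless -p $command_file;
        push @problems, "command file '$command_file' is not writable by this user" unless -w $command_file;
    }
    return @problems;
}

# Default locations from the script; adjust them to your setup.
my @problems = check_paths('/usr/local/nagios/var/status.dat',
                           '/usr/local/nagios/var/rw/nagios.cmd');
print STDERR "$_\n" for @problems;
print "Both paths look usable\n" unless @problems;
```

If the FIFO is reported as not writable, running the script as a user in the group that owns the command file is usually the simplest fix.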

To use the script, run it without arguments; it then prompts for everything in interactive mode.

#!/usr/bin/perl
 
#####################################################################################################
#   This script helps acknowledge multiple services during a large maintenance.
#   Sometimes host groups and service groups do not suffice.
#   The script requires the locations of status.dat and of the command FIFO file to be set below.
#   The script must be able to write to the FIFO file.
#   The command is run interactively.
#   Santiago Velasco - sanxiago.com
#####################################################################################################
 
	my $command_file = "/usr/local/nagios/var/rw/nagios.cmd";
	my $status_file = "/usr/local/nagios/var/status.dat";
	my $time = time();
	my %state = (1 ,'WARNING', 2,'CRITICAL', 3,'UNKNOWN');
	my $user = $ARGV[0];
	my $msg = $ARGV[1];
	my $search_string = $ARGV[2];
 
print STDERR "\n\nACKNOWLEDGE AND SCHEDULE DOWNTIME FOR MULTIPLE SERVICES\n\n";
 
# Re-prompt until the value is longer than one character and contains neither ';' nor '|'.
# Read from STDIN explicitly: a bare <> would try to open the command-line arguments as files.
while(!defined($user) or $user =~ /;|[|]/  or length($user)<=1){
	print STDERR "Type in your USER that acknowledges:\n";
	$user = <STDIN>;
	chomp($user);
}
while (!defined($msg) or $msg =~ /;|[|]/ or length($msg)<=1 ){
	print STDERR "Type in the MESSAGE that will be used for all acknowledges:\n";
	$msg = <STDIN>;
	chomp($msg);
}
print STDERR "Type in a string that matches the service_description of the services you want to ack.\n(Leave it blank to list all alerts):\n";
$search_string = <STDIN>;
chomp($search_string);
if(length($search_string)<=1){
	$search_string='.*';
}
 
if (-r $status_file){
	open (STATUS, '<', $status_file) or die "FAILED TO OPEN NAGIOS STATUS FILE: $!\n";
}
else {
	print STDERR "FAILED TO READ NAGIOS STATUS FILE\n";
	exit 1;
}
while(<STATUS>){
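	# Nagios Core 3.3.1 tags these blocks "servicestatus {" rather than "service {"; accept both
	if($_ =~ /service(status)? \{/){
	$is_service = 1;
	}
	# a block ends on a line that starts with the closing brace
	if($_ =~ /^\s*\}/ and $service_description=~/$search_string/){
	$is_service =0;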
		if(defined($current_state) and $current_state and $acknowledged==0 and $scheduled_downtime==0 ){
		# Command Format:
		# [time] ACKNOWLEDGE_SVC_PROBLEM;<host_name>;<service_description>;<sticky>;<notify>;<persistent>;<author>;<comment>
		# [time] SCHEDULE_SVC_DOWNTIME;<host_name>;<service_description>;<start_time>;<end_time>;<fixed>;<trigger_id>;<duration>;<author>;<comment>
		undef($ack_true);
		print STDERR "\n---------------------------------------------------------\n";
		print STDERR "Acknowledge $service_description @ $host_name $current_state ".$state{$current_state}."?\n$plugin_output\n[y/n/s] (s followed by the number of minutes of scheduled downtime) (Enter to skip)\n";
		$ack_true=<STDIN>;
		# if acknowledge yes
		if($ack_true=~/^y/){
			if (!(-w $command_file)){ print STDERR "FAILED TO OPEN FIFO FILE\n"; exit 1; }
			open (CMD, '>>', $command_file) or die "FAILED TO OPEN FIFO FILE: $!\n";
			# sticky=1, notify=0, persistent=1
			print CMD "[$time] ACKNOWLEDGE_SVC_PROBLEM;$host_name;$service_description;1;0;1;$user;$msg\n";
			close (CMD);
		# if schedule downtime 
		}elsif($ack_true=~/^s(.*)/){
			my $duration = $1;
			if($duration=~/\D*([0-9]+).*/){
				# expect the duration in minutes and convert it to seconds
				$duration=int($1)*60;
			}else{
				# no number given: default to one hour
				$duration=3600;
			}
			my $end_time = $time + $duration;

			if (!(-w $command_file)){ print STDERR "FAILED TO OPEN FIFO FILE\n"; exit 1; }
			open (CMD, '>>', $command_file) or die "FAILED TO OPEN FIFO FILE: $!\n";
			# fixed=1, trigger_id=0
			print CMD "[$time] SCHEDULE_SVC_DOWNTIME;$host_name;$service_description;$time;$end_time;1;0;$duration;$user;$msg\n";
			close (CMD);
		}
		}
	undef($current_state);
	undef($host_name);
	}
	if($is_service){
		# Patterns are anchored to the start of the line so that a longer key
		# (for example long_plugin_output=) cannot match a shorter one.
		if($_=~/^\s*host_name=(.*)/){
			$host_name=$1;
		}
		if($_=~/^\s*service_description=(.*)/){
			$service_description=$1;
		}
		if($_=~/^\s*current_state=([0-9]*)/){
			$current_state=$1;
		}
		if($_=~/^\s*problem_has_been_acknowledged=([0-9]*)/){
			$acknowledged=$1;
		}
		if($_=~/^\s*plugin_output=(.*)/){
			$plugin_output=$1;
		}
		if($_=~/^\s*scheduled_downtime_depth=([0-9]*)/){
			$scheduled_downtime=$1;
		}
	}
}
close(STATUS);
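
The parsing and command-building steps of the script can also be sketched in isolation, which makes them easy to try without a running Nagios. This is an illustration under stated assumptions, not a replacement for the script: the helper names (`parse_service_blocks`, `is_unhandled`, `ack_command`), the sample hosts and services, and the user and message are all made up, and the sample text only imitates the status.dat layout the script expects. It needs Perl 5.10 or later for the `//` operator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read status.dat-style text and return one hash per
# service block. Both "service {" and "servicestatus {" are accepted.
sub parse_service_blocks {
    my ($fh) = @_;
    my (@services, $current);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^\s*service(?:status)?\s*\{/) {
            $current = {};                       # start of a service block
        } elsif ($line =~ /^\s*\}/) {
            push @services, $current if $current;
            undef $current;                      # end of any block
        } elsif ($current and $line =~ /^\s*(\w+)=(.*)$/) {
            $current->{$1} = $2;                 # key=value inside a service block
        }
    }
    return @services;
}

# Same filter as the script: non-OK, not acknowledged, not in scheduled downtime.
sub is_unhandled {
    my ($svc) = @_;
    return ($svc->{current_state} // 0) != 0
        && !($svc->{problem_has_been_acknowledged} // 0)
        && !($svc->{scheduled_downtime_depth} // 0);
}

# Build the acknowledge line with the script's flags (sticky=1, notify=0, persistent=1).
sub ack_command {
    my ($time, $svc, $user, $msg) = @_;
    return "[$time] ACKNOWLEDGE_SVC_PROBLEM;$svc->{host_name};$svc->{service_description};1;0;1;$user;$msg\n";
}

# Made-up sample data for the demonstration.
my $sample = <<'EOT';
servicestatus {
    host_name=web01
    service_description=HTTP
    current_state=2
    problem_has_been_acknowledged=0
    scheduled_downtime_depth=0
    }
servicestatus {
    host_name=web01
    service_description=PING
    current_state=0
    problem_has_been_acknowledged=0
    scheduled_downtime_depth=0
    }
EOT

open(my $fh, '<', \$sample) or die "cannot open sample: $!";
my @unhandled = grep { is_unhandled($_) } parse_service_blocks($fh);
close($fh);

# Print the commands instead of writing them to the FIFO.
print ack_command(time(), $_, 'nagiosadmin', 'Planned maintenance') for @unhandled;
```

Only the HTTP service is reported here, since PING is in state 0 (OK). Collecting each block into a hash first keeps a value from one block from leaking into the next, which the line-by-line variables in the script have to guard against by hand.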

7 Comments

Hi Santiago,

I have tried mass_scheduler.pl script but it’s not working for me. I am running Nagios Version 3.3.1.

I have modified it as per my Nagios setup:

my $command_file = "/usr/local/nagios/var/rw/nagios.cmd";
my $status_file = "/usr/local/nagios/var/status.dat";

-sh-3.2$ perl mass_scheduler.pl

ACKNOWLEDGE AND SCHEDULE DOWNTIME FOR MULTIPLE SERVICES

Type in yout USER that acknowledges:
nagiosadmin
Type in the MESSAGE that will be used for all acknowledges:
Internet DOWN
Type in a string that matches the service_description of the services you want to ack.
Leave it blank to list all alerts):

After pressing Enter I get the shell prompt back and nothing happens in Nagios.

Please suggest a fix.

Thanks in Advance
Vishal Sinha

Great script! However, under Nagios Core 3.3.1, $is_service was never set to true because service objects in status.dat are tagged as 'servicestatus' instead of 'service'. Modifying Line 46 to read "if($_ =~ /servicestatus {/){" resolved this issue.

This appears to only acknowledge services. I have some hosts that are down right now that I’d like to acknowledge but it doesn’t list anything when I run it. Do I have to tweak something in order to show hosts as well?

Tweak it, no; it's more of a rewrite than a tweak. It's probably easier to set up and use hostgroups. I built this tool because, in my experience, the problem was creating all the servicegroups required to list the interdependencies between all applications. Certain outages would trigger certain alerts, and sometimes we still needed the other services in the same servicegroup to be monitored while acknowledging only certain alerts in it.

But an outage affecting multiple hosts is usually a very focused issue and should be easily handled with Nagios hostgroups. Some ideas are to group your hosts by physical location, by the physical switch they connect to, or, if they are cloud hosts, by availability zone. If you have a massive outage you can easily deal with the hosts that are down using hostgroups and scheduled downtime, even if the outage is unscheduled; during a massive outage it's not impolite to give an ETA 🙂

Service outages are more detailed. For example, you could have several instances across several hosts alerting for max connections while you still want to monitor all the other stats, such as CPU. In that case you do not want scheduled downtime for the whole servicegroup; you want to acknowledge that specific issue.

I know it's not what you were looking for, but I hope it helps.
