Nagios
What is Nagios and why does any network administrator need it?
Nagios is a powerful, enterprise-class host, service, application, and network monitoring program. It is designed to be fast, flexible, and rock-solid stable. Nagios runs on *NIX hosts (like Ubuntu Linux) and can monitor Windows, Linux/Unix/BSD, NetWare, and network devices. And best of all, it's free open source software!
Well, it may be free software, but it does cost something - time and effort! Installation is the easy part; configuration is much harder. At least, it used to be, since check_mk makes it so much easier. But let's discuss installing it first.
I'm going to describe how to install Nagios on Ubuntu 8.04.3 LTS. I'll use a fresh installation of the Ubuntu Server Edition JeOS in a virtual machine to get a clean start. While details may differ between other versions of Ubuntu or Linux, most of the guide will still apply.
While Ubuntu does include a packaged version of Nagios in its official repositories, it's probably not the latest version, so I recommend you download, compile and install it yourself.
Before we do that, we have to install some prerequisites first, though. Here's how to install all required dependencies for the Nagios core:
sudo apt-get install apache2 build-essential libapache2-mod-php5 libgd2-xpm-dev traceroute wget
This installs the Apache webserver with PHP and its graphics library as well as the build environment necessary to compile software, traceroute (optional) and the wget downloader (it's not included with JeOS).
Next we prepare the system by creating a user and group for Nagios and a group for running Nagios commands from the web interface:
sudo adduser --system --home /usr/local/nagios --no-create-home --group --disabled-login nagios
sudo addgroup --system nagcmd
sudo adduser nagios nagcmd
sudo adduser www-data nagcmd
As a security measure, the newly created system user is disabled so normal login isn't possible. It's sufficient for running Nagios as a service, though. The web server user www-data is added to the nagcmd group so that commands can be issued from the web interface.
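If you want to double-check the group setup before moving on, a small helper like the following can do it. This is only a sketch: the function name in_group is my own invention, not part of Nagios, and it relies on nothing but the standard id command.

```shell
#!/bin/sh
# in_group USER GROUP: succeed if USER is a member of GROUP.
# Hypothetical helper for illustration; uses only the standard id command.
in_group() {
    # id -Gn prints the names of all groups the user belongs to,
    # separated by spaces; an unknown user yields an empty list.
    for g in $(id -Gn "$1" 2>/dev/null); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Example (run after the adduser/addgroup commands above):
#   in_group nagios nagcmd && in_group www-data nagcmd && echo "groups OK"
```

If the example prints nothing, one of the two users is missing from nagcmd and commands from the web interface will likely fail later.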
Now we can download and extract the Nagios Core (current version is 3.2.0 as of writing):
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.0.tar.gz
tar xzf nagios-3.2.0.tar.gz
cd nagios-3.2.0
Let's build it:
./configure --with-command-group=nagcmd
make all
It's important to specify the command group so that the binaries will get the proper permissions - otherwise Nagios can't be controlled from the web interface. Compiling the software takes some time, so be patient.
When it's done, install it:
sudo make fullinstall
fullinstall combines install, install-init, install-commandmode, and install-webconf. It doesn't include install-config, though, so we execute that manually:
sudo make install-config
Now that all the binaries and config files have been installed, we're going to restrict access to the web interface by setting an administrator password:
sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Enter a password and confirm it. You'll need the username (nagiosadmin) and password later when logging into the web interface.
Reload Apache:
sudo invoke-rc.d apache2 reload
Now the Nagios Core installation is done. Leave its directory:
cd ..
Before we can use Nagios to monitor something, we need to install the monitoring plugins. The plugins have dependencies of their own, so we have to install their prerequisites first:
sudo apt-get -y install libmysqlclient15-dev libssl-dev mailx libldap2-dev libnet-snmp-perl libpq-dev libradius1-dev smbclient snmp fping qstat
You probably don't need all of them, but to compile as many monitoring plugins as possible, they should be installed. Only libmysqlclient15-dev, libssl-dev and mailx are really required; fping and qstat are entirely optional.
Now download and extract the Nagios Plugins package (as of writing, it's version 1.4.14):
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.14.tar.gz
tar xzf nagios-plugins-1.4.14.tar.gz
cd nagios-plugins-1.4.14
Build and install it:
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
sudo make install
When it's done, you can leave the directory:
cd ..
Next we need to fix Nagios' configuration - Ubuntu's mail command is located in /usr/bin instead of /bin:
sudo sed -i~ 's| /bin/mail | /usr/bin/mail |' /usr/local/nagios/etc/objects/commands.cfg
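To see what that substitution does without touching the real file, you can try it on a throwaway copy first. The sample line below is made up for illustration (the real commands.cfg contains longer notification commands), and the variable names are my own.

```shell
#!/bin/sh
# Dry run of the mail path substitution on a temporary file.
# The sample line is invented; it only mimics a pipe into /bin/mail.
sample=$(mktemp)
printf '%s\n' 'command_line /usr/bin/printf "%b" "alert" | /bin/mail -s "subject" $CONTACTEMAIL$' > "$sample"

# Same expression as above, but without -i, so the file is left unchanged
# and the result is only captured and printed.
fixed=$(sed 's| /bin/mail | /usr/bin/mail |' "$sample")
echo "$fixed"

rm -f "$sample"
```

The spaces around /bin/mail in the pattern are what keep the command safe to run twice: an already fixed /usr/bin/mail has no space directly before /bin, so it won't match again.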
Before we start Nagios, it's a good idea to verify that the configuration is OK:
sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
The pre-flight check should confirm that everything is alright. This command can be used after modifying any Nagios configuration file to ensure the system will continue to work.
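Once Nagios is running, it can be handy to tie the check and the reload together so a broken configuration never gets loaded. Here's a minimal sketch of that idea. The helper name reload_if_valid is hypothetical, and it assumes the check command exits with a non-zero status when the configuration has errors.

```shell
#!/bin/sh
# reload_if_valid CHECK_CMD RELOAD_CMD
# Runs CHECK_CMD; only if it exits successfully, runs RELOAD_CMD.
# Hypothetical helper; both commands are passed in as strings.
reload_if_valid() {
    if sh -c "$1" >/dev/null 2>&1; then
        sh -c "$2"
    else
        echo "configuration check failed, not reloading" >&2
        return 1
    fi
}

# Example with the paths used in this guide (run as root):
#   reload_if_valid \
#       "/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg" \
#       "invoke-rc.d nagios reload"
```

The check's output is discarded here, so if it fails, run the -v command by hand to read the actual error messages.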
Now it's time to start up the Nagios service:
sudo invoke-rc.d nagios start
If it started (as it should), add it to the system startup sequence:
sudo update-rc.d nagios defaults 30 18
From then on, Nagios gets started (and shut down) with the system.
Well done! If you followed along this far, you now have a working Nagios up and running. It's already monitoring itself (localhost) and can be accessed with a web browser:
http://localhost/nagios/
Username: nagiosadmin, Password: The one you specified earlier!
check_mk
Normally, one would now install some or all of the official Nagios addons: NRPE (which lets you remotely monitor other Linux/Unix or Windows hosts), NSCA (to integrate passive alerts from remote machines), or NDOUtils (an experimental database connector that used to be required for interesting third-party addons like NagVis). But most of that is no longer necessary thanks to an amazing new extension called check_mk!
I can't stress enough how important this new plugin is! Many, many thanks to Mathias Kettner (he's the mk in check_mk) for such a wonderful addon!
So what does it do? It could be described as "a new general purpose Nagios plugin for retrieving data" - but that description hardly does it justice! It replaces NRPE, NSClient++ and check_snmp. It can also be used in place of NDOUtils (and a database) for addons like NagVis. It also makes configuration much easier so config tools like NConf are also no longer needed. In fact, I set up a Nagios system today that monitors dozens of hosts and hundreds of services, in just a few hours!
So let's take a look at it - after installing Python support for Apache so its multiadmin interface (an optional feature) will be available:
sudo apt-get -y install libapache2-mod-python
Again, we download and install the software. No need to compile it, though, since it's a Python program:
wget http://www.mathias-kettner.de/download/check_mk-1.1.0.tar.gz
tar xzf check_mk-1.1.0.tar.gz
cd check_mk-1.1.0
We just have to run its setup script. If you omit the "--yes", it will ask a lot of questions, but answering yes to all of them is just fine (at least with our current JeOS setup):
sudo ./setup.sh --yes
To make its multiadmin feature readily available to Nagios, we'll add it to the Nagios navigation bar (the list of links in the left frame pane):
sudo sed -i~check_mk '/Configuration/a\<li><a href="/check_mk/filter.py" target="<?php echo $link_target;?>">Check_MK Multiadmin</a></li>' /usr/local/nagios/share/side.php
By default, check_mk is prepared for PNP (an addon we'll install later) in its stable version 0.4.x - here we prepare it for the latest PNP4Nagios version 0.6.x:
sudo sed -i~ "s|/nagios/pnp/index.php?host=\$HOSTNAME\$&srv=\$SERVICEDESC\\$|/pnp4nagios/graph?host=\$HOSTNAME\$\&srv=\$SERVICEDESC\$' class='tips' rel='/pnp4nagios/popup?host=\$HOSTNAME\$\&srv=\$SERVICEDESC\$|;s|/nagios/pnp/index.php?host=\$HOSTNAME\\$|/pnp4nagios/graph?host=\$HOSTNAME\$\&srv=_HOST_' class='tips' rel='/pnp4nagios/popup?host=\$HOSTNAME\$\&srv=_HOST_|" /usr/share/doc/check_mk/check_mk_templates.cfg
This ensures that the action links take us to the PNP graphs for the hosts and services - and even better, the graphs will be shown as popups when hovering over the action icons! (You'll soon see what I mean - and how cool this is!)
Reload Apache and Nagios and you're done:
sudo invoke-rc.d apache2 reload
sudo invoke-rc.d nagios reload
You may leave the check_mk directory:
cd ..
Now that check_mk is installed, we'll enable monitoring its own host by installing the check_mk agent. While the agent can be queried through various means, the regular way is by making it accessible through xinetd, so we install that first:
sudo apt-get install xinetd
Then we only need to copy the agent script check_mk_agent.linux to /usr/bin/check_mk_agent and the xinetd configuration file xinetd.conf to /etc/xinetd.d/check_mk:
sudo cp -p /usr/share/check_mk/agents/check_mk_agent.linux /usr/bin/check_mk_agent
sudo cp -p /usr/share/check_mk/agents/xinetd.conf /etc/xinetd.d/check_mk
Optionally, for security reasons, you may want to edit /etc/xinetd.d/check_mk and specify which IP addresses may query your agent. Uncomment the option only_from and edit the addresses listed there.
Reload xinetd to activate the new configuration:
sudo invoke-rc.d xinetd reload
That's it! Easy, huh? Remember, you can install the agent on other Linux systems just as easily - just copy the check_mk_agent script there and make it available through xinetd (or another means of access, like SSH, which is somewhat more advanced).
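A quick way to see whether an agent answers is to look at which sections it reports. The agent's output is plain text, with each section introduced by a header line of the form <<<name>>>. The helper below is a sketch of my own (list_sections is not a check_mk command). The commented example assumes that nc is installed and that the agent listens on TCP port 6556, as configured in the xinetd.conf shipped with check_mk.

```shell
#!/bin/sh
# list_sections: read check_mk agent output on stdin and print the section
# names, i.e. the text between <<< and >>> on header lines.
# Hypothetical helper for illustration.
list_sections() {
    sed -n 's/^<<<\(.*\)>>>$/\1/p'
}

# Example against a running agent (assumes nc and TCP port 6556):
#   nc localhost 6556 | list_sections
# Or without the network, by calling the agent script directly:
#   check_mk_agent | list_sections
```

If the list comes back empty over the network but not when calling the script directly, xinetd (or an only_from restriction) is the first place to look.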
To make Nagios monitor your hosts, and to configure check_mk to your liking, edit check_mk's main configuration file:
sudoedit /etc/check_mk/main.mk
List all hosts you want monitored in the configuration variable all_hosts, as a comma-separated list of strings. Right now we'll only monitor localhost:
all_hosts = [ 'localhost' ]
Since localhost is already specified in the original Nagios configuration, we have to disable the original entry, otherwise we'd get a conflict and our new configuration wouldn't be accepted:
sudo sed -i~check_mk 's/cfg_file=.*localhost.cfg/#&/' /usr/local/nagios/etc/nagios.cfg
This comments out the localhost.cfg entry. Now we can scan our hosts to auto-discover available services - this is one of the most helpful features of check_mk:
sudo check_mk -I alltcp
After the scan, you can automatically add the newly discovered services to Nagios by running the following command:
sudo check_mk -R
And it's done! Look at the Nagios web interface again - you'll see the new localhost. If you also added the agent to other hosts as briefly mentioned above, and listed their hostnames in main.mk, you could already be monitoring a whole lot of remote systems!
Before you continue, I highly recommend another config tweak that keeps the nagios.log file size down - since by default Nagios logs check_mk activity which can quickly take up a lot of space:
sudo sed -i 's/log_external_commands=1/log_external_commands=0/;s/log_passive_checks=1/log_passive_checks=0/' /usr/local/nagios/etc/nagios.cfg
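To confirm that both settings really changed, you can print just those two lines. This is a tiny sketch; show_log_settings is a name I made up, and it only greps the file you pass to it.

```shell
#!/bin/sh
# show_log_settings FILE: print the two logging options from a nagios.cfg.
# Hypothetical helper for illustration.
show_log_settings() {
    grep -E '^(log_external_commands|log_passive_checks)=' "$1"
}

# Example:
#   show_log_settings /usr/local/nagios/etc/nagios.cfg
```

Both lines should end in =0 after the sed command above has run.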
Restart Nagios for the change to take effect - from now on, you can always do that with:
sudo check_mk -R
To monitor Windows hosts, you simply copy /usr/share/check_mk/agents/windows/check_mk_agent.exe there and run it like this:
check_mk_agent.exe install
To start the installed agent service right away:
net start check_mk_agent
Then add the hostname to main.mk, scan with sudo check_mk -I alltcp and recreate the config with sudo check_mk -R.
To monitor switches or other devices which are accessible through SNMP, specify the hostname in main.mk with the tag snmp like this: 'HOSTNAME|snmp' - scan it with sudo check_mk -I snmp_info HOSTNAME (or another snmp scan type, check out check_mk -L | grep snmp for a list) and recreate the config with sudo check_mk -R.
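With Linux, Windows and SNMP hosts mixed together, the all_hosts line can get long. If you keep your host names somewhere else, a few lines of shell can generate the entry for you. This is just a sketch: make_all_hosts is a hypothetical helper, and the host names winserver01 and switch01 are placeholders.

```shell
#!/bin/sh
# make_all_hosts HOST...: print an all_hosts assignment for main.mk.
# Hypothetical helper; host names must not contain single quotes.
make_all_hosts() {
    list=""
    for h in "$@"; do
        # Separate entries with ", " but put nothing before the first one.
        list="${list:+$list, }'$h'"
    done
    echo "all_hosts = [ $list ]"
}

# Placeholder host names; the |snmp suffix is the tag described above.
make_all_hosts localhost winserver01 'switch01|snmp'
```

Paste the printed line into /etc/check_mk/main.mk in place of the existing all_hosts entry, then scan and recreate the config as described.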
There's much more to it - check_mk lets you easily and quickly set up a complete monitoring solution, even for very large and complex environments! Make sure to read about all of its other useful features in the Online Documentation!
PNP
Next is PNP, an output addon that creates and displays beautiful and informative charts out of the data Nagios collects. While Nagios itself mainly shows on/off states (service is running, or it isn't), PNP lets you see how a service performs. In a way, it's like Munin, but perfectly integrated into Nagios.
It depends on rrdtool, so install that and additional prerequisites first:
sudo apt-get -y install librrds-perl php5-gd rrdtool
Download and extract PNP's latest version, currently 0.6.2:
wget http://sourceforge.net/projects/pnp4nagios/files/PNP-0.6/pnp4nagios-0.6.2.tar.gz/download
tar xzf pnp4nagios-0.6.2.tar.gz
cd pnp4nagios-0.6.2
Compile and install:
./configure
make all
sudo make fullinstall
fullinstall combines install, install-webconf, install-init, and install-config.
There are various modes of operation and quite complicated installation instructions posted on its official website, but a single command can set it up:
sudo sed -i~pnp4nagios '/process_performance_data/s/0/1/;$a\broker_module=/usr/local/pnp4nagios/bin/npcdmod.o config_file=/usr/local/pnp4nagios/etc/npcd.cfg' /usr/local/nagios/etc/nagios.cfg
This uses the npcdmod.o module which makes additional, manual nagios.cfg changes unnecessary! Why isn't this properly documented in its official manual?
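Since that one sed command changes two things at once, it's worth verifying that both landed in nagios.cfg. Here's a minimal sketch. The function name npcdmod_enabled is my own, and it simply looks for the two lines the command is supposed to produce.

```shell
#!/bin/sh
# npcdmod_enabled FILE: succeed if FILE (a nagios.cfg) has performance data
# processing switched on and loads the npcdmod.o broker module.
# Hypothetical helper for illustration.
npcdmod_enabled() {
    grep -q '^process_performance_data=1' "$1" &&
    grep -q '^broker_module=.*npcdmod\.o' "$1"
}

# Example:
#   npcdmod_enabled /usr/local/nagios/etc/nagios.cfg && echo "npcdmod active"
```

If the check fails, compare nagios.cfg with the backup the sed command left next to it (nagios.cfg~pnp4nagios) to see what did or didn't change.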
A special service to process performance data is required for this mode, so copy its configuration to the proper place:
sudo cp -p /usr/local/pnp4nagios/etc/npcd.cfg-sample /usr/local/pnp4nagios/etc/npcd.cfg
Start it up and add it to the system's autostart:
sudo invoke-rc.d npcd start
sudo update-rc.d npcd defaults 20
PNP doesn't like Ubuntu's default magic_quotes_gpc setting, so we change it. We also have to enable mod_rewrite:
sudo sed -i~ '/magic_quotes_gpc/s/On/Off/' /etc/php5/apache2/php.ini # /etc/php5/cli/php.ini
sudo a2enmod rewrite
Reload Apache for the changes to take effect:
sudo invoke-rc.d apache2 reload
Visit http://localhost/pnp4nagios/ to ensure everything is set up correctly. Then delete the install.php to be able to use PNP:
sudo rm -f /usr/local/pnp4nagios/share/install.php
To perfectly integrate PNP into Nagios and enable mouse-over popups of graphs, put status-header.ssi into Nagios and set up its permissions:
sudo cp -p contrib/ssi/status-header.ssi /usr/local/nagios/share/ssi
sudo chown nagios:nagios /usr/local/nagios/share/ssi/status-header.ssi
sudo chmod 644 /usr/local/nagios/share/ssi/status-header.ssi
Reload Nagios and you're done:
sudo invoke-rc.d nagios reload
Now you can leave PNP's directory:
cd ..
Hover your mouse cursor over the action symbol to see a preview of the graphs as a floating popup image. Click it to go directly to the graphs. Once you get used to this feature, you won't want to miss it!
NagVis
The final addon I'm going to introduce today is NagVis. It's a visualization engine for Nagios that has to be seen to be believed. It's that cool! Check out the screenshots!
Install its prerequisites:
sudo apt-get -y install graphviz php5-cli php5-gd php5-mysql
Then download and extract the current version 1.4.5:
wget http://downloads.sourceforge.net/project/nagvis/NagVis%201.4%20%28stable%29/NagVis-1.4.5/nagvis-1.4.5.tar.gz
tar xzf nagvis-1.4.5.tar.gz
cd nagvis-1.4.5
This version is the first to support mklivestatus. mklivestatus is another great feature of check_mk which lets other addons access Nagios stats without requiring a database and a connector.
Install it like this:
sudo ./install.sh -i mklivestatus -q
Although we chose mklivestatus, the default is MySQL access, so we change that:
sudo sed -i~ 's/;backend="ndomy_1"/backend="live_1"/' /usr/local/nagios/share/nagvis/etc/nagvis.ini.php
Then we have to set PHP's timezone - otherwise we'd get a lot of error messages when trying to open the NagVis pages:
sudo sed -i~ "s|;date.timezone =|date.timezone = `cat /etc/timezone`|" /etc/php5/apache2/php.ini # /etc/php5/cli/php.ini
Now reload Apache:
sudo /etc/init.d/apache2 reload
Leave NagVis' directory:
cd ..
Now you can access it here:
http://localhost/nagios/nagvis/
Setting up NagVis is beyond the scope of this guide, so refer to its Documentation.
Congratulations! You've successfully completed your Nagios, check_mk, PNP and NagVis installation! But this isn't the end, it's just the beginning. Take a snapshot of your virtual machine (if you used one like I did) and then continue to set up your monitoring solution - check_mk's main.mk is the key...