jgazeley-nagios

Puppet module to manage Nagios, NRPE, NSCA, PNP4Nagios, BPI and other monitoring components


License: Apache-2.0
Install: puppet module install jgazeley-nagios --version 0.3.3


nagios

Table of Contents

  1. Overview
  2. Module Description - What the module does and why it is useful
  3. Usage - Configuration options and additional functionality
  4. Examples
  5. Limitations - OS compatibility, etc.
  6. Development - Guide for contributing to the module

Overview

This module installs and manages Nagios, NRPE, NSCA, BPI and PNP4Nagios to give you a full monitoring stack.

Module Description

While Nagios itself is not too complex, a full-stack installation includes a number of optional components. The terminology is explained below. If you are new to Nagios, make sure you read and understand these definitions before attempting to use this module.

This module is quite opinionated about how Nagios should be set up. I've made it as configurable as I can without deviating from the model that I believe is best, which was extensively tested in our local environment before publishing. It makes assumptions about how you want to group hosts and services, which means you can start benefiting from Nagios quickly without much setup.

This module makes heavy use of Puppet exported resources to configure Nagios. You must have a working Puppet and PuppetDB environment with exported resources before using this module.

Warning: This module uses puppetlabs/apache to configure the web frontend. Be aware that puppetlabs/apache will purge all other Apache config that is not managed with puppetlabs/apache. This Nagios module will play nicely with other web sites configured with apache::vhost, but it will break anything else that has been configured manually.

Nagios

Nagios is the name of the main monitoring application, and it includes a web application and a backend daemon. The daemon does the actual monitoring by executing plugins which send probes to clients. The results are then displayed in the web application or sent out as notifications.

Be careful with the terminology: here we use server to refer to the Nagios server, and client to refer to the Nagios clients, even though they may be servers in their own right.

+--------+      +--------+
| Nagios | ---> | Client |
+--------+      +--------+

NRPE

While Nagios is good at sending probes to clients that offer network services (e.g. sending HTTP requests to web servers), it needs something extra to probe non-public aspects of a client, such as CPU usage.

To achieve this, we run the NRPE daemon on the client, which listens for requests from the server and executes plugins to probe the local system. The Nagios server queries NRPE on the client, NRPE runs the plugin, and the result is returned to Nagios.

+--------+      +--------+      +--------+
| Nagios | ---> |  NRPE  | ---> | Client |
+--------+      +--------+      +--------+

NSCA

NSCA works the other way round from NRPE. NSCA runs on the server and listens for clients to submit passive checks to Nagios on their own schedule (e.g. via cron) rather than the Nagios server initiating the probes.

+--------+      +--------+      +--------+
| Nagios | <--- |  NSCA  | <--- | Client |
+--------+      +--------+      +--------+
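On the client side, a passive result is usually submitted with the send_nsca utility from the NSCA client package. The sketch below schedules such a submission with Puppet's built-in cron resource. It is an illustration written by hand, not something taken from this module: the script /usr/local/bin/report_backup.sh is hypothetical (assumed to print one tab-separated line of host name, service description, return code and output), and the send_nsca and config paths assume a CentOS 7/EPEL layout.

```puppet
# Hypothetical passive check: a nightly job reports its own result to the
# Nagios server through NSCA instead of waiting to be probed.
cron { 'submit_backup_result':
  # report_backup.sh is a placeholder script that prints
  # "<host>\t<service description>\t<return code>\t<output>" on stdout.
  command => '/usr/local/bin/report_backup.sh | /usr/sbin/send_nsca -H nagios.example.com -c /etc/nagios/send_nsca.cfg',
  user    => 'root',
  hour    => 3,
  minute  => 30,
}
```

The service description printed by the script has to match a passive service defined on the Nagios server, otherwise Nagios discards the result.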

BPI

BPI (Business Process Intelligence) is an addon for Nagios which is able to model real-world applications based on a set of probes. For example: you may have a cluster of 2 web servers and so long as either server is up, the overall service is up. You might not care if only one server is down. BPI uses logic like this to work out if your real services are up or down and send appropriate alerts.

PNP4Nagios

Some Nagios plugins return performance data as well as a status code. Out of the box, Nagios can't do anything with this data, but PNP4Nagios can store it in RRD files (via RRDtool) and automatically draw graphs from it.

Usage

This module is designed so that the base class ::nagios configures a Nagios monitoring server. Other classes are available, such as ::nagios::client, which configures a Nagios client to be monitored. There are also some defined types, which you declare directly where necessary to configure extras.
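A minimal sketch of that split, assuming one monitoring server and one monitored host (the node names are placeholders). server => true is set explicitly because the parameter list for ::nagios documents its default as false. Since nagios::service looks up its default nagios_server with hiera('nagios_server'), the sketch also assumes that Hiera key is set for your clients.

```puppet
# Monitoring server: Nagios daemon, web frontend and NRPE support.
node 'nagios.example.com' {
  class { 'nagios':
    server => true,   # explicit, since the documented default is false
    nrpe   => true,
  }
}

# Monitored system: NRPE agent plus the module's basic checks.
node 'web1.example.com' {
  class { 'nagios::client':
    nrpe         => true,
    nsca         => false,   # no passive checks on this host
    basic_checks => true,
  }
}
```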

Classes

::nagios

The ::nagios class installs a Nagios monitoring server and related components.

client

Install components to run a Nagios client, i.e. a system that is monitored. Default: true

server

Install components to run a Nagios monitoring server. Default: false

nrpe

Install support for NRPE, which is required if you want to execute Nagios checks on remote servers (clients). Default: false

nsca

Install support for NSCA, which is required if you want to execute passive Nagios checks. Default: false

selinux

Manage SELinux rules to allow Nagios components to run properly on the clients and server. Strongly recommended if you are running a Red Hat family distro, and SELinux is enabled on your system. Requires puppet/selinux. Default: false

firewall

Manage firewall rules on Nagios clients and server. Strongly recommended to allow Nagios components to work properly. Caution: firewall rules are managed by puppetlabs/firewall. That module purges any firewall rules that are not managed with puppetlabs/firewall so be extremely careful before enabling this option. Default: false

url

Override the hostname that your Nagios server will run on, if you don't want it to run on the server's $::fqdn. Default: $::fqdn

aliases

Array of alternative hostnames that your Nagios server should respond to. Don't forget to set these as alternate names in your SSL certificate. Default: []

dev

Set a flag to mark this Nagios server as a development/testing server. This suppresses active notifications from Nagios. Default: false

serveradmin

Server admin email address for use by Apache. Default: root@localhost

notify_admin

Whether to send Nagios host and service notifications to $serveradmin. Default: false
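For example, a staging or test Nagios server might combine dev with its own hostname and keep notifications to the admin address turned off. This is a sketch only; the hostnames and email address are placeholders.

```puppet
# Hypothetical staging server: checks still run, but notifications are suppressed.
class { 'nagios':
  server       => true,
  dev          => true,                        # suppress active notifications
  url          => 'nagios-test.example.com',   # instead of $::fqdn
  serveradmin  => 'nagios-admins@example.com', # used by Apache
  notify_admin => false,                       # don't send notifications to $serveradmin
}
```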

auto_os_hostgroup

Whether to automatically add this client to a hostgroup of its OS type. Default: true

auto_virt_hostgroup

Whether to automatically add this client to a hostgroup of its hardware/virtualised platform. Default: true

hostgroups

Array of other hostgroups to add the system to. Default: []

parent

Name of a parent object. Default: undef

alias

Set alias for a host. Default: undef
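The grouping parameters can be combined per host. In this sketch the hostgroup names and the parent host are hypothetical, and the parent is assumed to be a host that Nagios already knows about.

```puppet
# Hypothetical web server: keep the automatic OS hostgroup, skip the
# hardware/virtualisation hostgroup and add two site-specific hostgroups.
class { 'nagios':
  auto_os_hostgroup   => true,
  auto_virt_hostgroup => false,
  hostgroups          => ['webservers', 'production'],
  parent              => 'core-switch.example.com',   # upstream network device
}
```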

nrpe_package

Name of the NRPE package. You shouldn't need to override this. If you need to add support for a new distro, please send a pull request or raise an issue.

webroot

Location of the webroot on the filesystem. If you need to add support for a new distro, please send a pull request or raise an issue.

cgiroot

Location of the CGI root on the filesystem. If you need to add support for a new distro, please send a pull request or raise an issue.

nsca_client_package

Name of the NSCA client package. If you need to add support for a new distro, please send a pull request or raise an issue.

nrpe_service

Name of the NRPE service. If you need to add support for a new distro, please send a pull request or raise an issue.

nrpe_config

Path to the NRPE config file. If you need to add support for a new distro, please send a pull request or raise an issue.

nrpe_d

Path to the NRPE conf.d directory. If you need to add support for a new distro, please send a pull request or raise an issue.

nrpe_plugin_package

Name of the NRPE plugin package. If you need to add support for a new distro, please send a pull request or raise an issue.

nsca_server_package

Name of the NSCA server package. If you need to add support for a new distro, please send a pull request or raise an issue.

nsca_service

Name of the NSCA service. If you need to add support for a new distro, please send a pull request or raise an issue.

nsca_config

Path to the NSCA config file. If you need to add support for a new distro, please send a pull request or raise an issue.

nagios_package

Name of the Nagios package. If you need to add support for a new distro, please send a pull request or raise an issue.

nagios_service

Name of the Nagios service. If you need to add support for a new distro, please send a pull request or raise an issue.

ssl_cert

Path to the SSL server certificate for the HTTPS web frontend. Default: /path/to/cert.crt

ssl_key

Path to the SSL private key. Default: /path/to/key.key

ssl_chain

Path to the SSL certificate chain file. Default: undef

ssl_cipher

Allowed SSL ciphers. Defaults to a more secure list than the one that ships with puppetlabs/apache. Default: HIGH:!MEDIUM:!aNULL:!MD5:!RC4:!3DES

::nagios::client

The ::nagios::client class installs the components needed for a system to be monitored by a Nagios monitoring server.

nrpe

Whether to enable support for NRPE. Default: true

nsca

Whether to enable support for NSCA. Default: true

selinux

Whether to manage SELinux policies to allow plugins to execute properly via NRPE. Default: true

firewall

Whether to manage firewall rules to allow plugins to execute properly via NRPE. Default: true

basic_checks

Whether to set up a basic set of checks that should work on all systems (e.g. ping). Default: true

nrpe_package

Name of the NRPE client package. If you need to add support for a new distro, please send a pull request or raise an issue.

nsca_client_package

Name of the NSCA client package. If you need to add support for a new distro, please send a pull request or raise an issue.

nrpe_service

Name of the NRPE service. If you need to add support for a new distro, please send a pull request or raise an issue.

nrpe_config

Path to the NRPE config file. If you need to add support for a new distro, please send a pull request or raise an issue.

nrpe_d

Path to the NRPE conf.d directory. If you need to add support for a new distro, please send a pull request or raise an issue.

nrpe_plugin_package

Name of the NRPE plugin package. If you need to add support for a new distro, please send a pull request or raise an issue.

Defined types

nagios::service

The ::nagios::service defined type installs a service, a command and other related components required to monitor something.

host_name

Hostname of the system that the check should be associated with. Default: $::fqdn

check_command

Override the name of the check command in the service definition. Default: $title

service_description

Human-readable name for the service.

use

Name of the Nagios template to inherit from. Default: undef

servicegroups

One or more additional servicegroups that this service should be a member of. It will automatically be added to a servicegroup with the same name as the check. Default: undef

add_servicegroup

Whether to automatically create the servicegroup that this service belongs to by default. Default: true

add_servicedep

Whether to automatically add a service dependency on NRPE, if this service is an NRPE-based check. Default: true

active_checks_enabled

Override whether active checks are enabled for this service. Default: undef

max_check_attempts

Override the maximum number of check attempts before the service enters a hard state. Default: undef

check_freshness

Override check freshness. Probably only useful for passive checks. Default: undef

freshness_threshold

Override freshness threshold. Probably only useful for passive checks. Default: undef
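As a sketch of how these overrides fit together for a passive check: the service below expects results to be pushed via NSCA, and Nagios only runs the command itself when no result has arrived within the threshold. The service name is hypothetical, and the 0/1 values assume the module passes these parameters straight through to the Nagios service definition. check_dummy (from the nagios-plugins-dummy package) simply returns the state it is given, here CRITICAL.

```puppet
# Hypothetical passive service: results arrive via NSCA. If nothing has been
# received for 26 hours, Nagios runs check_dummy, which reports CRITICAL.
nagios::service { 'check_backup':
  service_description   => 'Nightly backup',
  active_checks_enabled => 0,       # assumed to map directly to Nagios' 0/1
  check_freshness       => 1,
  freshness_threshold   => 93600,   # seconds (26 hours)
  plugin_source         => 'nagios-plugins-dummy',
  command_definition    => 'check_dummy 2 "No backup result received"',
}
```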

command_definition

The command line used to execute the plugin. The default can be used only if no arguments are required. Default: $check_command

check_interval

Override the check interval on a per-service basis. This is usually inherited from the template named in use. Default: undef

use_nrpe

Whether to execute this check on the monitored host via NRPE. Default: false

use_sudo

Whether to use sudo when executing this check. Default: false

sudo_user

The username to use when executing plugins with sudo when $use_sudo = true. Default: undef
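A sketch of an NRPE check that needs elevated privileges. On CentOS 7, /var/log/secure is readable only by root, so the plugin is run through sudo. The service name and thresholds are illustrative, and how the module writes the matching sudo rule is not shown here.

```puppet
# Hypothetical NRPE check that must read a root-only file, so the plugin
# is executed via sudo as root.
nagios::service { 'check_secure_log_age':
  use_nrpe            => true,
  use_sudo            => true,
  sudo_user           => 'root',
  service_description => 'Secure log age',
  plugin_source       => 'nagios-plugins-file_age',
  # Warn if the log is older than 1 day, critical if older than 1 week.
  command_definition  => 'check_file_age -w 86400 -c 604800 -f /var/log/secure',
}
```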

install_plugin

Whether to install the Nagios plugin on the system. Default: true

plugin_provider

Provider for the plugin installation, if $install_plugin = true. Default: package

plugin_source

Source for installation of the plugin if $install_plugin = true. Default: undef

service_dependency

Add arbitrary service dependencies on other services on this host. Default: undef

nagios_server

The hostname of the Nagios server that will be monitoring this host. Default: hiera('nagios_server')
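Where the Hiera key is not set, or a particular host is monitored by a different server, nagios_server can be passed explicitly. The hostnames and NTP thresholds below are placeholders.

```puppet
# Hypothetical NRPE check that names its monitoring server explicitly
# instead of relying on the hiera('nagios_server') default.
nagios::service { 'check_ntp_time':
  use_nrpe            => true,
  service_description => 'NTP offset',
  plugin_source       => 'nagios-plugins-ntp',
  command_definition  => 'check_ntp_time -H pool.ntp.org -w 0.5 -c 1',
  nagios_server       => 'nagios.example.com',
}
```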

nagios::bpi::config

The ::nagios::bpi::config defined type configures a BPI "service", i.e. a group of one or more monitored objects in Nagios. The title of this resource forms the BPI groupID and must consist of alphanumeric characters with no spaces. This ID is used internally by the program as well as by the check_bpi.php plugin.

This can be a bit confusing to configure, especially the members option, so it is probably best to read the examples below.

# Group of DNS servers created by checking the `DNS` Nagios service on all DNS servers.
# If one or more DNS servers is up, this group counts as up.
nagios::bpi::config { 'dns':
  displayname => 'DNS',
  members     => [
    {
      host    => 'dns1.example.com',
      service => 'DNS',
      opt     => '&',
    },
    {
      host    => 'dns2.example.com',
      service => 'DNS',
      opt     => '&',
    },
  ],
  priority    => 2,
  primary     => 0,
}

# Group of DHCP servers created by checking the `DHCP` Nagios service on all DHCP servers.
# If one or more DHCP servers is up, this group counts as up.
nagios::bpi::config { 'dhcp':
  displayname => 'DHCP',
  members     => [
    {
      host    => 'dhcp1.example.com',
      service => 'DHCP',
      opt     => '&',
    },
    {
      host    => 'dhcp2.example.com',
      service => 'DHCP',
      opt     => '&',
    },
  ],
  priority    => 2,
  primary     => 0,
}

# Virtual group to reflect the state of the whole network. If the DNS and DHCP groups
# are both up, this group is up. If either DNS or DHCP is down, this group is down.
nagios::bpi::config { 'network':
  displayname => 'Network',
  members     => [
    {
      host => '$dns',
      opt  => '|',
    },
    {
      host => '$dhcp',
      opt  => '|',
    },
  ],
  priority    => 1,
  primary     => 1,
}

displayname

The display name for the BPI group (required)

members

Members of this BPI group, which can consist of services and other BPI groups. Data should be expressed as an array of hashes with the following keys:

  • host: The hostname of a host in Nagios or the groupID of a BPI group. Required.
  • service: The servicename of a service in Nagios, if host is a Nagios host. Not required if host is a BPI group.
  • opt: an & or | character where & means service is part of a cluster and | means it is an essential service for the group.

For example: a critical service with an | option will cause a critical state for the entire group. For clusters, critical is only reached when ALL services in a cluster are NOT OK.

nagios

Automatically create a Nagios check for this BPI group. Default: true

desc

Description for a BPI group. Optional, default: undef

primary

Primary/top-level groups are 1, subgroups are 0. Setting 0 hides the BPI group except where it is explicitly referenced as a component of another BPI group. Default: 1

info

Link to internal or external webpage. Optional, default: undef

warning_threshold

The number of problems a group reaches before going 'warning'. Default: 0

critical_threshold

The number of problems a group reaches before going 'critical'. Default: 0

priority

The display priority on screen, from 1 to 3, with 1 being 'high priority'. Default: 1

event_handler

Set an event handler for this BPI group's Nagios check. Only makes sense if nagios=true. Default: undef
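The optional parameters can be combined with the cluster logic described above. In this sketch the group, hosts, wiki URL and threshold values are all hypothetical, and the thresholds are assumed to behave as described above: one failed web server should take the group to warning, two to critical.

```puppet
# Hypothetical three-node web cluster with explicit warning/critical thresholds.
nagios::bpi::config { 'web':
  displayname        => 'Web cluster',
  desc               => 'Public web servers behind the load balancer',
  info               => 'https://wiki.example.com/web-cluster',
  members            => [
    { host => 'web1.example.com', service => 'HTTP', opt => '&' },
    { host => 'web2.example.com', service => 'HTTP', opt => '&' },
    { host => 'web3.example.com', service => 'HTTP', opt => '&' },
  ],
  warning_threshold  => 1,
  critical_threshold => 2,
  priority           => 2,
  primary            => 1,
}
```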

Examples

Install a Nagios server

class profile::nagios {
  # Install Nagios server
  class { 'nagios':
    nrpe        => true,                     # Set up NRPE for monitoring of remote hosts
    nsca        => false,                    # Skip NSCA, which is needed for passive checks
    selinux     => true,                     # Manage SELinux policies to allow Nagios to run smoothly
    firewall    => true,                     # Manage firewall rules to allow Nagios/NRPE to run smoothly
    url         => 'nagios.example.com',     # Service URL of Nagios, if different from the system hostname
    serveradmin => 'root@example.com',       # Admin's email address
    ssl_cert    => '/etc/pki/tls/certs/nagios.example.com.pem',  # Path to SSL cert for HTTPS
    ssl_key     => '/etc/pki/tls/private/nagios.example.com.key',  # Path to SSL key for HTTPS
    auth_type   => 'CAS',                    # Override Apache basic auth and use CAS single sign-on instead
  }

  # Deploy HTTPS certificate
  file { '/etc/pki/tls/certs/nagios.example.com.pem':
    source => 'puppet:///modules/profile/nagios/nagios.example.com.pem',
    mode   => '0644',
    owner  => 'root',
    group  => 'root',
  }

  # Deploy HTTPS private key
  file { '/etc/pki/tls/private/nagios.example.com.key':
    source => 'puppet:///modules/profile/nagios/nagios.example.com.key',
    mode   => '0600',
    owner  => 'root',
    group  => 'root',
  }
}

Basic non-NRPE service

This service definition monitors the host remotely, directly from the Nagios server. This is ideal for monitoring network services exposed by the remote host, such as HTTP.

nagios::service { 'check_http':
  service_description => 'HTTP',
  plugin_source       => 'nagios-plugins-http',
  command_definition  => 'check_http -I $HOSTADDRESS$ $ARG1$',
}

Basic NRPE service

This service definition installs the plugin on the monitored host and configures NRPE there. The corresponding service check is configured on the Nagios server. This is ideal for monitoring attributes of the remote host that are not available externally.

nagios::service { 'check_users':
  use_nrpe            => true,                       # Execute this on the host via NRPE
  service_description => 'Current users',            # Human-readable description
  plugin_source       => 'nagios-plugins-users',     # Package that provides this plugin
  command_definition  => 'check_users -w 10 -c 20',  # Syntax for actually calling the plugin
}

Service running on an unmanaged host

This service definition is applied to the Nagios server, and the host name is overridden to point at a different system (one that is not managed by Puppet). This is ideal for monitoring "dumb" devices such as switches or other people's servers that you have no access to.

nagios::service { 'check_ping_router':
  host_name           => 'router.example.com',
  plugin_source       => 'nagios-plugins-ping',
  service_description => 'Ping',
  command_definition  => 'check_ping -H $HOSTADDRESS$ -w 100,10% -c 1000,50% -p 5',
}

Service running on a manually managed host

This service definition is applied to the Nagios server, and the host name is overridden to point at a different system which is managed manually and has a manually configured NRPE agent but no Puppet agent. This is ideal for monitoring legacy servers where you can't retrofit Puppet.

nagios::service { 'check_load_legacysystem.example.com':
  check_command       => 'check_load',                 # Name of the command we have manually set on the remote system
  use_nrpe            => true,                         # Use NRPE, which we have manually set up
  service_description => 'Load',
  host_name           => 'legacysystem.example.com',   # Override monitored server name
  install_plugin      => false,                        # Don't attempt to manage the plugin
}

Limitations

This module has been developed for Nagios 4 on CentOS 7. It's pretty flexible so it should work on other platforms too but they have had little-to-no testing.

This module is currently functional but not feature-complete. There are rough edges and things not implemented yet. Please check the issue tracker for outstanding issues and feature requests.

In particular, the HTTPS/SSL config is rough around the edges, and quite a few options are hard-coded and need to be brought out as parameters.

Development

This module was written initially for internal use - features we haven't needed to use probably haven't been written. Please send pull requests with new features and bug fixes. You are also welcome to file issues but I make no guarantees of development effort if the features aren't useful to my employer.