Linux4Dummies: Nagios

=====================================

Nagios Monitoring Tool Documentation

=====================================

Introduction

----------------

Nagios is a powerful monitoring system that enables organizations to identify and resolve IT infrastructure problems before they affect critical business processes. It gives you instant awareness of your organization's mission-critical IT infrastructure, so you can detect and repair problems and mitigate future issues before they affect end users and customers.

How Nagios Works?

* Monitoring

IT staff configure Nagios to monitor critical IT infrastructure components, including system metrics, network protocols, applications, services, servers, and network infrastructure.

* Alerting

Nagios sends alerts when critical infrastructure components fail and recover, providing administrators with notice of important events. Alerts can be delivered via email, SMS, or custom script.

* Response

IT staff can acknowledge alerts and begin resolving outages and investigating security alerts immediately.

* Reporting

Reports provide a historical record of outages, events, notifications, and alert response for later review. Availability reports help ensure your SLAs are being met.

What Nagios Provides?

* Plan for infrastructure upgrades before outdated systems cause failures

* Respond to issues at the first sign of a problem

* Coordinate technical team responses

* Ensure your organization's SLAs are being met

* Monitor your entire infrastructure and business processes

=====================================

Nagios Monitoring Tool Installation

=====================================

Installation of Nagios

---------------------------

This guide is intended to provide you with simple instructions on how to install Nagios from source (code) on CentOS.

If you follow these instructions, here's what you'll end up with:

* Nagios and the plugins will be installed underneath /usr/local/nagios

* Nagios will be configured to monitor a few aspects of your local system (CPU load, disk usage, etc.)

* The Nagios web interface will be accessible at `<http://localhost/nagios/>`_

Prerequisites

During portions of the installation you'll need to have root access to your machine.

Make sure you've installed the following packages on your CentOS installation before continuing.

* Apache

* PHP

* GCC compiler

* GD development libraries

You can use yum to install these packages by running the following commands (as root):

# Install Apache using following command

``$ yum install httpd``

# Install PHP using following command

``$ yum install php``

# Install GCC Compiler using following command

``$ yum install gcc glibc glibc-common``

# Install GD Development libraries

``$ yum install gd gd-devel``

# Create a new nagios user account and give it a password.

``$ useradd -m nagios``

``$ passwd nagios``

# Create a new nagcmd group for allowing external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group.

``$ groupadd nagcmd``

``$ usermod -a -G nagcmd nagios``

``$ usermod -a -G nagcmd apache``

# Download Nagios & Nagios Plugins

``$ mkdir download``

``$ cd download``

``$ wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.3.tar.gz``

``$ wget -O nagios-plugins-1.4.15.tar.gz http://sourceforge.net/projects/nagiosplug/files/nagiosplug/1.4.15/nagios-plugins-1.4.15.tar.gz/download``

Compile and Install Nagios

# Extract the Nagios source code tarball.

``$ tar xzf nagios-3.2.3.tar.gz``

# Change directory to extracted nagios folder

``$ cd nagios-3.2.3``

# Run the Nagios configure script, passing the name of the group you created earlier

``$ ./configure --with-command-group=nagcmd``

# Compile the Nagios source code

``$ make all``

# Install binaries, init script, sample config files and set permissions on the external command directory.

``$ make install``

``$ make install-init``

``$ make install-config``

``$ make install-commandmode``

Configure the Web Interface

# Install the Nagios web config file in the Apache conf.d directory.

``$ make install-webconf``

# Create a nagiosadmin account for logging into the Nagios web interface. Remember the password you assign to this account - you'll need it later.

``$ htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin``

# Restart Apache to make the new settings take effect

``$ /etc/init.d/httpd restart``

Compile and Install the Nagios Plugins

# Change back to the download directory, where the Nagios plugins tarball was saved

``$ cd ..``

# Extract the Nagios plugins source code tarball

``$ tar xzf nagios-plugins-1.4.15.tar.gz``

# Change directory to nagios plugin directory

``$ cd nagios-plugins-1.4.15/``

# Compile and install the plugins

``$ ./configure --with-nagios-user=nagios --with-nagios-group=nagios``

``$ make``

``$ make install``

# Add Nagios to the list of system services and have it automatically start when the system boots.

``$ chkconfig --add nagios``

``$ chkconfig nagios on``

# Verify the sample Nagios configuration files

``$ /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg``

# If there are no errors, start Nagios

``$ /etc/init.d/nagios start``

# Log in to the web interface. You should now be able to access it at one of the URLs below. You'll be prompted for the username (nagiosadmin) and password you specified earlier.

``http://localhost/nagios``

# or

``http://<your-host-or-ip>/nagios``

# Click on the "Service Detail" navbar link to see details of what's being monitored on your local machine. It will take a few minutes for Nagios to check all the services associated with your machine, as the checks are spread out over time.

Nagios Directory Structure after Installation

# By default, Nagios is installed to /usr/local/nagios/

# This directory contains the folders bin, sbin, etc, include, libexec, share, and var

# bin : Contains the Nagios binaries.

# etc : Contains the Nagios configuration files.

# libexec : Contains the executable plugins.

# sbin : Contains the CGI scripts.

# share : Contains docs, stylesheets, and image files and folders.

# var : Contains the log file.

=======================

Configuration Overview

=======================

There are several different configuration files that you’re going to need to create or edit before you start monitoring anything. Be patient! Configuring Nagios can take quite a while, especially if you’re a first-time user. Once you figure out how things work, it’ll all be well worth your time.

Main Configuration File

The main configuration file contains a number of directives that affect how the Nagios daemon operates. This config file is read by both the Nagios daemon and the CGIs. This is where you’re going to want to get started in your configuration adventures.

Resource File

Resource files can be used to store user-defined macros. The main point of having resource files is to use them to store sensitive configuration information (like passwords), without making them available to the CGIs. You can specify one or more optional resource files by using the resource_file directive in your main configuration file.

Object Definition Files

Object definition files are used to define hosts, services, hostgroups, contacts, contactgroups, commands, etc. This is where you define all the things you want to monitor and how you want to monitor them. You can specify one or more object definition files by using the cfg_file and/or cfg_dir directives in your main configuration file.

CGI Configuration File

The CGI configuration file contains a number of directives that affect the operation of the CGIs. It also contains a reference to the main configuration file, so the CGIs know how you’ve configured Nagios and where your object definitions are stored.

Object Configuration Overview

What Are Objects?

Objects are all the elements that are involved in the monitoring and notification logic. Types of objects include:

* Services

* Service Groups

* Hosts

* Host Groups

* Contacts

* Contact Groups

* Commands

* Time Periods

* Notification

:::::::::::::::::::::::::::::::::::::::::::::::::::::

Main Configuration File Options

::::::::::::::::::::::::::::::::::::::::::::::::::::::

Note: When creating and/or editing configuration files, keep the following in mind:

1. Lines that start with a ’#’ character are taken to be comments and are not processed.

2. Variable names must begin at the start of the line - no white space is allowed before the name.

3. Variable names are case-sensitive.
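The three rules above can be illustrated with a small parser. This is only a sketch, not how Nagios itself reads the file: it assumes the ``name=value`` line format used by nagios.cfg, and the function name ``parse_main_config`` is hypothetical.

```python
def parse_main_config(text):
    """Return a list of (name, value) pairs from nagios.cfg-style text.

    A list is used instead of a dict because some variables
    (cfg_file, cfg_dir, resource_file) may appear more than once.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.rstrip()
        # Rule 1: skip blank lines and comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        # Rule 2: a variable name must begin at the start of the line.
        if line[0].isspace():
            raise ValueError(f"leading white space not allowed: {raw!r}")
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected name=value: {raw!r}")
        # Rule 3: names are case-sensitive, so keep them exactly as written.
        entries.append((name, value))
    return entries


sample = """# main log file
log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
"""
entries = parse_main_config(sample)
```

Here ``entries`` holds one pair for ``log_file`` and two for ``cfg_file``, in file order.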

Config File Location

The main configuration file is usually named nagios.cfg and located in the /usr/local/nagios/etc/ directory.

Configuration File Variables

Below you will find descriptions of each main Nagios configuration file option.

Log File

Format: log_file=<file_name>

Example: log_file=/usr/local/nagios/var/nagios.log

This variable specifies where Nagios should create its main log file. This should be the first variable that you define in your configuration file, as Nagios will try to write errors that it finds in the rest of your configuration data to this file.

Object Configuration File

Format: cfg_file=<file_name>

Example:

cfg_file=/usr/local/nagios/etc/hosts.cfg

cfg_file=/usr/local/nagios/etc/services.cfg

cfg_file=/usr/local/nagios/etc/commands.cfg

This directive is used to specify an object configuration file containing object definitions that Nagios should use for monitoring. Object configuration files contain definitions for hosts, host groups, contacts, contact groups, services, commands, etc. You can separate your configuration information into several files and specify multiple cfg_file= statements to have each of them processed.

Object Configuration Directory

Format: cfg_dir=<directory_name>

Example:

cfg_dir=/usr/local/nagios/etc/commands

cfg_dir=/usr/local/nagios/etc/services

cfg_dir=/usr/local/nagios/etc/hosts

This directive is used to specify a directory which contains object configuration files that Nagios should use for monitoring. All files in the directory with a .cfg extension are processed as object config files. Additionally, Nagios will recursively process all config files in subdirectories of the directory you specify here. You can separate your configuration files into different directories and specify multiple cfg_dir= statements to have all config files in each directory processed.
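To preview which files a cfg_dir directive would pick up, the recursive .cfg lookup described above can be approximated as follows. This is a sketch only: the helper name ``find_object_config_files`` and the demo file names are hypothetical, and Nagios' own traversal order may differ.

```python
import os
import tempfile


def find_object_config_files(cfg_dir):
    """Yield every file ending in .cfg under cfg_dir, including subdirectories."""
    for root, _dirs, files in os.walk(cfg_dir):
        for name in sorted(files):
            if name.endswith(".cfg"):
                yield os.path.join(root, name)


# Demo on a throwaway directory tree (hypothetical file names).
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "linux"))
    for rel in ("hosts.cfg", "notes.txt", os.path.join("linux", "web.cfg")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("# placeholder\n")
    found = sorted(os.path.relpath(p, top) for p in find_object_config_files(top))
```

In the demo, ``notes.txt`` is skipped because it lacks the .cfg extension, while the file in the ``linux`` subdirectory is included.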

Object Cache File

Format: object_cache_file=<file_name>

Example: object_cache_file=/usr/local/nagios/var/objects.cache

This directive is used to specify a file in which a cached copy of object definitions should be stored. The cache file is (re)created every time Nagios is (re)started and is used by the CGIs. It is intended to speed up config file caching in the CGIs and allow you to edit the source object config files while Nagios is running without affecting the output displayed in the CGIs.

Resource File

Format: resource_file=<file_name>

Example: resource_file=/usr/local/nagios/etc/resource.cfg

This is used to specify an optional resource file that can contain $USERn$ definitions. $USERn$ macros are useful for storing usernames, passwords, and items commonly used in command definitions (like directory paths). The CGIs will not attempt to read resource files, so you can set restrictive permissions (600 or 660) on them to protect sensitive information. You can include multiple resource files by adding multiple resource_file statements to the main config file - Nagios will process them all.
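As an illustration of how $USERn$ definitions are meant to be used, the sketch below reads macro definitions and substitutes them into a command line. It assumes resource file lines of the form ``$USER1$=value``; the helper names and the sample values are hypothetical, and this is not the actual Nagios macro processor.

```python
import re


def load_user_macros(resource_text):
    """Parse lines such as '$USER1$=/usr/local/nagios/libexec' into a dict."""
    macros = {}
    for line in resource_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and re.fullmatch(r"\$USER\d+\$", name):
            macros[name] = value
    return macros


def expand_user_macros(command_line, macros):
    """Replace each known $USERn$ macro; leave anything else untouched."""
    return re.sub(
        r"\$USER\d+\$",
        lambda m: macros.get(m.group(0), m.group(0)),
        command_line,
    )


resource_text = """# hypothetical resource file contents
$USER1$=/usr/local/nagios/libexec
$USER3$=s3cret
"""
macros = load_user_macros(resource_text)
expanded = expand_user_macros("$USER1$/check_ping -H $HOSTADDRESS$", macros)
```

Only the $USER1$ part of the command line is replaced; other macros such as $HOSTADDRESS$ are left for a later stage.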

Temp File

Format: temp_file=<file_name>

Example: temp_file=/usr/local/nagios/var/nagios.tmp

This is a temporary file that Nagios periodically creates to use when updating comment data, status data, etc. The file is deleted when it is no longer needed.

Temp Path

Format: temp_path=<dir_name>

Example: temp_path=/tmp

This is a directory that Nagios can use as scratch space for creating temporary files used during the monitoring process.

Status File

Format: status_file=<file_name>

Example: status_file=/usr/local/nagios/var/status.dat

This is the file that Nagios uses to store the current status, comment, and downtime information. This file is used by the CGIs so that current monitoring status can be reported via a web interface. The CGIs must have read access to this file in order to function properly. This file is deleted every time Nagios stops and recreated when it starts.

Status File Update Interval

Format: status_update_interval=<seconds>

Example: status_update_interval=15

This setting determines how often (in seconds) that Nagios will update status data in the status file. The minimum update interval is 1 second.

Nagios User

Format: nagios_user=<username/UID>

Example: nagios_user=nagios

This is used to set the effective user that the Nagios process should run as. After initial program startup and before starting to monitor anything, Nagios will drop its effective privileges and run as this user. You may specify either a username or a UID.

Nagios Group

Format: nagios_group=<groupname/GID>

Example: nagios_group=nagios

This is used to set the effective group that the Nagios process should run as. After initial program startup and before starting to monitor anything, Nagios will drop its effective privileges and run as this group. You may specify either a groupname or a GID.

Notifications Option

Format: enable_notifications=<0/1>

Example: enable_notifications=1

This option determines whether or not Nagios will send out notifications when it initially (re)starts. If this option is disabled, Nagios will not send out notifications for any host or service.

0 : Disable notifications

1 : Enable notifications (default)

Service Check Execution Option

Format: execute_service_checks=<0/1>

Example: execute_service_checks=1

This option determines whether or not Nagios will execute service checks when it initially (re)starts. If this option is disabled, Nagios will not actively execute any service checks and will remain in a sort of "sleep" mode.

0 : Don’t execute service checks

1 : Execute service checks (default)

Passive Service Check Acceptance Option

Format: accept_passive_service_checks=<0/1>

Example: accept_passive_service_checks=1

This option determines whether or not Nagios will accept passive service checks when it initially (re)starts. If this option is disabled, Nagios will not accept any passive service checks.

0 : Don’t accept passive service checks

1 : Accept passive service checks (default)

Host Check Execution Option

Format: execute_host_checks=<0/1>

Example: execute_host_checks=1

This option determines whether or not Nagios will execute on-demand and regularly scheduled host checks when it initially (re)starts. If this option is disabled, Nagios will not actively execute any host checks.

0 : Don’t execute host checks

1 : Execute host checks (default)

Passive Host Check Acceptance Option

Format: accept_passive_host_checks=<0/1>

Example: accept_passive_host_checks=1

This option determines whether or not Nagios will accept passive host checks when it initially (re)starts. If this option is disabled, Nagios will not accept any passive host checks.

0 : Don’t accept passive host checks

1 : Accept passive host checks (default)

Event Handler Option

Format: enable_event_handlers=<0/1>

Example: enable_event_handlers=1

This option determines whether or not Nagios will run event handlers when it initially (re)starts. If this option is disabled, Nagios will not run any host or service event handlers.

0 : Disable event handlers

1 : Enable event handlers (default)

Log Rotation Method

Format: log_rotation_method=<n/h/d/w/m>

Example: log_rotation_method=d

This is the rotation method that you would like Nagios to use for your log file. Values are as follows:

n : None (don’t rotate the log - this is the default)

h : Hourly (rotate the log at the top of each hour)

d : Daily (rotate the log at midnight each day)

w : Weekly (rotate the log at midnight on Saturday)

m : Monthly (rotate the log at midnight on the last day of the month)

Log Archive Path

Format: log_archive_path=<path>

Example: log_archive_path=/usr/local/nagios/var/archives/

This is the directory where Nagios should place log files that have been rotated. This option is ignored if you choose to not use the log rotation functionality.

External Command Check Option

Format: check_external_commands=<0/1>

Example: check_external_commands=1

This option determines whether or not Nagios will check the command file for commands that should be executed. This option must be enabled if you plan on using the command CGI to issue commands via the web interface.

0 : Don’t check external commands

1 : Check external commands (default)

External Command Check Interval

Format: command_check_interval=<xxx>[s]

Example: command_check_interval=1

If you specify a number with an "s" appended to it (i.e. 30s), this is the number of seconds to wait between external command checks. If you leave off the "s", this is the number of "time units" to wait between external command checks. Unless you’ve changed the interval_length value (as defined below) from the default value of 60, this number will mean minutes.
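The two ways of writing this value can be made concrete with a short conversion. The function name below is hypothetical, and the sketch covers only the cases described in the paragraph above.

```python
def command_check_interval_seconds(value, interval_length=60):
    """Convert a command_check_interval setting into seconds.

    '30s' means 30 seconds; a bare number is a count of time units,
    each lasting interval_length seconds.
    """
    value = value.strip()
    if value.endswith("s"):
        return int(value[:-1])
    return int(value) * interval_length


with_suffix = command_check_interval_seconds("30s")
one_unit = command_check_interval_seconds("1")
custom_units = command_check_interval_seconds("2", interval_length=30)
```

With the default interval_length of 60, a setting of ``1`` therefore means one minute, while ``30s`` always means thirty seconds.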

External Command Buffer Slots

Format: external_command_buffer_slots=<#>

Example: external_command_buffer_slots=512

Note: This is an advanced feature. This option determines how many buffer slots Nagios will reserve for caching external commands that have been read from the external command file by a worker thread, but have not yet been processed by the main thread of the Nagios daemon.

Update Checks

Format: check_for_updates=<0/1>

Example: check_for_updates=1

This option determines whether Nagios will automatically check to see if new updates (releases) are available. It is recommended that you enable this option to ensure that you stay on top of the latest critical patches to Nagios.

Bare Update Checks

Format: bare_update_checks=<0/1>

Example: bare_update_checks=0

This option determines what data Nagios will send to api.nagios.org when it checks for updates. By default, Nagios will send information on the current version of Nagios you have installed, as well as an indicator as to whether this was a new installation or not.

Lock File

Format: lock_file=<file_name>

Example: lock_file=/tmp/nagios.lock

This option specifies the location of the lock file that Nagios should create when it runs as a daemon (when started with the -d command line argument). This file contains the process id (PID) number of the running Nagios process.
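Because the lock file holds the PID of the running daemon, a script can use it to check whether that process is still alive. The sketch below assumes a POSIX system and a lock file containing just the PID; the helper names are hypothetical, and the demo writes the current process's own PID to a temporary file instead of touching a real Nagios lock file.

```python
import os
import tempfile


def read_lock_file_pid(lock_path):
    """Return the PID stored in the lock file as an integer."""
    with open(lock_path) as fh:
        return int(fh.read().strip())


def is_process_running(pid):
    """Probe a PID with signal 0, which checks existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


# Demo with a throwaway lock file holding this process's own PID.
with tempfile.TemporaryDirectory() as tmp:
    demo_lock = os.path.join(tmp, "nagios.lock")
    with open(demo_lock, "w") as fh:
        fh.write(f"{os.getpid()}\n")
    demo_pid = read_lock_file_pid(demo_lock)
    demo_running = is_process_running(demo_pid)
```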

State Retention Option

Format: retain_state_information=<0/1>

Example: retain_state_information=1

This option determines whether or not Nagios will retain state information for hosts and services between program restarts. If you enable this option, you should supply a value for the state_retention_file variable. When enabled, Nagios will save all state information for hosts and services before it shuts down (or restarts) and will read in previously saved state information when it starts up again.

0 : Don’t retain state information

1 : Retain state information (default)

State Retention File

Format: state_retention_file=<file_name>

Example: state_retention_file=/usr/local/nagios/var/retention.dat

This is the file that Nagios will use for storing status, downtime, and comment information before it shuts down. When Nagios is restarted it will use the information stored in this file for setting the initial states of services and hosts before it starts monitoring anything.

Automatic State Retention Update Interval

Format: retention_update_interval=<minutes>

Example: retention_update_interval=60

This setting determines how often (in minutes) that Nagios will automatically save retention data during normal operation. If you set this value to 0, Nagios will not save retention data at regular intervals, but it will still save retention data before shutting down or restarting.

Use Retained Program State Option

Format: use_retained_program_state=<0/1>

Example: use_retained_program_state=1

This setting determines whether or not Nagios will set various program-wide state variables based on the values saved in the retention file. Some of these program-wide state variables that are normally saved across program restarts if state retention is enabled include the enable_notifications, enable_flap_detection, enable_event_handlers, execute_service_checks, and accept_passive_service_checks options.

0 : Don’t use retained program state

1 : Use retained program state (default)

Use Retained Scheduling Info Option

Format: use_retained_scheduling_info=<0/1>

Example: use_retained_scheduling_info=1

This setting determines whether or not Nagios will retain scheduling info (next check times) for hosts and services when it restarts. If you are adding a large number (or percentage) of hosts and services, I would recommend disabling this option when you first restart Nagios, as it can adversely skew the spread of initial checks. Otherwise you will probably want to leave it enabled.

0 : Don’t use retained scheduling info

1 : Use retained scheduling info (default)

Syslog Logging Option

Format: use_syslog=<0/1>

Example: use_syslog=1

This variable determines whether messages are logged to the syslog facility on your local host. Values are as follows:

0 : Don’t use syslog facility

1 : Use syslog facility

Notification Logging Option

Format: log_notifications=<0/1>

Example: log_notifications=1

This variable determines whether or not notification messages are logged. If you have a lot of contacts or regular service failures, your log file will grow relatively quickly. Use this option to keep contact notifications from being logged.

0 : Don’t log notifications

1 : Log notifications

Service Check Retry Logging Option

Format: log_service_retries=<0/1>

Example: log_service_retries=1

This variable determines whether or not service check retries are logged. Service check retries occur when a service check results in a non-OK state, but you have configured Nagios to retry the service more than once before responding to the error. Services in this situation are considered to be in "soft" states.

0 : Don’t log service check retries

1 : Log service check retries

Host Check Retry Logging Option

Format: log_host_retries=<0/1>

Example: log_host_retries=1

This variable determines whether or not host check retries are logged.

0 : Don’t log host check retries

1 : Log host check retries

Event Handler Logging Option

Format: log_event_handlers=<0/1>

Example: log_event_handlers=1

This variable determines whether or not service and host event handlers are logged. Event handlers are optional commands that can be run whenever a service or host changes state.

0 : Don’t log event handlers

1 : Log event handlers

Initial States Logging Option

Format: log_initial_states=<0/1>

Example: log_initial_states=1

This variable determines whether or not Nagios will force all initial host and service states to be logged, even if they result in an OK state. Initial service and host states are normally only logged when there is a problem on the first check.

0 : Don’t log initial states (default)

1 : Log initial states

External Command Logging Option

Format: log_external_commands=<0/1>

Example: log_external_commands=1

This variable determines whether or not Nagios will log external commands that it receives from the external command file.

0 : Don’t log external commands

1 : Log external commands (default)

Passive Check Logging Option

Format: log_passive_checks=<0/1>

Example: log_passive_checks=1

This variable determines whether or not Nagios will log passive host and service checks that it receives from the external command file. If you are setting up a distributed monitoring environment or plan on handling a large number of passive checks on a regular basis, you may wish to disable this option so your log file doesn’t get too large.

0 : Don’t log passive checks

1 : Log passive checks (default)

Global Host Event Handler Option

Format: global_host_event_handler=<command>

Example: global_host_event_handler=log-host-event-to-db

This option allows you to specify a host event handler command that is to be run for every host state change. The global event handler is executed immediately prior to the event handler that you have optionally specified in each host definition.

Global Service Event Handler Option

Format: global_service_event_handler=<command>

Example: global_service_event_handler=log-service-event-to-db

This option allows you to specify a service event handler command that is to be run for every service state change. The global event handler is executed immediately prior to the event handler that you have optionally specified in each service definition.

Inter-Check Sleep Time

Format: sleep_time=<seconds>

Example: sleep_time=1

This is the number of seconds that Nagios will sleep before checking to see if the next service or host check in the scheduling queue should be executed. Note that Nagios will only sleep after it "catches up" with queued service checks that have fallen behind.

Service Inter-Check Delay Method

Format: service_inter_check_delay_method=<n/d/s/x.xx>

Example: service_inter_check_delay_method=s

This option allows you to control how service checks are initially "spread out" in the event queue. Using a "smart" delay calculation (the default) will cause Nagios to calculate an average check interval and spread initial checks of all services out over that interval, thereby helping to eliminate CPU load spikes. Using no delay is generally not recommended, as it will cause all service checks to be scheduled for execution at the same time. This means that you will generally have large CPU spikes when the services are all executed in parallel.

n : Don’t use any delay - schedule all service checks to run immediately (i.e. at the same time!)

d : Use a "dumb" delay of 1 second between service checks

s : Use a "smart" delay calculation to spread service checks out evenly (default)

x.xx : Use a user-supplied inter-check delay of x.xx seconds
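The four delay methods can be compared with a rough calculation. This is a simplified sketch, not the actual Nagios scheduler: it assumes the "smart" delay is the average check interval divided by the number of services, and that the result is capped so all initial checks fit inside the maximum check spread. The function name and parameters are hypothetical.

```python
def inter_check_delay(method, check_intervals_seconds, max_spread_minutes=30):
    """Return the delay in seconds between consecutive initial service checks.

    method is 'n', 'd', 's', or a number given as a string such as '0.25'.
    check_intervals_seconds holds one check interval (in seconds) per service.
    """
    total = len(check_intervals_seconds)
    if method == "n":
        return 0.0  # no delay: everything is scheduled at once
    if method == "d":
        return 1.0  # "dumb" fixed delay of one second
    if method == "s":
        if total == 0:
            return 0.0
        average_interval = sum(check_intervals_seconds) / total
        delay = average_interval / total
        # Assumed cap: keep all initial checks inside the maximum spread.
        max_delay = max_spread_minutes * 60 / total
        return min(delay, max_delay)
    return float(method)  # user-supplied delay of x.xx seconds


small_site = inter_check_delay("s", [300] * 10)
capped_site = inter_check_delay("s", [36000] * 100)
fixed = inter_check_delay("0.25", [300] * 10)
```

With ten services checked every 300 seconds, the sketch spaces initial checks 30 seconds apart. In the second case the uncapped delay would be 360 seconds, so the 30-minute spread limit reduces it to 18 seconds.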

Maximum Service Check Spread

Format: max_service_check_spread=<minutes>

Example: max_service_check_spread=30

This option determines the maximum number of minutes from when Nagios starts that all services (that are scheduled to be regularly checked) are checked. Default value is 30 (minutes).

Service Interleave Factor

Format: service_interleave_factor=<s|x>

Example: service_interleave_factor=s

This variable determines how service checks are interleaved. Interleaving allows for a more even distribution of service checks, reduced load on remote hosts, and faster overall detection of host problems. Setting this value to 1 is equivalent to not interleaving the service checks (this is how versions of Nagios previous to 0.0.5 worked). Set this value to s (smart) for automatic calculation of the interleave factor unless you have a specific reason to change it.

x : A number greater than or equal to 1 that specifies the interleave factor to use. An interleave factor of 1 is equivalent to not interleaving the service checks.

s : Use a "smart" interleave factor calculation (default)

Maximum Concurrent Service Checks

Format: max_concurrent_checks=<max_checks>

Example: max_concurrent_checks=20

This option allows you to specify the maximum number of service checks that can be run in parallel at any given time. Specifying a value of 1 for this variable essentially prevents any service checks from being run in parallel. Specifying a value of 0 (the default) does not place any restrictions on the number of concurrent checks. You’ll have to modify this value based on the system resources you have available on the machine that runs Nagios, as it directly affects the maximum load that will be imposed on the system (processor utilization, memory, etc.).

Check Result Reaper Frequency

Format: check_result_reaper_frequency=<frequency_in_seconds>

Example: check_result_reaper_frequency=5

This option allows you to control the frequency in seconds of check result "reaper" events. "Reaper" events process the results from host and service checks that have finished executing. These events constitute the core of the monitoring logic in Nagios.

Maximum Check Result Reaper Time

Format: max_check_result_reaper_time=<seconds>

Example: max_check_result_reaper_time=30

This option allows you to control the maximum amount of time in seconds that host and service check result "reaper" events are allowed to run. "Reaper" events process the results from host and service checks that have finished executing. If there are a lot of results to process, reaper events may take a long time to finish, which might delay timely execution of new host and service checks. This variable allows you to limit the amount of time that an individual reaper event will run before it hands control back over to Nagios for other portions of the monitoring logic.
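The idea of a time-limited reaper event can be sketched as a loop that stops once its time budget is spent, leaving the remaining results for the next run. This is an illustration only: the function name, the queue of results, and the injectable clock are all hypothetical and do not reflect Nagios internals.

```python
import time
from collections import deque


def reap_check_results(pending, handle_result, max_reaper_time, clock=time.monotonic):
    """Process queued results until the queue is empty or the time budget is spent.

    Returns the number of results handled; unprocessed results stay in `pending`.
    """
    deadline = clock() + max_reaper_time
    processed = 0
    while pending and clock() < deadline:
        handle_result(pending.popleft())
        processed += 1
    return processed


# Demo with a fake clock that advances 10 seconds on every reading.
pending = deque(f"result-{i}" for i in range(1, 6))
handled = []
ticks = iter(range(0, 1000, 10))
processed = reap_check_results(
    pending, handled.append, max_reaper_time=30, clock=lambda: next(ticks)
)
```

In the demo, two results are handled before the 30-second budget runs out, and the other three remain queued for the next reaper event.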

Check Result Path

Format: check_result_path=<path>

Example: check_result_path=/var/spool/nagios/checkresults

This option determines which directory Nagios will use to temporarily store host and service check results before they are processed. This directory should not be used to store any other files, as Nagios will periodically clean this directory of old files.

Max Check Result File Age

Format: max_check_result_file_age=<seconds>

Example: max_check_result_file_age=3600

This option determines the maximum age in seconds that Nagios will consider check result files found in the check_result_path directory to be valid. Check result files that are older than this threshold will be deleted by Nagios and the check results they contain will not be processed. By using a value of zero (0) with this option, Nagios will process all check result files, even if they’re older than your hardware.
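The age rule, including the special meaning of zero, can be expressed as a small filter. The function name, the file names, and the timestamps below are hypothetical; the sketch works on a mapping of paths to modification times and does not touch the file system.

```python
def split_check_result_files(file_mtimes, now, max_age):
    """Split result files into (valid, expired) lists of paths.

    file_mtimes maps each path to its modification time in seconds.
    A max_age of 0 disables the age check, so every file is valid.
    """
    if max_age == 0:
        return sorted(file_mtimes), []
    valid, expired = [], []
    for path, mtime in sorted(file_mtimes.items()):
        if now - mtime > max_age:
            expired.append(path)
        else:
            valid.append(path)
    return valid, expired


now = 10_000
files = {"checkA": 9_000, "checkB": 5_000}  # ages: 1000 s and 5000 s
valid, expired = split_check_result_files(files, now, max_age=3600)
all_valid, none_expired = split_check_result_files(files, now, max_age=0)
```

With a limit of 3600 seconds, only the 1000-second-old file is kept; with a limit of 0, both files are processed.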

Host Inter-Check Delay Method

Format: host_inter_check_delay_method=<n/d/s/x.xx>

Example: host_inter_check_delay_method=s

This option allows you to control how host checks that are scheduled to be checked on a regular basis are initially "spread out" in the event queue. Using a "smart" delay calculation (the default) will cause Nagios to calculate an average check interval and spread initial checks of all hosts out over that interval, thereby helping to eliminate CPU load spikes.

n : Don’t use any delay - schedule all host checks to run immediately (i.e. at the same time!)

d : Use a "dumb" delay of 1 second between host checks

s : Use a "smart" delay calculation to spread host checks out evenly (default)

x.xx : Use a user-supplied inter-check delay of x.xx seconds

Maximum Host Check Spread

Format: max_host_check_spread=<minutes>

Example: max_host_check_spread=30

This option determines the maximum number of minutes from when Nagios starts that all hosts (that are scheduled to be regularly checked) are checked. This option will automatically adjust the host inter-check delay method (if necessary) to ensure that the initial checks of all hosts occur within the timeframe you specify. In general, this option will not have an effect on host check scheduling if scheduling information is being retained using the use_retained_scheduling_info option. Default value is 30 (minutes).

Timing Interval Length

Format: interval_length=<seconds>

Example: interval_length=60

This is the number of seconds per "unit interval" used for timing in the scheduling queue, re-notifications, etc. "Unit intervals" are used in the object configuration file to determine how often to run a service check, how often to re-notify a contact, etc.

Important: The default value for this is set to 60, which means that a "unit value" of 1 in the object configuration file will mean 60 seconds (1 minute).
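As a worked example of the unit conversion, the sketch below multiplies a count of unit intervals by interval_length. The settings of 5 and 30 units are made-up values for illustration, and the function name is hypothetical.

```python
def units_to_seconds(units, interval_length=60):
    """Convert a count of unit intervals into seconds."""
    return units * interval_length


# Hypothetical object settings expressed in unit intervals.
check_every = 5      # run a service check every 5 units
renotify_every = 30  # re-notify a contact every 30 units

default_check_seconds = units_to_seconds(check_every)
default_renotify_seconds = units_to_seconds(renotify_every)
fast_check_seconds = units_to_seconds(check_every, interval_length=1)
```

With the default of 60, 5 units is 5 minutes and 30 units is half an hour. Lowering interval_length to 1 turns the same 5 units into 5 seconds, which is why changing this value affects every interval in the object configuration.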

Auto-Rescheduling Option

Format: auto_reschedule_checks:<0/1>

Example: auto_reschedule_checks:1

This option determines whether or not Nagios will attempt to automatically reschedule active host and service checks to "smooth" them out over time. This can help to balance the load on the monitoring server, as it will attempt to keep the time between consecutive checks consistent, at the expense of executing checks on a more rigid schedule.

Auto-Rescheduling Interval

Format: auto_rescheduling_interval:<seconds>

Example: auto_rescheduling_interval:30

This option determines how often (in seconds) Nagios will attempt to automatically reschedule checks. This option only has an effect if the auto_reschedule_checks option is enabled. Default is 30 seconds.

Auto-Rescheduling Window

Format: auto_rescheduling_window:<seconds>

Example: auto_rescheduling_window:180

This option determines the "window" of time (in seconds) that Nagios will look at when automatically rescheduling checks. Only host and service checks that occur in the next X seconds (determined by this variable) will be rescheduled. This option only has an effect if the auto_reschedule_checks option is enabled. Default is 180 seconds (3 minutes).

Aggressive Host Checking Option

Format: use_aggressive_host_checking:<0/1>

Example: use_aggressive_host_checking:0

Nagios tries to be smart about how and when it checks the status of hosts. In general, disabling this option will allow Nagios to make some smarter decisions and check hosts a bit faster. Enabling this option will increase the amount of time required to check hosts, but may improve reliability a bit. Unless you have problems with Nagios not recognizing that a host recovered, I would suggest not enabling this option.

0 : Don’t use aggressive host checking (default)

1 : Use aggressive host checking

Translate Passive Host Checks Option

Format: translate_passive_host_checks:<0/1>

Example: translate_passive_host_checks:1

This option determines whether or not Nagios will translate DOWN/UNREACHABLE passive host check results to their "correct" state from the viewpoint of the local Nagios instance.

0 : Disable check translation (default)

1 : Enable check translation

Passive Host Checks Are SOFT Option

Format: passive_host_checks_are_soft:<0/1>

Example: passive_host_checks_are_soft:1

This option determines whether or not Nagios will treat passive host checks as HARD states or SOFT states. By default, a passive host check result will put a host into a HARD state type. You can change this behavior by enabling this option.

0 : Passive host checks are HARD (default)

1 : Passive host checks are SOFT

Predictive Host Dependency Checks Option

Format: enable_predictive_host_dependency_checks:<0/1>

Example: enable_predictive_host_dependency_checks:1

This option determines whether or not Nagios will execute predictive checks of hosts that are being depended upon for a particular host when it changes state. Predictive checks help ensure that the dependency logic is as accurate as possible.

0 : Disable predictive checks

1 : Enable predictive checks (default)

Predictive Service Dependency Checks Option

Format: enable_predictive_service_dependency_checks:<0/1>

Example: enable_predictive_service_dependency_checks:1

This option determines whether or not Nagios will execute predictive checks of services that are being depended upon for a particular service when it changes state.

0 : Disable predictive checks

1 : Enable predictive checks (default)

Cached Host Check Horizon

Format: cached_host_check_horizon:<seconds>

Example: cached_host_check_horizon:15

This option determines the maximum amount of time (in seconds) that the state of a previous host check is considered current. Cached host states (from host checks that were performed more recently than the time specified by this value) can improve host check performance immensely. Too high of a value for this option may result in (temporarily) inaccurate host states, while a low value may result in a performance hit for host checks. Use a value of 0 if you want to disable host check caching.

Cached Service Check Horizon

Format: cached_service_check_horizon:<seconds>

Example: cached_service_check_horizon:15

This option determines the maximum amount of time (in seconds) that the state of a previous service check is considered current. Cached service states (from service checks that were performed more recently than the time specified by this value) can improve service check performance when a lot of service dependencies are used. Too high of a value for this option may result in inaccuracies in the service dependency logic. Use a value of 0 if you want to disable service check caching.
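
The horizon test for both options boils down to comparing the age of the last check result with the configured value. A minimal sketch, assuming a hypothetical function name and Unix timestamps; the exact comparison Nagios performs may differ:

```python
import time


def can_use_cached_state(last_check_time, horizon_s, now=None):
    """Return True if a previous check result is recent enough to reuse."""
    if horizon_s <= 0:
        return False  # a value of 0 disables check caching
    if now is None:
        now = time.time()
    # The cached state is usable only while it is younger than the horizon.
    return (now - last_check_time) <= horizon_s


print(can_use_cached_state(last_check_time=100, horizon_s=15, now=110))
print(can_use_cached_state(last_check_time=100, horizon_s=15, now=120))
```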

Large Installation Tweaks Option

Format: use_large_installation_tweaks:<0/1>

Example: use_large_installation_tweaks:0

This option determines whether or not the Nagios daemon will take several shortcuts to improve performance. These shortcuts result in the loss of a few features, but larger installations will likely see a lot of benefit from doing so.

0 : Don’t use tweaks (default)

1 : Use tweaks

Child Process Memory Option

Format: free_child_process_memory:<0/1>

Example: free_child_process_memory:0

This option determines whether or not Nagios will free memory in child processes when they are forked off from the main process. By default, Nagios frees memory. However, if the use_large_installation_tweaks option is enabled, it will not. By defining this option in your configuration file, you are able to override things to get the behavior you want.

0 : Don’t free memory

1 : Free memory

Environment Macros Option

Format: enable_environment_macros:<0/1>

Example: enable_environment_macros:0

This option determines whether or not the Nagios daemon will make all standard macros available as environment variables to your check, notification, event handler, etc. commands. In large Nagios installations this can be problematic because it takes additional memory and (more importantly) CPU to compute the values of all macros and make them available to the environment.

0 : Don’t make macros available as environment variables

1 : Make macros available as environment variables (default)

Flap Detection Option

Format: enable_flap_detection:<0/1>

Example: enable_flap_detection:0

This option determines whether or not Nagios will try and detect hosts and services that are "flapping". Flapping occurs when a host or service changes between states too frequently, resulting in a barrage of notifications being sent out. When Nagios detects that a host or service is flapping, it will temporarily suppress notifications for that host/service until it stops flapping.

0 : Don’t enable flap detection (default)

1 : Enable flap detection

Low Service Flap Threshold

Format: low_service_flap_threshold:<percent>

Example: low_service_flap_threshold:25.0

This option is used to set the low threshold for detection of service flapping.

High Service Flap Threshold

Format: high_service_flap_threshold:<percent>

Example: high_service_flap_threshold:50.0

This option is used to set the high threshold for detection of service flapping.

Low Host Flap Threshold

Format: low_host_flap_threshold:<percent>

Example: low_host_flap_threshold:25.0

This option is used to set the low threshold for detection of host flapping.

High Host Flap Threshold

Format: high_host_flap_threshold:<percent>

Example: high_host_flap_threshold:50.0

This option is used to set the high threshold for detection of host flapping.
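
The low and high thresholds work together as a hysteresis band. The sketch below assumes that flapping starts once the percent state change rises above the high threshold and stops once it falls below the low threshold; the function name is hypothetical, and the way Nagios computes the percent state change itself is not shown here.

```python
def update_flapping(is_flapping, percent_change, low=25.0, high=50.0):
    """Apply the low/high thresholds as a simple hysteresis band."""
    if not is_flapping and percent_change > high:
        return True       # crossed the high threshold: start flapping
    if is_flapping and percent_change < low:
        return False      # dropped below the low threshold: stop flapping
    return is_flapping    # inside the band: keep the current state


state = False
for change in (10.0, 60.0, 40.0, 20.0):
    state = update_flapping(state, change)
    print(change, state)
```

Using two thresholds instead of one keeps the flapping state from toggling on every small change around a single cut-off value.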

Soft State Dependencies Option

Format: soft_state_dependencies:<0/1>

Example: soft_state_dependencies:0

This option determines whether or not Nagios will use soft state information when checking host and service dependencies. Normally Nagios will only use the latest hard host or service state when checking dependencies. If you want it to use the latest state (regardless of whether it's a soft or hard state type), enable this option.

0 : Don’t use soft state dependencies (default)

1 : Use soft state dependencies

Service Check Timeout

Format: service_check_timeout:<seconds>

Example: service_check_timeout:60

This is the maximum number of seconds that Nagios will allow service checks to run. If checks exceed this limit, they are killed and a CRITICAL state is returned. A timeout error will also be logged.

Host Check Timeout

Format: host_check_timeout:<seconds>

Example: host_check_timeout:60

This is the maximum number of seconds that Nagios will allow host checks to run. If checks exceed this limit, they are killed, a CRITICAL state is returned, and the host is assumed to be DOWN. A timeout error will also be logged.
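
To show what "killed and a CRITICAL state is returned" means in practice, here is a minimal sketch of a timeout wrapper around a check command. It is not how Nagios itself runs checks; the function name and the message text are hypothetical, and exit code 2 is used because plugins conventionally report CRITICAL that way.

```python
import subprocess
import sys

CRITICAL = 2  # conventional plugin exit code for CRITICAL


def run_check(argv, timeout_s=60):
    """Run a check command and return (state, output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return CRITICAL, f"(check timed out after {timeout_s} seconds)"
    return proc.returncode, proc.stdout.strip()


# A deliberately slow command that exceeds a short timeout.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_check(slow, timeout_s=0.5))
```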

Event Handler Timeout

Format: event_handler_timeout:<seconds>

Example: event_handler_timeout:60

This is the maximum number of seconds that Nagios will allow event handlers to be run. If an event handler exceeds this time limit it will be killed and a warning will be logged.

Notification Timeout

Format: notification_timeout:<seconds>

Example: notification_timeout:60

This is the maximum number of seconds that Nagios will allow notification commands to be run. If a notification command exceeds this time limit it will be killed and a warning will be logged.

Obsessive Compulsive Service Processor Timeout

Format: ocsp_timeout:<seconds>

Example: ocsp_timeout:5

This is the maximum number of seconds that Nagios will allow an obsessive compulsive service processor command to be run. If a command exceeds this time limit it will be killed and a warning will be logged.

Obsessive Compulsive Host Processor Timeout

Format: ochp_timeout:<seconds>

Example: ochp_timeout:5

This is the maximum number of seconds that Nagios will allow an obsessive compulsive host processor command to be run. If a command exceeds this time limit it will be killed and a warning will be logged.

Performance Data Processor Command Timeout

Format: perfdata_timeout:<seconds>

Example: perfdata_timeout:5

This is the maximum number of seconds that Nagios will allow a host performance data processor command or service performance data processor command to be run. If a command exceeds this time limit it will be killed and a warning will be logged.

Obsess Over Services Option

Format: obsess_over_services:<0/1>

Example: obsess_over_services:1

This value determines whether or not Nagios will "obsess" over service check results and run the obsessive compulsive service processor command you define.

0 : Don’t obsess over services (default)

1 : Obsess over services

Obsessive Compulsive Service Processor Command

Format: ocsp_command:<command>

Example: ocsp_command:obsessive_service_handler

This option allows you to specify a command to be run after every service check.

Obsess Over Hosts Option

Format: obsess_over_hosts:<0/1>

Example: obsess_over_hosts:1

This value determines whether or not Nagios will "obsess" over host check results and run the obsessive compulsive host processor command you define.

0 : Don’t obsess over hosts (default)

1 : Obsess over hosts

Obsessive Compulsive Host Processor Command

Format: ochp_command:<command>

Example: ochp_command:obsessive_host_handler

This option allows you to specify a command to be run after every host check.

Performance Data Processing Option

Format: process_performance_data:<0/1>

Example: process_performance_data:1

This value determines whether or not Nagios will process host and service check performance data.

0 : Don’t process performance data (default)

1 : Process performance data

Host Performance Data Processing Command

Format: host_perfdata_command:<command>

Example: host_perfdata_command:process-host-perfdata

This option allows you to specify a command to be run after every host check to process host performance data that may be returned from the check. The command argument is the short name of a command definition that you define in your object configuration file. This command is only executed if the process_performance_data option is enabled globally and if the process_perf_data directive in the host definition is enabled.

Service Performance Data Processing Command

Format: service_perfdata_command:<command>

Example: service_perfdata_command:process-service-perfdata

This option allows you to specify a command to be run after every service check to process service performance data that may be returned from the check. The command argument is the short name of a command definition that you define in your object configuration file. This command is only executed if the process_performance_data option is enabled globally and if the process_perf_data directive in the service definition is enabled.

Service Performance Data File

Format: service_perfdata_file:<file_name>

Example: service_perfdata_file:/usr/local/nagios/var/service-perfdata.dat

This option allows you to specify a file to which service performance data will be written after every service check. Data will be written to the performance file as specified by the service_perfdata_file_template option. Performance data is only written to this file if the process_performance_data option is enabled globally and if the process_perf_data directive in the service definition is enabled.

Host Performance Data File Mode

Format: host_perfdata_file_mode:<mode>

Example: host_perfdata_file_mode:a

This option determines how the host performance data file is opened. Unless the file is a named pipe you’ll probably want to use the default mode of append.

a : Open file in append mode (default)

w : Open file in write mode

p : Open in non-blocking read/write mode (useful when writing to pipes)

Service Performance Data File Mode

Format: service_perfdata_file_mode:<mode>

Example: service_perfdata_file_mode:a

This option determines how the service performance data file is opened. Unless the file is a named pipe you’ll probably want to use the default mode of append.

a : Open file in append mode (default)

w : Open file in write mode

p : Open in non-blocking read/write mode (useful when writing to pipes)

Host Performance Data File Processing Interval

Format: host_perfdata_file_processing_interval:<seconds>

Example: host_perfdata_file_processing_interval:0

This option allows you to specify the interval (in seconds) at which the host performance data file is processed using the host performance data file processing command. A value of 0 indicates that the performance data file should not be processed at regular intervals.

Service Performance Data File Processing Interval

Format: service_perfdata_file_processing_interval:<seconds>

Example: service_perfdata_file_processing_interval:0

This option allows you to specify the interval (in seconds) at which the service performance data file is processed using the service performance data file processing command. A value of 0 indicates that the performance data file should not be processed at regular intervals.

Host Performance Data File Processing Command

Format: host_perfdata_file_processing_command:<command>

Example: host_perfdata_file_processing_command:process-host-perfdata-file

This option allows you to specify the command that should be executed to process the host performance data file.

Service Performance Data File Processing Command

Format: service_perfdata_file_processing_command:<command>

Example: service_perfdata_file_processing_command:process-service-perfdata-file

This option allows you to specify the command that should be executed to process the service performance data file.

Orphaned Service Check Option

Format: check_for_orphaned_services:<0/1>

Example: check_for_orphaned_services:1

This option allows you to enable or disable checks for orphaned service checks. Orphaned service checks are checks which have been executed and have been removed from the event queue, but have not had any results reported in a long time. Since no results have come back in for the service, it is not rescheduled in the event queue. This can cause service checks to stop being executed. Normally it is very rare for this to happen.

0 : Don’t check for orphaned service checks

1 : Check for orphaned service checks (default)

Orphaned Host Check Option

Format: check_for_orphaned_hosts:<0/1>

Example: check_for_orphaned_hosts:1

This option allows you to enable or disable checks for orphaned host checks. Orphaned host checks are checks which have been executed and have been removed from the event queue, but have not had any results reported in a long time. Since no results have come back in for the host, it is not rescheduled in the event queue. This can cause host checks to stop being executed. Normally it is very rare for this to happen.

0 : Don’t check for orphaned host checks

1 : Check for orphaned host checks (default)

Service Freshness Checking Option

Format: check_service_freshness:<0/1>

Example: check_service_freshness:0

This option determines whether or not Nagios will periodically check the "freshness" of service checks. Enabling this option is useful for helping to ensure that passive service checks are received in a timely manner.

0 : Don’t check service freshness

1 : Check service freshness (default)

Service Freshness Check Interval

Format: service_freshness_check_interval:<seconds>

Example: service_freshness_check_interval:60

This setting determines how often (in seconds) Nagios will periodically check the "freshness" of service check results. If you have disabled service freshness checking this option has no effect.

Host Freshness Checking Option

Format: check_host_freshness:<0/1>

Example: check_host_freshness:0

This option determines whether or not Nagios will periodically check the "freshness" of host checks. Enabling this option is useful for helping to ensure that passive host checks are received in a timely manner.

0 : Don’t check host freshness

1 : Check host freshness (default)

Host Freshness Check Interval

Format: host_freshness_check_interval:<seconds>

Example: host_freshness_check_interval:60

This setting determines how often (in seconds) Nagios will periodically check the "freshness" of host check results. If you have disabled host freshness checking (with the check_host_freshness option), this option has no effect.

Additional Freshness Threshold Latency Option

Format: additional_freshness_latency:<#>

Example: additional_freshness_latency:15

This option determines the number of seconds Nagios will add to any host or service freshness threshold it automatically calculates (i.e. those not specified explicitly by the user).
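
A freshness test can be pictured as an age comparison in which the additional latency is added only to thresholds that were calculated automatically. This is a minimal sketch with hypothetical names, not the actual Nagios logic:

```python
def result_is_stale(last_result_time, threshold_s, now,
                    auto_calculated=False, additional_latency_s=15):
    """Return True if the last check result is older than its threshold."""
    if auto_calculated:
        # Extra slack applies only to thresholds the user did not set.
        threshold_s += additional_latency_s
    return (now - last_result_time) > threshold_s


# A result that is 70 seconds old, checked against a 60 second threshold.
print(result_is_stale(0, 60, now=70))
print(result_is_stale(0, 60, now=70, auto_calculated=True))
```

In the second call the effective threshold is 75 seconds, so the same result still counts as fresh.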

Embedded Perl Interpreter Option

Format: enable_embedded_perl:<0/1>

Example: enable_embedded_perl:1

This setting determines whether or not the embedded Perl interpreter is enabled on a program-wide basis. Nagios must be compiled with support for embedded Perl for this option to have an effect.

Embedded Perl Implicit Use Option

Format: use_embedded_perl_implicitly:<0/1>

Example: use_embedded_perl_implicitly:1

This setting determines whether or not the embedded Perl interpreter should be used for Perl plugins/scripts that do not explicitly enable/disable it. Nagios must be compiled with support for embedded Perl for this option to have an effect.

Date Format

Format: date_format:<option>

Example: date_format:us

This option allows you to specify what kind of date/time format Nagios should use in the web interface and date/time macros. Possible options (along with example output) include:

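==============  ===================  ===================
Option          Output Format        Sample Output
==============  ===================  ===================
us              MM/DD/YYYY HH:MM:SS  06/30/2011 03:15:00
euro            DD/MM/YYYY HH:MM:SS  30/06/2011 03:15:00
iso8601         YYYY-MM-DD HH:MM:SS  2011-06-30 03:15:00
strict-iso8601  YYYY-MM-DDTHH:MM:SS  2011-06-30T03:15:00
==============  ===================  ===================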
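
For readers who post-process Nagios timestamps, the four options correspond to the following strftime patterns. The mapping and the helper name are my own illustration and are not part of Nagios:

```python
from datetime import datetime

# Hypothetical mapping from date_format option to a strftime pattern.
DATE_FORMATS = {
    "us": "%m/%d/%Y %H:%M:%S",
    "euro": "%d/%m/%Y %H:%M:%S",
    "iso8601": "%Y-%m-%d %H:%M:%S",
    "strict-iso8601": "%Y-%m-%dT%H:%M:%S",
}


def format_timestamp(when, option="us"):
    """Format a datetime the way the given date_format option would."""
    return when.strftime(DATE_FORMATS[option])


sample = datetime(2011, 6, 30, 3, 15, 0)
for option in DATE_FORMATS:
    print(option, format_timestamp(sample, option))
```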

Timezone Option

Format: use_timezone:<tz>

Example: use_timezone:US/Mountain

This option allows you to override the default timezone that this instance of Nagios runs in. Useful if you have multiple instances of Nagios that need to run from the same server, but have different local times associated with them. If not specified, Nagios will use the system configured timezone.

Note: If you use this option to specify a custom timezone, you will also need to alter the Apache configuration directives for the CGIs to specify the timezone you want. Example:

``<Directory "/usr/local/nagios/sbin/">``

``SetEnv TZ "US/Mountain"``

``...``

``</Directory>``

Illegal Object Name Characters

Format: illegal_object_name_chars:<chars...>

Example: illegal_object_name_chars:**`~!$%^&*"|'<>?,():**

This option allows you to specify illegal characters that cannot be used in host names, service descriptions, or names of other object types. Nagios will allow you to use most characters in object definitions, but I recommend not using the characters shown in the example above. Doing so may give you problems in the web interface, notification commands, etc.
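
If you generate object names from an inventory, a quick pre-check can catch problem characters before Nagios sees them. This is a minimal sketch: the helper name is hypothetical, and the character set is copied from the example above on the assumption that its two quote characters stand for a backtick and an apostrophe.

```python
# Assumed character set, taken from the example above.
ILLEGAL_OBJECT_NAME_CHARS = "`~!$%^&*\"|'<>?,():"


def find_illegal_chars(name, illegal=ILLEGAL_OBJECT_NAME_CHARS):
    """Return the illegal characters found in an object name, in order."""
    return [ch for ch in name if ch in illegal]


print(find_illegal_chars("web-server-01"))
print(find_illegal_chars("db(primary)!"))
```

An empty list means the name is safe to use with that character set.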

Regular Expression Matching Option

Format: use_regexp_matching:<0/1>

Example: use_regexp_matching:0

This option determines whether or not various directives in your object definitions will be processed as regular expressions.

0 : Don’t use regular expression matching (default)

1 : Use regular expression matching

True Regular Expression Matching Option

Format: use_true_regexp_matching:<0/1>

Example: use_true_regexp_matching:0

If you've enabled regular expression matching of various object directives using the use_regexp_matching option, this option will determine when object directives are treated as regular expressions. If this option is disabled (the default), directives will only be treated as regular expressions if they contain ``*``, ``?``, ``+``, or ``\.``. If this option is enabled, all appropriate directives will be treated as regular expressions, so be careful when enabling this!

0 : Don’t use true regular expression matching (default)

1 : Use true regular expression matching
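
The interaction between the two regular expression options can be summarized as a small decision function. This is a minimal sketch with a hypothetical function name; it only models the rule described above, not the matching itself.

```python
def is_treated_as_regexp(value, use_regexp_matching, use_true_regexp_matching):
    """Decide whether a directive value is handled as a regular expression."""
    if not use_regexp_matching:
        return False  # regular expression matching is switched off entirely
    if use_true_regexp_matching:
        return True   # every appropriate directive is treated as a regexp
    # Otherwise only values containing *, ?, +, or \. qualify.
    return any(token in value for token in ("*", "?", "+", "\\."))


print(is_treated_as_regexp("web*", True, False))
print(is_treated_as_regexp("web01", True, False))
print(is_treated_as_regexp("web01", True, True))
```

The last call shows why enabling true matching needs care: even a plain name such as web01 is then interpreted as a pattern.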

Administrator Email Address

Format: admin_email:<email_address>

Example: admin_email:root@localhost.localdomain

This is the email address for the administrator of the local machine (i.e. the one that Nagios is running on). This value can be used in notification commands by using the $ADMINEMAIL$ macro.

Administrator Pager

Format: admin_pager:<pager_number_or_pager_email_gateway>

Example: admin_pager:pageroot@localhost.localdomain

This is the pager number (or pager email gateway) for the administrator of the local machine (i.e. the one that Nagios is running on). The pager number/address can be used in notification commands by using the $ADMINPAGER$ macro.

:::::::::::::::::::::::::::::::::::::::::::::::::::

CGI Configuration File Options

::::::::::::::::::::::::::::::::::::::::::::::::::::

Notes

When creating and/or editing configuration files, keep the following in mind:

1. Lines that start with a ’#’ character are taken to be comments and are not processed

2. Variable names must begin at the start of the line - no white space is allowed before the name

3. Variable names are case-sensitive
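
The three rules above can be sketched as a tiny parser. It assumes that directives are written as name=value, as in the sample cgi.cfg that ships with Nagios, and it simply skips lines that break the rules; the function name is hypothetical and the real Nagios parser may react differently to malformed lines.

```python
def parse_config(text):
    """Parse name=value lines into a list of (name, value) pairs."""
    options = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments are not processed
        if line[0].isspace() or "=" not in line:
            continue  # names must start in column one; skip malformed lines
        name, value = line.split("=", 1)
        options.append((name, value.strip()))  # names keep their case
    return options


sample = """# sample cgi.cfg fragment
use_authentication=1
  refresh_rate=90
Refresh_Rate=30
"""
print(parse_config(sample))
```

The indented refresh_rate line is ignored, and Refresh_Rate is kept as a different name because variable names are case-sensitive.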

Sample Configuration

Tip: A sample CGI configuration file (/usr/local/nagios/etc/cgi.cfg) is installed for you.

Config File Location

By default, Nagios expects the CGI configuration file to be named cgi.cfg and located in the config file directory along with the main config file (/usr/local/nagios/etc/cgi.cfg).

Configuration File Variables

Below you will find descriptions of each CGI configuration file option.

Main Configuration File Location

Format: main_config_file:<file_name>

Example: main_config_file:/usr/local/nagios/etc/nagios.cfg

This specifies the location of your main configuration file. The CGIs need to know where to find this file in order to get configuration information, current host and service status, etc.

Physical HTML Path

Format: physical_html_path:<path>

Example: physical_html_path:/usr/local/nagios/share

This is the physical path where the HTML files for Nagios are kept on your workstation or server. Nagios assumes that the documentation and image files (used by the CGIs) are stored in subdirectories called docs/ and images/, respectively.

URL HTML Path

Format: url_html_path:<path>

Example: url_html_path:/nagios

If, when accessing Nagios via a web browser, you point to a URL like http://www.myhost.com/nagios, this value should be /nagios. Basically, it's the path portion of the URL that is used to access the Nagios HTML pages.

Authentication Usage

Format: use_authentication:<0/1>

Example: use_authentication:1

This option controls whether or not the CGIs will use the authentication and authorization functionality when determining what information and commands users have access to. I would strongly suggest that you use the authentication functionality for the CGIs. If you decide not to use authentication, make sure to remove the command CGI to prevent unauthorized users from issuing commands to Nagios. The CGI will not issue commands to Nagios if authentication is disabled, but I would suggest removing it altogether just to be on the safe side.

0 : Don’t use authentication functionality

1 : Use authentication and authorization functionality (default)

Default User Name

Format: default_user_name:<username>

Example: default_user_name:guest

Setting this variable will define a default username that can access the CGIs. This allows people within a secure domain (i.e., behind a firewall) to access the CGIs without necessarily having to authenticate to the web server. You may want to use this to avoid having to use basic authentication if you are not using a secure server, as basic authentication transmits passwords in clear text over the Internet.

Important

Do not define a default username unless you are running a secure web server and are sure that everyone who has access to the CGIs has been authenticated in some manner! If you define this variable, anyone who has not authenticated to the web server will inherit all rights you assign to this user.

System/Process Information Access

Format: authorized_for_system_information:<user1>,<user2>,<user3>,...<usern>

Example: authorized_for_system_information:nagiosadmin,theboss

This is a comma-delimited list of names of authenticated users who can view system/process information in the extended information CGI. Users in this list are not automatically authorized to issue system/process commands. If you want users to be able to issue system/process commands as well, you must add them to the authorized_for_system_commands variable.

System/Process Command Access

Format: authorized_for_system_commands:<user1>,<user2>,<user3>,...<usern>

Example: authorized_for_system_commands:nagiosadmin

This is a comma-delimited list of names of authenticated users who can issue system/process commands via the command CGI. Users in this list are not automatically authorized to view system/process information. If you want users to be able to view system/process information as well, you must add them to the authorized_for_system_information variable.

Configuration Information Access

Format: authorized_for_configuration_information:<user1>,<user2>,<user3>,...<usern>

Example: authorized_for_configuration_information:nagiosadmin

This is a comma-delimited list of names of authenticated users who can view configuration information in the configuration CGI. Users in this list can view information on all configured hosts, host groups, services, contacts, contact groups, time periods, and commands.

Global Host Information Access

Format: authorized_for_all_hosts:<user1>,<user2>,<user3>,...<usern>

Example: authorized_for_all_hosts:nagiosadmin,theboss

This is a comma-delimited list of names of authenticated users who can view status and configuration information for all hosts. Users in this list are also automatically authorized to view information for all services. Users in this list are not automatically authorized to issue commands for all hosts or services. If you want users to be able to issue commands for all hosts and services as well, you must add them to the authorized_for_all_host_commands variable.

Global Host Command Access

Format: authorized_for_all_host_commands:<user1>,<user2>,<user3>,...<usern>

Example: authorized_for_all_host_commands:nagiosadmin

This is a comma-delimited list of names of authenticated users who can issue commands for all hosts via the command CGI. Users in this list are also automatically authorized to issue commands for all services. Users in this list are not automatically authorized to view status or configuration information for all hosts or services. If you want users to be able to view status and configuration information for all hosts and services as well, you must add them to the authorized_for_all_hosts variable.

Global Service Information Access

Format: authorized_for_all_services:<user1>,<user2>,<user3>,...<usern>

Example: authorized_for_all_services:nagiosadmin,theboss

This is a comma-delimited list of names of authenticated users who can view status and configuration information for all services. Users in this list are not automatically authorized to view information for all hosts. Users in this list are not automatically authorized to issue commands for all services. If you want users to be able to issue commands for all services as well, you must add them to the authorized_for_all_service_commands variable.

Global Service Command Access

Format: authorized_for_all_service_commands:<user1>,<user2>,<user3>,...<usern>

Example: authorized_for_all_service_commands:nagiosadmin

This is a comma-delimited list of names of authenticated users who can issue commands for all services via the command CGI. Users in this list are not automatically authorized to issue commands for all hosts. Users in this list are not automatically authorized to view status or configuration information for all hosts. If you want users to be able to view status and configuration information for all services as well, you must add them to the authorized_for_all_services variable.
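
The relationship between the host and service command lists is easy to get wrong, so here is a minimal sketch of the rule that host command access implies service command access, but not the other way around. The helper names and the user "operator" are hypothetical, and the options are passed in as a plain dictionary rather than read from cgi.cfg.

```python
def parse_user_list(value):
    """Split a comma-delimited list of user names into a set."""
    return {name.strip() for name in value.split(",") if name.strip()}


def may_issue_service_commands(user, cgi_options):
    """Return True if the user may issue commands for all services."""
    # Users authorized for all host commands are also authorized
    # to issue commands for all services.
    for key in ("authorized_for_all_service_commands",
                "authorized_for_all_host_commands"):
        if user in parse_user_list(cgi_options.get(key, "")):
            return True
    return False


options = {
    "authorized_for_all_hosts": "nagiosadmin,theboss",
    "authorized_for_all_host_commands": "nagiosadmin",
    "authorized_for_all_service_commands": "operator",
}
for user in ("nagiosadmin", "operator", "theboss"):
    print(user, may_issue_service_commands(user, options))
```

Here theboss can view all hosts but cannot issue commands, because viewing rights and command rights are granted separately.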

Read-Only Users

Format: authorized_for_read_only:<user1>,<user2>,<user3>,...<usern>

Example: authorized_for_read_only:john,mark

A comma-delimited list of usernames that have read-only rights in the CGIs. This will block any service or host commands normally shown on the extinfo CGI pages. It will also block comments from being shown to read-only users.

Lock Author Names

Format: lock_author_names:<0/1>

Example: lock_author_names:1

This option allows you to restrict users from changing the author name when submitting comments, acknowledgements, and scheduled downtime from the web interface. If this option is enabled, users will be unable to change the author name associated with the command request.

0 : Allow users to change author names when submitting commands

1 : Prevent users from changing author names (default)

Statusmap CGI Background Image

Format: statusmap_background_image:<image_file>

Example: statusmap_background_image:smbackground.gd2

This option allows you to specify an image to be used as a background in the statusmap CGI if you use the user-supplied coordinates layout method. The background image is not available in any other layout method. It is assumed that the image resides in the HTML images path (i.e. /usr/local/nagios/share/images). This path is automatically determined by appending "/images" to the path specified by the physical_html_path directive. Note: The image file can be in GIF, JPEG, PNG, or GD2 format. However, GD2 format (preferably in uncompressed format) is recommended, as it will reduce the CPU load when the CGI generates the map image.

Statusmap CGI Color Transparency Indexes

Format: color_transparency_index_r:<0-255>

color_transparency_index_g:<0-255>

color_transparency_index_b:<0-255>

Example: color_transparency_index_r:255

color_transparency_index_g:255

color_transparency_index_b:255

These options set the r,g,b values of the background color used by the statusmap CGI, so that browsers that can’t show real PNG transparency display the desired color as the background instead (to make it look pretty). Defaults to white: (R,G,B) = (255,255,255).

Default Statusmap Layout Method

Format: default_statusmap_layout:<layout_number>

Example: default_statusmap_layout:4

This option allows you to specify the default layout method used by the statusmap CGI. Valid options are:

<layout_number> Value Layout Method

0 : User-defined coordinates

1 : Depth layers

2 : Collapsed tree

3 : Balanced tree

4 : Circular

5 : Circular (Marked Up)

6 : Circular (Balloon)

Statuswrl CGI Include World

Format: statuswrl_include:<vrml_file>

Example: statuswrl_include:myworld.wrl

This option allows you to include your own objects in the generated VRML world. It is assumed that the file resides in the path specified by the physical_html_path directive. Note: This file must be a fully qualified VRML world (i.e. you can view it by itself in a VRML browser).

Default Statuswrl Layout Method

Format: default_statuswrl_layout:<layout_number>

Example: default_statuswrl_layout:4

This option allows you to specify the default layout method used by the statuswrl CGI. Valid options are:

<layout_number> Value Layout Method

0 : User-defined coordinates

2 : Collapsed tree

3 : Balanced tree

4 : Circular

CGI Refresh Rate

Format: refresh_rate:<rate_in_seconds>

Example: refresh_rate:90

This option allows you to specify the number of seconds between page refreshes for the status, statusmap, and extinfo CGIs.

Audio Alerts

Formats: host_unreachable_sound:<sound_file>

host_down_sound:<sound_file>

service_critical_sound:<sound_file>

service_warning_sound:<sound_file>

service_unknown_sound:<sound_file>

Examples: host_unreachable_sound:hostu.wav

host_down_sound:hostd.wav

service_critical_sound:critical.wav

service_warning_sound:warning.wav

service_unknown_sound:unknown.wav

These options allow you to specify an audio file that should be played in your browser if there are problems when you are viewing the status CGI. If there are problems, the audio file for the most critical type of problem will be played. The most critical type of problem is one or more unreachable hosts, while the least critical is one or more services in an unknown state (see the order in the example above). Audio files are assumed to be in the media/ subdirectory in your HTML directory (i.e. /usr/local/nagios/share/media).

Ping Syntax

Format: ping_syntax:<command>

Example: ping_syntax:/bin/ping -n -U -c 5 $HOSTADDRESS$

This option determines what syntax should be used when attempting to ping a host from the WAP interface (using the statuswml CGI). You must include the full path to the ping binary, along with all required options. The $HOSTADDRESS$ macro is substituted with the address of the host before the command is executed.

Escape HTML Tags Option

Format: escape_html_tags:[0/1]

Example: escape_html_tags:1

This option determines whether or not HTML tags in host and service (plugin) output are escaped in the CGIs. If you enable this option, your plugin output will not be able to contain clickable hyperlinks.

Notes URL Target

Format: notes_url_target:[target]

Example: notes_url_target:_blank

This option determines the name of the frame target that notes URLs should be displayed in. Valid options include _blank, _self, _top, _parent, or any other valid target name.

Action URL Target

Format: action_url_target:[target]

Example: action_url_target:_blank

This option determines the name of the frame target that action URLs should be displayed in. Valid options include _blank, _self, _top, _parent, or any other valid target name.

Splunk Integration Option

Format: enable_splunk_integration:[0/1]

Example: enable_splunk_integration:1

This option determines whether integration functionality with Splunk is enabled in the web interface. If enabled, you’ll be presented with "Splunk It" links in various places in the CGIs (log file, alert history, host/service detail, etc). Useful if you’re trying to research why a particular problem occurred.

Splunk URL

Format: splunk_url:<path>

Example: splunk_url:http://127.0.0.1:8000/

This option is used to define the base URL to your Splunk interface. This URL is used by the CGIs when creating links if the enable_splunk_integration option is enabled.
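
As a rough sketch, several of the CGI options described above might sit together in the CGI configuration file like this. In the file itself each directive is written as a name=value line. The scratch file, the user names, and the /usr/local/nagios/etc/cgi.cfg location mentioned in the comment are assumptions for illustration only.

```shell
# Write a sample fragment to a scratch file; on a real server these lines
# would live in the CGI config file (assumed: /usr/local/nagios/etc/cgi.cfg).
cgi_fragment="$(mktemp)"
cat > "$cgi_fragment" <<'EOF'
authorized_for_read_only=john,mark
lock_author_names=1
default_statusmap_layout=4
refresh_rate=90
escape_html_tags=1
notes_url_target=_blank
action_url_target=_blank
enable_splunk_integration=1
splunk_url=http://127.0.0.1:8000/
EOF

# Look up a single directive, here the page refresh interval in seconds.
refresh="$(sed -n 's/^refresh_rate=//p' "$cgi_fragment")"
echo "refresh_rate is $refresh seconds"
```

The lookup at the end is only a convenience for checking a value by hand; Nagios itself reads the file directly.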

==================================

How to Monitor Windows Machines

==================================

Introduction

This document describes how you can monitor "private" services and attributes of Windows machines, such as:

* Memory usage

* CPU load

* Disk usage

* Service states

* Running processes

etc.

Publicly available services that are provided by Windows machines (HTTP, FTP, POP3, etc.) can be monitored easily. Monitoring private services or attributes of a Windows machine requires that you install an agent on it. This agent acts as a proxy between the Nagios plugin that does the monitoring and the actual service or attribute of the Windows machine. Without installing an agent on the Windows box, Nagios would be unable to monitor private services or attributes of the Windows box. For this example, we will be installing the NSClient++ addon on the Windows machine and using the check_nt plugin to communicate with the NSClient++ addon. The check_nt plugin should already be installed on the Nagios server.

Download NSClient++ for Windows 32-bit:

`<http://files.nsclient.org/x-0.3.x/NSClient%2B%2B-0.3.8-Win32.msi>`_

Download NSClient++ for Windows 64-bit:

`<http://files.nsclient.org/x-0.3.x/NSClient%2B%2B-0.3.8-x64.msi>`_

Install the NSClient++ Nagios client:

Run the NSClient++ 0.3.8 installer and follow the steps as shown in the images.

Select “I accept the terms in the License Agreement” and click Next.

Don't change any settings on this page; click Next.

Fill in the required fields and select the modules as shown in the image.

The NSClient++ installation is now complete. Next, edit the NSC.ini file, which is located in C:\Program Files\NSClient++.

Open NSC.ini with Notepad, verify the Nagios server IP address and the password, and uncomment the port number used for Nagios, as shown in the images.

Save the changes to NSC.ini, then go to Services in the Control Panel and set up the NSClient++ service as shown in the image.

Right-click the NSClient++ service and go to Properties.

Click the “Log On” tab, select “Allow service to interact with desktop”, click “Apply” and then “OK”, and start the service as shown in the image.

**Configuring Nagios :**

Open the windows.cfg file for editing.

``vi /usr/local/nagios/etc/objects/windows.cfg``

This configuration file is divided into three parts:

* Host Definition

* Host Group Definition

* Service Definition

Host Definition :

The host definition mainly consists of four configuration parameters, as shown in the image: use, host_name, alias, and address.

use: The name of the host template to inherit from, as defined in the templates.cfg file (for example, windows-server).

host_name: The host name of the Windows machine to be monitored (for example, Apache webserver).

alias: The alias (a longer descriptive name) assigned to the host (for example, Apache Server).

address: The IP address of the Windows machine to be monitored (for example, 192.168.1.1).
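
The four parameters above can be sketched as a host definition. This is only an illustration: the host name winserver and the alias are hypothetical, the template name windows-server is assumed to exist in templates.cfg, and the definition is written to a scratch file rather than to the real windows.cfg.

```shell
# Scratch file for illustration; the real target would be
# /usr/local/nagios/etc/objects/windows.cfg
windows_cfg="$(mktemp)"
cat > "$windows_cfg" <<'EOF'
define host{
        use             windows-server      ; host template to inherit from (assumed name)
        host_name       winserver           ; hypothetical name of the Windows machine
        alias           My Windows Server   ; longer descriptive name
        address         192.168.1.1         ; IP address of the Windows machine
        }
EOF

# Show the definition that was written.
cat "$windows_cfg"
```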

Host Group Definition:

The host group definition mainly consists of three configuration parameters, as shown in the image.

hostgroup_name: The name of the group; it can be anything (for example, Database Servers).

alias: The alias assigned to the host group; it can be anything (for example, Database Servers).

members: The host names to be added to the host group, separated by commas (for example, Database server, Apache Server, ... xyz).
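
A host group built from these three parameters might look like the sketch below. The group name and the member host names (winserver, dbserver) are hypothetical, and each member is assumed to match the host_name of an existing host definition.

```shell
# Scratch file for illustration only.
hostgroup_cfg="$(mktemp)"
cat > "$hostgroup_cfg" <<'EOF'
define hostgroup{
        hostgroup_name  windows-servers     ; short name of the group
        alias           Windows Servers     ; longer description of the group
        members         winserver,dbserver  ; comma-separated host_name values
        }
EOF

# Pull out the member list to double-check what was written.
members="$(awk '$1 == "members" {print $2}' "$hostgroup_cfg")"
echo "group members: $members"
```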

Service Definition:

The service definition consists of four configuration parameters, as shown in the image.

use: The name of the service template to use, as defined in the templates.cfg file.

host_name: The host name of the server, as defined in the host definition.

service_description: The name of the service to be monitored (this name can be customized as required).

check_command: The command used to check the service, as defined in commands.cfg.
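
As a minimal sketch, one service definition using these four parameters could look as follows. It assumes a generic-service template in templates.cfg and a check_nt command in commands.cfg that takes the variable name as its first argument; the host name winserver is hypothetical.

```shell
# Scratch file for illustration only.
service_cfg="$(mktemp)"
cat > "$service_cfg" <<'EOF'
define service{
        use                     generic-service   ; service template (assumed name)
        host_name               winserver         ; must match a defined host
        service_description     Uptime            ; label shown in the web interface
        check_command           check_nt!UPTIME   ; command name, then its argument
        }
EOF

cat "$service_cfg"
```

The "!" separates the command name from the arguments that are passed on to the plugin.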

Example Configuration for a Windows MySQL Database Server:

The following images show an example configuration for a MySQL database server. You can change the service definitions as per your requirements.

Host Definition

Service Definition

The images above show an example configuration for a MySQL database server. The host definition includes the host name and IP address of the server to be monitored, and the service definition includes the various services to be monitored on the database server; the service definitions can be changed as required. NSClient++ Version, Uptime, CPU Load, Memory Usage, and C:\ Drive Space are the default service definitions.
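
The five default checks named above could be generated with a small loop like the one sketched here. The check_nt arguments and the 80/90 thresholds are illustrative assumptions, not values taken from the lost images, and winserver is a hypothetical host name.

```shell
services_cfg="$(mktemp)"   # scratch file, not the live windows.cfg
host="winserver"           # hypothetical host_name

# Each input line is "description|check_command"; one service is written per line.
while IFS='|' read -r description command; do
    cat >> "$services_cfg" <<EOF
define service{
        use                     generic-service
        host_name               $host
        service_description     $description
        check_command           $command
        }

EOF
done <<'LIST'
NSClient++ Version|check_nt!CLIENTVERSION
Uptime|check_nt!UPTIME
CPU Load|check_nt!CPULOAD!-l 5,80,90
Memory Usage|check_nt!MEMUSE!-w 80 -c 90
C:\ Drive Space|check_nt!USEDDISKSPACE!-l c -w 80 -c 90
LIST

echo "wrote $(grep -c '^define service{' "$services_cfg") service definitions"
```

Adding another check is then a matter of adding one more line to the list.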

After writing the configuration for the server, you can check the output of any plugin manually as follows.

Change to the /usr/local/nagios/libexec/ directory and check the help page of the plugin, as shown in the image below.

``cd /usr/local/nagios/libexec/``

``./check_nt --help``

To check the C:\ drive space of the server manually, proceed as follows.

As shown in the image above, we checked the C:\ drive space of the server, where:

check_nt is the plugin, located in /usr/local/nagios/libexec.

-H is the IP address of the server to be monitored.

-s is the password configured in NSC.ini on the Windows machine (it is used to secure the connection).

-p is the port number on which NSClient++ listens on the Windows machine.

-v is the variable to be monitored (disk space, CPU load, memory usage, service status, process status).

-l is the argument for the variable (in this case, the “c” drive).

-w is the warning limit (if any).

-c is the critical limit (if any).
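
Put together, a manual check of the C: drive might be assembled as in this sketch. The password, the port 12489, and the 80/90 thresholds are assumptions for illustration; use the values from your own NSC.ini. If the plugin is not installed on the machine where this runs, the command is only printed.

```shell
host="192.168.1.1"       # address of the Windows machine (example address)
port="12489"             # port NSClient++ listens on (assumed value)
password="secret"        # hypothetical password from NSC.ini
plugin="/usr/local/nagios/libexec/check_nt"

# Used space on drive C:, warning at 80%, critical at 90%.
set -- "$plugin" -H "$host" -p "$port" -s "$password" \
       -v USEDDISKSPACE -l c -w 80 -c 90

if [ -x "$plugin" ]; then
    "$@"                   # run the check against the live agent
else
    echo "would run: $*"   # plugin not installed here, just show the command
fi
```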

Once the configuration file is written, we have to check whether all configuration parameters are written correctly. Use the following command to verify the configuration.

Note: Every time the configuration is changed, the configuration files must be checked and the nagios service restarted.

``/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg``

In the above command, /bin/nagios is the Nagios binary, -v tells it to verify the configuration, and nagios.cfg is the main Nagios configuration file. (The output of the command is shown in the image below.) If it shows that Total Warnings and Total Errors are both zero, the configuration is valid and we can proceed to add this server for monitoring.

The configuration is correct, so now restart the nagios service to start monitoring the server and its services.

``/etc/init.d/nagios restart``

After the nagios service is restarted, monitoring of the server starts, as shown in the image below.
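
Since the check and the restart always go together, they can be wrapped in a small helper, sketched below. The function name verify_and_restart and the three overridable variables are hypothetical; the default paths are the ones used in the commands above. Nagios is restarted only when the verification succeeds.

```shell
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
NAGIOS_INIT="${NAGIOS_INIT:-/etc/init.d/nagios}"

# Verify the configuration first; restart only if the verification passes.
verify_and_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        "$NAGIOS_INIT" restart
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}

# After editing any configuration file, run:
#   verify_and_restart
```

This relies on the verify command returning a non-zero exit status when it finds errors.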

===============================

How to Monitor Network Switch

===============================

We can monitor the status of network switches. Some cheaper "unmanaged" switches and hubs don’t have IP addresses and are essentially invisible on your network, so there’s not any way to monitor them. More expensive switches and routers have addresses assigned to them and can be monitored by pinging them or using SNMP to query status information. I’ll describe how you can monitor the following things on managed switches.

* Packet loss using ping

* SNMP status information

* Switch Uptime

* Switch CPU Load

* Switch Memory Usage

Create new host and service definitions for monitoring the switch in the /usr/local/nagios/etc/objects/switch.cfg file.

Host Definition:-

The host definition includes the configuration parameters explained below.

use :- The name of the host template, as defined in the templates.cfg file.

host_name :- The host name of the switch (Cisco Switch); you can change this name as required.

alias :- The alias of the switch; you can change this as required.

address :- The IP address of the switch (192.168.1.5).

Service Definition :-

Service definition for the ping status of the switch.

The service definition includes the configuration parameters explained below.

use :- The name of the service template, as defined in templates.cfg.

host_name :- The host name of the switch (Cisco Switch); you can change this name as required.

service_description :- The name or description of the service to be monitored on the host.

check_command :- The actual command used to monitor the service on the host.

Example Configuration for a Cisco Switch:

The following images show an example configuration for a Cisco switch. You can change the service definitions as per your requirements.

Host Definition

Service Definition

Ping Command:

The ping command checks the ping status of the switch. If the round-trip average is more than 400 milliseconds or packet loss is more than 40%, it gives a warning; if the round-trip average is more than 900 milliseconds or packet loss is more than 90%, it reports the ping status as critical.
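
The switch host and its ping check, with the thresholds just described, can be sketched as follows. The template names generic-switch and generic-service, the host name cisco-switch, and a check_ping command taking the warning and critical pairs as its two arguments are all assumptions about the local templates.cfg and commands.cfg.

```shell
# Scratch file; the real target would be /usr/local/nagios/etc/objects/switch.cfg
switch_cfg="$(mktemp)"
cat > "$switch_cfg" <<'EOF'
define host{
        use             generic-switch   ; host template (assumed name)
        host_name       cisco-switch     ; hypothetical host name
        alias           Cisco Switch
        address         192.168.1.5
        }

define service{
        use                     generic-service
        host_name               cisco-switch
        service_description     PING
        ; warning at 400 ms or 40% loss, critical at 900 ms or 90% loss
        check_command           check_ping!400.0,40%!900.0,90%
        }
EOF

cat "$switch_cfg"
```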

Switch Uptime Command:

In the above image, check_snmp is the command that checks the uptime of the switch using SNMP, -C is the SNMP community name, which must be configured on the switch, and -o is the object ID (OID) used to query the uptime of the switch.

Port 1 Link Status Command:

In the above image, check_snmp is the command that checks the port link status of the switch using SNMP, -C is the SNMP community name, which must be configured on the switch, -o is the object ID (OID) used to query the port link status, and -m is the MIB (Management Information Base) of the switch.
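
The uptime and port link checks described above might be written as the two services sketched below. The community name public, the OIDs sysUpTime.0 and ifOperStatus.1, the RFC1213-MIB name, and the "-r 1" match (an operational status of 1 meaning the link is up) are assumptions for illustration; the check_snmp command is assumed to pass its argument string straight to the plugin.

```shell
snmp_cfg="$(mktemp)"   # scratch file for illustration only
cat > "$snmp_cfg" <<'EOF'
define service{
        use                     generic-service
        host_name               cisco-switch
        service_description     Uptime
        check_command           check_snmp!-C public -o sysUpTime.0
        }

define service{
        use                     generic-service
        host_name               cisco-switch
        service_description     Port 1 Link Status
        check_command           check_snmp!-C public -o ifOperStatus.1 -r 1 -m RFC1213-MIB
        }
EOF

grep 'check_command' "$snmp_cfg"
```

To watch another port, copy the second service and change the index at the end of the OID.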

Switch CPU Load Command:

In the above image, check_snmp_load is the command that checks the CPU load on the Cisco switch, -C is the SNMP community name, which must be configured on the switch, and -T is the type of switch (Cisco, Netgear, etc.). The thresholds set the limits for CPU load: if the CPU load rises to 80% to 85% it is reported as a warning, and if it rises to 90% to 95% it is reported as critical.

Switch Memory Usage Command:

In the above image, check_snmp_mem_v1 is the command that checks memory usage on the Cisco switch using SNMP, -C is the SNMP community name, which must be configured on the switch, and -I indicates the type of switch (Cisco, Netgear, etc.). The thresholds set the limits for memory usage: if memory usage rises to 90% it is reported as a warning, and if it rises to 95% it is reported as critical.
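
How a warning and a critical threshold turn into a state can be sketched with a tiny function. This is not the code of check_snmp_mem_v1 or check_snmp_load; the function name classify is hypothetical, and treating a value equal to the threshold as already over it is an assumption. The return codes follow the usual Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```shell
# classify VALUE WARN CRIT
# Prints the state and returns the matching plugin exit code.
classify() {
    value="$1"; warn="$2"; crit="$3"
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING"; return 1
    else
        echo "OK"; return 0
    fi
}

# Memory usage of 92% against the 90% / 95% thresholds from the text.
classify 92 90 95 || true   # prints WARNING
```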

Once the configuration file is written, we have to check whether all configuration parameters are written correctly. Use the following command to verify the configuration.

Note: Every time the configuration is changed, the configuration files must be checked and the nagios service restarted.

``/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg``

The configuration is correct, so now restart the nagios service to start monitoring the switch and its services.

``/etc/init.d/nagios restart``

After the nagios service is restarted, monitoring of the switch starts, as shown in the image below.

===============================

How to Monitor Network Router

===============================

We can monitor the status of network routers and hardware firewalls. Some routers and hardware firewalls have addresses assigned to them and can be monitored by pinging them or using SNMP to query status information. I’ll describe how you can monitor the following things on a router or hardware firewall.

* Ping status of router.

* Bandwidth usage of different ports.

* Router Uptime.

* Router or Firewall CPU Load.

* Router or Firewall port status.

* Router or Firewall Memory usage.

Create new host and service definitions for monitoring the router in the /usr/local/nagios/etc/objects/router.cfg file.

Host Definition:

The host definition includes the configuration parameters explained below.

use :- The name of the template for routers or firewalls, as defined in templates.cfg.

host_name :- The host name of the router or firewall to be monitored (it can be changed as required).

alias :- The alias of the router or firewall (it can be changed as required).

address :- The IP address of the router or firewall to be monitored.

Service Definition:

The service definition includes the configuration parameters explained below.

use :- The name of the service template, as defined in templates.cfg.

host_name :- The host name of the router or firewall to be monitored.

service_description :- The name or description of the service to be monitored on the router or firewall.

check_command :- The actual command used to monitor the service on the router or firewall.

Example Configuration for a Router or Firewall:

The following images show an example configuration for a FortiGate firewall. You can change the service definitions as per your requirements.

Host Definition:

This is the host definition of the FortiGate firewall, as explained above.

Service Definition:

Check_ping command:

In the above image, check_ping is the command that checks the ping status of the firewall using the round-trip average. If the round-trip average is greater than 200 milliseconds or packet loss is greater than 20%, it is reported as a warning; if the round-trip average is greater than 600 milliseconds or packet loss is greater than 60%, it is reported as critical.

Uptime command:

In the above image, check_snmp is the command that checks the uptime of the firewall using SNMP, -C is the SNMP community name configured on the firewall, and -o is the object ID (OID) used to query the uptime of the firewall.

CPU Load Command:

In the above image, check_snmp_load is the command that checks the CPU load on the firewall using SNMP, -C is the SNMP community name configured on the firewall, and -T is the type of firewall (Cisco, FortiGate, etc.). If the CPU load on the firewall is more than 80% it is reported as a warning, and if it is more than 90% it is reported as critical.

Port status:

In the above image, check_snmp_int is the command that checks the port status of the firewall using SNMP, -C is the SNMP community name configured on the firewall, and -n is the port to be monitored; as shown in the image, we are monitoring port number 1. With the same configuration we can monitor a different port of the firewall by changing the port number, as shown in the image below.

Bandwidth Monitoring Command:

As shown in the above image, check_local_mrtgtraf is the command that checks the bandwidth usage of firewall or router ports using the MRTG package. MRTG (Multi Router Traffic Grapher) is a traffic graphing tool configured alongside Nagios to collect the data required to monitor the bandwidth of the router or firewall. It collects the data in a log file, and by monitoring that log file we can get the bandwidth usage of the port. 172.25.0.10_4.log is the log file for port wan2: if the bandwidth usage of this port is more than 1 Mb/s it is reported as a warning, and if it is more than 5 Mb/s it is reported as critical. Using the same configuration, we can monitor the bandwidth usage of other ports by changing the log file to the one for the respective port, as shown in the image below.
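
Monitoring several ports by swapping the log file can be sketched with a loop. Only 172.25.0.10_4.log (port wan2) comes from the text; the wan1 log file name, the /var/lib/mrtg directory, the host name fortigate-fw, and the argument order of the check_local_mrtgtraf command (log file, AVG or MAX, warning pair, critical pair, maximum log age in minutes) are assumptions. The numbers 1000000 and 5000000 read "1 Mb/s" and "5 Mb/s" as rates in bytes per second, which is also an assumption.

```shell
router_cfg="$(mktemp)"      # scratch file, not the live router.cfg
mrtg_dir="/var/lib/mrtg"    # assumed directory holding the MRTG log files
host="fortigate-fw"         # hypothetical host_name

# Each input line is "port|logfile"; one bandwidth service is written per port.
while IFS='|' read -r port logfile; do
    cat >> "$router_cfg" <<EOF
define service{
        use                     generic-service
        host_name               $host
        service_description     $port Bandwidth Usage
        check_command           check_local_mrtgtraf!$mrtg_dir/$logfile!AVG!1000000,1000000!5000000,5000000!10
        }

EOF
done <<'LIST'
wan1|172.25.0.10_3.log
wan2|172.25.0.10_4.log
LIST

grep 'service_description' "$router_cfg"
```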

Once the configuration file is written, we have to check whether all configuration parameters are written correctly. Use the following command to verify the configuration.

Note: Every time the configuration is changed, the configuration files must be checked and the nagios service restarted.

``/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg``

The configuration is correct, so now restart the nagios service to start monitoring the router or firewall and its services.

``/etc/init.d/nagios restart``

After the nagios service is restarted, monitoring of the router or firewall starts, as shown in the image below.


Tuesday, April 3, 2012
