Main Configuration File Options


Notes

When creating and/or editing configuration files, keep the following in mind:

  1. Lines that start with a '#' character are taken to be comments and are not processed.
  2. Variable names must begin at the start of the line - no white space is allowed before the name.
  3. Variable names are case-sensitive.
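These three rules are enough to read a main configuration file mechanically. The Python sketch below is not NetSaint's own parser. It assumes that blank lines are ignored and that a variable's value is everything after the first '=' on the line. The function name parse_main_config is hypothetical. Values are collected into lists because some variables, such as cfg_file and resource_file, may appear more than once.

```python
from collections import defaultdict


def parse_main_config(text: str) -> dict[str, list[str]]:
    """Parse netsaint.cfg-style text into {variable_name: [value, ...]}.

    Hypothetical helper, sketched from the three rules in the Notes section.
    """
    options: dict[str, list[str]] = defaultdict(list)
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Assumption: blank lines are skipped. Rule 1: '#' lines are comments.
        if not raw.strip() or raw.startswith("#"):
            continue
        # Rule 2: the variable name must start in the first column.
        if raw[0].isspace():
            raise ValueError(f"line {lineno}: white space before variable name")
        name, sep, value = raw.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected name=value")
        # Rule 3: names are case-sensitive, so they are stored verbatim.
        options[name].append(value)
    return dict(options)
```

Because names are stored verbatim, log_file and Log_File end up as two different keys, which is exactly what rule 3 implies.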

Sample Configuration

A sample main configuration file can be created by running the 'make config' command. The default name of the main configuration file is netsaint.cfg - look for it in the NetSaint distribution directory or in the etc/ subdirectory of your installation.

Index

Log file
Object (host) configuration file
Resource file
Temp file

Status file (status log)
Aggregated status updates option
Aggregated status data update interval

NetSaint user
NetSaint group

Program mode

Service check execution option
Passive service check acceptance option
Event handler option

Log rotation method
Log archive path

External command check option
External command check interval
External command file

Comment file
Lock file

State retention option
State retention file
Automatic state retention update interval
Use retained program state option

Syslog logging option
Notification logging option
Service check retry logging option
Host retry logging option
Event handler logging option
Initial state logging option
External command logging option
Passive service check logging option

Global host event handler
Global service event handler

Inter-check sleep time
Inter-check delay method
Service interleave factor
Maximum concurrent service checks
Service reaper frequency
Timing interval length

Aggressive host checking option

Flap detection option
Low service flap threshold
High service flap threshold
Low host flap threshold
High host flap threshold

Soft service dependencies option

Service check timeout
Host check timeout
Event handler timeout
Notification timeout
Obsessive compulsive service processor timeout
Performance data processor command timeout

Obsess over services option
Obsessive compulsive service processor command

Performance data processing option
Host performance data processor command
Service performance data processor command

Orphaned service check option

Administrator email address
Administrator pager

Log File

Format: log_file=<file_name>
Example: log_file=/usr/local/netsaint/var/netsaint.log

This variable specifies where NetSaint should create its main log file. This should be the first variable that you define in your configuration file, because NetSaint will try to write any errors it finds in the rest of your configuration data to this file. NetSaint never deletes or prunes this file, and it only rotates it if you enable log rotation with the log_rotation_method option. If you don't use NetSaint's log rotation, I suggest adding a cron job to rotate the log every month or so (more often if you have a lot of alarms).

Object (Host) Configuration File

Format: cfg_file=<file_name>
Example: cfg_file=/usr/local/netsaint/etc/hosts.cfg

This specifies the object/host configuration file that NetSaint should use for monitoring. This file has traditionally been called the "host" config file, even though it may contain more than just host definitions. Object configuration files contain definitions for hosts, host groups, contacts, contact groups, services, commands, etc. You can split your configuration information into several files and specify multiple cfg_file= statements to include each of them.

Resource File

Format: resource_file=<file_name>
Example: resource_file=/usr/local/netsaint/etc/resource.cfg

This is used to specify an optional resource file that can contain $USERn$ macro definitions. $USERn$ macros are useful for storing usernames, passwords, and items commonly used in command definitions (like directory paths). The CGIs will not attempt to read resource files, so you can set restrictive permissions (600 or 660) on them to protect sensitive information. You can include multiple resource files by adding multiple resource_file statements to the main config file - NetSaint will process them all. See the sample resource.cfg file in the base of the NetSaint directory for an example of how to define $USERn$ macros.
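To illustrate how $USERn$ macros stand in for paths and secrets, here is a minimal Python sketch of reading a resource file and expanding the macros in a command line. It assumes the resource file uses the same name=value, '#'-comment line format as the main config file. The function names are hypothetical and do not correspond to NetSaint internals.

```python
import re

# Matches $USER1$, $USER2$, ... (assumed macro syntax, as described above).
_USER_MACRO = re.compile(r"\$USER(\d+)\$")


def parse_resource_file(text: str) -> dict[str, str]:
    """Collect $USERn$=value definitions, skipping blank lines and '#' comments."""
    macros: dict[str, str] = {}
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue
        name, sep, value = raw.partition("=")
        if sep and _USER_MACRO.fullmatch(name):
            macros[name] = value
    return macros


def expand_user_macros(command: str, macros: dict[str, str]) -> str:
    """Replace each defined $USERn$; undefined macros are left untouched."""
    return _USER_MACRO.sub(lambda m: macros.get(m.group(0), m.group(0)), command)
```

Keeping the values in a separate file is what lets you lock down its permissions without affecting the CGIs, which never read it.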

Temp File

Format: temp_file=<file_name>
Example: temp_file=/usr/local/netsaint/var/netsaint.tmp

This is the temporary file into which NetSaint redirects the standard output and standard error of the plugins it executes. The plugin output is read back from the temp file and used both for display in the "status" CGI and in notification macros. This file is deleted after the plugin has been executed. NetSaint also uses this file as a scratch file when it updates the status log.

Note: On most systems, the temp file will have to reside on the same filesystem as the status file, the log file, and the log file archive path.
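The documentation does not say why this restriction exists. One plausible reason, assumed here rather than confirmed by the source, is that a file is updated by writing a fresh copy to the temp file and then renaming it into place. A rename is atomic only within a single filesystem, and across filesystems it fails with EXDEV. The Python sketch below shows that pattern. The function name is hypothetical.

```python
import os


def write_status_atomically(status_path: str, temp_path: str, text: str) -> None:
    """Write new status data to temp_path, then rename it over status_path.

    Readers (such as the CGIs) see either the old file or the complete new
    one, never a half-written file. os.replace() raises OSError (EXDEV) when
    the two paths are on different filesystems.
    """
    with open(temp_path, "w") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before the rename
    os.replace(temp_path, status_path)
```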

Status File (Status Log)

Format: status_file=<file_name>
Example: status_file=/usr/local/netsaint/var/status.log

This is the file that NetSaint uses to store the current status of all monitored services. The status of all hosts associated with the services you monitor is also recorded here. This file is used by the "status" CGI so that current monitoring status can be reported via a web interface. The CGIs must have read access to this file in order to function properly. This file is deleted every time NetSaint stops and recreated when it starts.

Aggregated Status Updates Option

Format: aggregate_status_updates=<0/1>
Example: aggregate_status_updates=1

This option determines whether or not NetSaint will aggregate updates of host, service, and program status data. Normally, status data is updated immediately when a change occurs. This can result in high CPU loads if you are monitoring a lot of services. If you want NetSaint to only update status data (in the status log) every few seconds (as determined by the status_update_interval option), enable this option. If you want immediate updates, disable it. Values are as follows:

  0 = Disable aggregated updates
  1 = Enable aggregated updates

Aggregated Status Update Interval

Format: status_update_interval=<seconds>
Example: status_update_interval=15

This setting determines how often (in seconds) NetSaint will update status data in the status log. The minimum update interval is five seconds. If you have disabled aggregated status updates (with the aggregate_status_updates option), this option has no effect.
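The two aggregation options work together: changes are buffered and written out at most once per interval. The Python sketch below models that behavior under stated assumptions. The class name is hypothetical, intervals below the documented five-second minimum are clamped to five seconds (NetSaint might instead reject them), and a list stands in for the status log.

```python
import time
from typing import Callable

MIN_STATUS_UPDATE_INTERVAL = 5  # seconds, the documented minimum


class AggregatedStatusWriter:
    """Buffer host/service status changes and flush them once per interval."""

    def __init__(self, interval: int, clock: Callable[[], float] = time.monotonic):
        # Assumption: values below the minimum are raised to the minimum.
        self.interval = max(interval, MIN_STATUS_UPDATE_INTERVAL)
        self.clock = clock
        self.pending: dict[str, str] = {}
        self.last_flush = clock()
        self.flushed: list[dict[str, str]] = []  # stands in for the status log

    def update(self, key: str, state: str) -> None:
        # A later change to the same host/service overwrites the earlier one,
        # so only the latest state is written at the next flush.
        self.pending[key] = state
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.flushed.append(dict(self.pending))
            self.pending.clear()
        self.last_flush = self.clock()
```

The CPU saving comes from collapsing many rapid changes into one write: two updates to the same service inside one interval cost a single status log update.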

NetSaint User

Format: netsaint_user=<username/UID>
Example: netsaint_user=netsaint

This is used to set the effective user that the NetSaint process should run as. After initial program startup and before starting to monitor anything, NetSaint will drop its effective privileges and run as this user. You may specify either a username or a UID.

NetSaint Group

Format: netsaint_group=<groupname/GID>
Example: netsaint_group=netsaint

This is used to set the effective group that the NetSaint process should run as. After initial program startup and before starting to monitor anything, NetSaint will drop its effective privileges and run as this group. You may specify either a groupname or a GID.
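Both netsaint_user and netsaint_group accept either a name or a numeric ID, and the group has to be switched before the user. Once the effective UID is no longer root, the process would lack permission to change its group. The Python sketch below is hypothetical, not NetSaint's code. It changes only the effective IDs, matching the description above. A hardened daemon would typically also change the real IDs and reset supplementary groups.

```python
import grp
import os
import pwd


def resolve_uid(spec: str) -> int:
    """netsaint_user accepts a username or a numeric UID."""
    return int(spec) if spec.isdigit() else pwd.getpwnam(spec).pw_uid


def resolve_gid(spec: str) -> int:
    """netsaint_group accepts a group name or a numeric GID."""
    return int(spec) if spec.isdigit() else grp.getgrnam(spec).gr_gid


def drop_effective_privileges(user: str, group: str) -> None:
    """Switch the effective group, then the effective user (must start as root)."""
    gid, uid = resolve_gid(group), resolve_uid(user)
    os.setegid(gid)  # group first: this needs root privileges
    os.seteuid(uid)  # after this call the process is no longer effectively root
```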

Program Mode

Format: program_mode=<a/s>
Example: program_mode=a

This is the initial program mode that NetSaint should use when it starts or restarts. See the documentation on program modes for more information. Note: If you have state retention enabled, NetSaint will ignore this setting when it (re)starts and use the last known setting for this option (as stored in the state retention file), unless you disable the use_retained_program_state option. If you want to change this option when state retention is active (and the use_retained_program_state option is enabled), you'll have to use the appropriate external command or change it via the web interface. Values are as follows:

  a = Active mode
  s = Standby mode
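The retention note above appears for several options (program_mode, execute_service_checks, accept_passive_service_checks, enable_event_handlers, and enable_flap_detection). Its precedence rule can be summarized in a few lines. This Python sketch is an illustration only. The function name is hypothetical, and retained_value is assumed to be None when the retention file holds no saved value.

```python
from typing import Optional


def initial_program_setting(
    config_value: str,
    retained_value: Optional[str],
    retain_state_information: bool,
    use_retained_program_state: bool,
) -> str:
    """Return the value a program-wide option starts with after a (re)start."""
    if retain_state_information and use_retained_program_state and retained_value is not None:
        # The retained value wins; editing the main config file has no effect.
        return retained_value
    # Otherwise the value from the main configuration file is used.
    return config_value
```

This is why, with both retention options enabled, the setting can only be changed at runtime through an external command or the web interface.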

Service Check Execution Option

Format: execute_service_checks=<0/1>
Example: execute_service_checks=1

This option determines whether or not NetSaint will execute service checks when it initially (re)starts. If this option is disabled, NetSaint will not actively execute any service checks and will remain in a sort of "sleep" mode (it can still accept passive checks unless you've disabled them). This option is most often used when configuring backup monitoring servers, as described in the documentation on redundancy, or when setting up a distributed monitoring environment. Note: If you have state retention enabled, NetSaint will ignore this setting when it (re)starts and use the last known setting for this option (as stored in the state retention file), unless you disable the use_retained_program_state option. If you want to change this option when state retention is active (and the use_retained_program_state option is enabled), you'll have to use the appropriate external command or change it via the web interface. Values are as follows:

  0 = Don't execute service checks
  1 = Execute service checks

Passive Service Check Acceptance Option

Format: accept_passive_service_checks=<0/1>
Example: accept_passive_service_checks=1

This option determines whether or not NetSaint will accept passive service checks when it initially (re)starts. If this option is disabled, NetSaint will not accept any passive service checks. Note: If you have state retention enabled, NetSaint will ignore this setting when it (re)starts and use the last known setting for this option (as stored in the state retention file), unless you disable the use_retained_program_state option. If you want to change this option when state retention is active (and the use_retained_program_state option is enabled), you'll have to use the appropriate external command or change it via the web interface. Values are as follows:

  0 = Don't accept passive service checks
  1 = Accept passive service checks

Event Handler Option

Format: enable_event_handlers=<0/1>
Example: enable_event_handlers=1

This option determines whether or not NetSaint will run event handlers when it initially (re)starts. If this option is disabled, NetSaint will not run any host or service event handlers. Note: If you have state retention enabled, NetSaint will ignore this setting when it (re)starts and use the last known setting for this option (as stored in the state retention file), unless you disable the use_retained_program_state option. If you want to change this option when state retention is active (and the use_retained_program_state option is enabled), you'll have to use the appropriate external command or change it via the web interface. Values are as follows:

  0 = Disable event handlers
  1 = Enable event handlers

Log Rotation Method

Format: log_rotation_method=<n/h/d/w/m>
Example: log_rotation_method=d

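This is the rotation method that you would like NetSaint to use for your log file. Values are as follows:

  n = None (don't rotate the log)
  h = Hourly rotation
  d = Daily rotation
  w = Weekly rotation
  m = Monthly rotation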
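The sketch below shows one way to compute when the next rotation is due for each method. The exact rotation times are not stated here. As an assumption, hourly rotation is placed at the top of the hour, daily at midnight, weekly at midnight between Saturday and Sunday, and monthly at midnight on the first of the month. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_rotation(now: datetime, method: str) -> Optional[datetime]:
    """Return when the log would next be rotated, or None for method 'n'."""
    if method == "n":
        return None
    if method == "h":
        return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if method == "d":
        return midnight + timedelta(days=1)
    if method == "w":
        # Assumption: weekly rotation happens as Saturday turns into Sunday.
        # weekday() is 0 for Monday and 6 for Sunday.
        days_until_sunday = (6 - now.weekday()) % 7 or 7
        return midnight + timedelta(days=days_until_sunday)
    if method == "m":
        # Assumption: monthly rotation happens at 00:00 on the 1st.
        # Adding 32 days to the 1st always lands in the following month.
        first = midnight.replace(day=1)
        return (first + timedelta(days=32)).replace(day=1)
    raise ValueError(f"unknown log_rotation_method: {method!r}")
```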

Log Archive Path

Format: log_archive_path=<path>
Example: log_archive_path=/usr/local/netsaint/var/archives/

This is the directory where NetSaint should place log files that have been rotated. This option is ignored if you choose to not use the log rotation functionality.

External Command Check Option

Format: check_external_commands=<0/1>
Example: check_external_commands=1

This option determines whether or not NetSaint will check the command file for external commands it should execute. This option must be enabled if you plan on using the command CGI to issue commands via the web interface. Third party programs can also issue commands to NetSaint by writing to the command file, provided proper rights to the file have been granted as outlined in the FAQ. See the documentation on external commands for more information.

External Command Check Interval

Format: command_check_interval=<xxx>
Example: command_check_interval=1

This is the number of "time units" to wait between external command checks. Unless you've changed the interval_length value (as defined below) from the default value of 60, this number will mean minutes. Each time NetSaint checks for external commands, it will read and process all commands present in the command file before continuing on with its other duties. See the documentation on external commands for more information.

External Command File

Format: command_file=<file_name>
Example: command_file=/usr/local/netsaint/var/rw/netsaint.cmd

This is the file that NetSaint will check for external commands to process. The command CGI writes commands to this file. Other third party programs can write to this file if proper file permissions have been granted as outlined in the FAQ. The external command file is implemented as a named pipe (FIFO), which is created when NetSaint starts and removed when it shuts down. See the documentation on external commands for more information.
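Because the command file is a FIFO that only exists while NetSaint runs, a third party program should make sure it is writing to a pipe and should not block forever when nothing is reading it. The Python sketch below shows one way to do that. The line layout "[timestamp] COMMAND_NAME;arg1;arg2..." and the PROCESS_SERVICE_CHECK_RESULT command used in the example are assumptions. Check the external commands documentation for the exact syntax. Both function names are hypothetical.

```python
import os
import stat
import time
from typing import Optional, Sequence


def format_external_command(name: str, args: Sequence[str], timestamp: Optional[int] = None) -> str:
    """Build one command line (assumed layout: "[time] NAME;arg;arg")."""
    ts = int(time.time()) if timestamp is None else timestamp
    return "[{}] {}\n".format(ts, ";".join([name, *args]))


def submit_external_command(command_file: str, line: str) -> None:
    """Write a single command line to NetSaint's named pipe."""
    # Refuse to write to a regular file: the FIFO only exists while NetSaint runs.
    if not stat.S_ISFIFO(os.stat(command_file).st_mode):
        raise RuntimeError(f"{command_file} is not a named pipe")
    # A blocking open() on a FIFO waits until a reader appears. With O_NONBLOCK
    # it fails immediately (ENXIO) if NetSaint is not reading the pipe.
    fd = os.open(command_file, os.O_WRONLY | os.O_NONBLOCK)
    try:
        os.write(fd, line.encode())
    finally:
        os.close(fd)
```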

Comment File

Format: comment_file=<file_name>
Example: comment_file=/usr/local/netsaint/var/comment.log

This is the file that NetSaint will use for storing service and host comments. Comments can be viewed and added for both hosts and services through the extended information CGI.

Lock File

Format: lock_file=<file_name>
Example: lock_file=/tmp/netsaint.lock

This option specifies the location of the lock file that NetSaint should create when it runs as a daemon (when started with the -d command line argument). This file contains the process ID (PID) of the running NetSaint process.
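Since the lock file holds the daemon's PID, a script can use it to check whether NetSaint is still running. The Python sketch below assumes the file contains the PID as a decimal number, possibly followed by a newline. The function names are hypothetical. Sending signal 0 performs only the existence and permission check and does not actually deliver a signal.

```python
import os


def read_lock_pid(lock_file: str) -> int:
    """Read the PID that NetSaint wrote to its lock file."""
    with open(lock_file) as f:
        return int(f.read().strip())


def netsaint_is_running(lock_file: str) -> bool:
    """True if the lock file names a process that currently exists."""
    try:
        pid = read_lock_pid(lock_file)
    except (FileNotFoundError, ValueError):
        return False  # no lock file, or it does not contain a number
    if pid <= 0:
        return False  # kill(0, ...) would target a process group, not a PID
    try:
        os.kill(pid, 0)  # signal 0: check only, nothing is sent
    except ProcessLookupError:
        return False  # stale lock file
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

A lock file that outlives its process (for example after a crash) is reported as not running rather than trusted blindly.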

State Retention Option

Format: retain_state_information=<0/1>
Example: retain_state_information=1

This option determines whether or not NetSaint will retain state information for hosts and services between program restarts. If you enable this option, you should supply a value for the state_retention_file variable. When enabled, NetSaint will save all state information for hosts and services before it shuts down (or restarts) and will read in previously saved state information when it starts up again.

State Retention File

Format: state_retention_file=<file_name>
Example: state_retention_file=/usr/local/netsaint/var/status.sav

This is the file that NetSaint will use for storing service and host state information before it shuts down. When NetSaint is restarted it will use the information stored in this file for setting the initial states of services and hosts before it starts monitoring anything. This file is deleted after NetSaint reads in initial state information when it (re)starts. In order to make NetSaint retain state information between program restarts, you must enable the retain_state_information option.

Automatic State Retention Update Interval

Format: retention_update_interval=<minutes>
Example: retention_update_interval=60

This setting determines how often (in minutes) NetSaint will automatically save retention data during normal operation. If you set this value to 0, NetSaint will not save retention data at regular intervals, but it will still save retention data before shutting down or restarting. If you have disabled state retention (with the retain_state_information option), this option has no effect.

Use Retained Program State Option

Format: use_retained_program_state=<0/1>
Example: use_retained_program_state=1

This setting determines whether or not NetSaint will set various program-wide state variables based on the values saved in the retention file. The program-wide state variables that are saved across program restarts when state retention is enabled include the program_mode, enable_flap_detection, enable_event_handlers, execute_service_checks, and accept_passive_service_checks options. If you do not have state retention enabled, this option has no effect.

Syslog Logging Option

Format: use_syslog=<0/1>
Example: use_syslog=1

This variable determines whether messages are logged to the syslog facility on your local host. Values are as follows:

  0 = Don't use the syslog facility
  1 = Use the syslog facility

Notification Logging Option

Format: log_notifications=<0/1>
Example: log_notifications=1

This variable determines whether or not notification messages are logged. If you have a lot of contacts or regular service failures your log file will grow relatively quickly. Use this option to keep contact notifications from being logged.

Service Check Retry Logging Option

Format: log_service_retries=<0/1>
Example: log_service_retries=1

This variable determines whether or not service check retries are logged. Service check retries occur when a service check results in a non-OK state, but you have configured NetSaint to retry the service more than once before responding to the error. Services in this situation are considered to be in "soft" states. Logging service check retries is mostly useful when attempting to debug NetSaint or test out service event handlers.

Host Check Retry Logging Option

Format: log_host_retries=<0/1>
Example: log_host_retries=1

This variable determines whether or not host check retries are logged. Logging host check retries is mostly useful when attempting to debug NetSaint or test out host event handlers.

Event Handler Logging Option

Format: log_event_handlers=<0/1>
Example: log_event_handlers=1

This variable determines whether or not service and host event handlers are logged. Event handlers are optional commands that can be run whenever a service or host changes state. Logging event handlers is most useful when debugging NetSaint or first trying out your event handler scripts.

Initial States Logging Option

Format: log_initial_states=<0/1>
Example: log_initial_states=1

This variable determines whether or not NetSaint will force all initial host and service states to be logged, even if they result in an OK state. Initial service and host states are normally only logged when there is a problem on the first check. Enabling this option is useful if you are using an application that scans the log file to determine long-term state statistics for services and hosts.

External Command Logging Option

Format: log_external_commands=<0/1>
Example: log_external_commands=1

This variable determines whether or not NetSaint will log external commands that it receives from the external command file. Note: This option does not control whether or not passive service checks (which are a type of external command) get logged. To enable or disable logging of passive checks, use the log_passive_service_checks option.

Passive Service Check Logging Option

Format: log_passive_service_checks=<0/1>
Example: log_passive_service_checks=1

This variable determines whether or not NetSaint will log passive service checks that it receives from the external command file. If you are setting up a distributed monitoring environment or plan on handling a large number of passive checks on a regular basis, you may wish to disable this option so your log file doesn't get too large.

Global Host Event Handler Option

Format: global_host_event_handler=<command>
Example: global_host_event_handler=log-host-event-to-db

This option allows you to specify a host event handler command that is to be run for every host state change. The global event handler is executed immediately prior to the event handler that you have optionally specified in each host definition. The command argument is the short name of a command definition that you define in your host configuration file. The maximum amount of time that this command can run is controlled by the event_handler_timeout option. See the documentation on event handlers for more information.

Global Service Event Handler Option

Format: global_service_event_handler=<command>
Example: global_service_event_handler=log-service-event-to-db

This option allows you to specify a service event handler command that is to be run for every service state change. The global event handler is executed immediately prior to the event handler that you have optionally specified in each service definition. The command argument is the short name of a command definition that you define in your host configuration file. The maximum amount of time that this command can run is controlled by the event_handler_timeout option. See the documentation on event handlers for more information.
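The ordering rule for global and per-object event handlers can be sketched as follows. This is an illustration, not NetSaint's code. The function name and the injected run callable, which is expected to execute a command under event_handler_timeout, are hypothetical.

```python
from typing import Callable, List, Optional

# run(command_name, timeout_seconds) executes one command definition.
Runner = Callable[[str, int], None]


def run_event_handlers(
    global_handler: Optional[str],
    object_handler: Optional[str],
    run: Runner,
    timeout: int,
) -> List[str]:
    """Run the global handler first, then the host/service's own handler."""
    ran: List[str] = []
    for command in (global_handler, object_handler):
        if command:  # either handler may be left undefined
            run(command, timeout)  # both share the event_handler_timeout limit
            ran.append(command)
    return ran
```

For example, with global_service_event_handler=log-service-event-to-db and a hypothetical per-service handler named restart-httpd, the database logging command always runs before the restart command.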

Inter-Check Sleep Time

Format: sleep_time=<seconds>
Example: sleep_time=1

This is the number of seconds that NetSaint will sleep before checking to see if the next service check in the scheduling queue should be executed. Note that NetSaint will only sleep after it "catches up" with queued service checks that have fallen behind.

Inter-Check Delay Method

Format: inter_check_delay_method=<n/d/s/x.xx>
Example: inter_check_delay_method=s

This option allows you to control how service checks are initially "spread out" in the event queue. Using a "smart" delay calculation (the default) will cause NetSaint to calculate an average check interval and spread initial checks of all services out over that interval, thereby helping to eliminate CPU load spikes. Using no delay is generally not recommended unless you are testing the service check parallelization functionality. With no delay, all service checks are scheduled for execution at the same time, which generally produces large CPU spikes when the services are all executed in parallel. See the NetSaint documentation for more information on estimating how the inter-check delay affects service check scheduling. Values are as follows:

  n = Don't use any delay (schedule all service checks to run immediately)
  d = Use a "dumb" delay of 1 second between service checks
  s = Use a "smart" delay calculation to spread service checks out evenly (default)
  x.xx = Use a user-supplied inter-check delay of x.xx seconds
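The "smart" calculation can be illustrated as a worked example. Following the description above, this Python sketch assumes the delay equals the average check interval (converted to seconds with interval_length) divided by the number of services, so the first round of checks fills exactly one average interval. The real calculation may differ in detail, and the function names are hypothetical.

```python
from typing import List, Sequence


def smart_inter_check_delay(check_intervals: Sequence[float], interval_length: int = 60) -> float:
    """Average check interval (in seconds) divided by the number of services."""
    if not check_intervals:
        return 0.0
    n = len(check_intervals)
    average_interval_s = sum(check_intervals) / n * interval_length
    return average_interval_s / n


def initial_check_offsets(num_services: int, delay: float) -> List[float]:
    """Seconds after startup at which each service's first check is scheduled."""
    return [i * delay for i in range(num_services)]
```

With four services checked every 5, 5, 10, and 20 time units, the average interval is 10 units (600 seconds), so the delay is 600 / 4 = 150 seconds and the first checks start at 0, 150, 300, and 450 seconds.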

Service Interleave Factor

Format: service_interleave_factor=<s|x>
Example: service_interleave_factor=s

This variable determines how service checks are interleaved. Interleaving allows for a more even distribution of service checks, reduced load on remote hosts, and faster overall detection of host problems. With the introduction of service check parallelization, remote hosts could get bombarded with checks if interleaving was not implemented. This could cause the service checks to fail or return incorrect results if the remote host was overloaded with processing other service check requests. Setting this value to 1 is equivalent to not interleaving the service checks (this is how versions of NetSaint prior to 0.0.5 worked). Set this value to s (smart) for automatic calculation of the interleave factor unless you have a specific reason to change it. The best way to understand how interleaving works is to watch the status CGI (detailed view) when NetSaint is just starting. You should see that the service check results are spread out as they begin to appear. See the NetSaint documentation for more information on how interleaving works.
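The Python sketch below shows what an interleave factor does to the scheduling order. It assumes services are listed host by host and that the "smart" factor is the total number of services divided by the total number of hosts, rounded up. The function names are hypothetical. With a factor of f, the scheduler takes every f-th service, then starts again one position later, so consecutive checks hit different hosts.

```python
import math
from typing import List, Sequence


def smart_interleave_factor(total_services: int, total_hosts: int) -> int:
    """Assumed smart factor: services per host, rounded up (at least 1)."""
    return max(1, math.ceil(total_services / total_hosts))


def interleave(services: Sequence[str], factor: int) -> List[str]:
    """Reorder a host-by-host service list using the given interleave factor."""
    order: List[str] = []
    for offset in range(factor):
        order.extend(services[offset::factor])
    return order
```

For three hosts (a, b, c) with two services each, listed as a1, a2, b1, b2, c1, c2, the smart factor is 2 and the checks run in the order a1, b1, c1, a2, b2, c2. A factor of 1 leaves the list unchanged, which matches the "no interleaving" behavior described above. If hosts have different numbers of services the spread is less even, but the idea is the same.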

Maximum Concurrent Service Checks

Format: max_concurrent_checks=<max_checks>
Example: max_concurrent_checks=20

This option allows you to specify the maximum number of service checks that can be run in parallel at any given time. Specifying a value of 1 for this variable essentially prevents any service checks from being parallelized. Specifying a value of 0 (the default) does not place any restrictions on the number of concurrent checks. You'll have to modify this value based on the system resources you have available on the machine that runs NetSaint, as it directly affects the maximum load that will be imposed on the system (processor utilization, memory, etc.). See the NetSaint documentation for more information on estimating how many concurrent checks you should allow.
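The Python sketch below shows two things. The first function is the limit test, where 0 means "no limit". The second is a rough sizing estimate, which is an assumption and not necessarily the method in the NetSaint documentation. The estimate multiplies how many checks start per second by how long each check runs on average. Both function names are hypothetical.

```python
import math


def may_start_check(running: int, max_concurrent_checks: int) -> bool:
    """0 means no limit; otherwise stay below the configured maximum."""
    return max_concurrent_checks == 0 or running < max_concurrent_checks


def estimate_max_concurrent_checks(
    total_services: int, avg_check_interval_s: float, avg_execution_time_s: float
) -> int:
    """Rough estimate: checks started per second x seconds each check runs."""
    checks_per_second = total_services / avg_check_interval_s
    return max(1, math.ceil(checks_per_second * avg_execution_time_s))
```

For example, 1200 services checked every 300 seconds on average start about 4 checks per second. If each check takes about 5 seconds, roughly 20 checks are running at any moment, which matches max_concurrent_checks=20 in the example above. Leave some headroom for slow checks.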

Service Reaper Frequency

Format: service_reaper_frequency=<frequency_in_seconds>
Example: service_reaper_frequency=10

This option allows you to control the frequency in seconds of service "reaper" events. "Reaper" events process the results from parallelized service checks that have finished executing. These events constitute the core of the monitoring logic in NetSaint.

Timing Interval Length

Format: interval_length=<seconds>
Example: interval_length=60

This is the number of seconds per "unit interval" used for timing in the scheduling queue, re-notifications, etc. "Unit intervals" are used in the host configuration file to determine how often to run a service check, how often to re-notify a contact, etc.

Important: The default value for this is set to 60, which means that a "unit value" of 1 in the host configuration file will mean 60 seconds (1 minute). I have not really tested other values for this variable, so proceed at your own risk if you decide to do so!
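The conversion is a single multiplication. This small Python sketch (the function name is hypothetical) shows how the same setting, command_check_interval=1, changes meaning when interval_length changes.

```python
def units_to_seconds(units: float, interval_length: int = 60) -> float:
    """Convert a value expressed in "unit intervals" into seconds."""
    return units * interval_length


# Illustrative values only: command_check_interval=1 under two interval lengths.
for interval_length in (60, 30):
    seconds = units_to_seconds(1, interval_length)
    print(f"interval_length={interval_length}: command_check_interval=1 -> {seconds} s")
```

With the default of 60, one unit is one minute. Halving interval_length halves every unit-based interval at once, which is why changing it is risky.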

Aggressive Host Checking Option

Format: use_agressive_host_checking=<0/1>
Example: use_agressive_host_checking=0

Beginning with release 0.0.4, NetSaint tries to be a little smarter about how and when it checks the status of hosts. In general, disabling this option will allow NetSaint to make some smarter decisions and check hosts a bit faster. Enabling this option will increase the amount of time required to check hosts, but may improve reliability a bit. If you want to know more about exactly what this option does, search the source code in the netsaint.c file for the string "use_agressive_host_checking" and read some of the comments I've added. Unless you have problems with NetSaint not recognizing that a host recovered, I would suggest not enabling this option.

Flap Detection Option

Format: enable_flap_detection=<0/1>
Example: enable_flap_detection=0

This option determines whether or not NetSaint will try to detect hosts and services that are "flapping". Flapping occurs when a host or service changes between states too frequently, resulting in a barrage of notifications being sent out. When NetSaint detects that a host or service is flapping, it will temporarily suppress notifications for that host/service until it stops flapping. Flap detection is very experimental at this point, so use this feature with caution! See the documentation on flap detection and handling for more information. Note: If you have state retention enabled, NetSaint will ignore this setting when it (re)starts and use the last known setting for this option (as stored in the state retention file), unless you disable the use_retained_program_state option. If you want to change this option when state retention is active (and the use_retained_program_state option is enabled), you'll have to use the appropriate external command or change it via the web interface.

Low Service Flap Threshold

Format: low_service_flap_threshold=<percent>
Example: low_service_flap_threshold=25.0

This option is used to set the low threshold for detection of service flapping. For more information on how flap detection and handling works (and how this option affects things), see the documentation on flap detection and handling.

High Service Flap Threshold

Format: high_service_flap_threshold=<percent>
Example: high_service_flap_threshold=50.0

This option is used to set the high threshold for detection of service flapping. For more information on how flap detection and handling works (and how this option affects things), see the documentation on flap detection and handling.

Low Host Flap Threshold

Format: low_host_flap_threshold=<percent>
Example: low_host_flap_threshold=25.0

This option is used to set the low threshold for detection of host flapping. For more information on how flap detection and handling works (and how this option affects things), see the documentation on flap detection and handling.

High Host Flap Threshold

Format: high_host_flap_threshold=<percent>
Example: high_host_flap_threshold=50.0

This option is used to set the high threshold for detection of host flapping. For more information on how flap detection and handling works (and how this option affects things), see the documentation on flap detection and handling.
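Having a low and a high threshold suggests hysteresis. In this reading, an object starts flapping when its percent state change rises above the high threshold and stops only when it falls below the low threshold, so it does not toggle right at a single boundary. The Python sketch below illustrates that idea. NetSaint's actual calculation is not described here: how many past states it keeps and whether recent changes are weighted are unknown. This unweighted version is an assumption, and the state codes (0 = OK, 2 = CRITICAL) are illustrative.

```python
from typing import Sequence


def percent_state_change(states: Sequence[int]) -> float:
    """Percentage of consecutive results that differ (unweighted sketch)."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


def update_flapping(is_flapping: bool, pct: float, low: float, high: float) -> bool:
    """Start flapping above `high`, stop below `low`, otherwise keep the old status."""
    if not is_flapping and pct > high:
        return True
    if is_flapping and pct < low:
        return False
    return is_flapping
```

With the example thresholds of 25.0 and 50.0, a service alternating on every check (100% change) starts flapping. It keeps that status while its change rate sits between 25% and 50% and stops flapping once the rate drops below 25%.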

Soft Service Dependencies Option

Format: soft_state_dependencies=<0/1>
Example: soft_state_dependencies=0

This option determines whether or not NetSaint will use soft service state information when checking service dependencies. Normally NetSaint will only use the latest hard service state when checking dependencies. If you want it to use the latest state (regardless of whether it's a soft or hard state type), enable this option.

Service Check Timeout

Format: service_check_timeout=<seconds>
Example: service_check_timeout=60

This is the maximum number of seconds that NetSaint will allow service checks to run. If checks exceed this limit, they are killed and a CRITICAL state is returned. A timeout error will also be logged.
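The kill-on-timeout behavior can be sketched with Python's subprocess module. This sketch is not NetSaint's implementation. The function name and the timeout message are hypothetical. The exit code 2 for CRITICAL follows the usual plugin convention, and the sketch keeps only the first line of plugin output.

```python
import subprocess
from typing import List, Tuple

STATE_CRITICAL = 2  # conventional plugin exit code for CRITICAL


def run_service_check(argv: List[str], timeout: float) -> Tuple[int, str]:
    """Run a plugin and return (exit code, first line of output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising TimeoutExpired,
        # so an overrunning check is terminated and reported as CRITICAL.
        return STATE_CRITICAL, f"Service check timed out after {timeout} seconds"
    first_line = proc.stdout.splitlines()[0] if proc.stdout else ""
    return proc.returncode, first_line
```

The other timeouts in this section (host checks, event handlers, notifications, OCSP and performance data commands) follow the same pattern. Only what happens after the kill differs: a DOWN host, or just a logged warning.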

Host Check Timeout

Format: host_check_timeout=<seconds>
Example: host_check_timeout=60

This is the maximum number of seconds that NetSaint will allow host checks to run. If checks exceed this limit, they are killed, a CRITICAL state is returned, and the host is assumed to be DOWN. A timeout error will also be logged.

Event Handler Timeout

Format: event_handler_timeout=<seconds>
Example: event_handler_timeout=60

This is the maximum number of seconds that NetSaint will allow event handlers to be run. If an event handler exceeds this time limit it will be killed and a warning will be logged.

Notification Timeout

Format: notification_timeout=<seconds>
Example: notification_timeout=60

This is the maximum number of seconds that NetSaint will allow notification commands to be run. If a notification command exceeds this time limit it will be killed and a warning will be logged.

Obsessive Compulsive Service Processor Timeout

Format: ocsp_timeout=<seconds>
Example: ocsp_timeout=5

This is the maximum number of seconds that NetSaint will allow an obsessive compulsive service processor command to be run. If a command exceeds this time limit it will be killed and a warning will be logged.

Performance Data Processor Command Timeout

Format: perfdata_timeout=<seconds>
Example: perfdata_timeout=5

This is the maximum number of seconds that NetSaint will allow a host performance data processor command or service performance data processor command to be run. If a command exceeds this time limit it will be killed and a warning will be logged.

Obsess Over Services Option

Format: obsess_over_services=<0/1>
Example: obsess_over_services=1

This value determines whether or not NetSaint will "obsess" over service check results and run the obsessive compulsive service processor command you define. I know - funny name, but it was all I could think of. This option is useful for performing distributed monitoring. If you're not doing distributed monitoring, don't enable this option.

Obsessive Compulsive Service Processor Command

Format: ocsp_command=<command>
Example: ocsp_command=obsessive_service_handler

This option allows you to specify a command to be run after every service check, which can be useful in distributed monitoring. This command is executed after any event handler or notification commands. The command argument is the short name of a command definition that you define in your host configuration file. The maximum amount of time that this command can run is controlled by the ocsp_timeout option. See the documentation on distributed monitoring for more information.

Performance Data Processing Option

Format: process_performance_data=<0/1>
Example: process_performance_data=1

This value determines whether or not NetSaint will process host and service check performance data by running either the host_perfdata_command or service_perfdata_command (whichever is appropriate) after every host and/or service check.

Host Performance Data Processor Command

Format: host_perfdata_command=<command>
Example: host_perfdata_command=handle-host-perfdata

This option allows you to specify a command that is to be run after every host check for the purpose of logging or handling host performance data. The command argument is the short name of a command definition that you define in your host configuration file. The maximum amount of time that this command can run is controlled by the perfdata_timeout option. See the documentation on performance data for more information.

Service Performance Data Processor Command

Format: service_perfdata_command=<command>
Example: service_perfdata_command=handle-service-perfdata

This option allows you to specify a command that is to be run after every service check for the purpose of logging or handling service performance data. The command argument is the short name of a command definition that you define in your host configuration file. The maximum amount of time that this command can run is controlled by the perfdata_timeout option. See the documentation on performance data for more information.

Orphaned Service Check Option

Format: check_for_orphaned_services=<0/1>
Example: check_for_orphaned_services=0

This option allows you to enable or disable checks for orphaned service checks. Orphaned service checks are checks which have been executed and have been removed from the event queue, but have not had any results reported in a long time. Since no results have come back for the service, it is not rescheduled in the event queue. This can cause service checks to stop being executed. Normally it is very rare for this to happen - it might happen if an external user or process killed off the process that was being used to execute a service check. If this option is enabled and NetSaint finds that results for a particular service check have not come back, it will log an error message and reschedule the service check. If you start seeing service checks that never seem to get rescheduled, enable this option and see if you notice any log messages about orphaned services.
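The Python sketch below shows the core of such a check. It records when each service check was launched, then looks for checks whose results are overdue by more than some age limit. The limit and the function name are hypothetical, because the documentation does not say how long NetSaint waits before declaring a check orphaned. In NetSaint, each orphan found would then be logged and rescheduled.

```python
from typing import Dict, List


def find_orphaned_checks(executing: Dict[str, float], now: float, max_age: float) -> List[str]:
    """Return services whose checks started more than max_age seconds ago
    and still have no result (i.e. are still listed as executing)."""
    return sorted(name for name, started in executing.items() if now - started > max_age)
```

For example, if a check for web/HTTP was launched at t=100 and one for db/PING at t=900, then at t=1000 with a 600-second limit only web/HTTP is reported as orphaned.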

Administrator Email Address

Format: admin_email=<email_address>
Example: admin_email=root

This is the email address for the administrator of the local machine (i.e. the one that NetSaint is running on). This value can be used in notification commands by using the $ADMINEMAIL$ macro.

Administrator Pager

Format: admin_pager=<pager_number_or_pager_email_gateway>
Example: admin_pager=pageroot@pagenet.com

This is the pager number (or pager email gateway) for the administrator of the local machine (i.e. the one that NetSaint is running on). The pager number/address can be used in notification commands by using the $ADMINPAGER$ macro.