
Service checks in LibreNMS


Cowboy Denny

Let's get started, step by step

LibreNMS is becoming one of my favourite monitoring tools. Setup and getting started are easy, and it has enough advanced options and tunables. I recently discovered that LibreNMS can check services as well. Services, in this context, means executing Nagios plugins (like check_http, check_ping, etc.). This lets you check things SNMP does not cover by default, like HTTP(S) health checks, certificate expiry, TCP port checks (e.g. RDP) and anything for which you can write a Nagios plugin yourself. The performance data, if available, is graphed automatically. Alerting is done with the regular LibreNMS alerts. This guide covers the setup of services (it's not enabled by default) and a few basic checks, like an HTTP health check, certificate expiry and SSH monitoring.

Nagios check plugins

For those unfamiliar with Nagios, it is a monitoring system that executes checks. These checks are scripts and programs that take input (for example, which host to check and which thresholds to use), perform a check and then return an exit code, a line of status output and, optionally, performance data. The plugins can be written in any language; Nagios only cares about the exit codes, which can be:

  • 0: OK
  • 1: WARNING
  • 2: CRITICAL
  • 3: UNKNOWN

For example, to check whether a website is working, you would use the check_http plugin. This plugin checks whether the site returns a 200 OK and, if so, exits with status 0. If not, for example because of a timeout, access denied or a 50x error, it returns status 1 or 2. Nagios can then do all kinds of alerting based on those statuses.

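Performance data is appended to the status output after a pipe character (|) as space-separated label=value pairs, each optionally followed by a unit and semicolon-separated warn/crit/min/max fields (for example time=0.198748s;;;0.000000). It can be anything, for example the time the HTTP request took.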
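As an illustration of that format, here is a minimal Python sketch that splits a plugin's output at the | and parses each space-separated label=value[UOM];warn;crit;min;max entry. It uses the check_http result line shown later in this guide. The function name parse_perfdata is hypothetical, and quoted labels containing spaces (which the plugin format allows) are not handled.

```python
import re

# label=value[UOM];warn;crit;min;max  (only label=value is mandatory)
PERF_RE = re.compile(r"^(?P<label>[^=]+)=(?P<value>-?[0-9.]+)(?P<uom>[a-zA-Z%]*)")


def parse_perfdata(output: str) -> tuple[str, dict[str, tuple[float, str]]]:
    """Split plugin output into (status text, {label: (value, uom)}).

    Simplified sketch: assumes labels contain no spaces or quotes.
    """
    status, _, perf = output.partition("|")
    metrics = {}
    for item in perf.split():  # perf entries are space separated
        m = PERF_RE.match(item)
        if m:
            metrics[m["label"]] = (float(m["value"]), m["uom"])
    return status.strip(), metrics


line = ("HTTP OK: HTTP/1.1 200 OK - 1320 bytes in 0.199 second response time "
        "|time=0.198748s;;;0.000000 size=1320B;;;0")
status, metrics = parse_perfdata(line)
print(status)   # HTTP OK: HTTP/1.1 200 OK - 1320 bytes in 0.199 second response time
print(metrics)  # {'time': (0.198748, 's'), 'size': (1320.0, 'B')}
```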

Since you can write these scripts yourself, any monitoring system that uses these plugins is very extensible: it can check anything you can write a script for. That makes the monitoring tool very powerful; you're not limited to the checks it ships with.
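To show how small a plugin can be, here is a minimal Python sketch of a TCP port check of the kind mentioned in the introduction (e.g. RDP on port 3389). It prints a single status line with performance data and returns the matching exit code. The name check_tcp_port, its arguments and the 1 s / 3 s thresholds are assumptions for illustration; the stock check_tcp plugin from the monitoring-plugins package already covers this case properly.

```python
"""Hypothetical check_tcp_port: a minimal Nagios-style TCP connect check."""
import socket
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host: str, port: int, warn: float = 1.0, crit: float = 3.0) -> tuple[int, str]:
    """Return (exit_code, output_line) for a TCP connect to host:port."""
    start = time.monotonic()
    try:
        # A refused connection, or one taking longer than `crit` seconds, is CRITICAL.
        with socket.create_connection((host, port), timeout=crit):
            pass
    except OSError as exc:
        return CRITICAL, f"TCP CRITICAL - {host}:{port} {exc}"
    elapsed = time.monotonic() - start
    # Performance data after the pipe: label=value[UOM];warn;crit;min
    perf = f"time={elapsed:.6f}s;{warn};{crit};0"
    if elapsed >= warn:
        return WARNING, f"TCP WARNING - {host}:{port} slow connect|{perf}"
    return OK, f"TCP OK - {host}:{port} open|{perf}"


def main(argv: list[str]) -> int:
    """Print one status line and return the Nagios exit code."""
    if len(argv) != 2 or not argv[1].isdigit():
        print("TCP UNKNOWN - usage: check_tcp_port HOST PORT")
        return UNKNOWN
    code, line = check_tcp(argv[0], int(argv[1]))
    print(line)
    return code
```

To use it as a real plugin, save it as an executable file such as check_tcp_port in the plugin directory, add `import sys` and end it with `sys.exit(main(sys.argv[1:]))` so the exit code reaches the caller.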

Step 1: Enabling service checks

Service checks are not enabled by default in LibreNMS. The documentation explains how to enable the module. In this guide I assume your path is /opt/librenms/. Edit your config file:

sudo nano /opt/librenms/config.php

Add the following line

$config['show_services']           = 1;

Service Auto Discovery

To automatically create services for devices with available checks, enable service discovery in /opt/librenms/config.php:

$config['discover_services']           = true;

Service Templates Auto Discovery

To automatically create services for devices with configured Service Templates, enable service template discovery in /opt/librenms/config.php:

$config['discover_services_templates']           = true;
 
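Also tell LibreNMS where the Nagios plugins live by adding the line that matches your distribution:

Debian/Ubuntu:

$config['nagios_plugins']   = "/usr/lib/nagios/plugins";

CentOS:

$config['nagios_plugins']   = "/usr/lib64/nagios/plugins";

Save the file (in nano: Ctrl+O, Enter, then Ctrl+X to exit).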

Step 2: Setup

Service checks are now distributable if you run a distributed setup. To leverage this, use the dispatcher service. Alternatively, replace check-services.php with services-wrapper.py in cron to run the checks across all polling nodes.

If you need to debug the output of services-wrapper.py, add -d to the end of the command. It is NOT recommended to do this in cron.

First, install the Nagios plugins.

Debian / Ubuntu: sudo apt install monitoring-plugins
CentOS: yum install nagios-plugins-all

On older Debian/Ubuntu releases the packages may still be named nagios-plugins and nagios-plugins-extra:

apt-get install nagios-plugins nagios-plugins-extra

 

The nagios_plugins setting from Step 1 points LibreNMS at the plugin directory. Make sure any plugins you use are executable, for example:

Debian/Ubuntu:

chmod +x /usr/lib/nagios/plugins/*

CentOS:

chmod +x /usr/lib64/nagios/plugins/*

 

Edit the LibreNMS cron job to include service checks:

sudo nano /etc/cron.d/librenms

Add:

*/5  *    * * *   librenms    /opt/librenms/services-wrapper.py 1

Step 3: Debug

Switch to the librenms user:

su - librenms

Then run the following command to help troubleshoot services:

./check-services.php -d

 

Performance data

The debug output shows, for each check, the command that was run, the parsed performance data and the RRD update. For example:

su - librenms
./check-services.php -d

    -- snip --
    Nagios Service - 26
    Request:  /usr/lib/nagios/plugins/check_icmp localhost
    Perf Data - DS: rta, Value: 0.016, UOM: ms
    Perf Data - DS: pl, Value: 0, UOM: %
    Perf Data - DS: rtmax, Value: 0.044, UOM: ms
    Perf Data - DS: rtmin, Value: 0.009, UOM: ms
    Response: OK - localhost: rta 0.016ms, lost 0%
    Service DS: {
        "rta": "ms",
        "pl": "%",
        "rtmax": "ms",
        "rtmin": "ms"
    }
    OK u:0.00 s:0.00 r:40.67
    RRD[update /opt/librenms/rrd/localhost/services-26.rrd N:0.016:0:0.044:0.009]
    -- snip --
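The RRD update at the end of that output stores one value per data source, apparently in the order the plugin printed them (rta, pl, rtmax, rtmin). A minimal Python sketch of that mapping follows. rrd_update_arg is a hypothetical helper rather than LibreNMS code, and the perf string is an assumed check_icmp output chosen to reproduce the values above (its thresholds are illustrative).

```python
import re


def rrd_update_arg(perfdata: str) -> tuple[dict[str, str], str]:
    """Build ({ds: uom}, "N:v1:v2:...") from a Nagios perfdata string.

    Hypothetical helper: values keep the order the plugin printed them,
    and "N" is rrdtool's shorthand for "now".
    """
    ds_uom: dict[str, str] = {}
    values: list[str] = []
    for item in perfdata.split():
        m = re.match(r"([^=]+)=(-?[0-9.]+)([a-zA-Z%]*)", item)
        if m:
            label, value, uom = m.groups()
            ds_uom[label] = uom
            values.append(value)  # keep the printed text, e.g. "0.016"
    return ds_uom, "N:" + ":".join(values)


# Assumed perf data matching the debug output above (thresholds illustrative).
perf = "rta=0.016ms;200.000;500.000;0; pl=0%;40;80;; rtmax=0.044ms;;;; rtmin=0.009ms;;;;"
ds, update = rrd_update_arg(perf)
print(ds)      # {'rta': 'ms', 'pl': '%', 'rtmax': 'ms', 'rtmin': 'ms'}
print(update)  # N:0.016:0:0.044:0.009
```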

Do a test to see if the plugins work:

/usr/lib/nagios/plugins/check_http -H google.com -S -p 443 

Example output:

HTTP OK: HTTP/1.1 200 OK - 1320 bytes in 0.199 second response time |time=0.198748s;;;0.000000 size=1320B;;;0 

or:

/usr/lib/nagios/plugins/check_icmp 8.8.8.8
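Both commands print a status line and exit with a Nagios exit code, which is all a poller needs from a plugin. Here is a minimal Python sketch of running a plugin and mapping its exit code to a state. run_check is a hypothetical name, this is not how LibreNMS implements it, and the 60-second timeout is an assumption.

```python
import subprocess
import sys

STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv: list[str], timeout: float = 60.0) -> tuple[str, str]:
    """Run a Nagios plugin and return (state, first line of its output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing or non-executable plugin, or a hung check: reported as UNKNOWN here.
        return "UNKNOWN", f"could not run plugin: {exc}"
    lines = proc.stdout.splitlines()
    first_line = lines[0] if lines else ""
    # Exit codes outside 0-3 are also treated as UNKNOWN.
    return STATES.get(proc.returncode, "UNKNOWN"), first_line


# Demo with a stand-in "plugin" (the Python interpreter), so it runs anywhere.
fake_plugin = [sys.executable, "-c",
               "print('DEMO WARNING - slow|time=1.2s;1;3'); raise SystemExit(1)"]
print(run_check(fake_plugin))  # ('WARNING', 'DEMO WARNING - slow|time=1.2s;1;3')
```

On a real system you would pass the plugin path and its arguments, e.g. run_check(["/usr/lib/nagios/plugins/check_icmp", "8.8.8.8"]); a missing or non-executable plugin (see the chmod step above) comes back as UNKNOWN.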

Step 4: Alerting

Services use the Nagios alerting scheme, where the exit code means:

    0 = OK
    1 = Warning
    2 = Critical

In the legacy rule syntax, an alerting rule that fires when a service is critical would look like:

    %services.service_status = "2"

 

There is a default alert rule in LibreNMS named Service up/down:

services.service_status != 0 AND macros.device_up = 1

If you want to differentiate between WARNING and CRITICAL Nagios alerts, you can create two rules:

# warning

services.service_status = 1 AND macros.device_up = 1

 

 

# critical

services.service_status = 2 AND macros.device_up = 1
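One detail worth checking before replacing the default rule: != 0 also matches status 3 (UNKNOWN in the Nagios plugin convention), which neither of the two split rules does. A small Python sketch that evaluates the three rule conditions above for each status, assuming the device is up (macros.device_up = 1); the dictionary keys are just labels.

```python
# Rule conditions from above, with macros.device_up = 1 assumed to be true.
RULES = {
    "Service up/down": lambda status: status != 0,
    "warning": lambda status: status == 1,
    "critical": lambda status: status == 2,
}


def matching_rules(status: int) -> list[str]:
    """Names of the rules that would fire for a given service_status."""
    return [name for name, cond in RULES.items() if cond(status)]


for status in (0, 1, 2, 3):
    print(status, matching_rules(status))
# 0 []
# 1 ['Service up/down', 'warning']
# 2 ['Service up/down', 'critical']
# 3 ['Service up/down']
```

So if your plugins can return 3 and you switch to the two split rules, consider a third rule for services.service_status = 3 AND macros.device_up = 1.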

 

 

Step 5: Related Polling / Discovery Options

These settings are related and should be investigated and set accordingly. The values below are examples, not defaults or recommendations.

$config['service_poller_enabled']           = true;
$config['service_poller_workers']           = 24;
$config['service_poller_frequency']           = 300;
$config['service_poller_down_retry']           = 5;
$config['service_discovery_enabled']           = true;
$config['service_discovery_workers']           = 16;
$config['service_discovery_frequency']           = 3600;
$config['service_services_enabled']           = true;
$config['service_services_workers']           = 16;
$config['service_services_frequency']           = 60;
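The worker and frequency values interact with how many checks you run. A rough sizing rule (an assumption for illustration, not LibreNMS guidance): the workers must get through every check within one interval, so workers ≥ checks × average check duration ÷ frequency. The Python sketch below does that arithmetic; the 500 checks and 2-second average are made-up numbers.

```python
import math


def min_workers(checks: int, avg_seconds: float, frequency: int) -> int:
    """Smallest worker count that completes all checks within one interval.

    Rule-of-thumb sketch: assumes checks are spread evenly over the workers.
    """
    busy_seconds = checks * avg_seconds  # total check time per interval
    return math.ceil(busy_seconds / frequency)


# Hypothetical load: 500 service checks averaging 2 s each.
print(min_workers(500, 2.0, 300))  # 4  (checks every 5 minutes)
print(min_workers(500, 2.0, 60))   # 17 (checks every minute)
```

Leave headroom above this minimum, since slow or timing-out checks push the real average up.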

 

Step 6: Service check polling logic

A service check is skipped when the associated device is not pingable, and an appropriate entry is added to the event log. A service check is still polled if its IP address parameter differs from the associated device's IP address, even when the associated device is not pingable.

To override the default logic and always poll service checks, you can disable ICMP testing for any device by switching Disable ICMP Test setting (Edit -> Misc) to ON.

Service checks will never be polled on disabled devices.
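The decision order described above can be summarised as a small Python sketch. The Device fields and should_poll are hypothetical names, not LibreNMS internals, and the event log entry for a skipped check is left out.

```python
from dataclasses import dataclass


@dataclass
class Device:
    ip: str
    disabled: bool = False
    icmp_disabled: bool = False  # "Disable ICMP Test" (Edit -> Misc)
    pingable: bool = True


def should_poll(device: Device, service_ip: str) -> bool:
    """Apply the polling rules described above to one service check."""
    if device.disabled:
        return False          # never polled on disabled devices
    if device.icmp_disabled:
        return True           # ICMP test switched off: always poll
    if service_ip != device.ip:
        return True           # check targets another address: poll anyway
    return device.pingable    # otherwise skip when the device does not answer ping


down = Device(ip="192.0.2.10", pingable=False)
print(should_poll(down, "192.0.2.10"), should_poll(down, "192.0.2.20"))  # False True
```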

 

Adding a dummy host for testing 

You must have a host in LibreNMS to be able to add service checks. Normally you would use SNMP to monitor devices, but if you just want to do simple (HTTP) checks, you can add a host without SNMP. Via Devices > Add Device, enter a hostname or IP address, uncheck the SNMP checkbox and check the Force add button.

 

If this device does not accept ICMP (ping) traffic, you can disable that as well. Go to the device, select the Cog menu, Edit, "Misc" tab, then check "Disable ICMP Test?":

 

If you do want to use SNMP, here is a quick guide for Ubuntu. First install snmpd:

apt-get install snmpd

Edit the configuration file (/etc/snmp/snmpd.conf). Remove everything and add the following:

agentAddress udp:161

createUser <username> SHA "<password>" AES "<password>"

view systemonly included .1.3.6.1.2.1.1
view systemonly included .1.3.6.1.2.1.25.1

rouser <username>

sysLocation <location>
sysContact  <your name and email>

includeAllDisks 10%

defaultMonitors         yes
linkUpDownNotifications yes

Change <username> and both <password> values to a long, secure name and passwords (8 characters minimum). rouser grants read-only access, which is all LibreNMS needs for polling. Restart snmpd:

service snmpd restart

Add a rule in your firewall to only allow access to UDP port 161 from your monitoring service and deny all other traffic.

You can now add this machine in LibreNMS using SNMPv3 and the authentication data you provided.

Configuring services in LibreNMS

In LibreNMS you should now have a new "Services" entry in the top menu.

 

 

Make sure you added a host as described above. You can navigate to a host and click the "Services" tab, then click "Add service". In the top menu bar you can also click "Services", "Add Service". You then have to select the host as well.

The type is the Nagios plugin you want to use. In our case, http (the check_ prefix is not shown).

Enter a meaningful description, for example "HTTP Check https://example.org/path/to/data".

The IP address can be the hostname or the IP. It is recommended to make this the same as the host the services are coupled to.

The "Parameters" are the Nagios check command parameters, exactly as you would pass them on the shell. For example, an HTTPS check against one specific server behind a name with several A records would be:

-E -I 192.168.88.6 -S -p 443 -u "/index.html"

  • IP Address: 192.168.88.6
  • -E: extended performance data
  • -I 192.168.88.6: the specific IP address (optional, I have multiple A records)
  • -S: use SSL
  • -p 443: use port 443
  • -u "/index.html": the URL to request (optional)
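Because the parameters are shell-style, quoting matters: the quoted "/index.html" becomes a single argument. A minimal Python sketch that uses shlex to split the parameter string and print an equivalent command you could test on the shell. plugin_command is a hypothetical helper, the plugin path assumes the Debian/Ubuntu location from Step 1, and it does not reproduce how LibreNMS itself assembles the command (for instance, whether it adds the IP address field).

```python
import shlex

PLUGIN_DIR = "/usr/lib/nagios/plugins"  # assumed Debian/Ubuntu plugin path


def plugin_command(check_type: str, params: str) -> list[str]:
    """Build argv: <plugin dir>/check_<type> followed by the split parameters."""
    return [f"{PLUGIN_DIR}/check_{check_type}", *shlex.split(params)]


argv = plugin_command("http", '-E -I 192.168.88.6 -S -p 443 -u "/index.html"')
print(argv)
# ['/usr/lib/nagios/plugins/check_http', '-E', '-I', '192.168.88.6', '-S', '-p', '443', '-u', '/index.html']
print(shlex.join(argv))
# /usr/lib/nagios/plugins/check_http -E -I 192.168.88.6 -S -p 443 -u /index.html
```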

 

All parameters can be found on the monitoring-plugins website. You can test on the shell first before you add the check to LibreNMS.

Save the dialog box and wait a few minutes for the check to run.

An SSH check is even simpler, just select SSH as the type and add the check. Here is an example of a Cisco switch where SSH is checked:

 

 

A certificate check, to get an alert when a certificate is about to expire, can also be done. The type is http and the parameters are:

--sni -S -p 443 -C 30

It will check if the certificate expires within 30 days.
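The -C 30 threshold boils down to comparing the certificate's notAfter date with today. A Python sketch of that comparison using ssl.cert_time_to_seconds, which parses the date format used in certificate notAfter fields. cert_expiry_state is hypothetical, and mapping "already expired" to CRITICAL and "under the threshold" to WARNING is an assumption about how check_http grades the result.

```python
import ssl
import time
from typing import Optional

OK, WARNING, CRITICAL = 0, 1, 2


def cert_expiry_state(not_after: str, warn_days: int = 30,
                      now: Optional[float] = None) -> tuple[int, float]:
    """Return (state, days_left) for a notAfter string like 'Jun  1 12:00:00 2030 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)  # epoch seconds, UTC
    if now is None:
        now = time.time()
    days_left = (expires - now) / 86400
    if days_left < 0:
        return CRITICAL, days_left   # already expired
    if days_left < warn_days:
        return WARNING, days_left    # expires within the threshold (cf. -C 30)
    return OK, days_left


# Fixed "now" so the example is reproducible.
now = ssl.cert_time_to_seconds("Jan  1 00:00:00 2030 GMT")
print(cert_expiry_state("Jan 21 00:00:00 2030 GMT", now=now))  # (1, 20.0)
```

With a live TLS connection, the notAfter string comes from the dictionary returned by SSLSocket.getpeercert().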

Limits

Specific alerting and rechecking when a check fails is not as configurable as in Icinga or Nagios. The check runs and alerts you on a failure. Icinga/Nagios allow you to configure escalation paths and advanced re-checking. For example, when a check fails, recheck it 4 times with an interval of X seconds (instead of the regular check interval) and only alert if it still fails.
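LibreNMS has no such re-check setting, but one possible workaround (an assumption, not a LibreNMS feature) is to wrap the real plugin in your own script that re-runs it before reporting a failure. A minimal Python sketch: recheck is a hypothetical helper that re-runs a plugin up to 4 times, X seconds apart, and only reports the failure if every attempt failed.

```python
import subprocess
import sys
import time


def recheck(argv: list[str], retries: int = 4, interval: float = 10.0) -> tuple[int, str]:
    """Run a plugin and, while it is not OK, re-run it up to `retries` more times.

    Returns the first OK result, or the exit code and output of the last attempt.
    """
    for attempt in range(retries + 1):
        proc = subprocess.run(argv, capture_output=True, text=True)
        if proc.returncode == 0:
            return 0, proc.stdout.strip()
        if attempt < retries:
            time.sleep(interval)  # pause before the next re-check
    return proc.returncode, proc.stdout.strip()


# Demo: a stand-in plugin that always fails, re-checked twice without pausing.
always_down = [sys.executable, "-c", "print('CRITICAL - down'); raise SystemExit(2)"]
print(recheck(always_down, retries=2, interval=0))  # (2, 'CRITICAL - down')
```

Keep retries × interval well below the polling interval (4 re-checks 10 s apart already add 40 s), so a failing check does not overrun the next poll.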

In Icinga you can define (service or host) groups and apply service checks to these groups. Apart from Service Templates (see the discovery option in Step 1), LibreNMS doesn't allow this, so you cannot define a check once and apply it to a group. If you need to check 100 servers, that means defining 100 checks by hand, one per server.

 

Here is an example of a dummy host (no ICMP or SNMP) with an HTTP check and alerting enabled:

https://www.monitoring-plugins.org/doc/man/check_http.html

https://docs.librenms.org/Extensions/Services/
