Traditional network monitoring systems tend to get their monitoring data via SNMP. On Linux, snmpd is usually the system application responsible for providing that data. The data consists mostly, but not entirely, of metrics: it also contains strings such as the operating system version, administrative contacts and network interface names.
It is possible to make snmpd run custom scripts and programs to gather more data from the system. The options are described in the snmpd.conf man page, but here are the most important ones for this use-case:
- extend: run a program without a shell to provide the data
- extend-sh: run a program in a shell to provide the data
- pass: make a program responsible for providing data for an OID subtree: the script is called once for each get and getnext request
- pass_persist: make a program responsible for providing data for an OID subtree: the script is called once and data is collected by "talking" to the program via stdin/stdout
The easiest way to extend snmpd is to use extend or extend-sh. The results will get placed in a predefined OID subtree. Here is a trivial example:
extend test1 /bin/echo Hello, world!
The configuration of this extension can be queried from NET-SNMP-EXTEND-MIB::nsExtendConfigTable, as long as you have snmp-mibs-downloader or an equivalent package installed:
$ snmpwalk -Os -v 3 -u monitor -l authPriv -a sha -A <pass> -x AES -X <pass> localhost NET-SNMP-EXTEND-MIB::nsExtendConfigTable
nsExtendCommand."test1" = STRING: /bin/echo
nsExtendArgs."test1" = STRING: Hello, world!
nsExtendInput."test1" = STRING:
nsExtendCacheTime."test1" = INTEGER: 5
nsExtendExecType."test1" = INTEGER: exec(1)
nsExtendRunType."test1" = INTEGER: run-on-read(1)
nsExtendStorage."test1" = INTEGER: permanent(4)
nsExtendStatus."test1" = INTEGER: active(1)
The output from the extend command is available in two additional tables:
$ snmpwalk -Os -v 3 -u monitor -l authPriv -a sha -A <pass> -x AES -X <pass> localhost NET-SNMP-EXTEND-MIB::nsExtendOutput1Table
nsExtendOutput1Line."test1" = STRING: Hello, world!
nsExtendOutputFull."test1" = STRING: Hello, world!
nsExtendOutNumLines."test1" = INTEGER: 1
nsExtendResult."test1" = INTEGER: 0
$ snmpwalk -Os -v 3 -u monitor -l authPriv -a sha -A <pass> -x AES -X <pass> localhost NET-SNMP-EXTEND-MIB::nsExtendOutput2Table
nsExtendOutLine."test1".1 = STRING: Hello, world!
For details on the content of these tables please refer to the snmpd.conf man page.
While extend and extend-sh can be useful, they seem to always output STRING data. For example, we could have this line in snmpd.conf:
extend-sh postfix-up /bin/systemctl is-active postfix > /dev/null && echo 2 || echo 0
This would work just fine in the sense that we get a 2 if postfix is up, and 0 if postfix is down. However, the data type for those numbers would be a STRING. With some (most?) monitoring systems that is a problem when trying to create alerts based on the data: you just can't compare an INTEGER (at the NMS side) to a STRING (at the snmpd side) and expect to get meaningful results.
It seems that the only way to define a data type for the data is to use the more complex pass or pass_persist directives instead. An example of pass:
pass .1.3.9950.1.1 /etc/snmp/postfix-service.sh
In this case the script (postfix-service.sh) is given responsibility for the OID subtree .1.3.9950.1.1, which is within the private OID range. This means that the program you've defined is expected to be able to handle iterating over all the OID objects in that subtree (see the snmpd.conf man page). In other words, the script should implement an SNMP getnext function. It is possible, however, to have a pass program that returns just one static OID object. The above postfix-service.sh script is a trivial example of that:
#!/bin/sh -f
echo .1.3.9950.1.1
echo integer
/bin/systemctl is-active postfix > /dev/null && echo 2 || echo 0
The first line of output is the OID object, the second line is the data type, and the third line is the value: 2 if postfix is running and 0 if postfix is down. We can query the object with snmpget:
$ snmpget -Os -v 3 -u monitor -l authPriv -a sha -A <pass> -x AES -X <pass> localhost .1.3.9950.1.1
.1.3.9950.1.1 = INTEGER: 2
As we can see, the data type is INTEGER and the value is 2 (postfix is up). The INTEGER data type allows adding alert rules based on the value in the network monitoring system. For a full list of data types, see the pass section in the snmpd.conf man page.
Even though we are able to snmpget the OID, we cannot walk the .1.3.9950.1 subtree because we have not implemented an SNMP getnext function in the script:
$ snmpwalk -Os -v 3 -u monitor -l authPriv -a sha -A <pass> -x AES -X <pass> localhost .1.3.9950.1
iso.3.9950.1.1 = INTEGER: 2
iso.3.9950.1.1 = INTEGER: 2
Error: OID not increasing: iso.3.9950.1.1 >= iso.3.9950.1.1
As can be seen, snmpwalk runs the script twice (as described in the snmpd.conf man page). As the script always outputs the same OID, snmpwalk realizes that the OIDs are not increasing and stops after the second entry.
To make snmpwalk work, a more complex script is required. Basically you need to track what the current OID is, so that you can give snmpd the next one when it asks. You also need to ensure that the OIDs you give snmpd are always growing, so they need to be sorted numerically. To get an idea of what's involved, look at net-snmp-systemd-service-status.
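As a rough illustration of the getnext handling described above, here is a minimal sketch of a pass script that serves two objects. According to the snmpd.conf man page, snmpd calls a pass program as "PROG -g OID" for get and "PROG -n OID" for getnext, and the program should print nothing when it has no answer. The sub-OIDs .1 and .2, the sshd entry and all function names are assumptions made up for this example; they are not taken from net-snmp-systemd-service-status.

```shell
#!/bin/sh
# Sketch of a pass script with get and getnext support for two objects.
# Assumption: objects live at $BASE.1 and $BASE.2 (hypothetical layout).
BASE=.1.3.9950.1.1

# Hypothetical helper: print 2 if the service is active, 0 otherwise.
service_state() {
    if /bin/systemctl is-active "$1" > /dev/null 2>&1; then
        echo 2
    else
        echo 0
    fi
}

# Print the three-line response (OID, type, value) for an exact OID.
# Prints nothing for unknown OIDs, which snmpd treats as "no such object".
print_object() {
    case "$1" in
        "$BASE.1") printf '%s\ninteger\n%s\n' "$1" "$(service_state postfix)" ;;
        "$BASE.2") printf '%s\ninteger\n%s\n' "$1" "$(service_state sshd)" ;;
    esac
}

# Print the OID that follows the requested one, or nothing at the end
# of the subtree. The OIDs are listed in increasing order.
next_oid() {
    case "$1" in
        "$BASE")   echo "$BASE.1" ;;
        "$BASE.1") echo "$BASE.2" ;;
    esac
}

# Dispatch on the flag snmpd passes: -g for get, -n for getnext.
handle_request() {
    case "$1" in
        -g) print_object "$2" ;;
        -n) next=$(next_oid "$2")
            [ -n "$next" ] && print_object "$next" ;;
    esac
    return 0
}

handle_request "$@"
```

Because the last object has no successor, the script prints nothing for a getnext on it, which should let snmpwalk finish instead of reporting that the OIDs are not increasing. A hard-coded table like this only works for a handful of objects; a larger subtree needs a real "first OID greater than the requested one" lookup.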
If your OID subtree will be more than a few entries in size, I recommend writing a pass_persist script instead of a pass script. This is because snmpd launches a pass_persist script once and "talks" to it to get all the data it needs. A pass script, on the other hand, is launched once per request, which means lots of overhead when snmpd walks through the data. In net-snmp-systemd-service-status I was able to reduce the time it took to query the status of ~60 system services from 3.3 seconds to 0.15 seconds. Also, with pass_persist, keeping track of the current OID (required for "getnext") is much easier, as you can keep its state in memory instead of having to persist it on disk, for example. There is a caveat, though: snmpd leaves the pass_persist script running. So you either need to make the script exit after every get and after walking through the OID subtree with getnext, or make it refresh its data automatically. Otherwise snmpd will keep getting the old data the script cached initially.
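The conversation between snmpd and a pass_persist script could be sketched as the loop below. Per the snmpd.conf man page, snmpd sends "PING" and expects "PONG", then sends a command line (get or getnext) followed by an OID line, and expects either the usual three-line answer or "NONE". The object layout ($BASE.1, $BASE.2), the sshd entry and the function names are assumptions for illustration only, and set requests are not handled. This sketch sidesteps the stale-data caveat by querying systemd on every request rather than caching.

```shell
#!/bin/sh
# Sketch of a pass_persist script: one long-running process that answers
# requests arriving on stdin. Object layout is hypothetical.
BASE=.1.3.9950.1.1

# Hypothetical helper: print 2 if the service is active, 0 otherwise.
service_state() {
    if /bin/systemctl is-active "$1" > /dev/null 2>&1; then
        echo 2
    else
        echo 0
    fi
}

# Print the OID that follows the requested one, or nothing at the end.
next_oid() {
    case "$1" in
        "$BASE")   echo "$BASE.1" ;;
        "$BASE.1") echo "$BASE.2" ;;
    esac
}

# Answer with OID, type and value, or NONE when there is no such object.
# The state is looked up on every call, so no old data is served.
respond() {
    case "$1" in
        "$BASE.1") printf '%s\ninteger\n%s\n' "$1" "$(service_state postfix)" ;;
        "$BASE.2") printf '%s\ninteger\n%s\n' "$1" "$(service_state sshd)" ;;
        *)         echo NONE ;;
    esac
}

# Main loop: read a command line, then (for get/getnext) an OID line.
serve() {
    while IFS= read -r cmd; do
        case "$cmd" in
            PING)    echo PONG ;;
            get)     IFS= read -r oid || break; respond "$oid" ;;
            getnext) IFS= read -r oid || break; respond "$(next_oid "$oid")" ;;
        esac
    done
}

# Demo: feed the loop a canned conversation instead of waiting for snmpd.
# In a deployed script the last line would be a bare call to serve.
printf 'PING\ngetnext\n%s\n' "$BASE" | serve
```

The demo line prints PONG followed by the three-line answer for the first object. Calling systemctl on every request trades some speed for freshness; a script serving many objects might instead refresh an in-memory table when it is older than a few seconds.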
You can create OIDs dynamically by, for example, converting the characters in a human-readable name into their corresponding ASCII values and separating the values with dots. This produces a reproducible two-way translation between the OID and the human-readable name. However, when using this approach, you need to make sure that you present the OIDs to snmpd in OID-sorted order, or it will complain about OIDs not incrementing properly. Fortunately there are one-liners to do the OID sorting, for example here.
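The name-to-OID translation and the sorting step could look roughly like the sketch below. The function names (name_to_suboid, suboid_to_name, sort_oids) and the service names in the demo are invented for this example, and the sort shown is just one possible approach: it pads every OID component to a fixed width so that a plain byte-wise sort yields OID order. It assumes OIDs are written with a leading dot, as elsewhere in this article.

```shell
#!/bin/sh
# Sketch: build OIDs from names and sort them in OID order.
BASE=.1.3.9950.1.1

# Convert a name into dot-separated decimal byte values.
name_to_suboid() {
    printf '%s' "$1" | od -An -tu1 | tr -s ' \n' '..' | sed 's/^\.//; s/\.$//'
}

# Reverse translation: dot-separated byte values back into a name.
suboid_to_name() {
    for n in $(echo "$1" | tr '.' ' '); do
        # %b interprets the \0ddd octal escape built from the decimal value.
        printf '%b' "\\0$(printf '%o' "$n")"
    done
    echo
}

# Sort OIDs (one per line, leading dot) component by component.
# Each component is zero-padded to 10 digits to form a sort key, the key
# is placed in front of the original OID, and it is cut off after sorting.
sort_oids() {
    awk -F. '{
        key = ""
        for (i = 2; i <= NF; i++) key = key sprintf("%010d.", $i)
        print key "\t" $0
    }' | LC_ALL=C sort | cut -f2
}

# Demo with hypothetical service names: print their OIDs in sorted order.
for svc in sshd postfix cron; do
    echo "$BASE.$(name_to_suboid "$svc")"
done | sort_oids
```

For instance, name_to_suboid postfix yields 112.111.115.116.102.105.120, and suboid_to_name turns that back into postfix. Note that OID order compares components numerically from left to right, so it generally matches byte-wise ordering of the names rather than any locale-aware alphabetical order.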
Here are some external links that can be useful:
- net-snmp-systemd-service-status: a pass_persist script that can return the status of all systemd services using a dynamically created OID subtree.
- Tut:Extending snmpd using shell scripts
- snmpd.conf man-page
- Net-snmp README files
- Net-snmp extensions
- Net-snmp FAQ
- Object identifier
- Is there reserved OID space for internal enterprise CAs?
- OID maximum length
- Writing your own MIBs