I recently got turned on to OMD/check_mk, which is probably the most user-friendly tool for Nagios. It bundles a nearly complete Nagios stack, and more:
- Nagios
- Monitoring Plugins
- check_nrpe
- mrpe (a check_mk clone of nrpe)
- Icinga
- Shinken
- NagVis
- pnp4nagios
- rrdtool/rrdcached
- Check_MK
- MK Livestatus
- Multisite
- Dokuwiki
- Thruk
- Mod-Gearman
- check_logfiles
- check_oracle_health
- check_mysql_health
- jmx4perl
- check_webinject
- check_multi
It has a very simple install and works with Debian/Ubuntu/Red Hat/SUSE. I recently deleted my Icinga build and went all in with this.
As well as regular servers, VMware and other things, I'm able to graph Google Nest's temperature, humidity and "leaf" status. You can set up alerts if a reading crosses a certain threshold and keep graphs for as many years as you wish.
Check_MK with Google Nest
This is something that is sorely lacking in Google's current site since the graphing information is only stored for 10 days, and doesn't provide the granular level of detail I want.
Google Nest's graphs.
While I won't go into much detail on the install of OMD (it's pretty straightforward) or on the Nest integration (there's a great writeup on how to do that here), I will show a way to get another product, the Elertus, monitored with check_mk as an example of what you can do with it.
The Elertus is a Wi-Fi temperature, sound, light and water detector that runs on batteries.
I wrote up an article about it on this blog before. It's a little trickier to monitor than the Nest because there is no API and no way to scrape information off the device, since it only makes outbound connections.
Since my last post, last year, nothing has changed with it, nor has my opinion. It's still sluggish, with no graphing, no "all clear" alerts... and a painfully slow (and useless) app to complement it. The plus side is that it's been almost a year and the batteries still work (AA batteries). I have had no connection issues from it as far as I can tell.
Other than an alerts tab, there's not much more to the Elertus website
When I had my Icinga server, I stood up another box that basically sniffed the traffic as it was in delivery to their servers (which is still sent cleartext, btw). I cut out the bits and pieces I needed to get a basic graph up. Since I started this server, I decided to try a new method.
Using pfSense, I created a static DHCP connection for the device and an internal NAT rule to relay the traffic from the Elertus to my own web server running Apache with mod_dumpio turned on.
While this basically kills any communication with the Elertus servers, they won't be missed. With my prior setup, I was able to enjoy both my internally generated alerts and their alerts, but I found my own to be a lot more useful.
The Elertus sends out a POST to their servers as a check in.
device_type=1&posix_time=1423925134&email_id=myemail@gmail.com&mac_address=000680000000&alert_flags=&light=14&temp=298&humidity=30&battery=70&motion=0&int_contact=1&ext_contact=1&ext_temp=-1&fw_ver=4.0.1_EL_v7&debug=rssi:46, ant:I, af:, pkt:l14_t298_h30_b70_m0_i1_e1_x-1_p1423925134, wdog:1, crtry:4, queue:3, ctime:w2285_d410_n130_s205_t3040, &
Apache has a virtualhost set up to receive those incoming PHP POST requests with dumpio enabled and the trace level set to 7.
<IfModule dumpio_module>
DumpIOInput On
DumpIOOutput On
LogLevel dumpio:trace7
</IfModule>
I also made sure to set custom logs for just this module, as they will be filling up quickly.
ErrorLog ${APACHE_LOG_DIR}/dumpio_module_error.log
CustomLog ${APACHE_LOG_DIR}/dumpio_module_access.log combined
The requests come in 2 at a time, sometimes more if there is any movement/light. So, to remedy the lack of a consistent time period to check the logs, I set up a cron job to pull the newest line in the log with POST data. This runs every 5 minutes.
#!/bin/bash
tac /var/log/apache2/dumpio_module_error.log | grep -m1 "email" > /tmp/tempout.txt
~/scripts/temp.sh
~/scripts/humidity.sh
~/scripts/battery.sh
The scripts within it, in turn, pull information out of the latest POST and write the raw values to separate files. I've kept it modular so I can add more checks as I need them. I'm not too concerned about the water, movement and light alerts just yet.
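One thing to watch with the `tac | grep` line is that the redirect empties /tmp/tempout.txt even when nothing matches, for example right after the log has been rotated. A minimal sketch of a guard, assuming a hypothetical helper named `latest_checkin` that takes the log path and the output path as arguments:

```bash
#!/bin/bash
# Sketch: refresh the snapshot only when a check-in line is actually found.
# latest_checkin is a hypothetical helper name, not part of the original setup.

latest_checkin() {
    local log="$1" out="$2" tmp
    tmp=$(mktemp) || return 1
    # tac reads the log newest-first; grep -m1 stops at the first match.
    if tac "$log" | grep -m1 "email" > "$tmp"; then
        mv "$tmp" "$out"
    else
        # No POST line in this log: leave the previous snapshot untouched.
        rm -f "$tmp"
        return 1
    fi
}
```

In the cron script, the `tac` line would then become something like `latest_checkin /var/log/apache2/dumpio_module_error.log /tmp/tempout.txt`, with the per-sensor scripts called afterwards as before.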
The temperature is in Kelvin, so I converted it to Fahrenheit.
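#!/bin/bash
# Split the saved check-in on "="; field 8 is "<temp>&humidity", so strip from "&" on.
tempk=$(cat /tmp/tempout.txt | awk -F "=" '/light/ {print $8}' | sed 's/&.*//')
# Kelvin to Celsius.
tempb=$(awk "BEGIN {print "$tempk" - 273.15}")
# Celsius to Fahrenheit.
temp=$(echo ""$tempb"*1.8+32" | bc)
# Store the reading where the local check picks it up.
echo "$temp" > ~/perfdata/temp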
(I know, this can be cleaned up a lot)
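As one possible cleanup, the two conversion steps can be folded into a single awk call. This is only a sketch: `kelvin_to_fahrenheit` is a made-up helper name, and it assumes the reading is a plain number in Kelvin, like the temp=298 in the sample check-in.

```bash
#!/bin/bash
# Sketch: Kelvin to Fahrenheit in one awk call instead of awk plus bc.
# kelvin_to_fahrenheit is a hypothetical helper name.

kelvin_to_fahrenheit() {
    # Pass the reading in with -v so it is never spliced into the awk program.
    awk -v k="$1" 'BEGIN { printf "%.2f\n", (k - 273.15) * 1.8 + 32 }'
}

kelvin_to_fahrenheit 298   # the temp=298 reading from the sample check-in
```

For 298 this prints 76.73, the same figure the awk and bc steps produce together.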
The battery and humidity scripts are pretty much the same thing, with different names. Each just requires another {print $X} position. The light, movement and water sensor can be added just as easily since they are only values of 0 or 1.
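Counting field positions works, but it breaks if the firmware ever reorders the POST. A sketch of looking a value up by its key instead, assuming a hypothetical helper named `get_field` and a payload shaped like the sample check-in (key=value pairs joined with &, possibly preceded by a log prefix that contains no "="):

```bash
#!/bin/bash
# Sketch: look a field up by name instead of by its position in the line.
# get_field is a hypothetical helper name.

get_field() {
    local name="$1" line="$2"
    # Split on '&', then print the value of the first pair whose key matches.
    printf '%s\n' "$line" | tr '&' '\n' | awk -F '=' -v key="$name" '
        { k = $1; sub(/^.*[ \t]/, "", k) }   # drop any log prefix before the key
        k == key { print $2; exit }
    '
}
```

With this, `get_field humidity "$(cat /tmp/tempout.txt)"` and `get_field battery "$(cat /tmp/tempout.txt)"` would replace the per-script position numbers. Because the key has to match exactly, asking for temp does not pick up ext_temp by accident.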
Now, unlike Nagios, getting devices into check_mk is a breeze. Making custom checks with RRD graphing is just as easy. You add the custom checks to the host itself, not the monitoring server.
This web server is Ubuntu and it has check_mk installed. There is a folder where you can put custom scripts to make local checks, on top of what it already monitors.
/usr/lib/check_mk_agent/local/
The setup is pretty straightforward if you want monitoring with performance graphs.
#!/bin/bash
TEMP1=$(cat ~/perfdata/temp)
echo '<<<local>>>'
echo "P Temperature temp="$TEMP1";35:89;32:91;0;110"
Any script in this folder runs whenever check_mk runs and grabs the server's other readings.
The output is appended to the bottom of the generated output as a local check, like this:
<<<local>>>
P Temperature temp=73.13;35:89;32:91;0;110
Each part of that line has a meaning. Notice that the colons and semicolons are there for a reason.
- P tells check_mk that it's available to be added as a service, with graphing
- Temperature is the name of the service
- temp is the variable name
- 73.13 is the output from $TEMP1
- 35:89 is the WARN min:max
- 32:91 is the CRIT min:max
- 0;110 is the lower;upper bound of the value range (for graphing reasons and not required)
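Since every sensor ends up as a line of the same shape, the line can be built by one small function. This is a sketch only: `local_check_line` is a made-up helper name, and the humidity thresholds are placeholder values rather than recommendations.

```bash
#!/bin/bash
# Sketch of a local check that reports more than one reading.
# local_check_line is a hypothetical helper name; the humidity thresholds
# are placeholders.

local_check_line() {
    # Arguments: service name, variable name, value, WARN range, CRIT range,
    # graph minimum, graph maximum.
    printf 'P %s %s=%s;%s;%s;%s;%s\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

echo '<<<local>>>'
local_check_line Temperature temp 73.13 35:89 32:91 0 110
local_check_line Humidity humidity 30 20:60 15:70 0 100
```

In a real script the literal values would come from the files the cron job writes, for example `"$(cat ~/perfdata/temp)"` in place of 73.13.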
When you scan the host in WATO on the main check_mk site, the new service should show up with basic performance graphs and be automatically added to the notification ruleset the host is part of.
While it's not an ideal solution, it gets the job done. I wish Elertus would open their API up to developers; they would probably sell a lot more units if they did.
I recommend that anyone who's given up on Nagios/Icinga give OMD a try. It's pretty good. The documentation is a little poor (and mostly in German), but with Google Translate and some coffee, you can get through it if you've ever set up any other monitoring before.