Monitoring solution

My home automation system has expanded rapidly over the past year. I probably have a dozen or two devices: some Wi-Fi, some RF, some Zigbee. I also have a dozen or so microservices and a few servers.

I was considering DIY'ing the monitoring, as quite a few of the checks would be bespoke statuses, such as "stale data", "device last seen", "last message on MQTT topic", "status from REST endpoint".

However the guts of it is just a network monitoring tool with a few exotic plugins/scripts.
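To give an idea of what one of those bespoke scripts might look like, here is a minimal sketch of a "stale data" / "device last seen" check. It borrows the 0/1/2/3 (OK/WARNING/CRITICAL/UNKNOWN) status convention used by Nagios-style plugins. The function name, parameters and thresholds are made up for illustration.

```python
from enum import IntEnum


class Status(IntEnum):
    # Same numbering as the exit codes of Nagios-style check plugins.
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def check_freshness(last_seen, now, warn_after, crit_after):
    """Classify a data source by the age (in seconds) of its last update.

    last_seen and now are Unix timestamps; last_seen is None if the
    source has never reported. All names here are hypothetical.
    """
    if last_seen is None:
        return Status.UNKNOWN
    age = now - last_seen
    if age >= crit_after:
        return Status.CRITICAL
    if age >= warn_after:
        return Status.WARNING
    return Status.OK
```

The same function would cover "last message on MQTT topic" if whatever subscribes to the topic records the arrival time of each message and passes it in as last_seen.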

I looked at Nagios as it's famed for being customisable. However, it seems dated and reminds me of the Cacti days: everything is manually configured. All the good stuff that actually makes it useful in enterprise is payware add-ons.

I looked at OpenNMS, but got lost.. no bored.. in the complexity.

Can anyone make any other suggestions for pluggable network monitoring applications or should I persevere with the two above? Ideally, "FREE[tm]"
 
Thanks. While I ponder and procrastinate on that, I decided to start with a "sensor", if you will.

Using Python, I:
* Opened the DHCP server's leases file and scanned it for MAC addresses.
* For every MAC address, queried the DHCP server's OMAPI interface to get the full latest lease information, including: ip-address, lease state, expiry time, client-hostname, ddns-fwd-name (name forced on the device).
* For every MAC address, did a vendor lookup and stored it.
* For every valid lease, took the IP address and spawned a thread to ping it once with a 1 second timeout, storing the ping RTT, packet loss and status.
* Waited on all the pings completing, max 1 second.
* Converted all the data points into InfluxDB line protocol format.
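A rough sketch of the parsing, ping and formatting steps above, not the actual script. It assumes the leases file is in ISC dhcpd format (with `hardware ethernet` lines) and that `ping` takes Linux iputils flags. The OMAPI query and vendor lookup are left out. The measurement name `dhcp_ping` and the tag and field names are made up for illustration.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumes ISC dhcpd.leases syntax: "hardware ethernet aa:bb:cc:dd:ee:ff;"
MAC_RE = re.compile(r"hardware ethernet ([0-9a-fA-F:]{17});")


def macs_from_leases(text):
    """Return the unique MAC addresses in the leases text, in first-seen order."""
    seen = {}
    for mac in MAC_RE.findall(text):
        seen.setdefault(mac.lower(), None)
    return list(seen)


def ping_once(ip, timeout_s=1):
    """Ping ip once; return (is_up, rtt_ms or None). Linux iputils flags assumed."""
    try:
        proc = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), ip],
            capture_output=True, text=True, timeout=timeout_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False, None
    match = re.search(r"time[=<]([\d.]+) ?ms", proc.stdout)
    return proc.returncode == 0, (float(match.group(1)) if match else None)


def ping_all(ips, timeout_s=1):
    """Ping every address in the list concurrently; return {ip: (is_up, rtt_ms)}."""
    with ThreadPoolExecutor(max_workers=64) as pool:
        return dict(zip(ips, pool.map(lambda ip: ping_once(ip, timeout_s), ips)))


def escape_tag(value):
    """Backslash-escape commas, equals signs and spaces in a tag value."""
    return re.sub(r"([,= ])", r"\\\1", value)


def to_line(mac, ip, is_up, rtt_ms, ts_ns):
    """Format one ping result as a single line of InfluxDB line protocol."""
    tags = f"mac={escape_tag(mac)},ip={escape_tag(ip)}"
    fields = [
        f"up={'true' if is_up else 'false'}",
        f"packet_loss={0 if is_up else 100}i",  # one ping, so 0% or 100%
    ]
    if rtt_ms is not None:
        fields.append(f"rtt_ms={float(rtt_ms)}")
    return f"dhcp_ping,{tags} {','.join(fields)} {ts_ns}"
```

Printing one `to_line(...)` result per device to stdout is enough for a collector that reads line protocol from a script's output.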

Using Telegraf, I can then run that script and publish the data to Influx, which I can then find a way to visualize in Grafana.

That only provides me with a low-layer overview of what is on the network segments that got their config from DHCP. It does not cover the static IP addresses. I could list those manually, or I could run a periodic subnet sweep with nmap or ping/arping to discover "rogues". A rogue is most likely a device I completely forgot I had running somewhere, with a static IP I didn't know I had.
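Once a sweep has produced a list of responding addresses, spotting rogues is just a set difference. A minimal sketch, assuming IPv4 only and that the sweep results, DHCP leases and known statics are already available as lists of address strings (the function and parameter names are hypothetical):

```python
import ipaddress


def find_rogues(responding, dhcp_leased, known_static):
    """Return responding addresses that are neither leased nor listed as static.

    All three arguments are iterables of IPv4 address strings. The result
    is sorted numerically rather than as text.
    """
    known = {ipaddress.ip_address(a) for a in (*dhcp_leased, *known_static)}
    seen = {ipaddress.ip_address(a) for a in responding}
    return [str(a) for a in sorted(seen - known)]
```

For example, `find_rogues(["192.168.1.10", "192.168.1.200"], ["192.168.1.10"], ["192.168.1.1"])` would flag only `192.168.1.200`.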

I also need to plumb this into a monitoring and alerting system to set rules on, say, the presence of a MAC address with a successful ping at least once in the last 5 minutes, and otherwise send an SMS or flash the house lights.
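The rule itself is simple to state in code. A sketch, assuming the ping history is available as a dict of MAC address to (timestamp, ping_ok) samples. That structure and the function names are made up for illustration, and the SMS or lights action is left to whatever consumes the returned list.

```python
from datetime import datetime, timedelta, timezone


def is_present(samples, now, window=timedelta(minutes=5)):
    """True if any sample is a successful ping within `window` before `now`.

    samples is an iterable of (timestamp, ping_ok) pairs.
    """
    cutoff = now - window
    return any(ok and cutoff <= ts <= now for ts, ok in samples)


def missing_devices(history, now, window=timedelta(minutes=5)):
    """Return the sorted MACs in `history` with no successful ping in the window."""
    return sorted(
        mac for mac, samples in history.items()
        if not is_present(samples, now, window)
    )
```

Anything returned by `missing_devices(history, datetime.now(timezone.utc))` would be a candidate for an alert.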
 
Both PRTG and Observium appear to be jack of all trades fully encased, encapsulated, locked in, cripple-ware, proprietary efforts. No thanks.

100 sensors? Wuhahaha! I currently track about 6000 time series. Granted, quite a number of those I don't need and should weed out, like network stats for all the Docker bridge networks, etc.

The thing is I have a data logging and graphing system, so I'm all good on metrics and displaying them graphically.

It's the thresholds, limits and diagnostic tests that I'm more interested in. I know the above include that, but they also appear to be the database, the agent, the UI, the API and the graph engine all at once. I just want the alerting system.
 