Server Monitoring Software

Xenoxide · 1 Oct 2011 at 21:35

Hi guys,

We have a small cluster of machines at work that will be deployed soon, and I'm looking to get some opinions on what to use for server monitoring.

There are around 20 Linux machines in total, and we'll need to see things like CPU/memory usage, disk usage, load average, network throughput, etc. We'll need to be able to generate graphs and reports, but also look at the metrics in real time. It would also be handy if there was an API we could use to build our own apps to display the data (i.e. on a large screen in the office).
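
To make the kind of data in question concrete, here is a minimal sketch of a per-host snapshot that a home-grown display app might consume. It uses only the Python standard library; the function name `collect_metrics` and the JSON layout are assumptions made for illustration, not the output format of any tool mentioned in this thread.

```python
import json
import os
import shutil


def collect_metrics(path="/"):
    """Return a small snapshot of host metrics as a plain dict.

    Hypothetical layout for illustration only; real monitoring tools
    expose far more (CPU, memory, network) through their own APIs.
    """
    load1, load5, load15 = os.getloadavg()  # available on Unix-like systems
    disk = shutil.disk_usage(path)
    used_percent = round(100 * disk.used / disk.total, 1) if disk.total else 0.0
    return {
        "load_average": {"1m": load1, "5m": load5, "15m": load15},
        "disk": {
            "path": path,
            "total_bytes": disk.total,
            "used_bytes": disk.used,
            "used_percent": used_percent,
        },
    }


if __name__ == "__main__":
    # Print the snapshot as JSON, e.g. for a wallboard page to poll.
    print(json.dumps(collect_metrics(), indent=2))
```

A dedicated monitoring tool would add history, graphs and alerting on top of this; the sketch only shows the raw numbers.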

I've previously used GFI Server Monitor for some basic tasks, but I don't think it will fulfil most of my needs now. Does anyone have any recommendations?

Cheers.

Rich · 2 Oct 2011 at 00:55

PRTG fits the bill. SNMP/WMI/Netflow based monitoring, REST based API, Reports, alerts and easy to use.

SMN · 2 Oct 2011 at 10:45

Nagios all the way.

PistolPete · 2 Oct 2011 at 11:06

SMN said:
Nagios all the way.

Can be a PITA to set up, but once that's done it's spot on.

Hellsmk2 · 2 Oct 2011 at 11:17

SolarWinds NPM - much easier to set up than Nagios.

Gaz1988 · 2 Oct 2011 at 15:02

I can tell you what not to use: Foglight - biggest PITA ever!

smargh · 2 Oct 2011 at 15:25

Hellsmk2 said:
SolarWinds NPM - much easier to set up than Nagios.

I'm the NPM (SLX) admin at my place. It's expensive, and a pain in the a*** to do some stuff reliably. For example, if you go onto a Windows server and rename a particular volume, then NPM will lose track of it until you re-add it. Same for if a SAN volume is changed. Very annoying, and needs a dedicated alert to find the now 'unknown' volumes.

The custom poller & alerting system is also difficult to manage for lots of nodes. For example, you can tell that a value exists in a returned table column, but you can't easily get the text of that row to put into the alert email, unless you want to do a lot of awkward custom SQL which is almost impossible to debug.

However, I suppose it does work well. SolarWinds just has issues getting the basics right for bigger enterprises and seems to like developing more unrefined features.

It also takes a LOT of time and patience to convince their tech support staff that they have a bug in their product, which I've had to do twice. I've given up trying to get them to fix the problem with unknown renamed Windows volumes.

Xez · 2 Oct 2011 at 15:38

Rich said:
PRTG fits the bill. SNMP/WMI/Netflow based monitoring, REST based API, Reports, alerts and easy to use.

Also use this, nice bit of kit.

Seaniboy · 2 Oct 2011 at 21:12

We use - http://www.opsview.com/node?currency=GBP

ShadowMan · 2 Oct 2011 at 22:50

Using Zenoss in the office. Open source with no ads, running on an Ubuntu server box. Not bad to set up and seems to work well.

Didn't have the budget for SolarWinds or would have gone for that after running the trial.

Xenoxide · 3 Oct 2011 at 00:23

Cheers for all the suggestions!

PRTG looks like my favourite option so far, but the open-source aspect of Zenoss is very alluring. I think I'll be giving PRTG, Nagios, and Zenoss a shot tomorrow.

kg648 · 3 Oct 2011 at 07:20

ShadowMan said:
Using Zenoss in the office. Open source with no ads, running on an Ubuntu server box. Not bad to set up and seems to work well.

Didn't have the budget for SolarWinds or would have gone for that after running the trial.

I tried Zenoss, but for the love of god I could not get any plugins working.

Hulkster · 3 Oct 2011 at 12:20

I am a fan of Nagios.

If you are just having a dabble and want to avoid excessive setup, take a look at FAN - Fully Automated Nagios...

http://fannagioscd.sourceforge.net/

I've had Nagios on various OSes but find Ubuntu Server to be the best documented host in the community (purely based on what I have found).

If you are into fiddling with things, be prepared to lose hours to Nagios cfg files.

I have it monitoring all sorts of stuff, and have had quite a lot of fun (yes, sad) customising scripts to work with different devices.

One thing I would recommend is getting up to speed with SNMP if you are not already - there is so much you can do with it. You will want the net-snmp packages on whatever platform you use, and also have a play with the snmpwalk command to see what devices can tell you.
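
As a rough illustration of working with snmpwalk from a script, the sketch below parses net-snmp's default `OID = TYPE: value` output into tuples. The helper names `parse_snmpwalk` and `walk`, the `public` community string and the `system` subtree are assumptions for the example; `walk` needs the net-snmp tools installed and an SNMP v2c agent reachable on the target host.

```python
import subprocess


def parse_snmpwalk(text):
    """Parse 'OID = TYPE: value' lines into (oid, type, value) tuples.

    Lines without ' = ' (e.g. wrapped multi-line values) are skipped, and
    values with no 'TYPE: ' prefix get an empty type. This is a simple
    sketch, not a full parser for every net-snmp output format.
    """
    rows = []
    for line in text.splitlines():
        oid, sep, rest = line.partition(" = ")
        if not sep:
            continue
        type_name, type_sep, value = rest.partition(": ")
        if not type_sep:
            type_name, value = "", rest
        rows.append((oid.strip(), type_name.strip(), value.strip()))
    return rows


def walk(host, community="public", oid="system"):
    """Run snmpwalk against a host and parse the result (assumed defaults)."""
    result = subprocess.run(
        ["snmpwalk", "-v2c", "-c", community, host, oid],
        capture_output=True, text=True, check=True,
    )
    return parse_snmpwalk(result.stdout)


# Example with canned output, so no SNMP agent is needed to try the parser.
SAMPLE = (
    "SNMPv2-MIB::sysName.0 = STRING: web01\n"
    "IF-MIB::ifInOctets.2 = Counter32: 123456\n"
)
for row in parse_snmpwalk(SAMPLE):
    print(row)
```

Feeding parsed values like the interface counters into a Nagios check script is one possible use, though the plugins that ship with most tools already cover the common cases.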

Rich · 3 Oct 2011 at 19:09

ShadowMan said:
Using Zenoss in the office. Open source with no ads, running on an Ubuntu server box. Not bad to set up and seems to work well.

Didn't have the budget for SolarWinds or would have gone for that after running the trial.

Used this also. It's OK. The Google Maps functionality is unreliable, to say the least, when you get to 200+ nodes. Everything else seems to work quite well. We were probably asking a little much of it, to be fair.

ChrisB · 3 Oct 2011 at 19:17

Icinga. I recommend you take a look at the demo.

We've also integrated NagVis and PNP4Nagios.

volkan · 4 Oct 2011 at 11:20

We use Zabbix to monitor some 150+ nodes. That's routers, switches, power bars, servers and VMs.

It's open source and easy to get running. I will admit it does take a little while to create the templates you want; however, once you have those set up with the alerts, graphs and triggers you need, it works perfectly.

http://www.zabbix.com/

#Chri5# · 4 Oct 2011 at 12:00

Do any of the tools mentioned above have a strong but simple wallboard display?

We currently use IPMonitor. It's very good for monitoring public-facing kit (routers, firewalls, mail servers etc.) but it doesn't have any remote agents / nodes which can pass information via HTTPS. Hence to get information from other networks you need VPN tunnels, which become a royal pain.

The best thing about IPM is the wallboard - very simple but spot on for alerting you to problems at a specific site / customer, which you can then drill into as needed. The groups with problems automatically get flowed to the top of the screen, so even the most non-technical person in the office can instantly gauge if there are any problems.

A lot of the wallboards seem to cram in far too much information such as performance metrics.
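
The "groups with problems flow to the top" behaviour described above can be sketched in a few lines. This is only an illustration of the ordering idea, not how IPMonitor implements it; the status names, the `SEVERITY` ranking and the function `wallboard_order` are all assumptions.

```python
# Hypothetical status names, ranked so that a higher number is worse.
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


def wallboard_order(groups):
    """Order groups for display, worst status first.

    groups maps a group name (site or customer) to a list of check statuses.
    Returns a list of (name, worst_status); ties are sorted by name.
    """
    summary = []
    for name, statuses in groups.items():
        # A group with no checks is treated as "ok" in this sketch.
        worst = max(statuses, key=SEVERITY.__getitem__, default="ok")
        summary.append((name, worst))
    summary.sort(key=lambda item: (-SEVERITY[item[1]], item[0]))
    return summary


groups = {
    "Site A": ["ok", "ok"],
    "Site B": ["ok", "critical"],
    "Site C": ["warning"],
}
for name, worst in wallboard_order(groups):
    print(f"{worst.upper():9} {name}")
```

Showing only one worst status per group keeps the board readable, and the per-check detail stays behind the drill-down.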

RichIbizaSport · 4 Oct 2011 at 14:32

+1 for Zabbix.

We use it for monitoring our machines/networks/devices. Very customisable, but fairly difficult to set up. Though if you're coming from a Linux background it should be no problem.

The dashboard can be set to be nice and clear with triggers/alerts of different priority. We have it set up to email appropriate people or sms the out of hours team with certain triggers.

Monitors absolutely everything we need: Windows event logs, any Windows counter, CPU, RAM, disk space, Linux hosts, SNMP, and website scenarios, e.g. log in to a website and monitor response times, download speed etc.

Can set screens for seeing live or historic data.

Hulkster · 6 Oct 2011 at 09:43

Do you have agentless monitoring of event logs etc?

One great thing about PRTG is the super simple out-of-the-box WMI support.

I use NSClient++ to report Windows server stats to Nagios but have not seen how to monitor event logs with this.

I had a good go with Zabbix but just found the interface really clunky. Nagios is not the easiest tool to set up but once you have the various config files with commands/services defined, it's trivial to add new hosts.
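
To show why adding hosts becomes trivial once the templates exist, here is a small sketch that generates a Nagios host definition. The `define host { ... }` object syntax is standard Nagios; the `linux-server` template name comes from the sample configuration and is assumed to exist in your setup, and the helper `nagios_host`, the host name and the address are made up for the example.

```python
def nagios_host(host_name, address, alias=None, template="linux-server"):
    """Return a Nagios host object definition as text.

    The template is assumed to be defined elsewhere in the config and to
    carry the check commands, intervals and contacts, so each new host
    only needs a name and an address.
    """
    lines = [
        "define host {",
        f"    use        {template}",
        f"    host_name  {host_name}",
        f"    alias      {alias or host_name}",
        f"    address    {address}",
        "}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical host; paste the output into a .cfg file that Nagios loads.
print(nagios_host("web01", "192.0.2.10"))
```

Services work the same way: a `define service` block pointing at an existing command and service template is usually all a new host needs.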

RichIbizaSport · 7 Oct 2011 at 09:36

Hulkster said:
Do you have agentless monitoring of event logs etc?

One great thing about PRTG is the super simple out-of-the-box WMI support.

I use NSClient++ to report Windows server stats to Nagios but have not seen how to monitor event logs with this.

I had a good go with Zabbix but just found the interface really clunky. Nagios is not the easiest tool to set up but once you have the various config files with commands/services defined, it's trivial to add new hosts.

Unfortunately you do need an agent on there for event log monitoring. Though the footprint is tiny so it hasn't been a problem for us. I think the only agentless operation is ping.