Server temperature monitoring

So I came into work this morning and the aircon had broken in our server room. It was like an oven in there and my boss was already shutting down servers. Two of the servers were making a stupidly loud screeching noise.

I rang around the offices to let people know the systems would be down for an hour or so. Everything was back up and running once the aircon fuse was replaced and the room had cooled down. Temperatures had reached 35°C, when they are usually around 22°C.

One of the two screeching servers hosts BES and our anti-virus. It took longer to come back (about 3 hours), and in that time no one could get email on their BlackBerrys.

Needless to say, the directors and everyone else weren't too happy about the downtime, but it did feel a bit like 'bring a game day' at school!

Overall we only lost one DL380, and that was a test Terminal Services server that was on its last legs.

What server temperature monitoring software do you use that warns you when the temperature rises quickly? Free is better, but I don't mind paying for a good product.

Edit: we mostly have DL380s and DL360s, plus one DL160 and a PowerEdge 2950.
 
Do you use HP servers exclusively? I'm fairly sure Insight Manager does temperature monitoring. Alternatively, I would be looking at a proper network-attached sensor for the server room itself; most of them will do email/SNMP/SMS alerting for when the cack hits the fan.
 
An SNMP-based monitoring solution (we use Mutiny) can do that; alternatively, a standalone room environmental monitor.
 
I'd advise going down the room climate monitor route, as suggested by #Chri5#. It provides all the info you could need and good alerting.
 
As already mentioned, SNMP is very good. You can use an open-source monitoring tool called OpenNMS for this (free).

Other ideas:

1. A LAN-based temperature sensor, as mentioned by someone else

2. APC UPSes have this built in and will email you if the temperature threshold is exceeded

3. Nagios can do this via SNMP, or via scripts/tools in the OS

4. Most servers nowadays have thermal shutdown
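For option 3, a home-grown Nagios check could look something like the sketch below. It follows the standard Nagios plugin convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, one line of output with optional performance data after a `|`), and it also answers the "fast increase" part of the original question by warning on the rate of rise, not just the absolute temperature. The function name, the `(seconds, celsius)` input format and the 27°C / 32°C / 1°C-per-minute thresholds are all illustrative assumptions, not values from this thread.

```python
# Hypothetical Nagios-style temperature check (sketch, thresholds are examples).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_temperature(readings, warn=27.0, crit=32.0, max_rise_per_min=1.0):
    """Classify (seconds, celsius) readings, oldest first.

    Returns (exit_code, output_line). Warns on a fast rise as well as on
    the absolute value, so a dead aircon unit is flagged early.
    """
    if not readings:
        return UNKNOWN, "TEMP UNKNOWN - no readings"
    t_first, temp_first = readings[0]
    t_now, temp_now = readings[-1]
    minutes = (t_now - t_first) / 60.0
    # Average rate of change over the window (degrees C per minute).
    rise = (temp_now - temp_first) / minutes if minutes > 0 else 0.0

    if temp_now >= crit:
        status = CRITICAL
    elif temp_now >= warn or rise >= max_rise_per_min:
        status = WARNING
    else:
        status = OK

    # Performance data: label=value;warn;crit
    message = (f"TEMP {LABELS[status]} - {temp_now:.1f}C, rising {rise:.2f}C/min"
               f" | temp={temp_now:.1f};{warn};{crit}")
    return status, message
```

A real plugin would print `message` and then call `sys.exit(status)`; Nagios only looks at the exit code and that first line. For example, readings of 22°C, 24°C and 26°C taken two minutes apart are still below the warning threshold, but the 1°C/min rise returns WARNING.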

This used to happen a lot in an old client's server room. They had office air-con units cooling their server room, and I think the same system failed 10-15 times, causing all kinds of problems: it leaked all over the floor and almost flooded the room, and once it died on a Friday night, so on the Monday it was like walking into the Caribbean. I have absolutely no idea how we didn't lose a single system that weekend. Repeated advice to buy dedicated air con with redundancy was ignored, no matter how many times it failed.

Funny looking back :D
 
Thanks for all the suggestions, guys. I sent my boss info on hardware products and asked if he wanted me to look into free software such as OpenNMS. He said no thanks...

Until next time!
 
We use the temperature monitors on the UPS at work.

APC just emails us when it passes self-tests etc., and also emails when the temperature goes over a certain amount.
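If you roll your own threshold emails instead of relying on the UPS, one common trick is hysteresis: alert once when the temperature crosses the limit, and only clear once it has dropped a couple of degrees below it, so a reading hovering around the threshold doesn't send a flood of emails. This is a swapped-in technique and a hypothetical sketch, not a description of how APC's firmware behaves; the class name and the 30°C / 2°C values are assumptions.

```python
class TempAlarm:
    """Threshold alarm with hysteresis (hypothetical sketch).

    Fires once when the temperature reaches `high` and clears only when it
    falls to `high - hysteresis`, so small wobbles around the limit are ignored.
    """

    def __init__(self, high=30.0, hysteresis=2.0):
        self.high = high
        self.clear_below = high - hysteresis
        self.active = False

    def update(self, celsius):
        """Return 'ALARM', 'CLEARED' or None for each new reading."""
        if not self.active and celsius >= self.high:
            self.active = True
            return "ALARM"
        if self.active and celsius <= self.clear_below:
            self.active = False
            return "CLEARED"
        return None
```

Feeding it 29.0, 30.5, 29.5, 30.2 and 27.5 produces one ALARM (at 30.5) and one CLEARED (at 27.5), rather than a new email every time the reading crosses 30°C.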
 
I'm pretty sure HP servers come with Insight Manager. When we had HP servers (back in the NT4 days!), it could be configured to send you a text message when a disk failed, a fan failed or the temperature got too high. You simply plugged a modem into the server running Insight Manager.

If they're not happy with that, then they have to take the hit and not complain when it goes **** up like today.
 


Hi,

In my company, we have ~2000 Compaq/HP DL360 and DL380 servers (G1 to G6), and we monitor them with Xymon/Hobbit (open source).

For monitoring hardware (temperature, RAID, RAM, fans, etc.), we use the HP Insight command-line tools (hpacucli, hpasmcli) and feed their output into Xymon/Hobbit through an external script (bash). Very easy.

No need to pay anything.
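The external-script approach above boils down to: run `hpasmcli -s "SHOW TEMP"`, parse the readings, pick a colour and hand Xymon a `status` message. Nico uses bash; the sketch below shows the same idea in Python. The sample output is an *assumed* shape of `SHOW TEMP` (check it against your own servers, since columns can vary between generations), and the test name `temp` and the 5-degree yellow margin are hypothetical choices.

```python
import re

# Assumed shape of `hpasmcli -s "SHOW TEMP"` output (verify on your hardware).
SAMPLE = """\
Sensor   Location              Temp       Threshold
------   --------              ----       ---------
#1        PROCESSOR_ZONE       40C/104F   95C/203F
#2        AMBIENT              38C/100F   42C/107F
#3        SYSTEM_BD            -          -
"""

# Sensor number, location, current temp in C, threshold in C.
ROW = re.compile(r"^#(\d+)\s+(\S+)\s+(\d+)C/\d+F\s+(\d+)C/\d+F")


def parse_temps(text):
    """Return (location, temp_c, threshold_c) for sensors that have a reading."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            _, location, temp, limit = match.groups()
            rows.append((location, int(temp), int(limit)))
    return rows


def xymon_status(hostname, rows, margin=5):
    """Build a Xymon 'status' message for a hypothetical 'temp' test.

    Red if any sensor is at its threshold, yellow if within `margin` degrees.
    Xymon writes dots in hostnames as commas.
    """
    color = "green"
    details = []
    for location, temp, limit in rows:
        if temp >= limit:
            color = "red"
        elif temp >= limit - margin and color != "red":
            color = "yellow"
        details.append(f"{location}: {temp}C (limit {limit}C)")
    host = hostname.replace(".", ",")
    return f"status {host}.temp {color} Temperature check\n" + "\n".join(details)
```

With the sample above, the ambient sensor is within 5 degrees of its limit, so the message comes out yellow. In a client-side ext script, the resulting string would typically be sent with `$XYMON $XYMSRV "<message>"`.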

When we have an alert, it sends an email to our ticketing system (it could also be an SMS).
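Xymon has its own alerting configuration for this, but if you are scripting the alert path yourself, an email into a ticketing inbox only needs the standard library. The sketch below is hypothetical: the addresses, SMTP host and `[COLOR] host test` subject format are placeholders, not anything from Nico's setup.

```python
import smtplib
from email.message import EmailMessage


def build_alert_email(host, test, color, details,
                      sender="monitoring@example.com",
                      ticket_inbox="helpdesk@example.com"):
    """Compose an alert email for a ticketing system (placeholder addresses)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ticket_inbox
    msg["Subject"] = f"[{color.upper()}] {host} {test}"
    msg.set_content(details)
    return msg


def send_alert(msg, smtp_host="mail.example.com"):
    """Hand the message to an SMTP relay (hypothetical host name)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Keeping the host, test and colour in the subject line makes it easy for most ticketing systems to group repeat alerts for the same server.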

Regards,
Nico
 