average CPU Temp in your Server room??

So basically, what's the average or current CPU temp in your server room?

Also, what kind of aircon / fans do you have or use? :)
 
Currently all sat between 20°C and 28°C. That's with two Panasonic wall-mounted "suitcase" style systems cooling a room the size of your average airing cupboard :D

It's cold in there, so cold I usually put on a coat :D
 
In the first internal datacenter I picked at random, 20.1°C currently, with a high of 20.9°C in the last month and a low of 19.8°C, based on Liebert chillers with all the usual toys...

Fair to say it's probably not typical; we have better AC than most of the big commercial datacenter providers.
 
Between 75 and 85 degrees C, 24/7.
eeeeeeeeeeeeeek and I thought mine @ 40C was bad!!



A few questions: how do you track the core temps over months/years etc.? I assume you use a lot of virtualization - do vSphere/Hyper-V/Xen give you that option to see them?
 
19°C normally, 21°C on a hot day.

One wall-mounted 8 kW and one 10 kW ceiling-mounted AC unit.

Edit: somehow read that as room temps.

CPUs are between 30°C and 35°C.
 

That's a Sun Fire server and it can easily cope with running that hot. I'm not sure what our Windows boxes are at; I don't concern myself with them as they're not that busy and they never crash, so I assume they're fine lol.

The Sun box is hammered 24/7.
 

Everything is SNMP polled and stashed in a database, and I mean *everything*, so each of the 2,000-odd servers in that site has all of its temperature sensors polled and the results kept for more than a year. The switches and routers too, and then each rack has a temperature probe for ambient, plus air supply and return temps monitored by the AC (again, into SNMP). The voltage and power usage of every outlet is recorded... our environmental monitoring database is just short of 1TB in size.
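
Purely as an illustration of what that kind of poller might look like (this is not the poster's actual setup), here's a minimal Python sketch using pysnmp and SQLite. The host names, community string, sensor OID and table layout are all placeholders; in reality the sensors would live under whatever MIB the vendor's management controller exposes.

```python
# Rough sketch of per-host CPU temperature polling into a database.
# Placeholders: host list, SNMP community, sensor OID and table name.
import sqlite3
import time

from pysnmp.hlapi import (
    CommunityData, ContextData, ObjectIdentity, ObjectType,
    SnmpEngine, UdpTransportTarget, getCmd,
)

HOSTS = ["srv-001.example.internal", "srv-002.example.internal"]  # placeholder inventory
TEMP_OID = "1.3.6.1.4.1.2021.13.16.2.1.3.1"  # placeholder; use your vendor's MIB in reality


def poll_temperature(host):
    """Read one temperature sensor from one host via an SNMP GET."""
    error_indication, error_status, _, var_binds = next(
        getCmd(
            SnmpEngine(),
            CommunityData("public"),  # read-only community string (placeholder)
            UdpTransportTarget((host, 161), timeout=2, retries=1),
            ContextData(),
            ObjectType(ObjectIdentity(TEMP_OID)),
        )
    )
    if error_indication or error_status:
        return None  # host down, timeout, wrong OID, etc.
    return float(var_binds[0][1])


def poll_all():
    """One polling pass: read every host and stash the readings with a timestamp."""
    db = sqlite3.connect("env_monitoring.db")
    db.execute(
        "CREATE TABLE IF NOT EXISTS cpu_temp (polled_at REAL, host TEXT, temp_c REAL)"
    )
    now = time.time()
    for host in HOSTS:
        temp = poll_temperature(host)
        if temp is not None:
            db.execute("INSERT INTO cpu_temp VALUES (?, ?, ?)", (now, host, temp))
    db.commit()
    db.close()


if __name__ == "__main__":
    poll_all()  # in practice this would run from cron or a scheduler
```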

It's quite the resource - I can correlate power and network usage with heat output in a busy period vs a quiet one, and evaluate the benefits of shutting down blade chassis which aren't required. It's also an interesting early warning system: power usage is a good measure of general system load, so if we see increased power usage or heat generated without a corresponding traffic spike then somebody will be finding out why.
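
As a toy version of that "power went up but traffic didn't" check, something like the sketch below could run over the same kind of database. Again, just an illustration: the host_metrics table, its columns and the thresholds are invented.

```python
# Toy version of the "power went up but traffic didn't" early-warning check.
# The host_metrics table (polled_at, host, watts, mbps) is invented for illustration.
import sqlite3
import time

POWER_JUMP = 1.15    # flag hosts whose average power rose by more than 15%...
TRAFFIC_JUMP = 1.05  # ...while traffic rose by less than 5%


def suspicious_hosts(db):
    """Compare the last hour against the hour before it, per host."""
    now = time.time()
    cutoff, window = now - 3600, now - 7200
    rows = db.execute(
        """
        SELECT host,
               AVG(CASE WHEN polled_at >= ? THEN watts END),
               AVG(CASE WHEN polled_at <  ? THEN watts END),
               AVG(CASE WHEN polled_at >= ? THEN mbps END),
               AVG(CASE WHEN polled_at <  ? THEN mbps END)
        FROM host_metrics
        WHERE polled_at >= ?
        GROUP BY host
        """,
        (cutoff, cutoff, cutoff, cutoff, window),
    ).fetchall()

    flagged = []
    for host, p_now, p_before, t_now, t_before in rows:
        if None in (p_now, p_before, t_now, t_before) or not p_before or not t_before:
            continue  # not enough data in one of the windows
        if p_now / p_before > POWER_JUMP and t_now / t_before < TRAFFIC_JUMP:
            flagged.append(host)  # somebody will be finding out why
    return flagged


if __name__ == "__main__":
    print(suspicious_hosts(sqlite3.connect("env_monitoring.db")))
```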

We do this much, but it makes me wonder what kind of witchcraft Google do with the same data...

The mechanics of the polling depend on the hardware and OS generally, but usually the vendors' tools allow it for server hardware.

* And actually, very little virtualisation. I'm not a big fan personally and it also makes little sense for us: we load balance and fail over workloads without it, so the HA features have little use, and it just seems a way to get another layer to pay for support on; it adds complexity and eats hardware resources to run. I accept that this isn't typical.
 
bigred - where do you work?

I'm actually self-employed as an infrastructure consultant; however, I've been pretty much working for one client for a couple of years now. I prefer not to name names, but they're one of the larger internet services companies - we do hosting, email and DNS primarily, under a pretty big collection of brands worldwide. A bit of standard ISP stuff too, but it's not core business.
 
19 deg C on a 2U server in a Manchester DC (so cooling isn't my problem*).

*Unless it all breaks...

Got 2-4 completely independent A/C circuits per server room to prevent that happening. Had air-con failures cause havoc before. Never again will this happen (while I'm involved at least)!
 
List of temperatures (centigrade) and the number of CPUs running at that temperature:

23 1
25 2
26 5
27 5
28 8
29 5
30 203 - probably most of the main server room and DR location - multiple A/C units
31 8
32 14
33 11
34 13
35 5
36 7
37 2
38 4
39 3
40 31
41 6
42 2
43 2
44 3
45 2
46 1
47 1
48 8
49 1
50 2
51 1
56 1
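
If anyone wants to pull the same kind of breakdown out of their own monitoring data, here's a rough sketch; the table and column names are made up, matching the hypothetical cpu_temp table from the polling sketch further up the thread.

```python
# Sketch: turn the latest per-host CPU temperature readings into the
# "temperature -> number of CPUs" breakdown above. Table/column names
# follow the hypothetical cpu_temp table from the earlier polling sketch.
import sqlite3
from collections import Counter


def temp_distribution(db):
    """Latest reading per host, rounded to whole degrees and counted."""
    rows = db.execute(
        """
        SELECT host, temp_c
        FROM cpu_temp AS t1
        WHERE polled_at = (SELECT MAX(polled_at) FROM cpu_temp AS t2
                           WHERE t2.host = t1.host)
        """
    ).fetchall()
    return Counter(round(temp) for _, temp in rows)


if __name__ == "__main__":
    db = sqlite3.connect("env_monitoring.db")
    for temp, count in sorted(temp_distribution(db).items()):
        print(temp, count)
```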
 


:eek:

EPIC setup sir!!!
 
Think we've got something similar set up in our datacentres, just under 6k machines I think, over a couple of sites.

Been sat in the data centres the last week or two (and still sat in one now!), getting sick of the place :p
 
These 4 would have triggered environmental alert traps to my ops monitoring software by now.

GO FIX!

One at 50°C was actually unmanaged - it was the last sensor reading before the server was decommissioned, so the A/C was probably off. I just haven't deleted it. It was old anyway.

Another is a very heavily used SQL server - CPU temps spike during working hours.

The last server is an old web server which I don't think is actually used at all now - nobody would notice for at least a month until a dev wanted to experiment with something weird. It's now in a proper server room, but it used to be stored in a very small room which was constantly a minimum of 30°C ambient, hitting 35°C (== superspeed fans) during hot summer days, and it turned itself off a few times when it hit 40°C ambient. I would not be surprised if it were filled with dust. If it dies we'd just give the devs a VM to use & strip it for spares.

I don't actually have alerts for CPU or ambient temps, but we do get alerts when fans ramp up at 35°C ambient, and the APC kit at the main & DR server rooms does the temperature, humidity and water leak detection stuff.
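
For illustration only, here's a rough sketch of what firing an environmental alert trap at an ops monitoring station could look like with pysnmp. The NMS address, enterprise trap/varbind OIDs, 45°C threshold and host names are all invented placeholders rather than anyone's real config.

```python
# Sketch of a threshold check that fires an environmental alert trap at an
# ops monitoring station. Everything here is a placeholder: the NMS address,
# the enterprise trap/varbind OIDs, the 45C threshold and the host names.
from pysnmp.hlapi import (
    CommunityData, ContextData, NotificationType, ObjectIdentity,
    OctetString, SnmpEngine, UdpTransportTarget, sendNotification,
)

NMS = ("nms.example.internal", 162)  # placeholder monitoring station
CPU_LIMIT_C = 45.0                   # placeholder CPU temperature threshold


def send_temp_trap(host, temp_c):
    """Send one SNMPv2c trap describing a hot server."""
    error_indication, *_ = next(
        sendNotification(
            SnmpEngine(),
            CommunityData("public"),
            UdpTransportTarget(NMS),
            ContextData(),
            "trap",
            NotificationType(
                ObjectIdentity("1.3.6.1.4.1.99999.1.0.1")  # placeholder trap OID
            ).addVarBinds(
                ("1.3.6.1.4.1.99999.1.1.1",
                 OctetString("%s CPU at %.1fC" % (host, temp_c))),
            ),
        )
    )
    if error_indication:
        print("trap to %s failed: %s" % (NMS[0], error_indication))


def check(readings):
    """readings: {host: latest CPU temp in C} - trap anything over the limit."""
    for host, temp_c in readings.items():
        if temp_c > CPU_LIMIT_C:
            send_temp_trap(host, temp_c)  # GO FIX!


if __name__ == "__main__":
    check({"old-web-01": 56.0, "sql-01": 48.0, "file-01": 31.0})  # made-up readings
```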
 