Average Computer/Server Crashes per Month/Year

Hi all,

I manage a bunch of servers at work
(they're just really high-end desktops used as servers).

We have maybe 1 or 2 a week crash or not play nicely.

I said to my manager it's not possible to have 0% failure per month/year.
There is going to be at least an application crash per month.

Is there a website somewhere that has figures for server/PC failures per year, to see if these are worse than average?

We have 60 servers (worst case 2 crashes per week), that's about 3% per week.
 
That's hilariously high. There's no reason a server should crash; your downtime should only be due to patches and limited to scheduled maintenance windows, so it can be excluded from any performance metrics.

A server should never flat out stop responding during the business day, and if it does it definitely shouldn't be happening every couple of days.

The fact that your company considers desktops running services to be acceptable says a lot about the environment you have to deal with to be honest, and there's just no way to ever run a slick operation if the sort of processes that lead to that situation are allowed to keep happening.

What are these boxes doing? Why aren't they virtualised on server hardware?
 
I'll try to give you some background.

The 30 new servers (1 year old) have had one crash/error per 30 in 1 year.
The 30 old servers (3.8 years old) have 2 per 30 per week.

I am trying to get the old computers replaced for this very reason; they crash even with fresh installs running the same software.

These computers are running: [think casino security type environment]
i5 2.8GHz
4GB RAM
550-TI
16xHD AV cards
Windows 7 32-bit [can't run Server as the software is not supported]
50-80% CPU usage 24/7, 365 days
GPU usage 40%+

"your downtime should only be due to patches and limited to scheduled maintenance windows so it can be excluded from any performance metrics."
We don't get that luxury; a reboot per week is lucky. That's when updates/patches are done, and we need the reboot because of bad memory problems with the AV cards.

A server should never flat out stop responding during the business day, and if it does it definitely shouldn't be happening every couple of days.
Servers sometimes reboot, but mostly it's our AV cards having memory issues (using too much) that causes this. We also get A LOT of power cuts, which I think does not help. We get around 20+ a year.



As you can understand, these cannot be virtualised: too much hardware required and too much power needed.
But I am trying to get the old computers replaced.

New calculations:
old servers: about 6% crash per week
new servers: about 3% crash per year < is that better than average?
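
To put both figures on the same footing, here's a rough sketch in Python that annualises the two rates. It assumes 52 weeks in a year and takes 2 per week as the worst case for the old machines; the function name is made up purely for illustration.

```python
WEEKS_PER_YEAR = 52  # assumption: simple 52-week year


def crashes_per_server_year(crashes, servers, period_weeks):
    """Crashes per server per year, given a crash count observed over a period."""
    return crashes / servers * (WEEKS_PER_YEAR / period_weeks)


# Figures quoted above: 2 crashes per 30 old servers per week (worst case),
# 1 crash per 30 new servers per year.
old = crashes_per_server_year(crashes=2, servers=30, period_weeks=1)
new = crashes_per_server_year(crashes=1, servers=30, period_weeks=WEEKS_PER_YEAR)

print(f"old: {2 / 30:.1%} of machines per week, {old:.2f} crashes per server per year")
print(f"new: {1 / 30:.1%} of machines per year, {new:.3f} crashes per server per year")
print(f"old machines crash roughly {old / new:.0f}x as often as the new ones")
```

On those numbers the old machines come out at roughly 3.5 crashes per machine per year against about 0.03 for the new ones, so the weekly and yearly percentages aren't directly comparable as written.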
 
You're still massively worse than average. If it's processing CCTV video then why aren't you using IP encoders or IP cameras?

If you're using some PCs with capture cards as some sort of Robinson Crusoe CCTV system without UPSes then you're never going to have a stable system so whoever wants one should pay out for it or get off your case.
 
Also look at the environment the machines are kept in, e.g. a small, crowded room with no AC or access to ventilation, and whether there is any hardware monitoring.

As mentioned previously, invest in UPSes and look into disaster recovery documentation and guidelines.
 
and you don't have a UPS setup? :o
On our main servers/network, yes.

But no, getting funding for a UPS per server is hard as they're a long way apart. Even a safe shutdown is very hard. They're classed as not critical to stay up, but we do lose a few hours of work because of it :confused: which I think is funny.

The other problem is our power does not come back on, due to the load we push through our building, so a UPS would not work. We looked into this and we would need a bunch of very large generators, which we don't have room for, and the estimated cost would be double figures of millions.
 
So what's the question here? You have a company unwilling to fund things properly and therefore stuff breaks quite a lot. News at 10.
 
That is an insane figure, agree with caged here, that is abnormally high.

Whereabouts are the servers that get the power cuts, as in physical location? In the same office or a datacentre? If they are in the office then it's more understandable: power fluctuations such as spikes, drops and frequency/voltage variations over time can negatively impact hardware and cause higher hardware failure rates, or put more stress on the PSU to regulate the conversion process. Trust me on this, from experience of dealing with certain power companies with poorly regulated two/three-phase systems and crumbling infrastructure. :mad:

Datacentres have higher-grade power sources and regulators (with better UPSes and failovers). They pay the power companies a lot more money for this privilege. Under normal business and residential contracts, power companies do not guarantee the supply of power under the Electricity Act, which is the line they love to use. Luckily it's not all doom and gloom; laws above the Electricity Act exist.

We had a lot of similar problems. Whereabouts is your office?

Normal checks apply of course; the server is functioning, the OS is properly installed with updates, integrity checking, working hard disks, applications and drivers behaving etc etc.
 
pc crashes (including a server) - on average none (I do use UPS :)) and I use them for 3D rendering etc so can be 100% cpu/gpu for days. But then I don't fill them up with junk software either...

software crashes - depends on how buggy the software/plugin is but generally pretty low. I had a short period with a dodgy plugin but that was it.

Honestly, you really should invest in UPSes if nothing else. I live in an area which enjoys those little power flicker dropouts (yeah, I don't understand why we should put up with these either) and the UPS has been a godsend to stop the damage to the PCs and to help keep them running smoothly.
 
Ask them what failure rate they are hoping for. When they say none, show them the bill for a decent mid-range server multiplied by however many you need.

Then show them the cost and time involved to rebuild them all and put them on decent UPSes.

You will either fix your problem because they invest, or they will back off.
 
If you're using some PCs with capture cards as some sort of Robinson Crusoe CCTV system without UPSes then you're never going to have a stable system so whoever wants one should pay out for it or get off your case.
I don't know why this made me laugh. Did you mean Heath Robinson, or was Robinson Crusoe also known for making ridiculous contraptions?
 
If you consider what someone stuck on a desert island might be able to produce in terms of reliable electronic systems then you'll have some idea of what this setup probably looks like.
 
Wait..... you have SIXTY 'servers', not just that but they are basically just desktop machines.

WHAT ON EARTH ARE YOU DOING?!

If it were the 90s I could understand not being able to virtualise due to hardware. But why do you need physical capture cards in the lovely world of IP?
 
1 this year, and that was because our IT director forgot to turn Windows Updates to manual and the DNS server restarted after an update.

Have to say OP that your setup sounds like it should have been retired when DOS was released.
 
1 or 2 a week? wow. there's something seriously wrong there.

we've got servers on multiples of years worth of uptime. the longest of those are some of our legacy kit which is 7-8 years old.

e: in fact just checked one of our older servers, up since 19/06/2011. which was when we shut everything down to rearrange the server room :D
 
we have about 1000 servers with 900 users (yeah, those figures don't look right, I know)

blue screens are about 1 every few months

of 28 Citrix servers, 1 a week needs a reboot because it's messed up (known Citrix problem)

running 2003 / 2008 on ESX
 
Yeah, I understand all your thoughts, but you must understand how hard it can be to push some management in the right direction. :(

But why do you need physical capture cards in the lovely world of IP?
Not possible at the moment. Hopefully in coming years we can upgrade, but there's a lot to upgrade.

Not got much of a choice at the moment.

I want to do the following things myself, but I've got to justify everything to a T; some of you know this.

- Get UPSes up and running [need to justify this]; we're going to need a lot
- New servers with server hardware [ordered 1 to approve]; needed a new one anyway
- We can't use Windows Server [AV card restrictions], so Windows 7 Pro is going to have to do until we can upgrade to IP
- Move to the new AV cards we just got approved, as the old ones give us memory leaks most of the time, causing our crashes. This should help with 99.9% of our crashes. The old AV cards are not supported anymore and their drivers were terrible.
- UPSes should fix our power cut issues
 
Looking at what you are saying, there are some points you may need to think about:

- You cannot power ALL the servers from one or multiple UPS systems without some sort of network co-ordination (unless they're standalone, in which case that would waste more resources, money and power than anything). You'd realistically need to look at all the servers and see if you can merge roles and components to cut down the number of servers. Network equipment would need protecting too. Just picture the money you'd save on power; that alone could justify the cost of new equipment, as it would save money over time. I'd recommend you use that line with your boss, with hard figures on the continued running costs of the current servers vs new, more efficient equipment.

- AV card restrictions? A licensing issue or a driver issue? I can't see a normal card manufacturer and software manufacturer artificially limiting you to a client OS unless they have a more expensive server version. The new cards sound nice(r); do they support the server OS?

- A UPS alone won't resolve the issues if there is too much load on the circuits. If you migrate to new equipment and reduce the number of servers, it may help. If you do it separately, it may be more of a nightmare balancing between circuits tripping and just consistent crashing :P.

In short, I don't envy your job. That sounds like a complete nightmare of a setup. Best of luck with it. Sounds more like a personalised version of hell :P
 