Problems with a crashing server

Hi,

Not sure if this is the right section to ask in, but:

I am volunteering at a charity and the HP ProLiant server in their office keeps crashing with a Kernel-Power error, Event ID 41:

"The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly."

I first thought it was the UPS failing but I ran a self test and it passed. I also removed power to the UPS and it successfully shut the server down.

I checked for BSODs and looked through Event Viewer, but the logs just stop and then start again with the critical Kernel-Power error.

I then contacted HP (we have a Care Pack for the server) and ran some diagnostic tests, but as it wasn't a clear-cut hardware issue they refused to do anything.

The server has been crashing two or three times a month for the last 6 months. I am really struggling to figure out what is going on here. Any help with the issue would be greatly appreciated.

Many thanks,
Will
 
I would look to do extended memory tests, as it sounds like it's a build-up.
Temporarily, you could try nightly or weekly reboots of the server using the scheduler.
Also, what does the server do? Exchange? File shares? AD?
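
One rough way to test the build-up idea is to check how evenly spaced the crashes are. A minimal Python sketch, assuming the Event ID 41 timestamps have been copied out of Event Viewer by hand; the function name and the dates below are placeholders, not values from this server:

```python
from datetime import datetime


def gaps_in_days(crash_times):
    """Return the gaps, in days, between consecutive crash timestamps."""
    ordered = sorted(crash_times)
    return [(b - a).total_seconds() / 86400 for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    # Placeholder dates for illustration only.
    crashes = [
        datetime(2014, 1, 1, 3, 0),
        datetime(2014, 1, 11, 3, 0),
        datetime(2014, 1, 25, 3, 0),
    ]
    for gap in gaps_in_days(crashes):
        print(f"{gap:.1f} days since previous crash")
```

Fairly even gaps would fit something accumulating after each boot; widely varying gaps would point more towards power or hardware.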
 
Thanks for the quick reply, I will run a memory test tonight.

The server runs Exchange 2010, file shares and AD.

If I don't find any memory issues I will schedule a weekly reboot and see if that fixes it.
 
Do a full firmware and software update. Download the HP SPP (Service Pack for ProLiant), which comes as an ISO. Run \hp\swpackages\hpsum.bat on the server itself, follow the prompts, and it will update everything in one hit. There are loads of known issues that get resolved this way.

I'm surprised this wasn't the first thing HP support recommended.
 
Sorry rotor, I was not very specific about what HP asked me to do.

They did make me do a full BIOS, firmware and software update about a month ago. The issue has recurred three times since.
 
How much memory does the server have? How much is used? Also, how much is paged?
Exchange will take up 50% of memory no matter how much there is, and it's a real pig!
 
The server has 8GB of RAM with an 8GB pagefile.

Currently at 96% RAM usage.
Exchange store ~ 2GB
2 SQL databases (Windows SBS + Windows Internal Database) ~ 1GB each
The rest is mainly IIS for remote access and OWA.
 
SQL will grab all available memory if max memory is not set. I would set up some performance monitoring on RAM usage and CPU usage, then take a look at the performance logs after a crash and see if either is maxed out.

I would perhaps also set up performance monitoring on disk I/O and memory paging.
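
Once a counter log exists, the samples just before a crash are the ones to look at. A minimal Python sketch of that check, assuming the log has been exported to a simplified CSV; the column names, thresholds and sample values are made up for illustration, and a real Performance Monitor or typeperf export will use different column names:

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical sample log; not real data from the server in this thread.
SAMPLE_LOG = """timestamp,cpu_percent,available_mb,pages_per_sec
2014-03-01 02:00:00,35,900,40
2014-03-01 02:05:00,60,400,350
2014-03-01 02:10:00,97,60,2100
2014-03-01 02:20:00,12,5200,5
"""


def samples_before(log_text, crash_time, window_minutes=15):
    """Return the rows logged in the window leading up to crash_time."""
    start = crash_time - timedelta(minutes=window_minutes)
    rows = []
    for row in csv.DictReader(io.StringIO(log_text)):
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        if start <= ts < crash_time:
            rows.append(row)
    return rows


def looks_exhausted(rows, cpu_limit=95.0, min_available_mb=100.0):
    """True if any sample has CPU at or above cpu_limit or free RAM below min_available_mb."""
    return any(
        float(r["cpu_percent"]) >= cpu_limit
        or float(r["available_mb"]) < min_available_mb
        for r in rows
    )


if __name__ == "__main__":
    crash = datetime(2014, 3, 1, 2, 12)  # hypothetical time of the Event ID 41 entry
    recent = samples_before(SAMPLE_LOG, crash)
    for r in recent:
        print(r["timestamp"], r["cpu_percent"], r["available_mb"], r["pages_per_sec"])
    print("resources exhausted before crash:", looks_exhausted(recent))
```

The sampling interval matters here: if the counters are only logged every few minutes, a fast spike right before the reboot can be missed.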
 
Thanks for the help.

OK ASE, I will set that up. Would you recommend I set a limit for SQL then?

It seems that the HP Integrated Log Viewer is not installed... I will have a look on HP's website. The annoying thing is that this system was already set up when I arrived.

Ran the memtest last night and it passed.
 
Which HP server is it? Not all HP servers support the HP Integrated Log Viewer (ML150s etc.)

Grab the memory dump file from C:\windows\minidump, open it with the Windows Debugging Tools http://msdn.microsoft.com/en-gb/library/windows/hardware/ff551063(v=vs.85).aspx and run !analyze -v. This will give you a more detailed reason as to why it BSOD'd.

SBS 2011 with 8GB RAM is awful; it runs much better with 12-16GB.
 
If it's a newish Gen8 server with iLO 4, make sure your iLO firmware is at least version 1.51 - we've had issues with servers randomly rebooting themselves thinking that the OS was hung.
 
Would you recommend I set a limit for SQL then?

It's not an exact science; it depends on what other applications you are running, the type of disk subsystem, the type of databases, etc.

But I use the following formulas as the initial setting, then tune it after I have gathered performance information over a number of months.
OS Memory = TotalMemory - (1% * (NUMA nodes)) - 3% - 1 GB

Total SQL Buffer Memory (max memory) = TotalMemory - OS Memory - application memory.

The SQL buffer memory can then be shared between the SQL instances. The larger the buffer memory the better: it reduces disk I/O, since SQL can cache more data in memory.

PS: the number of NUMA nodes is typically the number of physical CPU sockets, not CPU cores!
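
As a rough illustration of the formulas above, here is a minimal Python sketch. It assumes the 1% per NUMA node, 3% and 1 GB terms are overhead set aside for the OS, so that the SQL maximum is total memory minus that overhead minus application memory. The function name and the 3 GB application figure are placeholders, not measured values from this server.

```python
def sql_max_memory_mb(total_mb, numa_nodes, app_mb):
    """Starting value for SQL max memory in MB, to be tuned later.

    Assumed reading of the formulas: OS overhead is 1 GB, plus 3% of total
    memory, plus 1% of total memory per NUMA node.
    """
    overhead_mb = 1024 + 0.03 * total_mb + 0.01 * total_mb * numa_nodes
    # Never return a negative limit if the box is already oversubscribed.
    return max(0, int(total_mb - overhead_mb - app_mb))


if __name__ == "__main__":
    # 8 GB server, one CPU socket, and a placeholder 3 GB for Exchange, IIS, etc.
    limit = sql_max_memory_mb(total_mb=8192, numa_nodes=1, app_mb=3072)
    print(f"Suggested total SQL max memory: {limit} MB, shared across all instances")
```

The result is only an initial setting; the performance logs gathered afterwards should drive any further tuning.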
 
Does the mem test loop for x amount of times?
I would still be inclined to have it run continually over the weekend.

It looped 4 times. I will try and run it next weekend.

Which HP server is it? Not all HP servers support the HP Integrated Log Viewer (ML150s etc.)

Grab the memory dump file from C:\windows\minidump, open it with the Windows Debugging Tools http://msdn.microsoft.com/en-gb/library/windows/hardware/ff551063(v=vs.85).aspx and run !analyze -v. This will give you a more detailed reason as to why it BSOD'd.

SBS 2011 with 8GB RAM is awful; it runs much better with 12-16GB.

It is an HP ProLiant ML110 G6.

It is not a BSOD: the minidump folder is empty, even though the server is set to create minidumps if it does BSOD.

If it's a newish Gen8 server with iLO 4, make sure your iLO firmware is at least version 1.51 - we've had issues with servers randomly rebooting themselves thinking that the OS was hung.

It's a Gen6 server and as far as I am aware it doesn't have iLO installed.

Thanks for helping out with this.
 
It's a Gen6 server and as far as I am aware it doesn't have iLO installed.

The QuickSpecs say it likely has an LO100, which is a kind of cut-down version. Since the memory checks out, I'd definitely go with rotor's suggestion of running the SPP against it.
 