ML350 keeps locking up randomly, any ideas?

Hi. I have a client with an ML350 running 4x SAS drives on a P400, 4GB RAM, Server 2003 Standard.

Since I last restarted it, it randomly locks up. The error log shows nothing, other than that it was running up until the lock-up and then the entry for when I restarted it.

I've redone all the backup and virus scan schedules, as 2 of the servers were trying to back up the same files at the same time, and Norton decided to do a virus scan while a backup was running.

So I first thought, oh, that's probably it then... The servers were fine for 5 days, then this morning I get a call with people saying "we can't access the Z drive".

Ugh. I get there and the server has locked up again. I'm going there at the weekend to run Memtest. Anyone got any ideas? It's the DC that's crashing; the Exchange machine is running fine. The DC is running APC software and Timeslips.

Does anyone have some magical diagnostic tool that I could use as well, to check things like the RAID card or HDDs?
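For anyone wanting to sanity-check schedules like this before trusting them, here is a minimal Python sketch that flags jobs whose time windows overlap. The job names and times are made up for illustration and are not the real schedules from these servers; it also assumes each window starts and ends on the same day (no jobs running past midnight).

```python
from datetime import time
from itertools import combinations


def minutes(t):
    """Convert a time of day to minutes since midnight."""
    return t.hour * 60 + t.minute


def overlaps(a, b):
    """True if two same-day (start, end) windows share any time."""
    return minutes(a[0]) < minutes(b[1]) and minutes(b[0]) < minutes(a[1])


def find_clashes(jobs):
    """Return every pair of job names whose windows overlap."""
    return [
        (x, y)
        for x, y in combinations(sorted(jobs), 2)
        if overlaps(jobs[x], jobs[y])
    ]


# Hypothetical schedule, NOT taken from the actual servers.
jobs = {
    "dc_backup": (time(20, 0), time(22, 0)),
    "exchange_backup": (time(22, 30), time(23, 30)),
    "virus_scan": (time(21, 0), time(21, 45)),
}

print(find_clashes(jobs))  # [('dc_backup', 'virus_scan')]
```

Here the virus scan lands in the middle of the DC backup, which is the kind of clash described above.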

Many thanks

Ben
 
Boot it off SmartStart and put it on a soak test.

Don't bother with Memtest. SmartStart will do it all.

Other than that, make sure everything is seated.

Have you checked the Insight Log for any hardware errors or thermal events?
 
I haven't yet. The IT company I took over from have made the server a shambles. They charged £14k for the 2 servers, then a further £14k for them to be set up on site... Am I mad, or does that seem a bit pricey, especially as they have only been installed for about 8 months and the company is down to 10GB of space because they were sold the wrong sized HDDs?

I'll see if they were kind enough to leave me the SmartStart disk... Is it downloadable from anywhere?

Not even sure if Insight Manager was installed. I've spent the last 2 months cleaning up the mess: things like all the PCs being on static IPs, with all the gateways and DNS servers manually set.

Madness i tells ye
 
lol indeed, nearly fell off my chair thinking why did I not sell them this at 14k... :P

But then again, when you think of it, it's closer to £28k for the entire install... that's just upsetting, that's like a year's wages in 2 weeks :(

Right, so I checked the Insight jobbie and the error log is empty... useful.

So I've uninstalled all the updates that were installed at the last restart, just in case it was some conflict or something, as before the restart the server was working for about 3 months non-stop with no problems.

Umm yeah, so soak test at the weekend.

Any other software ideas?

In the security log I did notice that one of the machines logged onto the server at 20:56 and logged off at 20:57, and that's when the server went down. I thought, hmm, maybe a user caused the problem? But it's the accountant's machine, which was left on running Timeslips, and it's a Win 2k box. Timeslips may be an issue, as it's a central database-esque program and the backup does access those files to... well, back them up.

Hmmmmm, I hate the ones where there are no leads or clues and it's a load of guesswork until you get it right, as it makes the client think you're just milking them for money :(
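If the lock-ups keep happening, it may help to line up log entries against each crash time rather than eyeballing them. A rough Python sketch of that idea follows, assuming the events have already been exported by hand into (timestamp, action, machine) tuples. The 20:56 and 20:57 times are the ones mentioned above; the date, the machine names and the extra entry are invented for the example.

```python
from datetime import datetime, timedelta


def events_near(events, crash_time, window_minutes=10):
    """Return events logged in the window leading up to the crash."""
    window = timedelta(minutes=window_minutes)
    return [e for e in events if crash_time - window <= e[0] <= crash_time]


# Hypothetical export: the date and machine names are placeholders.
events = [
    (datetime(2008, 1, 1, 18, 30), "logon", "PC-RECEPTION"),
    (datetime(2008, 1, 1, 20, 56), "logon", "PC-ACCOUNTS"),
    (datetime(2008, 1, 1, 20, 57), "logoff", "PC-ACCOUNTS"),
]
crash = datetime(2008, 1, 1, 20, 57)

for stamp, action, machine in events_near(events, crash):
    print(stamp.strftime("%H:%M"), action, machine)
```

Running the same check over several crashes would show whether the same machine or job keeps turning up just before the server goes down, or whether this one was a coincidence.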
 
How has it locked up?

On the screen it was last left at?
Nothing visible?
Can you still move mouse?
Can you turn on/off numlock?

Screen was blank this time, although last time it locked up it was at the login screen (Ctrl-Alt-Del)

nothing was visible this time

can't move mouse

no numlock

Was completely frozen... :(
 
Are you positive it isn't thermal?

Might be worth getting the latest Firmware CD from hp.com and upgrading it all, dump the latest (supported) PSP on too.
 
It's in a cupboard with just network gear, at about 16 degrees :S Air conditioned, it's freezing in there. I'll check the seating though; the cold room doesn't rule out the HSF having come loose or something :S
 
Well, the Insight Log Viewer should be showing any thermal events.

Sure you loaded the right thing? :p It's empty?

Yeah, I'm pretty sure I did, although I'm starting to wonder :P

It had HDD diagnostics and POST logs etc. in it, but the system log was empty, which was a bit strange...

I'll double check and take a screeny in a bit :)
 
Make sure you have the latest network drivers - IIRC HP released a new step up (6.5?) and pulled the earlier drivers.

We have been having a problem with ML350s locking up and the upgraded drivers seem to have cured the issue.

Having said that, it was on 2008 boxes.

Not sure if you're supporting this at the moment but check to see if you have had an IPv6 address set as your preferred DNS. It's caused all sorts of issues with updates causing the box to fall over as AD fails.
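As a quick way to spot that, here is a small Python sketch that checks whether the first (preferred) entry in a list of DNS server addresses is IPv6. It only uses the standard `ipaddress` module; how you collect the list (for example, copying it out of the adapter settings) is left out, and the addresses below are placeholders rather than anything from a real box.

```python
import ipaddress


def ipv6_dns_entries(servers):
    """Return the DNS server addresses that are IPv6."""
    return [s for s in servers if ipaddress.ip_address(s).version == 6]


def preferred_is_ipv6(servers):
    """True if the first (preferred) DNS server is an IPv6 address."""
    return bool(servers) and ipaddress.ip_address(servers[0]).version == 6


# Placeholder addresses, in preferred-first order.
servers = ["::1", "192.168.0.10"]

print(preferred_is_ipv6(servers))  # True
print(ipv6_dns_entries(servers))   # ['::1']
```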
 
Right, so I'm on site looking at this server. It's running a full test on everything and will be a while :(

But I did come across something interesting...

I logged on to check my CD was OK etc... no CD-ROM. Whaaa, hmm, I thought, maybe the old IT firm unplugged it for "security reasons".

Opened the chassis (no, not whilst it was on).

CD-ROM is plugged in... hmmm. Rebooted, no CD-ROM. How strange. Gave the cables a wiggle, powered on, CD-ROM appears :O

Now I'm wondering: if the CD-ROM "vanished" whilst the server was live, this would more than likely cause a lock-up, as it does on my home PC when a HDD cable decides it wants to fall out; I just get a lock-up and then have to hard reset.

Thoughts?

Diak
 