Headless server locking up sporadically

Hi All

I'm usually OK with this sort of thing and I think I know the problem - just a sanity check/other ideas I may have missed before I go throwing money at the issue.

Old X99 system running Server 2019 has started locking up recently. There have been some changes, which were made at the same time I upgraded (clean install) from Server 2016:

'New' NVME drive
Additional Memory

However, I've temporarily changed everything back to how it was before (I still had the old SSD with Server 2016 on it) and the problem persists.

The biggest issue is that the server is headless, and I've not been able to get any errors, stop codes etc. out of it, so there's an element of guesswork. Event Viewer doesn't record anything at all, so I think it's some sort of hardware fault.

HWInfo has revealed a SMART error, which I'm assuming is the cause of the problem. I'm not great with HDDs so really just wanting to check that this is likely to be what's locking up the server. The error says:

[05] Reallocated Sector Count: 100/10, Worst: 100 (Data = 16,0)

I don't really understand what that means, but as it's the only thing I can find that's wrong, I just want to check I won't be wasting my time replacing the drive (12TB so not cheap).
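For reference, a line like the one above can be pulled apart with a few lines of Python. This is only a rough sketch, and it assumes the HWInfo layout is "normalized value/threshold, Worst: lowest value seen (Data = raw count, ...)", with the first number after "Data =" being the raw number of reallocated sectors. The names `SmartAttribute` and `parse_smart_line` are made up for illustration. Under the usual SMART convention an attribute only counts as failed when the normalized value drops to or below the threshold.

```python
import re
from dataclasses import dataclass

# Assumed layout of the HWInfo line:
#   [ID] Name: value/threshold, Worst: worst (Data = raw, extra)
LINE_RE = re.compile(
    r"\[(?P<id>[0-9A-Fa-f]{2})\]\s*(?P<name>[^:]+):\s*"
    r"(?P<value>\d+)/(?P<threshold>\d+),\s*Worst:\s*(?P<worst>\d+)\s*"
    r"\(Data\s*=\s*(?P<raw>\d+),\s*\d+\)"
)


@dataclass
class SmartAttribute:
    attr_id: int     # attribute ID (05 is Reallocated Sector Count)
    name: str
    value: int       # current normalized value (higher is healthier)
    threshold: int   # vendor failure threshold
    worst: int       # lowest normalized value recorded so far
    raw: int         # raw counter; assumed to be the first "Data" number

    @property
    def failed(self) -> bool:
        # SMART convention: failing once the normalized value
        # is at or below the threshold.
        return self.value <= self.threshold


def parse_smart_line(line: str) -> SmartAttribute:
    """Parse one SMART attribute line in the assumed HWInfo layout."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised SMART line: {line!r}")
    return SmartAttribute(
        attr_id=int(m["id"], 16),
        name=m["name"].strip(),
        value=int(m["value"]),
        threshold=int(m["threshold"]),
        worst=int(m["worst"]),
        raw=int(m["raw"]),
    )


if __name__ == "__main__":
    attr = parse_smart_line(
        "[05] Reallocated Sector Count: 100/10, Worst: 100 (Data = 16,0)"
    )
    print(attr)
    print("Threshold tripped:", attr.failed)
```

Read this way, the drive reports 16 reallocated sectors as the raw count, while the normalized value (100) is still well above the failure threshold (10), so the drive itself is not yet flagging the attribute as failed.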

Cheers!


*edit* forgot one possible key fact - I also swapped out the old 5820K for an engineering sample Intel chip (12 core, 24 thread, picked up by CPU-Z as a 'Xeon-2000'). However, this was done a few weeks before the OS upgrade and it was fine on Server 2016, so I don't think that's the issue.

Also just to rule out the other obvious stuff, temperatures are fine, it rarely goes above 35ºC
 
There's a monitor attached now, I attached one in the hope it would reveal something. However when I checked, it's just locked up on the lock screen, no BSOD.

Been pretty thorough with the memory testing, ran memtest for 24 hrs, and have also swapped in/out several sticks, as this is normally the most likely issue from previous experience.
 
Nobody have any thoughts? The main question I'm really asking is:

Could the 16 re-allocated sectors on the 12TB drive be the cause of the lockups?

My personal feeling is no - although this drive is part of a RAID 0 style server storage space array. The number of sectors is unchanged at 16, so it's not getting worse, and I'd imagine Windows/the drive would just avoid those sectors. Of course it's something I'll look to replace, but money might be best spent elsewhere at the min...
 
RAID0 and a potentially failing drive doesn't sound like a good combination. Hopefully, you have good backups.

I wouldn't expect a failing drive to cause a hard lockup as you describe.

If you still have the original CPU, stick it back in and see how it goes.
 
Yeah everything's double-backed up, wouldn't use RAID0 if not (or risk a drive with known bad sectors :) ). Speed is nice though.

I do have the original CPU tbf, so I guess that's the next logical step
 