Threadripper rig crashing - 8Pack RAM?

Edit: Solved - please don't waste your time reading this.

Specs as per sig, but for those on mobile:

AMD Threadripper 3960X at stock
Asus RoG Strix TRX40 E-Gaming
32GB (4x 8GB) 8Pack Edition RAM 3600c16 ('stock' @ DOCP 1.35v)
Sapphire Pulse Vega 56
Seasonic Prime Ultra Platinum 1000W PSU
Samsung Evo Plus 1TB NVMe

Since I built this machine I've had crashes under heavy load on Linux (mostly when running FoldingAtHome on GPU and BOINC on CPU at the same time). It ran OK under Windows so I assumed it was just a product of the hacky way I'd been running OpenCL on Linux to leverage the Vega card for FaH.

I separately noticed that if I enabled secure memory (core isolation in Windows) or TSME in the BIOS (to encrypt RAM) the machine would hard crash regularly, even in BIOS. I figured it was a bug in the BIOS and just disabled it, albeit begrudgingly as I spent thousands on this rig and it runs fine on my outgoing 8700K machine.

This week, Asus released a BIOS update with 1.0.0.4 AGESA for the CPU. I updated to it and my machine is turning off every few minutes, just a hard power off to a black screen, with a red light on the mobo and the OLED says 'Check CPU'. Uh oh. I reset the BIOS to stock just in case, but the crashes still happen. I managed to boot to Windows, and RealBench (running stress test with up to 32GB RAM) crashes after a second or two saying 'instability detected'. Everything's at stock!

If I run RB with anything less than my actual RAM (16GB, 8GB, 4GB) it runs all day long with no issues. As soon as I try to use 32GB it crashes out with 'instability detected'. I'm about to load a memtest live USB but am I right in suspecting it's probably the RAM? I know the mobo OLED said check CPU but a quick search suggests that's a very generic error and often means RAM or PSU.

With the machine being practically unbootable with TSME enabled or core memory isolation enabled, this is making me suspect the RAM above anything else. Does this sound reasonable?

If memtest passes I'll try the usual - two sticks instead of four, switch slots on the mobo, reseat everything etc. For now I'm just looking for preliminary thoughts. TIA.
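For anyone following the same two-sticks-instead-of-four plan, here is a rough sketch of how the pairings could be listed so that no combination gets missed. The stick labels are hypothetical (mark the modules however you like), and it only covers which sticks go in together, not which slots they go in, since the right slots depend on the board manual.

```python
from itertools import combinations

# Hypothetical labels for the four 8GB modules; not from the original post.
STICKS = ["stick 1", "stick 2", "stick 3", "stick 4"]


def pair_test_plan(sticks):
    """Return every unique pair of sticks to test together."""
    return list(combinations(sticks, 2))


for run, (first, second) in enumerate(pair_test_plan(STICKS), start=1):
    # Each run: fit only these two modules, run the stress test, note pass/fail.
    print(f"Run {run}: fit {first} + {second}, run the stress test, note the result")
```

With four sticks that gives six runs, and each stick appears in three of them, so a single bad module should show up as a failure in exactly the runs that include it.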

Edit: I can't go back to the old BIOS because Asus kindly removed all previous versions from their site. I did keep a backup of the old BIOS file, but the new BIOS simply says 'Not a recognised BIOS file' when I try to use it. FFS Asus!

Edit 2: I booted a Linux live USB just fine (Artix), and passed dozens of runs of Prime95 (blend, small and smallest FFTs). It also passed a stress test of the RAM directly. I booted into Windows and RealBench crashed with instability detected (7z). I rebooted into BIOS, changed DOCP to Default, which set RAM to 2133MHz at 1.35v and fabric speed to 1200MHz. Booted into Windows and RB still crashes after a few seconds with instability detected (7z). It's gotta be the RAM, right?!
 
Honestly first thought after reading that was it might actually be the motherboard and/or ram slots rather than the ram.

It is one of those situations where a lot of testing is likely going to be needed, including different slots etc sadly.
 
I'm starting to pull out chunks of hair! See the edit I posted just after you replied. Linux seems to work fine and passes Prime95 runs and Stress. Windows reports no errors in the install (sfc /scannow) and everything's fine, except these random power offs. I'm about to physically pull the RAM and re-seat, and then start trying 2 sticks instead of 4 and alternating them (to test the RAM and slots as you suggest). Ugh.
 
Right, had a quick Google and there 'might' be a bug in RealBench (not sure if it's still around, but it's showing as quite recent in Google) relating to memory/page file sizes. Might be worth trying a different tool to check.
 
I found similar info so I double checked my pagefile... It was set to 1000MB. :o I don't know how, I don't know when. This is a new install of Windows 10 x64 for Workstations 2004. I set it to system managed/auto pagefile and rebooted, and RB works fine again! At least I don't have to tear down the system, as that would have been a huge headache. I still can't enable TSME but that's not a hardware issue, and getting support from Asus is a blood from stone job so I'll just have to keep digging. Thanks for your input mate, your reply pointed me to the answer.
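One plausible reading of why a 1000MB pagefile only bites at the 32GB setting: Windows caps total committed memory at roughly physical RAM plus pagefile size, so a test that asks for all 32GB on top of what the OS already uses has nowhere to go. The sketch below just does that arithmetic. The 4GB baseline for the OS and background apps and the 8GB "system managed" pagefile are made-up illustrative numbers, and whether RealBench really commits the full amount it is set to is an assumption, not something confirmed in this thread.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def commit_limit_bytes(physical_ram_bytes, pagefile_bytes):
    """Approximate Windows commit limit: physical RAM plus total pagefile size."""
    return physical_ram_bytes + pagefile_bytes


def fits_in_commit(request_bytes, physical_ram_bytes, pagefile_bytes,
                   already_committed_bytes=0):
    """True if the request fits under the commit limit on top of existing usage."""
    limit = commit_limit_bytes(physical_ram_bytes, pagefile_bytes)
    return already_committed_bytes + request_bytes <= limit


RAM = 32 * GIB
BASELINE = 4 * GIB  # assumed usage by the OS and background apps (illustrative)

for label, pagefile in (("1000MB pagefile", 1000 * MIB),
                        ("8GB pagefile (assumed)", 8 * GIB)):
    for test_gib in (4, 8, 16, 32):
        ok = fits_in_commit(test_gib * GIB, RAM, pagefile, BASELINE)
        verdict = "fits" if ok else "exceeds commit limit"
        print(f"{label}: {test_gib}GB test {verdict}")
```

Under those assumptions the 4GB, 8GB and 16GB runs fit either way, and only the 32GB run fails with the 1000MB pagefile, which matches the pattern described above.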
 
Might be worth shooting an email off to OCUK over the TSME issue. At the end of the day 8Pack is basically their 'own brand', so they might be able to point you in the right direction for fixing it.
 