Faulty/incompatible memory or something else...?

Hi all,

I'm experiencing a bit of system instability and wondered if you could help point me in the right direction in terms of diagnosing it. Years ago this would have probably been an easy one for me but I've been out of the game for a while so need some pointers!

System is freezing/rebooting/suffering file corruption/BSOD (IRQL_NOT_LESS_OR_EQUAL/attempting to write to read only memory) depending on what I'm doing.

It started with system freezes (unresponsive but desktop still visible on screen) in Ubuntu when training machine learning models with TensorFlow (basically high GPU usage/moderate CPU+RAM usage) and then moved on to file corruption when running TensorFlow on CPU (all 24 threads at 100% use, 60-70% RAM usage). Slightly worryingly, when I rebooted into Windows after seeing the corruption in Ubuntu, Windows told me there was OS corruption and wanted to repair my install (which failed). Removing one of the sticks of RAM resolved that issue for a while.

After reinstalling the other stick a day or so later, GTA V crashed after about 5 minutes with a BSOD. Then on reboot the system wouldn't POST. Removing the stick of RAM again allowed it to POST, but halfway through writing out this post the system rebooted itself and got stuck in a BSOD loop again.

Backing off from D.O.C.P settings to Auto (DDR4-3200 => DDR4-2133) has (so far...) allowed me to try and finish off this post.
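For a rough idea of what dropping from the D.O.C.P profile to Auto gives up, here is a back-of-the-envelope sketch. It assumes a standard 64-bit (8-byte) DDR4 channel and only uses the kit's rated 3200/C16 figures; the function names are made up for illustration, and the timings the board actually picks at 2133 are not known from this thread.

```python
def peak_bandwidth_gb_s(mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth of one memory channel in GB/s.

    Assumes a 64-bit (8-byte) wide channel; real-world throughput is lower.
    """
    return mt_per_s * bus_bytes / 1000


def cas_latency_ns(cl: int, mt_per_s: int) -> float:
    """CAS latency in nanoseconds.

    DDR transfers twice per clock, so the clock in MHz is mt_per_s / 2.
    """
    return cl * 2000 / mt_per_s


if __name__ == "__main__":
    for speed in (3200, 2133):
        print(f"DDR4-{speed}: {peak_bandwidth_gb_s(speed):.1f} GB/s per channel")
    print(f"3200 C16 CAS latency: {cas_latency_ns(16, 3200):.1f} ns")
```

So running at Auto costs roughly a third of the theoretical per-channel bandwidth, which is a fair trade while hunting for the cause of the instability.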

To me it seems obvious that this is a RAM issue, but I want to be able to say that for certain. I think a dodgy PSU could also cause issues like this, or maybe even the CPU (now that the memory controller is part of the chip). Temperatures don't seem unreasonable (~50 degrees as reported by Ryzen Master while typing this). Unfortunately I don't have any spare components I can swap in to narrow it down that way.

I thought I'd start by running a RAM check. In the old days I'd run loops of memtest86+ but I gather that's a bit out of date now?

Basic system specs:

AMD 3900X
ASUS B550-F motherboard
2x32GB Corsair Vengeance LPX (3200/C16)
NVIDIA 2070 Super
650W Antec TruePower PSU

Any help/tips/pointers would be much appreciated!

Thanks :)

TL;DR: how do I check my RAM on a modern Ryzen-based system?
 
Memtest86 and RAM Test by Karhu are my two go-to choices.

That should immediately determine if your RAM is faulty.
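To give a feel for what these tools do under the hood, here is a toy sketch of a write-then-verify pattern test. It is only an illustration (the function name and defaults are made up): it runs in userspace, so it only touches whatever memory the OS hands the process, and it is no substitute for booting Memtest86 from USB.

```python
def pattern_test(size_bytes: int = 64 * 1024 * 1024,
                 patterns=(0x00, 0xFF, 0xAA, 0x55)) -> int:
    """Fill a buffer with each byte pattern, read it back, and
    return the total number of bytes that did not match."""
    buf = bytearray(size_bytes)
    errors = 0
    for p in patterns:
        # Write the pattern across the whole buffer.
        buf[:] = bytes([p]) * size_bytes
        # Read back: every byte should still equal the pattern.
        errors += size_bytes - buf.count(p)
    return errors


if __name__ == "__main__":
    print("mismatched bytes:", pattern_test())
```

Real testers go a lot further (walking bits, address-dependent patterns, many passes over all physical memory), which is why they catch faults a quick check like this would miss.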
 
Thanks! Just downloaded the latest memtest86 and started running it on one stick at D.O.C.P settings.

Interestingly I also just noticed that Bluetooth wasn't working - I had to disable and then re-enable it in the BIOS before it came back. I'm sure it was working fine before today's BSOD...
 
aaand 10994 errors in memtest on the first pass already!

If I test the other stick and it works fine I guess it can be reasonably assumed that it's a faulty stick and not incompatibility, but if both sticks fail it could be two faulty sticks, or incompatibility, or even still something else?
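Spelling that reasoning out as a small sketch (the function and its wording are purely illustrative, assuming each stick is tested on its own at the same D.O.C.P settings):

```python
def interpret(stick_a_passes: bool, stick_b_passes: bool) -> str:
    """Map single-stick memtest results to the most likely explanation."""
    if stick_a_passes and stick_b_passes:
        # Neither stick fails alone, so the DIMMs themselves look OK.
        return "both pass alone: look at seating, slots, the pair together, or something else"
    if stick_a_passes != stick_b_passes:
        # Exactly one stick fails under identical conditions.
        bad = "B" if stick_a_passes else "A"
        return f"stick {bad} is probably faulty"
    # Both fail, which a single test cannot disambiguate.
    return "both fail: two bad sticks, incompatibility, or something else (CPU/board/PSU)"


if __name__ == "__main__":
    print(interpret(False, True))
```

Only the "one passes, one fails" case points clearly at a single stick; the other two outcomes need more testing to pin down.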
 
BIOS flashed successfully but errors persist with 1 stick at D.O.C.P settings :(

Trying with auto RAM settings now, but not sure what that will tell me if it passes. Maybe the more useful test is for me to try the other stick on its own at the rated speed tomorrow.
 
PSU is near enough 10 years old now.

  • Every time I ran memtest on one stick it failed within 2-3 minutes at D.O.C.P settings, but passed memtest at default (auto) settings
  • Increasing voltage didn't help (from 1.35v to 1.39v). Quite the opposite - the BIOS interface switched to a mix of English, French and Chinese and locked up there a few times. Reverted to 1.35v and the BIOS was fine again.
  • Replaced the seemingly dodgy stick with the other one from the matched pair and it passed memtest, no errors at all even at D.O.C.P settings
  • Added the other stick and running memtest now. 1 pass completed successfully.

From the above it seems like the options are:
  • Overheating - not convinced on this one as it's similar ambient temp today and all fans are/were working
  • Badly seated DIMM - seems like the most likely cause but curious how it would be fine sometimes but not others
  • PSU - also seems like a plausible cause. Could explain why this issue isn't constant, and may also explain the complete lock-ups I've experienced before (system goes from fine to a complete freeze/totally unresponsive without any other apparent corruption or preceding instability)
 
Just had to RMA a kit of Corsair memory myself (only 1 year old), one module obviously faulty. It happens... Handy lifetime warranty though and a couple of days after raising the RMA, Corsair have just stuck a replacement memory kit into the post for me.

Interestingly though, while one module failed repeatedly on the Windows 10 Memory Diagnostic Tool, it would pass several hours of MEMtest Pro without an issue. Definitely a faulty memory module though, as the system was 100% stable running the one module that tested OK, but very quickly collapsed in a heap on the obviously faulty module. Strange but true.
 