Consistent BSOD in "Control" - 5900X/3090

Hello all,

Just completed my first build in 10 years and have unfortunately been plagued with a BSOD issue.
Specs are:

AMD R9 5900X
ASUS STRIX X570-E Motherboard
64GB Corsair Vengeance PRO 3600MHz (2x 32GB packs)
Nvidia RTX 3090FE
Sabrent Rocket 4.0 1TB
Corsair HX 850W PSU (left over from my old build - 10 years old)
NZXT Z73 AIO (3x 120mm)
NZXT H710

Windows 10 x64 (20H2)
Nvidia driver version 457.51
BIOS version 2816 (beta) / 3001

Symptoms:

The issue only started on day 2 of the build.
Every time I boot into Control I get a consistent BSOD within 1-2 minutes (typically immediately or within 10-15 seconds). The BSOD reason is the ever-helpful "WHEA_UNCORRECTABLE_ERROR" and it reboots.

I have been able to cause this to happen one other time during a stress test of 3DMark Port Royal, but only after 60+ minutes.

Other than the BSOD, the only out-of-the-ordinary thing is the noticeable coil whine on the 3090 while it's under load.

Troubleshooting so far:
  1. Swapped the DIMM pairs (4 DIMMs in total): ran with set A, then set B
  2. Ran with 1x DIMM (tested all 4) in the primary slot (A2)
  3. Turned off D.O.C.P and manually set voltage (1.35v) and speed (3600MHz)
  4. Turned off D.O.C.P completely and ran at stock settings (2866MHz)
  5. Cleared CMOS and left at optimised defaults
  6. Downgraded Nvidia driver version to 457.09 after a DDU clean
  7. Reformatted and reinstalled Windows 10, installed only the AMD Chipset driver and Nvidia graphics driver.
  8. Upgraded to BIOS version 3001 (just released for this board)
Under each of the above the BSOD in "Control" occurs.

I am dreading the RMA of a £1400 GPU. But it appears to be either a card issue or a PSU load issue (the aging Corsair HX 850W may just not be able to keep up with the load). However, this doesn't explain why it took an hour of Port Royal to trigger the issue, whereas it happens almost immediately in Control.

I'm going to try and do the following:
  1. Memtest86 on all 4 DIMMs overnight
  2. Trying to replace that aging PSU with an ASUS ROG 1200W (only PSU I can even remotely get my hands on right now - if the PC component gods are kind to me)
  3. Test the issue with my GTX 980 - unfortunately that is in no way comparable with the RTX 3090 but it's all I have (anyone in East London fancy lending me their 3000 series card for testing?)
  4. RMA the 3090FE - Absolute last resort :(

Anyone have any other ideas? or next steps?
 
UPDATE #1
Ran MemTest86 throughout the day - All tests were selected, and I ran 4x passes of each (including the fabled "Hammer" test). Zero errors returned after 7:16:53 of testing. I also re-instated the XMP/DOCP profile before the test - so it was running at 3600MHz/1.35v - this pretty much rules the memory out at this point.

I'm currently stressing the CPU with an hour long run of CPU-Z stress test/FurMark CPU Burner - no issues thus far - CPU is maxed at 100% on all cores/threads and is topping out at 77C with 39C liquid temperature.

Later I'm going to:
  1. DDU and upgrade to Nvidia 460.79 driver version
  2. Remove the 3090 from slot #1 and move it to slot #2
  3. Try reseating it in slot #1 again if the problem goes away.
  4. Remove my nice fancy ATX and EPS custom cables (I really want to avoid this, as it's effectively disassembling the PC, but needs must)
  5. Hotwiring the GPU to run off a secondary PSU if I can - might need to scavenge a decent watt one from my server for this job.
More updates to follow.
 
UPDATE #2
Spent the evening moving the card between PCI-E slots. At first it looked as if the second slot didn't produce a BSOD, but after 20-30 minutes it did the same thing.

The PSU left over from my old build has two PCI-E 8-pin connectors that are "built-in" (i.e. not modular) and modular sockets for additional ones if required. Thinking I might just be pushing one rail of the PSU too hard, I moved one of the PCI-E connectors for the 3090 FE onto another (modular) socket.

This seems to have fixed the problem - so I expect I was just pushing an aging PSU a little too hard and it was causing power dips resulting in the BSODs.
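
For anyone wanting to sanity-check the same theory, here is a rough back-of-envelope sketch of the 12V current involved. It is only an estimate under stated assumptions, not a measurement: the 350W figure is the card's rated board power, the slot is assumed to supply its full 75W allowance, the remainder is assumed to split evenly across the two 8-pin connectors, and the source names ("built-in", "modular") are just labels I've made up for the two groups of PSU outputs. Whether those groups actually sit on separate rails inside this PSU is something I haven't confirmed.

```python
# Rough 12 V load estimate for a GPU fed by two 8-pin PCI-E connectors.
# All wattages are nominal spec figures, not measurements from this build.

BOARD_POWER_W = 350.0  # assumption: rated board power of an RTX 3090 FE
SLOT_POWER_W = 75.0    # assumption: the PCI-E slot supplies its full 75 W allowance
RAIL_VOLTAGE = 12.0


def amps_per_source(connector_to_source):
    """Return the 12 V current drawn from each PSU source.

    connector_to_source maps each 8-pin connector name to the PSU output
    (cable group or rail) it is plugged into. The load left over after the
    slot's share is assumed to split evenly across the connectors.
    """
    connectors = list(connector_to_source)
    watts_each = (BOARD_POWER_W - SLOT_POWER_W) / len(connectors)
    totals = {}
    for connector in connectors:
        source = connector_to_source[connector]
        totals[source] = totals.get(source, 0.0) + watts_each / RAIL_VOLTAGE
    return totals


# Hypothetical labels: both connectors on the built-in cabling (original setup)
original = amps_per_source({"8pin_a": "built-in", "8pin_b": "built-in"})
# One connector moved to a modular socket (the change that stopped the BSODs)
rebalanced = amps_per_source({"8pin_a": "built-in", "8pin_b": "modular"})

print(original)
print(rebalanced)
```

With those assumptions the original layout pulls roughly 23A through the built-in cabling, while the rebalanced layout pulls about 11.5A through each source. Short spikes above the rated board power would push both numbers higher, so treat these as a floor rather than a worst case.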

The (competitor) order of the ASUS ROG 1200W PSU was fulfilled today, which I expect will resolve the problem once and for all. I'll keep this thread updated once the new PSU arrives, but I think the source of the problem has been found.
 
Indeed, the PSU replacement fixed it. However, I'd already fixed it before the new 1200W PSU arrived by splitting the 3090's power connectors across different ports on my older 850W PSU.
I guess in the original configuration it was pushing a single rail too far - moving one power connector from the built-in cabling to a modular connector fixed the problem.

Might be worth giving that a go yourself.
 
If you were daisy-chaining the connection and drawing the load on one cable, two separate ones would most certainly make a difference on a high load card.

It wasn't strictly a daisy chain. The Corsair HX850W was semi-modular, so I was using both of the built-in PCI-E connectors.
 