Consistent BSOD in "Control" - 5900X/3090

Hello all,

Just completed my first build in 10 years and have unfortunately been plagued with a BSOD issue.
Specs are:

AMD R9 5900X
ASUS STRIX X570-E Motherboard
64GB Corsair Vengeance PRO 3600MHz (2x 32GB packs)
Nvidia RTX 3090FE
Sabrent Rocket 4.0 1TB
Corsair HX 850W PSU (left over from my old build - 10 years old)
NZXT Z73 AIO (3x 120mm)
NZXT H710

Windows 10 x64 (20H2)
Nvidia driver version 457.51
BIOS version 2816 (beta) / 3001

Symptoms:

This issue only started happening on day 2 of the build.
Every time I boot into Control I get a consistent BSOD within 1-2 minutes (typically immediately or within 10-15 seconds). The BSOD reason is the ever-helpful "WHEA_UNCORRECTABLE_ERROR", and then it reboots.

I have been able to cause this to happen one other time during a stress test of 3DMark Port Royal, but only after 60+ minutes.

Other than the BSOD, the only out-of-the-ordinary thing is the noticeable coil whine from the 3090 while it's under load.

Troubleshooting so far:
  1. Swapped the DIMM pairs - ran with set A, then set B
  2. Ran with 1x DIMM (tested all 4) in the primary slot (A2)
  3. Turned off D.O.C.P and manually set voltage (1.35v) and speed (3600MHz)
  4. Turned off D.O.C.P completely and ran at stock settings (2866MHz)
  5. Cleared CMOS and left at optimised defaults
  6. Downgraded the Nvidia driver to 457.09 after a DDU clean
  7. Reinstalled Windows 10, with only the AMD chipset driver and Nvidia graphics driver installed.
  8. Upgraded to BIOS version 3001 (just released for this board)
Under each of the above, the BSOD in "Control" still occurs.

I am dreading the RMA of a £1400 GPU. But it appears to be either a card issue or a PSU load issue (the aging Corsair HX 850W), which may just not be able to keep up with the load. However, this doesn't explain why running Port Royal for an hour caused the issue, whereas it's consistent in Control almost immediately.
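As a rough sanity check on the PSU theory, some back-of-envelope maths (all the figures below are ballpark assumptions from reviews, not measurements from my system):

  # Rough power budget in Python - every number here is an assumption
  cpu_w         = 140   # 5900X running up against its ~142W PPT limit
  gpu_sustained = 350   # 3090 FE rated board power
  gpu_spike     = 550   # transient spikes reported in reviews
  rest_w        = 75    # board, drives, fans, AIO pump - a guess

  print("sustained:", cpu_w + gpu_sustained + rest_w, "W")   # ~565W
  print("transient:", cpu_w + gpu_spike + rest_w, "W")       # ~765W

On paper 850W covers both, but a 10-year-old unit that has lost some headroom, with the GPU load concentrated on its built-in cables, could plausibly dip during a transient like that.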

I'm going to try and do the following:
  1. Memtest86 on all 4 DIMMs overnight
  2. Try to replace that aging PSU with an ASUS ROG 1200W (the only PSU I can even remotely get my hands on right now - if the PC component gods are kind to me)
  3. Test the issue with my GTX 980 - unfortunately that is in no way comparable with the RTX 3090, but it's all I have (anyone in East London fancy lending me their 3000 series card for testing?)
  4. RMA the 3090FE - Absolute last resort :(

Anyone have any other ideas or next steps?
 
WHEA errors with a 3rd gen ryzen usually point to memory issues (well, flaky bios support for memory anyway).

Anything above 3200 can be problematic right now so I'd suggest underclocking to 3200 c16, SOC 1.1v and ddr around 1.35/1.4v (or whatever your xmp is rated for).

I think it's less likely to be gpu related. Can you test something that mostly stresses the gpu just to rule it out?
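If you want more detail than the stop code, the WHEA-Logger entries in the Windows System event log sometimes name the error source (cache/core, bus/interconnect, PCIe), which helps separate memory/fabric problems from GPU ones. A rough sketch for dumping the most recent entries, assuming you have Python to hand (the same wevtutil query also works on its own from an admin command prompt):

  import subprocess

  # Pull the 10 newest WHEA-Logger records from the System log as plain text
  query = "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]"
  result = subprocess.run(
      ["wevtutil", "qe", "System", "/q:" + query, "/c:10", "/rd:true", "/f:text"],
      capture_output=True, text=True, check=True,
  )
  print(result.stdout)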

If it helps, I'm on a similar setup and Control seems stable (5900x, MSI x570 Ace, 4x8GB sticks of 3600 c14, 3080). My current kit is running at xmp, but I had big problems overclocking my old 3200 c14 kit.
 
UPDATE #1
Ran MemTest86 throughout the day - all tests were selected, and I ran 4x passes of each (including the fabled "Hammer" test). Zero errors returned after 7:16:53 of testing. I also reinstated the XMP/DOCP profile before the test - so it was running at 3600MHz/1.35v - which pretty much rules the memory out at this point.

I'm currently stressing the CPU with an hour-long run of the CPU-Z stress test/FurMark CPU burner - no issues thus far - the CPU is maxed at 100% on all cores/threads and is topping out at 77C with a 39C liquid temperature.

Later I'm going to:
  1. DDU and upgrade to Nvidia 460.79 driver version
  2. Remove the 3090 from slot #1 and move it to slot #2
  3. Try reseating it in slot #1 again if the problem goes away.
  4. Remove my nice fancy ATX and EPS custom cables (I really want to avoid this, as it's effectively disassembling the PC, but needs must)
  5. Hotwire the GPU to run off a secondary PSU if I can - might need to scavenge a decent-wattage one from my server for this job.
More updates to follow.
 
UPDATE #2
Spent the evening moving the card around between PCI-E slots; it looked as if the second slot didn't produce a BSOD, but after 20-30 minutes it did the same thing.

The PSU left over from my old build has two PCI-E 8-pin connectors that are "built-in" (i.e. not modular) and modular cables for additional ones if required. Thinking I might just be pushing one rail of the PSU too hard, I moved one of the PCI-E connectors for the 3090 FE onto another (modular) socket.

This seems to have fixed the problem - so I expect I was just pushing an aging PSU a little too hard and it was causing power dips resulting in the BSODs.

The order for the ASUS ROG 1200W PSU (from a competitor) was fulfilled today, so I'm expecting that to resolve the problems once and for all. I shall keep this thread updated once the new PSU arrives, but I think the source of the problem has been found.
 
I think, given the last update, yes, the PSU change solved the issue (he said as much). David has posted once since then in another topic (on the 22nd) and hasn't reported any further problems.
 
Indeed, the PSU replacement fixed it. However, I'd actually fixed it before the new 1200W PSU arrived by balancing the 3090 power connectors across different ports on my older 850W PSU.
I guess in the original configuration it was pushing a single rail too far - moving one power connector from the built-in cabling to a modular connector fixed the problem.

Might be worth giving that a go yourself.
 
If you were daisy-chaining the connection and drawing the load on one cable, two separate ones would most certainly make a difference on a high load card.
 
I think a lot of older or somewhat weaker power supplies struggle with the spikes that can be seen with the 3080/90.
 
If you were daisy-chaining the connection and drawing the load on one cable, two separate ones would most certainly make a difference on a high load card.

It wasn't strictly a daisy chain. The Corsair HX850W is semi-modular, so I was using both of the built-in PCI-E connectors.
 
It wasn't strictly a daisy chain. The Corsair HX850W is semi-modular, so I was using both of the built-in PCI-E connectors.

Wow sounds like you went through quite a testing procedure to figure this out. Glad you did, I think there's a few here that have had issues and your example might help them.
 
I have the same power supply running on a new 5800x build, BUT I am still using my old GPU (EVGA 1070), so I'll definitely upgrade my power supply when I get a new GPU, which I am planning to do once stock levels settle down a bit.

Edit: what I meant to say is that I was already thinking about replacing the PSU anyway, and this thread confirms it, so thank you for posting.
 
The RTX 3090 has transient spikes of over 550W.
https://www.igorslab.de/en/nvidia-g...-and-common-decadence-if-price-is-not-all/16/
Besides being stressful for the PSU in general, those spikes are certainly a high risk for problems if you use a split-from-the-end (daisy-chained) cable instead of a separate cable for every power connector on the card.
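To put rough numbers on it (ballpark assumptions, not measurements):

  # Current per cable at 12V during a spike - all figures assumed
  spike_w = 550   # reported 3090 transient
  slot_w  = 75    # maximum drawn through the PCIe slot itself
  cable_w = spike_w - slot_w

  print("one daisy-chained cable:", round(cable_w / 12), "A")       # ~40 A
  print("two separate cables:    ", round(cable_w / 2 / 12), "A")   # ~20 A each
  # For reference, an 8-pin PCIe connector is specced at 150W, i.e. roughly 12.5 A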

Sounds like terrible design on Corsair's part. Is it a single 12V rail PSU or multi? If it's multi-rail and they're both on the same rail, that's stupid, but an internal daisy chain is even worse.
All PSUs have their 12V wires connected to the same source inside the PSU.
Those "multiple rails" are nothing more than lots of smaller fuses in the fuse box instead of one big main fuse.
 
You can also try undervolting the GPU to check whether the PSU is at its limit even while using separate connectors. 850W is a bit close for a 3090, though.
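Proper undervolting is done through the voltage/frequency curve in something like MSI Afterburner, but as a quicker test of whether power draw is the trigger you could temporarily cap the board power with nvidia-smi (needs an elevated prompt, and the 300W below is just an example value, not a recommendation). A minimal sketch, assuming Python:

  import subprocess

  # Show the card's current/default/min/max power limits first
  subprocess.run(["nvidia-smi", "-q", "-d", "POWER"], check=True)

  # Temporarily cap board power to 300W (example value); revert by setting
  # it back to the default limit reported above
  subprocess.run(["nvidia-smi", "-pl", "300"], check=True)

If Control stops crashing with the card capped, that points at load/transient spikes rather than faulty silicon.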
 

All PSUs have their 12V wires connected to the same source inside the PSU.
Those "multiple rails" are nothing more than lots of smaller fuses in the fuse box instead of one big main fuse.

Yup, this is true - how many "rails" a PSU has has nothing to do with what it can deliver to your hardware. It's just marketing ********.
 
I am no PSU expert or power electronics engineer. But having the 12V line split into multiple rails or lines has certain advantages - less stress on the power electronics and components of each individual line, for one.

Yes, everything has to come from the same plug hole or 3-pin. But having daisy-chained PCIe connectors on the older PSU isn't a particularly good idea for a modern GPU. Those older PSUs were never really intended for a GPU drawing 300W+ through daisy-chained connectors on a single power line.
 
I am no PSU expert or power electronics engineer. But having the 12V line split into multiple rails or lines has certain advantages - less stress on the power electronics and components of each individual line, for one.
You can't ever have multiple rails when there's only one 12V supply.
Or do you claim your house has as many power lines coming into it as there are fuses in the fuse box?
 
You can't ever have multiple rails when there's only one 12V supply.
Or do you claim your house has as many power lines coming into it as there are fuses in the fuse box?
A house fuse box has multiple RCDs, each protecting one circuit - the same analogy as a PSU's separate 12V rails. Those per-circuit RCDs ensure each circuit is protected against overload/spikes.
A house fuse box also has a higher-duty RCD sitting above all the circuit RCDs - the same analogy as the PSU's protection circuit at the power input. That ensures the whole house/PC is protected against power spikes etc.

What you're saying is that there is no split of loads within the fuse box, which is wrong. There should be, and there is in current PSU designs.

If you do not split the 12V lines, it can have a pretty nasty outcome. In the old days, GPUs didn't draw much current from the secondary PCIe connector, which is why it was daisy-chained off the end of a single PCIe cable. Not so these days. So if you only have a single protection circuit for a single 12V line, you're pretty much screwed: if one component spikes, it will knock out and damage everything else on that line as well.

Have a read of the article below.

https://hexus.net/tech/tech-explained/psu/66781-tech-explained-need-know-psus/

What about multiple 12V rails?
If you already know a thing or two about power supplies, you'll be aware that manufacturers and users have in recent years been touting the benefits of either having a single 12V rail, or multiple 12V rails. Trouble is, if both of them are said to offer benefits, which allocation of rails is right for you, and why did we go from single to multiple in the first place?

The answer is simpler than you'd think. With modern-day components placing a greater demand on the 12V rail, Intel's ATX specification was amended to suggest that PSUs should feature two 12V rails with independent over-current protection for safety reasons. By limiting the flow of amps in each rail, there's less chance of wires becoming dangerously hot.

Multiple rails are a good idea, then, but unfortunately for Intel's specification, a few poorly-constructed power supplies gave multiple rails a bad reputation that continues to linger. Instead of separating PCIe connectors across multiple rails, some PSUs were found to feature all of their available PCIe connections on one rail - resulting in an overload when multiple components were attached, ensuring an automatic shutdown.

Fortunately, that inappropriate layout is becoming a rarity, and most PSUs, such as the Enermax line, feature one 12V rail solely for PCIe connectors, and another for the PC's other components.

The bottom line? For the vast majority of users, there's no perceivable difference between a single- or multiple-rail PSU. What's of greater importance is the PSU's range of connectors.
 