3080 FE - Black screen restarts

Associate | Joined: 10 Feb 2021 | Posts: 608
I completed my new system build at the start of Feb when I managed to obtain an RTX 3080 FE. Full spec:

CPU: 5900x
Cooler: Dark Rock 4 Pro
Mobo: MSI X570 Tomahawk
RAM: Crucial Ballistix Red RGB 2x 16GB 3600 CL16
SSD: Samsung 970 Evo Plus 1TB
PSU: Corsair RM750i
Case: Fractal Design Define 7 Compact (w/ 4x 140 & 1x120 Fans)
GPU: nVidia RTX3080 Founders Edition (connected using Corsair 12-pin cable)

Since last Wednesday I have been having random black-screen reboots. I had updated to the latest Nvidia drivers earlier that day.

The crashes usually happen in-game, usually within a few seconds of the game fully loading. However, they have also occurred at the desktop while just web browsing.

Here is what I have tried so far:

1.) Reverted the driver to the previous version (using the Device Manager roll-back button)
2.) Clean-installed the latest drivers
3.) Disabled the GeForce Experience overlay and updated it to the latest version
4.) Ran the card at stock clocks instead of its usual undervolt of 850mV @ 1850MHz
5.) To rule out the CPU, ran it at stock instead of the usual -30 curve (ran Prime95 for 4hrs at the usual undervolt to confirm it is stable)
6.) To rule out the RAM, ran it at stock instead of XMP (ran Memtest at both stock & XMP settings; 0 errors after 5hrs+)
7.) Confirmed all physical cables are secure, and reseated the card in the PCIe slot to ensure a good fit

So far none of these tests or fixes have worked. Most annoyingly, I can't reliably cause the issue. On Sunday I had 3 black-screen reboots while just using Firefox (trying to complete the census of all things!)
The day before I had 2 black-screen reboots trying to play Hitman 2, then was able to play fine for 2hrs+ on the third attempt.
The day before that I attempted to run Prime95 + FurMark. On the first attempt, after less than a minute of both running, the system rebooted. When it came back up, it ran fine with both CPU & GPU @ 100% load for over 30 minutes.

On Sunday I ran 5hrs+ worth of Memtest86 (both at stock & XMP) to confirm the RAM is 100% fine, followed by 4hrs of Prime95 with the CPU at its usual -30 undervolt (so the system was online for 9hrs+). Then I browsed the web for a bit before going to test FurMark one more time... and again, a black-screen reboot within ~1 minute.

All this leads me to believe the issue is the RTX 3080 FE. I know some people will point fingers at the PSU, but if the PSU was the issue I'd expect it to crash reliably at very high power loads.
I do have a nearly 10-year-old HX750 I could test with too, but would rather avoid doing so, as that thing is long past its warranty period.

As these are black-screen reboots, I get nothing useful in Event Viewer. Just an Event ID 41, Task 63, indicating that the previous shutdown was not clean. Nothing to indicate what caused the issue.
I disabled automatic restart in Windows, but it still restarts anyway.
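
In case anyone wants to sanity-check the same thing, something like this should pull the last few Kernel-Power 41 entries and show whether a bugcheck code was recorded (a code of 0 means Windows never got as far as a crash dump, i.e. the power just dropped rather than the driver blue-screening). Just a rough, untested Python sketch around wevtutil:

[code]
# Pull the last few Kernel-Power 41 events from the System log via wevtutil
# and print the timestamp plus BugcheckCode. A BugcheckCode of 0 means no
# bugcheck was recorded, i.e. the machine lost power / hard-reset rather
# than blue-screening.
import re
import subprocess

QUERY = "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] and (EventID=41)]]"

xml = subprocess.run(
    ["wevtutil", "qe", "System", f"/q:{QUERY}", "/f:xml", "/c:3", "/rd:true"],
    capture_output=True, text=True,
).stdout

for event in re.findall(r"<Event.*?</Event>", xml, re.S):
    when = re.search(r"SystemTime=['\"]([^'\"]+)", event)
    code = re.search(r"<Data Name=['\"]BugcheckCode['\"]>([^<]*)</Data>", event)
    print(when.group(1) if when else "?",
          "BugcheckCode:", code.group(1) if code else "not present")
[/code]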

I am at my wits' end as to what the issue could be. The system ran 100% reliably for nearly 1.5 months. Since Wednesday I barely trust it...

My next test is going to be a clean install of Windows on a different SSD, install just the bare minimum and see if I get crashes the same way.

Appreciate any tips or pointers any of you guys may have!
 
I would look to the PSU; these cards have transient spikes as high as 1000W (on the 3090), so a good 850W unit is ideally needed (and not all 850W units qualify).

FurMark is blacklisted by Nvidia and AMD; their cards throttle when they detect it, so it's pointless.
 
Why does everybody go straight to the PSU? When googling, I have found very few, if any, threads where anybody actually confirms that buying a larger PSU fixed the issue. I find more results where that DIDN'T fix the problem. Just glance over at the SFF community, where they typically run 600-750W supplies, yet there are plenty of builds with 3080s and even 3090s... I think this "spikes" thing with 30-series cards is rather overblown...

The 750i has monitoring too, and the highest load I have ever seen on the OUT side is 530W, and that was with the CPU & GPU both at 100% load, so totally unrealistic.
Gaming, the highest I have seen is about 490W on the OUT side. That leaves over 260W for spikes... and that's assuming the PSU trips right at 750W...

I changed the 750i from multi-rail OCP to single-rail OCP too, but this also made no difference.

However, that does not rule out the PSU being faulty. Aside from putting as high a load as possible on the system, can you think of any other way to test the PSU?

If FurMark is blacklisted, why does it still generate a 100% load on the GPU, result in very high power draw, and put out more heat than any game I can run?
Would there be another tool you could suggest instead?
 
Any sort of reboot or shutdown indicates a PSU problem. I had a faulty 3080 FE that was getting black screens but no reboots, and I would also get a display driver error in Event Viewer.
 
The PSU should be fine; maybe borderline, but since it's an FE and you've tried undervolting, I can't see the size being the issue. Maybe just a faulty unit, if the PSU is the problem at all.

Assume you're using 2 separate cables into the connector?

I've had a similar issue before with a faulty M.2 slot (the drive was fine) but event viewer at least said it was unable to write the memory dump. If you aren't getting event viewer errors or memory dumps then it sounds like the power is just dropping.
 
Because it is all over the forum, with others having similar issues until they got a new PSU. Best of luck with your issue...

@ Mesai - He already stated he is using the Corsair 12pin.
 
So what's the best way to rule out this being a PSU problem?
Should I risk using my old Corsair HX750? It's 10 years old now, hasn't been in full-time use for 4 years, and is only used on occasion to test older builds etc.
Just not sure I want to trust my brand new hardware to such an old unit... :|
Other than that, I have a 2nd PC with an RM650 PSU in it that's also about 5yrs old.

Perhaps I could connect the RM650 just to the mainboard and leave the 750i powering the 3080 (so it has the full supply to itself)? Or the other way round, to rule out the 750i being the problem?

I'd like to avoid buying a new PSU needlessly, especially as I like the monitoring stuff on the 750i, and that gets rare and expensive the higher up you go (AX860 maybe?).
 
Have you got it set in single-rail mode?
edit: sorry, you already said you have done that.
 
Have you checked the event viewer to see if any BSODs are logged?
Yes, unfortunately nothing but Event ID 41, Task 63 is logged on reboot. This is logged when the computer is starting back up, so it does not contain anything useful about why it rebooted :(

I also tried logging the sensor data using GPU-Z, and did manage to have it running when a crash occurred. However, nothing jumped out at me as being out of line just before the crash.
Power, fans, temps, load etc. were all stable for a few minutes beforehand.
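
If anyone wants to look at the same GPU-Z log, something like this dumps the last few samples before the log cuts off (the log simply stops at the reboot, so the last rows are the seconds leading up to the black screen). The file name and column names below are placeholders - GPU-Z names the columns differently depending on card and version, so match them to your own header row:

[code]
# Print the last few rows of a GPU-Z "Log to file" sensor log.
import csv

LOG = "GPU-Z Sensor Log.txt"              # placeholder: whatever path/name you logged to
COLUMNS = ["GPU Temperature [°C]",        # placeholders: copy the exact names
           "Board Power Draw [W]",        # from your own log's header row
           "GPU Voltage [V]"]

with open(LOG, newline="", encoding="utf-8", errors="replace") as f:
    reader = csv.DictReader(f, skipinitialspace=True)
    # GPU-Z pads fields with spaces, so strip keys and values while reading.
    rows = [{(k or "").strip(): (v or "").strip() for k, v in row.items()} for row in reader]

# The last few rows are the final samples logged before the crash.
for row in rows[-10:]:
    print(row.get("Date", "?"), {c: row.get(c, "n/a") for c in COLUMNS})
[/code]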
 
Read what I said here:

https://www.overclockers.co.uk/forums/threads/3080-driver-update-black-scree.18923153/

It may help.
 
Does GPU-Z log GPU voltages like HWInfo? Were those all stable? What about using HWInfo to monitor the PSU voltages? The RM750i allows you to monitor the PSU in exquisite detail if you connect it to an internal USB 2.0 header using the Corsair Link cable, and HWInfo will show you those stats.

Sorry if I missed it, but check that you are using two separate PCI-E power cables for the graphics card.
 
So to confirm: you removed the drivers in safe mode using DDU, then re-installed clean using the DCH version of the latest driver?
DDU in safe mode, then install the Standard driver, NOT the DCH one. Have the internet unplugged when doing it, but download the drivers first of course, then install them after DDU.
 
Just to confirm - is your Corsair RM750i brand new?

I would try your 10-year-old PSU. Recently I had a Gold EVGA with a 5 or 7 year warranty that failed after 2 to 3 years - I started getting black screens/reboots randomly, even when the computer was not stressed (i.e. just in Windows). The RMA confirmed it was at fault - whilst waiting I bought an 850W Gold as a replacement (for my Red Devil RX 5700 XT). GPUs nowadays do seem to have large power spikes and are increasingly wanting more and more power (and cooling, and getting physically larger etc.).
 
Does GPU-Z log GPU voltages like HWInfo? Were those all stable? What about using HWInfo to monitor the PSU voltages? The RM750i allows you to monitor the PSU in exquisite detail if you connect it to an internal USB 2.0 header using the Corsair Link cable, and HWInfo will show you those stats.

Sorry if I missed it, but check that you are using two separate PCI-E power cables for the graphics card.

Yes, GPU-Z logs voltages and the like for the GPU. It doesn't log the stuff from the 750i, but I have that connected and collect the info in HWInfo. (It's how I know what the highest loads I have seen are, and that the PSU seems stable.)

I am using the Corsair 12-pin cable. It connects to 2x ports on the PSU.
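
One more thing I can do with that HWInfo data: log it to CSV during a session and check whether the 12V output ever sags below ATX spec (around 11.4V) or the total output ever gets near 750W. Rough sketch only - the file name and the two column headers are placeholders, since HWInfo names the columns after the actual sensors:

[code]
# Scan an HWInfo CSV log for 12V sag and peak PSU output power.
import csv

LOG = "hwinfo_log.csv"             # placeholder: path set in HWInfo's logging dialog
V12_COL = "+12V [V]"               # placeholder: the RM750i 12V output column
POWER_COL = "Power (sum) [W]"      # placeholder: the RM750i total output power column

v12_min, power_max = None, None
with open(LOG, newline="", encoding="utf-8", errors="replace") as f:
    for row in csv.DictReader(f):
        try:
            v12 = float(row[V12_COL])
            watts = float(row[POWER_COL])
        except (KeyError, ValueError, TypeError):
            continue                           # skip rows that don't parse
        v12_min = v12 if v12_min is None else min(v12_min, v12)
        power_max = watts if power_max is None else max(power_max, watts)

print(f"Lowest 12V reading: {v12_min} V (ATX spec allows down to ~11.4 V)")
print(f"Highest total output: {power_max} W")
[/code]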

DDU in safe mode, then install the Standard driver, NOT the DCH one. Have the internet unplugged when doing it, but download the drivers first of course, then install them after DDU.

Ok, will try this next. What's the DCH driver for? And why avoid it?

Just to confirm - is your Corsair RM750i brand new?

I would try your 10-year-old PSU. Recently I had a Gold EVGA with a 5 or 7 year warranty that failed after 2 to 3 years - I started getting black screens/reboots randomly, even when the computer was not stressed (i.e. just in Windows). The RMA confirmed it was at fault - whilst waiting I bought an 850W Gold as a replacement (for my Red Devil RX 5700 XT). GPUs nowadays do seem to have large power spikes and are increasingly wanting more and more power (and cooling, and getting physically larger etc.).

Yes, the RM750i is brand new. All parts in this system were purchased new between Nov 20 and Feb 21. I didn't move anything over from my last build.

I am just worried about using such an old PSU that hasn't been used for a while now with my brand new gear. It has no warranty itself either, and I don't want to have issues if it were to damage any of my new stuff :|

Further testing:
Today I used the system all day just browsing etc., then played some Hitman 2. No issues. I put the system to sleep.
When I came back later, it did the black-screen reboot as soon as I got to the login screen, but before I could actually log in. First time for that...

I then used a spare SSD (non-NVMe) to install a fresh build of Windows 10 20H2. On that install I just put the latest drivers, OCCT, HWInfo and 3DMark.

I ran the PSU test, which loaded the GPU & CPU to 100%. It ran perfectly fine, with the max power draw being 490W.
After 30 minutes I decided to end the test... when I did that, it stopped and then black-screen rebooted...

It seems to happen far more when the load CHANGES than under sustained load.
I then rolled that fresh install of Windows 10 back to the 461.72 driver and tried the PSU test again... this time I couldn't get it to black-screen, even after starting/stopping the test a few times...

I did realise I was using the 461.92 DCH driver and re-installed the 461.92 Standard driver. I must have accidentally downloaded the wrong one?

Thanks all for the suggestions so far. Very frustrating to have this kind of issue 1.5 months into a new build :( My last system went 6 years without any major headaches!
 
What BIOS version are you running? I was getting some intermittent black-screen reboots which I thought were maybe my RAM or even the games themselves, but after updating to a newer beta BIOS (151 > 153, I think) for the X570 Tomahawk the issues went away. Haven't tried the newer official BIOS 1.5 or beta 1.6 yet since it seems good atm.
 
Did you definitely remove all your CPU undervolt settings? I've been messing around with CO undervolting and all the articles say that you get instability at low load/near idle long before you get any problems under high load. Especially things like going from high to low load and vice versa, which sounds a bit like what you're describing.
 
I had similar problems with a faulty X99 motherboard, but mainly freezes, not reboots, at idle or when not under 100% load. Disabling C-States (which kick in at lower loads) in the BIOS would work around the problem 99% of the time. Did you definitely test the CPU at stock without the undervolt (using the PC normally, with the problem still occurring)? If the undervolt lowers the voltage across the entire voltage curve, then a stress test would not pick up problems that occur at lower loads, as it only tests at 100% load.
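
If you want to deliberately provoke that kind of load transition rather than a sustained load, a crude approach is to alternate short bursts of all-core work with idle periods. A rough Python sketch below (only a sketch - something like OCCT will do a far better job of this):

[code]
# Crude load-transition exerciser: alternate a few seconds of all-core busy
# work with a few seconds of idle, which is roughly the high -> low -> high
# load pattern where aggressive Curve Optimizer undervolts tend to fall over.
import multiprocessing as mp
import os
import time

def burn(stop_evt):
    # Spin until told to stop; keeps one core pinned at 100%.
    x = 0
    while not stop_evt.is_set():
        x = (x * 1664525 + 1013904223) % 2**32   # cheap integer churn
    return x

if __name__ == "__main__":
    for cycle in range(30):                      # ~30 load on/off transitions
        stop = mp.Event()
        workers = [mp.Process(target=burn, args=(stop,)) for _ in range(os.cpu_count())]
        for w in workers:
            w.start()
        time.sleep(5)                            # 5 s of full load
        stop.set()
        for w in workers:
            w.join()
        print(f"cycle {cycle + 1}: load dropped, idling...")
        time.sleep(10)                           # 10 s near idle
[/code]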
 