RTX 3090 FE HELP! GPU overheating.

Associate
Joined
3 Oct 2020
Posts
29
Preamble: My 3090 FE has been running perfectly up until Easter Sunday. I have done no OC on it, left everything at stock settings since I bought it on release day. Average Temps used to be 75°C maybe 82°C on really hot summer days (UK summer averages around 25-30°C), under heavy gaming load. I played Dragon's Dogma 2 that Sunday with DLSS 'balanced' at 4K and all other settings cranked to max, no issues. Solid 80+fps for those who care.

Actual Issue: On Easter Monday the fans on the GPU started to go nuts as soon as I started up a game. A quick look at the hardware monitor showed that the temps were hitting 85°C during gaming. Right now the UK is in springtime, with ambient temps averaging 10-15°C. This is happening with pretty much all games I tried (Dragon's Dogma 2, Jedi Survivor, BG3, Fortnite). The PC was cleaned back in December, but I gave it another thorough cleaning with the compressor. I tried turning all the settings down in games and even running them in windowed mode at 1080p resolution. Still the same thing. In fact the temps climbed up to 88°C at one point. All the while the fans at stock curve settings are ramping up to 2200+RPM and sound like jet engines. I downloaded MSI Afterburner for the first time and undervolted the card. Still no change.

EDIT: I rolled back the Windows updates and Nvidia driver versions before the next couple of steps. Needless to say those steps didn't work.

Since it's been over three and a half years and Nvidia's warranty has expired, I repadded and repasted the GPU. There was the thinnest layer of dust on the board so I cleaned that and the fins+fans out (since I had it all apart anyway, it made sense). There was no change. Temps didn't get worse, they didn't get any better. Now the PC case itself didn't feel that much hotter under load so I borrowed a thermal imaging camera to get an idea of what was happening. I've linked the Google Drive folder that has screencaps of the hardware monitor compared to the thermal images I took.
https://drive.google.com/drive/folders/1ooHTvUY-GTGxWdBb9xXxxgD2F1n2M1xF?usp=sharing

NOTE 1: On the thermal images, the green crosshair reading is top left. Red crosshair is maximum temp reading (on the image somewhere), shown at the bottom left.
NOTE 2: My case is inverted. Just in case the photos throw people off. And yes I did wonder if the upside down layout could have damaged the fan bearings but surely that's a moot point since even right way up one of the fans is facing the bottom anyway?


Basically what I'm seeing on the thermal camera is that there is a drastic temperature difference between what the hardware monitor is reading and what the camera is showing me (under load). Like GPU temp is 86°C but hottest point reading on thermal is 61°C.
When idling the two readings are fairly similar.

I'm hoping someone more knowledgeable can shed some light. Is it a faulty temp sensor? Did the fan bearings get damaged overnight? Did a solder joint fail somewhere on the board?

The suddenness with which this has happened is worrying. If these are the early signs of an inevitable GPU death then I need to start making plans for what to do if it fails and for an unplanned system upgrade for the 50 series (first world problem I know).

Thank you to anyone who's read up to here.
 
Associate
OP
Joined
3 Oct 2020
Posts
29
Does anything change if you try an undervolt curve in Afterburner, so bring the card down to 900-925mV? Try it if you have not, see if it remains hot and loud, check the power draw as well if you can.
No it doesn't. It's currently undervolted to 900mV and there's been no difference. I haven't paid attention to the power draw before the issues but right now I've seen that it consistently draws 330W+. Not sure if that's higher than usual.
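
For anyone wanting to put numbers on the power draw question, here is a minimal sketch of logging it alongside temperature, clock and fan speed. It assumes the Nvidia driver's `nvidia-smi` tool is installed and on the PATH; the helper names `parse_sample` and `read_sample` are made up for illustration and are not part of any tool mentioned in this thread.

```python
import subprocess

# Standard nvidia-smi --query-gpu properties: core temp (°C), board power (W),
# SM clock (MHz) and fan speed (% of max, not RPM).
FIELDS = ["temperature.gpu", "power.draw", "clocks.sm", "fan.speed"]


def _to_float(text):
    """Convert one CSV field to float; unsupported fields such as '[N/A]' become None."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_sample(line):
    """Turn one CSV line such as '85, 331.52, 1845, 92' into a dict keyed by FIELDS."""
    values = [_to_float(v.strip()) for v in line.split(",")]
    return dict(zip(FIELDS, values))


def read_sample():
    """Query the first GPU once (hypothetical helper; needs nvidia-smi installed)."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])
```

Calling `read_sample()` once a second during the same Fortnite session, before and after a change such as the undervolt, would give a like-for-like record instead of relying on memory of what the card used to draw.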
 
Man of Honour
Joined
22 Jun 2006
Posts
11,886
Basically what I'm seeing on the thermal camera is that there is a drastic temperature difference between what the hardware monitor is reading and what the camera is showing me (under load). Like GPU temp is 86°C but hottest point reading on thermal is 61°C.
When idling the two readings are fairly similar.
Depending on where the temperature sensor is located, that's not unusual, because you can't see the hottest part of the die and the heatsink and fan assembly should cool what you can see.

Since it's been over three and a half years and Nvidia's warranty has expired, I repadded and repasted the GPU. There was the thinnest layer of dust on the board so I cleaned that and the fins+fans out (since I had it all apart anyway, it made sense). There was no change. Temps didn't get worse, they didn't get any better.
You definitely used pads that were the same size, right? There can be big temperature problems if you're out even a mm, due to poor contact.

Average Temps used to be 75°C maybe 82°C on really hot summer days (UK summer averages around 25-30°C), under heavy gaming load.

This is happening with pretty much all games I tried (Dragon's Dogma 2, Jedi Survivor, BG3, Fortnite).
Are you sure that the games you used to play are similarly demanding and you were using the same settings?

NOTE 2: My case is inverted. Just in case the photos throw people off. And yes I did wonder if the upside down layout could have damaged the fan bearings but surely that's a moot point since even right way up one of the fans is facing the bottom anyway?
Fan bearings can deteriorate faster due to orientation, but running upside down shouldn't be an issue and it doesn't sound like your bearings are the problem. Heat pipes and vapour chambers can also be affected, though if your case always ran this way up then I would assume it doesn't matter.

Do all the fans in your case still work? You haven't enabled some kind of quiet/silent mode?
 
Associate
OP
Joined
3 Oct 2020
Posts
29
Thanks for replying in detail :)

Depending on where the temperature sensor is located, that's not unusual, because you can't see the hottest part of the die and the heatsink and fan assembly should cool what you can see.
Yeah that makes sense. It's just that the delta seemed quite high (more than I was expecting) and I've never been too attentive to GPU designs and thermal emissions.

You definitely used pads that were the same size, right? There can be big temperature problems if you're out even a mm, due to poor contact.
I used pads that are 1.5mm thick, with a slightly modified layout (adding pads to places that originally didn't have pads). Original pads were 1mm. The re-padding guides I saw all mentioned that 0.5mm extra gives better contact and that the FE's pad layout didn't cover some memory modules that would benefit from it. My temps didn't get worse since re-padding. They just stayed the same.

Are you sure that the games you used to play are similarly demanding and you were using the same settings?
Fortnite is the one constant measuring stick. Since getting the 3090 the graphics settings have always stayed the same, and there was never any loud fan noise or temp spikes in the past three and a half years. It is there now whenever I play Fortnite, even on lowest settings, 1080p windowed and frame-capped to 30fps.
I had also been playing Dragon's Dogma 2 just fine a week prior to the issues(?) commencing.

Fan bearings can deteriorate faster due to orientation, but running upside down shouldn't be an issue and it doesn't sound like your bearings are the problem. Heat pipes and vapour chambers can also be affected, though if your case always ran this way up then I would assume it doesn't matter.
Yes interesting, vapour chambers+heat pipes makes sense and I didn't think about that at all. Though I don't know how to diagnose them to be faulty so I can't rule them out for sure.

Do all the fans in your case still work? You haven't enabled some kind of quiet/silent mode?
Yes they still do. I have five (I know that quantity is at the limit of diminishing returns). They were the first things I checked when I heard the fans ramp up. They were on adjusted fan curves to run slightly quieter, which I turned up to turbo mode to see if the extra airflow would affect the GPU temps, but it didn't. I have two for intake and three for exhaust.
 
Soldato
Joined
20 Sep 2006
Posts
2,804
Location
Hampshire
I'd say that there is potentially something wrong with the heatpipes or vapor chamber. From what I can see, your images show that only part of your heatsink is getting hot, whilst the other half stays pretty cool. You've also re-done thermal paste and pads.

Would it be possible to find a broken 3090 or just a 3090 FE cooler (without the PCB) to put on it and compare?
 
Man of Honour
Joined
22 Jun 2006
Posts
11,886
Yes interesting, vapour chambers+heat pipes makes sense and I didn't think about that at all. Though I don't know how to diagnose them to be faulty so I can't rule them out for sure.
You could try laying the case on its side and see if it makes any difference, though that wouldn't identify if they were faulty, just if the orientation matters.
 
Soldato
Joined
20 Sep 2006
Posts
2,804
Location
Hampshire
You could try laying the case on its side and see if it makes any difference, though that wouldn't identify if they were faulty, just if the orientation matters.
I was also going to suggest trying a different orientation for the case. Some GPUs have way better thermals when oriented differently due to the way the heatpipes are facing. Might be worth a shot.
 
Associate
OP
Joined
3 Oct 2020
Posts
29
You could try laying the case on its side and see if it makes any difference, though that wouldn't identify if they were faulty, just if the orientation matters.
I was also going to suggest trying a different orientation for the case. Some GPUs have way better thermals when oriented differently due to the way the heatpipes are facing. Might be worth a shot.
Thank you for the suggestions. I turned the case upside down and gave that a try. I don't think it made an appreciable difference. Everything (bar hotspot temps which got hotter at 103.8°C) was 1°C cooler. But that could just as well be due to me not playing something long enough for it to build up enough heat.

Screen cap and thermal captures are uploaded in a new folder named "case upside down".
https://drive.google.com/drive/folders/1ooHTvUY-GTGxWdBb9xXxxgD2F1n2M1xF?usp=sharing

I'll see how defeated I feel after dinner and decide whether to run something with the case on its side.

Would it be possible to find a broken 3090 or just a 3090 FE cooler (without the PCB) to put on it and compare?
Perhaps. I have a colleague who has a 3090 FE so I guess I could invade his place and see what his card does under load. Though he's more into VR and flight sims, I'd expect very similar GPU behaviour. Or at least similar to mine under normal circumstances.
 
Associate
Joined
31 Jan 2012
Posts
2,004
Location
Droitwich, UK
Are the clocks throttling? I'm not sure what the limits are on the 3090 but if it's reaching the max allowed then the clock behaviour could help confirm whether it's a sensor issue or not at least.

Edit: on second thoughts if the sensor is faulty then the card would behave accordingly I imagine, so not a very helpful suggestion.
 
Associate
OP
Joined
3 Oct 2020
Posts
29
Are the clocks throttling? I'm not sure what the limits are on the 3090 but if it's reaching the max allowed then the clock behaviour could help confirm whether it's a sensor issue or not at least.

Edit: on second thoughts if the sensor is faulty then the card would behave accordingly I imagine, so not a very helpful suggestion.
I've not noticed any gameplay signs (like the game hitching or stuttering). But since I've never experienced throttling myself I don't really know what to look out for in a game.

In terms of the clock speed I've not paid much attention to it. But GPU-Z (the Google Drive has screen caps) does show that performance is being capped due to thermal throttling (under load).

Second thoughts or not I appreciate the input. I'm the 'I can put it together' kind of guy, sadly not an 'I can fix it' kind :cry:
 
Soldato
Joined
27 Mar 2009
Posts
3,306
Have you tried running it with the side of the case off entirely? The temps all look normalish apart from the fan speeds needed to achieve them. It's hard to tell but your CPU looks on the hot side too. On that GPU-Z screenshot it's showing 60°C and that looks to be a good bit lower than the rest of the graph, so you might be hitting 90°C plus on that.

Did you ever change the thermal pads/paste before this happened or only after to try and fix it?
 
Soldato
Joined
27 Mar 2009
Posts
3,306
Another thing: on that Afterburner skin, how do you control the temp limit/power limit priority? On the Cyborg skin I use there is a little option between the two sliders to prioritise power limit over the 83°C limit, and this is the default. Yours seems to be trying its hardest to keep to 83°C, like it's been switched round.
 
Associate
Joined
2 Sep 2016
Posts
919
I've not noticed any gameplay signs (like the game hitching or stuttering). But since I've never experienced throttling myself I don't really know what to look out for in a game.
You might not be stuttering but your HWMonitor screen is indicating you're thermal throttling. It might be worthwhile trying PTM7950. I just slapped it on my 3070 FE as I was fed up of repasting due to pump-out, and my card was still thermal throttling with fresh paste despite my GPU and hotspot temps being under the limit :confused:. It doesn't thermal throttle with PTM7950 and temps are improved. Pretty sure my card was cooler when it was newer; maybe heat pipes lose liquid over time and become less effective?
 
Associate
OP
Joined
3 Oct 2020
Posts
29
Have you tried running it with the side of the case off entirely? The temps all look normalish apart from the fan speeds needed to achieve them. It's hard to tell but your CPU looks on the hot side too. On that GPU-Z screenshot it's showing 60°C and that looks to be a good bit lower than the rest of the graph, so you might be hitting 90°C plus on that.

Did you ever change the thermal pads/paste before this happened or only after to try and fix it?
I did do a couple of tests with the side panel off (before I borrowed the thermal camera) and didn't see any difference in temps.

And no, the thermal pads/paste has been whatever it shipped with until last Wednesday.


Another thing: on that Afterburner skin, how do you control the temp limit/power limit priority? On the Cyborg skin I use there is a little option between the two sliders to prioritise power limit over the 83°C limit, and this is the default. Yours seems to be trying its hardest to keep to 83°C, like it's been switched round.
I didn't even know I could set a temp limit O_O. Though thinking about it now it makes sense. I only downloaded Afterburner to try to diagnose the issue and then tried to undervolt the card to reduce high temps (and adjust GPU fan curves too). I'll try setting a temp limit and see how I get on, thank you!

The clock speed (1850ish if I was looking at the right screenshot?) looks pretty normal from what I know of the FE (somewhat regret selling the one I snagged at MSRP a couple of years ago).
Good to know that the clock speed is operating around a normal range. I think you were probably looking at the correct one, but I had three different hardware monitors running so I'm not entirely sure :cry:
 
Associate
OP
Joined
3 Oct 2020
Posts
29
You might not be stuttering but your HWMonitor screen is indicating you're thermal throttling. It might be worthwhile trying PTM7950. I just slapped it on my 3070 FE as I was fed up of repasting due to pump-out, and my card was still thermal throttling with fresh paste despite my GPU and hotspot temps being under the limit :confused:. It doesn't thermal throttle with PTM7950 and temps are improved. Pretty sure my card was cooler when it was newer; maybe heat pipes lose liquid over time and become less effective?
Interesting. Again, I'm not an expert on GPU cooling systems, but I would have expected some liquid loss in the vapor chamber over time. Three years seems rather quick though. I didn't see any damage on the pipes myself but that is not to say it couldn't be there.

It is tempting to try PTM7950 given what you are saying. Though my current position is that I don't want to take the GPU apart a second time unless I really have to. If somehow through everyone's help I can identify the root cause, then depending on what it is I may want to sell the card. So I'd rather not risk putting marks on the screws or snapping a ribbon cable :eek:
 
Associate
OP
Joined
3 Oct 2020
Posts
29
Apologies to all three for the late night notifications (if you are Europe/UK based)

@Tetras @Skilid I tried turning the case sideways. Again not any appreciable difference on the readings and thermals. Thank you for the suggestions though as I wouldn't have come up with it on my own!

@Finners Thank you for pointing out that I can prioritise temp vs power in Afterburner. All my tests following people's suggestions from the thread were done without forcing user presets, but just to be sure I uninstalled MSI Afterburner and restarted before taking a base test. GPU-Z and CPUID both gave me the same base results (85°C, crazy fans, the lot).
Then I reinstalled Afterburner, set priority to temperature and experimented with temp limits.

I had to bring the temperature limit down to 70°C, which also brought the power limit down to 48%. Only then did the GPU fans start to sound like they normally do (around 1500RPM). All other temperatures started to look normal too.
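
As a rough sanity check on what a 48% power limit means in watts, here is a small sketch. It assumes the slider is a percentage of the 3090 FE's 350 W rated board power; that baseline is an assumption, not something read off this card, and `power_cap_watts` is just an illustrative name.

```python
# Assumption: the power limit slider is relative to the 3090 FE's
# rated 350 W board power. The card's actual default limit may differ.
RATED_BOARD_POWER_W = 350.0


def power_cap_watts(limit_percent, rated_watts=RATED_BOARD_POWER_W):
    """Convert a power limit percentage into an approximate cap in watts."""
    return rated_watts * limit_percent / 100.0


cap = power_cap_watts(48)
print(f"48% power limit is roughly {cap:.0f} W")
```

Under that assumption the cap works out to about 168 W, roughly half of the 330W+ the card was drawing earlier in the thread, which fits with the fans only calming down once the card is doing far less work.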

Though I still don't know what the root cause is, at least this will allow me to continue gaming. 40-60FPS may not be pushing the card but it's reliable and plenty good for the types of games I play.

Temps and thermals for any who're interested ("Temp Restricted" folder)
https://drive.google.com/drive/folders/1ooHTvUY-GTGxWdBb9xXxxgD2F1n2M1xF?usp=sharing

I'm still going to try and figure this out for about another week. So any input or questions are very welcome. Current plan is that if I can't identify the root cause and/or fix the issue, I'll sell the card and get a 3080/3070 with the money to see myself through until the next-gen GPUs are released.
 