AMD Ryzen 7 7800X3D CPU Burns Up

Grim5 · 24 Apr 2023 at 23:27

Derbauer just posted a video about this issue and he got a statement from Asus

Asus confirms there is an issue related to SoC and VddSoC voltages and says it is working with AMD. Interestingly, Asus is blaming AMD's EXPO RAM profiles, implying that the EXPO RAM timing profiles are causing the overvolting.

Quartz · 25 Apr 2023 at 00:44

Derbauer's video cast doubt on that theory: he tried pumping the voltage to no ill effect.

blower · 25 Apr 2023 at 01:03

Quartz said:
Derbauer's video cast doubt on that theory: he tried pumping the voltage to no ill effect.

A 30 minute test doesn't really confirm anything. It may die in an hour, or it could take days even at that voltage. Most people who had failures had their systems running for weeks before they died, so whatever the cause is, it seems to be a slow-burn problem (pun intended). As the SoC power draw is pretty flat and continuous regardless of load, this would imply time is more of a factor.

One thing that his video does look to confirm is what I suspected: this seems to be a systemic issue that affects all motherboards and all 7000 series CPUs. It looks to be worse with X3D chips and ASUS motherboards, but various user reports show the burnout death can happen on any combination of AM5 CPU and motherboard.

gooseuk · 25 Apr 2023 at 07:10

I have run EXPO 1 since I've had this board, which is since November, and I have used two different RAM kits. I have often updated the BIOS as soon as I've seen a new one, with no issues.
Interesting stuff, as always with new kit, ha.

Minstadave · 25 Apr 2023 at 07:16

Quartz said:
Derbauer's video cast doubt on that theory: he tried pumping the voltage to no ill effect.

Did he even put any load on them whilst the voltage was high? Didn't look like it.

DavidTJ · 25 Apr 2023 at 07:21

Sigh, first build in a long time, hope this is a small QC issue rather than something bigger. When I posted the other day it seemed like a very isolated incident, but based on how this thread has grown, obviously not.

Bencher · 25 Apr 2023 at 07:25

blower said:
I'm sure it's not all CPUs it's going to cause an issue on. Even when the cause of failure is found, whether it's SoC voltage or VddCR or a combination, the effects will vary from one CPU to the next. Maybe the majority can handle 1.35v on SoC or higher VddCR voltages, but as always with the silicon lottery you will have some that are less tolerant and eventually let the blue smoke out as a FET shorts, etc.

Just because yours is fine doesn't mean they're all OK and this isn't the issue (or part of the issue).

I still can't believe that 1.35v on SoC would cause the socket to burn. That's unfathomable to me. The CPU degrading with that voltage? Sure, no problem with that (although that's still too low a voltage, but whatever), but the actual socket? How can that ever be possible? I've driven 400 watts at 1.64 volts through my 12900K and the socket is just fine. I'm pretty sure there is something else at play: some protections did not kick in at all. High voltage can only degrade your chip slowly; it can't burn it, because there are overtemperature protections on both the mobo and the CPU. You really need to try hard to physically burn a CPU these days.

delta0 · 25 Apr 2023 at 07:53

blower said:
Der8auer has done a video:

I missed the Damage on this Ryzen 7900X

www.youtube.com

He contacted ASUS and asked why the older BIOSes were removed, and the response was quite insightful:

The EFI updates posted on Friday contain some dedicated thermal monitoring mechanisms we've implemented to help protect boards and CPUs. We removed older BIOS's for that reason and also because manual Vcore control was available on previous builds. We're also working with AMD on defining new rules for AMD expo and SoC Voltage. We'll issue new updates for that ASAP. Please bear with us.

The part on the SoC voltage is interesting. It seems they are beginning to realise that the current EXPO 'rules' are excessive when it comes to SoC voltages. The current BIOSes still set it too high in my opinion (based on the sheer power draw increase of SoC IP blocks at such voltages), so it will be interesting to see what is changed in later BIOS releases.

Looks like it is impacting non-X3D chips as well according to that video and a wide range of motherboards. It is strange it is impacting some but not others.

Edit:
Some early observations are it is occurring on Renesas VRM controller motherboards. It may explain why some of us haven’t had issues (my X670e-f uses Digi+) but others have.

shalke · 25 Apr 2023 at 08:55

Seems all the news outlets are exploding over this.

Quartz · 25 Apr 2023 at 09:45

blower said:
A 30 minute test doesn't really confirm anything

True, but while it doesn't disprove the theory, it does cast doubt on it. Time will tell.

Addicted · 25 Apr 2023 at 09:54

delta0 said:
Looks like it is impacting non-X3D chips as well according to that video and a wide range of motherboards. It is strange it is impacting some but not others.

Edit:
Some early observations are it is occurring on Renesas VRM controller motherboards. It may explain why some of us haven’t had issues (my X670e-f uses Digi+) but others have.

Is there a list of what motherboards use what? I had a quick google but couldn't find info on my Asus Tuf X670E-Plus Wifi.

I'm sure they'll figure it out, it's widespread enough now for them to diagnose what's going on (and also not ignore it). I'm interested to see what Gamers Nexus concludes with it too.

delta0 · 25 Apr 2023 at 09:57

Addicted said:
Is there a list of what motherboards use what? I had a quick google but couldn't find info on my Asus Tuf X670E-Plus Wifi.

I'm sure they'll figure it out, it's widespread enough now for them to diagnose what's going on (and also not ignore it). I'm interested to see what Gamers Nexus concludes with it too.

I think you are Digi+ as well https://tweakers.net/reviews/10700/...r-je-ryzen-7000-cpu-vrm-componentanalyse.html

I think ASUS switch to Renesas on the X670e-e and upwards.

This is all speculation for now. GN’s views on this will be really welcome.

Jamin280672 · 25 Apr 2023 at 10:41

Whatever the cause, if Asus are in touch with AMD over these issues, we are going to see a lot of BIOS updates. Before Asus removed all of the old BIOSes, they were releasing new ones every 10 days.

Asus are replacing burnt out boards, but AMD are not replacing the CPUs under warranty. They are blaming the end user for using EXPO, but at the end of the day, that's a cop out; it's still their fault.

I've got to get my water block off this weekend. I'm going to have a look at the socket and chip, and if there are any signs of bulging, it's going straight back to the shop I bought it from for a replacement.

delta0 · 25 Apr 2023 at 10:51

Jamin280672 said:
Whatever the cause, if Asus are in touch with AMD over these issues, we are going to see a lot of BIOS updates. Before Asus removed all of the old BIOSes, they were releasing new ones every 10 days.

Asus are replacing burnt out boards, but AMD are not replacing the CPUs under warranty. They are blaming the end user for using EXPO, but at the end of the day, that's a cop out; it's still their fault.

I've got to get my water block off this weekend. I'm going to have a look at the socket and chip, and if there are any signs of bulging, it's going straight back to the shop I bought it from for a replacement.

It’s MSI and Gigabyte as well.

rare · 25 Apr 2023 at 10:51

Seems like the board manufacturers shouldn’t have been quite so loose with their SoC voltages. I’m sure AMD have quite strict tolerances and perhaps the board vendors chose to go around that in an effort to support higher ram speeds.

Using the excuse that EXPO invalidates the warranty is a cop out. It’s a single click in the BIOS, with no warnings when applied. How is anyone supposed to know it invalidates the CPU warranty?

stephenb · 25 Apr 2023 at 11:44

AMD Ryzen 7000 Burning Out: EXPO and SoC Voltages to Blame (AMD Responds)

Impacts all motherboard makers and all Ryzen 7000 chips.

www.tomshardware.com

Name · 25 Apr 2023 at 13:13

All these articles say MSI has released new BIOS revisions, but I can't see anything on their website. The majority of B650 updates are from 2 weeks ago.

blower · 25 Apr 2023 at 13:15

stephenb said:
AMD Ryzen 7000 Burning Out: EXPO and SoC Voltages to Blame (AMD Responds)

Impacts all motherboard makers and all Ryzen 7000 chips.

www.tomshardware.com

This is the most interesting part of that article:

Our sources also added further details about the nature of the chip failures — in some cases, excessive SoC voltages destroy the chips' thermal sensors and thermal protection mechanisms, completely disabling its only means of detecting and protecting itself from overheating. As a result, the chip continues to operate without knowing its temperature or tripping the thermal protections.

AMD's modern chips often run at their thermal limits to squeeze out every last drop of performance within their safe thermal range — it isn't uncommon for them to run at 95C during normal operation — so they will automatically continue to draw more power until it dials back to remain within a safe temperature. In this case, the lack of temperature sensors and protection mechanisms allows the chip to receive more power beyond the recommended safe limits. This excessive power draw leads to overheating that eventually causes physical damage to the chip

It is still a theory, but one that fits very nicely, and it explains why the CCD cores are burning out on most chips and not the IO chip (which fails far less frequently).
The fact that it is nearly always a CCD core that's burnt out (based on where the socket and substrate damage was in photos) has bugged me about the SoC voltage theory, leading me to believe there have been 2 modes of failure (one where SoC voltage causes IO chiplet failure, and some other unknown cause of failure).

However, if the SoC overvoltage is killing the monitoring systems, then given how the chip ramps up power until it hits its limiters, this would explain the catastrophic type of failures we have been seeing across the CCD cores.
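To illustrate the theory above, here is a toy simulation of a boost loop that keeps raising power until the reported temperature hits a limit. Every number in it (step size, thermal resistance, ambient, starting power) is made up for illustration, and the loop is only a sketch of the idea in the article, not AMD's actual boost algorithm.

```python
def simulate(sensor_works, steps=50, limit_c=95.0, ambient_c=35.0,
             c_per_watt=0.5, step_w=5.0, start_w=60.0):
    """Toy boost loop. All parameters are hypothetical illustration values."""
    power_w = start_w
    for _ in range(steps):
        # Real die temperature from a crude steady-state model.
        temp_c = ambient_c + c_per_watt * power_w
        # A destroyed sensor is modelled as always reporting a low reading.
        reported_c = temp_c if sensor_works else ambient_c
        if reported_c >= limit_c:
            power_w -= step_w   # throttle back at the thermal limit
        else:
            power_w += step_w   # headroom reported, keep boosting
    return power_w, ambient_c + c_per_watt * power_w


ok_power, ok_temp = simulate(sensor_works=True)
bad_power, bad_temp = simulate(sensor_works=False)
print(f"working sensor: {ok_power:.0f} W, {ok_temp:.1f} C")
print(f"dead sensor:    {bad_power:.0f} W, {bad_temp:.1f} C")
```

With a working sensor the loop settles around the limit; with the dead sensor nothing ever tells it to stop, so power and real temperature just keep climbing.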

Well, it all fits. I smelt a rat before anything started coming out on SoC. When I saw just how much the SoC power draw increased when going from the stock 1.05v to 1.35v with EXPO, it was alarming, and based on how Zen 3 had issues I was doubtful that 1.35-1.4v was a safe voltage to be running SoC at. As the old saying goes, there's no smoke without fire!
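As a rough back-of-the-envelope check on that power increase: if you assume the textbook CMOS approximation that dynamic power scales with voltage squared at a fixed clock (an assumption, it ignores leakage, which rises with voltage too, so the real figure is likely higher), the jump from 1.05v to 1.35v works out like this. The function name is just for illustration.

```python
def dynamic_power_ratio(v_old, v_new):
    """Relative dynamic power at fixed frequency, assuming P is proportional to V^2."""
    return (v_new / v_old) ** 2


for v_new in (1.35, 1.40):
    ratio = dynamic_power_ratio(1.05, v_new)
    print(f"1.05v -> {v_new:.2f}v: about {ratio:.2f}x the dynamic power")
```

So even under this simplified model, 1.35v is roughly 1.65x the stock dynamic power on the SoC rail, and 1.4v is closer to 1.78x.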

Jamin280672 · 25 Apr 2023 at 13:39

blower said:
This is the most interesting part of that article:

Our sources also added further details about the nature of the chip failures — in some cases, excessive SoC voltages destroy the chips' thermal sensors and thermal protection mechanisms, completely disabling its only means of detecting and protecting itself from overheating. As a result, the chip continues to operate without knowing its temperature or tripping the thermal protections.

AMD's modern chips often run at their thermal limits to squeeze out every last drop of performance within their safe thermal range — it isn't uncommon for them to run at 95C during normal operation — so they will automatically continue to draw more power until it dials back to remain within a safe temperature. In this case, the lack of temperature sensors and protection mechanisms allows the chip to receive more power beyond the recommended safe limits. This excessive power draw leads to overheating that eventually causes physical damage to the chip

Thankfully, this is what Asus has improved in the latest BIOS (1202): the protection mechanisms.

Addicted · 25 Apr 2023 at 13:41

Jamin280672 said:
Thankfully, this is what Asus has improved in the latest BIOS (1202): the protection mechanisms.

What voltage is your VDDCR_SOC at on the latest BIOS, out of interest?