AMD Ryzen 7 7800X3D CPU Burns Up

Der8auer just posted a video about this issue, and he got a statement from Asus.

Asus confirms there is an issue related to SoC and VddSoC voltages and says it is working with AMD. Interestingly, Asus is blaming AMD's EXPO RAM profiles, implying that the EXPO memory timing profiles are causing the overvolting.

 
Der8auer's video casts doubt on that theory: he tried raising the voltage, with no ill effect.

A 30-minute test doesn't really confirm anything. It may die in an hour, or it could take days even at that voltage. Most people who had failures had their system running for weeks before it died, so whatever the cause is, it seems to be a slow-burn problem (pun intended). As the SoC power draw is pretty flat and continuous regardless of load, this would imply time is more of a factor.

One thing his video does appear to confirm is what I suspected: this seems to be a systemic issue that affects all motherboards and all 7000 series CPUs. It looks to be worse with X3D chips and ASUS motherboards, but various user reports show the burnout can happen on any combination of AM5 CPU and motherboard.
 
I have run EXPO 1 since I got this board, which was in November, and I have used two different RAM kits. I have often updated the BIOS as soon as I've seen a new one, and had no issues.
Interesting stuff, as always with new kit, ha.
 
Sigh, first build in a long time. I hope this is a small QC issue rather than something bigger. When I posted the other day it seemed like a very isolated incident, but based on how this thread has grown, obviously not.
 
I'm sure it's not going to cause an issue on all CPUs. Even when the cause of failure is found, whether it's SoC voltage, VddCR or a combination, the effects will vary from one CPU to the next. Maybe the majority can handle 1.35v on SoC or higher VddCR voltages, but as always with the silicon lottery, some will be less tolerant and eventually let the blue smoke out as a FET shorts, etc.

Just because yours is fine doesn't mean they're all OK, or that this isn't the issue (or part of the issue).
I still can't believe that 1.35v on SoC would cause the socket to burn. That's unfathomable to me. The CPU degrading with that voltage? Sure, no problem with that (although that's still too low a voltage, but whatever). But the actual socket? How can that ever be possible? I've driven 400 watts at 1.64 volts through my 12900K, and the socket is just fine. I'm pretty sure there is something else at play: some protections did not kick in at all. High voltage can only degrade your chip slowly; it can't burn it, because there are overtemperature protections on both the mobo and the CPU. You really need to try hard to physically burn a CPU these days.
 
Der8auer has done a video:


He contacted ASUS and asked why the older BIOSes were removed, and the response was quite insightful:

The EFI updates posted on Friday contain some dedicated thermal monitoring mechanisms we've implemented to help protect boards and CPUs. We removed older BIOS's for that reason and also because manual Vcore control was available on previous builds. We're also working with AMD on defining new rules for AMD expo and SoC Voltage. We'll issue new updates for that ASAP. Please bear with us.

The part on the SoC voltage is interesting. It seems they are beginning to realise that the current EXPO 'rules' are excessive when it comes to SoC voltages. The current BIOSes still set it too high in my opinion (based on the sheer power draw increase of SoC IP blocks at such voltages), so it will be interesting to see what changes in later BIOS releases.
According to that video, it looks like it is impacting non-X3D chips and a wide range of motherboards as well. It is strange that it is impacting some but not others.

Edit:
Some early observations suggest it is occurring on motherboards with Renesas VRM controllers. That may explain why some of us haven't had issues (my X670E-F uses Digi+) but others have.
 
Is there a list of which motherboards use which controller? I had a quick Google but couldn't find info on my Asus TUF X670E-Plus WiFi.


I'm sure they'll figure it out; it's widespread enough now for them to diagnose what's going on (and also not ignore it). I'm interested to see what Gamers Nexus concludes about it too.
 
I think you are on Digi+ as well: https://tweakers.net/reviews/10700/...r-je-ryzen-7000-cpu-vrm-componentanalyse.html

I think ASUS switches to Renesas on the X670E-E and upwards.

This is all speculation for now. GN’s views on this will be really welcome.
 
Whatever the cause, if Asus are in touch with AMD over these issues, we are going to see a lot of BIOS updates. Before Asus removed all of the old BIOSes, they were releasing new ones every 10 days.

Asus are replacing burnt-out boards, but AMD are not replacing the CPUs under warranty. They are blaming the end user for using EXPO, but at the end of the day that's a cop-out; it's still their fault.

I've got to get my water block off this weekend. I'm going to have a look at the socket and chip, and if there are any signs of bulging, it's going straight back to the shop I bought it from for a replacement.
 
It’s MSI and Gigabyte as well.
 
Seems like the board manufacturers shouldn’t have been quite so loose with their SoC voltages. I’m sure AMD have quite strict tolerances, and perhaps the board vendors chose to go around them in an effort to support higher RAM speeds.

Using the excuse that EXPO invalidates the warranty is a cop out. It’s a single click in the BIOS, with no warnings when applied. How is anyone supposed to know it invalidates the CPU warranty?
 

This is the most interesting part of that article:

Our sources also added further details about the nature of the chip failures — in some cases, excessive SoC voltages destroy the chips' thermal sensors and thermal protection mechanisms, completely disabling its only means of detecting and protecting itself from overheating. As a result, the chip continues to operate without knowing its temperature or tripping the thermal protections.

AMD's modern chips often run at their thermal limits to squeeze out every last drop of performance within their safe thermal range — it isn't uncommon for them to run at 95C during normal operation — so they will automatically continue to draw more power until it dials back to remain within a safe temperature. In this case, the lack of temperature sensors and protection mechanisms allows the chip to receive more power beyond the recommended safe limits. This excessive power draw leads to overheating that eventually causes physical damage to the chip


It is still a theory, but one that fits very nicely, and it explains why CCD cores are burning out on most chips rather than the IO chip (which fails far less frequently).
The fact that it is nearly always a CCD core that's burnt out (based on where the socket and substrate damage was in photos) has bugged me about the SoC voltage theory, leading me to believe there were 2 modes of failure (SoC voltage causing IO chiplet failure, plus some other unknown cause).

However, if the SoC overvoltage is killing the monitoring systems, then given how the chip ramps up power until it hits limiters, this would explain the catastrophic type of failures we have been seeing across the CCD cores.
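As a rough illustration of that theory, here is a toy Python sketch of a boost loop that keeps raising power until a temperature reading tells it to back off. Everything in it is an assumption made for illustration: the function name `simulate_boost`, the ambient temperature, the degrees-per-watt figure and the step size are invented, and only the 95C limit comes from the quoted article. It is not a model of AMD's actual firmware.

```python
def simulate_boost(sensor_ok, steps=50):
    """Toy boost loop: add power each step unless a thermal reading says stop.

    All constants are illustrative assumptions, not measured values.
    Returns (final_power_w, final_temp_c).
    """
    ambient_c = 30.0       # assumed ambient temperature
    c_per_watt = 0.5       # assumed thermal resistance (degrees C per watt)
    temp_limit_c = 95.0    # thermal limit mentioned in the quoted article
    step_w = 10.0          # assumed power change per control step

    power_w = 50.0
    for _ in range(steps):
        actual_temp_c = ambient_c + c_per_watt * power_w
        # A dead sensor gives the control loop nothing to act on.
        reading = actual_temp_c if sensor_ok else None
        if reading is not None and reading >= temp_limit_c:
            power_w -= step_w   # throttle back
        else:
            power_w += step_w   # keep boosting
    return power_w, ambient_c + c_per_watt * power_w


if __name__ == "__main__":
    print("working sensor:", simulate_boost(sensor_ok=True))
    print("dead sensor:   ", simulate_boost(sensor_ok=False))
```

With a working sensor the loop settles around the limit; with a dead sensor nothing ever tells it to stop, which is the runaway behaviour the article's sources describe.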

Well, it all fits. I smelt a rat before anything started coming out about SoC. When I saw just how much the SoC power draw increased when going from the stock 1.05v to 1.35v with EXPO, it was alarming, and based on how Zen 3 had issues I was doubtful that 1.35-1.4v was a safe voltage to be running SoC at. As the old saying goes, there's no smoke without fire!
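For a back-of-envelope sense of that jump, the usual first-order rule is that dynamic power in CMOS logic scales with voltage squared at a fixed clock. The short Python sketch below applies that rule to the 1.05v and 1.35v figures above. The helper name `dynamic_power_ratio` is made up for this example, and the result is an estimate only, not a measurement: leakage current also rises with voltage, so the real increase may be larger.

```python
def dynamic_power_ratio(v_old, v_new):
    """First-order estimate: dynamic power scales with V^2 at fixed frequency."""
    return (v_new / v_old) ** 2


# Stock SoC voltage vs. the EXPO figure quoted above.
ratio = dynamic_power_ratio(1.05, 1.35)
print(f"Estimated dynamic power increase: {ratio:.2f}x")  # prints 1.65x
```

So even before leakage is considered, a 0.3v bump works out to roughly two thirds more power through the same SoC blocks.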
 
Thankfully, this is what Asus has improved in the latest BIOS (1202): the protection mechanisms.
 