AMD Ryzen 7 7800X3D CPU Burns Up

Soldato
OP
Joined
12 Feb 2014
Posts
2,864
Location
Somewhere Only We Know
Yes, I have heard this; many are saying it's ASUS boards causing it. I have noticed on the website for my board that a lot of the BIOS versions have been taken down. I wonder if it's just for the X3D chips also.

Yes, agreed. They have a new BIOS for my board available this morning, 1202, and all the older BIOSes have gone too, except one beta BIOS. I've got the X670E Gene. It seems to help a bit with the code 15 hang from a cold boot, though on a warm boot it still hangs a bit. I wonder if it has more to do with the cooking chips, though. The AGESA version hasn't changed and the BIOS notes don't really say anything: they mention a TPM 2.0 update, but the TPM version hasn't changed in the BIOS, and they also say "optimise performance for 7000X7D chips". I'm guessing they mean 7000X3D chips lol.

 
Why are these things always with the CPU I've ordered, on the motherboard I've ordered...

Couldn't have been a non-X3D chip on an MSI board could it?

I think ASUS have probably fixed it with the latest BIOS update, and the reason they pulled all the others is to prevent you from flashing back to any of them.
 
I went for the 7900 non-X. The performance difference between the X and non-X versions wasn't much, unless you spend your life and spare time running benchmarks, and it's only 65 W compared to its 170 W counterpart. Let's just say I pulled away from Intel because they were toasters in disguise. I have my chip set to throttle at 80°C, however it is under water and I've never seen it go over 75°C in anything I throw at it.

I'm thinking about hard-setting the CPU voltage, but at the same time I don't want to stop it boosting. It spends most of its time sitting at 5.3 GHz; when I'm not doing anything with it, it sits around 3.5 GHz, and I've seen it boost to 5.55 GHz more than a few times.
 
I'm suspecting it is more a systemic issue affecting all motherboards but may be worse on some, it really could just be the LGA socket - after all this is AMD's first venture into LGA with the Ryzen series.
Not really, they've been doing LGA with Threadripper from the beginning.

It would be nice to get hold of a pinout diagram to see roughly what the area being damaged is.
 
Just got my system up & running today. Core voltages never exceed 1.11v according to HWiNFO64.

One thing I did notice is the SOC voltage was 1.35v by default - in the past with Zen 2,3 you never ever wanted that to go above 1.2v (and that was high). I've read up to 1.4v is ok on Zen 4.
SOC power was a continuous 20w, even at idle. I dropped it to 1.15v - which dropped it to 13w and hence my package power by 7W under all conditions (idle & full load).
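As a rough sanity check on those figures, here is a minimal sketch that assumes SoC power scales with the square of the voltage at fixed clocks. That is a simplification (leakage tends to rise faster than that), and the function name is made up for illustration; only the 20 W / 13 W and 1.35v / 1.15v numbers come from the post.

```python
def scaled_power(p_ref_w: float, v_ref: float, v_new: float) -> float:
    """Estimate power at v_new from a reference point, assuming P is proportional to V^2."""
    return p_ref_w * (v_new / v_ref) ** 2


# Reference point from the post: 20 W at 1.35 V, then lowered to 1.15 V.
estimate_w = scaled_power(20.0, 1.35, 1.15)
measured_saving_w = 20.0 - 13.0

print(f"V^2 estimate at 1.15 V: {estimate_w:.1f} W (post reports 13 W)")
print(f"Reported saving: {measured_saving_w:.0f} W ({measured_saving_w / 20.0:.0%})")
```

The simple quadratic estimate lands a little above the reported 13 W, so the drop described in the post is at least in a plausible range for a 0.2v reduction.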

No system instability from dropping it, and I'm running RAM at 6200 MHz with very tight timings, including sub-timings. I suspect they have a higher SOC voltage due to the on-chip graphics with Zen 4. Since I'm not using that, I don't want it continuously burning up extra power unnecessarily.

Edit: SOC voltage is increased from ~1.05v to 1.35v when you enable AMD EXPO RAM timings, at least on this ASUS mobo. Note this is separate to the RAM voltage increasing to 1.35v for my particular modules (G.Skill 2x16GB 6000MHz 30-38-38-40) when enabling EXPO.

I don't like the idea of EXPO increasing the SOC voltage so much; it's doubling the continuous power draw on the chip from 10w to 20w.
Remember EXPO is a form of overclocking (both the RAM & CPU memory controller), so I would recommend dropping the SOC voltage down to 1.15v. That was considered a safe setting on Zen 3/2.

Also on the ASUS ROG Strix, someone posted that their SOC voltage increased to 1.40v when EXPO was enabled. We could be on to something here: possibly these SOC voltages are way too high, more than doubling on-chip SOC power. People using EXPO, and certain motherboard BIOSes setting SOC voltages too high when EXPO is on, could be a common link to the CPU failures.

My SOC voltage when first set up was 1.2v. When I enabled EXPO and rebooted back into the BIOS it was 1.35v, so I manually set it to 1.25v as Buildzoid recommends. I wouldn't drop it too far as it does have a performance impact; the easiest way to tell is to run something simple like Cinebench, then test again with it at 1.25v.

You are quite right when you say the limit is 1.4v, however 1.35v is only for RAM at 6200 MHz and upwards. The SOC is completely unlinked from the rest of the CPU in Ryzen 7000, so it basically has nothing to do with the core anymore, since all these chips basically have a GPU in them. What was the SOC voltage on previous Ryzens, with its 1.2v limit, is now the uncore voltage, which should be around 1.1v on auto in your BIOS.

Uncore voltage is a voltage all of its own. It's not the memory controller, as there is a separate option for that in the same list, which you'll find is around 1.35-1.39v. Then you've got your two DRAM voltages (1.35v), the SOC voltage as discussed, which I assume is the onboard GPU (I've got the internal GPU on mine disabled in the BIOS), and then obviously the core voltage.
 
Isn’t SOC everything but the CCD anyway?
Nope, not anymore, it's all changed in Ryzen 7000. SOC is the onboard GPU only, which is really what it's always been; a little extra helped with the IMC in previous Ryzens, but now it's completely unlinked. The IMC has its own voltage setting and you now have uncore for everything else. It's all explained in Buildzoid's video on RAM timings.
 
This just seems to be a precaution by the motherboard manufacturers, in effect blaming users for somehow overvolting X3D chips. It doesn't explain why non-X3D chips have burnt up in a similar manner, and it also implies that all those on Reddit who said they were not overclocking or overvolting must be lying, and I really don't think they all are.
Personally, I think the voltage is probably on auto and the board is the thing overvolting the chips, but hopefully this will stop that from happening.
 

Pin analysis of the destroyed Ryzen 7800X3D – All burned pins supply the VDDCR (CPU Core Power Supply)

Obviously overvolted by the board if the voltage was on auto, or overvolted by the end user.

I was looking for a pinout, so that's a good page to save.
 
I'm on an ASUS TUF X670E-PLUS WIFI, 1406 bios because the 1408 bios just kept crashing (bsods, even froze in the bios constantly...) and I'm scared to try anything newer because this 'just works'.
My system only behaves like that when I have Memory Context Restore enabled or on auto; if I disable it then it's fine.

I'm using Buildzoid's timings and voltages as per the video I posted on the previous page, with EXPO disabled, but then I am running a 7900 non-X chip, not a 3D chip. I nearly bought the 7800X3D too, and I'm glad I didn't. I was discussing it with the guy I bought my board off and mentioned I'm not much of a gamer anymore, so then I was torn between the 7900 and 7900X. After looking at reviews, there wasn't much difference between the two, so I went for the 65 W part instead of the 170 W part. This thing is more than fast enough though.

It's been mentioned that these, along with the X chips, handle voltage better as they are not as sensitive to it. My SOC is at 1.25v. I'm memory testing as I'm typing this and will probably stop it soon, as it's been going for hours. I read people were having problems with RAM stability on the latest BIOS, but for me so far so good.

 
Whatever the cause, if Asus are in touch with AMD over these issues, we are going to see a lot of BIOS updates. Before Asus removed all of the old BIOSes, they were releasing new ones every 10 days.

Asus are replacing burnt-out boards, but AMD are not replacing the CPUs under warranty; they are blaming the end user for using EXPO. At the end of the day that's a cop-out, it's still their fault.

I've got to get my water block off this weekend, so I'm going to have a look at the socket and chip. If there are any signs of bulging, it's going straight back to the shop I bought it from for a replacement.
 
This is the most interesting part of that article:

Our sources also added further details about the nature of the chip failures — in some cases, excessive SoC voltages destroy the chips' thermal sensors and thermal protection mechanisms, completely disabling its only means of detecting and protecting itself from overheating. As a result, the chip continues to operate without knowing its temperature or tripping the thermal protections.

AMD's modern chips often run at their thermal limits to squeeze out every last drop of performance within their safe thermal range — it isn't uncommon for them to run at 95C during normal operation — so they will automatically continue to draw more power until it dials back to remain within a safe temperature. In this case, the lack of temperature sensors and protection mechanisms allows the chip to receive more power beyond the recommended safe limits. This excessive power draw leads to overheating that eventually causes physical damage to the chip
Thankfully, this is what Asus has improved in the latest BIOS (1202): the protection mechanisms.
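The mechanism described in the quoted article can be illustrated with a toy control loop. This is only a sketch under made-up assumptions: the ambient temperature, cooler thermal resistance, step size and starting power are all invented for illustration, and a dead sensor is modelled as one that always reports the ambient value. Only the 95C figure comes from the article.

```python
TEMP_LIMIT_C = 95.0   # operating temperature quoted in the article
AMBIENT_C = 30.0      # assumed ambient temperature
C_PER_WATT = 0.5      # assumed thermal resistance of the cooler
STEP_W = 5.0          # assumed power change per loop iteration


def run_boost_loop(sensor_alive: bool, steps: int = 50) -> float:
    """Return the package power (W) after a number of toy boost iterations."""
    power_w = 50.0  # assumed starting package power
    for _ in range(steps):
        true_temp_c = AMBIENT_C + C_PER_WATT * power_w
        # Assumption: a destroyed sensor keeps reporting a harmless value.
        reported_c = true_temp_c if sensor_alive else AMBIENT_C
        if reported_c >= TEMP_LIMIT_C:
            power_w -= STEP_W  # back off once the limit is seen
        else:
            power_w += STEP_W  # otherwise keep boosting
    return power_w


healthy = run_boost_loop(sensor_alive=True)
blind = run_boost_loop(sensor_alive=False)
print(f"working sensor: {healthy:.0f} W, dead sensor: {blind:.0f} W")
```

With a working sensor the toy loop hovers around the power at which the assumed cooler reaches the limit; with a dead sensor nothing ever tells it to back off, so it keeps climbing for as long as the loop runs.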
 
What voltage is your VDDCR_SOC at on the latest BIOS, out of interest?
I've manually set it to 1.15v; on auto it still wants to do 1.35v. I'm going to bring the memory controller voltage down tonight, which is on 1.35v too, and test to see if I can get that down to 1.2v / 1.25v. I don't think either of these voltages needs to be anywhere near as high as they are.
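For anyone jotting their own settings down, a small sketch of checking them against a self-chosen ceiling might look like this. The rail names and the dictionary layout are hypothetical, and the 1.25v ceilings are just the figure floated in this thread, not anything from AMD documentation.

```python
# Ceilings taken from figures mentioned in this thread (assumption, not an official limit).
THREAD_CEILINGS_V = {"soc": 1.25, "mem_controller": 1.25}


def rails_over_ceiling(readings_v, ceilings_v=THREAD_CEILINGS_V):
    """Return only the rails whose voltage exceeds the chosen ceiling."""
    return {
        rail: volts
        for rail, volts in readings_v.items()
        if rail in ceilings_v and volts > ceilings_v[rail]
    }


# Values as described in the post: auto defaults versus the manual SOC setting.
auto_settings = {"soc": 1.35, "mem_controller": 1.35, "dram": 1.35}
manual_settings = {"soc": 1.15, "mem_controller": 1.35, "dram": 1.35}

print("auto:", rails_over_ceiling(auto_settings))
print("manual:", rails_over_ceiling(manual_settings))
```

DRAM voltage is deliberately left out of the ceilings, since 1.35v there is what the modules themselves are rated for under EXPO.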
 
Cool, yeah mine are both at 1.2v stable as it stands, I guess I could try reducing the SOC down a bit more too. I did get a crash last night with them both at 1.15v, but I'm not sure if that was due to one or the other, and missed the actual crash as I was downstairs making a drink lol.
Thanks for the confirmation, at least I know what to shoot for. I've read today that another guy manually set both to 1.25v and left them there.

It's a bit bad really, as someone who knows how to put a machine together but doesn't know much about a BIOS is just going to leave this all on auto and potentially end up with a dead system because of it overvolting, through no fault of their own.
 
That is the theory - SoC overvoltage is killing part of the IO chip that monitors and controls the power ramping of the CPU based on load. Zen 4 chips ramp up power, under the boost strategy, until they hit limiters (thermal, voltage, power etc). If nothing is monitoring the limiters (as it's burnt out) then....
The chip ramps up power uncontrollably (i.e. requests higher and higher VDDCR voltages from the board VRMs). This causes the CCD chiplets to burn out, short, and get so hot they burn the substrate of the CPU, which in turn cooks the pins and socket in the area where the CCD is.

To sum up........SoC overvoltage kills internal CPU monitoring -> MB overvolts CPU core on request of the CPU -> CPU incinerates itself -> CPU burns the MB socket
This makes perfect sense. Let's just hope we've not all already killed the protection section in the IO die due to the SOC overvoltage, otherwise AMD are going to get a right pile of chips back that they're not going to be able to ignore.
 