Is it possible my new 5950X system's hardware has already degraded?

Associate · Joined 10 Dec 2020 · Posts: 26
I've been running fully stable at 3600 CL16 / 1800 FCLK for the past week and a half on my 5950X build, which is about 1.5 weeks old. Not a single WHEA error in the event log and no crashes.

Last night I left the system idle overnight, and when I woke up I noticed WHEA errors in the event log. I thought this was related to the CPU's idle state, since in the past 1.5 weeks of use the computer had hardly been idle at all.

The problem is, the WHEA errors seem to be "creeping up": now I'm suddenly getting WHEA errors DURING LOAD, something that never happened EVER in the past 1.5 weeks, and I haven't changed any settings or drivers. Right now I'm getting a literal flood of WHEA errors, and it seems to be increasing by the hour. It's like my hardware isn't the same anymore. WTF happened? Temps were never bad and I never touched any voltages.

Any ideas?
 
Yeah, to my surprise: I decided to try everything at stock, so I'm now running the system at 2666 MHz / 1333 FCLK, and guess what? My event log is being spammed with WHEA errors at stock settings too. This means something broke suddenly while the system idled overnight, and now my hardware is most likely screwed; I can't even run stock without errors. Wow.

Also, my CPU never passed 70°C and I never touched any voltages. I only set FCLK to 1800 and memory to XMP 3600, that's it, no PBO, nothing. I also didn't change any drivers before or after the errors started. Is my hardware suddenly faulty, or could it be something else? It just seems strange that everything ran absolutely stable for over a week and now I can't even run stock settings without WHEA errors, and I didn't change anything. Is it more likely that the hardware has broken?
 
5950X
Aorus X570 Extreme Rev 1.1, BIOS F30 (tested F31 BIOS, no change)
Ballistix MAX 4x16GB 4000 CL19
3090 FE
Seasonic 850 W Platinum
1TB Samsung 980 Pro Gen4 NVMe

But I solved the issue, in that I locked the clock speed at 3000 MHz, removing the boost algorithm completely. Now my new computer is slower than my old one, but it works without any WHEA errors, so it must be the CPU, right? The motherboard breaking when it's not running anywhere near its limits seems unlikely. If it were a RAM issue, I would still have WHEA errors now with the CPU at 3000 MHz, because the RAM is still running at 2666 MHz.

Or am I mistaken here?
 
Could be the Gigabyte BIOS, as mine has been a bit flaky today.
But it was stable for over a week, and now I can't seem to get any stability whatsoever. I did try re-flashing the BIOS, but nothing seems to help.
To me the CPU has changed. I've initiated an RMA, and it looks like I won't get a new CPU until early January, so it will be a while.

I'm more than certain more people will experience something like I did, where the CPU works fine in the beginning and your overclocks are rocking, then suddenly hell breaks loose. That 1.5 V core boost at idle is probably going to screw up a whole lot of systems until AMD patches it and lowers the boosts, but by then you've essentially been scammed: you purchased something that was advertised to do something it couldn't do in the long term.

Hopefully I'm wrong.
 
So the only overclocking you did was change RAM speed and FCLK?

Yeah, my goal was a fast, stable new system. I've read that just setting your RAM/FCLK to something like 3600/1800 (1:1) yields a massive performance improvement without any risk, so that's the only thing I did, plus tightening the timings because my RAM is rated for 4000 MHz.
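For anyone wondering where the "1:1" in 3600/1800 comes from, here's a rough Python sketch. It assumes only that DDR memory transfers twice per clock, so the memory clock (MCLK) is half the rated MT/s figure, and 1:1 means FCLK equals MCLK. The function names are made up for illustration; nothing here reads real hardware.

```python
def mclk_mhz(ddr_rate_mts: int) -> float:
    """DDR transfers twice per clock, so the memory clock is half the rated MT/s."""
    return ddr_rate_mts / 2


def is_one_to_one(ddr_rate_mts: int, fclk_mhz: int) -> bool:
    """True when the fabric clock (FCLK) matches the memory clock (1:1 mode)."""
    return mclk_mhz(ddr_rate_mts) == fclk_mhz


# (memory rate in MT/s, FCLK in MHz) combinations mentioned in this thread
configs = [(3600, 1800), (2666, 1333), (4000, 1800)]
for rate, fclk in configs:
    print(f"DDR4-{rate}: MCLK {mclk_mhz(rate):.0f} MHz, FCLK {fclk} MHz, "
          f"1:1 = {is_one_to_one(rate, fclk)}")
```

That's why a 4000-rated kit gets run at 3600 here: at 1800 FCLK, only 3600 keeps the ratio at 1:1.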
 
It might be worth testing the memory. I'd put more money on that being the issue than on CPU degradation.

Also, about the 1.5 V thing, it's totally normal. Voltages like that are only dangerous if the CPU is actually doing work. The voltage algorithm on these CPUs is designed like this. This in no way would cause degradation.

I will try setting my memory back to 3600 CL16 / 1800 FCLK with core boost disabled. I read another thread and it appears the WHEA errors are directly related to the core boost algorithm. I'm working right now so I can't reboot, but once I'm done with today's grind I'll try it and see if I get any errors. I personally don't think it's the memory, but I do agree with you that boosting to 1.5 V is by design, so it seems very odd that it would cause degradation in so little time. I'm surprised at how this issue crept up on me, from nothing for a week straight to not even being able to run at stock.

I found this thread (https://www.overclock.net/threads/replaced-3950x-with-5950x-whea-and-reboots.1774627/) where the user has the same issue as me. The only difference seems to be that for him it started like this from day one, rather than after day 8. That's the part that baffles me the most: it just came out of nowhere, which is why I had the theory that my hardware may have degraded in one way or another.
 
OK, I've done some tests; here are the results:

Core Performance Boost -> Disabled, CPU multiplier x34 (auto), FCLK 1800, MEM 3600 = WHEA errors
Core Performance Boost -> Disabled, CPU multiplier x34 (auto), FCLK 1333, MEM 3600 = WHEA errors
Core Performance Boost -> Disabled, CPU multiplier x34 (auto), FCLK 1333, MEM 2666 = WHEA errors
Core Performance Boost -> Disabled, CPU multiplier x34 (auto), FCLK 1333, MEM 1333 = WHEA errors
Core Performance Boost -> Auto, CPU multiplier x32, FCLK 1800, MEM 3600 = No errors!
Core Performance Boost -> Auto, CPU multiplier x36, FCLK 1800, MEM 3600 = No errors!
Core Performance Boost -> Auto, CPU multiplier x38, FCLK 1800, MEM 3600 = No errors!
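To make the pattern in those results easier to see, here's a rough Python sketch that re-encodes the seven runs above and groups the outcome by each setting. It assumes the usual 100 MHz base clock when converting multipliers to MHz, and `errors_by` is a hypothetical helper, not any tool's API. In this small sample the CPB/static-multiplier setting is the only field whose value alone predicts the outcome; since CPB and the manual multiplier changed together, it can't tell you which of the two actually matters.

```python
from collections import defaultdict

BCLK_MHZ = 100  # assumption: standard 100 MHz reference clock

# (Core Performance Boost, CPU multiplier, FCLK MHz, memory MT/s, WHEA errors seen?)
# Transcribed from the test list above.
results = [
    ("Disabled", 34, 1800, 3600, True),
    ("Disabled", 34, 1333, 3600, True),
    ("Disabled", 34, 1333, 2666, True),
    ("Disabled", 34, 1333, 1333, True),
    ("Auto", 32, 1800, 3600, False),
    ("Auto", 36, 1800, 3600, False),
    ("Auto", 38, 1800, 3600, False),
]


def errors_by(field_index, rows):
    """Group rows by one setting and collect the distinct outcomes seen for each value."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[field_index]].add(row[-1])
    return dict(groups)


for cpb, mult, fclk, mem, whea in results:
    print(f"CPB {cpb:8} x{mult} = {mult * BCLK_MHZ} MHz, FCLK {fclk}, MEM {mem}: "
          f"{'WHEA errors' if whea else 'no errors'}")

# A value mapping to both {True, False} means that setting alone doesn't explain the errors.
print("By CPB setting:", errors_by(0, results))
print("By FCLK:", errors_by(2, results))
print("By memory speed:", errors_by(3, results))
```

FCLK 1800 and MEM 3600 both show up with and without errors, which matches the conclusion that the memory/FCLK speed alone isn't what flips it in these runs.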

So what is wrong here? Clearly, disabling "Core Performance Boost" does not have the same effect as setting a static multiplier for the CPU, yet the results in HWMonitor are the same. If you set a multiplier the CPU no longer boosts, and if you disable Core Performance Boost the CPU no longer boosts, so why does one generate WHEA errors and the other does not? Clearly some **** code in those Gigabyte BIOSes.

I'm starting to think that when you update a BIOS, some actual microcode gets updated in the CPU, and even if you flash back to an older BIOS that microcode remains. This is my new theory for why my problems started yesterday. I did update to BIOS F31, but I flashed back to F30 after F31 didn't give any better results; I knew I was stable on F30, so why bother with the newer one? Anyway, not long after that I went to bed, and in the morning I had the WHEA errors. So this may be the cause: the F31 BIOS updated some microcode in the processor, and the CPU won't accept the older microcode in the older BIOS.

I'm only guessing; I'm far from an expert in this field.

Anyway, everything is stable with a static-multiplier overclock and the memory runs just fine, so something is FUBAR with the BIOS/CPU. Thanks, Gigabyte, I guess?
 
What's the setting on the CPU PLL and NB PLL?

Have you checked the voltages using ZenTimings and Ryzen Master?

Best to dial in VDRAM, VSOC, IOD and CCD voltages manually.

I have not; I expected the motherboard to do this automatically. Can you give me some "good values"? I don't know what they should be if they are wrong. Cheers!
 
Best to set your own volts:

Set VDRAM to 1.4 V
VSOC to 1.05 V
V IOD and V CCD to 1.0 V
VDDP to 0.95 V

Then leave the CPU on Auto and set PBO to Motherboard under AMD Overclocking.

Have you memtested, two sticks at a time? Also, I guess these are not 64GB kits; you have two lots of 32GB?

That's correct, the memory was bought in 2x16GB packs.

I did a massive amount of memtest and AIDA64 testing in the first 4 days, and everything was stable. This issue came out of nowhere, and I've now tried those settings: no change.

The only thing that fixes the issue is setting a static multiplier for the CPU. Even disabling core boost doesn't help, but setting a static multiplier, which does pretty much the same thing as disabling core boost, does help. I think it's all Gigabyte trash code at this point. I am, however, still wondering why it occurred seemingly out of nowhere after the system had been stable for over a week.

EDIT: You may be on to something anyway. I realize now that even with the multiplier set to static, if I hammer my memory with the AIDA64 cache benchmark I still get those WHEA errors (though rarely).

I'll run another set of memtests over the day to see whether my memory kit has suddenly gone bad. Is it common for memory to work fine and then suddenly go bad out of nowhere? Temps never went above 38°C on the memory according to HWMonitor (the sticks have built-in sensors).
 
OK, I just finished 7 hours of memtest; with 64GB I only had time for 1 pass, but no errors.
Running the AIDA64 cache benchmark yields instant WHEA errors in the event log with whatever setting I try. The WHEA errors are just a lot rarer when I set the CPU to a static multiplier.

What I notice is that whenever I set the CPU to a static multiplier, the CPU Vcore also becomes pretty much static, sitting around 1.1 V all the time.
Without a static multiplier and with core boost off, the algorithm still seems to "take care" of Vcore, but it never goes to 1.5 V; with core boost off it maxes out at 1.032 V.

The thing is, it's the minimum voltage that makes me think this is the issue: the minimum Vcore is 0.192 V? This sounds like it could be it. The "algorithm" starves the CPU of power and that generates these errors in the event log. Should I set Vcore to 1.2 V?


I also found this video: https://www.youtube.com/watch?v=tobzYO5pSJs where the guy explains that his CPU was working at first, then suddenly nothing worked. It got so bad that his system wouldn't even boot into Windows anymore; he swapped the CPU for an old one and everything worked fine. So chances are these 5950Xs are prone to breaking, as these issues seem very common at this point.

It blows my mind that the reviewers NEVER talked about this, yet whenever the CPUs come out all hell breaks loose. Is it safe to say that every single reviewer in the world gets special hardware sent to them, and when the masses get to buy it we get something else? I just can't understand how this goes under the radar; it's a massive issue and no one with a following is talking about it.

Anyway, if you have any ideas what I can try next, I'm all ears. I'm thinking once again that I may have to RMA my CPU.
 
I just found out my Ryzen 2600 is compatible with the motherboard (X570 Aorus Extreme); I have it in my backup server. Is it worth disassembling that just to test with another CPU? I don't know if I want to, since the backup server runs daily for my important files. I might as well just buy another cheap CPU, to be honest.
 
What are the volts doing? As I said, I don't trust motherboards' auto volts, and PLL is also important.

I haven't touched PLL; I'll have to look and see if I can find it in the BIOS somewhere. What should I set it to?

I'm currently running with:
VDRAM 1.4 V
VSOC 1.05 V
V IOD and V CCD 1.0 V
VDDP 0.95 V

At 2666 MHz / 1333 FCLK, the Event Viewer fills up with WHEA errors before I even get into Windows. No difference with or without these voltage settings, and no difference whether the memory is at XMP 3600 MHz / 1800 FCLK or this slow stuff.
 
So, added to my voltages above:
Vcore manually set to 1.2 V
CPU Vcore Loadline Calibration > Turbo
Vcore SOC Loadline Calibration > Turbo
CPU Vcore Protection > 400 mV (max)
CPU Vcore SOC Protection > 400 mV (max)
PWM Phase Control > Extreme Performance (max)

Running everything at stock: 3400 MHz, no core boost, memory at the default 2666 MHz with loose timings, no XMP, and FCLK 1333.

Full of WHEA errors.

Literally none of these voltages make any difference to anything. The voltages do get higher, though: the CPU is now idling at 1.2 V, so I know these settings take effect, and memory is at 1.416 V. But since it doesn't make a single difference, it seems pointless to keep playing around with voltages; it's clearly not the problem.
 
PLL should be in the DigiPower menu, usually found in the same area where you set core volts and core multipliers, but as a sub-menu. It should be set at L2/3; it's probably on Auto at the moment.

Also enable Gear Down Mode, as you're running 4x16GB.

I can't for the life of me find this PLL setting anywhere in the BIOS. Perhaps it's called something else?

I think we finally have something. I enabled Gear Down Mode in 3 different places in the BIOS and I'm running the AIDA64 benchmark right now. Previously it instantly resulted in WHEA errors; now I have none, so I believe you found the issue.

The Gear Down Mode change must have happened when I tried BIOS F31 and then went back to F30. Gear Down Mode has always been set to Auto, but I guess what Auto does changed somehow?
 
A note on your PSU: I wouldn't be confident that it's enough to handle the sudden current spikes the 3xxx series exhibits under load. If you find your PSU switching off under sustained GPU and CPU load, you'll know the issue :)

Really? What kind of PSU would you recommend?
 
So Gear Down Mode definitely made it better, but I still can't run my old spec of 3600 CL16 / 1800. My guess is that some of these 40+ memory tweak settings that were previously on Auto, and are still on Auto, were modified by one BIOS update or another, and that's what's causing the issue. One would hope the XMP profile would set those, but from what I can tell it doesn't. Any ideas which other memory settings are really important? I read something about ProcODT; what should that be set to?

EDIT:

Not sure what to tell you guys. Gear Down Mode fixed the problem, so I started doing some work, and suddenly all the WHEA errors are back. I didn't change anything, lol. OK, I officially give up on this rig; I'll just wait until it completely fails so I can RMA the thing. WTF is this garbage? It's either the trash Gigabyte BIOSes making changes as we go, or the CPU is broken, or a little of both. It seems far too finicky to be worth spending any more time on, as nothing makes sense anymore.
 
Look, just get your system into Windows, set things as I explained, and do those tests. You have 64GB of RAM; there's no way you could have run memtest86 at 3200 MHz, 3400 MHz and 3600 MHz, and tested 3600 MHz at XMP timings, in 24 hours, let alone the AIDA64 test. So you're running your system with RAM errors.

I have 32GB of RAM and each 4-pass memtest86 run takes 5 hours, so yours will take a lot longer. Don't take any shortcuts; just do it properly, otherwise you're in a world of hurt.

I can't freaking stress this enough. Test and freaking test your damn RAM, like it's the last thing that works. Memtest86 the crap out of it. 4 passes, be freaking damn patient, and at the end of it you'll be rewarded for your efforts.

Hmm, I ran 2 passes of memtest86; I looked and it took around 4 hours per pass. What would be wrong? I'm now running memtest86 at 3600 MHz / 1800 FCLK with 16-18-18-36 timings and no stability issues. I chose to ignore those WHEA errors and suddenly they just went away out of nowhere. Again, my WHEA errors are ID 10 and 11, which don't seem to have any stability impact on the system, and it's probably just random luck that I didn't see them earlier, as they came from nowhere and have now disappeared out of nowhere.

Is there any special setting in memtest86 that I need to configure? I just put it on a USB stick, and once you boot from it the tests start automatically, so I haven't touched anything. It does seem strange that 64GB of RAM would finish 1 pass in 4 hours if your 32GB takes 5 hours per pass.
 
No special settings, just let the test run through 4 passes.

If it all passes, then it's fine. You can move on from it.

Is your Windows a fresh install?

Each pass on my 32GB takes 1 hr 45 min; 4 passes is exactly 7 hours. I leave it overnight.
Oh, I misread your original post; yeah, that seems about the same. I ran memtest overnight at 3600 MHz 16-18-18-19-36 / 1800 FCLK for around 8.5 hours, and it completed 2 passes with no errors.
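Rough arithmetic for the memtest86 timing question, as a Python sketch. It takes the 1 hr 45 min per pass on 32GB quoted above and assumes pass time scales linearly with installed capacity, which is an approximation, not something memtest86 guarantees. `estimate_pass_time` is a hypothetical helper.

```python
from datetime import timedelta


def estimate_pass_time(ref_pass: timedelta, ref_gb: int, target_gb: int) -> timedelta:
    """Assumption: memtest86 pass time scales linearly with installed capacity."""
    return ref_pass * (target_gb / ref_gb)


ref_pass = timedelta(hours=1, minutes=45)  # one pass on 32GB, as reported in the thread
per_pass_64 = estimate_pass_time(ref_pass, 32, 64)

print("32GB, 4 passes:", ref_pass * 4)               # 7:00:00
print("64GB, 1 pass (est.):", per_pass_64)           # 3:30:00
print("64GB, 4 passes (est.):", per_pass_64 * 4)     # 14:00:00

observed = timedelta(hours=8.5)  # the overnight 64GB run
print(f"Passes expected in 8.5 h (est.): {observed / per_pass_64:.1f}")  # 2.4
```

On that assumption, about 2 passes in 8.5 hours on 64GB is roughly what you'd expect, and a full 4-pass run needs something like 14 hours.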

I'm ignoring the WHEA errors for now; they come and go completely at random. Right now I don't have them anymore, and I'm back at 3600 MHz / 1800 FCLK. They've never led to my system crashing either, so I believe WHEA IDs 10 and 11 are not nearly as bad as 19, which seems to lead to hard crashes and reboots.
 