Unexplained crashes appearing out of the blue

Associate
Joined
26 Aug 2019
Posts
56
So I've been having these weird crashes since yesterday, and I'll try to explain as thoroughly as I can in the hope that someone can work out what is going on.

I built a new PC exactly a week ago: 5950X, Ballistix 3600 C16, Dark Hero motherboard, NZXT Z63 and an NVMe drive. The GPU I kept from before, a 780 Ti.
Originally I just connected everything and kept the same Windows installation (the previous system was an Intel Q6600 with DDR2), and everything had been fine since. Not a single crash or anything noteworthy. Nothing.
Throughout the week I was messing around here and there, running Cinebench, checking core clocks, scores etc. to see what I get and how I can raise things. Yesterday I decided to quickly install Windows on another drive to see if things are a bit better on a clean install.
And it did seem to give me some better scores. However, that is when the crashes began. The first one was while booting off the USB drive to install Windows. Then I had a few hard crashes a few minutes after booting into Windows (so I had enough time to run Cinebench). Other times I would try to run a game to see if it runs on that Windows (it wouldn't load at all on my previous one) and it would crash before I literally had time to double click through 2 folders and then the exe. I thought something must be messed up with the installation, although I was really concerned that it had also happened in the Windows setup environment booted from the USB, which could mean something more general.
Went back to my old/main Windows. Everything carried on being fine. I decided to move that Windows to a different drive (so I could keep the data and the previous installation if something happened) and make a fresh install on my main drive. In the meantime, as I had been on the stock BIOS, I upgraded to the most recent one and kept everything at stock aside from DOCP. While installing the new Windows I was on the phone and distracted with something important, and I ended up installing onto the backed-up partition and completely lost my old/main Windows. I couldn't recover the files.
With nothing else I could do, I just carried on installing Windows on the drive I was supposed to. The first few hours of setting things up were fine. Then I went to play a game: hard crash within a second of loading the map. I was like ****, whatever is going on is still being carried forward.
After the restart I loaded the game again and it played fine for 20-30 mins. Temps etc. were all fine in the meantime. Before going to bed I set MemTest64 to run for 3 hours. I woke up in the middle of the night to check the PC and at some point it had restarted. From the uptime I could tell it had probably crashed a bit after the 3 hours had passed (difficult to know for certain as I didn't note the exact start time of the test). Before going back to bed I ran Prime95 with the blend test. It's been going for 4 hours now and is stable, with temps hovering in the 60s. From what I see, though, it doesn't push high MHz on the cores most of the time (this minute, for example, all are around 3 GHz. Edit: might've underestimated a bit, for a few moments they have been at 4,450).
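
For anyone wanting to repeat the uptime reasoning above, it is just a subtraction: boot time = current time minus uptime, then compare that with when the test should have finished. A minimal sketch in Python, where every timestamp is a made-up placeholder (the post gives no exact times) and `estimate_boot_time` is a hypothetical helper:

```python
from datetime import datetime, timedelta

def estimate_boot_time(now, uptime):
    """The machine last booted 'uptime' ago, so the crash was just before that."""
    return now - uptime

# All values below are hypothetical placeholders, not figures from the post.
test_start = datetime(2021, 1, 1, 0, 0)       # when the memory test was started
test_length = timedelta(hours=3)              # configured test duration
now = datetime(2021, 1, 1, 4, 30)             # moment the uptime was checked
uptime = timedelta(hours=1, minutes=10)       # uptime reported by Windows

boot_time = estimate_boot_time(now, uptime)
crashed_after_test = boot_time > test_start + test_length
print("Last boot:", boot_time, "| crash after test finished:", crashed_after_test)
```

Without a noted start time the comparison is only as good as the guess for `test_start`, which is the uncertainty mentioned above.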
For the crash in the middle of the night the event viewer stated this:

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0

The details view of this entry contains further information.
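
If several of these entries pile up, it can help to pull the fields out and compare them (same error type? same APIC ID every time?). A rough sketch in Python that parses the message text quoted above; `parse_whea_event` is a hypothetical helper name, and it only assumes the "Key: value" layout seen in this one entry:

```python
import re

# Message text copied from the Event Viewer entry quoted above.
EVENT_TEXT = """A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0
"""

def parse_whea_event(text):
    """Collect every 'Key: value' line of the message into a dict."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([^:]+):\s*(.+?)\s*$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2)
    return fields

event = parse_whea_event(EVENT_TEXT)
print(event["Error Type"], "on APIC ID", int(event["Processor APIC ID"]))
```

Lines without a colon, such as the first sentence of the message, are simply skipped.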

Does anyone have any idea what could be going on? How could it possibly be stable for a week on an old Windows installation carried over from the previous Intel CPU, yet then go on to crash in the BIOS, on a fresh Windows install, 1 min into gaming, and while just opening a folder in Windows, but at the same time be stable in a Prime95 blend for 4 hours????? Heeeeeelp.
 
Associate
Joined
13 Sep 2013
Posts
1,625
Location
Aberdeen
I'd add a bit of vdimm on and maybe set your soc etc manually and see how that goes. Are you running 2 sticks or 4? Also got proc/CAD bus and rtt to play with.
 
Associate
OP
Joined
26 Aug 2019
Posts
56
I'd add a bit of vdimm on and maybe set your soc etc manually and see how that goes.

I've now downloaded TestMem5 to give the memory another run and see if it has anything to do with that. Although again, a week of playing games etc. without a single crash makes it a bit weird.

"set your soc etc manually"
What do you mean with this one? Is it the timings? I am a bit new to the platform and I find the new UEFI and the things you can change so much more complicated compared to the BIOS I am used to with the Q6600.

And the PSU I have is a fairly recent EVGA G3 750W, so in theory it shouldn't be a power issue, at least not when it is stable in Prime95 and crapping out in the BIOS or idle in Windows.
 
Associate
OP
Joined
26 Aug 2019
Posts
56
Okay, I just ran 3 cycles of TestMem5 and all came back without errors. Then, as the PC was idling doing nothing, it just restarted. Same error as before in the Event Viewer. When it booted back into Windows I opened the browser to type this, and while typing it did it AGAIN. I am at a complete loss....

I've added 0,0125 and 0,0250 (might've typed an extra zero now, not sure) on CPU, vSOC and vDIMM respectively to see if anything changes. Also put IF manually on 1800.
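
On the IF value: 1800 is the figure that lines up 1:1 with a 3600 kit, because DDR memory transfers twice per clock, so the real memory clock is half the rated speed. A quick arithmetic sketch in Python; it assumes the kit is actually running at its rated 3600 (i.e. DOCP enabled), and `memory_clock_mhz` is just an illustrative helper:

```python
def memory_clock_mhz(ddr_rating_mts):
    """DDR transfers data twice per clock, so the real clock is half the rating."""
    return ddr_rating_mts / 2

ddr_rating = 3600   # rated speed of the Ballistix kit from the first post
fclk = 1800         # Infinity Fabric clock set manually in the post above
mclk = memory_clock_mhz(ddr_rating)

print(f"MCLK {mclk:.0f} MHz, FCLK {fclk} MHz, running 1:1: {mclk == fclk}")
```

If the RAM is left at stock (DOCP off) the memory clock will be lower, and this comparison no longer applies.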

From googling around, apparently it is a widespread error; however, for some it's the RAM, for others the GPU, for others the CPU etc... so god knows what and why. I'll give it a couple of days and if I still can't figure it out I'll maybe try to return all of them, as I am still within the 14 days of receiving them, and roll the dice again. The only annoying thing is that every component is from a different shop and the motherboard is from abroad!! argh..

Edit: reverted those changes and instead made the following based on another thread
"Cpu Soc Voltage 1.1v
*Vddg iod 950mv
*Vddg ccd 900mv"

And also increased the vDIMM from 1.35 to 1.4.

I'll see how it goes; if there are still issues I might try the xmas day beta BIOS.

Crashed again, but this time there's no entry in the Event Viewer as to why. So bloody annoying.
 
Associate
OP
Joined
26 Aug 2019
Posts
56
Are you experiencing crashes in old windows install or fresh install?

In the old installation I never had an issue. Not one. In multiple new installations, yes. Almost always while idle. From what I've gathered it seems to be a common issue.

A few hours ago I updated to the xmas day beta BIOS and kept everything at stock including RAM (didn't enable DOCP). I was gaming for a couple of hours just fine (although the issue usually seems more apparent while idle). I've been mainly idling for an hour and it hasn't recurred... yet.

However, from reading around, it seems to be either BIOS related or CPU related. In the BIOS case, disabling C-states apparently fixes it a lot of the time. In other cases people have had more luck RMA-ing their CPU. I'd say the C-states are more likely the issue, because there is a chance the old W10 installation was stuck with a bloated service or something that kept the CPU from idling as much and so never exhibited the issue, while the clean installation lets the cores idle and makes it appear. However, as I've said, I've lost the previous installation so I can't go back and try in that environment. I'll try disabling the C-states and see how it is over the next couple of days, and if it still causes issues I guess RMA-ing the CPU might be the next step, just for peace of mind.
 
Soldato
Joined
29 May 2005
Posts
4,290
You didn’t put on ryzen power plan or something? Zen3 doesn’t need amd power plans. Also got latest drivers etc?

lastly check you got those c-states and amd cool n quiet etc.
 
Associate
OP
Joined
26 Aug 2019
Posts
56
You didn’t put on ryzen power plan or something? Zen3 doesn’t need amd power plans. Also got latest drivers etc?

lastly check you got those c-states and amd cool n quiet etc.

At some point I think I had it on the performance plan because I was messing around with benchmarks etc. Later I changed it back to balanced. I don't remember exactly when the crashes mostly appeared, whether they were on both plans or just one. The thing is that it once crashed while booting off the USB to install Windows, so it's not so much down to drivers etc. At some point today, however, it did restart, and before that I had re-installed the chipset drivers directly from AMD and not through Armoury Crate as before.
 
Soldato
Joined
29 May 2005
Posts
4,290
Chipset should be via AMD always.

Go into the power plans and disable any energy saving features for now and set min CPU power to 100%. And have a look at your BIOS settings etc. on CPU features and turn all the stuff off. Then see what you get.
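
The min CPU setting can also be changed from an elevated prompt with `powercfg` instead of clicking through the power plan dialog. A minimal sketch in Python that only prints the commands unless it is run on Windows with a (hypothetical) `--apply` argument; it covers the plugged-in (AC) value of the active plan only, and `min_cpu_state_commands` is an illustrative name:

```python
import subprocess
import sys

def min_cpu_state_commands(percent=100):
    """Build the powercfg calls that set 'minimum processor state' on AC power."""
    return [
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMIN", str(percent)],
        # Re-apply the current scheme so the new value takes effect.
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]

if __name__ == "__main__":
    for cmd in min_cpu_state_commands():
        print(" ".join(cmd))
        # Only touch the system when explicitly asked to, and only on Windows.
        if sys.platform == "win32" and "--apply" in sys.argv:
            subprocess.run(cmd, check=True)
```

This is a troubleshooting setting rather than a fix; setting the value back to its previous figure afterwards restores normal idle behaviour.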

I am assuming you have stress tested each component to make sure CPU, RAM, GPU etc. are all individually working.
 
Associate
OP
Joined
26 Aug 2019
Posts
56
Ok, just an update. Since disabling C-states and updating to the latest beta BIOS (did both at the same time so I don't know which of the two helped more; I'd say more likely the C-states), the machine has had a continuous uptime of almost 23 hours without a WHEA-18 error. Apparently this is quite common and seems to happen more when cores go idle.
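
To check whether the error really has stopped, rather than relying on uptime alone, the System log can be queried for recent WHEA-Logger entries with `wevtutil`. A rough sketch in Python that builds the query and only runs it on Windows when given a (hypothetical) `--run` argument; it assumes the crashes are logged under provider Microsoft-Windows-WHEA-Logger with event ID 18, as the "WHEA-18" above suggests:

```python
import subprocess
import sys

def whea_query_command(event_id=18, count=5):
    """Build a wevtutil call listing the newest WHEA-Logger events with this ID."""
    xpath = ("*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']"
             f" and (EventID={event_id})]]")
    return ["wevtutil", "qe", "System", f"/q:{xpath}",
            f"/c:{count}", "/rd:true", "/f:text"]

if __name__ == "__main__":
    cmd = whea_query_command()
    print(" ".join(cmd))
    # Only query the log when explicitly asked to, and only on Windows.
    if sys.platform == "win32" and "--run" in sys.argv:
        subprocess.run(cmd, check=True)
```

An empty result after a day or two of idling would back up the 23 hour uptime figure; as noted earlier in the thread, a crash can also happen without leaving any entry.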

If this affects anyone else and they're interested in knowing more about it, they can follow the extensive discussions below:

https://community.amd.com/t5/proces...rashing-restarting-whea-logger-id/td-p/423321

https://www.overclock.net/threads/replaced-3950x-with-5950x-whea-and-reboots.1774627/
 