New 5900x crashing

Associate
Joined
28 Mar 2021
Posts
30
Hi. I have upgraded my old 5600X to a brand new 5900X to cope with the bottleneck I had. The 5600X never gave me any problems in its almost 2 years of service: no crashes, nothing. But the new one...

The current specs are:
CPU: Ryzen 9 5900X
Mobo: ROG Strix B550-F Gaming
GPU: Gigabyte RTX 3080
RAM: 2x 8GB Corsair Vengeance Pro RGB 3200 CL16 (dual channel, slots 2 and 4)
Storage: 2TB NVMe Western Digital Black
PSU: Corsair RM750x
CPU cooling: Corsair H100i Pro
OS: Windows 11 Home

Now the history of events:

1. I replaced the old CPU with the new one and reset the BIOS settings to defaults. I left everything untouched and booted into Windows.

From here I got a random crash while doing non-demanding tasks.

In Event Viewer I found 2 errors logged for this crash:

WHEA-Logger:
Description:
A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor APIC ID: 0

And WHEA-Logger:
Description:
A fatal hardware error has occurred.
Component: Memory
Error Source: Machine Check Exception
The details view of this entry contains further information.
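
For anyone comparing these entries across several crashes, here is a minimal Python sketch that splits a WHEA-Logger description (pasted as plain text from Event Viewer) into its labelled fields. The function and variable names are made up for illustration, and it assumes the description keeps the "Label: value" layout shown above.

```python
def parse_whea_description(text):
    """Split a pasted WHEA-Logger description into a summary and labelled fields."""
    summary = []
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the bare "Description:" header.
        if not line or line.rstrip(":") == "Description":
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()   # e.g. "Error Type" -> "Bus/Interconnect Error"
        else:
            summary.append(line)                  # free-text sentence without a label
    return {"summary": " ".join(summary), "fields": fields}


cpu_event = """Description:
A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor APIC ID: 0"""

memory_event = """Description:
A fatal hardware error has occurred.
Component: Memory
Error Source: Machine Check Exception
The details view of this entry contains further information."""

for event in (cpu_event, memory_event):
    parsed = parse_whea_description(event)
    print(parsed["fields"])
```

Both entries share the same error source (Machine Check Exception), which is easier to see once the fields are side by side.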


Bear in mind that at this stage my overclocks were reset to defaults and my RAM was at 2133MHz.

The same crash with the same errors happened 40 minutes later when exiting a Warzone 2 match.

2. I decided to check whether there was a BIOS update; my current version was one released in Oct 2022. Found a newer one and installed it using a USB flash drive.
Enabled D.O.C.P, which brought my memory to 3200MHz, and set the FCLK frequency to 1600MHz. Checked for driver updates on the ASUS website, and even used a tool which scans for outdated drivers. Checked for Windows updates. Everything is up to date.
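
The 3200MHz / 1600MHz pairing comes from DDR memory doing two transfers per clock, so the real memory clock is half the advertised rate, and a 1:1 fabric clock matches that. A small Python sketch of the arithmetic (the function name is just for illustration):

```python
def clocks_for_ddr4(transfer_rate):
    """Return (memory clock, 1:1 FCLK) in MHz for an advertised DDR4 rate."""
    memclk = transfer_rate / 2   # DDR: two transfers per clock cycle
    fclk = memclk                # 1:1 ratio between fabric clock and memory clock
    return memclk, fclk


for rate in (2133, 3000, 3200):
    memclk, fclk = clocks_for_ddr4(rate)
    print(f"DDR4-{rate}: MEMCLK {memclk:g} MHz, 1:1 FCLK {fclk:g} MHz")
```

So a 3200 kit pairs with an FCLK of 1600MHz, and a 3000 kit with 1500MHz.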

Opened Warzone 2 to test again. Within 10 minutes of play it crashed again with the same errors.

3. Decided to fire up a stress test and check on temps. Fired up Prime95 and chose the full blend test. Within 3 minutes worker 6 failed the test. Stopped the test and tried once more: same result, under 3 minutes, same worker.

4. Went back to the BIOS and reset everything to stock settings. Fired up Prime95 once more and got the same result: within 3 minutes worker 3 failed the test. Stopped the test and ran it again but only on the worker that failed. Let it run for 15 minutes with no errors.

The temps were reaching 90C on some cores at some points according to HWMonitor.

5. Decided to test some more. Opened iCUE and set the pump and the fans from quiet mode to extreme, then fired up Prime95 once more on all cores. Same result: in under 3 minutes one of the workers failed the test. This time the temps never went above 85.7C (although the test didn't last more than 10 minutes).


6. Going back to the crashes. Since both errors appear at almost every crash, and one says CPU and the other memory, I decided to go back into the PC case and do some tinkering. I loosened the AIO pump which sits on top of the CPU a bit (maybe it was too tight) and moved the RAM sticks from slots 2-4 to 1-3.

Went back to Windows and this time I encountered another problem. Past the login screen Windows would crash with the error SYSTEM SERVICE EXCEPTION.
Went to safe mode and checked Event Viewer, and after some Google searching found out that some app was trying to access forbidden data from the system.
Tried several methods to fix the startup, such as a restore point, sfc /scannow and DISM.exe. Nothing helped.

7. Used the Reset this PC option with a USB stick holding a fresh install kit. Booted to Windows, updated it and installed all drivers.
Now that I had a clean Windows, I went back to the BIOS and once more enabled D.O.C.P (RAM OC to 3200MHz) and set the FCLK to 1600MHz, but this time I manually set the SoC VDD voltage from 1.0000 to 1.1000 (found this advice in a Reddit post). Moved the RAM sticks back to slots 2-4.

Went to Windows and tested once more with Prime95. Same result: a worker fails within 3 minutes. But no crashes. Did a 3 hour session of Warzone 2 last evening with no crashes.

Now the questions I need help with.

Are there any other tools to test the CPU?

Is it a faulty CPU? Should I RMA it?
I am planning to let it run for at least 2 more days, and if it crashes again I will RMA it. But I need to make sure it is faulty.
 
It is indeed pointing to a faulty CPU, especially if it is doing it with RAM at below rated speeds. Are we 100% sure the CPU pump is running? Have you tried CO and setting PBO limits on the chip, just to try and cool it a little more? (But at worst this should just throttle, not crash.)
The pump is definitely running. I can hear it if I get close to the CPU socket. Since it is a Corsair one, it is quite noisy if you put it on extreme.

I made a naming mistake in the original post. The voltage I modified is called SoC VDD voltage, which I increased from auto (1.0000) to manual 1.1000.

These are my current temps and voltages under low load: link
 
If it crashes at stock doing mundane tasks I'd RMA it.
This is the problem. I don't know if the issue was that it is faulty, or maybe overtightened, or even whether Windows was causing the issue, because I reinstalled Windows and loosened the cooler at the same time. Since then it is still failing on one worker in Prime95, but no crashes so far.
I need to perform some more tests on it, which is why I am asking what you would suggest.
 
It's highly unlikely to be windows.

Try loosening the cpu block a little and see if it helps or not. If it doesn't just RMA.

On a fully working CPU you shouldn't have to set manual curve optimisation on single cores just for stability out of the box.
That's what I did, but then Windows failed to boot and I had to reset (reinstall) it. Since then no crashes.

I will now run the AIDA64 Extreme test to see how it behaves.
 
Did a 15 minute test in AIDA64, no sign of issues.

Went back to Prime95 and did a 13 minute test of Smallest FFT, no errors.

Result print

Did the Small FFT test for 15 mins, no errors.

Did a Large FFT test and one worker failed within 15 seconds. Did the test again and after 13 minutes another worker failed.

Result print

Going to run MemTest86 from the BIOS to see if there are any errors.


I'd get the latest BIOS for the B550 board first, then RMA if that doesn't work.

I updated to the latest version yesterday.
 
Finished 2 out of 4 passes of MemTest86 v10 (roughly 30 mins each), then I stopped the test. No errors.

Did a test in OCCT with my RAM at 3200MHz. In under 1 minute core 6/thread 12 started failing every second.

Rebooted and ran the test again, this time with the RAM on auto (2133MHz). It has been 15 minutes since and no errors.

current status print

I'm starting to believe that the SoC voltage fixed the actual system halts. No crashes since, and I have been stressing this CPU all day.

I am still not happy about the failures in Prime95 and OCCT.
 
I used to get 63C in a CB23 multi run, with the chip using around 133W on a Z73 360 AIO. Have you tried a CB23 multi run?
Right now I'm letting it run at stock values. Everything on default except SoC at 1.1V.
It's been 32 mins and no errors.

I have a Corsair H100i Platinum, which is a 240mm rad.

Which settings would you suggest changing in the BIOS?
 
Sounds like the memory controller on the 5900X is borked to me. You could perhaps check the pins underneath to see if you have a broken or bent one.

I would RMA. If you have the 5600X to hand reinstall it and see if the same happens?

Ryzen chips do fail like Intel.

Alternatively, the RAM was not getting enough juice at the proper speeds, for whatever reason it is tripping.

But your errors point to that pesky 5900X memory controller.

I got the 5800x3D and replaced my 5600X, seems like it is fine thus far.

I'm intending to do that. Should I send it back to Amazon or to AMD?
 
I have fitted the old 5600X back in and ran an OCCT test with the RAM overclocked to 3200MHz (the one that failed in under 10 seconds on the 5900X).

Here are the temps during the test: print

Right now I am doing a Prime95 test on it (full blend, the one that usually fails in the first minute or in under 15 minutes).
 
Update: I got my replacement CPU. Same issues as with the first.

I decided to set PBO to manual and fine-tune each core using the OCCT extreme test for 15 mins.

My final list of undervolt offsets is:
-10
+1 (this is core 1)
-15
-10
-15
-20
-15
-10
-10
-20
-15
-10
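
To make that list easier to read, here is a small Python sketch that numbers the cores from 0 (so the +1 entry lands on core 1, as noted above) and converts each curve optimizer count into a rough voltage range. The 3 to 5 mV per count figure is an assumption based on the commonly quoted range for Curve Optimizer steps, not something measured on this chip, and the function name is only for illustration.

```python
# Per-core curve optimizer counts, in the order listed above (core 0 first).
offsets = [-10, +1, -15, -10, -15, -20, -15, -10, -10, -20, -15, -10]


def summarize(offsets, mv_per_count=(3, 5)):
    """Return (core, counts, low_mv, high_mv) rows.

    mv_per_count is an ASSUMED range of millivolts per count, not a measured value.
    """
    rows = []
    for core, counts in enumerate(offsets):
        low, high = sorted((counts * mv_per_count[0], counts * mv_per_count[1]))
        rows.append((core, counts, low, high))
    return rows


for core, counts, low, high in summarize(offsets):
    print(f"core {core:2d}: {counts:+3d} counts, roughly {low:+d} to {high:+d} mV")

# The core with the highest (least negative) offset tolerated the least undervolt.
least_tolerant = max(range(len(offsets)), key=lambda c: offsets[c])
print("least tolerant core:", least_tolerant)
print("average offset:", round(sum(offsets) / len(offsets), 1))
```

The single positive entry stands out: that core needed slightly more voltage than stock to pass, while every other core accepted a negative offset.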

Now it seems stable. 30 minutes of the extreme test and no errors. And my temps are somewhere around 80C, with some highs of 85C, which is way better than the 90C I was getting last time.

But I still have to sort out the other error I was getting.

WHEA-Logger ID 46.

From my experience with the old one, I had to raise the SoC VDD voltage to 1.1V, yet that doesn't fix it on this one.

This error appears only when idling or when the computer finishes a task and goes to idle. (The last one happened after the 30 min test ended. The second the PC finished the test, boom. Reset.)

Hence why I don't have a print to show you the temps during my test.

I should mention that all the tests were made with D.O.C.P enabled and memory at the rated 3200MHz. Disabling D.O.C.P seems to sort the WHEA ID 46 error.


Any suggestions for what I can do? I don't really want to run my RAM at 2133MHz.
 
Experienced some WHEA crashes with D.O.C.P disabled (RAM at 2133) while playing Witcher 3.

Went back to the BIOS, switched PBO limits from disabled to manual and gave them these values:
PPT: 170w
TDC: 118A
EDC: 165A
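
For context, a rough Python sketch comparing those manual limits with the stock limits usually quoted for a 105W TDP AM4 chip like the 5900X (PPT 142W, TDC 95A, EDC 140A). The stock numbers are an assumption taken from commonly published figures, not read from this board's BIOS, and the names below are only for illustration.

```python
# ASSUMED stock limits for a 105 W TDP AM4 CPU (commonly quoted, not read from this BIOS).
STOCK = {"PPT": 142, "TDC": 95, "EDC": 140}
# Manual limits from the post above (PPT in watts, TDC and EDC in amps).
MANUAL = {"PPT": 170, "TDC": 118, "EDC": 165}


def headroom(manual, stock):
    """Percentage by which each manual limit exceeds the stock limit."""
    return {name: round((manual[name] / stock[name] - 1) * 100, 1) for name in stock}


for name, pct in headroom(MANUAL, STOCK).items():
    print(f"{name}: {STOCK[name]} -> {MANUAL[name]} ({pct:+.1f}%)")
```

Under that assumption the manual values sit above stock rather than below it, so they would raise the power and current ceilings instead of tightening them.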

No crashes since. It's been over 1 hr of playtime. I will try putting the RAM OC back to see if the lack of a limiter was the issue.
 
So you got a replacement CPU and you're back at it again with the curve optimiser doing the undervolting. Why not just turn PBO off, run pure stock and take things from there?
When I got the replacement (which came from a different seller, AMD Amazon) I installed it, cleared the CMOS and turned it on with everything at default. Went for some browsing and mundane tasks and it crashed. I had not done any curve optimizer work with the previous one; I didn't know how to, or even what I was looking at. It was the first time I had tried to fix or modify a Ryzen; until then I had only had Athlons and Intels.

Browsed through forums and found that some people undervolted their 5900X and it was more stable than stock. (Mine felt more stable but was still crashing about once every 40 mins with the same errors, WHEA 18 and 46.)


I have replaced my RAM sticks with another 2x8GB kit which I had in another PC: fortza 3000MHz CL15 (which I overclocked to 3000MHz, with the CPU fabric set to 1500MHz). While installing them I decided to do a cable check to see if maybe something was broken, and saw that my CPU 8-pin plug was not fully inserted. It had about a 1mm gap left before clicking in (that plug is quite hard to reach and even to spot, hidden by the VRM heatsink and top-mounted fans).

Since then no errors. Not sure if it was the RAM or the plug.

Here are my stats during a 4 hour Warzone 2 gaming session: print1 print2

Will come back in a few days with an update.

Let me know if there are other BIOS settings I should address.
 
Reply might be late, but if any mod is willing to edit the initial post and add the following lines, that would be brilliant.


I left it for the rest of the time with all OC settings on (PBO, curve optimizer, etc.) BUT with the RAM non-OC at 2133MHz. It never crashed in the following months.

Fast forward to a week ago, when I decided to try once more to fix the situation and bought a new kit of RAM, G.Skill Trident Z Neo (good Samsung B-die).

Guess what: the CPU would crash, and not just crash, but crash a lot (WHEA 41). With the RAM OC disabled at 2133MHz all was good; if I enabled the OC at 3000MHz, 3200MHz or 3600MHz it would crash.

Finally decided that it was enough and that my 3080 deserves a CPU that can handle fast RAM. Decided to submit an RMA request. Sent it to AMD on Thursday, and the next Tuesday got a brand new replacement.

This is my third one. So far no crashes under any form of load: WZ2, Starfield, stress tests, idle.
 
Now it is nearly the end of the year. I have had some months to play with the settings, overclock the RAM, overclock the CPU, etc. Absolutely zero crashes. I still have the same configuration as before. At the end of the day I had to send back 2 units out of 3.

I will make sure that next time I upgrade I go back to Intel.

Bottom line: if you have a 5900X and get memory controller errors, don't bother, send it back, and if possible switch to Intel.
 