
5800x stability issues

Associate | Joined: 22 Apr 2021 | Posts: 17
Ayup,

I'm after a bit of advice on a recent Ryzen 5800x purchase from Overclockers.

This replaced an existing 3700x in an MSI Pro-A X570 board with 16 gig of CAS18 3600 RAM (with an AMD-specific XMP profile), an 850W PSU, a Noctua NH-D15 cooler, an RTX 3080 and various other bits and bobs including a Valve Index.

The system's been solid and stable for ages with the 3700x. Since putting the 5800x in I just can't get the thing to be stable when gaming. It's rock solid doing regular Windows stuff, it always boots and runs brilliantly. But in *some* games (specifically, Automobilista 2, Project Cars 2 and Assetto Corsa Competizione) it'll blue-screen after anything from 5 minutes to an hour. The error is logged in the event log as a "bugcheck 0x00000124" which I believe is a generic hardware fault error code. There are no other errors in the event log (no on-going WHEA errors or anything, just this BSOD).

I can run CPU benchmarks like Prime95 without errors for ages. The CPU will boost to around 4600MHz all-core and 4820MHz single-core when benching, with temps topping out at about 80°C when running all-core Prime95 small FFT tests. Temps in gaming are much lower (in the low 60s). But as I'm a VR player I never actually get to see the temps at the point when it BSODs, because I have the goggles on. I also ran the Windows Memory Diagnostic (no errors).


I've been through some fairly standard troubleshooting steps - set the RAM back to default, updated drivers (chipset & GPU), tried 3 different BIOSes (the latest 2 beta BIOSes and the last 'official' BIOS). I've also tried with PBO disabled in BIOS and with lower TDC, EDC and PPT limits (80A, 120A and 120W respectively), but nothing seems to make any difference - it'll run for a while in these games but eventually BSOD with this same bugcheck error.



Is there anything I've missed here? I'm not really sure what to do - should I start the RMA process? It feels like it may be a faulty chip, given that the 3700x has been running like a champ for more than a year without a single crash or hiccup, but I appreciate that demonstrating it's definitely the chip will be a challenge.
 
The RAM is Corsair if I remember correctly. I've already tried it at its default (non-XMP) timings and it made no difference. I'm out of ideas, it certainly feels like I got a lemon :(
 
It's not a fresh Windows install, it's from the previous build with the 3700x (so about 18 months old) - it was a clean install when I bought this X570 mobo and the 3700x. As this is my main development box (as well as my gaming rig), a reformat would be very painful.

I'm not undervolting or overclocking - it's all at stock. I could try voltage tweaks I guess, but I think it really should run stable at stock settings with a decent cooler and PSU.

I guess I could have bent a pin installing it. I was properly careful: there were no bent pins when I removed it from the packaging and I didn't catch it on anything when I put it into the socket. Not saying it's impossible, but I doubt it. I'll take a look when I re-seat it later today.

I've had Prime95 running all morning, with and without AVX enabled, in both 'blend' and 'small FFT' modes - no errors or crashes.
 
Lots of stuff to try, thanks chaps, wish I had more time to play with this.

I only play VR games but will test non-VR stuff. I agree with welshrat's point that not crashing in Prime95 but crashing in games does suggest a power issue. The PSU is an XFX 850W unit and it's been solid, but maybe it's not any more.

One more data point from this morning. There's a more recent beta BIOS for the mobo, which I flashed. It hasn't crashed yet with this version *but* the boost clocks are way lower (and Ryzen Master doesn't appear to be able to read the temperature and package power / current data). With this beta the single-core boost is about 4600MHz and multi-core about 4200MHz (down from 4800 and 4600 with the older BIOSes). Hard to interpret that - to me it feels like there's something wrong with this BIOS that's preventing the chip from boosting properly, but when the chip does boost properly (on the older BIOSes) it's not stable.

Gah, PCs :(. Should have stuck with the 3700x
 
To answer a few of these responses (thanks for taking the time guys):

Wouldn't normally use a beta BIOS, but this one seems to offer some stability, albeit at the cost of some performance.

SFC /scannow didn't find any errors.

Already using HWiNFO; I just found it odd that the beta BIOS prevents Ryzen Master reading some of the chip's telemetry, clearly a bug in the BIOS.

There's no LLC option in this mobo's BIOS, so I'd have to set voltages somewhere else. I don't mind tweaking voltages or Curve Optimiser settings to extract performance, but having to tweak them to get stability suggests a faulty CPU or some other hardware issue.

Regarding the PSU, replacing it would mean chucking money at the problem. I know there's a chance it's the weak link, but it's never given any issues before and it's well within spec for a 3080 + 5800x. I really, really don't want to get into a loop of replacing other parts in the hope they fix this. Is there anything that can log the voltages coming out of the PSU in heavy-load situations, to establish it as a possible culprit?
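HWiNFO can log its sensor readings to a CSV file, so one low-effort option is to log a gaming session and check the 12V rail afterwards. Below is a minimal sketch in Python, assuming the log is a CSV with a column named `+12V [V]` (a hypothetical name; the real header depends on the board's sensor chip, so check your own log) and using the ATX ±5% tolerance on the 12V rail as the pass/fail band. Motherboard sensors only sample every second or two, so this can catch a sagging rail but not fast transients.

```python
import csv
import io

# Assumption: hypothetical column name; adjust to match the header in your own log.
RAIL_COLUMN = "+12V [V]"
NOMINAL_V = 12.0
TOLERANCE = 0.05  # ATX allows +/-5% on the 12 V rail (11.40 V to 12.60 V)

def rail_excursions(csv_text, column=RAIL_COLUMN):
    """Return (lowest reading, number of readings outside tolerance) for one rail."""
    low = NOMINAL_V * (1 - TOLERANCE)
    high = NOMINAL_V * (1 + TOLERANCE)
    readings = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            readings.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            continue  # skip repeated header rows, blank cells or a missing column
    if not readings:
        raise ValueError(f"no numeric values found in column {column!r}")
    outside = [v for v in readings if not low <= v <= high]
    return min(readings), len(outside)

if __name__ == "__main__":
    # Tiny made-up log purely to show the call; not real data from this system.
    sample = "Date,Time,+12V [V]\n1.1.2021,20:00:00,12.096\n1.1.2021,20:00:02,11.350\n"
    print(rail_excursions(sample))
```

To use it on a real log, read the file with `open(path, encoding="latin-1").read()` (or whatever encoding your log uses) and pass the text in.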
 
Had very little time to pull the rig apart and test stuff this weekend. I still intend to roll my sleeves up, starting with the CPU voltage settings in BIOS (LLC and offsets).

In the meantime I reverted to the latest beta BIOS with the lower boost frequencies so I could do my league racing (yeah yeah, I'm a massive nerd) and had no issues. Interestingly, when I compare real-world performance (VR frame times) on the high-boosting, unstable BIOS with the low-boosting beta BIOS, there's just no measurable difference at all.
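A frame-time comparison like this can hide differences if you only look at averages, so it's worth checking the slow tail too. Below is a minimal sketch, assuming you've exported per-frame times in milliseconds from whatever VR overlay or capture tool you use (the export format isn't covered here) and assuming the headset runs at 90Hz, which gives a budget of about 11.1ms per frame. The `summarise` and `compare` names are just for illustration.

```python
import math
import statistics

def summarise(frame_times_ms, refresh_hz=90):
    """Return (mean, 99th percentile, fraction of frames over budget), all from times in ms.

    refresh_hz=90 is an assumption; set it to the headset's actual refresh rate.
    """
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    ordered = sorted(frame_times_ms)
    # Nearest-rank percentile: smallest value with at least 99% of frames at or below it.
    rank = math.ceil(0.99 * len(ordered))
    p99 = ordered[rank - 1]
    budget_ms = 1000.0 / refresh_hz
    over = sum(1 for t in ordered if t > budget_ms) / len(ordered)
    return statistics.fmean(ordered), p99, over

def compare(runs, refresh_hz=90):
    """Print one summary line per named run, e.g. {"beta BIOS": [...], "older BIOS": [...]}."""
    for name, times in runs.items():
        mean, p99, over = summarise(times, refresh_hz)
        print(f"{name}: mean {mean:.2f} ms, p99 {p99:.2f} ms, {over:.1%} over budget")
```

If both BIOSes really perform the same, the mean, p99 and over-budget fraction should all come out close for sessions on the same track.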

At this point I'm seriously tempted to chalk it up to a PSU that's not as good as it ought to be, with the low-boosting beta BIOS as a viable workaround. Maybe when life is a bit less frantic I can systematically test the various voltage settings and establish whether any of them give stability with the non-beta BIOS, or maybe treat myself to a PSU. I also need to reseat the chip, reinstall Windows, try a single RAM stick, and a million other things that I just don't have time for.
 
Took a little time to investigate this further, following the various advice on here and some other bits and bobs I've found (like setting the idle power to 'typical current' or some such).

The end result is that even with a posh new PSU (Corsair RMx 850), a clean Windows install, a single RAM stick, default settings, various BIOS versions, a reseated CPU and hours and hours that I would rather have spent doing something fun, I *still* can't get this 5800x to be stable. It's a real sod to test because it only ever crashes when gaming, so no synthetic benchmark (Prime95, Cinebench, OCCT, CPU-Z) can reproduce the crashes.

The only thing I've not tried is swapping my motherboard, but having thrown away £130 replacing a perfectly adequate PSU and spent a couple of days on this already, I really don't want to do this. When I put the 3700x back in it was stable.
 
I've continued to monitor and faff with this and am still undecided about it.

At the weekend I had some regular crashes in Dirt Rally 2, decided it could only be the CPU and contacted OC to start the RMA process. I got a prompt reply back from OC asking for a bit more info, and this made me want to do just a tiny bit more testing, with some interesting (to me at least) results.

At stock settings - XMP off, most stuff on 'auto', processor virtualisation on (I need this CPU for VMs) - it crashes with WHEA bugcheck error 0x00000124.
If I enable PBO and set the PBO limits to Motherboard, with no other changes, it *doesn't* crash
If I enable XMP with PBO limits set to Motherboard it crashes with WHEA Bus/Interconnect Error (on various APIC IDs)
If I leave XMP enabled and set the IF frequency down to 3200 (from 3600, which is what it's set to when I enable XMP) it *doesn't* crash

Note that in each case the crashes only ever happen when gaming. Crashes happen within about 15 mins; the doesn't-crash cases are stable for at least a couple of hours.

So it appears there are 2 triggers for these crashes - the CPU isn't stable with IF set to 3600, and it isn't stable with PBO limits set to auto. With XMP on, IF at 3200 and PBO limits set to Motherboard (which means crazy limits like a 500W PPT and an EDC of 200 or something), the thing appears to be stable in gaming.
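The reasoning here is a one-factor-at-a-time comparison: look for pairs of runs that differ in exactly one setting but have different outcomes. Below is a small sketch in Python that encodes the four runs described above and finds those pairs. The field names are made-up labels, and since enabling XMP also moved the IF to 3600, the XMP and IF settings change together in those comparisons and don't count as a single-factor flip.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Run:
    xmp: bool
    pbo_limits: str   # "auto" or "motherboard"
    if_setting: str   # IF value as quoted in the post ("auto" when XMP is off)
    crashed: bool

# The four runs from the post; field names are hypothetical labels.
RUNS = [
    Run(xmp=False, pbo_limits="auto", if_setting="auto", crashed=True),
    Run(xmp=False, pbo_limits="motherboard", if_setting="auto", crashed=False),
    Run(xmp=True, pbo_limits="motherboard", if_setting="3600", crashed=True),
    Run(xmp=True, pbo_limits="motherboard", if_setting="3200", crashed=False),
]

SETTINGS = ("xmp", "pbo_limits", "if_setting")

def single_factor_flips(runs):
    """For each pair of runs differing in exactly one setting and in outcome, return that setting."""
    flips = []
    for i, a in enumerate(runs):
        for b in runs[i + 1:]:
            changed = [s for s in SETTINGS if getattr(a, s) != getattr(b, s)]
            if len(changed) == 1 and a.crashed != b.crashed:
                flips.append(changed[0])
    return flips

print(single_factor_flips(RUNS))  # ['pbo_limits', 'if_setting']
```

Each "doesn't crash" result only means no crash within a couple of hours, and later testing in this thread still saw crashes with PBO on, so a flip like this is a lead rather than proof.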


To answer some of the questions here: the GPU is hooked up to 2 separate PCIe power connectors (as it was before), no daisy-chaining. It's not overclocked. The board is an MSI X570 Pro-A. I know this has relatively weak power delivery gubbins, but I would still have expected it to be able to run a supported CPU at stock settings.

I don't have access to another board to test. I've not tried a positive voltage offset.
 
Arrggghhhh... wasted too many evenings on this. Turns out it *does* still crash with PBO on. Tried manual RAM timings, no change. Re-seated and re-pasted again (not in that order, obviously), ran for a while with XMP disabled and IF at 1333 - still crashed (but less). Re-enabled XMP, ran IF at 1800 (RAM at 3600) with Global C-States disabled and *thought* it was stable, but no: 2 crashes in quick(ish) succession running Dirt Rally.
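For anyone confused by the IF numbers in this thread: DDR4 transfers data twice per clock, so DDR4-3600 RAM has an 1800MHz memory clock, and running the Infinity Fabric 1:1 ("coupled") with it means an FCLK of 1800MHz. By the same arithmetic, an IF of 1333 lines up with DDR4-2666. A trivial sketch of that arithmetic, with function names that are just for illustration:

```python
def coupled_fclk_mhz(ddr_rating: float) -> float:
    """FCLK in MHz when the Infinity Fabric runs 1:1 with the memory clock.

    DDR transfers twice per clock, so the real memory clock (MCLK) is half the
    DDR rating; in 1:1 mode FCLK = UCLK = MCLK.
    """
    return ddr_rating / 2

def is_coupled(ddr_rating: float, fclk_mhz: float, tolerance_mhz: float = 1.0) -> bool:
    """True if fclk_mhz matches the memory clock, allowing for 2666-style rounding."""
    return abs(coupled_fclk_mhz(ddr_rating) - fclk_mhz) <= tolerance_mhz

for ddr in (2666, 3200, 3600):
    print(f"DDR4-{ddr}: 1:1 FCLK = {coupled_fclk_mhz(ddr):.0f} MHz")
```

Anything other than half the DDR rating means the fabric and memory are running decoupled, which usually adds latency.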

And probably a hundred other things I've tried this week. Put the 3700x back in and all is peachy, except for this £400 paperweight. I give up. Hopefully Overclockers will accept that all this endless faff-and-fail is sufficient evidence that the chip is clearly a basket case, and replace it.
 
Shipped it back to Overclockers this afternoon, crossing my fingers for a prompt and trouble-free replacement :)


Glad to see the back of it, to be honest. I know every product has the occasional dud, but working through so many combinations of settings, all the testing and crashing, the money wasted on a new PSU and the countless hours farting around have kinda put me off AMD stuff. If / when a working replacement arrives I'll probably feel differently - it was *very* fast (in between the crashes).

Thanks for all the support here guys, it's really appreciated.
 
Relax, fellas. This thread isn't for arguing about whether Zen 3 has a high failure rate, it's for soothing my frayed nerves and consoling me because I appear to have got a duff one.
 
Maybe a bit premature to talk about the effect of a replacement - I expect OC will test it first, and given the difficulties I've had getting it to misbehave in synthetic stress tests / benchmarks (despite the endless crashing in games), I'm really dreading a "we tested it and it's fine, we're posting it back" email. I expect my nerves will remain frayed for a few more days. In the meantime please stop bickering about failure rates :)
 
Quick update on this unfortunate and frustrating little adventure.

I sent it to OC for RMA on Monday evening, they received it on Wednesday and now, 2 days later, I've just finished a quick test of its replacement. It's still early days - I need a good long Dirt Rally 2 VR session to establish whether the replacement works better than the original (and I'm supposed to be working). So I'll hold off calling the problem "fixed" until I can put the testing time in.

However, I have to give credit where it's due. OCUK have been supportive and brilliant throughout. The responses to my emails have been quick and helpful, the returns process (pre-paid DPD drop-off at my local shop) is spot on, and the turnaround time was great. It's the smoothest and fastest RMA experience I've ever had, and after the disappointment and frustration of troubleshooting the CPU, getting such fast and hassle-free service when it actually came to the RMA is refreshing. So thanks OCUK, it's really appreciated. Hopefully the replacement is a winner :)
 
While I know the MSI X570 Pro-A isn't the best motherboard on the market, it isn't crap. It's just OK.

Having said that, I'd be quite willing to admit defeat and replace it if it turned out to be the culprit - it works well with my 3700x, so it could be used in another system. However, since installing the replacement CPU I've not had a single crash while gaming. So I'm increasingly confident (but still not 100% sure) that the issue was the CPU, not the mobo, and that changing the CPU has fixed it. It's stable with PBO and XMP on (3600 RAM / 1800 IF) with no voltage tweaks at all (LLC, idle voltage, core C-states, etc. all at default / auto).
 