Computer just shuts down unexpectedly during gaming

Gorbash12346 · 20 Jul 2024 at 12:46

Hi,
Bit of an odd one: after running pretty much flawlessly, my main computer has started shutting down unexpectedly during gaming over the last two days, after a few seconds of stuttering. No blue screen, no blank screen, just a straight switch-off.

Spec is
I9 13900K w/ Corsair H170I elite
Asus Z790 extreme
2x 16GB Corsair Vengeance Black 32GB 7200MHz DDR5 (no XMP, as the kit was originally listed as a supported/approved set but was pulled from the listings after release :mad:)
RTX 4090 FE
Corsair HX1200
2TB Firecuda 530 nvme
Windows 10 64 bit pro (was 11 but had stability issues so went back to 10 for now.)

I've removed the power switch header in case the switch was faulty: no change.
Ran HWMonitor to keep an eye on temps and voltages: nothing too excessive during stress testing.
Re-applied thermal paste (Kryonaut); the old application looked good. No change.
Updated to the BIOS that Asus released on the 12th to address a microcode issue, and ran into the news about 13th and 14th gen failing. I really hope this isn't it. Now running the Intel performance spec.
Also updated every driver on it, including the Intel Management Engine etc.
It actually seems to be getting more frequent.
Plenty of errors in Event Viewer but not seeing anything critical?
Everything else on the computer runs exceptionally cool as it's inside a Corsair Obsidian 900D w/ all the fans.

I'm not sure what to do given I get no error log or blue screen to work from.

Any suggestions/ help would be very much appreciated!

Tetras · 20 Jul 2024 at 13:53

The most obvious reason for a PC to just shut down out of the blue under load with no error messages is the PSU, but admittedly it is hard to trust these CPUs right now.

You haven't moved the PC lately and might have unsettled the power cables?

How is your 4090 connected to the power supply?

Gorbash12346 said:
plenty of errors in event viewer but not seeing anything critical?

The majority of event viewer errors are meaningless, but WHEA errors would be more interesting.

What stability issues did you have with Windows 11?

wookiee87 · 20 Jul 2024 at 13:57

Straight power off would suggest a power issue / overtemp problem. Though having a 13900k does also ring alarm bells given recent news.

I would ask what kind of age your components are first. If everything checks out, I would be looking at the CPU in all honesty. Try it at different power states to see if the problems lessen or stop; if they do, you may have a bad CPU.

As Tetras said, check your event log for WHEA errors. Those basically point to the CPU having issues; if you have loads, then your CPU is the problem.
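
A rough way to do that WHEA check without scrolling through Event Viewer, as a Python sketch. It assumes Windows' built-in wevtutil tool and that the events sit in the System log under the Microsoft-Windows-WHEA-Logger provider. The names build_query_command and count_components are made up for this example, and the parsing assumes each event's rendered description contains a "Component:" line.

```python
import re
import subprocess
from collections import Counter

# Provider name under which Windows logs WHEA events in the System log.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_query_command(max_events: int = 200) -> list[str]:
    """Build a wevtutil command that returns the newest WHEA events as text."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        "/f:text",           # human-readable output
        f"/c:{max_events}",  # cap the number of events returned
        "/rd:true",          # newest events first
    ]


def count_components(event_text: str) -> Counter:
    """Count how often each 'Component:' value appears in the event text."""
    matches = re.findall(r"^\s*Component:\s*(.+)$", event_text, re.MULTILINE)
    return Counter(m.strip() for m in matches)


if __name__ == "__main__":
    # Only works on Windows, where wevtutil is available.
    result = subprocess.run(
        build_query_command(), capture_output=True, text=True, check=True
    )
    for component, n in count_components(result.stdout).most_common():
        print(f"{n:5d}  {component}")
```

The count per component is only a starting point: it shows which part of the system is reporting errors, not why.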

Quartz · 20 Jul 2024 at 14:27

Gorbash12346 said:
I9 13900K w/ Corsair H170I elite

There are apparently issues with some 13900 CPUs:

Gorbash12346 · 20 Jul 2024 at 15:51

Tetras said:
The most obvious reason for a PC to just shut down out of the blue under load with no error messages is the PSU, but admittedly it is hard to trust these CPUs right now.

You haven't moved the PC lately and might have unsettled the power cables?

How is your 4090 connected to the power supply?

The majority of event viewer errors are meaningless, but WHEA errors would be more interesting.

What stability issues did you have with Windows 11?

Lots of WHEA errors

There's not even a second between them at points.

A corrected hardware error has occurred.

Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)

Primary Bus:Device:Function: 0x0:0x1C:0x4
Secondary Bus:Device:Function: 0x0:0x0:0x0
Primary Device Name:PCI\VEN_8086&DEV_7A3C&SUBSYS_88821043&REV_11
Secondary Device Name:

All I get on the critical error list is an unexpected shutdown.

This one is repeated most of the way through.
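
For anyone trying to read that event, here is a small Python sketch that splits the bus/device/function triple and the device name into their fields. The vendor table is a tiny illustrative lookup rather than a full PCI ID database, and parse_bdf and parse_pci_id are hypothetical helper names.

```python
import re

# Small illustrative lookup; only a few well-known PCI vendor IDs.
KNOWN_VENDORS = {0x8086: "Intel", 0x1043: "ASUSTeK", 0x10DE: "NVIDIA"}


def parse_bdf(text: str) -> tuple[int, int, int]:
    """Parse a 'bus:device:function' triple such as '0x0:0x1C:0x4'."""
    bus, device, function = (int(part, 16) for part in text.strip().split(":"))
    return bus, device, function


def parse_pci_id(device_name: str) -> dict[str, int]:
    """Split a Windows hardware ID (VEN_/DEV_/SUBSYS_/REV_ fields) into numbers."""
    fields = dict(re.findall(r"(VEN|DEV|SUBSYS|REV)_([0-9A-Fa-f]+)", device_name))
    subsys = int(fields["SUBSYS"], 16)
    return {
        "vendor": int(fields["VEN"], 16),
        "device": int(fields["DEV"], 16),
        # SUBSYS packs the subsystem ID (high 16 bits) and subsystem vendor (low 16 bits).
        "subsystem_id": subsys >> 16,
        "subsystem_vendor": subsys & 0xFFFF,
        "revision": int(fields["REV"], 16),
    }


if __name__ == "__main__":
    bus, device, function = parse_bdf("0x0:0x1C:0x4")
    ids = parse_pci_id(r"PCI\VEN_8086&DEV_7A3C&SUBSYS_88821043&REV_11")
    print(f"bus {bus}, device {device}, function {function}")
    print("vendor:", KNOWN_VENDORS.get(ids["vendor"], "unknown"))
    print("subsystem vendor:", KNOWN_VENDORS.get(ids["subsystem_vendor"], "unknown"))
```

Going by the IDs, the primary device is the root port itself (an Intel part on an ASUS board) and not whatever is plugged in behind it, so the event on its own doesn't say which attached device is involved.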

The 4090 was bought at launch in May 2023 iirc, and the full 13900K rig was put together from new in December 2022. The power supply is a little older.

Power to the 4090 is using all individual PCIe cables, 4 into the 12-pin FE adaptor (no daisy-chained connectors). The cable is as straight as can be and fully seated (lots of room in a 900D).

It just crashed while typing this not even gaming. :rolleyes:

I was having memory stability issues that seemed to be significantly worse under Windows 11 than in 10, though this may have been more down to the dodgy BIOS issues the Extreme seemed to suffer on release, which is probably why it was discontinued so quickly. But at the time it seemed to make a difference. I haven't re-tried it since, to be honest.

Yes after seeing the reports coming out I was dreading it being the same issue.

Tetras · 20 Jul 2024 at 15:59

Gorbash12346 said:
Lots of WHEA errors. There's not even a second between them at points.

A corrected hardware error has occurred.

Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)

Are you using a riser?

You could try setting it to PCI-E gen 3.0 in the BIOS.

A few of these PCIE errors are not a problem (especially if they're when the PC boots), but if they're very frequent then that's more alarming.

Gorbash12346 said:
I was having memory stability issues that seemed to be significantly worse under Windows 11 than in 10, though this may have been more down to the dodgy BIOS issues the Extreme seemed to suffer on release, which is probably why it was discontinued so quickly. But at the time it seemed to make a difference. I haven't re-tried it since, to be honest.

Presumably your memory is actually running at 4800 or 5200, if XMP is disabled?

Gorbash12346 · 20 Jul 2024 at 16:03

Tetras said:
Are you using a riser?

You could try setting it to PCI-E gen 3.0 in the BIOS.

A few of these PCIE errors are not a problem (especially if they're when the PC boots), but if they're very frequent then that's more alarming.

Presumably your memory is actually running at 4800 or 5200, if XMP is disabled?

No riser, just straight into the board, and supported as well.

Memory is 4800 CAS 40 at the moment.
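
For reference, the arithmetic behind those memory numbers, as a small Python sketch. It assumes a dual-channel setup with a 64-bit (8-byte) data path per channel, which is typical for a two-DIMM desktop board. The function names are just illustrative, and the bandwidth figures are theoretical peaks, not measured results.

```python
def cas_latency_ns(transfer_rate_mts: float, cas_cycles: int) -> float:
    """First-word CAS latency in nanoseconds.

    DDR memory transfers twice per clock, so the clock in MHz is half the
    MT/s rate and one clock cycle lasts 2000 / (MT/s) nanoseconds.
    """
    return cas_cycles * 2000.0 / transfer_rate_mts


def peak_bandwidth_gbs(
    transfer_rate_mts: float, channels: int = 2, bus_bytes: int = 8
) -> float:
    """Theoretical peak bandwidth in GB/s across all channels."""
    return transfer_rate_mts * bus_bytes * channels / 1000.0


if __name__ == "__main__":
    print(f"DDR5-4800 CL40 latency: {cas_latency_ns(4800, 40):.2f} ns")
    print(f"DDR5-4800 peak bandwidth: {peak_bandwidth_gbs(4800):.1f} GB/s")
    print(f"DDR5-7200 peak bandwidth: {peak_bandwidth_gbs(7200):.1f} GB/s")
```

On paper that is 76.8 GB/s at 4800 against 115.2 GB/s at 7200, so running without XMP gives up about a third of the theoretical bandwidth the kit was bought for.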

Would that not strangle the performance of my 4090 going to 3.0?

Tetras · 20 Jul 2024 at 16:05

Gorbash12346 said:
Would that not strangle the performance of my 4090 going to 3.0?

No, you only lose a few %, but regardless, we're just trying to rule out the Intel issues because I'm afraid to say CPU connected devices producing errors is part of the symptoms.

NVIDIA GeForce RTX 4090 PCI-Express Scaling

The new NVIDIA GeForce RTX 4090 is a graphics card powerhouse, but what happens when you run it on a PCI-Express 4.0 x8 bus? In our mini-review we've also tested various PCI-Express 3.0, 2.0 and 1.1 configs to get a feel for how FPS scales with bandwidth.

www.techpowerup.com
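
To put rough numbers on the Gen 3 vs Gen 4 question, here is a quick Python sketch of the theoretical link bandwidth per generation. The per-lane rates and encodings are the published PCIe figures; the function and table names are made up for the example, and real-world throughput is lower once protocol overhead is counted.

```python
# (transfer rate in GT/s per lane, encoding efficiency) per PCIe generation.
PCIE_GENERATIONS = {
    "1.1": (2.5, 8 / 10),      # 8b/10b encoding
    "2.0": (5.0, 8 / 10),      # 8b/10b encoding
    "3.0": (8.0, 128 / 130),   # 128b/130b encoding
    "4.0": (16.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gbs(generation: str, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    # GT/s times efficiency gives usable Gbit/s per lane; divide by 8 for bytes.
    return gt_per_s * efficiency / 8 * lanes


if __name__ == "__main__":
    for gen in PCIE_GENERATIONS:
        print(f"PCIe {gen} x16: {link_bandwidth_gbs(gen):.2f} GB/s")
    print(f"PCIe 4.0 x8:  {link_bandwidth_gbs('4.0', lanes=8):.2f} GB/s")
```

Gen 3 x16 works out to about 15.75 GB/s against about 31.5 GB/s for Gen 4 x16, which is the same as Gen 4 x8. Halving the link only costing a few % in games suggests the card rarely saturates the bus.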

Gorbash12346 · 20 Jul 2024 at 16:10

Tetras said:
No, you only lose a few %, but regardless, we're just trying to rule out the Intel issues because I'm afraid to say CPU connected devices producing errors is part of the symptoms.

Yeah, I'm getting pretty worried. I'll give it a go. I've just checked the power connector on the 4090 in case the old melting power pins problem was in progress (I've been trying my best not to disturb them since it was fitted in case I provoked the problem), but it's fully intact, no sign of any issue there at all.

Tetras · 20 Jul 2024 at 16:11

Gorbash12346 said:
I've just checked the power connector on the 4090 in case the old melting power pins problem was in progress (I've been trying my best not to disturb them since it was fitted in case I provoked the problem), but it's fully intact, no sign of any issue there at all.

Glad to hear that, are you using a support bracket to prop up the 4090?

Gorbash12346 · 20 Jul 2024 at 16:25

Tetras said:
Glad to hear that, are you using a support bracket to prop up the 4090?

Because the bottom of the case is so far away, it's held up with a fine piece of black nylon attached to the overhead AIO. Not taking any chances. :cry:

Ok, going to give it a try with a few loops on 3DMark. I suspect Intel's new microcode is going to have it running a lot slower. (And I thought I had left the performance losses of hardware vulnerabilities behind.)

Tetras · 20 Jul 2024 at 16:35

Gorbash12346 said:
I suspect Intel's new microcode is going to have it running a lot slower. (And I thought I had left the performance losses of hardware vulnerabilities behind.)

It obviously isn't ideal to lose any performance, but it is not a big deal for gaming, even with the slowest profile Intel offer.

The top-end performance in benchmarks or long-run workloads can be impacted a lot more because they're more likely to exceed the power limits, or use the max single-core boost.

Gorbash12346 · 20 Jul 2024 at 17:20

Some unusual behaviour so far: the cores boosting to the highest clock speeds appear to be cores 4 and 5, hitting 5.8GHz, with the rest capping at 5.5GHz and the E-cores at 4.3GHz as usual. Temperatures are significantly down since the BIOS update, at 70 max; previously it was about 82-85.

Gorbash12346 · 20 Jul 2024 at 18:41

So after 20 loops of the 3DMark Steel Nomad stress test and a good chunk of time on Prime95, it's not crashed since changing to PCI-E gen 3, and no more WHEA errors, though there's another error I've not seen before:

Unable to open the job object \BaseNamedObjects\WmiProviderSubSystemHostJob for query access. The calling process may not have permission to open this job. The first four bytes (DWORD) of the Data section contains the status code.
Metadata staging failed, result=0x80070490 for container
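
For what it's worth, the result code in that second message can be picked apart as a standard Windows HRESULT. A minimal Python sketch, with decode_hresult being a made-up helper name:

```python
def decode_hresult(value: int) -> dict[str, int]:
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    return {
        "failure": (value >> 31) & 0x1,     # top bit set means failure
        "facility": (value >> 16) & 0x7FF,  # 11-bit facility; 7 is FACILITY_WIN32
        "code": value & 0xFFFF,             # low 16 bits; a Win32 error code when facility is 7
    }


if __name__ == "__main__":
    fields = decode_hresult(0x80070490)
    print(fields)
```

Here the facility is 7 (Win32) and the code is 1168, which is ERROR_NOT_FOUND ("Element not found"). That reads more like a software lookup coming back empty than a hardware fault, though that is an interpretation and not something the log itself states.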

Tetras · 20 Jul 2024 at 18:55

Gorbash12346 said:
So after 20 loops of the 3DMark Steel Nomad stress test and a good chunk of time on Prime95, it's not crashed since changing to PCI-E gen 3, and no more WHEA errors

Sounds good!

Gorbash12346 said:
Unable to open the job object \BaseNamedObjects\WmiProviderSubSystemHostJob for query access. The calling process may not have permission to open this job. The first four bytes (DWORD) of the Data section contains the status code.

Metadata staging failed, result=0x80070490 for container

I haven't done any research or anything, but I don't think that is likely to be an error to worry about.

Gorbash12346 · 20 Jul 2024 at 19:25

Tetras said:
It obviously isn't ideal to lose any performance, but it is not a big deal for gaming, even with the slowest profile Intel offer.

The top-end performance in benchmarks or long-run workloads can be impacted a lot more because they're more likely to exceed the power limits, or use the max single-core boost.

I think the frustration for me is mainly that I chose Intel over AMD at the time off the back of the increased memory bandwidth, with 7200 initially being touted as a perfectly stable speed during the pre-launch review cycle, and now I'm stuck at 4800, rolling back to PCIE gen 3 and running the biggest AIO I could get my hands on, and it's potentially still **** its pants. :mad:

Tetras · 20 Jul 2024 at 19:50

Gorbash12346 said:
and now I'm stuck at 4800

You might be able to sort that, I'm not sure what fiddling/troubleshooting you did at the time. That said, with the rumours/news about these CPUs, it may not be a good idea to be running your IMC at high clocks/volts at the moment.

Gorbash12346 said:
rolling back to PCIE gen 3

If this fixes your crashing, I don't know what it actually signifies; usually this only fixes the problem if someone is using a riser that wasn't intended for PCI-E 4.0.

You have a high-end motherboard, so I can't see it being a motherboard problem.

I know that Wendell mentioned issues with the PCI-E bus and NVMe drives crashing/producing errors, so it could unfortunately be part of that. I can't say for certain either way, and it'll be a while before we can verify 100% that PCI-E gen 3 has fixed your issue.

Gorbash12346 said:
and it's potentially still **** its pants.

13900K/14900K users do seem to have got a raw deal this generation

I will say though, in the GN video their thoughts about contamination were that the CPUs affected were manufactured in 2023-2024, so if you have an early CPU, I suspect it is less likely to be affected. GN haven't gathered the data on the batches yet, and there could be multiple issues at play with the CPUs, with some level of degradation not being connected to the possible contamination during manufacturing.

Gorbash12346 · 21 Jul 2024 at 14:42

WHEA errors are back again. Same PCI Express root port. No crashes as yet, but some intermittent jittering in game that ties in with the times in Event Viewer.

Tetras · 21 Jul 2024 at 14:50

Gorbash12346 said:
WHEA errors are back again. Same PCI Express root port. No crashes as yet, but some intermittent jittering in game that ties in with the times in Event Viewer.

You could try laying the PC flat, in case the problem is the seating in the PCI-E slot and GPU sag, though I think the FE 4090 has a vapor chamber and those may not be designed to operate in a different orientation.

Did you set the graphics PCI-E gen only, or the M.2 PCI-E gen too?

Major774 · 22 Jul 2024 at 21:32

Gorbash12346 said:
Lots of WHEA errors. There's not even a second between them at points.

A corrected hardware error has occurred.

Component: PCI Express Root Port
Error Source: Advanced Error Reporting (PCI Express)

Primary Bus:Device:Function: 0x0:0x1C:0x4
Secondary Bus:Device:Function: 0x0:0x0:0x0
Primary Device Name:PCI\VEN_8086&DEV_7A3C&SUBSYS_88821043&REV_11
Secondary Device Name:

All I get on the critical error list is an unexpected shutdown.

This one is repeated most of the way through.

The 4090 was bought at launch in May 2023 iirc, and the full 13900K rig was put together from new in December 2022. The power supply is a little older.

Power to the 4090 is using all individual PCIe cables, 4 into the 12-pin FE adaptor (no daisy-chained connectors). The cable is as straight as can be and fully seated (lots of room in a 900D).

It just crashed while typing this not even gaming.

I was having memory stability issues that seemed to be significantly worse under Windows 11 than in 10, though this may have been more down to the dodgy BIOS issues the Extreme seemed to suffer on release, which is probably why it was discontinued so quickly. But at the time it seemed to make a difference. I haven't re-tried it since, to be honest.

Yes after seeing the reports coming out I was dreading it being the same issue.

Crashed whilst typing and not gaming… CPU?

I would just down clock it and see if it’s stable.