Best guess which component is responsible for bluescreens when system is cold

Associate | Joined: 5 Feb 2012 | Posts: 46 | Location: London
I have a hardware problem that started, I think, near the end of last year and has gotten worse throughout this year.

The problem only occurs within the first 20 or so minutes from a cold boot (I mean literally the components are cold!).

On Windows 7 I get a bluescreen: 0x00000116 VIDEO_TDR_ERROR

Initially I thought I needed to replace the video card, but looking into it more deeply suggests that just about any component could be at fault.

https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0x116---video-tdr-error

The video card is one of the newest components in my system (besides the drives). I haven't stressed it much, mostly just playing non-demanding games (e.g. WoW), and I've never attempted to overclock it beyond the factory settings.

The CPU and memory are overclocked. I removed the overclock for a couple of months, but it made no difference at all.

In addition to Windows 7 bluescreens, Ubuntu crashed on me when cold, and also Linux Mint crashed badly.

I ran a memory checker for 24 hours with no errors. However, if I start the memory checker when the system is cold, it freezes after 5 to 10 minutes (still reporting no errors).

Because the problem was getting more and more frequent, I worried the system would soon break altogether, so I've taken to leaving it on 24x7, which has worked perfectly.

The system was built the day the 3770K processor came out so it is getting old. I'm starting to think about an upgrade.

Do you think if I replaced the motherboard, CPU and RAM I should be OK?

Or should I replace the power supply as well - it is the "Corsair Professional Series AX850 High Performance 850W Modular '80 Plus Gold' Power Supply (CMPSU-850AXUK)".

Any opinion appreciated!
 
Thanks for the suggestions, that sounds like a good way to potentially isolate the problem.

The PSU was new in 2012 so hopefully I can get a few more years from it.

The OCZ has seemingly worked fine. It has been my boot drive since I switched to Ubuntu, but I could try unplugging it and booting to Windows 7 instead.
 
The PSU dates from April 2012. I've had the GPU since November 2015; I bought it because my 7950 failed (I made some posts on here about that).

I'll keep in mind that maybe the power draw of the GPU could be the fault rather than the GPU itself...

Maybe I should just build an entirely new system, reusing only the drives and GPU, and replace the GPU only if I still get the problem. I said to myself at the time that I would upgrade when Intel came out with something at half the 22nm process size. I honestly expected that to have happened by now! I'm also worried about the AIO cooler: although it hasn't missed a beat so far, I'm not sure what its lifespan is.

I honestly don't think this is software related. The problem occurs on three different OSes: Windows 7, Ubuntu and Linux Mint. In addition, it also happens in Memtest86 5.01, which runs without an OS or video driver. It only happens when the system is cold.

https://imgur.com/a/2oom2gB (embedding the image didn't work)

The image shows Memtest86 frozen while testing memory on a cold system.
 
Yes, I did put the CPU back to stock, but it didn't make any difference.

Perhaps the damage has been done by overclocking and running the system fairly warm for the sake of silence. But yes, it has served me well and doesn't owe me anything, so I'm going to start thinking about a major upgrade and be ready to buy stuff on Black Friday :)

Re the OCZ, as far as I know it hasn't given me any issues. I bought it back in 2011 for £431 as my first SSD and perhaps I've kept it for nostalgic reasons. It must be near end of life though since it has been my boot drive for most of the time since then.
 
No, I haven't tried that. Thanks for the advice; I might give it a go, since lately I've also had a few freezes when the system is warm, which is new... So I'm now speccing up a new system. My guess is that either the CPU, motherboard or RAM is at fault, since they have been the most stressed components. The GPU is relatively new and not overclocked, nor has it done a lot of gaming work, and the PSU has hardly been strained.

I'm now running Ubuntu, which doesn't bluescreen but just freezes when there is a problem, so there is less diagnostic info to go on.
 
So I built a new Ryzen system today with a new case, PSU and everything except the GPU and drives.

When I pulled the GPU from the old build I looked at it more closely and saw a plastic film on the back saying "Remove Protective Film Before Use"... Oh dear... It had been in the system like that for 4 years, almost to the day (bought 24/11/15).

Then I got a sinking feeling ... I realised I had no recollection of removing the plastic film from the cooler contact to the CPU, and sure enough I hadn't...
 
Now I've got my old system up and running again stripped of nearly all the drives and the graphics card. It is now running on the Intel 4000. The thing is I haven't seen the hardware fault on either the new or old system now. I'll keep testing over the rest of the week, but the problem was so reliable that it always happened at least once or probably twice from a cold boot.

So now I'm starting to wonder if it is the PSU not coping with the power draw.

Or it could be that the motherboard was struggling with 8 SATA drives + sound card + graphics card + various USB devices and with a lighter load it is OK.
 
I've finally gotten to the bottom of this. It was a faulty Crucial m4 128GB SSD.

I installed Windows 7 on this drive on the old PC and while it was installing Windows Updates, the drive vanished from the system and wasn't visible to BIOS. The next day it was back but it vanished again after the computer was on for a while.

I removed it from the system, put in another old 120 GB SSD and installed Windows on it, and there have been no more problems.

I didn't realise a faulty SSD that you are not actively using much can cause so much havoc to the rest of the system, but I guess it makes sense. My guess is it was putting noise on the PCI lanes and stopping good comms to the video card, hence the blue screen.
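
In hindsight, a small script that logs when a drive drops off the system would probably have pointed at the SSD much sooner. Here is a minimal sketch, assuming Linux (it reads /sys/block, so it would not have helped on Windows 7). The function names and the log file name are made up for illustration, and the log should live on a drive you trust, not the suspect one.

```python
import os
import time
from datetime import datetime


def list_block_devices(sys_block="/sys/block"):
    """Return the set of block device names the kernel currently exposes."""
    try:
        return set(os.listdir(sys_block))
    except FileNotFoundError:
        # Not Linux, or sysfs is not mounted: report nothing rather than crash.
        return set()


def diff_devices(before, after):
    """Return (vanished, appeared) device names between two snapshots."""
    return sorted(before - after), sorted(after - before)


def watch(interval_s=5.0, max_polls=None, log_path="drive_watch.log"):
    """Poll the device list and append a timestamped line on every change.

    log_path is a hypothetical file name; max_polls=None means run forever.
    """
    seen = list_block_devices()
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval_s)
        current = list_block_devices()
        vanished, appeared = diff_devices(seen, current)
        if vanished or appeared:
            with open(log_path, "a") as log:
                log.write(
                    f"{datetime.now().isoformat()} "
                    f"vanished={vanished} appeared={appeared}\n"
                )
                # Push the line to disk in case the machine freezes right after.
                log.flush()
                os.fsync(log.fileno())
        seen = current
        polls += 1
```

Calling `watch()` from a cold boot and checking the log after a freeze would show whether a drive such as `sdb` vanished shortly before the system locked up. It only sees what the kernel reports, so a hard hang before the next poll would still be missed.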
 
Yes, I do remember, thanks for that. I admit I was a bit fixated on failures caused by overclocking and on eliminating that first... In fact, now that I think about it, there were earlier signs that it was this SSD, but I put them down to other things at the time.

Thanks to everybody that has contributed to helping me figure this out :)
 