Flaky 7950? Help me OCU Kenobe, you're my only hope

Associate | Joined: 27 Sep 2012 | Posts: 100 | Location: Petone, NZ
------------------------------------------------
RESOLVED (kind of): see post 23 for spoilers :)
------------------------------------------------

I bought a *B-GRADE* Sapphire HD7950 OC from OCUK late last year and have been having system crashes during gaming ever since. I am at my wit's end so have resorted to the unthinkable: asking for help.

Usually after 10-30 mins of gaming the picture will freeze (but no artefacts), the sound will either stop or skip, and after a few more seconds the PC will reboot (Windows 124 error). It will only ever happen once per gaming session. Once the PC reboots I can play for hours with no more issues.

How long it takes to crash seems to be a function of which game I play... Arma2 can crash it in 5-10 mins if I sit in the menu. In GTR2 it'll typically last 40 mins or more. So it looks like a function of load - either graphics or full system load or both.

Facts:
I get these crashes whether the card is at the factory OC or the reference 800MHz (CPU at stock or OCed, same result).
The temperatures seem very much under control -- last crash happened at GPU 54C, VRMs 50 & 52C. I've never seen the GPU or VRM temps higher than 70C.

What I've tried so far:
At first I thought this kind of crash sounded like RAM, so I've replaced that and changed slots.
My CPU was running pretty hot, so I've replaced the cooler and it's now sitting in the 40s under load. Voltage is boosted slightly because it's sensitive to that.
I've replaced my crappy 550W PSU with a nice XFX 650W one. I actually thought this had fixed it, as I didn't have any crashes for a week, but they came back. Is this some kind of clue?
I've pushed the voltage slightly to 1.13V to see if it helped stability; it didn't.
I've lowered the VRAM clock to 1200MHz, but it didn't help.
I've tried reseating the card and tried the other PCIe slot.
I've removed drivers, cleaned, manually deleted files & reg keys, installed new driver.
I have not yet reinstalled Windows, although at some point I may give in and do this (but I'll hate myself for it). EDIT: Have now tried it, no help
I feel there is heaps of other stuff I've tried that I can't remember.

Here is the bit where you shake your head... I'll avoid eye contact for a bit...

When I got the card it was 5mm too long for my case, and the cooler 5mm too thick to clear my HDD bay, so I opted (probably unwisely) to let it flex slightly and push against the HDD bay while I tried it out. It was like this for a few weeks, and then I cut the stupid plastic 'horns' off the cooler so that it now fits properly. If there was any chance of RMAing it before, there is none now. I know, tut tut tut :( I'd been getting the occasional crash since the start, but not every single gaming session like now.

Maybe I screwed it up by putting it under that small twisting force. Either way, is this a dodgy graphics card or could it still be something else?

Has anyone else seen this same pattern before?
Game for 30 mins --> full system crash --> reboot --> game forever

Thank you very much for reading this far and for any help you can offer. :)
 
I don't like the sound of it pushing against the HDD bay.

Only thing I can suggest off the top of my head is check the PCB all over for any signs of severed traces, resistors popped out of solder holes or such like. I don't know if these things can actually happen as a result of warping the PCB, but maybe it's worth doing anyway to check for anything askew.
 
@ iBSOD

No. But that would be a pretty good test.

Unfortunately none of my friends in London are gamers (or they have laptops). I'll keep thinking about that though, it would be a pretty definitive test of the card itself.

@ Orangey

Yeah, not my proudest moment. I wonder if repeated heat cycles + some stress in the PCB could've loosened something. I will have a good check around.
 
Make another partition on your hard drive and put a clean install of Windows 7 on it to test your card. If it fails there, then your card is probably faulty. I wouldn't worry about the warping though; I had a huge heatsink on my GTX 465 which warped it a lot, and 2 years later the card still works fine.
 
You are a genius (or I am particularly slow), I had never thought of doing that. It should at least rule out a reinstall as a solution.
 
[Several months pass]

I eventually installed Linux on a USB stick and was running benchmarks without crashes. Then I made a new HDD partition, installed Windows on that and ran benchmarks without crashes. Then I took the plunge and reinstalled Windows on the SSD, and it was running for a while without crashes... until it crashed. But the crashes were far less frequent.

My current thinking is that it's not drivers/Windows; it's definitely hardware. In between installs I would have moved the card about as well, which is probably why it behaved itself for a while.

With the HD 7950 sitting normally in a PCIe slot, the temperature sensor will sooner or later stop reporting, or report 0 degrees (I can't tell which it is, but the outcome is the same). Then the fans spool up to 100% and I get annoyed. But I've found that putting a physical stress on the card (pulling the card towards the top of the case) can fix the temperature sensor problem.

But I now think that if I do physically stress the card to solve the temp issue I get more of these crashes, and sometimes the crashes are followed by the dreaded BIOS video card error beeps.

It very much seems like thermal expansion to me. I still only ever get one crash per session, about 10-30 mins in, so it has to be heat related.

Probably I need to buy a new card and be done with this, but I've been trying to find a workaround for the one I have. Just posting back here to make sure the story is complete.
 
I see that you have an Asus motherboard. I remember being on the Sapphire forums a few weeks back, when I bought a second-hand Sapphire 7970 OC and couldn't get it to work. A lot of users there had posted about compatibility issues between the card and Asus or (in my case) Gigabyte mainboards. In the end it turned out my card had developed a fault and didn't work in the seller's system either.

Do you have the latest BIOS for your motherboard?

I would have a wee read of the Sapphire users forum and see if any of the problems sound in any way similar to your own and if any solution is offered.

http://www.sapphireforum.com/forumdisplay.php?34-HD-7900-Series
 
Thanks Five.Stars, that was good advice. I've now signed up over at the Sapphire forums and haven't seen anything that directly describes my issue (although I did see one person with the same temp sensor flakiness) but I'll keep an eye on it.

I'm already on the latest BIOS for my MB; both the BIOS and the MB are getting old now.

Have had a good run last couple of weeks, no crashes, but I haven't changed anything either. Sooner or later it'll start crashing again. If I do ever find the cause I'll report back.
 
I gave in and bought a brand new HD 7950. Took the old card out and installed the new one. And I'm still getting these system crashes. Damn and blast!

So to summarise, here are the things I've replaced so far:
PSU
Video card (Sapphire HD 7950 OC swapped for a Sapphire HD 7950 Boost)
RAM
CPU cooler

I am really running on empty now. Does anyone have anything I can add to this list?

1) Could it be the SSD/HDD? I've run chkdsk on both, but I presume this is not foolproof?
2) Could be that my motherboard hates the 7950? I have my old 4890 that I can use to test this theory.
3) Could be some nasty bit of software/driver that I've reinstalled with Windows somehow? I have a barebones Win install I can test with and Linux on a stick if it comes to that.

Other than that I really don't know what to do. Maybe order a PS4.... joking, guys, joking.
 
Swap that SSD for a different drive. If you doubt any piece of tech, swap it to eliminate it.

A 'fresh' install of Win should never have any side effects from previous installations - that's the point of re-installation!

Don't you find it strange that your PC runs fine after an allotted time/crash?

HWInfo can log all temps/voltages and restore the details after a system crash. Use this program, stress your current setup and look for dips not spikes in voltage. If 12v lags - disconnect fans, if 5v lags disconnect usb devices.
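If it helps, here's a rough way to do the "look for dips" step on a saved log rather than eyeballing it. This is only a sketch: it assumes the log has been exported as CSV, and the column names ("Time", "+12V", "+5V") and sample values are made up for illustration, so they'd need changing to match whatever headers your actual log uses. The limits assume the usual ATX ±5% tolerance on the 12V and 5V rails. Bear in mind a dip shorter than the logging interval won't show up in the log at all.

```python
import csv
import io

# Assumed column names and nominal rail voltages; adjust to match a real log.
NOMINAL = {"+12V": 12.0, "+5V": 5.0}
TOLERANCE = 0.05  # ATX rails are normally allowed +/-5%


def find_dips(csv_text):
    """Return (time, rail, volts) for every reading below nominal minus 5%."""
    dips = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for rail, nominal in NOMINAL.items():
            volts = float(row[rail])
            # Only low readings matter here: we're hunting dips, not spikes.
            if volts < nominal * (1 - TOLERANCE):
                dips.append((row["Time"], rail, volts))
    return dips


# Made-up sample data standing in for an exported sensor log.
sample_log = """Time,+12V,+5V
20:01:00,12.10,5.02
20:01:02,11.30,5.01
20:01:04,12.08,4.70
"""

for time, rail, volts in find_dips(sample_log):
    print(f"{time}: {rail} dipped to {volts:.2f} V")
```

Run it against the log from a session that ended in a crash and check the last few rows before the reboot.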

Any intermittent fault that I have encountered has been solely down to the PSU.

Just how many more parts are you gonna swap out? The case?

Joking!

Aida64 has got system stability tests and logging as well - worth every penny.
 
Swap that SSD for a different drive. If you doubt any piece of tech, swap it to eliminate it.

Yeah, that's going to be the next step I think. Disconnect that mofo and run off the HDD for a while.

A 'fresh' install of Win should never have any side effects from previous installations - that's the point of re-installation!

Poorly written, sorry. I meant that I did a fresh install and have slowly reinstalled the programs + drivers I need. Should get rid of any OS corruption, but I may have reinstalled the offending driver/SW.

Don't you find it strange that your PC runs fine after an allotted time/crash?

Totally! This is what makes me think it's temps... of something. Like it has a problem during the heat up, but not when it's 'hot'. Of everything I don't understand about these crashes, I don't understand this most of all. :confused: It's absolutely consistent too. It may take 10 mins to 2 hours for the crash (or it may not happen), but once it has crashed it has never crashed a second time. Ever. Even after another 8 hours of gaming.

HWInfo can log all temps/voltages and restore the details after a system crash. Use this program, stress your current setup and look for dips not spikes in voltage. If 12v lags - disconnect fans, if 5v lags disconnect usb devices.

Any intermittent fault that I have encountered has been solely down to the PSU.

Yeah, I've done this with both PSUs. The old one wasn't so great on the 12V line. The new one is an absolute champ no matter what the load. When I swapped the new one in the crashes stopped for about 2 weeks. Could it be the MB power connection? HWInfo et al don't record a dip in voltage but perhaps it happens too late to be recorded.

Just how many more parts are you gonna swap out? The case?

Joking!

Aida64 has got system stability tests and logging as well - worth every penny.

<gnashing of teeth>

I know, I know, look at what desperation has brought me to. At least so far all the parts (except 200 quid of video card) have been things I've needed to upgrade anyway. And the video card is staying because I still don't trust the old one with the dodgy 0 degrees temperature sensor issue.

I'm going to find the offending part, and when I do I'm going to ritually sacrifice it. :mad:

Thanks for your suggestions corTEC, I'll let you know how I get on.
 
Wow... You seem to have literally exhausted nearly everything! If you're confident in your psu then the problem must lie with the GPU/MOBO config/setup.

I take it that you have reset and updated AND re-flashed the BIOS? From what you have written I must also assume that you have Primed/IBT'd your system @ stock?

I would also re-flash the GPU BIOS. You have nothing to lose from this process. Most people prefer to flash at boot-time, but I have flashed from Windows scores of times and never had any issues.

Problems like this intrigue me, as a solution is often around the corner! Don't give up now!
 
I will accept your kind words only if we can get this fixed!

Have you Primed? Rule out separate hardware at stock settings.

I can, however, understand how you are coming to conclusions about temperature: circuit boards go through small temperature fluctuations, and because heat expands solid materials, their dimensions change slightly too. If you did unfortunately over-stress the PCB, you could have broken some tracks that work fine at cool temperatures but cause a crash under load. So, with this hypothesis in mind, the system should initially heat up, then crash; if you then let the system return to ambient temperature it should, in theory, crash again, as this was the starting (initial) environment that caused the crash in the first place.

If the above hypothesis (which I think is what you're getting at) is true, the only way to a crash-free system is to leave the machine on 24/7. Reduce your desktop items to a bare minimum, make the desktop background black and never leave a window maximized, i.e. always sit on the desktop. This will reduce running costs significantly.

I personally have my machine in my loft/attic and run HDMI, network and USB 2.0 cables down through the wall into my living room, and game that way. I have to leave my machine on constantly through winter, as I cannot start the computer up if it is too cold. A hot water bottle placed across the motherboard/RAM has helped before.
 
Primed = prime95 right? I have run this at stock with no issues, although the nature of the problem means I might have just got lucky.

Yes, the logic behind the one-crash-then-work-fine-until-another-day pattern is that something seems to not like the transition from the 'off' state to the 'high load' state, but is happy once it has passed through that transition. Thermal expansion is the only mechanism I can think of that could cause this (but perhaps I'm not very imaginative). I am so curious as to what could cause this type of crash.

This weekend I've unplugged the SSD and have been running from a fairly lightweight install on the HDD. No crashes so far so I'll continue like this for a while. One crash will be enough to rule out the SSD and audio drivers (sound card still plugged in though). Unfortunately I get max one chance per day at provoking a crash so testing takes a while.
 
Maybe your MB doesn't like Sapphire HD 7950s. If I were you, when you bought another 7950 I would have gone for a different make.
 
CPU isn't overclocked any more. I was running it overclocked but I get these crashes either way. Figured I should stock clock it while troubleshooting though.

Possible that Sapphire in particular is causing trouble with my mobo, although I haven't been able to find any evidence of this online.
 
"Possible that Sapphire in particular is causing trouble with my mobo"
Yes, it is possible! Maybe around 10 years ago, I remember Gigabyte MBs were killing ATI1900 cards (not 100% sure, but around that time). I brought mine into *** and they killed mine. Luckily it was still under warranty.

I would buy the cheapest MB you can and test.
 