GPU woes - Happy ending?

Hi Guys - hasn't been a great start to the year for me so I've been a bit quiet here.
Apologies to the heyes for stomping without a parp:o
So here's the story. Lots of EUE's & WU crashes due to "unstable machine".
Screen locks up & I set the card back to defaults but still get same problems.
Then Vista reboots PC & says it is due to a graphics driver problem which it has fixed. [I was on 178.??] So I see there is a new one out & try that [181.22] & thought it might be slower but worth it for stability. No luck:(
Then discovered that there was new version of GPU folding client 6.23 [I was using 6.2]. Again after a promising start same problems.
Then early this a.m. 'cause I couldn't sleep I noticed card was running @ 69C with fan set at 66% in Riva Tuner. Then I thought I haven't used it to o/c but as it is trying to speed the fan up from default @30% maybe there is a conflict with the new driver. So I ticked the box to allow o/cing & then the button to set defaults. 2 WU have finished since then all OK:) & 3rd is at 36%
Hence my output has been shot over the last week:(

Just wondering if the first person to notice there is a new client version out
or new drivers could post an alert here? [Or did they & I missed it]

Happy Groundhog Day for tomorrow [Looks like we have 6 more weeks of Winter:(]
All the best to my fellow team members. Fold on Team 10
 
Update - after reinstalling the driver things seemed to settle down but today just loads of EUE's & nans stopping the run "unstable machine" so keep getting "pausing for 24hrs":( Mostly with 511 & 353 but now even with the 384 pointers.
Could I have damaged the card by o/cing? Seemed to be stable doing most things for over 2 months inc FAH. No artifacts or screen freezing.
 
I'm only getting about 2 in 10 Wu to complete:(
Either EUE's or unstable machine. What does the latter mean if the card is not o/ced? Will try & run 3DMark to see if that is OK.
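To put a number on that failure rate, here is a rough Python sketch that tallies outcomes from the client log. The marker strings (FINISHED_UNIT, EARLY_UNIT_END, UNSTABLE_MACHINE) and the FAHlog.txt file name are assumptions about what the client writes, so check your own log and adjust them. The function names are made up for illustration.

```python
from collections import Counter

# Assumed outcome markers; these are guesses at what the client log
# contains. Check your own log and edit this tuple to match.
MARKERS = ("FINISHED_UNIT", "EARLY_UNIT_END", "UNSTABLE_MACHINE")


def tally_outcomes(lines):
    """Count how many lines mention each assumed outcome marker."""
    counts = Counter({marker: 0 for marker in MARKERS})
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


def completion_rate(counts):
    """Fraction of tallied outcomes that finished, or None if none were found."""
    total = sum(counts.values())
    if total == 0:
        return None
    return counts["FINISHED_UNIT"] / total


def tally_file(path="FAHlog.txt"):
    """Tally outcomes in a log file (hypothetical default file name)."""
    with open(path, errors="replace") as handle:
        return tally_outcomes(handle)


# Example use, once the path points at a real log:
#   counts = tally_file("FAHlog.txt")
#   print(counts, completion_rate(counts))
```

Running it before and after each change (driver, clocks, fan speed) would show whether the change actually moved the completion rate or whether it was just a lucky run of WUs.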
 
Have you got the latest core revision (1.19)? If the EUE rate is that high I would have thought something could be quite wrong - I have not seen a gfx EUE for a while now. I would try whacking the fan speed up to 100% and see if it's temperature related (the new 511 pointers get the cards very hot). Temps are known for causing instability in the gfx clients, and a partially clogged fan/little airflow in the case would make this a lot worse.

I guess it could also be a power issue if your PSU is a little old or not quite up to the job (again, the 511's, which are everywhere at the moment, draw more power). I guess your cpu etc is overclocked too? Maybe try backing off the clocks a bit - for me, the 353s seem to be more cpu intensive, so instability there could cause the client to crash.

Other than that, I guess it's just drivers/re-installation really. Personally, I'm using 180.60 CUDA drivers, and they seem fine to me, although you say you have already tried this? I would try and rule out the temperature first, but if it artifacts in 3DMark etc, I think it might be RMA time.
 
Thanks for the reply Chrissy. Yes got core 1.19. Temp is 65C with Riva tuner running the fan at 66% [ used to run at 71C @ stock 30%].
PSU is Enermax liberty 620W & is about 2 years old.
Ran 3D Mark, 2003,2005 & 6 & all completed OK.
Using 181.20 drivers [have reinstalled them].
Thought! when I upgraded to FAH GPU 6.23 I didn't uninstall previous version. Could this be a source of problems?:confused:
With the fan at 100% temp is down to 61C. Considered getting the Arctic Cooling Accelero to drop temps.
Could it be that my 8800GT 256MB hasn't got enough RAM to cope with the 511 & 353's? [although the 384 pointers sometimes crash now as well].
CPU is o/ced but has been rock solid 9550 @ 3529MHz.
Have had about 3 times error message display adapter stopped working - windows has fixed the problem.
 
RAM should not be an issue - it may make them go slower, but not kill them. The 353 pointers don't use too much RAM/shader power etc (or at least I presume so, since they run very quickly). I would try returning to stock settings and completely re-installing the clients and seeing if that makes a difference. Perhaps try the 180.60 drivers too? Other than that I'm not too sure... sorry I cannot be of more help.
 
The graphics card ran like a dream for many months, with no more problems than others, esp with the 480 pointers. My issues started about 2 weeks ago.
The card is no longer o/ced. I have just uninstalled & reinstalled the client hopefully that will do the trick. I will try the 180 drivers next if I get the problem back.
Thanks for your help & advice, Chrissy.
 
Might be worth swapping your main memory out for a single module that you know works or running a memtest on your RAM?

Most graphics drivers use a portion of system memory so if that's dodgy it can wreak havoc.

I don't know what the chances are of it just suddenly dying however.

I just know that I used to get the Windows has fixed the problem message for the nvdl or something file and different memory has since stopped it.
 
Right - reduced the o/c by 130MHz & RAM now running at stock speed.
Still getting unstable machine overnight & EUE limit reached.
Now a 384 is at 51% & OK [no reboot].
Have had "display driver stopped working" again despite installing CUDA drivers 180.6 yesterday. This seems to be a problem for a lot of people but no suggestions as to the solution. I'm going to try an uninstall & prevent Vista from reinstalling automatically.
I suspect the driver rather than the RAM as about the time I started getting the prob. with FAH I had started getting screen lag & some artifacts when the card was o/ced, now it's not.
I'm going to take Riva tuner off & see if that helps as well.
 
If you got an image of the disk you could wipe it, install Vista, the graphics drivers and fah and see if on a completely clean install it still does it? If it does it's gotta be a hardware issue.

Could try it with all the various driver combinations.

If it fixes it you might be able to restore the image?
 
Sadly no drive image.
Card at 79C & has completed a WU but then EUE'd/unstable machine trying to get new WU until 24hr limit reached. So it's not temp & it's not Riva Tuner. Could be driver?
Forgot to mention BOINC running SETI, Rosetta & Climate prediction without any problems.
Looked at Folding Forum & there seems to be big problems for lots of people with 57xx WU's. Pande Group looking into it.
 
Hi Guys - I've upped the volts to the CPU & NB slightly[0.625 & 0.02] & that appears to have cured things. However, I have had a run of 384 pts which have always been the most stable. Just about to finish a 353.
Tomorrow I will drop the CPU volts back down as I believe the GPU client hardly uses the CPU so I would be more inclined to suspect the RAM needed help.
Thanks for the input & suggestions.
 
Well, after a good run of 24hrs with no EUE's or unstable machine the driver stopped working trying to play video & I was back to square 1. Today while watching video the whole screen locked up. That's after a reboot & switching to latest drivers.:(
Could it be my graphics card as well as some carp WU's?
 
I had exactly this on a 9600 on a stock machine about a week ago. Playing with the BIOS did it no good whatsoever, nor did reinstalling folding. Reinstalled the operating system and it picked up and ran fine. Confuses the hell out of me, but at least it's folding again.
 
Shame about the image, makes life so easy.

Vista Ultimate has inbuilt drive imaging and backup. I did a 70gb drive image over the network in about 60 minutes.

Sounds like a clean install might be necessary anyway. And for backup: 1TB external hard disk for £86.24 this week only ;)
 