Folding@Home Client Restart Script

miniyazz · 15 May 2010 at 15:44

SiriusB said:
The script was designed with console folding clients in mind, so each script kills the processes in its configured folder and then restarts the client from that same folder.

I could modify the script [or show you how to modify it] so that it uses your .exe to start the client. The only issue I can see is: will that .exe try to start all of your GPU clients, even if you are only restarting one of them?

Indeed, and would it try to kill all of them?
Either way, the configured folder (the one with the client.cfg, etc.) is different from the folder containing the GPU startup .exe.

Thanks for the offer, I'll bear it in mind. What I may try now, though, is making three copies of the Folding@home .exe, one for each GPU, putting each copy in its configured folder, and adjusting the paths my shortcuts point to. It shouldn't take long and may even work!

SiriusB · 15 May 2010 at 15:56

To avoid killing everything, the script only kills processes that are running from the configured client folder. So if you configured the script for GPU1, it would only kill processes running from C:\FaH\GPU1; everything else would be safe.
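A rough sketch of that folder check in Python, assuming you already have a list of (PID, executable path) pairs from whatever process-listing tool you use. Collecting them is left out, and `processes_in_folder` and the sample PIDs are hypothetical, not part of the actual script. Comparing whole path components rather than string prefixes means a folder called GPU10 is not mistaken for GPU1.

```python
from pathlib import PureWindowsPath


def processes_in_folder(processes, folder):
    """Return the (pid, exe_path) pairs whose executable sits in or below `folder`.

    `processes` is assumed to be a list of (pid, exe_path) tuples gathered
    elsewhere; this function only does the filtering.
    """
    root = PureWindowsPath(folder)
    matches = []
    for pid, exe in processes:
        # .parents lists every ancestor folder of the executable; Windows path
        # objects compare case-insensitively, as Windows itself does.
        if root in PureWindowsPath(exe).parents:
            matches.append((pid, exe))
    return matches


# Hypothetical process list for illustration.
running = [
    (101, r"C:\FaH\GPU1\gpu.exe"),
    (102, r"C:\FaH\GPU1\FahCore_11.exe"),
    (201, r"C:\FaH\GPU10\gpu.exe"),
    (301, r"C:\FaH\GPU2\FahCore_11.exe"),
]
print(processes_in_folder(running, r"C:\FaH\GPU1"))
```

Only PIDs 101 and 102 are selected; the GPU10 and GPU2 processes are left alone.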

Copying the starting exe to each folder would work just fine. All you would have to do is change the line below to point to your startup .exe instead of the gpu.exe, or whatever it is called.

Code:

$client_name = "fah6.exe" #change client.exe name as appropriate, quotes are required

However, I am not sure why you would need the startup exe at all if you are going to start individual clients using the script. Could you not simply have it point to the client.exe already in the folder?

miniyazz · 15 May 2010 at 16:13

Sorry, slightly crossed wires.

My startup exe was the gpu.exe. For some reason, probably a remnant from when I was only running a single GPU client, I installed the GPU-folding program in Program Files. This created a work directory somewhere in my Users\Application Data directory but left the gpu.exe in the C:\Program Files folder. But it worked fine with a shortcut with "Target" pointing to the gpu.exe and "Start in" pointing to the work directory.
Then, when I added more GPUs and started Windows SMP, I created a dedicated folding folder (C:\FAHSMP) and simply moved the work directories there, leaving the gpu.exe in its original folder and just changing the shortcut paths.

So I had three shortcuts each pointing to the same gpu.exe but starting it with different parameters, in different work directories!
Anyway, I've just changed that, so now I have a gpu.exe in each work directory and shouldn't have any more problems with this script!
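For reference, what each shortcut does (run an .exe with some arguments, using "Start in" as the working directory) looks roughly like this in Python. This is only an illustration: the folder and arguments are the ones from the log further down, `gpu.exe` is the name used in this thread, and `launch_spec`/`launch` are hypothetical helpers.

```python
import subprocess
from pathlib import PureWindowsPath


def launch_spec(work_dir, exe_name="gpu.exe", args=()):
    """Return (argv, cwd) equivalent to a shortcut's Target and Start in fields."""
    work = PureWindowsPath(work_dir)
    argv = [str(work / exe_name), *args]  # Target: the exe copied into the work dir
    return argv, str(work)                # Start in: the same work dir


def launch(work_dir, exe_name="gpu.exe", args=()):
    """Start the client; cwd plays the role of the shortcut's Start in field."""
    argv, cwd = launch_spec(work_dir, exe_name, args)
    return subprocess.Popen(argv, cwd=cwd)


argv, cwd = launch_spec(r"C:\FAHSMP\Folding@home-gpu3",
                        args=["-gpu", "2", "-forcegpu", "nvidia_g80"])
print(argv, cwd)
```

With the exe copied into the work directory, Target and Start in point at the same folder, so a restart script that works per folder sees the whole client.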

Unfortunately, it doesn't look like my current GPU problem can be resolved so easily. Whenever I try to start one of them, I get a message saying "Folding@home has run into a serious error running the core. and will shutdown."[sic], and the log file describes a "client-core communications error":

Code:

Launch directory: C:\FAHSMP\Folding@home-gpu3
Arguments: -gpu 2 -forcegpu nvidia_g80 

[10:51:17] - Ask before connecting: No
[10:51:17] - User name: miniyazz (Team 10)
[10:51:17] - User ID: 650E5337664DD8B3
[10:51:17] - Machine ID: 4
[10:51:17] 
[10:51:17] Work directory not found. Creating...
[10:51:17] Could not open work queue, generating new queue...
[10:51:18] Initialization complete
[10:51:18] - Preparing to get new work unit...
[10:51:18] + Attempting to get work packet
[10:51:18] - Connecting to assignment server
[10:51:19] - Successful: assigned to (171.64.65.71).
[10:51:19] + News From Folding@Home: Welcome to Folding@Home
[10:51:19] Loaded queue successfully.
[10:51:21] + Closed connections
[10:51:21] 
[10:51:21] + Processing work unit
[10:51:21] Core required: FahCore_11.exe
[10:51:21] Core found.
[10:51:21] Working on queue slot 01 [May 15 10:51:21 UTC]
[10:51:21] + Working ...
[10:51:26] CoreStatus = C0000135 (-1073741515)
[10:51:26] Client-core communications error: ERROR 0xc0000135
[10:51:26] This is a sign of more serious problems, shutting down.

I suspect a system restart may actually fix it, but I can't try that out for some time as I'm backing up lots of files and can't interrupt it.

miniyazz · 15 May 2010 at 16:17

Hmmm:

C0000135
CoreStatus = C0000135 (-1073741515)
Client-core communications error: ERROR 0xc0000135
This is a sign of more serious problems, shutting down.

Error C0000135 is a Windows error which means it was unable to locate a component. It could be an installation error if the .dll files used by FAH are not where they are supposed to be. Also, you may have been infected by a virus, which was partially removed.
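The log prints the same status code twice: once in hex (C0000135) and once as a signed 32-bit integer (-1073741515). 0xC0000135 is 3,221,225,781 unsigned, and 3,221,225,781 - 2^32 = -1,073,741,515. The small Python sketch below converts between the two forms, which helps when a search only turns up one of them. 0xC0000135 is the Windows status STATUS_DLL_NOT_FOUND, which fits the "unable to locate a component" explanation; the helper names are just for illustration.

```python
STATUS_DLL_NOT_FOUND = 0xC0000135  # Windows NTSTATUS value


def to_signed32(code):
    """Reinterpret an unsigned 32-bit status code as a signed integer."""
    code &= 0xFFFFFFFF
    return code - (1 << 32) if code & 0x80000000 else code


def to_hex32(value):
    """Format a (possibly negative) 32-bit value as 8 hex digits."""
    return f"{value & 0xFFFFFFFF:08X}"


print(to_signed32(STATUS_DLL_NOT_FOUND))  # -1073741515
print(to_hex32(-1073741515))              # C0000135
```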

It appears I deleted cudart.dll in my earlier faffings!
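A quick way to catch this kind of accidental deletion is to check that the files the client needs are still in its folder. A minimal Python sketch, assuming the DLL is expected next to the client, as it apparently was here. Only cudart.dll comes from this thread, so the list is a placeholder to extend, and `missing_files` is a hypothetical helper. Windows can also load DLLs from system folders and PATH, which this does not check.

```python
from pathlib import Path

# Placeholder list: only cudart.dll is named in this thread.
REQUIRED_FILES = ("cudart.dll",)


def missing_files(client_dir, required=REQUIRED_FILES):
    """Return the names in `required` that are not present as files in `client_dir`."""
    folder = Path(client_dir)
    return [name for name in required if not (folder / name).is_file()]


for name in missing_files(r"C:\FAHSMP\Folding@home-gpu3"):
    print(f"missing: {name}")
```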

miniyazz · 15 May 2010 at 16:38

Well, the script works fine, thanks. Unfortunately it doesn't solve the "unstable machine" error (even underclocked as far as MSI Afterburner will allow), but then I didn't really expect it to!

SiriusB · 15 May 2010 at 17:17

If it solved the Unstable Machine error I would still be folding on my GPUs!

miniyazz · 15 May 2010 at 23:36

SiriusB said:
If it solved the Unstable Machine error I would still be folding on my GPUs!

Hmmf, I feel your pain. My 9800 resolved itself when I ran it without -gpu 2 or -forcegpu (I had to extend the Windows desktop, of course), but now my 8800 is doing the same thing and nothing seems to sort it out. Is this a problem at Stanford's end, or something else?

SiriusB · 15 May 2010 at 23:43

My current theory is the fact I run my GPU clients on SBS 2008. I suspect there is something not playing nice between CUDA, the OS and the client. Stanford are about as much use as a condom machine in the Vatican when it comes to finding the source of these kinds of errors.

There is a case for a hardware problem, but both cards show the same symptoms, and they ran fine under Windows XP and 7. They also ran just fine under SBS 2008 for weeks before they started acting up with certain WUs.

You could try using the FAH_GPU_IDLE environment variable to slow the GPU down a little and see if that helps. This worked for me for a while [which keeps a possible hardware issue niggling]. I would start at 20 and slowly decrease it as far as possible.

You can set the environment variable under System Properties > Advanced > Environment Variables.
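If you'd rather not set the variable system-wide, a launcher can pass it to a single client instead. A minimal Python sketch, assuming the client reads FAH_GPU_IDLE from its environment at startup, as it does when the variable is set system-wide. The helper names and the step of 5 are hypothetical; the thread only gives 20 as a starting value.

```python
import os
import subprocess


def env_with_gpu_idle(idle, base=None):
    """Return a copy of the environment with FAH_GPU_IDLE set to `idle`."""
    env = dict(os.environ if base is None else base)
    env["FAH_GPU_IDLE"] = str(idle)
    return env


def idle_schedule(start=20, step=5, floor=0):
    """Values to try in turn, from `start` down to `floor` (step is a guess)."""
    return list(range(start, floor - 1, -step))


def launch_with_idle(argv, work_dir, idle):
    """Start one client with its own FAH_GPU_IDLE value."""
    return subprocess.Popen(argv, cwd=work_dir, env=env_with_gpu_idle(idle))


print(idle_schedule())  # [20, 15, 10, 5, 0]
```

Setting the value per process lets each GPU client be tuned separately, whereas the System Properties route applies one value to everything.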

miniyazz · 16 May 2010 at 01:00

Well, I restarted (actually it crashed; I'm still fine-tuning my overclock on the new mobo) and all clients started working automatically with no issues, at their normal overclocks.

SiriusB · 16 May 2010 at 01:08

Have you monitored what Work Units they are crunching? My errors were with specific WUs. Perhaps you bagged some you can crunch?
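If you want to see which work units line up with the errors, counting the projects in the client log is a start. A rough Python sketch, assuming the log has lines of the form `Project: N (Run R, Clone C, Gen G)`. Check this against your own log; the regex and helper names are assumptions, not anything provided by the client.

```python
import re
from collections import Counter

# Assumed log line shape: "Project: N (Run R, Clone C, Gen G)".
PROJECT_RE = re.compile(
    r"Project:\s*(\d+)\s*\(Run\s*(\d+),\s*Clone\s*(\d+),\s*Gen\s*(\d+)\)"
)


def work_units(log_text):
    """Return (project, run, clone, gen) tuples in the order they appear."""
    return [tuple(int(g) for g in m.groups()) for m in PROJECT_RE.finditer(log_text)]


def project_counts(log_text):
    """Count how many work units from each project appear in the log."""
    return Counter(project for project, _run, _clone, _gen in work_units(log_text))
```

Keeping run, clone and gen as well as the project number makes it possible to tell whether failures cluster on particular WUs within a project, not just on the project as a whole.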

miniyazz · 16 May 2010 at 01:41

All I'm getting is P10103 (548 points). Unless there's significant variance within a project, nope.