Arrrgh!

It has happened again! Another 1760 pointer just stopping of its own accord. No wonder my PPD sucks when I keep losing hours and hours of folding time!

This one randomly stopped at 1% at about 4 am and I have only just noticed! I should probably check my machine as soon as I get up but most mornings I can barely be bothered crawling out of bed, never mind checking my computer!

I am having a look at the folding forums now to see if it's an issue with the WUs. Seems odd it has happened to only the two 1760 pointers I have had in the last week.
 
Linux SMP Natively.

I have been running the Linux client pretty much since it was launched and this is the first time I have ever had any trouble. Don't understand it.

Absolutely nothing on my machine has changed and both incidents occurred while I was asleep so the interrupts aren't caused by anything I am doing.

The strange thing is the WU carries on from where it left off when I restart the client. It is almost as if something is stopping the client. If it was a bad WU I would expect an EUE or something.
 
That is strange... my first SMP Linux has been running fine with no loss at all *Stelly touches wood* I have 2 set up at the moment and have never *Stelly touches wood again* had that kind of problem...

Stelly
 
That's exactly what was happening to my [SIR] ID last week... (well one of the three problems anyway)

Was your internet connected? I did think it was related to Ubuntu updates or notification messages (but it could have been just coincidence).

It would do between 2 and 10 steps of a WU then just hang (FAH running but at 0%)... restarting the client would start it from the last step - no error message in the log, but at ~20 steps a day I missed the deadlines.

After installing EE server again (via VM), FF desktop has been working fine. :confused:

Edit: don't think it was specific to the WU but I had at least one 1760 do it
 
It could be an update issue but I am not convinced. My Ubuntu is set only to tell me about updates, it doesn't do it on its own. Also I have never come across a problem before and I haven't changed how I do updates.

So far it has only done it once during each of the two WUs affected.

To be honest I can't find a single reason why this would happen. I will wait and see if it occurs again. In the meantime I am attempting to write a script that will periodically check on the status of my client and if it has stopped tell the lazy git to restart! :p
 
I'm a bit of a noob when it comes to Linux, but I've had no problems with any of my Linux SMPs, either native or VMware (touch head - nearest piece of solid wood :) ). None of my machines are as overclocked as yours though (my native Linux C2D, running at a leisurely 2.99GHz, is my only overclocked machine). The fact it's only just started doing this might possibly indicate a developing hardware fault, possibly with the memory. Just a thought.

ms9cw
 
If it was a hardware fault the WUs are far more likely to just EUE. Not politely stop. Since it has happened twice with no loss of WUs I find it hard to believe it is a hardware issue.
 
Same thing happened to one of my 1760s last night. Just borked out for no apparent reason.

This was the Mac OS X console client. The plot thickens... woooo! ;)
 
I had a 2610 chuck itself down the pan at 25% last night. I'm more concerned about the one that was successfully uploaded a few weeks ago and Stanford then lost.
 
Just a little update. I have almost finished that script I said I was writing. It isn't quite finished and is a tad rough around the edges.

I will be finishing it tonight, just having a rest from it and having food. I will test it as much as possible to make sure it doesn't break anything and once happy will make it available to anyone who wants it.

The script basically uses the "folding" script that comes with a finstall installation to check on the client's status. It also checks on the amount of CPU time being used [as sometimes "./folding status" can say everything is great when the client isn't actually doing anything]. I am undecided whether to have the script on a big loop or to set it to run once but schedule it with cron or at. I should point out it is for Linux only and only if people installed their client using finstall, so if you run WinSMP, tough. :p
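For anyone curious how a CPU-time check like this could work, here is a minimal sketch. It is not the actual fahcheck code. It assumes the Linux /proc/<pid>/stat layout (utime and stime are the 14th and 15th fields) and compares two samples taken a few minutes apart. The function names, the PID and the 100-tick threshold are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: names and thresholds are illustrative, not from fahcheck.

# Print utime + stime (in clock ticks) from one line of /proc/<pid>/stat.
# Everything up to the last ") " is dropped first, so a process name
# containing spaces or brackets cannot shift the field positions.
# After that cut, utime and stime are fields 12 and 13.
ticks_from_stat_line() {
    printf '%s\n' "$1" | sed 's/^.*) //' | awk '{ print $12 + $13 }'
}

# is_stalled BEFORE AFTER MIN_DELTA
# Succeeds (exit 0) if CPU time grew by less than MIN_DELTA ticks
# between the two samples, i.e. the process looks stalled.
is_stalled() {
    [ $(( $2 - $1 )) -lt "$3" ]
}

# Example use against a live process (Linux only, PID is hypothetical):
#   a=$(ticks_from_stat_line "$(cat /proc/21657/stat)")
#   sleep 300
#   b=$(ticks_from_stat_line "$(cat /proc/21657/stat)")
#   is_stalled "$a" "$b" 100 && echo "client looks stalled"
```

If the core runs as several processes, each one would need sampling and the results summing before the comparison.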

If uncle_fungus sees this thread I wouldn't mind your opinion of the script once I am done.
 
Are you getting any error message or does it just stop?
I've had to turn off "connect without asking" as I was losing WUs that had stalled/stopped with an error, as it would discard the WU & download a new one.

I hope you find the problem.
 
No errors or anything, it just stops with a code 15 - which is what you get if you run "./folding stop" [if you have a finstall installation].

Don't understand it. If it errored out I would at least have a clue lol.
 
Pilgrim57 said:
Are you getting any error message or does it just stop?

I got a weird error message about my MAC address. I thought it was something else (I was half asleep when I ok'ed it) but I'm starting to think it was the client monitoring software I use.
 
I am going to quote myself:

SiriusB said:

It did it again! Luckily I only lost another few hours, but this is getting annoying. Unfortunately I don't have my script up and running just yet so couldn't catch it :(

I have looked in my Ubuntu's system logs under messages and apparently it is a segfault. Specifically:

Jun 11 22:08:09 windwardx kernel: [778947.161063] FahCore_a1.exe[21657]: segfault at 0000000000006e5b rip 000000000089ada0 rsp 00000000407fde38 error 6

Any ideas?

I doubt it is stability as this problem has only arisen recently after months with no problems. I haven't done anything to it. Could it just be bad WUs or at the very least very touchy ones?
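As a rough aid to reading that log line: on x86 the number after "error" is the page-fault error code, which the kernel prints in hexadecimal, and its low three bits say what kind of access faulted. A small sketch to decode it (decode_fault_error is a made-up helper name, not part of any tool):

```shell
#!/bin/sh
# Decode the low three bits of an x86 page-fault error code.
# Pass values above 9 with a 0x prefix, since the kernel prints hex.
decode_fault_error() {
    code=$1
    if [ $(( $code & 1 )) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $(( $code & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $(( $code & 4 )) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$mode-mode $access, $cause"
}

decode_fault_error 6   # prints: user-mode write, page not present
```

So the log line describes a user-mode write to an address that isn't mapped. That is consistent with a bad pointer inside the core, but on its own it doesn't say whether the WU, the core or the hardware is to blame.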
 
I had a native Linux box (OC'd) which started giving segfaults shortly before the HD ate itself (not permanently). One reformat and re-install later, with folding re-installed (not OC'd), it has chugged away perfectly ever since.

Not sure if the OC was totally responsible. May want to try a re-install though and see how it goes.

Dunc
 
Hmm don't fancy a reinstall!

I have yet to see it happen on anything else but 1760 pointers so I am going to give it a few days and see what happens.

@uncle_fungus

I have a working script which I am now leaving to run overnight as a test. Currently I have forced it in a big cycle:

Code:
while true
do
    # code
    sleep 1h
done

I would prefer it as a cron job for two reasons. First one is basically more control. Secondly the client will stop if you terminate the script using Ctrl+C - don't know why lol.
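A sketch of what the cron route could look like, assuming the check is split out into a run-once script. The paths and the fahcheck-once name in the crontab comment are hypothetical. The run_exclusive helper is also made up: it skips a run if the previous one is still going, which matters if the check itself pauses for several minutes. As for Ctrl+C, a likely (unconfirmed) explanation is that Ctrl+C sends SIGINT to every process in the terminal's foreground process group, and a client started by the script would be part of that group. A cron job has no terminal, so the issue shouldn't arise there.

```shell
#!/bin/sh
# run_exclusive LOCKDIR COMMAND...
# Run COMMAND unless a previous run still holds LOCKDIR.
# mkdir either creates the directory or fails, so it works as a simple lock.
run_exclusive() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous check still running, skipping" >&2
        return 0
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return "$status"
}

# Hypothetical crontab entry (edit with `crontab -e`), hourly on the hour:
#   0 * * * * /home/user/bin/fahcheck-once >> /home/user/fahcheck.log 2>&1
```

Inside the run-once script the check would then be wrapped as, for example, `run_exclusive /tmp/fahcheck.lock ./do-check`, where do-check stands in for the real work.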

I will post the full script in the morning when I am reasonably sure nothing has gone to pot.
 
www.aeternum.co.uk/fahcheck-0.2

@uncle_fungus:

OK here is the script.

It shouldn't be hard to follow as I have done plenty of commenting [possibly more than is really needed - better than not enough though]. If you plan to run it I would suggest removing the while loop first so you don't have to Ctrl+C it.

Normal behavior for the script is for it to output the result of
Code:
./folding status
and then print out the exit code for the command.

If the client is reported as not running then the client will be restarted.

If the client is reported as running it checks the CPU load. After a pause of precisely 5 minutes the script will print out the total CPU load [anything between 0 and 400] and a message as to whether or not the client is running. If the client isn't running the script runs
Code:
./folding stop
./folding start
.
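For readers who don't want to open the file, the flow described above can be sketched roughly like this. It is a simplified illustration, not the fahcheck source. It assumes "./folding status" exits non-zero when the client is not running (check your own install), it leaves out the 5 minute pause, and check_client plus its arguments are names invented here.

```shell
#!/bin/sh
# check_client FOLDING LOAD_CMD MIN_LOAD
#   FOLDING   path to the finstall "folding" script
#   LOAD_CMD  any command that prints the client's CPU load as an integer (0-400)
#   MIN_LOAD  load below which the client is treated as stalled
check_client() {
    folding=$1
    load_cmd=$2
    min_load=$3

    # Assumption: "status" exits non-zero when the client is not running.
    if ! "$folding" status; then
        echo "client not running, starting it"
        "$folding" start
        return
    fi

    # LOAD_CMD is deliberately unquoted so it can carry arguments.
    load=$($load_cmd)
    echo "CPU load: $load"
    if [ "$load" -lt "$min_load" ]; then
        echo "client idle, restarting it"
        "$folding" stop
        "$folding" start
    else
        echo "client is running"
    fi
}

# Example (hypothetical load command and threshold):
#   check_client ./folding "./measure-load" 50
```

Passing the load measurement in as a command keeps the decision logic separate from however the load is actually sampled.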

@everyone else:

I wouldn't use the script unless you are happy with what is effectively a piece of BETA software written by me :D

It will only work if you have installed your SMP client using finstall.
 
I'd like to say nice work SB - but I've no idea what it means (let alone what to do with it)

Spoon fed linux is hard enough for me (:rolleyes: at self) but you already know that :D
 