UNSTABLE_MACHINE

I've been getting a lot of failed WUs recently and just wondered what's causing them.

I've got my CPU overclocked and thought that might be the cause. However, I've been running this OC for about 8 months without a BSOD, and I've been folding non-stop during that time. Only recently have I been getting these UNSTABLE_MACHINE errors, and they seem to occur when I run MW@H in parallel with F@H.

Anyone got any ideas? Event viewer doesn't show anything.


15:54:19:WU00:FS00:0xa4:Completed 80000 out of 500000 steps (16%)
15:56:33:WU00:FS00:0xa4:Completed 85000 out of 500000 steps (17%)
15:59:18:WU00:FS00:0xa4:Completed 90000 out of 500000 steps (18%)
16:02:29:WU00:FS00:0xa4:Completed 95000 out of 500000 steps (19%)
16:04:15:WU00:FS00:0xa4:Completed 100000 out of 500000 steps (20%)
16:04:42:WU00:FS00:0xa4:mdrun returned 255
16:04:42:WU00:FS00:0xa4:Going to send back what have done -- stepsTotalG=500000
16:04:42:WU00:FS00:0xa4:Work fraction=0.2022 steps=500000.
16:04:46:WU00:FS00:0xa4:logfile size=13648 infoLength=13648 edr=0 trr=25
16:04:46:WU00:FS00:0xa4:logfile size: 13648 info=13648 bed=0 hdr=25
16:04:46:WU00:FS00:0xa4:- Writing 14186 bytes of core data to disk...
16:04:46:WU00:FS00:0xa4:Done: 13674 -> 4750 (compressed to 34.7 percent)
16:04:46:WU00:FS00:0xa4: ... Done.
16:04:46:WU00:FS00:0xa4:
16:04:46:WU00:FS00:0xa4:Folding@home Core Shutdown: UNSTABLE_MACHINE
16:04:47:WARNING:WU00:FS00:FahCore returned: UNSTABLE_MACHINE (122 = 0x7a)
16:04:47:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY project:8076 run:14 clone:12 gen:5 core:0xa4 unit:0x000000066652edcc5121c92e13c23cf4
16:04:47:WU00:FS00:Uploading 5.14KiB to 171.67.108.60
16:04:47:WU00:FS00:Connecting to 171.67.108.60:8080
16:04:47:WU01:FS00:Connecting to assign3.stanford.edu:8080
16:04:47:WARNING:WU01:FS00:Failed to get assignment from 'assign3.stanford.edu:8080': Could not get IP address for assign3.stanford.edu: This is usually a temporary error during hostname resolution and means that the local server did not receive a response from an authoritative server.
16:04:47:WU01:FS00:Connecting to assign4.stanford.edu:80
16:04:47:WU00:FS00:Upload complete
16:04:48:WU00:FS00:Server responded WORK_ACK (400)
16:04:48:WU00:FS00:Cleaning up
16:04:48:WU01:FS00:News: Welcome to Folding@Home
 
Are MW@H and F@H both trying to use the same CPU / RAM at the same time? They will, after all, have a similar idle priority. I presume you've happily been running just one of these distributed computing clients for a while, and the problem only appeared once you added the second one? Remember the BOINC client is designed to run different projects at once and share resources between them, but only with other BOINC projects. F@H and BOINC will not know that the other one is there.
 
Yeah I've only recently started using BOINC for my 7970 folding. Been folding for about a year on this rig.

Not sure about them using the same RAM/CPU. When BOINC runs without F@H the CPU idles around 0-1%, so it's not like they're fighting for CPU resources. I did raise the CPU priority for BOINC since it sometimes gets throttled when F@H is in use, i.e. the MW@H WUs take about 30% longer to complete when F@H is running, even though MW@H doesn't use much CPU.
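One way to put a number on that kind of slowdown is to read it straight off the client log: the gap between consecutive "Completed N out of M steps" lines shows how long each 5000-step chunk took. Below is a minimal Python sketch, assuming the log line format shown in the first post; the function name `step_intervals` is made up for illustration and is not part of FAHClient.

```python
import re
from datetime import datetime, timedelta

# Matches FAHClient progress lines such as
# "15:56:33:WU00:FS00:0xa4:Completed 85000 out of 500000 steps (17%)"
PROGRESS = re.compile(
    r"^(\d\d:\d\d:\d\d):WU\d+:FS\d+:0x\w+:Completed (\d+) out of (\d+) steps"
)

def step_intervals(log_text):
    """Return the seconds elapsed between consecutive progress lines."""
    stamps = []
    for line in log_text.splitlines():
        m = PROGRESS.match(line.strip())
        if m:
            stamps.append(datetime.strptime(m.group(1), "%H:%M:%S"))
    intervals = []
    for prev, cur in zip(stamps, stamps[1:]):
        delta = cur - prev
        if delta < timedelta(0):  # the log rolled past midnight
            delta += timedelta(days=1)
        intervals.append(int(delta.total_seconds()))
    return intervals

# Progress lines copied from the first log in this thread.
sample = """\
15:54:19:WU00:FS00:0xa4:Completed 80000 out of 500000 steps (16%)
15:56:33:WU00:FS00:0xa4:Completed 85000 out of 500000 steps (17%)
15:59:18:WU00:FS00:0xa4:Completed 90000 out of 500000 steps (18%)
16:02:29:WU00:FS00:0xa4:Completed 95000 out of 500000 steps (19%)
16:04:15:WU00:FS00:0xa4:Completed 100000 out of 500000 steps (20%)
"""

print(step_intervals(sample))  # [134, 165, 191, 106]
```

Intervals that jump around like this suggest something else is competing for the CPU. Note the sketch does not separate different WUs or slots, and a resume from checkpoint will show up as one odd interval, so feed it the lines for a single uninterrupted run.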
 
Just had an "F@H has stopped working" popup. I've been getting them quite a lot recently too. I'm not running BOINC at the moment, so I don't really know what the problem is.

Here's the log

19:07:41:WU01:FS00:Server responded WORK_ACK (400)
19:07:41:WU01:FS00:Final credit estimate, 1800.00 points
19:07:41:WU01:FS00:Cleaning up
19:10:26:WU00:FS00:0xa4:Completed 5000 out of 500000 steps (1%)
19:11:49:WU00:FS00:0xa4:Completed 10000 out of 500000 steps (2%)
19:13:05:WU00:FS00:0xa4:Completed 15000 out of 500000 steps (3%)
19:14:22:WU00:FS00:0xa4:Completed 20000 out of 500000 steps (4%)
19:15:37:WU00:FS00:0xa4:Completed 25000 out of 500000 steps (5%)
19:16:52:WU00:FS00:0xa4:Completed 30000 out of 500000 steps (6%)
19:18:07:WU00:FS00:0xa4:Completed 35000 out of 500000 steps (7%)
19:19:22:WU00:FS00:0xa4:Completed 40000 out of 500000 steps (8%)
19:20:37:WU00:FS00:0xa4:Completed 45000 out of 500000 steps (9%)
19:21:52:WU00:FS00:0xa4:Completed 50000 out of 500000 steps (10%)
19:34:28:WARNING:WU00:FS00:FahCore returned an unknown error code which probably indicates that it crashed
19:34:28:WARNING:WU00:FS00:FahCore returned: UNKNOWN_ENUM (-1073741783 = 0xc0000029)
19:34:28:WU00:FS00:Starting
19:34:28:WU00:FS00:Running FahCore: "C:\Program Files (x86)\FAHClient/FAHCoreWrapper.exe" C:/Users/Danny/AppData/Roaming/FAHClient/cores/www.stanford.edu/~pande/Win32/AMD64/Core_a4.fah/FahCore_a4.exe -dir 00 -suffix 01 -version 703 -lifeline 2284 -checkpoint 15 -np 4
19:34:28:WU00:FS00:Started FahCore on PID 1420
19:34:28:WU00:FS00:Core PID:3528
19:34:28:WU00:FS00:FahCore 0xa4 started
19:34:29:WU00:FS00:0xa4:
19:34:29:WU00:FS00:0xa4:*------------------------------*
19:34:29:WU00:FS00:0xa4:Folding@Home Gromacs GB Core
19:34:29:WU00:FS00:0xa4:Version 2.27 (Dec. 15, 2010)
19:34:29:WU00:FS00:0xa4:
19:34:29:WU00:FS00:0xa4:Preparing to commence simulation
19:34:29:WU00:FS00:0xa4:- Ensuring status. Please wait.
19:34:38:WU00:FS00:0xa4:- Looking at optimizations...
19:34:38:WU00:FS00:0xa4:- Working with standard loops on this execution.
19:34:38:WU00:FS00:0xa4:- Previous termination of core was improper.
19:34:38:WU00:FS00:0xa4:- Files status OK
19:34:38:WU00:FS00:0xa4:- Expanded 645519 -> 1748880 (decompressed 270.9 percent)
19:34:38:WU00:FS00:0xa4:Called DecompressByteArray: compressed_data_size=645519 data_size=1748880, decompressed_data_size=1748880 diff=0
19:34:38:WU00:FS00:0xa4:- Digital signature verified
19:34:38:WU00:FS00:0xa4:
19:34:38:WU00:FS00:0xa4:Project: 8080 (Run 69, Clone 32, Gen 6)
19:34:38:WU00:FS00:0xa4:
19:34:38:WU00:FS00:0xa4:Entering M.D.
19:34:44:WU00:FS00:0xa4:Using Gromacs checkpoints
19:34:44:WU00:FS00:0xa4:Mapping NT from 4 to 4
19:34:44:WU00:FS00:0xa4:Resuming from checkpoint
19:34:44:WU00:FS00:0xa4:Verified 00/wudata_01.log
19:34:44:WU00:FS00:0xa4:Verified 00/wudata_01.trr
19:34:44:WU00:FS00:0xa4:Verified 00/wudata_01.xtc
19:34:44:WU00:FS00:0xa4:Verified 00/wudata_01.edr
19:34:44:WU00:FS00:0xa4:Completed 52050 out of 500000 steps (10%)
19:35:33:WU00:FS00:0xa4:Completed 55000 out of 500000 steps (11%)
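The UNKNOWN_ENUM line in the log above prints the core's exit code twice: as a signed 32-bit integer and as hex. The client does not recognise the value, but the hex form is what you would look up in Microsoft's NTSTATUS table. A short Python sketch of the conversion, in case a log or crash dialog only gives the negative number (the helper name `to_ntstatus_hex` is made up for illustration):

```python
def to_ntstatus_hex(exit_code):
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"

print(to_ntstatus_hex(-1073741783))  # 0xc0000029
print(to_ntstatus_hex(122))          # 0x0000007a
```

As far as I can tell, 0xC0000029 is listed as STATUS_INVALID_UNWIND_TARGET, which points at the core process itself crashing rather than the core detecting a bad result, but that mapping is worth double-checking against the official table.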
 
Put my CPU back to stock freq/volts (3.4GHz, 3.8GHz turbo) and this just happened:

FdUUbkP.png

I kind of suspected the CPU wasn't the issue, as my OC has been solid for almost a year of folding, so I doubted it had suddenly developed a fault.

Could this be a RAM issue? My settings are 1600MHz, 8-8-8-15 1T, 1.3V. But I've been using these for quite a while too, with no other issues to complain of.

What else could it be? I don't have ANY other desktop issues: my system is speedy, never crashes or lags, and has no OS issues.

I've also reinstalled the F@H client a few times in the past few days.

halp?
 
Might be worth running a memtest on it and see what that throws back.

Presumably your mobo and other components are sufficiently cool? Also run a SMART check and a block check on your hard drive, as it could be that.
 
Think my SSD is fine, no issues there.

Just ran a Super PI test to check the memory (don't have a USB stick for memtest):

CzJbEhd.png

memory seems ok.

Also my system is pretty cool. The CPU hits the mid 60s to low 70s when folding and running Prime95.

Also, it's not like particular WUs are causing the crash; it's happening on long ones and short ones.
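To check whether the failures really are spread across different work units, the faulty returns can be tallied by project from the client log. A rough Python sketch, assuming the "Sending unit results" line format seen in the first log; `faulty_by_project` is a made-up helper name, not anything from FAHClient.

```python
import re
from collections import Counter

# Matches lines such as
# "...Sending unit results: id:00 state:SEND error:FAULTY project:8076 run:14 clone:12 gen:5 ..."
RESULT = re.compile(
    r"Sending unit results:.*?error:(\w+) project:(\d+) run:(\d+) clone:(\d+) gen:(\d+)"
)

def faulty_by_project(log_text):
    """Count work units returned with error:FAULTY, keyed by project number."""
    counts = Counter()
    for m in RESULT.finditer(log_text):
        if m.group(1) == "FAULTY":
            counts[int(m.group(2))] += 1
    return counts

# Line copied from the first log in this thread.
sample = (
    "16:04:47:WU00:FS00:Sending unit results: id:00 state:SEND error:FAULTY "
    "project:8076 run:14 clone:12 gen:5 core:0xa4 "
    "unit:0x000000066652edcc5121c92e13c23cf4"
)

print(faulty_by_project(sample))  # Counter({8076: 1})
```

This only counts units the client actually sent back as faulty. A hard core crash like the UNKNOWN_ENUM one restarts from the checkpoint without a send line, so those would need to be counted separately.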
 
Latest drivers for everything? Might be worth using the Intel driver update utility to make sure everything is good.

Also, you haven't got one of the OCZ SSDs with the SandForce chip that needed a firmware update to fix, have you?
 
Could anything else be interfering with f@h such as antivirus scans?

If you get that pop up window for f@h has stopped working and click on the "details" option, what does that say?
 