Lost a Bigadv WU yesterday

.... at 84% :(

All I did was restart the client with the -oneunit flag as I wanted to do something with that rig once the WU had finished. It had been running fine for nearly 2 days and only had 6 hours to go. When I restarted the client, all seemed to be well until it tried to start from the checkpoint:

Code:
# Linux SMP Console Edition ################################################### 
############################################################################### 

Folding@Home Client Version 6.29 

http://folding.stanford.edu 

############################################################################### 
############################################################################### 

Launch directory: /usr/local/fah 
Executable: ./fah6 
Arguments: -bigadv -oneunit -verbosity 7 -smp 8 

[07:01:36] - Ask before connecting: No 
[07:01:36] - User name: Bigstan (Team 10) 
[07:01:36] - User ID: 729C5FBF3B4B8B62 
[07:01:36] - Machine ID: 1 
[07:01:36] 
[07:01:36] Loaded queue successfully. 
[07:01:36] 
[07:01:36] + Processing work unit 
[07:01:36] Core required: FahCore_a2.exe 
[07:01:36] - Autosending finished units... [07:01:36] 
[07:01:36] Trying to send all finished work units 
[07:01:36] + No unsent completed units remaining. 
[07:01:36] - Autosend completed 
[07:01:36] Core found. 
[07:01:37] Working on queue slot 09 [April 9 07:01:37 UTC] 
[07:01:37] + Working ... 
[07:01:38] 
[07:01:38] *------------------------------* 
[07:01:38] Folding@Home Gromacs SMP Core 
[07:01:38] Version 2.10 (Sun Aug 30 03:43:28 CEST 2009) 
[07:01:38] 
[07:01:38] Preparing to commence simulation 
[07:01:38] - Ensuring status. Please wait- Files status OK 
[07:01:47] - Expanded 30235850 -> 159270593 (decompressed 100.6 percent) 
[07:01:47] 
[07:01:47] - Files status OK 
[07:01:59] teArray: compressed_data_size=30235850 data_size=159270593, decompressed_data_size=159270593 diff=0 
[07:02:00] - Digital signature verified 
[07:02:00] 
[07:02:00] Project: 2683 (Run 8, Clone 6, Gen 50) 
[07:02:00] 
[07:02:17] Assembly optimizations on if available. 
[07:02:17] Entering M.D. 
[07:02:21] (Run 8, Clone 6, Gen 50) 
[07:02:21] 
[07:02:21] Entering M.D. 
[07:02:27] Using Gromacs checkpoints 
[07:02:50] Resuming from checkpoint 
[07:02:53] 
[07:02:53] Folding@home Core Shutdown: INTERRUPTED 
[07:02:53] e=20 
[07:02:53] fcCheckPointResume: failure in call to fcSaveRestoreState() to restore cpt hash. 
[07:03:01] CoreStatus = FF (255) 
[07:03:01] Sending work to server 
[07:03:01] Project: 2683 (Run 8, Clone 6, Gen 50) 
[07:03:01] - Error: Could not get length of results file work/wuresults_09.dat 
[07:03:01] - Error: Could not read unit 09 file. Removing from queue. 
[07:03:01] Trying to send all finished work units 
[07:03:01] + No unsent completed units remaining. 
[07:03:01] + -oneunit flag given and have now finished a unit. Exiting.- Preparing to get new work unit... 
[07:03:01] Cleaning up work directory 
[07:03:01] + Attempting to get work packet 
[07:03:01] Passkey found 
[07:03:01] ***** Got a SIGTERM signal (15) 

Folding@Home Client Shutdown.

That'll teach me. In future, if I intend to restart the client for whatever reason, I'll make a copy of the directory first so I can hopefully resume from where I left off in case it all goes breasts up.
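For anyone wanting to do the same, here is a rough sketch of the "copy the directory first" idea in Python. The function name `backup_client_dir` and the backup location are made up for illustration; only `/usr/local/fah` (the launch directory) comes from the log above. It assumes the client has already been stopped, so the checkpoint files are not being written mid-copy, and there is no guarantee that restoring the copy will actually let the core resume.

```python
import shutil
import time
from pathlib import Path


def backup_client_dir(client_dir, backup_root):
    """Copy the whole client directory into a timestamped folder; return the new path.

    Hypothetical helper: stop the client before calling it.
    """
    src = Path(client_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"client directory not found: {src}")

    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"

    # Refuse to back up into the directory being copied, which would recurse.
    if src.resolve() in dest.resolve().parents:
        raise ValueError("backup_root must be outside the client directory")

    # copytree fails if dest already exists, so an older backup is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

Calling `backup_client_dir("/usr/local/fah", "/path/to/backups")` before the restart would leave a full copy (queue file, work folder and all) to fall back on. Bigadv work folders are large, so check the free disk space first.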

Wouldn't have been so bad if it was near the start but it's really galling to lose the WU after 43 hours of work :(

Bit sad, but that's the risk with beta clients I suppose (I'm trying to be stoic and brave about it when all I really want to do is sit in the corner and blub like a wee lassie :p).
 
I used to get that SaveRestoreState error every now and then with WUs on the a2 core. Think it's an obscure bug in the core :( Stanford really need to get all SMP stuff moved over to a3, pronto.
 