SMP not playing nicely

Soldato
Joined
31 May 2006
Posts
7,564
Location
West London
Stable SMP client on my office workstation:

Code:
[08:43:52] + Attempting to send results
[08:43:52] - Reading file work/wuresults_06.dat from core
[08:43:52]   (Read 5539753 bytes from disk)
[08:48:35] - Uploaded at ~19 kB/s
[08:48:35] - Averaged speed for that direction ~25 kB/s
[08:48:35] + Results successfully sent
[08:48:35] Thank you for your contribution to Folding@Home.
[08:48:35] + Number of Units Completed: 151

The looping batch file gets me to here, but it can't connect, so I Ctrl+C the window:

Code:
# SMP Client ##################################################################
###############################################################################

                       Folding@Home Client Version 5.92beta

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Folding@home\WinSMP
Executable: C:\Folding@home\WinSMP\fah.exe
Arguments: -oneunit -verbosity 9 -local 

[08:50:40] - Ask before connecting: No
[08:50:40] - Use IE connection settings: Yes
[08:50:40] - User name: lemonman (Team 10)
[08:50:40] - User ID: 1C4657B37FC384AA
[08:50:40] - Machine ID: 1
[08:50:40] 
[08:50:40] Loaded queue successfully.
[08:50:40] - Preparing to get new work unit...
[08:50:40] - Autosending finished units...
[08:50:40] + Attempting to get work packet
[08:50:40] Trying to send all finished work units
[08:50:40] - Will indicate memory of 2046 MB
[08:50:40] + No unsent completed units remaining.
[08:50:40] - Autosend completed
[08:50:40] - Connecting to assignment server
[08:50:41] - Successful: assigned to (171.64.65.64).
[08:50:41] + News From Folding@Home: Welcome to Folding@Home
[08:50:41] Loaded queue successfully.
[08:50:45] - Receiving payload (expected size: 2433175)
[08:51:46] - Downloaded at ~38 kB/s
[08:51:46] - Averaged speed for that direction ~92 kB/s
[08:51:46] + Received work.
[08:51:46] + Closed connections
[08:51:46] 
[08:51:46] + Processing work unit
[08:51:46] Core required: FahCore_a1.exe
[08:51:46] Core found.
[08:51:46] Working on Unit 07 [May 19 08:51:46]
[08:51:46] + Working ...
[08:51:46] - Calling 'mpiexec -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -np 4 FahCore_a1.exe -dir work/ -suffix 07 -checkpoint 10 -verbose -lifeline 2448 -version 592'

[08:51:47] 
[08:51:47] *------------------------------*
[08:51:47] Folding@Home Gromacs SMP Core
[08:51:47] Version 1.76 (February 23, 2008)
[08:51:47] 
[08:51:47] Preparing to commence simulation
[08:51:47] - Looking at optimizations...
[08:51:47] - Created dyn
[08:51:47] - Files status OK
[08:51:47]  this execution.
[08:51:47] - Files status OK
[08:51:47] les status OK
[08:51:58] 2663 -> 12862801 (decompressed 528.7 percent)
[08:51:58] 28.7 percent)
[08:51:58] 3 (Run 19, Clone 107, Gen 46)
[08:51:58] 
[08:51:58] :  check for stray files
[08:51:58] - Starting from initial work packet
[08:51:58] 
[08:51:58] Project: 2653 (Run 19, Clone 107, Gen 46)
[08:51:58] 
[08:51:59] Entering M.D.
[08:52:07] Protein: Protein in POPC
[08:52:07] Writing local files
[08:52:09] Extra SSE boost OK.
[08:52:10] e.
[08:52:10] logfile size:Gromacs cannot continue further.
[08:52:10] Going to send back  ... Done.
[08:52:10] - Failed to delete work/wudata_07.arc
[08:52:10] No C.P. to delete.
[08:52:10] Warning:  check for stray files
[08:52:10] .P. to delete.
[08:52:10] Warning:  check for stray files
[08:52:10] .xtc
[08:52:10] No C.P. to delete.
[08:52:10] Warning:  check for stray files
[08:52:10] 
[08:52:10] Folding@home Core Shutdown: EARLY_UNIT_END
[08:52:10] Finalizing output
[08:54:14] CoreStatus = 63 (99)
[08:54:14] + Error starting Folding@Home core or unexpected system termination of core.
[08:54:19] 
[08:54:19] + Processing work unit
[08:54:19] Core required: FahCore_a1.exe
[08:54:19] Core found.
[08:54:19] Working on Unit 07 [May 19 08:54:19]
[08:54:19] + Working ...
[08:54:19] - Calling 'mpiexec -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -np 4 FahCore_a1.exe -dir work/ -suffix 07 -checkpoint 10 -verbose -lifeline 2448 -version 592'

[08:54:21] 
[08:54:21] *------------------------------*
[08:54:21] Folding@Home Gromacs SMP Core
[08:54:21] Version 1.76 (February 23, 2008)
[08:54:21] 
[08:54:21] Preparing to commence simulation
[08:54:21] - Ensuring status. Please wait.
[08:54:38] - Looking at optimizations...
[08:54:38] - Working with standard loops on this execution.
[08:54:38] - Previous termination of core was improper.
[08:54:38] - Going to use standard loops.
[08:54:38] - Files status OK
[08:56:38] 
[08:56:38] Folding@home Core Shutdown: MISSING_WORK_FILES
[08:56:38] Finalizing output
[08:56:42] CoreStatus = 1 (1)
[08:56:42] Client-core communications error: ERROR 0x1
[08:56:42] Deleting current work unit & continuing...
[08:59:04] - Warning: Could not delete all work unit files (7): Core returned invalid code
[08:59:04] Trying to send all finished work units
[08:59:04] + No unsent completed units remaining.
[08:59:04] - Preparing to get new work unit...
[08:59:04] + Attempting to get work packet
[08:59:04] - Will indicate memory of 2046 MB
[08:59:04] - Connecting to assignment server
[08:59:25] Error: Got status code 504 from server
[08:59:25] + Could not connect to Assignment Server
[08:59:46] Error: Got status code 504 from server
[08:59:46] + Could not connect to Assignment Server 2
[08:59:46] + Couldn't get work instructions.
[08:59:46] - Error: Attempt #1  to get work failed, and no other work to do.
             Waiting before retry.
[09:00:05] + Attempting to get work packet
[09:00:05] - Will indicate memory of 2046 MB
[09:00:05] - Connecting to assignment server
[09:05:05] Couldn't send HTTP request to server (wininet)
[09:05:05] + Could not connect to Assignment Server
[09:10:05] Couldn't send HTTP request to server (wininet)
[09:10:05] + Could not connect to Assignment Server 2
[09:10:05] + Couldn't get work instructions.
[09:10:05] - Error: Attempt #2  to get work failed, and no other work to do.
             Waiting before retry.
[09:10:16] + Attempting to get work packet
[09:10:16] - Will indicate memory of 2046 MB
[09:10:16] - Connecting to assignment server
[09:15:16] Couldn't send HTTP request to server (wininet)
[09:15:16] + Could not connect to Assignment Server
[09:20:16] Couldn't send HTTP request to server (wininet)
[09:20:16] + Could not connect to Assignment Server 2
[09:20:16] + Couldn't get work instructions.
[09:20:16] - Error: Attempt #3  to get work failed, and no other work to do.
             Waiting before retry.
[09:20:47] + Attempting to get work packet
[09:20:47] - Will indicate memory of 2046 MB
[09:20:47] - Connecting to assignment server
[09:25:47] Couldn't send HTTP request to server (wininet)
[09:25:47] + Could not connect to Assignment Server
[09:30:47] Couldn't send HTTP request to server (wininet)
[09:30:47] + Could not connect to Assignment Server 2
[09:30:47] + Couldn't get work instructions.
[09:30:47] - Error: Attempt #4  to get work failed, and no other work to do.
             Waiting before retry.
[09:31:30] + Attempting to get work packet
[09:31:30] - Will indicate memory of 2046 MB
[09:31:30] - Connecting to assignment server
[09:36:30] Couldn't send HTTP request to server (wininet)
[09:36:30] + Could not connect to Assignment Server
[09:39:47] Killing all core threads

Folding@Home Client Shutdown at user request.
[09:39:47] ***** Got a SIGTERM signal (2)
[09:39:47] Killing all core threads

Folding@Home Client Shutdown.

Then I get a whole lot of broken new core downloads:


Code:
# SMP Client ##################################################################
###############################################################################

                       Folding@Home Client Version 5.92beta

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: C:\Folding@home\WinSMP
Executable: C:\Folding@home\WinSMP\fah.exe
Arguments: -oneunit -verbosity 9 -local 

[10:20:25] - Ask before connecting: No
[10:20:25] - Use IE connection settings: Yes
[10:20:25] - User name: lemonman (Team 10)
[10:20:25] - User ID: 1C4657B37FC384AA
[10:20:25] - Machine ID: 1
[10:20:25] 
[10:20:25] Loaded queue successfully.
[10:20:25] - Preparing to get new work unit...
[10:20:25] - Autosending finished units...
[10:20:25] + Attempting to get work packet
[10:20:25] Trying to send all finished work units
[10:20:25] - Will indicate memory of 2046 MB
[10:20:25] + No unsent completed units remaining.
[10:20:25] - Autosend completed
[10:20:25] - Connecting to assignment server
[10:20:26] - Successful: assigned to (171.64.65.64).
[10:20:26] + News From Folding@Home: Welcome to Folding@Home
[10:20:26] Loaded queue successfully.
[10:20:30] - Receiving payload (expected size: 2433175)
[10:20:52] - Downloaded at ~108 kB/s
[10:20:52] - Averaged speed for that direction ~95 kB/s
[10:20:52] + Received work.
[10:20:52] + Closed connections
[10:20:52] 
[10:20:52] + Processing work unit
[10:20:52] Core required: FahCore_a1.exe
[10:20:52] Core found.
[10:20:52] Working on Unit 08 [May 19 10:20:52]
[10:20:52] + Working ...
[10:20:52] - Calling 'mpiexec -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -np 4 FahCore_a1.exe -dir work/ -suffix 08 -checkpoint 10 -verbose -lifeline 3060 -version 592'

[10:20:53] 
[10:20:53] *------------------------------*
[10:20:53] Folding@Home Gromacs SMP Core
[10:20:53] Version 1.76 (February 23, 2008)
[10:20:53] 
[10:20:53] Preparing to commence simulation
[10:20:53] - Ensuring status. Please wait.
[10:21:10] - Looking at optimizations...
[10:21:10] - Working with standard loops on this execution.
[10:21:10] - Created dyn
[10:21:10] - Files status OK
[10:21:20] - Expanded 2432663 -> 12862801 (decompressed 528.7 percent)
[10:21:21] - Starting from initial work packet
[10:21:21] 
[10:21:21] Project: 2653 (Run 19, Clone 107, Gen 46)
[10:21:21] 
[10:21:21] 46)
[10:21:21] 
[10:21:22] ing M.D.
[10:21:22] M.D.
[10:21:29] Rejecting checkpoint
[10:21:30] riting local files
[10:21:30]  POPC
[10:21:30] Writing local files
[10:21:32] Extra SSE boost OK.
[10:21:33] rther.
[10:21:38] Going to send back what have done.
[10:21:38] logfile size: 8292
[10:21:38] - Writing 8828 bytes of core data to disk...
[10:21:38]   ... Done.
[10:21:38] - Failed to delete work/wudata_08.arc
[10:21:38] - Failed to delete work/wudata_08.xtc
[10:21:38] ..
[10:21:38]   ... Done.
[10:21:38] te.
[10:21:38] Warning:  check for stray files
[10:21:38] 
[10:21:38] Folding@home Core Shutdown: EARLY_UNIT_END
[10:21:38] Finalizing output
[10:23:36] CoreStatus = 63 (99)
[10:23:36] + Error starting Folding@Home core or unexpected system termination of core.
[10:23:41] 
[10:23:41] + Processing work unit
[10:23:41] Core required: FahCore_a1.exe
[10:23:41] Core found.
[10:23:41] Working on Unit 08 [May 19 10:23:41]
[10:23:41] + Working ...
[10:23:41] - Calling 'mpiexec -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -np 4 FahCore_a1.exe -dir work/ -suffix 08 -checkpoint 10 -verbose -lifeline 3060 -version 592'

[10:23:43] 
[10:23:43] *------------------------------*
[10:23:43] Folding@Home Gromacs SMP Core
[10:23:43] Version 1.76 (February 23, 2008)
[10:23:43] 
[10:23:43] Preparing to commence simulation
[10:23:43] - Ensuring status. Please wait.
[10:23:43] - Working with standard loops on this execution.
[10:24:00] - Previous termination of core was improper.
[10:24:00] - Going to use standard loops.
[10:24:00] - Files status OK
[10:26:00] 
[10:26:00] Folding@home Core Shutdown: MISSING_WORK_FILES
[10:26:00] Finalizing output
[10:26:04] CoreStatus = 1 (1)
[10:26:04] Client-core communications error: ERROR 0x1
[10:26:04] Deleting current work unit & continuing...
[10:28:26] - Warning: Could not delete all work unit files (8): Core returned invalid code
[10:28:26] Trying to send all finished work units
[10:28:26] + No unsent completed units remaining.
[10:28:26] - Preparing to get new work unit...
[10:28:26] + Attempting to get work packet
[10:28:26] - Will indicate memory of 2046 MB
[10:28:26] - Connecting to assignment server
[10:28:26] - Successful: assigned to (171.64.65.64).
[10:28:26] + News From Folding@Home: Welcome to Folding@Home
[10:28:27] Loaded queue successfully.
[10:28:30] - Receiving payload (expected size: 2433175)
[10:28:54] - Downloaded at ~99 kB/s
[10:28:54] - Averaged speed for that direction ~95 kB/s
[10:28:54] + Received work.
[10:28:54] + Closed connections
[10:28:59] 
[10:28:59] + Processing work unit
[10:28:59] Core required: FahCore_a1.exe
[10:28:59] Core found.
[10:28:59] Working on Unit 09 [May 19 10:28:59]
[10:28:59] + Working ...
[10:28:59] - Calling 'mpiexec -channel shm -env MPICH_USE_SMP_OPTIMIZATIONS 1 -np 4 FahCore_a1.exe -dir work/ -suffix 09 -checkpoint 10 -verbose -lifeline 3060 -version 592'

[10:29:00] 
[10:29:00] *------------------------------*
[10:29:00] Folding@Home Gromacs SMP Core
[10:29:00] Version 1.76 (February 23, 2008)
[10:29:00] 
[10:29:00] Preparing to commence simulation
[10:29:00] - Ensuring status. Please wait.
[10:29:17] - Looking at optimizations...
[10:29:17] - Working with standard loops on this execution.
[10:29:17] - Previous termination of core was improper.
[10:29:17] - Files status OK
[10:29:17] ndard loops.
[10:29:17] - Files status OK
[10:29:28] Starting from initial work packet
[10:29:28] 
[10:29:28] Project: 2653 (Run 19, Cl- Starting from initial work packet
[10:29:28] 
[10:29:28] Project: 2653 (Run 19, Clone 107, Gen 46)
[10:29:28] 
[10:29:28] ect: 2653 (Run 19, Clone 107, Gen 46)
[10:29:28] 
[10:29:29] Entering M.D.
[10:29:36] Rejecting checkpoint
[10:29:37] Protein: Protein in POPC
[10:29:37] Writing local files
[10:29:39] Extra SSE boost OK.
[10:29:40] o disk...
[10:29:40]   ... Done.
[10:29:40] - Failed to delete work/wudata_09.arc
[10:29:40] - Failed to delete work/wudata_09.xtc
[10:29:40] Warning:  check for stray files
[10:29:40] 
[10:29:40] Folding@home Core Shutdown: EARLY_UNIT_END
[10:29:40] Finalizing output
[10:29:40] 9.bed  ... Done.
[10:29:40]  delete work/wudata_09.sas
[10:29:40] - Failed to delete work/wudata_09.goe
[10:29:40] Warning:  check for stray files
[10:29:40] ck for stray files
[10:31:40] 
[10:31:40] Folding@home Core Shutdown: EARLY_UNIT_END
[10:31:40] Finalizing output
[10:31:43] CoreStatus = 63 (99)
[10:31:43] + Error starting Folding@Home core or unexpected system termination of core.
[10:31:43] - Attempting to download new core...
[10:31:43] + Downloading new core: FahCore_a1.exe
[10:31:43] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
[10:31:45] + 10240 bytes downloaded
[10:36:13] + 19478 bytes downloaded
[10:36:13] Verifying core Core_a1.fah...
[10:36:13] Error reading signature from downloaded core file.
[10:36:13] Failed to verify core
[10:36:13] + Error: Could not extract core
[10:36:13] + Core download error (#2), waiting before retry...

[10:36:31] + Downloading new core: FahCore_a1.exe
[10:36:31] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
[10:36:44] + 10240 bytes downloaded
[10:36:44] + 20480 bytes downloaded
[10:37:02] + 30720 bytes downloaded
[10:37:03] + 40960 bytes downloaded
[10:37:05] + 51200 bytes downloaded
[10:37:10] + 61440 bytes downloaded
[10:37:25] + 71680 bytes downloaded
[10:41:31] + 80023 bytes downloaded
[10:41:31] Verifying core Core_a1.fah...
[10:41:31] Error reading signature from downloaded core file.
[10:41:31] Failed to verify core
[10:41:31] + Error: Could not extract core
[10:41:31] + Core download error (#3), waiting before retry...

[10:41:41] + Downloading new core: FahCore_a1.exe
[10:41:41] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
[10:41:41] + 10240 bytes downloaded
[10:41:41] + 20480 bytes downloaded
[10:41:41] + 30720 bytes downloaded
[10:41:41] + 40960 bytes downloaded
[10:41:41] + 51200 bytes downloaded
[10:41:41] + 61440 bytes downloaded
[10:41:41] + 71680 bytes downloaded
[10:41:41] + 80023 bytes downloaded
[10:41:41] Verifying core Core_a1.fah...
[10:41:41] Error reading signature from downloaded core file.
[10:41:41] Failed to verify core
[10:41:41] + Error: Could not extract core
[10:41:41] + Core download error (#4), waiting before retry...

[10:42:08] + Downloading new core: FahCore_a1.exe
[10:42:08] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
[10:42:08] + 10240 bytes downloaded
[10:42:08] + 20480 bytes downloaded
[10:42:08] + 30720 bytes downloaded
[10:42:08] + 40960 bytes downloaded
[10:42:08] + 51200 bytes downloaded
[10:42:08] + 61440 bytes downloaded
[10:42:08] + 71680 bytes downloaded
[10:42:08] + 80023 bytes downloaded
[10:42:08] Verifying core Core_a1.fah...
[10:42:08] Error reading signature from downloaded core file.
[10:42:08] Failed to verify core
[10:42:08] + Error: Could not extract core
[10:42:08] + Core download error (#5), waiting before retry...

[10:42:52] + Downloading new core: FahCore_a1.exe
[10:42:52] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
[10:42:52] + 10240 bytes downloaded
[10:42:52] + 20480 bytes downloaded
[10:42:52] + 30720 bytes downloaded
[10:42:52] + 40960 bytes downloaded
[10:42:52] + 51200 bytes downloaded
[10:42:52] + 61440 bytes downloaded
[10:42:52] + 71680 bytes downloaded
[10:42:52] + 80023 bytes downloaded
[10:42:52] Verifying core Core_a1.fah...
[10:42:52] Error reading signature from downloaded core file.
[10:42:52] Failed to verify core
[10:42:52] + Error: Could not extract core
[10:42:52] + Core download error (#6), waiting before retry...

[10:44:18] + Downloading new core: FahCore_a1.exe
[10:44:18] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
[10:44:18] + 10240 bytes downloaded
[10:44:18] + 20480 bytes downloaded
[10:44:18] + 30720 bytes downloaded
[10:44:18] + 40960 bytes downloaded
[10:44:18] + 51200 bytes downloaded
[10:44:18] + 61440 bytes downloaded
[10:44:18] + 71680 bytes downloaded
[10:44:18] + 80023 bytes downloaded
[10:44:18] Verifying core Core_a1.fah...
[10:44:18] Error reading signature from downloaded core file.
[10:44:18] Failed to verify core
[10:44:18] + Error: Could not extract core
[10:44:18] + Core download error (#7), waiting before retry...

[10:47:01] + Downloading new core: FahCore_a1.exe
[10:47:01] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
[10:47:01] + 10240 bytes downloaded
[10:47:01] + 20480 bytes downloaded
[10:47:01] + 30720 bytes downloaded
[10:47:01] + 40960 bytes downloaded
[10:47:01] + 51200 bytes downloaded
[10:47:01] + 61440 bytes downloaded
[10:47:01] + 71680 bytes downloaded
[10:47:01] + 80023 bytes downloaded
[10:47:01] Verifying core Core_a1.fah...
[10:47:01] Error reading signature from downloaded core file.
[10:47:01] Failed to verify core
[10:47:01] + Error: Could not extract core
[10:47:01] + Core download error (#8), waiting before retry...

[10:52:31] + Downloading new core: FahCore_a1.exe
[10:52:31] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
[10:52:31] + 10240 bytes downloaded
[10:52:31] + 20480 bytes downloaded
[10:52:31] + 30720 bytes downloaded
[10:52:31] + 40960 bytes downloaded
[10:52:31] + 51200 bytes downloaded
[10:52:31] + 61440 bytes downloaded
[10:52:31] + 71680 bytes downloaded
[10:52:31] + 80023 bytes downloaded
[10:52:31] Verifying core Core_a1.fah...
[10:52:31] Error reading signature from downloaded core file.
[10:52:31] Failed to verify core
[10:52:31] + Error: Could not extract core
[10:52:31] + Core download error (#9), waiting before retry...

[11:03:19] + Downloading new core: FahCore_a1.exe
[11:03:19] Downloading core (/~pande/Win32/x86_Deino/Core_a1.fah from www.stanford.edu)
[11:03:19] + 10240 bytes downloaded
[11:03:19] + 20480 bytes downloaded
[11:03:19] + 30720 bytes downloaded
[11:03:19] + 40960 bytes downloaded
[11:03:19] + 51200 bytes downloaded
[11:03:19] + 61440 bytes downloaded
[11:03:19] + 71680 bytes downloaded
[11:03:19] + 80023 bytes downloaded
[11:03:19] Verifying core Core_a1.fah...
[11:03:19] Error reading signature from downloaded core file.
[11:03:19] Failed to verify core
[11:03:19] + Error: Could not extract core
[11:03:19] + Core download error (#10), waiting before retry...

[11:13:54] Killing all core threads

Folding@Home Client Shutdown at user request.
[11:13:54] ***** Got a SIGTERM signal (2)
[11:13:54] Killing all core threads

Folding@Home Client Shutdown.

Nuked the whole folder (after a -send all flag).
Running the 5.91 client without problems.
Very odd, just happy it happened this morning when I could spot it - and not Friday night :)

Was doing a lot of Photoshopping / downloading images this morning, but a client has never done this to me before.

Keep an eye on yours, people.
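
For anyone who would rather not watch the console window, here is a rough sketch of a script that counts the failure lines seen in the logs above. The marker strings are copied from those logs; the labels and the function name `summarize_failures` are made up for illustration, and it assumes you feed it the text of the client log yourself.

```python
import re
from collections import Counter

# Failure markers copied from the log excerpts in this thread.
# The dictionary keys are illustrative labels, not anything the client emits.
MARKERS = {
    "core_crash": re.compile(r"CoreStatus = 63 \(99\)"),
    "missing_work_files": re.compile(r"Core Shutdown: MISSING_WORK_FILES"),
    "core_verify_failed": re.compile(r"Failed to verify core"),
    "assignment_server_unreachable": re.compile(
        r"Could not connect to Assignment Server"
    ),
}


def summarize_failures(log_text):
    """Count how many log lines match each known failure marker."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, pattern in MARKERS.items():
            if pattern.search(line):
                counts[label] += 1
    # Labels with no matching lines are simply absent from the result.
    return dict(counts)
```

Read the log file into a string and pass it in. A non-empty result is a hint that the client is stuck in a loop like the one above, nothing more.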
 
Cheers - will keep eyes peeled.
Since changing my USB wireless dongle to an Edimax PCI card it's been rock solid for the last 4 days or so with WinSMP.
 
Just FYI - I've had an increased number of those since I went to a PCI wireless card on a minimal 64-bit Ubuntu install.

Was thinking I may put a cron job in to ping the net every ten minutes or so until I can figure out what the real root cause of it is. Wasn't aware that WinSMP has similar problems.
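
A minimal sketch of that keep-alive idea, with one swap: it opens a TCP connection instead of sending an ICMP ping, since that needs no special privileges. The function name `is_reachable`, the default port and the timeout are all assumptions for illustration; whether poking the link like this actually cures the dropouts is untested.

```python
import socket


def is_reachable(host, port=80, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    `connect` is injectable so the check can be exercised without a network.
    """
    try:
        conn = connect((host, port), timeout)
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False
    conn.close()
    return True
```

A small wrapper could call `is_reachable("folding.stanford.edu")` (the host from the client banner) and log the result, and a crontab schedule of `*/10 * * * *` would run it every ten minutes.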
 
I think something is broken at Stanford. I'm getting this sort of thing too at the moment. It probably relates to the server outage they had earlier this week :(
 