SMP Problem

  • Thread starter: Cob

Cob

Has anyone else had problems with the latest beta stopping on one core?
So far I've found that it has stopped folding on a single core three times. This morning's log looked like this:

Code:
[00:53:36] Completed 200000 out of 5000000 steps  (4 percent)
[01:01:20] Writing local files
[01:01:20] Completed 250000 out of 5000000 steps  (5 percent)
[01:09:04] Writing local files
[01:09:04] Completed 300000 out of 5000000 steps  (6 percent)
[05:31:37] - Autosending finished units...
[05:31:37] Trying to send all finished work units
[05:31:37] + No unsent completed units remaining.
[05:31:37] - Autosend completed
[11:31:37] - Autosending finished units...
[11:31:37] Trying to send all finished work units
[11:31:37] + No unsent completed units remaining.
[11:31:37] - Autosend completed
[11:33:53]

Then I stopped and restarted it, and it started fine:
Code:
[11:33:53] Folding@home Core Shutdown: INTERRUPTED
[11:33:57] CoreStatus = 66 (102)
[11:33:57] + Shutdown requested by user. Exiting.***** Got a SIGTERM signal (15)
[11:33:57] Killing all core threads

Folding@Home Client Shutdown.
--- Opening Log file [Feburary 5 11:34:18] 


# SMP Client ##################################################################
###############################################################################

                       Folding@Home Client Version 5.91beta3

                          http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /home/davy/foldingathome/CPU2
Executable: /home/davy/foldingathome/CPU2/fah5
Arguments: -verbosity 9 

[11:34:18] - Ask before connecting: No
[11:34:18] - User name: Cob (Team 10)
[11:34:18] - User ID: 16CCE0CC1AF0BFC1
[11:34:18] - Machine ID: 2
[11:34:18] 
[11:34:18] Loaded queue successfully.
[11:34:18] - Autosending finished units...
[11:34:18] Trying to send all finished work units
[11:34:18] + No unsent completed units remaining.
[11:34:18] - Autosend completed
[11:34:18] 
[11:34:18] + Processing work unit
[11:34:18] Core required: FahCore_a1.exe
[11:34:18] Core found.
[11:34:18] Working on Unit 08 [Feburary 5 11:34:18]
[11:34:18] + Working ...
[11:34:18] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 08 -checkpoint 30 -verbose -lifeline 23756 -version 591'

[11:34:18] 
[11:34:18] *------------------------------*
[11:34:18] Folding@Home Gromacs SMP Core
[11:34:18] Version 1.73 (November 27, 2006)
[11:34:18] 
[11:34:18] Preparing to commence simulation
[11:34:18] - Ensuring status. Please wait.
[11:34:18] 
[11:34:18] Project: 3025 (Run 6, Clone 69, Gen 19)
[11:34:18] 
[11:34:18] Assembly optimizations on if available.
[11:34:18] Entering M.D.
[11:34:35] OK
[11:34:35] - Expanded 291729 -> 1507813 (decompressed 516.8 percent)
[11:34:35] 
[11:34:35] Project: 3025 (Run 6, Clone 69, Gen 19)
[11:34:35] 
[11:34:35] Entering M.D.
[11:34:42] mpleted 300000 out of 5000000 steps  (6 pCompleteExtra SSE boExtra SSE boost OK.
[11:34:42] 6 percent)
[11:34:42] Extra SSE boost OK.
[11:42:34]  5000000 steps  (7 percent)
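
A stall like the one in the first log (last step count at 01:09:04, then only autosend messages until 11:33:53) could be caught by comparing the last progress line with the last timestamped line. This is only a minimal sketch: the function name `stalled_for`, the one-hour limit, and the regular expressions are my own assumptions based on the log lines quoted above, not anything the client provides. Lines garbled by interleaved output, like the ones after the restart, will not match the progress pattern.

```python
import re
from datetime import timedelta

# Matches progress lines such as
# "[01:09:04] Completed 300000 out of 5000000 steps  (6 percent)"
PROGRESS = re.compile(r"^\[(\d\d):(\d\d):(\d\d)\] Completed (\d+) out of (\d+) steps")
# Matches any line that starts with a "[HH:MM:SS]" stamp.
STAMP = re.compile(r"^\[(\d\d):(\d\d):(\d\d)\]")


def _seconds(h, m, s):
    """Convert an HH, MM, SS triple (as strings) to seconds since midnight."""
    return int(h) * 3600 + int(m) * 60 + int(s)


def stalled_for(log_text, limit=timedelta(hours=1)):
    """Return (is_stalled, gap), where gap is the time between the last
    progress line and the last timestamped line in the log text."""
    last_progress = None
    last_stamp = None
    for line in log_text.splitlines():
        m = STAMP.match(line)
        if not m:
            continue
        last_stamp = _seconds(*m.groups())
        if PROGRESS.match(line):
            last_progress = last_stamp
    if last_progress is None or last_stamp is None:
        return False, timedelta(0)
    # The stamps carry no date, so assume the gap is under 24 hours.
    gap = timedelta(seconds=(last_stamp - last_progress) % 86400)
    return gap > limit, gap
```

Fed the first log above, this reports a gap of 10 hours 24 minutes 49 seconds, well past the assumed one-hour limit. It could be run from cron against each client's log file so a stuck client is noticed without waiting until morning.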
 
Forgive me for being dense, but how do you know it is only running on one core? The top log doesn't suggest anything :\

SiriusB
 
SiriusB said:
Forgive me for being dense, but how do you know it is only running on one core? The top log doesn't suggest anything :\

SiriusB
Cob runs 2 SMP clients on his machine (being a highly clocked E6600) so I'm assuming the other client was progressing fine.

I've not seen this problem in any of the 16 WUs I've done so far, but I think you've done quite a few more. Is it the same WU doing it, the same client, or just random?
 
rich99million said:
Cob runs 2 SMP clients on his machine (being a highly clocked E6600) so I'm assuming the other client was progressing fine.

I've not seen this problem in any of the 16 WUs I've done so far, but I think you've done quite a few more. Is it the same WU doing it, the same client, or just random?
I thought the SMP client used both cores?
Isn't that the point?
 
VeNT said:
I thought the SMP client used both cores?
Isn't that the point?

One SMP client uses about 80% of both cores.

Two SMP clients use 100% of both cores.

Whilst a single client running over 2 cores will complete a single WU faster, two clients will complete two WUs faster.
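
To put rough numbers on that trade-off, here is a small sketch. The 80% and 100% figures are the ones quoted above. Everything else is an assumption for the sake of the arithmetic: that work done scales linearly with the CPU time actually used, that two clients split the machine evenly, and the function names themselves.

```python
def relative_throughput(utilisation_one, utilisation_two):
    """Total work done with two clients relative to one client,
    assuming work is proportional to CPU time actually used."""
    return utilisation_two / utilisation_one


def per_wu_slowdown(utilisation_one, utilisation_two, clients=2):
    """How many times longer each WU takes when `clients` clients
    share the machine evenly, compared with one client on its own."""
    per_client_share = utilisation_two / clients
    return utilisation_one / per_client_share


print(relative_throughput(0.80, 1.00))
print(per_wu_slowdown(0.80, 1.00))
```

Under those assumptions two clients get through about 1.25 times as much work overall, while each individual WU takes about 1.6 times as long. That fits the point above: one client is quicker for a single WU, two clients are quicker for a pair.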
 
rich99million said:
Cob runs 2 SMP clients on his machine (being a highly clocked E6600) so I'm assuming the other client was progressing fine.

Yeah, the other client was running fine.

rich99million said:
I've not seen this problem in any of the 16 WUs I've done so far, but I think you've done quite a few more. Is it the same WU doing it, the same client, or just random?

I'm not sure tbh. The first two times I noticed it were just after I had got up for work, and the old memory hadn't quite woken up yet :o
 
Cob said:
One SMP client uses about 80% of both cores.

Two SMP clients use 100% of both cores.

Whilst a single client running over 2 cores will complete a single WU faster, two clients will complete two WUs faster.
Is this on VMware or in a native environment?
 