Ubuntu SMP problem :(

I've set up a second VMware Ubuntu machine to use all 4 cores and that's crunching fine, but now my old VM won't crunch :( I've tried reinstalling manually, and now I've done a fresh install, and I still get the same error.
Help!
[12:09:54]
[12:09:54] *------------------------------*
[12:09:54] Folding@Home Gromacs SMP Core
[12:09:54] Version 1.73 (November 27, 2006)
[12:09:54]
[12:09:54] Preparing to commence simulation
[12:09:54] - Ensuring status. Please wait.
[12:09:54] - Starting from initial work packet
[12:09:55]
[12:09:55] Project: 2653 (Run 4, Clone 22, Gen 6)
[12:09:55]
[12:09:55] Assembly optimizations on if available.
[12:09:55] Entering M.D.
[12:10:13] .3 percent)
[12:10:14] - Starting from initial work packet
[12:10:14]
[12:10:14] Project: 2653 (Run 4, Clone 22, Gen 6)
[12:10:14]
[12:10:14] Entering M.D.
NNODES=4, MYRANK=1, HOSTNAME=ubuntu
NNODES=4, MYRANK=3, HOSTNAME=ubuntu
NNODES=4, MYRANK=0, HOSTNAME=ubuntu
NNODES=4, MYRANK=2, HOSTNAME=ubuntu
[cli_3]: aborting job:
Fatal error in MPI_Wait: Error message texts are not available
[cli_0]: aborting job:
Fatal error in MPI_Wait: Error message texts are not available
[12:10:23] Finalizing output
[cli_2]: aborting job:
Fatal error in MPI_Wait: Error message texts are not available
 
I can't see it being an OC problem; the other VM is running fine.
I have two VMs running on a bridged network: one called ubuntu on ***.***.1.7 and one called ubuntu2 on ***.***.1.8. Both connect to the web OK and both do other tasks fine, but the original one won't crunch :confused: even with a fresh install.
I might try loading up another VM and binning the first if I can't get it to work again :(
The OC is very small ATM, I'm just running it in a bit :) 300 FSB, up from 266.
 
I've copied the folder from a working client to the dead one, now I get this.
[15:21:36] Trying to unzip core FahCore_a1.exe
[15:21:37] Decompressed FahCore_a1.exe (3624144 bytes) successfully
[15:21:37] + Core successfully engaged
[15:21:42]
[15:21:42] + Processing work unit
[15:21:42] Core required: FahCore_a1.exe
[15:21:42] Core found.
[15:21:42] Working on Unit 01 [August 21 15:21:42]
[15:21:42] + Working ...
[15:21:42] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 01 -checkpoint 5 -verbose -lifeline 10376 -version 591'

sh: ./mpiexec: not found
[15:21:42] CoreStatus = 7F (127)
[15:21:42] Client-core communications error: ERROR 0x7f
[15:21:42] Deleting current work unit & continuing...
sh: ./mpiexec: not found
[15:21:42] - Warning: Could not delete all work unit files (1): Core returned invalid code
[15:21:42] Trying to send all finished work units
[15:21:42] + No unsent completed units remaining.
[15:21:42] - Preparing to get new work unit...
[15:21:42] + Attempting to get work packet
What is MPI and how do I fix it?
I wish I knew what I was doing with Linux rather than blindly following guides :o
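
For what it's worth, MPI (Message Passing Interface) is what the SMP core uses to run as several cooperating processes, which is the `./mpiexec -np 4` call in the log. The `sh: ./mpiexec: not found` line means the shell could not run `mpiexec` from the client's folder. One possibility after copying a folder across is that the file is missing or has lost its execute permission. Below is a rough sketch for checking that, assuming Python is available on the VM; `check_launcher` and its status strings are made-up names for illustration, not part of the Folding@Home client.

```python
import os


def check_launcher(client_dir, name="mpiexec"):
    """Report whether client_dir/name exists and is executable.

    Hypothetical helper: the function name and the returned strings are
    illustrative only and do not come from the Folding@Home client.
    """
    path = os.path.join(client_dir, name)
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # Run this from the folder the client is started in.
    print(check_launcher("."))
```

If it reports "not executable", `chmod +x mpiexec` in that folder is the usual remedy. If it reports "ok" and the error persists, the cause lies elsewhere and this check won't find it.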
 
Cheers for that, sculptor, the second one had the answer :) Well, almost, but it showed up the problem!
Somehow (no idea how/why) the network address for ubuntu came up as ***.***.1.8 instead of the 1.7 it started out as.
In system/administration/network/connections the address came up as 1.7, but in the hosts tab it was 1.8, the address of the other VM :confused:
I changed it in hosts and all's well :D
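
In case anyone hits the same thing, here is a minimal sketch of the check I effectively did by hand: look through the hosts entries for the VM's hostname and flag any address that isn't the VM's own. It assumes the usual hosts-file layout (address first, then names, `#` for comments). The 192.0.2.x addresses are placeholders standing in for the masked ones above, and the function names are made up for illustration.

```python
def addresses_for(hosts_text, hostname):
    """Return every address that the hosts-file text maps to hostname."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if hostname in names:
            found.append(address)
    return found


def stale_entries(hosts_text, hostname, actual_address):
    """Addresses listed for hostname that are neither loopback nor the real one."""
    return [
        a for a in addresses_for(hosts_text, hostname)
        if a != actual_address and not a.startswith("127.") and a != "::1"
    ]


# Placeholder data: 192.0.2.7 stands in for this VM, 192.0.2.8 for the other.
SAMPLE = """\
127.0.0.1   localhost
192.0.2.8   ubuntu   # wrong: this is the other VM's address
"""

print(stale_entries(SAMPLE, "ubuntu", "192.0.2.7"))  # prints ['192.0.2.8']
```

On a real machine you would pass in the contents of `/etc/hosts` and the address shown in the network settings. A non-empty result is the kind of mismatch I had.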
 