Ubuntu SMP problem :(

I've set up a second VMware Ubuntu machine to use all 4 cores and that's crunching fine, but now my old VM won't crunch :( I've tried reinstalling manually and have now done a fresh install, and I get the same error.
Help!
[12:09:54]
[12:09:54] *------------------------------*
[12:09:54] Folding@Home Gromacs SMP Core
[12:09:54] Version 1.73 (November 27, 2006)
[12:09:54]
[12:09:54] Preparing to commence simulation
[12:09:54] - Ensuring status. Please wait.
[12:09:54] - Starting from initial work packet
[12:09:55]
[12:09:55] Project: 2653 (Run 4, Clone 22, Gen 6)
[12:09:55]
[12:09:55] Assembly optimizations on if available.
[12:09:55] Entering M.D.
[12:10:13] .3 percent)
[12:10:14] - Starting from initial work packet
[12:10:14]
[12:10:14] Project: 2653 (Run 4, Clone 22, Gen 6)
[12:10:14]
[12:10:14] Entering M.D.
NNODES=4, MYRANK=1, HOSTNAME=ubuntu
NNODES=4, MYRANK=3, HOSTNAME=ubuntu
NNODES=4, MYRANK=0, HOSTNAME=ubuntu
NNODES=4, MYRANK=2, HOSTNAME=ubuntu
[cli_3]: aborting job:
Fatal error in MPI_Wait: Error message texts are not available
[cli_0]: aborting job:
Fatal error in MPI_Wait: Error message texts are not available
[12:10:23] Finalizing output
[cli_2]: aborting job:
Fatal error in MPI_Wait: Error message texts are not available
 
I've just had 2 PCs with the same problem on native Linux; one was a 2652 and the other was a 2653. I do know these WUs are about the toughest as far as having your machine and software set up correctly.

My PC that borked on the 2653 started its second one and is running fine now, so it may be some dodgy WUs. I can't get onto the Folding-community forum to see if they have any info there.

As long as your PC is stable, just let it try again.
 
Not sure about the SMP problem but I can help with the FCF forums issue:

It appears that the database driving the forum software is having a few problems with its tracking of sessions.

If you are currently browsing as a guest, be that as a new user, or a registered one, it is quite likely that you cannot login at present.

Before you login, please read the following so you can restore your ability to browse the forum if your login fails.

If you do try to log in, you will most likely be presented with a blank page, and any subsequent visits to pages on this forum will cause the same thing to happen.
If this happens to you, you need to delete the session ID cookie for forum.folding-community.org (the cookie is named phpbb2fah_sid), then restart your browser to make sure.

As long as you remain as a guest user, you will still be able to browse the forums, and you can track the progress of this problem in this thread: http://forum.folding-community.org/viewtopic.php?t=20820

Users who have set their profile to "Remember me" should not be affected by this problem and can post as normal.

Edit: It appears that the above statement is not entirely accurate. Many people have not been able to login even if they were set to have a permanent cookie, and others can login even if they don't have this setting enabled. So far there doesn't seem to be any discernible pattern to determine who can still login. However, if you can login once, you will probably still be able to do so with subsequent attempts.

For those who find they cannot login, there is a discussion going on here: http://www.dslreports.com/forum/r18799237-forumfoldingcommunityorg in which several FCF mods are also taking part.

It might be useful if members who can see this post repost it in their home forums, just to let people know what's going on.

Updated: 2007/08/10

Sorry that's a bit of a mess - couldn't quote it as it's a locked thread - the bit in Italics has been struck through


edit: From having a look around there are two suggestions

1. Hardware Instability

2. Networking Issue

If it's the second then maybe it's something to do with how you've set the VM to act with regards to network connection? :confused:
 
I can't see it as an OC problem, the other VM is running fine.
I have two VMs running on bridged networking: one called ubuntu on ***.***.1.7 and ubuntu2 on ***.***.1.8. Both connect to the web OK and both do other tasks fine, but the original one won't crunch :confused: even with a fresh install.
I might try loading up another VM and binning the first if I can't get it to work again :(
The OC is very small ATM, just running it in a bit :) 300 FSB, up from 266.
 
I've copied the folder from a working client to the dead one, now I get this.
[15:21:36] Trying to unzip core FahCore_a1.exe
[15:21:37] Decompressed FahCore_a1.exe (3624144 bytes) successfully
[15:21:37] + Core successfully engaged
[15:21:42]
[15:21:42] + Processing work unit
[15:21:42] Core required: FahCore_a1.exe
[15:21:42] Core found.
[15:21:42] Working on Unit 01 [August 21 15:21:42]
[15:21:42] + Working ...
[15:21:42] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 01 -checkpoint 5 -verbose -lifeline 10376 -version 591'

sh: ./mpiexec: not found
[15:21:42] CoreStatus = 7F (127)
[15:21:42] Client-core communications error: ERROR 0x7f
[15:21:42] Deleting current work unit & continuing...
sh: ./mpiexec: not found
[15:21:42] - Warning: Could not delete all work unit files (1): Core returned invalid code
[15:21:42] Trying to send all finished work units
[15:21:42] + No unsent completed units remaining.
[15:21:42] - Preparing to get new work unit...
[15:21:42] + Attempting to get work packet
What is MPI and how do I fix it?
I wish I knew what I was doing with linux rather than blindly following guides:o
 
diogenese said:
What is MPI and how do I fix it?
I wish I knew what I was doing with linux rather than blindly following guides:o

I remember seeing a thread on the Folding-community forum about peeps having problems with MPI. As Rich says, I think it has a lot to do with networking, and I think the solution was to change certain settings.
If I can manage to get onto the forum tonight I'll hunt down the thread.
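
In the meantime, a note on that log: 0x7f is 127, which is the exit status a shell normally returns when it cannot find the command, and that fits the "sh: ./mpiexec: not found" line. So the client is probably looking for mpiexec in its own folder and not finding a runnable copy. Here is a minimal Python sketch of that check, assuming you run it from the client folder. The function name check_launcher is made up for illustration and is not part of the Folding@Home client.

```python
import os


def check_launcher(client_dir, name="mpiexec"):
    """Report whether client_dir/name exists and is marked executable.

    Returns one of: "missing", "not executable", "ok".
    (Hypothetical helper, not part of the Folding@Home client.)
    """
    path = os.path.join(client_dir, name)
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # The log reports CoreStatus = 7F (127); 0x7f and 127 are the same number.
    print("0x7f ==", 0x7F)
    # Assumes the current directory is the client folder.
    print("mpiexec:", check_launcher("."))
```

If it reports "ok" and the client still says "not found", the file itself may not be runnable on that system, which this sketch does not test for.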
 
Cheers for that sculptor, the second one had the answer :) Well, almost, but it showed up the problem!
Somehow (no idea how/why) the network address for ubuntu came up as ***.***.1.8 instead of the 1.7 it started out as.
In system/administration/network/connections the address came up as 1.7, but in the hosts tab it was 1.8, the address of the other VM :confused:
I changed it in hosts and all's well :D
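
For anyone who hits the same thing, the hosts check can be sketched in a few lines of Python. This is only a rough illustration: the addresses 192.0.2.7 and 192.0.2.8 are placeholders standing in for the masked ones above, and the function names are made up.

```python
def hosts_entries(hosts_text, hostname):
    """Return every IP address that a hosts-file text maps to hostname."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if hostname in names:
            found.append(ip)
    return found


def mismatched(hosts_text, hostname, expected_ip):
    """Return non-loopback IPs listed for hostname that differ from expected_ip."""
    return [
        ip
        for ip in hosts_entries(hosts_text, hostname)
        if ip != expected_ip and not ip.startswith("127.") and ip != "::1"
    ]


# Placeholder hosts content: the name "ubuntu" points at the other VM's address.
SAMPLE = """\
127.0.0.1   localhost
192.0.2.8   ubuntu
"""

print(mismatched(SAMPLE, "ubuntu", "192.0.2.7"))  # prints ['192.0.2.8']
```

On a real machine you would read the text from /etc/hosts and pass in the VM's own hostname and the address shown for its network connection. An empty list means the hosts entries agree with that address.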
 