Any VM folders updated their A2 core?

Hi,

About a week ago, Stanford posted on the official forum that a new client was available for Linux that makes use of a newer A2 core, with some improvements to stop WUs being generated with the wrong number of steps. I downloaded it and have pretty much been unable to complete an A2 WU since, with one or two exceptions... I don't leave the client running 24/7, but when stopping and resuming the A2 units I get checkpoint resume errors, and the client throws a wobbly and doesn't know what to do. I have reported this on the official forum, along with others, but Dr Kasson has replied saying they are finding it difficult to replicate the error. Has anyone here had this problem and would be willing to help out? The thread is here for more info:

http://foldingforum.org/viewtopic.php?f=44&t=8356

It's quite frustrating really and has cost me quite a lot of points - until now the Linux VMs have been rock solid in their stability, and this is the first problem I have really encountered with them. Anyone who is contemplating the core upgrade, think twice or be prepared for some (potential) trouble...
 
Strange! I have reinstalled the clients completely (not the VM) and one picked up an A2, and so far it has not had the checkpointing error. Unfortunately I don't get through too many SMP WUs so it's difficult to identify the cause, and the servers keep throwing A1s my way!

What version of Ubuntu are you using, Mattus? That actually looks a lot like an error I was getting with some code for one of our uni projects on an Ubuntu 8.10 VM. Switching to 8.04 resulted in no problems with identical code.

EDIT: Does ubuntuServer6 indicate you are using Ubuntu 6 by any chance?
 
Interesting - as I mentioned in the thread, I made the mistake of letting VMware set up most of the default options, so my images are around 8 GB each. Still, at least I know it's not just me! I'm not sure it's the case for every WU; as I mentioned, I have completed a few A2 units, but I would say about half have failed to resume from checkpoints. It's interesting you mention that if you try to start them lots of times they start eventually, and I'll give that a go if I ever see an A2 unit again!
 
I'll post the logs for you if you like...? It's also very interesting to know that it is not just a VM-related issue if you are running a native client.
 
It just seems a pain to lose a day's work so easily.

I uploaded the log - Dr Kasson has said that they have managed to isolate the bug, but that a fix may take a little while to surface. I agree fully, it's a real pain. I don't get a lot of points from the SMP client, but it has already bricked 5 units for me, all at above 50% completion. I have started to close the clients after they have only completed a few % of each A2 WU - if they don't fail then, they don't seem to fail later on, but it is a little inconvenient at times.
 
I just posted on the folding forum regarding this issue, which has sort of surfaced again for me now that there are more A2 units out and about. Dr Kasson indicated that they have a fix, and that it could be rolled out early (although no indication of a timescale was given). However, I did get some particularly useful advice from another member, dnamechanic (thanks!).

This only applies to VMware users unfortunately, but instead of using ctrl+c to close the client, use VMware to take a snapshot of the guest operating system. This just dumps the contents of the RAM to disk, and then the client can be shut down however you like. Instead of restarting the VM and restarting the client, you simply restore from the snapshot, and it carries on from the time of the snapshot. Apparently this can take some time on a low-RAM machine, but on my system (4 GB of RAM) it's pretty damn quick. It's also much quicker to resume the VM from the snapshot than it is to manually boot up the VM each time. Not sure if this is news to people or indeed if others have always been doing this (I think from now on I will!), but it's certainly a workaround for the problem. The only catch is that it puts the VM's clock out of sync with the host, meaning it's necessary to either correct it by hand or set the VM to sync clocks automatically to avoid FahMon problems.
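
For anyone who would rather script the snapshot/restore routine than click through the VMware GUI, here is a rough sketch in Python. It assumes you have VMware's `vmrun` command-line tool on your PATH (it ships with Workstation and Fusion, and I have not checked every VMware product), and the .vmx path and snapshot name are made up for illustration. By default it only prints the commands; pass `dry_run=False` to actually run them.

```python
import subprocess

VMRUN = "vmrun"      # assumption: vmrun is installed and on PATH
HOST_TYPE = "ws"     # assumption: VMware Workstation ("fusion" on a Mac)


def snapshot_cmd(vmx_path, name):
    """Build the command that snapshots the guest while the client is running."""
    return [VMRUN, "-T", HOST_TYPE, "snapshot", vmx_path, name]


def revert_cmd(vmx_path, name):
    """Build the command that rolls the guest back to the named snapshot."""
    return [VMRUN, "-T", HOST_TYPE, "revertToSnapshot", vmx_path, name]


def start_cmd(vmx_path):
    """Build the command that powers on / resumes the guest after a revert."""
    return [VMRUN, "-T", HOST_TYPE, "start", vmx_path]


def run(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Hypothetical path and snapshot name, change these for your own setup.
vmx = "/vms/folding/folding.vmx"

# Before shutting down: snapshot instead of ctrl+c in the client.
run(snapshot_cmd(vmx, "fah-paused"))

# Next session: restore the snapshot, then start the guest again.
run(revert_cmd(vmx, "fah-paused"))
run(start_cmd(vmx))
```

As far as I understand it, a VM reverted this way may need the explicit `start` afterwards to carry on running, which is why that step is in there. The guest clock will still be behind after the restore, so the clock sync point above still applies.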

Hope this helps.
 