F@H Questions/problems

NightmareXX · 27 Jan 2010 at 10:46

Well, as I said, I've now used up all my BOINC units, so I've switched to F@H to see what sort of output I can give. However, I've a few problems and questions that I can't seem to find the answers to, so I thought I'd ask you lovely people here.

http://nightmarexx.is-a-geek.com/fah.html

Firstly, and perhaps most importantly, my GPU (GTX 285) doesn't seem to be working properly. I'm running the GPU system tray client, and last night it stomped through a unit in about an hour. However, as is visible above, it now seems to be struggling with a unit. Checking the unit by displaying it confirms this: the simulation pauses more often than it's running! Why?

Secondly, should I run 1 or 2 instances of the SMP client for my quads? A recent forum post I read said 1 client using all 4 cores gave more PPD than 2 clients assigned to 2 cores.

Thirdly, do those PPD numbers look right?

Thanks in advance

EDIT: Reinstalled GPU client and it seems to be playing ball now

Biffa · 27 Jan 2010 at 10:55

My recommendations/advice:

First, understand that FAHMon doesn't show the new SMP2 PPD correctly. Most of us have switched to HFM.NET for monitoring.

One SMP instance per quad, especially the new SMP2 client.

Check the GPU hasn't throttled back. It only takes one mistake, such as accidentally connecting with RDP or running an application/game that uses PhysX, to crash the GPU and leave it running at desktop power-save speeds.

Also check that the GPU client is running at "Slightly higher" Core priority (in the Advanced tab)

Also, don't watch the pretty graphics; they just slow the whole thing down.

NightmareXX · 27 Jan 2010 at 10:59

Ah ha, excellent.

When I get home I shall switch to HFM.NET and sort out a single SMP client on each of my quads.

I've checked my graphics card and it hasn't throttled back. I've been using VNC to connect to my PC to check. I've ticked the higher priority now

I don't watch the pretty graphics, I merely use it as an indication of whether it's working or not.

NightmareXX · 27 Jan 2010 at 13:46

Sorted a few things out, now have a single SMP client on each of my quads.

I've seen some mention of using the -forceasm flag. Should I use that on both my AMD and Intel systems or not at all?

I've also noticed that my Q9450 appears to be running 4 instances of something whereas both my 940 and 810 chips are running a single process using up 100% CPU time. Is this to do with the new A3 core that's coming out?

Also, in HFM.NET, some of the status bars are yellow and I can't work out why. It says "RunningNoFrameTimes". What does that mean?

http://nightmarexx.is-a-geek.com/FAH/index.html

Marine Iguana · 27 Jan 2010 at 14:05

Well, the Q9450 should go green after a couple of frames have been done, once it has worked out the relevant data for PPD and such. Your other yellow one should have gone green by now but hasn't; it appears to be crunching away though, so I wouldn't worry.

Also, your Q9450 is using the old A1 core, which will be very slow to finish.

NightmareXX · 27 Jan 2010 at 14:11

Will my Q9450 get the A3 core when it gets a unit for it or is there something I can do to force it to get the newer core?

verbal · 27 Jan 2010 at 14:17

You'll get one if you're patient. You can't force it, but there are plenty around. Eventually the older A1 and A2 units will be phased out.

With your Q9450 and GTX285 you can be one of the highest producers. See you on page one of the stats very soon

NightmareXX · 27 Jan 2010 at 14:21

Well if you check my page I've got all my machines on the job. They're producing better than I expected

I'll try and pick up another graphics card at some point too ;-)

Marine Iguana · 27 Jan 2010 at 14:23

Ooh 20k PPD not bad at all, never used to be that way.

NightmareXX · 28 Jan 2010 at 01:50

Well, my GPU client seems to have killed itself again. Downloaded a new core (FAH_14) and project 5912. It simply refuses to load the GPU to full and there's no reason why not. Has been working perfectly fine all day.

Thoughts?

KE1HA · 28 Jan 2010 at 05:04

NightmareXX said:
Well, my GPU client seems to have killed itself again. Downloaded a new core (FAH_14) and project 5912. It simply refuses to load the GPU to full and there's no reason why not. Has been working perfectly fine all day.

Thoughts?

What do you mean by "refuses to load the GPU to full"? Please post the log from where you restarted the client until you get this problem.

NightmareXX · 28 Jan 2010 at 14:21

Well, previous units would complete in just over an hour and each frame would take about 1 minute. However, with this new unit, each frame takes an incredibly varied length of time and doesn't heat my GPU up at all.

My PPD is almost halved too!

KE1HA · 28 Jan 2010 at 15:02

Need to see the log, but on my GPUs (9800GTs, stock speeds) here's what I was getting:

5910 - Avg. Time / Frame : 1mn 36s - 4248.00 ppd
5912 - Avg. Time / Frame : 5mn 20s - 5097.60 ppd
5912 - Avg. Time / Frame : 6mn 21s - 4281.45 ppd
5914 - Avg. Time / Frame : 5mn 14s - 5195.01 ppd
5915 - Avg. Time / Frame : 5mn 05s - 5348.30 ppd

My GTX-280 does about 7.1 to 7.4 across all those.
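
The PPD figures in the table above follow from the frame time by simple arithmetic. Below is a minimal sketch of that calculation, assuming one frame is 1% of a work unit and the client runs units back to back. The point values (472 for 5910, 1888 for 5912) are back-computed from the table itself, not taken from an official project list, and the function name `ppd` is just for illustration.

```python
SECONDS_PER_DAY = 86_400
FRAMES_PER_WU = 100  # assumption: one frame = 1% of a work unit


def ppd(points_per_wu, seconds_per_frame):
    """Points per day for a client that runs work units back to back."""
    seconds_per_wu = seconds_per_frame * FRAMES_PER_WU
    return points_per_wu * SECONDS_PER_DAY / seconds_per_wu


# Point values are assumptions back-computed from the table, not official figures.
print(round(ppd(472, 1 * 60 + 36), 2))   # project 5910 at 1mn 36s per frame
print(round(ppd(1888, 5 * 60 + 20), 2))  # project 5912 at 5mn 20s per frame
print(round(ppd(1888, 6 * 60 + 21), 2))  # project 5912 at 6mn 21s per frame
```

With those assumed point values the three calls reproduce the 5910 and 5912 rows of the table, which also shows why a longer frame time on the same project drags PPD down proportionally.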

Generally, when the core isn't engaged or producing properly, it's a duff WU or the core itself has gotten corrupt somehow.

I've found with these GPUs, when there's a problem, it's easier to just post the log at Stanford, nuke the folder (delete it) and get a new WU / core. You definitely want to post the problem at Stanford, though, and see if others are having the same issue with that WU / core / client combination.

I had one this morning (Project: 5913 (Run 14, Clone 979, Gen 10)). When I checked FAHMon, it was just hung for no reason. Restarted it, same thing. It was the first issue on that machine in 60 or so WUs. Nuked the folder, downloaded another WU, and it's been hammering away since.

NightmareXX · 28 Jan 2010 at 18:31

Here's my FAH log. The time between frames is still far too big

I get much better PPD with the older units.

Code:

[17:54:44] + Processing work unit
[17:54:44] Core required: FahCore_14.exe
[17:54:44] Core found.
[17:54:44] Working on queue slot 01 [January 28 17:54:44 UTC]
[17:54:44] + Working ...
[17:54:44] 
[17:54:44] *------------------------------*
[17:54:44] Folding@Home GPU Core - Beta
[17:54:44] Version 1.26 (Wed Oct 14 13:09:26 PDT 2009)
[17:54:44] 
[17:54:44] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[17:54:44] Build host: vspm46
[17:54:44] Board Type: Nvidia
[17:54:44] Core : 
[17:54:44] Preparing to commence simulation
[17:54:44] - Looking at optimizations...
[17:54:44] - Files status OK
[17:54:44] - Expanded 70342 -> 360060 (decompressed 511.8 percent)
[17:54:44] Called DecompressByteArray: compressed_data_size=70342 data_size=360060, decompressed_data_size=360060 diff=0
[17:54:44] - Digital signature verified
[17:54:44] 
[17:54:44] Project: 5906 (Run 7, Clone 759, Gen 99)
[17:54:44] 
[17:54:44] Assembly optimizations on if available.
[17:54:44] Entering M.D.
[17:54:50] Will resume from checkpoint file
[17:54:50] Tpr hash work/wudata_01.tpr: 1081074134 1261258597 2710553407 2536293055 144662385
[17:54:50] Working on Protein
[17:54:51] Client config found, loading data.
[17:54:52] Starting GUI Server
[17:54:52] Resuming from checkpoint
[17:54:52] fcCheckPointResume: retrieved and current tpr file hash:
[17:54:52] 0 1081074134 1081074134
[17:54:52] 1 1261258597 1261258597
[17:54:52] 2 2710553407 2710553407
[17:54:52] 3 2536293055 2536293055
[17:54:52] 4 144662385 144662385
[17:54:52] Verified work/wudata_01.log
[17:54:52] Verified work/wudata_01.edr
[17:54:52] Verified work/wudata_01.xtc
[17:54:52] Completed 1%
[18:04:12] Completed 2%
[18:06:32] Completed 3%
[18:11:06] Completed 4%
[18:13:31] Completed 5%
[18:22:23] Completed 6%
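
To put numbers on "the time between frames", here is a rough sketch that pulls the gaps between successive "Completed N%" lines out of a log like the one above. It assumes every progress line has the `[HH:MM:SS] Completed N%` shape seen here; the function name `frame_durations` is made up for the example, and because the timestamps carry no date, a negative gap is treated as a midnight rollover.

```python
import re
from datetime import datetime, timedelta

# Assumes progress lines look exactly like "[18:04:12] Completed 2%".
FRAME_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] Completed (\d+)%")


def frame_durations(log_text):
    """Return (percent, seconds since the previous 'Completed' line) pairs."""
    stamps = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%H:%M:%S")
            stamps.append((int(match.group(2)), when))

    durations = []
    for (_, prev), (pct, cur) in zip(stamps, stamps[1:]):
        delta = cur - prev
        if delta < timedelta(0):
            # Timestamps have no date, so assume the clock passed midnight.
            delta += timedelta(days=1)
        durations.append((pct, int(delta.total_seconds())))
    return durations


LOG = """\
[17:54:52] Completed 1%
[18:04:12] Completed 2%
[18:06:32] Completed 3%
[18:11:06] Completed 4%
[18:13:31] Completed 5%
[18:22:23] Completed 6%
"""

for pct, secs in frame_durations(LOG):
    print(f"{pct}%: {secs // 60}m {secs % 60:02d}s")
```

On the six progress lines above, the gaps range from 2m 20s to 9m 20s. The first gap starts at a checkpoint resume, so it may not cover a full frame.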

And my config file

Code:

[settings]
username=NightmareXX
team=10
passkey=...........
asknet=no
machineid=1
bigpackets=big
local=7

[http]
active=no
host=localhost
port=8080
usereg=no

[core]
checkpoint=5
nocpulock=1
priority=96
addr=

[clienttype]
type=3

KE1HA · 28 Jan 2010 at 21:22

Yep, something's definitely not right with that situation. I don't see anything in the log that is a smoking gun, other than the times being way out of whack. I checked my benchmarks, and the average I'm getting for 5906 is about one minute per segment.

I'd clear the folder, post the log on Stanford, and let it get a new core and WU. If the same thing happens, then I'd be looking at a HW / SW host config issue and what, if anything, has changed.

The only thing I see in your config file: move that checkpoint out to at least 15 minutes; 5 is far too short. I don't set localhost or priority, nor disable infinity. Maybe somebody with a WinDoze box can comment on that.
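
For anyone who would rather check the checkpoint setting with a script than by eye, here is a small sketch using Python's standard `configparser` on a trimmed copy of the config posted above. The helper names are invented for the example, and whether the client accepts a file rewritten this way is an assumption; changing the value through the client's own configuration is the safer route.

```python
import configparser
import io

# Trimmed copy of the posted config; only the parts needed for the example.
CLIENT_CFG = """\
[settings]
username=NightmareXX
team=10

[core]
checkpoint=5
nocpulock=1
priority=96
addr=
"""


def _parse(cfg_text):
    # interpolation=None so a literal '%' in a value is never treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser


def checkpoint_minutes(cfg_text):
    """Read the checkpoint interval from the [core] section."""
    return _parse(cfg_text).getint("core", "checkpoint")


def with_checkpoint(cfg_text, minutes):
    """Return the config text with [core] checkpoint set to `minutes`."""
    parser = _parse(cfg_text)
    parser.set("core", "checkpoint", str(minutes))
    out = io.StringIO()
    # Keep the key=value style (no spaces) used in the original file.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


print(checkpoint_minutes(CLIENT_CFG))
print(checkpoint_minutes(with_checkpoint(CLIENT_CFG, 15)))
```

The first call reads the interval as posted, the second reads it back after raising it to the suggested 15 minutes.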

I'm assuming you're running SMP on the box as well. What WU is it processing at the moment, or was it processing when this slowdown happened, and are you running all the cores on SMP?

NightmareXX · 28 Jan 2010 at 23:23

I've changed the checkpoint to 15 minutes although no change yet.

I am running SMP and it's crunching on all 4 cores although with other GPU units it didn't have an impact. Currently doing a 6012 unit.