Next gen GPU client in open beta

Cool, been waiting for this. I don't have a card yet but I'll probably get one and use this client just cos I like to be different :)
 
Nice. Although I probably won't be using it myself, it will be interesting to see what happens with it. :)

Edit: it would also be nice to see some numbers (PPD, FLOPS, etc.), just for numbers' sake of course. I wonder if it will make some waves in the client statistics?
 
Thanks. Just been reading the FAQ in the link you gave, it's a good bit of info. Also, one of the forum posts suggests it seems to be CPU limited? Looks like there's a lot of mileage in this new GPU client!
 
Yes, CPU speed is limiting the client at the moment. The GPUs are so fast that they can perform all the parallel parts of the code faster than the CPU can do the serial parts.
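A rough way to see why the CPU ends up as the limit is Amdahl's law (a standard model, not something from the client itself): if a fraction p of each step is parallel and the GPU speeds that part up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal sketch; the 90% parallel fraction below is an illustrative assumption, not a measured figure for the folding code.

```python
def amdahl_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup when only the parallel fraction of the work is accelerated."""
    serial = 1.0 - parallel_fraction  # the part still done serially on the CPU
    return 1.0 / (serial + parallel_fraction / parallel_speedup)


# Hypothetical workload: 90% of each step can run in parallel on the GPU.
for s in (10, 100, 1000):
    print(s, round(amdahl_speedup(0.9, s), 2))
```

However fast the GPU gets, the speedup here can never pass 1 / (1 - 0.9) = 10x, so once the GPU is quick enough the only way to go faster is to speed up the serial (CPU) part.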
 
The nVidia issue is that they cannot guarantee what is actually executed.

CTM is effectively assembler, so what they code is not touched by the AMD drivers on the host machine.

With CUDA, the compiled output is actually recompiled by the drivers on the host machine, so if nV mess up the drivers, they can't guarantee valid results.

I hope that helps.
 
Yes, CPU speed is limiting the client at the moment. The GPUs are so fast that they can perform all the parallel parts of the code faster than the CPU can do the serial parts.

Yup, think of it like this:
a) the GPU is able to do hundreds of maths operations in parallel, but only on data in a specific format.
b) the CPU has to organise the data into that format.
c) programs require the CPU to prepare the data before passing it to the GPU, including arranging the data transfers.

The more maths you can load into each GPU pass the better; otherwise the overhead of organising the data and the data transfer times make it unviable.
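To make the "more maths per pass" point concrete, here's a minimal cost model. All the cost figures are hypothetical illustrations, not measurements of any real card or client: a GPU pass pays a fixed setup cost plus per-item packing and transfer costs, and only wins once the batch is big enough that the maths it saves outweighs them.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    """Hypothetical per-pass costs in microseconds (illustrative, not measured)."""
    cpu_per_item: float       # doing the maths on the CPU
    pack_per_item: float      # CPU arranging data into the GPU's format
    transfer_per_item: float  # moving data to and from the GPU
    gpu_per_item: float       # doing the maths on the GPU
    fixed_overhead: float     # setting up transfers and launching the pass


def cpu_time(c: Costs, n: int) -> float:
    return c.cpu_per_item * n


def gpu_time(c: Costs, n: int) -> float:
    per_item = c.pack_per_item + c.transfer_per_item + c.gpu_per_item
    return c.fixed_overhead + per_item * n


def break_even_items(c: Costs) -> float:
    """Batch size at which offloading to the GPU stops being slower than the CPU."""
    saving = c.cpu_per_item - (c.pack_per_item + c.transfer_per_item + c.gpu_per_item)
    if saving <= 0:
        return float("inf")  # the GPU never pays off for this kind of work
    return c.fixed_overhead / saving


costs = Costs(cpu_per_item=1.0, pack_per_item=0.2, transfer_per_item=0.1,
              gpu_per_item=0.01, fixed_overhead=500.0)
print(f"GPU pass pays off above roughly {break_even_items(costs):.0f} items")
```

Loading more maths into each pass raises `cpu_per_item` (the work saved per item moved) without raising the packing and transfer costs, which pushes the break-even point down.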
 
That's some good info NickK, I'd always assumed it was a driver issue but on the grounds of locking people out rather than, as you explained, a radically different software process.

I fully understand about the CPU being the bottleneck for the GPU, but would two cores feeding the GPU be better than one in this case? Or is it all designated to one CPU for simplicity's sake?
 
Anyone getting good results on this client?

I'm still going to be running S@H, but when I get my quad I could let S@H have 3 cores and one core feed the GPU client...

Just a thought...

P.S. I have a 2900XT atm, but it won't go up to 3D gfx when I run it (tried it for 5 mins). Someone is offering me £110 for it; would it be worth selling it and then getting a 3870 or an 8800GT to fold on?

Cheers, Doug.
 
That's some good info NickK, I'd always assumed it was a driver issue but on the grounds of locking people out rather than, as you explained, a radically different software process.

I fully understand about the CPU being the bottleneck for the GPU, but would two cores feeding the GPU be better than one in this case? Or is it all designated to one CPU for simplicity's sake?

The GPU itself is fed with DMA transfers from system memory into GPU memory; the same process is used for texture loading for games/graphics. This is because the data is stored as textures.
Data is read off by effectively copying what is rendered (or transformed textures) back into system memory using DMAs again.

It's this packaging of data into textures that needs the CPU's help. So this could be done multi-threaded.
DC applications such as folding require this packaging and unpackaging to be done continuously (although optimisation often means a lot of effort goes into keeping the data in GPU memory between GPU programs when it doesn't need transferring).

So, yes, a multi-threaded system could make use of multiple cores to feed a GPU although the bottleneck becomes the PCI-E and the memory bus bandwidth for all these operations (both CPU packaging and GPU DMA transfers and everything else use the system memory bandwidth).
The downside is that the data would have to be quite large, otherwise keeping multiple threads synchronised (and the data in the CPU caches etc.) would undo the benefit.
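Here's a minimal sketch of splitting the packaging step across threads. The layout (each atom's x, y, z in the RGB channels of a float texel, alpha left as padding, texels laid out in rows of a fixed width) is a hypothetical example, not the folding client's actual format. Each worker writes its own disjoint slice, so no locking is needed. Note that in CPython the GIL stops pure-Python loops like this from actually running faster on threads; the sketch only shows how the work splits into independent chunks, which native code would hand to separate cores.

```python
from concurrent.futures import ThreadPoolExecutor

Texel = tuple[float, float, float, float]  # one RGBA float texel


def pack_chunk(coords, start, end, out):
    """Pack atoms [start, end) as (x, y, z, 0.0) texels into a shared flat list.

    Each worker writes a disjoint index range, so no locking is needed."""
    for i in range(start, end):
        x, y, z = coords[i]
        out[i] = (x, y, z, 0.0)  # alpha channel unused here (padding)


def pack_texture(coords, width, workers=2):
    """Split packing across threads, then lay texels out as rows of a 2D texture."""
    n = len(coords)
    if n == 0:
        return []
    height = -(-n // width)  # ceiling division: rows needed
    flat = [(0.0, 0.0, 0.0, 0.0)] * (width * height)  # zero-padded texture
    chunk = -(-n // workers)  # ceiling division: atoms per worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(pack_chunk, coords, s, min(s + chunk, n), flat)
                   for s in range(0, n, chunk)]
        for f in futures:
            f.result()  # wait for each chunk and re-raise any error
    return [flat[r * width:(r + 1) * width] for r in range(height)]


coords = [(float(i), 0.0, 1.0) for i in range(5)]
tex = pack_texture(coords, width=4)
print(len(tex), tex[1][:2])
```

The bigger the chunks, the smaller the share of time spent starting workers and waiting for them, which is the same trade-off described above.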

The CPU also has to organise the GPU programs' execution, as the GPU is actually quite dumb. The GPU programs are loaded by the same DMA process (sometimes a few are pre-loaded) and the CPU triggers the start of each program's execution.

So the CPU becomes the administrator, the GPU does the actual data processing.
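The "CPU as administrator" idea can be sketched with a simulated GPU. `SimulatedGPU`, `upload`, `download` and `run` are hypothetical names standing in for the DMA copies and program launches described above; nothing here is a real driver API. The host uploads once, triggers each program in turn while the data stays resident in GPU memory (ping-ponging between two buffers, a common trick when a pass can't read and write the same texture), then reads the result back once.

```python
class SimulatedGPU:
    """Hypothetical stand-in for a GPU: it stores buffers and runs whatever
    program it is told to run, making no decisions of its own."""

    def __init__(self):
        self.memory = {}
        self.transfers = 0  # count of simulated DMA copies

    def upload(self, name, data):
        self.memory[name] = list(data)  # "DMA" system memory -> GPU memory
        self.transfers += 1

    def download(self, name):
        self.transfers += 1
        return list(self.memory[name])  # "DMA" GPU memory -> system memory

    def run(self, program, src, dst):
        # Apply the program element-wise from one resident buffer to another.
        self.memory[dst] = [program(v) for v in self.memory[src]]


def host_step(gpu, data, programs):
    """CPU as administrator: upload once, trigger each program in turn
    while the data stays in GPU memory, then read the result back once."""
    gpu.upload("a", data)
    src, dst = "a", "b"
    for program in programs:
        gpu.run(program, src, dst)
        src, dst = dst, src  # the output buffer becomes the next input
    return gpu.download(src)


gpu = SimulatedGPU()
result = host_step(gpu, [1.0, 2.0, 3.0], [lambda v: v * 2, lambda v: v + 1])
print(result, gpu.transfers)
```

Two programs ran but only two transfers happened; uploading and downloading around every program would double that, which is why clients try to keep data resident between passes.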
 
Very good info, NickK. Thanks for taking the time to answer. If I understand what you have written correctly, then we should see some big improvement on newer Nehalem systems with the new QPI memory access (although the PCIe interface will remain the same).

IRT Doug, £110 sounds like a good price for that card if you're looking to upgrade. Also, the 8800GT, or any Nvidia card, will not work with Folding@home; only AMD cards will. :)
 
Roughly 800-900 depending on the card and the CPU. Test work units are out at the moment and points are being reviewed, so everything is a bit up in the air. If I were considering the client, I would hang fire on buying new hardware and let other people do all the donkey work first.
 
You can get 1800+ PPD with a 3870 and a fast CPU.
 
Is that running 2x GPU2 clients on it? The modal figure I was seeing bandied about was 800/900 PPD, but there are obviously massive variations between setups. Regardless, you know more about it than I do. :)
 
Is it a sign of CPU limitation that the GPU core doesn't get fully loaded? I only get around 74% load on the GPU while one of my CPU cores is fully loaded.

Edit: overclocking the CPU did the trick. GPU usage is now ~95%. I get 2095 PPD with the 2799 WU, doing each frame in 40 s, on an HD 3850 (256 MB) clocked at 864/999.
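Working backwards from those figures is simple arithmetic. A minimal sketch, assuming the 2799 WU is 100 frames long (frames are normally reported as 1% steps, but that's an assumption here) and that the 40 s frame time holds for the whole unit:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def wus_per_day(seconds_per_frame: float, frames_per_wu: int = 100) -> float:
    """Work units finished per day at a steady frame time.

    frames_per_wu=100 is an assumption (1% per frame), not a confirmed value."""
    return SECONDS_PER_DAY / (seconds_per_frame * frames_per_wu)


def implied_points_per_wu(ppd: float, seconds_per_frame: float,
                          frames_per_wu: int = 100) -> float:
    """Back out the credit per work unit from a reported PPD figure."""
    return ppd / wus_per_day(seconds_per_frame, frames_per_wu)


print(round(wus_per_day(40), 1), round(implied_points_per_wu(2095, 40)))
```

Under those assumptions a 40 s frame works out to about 21.6 WUs a day, so 2095 PPD implies roughly 97 points per WU. If the unit has a different frame count, change `frames_per_wu`.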
 