
X79 (Nvidia) PCI-e 3.0 Reg Hack

I'm assuming that 690 Quad SLI example had two PCI-e slots running at x16?
If using individual cards the speeds of the slots will drop to x8.

I ran my 670 SLI at PCI-e 1.1 and PCI-e 2.0 16x and noticed a 10% difference at 1080p (~3300 @ PCI-e 2.0 and ~3000 @ PCI-e 1.1)
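A quick sanity check on the scaling figure quoted above (taking the ~3300 and ~3000 scores at face value):

```python
# Sanity check on the GTX 670 SLI scores quoted above:
# ~3300 at PCIe 2.0 x16 vs ~3000 at PCIe 1.1 x16.
pcie20_score = 3300
pcie11_score = 3000

gain = (pcie20_score - pcie11_score) / pcie11_score * 100
print(f"PCIe 2.0 over PCIe 1.1: {gain:.0f}% faster")  # → 10% faster
```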
 

This, plus 690s communicate internally at PCIe 3.0 regardless of what the motherboard is set to. So while one 690 might be communicating with the other at PCIe 1.0 or 2.0, the primary 690 will still be running as GTX 680 SLI @ PCIe 3.0. So while you might have poor scaling, it won't necessarily show up at 1600p, as 1600p isn't all that demanding in comparison with three-screen setups.
 
Single GTX 690 using PCIe 1.0, PCIe 2.0 and PCIe 3.0:

[Screenshot: PCIe 1.0 result]

[Screenshot: PCIe 2.0 result]

[Screenshot: PCIe 3.0 result]

The results were much the same as the quad SLI examples.


If the GTX 690 were one GPU instead of two, the internal communication would be a lot faster than PCIe 3.0.

In the examples above, where only one card was used and was stressed more than two cards would be, the results are even closer together.

On multi-monitor setups GTX 6xx series cards can suffer due to lack of VRAM. In SLI the memory does not stack, so we are stuck with 2 GB.

http://forums.anandtech.com/showthread.php?t=2238947
In the above test it would have been interesting to see a single GTX 680 running PCIe 2.0 x8 and PCIe 3.0 x8. The difference in performance should be the same as in the four-card setup, but at a much lower framerate.
 
...

When you have more cards and more PCIe motherboard lanes in use, you need more bandwidth over the PCIe bus. You have the equivalent of two cards on the motherboard, so you aren't taxing the PCIe lanes anywhere near as much as four GTX 680s would, and your resolution also isn't requiring enough bandwidth to make a difference.

The results posted above are on the same cards and at the same resolutions, so a VRAM limitation isn't a factor (both setups have the same limitation), yet PCIe 3.0 makes a massive difference in that setup (four individual 680s).

Four individual cards all need to talk to each other over the motherboard, whereas two 690s halve that requirement: two of the GPUs talk to each other directly, and then each pair shares the data sent back and forth over the motherboard.

If you can't see why two 690s need less bandwidth than four 680s, then I really struggle to have a conversation with you on the subject.

PCIe bandwidth only becomes an issue (i.e. it only becomes possible to show an improvement) when you are reaching the limit of that bandwidth. From the graphs on Anandtech it is clear that four individual 680s at x8 speeds need more bandwidth, whereas a single card, or even two dual-GPU cards (running at x16, by the way), don't need more bandwidth.

690s are more efficient in the way they talk to each other (not quite half the bandwidth, but almost), and they are also both connected at x16, so they have double the bandwidth available. Compare that with four 680s, which is effectively a six-way conversation with half the bandwidth available, as most of the cards are only using x8 slots.
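A rough sketch of the per-slot bandwidth comparison, using the usual ballpark per-lane figures (PCIe 3.0 is approximately 0.985 GB/s per lane with 128b/130b encoding; the lane splits are the two X79 setups discussed in the thread):

```python
# Ballpark one-direction bandwidth per lane, in GB/s.
PER_LANE_GBS = {"1.x": 0.25, "2.0": 0.5, "3.0": 0.985}

def slot_bandwidth(gen, lanes):
    """Approximate one-direction bandwidth of a slot, in GB/s."""
    return PER_LANE_GBS[gen] * lanes

# Two GTX 690s: both slots run at x16.
dual_690 = [slot_bandwidth("3.0", 16) for _ in range(2)]
# Four GTX 680s on X79: lanes split 16/8/8/8.
quad_680 = [slot_bandwidth("3.0", n) for n in (16, 8, 8, 8)]

print("2x GTX 690 slots (GB/s):", dual_690)  # each card gets ~15.8 GB/s
print("4x GTX 680 slots (GB/s):", quad_680)  # three of four get only ~7.9 GB/s
```

So in the two-690 case every card sits on a full x16 link, while in the four-680 case three of the four cards are on half that.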
 

A good way of stopping a CPU throttling your GPU is to turn up the screen resolution and/or in-game settings. This reduces the traffic over the PCIe slots to your CPU due to the reduced frame rate.

If you really want to see what PCIe 3.0 can do, the way to go would be to run a multi-GPU setup with a highly overclocked CPU, using a benchmark that runs at low resolution and generates hundreds of frames a second to work the PCIe 3.0 slots.

Increasing the resolution to what Anandtech used for their tests should reduce the amount of work the PCIe 3.0 slots do, due to the lower frame rate and more being done on the GPUs. If they are getting a big difference between PCIe 2.0 and 3.0, I suspect the bottleneck is somewhere else.
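As a back-of-envelope illustration (the per-frame transfer model and the example frame rates here are assumptions, not measurements): in AFR SLI each finished frame has to be shipped to the primary card, so bus traffic scales with frame size times frame rate, and whether raising the resolution raises or lowers total traffic depends on how far the frame rate falls.

```python
def frame_traffic_gbs(width, height, fps, bytes_per_pixel=4):
    """Approximate GB/s needed just to move finished frames over the bus."""
    return width * height * bytes_per_pixel * fps / 1e9

# Hypothetical numbers: a low-res benchmark at very high fps vs 1600p at 60 fps.
low_res = frame_traffic_gbs(1280, 720, 400)
high_res = frame_traffic_gbs(2560, 1600, 60)
print(f"720p  @ 400 fps: {low_res:.2f} GB/s")
print(f"1600p @  60 fps: {high_res:.2f} GB/s")
```

With these particular made-up numbers the low-resolution, high-fps run actually moves more frame data per second, but a smaller fps drop at the higher resolution would flip the result.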

This is an interesting topic which comes up quite a lot on these forums. If anyone wants to chip in I would be interested, as I want to learn more and am prepared to stick my hand up if I'm wrong.
 

Increasing the resolution increases the amount of time it takes each GPU to complete a frame, but it also INCREASES the data moving back and forth over the PCIe slots, not decreases it.

It reduces the workload of the CPU itself, but all four cards need to talk to each other more the more work each of them is tasked with. Overclocking the CPU doesn't give you more PCIe bandwidth.

Running at low resolution will always cause the CPU to become the bottleneck, not the PCIe bandwidth, as can be seen by the fact that two or even three of the 680s will just start running at 10% or less utilisation. I'm running a 3930K at 4.5 GHz with GTX 670 SLI, and even I can demonstrate this by running at low resolution and watching my SLI utilisation go down (which will also see PCIe bandwidth requirements go down) and my CPU utilisation go up.

Your 690s always have part of the conversation going on at x16/x16, even if you reduce the middle bit to x8 or x4 (by dropping the PCIe to 2.0 or even 1.0); there is still a big chunk of that conversation going on at the higher speed.

With quad 680s you start off at 16/8/8/8, so you are already at a disadvantage, and then if you go to PCIe 2.0 you are then at 8/4/4/4, so almost all of the traffic is limited to x4 and none of it is happening at x16.

You are never going to be able to replicate the effects of running proper quad SLI using your 690s, as the way in which they make use of PCIe is so totally different. I guess in this way the 690s justify their expense, as they mean you can run a much lower-specced motherboard and get similar results to an X79/quad-680 setup.
 

@ PCIe 2.0 you are still at 16/8/8/8; the number of PCIe lanes does not change, only the bandwidth, going from PCIe 3.0 to 2.0.
 

I'm guessing he meant effectively running at 8/4/4/4, because PCIe 2.0 is about half the speed of PCIe 3.0.
So if you run four cards at PCIe 2.0 speeds you would effectively be getting 8/4/4/4 at PCIe 3.0 speeds.

The question is whether a 670/680 (or 7950/7970) would hit bandwidth issues running at PCIe 3.0 x4. I think even your tests show you would, as I believe PCIe 3.0 @ x4 is the same as PCIe 1.1 @ x16.
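Checking that last equivalence with the usual ballpark per-lane figures (~0.25 GB/s for PCIe 1.1 and ~0.985 GB/s for PCIe 3.0):

```python
# Ballpark one-direction per-lane bandwidth, GB/s.
PER_LANE_GBS = {"1.1": 0.25, "2.0": 0.5, "3.0": 0.985}

gen3_x4 = PER_LANE_GBS["3.0"] * 4     # ~3.94 GB/s
gen11_x16 = PER_LANE_GBS["1.1"] * 16  # 4.00 GB/s
print(f"PCIe 3.0 x4:  {gen3_x4:.2f} GB/s")
print(f"PCIe 1.1 x16: {gen11_x16:.2f} GB/s")
```

The two come out within a couple of percent of each other, which fits the claim.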
 
Before: only around 2100
After: only around 2100

1920x1080, normal tessellation, 4x

Not good.

Looking at the scores in the Heaven 3 thread, you're in the middle of the pack for GTX 680s. Thing is, this is not the best benchmark for us Nvidia users.

Have you had a run with 3DMark 11? We tend to do a lot better in that.
 