
R9 290X and 290 Blog: modernizing multi-GPU gaming with XDMA

I can't say I've noticed any difference with my bridgeless 290 CF compared to my bridged 7950 CF setup. While it is a fiddle, I quite like the bridges.
Maybe it'd be easier if you could try CF with a bridge and CF without a bridge on the same setup.

If XDMA is so good, why didn't the other R9 200 series cards get it? I realise they're 7900/7800/etc. cards with new stickers on but if XDMA is so good couldn't it have been worked in?
Is XDMA not really that special or are the other R9 200 cards not worth bothering with?

I'm not suggesting it was as advanced a technology, but I remember being able to SLI two Nvidia GTS 250 cards without a bridge cable years ago! Nice to see we've moved so far forward we're back there again :)
 
Thanks Warsam, going to add it to the 290 thread. :)

 
On a single screen he did; the 7680 resolution tests that Vega did showed more like 20% IIRC in BF3.

If so, PCI bus speed could be very important once 4K becomes mainstream.

If XDMA is so good, why didn't the other R9 200 series cards get it? I realise they're 7900/7800/etc. cards with new stickers on but if XDMA is so good couldn't it have been worked in?
Is XDMA not really that special or are the other R9 200 cards not worth bothering with?

Cost, mainly. The PCB designs were already finalised and in production from the 7900s; it makes little financial sense to put effort into creating another PCB revision purely to lose the CrossFire fingers.

I'd expect whatever follows next from AMD to be 'fingerless' throughout the range.
 
I can't say I've noticed any difference with my bridgeless 290 CF compared to my bridged 7950 CF setup. While it is a fiddle, I quite like the bridges.
Maybe it'd be easier if you could try CF with a bridge and CF without a bridge on the same setup.

If XDMA is so good, why didn't the other R9 200 series cards get it? I realise they're 7900/7800/etc. cards with new stickers on but if XDMA is so good couldn't it have been worked in?
Is XDMA not really that special or are the other R9 200 cards not worth bothering with?

I'm not suggesting it was as advanced a technology, but I remember being able to SLI two Nvidia GTS 250 cards without a bridge cable years ago! Nice to see we've moved so far forward we're back there again :)

I've noticed a huge difference between my 7970s and my 290s, which I'm sure is mostly because I'm now getting frame pacing where I couldn't before (crossfire + eyefinity + bridges = no frame pacing). I have nothing particularly scientific to back it up as my 7970s died rather unexpectedly, but the smoothness of games is much greater than the difference in FPS accounts for.

It's not just Nvidia that had bridgeless multi-card before the last few months; don't forget AMD's 7750, 6750 and 5750 didn't need them either! :D

I actually miss my CrossFire bridges; I thought they added a little something to the inside of a case that marked it out as a bit special. A bit like go-faster stripes for your GPUs. :)
 
If XDMA is so good, why didn't the other R9 200 series cards get it? I realise they're 7900/7800/etc. cards with new stickers on but if XDMA is so good couldn't it have been worked in?
Is XDMA not really that special or are the other R9 200 cards not worth bothering with?
XDMA requires hardware support in the GPU in the form of an additional DMA engine, which is only present in the Hawaii GPU used on the 290(X). It can't be done on cards based on older GPUs; the hardware just isn't there.
 
If so, PCI bus speed could be very important once 4K becomes mainstream.

At very high resolutions there are big gains to be had but I don't know if it is better to have very fast vram or a very wide bus or both.

At 1600p, doing tests with the buses on the Titan and 290X (384-bit vs 512-bit) and comparing clock for clock, the Titan seemed to have the edge as the VRAM speed was increased. This could be due to a number of things: better memory chips on the Titan, the 512-bit bus not being fully used at 1600p, or many other reasons. The one thing that can be taken from it is that the amount of info per frame will increase as the resolution gets higher, meaning the 512-bit bus will help more.

The question is, at 4K, how much of the 512-bit bus on the 290X is actually getting used? This is what needs answering by the tech review sites, not running games with reduced settings so they can get a result on card X.
 
It's about bandwidth kaap: bus width x speed = bandwidth.
A small bus with high speed gives the same end result as a bigger bus with slower memory if the bandwidth works out the same.
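A quick back-of-envelope sketch of that sum, using the stock effective GDDR5 data rates (assumed here as 6 GT/s on the Titan and 5 GT/s on the 290X) purely for illustration:

# Rough GDDR5 bandwidth: bus width (bits) / 8 * effective data rate (GT/s) = GB/s
def bandwidth_gb_s(bus_bits, data_rate_gtps):
    return bus_bits / 8 * data_rate_gtps

print(bandwidth_gb_s(384, 6.0))  # Titan: 384-bit at 6 GT/s -> 288 GB/s
print(bandwidth_gb_s(512, 5.0))  # 290X: 512-bit at 5 GT/s -> 320 GB/s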
 
It's about bandwidth kaap: bus width x speed = bandwidth.
A small bus with high speed gives the same end result as a bigger bus with slower memory if the bandwidth works out the same.

This is a bit of testing I did a while ago using Tomb Raider

http://forums.overclockers.co.uk/showthread.php?t=18564600

The strange thing was that as the VRAM clock speed was raised the 290X gained more bandwidth, but it was the Titan that gained more performance.
 
You were testing the effect of bus width, so you used the same speed on each card; to test bandwidth you would need to set both cards to the same bandwidth instead.

At 1251 the Titan was more likely to be bandwidth limited, being only 384 x 1251 vs 512 x 1251.
So increasing the bandwidth on the Titan derestricted it, whereas the 290X was never really restricted.

If my maths is right, 1251 x 512 is the same as 1668 x 384.
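Checking that arithmetic as a quick illustration:

# Memory clock a 384-bit card would need to match a 512-bit card at 1251 MHz
print(1251 * 512 / 384)  # 1668.0 MHz for the same raw bandwidth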
 
At very high resolutions there are big gains to be had but I don't know if it is better to have very fast vram or a very wide bus or both.

At 1600p, doing tests with the buses on the Titan and 290X (384-bit vs 512-bit) and comparing clock for clock, the Titan seemed to have the edge as the VRAM speed was increased. This could be due to a number of things: better memory chips on the Titan, the 512-bit bus not being fully used at 1600p, or many other reasons. The one thing that can be taken from it is that the amount of info per frame will increase as the resolution gets higher, meaning the 512-bit bus will help more.

The question is, at 4K, how much of the 512-bit bus on the 290X is actually getting used? This is what needs answering by the tech review sites, not running games with reduced settings so they can get a result on card X.

I was referring to the actual internal PCI-E bus speed of the motherboard rather than the PCI-E bus width of the GPU :)

As in, would tri-fire 290s run significantly faster at 130 MHz with 16x/8x/8x PCI-E Gen 3 vs, say, 100 MHz with 8x/8x/8x PCI-E Gen 2, given the CrossFire wizardry is now performed over the motherboard's PCI-E lanes rather than the traditional external bridge?
 
At very high resolutions there are big gains to be had but I don't know if it is better to have very fast vram or a very wide bus or both.

At 1600p, doing tests with the buses on the Titan and 290X (384-bit vs 512-bit) and comparing clock for clock, the Titan seemed to have the edge as the VRAM speed was increased. This could be due to a number of things: better memory chips on the Titan, the 512-bit bus not being fully used at 1600p, or many other reasons. The one thing that can be taken from it is that the amount of info per frame will increase as the resolution gets higher, meaning the 512-bit bus will help more.

The question is, at 4K, how much of the 512-bit bus on the 290X is actually getting used? This is what needs answering by the tech review sites, not running games with reduced settings so they can get a result on card X.

The whole bus gets used on the 290 cards... why do you think it doesn't?

If it didn't get used fully then the card would:

1. Have less real-world bandwidth

2. Have a stupidly inefficient design, which goes to prove how rushed the 290s were.
 
It's not just Nvidia that had bridgeless multi-card before the last few months; don't forget AMD's 7750, 6750 and 5750 didn't need them either! :D
I didn't own any of those cards, so I didn't know. I did own a couple of GTS 250s, so that's what I based the comment on. I didn't mean to suggest that it was exclusive to Nvidia at the time, just that Nvidia had done it.

I actually miss my CrossFire bridges; I thought they added a little something to the inside of a case that marked it out as a bit special. A bit like go-faster stripes for your GPUs. :)

Yeah, I'm a bit the same. Mind you, generally I just don't like change.


So if it uses the PCI-E bandwidth and this is more important with 4K displays, does that mean that PCI-E 3.0 will be required?
I know some people are running 290 CF on boards that I believe only support PCI-E 2.1 8x/8x; is this gonna start causing problems at higher resolutions, especially if some of that bandwidth is now being used to do CrossFire stuff?
 
I was referring to the actual internal PCI-E bus speed of the motherboard rather than the PCI-E bus width of the GPU :)

As in, would tri-fire 290s run significantly faster at 130 MHz with 16x/8x/8x PCI-E Gen 3 vs, say, 100 MHz with 8x/8x/8x PCI-E Gen 2, given the CrossFire wizardry is now performed over the motherboard's PCI-E lanes rather than the traditional external bridge?

It does make quite a difference at extreme resolutions in some situations, just going from PCI-E 2.0 to 3.0. I have seen tests on the internet where I think they were using three 2GB GTX 670s in BF3 at some huge resolution; on PCI-E 2.0 it was unplayable, yet switching to PCI-E 3.0 made a huge difference in FPS.
 
It would seem to make sense. In the two scenarios Paul has listed you are getting double to four times the bandwidth available to certain cards just from the PCI-E 2.0 > 3.0 and 8x > 16x changes. An extra 30% on top of that might not make a huge difference now, but if the 20nm cards stick with PCI-E 3.0 then that extra 30% might well make a difference in 3+ card setups.
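The rough per-slot numbers behind that 'double to four times' figure, taken from the standard PCI-E line rates (5 GT/s with 8b/10b encoding for 2.0, 8 GT/s with 128b/130b for 3.0) and ignoring packet overhead:

# Approximate usable PCI-E slot bandwidth in GB/s
def slot_bw_gb_s(lanes, gen):
    per_lane = 5 * 8 / 10 / 8 if gen == 2 else 8 * 128 / 130 / 8  # GB/s per lane
    return lanes * per_lane

print(slot_bw_gb_s(8, 2))   # ~4.0 GB/s  - Gen 2 x8
print(slot_bw_gb_s(16, 3))  # ~15.8 GB/s - Gen 3 x16, roughly 4x the Gen 2 x8 slot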
 
The whole bus gets used on the 290 cards... why do you think it doesn't?

If it didn't get used fully then the card would:

1. Have less real-world bandwidth

2. Have a stupidly inefficient design, which goes to prove how rushed the 290s were.

You can only use the full width of a bus if you have enough data to fill it. If you only have, say, 128 bits of data at a given time, you cannot fill a 512-bit bus. I know this example is a bit extreme, but if you don't have 512 bits of data ready all the time to send down a 512-bit bus it could be a waste. You could have lots of chunks of a smaller size that would be faster on, say, a 384-bit bus with faster VRAM chips.

Another example of this is MS-DOS: if you run it on a 32-bit CPU it won't run any faster than it did on a 16-bit CPU, as it is a 16-bit program and running it on a CPU with a 32-bit bus is a total waste.
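As a toy sketch of the point being made here (just the arithmetic of the argument, not how a real GDDR5 memory controller actually schedules requests):

# Toy model: effective throughput = min(payload bits per transfer, bus width) / 8 * rate (GT/s)
def effective_gb_s(payload_bits, bus_bits, rate_gtps):
    return min(payload_bits, bus_bits) / 8 * rate_gtps

print(effective_gb_s(128, 512, 5.0))  # 80 GB/s - 128-bit chunks waste most of a 512-bit bus
print(effective_gb_s(128, 384, 6.0))  # 96 GB/s - a narrower but faster bus keeps up better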
 
It would seem to make sense. In the two scenarios Paul has listed you are getting double to four times the bandwidth available to certain cards just from the PCI-E 2.0 > 3.0 and 8x > 16x changes. An extra 30% on top of that might not make a huge difference now, but if the 20nm cards stick with PCI-E 3.0 then that extra 30% might well make a difference in 3+ card setups.

Do you have a link to that test they did with BF3 at huge resolutions using PCI-E 2.0 and 3.0 and showing the benefits of PCI-E 3.0?
 
http://forums.evga.com/tm.aspx?m=1537816

BF3 and Heaven on 3 screens

On 2 card setups it was like a 4% difference, on a 4 card setup 40%.

Crikey! So now, with CrossFire running over the same bus, 3 and 4 card setups would surely get instantly choked on 2.0? As the bus already appears to be at its maximum throughput when using an external bridge, the data being sent between cards through the lanes would now also become congested?
 
Yep, if a 4 card 680 setup can saturate PCI-E 2.0 you can be certain that a 3+ card 290 setup can too.

Edit: at very high screen resolutions/multi-monitor setups like the one linked to, obviously.
The setup in the link is the portrait version of 7680x1440 - about 11 megapixels; 4K is about 8 megapixels and 5760x1080 is about 6 megapixels, so quite where on that scale the bottleneck would appear would need some more testing.
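The pixel counts behind that comparison:

for w, h in [(7680, 1440), (3840, 2160), (5760, 1080)]:
    print(f"{w}x{h} = {w * h / 1e6:.1f} megapixels")  # 11.1, 8.3 and 6.2 MP respectively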
 
I didn't own any of those cards, so I didn't know. I did own a couple of GTS 250s, so that's what I based the comment on. I didn't mean to suggest that it was exclusive to Nvidia at the time, just that Nvidia had done it.



Yeah, I'm a bit the same. Mind you, generally I just don't like change.


So if it uses the PCI-E bandwidth and this is more important with 4K displays, does that mean that PCI-E 3.0 will be required?
I know some people are running 290 CF on boards that I believe only support PCI-E 2.1 8x/8x; is this gonna start causing problems at higher resolutions, especially if some of that bandwidth is now being used to do CrossFire stuff?

According to AMD's blog, the tech is supported on PCI-E 3.0 8x (for CrossFire) and PCI-E 2.0 8x setups.

There is plenty of bandwidth left over, and it says the tech will change the amount of room it needs on the PCI-E bus on the fly.
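As a rough back-of-envelope check on that headroom, assuming XDMA mainly has to move finished frames between cards (my simplification, not something stated in AMD's blog):

# One 4K frame at 32-bit colour, shipped 60 times a second
w, h, bytes_per_pixel, fps = 3840, 2160, 4, 60
frame_mb = w * h * bytes_per_pixel / 1e6   # ~33 MB per frame
print(frame_mb * fps / 1e3)                # ~2 GB/s, against ~4 GB/s on a PCI-E 2.0 x8 slot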
 