The bus width on HBM is stacked, for example...
Standard bus:
256-bit @ 6000MHz = 192GB/s (2GB, 4GB, 8GB layout)
384-bit @ 6000MHz = 288GB/s (3GB, 6GB, 12GB layout)
512-bit @ 6000MHz = 384GB/s (4GB, 8GB, 16GB layout)
HBM 2+1 stack:
256-bit x2 (512-bit effective) @ 6000MHz = 384GB/s (2GB, 4GB, 8GB layout)
384-bit x2 (768-bit effective) @ 6000MHz = 576GB/s (3GB, 6GB, 12GB layout)
512-bit x2 (1024-bit effective) @ 6000MHz = 768GB/s (4GB, 8GB, 16GB layout)
HBM 4+1 stack:
256-bit x4 (1024-bit effective) @ 6000MHz = 768GB/s (2GB, 4GB, 8GB layout)
384-bit x4 (1536-bit effective) @ 6000MHz = 1152GB/s (3GB, 6GB, 12GB layout)
512-bit x4 (2048-bit effective) @ 6000MHz = 1536GB/s (4GB, 8GB, 16GB layout)
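(For reference, here's a quick Python sketch of where those quoted numbers come from: it's just conventional GDDR5-style arithmetic, bus width times effective data rate, which is exactly the assumption that doesn't carry over to HBM.)

# Sketch of the arithmetic behind the quoted table: treat the memory like
# conventional GDDR5, where bandwidth = bus width x effective data rate.
def gddr_style_bandwidth(bus_width_bits, effective_rate_mtps):
    """Bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * effective_rate_mtps / 1000

for width in (256, 384, 512):
    print(f"{width}-bit @ 6000MT/s = {gddr_style_bandwidth(width, 6000):.0f}GB/s")
# 256-bit -> 192GB/s, 384-bit -> 288GB/s, 512-bit -> 384GB/s, matching the quoted table.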
This isn't even slightly how HBM works, fundamentally, either on speeds or on total bandwidth.
A single 4-Hi stack of HBM provides 128GB/s of bandwidth, no more, no less. AFAIK the first production run will be 1GB per stack, moving to higher-density 2GB stacks (if required) within about 6-12 months, though the way production lines up, the denser stacks might be available by the time the first cards are.
Each stack also has a 128-bit bus, so you need a 512-bit memory controller on the GPU to access four stacks of 4-Hi HBM. Four stacks would give you 4x 128GB/s of bandwidth and 4GB (8GB once they start using the higher-density chips).
768GB/s would require six stacks and a 768-bit memory controller, giving you 6GB or 12GB; 1024GB/s would require eight stacks and a 1024-bit memory controller, giving you 8GB or 16GB.
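To put that scaling in one place, here's a rough Python sketch using the per-stack figures from this post: 128GB/s and 128 bits of controller width per stack, with 1GB or 2GB of capacity per stack. These are the post's working assumptions, not a spec sheet.

# Per-stack figures as stated above; bandwidth, bus width and capacity all
# scale linearly with the number of stacks on the interposer.
PER_STACK_BW_GBPS = 128         # bandwidth per 4-Hi stack
PER_STACK_BUS_BITS = 128        # controller width per stack
PER_STACK_CAPACITY_GB = (1, 2)  # first-run density, later density

for stacks in (4, 6, 8):
    bus_bits = stacks * PER_STACK_BUS_BITS
    bandwidth = stacks * PER_STACK_BW_GBPS
    low_cap, high_cap = (stacks * c for c in PER_STACK_CAPACITY_GB)
    print(f"{stacks} stacks: {bus_bits}-bit controller, {bandwidth}GB/s, {low_cap} or {high_cap}GB")
# 4 stacks -> 512-bit, 512GB/s; 6 stacks -> 768-bit, 768GB/s; 8 stacks -> 1024-bit, 1024GB/s.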
This is the part I've read almost no information on: the memory controller on die. The biggest reason for the large power saving is signal distance. On a PCB the traces can end up very long, say 10-50cm, by the time they've travelled up and down through up to 12 layers; send that same signal over 1-2cm of on-package copper traces a fraction of the width and the power drops significantly. But I don't know how this affects the memory controller. I presume it simplifies the controller massively: it doesn't have to generate powerful signals, so the I/O power circuitry will be much smaller. I suspect a conventional GDDR5 memory controller might be, say, 50mm^2 for 512-bit, while a 512-bit HBM controller might be 40mm^2, or 20mm^2... I really don't know. I've basically not seen anyone mention how it will change on that side.
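As a very rough first-order sketch of the trace-length argument: the switching power to drive a signal line scales roughly with C*V^2*f per pin, and the line capacitance C grows with trace length. Every number below is an invented placeholder, not a measured GDDR5 or HBM figure, so only the shape of the comparison matters.

# First-order per-pin switching power model: charge/discharge the trace
# capacitance V^2 * f times per second. Longer trace -> more capacitance.
def line_switching_power(length_cm, cap_pf_per_cm, voltage, toggle_mhz):
    """Relative per-pin switching power for a trace of the given length."""
    cap_farads = length_cm * cap_pf_per_cm * 1e-12
    return cap_farads * voltage ** 2 * toggle_mhz * 1e6

pcb_route = line_switching_power(30, 1.0, 1.5, 1750)       # assumed GDDR5-style PCB route
interposer_route = line_switching_power(2, 1.0, 1.2, 500)  # assumed HBM-style interposer route
print(f"the long PCB route costs roughly {pcb_route / interposer_route:.0f}x more per pin")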
Ultimately I don't know if it's viable to go beyond a 512-bit memory controller yet. The reason we haven't done so without HBM is power/cost: a 512-bit memory controller eats power and space on die. At some point it becomes cheaper and easier to add more transistors for on-die texture compression or colour compression. We've seen this for years: memory controller size and power cost climb until a new compression method scales back the bandwidth required, or a new memory technology comes along and scales the power back.
I would be surprised if they jumped beyond 512-bit/4 stacks of HBM for the next generation. They might find they need to go to 768-bit and 6 stacks to reach the 6GB mark, or, with the delay getting to 20nm, the 2GB stacks might be available in time and the cards might work fine with 6GB from 3 stacks. It's worth noting that every additional chip you stick together on an interposer pushes effective yields down and cost up, so ultimately they'll keep it as tight as possible on bandwidth and cost.
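A toy illustration of that yield point, assuming each extra die on the interposer has some independent chance of being lost during assembly. The 98% per-die figure is made up purely to show the trend.

# Yield compounding on a multi-die interposer assembly: one GPU die plus one
# die per HBM stack, each with an assumed independent survival rate.
PER_DIE_ASSEMBLY_YIELD = 0.98   # invented per-die survival rate

for hbm_stacks in (2, 3, 4, 6, 8):
    dies_on_interposer = 1 + hbm_stacks   # GPU die plus each HBM stack
    assembled_yield = PER_DIE_ASSEMBLY_YIELD ** dies_on_interposer
    print(f"{hbm_stacks} stacks: ~{assembled_yield:.1%} of assemblies survive")
# More stacks -> lower effective yield -> higher cost per good card.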
There are lots of questions that remain about HBM. Because the memory is stacked, is there a maximum thermal limit on the top chip to make sure the bottom chip doesn't get too hot? For example, 90C might be fine for the GPU, but if the top memory chip is at 90C the bottom memory chip might be at 120C; they might need the top memory chip to stay at 70C max so the bottom chip doesn't go over 90C. Will HBM overclock at all, or is it locked forever at 1GHz? How could that affect overclocking, and will AMD have to include bandwidth overhead to cover it?
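Purely to illustrate that stacking concern, here's a toy model where each layer below the cooled top die runs a fixed temperature step hotter. The 7C-per-layer step is an invented number, just to show why a cap on the reported (top) temperature may really be about protecting the bottom die.

# Toy thermal gradient through a 4-Hi stack: the die furthest from the
# heatsink runs hotter than whatever the top-die sensor reports.
DELTA_PER_LAYER_C = 7   # invented per-layer temperature step
LAYERS = 4              # 4-Hi stack

def bottom_die_temp(top_die_temp_c):
    return top_die_temp_c + DELTA_PER_LAYER_C * (LAYERS - 1)

for top_temp in (70, 80, 90):
    print(f"top die at {top_temp}C -> bottom die roughly at {bottom_die_temp(top_temp)}C")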
HBM can certainly bring benefits, but there could be problems too, thermal or overclocking related, or there might turn out to be no downsides at all.
Either way, the bandwidth doesn't scale as simply as you think, and due to cost AMD/Nvidia won't be throwing 8 stacks on an interposer to get to 1024GB/s any time soon.