AMD Working On An Entire Range of HBM GPUs To Follow Fiji And Fury Lineup – Has Priority To HBM2 Cap

Boomstick777 · 13 Jul 2015 at 21:38

Agreed with the other peeps: for notebooks, APUs, etc. this will be pure awesomeness.

N19h7m4r3 · 13 Jul 2015 at 21:42

I hope the new process node really helps Fiji spread its wings. The clock-for-clock performance of Fiji compared to Hawaii just doesn't seem right when you factor in the huge jump in stream processors and memory bandwidth.

Come on AMD! Make me want you in my system!

JediFragger · 13 Jul 2015 at 21:49

N19h7m4r3 said:
Come on AMD! Make me want you in my system!

siriq111 · 13 Jul 2015 at 22:56

SK Hynix won't be its own enemy by giving exclusive access only to AMD. You know, money talks.

AllBodies · 13 Jul 2015 at 23:13

mmj_uk said:
APUs are the best place for HBM memory; AMD really need to bring out a line of motherboards with a more powerful selection of APUs and unified HBM memory... 4GB isn't going to cut it though.

4GB would be extremely good for an APU. It will be some time before an APU could use more than 4GB, since it'll run out of processing power before memory. Remember there's no benefit in the R9 390X having 8GB of memory unless you crossfire it.

That would be in the context of gaming though; if we're talking workstation APUs, that could be a different story.

Orangey · 13 Jul 2015 at 23:21

Do they use LP GDDR5 in laptops or regular? HBM should vastly improve battery life + save a lot of space. Especially when dealing with those silly amounts they love to bung in laptops, 8GB for a 980M & M390X.

Orangey · 13 Jul 2015 at 23:33

siriq111 said:
SK Hynix won't be its own enemy by giving exclusive access only to AMD. You know, money talks.

As Iron Sheik says: "money talk, money walk!"

Mauller · 14 Jul 2015 at 00:24

Kaapstad said:
Driver works fine @2160p

Problem is HBM and its low clockspeed, which is not much use for 1080p and very high fps.

HBM has a wide interface, which increases the bandwidth: more pins at lower clocks are used to transfer more data at lower power. It's not a problem with the clock of the RAM, as the system is completely different to how GDDR5 works. Wide instead of high clocks.

Higher clocks are used to increase interface bandwidth over fewer pins, as the information is transmitted at higher frequencies.

Higher resolution is less CPU-bound compared to 1080p, so you tend to hit GPU hardware limits rather than API limits at higher resolution. But you already know that.

And when looking at 1080p performance on the Fury X, you can see that drivers are holding it back when you watch Greg's Thief video. With the Fury X at stock using Mantle, it matches Greg's 1.3GHz Titan X. Yet if you look at 1080p DX11 benchmarks on review sites, the Fury X is trailing the Titan X by 25-30 fps.

IT Troll · 14 Jul 2015 at 00:32

I am sure priority access will really give AMD a much needed advantage. Nvidia will have to make do with whatever HBM2 is left after AMD have produced their 40 cards.

nashathedog · 14 Jul 2015 at 00:45

Boomstick777 said:
Agreed with the other peeps: for notebooks, APUs, etc. this will be pure awesomeness.

Once it's all up and running as hoped/intended then sure, but these things have a tendency to hit a lot of sleeping policemen before getting to the point where they're running smoothly.

Kaapstad · 14 Jul 2015 at 00:47

Mauller said:
HBM has a wide interface, which increases the bandwidth: more pins at lower clocks are used to transfer more data at lower power. It's not a problem with the clock of the RAM, as the system is completely different to how GDDR5 works. Wide instead of high clocks.

Higher clocks are used to increase interface bandwidth over fewer pins, as the information is transmitted at higher frequencies.

Higher resolution is less CPU-bound compared to 1080p, so you tend to hit GPU hardware limits rather than API limits at higher resolution. But you already know that.

And yet it is pretty easy to demonstrate that HBM is not good @1080p, every Fury X review has shown this. 1080p needs high clockspeed not a wide bus.

The problem with the Fury X is the exact opposite of the GTX 980 but for the same reason. The GTX 980 suffers at high resolutions because it is set up exactly opposite to the Fury X.

GTX 980 - Narrow 256 bit bus + very fast clocked memory = very good 1080p and bad 2160p performance.

Fury X - Wide 4096 bit bus + very slow clocked memory = bad 1080p and very good 2160p performance.

The two cards have the same problem but coming at it from opposite directions.

And I do wish people would stop working out bandwidth by multiplying bus width x clockspeed, it does not work like that.

Final8y · 14 Jul 2015 at 01:22

Kaapstad said:
And yet it is pretty easy to demonstrate that HBM is not good @1080p, every Fury X review has shown this. 1080p needs high clockspeed not a wide bus.

The problem with the Fury X is the exact opposite of the GTX 980 but for the same reason. The GTX 980 suffers at high resolutions because it is set up exactly opposite to the Fury X.

GTX 980 - Narrow 256 bit bus + very fast clocked memory = very good 1080p and bad 2160p performance.

Fury X - Wide 4096 bit bus + very slow clocked memory = bad 1080p and very good 2160p performance.

The two cards have the same problem but coming at it from opposite directions.

And I do wish people would stop working out bandwidth by multiplying bus width x clockspeed, it does not work like that.

Sorry, but you have got it all wrong. The reason why the 290X/390X and the Fiji are better at higher resolution is their superior bandwidth.

Lower resolution needs less bandwidth, so bandwidth is not the bottleneck; CPU and driver overhead are. That's why both the 290 and the Fiji are less impressive at lower resolutions. Bandwidth issues would only get worse at higher resolutions, not better like you are implying. If memory bandwidth were the issue at 1080p, it would only get worse at 2160p and up, so the memory bus is not the issue.
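
A rough sketch of the scaling argument above: a 2160p frame has four times the pixels of a 1080p frame, so the per-frame traffic to the framebuffer grows with it. The helper name and the 4 bytes per pixel figure are assumptions for illustration only; real games touch memory far more than once per pixel, so treat this as a lower bound on how the load scales, not a measurement.

```python
# Illustrative only: framebuffer traffic scales with pixel count.
# BYTES_PER_PIXEL is an assumed figure (32-bit colour), not a measurement.
BYTES_PER_PIXEL = 4

def frame_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Bytes needed to write every pixel of one frame exactly once."""
    return width * height * bytes_per_pixel

fhd = frame_bytes(1920, 1080)   # 1080p
uhd = frame_bytes(3840, 2160)   # 2160p

# The ratio is what matters for the argument, not the absolute numbers.
print(fhd, uhd, uhd / fhd)
```

If memory bandwidth were the limit at 1080p, the same card would be further over that limit at 2160p, which is the opposite of what the reviews show.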

Mauller · 14 Jul 2015 at 01:25

Kaapstad said:
And I do wish people would stop working out bandwidth by multiplying bus width x clockspeed, it does not work like that.

You are the one who is confused. You can either go wide at lower clocks using more pins, or go to higher clocks over fewer pins; you end up with the same bandwidth. A 32MHz signal over 2 pins has the same overall bandwidth as a 64MHz signal over 1 pin. Data for DDR memory is transferred on both the rising edge and the falling edge of the signal.

https://en.wikipedia.org/wiki/Memory_bandwidth#Bandwidth_computation_and_nomenclature look it up yourself.

Although it does not work in the same way, you can use the analogy of fluid dynamics through pipes.

You either move the same volume at higher velocity down a thin pipe or at lower velocity down a wider pipe. You end up with the same volume of water moved in a period of time.
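
A minimal sketch of that arithmetic, assuming theoretical peak bandwidth is simply data pins x clock x transfers per clock. The helper name is hypothetical, and the Fury X and GTX 980 clock figures are publicly listed specs rather than numbers stated in this thread (only the 4096-bit and 256-bit bus widths are).

```python
# Hypothetical helper for illustration; not from any library.
def peak_bandwidth_mb_s(data_pins, clock_mhz, transfers_per_clock):
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    megabits_per_second = data_pins * clock_mhz * transfers_per_clock
    return megabits_per_second / 8

# The toy example from the post: DDR moves 2 transfers per clock per pin.
wide_slow = peak_bandwidth_mb_s(data_pins=2, clock_mhz=32, transfers_per_clock=2)
narrow_fast = peak_bandwidth_mb_s(data_pins=1, clock_mhz=64, transfers_per_clock=2)

# Assumed, publicly listed figures (not stated in this thread):
# Fury X HBM: 4096-bit at 500 MHz, double data rate.
# GTX 980 GDDR5: 256-bit at 1750 MHz command clock, 4 transfers per clock.
fury_x = peak_bandwidth_mb_s(4096, 500, 2)
gtx_980 = peak_bandwidth_mb_s(256, 1750, 4)

print(wide_slow, narrow_fast)          # wide-and-slow vs narrow-and-fast
print(fury_x / 1000, gtx_980 / 1000)   # converted to GB/s
```

This is peak interface bandwidth only; it says nothing about latency or how well a given game keeps the bus busy.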

Plus the Fury X has greater bandwidth so will perform better overall in bandwidth constrained scenarios.

You also completely ignored the part where I mentioned Greg's own review. The majority of review sites did not test with Mantle, whereas Greg showed that the Fury X still has a lot more grunt to show at lower resolutions. But it is being held back by overhead.

Kaapstad · 14 Jul 2015 at 01:34

Final8y said:
Sorry, but you have got it all wrong. The reason why the 290 and the Fiji are better at higher resolution is their superior bandwidth.

Lower resolution needs less bandwidth, so bandwidth is not a bottleneck; CPU and driver overhead are. That's why both the 290 and the Fiji are less impressive at lower resolutions. Bandwidth issues would only get worse at higher resolutions, not better like you are implying. If memory bandwidth were the issue at 1080p, it would only get worse at 2160p and up, so the memory bus is not the issue.

Mauller said:
You are the one who is confused. You can either go wide at lower clocks using more pins, or go to higher clocks over fewer pins; you end up with the same bandwidth. A 32MHz signal over 2 pins has the same overall bandwidth as a 64MHz signal over 1 pin. Data for DDR memory is transferred on both the rising edge and the falling edge of the signal.

Although it does not work in the same way, you can use the analogy of fluid dynamics through pipes.

You either move the same volume at higher velocity down a thin pipe or at lower velocity down a wider pipe. You end up with the same volume of water moved in a period of time.

Plus the Fury X has greater bandwidth so will perform better overall in bandwidth constrained scenarios.

You also completely ignored the part where I mentioned Greg's own review. The majority of review sites did not test with Mantle, whereas Greg showed that the Fury X still has a lot more grunt to show at lower resolutions. But it is being held back by overhead.

This reminds me of when the GTX 970 and 980 came out and I said they were not good @2160p due to their bus lol.

All you guys have got to do to prove me wrong is to show the Fiji cards doing better @1080p than GM 200 based cards, they have more bandwidth after all.

As I said to the NVidia guys at the time newer drivers won't improve things much if the cards are coming up short at one resolution and not the other, it is a hardware level problem.

Final8y · 14 Jul 2015 at 01:40

Kaapstad said:
This reminds me of when the GTX 970 and 980 came out and I said they were not good @2160p due to their bus lol.

All you guys have got to do to prove me wrong is to show the Fiji cards doing better @1080p than GM 200 based cards, they have more bandwidth after all.

As I said to the NVidia guys at the time newer drivers won't improve things much if the cards are coming up short at one resolution and not the other, it is a hardware level problem.

It's nothing to do with proving anything; it's basic fundamentals, and all it's doing is reminding me of your mini DP thread and your assumption that it had less bandwidth than full-size DP.

I'm going to agree to disagree because I'm not going to waste a second more with you on this subject.

Kaapstad · 14 Jul 2015 at 01:48

Final8y said:
It's nothing to do with proving anything; it's basic fundamentals, and all it's doing is reminding me of your mini DP thread and your assumption that it had less bandwidth than full-size DP.

I'm going to agree to disagree because I'm not going to waste a second more with you on this subject.

The basics of it is HBM does not perform @1080p, prove me wrong.

D.P. · 14 Jul 2015 at 01:48

Kaapstad said:
This reminds me of when the GTX 970 and 980 came out and I said they were not good @2160p due to their bus lol.

All you guys have got to do to prove me wrong is to show the Fiji cards doing better @1080p than GM 200 based cards, they have more bandwidth after all.

As I said to the NVidia guys at the time newer drivers won't improve things much if the cards are coming up short at one resolution and not the other, it is a hardware level problem.

On that I agree. The other proof that it is not a driver issue is that the CrossFire performance is also pretty good with Fury X. Driver and API overhead pretty much double going CrossFire, as every GPU requires its own draw calls. Every draw call will be sent in duplicate to every GPU, and only there will any culling and clipping be performed.

You would also see 1080p being much faster than the 980 Ti in games with fewer draw calls, resulting in lower driver and API overhead. The results are pretty consistent. Even the new driver that boosted draw call performance on Hawaii and Tonga cards didn't improve performance on Fiji, because it wasn't a driver issue.

I don't know if it is an HBM issue; I don't think it is. I think it is just an architectural issue. Fiji mostly gains in pixel shaders, but if a game is not bottlenecked by pixel shaders then you won't see any performance difference to Hawaii.

Mauller · 14 Jul 2015 at 01:50

Kaapstad said:
All you guys have got to do to prove me wrong is to show the Fiji cards doing better @1080p than GM 200 based cards, they have more bandwidth after all.

You don't read, have a short memory, or have some other problem. I never said that the memory had anything to do with its 1080p Performance, that was you. I was correcting you about how the lower clock speed of HBM is not the problem.

But here is Greg's video showing the Fury X in Mantle neck and neck with his Titan X.

Greg's video

http://www.tomshardware.co.uk/amd-radeon-r9-fury-x,review-33235-6.html
An example review at 1440p, but it shows the Fury X behind the Titan X at lower than 2160p. You can look for others.

Final8y · 14 Jul 2015 at 01:57

D.P. said:
On that I agree. The other proof that it is not a driver issue is that the CrossFire performance is also pretty good with Fury X. Driver and API overhead pretty much double going CrossFire, as every GPU requires its own draw calls. Every draw call will be sent in duplicate to every GPU, and only there will any culling and clipping be performed.

You would also see 1080p being much faster than the 980 Ti in games with fewer draw calls, resulting in lower driver and API overhead. The results are pretty consistent. Even the new driver that boosted draw call performance on Hawaii and Tonga cards didn't improve performance on Fiji, because it wasn't a driver issue.

I don't know if it is an HBM issue; I don't think it is. I think it is just an architectural issue. Fiji mostly gains in pixel shaders, but if a game is not bottlenecked by pixel shaders then you won't see any performance difference to Hawaii.

When it comes to multi-GPU it's more than a matter of draw calls; it's a completely different ball game. They are not comparable.

Kaapstad · 14 Jul 2015 at 01:59

Mauller said:
You don't read, have a short memory, or have some other problem. I never said that the memory had anything to do with its 1080p Performance, that was you. I was correcting you about how the lower clock speed of HBM is not the problem.

But here is Greg's video showing the Fury X in Mantle neck and neck with his Titan X.

Greg's video

http://www.tomshardware.co.uk/amd-radeon-r9-fury-x,review-33235-6.html
An example review at 1440p, but it shows the Fury X behind the Titan X at lower than 2160p. You can look for others.

HBM and the clockspeed it uses are linked, strangely enough.

As for Mantle, until GM200 cards can run it there is no point in using it as an example.

Mantle-based games also tend to favour AMD cards.