AMD Working On An Entire Range of HBM GPUs To Follow Fiji And Fury Lineup – Has Priority To HBM2 Cap

Davedree · 14 Jul 2015 at 02:03

Kaapstad said:
The basics of it is HBM does not perform @1080p, prove me wrong.

Firstly, does anyone know whether Tonga (R9 285) has 512 KB or 1024 KB of L2 cache?
Because Fury is just Tonga (1792/2048 shaders) scaled up by 2x (3584/4096).
So if you were to run two 4 GB Tonga 285s in CrossFire at the same clocks as the 3584-shader Fury at 1080p, 1440p and 4K, then in theory the 285s should never beat Fury at 1080p.

Edited for mistakes.

Mauller · 14 Jul 2015 at 02:08

Kaapstad said:
HBM and the clockspeed it uses are linked strangely enough.

As for Mantle until GM200 cards can run it there is no point in using it as an example.

Mantle-based games also tend to favour AMD cards.

Now you are just spewing nonsense. All memory is linked to its clock, just as it is linked to its bus/line width. And like I said, it's either high clock and narrow, or wide and low clock. You can get the same bandwidth in the end.

You only have to look at the DX11 results to see the AMD cards behind the Nvidia ones. But you are only saying the games "favour" AMD because the cards end up ahead in Mantle mode, when they are no longer held back by overhead.
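
To put rough numbers on the wide-and-slow versus narrow-and-fast point, here is a minimal sketch of the usual peak-bandwidth formula (bus width times per-pin data rate, divided by 8). The figures plugged in are the commonly published ones for Fiji's HBM (4096-bit at 1 Gbps per pin) and a 384-bit GDDR5 card at 7 Gbps per pin; treat them as assumptions, and the results as theoretical peaks rather than measured throughput. The 512-bit case is purely hypothetical.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s: pins * per-pin rate / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps_per_pin / 8


# Assumed figures: wide and slow (HBM) versus narrow and fast (GDDR5).
hbm = peak_bandwidth_gbs(4096, 1.0)
gddr5 = peak_bandwidth_gbs(384, 7.0)

# Hypothetical narrow bus clocked high enough to match the wide one.
narrow_match = peak_bandwidth_gbs(512, 8.0)

print(f"HBM   4096-bit @ 1 Gbps/pin: {hbm:.0f} GB/s")
print(f"GDDR5  384-bit @ 7 Gbps/pin: {gddr5:.0f} GB/s")
print(f"Hypothetical 512-bit @ 8 Gbps/pin: {narrow_match:.0f} GB/s")
```

The formula only says the two routes can reach the same peak figure; it says nothing about latency or how well a given workload keeps the bus busy.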

Kaapstad · 14 Jul 2015 at 02:13

Mauller said:
Now you are just spewing nonsense. All memory is linked to its clock, just as it is linked to its bus/line width. And like I said, it's either high clock and narrow, or wide and low clock. You can get the same bandwidth in the end.

You only have to look at the DX11 results to see the AMD cards behind the Nvidia ones. But you are only saying the games "favour" AMD because the cards end up ahead in Mantle mode, when they are no longer held back by overhead.

As I said earlier, prove me wrong and show me an HBM Fiji card outperforming the GM200 cards @1080p.

It is a known fact that the Fiji cards are not good @1080p !!!

Mauller · 14 Jul 2015 at 02:20

Kaapstad said:
As I said earlier, prove me wrong and show me an HBM Fiji card outperforming the GM200 cards @1080p.

It is a known fact that the Fiji cards are not good @1080p !!!

I did, I gave you a link: Greg's Fury X being neck and neck with his Titan X.

Plus the FX doing better by a large % at 1080p in Mantle shows a driver/API issue, not a hardware one. Regardless of the comparison being with a Titan X, the FX is still beating itself in Mantle compared to DX11. Also, no reviewers used Mantle for their Fury X benchmarks to show this, since AMD asked them not to because the driver is immature, yet it still shows a decent uplift.

Going to finish here since there is no point continuing this conversation.

humbug · 14 Jul 2015 at 02:34

He's right ^^^^

Kaapstad said:
Driver works fine @2160p

Problem is HBM and its low clockspeed which is not much use for 1080p and very high fps.

Kaap, with respect, it's absolutely nothing to do with the clock speed. The memory bandwidth is what it is regardless of how it's achieved; that is the performance of it.

Kaapstad · 14 Jul 2015 at 02:43

humbug said:
He's right ^^^^

Kaap, with respect, it's absolutely nothing to do with the clock speed. The memory bandwidth is what it is regardless of how it's achieved; that is the performance of it.

If we ever see a dramatic improvement in the Fiji 1080p performance you guys can say you were right; until then the facts speak for themselves.

Fiji and HBM are not great at 1080p.

nashathedog · 14 Jul 2015 at 02:49

You can't use a game on different APIs to show how a card will keep up with another card at 1080p. That's ridiculous.

humbug · 14 Jul 2015 at 03:08

Kaapstad said:
If we ever see a dramatic improvement in the Fiji 1080p performance you guys can say you were right; until then the facts speak for themselves.

Fiji and HBM are not great at 1080p.

What's the point? You see it all the time. Mauller has already explained it, as have I, and it just gets excluded; the only explanation that's acceptable is one that leaves things unchanged.

The card performing better where the driver overhead matters less is a reasonable test to prove the concept. But a higher resolution (which shifts load from the CPU to the GPU) and a low-overhead API are excluded because they do something different to what causes the bottleneck; therefore a cause is only acceptable as long as it's not recognised as a bottleneck.

This isn't so much a debate about what's going on as it is about putting forward an argument and trying to make it stick. Very unfortunate, but par for the course with hardware enthusiasts: it's a war about one's favourite team.
I do wonder sometimes, Kaap. Most of the time you seem like a well-balanced guy with no emotional attachment to either brand, unlike some people, yet you post about people not understanding bottlenecks and in the next post dismiss what is an obvious bottleneck with the Fiji chips.

nashathedog said:
You can't use a game on different APIs to show how a card will keep up with another card at 1080p. That's ridiculous.

Rroff · 14 Jul 2015 at 04:04

I don't believe HBM is what is holding the Fury back at 1080p-type resolutions, or at least it's not the whole story. When you start pushing things like SPs and so on up to huge counts, it becomes increasingly hard to fully utilise those capabilities, and a 4K workload tends to suit that kind of thing better.

Aside from process limitations and new hardware features, it's part of the reason we aren't on variants of the R600 or G92 core but with, say, 16384 shaders, 2048 ROPs, etc.

Silent_Scone · 14 Jul 2015 at 08:10

It says top to bottom solutions. What's the point in buying Fury now? Not that you can find any even if you wanted to.

I knew this would happen. I'd like to hope it's at least 6 to 8 months away.

andybird123 · 14 Jul 2015 at 08:25

Historically, AMD have always been "first" to a new process. But the fact is that NVIDIA sell more cards than AMD, so AMD buying up all the HBM2 capacity will just mean they are sitting on a bunch of stock that they aren't selling. People are far more likely to wait for an NVIDIA release than they are for an AMD release.

andybird123 · 14 Jul 2015 at 08:29

humbug said:
He's right ^^^^

Kaap, with respect, it's absolutely nothing to do with the clock speed. The memory bandwidth is what it is regardless of how it's achieved; that is the performance of it.

Actually, speed is relevant as it also relates to latency. Whilst slow, wide RAM will have the same throughput as narrow, fast RAM, if you have less data to process (as with 1080p vs higher resolutions) the bus width is less of an advantage, so processing narrower widths faster means your actual latency and throughput are better with the faster RAM.

With memory running at 500 MHz, each access takes a minimum of 2 ns... With memory running at 6000 MHz, each access takes a minimum of about 0.17 ns. So yeah, where you need high throughput they even out, but at lower throughput levels GDDR has a clear advantage.

This is likely why AMD talk about better memory management being needed with HBM, as they need to rework the memory loads to try to maximise the data being read each pass and minimise the number of calls.
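
For reference, the per-cycle times for those two clocks come straight from period = 1 / frequency. This is a minimal sketch in nanoseconds; note that one clock period is only a lower bound on a transfer step, not the full access latency of a DRAM read, and `clock_period_ns` is just an illustrative helper name.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds for a frequency given in MHz."""
    return 1_000.0 / freq_mhz


slow_wide = clock_period_ns(500)     # the 500 MHz figure quoted for HBM
fast_narrow = clock_period_ns(6000)  # the 6000 MHz effective figure quoted for GDDR5

print(f"500 MHz:  {slow_wide:.2f} ns per cycle")
print(f"6000 MHz: {fast_narrow:.2f} ns per cycle")
```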

Orangey · 14 Jul 2015 at 09:44

Silent_Scone said:
It says top to bottom solutions. What's the point in buying Fury now? Not that you can find any even if you wanted to.

I knew this would happen. I'd like to hope it's at least 6 to 8 months away.

There isn't even a suitable process available @ GF yet, I'd say it's at least that yes.

mmj_uk · 14 Jul 2015 at 10:41

AllBodies said:
4GB would be extremely good for an APU. It will be some time before an APU could use more than 4GB, since it'll run out of processing power before memory. Remember there's no benefit in the R9 390X having 8GB of memory unless you crossfire it.

That would be in the context of gaming though, if we're talking workstation APUs then that could be a different story.

I was thinking more along the lines of a desktop Fury Nano with a decent CPU and unified HBM embedded. It should be doable with AIO cooling, though I wouldn't want such an APU with only 4GB. I think the best thing that HBM will enable going forward is high performance unified memory.

rtho782 · 14 Jul 2015 at 10:48

I wonder if nVidia will actually use HMC not HBM, it's what they originally said they were working on, and Intel are shortly releasing a product with it - http://anandtech.com/show/9436/quick-note-intel-knights-landing-xeon-phi-omnipath-100-isc-2015

Note that the pictured Pascal we've seen doesn't have a silicon interposer. http://cdn.wccftech.com/wp-content/uploads/2014/03/NVIDIA-Pascal-GPU-Chip-Module.jpg

andybird123 · 14 Jul 2015 at 11:01

rtho782 said:
I wonder if nVidia will actually use HMC not HBM, it's what they originally said they were working on, and Intel are shortly releasing a product with it - http://anandtech.com/show/9436/quick-note-intel-knights-landing-xeon-phi-omnipath-100-isc-2015

Note that the pictured Pascal we've seen doesn't have a silicon interposer. http://cdn.wccftech.com/wp-content/uploads/2014/03/NVIDIA-Pascal-GPU-Chip-Module.jpg

I would imagine that pic is basically just a mock-up and not an actual working final product.

ALXAndy · 14 Jul 2015 at 11:02

Shows a clear and resolute plan for 4k and I like that. It will definitely replace 1080p and will make PC gaming elite again, rather than just "Prettier than console but same resolution".

rtho782 · 14 Jul 2015 at 11:08

andybird123 said:
I would imagine that pic is basically just a mock-up and not an actual working final product.

I imagine you're right that it's a mockup, but HMC doesn't seem to require an interposer and is what nVidia were originally working on. Seems at least a possibility.

Kaapstad · 14 Jul 2015 at 11:37

andybird123 said:
Actually, speed is relevant as it also relates to latency. Whilst slow, wide RAM will have the same throughput as narrow, fast RAM, if you have less data to process (as with 1080p vs higher resolutions) the bus width is less of an advantage, so processing narrower widths faster means your actual latency and throughput are better with the faster RAM.

With memory running at 500 MHz, each access takes a minimum of 2 ns... With memory running at 6000 MHz, each access takes a minimum of about 0.17 ns. So yeah, where you need high throughput they even out, but at lower throughput levels GDDR has a clear advantage.

This is likely why AMD talk about better memory management being needed with HBM, as they need to rework the memory loads to try to maximise the data being read each pass and minimise the number of calls.

+1

You worded it a lot better than me.

Gibbo also said that clockspeed was important the other day.

Mauller · 14 Jul 2015 at 11:38

andybird123 said:
Actually, speed is relevant as it also relates to latency. Whilst slow, wide RAM will have the same throughput as narrow, fast RAM, if you have less data to process (as with 1080p vs higher resolutions) the bus width is less of an advantage, so processing narrower widths faster means your actual latency and throughput are better with the faster RAM.

With memory running at 500 MHz, each access takes a minimum of 2 ns... With memory running at 6000 MHz, each access takes a minimum of about 0.17 ns. So yeah, where you need high throughput they even out, but at lower throughput levels GDDR has a clear advantage.

This is likely why AMD talk about better memory management being needed with HBM, as they need to rework the memory loads to try to maximise the data being read each pass and minimise the number of calls.

Memory latency does not work like that. The chip read latency is constant regardless of the interface clock frequency. Latency is measured in clock cycles, which is why DDR4 has higher latency figures than DDR3. And the memory cells being used are no different to those in GDDR5; they are just stacked, etc.
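
As a rough illustration of the cycles-versus-time point, here is a minimal sketch that converts a CAS latency in cycles into nanoseconds. The two timings (DDR3-1600 CL9 and DDR4-2400 CL15) are typical retail values picked as assumptions for the example, not figures from this thread, and `cas_latency_ns` is a hypothetical helper.

```python
def cas_latency_ns(cas_cycles: int, io_clock_mhz: float) -> float:
    """Convert a CAS latency given in clock cycles to nanoseconds."""
    return cas_cycles * 1_000.0 / io_clock_mhz


# Assumed example timings; the I/O clock is half the DDR transfer rate.
ddr3 = cas_latency_ns(9, 800)     # DDR3-1600 CL9
ddr4 = cas_latency_ns(15, 1200)   # DDR4-2400 CL15

print(f"DDR3-1600 CL9:  {ddr3:.2f} ns")
print(f"DDR4-2400 CL15: {ddr4:.2f} ns")
```

With these assumed timings the cycle count rises a lot more than the time in nanoseconds does, which is the sense in which the cycle figure alone can mislead.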

Plus as I showed, the 1080p performance is fine when using mantle so the low 1080p performance is being caused elsewhere than the HBM.

I thought about it last night and I think the cause is 'over-aggressive memory management' in the driver. In Mantle mode, Thief handles memory management itself, so it is not affected by AMD's driver memory management, which I think is what causes lower-res performance to tank.

If you look at game memory usage at lower resolutions, such as in Greg's Pcars video, the FX is using half the RAM of the TX, yet there is plenty of memory to use.