
AMD Locks Down HBM Frequency on Fiji Cards

The thing is GDDR5 was reaching its maximum.

HBM is almost as fast as having on-chip integrated RAM.

The main point with HBM is that it offers 3x the bandwidth per watt.

Which means you can either have smaller graphics cards, or use the surplus power budget on the core to make faster ones.
 
Hi,

You're entitled to make your own speculative reasoning, informed or uninformed, just as much as you're entitled to not care either way; no peer pressure here. If everything was simply waiting for and relaying the official line laid out by manufacturers, we might as well all be robots.

Fair play; it doesn't change anything from what we know though, apart from speculative negativity from certain quarters, but that's the way some people on the forum operate.

I'm enjoying your politeness btw, it's much easier to have a convo. Keep it up. :)

Bye for now. :)
 
Fair play; it doesn't change anything from what we know though, apart from speculative negativity from certain quarters, but that's the way some people on the forum operate.

Bye. :)

Hi, not sure what the latter part of your post is in relation to here. I made it clear in my opening post that personally it would not stop me from buying this card (I can't say 'The Fury' as it sounds too dramatic).
 
There are LOTS of things people don't understand about HBM, so I'll try to cover a few points without too much info and without being too long... I said TRY :p

1/ 512GB/s from GDDR5 and 512GB/s from HBM aren't equivalent. HBM makes more efficient use of its bandwidth. For many reasons you never get 'full' bandwidth; HBM and HMC are designed for higher bandwidth efficiency, meaning you can use more of the available bandwidth. 512GB/s of GDDR5 may only give 70% effective bandwidth, where HBM might give 85%.

In that scenario, while 512GB/s is 60% more than 320GB/s on paper, factoring in the efficiency you actually have 435GB/s vs 224GB/s, which is over a 90% increase in effective bandwidth. I don't know the actual efficiency numbers for either, but I have seen in many papers/articles that HBM and HMC are designed to increase efficiency by a decent amount over GDDR5. Basically, 512GB/s of HBM is a significantly bigger increase in bandwidth than most people think.

2/ With current GPUs you can scale bandwidth/clock speeds easily, so memory speed gets tuned to what the GPU actually needs at stock; anything beyond that is wasted power. With HBM they are running at the lowest clocks/voltages possible. There are upper and lower voltage/clock limits for all chips, and HBM isn't likely tuned to what the GPU core needs at stock speed; it's simply the lowest bandwidth possible with HBM at the lowest clocks, with the lowest number of stacks that currently gets you 4GB. So while 99% of GPUs have just the bandwidth they need at stock, Fiji (coupled with point 1) likely has WAY more bandwidth than is currently required, meaning increasing core clocks shouldn't make it bandwidth limited.

3/ HBM, HMC, and really all stacked chips have complex temperature monitoring and throttling. Increasing speed, voltage, and temperature would improve speed for the top chips but ultimately cause more throttling on the bottom chips anyway, which would often cause uneven performance: if the data is in the top chip it's faster, but if it's in the bottom chip, which may be throttled or temporarily turned off, it can be slower.

So overclocking may not improve performance and could potentially decrease it.

4/ I honestly haven't seen it discussed anywhere, but HBM might well have static clocks anyway and be unable to overclock at all. In the future this could prove a problem; hopefully AMD/Nvidia will factor in some bandwidth headroom. Then again it may not work out that way: as bandwidth goes through the roof, architectures will be tuned towards utilising the higher bandwidth better, and in 1-2 generations they might depend on massive bandwidth, be tuned right on the limit, and become bandwidth limited when overclocking.
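For what it's worth, the effective-bandwidth arithmetic in point 1 can be sanity-checked; the 70%/85% utilisation figures are that point's illustrative assumptions, not measured numbers for real GDDR5/HBM parts:

```python
# Sanity check of the effective-bandwidth example in point 1. The 70%/85%
# utilisation figures are illustrative assumptions, not measured values.

def effective_bandwidth(peak_gb_s: float, efficiency: float) -> float:
    """Usable bandwidth after applying a utilisation factor."""
    return peak_gb_s * efficiency

gddr5_effective = effective_bandwidth(320.0, 0.70)    # 224.0 GB/s
hbm_effective = effective_bandwidth(512.0, 0.85)      # 435.2 GB/s

peak_gain = 512.0 / 320.0 - 1                         # 60% on paper
effective_gain = hbm_effective / gddr5_effective - 1  # ~94% in practice

print(f"peak gain: {peak_gain:.0%}, effective gain: {effective_gain:.0%}")
```

So under those assumed utilisation factors, a 60% paper advantage becomes roughly a 94% effective one.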
 
With all the arguing about how good HBM is or not, have people missed the obvious?

In AMD's leaked benchmarks the Fury X does slightly better than a 980 Ti; if the card had been using GDDR5 instead, the bench results look like they would have come out about the same.
 
1/ 512GB/s from GDDR5 and 512GB/s from HBM aren't equivalent. HBM makes more efficient use of its bandwidth. For many reasons you never get 'full' bandwidth; HBM and HMC are designed for higher bandwidth efficiency, meaning you can use more of the available bandwidth. 512GB/s of GDDR5 may only give 70% effective bandwidth, where HBM might give 85%.

In that scenario, while 512GB/s is 60% more than 320GB/s on paper, factoring in the efficiency you actually have 435GB/s vs 224GB/s, which is over a 90% increase in effective bandwidth. I don't know the actual efficiency numbers for either, but I have seen in many papers/articles that HBM and HMC are designed to increase efficiency by a decent amount over GDDR5. Basically, 512GB/s of HBM is a significantly bigger increase in bandwidth than most people think.

Please give me one proper source that supports this. Since when is there some massive overhead in accessing VRAM?

Power efficiency is much greater with HBM (short signal traces, etc.). It's what has allowed AMD to allocate more power budget to compute and pack in 4096 shaders.

I'm not sure where you have invented this "bandwidth efficiency" stuff from.
 
I'd be inclined to agree that the reason for the lockdown is that the clock speed is either static no matter what, or the overclocking headroom is merely single-digit clock speed improvements offering little if any benefit, with a strong chance of breaking the hardware.

Core clock speed, on the other hand, probably has a boatload of headroom for 10%+ or much higher increases; there's already a massive amount of headroom in the cooling available.
 
With all the arguing about how good HBM is or not, have people missed the obvious?

In AMD's leaked benchmarks the Fury X does slightly better than a 980 Ti; if the card had been using GDDR5 instead, the bench results look like they would have come out about the same.

I think HBM has massive potential for APUs, and even CPUs. APUs have always been limited by the bandwidth of system RAM.

Give me an i3 with 2GB HBM, an i5 with 4GB, and an i7 with 4GB+DDR3 memory controller, and I'll bite your hand off.
 
There are LOTS of things people don't understand about HBM, so I'll try to cover a few points without too much info and without being too long... I said TRY :p

1/ 512GB/s from GDDR5 and 512GB/s from HBM aren't equivalent. HBM makes more efficient use of its bandwidth. For many reasons you never get 'full' bandwidth; HBM and HMC are designed for higher bandwidth efficiency, meaning you can use more of the available bandwidth. 512GB/s of GDDR5 may only give 70% effective bandwidth, where HBM might give 85%.

In that scenario, while 512GB/s is 60% more than 320GB/s on paper, factoring in the efficiency you actually have 435GB/s vs 224GB/s, which is over a 90% increase in effective bandwidth. I don't know the actual efficiency numbers for either, but I have seen in many papers/articles that HBM and HMC are designed to increase efficiency by a decent amount over GDDR5. Basically, 512GB/s of HBM is a significantly bigger increase in bandwidth than most people think.

2/ With current GPUs you can scale bandwidth/clock speeds easily, so memory speed gets tuned to what the GPU actually needs at stock; anything beyond that is wasted power. With HBM they are running at the lowest clocks/voltages possible. There are upper and lower voltage/clock limits for all chips, and HBM isn't likely tuned to what the GPU core needs at stock speed; it's simply the lowest bandwidth possible with HBM at the lowest clocks, with the lowest number of stacks that currently gets you 4GB. So while 99% of GPUs have just the bandwidth they need at stock, Fiji (coupled with point 1) likely has WAY more bandwidth than is currently required, meaning increasing core clocks shouldn't make it bandwidth limited.

3/ HBM, HMC, and really all stacked chips have complex temperature monitoring and throttling. Increasing speed, voltage, and temperature would improve speed for the top chips but ultimately cause more throttling on the bottom chips anyway, which would often cause uneven performance: if the data is in the top chip it's faster, but if it's in the bottom chip, which may be throttled or temporarily turned off, it can be slower.

So overclocking may not improve performance and could potentially decrease it.

4/ I honestly haven't seen it discussed anywhere, but HBM might well have static clocks anyway and be unable to overclock at all. In the future this could prove a problem; hopefully AMD/Nvidia will factor in some bandwidth headroom. Then again it may not work out that way: as bandwidth goes through the roof, architectures will be tuned towards utilising the higher bandwidth better, and in 1-2 generations they might depend on massive bandwidth, be tuned right on the limit, and become bandwidth limited when overclocking.

Static clocks may well be the reason, with error correction in tow. I'm not really sure where you're pulling these theoretical bandwidth numbers from, though. That said, there's no denying efficiency will be much improved.
 
The latency from the signal travelling along the circuit board, at the speed of light, is nothing.

We don't actually know what the memory timings are like; I suspect that, measured in nanoseconds, they will be similar to other memory.

The things you have named (lower power, smaller footprint) are big advantages, but they don't have anything to do with how fast it is.

It's 33% faster than the 390X, not 800% faster.

An SSD is no faster on a 1-inch SATA cable than it is on an 18-inch one. mSATA SSDs tend to be slower.

Not true I'm afraid; it gets quite significant over even a fairly short trace.

I'd agree it's not 800% faster, but I'd disagree that a straight throughput comparison is valid, as the advantages from eliminating the need to handle varying trace lengths etc. are totally ignored in that case.
 
With all the arguing about how good HBM is or not, have people missed the obvious?

In AMD's leaked benchmarks the Fury X does slightly better than a 980 Ti; if the card had been using GDDR5 instead, the bench results look like they would have come out about the same.

We don't really have the data to support that one way or the other, and neither can we really see how much impact, if any, architecture changes have on how useful overclocking HBM would be. In the future that might change, but I don't see changes big enough compared to current GPUs to significantly alter the story. By and large, outside of a limited number of synthetic benchmarks, VRAM overclocking doesn't generally yield much unless you're trying to get that last 0.1% for a benchmark world record.

Not true I'm afraid; it gets quite significant over even a fairly short trace.

I'd agree it's not 800% faster, but I'd disagree that a straight throughput comparison is valid, as the advantages from eliminating the need to handle varying trace lengths etc. are totally ignored in that case.

It's far from something I have much knowledge of, but IIRC even over short traces you have to deal with the wavelength of the signal as well as the interconnect length. With longer traces you also potentially have to start dealing with parasitics and other interference, etc., which can cause extra latency if you're having to deal with them through extra tolerances, error correction, cleaning up the signal, etc.
 
It may be that AMD have decided, after many years of DDR memory usage, that they do not want to let Joe Public loose overclocking, as much smaller increments are required with HBM than were used on DDR. It would seem a sensible precaution if damage can be caused.
 
The latency from the signal travelling along the circuit board, at the speed of light, is nothing.

We don't actually know what the memory timings are like; I suspect that, measured in nanoseconds, they will be similar to other memory.

Latency improvements come more from fewer logic controllers managing larger amounts of memory, so you have a logic controller managing 4 × 8-Hi stacks compared to a logic controller per memory die.

Also, signal propagation depends on the material used; I'm not sure what it is for copper, but it can vary from 50-99% of the speed of light. Shorter traces also have less interference, which improves latency over the trace. It's why RAM slots are so close to the CPU instead of being spread out.

I'm sure I read that HBM is bit-writable, compared to block-writable like GDDR. This also improves read/write latency, but don't quote me.
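As a rough sketch of the scale involved: taking the low end of that 50-99% propagation range, and some illustrative (assumed, not measured) lengths for a GDDR5 trace on the PCB versus an HBM link on the interposer, the one-way flight time is tiny either way:

```python
# Rough one-way propagation delay over a trace. The 50% velocity factor is
# the low end of the 50-99% range mentioned above; both trace lengths are
# illustrative assumptions, not measurements of any real board.

C = 299_792_458.0  # speed of light in vacuum, m/s

def trace_delay_ns(length_m: float, velocity_factor: float) -> float:
    """One-way propagation delay in nanoseconds."""
    return length_m / (C * velocity_factor) * 1e9

pcb_delay = trace_delay_ns(0.05, 0.5)          # ~5 cm GDDR5 trace on the PCB
interposer_delay = trace_delay_ns(0.003, 0.5)  # ~3 mm link on the interposer

print(f"PCB trace: {pcb_delay:.3f} ns, interposer: {interposer_delay:.3f} ns")
```

Both come out well under a nanosecond, against DRAM access latencies of tens of nanoseconds, which fits the point that the latency win comes more from the controller arrangement than from the distance itself.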
 
It may be that AMD have decided, after many years of DDR memory usage, that they do not want to let Joe Public loose overclocking, as much smaller increments are required with HBM than were used on DDR. It would seem a sensible precaution if damage can be caused.

I agree with this; I wouldn't let some people on these forums near overclocking HBM, never mind Joe Public. :p:p
 
We don't really have the data to support that one way or the other, and neither can we really see how much impact, if any, architecture changes have on how useful overclocking HBM would be. In the future that might change, but I don't see changes big enough compared to current GPUs to significantly alter the story. By and large, outside of a limited number of synthetic benchmarks, VRAM overclocking doesn't generally yield much unless you're trying to get that last 0.1% for a benchmark world record.



It's far from something I have much knowledge of, but IIRC even over short traces you have to deal with the wavelength of the signal as well as the interconnect length. With longer traces you also potentially have to start dealing with parasitics and other interference, etc., which can cause extra latency if you're having to deal with them through extra tolerances, error correction, cleaning up the signal, etc.

You're right, there are lots of engineering challenges with fast buses, especially fast, wide buses. If you have the RAM on the board you need to deal with trace lengths, signal reflection, etc., and you need lots of error correction to handle it.

This is partly why GDDR5 is more power-hungry than, say, HBM. All that error correction and driving the signal at higher voltages has a cost.

But the resultant implementation, be it 512-bit at 6Gbps or 384-bit at 7Gbps, is what you've been able to achieve once you've dealt with the above, so 384GB/s is still 384GB/s.

HBM is revolutionary and it lays the starting point for more revolution, but it's still only 33% faster than the GDDR5 on the 390X.
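That 33% figure falls straight out of peak bandwidth = bus width × per-pin data rate, as a quick check shows (390X: 512-bit GDDR5 at 6Gbps per pin; Fury X: 4096-bit HBM1 at 1Gbps per pin):

```python
# Peak memory bandwidth = bus width (bits) x per-pin data rate (Gbps) / 8.

def peak_bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s."""
    return bus_bits * gbps_per_pin / 8

r9_390x = peak_bandwidth_gb_s(512, 6.0)   # 512-bit GDDR5 -> 384 GB/s
fury_x = peak_bandwidth_gb_s(4096, 1.0)   # 4096-bit HBM1 -> 512 GB/s

print(f"390X: {r9_390x:.0f} GB/s, Fury X: {fury_x:.0f} GB/s, "
      f"gain: {fury_x / r9_390x - 1:.0%}")
```

HBM1 gets there with a massively wider bus at a much lower per-pin rate, which is exactly why the power story improves even though the headline throughput gain is only 33%.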
 
I think HBM has massive potential for APUs, and even CPUs. APUs have always been limited by the bandwidth of system RAM.

Give me an i3 with 2GB HBM, an i5 with 4GB, and an i7 with 4GB+DDR3 memory controller, and I'll bite your hand off.

This is what I have been thinking about: in theory they will dominate laptop gaming with an APU combined with HBM. The next gen of consoles will be interesting to look at too.
 