Will HBM be an actual benefit on a consumer GPU anytime soon?

Caracus2k · 10 Jun 2018 at 21:41

OK, so trying to avoid a red vs green argument on current cards, and remembering that you can currently buy cards from both teams with HBM which you can game on (even if the green team's Titan V isn't really a gaming card, just a card you can game on) .....

Is there likely to be any point, for consumers, in GPUs that employ HBM any time soon?

When the first HBM cards came out from AMD, it seemed to me that a lot of the opinion at the time was that AMD had paved the way by being an early adopter of this type of memory, and that within a generation or two all higher-end cards from both teams would utilise it?

March forward a few years and AMD have kept plugging away with HBM, but it's not given them any appreciable edge over cards using GDDR memory (yes, I know there are other considerations here). GDDR tech seems to have kept advancing in frequencies, to the point where using HBM in a consumer card looks to be an expensive way to gain basically no more usable performance in the consumer sphere?

Zeed · 10 Jun 2018 at 21:56

When I look at it... NOPE, basically a failed technology for gaming cards.

Ian Evey · 10 Jun 2018 at 22:54

AMD needed it to reduce power and still be competitive in the mid to upper-mid range.

Kaapstad · 10 Jun 2018 at 23:29

HBM has managed to throttle every gaming card I have used including the Titan V.

Bandwidth is no substitute for raw clockspeed.

Titan V only gets interesting when the memory has a very big overclock.

Caracus2k · 11 Jun 2018 at 07:52

Okay, so pretty much as I thought: no sign yet that HBM will be a must-have for consumer GPUs any time soon (if ever?)

David Bisset · 11 Jun 2018 at 09:17

HBM is the better tech - its current weakness is price. This may or may not prevent it becoming the standard for gaming cards, but the tech itself is better, so assuming costs can be controlled it should become adopted across a broader price range.

Yaayuh! · 11 Jun 2018 at 10:59

HBM is very nice for mining.

Kaapstad · 11 Jun 2018 at 11:33

David Bisset said:
HBM is the better tech - its current weakness is price. This may or may not prevent it becoming the standard for gaming cards, but the tech itself is better, so assuming costs can be controlled it should become adopted across a broader price range.

It is different tech and great for mining or professional work but is not very good for gaming.

Horses for courses.

Rroff · 11 Jun 2018 at 11:41

HBM has its uses but I'm not sure it is going to distinguish itself on gaming cards any time soon - shrinks beyond 1xnm will continue to buy GDDR more time.

JediFragger · 11 Jun 2018 at 11:50

By the time HBM is fast enough to become useful on gaming cards then the next big thing will already be out.

FredFlint · 11 Jun 2018 at 16:30

HBM3 was said to be the point when the price would drop. HBM1 = first gen, lower capacity and high price; HBM2 = HBM1 + higher capacity and bandwidth; HBM3 = HBM2 + lower cost. Don't know if it still applies, as memory is at a silly price generally.

LoadsaMoney · 11 Jun 2018 at 17:01

For gaming, no, GDDR's fine for that.

LePhuronn · 11 Jun 2018 at 17:04

With GDDR6 happening I don't see HBM being viable for gaming cards any time soon.

Kei · 11 Jun 2018 at 17:41

Kaapstad said:
Bandwidth is no substitute for raw clockspeed.

That doesn't make any sense as bandwidth is determined by both the bus width and the clock speed.

Theoretical Bandwidth = memory bus clock rate × pump rate (multiplier for effective frequency, 2 for HBM, 4 for GDDR & 8 for GDDR5/5X) × bus width / 8 (number of bits/byte)

So for a stock vs stock comparison
1080ti - (1376x8x352)/8 = 484.4GB/s
Vega 64 - (945x2x2048)/8 = 483.8GB/s - a difference of 0.6GB/s in favour of GDDR5x

For the average overclock of +155 you get:
1080ti - (1531x8x352)/8 = 538.9GB/s
Vega 64 - (1100x2x2048)/8 = 563.2GB/s - a difference of 24.3GB/s in favour of HBM

With higher overclocking of +225 you get:
1080ti - (1601x8x352)/8 = 563.6GB/s
Vega 64 - (1170x2x2048)/8 = 599.0GB/s - a difference of 35.5GB/s in favour of HBM
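
For anyone who wants to redo the sums, the formula above can be sketched in a few lines of Python. This is only an illustration: the function name `theoretical_bandwidth_gbs` and the `CARDS` table are made up for the example, and the clocks, pump rates and bus widths are simply the figures quoted in this post.

```python
def theoretical_bandwidth_gbs(clock_mhz, pump_rate, bus_width_bits):
    """Theoretical memory bandwidth in GB/s (taking 1 GB = 1000 MB).

    clock_mhz * pump_rate is the number of mega-transfers per second;
    multiplying by the bus width in bits and dividing by 8 gives MB/s.
    """
    mb_per_s = clock_mhz * pump_rate * bus_width_bits / 8
    return mb_per_s / 1000


# Stock figures as quoted in the post (the dict layout is an assumption).
CARDS = {
    "1080 Ti": {"clock_mhz": 1376, "pump_rate": 8, "bus_width_bits": 352},
    "Vega 64": {"clock_mhz": 945, "pump_rate": 2, "bus_width_bits": 2048},
}

# Stock, +155 MHz and +225 MHz memory clock offsets.
for offset in (0, 155, 225):
    for name, spec in CARDS.items():
        bw = theoretical_bandwidth_gbs(
            spec["clock_mhz"] + offset,
            spec["pump_rate"],
            spec["bus_width_bits"],
        )
        print(f"{name} +{offset} MHz: {bw:.1f} GB/s")
```

These are theoretical peaks only; they say nothing about how much of that bandwidth a game actually uses.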

HBM gains a lot more from overclocking, as the fixed bus width is the biggest factor in the equation, whereas the fixed factors with GDDR5 are both quite small, meaning the frequency change needs to be much, much bigger. In principle, whilst you get the power savings from HBM, you should also gain a fair bit from latency reduction, as the HBM sits on the same package as the GPU.
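
To put a rough number on the overclocking point: under the same formula, each extra MHz on the memory clock is worth pump rate × bus width / 8 megabytes per second. A small sketch, where `gain_per_mhz` is a made-up helper name and the inputs are the 1080 Ti and Vega 64 figures used above:

```python
def gain_per_mhz(pump_rate, bus_width_bits):
    """Extra theoretical bandwidth in MB/s for each +1 MHz of memory clock."""
    return pump_rate * bus_width_bits / 8


gddr5x = gain_per_mhz(pump_rate=8, bus_width_bits=352)   # 1080 Ti figures
hbm2 = gain_per_mhz(pump_rate=2, bus_width_bits=2048)    # Vega 64 figures

print(f"1080 Ti: {gddr5x:.0f} MB/s per MHz")
print(f"Vega 64: {hbm2:.0f} MB/s per MHz")
print(f"Ratio:   {hbm2 / gddr5x:.2f}x")
```

So for the same MHz offset the wide HBM bus picks up roughly 1.45 times as much theoretical bandwidth, which is where the growing gap in the overclocked figures comes from.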

Looking at the specs of cards that use HBM, it's obvious that gaming is an afterthought, as their FP16/32/64 compute performance is way, way higher than that of a Titan Xp. I'd hazard a guess that the pixel fill rate goes some way to explaining why Vega's gaming performance is rather average considering the sheer horsepower on tap, same story for Titan V.

Code:

                1080Ti             Titan Xp           Titan V            Vega 64
Pixel Rate      139.2 GPixel/s     151.9 GPixel/s     139.7 GPixel/s     98.30 GPixel/s
Texture Rate    354.4 GTexel/s     379.7 GTexel/s     465.6 GTexel/s     393.2 GTexel/s
FP16 (half)     177.2 GFLOPS       189.8 GFLOPS       29,798 GFLOPS      25,166 GFLOPS
FP32 (float)    11,340 GFLOPS      12,150 GFLOPS      14,899 GFLOPS      12,583 GFLOPS
FP64 (double)   354.4 GFLOPS       379.7 GFLOPS       7,450 GFLOPS       786.4 GFLOPS
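
One rough way to look at the fill-rate guess is to divide each card's FP32 figure by its pixel rate. The sketch below does just that with the numbers from the table; the `SPECS` dict and the `flops_per_pixel` helper are made-up names for illustration, and the ratio is a back-of-envelope indicator, not a benchmark.

```python
# Figures copied from the table: (pixel rate in GPixel/s, FP32 in GFLOPS).
SPECS = {
    "1080Ti": (139.2, 11_340),
    "Titan Xp": (151.9, 12_150),
    "Titan V": (139.7, 14_899),
    "Vega 64": (98.30, 12_583),
}


def flops_per_pixel(pixel_rate_gpix, fp32_gflops):
    """FP32 throughput relative to fill rate (GFLOPS per GPixel/s)."""
    return fp32_gflops / pixel_rate_gpix


for card, (pixel_rate, fp32) in SPECS.items():
    ratio = flops_per_pixel(pixel_rate, fp32)
    print(f"{card:<9} {ratio:6.1f} FP32 FLOPs per pixel of fill rate")
```

A higher number just means more shader throughput relative to fill rate; on its own it doesn't prove where any particular game is bottlenecked.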

melmac · 11 Jun 2018 at 18:10

I think it's too early in the life of HBM to make any sort of call on it being a success or failure. Though it does seem like its value is limited at the moment for gaming.

Kaapstad · 11 Jun 2018 at 20:24

Kei said:
That doesn't make any sense as bandwidth is determined by both the bus width and the clock speed.

Theoretical Bandwidth = memory bus clock rate × pump rate (multiplier for effective frequency, 2 for HBM, 4 for GDDR & 8 for GDDR5/5X) × bus width / 8 (number of bits/byte)

So for a stock vs stock comparison
1080ti - (1376x8x352)/8 = 484.4GB/s
Vega 64 - (945x2x2048)/8 = 483.8GB/s - a difference of 0.6GB/s in favour of GDDR5x

For the average overclock of +155 you get:
1080ti - (1531x8x352)/8 = 538.9GB/s
Vega 64 - (1100x2x2048)/8 = 563.2GB/s - a difference of 24.3GB/s in favour of HBM

With higher overclocking of +225 you get:
1080ti - (1601x8x352)/8 = 563.6GB/s
Vega 64 - (1170x2x2048)/8 = 599.0GB/s - a difference of 35.5GB/s in favour of HBM

HBM gains a lot more from overclocking, as the fixed bus width is the biggest factor in the equation, whereas the fixed factors with GDDR5 are both quite small, meaning the frequency change needs to be much, much bigger. In principle, whilst you get the power savings from HBM, you should also gain a fair bit from latency reduction, as the HBM sits on the same package as the GPU.

Looking at the specs of cards that use HBM, it's obvious that gaming is an afterthought, as their FP16/32/64 compute performance is way, way higher than that of a Titan Xp. I'd hazard a guess that the pixel fill rate goes some way to explaining why Vega's gaming performance is rather average considering the sheer horsepower on tap, same story for Titan V.

Code:

                1080Ti             Titan Xp           Titan V            Vega 64
Pixel Rate      139.2 GPixel/s     151.9 GPixel/s     139.7 GPixel/s     98.30 GPixel/s
Texture Rate    354.4 GTexel/s     379.7 GTexel/s     465.6 GTexel/s     393.2 GTexel/s
FP16 (half)     177.2 GFLOPS       189.8 GFLOPS       29,798 GFLOPS      25,166 GFLOPS
FP32 (float)    11,340 GFLOPS      12,150 GFLOPS      14,899 GFLOPS      12,583 GFLOPS
FP64 (double)   354.4 GFLOPS       379.7 GFLOPS       7,450 GFLOPS       786.4 GFLOPS

You are missing the point, for gaming all the bandwidth that comes with HBM is not needed but high clockspeed is. Also latency is not as important as clockspeed.

A gaming card is like a motorbike, you only need a narrow road even if you are doing 200mph.

Rroff · 11 Jun 2018 at 20:28

Kaapstad said:
You are missing the point, for gaming all the bandwidth that comes with HBM is not needed but high clockspeed is. Also latency is not as important as clockspeed.

A gaming card is like a motorbike, you only need a narrow road even if you are doing 200mph.

Latency and clockspeed have a relationship - I know what you are seeing though - just the overall way HBM has been implemented tends to make it more suited to shifting certain types of data in an optimal way versus other i.e. "big" compute tasks versus gaming and in some cases you really have to clock up the HBM a lot to overcome that.

IMO though without a deeper analysis it comes more down to how the data is being batched up, queued and use of caching and so on than purely down to bandwidth or latency, etc.

Kaapstad · 11 Jun 2018 at 20:44

Rroff said:
Latency and clockspeed have a relationship - I know what you are seeing though - just the overall way HBM has been implemented tends to make it more suited to shifting certain types of data in an optimal way versus other i.e. "big" compute tasks versus gaming and in some cases you really have to clock up the HBM a lot to overcome that.

IMO though without a deeper analysis it comes more down to how the data is being batched up, queued and use of caching and so on than purely down to bandwidth or latency, etc.

This^

KillBoY_UK · 11 Jun 2018 at 20:55

David Bisset said:
HBM is the better tech - its current weakness is price. This may or may not prevent it becoming the standard for gaming cards, but the tech itself is better, so assuming costs can be controlled it should become adopted across a broader price range.

Better in bandwidth, sure, if pushed. On cost, no, and it makes the package it's connected to much, much more difficult to manufacture.

Kei · 11 Jun 2018 at 21:26

Rroff said:
Latency and clockspeed have a relationship - I know what you are seeing though - just the overall way HBM has been implemented tends to make it more suited to shifting certain types of data in an optimal way versus other i.e. "big" compute tasks versus gaming and in some cases you really have to clock up the HBM a lot to overcome that.

IMO though without a deeper analysis it comes more down to how the data is being batched up, queued and use of caching and so on than purely down to bandwidth or latency, etc.

It would suggest that games are not supplying enough bits to saturate the bus per clock. Or to put it another way, not enough big chunks of data infrequently vs lots of little chunks of data very frequently.

So in the same theory as before:
1080Ti - 352 bits, 11,008,000,000 times a second
Vega 64 - 2048 bits, 1,890,000,000 times a second
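
The same two lines can be worked out from the stock clocks. A minimal sketch, assuming the clock and pump-rate figures used earlier in the thread (`transfer_profile` is a made-up helper name):

```python
def transfer_profile(clock_mhz, pump_rate, bus_width_bits):
    """Return (transfers per second, bytes moved per transfer)."""
    transfers_per_s = clock_mhz * 1_000_000 * pump_rate
    bytes_per_transfer = bus_width_bits // 8
    return transfers_per_s, bytes_per_transfer


for name, args in {
    "1080Ti": (1376, 8, 352),
    "Vega 64": (945, 2, 2048),
}.items():
    transfers, chunk = transfer_profile(*args)
    ns_between = 1e9 / transfers  # nanoseconds between bus transfers
    print(f"{name}: {chunk} bytes per transfer, "
          f"{transfers:,} transfers/s, one every {ns_between:.3f} ns")
```

So the GDDR5X card moves small chunks very often and the HBM card moves big chunks less often, for near enough the same total. Whether a game's access pattern actually fills those big chunks is the open question.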

More analysis on this would be a great insight.