Low cost HBM on the way, will hit mass market soon. Plus: HBM3 teased, up to 64GB

shankly1985 · 22 Aug 2016 at 04:08

HBM3 is being worked on by SK Hynix and Samsung and will offer up to 64GB VRAM at higher speeds than HBM2, but a low-cost version of HBM is also in the works, which will feature less bandwidth but a lower cost point than HBM1 and HBM2.
The new low-cost HBM will feature increased pin speeds, from the 2Gbps on HBM2 to around 3Gbps on the new low-cost HBM while the memory bandwidth shifts from 256GB/sec per DRAM stack, to around 200GB/sec per stack. This means the upcoming low-cost HBM could reach the mass market, so we could be looking at HBM-powered notebooks and consumer graphics cards, more so than just the three from AMD that we have now in the Radeon R9 Fury X, Radeon R9 Fury and R9 Nano graphics cards.
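
The per-stack figures follow from pin speed times interface width. A rough sketch of that arithmetic: the 1024-bit width for HBM2 and especially the 512-bit width for low-cost HBM are assumptions not stated in the article, and the function name is made up for illustration.

```python
def stack_bandwidth_gb_s(pin_speed_gbps: float, interface_width_bits: int) -> float:
    """Peak bandwidth of one stack in GB/s: per-pin rate times pin count, bits to bytes."""
    return pin_speed_gbps * interface_width_bits / 8


# HBM2: 2 Gbps per pin; 1024-bit interface per stack (width assumed, not in the article).
hbm2 = stack_bandwidth_gb_s(2.0, 1024)

# Low-cost HBM: ~3 Gbps per pin; 512-bit interface is an ASSUMPTION, not in the article.
low_cost = stack_bandwidth_gb_s(3.0, 512)

print(f"HBM2 per stack:         {hbm2:.0f} GB/s")
print(f"Low-cost HBM per stack: {low_cost:.0f} GB/s")
```

Under those assumptions the faster pins do not make up for the narrower interface, which would be consistent with the "around 200GB/sec" quoted above.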

Read more: http://www.tweaktown.com/news/53536....it&utm_medium=twitter&utm_campaign=tweaktown

HBM3 is in development, will reportedly have twice the bandwidth and a cheaper price attached to it

When the first wave of HBM arrived, we were blown away by its bandwidth (512GB/sec) but it was the form factor that really made me take a step back, allowing for super-fast graphics cards like the Radeon R9 Nano from AMD. Well, HBM2 is already here and used by NVIDIA on their Pascal-based Tesla P100 graphics card, but not in the consumer space... yet.

SK Hynix and Samsung are working on new HBM technologies, with HBM3 sitting at the top of the hill. HBM3 will offer twice the bandwidth while also featuring a lower cost. Right now, HBM3 is known in multiple forms - SK Hynix refers to it as HBM3 or HBMx, while Samsung calls it xHBM or Extreme HBM. Either way, the next generation HBM technology is an improvement over both of its predecessors in HBM1 and HBM2. HBM2 offers 256GB/sec of bandwidth per stack of DRAM (1024GB/sec total with four stacks), while HBM3 doubles that to 512GB/sec per stack (2TB/sec+ total) of memory bandwidth. Better yet, HBM3 should usher in higher-end graphics cards with 64GB of HBM3, which will just be incredible. I don't think we'll see HBM3 on consumer graphics cards anytime soon, but the low-cost HBM technology that is on the way will instead be used - that or GDDR5 and GDDR5X which still offer great performance.

Read more: http://www.tweaktown.com/news/53535....it&utm_medium=twitter&utm_campaign=tweaktown

Kaapstad · 22 Aug 2016 at 06:14

I don't want HBM on any graphics card until it offers high clockspeed.

HBM2 is ok for NVidia Pascal Tesla P100 cards as they don't need high clockspeeds.

mustrum · 22 Aug 2016 at 07:01

Kaapstad said:
I don't want HBM on any graphics card until it offers high clockspeed.

HBM2 is ok for NVidia Pascal Tesla P100 cards as they don't need high clockspeeds.

And why is that? I had a Fury X before this Titan X Pascal. The Fury X was never memory-bandwidth limited. Even at 500MHz, HBM gen 1 offers incredible bandwidth.
Bandwidth is what counts, not clockspeed.
The Titan X Pascal has a lot of bandwidth, but to achieve that they have to use GDDR5X and a very wide memory interface, which is costly to produce.

With the Titan X Pascal you see the end of the line for non-HBM memory on high-end GPUs, unless someone dares to bring a 512-bit interface, which I seriously doubt.

nashathedog · 22 Aug 2016 at 07:29

I think having HBM affects how well a card overclocks, and that's why Nvidia went with GDDR5X instead. Nvidia's cards use GPU Boost, which relies on good clocking. As for AMD, none of their stuff clocks that well in comparison, and more importantly they do not have a GPU Boost style system, so it's not as important.

Bonjour · 22 Aug 2016 at 08:08

Whack it on a Zen APU, please.

D.P. · 22 Aug 2016 at 08:11

mustrum said:
And why is that? I had a Fury X before this Titan X Pascal. The Fury X was never memory-bandwidth limited. Even at 500MHz, HBM gen 1 offers incredible bandwidth.
Bandwidth is what counts, not clockspeed.
The Titan X Pascal has a lot of bandwidth, but to achieve that they have to use GDDR5X and a very wide memory interface, which is costly to produce.

With the Titan X Pascal you see the end of the line for non-HBM memory on high-end GPUs, unless someone dares to bring a 512-bit interface, which I seriously doubt.

This is a bit mixed up. HBM on the Fury X achieved its bandwidth with a very wide interface and low clock speeds. The Titan X Maxwell and Titan X Pascal achieve high bandwidth with a moderately wide interface and high clock speeds.

The 290X and 390X have a wide 512-bit interface and high clock speeds, so it is certainly feasible to do that again for a halo product. A 512-bit GDDR5X interface with 14Gbps chips provides mountains of bandwidth that will last at least another two generations of architectures. However, if HBM2/3 can be mass-produced at suitable price points then it is less likely Nvidia or AMD will go that route. It is clear with Pascal that Nvidia developed solutions for both routes but ultimately could only use HBM2 on the Tesla parts due to cost and availability. Conversely, since there isn't even a whiff of a high-end AMD competitor, it seems like they banked on HBM2 being widely available and saved the R&D of a GDDR5X backup plan.
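
To put a number on "mountains of bandwidth": peak GDDR5X bandwidth is bus width times per-pin data rate. A quick sketch, assuming the Titan X Pascal's 384-bit bus with 10Gbps GDDR5X (figures not given in this thread) next to the hypothetical 512-bit, 14Gbps card:

```python
def gddr_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s for a GDDR-style interface: pins times per-pin rate, in bytes."""
    return bus_width_bits * data_rate_gbps / 8


# Titan X Pascal: 384-bit bus, 10 Gbps GDDR5X (assumed figures, not from the thread).
titan_x_pascal = gddr_bandwidth_gb_s(384, 10.0)

# Hypothetical halo card from the post: 512-bit bus, 14 Gbps GDDR5X chips.
halo_512_bit = gddr_bandwidth_gb_s(512, 14.0)

print(f"Titan X Pascal (384-bit, 10 Gbps): {titan_x_pascal:.0f} GB/s")
print(f"Hypothetical 512-bit, 14 Gbps:     {halo_512_bit:.0f} GB/s")
```

On these assumptions the hypothetical card would land well above the Fury X's 512GB/s and close to the 1024GB/s quoted for HBM2 in the article.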

Anyway, good news that HBM is being rapidly iterated. The first version left a lot to be desired, and although HBM2 looks much better, the cost is still very high. There is also a lot of talk that HBM is just a stopgap because it too will very soon require far too much power to hit the necessary bandwidth.

Kaapstad · 22 Aug 2016 at 08:26

mustrum said:
And why is that? I had a Fury X before this Titan X Pascal. The Fury X was never memory-bandwidth limited. Even at 500MHz, HBM gen 1 offers incredible bandwidth.
Bandwidth is what counts, not clockspeed.
The Titan X Pascal has a lot of bandwidth, but to achieve that they have to use GDDR5X and a very wide memory interface, which is costly to produce.

With the Titan X Pascal you see the end of the line for non-HBM memory on high-end GPUs, unless someone dares to bring a 512-bit interface, which I seriously doubt.

So should I upgrade from a GTX 960 to a 1060, 1070 or Fury X?

nashathedog · 22 Aug 2016 at 10:01

As mentioned by D.P., faster GDDR5X than the currently used version will be available. I believe the current one is GDDR5X in its slowest form, so there's no urgency for HBM2 on the current gen or the next. I'd say the smart money is on GDDR5X for a couple more years.

Bonjour said:
Whack it on a Zen APU, please.

It'd also do a good job if it was used with the next consoles.

Lokken86 · 22 Aug 2016 at 10:09

Kaapstad said:
I don't want HBM on any graphics card until it offers high clockspeed.

HBM2 is ok for NVidia Pascal Tesla P100 cards as they don't need high clockspeeds.

What does clock speed matter? The Fury X has a whopping 4096-bit memory bus that does 512GB/s at just 500MHz.
Imagine what HBM3 with 1TB/s on a card like the Pascal Titan X could do. Even if that only runs at 1000MHz, who cares?
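
The Fury X figure can be checked with a little arithmetic. A minimal sketch, assuming the memory transfers data twice per clock (double data rate), which the post does not state; the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, clock_mhz: float,
                        transfers_per_clock: int = 2) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 1000 MB).

    transfers_per_clock=2 is an assumption: double data rate signalling.
    """
    megabits_per_second = bus_width_bits * clock_mhz * transfers_per_clock
    megabytes_per_second = megabits_per_second / 8
    return megabytes_per_second / 1000


fury_x = peak_bandwidth_gb_s(4096, 500)
same_bus_at_1000mhz = peak_bandwidth_gb_s(4096, 1000)

print(f"4096-bit at 500 MHz:  {fury_x:.0f} GB/s")
print(f"4096-bit at 1000 MHz: {same_bus_at_1000mhz:.0f} GB/s")
```

So on the same 4096-bit width, a 1000MHz clock would be enough for roughly 1TB/s.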

Yaayuh! · 22 Aug 2016 at 10:49

Bandwidth and frequency go hand in hand. Having such a bandwidth without the throughput isn't very efficient. We want both really.

JediFragger · 22 Aug 2016 at 11:01

nashathedog said:
I think having HBM affects how well a card overclocks, and that's why Nvidia went with GDDR5X instead. Nvidia's cards use GPU Boost, which relies on good clocking. As for AMD, none of their stuff clocks that well in comparison, and more importantly they do not have a GPU Boost style system, so it's not as important.

I don't think overclocking has much to do with it tbh, AMD have been pushing the limits on their cards for a while now just to remain competitive

Lokken86 · 22 Aug 2016 at 11:02

Rossi~ said:
Bandwidth and frequency go hand in hand. Having such a bandwidth without the throughput isn't very efficient. We want both really.

No, this is not correct. Bandwidth is the total throughput of the memory. It is the rate at which the data is transferred and is the only thing that matters. Bandwidth is derived from multiple factors including the frequency, bus width, timings etc. of the memory.

Kaapstad · 22 Aug 2016 at 11:12

Lokken86 said:
No, this is not correct. Bandwidth is the total throughput of the memory. It is the rate at which the data is transferred and is the only thing that matters. Bandwidth is derived from multiple factors including the frequency, bus width, timings etc. of the memory.

A 10 billion bit bus running at 1MHz would be the worst memory ever, but its bandwidth would be fantastic lol.

We need clockspeed too.

NVidia Pascal Titans spotted running with HBM.

Lokken86 · 22 Aug 2016 at 11:36

Kaapstad said:
A 10 billion bit bus running at 1MHz would be the worst memory ever, but its bandwidth would be fantastic lol.

We need clockspeed too.

Explain to me why clock speed matters so much because I don't think so.

benjii · 22 Aug 2016 at 12:14

Lokken86 said:
Explain to me why clock speed matters so much because I don't think so.

Think of it like a hose pipe. A fireman's hose can allow a load more water to pass through it than your average garden hose, but if you've hooked the fireman's hose up to the outside tap in your garden, there's no point using the fireman's hose.

h4rm0ny · 22 Aug 2016 at 12:23

benjii said:
Think of it like a hose pipe. A fireman's hose can allow a load more water to pass through it than your average garden hose, but if you've hooked the fireman's hose up to the outside tap in your garden, there's no point using the fireman's hose.

That's the wrong way round. It's a question of how much water you can handle at the other end, not how much you can push through. I.e. there's no point having more memory bandwidth than the GPU itself can reasonably use.

However, that's not what they were asking. Or rather you're conflating memory frequency with GPU frequency. They're asking why double the width and half the VRAM speed would be worse than double the VRAM speed on the same width when both give the same memory bandwidth.

uZi · 22 Aug 2016 at 12:26

benjii said:
Think of it like a hose pipe. A fireman's hose can allow a load more water to pass through it than your average garden hose, but if you've hooked the fireman's hose up to the outside tap in your garden, there's no point using the fireman's hose.

Care to explain it in actual hardware terms instead of an analogy that doesn't mean anything?

AFAIK (in simple terms), the clockspeed is the number of times a second you can write/read to/from memory. The bus width is the amount of data you can push to memory in one of those clock cycles.

Unless you actually need to read/write something smaller than the bus width more than a million times a second, why is a clockspeed of 1MHz any less useful than 1GHz?
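
That condition can be made concrete with a toy model. A sketch, assuming one bus-width transfer per clock and no other latency, with two made-up configurations that have the same peak bandwidth (1 GB/s); none of these numbers describe real hardware.

```python
import math


def transfer_time_s(request_bytes: int, bus_width_bytes: int, clock_hz: float) -> float:
    """Time to move one request, assuming one bus-width transfer per clock cycle."""
    cycles = math.ceil(request_bytes / bus_width_bytes)
    return cycles / clock_hz


# Two hypothetical buses with identical peak bandwidth of 1e9 bytes/s.
WIDE_SLOW = (1000, 1e6)   # 1000 bytes wide at 1 MHz
NARROW_FAST = (1, 1e9)    # 1 byte wide at 1 GHz

for size in (64, 1_000_000):
    wide = transfer_time_s(size, *WIDE_SLOW)
    narrow = transfer_time_s(size, *NARROW_FAST)
    print(f"{size:>9} bytes: wide/slow {wide:.2e} s, narrow/fast {narrow:.2e} s")
```

In this model the two buses tie on the large transfer, and the wide, slow bus only loses when the request is much smaller than the bus width, because most of each cycle's width goes unused.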

Burr · 22 Aug 2016 at 12:32

Shame the article doesn't mention anything about power draw. I imagine SK Hynix and Samsung must have found a way to keep the power in check.
https://postimg.org/image/mnow5cfml/

HBM3 + GDDR6: http://www.overclock3d.net/news/gpu_displays/hbm3_and_gddr6_have_been_detailed_at_hot_chips_28/1

benjii · 22 Aug 2016 at 12:42

I was mistaken on how HBM worked, my apologies. I just did some reading on it and essentially it's like this:

GDDR5 has a much higher clock speed than HBM. However, that speed is largely compensating for a narrow bus. To widen the bus on GDDR5 you need more chips, and the 512-bit interface that AMD achieved in recent generations is at the limit of the space that can be allocated for memory.

HBM tackles this by massively widening the bus through stacking chips, and while the clock speed is significantly lower, it is able to utilise that width to the fullest. This means that despite HBM running at a lower clock, its overall transfer rate is drastically higher, as it has considerably more avenues for read/write.

eddyr · 22 Aug 2016 at 12:53

Some confusion of frequency with latency/response time (GPU request -> RAM access -> GPU) is going on here perhaps. Latency involves a lot more than just the frequency at which the DRAM or controller is running: it also depends on the arbitration logic and the delays contributed by the various subsystems between where the data is and where it is needed. The severity of the delay's effect on performance will depend on the workload.

You could raise the frequency and receive a performance benefit in a workload until it reaches a threshold where its contribution and benefit are minimal to none, as the delays in other subsystems become the major remaining limitation.
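
A toy model of that threshold, purely for illustration: access time is a fixed delay from everything else in the path plus a DRAM part that shrinks with clock speed. The 20 cycles and 100ns below are hypothetical values, not measurements of HBM or any GPU.

```python
def access_time_ns(memory_clock_mhz: float, dram_cycles: int = 20,
                   other_delay_ns: float = 100.0) -> float:
    """Total access time: fixed delay elsewhere plus clock-dependent DRAM time.

    dram_cycles and other_delay_ns are made-up illustrative numbers.
    """
    dram_time_ns = dram_cycles / memory_clock_mhz * 1000  # cycles / MHz = microseconds
    return other_delay_ns + dram_time_ns


previous = None
for clock_mhz in (250, 500, 1000, 2000, 4000):
    total = access_time_ns(clock_mhz)
    gain = "" if previous is None else f" (saved {previous - total:.1f} ns)"
    print(f"{clock_mhz:>5} MHz: {total:.1f} ns{gain}")
    previous = total
```

Each doubling of the clock saves half as much as the previous one, and the total never drops below the fixed delay, which is the diminishing return described above.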

I don't think anyone here, including me, is in a position to determine what that is for HBM+GPU generally or when looking at HBM as a standalone mem technology. About the best we can say is that it seemed inadequate at times on HBM+Fiji for certain gaming workloads. Simply saying HBM needs high clock speed to be any good is missing it.