
AMD Fiji HBM limited to 4GB stacked memory

Man of Honour
OP
Joined
21 May 2012
Posts
31,922
Location
Dalek flagship
In reality then these first gen HBM cards are still going to be 4K limited in single GPU use, while they will offer great performance below 4K and will blow pretty much all other GPUs away.

I don't think they will be much faster than cards not using HBM. Once the memory is fast enough to keep up with the GPU, anything over the top is wasted. To get more speed, the GPU itself needs to be faster in order to use the extra memory bandwidth.
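
A minimal sketch of that point, with invented throughput figures: the slower side sets the effective rate, so any memory bandwidth beyond what the GPU can actually consume simply sits idle.

```python
# Minimal sketch (all figures invented for illustration): effective throughput
# is capped by whichever side is slower, so memory bandwidth beyond what the
# GPU can consume is wasted.

def effective_throughput(gpu_consumption_gbps: float, memory_bandwidth_gbps: float) -> float:
    """The slower of the two sides sets the pace."""
    return min(gpu_consumption_gbps, memory_bandwidth_gbps)

gpu_can_consume = 350.0  # hypothetical GB/s the shaders/ROPs can actually make use of

for mem_bw in (320.0, 512.0, 640.0):  # GDDR5-ish figure vs the HBM numbers being quoted
    eff = effective_throughput(gpu_can_consume, mem_bw)
    print(f"memory {mem_bw:5.0f} GB/s -> effective {eff:5.0f} GB/s, idle {mem_bw - eff:5.0f} GB/s")
```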
 
Soldato
Joined
4 Feb 2006
Posts
3,226
Not sure if that package picture is showing the interposer/DRAM layer on top of the GPU itself or whether it is a separate chip. If it is separate, I don't see why they can't have 2 x interposer/DRAM stacks to make it up to 8GB.
 
Caporegime
Joined
18 Oct 2002
Posts
33,188
Jesus, that article and most of the posts in this thread are so completely daft.

One, HBM requires an interposer; HBM2 doesn't change this, and there is no moving to a single chip design. The interposer is the thing that differentiates HBM from HMC: HMC uses a normal off-package design with standard bumps, while HBM is an on-package design.

Second, HBM IS STACKED RAM, Fud, you unbelievably ignorant ****. HBM is a stack of 4 to 8 chips; without chip stacking neither HBM nor HMC could exist. Nvidia are going to be using HBM, they are going to be using an interposer, and they are absolutely not using a "better method than AMD"; they are using the exact same method in exactly the same way: a group of HBM stacks connected via interposer to the GPU to get the memory on package in as low a power and high a bandwidth way as possible.

"Nvidia on the other hand is using Vertical stacking 3D, or on-package sacked DRAM for its Pascal 2016 GPUs".

The first pic he shows in the article is attempting to show vertical stacking of various chips with the memory on top of the processor. That is vertical stacking, and it is absolutely NOT what Nvidia is using. We've seen pics of Pascal designs with the 4 stacks of memory AROUND the GPU, not on top of it. The second part of his sentence calls it on-package; on-package means not ON the chip but on the same package, which is what you do with an interposer.

Almost no one has done this yet; it's barely suitable (as yet) for extremely low power processors and completely unsuitable for high power ones. It reduces cooling, has significantly worse yield implications (and therefore cost) than using an interposer, and offers no particular advantage for discrete GPUs. In an ultra small device, sure, we're talking watches and wearables; even phones don't really need it, though they'll go that way.


"From what we've learned,...., with the current technology the GPU would simply be too big to put on an interposer and package".

You can't connect HBM to a GPU without both the HBM and the GPU being on the interposer, and once they are both on the interposer they are on the same package.

Fud is so utterly stupid and shows such a completely fundamental misunderstanding of the technology that it's painful to read. You can safely discount the entire article because it's complete rubbish from start to finish; he has zero understanding of the actual technologies involved.
 
Associate
Joined
28 Aug 2014
Posts
2,245
So if AMD is saying this, does that not put them in the same boat as Nvidia with the 970 VRAM debacle?

It doesn't say anywhere on those 8GB cards that once they go past 4GB anything more is slower VRAM, does it?
 
Man of Honour
OP
Joined
21 May 2012
Posts
31,922
Location
Dalek flagship
Can't they use the compression technology that was used in Tonga? If they could, 4GB of VRAM should be enough.

Compression only helps to move the data; when it arrives in memory it still needs the same amount of space as on a system that does not use it.

Compression can also hinder performance in a lot of cases, as time and sometimes extra hardware is needed to do the compressing.
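
A minimal sketch of that distinction, with invented figures: lossless framebuffer compression cuts the bytes moved over the bus, but the buffer still has to be allocated at its full uncompressed size, so the capacity cost is unchanged.

```python
# Sketch (invented numbers): compression reduces bus traffic, not the VRAM footprint.

def render_target_cost(width, height, bytes_per_pixel=4, compression_ratio=1.0):
    """Returns (allocated MB, per-pass bus traffic MB) for one colour buffer."""
    full_size_mb = width * height * bytes_per_pixel / 1e6
    allocated_mb = full_size_mb                    # capacity cost does not shrink
    traffic_mb = full_size_mb / compression_ratio  # bandwidth cost does
    return allocated_mb, traffic_mb

for ratio in (1.0, 1.4):  # 1.4:1 is an assumed average ratio, purely for illustration
    alloc, traffic = render_target_cost(3840, 2160, compression_ratio=ratio)
    print(f"{ratio}:1 compression -> allocated {alloc:.1f} MB, per-pass traffic {traffic:.1f} MB")
```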
 
Man of Honour
Joined
13 Oct 2006
Posts
91,716
Should give AMD an entire year with what should amount to an impregnable performance lead in any game that values memory bandwidth, regardless of what chips NVIDIA has.

I wouldn't count nVidia out, though they'd have to stop being cheap on VRAM/interface - manufacturers have started moving to new die-shrunk GDDR5 and it's quite capable (in a 512-bit configuration) of putting up speeds not that far from what is being quoted in the article - supposedly nVidia has a big order in with Micron, though that might not be for desktop GPUs.
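
For rough context, peak memory bandwidth is just bus width times per-pin data rate; the sketch below uses commonly quoted GDDR5 and first-gen HBM figures (treat the exact data rates as assumptions).

```python
# Peak bandwidth = bus width in bytes x per-pin data rate.
# Data rates below are commonly quoted figures, used here for illustration.

def peak_bandwidth_gbps(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
    return bus_width_bits / 8 * data_rate_gbps_per_pin

configs = [
    ("GDDR5, 512-bit @ 5 Gbps (Hawaii-like)",  512, 5.0),
    ("GDDR5, 512-bit @ 7 Gbps (shrunk GDDR5)", 512, 7.0),
    ("HBM, 4 stacks x 1024-bit @ 1 Gbps",     4096, 1.0),
]
for name, width, rate in configs:
    print(f"{name}: {peak_bandwidth_gbps(width, rate):.0f} GB/s")
```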
 
Caporegime
Joined
18 Oct 2002
Posts
33,188
Dunno, I've never really been VRAM bandwidth limited when playing at 4K - clocking the core gave the most significant gains and it was easy to hit VRAM amount limits - 4GB will be less than ideal at 4K.

Remember that architectures are fundamentally designed to work within a given limit, be that shaders, ROPs, bandwidth, whatever. If you have, let's say, 250GB/s then you design a chip that uses that much bandwidth. When you move the goal posts, you move the architecture with them. What compromises are made to reduce bandwidth usage? How much die space and power is dedicated to reducing bandwidth usage that could instead be assigned to more shaders/ROPs/TMUs when you have ample bandwidth, or what else could you add?

Designs change; we'll see how quickly. Personally I expect, more than anything, that this is going to have more bandwidth than it requires - it's more of a test vehicle - but when they get it working well and optimise it, that will all work towards the next design. When they can roll out HBM across the range of things they make, maybe with GCN 2.0/14nm chips, then we might get a design based around the concept of having doubled the available bandwidth.

Comparing today's cards' bandwidth needs to those of a future one is rather arbitrary, in the same way people discounted the 980 for only having a 256-bit bus and therefore not enough bandwidth (I didn't, but specifically pointed out that generation-to-generation comparison is pointless as the way memory is used changes - efficiency of access, compression methods, etc.), yet when it came out people realised that while it had significantly less bandwidth it found new ways to work within a different amount of bandwidth.
 
Man of Honour
OP
Joined
21 May 2012
Posts
31,922
Location
Dalek flagship
Jesus, that article and most of the posts in this thread are so completely daft. [...]

As soon as I posted the article I wondered how long it would take to get a reply out of you.:)
 
Man of Honour
Joined
13 Oct 2006
Posts
91,716
Remember that architectures are fundamentally designed to work within a given limit, be that shaders, ROPs, bandwidth, whatever. [...]

There is no getting around it: if a game has 4+GB of payload, with a large amount of that being game data and then buffers, etc. on top of that, then no matter what architecture changes you're still going to need, one way or the other, a way to store and access that data at reasonable speed. That's a different story to being bandwidth limited (and personally I'm not that convinced by compression anyhow; it's a crutch at best with the 900 series - useful for some situations but not something you can blanket apply without penalties elsewhere).

In-memory compression is of increasingly limited use as devs are finally using things like DXT, custom asset compression/streaming, etc., and you can't compress that data down any more as it's already heavily compressed.
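
As a rough illustration of why already-compressed assets can't usefully be squeezed again: block formats like DXT1/DXT5 have fixed ratios baked in, so the in-memory footprint is already a fraction of the raw size (the texture dimensions below are just an example).

```python
# Standard DXT/BC block sizes; the 4096x4096 texture is an example.

def bc_texture_size_mb(width, height, bytes_per_block, block_dim=4):
    """Size of a block-compressed texture (no mipmaps) in MB."""
    blocks = (width // block_dim) * (height // block_dim)
    return blocks * bytes_per_block / 1e6

w, h = 4096, 4096
raw_mb  = w * h * 4 / 1e6               # uncompressed RGBA8
dxt1_mb = bc_texture_size_mb(w, h, 8)   # DXT1/BC1: 8 bytes per 4x4 block (8:1 vs RGBA8)
dxt5_mb = bc_texture_size_mb(w, h, 16)  # DXT5/BC3: 16 bytes per 4x4 block (4:1 vs RGBA8)
print(f"raw RGBA8: {raw_mb:.1f} MB, DXT1: {dxt1_mb:.1f} MB, DXT5: {dxt5_mb:.1f} MB")
```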
 
Last edited:
Associate
Joined
24 Nov 2010
Posts
2,314
I wouldn't count nVidia out, though they'd have to stop being cheap on VRAM/interface - manufacturers have started moving to new die-shrunk GDDR5 and it's quite capable (in a 512-bit configuration) of putting up speeds not that far from what is being quoted in the article - supposedly nVidia has a big order in with Micron, though that might not be for desktop GPUs.

As Drunkenmaster goes on to explain in much more detail, it's not that simple, and NVIDIA won't be using a GPU architecture designed for high memory bandwidth until H2 '16, even if they do release cards with nominally very high GDDR5-based bandwidth later this year or early next (unlikely). Maxwell was never designed with this in mind. Also, I would assume there has to be some kind of latency / more direct access benefit to having the memory on package rather than on the card itself.
 
Caporegime
Joined
18 Oct 2002
Posts
33,188
I'm not saying more memory isn't required; I was addressing your point that you've never felt bandwidth limited. My point is, no one designs an architecture that requires 600GB/s but only has 250GB/s. Architectural decisions are made as a result of the memory bandwidth available, and if you look around processor designs of any type the biggest limit is memory performance: a HUGE portion of the core logic, even on an ARM chip let alone an Intel chip, is designed around getting data to the core. An ARM core is tiny, and even an Intel core or an AMD module on Bulldozer is a tiny portion of the CPU. The majority is about funnelling a small amount of memory bandwidth into as efficient usage as possible.

So you don't feel memory bandwidth limited because a monumental amount of work goes into masking memory limits. As bandwidth becomes less limited, the architecture will work to a new limit; that is basically how it works. I was just using the 980 as an example: people were all over how terrible it would be on a 256-bit bus with less bandwidth, but once again you see an architecture designed around that amount of memory bandwidth. In the 980's case it was introducing lots of things to maximise efficiency of the limited bandwidth; if you doubled the bandwidth, that work would instead go into filling the memory bandwidth available.

In terms of memory, personally I don't give a crap about 4K; 1440p with a decent screen at a sensible price is next for me. I'll care about memory amount as and when I hit the limit, which I haven't yet and won't on 4GB for some time. When I get a 4K screen I'll want 8GB, but with zero decent 4K gaming screens available that won't be a problem. Even being generous, the earliest we'll likely get 120Hz 4K screens is mid 2016, and for actually good screens with decent pricing add at least another 6 months, maybe a year. I'll be on a different card by then.
 
Soldato
Joined
8 Mar 2010
Posts
4,965
Location
Aberdeenshire
Think of the VRAM like a teapot. Bandwidth is the speed you can pour the tea out; capacity (4GB) is the amount of tea it holds. If you make the spout bigger (more bandwidth) you aren't changing the amount of tea it can hold, just the speed it comes out at.

So in theory this means the VRAM limit will be harder to top out than with standard GDDR5, given its high speed, yes?
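
A tiny sketch of the teapot point, with invented working-set sizes: capacity decides whether the data fits at all, while bandwidth only decides how quickly it can be re-read each frame.

```python
# Capacity vs bandwidth: the working-set sizes are invented examples.

def vram_verdict(vram_gb, bandwidth_gbps, working_set_gb, fps=60):
    if working_set_gb > vram_gb:
        return "capacity-limited: data spills out of VRAM no matter how fast the memory is"
    worst_case_traffic = working_set_gb * fps  # assume the whole set is touched every frame
    if worst_case_traffic > bandwidth_gbps:
        return "bandwidth-limited: can't stream the whole set every frame"
    return "fine: fits and can be fed fast enough"

for ws in (3.5, 4.5, 6.0):
    print(f"4GB / 512GB/s card, {ws}GB working set -> {vram_verdict(4.0, 512.0, ws)}")
```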
 
Associate
Joined
24 Nov 2010
Posts
2,314
Pretty sure you're wrong about 120Hz 4K taking more than 12 months to arrive. I'd expect announcements at Computex this year. There's no technical reason for them not to appear, as DP1.3 controllers/cards will be available by then.
 
Caporegime
Joined
18 Oct 2002
Posts
33,188
I wouldn't count nVidia out, though they'd have to stop being cheap on VRAM/interface - manufacturers have started moving to new die-shrunk GDDR5 and it's quite capable (in a 512-bit configuration) of putting up speeds not that far from what is being quoted in the article - supposedly nVidia has a big order in with Micron, though that might not be for desktop GPUs.

Well, aside from the fact that the bigger the memory bus, the slower it will generally be due to power concerns (slower in terms of the max clocks you'll hit on the memory), the best 512-bit bus to date has produced what, 320GB/s? That isn't close, in any realistic way, to the 512GB/s that should be easy to achieve, let alone the 640GB/s that is being rumoured (likely not correctly, IMHO).

The trouble is that 512GB/s of bandwidth under HBM will use less power than the 320GB/s that 4GB Hawaii provides, and certainly less than if you try to push memory clocks up further. At very best we're probably looking at 400GB/s with a significant increase in clocks, and even higher power usage to go with it.

That is where HBM wins: for any amount of bandwidth GDDR5 can provide, HBM can do it in 30% of the power. 512GB/s has ALWAYS been achievable with GDDR5; it would just take likely a 768-bit bus, or a 512-bit bus with insane memory speeds, and it would probably use up 100-125W of the power on the card... leaving only 125-150W realistically for the GPU itself. HBM can provide that same bandwidth in 30-40W, which in the same situation would leave 210-220W for the GPU inside the same 250W card power budget.

GDDR5 is completely and utterly uncompetitive. If AMD or Nvidia produced an HBM and a GDDR5 version of their latest gen 250W cards, the HBM card would spank the GDDR5 card silly, because with the extra power saved by HBM you could up the GPU clocks 40-50% or increase the shader/ROP/TMU count by 40-50%.
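
A back-of-envelope version of that power argument, reusing the rough wattage estimates from the post above (estimates, not measured figures).

```python
# Power-budget arithmetic using the post's rough estimates.

BOARD_POWER_W = 250  # typical high-end card power budget, as used in the post

def gpu_power_budget(memory_power_w: float) -> float:
    """Whatever the memory subsystem doesn't burn is left over for the GPU core."""
    return BOARD_POWER_W - memory_power_w

gddr5_left = gpu_power_budget(115)  # ~100-125W for a ~512GB/s GDDR5 setup (post's estimate)
hbm_left   = gpu_power_budget(35)   # ~30-40W for the same bandwidth via HBM (post's estimate)

print(f"GDDR5: ~{gddr5_left:.0f}W left for the GPU core")
print(f"HBM:   ~{hbm_left:.0f}W left for the GPU core "
      f"(~{(hbm_left / gddr5_left - 1) * 100:.0f}% more to spend on clocks or units)")
```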
 
Soldato
Joined
25 Apr 2007
Posts
5,255
"our sources"... = forum chatter from the past 6 months LOL

I really can't believe how scummy these publications are.

Fud has been at best notoriously unreliable, and at worst a laughing stock, for the many years I've been on-and-off interested in hardware trends. Not that rumour sites particularly offend me.
 