
Nvidia working on their own multi-chip GPU designs

It appears team green have also been quietly busy in the background, beavering away at MCM-type GPU designs. There is a research paper at the link below.

http://research.nvidia.com/publication/2017-06_MCM-GPU:-Multi-Chip-Module-GPUs

MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability


Historically, improvements in GPU-based high performance computing have been tightly coupled to transistor scaling. As Moore's law slows down, and the number of transistors per die no longer grows at historical rates, the performance curve of single monolithic GPUs will ultimately plateau. However, the need for higher performing GPUs continues to exist in many domains. To address this need, in this paper we demonstrate that package-level integration of multiple GPU modules to build larger logical GPUs can enable continuous performance scaling beyond Moore's law. Specifically, we propose partitioning GPUs into easily manufacturable basic GPU Modules (GPMs), and integrating them on package using high bandwidth and power efficient signaling technologies. We lay out the details and evaluate the feasibility of a basic Multi-Chip-Module GPU (MCM-GPU) design. We then propose three architectural optimizations that significantly improve GPM data locality and minimize the sensitivity on inter-GPM bandwidth. Our evaluation shows that the optimized MCM-GPU achieves 22.8% speedup and 5x inter-GPM bandwidth reduction when compared to the basic MCM-GPU architecture. Most importantly, the optimized MCM-GPU design is 45.5% faster than the largest implementable monolithic GPU, and performs within 10% of a hypothetical (and unbuildable) monolithic GPU. Lastly we show that our optimized MCM-GPU is 26.8% faster than an equally equipped Multi-GPU system with the same total number of SMs and DRAM bandwidth.
 
If it hits mass market, it won't be cheap!

The idea is to reduce manufacturing costs, similar to what AMD has done with EPYC, although here using something like an interposer to connect the dies together.

It is far more profitable to make several smaller chips than one large monolithic die. As chip size goes down, yields go up: you fit more chips per wafer, and each defect ruins a smaller share of them.
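As a rough illustration of that yield argument (my numbers, not the paper's: a 300mm wafer, a defect density of 0.001 per mm², and a simple Poisson yield model):

```python
import math

def dies_per_wafer(die_mm2, wafer_mm=300):
    # Gross dies per wafer: wafer area over die area,
    # minus a standard correction for edge loss.
    return int(math.pi * (wafer_mm / 2) ** 2 / die_mm2
               - math.pi * wafer_mm / math.sqrt(2 * die_mm2))

def poisson_yield(die_mm2, defects_per_mm2=0.001):
    # Probability a die has zero defects under a Poisson defect model.
    return math.exp(-defects_per_mm2 * die_mm2)

for die_mm2 in (800, 250, 125):
    dies = dies_per_wafer(die_mm2)
    y = poisson_yield(die_mm2)
    print(f"{die_mm2:>4} mm^2: {dies:>3} dies/wafer, "
          f"{y:5.1%} yield, ~{dies * y:.0f} good dies")
```

With those assumed numbers, a 250mm² die gives roughly six times as many good dies per wafer as an 800mm² one, not just the 3.2x you'd expect from area alone.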
 
Wonder how this will compare to something like AMD's Infinity Fabric, although they have yet to use that in APUs/GPUs. It does make sense to go in this direction: smaller dies mean higher yields compared to designing one gigantic die. It also means parts of the GPU that won't benefit from a die shrink can be made on a cheaper, larger process.
 
The approach nVidia is going with is a bit different to Infinity Fabric. These are purpose-built interconnects that let essentially "headless" GPU modules work together, whereas IF and the like are designed to tie together the capabilities of multiple monolithic cores. With the latest developments in substrate/interposer tech, that potentially means gaming-viable performance.

The article suggests Navi might go in that direction and support such operation as well, but it doesn't look like that will be possible with Vega.
 
So I wonder if this will be what Nvidia builds after Volta, on 7nm+ in 2019/2020, as they haven't publicly announced Volta's successor (nor has AMD announced Navi's).

Also I wonder if HBM3 is pivotal to making this work.

Their maths suggests four modules, each paired with two stacks of HBM, is the optimum setup. But HBM2 would only let them give each module 512 GB/s of bandwidth (while also taking up a lot of room, and only providing 2 GHz of clock-linking if their system works similarly to Infinity Fabric), whereas HBM3 would allow 1 TB/s per module, plus lower latency, a smaller physical footprint (I think), and 4 GHz of clock-linking potential.

The reason I wonder whether 1 TB/s would be ideal for each module is that they keep referencing ~800mm² as roughly the maximum manufacturable size, then say their 4-module MCM solution performs within 10% of a 'theoretical ideal' large chip which can't be manufactured in reality. So I'm guessing that ideal chip is around 1000mm², and therefore their modules are around 250mm² each.

On 7nm+, 250mm² would be enough for around 4500 Pascal cores (especially if some of the functionality is offloaded to that "SYS + I/O" section). So each module in their hypothetical chip would be 10%+ faster than a Titan Xp. It is therefore reasonable to assume 512 GB/s per module would be less than ideal, and 1 TB/s far preferable (along with all the other advantages HBM3 has over HBM2).

HBM3 would also allow them to make a 'baby' version of this chip: say 100-125mm² modules, with one stack of HBM3 each. That would be incredibly cheap, thanks to the extremely high yields 125mm² would give, and it would amount to around 2000 Pascal cores with 512 GB/s each. (Again, 256 GB/s is probably less than ideal for 2000 Pascal cores.)

Lastly, HBM3 is slated for 2019/2020, which aligns nicely with when the successor to Volta should be released, and with when 7nm+ (7nm with EUV) will be out.
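To make the arithmetic above explicit (the ~1000mm² 'ideal' chip and the per-stack figures of 256 GB/s for HBM2 and 512 GB/s for HBM3 are all my assumptions, not the paper's):

```python
ideal_chip_mm2 = 1000       # guessed size of the 'unbuildable' monolithic GPU
modules = 4
module_mm2 = ideal_chip_mm2 / modules   # ~250 mm^2 per module

stacks_per_module = 2
hbm2_stack_gbs = 256        # assumed per-stack HBM2 bandwidth
hbm3_stack_gbs = 512        # assumed per-stack HBM3 bandwidth

print(f"module size:     ~{module_mm2:.0f} mm^2")
print(f"HBM2 per module: {stacks_per_module * hbm2_stack_gbs} GB/s")
print(f"HBM3 per module: {stacks_per_module * hbm3_stack_gbs} GB/s (~1 TB/s)")
```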
 
Heh, after years of actively working against multi-chip GPU designs, Nvidia are now all for it after seeing how it could give AMD an edge. Figures :P
 
Heh, after years of actively working against multi-chip GPU designs, Nvidia are now all for it after seeing how it could give AMD an edge. Figures :p
It's a little different to multi-GPU, but yeah. AMD's Raja Koduri predicted this many years ago; he said the future was multiple GPU dies you could interconnect ;p
 
To be fair, I'm actually quite surprised it's taking so long. After seeing Intel's Core 2 Duos hit the scene over a decade ago, I was expecting the same to happen to GPUs any day. That it's taken this long is perplexing: a GPU with two cores and native driver/game support is a no-brainer.
 
To be fair, I'm actually quite surprised it's taking so long. After seeing Intel's Core 2 Duos hit the scene over a decade ago, I was expecting the same to happen to GPUs any day. That it's taken this long is perplexing: a GPU with two cores and native driver/game support is a no-brainer.

From the paper linked above, the system will be pretty much invisible to the application; it will just appear as a single monolithic GPU, since the cores themselves are in essence co-processors behind a single front-end control chip.
 
This isn't designed to use HBM3 - it's 3D-stacked DRAM, but not the HBM flavour. AFAIK the successor to Volta is envisioned to be basically half of what they are talking about in that article: 128 SMs with 1.4 TB/s of bandwidth and 4x the TFLOPS of a Titan X Pascal.

There is a load more info in the linked PDF, BTW: http://research.nvidia.com/sites/default/files/pubs/2017-06_MCM-GPU:-Multi-Chip-Module-GPUs//p320-Arunkumar.pdf

Yeah, but wasn't that from when Nvidia backed the one that failed - HMC, was it called?

So their plan will now be for those stacks to be HBM2/3.

EDIT: Also, I double-checked the paper and it doesn't mention anywhere what the intended DRAM actually is (DRAM is a generic term). All that's mentioned is that the target for their 4-module chip is 3 TB/s total. With 8 stacks, HBM2 would give 2 TB/s and HBM3 4 TB/s. So that suggests, in line with what I said before, that HBM2 is not enough, but non-max-spec HBM3 would be.
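A quick sanity check on that, using the same assumed per-stack figures as before (256 GB/s for HBM2, 512 GB/s for HBM3):

```python
stacks = 8            # 2 stacks x 4 modules
target_tbs = 3.0      # the paper's stated 3 TB/s aggregate target

for name, per_stack_gbs in (("HBM2", 256), ("HBM3", 512)):
    total_tbs = stacks * per_stack_gbs / 1024
    verdict = "meets" if total_tbs >= target_tbs else "misses"
    print(f"{name}: {total_tbs:.1f} TB/s total, {verdict} the {target_tbs} TB/s target")
```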

Also, weirdly, my core estimates were nearly spot on: the plan states 4096 cores per module, making 16,384 total. I was going for 4500 per module, 18,000 total.

So yeah, this basically guarantees they intend to build this on 7nm and use HBM3 - unless it was purely for research and they're not going to implement it until later.
 
To be fair, I'm actually quite surprised it's taking so long. After seeing Intel's Core 2 Duos hit the scene over a decade ago, I was expecting the same to happen to GPUs any day. That it's taken this long is perplexing: a GPU with two cores and native driver/game support is a no-brainer.

It is actually very, very difficult to do while getting viable gaming performance - it requires a new architecture and a different design philosophy from self-contained monolithic cores. It's only been enabled by recent breakthroughs in substrate tech.

Yeah, but wasn't that from when Nvidia backed the one that failed - HMC, was it called?

I won't entirely rule out HBM, but there seems to be a reluctance from nVidia to accept it, and Micron, who nVidia seem quite tied to these days, are still working on some form of HMC. A lot of their whitepapers don't name a technology, but the 3D-stacked DRAM they describe doesn't seem to fit HBM2 or 3 exactly - I'm pretty sure they haven't completely given up on a direction of their own for it.
 