NVIDIA Volta with GDDR6 in early 2018?

Is there even a remote chance this could land before the end of 2017?

TSMC's 12nm variants have only just entered risk production, so we're unlikely to see anything before December at the earliest.

EDIT: Both "12nm" and 10nm processes are ramping into production earlier than originally planned, so we could potentially see a new consumer GPU on those nodes towards the end of the year.
 
March to May being the more logical window.

Dunno - they seem to be putting a lot of effort into getting 10nm ramped up for Apple, which would likely mean nVidia doesn't really get a look-in until early next year, but the "12nm" capacity is getting ramped up pretty quickly as well.
 
With the current Ti selling very nicely and still no competition, they don't need to rush either.
 
NVIDIA's Future MCM GPU Design Detailed - 256 SMs With Over 16000 Cores, Multiple TeraBytes of Bandwidth

https://www.overclock3d.net/news/gp...-chip_gpu_modules_to_scale_past_moore_s_law/1

Yeah there's a thread discussing that.

Upshot of it is, the earliest they could make that (both cheaply/small enough and meeting their memory bandwidth guideline) would be 7nm+ and HBM3 in late 2019/early 2020.

So this could be the successor to Volta, as that timeline would align. Or it could be further off if it was mainly for research purposes and they're not ready to do it yet (for instance, they were intending to use the other stacked DRAM type, which failed, not HBM, so they may need to redesign some things to accept HBM instead).
 
So basically they're trying to make something similar to Infinity Fabric for 2021+?
That leaves 4+ years (from the end of this month) for AMD to mop up the market for good, with no serious competition there...

Basically sounds like it, yeah.

But they could make it as early as late 2019 if Samsung is ready with HBM3 by then.

Or they could even do one in late 2018/early 2019 on early 7nm with HBM2 if they did a smaller one - like an 8,000-10,000 core 'test' card, where they wouldn't need 3+ TB/s of bandwidth.
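
As a rough sanity check on that, here's what you get by scaling V100's published bandwidth-to-compute ratio (~15 TFLOPS FP32 against 900 GB/s of HBM2) up to those core counts - the 2GHz clock is my assumption:

```python
# Would an 8,000-10,000 core part really get by without 3+ TB/s?
# Scale V100's bandwidth-to-compute ratio up to the rumoured core counts.
v100_tflops = 15.0
v100_bw_tbs = 0.9                        # 900 GB/s of HBM2
ratio = v100_bw_tbs / v100_tflops        # ~0.06 TB/s per TFLOP

for cores in (8000, 10000):
    tflops = cores * 2 * 2.0 / 1000      # 2 FP32 ops/clock at an assumed 2.0 GHz
    print(f"{cores} cores: {tflops:.0f} TFLOPS, ~{tflops * ratio:.1f} TB/s needed")
# ~1.9-2.4 TB/s - comfortably under the 3+ TB/s the full design calls for.
```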
 
Probably there'll be more cards in between Volta and an MCM design, too big a gap otherwise.

Well not if the design in the paper is close to final.

Although Nvidia are in a bit of a weird cadence since they've decided to go for Volta on 12nm. I can only assume there'll be some kind of 7nm refresh for Volta, unless they skip 7nm and go straight to 7nm+ in 2019/2020.

Point is, if they're going for ~4000 cores per module and 3+ TB/s of memory bandwidth, that means doing it on 7nm/7nm+ with HBM3, which puts it at 2021 at the very latest - perhaps leaving enough room for one more architecture in between Volta and this MCM arch.
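
Pulling the paper's headline figures together (the per-SM core count is my assumption that they'd use Volta-style SMs, which have 64 FP32 cores each):

```python
# 4 modules x 64 SMs x 64 cores = the paper's "256 SMs, 16,000+ cores".
modules = 4
sms_per_module = 64
cores_per_sm = 64            # Volta-style SM assumed

total_sms = modules * sms_per_module      # 256
total_cores = total_sms * cores_per_sm    # 16,384
bw_per_module_tbs = 3.0 / modules         # 0.75 TB/s each for a 3 TB/s total

print(total_sms, total_cores, f"{bw_per_module_tbs:.2f} TB/s per module")
# 0.75 TB/s per module is one to two of the projected ~512 GB/s HBM3
# stacks, which is why the whole thing points at 7nm/7nm+ plus HBM3.
```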

If they waited longer it would be on 5nm, in which case they'd be looking at considerably more than ~4000 cores per module and ~16,000 overall. It'd be more like ~25,000 overall at bare minimum, and then they'd likely need something faster than HBM3, or need to redesign the layout to have 3 or 4 stacks per module, which would in turn mean they might want more or fewer than 4 modules.

Basically if they waited till 5nm, it would completely change the design, whereas 7nm aligns perfectly with what they've outlined.

Remember in all of this that the 7nm node is more like a 1.5-generation jump than the normal 1, so we should expect the largest cards to have 6000 cores minimum, and they could easily have 8000+ (even 490mm2 on 7nm should be over 8000 cores).
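
A rough check on that 490mm2 claim, using GV100 as the density baseline (5376 FP32 cores in ~815mm2 on TSMC 12nm). The 2.5-3x density gain for 7nm over 16/12nm is an assumption based on TSMC's public claims, and real designs never scale perfectly:

```python
# Estimate FP32 core counts for a 490mm2 die on 7nm by scaling
# GV100's core density (5376 cores in ~815mm2 on 12nm).
gv100_cores = 5376
gv100_area_mm2 = 815.0

for density_gain in (2.5, 3.0):          # assumed 7nm density advantage
    cores_per_mm2 = gv100_cores / gv100_area_mm2 * density_gain
    print(f"{density_gain}x density: ~{cores_per_mm2 * 490:.0f} cores in 490mm2")
# Prints ~8100 and ~9700 - so 8000+ cores in 490mm2 is plausible on paper.
```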
 
Well laid-out post, nice! The timeline and node shrink surely fit, so the question remains how hypothetical the contents of that paper are right now (like you suggested before me). If the MCM arch can be Volta's grandkid, that'd be fantastic for computing in all its facets (from DP to Quadro cards to GeForce products) - CPUs will become the problem then... way too slow to keep up with the big performance increases of such GPUs, I estimate. I'm still in awe reading ~16,000 cores...

Refresh my memory please; why did Nvidia go with 12nm for Volta?
 
This is not like Infinity Fabric other than superficially. It's basically dissecting a monolithic GPU and laying it out as separate packages on an interposer, using advances in substrate technology to make the interconnects (which would be highly specialised, with nothing like the flexibility of IF) fast enough and low-latency enough to work for gaming performance. That way you aren't limited to trying to cram 100s of SMs into one package, but can spread them out into "headless" GPU modules tied together by another control package on the interposer. To my knowledge this is currently not possible with IF and current monolithic GPU architectures - it's likely AMD is actually behind the curve on this*, as it's much more specialised than AMD's current approaches, which are for linking up self-contained packages using a more general-purpose interconnect.
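
To make that topology concrete, here's a toy sketch - the class names and figures are mine and purely illustrative; the paper describes headless compute modules with local DRAM hanging off a shared package, not this exact structure:

```python
# Toy model of the layout described above: "headless" GPU modules that
# each own their local DRAM, tied to one control die on an interposer.
from dataclasses import dataclass, field

@dataclass
class GpuModule:
    sms: int                 # compute only; no command/display logic of its own
    local_bw_gbs: float      # bandwidth to this module's own DRAM stacks

@dataclass
class Interposer:
    link_bw_gbs: float       # specialised module<->control interconnect, per link
    modules: list = field(default_factory=list)

    def total_sms(self) -> int:
        # Software would see the sum of all modules as one big GPU.
        return sum(m.sms for m in self.modules)

pkg = Interposer(link_bw_gbs=768, modules=[GpuModule(64, 768.0) for _ in range(4)])
print(pkg.total_sms(), "SMs presented as a single logical GPU")   # 256
```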

While probably not possible with current 16nm or 12nm technologies, it should become possible at 10nm and below, especially once DRAM moves to smaller 1x nm or lower nodes - a lot of DRAM is currently still on 21nm or 22nm type fabrication.

I wouldn't bet on this using HBM2 or 3 either, though neither would I rule it out. nVidia have talked in very vague terms about "3D stacked DRAM" and seem to have a real reluctance to use HBM more than they have to, but the specs they do talk about don't really match either HBM2 or HBM3. They also have fairly strong ties with Micron at the moment (probably due to agreements around getting GDDR5X early), who are still working on competing standards to HBM despite some earlier failures in getting them to take off.
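
For reference, a quick comparison of what the standard parts would give per stack against a 3 TB/s target - the HBM3 number here is the projected spec, not a finalised one:

```python
# Stacks required to hit 3 TB/s with standard HBM parts.
per_stack_gbs = {"HBM2": 256, "HBM3 (projected)": 512}   # GB/s per stack

for name, bw in per_stack_gbs.items():
    print(f"{name}: ~{3000 / bw:.0f} stacks for 3 TB/s")
# ~12 HBM2 stacks vs ~6 HBM3 stacks. A dozen stacks is impractical,
# which is why HBM3 (or whatever "3D stacked DRAM" actually means)
# is the realistic floor for the bandwidth target in the paper.
```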


* It's currently thought in the industry that AMD will be working on something similar with Navi to some extent, but it's unknown whether that will just be groundwork or anything through to a full implementation.

Refresh my memory please; why did Nvidia go with 12nm for Volta?

Volta was primarily designed with Summit as a major development focus, which required a certain minimum performance/density/power target (due to being used in huge installations) that wasn't possible on 16nm, and it was looking like 10nm wasn't going to be ready in time for nVidia to fulfil contracts using it. (By all reports the costs for 10nm and 7nm are astronomically higher than those based on 20nm planar technologies as well, so there were probably some cost-saving aspects to it.)
 
I'm actually getting kind of interested in what we'll see when manufacturing in this area of technology moves wholesale to 10nm or below - a lot of DRAM, partly because it can be complex stuff to produce, is still using bigger nodes or approaches that hang onto techniques from bigger-than-20nm manufacturing.
 
Probably 12nm was just about timing and cementing their deep-learning market share (remember the HPC Volta chip has specialist Tensor cores, so it offers ~120 Tflops of deep-learning compute).
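
Where that ~120 Tflops comes from: the 640 Tensor cores and the 4x4x4 FMA per core per clock are NVIDIA's published V100 figures; the boost clock here is my assumption:

```python
# V100 tensor throughput, back of the envelope.
tensor_cores = 640
fmas_per_clock = 64          # each tensor core: one 4x4x4 matrix FMA per clock
ops_per_fma = 2              # a multiply plus an add
boost_ghz = 1.46             # assumed boost clock

tflops = tensor_cores * fmas_per_clock * ops_per_fma * boost_ghz / 1000
print(f"~{tflops:.0f} TFLOPS tensor compute")   # ~120
```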

Although Roff has a point about design costs shooting up for sub-20nm processes, on the flip side Nvidia can reportedly only get 1 working V100 chip per wafer because it's ~815mm2. If they built it on 7nm it'd be more like 500mm2, so they could produce tens per wafer (since yield falls off roughly exponentially with die size). This would likely outweigh the extra R&D cost.
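
A toy Poisson yield model shows the shape of that argument. The defect density below is an illustrative guess (not TSMC data), chosen so the 815mm2 case lands near the 'one good die' claim:

```python
import math

# Poisson yield model: yield = exp(-defect_density * die_area).
wafer_area = math.pi * (300 / 2) ** 2    # 300mm wafer, ~70,686 mm2
d0 = 0.005                               # defects per mm2 (illustrative guess)

for die_area in (815, 500):              # V100-class die vs a ~7nm shrink
    candidates = int(wafer_area * 0.85 / die_area)   # ~15% lost to edges/scribe
    good = candidates * math.exp(-d0 * die_area)
    print(f"{die_area}mm2: {candidates} candidates, ~{good:.0f} good dice")
# 815mm2: ~1 good die per wafer; 500mm2: ~10 - the same order-of-magnitude
# jump the point above describes.
```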

Also, CPUs likely won't be a problem, for two reasons.

1. As we go to 16,000 cores and beyond (at 2GHz that would be 64 Tflops of FP32, by the way, or about 4.5x a heavily overclocked 1080 Ti), people will move to higher resolutions. At 4K, CPU bottlenecks are currently very far away.

2. Soon (finally) games will ACTUALLY start being properly multi-threaded. An 8-core CPU could feed an extremely powerful GPU if games were threaded properly; I'd imagine 8-core 4GHz CPUs would be enough for the ballpark of 50+ Tflops of GPU at 4K. Also bear in mind the 8-core Ryzen die is very small at ~200mm2, so if AMD just shrunk Zen1 to 7nm it'd be something crazy small like ~85mm2. I think it's extremely likely Zen2 (on 7nm) will be 12 cores for Ryzen/AM4 and 24 cores for Threadripper/X399; the 7nm server chip, Starship, the successor to EPYC, is already confirmed to be 48-core. (Quick numbers for both points below.)
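
The arithmetic behind both points, with the clocks and the 7nm density factor as my assumptions:

```python
# Point 1: FP32 throughput of a 16,000-core part vs an overclocked 1080 Ti.
mcm_tflops = 16000 * 2 * 2.0 / 1000     # 2 ops/clock (FMA) at an assumed 2.0 GHz
ti_tflops = 3584 * 2 * 2.0 / 1000       # 1080 Ti pushed to ~2.0 GHz, ~14.3 TFLOPS
print(f"{mcm_tflops:.0f} TFLOPS, ~{mcm_tflops / ti_tflops:.1f}x the Ti")  # 64, 4.5x

# Point 2: shrinking the ~200mm2 14nm Zen die, assuming ~2.3x density at 7nm.
print(f"~{200 / 2.3:.0f}mm2 for an 8-core Zen die on 7nm")   # ~87mm2
```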


Also interesting is that FinFET apparently stops working below 5nm, so FinFET nodes and design techniques are only going to last three main node generations (14/16nm, 10nm, 7nm).

5nm and 3nm will apparently use a new structure called "gate-all-around", or GAAFET. And I don't think there are even slightly concrete plans for anything below 3nm yet.
 