The NVidia GV100 News Thread

D.P. · 12 May 2017 at 16:38

JediFragger said:
Revenue of $2 billion for the last reported quarter wouldn't have hurt either!!!

Well over $400 million from HPC, something like 200% year-on-year growth. That is why Nvidia can create an 815mm^2 chip with yields so low they might only get a couple of working dies from a wafer: they can sell the dies at $15-20K a time. It won't be long before HPC is netting Nvidia well over $1bn revenue a quarter.
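To put "a couple of working dies from a wafer" into rough numbers, here is a back-of-the-envelope sketch using the common first-order dies-per-wafer estimate and a simple Poisson yield model, Y = exp(-A·D0). The 300 mm wafer and the three defect densities are assumptions picked for illustration, not TSMC figures, and the function names are hypothetical.

```python
import math

def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """First-order estimate: wafer area / die area, minus dies lost at the round edge."""
    radius = wafer_diameter_mm / 2
    estimate = (math.pi * radius ** 2 / die_area_mm2
                - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))
    return int(estimate)

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model Y = exp(-A * D0), with the die area converted to cm^2."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)

GV100_AREA_MM2 = 815.0
candidates = gross_dies_per_wafer(GV100_AREA_MM2)
for d0 in (0.1, 0.2, 0.5):  # hypothetical defect densities, not real process data
    fully_working = candidates * poisson_yield(GV100_AREA_MM2, d0)
    print(f"D0 = {d0}/cm^2: {candidates} die sites, ~{fully_working:.1f} fully working dies")
```

With those made-up defect densities this gives 63 die sites and roughly 28, 12 and 1 fully working dies. It also overstates the pain a little: Tesla V100 ships with 80 of GV100's 84 SMs enabled, so a die with a few defective SMs can still be sold.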

Mauller · 13 May 2017 at 18:38

This is how I see things going; it can go one of two ways. Either consumer Volta will have 6 SMs per GPC or 7 SMs per GPC, with 128 FP32 cores per SM. Although I don't know if they will put 2xFP16-capable FP32 cores in consumer Volta, since they didn't with consumer Pascal. Consumer Pascal had dedicated FP32 shaders, which is why it can't even run FP16 at a 1:1 rate.

All of these assume clocks of 1600-1800 MHz, with the theoretical figures using 1800 MHz rather than the base clock, since most Pascal cards actually run at around 1800 MHz, not at the base clock their theoretical performance is calculated with.

So with 6SM per GPC we get:
GV102 6GPC - 4608 - 16.5 Tflops
GV104 4GPC - 3072 - 11.1 Tflops
GV106 2GPC - 1536 - 5.5 Tflops
GV108 1GPC - 768 - 2.7 Tflops

With 7SM per GPC we get:
GV102 6GPC - 5376 - 19.3 Tflops (same as uncut GV100)
GV104 4GPC - 3584 - 12.9 Tflops (same as cut-down GP102 / 1080 Ti)
GV106 2GPC - 1792 - 6.5 Tflops
GV108 1GPC - 896 - 3.2 Tflops

Either Nvidia will cheap out on cores to save die size and go with option 1, and instead set base clocks to 1800-2000 MHz.
Or they go with option 2 and keep the same clock range as Pascal. But then they have to deal with making larger dies: 12nm does not offer a large density saving, as it is only an improvement on 16nm, not an entirely new node.

The other thing is that consumer Volta will only have the FP32 cores: no INT32/16, Tensor or FP64 cores.
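The figures in the two tables follow from the usual theoretical-peak formula: cores × 2 FLOPs per clock (one fused multiply-add) × clock. A minimal sketch that regenerates them, assuming the post's 128 FP32 cores per SM and a 1.8 GHz clock; the GPC counts per chip are the post's speculation, not announced specs.

```python
def peak_fp32_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Theoretical peak: each FP32 core retires one FMA (2 FLOPs) per clock."""
    return cuda_cores * 2 * clock_ghz / 1000.0

CORES_PER_SM = 128  # assumption from the post: Pascal-style consumer SMs
CLOCK_GHZ = 1.8     # the typical boost clock the post assumes
GPC_COUNTS = {"GV102": 6, "GV104": 4, "GV106": 2, "GV108": 1}  # hypothetical line-up

for sms_per_gpc in (6, 7):
    for chip, gpcs in GPC_COUNTS.items():
        cores = gpcs * sms_per_gpc * CORES_PER_SM
        tflops = peak_fp32_tflops(cores, CLOCK_GHZ)
        print(f"{sms_per_gpc} SM/GPC {chip}: {cores} cores, {tflops:.2f} Tflops")
```

This prints 16.59, 11.06, 5.53 and 2.76 Tflops for the 6 SM/GPC case and 19.35, 12.90, 6.45 and 3.23 for 7 SM/GPC, so the tables are right to within rounding. The 5376-core figure is also where "same as uncut GV100" comes from: full GV100 has 84 SMs × 64 FP32 cores = 5376.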

AllBodies · 13 May 2017 at 18:38

D.P. said:
Well over $400 million from HPC, something like 200% year-on-year growth. That is why Nvidia can create an 815mm^2 chip with yields so low they might only get a couple of working dies from a wafer: they can sell the dies at $15-20K a time. It won't be long before HPC is netting Nvidia well over $1bn revenue a quarter.

Funny to think in 3-4 years you'll be able to fit GV100 in around 200mm^2.

Mauller said:
With 7SM per GPC we get:
GV102 6GPC - 5376 - 19.3 Tflops (same as uncut GV100)
GV104 4GPC - 3584 - 12.9 Tflops (same as cut-down GP102 / 1080 Ti)
GV106 2GPC - 1792 - 6.5 Tflops
GV108 1GPC - 896 - 3.2 Tflops

I assume they're going to go for the 19.3 Tflop option for the top card (i.e. the Titan Xv or 2080 Ti).

The 1080 Ti is already just over 14 Tflops when overclocked, so nothing other than your 19.3 Tflop option there makes sense for a top end card at the bleeding edge of the process.

Mauller · 13 May 2017 at 20:03

AllBodies said:
I assume they're going to go for the 19.3 Tflop option for the top card (i.e. the Titan Xv or 2080 Ti).

The 1080 Ti is already just over 14 Tflops when overclocked, so nothing other than your 19.3 Tflop option there makes sense for a top end card at the bleeding edge of the process.

Yeah, but like I mentioned, they could go with the first option and, since clocks might be a little better on 12nm, use an 1800 MHz base clock and a 2.0 GHz max clock. That gives the 6SM GV102 option 18 Tflops theoretical at 2.0 GHz.

AllBodies · 13 May 2017 at 20:56

Mauller said:
Yeah, but like I mentioned, they could go with the first option and, since clocks might be a little better on 12nm, use an 1800 MHz base clock and a 2.0 GHz max clock. That gives the 6SM GV102 option 18 Tflops theoretical at 2.0 GHz.

I don't think ~25% is enough of a performance boost for their flagship top card, with a new process and new architecture.

I expect the middle card (GTX 1080 replacement) to be in the ~16 Tflop range, and the top card to be much faster.
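A quick sanity check on the "~25%" figure, as a sketch: it compares both GV102 options against an overclocked 1080 Ti, assuming the 1080 Ti runs at about 2.0 GHz (roughly matching "just over 14 Tflops") and using the clocks from the posts above. These are theoretical peaks only, not game performance.

```python
def peak_fp32_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Theoretical peak: cores x 2 FLOPs (one FMA) x clock."""
    return cuda_cores * 2 * clock_ghz / 1000.0

# Assumed clocks: ~2.0 GHz for an overclocked 1080 Ti and for option 1,
# 1.8 GHz (Pascal-like) for option 2.
gtx_1080_ti_oc = peak_fp32_tflops(3584, 2.0)
options = {
    "6 SM/GPC GV102 @ 2.0 GHz": peak_fp32_tflops(4608, 2.0),
    "7 SM/GPC GV102 @ 1.8 GHz": peak_fp32_tflops(5376, 1.8),
}
for label, tflops in options.items():
    uplift = (tflops / gtx_1080_ti_oc - 1) * 100
    print(f"{label}: {tflops:.2f} Tflops, +{uplift:.0f}% over a 1080 Ti at 2.0 GHz")
```

Under these assumptions option 1 comes out around 18.43 Tflops (+29%) and option 2 around 19.35 Tflops (+35%). At equal clocks option 1 reduces to the core-count ratio 4608/3584, so "~25%" is in the right ballpark, if slightly conservative.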

akarypid · 13 May 2017 at 21:06

Mauller said:
This is how I see things going; it can go one of two ways. Either consumer Volta will have 6 SMs per GPC or 7 SMs per GPC, with 128 FP32 cores per SM. Although I don't know if they will put 2xFP16-capable FP32 cores in consumer Volta, since they didn't with consumer Pascal. Consumer Pascal had dedicated FP32 shaders, which is why it can't even run FP16 at a 1:1 rate.

Either Nvidia will cheap out on cores to save die size and go with option 1, and instead set base clocks to 1800-2000 MHz.
Or they go with option 2 and keep the same clock range as Pascal. But then they have to deal with making larger dies: 12nm does not offer a large density saving, as it is only an improvement on 16nm, not an entirely new node.

I would assume that for consumer versions they'd go with option 1 to save on die space. Anything above 550mm2 is stretching it for consumer cards. I think Fiji and the 980 Ti were the biggest chips, at around 600mm2...

But will this mean that Volta consumer will be 16.5Tflops for both FP32 and FP16 or does it double for FP16? I was under the impression Vega runs 2xFP16 even in the consumer version, no?

Mauller · 13 May 2017 at 21:42

akarypid said:
I would assume that for consumer versions they'd go with option 1 to save on die space. Anything above 550mm2 is stretching it for consumer cards. I think Fiji and the 980 Ti were the biggest chips, at around 600mm2...
But will this mean that Volta consumer will be 16.5Tflops for both FP32 and FP16 or does it double for FP16? I was under the impression Vega runs 2xFP16 even in the consumer version, no?

Unless they feel that AMD has forced their hand into including 2xFP16 on consumer cards, I think they will gimp it again.

Vega does have 2xFP16 even on consumer parts, and AMD will have game devs take advantage of it; they can get ~30% more performance in some situations with a few effects.
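To make the FP16 question concrete: FP16 peak is just the FP32 peak times the architecture's FP16:FP32 rate. GP100 and Vega do packed 2xFP16, while consumer Pascal (GP102/GP104) runs FP16 at 1/64 of its FP32 rate. What consumer Volta will do is unknown, so the sketch below simply applies three possible ratios to the hypothetical 6 SM/GPC GV102 at 1.8 GHz.

```python
def peak_fp16_tflops(fp32_tflops: float, fp16_per_fp32: float) -> float:
    """FP16 peak as a multiple of the FP32 peak."""
    return fp32_tflops * fp16_per_fp32

option1_fp32 = 4608 * 2 * 1.8 / 1000  # hypothetical 6 SM/GPC GV102 at 1.8 GHz, ~16.6 Tflops
FP16_RATES = {  # scenarios, not announced specs
    "packed 2xFP16 (GP100 / Vega style)": 2.0,
    "FP16 at FP32 rate (1:1)": 1.0,
    "consumer Pascal style (1:64)": 1 / 64,
}
for label, rate in FP16_RATES.items():
    print(f"{label}: {peak_fp16_tflops(option1_fp32, rate):.2f} Tflops FP16")
```

That works out to about 33.18, 16.59 and 0.26 Tflops FP16 respectively, which is why the consumer FP16 rate matters so much if games start using half precision.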

akarypid · 13 May 2017 at 21:48

Mauller said:
Unless they feel that AMD has forced their hand into including 2xFP16 on consumer cards, I think they will gimp it again.

Vega does have 2xFP16 even on consumer parts, and AMD will have game devs take advantage of it; they can get ~30% more performance in some situations with a few effects.

Exactly and it seems to me like something that could give AMD a huge advantage... I hope developers DO take advantage of it and Nvidia's hand DOES get forced. We need things to push forward.

I was also just reading this from their developer blog:

Volta’s independent thread scheduling allows the GPU to yield execution of any thread, either to make better use of execution resources or to allow one thread to wait for data to be produced by another. To maximize parallel efficiency, Volta includes a schedule optimizer which determines how to group active threads from the same warp together into SIMT units. This retains the high throughput of SIMT execution as in prior NVIDIA GPUs, but with much more flexibility: threads can now diverge and reconverge at sub-warp granularity, and Volta will still group together threads which are executing the same code and run them in parallel.

and then:

It is interesting to note that Figure 12 does not show execution of statement Z by all threads in the warp at the same time. This is because the scheduler must conservatively assume that Z may produce data required by other divergent branches of execution in which case it would be unsafe to automatically enforce reconvergence. In the common case where A, B, X, and Y do not consist of synchronizing operations, the scheduler can identify that it is safe for the warp to naturally reconverge on Z, as on prior architectures.

It seems to me that Nvidia is going back to 'proper' async compute with Volta, similar to how GCN works, which needs a lot more register file and hardware. It'll be really interesting to see how consumer Volta stacks up against Vega in the consumer space...
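The two quoted paragraphs are about divergence and reconvergence inside a single warp. Here is a toy Python model of one 32-thread warp running `if (cond) { A; B; } else { X; Y; } Z;`, with the statement names taken from the blog's figure. The branch condition (lane < 4), the function `simt_trace` and the simple issue orders are assumptions for illustration only, not how NVIDIA's scheduler actually works; the model just records which statement is issued with how many active lanes.

```python
from itertools import chain, zip_longest
from typing import Callable, List, Tuple

WARP_SIZE = 32

def simt_trace(cond: Callable[[int], bool], interleave: bool,
               reconverge: bool) -> List[Tuple[str, int]]:
    """Toy model of: if (cond) { A; B; } else { X; Y; } Z;
    Returns (statement, active_lanes) in issue order."""
    taken = sum(1 for lane in range(WARP_SIZE) if cond(lane))
    other = WARP_SIZE - taken
    path_ab = [(s, taken) for s in ("A", "B")] if taken else []
    path_xy = [(s, other) for s in ("X", "Y")] if other else []
    if interleave:
        # Volta-style: the scheduler may alternate between the divergent groups.
        pairs = zip_longest(path_ab, path_xy)
        trace = [step for step in chain.from_iterable(pairs) if step is not None]
    else:
        # Pre-Volta style: run one path to completion, then the other.
        trace = path_ab + path_xy
    if reconverge or taken == 0 or other == 0:
        trace.append(("Z", WARP_SIZE))  # the whole warp executes Z together
    else:
        # Z issued once per group, as when reconvergence can't be proven safe.
        trace += [("Z", taken), ("Z", other)]
    return trace

for interleave, reconverge in ((False, True), (True, True), (True, False)):
    print(interleave, reconverge, simt_trace(lambda lane: lane < 4, interleave, reconverge))
```

In all three cases the A/B and X/Y paths still take separate issue slots within the warp; what changes is the order they can be interleaved in and whether Z runs once for the whole warp or once per group.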

Kaapstad · 14 May 2017 at 14:13

Kaapstad · 7 Jun 2017 at 06:38

https://www.techpowerup.com/234117/micron-announces-16-gbps-memory-speeds-achieved-over-gddr5x

Kaapstad · 13 Jun 2017 at 11:37

http://www.fudzilla.com/news/graphics/43873-next-geforce-doesn-t-use-hbm-2

Sargatanas2511 · 14 Jun 2017 at 22:10

Does anyone else find it unusual that with Volta being shown off already we still have no information at all on what is coming after Volta?

With AMD we know that Navi follows Vega, have known it for a long time, and know some of the changes and amendments being aimed for with Navi.

On the CPU side we know a bunch of Intel's next steps: Kaby Lake > Coffee Lake > Cannon Lake > Ice Lake, and for AMD it will be Zen > Zen+ > Zen 2 > Zen 3.

Is it just me that thinks it's strange for Nvidia to not even mention a code name for Volta's successor?

Rroff · 15 Jun 2017 at 01:57

Sargatanas2511 said:
Is it just me that thinks it's strange for Nvidia to not even mention a code name for Volta's successor?

Ages back they pencilled in Einstein on 10nm but then it was pushed aside for Pascal and Volta IIRC so might still be their long term plan. (I suspect it will get renamed to something else though).

JediFragger · 15 Jun 2017 at 02:12

edit - Ninja edit by Rroff!!!!!

Rroff · 15 Jun 2017 at 02:25

^^ I wonder if they just use it as a placeholder internally.

V F · 15 Jun 2017 at 03:28

LOL!

Rroff · 15 Jun 2017 at 03:43

Kind of cool little novelty product to be fair. Shame they don't seem to have any plans to sell them.

V F · 15 Jun 2017 at 03:48

It is cute. Get two for USB SLI.

Rroff · 15 Jun 2017 at 03:51

RAID0 ReadyBoost - increase your framerate by 400%!!!

Sargatanas2511 · 15 Jun 2017 at 09:02

Rroff said:
Ages back they pencilled in Einstein on 10nm but then it was pushed aside for Pascal and Volta IIRC so might still be their long term plan. (I suspect it will get renamed to something else though).

Ah, thanks, can't say I have heard of that before. Volta has been on roadmap slides since the Kepler days, but nothing past that at all to my knowledge.