The NVidia GV100 News Thread

Caporegime
Joined
18 Oct 2002
Posts
32,618
Revenue of $2 billion for the last reported Quarter wouldn't have hurt either!!! :eek:


Well over $400 million from HPC, close to 200% year-on-year growth. That is why Nvidia can create an 815mm^2 chip with yields so low they might only get a couple of working dies from a wafer: they can sell the dies at $15-20K a time. It won't be long before HPC is netting Nvidia well over $1bn revenue a quarter.
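A rough way to sanity-check the "couple of working dies per wafer" remark is the common gross-die approximation plus a simple Poisson yield model. This is a sketch under stated assumptions: a 300 mm wafer, and defect densities that are purely hypothetical, picked only for illustration. It also ignores harvesting of partly defective dies (Tesla V100 ships with 80 of GV100's 84 SMs enabled), which raises usable output in practice.

```python
import math

def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Common approximation: wafer area / die area, minus partial dies lost at the edge."""
    radius = wafer_diameter_mm / 2
    full = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(full - edge_loss)

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a Poisson model: Y = exp(-A * D0)."""
    area_cm2 = die_area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)

GV100_AREA_MM2 = 815.0  # die size quoted in the post
gross = gross_dies_per_wafer(GV100_AREA_MM2)

# Hypothetical defect densities (defects per cm^2), for illustration only.
for d0 in (0.1, 0.2, 0.5):
    good = gross * poisson_yield(GV100_AREA_MM2, d0)
    print(f"D0 = {d0}/cm^2: about {good:.0f} fully working dies out of {gross}")
```

Under these assumptions the model gives 63 candidate dies per wafer and roughly 28, 12 and 1 fully working dies for the three densities, so "a couple" corresponds to a fairly pessimistic defect density.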
 
Soldato
Joined
7 Feb 2015
Posts
2,864
Location
South West
This is how I see things going; it can go one of two ways. Consumer Volta will have either 6 SMs per GPC or 7 SMs per GPC, with 128 FP32 cores per SM. I don't know whether they will put FP32 cores capable of 2x FP16 into consumer Volta; they didn't with consumer Pascal. Consumer Pascal has dedicated FP32 shaders, which is why it can't run FP16 even at a 1:1 rate (GP104 and GP102 run FP16 at just 1/64 of the FP32 rate).

All of these assume clocks of 1600-1800 MHz, with the theoretical figures calculated at 1800 MHz, since most Pascal cards actually run at around 1800 MHz, above the rated boost clock that their official theoretical performance is calculated with.

So with 6 SMs per GPC we get:
GV102 - 6 GPC - 4608 cores - 16.5 Tflops
GV104 - 4 GPC - 3072 cores - 11.1 Tflops
GV106 - 2 GPC - 1536 cores - 5.5 Tflops
GV108 - 1 GPC - 768 cores - 2.7 Tflops

With 7 SMs per GPC we get:
GV102 - 6 GPC - 5376 cores - 19.3 Tflops (same core count as a full GV100)
GV104 - 4 GPC - 3584 cores - 12.9 Tflops (same core count as the cut-down GP102 in the 1080 Ti)
GV106 - 2 GPC - 1792 cores - 6.5 Tflops
GV108 - 1 GPC - 896 cores - 3.2 Tflops
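The figures above follow the usual peak-FP32 formula: cores x 2 FLOPs per clock (one fused multiply-add) x clock. A small sketch that regenerates both lineups, assuming 128 FP32 cores per SM and 1800 MHz as the post does; with ordinary rounding it gives 16.6, 2.8 and 19.4 for the three entries listed as 16.5, 2.7 and 19.3, so those appear to be truncated rather than rounded.

```python
def tflops(cores: int, clock_mhz: float) -> float:
    """Peak FP32 rate: each core retires one fused multiply-add (2 FLOPs) per clock."""
    return cores * 2 * clock_mhz / 1e6

CORES_PER_SM = 128  # consumer-Pascal-style SM, as assumed in the post
CLOCK_MHZ = 1800    # clock the post uses for its theoretical figures
GPCS = {"GV102": 6, "GV104": 4, "GV106": 2, "GV108": 1}

for sms_per_gpc in (6, 7):
    print(f"{sms_per_gpc} SMs per GPC:")
    for chip, gpcs in GPCS.items():
        cores = gpcs * sms_per_gpc * CORES_PER_SM
        print(f"  {chip} - {gpcs} GPC - {cores} cores - {tflops(cores, CLOCK_MHZ):.1f} Tflops")
```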


Either Nvidia will cheap out on cores to save die size, go with option 1 and instead set base clocks to 1800-2000 MHz,
or they will go with option 2 and keep the same clock range as Pascal. But then they have to deal with making larger dies: 12nm does not offer a large density saving, as it is only an improvement on 16nm, not an entirely new node.
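Since peak FP32 throughput scales with cores x clock, the trade-off between the two options can be expressed as the clock option 1 would need to match option 2. A minimal sketch, assuming the same 6 GPCs x 128 cores per SM as the lineups above; the helper name is illustrative, not from any library.

```python
def clock_for_parity(cores: int, target_cores: int, target_clock_mhz: float) -> float:
    """Clock (MHz) at which `cores` matches the peak rate of `target_cores` at `target_clock_mhz`.

    Peak FLOPS are proportional to cores * clock, so solve cores * f = target_cores * target_f.
    """
    return target_cores * target_clock_mhz / cores

option1 = 6 * 6 * 128  # 6 GPCs x 6 SMs x 128 cores = 4608
option2 = 6 * 7 * 128  # 6 GPCs x 7 SMs x 128 cores = 5376

print(clock_for_parity(option1, option2, 1800))  # 2100.0
```

So option 1 would need about 2100 MHz sustained to match option 2 at 1800 MHz, slightly above the top of the 1800-2000 MHz range suggested for it.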


The other thing is that consumer Volta will only have the FP32 cores: no INT32/16, Tensor or FP64 cores.
 
Associate
Joined
28 Jan 2010
Posts
1,547
Location
Brighton
Funny to think in 3-4 years you'll be able to fit GV100 in around 200mm^2.

I assume they're going to go for the 19.3 Tflop option for the top card (i.e. the Titan Xv or 2080 Ti).

The 1080 Ti is already just over 14 Tflops when overclocked, so nothing other than your 19.3 Tflop option there makes sense for a top end card at the bleeding edge of the process.
 
Soldato
Joined
7 Feb 2015
Posts
2,864
Location
South West
Yeah, but like I mentioned, they could go with the first option and, since clocks might be a little better on 12nm, set an 1800 MHz base clock and 2.0 GHz max clock. That gives the 6 SM GV102 option about 18 Tflops theoretical at 2.0 GHz.
 
Associate
Joined
28 Jan 2010
Posts
1,547
Location
Brighton
I don't think ~25% is enough of a performance boost for their flagship card, with a new process and a new architecture.

I expect the middle card (GTX 1080 replacement) to be in the ~16 Tflop range, and the top card to be much faster.
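The ~25% figure can be checked against the "just over 14 Tflops" overclocked 1080 Ti mentioned earlier. A sketch of the peak-FLOPS comparison, assuming the 1080 Ti's 3584 cores run at about 2.0 GHz; this compares theoretical throughput only, and game performance does not scale linearly with Tflops.

```python
def tflops(cores: int, clock_mhz: float) -> float:
    """Peak FP32 rate: cores x 2 FLOPs (one FMA) per clock."""
    return cores * 2 * clock_mhz / 1e6

def uplift_percent(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return 100 * (new / old - 1)

baseline = tflops(3584, 2000)  # assumed overclocked GTX 1080 Ti: 3584 cores at ~2.0 GHz
options = {
    "6 SMs/GPC GV102 at 2.0 GHz": tflops(4608, 2000),
    "7 SMs/GPC GV102 at 1.8 GHz": tflops(5376, 1800),
}

print(f"Overclocked 1080 Ti baseline: {baseline:.1f} Tflops")
for name, value in options.items():
    print(f"{name}: {value:.1f} Tflops, +{uplift_percent(value, baseline):.0f}%")
```

On these assumptions the 6 SM option lands at roughly +29% and the 7 SM option at roughly +35% over the overclocked 1080 Ti.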
 
Associate
Joined
30 May 2016
Posts
620
I would assume they'd go with option 1 for the consumer versions to save die space. Anything above 550mm^2 is stretching it for consumer cards; I think Fiji and the 980 Ti were the biggest chips, at around 600mm^2...

But would this mean that consumer Volta is 16.5 Tflops for both FP32 and FP16, or does it double for FP16? I was under the impression that Vega runs 2x FP16 even in the consumer version, no?
 
Soldato
Joined
7 Feb 2015
Posts
2,864
Location
South West
I think that unless they feel AMD has forced their hand into including 2x FP16 on consumer cards, they will gimp it again.

Vega does have 2x FP16 even on consumer parts, and AMD will have game devs take advantage of it; they can get ~30% more performance in some situations with a few effects.
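"2x FP16" (AMD markets it as Rapid Packed Math) works by treating one 32-bit register lane as two 16-bit halves, so a single FP32-width ALU operation can process two FP16 values per clock. A minimal Python sketch of the packing idea using the standard `struct` module's half-precision format, plus the peak-rate arithmetic for the hypothetical 16.5 Tflops consumer Volta discussed above; the helper names are illustrative, and the 1/64 ratio is the consumer Pascal FP16 rate.

```python
import struct
from typing import Tuple

def pack_half2(low: float, high: float) -> int:
    """Pack two IEEE 754 half-precision values into one 32-bit word (`low` in bits 0-15)."""
    return struct.unpack("<I", struct.pack("<2e", low, high))[0]

def unpack_half2(word: int) -> Tuple[float, float]:
    """Split a 32-bit word back into its two half-precision values."""
    return struct.unpack("<2e", struct.pack("<I", word))

def peak_fp16_tflops(fp32_tflops: float, fp16_rate: float) -> float:
    """Peak FP16 throughput given the hardware's FP16:FP32 rate ratio."""
    return fp32_tflops * fp16_rate

word = pack_half2(1.5, -2.0)
print(hex(word), unpack_half2(word))

# Hypothetical 16.5 Tflops FP32 consumer Volta:
print(peak_fp16_tflops(16.5, 2.0))     # with packed 2x FP16, Vega-style
print(peak_fp16_tflops(16.5, 1 / 64))  # with a consumer-Pascal-style FP16 rate
```

The packed word is `0xc0003e00`: the two 16-bit patterns for 1.5 (`0x3e00`) and -2.0 (`0xc000`) sit side by side in one 32-bit value, which is what lets the hardware double FP16 throughput without extra ALUs.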
 
Associate
Joined
30 May 2016
Posts
620
Exactly, and it seems to me like something that could give AMD a huge advantage... I hope developers DO take advantage of it and Nvidia's hand DOES get forced. We need things to push forward.

I was also just reading this from their developer blog:

Volta’s independent thread scheduling allows the GPU to yield execution of any thread, either to make better use of execution resources or to allow one thread to wait for data to be produced by another. To maximize parallel efficiency, Volta includes a schedule optimizer which determines how to group active threads from the same warp together into SIMT units. This retains the high throughput of SIMT execution as in prior NVIDIA GPUs, but with much more flexibility: threads can now diverge and reconverge at sub-warp granularity, and Volta will still group together threads which are executing the same code and run them in parallel.

and then:

It is interesting to note that Figure 12 does not show execution of statement Z by all threads in the warp at the same time. This is because the scheduler must conservatively assume that Z may produce data required by other divergent branches of execution in which case it would be unsafe to automatically enforce reconvergence. In the common case where A, B, X, and Y do not consist of synchronizing operations, the scheduler can identify that it is safe for the warp to naturally reconverge on Z, as on prior architectures.

It seems to me that Nvidia is going back to 'proper' async compute with Volta, similar to how GCN works, which needs a lot more register file space and hardware. It'll be really interesting to see how consumer Volta stacks up against Vega in the consumer space...
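The blog passage describes grouping the active threads of a single warp that are executing the same code into SIMT units. The sketch below is a toy Python model of just that grouping step, not NVIDIA's actual scheduler, which also has to decide when reconvergence is safe (CUDA 9's `__syncwarp()` lets the programmer force it explicitly). The branch, lane split and program-counter values are hypothetical. Note that this mechanism concerns divergence inside one warp; it is separate from running graphics and compute queues concurrently.

```python
from collections import defaultdict
from typing import Dict, List

WARP_SIZE = 32

def simt_groups(pcs: List[int]) -> Dict[int, List[int]]:
    """Toy model: bucket the lanes of one warp by program counter.

    Lanes in the same bucket are at the same instruction and could issue
    together as one SIMT unit; different buckets would be interleaved.
    """
    groups: Dict[int, List[int]] = defaultdict(list)
    for lane, pc in enumerate(pcs):
        groups[pc].append(lane)
    return dict(groups)

# Hypothetical snapshot of `if (lane < 8) { A; B; } else { X; Y; } Z;`
# with pcs 1..5 standing for A, B, X, Y, Z: lanes 0-7 are at B, lanes 8-31 at Y.
diverged = [2 if lane < 8 else 4 for lane in range(WARP_SIZE)]
print({pc: len(lanes) for pc, lanes in simt_groups(diverged).items()})

# Once every lane has reached Z, the whole warp forms a single group again.
reconverged = [5] * WARP_SIZE
print({pc: len(lanes) for pc, lanes in simt_groups(reconverged).items()})
```

The first print shows two groups of 8 and 24 lanes that would be scheduled separately; the second shows a single 32-lane group once the branches rejoin at Z.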
 
Soldato
Joined
26 Oct 2013
Posts
4,012
Location
Scotland
Does anyone else find it unusual that with Volta being shown off already we still have no information at all on what is coming after Volta?

With AMD we know that Navi follows Vega, and have known for a long time, and we know some of the changes and amendments being aimed for with Navi.

On the CPU side we know a bunch of Intel's next steps: Kaby Lake > Coffee Lake > Cannon Lake > Ice Lake, and AMD's will be Zen > Zen+ > Zen 2 > Zen 3.

Is it just me, or is it strange for Nvidia not to even mention a code name for Volta's successor?
 
Man of Honour
Joined
13 Oct 2006
Posts
91,114
Ages back they pencilled in Einstein on 10nm, but then it was pushed aside for Pascal and Volta IIRC, so it might still be their long-term plan (I suspect it will get renamed to something else though).
 
Soldato
Joined
26 Oct 2013
Posts
4,012
Location
Scotland
Ah, thanks, can't say I have heard of that before. Volta has been mentioned in slides since the Kepler days, but nothing past that at all to my knowledge.
 