The NVidia GV100 News Thread

Caporegime
Joined
18 Oct 2002
Posts
32,618
That's why I can't understand why all the gamers were getting excited about it. We're never going to see a GTX gaming card with anything close to this, are we?
The gaming GPUs will be even better because they won't have the non-gaming support for FP64 and so on. The fact that the CUDA cores are now 50% more efficient will have a big impact, as will the new thread scheduling.
 
Soldato
Joined
19 Apr 2003
Posts
13,513
No, but the tech advancement will likely bring other advantages with it that can be applied to GeForce, and likewise the potential specs, once the other stuff is stripped out, leave room for a lot of performance.
Agreed, very much like F1: aerodynamics, KERS, active suspension, ABS etc. have all been adapted and variations adopted in everyday road cars over the years.

Just hope they adopt it sooner rather than later.
 
Associate
Joined
28 Jan 2010
Posts
1,547
Location
Brighton
I was surprised how large and powerful the GV100 is to be honest, especially including Tensor cores.

Of course the GTX 2080 will not have the FP64 cores, and probably also not the Tensor cores exactly as-is, but I wonder if Nvidia will offer some mixed-precision stuff for FP16 and/or FP8 like AMD are doing with Vega.

AMD has shown you can use half-precision (FP16) for certain tasks without losing visual quality (like hair rendering).

It'll be interesting if we see large performance gains in game engines if they start utilising FP16/FP8 workloads for suitable tasks. Could this be a sign of the hardware starting to support that?
 
Caporegime
Joined
18 Oct 2002
Posts
32,618
I was surprised how large and powerful the GV100 is to be honest, especially including Tensor cores.

Of course the GTX 2080 will not have the FP64 cores, and probably also not the Tensor cores exactly as-is, but I wonder if Nvidia will offer some mixed-precision stuff for FP16 and/or FP8 like AMD are doing with Vega.

AMD has shown you can use half-precision (FP16) for certain tasks without losing visual quality (like hair rendering).

It'll be interesting if we see large performance gains in game engines if they start utilising FP16/FP8 workloads for suitable tasks. Could this be a sign of the hardware starting to support that?


Consumer Volta will most likely have FP16 support. Pascal GP100 already had it, and it isn't clear whether consumer Pascal has it but disabled in drivers, or whether it's missing entirely. You can specify FP16 variables but there is not a 2x speed-up; potentially you get the benefits of the smaller size, with less register pressure, better cache coherence and so on. It is always a trade-off though; it may just be better to spend the transistors on additional FP32 cores.
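For anyone wondering what "specifying FP16 variables" looks like in practice, here's a minimal CUDA sketch (kernel name and setup are just for illustration). The data is stored as half precision, which saves bandwidth and register space, but without native FP16 maths units the arithmetic itself still runs at FP32 rate:

Code:
#include <cuda_fp16.h>

// Element-wise scale of an FP16 array. Storing in __half halves the
// memory traffic and register footprint, but the multiply is done by
// promoting to FP32, so there is no automatic 2x compute speed-up.
__global__ void scale_fp16(const __half* in, __half* out, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = __half2float(in[i]);   // promote to FP32
        out[i] = __float2half(x * s);    // store back as FP16
    }
}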
 
Soldato
Joined
24 Jun 2004
Posts
10,977
Location
Manchester
Would you care to explain this for numpties like me?

Okay - I'll give it a try!

Think of a matrix as like a 2D array of numbers; say N x N numbers arranged in a table.

Most scientific computing makes heavy use of "matrix - vector" multiplication, where the matrix is multiplied by a 1D array of numbers (a vector). For a matrix of size "N", this takes "N^2" floating point operations. If you want to make this sort of operation parallel, you are usually limited by how fast you can transfer data around the machine (i.e. by memory bandwidth). For these sorts of applications GPUs aren't so effective, because you saturate the memory bandwidth long before you max out the compute capability.
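To make the bandwidth argument concrete, here's a rough CUDA sketch of a naive matrix-vector multiply, one thread per output row (names and layout are illustrative):

Code:
// y = A * x for an N x N row-major matrix A.
// Roughly 2*N^2 flops against ~N^2 matrix elements read from memory,
// i.e. only a couple of operations per value loaded - the kernel hits
// the memory bandwidth limit long before the FP32 cores are saturated.
__global__ void matvec(const float* A, const float* x, float* y, int N)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N) {
        float sum = 0.0f;
        for (int col = 0; col < N; ++col)
            sum += A[row * N + col] * x[col];
        y[row] = sum;
    }
}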

But, for certain scientific computing applications, you can formulate the problem in terms of matrix-matrix multiplications instead. Here you multiply one matrix (2D array) with another. This costs "N^3" operations, so you end up transferring a similar amount of data around, but need "N" times more computations. These are the sort of applications that benefit the most from GPUs, because you can unleash the full computing power of the GPU (... at least if your matrices are big enough, i.e. for large N).
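And the matrix-matrix case, again as a naive sketch with one thread per output element (illustrative only; real libraries tile this through shared memory):

Code:
// C = A * B for N x N row-major matrices.
// About 2*N^3 flops against ~3*N^2 values of data, so the work per
// byte moved grows with N - for large matrices the FP32 cores, not
// memory bandwidth, become the limiting factor.
__global__ void matmul(const float* A, const float* B, float* C, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += A[row * N + k] * B[k * N + col];
        C[row * N + col] = sum;
    }
}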

Previously, once you had moved your two matrices into the GPU memory, you would use the standard FP32 or FP64 cores to do the calculation - that is, you set up a list of instructions for the CUDA cores to carry out in order to do the multiplication. With the "tensor cores" it seems that the *entire matrix-matrix multiplication* is done in hardware. So, the only thing you can send to tensor cores is a pair of matrices, but you will get the result much much faster than by going through the general purpose FP32 cores.
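For reference, this is roughly how CUDA (from version 9 onwards) exposes the tensor cores through its WMMA interface in mma.h: a warp loads small FP16 tiles into "fragments" and a single call performs the whole 16x16x16 multiply-accumulate in hardware, with an FP32 accumulator. A sketch, not production code:

Code:
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes one 16x16 output tile on the tensor cores.
__global__ void wmma_tile(const half* A, const half* B, float* C)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, A, 16);   // leading dimension = 16
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);   // whole tile in hardware
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}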


So, any scientific application that relies heavily on matrix-matrix multiplications (where all or most of the numbers in the matrix are non-zero) could see a further ~10x speedup from this setup, on top of the (probably) 20-100x speedup they already see over using a CPU.

For applications like machine learning, or molecular mechanics simulation, you could see a very real 10x speedup (if Nvidia's numbers are to be believed). For applications like finite element analysis, or computational fluid dynamics, it's not going to help at all.
 
Man of Honour
Joined
13 Oct 2006
Posts
91,119
Quick and dirty explanation is that matrix operations let you process a load of numbers like an assembly line: you do a group at a time rather than going through each operation one by one.
 
Soldato
Joined
24 Jun 2004
Posts
10,977
Location
Manchester
Quick and dirty explanation is that matrix operations let you process a load of numbers like an assembly line: you do a group at a time rather than going through each operation one by one.

Yes, pretty much. ... but only a very small subset of algorithms can be formulated in such a way as to make use of this.

Basically, for the applications that already make efficient use of GPUs, this will be an absolute godsend. For everyone else, it's unlikely to make any difference at all.
 
Soldato
Joined
24 Jun 2004
Posts
10,977
Location
Manchester
It'll be interesting if we see large performance gains in game engines if they start utilising FP16/FP8 workloads for suitable tasks. Could this be a sign of the hardware starting to support that?

I think so.

Mixed / half precision is quite a hot topic at the moment. As games get more complex there should be plenty of opportunity to drop various algorithms down to half precision. I imagine that having access to improved performance for FP16 / FP8 will be a lot more useful to game developers than (say) tensor cores or a big stack of FP64.
 
Associate
Joined
28 Jan 2010
Posts
1,547
Location
Brighton
I think so.

Mixed / half precision is quite a hot topic at the moment. As games get more complex there should be plenty of opportunity to drop various algorithms down to half precision. I imagine that having access to improved performance for FP16 / FP8 will be a lot more useful to game developers than (say) tensor cores or a big stack of FP64.

Could be interesting if 4K is cheap & easy to run as soon as 2018/2019 through a combo of hardware changes and mixed precision being adopted in games.
 
Soldato
Joined
24 Jun 2004
Posts
10,977
Location
Manchester
Very clear, thank you. Will this help games too?

Not really... This mostly comes into play for certain types of complex scientific simulation.

Perhaps it'll "unlock" certain algorithms for real-time implementation, allowing developers to try new things, but I can't think of anything off the top of my head. To be honest I'm not expecting the tensor cores to be present (or at least active) in the GeForce line.
 
Caporegime
Joined
18 Oct 2002
Posts
32,618
Not really... This mostly comes into play for certain types of complex scientific simulation.

Perhaps it'll "unlock" certain algorithms for real-time implementation, allowing developers to try new things, but I can't think of anything off the top of my head. To be honest I'm not expecting the tensor cores to be present (or at least active) in the GeForce line.


I doubt it in the short term, but you know what, I think there could be some amazing uses of it in the future. Deep learning is taking over so many fields; applying it to computer games could be the next big thing, and then of course tensor cores would be perfect. There are obvious things like enemy AI, but there are other things where deep learning could be used, for graphical effects or for understanding what the player is doing.
 
Soldato
Joined
24 Jun 2004
Posts
10,977
Location
Manchester
I doubt it in the short term, but you know what, I think there could be some amazing uses of it in the future. Deep learning is taking over so many fields; applying it to computer games could be the next big thing, and then of course tensor cores would be perfect. There are obvious things like enemy AI, but there are other things where deep learning could be used, for graphical effects or for understanding what the player is doing.

Hmm... "Deep learning" is generally a long, intricate process that's best suited to dealing with massive amounts of loosely-correlated data. Not really suitable for running in real-time in traditional "GPU heavy" applications like FPS or similar.

Could be very interesting for strategy games though I suppose... Here you wouldn't be constrained to doing updates at every frame, or keeping everything synchronised. The machine learning could essentially run as a background process, taking advantage of any unoccupied resources. I can imagine it being useful for something like an RTS or turn-based strategy. Could be used to adaptively develop enemy tactics based on your own moves for example.
 
Associate
Joined
28 Jan 2010
Posts
1,547
Location
Brighton
Hmm... "Deep learning" is generally a long, intricate process that's best suited to dealing with massive amounts of loosely-correlated data. Not really suitable for running in real-time in traditional "GPU heavy" applications like FPS or similar.

Could be very interesting for strategy games though I suppose... Here you wouldn't be constrained to doing updates at every frame, or keeping everything synchronised. The machine learning could essentially run as a background process, taking advantage of any unoccupied resources. I can imagine it being useful for something like an RTS or turn-based strategy. Could be used to adaptively develop enemy tactics based on your own moves for example.

You're thinking about the training part, not the utilisation part. You can have a (pre-trained) neural network identify whether it's looking at a cat, or construction worker, or bike, or house, etc. in a single picture. Just as an example.
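To put the "utilisation" (inference) part in concrete terms: a trained network's forward pass is essentially a chain of matrix products through fixed weights plus a cheap non-linearity, which is exactly the kind of work a GPU handles in real time. A toy CUDA sketch of one fully-connected layer (names and layout are illustrative):

Code:
// One fully-connected layer of a pre-trained network:
// out = ReLU(W * in + b), with W and b fixed after training.
// Inference is just repeated products like this, so it runs orders of
// magnitude faster than the training that produced the weights.
__global__ void dense_layer(const float* W, const float* b,
                            const float* in, float* out,
                            int n_out, int n_in)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_out) {
        float sum = b[row];
        for (int k = 0; k < n_in; ++k)
            sum += W[row * n_in + k] * in[k];
        out[row] = fmaxf(sum, 0.0f);   // ReLU activation
    }
}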


I doubt it in the short term, but you know what, I think there could be some amazing uses of it in the future. Deep learning is taking over so many fields; applying it to computer games could be the next big thing, and then of course tensor cores would be perfect. There are obvious things like enemy AI, but there are other things where deep learning could be used, for graphical effects or for understanding what the player is doing.

It could be used for enemy AI that just makes smarter choices, and/or ACTUALLY learns from you as it plays you. The possibility of this should become clear when DeepMind play at the Starcraft Tournament later this year.

Also it could be used for some interesting productivity-boosting things, like procedural generation. Procedural generation at the moment is slightly bad/boring, partly because it works in a similar way to current enemy AI (which isn't really AI) technology. If a neural network could be trained to produce environments/buildings/etc. which were on a par with human hand-placed assets, THAT would be interesting.


So... buy Nvidia shares right? haha.

Look at what's happened to their price since Feb 2016 :eek:
 
Caporegime
Joined
18 Oct 2002
Posts
32,618
Hmm... "Deep learning" is generally a long, intricate process that's best suited to dealing with massive amounts of loosely-correlated data. Not really suitable for running in real-time in traditional "GPU heavy" applications like FPS or similar.

Could be very interesting for strategy games though I suppose... Here you wouldn't be constrained to doing updates at every frame, or keeping everything synchronised. The machine learning could essentially run as a background process, taking advantage of any unoccupied resources. I can imagine it being useful for something like an RTS or turn-based strategy. Could be used to adaptively develop enemy tactics based on your own moves for example.


As AllBodies pointed out, you seem to have mixed up deep neural network training with inference. Training is incredibly computationally expensive and can take weeks on a large server farm. Inference is relatively fast, and the applications typically run in real time. Whenever you talk to your Android phone, it uses deep learning to do speech recognition. Deep learning is at the core of all autonomous vehicle technologies right now, analysing the entire environment around them from multiple 4K cameras and LIDAR sensors with millions of data points, all processed at 20-100 Hz.
 