Volta is here...ish...

All of a sudden Drive PX2 is 80W????

When they were talking up Drive PX2 they called it a 250W part and listed it at 24 'fake' deep learning TFLOPS, i.e. 3x the 8 TFLOPS FP32 rating. It has apparently quietly become an 80W part with 20 'fake' DL TFLOPS but still the same 8 TFLOPS FP32. So the initial DL TFLOPS figure was made up, and how did it go from 250W to 80W? Then, on a similar process, it's supposed to drop to a single chip with the same performance at 20W? Four chips down to one on the same process, with that kind of power drop and no loss of performance? Seems unlikely.
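A quick back-of-envelope look at the perf-per-watt these figures imply (the wattages and DL TFLOPS numbers are the ones quoted above; everything else is just arithmetic):

```python
# Perf-per-watt implied by the quoted Drive PX2 / successor figures
# (numbers taken from the post above; nothing here is an official spec).
figures = {
    "PX2 as announced":        {"dl_tflops": 24, "watts": 250},
    "PX2 as now listed":       {"dl_tflops": 20, "watts": 80},
    "Claimed single-chip SoC": {"dl_tflops": 20, "watts": 20},
}

for name, f in figures.items():
    print(f"{name}: {f['dl_tflops'] / f['watts']:.2f} DL TFLOPS per watt")

# Roughly 0.10 -> 0.25 -> 1.00 TFLOPS/W, i.e. about a 10x perf/W jump
# from the original announcement on a similar process.
```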
 
I guess, since this is over 4x the performance per watt on the same node, it must only apply to this specific metric. It seems totally impossible otherwise, and it will surely be a far smaller improvement in 'vanilla' FP32.

So even though Pascal is supposed to be very impressive for deep learning/AI calculations, Volta is even more ridiculous on that front? :D
 
Have to agree, something very fishy about these new numbers.
 
Hehe, yup, probably 20W only just as it boots, right before it says "OK, time to wake up now, give me POWWWWEEERRRRR". :p:D
 
Get stuck in DM !!!

For once I fully support you, as there is so much rubbish being written about future Nvidia products.
 
I still don't even know what the exact specs are for Drive PX2; everyone was saying it was likely a GP106 being used in it. So you've got two Tegra chips with 256 CUDA cores each plus two discrete GP106-class GPUs, giving 8 TFLOPS FP32 total. So you're talking about basically 3,000 CUDA cores on 16nm for Drive PX2 to achieve that performance.

Then the new one is going to be a single chip with 512 CUDA cores and have the same performance? The thing is, Nvidia are the ones seemingly giving this out, and Anandtech have posted the same information... something is extremely odd about the numbers.

EDIT:- Anandtech and the Nvidia blog don't mention 8TF FP32, so I think wccftech and others have just assumed it didn't lose FP32 performance, but that seems nearly impossible: for 512 CUDA cores to match ~3,000, they would need to be clocked six times higher, or be 600% more efficient per core, or some combination that adds up to 6x more performance per core, which just isn't happening. Originally it was implied that the deep learning ops were based on total CUDA core throughput, so 8 TFLOPS FP32 meant 3 deep learning ops per clock and 24 DL TFLOPS. But the biggest gain was image processing, and the pictures showed 2 or 3 third-party chips on the PCB, one of which was a known image-processing SoC.
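To put that 6x figure in concrete terms, here is a rough sketch of the arithmetic. It assumes the usual 2 FP32 ops per core per clock and fills in the ~3,000-core estimate as 2x 256-core Tegra plus 2x ~1,280-core GP106-class GPUs; both are assumptions, not confirmed specs:

```python
# Rough check of the "512 cores matching ~3,000 cores" objection.
# Assumes FP32 GFLOPS = cores * 2 ops/clock * clock (GHz) * 1000.
px2_cores = 2 * 256 + 2 * 1280        # assumed split: 2x Tegra + 2x GP106-class = 3,072 cores
px2_fp32_gflops = 8000.0              # quoted 8 TFLOPS FP32

implied_px2_clock = px2_fp32_gflops / (px2_cores * 2)    # ~1.3 GHz average
needed_clock_512  = px2_fp32_gflops / (512 * 2)          # ~7.8 GHz

print(f"Implied PX2 clock:          ~{implied_px2_clock:.2f} GHz")
print(f"Clock needed for 512 cores: ~{needed_clock_512:.1f} GHz")
print(f"Ratio:                      ~{needed_clock_512 / implied_px2_clock:.0f}x")
```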

If this chip doesn't have 8 TFLOPS FP32 but has the same deep learning TFLOPS, it actually indicates that the vast majority of the deep learning ops are image processing, and that most (or maybe all) of that is done on the third-party image-processing chip. Meaning deep learning happens almost entirely off the Nvidia chips, so the two discrete GPUs on the PX2 were basically unused. It could also be the reason the PX2 went from 250W to 80W out of nowhere... turn off the discrete GPUs because they were unnecessary? So two SoCs, which maybe were also largely wasted, turn into one lower-clocked, better-optimised SoC, and the work is still done on the third-party chip, hence no drop-off in deep learning performance.

In which case the PX2 was a con and Nvidia is still using third-party chips to provide the performance they need. This isn't new either; almost all of their car systems, AFAIK, have used third-party chips to do most of the work.

Regardless of what you think of my bias against Nvidia, the only way I can see for them to go from 250W and 3,000 CUDA cores to 20W and 512 CUDA cores while losing zero deep learning performance... is if the performance simply doesn't come from their chips.
 
I don't know if this helps it add up, but I watched a tech podcast (can't remember the name sorry) where they were talking about the upcoming workstation Pascal cards.

They said deep learning and image processing were going to run at 8-bit, or "quarter precision", on Pascal, because supposedly the increase in precision above that didn't translate into a meaningful difference in real-world applications.

So 8 TFLOPS of 32-bit/single precision on Pascal should be 32 TFLOPS for deep learning and image processing.
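A minimal sketch of that scaling, assuming throughput doubles each time the data width halves (2x for FP16, 4x for 8-bit), which is the usual packed-math assumption rather than a confirmed Pascal spec:

```python
# Precision scaling described above: if each FP32 lane can instead process
# two FP16 values or four 8-bit values per clock, throughput scales like this.
fp32_tflops = 8.0                    # quoted FP32 figure

fp16_tflops = fp32_tflops * 2        # 16 TFLOPS at half precision (assumed 2x packing)
int8_tops   = fp32_tflops * 4        # 32 TOPS at 8-bit "quarter precision" (assumed 4x packing)

print(f"FP32: {fp32_tflops} TFLOPS, FP16: {fp16_tflops} TFLOPS, INT8: {int8_tops} TOPS")
```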

Then Volta might be designed to give far more TFLOPS per watt specifically for 8-bit? Much in the same way Maxwell was **** for 64-bit perf/W but really good for 32-bit.
 
It's common for SoCs to drop FP32 units and go for extra FP16 units, or simply fewer FP32 units, as that saves a lot of power and die area. In certain markets FP32 doesn't do anything but waste power and space.
 
The issue is that they say they had 8 TFLOPS FP32 from two discrete GP106-class GPUs plus two Tegras, all on 16nm, with the PX2 rated at 20 deep learning TFLOPS. Initially this was called 250W, then somehow out of nowhere it's now listed as 80W (the same product with the same chips). Now they're saying this single 20W SoC, with only 512 CUDA cores total compared to ~3,000 on the PX2, gives the same deep learning TFLOPS. All ~3,000 CUDA cores could obviously each process one small number before; even if they've changed to 8-bit shader cores (which I don't think is likely), there are still only a sixth as many cores as there used to be.
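As a rough sanity check on that point, here is the same back-of-envelope arithmetic with an assumed (not quoted) 1.5 GHz clock for a small 20W SoC; even granting 4x packed 8-bit math, 512 cores fall well short of 20 DL TFLOPS on their own:

```python
# Sanity check: can 512 CUDA cores alone plausibly deliver 20 DL TFLOPS?
# The 1.5 GHz clock and 4x 8-bit packing are assumptions, not quoted specs.
cores = 512
clock_ghz = 1.5                                   # assumed, generous for a 20W SoC
fp32_tflops = cores * 2 * clock_ghz / 1000        # ~1.5 TFLOPS FP32
int8_tops = fp32_tflops * 4                       # ~6 TOPS with 4x 8-bit packing

print(f"FP32: ~{fp32_tflops:.1f} TFLOPS, 8-bit: ~{int8_tops:.1f} TOPS")
print(f"Shortfall vs the claimed 20 DL TFLOPS: ~{20 / int8_tops:.1f}x")
```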

The whole thing is just very very strange.
 
The original was stated as "up to" 250 watts power draw and didn't state typical power use; it was also not a TDP figure (though the two are usually closely related). The new spec is an 80 watt TDP, though again it doesn't state typical draw. While it's a bit unlikely, as the numbers seem suspicious, it could be that it draws 60 watts typical in both cases but they've resolved some areas that pushed up the maximum, i.e. like you said, unused parts of the cores for its main processing.

EDIT: Again, I would urge caution with regard to Volta. I don't have any details, but a couple of people I know who do have made comments along the lines of "people have the completely wrong end of the stick with Volta".
 