How many SMs in full fat TU-102 and TU-104?

ToOo · 21 Aug 2018 at 20:39

As far as I can see this information hasn't been publicly announced, but I'm thinking we may have some information to work from.

Nvidia's Maxwell and consumer Pascal architectures have 128 ALUs (aka CUDA cores) per Streaming Multiprocessor (SM). But what about Turing, does that still hold true?

GP-102 for example has 30 SMs for a total of 3840 CUDA cores, but among the consumer cards this was only ever fully enabled in the Titan Xp, while the Titan X (Pascal) and 1080 Ti only ever exposed 28 SMs.

The image below is from the RTX 2080 Ti page and I believe it represents TU-102. I've highlighted it for clarity, but it seems to show 6 sets of 2x3 units arranged in two blocks on either side of the chip. In theory that correlates to 36 SMs in total for TU-102 if we are talking 128 ALUs per SM. It's worth noting that I could be out by a factor of 2, and TU-102 might have 72 SMs if it's 64 ALUs per SM as with Volta.

Since we know that the RTX 2080 Ti has 4352 CUDA cores, that would imply only 34 (68) of the possible 36 (72) SMs are activated, which leaves room for a theoretical 4608 CUDA core Titan R card to sit above the 2080 Ti in the product stack.
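For anyone who wants to check the arithmetic, here is a quick Python sketch of the two per-SM hypotheses above. The helper name `sm_count` is made up for illustration, and the 4608 figure is the guessed full-die total from this post, not a confirmed spec.

```python
# Core counts quoted in this post; the full-die total is a guess, not a spec.
CORES = {"RTX 2080 Ti": 4352, "full TU-102 (guess)": 4608}


def sm_count(cuda_cores: int, alus_per_sm: int) -> int:
    """Return the SM count, assuming cores split evenly across SMs."""
    sms, remainder = divmod(cuda_cores, alus_per_sm)
    if remainder:
        raise ValueError(f"{cuda_cores} cores do not divide into SMs of {alus_per_sm}")
    return sms


# Try both hypotheses: 128 ALUs per SM (Pascal-style) and 64 (Volta-style).
for alus_per_sm in (128, 64):
    for name, cores in CORES.items():
        print(f"{name}: {sm_count(cores, alus_per_sm)} SMs at {alus_per_sm} ALUs/SM")
```

Either hypothesis divides cleanly, so the core counts alone can't tell you which per-SM figure Turing uses.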

Unfortunately I can't seem to find a similar graphic for TU-104, so there's not really enough information to speculate further.

EDIT: As muon points out below, just look at the Quadro RTX 5000 and 6000: TU-104 = 3072 and TU-102 = 4608 cores.

SupernovaUK · 21 Aug 2018 at 22:26

I'm confused by all the negative speculation around the 2080Ti regarding its relative performance compared to the 1080Ti.

Ignoring the ray tracing and AI components of the chip there are 768 additional CUDA cores over the 1080Ti (4352 - 3584). Not sure how the clock speeds will play out but they should be very close.

I watched the Nvidia launch and heard Jensen Huang saying there is a new architecture for these chips and the SMs will have significantly improved performance over Pascal. So just on the additional cores and the new architecture surely these cards will be significantly faster than a 1080Ti before any additional new capabilities come into play. When you add in the sizable uplift in memory bandwidth and the new capabilities of the AI segment of the chip to improve performance I'm expecting it to be a sizable step up in performance over Pascal.

I'm looking forward to the benchmarks and seeing if all this negative speculation is justified.

Minstadave · 21 Aug 2018 at 22:38

SupernovaUK said:
I'm confused by all the negative speculation around the 2080Ti regarding its relative performance compared to the 1080Ti.

Ignoring the ray tracing and AI components of the chip there are 768 additional CUDA cores over the 1080Ti (4352 - 3584). Not sure how the clock speeds will play out but they should be very close.

I watched the Nvidia launch and heard Jensen Huang saying there is a new architecture for these chips and the SMs will have significantly improved performance over Pascal. So just on the additional cores and the new architecture surely these cards will be significantly faster than a 1080Ti before any additional new capabilities come into play. When you add in the sizable uplift in memory bandwidth and the new capabilities of the AI segment of the chip to improve performance I'm expecting it to be a sizable step up in performance over Pascal.

I'm looking forward to the benchmarks and seeing if all this negative speculation is justified.

If the performance was something to write home about, we’d have been told about it.

Yes the 2080Ti has a fair few more CUDA cores but that ignores the fact that it’s an entire price tier more expensive than the 1080Ti.

crinkleshoes · 21 Aug 2018 at 22:48

Shader count alone, at the same clocks, would be roughly a 20% bump.

With IPC improvements and GDDR6 bandwidth improvements... I'm hoping for closer to 30%
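As a rough sanity check on that figure, here is a minimal Python sketch. It assumes performance scales linearly with CUDA core count at equal clocks, which is a best case since real scaling is usually worse. `core_uplift` is a hypothetical helper name; the core counts are the ones quoted earlier in the thread.

```python
RTX_2080_TI_CORES = 4352
GTX_1080_TI_CORES = 3584


def core_uplift(new_cores: int, old_cores: int) -> float:
    """Fractional increase in core count; 0.2 means 20% more cores."""
    return new_cores / old_cores - 1


# Assumption: same clocks and perfect scaling with shader count.
uplift = core_uplift(RTX_2080_TI_CORES, GTX_1080_TI_CORES)
print(f"Core-count uplift: {uplift:.1%}")
```

That works out to a little over 21%, so "20%" is about right before any IPC or bandwidth gains are counted.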

iakhtar · 21 Aug 2018 at 23:14

We don't know anything about heat or power either. It might not be able to sustain high clocks as well as Pascal, for example, which wouldn't be surprising with that big a die.

muon · 22 Aug 2018 at 00:39

Isn't it simply a case of looking at the Quadro RTX?

We know that has 36 SMs with 128 shaders (or 72 SMs with 64) resulting in 4608 shaders.

Answered.

2080Ti will outperform the 1080Ti, more shaders. But the 2080 has fewer than the 1080Ti by quite some margin.

ToOo · 22 Aug 2018 at 01:56

muon said:
Isn't it simply a case of looking at the Quadro RTX?

Yes, yes it is. I didn't spot that at all. Thanks

AthlonXP1800 · 22 Aug 2018 at 02:53

muon said:
But the 2080 has fewer than the 1080Ti by quite some margin.

Same as the 1080, which has fewer CUDA cores than the 980 Ti.

Kaapstad · 22 Aug 2018 at 04:01

I suspect that the full fat Turing chip will pack 5120 SP cores, a 384-bit bus and 12GB of GDDR6, and will appear as a Titan variant some time in the near future.

It will also be called the TU100 chip is my guess.

I also don't think we will see it until NVidia have sold as many 2080 Ti cards as the market will take as the price of the full fat chip will be eye watering.

ubersonic · 22 Aug 2018 at 08:56

ToOo said:
How many SMs in full fat TU-102 and TU-104?

I dunno about the TU-102, but the TU-104 has a max takeoff weight of 78,100kg, so factor in fuel and you should be able to get a good 500,000+ on there, it's the physical size of the cards and their boxes that will be the limiting factor.

LeMson · 22 Aug 2018 at 09:02

https://en.m.wikipedia.org/wiki/Tupolev_Tu-102

And

https://en.m.wikipedia.org/wiki/Tupolev_Tu-104

Glad I could help

Mauller · 22 Aug 2018 at 11:58

Kaapstad said:
I suspect that the full fat Turing chip will pack 5120 SP cores, a 384-bit bus and 12GB of GDDR6, and will appear as a Titan variant some time in the near future.

It will also be called the TU100 chip is my guess.

I also don't think we will see it until NVidia have sold as many 2080 Ti cards as the market will take as the price of the full fat chip will be eye watering.

I don't think anything like that will happen, not until 7nm at least. But I think Nvidia have fully segmented their compute and gaming with 2 very different architectures. Volta and Turing.

COYS · 22 Aug 2018 at 12:25

iakhtar said:
We don't know anything about heat or power either, it might not be able to sustain high clocks as well as pascal for example, wouldn't be surprising with that big die.

I'm thinking they run hotter, hence the Founders Edition now coming with a dual fan design.

Kaapstad · 22 Aug 2018 at 13:07

Mauller said:
I don't think anything like that will happen, not until 7nm at least. But I think Nvidia have fully segmented their compute and gaming with 2 very different architectures. Volta and Turing.

Have a look at the 2080 Ti PCB: there is room for 12 memory chips, or in other words there is something bigger waiting in the wings.

I think the only reason we don't see the full (5120 SP) chip is yields and cost that goes with it.

Mauller · 22 Aug 2018 at 14:06

Kaapstad said:
Have a look at the 2080 Ti PCB: there is room for 12 memory chips, or in other words there is something bigger waiting in the wings.

I think the only reason we don't see the full (5120 SP) chip is yields and cost that goes with it.

TU102 is already 754mm^2 in size, so there likely is nothing bigger than it till 7nm. And TU102 is 128*6*6 CUDA cores, so 4,608.

They are also only using 11 memory packages because these parts are cut down.

The Quadros also use completely different PCBs to these consumer parts.

Kaapstad · 22 Aug 2018 at 14:46

Mauller said:
TU102 is already 754mm^2 in size, so there likely is nothing bigger than it till 7nm. And TU102 is 128*6*6 CUDA cores, so 4,608.

They are also only using 11 memory packages because these parts are cut down.

The Quadros also use completely different PCBs to these consumer parts.

GV100 die as used in the Titan V is bigger.

There is a reason NVidia are calling the die used in the 2080 Ti the TU102 and that is because there is a TU100 die lurking around somewhere.

Also, have you tried adding 512 + 4608 together? You get a number that has appeared somewhere before, or putting it another way, 10% of the TU102 is probably disabled for yield reasons.
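To put the yield idea into numbers, here is a short Python sketch. The 5120-core die is purely a guess from this thread, not an announced part, and `disabled_fraction` is a hypothetical helper name.

```python
HYPOTHETICAL_FULL_DIE = 5120  # guess from this thread, not a confirmed spec
QUADRO_RTX_6000_CORES = 4608
RTX_2080_TI_CORES = 4352


def disabled_fraction(enabled_cores: int, full_cores: int) -> float:
    """Fraction of a die's cores that are fused off."""
    if not 0 < enabled_cores <= full_cores:
        raise ValueError("enabled_cores must be between 1 and full_cores")
    return 1 - enabled_cores / full_cores


# Share disabled if a 5120-core die really existed.
print(f"{disabled_fraction(QUADRO_RTX_6000_CORES, HYPOTHETICAL_FULL_DIE):.1%}")
print(f"{disabled_fraction(RTX_2080_TI_CORES, HYPOTHETICAL_FULL_DIE):.1%}")
# Share disabled on the 2080 Ti if 4608 is in fact the full die.
print(f"{disabled_fraction(RTX_2080_TI_CORES, QUADRO_RTX_6000_CORES):.1%}")
```

Swapping the constants makes it easy to compare the guessed 5120-core case against the 4608-core Quadro figure.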

Mauller · 22 Aug 2018 at 17:07

Kaapstad said:
GV100 die as used in the Titan V is bigger.

There is a reason NVidia are calling the die used in the 2080 Ti the TU102 and that is because there is a TU100 die lurking around somewhere.

Also, have you tried adding 512 + 4608 together? You get a number that has appeared somewhere before, or putting it another way, 10% of the TU102 is probably disabled for yield reasons.

512 added doesn't work bud. The SM count is not right: you would need to add 768 to the overall design to get Volta's SM to GPC ratio. But in doing that you need to add more tensor, integer and RT cores to keep the ratio, which will make it much bigger. Likely bigger than Volta and not worth the extra SMs on 12/16nm.
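A quick way to see why 512 doesn't fit is to check how many SMs per GPC each total gives. This is only a sketch: it assumes six GPCs (as in the 128*6*6 breakdown earlier in the thread) and tries both 128 and 64 cores per SM, since the per-SM figure isn't settled here. `sms_per_gpc` is a made-up helper name.

```python
TU102_CORES = 4608
GPCS = 6  # assumption: six GPCs, per the 128*6*6 breakdown


def sms_per_gpc(total_cores: int, cores_per_sm: int, gpcs: int = GPCS) -> float:
    """SMs per GPC as a float; a non-integer means the SMs can't be spread evenly."""
    return total_cores / cores_per_sm / gpcs


for extra in (512, 768):
    for cores_per_sm in (128, 64):
        ratio = sms_per_gpc(TU102_CORES + extra, cores_per_sm)
        verdict = "even" if ratio.is_integer() else "uneven"
        print(f"+{extra} cores, {cores_per_sm}/SM: {ratio:.2f} SMs per GPC ({verdict})")
```

Under either per-SM assumption, adding 512 leaves a fractional SM count per GPC, while adding 768 gives a whole number (14 per GPC at 64 cores per SM, the same as GV100).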

Apparently there are also Turing parts being made on 16nm, making them around 780mm^2.

A bigger die also means more variability per chip and a higher chance of fewer dies that work to spec. NVidia are already pushing it with TU102 for a consumer part. Anything with a higher core count will be 7nm.

Turing is either going to be transferred to 7nm and become the TU2xx series or it is going to be a very short lived one since they already launched the x80TI part.

Silent_Scone · 22 Aug 2018 at 17:28

Kaapstad said:
I suspect that the full fat Turing chip will pack 5120 SP cores, a 384-bit bus and 12GB of GDDR6, and will appear as a Titan variant some time in the near future.

It will also be called the TU100 chip is my guess.

I also don't think we will see it until NVidia have sold as many 2080 Ti cards as the market will take as the price of the full fat chip will be eye watering.

Precisely. It would be closer to Titan V <price>, which is crazy. Power draw would also be interesting lol.

Meaker · 23 Aug 2018 at 14:22

They are not going to pay a billion for the masks to get an extra 10% die area on the same design.

Kaapstad · 23 Aug 2018 at 15:06

Meaker said:
They are not going to pay a billion for the masks to get an extra 10% die area on the same design.

The area is probably already there but used to increase yields on defective chips.

For example the GV100 chip has more than the 5120 cores you actually get to use.

To produce the TU102 chip with 4352 cores and all its other features for the price and yield NVidia want means using bigger chips with defects on them.