** The Official Nvidia GeForce 'Pascal' Thread - for general gossip and discussions **

If the max CUDA core count is 3840, that's not really impressive... Let's make a Volta thread :D :p

Dude, this is just Tesla. As we were discussing, there's no point in dedicating so much of the chip to FP64 unless you have decided to make this card purely for compute, which I think is the case.


For gaming, I hope we will see the real gaming chip in May/June; this one is for sure just for Tesla.
 
Perhaps someone can answer something for me....

This has 64 FP32 cores and 32 FP64 cores per SM, I assume that the FP64 cores are probably slightly larger than the FP32 cores, so over 1/3rd the shader die area is dedicated to FP64.

I understand that the FP32 cores can now run two FP16 ops per clock, giving it a 4:2:1 ratio of FP16:FP32:FP64 performance.

If they used whatever technique they have used to make two FP16 operations run in a single FP32 core in a single clock to make two FP32 operations run in a single FP64 core in a single clock, could they just replace all FP32 cores with FP64 cores and get much better performance for the same die area?

Right now it seems like, for gaming which only uses FP32, then 1/3rd the die will be useless.
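
As a rough sanity check on the numbers in the post above, the per-SM ratios can be worked out in a few lines. This is only a sketch: the core counts (64 FP32, 32 FP64 per SM) and the two-FP16-per-FP32 packing come from the post, while the "all FP64 cores with packed FP32" layout is the hypothetical being asked about. It assumes an FP64 core costs the same area as an FP32 core, which the post itself doubts, and the function names are made up for illustration.

```python
# Per-SM operations issued per clock, using the figures quoted in the post.
FP32_CORES_PER_SM = 64   # from the post
FP64_CORES_PER_SM = 32   # from the post
FP16_PER_FP32_CORE = 2   # from the post: two FP16 ops per FP32 core per clock


def ops_per_clock(fp32_cores, fp64_cores):
    """Return (fp16, fp32, fp64) ops per clock for one SM."""
    return (fp32_cores * FP16_PER_FP32_CORE, fp32_cores, fp64_cores)


def hypothetical_all_fp64(total_cores, fp32_per_fp64_core=2):
    """Hypothetical layout: every core is FP64 and can pack two FP32 ops.

    Assumes (optimistically) that an FP64 core takes the same area as an
    FP32 core, so the total core count stays the same.
    """
    return (total_cores * fp32_per_fp64_core, total_cores)  # (fp32, fp64)


fp16, fp32, fp64 = ops_per_clock(FP32_CORES_PER_SM, FP64_CORES_PER_SM)
print(f"actual       FP16:FP32:FP64 = {fp16}:{fp32}:{fp64}")

total = FP32_CORES_PER_SM + FP64_CORES_PER_SM
hyp_fp32, hyp_fp64 = hypothetical_all_fp64(total)
print(f"hypothetical FP32:FP64      = {hyp_fp32}:{hyp_fp64}")
print(f"FP64 share of cores today   = {FP64_CORES_PER_SM / total:.0%}")
```

Under the equal-area assumption the hypothetical SM would do three times the FP32 and FP64 work per clock; if FP64 cores are in fact larger, the gain shrinks accordingly.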
 
Dude, this is just Tesla. As we were discussing, there's no point in dedicating so much of the chip to FP64 unless you have decided to make this card purely for compute, which I think is the case.


For gaming, I hope we will see the real gaming chip in May/June; this one is for sure just for Tesla.

Yields will be hilariously bad, and die costs absolutely brutal. This will be the Titan chip, because they have to use all the highly defective chips for something. It'll just have 16GB instead of 32GB. I can't see its performance or price ever being competitive with Vega.

Hell, its performance is barely competitive with the Fiji Pro Duo for single precision (FP32, which is what gaming and commercial VR use), even though it'll probably be more than triple the cost and has a huge node advantage. I think this puts in perspective why the Oxide Games dev hinted that Pascal would not be very efficient compared to Polaris.

I've seen a couple of sites (wccftech among them) claiming, without any reference, that Pascal does asynchronous compute too... they're wrong by the look of it.

It seems to be an exceptionally large, low density, power hungry Maxwell with DP added back and HBM / NVLink.

From poster Muziqaz on semiaccurate (looks like no significant architectural improvements at all):

"Did anyone take a look at that Compute Capability 6.0?
Compared to Maxwell Pascal has:
Same amount of Threads/Warp
Same amount of Max Warps/MP
Same amount of Max Threads/MP
Same amount of Max Thread Blocks/MP
Same amount of Max 32-bit Regs/SM
Same amount of Max Regs/Thread
Same size of Max Thread Block
Double Max Regs/Block
Half The CUDA Cores
32k bytes less of Shared Memory

So it is basically Maxwell on smaller node with jacked up frequency and added DP."
 
Perhaps someone can answer something for me....

This has 64 FP32 cores and 32 FP64 cores per SM, I assume that the FP64 cores are probably slightly larger than the FP32 cores, so over 1/3rd the shader die area is dedicated to FP64.

I understand that the FP32 cores can now run two FP16 ops per clock, giving it a 4:2:1 ratio of FP16:FP32:FP64 performance.

If they used whatever technique they have used to make two FP16 operations run in a single FP32 core in a single clock to make two FP32 operations run in a single FP64 core in a single clock, could they just replace all FP32 cores with FP64 cores and get much better performance for the same die area?

Right now it seems like, for gaming which only uses FP32, then 1/3rd the die will be useless.

No. Maxwell / Pascal are not workload agnostic. GCN is, Volta may be. They need dedicated hardware for FP64, which is why this thing is so huge and yet has so few shader units. It's also why Maxwell looked so 'efficient' for gaming despite a rather decrepit architecture.
 
I can see them having dedicated consumer models that just have SP shaders with no DP shaders in hardware. Nvidia's architecture is not as scalable as GCN, so they need far more silicon and space to perform DP, which is why GP100 is so large.

They may never use GP100 in consumer parts this round. They may even release a new tier of part, called GP102 or something, that swaps all the DP shaders for SP, but it depends on whether Nvidia want to release consumer parts at such a large die size already. Those parts are more than likely already expensive to manufacture due to yields.


Going over to AMD.

If Vega is 4096 shaders like Fiji, then I can see Vega being around 300-350mm^2 while having better SP and DP performance than GP100.

The main point is that Fiji would only need to be slightly larger to accommodate the re-added DP compute circuitry. Although this would probably push it over 600mm^2 on 28nm, when shrunk to 14nm LPP it should be around 300-350mm^2, since 14nm LPP shrinks the area to slightly under 1/2, compared to just around 1/2 with 16nm FF+.

Since Fiji already has 8 TFLOPS SP at 1GHz, adding another 40-50% on the clocks gives it more SP compute than GP100 at a smaller size. Then add any improvements due to GCN 4.0 and it will probably be higher again.

Then include efficiency improvements and the smaller die size, and it could probably run at up to 2GHz, giving it more than 16 TFLOPS of SP going by Fiji values. It could even break 20 TFLOPS SP at 2GHz with GCN 4.0 cores.
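
The arithmetic behind those figures is easy to reproduce. The sketch below uses only the numbers from the post (4096 shaders, 1GHz baseline, 40-50% higher clocks, a 2GHz best case, roughly 600mm^2 on 28nm and about a 1/2 area shrink) plus the usual convention that one fused multiply-add counts as 2 FLOPs per shader per clock. The shader count is the rumoured one, not a confirmed spec, and `peak_tflops` is just an illustrative helper.

```python
def peak_tflops(shaders, clock_ghz, flops_per_shader_per_clock=2):
    """Peak single-precision TFLOPS, counting one FMA as 2 FLOPs."""
    return shaders * flops_per_shader_per_clock * clock_ghz / 1000.0


SHADERS = 4096  # rumoured figure from the post, not a confirmed spec

# Baseline, +40%, +50%, and the optimistic 2 GHz case from the post.
for clock in (1.0, 1.4, 1.5, 2.0):
    print(f"{clock:.1f} GHz -> {peak_tflops(SHADERS, clock):.1f} TFLOPS")

# Per-clock uplift that would be needed to reach 20 TFLOPS at 2 GHz.
needed = 20.0 / peak_tflops(SHADERS, 2.0)
print(f"uplift needed for 20 TFLOPS at 2 GHz: {needed - 1:.0%}")

# Die area: ~600 mm^2 on 28nm scaled by an assumed ~0.5x area factor.
print(f"estimated area after shrink: {600 * 0.5:.0f} mm^2")
```

Note that these are theoretical peaks only; they say nothing about whether such clocks are reachable or how the chips would perform in games.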
 
Like I said P100 is TP100 and GP100.

They need to use the defective dies for Titans or the economics don't work. It certainly won't be competitive with whatever AMD launch, but I think NVIDIA would prefer to get an absolute hiding in the high-end consumer space, and ultimately HPC too, if it means they get a 3 month lead from the HPC DP Tesla cards launching before Vega. It virtually guarantees that however bad the next 18 months are for them, in the short term they'll have some huge HPC contracts, saving their bacon.

Re: Vega being 4096 shaders, that report was totally made up. There haven't been any leaks yet.
 
They don't have to use the defective dies in titan or consumer cards, their entire lineup of Tesla cards can be filled with just GP100. With the defective parts making up the lower tier models.

They could even trickle into the highest-end Quadro, but this round we may not see GP100 in consumer parts. I would be surprised if they did, since it is such a waste of silicon in the consumer space, and it would more than likely be far more expensive than past Titans due to the die size and DP compute level of the hardware.
 
The highly defective ones won't be suitable for HPC products. They'll lack performance and won't be able to sustain 24/7 workloads.

This is Titan, and yes it will need to have Quadro SKUs too.

They probably won't have any fully working dies to begin with, and defect rates will be huge at this die size on a new node. To make 1 Tesla card there'll be dozens if not hundreds of Titan / Quadro binned chips.
 
How are people talking about the expected yields manufacturers will achieve in production and what they will do with the parts that don't meet the grade with such authority?

Is this genuinely people in the know or is it just conjecture, is this people who think they know what they're talking about?
 

You will be in for it now :D
 
Let's play a game of wait till Computex :D

In a month's time, "Let's play wait until September for the actual cards."

The launch of this gen has been, and continues to be, painful to live through (for those of us who need a new card pronto! Need, yes. Mine is dying :( ).
 
Think I'll skip this gen.

I got the itch to upgrade last year a few months before Skylake came out, bought an X99 system and gladly never regretted it upon Skylake's disappointing eventual release.

I bought a 980 Ti five months ago, a beast of a GPU that should last till Volta, which will iron out all the flaws Pascal was introduced with, given that everything is brand new.
 
How are people talking about the expected yields manufacturers will achieve in production and what they will do with the parts that don't meet the grade with such authority?

Is this genuinely people in the know or is it just conjecture, is this people who think they know what they're talking about?

It's how the industry's economics work: chip binning, and the different SKUs and market segments that follow from it. Re: yields, it couldn't not be disastrous at 600mm^2 on a new node.
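
For anyone wondering why die size matters so much here, the textbook first-order estimate is the Poisson yield model, Y = exp(-A * D0), where A is the die area and D0 is the defect density. The sketch below applies it to a 600mm^2 die versus a 300mm^2 one. The defect densities are illustrative guesses, not published foundry data, and nobody in this thread has real numbers, so treat it as a way to see the trend rather than a prediction.

```python
import math


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of dies with zero defects under the simple Poisson model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


# Defect densities below are illustrative guesses, not foundry data.
for d0 in (0.1, 0.3, 0.5):
    big = poisson_yield(600, d0)
    small = poisson_yield(300, d0)
    print(f"D0={d0}/cm^2: 600mm^2 -> {big:.0%}, 300mm^2 -> {small:.0%}")
```

The model only counts completely defect-free dies. Dies with a defect in one SM can often still be sold with that unit disabled, which is exactly the binning into lower SKUs being argued about above.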
 