AMD Polaris architecture – GCN 4.0

TwsT · 10 May 2016 at 22:58

Perfect_Chaos said:
I wasn't aware either company would bring out their highest-end cards near the end of the year; I thought most were saying next year.

Most have been saying 2017; it's absolute hearsay that it will arrive before 2017.

Especially with Zen launching as well.

Of course, in true AMD style they can change it at any time, but there's certainly nothing concrete in the roadmap for Vega at all. That's what you would expect, because no one releases a chipset with the promise of a better one a few months later, even if that's the plan.

flopper · 10 May 2016 at 23:09

BF1+Vega =

andybird123 · 10 May 2016 at 23:11

queamin said:
I think Nvidia have a steep learning curve with interposers, more so than AMD.

Nvidia aren't building interposers and AMD didn't build any interposers; it's all done by third parties, probably the same ones.

pmc25 · 10 May 2016 at 23:21

Rroff said:
Dunno. nVidia have been doing some strange things with regard to interposers, so it could be they are struggling with it, or it could mean something completely different, but it might indicate HBM2 won't come to Pascal any time soon. I don't really buy the SP numbers though, unless they are counting both FP32 and FP64 units in there, which is a strange thing to do; nVidia don't usually scale them up like that. When claims are made of them shooting up generation on generation, it usually indicates fake info.

If there's a GP102, it likely means that GP100 dies binned as defective will literally be (rubbish) binned. A GP102 with a similar FP32-to-FP64 unit ratio to the GP104 would probably be about right re: 4000+ (they say ~4500) shader units. That doesn't look wrong to me.

I'd assumed that they would have to reclaim the GP100s in consumer products, even if the performance for gaming was a total disaster. But maybe it's just too bad.

This would probably be the biggest advantage AMD have, regardless of whether they turn out to have extraordinary product performance: the ability to use the same dies for pro graphics and compute products as for gaming/multimedia. GCN units are completely agnostic; there is no single-precision or double-precision hardware. The components of NVIDIA's shader units are dedicated to FP32 or FP64. If they need more FP64 for double-precision compute (like supercomputer-type workloads), as with GP100, that cripples pro graphics, gaming and single precision.

queamin · 10 May 2016 at 23:27

andybird123 said:
Nvidia aren't building interposers and AMD didn't build any interposers; it's all done by third parties, probably the same ones.

I know, but AMD still had a lot more experience than Nvidia, never mind the experience with Fury.

andybird123 · 10 May 2016 at 23:29

pmc25 said:
... GCN units are completely agnostic; there is no single-precision or double-precision hardware. ...

Then why were AMD forced to give the Fury cards a 1/16 FP64 rate instead of 1/4 or even 1/2 like previous top-tier cards?
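For context on the question above: the FP64 rate is the fraction of the FP32 peak a card can deliver in double precision. A minimal Python sketch of the arithmetic, assuming the published R9 Fury X figures (4096 stream processors at 1050 MHz) and 2 FLOPs per stream processor per clock (one fused multiply-add); the 1/4 and 1/2 rows are hypothetical, showing what the same chip would have offered at the rates of earlier top-tier cards. These are theoretical peaks, not measured throughput.

```python
from fractions import Fraction

# One fused multiply-add per stream processor per clock counts as 2 FLOPs.
FLOPS_PER_SP_PER_CLOCK = 2


def fp32_tflops(stream_processors: int, clock_mhz: float) -> float:
    """Theoretical peak single-precision TFLOPS."""
    return stream_processors * clock_mhz * 1e6 * FLOPS_PER_SP_PER_CLOCK / 1e12


def fp64_tflops(stream_processors: int, clock_mhz: float, rate: Fraction) -> float:
    """Theoretical peak double-precision TFLOPS at a given FP64:FP32 rate."""
    return fp32_tflops(stream_processors, clock_mhz) * float(rate)


if __name__ == "__main__":
    # Fury X: 4096 SPs at 1050 MHz; only the 1/16 rate is the real product's.
    print(f"Fury X FP32: {fp32_tflops(4096, 1050):.2f} TFLOPS")
    for rate in (Fraction(1, 16), Fraction(1, 4), Fraction(1, 2)):
        print(f"  FP64 at {rate}: {fp64_tflops(4096, 1050, rate):.2f} TFLOPS")
```

At 1/16 the Fury X ends up with roughly half a TFLOPS of FP64, which is why it was never pitched as a double-precision compute part.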

pmc25 · 10 May 2016 at 23:34

It isn't the same people doing NVIDIA's interposers. It's already been revealed.

NVIDIA went with TSMC rather than UMC. TSMC have only done much smaller, ultra-low-volume interposers for FPGAs. GP100 is their first attempt at something big and high(er) volume; GP100, though not a high-volume part, will look high volume compared with the FPGAs. They've only relatively recently started developing them. AMD were working on them for the best part of a decade, and intensively with UMC for the last couple of years.

stewski · 10 May 2016 at 23:35

Is GP100 high volume?

Rroff · 10 May 2016 at 23:42

pmc25 said:
If there's a GP102, it likely means that GP100 dies binned as defective will literally be (rubbish) binned. A GP102 with a similar FP32-to-FP64 unit ratio to the GP104 would probably be about right re: 4000+ (they say ~4500) shader units. That doesn't look wrong to me.

I'd assumed that they would have to reclaim the GP100s in consumer products, even if the performance for gaming was a total disaster. But maybe it's just too bad.

This would probably be the biggest advantage AMD have, regardless of whether they turn out to have extraordinary product performance: the ability to use the same dies for pro graphics and compute products as for gaming/multimedia. GCN units are completely agnostic; there is no single-precision or double-precision hardware. The components of NVIDIA's shader units are dedicated to FP32 or FP64. If they need more FP64 for double-precision compute (like supercomputer-type workloads), as with GP100, that cripples pro graphics, gaming and single precision.

GP100 has up to 3840 shader units (FP32); it doesn't make sense for GP102 to have 4500.

pmc25 · 10 May 2016 at 23:44

stewski said:
Is GP100 high volume?

It was always going to be the lowest-volume Pascal part, but looking at GP100's single-precision FLOPS vs GP104 now, it is highly unlikely it ever gets used in a consumer or gaming product. It'd have low performance, unreasonable cost and a huge TDP. This means it's literally just going to be FP64-focused supercomputer/Tesla cards.

GP104 / Fiji / Fiji X2 / Polaris / Vega will blow it out of the water for FP32 loads at a fraction of the cost and TDP, at much greater density.

pmc25 · 10 May 2016 at 23:53

Rroff said:
GP100 has up to 3840 shader units (FP32); it doesn't make sense for GP102 to have 4500.

3584 are enabled. The rest are apparently always disabled (or will be until they get much better yields).

Yes it does. As I say, NVIDIA's shader units have dedicated FP32 and FP64 hardware. If GPeverything-else has fewer FP64 units per shader than GP100, then where is the disconnect you're seeing?

I'm sure someone can work out the ratio from the single-precision FLOPS figures disclosed by NVIDIA (something they won't lie about) for GP100 and GP104 and their respective shader unit counts.

If GP104 had the same ratio of FP64 units to FP32 units as GP100, its FP32 performance would probably be just above a 980 and well below a 980 Ti, not above a Titan X.
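The unit ratio being argued about can be read straight off NVIDIA's published Pascal SM layouts. A minimal Python sketch, assuming the whitepaper figures: each GP100 SM has 64 FP32 cores and 32 dedicated FP64 units (60 SMs on the full die, 56 enabled on Tesla P100), while each GP104 SM has 128 FP32 cores and 4 FP64 units (20 SMs, all enabled on the GTX 1080). It only counts units; it says nothing about die area or what a GP102 would look like.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class PascalChip:
    """Per-SM unit counts as published in NVIDIA's Pascal whitepapers."""
    name: str
    fp32_per_sm: int
    fp64_per_sm: int
    sms_total: int
    sms_enabled: int

    def fp32_cores(self, enabled_only: bool = True) -> int:
        sms = self.sms_enabled if enabled_only else self.sms_total
        return sms * self.fp32_per_sm

    def fp64_units(self) -> int:
        return self.sms_enabled * self.fp64_per_sm

    def fp64_rate(self) -> Fraction:
        """Dedicated FP64 units per FP32 core, i.e. the FP64:FP32 rate."""
        return Fraction(self.fp64_per_sm, self.fp32_per_sm)


GP100 = PascalChip("GP100 (Tesla P100)", 64, 32, sms_total=60, sms_enabled=56)
GP104 = PascalChip("GP104 (GTX 1080)", 128, 4, sms_total=20, sms_enabled=20)

if __name__ == "__main__":
    for chip in (GP100, GP104):
        print(f"{chip.name}: {chip.fp32_cores(enabled_only=False)} FP32 on die, "
              f"{chip.fp32_cores()} enabled, {chip.fp64_units()} FP64 units, "
              f"FP64 rate {chip.fp64_rate()}")
```

This is also where the 3840 vs 3584 split comes from: 60 vs 56 SMs at 64 FP32 cores each.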

andybird123 · 10 May 2016 at 23:56

pmc25 said:
It isn't the same people doing NVIDIA's interposers. It's already been revealed.

NVIDIA went with TSMC rather than UMC. TSMC have only done much smaller, ultra-low-volume interposers for FPGAs. GP100 is their first attempt at something big and high(er) volume; GP100, though not a high-volume part, will look high volume compared with the FPGAs. They've only relatively recently started developing them. AMD were working on them for the best part of a decade, and intensively with UMC for the last couple of years.

I can find documents referencing TSMC working on TSVs and interposers going back at least 8 years as well. It's not accurate to say that TSMC has only recently started developing them.

pmc25 · 11 May 2016 at 00:07

andybird123 said:
I can find documents referencing TSMC working on TSVs and interposers going back at least 8 years as well. It's not accurate to say that TSMC has only recently started developing them.

I guarantee it wasn't related to huge-die GPUs until NVIDIA tendered a contract. It was all about smaller FPGAs at very low volumes and extraordinary margins (even relative to GP100).

Rroff · 11 May 2016 at 00:07

pmc25 said:
3584 are enabled. The rest are apparently always disabled (or will be until they get much better yields).

Yes it does. As I say, NVIDIA's shader units have dedicated FP32 and FP64 hardware. If GPeverything-else has fewer FP64 units per shader than GP100, then where is the disconnect you're seeing?

I'm sure someone can work out the ratio from the single-precision FLOPS figures disclosed by NVIDIA (something they won't lie about) for GP100 and GP104 and their respective shader unit counts.

If GP104 had the same ratio of FP64 units to FP32 units as GP100, its FP32 performance would probably be just above a 980 and well below a 980 Ti, not above a Titan X.

AFAIK P100 has 10.6 TF single precision and the 1080 8.87 TF, with 3584 CUDA cores vs 2560. I'll be surprised if nVidia doesn't just knock out some FP64 units and jack up the frequency like they always have (I highly suspect a GP102-like core will have some combination of clock speed and shader units that gives around 12.9 TF single precision).

The FP64 units are not counted in those numbers.
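A quick way to check that the quoted TFLOPS figures count only the FP32 cores is to invert peak FP32 = cores × clock × 2 (one fused multiply-add per core per clock) and see what clock each figure implies. A minimal Python sketch using the numbers quoted above (P100: 10.6 TF, 3584 cores; GTX 1080: 8.87 TF, 2560 cores). The implied clocks land on the published boost clocks (about 1480 MHz and 1733 MHz), which is consistent with FP64 units not being counted. The 12.9 TF rows just apply the same formula to the guessed GP102 figure at two hypothetical core counts.

```python
# Peak FP32 FLOPS = FP32 cores x clock x 2 (one FMA = 2 FLOPs per core per clock).
# Inverting it gives the clock a quoted TFLOPS figure implies for a core count.

def implied_clock_ghz(tflops: float, fp32_cores: int) -> float:
    """Clock in GHz needed for `fp32_cores` to reach `tflops` peak FP32."""
    return tflops * 1e12 / (2 * fp32_cores) / 1e9


if __name__ == "__main__":
    print(f"P100:     {implied_clock_ghz(10.6, 3584):.3f} GHz")
    print(f"GTX 1080: {implied_clock_ghz(8.87, 2560):.3f} GHz")
    # Guessed ~12.9 TF GP102 at two hypothetical FP32 core counts.
    for cores in (3584, 3840):
        print(f"12.9 TF with {cores} cores: {implied_clock_ghz(12.9, cores):.3f} GHz")
```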

D.P. · 11 May 2016 at 01:04

pmc25 said:
I guarantee it wasn't related to huge-die GPUs until NVIDIA tendered a contract. It was all about smaller FPGAs at very low volumes and extraordinary margins (even relative to GP100).

Nvidia had the choice of going with TSMC, UMC or others. There is a good reason they went with TSMC.

And the size of the interposer is fairly irrelevant AFAIK.

drunkenmaster · 11 May 2016 at 01:18

D.P. said:
Nvidia had the choice of going with TSMC, UMC or others. There is a good reason they went with TSMC.

And the size of the interposer is fairly irrelevant AFAIK.

It's not at all irrelevant. These aren't full chips; they have very few layers compared to any standard chip, which means the chip is dramatically weaker. PCBs and packages bend, and the interposer needs to withstand that flexing, maintain its connections and not break. The bigger the chip, the more susceptible it is to problems. Manufacturing itself isn't too much of an issue: over 1000 mm² is big but more than doable. The circuitry is easy, redundancy is used, HBM2 chips can be recovered and use alternate lines if some are broken, there is very little reason to make anything at the limits of production capability, and these are very simple structures. It's post-production (packaging, installing a heatsink, shipping, and installing in a PCI-e slot, where the card can easily bend) where large interposers become a huge problem.

If the story about bump problems with GP100 is accurate, it would almost certainly be down to bending during installation, and potentially contraction and bending through heating/cooling, causing problems for the bumps on the interposer.
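The size argument can be illustrated with a first-order thermal-mismatch estimate. Silicon expands much less with heat than an organic package substrate, so a bonded interposer is pulled at its edges by roughly Δα × ΔT × L/2, and that grows linearly with edge length L. A minimal Python sketch, assuming approximate textbook CTE values (about 2.6 ppm/K for silicon, about 17 ppm/K for an organic substrate) and a hypothetical 60 K temperature swing and edge lengths. These are not GP100 or Fiji data, and real packages are constrained by underfill and warpage, so this only shows the scaling, not actual bump strain.

```python
# First-order thermal-mismatch sketch: free edge-to-centre displacement between a
# silicon interposer and an organic substrate = delta_CTE * delta_T * L / 2.
# CTE values are approximate textbook figures (assumptions), not measured package data.

SILICON_CTE_PPM_PER_K = 2.6      # approximate CTE of silicon
SUBSTRATE_CTE_PPM_PER_K = 17.0   # assumed typical organic package substrate


def edge_mismatch_um(edge_mm: float, delta_t_k: float,
                     cte_a_ppm: float = SILICON_CTE_PPM_PER_K,
                     cte_b_ppm: float = SUBSTRATE_CTE_PPM_PER_K) -> float:
    """Unconstrained mismatch in micrometres at the edge, measured from the centre."""
    delta_cte = abs(cte_b_ppm - cte_a_ppm) * 1e-6  # per kelvin
    half_length_um = edge_mm * 1000 / 2
    return delta_cte * delta_t_k * half_length_um


if __name__ == "__main__":
    delta_t = 60.0  # hypothetical idle-to-load temperature swing in kelvin
    for edge in (20.0, 35.0):  # hypothetical interposer edge lengths in mm
        print(f"{edge:.0f} mm edge: {edge_mismatch_um(edge, delta_t):.1f} um")
```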

Mtom · 11 May 2016 at 07:23

It has been known since last year that the trickiest part of the package is the interposer.
There was also another thing AMD faced with Fiji: they took a good chip and good memories, put them on an interposer, and they didn't work together. So there are plenty of mines to step on with the interposer.

Not to mention that if they put together a set and suddenly one of the memory chips fails, that's three more memory chips plus a GPU plus an interposer going in the trash.
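That is the known-good-die problem: component yields multiply, and a failure after bonding takes the good parts down with it. A minimal Python sketch, assuming independent failures and a four-stack layout as in the example above; every yield figure in it is hypothetical.

```python
# Known-good-die sketch: once the GPU, HBM stacks and interposer are bonded,
# a single failed part scraps the whole package. All yields are hypothetical.

def package_yield(gpu: float, hbm_stack: float, stacks: int,
                  interposer: float, bonding: float = 1.0) -> float:
    """Probability every component (and the bonding step) in one package is good,
    assuming failures are independent."""
    return gpu * hbm_stack ** stacks * interposer * bonding


def parts_lost_per_failed_stack(stacks: int) -> int:
    """Other parts scrapped when one stack fails after assembly:
    the remaining stacks, plus the GPU, plus the interposer."""
    return (stacks - 1) + 1 + 1


if __name__ == "__main__":
    y = package_yield(gpu=0.98, hbm_stack=0.97, stacks=4, interposer=0.99, bonding=0.98)
    print(f"Package yield: {y:.3f}")
    print("Parts scrapped per failed stack:", parts_lost_per_failed_stack(4))
```

Even with every individual part at 97% or better, the package as a whole comes out noticeably lower, which is why testing the dies before they go on the interposer matters so much.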

flopper · 11 May 2016 at 08:56

Vega is coming, shining brighter than Polaris
I am star struck

JediFragger · 11 May 2016 at 09:05

flopper said:
Vega is coming, shining brighter than Polaris
I am star struck

If Polaris is brighter than a thousand Suns, how many is Vega, floppy?? :eek:

Ayahuasca · 11 May 2016 at 09:11

You know you're struggling when flopper is already writing Polaris off