
Fermi possibly castrated?

http://www.nordichardware.com/news/71-graphics/41570-the-geforce-gtx-580-mystery-is-clearing.html

We have had a hard time seeing how NVIDIA would be able to activate its sixteenth SM unit without severe problems with the power consumption. But with GF110 NVIDIA made an active choice and sacrificed the HPC functionality (High Performance Computing) that it talked so boldly about for Fermi, not only to make it smaller but also more efficient.

So it seems all that computing power they were trumpeting for Fermi may be gone... Seems the boasting lasted longer than the actual end product. :confused:
 
Hmm... it's not like there's a block sitting there dedicated to GPGPU functionality... as some people seem to think.
 
I thought JHH would start shouting retreat eventually...
Really, Nvidia needs two architecture design teams working in parallel, one focused on graphics and the other on GPGPU.
The only problem with that is the GPGPU market isn't large enough to warrant its own architecture and the resource investment required to make it work.
 

Yeah, but there are things that must cost some amount of transistors in GF100 which could potentially be done away with in a chip aimed just at the gamer's market. For example, half-rate double precision; that was pared down in the consumer variants of GF100 anyway, so at the end of the day there must be some wasted transistors there. Multiple kernel execution is not something with proven benefits in the consumer market either. And large amounts of cache (64KB of L1 cache per SM vs 16KB in GT200) are again great for CUDA, but there's no evidence they help in a gaming context.
 
Yeah, some odd bits here and there, but the GPGPU functionality is for the most part intrinsic to the design. I can see them maybe lopping some cache, and I doubt they'd hit the DP again... a few other odd changes, but there's nothing much they can dramatically change without a major redesign.
 
Eh, Fermi was already castrated; it's called a GTX 480. The GTX 580 should be around what Fermi was supposed to be months and months ago. Apparently the November launch will just be a paper launch with no cards this year though.
 
Nvidia could surprise us with a November release, who knows, but one thing's for sure: I am not banking on it. Full use of the Fermi architecture is going to be very, very interesting though, and I'm going to wait until I see benches of the 580 vs the 6900 series cards.
 
Meh, they artificially disable most of the DP power on non-Quadro/Tesla cards anyway, to stop people just buying a "cheat" GTX 480 instead of a Tesla version at five times the cost. For a gaming device, I'd wager 99.999999% of people buying them wouldn't notice or care.

As Rroff said though, by far the biggest transistor count in DP/GPGPU functionality is the shaders themselves; most often it's combining a couple of shaders to do a higher-precision calculation. There will be a few bits of logic associated with combining them, not much, but I wouldn't be entirely surprised if that was where the saving came from. I also wouldn't be surprised if the cache was gone, but I don't know whether the cache was really a DP/GPGPU-only cache or something graphics uses. Does it have a separate tiny cache on each cluster/SM/whatever Nvidia call it as well, or did they just centralise it?

Unfortunately for Nvidia, 300 million transistors (if it's true that's what they've cut) isn't a whole hell of a lot; it's 10% of the core give or take, which would only drop it from 530mm2 to 480mm2 or so, and that's still frankly way too big to give great yields. Though the difference between an embarrassing supply of 512SP parts and a not-quite-as-embarrassing supply of 512SP parts is a fairly fine line.

Though, a next-gen part, the 580GTX, being a respun, bug-fixed GF100 with functionality removed... fewer features going up a generation?
 

Why do people act like it's a massive deal? The difference between 480 and 512 shaders is 6.6% more shading power, so unless you have Raven's card, which speeds up more than he overclocks it, you'd be exceptionally lucky to see a 6% improvement from the extra shaders. Likewise, with between a 7 and 10% clock bump depending on the source of info, if it's otherwise all but identical to a GF100 you're really looking at 15% more speed if you're lucky.
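A rough back-of-the-envelope version of that scaling argument, as a minimal sketch (assuming performance scales linearly with shader count and clock, which is optimistic; the 480/512 shader counts and the 7-10% clock bump are the rumoured figures quoted above):

```python
# Upper-bound estimate of GTX 580 vs GTX 480 performance, assuming perfect
# (linear) scaling with shader count and core clock. Real-world scaling is
# worse, so treat these numbers as a ceiling rather than a prediction.

gtx480_shaders = 480
gtx580_shaders = 512                      # rumoured full 512SP part
clock_bumps = (1.07, 1.10)                # 7-10% rumoured clock increase

shader_scaling = gtx580_shaders / gtx480_shaders   # ~1.066, i.e. 6.6% more shading power

for clock in clock_bumps:
    total = shader_scaling * clock
    print(f"clock bump {clock:.2f}x -> at best ~{(total - 1) * 100:.0f}% faster")

# Prints roughly 14% and 17%, in line with the "if lucky, 15%" guess above.
```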

The more we hear about how confident AMD are over the 6970, the more I think they've gone fairly large, not insane, just bigger than the 5870 core, but with at least a 35% increase in efficiency; I think we're looking at at least 45% faster than a 5870 now. Which frankly, while stuck on 40nm, would be phenomenal.
 

The article says something like "possibly up to 300 million"; I don't think they know anything and are just guessing.

Each SM has 64KB; if they dropped that back a bit it probably wouldn't hurt rendering performance, though you'd see quite a hit in certain types of compute tasks... but off the top of my head you'd have to drop it back to about 8-12KB to stand any chance of lighting up another SM.
 

http://www.anandtech.com/show/2977/...tx-470-6-months-late-was-it-worth-the-wait-/3

Hmm, that's more what I thought. TBH I'm too tired to read that and make sense of it. I was under the impression a "cluster" of shaders would have a little cache, and it seems each 32-shader unit does have a little cache, and then there seems to be a separate L2 cache. I don't know if it's actually a separate cache, or if each area just calls into a central cache and has part of it assigned. I'll read it tomorrow; from a quick glance I couldn't see what was being suggested.

I'll be interested/surprised if Nvidia have done anything to increase efficiency, but it looks more like they've worked as hard as possible just to shrink the damn thing to a size where they could at least finally release a 512SP part.

Problem is, an overclocked 480GTX is comfortably in the ballpark of "580GTX" performance, and it's looking increasingly likely that an overclocked 5870 won't come close to a 6970. If Nvidia actually lose the fastest-card crown, or it becomes a genuine dispute, then with nothing new until 28nm is ready that's another potential four quarters of worsening problems.
 
480mm2 probably won't be that much larger than Cayman...


The problem is, yields go down exponentially as size increases; come in before the curve gets mental and you're fine, but after that (or borderline) you can wave bye-bye to your profits.

Profits are on the opposite exponential curve.

If a wafer is still $5k: get 10 chips working and they'll be $500 each; get 20 chips working and they've dropped by half to $250; get 40 working and you're down to $125; get 80 working and it's $62.50 a chip.

That's the problem. The rumours of likely sub-10, or maybe 20, working 512SP full Fermis off a wafer are probably true, or frankly they'd have released them; several thousand wafers for a run is still 40-50k chips they could have made, and they've released "ultra" type cards with lower numbers than that before.

So the difference between getting 40 chips and 80 off a wafer is pretty huge in terms of price, where you'll hit a profit, and where you can raise the price to make a dent in the billions you've spent on R&D.
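As a quick sanity check of that wafer arithmetic, a minimal sketch (the flat $5k per-wafer price and the die counts are the assumed figures from the post above, not real TSMC pricing):

```python
# Cost per working die for an assumed $5,000 wafer at various good-die counts.
# Illustrative only: ignores packaging, binning and salvage (cut-down) parts.

WAFER_COST = 5_000  # USD, assumed flat price per wafer

for good_dies in (10, 20, 40, 80):
    cost_per_die = WAFER_COST / good_dies
    print(f"{good_dies:>3} working dies -> ${cost_per_die:,.2f} per chip")

# 10 -> $500.00, 20 -> $250.00, 40 -> $125.00, 80 -> $62.50:
# every doubling of good dies halves the silicon cost per chip.
```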

Either way, Cayman is supposed to be sub-400mm2. I can't remember what guess I put in for Cayman; I guessed 255mm2 for the 6870/50 though, and could not have been more correct on that. Still, it's a guess, and my Cayman guess could be way off.

From rumours, the general yields of "big" chips in the past, and the issues Nvidia have had over the last three processes in terms of yields/cost/problems due to size: above 400mm2 isn't great, above 450mm2 is going to be bad, and above 500mm2 is just ridiculous.

If Cayman goes above 400mm2 it won't be by much, or they've gotten some silly pricing from TSMC based on their potential to leave for GloFo very, very soon. This is where the problem lies: with the current 5870 architecture/efficiency, GF100 is about 60% bigger but at best 20% faster. What happens when that gap reduces to 20% bigger? If both chips had a similar architecture AMD would have a significant lead; if AMD have increased their efficiency dramatically, Nvidia won't have a hope in hell.
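To make that efficiency gap concrete, here is a minimal sketch using the post's own rough figures (GF100 ~60% bigger and ~20% faster than the 5870 today, and a hypothetical next round where the size gap shrinks to ~20% with perf-per-area unchanged on both sides):

```python
# Perf-per-area comparison built from the guesses above. Purely illustrative.

amd_perf, amd_area = 1.00, 1.00            # 5870 as the baseline
nvidia_perf, nvidia_area = 1.20, 1.60      # GF100: ~20% faster, ~60% bigger

nvidia_perf_per_mm2 = (nvidia_perf / nvidia_area) / (amd_perf / amd_area)   # ~0.75

# Hypothetical next round: Nvidia only 20% bigger, efficiencies unchanged.
nvidia_relative_perf = nvidia_perf_per_mm2 * 1.20                           # ~0.90

print(f"Nvidia perf per mm2: ~{nvidia_perf_per_mm2:.2f}x AMD's")
print(f"At only 20% bigger, Nvidia would land at ~{nvidia_relative_perf:.2f}x "
      f"AMD's performance, i.e. slower despite the larger die.")
```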

Even ignoring that, let's guess that Nvidia are going to be 20% bigger (I think it will be at least 30% personally). Take any number, it doesn't really matter, but say Nvidia get 40 chips off the wafer: $5k / 40 = $125, versus $5k / 48 = $104 for AMD.

That's COST: zero profit, making nothing at all, not even paying for the shipping to get them put on cards somewhere. We're talking billions in R&D from both companies, and they tend to aim for about 100% margin ideally, at which point it's $250 vs $208, a $42 higher price before you even talk about power, memory and everything else. And in this case AMD would almost certainly have the performance advantage: $42 cheaper, or $42 more profit, and the faster card. It's win-win.

If it's 30%, AMD end up getting 30% more cores per wafer, which comes down to $96, basically a quarter cheaper, and it's still faster.

That's also the best-case scenario. In the real world a 400mm2 core will have marginally better yields than a 401mm2 core, but significantly higher yields than a borderline 480mm2 core; that could throw another 10% or so more cores per wafer to AMD, which means $125 vs $89.
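Putting those cost scenarios together in one minimal sketch (all inputs are the guesses above: a $5k wafer, 40 working Nvidia dies per wafer, AMD getting 20-30% more dies from the smaller core, plus a possible ~10% extra from better yields on a smaller die):

```python
# Relative per-die cost under the die-size scenarios guessed at above.
# All numbers are speculative inputs from the discussion, not real yield data.

WAFER_COST = 5_000
nvidia_dies = 40
nvidia_cost = WAFER_COST / nvidia_dies          # $125

scenarios = {
    "AMD 20% more dies (smaller core)": 1.20,
    "AMD 30% more dies":                1.30,
    "AMD 30% more dies + ~10% yield":   1.30 * 1.10,
}

for label, advantage in scenarios.items():
    amd_cost = WAFER_COST / (nvidia_dies * advantage)
    print(f"{label}: ${nvidia_cost:.0f} vs ${amd_cost:.0f} per chip")

# Roughly $125 vs $104, $96 and $87 -- close to the $125 vs $89 figure above
# for the combined case, before either side adds any margin.
```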

That 80mm2 is HUGE, and if the gap is bigger (385-395mm2 vs 500-510mm2) it just gets worse and worse.

An extra £50 is fine for a card that's actually the fastest; that's how the market works, people will pay a premium. If it's actually slower though, very, very few people will go for a more expensive and slower card.
 
Who wants to be an Nvidia engineer? A constant job for you: do more, quicker, better, and satisfy millions of customers all over the world ;-)
 
 
Looking at the rumoured specs of the 580, there doesn't look to be a massive difference compared to the 480?

I don't think I'll be buying a 580 as I can't see a huge improvement over the 480, IMO anyway, that's if the rumoured specs are correct.
 