
Nvidia news direct from Jen-Hsun Huang!

That's cool, but just out of curiosity, why doesn't Fermi suck in terms of performance to size/cost/heat/power etc?

Is it purely because DM doesn't offer tangible evidence in your opinion, or do you have evidence to the contrary? I'm not trying to start a row here, I actually am interested.

I don't think Duff Man ever said that Fermi isn't bad in terms of performance to size/cost/heat/etc.

However, drunkenmaster is making sweeping statements about the quality of Nvidia's design. Yes, it isn't efficient on TSMC's 40nm process, but we don't know that it isn't a great design on another 40nm process, or 32 or 28nm.
 
That's cool, but just out of curiosity, why doesn't Fermi suck in terms of performance to size/cost/heat/power etc?

Is it purely because DM doesn't offer tangible evidence in your opinion, or do you have evidence to the contrary? I'm not trying to start a row here, I actually am interested.

Certainly Fermi is less than optimal when it comes to the performance-per-watt stakes. There is no real argument there... :p


--- an aside about the design and operation of Fermi ---

However, consider that Fermi is a radical redesign of the GPU process. The way that data is passed through the GPU allows for much bigger geometry throughput (nvidia's horribly named "polymorph engine"), allowing (among other things) excellent tessellation performance. This redesign also allows for much more effective scheduling of parallel kernels, which allows for a much more efficient solution in a number of GPGPU applications (though admittedly does not help too much in gaming).

More importantly, the layout of the Fermi design offers great scalability. I'm venturing very slightly into conjecture here (which is one of the things I slammed DM for :p) but the highly segregated design, with largely independent "quads" connected by a L2 cache, should allow upscaling to larger GPU sizes with near-linear improvements in performance. Of course, at 40nm this simply isn't viable given the already gargantuan size of the die.

In contrast, the design used by ATI will suffer increasingly from diminishing returns as it is scaled up in size. Now don't think that I'm criticising ATI here; I would argue that, given 20:20 hindsight, their design path was the correct one for this generation (as it gave them better performance per die-size and allowed them to get to market much more quickly). But certainly they will need to perform a radical redesign for the next generation (not the 6800 series which is more of a refresh). I think we will see some radical changes in the Northern Islands architecture planned for 2011, and who knows, perhaps they will suffer a similar set of Fermi-esque issues.

------


Anyway... I'm beginning to get away from the basic point, which is that Fermi is a radical redesign of the GPU architecture, which will be improved upon for many generations to come. It has not gone as planned by nvidia, but such radical design changes rarely go without issue (consider the nvidia FX5800, or the ATI 2800). There is every reason to assume that a refined design on a smaller manufacturing process will offer great performance.

To simply say "performance per watt and per transistor is poor, therefore the entire architecture sucks" is lazy and short-sighted. The same could have been said about either of the two examples above, and would have been equally wrong. Nvidia was rushed to market (thanks to excellent work by AMD), and had to contend with a very poor 40nm process at TSMC. Only time will tell the true power of the Fermi-architecture.
 
PhysX can give you an answer to 1 decimal point, or 50; you wouldn't see the difference in the end result - the bullet took 0.002 or 0.002586769 seconds to get to the target. With games, "close enough" is WAY more than good enough.

For some things you have to use as high a precision as possible or you get odd things happening, like objects not colliding when they should - but most of the time PhysX doesn't use that high a precision when it can get away with it, generally only about 3 decimal places.
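Just to illustrate the "close enough" point with a rough sketch - the distance, speed and rounding step below are made-up numbers and have nothing to do with how PhysX actually works internally:

Code:
# Rough sketch of the "close enough" argument: work out a bullet's time of
# flight at full double precision, then again with the result rounded to
# 3 decimal places, and compare what a player would actually notice.
# The distance and muzzle velocity are made-up numbers, not PhysX internals.

def time_of_flight(distance_m, speed_ms, decimals=None):
    t = distance_m / speed_ms
    return round(t, decimals) if decimals is not None else t

full = time_of_flight(25.0, 950.0)        # ~0.026315789 s
coarse = time_of_flight(25.0, 950.0, 3)   # 0.026 s

print(f"full precision  : {full:.9f} s")
print(f"3 decimal places: {coarse:.3f} s")
print(f"difference      : {(full - coarse) * 1000:.3f} ms")  # tiny next to a ~16 ms frame

The difference is about a third of a millisecond - nothing you'd ever notice in a game, which is the whole point.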

The only real design issue relating to Fermi is that it's not well suited to the process; in most other regards it's a very good design... as I've mentioned a few times, nVidia didn't really have much choice in the matter: it was go with the design and shoehorn it in as best they could, or delay by even more months coming up with something from scratch.

As duff-man said above, the current ATI design suffers from diminishing returns as you scale it up, which is why with the 6 series they've concentrated on improving efficiency rather than outright ramping it up for performance.
 
As duff-man said above, the current ATI design suffers from diminishing returns as you scale it up
This may be true, but it doesn't lead to this:
which is why with the 6 series they've concentrated on improving efficiency rather than outright ramping it up for performance.
The main reason they aren't 'ramping it up for performance' is that more performance = more die size = more expensive chips. What seems most likely is an increase in efficiency coupled with a smaller size increase to give a significant, but not earth-shattering, boost in overall performance while retaining a healthy profit margin.

What's the point of making an architecture that scales brilliantly if it's so big you can't take advantage of it? The current ATi arch might not scale quite as well but I'd bet that if they made a chip as big as a GF100 it would outperform it easily.
 
Anyway... I'm beginning to get away from the basic point, which is that Fermi is a radical redesign of the GPU architecture, which will be improved upon for many generations to come. It has not gone as planned by nvidia, but such radical design changes rarely go without issue (consider the nvidia FX5800, or the ATI 2800). There is every reason to assume that a refined design on a smaller manufacturing process will offer great performance.
Maybe, but that still makes it a bad architecture now. By the time they release a Fermi-based chip on 22nm they will be a whole generation behind ATi, who are taking a more iterative approach (by using parts of their 22nm design on the 6000 series) and could have at least as good an architecture for 22nm by then.

To simply say "performance per watt and per transistor is poor, therefore the entire architecture sucks" is lazy and short-sighted.
No offense, but that's just ridiculous. What makes a good architecture besides performance per transistor and performance per watt? Remember that the GF100 is more than 1.5x the area of Cypress for what, a 20-30% performance boost in general? Even if the architecture was on 32nm it would still be larger than Cypress. It is unequivocally a bad architecture for a gaming card at this time, partly because they put too many GPGPU eggs in one basket.

The same could have been said about either of the two examples above, and would have been equally wrong. Nvidia was rushed to market (thanks to excellent work by AMD), and had to contend with a very poor 40nm process at TSMC. Only time will tell the true power of the Fermi-architecture.
What, were nVidia not planning on releasing new chips until 32nm came online (with a delay after that before cards came out)? Were they expecting ATi to just sit on their hands? They've been pretty boneheaded recently, but I find it hard to believe that they were that stupid.

Plus there is no indication that the 32nm process would have been less problematic for them, since a major problem seems to have been an inability or unwillingness to design around the problems in the process, as ATi did. ATi did have more experience on the process, but even if nVidia had the same information they would have had difficulty replicating fixes like the double-vias because their chips were already too big.
 
So in one breath you're saying that you can't believe Nvidia are that stupid, and then in the next you're saying they are that stupid... I mean, they only spent a couple of billion developing the whole Fermi architecture - I'm sure they just slapped it together and hoped it would work.

Just because Jen-Hsun Huang comes across as a bit of a plonker doesn't mean that all the engineers who actually design and make the chip don't know what they are doing.
But of course all the armchair critics will undoubtedly tell me I'm wrong anyway. :rolleyes:
 
No offense, but that's just ridiculous. What makes a good architecture besides performance per transistor and performance per watt?

A GPU architecture is about more than just the present generation. It must be capable of scaling to far bigger designs in the future, carrying over the basic design to larger GPUs (in terms of transistor count), and smaller manufacturing processes. GPU design must take a long term view, because it is far too research intensive to completely overhaul the design for every generation / iteration. As I pointed out, this does not always lead to a great card at the first iteration; take the ATI 2800-series for example. Hot, inefficient, and produced on too large a process to take advantage of its strengths, it couldn't compete with the 8800-series. But, two generations later, with very minor changes to the architecture, it has evolved into the 5800 series - a highly efficient GPU.

In short; you can't judge a GPU architecture entirely based on its first iteration. If that were the case, the cypress architecture would have been written off in its infancy (the 2800).


By the time they release a Fermi-based chip on 22nm they will be a whole generation behind ATi, who are taking a more iterative approach (by using parts of their 22nm design on the 6000 series) and could have at least as good an architecture for 22nm by then.

Absolutely not - if anything it will be the other way around. Nvidia have already gone through an iteration of their new base-architecture, whereas AMD are still working on theirs (Northern Islands). The next iteration of GPUs (on 28nm, not 22nm, by the way) will see a second-generation nvidia architecture (Kepler) competing with a first-generation AMD one (Northern Islands). That's not to say nvidia will automatically "win" that round, but you can't prejudge the outcome.

Remember, cypress and Fermi are two completely different beasts. Cypress is the penultimate iteration of a tried-and-tested formula, which is reaching the end of its lifespan, whereas Fermi is the first iteration of a new architecture with many years ahead of it.

Nvidia and AMD are out of sync with their developments; something which hasn't been the case for a while. But it is nvidia who have jumped onto the 'redesign' ship first. Should they have simply pushed out a scaled-up GT200 instead of Fermi, to compete with Cypress? Perhaps... I imagine that given 20:20 hindsight they would have taken this route and delayed Fermi until late 2010 or 2011. But NONE of this changes the fact that Fermi is a radically different and highly scalable architecture that will form the basis of nvidia GPUs for the next five years. Dismissing it entirely based on its current performance is no different to dismissing the last three generations of AMD GPUs based on the performance of the 2800.



As for the rest of your post...

Plus there is no indication that the 32nm process would have been less problematic for them

You have absolutely no reason to believe that. Without knowing the intricacies of the GPU design, or what it is on the micro-scale that causes the high power usage, no-one does. This is an assumption pure and simple. You will need to wait for the next round of GPUs in 2011 to see this confirmed or dismissed.
 
A GPU architecture is about more than just the present generation. It must be capable of scaling to far bigger designs in the future, carrying over the basic design to larger GPUs (in terms of transistor count), and smaller manufacturing processes. GPU design must take a long term view, because it is far too research intensive to completely overhaul the design for every generation / iteration. As I pointed out, this does not always lead to a great card at the first iteration; take the ATI 2800-series for example. Hot, inefficient, and produced on too large a process to take advantage of its strengths, it couldn't compete with the 8800-series. But, two generations later, with very minor changes to the architecture, it has evolved into the 5800 series - a highly efficient GPU.

Whilst I agree with your overall point, I think it's unfair to say that the 5800 series only underwent 'very minor' architectural changes compared to the 2900 series, considering RV770 was significantly different from R600 in itself: it replaced the ring bus with a crossbar memory architecture, the raster operations got fixed, and of course it added support for a completely new memory technology, which allowed the 4800 series to be competitive with a much narrower memory bus than the GTX200 series. I think that whilst you're right that the Fermi architecture is probably going to improve, it's not going to come without some kind of significant design change.
 
You have absolutely no reason to believe that. Without knowing the intricacies of the GPU design, or what it is on the micro-scale that causes the high power usage, no-one does. This is an assumption pure and simple. You will need to wait for the next round of GPUs in 2011 to see this confirmed or dismissed.
The fact that it was cancelled suggests to me that it wasn't going to be an amazing process. Anyway, I was actually attempting to counter your assumption that it would have been so much better that it would have completely saved Fermi. There is no evidence of that either.

As to the rest of your post: From my perspective I see that in order to be competitive again (in terms of making money on GPUs) nVidia will have to start releasing smaller GPUs like Cypress that are still competitive in performance. If you naively took a GF100 and reduced it to 28nm (not sure where I got 22 from), it would still be almost Cypress-sized for no extra performance. The architecture might be great on paper but currently it's grossly inefficient compared to the competition, and until proof has been shown that it can be fixed that makes it a bad architecture in my book.

So nVidia have to dramatically increase the efficiency of the Fermi arch, by at least 1.5 times, in order to be competitive on 28nm, while hoping that ATi's new architecture isn't even better than that. Personally I don't think they'll be able to do it, but I guess we'll wait and see :)
 
Duff-man, you mean the 2900....

Yes, I do - thanks :p Another brainfart...



Whilst I agree with your overall point, I think it's unfair to say that the 5800 series only underwent 'very minor' architectural changes compared to the 2900 series, considering RV770 was significantly different from R600 in itself: it replaced the ring bus with a crossbar memory architecture, the raster operations got fixed, and of course it added support for a completely new memory technology, which allowed the 4800 series to be competitive with a much narrower memory bus than the GTX200 series. I think that whilst you're right that the Fermi architecture is probably going to improve, it's not going to come without some kind of significant design change.

Certainly :)

I don't mean to give the impression that scaling up a GPU is going to be straightforward (nothing is simple when you're dealing with designs that have 1Bn+ transistors). There will always be improvements, tweaks, and adjustments based on lessons learned (the replacement of the ring bus certainly falls under the category of "lessons learned"), as well as compromises to keep design costs down.

But I'm sure you will agree that the base operation of the GPU (i.e. the way it handles floating point arithmetic through the clustering of 'stream processing units' into shader cores and SIMD cores) has remained very similar from the 2900 through to the 5*** series (and almost certainly the 6*** series as well). In comparative terms, the design has undergone minor changes compared to the jump made from x1900 to 2900, where the entire ethos of the pipeline was reworked.
 
Like Duff-Man said, we have no idea if Fermi would have performed better on a smaller process, so we really can't slam it for being a bad design.

Hmm, I think even DM believes it will perform better on a smaller process, but that's the problem: Nv are fabless, so they will never be a generation ahead in process technology.
Cypress was a mediocre, unbalanced design, as it simply didn't scale well with transistor doubling; it just happens to look good because, wait for it...
Fermi was a bad design, bad for the process and bad at graphics, but good at compute relative to the competition - and that is irrelevant for >95% of us here, as we are mostly gamers, not crunchers.

GF104 is an improvement with the reduction of non-essential gaming transistors, but it's still behind in efficiency compared to Cypress, which itself is not particularly efficient. It is flawed by DESIGN, not by the process: what Fermi does in software, Cypress does in hardware. Functions implemented in hardware always beat software solutions in performance and power per mm^2. That is a well known fact, and was why Larrabee was hot and failed to be competitive against its competition, which was using fixed-function hardware.

Fermi needs some serious improvement in order to avoid a repeat of 40nm on 28nm.
To begin with, who actually thinks 28nm (TSMC) will be any less leaky than 40nm?
IMHO if Nv wishes to create an HPC market, it needs to build dedicated chips - one architecture for computing and another for graphics, not a jack of all trades, as Jack is going to fail miserably in the trade that matters most to him: his bread and butter.
They tried to do this with GF104 (I don't think I even need to mention GF106), but it still can't compete economically against Cypress in terms of performance/efficiency per mm^2, as it's still a compute chip.

Performance per mm^2 = cost
Power consumption per mm^2 = efficiency



Fermi simply can't compete on the above two factors in the graphics realm, and other following architectures will most probably suffer similar fates if they continue in the direction of Fermi, i.e. the 'one chip/architecture to do it all' approach.
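For what it's worth, the two metrics above are easy to turn into a quick comparison - every number below is a hypothetical placeholder, not a measured figure, so swap in real die sizes, benchmark indices and board power before drawing any conclusions:

Code:
# Minimal sketch of the two metrics defined above: performance per mm^2 (a
# proxy for cost effectiveness) and watts per mm^2 (a proxy for efficiency).
# All inputs are hypothetical placeholders, not real chip data.

def perf_per_mm2(perf_index, die_area_mm2):
    return perf_index / die_area_mm2

def watts_per_mm2(board_power_w, die_area_mm2):
    return board_power_w / die_area_mm2

# (performance index, die area in mm^2, board power in W) - illustrative only
chips = {
    "big compute-oriented die": (100, 550, 320),
    "smaller graphics die":     (80, 330, 180),
}

for name, (perf, area, power) in chips.items():
    print(f"{name:26s} perf/mm^2 = {perf_per_mm2(perf, area):.3f}   "
          f"W/mm^2 = {watts_per_mm2(power, area):.3f}")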


All in all I find it ironic that the people shooting down DM, claiming he doesn't know what he's talking about because he isn't an "electrical engineer" (GPU architect), clearly know less than he does.
 
The fact that it was cancelled suggests to me that it wasn't going to be an amazing process.

It's important to point out that the process was cancelled at TSMC (one particular semiconductor manufacturer) because of internal technical issues. It has nothing to do with the quality of the process itself. Intel have been using a 32nm process at a variety of plants since 2009, and the latest Intel CPUs all use this process.

TSMC has had a lot of manufacturing issues recently, including quality issues at 40nm. Both AMD and nvidia are now making moves to expand their range of manufacturing facilities away from TSMC.


From my perspective I see that in order to be competetive again (in terms of making money on GPUs) nVidia will have to start releasing smaller GPUs like Cypress that are still competetive in performance.

...


Regarding the size issue: I'm sure that nvidia would like to reduce their die-size if possible, but I don't think it's the real crux to "making money from GPUs". Even at GF100 sizes (24mm*24mm) the cost per chip is around $50. This is a small fraction of the sale cost, and reducing the die size by half would only save a small proportion of even the overall manufacturing cost.
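As a rough sanity check on a figure like that, the usual back-of-the-envelope sum is dies per wafer, times yield, divided into the wafer price. The wafer price and yields below are assumed placeholders rather than known numbers; only the die area comes from the figure quoted above:

Code:
import math

# Back-of-the-envelope die cost: the standard dies-per-wafer approximation
# (gross dies minus an edge-loss term), then the wafer price divided by the
# number of good dies. Wafer price and yields are assumed placeholders.

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    r = wafer_diameter_mm / 2.0
    return (math.pi * r**2 / die_area_mm2
            - math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2))

def cost_per_good_die(wafer_price_usd, wafer_diameter_mm, die_area_mm2, yield_fraction):
    gross = dies_per_wafer(wafer_diameter_mm, die_area_mm2)
    return wafer_price_usd / (gross * yield_fraction)

die_area = 24 * 24         # mm^2, the GF100 figure quoted above
wafer_price = 5000.0       # USD per 300mm wafer - assumed, not a real quote
for y in (0.8, 0.5, 0.3):  # hypothetical yields
    print(f"yield {y:.0%}: ~${cost_per_good_die(wafer_price, 300, die_area, y):.0f} per good die")

Running it at a few different yields makes the point that the wafer price matters far less than how many of the dies on it actually work - which brings me to the more important issue.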

The more important issue, as far as cost is concerned, is yield. A more mature and well-established design (such as cypress) is far less likely to suffer from yield issues than a brand new architecture (such as Fermi), as sensitivities and likely issues will have been somewhat weeded out from experience with previous iterations. AMD may well suffer similar yield issues with their brand-new northern islands architecture (pure conjecture I admit...), but hopefully they will not suffer from the same poor manufacturing issues that nvidia had to contend with during the manufacture of Fermi.

The final point I will make is that improving scalability (usually the primary reason behind a radical redesign) will always require more command-and-control logic, in order to handle and redistribute the expected increase in the data flow. This is fundamentally what allows further design upscaling, but also demands more transistors. I'm not suggesting this is solely responsible for Fermi's large die-size, but certainly it plays a large part. If you take a look at a photo of the Fermi die you can see how much space is taken up with cache and control logic. AMD will not be immune from this requirement either, when they move to their new design (NI).
 
Ejizz;

As I have pointed out, performance per mm^2 is only related to the cost of the die, and this is a small part of the retail sales price.

Performance per watt is strongly related to the specifics of the architecture, and how it operates on a particular process. It is a tricky beast.


As for the rest - I have made the point many times now that Fermi is the first generation of a new design paradigm, whereas Cypress is towards the end of an older design process (read my previous posts for more detail). In this sense, they are not competing on the same field when it comes to price per mm^2 of die. You cannot add the extra command and control logic required to scale an architecture for several generations over the next five years without adding a significant amount of extra transistors.

I'm not going to respond to your final comment, which is pure baiting. I will let the various posts speak for themselves.
 
It's important to point out that the process was cancelled at TSMC (one particular semiconductor manufacturer) because of internal technical issues. It has nothing to do with the quality of the process itself. Intel have been using a 32nm process at a variety of plants since 2009, and the latest Intel CPUs all use this process.

TSMC has had a lot of manufacturing issues recently, including quality issues at 40nm. Both AMD and nvidia are now making moves to expand their range of manufacturing facilities away from TSMC.
Oh come on. Intel's 32nm process has nothing to do with TSMC's process bar the size. And nVidia isn't going anywhere for GPU production; they will be relying on TSMC for the foreseeable future, whereas ATi/AMD will soon have a choice of fabs.

Regarding the size issue: I'm sure that nvidia would like to reduce their die-size if possible, but I don't think it's the real crux to "making money from GPUs". Even at GF100 sizes (24mm*24mm) the cost per chip is around $50. This is a small fraction of the sale cost, and reducing the die size by half would only save a small proportion of even the overall manufacturing cost.

The more important issue, as far as cost is concerned, is yield. A more mature and well-established design (such as cypress) is far less likely to suffer from yield issues than a brand new architecture (such as Fermi), as sensitivities and likely issues will have been somewhat weeded out from experience with previous iterations. AMD may well suffer similar yield issues with their brand-new northern islands architecture (pure conjecture I admit...), but hopefully they will not suffer from the same poor manufacturing issues that nvidia had to contend with during the manufacture of Fermi
You're completely ignoring the fact that yield is also inversely proportional to die-size. The bigger the chip the lower the yield, by an ever increasing amount. It is also widely believed that some of the yield issues were due to nVidia not designing around the limitations of the process, which doesn't exactly inspire confidence in them being able to migrate successfully to the next node.
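To put a rough number on "by an ever increasing amount": under a simple Poisson defect model, yield falls off exponentially with die area, so a die twice the size has much worse than half the yield. The defect density below is purely illustrative, not a real TSMC 40nm figure:

Code:
import math

# Simple Poisson defect-limited yield model: Y = exp(-D * A), where D is the
# defect density per mm^2 and A is the die area in mm^2. The defect density
# used here is purely illustrative.

def poisson_yield(defect_density_per_mm2, die_area_mm2):
    return math.exp(-defect_density_per_mm2 * die_area_mm2)

D = 0.003  # hypothetical: 0.3 defects per cm^2
for area in (170, 334, 576):  # a mid-range die, a Cypress-sized die, the GF100 figure quoted earlier
    print(f"{area} mm^2 -> defect-limited yield ~{poisson_yield(D, area):.0%}")

Under this model doubling the area squares the surviving fraction, which is exactly why big dies get disproportionately expensive.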

Your $50 figure for GF100 cost is also highly dubious, since it takes no account of the lower yields or the fact that bigger, more power hungry chips require more expensive power circuitry on the PCBs and more expensive cooling solutions. GF100 is considerably more expensive per chip than Cypress and this cuts directly into nVidia's profits and limits their competitiveness regarding pricing.
 
Oh come on. Intel's 32nm process has nothing to do with TSMC's process bar the size.

... This is precisely the point I was making :confused: There is nothing wrong with 32nm as a process, only the implementation at TSMC...

You're completely ignoring the fact that yield is also inversely proportional to die-size. The bigger the chip the lower the yield, by an ever increasing amount.

You need to separate out yield issues which are due to straightforward manufacturing faults, and those that are due to over-aggressive design (a design which is too sensitive to the tiny natural variation in the manufacturing process). The former is a statistical process which is entirely down to the quality of the plant; the latter can be reduced by improved design. Only the basic manufacturing faults are directly related to die-size (inversely proportional to area).

On a well established and high-quality process, the manufacturing faults are minimal, but design sensitivity issues will still remain, and will often require several iterations ("respins") in order to remove. These cause massive delays in bringing a product to market, and lead to consistently poor yields (again I can make parallels with the 2900). With a well established base design, based on past experience, the number of "respins" can be dramatically reduced, and a good yield can be obtained relatively quickly.

Your $50 figure for GF100 cost is also highly dubious, since it takes no account of the lower yields or the fact that bigger, more power hungry chips require more expensive power circuitry on the PCBs and more expensive cooling solutions.
That's another can of worms entirely. It would be interesting to see a breakdown of manufacturing costs for different cards so we can stop talking in vague terms. I don't have time to search for this right now.

...And I do recall saying that yield issues were far more important than die-size (though the two are admittedly interrelated).


Anyway, we will need to continue this debate later - I have to get back to work now.
 
The engineering philosophies behind both products are completely different.
The 5000 series is stuffed full of ASICs.
The GF100 series is designed around general compute.

We can prove two things;
1) AMD has a better understanding of where their market is, and what their customers want.
The stalling failure of GF100 in the face of AMD's products is plain for all to see. The GTX460 is the only real success they've had this generation.
AMD on the other hand have been washing themselves in hot tubs full of money and chicks.

2) Fermi is not a bad architecture.
The technology used to produce current designs is bad at working with GTX480 sized chips. Besides the problems of producing it, there is a simpler mechanical problem; cooling it. Big chip, lots of heat, needs cooling.
On the other hand, at GTX460 sizes it produces a great chip at a reasonable price. Oh, and whisper quiet too.

Where does that leave the two teams?
AMD continue focusing their markets and creating more product differentiation with a strong understanding of what their customers want. Maybe a gentle nod towards OpenCL, but AMD's core market is graphics.

Nvidia race towards the next process node at full pelt, hoping along the way they can produce more efficient designs of their current solutions to recoup some money from this generation.
It's tough trying to establish a market, especially when someone like AMD or Intel can jump into that market after you've done all the hard work...
 
Your $50 figure for GF100 cost is also highly dubious, since it takes no account of the lower yields or the fact that bigger, more power hungry chips require more expensive power circuitry on the PCBs and more expensive cooling solutions. GF100 is considerably more expensive per chip than Cypress and this cuts directly into nVidia's profits and limits their competitiveness regarding pricing.

+1
Not taking into account these issues when arguing chip costs is simply misleading and is a duff argument.
 
Not taking into account these issues when arguing chip costs is simply misleading and is a duff argument.

Nice turn of phrase :D

But seriously, we need to be able to quantify this supposed increase in cost before we can assign any relevance to it. If the difference is $5, then it's relatively insignificant. If it's $25 then it makes a big difference - maybe more than the ~$20 extra in die manufacturing costs. You could also look at aspects such as PCB board length, and the cost of the heatsink / cooler.

The point is, there are many aspects to manufacturing complex devices, and without a cost breakdown we can't really comment on how these impact production costs and/or profit margins.
 
The engineering philosophies behind both products are completely different.
The 5000 series is stuffed full of ASICs.
The GF100 series is designed around general compute.

We can prove two things;
1) AMD has a better understanding of where their market is, and what their customers want.
The stalling failure of GF100 in the face of AMD's products is plain for all to see. The GTX460 is the only real success they've had this generation.
AMD on the other hand have been washing themselves in hot tubs full of money and chicks.

2) Fermi is not a bad architecture.
The technology used to produce current designs is bad at working with GTX480 sized chips. Besides the problems of producing it, there is a simpler mechanical problem; cooling it. Big chip, lots of heat, needs cooling.
On the other hand, at GTX460 sizes it produces a great chip at a reasonable price. Oh, and whisper quiet too.

I agree with most of what you're saying but I want to pull you up on this.

The technology used to produce current designs is bad at working with GTX480 sized chips.

That's no excuse. If Nvidia had done their homework before jumping headlong into 40nm, they would have known about the problems TSMC's 40nm process had printing dies of the size Nvidia had designed.

If you recall, the first 40nm video card was the HD4770, and it was reported on news sites not long after launch that there were supply issues as a result of poor yields on the new 40nm process at TSMC. AMD were well aware of the issues at TSMC (needless to say Nvidia would have known about this as well) and responded accordingly with their 5000 series. Again, as it was known that early that 40nm was broken, why didn't Nvidia revise Fermi? It's not as if they didn't have the time, as the first HD4770 came out in April 2009, a full year before the GTX480 and GTX470.
 