There’s been a lot of discussion about Fermi here recently, and rightly so, since it’s a very interesting bit of technology.
We don’t know for sure what the price or the performance will be like, but I think there are a few interesting things we can conclude from the stuff we’ve seen so far.
Anyway, here are my views on Fermi and other related GPU stuff, so have at it and feel free to tear them up.
I know it’s long, but consider this a replacement for the 20 or so comments I never bothered to write in various threads.
BTW in case anyone cares about this stuff, I have a 4870x2 right now but I've owned plenty of cards from both manufacturers in the past.
1 - Scalability:
A lot has been said about whether the Fermi design will be able to scale down to produce mid-range and low-end GPUs. It’s true that architectural scaling might be more difficult than with GT200 or r800, but there’s nothing to suggest it’s going to be a huge problem. If you look at the schematic diagram for Fermi, it’s still highly modular: essentially 16 largely self-contained blocks (each with 32 ‘cores’), plus an L2 cache spanning the whole lot. There’s nothing to suggest that nvidia can’t produce a design with 4 or 8 of those blocks instead (i.e. 128 or 256 ‘cores’ total) and a suitably reduced amount of cache. We will have to wait and see, but I certainly wouldn’t write Fermi off on this basis.
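For what it’s worth, here’s the back-of-the-envelope version in Python. The 16 × 32 ‘core’ layout and 768 KB of L2 come from nvidia’s published Fermi material; the idea that the L2 would just shrink in proportion to the block count is purely my assumption.

```python
# Rough sketch of cut-down Fermi-style configurations.
# 16 SMs x 32 'cores' and 768 KB L2 are from nvidia's published material;
# scaling the L2 in proportion to the SM count is my own assumption.
FULL_SM_COUNT = 16
CORES_PER_SM = 32
FULL_L2_KB = 768

for sm_count in (16, 8, 4):
    scale = float(sm_count) / FULL_SM_COUNT
    cores = sm_count * CORES_PER_SM
    print("%2d SMs -> %3d cores, ~%d KB L2 (assumed proportional)"
          % (sm_count, cores, int(FULL_L2_KB * scale)))
```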
2 - Impact of manufacturing failures:
One of the biggest problems with the kind of architecture nvidia has designed is that it becomes more sensitive to manufacturing faults. This is largely because a fair amount of physical die space is given over to cache and other structures that span multiple processing blocks (or ‘streaming multiprocessors’, as nvidia calls them).
Consider a chip with, say, 10% of its die space given over to global structures and 90% contained in the individual processing blocks. If a failure occurs in one of the processing blocks, that block can usually just be disabled and the chip sold at a lower grade (GTX260 or whatever). If a failure occurs in one of the global structures (like the L2 cache), the entire chip will almost certainly be unusable.
If we instead use a design with double the proportion of die area given over to global structures (i.e. a 20%/80% split) then, assuming a single fault occurs at a random location, we double our chances of getting a dead chip from 10% to 20%. If multiple faults occur, the odds get grimmer still as the probabilities compound. Add to this that the GF100 chip is physically a lot bigger than the r800 (so we expect more faults per chip on average), and Fermi looks particularly vulnerable to the quality of the process at TSMC. Since TSMC is still reporting problems with its 40nm process, this could turn out to be Fermi’s Achilles heel, at least in the short term.
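To put rough numbers on the compounding effect, here’s a toy Poisson yield model in Python. The die area and defect density are figures I’ve made up purely for illustration, not real TSMC data; the point is just how the ‘global’ fraction and die size multiply up once you average more than one fault per chip.

```python
import math

def dead_chip_probability(die_area_mm2, defects_per_mm2, global_fraction):
    """Toy Poisson yield model.

    Defects land uniformly at random over the die. A defect inside a
    processing block is assumed to be fusable (the chip is sold as a
    cut-down part), but any defect in the global area (L2, scheduler,
    crossbar...) is assumed to kill the chip outright.
    """
    expected_global_hits = die_area_mm2 * defects_per_mm2 * global_fraction
    # The chip survives only if zero defects land in the global area.
    return 1.0 - math.exp(-expected_global_hits)

# Illustrative numbers only: a big ~500 mm^2 die on a process averaging
# two random defects per chip.
for global_fraction in (0.10, 0.20):
    p = dead_chip_probability(500.0, 2.0 / 500.0, global_fraction)
    print("global fraction %.0f%% -> ~%.0f%% of dies unsalvageable"
          % (100 * global_fraction, 100 * p))
```

With a single random fault the kill rate is just the global fraction itself (10% vs 20%); once you average a couple of faults per chip, as in the numbers above, it climbs to roughly 18% vs 33%, which is the multiple-fault compounding I mean.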
3 - GPGPU programming:
Nvidia are clearly pushing the GPGPU side of things hard with Fermi. The biggest markets for this are scientific and financial modelling and their various sub-fields. These are potentially huge markets, and the calculations they care about are generally well suited to GPU acceleration (since they can largely be broken into small, independent pieces of work). The problem is that writing code to run efficiently on a GPU is a different animal even compared to writing for a CPU cluster. While certain codes can be adapted easily, most of the time you will need to rewrite a large part of the software from the ground up with stream processing in mind, at least if you want to use the GPU efficiently.
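To show what I mean, here’s a toy example in plain Python/NumPy (no GPU involved, and both functions are illustrations of mine). Note that the two versions are not the same algorithm; exposing independent work usually means changing the maths, not just translating the syntax, which is exactly why the rewrite is so much effort.

```python
import numpy as np

def smooth_serial(x, alpha=0.1):
    """CPU-style code: one loop where each step uses the previous result.

    The loop-carried dependency means the iterations cannot simply be
    handed out to thousands of GPU threads.
    """
    y = x.copy()
    for i in range(1, len(y)):
        y[i] = x[i] + alpha * y[i - 1]
    return y

def smooth_streaming(x, alpha=0.1):
    """GPU-friendly reformulation: every output depends only on the inputs.

    Each element is an independent work item, which is what a stream
    processor needs, but it is a different (truncated) recurrence, so part
    of the algorithm had to be redesigned rather than just ported.
    """
    y = x.copy()
    y[1:] = x[1:] + alpha * x[:-1]
    return y

x = np.random.rand(1 << 16).astype(np.float32)
a = smooth_serial(x)
b = smooth_streaming(x)
print("max difference between the two formulations: %.3g"
      % float(np.max(np.abs(a - b))))
```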
Re-writing software to take advantage of GPUs is time consuming, and expertise in GPU programming is still quite rare, so there hasn’t really been the kick needed to start the conversion en masse. There is a lot of interest, but very few people have actually made the switch so far (I work in scientific modelling, by the way). Fermi could well be the kick that gets things rolling, since the ability to execute native C++ code should make it easier to port existing algorithms, and should require a little less specialist knowledge to get started. On top of that, the improved double-precision performance and ECC memory fill in essentials that were missing from the previous generation.
I guess that nvidia are gambling on explosive growth in this sector over the next 2 or 3 years, and from my experience I’m pretty sure it’s going to happen. But still, it’s going to be a long time before the market for GPGPU applications matches that of gaming (way outside the lifetime of Fermi), so nvidia will need to stay competitive in the graphics market if they want to survive. I also wonder whether we will start to see two separate GPU designs in the next ‘real’ generation of nvidia cards...
4 - Efficiency:
In principle, Fermi is designed to handle a wide variety of workloads more steadily and efficiently, whereas the r800 design is focused on raw number crunching (on paper, r800 has nearly double the peak floating-point throughput of Fermi). Obviously the reason for this design choice is the GPGPU market, but I’m hoping it could also lead to more consistent framerates in games. I guess this isn’t something we’ll know until release, but I can hope.
Of course, another aspect of efficiency is the efficient use of transistors. I think it’s fair to assume that in terms of “performance per square mm of die space” r800 is going to be the hands-down winner (as r700 was against GT200), so from that point of view ATI’s architecture is currently the more efficient. But it’s still worth considering that a rigid design like the r7/800 is not going to scale as well as a more dynamic and adaptable design like Fermi as computational demands increase. So when the next round of GPU designs comes around, ATI will face an increasingly large bottleneck in keeping its ever-growing number of stream processors fed with data, whereas nvidia will have a new and flexible architecture to build upon.
5 - Physics:
One area where a Fermi-type architecture should really shine is hardware physics. And no, I’m not talking about Nvidia’s heavily pimped PhysX (which I wish would die a quick and quiet death... closed standards won’t get us anywhere). The computations involved in collision physics are generally more complex than those involved in shading pixels and the like. More importantly, they require much greater connectivity between data: the result of one part of the calculation may depend heavily on another part, which demands more sophisticated communication between threads if the calculation is to be done efficiently. You can see this by comparing how current high-end GPUs fare against a CPU at hardware physics versus rendering: a good current-generation GPU might be around 5 times faster than a quad-core CPU at hardware physics, but at “simpler” work like rendering it’s going to be well over 100 times faster.
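Here’s a toy contrast, again in plain Python/NumPy rather than anything GPU-specific, between the two kinds of workload. The constraint relaxation is a stand-in of my own for the sort of coupled calculation collision physics involves.

```python
import numpy as np

def shade_pixels(albedo, light):
    """Rendering-style workload: a pure element-wise map.

    Every pixel is computed independently, so it parallelises across
    thousands of threads with no communication at all.
    """
    return np.clip(albedo * light, 0.0, 1.0)

def relax_chain(positions, rest_length=1.0, iterations=20):
    """Physics-style workload: a 1D chain of particles joined by constraints.

    Each relaxation pass works on positions updated by the previous pass,
    so threads must exchange results between iterations; this is where fast
    thread communication and a shared cache start to pay off.
    """
    p = positions.astype(np.float64)
    for _ in range(iterations):
        delta = p[1:] - p[:-1]
        stretch = np.abs(delta) - rest_length
        correction = 0.5 * stretch * np.sign(delta)
        p[:-1] += correction  # nudge each pair towards its rest length
        p[1:] -= correction   # (both ends move, using this pass's corrections)
    return p

pixels = shade_pixels(np.random.rand(1024, 1024), 0.8)
chain = relax_chain(np.cumsum(np.random.rand(1000) * 2.0))
```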
Anyway, I guess my point is that the advanced scheduling and shared cache in Fermi should be ideal for performing hardware physics calculations. So, if developers actually start supporting a good open source physics API (like the one based on OpenCL that AMD was touting a few months back), and putting effort into designing physics features, then we could be in for some fun.
Overall:
I think Fermi is a bold move by nvidia to establish themselves as the market leader in a new and expanding field while maintaining their presence in a much bigger one. I don’t think there’s much doubt that Fermi will become the de facto choice for scientific stream computing and other GPGPU applications, but whether it can compete with ATI in the graphics market will depend on a few things: the performance of the thing, its ability to scale down to the mid- and low-end market, and, perhaps most importantly, the yields at TSMC. I’m quite confident it will perform well (probably better than most people are expecting), but the other two are going to be hard going for nvidia. I can only see them losing further market share to ATI this year and next, but Fermi could still turn out to be a good investment in the long term.
Thanks for reading
