I see nothing but words of wisdom and unbiased knowledge coming direct from DM
yet again
Sorry, but before you say something like that, in fact while you say something like that, please point out what I said that was biased or incorrect. Go ahead.
It would require much more than a 'tweak' to make DP as fast as SP; I would guess that it is impossible.
A large part of why Fermi is poor when it comes to performance per watt IS the process. The 40nm process has a lot of leakage, which ATi anticipated, and because of that they did a much better job executing their design than Nvidia did. That doesn't mean that Nvidia didn't mess up, they did, but it is possible that they would see a huge jump in performance per watt JUST by moving to 28nm, way above what ATi will get, as ATi protected against it by using a slightly larger die than they would have liked.
Why, oh why can't people take an EXAMPLE for what an EXAMPLE is? I said FOR INSTANCE. In reality, for the 280gtx (I really do forget) I think its DP throughput was around 1/8th of its SP throughput; they took this up to 1/4 throughput or so, with a doubling of shaders on top, giving a heck of an increase.
Nvidia would actually only need a "fairly" small tweak to improve the DP throughput, as each shader is essentially individual. AMD's current DP throughput is horrible due to the 4+1 architecture; it basically has 1/5th of the SP throughput. Nvidia's pretty simple and separate shaders could quite easily be tweaked to increase the DP throughput relative to SP dramatically without a huge amount of extra work. To do so within the same die size limits would probably require dropping the overall shader count, with more DP shaders and fewer SP shaders overall. It would certainly require specific and different versions (not just different BIOSes) for the Tesla and consumer GPUs, because if you, say, cut half the shaders to put in more DP shaders, well, it would have horrible gaming performance.
Nvidia have an architecture that sucks for core size but is easy to increase DP power on. The question, though, is the point where they can afford to push through wafers of GPGPU-only versions. Only LAST YEAR, Teslas were only available in servers from a SINGLE OEM; that's a heck of a lot of R&D cost and production cost for very small output. The newer cards are far more widely available in terms of companies that supply them. I haven't seen figures for the actual quantity of cards available compared to the older Tesla cards; I wouldn't be surprised if it was lower.
As for the power, you're wrong, I'm afraid to say. The 480gtx is 60% bigger and uses 60% more power than the 5870; it's really no less power efficient per transistor or per mm2 than AMD (well, marginally worse, but not hugely). The issue is the size, and the size is because of the architecture. They had this issue at 65nm and 55nm, with delays on every part for the past two years, on 3 separate processes that AMD has had no power issues on. It's NOT the process. The process has bad yields and more leakage than you'd like, and it has affected AMD exactly as much as Nvidia. If Nvidia made a core 40% smaller, it would have 40% lower power usage, as evidenced by the 460gtx: roughly 35% smaller, roughly 35% less power usage. IF you really want to get into the specifics, Nvidia added pretty much nothing to the size to accommodate the poor-yielding process (at over 500mm2, it wouldn't have made a difference), while AMD added some 10-15% die size JUST to accommodate the yields/process, which DOESN'T add any performance. That's 15%, and they are still that much smaller. Also, leakage isn't fixed by any of the added 10-15% die size; it INCREASES leakage. Leakage is bad on the process, and it affects EVERY transistor to basically the same degree. Architecture is the only reason the core is over 500mm2. Doubling up transistor count with a process shrink will maintain a similarly large core, and that's the fundamental problem, not the process.
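To put the power-per-area point in numbers, here is a rough sketch using only the ratios quoted above (60% bigger / 60% more power, roughly 35% smaller / 35% less power). The function name is made up for illustration, and it assumes power scales linearly with die area, which is the claim being argued and not a measured fact.

```python
def relative_power_per_area(area_ratio, power_ratio):
    """Power per mm2 of chip A relative to chip B.

    area_ratio and power_ratio are A/B ratios. A result of 1.0 means
    both chips burn the same power per unit of die area.
    Hypothetical helper; assumes the rough ratios quoted in the post.
    """
    return power_ratio / area_ratio


# 480gtx vs 5870: 60% bigger, 60% more power (figures from the post)
gtx480_vs_5870 = relative_power_per_area(1.60, 1.60)

# 460gtx vs 480gtx: roughly 35% smaller, roughly 35% less power
gtx460_vs_480 = relative_power_per_area(0.65, 0.65)

print(f"480gtx vs 5870, power per mm2: {gtx480_vs_5870:.2f}x")
print(f"460gtx vs 480gtx, power per mm2: {gtx460_vs_480:.2f}x")
```

Both ratios come out at 1.00x with these rounded figures, which is the whole argument: same power per mm2, so the total power follows the die size.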
Overclock the crap out of a 460gtx and it will perform the same as a 470gtx, and use more power than it.
EDIT: the 280gtx had DP throughput = 1/8 of SP, and the 480gtx does have 1/2 DP throughput, so it has 4x the throughput as standard. It has double the SP throughput (or was aimed to have), so that makes it 8x the throughput in total. Considering the 285gtx was 204 W and the 480gtx with 8 times the performance is at 250 W, well, it's got a little under 7x the performance per W for DP performance.
Hence 4x the performance is a fairly small increase all told from designs 2 years apart.
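For anyone who wants to check the arithmetic in the EDIT, a quick sketch. The inputs are the figures quoted above (1/8 and 1/2 DP:SP ratios, a 2x SP increase, 204 W and 250 W); the function name is made up for illustration, and the 2x SP figure is the design target mentioned above, not a measured result.

```python
def dp_gain_and_perf_per_watt(dp_ratio_old, dp_ratio_new, sp_gain,
                              watts_old, watts_new):
    """Return (DP throughput gain, DP performance-per-watt gain).

    dp_ratio_* is DP throughput as a fraction of SP throughput,
    sp_gain is how much SP throughput grew between the two cards.
    Hypothetical helper using the figures from the post.
    """
    dp_gain = (dp_ratio_new / dp_ratio_old) * sp_gain
    perf_per_watt_gain = dp_gain * watts_old / watts_new
    return dp_gain, perf_per_watt_gain


dp_gain, ppw_gain = dp_gain_and_perf_per_watt(
    dp_ratio_old=1 / 8,   # 280gtx: DP = 1/8 of SP
    dp_ratio_new=1 / 2,   # 480gtx: DP = 1/2 of SP
    sp_gain=2.0,          # SP throughput doubled (or was aimed to)
    watts_old=204,        # 285gtx
    watts_new=250,        # 480gtx
)

print(f"DP throughput gain: {dp_gain:.1f}x")
print(f"DP performance per watt gain: {ppw_gain:.2f}x")
```

That gives 8x the DP throughput and about 6.5x the DP performance per watt, i.e. the "little under 7x" above.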