Come off it
Fermi takes a completely different approach to the flow of geometry through the pipeline (the horribly named "polymorph engine"), and to scheduling. This is what caused so many problems during the design process.
It's absolutely true that nvidia faced serious problems in getting the architecture to work the way they intended, and it resulted in poor yields and per-watt performance (rectified only with GF110), but that doesn't take away from the fact that it was a fairly significant adjustment to the GT200 architecture.
Complete and utter tosh, sorry, but just tosh. It really, and I mean really, doesn't matter how you tart up a diagram: the TMUs do TMU work, the ROPs do ROP work, and the geometry units in the polymorph engine do geometry work. Moving them around doesn't really change what they do OR how they do it. I mean, the 6970 has put its front end into two separate "engines", and that's just for simplicity's sake: it's easier to scale down if you can remove one whole engine rather than cut out 1/3 of an engine and rework a lot of memory connections and this and that and the other.
At EVERY stage, every generation since god knows when, this has happened: reach a certain size, regroup things to make it easier to scale down. Get shaders over a certain number, stick them in two groups; over another number, stick them in 4 groups. This is natural progression stuff, not game changing stuff.
Also NONE of that architecture stuff has anything to do with yields; saying so couldn't be more ridiculous.
If the polymorph engine was harder to make work than the ROPs/TMUs and everything else called some other name, then why would all but one of them work?
Yields are pretty simple: EVERY wafer EVER made has faults, and the bigger your core, the more cores will be taken out by those faults. This is a fundamental part of making chips. They tried to make a humongous core on a crap process, and that has smeg all to do with architecture.
It has everything to do with a leaky and unrefined process from a company that refused to invest properly until it got some competition. The second GloFo started making waves, TSMC started upping their investment to 2-3 times what it was before, talking billions and billions. Before GloFo got in the game, TSMC were cruising along spending as little as possible.
Take ANY chip, ANY design, and a year later said design will have higher yields and lower power usage; this is how foundries work.
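To put rough numbers on the die size point, here is a minimal sketch using the textbook Poisson yield model, where the fraction of fault-free dies is exp(-area x defect density). The die areas and defect densities below are made up purely for illustration, they are NOT real TSMC or Nvidia figures, and real foundries use more elaborate models than this.

```python
import math


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of dies expected to have zero defects (simple Poisson model)."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


# Hypothetical numbers, chosen only to illustrate the trend.
BIG_DIE_MM2 = 500.0    # assumed "humongous" core
SMALL_DIE_MM2 = 300.0  # assumed smaller competing core
IMMATURE_D0 = 0.5      # assumed defects per cm^2 on a new, leaky process
MATURE_D0 = 0.2        # assumed defects per cm^2 a year later

for label, d0 in (("immature process", IMMATURE_D0), ("mature process", MATURE_D0)):
    big = poisson_yield(BIG_DIE_MM2, d0)
    small = poisson_yield(SMALL_DIE_MM2, d0)
    print(f"{label}: big die {big:.1%}, small die {small:.1%}")
```

Whatever numbers you plug in, the shape is the same: on the same process the bigger die always yields worse, and the same die yields better as the defect density comes down. Neither effect cares what is drawn inside the boxes on the architecture diagram.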
You'll also find a 1024-shader Fermi 2 performing maybe 80% faster than Fermi 1, which is roughly where you'd expect a 1024-shader GTX 285 to perform.
Seriously, go to ANY top level diagram of ANY AMD/Nvidia architecture. Now, in Photoshop, remove the box around any "engine" you want, draw a box around 3 new things, and give said "engine" a new name. Marketing != fantastically new architecture.
Realistically, GPU architecture is really very simple when you break it down to its base components.
Shaders and what they can do per clock (on Nvidia: one shader, one instruction), the crossbar memory controller, ROPs, TMUs. Generation to generation, what they can do increases, but those fundamental things stay.
K8 changes from iteration to iteration, but the fundamental issue width per core stays the same, as does the memory controller, as does the HT, as does so much of the rest of it. There are tweaks and there are improvements, but it's the same basic principle. Core 2 Duo had many of the same abilities as the P4, but it went about it a completely different way, with a 4-issue core (balls, I can't remember, it's been so long since I thought about it).
Anyway, all those fundamental things, the fundamental clock range and the fundamental performance per clock per shader have remained very similar from the 8800 through Fermi GF100B. And let's be honest, 80% of the performance is in the shaders, and the rest you can screw up by feeding them badly (5870).
Compare the 2900XT vs the 4870: the memory controller is radically different, ringbus vs crossbar. A 1024-bit internal ringbus on the 2900XT vs, not actually sure, a 512-bit or 256-bit internal crossbar on the 4870. Those are radically different, a COMPLETE change in the arrangement and communication of EVERYTHING in the core.
Or the 5870 to the 6970: 5-way shader vs 4-way shader, completely different shaders with radically different performance per shader. Before, there were MANY things each type of shader could NOT do that the others could; now each shader has the same list of things it can do. That is far closer to Nvidia, with each shader equally capable.