This is an interesting response, and something that seemed odd to me too. As the poster below it indicates, AMD has only increased (useful?) transistor count by 64%, yet we would have expected around 100% by consideration of area alone. This seems to suggest that there is a lot of redundant silicon on the 7970. I read an article a while back that said that when a manufacturing process is poor, designers will double or triple up important elements that might not fab right, leading to redundant silicon. Not doing enough of this on an immature process is what led to Nvidia's problems with the GTX480/470 release. So it could well be that 28nm is still not working at all well. By the time Nvidia come to market it might be working rather better, which would be bad news for AMD.
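For what it's worth, the "around 100% by area alone" figure is easy to sanity-check. This is only a back-of-envelope sketch: it assumes density scales with the inverse square of the node name (an idealisation), assumes equal die area, and uses only the 40nm, 28nm and 64% figures quoted above. The function name is made up for illustration.

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Ideal transistor-density gain, assuming density scales as 1 / feature_size**2."""
    return (old_nm / new_nm) ** 2


ideal = ideal_density_gain(40, 28)   # about 2.04x, i.e. roughly +100%
observed = 1.64                      # +64% transistors, the figure quoted above
shortfall = 1 - observed / ideal     # fraction of the ideal gain not realised (equal area assumed)

print(f"ideal gain: {ideal:.2f}x, observed: {observed:.2f}x")
print(f"shortfall vs ideal at equal area: {shortfall:.0%}")
```

So the naive expectation really is about double, and the quoted 64% falls roughly a fifth short of it under those assumptions.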
Against the most favourable reviews, I suppose an increase of ~40% on 64% more transistors isn't all that bad, especially if they can improve it with better drivers.
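Put as performance per transistor, the same numbers look like this. It's a rough sketch using only the ~40% and 64% figures above, and it treats "performance" as a single number, which it obviously isn't.

```python
perf_gain = 1.40         # ~40% faster in the most favourable reviews (figure from the post)
transistor_gain = 1.64   # 64% more transistors (figure from the post)

# Relative performance per transistor versus the previous card (1.0 = unchanged).
perf_per_transistor = perf_gain / transistor_gain

print(f"performance per transistor: {perf_per_transistor:.2f}x the old card")
```

That works out to roughly 0.85x, so about 15% less performance per transistor on launch drivers.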
So perhaps it's a combination of a slight design fail and a big process fail.
Two things. First, if there was lots of redundancy, transistor count would most likely have gone up, and the 5870 already had what was likely a similar level of redundancy. You're not really talking about SPs or anything major, just some vias and other things doubled up. This cost 10-15% of die size on the 5870, and the proportion would likely stay about the same.
Other things worth noting: 40nm and 28nm are just PR, the real numbers aren't that. I have no idea what they are, but I've heard from multiple people that these are rounded up or down. Likewise HKMG, for AMD at least, costs around 10% die size, and it's quite possible the real feature size on "28nm" is bigger than that, significantly so. They have 28nm variants that have no HKMG afaik, so it's perfectly possible the 28nm figure is for the low-power non-HKMG process, and the HP process is really pushing 31-32nm.
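As a rough cross-check on that 31-32nm guess, you can ask what feature size would explain a 64% transistor increase if nothing else changed. This is a sketch under loud assumptions: equal die area by default, density scaling as the inverse square of feature size, and no allowance for HKMG or redundancy overhead. `implied_feature_nm` and `area_ratio` are hypothetical names for illustration.

```python
import math


def implied_feature_nm(old_nm: float, transistor_gain: float, area_ratio: float = 1.0) -> float:
    """Feature size that would explain the transistor gain.

    Assumes density scales as 1 / feature_size**2.
    area_ratio is new die area / old die area (1.0 = same size, an assumption).
    """
    density_gain = transistor_gain / area_ratio
    return old_nm / math.sqrt(density_gain)


implied = implied_feature_nm(40, 1.64)
print(f"implied feature size: {implied:.1f}nm")
```

Under those assumptions the answer lands a little over 31nm, which is at least in the same ballpark as the guess above. It proves nothing, since die area and overheads aren't accounted for.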
Secondly, transistor count "usually" doubles from the initial core on a process to the initial core on the next process: 4870/5870, 5870/7970.
It also seems Cayman took up rather a large chunk of the efficiency improvement between the 5870 and the 7970, as moving to VLIW4 improved shader efficiency dramatically.
Personally I've also long thought that AMD underspecs ROPs, and we always seem to see it, which seems to somewhat limit their "high end" performance. In those several older games that aren't CPU limited, Nvidia can run away to 200fps in something old while AMD tends to be limited.
Anyway, so far, as far as we know, the only failure is price. It should be cheaper, simple as that. I don't like Nvidia pricing, full stop, but they ARE making 530mm2 cores with frankly awful yields, still. It would take something along the lines of a 40-50% increase in wafer cost to make a 360mm2 28nm core cost more to produce than a 530mm2 one.
The ONLY comparison I would make to Bulldozer is this: have AMD designed a 28nm core for higher speeds to help bring latency down for compute work, and could missing the higher speed bins on a not really ready 28nm have hurt it? Maybe, who knows.
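The 40-50% figure falls out of simple die-area arithmetic. Here's a minimal sketch, assuming cost per good die is proportional to wafer cost times die area divided by yield, and ignoring wafer-edge losses. The function and its yield parameters are hypothetical; yields default to equal because I have no real numbers for them.

```python
def breakeven_wafer_cost_ratio(small_die_mm2: float, large_die_mm2: float,
                               small_yield: float = 1.0, large_yield: float = 1.0) -> float:
    """New/old wafer price ratio at which the small die costs as much as the large die.

    Assumes cost per good die is proportional to wafer_cost * die_area / yield
    (same wafer size, edge losses ignored).
    """
    return (large_die_mm2 / large_yield) / (small_die_mm2 / small_yield)


ratio = breakeven_wafer_cost_ratio(360, 530)
print(f"break-even wafer cost increase: {(ratio - 1) * 100:.0f}%")
```

With equal yields that gives about 47%. If the bigger die yields worse, as you'd expect, the break-even wafer price goes up further still.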
I think there is more to come from drivers. This is an entirely different architecture, meaning NO games are optimised for it, not a single one, and that shouldn't be ignored either. Driver wise, VLIW4 drivers have come on quite a long way: there are a lot of situations where a 6970 is significantly ahead of a 5870 now, something that wasn't necessarily the case at launch. But VLIW4 drivers weren't drastically different from VLIW5. GCN will be utterly different, though also simplified. It still takes time to get used to it, it will take time for AMD drivers to be optimised for games on a new architecture (where possible), and it will take time for games to come out with dedicated optimisations to work better with GCN.
That could be anything from 5% to god knows what, time will tell.
Still, even considering all that, performance in extreme situations is stupidly faster. Frequently (from the few reviews I've seen that include such results) it's 80-100% ahead in Eyefinity situations, which again suggests to me there isn't enough raw horsepower in the front end to push the frames out. In silly power-requiring situations with lower framerates, though, the front end is less of a limit, the shader power in the back end really comes out, and it's crazy good.
At 300 this card would rock.