A lot of that is because Fermi is a disaster from a performance per watt point of view.
Hmm, well, it is, but it's a disaster due to design, not the process. And I see where the performance figure is coming from: he's talking about DP performance, I would guess, not normal (SP) performance, and the difference is easy to explain.
If the architecture is set up so that, for instance, DP instructions can only run at half the speed of SP instructions, then tweaking the architecture so DP runs as fast as SP would double DP performance per watt. Switching to a new process generally brings another doubling of performance per watt, so I would guess this is where the 4x performance/watt figure comes from.
Unfortunately, that suggests a similar architecture, which means it's likely to suffer from every problem Fermi had.
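The 4x guess above is just two independent doublings multiplied together. A back-of-envelope sketch, assuming DP currently runs at half SP speed and that the new process gives roughly 2x performance/watt (both are the guesses from this post, not published figures; the function name is made up for illustration):

```python
def perf_per_watt_gain(arch_factor: float, process_factor: float) -> float:
    """Combined perf/watt gain, assuming the two factors are independent and multiply."""
    return arch_factor * process_factor

# Assumption: DP runs at half SP speed today; running DP at full SP speed doubles DP perf/watt.
dp_to_sp_ratio_before = 0.5
dp_to_sp_ratio_after = 1.0
arch_gain = dp_to_sp_ratio_after / dp_to_sp_ratio_before  # 2.0

# Assumption: a new process node roughly doubles perf/watt.
process_gain = 2.0

print(perf_per_watt_gain(arch_gain, process_gain))  # 4.0
```

Note this only holds for DP workloads; SP perf/watt would only get the process doubling.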
We'll see. I just don't see him making the necessary changes. If they go with GloFo, which seems to be the case, then there's potential for fewer issues with a huge core, if it's a better process with better yields.
However, there's a very good chance only Tegra will get done at GloFo (as they'll be pretty much the masters of ARM core production, being ARM partners), meaning there are probably a lot of shared performance/yield tweaks for ARM-based chips to be had by working with GloFo.
Didn't the GTX 480 supposedly have 8 times the DP instruction throughput of the GTX 285 anyway? Being on a smaller process as well, that probably meant a 16x increase in performance/watt for DP instructions. That doesn't mean Fermi was more power efficient or a better design than the GTX 285, though.
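The same kind of multiplication gives the 16x DP figure. A rough sketch, taking the 8x DP throughput number as quoted in the post and assuming a ~2x perf/watt gain from the smaller process (ignoring any change in board power; `combined_gain` is a made-up helper):

```python
import math

def combined_gain(*factors: float) -> float:
    """Multiply independent improvement factors (a back-of-envelope assumption)."""
    return math.prod(factors)

dp_throughput_gain = 8.0  # GTX 480 vs GTX 285 DP throughput, as quoted in the post
process_gain = 2.0        # assumed rough perf/watt gain from the smaller node

print(combined_gain(dp_throughput_gain, process_gain))  # 16.0
```

A big number like that only describes DP throughput per watt; it says nothing about how efficient the chip is on everything else.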
You can spin anything; you HAVE to spin everything if you work at Nvidia, though.
