http://www.xbitlabs.com/news/cpu/display/20101017170037_AMD_No_Core_Wars_Incoming.html
Kind of relevant: if there aren't going to be 'core wars', then maybe we will get 6-8 core CPUs and then see the frequency ramp up again.
He's referring to efficiency (not clock ramp up).
Although here we are talking about gains from using multiple cores, individual cores have also been getting steadily more efficient ever since the 8086.
For example, in simplified terms the 8086 can be thought of as having 4 pipeline stages: IFETCH, IDECODE, IEXECUTE, IADDRESS.
The reason you have pipelines is so you can have parallelization within the CPU core. Going back to that 4-stage 8086, the first instruction can be executing (IEXECUTE) while the second is being decoded (IDECODE) and the third is sitting in the IFETCH buffer. The pipeline approach keeps the separate parts of the CPU busier, the same way a car production line keeps a whole group of people working at once.
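To make the overlap concrete, here is a rough Python sketch of an idealised 4-stage pipeline using the stage names above. It assumes no stalls, branches, or memory waits (real chips have plenty), and the function names are just made up for illustration.

```python
# Idealised pipeline model: one instruction enters per cycle, no stalls.
STAGES = ["IFETCH", "IDECODE", "IEXECUTE", "IADDRESS"]

def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one dict per cycle mapping stage name -> instruction index (or None)."""
    total_cycles = len(stages) + num_instructions - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, name in enumerate(stages):
            instr = cycle - depth  # instruction that has reached this stage
            row[name] = instr if 0 <= instr < num_instructions else None
        schedule.append(row)
    return schedule

def unpipelined_cycles(num_instructions, stages=STAGES):
    """Cycles needed if each instruction must finish all stages before the next starts."""
    return num_instructions * len(stages)

if __name__ == "__main__":
    n = 5
    for cycle, row in enumerate(pipeline_schedule(n), start=1):
        cells = [
            f"{name}=I{row[name] + 1}" if row[name] is not None else f"{name}=--"
            for name in STAGES
        ]
        print(f"cycle {cycle}: " + "  ".join(cells))
    print(f"pipelined: {len(pipeline_schedule(n))} cycles, "
          f"unpipelined: {unpipelined_cycles(n)} cycles")
```

In the third cycle the first instruction is in IEXECUTE, the second in IDECODE and the third in IFETCH, which is exactly the overlap described above. In this idealised model n instructions take n + 3 cycles instead of 4n.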
Now jump ahead to the Core 2: this had a 14-stage pipeline and could issue up to four instructions per clock, so there was much more parallelization happening inside the CPU. This made it much more efficient than previous designs, and is why Core 2s were so fast relative to their clock speed.
You begin to see why it's so complex.
1) First parallelization inside the CPU core with pipelines.
2) Then with MMX (and later SSE) there are dedicated SIMD units taking load off the floating point unit - more parallelization (fine grain)
3) Then you have multi-cores, and the OS scheduling a process to the next redundant CPU core. (coarse grain)
Then there is MIMD over multiple cores (fine grain), which is what it looks like Bulldozer is trying to do.
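As a toy illustration of how the fine-grain and coarse-grain levels stack, here is a small Python sketch. Everything in it is an assumption for illustration: the lane width of 4, the helper names, and the use of threads as stand-ins for cores. Python threads won't actually give you parallel arithmetic, so treat this as a picture of the structure, not a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

LANES = 4  # assumed SIMD width, purely illustrative

def simd_add(a, b, lanes=LANES):
    """Fine grain: add two equal-length lists one 'vector' of `lanes` elements at a time."""
    out = []
    for i in range(0, len(a), lanes):
        # On real hardware each of these slices would be one SIMD instruction.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
    return out

def split(seq, parts):
    """Split seq into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(seq), parts)
    chunks, start = [], 0
    for p in range(parts):
        end = start + size + (1 if p < extra else 0)
        chunks.append(seq[start:end])
        start = end
    return chunks

def multicore_add(a, b, cores=2):
    """Coarse grain: hand each chunk to a separate worker, then stitch the results together."""
    with ThreadPoolExecutor(max_workers=cores) as pool:
        parts = list(pool.map(simd_add, split(a, cores), split(b, cores)))
    return [x for part in parts for x in part]

if __name__ == "__main__":
    a = list(range(10))
    b = [1] * 10
    print(multicore_add(a, b, cores=3))
```

Notice that somebody still had to decide the chunk size and the number of workers by hand here, which is the scheduling question in a nutshell.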
After all this, what is going to schedule all the above? Firmware on the CPU, OS, the compiler, or responsibility of the developer to write optimised threaded apps?
BTW, sorry to keep going on about this stuff. My final-year degree project was in this area 13 years ago, and it's still close to my heart.