Gashman said:
why does the K8 architecture there look so much more efficient than the other two? I mean, look at it, it's much less hassle, and accomplishes the same results with a much less complicated design (and please no crap about 'not knowing about processor architecture' please, it's just a bloody question, nothing more, nothing less)
Efficiency isn't measured by having fewer features, but by how well those features can be utilised (that's one of the problems with Netburst, and the reason Hyper-Threading works so well on the P4: lots of the execution units sit idle a lot of the time). The K7/K8 actually has more execution units than the P6, incidentally, they're just less spread out into specific functions.
The architecture also looks a bit simpler because the load/store unit (shown in yellow in the Intel articles) isn't shown on the diagram for the K8 (it's often omitted for clarity in such diagrams); unfortunately I can't find a diagram that includes it for the K8 (or the K7, which is the same execution-wise). I should have mentioned this in my previous post actually, but I forgot.
Going back to my first point, the real issue is making sure the execution units are kept busy processing code. This is where the front-end optimisation comes in: scheduling, branch prediction, OOE (out-of-order execution) logic and so on.
If you have a 'narrow' processor (i.e. one with only a few EUs that can do just a couple of instructions at once), scheduling is fairly easy: you execute instructions in order, as they come. The problem with this approach is that the only way to make it quicker is to increase the clockspeed.
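To make that concrete, here's a rough C sketch (purely illustrative, nothing to do with any real chip): a loop where every step depends on the one before it, so there's no parallelism for extra units to exploit and only a faster clock gets you through it sooner.

```c
/*
 * Minimal sketch, purely illustrative: a serial dependency chain.
 * Every iteration needs the previous value of x, so extra execution
 * units have nothing to pick up; on a narrow in-order core the only
 * way to finish sooner is a faster clock.
 */
#include <stdio.h>

int main(void)
{
    unsigned long x = 1;
    for (int i = 0; i < 1000000; i++)
        x = x * 3 + 1;   /* each step depends on the last */
    printf("%lu\n", x);
    return 0;
}
```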
(credit arstechnica again)
This is the original Pentium architecture, and it's incredibly simple really: one single floating-point EU with a single pipeline, and two integer EUs, each with its own pipeline (which weren't actually identical, strangely; one had extra hardware and could handle a lot more than the other).
The Pentium didn't execute instructions out of order (apart from in a few very limited circumstances); it actually had a lot of hardware devoted to making x86 instructions work on the architecture (a workaround that's still done today, only much, much more efficiently).
The problem with increasing clockspeed is that you have to increase either the bus speed, the multiplier, or both; it also generally requires more voltage and produces more heat.
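Just to spell the arithmetic out (the numbers here are made up for illustration), the core clock is simply the bus speed times the multiplier:

```c
/*
 * Minimal sketch, made-up numbers: the core clock is just the bus
 * speed multiplied by the multiplier, so raising either one raises
 * the clock (and, usually, the voltage and heat with it).
 */
#include <stdio.h>

int main(void)
{
    double bus_mhz    = 200.0;  /* hypothetical bus speed */
    double multiplier = 10.0;   /* hypothetical multiplier */
    printf("core clock = %.0f MHz\n", bus_mhz * multiplier);
    return 0;
}
```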
This idea has long been discarded (with the P6 core, and most things that followed on from it, including the K7 and K8 cores from AMD) in favour of being able to analyse instructions, split them all up, run them in whatever order allows the most efficient use of the hardware available, and then put the stream back together afterwards. This is out-of-order execution, and it was the key to the success of the P6 core and everything that came after it.
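Here's a rough C sketch of why that pays off (purely illustrative): two chains of work that don't depend on each other. An out-of-order core can interleave them, so if one chain is stuck waiting, instructions from the other still issue, and everything is put back in program order at the end so the software never notices.

```c
/*
 * Minimal sketch, purely illustrative: two independent chains.
 * An out-of-order core can keep both going at once; if one chain is
 * waiting on something, instructions from the other can still issue,
 * and results are retired in program order so nothing visible changes.
 */
#include <stdio.h>

int main(void)
{
    unsigned long a = 1, b = 1;
    for (int i = 0; i < 1000000; i++) {
        a = a * 3 + 1;   /* chain 1 */
        b = b * 5 + 2;   /* chain 2, no dependence on chain 1 */
    }
    printf("%lu %lu\n", a, b);
    return 0;
}
```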
The secret to improving processors these days (as Intel discovered with Netburst) is not going to be sillier and sillier clockspeeds, but getting more instructions processed on each clock cycle.
This solution isn't without problems of its own, however. It requires a lot of work on scheduling, prediction, dependencies and so on, and it can backfire in a major way if you get this wrong. A large, wide processor will only be as efficient as its front end, because if those EUs sit idle, the processor will be slow.
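A rough C sketch of the kind of thing that hurts a wide core (purely illustrative): a branch that depends on random data, which the predictor gets wrong about half the time. Every miss throws away the in-flight work, so the wider the machine, the more slots go to waste; sort the data first and the same loop runs far quicker on real hardware.

```c
/*
 * Minimal sketch, purely illustrative: a branch on random data.
 * The predictor guesses wrong roughly half the time here; each miss
 * throws away the work in flight, so a wider back end just means more
 * wasted slots. Sorted (predictable) data makes the same loop far
 * cheaper, because the branch becomes easy to predict.
 */
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

static int data[N];

int main(void)
{
    long sum = 0;

    for (int i = 0; i < N; i++)
        data[i] = rand() % 256;      /* unpredictable values */

    for (int i = 0; i < N; i++) {
        if (data[i] >= 128)          /* hard-to-predict branch */
            sum += data[i];
    }

    printf("%ld\n", sum);
    return 0;
}
```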
The big difference between the K8 approach and the Core approach is that AMD have simply gone with existing all-purpose units, while Intel have beefed up the all-purpose units and added some more specific units (the vector ones) on top of those. In theory this means the CPU can process more instructions, because the K8 does vector calculations in its standard units, and so can't be using those units for something else at the same time.
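For a feel of what 'vector calculations' means here, a rough SSE sketch in C (x86 only, purely illustrative): one instruction that adds four floats at once. On the K8 that work goes through the same general-purpose FP units as everything else, whereas Core has extra vector hardware to take it, which in theory leaves the other units free.

```c
/*
 * Minimal sketch (SSE intrinsics, x86 only, purely illustrative):
 * a single vector instruction adding four floats at once.
 */
#include <stdio.h>
#include <xmmintrin.h>

int main(void)
{
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
    __m128 c = _mm_add_ps(a, b);      /* four additions in one instruction */

    float out[4];
    _mm_storeu_ps(out, c);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```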
The maximum number of operations that can be completed in one clock cycle is determined by the number of EUs available and by how many instructions can actually be passed to them per cycle (some of the units share access, so they can't all be fed at once).
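In other words, the ceiling is set by whichever resource runs out first; a rough sketch with made-up numbers:

```c
/*
 * Minimal sketch, made-up numbers: the per-cycle ceiling is whichever
 * resource runs out first, the execution units themselves or the
 * number of instructions that can be passed to them each cycle.
 */
#include <stdio.h>

int main(void)
{
    int execution_units = 9;   /* hypothetical EU count */
    int issue_width     = 3;   /* hypothetical instructions fed per cycle */
    int peak = execution_units < issue_width ? execution_units : issue_width;
    printf("peak operations per cycle = %d\n", peak);
    return 0;
}
```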
Hope that explains it a bit...
More in-depth detail on what I'm talking about can be found here. It focuses on the evolution of the Intel Pentium line, but it covers everything and explains it well, and generally AMD have followed rather than led when it comes to innovation like this.