Execution Units
Pentium M has five dispatch ports located on its Reservation Station, but only two of them are used to dispatch micro-ops to execution units. The other three are used by memory-related units (Load, Store Address and Store Data). Core microarchitecture also has five dispatch ports, but three of them are used to send micro-ops to execution units. This means that CPUs based on Core microarchitecture can send three micro-ops per clock cycle to the execution units, compared with only two on Pentium M.
Core microarchitecture provides one extra FPU and one extra IEU (a.k.a. ALU) compared with Pentium M’s architecture. As a result, Core microarchitecture can process three integer instructions per clock cycle, compared with only two on Pentium M.
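The following minimal Python sketch makes the throughput difference concrete. It rests on simplifying assumptions: every micro-op is an independent integer operation already waiting in the Reservation Station, and the only limit on dispatch is the number of ports feeding execution units (two on Pentium M, three on Core). The names `DISPATCH_WIDTH` and `cycles_to_dispatch` are hypothetical. Real code rarely reaches this ideal rate because of dependencies and port conflicts.

```python
import math

# Hypothetical toy model: the only difference between the two CPUs is how many
# dispatch ports feed execution units. Assumes all micro-ops are independent
# integer (IEU) operations that are ready at cycle 0; memory ports are ignored.
DISPATCH_WIDTH = {"Pentium M": 2, "Core": 3}

def cycles_to_dispatch(num_uops: int, width: int) -> int:
    """Minimum number of clock cycles needed to dispatch num_uops ready micro-ops."""
    return math.ceil(num_uops / width)

if __name__ == "__main__":
    for cpu, width in DISPATCH_WIDTH.items():
        print(f"{cpu}: 12 integer micro-ops need {cycles_to_dispatch(12, width)} cycles")
```

Under these assumptions, twelve ready integer micro-ops take six cycles to dispatch on Pentium M and four on Core.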
Not all math instructions can be executed on all FPUs, however. As you can see in Figure 2, floating-point multiplications can only be executed on the third FPU, and floating-point additions only on the second FPU. FPmov instructions can be executed on the first FPU, or on the other two FPUs when no more complex instruction (FPadd or FPmul) is ready to be dispatched to them. MMX and SSE instructions are also handled by the FPUs.
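The FPU selection rules above can be sketched as a small per-cycle scheduler. This is a hypothetical illustration, not Intel’s actual logic. The function name, the string labels and the oldest-first choice among ready micro-ops are assumptions. Only the constraints themselves come from the text: FPadd goes to the second FPU, FPmul to the third, and FPmov to the first FPU or to an otherwise idle second or third FPU.

```python
# Hypothetical sketch of the per-cycle FPU selection rule.
# FPU numbering follows Figure 2: 1 = first FPU, 2 = FPadd FPU, 3 = FPmul FPU.
# The oldest-first policy and the string labels are illustrative assumptions.

def assign_fp_uops(ready: list[str]) -> dict[int, str]:
    """Pick at most one micro-op per FPU for this clock cycle.

    ready: micro-op kinds ("FPadd", "FPmul", "FPmov") in oldest-first order.
    Returns a mapping FPU number -> micro-op kind dispatched this cycle.
    """
    pending = list(ready)
    issued: dict[int, str] = {}

    # FPadd may only go to the second FPU, FPmul only to the third.
    for kind, fpu in (("FPadd", 2), ("FPmul", 3)):
        if kind in pending:
            pending.remove(kind)  # removes the oldest micro-op of that kind
            issued[fpu] = kind

    # FPmov goes to the first FPU, then to any FPU left idle by the step above.
    for fpu in (1, 2, 3):
        if fpu not in issued and "FPmov" in pending:
            pending.remove("FPmov")
            issued[fpu] = "FPmov"
    return issued

if __name__ == "__main__":
    print(assign_fp_uops(["FPmov", "FPadd", "FPmov", "FPmul"]))
```

Suppose one FPadd, one FPmul and two FPmov micro-ops are ready. Only one FPmov is dispatched (to the first FPU), and the other waits for the next cycle, because the more complex instructions occupy the second and third FPUs.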
Figure 2 shows a preliminary block diagram of the Core microarchitecture execution units.
Figure 2: Core microarchitecture execution units.
Another big difference between the Core architecture and the Pentium M and Pentium 4 architectures is that on Core the Load and Store units have their own embedded address generation units. Pentium 4 and Pentium M have a separate address generation unit, and on Pentium 4 the first ALU is used to store data in memory.
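For reference, here is the kind of calculation an address generation unit performs, which Core moves inside the Load and Store units. The sketch computes an x86 effective address of the form base + index * scale + displacement (for example, `[ebx + esi*4 + 16]`). It assumes 32-bit addressing, so the result wraps modulo 2^32, and it ignores segmentation. The function name is hypothetical.

```python
# Hypothetical illustration of the arithmetic an address generation unit
# performs for an x86 memory operand such as [ebx + esi*4 + 16].
# Assumes 32-bit addressing (result wraps modulo 2**32); segmentation is ignored.

VALID_SCALES = {1, 2, 4, 8}  # scale factors allowed by x86 addressing modes

def effective_address(base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Return base + index * scale + disp, truncated to 32 bits."""
    if scale not in VALID_SCALES:
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF

if __name__ == "__main__":
    # ebx = 0x1000, esi = 3 -> 0x1000 + 3*4 + 16
    print(hex(effective_address(0x1000, 3, 4, 16)))
```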
Here is a brief explanation of each execution unit found in this CPU:
* IEU: Instruction Execution Unit, where regular instructions are executed; also known as ALU (Arithmetic and Logic Unit). “Regular” instructions are also known as “integer” instructions.
* JEU: Jump Execution Unit processes branches and is also known as Branch Unit.
* FPU: Floating-Point Unit. It is responsible for executing floating-point math operations as well as MMX and SSE instructions. In this CPU the FPUs aren’t “complete”, as some instruction types (FPmov, FPadd and FPmul) can only be executed on certain FPUs:
o FPadd: Only this FPU can process floating-point addition instructions, such as ADDPS (which, by the way, is an SSE instruction).
o FPmul: Only this FPU can process floating-point multiplication instructions, such as MULPS (which, by the way, is an SSE instruction).
o FPmov: Instructions that load or copy an FPU register, such as MOVAPS (which transfers data to a 128-bit SSE XMM register). This kind of instruction can be executed on any FPU, but on the second and third FPUs only when no FPadd- or FPmul-type instruction is waiting in the Reservation Station to be dispatched.
* Load: Unit that processes instructions that read data from memory (RAM).
* Store Data: Unit that processes instructions that write data to memory (RAM).
Keep in mind that complex instructions may take several clock cycles to be processed. Take port 2, where the FPmul unit is located, as an example. While this unit is processing a very complex instruction that takes several clock cycles to execute, port 2 won’t stall: it will keep dispatching simple instructions to the IEU while the FPU is busy.
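A toy cycle-by-cycle model can illustrate how port 2 avoids stalling. Everything here is an assumption made for illustration. The port dispatches at most one micro-op per cycle and always picks the oldest micro-op whose unit is free. A complex operation keeps the FPmul unit busy for a hypothetical 5 cycles (`FPMUL_BUSY_CYCLES`). Real latencies and pipelining differ from instruction to instruction.

```python
# Hypothetical cycle-by-cycle model of one dispatch port shared by an IEU and
# the FPmul unit. The busy time and the "oldest micro-op whose unit is free"
# policy are illustrative assumptions, not documented Core behavior.

FPMUL_BUSY_CYCLES = 5  # assumed time a complex operation occupies the FPmul unit

def simulate_port(uops: list[str], max_cycles: int = 100) -> list[str]:
    """Return what the port dispatches each cycle: "INT", "FPmul" or "idle"."""
    pending = list(uops)
    fpmul_free_at = 0  # first cycle in which the FPmul unit can accept new work
    timeline: list[str] = []
    cycle = 0
    while pending and cycle < max_cycles:
        choice = None
        # Scan oldest-first for a micro-op whose execution unit is available.
        for i, kind in enumerate(pending):
            if kind == "INT" or (kind == "FPmul" and cycle >= fpmul_free_at):
                choice = pending.pop(i)
                break
        if choice == "FPmul":
            fpmul_free_at = cycle + FPMUL_BUSY_CYCLES
        timeline.append(choice or "idle")
        cycle += 1
    return timeline

if __name__ == "__main__":
    print(simulate_port(["FPmul", "FPmul", "INT", "INT", "INT"]))
```

Consider two FPmul operations followed by three integer instructions. The model dispatches the first FPmul, then keeps the port busy with the three integer instructions while the FPmul unit works. It idles only for the one cycle in which nothing else is ready: `["FPmul", "INT", "INT", "INT", "idle", "FPmul"]`.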