On Linux and OS X, all the cores are put to use, so up to 4 tasks get the lowest latency available. It makes things a lot smoother; for example, loading a webpage triggers X parallel connections for images and stuff, and all these small tasks get the lowest latency available when needed.
It just gives an impression of a faster machine overall /even/ if the cores are not all at 100%. It's the 'little things that add up' effect here.
I don't know anything about Windows, but given that it doesn't support more than 4 GB of RAM in 32-bit mode while the PAE CPU extension has existed for years, I'd say that they don't seem very concerned about getting their OS running on modern CPUs.