I'm not an expert in this stuff, but my take is that since CPUs tend to do the same kinds of work over and over, there are a lot of ways they can be optimised over time so that they do the same thing quicker. There's a lot that can influence it, and the changes in each generation help different types of workloads to a greater or lesser degree.
There are some instructions and features that, together with the architectural changes to support them, make a CPU much faster than one without them when the software is written to use them; AVX would be an example. There's also the size and speed of the caches and of memory access, since the compute units have to wait until they can read or write the data they need. The same goes for the pipelines that execute instructions: if they're not wide or sophisticated enough, then even when the task in another pipeline finishes, everything ends up waiting on the slowest one, so there's less advantage in splitting the work up in the first place. Speculative execution has been around for a long time but has recently become controversial (the Spectre and Meltdown vulnerabilities abuse it); the idea is that the CPU does some work in anticipation that it will be needed, since it would be slower to wait and find out, and keeping as much of the CPU busy as possible is the goal behind plenty of other, older features too.
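To make the AVX point a bit more concrete, here's a toy sketch of my own (not from any real codebase): the same array sum written one float at a time, and then eight floats at a time with AVX intrinsics. The hardware does roughly the same amount of useful work either way, but the vector version issues far fewer instructions to do it, and only helps if the software was actually built to use it.

```c
/* A minimal sketch: summing an array the scalar way vs. with AVX intrinsics.
   Compile with something like: gcc -O2 -mavx sum.c */
#include <immintrin.h>
#include <stdio.h>

#define N 1024 /* multiple of 8 so the vector loop needs no leftover handling */

/* One float per iteration. */
static float sum_scalar(const float *a)
{
    float s = 0.0f;
    for (int i = 0; i < N; i++)
        s += a[i];
    return s;
}

/* Eight floats per iteration using 256-bit AVX registers. */
static float sum_avx(const float *a)
{
    __m256 acc = _mm256_setzero_ps();
    for (int i = 0; i < N; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(a + i));

    /* Add the 8 lanes of the accumulator together at the end. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float s = 0.0f;
    for (int i = 0; i < 8; i++)
        s += lanes[i];
    return s;
}

int main(void)
{
    float a[N];
    for (int i = 0; i < N; i++)
        a[i] = 1.0f;
    printf("scalar: %f\n", sum_scalar(a));
    printf("avx:    %f\n", sum_avx(a));
    return 0;
}
```

And for the cache/memory side of it, another toy sketch: the same sum over a big 2D array, walked in two different orders. One order touches consecutive addresses, so every cache line it pulls in gets fully used; the other jumps about 16 KB between accesses, so the CPU spends most of its time waiting on memory. (To actually measure the difference you'd need timing code and data the compiler can't optimise away; this just shows the two access patterns.)

```c
#include <stdio.h>
#include <stdlib.h>

#define ROWS 4096
#define COLS 4096

int main(void)
{
    /* One big ROWS x COLS array of ints, laid out row by row in memory. */
    int *m = calloc((size_t)ROWS * COLS, sizeof *m);
    if (!m)
        return 1;

    long long sum = 0;

    /* Row-major walk: consecutive addresses, cache-friendly. */
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            sum += m[(size_t)r * COLS + c];

    /* Column-major walk: each access is COLS * sizeof(int) = 16 KB away
       from the previous one, so the caches help far less. */
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            sum += m[(size_t)r * COLS + c];

    printf("%lld\n", sum);
    free(m);
    return 0;
}
```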
There's a cost to excessive optimisation too, because it can make the CPU slower at more general tasks; but if you did decide to focus on only one purpose, the CPU could be made much more efficient, with a much higher IPC than a general-purpose design. It can also be difficult to predict the best kind of optimisation, since you don't know exactly what tasks software will demand or which instructions will be called for over and over. GPUs are a good example of that trade-off: they're heavily specialised for wide, parallel workloads, and their architectural differences and optimisations (or the lack of them) can show up quite clearly in benchmarks.