If you're desperate for 8 cores you could buy a Xeon. My Intel machine at work has 6 cores.
The idea is not getting utterly screwed on price.
Intel was making quad cores what, 6 years ago, and still haven't moved up at mainstream prices despite the fact they EASILY could. Again, a 6 core Sandy/Ivy without IGP would be marginally bigger, if at all. Idle power would barely change, since it's just two more cores turned off. Full-load power could potentially be higher, and almost certainly would be if they increased die size and went 8 core (still talking MUCH smaller than AMD chips, on a better, higher-yield process, so still much more profitable).
Intel simply haven't... that's it. They CAN, and really, really easily. Of everyone who will spend £250 now on a 4 core with HT, many would spend £300 on a 6-8 core without IGP; others wouldn't.
It's the biggest hole in Intel's line-up. Most people can't afford and won't spend £450 on a high-end hexcore, but many would spend £50 more for the chip they really want. If they had hexcores in the i5 price range and octocores in the i7 price range, both without IGP... we ALL win.
http://www.anandtech.com/show/6201/amd-details-its-3rd-gen-steamroller-architecture
As for AMD architecture/Steamroller info... there you go.
The single biggest weakness of Bulldozer, and it's humongous, is the front-end decoder basically having a lower throughput than a Phenom hexcore. For a single core on its own it's not actually that bad, but because the decoder is shared between the two cores in a module, even a single very basic thread (a background OS thread, AV, whatever) takes decode slots away from the first core. And for power reasons the chip will use one module and one decoder rather than power up 2 or more modules. That means if you run something completely single threaded, it's likely only one module is fully powered up, and any basic OS threads (there are loads) will land on the second core in that module and take performance away from the other core.
Look at the table of decode numbers in the article, then take Bulldozer and double the number of instruction decodes at every level (except single core). Even for a single core, where it's still 4 instructions per core, it means that with one module powered up both cores have their own 4 instruction decoder, so a single high-performance thread on one core is no longer affected by everything else your computer does, which gets shunted to the second core.
Then read the rest of the article. Intel is not at the peak for the Core architecture, but it's near its peak. There is only so much to be gained every generation from improving branch prediction and the like (even if you reduce branch mispredicts by 20% every generation, that 20% is of a smaller number of mispredicts each generation and has a smaller and smaller effect). AMD is at the very bottom end of its architecture, so each 20% reduction in cache misses will have a pretty huge effect.
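To put rough numbers on the diminishing returns point, here's a quick Python sketch. The starting rates (2% for a mature design, 10% for a young one) are made up purely for illustration, not measured figures for any Intel or AMD chip, and the function name is just something I picked.

```python
def absolute_gains(start_rate, reduction=0.20, generations=4):
    """Apply the same relative reduction each generation.

    Returns a list of (rate_before, rate_after, absolute_drop) tuples.
    """
    rows = []
    rate = start_rate
    for _ in range(generations):
        new_rate = rate * (1.0 - reduction)
        rows.append((rate, new_rate, rate - new_rate))
        rate = new_rate
    return rows


# Hypothetical starting points, for illustration only.
mature = absolute_gains(0.02)  # design already close to its peak
young = absolute_gains(0.10)   # design with lots of low-hanging fruit

for label, rows in (("mature", mature), ("young", young)):
    for gen, (before, after, drop) in enumerate(rows, start=1):
        print(f"{label} gen {gen}: {before:.4f} -> {after:.4f} (drop {drop:.4f})")
```

Same 20% each time, but the absolute drop shrinks every generation, and the design that starts from the worse number gets far more out of each step.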
The decode change alone is being quoted as probably boosting single core performance by 20-25%, and multithreaded performance will increase even more, going from a potential 16 instructions decoded per clock to 32.
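The 16 and 32 figures are just simple arithmetic on the module layout, which can be sketched like this. It assumes 4 modules of 2 cores each and a 4-wide decoder, as described above; the function and its names are mine for illustration, and it only counts peak decode slots, not real-world throughput.

```python
MODULES = 4           # 8 core chip = 4 modules
CORES_PER_MODULE = 2
DECODE_WIDTH = 4      # instructions per clock per decoder


def peak_decode_per_clock(modules, cores_per_module, width, shared_decoder):
    """Peak instructions decoded per clock across the whole chip."""
    decoders_per_module = 1 if shared_decoder else cores_per_module
    return modules * decoders_per_module * width


def per_core_share(cores_per_module, width, shared_decoder):
    """Decode slots each core gets when every core in the module is busy."""
    return width / cores_per_module if shared_decoder else float(width)


# Bulldozer-style: one decoder shared by both cores in a module.
bulldozer_total = peak_decode_per_clock(MODULES, CORES_PER_MODULE, DECODE_WIDTH, True)
# Steamroller-style: each core gets its own decoder.
steamroller_total = peak_decode_per_clock(MODULES, CORES_PER_MODULE, DECODE_WIDTH, False)

print(bulldozer_total, steamroller_total)
print(per_core_share(CORES_PER_MODULE, DECODE_WIDTH, True),
      per_core_share(CORES_PER_MODULE, DECODE_WIDTH, False))
```

The per-core share is the bit that matters for the single-thread case: with a shared decoder a busy second core can halve what the first core gets, with dedicated decoders it can't.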
The trouble with Bulldozer was that it's an 8 core, 8 real cores, and they needed to share resources to fit it all in. I said at the time that Bulldozer was about shrinking cores to fit 8 on a die (and it's still an efficient way to do so), but further generations on newer processes will get more space and will essentially widen the inside of each core.
Intel have boosted per-core performance but have decided, as yet, they can't afford to fit 8 real cores on a die. AMD decided to go ahead and put 8 cores on a die and not focus yet on per-core performance.
In two generations AMD will have 8 cores and each core will be very fast, and Intel will have 8 cores and each one will be very fast. There are two parts: fitting in 8 cores (upgrade A) and increasing per-core speed (upgrade B). AMD is doing A then B, Intel is doing B then A.
Bulldozer is a very smart architecture, but it needs to be "unlocked", with each new process lifting some of its limits by letting AMD fit in more per-core hardware: micro-op instruction queues, better cache, more L1 cache, more decoder resources, more integer resources, etc, etc.
Don't forget that the modules themselves are very efficient and small for a dual core module, and those modules are being used effectively in Trinity. (Bobcat and Kabini/Jaguar are a separate small-core design rather than Bulldozer modules, but they are great little chips that keep improving too.) The architecture is a good step forward, and it simply made more sense to put 4 slimmer modules on a single die than 2 fat ones for the high-end chip for Bulldozer/Piledriver. If AMD had the money Intel had, and was also on 22nm, Bulldozer would have launched on 28-22nm and started off life as Steamroller.