For me, the problem with that is it suggests an element of a pipeline can do the same operation on 2 threads simultaneously. (A ticket booth and its worker never handle more than one vehicle at a time, in real life or in the metaphor.)
For that ticket booth analogy to work you need at least two sets of ticket booths, A and B, in series, and then pose the case that while one car is at A another car can use B, thus making sure A and B are always utilised. Even then the analogy runs roughshod over the fact that operations are pipelined and executed 'out of order' anyway.
These sorts of imprecise metaphors lead people to overestimate the possible gains of Hyperthreading.
Actually, as I started reading I thought this would be bad, but he got it spot on. Yes, he didn't mention that the ticket booth guys are highly skilled and that there's a coin slot on either side so two bikes can go through at once, but the metaphor is fine.
The problem with HT is that if you're only pushing cars through, you'll never see any improvement whatsoever (not quite true: being able to hold more than one thread means that if one stalls the other can keep going through, and I said stalls, which cars can do), while if you're running applications that are all like motorbikes you can get essentially a 100% speed boost.
The thing with HT is it's highly dependent on what you're doing. Core 2 architecture is essentially a 4 issue core (there's surprisingly little detail on AnandTech about the high level architecture in the Sandy Bridge review, so I'm not sure what's changed inside the integer cores). If you've got a thread that can use all four, the core can't jam through another thread, simple as that. If one thread is using 3 of the execution units and the second thread also wants to use 3 execution units, no deal, etc, etc.
However, you're also limited to 2 threads, so if you have two threads both only needing one execution unit you're still wasting the other two every clock. Sometimes you'll get no benefit, sometimes a large amount, sometimes very little; some programs consistently get a small or large benefit, and some vary. The one thing HT can't EVER do, and never will be able to, is give two threads the full width of the core at the same time. It will never have more than 4 execution units filled with instructions in any one clock.
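To put rough numbers on that, here's a toy sketch of one clock on a 4 wide core shared by two threads. The function names are made up for illustration, and the rule that thread B simply takes whatever slots thread A leaves free is my own simplification, not how any real scheduler is documented to work (under the stricter "no deal" reading above, B would get nothing in the 3 + 3 case).

```python
CORE_WIDTH = 4  # issue slots per clock, per the "4 issue core" description

def slots_filled(demand_a, demand_b, width=CORE_WIDTH):
    """Return (slots used by A, slots used by B) for one clock.

    Assumption for illustration only: thread A picks first and
    thread B gets whatever slots are left over.
    """
    used_a = min(demand_a, width)
    used_b = min(demand_b, width - used_a)
    return used_a, used_b

def idle_slots(demand_a, demand_b, width=CORE_WIDTH):
    """Slots that go unused in this clock."""
    used_a, used_b = slots_filled(demand_a, demand_b, width)
    return width - used_a - used_b

# A few hypothetical per-clock demands for the two threads.
for a, b in [(4, 2), (3, 3), (1, 1), (2, 2)]:
    used_a, used_b = slots_filled(a, b)
    print(f"A wants {a}, B wants {b}: A gets {used_a}, "
          f"B gets {used_b}, idle {idle_slots(a, b)}")
```

Whatever the two threads ask for, the total never goes over 4, and two threads that each want a single unit still leave two slots idle.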
Where Bulldozer does well is that, for only 5% more die space, they jam another integer core into each module. That core will never suffer from HT's issues; it's always available.
There are downsides and upsides though. Bulldozer went to a 2 issue core from the 3 issue core in Phenom 2, so one P2 core had 3 execution units available while each Bulldozer core only has 2. A module with 2 cores can only handle 4 issues across its two cores, while a dual core Phenom can do 6.
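The arithmetic there, as a quick sketch using only the widths quoted above. The break-even ratio is just the two peak figures divided, not a number from AMD, and the helper name is made up.

```python
def issue_slots(cores, width_per_core):
    """Peak issue slots per clock across a group of cores."""
    return cores * width_per_core

bulldozer_module = issue_slots(cores=2, width_per_core=2)   # 2 cores x 2 wide
phenom_dual_core = issue_slots(cores=2, width_per_core=3)   # 2 cores x 3 wide

# How much more often Bulldozer's slots would need to be busy,
# relative to Phenom's, to match it at the same clock speed.
break_even_ratio = phenom_dual_core / bulldozer_module

print(bulldozer_module, phenom_dual_core, break_even_ratio)
```

So on paper the narrower design has to keep its units busy about 1.5 times as often to break even on peak issue rate alone.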
But that's not all bad news. Hugely better and more aggressive prediction and a better pipeline both mean those units are in use more often. AMD reckons the 3rd issue was rarely ever being used anyway, and with architectural improvements you can easily get more single thread performance from a Bulldozer core than a P2 core.
The biggest bonus of AMD's second-core-in-the-module strategy is size, nothing more or less. 5% extra die space for an extra core in each module is insane. With a P2, doubling the core count would increase die size by a heck of a lot more than that. Keep in mind the 5% is of total chip die space, and cache/chipset takes up room too; it's 12% extra transistors or so to turn a single core module into a dual core module. That's still much, much better than the 100% that is normal in these situations.
That basically means AMD will be fighting a quad core Sandy Bridge with an octo core Bulldozer that's WAY smaller than an octo core Sandy Bridge would be.
Anyway, because it's really a second core, rather than a second thread jammed through the same core, it's ALWAYS available and it will give consistent performance whatever threads/instructions are being run. It won't work sometimes and not other times. They are giving us an estimate that the extra core adds around 80% performance over a single core, rather than 100%, because some of the logic is shared, but not much.
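For a feel of what that trade looks like, here's a back-of-envelope sketch using the ~12% extra transistors and ~80% extra performance figures quoted in this thread. The comparison case, a fully duplicated core giving a perfect 2x for 2x the transistors, is an idealised assumption of mine, and the function name is made up.

```python
def perf_per_transistor(relative_perf, relative_transistors):
    """Performance per unit of transistor budget, relative to one lone core."""
    return relative_perf / relative_transistors

# Bulldozer-style module: ~80% more performance for ~12% more transistors.
module = perf_per_transistor(relative_perf=1.80, relative_transistors=1.12)

# Idealised full duplicate: assumes perfect scaling, 2x for 2x.
duplicate = perf_per_transistor(relative_perf=2.00, relative_transistors=2.00)

print(f"module: {module:.2f}, full duplicate: {duplicate:.2f}")
```

On those figures the shared module gets roughly 1.6 times the performance per transistor of simply doubling up the core, which is the whole appeal.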
80% extra performance all the time is better than the -10 to +90% you get from HT, which is rarely anywhere near either extreme.
HT's very useful, real cores are better, but HT on better cores is more useful than a larger number of worse cores: think quad core i7 vs hex core P2.
There's no real right or wrong approach here.
Intel's current design is a very wide issue core, which lends itself better to HT, as many threads can't fill the execution units on their own. Bulldozer has narrowed the execution units, which would be awful for HT as the core will rarely not be fully utilised, but the narrower core is also smaller, which means adding the second core adds very little in size.