AMD Says Bulldozer Is 50% Faster than Core i7

CmdrTobs · 3 Feb 2011 at 20:13

wannabedamned said:
Once in a while a certain chip comes out that trumps both brands, Barton 2500m, Opteron 146 onwards, Intel Q6600. ....

2 out of 3 of my last processors. I bagged an XP-M back in the day with a good stepping. They were binned to be stable at lower Vcore anyway, being a 'mobile version', and that stepping had extra stability. Could clock nearly 1000MHz, awesome. I've still got it today, though I have to give it insane Vcore just to run now. Guess some serious electromigration has occurred, amongst other things.

The Cyrix chips weren't bad if you stuck to business and office tasks. It was the crap floating point that gimped them against the Pentium in Quake and the like. They were definitely not for games, especially when software rendering was still common. If you want to talk about '****' CPUs, remember the WinChip? They were turds: roughly 486-level (or worse) gaming performance clocked at around 200MHz, and even ropey for general desktop use. Cost peanuts though, which is an important thing. No 'cheap' chip is really all that bad; they are usually priced smartly. Intel have probably released the worst chips, like those 'pre-overclocked' P3s that used Rambus? Hahaha. Christ, I remember Rambus. **** Rambus. The only people who bought Intel then were OEMs and the mentally insane.

Fully star swearwords!

The Halk · 3 Feb 2011 at 20:55

wannabedamned said:
Once in a while a certain chip comes out that trumps both brands, Barton 2500m, Opteron 146 onwards, Intel Q6600.

Yep... the current AMDs seem to be just too mediocre to qualify there. The Sandy Bridge 2600K is maybe in that league? But probably not.

Perhaps their high-end Sandy Bridges will be...

AMD's Bulldozer, as I understand it (I may have been taken in by hype), is a bigger step forward than any of their cores since the A64 (the same generation as the Optys), so perhaps one is coming... I don't expect it though.

drunkenmaster · 3 Feb 2011 at 21:14

Two things. Firstly, let's all just ignore the trolling: this thread is about Bulldozer rather than which past chips were great, and it's obvious trolling and complete rubbish.

Secondly, yes, it still is 8 cores, CmdrTobs, and I'd ALREADY said, well before you, that there's an active and purposeful reduction in the amount of FPU per core. Why? Because within a year there will be 400+ shaders on die, giving more FPU power than dozens of AVX units.

This is a base architecture that will be around for years. Building HUGE FPU power into the general architectural base, when you're a year away from increasing overall FPU power by an order of magnitude rather than by 2-3x, would be insane.

This is why these are cores. In the next 3-4 years, FPU within the "CPU core" will be an almost dead idea: there will be some FPU power so that small amounts of data are faster to keep within the core, but for larger data it's far faster to go out of the module to a local, ridiculously powerful FPU machine, and that is the way forwards.

The idea that CPUs should be described by how many threads a core can always do is incorrect, and that's not what I meant. The number of threads a CPU can ALWAYS handle every clock is, though, a great measure of what a core is, and that's what we have here. A quad core can do 4 threads every clock with no issue, and can never ALWAYS do more than 4 threads; a Bulldozer octo-core/4-module chip can always push 8 threads through. There are ALWAYS going to be 8 full integer cores to use: there will always be two 2-issue cores per module, and each will have its own L1 cache, its own 2 decoders (they just sit next to each other in the same location in the module), its own scheduler, its own 128-bit FPU unit, etc., etc.

Take any past dual-core chip and merge the two cores together with the decoder units in the same block, everything side by side: you'd save transistors and it would still have ALL the functionality of a dual core.

Again, Intel will do this too; all chips will push cores closer and closer together. The more of anything you have, really, the more you HAVE to reduce communication; this is how life works.

With GPUs you had 1-2-4-8-16 pipelines, then we started getting 16 pipelines with 4 shaders on each pipeline, then we moved to blocks of shaders, then to groups of blocks of shaders. The main reason: how long does it take for one person to run up and down a line of 100 people and tell each of them something individually, compared with one person telling 10 people, and those 10 people each telling 10 more?
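The relay analogy can be put into numbers. This is a toy model, assuming each "tell" takes one time unit, a person tells one listener at a time, and everyone informed in one round relays to `fanout` new people in parallel in the next; the function names are illustrative only.

```python
def sequential_time(listeners: int) -> int:
    """One messenger tells each listener in turn: one time unit per listener."""
    return listeners


def tree_time(listeners: int, fanout: int) -> int:
    """Relay model: everyone informed in the previous round tells `fanout`
    new people, one per time unit, with all relayers working in parallel."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    reached, frontier, elapsed = 0, 1, 0  # frontier = people relaying this round
    while reached < listeners:
        frontier *= fanout  # each relayer informs `fanout` new people
        reached += frontier
        elapsed += fanout   # telling `fanout` people one by one takes `fanout` units
    return elapsed


print(sequential_time(100), tree_time(100, 10))  # 100 20
```

Under these assumptions the line of 100 takes 100 units, while the 1-to-10-to-100 relay reaches everyone in 20.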

Interconnect cost, communication and complexity increase dramatically the more cores you have, as does the number of metal layers you need for the first core to contact the 8th.

At 8 cores we're simply seeing a subdivision to decrease latency and communication time, at 16 cores we'll maybe see 4 groups of 4, at 32 we might see 8 groups of 4, at 64 cores we might see 8 groups of 8, etc, etc, etc.
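A rough way to see why grouping helps is to count point-to-point links. This is a hypothetical counting model, assuming every core (or group) needs a dedicated link to every other one at its level; real chips use buses, rings and crossbars instead, so the figures are illustrative only. The group sizes are the ones suggested above, with Bulldozer's 8 cores as 4 modules of 2.

```python
def full_mesh_links(n: int) -> int:
    """Links needed if each of n units talks directly to every other unit."""
    return n * (n - 1) // 2


def grouped_links(n: int, group_size: int) -> int:
    """Full mesh inside each group, plus one link between every pair of groups."""
    if n % group_size:
        raise ValueError("n must be a multiple of group_size")
    groups = n // group_size
    return groups * full_mesh_links(group_size) + full_mesh_links(groups)


# Group sizes from the post: 8 cores as 4 modules of 2, then 4x4, 8x4, 8x8.
for n, g in [(8, 2), (16, 4), (32, 4), (64, 8)]:
    print(f"{n} cores in groups of {g}: "
          f"{full_mesh_links(n)} links all-to-all vs {grouped_links(n, g)} grouped")
```

In this model 8 cores need 28 links all-to-all but 10 when grouped, and 64 cores need 2016 versus 252.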

This is how cores work, and have always worked, in pretty much any type of CPU/GPU: the more major units you have, the more you HAVE TO divide the chip up into a more modular design.

I mean it's pretty simple: a quad core without HT will be seen as 4 cores in Windows; a quad core with HT will be seen as 8 cores, but to run 8 threads each core has to take two threads; an octo-core without HT will be seen by Windows as 8 cores with one thread available on each.

Is a 2900 XT not really 320 shaders because they are in groups? Is a GF110 not really 512 shaders because they're in groups of 32? The idea is nonsense. We know what a shader is, and just because GPUs moved forward in natural GPU design in the ONLY WAY THEY COULD, we didn't decide a Fermi was really only 16 shaders, each able to do 32 shader instructions but counted as one. That would be ridiculous.

rhiridflaidd · 4 Feb 2011 at 09:49

Reverse multithreading

http://www.theinquirer.net/inquirer/feature/1730197/itll-sandy-bridge-bulldozer-2011

A 50% increase in performance is massive. Most things that I run strain a single core while the other cores are at 60-70%, and in computing it's the lowest common denominator that limits overall speed almost every time.

So, is this just a figment of The Inquirer's imagination, or is this how AMD has achieved a performance leap? Because 50% over i7 is a big leap from where AMD is right now.

I know nothing about chip design, but if you imagine each two-core, one-FPU module working on a single thread, with the thread being processed by 2 cores and 1 FPU, it sort of makes sense.

Or this might be just wishful thinking.

pswfps · 4 Feb 2011 at 15:13

CmdrTobs said:
Also, I meant to mention the blasé nature of people asking programmers just to code new highly threaded games...... It's SOOOOOO hard.

Why? You have to synchronise with user input; you can't have threads racing off as fast as possible like in, say, video encoding. (How does a thread know to play 'explosion.wav' if you have not clicked the mouse yet?)

So the only way to do it is dynamic load balancing, and that is a massive ballache if not Impossible(TM). Imagine: if companies can't be arsed to regularly code simpler, largely single-threaded engines (they just license the UT3 engine), can you imagine anybody willing to attempt the above?
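For anyone unfamiliar with the term, "dynamic load balancing" just means handing out work at run time instead of splitting it up front. A minimal Python sketch, assuming tasks of uneven size simulated with `time.sleep`: idle workers pull the next task from a shared `queue.Queue`, so one slow task doesn't stall the rest. It does not touch the harder problem raised above (synchronising with user input), and `run_balanced` is a hypothetical name.

```python
import queue
import threading
import time


def run_balanced(task_costs, workers=4):
    """Idle workers pull the next task from a shared queue (dynamic balancing).
    Returns a dict mapping task index -> id of the worker that ran it."""
    tasks = queue.Queue()
    for i, cost in enumerate(task_costs):
        tasks.put((i, cost))
    done_by = {}
    lock = threading.Lock()

    def worker(wid):
        while True:
            try:
                i, cost = tasks.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            time.sleep(cost)  # stand-in for real work of uneven size
            with lock:
                done_by[i] = wid

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done_by


costs = [0.2] + [0.01] * 20  # one slow task, twenty quick ones
done_by = run_balanced(costs, workers=4)
# Counts vary run to run; typically the worker stuck on the slow task does the fewest.
print({w: list(done_by.values()).count(w) for w in range(4)})
```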

I think this makes Bulldozer more of an assault on the cloud server market, and it won't be as profound for us. Still looking forward to it though.

Multithreading certainly adds extra complexity but it's probably more a case of them using only what they need. In fact, my own experiments with massively threaded apps show that excessive threading can actually slow down app performance, especially when the app uses more threads than the CPU natively supports... and there's still a lot of dual cores out there.

So let's hope that Bulldozer will have substantially better IPC than SandyBridge for all those single/dual threaded apps.

Gashman · 4 Feb 2011 at 15:55

rhiridflaidd said:
http://www.theinquirer.net/inquirer/feature/1730197/itll-sandy-bridge-bulldozer-2011

A 50% increase in performance is massive. Most things that I run strain a single core while the other cores are at 60-70%, and in computing it's the lowest common denominator that limits overall speed almost every time.

So, is this just a figment of The Inquirer's imagination, or is this how AMD has achieved a performance leap? Because 50% over i7 is a big leap from where AMD is right now.

I know nothing about chip design, but if you imagine each two-core, one-FPU module working on a single thread, with the thread being processed by 2 cores and 1 FPU, it sort of makes sense.

Or this might be just wishful thinking.

It doesn't run a single thread across both integer cores, so in single-threaded applications a module will only be using ~50% of its resources executing that thread, and the other half would theoretically be idle. It would be something really special if one could get a single thread running across multiple cores; Bulldozer would be insanely fast if that were the case.

chrisarm · 4 Feb 2011 at 16:52

The story says it is "based on" the Bulldozer CPU. Doesn't this mean that it is not the actual Bulldozer chip?

CmdrTobs · 5 Feb 2011 at 00:23

drunkenmaster said:
Two things. Firstly, let's all just ignore the trolling: this thread is about Bulldozer rather than which past chips were great, and it's obvious trolling and complete rubbish.

The T-word!

drunkenmaster said:
The idea that CPUs should be described by how many threads a core can always do is incorrect, and that's not what I meant. The number of threads a CPU can ALWAYS handle every clock is a great measure of what a core is, though,

Here is where we disagree. My measurement of a core in a hardware conversation refers to that basic unit of micro-circuitry. Though I do admit that sometimes, out of sheer laziness/brevity or to go along with someone, I use 'core' to mean a unit that can continually execute a thread.

drunkenmaster said:
Take any past dual-core chip and merge the two cores together with the decoder units in the same block, everything side by side: you'd save transistors and it would still have ALL the functionality of a dual core.

You had to write that because someone wrote:

Drunken's Strawman said:
Take any future 1 core chip and split them together with the encoder units in a different block. Everything in a row, you'd spend more resistors and it would not have ALL the functionality of a single core.

I jest, but seriously, nobody here is disagreeing on the technical details of what Bulldozer is. I know this may be a shock, but people MAY understand and just see the vernacular slightly differently from you. Windows, for example, disagrees and calls an execution unit a 'CPU'.

I think module is best, for the time being. It nicely informs the consumer as to what they are getting compared to a past norm. AMD themselves sometimes use this.

Disagree? Then we agree to disagree.

Gashman said:
It doesn't run a single thread across both integer cores, so in single-threaded applications a module will only be using ~50% of its resources executing that thread, and the other half would theoretically be idle. It would be something really special if one could get a single thread running across multiple cores; Bulldozer would be insanely fast if that were the case.

Well, with Windows 7 or Linux 2.6.blahblah (the latest ones) it can fully use the FPU, so you can think of that as 66% usage on a single thread. As for using the other integer 'core', I don't think there would be much point, as integer work is already damn fast. Never(TM) really a bottleneck.

This is one of the arguments for moving towards RISC with an in-order (non-out-of-order) setup, so it's easier to pipeline like crazy, like those SPARC chips that can run hundreds of threads on one core with no slowdown, but with a potential speed-up for hungrier threads of the type you describe. Like a bulldozer in a bulldozer, if that makes sense? (No it doesn't, Tobs.)

CmdrTobs · 5 Feb 2011 at 00:43

pswfps said:
Multithreading certainly adds extra complexity but it's probably more a case of them using only what they need. In fact, my own experiments with massively threaded apps show that excessive threading can actually slow down app performance, especially when the app uses more threads than the CPU natively supports... and there's still a lot of dual cores out there.

So let's hope that Bulldozer will have substantially better IPC than SandyBridge for all those single/dual threaded apps.

I hope so, but I doubt it. I am going to guess it will be on par.

If software comes out like that thingy I posted, then I could easily see BD being 50% faster than any quad-core i7.

Increasingly, limits on ohmic interconnects and current densities, combined with the economics of Moore's law, shoehorn performance down a parallel route whether we like it or not. Whether available software supports more threads becomes the big question.

Even when chipmakers have good IPC they won't release a dual core that blows away the high-end multicore they are trying to push. Look at how they killed overclocking on the i3; I suspect it would have become the next Athlon XP-M if they had not.

So yeah, I guess BD could be +50% over an i7 in Cinebench..... Outside of that, between Intel's instruction-set-favoured benches and generally poor scaling with threads, not 50% at all.

DeeJayCee · 5 Feb 2011 at 11:40

I'd ignore anything from The Inquirer; it's a well-known trash repository.

I also don't understand the fascination of getting into the architecture too much. Surely it's how the software uses the hardware. This is the point the programmer guy was making about threads being more important than cores.

DeeJayCee · 5 Feb 2011 at 12:21

Back on topic, I think this sums up the OP better than I could:

http://razetheworld.com/hardware/logans-rant-on-the-leaked-amd-bulldozer-performance-specs/

Gashman · 6 Feb 2011 at 01:57

Surely discussing architectures is relevant; this is, after all, a computer enthusiast forum, and Bulldozer is a pretty interesting design no matter which way you look at it. I agree totally with CmdrTobs's definition of what Bulldozer's 'modules' are. Personally, I think this diagram http://www.qdpma.com/Arch_files/Bulldozer_Core_uArch_0_4.png has some merit, with the whole thing labelled as the 'Core' and each part labelled 'Integer Cluster', since I for one think that's how AMD originally intended it to be. Though Windows will indeed show it as eight physical cores, I do believe each module is aimed at competing on a core vs. core basis, not core vs. dual-core: another complete execution unit for only 12% extra die-space over a single core, i.e. meant to compete on the same sort of playing field as Hyper-Threading, since it's more like the extra die-space taken by an HT implementation than another entire core. At least that's my view on the whole thing. We don't all agree? That's fair enough; the world would be a damned boring place if everyone thought the same about everything!

Gashman · 6 Feb 2011 at 02:23

Also, back to topical discussion: think about it from AMD's point of view for a minute to understand why it is such a brilliant design. Each of these so-called 'cores' only adds another 12% to the die-space compared to a single traditional core; that's the inspired part of the design!

Let's say, for the sake of argument, a single traditional core requires 100 square millimetres of die-space; that core can work on a single thread, no more, no less. A Bulldozer module would take up 112 square millimetres, 12 more than the single core, but you've doubled the number of threads you can execute. The design becomes even more inspired when you increase the number of modules. Another example: a 'traditional' quad-core based on the numbers above (an example, of course) requires 400 square millimetres and can do four threads at once, right? Well, for 448 square millimetres you could have four entire Bulldozer modules, capable of executing eight threads: double the number of the quad-core for 12% more die-space. Obviously these examples don't take into account the space required by L3 cache, memory controllers, etc., but I think it emphasises why it's such a spectacular design, and why the 'eight-core' chips are going to be compared to quad-core rivals. I for one believe that's how AMD intended it from the start.
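The arithmetic above, as a quick Python check. The 100 mm² core and 112 mm² module are the post's own made-up figures, not real Bulldozer die sizes, and like the post this ignores L3, memory controllers, and the fact that a hardware thread slot is not the same thing as throughput.

```python
CORE_AREA = 100.0    # mm^2: the post's hypothetical traditional core
MODULE_AREA = 112.0  # mm^2: the post's figure, one core plus 12%


def chip(units: int, unit_area: float, threads_per_unit: int):
    """Return (area in mm^2, hardware threads) for a chip of identical units,
    ignoring L3, memory controllers and other uncore logic as the post does."""
    return units * unit_area, units * threads_per_unit


quad_area, quad_threads = chip(4, CORE_AREA, 1)  # traditional quad core
bd_area, bd_threads = chip(4, MODULE_AREA, 2)    # four Bulldozer modules

print(f"{quad_area:.0f} mm^2, {quad_threads} threads vs "
      f"{bd_area:.0f} mm^2, {bd_threads} threads: "
      f"{bd_area / quad_area - 1:.0%} more area for "
      f"{bd_threads / quad_threads:.0f}x the threads")
# prints: 400 mm^2, 4 threads vs 448 mm^2, 8 threads: 12% more area for 2x the threads
```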

So 12% more die-space for 50% more performance (if the numbers prove to be correct) is a spectacular move in an ever more efficiency-orientated world, even more so considering it has improved Turbo over previous Phenom II processors and improved energy-saving features. Even if it doesn't take the core vs. core performance crown, I think it will still be a success for them, since I don't think it was ever aimed at that sort of comparison!

Edit: also worth saying that a 12% increase in die-space (not including L3, memory controller, etc.) for double the potential threads is going to be untouchable in heavily multi-threaded applications, because Intel only have Hyper-Threading to compete in that sort of comparison, and we are all well aware that Hyper-Threading is never, ever a substitute for another physical core. So the Bulldozer processors would have a marked advantage in those areas, which is brilliant for AMD in an ever-growing multi-threaded market.

opethdisciple · 9 Feb 2011 at 00:48

Taken from here: http://en.wikipedia.org/wiki/Bulldozer_(processor)

"Bulldozer is the next-generation micro-architecture and processor design developed from the ground up by AMD. Bulldozer will be the first major redesign of AMD’s processor architecture since 2003, when the firm launched its Athlon 64/ Opteron (K8) processors. Bulldozer will feature two 128-bit FMA-capable FPUs which can be combined into one 256-bit FPU. This design is accompanied with two integer cores each with 4 pipelines (the fetch/decode stage is shared). Bulldozer will also introduce shared L2 cache in the new architecture. AMD calls this design a "Bulldozer module". A 16-core processor design would feature eight of these modules,[6] but the operating system will recognize each module as two physical cores.

The module, described as two cores, can be compared to a single Intel core with HyperThreading. The difference between the two approaches is that Bulldozer provides dedicated schedulers and integer units for each thread, whereas Intel's core has fewer replicated per-thread units, and consequently exhibits more resource contention with HyperThreading enabled."

From that description, I was just thinking, 'oh, so this is AMD catching up with the Core i7 architecture'.

Emlyn_Dewar · 9 Feb 2011 at 09:29

But it is Wikipedia... Brave man who bases all their thinking off of a wiki article.

rypt · 9 Feb 2011 at 09:45

pswfps said:
Multithreading certainly adds extra complexity but it's probably more a case of them using only what they need. In fact, my own experiments with massively threaded apps show that excessive threading can actually slow down app performance, especially when the app uses more threads than the CPU natively supports... and there's still a lot of dual cores out there.

That is a case of poorly developed apps. A GOOD application should be scalable, able to run on anything from the minimum number of threads it needs up to infinity.
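A sketch of what "scalable" can look like in practice, assuming Python: size the worker pool from `os.cpu_count()` (which counts logical processors, so HT threads and Bulldozer integer cores both show up, and which can return `None`) instead of spawning a thread per item, and keep the result independent of the worker count. `process_all` is a hypothetical helper; for CPU-bound Python code a `ProcessPoolExecutor` would usually be the better choice because of the GIL.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_all(items, work, max_workers=None):
    """Apply `work` to every item using a pool sized to the machine.

    os.cpu_count() counts logical processors and may return None, so fall
    back to a single worker. pool.map preserves input order, so the result
    is the same whatever the worker count.
    """
    if max_workers is None:
        max_workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, items))


def square(x):
    return x * x


# Same answer on one worker, two workers, or however many the machine reports.
for workers in (1, 2, None):
    print(workers, process_all(range(8), square, workers))
```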

Gashman · 9 Feb 2011 at 18:40

Emlyn_Dewar said:
But it is Wikipedia... Brave man who bases all their thinking off of a wiki article.

As far as I know, all the Wikipedia information is based either on official information or on bits and bobs everyone is assuming from the architectural diagrams. Either way they should be fairly quick processors, apparently designed to run at 3.5GHz (according to one of the wiki sources), which is awesome considering they aren't meant to be compared to eight-core processors but rather to quad-cores with Hyper-Threading, no more, no less!

DeeJayCee · 11 Feb 2011 at 13:30

Just came across this on my travels and thought of this thread:

Quote from the AMD Bulldozer blog:

"Performance: We release benchmarks at launch, so don’t expect too much detail there anytime soon. From a performance standpoint, if you compare our 16-core Interlagos to our current 12-core AMD Opteron™ 6100 Series processors (code named “Magny Cours”) we estimate that customers will see up to 50% more performance from 33% more cores. This means we expect the per core performance to go in the right direction — up. That is all I will say until launch."
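The quote's numbers can be turned into a per-core estimate. This assumes the "up to 50%" is whole-chip throughput spread evenly across cores, and note that it compares AMD's own server parts (Interlagos vs Magny-Cours), not Core i7:

```python
old_cores, new_cores = 12, 16  # Magny-Cours vs Interlagos core counts
total_speedup = 1.50           # "up to 50% more performance"

core_ratio = new_cores / old_cores  # about 1.33, i.e. 33% more cores
per_core_speedup = total_speedup / core_ratio

print(f"{core_ratio - 1:.0%} more cores, "
      f"{per_core_speedup - 1:.1%} more performance per core")
# prints: 33% more cores, 12.5% more performance per core
```

So, taken at face value, the figure is a whole-chip claim against AMD's previous generation rather than a per-core comparison with Core i7.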

http://blogs.amd.com/work/2010/08/02/what-is-bulldozer/

Weets · 11 Feb 2011 at 23:02

DeeJayCee said:
"Performance: We release benchmarks at launch, so don’t expect too much detail there anytime soon.

Why the Hell not? Half of me wants to stick my fingers up and buy a Sandy just out of spite.

Sayso · 12 Feb 2011 at 00:01

Weets said:
Why the Hell not? Half of me wants to stick my fingers up and buy a Sandy just out of spite.

Because they aren't that impressive? Feel free to buy SB, by the way; from what I have read it's great.