
AMD Bulldozer Finally!

So, bored on Monday morning, I've done some back-of-the-napkin calculations. With a module running 2 threads, each at 80% of the throughput of a module running just 1 thread, relative performance should look something like:

At full load:

1 thread , 1 module = 1
4 threads, 1 per module = 4
5 threads, 3 modules x 1 + 1 module at 2x0.8 = 4.6
6 threads, 2 modules x 1 + 2 modules at 2x0.8 = 5.2
7 threads, 1 module x 1 + 3 modules at 2x0.8 = 5.8
8 threads, 4 modules at 2x0.8 = 6.4

Obviously different work types and loads will stress the shared resources differently, so perhaps 8 less intensive threads may perform faster per core, but it works as a general rule of thumb.
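To make the assumption explicit, here is a minimal Python sketch of that arithmetic (purely my own illustration of the guess above, nothing official): 4 modules, threads spread one per module before any module takes a second, and 0.8 per thread once a module is running two.

[code]
# Rough model of the speculative module scaling above (assumed numbers).
def relative_throughput(threads, modules=4, per_thread_when_shared=0.8):
    doubled = max(threads - modules, 0)        # modules forced to take a 2nd thread
    single = min(threads, modules) - doubled   # modules still running just 1 thread
    return single * 1.0 + doubled * 2 * per_thread_when_shared

for n in range(1, 9):
    print(n, "threads ->", relative_throughput(n))
# 5 -> 4.6, 6 -> 5.2, 7 -> 5.8, 8 -> 6.4, matching the table above.
[/code]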


Thanks for that.
However, AMD's modules work on efficiency, right?
Ok.

Let's change the terminology a little bit.

Module = Master and Slave. The master = 1, the slave = 0.8.

With a 2-threaded app, does it do Master/Master or Master/Slave?

Since "cores" can't be powered on by themselves, and it requires the whole module, for power reasons it makes more sense to go Master/Slave.

But that essentially means a module could have the performance of a Callisto (Phenom II X2), IPC-wise.
But as a single thread on the module would run on the Master, it'd be Lynnfield (I'm giving realistic figures; it could be better, could be worse).
 

Performance-wise that is far more appealing, giving a max of 8 threads = 7.2.

So are both cores (minus shared resources) within a module not identical? (I had assumed they were.) Running 1 core at 0.8 of the throughput of the other is also a power/space efficiency loss, unless the slave core is designed to handle the lower maximum workload that can be sent its way.


I'd have thought that whether it runs a 2-threaded app as master/master or master/slave would surely be user-configurable via a power/performance option, or at the least set by the OS depending on workload. There will be circumstances in which 1 thread per module is going to be preferred by some.

i.e. 4 threads = 4, as opposed to 4 threads = 3.6 (to use my previous post's relative units).


For a min there I read that as overclocking ability... 6.4, nice :P

You wish :p

...although you never know!
 
I don't know if it'd equate to 0.8; I was just giving a rough figure, it could be 0.6 for all I know. I'm just using it as an example of whether it'll work as master/master or master/slave.

Regardless of whether or not it works like that in the real world (as I think it is two identical parts), the end performance is still the same.

I was asking: would it do 2 threads on a module first, or a thread from 2 modules? Power efficiency would suggest 2 threads on 1 module.
 
So, bored on Monday morning, I've done some back-of-the-napkin calculations. With a module running 2 threads, each at 80% of the throughput of a module running just 1 thread, relative performance should look something like:

At full load:

1 thread , 1 module = 1
4 threads, 1 per module = 4
5 threads, 3 modules x 1 + 1 module at 2x0.8 = 4.6
6 threads, 2 modules x 1 + 2 modules at 2x0.8 = 5.2
7 threads, 1 module x 1 + 3 modules at 2x0.8 = 5.8
8 threads, 4 modules at 2x0.8 = 6.4

Obviously different work types and loads will stress the shared resources differently, so perhaps 8 less intensive threads may perform faster per core, but it works as a general rule of thumb.

If it's 80% scaling, i.e. 1.8x, then it's 0.9 per core averaged, and 0.8 when you calculate it at the margin.

So:

1 thread , 1 module = 1
4 threads, 1 per module = 4
5 threads, 4 x 1 + 1 x 0.8 = 4.8
6 threads, 4 x 1 + 2 x 0.8 = 5.6
7 threads, 4 x 1 + 3 x 0.8 = 6.4
8 threads, 4 x 1 + 4 x 0.8 = 7.2

or

1 thread , 1 module = 1
4 threads, 1 per module = 4
5 threads, 3 modules x 1 + 1 module at 2x0.9 = 4.8
6 threads, 2 modules x 1 + 2 modules at 2x0.9 = 5.6
7 threads, 1 module x 1 + 3 modules at 2x0.9 = 6.4
8 threads, 4 modules at 2x0.9 = 7.2
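Both ways of writing it come to the same totals, since 1 + 0.8 = 2 x 0.9 = 1.8 per fully loaded module. A quick Python sketch of that check, still assuming the speculative 1.8x figure:

[code]
# Same 1.8x-per-module guess, booked two ways (assumed numbers).
def marginal(threads, modules=4):
    # 1st thread on a module counts as 1.0, the 2nd as 0.8
    extra = max(threads - modules, 0)
    return min(threads, modules) * 1.0 + extra * 0.8

def averaged(threads, modules=4):
    # a module running 2 threads counts as 2 x 0.9 = 1.8
    extra = max(threads - modules, 0)
    return (min(threads, modules) - extra) * 1.0 + extra * 1.8

for n in range(1, 9):
    print(n, marginal(n), averaged(n))  # identical: 5 -> 4.8, ..., 8 -> 7.2
[/code]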

edit:

I should have read newer posts, beaten by Martini.
 
Yeah, that's where I got the 0.8 from; 80%.
But I don't know that for definite.


For all we know, it could be

1 thread = 1
2 threads = 1.8
3 threads = 2.8
4 threads = 3.6
5 threads = 4.6
6 threads = 5.4
7 threads = 6.4
8 threads = 7.2

A single BD core could be 1.1 compared to a Phenom II core, but with the second thread on the module at 0.8, a BD module would be at 0.95 of a Phenom II X2.
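The table above assumes the scheduler packs 2 threads onto a module before waking the next one (the power-saving master/slave approach from earlier). For comparison, a rough Python sketch of that against spreading 1 thread per module first, using the same guessed 1.8 per fully loaded module; all numbers are assumptions:

[code]
# Two possible scheduling policies, both assuming a fully loaded module does
# 1.8 (i.e. 2 x 0.9). Purely speculative numbers.
def pack_first(threads, modules=4):
    full = min(threads // 2, modules)   # modules given 2 threads straight away
    solo = threads - 2 * full           # leftover thread on its own module
    return full * 1.8 + solo * 1.0

def spread_first(threads, modules=4):
    extra = max(threads - modules, 0)   # modules forced to take a 2nd thread
    return (min(threads, modules) - extra) * 1.0 + extra * 1.8

for n in range(1, 9):
    print(n, pack_first(n), spread_first(n))
# pack_first reproduces the table above (e.g. 4 threads = 3.6);
# spread_first keeps 4 threads = 4.0; both meet at 7.2 with 8 threads.
[/code]

Which of those two a real scheduler (or the OS) would pick is exactly the master/slave question being debated above.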
 
It would be very disappointing if BD just turns out to be an 8-core Phenom II.

The fact there will be 6- and 4-core versions means to me there HAS to be an IPC increase, otherwise the 4- and 6-core versions will be identical in performance to the current 4- and 6-core Phenom IIs.
 

No it doesn't.
It could actually be worse if you read the posts.
A BD module in a single thread could be 10% faster than a Phenom II core, but 10% slower with a second thread, making a module 0.95 of a Phenom II X2.

It could of course be 20% faster in single thread and the same as a Phenom II X2 in two-threaded apps.
It could of course be 30% faster and at 1.1 of a Phenom II X2.
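For what it's worth, here is one way (and only one way) those module-level figures could be read, following the 1.1 / 0.8 numbers floated a few posts up; this is my interpretation, not anything confirmed:

[code]
# One reading of the speculation above (assumed numbers, nothing confirmed).
first_thread = 1.1   # a lone BD thread vs one Phenom II core
second_thread = 0.8  # what the second thread on a loaded module adds
module = first_thread + second_thread
print(module / 2.0)  # 0.95, i.e. a module at 0.95 of a Phenom II X2
[/code]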
 
Regardless of whether or not it works like that in the real world (as I think it is two identical parts), the end performance is still the same.
I was asking: would it do 2 threads on a module first, or a thread from 2 modules? Power efficiency would suggest 2 threads on 1 module.

It would allow each core to achieve 100% throughput when the other requires fewer resources. Both cores being equal does make sense now that I think about it, for both simplicity and adaptability.

To me the master/master or master/slave option should be left to the end user to define how they would like to run. Under different circumstances there will be benefits to both. It certainly adds another layer of management.


If it's 80% scaling, i.e. 1.8x, then it's 0.9 per core averaged, and 0.8 when you calculate it at the margin.
So:

I had it fixed in my head that each of a module's cores ran at 80% of a single core's throughput when both were loaded, although I cannot remember specifically why. That's where the notion of 1 thread per module came from, as the hit from 2 concurrent threads seemed pretty steep. As a scaling reference it does change things.


Bah, this is making me itchy for more details. I'd told myself I would keep from speculating for that very reason; I've just given myself more questions...
 
Technically the BD "cores" could each be 10% faster than a Phenom II core clock for clock but, due to the module "bottleneck" when both cores on the module are used, perform the same in two-threaded apps.
 
Bulldozer is a streamlined design; that is the whole point of the exercise. Their dudes came to the conclusion that a lot of the time the current design has resources sitting around idle and doing nothing, thereby wasting space, power and heat. So in Bulldozer everything is leaner, more compact and theoretically more efficient: each core has fewer resources than the standard cores they use today, but runs at a substantially higher clock speed with more aggressive pre-fetching and such, so I would expect them to be as fast or faster most of the time, in most situations.

I would only expect each 'core' to fall behind in situations that use all the resources of conventional designs. The biggest factor with Bulldozer is that by redesigning and streamlining they can fit two integer cores and one shared floating-point unit in each module, which only takes up about 10% more space than a standard design, but has more resources and is much more flexible than the previous generations. Each core has fewer arithmetic units, but since you essentially get two of these 'cores' in a similar die space to the previous design, with a better, more flexible floating-point unit, you're going to see some benefits, especially in a multi-threaded environment.

So for around 10% more die space you get more resources at the module level (a Phenom II core has 3 ALU and 3 AGU; each Bulldozer 'core' has 2 ALU and 2 AGU, so a module has 4 ALU and 4 AGU) and a much higher clock speed, whilst still running within the same sort of thermal envelope as the previous cores. Not to mention the cores make more efficient use of those resources and waste less time doing nothing. I think Bulldozer is an awesome piece of micro-architecture design and should totally do what they intend it to do. Stop comparing it on a core vs. core level; that has never been the intention of the exercise. You compare one Bulldozer module to one Intel core/one Phenom II core, or else you're forgetting it was designed from the ground up to work in that exact sort of competition/environment. So vs. Phenom II it has more resources, can execute double the number of threads and has a better instruction set, running on a smaller process at significantly higher clock speeds whilst using equal or less power. How can that not be considered a success, though I am sure the Intel camp will try to think of something? :confused:

Edit: how can it be disappointing if we get eight higher-clocking cores on a marginally bigger piece of silicon than the current four-core Phenom II: double the cores for like 10% more space, clocking higher, without using more energy...?
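Putting the execution-resource figures from that post side by side (these are the counts as claimed above, which I have not verified):

[code]
# Execution resources per the figures quoted in the post above (unverified).
resources = {
    "Phenom II core (1 thread)":    {"ALU": 3, "AGU": 3},
    "Bulldozer 'core' (1 thread)":  {"ALU": 2, "AGU": 2},
    "Bulldozer module (2 threads)": {"ALU": 4, "AGU": 4},  # plus the shared FPU
}
for part, r in resources.items():
    print(f"{part:30} {r['ALU']} ALU / {r['AGU']} AGU")
[/code]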
 
Stop comparing a BD core to a Phenom II core or an Intel core? A module is still two threads; two Phenom II cores = 2 threads, two SB 2500k cores = 2 threads.

Even that aside, compare a BD module to a 2600k core (both execute 2 threads):
in single-threaded apps the 2600k is bound to beat it, and in two-threaded apps the gap becomes a lot closer, possibly better than the 2600k's one core, although the slides show that isn't the case and that it's slower.
 
Welcome to a very small and elite club, my ignore list.

I'm following Skidder's lead, for the 2nd time in 5 years I'm adding someone to my block list.

EDITED Quotes.


For the sake of not polluting the thread any more, I will not post the long responses I had to the points they both had just made.

All I'm going to say is that they should have just said that I'm on ignore, so as not to perpetuate the situation, which they have just done, as I thought it was over last night.

Anyway, congratulations to both of them on joining the bury-my-head-in-the-sand club.
 
It would be very disappointing if BD just turns out to be an 8-core Phenom II.

The fact there will be 6- and 4-core versions means to me there HAS to be an IPC increase, otherwise the 4- and 6-core versions will be identical in performance to the current 4- and 6-core Phenom IIs.


That is a worry that I had thought about myself.

I would like to see a 12-core BD personally.
 
Jeez Final8ty, just drop it for the love of jeebus. We get it, you're some kind of moral nut, and to be blunt, morals don't mean squat on 'teh internetz'.
 

It doesn't mean squat anywhere these days. My morals and principles are for myself; people don't have to try to be like me. They can drink and smoke in my place even though I don't drink or smoke. People can do what they want, just don't ask me to drink or smoke myself, or to go along with things that are detrimental to others unless it's necessary. And again, I'm not religious, as I don't need it or believe in it.

And it was dropped many times already, but someone has to post yet again.
 