
AMD Bulldozer Finally!

So, bored on Monday morning, I've done some back-of-the-napkin calculations. With a module running 2 threads, each at 80% of the throughput of a module running just 1 thread, relative performance should look something like:

At full load:

1 thread , 1 module = 1
4 threads, 1 per module = 4
5 threads, 3 modules x 1 + 1 module at 2x0.8 = 4.6
6 threads, 2 modules x 1 + 2 modules at 2x0.8 = 5.2
7 threads, 1 module x 1 + 3 modules at 2x0.8 = 5.8
8 threads, 4 modules at 2x0.8 = 6.4

Obviously different work types and loads will stress the shared resources differently, so perhaps 8 less intensive threads may perform faster per core, but it works as a general rule of thumb.
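To make the assumption explicit, here is a minimal Python sketch of that arithmetic (purely my own illustration of the guess above, nothing official): 4 modules, threads spread one per module before any module takes a second, and 0.8 per thread once a module is running two.

[code]
# Rough model of the speculative module scaling above (assumed numbers).
def relative_throughput(threads, modules=4, per_thread_when_shared=0.8):
    doubled = max(threads - modules, 0)        # modules forced to take a 2nd thread
    single = min(threads, modules) - doubled   # modules still running just 1 thread
    return single * 1.0 + doubled * 2 * per_thread_when_shared

for n in range(1, 9):
    print(n, "threads ->", relative_throughput(n))
# 5 -> 4.6, 6 -> 5.2, 7 -> 5.8, 8 -> 6.4, matching the table above.
[/code]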


Thanks for that.
However, AMD's modules work on efficiency, right?
Ok.

Let's change the terminology a little bit.

Module = Master and Slave. The master = 1, the slave = 0.8.

With a 2-threaded app, does it do Master/Master or Master/Slave?

Since "cores" can't be powered on by themselves, and it requires the whole module, for power reasons it makes more sense to go Master/Slave.

But that essentially means a module could have the performance of a Callisto (Phenom II X2), IPC-wise.
But as a single thread on the module would run on the Master, it'd be Lynnfield (I'm giving realistic figures; it could be better, could be worse).
 

Performance-wise that is far more appealing, giving a max of 8 threads = 7.2.

So are both cores (minus shared resources) within a module not identical? (I had assumed they were.) Running 1 core at 0.8 of the throughput of the other is also a power/space efficiency loss, unless the slave core is designed to handle the lower maximum workload that can be sent its way.


I'd have thought that whether it runs a 2-threaded app as master/master or master/slave would surely be user-configurable via a power/performance option, or at the least set by the OS depending on workload. There will be circumstances in which 1 thread per module is going to be preferred by some.

i.e. 4 threads = 4, as opposed to 4 threads = 3.6 (to use my previous post's relative units).


For a min there I read that as overclocking ability... 6.4, nice :P

You wish :p

...although you never know!
 
I don't know if it'd equate to 0.8; I was just giving a rough figure, it could be 0.6 for all I know. I'm just using it as an example of whether it'll work as master/master or master/slave.

Regardless of whether or not it works like that in the real world (as I think it is two identical parts), the end performance is still the same.

I was asking: would it do 2 threads on a module first, or a thread from 2 modules? Power efficiency would suggest 2 threads on 1 module.
 
So, bored on Monday morning, I've done some back-of-the-napkin calculations. With a module running 2 threads, each at 80% of the throughput of a module running just 1 thread, relative performance should look something like:

At full load:

1 thread , 1 module = 1
4 threads, 1 per module = 4
5 threads, 3 modules x 1 + 1 module at 2x0.8 = 4.6
6 threads, 2 modules x 1 + 2 modules at 2x0.8 = 5.2
7 threads, 1 module x 1 + 3 modules at 2x0.8 = 5.8
8 threads, 4 modules at 2x0.8 = 6.4

Obviously different work types and loads will stress the shared resources differently, so perhaps 8 less intensive threads may perform faster per core, but it works as a general rule of thumb.

If it's 80% scaling, i.e. 1.8x, then it's 0.9 per core averaged, and 0.8 when you calculate it at the margin.

So:

1 thread , 1 module = 1
4 threads, 1 per module = 4
5 threads, 4 x 1 + 1 x 0.8 = 4.8
6 threads, 4 x 1 + 2 x 0.8 = 5.6
7 threads, 4 x 1 + 3 x 0.8 = 6.4
8 threads, 4 x 1 + 4 x 0.8 = 7.2

or

1 thread , 1 module = 1
4 threads, 1 per module = 4
5 threads, 3 modules x 1 + 1 module at 2x0.9 = 4.8
6 threads, 2 modules x 1 + 2 modules at 2x0.9 = 5.6
7 threads, 1 module x 1 + 3 modules at 2x0.9 = 6.4
8 threads, 4 modules at 2x0.9 = 7.2
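Both ways of writing it come to the same totals, since 1 + 0.8 = 2 x 0.9 = 1.8 per fully loaded module. A quick Python sketch of that check, still assuming the speculative 1.8x figure:

[code]
# Same 1.8x-per-module guess, booked two ways (assumed numbers).
def marginal(threads, modules=4):
    # 1st thread on a module counts as 1.0, the 2nd as 0.8
    extra = max(threads - modules, 0)
    return min(threads, modules) * 1.0 + extra * 0.8

def averaged(threads, modules=4):
    # a module running 2 threads counts as 2 x 0.9 = 1.8
    extra = max(threads - modules, 0)
    return (min(threads, modules) - extra) * 1.0 + extra * 1.8

for n in range(1, 9):
    print(n, marginal(n), averaged(n))  # identical: 5 -> 4.8, ..., 8 -> 7.2
[/code]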

edit:

I should have read newer posts, beaten by Martini.
 
Yeah, that's where I got the 0.8 from; 80%.
But I don't know that for definite.


For all we know, it could be

1 thread = 1
2 threads = 1.8
3 threads = 2.8
4 threads = 3.6
5 threads = 4.6
6 threads = 5.4
7 threads = 6.4
8 threads = 7.2

A single BD core could be 1.1 compared to a Phenom II core, but with the second thread on the module at 0.8, a BD module would be at 0.95 of a Phenom II X2.
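The table above assumes the scheduler packs 2 threads onto a module before waking the next one (the power-saving master/slave approach from earlier). For comparison, a rough Python sketch of that against spreading 1 thread per module first, using the same guessed 1.8 per fully loaded module; all numbers are assumptions:

[code]
# Two possible scheduling policies, both assuming a fully loaded module does
# 1.8 (i.e. 2 x 0.9). Purely speculative numbers.
def pack_first(threads, modules=4):
    full = min(threads // 2, modules)   # modules given 2 threads straight away
    solo = threads - 2 * full           # leftover thread on its own module
    return full * 1.8 + solo * 1.0

def spread_first(threads, modules=4):
    extra = max(threads - modules, 0)   # modules forced to take a 2nd thread
    return (min(threads, modules) - extra) * 1.0 + extra * 1.8

for n in range(1, 9):
    print(n, pack_first(n), spread_first(n))
# pack_first reproduces the table above (e.g. 4 threads = 3.6);
# spread_first keeps 4 threads = 4.0; both meet at 7.2 with 8 threads.
[/code]

Which of those two a real scheduler (or the OS) would pick is exactly the master/slave question being debated above.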
 
It would be very disappointing if BD just turns out to be an 8-core Phenom II.

The fact there will be 6- and 4-core versions means to me there HAS to be an IPC increase, otherwise the 4- and 6-core versions will be identical in performance to the current 4- and 6-core Phenom IIs.
 

No it doesn't.
It could actually be worse if you read the posts.
A BD module in a single thread could be 10% faster than a Phenom II core, but 10% slower with a second thread, making a module 0.95 of a Phenom II X2.

It could of course be 20% faster in single thread and the same as a Phenom II X2 in two-threaded apps.
It could of course be 30% faster and at 1.1 of a Phenom II X2.
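For what it's worth, here is one way (and only one way) those module-level figures could be read, following the 1.1 / 0.8 numbers floated a few posts up; this is my interpretation, not anything confirmed:

[code]
# One reading of the speculation above (assumed numbers, nothing confirmed).
first_thread = 1.1   # a lone BD thread vs one Phenom II core
second_thread = 0.8  # what the second thread on a loaded module adds
module = first_thread + second_thread
print(module / 2.0)  # 0.95, i.e. a module at 0.95 of a Phenom II X2
[/code]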
 
Regardless of whether or not it works like that in the real world (as I think it is two identical parts), the end performance is still the same.
I was asking: would it do 2 threads on a module first, or a thread from 2 modules? Power efficiency would suggest 2 threads on 1 module.

It would allow each core to achieve 100% throughput when the other requires fewer resources. Both cores being equal does make sense now that I think about it, for both simplicity and adaptability.

To me the master/master or master/slave option should be left to the end user to define how they would like to run. Under different circumstances there will be benefits to both. It certainly adds another layer of management.


If it's 80% scaling, i.e. 1.8x, then it's 0.9 per core averaged, and 0.8 when you calculate it at the margin.
So:

I had it fixed in my head that each of a module's cores ran at 80% of a single core's throughput when both were loaded, although I cannot remember specifically why. That's where the notion of 1 thread per module came from, as the hit from 2 concurrent threads seemed pretty steep. As a scaling reference it does change things.


Bah, this is making me itchy for more details. I'd told myself I would keep from speculating for that very reason; I've just given myself more questions...
 
Technically the BD "cores" could each be 10% faster than a Phenom II core clock for clock but, due to the module "bottleneck" when both cores on the module are used, perform the same in two-threaded apps.
 
Bulldozer is a streamlined design; that is the whole point of the exercise. Their dudes came to the conclusion that a lot of the time the current design has resources sitting around idle and doing nothing, thereby wasting space, power and heat. So in Bulldozer everything is leaner, more compact and theoretically more efficient: each core has fewer resources than the standard cores they use today, but runs at a substantially higher clock speed with more aggressive pre-fetching and such, so I would expect them to be as fast or faster most of the time, in most situations.

I would only expect each 'core' to fall behind in situations that use all the resources of conventional designs. The biggest factor with Bulldozer is that by redesigning and streamlining they can fit two integer cores and one shared floating-point unit in each module, which only takes up about 10% more space than a standard design, but has more resources and is much more flexible than the previous generations. Each core has fewer arithmetic units, but since you essentially get two of these 'cores' in a similar die space to the previous design, with a better, more flexible floating-point unit, you're going to see some benefits, especially in a multi-threaded environment.

So for around 10% more die space you get more resources at the module level (a Phenom II core has 3 ALU and 3 AGU; each Bulldozer 'core' has 2 ALU and 2 AGU, so a module has 4 ALU and 4 AGU) and a much higher clock speed, whilst still running within the same sort of thermal envelope as the previous cores. Not to mention the cores make more efficient use of those resources and waste less time doing nothing. I think Bulldozer is an awesome piece of micro-architecture design and should totally do what they intend it to do. Stop comparing it on a core vs. core level; that has never been the intention of the exercise. You compare one Bulldozer module to one Intel core/one Phenom II core, or else you're forgetting it was designed from the ground up to work in that exact sort of competition/environment. So vs. Phenom II it has more resources, can execute double the number of threads and has a better instruction set, running on a smaller process at significantly higher clock speeds whilst using equal or less power. How can that not be considered a success, though I am sure the Intel camp will try to think of something? :confused:

Edit: how can it be disappointing if we get eight higher-clocking cores on a marginally bigger piece of silicon than the current four-core Phenom II: double the cores for like 10% more space, clocking higher, without using more energy...?
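Putting the execution-resource figures from that post side by side (these are the counts as claimed above, which I have not verified):

[code]
# Execution resources per the figures quoted in the post above (unverified).
resources = {
    "Phenom II core (1 thread)":    {"ALU": 3, "AGU": 3},
    "Bulldozer 'core' (1 thread)":  {"ALU": 2, "AGU": 2},
    "Bulldozer module (2 threads)": {"ALU": 4, "AGU": 4},  # plus the shared FPU
}
for part, r in resources.items():
    print(f"{part:30} {r['ALU']} ALU / {r['AGU']} AGU")
[/code]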
 
Stop comparing a BD core to a Phenom II core or an Intel core? A module is still two threads; two Phenom II cores = 2 threads, two SB 2500k cores = 2 threads.

Even that aside, compare a BD module to a 2600k core (both execute 2 threads):
in single-threaded apps the 2600k is bound to beat it, and in two-threaded apps the gap becomes a lot closer, possibly better than the 2600k's one core, although the slides show that isn't the case and that it's slower.
 
Welcome to a very small and elite club, my ignore list.

I'm following Skidder's lead, for the 2nd time in 5 years I'm adding someone to my block list.

EDITED Quotes.


For the sake of not polluting the thread any more, I will not post the long responses I had to the points they both had just made.

All I'm going to say is that they should have just said that I'm on ignore, so as not to perpetuate the situation, which they have just done, as I thought it was over last night.

Anyway, congratulations to both of them on joining the bury-my-head-in-the-sand club.
 
It would be very disappointing if BD just turns out to be an 8-core Phenom II.

The fact there will be 6- and 4-core versions means to me there HAS to be an IPC increase, otherwise the 4- and 6-core versions will be identical in performance to the current 4- and 6-core Phenom IIs.


That is a worry that I had thought about myself.

I would like to see a 12-core BD personally.
 
Jeez Final8ty, just drop it for the love of jeebus. We get it, you're some kind of moral nut, and to be blunt, morals don't mean squat on 'teh internetz'.
 

It doesn't mean squat anywhere these days. My morals and principles are for myself; people don't have to try to be like me. They can drink and smoke in my place even though I don't drink or smoke. People can do what they want, just don't ask me to drink or smoke myself, or to go along with things that are detrimental to others unless it's necessary. And again, I'm not religious, as I don't need it or believe in it.

And it was dropped many times already, but someone has to post yet again.
 