AMD Bulldozer Finally!

As I'm not thinking of changing CPU/board for some time, this thread has not had my full attention ...

but I am always interested in new tech, so:
to begin with it was a regular call. However, after a few days (never mind the few months that have gone by) it became clear that all the posts were pure (largely unfounded!!) speculation,
and we still do not have a CPU which has been either released or tested.

This does not make interesting reading: 150 pages, 4000+ posts about ... nothing (yet).

Only got one question, and that is: is this the longest 'prequel' thread (without anything really concrete) ever??
 
Yeah, it's possible that the shared cache for each module wasn't enough to keep both cores fed. Or that the central scheduling needs more optimizing. But no doubt this is why AMD started using the word "modules" rather than "cores". It's not just technically accurate, it's a marketing ploy to get people to think of Bulldozer in comparison to Intel quad-cores rather than some theoretical future octa-core.

EDIT: Also remember every 2 cores share an FPU, that probably gets used a lot when encoding (not that I'm a programmer, I'm only presuming).

More cache, to a certain degree, will always be helpful, but at the end of the day if the die size is too big it's not financially viable. Without question, if the die size of Sandy Bridge and Bulldozer could be doubled, they would have better cores, more cores and more cache, and would perform much better.


As for the FPU, Phenom has 2x128bit FPU pipes in it; Bulldozer has 2x128bit FPUs that can run as a single 256bit FPU for one core per clock if it wants. The thing is, most FPU instructions are 32bit, fewer are 64bit, and 128/256bit instructions are basically non-existent, in compute work more so; even then, for instance, all GPGPU stuff is only 64bit.

The problem is that on an old-style non-AVX FPU pipeline, a 128bit pipe can only put through a single 32bit, or a single 64bit, or a single 128bit instruction. With the new AVX FPU you can basically push instructions together, so instead of putting 1x32bit instruction through, a 256bit AVX instruction can carry 8x32bit or 4x64bit instructions wrapped in one AVX instruction, in one clock; or you can do 2x128bit AVX instructions, each of which can do 4x32bit or 2x64bit. So basically, for the old FPU the best case is 2x128bit instructions and the worst is 2x32bit instructions. For the new core the best case is 1x256bit, 2x128bit or 8x32bit instructions; it's basically 4 times as fast in many, many situations. Worst case is still 2x32bit instructions for now.
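
To put rough numbers on the packing argument, here is a minimal sketch that only counts how many 32bit or 64bit values fit into one 128bit or 256bit operation. The function name `lanes` is made up for illustration, and this is plain arithmetic on widths, not a model of how any real FPU schedules work.

```python
def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one vector operation."""
    if vector_bits % element_bits != 0:
        raise ValueError("element width must divide the vector width")
    return vector_bits // element_bits


if __name__ == "__main__":
    # Widths taken from the post: 128bit and 256bit operations, 32bit and 64bit values.
    for vector_bits in (128, 256):
        for element_bits in (32, 64):
            count = lanes(vector_bits, element_bits)
            print(f"{vector_bits}bit operation holds {count} x {element_bits}bit values")
```

So a single 256bit operation carries eight 32bit values where an unpacked 32bit instruction carries one, which is where the big best-case gap comes from.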

If/when AVX is widely adopted and integrated into every piece of software, the FPU on die will essentially be way faster than the old one. But then this is a shift to the Fusion architecture anyway: it is pushing forward to on-die GPUs, which will vastly increase FPU power, at which point you want as much integer and as little FPU as possible.

As for the lack of speed when two instructions are in a module, that hasn't been proven in the slightest bit yet; the whole "80% of a normal module" thing is talking from a design point of view, nothing more or less. Putting a 256bit FPU and only 1 core together, with no modules, would save barely any space, but the space it saves would allow a slightly fatter core to be a bit faster.

Thing is, it's a quad core that's 20% faster per core, or twice as many cores that are 20% slower than they could possibly achieve, in roughly the same space. 2x80%=160%, or 100%; I know which I want for the same die size and same cost ;)
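
The 160% vs 100% comparison is just cores multiplied by per-core speed. A minimal sketch, assuming perfectly threaded work that scales linearly with core count (an idealisation), with the 80% figure taken from the design-point claim above rather than from any measurement:

```python
def total_throughput(cores: int, per_core_speed: float) -> float:
    """Idealised throughput: assumes the workload scales linearly across cores."""
    return cores * per_core_speed


if __name__ == "__main__":
    # One "fat" core at full speed vs. one module of two cores at 80% each,
    # both assumed to occupy roughly the same die area.
    fat_core = total_throughput(cores=1, per_core_speed=1.0)
    module = total_throughput(cores=2, per_core_speed=0.8)
    print(f"fat core: {fat_core:.0%}, two-core module: {module:.0%}")
```

The flip side is that work which cannot use the second core only sees the 80%.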



Anyway, Handbrake: from what I've read, the first and faster pass only uses three threads apparently, which is why a 2500k and 2600k are barely 3% apart in pass 1 (due to clock speed), and pass 2 is what, 30% faster on the 2600k. So if Bulldozer is beating it in pass one, that's actually pretty god damned huge, is it not?

This is also why the X6 is rubbish in pass one, where it's only using half its cores and they are way slower than a 2500k's, but it's ahead of a 2500k in pass 2, where it will use more than enough threads.
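
A toy model of the thread-cap argument, for anyone who wants to play with the numbers. The per-core speeds below are hypothetical placeholders, not benchmark results, and the 3-thread cap on pass 1 is only what has been reported in this thread; `pass_speed` is an illustrative name.

```python
from typing import Optional


def pass_speed(cores: int, per_core_speed: float, thread_cap: Optional[int] = None) -> float:
    """Relative speed of one encoding pass.

    If the pass is capped at `thread_cap` threads, extra cores sit idle;
    otherwise every core contributes. Scaling is assumed to be linear.
    """
    usable = cores if thread_cap is None else min(cores, thread_cap)
    return usable * per_core_speed


if __name__ == "__main__":
    # HYPOTHETICAL per-core speeds, chosen only to illustrate the shape of the result.
    quad = {"cores": 4, "per_core_speed": 1.0}   # stand-in for a fast quad core
    hexa = {"cores": 6, "per_core_speed": 0.75}  # stand-in for a slower six core

    for label, cap in (("pass 1 (capped at 3 threads)", 3), ("pass 2 (uncapped)", None)):
        q = pass_speed(thread_cap=cap, **quad)
        h = pass_speed(thread_cap=cap, **hexa)
        print(f"{label}: quad={q}, hexa={h}")
```

With those made-up speeds the six core loses the capped pass and wins the uncapped one, which is the pattern described above.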
 
Has no new info on the release date been leaked yet?
Phil Hughes @AMDphil on twitter, when asked about BD ship dates said:
'we start shipping at the end of this month. look for availability in Q4.'

https://twitter.com/#!/AMDphil

They (AMD) are too busy trying to overclock Bender to get BD out the door.

[Image: 080e136945ecad984da24e6779336c46.jpg]
 
Interesting post, thank you. I think BD is going to make quite a splash.
 
@drunkenmaster Barely understood anything you said mate, it got way too technical for me! :p But if I got the gist of it, you're saying that Handbrake is capped at 3 threads max? I actually tried to find out on their website, it just said "multithreaded", didn't specify a number. But if that's the case that's certainly a weird benchmark for AMD to choose to showcase an 8-core CPU!

Anyway, if you're right, then the BD chip they chose was probably using 3 modules with 1 core activated each, boosted to their "turbo" speed of 4.x GHz (whatever the exact number was, someone posted it a few pages back). Unless they disabled turbo and manually locked that chip at 3.2. If the latter is the case, then that's very positive news, means that (at encoding at least), BD has a higher IPC than SB. If it ran at "turbo" then it means it's got a lower IPC, but not by much, and since we've already seen the BD chips clock higher than SB it still means it'll have roughly equivalent performance up to 4 threads (and no doubt higher at >4).
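
The IPC reasoning here boils down to: work done = IPC x clock x time. If two chips finish the same job, their IPC ratio falls out of the clocks and times. A minimal sketch under that assumption (same instruction count on both chips, same number of active threads); the clocks and times in the example are hypothetical, since nobody knows what the demo chip actually ran at.

```python
def relative_ipc(time_a: float, clock_a: float, time_b: float, clock_b: float) -> float:
    """IPC of chip A relative to chip B for the same job.

    Assumes both chips execute the same number of instructions, so
    ipc_a * clock_a * time_a == ipc_b * clock_b * time_b.
    """
    return (clock_b * time_b) / (clock_a * time_a)


if __name__ == "__main__":
    # HYPOTHETICAL figures: both chips finish in 100 s, chip B runs at 3.0 GHz.
    same_clock = relative_ipc(time_a=100, clock_a=3.0, time_b=100, clock_b=3.0)
    turbo = relative_ipc(time_a=100, clock_a=4.0, time_b=100, clock_b=3.0)
    print(f"A at 3.0 GHz: {same_clock:.2f}x the IPC of B")
    print(f"A at 4.0 GHz: {turbo:.2f}x the IPC of B")
```

So the same result means equal IPC if the clocks matched, and a lower IPC if chip A needed turbo to get there.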
 
He was saying that pass 1 & 3 are 3 threaded and 2 is fully multi-threaded. Hence why Thuban beats a 2500k easily in pass 2, but gets thumped in 1 & 3.

IPC as it pertains to clock speed is totally irrelevant. They're 2 COMPLETELY different architectures. It doesn't matter if Bulldozer takes more or less clock speed to achieve a faster result. The 95W 8120 looks like it's going to be comparable in price to a 2500k. AMD's TDP rating is much more realistic than Intel's, so if an 8120 beats a 2500k (what people are assuming the 2 chips are) in 3 threaded benches, that's absolutely huge. The 2500k would be using 75% of its cores and the 8120 would be using 37.5% of its cores. Instructions per Watt is a much more useful yardstick. It's what's used as the best way of comparison with graphics cards.
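
The 75% and 37.5% figures come straight from threads divided by cores. A quick sketch (the function name is illustrative, and it assumes one thread per core with nothing else running):

```python
def core_utilisation(threads: int, cores: int) -> float:
    """Percentage of cores kept busy, assuming one thread per core."""
    return min(threads, cores) / cores * 100


if __name__ == "__main__":
    # A 3-threaded benchmark on a 4-core chip vs. an 8-core chip.
    for cores in (4, 8):
        print(f"{cores} cores: {core_utilisation(3, cores)}% in use")
```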

Head-room for overclocking looks like it's going to be very high on Zambezi anyway, so it's a total non-issue.
 
He was saying that pass 1 & 3 are 3 threaded and 2 is fully multi-threaded. Hence why Thuban beats a 2500k easily in pass 2, but gets thumped in 1 & 3.
Right I see. So the benchmark results that were posted were the times for a full 3-pass run?

IPC as it pertains to clock speed is totally irrelevant. They're 2 COMPLETELY different architectures.
Well, only insofar as the extensions are concerned (MMX, SSE etc), and Intel and AMD share most of them these days. For tasks that just use the standard x86+64 instruction set IPC is still a useful measure.

Instructions per Watt is a much more useful yardstick. It's what's used as the best way of comparison with graphics cards.
That's very true, though more in the server market than for the likes of us. (Not that we're AMD's core market for these chips)
 
Have to agree with drunkenmaster here. It's AVX performance that I'm really looking forward to! :)

Looks like the boys at XS have some more good news:
A)- It's due for release very soon!
B)- The boys who did the overclocking also got to see benchies, and they are so impressed they'll be getting one when they're released as well! Now that says something!

And lastly, for all us Crosshair IV users (best bit of news for me!), it looks like it will work on our boards :) - The exact quote was that it would work but not optimally (that's all that counts for me, as long as it lasts me through till FM2 :) )

Awesome news - all we need now is official benches AND a release date!
 
I assume FM2 is not going to be pin-compatible with AM3+, correct? So if you bought an AM3+ Bulldozer you couldn't later buy an FM2 board and use the same CPU on it?
 
Well, if Movieman is buying then it can't be all that bad, and he normally only buys Blue...
 
I was talking about physical architecture, not instruction sets. Bulldozer is arguably the biggest change or departure in x86 architecture that there's ever been.
 
and lastly for all us Crosshair IV users (best bit of news for me!) it looks like it will work on our boards :) - The exact quote was it would work but not optimally (that's all that counts for me as long as it lasts me through till FM2 :) )
I've said that all along: it'll work in an AM3 board, but if the person wants the full performance/features from BD they'll need an AM3+ board.
 