But we've seen that FM2 has been delayed to Q3-Q4 2012, maybe even Q1-Q2 2013 if it's delayed further.
Yeah, it's possible that the shared cache for each module wasn't enough to keep both cores fed. Or that the central scheduling needs more optimizing. But no doubt this is why AMD started using the word "modules" rather than "cores". It's not just technically accurate, it's a marketing ploy to get people to think of Bulldozer in comparison to Intel quad-cores rather than some theoretical future octa-core.
EDIT: Also remember every 2 cores share an FPU, that probably gets used a lot when encoding (not that I'm a programmer, I'm only presuming).
'has no new info on the release date been leaked yet?'
Phil Hughes @AMDphil on twitter, when asked about BD ship dates said:
'we start shipping at the end of this month. look for availability in Q4.'
https://twitter.com/#!/AMDphil
They (AMD) are too busy trying to overclock Bender to get BD out the door.
More cache will always be helpful to a certain degree, but at the end of the day, if the die size is too big it's not financially viable. Without question, if the die size of Sandy Bridge or Bulldozer could be doubled, they would have better cores, more cores and more cache, and would perform much better.
As for the FPU, Phenom has 2x128-bit FPU pipes in it; Bulldozer has 2x128-bit FPUs that can run as a single 256-bit FPU for one core per clock if it wants. The thing is, most FPU instructions are 32-bit, fewer are 64-bit, and 128/256-bit instructions are basically non-existent, in compute work even more so; even there, for instance, all GPGPU stuff is only 64-bit.
The problem is that on an old-style, non-AVX FPU pipeline, a 128-bit pipe can only put through a single 32-bit, a single 64-bit or a single 128-bit instruction. With the new AVX FPU you can basically pack instructions together: instead of putting 1x32-bit instruction through, a 256-bit AVX instruction can carry 8x32-bit or 4x64-bit operations in one clock, or you can do 2x128-bit AVX instructions, each of which can do 4x32-bit or 2x64-bit. So for the old FPU the best case is 2x128-bit instructions and the worst is 2x32-bit instructions. For the new core the best case is 1x256-bit, 2x128-bit or 8x32-bit instructions; it's basically 4 times as fast in many, many situations. The worst case is still 2x32-bit instructions for now.
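To make the packing numbers above concrete, here is a minimal Python sketch of the lane arithmetic. It only divides the register width by the element width; the function name `lanes` is made up for illustration, and it says nothing about how many such instructions either FPU can actually issue per clock.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one packed register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


# Widths discussed in the post: 128-bit and 256-bit registers,
# holding 32-bit or 64-bit elements.
for register_bits in (128, 256):
    for element_bits in (32, 64):
        count = lanes(register_bits, element_bits)
        print(f"{register_bits}-bit register: {count} x {element_bits}-bit")
```

The 8x32-bit and 4x64-bit figures for a 256-bit instruction, and the 4x32-bit and 2x64-bit figures for a 128-bit one, fall straight out of that division.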
If/when AVX is widely adopted and integrated into every piece of software, the on-die FPU will essentially be way faster than the old one. But this is a shift towards the Fusion architecture anyway: it's pushing forward to on-die GPUs, which will vastly increase FPU power, at which point you want as much integer and as little FPU as possible.
As for the lack of speed when two instructions are in a module, that hasn't been proven in the slightest yet; the whole "80% of a normal module" thing is talking from a design point of view, nothing more or less. Putting a 256-bit FPU and only 1 core together, with no modules, would save barely any space, but the space it saves would allow a slightly fatter core to be a bit faster.
Thing is, it's a quad core that's 20% faster per core, or twice as many cores that are 20% slower than they could possibly be, in roughly the same space. 2x80% = 160%, or 100%; I know which I want for the same die size and same cost.
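For anyone who wants the 2x80% comparison spelled out, here is a rough Python sketch. It assumes perfect scaling across threads and takes the 80% figure at face value; `relative_throughput` is just an illustrative name, and the numbers are the ones from the post, not measurements.

```python
def relative_throughput(cores: int, per_core_speed: float) -> float:
    """Aggregate throughput relative to one full-speed core (assumes perfect scaling)."""
    return cores * per_core_speed


# One "fat" core at full speed versus one module with two cores at 80% each,
# both assumed to take up roughly the same die area.
single_core = relative_throughput(cores=1, per_core_speed=1.0)
module = relative_throughput(cores=2, per_core_speed=0.8)

print(f"single fat core: {single_core:.0%}")
print(f"two-core module: {module:.0%}")
```

The trade-off only pays off when there are enough threads to fill both cores; with a single thread the module delivers 80% against the fat core's 100%.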
Anyway, Handbrake: from what I've read, the first and faster pass apparently only uses three threads, which is why a 2500K and 2600K are barely 3% apart in pass 1 (due to clock speed), while pass 2 is something like 30% faster on the 2600K. So if Bulldozer is beating it in pass one, that's actually pretty god damned huge, is it not?
This is also why the X6 is rubbish in pass one, where it's only using half its cores and they are way slower than a 2500K, but it's ahead of a 2500K in pass 2, where it will use more than enough threads.
Interesting post, thank you. I think BD is going to make quite a splash.
@drunkenmaster Barely understood anything you said mate, it got way too technical for me! But if I got the gist of it, you're saying that Handbrake is capped at 3 threads max? I actually tried to find out on their website; it just said "multithreaded" and didn't specify a number. But if that's the case, that's certainly a weird benchmark for AMD to choose to showcase an 8-core CPU!
Anyway, if you're right, then the BD chip they chose was probably using 3 modules with 1 core activated each, boosted to their "turbo" speed of 4.x GHz (whatever the exact number was, someone posted it a few pages back). Unless they disabled turbo and manually locked that chip at 3.2. If the latter is the case, then that's very positive news, means that (at encoding at least), BD has a higher IPC than SB. If it ran at "turbo" then it means it's got a lower IPC, but not by much, and since we've already seen the BD chips clock higher than SB it still means it'll have roughly equivalent performance up to 4 threads (and no doubt higher at >4).
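The IPC reasoning above can be sketched in Python, assuming both chips execute the same number of instructions on the same number of threads, so that time = instructions / (IPC x clock). The times and clock speeds below are placeholders picked purely for illustration, not benchmark results, and `ipc_ratio` is a hypothetical helper name.

```python
def ipc_ratio(time_a: float, clock_a: float, time_b: float, clock_b: float) -> float:
    """IPC of chip A relative to chip B for the same workload.

    From time = instructions / (IPC * clock):
    IPC_a / IPC_b = (time_b * clock_b) / (time_a * clock_a).
    """
    return (time_b * clock_b) / (time_a * clock_a)


# Placeholder inputs (hypothetical): both chips finish in the same time.
# Case 1: chip A locked at 3.2 GHz against chip B at 3.3 GHz.
locked = ipc_ratio(time_a=100.0, clock_a=3.2, time_b=100.0, clock_b=3.3)
# Case 2: chip A boosting to 4.2 GHz against the same chip B.
turbo = ipc_ratio(time_a=100.0, clock_a=4.2, time_b=100.0, clock_b=3.3)

print(f"locked clock: relative IPC = {locked:.2f}")
print(f"turbo clock:  relative IPC = {turbo:.2f}")
```

A ratio above 1 means chip A gets more done per clock; a ratio below 1 means it is leaning on clock speed to keep up, which is the distinction being drawn between the locked and turbo scenarios.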
'He was saying that pass 1 & 3 are 3 threaded and 2 is fully multi-threaded. Hence why Thuban beats a 2500k easily in pass 2, but gets thumped in 1 & 3.'
Right I see. So the benchmark results that were posted were the times for a full 3-pass run?
'IPC as it pertains to clock speed is totally irrelevant. They're 2 COMPLETELY different architectures.'
Well, only insofar as the extensions are concerned (MMX, SSE etc), and Intel and AMD share most of them these days. For tasks that just use the standard x86+64 instruction set IPC is still a useful measure.
'Instructions per Watt is a much more useful yardstick. It's what's used as the best way of comparison with graphics cards.'
That's very true, though more in the server market than for the likes of us. (Not that we're AMD's core market for these chips)
Have to agree with drunkenmaster here. It's AVX performance that I'm really looking forward to!
Looks like the boys at XS have some more good news:
A)- It's due for release very soon!
B)- The boys who did the overclocking also got to see benchies, and they are so impressed they'll be getting one when they're released as well! Now that says something!
And lastly, for all us Crosshair IV users (best bit of news for me!), it looks like it will work on our boards. The exact quote was that it would work but not optimally (that's all that counts for me, as long as it lasts me through till FM2).
awesome news - all we need now is official benches AND a release date!
Well, if Movieman is buying then it can't be all that bad, and he normally only buys Blue..
'and lastly for all us Crosshair IV users (best bit of news for me!) it looks like it will work on our boards- The exact quote was it would work but not optimally (that's all that counts for me as long as it lasts me through till FM2)'
I've said that all along: it'll work in an AM3 board, but anyone who wants the full performance/features from BD will need an AM3+ board.