AMD Bulldozer Finally!

DragonQ · 9 Aug 2011 at 19:06

drunkenmaster said:
That's not how it works though. There's a reason hyperthreading works very well in many situations: lots of single-threaded applications won't use a 4-issue-wide core very well. And as pointed out, if one benchmark is completely bandwidth limited, then you wouldn't actually be seeing "8 core" performance but heavily limited 8 core performance; single core/thread performance is rarely heavily bandwidth limited these days. You can't just divide that performance by 8 and assume single-threaded performance, divide the 2600k by 4, and decide it will be close to twice as fast.

Is this a joke? HyperThreading only adds 30-35% performance at the best of times (x264, 7-Zip, etc.). AMD stated long ago that two cores in a Bulldozer module should operate at 90% of the speed that two completely separate cores would. Also, 7-Zip scales very well with extra real cores, so you can do simple maths to estimate single-threaded performance.
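To make the "simple maths" concrete, here is a rough sketch, assuming near-linear scaling across cores (a simplification). The function names are made up for illustration: HyperThreading is counted as a fractional gain per core, and module-paired cores are discounted by the 90% figure.

```python
def effective_cores(physical_cores, smt_gain=0.0, sharing_efficiency=1.0):
    """Number of 'full-core equivalents' a chip offers under linear scaling.

    smt_gain: fractional throughput added by HyperThreading (e.g. 0.33).
    sharing_efficiency: throughput of module-paired cores relative to
        fully separate cores (e.g. 0.9 for the figure attributed to AMD).
    """
    return physical_cores * (1.0 + smt_gain) * sharing_efficiency


def per_core_estimate(multi_threaded_score, physical_cores,
                      smt_gain=0.0, sharing_efficiency=1.0):
    """Estimate a single-threaded score from a multi-threaded one."""
    return multi_threaded_score / effective_cores(
        physical_cores, smt_gain, sharing_efficiency)


# Assumed inputs: a quad core with a ~33% HT gain versus 8 cores in 4 modules.
print(effective_cores(4, smt_gain=0.33))           # HT quad
print(effective_cores(8, sharing_efficiency=0.9))  # 4-module, 8-core chip
```

Dividing each chip's multi-threaded score by its effective core count gives the per-core estimate. It only holds if the benchmark really does scale linearly and isn't limited by something else.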

So, once again, if those results were true, Zambezi would be abysmal at single-threaded applications (worse than Phenom II in fact). They are fake.

drunkenmaster said:
More important is, what single-threaded applications NEED that much performance? Superpi, fine, but what will 99% of people use on a daily basis? Windows, games, encoding, decoding, hdd playback, internet, streaming videos. Most of these things are easily multithreaded or simply use so little performance you can do it fine on a tablet.

From a personal point of view, I do calculations for my research that cannot be made multi-threaded (well portions of it can but the overhead makes it slower than running it in a single thread). Single-threaded performance is still important, even if we are steadily moving towards a multi-threaded future.

Martini1991 · 9 Aug 2011 at 19:19

Hyperthreading's effectiveness decreases with each generation as the cores are made more efficient.
More cores aren't the way forward at the moment; higher IPC is, until software catches up. By the time it does, we'll have 8/12/16/24 fast cores.
Which is why the Thubans tank so hard against SB.

DragonQ · 9 Aug 2011 at 22:47

I still don't understand why HT was removed for the Core & Core 2 generations though. Seems like an obvious technology to keep around (until a complete architecture redesign comes along that eliminates its usefulness, like Bulldozer). Maybe it was to do with die size, excess heat or whatever.

drunkenmaster · 10 Aug 2011 at 00:03

DragonQ said:
Is this a joke? HyperThreading only adds 30-35% performance at the best of times (x264, 7-Zip, etc.). AMD stated long ago that two cores in a Bulldozer module should operate at 90% of the speed that two completely separate cores would. Also, 7-Zip scales very well with extra real cores, so you can do simple maths to estimate single-threaded performance.

So, once again, if those results were true, Zambezi would be abysmal at single-threaded applications (worse than Phenom II in fact). They are fake.

From a personal point of view, I do calculations for my research that cannot be made multi-threaded (well portions of it can but the overhead makes it slower than running it in a single thread). Single-threaded performance is still important, even if we are steadily moving towards a multi-threaded future.

Again you choose to ignore something: that benchmark could be heavily bandwidth limited. That could easily be the case, as Sandy Bridge has significant advantages in CPU performance over the 990X. The 990X has 2 extra cores, but in stuff like 7-Zip I'd expect Sandy to be much closer; the 990X also has triple-channel memory. You can't randomly ignore the possibility that Bulldozer is heavily bandwidth limited in that test, and not CPU limited.

You seem to also be missing the pretty obvious. Look at the scores: the 2500k, which doesn't have HT, 14k; the 2600k, with HT, 20k. That's almost 40% faster, with a 3% clock speed bump, so HT is adding 37% performance here alone.

A triple-channel older-architecture quad still beats it, and a hex core with triple channel is 50% faster than that. Both the older quad and hex have HT, and they clearly need HT to get anywhere near their top whack performance. But it suggests to me that they are definitely bandwidth limited (the 2600k and Bulldozer would likely be as well).

There will be people who need single-threaded performance, but their goal is making a chip that's better for MOST people, even if it can't fulfil the needs of one person. Who designs an entire architecture for one small segment of users? Few people, unless the segment is happy to pay insane amounts per chip (Itaniums).

If 85% of software now is either nowhere near low-end CPU limits, or heavily multithreaded, then really what would be the point of great single-threaded performance? Giving up 85% of the market for the 15% would be shooting yourself in the foot.

Martini1991 said:
Hyperthreading's effectiveness decreases with each generation as the cores are made more efficient.
More cores aren't the way forward at the moment; higher IPC is, until software catches up. By the time it does, we'll have 8/12/16/24 fast cores.
Which is why the Thubans tank so hard against SB.

Meh, not really. The efficiency of each generation doesn't get around the basic fact that the Core architecture is a very wide issue core; it's applications' ability to use 4 instructions per clock that is limited.

That's why Superpi gives insane performance differences between AMD/Intel, but essentially no other software, even single-threaded software, shows as big a difference. Superpi is for all intents and purposes very basic, and perfect for a 4 issue core; most software is simply too complex to code so perfectly. It's also bad code: other programs can do the same calculations now in a fraction of the time by being multithreaded.

Do we really want to use older, slower software just to highlight how useful single-thread performance is, or use multithreaded software that is way, way faster?

DragonQ said:
I still don't understand why HT was removed for the Core & Core 2 generations though. Seems like an obvious technology to keep around (until a complete architecture redesign comes along that eliminates its usefulness, like Bulldozer). Maybe it was to do with die size, excess heat or whatever.

It is in there, it's on (just about) every single chip; it's just disabled arbitrarily to introduce various price points for people to buy at. Want virtualisation, pay more; want HT, pay more; etc. Intel's pretty much always been that way.

Intel's core architecture is VERY good, I really can't see them dropping down from a 4 issue core for a LONG time so HT will likely work very well for them for a very long time.

DragonQ · 10 Aug 2011 at 10:18

drunkenmaster said:
Again you choose to ignore something: that benchmark could be heavily bandwidth limited. That could easily be the case, as Sandy Bridge has significant advantages in CPU performance over the 990X. The 990X has 2 extra cores, but in stuff like 7-Zip I'd expect Sandy to be much closer; the 990X also has triple-channel memory. You can't randomly ignore the possibility that Bulldozer is heavily bandwidth limited in that test, and not CPU limited.

OK that could be a factor in the 7-Zip benchmark but the i7-975 XE works out as ~7% faster clock-for-clock than the i7-2600K in that benchmark result. Not a massive amount considering there's 33% more RAM bandwidth. Also, this increase could be due to triple channel RAM but it could also be because they're using different RAM speeds or timings - we have no idea what these are! Regardless, even if Bulldozer is bandwidth limited in a test like this, that's still bad - especially considering it was AMD who introduced on-die RAM controllers in the first place!

drunkenmaster said:
You seem to also be missing the pretty obvious. Look at the scores: the 2500k, which doesn't have HT, 14k; the 2600k, with HT, 20k. That's almost 40% faster, with a 3% clock speed bump, so HT is adding 37% performance here alone.

Your maths here is wrong, you need to account for clock speed differences first. The 2500K is running at 3.4 GHz (due to turbo boost), the 2600K is running at 3.5 GHz. Adjusting the latter to 3.4 GHz gives it a score of 19635 (again, this assumes performance increases linearly with clock speed but this is generally a good estimate). 19635 is 33% faster than the 2500K's score of 14759. So, like I said, 30-35%. This also ignores the 2600K's larger L3 cache but I have no idea if that makes a difference in this sort of test.
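For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes performance scales linearly with clock speed (as stated above); the function names are hypothetical, and the only inputs are the scores and clocks already quoted in this thread.

```python
def normalize_to_clock(score, measured_ghz, target_ghz):
    """Scale a score to a target clock, assuming linear scaling with frequency."""
    return score * target_ghz / measured_ghz


def relative_gain(score_a, score_b):
    """Fractional advantage of score_a over score_b (0.33 means 33% faster)."""
    return score_a / score_b - 1.0


# Figures from the post: 2500K at 3.4 GHz, 2600K already adjusted to 3.4 GHz.
score_2500k = 14759
score_2600k_at_3_4 = 19635

ht_gain = relative_gain(score_2600k_at_3_4, score_2500k)
print(f"Estimated HT gain: {ht_gain:.1%}")

# The approximate "14k" and "20k" figures, clock-adjusted the same way.
rounded_gain = relative_gain(normalize_to_clock(20000, 3.5, 3.4), 14000)
print(f"Gain from approximate figures: {rounded_gain:.1%}")
```

The exact scores give about 33%, while the approximate 14k and 20k figures come out nearer 39% even after the clock adjustment. So most of the disagreement is down to how the scores were rounded.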

drunkenmaster said:
There will be people who need single-threaded performance, but their goal is making a chip that's better for MOST people, even if it can't fulfil the needs of one person. Who designs an entire architecture for one small segment of users? Few people, unless the segment is happy to pay insane amounts per chip (Itaniums).

If 85% of software now is either nowhere near low-end CPU limits, or heavily multithreaded, then really what would be the point of great single-threaded performance? Giving up 85% of the market for the 15% would be shooting yourself in the foot.

This is true, multi-threaded computing is the future we are moving towards (as I said before). However, according to these results Zambezi isn't even that good at multi-threaded stuff, let alone single-threaded! Depending on what the turbo core is on that FX chip, it could be slower than Sandy Bridge clock-for-clock even with 8 real cores vs 4 with HT! Giving up 100% of the market seems pretty stupid to me.

drunkenmaster said:
It is in there, it's on (just about) every single chip; it's just disabled arbitrarily to introduce various price points for people to buy at. Want virtualisation, pay more; want HT, pay more; etc. Intel's pretty much always been that way.

Wrong. No Core or Core 2 CPUs had HyperThreading. It was re-introduced with the Core i7 Nehalems.

CAT-THE-FIFTH · 10 Aug 2011 at 12:27

DragonQ said:
This is true, multi-threaded computing is the future we are moving towards (as I said before). However, according to these results Zambezi isn't even that good at multi-threaded stuff, let alone single-threaded! Depending on what the turbo core is on that FX chip, it could be slower than Sandy Bridge clock-for-clock even with 8 real cores vs 4 with HT! Giving up 100% of the market seems pretty stupid to me.

How can you believe the results from a chap who has been banned from many forums and has made up results?? He even admitted he made up a whole lot of Bulldozer results before.

On top of this you are wrong about Bulldozer multi-threaded performance. In HandBrake, a Phenom II X6 has very similar performance to a socket 1366 Core i7 at the same clockspeeds. A Core i7 2600K is barely 10% faster at the same clockspeeds. Even if 8 Bulldozer cores had the same IPC as Phenom II cores, it would obliterate a Core i7 2600K.
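A back-of-envelope version of that comparison, as a sketch only. It assumes HandBrake scales linearly with core count at equal clocks, and it takes the X6 as the baseline with the 2600K 10% ahead of it. The function name and the optional 90% module-sharing factor (the figure quoted earlier in the thread) are illustrative assumptions, not measured results.

```python
def relative_throughput(cores, per_core=1.0, sharing_efficiency=1.0):
    """Throughput in arbitrary units under perfectly linear core scaling."""
    return cores * per_core * sharing_efficiency


phenom_x6 = relative_throughput(6)   # baseline: 6 Phenom II cores
i7_2600k = 1.10 * phenom_x6          # "barely 10% faster" at the same clocks

# 8 cores with Phenom II IPC, first with ideal scaling, then with the
# 90% module-sharing figure applied (an assumption taken from the thread).
bulldozer_ideal = relative_throughput(8)
bulldozer_shared = relative_throughput(8, sharing_efficiency=0.9)

print(f"Ideal scaling vs 2600K:  {bulldozer_ideal / i7_2600k:.2f}x")
print(f"With 90% sharing vs 2600K: {bulldozer_shared / i7_2600k:.2f}x")
```

Any IPC improvement over Phenom II would raise both ratios, so these are floor estimates under the stated assumptions.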

So according to your logic AMD will release something slower or a similar speed to its existing processors?? WTF?? Even the original Phenom had much faster cores than the Athlon 64 and the Phenom II had faster cores than the Phenom. In both cases there was a decent IPC increase too.

drunkenmaster · 10 Aug 2011 at 12:28

For benchmarking, the given speed is usually the given speed, i.e. Turbo is disabled so you can see the difference, which also means the Bulldozer is operating significantly slower than it would with Turbo. You can't guess what clocks they are really running at: either it's the stated speed or you don't know. There's no middle "I'll assign anything I want here" number.

The i7 is faster but based on older chips; it has more bandwidth and is slower core for core. You literally can't know what the ratio is here: core for core Sandy is 30% faster but is 40% slower because of bandwidth limits, or it's 10% faster and limited 20% by bandwidth. Who knows?

The numbers from separate generations and different bandwidth clearly show "something" going on. A quad core Sandy is faster than a quad core anything else, often by a large degree, yet here it's slower. How much slower is unknown, but the limit is almost without question bandwidth.

Why is it okay for the 2600k, a £200 chip, to be bandwidth limited but bad for Bulldozer, a £200 chip, to be bandwidth limited? Answer: it isn't. Some things will be bandwidth limited, some CPU, some thread, some cache; that's life. It's all about cost: making the fastest chip on earth that requires a £1400 mobo and costs £4000 per chip would bankrupt them within a year.

Again you seem to be saying the 8 core chip isn't doing multithreading well because it can't beat a 2600k significantly and that's only 4 core. The 2600k's cores are twice as wide.

Here's the argument: they are BOTH 16 issue chips overall, and will cost a similar amount, so neither should be much faster than the other.

Okay, so you said, but you worry about single threaded performance, which is a valid concern, though for 90% of the market, its unimportant. Ok, but then you're back to saying, but it has twice the cores, it should be much faster.......... it shouldn't.

Again you've got two circa £200 chips; if AMD beats the other one in a bunch of areas, but not in single-threaded stuff, I can't see the issue.

As for HT, I read the post as, not understanding why the whole range doesn't have HT top to bottom, my bad.

CAT-THE-FIFTH · 10 Aug 2011 at 12:33

I think some people may be smoking something if they think that a 4 module Bulldozer will be barely faster than a Phenom II X6!! :rolleyes:

Mollari · 10 Aug 2011 at 12:35

CAT-THE-FIFTH said:
I think some people may be smoking something if they think that a 4 module Bulldozer will be barely faster than a Phenom II X6!!

Lol, so true. Well, I guess when we see the first actual REAL benchmarks it may be shocking for some.

CAT-THE-FIFTH · 10 Aug 2011 at 12:37

Mollari said:
Lol, so true. Well, I guess when we see the first actual REAL benchmarks it may be shocking for some.

Agreed! People tend to forget that the Phenom II X6 is not that slow in multi-threaded applications.

Martini1991 · 10 Aug 2011 at 12:43

CAT-THE-FIFTH said:
I think some people may be smoking something if they think that a 4 module Bulldozer will be barely faster than a Phenom II X6!!

Who the hell thinks that?

Gashman · 10 Aug 2011 at 13:01

CAT-THE-FIFTH said:
Agreed! People tend to forget that the Phenom II X6 is not that slow in multi-threaded applications.

I find it hard to believe so many people consider Phenom II terrible, period; they're genuinely not that bad. I want to experiment with one but me mate wouldn't let me use his, since I hear they are slowed down by their CPU-NB and L3 cache to an extent. Whatever happened to AMD architecture being very latency sensitive, like with K8, where sometimes better performance came not from faster memory but from memory with tighter timings? Does that exist with Phenom or not? :confused:

CAT-THE-FIFTH · 10 Aug 2011 at 16:46

Gashman said:
I find it hard to believe so many people consider Phenom II terrible, period; they're genuinely not that bad. I want to experiment with one but me mate wouldn't let me use his, since I hear they are slowed down by their CPU-NB and L3 cache to an extent. Whatever happened to AMD architecture being very latency sensitive, like with K8, where sometimes better performance came not from faster memory but from memory with tighter timings? Does that exist with Phenom or not?

The main weakness of the Phenom II is its performance in very lightly threaded applications.

However, in multi-threaded applications it is not too bad. I have been a long-time Intel user myself due to my preference for SFF PCs, but the performance of these so-called "garbage" CPUs (not my words) is not too bad.

I like how people say how brilliant the Core i3 2100 is without ACTUALLY having one. It is a great CPU but it is NOT the second coming as it has weaknesses.

Here is a list I compiled on another forum to test HandBrake performance with various HD trailers. It is not complete and there is a typo or two but it gives you a rough indication of performance in a common application.

sarge78 · 10 Aug 2011 at 17:02

I've just been playing Dead Space on a Llano A8 @ 1920*1200 + high settings.

(A fairly solid 30fps too, awesome!)

Adding another 400sp with the newer CPU architecture would be compelling.

bastic · 10 Aug 2011 at 17:19

sarge78 said:
I've just been playing Dead Space on a Llano A8 @ 1920*1200 + high settings. (A fairly solid 30fps too, awesome!)

Adding another 400sp with the newer CPU architecture would be compelling.

without any extra GPU ??

CAT-THE-FIFTH · 10 Aug 2011 at 17:36

bastic said:
without any extra GPU ??

Dead Space 2 runs at 24 FPS at 1920X1080 on an HD4670 on very high quality settings:

http://gamegpu.ru/action-/-fps-/-tps/dead-space-2-test-gpu.html

It would not surprise me if the first Dead Space game runs at around 30FPS with slightly lower settings.

koooowweeee · 10 Aug 2011 at 18:13

wow this is a long 30 days lol

Is there any proper release date yet?

sarge78 · 10 Aug 2011 at 19:19

bastic said:
without any extra GPU ??

Yep, no extra GPU! Most 2-3 year old games should run fine at high settings. (thanks consoles!)

CAT-THE-FIFTH said:
Dead Space 2 runs at 24 FPS at 1920X1080 on an HD4670 on very high quality settings.

It certainly feels similar to my dad's E8400 + 4670 HTPC.

The only problem is this crappy bundled heatsink + fan. After 30 mins it started overheating and crashed, and there's now artifacting on the desktop.

Hopefully nothing's broken once it cools down.

Bulldozer will probably need significant cooling on a hot day if you're using the integrated GPU.

Mollari · 10 Aug 2011 at 19:32

koooowweeee said:
wow this is a long 30 days lol

Is there any proper release date yet?

Several things point at 19th September.

I hope it's this year.

mmj_uk · 10 Aug 2011 at 21:20

Trunks9486 said:
No idea if it's true or not but seems to be genuine.

http://hardforum.com/showpost.php?p=1037482638&postcount=88

Sad if true.

Just been reading all of his posts, some pretty juicy stuff. It's not looking good for AMD if he's telling the truth though. I was looking forward to Bulldozer, but the way he's talking it'll not be anything revolutionary because all of the talented guys left.

Sure they were. Ever hear of K5? And Athlon was kind of crummy (though good enough to keep us in the game). After clawhammer/sledgehammer, they've done almost nothing other than slapping down more cores. The integrated memory controller and point-to-point bus, the x86-64 instruction set, etc. was all done before 2002.

Nonsense. You act like I'm not still plugged into what's going on there. That I don't go drinking with some of the current employees. That I don't know the people in charge VERY well having had years of experience working with them before (and while) they were in charge. And 2002, 2003, 2004, 2005, 2006 is a lot of years of screwups while I was still there. 2007, 2008, 2009 AMD continued on the path from the previous 5 years - they did not tape out anything interesting, and continued making spins on designs from early 2002.

I can't speak as to the ATI division, but, yes, the processor division is doomed. You don't have to believe me. Look at their track record for the last 8-10 years, and keep in mind that any new microarchitecture that is sold takes 2-3 years of design time. Then look at their 10K's, their desperation moves (selling the fab, Arab investors, etc.), the well-publicized defections (Fred Weber, etc.) and put it all together. Look at the benchmarks over time. Look at their stock over time. If you choose to write off my statements, there are plenty of objective facts out there for you.