It's not a case of times changing: modules aren't two dies stuck together, and sticking two dies together these days is a significantly less "brute force" method than it used to be. Both AMD and Intel do it, 16 core AMD server chips are 2x 8 core dies. When Intel did it years ago, they did it with no dedicated interconnect, no bus and no real speed, so the communication and latency hit from one core talking to the other, or a thread moving across, was HUGE, because it really was just two dies stuck together with a basic connection. An HT link was ridiculously better than that first way of sticking dies together, and what Intel/AMD do now to connect dies makes the old school method look very antiquated. They are also full 8 cores no matter what people say, the shared scheduler is nothing more than a limitation of die size on 32nm. There are sacrifices to be made, but Intel have also taken up the module design with Atom in their most recent chip design, and WILL do so in the future on desktop.
It's very simple. Take a GPU as an example: you start off with 2 pipelines (equate them to shaders, though they aren't really the same thing), then 4, 8, 16, 32, 64, 128, 256 (well, 240), 512, etc, etc, etc.
Now equate that to any workplace. When you have 4 workers, one boss and nothing in between is fine; communication is quick and simple, 4 people can all fit in the boss's office to hear instructions and then get on with their work. When you reach 512 shaders or workers, imagine them all being instructed individually by one boss... it's simply inefficient: by the time you've told person 512 what he should be doing, people 1 through 500 have all finished and are sat waiting for something else to do.
So you have managers (clusters) and you subdivide the work, and at various stages you keep subdividing to keep communication balanced between being efficient and having too many layers of subdivision. A rough sketch of the difference is below.
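To put the boss/manager analogy into toy numbers (this is not any real GPU's dispatch logic, just a made-up model with arbitrary per-message costs), here is how the time to hand out instructions grows when one boss talks to everyone directly versus fanning the work out through layers of managers:

```python
def flat_dispatch_time(workers, msg_cost=1):
    """One boss instructs every worker one at a time:
    total time grows linearly with the number of workers."""
    return workers * msg_cost

def hierarchical_dispatch_time(workers, fanout=8, msg_cost=1):
    """Work is fanned out through managers, each briefing at most
    `fanout` subordinates in turn. Managers at the same level brief
    their own teams in parallel, so total time is roughly the tree
    depth times the cost of briefing one team."""
    depth, reach = 0, 1
    while reach < workers:
        reach *= fanout
        depth += 1
    return depth * fanout * msg_cost

for n in (4, 64, 512):
    print(n, "workers:",
          "flat =", flat_dispatch_time(n),
          "hierarchical =", hierarchical_dispatch_time(n))
```

Note that with only 4 workers the flat approach actually wins, which is exactly the "one boss, nothing in between" case above; the hierarchy only pays off as the worker count climbs.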
With AMD we had clusters, then the clusters themselves got split into two separate halves because there were too many of them, then we got the geometry engines doubled, and doubled again for the next generation.
Modules are simply that. It's utterly inefficient to wire up 8 separate cores individually at the power, bandwidth and transistor cost that entails, so at some stage you have to say: this data path gets doubled in width, but with 2 cores sharing each end of it. Transistors aren't just spent on the data itself; it's like a motorway. To build a road you need a pavement, power lines run alongside it, banks get built to absorb noise, lighting is laid down, phone cables, emergency kit, a hard shoulder. All of that is the same amount of work whether there are 4 lanes or 1, so it's more efficient to have fewer, fatter communication links in a CPU than many smaller ones.
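Putting made-up numbers on the motorway analogy (the fixed-overhead and per-lane figures here are invented for illustration, not real transistor counts):

```python
def link_cost(lanes, per_lane=100, fixed_overhead=400):
    """Cost of one communication link: the 'hard shoulder and lighting'
    (fixed_overhead) is paid once no matter how many lanes it carries."""
    return fixed_overhead + lanes * per_lane

# 8 cores, each with its own narrow link...
eight_narrow = 8 * link_cost(lanes=1)
# ...versus 4 links twice as wide, each shared by a pair of cores (a module).
four_wide = 4 * link_cost(lanes=2)

print("8 narrow links:", eight_narrow)    # 8 * (400 + 100) = 4000
print("4 wide shared links:", four_wide)  # 4 * (400 + 200) = 2400
```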
Modules WILL happen for Intel as well as AMD, and in the future it will happen more. Suddenly a module will be 2 cores, but there will be a cluster with 4 modules in it, and 4 clusters; that's how we'll get to more cores. Beyond that, we'll get 2 cores, in 4 modules, in 4 clusters, which are in two compute units. This is how chips have always worked and always will.
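Counting it out (the groupings here are just the hypothetical ones from the paragraph above, not any announced product):

```python
# Hypothetical hierarchy: cores per module, modules per cluster,
# clusters per compute unit, compute units per chip.
levels = {"cores_per_module": 2,
          "modules_per_cluster": 4,
          "clusters_per_unit": 4,
          "compute_units": 2}

total_cores = 1
for name, count in levels.items():
    total_cores *= count
    print(f"{name} = {count}, running total = {total_cores}")

# 2 * 4 * 4 * 2 = 64 cores, but any one core only ever talks directly
# within its module, its cluster and its compute unit -- never to all
# 63 others individually.
```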
This is what cores are, for one thing: CPUs started off with one integer pipe, then you had 3 integer pipes, then you had two cores for 2 sets of integer pipes rather than 6 int pipes in one core. It's all the same principle, and always has been.
With Bulldozer people ignored stuff that was said LONG before it was released. Firstly, Intel dominates the market and controls the most used compilers, which is a distinct disadvantage for any new chip from AMD for a significant period of time; likewise, the first chip of any new architecture is usually pretty terrible. Merom/Yonah were pretty terrible compared to the first desktop Core architecture, which is pretty crap compared to now. Bulldozer HAS improved over time with better optimisation of software, the OS and individual pieces.
The most crucial thing people are ignoring is simply that Bulldozer was designed to be an HSA compliant APU of the future. It integrates MANY ideas, like scaling back FPU hardware, because ARM, AMD and Intel are all moving towards GPU offloading of FP calculations, which already happens, is happening more, and is gaining the software stack across the industry to push and optimise for it.
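The offload idea, stripped right down (the gpu_saxpy argument below is a stand-in for whatever OpenCL/HSA kernel a real runtime would dispatch, not a real API, and the size threshold is arbitrary):

```python
def cpu_saxpy(a, x, y):
    """Plain FP loop that the CPU's own FPU would chew through."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy(a, x, y, gpu_saxpy=None):
    """HSA-style idea: if an accelerator kernel is available and the job is
    big enough to be worth the trip, hand the FP-heavy work to it; otherwise
    keep it on the CPU. gpu_saxpy is a placeholder for a real GPU kernel."""
    if gpu_saxpy is not None and len(x) > 10_000:
        return gpu_saxpy(a, x, y)
    return cpu_saxpy(a, x, y)

print(saxpy(2.0, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # small job stays on CPU
```

The point being that once the bulk FP work routinely takes that second path, a fat per-core FPU stops being where you want to spend your transistor budget.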
I said this at the time, before and after release, and a year before release: you can't plough what is likely in excess of a couple of billion into an architecture design and optimise it for the software available on the day of release; you optimise it for the software coming in the years afterwards.
Look up the HSA Foundation and the sheer breadth of industry support: ARM is well on board, AMD is on board, and the first HSA chips are being launched this year. HSA might be a very big reason AMD got both the PS4 and the (supposed) Xbox win, which is already being predicted to increase AMD's quarterly revenue by around 25%, which is huge for one project and one win.
Bulldozer was not in any way a chip designed for 2010, or for the software available in 2010. Anyone with half an ounce of sense can see that spending a couple of billion you can barely afford on optimising for the software of 2010 would make no sense when the entire industry is moving towards accelerating as much as possible, reducing power and interchangeable IP.