AMD Multicore CPUs - what's the crack?

Can anyone actually explain to me what the crack is with AMD CPUs?

In particular, the 6 and 8 cores.

I mean, I'm getting people telling me that they are not true cores, but similar to Intel's hyperthreading.

I have 2 hex cores and 2 octo core AMDs, and this is possibly down to the motherboards, but with the 1090T, the 8120 and the 8350 I have a setting in the BIOS that will let me assign a core to each CPU - WTFIT?

Anyway, I just thought I'd ask.
 
AMD's Piledriver CPUs are nothing like hyperthreading. Hyperthreading adds two/four virtual cores to the two/four physical cores Windows can see, which allows the OS to schedule more work onto the chip. It's like a division of labour for your CPU: while the physical core is crunching away at one problem, there's a whole bunch of capacity in it not being used that could handle additional tasks, and that's where hyperthreading comes in. In reality hyperthreading gives you around 30% more performance if the software is coded to take advantage of the feature.

AMD's Piledriver CPU cores are all hardware and true processing cores, but are designed for highly threaded software. You could almost argue that it's more like a GPU than a CPU, since it's great at running in parallel and not so hot when it comes to calculating in serial.
 
Not exactly true. Granted they are not virtual cores like HT and are hardware based, but they are still not exactly true cores either.

The Phenom II X6 were the ones that got 6 "real" cores.
 
Think of it this way: an octo core AMD processor has 8 physical cores. The 8 cores are "coupled" so that they work together, meaning that effectively you have 8 cores working on 4 problems at once (2 cores per 1 problem).
 
Makes it sound like they've combined into one super core, if 1 module worked on 1 thread, it'd have incredible performance.

There's 8 cores, it can execute 8 software threads.
2 Cores per module, when the second core is used, the scaling isn't as good as a conventional true core, down to the shared resources.
Which is why programs do/should go core 1,3,5,7,2,4,6,8.
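
For what it's worth, that fill order can be sketched in a few lines of Python. This is only an illustration: it assumes cores are numbered so that 1-2 share a module, 3-4 the next and so on, and the function name is made up for the example.

```python
def module_first_order(num_modules, cores_per_module=2):
    """Return 1-based core numbers, using one core per module before doubling up.

    Assumption: cores are numbered consecutively within a module,
    i.e. cores 1-2 are module 1, cores 3-4 are module 2, and so on.
    """
    order = []
    for slot in range(cores_per_module):      # first core of each module, then second
        for module in range(num_modules):
            order.append(module * cores_per_module + slot + 1)
    return order


print(module_first_order(4))  # [1, 3, 5, 7, 2, 4, 6, 8]
```

With four modules this reproduces the 1,3,5,7,2,4,6,8 order: the first four threads each get a module to themselves, and sharing only starts with the fifth.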
 
There are 2 cores per module but they share resources.
Most notable of these shared resources is the FPU.

With only 1 FPU per module the number crunching performance is seriously constrained.
 
*edit - Dammit PCZ, right as I was typing :P*

Doesn't seem to have been mentioned yet - the confusion about AMD being like hyperthreading probably stems from a shared floating point unit between every 2 integer units. If your program is all integers, then it's a full 8 cores, but if you have more than 4 threads running floating point ops, you get bottlenecks.

I believe AMD used to refer to it as strong and weak cores; the strong one would be in possession of the FPU for that operation, the weak one would have to wait, or ideally execute a non floating point operation. That's the coupling Ollie1132 suggests above, and why as Martini1991 says, software runs better when it loads just one core of each module, rather than both cores of a single module :)
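
A very rough way to picture the integer vs floating point difference is the toy model below. It is a deliberately crude sketch, not a benchmark: it assumes one shared FPU per module that serves a single thread at a time, which is the simplified picture used in this thread, and the function name is hypothetical.

```python
def threads_running_at_once(threads, uses_fpu, modules=4):
    """Toy model: how many threads make progress at the same moment.

    Assumptions (simplified, for illustration only):
    - each module has 2 integer cores and 1 shared FPU
    - the shared FPU serves one thread at a time
    """
    integer_cores = modules * 2
    runnable = min(threads, integer_cores)
    if uses_fpu:
        # Floating point work is capped by the number of shared FPUs.
        return min(runnable, modules)
    return runnable


print(threads_running_at_once(8, uses_fpu=False))  # 8
print(threads_running_at_once(8, uses_fpu=True))   # 4
print(threads_running_at_once(4, uses_fpu=True))   # 4
```

In this model all-integer code sees 8 cores, while floating point code stops scaling past 4 threads. Real chips are less black and white than that, so treat the numbers as a picture of the bottleneck rather than a prediction.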

2nd slide has details: http://www.anandtech.com/show/6201/amd-details-its-3rd-gen-steamroller-architecture
 
They are supposed to run one 256-bit thread to one module + one core, or two 128-bit threads to one module + 2 cores in highly threaded tasks. The problem is that software, and often Windows itself, does not understand that.

Each module shares one pool of L2 cache, but otherwise the chip is 8 core, 8 thread.
 
The Phenom II X6 were the ones that got 6 "real" cores.


Cool...

I did notice that when I upped my AMDx4 @ 3.2 to the AMDx6 @ 2.6 it was able to handle more, but was no faster. Some benchmarks put them pretty much together for the most part, so more cores but less speed.

When I then got a 3.2GHz hex core, I found it to be a huge jump. I did also go from 4 to 8GB, mind you.

I think I will have another play comparing the 6 vs the 8 in terms of multiple stuff.


Blah Blah plus a link


I am reading the link and I came across this ... I found it pretty giggle worthy

"And then will come the ultimate AMD CPU, called Undertaker, and bury the company once and for all."

I certainly feel that there is justification in that.

----------

Ok guys, well cheers for this. Very much appreciated.

( Idiot mode is still on - I am typing this in notepad and looking for the POST REPLY button ???)
 
Makes it sound like they've combined into one super core, if 1 module worked on 1 thread, it'd have incredible performance.

There's 8 cores, it can execute 8 software threads.
2 Cores per module, when the second core is used, the scaling isn't as good as a conventional true core, down to the shared resources.
Which is why programs do/should go core 1,3,5,7,2,4,6,8.

Aha true I should have spent more time thinking how to word what I meant xD
 
Put in its simplest terms, both the 3770K & 8350 are 4 core/8 thread chips, but the resources are structured differently.

If 100% is the maximum theoretical performance of the chip then the resources are (roughly) arranged as follows:

3770K:
20% - 5% - 20% - 5% - 20% - 5% - 20% - 5%

FX-8350:
12.5% - 12.5% - 12.5% - 12.5% - 12.5% - 12.5% - 12.5% - 12.5%

Overall performance of the 3770K is similar to the 8350 when both are utilised 100% (encoding etc.), so the percentages do roughly compare to the performance you will see (GPU bottlenecks aside). The trouble AMD FX has is that too much software is coded for 4 or fewer threads: with Intel, 80% of the total CPU performance is utilised, whereas on an 8350 only 50% is utilised and the rest of the resources lie unused.
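
The 80% vs 50% figures follow directly from the rough splits listed above. A minimal sketch of the arithmetic, assuming the scheduler always puts threads on the strongest free slots first (the names and that scheduling assumption are for illustration only):

```python
# Rough per-thread shares of total chip performance, as listed above.
INTEL_3770K = [20, 5, 20, 5, 20, 5, 20, 5]
AMD_FX_8350 = [12.5] * 8


def utilised(shares, threads):
    """Percent of the chip in use when `threads` threads are running.

    Assumption: threads land on the strongest available slots first.
    """
    strongest_first = sorted(shares, reverse=True)
    return sum(strongest_first[:threads])


print(utilised(INTEL_3770K, 4))  # 80
print(utilised(AMD_FX_8350, 4))  # 50.0
print(utilised(INTEL_3770K, 8))  # 100
print(utilised(AMD_FX_8350, 8))  # 100.0
```

So with 4 threads the Intel chip is already at 80% of what it can do, while the FX is at half, and both only reach 100% with 8 threads.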

When 8 threaded software is more widely available AMD FX will probably perform more consistently on par with Intel, but by then newer and better processors will be available.
 
You can't really do 8 lots of 12.5% as the 1st core of the module will have higher performance until the second core of the module is used.
But apart from that, it's pretty spot on.
I'd be inclined to say more like 13%/12% per module :p
 
I have done a bit of comparison between my daughter's i7 (860 @ 2.93) and the AMDx8, and hers is still far more responsive and faster when running multiple tasks.

Again, with both set up with a 60GB SSD as C: and a 1TB as D: (that's how all my LAN PCs are set up), and with both running G.Skill 2x4GB RipJaws and Windows 7 Pro (OK, naughtily NOT yet activated on the AMD), I used ConvertX to DVD on 4 AVI files from my ghost hunting collection.

I set them both up to use 2 cores per conversion: 1-2, 3-4, 5-6 and 7-8.
The AMD was utterly dead to respond to anything while it was converting. I tried to double click on My Computer but it failed to open for easily 10 to 15 minutes.
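
As a side note, the 1-2, 3-4, 5-6, 7-8 pairing can be checked against the module layout described earlier in the thread. The sketch below assumes cores 1-2 sit in module 1, 3-4 in module 2 and so on; that numbering and the helper name are assumptions, not something confirmed for these boards.

```python
def modules_used(cores, cores_per_module=2):
    """Return the set of 1-based module numbers that a group of cores touches.

    Assumption: cores are numbered consecutively within each module.
    """
    return {(core - 1) // cores_per_module + 1 for core in cores}


adjacent_pairs = [(1, 2), (3, 4), (5, 6), (7, 8)]
spread_pairs = [(1, 3), (5, 7), (2, 4), (6, 8)]

for pair in adjacent_pairs:
    print(pair, sorted(modules_used(pair)))  # each pair sits on a single module

for pair in spread_pairs:
    print(pair, sorted(modules_used(pair)))  # each pair spans two modules
```

Under that assumption each conversion gets both cores of one module, so its two threads share that module's resources. With all four conversions running, every module is fully loaded either way, so this alone would not account for the slowdown described above.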

The intel never slowed down a single bit.

I then burned the content on my phone onto a CD and then played a bit of Quake 4. When I finished playing, the Intel had finished the burn and all 4 video conversions, yet the AMD was not even 20% through any of the conversions and was still not responding, although the My Computer window had finally opened.

Ouch.

I am going to experiment with the BIOS CPU settings later to see how they help (or not).
 
Well, I did some comparisons of my 3 setups that are clock for clock the closest.

The AMD 1090T is 3.2GHz and is a 6 core.
The 8120 is an 8 core but at 3.1GHz.
My daughter's i7 is an 860 and that's 2.8GHz (or 2.93 - it seems to be random).

But these are indeed the closest together and so I toyed about a little.

I do also have a 3.2GHz quad core AMD; however, I am in a wheelchair, so 3 is bad enough to be sodding about with right now.

Anyway, I have found that if I force the BIOS on the 8 core to be 1-2 3-4 5-6 and 7-8 rather than AUTO, then it does handle basic multitasking a lot better. When using the setup as simply a straight PC, it's actually fairly nippy. However, unless I am sadly mistaken, the hex core not only feels a hell of a lot quicker, but it handles multiple tasks a damned sight better too. Sure, the hex is 3.2 against the octo's 3.1, but that should make no significant difference really.

I used ConvertX to DVD v4 to convert a number of AVI files from my camera. The film in question was some ghost hunting stuff, and I simply copied the same file over and over to each PC. The hex core was perfectly able to handle 5 conversions (I forced them to one core each) and let me play Dawn of War Soulstorm without much issue, but the 8 core started to fall over after 3, and by the time Soulstorm came up I had been waiting a good 10 minutes.

The Intel didn't seem to glitch at all, and I don't think loading Soulstorm was ANY different from when no conversions were running.

However... when giving all the cores to a single task, the 8 core AMD really shone against the 6 core. It still lost out to the Intel, but it did kill the AMDx6.

I'm annoyed with myself now, because I should have run more tests, but the wife is moaning like hell at me: since my accident, I now have a bed and commode in the living room, plus 3 base units, my laptops etc. So fair is fair, but over the next couple of weeks I will be doing a whole series of tests properly on a one-by-one basis, and that should be a laugh.

I have found some interesting things with AMDs in the last few days and learned a heck of a lot about them too I feel.

Thanks guys.



When converting only
 
I tried an 8350, but I prefer my 1100T. Depends on your usage model, but the 6 "true" cores work better for me. I'm hoping that this machine will last several more years; the X6 is the best chip I've owned and great value for money. OK, the Intel 6 cores are faster, but also more expensive.
 
Have you looked at wikipedia? https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)

Bulldozer ... features two 128-bit FMA-capable FPUs which can be combined into one 256-bit FPU. This design is accompanied by two integer clusters, each with 4 pipelines ... AMD's marketing service calls this design a "Module". A 16-threads processor design would feature eight of these "modules", but the operating system will recognize each "module" as two logical cores.

The "module", described as two logical cores, can be contrasted with a single Intel core with HyperThreading. The only difference between the two approaches is that Bulldozer provides dedicated schedulers and integer units for each thread, whereas in Intel's core all threads must compete for available execution resources.

It's a really tricky architecture to optimise for though which is why performance was initially, and is still, a bit lacking. Certainly the GNU compilers do a bad job of it in my experience.
 
Have you looked at wikipedia? https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)



It's a really tricky architecture to optimise for though which is why performance was initially, and is still, a bit lacking. Certainly the GNU compilers do a bad job of it in my experience.


They are supposed to run one 256-bit thread to one module + one core, or two 128-bit threads to one module + 2 cores in highly threaded tasks. The problem is that software, and often Windows itself, does not understand that.

Each module shares one pool of L2 cache, but otherwise the chip is 8 core, 8 thread.

:D
 
Yours was definitely the best answer humbug (best of a bad lot mind :p)


Then I will try again.

Let's just take two modules; think of it as a Core i3, but with a difference.

A Core i3 HT has 2 integer compute units (cores) between 2 modules, with 2 threads for every module.

A quad core Piledriver has 4 integer compute units (cores) between 2 modules and a choice between one big thread or two normal ones for each module.

Let's use a factory analogy: imagine that the integer compute unit is the worker and the threads are the conveyor belts.

On the Core i3 HT one worker is fed by one conveyor belt in a single threaded operation, in multi threaded operations both workers are fed with 4 conveyor belts.

On the FX-4350 two workers are fed with one large conveyor belt, while in multi threaded operations four workers are fed by four normal conveyor belts.

The idea with the Core i3 is it has 4 pseudo cores in multi threaded tasks, it is fed 4 sets of data streams (conveyor belts) and it works.

The idea with the FX-4350 is that where there is only one data stream it combines the two threads (conveyor belts) into one big one, so the two workers (cores) work together on a faster / wider conveyor belt,
which 9 times out of 10 does not work (yet?). What happens is that the software does not know to, or how to, combine two conveyor belts, so the two workers get stuck with the one smaller belt.

Where there are multiple data streams they split up and each take their own conveyor belt.
 
Interesting spin, but it isn't thread combining at all, which is what you make it sound like. Certain instructions mean a Bulldozer/Piledriver core will combine its 128-bit FPU with the 128-bit FPU of the other core in the same module to form a 256-bit FPU. This isn't a choice or thread combining, just resource pooling, at the expense of the other core, and it is a limitation built into the hardware. So in your analogy, the workers are still fed by two separate conveyor belts; it's just that for a particular job that sometimes comes in, one worker needs to borrow the other's tools to do it, and while this goes on the other worker can't continue.

When those same instructions are issued to an Intel core, like a current i3, it doesn't need to borrow resources from the other core, as each core has all the resources required. In fact, for those same instructions the i3 has more capability than the 2 module/4 core AMD.

You're basically talking about how they handle 256-bit AVX instructions and getting it confused with some sort of thread/core combining.
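
To make the "borrowing the other worker's tools" point concrete, here is a toy simulation of one module. It is a sketch under heavy assumptions: two 128-bit FPU pipes per module, at most one FP op issued per core per cycle, every op taking a single cycle, and the two cores taking turns at first pick. Real hardware is far more involved, and the function name is made up.

```python
from collections import deque


def cycles_to_finish(core_a_ops, core_b_ops):
    """Cycles for the two cores of one module to drain their FP op queues.

    Each op is given as its width in bits (128 or 256). The module has two
    128-bit pipes per cycle; a 256-bit op needs both, so the other core waits.
    """
    for op in list(core_a_ops) + list(core_b_ops):
        if op not in (128, 256):
            raise ValueError("op width must be 128 or 256")

    queues = [deque(core_a_ops), deque(core_b_ops)]
    cycles = 0
    first = 0  # which core gets first pick this cycle (alternates for fairness)
    while queues[0] or queues[1]:
        pipes_free = 2
        for idx in (first, 1 - first):
            queue = queues[idx]
            if not queue:
                continue
            pipes_needed = queue[0] // 128
            if pipes_needed <= pipes_free:
                queue.popleft()
                pipes_free -= pipes_needed
        cycles += 1
        first = 1 - first
    return cycles


print(cycles_to_finish([128] * 4, [128] * 4))  # 4: both cores run side by side
print(cycles_to_finish([256] * 4, []))         # 4: one core uses both pipes
print(cycles_to_finish([256] * 4, [256] * 4))  # 8: the cores have to take turns
```

In this model two cores doing 128-bit work never get in each other's way, but as soon as both want 256-bit ops they end up taking turns, which is the resource pooling described above rather than any kind of thread combining.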
 