Intel Fires Shots At AMD For False Marketing Of Boost Clocks

Soldato
Joined
22 Dec 2008
Posts
10,370
Location
England
Or even better, tell me how you calculate CPU instructions per cycle? Let me save you the keystrokes: you haven't. I spec servers, comms rooms, networks etc. and I also haven't.

Instructions per cycle on x64 is still four, no?
Last time I ran the numbers was for Sandy Bridge, limited to six ports, of which two load 8 bytes and one stores. I think newer architectures are 16 bytes wide, with slightly more ports, but still limited to retiring four per clock.

There are complications to calculating throughput, sure, but it's inaccurate to claim it can't be, or isn't, done.
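
For what it's worth, the back-of-envelope version of that calculation is short. Below is a rough sketch in Python, assuming a retire width of four per clock as mentioned above. The clock speed, core count and counter readings are made-up illustrative inputs, not measurements, and both function names are just ones picked for this example.

```python
def peak_instructions_per_second(retire_width, clock_hz, cores=1):
    """Theoretical upper bound: every core retires its full width every cycle."""
    return retire_width * clock_hz * cores


def achieved_ipc(instructions_retired, cycles_elapsed):
    """Measured IPC from two counter readings (e.g. hardware performance counters)."""
    if cycles_elapsed <= 0:
        raise ValueError("cycles_elapsed must be positive")
    return instructions_retired / cycles_elapsed


# Illustrative inputs only (assumptions, not figures from any review):
peak = peak_instructions_per_second(retire_width=4, clock_hz=4.0e9, cores=8)
ipc = achieved_ipc(instructions_retired=6_000_000, cycles_elapsed=4_000_000)
print(f"peak: {peak:.3e} instructions/s, measured IPC: {ipc:.2f} of a possible 4")
```

The gap between the measured figure and the retire width is where the complications live: port contention, cache misses and branch mispredictions all pull real code well below the ceiling.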
 
Man of Honour
Joined
30 Oct 2003
Posts
13,229
Location
Essex
Instructions per cycle on x64 is still four, no?
Last time I ran the numbers was for Sandy Bridge, limited to six ports, of which two load 8 bytes and one stores. I think newer architectures are 16 bytes wide, with slightly more ports, but still limited to retiring four per clock.

There are complications to calculating throughput, sure, but it's inaccurate to claim it can't be, or isn't, done.

Helps if you quote the whole post so you don't quote me out of context...

So you agree then that relative performance is king? Tell me a single time when you have personally calculated theoretical peak performance in a real world situation? Or even better, tell me how you calculate CPU instructions per cycle? Let me save you the keystrokes: you haven't. I spec servers, comms rooms, networks etc. and I also haven't.

I never claimed it couldn't be or isn't done. The bit you missed is where I asked for an example where he personally and specifically had calculated a theoretical throughput figure which formed the basis of that argument, which was of course ignored. In the consumer space, even in the most deep-dive reviews, you will not find people calculating this, because for the market these chips are designed for it's fairly pointless and nobody cares.

I stand by that point: if consumers cared about theoretical throughput figures you would see them scattered about in reviews for every chip, but you don't. The reason is simple: relative performance is king and is all that matters, which was exactly my point earlier.
 
Associate
Joined
24 Feb 2010
Posts
213
I have been messing around with different BIOSes and settings for my MEG ACE X570 and discovered that with Cool'n'Quiet enabled my 3700X at stock now hits 4.5GHz on one core during single-core Cinebench R20, with all other cores hitting around 4.40-4.48GHz.

With Cool'n'Quiet disabled, max boost was 4.375GHz.

I also observed that the cores used during the benchmark seemed to favour the higher-clocking cores, whereas with C'n'Q disabled it would spread load over all cores.
 
Permabanned
Joined
15 Oct 2011
Posts
6,311
Location
Nottingham Carlton
TL;DR
AMD lies: it's all fine
Intel lies: outrage
NV lies: outrage

AMD calls Intel and NV out on something, it's fine. They get called out, it's disgusting..

Ahh, the hardcore AMD BIAS
 
Man of Honour
Joined
30 Oct 2003
Posts
13,229
Location
Essex
TL;DR
AMD lies: it's all fine
Intel lies: outrage
NV lies: outrage

AMD calls Intel and NV out on something, it's fine. They get called out, it's disgusting..

Ahh, the hardcore AMD BIAS

Except nobody is saying that, are they? If you read back you will note that people agree that they are all fantasists who misrepresent their products, be that Intel with its power usage/TDP figures, AMD and Intel with boost frequencies, and so on. Nobody has been throwing their toys out of the pram over any of these things, so why now? Also, is it really a lie? Most would agree that the chips meet what is put on the box, even if it is a little sneaky. Again it comes down to relative performance and value for money: you are either getting good performance at a price you are happy with or you aren't.
 
Caporegime
Joined
17 Mar 2012
Posts
47,382
Location
ARC-L1, Stanton System
Let's take a random Intel CPU.

9900K:
8 Integer Units
L1 8X 32 KBytes
L1 Instructions 8X 32 KBytes
L2 8X 256 KBytes
L3 16 MBytes

Now let's look at the FX 8350:
8 Integer Units
L1 8X 16 KBytes
L1 Instructions 4X 64 KBytes
L2 4X 2048 KBytes
L3 8 MBytes

Those of you who say the FX 8350 is a 4 core, explain your reasoning.

Edit: Intel 4 core:

7700K
4 Integer Units
L1 4X 32 KBytes
L1 Instructions 4X 32 KBytes
L2 4X 256 KBytes
L3 8 MBytes
 
Soldato
Joined
17 Aug 2009
Posts
10,714
As far as I've seen, the CPUs can hit the maximum boost speed; however, it is similar to achieving memory speed in that the other hardware is important, and it isn't necessarily price related.

So if you don't have a motherboard which plays nice then you will never see the top boost. But that's two things in play and the motherboard manufacturers aren't making boost promises.
 
Permabanned
Joined
2 Sep 2017
Posts
10,490
Now let's look at the FX 8350:
8 Integer Units
L1 8X 16 KBytes
L1 Instructions 4X 64 KBytes
L2 4X 2048 KBytes
L3 8 MBytes

I understood that the 8-thread FX has 4 double-width integer units. And it's extremely difficult for the Windows scheduler to make them work normally.
 
Soldato
Joined
28 May 2007
Posts
18,200
that was expected.
the problem is that if AMD gets away with that Intel will be advertising its next gen CPUs the same way...
days of overclocking will come to an end and this forum might as well be shut down... :p

You kind of have that backwards. Intel moved to "up to X speed" two generations ago...
 
Caporegime
Joined
17 Mar 2012
Posts
47,382
Location
ARC-L1, Stanton System
I understood that the 8-thread FX has 4 double-width integer units. And it's extremely difficult for the Windows scheduler to make them work normally.

2 of the Integer Units share an L2 Cache.

The idea was that they are configurable. In one mode it's 8X 128Bit wide threads, one thread per core, which is how they would operate in MT tasks like Cinebench. For low threaded workloads the two Integer Units could combine to form a 256Bit wide thread for better ST performance.

That was the design and theory, but yes, the Windows Scheduler just treated it as a 4 core with Hyperthreading.

It is, however, by definition and design an 8 core CPU.
 
Caporegime
Joined
17 Mar 2012
Posts
47,382
Location
ARC-L1, Stanton System
I should probably expand on this. In MT mode it's 8/8, in ST it's 4/4.

So:
MT 8X 128Bit Integer
ST 4X 256Bit Integer

So where normally an 8 core would be MT 8X 12.5% and ST 1X 12.5%, the FX 8350 would be 8X 12.5% or 1X 25%, because it's combining two threads and Integer Units as a monolith for twice the performance. A very clever design. This was, however, never what actually happened: the Windows Scheduler never combined the two threads, so in ST mode it ran the two Integer Units as a monolith but with only one 128Bit wide thread, bottlenecking the crap out of the cores at the front end.

Windows saw the CPU as 4 monolithic cores with 8X 128Bit threads, that's it.

When you know that, you understand why AMD never accepted Bulldozer was a bad CPU.
 
Caporegime
Joined
17 Mar 2012
Posts
47,382
Location
ARC-L1, Stanton System
Oh.... and I have more to say about the Windows Scheduler, now that I have got started on it!!!!! :D

Ryzen, particularly Ryzen 3000.

In lightly threaded workloads (like low threaded games) everything is meant to happen within one CCX. Nothing is ever supposed to jump between CCXs, let alone chiplets. This is to keep latency low, as low as Coffee Lake's Ring Bus: low latency inter-core communication = high performance in games.

The Windows Scheduler just doesn't do that. It likes to switch a single workload between cores. You can even see this when you're watching the CPU threads in the MSI OSD: the load moves around all over the place. This is fine on Coffee Lake, where with its Ring Bus everything is always extremely close and tight. The downside is that this only works with up to 10 cores; beyond that the traces are too long and complex, which is why Skylake-X has a high latency "Mesh" architecture.

The problem for Ryzen 3000 with the Windows Scheduler is that the same workload is moving around between CCXs and die clusters, adding unnecessary latency. This isn't quite so apparent because the IPC on Ryzen 3000 is about 10% higher than Coffee Lake, so the end result is less of an impact by comparison.

Sometimes AMD can put in hacks to force low threaded games to stay within one CCX. When that is the case you can see the true performance of Ryzen 3000.

Exhibit A: CS:GO, really easy to hack as this game only uses 2, maybe 3, threads at most.

The 3900X with PBO is probably running at about 4.5GHz here, vs a 5GHz 9900K.

The 9900K is about 10% higher clocked and yet the 3900X is about 4% faster in CS:GO.

If the Windows Scheduler wasn't so crap, more games would look something like this on Ryzen 3000:

7QUFEgR.png
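
To put a number on the clock-versus-result comparison in that post, here is a small sketch in Python. It uses only the figures quoted above (5GHz, roughly 4.5GHz, about 4% faster) and assumes performance scales linearly with clock speed, which is a simplification. The function names are made up for this example.

```python
def clock_advantage(clock_a_ghz, clock_b_ghz):
    """How much higher A is clocked than B, as a fraction (0.10 == 10%)."""
    return clock_a_ghz / clock_b_ghz - 1.0


def per_clock_advantage(perf_ratio_b_over_a, clock_a_ghz, clock_b_ghz):
    """Per-clock advantage of B over A, given B's overall performance ratio."""
    return perf_ratio_b_over_a * (clock_a_ghz / clock_b_ghz) - 1.0


# Figures quoted in the post: 9900K at 5.0 GHz, 3900X at roughly 4.5 GHz,
# 3900X about 4% faster overall (ratio 1.04).
print(f"9900K clock advantage: {clock_advantage(5.0, 4.5):.1%}")
print(f"3900X per-clock advantage: {per_clock_advantage(1.04, 5.0, 4.5):.1%}")
```

With those inputs the clock gap comes out nearer 11% than 10%, and the implied per-clock lead in this one game is around 15 to 16%. That is a single title with an estimated clock speed, so treat it as a rough indication rather than a general IPC figure.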
 
Permabanned
Joined
2 Sep 2017
Posts
10,490
Why didn't they force a driver or something to "hack" the Windows Scheduler to properly work with the CPUs?
 
Caporegime
Joined
17 Mar 2012
Posts
47,382
Location
ARC-L1, Stanton System
Why didn't they force a driver or something to "hack" the Windows Scheduler to properly work with the CPUs?

They do: Chipset Drivers... these days they are more akin to GPU drivers, with CPU 'optimisations' in them.

That gets more complex to do the more CPU threads the game uses. If it's more than 4/8 it's going to move to the neighbouring CCX anyway, so it only really works for 'very' lightly threaded games, such as CS:GO.
What they can do is stop the workloads moving outside the chiplets.
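
As a rough illustration of what keeping a workload inside one CCX means in practice, here is a sketch in Python that builds the list of logical CPUs for one CCX and restricts the current process to them. This is not how AMD's chipset driver does it. The numbering scheme (SMT siblings adjacent, CCXs contiguous) and the 3-cores-per-CCX layout are assumptions for the example, the helper names are made up, and `os.sched_setaffinity` only exists on some platforms such as Linux, so the pinning helper simply reports failure elsewhere.

```python
import os


def ccx_logical_cpus(ccx_index, cores_per_ccx, threads_per_core=2):
    """Logical CPU indices belonging to one CCX.

    Assumes logical CPUs are numbered contiguously, with SMT siblings
    adjacent (0,1 = core 0; 2,3 = core 1; ...). Real numbering varies by OS
    and firmware, so check the actual topology before relying on this.
    """
    if ccx_index < 0 or cores_per_ccx <= 0 or threads_per_core <= 0:
        raise ValueError("ccx_index must be >= 0 and counts must be positive")
    per_ccx = cores_per_ccx * threads_per_core
    start = ccx_index * per_ccx
    return list(range(start, start + per_ccx))


def pin_current_process(cpus):
    """Restrict this process to the given logical CPUs where the OS allows it."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # e.g. Windows or macOS: this call is not available
    wanted = set(cpus) & os.sched_getaffinity(0)
    if not wanted:
        return False  # none of the requested CPUs are usable by this process
    os.sched_setaffinity(0, wanted)
    return True


# Hypothetical layout: 3 cores per CCX with SMT enabled.
first_ccx = ccx_logical_cpus(ccx_index=0, cores_per_ccx=3)
print("first CCX logical CPUs:", first_ccx)
```

Calling `pin_current_process(first_ccx)` would then confine the process to that CCX on a platform that supports it. A game with more threads than the CCX has logical CPUs would just queue up on those few cores, which is the limit described above.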
 