AMD Zen 2 (Ryzen 3000) - * NO COMPETITOR HINTING *

Murphy · 29 Apr 2019 at 16:50

humbug said:
Here we go again with you trying to twist something into something that it isn't because it doesn't agree with you. How is it that you have so much time and energy to straw-man your way through this thread constantly? Don't you have anything better to do, or is this what you do?

His information came from an article written by Dr Fog https://www.researchgate.net/profile/Agner_Fog a well-respected technology analyst. He's quoting it in the video with the data right in front of him.

If you don't like what's in his article, go and disagree with him; stop with your constant arguing in this thread.

I'm not trying to twist anything because it doesn't agree with me. I'm stating that what you said doesn't match what AMD themselves have said; it's not me who's twisting or disagreeing with you, it's AMD.

I mean, seriously, this is what's been taken directly from AMD.

AMD Ryzen CPU Optimization

AMD Ryzen processors Master Overclocking User Guide

AMD Financial Analyst Day 2017 - Datacenter

AMD EPYC

AMD EPYC 7000-series Product Brief, June 2017

AMD EPYC Performance Brief, June 2017

AMD EPYC Solution Brief, June 2017

AMD x86 Memory Encryption Technologies, David Kaplan, Security Architect, LSS 2016 August 25, 2016

Manuals

Open-Source Register Reference For AMD Family 17h Processors

And in those documents they state the following...

Cache

L0 µOP cache:

2,048 µOPs, 8-way set associative

32-sets, 8-µOP line size

Parity protected

And...

L1D Cache:

32 KiB 8-way set associative

64-sets, 64 B line size

Write-back policy

4-5 cycles latency for Int

7-8 cycles latency for FP

SEC-DED ECC
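
As a quick sanity check on the figures listed above, a cache's capacity should equal sets x ways x line size. A minimal Python sketch, assuming nothing beyond the numbers quoted (the function name is my own, not something from AMD's documents):

```python
def cache_capacity(sets: int, ways: int, line_size: int) -> int:
    """Capacity = number of sets x associativity (ways) x line size."""
    return sets * ways * line_size

# L0 µOP cache: 32 sets, 8-way, 8 µOPs per line -> capacity in µOPs
uop_cache_entries = cache_capacity(32, 8, 8)

# L1D cache: 64 sets, 8-way, 64-byte lines -> capacity in bytes
l1d_bytes = cache_capacity(64, 8, 64)

print(uop_cache_entries, "µOPs")      # 2048 µOPs
print(l1d_bytes // 1024, "KiB")       # 32 KiB
```

Both results agree with the 2,048 µOPs and 32 KiB headline figures, so the listed sets, ways and line sizes are at least internally consistent.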

My emphasis to highlight the relevant architecture you mentioned...

You're literally trying to argue with the people who designed the blooming thing.

And pray tell, what strawman is it that you believe I've constructed? Did you not say "Zen: 2048 uOps, Zen: 10 Integer Execution units, Zen: 4 FP units, Zen: Float latency 3 Cycles"? Did I not say Zen does not have those specific things in the numbers you claimed? Tell me how you think I've misrepresented what you've said.

Also, what I do with my time has nothing to do with you; trying to imply some sort of moralistic fallacy serves no purpose and just undermines what you're saying.

Oh, and FYI, I'm not arguing. I asked you if you were sure the information you posted was correct and suggested it may not be, and you then took umbrage when someone attempted to correct your mistake. It's you who is arguing, not me; I'm simply trying to refute misinformation.

Oh, and one last thing: it would help if you linked to the actual paper that information is taken from and not just the person's ResearchGate profile, otherwise how does anyone know which of his 46 papers the information came from?

humbug · 29 Apr 2019 at 16:56

There is a difference between what a CPU core is capable of executing at one end and what it's actually executing at the other; it's called a bottleneck. AMD's data sheets are what the CPU is actually executing, they have to be; it's no good saying 4 cycles of latency when what actually comes out at the other end is 7 or 8.
If you actually watch the thing, this is explained.

The whole point of what I posted is the analysis of what the CPU is actually capable of; with that you have to look deeper at what the CPU is actually doing. That's what Dr Fog did, it's what he does.

PS: FP is in cache, they are both correct, they agree with each other. Floating Point.

Murphy · 29 Apr 2019 at 17:15

humbug said:
There is a difference between what a CPU core is capable of executing at one end and what it's actually executing at the other; it's called a bottleneck. AMD's data sheets are what the CPU is actually executing, they have to be; it's no good saying 4 cycles of latency when what actually comes out at the other end is 7 or 8.
If you actually watch the thing, this is explained.

The whole point of what I posted is the analysis of what the CPU is actually capable of; with that you have to look deeper at what the CPU is actually doing. That's what Dr Fog did, it's what he does.

PS: FP is in cache, they are both correct, they agree with each other. Floating Point.

Now I've had a chance to read some of the paper Dr Fog published (why do I keep reading that as frog?), he actually says in the opening introduction, and I quote:

I have no way of knowing with certainty whether it is in accordance with the actual physical structure of the microprocessors. The main purpose of providing this information is to enable programmers and compiler makers to optimize their code.

He's basically making educated guesses based on how his code runs. While that may be useful when a vendor doesn't publish any information on how things work, it's highly misleading to claim that's how it's actually designed or how it's actually executing things.

While I can certainly agree what you posted is what Zen is capable of when running ASM, it's leaving out all the other types of languages and code. It's like trying to make an educated guess on why a car's fast by only listening to its engine; it misses the bigger picture.

PS: The FP is not in the cache. The cache is memory; the FP units are dedicated arithmetic units. They're two very different things.

PPS: What Dr Frog's paper shows is basically speculation, and that's great when the information from vendors is lacking, but to put it forward as factual would be incorrect, and when the vendor makes the information publicly available, as AMD have, that would take precedence.

Potatowithearsontheside · 29 Apr 2019 at 17:17

Asking for a 4.5GHz base clock is a bit much; the 9900K has a base clock of 3.6GHz.
Whilst modern CPUs don't ever really run at their base clock, it's a bit daft to expect that you'll get something so highly clocked as a base.
XFR2 does muddy the waters somewhat, but the number on the tin won't be 4.5GHz base for sure.

~>Dg<~ · 29 Apr 2019 at 17:22

I mean with full speed on. The 2700X does 4.35, so it's not that much extra, is it? So to expect 4.5 stock, no, I think that is more than expected from this revision of CPUs.

BongoHunter · 29 Apr 2019 at 17:25

Potatowithearsontheside said:
Asking for a 4.5GHz base clock is a bit much; the 9900K has a base clock of 3.6GHz.
Whilst modern CPUs don't ever really run at their base clock, it's a bit daft to expect that you'll get something so highly clocked as a base.
XFR2 does muddy the waters somewhat, but the number on the tin won't be 4.5GHz base for sure.

Agreed.

I can't see stock base being 4.5. What's the point? It's just wasted energy. As long as it boosts to 4.5 or more under load for at least 2 cores on the 8-core model, that will do.

humbug · 29 Apr 2019 at 17:37

Murphy said:
Now I've had a chance to read some of the paper Dr Fog published (why do I keep reading that as frog?), he actually says in the opening introduction, and I quote:

He's basically making educated guesses based on how his code runs. While that may be useful when a vendor doesn't publish any information on how things work, it's highly misleading to claim that's how it's actually designed or how it's actually executing things.

While I can certainly agree what you posted is what Zen is capable of when running ASM, it's leaving out all the other types of languages and code. It's like trying to make an educated guess on why a car's fast by only listening to its engine; it misses the bigger picture.

PS: The FP is not in the cache. The cache is memory; the FP units are dedicated arithmetic units. They're two very different things.

PPS: What Dr Frog's paper shows is basically speculation, and that's great when the information from vendors is lacking, but to put it forward as factual would be incorrect, and when the vendor makes the information publicly available, as AMD have, that would take precedence.

I don't see that line in it. Can you screenshot it, within its own paragraph, and put it in here?

humbug · 29 Apr 2019 at 17:39

Potatowithearsontheside said:
Asking for a 4.5GHz base clock is a bit much; the 9900K has a base clock of 3.6GHz.
Whilst modern CPUs don't ever really run at their base clock, it's a bit daft to expect that you'll get something so highly clocked as a base.
XFR2 does muddy the waters somewhat, but the number on the tin won't be 4.5GHz base for sure.

It's not base clock, it's boost, all-core boost. The 9900K does that at 4.7GHz, and don't forget the Ryzen 3000 test sample Lisa had was 1 or 2% faster than the 9900K in Cinebench.

So if it's not boosting to at least 4.5GHz, the IPC is up a lot.
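
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the Cinebench score scales as IPC x clock with equal core counts, which is a simplification. The 4.7 GHz all-core figure and the roughly 2% lead come from the post above; the Ryzen clocks in the loop are hypothetical, since the test sample's clock was not stated.

```python
def required_ipc_ratio(perf_ratio: float, rival_clock_ghz: float, own_clock_ghz: float) -> float:
    """IPC advantage needed to reach perf_ratio, assuming score ~ IPC x clock."""
    return perf_ratio * rival_clock_ghz / own_clock_ghz

# Hypothetical all-core clocks for the Ryzen 3000 sample (assumptions, not known values).
for clock in (4.7, 4.5, 4.2):
    ratio = required_ipc_ratio(1.02, 4.7, clock)
    print(f"{clock} GHz all-core -> {ratio - 1:.1%} higher IPC needed")
```

Under these assumptions, the lower the sample's all-core clock was, the larger the IPC gain needed to explain the same Cinebench lead.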

Murphy · 29 Apr 2019 at 18:20

humbug said:
I don't see that line in it. Can you screenshot it, within its own paragraph, and put it in here?

Do I have to? That's such a PITA. If you're not blocked from downloading things you could download the PDF from Dr Frog's blog; it's manual 3, "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers", and the sentence I quoted is at the bottom of page 6.

1.1 About this manual
This is the third in a series of five manuals:
1. Optimizing software in C++: An optimization guide for Windows, Linux and Mac platforms.
2. Optimizing subroutines in assembly language: An optimization guide for x86 platforms.
3. The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers.
4. Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs.
5. Calling conventions for different C++ compilers and operating systems.

The latest versions of these manuals are always available from www.agner.org/optimize. Copyright conditions are listed on page 238 below.

The present manual describes the details of the microarchitectures of x86 microprocessors from Intel and AMD. The Itanium processor is not covered. The purpose of this manual is to enable assembly programmers and compiler makers to optimize software for a specific microprocessor. The main focus is on details that are relevant to calculations of how much time a piece of code takes to execute, such as the latencies of different execution units and the throughputs of various parts of the pipelines. Branch prediction algorithms are also covered in detail.

This manual will also be interesting to students of microarchitecture. But it must be noted that the technical descriptions are mostly based on my own research, which is limited to what is measurable. The descriptions of the "mechanics" of the pipelines are therefore limited to what can be measured by counting clock cycles or micro-operations (µops) and what can be deduced from these measurements. Mechanistic explanations in this manual should be regarded as a model which is useful for predicting microprocessor behavior. I have no way of knowing with certainty whether it is in accordance with the actual physical structure of the microprocessors. The main purpose of providing this information is to enable programmers and compiler makers to optimize their code.

On the other hand, my method of deducing information from measurements rather than relying on information published by microprocessor vendors provides a lot of new information that cannot be found anywhere else. Technical details published by microprocessor vendors is often superficial, incomplete, selective and sometimes misleading. My findings are sometimes in disagreement with data published by microprocessor vendors.
Reasons for this discrepancy might be that such data are theoretical while my data are obtained experimentally under a particular set of testing conditions. I do not claim that all information in this manual is exact. Some timings etc. can be difficult or impossible to measure exactly, and I do not have access to the inside information on technical implementations that microprocessor vendors base their technical manuals on.
The tests are done mostly in 32-bit and 64-bit protected mode. Most timing results are independent of the processor mode. Important differences are noted where appropriate. Far jumps, far calls and interrupts have mostly been tested in 16-bit mode for older processors.

Call gates etc. have not been tested. The detailed timing results are listed in manual 4: "Instruction tables".
Most of the information in this manual is based on my own research. Many people have sent me useful information and corrections, which I am very thankful for. I keep updating the manual whenever I have new important information. This manual is therefore more detailed, comprehensive and exact than other sources of information; and it contains many details not found anywhere else. This manual is not for beginners. It is assumed that the reader has a good understanding of assembly programming and microprocessor architecture. If not, then please read some books on the subject and get some programming experience before you begin doing complicated optimizations. See the literature list in manual 2: "Optimizing subroutines in assembly language" or follow the links from www.agner.org/optimize.

The reader may skip chapters describing old microprocessor designs unless you are using these processors in embedded systems or you are interested in historical developments in microarchitecture. Please don't send your programming questions to me, I am not gonna do your homework for you! There are various discussion forums on the Internet where you can get answers to your programming questions if you cannot find the answers in the relevant books and manuals.

humbug · 29 Apr 2019 at 18:27

Murphy said:
Do I have to? That's such a PITA. If you're not blocked from downloading things you could download the PDF from Dr Frog's blog; it's manual 3, "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers", and the sentence I quoted is at the bottom of page 6.

He also said.

The descriptions of the "mechanics" of the pipelines are therefore limited to what can be measured by counting clock cycles or micro-operations (µops) and what can be deduced from these measurements.

On the other hand, my method of deducing information from measurements rather than relying on information published by microprocessor vendors provides a lot of new information that cannot be found anywhere else. Technical details published by microprocessor vendors is often superficial, incomplete, selective and sometimes misleading. My findings are sometimes in disagreement with data published by microprocessor vendors.

As I said, the whole point of analysis is to get to what the data sheets don't tell you.

~>Dg<~ · 29 Apr 2019 at 18:34

humbug said:
It's not base clock, it's boost, all-core boost. The 9900K does that at 4.7GHz, and don't forget the Ryzen 3000 test sample Lisa had was 1 or 2% faster than the 9900K in Cinebench.

So if it's not boosting to at least 4.5GHz, the IPC is up a lot.

Cinebench means squat. The 8700K is about 15-20 percent quicker in games. They have used Cinebench again, which shows an even smaller lead than demonstrated here. This is why, until actual benchmarks air, it's all just hyperbole.

humbug · 29 Apr 2019 at 18:41

~>Dg<~ said:
Cinebench means squat. The 8700K is about 15-20 percent quicker in games. They have used Cinebench again, which shows an even smaller lead than demonstrated here. This is why, until actual benchmarks air, it's all just hyperbole.

Of course it is, it's clocked higher, and yes, Intel sometimes benefits from games being optimized for Intel and not AMD. That, however, is fast becoming a thing of the past now that Ryzen is here and outselling Intel.

Cinebench is and always has been a good yardstick for IPC; a lot of games are faster on Intel than Zen+ only by its measure of clock speed, just like Cinebench.

It was good for Intel to show IPC, but not now that it doesn't look so good for them anymore?

Murphy · 29 Apr 2019 at 18:46

humbug said:
He also said.

As I said, the whole point of analysis is to get to what the data sheets don't tell you.

And if we read the analysis we discover that the guy in that video got it wrong, from the section on what he tested on Ryzen's Integer execution pipes...

The integer unit has four ALUs so that it can execute four integer instructions per clock cycle. Simple integer instructions can be handled by any of these four ALUs, while some of the more costly operations such as multiplication and division can only be handled by one of the ALUs.

^^That's^^ why I rarely trust what people say in YouTube videos.

And yes, the section you quoted is true: "Technical details published by microprocessor vendors is often superficial, incomplete, selective and sometimes misleading". In the case of Intel or past AMD CPUs that may well be true (Intel designs don't interest me much), but that's not the case with Zen, as shown by the links to the firsthand information I provided on the previous page.

I would also say that, IMO, the reason his findings are sometimes in disagreement with data published by microprocessor vendors is that he's running very specific code in one particular programming language. As I said before, he's basically making educated guesses that in some cases may not be showing him the full picture from an architectural viewpoint; he's doing the equivalent of trying to make an educated guess on why a car's fast by only listening to its engine.
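
For what it's worth, the four-ALU passage quoted above can be turned into a toy throughput model. This is only a sketch under my own simplifying assumptions, not figures from the manual: the instructions are independent, each ALU issues one instruction per cycle, and multiplies are restricted to a single ALU.

```python
import math

def min_cycles(simple_ops: int, mul_ops: int, alus: int = 4) -> int:
    """Lower bound on cycles to issue a batch of independent integer ops."""
    total = simple_ops + mul_ops
    # All ALUs together issue at most `alus` ops per cycle, while the one
    # multiply-capable ALU issues at most one multiply per cycle.
    return max(math.ceil(total / alus), mul_ops)

print(min_cycles(8, 0))   # 2: eight simple ops spread over four ALUs
print(min_cycles(4, 4))   # 4: the single multiply-capable ALU is the limit
```

The point of the model is only that four integer ALUs give a peak of four simple integer instructions per cycle, and that a mix heavy in multiplies falls well short of that peak.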

humbug · 29 Apr 2019 at 18:50

Murphy said:
And if we read the analysis we discover that the guy in that video got it wrong, from the section on what he tested on Ryzen's Integer execution pipes...

^^That's^^ why I rarely trust what people say in YouTube videos.

And yes, the section you quoted is true: "Technical details published by microprocessor vendors is often superficial, incomplete, selective and sometimes misleading". In the case of Intel or past AMD CPUs that may well be true (Intel designs don't interest me much), but that's not the case with Zen, as shown by the links to the firsthand information I provided on the previous page.

I would also say that, IMO, the reason his findings are sometimes in disagreement with data published by microprocessor vendors is that he's running very specific code in one particular programming language. As I said before, he's basically making educated guesses that in some cases may not be showing him the full picture from an architectural viewpoint; he's doing the equivalent of trying to make an educated guess on why a car's fast by only listening to its engine.

You can be in disagreement with him and pick out what you think are flaws; none of that interests me. This is his job, he is the proverbial expert, and I trust his judgment far more than I do some random forum poster.

With that I'm done with this. Rage against it all you like, you're not going to convince me you're more right than he is.

I'm going for some dinner.

Undesirable · 29 Apr 2019 at 18:54

So are we getting 5 GHz XFR on a couple of cores?

Kelt · 29 Apr 2019 at 18:56

Undesirable said:
So are we getting 5 GHz XFR?

I'm hoping for 6GHz.

Along with Bob.

Potatowithearsontheside · 29 Apr 2019 at 19:39

Bob Hope?

4K8KW10 · 29 Apr 2019 at 19:51

Undesirable said:
So are we getting 5 GHz XFR on a couple of cores?

No.

You don't need 5 GHz.

Undesirable · 29 Apr 2019 at 19:55

4K8KW10 said:
No.

You don't need 5 GHz.

Yes, I need them to pummel Intel into the ground where they belong.

4K8KW10 · 29 Apr 2019 at 19:57

Undesirable said:
Yes, I need them to pummel Intel into the ground where they belong.

There is no need to push the physical limitations. I hope you know that IPC is more important than frequency. A 2.2 GHz Athlon 64 was faster than a 3.2 GHz Pentium 4!
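
To put a number on that comparison: if performance is taken as roughly IPC x clock (a simplification, and the function name here is just for illustration), the slower-clocked chip wins once its IPC advantage exceeds the ratio of the two clocks.

```python
def breakeven_ipc_ratio(slow_clock_ghz: float, fast_clock_ghz: float) -> float:
    """IPC advantage the slower-clocked chip needs just to match the faster one."""
    return fast_clock_ghz / slow_clock_ghz

# 2.2 GHz Athlon 64 versus 3.2 GHz Pentium 4, the clocks quoted above.
ratio = breakeven_ipc_ratio(2.2, 3.2)
print(f"{ratio:.2f}x")   # 1.45x
```

So, on this rough model, the Athlon 64 needed more than about 45% higher IPC than the Pentium 4 to come out ahead at those clocks.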
