Ryzen and Gaming results.

h4rm0ny · 13 Mar 2017 at 14:28

DragonQ said:
That's pretty interesting. So latency between cores on different CCXs is 3.5x as high as between cores on the same CCX. Some rough numbers from their graphs:

Intel SMT latency (same physical core, different logical core): 15 ns
AMD SMT latency (same physical core, different logical core): 25 ns

Intel core latency (different physical core): 80 ns
AMD core latency (different physical core, same CCX): 45 ns
AMD core latency (different physical core, different CCX): 145 ns

It also shows that Windows does understand AMD's SMT implementation, at least in terms of the core layout. Patching Windows to understand the CCX implementation would very likely garner the biggest improvement.
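As a quick sanity check on the rough figures quoted above, here is a minimal sketch that works out the ratios. The numbers are the eyeballed ones from the list, not measurements, and the dictionary keys are just labels made up for this example.

```python
# Rough latencies in nanoseconds, as read off the graphs in the quoted post.
# Key names are hypothetical labels for this example only.
latency_ns = {
    "intel_smt": 15,
    "amd_smt": 25,
    "intel_core": 80,
    "amd_same_ccx": 45,
    "amd_cross_ccx": 145,
}


def ratio(a, b):
    """Return how many times larger latency `a` is than latency `b`."""
    return latency_ns[a] / latency_ns[b]


cross_vs_same = ratio("amd_cross_ccx", "amd_same_ccx")
cross_vs_intel = ratio("amd_cross_ccx", "intel_core")
same_vs_intel = ratio("amd_same_ccx", "intel_core")

print(f"AMD cross-CCX vs same-CCX: {cross_vs_same:.2f}x")
print(f"AMD cross-CCX vs Intel core-to-core: {cross_vs_intel:.2f}x")
print(f"AMD same-CCX vs Intel core-to-core: {same_vs_intel:.2f}x")
```

With these rounded inputs the cross-CCX penalty comes out a little over 3x, in the same ballpark as the figure quoted, and same-CCX transfers come out faster than Intel's core-to-core number.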

I believe cache is local to its CCX. So if you switch something over to a different CCX, it's starting with a blank cache.

DragonQ · 13 Mar 2017 at 14:49

h4rm0ny said:
I believe cache is local to its CCX. So if you switch something over to a different CCX, it's starting with a blank cache.

Yes, the L3 cache is shared between all cores on a CCX but not between CCXs.

h4rm0ny · 13 Mar 2017 at 15:54

DragonQ said:
Yes, the L3 cache is shared between all cores on a CCX but not between CCXs.

Which is what I was getting at. So far as I know, Windows does not yet understand that it needs to balance things not only between real cores rather than just logical cores (which it can do), but also to make sure threads are moved between real cores within their own CCX rather than between CCXs. The cost of moving a thread between two real cores can be unexpectedly high, because in one case (same CCX) the shared cache is still available and in the other (different CCX) it is not.
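To illustrate the idea, here is a minimal sketch of a CCX-aware choice of migration target. It is not how Windows or any real scheduler works. The CPU numbering (two logical CPUs per core, four cores per CCX, numbered consecutively) and the function names are assumptions made for the example.

```python
# Assumed layout for an 8c16t Ryzen: logical CPUs 0-7 sit on CCX 0 and
# 8-15 on CCX 1, with SMT siblings numbered next to each other.
# Real numbering depends on the OS and firmware.
CORES_PER_CCX = 4
THREADS_PER_CORE = 2


def ccx_of(cpu):
    """Return the CCX index of a logical CPU under the assumed layout."""
    return (cpu // THREADS_PER_CORE) // CORES_PER_CCX


def pick_target(current_cpu, idle_cpus):
    """Pick an idle logical CPU to move a thread to.

    Prefers a CPU on the same CCX (shared L3 still warm) and only falls
    back to the other CCX when nothing local is idle. Returns None if
    there is no idle CPU at all.
    """
    candidates = [c for c in idle_cpus if c != current_cpu]
    local = [c for c in candidates if ccx_of(c) == ccx_of(current_cpu)]
    if local:
        return min(local)
    return min(candidates) if candidates else None


print(pick_target(2, {6, 9}))   # an idle CPU on the same CCX wins
print(pick_target(2, {9, 12}))  # nothing local is idle, so cross the fabric
```

A real scheduler would also weigh load, power state and SMT siblings. The point is only that "same CCX first" is a cheap rule once the topology is known.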

ubersonic · 13 Mar 2017 at 16:00

DragonQ said:
Yes, the L3 cache is shared between all cores on a CCX but not between CCXs.

It's somewhat similar to having two 4c8t CPUs as opposed to one 8c16t CPU then?

Curlyriff · 13 Mar 2017 at 16:11

ubersonic said:
It's somewhat similar to having two 4c8t CPUs as opposed to one 8c16t CPU then?

Yep, that is the case, and it's why the scheduling seems so off. The Infinity Fabric is what links them. It is scalable up to 512 GB/s, so over time we should see this improve. It is also meant to link up things like the APUs and the GPU so they all work off this one fabric, which should let them share data much better once it all works as intended.

Of course at this time it appears to be the limiting factor but that could well change as time goes on.

DragonQ · 13 Mar 2017 at 16:14

h4rm0ny said:
Which is what I was getting at. So far as I know, Windows does not yet understand that it needs to balance things not only between real cores rather than just logical cores (which it can do), but also to make sure threads are moved between real cores within their own CCX rather than between CCXs. The cost of moving a thread between two real cores can be unexpectedly high, because in one case (same CCX) the shared cache is still available and in the other (different CCX) it is not.

Correct.

ubersonic said:
It's somewhat similar to having two 4c8t CPUs as opposed to one 8c16t CPU then?

Not really because it's a single die and a single package but yes, the two CCXs are connected via their Infinity Fabric. AMD haven't released exact performance numbers so we can't really compare to what you'd see in a dual Xeon configuration using QPI. Since it's on the same die, I imagine it's quicker, but I have no evidence of this.

Curlyriff · 13 Mar 2017 at 16:28

DragonQ said:
Correct.

Not really because it's a single die and a single package but yes, the two CCXs are connected via their Infinity Fabric. AMD haven't released exact performance numbers so we can't really compare to what you'd see in a dual Xeon configuration using QPI. Since it's on the same die, I imagine it's quicker, but I have no evidence of this.

There are some figures floating about that someone posted in the other thread, with the ns timings all detailed out.

spoffle · 13 Mar 2017 at 16:30

cosmogenesis said:
If you have the money for 8-core CPUs then I reckon you're gaming at 2K or 4K, which makes Ryzen good value for money going forward in gaming. The current crop of issues will be ironed out, and as consoles going forward will feature Zen-like CPUs it's good for AMD.

Neither 1920x1080 nor 2560x1440 is 2K.

DragonQ · 13 Mar 2017 at 16:42

Curlyriff said:
There are some figures floating about that someone posted in the other thread, with the ns timings all detailed out.

That was me but I don't have numbers for dual-CPU QPI.

spoffle · 13 Mar 2017 at 16:44

ubersonic said:
The Tech Talk streams he does (sadly they are ending soon) with Jerry "Barnacules" Berg are awesome: tech news and perspectives with humour. Probably the most comical moment was when they were discussing something about Microsoft (where Jerry worked as a senior software developer for 15 years) and they suddenly realised that Jerry was actually the lead developer on the automation software that later got him and his department laid off XD

Why's he stopping tech talk?

haszek · 13 Mar 2017 at 17:39

spoffle said:
Why's he stopping tech talk?

Because he's got no time to produce it anymore. He might be a guest on Jerry's 'Talk', as Jerry admitted he is interested in creating and producing his own show.

AllBodies · 13 Mar 2017 at 19:57

MDPlatts said:
I would expect CCX to CCX transfers to be one of the big areas of effort for improvement coming to Zen+/Zen++

I assume so too. And very high MHz RAM support, since RAM speed itself increases CCX transfer rate with the current design.

DragonQ · 13 Mar 2017 at 20:09

AllBodies said:
I assume so too. And very high MHz RAM support, since RAM speed itself increases CCX transfer rate with the current design.

Yep, fixed 2:1 ratio IIRC.
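A minimal sketch of what that ratio implies, reading the 2:1 as DDR4 effective data rate to fabric clock. That reading is an assumption on my part, and the function name is made up for the example.

```python
def fabric_clock_mhz(ddr_rate_mts):
    """Fabric clock implied by a fixed 2:1 ratio against the DDR4 data rate.

    Assumes the ratio is effective data rate (MT/s) to fabric clock (MHz),
    i.e. the fabric runs at the actual memory clock.
    """
    return ddr_rate_mts / 2


for rate in (2133, 2666, 3200):
    print(f"DDR4-{rate}: fabric at about {fabric_clock_mhz(rate):.0f} MHz")
```

Under that assumption, going from DDR4-2133 to DDR4-3200 lifts the fabric clock by about half, which is why faster RAM would help CCX-to-CCX transfers and not just memory bandwidth.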

MDPlatts · 13 Mar 2017 at 22:12

ubersonic said:
It's somewhat similar to having two 4c8t CPUs as opposed to one 8c16t CPU then?

Yes - but that is no different to, say, a dual-CPU Xeon server, where the OS has to be aware of the NUMA nature of the server so that similar work is placed on one CPU/socket (CCX) and other, different work (or the same type of work if there is enough to go around) is sent to the other.

Once Naples comes out the same will be true there too - but with up to 16 CCXs (2 x 8/socket) to spread the work around.

The scheduler will also need to know the difference between CCX transfers within a socket and CCX transfers to the other socket.

The Naples previews which came out last week (http://techreport.com/news/31549/amd-naples-platform-prepares-to-take-zen-into-the-datacenter) were phenomenal in their performance vs Intel's top Xeon 2P chips - 2x faster like-for-like or 2.5x unleashed (until the Skylake-based ones come out in 3-4 months, which will be a little closer). Sure, the Intel 8P ones could go higher, and I'm sure AMD are working on chipsets/chips to get them into the 4P/8P market once they control the 2P space.

Once the MS scheduler is able to work this out (like the Linux one does), so that it treats it as two quad-core + HT CPUs while also knowing the difference between logical and physical cores, the work rate will go up quite a bit.
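Until the scheduler handles this itself, the "two quad-core + HT CPUs" view can be approximated by hand. Below is a minimal sketch that groups logical CPUs into per-CCX sets and, on Linux only, pins a process to one of them with `os.sched_setaffinity`. The consecutive CPU numbering is an assumption (real numbering varies by OS and firmware), and `ccx_cpu_sets` and `pin_to_ccx` are hypothetical helper names.

```python
import os


def ccx_cpu_sets(num_ccx=2, cores_per_ccx=4, threads_per_core=2):
    """Group logical CPUs into one set per CCX.

    Assumes logical CPUs are numbered consecutively within each CCX,
    which is an illustration rather than a guarantee.
    """
    per_ccx = cores_per_ccx * threads_per_core
    return [set(range(i * per_ccx, (i + 1) * per_ccx)) for i in range(num_ccx)]


def pin_to_ccx(ccx_index, pid=0):
    """Restrict a process to the CPUs of one CCX (Linux only).

    Returns the CPU set applied, or None when the platform has no
    sched_setaffinity or none of the wanted CPUs are available.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = ccx_cpu_sets()[ccx_index] & os.sched_getaffinity(pid)
    if not allowed:
        return None
    os.sched_setaffinity(pid, allowed)
    return allowed


if __name__ == "__main__":
    # Show the grouping for an 8c16t part; pinning is left to the caller.
    for index, cpus in enumerate(ccx_cpu_sets()):
        print(f"CCX {index}: logical CPUs {sorted(cpus)}")
```

Calling `ccx_cpu_sets(num_ccx=16)` gives the same grouping for a 2P Naples box with 16 CCXs, although a real NUMA-aware setup would also need to distinguish same-socket from cross-socket hops.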

Panos · 13 Mar 2017 at 22:31

h4rm0ny said:
I believe cache is local to its CCX. So if you switch something over to a different CCX, it's starting with a blank cache.

Yes. And there is where the latency exists.

Also, AMD could greatly improve performance by putting either a tiny L4 between the 2 CCXs or even an eDRAM, like Intel did with its eDRAM-equipped Broadwell chips.
Hell, even a single stack of HBM1 would make a humongous difference.

Which makes me wonder whether that chip AMD designed and showed last week, with the 4 HBM2 stacks on top of the cores, is for a future APU. I am intrigued about its performance.

SiDeards73 · 14 Mar 2017 at 06:43

Anyone with Ryzen and a 1070 able to do a 1440p DX11 and DX12 benchmark run on The Division for me please? Set it to medium or high, not ultra, so I can compare my 4770K against it. I'm not happy with my current performance in it as I now seem to get a lot of FPS dips. Not sure if it's Nvidia drivers or the game, but something is not right.

If someone can benchmark it, I can compare against my 1070 and 4770K to see if it's going to be an FPS upgrade or at least match it. I imagine it will be 10x smoother on Ryzen though.

RavenXXX2 · 14 Mar 2017 at 12:52

https://www.youtube.com/watch?v=5LSYqaNwfL4&t=0s

I fear for the Ryzen 5 range.

Curlyriff · 14 Mar 2017 at 12:55

RavenXXX2 said:
https://www.youtube.com/watch?v=5LSYqaNwfL4&t=0s

I fear for the Ryzen 5 range.

That video does nothing to suggest why the R5 would be any different to the R7, or why it is an issue regardless. So far most of what you have linked or typed seems completely illogical to me, in all honesty.

Loque · 15 Mar 2017 at 02:14

If anyone has Gears of War 4 and a GTX 1080, could you post a screenshot of your benchmark results? My results don't seem to look right to me...

humbug · 15 Mar 2017 at 02:35

Curlyriff said:
That video does nothing to suggest why the R5 would be any different to the R7, or why it is an issue regardless. So far most of what you have linked or typed seems completely illogical to me, in all honesty.

Ignore him, he has been trying very hard to put a downer on Ryzen for a long time.
