
*** AMD "Zen" thread (inc AM4/APU discussion) ***

All this is pseudo-logic.
Gaming performance is lower than it perhaps should be, IMC latency is a little higher than Intel's, and performance scaling with memory speed is better than Intel's - put all that together and it must be the reason, right? Sites like PCPer concluded it must be the CCX interconnect latency slowing down the kind of work games depend on.

That ^^^ is a theory.

The 2400G has no CCX interconnect latency because it has no CCX interconnect, and yet it's no better in games than the chips that do have the interconnect and its inherent latency.

The 2400G is the experiment that tests that theory: there is no difference in gaming performance between the two, so the experiment proves the theory wrong.

There could be any number of reasons for the perceived lower gaming performance. I say perceived because a CPU like the 8700K, clocked 25% higher than the 1600, is nine times out of ten significantly less than 25% faster in games. The same goes for Ryzen's performance scaling with memory speed - maybe it's better than Intel's simply for other reasons. The 2400G has no CCXs and yet it scales the same way. Explain that.

Wut?? I am talking about memory controller latency, not inter-CCX latency - we are talking about two different things. The reduction in L3 cache and going to one CCX has not really affected gaming performance, but it hasn't fixed it in a number of games either. You have entirely forgotten about the Phenom II X6 having the same issue - it responded well to CPU-NB tweaks and lower-latency RAM. You should know that.

[attached image: kuSeTfA.png - memory bandwidth/latency results]

That is with 3200MHz DDR4 - the memory read/write results are HIGHER for AMD than Intel, but latency is significantly higher.

Ryzen works better with higher-speed, lower-latency RAM - now maybe look at how game engines work, especially older engines. If it's not due to CCX communication then it is something else.

The Core i5 8400 runs at a similar clockspeed, yet the IPC difference between Haswell and Skylake can't explain why a Core i5 8400 still ends up faster than a Ryzen 5 at similar clockspeeds in games:

http://www.eurogamer.net/articles/d...-coffee-lake-s-core-i5-8400-i5-8600k-review_1
https://www.techspot.com/review/1514-core-i5-8400-vs-overclocked-ryzen-5-1600/

Some of those differences are up to 30% between the Core i5 8400 and Ryzen 5 1600/1600X, and the Ryzen 5 has double the threads of the Core i5, and the engines which show differences don't scale well beyond 4 cores. It's not an IPC or clockspeed issue - just look at the non-gaming results. If it were, the Core i5 8400 should win by no more than about 5% in lightly threaded stuff.

It shouldn't even be close in engines which thread well, given half the threads - and indeed it doesn't win over a Ryzen 5 1600 in a rendering task!

This is not a clockspeed or IPC issue. It could be an optimisation issue, but IMHO the Core i5 8400 performs better relative to Ryzen than it should, even in cases where it shouldn't.

If you analyse websites which show both gaming and non-gaming performance, it's obvious that Ryzen's gaming performance relative to Intel trails its non-gaming performance. So it can't all be optimisation issues, since those would affect non-gaming scenarios too - and not all sites use the latest software either!

Even AMD has said that Ryzen was their worst-case scenario, and this is their first DDR4 controller - the Bristol Ridge (BR) DDR4 controller was apparently licensed from another company.

Anyway we can agree to disagree and leave it at that!
 
@Martini1991 It's very simple: multi-threaded scaling is not 100%. I score 160 single-threaded and 1325 with all 12 threads - roughly 70% of perfect scaling (160 × 12 = 1920). If I can make changes to the architecture that bring scaling up to 80%, my MT score goes up from 1325 to roughly 1535, and yet my per-core performance remains the same.
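As a sanity check, here is the same scaling arithmetic sketched in Python. The 160/1325 scores are the poster's own numbers; note that 80% of perfect 12-thread scaling works out nearer 1535 than the 1460 sometimes quoted, because a 10-point efficiency gain is more than a flat 10% score bump:

```python
# Sketch of the multi-threaded scaling arithmetic from the post above.
# Scores (160 1T, 1325 12T) are the poster's numbers, not measured here.

def scaling_efficiency(st_score: float, mt_score: float, n_threads: int) -> float:
    """Fraction of perfect linear scaling actually achieved."""
    return mt_score / (st_score * n_threads)

eff = scaling_efficiency(160, 1325, 12)
print(f"current scaling: {eff:.0%}")               # ~69% of perfect

# If architectural changes lifted scaling to 80%, the MT score becomes:
mt_at_80 = 160 * 12 * 0.80
print(f"MT score at 80% scaling: {mt_at_80:.0f}")  # 1536
```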

Ah, I see where you're coming from.

But your logic's a bit whack.
If you ran the benchmark without SMT, your scaling would be way up.
It's wrong to lump SMT threads in with your physical cores and make an overall scaling claim.
 


Take one of those results, Tomb Raider: performance was initially much worse, then Square Enix made some tweaks and released a patch that improved performance greatly. Square that with the IMC-latency explanation.

You're still using 'connect-the-dots' absolutes. Look at the other half of those games and the performance is pretty much on par - is the IMC latency being choosy now?

Or is it simply that most current games were made for Intel CPUs? Why could that not be a reason?
 
Has this been posted before in the thread:

https://www.anandtech.com/show/1243...ew-with-dr-gary-patton-cto-of-globalfoundries

It's an interview with the CTO of GlobalFoundries. It also covers their roadmaps and what their new nodes bring to the table.

Q5: On the Samsung deal: when it came time for GlobalFoundries to say its custom 14nm process did not work, and GF would have to license somebody else, was that just a factor of resources being split?

GP: This was technically before my time so I don't know all of what went into the decision, but I think it was a smart move. They were trying to bring up this factory (Fab 8) and develop a new technology at the same time, and that is very tough. Bringing into the fab a technology that had largely been debugged was brilliant: it allowed us to get to work straight away on 14nm. We have of course done extensions on that technology and boosted the performance, and we have a number of customers that want to keep getting improvements without having to wait for a jump to the next node. Customers want performance kickers as well as some density improvements, like in 12LP, which offers about a 10% performance improvement and a 15% circuit density improvement.

Q6: So was 12LP originally called 14+?

GP: No, and we never really had a 14+ per se. We have had what we call BKM, which relates to ‘performance bump’ improvements. I think some companies would call it a plus, you would call it a plus. But 12LP is a completely separate thing, so we had customers who were pushing us for some density improvement and so we did some optimization of the middle line and back-end - it's not a pure optical shrink. We wanted to do it with minimum disruption to the design IP that had already been developed, for time to market.

Q7: So it's not a pure optical shrink - is there a partial optical shrink?

GP: Yes, the middle line and back-end (BEOL) is where we did our tuning.

Q8: So is 12LP in high volume manufacturing now?

GP: It is ramping with Q1 production. We will be ramping through the year.

Q9: As far as what we've been able to determine, the difference between 14LPP and 12LP is just a difference in going from 9T to 7.5T design, with that tuning. Are there other changes?

GP: Along with the track changes, we also changed the middle and back-end-of-line ground rules. One thing that I think we're noting is that the industry focuses a lot on pitches, but that's like having half the story. So much of it is also all the little subtle secondary ground rules.

For example, we have a shrink on our 7nm from 14nm that is 0.37x scale. So it's more than 50% scale at a logic library level. When we first started, we were more like 0.50x or 0.55x and then there was a lot of work with our partners on all the secondary ground rules. How you route the wiring played such a huge role.

At least one of our competitors spends a lot of time talking about the pitches and the things like that, but at the end of the day, a lot of it is how you develop all the rules for the back end and how you develop the routing. We went from 0.50x to 0.37x without changing the pitches. It was all optimization of these secondary ground rules.

For something like 12LP, we try to make it as seamless as possible for our customers. They've invested all this money in the platform, but with the minimal investment, they want to be able to get improvements and enhancements because it is taking longer to get from node to node.

We will intentionally skip 10nm. I call 10nm more of a ‘half-node’ you know, and there are some people who are very focused on the Christmas season. Therefore they have to get something out and it may not have a great deal of improvement for you.

Q10: The mobile industry likes to do that, with the smaller chips. What is your opinion on these half-nodes?

GP: Yeah, and you saw 20nm ended up being a very weak node. It was the end of the road on planar transistors and as a result, you were basically fighting electrostatics. It didn't have a lot of performance gain, and it didn't have a lot of density. The same thing has happened with 10nm - I mean if you look at the scaling and the performance, it is a pretty weak node. We want to focus on nodes that will give a very strong value proposition, so we are focusing very hard on 7nm, and making sure for customers jumping from 14nm to 7nm that we are giving them a really significant improvement. So we scaled it. I say simple logic is scaled 0.37x, because we know with these advanced nodes you are adding a lot of complexity.

That is the problem with 10nm: you've added a lot of masks, and you've got some scaling, but when you combine it the cost improvements are not that great. We're getting a full node, or greater, of cost improvement going from 14nm to 7nm. You are getting well over a full node on the cost scale, and a very substantial performance improvement with the technology.
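The density claims in the last few answers can be sanity-checked with quick arithmetic, using only the figures quoted in the interview (the 0.37x and ~0.50x numbers are from the answers above):

```python
# Quick check of the 14nm -> 7nm scaling figures quoted in the interview.
area_scale = 0.37    # final logic-library area scale factor
early_scale = 0.50   # where development started (~0.50x-0.55x)

# How many 7nm cells fit in one 14nm cell's footprint:
density_gain = 1 / area_scale
# Density won purely from tuning the secondary ground rules:
ground_rule_gain = early_scale / area_scale

print(f"14nm -> 7nm density improvement: {density_gain:.2f}x")        # ~2.70x
print(f"gain from ground-rule tuning alone: {ground_rule_gain:.2f}x") # ~1.35x
```

That ~2.7x density is why GP calls it "more than 50% scale" - 0.37x area is a 63% reduction.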

Q14: Will GlobalFoundries 7nm have different processes focused on high performance, on mobile, and on low power?

GP: We use different libraries [in the same process]. Libraries are optimized differently, so we have one that is a 2 fin library, so it is ultra-dense, and another that is a 4 fin library for maximum performance. The other thing we did different to some of the other players is that we built a lot of flexibility into our back end, so where there is high performance computing, there is a place where you're going to want to run wider lines and they can do that; we have a place where you can put larger vias, and they can do that. So where IR drop is an issue on a critical path, they can do that with the way we designed our back end line.

Q15: With the first generation of 7nm, do you expect to be high volume production by the end of the year?

GP: By the end of the year or most likely in early 2019, with a couple of key partners. Our ASIC customers, of which there are quite a few, are also lead users of our 7nm process.

Q16: So with the first generation of 7nm, with quad patterning involved, whenever we talk to people involved in semiconductor design, if you mention quad patterning they tend to give an awkward look. As you have seen the semiconductor industry grow, are you fully confident in the quad patterning capabilities?

GP: Where we are applying it, that hasn't been a big challenge. It does make the process longer, which has a knock-on effect when climbing the yield curve, but it has not been one of the major issues.

Q17: Does the first generation of 7LP target higher frequency clocks than 14LPP?

GP: Definitely. It is a big performance boost - we quoted around 40%. I don't know how that exactly will translate into frequency, but I would guess that it should be able to get up in the 5GHz range, I would expect.

Q18: So you would do a custom version of 7LP for IBM, who is currently running 5.2GHz on its 10 core chips - could you also perhaps translate that 40%?

GP: I'm not a system guy so I wouldn't want to commit IBM to it! But certainly, we are very focused on delivering to IBM the performance they need for the next generation, for power and integration into systems. Some of us have worked with them for many years, so we have a good understanding of what they need.

So now we know the difference between the GF low-power and performance versions of their current nodes - LP must use the 2-fin library and LPP the 4-fin library.
 

14nm LPP was/is a 3GHz node; +40% = 4.2GHz. Of course the quoted speed is the optimal point - Ryzen on that 3GHz node can and does run at anything up to and including 4.2GHz.
 
Yup, but what speed node is 12nm LP? If it is a 3.3GHz node (a 10% improvement), then add 40% to that and it is a ~4.6GHz optimal process.
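The node-speed arithmetic in these two posts, written out. The 3GHz "optimal" baseline and the 10%/40% uplifts are the posters' assumptions, not official GF figures:

```python
# Back-of-the-envelope node frequency arithmetic from the posts above.
# The baseline and uplift percentages are the posters' assumptions.
base_14lpp = 3.0                  # GHz, quoted optimal speed for 14nm LPP
seven_lp = base_14lpp * 1.40      # +40% claimed for 7LP over 14LPP
twelve_lp = base_14lpp * 1.10     # +10% claimed for 12LP over 14LPP
seven_from_12 = twelve_lp * 1.40  # if 7LP's +40% stacked on a 3.3GHz node

print(f"7LP from 14LPP: {seven_lp:.1f} GHz")       # 4.2 GHz
print(f"12LP:           {twelve_lp:.1f} GHz")      # 3.3 GHz
print(f"7LP from 12LP:  {seven_from_12:.2f} GHz")  # 4.62 GHz
```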
 
The 2400G has no CCX interconnect, and its gaming performance is exactly the same as the 1500X's - blaming inter-CCX latency for gaming performance is a whole lot of Male Bovine Manure.

I think you're right, but (I think) there are two reasons:
  1. The CCX impact was overblown, especially if you're running faster RAM and hence higher fabric clocks.
  2. The CCX impact is lessened by the kernel/software understanding the architecture, i.e. a core on one CCX should try not to touch the memory/cache pool allocated on another CCX. Windows is pretty good at this these days - just look at Threadripper performance in NUMA mode.
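Point 2 can be sketched with CPU affinity: keep a process's threads on one CCX so they share an L3 slice and never pay cross-CCX fabric latency for cache traffic. This assumes a hypothetical core numbering (logical CPUs 0-3 on CCX 0) - real enumeration varies per system, and the call is Linux-only:

```python
import os

# Hypothetical first-gen Ryzen layout: logical CPUs 0-3 on CCX 0,
# 4-7 on CCX 1. Check lscpu / /proc/cpuinfo for the real mapping.
CCX0 = {0, 1, 2, 3}

# Pin the current process (pid 0 = self) to CCX 0 so its threads
# share one L3 slice and avoid cross-CCX hops for shared data.
os.sched_setaffinity(0, CCX0)

print(os.sched_getaffinity(0))
```

Windows exposes the same idea via SetProcessAffinityMask, and a NUMA-aware scheduler does it automatically - which is exactly the point being made above.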
 
That's a 9% bump in clock speed. If there's a 5% IPC gain on top of that, it's not bad, not bad at all :)

[attached image: 2-1080.830233650.jpg]


3.9-4.4GHz for the 2800X?
 
Leaked Zen+ and Zen scores compared to measure IPC (taken off Reddit):

https://imgur.com/a/Do2dd
Only the last two graphs have value and, as Gavin said, the scaling means you have to be careful interpreting them. The supposed IPC difference is between -1% and +9%, but most results are within 2-3%, which I would consider within the margin of error for tests like these, particularly when the source is questionable.
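For anyone following along, these leaked comparisons estimate IPC by dividing score by clock before comparing. A sketch with made-up numbers, showing why a higher score alone doesn't mean higher IPC:

```python
# Clock-normalise benchmark scores to estimate an IPC delta, as the
# leaked Zen vs Zen+ comparisons do. All numbers here are made up.

def ipc_delta(score_a: float, clock_a: float,
              score_b: float, clock_b: float) -> float:
    """Percent IPC difference of B over A after removing clock speed."""
    return (score_b / clock_b) / (score_a / clock_a) * 100 - 100

# A 4.4GHz part scoring 9% higher than a 4.0GHz part has slightly LOWER IPC:
print(f"{ipc_delta(100, 4.0, 109, 4.4):+.1f}%")  # -0.9%
```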

If new CPUs can hit 4.4GHz consistently then I'll consider it a positive.
At the moment they mostly max out at 3.9GHz.
Agreed, a minor IPC bump coupled with a ~10% clock boost is reasonable. For comparison, Skylake to Kaby Lake was only a ~5% clock bump with no IPC change.
 

Yup, that's basically the important part. The higher 'reasonable speed' for the process could put that in range once pushed. If there are tangible IPC improvements at stock, then it literally boils down to 'is what you're running ragingly single-core?' If not, there's quite a bit of performance gain on top.
-1% to +9% IPC depending on what you're doing... as more things go multicore, that single-thread speed along with Precision Boost 2 smoothing might keep it even with or faster than Intel in 9/10 cases (it's a silly logical extension but... it kinda works - the parts with -1% IPC being the Intel wins), with 2 extra cores, 4 extra threads and a similar price (maybe a bit lower).

I'm genuinely not being blinkered about it; there's going to be a lot of the gaming side Intel still does better at, but the lead is definitely reduced considerably.
 