Intel to launch Ryzen killer - the Core i9-10990XE (rumour)

Caporegime
Joined
17 Mar 2012
Posts
47,741
Location
ARC-L1, Stanton System
Oh I agree, I was thinking more of a separate add-on EDRAM die of some significant size for caching benefits. I don't see it happening either, but it's all I can come up with as a way for them to give their new chips a bit of a performance boost.

It would improve IPC in some tasks, where it could cache more branch prediction data. However, an EDRAM die wouldn't help with x86 execution; it defeats the objective of an L3 cache. The bandwidth isn't anywhere near high enough, and with it being external to the core die the latency is too high.

AMD use a huge L3 to get around the latency of having external memory controllers. Games move a lot of data around between the L3 and RAM, and having an external IMC increases the latency of that, which in turn reduces IPC. If you're executing 4 tasks per cycle, a 20ns delay moving data out to the IMC bottlenecks the tasks you have queued up; using a large L3 to queue those tasks helps stop the core from stalling out, or idling.

An external EDRAM just creates a bottleneck between it and the core, and even without that it would only help with large data transfers. If you're executing 3 tasks per cycle compared to 4, your core is still slower. AMD's Zen 2 core is faster than Intel's Coffee Lake core. Much faster.
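
Just to put rough numbers on that last point, here's a back-of-the-envelope Python sketch of what a 20ns trip out to the IMC does to a 4-wide core, and how much a bigger L3 (i.e. a higher hit rate) claws back. The clock speed, memory access rate and hit rates are all made-up illustrative numbers, not measurements of any real chip:

```python
# Back-of-the-envelope model of the stall argument above. All numbers are
# assumptions for illustration (4-wide core at 4 GHz, 30% of instructions
# touching memory); only the 20ns IMC penalty comes from the post itself.
# Effective IPC = width / (1 + stall cycles per instruction), the classic
# "CPI plus miss rate times miss penalty" textbook model.

CLOCK_GHZ = 4.0         # assumed core clock
WIDTH = 4               # instructions retired per cycle when nothing stalls
MEM_ACCESS_RATE = 0.3   # assumed fraction of instructions that touch memory
IMC_PENALTY_NS = 20     # the extra latency of going out to the external IMC

def effective_ipc(l3_hit_rate):
    """Effective IPC when every L3 miss pays the full IMC penalty."""
    penalty_cycles = IMC_PENALTY_NS * CLOCK_GHZ          # 20ns @ 4GHz = 80 cycles
    stalls_per_insn = MEM_ACCESS_RATE * (1 - l3_hit_rate) * penalty_cycles
    return WIDTH / (1 + stalls_per_insn)

for hit_rate in (0.90, 0.95, 0.99):
    print(f"L3 hit rate {hit_rate:.0%}: ~{effective_ipc(hit_rate):.2f} IPC")
```

Even with those made-up numbers you can see why a big L3 is worth the die area: going from a 90% to a 99% hit rate nearly triples the effective IPC in this toy model.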
 
Soldato
Joined
26 Oct 2013
Posts
4,025
Location
Scotland
It would improve IPC in some tasks, where it could cache more branch prediction data. However, an EDRAM die wouldn't help with x86 execution; it defeats the objective of an L3 cache. The bandwidth isn't anywhere near high enough, and with it being external to the core die the latency is too high.

AMD use a huge L3 to get around the latency of having external memory controllers. Games move a lot of data around between the L3 and RAM, and having an external IMC increases the latency of that, which in turn reduces IPC. If you're executing 4 tasks per cycle, a 20ns delay moving data out to the IMC bottlenecks the tasks you have queued up; using a large L3 to queue those tasks helps stop the core from stalling out, or idling.

An external EDRAM just creates a bottleneck between it and the core, and even without that it would only help with large data transfers. If you're executing 3 tasks per cycle compared to 4, your core is still slower. AMD's Zen 2 core is faster than Intel's Coffee Lake core. Much faster.

I understand; I am not passing it off as an answer to Zen 2. I was more musing about the possibility of it giving them something that could mean people have a reason to buy it over Zen 2/3, even if they can sell it as helping minimum frame rates in games or something along those lines.

The L2/L3 layout in Zen is way beyond what Intel can do right now, especially on a monolithic die, and it's only going to get worse. I do wonder what Intel are working on in secret; surely it has to be a game changer.
 
Soldato
Joined
1 Jun 2013
Posts
9,315
The L2/L3 layout in Zen is way beyond what Intel can do right now, especially on a monolithic die, and it's only going to get worse. I do wonder what Intel are working on in secret; surely it has to be a game changer.

How can they make it work if they can't manufacture it on 10nm/7nm? Process node improvements have been integral to Intel's tick/tock philosophy. Everyone started having problems when moving to 10nm/7nm, but where AMD innovated a way past that with designs that were capable of being manufactured (and had TSMC to make it work), Intel's progress got badly stalled. It could take Intel years to recover from these missteps, at a time when they thought it didn't matter and they could just keep selling incremental improvements to a ten-year-old architecture.

Intel has not just been caught out because they've been unable to find a way forwards (which they didn't think would impact them much), but because at the same time AMD has found a way through the same problems. You can't count Intel out; they have a massive pile of money, but they seem to be quite risk-averse, and happy to milk the same thing for years. AMD doesn't have that luxury and has made a big jump ahead that will not be easy for Intel to catch up on.
 
Last edited:
Soldato
Joined
26 Oct 2013
Posts
4,025
Location
Scotland
How can they make it work if they can't manufacture it on 10nm/7nm? Process node improvements have been integral to Intel's tick/tock philosophy. Everyone started having problems when moving to 10nm/7nm, but where AMD innovated a way past that with designs that were capable of being manufactured (and had TSMC to make it work), Intel's progress got badly stalled. It could take Intel years to recover from these missteps, at a time when they thought it didn't matter and they could just keep selling incremental improvements to a ten-year-old architecture.

Intel has not just been caught out because they've been unable to find a way forwards (which they didn't think would impact them much), but because at the same time AMD has found a way through the same problems. You can't count Intel out; they have a massive pile of money, but they seem to be quite risk-averse, and happy to milk the same thing for years. AMD doesn't have that luxury and has made a big jump ahead that will not be easy for Intel to catch up on.

You're right. I was thinking more of moving to chiplet designs to try and get the most out of 10nm, or something like that. They must be spending their R&D money somewhere just now, and there's no way in hell the constant refreshes are their long-term strategy.

They have to be building a future strategy now or they are in huge trouble. I don't know what they are planning or how they plan to do it, but they must be doing something big.
 
Soldato
Joined
26 Sep 2010
Posts
7,164
Location
Stoke-on-Trent
They must be spending their R&D money somewhere just now, and there's no way in hell the constant refreshes are their long-term strategy.
We've already seen glimpses of where Intel is going. Look at Foveros and Lakefield. They're already looking at 3D stacking and a chiplet approach, although not in the same way as AMD.

Intel always had plans to go in this direction, but they were only going to release them when they needed to and had 10nm working to support it. The trouble now is AMD have forced Intel into pushing this stuff out when they're not actually able to do so.
 
Soldato
Joined
26 Oct 2013
Posts
4,025
Location
Scotland
We've already seen glimpses of where Intel is going. Look at Foveros and Lakefield. They're already looking at 3D stacking and a chiplet approach, although not in the same way as AMD.

Intel always had plans to go in this direction, but they were only going to release them when they needed to and had 10nm working to support it. The trouble now is AMD have forced Intel into pushing this stuff out when they're not actually able to do so.

I know about the 3D stacking but hadn't really seen anything about their own chiplet-based tech. I am just thinking out loud rather than actually expecting or pretending I know what they will do anyway.

Thanks for the input guys, gives me something to look into :).
 
Soldato
Joined
26 Sep 2010
Posts
7,164
Location
Stoke-on-Trent
I know about the 3D stacking but hadn't really seen anything about their own chiplet-based tech. I am just thinking out loud rather than actually expecting or pretending I know what they will do anyway.

Thanks for the input guys, gives me something to look into :).
From what I gather, Intel are doing this hybrid thing with Lakefield where they have a 10nm dual core which is pretty damn quick (clocks are reasonable but IPC is a huge jump) and that acts as the primary processor. Then any workloads which require more cores get offloaded onto a comparatively big-ass 14nm secondary CPU. The idea being single-thread operations run on the IPC monster, multi-threaded operations run on the core monster. Throw in 3D stacking to get some common cache and latencies down. Take the best yields for laptop parts where power efficiency is king, and theoretically use the leaky, inefficient cruft for desktop.
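
If it helps to picture the trade-off, here's a toy Amdahl's-law style calculation of when a job is better off on the single fast core versus the cluster of slower ones. It has nothing to do with Intel's actual scheduling logic, and the relative speeds and core count are invented for illustration:

```python
# Toy model of the "IPC monster vs core monster" trade-off described above.
# The relative speeds and the core count are invented; this is just Amdahl's
# law applied to two hypothetical clusters, not Intel's scheduler.

FAST_CORE_SPEED = 1.5   # assumed per-core speed of the big core (relative)
SLOW_CORE_SPEED = 1.0   # assumed per-core speed of the smaller cores
SLOW_CORE_COUNT = 4

def time_on_fast_core(parallel_fraction):
    """Whole job on the single fast core: extra parallelism doesn't help."""
    return 1.0 / FAST_CORE_SPEED

def time_on_slow_cluster(parallel_fraction):
    """Amdahl's law: only the parallel part is split across the slow cores."""
    serial = 1.0 - parallel_fraction
    return (serial + parallel_fraction / SLOW_CORE_COUNT) / SLOW_CORE_SPEED

for p in (0.0, 0.5, 0.9):
    fast, cluster = time_on_fast_core(p), time_on_slow_cluster(p)
    winner = "fast core" if fast < cluster else "slow cluster"
    print(f"parallel fraction {p:.0%}: fast core {fast:.2f}, cluster {cluster:.2f} -> {winner}")
```

The crossover point is exactly the decision a scheduler would have to get right, which is what the scepticism further down the thread is about.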
 
Soldato
Joined
26 Oct 2013
Posts
4,025
Location
Scotland
From what I gather, Intel are doing this hybrid thing with Lakefield where they have a 10nm dual core which is pretty damn quick (clocks are reasonable but IPC is a huge jump) and that acts as the primary processor. Then any workloads which require more cores get offloaded onto a comparatively big-ass 14nm secondary CPU. The idea being single-thread operations run on the IPC monster, multi-threaded operations run on the core monster. Throw in 3D stacking to get some common cache and latencies down. Take the best yields for laptop parts where power efficiency is king, and theoretically use the leaky, inefficient cruft for desktop.

That's a very complicated way to get around it; it's almost following the Big.LITTLE setup that ARM uses for its phones, with additional 3D stacking. It makes sense for current use, but things are getting more multithreaded every year, so I wonder how it would fare in 3-4 years' time doing something like that.
 
Associate
Joined
11 Dec 2016
Posts
2,023
Location
Oxford
The idea being single-thread operations run on the IPC monster, multi-threaded operations run on the core monster
Good luck teaching the (Windows) scheduler which tasks should go where.
I think even the Big.LITTLE setup just runs all big or all little cores at one time, to take that decision away from the kernel.

The idea of dedicated fast cores has been implemented: AMD and Intel are already marking their best-clocking cores. (And after a couple of years maybe Windows can figure it out without 1usmus helping them.) It gives low-threaded tasks a boost when they run on the selected cores.
So we get the best of both worlds: all cores are the same architecture and capability, so there is no added complexity or wasted silicon for multithreaded workloads, and at low load cores can boost higher to mimic your high-IPC monsters.
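
For what it's worth, on a Linux box whose firmware exposes the ACPI CPPC tables you can actually see that preferred-core ranking from userspace and pin a lightly threaded task to the top core yourself. A rough sketch (Linux-only sysfs paths, and only on CPUs/firmware that publish CPPC data):

```python
# Rough sketch: read the per-core CPPC "highest_perf" values that preferred-core
# boosting is based on, then pin this process to the best-ranked core.
# Linux-only, and only on systems that expose acpi_cppc in sysfs.
import glob
import os

def ranked_cores():
    """Return logical CPU ids sorted by advertised highest_perf, best first."""
    scores = []
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/acpi_cppc/highest_perf"):
        cpu = int(path.split("/")[-3][3:])          # ".../cpu12/..." -> 12
        with open(path) as f:
            scores.append((int(f.read()), cpu))
    return [cpu for _, cpu in sorted(scores, reverse=True)]

if __name__ == "__main__":
    cores = ranked_cores()
    if cores:
        os.sched_setaffinity(0, {cores[0]})         # pin to the preferred core
        print(f"pinned to preferred core {cores[0]}; ranking: {cores}")
    else:
        print("no CPPC data exposed on this system")
```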
 
Caporegime
Joined
17 Mar 2012
Posts
47,741
Location
ARC-L1, Stanton System
Good luck teaching the (Windows) scheduler which tasks should go where.
I think even the Big.LITTLE setup just runs all big or all little cores at one time, to take that decision away from the kernel.

The idea of dedicated fast cores has been implemented: AMD and Intel are already marking their best-clocking cores. (And after a couple of years maybe Windows can figure it out without 1usmus helping them.) It gives low-threaded tasks a boost when they run on the selected cores.
So we get the best of both worlds: all cores are the same architecture and capability, so there is no added complexity or wasted silicon for multithreaded workloads, and at low load cores can boost higher to mimic your high-IPC monsters.

AMD already tried something similar with Bulldozer: 8 half-IPC integer units that could be switched to 4 independent full-IPC integer units, giving low-IPC multithreading and/or high-IPC single threading.

Guess what: Windows looked at it and just treated it as an 8-thread CPU whether it was working with 1 thread or 8, so you never actually got the "High IPC Big Core".
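
A module-aware scheduler would have had to do something like the hypothetical sketch below: spread busy tasks one per module before doubling up, so a lightly threaded job isn't sharing a module's front end and FPU. Linux-only, and it assumes the FX layout where logical CPUs (0,1), (2,3) and so on are module siblings, plus at least 8 logical CPUs to pin to:

```python
# Hypothetical sketch of module-aware placement (not what Windows actually did
# with Bulldozer): take one core from each module first, then fill the siblings.
# Assumes the FX layout where logical CPUs (0,1), (2,3), ... share a module.
import os
import multiprocessing as mp

MODULES = [(0, 1), (2, 3), (4, 5), (6, 7)]   # assumed sibling pairs

def placement(n_tasks):
    """One core per module first, then the second core of each module."""
    first = [pair[0] for pair in MODULES]
    second = [pair[1] for pair in MODULES]
    return (first + second)[:n_tasks]

def worker(core):
    os.sched_setaffinity(0, {core})          # Linux-only; pins this worker to one core
    sum(i * i for i in range(2_000_000))     # stand-in for real integer work

if __name__ == "__main__":
    cores = placement(3)                      # 3 busy tasks land on cores 0, 2 and 4
    procs = [mp.Process(target=worker, args=(c,)) for c in cores]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(f"ran on cores {cores}")
```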
 
Soldato
Joined
26 Oct 2013
Posts
4,025
Location
Scotland
AMD already tried something similar with Bulldozer: 8 half-IPC integer units that could be switched to 4 independent full-IPC integer units, giving low-IPC multithreading and/or high-IPC single threading.

Guess what: Windows looked at it and just treated it as an 8-thread CPU whether it was working with 1 thread or 8, so you never actually got the "High IPC Big Core".

I think it's doable, but whether MS want to put the time into getting it sorted out is another matter. A quick dual core will be irrelevant in a few years' time anyway, in my opinion, so it would need to be upped to a quad pretty quickly.
 
Soldato
OP
Joined
17 Jun 2004
Posts
7,598
Location
Eastbourne , East Sussex.
Zen is a Jim Keller design: he was hired by AMD in 2012 on a 3-year contract and left in 2015, shortly after AMD formally announced Zen as the new product. The launch was in December 2016.

Using those time frames: Keller was hired by Intel in 2018, so if he leaves next year it'll be late 2021 or 2022 before they are 'back in the game'.
 
Last edited:
Soldato
Joined
26 Sep 2010
Posts
7,164
Location
Stoke-on-Trent
it'll be late 2021 or 2022 before they are 'back in the game'.
Assuming Intel have a working process node to support the new designs. It's all well and good having a great new arch, but if it's hamstrung by 14nm+++++++++++++++++++++++++ then it's not worth much. Unless the design is node-agnostic and can be thrown at whatever's viable at the time, which has been muttered by a couple of YouTubers before.
 
Soldato
Joined
1 Jun 2013
Posts
9,315
Assuming Intel have a working process node to support the new designs. It's all well and good having a great new arch, but if it's hamstrung by 14nm+++++++++++++++++++++++++ then it's not worth much. Unless the design is node-agnostic and can be thrown at whatever's viable at the time, which has been muttered by a couple of YouTubers before.

Is there such a thing as "node agnostic" in the world of high-performance CPUs with billions of transistors? I'm sure you can design stuff like that, but it will end up massively slower and less powerful than your competitors, who have matched an optimised design to a particular process node.

I don't see Intel coming up with anything competitive until they can get to 10nm/7nm/5nm. At the same time AMD are advancing, and are rumoured to have made yet more IPC gains with Ryzen 4.
 
Soldato
Joined
26 Sep 2010
Posts
7,164
Location
Stoke-on-Trent
Is there such a thing as "node agnostic" in the world of high-performance CPUs with billions of transistors? I'm sure you can design stuff like that, but it will end up massively slower and less powerful than your competitors, who have matched an optimised design to a particular process node.
Polaris 20 worked rather well copy-pasted to 12nm from its original 16nm (14nm?) design, and Vega 20 worked perfectly fine moving from 14nm to 7nm, so I guess it's entirely possible to create something which can be built on multiple nodes (although those 2 examples are more by fluke than by design :p )

But I agree, in the real world I don't think you can realistically be node-agnostic. You could probably take something designed for a big'un and shrink it to a small'un, but performance and efficiency would more than likely tank if you designed something for a small'un but had to scale it back up to a big'un.

Oh wait, that's Ice Lake!
 