Just 19-20 years ago, extra L3 cache (or any L3 cache at all) helped in almost all workloads

I remember reading in fall 2003 about Intel's Socket 478 Pentium 4 Extreme Edition. It had 2MB of L3 cache and cost around $1000. I was intrigued but did not buy it.

The regular Pentium 4 Northwoods had no L3 cache.

Now we have the Ryzen 7800X3D vs 7700X and the 5800X3D vs 5800X. The difference is 96MB of L3 cache vs only 32MB.

Yet today almost all workloads except gaming do better on the parts with less cache.

Back then, almost every workload, including games, did better on the Pentium 4 Extreme Edition with its 2MB of L3 cache than on the regular Pentium 4 Northwoods that had none.

Why is that? Is it all about the lower clocks on the extra-cache Ryzen parts, whereas the Pentium 4 EE with 2MB of L3 cache had equal or higher clocks?

Or has more changed in 20 years, so that extra L3 cache can be a disadvantage on some CPUs today even at the same clock speed, unlike in the early 2000s?
 
It’s all about correctly sizing the cache for the workload.


So basically, is the extra cache kind of a gimmick in that it only benefits games, whereas 32MB of L3 is the properly optimized cache size for the whole set of workloads?

Was the extra 64MB stacked on top basically a kind of test phase, not optimized for anything except gaming workloads? And if AMD designed a CPU around the extra cache and optimized it, would it be better in all workloads?

I mean, did Intel optimize the L3 cache on the Pentium 4 Extreme Edition to benefit all workloads, so that it was not just a gimmick add-on for specific workload types?
 
No, I’ve long felt that expanded cache is a way to gain performance, especially in the right environment.

Your example isn’t a good one. The K8 family of chips made short work of the Pentium 4 family with or without the same level of cache. The regular P4 simply didn’t have enough cache for its architecture, so increasing the amount of cache helped a lot. It also came with some big disadvantages though…

Ryzen 7000 parts are pretty well sized for the majority of tasks, but just adding more cache to any architecture isn’t going to make everything better all the time.
 
Cache holds recently used data as well as data that the prefetcher thinks the CPU cores will need before they need it. Pulling data from any cache (L1/L2/L3) is much faster than getting it from RAM when it’s needed. In workloads that have tightly packed data, the prefetcher will get most of the required data before it’s needed while the cores do work in parallel. If the cache is too small, the cores sit idle while they wait for data to be pulled in; if it’s too big, the extra capacity does not help and just adds latency. Workloads that use a lot of small random IO do not benefit much from cache, as the prefetcher will not be able to guess what is needed. For this type of work, low latency RAM is more beneficial. An AMD CCX die is more than 50% L3 cache. They could easily double the core count by removing the L3, but overall the CPU would be a lot slower for most things.
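
To put rough numbers on the "too small vs too big" trade-off above, here is a minimal Python sketch of the usual average memory access time formula (hit time plus miss rate times miss penalty). Every latency and miss rate in it is a made-up illustrative value, not a measurement of any real CPU, and `amat_ns` is just a helper name chosen for this example.

```python
def amat_ns(hit_ns, miss_rate, miss_penalty_ns):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_ns + miss_rate * miss_penalty_ns


# All values below are hypothetical, picked only to illustrate the trade-off.
RAM_PENALTY_NS = 70.0   # assumed extra cost of going out to RAM on a miss
SMALL_HIT_NS = 10.0     # assumed hit latency of the smaller cache
LARGE_HIT_NS = 12.0     # assumed (slightly slower) hit latency of the larger cache

# Workload whose data does not fit the small cache: the large cache misses far less.
sensitive_small = amat_ns(SMALL_HIT_NS, 0.30, RAM_PENALTY_NS)
sensitive_large = amat_ns(LARGE_HIT_NS, 0.10, RAM_PENALTY_NS)

# Workload that already fits the small cache: same miss rate, only hit latency differs.
fits_small = amat_ns(SMALL_HIT_NS, 0.02, RAM_PENALTY_NS)
fits_large = amat_ns(LARGE_HIT_NS, 0.02, RAM_PENALTY_NS)

print(f"cache-sensitive: small {sensitive_small:.1f} ns, large {sensitive_large:.1f} ns")
print(f"already fits:    small {fits_small:.1f} ns, large {fits_large:.1f} ns")
```

With these assumed inputs the larger cache comes out well ahead in the first case and slightly behind in the second. That is only the shape of the argument; real figures would differ.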
 
A lot of processing churns through data in a nice linear fashion. Here, the prefetchers fetch data before it's required, so a massive cache doesn't help.
Games, on the other hand, are constantly accessing data all over the place, so keeping more of it in cache makes a difference.
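
One way to picture this claim is a toy LRU cache simulation in Python. It is a sketch under heavy assumptions: there is no prefetcher in the model, the "cache" is counted in abstract lines rather than megabytes, and the two access traces (`streaming`, `scattered`) and the capacities 32 and 96 are invented for illustration. The point is only how hit rate responds to capacity for each access pattern.

```python
import random
from collections import OrderedDict


def hit_rate(accesses, capacity_lines):
    """Replay a trace of cache-line addresses through an LRU cache; return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for line in accesses:
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(accesses)


rng = random.Random(0)   # fixed seed so the run is repeatable
N = 20_000

# Linear pass: every line is touched once, in order, and never reused.
streaming = list(range(N))
# Scattered reuse: random accesses over a hot set of 64 lines.
scattered = [rng.randrange(64) for _ in range(N)]

for name, trace in (("streaming", streaming), ("scattered", scattered)):
    for capacity in (32, 96):
        rate = hit_rate(trace, capacity)
        print(f"{name:9s} capacity={capacity:3d} hit rate={rate:.2f}")
```

In this model the streaming trace gets nothing from the bigger cache because no line is ever reused (on real hardware the prefetcher hides those misses instead), while the scattered trace goes from missing roughly half the time to hitting almost always once the whole hot set fits.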
 
Yep, it's the random nature of it - games are inherently random, and it helps to predict and preload data if you can and store it somewhere that's low latency and easy to access when the CPU needs it.
 
I don’t think games use that much random IO. They use some, but game engines are very organized batch processors. Everything is in large, tightly packed and aligned buffers. They also use a lot of SIMD/AVX, which needs tightly packed and aligned data to work. Feeding these buffers to the CPU is very cache friendly, as it is easy to tell what is needed next. Random IO is NOT cache friendly no matter how much L3 you have.
 
Does more L3 cache often make random IO worse even if clock speed is the same?
 
Now we have the Ryzen 7800X3D vs 7700X and the 5800X3D vs 5800X. The difference is 96MB of L3 cache vs only 32MB.

Yet today almost all workloads except gaming do better on the parts with less cache.

It has nothing to do with having a lower amount of cache at all - in non-cache-sensitive situations the 7700X and 5800X have a clock speed advantage over the 3D parts.

It may "only" be a few hundred MHz, but in the case of properly threaded apps, that's a few hundred MHz * 8 threads for example.


(Whereas the P4 Extreme Edition you reference ran at the same clock speed as the normal-cache version, so it wasn't disadvantaged in non-cache-sensitive apps.)
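
As a back-of-the-envelope version of the "few hundred MHz * 8" point, here is a small Python sketch. The two clock speeds are hypothetical sustained all-core figures chosen for round numbers, not official specifications, and the model assumes one busy thread per core and throughput that scales directly with clock in a workload that does not care about cache.

```python
BUSY_CORES = 8        # one busy thread per core, as in the "* 8" above
FASTER_MHZ = 5000     # hypothetical sustained clock of the non-3D part
SLOWER_MHZ = 4700     # hypothetical sustained clock of the 3D part

per_core_gap_mhz = FASTER_MHZ - SLOWER_MHZ
summed_gap_mhz = per_core_gap_mhz * BUSY_CORES
# Relative gap, assuming throughput scales linearly with clock speed.
relative_gap = per_core_gap_mhz / FASTER_MHZ

print(f"per-core gap: {per_core_gap_mhz} MHz")
print(f"summed over {BUSY_CORES} cores: {summed_gap_mhz} MHz")
print(f"relative throughput gap: {relative_gap:.1%}")
```

The summed figure grows with the number of loaded cores, but under these assumptions the relative gap stays at the same per-core percentage however many cores are busy.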
 
I’d think the non-V-Cache chips can also hit the L3 cache in slightly fewer clocks.

OP just needs a wafer-scale CPU with a massive pool of L1 cache.
 
Why is that? Is it all about the lower clocks on the extra-cache Ryzen parts, whereas the Pentium 4 EE with 2MB of L3 cache had equal or higher clocks?

This is the reason. The 8-core 7800X3D is clocked lower than the corresponding 8-core 7700X, and the loss of performance in most professional tasks is in direct proportion to the lower clock speeds.

Like Cyber-Mav says above, the X3D chips have to run slower with lower Vcore because of the difficulty of cooling the CPU with the 3D stacked cache sitting on top of it. The Ryzen X3D chips have the cache stacked on top of the CCD, so most of the heat generated by the CPU has to pass through the stacked RAM before the cooling solution on top of the IHS can dissipate it.
 
If you look closer, it's not the extra cache that causes a performance hit outside of gaming. It's the drawbacks that come with that cache, such as lower clock speeds and less thermal headroom, that cause the performance hit, not the extra cache capacity.

Oh OK, I see. So basically, at the same clock speeds, extra cache should either make no difference or actually help in all other workloads?

It's just that thermal headroom is much more limited, so clock speeds are more volatile and can drop lower, even much lower, causing performance loss?
 
Depends on whether the cache is required. Talking in very general terms, data flow will show peaks and troughs.
 
Has anyone done benchmarks of a 7700X and 7800X3D both locked to the same clocks to isolate the performance gain of just the cache?

I haven’t. You’d have to lock them at the same clock speed/power level and measure the differences in L3 hit rates. That’s probably well beyond the average reviewer, and I’m not sure the data would be that relevant.
 
That data would show whether more L3 cache hinders performance, as the OP has stated. My suspicion is that the CPU with more cache would outperform the one with less cache in everything tested, clock for clock at the same power targets.
 