My experience of compute workloads is limited, so this is just supposition on my part. In my experience in games, especially Unreal, which is fundamentally a disaster in terms of cache efficiency and multi-threading, we're usually battling quirks in the engine all the time and getting nowhere close to optimal hardware performance. You can get your data all nicely lined up and chew through it in cache-efficient fashion, but then to realise the results in the UWorld you have to go back to the single-threaded, cache-disaster zone.

From my own experience (which may not be as extensive as yours, so feel free to correct me!) I've found that larger L3 caches have more impact on your first use case than your second. That doesn't mean the second isn't still the quickest, but with more scope for speculation and branch prediction it softens the blow of the cache misses?
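To show the shape of what I mean, here's a rough sketch (made-up types, assuming UE's ParallelFor; nothing here is real engine code): the number-crunching pass can be parallel and cache-friendly over packed data, but pushing the results back into UWorld is still scattered UObject access on the game thread.

```cpp
#include "Async/ParallelFor.h"

// Hypothetical tightly packed input data, contiguous in a TArray.
struct FMovementInput
{
    FVector Position;
    FVector Velocity;
};

void TickMovement(const TArray<FMovementInput>& Inputs, TArray<FVector>& OutPositions,
                  const TArray<TWeakObjectPtr<AActor>>& Actors, float Dt)
{
    OutPositions.SetNum(Inputs.Num());

    // Parallel, cache-friendly pass over plain data: no UObjects touched here.
    ParallelFor(Inputs.Num(), [&](int32 i)
    {
        OutPositions[i] = Inputs[i].Position + Inputs[i].Velocity * Dt;
    });

    // Write-back happens on the game thread: scattered actor access, virtual
    // calls, and whatever the engine does underneath. This is the
    // "cache-disaster zone" part.
    for (int32 i = 0; i < Actors.Num(); ++i)
    {
        if (AActor* Actor = Actors[i].Get())
        {
            Actor->SetActorLocation(OutPositions[i]);
        }
    }
}
```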
I've used Intel PCM on Xeon processors to figure out where our bottlenecks are, and noted that we get more of an uplift from a larger L3 cache on the more algorithmic/logical threads than on the data-crunching ones. The data processing threads tend to fit nicely within L2 anyway (once optimised), though larger or more varied datasets do see an increase. We do encoding of large data streams at times; those algorithms are too heavy to 'fit' fully in L2, our hit rates drop, and the L3 size often does help there.
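For what it's worth, wrapping a workload with PCM's C++ API (cpucounters.h from github.com/intel/pcm) looks roughly like this. Just a minimal sketch; RunEncodePass is a stand-in for whatever you're actually measuring, but the hit-ratio helpers are what showed us the encoder spilling out of L2 and leaning on L3.

```cpp
#include "cpucounters.h"
#include <cstdio>

void RunEncodePass();   // placeholder for the workload under test

void MeasureEncodePass()
{
    PCM* m = PCM::getInstance();
    if (m->program() != PCM::Success) return;   // needs driver / elevated access

    SystemCounterState before = getSystemCounterState();

    RunEncodePass();

    SystemCounterState after = getSystemCounterState();

    std::printf("IPC: %.2f  L2 hit: %.2f  L3 hit: %.2f\n",
                getIPC(before, after),
                getL2CacheHitRatio(before, after),
                getL3CacheHitRatio(before, after));

    m->cleanup();
}
```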
As a side note, when we haven't seen any performance difference between hardware generations with more L3/better features, it's normally been down to data synchronisation issues between threads (constantly stalling the CPU) or, worse, heavy context switching on the data processing, which drives hit ratios down across every cache level.
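One concrete flavour of that, purely as an illustration (not claiming this is what bites in any particular engine), is false sharing on per-thread state: adjacent counters on one cache line ping-pong between cores no matter how big the L3 is, and padding each slot to its own line (64 bytes assumed here) is the usual fix.

```cpp
#include <atomic>
#include <cstdint>

// Bad: adjacent counters share a cache line, so every increment invalidates
// the other threads' copies of that line.
struct CountersPacked
{
    std::atomic<uint64_t> PerThread[8];
};

// Better: each counter is aligned (and therefore sized) to a full cache line,
// so threads stop fighting over the same line.
struct alignas(64) PaddedCounter
{
    std::atomic<uint64_t> Value{0};
};

struct CountersPadded
{
    PaddedCounter PerThread[8];
};
```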
There's been a lot of work done in UE5, to be fair, to address these issues, but it's still pretty painful to work with.