CPUs with many cores have been a thing for a long time, but games haven't really caught up, until now. Some time ago I noticed that Mount & Blade II: Bannerlord uses a lot of CPU. And I mean a lot: it will fill all 32 threads of my 5950X, peaking at 90% total CPU use. It's easy to see how it could use that much, with 1000 combatants at once, all with AI and physics, and the game looks nice too. So, with a properly multi-threaded game available, I thought I'd do a bit of benchmarking to see how it scales.
The game has a handy built-in benchmark that can record the FPS of every frame and save it to a file, which let me produce some confusing, vomit-coloured charts. The benchmark records "CPU FPS", which is presumably the FPS you would get if the GPU were infinitely fast; that's the number we want here. It also records "GPU FPS", which I ignored. I ran the benchmark at 4K on "Very High" settings. Full settings and system specs are at the end.
I used the start /affinity command to limit the game to specific cores. As far as I can tell, the game spawns as many worker threads as there are logical cores, regardless of CPU affinity. Running it this way will therefore be slightly slower than a real CPU with fewer cores, because of the extra thread overhead. I can't easily measure how large that overhead is, but it's probably not enough to change the results.
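start /affinity takes a hexadecimal bitmask in which bit k selects logical processor k. The sketch below builds the mask for the first N logical processors. It assumes the lowest-numbered processors were the ones used, which the post doesn't actually say. The executable path is a placeholder, not the game's real file name.

```python
def contiguous_mask(n_threads: int) -> int:
    # Bits 0..n_threads-1 set: selects the first n_threads logical processors.
    return (1 << n_threads) - 1


def affinity_command(mask: int, exe: str) -> str:
    # start /affinity expects the mask as a hexadecimal number, without "0x".
    return f"start /affinity {mask:X} {exe}"


if __name__ == "__main__":
    exe = "<path to the game executable>"  # placeholder, not a real path
    for threads in (4, 8, 12, 24, 32):
        print(f"{threads:2d} threads: {affinity_command(contiguous_mask(threads), exe)}")
```

For example, 4 threads gives a mask of F and 32 threads gives FFFFFFFF. If the path contains spaces, quote it and pass an empty title first (start "" /affinity F "C:\path with spaces\game.exe"). Otherwise start treats the first quoted argument as the window title.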
The hardest part of this was trying to visualize the results in a way that makes the difference clear. I ended up with multiple charts, starting with frame times for every frame:
It's pretty obvious that 4 threads isn't enough: much of the plot is well off the top of the chart, which ends at 20ms (50fps). 8 threads is playable, staying below 16.7ms (60fps) most of the time. Performance keeps improving all the way up to 32 threads, especially in the consistency of frame times. That's easier to see when the chart is zoomed in to show just one second of frames:
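The thresholds above are frame-time budgets: 1000 ms divided by the target FPS, so 50fps is 20ms and 60fps is about 16.7ms. Below is a minimal sketch of checking "below 16.7ms most of the time" against per-frame data. It assumes the benchmark log has already been parsed into a list of CPU FPS values, one per frame. The file format isn't shown here, and these helper names are hypothetical.

```python
def frame_time_ms(fps: float) -> float:
    # Frame time in milliseconds is the reciprocal of frames per second.
    return 1000.0 / fps


def share_within_budget(fps_values: list[float], budget_ms: float) -> float:
    # Fraction of frames that finished within the given frame-time budget.
    times = [frame_time_ms(f) for f in fps_values]
    return sum(t <= budget_ms for t in times) / len(times)


if __name__ == "__main__":
    print(frame_time_ms(50))            # 20.0
    print(round(frame_time_ms(60), 1))  # 16.7
```

Calling share_within_budget(per_frame_fps, 1000 / 60) would give the fraction of frames that hit 60fps for one thread count.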
Now you can see that it takes 24 threads to get something that looks like a straight line.
Plotting every frame makes it hard to see the overall difference because there are so many points, so I also plotted FPS with a moving average window of 30 frames:
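A 30-frame moving average replaces each point with the mean of the last 30 per-frame FPS values, which smooths out frame-to-frame jitter. The post doesn't say whether the window was trailing or centred. This sketch assumes a trailing window and, like the charts, averages FPS values directly.

```python
from collections import deque


def moving_average(values: list[float], window: int = 30) -> list[float]:
    # Trailing moving average: one output per input once the window is full.
    buf: deque[float] = deque(maxlen=window)
    total = 0.0
    out: list[float] = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        if len(buf) == window:
            out.append(total / window)
    return out
```

Averaging frame times and converting the result back to FPS would weight slow frames more heavily than averaging FPS directly does. That is a valid alternative, but it is not what the charts show.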
Now you can see that the scaling from 4 to 8 threads is really good, and there's still a fairly large jump from there to 12 threads. Returns quickly diminish above that, but performance is still increasing at 32 threads, and it looks like it would keep scaling even further. It's really impressive; we need more devs who can write game engines like this.
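"Scaling" and "diminishing returns" can be put into numbers. Speedup is average FPS relative to the smallest thread count. Efficiency is that speedup divided by the ideal linear speedup. The sketch below is hypothetical and assumes you have one average CPU FPS figure per thread count. It takes no numbers from the charts.

```python
def scaling_table(avg_fps: dict[int, float]) -> list[tuple[int, float, float]]:
    # For each thread count: (threads, speedup, efficiency), relative to the
    # smallest thread count. Efficiency 1.0 means perfectly linear scaling.
    base_threads = min(avg_fps)
    base_fps = avg_fps[base_threads]
    rows = []
    for threads in sorted(avg_fps):
        speedup = avg_fps[threads] / base_fps
        ideal = threads / base_threads
        rows.append((threads, speedup, speedup / ideal))
    return rows
```

On a 16-core CPU, you would expect efficiency to fall faster above 16 threads, because the extra threads are SMT siblings rather than full cores.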
And just for completeness, I also ran some benchmarks with the CPU affinity set to alternating logical cores, to simulate a CPU without SMT (what Intel calls Hyper-Threading):
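Selecting alternating logical processors means setting only the even bits of the affinity mask. The sketch below assumes that logical processors 2k and 2k+1 are the two SMT threads of physical core k, which is the usual numbering on Windows but is not confirmed in the post. It also assumes the lowest-numbered cores were chosen.

```python
def alternating_mask(n_cores: int) -> int:
    # Assumption: logical processors 2k and 2k+1 share physical core k,
    # so setting only the even bits uses exactly one thread per core.
    mask = 0
    for core in range(n_cores):
        mask |= 1 << (2 * core)
    return mask


if __name__ == "__main__":
    for cores in (4, 8, 16):
        # start /affinity expects the mask as a hexadecimal number.
        print(f"{cores:2d} cores, no SMT: start /affinity {alternating_mask(cores):X}")
```

On a 16-core, 32-thread CPU like the 5950X, using all 16 physical cores this way gives the mask 55555555.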
Physical cores perform better than SMT threads, but SMT still helps quite a bit.
Conclusion: Moar cores is moar betterer.
Setup:
Game version 1.0.3.9860. The FPS limit was set to the maximum of 360 (I'm not sure whether that has any effect on the benchmark) and the resolution was 3840x2160 at 165Hz. Neither of these appears in the settings recorded by the benchmark itself, which were:
Battle Size, 360 vs 640
Antialiasing Technique, Very High
Character Detail, Very High
Decal Quality, Very High
Enable Cloth Simulation, Very High
Enable Flora Sway, Very High
Environment Detail, Very High
Foliage Quality, Very High
Lighting Quality, Very High
Number of Ragdolls, Very High
Occlusion Technique, Very High
Particle Detail, Very High
Particle Quality, Very High
Postfx Bloom, Very High
Postfx Chromatic Aberration, Low
Postfx Dof, Very High
Postfx Grain, Very High
Postfx Hexagon Vignette, Custom
Postfx Lens Flares, Low
Postfx Motion Blur, Very High
Postfx SSR, Very High
Postfx SSSSS, Very High
Postfx Streaks, Low
Postfx Sun shafts, Very High
Postfx Vignette, Very High
Shader Quality, Very High
Shadow map Filtering, Very High
Shadow map Resolution, Very High
Shadow map Type, Very High
Terrain Quality, Very High
Tesselation, Very High
Texture Budget, Very High
Texture Filtering, Very High
Texture Quality, Very High
Water Quality, Very High
System specs:
CPU: AMD Ryzen 9 5950X, undervolted with Curve Optimizer.
RAM: 64GB DDR4 3600
GPU: Sapphire 7900XTX Nitro+ (stock)
OS: Windows 10 LTSC