So now that the dust has settled and review sites have had a chance to take in Mantle, TheTechReport has tested the XFX 290X against a reference 780Ti on two processors, a 4770K and an A10-7850K. Only when paired with the 7850K did the 290X outshine the 780Ti, which shows that Mantle is good for relieving a CPU bottleneck. Even with the reported 8% performance gain from using Mantle, it still wasn't enough to beat the 780Ti on a 4770K CPU in their findings.
http://techreport.com/review/25995/first-look-amd-mantle-cpu-performance-in-battlefield-4/2
Battlefield 4
We tested Mantle performance versus Direct3D in Windows 8.1 using a couple of different processors, a Kaveri-based AMD A10-7850K APU and an Intel Haswell-based Core i7-4770K. The idea was to test in a CPU-constrained performance scenario using two processors with different levels of performance. In fact, I had hoped to show a lower level of CPU performance by including another AMD APU, a 65W Richland-based A10-6700. However, its performance turned out to be almost identical to that of the 95W Kaveri 7850K, so I held it out of our final results in order to keep things simple.
The main video card we used was a Radeon R9 290X card from XFX. This 290X comes with a custom cooler and sustains its peak Turbo clock almost constantly, even under the heaviest of loads. It essentially eliminates the clock speed and thermal variance issues we've seen with stock-cooled 290X cards. (I'll be writing more about this card soon.) To ensure the GPU wasn't the performance constraint, we tested BF4 at 1920x1080 on the "high" image quality presets, which is fairly easy work for a video card of this power. We also tested at these same settings using a GeForce GTX 780 Ti, in order to see how Nvidia's Direct3D driver fares compared to AMD's D3D and Mantle implementations.
The known issue with occasional stuttering rears its head in one plot, for the 4770K with Mantle. You can't see the full size of the frame time spike on the plot, but it's 295 milliseconds—nearly a third of a second. We didn't see this sort of hiccup all that often, but it did happen during some test runs, including the one we plotted for the 4770K.
AMD has made some big claims for performance improvements from Direct3D to Mantle, and the numbers from the A10-7850K appear to back them up. The leap from an average of 69 FPS to 110 FPS is considerable by any standard, particularly for an API change that apparently produces the same visuals. Even better, our latency-focused metric, the 99th percentile frame time, tends to agree that Mantle is substantially faster than D3D in this case. Mantle also outperforms Direct3D in combination with the Core i7-4770K, but the differences aren't quite as dramatic.
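For anyone wanting to reproduce the two metrics mentioned above from their own frame-time logs, here is a minimal Python sketch. It assumes the frame times are available as a list of milliseconds. The function names and the nearest-rank percentile method are my own assumptions, not TechReport's actual tooling.

```python
import math

def average_fps(frame_times_ms):
    """Average FPS: frames rendered divided by total elapsed seconds."""
    total_seconds = sum(frame_times_ms) / 1000.0
    return len(frame_times_ms) / total_seconds

def percentile_frame_time(frame_times_ms, pct=99.0):
    """Nearest-rank percentile (assumed method): the smallest frame time
    such that at least pct percent of frames finished at or below it."""
    ordered = sorted(frame_times_ms)
    rank = math.ceil(pct * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]

def relative_gain_percent(before, after):
    """Percentage improvement going from `before` to `after`."""
    return (after - before) / before * 100.0

# The 69 FPS -> 110 FPS jump quoted for the A10-7850K:
print(round(relative_gain_percent(69, 110), 1))
```

Plugging in the quoted averages, 69 FPS to 110 FPS works out to a gain of roughly 59%. The percentile is worth computing alongside the average because a handful of long frames barely moves the average but shows up clearly in the tail.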
One thing we didn't expect to see was Nvidia's Direct3D driver performing so much better than AMD's. We don't often test different GPU brands in CPU-constrained scenarios, but perhaps we should. Looks like Nvidia has done quite a bit of work polishing its D3D driver for low CPU overhead.
Of course, Nvidia has known for months, like the rest of us, that a Mantle-enabled version of BF4 was on the way. You can imagine that this game became a pretty important target of optimization for them during that span. Looks like their work has paid off handsomely. Heck, on the 4770K, the GTX 780 Ti with D3D outperforms the R9 290X with Mantle. (For what it's worth, although frame times are very low generally for the 4770K/780 Ti setup, the BF4 data says it's still mainly CPU-limited.)
The "time spent beyond X" graphs are our indicator of "badness," of how long frame production times exceed several key thresholds. Those intermittent stuttering episodes with the early Mantle driver show up in the beyond-50-ms results for the A10-7850K, even though we didn't see a hiccup of this size in every run. Since we're showing the median result from three runs, the spike we plotted for the 4770K doesn't show up at all here. (There were no such spikes in the other two test sessions.)
The big takeaway here comes from the "time spent beyond 16.7 ms" plot. You need to produce a frame every 16.7 milliseconds to achieve a smooth 60-FPS rate of animation. Mantle moves the A10-7850K much, much closer to that goal, even with that one big latency spike in the picture. If AMD can eliminate those hiccups, then slower CPUs like the 7850K should be capable of delivering a much smoother gaming experience than they can with Direct3D.
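The 16.7 ms figure is just 1000 ms divided by 60 frames. As a rough sketch of how a "time spent beyond X" number could be computed, the Python below sums only the portion of each frame time that exceeds the threshold. That is my reading of the metric and an assumption, not TechReport's published code, and the function names and sample data are hypothetical.

```python
def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / target_fps

def time_spent_beyond(frame_times_ms, threshold_ms):
    """Total milliseconds spent past the threshold. Only the excess part
    of each slow frame is counted (assumed definition)."""
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)

# Hypothetical sample: one fast frame, one slightly slow frame, one 295 ms spike.
sample = [10.0, 20.0, 295.0]
budget = frame_budget_ms(60)
print(round(budget, 1))
print(time_spent_beyond(sample, 50.0))
```

With this definition, a single 295 ms hiccup contributes 245 ms to the beyond-50-ms total on its own, which is why occasional spikes stand out so strongly in these graphs even when the average frame rate looks healthy.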
Conclusions
These are still early days for Mantle, but we can already see its ability to reduce CPU overhead rather dramatically compared to Direct3D. That's exactly the sort of innovation folks have wanted to see in PC gaming, and AMD and DICE are already delivering. One would hope this demonstration of a more modern approach to graphics programming would spur others (ahem, Redmond) to innovate in a way that can benefit the entire PC ecosystem.
There's lots of work yet to be done on Mantle. AMD needs to refine its drivers, add some key features, and improve performance scaling for its older GCN-based graphics chips. Meanwhile, in order for Mantle to really gain traction, EA and DICE will have to follow through on their promise to bring the Mantle rendering path to a host of other games based on the Frostbite 2 engine.
Based on these first results, the big beneficiaries of Mantle's proliferation will probably be folks who, for one reason or another, have a PC that isn't built to perform especially well in many of today's games. PCs with slower processors stand to gain the most.
That said, there are already some well-worn paths to very good gaming experiences on the PC today. The Haswell-based Core i7-4770K is faster than the A10-7850K regardless of the graphics API. Switching from AMD's Direct3D driver to Nvidia's will get you more than halfway to Mantle's performance on an A10-7850K, too. AMD would do well to work on improving its Direct3D drivers and CPUs, as well as pursuing Mantle development—but I'm sure they already know that. I'm happy to see AMD pushing innovation in graphics APIs at the same time.
We'll surely test Mantle's performance on a broader range of CPUs as it matures. I'm curious to play around with different core counts and to see whether low-power chips like Kabini can provide good gaming experiences with Mantle. Our next task, though, will be to see what performance benefits Mantle can deliver in GPU-limited scenarios. Stay tuned for that.