While Nvidia has led GPU performance for some time—bar AMD's impressive turn with the release of the 290X back in 2013—in recent months it's suffered a few setbacks when it comes to DirectX 12 and performance under Stardock's Ashes of the Singularity. The problem for Nvidia has been asynchronous shaders, or rather, the lack of them in its hardware. AMD took a gamble early on when designing its GCN range of GPUs (the 7000-series and up) with hardware-based asynchronous shaders. These allow its GPUs to take the multithreaded workloads of DX12 and execute them in parallel and asynchronously, greatly improving performance over serial processing.
Pascal still doesn't have hardware-based asynchronous shaders. In DX12 games like Ashes of the Singularity that take advantage of them, Nvidia doesn't enjoy the same kind of performance boost as AMD. In early tests it even dropped in performance, although recent driver updates have seen Nvidia cards at least achieve parity between DX11 and DX12.
Instead of asynchronous shaders, Pascal uses a technique called pre-emption. Effectively, this enables the GPU to prioritise one set of more complex tasks over another (for example, favouring compute tasks like physics over graphics). The trouble is, long-running compute jobs can end up monopolising the GPU. This was a particular issue for Maxwell, where the GPU could only pre-empt tasks at the end of each command. That meant extra time spent waiting for the command to end, increasing latency.
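To get a feel for why the pre-emption point matters, here is a toy model in Python. It is only a sketch: the function name `preemption_wait` and all of the timings are made up for illustration and are not measurements of any real GPU. It assumes the GPU can only switch tasks at fixed intervals, and works out how long an urgent task waits for the next switch point.

```python
import math


def preemption_wait(arrival, boundary_interval):
    """Return how long an urgent task waits for the next pre-emption point.

    Toy assumption: the GPU may only switch tasks at multiples of
    boundary_interval (same time unit as arrival).
    """
    if boundary_interval <= 0:
        raise ValueError("boundary_interval must be positive")
    next_boundary = math.ceil(arrival / boundary_interval) * boundary_interval
    return next_boundary - arrival


# Hypothetical figures in microseconds, chosen purely for illustration.
arrival = 253          # urgent task arrives 253 us into a long-running job
coarse_interval = 1000  # switch only when a 1000 us command finishes
fine_interval = 10      # switch points every 10 us

coarse_wait = preemption_wait(arrival, coarse_interval)
fine_wait = preemption_wait(arrival, fine_interval)

print(f"coarse boundaries: wait {coarse_wait} us")
print(f"fine boundaries:   wait {fine_wait} us")
```

In this model the wait is never longer than the gap between switch points, so the coarser the boundary, the worse the worst-case latency.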
Pascal implements pixel-level pre-emption, allowing the GPU to pause smaller tasks at any point and save their state to memory while bigger tasks complete. It's an interesting solution, but it still doesn't match the performance of hardware-based asynchronous shaders. Fortunately for Nvidia, even with the increasing number of DX12 games being released, few of them take full advantage of asynchronous shaders. Fewer still have shown any real improvement in performance over DX11.
That will change over time (spoiler: it does a little here too), but there's more work required on the developer side to support the low-level hardware features of DX12. Right now, most simply aren't bothering. That's not to mention that despite its lack of async, Nvidia has one very big advantage over the competition: clock speed.