What I don't understand is WHY Nvidia aren't taking async fully on board and utilising it properly? Makes no sense to me, surely it's a win-win? It can't be a technical hurdle too high given their resources?!
Because it just isn't that important, however much AMD try to market it as such.
Async compute can improve utilization of a GPU that isn't being kept busy. It is a solution to a problem that Nvidia GPUs just don't have to the same degree; there are other bottlenecks. And even if Nvidia GPUs did have a huge utilization problem, there are more solutions than just async compute, for example removing the bottlenecks that prevent full utilization of the GPU (command processors, geometry throughput, tessellation, scheduling).
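To make that concrete, here's a toy model (plain Python, nothing to do with any real GPU API; the pass timings and occupancy fractions are made up purely for illustration) of why async helps when the shader array sits partly idle during graphics passes, and why it helps little when the array is already close to full:

```python
# Toy model, NOT real GPU code: a frame as a sequence of graphics passes,
# each keeping only a fraction of the shader array busy. Async compute lets
# independent compute jobs run in the idle fraction instead of afterwards.
# All numbers are illustrative.

graphics_passes = [
    # (duration in ms, fraction of the shader array the pass actually occupies)
    (2.0, 0.30),   # e.g. shadow maps: geometry/ROP bound, shaders mostly idle
    (6.0, 0.90),   # main lighting pass: shaders well fed
    (2.0, 0.50),   # post-processing: partly bandwidth bound
]
async_compute_ms = 3.0  # independent compute work (e.g. particles, AO)

frame_ms = sum(d for d, _ in graphics_passes)

# Without async: the compute work is appended serially after the graphics passes.
serial_frame_ms = frame_ms + async_compute_ms

# With async: compute soaks up the idle shader time inside the graphics passes.
idle_ms = sum(d * (1.0 - occ) for d, occ in graphics_passes)
absorbed = min(async_compute_ms, idle_ms)
async_frame_ms = frame_ms + (async_compute_ms - absorbed)

print(f"serial frame: {serial_frame_ms:.1f} ms")
print(f"async frame:  {async_frame_ms:.1f} ms")
# A GPU that is already ~90% occupied has almost no idle time to absorb,
# so the async gain shrinks towards zero.
```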
As a user you only care about performance, and there are many different ways to gain performance. Async is one possible way if the GPU has compute utilization issues. It isn't a feature like tessellation or fragment shaders that has to be supported in hardware in order to correctly render the scene at playable frame rates.
A simplistic view with completely made-up numbers to try and illustrate why AMD and Nvidia view async differently:
- AMD GPUs might get 70% of their theoretical performance. Using async carefully they can get another 15-20% boost in some scenarios, if developers implement a lot of async shaders. Enabling this kind of performance bump comes at a significant cost in transistors, transistors that could be used for other things that make games faster (tessellation, ROPs, TMUs, compression, cache).
- Conversely, Nvidia's Maxwell might already run at 90% utilization. Adding async to the same level as AMD might add 5-7% performance, but again at a transistor cost. A simpler multi-engine system with a software scheduler but hardware-based dynamic load balancing and very fine-grained preemption might gain you 3-4% utilization for far fewer transistors; this is what we get with Pascal.
- The end result is that Pascal might get 90-95% utilization against older AMD hardware at 85-90%, with AMD only getting that boost if developers put in the effort (though the effort is much lower for AMD GPUs than for Nvidia).
- These numbers are made up, but look at the theoretical performance of the Fiji Fury X compared to the 980 Ti (rough arithmetic after this list). On paper the Fury X should destroy the 980 Ti; it doesn't, because AMD have serious utilization issues and their real-world performance does not stack up to their theoretical performance. Maxwell, with far less compute capability through fewer shaders, is faster in the vast majority of games. Nvidia spent their transistor budget getting the GPU closer to its theoretical limit and not bottlenecked by geometry, ROPs or the command processor. AMD built a GPU with massive compute resources that it can't properly use and that is bottlenecked in other parts of the chip.
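Here's that hand-waving as back-of-the-envelope arithmetic. The utilization percentages are the made-up ones from the list above; the peak FP32 figures are roughly the public specs (Fury X around 8.6 TFLOPS, 980 Ti around 5.6 TFLOPS at base clock), so treat the output as an illustration, not a benchmark:

```python
# Back-of-the-envelope only: utilization figures are the made-up ones from
# the list above; peak FP32 figures are approximate public specs.

def effective_tflops(peak_tflops: float, utilization: float) -> float:
    """Crude 'useful throughput': peak compute * fraction actually kept busy."""
    return peak_tflops * utilization

fury_x_no_async   = effective_tflops(8.6, 0.70)  # big shader array, poorly fed
fury_x_with_async = effective_tflops(8.6, 0.85)  # with well-implemented async shaders
gtx_980_ti        = effective_tflops(5.6, 0.90)  # smaller array, kept well fed

print(f"Fury X, no async:   {fury_x_no_async:.1f} effective TFLOPS")
print(f"Fury X, with async: {fury_x_with_async:.1f} effective TFLOPS")
print(f"980 Ti:             {gtx_980_ti:.1f} effective TFLOPS")

# On paper the Fury X has ~1.5x the compute of the 980 Ti; in 'useful' terms
# the gap shrinks to ~1.2x, and the rest disappears into bottlenecks the shader
# array can't fix (geometry throughput, ROPs, command processor), which is why
# the 980 Ti wins most real games despite the paper deficit.
```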
If you look at the leaks of the Polaris GPUs, they are actually moving in the direction of Maxwell. Polaris 10 looks to have far fewer compute units/shaders; they are spending the transistor budget on utilizing those shaders better and not getting bottlenecked elsewhere. If you look at Nvidia's GPU releases, the number of compute units hasn't grown that fast; their design effort has gone into keeping the compute units well utilized. Async may be far less useful on Polaris than on Hawaii and Fiji...
In the future async compute will likely become more important: as GPUs add more and more compute units, it gets harder and harder to keep them all fed and balanced. But the point where it becomes critical is probably at least two generations away.