I get that. You submit a request, get a callback. All that you said is correct and so is the link. From a client point of view it's all about asking for something and moving on to do other things while that something gets done. You can do it from a single thread like you said, or from multiple threads (if you are already multi-threaded).
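Here is a minimal Python sketch of that client-side "submit a request, get a callback, move on" pattern, using the standard library's `concurrent.futures`. `render_job` is a hypothetical stand-in for whatever the driver/hardware does on the other side. The sketch only shows the client's view and says nothing about how the GPU schedules the work.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

results = []
lock = threading.Lock()

def render_job(job_id):
    # Hypothetical stand-in for work done "on the other side"
    time.sleep(0.05)
    return f"job {job_id} done"

def on_done(future):
    # Invoked when the job completes; guarded because it may run on a worker thread
    with lock:
        results.append(future.result())

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(render_job, 1)   # submit the request
    future.add_done_callback(on_done)          # register the callback
    other = sum(range(1000))                   # client moves on to other work meanwhile
# Leaving the with-block waits for outstanding jobs (and their callbacks) to finish

print(results)
```

The submitting thread never blocks on the job itself. Whether the work behind it runs truly in parallel or is time-sliced is invisible at this level.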
But I'm not talking about the client side of things.
I'm referring to the driver+hardware on the other side that actually runs asynchronously.
AMD can dispatch work from driver to hardware in parallel, whereas NVidia cannot: it can only interrupt one thing for another very fast (in Pascal).
So that's where the steady 5-10% advantage in AMD's favour comes from.
These are two legitimate ways to implement an async service, and both are valid. One is simply faster in most cases (not all).
" whereas NVidia cannot: it can only interrupt one thing for another very fast (in Pascal)." This is just plain wrong: on Pascal (and Maxwell), compute and graphics work is executed in parallel, and that is trivial to see, e.g. in Futuremark's GPU analysis:
http://www.futuremark.com/pressreleases/a-closer-look-at-asynchronous-compute-in-3dmark-time-spy
Above is a corresponding trace from an NVIDIA GTX 1080. As can be seen the general structures resemble those which are found on AMD Radeon Fury, albeit with extra queues that do not originate from the engine and which contain only synchronization items. From this image we can see that the GTX 1080 has an additional compute queue which accepts packets in parallel with the 3D queue.
Time Spy doesn't enable asynchronous multi-engine support for Maxwell, but there are other benchmarks where you can achieve similar DX12 multi-engine support. The problem with Maxwell is that the gains are smaller and the architecture is much more sensitive to the developer correctly balancing the load.