Asynchronous Compute
Modern gaming workloads are increasingly complex, with multiple independent, or “asynchronous,”
workloads that ultimately work together to contribute to the final rendered image. Some examples of
asynchronous compute workloads include:
- GPU-based physics and audio processing
- Postprocessing of rendered frames
- Asynchronous timewarp, a technique used in VR to regenerate a final frame based on head
  position just before display scanout, interrupting the rendering of the next frame to do so
These asynchronous workloads create two new scenarios for the GPU architecture to consider.
The first scenario involves overlapping workloads. Certain types of workloads do not fill the GPU
completely by themselves. In these cases, there is a performance opportunity to run two workloads at
the same time, sharing the GPU and running more efficiently: for example, a PhysX workload running
concurrently with graphics rendering.
For overlapping workloads, Pascal introduces support for “dynamic load balancing.” In Maxwell-
generation GPUs, overlapping workloads were implemented with static partitioning of the GPU into a
subset that runs graphics and a subset that runs compute. This is efficient provided that the balance of
work between the two loads roughly matches the partitioning ratio. However, if the compute workload
takes longer than the graphics workload, and both need to complete before new work can be done, the
portion of the GPU configured to run graphics will go idle. The resulting performance loss may exceed
any benefit that would have been gained from running the workloads overlapped. Hardware dynamic
load balancing addresses this issue by allowing either workload to fill the rest of the machine if idle
resources are available.
GeForce GTX 1080 Whitepaper GeForce GTX 1080 GPU Architecture In-Depth
Figure 10: Pascal's Dynamic Load Balancing Reduces GPU Idle Time When Graphics Work Finishes Early,
Allowing the GPU to Quickly Switch to Compute
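To make the idle-time argument concrete, here is a minimal Python sketch comparing static partitioning with dynamic load balancing. It is an idealized model, not a description of the hardware: the GPU is treated as a pool of interchangeable execution units, work is measured in unit-milliseconds and assumed to be perfectly divisible, and the function names and example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    makespan_ms: float   # time until both workloads have finished
    idle_unit_ms: float  # execution-unit time left unused over the makespan


def static_partition(graphics_work, compute_work, graphics_units, compute_units):
    """Each workload stays confined to its own partition until both finish."""
    t_graphics = graphics_work / graphics_units
    t_compute = compute_work / compute_units
    makespan = max(t_graphics, t_compute)
    total_units = graphics_units + compute_units
    idle = total_units * makespan - (graphics_work + compute_work)
    return Result(makespan, idle)


def dynamic_balance(graphics_work, compute_work, graphics_units, compute_units):
    """Start with the same split, but when one workload finishes, the other
    expands onto the freed units (idealized: it scales perfectly)."""
    total_units = graphics_units + compute_units
    t_graphics = graphics_work / graphics_units
    t_compute = compute_work / compute_units
    t_first = min(t_graphics, t_compute)
    # Work still left in the longer workload when the shorter one finishes.
    if t_graphics < t_compute:
        remaining = compute_work - compute_units * t_first
    else:
        remaining = graphics_work - graphics_units * t_first
    makespan = t_first + remaining / total_units
    idle = total_units * makespan - (graphics_work + compute_work)
    return Result(makespan, idle)


if __name__ == "__main__":
    # Hypothetical example: 10 units split 5/5, compute has twice the work.
    print("static: ", static_partition(20.0, 40.0, 5, 5))
    print("dynamic:", dynamic_balance(20.0, 40.0, 5, 5))
```

With 10 units split 5/5, 20 unit-ms of graphics work and 40 unit-ms of compute work, static partitioning takes 8 ms and leaves 20 unit-ms idle (the graphics half sits unused for the last 4 ms), while the dynamic version finishes in 6 ms with no idle time, because the remaining compute work spreads over all 10 units once graphics finishes.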
Time-critical workloads are the second important asynchronous compute scenario. For example, an
asynchronous timewarp operation must complete before scanout starts, or a frame will be dropped. In
this scenario, the GPU needs to support very fast, low-latency preemption to move the less critical
workload off the GPU so that the more critical workload can run as soon as possible.
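The deadline described above comes down to simple arithmetic: the time at which timewarp is requested, plus the preemption latency, plus the time timewarp itself takes, must not exceed the scanout time. The Python sketch below checks that budget. The 90 Hz refresh rate, the 9.5 ms request time, the 1.0 ms timewarp duration, and both latency values are hypothetical numbers chosen only for illustration.

```python
REFRESH_HZ = 90.0  # assumed VR display refresh rate (hypothetical)


def scanout_deadline_ms(refresh_hz):
    """Length of one refresh interval, i.e. time until the next scanout."""
    return 1000.0 / refresh_hz


def timewarp_slack_ms(request_ms, preemption_latency_ms, timewarp_ms, deadline_ms):
    """Time left before scanout; a negative value means the frame is dropped."""
    finish_ms = request_ms + preemption_latency_ms + timewarp_ms
    return deadline_ms - finish_ms


if __name__ == "__main__":
    deadline = scanout_deadline_ms(REFRESH_HZ)  # roughly 11.1 ms at 90 Hz
    for latency in (0.1, 2.0):  # hypothetical fast vs. slow preemption
        slack = timewarp_slack_ms(9.5, latency, 1.0, deadline)
        status = "on time" if slack >= 0 else "frame dropped"
        print(f"preemption latency {latency} ms: slack {slack:.2f} ms ({status})")
```

Under these assumptions, a 0.1 ms preemption leaves about half a millisecond of slack, while a 2.0 ms preemption pushes timewarp past scanout even though the timewarp work itself is unchanged.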
A single rendering command from a game engine can potentially contain hundreds of draw calls, with
each draw call containing hundreds of triangles, and each triangle containing hundreds of pixels that
have to be shaded and rendered. A traditional GPU that implements preemption at a high level in the
graphics pipeline would have to complete all of this work before switching tasks, resulting in a
potentially very long delay.
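The multiplication implied by "hundreds of draw calls, hundreds of triangles, hundreds of pixels" shows why the preemption boundary matters. This Python sketch counts the worst-case number of pixels that may still need shading before a switch, assuming the request arrives just after the previous boundary. The value of 100 per level is a hypothetical lower bound for "hundreds", not a measured figure.

```python
# Hypothetical per-level counts; "hundreds" is taken at its lower bound of 100.
DRAWS_PER_COMMAND = 100
TRIANGLES_PER_DRAW = 100
PIXELS_PER_TRIANGLE = 100


def worst_case_pixels_to_drain(granularity):
    """Pixels that may still have to be shaded before a task switch if
    preemption can only happen at the given boundary."""
    per_triangle = PIXELS_PER_TRIANGLE
    per_draw = TRIANGLES_PER_DRAW * per_triangle
    per_command = DRAWS_PER_COMMAND * per_draw
    return {
        "command": per_command,
        "draw": per_draw,
        "triangle": per_triangle,
    }[granularity]


if __name__ == "__main__":
    for level in ("command", "draw", "triangle"):
        print(f"{level:>8}: {worst_case_pixels_to_drain(level):,} pixels")
```

Under these assumptions, preempting only at command boundaries can leave up to 1,000,000 pixels to finish, and each finer boundary cuts that worst case by a factor of 100.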
To address this issue, Pascal is the first GPU architecture to implement Pixel Level Preemption. The
graphics units of Pascal have been enhanced to keep track of their intermediate progress on rendering
work, so that when preemption is requested, they can stop where they are, save context information
about where to resume later, and preempt quickly. The illustration below shows a preemption
request being executed.
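The save-and-resume idea in this paragraph can be illustrated with a software analogy. The Python sketch below is not NVIDIA's hardware mechanism: it walks a hypothetical nested work list (draws, then triangles, then a pixel count per triangle), checks for a preemption request before every pixel, and keeps its position in a `Context` object so a later call resumes exactly where the previous one stopped. `Context`, `render`, and the callbacks are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Context:
    """Saved progress: which draw, triangle, and pixel to resume from."""
    draw: int = 0
    triangle: int = 0
    pixel: int = 0


def render(work: List[List[int]], ctx: Context,
           shade: Callable[[int, int, int], None],
           should_preempt: Callable[[], bool]) -> bool:
    """Shade pixels starting at ctx. Returns True when all work is done,
    or False if preempted; ctx then records where to resume."""
    while ctx.draw < len(work):
        triangles = work[ctx.draw]
        while ctx.triangle < len(triangles):
            while ctx.pixel < triangles[ctx.triangle]:
                if should_preempt():
                    return False  # stop here; ctx already holds the position
                shade(ctx.draw, ctx.triangle, ctx.pixel)
                ctx.pixel += 1
            ctx.pixel = 0
            ctx.triangle += 1
        ctx.triangle = 0
        ctx.draw += 1
    return True


if __name__ == "__main__":
    work = [[3, 2], [4]]  # two draws; pixel counts per triangle (9 pixels total)
    shaded = []
    ctx = Context()
    # First pass is preempted once five pixels have been shaded.
    finished = render(work, ctx, lambda d, t, p: shaded.append((d, t, p)),
                      lambda: len(shaded) == 5)
    print("finished:", finished, "resume at:", ctx)
    # Second pass resumes from the saved context without redoing any pixel.
    finished = render(work, ctx, lambda d, t, p: shaded.append((d, t, p)),
                      lambda: False)
    print("finished:", finished, "pixels shaded:", len(shaded))
```

Because the context is checked and updated per pixel, the preempted pass stops after at most one pixel's worth of extra work, and the resumed pass shades each remaining pixel exactly once.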