Big assumptions being made here, how does the load balancing work?
Does the card recognise that a game which has gone over, for the sake of argument, 80 fps gets cut off from overusing the shaders, thereby freeing up the remaining shaders for something like tessellation?
Roff, I don't understand how nVidia would be immune to stalling. Any job that depends on tessellation would have to wait for that processing to be done before being executed, whether on nVidia or ATi, so stalling is not unique to ATi.
It's a little simpler than that actually:
Each frame has a specific amount of calculation work to do (mostly add-multiply computations of some kind) before it is sent to the backend to be rendered. The "load balancing" algorithm will make an assessment of the amount of work required by each component (shaders, tessellation etc.), and assign effort based on this assessment (i.e. dedicate certain clocks on certain shaders to one component or the other).
Of course, the load balancing algorithm will never give a perfect estimate of the amount of work required by one component or the other, but it can be iteratively improved. That is, if the algorithm under-estimates the amount of work required by the shaders in several consecutive frames (i.e. the shaders are the last to finish their workload), it will adjust the balance at the next frame. Anyway, the point is that this will happen on a frame-by-frame basis, so the actual framerate the game is running at has no bearing. The goal of the load balancing is to finish each frame as quickly as possible.
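To make that frame-by-frame adjustment concrete, here is a toy sketch in Python. It is purely illustrative and not how any real GPU scheduler is implemented: the function names, the pool of 100 units, the work figures and the fixed 2% step are all assumptions made up for the example. It rebalances after every frame rather than after several, just to keep it short.

```python
def frame_time(shader_work, tess_work, tess_share, units=100):
    """Time to finish one frame when both workloads run in parallel."""
    tess_units = units * tess_share
    shader_units = units - tess_units
    # The frame is done only when the slower of the two workloads is done.
    return max(shader_work / shader_units, tess_work / tess_units)


def rebalance(shader_work, tess_work, tess_share, units=100, step=0.02):
    """Shift a small share of units toward whichever workload finished last."""
    tess_units = units * tess_share
    shader_units = units - tess_units
    shader_time = shader_work / shader_units
    tess_time = tess_work / tess_units
    if tess_time > shader_time:
        tess_share += step   # tessellation was the bottleneck
    elif shader_time > tess_time:
        tess_share -= step   # shading was the bottleneck
    # Never starve either workload completely (hypothetical 5% floor).
    return min(max(tess_share, 0.05), 0.95)


def simulate(frames, shader_work=800.0, tess_work=200.0, tess_share=0.5):
    """Run the balancer for a number of frames; return final share and frame times."""
    history = []
    for _ in range(frames):
        history.append(frame_time(shader_work, tess_work, tess_share))
        tess_share = rebalance(shader_work, tess_work, tess_share)
    return tess_share, history


if __name__ == "__main__":
    share, history = simulate(50)
    print(f"final tessellation share: {share:.2f}")
    print(f"first frame time: {history[0]:.2f}, last frame time: {history[-1]:.2f}")
```

Note that nothing in the loop looks at the framerate: the only input is which workload finished last in the previous frame, and the only goal is a shorter frame.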
...As for the question of which approach to tessellation (ATI's or nvidia's) is best: It's swings and roundabouts really. In situations where you have very little tessellation to do, it makes sense to utilise an external tessellator (like ATI do), as this will always finish before the rest of the workload of the frame, and won't take away from GPU performance at all. If you have a LOT of tessellation, then it's more efficient to balance it over the whole shader region (like nvidia do). In this case, if you have an external tessellator, the rest of the GPU will just be waiting for the tessellation unit to finish.
Anyway, that's what the nvidia graphs above are showing. Of course they have chosen scenarios that show a massive improvement over the 5870, but really all they are expressing is the different approaches to tessellation that have been taken by the two companies.
To summarise:
Very little tessellation: ATI's approach is more effective (no loss of GPU power).
Heavy tessellation: nvidia's approach is more effective (can utilise almost the entire GPU for tessellation).
Where the threshold between the two will occur, and how much tessellation will be used in real-world games, is anyone's guess at this point.
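For what it's worth, the trade-off can be sketched with a toy model in Python. Every number here is invented for illustration (100 shader units, a dedicated tessellator with a tenth of that throughput, perfect load balancing in the shared case), so the crossover it produces says nothing about where the real threshold is on actual hardware. It only shows why there is a threshold at all.

```python
def dedicated_tessellator_time(shader_work, tess_work,
                               shader_units=100, tess_rate=10):
    """Frame time with a separate fixed-function tessellator.

    Tessellation runs in parallel with shading and costs the shaders
    nothing, but the frame waits for whichever finishes last.
    """
    return max(shader_work / shader_units, tess_work / tess_rate)


def shared_shader_time(shader_work, tess_work, shader_units=100):
    """Frame time when tessellation is spread over the shader units.

    Assumes ideal load balancing, so all the work is divided evenly.
    """
    return (shader_work + tess_work) / shader_units


def better_approach(shader_work, tess_work):
    """Return which of the two toy models finishes the frame sooner."""
    dedicated = dedicated_tessellator_time(shader_work, tess_work)
    shared = shared_shader_time(shader_work, tess_work)
    return "dedicated" if dedicated <= shared else "shared"


if __name__ == "__main__":
    # Sweep the tessellation load for a fixed amount of shader work.
    for tess_work in (10, 50, 100, 150, 500):
        print(tess_work, better_approach(1000, tess_work))
```

With light tessellation the dedicated unit finishes early and the frame time is set by the shaders alone. With heavy tessellation the dedicated unit becomes the bottleneck and the shared approach wins.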