
GDC: Async Compute - What Nvidia says

More details here:

http://translate.google.com/transla...14558/gdc-async-compute-qu-en-dit-nvidia.html

We of course took advantage of GDC to question Nvidia and try to learn more about what its GPUs are capable of in terms of DirectX 12 multi-engine support... without much real success.

At the heart of DirectX 12, this feature lets rendering be decomposed into several command queues, which can be of the copy, graphics or compute type, with synchronisation managed between the queues. This lets developers take control over the order in which tasks are executed, or even drive multi-GPU directly. In some cases this decomposition makes it possible to exploit the GPU's ability to handle multiple tasks in parallel and boost performance.
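To make the mechanics a little more concrete, here is a minimal, hypothetical D3D12 sketch (not from the article) of what multi-engine looks like from the developer's side: one queue per engine type, plus a fence to express ordering between queues. Whether the compute queue actually executes concurrently with the graphics queue is then entirely up to the driver and the hardware, which is exactly the point at issue here.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Illustrative sketch only: create one queue per engine type and a fence
// for cross-queue synchronisation. 'device' is an already-created device.
void CreateEngineQueues(ID3D12Device* device)
{
    ComPtr<ID3D12CommandQueue> graphicsQueue, computeQueue, copyQueue;
    ComPtr<ID3D12Fence> fence;

    // Direct (graphics) queue: accepts graphics, compute and copy commands.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&graphicsQueue));

    // Dedicated compute queue: work submitted here may overlap graphics work
    // if the hardware and driver support concurrent execution.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // Copy queue for asynchronous data transfers.
    D3D12_COMMAND_QUEUE_DESC copyDesc = {};
    copyDesc.Type = D3D12_COMMAND_LIST_TYPE_COPY;
    device->CreateCommandQueue(&copyDesc, IID_PPV_ARGS(&copyQueue));

    // A fence expresses ordering between queues on the GPU timeline.
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    UINT64 fenceValue = 1;

    // ... record and execute compute command lists on computeQueue ...

    // The compute queue signals when its work is done; the graphics queue
    // waits for that value before consuming the results (no CPU stall).
    computeQueue->Signal(fence.Get(), fenceValue);
    graphicsQueue->Wait(fence.Get(), fenceValue);
}
```

The API only lets the developer express the available parallelism and the ordering constraints; actually extracting a gain from it is up to the GPU and its driver.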

This is what AMD calls Async Compute, although the term is not strictly correct. Executing a task asynchronously does not imply that it is processed concurrently with another, yet it is that last point which is crucial and delivers the performance gain. AMD's GPUs benefit from multiple command processors capable of feeding the GPU's compute units from several different queues. Processing tasks simultaneously maximises the use of all the GPU's resources: processing units, memory bandwidth, etc.

On the Nvidia side things are more complicated. While GeForce GPUs can handle copy queues in parallel with compute and graphics queues, processing the last two concurrently appears problematic. In theory, Maxwell 2 GPUs (GTX 900 series) have a command processor that can handle 32 queues, one of which can be of the graphics type. Yet this support is still not functional in practice, as shown for example by the GeForce results in Ashes of the Singularity.

Why? Until now we had not been able to get a real answer from Nvidia. So we naturally wanted to take advantage of GDC to try to learn more, and we questioned Nvidia at a meeting organised with Rev Lebaredian, Senior Director of GameWorks. Unfortunately for us, this engineer, who is part of the technical support group for game developers, was very well prepared for questions touching on the specifics of multi-engine support. His initial answers were word for word those of the brief official statement Nvidia has given the technical press in recent months. Namely: "Maxwell GeForces can support concurrent execution at the SM level (groups of processing units)", "it is not yet enabled in the driver", and "Ashes of the Singularity is just one (not very important) game among others."

Unusually wooden language which shows, if that were still needed, that this issue bothers Nvidia. Faced with this impasse we changed our approach and came at the subject from a different angle: is Async Compute actually important (for Maxwell GPUs)? That relaxed Rev Lebaredian and opened the way to a much more interesting discussion, in which Nvidia developed two arguments.

First, while Async Compute is one way to increase performance, what matters in the end is overall performance. If GeForce GPUs are fundamentally more efficient than Radeon GPUs, using multi-engine to try to boost their performance further is not a top priority.

Second, if utilisation of the various blocks of a GeForce GPU is already relatively high to begin with, the potential gain from Async Compute is smaller. Nvidia says that, overall, there are far fewer holes ("bubbles" in GPU parlance) in the activity of its GPUs' units than in its competitor's. And the whole point of concurrent execution is to exploit synergies between different tasks in order to fill those "holes".

Behind these Nvidia arguments actually lies one of the problems of planning a GPU architecture well. Integrating one or more advanced command processors into a chip has a cost, a cost that could, for example, be spent differently to provide more compute units and boost performance directly in current games.

When developing a GPU architecture, much of the work consists of anticipating the profile of the tasks that will need to be handled by the time the new chips reach the market. The balance of the architecture between its different types of units, between compute power and memory bandwidth, between triangle rate and pixel throughput, etc., is a crucial point that requires good visibility, a lot of pragmatism and strategic vision. Clearly Nvidia has managed this rather well over several generations of GPUs.

To illustrate this, let's make a few comparisons between GM200 and Fiji on the basis of results obtained in Ashes of the Singularity without Async Compute. The comparison is rough and approximate (the GM200 used is that of the GTX 980 Ti, a slightly cut-down version of the chip), but still interesting:

GM200 (GTX 980 Ti): 6.0 fps/Gtransistor, 7.8 fps/TFLOPS, 142.1 fps per TB/s
Fiji (R9 Fury X): 5.6 fps/Gtransistor, 5.8 fps/TFLOPS, 97.9 fps per TB/s
We could do the same with many other games and the result would be similar or even more pronounced (AotS is particularly effective on Radeon): GM200 makes better use of the resources at its disposal than Fiji. This is an architectural choice, and it does not directly mean one chip is better than the other. Increasing the efficiency of some units can cost more than simply increasing their number by a larger amount. The architects' job is to find the right balance here.
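As a rough aside (not from the article), the ratios above are simply the average framerate divided by each card's commonly published specs. Here is a small sketch of that arithmetic, using placeholder framerates back-calculated to land near the quoted figures (they are not the article's measured results) and the usual published specs for both cards:

```cpp
#include <cstdio>

// Per-resource efficiency ratios of the kind quoted above.
struct Gpu {
    const char* name;
    double fps;          // average framerate (placeholder, not a measured result)
    double gTransistors; // transistor count in billions
    double tflops;       // peak FP32 compute throughput
    double tbPerSec;     // memory bandwidth in TB/s
};

int main() {
    const Gpu gpus[] = {
        // GM200: ~8.0B transistors, ~6.1 TFLOPS (980 Ti boost), 336 GB/s.
        {"GM200 (GTX 980 Ti)", 48.0, 8.0, 6.1, 0.336},
        // Fiji: ~8.9B transistors, ~8.6 TFLOPS, 512 GB/s of HBM bandwidth.
        {"Fiji (R9 Fury X)",   50.0, 8.9, 8.6, 0.512},
    };
    for (const Gpu& g : gpus) {
        std::printf("%s: %.1f fps/Gtransistor, %.1f fps/TFLOPS, %.1f fps per TB/s\n",
                    g.name,
                    g.fps / g.gTransistors,
                    g.fps / g.tflops,
                    g.fps / g.tbPerSec);
    }
    return 0;
}
```

The output lands close to, but not exactly on, the article's figures, simply because the inputs here are approximations.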

Obviously, AMD has instead bet on the raw throughput of its GPUs, which generally implies lower efficiency and therefore an opportunity for optimisation at this level. Add to that the fact that the way Async Compute is organised in AotS seems to make good use of surplus memory bandwidth, and you will easily understand that there is less to gain on Nvidia's side. All the more so since the synchronisation commands associated with Async Compute have a cost that has to be masked by a sufficient gain.

While our own thinking leads us to broadly agree with Nvidia on these arguments, there is another point that matters to gamers, and it is probably what leads the GPU number one to address the topic only reluctantly: Async Compute represents a free gain for Radeon users. Although this capability has been built into AMD GPUs for more than four years, AMD has not been able to turn it into commercial profit; the cards were not sold at a higher price because of it. That is changing somewhat with AMD's latest range, which pushes this point strongly, but in terms of perception gamers like getting a little free boost of this kind, even if only in a handful of games. Conversely, the generally higher performance of Nvidia GPUs delivered an immediate benefit in current games and could be priced directly into GeForce cards. And from the perspective of a company whose goal is not to post losses, it is clear which approach makes more sense.

Still, it is 2016 and the use of Async Compute should gradually spread, particularly thanks to the similarity between the console GPU architectures and that of the Radeons. Nvidia cannot totally ignore a feature that could reduce or eliminate its performance lead. Without going into detail, Rev Lebaredian therefore wanted to reiterate that there are indeed opportunities at the driver level whose implementation could, in some cases, deliver a performance gain with Async Compute. Opportunities that Nvidia constantly re-evaluates, not forgetting that its future GPUs could change things at this level.
 
So is nobody going to comment on this? :p

It looks like they are kind of saying they are not really trying to get async working on Maxwell.
 
I wish I had the patience to read that but the translation is awful :p

I think I will just wait until the new cards are out and there are a decent number of DX12 games released before worrying about async compute.
 
So is nobody going to comment on this? :p

It looks like they are kind of saying they are not really trying to get async working on Maxwell.

That was kind of obvious months ago. If the Maxwell architecture were able to support async in a similar manner to GCN, it would have been done already.

As it stands, they have been taking too long, so it points more towards them being unable to do it. Their engineers are smart people; if the hardware were capable, it would already be supported via a driver update.
 
That was kind of obvious months ago. If the Maxwell architecture were able to support async in a similar manner to GCN, it would have been done already.

As it stands, they have been taking too long, so it points more towards them being unable to do it. Their engineers are smart people; if the hardware were capable, it would already be supported via a driver update.

I kind of disagree with the bolded bit to a degree - AMD have a history of jumping onto new technologies and paradigms prematurely (I don't really want to knock them for it, but they often end up leapfrogged by nVidia when it comes to the crunch), and then by the time those features are actually broadly useful nVidia have their implementation up and running while AMD have moved on to other things, leaving their implementation not much further along than when they first got it working ahead of everyone else.
 
I kind of disagree with the bolded bit to a degree - AMD have a history of jumping onto new technologies and paradigms prematurely (I don't really want to knock them for it, but they often end up leapfrogged by nVidia when it comes to the crunch), and then by the time those features are actually broadly useful nVidia have their implementation up and running while AMD have moved on to other things, leaving their implementation not much further along than when they first got it working ahead of everyone else.

I mean it from the perspective that Nvidia keep saying it is coming, then keep pushing the date back. If the hardware were capable it would have been out sooner, and they should be more honest about it.

They will more than likely put the bullet in when Pascal comes out.
 
Maybe NVidia cannot get async to work, maybe they can, who knows. But if the release-day driver that will inevitably show up on the 31st, along with the couple of hotfixes that will follow very shortly after, doesn't have the magic async fix that NVidia have mentioned, then in my opinion it points to them not being able to do it.
 
The way I read Nvidia's response is that they're saying Async Compute is useful for AMD because they have a lot of misfires in their rendering process (bubbles where things are under-utilized), and Asynchronous Compute is beneficial for AMD because it's a way of filling in those bubbles with useful work; but it's not a big deal for Nvidia because their process is already firing on all cylinders and they don't have a lot of bubbles to fill in.

Now I don't know how true or not that is, and to use a certain phrase: "well he would say that, wouldn't he". But that seems to be how I read their response on this subject. I guess time will tell.
 
The way I read Nvidia's response is that they're saying Async Compute is useful for AMD because they have a lot of misfires in their rendering process (bubbles where things are under-utilized), and Asynchronous Compute is beneficial for AMD because it's a way of filling in those bubbles with useful work; but it's not a big deal for Nvidia because their process is already firing on all cylinders and they don't have a lot of bubbles to fill in.

Now I don't know how true or not that is, and to use a certain phrase: "well he would say that, wouldn't he". But that seems to be how I read their response on this subject. I guess time will tell.

To me that sounds like marketing speak for "AMD can do async compute but we can't".
 
Now I don't know how true or not that is, and to use a certain phrase: "well he would say that, wouldn't he". But that seems to be how I read their response on this subject. I guess time will tell.
That's how I read their response, too. And it's a valid one - Async Compute is no benefit if there's little in the way of idle resources in the GPU. But the unspoken reality is that NV likes fat margins, so in any given market segment NVidia GPUs tend to be smaller and simpler than the competing AMD ones. Thus when AMD can get their efficiency up by using AC, NVidia gets left behind.

Only at the very high-end - 980TI vs Fury/X - does NV have a design of comparable raw power to AMD, so the 980TI generally performs well enough to stay in the game even when AMD gets the benefit of AC.
 
It is from Hardware.fr, which is one of the best tech sites in Europe, and it is a Google translation from French. If someone wants to do a better manual translation, be my guest.

:p
 
What better way to get consumers to jump from Maxwell to Pascal?

Usual Nvidia tactics tbh. And I wouldn't hold my breath waiting for the first gen of Pascal to support async.
Another reason Nvidia keeps forcing people to upgrade is their abysmal drivers and performance on old hardware atm.

Just two and a half years later, the R9 290 (not the X) at STOCK speeds beats the crap out of the 780 Ti, the TB and the 980, in both DX11 and DX12... (without async)

While the 380X is just a couple of FPS slower than the 970.

Explains a lot.
 
Usual Nvidia tactics tbh. And I wouldn't hold my breath waiting for the first gen of Pascal to support async.
Another reason Nvidia keeps forcing people to upgrade is their abysmal drivers and performance on old hardware atm.

Just two and a half years later, the R9 290 (not the X) at STOCK speeds beats the crap out of the 780 Ti, the TB and the 980, in both DX11 and DX12... (without async)

While the 380X is just a couple of FPS slower than the 970.

Explains a lot.

Thanks for making me chuckle. :D :D
 
The way I read Nvidia's response is that they're saying Async Compute is useful for AMD because they have a lot of misfires in their rendering process (bubbles where things are under-utilized), and Asynchronous Compute is beneficial for AMD because it's a way of filling in those bubbles with useful work; but it's not a big deal for Nvidia because their process is already firing on all cylinders and they don't have a lot of bubbles to fill in.

Now I don't know how true or not that is, and to use a certain phrase: "well he would say that, wouldn't he". But that seems to be how I read their response on this subject. I guess time will tell.

This is exactly what they are saying: "we don't need it because we are better than the competition".

I have a feeling Nvidia are struggling with this, and Pascal may be no better. They are really going out of their way to avoid this subject.
 