
AMD’s DirectX 12 Advantage Explained – GCN Architecture More Friendly To Parallelism Than Maxwell

I think his point is: if nVidia can render the same features at the same fidelity without a performance penalty in serial, then forcing async compute, which apparently doesn't run as well on nVidia hardware, is needlessly handicapping it.
That's why they added extra code to disable it for Nvidia. Time will tell if they can maintain performance without it.
 
I would be considering AMD if they'd bothered to support HDMI 2.0.

[attached image]
 
^^ Don't think he is trolling - he had a couple of threads/posts about HDMI 2.0 before it came to light that the 3xx/Fury didn't have it.

That's why they added extra code to disable it for Nvidia. Time will tell if they can maintain performance without it.

On 28nm I doubt AMD have been able to use enough die space to do GCN 1.2+ justice in workloads that heavily favour parallel compute; on 16nm versus Maxwell it's another matter, but I doubt nVidia will drop the ball in that regard with Pascal.
 
Does this benchmark look any different on NVidia versus AMD hardware, or does it render exactly the same?

If it looks the same, then is there a point to be made about needless amounts of async compute being used? Isn't this just the same as the tessellation of Geralt's hair in The Witcher 3: use more for no benefit, just to hurt the opposition?

Actually there are low amounts of async compute in this game; other devs are using the technique to a far greater extent and seeing around 30% gains (on consoles). This is a more efficient way of distributing the data to be computed by the shaders, and it's not a PhysX/tessellation move. It was disabled on nvidia because the hardware at this point is unable to handle fast context switching efficiently. It is, however, able to do it. 20% of the engine (Nitrous) is using compute and the next one will use around 50%, according to the dev who posted on some forum. Although it could all be done 100% in compute, today's methodology most likely won't change that much for games in general.
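
For anyone wondering what "async compute" actually is at the API level: in D3D12 the app creates a separate compute-type command queue next to the graphics queue, and the driver/GPU decides how much of that work can overlap. A minimal sketch, with device creation and error handling omitted and the function name being just a placeholder:

```cpp
// Minimal sketch: creating a dedicated compute queue and command list in D3D12.
// This is the mechanism behind "async compute"; whether the work actually
// overlaps with graphics depends on the GPU and driver.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void CreateAsyncComputeQueue(ID3D12Device* device,
                             ComPtr<ID3D12CommandQueue>& computeQueue,
                             ComPtr<ID3D12CommandAllocator>& allocator,
                             ComPtr<ID3D12GraphicsCommandList>& commandList)
{
    // A COMPUTE-type queue is scheduled independently of the DIRECT (graphics) queue.
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_COMPUTE,
                                   IID_PPV_ARGS(&allocator));
    // Created in the recording state, ready to take Dispatch() calls.
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_COMPUTE,
                              allocator.Get(), nullptr,
                              IID_PPV_ARGS(&commandList));
}
```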

Also not sure if it was posted here, but here it goes:

Talking about feature level 12_1 in nVidia Maxwell v2:
https://www.reddit.com/r/pcmasterra...aming_nvidia_gpus_do_not_support_dx12/cum6vy6
Raster Ordered Views and Conservative Raster. Thankfully, the techniques that these enable (like global illumination) can already be done in other ways at high framerates (see: DiRT Showdown).
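
As an aside, whether a given card actually exposes those two 12_1 features is a one-call query in D3D12. A rough sketch, assuming a device has already been created:

```cpp
// Minimal sketch: querying Raster Ordered View and Conservative Rasterization
// support on whatever GPU the ID3D12Device was created on.
#include <d3d12.h>
#include <cstdio>

void ReportFeatureLevel12_1Bits(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                              &options, sizeof(options))))
    {
        std::printf("Raster Ordered Views:       %s\n",
                    options.ROVsSupported ? "yes" : "no");
        std::printf("Conservative rasterization: tier %d\n",
                    static_cast<int>(options.ConservativeRasterizationTier));
    }
}
```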

and DX12 in general:
https://www.reddit.com/r/pcmasterra...aming_nvidia_gpus_do_not_support_dx12/cum3xow
I think gamers are learning an important lesson: there's no such thing as "full support" for DX12 on the market today.

There have been many attempts to distract people from this truth through campaigns that deliberately conflate feature levels, individual untiered features and the definition of "support." This has been confusing, and caused so much unnecessary heartache and rumor-mongering.

Here is the unvarnished truth: Every graphics architecture has unique features, and no one architecture has them all. Some of those unique features are more powerful than others.

Yes, we're extremely pleased that people are finally beginning to see the game of chess we've been playing with the interrelationship of GCN, Mantle, DX12, Vulkan and LiquidVR.
 
^^ Don't think he is trolling - he had a couple of threads/posts about HDMI 2.0 before it came to light that the 3xx/Fury didn't have it.



On 28nm I doubt AMD have been able to use enough die space to do GCN 1.2+ justice in workloads that heavily favour parallel compute; on 16nm versus Maxwell it's another matter, but I doubt nVidia will drop the ball in that regard with Pascal.
The vast majority of gamers don't have the latest GPUs. When the first DX12 games hit, most will be using older hardware.
 
The vast majority of gamers don't have the latest GPUs. When the first DX12 games hit, most will be using older hardware.

That's the crux of it: the HD 7730 onwards are all DX12 cards (they have GCN 1.1), so they will actually be `ok` with async compute.


whereas IMO Fermi and Kepler won't get a DX12 driver; the feature level they support is right at the bottom of the pile, much like the FX 5950 supporting DX9....
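
For reference, the feature level a card reports is something you can query straight from the driver rather than argue about. A minimal sketch, assuming a D3D12 device could be created in the first place:

```cpp
// Minimal sketch: asking the driver which D3D feature level the GPU supports,
// which is what determines what kind of "DX12 card" it really is.
#include <d3d12.h>

D3D_FEATURE_LEVEL QueryMaxFeatureLevel(ID3D12Device* device)
{
    const D3D_FEATURE_LEVEL candidates[] = {
        D3D_FEATURE_LEVEL_12_1, D3D_FEATURE_LEVEL_12_0,
        D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_11_0,
    };
    D3D12_FEATURE_DATA_FEATURE_LEVELS levels = {};
    levels.NumFeatureLevels =
        static_cast<UINT>(sizeof(candidates) / sizeof(candidates[0]));
    levels.pFeatureLevelsRequested = candidates;

    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_FEATURE_LEVELS,
                                           &levels, sizeof(levels))))
        return D3D_FEATURE_LEVEL_11_0;  // conservative fallback
    return levels.MaxSupportedFeatureLevel;
}
```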
 
This is a more efficient way of distributing the data to be computed by the shaders (only on AMD hardware), and it's not a PhysX/tessellation move. It was disabled on nvidia because the hardware at this point is unable to handle fast context switching efficiently. It is, however, able to do it. 20% of the engine (Nitrous) is using compute and the next one will use around 50%, according to the dev who posted on some forum. Although it could all be done 100% in compute, today's methodology most likely won't change that much for games in general.

I can only assume that this game would run absolutely terribly on GCN hardware without async compute, as shown by the DirectX 11 results. I'm not entirely convinced about why it is so necessary, seeing as not using it on NVidia hardware is just as fast.
Of course we will not know how these things will pan out until we have a much bigger sample size of games/benchmarks.
 
I can only assume that this game would run absolutely terribly on GCN hardware without async compute, as shown by the DirectX 11 results. I'm not entirely convinced about why it is so necessary, seeing as not using it on NVidia hardware is just as fast.
Of course we will not know how these things will pan out until we have a much bigger sample size of games/benchmarks.

Because under DX11 you couldn't tap those resources properly, so part of that silicon was wasted doing nothing or doing inefficient tasks.
Nvidia is doing better in DX11 because their architecture looks more serial than AMD's and, with good drivers, it performs well. AMD's is parallel and works better in a different environment. DX12 is all about parallel workloads/multitasking. Nvidia went for "now", AMD went for "now and tomorrow as well".

At least that may be the result of what we're seeing these days.
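
To put the "parallel workloads" point concretely: under DX12 the app submits graphics and compute command lists to separate queues and synchronises them with a fence, leaving the GPU scheduler free to overlap them if the hardware can. A rough sketch; the queue and list names are placeholders for objects an application would already have created:

```cpp
// Minimal sketch of the DX12 multi-queue model: compute work is kicked off on
// its own queue and the graphics queue waits on a fence on the GPU timeline,
// so any overlap is left to the hardware scheduler.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void SubmitFrame(ID3D12Device* device,
                 ID3D12CommandQueue* gfxQueue,
                 ID3D12CommandQueue* computeQueue,
                 ID3D12CommandList* gfxList,
                 ID3D12CommandList* computeList)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // Kick off the compute work; it may run concurrently with graphics.
    ID3D12CommandList* compute[] = { computeList };
    computeQueue->ExecuteCommandLists(1, compute);
    computeQueue->Signal(fence.Get(), 1);

    // Graphics work that depends on the compute results waits on the fence
    // on the GPU, not on the CPU.
    gfxQueue->Wait(fence.Get(), 1);
    ID3D12CommandList* gfx[] = { gfxList };
    gfxQueue->ExecuteCommandLists(1, gfx);
}
```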
 
It amazes me how childish AMD really is with their lies, still carrying on with this kind of dirty marketing tactic. :o

Last time AMD said full DirectX 12 support mattered, but now they say it doesn't matter.

AMD bragged about asynchronous shading for months, claiming Nvidia Maxwell did not support async compute, that it can't do async compute.

Maxwell 2 can do async compute; somebody proved the GeForce GTX 980 Ti can do async compute!

https://www.reddit.com/r/nvidia/comments/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-9#post-1869058

[attached graph: async compute latency test results]
 
It amazes me how childish AMD really is with their lies, still carrying on with this kind of dirty marketing tactic. :o

Last time AMD said full DirectX 12 support mattered, but now they say it doesn't matter.

AMD bragged about asynchronous shading for months, claiming Nvidia Maxwell did not support async compute, that it can't do async compute.

Maxwell 2 can do async compute; somebody proved the GeForce GTX 980 Ti can do async compute!

https://www.reddit.com/r/nvidia/comments/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-9#post-1869058

[attached graph: async compute latency test results]

You probably need to look more closely at that graph again and understand what is happening before you start pointing fingers. There is plenty of discussion about it and it's been posted before.

:p:D
 
The graph in those pics was from a 390X, not a Fury X. The author said so today (of course I should have noticed, as it was my Fury X data to start with). Not that it makes any real difference, as all GCN cards seem to perform the same way. Though there seems to be more variance in graphics+compute with the Fury and 390X compared to the 7970. Dunno why.

Graphics is basically the card's theoretical pixel fillrate at 90%+ efficiency. Compute uses a low amount of the GPU as well, so I guess the biggest difference between GCN models is clock speed (difference in compute performance, that is).
 
So, looking at the graphs and reading the article, it would seem that NVidia does up to 32 lists much quicker, then slows down as more command lists are added, to a similar speed to AMD doing 128 lists. As someone on the Beyond3D thread suggests, it would be interesting to see what happens when even more lists are added: does AMD step up in latency like NVidia does, but in 128-list blocks?
 
So, looking at the graphs and reading the article, it would seem that NVidia does up to 32 lists much quicker, then slows down as more command lists are added, to a similar speed to AMD doing 128 lists. As someone on the Beyond3D thread suggests, it would be interesting to see what happens when even more lists are added: does AMD step up in latency like NVidia does, but in 128-list blocks?

Yes, and if that benchmark also uses the ACE units on AMD hardware, we should see a major difference between GCN 1.0 and GCN 1.1/1.2 (2 ACEs vs 8).
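
For anyone who wants to poke at this themselves, the test being discussed is essentially "submit N compute command lists and time how long the queue takes to drain them", looking for latency steps around 32/64/128. A rough CPU-side sketch along those lines (not the actual Beyond3D benchmark code; the pre-recorded command lists are placeholders for whatever compute work you dispatch):

```cpp
// Rough sketch: time how long a compute queue takes to drain batches of 1..N
// pre-recorded command lists, to look for latency steps as the count grows.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <chrono>
#include <cstdio>
#include <vector>
using Microsoft::WRL::ComPtr;

void TimeCommandListBatches(ID3D12Device* device,
                            ID3D12CommandQueue* computeQueue,
                            const std::vector<ID3D12CommandList*>& recordedLists)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    HANDLE done = CreateEvent(nullptr, FALSE, FALSE, nullptr);

    UINT64 fenceValue = 0;
    for (UINT count = 1; count <= recordedLists.size(); ++count)
    {
        auto start = std::chrono::high_resolution_clock::now();

        // Submit the first 'count' lists, then wait for the queue to finish them.
        computeQueue->ExecuteCommandLists(count, recordedLists.data());
        computeQueue->Signal(fence.Get(), ++fenceValue);
        fence->SetEventOnCompletion(fenceValue, done);
        WaitForSingleObject(done, INFINITE);

        double ms = std::chrono::duration<double, std::milli>(
                        std::chrono::high_resolution_clock::now() - start).count();
        std::printf("%3u command lists: %.3f ms\n", count, ms);
    }
    CloseHandle(done);
}
```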
 
You probably need to look more closely at that graph again and understand what is happening before you start pointing fingers. There is plenty of discussion about it and it's been posted before.

:p:D

Assuming those are latencies, that is a very interesting picture, as nVidia does very well at small loads and would presumably have an advantage above 128 and below 256 as well.
 
Just to point out, the maker of the graphs added that this is not the Fury X but the 390X data.

It seems to me from the graph that the NV chip is divided into 4 segments: if only one of them is working it is fast, with two it is slower, and so on.
The AMD card spreads the workload across the whole chip every time.
 
Subsequently allowing higher detail.....

Correct. Think of it as HyperThreading - it's a tool to get more out of the hardware. I don't see how anyone can view it as a negative thing, except the hardcore greens, because nVidia doesn't currently do this well. I've bolded a word there because I'm sure they're going to improve on it; whether they get better than AMD, who knows.
 
A little evening reading perhaps? :)

Asynchronous Shaders Whitepaper


Hardware, software and API support are all now available to deliver on the promise of asynchronous computing for GPUs. The GCN architecture is perfectly suited to asynchronous computing, having been designed from the beginning with this operating model in mind. This will allow developers to unlock the full performance potential of today's PC GPUs, enabling higher frame rates and better image quality.

Source
http://amd-dev.wpengine.netdna-cdn....10/Asynchronous-Shaders-White-Paper-FINAL.pdf
 