AMD’s DirectX 12 Advantage Explained – GCN Architecture More Friendly To Parallelism Than Maxwell

What are you basing these assumptions on? I have seen nothing as of yet to indicate this; maybe in time, when we get more benchmarks and games.

AMD have been banging on about GCN for an absolute age. There's got to be a reason they stuck with it, given it needs a big die. Nvidia learned their kitchen sink lesson and now make semi skimmed GPU cores first.

I'm not making assumptions either. DX12 is here and it's very real. And if the AMD cards run it better then they will be bought by the bucket load, hate AMD or not.

I don't think people realise how good the Fury X is. It's very easy to look at an old benchmark and say "Wow, that's crap, Nvidia are better", but what about once the software the card was actually designed for arrives?

It's a bit of a masterpiece really.
 
DX12 is here, you are correct, but we cannot see much as of yet. In a way I hope AMD have got the upper hand, as it will bring them right back into play.

I am just cautious about proclaiming who is going to win on the DX12 front, as Nvidia are not as daft as some like to portray them. You are correct in that the Fury range are very nice cards indeed. I guess time will tell who can do what and who is right or wrong.
 
If they run it better, and if they are available to buy, they might be bought by the bucket load. We can only hope so.

Or you could look at it like this.

It is very easy to look at a new benchmark and say "Wow, that's great, AMD are better".

Fury is very good in the right circumstances; we will have to wait and see what the future holds for it and DirectX 12.
 
Where are you guys going with this? I understand what you hope happens, in that AMD cards suddenly become quicker, but even if they are (and not just in an AMD-sponsored game), this won't be the first time AMD have had slightly quicker cards, and they never sold anywhere near Nvidia. Where are the huge new sales going to come from?
 
They won't suddenly become quicker. They may (note I said MAY, not will) be much better at DX12 than Nvidia. This means that the Fury range could be the go-to cards for DX12.

If they are quicker in DX12 and it's not some odd driver bug for Nvidia then they will be the cards to buy.

Look, man. You know how it works. If the Fury X, for example, turns out to be 15% quicker than a Titan X in DX12 games, then people will ditch their cards and switch in a heartbeat.

Most of the reason people buy this ultra-high-end stuff is to show off with it; how can you show off with something that is slower?

Time will tell. Hopefully the first DX12 benchmarks will roll out soon.
 
When the Fury X was first released and I looked at the benchmarks and FPS numbers, I thought AMD had screwed something up on the GPU, but now it looks more like a software issue.

It's kind of all over the place, but makes perfect sense at 4K.

This may also explain why AMD are not rushing to get them made and out there.

I dunno, it could just be because they are waiting on DX12... I doubt it, like, but you never know!
 
I have been wondering if the followers of team green would have said

"Oh, it's just the first DX12 benchmark and it's not even out of beta, so we won't count that one" if the result of said benchmark had been in favour of the Nvidia cards.

No, I didn't think so either. It would have been more like

** There is a swear filter for a reason, do not attempt to bypass it **

Come on, admit it, you know I am right. :p

It is going to be interesting to see what twists and turns the arguments from Nvidia followers take if (and I mean IF) Fury cards start to build up a good number of DX12 benchmark wins in the near future.

Watch this space, as I am betting that some of the predictions above will come true.
;)
 
Now the bad thing: the Fury X will never be that good at 1080, 1200 or 1440 res in any game currently out (based on DX11).

That's what makes me not want to buy it.

At 4K it's obviously as good as the opposition, but I don't want a 4K monitor yet and I don't use DSR.

Ashes of the Singularity won't make me upgrade regardless when it's released, and regardless of how good it is, whether it be Pascal or Greenland or Boffin :D
 
I think I know what is happening.

Ashes of the Singularity makes use of Asynchronous Shading. Now, we know that AMD have been big on advertising this feature. It is a feature which is used in quite a few PlayStation 4 titles. It allows the developer to make efficient use of the compute resources available. GCN achieves this by making use of 8 Asynchronous Compute Engines (ACEs for short), found in the GCN 1.1 290 series cards as well as all GCN 1.2 cards. Each ACE is capable of queuing up to 8 tasks. This means that a total of 64 tasks may be queued on GCN hardware which features 8 ACEs.

nVIDIA can also do Asynchronous Shading through its HyperQ feature. The amount of available information on the nVIDIA side regarding this feature is minimal. What we do know is that nVIDIA mentioned that Maxwell 2 is capable of queuing 32 Compute, or 1 Graphics and 31 Compute, for Asynchronous Shading.

Anandtech made a BIG mistake in their article on this topic, which seems to have become the de facto standard article on it. Their information has been copied all over the web, and it is erroneous. Anandtech claimed that GCN 1.1 (290 series) and GCN 1.2 were capable of 1 Graphics and 8 Compute queues per cycle. This is in fact false. The truth is that GCN 1.1 (290 series) and GCN 1.2 are capable of 1 Graphics and 64 Compute queues per cycle.
Anandtech also had barely any information on Maxwell's capabilities. Ryan Smith, the graphics author over at Anandtech, assumed that Maxwell's queues were its dedicated compute units, and therefore Anandtech published that Maxwell 2 had a total of 32 Compute Units. This information is false.
The truth is that Maxwell 2 has only a single Asynchronous Compute Engine tied to 32 Compute queues (or 1 Graphics and 31 Compute queues).
I figured this out when I began to read up on the Kepler/Maxwell 2 CUDA documentation and found what I was looking for. Basically, Maxwell 2 makes use of a single ACE-like unit. nVIDIA name this unit the Grid Management Unit.


How does it work?

The CPU's various cores send parallel streams to the Stream Queue Management. The Stream Queue Management sends streams to the Grid Management Unit (parallel to serial thus far). The Grid Management Unit can then create multiple hardware work queues (1 Graphics and 31 Compute, or 32 Compute), which are then sent in a serial fashion to the Work Distributor (one after the other, based on priority). The Work Distributor, in a parallel fashion, assigns the workloads to the various SMMs. The SMMs then assign the work to specific arrays of CUDA cores. nVIDIA call this entire process "HyperQ".

Here's the documentation: (minimum of 5 posts before I can post the URL)
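
To make that host-side picture a little more concrete, here is a minimal CUDA sketch of feeding work into those queues: each CUDA stream is an independent submission channel, and on HyperQ-capable hardware the driver can map them onto the separate hardware work queues described above rather than funnelling everything through one. The kernel, buffer names and queue count are purely illustrative, not taken from the benchmark or from nVIDIA's documentation.

#include <cuda_runtime.h>
#include <cstdio>

// Dummy kernel standing in for one independent batch of compute work.
__global__ void busyWork(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 1000; ++k)
            v = v * 1.0001f + 0.0001f;
        data[i] = v;
    }
}

int main()
{
    const int kQueues = 32;      // illustrative: the 32 queues mentioned above
    const int kElems  = 1 << 16;

    cudaStream_t streams[kQueues];
    float *buffers[kQueues];

    for (int q = 0; q < kQueues; ++q) {
        cudaStreamCreate(&streams[q]);
        cudaMalloc(&buffers[q], kElems * sizeof(float));
    }

    // Each launch goes into its own stream. Whether these actually execute
    // concurrently is down to the hardware's queue and dispatch logic, which
    // is exactly the part being argued about in this thread.
    for (int q = 0; q < kQueues; ++q)
        busyWork<<<(kElems + 255) / 256, 256, 0, streams[q]>>>(buffers[q], kElems);

    cudaDeviceSynchronize();

    for (int q = 0; q < kQueues; ++q) {
        cudaFree(buffers[q]);
        cudaStreamDestroy(streams[q]);
    }
    printf("submitted %d independent streams\n", kQueues);
    return 0;
}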

GCN 1.1 (290 series)/GCN 1.2, on the other hand, works in a very different manner. The CPU's various cores send parallel streams to the Asynchronous Compute Engines' various queues (up to 64). The Asynchronous Compute Engines prioritize the work and then send it off, directly, to specific Compute Units based on availability. That's it.

Maxwell 2's HyperQ is thus potentially bottlenecked at the Grid Management Unit and Work Distributor stages of its pipeline. This is because these stages of the pipeline are in-order. In other words, HyperQ contains only a single pipeline (serial, not parallel).

AMD's Asynchronous Compute Engine implementation is different. It contains 8 parallel pipelines working independently from one another. This is why AMD's implementation can be described as being "out of order".

A few obvious facts come to light: AMD's implementation incurs less latency and is able to make more efficient use of the available compute resources.
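
As a rough illustration of the latency point (a toy model only, not how either GPU actually schedules work), here is a small host-side sketch comparing the average completion time of a batch of short tasks drained through one in-order queue versus eight independent queues. All numbers are made up for illustration.

#include <cstdio>
#include <vector>

// Toy model: each task takes 1 time unit, except one long task at the front
// of the work list that takes 20. We record when each task finishes.
static double averageFinishTime(int numQueues, const std::vector<int> &tasks)
{
    std::vector<int> queueTime(numQueues, 0);    // when each queue becomes free
    long long totalFinish = 0;

    for (size_t i = 0; i < tasks.size(); ++i) {
        int q = static_cast<int>(i) % numQueues; // naive round-robin distribution
        queueTime[q] += tasks[i];
        totalFinish += queueTime[q];
    }
    return static_cast<double>(totalFinish) / tasks.size();
}

int main()
{
    std::vector<int> tasks(64, 1);   // 64 short compute tasks
    tasks[0] = 20;                   // one long task at the head of the list

    printf("1 in-order queue : avg finish %.1f\n", averageFinishTime(1, tasks));
    printf("8 parallel queues: avg finish %.1f\n", averageFinishTime(8, tasks));
    return 0;
}

With one queue, every short task waits behind whatever happens to sit in front of it; with eight independent queues only one queue's worth of work is held up, which is the gist of the argument above.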

This explains why Maxwell 2 (GTX 980 Ti) performs so poorly under Ashes of the Singularity in DirectX 12, even when compared to a lowly R9 290X. Asynchronous Shading kills its performance relative to GCN 1.1 (290 series)/GCN 1.2, whereas the latter's performance is barely impacted.
GCN 1.1 (290 series)/GCN 1.2 are clearly being limited elsewhere, and I believe it is due to their peak rasterization rate, or Gtris/s. Many objects and units fill the screen in Ashes of the Singularity, and each one is made up of triangles (polygons). Since both the Fury X and the 290X/390X have the same number of hardware rasterization units, I believe this is the culprit. Some people have attributed this to the number of ROPs (64) that both the Fury X and the 290X/390X share. I thought the same at first, but then I remembered the colour compression found in the Fury/Fury X cards. The Fury/Fury X make use of colour compression algorithms which have been shown to alleviate the pixel fill rate issues found in the 290/390X cards. Therefore I do not believe that the ROPs (render back ends) are the issue. Rather, the triangle setup engine (raster/hierarchical Z) is the likely culprit.
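
For what it's worth, the geometry-limit argument can be put into rough numbers. Assuming the commonly quoted figures of 4 geometry/raster engines per chip at 1 triangle per clock and the usual boost clocks (assumptions from public spec sheets, not from the benchmark), peak triangle setup rate scales only with clock speed, not with shader count:

#include <cstdio>

// Rough peak-rate sketch. Figures are the commonly quoted ones (4 geometry
// engines, 1 triangle per clock each) and boost clocks; treat them as assumptions.
struct Gpu {
    const char *name;
    int geometryEngines;
    double clockGHz;
    int shaders;
};

int main()
{
    const Gpu gpus[] = {
        { "R9 290X", 4, 1.000, 2816 },
        { "R9 390X", 4, 1.050, 2816 },
        { "Fury X",  4, 1.050, 4096 },
    };

    for (const Gpu &g : gpus) {
        double gtrisPerSec = g.geometryEngines * g.clockGHz; // Gtris/s
        printf("%-8s %d shaders, peak setup ~%.1f Gtris/s\n",
               g.name, g.shaders, gtrisPerSec);
    }
    // Despite roughly 45% more shaders, the Fury X ends up with the same peak
    // triangle setup rate as a 390X, which is the bottleneck the post above
    // is pointing at.
    return 0;
}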


I've been away from this stuff for a few years so I'm quite rusty, but DirectX 12 is getting me interested once again.

Taken from the Hexus forum.
 
Ohh, so similar to how too much tessellation kills AMD cards' performance, right? Asynchronous shading kills Maxwell (2) performance? Well, ain't that a b***?

Let's hope DX12 games don't make use of too much asynchronous shading then :p
 
So the game is swamping Maxwell with parallel calls when a separate code path could serialise them to improve performance. So the dev is intentionally crippling performance on one vendor's hardware?

Nice to see AMD sinking to Nvidia's level then. So what do we think is going to happen with the number of sponsored titles on each side... This is going to get messy...

I say that, but then the two are still roughly equal even in a game that appears to deliberately favour one vendor (and has anyone done OC'd runs yet?)
 
What that Hexus post above states is quite true for how Hyper-Q operates on Kepler, but I've not been able to find any information that shows how queue and dispatch management operates on Maxwell. If anyone does have a link, please share it, as I would be very interested in having a read.
 
A parallel workload is the main reason DX12 could be faster, because the GPU can be utilised much better; it doesn't have to wait for serialised calls.
Why would they dumb it down to DX11 levels just to make Maxwell look good? This is a DX12 benchmark, as far as I know.
 
I say that, but then the two are still roughly equal even in a game that appears to deliberately favour one vendor (and has anyone done OC'd runs yet?)

It does put some further fuel under the theory that the engine has been designed specifically to target strengths and weaknesses in the two different architectures.
 
Firstly, it isn't AMD's game, but don't let that stop you. It's a DX12 feature that Nvidia is supposed to support, not ADDITIONAL code that gets added and paid for and is unoptimised for one vendor.

But let's really break down what you're saying. If you have enough information to process that you can fill up four i7 cores, would it be faster to serialise the code and run it all on one core? Because that is what you're saying. If you serialise it, you would be forcing way too much data into a limited pipeline.

We're talking about getting effective and efficient usage of a large number of shaders, versus having thousands of shaders but getting low utilisation. You can run SuperPi in a single thread on a single core very slowly, or do the same calculation with multiple threads on multiple cores in a small fraction of the time.
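
That analogy maps straight onto the GPU. Here is a minimal CUDA sketch (kernel and sizes made up for illustration) that runs the same amount of arithmetic once with a single thread and once spread across the whole grid; the second launch finishes far faster simply because the thousands of shaders actually get used.

#include <cuda_runtime.h>
#include <cstdio>

// Same arithmetic either way; only the number of threads doing it changes.
__global__ void work(float *data, int n, int itemsPerThread)
{
    int start = (blockIdx.x * blockDim.x + threadIdx.x) * itemsPerThread;
    for (int i = start; i < start + itemsPerThread && i < n; ++i) {
        float v = data[i];
        for (int k = 0; k < 100; ++k)
            v = v * 1.0001f + 0.0001f;
        data[i] = v;
    }
}

static float timeLaunch(dim3 grid, dim3 block, float *d, int n, int perThread)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    work<<<grid, block>>>(d, n, perThread);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 18;
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    // One thread grinds through everything serially...
    float serialMs = timeLaunch(dim3(1), dim3(1), d, n, n);
    // ...versus one element per thread across 1024 blocks of 256 threads.
    float parallelMs = timeLaunch(dim3(n / 256), dim3(256), d, n, 1);

    printf("1 thread: %.2f ms, full grid: %.2f ms\n", serialMs, parallelMs);
    cudaFree(d);
    return 0;
}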
 
I can see tears ahead, as certain games work considerably better on one vendor's graphics cards compared to the other, and vice versa, as developers code for specific hardware. It will be like going back to the early days, pre-DirectX.
 