
AMD Polaris architecture – GCN 4.0

DX11 will become less and less relevant over the coming years as DX12 and Vulkan become the norm.
Sure. But as you say - it's going to take some years for that to happen.

For now, in 2016 and 2017, most games will still be using DX11. By the time DX12 starts becoming 'the norm', we'll be talking about Volta, not Pascal.
 
Jesus Christ, can you guys go make yourselves a bloody 'Async' thread or something? I come here to read about Polaris, not who has async and who doesn't.

It's getting really tedious trying to read this thread lately... go make your own async debate thread FFS.
 
Because Pascal does have proper async compute, it just isn't the same implementation that AMD has, and there is absolutely no requirement for it to be. That makes everything else you say completely irrelevant.


Hi, can you please show me absolute evidence of this?
You must have it, as you seem very sure. Having said that, however it is implemented (and we will find out when people test Pascal), there is clearly a massive DX12 win for AMD, and the future looks very promising for AMD and their Polaris architecture.
Thanks.

Now, are you looking forward to the Polaris launch?
 
Jesus Christ, can you guys go make yourselves a bloody 'Async' thread or something? I come here to read about Polaris, not who has async and who doesn't.

It's getting really tedious trying to read this thread lately... go make your own async debate thread FFS.


You are correct, there should be a specific thread. However, Polaris does stand to benefit from async, so there is a link between the two.
 
There are decisions that you as a GPU designer need to make about what sort of GPU you build: you have a set R&D budget and limited die space.

AMD's approach and priorities are somewhat different to Nvidia's.
AMD's agenda, in general, always seems to be looking ahead, or even trying to drive innovation and technology forward with their designs.
Nvidia work with what's current.
I'm not saying one approach is better than the other, though I will say Nvidia are more pragmatic and effective in theirs.
With that said, with GCN 1.1 and 1.2 AMD are looking beyond DX11 more than at it; in their opinion DX11 is dead in the water.

The fact is quite simple: you can have over 4000 shaders on your GPU, but if you can only feed instructions fast enough to keep 60% of them active, then you effectively only have about 2500 shaders and a lot of waste.
This is the problem with the Fury X, and to some extent the 390X.
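
To put rough numbers on that (a back-of-envelope sketch in C++; the 60% utilisation figure is just the estimate above, not a measurement):

[CODE]
#include <cstdio>

int main() {
    // Fury X ships 4096 shaders; suppose the front end can only keep ~60%
    // of them fed at any moment (the utilisation figure claimed above).
    const int    shaders     = 4096;
    const double utilisation = 0.60;

    // Effective shader count: the rest sit idle, wasted die area.
    const double effective = shaders * utilisation;
    std::printf("%d shaders at %.0f%% utilisation ~= %.0f effective shaders\n",
                shaders, utilisation * 100.0, effective);  // ~2458, i.e. "about 2500"
    return 0;
}
[/CODE]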

So they built an architecture that is, in their view, the future, to try and push the future, even spending huge R&D on an API (Mantle) to accompany said architecture, as they saw it as the only way to push through that perceived boundary.

In my view AMD were taken aback by Nvidia and their ability to squeeze so much more out of DX11; they didn't think it was possible, Nvidia proved it was.

So that 2800-shader 980 Ti is using all of its shaders at any given time; AMD can't get anywhere near that, and the difference in call efficiency between the two is about 70% in Nvidia's favour.

Whereas in DX12, where AMD's architecture is utilised in the way intended, they are getting higher utilisation of their GPUs, resulting in fewer idle shaders and higher performance, even for the 390X.

For Nvidia, DX12 doesn't make a lot of difference at this point, because their current GPUs are not massively more powerful than what their GPU/driver handling of DX11 can already use.

So who is right and who is wrong?

AMD have looked at what Nvidia are doing with DX11 and know they need to do it too. I have no doubt they will with Polaris; they have even said they will.

Nvidia also know that eventually they too will run out of steam even with all their DX11 tweaks, and they will need to follow AMD's lead.

For all the toing and froing, what we have here are two very capable GPU designers with different skill sets learning from each other.

They are both right.

As I understand it (and I may be wrong):

Nvidia already use preemption in Maxwell. Preemption and scheduling amount to the same thing: instruction streams are organised in a way that avoids idle threads by pre-timing call instructions to interleave.
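
As a toy software analogy of that interleaving (this is not Nvidia's actual scheduler, just an illustration; the names are made up):

[CODE]
#include <cstdio>
#include <queue>
#include <string>

// Toy model: two work streams (graphics and compute). Interleaving fills
// each issue slot from whichever stream has work ready, so a bubble in one
// stream does not leave the hardware idle.
int main() {
    std::queue<std::string> gfx, comp;
    for (int i = 0; i < 3; ++i) {
        gfx.push("draw");
        comp.push("dispatch");
    }

    int cycle = 0;
    while (!gfx.empty() || !comp.empty()) {
        if (!gfx.empty())  { std::printf("cycle %d: %s\n", cycle++, gfx.front().c_str());  gfx.pop(); }
        if (!comp.empty()) { std::printf("cycle %d: %s\n", cycle++, comp.front().c_str()); comp.pop(); }
    }
    return 0;
}
[/CODE]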

AMD are adding instruction scheduling to Polaris; asynchronous shaders will be carried forward from GCN 1.1 and 1.2.

Preemption/scheduling is used to reduce draw-call overheads in DX11 and can also be used in DX12.
AMD did not use scheduling as it can introduce latency, caused by holding instructions in a stack, which is necessary to organise said instructions.

In AMD's GPUs the instructions flow directly without stacking; the downside of that is a reduced flow of calls compared with scheduling, so it is less efficient, but the upside is improved latency.
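
That latency-versus-throughput trade-off in miniature (entirely made-up numbers, purely to illustrate the shape of it):

[CODE]
#include <cstdio>

int main() {
    // Illustrative model of the trade-off described above (made-up numbers):
    // - "Direct" issue: each call goes straight to the shaders. Low latency,
    //   but gaps between calls leave the hardware idle.
    // - "Scheduled" issue: calls are held in a buffer and packed together.
    //   Higher throughput, but each call first waits in the buffer.
    const double callGapNs    = 100.0;  // time the front end needs per direct call
    const double bufferWaitNs = 300.0;  // extra wait while a call sits in the queue
    const double packedGapNs  = 40.0;   // packed issue rate once buffered

    std::printf("direct:    latency %.0f ns/call, throughput %.1f calls/us\n",
                callGapNs, 1000.0 / callGapNs);
    std::printf("scheduled: latency %.0f ns/call, throughput %.1f calls/us\n",
                bufferWaitNs + packedGapNs, 1000.0 / packedGapNs);
    return 0;
}
[/CODE]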

CryEngine actually has the same scheduling system in its abstraction layer; this is why it's able to run DX11 so beautifully balanced across up to 16 threads, and why, unusually compared with almost all other games, AMD's 8-core FX CPUs run Crysis 3 ever so slightly better than a 3770K.
Nothing to do with AMD's partnership with Crytek in making the engine for Crysis 3 :p

Anyway...

AMD's solution was to have multiple schedulers in the hardware: 8 in GCN 1.1 and 1.2, those ACE units.

The problem is it's a bit like 8-core CPUs in GPU form, and DX11 cannot handle this.
Mantle was the first API capable of running multiple shader threads.
DX12 has the same architecture, but of course it's not at all borrowed from Mantle, it's just a coincidence :D
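
That multi-queue design is exactly what DX12 exposes at the API level. A minimal sketch (assumes the Windows 10 SDK and linking against d3d12.lib; error handling omitted):

[CODE]
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create a graphics (direct) queue and an independent compute queue. On GCN,
// work submitted to a compute queue can be picked up by the ACE units and
// run asynchronously alongside the graphics queue.
int main() {
    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;      // graphics + compute + copy
    ComPtr<ID3D12CommandQueue> gfxQueue;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE; // compute + copy only
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // Command lists submitted to computeQueue can now overlap with work on
    // gfxQueue; synchronisation between the two is done with ID3D12Fence.
    return 0;
}
[/CODE]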

Nvidia have proven how useful scheduling is in DX11, and AMD will now do the same for DX11, while for DX12 the asynchronous hardware remains the same.

Pascal is the same as Maxwell; the difference is that the asynchronous capabilities derived from scheduling have been switched on for Pascal.

AlamoX posted the video that explains this nicely.

 
There are decisions that you as a GPU designer need to make about what sort of GPU you build... They are both right.

Good post
 
I am really interested to see what AMD's primitive discard accelerator brings to the table. From what I have read, it should improve performance on already released, as well as future, titles that incorporate heavy doses of tessellation, by culling what is not required before processing.
It would be really nice to see AMD's DX11 performance get a nice boost on games currently available as well as those in the making.
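
How the hardware does it isn't public, but the principle is easy to sketch: throw away primitives that can't contribute any pixels (heavy tessellation generates huge numbers of tiny or degenerate triangles) before they are processed further. A toy software version, not AMD's implementation:

[CODE]
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };
struct Triangle { Vec2 a, b, c; };

// Signed area of a triangle in screen space (cross product / 2).
static float signedArea(const Triangle& t) {
    return 0.5f * ((t.b.x - t.a.x) * (t.c.y - t.a.y) -
                   (t.c.x - t.a.x) * (t.b.y - t.a.y));
}

// Toy primitive discard: drop triangles too small to be worth rasterising
// before any further processing. Heavy tessellation produces huge numbers
// of such micro-triangles, which is where the win would come from.
std::vector<Triangle> discardSmall(const std::vector<Triangle>& in,
                                   float minArea = 0.5f) {
    std::vector<Triangle> out;
    for (const Triangle& t : in)
        if (std::fabs(signedArea(t)) >= minArea)
            out.push_back(t);
    return out;
}
[/CODE]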
 
Hi, can you please show me absolute evidence of this?


Nvidia GP104 die shot.

GP104-die-shot.jpg



There you go, plain as day, it is the little green bit right next to the other little green bit, beside the not quite so green bit.

So unless you can prove that isn't the Async part can we please just give it a rest. :D:p:D
 
Nvidia GP104 die shot... There you go, plain as day, it is the little green bit right next to the other little green bit, beside the not quite so green bit.

So unless you can prove that isn't the Async part can we please just give it a rest. :D:p:D


...and that is exactly where we should all stand. Until absolute proof arrives we should all give it a rest. It would be nice if Nvidia could categorically state what is going on in their hardware, but for some reason they do not want to.
Others here having a decent discussion claim that Nvidia's async is not likely to bring much performance gain anyway. So why do Nvidia supporters need to discuss it, or troll other threads? It's pointless.
For AMD, however, it brings massive gains, and we should be allowed to discuss that without being trolled or accused of being affiliates/shills.

Having people enter this thread and try to derail it is simply trolling, and it must stop.
For example,
D.P accused us all of being AMD affiliates for recognising the advantage AMD's async brings to Polaris.
 
So, according to this image there are seven architectural improvements coming to Polaris. As already stated, I fancy that the primitive discard accelerator will be a big deal due to the culling of unnecessary tessellation. Anybody have any idea or input into how any of these improvements will benefit the architecture?
http://cdn.videocardz.com/1/2016/01/AMD-Polaris-Architecture-7.jpg

** No Hotlinking **
 
So, according to this image there are seven architectural improvements coming to Polaris. As already stated, I fancy that the primitive discard accelerator will be a big deal due to the culling of unnecessary tessellation. Anybody have any idea or input into how any of these improvements will benefit the architecture?

There is absolutely no way to know. You can improve a memory controller by improving effective bandwidth utilisation, or by improving efficiency in terms of power usage via transistor design or spacing. You can improve efficiency via performance: say a ROP that is 50% more effective but takes up only 20% more space, in which case maybe you use the 20% more space and gain 50% performance, or you reduce the number of ROPs by 33% but get the same performance, leaving more space for shaders/other parts.

There are many ways to improve any part of a GPU core, and many design choices resulting from each improvement; knowing that they've improved several parts gives almost zero indication of how much they've improved them or what the end result for the user is.

I mean, some are more predictable: memory-controller work usually focuses on power and effective usage of the bandwidth (given that bandwidth specs are theoretical maximums and rarely achieved in real running); cache is usually about reducing latency via improved or larger caches, which leads to better flow of data; and the display engine obviously includes HDMI 2.0b/DP 1.3 and being ready for DP 1.4, though this could still also mean achieving that with a more efficient design that uses fewer transistors, less power, etc.
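
To put the hypothetical ROP trade-off above into a worked example (the 50%/20% figures are just the illustration from that post, not real data):

[CODE]
#include <cstdio>

int main() {
    // Hypothetical numbers from the post: a redesigned ROP is 50% more
    // effective but takes 20% more die area. Two ways to spend that:
    const double perfGain = 1.5;   // per-ROP performance vs old design
    const double areaCost = 1.2;   // per-ROP area vs old design

    // Option A: keep the same ROP count -> +50% ROP throughput for +20% area.
    std::printf("option A: %.0f%% performance for %.0f%% area\n",
                perfGain * 100, areaCost * 100);

    // Option B: keep the same throughput -> need 1/1.5 ~= 67% as many ROPs,
    // i.e. 33% fewer, occupying 0.67 * 1.2 = 80% of the original area,
    // freeing ~20% of that block for shaders or other units.
    const double ropCount = 1.0 / perfGain;
    std::printf("option B: %.0f%% of the ROPs, %.0f%% of the area\n",
                ropCount * 100, ropCount * areaCost * 100);
    return 0;
}
[/CODE]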
 
Yes, that shot gives me hope. Lots of new things in the Polaris architecture :D

Let's hope it translates into good performance at a non-insulting price point :p
 
It's difficult to quantify, but it will be 'significant'.

I'm looking forward to seeing Polaris. If the 2048-shader mobile parts @ 1400MHz are equal to a 390/390X, then what is thought to be the full-fat 2560-shader part @ perhaps 1600MHz should give us close to Fury X performance, only without the bottlenecks, i.e. like a Fury X in DX12 but also in DX11, and without the tessellation problems.
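
A quick back-of-envelope check on that, assuming performance scales with shaders × clock (it never scales that cleanly in practice, and the Polaris shader counts and clocks here are the rumours above, not confirmed specs):

[CODE]
#include <cstdio>

// Relative raw throughput: shaders * clock (GFLOPS would be this * 2 ops).
static double rawThroughput(int shaders, double clockMHz) {
    return shaders * clockMHz;
}

int main() {
    // Back-of-envelope only: real performance never scales perfectly.
    const double mobile  = rawThroughput(2048, 1400);  // rumoured mobile part
    const double fullFat = rawThroughput(2560, 1600);  // rumoured desktop part
    const double r390x   = rawThroughput(2816, 1050);  // 390X reference clocks
    const double furyX   = rawThroughput(4096, 1050);  // Fury X

    std::printf("mobile vs 390X: %.2fx\n", mobile  / r390x);  // ~0.97x
    std::printf("full vs mobile: %.2fx\n", fullFat / mobile); // ~1.43x
    std::printf("full vs Fury X: %.2fx\n", fullFat / furyX);  // ~0.95x
    return 0;
}
[/CODE]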
 