AMD Polaris architecture – GCN 4.0

If they manage that, then I'll be buying two of 'em. It's a big ask, though.

I'm actually worried that's the best case scenario.

I feel both Polaris 10 and 11 will come in around Fury performance at best. I'd like to see a clear improvement on the 390X at least, Fury X or better.
 
While I'm sure AMD respect your opinion, the majority of consumers would snap up an equivalent 390X/Fury at a great price/performance ratio in a heartbeat.
 
Well, everything he says is true.

When you have the developers of the AMD-sponsored Hitman game claim that async compute is "super-hard" to tune, why do we need an Nvidia-blaming tinfoil-hat supposition instead of believing what a professional from Team Red is telling us? It's difficult to get async compute working well in games, so why should we be surprised if it is sometimes absent?

(I'm also not sure why you both think async compute is DX12's "biggest benefit" - the DX12 boosts we've seen on AMD's side haven't come from async compute, but from reducing overhead on the software side, as AMD's drivers are well known to be a lot more CPU-dependent. DX12 is great for reducing AMD's problems in that area, but it's not because of async compute (at least, for the most part). Giving almost every system a ~10% boost through Multi-Adapter making use of everyone's IGP seems like a logical candidate for 'Biggest DX12 Benefit' for gamers with decent CPUs, while DX12's lower software overhead would seem to be the most beneficial for those with weaker CPUs and AMD GPUs.)
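
For anyone unsure what "async compute" actually means at the API level, here's a minimal, purely illustrative D3D12 sketch (device creation is assumed to happen elsewhere, the function name is made up, and error handling is omitted): the whole feature boils down to recording independent work onto a second, compute-type queue that the GPU is free to overlap with the graphics queue.

    #include <windows.h>
    #include <d3d12.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    // Create one direct (graphics) queue and one compute queue. "Async compute"
    // is just feeding independent work to the second queue; whether the hardware
    // actually overlaps the two is entirely up to the GPU and driver, which is
    // why the gains differ so much between architectures.
    void CreateQueues(ID3D12Device* device,
                      ComPtr<ID3D12CommandQueue>& graphicsQueue,
                      ComPtr<ID3D12CommandQueue>& computeQueue)
    {
        D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
        gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;    // graphics + compute + copy
        device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&graphicsQueue));

        D3D12_COMMAND_QUEUE_DESC compDesc = {};
        compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute + copy only
        device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue));
    }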
 
It's not hard, it's new. Devs always say the same thing whenever there's something new they need to learn, API or otherwise. Async compute is used heavily on the PS4/Xbone, devs are getting the hang of it, and it's going to be the new normal soon enough.
 
If you mean regarding Linux, there has been massive headway made lately.

I think AMD are ploughing a lot of cash into Linux drivers; they must be, to be getting such good results and feedback.

http://www.pcworld.com/article/3058...asons-to-love-this-popular-linux-desktop.html
Errr... not to rain on your parade, but that link doesn't show any performance boosts on AMD's side. You might have added the wrong URL? That one didn't have any benchmarks at all, so no "good results" (as you claimed). In fact, the "feedback" was:

AMD graphics card users may want to stick with Ubuntu 14.04 until AMDGPU has matured. That is, if you’re using the card for gaming or other demanding chores.
Yeah... don't update to 16.04, because AMD isn't ready yet. :eek:

Unless you mean that AMD are very close to big Linux performance boosts? Yes, we all know that. We've all known that for nine years. :D It's always a few months away, isn't it?
 
it's not hard

Oh, I'm sorry, I didn't include a link/quote - your mistake there is my fault, I'm very sorry about that.

Jonas Meyer, Lead Render Programmer at IO Interactive, at this year's GDC:

Async Compute... was also “super hard” to tune; according to IO Interactive, too much Async work can even make it a penalty, and then there’s also the fact that PC has lots of different configurations that need tuning.
(Paraphrase from here, as it was part of a presentation.)

I'm sorry, I'm quite new here and don't know you! If you have more experience than Mr Meyer, would you mind letting me know what it is, so I can gauge your argument that he is wrong in the correct context?
 
Someone's spitting some attitude.

It's a pretty vague statement in the article. Is it the "calculation of light tiles" with async compute, or async compute generally? Which of those specifically is a) hard to tune or b) liable to lead to performance penalties? Are any scenarios particularly affected and therefore best avoided, or otherwise exempt, in which case it would be trivial to implement? There is no usefully presented information in the paragraph to be able to make any assessment.
 
I'm sorry, I'm not too good with non-native speakers, but if I'm understanding your English correctly, you seem to think that it's easy to implement, but only hard to tune if you want to actually achieve a performance boost?

Yes, I would agree with that. To put it another way, it's the power of the Cell!
 
That isn't what I really felt he was getting at, just that async, like every other new paradigm in coding, is 'hard' till you get good at it, and then it's as easy as anything else. Developers have dramatically less experience coding in DX12 because they hadn't done it before; it's as simple as that. You improve tools, you gain experience, and the next time they use async they have all the experience of trying to balance performance from the previous game. Maybe AMD/Nvidia or MS will introduce some new tools into DX12 or their developer programs that give a better indication of how much and where to use async.
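
To make the "how much and where" point concrete, here's a rough, hypothetical D3D12 sketch of the kind of thing being tuned (the function, its parameters and the workload names are illustrative, not from any real engine): most of the overlap you do or don't get comes down to where the cross-queue fences are placed.

    #include <windows.h>
    #include <d3d12.h>

    // Kick the async compute work (e.g. a light-culling pass) off on the compute
    // queue, then make the graphics queue wait only at the point it actually
    // consumes the result. Fencing too early serialises the queues and throws
    // the overlap - and the benefit - away.
    void SubmitFrame(ID3D12CommandQueue* computeQueue,
                     ID3D12CommandQueue* graphicsQueue,
                     ID3D12CommandList* const* asyncLists, UINT numAsync,
                     ID3D12CommandList* const* gfxLists,   UINT numGfx,
                     ID3D12Fence* fence, UINT64& fenceValue)
    {
        computeQueue->ExecuteCommandLists(numAsync, asyncLists);
        computeQueue->Signal(fence, ++fenceValue);

        // Graphics work that doesn't depend on the compute results could be
        // submitted before this wait so it overlaps with the async pass; the
        // dependent work goes after the wait.
        graphicsQueue->Wait(fence, fenceValue);
        graphicsQueue->ExecuteCommandLists(numGfx, gfxLists);
    }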

In general you find pushback in most new coding situations, as the majority of people are comfortable with what they already know and don't love whatever is new; eventually they usually become comfortable with the new way and often wouldn't go back to the old one. But there is always that period when people find it harder to do something new than to redo something they've done before, and that shouldn't come as a surprise to anyone.
 
Even AMD's poster-child engine for showcasing async in synthetically high loads shows how complex tuning async shaders is, even across different revisions of the same architecture. Async shaders that perform well on GCN 3 (Fury X) can absolutely kill performance on earlier GCN revisions. Then there are also complex interactions with the rest of the code base, e.g. the addition of a new lighting model and the associated fragment and compute shaders will totally change the CU utilization and affect how the async shaders are best coded.
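
As a rough illustration of the per-GPU gating that implies (this is a hypothetical sketch, and the device ID is just a placeholder standing in for Fiji, not a real lookup table), engines commonly key such decisions off the DXGI adapter description:

    #include <windows.h>
    #include <dxgi.h>

    // Hypothetical policy: only enable the async-compute path on GPUs where it
    // has been profiled to help. A real engine would use a tested table of
    // architectures (or different shader schedules per architecture), not a
    // single on/off switch.
    bool ShouldEnableAsyncCompute(IDXGIAdapter1* adapter)
    {
        DXGI_ADAPTER_DESC1 desc = {};
        adapter->GetDesc1(&desc);

        const UINT kAmdVendorId = 0x1002;          // PCI vendor ID for AMD
        if (desc.VendorId != kAmdVendorId)
            return false;

        switch (desc.DeviceId)
        {
        case 0x7300:                               // placeholder: Fiji / GCN 3
            return true;
        default:
            return false;
        }
    }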
 
GCN 1.0 (7970) doesn't have any ACE units, no A-Sync.

GCN 1.1 and 1.2 do.
 