DirectX 12 Requires Different Optimization on Nvidia and AMD Cards, Lots of Details Shared

D.P. · 17 Mar 2016 at 14:14

LambChop said:
I also thought about the fact you can mix and match vendors in dx12 (if programmed for?). So you could always have an AMD and Nvidia card, SLI/XF for general games, disable one card or the other for your red and green games. Again, thats something that I believe will have to be programmed for by devs so it could be another hassle, but its an option.

It just makes things way more complex for developers. E.g., you have GCN 1.1. and 1.2cards that have good ASYNC compute for small task sizes due to better fencing and barrier implementation, but Maxwell does better for larger compute chunks with lower overhead. GCN 1.1 has very poor tessellation, GCN 1.2 moderate tessellation and Maxwell very good tessellation. If you develop a game with different async compute needs and different tessellation needs you now have some complex pick and mix scenario depending on what combination of hardware exists. And combining the different rendering capabilities into a single coherent frame is no simple tasks either.

I think the most likely scenarios are :
A) A developer doesn't care much at all about architectural optimization and if they do support mixed hardware then performance will be all over the place. The developer wont care, but the option will be there and some people might be lucky with good performance.
B). Games exploit the intel/amd IGPU to do some compute tasks, or something like shadow computation. OS everyone with iGPU gets a small but pleasant game
C) As B but using 2nd GPU forma different vendor.

AWPC · 17 Mar 2016 at 22:59

Remedy on Quantum Break DX12:
http://wccftech.com/remedy-dx12-matching-dx11-gpu-performance-trivial-architectures-driver/

turbot1984 · 18 Mar 2016 at 12:42

When is Mantle coming out?

AlamoX · 18 Mar 2016 at 13:01

turbot1984 said:
When is Mantle coming out?

it's called vulkan 1.0

LambChop · 22 Mar 2016 at 13:37

turbot1984 said:
When is Mantle coming out?

Fortunately, that died a long time ago.

Rroff · 22 Mar 2016 at 13:42

AWPC said:
Remedy on Quantum Break DX12:
http://wccftech.com/remedy-dx12-matching-dx11-gpu-performance-trivial-architectures-driver/

Echos kind of what I think - low level access is all well and good but the API design approach to that needs some serious thinking out hence my opinion overall DX12 is a huge mis-step. Ideally it needs a hybrid model where you can prototype at a more abstract level and then go back and delve in deeper once you hit milestones.

I might be a bit wrong about Vulkan - while certain aspects are a little more intuitive than the overly complex implementation in DX12 largely it isn't much less demanding in memory management implementation, etc.

D.P. · 22 Mar 2016 at 13:59

Rroff said:
Echos kind of what I think - low level access is all well and good but the API design approach to that needs some serious thinking out hence my opinion overall DX12 is a huge mis-step. Ideally it needs a hybrid model where you can prototype at a more abstract level and then go back and delve in deeper once you hit milestones.

I might be a bit wrong about Vulkan - while certain aspects are a little more intuitive than the overly complex implementation in DX12 largely it isn't much less demanding in memory management implementation, etc.

I think most developers will end up making a middleware wrapper to make DX12 work as a higher-level API, even higher than DX11, they tended to do that with earlier version anyway. Software has always been progressively abstracting away the hardware and layering higher and higher levels to expedite development and reduce bugs.

Instead of nvidia/AMD writing a DX11 driver you will now get the game developer writing a DX12 driver. We all know how terribly most developers mange to code under DX10/11 following a documented high-level API, so lord know what kind of state some games will come out as. But then we have already seen a taster in gears of War with massive performance disparities between different architectures due to code immaturity.

D.P. · 22 Mar 2016 at 14:15

This slide form Remedy is very telling:

GPU perf: Do things right, match DX11

Not trivial on all architectures

Messing up GPU mem mgmt can be costly

CPU perf: Easy to outperform DX11

But are you really API overhead bound?

Instancing, LODding, good culling: You’re not swamping the driver with draws.

So if the developer does everything right they match DX11 performance on the GPU, great. And as I have repeatedly pointed out before, the fact that DX and OGl are heavily command limited simply means that developers created a smart work around like instancing to reduce call overhead. So with DX12 developers can be lazy and do less command batching, if they put in the effort of making their own DX driver layer, and the end result if everything foes OK is similar DX11 GPU performance but improved CPU. Only th best developers will be able to exploit the higher API call limits and overcome any weaknesses of instancing and LOD techniques etc.

humbug · 22 Mar 2016 at 16:33

The thing with using workarounds to get over the limitations of DX11 your inevitably going to reduce the quality of the work.

I have never understood people who seemingly defend DX11 with that sort of reasoning.
If your going to spend your life working LOD's and merging assets so to pull less draw calls that is time spent reducing the quality of your work and not developing the game.

Most serious designers who work to the best of their ability are later frustrated by very tight Draw Call limitations, their creation may only be pulling 200K calls per second (which is nothing) but then the coders turn round and say "thats way too many, i only have 2m Calls to work with and your asset is tipping what i have left way over the edge, reduce it by 150K calls"

So the designer needs to go back to undo literally everything and reduce his asset down to a low quality static object...

Cite:

So wouldn't it be nice if the coder had 15m calls to work with instead of 2m... when that happens there is far less need to "optimise"
Its nothing to do with being lazy.

nashathedog · 22 Mar 2016 at 20:13

melmac said:
back on thread, I wonder what people are expecting from Dx12 and Vulkan? I think people are expecting massive frame rate increases from reading some of the forums.

That's not going to be the case. Dx12 and Vulkan's main advantage will be freeing up the CPU. It will allow for more diversity. For example, if you have 20 boxes on screen, instead of each box been exactly the same, Dx12 will allow you to have 20 different boxes. Ignore the numbers, because I am don't know the limitations. But's that's basically what a low level api will bring.

AC Unity is an example of a game that would benefit from it, It will allow for more variety in crowds of people, more complex AI too.

maonayze · 22 Mar 2016 at 21:11

nashathedog said:
There's nothing good about that, PC games should run on all PC's that meet the basic requirements regardless of what brand card you have.

+1

Developers should be looking to get the best out of both manufacturers cards regardless of who is sponsoring it and without crippling the performance of competitors cards. I really do not know why this has been allowed to happen (If indeed that is what is happening) as I think it is criminal to do this and behavior like that will lead to PC gaming doing a nose dive or someone being nailed to the wall for anti-competitive behaviour.

Lambchop if the bigger % of Market share was on the side of AMD then you wouldn't be saying what you said.

If this Exclusivity happens between brands of PC GFX cards then we may as well give up now! I hate exclusivity on the consoles and it would be worse if it were to happen in the PC world.

United we stand, Divided we fall

:mad:

D.P. · 22 Mar 2016 at 21:37

humbug said:
The thing with using workarounds to get over the limitations of DX11 your inevitably going to reduce the quality of the work.

it not inevitable. If there is no visible difference then there is no issue. If an object is not visible then culling it away at the CPU level so it doesn't require an API call is not impacting the final image in the slightest, nor is reject geometry when it reaches sub-pixel level.

I have never understood people who seemingly defend DX11 with that sort of reasoning.

I also wouldn't understand someone defending DX11 in that way, thankfully I have never seen anyone do that, got any links?

If your going to spend your life working LOD's and merging assets so to pull less draw calls that is time spent reducing the quality of your work and not developing the game.

Again, it doesn't necessarily reduce the quality of the work. LOD is just as critical under DX12 as DX11, otherwise there will be huge step backwards in graphics quality.

Most serious designers who work to the best of their ability are later frustrated by very tight Draw Call limitations, their creation may only be pulling 200K calls per second (which is nothing) but then the coders turn round and say "thats way too many, i only have 2m Calls to work with and your asset is tipping what i have left way over the edge, reduce it by 150K calls"

A good designer and a good developer can work together to get a good compromise and reduce draw calls to manage levels. Again, this will be no different under DX12 - designers will still have to cut draw calls down otherwise graphics quality will greatly deteriorate. What we want out of DX12 is improved graphics, so that means culling and LOD techniques will be just as important but will allow further scene complexity.Otherwise you end up with the same or worse IQ and developers that can lazily let designers have over the top model complexity.

So the designer needs to go back to undo literally everything and reduce his asset down to a low quality static object...

Or what happens in reality the designer uses techniques to cull objects that aren't visible and have dynamic LOD algorithms so the quality of the asset is proportional to the viewing distance.

Cite:

So wouldn't it be nice if the coder had 15m calls to work with instead of 2m... when that happens there is far less need to "optimise"
Its nothing to do with being lazy.

Sure, but it would also be nice to have 10x the fragment shader power, 10x the tessellation performance, and 10x the geometry throughput.

I think you are completely missing the point here. Remedy developers have stated that most games aren't actually API and/or CPU limited if properly designed, so reducing APi overhead doesn't gain you much performance under most situations.

Having a multi-threaded command processor was a very important addition, increasing draw call limits will be beneficial and drive graphics forwards, but this fundamentally doesn't change game engine design or GPU bottlenecks. With DX12 there is a big penalty for gaining a larger draw call limiit.

GoogalyMoogaly · 22 Mar 2016 at 21:42

humbug said:
The thing with using workarounds to get over the limitations of DX11 your inevitably going to reduce the quality of the work.

I have never understood people who seemingly defend DX11 with that sort of reasoning.
If your going to spend your life working LOD's and merging assets so to pull less draw calls that is time spent reducing the quality of your work and not developing the game.

Most serious designers who work to the best of their ability are later frustrated by very tight Draw Call limitations, their creation may only be pulling 200K calls per second (which is nothing) but then the coders turn round and say "thats way too many, i only have 2m Calls to work with and your asset is tipping what i have left way over the edge, reduce it by 150K calls"

So the designer needs to go back to undo literally everything and reduce his asset down to a low quality static object...

Cite:

So wouldn't it be nice if the coder had 15m calls to work with instead of 2m... when that happens there is far less need to "optimise"
Its nothing to do with being lazy.

I don't want DX12/Vulkan to be an excuse for devs to stop optimising games! It'd be nice if the limitations were less restrictive allowing for more eye-candy as an option. But it'd also be nice to optimise things as much as possible so that things are runnable on lower spec machines than some games need these days.

maonayze said:
+1

Developers should be looking to get the best out of both manufacturers cards regardless of who is sponsoring it and without crippling the performance of competitors cards. I really do not know why this has been allowed to happen (If indeed that is what is happening) as I think it is criminal to do this and behavior like that will lead to PC gaming doing a nose dive or someone being nailed to the wall for anti-competitive behaviour.

Lambchop if the bigger % of Market share was on the side of AMD then you wouldn't be saying what you said.

If this Exclusivity happens between brands of PC GFX cards then we may as well give up now! I hate exclusivity on the consoles and it would be worse if it were to happen in the PC world.

United we stand, Divided we fall

I'm not so worried about exclusives between consoles as I own neither, but it is annoying when games don't get released on the PC (like the UFC games). Having to then worry about if it's an AMD game or an Nvidia one would be more annoying. Currently I'm running card from both camps so it's not an issue, but I can't guarantee that'll be the case going forward. Also, I'm sure I'm in the minority by owning both.

However with lower level APIs it seems only natural that this is the way it'll go at each side will have strengths and weaknesses. Do we really want AMD and Nvidia to have to release cards that are similar just so they work better with low level APIs? I'd prefer both sides keep innovating, but this is the issue we'll face.

Mauller · 22 Mar 2016 at 23:45

D.P. said:
So if the developer does everything right they match DX11 performance on the GPU, great. And as I have repeatedly pointed out before, the fact that DX and OGl are heavily command limited simply means that developers created a smart work around like instancing to reduce call overhead. So with DX12 developers can be lazy and do less command batching, if they put in the effort of making their own DX driver layer, and the end result if everything foes OK is similar DX11 GPU performance but improved CPU. Only th best developers will be able to exploit the higher API call limits and overcome any weaknesses of instancing and LOD techniques etc.

Instancing heavily limits the amount of freedom the developers have with their game. Yes you can fill a field with grass but you are limited with what you can do with it. You can fill a city with people but, oh hey, i see the same people five times over.

Freeing up draw call limitations allows more freedom of expression and allows new game types to appear. Not being heavily limited will allow more dynamic and interactive games with many unique objects on screen.

The main reason we see so many linear shooters, Devoid open world games and devoid city spaces is due to the draw call imitations. And as with Assassins creed, when they do fill a city with many people, you just see the same people multiple times over in a scene with little variation.

Batching up commands is just a workaround for limitations with the API, if you want a more varied scene you can't batch up too many calls. And many current DX12 games simply don't have a great deal of variety in them still since they are being limited by needing a DX11 fallback. So in current games the only times we will see the biggest performance improvements are in Draw call limited scenarios.

humbug · 23 Mar 2016 at 00:16

D.P. said:
it not inevitable. If there is no visible difference then there is no issue. If an object is not visible then culling it away at the CPU level so it doesn't require an API call is not impacting the final image in the slightest, nor is reject geometry when it reaches sub-pixel level.

I also wouldn't understand someone defending DX11 in that way, thankfully I have never seen anyone do that, got any links?

Again, it doesn't necessarily reduce the quality of the work. LOD is just as critical under DX12 as DX11, otherwise there will be huge step backwards in graphics quality.

A good designer and a good developer can work together to get a good compromise and reduce draw calls to manage levels. Again, this will be no different under DX12 - designers will still have to cut draw calls down otherwise graphics quality will greatly deteriorate. What we want out of DX12 is improved graphics, so that means culling and LOD techniques will be just as important but will allow further scene complexity.Otherwise you end up with the same or worse IQ and developers that can lazily let designers have over the top model complexity.

Or what happens in reality the designer uses techniques to cull objects that aren't visible and have dynamic LOD algorithms so the quality of the asset is proportional to the viewing distance.

Sure, but it would also be nice to have 10x the fragment shader power, 10x the tessellation performance, and 10x the geometry throughput.

I think you are completely missing the point here. Remedy developers have stated that most games aren't actually API and/or CPU limited if properly designed, so reducing APi overhead doesn't gain you much performance under most situations.

Having a multi-threaded command processor was a very important addition, increasing draw call limits will be beneficial and drive graphics forwards, but this fundamentally doesn't change game engine design or GPU bottlenecks. With DX12 there is a big penalty for gaining a larger draw call limiit.

Can you link me to the full article of what Remedy developers said? I have a feeling you're misunderstanding what Remedy developers are saying.

I very much doubt what they are saying is "reduction in the API overhead has no benefit" i think you are quoting very much out of context.

I assume they work with consoles and if so they will know the benefits of reducing the API overheads, they would not get anything like the performance out of that low end hardware if they did not have reduced overhead API's.
The fact is its a huge benefit to them.

On LOD's, it has nothing to do with sub pixel geometry.
The whole point of LOD's is to reduce polygons by replacing high polygon geometry with low polygon geometry.
In other words at a set distance the object is reduced in quality, everyone can see LOD in action, its that thing where and object appears blocky and rubbish at 100m and gets progressively better in quality the closer you get to it.
Its not something developers want to do as it takes extra time to do it. and as anyone can see it reduces the quality of the object.

A good designer and a good developer can work together to get a good compromise and reduce draw calls to manage levels. Again, this will be no different under DX12 - designers will still have to cut draw calls down otherwise graphics quality will greatly deteriorate

They do this as par for course.
I'm really struggling to understand what you are trying to say with all of this and why, and why to back up whatever argument it is you are trying to make by using a few lines quoted (i do not doubt out of context) from a developer.

Do you agree that more is better than less or not?
My argument is very very simple, if you have more available to you then you have more to work with and you can do more.
I can plant more grass more densely packed with greater verity and run out of draw calls in DX11 long long long before i run out of tessellation throughput.

This is about as much as i can pack in.... This is how something like this should look but it cannot be done on a larger scale with serious compromises in quality, then it no longer looks like that..... it....

....looks like this.
A pretty bleak and dull sceptical.

If you don't agree with that i would really like to see a good reason for it it, right now it looks like you're saying a whole lot of nothing....

D.P. · 23 Mar 2016 at 15:08

humbug said:
Can you link me to the full article of what Remedy developers said? I have a feeling you're misunderstanding what Remedy developers are saying.

http://wili.cc/research/northlight_dx12/GDC16_Timonen_Northlight_DX12.pptx

But are you really API overhead bound?
Instancing, LODding, good culling: You’re not swamping the driver with draws.

If think you are the one that is understanding, the developers are quite clear.
In their experience they are not bound by the API overhead in DX11.

I very much doubt what they are saying is "reduction in the API overhead has no benefit" i think you are quoting very much out of context.

They didn't but no one else in this thread has said that either. Why have you put that in quotation marks, who are you quoting?

I assume they work with consoles and if so they will know the benefits of reducing the API overheads, they would not get anything like the performance out of that low end hardware if they did not have reduced overhead API's.

Not necessarily true at all. The consoles are largely limited by geometry and fragment shaders.

The fact is its a huge benefit to them.

Where does it say they, can I have a link?

On LOD's, it has nothing to do with sub pixel geometry.

No one said it did, you are the only person that linked those concepts.

The whole point of LOD's is to reduce polygons by replacing high polygon geometry with low polygon geometry.

yep, because GPU's don't have infinite geometry processing capabilities. The amount of draw calls required is independent to the number of vertices.

In other words at a set distance the object is reduced in quality, everyone can see LOD in action, its that thing where and object appears blocky and rubbish at 100m and gets progressively better in quality the closer you get to it.

I the dynamic LOD is particularly visible then it is badly implemented. DX12 doesn't change a thing here.

Its not something developers want to do as it takes extra time to do it. and as anyone can see it reduces the quality of the object.

Culling and dynamic LOD are some of the mainstays of games programming. If they didn't want to do it then they shouldn't be in that business. And no, it doesn't have to necessarily reduce quality. Culling an object that isn't visible cuts draw calls and has absolutely no impact on the rendered i age because guess what, it wasn't visible to begin with.

They do this as par for course.
I'm really struggling to understand what you are trying to say with all of this and why, and why to back up whatever argument it is you are trying to make by using a few lines quoted (i do not doubt out of context) from a developer.

I'm really lost at what the heck you are going on about, you seem to be adding quotes to text that no one in this thread has said?

Do you agree that more is better than less or not?
My argument is very very simple, if you have more available to you then you have more to work with and you can do more.

yes, and I said exactly that, is English not your native language or something? I replied with the word "sure", that in English is a positive affirmation, like saying "yes" or "I agree".

I can plant more grass more densely packed with greater verity and run out of draw calls in DX11 long long long before i run out of tessellation throughput.

You can also greatly increase the density of grass without increasing draw call count in the slightest.
The greater variety is the first thing that has actually made any sense from you, and that is exactly where DX12 will bring big benefits. The issue is that getting those benefits comes with a lot of costs that developers aren't interested. Developers don;t want to write an entire DX11 driver stack

Mauller · 23 Mar 2016 at 19:24

D.P. said:
The issue is that getting those benefits comes with a lot of costs that developers aren't interested. Developers don;t want to write an entire DX11 driver stack

The majority of Developers who work with consoles already do this. They have to write their own GPU management code and will have some experience doing it. And they are not writing an entire driver stack, just GPU state managment and render pipeline code.

The DX12 API to hardware ISA call translation is still performed by the IHV within the DX12 driver etc.

humbug · 23 Mar 2016 at 23:41

D.P. said:
http://wili.cc/research/northlight_dx12/GDC16_Timonen_Northlight_DX12.pptx

If think you are the one that is understanding, the developers are quite clear.
In their experience they are not bound by the API overhead in DX11.

They didn't but no one else in this thread has said that either. Why have you put that in quotation marks, who are you quoting?

Not necessarily true at all. The consoles are largely limited by geometry and fragment shaders.

Where does it say they, can I have a link?

No one said it did, you are the only person that linked those concepts.

yep, because GPU's don't have infinite geometry processing capabilities. The amount of draw calls required is independent to the number of vertices.

I the dynamic LOD is particularly visible then it is badly implemented. DX12 doesn't change a thing here.

Culling and dynamic LOD are some of the mainstays of games programming. If they didn't want to do it then they shouldn't be in that business. And no, it doesn't have to necessarily reduce quality. Culling an object that isn't visible cuts draw calls and has absolutely no impact on the rendered i age because guess what, it wasn't visible to begin with.

I'm really lost at what the heck you are going on about, you seem to be adding quotes to text that no one in this thread has said?

yes, and I said exactly that, is English not your native language or something? I replied with the word "sure", that in English is a positive affirmation, like saying "yes" or "I agree".

You can also greatly increase the density of grass without increasing draw call count in the slightest.
The greater variety is the first thing that has actually made any sense from you, and that is exactly where DX12 will bring big benefits. The issue is that getting those benefits comes with a lot of costs that developers aren't interested. Developers don;t want to write an entire DX11 driver stack

Got their power point, thanks. a waste of time me asking for context, there isn't any in it, its literally just a few lines of bullet points with no accompanying context.
It looks like a GDC speech slide collection, my guess is the context was in the actual speech, without that we can never know what they meant by:

CPU perf: Easy to outperform DX11

But are you really API overhead bound?

Instancing, LODding, good culling: You’re not swamping the driver with draws.

To me those two lines represent separate parts of a speech, none of it is entirely clear, the first point is a question, i presume the answer to it came with the accompanied speech, the second part which i assume your referring to with:

So if the developer does everything right they match DX11 performance on the GPU, great. And as I have repeatedly pointed out before, the fact that DX and OGl are heavily command limited simply means that developers created a smart work around like instancing to reduce call overhead. So with DX12 developers can be lazy and do less command batching, if they put in the effort of making their own DX driver layer, and the end result if everything foes OK is similar DX11 GPU performance but improved CPU. Only th best developers will be able to exploit the higher API call limits and overcome any weaknesses of instancing and LOD techniques etc.

You would have to make some assumptions about those bullet points in order to use them to support your arguments.
They are not assumptions that i recognise.
I do recognise that DX12 overcomes some extremely limiting factors of DX11, i also recognise that workarounds also overcome some of those same limitations, but not to anything like the extent DX12 flatly does, whats more those DX11 workarounds are too often at the cost work hours that would be better spend developing the actual game and its overall quality.
I even recognise that DX12 has its own set of issues, nothing is ever perfect but one thing is absolute, DX12 unlocks far more of any hardware's potential than DX11 can with any current level of modification.

I will leave it at that, no need to repeat what I have already said earlier.

Calin Banc · 24 Mar 2016 at 13:19

A nice talk from Dan Baker, if anyone is interested.

Competitor rules

DirectX 12 Requires Different Optimization on Nvidia and AMD Cards, Lots of Details Shared

Mobster