
Interesting post on driver development Dx12/Vulkan/Mantle

Hi guys, I came across this post on the gamedev.net forums, and found it very interesting.
The post is from a thread discussing the virtues (or otherwise) of DX12, Vulkan and Mantle.

This particular post gives some insight into the development of DX/OpenGL drivers in the past and the difficulties encountered, and what changes may be possible with DX12 and Vulkan.

I can't post a link to the original thread because it contains a few swearies, so here's a censored copy/paste below:

Many years ago, I briefly worked at NVIDIA on the DirectX driver team (internship). This was the Vista era, when a lot of people were busy with the DX10 transition, the hardware transition, and the OS/driver model transition. My job was to take games that were broken on Vista, pick them apart from the driver level, and figure out why they were broken. While I am not at all an expert on driver matters (and actually sucked at my job, to be honest), I did learn a lot about what games look like from the perspective of a driver and kernel.

The first lesson is: Nearly every game ships broken. We're talking major AAA titles from vendors who are everyday names in the industry. In some cases, we're talking about blatant violations of API rules - one D3D9 game never even called BeginScene/EndScene. Some are mistakes or oversights - one shipped bad shaders that heavily impacted performance on NV drivers. These things were day-to-day occurrences that went into a bug tracker. Then somebody would go in, find out what the game screwed up, and patch the driver to deal with it. There are lots of optional patches already in the driver that are simply toggled on or off as per-game settings, and then hacks that are more specific to games - up to and including total replacement of the shipping shaders with custom versions by the driver team. Ever wondered why nearly every major game release is accompanied by a matching driver release from AMD and/or NVIDIA? There you go.

The second lesson: The driver is gigantic. Think 1-2 million lines of code dealing with the hardware abstraction layers, plus another million per API supported. The backing function for Clear in D3D 9 was close to a thousand lines of just logic dealing with how exactly to respond to the command. It'd then call out to the correct function to actually modify the buffer in question. The level of complexity internally is enormous and winding, and even inside the driver code it can be tricky to work out how exactly you get to the fast-path behaviors. Additionally the APIs don't do a great job of matching the hardware, which means that even in the best cases the driver is covering up for a LOT of things you don't know about. There are many, many shadow operations and shadow copies of things down there.

The third lesson: It's unthreadable. The IHVs sat down starting from maybe circa 2005 and built tons of multithreading into the driver internally. They had some of the best kernel/driver engineers in the world to do it, and literally thousands of full-blown real-world test cases. They squeezed that system dry, and within the existing drivers and APIs it is impossible to get more than trivial gains out of any application-side multithreading. If Futuremark can only get 5% in a trivial test case, the rest of us have no chance.

The fourth lesson: Multi GPU (SLI/CrossfireX) is ******* complicated. You cannot begin to conceive of the number of failure cases that are involved until you see them in person. I suspect that more than half of the total software effort within the IHVs is dedicated strictly to making multi-GPU setups work with existing games. (And I don't even know what the hardware side looks like.) If you've ever tried to independently build an app that uses multi GPU - especially if, god help you, you tried to do it in OpenGL - you may have discovered this insane rabbit hole. There is ONE fast path, and it's the narrowest path of all. Take lessons 2 and 3, and magnify them enormously.

Deep breath.

Ultimately, the new APIs are designed to cure all four of these problems.
* Why are games broken? Because the APIs are complex, and validation varies from decent (D3D 11) to poor (D3D 9) to catastrophic (OpenGL). There are lots of ways to hit slow paths without knowing anything has gone awry, and often the driver writers already know what mistakes you're going to make and are dynamically patching in workarounds for the common cases.
* Maintaining the drivers with the current wide surface area is tricky. Although AMD and NV have the resources to do it, the smaller IHVs (Intel, PowerVR, Qualcomm, etc) simply cannot keep up with the necessary investment. More importantly, explaining to devs the correct way to write their render pipelines has become borderline impossible. There are too many failure cases. It's been understood for quite a few years now that you cannot max out the performance of any given GPU without having someone from NVIDIA or AMD physically grab your game source code, load it on a dev driver, and do a hands-on analysis. These are the vanishingly few people who have actually seen the source to a game, the driver it's running on, the Windows kernel it's running on, and the full specs for the hardware. Nobody else has that kind of access or engineering ability.
* Threading is just a catastrophe and is being rethought from the ground up (see the sketch after this list). This requires a lot of the abstractions to be stripped away or retooled, because the old ones required too much driver intervention to be properly threadable in the first place.
* Multi-GPU is becoming explicit. For the last ten years, it has been AMD and NV's goal to make multi-GPU setups completely transparent to everybody, and it's become clear that for some subset of developers, this is just making our jobs harder. The driver has to apply imperfect heuristics to guess what the game is doing, and the game in turn has to do peculiar things in order to trigger the right heuristics. Again, for the big games somebody sits down and matches the two manually.

Part of the goal is simply to stop hiding what's actually going on in the software from game programmers. Debugging drivers has never been possible for us, which meant a lot of poking and prodding and experimenting to figure out exactly what it is that is making the render pipeline of a game slow. The IHVs certainly weren't willing to disclose these things publicly either, as they were considered critical to competitive advantage. (Sure they are guys. Sure they are.) So the game is guessing what the driver is doing, the driver is guessing what the game is doing, and the whole mess could be avoided if the drivers just wouldn't work so hard trying to protect us.

So why didn't we do this years ago? Well, there are a lot of politics involved (cough Longs Peak) and some hardware aspects but ultimately what it comes down to is the new models are hard to code for. Microsoft and ARB never wanted to subject us to manually compiling shaders against the correct render states, setting the whole thing invariant, configuring heaps and tables, etc. Segfaulting a GPU isn't a fun experience. You can't trap that in a (user space) debugger. So ... the subtext that a lot of people aren't calling out explicitly is that this round of new APIs has been done in cooperation with the big engines. The Mantle spec is effectively written by Johan Andersson at DICE, and the Khronos Vulkan spec basically pulls Aras P at Unity, Niklas S at Epic, and a couple guys at Valve into the fold.

Three out of those four just made their engines public and free with minimal backend financial obligation.

Now there's nothing wrong with any of that, obviously, and I don't think it's even the big motivating raison d'etre of the new APIs. But there's a very real message that if these APIs are too challenging to work with directly, well the guys who designed the API also happen to run very full featured engines requiring no financial commitments. So that's served to considerably smooth the politics involved in rolling these difficult to work with APIs out to the market.

The last piece of the puzzle is that we ran out of new user-facing hardware features many years ago. Ignoring raw speed, what exactly is the user-visible or dev-visible difference between a GTX 480 and a GTX 980? A few limitations have been lifted (notably in compute), but essentially they're the same thing. MS, for all practical purposes, concluded that DX was a mature, stable technology that required only minor work and mostly disbanded the teams involved. Many of the revisions to GL have been little more than API repairs. (A GTX 480 runs full-featured OpenGL 4.5, by the way.) So the reason we're seeing new APIs at all stems fundamentally from Andersson hassling the IHVs until AMD woke up, smelled competitive advantage, and started paying attention. There was essentially a three-year lag from when we got the hardware to the point that compute could be directly integrated into the core of a render pipeline, which is considered normal today but was bluntly revolutionary at production scale in 2012. It's a lot of small things adding up to a sea change, with key people pushing on the right people for the right things.


Phew. I'm no longer sure what the point of that rant was, but hopefully it's somehow productive that I wrote it. Ultimately the new APIs are the right step, and they're retroactively useful to old hardware which is great. They will be harder to code. How much harder? Well, that remains to be seen. Personally, my take is that MS and ARB always had the wrong idea. Their idea was to produce a nice, pretty looking front end and deal with all the awful stuff quietly in the background. Yeah it's easy to code against, but it was always a bitch and a half to debug or tune. Nobody ever took that side of the equation into account. What has finally been made clear is that it's okay to have difficult to code APIs, if the end result just works. And that's been my experience so far in retooling: it's a pain in the ass, requires widespread revisions to engine code, forces you to revisit a lot of assumptions, and generally requires a lot of infrastructure before anything works. But once it's up and running, there's no surprises. It works smoothly, you're always on the fast path, anything that IS slow is in your OWN code which can be analyzed by common tools. It's worth it.
If anybody wants to see the original post in context, then search on gamedev.net for a thread titled "What are your opinions on DX12/Vulkan/Mantle?".
Post #6
 
A snippet from another post in the same gamedev thread:

Apparently the Mantle spec documents will be made public very soon, which will serve as a draft/preview of the Vulkan docs that will come later.

The poster supplies no links or source, but it would be good news if true. :)
 
Disagree a bit on the threading side - there is a lot you simply can't thread, no ifs, buts or clever programming (and new APIs etc. will have limited impact on that), regardless of what some people might imply - but there are other areas where clever programming can see significant gains from threading. You also need to look beyond raw performance - threading might only gain 5% in some areas, but at the same time allow for much smoother rendering.
 
I saw this thread this morning but it looked heavy reading, so I skipped it. I'm glad I came back, as there's a lot of information there I was completely unaware of. A very good and informative post. Well done Cambofrog :)
 

Thanks Gregster, but all credit to the OP on gamedev, of course. I was worried that the wall of text could be a bit off-putting, but it was worth the effort, I think. :)
 

Agree and disagree: threading isn't so much the problem, as you can allocate tasks to different threads.
The problem is that it does require some programming effort; some can't do it well, if at all, and others are just lazy.

You only need to look in the CPU room, with contributors there trying to make a point about how this i3 is faster than other CPUs in this or that newly released game... and so on.
Take a few minutes to look at what's actually going on and you soon realise the engine is only loading up one or two threads.

It's utter incompetence, laziness or both.

So unless the game/engine developer puts the work in, it's basically junk.

Vulkan/Mantle/DX12 with good threading at the API level removes the reliance on the developer having to get it right.
 

Threading the graphics stack does not give the performance boost some people think with DX11 etc., since everything is still limited by going through the black box on one thread.

The "low abstraction" apis don't magically make things threaded, the drivers are just dumb abstraction layers that convert api calls, to code that the hardware understands and vice versa. The threading side comes from the fact that the api calls are not limited to a single thread since there is no "black box" controlling everything.

Now the engine does all that work and can throw batches at the API using any thread it likes, or any number of threads with some synchronisation between them in the engine.
 
From the gamedev post I quoted, he seems to imply there will be more opportunities for multi-threading within the new API/drivers and for devs using them.

Multi-threading takes more discipline/synchronisation of course, but that isn't specific to game development.

I should imagine the new APIs will present opportunities for optimisation for API writers and game devs, in addition to multi-threading.
 
I very much doubt that. It will give developers the tools to multi-thread, but it won't magically multi-thread all by itself.

Granted, it's not going to do it all by itself, but removing layer upon layer of abstraction will reduce the skill level and time required.

Another thing: all these new APIs have perfect timing, as a lot of very capable and powerful game development tools are becoming accessible to the masses.

No longer the expensive secret sauce in the hands of a few of the establishment.

Now anyone with some understanding of it can give this game development thing a go.
Pretty soon we are going to see something of a mini boom of have-a-go indie developers; some of them might be quite good, and with modern, sleek APIs being more forgiving of mistakes and lack of knowledge, their games should run pretty well too.

With a bit of luck those engines won't be too long coming.
 
Thanks for that, as a software engineer (not games though) I can certainly believe and understand where some of those problems would come from :)
 
With a bit of luck those engines won't be too long coming.

Yeah, the Oxide demo of "Ashes of the Singularity" is already a brilliant demonstration. The Source 2 demo of a Vulkan version of Dota 2 running on an Intel IGP was also very good.

Not sure if the Valve Portal VR demo was running on a Vulkan version of Source 2, but it did look fantastic and smooth.
 
“Although AMD and NV have the resources to do it, the smaller IHVs (Intel, PowerVR, Qualcomm, etc) simply cannot keep up with the necessary investment.”
What?! Don't Intel and Qualcomm dwarf AMD and NV? And PowerVR have a very large driver team as well, with just as much engineering resource for GPUs as AMD or NV. You can hardly call them small companies lacking in resources. PowerVR even managed to get their own low-level API out and attract tons of game support for it.
 
Still a good thing that they decided to use this model on the PC now; it was needed for years. Many a PC port (GTA4) and many a PC game (Crysis) could have worked far better with APIs like these.

Although Glide was completely proprietary and implemented directly into the hardware (the API was literally hard-coded into the graphics card), it was so smooth and the performance was fantastic - good old Voodoo cards and playing games in Glide mode. Just a shame that 3dfx shot themselves in the foot with the integrated method; it made it hard for them to improve the hardware without breaking older games. They really needed to use the driver abstraction method as with Mantle, Vulkan and DX12.

Some videos of people playing games in Glide still lurk on YouTube; it's like night and day compared to other graphics API modes.
 
Erm, Glide went open source and ended up working on NVIDIA and AMD cards, so I don't see how it was hard-coded into the card itself.

They used a "code wrapper" to translate the Glide calls into DirectX - look up nGlide. But the API itself was implemented directly into hardware for all of 3dfx's cards.
 
I think you're just getting confused between the old style of fixed-function cards and newer, fully programmable GPGPUs. The API was written to work with 3dfx GPUs using the fixed functions that were available on those cards.
 