It is not entirely down to poor code. A large part of it is driver implementation: the developer has no idea what is going on inside the black box, and the AMD black box and the Nvidia black box act differently because they implement the abstraction differently. Due to "trade secrets," Nvidia and AMD will never reveal to developers the things that would help them understand what the box is doing. The DirectX API has no spec for the box, only specs for what the hardware should be capable of and what calls the driver should accept and return to an application. Hence the majority of the mess we have: there is no grand, unified box that acts the same way for all developers.
Yes, some code might be completely broken on one manufacturer's driver but work flawlessly on another's, simply because that black box happens to like it. Hence the need for a complete rework of the shader code, which then gets added into the driver.
The difference with the new APIs is that validation is all done by the dev. With validation points added to the driver abstraction, the dev can tell what is happening and change their code accordingly. That cannot be done with DX11 and below because of the black box. With DX12 and Vulkan, the developer gets direct feedback on what is happening and can make changes accordingly.
It can't fix bad coding, but devs now have the feedback needed to see what is going wrong.
Also, with current SLI and CrossFire, it is impossible for a dev to implement multi-GPU correctly without working with AMD or Nvidia. It is not part of the DX spec, and the driver and application both need to do certain things for it to work right, which involves a lot of work between the driver team and the developer.
With low-abstraction APIs, the renderer has to do all the work of splitting the scene up itself, using whatever technique the developer chooses. To put it as simply as possible, that will require far less ****ing about just to get a system like AFR (alternate frame rendering) working, but it will still require work on the developer's end, because the driver is just dumb.
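To make the AFR idea concrete, here is a minimal sketch of the scheduling decision the renderer itself would have to make once the driver no longer does it. The function names are made up for illustration and are not part of DX12 or Vulkan; a real renderer would also have to handle resource copies and synchronisation between cards, which this leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: under plain alternate frame rendering, whole frames
// are handed to the GPUs in round-robin order.
std::size_t afrGpuForFrame(std::size_t frameIndex, std::size_t gpuCount) {
    assert(gpuCount > 0);  // at least one GPU is required
    return frameIndex % gpuCount;
}

// Hypothetical helper: the GPU index chosen for each of the first
// frameCount frames.
std::vector<std::size_t> afrSchedule(std::size_t frameCount, std::size_t gpuCount) {
    std::vector<std::size_t> schedule;
    schedule.reserve(frameCount);
    for (std::size_t frame = 0; frame < frameCount; ++frame) {
        schedule.push_back(afrGpuForFrame(frame, gpuCount));
    }
    return schedule;
}
```

With two cards, frames 0, 1, 2, 3 go to GPU 0, 1, 0, 1. The point is only that this choice now lives in application code rather than inside the driver.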
The best technique for multi-GPU setups with mismatched cards would be "frame sub-division," where geometry, lighting, etc. are separated into individual jobs and sent to different places, like how the Hydra system worked. It means the work can be divided based on complexity, rather than each card just rendering a fixed chunk of the screen. With fixed chunks, even a small chunk could suddenly have something in it that is too complex for the hardware it landed on, causing overall performance to tank.
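As a rough illustration of dividing by complexity rather than by screen area, the sketch below hands each job to whichever card would finish it soonest, taking the biggest jobs first. This is a simple greedy heuristic chosen for the example, not how Hydra or any shipping engine actually does it, and the `Job`, `Gpu`, and `assignJobs` names plus the cost and speed numbers are all assumptions; a real renderer would have to estimate or measure them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Job { double cost; };   // assumed: estimated work units for one job
struct Gpu { double speed; };  // assumed: work units per millisecond

// Returns, for each job, the index of the GPU it was assigned to.
// Greedy heuristic: most expensive job first, each one placed on the GPU
// that would finish it earliest given what is already queued there.
std::vector<std::size_t> assignJobs(const std::vector<Job>& jobs,
                                    const std::vector<Gpu>& gpus) {
    assert(!gpus.empty());  // at least one GPU is required

    // Visit jobs in order of decreasing cost.
    std::vector<std::size_t> order(jobs.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return jobs[a].cost > jobs[b].cost;
                     });

    std::vector<double> busyUntil(gpus.size(), 0.0);  // queued time per GPU, ms
    std::vector<std::size_t> owner(jobs.size(), 0);

    for (std::size_t j : order) {
        std::size_t best = 0;
        double bestFinish = busyUntil[0] + jobs[j].cost / gpus[0].speed;
        for (std::size_t g = 1; g < gpus.size(); ++g) {
            double finish = busyUntil[g] + jobs[j].cost / gpus[g].speed;
            if (finish < bestFinish) {  // ties stay on the lower-index GPU
                best = g;
                bestFinish = finish;
            }
        }
        busyUntil[best] = bestFinish;
        owner[j] = best;
    }
    return owner;
}
```

Because the slower card's lower speed is part of the finish-time estimate, it naturally ends up with fewer or cheaper jobs instead of a fixed share of the screen.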
But like everything, it requires work, and stuff like this is very new.