Results look promising for the future of gaming. Star Swarm is not a benchmark, but it shows how these APIs can handle very high numbers of individual draw calls, and it is close to what a game could be. There is a lot of AI running in the demo, far more than people give it credit for, all while every ship, gun and bullet is individually modelled in real time.
Many draw calls are not inefficient; they are required if you want to draw many unique objects. Some may say it's inefficient, but that is only because they are stuck in the mindset of older, managed DirectX, where draw call overhead is a problem.
If anything, it is a better representation than a benchmark of the CPU reductions that will be seen in games, since a benchmark is scripted and heavily weighted towards showing GPU performance. It will still be nice to have one for a raw GPU performance comparison, though.
We already see the big CPU benefits of Mantle in BF4 multiplayer etc., especially in map areas where draw call issues crop up.
As for the performance of the 980, it's simply the case that it was still heavily bottlenecked.
One thing about GPU load monitors: they show the load on the driver more than the actual GPU load. When the GPU load reads 98-100%, it means the DirectX pipeline has become fully loaded, while the card itself could still have many shaders going unutilised.
This is where CPU bottlenecking occurs. Essentially, at the quality setting used in Star Swarm, the 980 was still very CPU-bound under DirectX 11.
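To picture how a card ends up CPU-bound like this, here is a rough toy model (my own sketch, not how Star Swarm or any driver actually works): the CPU pays a fixed cost per draw call to submit work, that cost can be split across some number of submission threads, the GPU needs a fixed time to render the frame, and whichever side is slower sets the frame time. The per-draw costs, thread counts, draw count and GPU time are all made-up numbers for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass
class ApiModel:
    name: str
    us_per_draw: float  # hypothetical CPU cost of submitting one draw call, in microseconds
    threads: int        # hypothetical number of CPU threads submitting draw calls in parallel


def frame_time_ms(api: ApiModel, draw_calls: int, gpu_ms: float) -> tuple[float, str]:
    """Toy model: CPU submission and GPU rendering overlap, so the slower
    of the two sets the frame time. Returns (frame time in ms, limiting side)."""
    # Assumes submission work splits perfectly across threads (real engines won't).
    cpu_ms = draw_calls * api.us_per_draw / 1000.0 / api.threads
    if cpu_ms > gpu_ms:
        return cpu_ms, "CPU-bound"
    return gpu_ms, "GPU-bound"


if __name__ == "__main__":
    draws = 50_000  # hypothetical draw calls per frame
    gpu_ms = 16.0   # hypothetical GPU render time per frame, in ms
    for api in (ApiModel("DX11-like", us_per_draw=2.0, threads=1),
                ApiModel("DX12/Mantle-like", us_per_draw=0.5, threads=4)):
        ms, limit = frame_time_ms(api, draws, gpu_ms)
        print(f"{api.name}: {ms:.2f} ms/frame, {limit}")
```

With those hypothetical numbers, the DX11-like case spends 100 ms submitting draws for 16 ms of GPU work, so the card sits waiting for most of the frame. Cut the per-draw cost and spread submission over a few threads, and the same scene becomes GPU-bound instead.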
Another reason for the large gap is that the Nvidia cards tend to boost more than 200 MHz above their stated boost clock, while the unknown 290X was most likely only boosting to 1050 MHz, if it boosted at all.
Essentially, with AMD's poor driver performance, they are also trying to show the disparity between giving developers the tools to perform the optimisations themselves and needing to optimise in the drivers on a situation-by-situation basis. The latter only increases driver size and complexity, and, as shown with Nvidia, it took a lot of work to get the performance they did. Even then, the 980 was still very bottlenecked.
Other reasons for the dramatic increase in performance under DirectX 12 on the 980 are the same as with Mantle: a monolithic pipeline and the ability to parallelise more work in it, so the card does more per cycle. The pipeline is structured and predictable in the way the developer wants to use it, and the card is better able to utilise all of its resources. It's all in the Mantle developer slides from a year or so ago.
Just throwing out my ideas on the subject, and hello to all, since this is my first post after being a lurker. If anyone spoke to me on the BF4 forums a while ago, I'm the same Mauller, so hello again.