
Oculus Employees: "Preemption for context switches is best on AMD, Nvidia possibly catastrophic"

I think it'll stay a gimmick until they add more than just head tracking.

What do you need? Smell, pain sensors? :D

There are too many different players in all of this to be excited at the moment. Too much segmentation. The market needs to settle down, all the players need to kill each other off until one or two major players remain, and games can be created not for 6000 different VR headsets but for just one or two ;) Then it will become more serious to consider.
 
Then you've misunderstood the graphs... What they show is that NVIDIA has lower latency dealing with smaller sets and only reaches the same high level as AMD when dealing with the maximum size AMD can handle... If you increased the set size again, AMD's latency would double, whereas Nvidia's latency would make another small step up.

What it shows is that if you optimise your code for AMD's set size then Nvidia is equal on latency, but if you optimise for Nvidia then AMD will be behind.

Nope!
 

According to the latest quantum physics theories you don't smell anything, you hear it. :D
 
From those graphs I know which I think looks best: a continuous, smooth, constant line, or a line that starts low and gradually gets worse?

That's all I'm going to say on this matter, because for me the real talking comes when games start dropping...

Don't forget those graphs are from a 7970 and a GTX 980 Ti.

At first the 980 Ti beats the 7970, but once a lot of threading is involved the performance of the 7970 surpasses the 980 Ti.

What they also show is that there is a constant latency of about 20ms on the Nvidia GPU in task switching, whereas the 7970 is completely parallel.

This is about the 5th time this has been explained and yet it still keeps cropping up.....

This latency on Nvidia is not good for performance in heavily threaded tasks, but it's also not good for VR, as that latency results in a delay in graphics rendering which can give you motion sickness, which is "catastrophic".
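Just to put that in perspective, a rough back-of-the-envelope (the 20ms is the figure quoted above, and the 90Hz refresh rate is an assumption based on what current VR headsets target):

```python
# Back-of-the-envelope only: how a ~20 ms task-switch stall compares with a
# VR frame budget. Both numbers are assumptions taken from the discussion,
# not measurements of any particular card.

REFRESH_HZ = 90                       # assumed target refresh for current VR HMDs
FRAME_BUDGET_MS = 1000 / REFRESH_HZ   # ~11.1 ms available per frame

STALL_MS = 20                         # the task-switching latency quoted above

print(f"Frame budget at {REFRESH_HZ} Hz: {FRAME_BUDGET_MS:.1f} ms")
print(f"A {STALL_MS} ms stall covers ~{STALL_MS / FRAME_BUDGET_MS:.1f} frame budgets")
# A single stall of that size is longer than a whole frame, which is why a
# delayed context switch can translate straight into a dropped or late frame.
```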
 
It may have peed itself but you won't know until you look. :D

If you do then it means its twin has pooped itself. :D

I think I need a drink after thinking about this. :eek::cool::)

Maybe you needed a drink before you thought of this :D

Quantum physics is fun :) Love it as a light read :D
 

Yeah.
And even still, that is a 980 Ti @ 31 simultaneous command lists vs a Fury X @ 128.

Some guy on Beyond3D's forums made a small DX12 benchmark. He wrote some simple code to fill up the graphics and compute queues to judge whether a GPU architecture could execute them asynchronously.

He generates 128 command queues and 128 command lists to send to the cards, and then executes 1-128 simultaneous command queues sequentially. If running increasing amounts of command queues causes a linear increase in time, this indicates the card doesn't process multiple queues simultaneously (doesn't support Async Shaders).

He then released an updated version with 2 command queues and 128 command lists, many users submitted their results.

On the Maxwell architecture, up to 31 simultaneous command lists (the limit of Maxwell in graphics/compute workload) run at nearly the exact same speed - indicating Async Shader capability. Every 32 lists added would cause increasing render times, indicating the scheduler was being overloaded.
On the GCN architecture, 128 simultaneous command lists ran roughly the same, with very minor increased speeds past 64 command lists (GCN's limit) - indicating Async Shader capability. This shows the strength of AMD's ACE architecture and their scheduler.

Interestingly enough, the GTX 960 ended up having higher compute capability in this homebrew benchmark than both the R9 390X and the Fury X - but only when it was under 31 simultaneous command lists. The 980 Ti had double the compute performance of either, yet only below 31 command lists. It performed roughly equal to the Fury X at up to 128 command lists.
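If anyone wants a feel for the shape of those curves without running the actual Beyond3D benchmark, here's a toy model (pure Python, nothing GPU-specific; the 31-wide and 64-wide limits are simply the figures quoted above, and the flat 10ms batch cost is made up):

```python
# Toy model of the behaviour described above - NOT the Beyond3D benchmark.
# Assumptions: a "Maxwell-like" card keeps 31 command lists in flight in a
# mixed graphics/compute workload, a "GCN-like" card keeps 64 in flight,
# and each batch of concurrent lists costs a flat 10 ms.

import math

def batches(n_lists: int, width: int) -> int:
    """Sequential passes needed if only `width` lists can run at once."""
    return math.ceil(n_lists / width)

def total_ms(n_lists: int, width: int, batch_cost_ms: float = 10.0) -> float:
    return batches(n_lists, width) * batch_cost_ms

for n in (1, 16, 31, 32, 63, 64, 96, 128):
    print(f"{n:3d} lists: Maxwell-like ~{total_ms(n, 31):5.1f} ms, "
          f"GCN-like ~{total_ms(n, 64):5.1f} ms")

# The Maxwell-like column steps up each time the 31-list limit is crossed,
# while the GCN-like column stays flat until its own limit is exceeded -
# the step-vs-flat pattern the forum results showed.
```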
 
Dozens? Going to be hundreds!
No seriously, where are those games?

EDIT: From reddit.


The problem is: are those going to be DX11 games with a few DX12 features?

That seems likely, especially for games like Ark that are getting DX12 patches.

What exactly is enough to constitute it being called a DX12 patch?

I imagine the devs of games like that will be doing just enough that they can call it a DX12 patch and use it as a promotional tool. But it'll probably make very little difference at the end of the day.
 

Maxwell has 32 command lists, totalling 32 commands in parallel.

GCN 1.0 has 64 command lists in parallel across 2 ACE units, totalling 128 commands in parallel.

GCN 1.1 and 1.2 have 64 command lists in parallel across 8 ACE units, totalling 512 commands in parallel.

That's the difference :)

If you really push a 7970 in render/compute, it is at least as fast as a GTX 980 Ti, where the latter makes up for its latency through brute force.

If you push a 290/390(X) or Fury(X) even harder, up to 4 times harder, it will stand its ground while all the rest grind to a halt.

If any DX12 game should ever use 300 or more command lists, a 290 will put a GTX 980 Ti to shame.
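For what it's worth, the arithmetic behind those figures (taking the quoted queue depths at face value; they're the numbers in the post above, not confirmed specs):

```python
# Just the multiplication behind the figures above. The depths and queue
# counts are the ones claimed in the post, taken at face value.

configs = {
    "Maxwell (32-deep, 1 queue)":     (32, 1),
    "GCN 1.0 (64-deep, 2 ACEs)":      (64, 2),
    "GCN 1.1/1.2 (64-deep, 8 ACEs)":  (64, 8),
}

for name, (depth, queues) in configs.items():
    print(f"{name}: {depth} x {queues} = {depth * queues} commands in parallel")
# -> 32, 128 and 512 respectively.
```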
 
It will be interesting to see Nvidia owners unhappy, since Nvidia kinda lied about async shaders, and their software-based solution will be horrible and much slower than AMD's hardware solution, especially in latency.


You see what I did there? Next time, when you are trying to be a crystal ball, please use facts and not speculation.

He does it all the time - there was a chap called seronx on forums who did the same but on the AMD side. But LOL at him thinking David Kanter jumps to conclusions.

People really need to look at the sideline work he did on Real World Tech. If people thought Anand did good technical write-ups on AnandTech, David Kanter is on another level....

Look at his LinkedIn account:

https://www.linkedin.com/in/kanterd

He is held in high regard in the industry, so it's interesting that he passed on the comments made.

OTOH, I don't really care about VR either.
 
What do you need? Smell, pain sensors? :D

There are too many different players in all of this to be excited at the moment. Too much segmentation. The market needs to settle down, all the players need to kill each other off until one or two major players remain, and games can be created not for 6000 different VR headsets but for just one or two ;) Then it will become more serious to consider.

This.
 

Well, Thief was the first Mantle title to use async compute to unleash the full power of the 290X, with 352 commands (44 CU x 8 ACE), but after Nvidia's DirectX 11 wonder driver the GTX 780 Ti outperformed the 290X easily. A 980 Ti has over twice the performance and puts the 290X to shame in Thief's Mantle async compute.
 

Yeah, considering that Tom's Hardware used Mantle in that review with an R9 290X and a Core i7 4770 was MASSIVELY outperformed by an FX 8350... that makes lots of sense LOL.

It was so buggy in Thief, being only the second Mantle title, that half the sites just resorted to using DX11 in the end.

Who knows, NV might release a new wonder driver soon, too.

Out of interest, are you mates with Rollo?? :D
 
I don't think so

Game devs did the absolute minimum with Mantle and all they were concerned about was making the maximum profit.

If game devs are allowed to get away with it they will do the same with DX12; welcome to the world of broken games.

The minimum they could do with Mantle was not use it at all... game devs did use it, thus they didn't do the absolute minimum, thus you're talking rubbish... again.

Adding support to multiple engines isn't the minimum possible, learning how to port a game isn't the minimum possible, and learning to rewrite the engine to take control of memory management and change how you do things isn't the minimum possible.

As Johan said, they would have needed to rewrite how the game dealt with memory to make better use of Mantle/any low-level API. They didn't, but in learning and playing with Mantle they worked out what they'd need to do with the NEXT game for ANY low-level API. That effort means the next game will be designed with this fundamentally in mind. BF4 didn't launch with Mantle support; it was added during the Mantle beta so they could learn what needed to happen for their next games to come to PC designed completely around a low-level API.

Certain people never quite cottoned on to this: it was in beta, and the games weren't perfect under Mantle. The idea was for devs to spend a few months adding Mantle and working out where their engines and games needed to be improved, ready for 1-2 years' time when their games would ship with a low-level API as the default rendering path. So in Johan's case, he added Mantle support to the engine in general, got BF4 working with it, and spent months working out what changes he'd need for the next games and where to tweak the engine to maximise performance and usage for the future.
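To make the "take control of memory management" point a bit more concrete, here's a minimal conceptual sketch (not Mantle, DX12 or Frostbite code; just an illustration, with made-up names and sizes, of the kind of work that moves from the driver into the engine with a low-level API):

```python
# Conceptual sketch only: with a low-level API the engine, not the driver,
# allocates a big block of GPU memory up front and places resources inside it
# itself. Everything here (class name, sizes) is illustrative, not any real API.

class GpuHeap:
    """A simple linear (bump) suballocator over one large GPU allocation."""

    def __init__(self, size_bytes: int):
        self.size = size_bytes
        self.offset = 0

    def alloc(self, size_bytes: int, alignment: int = 256) -> int:
        """Return an offset for a new resource, or fail if the heap is full."""
        aligned = (self.offset + alignment - 1) // alignment * alignment
        if aligned + size_bytes > self.size:
            raise MemoryError("heap exhausted - the engine has to handle this itself now")
        self.offset = aligned + size_bytes
        return aligned

    def reset(self) -> None:
        """The engine decides when everything in the heap can be recycled."""
        self.offset = 0

# The engine now owns placement and lifetime decisions the DX11 driver used to make:
frame_heap = GpuHeap(256 * 1024 * 1024)        # one big per-frame heap
constants = frame_heap.alloc(64 * 1024)        # suballocate constant data
vertices = frame_heap.alloc(8 * 1024 * 1024)   # suballocate vertex data
print(constants, vertices)
frame_heap.reset()                             # recycle the lot next frame
```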
 
Going a bit off topic sometimes - why does it always turn into an Nvidia vs AMD thing :).
To be honest I think it's low on the list of Nvidia's priorities. When VR is really worth having, they'll be on top of it.
As someone else said, too many different players at the moment, too many headsets.
I wouldn't mind trying the technology but have no desire to spend money on it yet. If I wanted to try it and spend money on it, and AMD cards were simply much better, then I'd be tempted to pick up an AMD card just for the one task of VR.
 
It is not important to Nvidia buyers, since Nvidia has trained them well to upgrade every year ;) Nvidia has broken async shaders in Maxwell, and the VR hardware is kinda weak? No problem, you just need to upgrade next year to next-gen cards which will have everything fixed for you :D Yeah, consumer friendly.

But yeah, regarding the DX12 adoption rate, what you are saying does make sense. All the previous DX versions were exclusive to new cards, hence the longer adoption time.

Haha LOL, this kinda sounds like Apple... look at our new feature... crowd runs to buy the new iPhone... Android users are like: WTF, we've had this for ages. :)
 