Crytek on Async Compute: Interesting, But It Adds Non-Trivial Complexity; DX12 Support To Improve With Time
http://wccftech.com/crytek-async-compute-interesting-adds-nontrivial-complexity-dx12-support-improve-time/
Fix your new engine, it's a mess!
Originally Oxide said the Nvidia drivers were advertising that async was supported, but when the game code executed the async code path there were severe performance problems.
Rather dubious on Nvidia's part to fake async support in this way, since drivers can be made to report support for any feature even if it isn't actually supported at all. Programs like GPU-Z, HWiNFO, etc. would show the card as supporting a feature because they only interrogate the driver.
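Just to illustrate that "interrogating the driver" point: below is a rough C++ sketch (not taken from GPU-Z or any other tool, purely the plain D3D12 calls) of reading what a driver reports about itself. Note that there isn't even an explicit async compute cap bit to query; everything printed is whatever the driver chooses to claim, which is exactly why these readouts prove nothing about real behaviour.

```cpp
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <cstdio>

using Microsoft::WRL::ComPtr;

int main()
{
    // Create a device on the default adapter.
    ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return 1;

    // Ask the driver which D3D12 options it claims to support.
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                &options, sizeof(options));

    std::printf("Resource binding tier: %d\n", options.ResourceBindingTier);
    std::printf("Tiled resources tier:  %d\n", options.TiledResourcesTier);
    // Whatever gets printed is simply what the driver reports; it says
    // nothing about how well (or whether) the feature actually runs.
    return 0;
}
```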
We will find out soon enough whether the 1080 really does support it or whether it's the same story again.
The problem was that if they just indiscriminately loaded up the pipeline, an approach that works fine on GCN can easily cause stalls within the GPU on Maxwell, or even take out Windows itself.
It's not that they have not developed a path; it's that Nvidia requested it be switched off. There is nothing stopping Nvidia from asking for it to be turned back on. Nvidia originally requested that the developer, Oxide, not use asynchronous compute at all. That request was not adhered to.
If that was really the case then there was no need to disable it altogether for Nvidia. Why not reduce the load instead, since the Maxwell cards are supposedly able to do async similarly to older GCN cards like Tahiti, which have only a few ACE units?
I think they lost a couple of their better engine devs to id.
The problem was that if they just indiscriminately loaded up the pipeline, an approach that works fine on GCN can easily cause stalls within the GPU on Maxwell, or even take out Windows itself.
No, Maxwell parts need a different implementation of async for best performance, and Pascal will need a different implementation again.
Async compute is not switched on in AotS for any Nvidia card at this time; that is straight from the developers.
Just watched the PCPer podcast from last week and they came up with an interesting way of putting the whole async compute thing together.
Basically, just like AMD's GCN shaders and Nvidia's CUDA cores arrive at the same end result while getting there in different ways, it is the same with async compute. Async compute is a concept, and both companies have decided to go about it in completely different ways: Nvidia's method is a rather brute-force way of doing things, whereas AMD's approach has more finesse. The whole thing is not helped at all by the fact that developers have to code their games to use async, and it is not a one-method-suits-all situation.
Their conclusion on the issue was: does it matter? It is the end performance that matters.
Disclaimer: I know this opinion will not be popular with some people, but it is just that, another opinion on the subject.
It is an accurate description. And as I said before, async compute is a solution to a problem that Nvidia GPUs just don't have to the same extent as later GCN architectures. Nvidia's cards perform close to their theoretical optimum and have high utilization of their compute shaders. AMD GPUs have far more theoretical compute resources that they struggle to fully utilize, so async compute helps AMD much more than Nvidia, simply because Nvidia GPUs were designed differently to maximize real-world utilization without some of the front-end bottlenecks that are limiting the Hawaii and Fiji architectures. Just look at Fiji's theoretical FP32 compute performance compared to Maxwell, and then at real game performance. That is why AMD are heavily marketing async compute.
If you look at the DX12 spec, it only describes a mixed multi-engine pipeline, nothing to do with async compute. There are many ways to achieve the DX12 requirements; even the old Fermi architecture meets the specification. Later GCN architectures definitely have a much more advanced approach that is easier for developers, but then those cards stand to gain much more, so the cost in transistor budget was probably worth it. AMD went along the lines of brute force with huge compute resources that are hard to fully exploit. As GCN developed it became more and more bottlenecked, and AMD saw value in building dedicated async scheduling engines in hardware to try to maximize utilization. Nvidia spent more of the transistor budget on removing bottlenecks in command processing, geometry throughput, etc. People laughed when they saw the Fiji compute shader count compared to the 980 Ti, and we had all these wild claims that the Fury X would be 1.5x faster. The reality was very different.
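To make that "multi-engine, not async compute" point concrete, here is a minimal C++ sketch (names are illustrative, not from any particular engine) of what the API itself actually gives you: separate queue types. Whether work submitted to the compute queue genuinely overlaps the graphics queue is entirely down to the hardware and driver underneath.

```cpp
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

bool CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    // "Direct" (3D) queue: accepts graphics, compute and copy work.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    if (FAILED(device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&graphicsQueue))))
        return false;

    // Dedicated compute queue: compute and copy work only. Submitting here
    // merely *allows* overlap with the direct queue; it does not guarantee it.
    D3D12_COMMAND_QUEUE_DESC compDesc = {};
    compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    return SUCCEEDED(device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue)));
}
```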
Pascal's pixel-level and instruction-level preemption combined with the dynamic load balancing should greatly improve multi-engine support and reduce the complexity for developers. Is it the same approach as AMD's? No. Does it need to be? No. Does any of it matter? No. The only thing that matters to the consumer is the ultimate performance: if Pascal is faster than Polaris/Volta using advanced preemption instead of ACE units, then Nvidia has the superior solution. If smaller Polaris kicks GP104 in the nads, then AMD have a vastly superior solution. Performance is the only metric that matters, not how the transistor budget was used to achieve that performance. This isn't something like fragment shaders or tessellation that is required in hardware to achieve correct rendering; it is merely a way to increase GPU utilization and efficiency. If the GPU doesn't have utilization problems then it's fairly irrelevant.
AMD's approach to DX12 multi-engine is likely a very good approach for future GPUs when we are hitting 6,000-10,000 shaders.
Also, technically they should not be calling it async, as that is false marketing, unless they prove compute tasks can run in parallel, not just sequentially or pseudo-parallel. There is software available that is able to establish this, so I look forward to seeing the results when the 1000 series arrives.
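For what it's worth, the test that kind of software runs boils down to something like the logic below: time the graphics work alone, the compute work alone, then both submitted together on separate queues. The numbers here are made up purely to show the comparison; if the combined time is close to the serial sum it was effectively sequential, and if it's close to the longer of the two it really ran in parallel.

```cpp
#include <algorithm>
#include <cstdio>

int main()
{
    // Hypothetical measured GPU times in milliseconds.
    double graphicsAlone = 10.0;
    double computeAlone  = 4.0;
    double combined      = 10.5;   // measured with both queues loaded at once

    double serialSum = graphicsAlone + computeAlone;          // 14.0 ms
    double bestCase  = std::max(graphicsAlone, computeAlone); // 10.0 ms

    if (combined < serialSum - 0.5)
        std::printf("Work overlapped: %.1f ms combined vs %.1f ms serial (best case %.1f ms)\n",
                    combined, serialSum, bestCase);
    else
        std::printf("Effectively sequential execution: %.1f ms combined\n", combined);
    return 0;
}
```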
You're seriously overestimating how much people care about the tiny minority of DX12 titles utilizing async shaders to a strong degree, man. One thing you say I know is true.
Whatever brand has the best performance has the better solution.
So far Nvidia has a poor solution, and instead of saying it's coming, it's coming, they have to prove it now to avoid what could potentially be millions in lost sales.
I understand that - none of it covers what I'm trying to convey in terms of a simple way of looking at the equivalency of AMD's and Nvidia's approaches without getting bogged down in the technical debate.
It isn't about stuff running in parallel - someone correct me if I'm wrong, but I believe the difference is this. Say you have 5 tasks A, B, C, D and E, where E depends on the results of A and C:
GCN:
|--A--||---E---|
|------B-----|
|--C--|
|-----D-----|
Maxwell (2nd gen):
|--A--|-------|---E---|
|------B-----|
|--C--|
|-----D-----|
(Where the length of each is the time taken)
But due to the architecture differences that is much less of an issue for Nvidia than if the same thing happened on an AMD GPU - especially with Pascal, where the long-running B could run concurrently with E later on and take a lot less time overall, which AMD can't do, i.e. it would look something like:
|--A--||---E---|
-------|----B---|
|--C--|
-------|---D---|
(This is grossly oversimplifying, but I think it illustrates the effective differences.)
EDIT: I think with Pascal it could also look like combinations of:
|--A--||--E--|
|--B--||--B--|
|--C--|
|--D--||-D-|
as well depending on the workload.
Though according to the Oxide guy, while Maxwell 2 (GTX 900 family) is capable of parallel execution, "the hardware doesn't profit from it much though, since it has only little 'gaps' in the shader utilization either way. So in the end, it's still just sequential execution for most workload."
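For anyone wondering how that A/C -> E dependency from the diagrams is actually expressed in DX12: the only thing the API lets you state is the ordering, via fences between queues; how much of the remaining work then overlaps is exactly the part that differs between GCN, Maxwell and Pascal. A rough C++ sketch below, assuming the device, both queues and the recorded command lists for the five tasks already exist (all names are made up for illustration).

```cpp
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Only shows the dependency wiring for the A..E example above.
void SubmitWithDependency(ID3D12Device* device,
                          ID3D12CommandQueue* graphicsQueue,
                          ID3D12CommandQueue* computeQueue,
                          ID3D12CommandList* taskA, ID3D12CommandList* taskB,
                          ID3D12CommandList* taskC, ID3D12CommandList* taskD,
                          ID3D12CommandList* taskE)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // A and B go to the graphics queue; C, D and E to the compute queue.
    graphicsQueue->ExecuteCommandLists(1, &taskA);
    graphicsQueue->Signal(fence.Get(), 1);         // fires once A has finished
    graphicsQueue->ExecuteCommandLists(1, &taskB);  // B keeps the queue busy meanwhile

    computeQueue->ExecuteCommandLists(1, &taskC);
    computeQueue->ExecuteCommandLists(1, &taskD);

    // E needs A and C: same-queue submission order already covers C,
    // and the fence wait covers A on the other queue.
    computeQueue->Wait(fence.Get(), 1);
    computeQueue->ExecuteCommandLists(1, &taskE);
}
```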