http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1720
Mahigan:
Yes... Async helps them achieve what is in this slide...
Latency becomes hidden by overlapping executions of Wavefronts. That's why GCN retains the same degree of latency as you throw more and more Kernels at it. GCN is far more parallel than competing architectures. I wouldn't say it is faster, it's just able to take on far more computational workloads (Threads) at any given time.
If you throw too much work at Maxwell/2, it begins to bottleneck. We see this result with the staircase effect, on nVIDIA's architecture, in Beyond3D's graphs. So while Maxwell 2 can compute a Kernel containing 32 threads in 25ms, GCN can compute a Kernel containing 64 threads (twice the commands) in 38-50ms. The problem is that if you throw a Kernel containing 32 threads at GCN, it will take the same 38-50ms. This is the result Beyond3D is getting, and it is why they conclude (Jawed, for example) that Maxwell 2 is so superior at compute.
If you add Async to the mix, you have that same 64-thread Kernel taking 38-50ms on GCN while a Graphics task runs in parallel. So if we do the math, Maxwell 2 would take 50ms to handle a Kernel with 64 threads, plus the 8-12ms it takes to handle the Graphics task.
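To spell out that arithmetic, here is a rough Python sketch. It uses only the figures quoted in this post (25ms per 32-thread Kernel on Maxwell 2, 38-50ms per Kernel on GCN, 8-12ms for the Graphics task). The function names are made up for illustration, and two things are assumptions rather than measurements: that a 64-thread Kernel costs Maxwell 2 exactly two 32-thread batches, and that GCN overlaps compute and graphics perfectly.

```python
# Figures quoted in the post; everything else here is an illustrative assumption.
MAXWELL2_KERNEL32_MS = 25.0      # one 32-thread Kernel on Maxwell 2
GCN_KERNEL_MS = (38.0, 50.0)     # one Kernel (32 or 64 threads) on GCN, low/high
GRAPHICS_MS = (8.0, 12.0)        # the Graphics task, low/high


def serial_total(compute_ms, graphics_ms):
    """Compute and graphics run back to back, so the times add."""
    return compute_ms + graphics_ms


def overlapped_total(compute_ms, graphics_ms):
    """Idealised async: both run at once, the longer task sets the total."""
    return max(compute_ms, graphics_ms)


# Assumption: 64 threads on Maxwell 2 = two 32-thread batches = 50ms.
maxwell2_compute_ms = 2 * MAXWELL2_KERNEL32_MS

maxwell2_range = tuple(serial_total(maxwell2_compute_ms, g) for g in GRAPHICS_MS)
gcn_range = tuple(overlapped_total(c, g) for c, g in zip(GCN_KERNEL_MS, GRAPHICS_MS))

print("Maxwell 2, serial (low, high) ms:", maxwell2_range)
print("GCN, overlapped (low, high) ms:  ", gcn_range)
```

Under those assumptions the claim works out to about 58-62ms on Maxwell 2 against 38-50ms on GCN. That is the post's own reasoning written down, not a benchmark result.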
I think the Beyond3D folks are CUDA programmers; if that's true, you can't fault them for not knowing.
At the end of this, Beyond3D will likely conclude that Oxide did something wrong when, in fact, they did something wrong in their tests.
So that's the 2nd test on a 980Ti,
1st:
Compute: 5.67ms
Graphics: 16.77ms
Graphics + Compute: 21.15ms
Graphics + Compute (Single Commandlist): 20.70ms
And for 512th:
Compute: 76.11ms
Graphics: 16.77ms
Graphics + Compute: 97.38ms
Graphics + Compute (Single Commandlist): 2294.69ms
---------------------------------------
In both the 1st and the 512th case, Async mode adds the times together. Single Commandlist mode went nuts.
Serial:
A (Compute) + B (Graphics) = A + B
Async:
A + B = A OR B (whichever takes longer)
Right? Or is that not how we are meant to interpret the data of this test?
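One way to check that reading against the numbers is to ask how much of the shorter task got hidden. The Python sketch below is only that, a sketch: `overlap_fraction` is a made-up helper, and it assumes the ideal async time is the longer of the two tasks. It is fed the 980 Ti timings posted above.

```python
def overlap_fraction(compute_ms, graphics_ms, combined_ms):
    """How much of the shorter task was hidden behind the longer one.

    1.0  -> combined == max(A, B): full overlap (the "A OR B" case)
    0.0  -> combined == A + B: purely serial
    <0.0 -> combined is worse than serial (extra overhead on top)
    """
    serial_ms = compute_ms + graphics_ms
    hideable_ms = min(compute_ms, graphics_ms)
    return (serial_ms - combined_ms) / hideable_ms


# 980 Ti timings from the post above: compute, graphics, graphics + compute.
first = overlap_fraction(5.67, 16.77, 21.15)
last = overlap_fraction(76.11, 16.77, 97.38)

print(f"1st:   {first:.2f}")
print(f"512th: {last:.2f}")
```

With these timings the 1st result comes out at roughly 0.23 and the 512th comes out negative, which looks a lot closer to the serial "A + B" case than to "A OR B". It is only as good as the timings fed into it.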
980 Ti
Compute only:
1. 6.79ms
Graphics only: 16.21ms
Graphics + compute:
1. 20.22ms
Graphics, compute single commandlist:
1. 20.04ms
Your result matches the others. Running Graphics + Compute gives an additive result, close to the sum of compute + graphics.
Also, your (forced) single commandlist results show ever-rising timings, as we've seen with the others, up to the 281st with a time of 2117.00ms!
Is this what Oxide is talking about? When they try to force direct async mode it would mess up.
First post here, I was curious on this matter so I ran both on my spare and main card.
Just sharing results if it might prove useful.
AsyncCompute
written by MDolenc
7950 Catalyst 15.8b
Graphics only: 57.37ms (29.24G pixels/s)
Graphics + compute: 238.70ms (7.03G pixels/s)
Graphics, compute single commandlist: 295.77ms (5.67G pixels/s)
980 Forceware 355.82
Graphics only: 23.23ms (72.21G pixels/s)
Graphics + compute: 103.58ms (16.20G pixels/s)
Graphics, compute single commandlist: 2433.35ms (0.69G pixels/s)
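For anyone comparing the two cards, a quick Python sketch of how the columns relate. The numbers are copied from the results above. The idea that the G pixels/s column is a fixed workload divided by elapsed time is inferred from the numbers, not from the tool's source, and the helper names are made up.

```python
# (label, time in ms, reported throughput in G pixels/s), copied from the post
RESULTS = {
    "7950": [
        ("graphics only", 57.37, 29.24),
        ("graphics + compute", 238.70, 7.03),
        ("single commandlist", 295.77, 5.67),
    ],
    "980": [
        ("graphics only", 23.23, 72.21),
        ("graphics + compute", 103.58, 16.20),
        ("single commandlist", 2433.35, 0.69),
    ],
}


def implied_gpixels(time_ms, gpix_per_s):
    """Workload size implied by one row: seconds elapsed * G pixels per second."""
    return time_ms / 1000.0 * gpix_per_s


def slowdown(time_ms, baseline_ms):
    """How many times longer a run took than the graphics-only baseline."""
    return time_ms / baseline_ms


summary = {}
for card, rows in RESULTS.items():
    baseline_ms = rows[0][1]  # the graphics-only time for this card
    summary[card] = [
        (label, implied_gpixels(ms, rate), slowdown(ms, baseline_ms))
        for label, ms, rate in rows
    ]

for card, rows in summary.items():
    for label, gpix, factor in rows:
        print(f"{card:>5} {label:<20} ~{gpix:.2f} G pixels, {factor:.1f}x graphics-only time")
```

On these numbers every row works out to about 1.68 G pixels, so the throughput column appears to be the same workload divided by the elapsed time. Graphics + compute costs both cards a bit over 4x the graphics-only time, while the single commandlist run on the 980 is over 100x.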
https://forum.beyond3d.com/posts/1869587/
Yeah, except it's not functional at all:
There's an almost constant step-up between the blue and the red lines, and that step is almost always equal to the constant value of the green line. This means the GPU is doing rendering + context switching + compute task.
There's no Async Compute happening on the hardware level at all.
#322
ToTTenTranz, 26 minutes ago
I'm fully under the impression Nvidia is supporting async only as far as enabling games to make a call for async, at which point the driver just serialises the process, and the context switching causes horrible performance issues when it overloads the hardware.
AKA: Emulating it.
I didn't say they were; I said they support it as far as enabling games to make the calls, not that it's doing them in hardware. I'm saying it advertises that it can, but reorders commands at the driver level, taking async calls from the game and turning them into serialised calls for the hardware.
It's effectively pretending to support async fully, while not actually supporting it at the hardware level.
AKA: Emulating it.
Simulating is the word I think, within a computer-science context.
Any hardware function can be emulated in software; it will be slow compared to dedicated hardware, but it can be done. Also, multi-core CPUs are emulated in VMs.
I don't think you can emulate Async: either the hardware is parallel or it's serial. You can't emulate multi-core CPUs on a single-core CPU.
What you can do is use software to organise task queuing to reduce latency, though I'm pretty sure Nvidia are already doing that.