
AMD’s DirectX 12 Advantage Explained – GCN Architecture More Friendly To Parallelism Than Maxwell

[Mahigan:]
Yes... Async helps them achieve what is in this slide...
[slide image from the original post]


Latency becomes hidden by overlapping executions of Wavefronts. That's why GCN retains the same degree of latency as you throw more and more Kernels at it. GCN is far more parallel than competing architectures. I wouldn't say it is faster, it's just able to take on far more computational workloads (Threads) at any given time.

If you throw too much work at Maxwell 2, it begins to bottleneck. We see this in the staircase effect on nVIDIA's architecture in Beyond3D's graphs. So while Maxwell 2 can compute a Kernel containing 32 threads in 25ms, GCN can compute a Kernel containing 64 threads (twice the commands) in 38-50ms. The problem is that if you throw a Kernel containing only 32 threads at GCN, it still takes the same 38-50ms. This is the result Beyond3D is getting, and the conclusion some there are drawing (Jawed, for example) is that Maxwell 2 is far superior at compute.

If you add Async to the mix, you have that same 64-thread Kernel taking 38-50ms while a Graphics task runs alongside it in parallel, so GCN absorbs both in roughly the same window. Maxwell 2, by contrast, would take the 50ms to handle a Kernel with 64 threads plus the 8-12ms it takes to handle the Graphics task.
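To make that arithmetic concrete, here is a minimal sketch using the example figures above (the 50ms compute kernel and the 8-12ms graphics task are Mahigan's illustrative numbers, not measurements): serial execution adds the two tasks, while overlapped (async) execution approaches the longer of the two.

```cpp
// Minimal sketch of the serial vs. overlapped timing argument above.
// The figures are the illustrative numbers from the post, not measurements.
#include <algorithm>
#include <cstdio>

int main() {
    const double compute_ms  = 50.0;  // 64-thread kernel (example figure from the post)
    const double graphics_ms = 10.0;  // graphics task, middle of the quoted 8-12ms range

    // Serial: the GPU finishes one task before starting the other.
    const double serial_ms = compute_ms + graphics_ms;

    // Async/overlapped: the tasks run concurrently, so the total approaches
    // the longer of the two (assuming enough idle units to share).
    const double overlapped_ms = std::max(compute_ms, graphics_ms);

    std::printf("serial: %.0fms, overlapped: %.0fms\n", serial_ms, overlapped_ms);
    return 0;
}
```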

I think the Beyond3D folks are CUDA programmers; if true, you can't fault them for not knowing.

At the end of this, Beyond3D will likely conclude that Oxide did something wrong when, in fact, it's Beyond3D that did something wrong in their tests.
http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1720


Some interesting results.

So that's the 2nd test on a 980Ti,

1st:
Compute: 5.67ms
Graphics: 16.77ms

Graphics + Compute: 21.15ms
Graphics + Compute (Single Commandlist): 20.70ms

And for 512th:
Compute: 76.11ms
Graphics: 16.77ms
Graphics + Compute: 97.38ms
Graphics + Compute (Single Commandlist): 2294.69ms

---------------------------------------

From the 1st through the 512th, the Graphics + Compute (async) mode just adds the two times together, and Single Commandlist mode went nuts.

Serial:
A (Compute) + B (Graphics) = A + B

Async:
A (Compute) + B (Graphics) ≈ max(A, B), i.e. whichever of the two takes longer

Right? Or is that not how we are meant to interpret the data of this test?
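For reference, here is roughly what the two modes in MDolenc's test correspond to at the D3D12 API level (a simplified sketch, not his actual code): in "Graphics + compute" the compute work is submitted on a separate compute-type queue, so the driver and hardware are allowed to overlap it with the graphics work; in "single commandlist" both workloads are recorded into one direct list, so the GPU has to run them back-to-back.

```cpp
// Simplified sketch of the two submission patterns the test compares.
// Illustrative only; error handling, synchronisation and resource cleanup are omitted.
#include <windows.h>
#include <d3d12.h>

// "Graphics + compute" mode: separate queues, so overlap (async compute) is possible.
void SubmitOnSeparateQueues(ID3D12Device* device,
                            ID3D12CommandList* graphicsList,
                            ID3D12CommandList* computeList)
{
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;

    D3D12_COMMAND_QUEUE_DESC cmpDesc = {};
    cmpDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;

    ID3D12CommandQueue* gfxQueue = nullptr;
    ID3D12CommandQueue* cmpQueue = nullptr;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));
    device->CreateCommandQueue(&cmpDesc, IID_PPV_ARGS(&cmpQueue));

    gfxQueue->ExecuteCommandLists(1, &graphicsList);  // graphics work
    cmpQueue->ExecuteCommandLists(1, &computeList);   // compute work, free to overlap
}

// "Single commandlist" mode: everything on one direct queue/list,
// so the draws and dispatches run back-to-back with no chance to overlap.
void SubmitOnSingleList(ID3D12CommandQueue* gfxQueue,
                        ID3D12CommandList* combinedList)
{
    gfxQueue->ExecuteCommandLists(1, &combinedList);
}
```

If the hardware can genuinely run the two queues concurrently, the first pattern should come in well under the sum of the individual graphics and compute timings; the numbers posted in this thread suggest that isn't happening on Maxwell.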

980 Ti
Compute only:
1. 6.79ms

Graphics only: 16.21ms

Graphics + compute:
1. 20.22ms

Graphics, compute single commandlist:
1. 20.04ms

Your result is in line with the others': running Graphics + Compute gives an additive result, close to the sum of the compute and graphics times.

Also, your single-commandlist (forced) results show ever-rising timings, as we've seen with the others, up to the 281st with a time of 2117.00ms!

Is this what Oxide is talking about, where trying to force async mode messes things up?
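A very rough back-of-the-envelope on those forced numbers, assuming the 281st entry corresponds to 281 compute kernels (which is how I read the test), puts the added cost at several milliseconds per kernel on top of the graphics time, which looks like a fixed switching/serialisation cost per kernel rather than any overlap:

```cpp
// Back-of-the-envelope on the forced single-commandlist figures posted above.
// Assumes the Nth entry of the test corresponds to N compute kernels.
#include <cstdio>

int main() {
    const double single_list_ms   = 2117.00;  // 281st entry, single commandlist (posted above)
    const double graphics_only_ms = 16.21;    // graphics-only baseline (posted above)
    const int    kernels          = 281;

    const double per_kernel_ms = (single_list_ms - graphics_only_ms) / kernels;
    std::printf("~%.1fms of added time per kernel\n", per_kernel_ms);  // roughly 7.5ms each
    return 0;
}
```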



First post here. I was curious about this, so I ran the test on both my spare card and my main card.

Just sharing the results in case they prove useful.

AsyncCompute
written by MDolenc

7950 Catalyst 15.8b

Graphics only: 57.37ms (29.24G pixels/s)
Graphics + compute: 238.70ms (7.03G pixels/s)
Graphics, compute single commandlist: 295.77ms (5.67G pixels/s)

980 Forceware 355.82

Graphics only: 23.23ms (72.21G pixels/s)
Graphics + compute: 103.58ms (16.20G pixels/s)
Graphics, compute single commandlist: 2433.35ms (0.69G pixels/s)


And in that last example the 980 is clearly beating the 7950, as it's a much more powerful card, until you get to the final test, the parallel asynchronous compute one (single commandlist).
https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-15
 
I'm fully under the impression that Nvidia supports async only as far as letting games make async calls, at which point the driver just serialises the work, and the context switching causes horrible performance issues once it overloads the hardware.
 
Yeah, except it's not functional at all:

[timing graph from the Beyond3D post]

There's an almost constant step-up between the blue and the red lines, and that step is almost always equal to the constant value of the green line. This means the GPU is doing rendering + context switching + compute task.
There's no Async Compute happening on the hardware level at all.

#322
ToTTenTranz, 26 minutes ago
https://forum.beyond3d.com/posts/1869587/
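One way to sanity-check that reading against the numbers already posted in this thread is to compare the measured graphics + compute time with the fully serial prediction (sum of the two) and the fully overlapped prediction (the longer of the two). A small hypothetical helper, applied to the 980 Ti figures for the 512th kernel quoted earlier:

```cpp
// Hypothetical helper: compare a measured graphics+compute time against the
// fully serial (sum) and fully overlapped (max) predictions.
#include <algorithm>
#include <cstdio>

void Compare(double graphics_ms, double compute_ms, double measured_ms) {
    const double serial_ms     = graphics_ms + compute_ms;           // no overlap at all
    const double overlapped_ms = std::max(graphics_ms, compute_ms);  // perfect overlap
    std::printf("measured %.2fms | serial prediction %.2fms | overlapped prediction %.2fms\n",
                measured_ms, serial_ms, overlapped_ms);
}

int main() {
    // 980 Ti, 512th kernel, figures quoted earlier in the thread.
    // The measured time lands at (in fact slightly above) the serial prediction,
    // which is consistent with rendering + switching + compute rather than overlap.
    Compare(16.77, 76.11, 97.38);
    return 0;
}
```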
 
I'm fully under the impression that Nvidia supports async only as far as letting games make async calls, at which point the driver just serialises the work, and the context switching causes horrible performance issues once it overloads the hardware.

I think there is a bit of a contradiction in that statement.

Nvidia can't be running ASync if the operations are serialised; the serial switching between tasks is the result of a lack of parallel operation.

ASync is parallel; without it, it's running tasks in serial, which is exactly what Maxwell is doing.
 
I didn't say they were; I said they support it as far as enabling games to make the calls, not that it's doing them in hardware. I'm saying it's advertising that it can, but reordering commands at the driver level to take async calls from the game and turn them into serialised calls for the hardware.

It's effectively pretending to support async fully, while not actually supporting it at the hardware level.
 
I didn't say they were; I said they support it as far as enabling games to make the calls, not that it's doing them in hardware. I'm saying it's advertising that it can, but reordering commands at the driver level to take async calls from the game and turn them into serialised calls for the hardware.

It's effectively pretending to support async fully, while not actually supporting it at the hardware level.
AKA: Emulating it.
 
AKA: Emulating it.

I wondered if I should use that term, but ultimately emulation usually means actually doing it, just in software and much more slowly. This is more a case of not supporting it at all while telling the outside world you do support it.

AMD and Nvidia, afaik, support quite a few things only in software, but a lot of that can be done cheaply and either couldn't have hardware built for it in the current gen or was simply deemed not required. There are other things that take a huge performance hit when done in software rather than hardware and are nearly useless done that way, but it's at least still possible.

Nvidia's async seems more a case of faking it than emulating it.
 
I don't think you can emulate ASync; either the hardware is parallel or it's serial. You can't emulate multi-core CPUs on a single-core CPU.

What you can do is use software to organise task queuing to reduce latency, though I'm pretty sure Nvidia are already doing that.
 
I don't think you can emulate ASync; either the hardware is parallel or it's serial. You can't emulate multi-core CPUs on a single-core CPU.

What you can do is use software to organise task queuing to reduce latency, though I'm pretty sure Nvidia are already doing that.
Any hardware function can be emulated in software; it will be slow compared to dedicated hardware, but it can be done. Also, multi-core CPUs are emulated in VMs.
 
nVidia's stinger device used to have quite impressive software performance for missing hardware features, but I can't see even a virtual async compute system emulated via serial compute being able to provide useful performance.

On a semi-related note, they used to have a real-time, 1:1-performance software emulation of the Kepler core for testing, which required a server array the size of a shipping container, heh.
 