AMD VEGA confirmed for 2017 H1

Stopped reading there, because again you are wrong.
Nvidia added hardware support for pixel-level preemption and dynamic load balancing, for starters:
https://www.bit-tech.net/hardware/graphics/2016/06/15/evga-geforce-gtx-1080-ftw-review/6
You stopped reading there? Jesus man, why does anybody waste their time talking to you about anything if you're not going to read a comment to get the full context?

Preemption's biggest usefulness is for timewarp in VR, as the article you linked even states. This is not the same thing as utilizing async compute shading, which is generally what people are referring to here. Though it does have uses for certain resolution compositing techniques (which is still different from using async compute shaders).

And 'for starters'? Nah man, that's ALL there is to it. You named everything, so don't go acting like this was just the beginning of your argument. It is the beginning, middle and end of any 'hardware support' argument for this.

At the end of the day, AMD is the only one with *proper, full scale* support for this. Their compute engine system is literally built for this kind of thing, no overhead required, and as you even admit, is a preferable solution for it. There are numerous comments from developers who back this up.
 
We will know when Nvidia can do it properly, as they will shout it from the hills as if it's the second coming of Christ, and so they should, as it's a good feature. They will want everyone to know so it's not used against their cards as a selling point for AMD. This has not really happened with Pascal.
 
It would be great if you actually read the post before coming out with incorrect claims, because that is not at all what it says:

"On Maxwell what would happen is Task A is assigned to 8 SMs such that execution time is 1.25ms and the FFU does not stall the SMs at all. Simple, right? However we now have 20% of our SMs going unused.
So we assign task B to those 2 SMs which will complete it in 1.5ms, in parallel with Task A's execution on the other 8 SMs.


Here is the problem: when Task A completes, Task B will still have 0.25ms to go, and on Maxwell there's no way of reassigning those 8 SMs before Task B completes. Partitioning of resources is static (unchanging) and happens at the draw call boundary, controlled by the driver.

So if the driver estimates the execution times of Tasks A and B incorrectly, the partitioning of execution units between them will lead to idle time as outlined above.

Pascal solves this problem with 'dynamic load balancing'; the 8 SMs assigned to A can be reassigned to other tasks while Task B is still running, thus saturating the SMs and improving utilization.

For some reason many people have decided that Pascal uses preemption instead of async compute.

This makes no sense at all. Preemption is the act of telling a unit to halt execution of its running task. Preemption latency measures the time between the halt command being issued and the unit being ready for another assignment."
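
To make the idle-time arithmetic in that quote concrete, here's a minimal toy sketch (plain Python, using the 1.25 ms / 1.5 ms figures from the example above; it is not a model of the real hardware scheduler):

```python
# Toy model of the static-partitioning example quoted above (not real hardware).
# 10 SMs total; Task A runs on 8 SMs for 1.25 ms, Task B on 2 SMs for 1.5 ms.
TOTAL_SMS = 10
TASK_A = {"sms": 8, "ms": 1.25}
TASK_B = {"sms": 2, "ms": 1.50}

# Static partitioning (the Maxwell case in the quote): neither partition can be
# reassigned until both tasks finish, so the frame takes as long as the slower task.
frame_ms = max(TASK_A["ms"], TASK_B["ms"])
busy_sm_ms = TASK_A["sms"] * TASK_A["ms"] + TASK_B["sms"] * TASK_B["ms"]
idle_sm_ms = TOTAL_SMS * frame_ms - busy_sm_ms
print(f"static partitioning: {frame_ms:.2f} ms, {idle_sm_ms:.2f} SM-ms idle")
# -> 1.50 ms, with 2.00 SM-ms idle: Task A's 8 SMs sit unused for the final
#    0.25 ms, exactly the situation the quote describes.

# Dynamic load balancing (the Pascal case, idealised): once Task A finishes,
# its 8 SMs can pick up Task B's remaining work instead of idling.
remaining_b_sm_ms = TASK_B["sms"] * (TASK_B["ms"] - TASK_A["ms"])  # 0.5 SM-ms left
frame_ms_dlb = TASK_A["ms"] + remaining_b_sm_ms / TOTAL_SMS
print(f"dynamic load balancing (ideal): {frame_ms_dlb:.2f} ms")
# -> 1.30 ms in this idealised case.
```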

You couldn't be more wrong here.


I did read it. It's always got SMs in reserve to load priority queues simultaneously; it's still pre-empting and prioritising command queues. It's not 'preemption in the traditional sense', but it's preemption no less. What he is not telling you is that there is still latency and a CPU overhead, as the software needs to organise calls before executing them and the CPU runs those calculations. It's also still very limited, because it's still issuing draws on one command thread; this is why Nvidia can't deal with multi-threaded draw calls like AMD can, and that's another thing he fails to mention.

The whole thing is irrelevant anyway; it's a semantics argument. Pascal still can't deal with high-core-count CPUs like Broadwell-E and Ryzen; it's bottlenecked by how well four cores perform, and that's the limit.

Not with AMD; they have 4x as much room to grow.

If you own a Broadwell-E or Ryzen CPU you're better off with a Vega GPU, you'll get more out of it.
 
We will know when Nvidia can do it properly, as they will shout it from the hills as if it's the second coming of Christ, and so they should, as it's a good feature. They will want everyone to know so it's not used against their cards as a selling point for AMD. This has not really happened with Pascal.

They'll also make sure to push for its inclusion in any game development they're involved with, as it can then be used to highlight why all the Pascal owners now need to move to Volta :D

EDIT: As you sort of mentioned at the end.
 
You stopped reading there? Jesus man, why does anybody waste their time talking to you about anything if you're not going to read a comment to get the full context?

Preemption's biggest usefulness is for timewarp in VR, as the article you linked even states. This is not the same thing as utilizing async compute shading, which is generally what people are referring to here.

And 'for starters'? Nah man, that's ALL there is to it. You named everything, so don't go acting like this was just the beginning of your argument. It is the beginning, middle and end of any 'hardware support' argument for this.

At the end of the day, AMD is the only one with *proper, full scale* support for this. Their compute engine system is literally built for this kind of thing, no overhead required, and as you even admit, is a preferable solution for it. There are numerous comments from developers who back this up.
Async shaders = marketing term created by AMD, so of course AMD are the only ones with "proper, full scale" support. Why don't you spend 10 minutes reading the Pascal whitepaper instead of posting things you read online that just aren't true?

You are absolutely right that preemption is mostly useful for timewarp though.

From the whitepaper:

Two scenarios:

"These asynchronous workloads create two new scenarios for the GPU architecture to consider.

The first scenario involves overlapping workloads. Certain types of workloads do not fill the GPU completely by themselves. In these cases there is a performance opportunity to run two workloads at the same time, sharing the GPU and running more efficiently—for example a PhysX workload running concurrently with graphics rendering.

For overlapping workloads, Pascal introduces support for “dynamic load balancing.” In Maxwell generation GPUs, overlapping workloads were implemented with static partitioning of the GPU into a subset that runs graphics, and a subset that runs compute. This is efficient provided that the balance of work between the two loads roughly matches the partitioning ratio. However, if the compute workload takes longer than the graphics workload, and both need to complete before new work can be done, then the portion of the GPU configured to run graphics will go idle. This can cause reduced performance that may exceed any performance benefit that would have been provided from running the workloads overlapped. Hardware dynamic load balancing addresses this issue by allowing either workload to fill the rest of the machine if idle resources are available."


Second scenario:


"Time critical workloads are the second important asynchronous compute scenario. For example, an asynchronous timewarp operation must complete before scanout starts or a frame will be dropped. In this scenario, the GPU needs to support very fast and low latency preemption to move the less critical workload off of the GPU so that the more critical workload can run as soon as possible.​

As a single rendering command from a game engine can potentially contain hundreds of draw calls, with each draw call containing hundreds of triangles, and each triangle containing hundreds of pixels that have to be shaded and rendered. A traditional GPU implementation that implements preemption at a high level in the graphics pipeline would have to complete all of this work before switching tasks, resulting in a potentially very long delay.

To address this issue, Pascal is the first GPU architecture to implement Pixel Level Preemption. The graphics units of Pascal have been enhanced to keep track of their intermediate progress on rendering work, so that when preemption is requested, they can stop where they are, save off context information about where to start up again later, and preempt quickly. The illustration below shows a preemption request being executed."
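
To give a feel for what preemption granularity means for latency, here's a small toy simulation (Python, with made-up workload numbers; it isn't Nvidia's scheduler, just an illustration of stopping at draw-call boundaries versus pixel boundaries):

```python
# Toy illustration of preemption granularity (invented numbers, not real hardware).
# A "draw" is a batch of pixels. Draw-level preemption has to finish the whole
# in-flight draw before switching; pixel-level only has to finish the current pixel.

PIXELS_PER_DRAW = 100_000
US_PER_PIXEL = 0.001  # 1 nanosecond per pixel, a made-up figure

def preemption_wait_us(pixels_done: int, granularity: str) -> float:
    """Rough wait from 'please preempt' until the unit is free
    (context-save time is ignored to keep the sketch simple)."""
    if granularity == "draw":
        # Every remaining pixel of the current draw call must be finished first.
        return (PIXELS_PER_DRAW - pixels_done) * US_PER_PIXEL
    if granularity == "pixel":
        # Only the pixel currently in flight has to finish.
        return US_PER_PIXEL
    raise ValueError(granularity)

# Preemption request arrives when the current draw is only 10% complete.
done = PIXELS_PER_DRAW // 10
print(f"draw-level  wait: {preemption_wait_us(done, 'draw'):8.3f} us")
print(f"pixel-level wait: {preemption_wait_us(done, 'pixel'):8.3f} us")
```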
 
They'll also make sure to push for its inclusion in any game development they're involved with, as it can then be used to highlight why all the Pascal owners now need to move to Volta :D

YAY! AMD might be getting async used after all, people :D

Unless they block it when their cards are detected :p

'If AMD card detected - async = **** the hell no!'
 
Async shaders = marketing term created by AMD, so of course AMD are the only ones with "proper, full scale" support. Why don't you spend 10 minutes reading the Pascal whitepaper instead of posting things you read online that just aren't true?

You are absolutely right that preemption is mostly useful for timewarp though.

Well no, 'async shaders' is absolutely not a marketing term. The compute engine handles the shading 'computing' where applicable. This is literally one of its biggest tasks in terms of modern graphics rendering.

And doing it 'asynchronously' means being able to do that while simultaneously doing other graphics tasks on other parts of the GPU (whereas normally, the command queue has everything lined up sequentially). Unlike Nvidia's solution, where this can only be applied when there are 'idle resources available', this setup can run at any time, given the programming is done to facilitate it. It's a fairly significant difference and, depending on the application of course, can ultimately provide much better efficiency/performance.
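
A rough way to picture the difference being described, as a toy timeline (Python, invented pass lengths; it only contrasts a single in-order queue with idealised graphics/compute overlap, not either vendor's actual scheduler):

```python
# Toy frame timeline with invented durations in milliseconds.
graphics_passes = [3.0, 2.0, 1.5]  # e.g. shadows, geometry, lighting
compute_passes = [1.0, 0.8]        # e.g. ambient occlusion, particle sim

# Single in-order command queue: everything runs back to back.
serial_ms = sum(graphics_passes) + sum(compute_passes)

# Idealised async compute: compute overlaps with graphics, so the frame is
# limited by the longer of the two streams (assumes perfect overlap and enough
# spare execution units, which real GPUs only sometimes have).
overlapped_ms = max(sum(graphics_passes), sum(compute_passes))

print(f"serial queue    : {serial_ms:.1f} ms")
print(f"perfect overlap : {overlapped_ms:.1f} ms")
print(f"best-case saving: {serial_ms - overlapped_ms:.1f} ms")
```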
 
They'll also make sure to push for its inclusion in any game development they're involved with, as it can then be used to highlight why all the Pascal owners now need to move to Volta :D

EDIT: As you sort of mentioned at the end.

Maybe.

Right now Broadwell-E and Ryzen owners would get more out of their CPU-bound performance with Vega, or CF 580s :D

That's assuming Vega is at least as fast as a 1080; no guarantees of that.
 
Well no, 'async shaders' is absolutely not a marketing term. The compute engine handles the shading 'computing' where applicable. This is literally one of its biggest tasks in terms of modern graphics rendering.

And doing it 'asynchronously' means being able to do that while simultaneously doing other graphics tasks on other parts of the GPU (whereas normally, the command queue has everything lined up sequentially). Unlike Nvidia's solution, where this can only be applied when there are 'idle resources available', this setup can run at any time, given the programming is done to facilitate it. It's a fairly significant difference and, depending on the application of course, can ultimately provide much better efficiency/performance.

No use in discussing this further as you seem to have no understanding of how this stuff works nor any interest in actually learning something. I mean, how the hell are you supposed to be running something on resources that are already busy doing something else? Defies all logic. AMD needs 'idle' resources to do async compute just like Nvidia. Async shaders = AMD's implementation and nothing to do with async compute.

Here's yet another thread discussing and explaining everything for those that just refuse to grasp the concept: https://www.reddit.com/r/nvidia/comments/4mn0e3/can_someone_help_me_understand_the_difference/
 
I feel like Raj should probably stick with engineering and stop putting himself forward as the PR face of the company.

He regularly says some nonsense or overhypes products. This comment makes no sense - it's good for Vega that new content is bringing it to its knees? Naw, that doesn't make Vega look good. It makes it sound like Vega is struggling with it.

Besides, anybody can write a program that brings a GPU down to its knees. That's not an inherently good thing. It all depends on the actual level of visuals/experience achieved.
 
No use in discussing this further as you seem to have no understanding of how this stuff works nor any interest in actually learning something. I mean, how the hell are you supposed to be running something on resources that are already busy doing something else? Defies all logic. AMD needs 'idle' resources to do async compute just like Nvidia. Async shaders = AMD's implementation and nothing to do with async compute.
The point is that you don't have to rely on a single command queue to get everything done. All GPUs have 'idle resources' in this scenario because only one bit of the GPU can be doing anything at one time (in terms of executed processes). Asynchronous compute changes this by allowing more parts of the GPU to be doing things at the same time instead of just waiting for their turn.
 
The point is that you don't have to rely on a single command queue to get everything done. All GPUs have 'idle resources' in this scenario because only one bit of the GPU can be doing anything at one time (in terms of executed processes). Asynchronous compute changes this by allowing more parts of the GPU to be doing things at the same time instead of just waiting for their turn.
Exactly, and Nvidia and AMD have taken different approaches, yet both support running graphics and compute tasks in parallel. AMD supports fast context switching, which allows them to run tasks on the same CUs concurrently, reducing idle time, while Nvidia's hardware doesn't (not necessary, as there is little holding the GPUs back, and implementing such a thing would only hurt performance overall).

AMD benefits from async due to hardware inefficiencies, while Nvidia doesn't have any such problems. Check out AMD's abysmal geometry performance, for instance: https://hardforum.com/attachments/screenshot-www-pcgameshardware-de-2016-05-25-19-02-59-png.3726/
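
As a back-of-envelope way to see why the two arguments aren't incompatible, here's a toy model (Python, placeholder numbers): the more shader time a GPU leaves idle during pure graphics work, the more there is for async compute to fill.

```python
# Back-of-envelope model: async compute can only "hide" work in capacity that
# graphics would otherwise leave idle. All numbers are placeholders.

def ideal_async_speedup(idle_fraction: float, compute_share: float) -> float:
    """idle_fraction : share of shader capacity left idle by graphics alone
                       (e.g. shaders stalled behind geometry/fixed-function work).
       compute_share : share of the frame's work that is async-friendly compute.
       Returns an idealised frame-time speedup from overlapping the two."""
    hidden = min(compute_share, idle_fraction)
    return 1.0 / (1.0 - hidden)

# Same 20% of async-friendly compute work per frame:
print(f"GPU leaving 25% idle: {ideal_async_speedup(0.25, 0.20):.2f}x")
print(f"GPU leaving  5% idle: {ideal_async_speedup(0.05, 0.20):.2f}x")
```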
 
Just checking in as I do periodically: do we know anything about this yet, or are we still just arguing about nothing again?

Look in again at the beginning of May and hopefully we'll either be counting down the days or we'll know it's not coming in May at which point you should pop in at the beginning of June and hopefully we'll either be counting down the days or we'll know that it's not coming in June at which point you should pop in at the beginning of July and hopefully we'll either be counting down the days or we'll know it's not coming in July at which point you should pop in at the beginning of August and hopefully we'll either be counting down the days or we'll know it's not coming in August at which point you should pop in at the beginning of September yada yada yada...
 