AMD VEGA confirmed for 2017 H1

Stopped reading there, because again you are wrong.
Nvidia added hardware support for pixel-level preemption and dynamic load balancing, for starters:
https://www.bit-tech.net/hardware/graphics/2016/06/15/evga-geforce-gtx-1080-ftw-review/6
You stopped reading there? Jesus man, why does anybody waste their time talking to you about anything if you're not going to read a comment to get the full context?

Preemption's biggest usefulness is for timewarp in VR, as the article you linked even states. This is not the same thing as utilizing async compute shading, which is generally what people are referring to here. Though it does have uses for certain resolution compositing techniques (which is still different from using async compute shaders).

And 'for starters'? Nah man, that's ALL there is to it. You named everything, so don't go acting like this was just the beginning of your argument. It is the beginning, middle and end of any 'hardware support' argument for this.

At the end of the day, AMD is the only one with *proper, full scale* support for this. Their compute engine system is literally built for this kind of thing, no overhead required, and as you even admit, is a preferable solution for it. There are numerous comments from developers who back this up.
 
We will know when Nvidia can do it properly, as they will shout it from the hills as if it's the second coming of Christ, and so they should, as it's a good feature. They will want everyone to know so it's not used against their cards as a selling point for AMD. This has not really happened with Pascal.
 
It would be great if you actually read the post before coming out with incorrect claims, because that is not at all what it says:

"On Maxwell what would happen is Task A is assigned to 8 SMs such that execution time is 1.25ms and the FFU does not stall the SMs at all. Simple, right? However we now have 20% of our SMs going unused.
So we assign task B to those 2 SMs which will complete it in 1.5ms, in parallel with Task A's execution on the other 8 SMs.


Here is the problem: when Task A completes, Task B will still have 0.25ms to go, and on Maxwell there's no way of reassigning those 8 SMs before Task B completes. Partitioning of resources is static (unchanging) and happens at the draw call boundary, controlled by the driver.

So if the driver estimates the execution times of Tasks A and B incorrectly, the partitioning of execution units between them will lead to idle time as outlined above.

Pascal solves this problem with 'dynamic load balancing'; the 8 SMs assigned to A can be reassigned to other tasks while Task B is still running, thus saturating the SMs and improving utilization.

For some reason many people have decided that Pascal uses preemption instead of async compute.

This makes no sense at all. Preemption is the act of telling a unit to halt execution of its running task. Preemption latency measures the time between the halt command being issued and the unit being ready for another assignment."
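
To make the idle-time arithmetic in that quote concrete, here's a minimal toy sketch (plain Python, using the 1.25 ms / 1.5 ms figures from the example above; it is not a model of the real hardware scheduler):

```python
# Toy model of the static-partitioning example quoted above (not real hardware).
# 10 SMs total; Task A runs on 8 SMs for 1.25 ms, Task B on 2 SMs for 1.5 ms.
TOTAL_SMS = 10
TASK_A = {"sms": 8, "ms": 1.25}
TASK_B = {"sms": 2, "ms": 1.50}

# Static partitioning (the Maxwell case in the quote): neither partition can be
# reassigned until both tasks finish, so the frame takes as long as the slower task.
frame_ms = max(TASK_A["ms"], TASK_B["ms"])
busy_sm_ms = TASK_A["sms"] * TASK_A["ms"] + TASK_B["sms"] * TASK_B["ms"]
idle_sm_ms = TOTAL_SMS * frame_ms - busy_sm_ms
print(f"static partitioning: {frame_ms:.2f} ms, {idle_sm_ms:.2f} SM-ms idle")
# -> 1.50 ms, with 2.00 SM-ms idle: Task A's 8 SMs sit unused for the final
#    0.25 ms, exactly the situation the quote describes.

# Dynamic load balancing (the Pascal case, idealised): once Task A finishes,
# its 8 SMs can pick up Task B's remaining work instead of idling.
remaining_b_sm_ms = TASK_B["sms"] * (TASK_B["ms"] - TASK_A["ms"])  # 0.5 SM-ms left
frame_ms_dlb = TASK_A["ms"] + remaining_b_sm_ms / TOTAL_SMS
print(f"dynamic load balancing (ideal): {frame_ms_dlb:.2f} ms")
# -> 1.30 ms in this idealised case.
```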

You couldn't be more wrong here.


I did read it. It's always got SMs in reserve to load priority queues simultaneously; it's still pre-empting and prioritising command queues. It's not 'preemption in the traditional sense', but it's preemption no less. What he is not telling you is that there is still latency and a CPU overhead, as the software needs to organise calls before executing them and the CPU runs those calculations. It's also still very limited, because it's still issuing draws on one command thread; this is why Nvidia can't deal with multi-threaded draw calls like AMD can, and that's another thing he fails to mention.

The whole thing is irrelevant anyway; it's a semantics argument. Pascal still can't deal with high-core-count CPUs like Broadwell-E and Ryzen; it's bottlenecked by how well four cores perform, and that's the limit.

Not with AMD; they have 4x as much room to grow.

If you own a Broadwell-E or Ryzen CPU you're better off with a Vega GPU, you'll get more out of it.
 
We will know when Nvidia can do it properly, as they will shout it from the hills as if it's the second coming of Christ, and so they should, as it's a good feature. They will want everyone to know so it's not used against their cards as a selling point for AMD. This has not really happened with Pascal.

They'll also make sure to push for its inclusion in any game development they're involved with, as it can then be used to highlight why all the Pascal owners now need to move to Volta :D

EDIT: As you sort of mentioned at the end.
 
You stopped reading there? Jesus man, why does anybody waste their time talking to you about anything if you're not going to read a comment to get the full context?

Preemption's biggest usefulness is for timewarp in VR, as the article you linked even states. This is not the same thing as utilizing async compute shading, which is generally what people are referring to here.

And 'for starters'? Nah man, that's ALL there is to it. You named everything, so don't go acting like this was just the beginning of your argument. It is the beginning, middle and end of any 'hardware support' argument for this.

At the end of the day, AMD is the only one with *proper, full scale* support for this. Their compute engine system is literally built for this kind of thing, no overhead required, and as you even admit, is a preferable solution for it. There are numerous comments from developers who back this up.
Async shaders = marketing term created by AMD, so of course AMD are the only ones with "proper, full scale" support. Why don't you spend 10 minutes reading the Pascal whitepaper instead of posting things you read online that just aren't true?

You are absolutely right that preemption is mostly useful for timewarp though.

From the whitepaper:

Two scenarios:

"These asynchronous workloads create two new scenarios for the GPU architecture to consider.

The first scenario involves overlapping workloads. Certain types of workloads do not fill the GPU completely by themselves. In these cases there is a performance opportunity to run two workloads at the same time, sharing the GPU and running more efficiently—for example a PhysX workload running concurrently with graphics rendering.

For overlapping workloads, Pascal introduces support for “dynamic load balancing.” In Maxwell generation GPUs, overlapping workloads were implemented with static partitioning of the GPU into a subset that runs graphics, and a subset that runs compute. This is efficient provided that the balance of work between the two loads roughly matches the partitioning ratio. However, if the compute workload takes longer than the graphics workload, and both need to complete before new work can be done, then the portion of the GPU configured to run graphics will go idle. This can cause reduced performance that may exceed any performance benefit that would have been provided from running the workloads overlapped. Hardware dynamic load balancing addresses this issue by allowing either workload to fill the rest of the machine if idle resources are available."


Second scenario:


"Time critical workloads are the second important asynchronous compute scenario. For example, an asynchronous timewarp operation must complete before scanout starts or a frame will be dropped. In this scenario, the GPU needs to support very fast and low latency preemption to move the less critical workload off of the GPU so that the more critical workload can run as soon as possible.​

As a single rendering command from a game engine can potentially contain hundreds of draw calls, with each draw call containing hundreds of triangles, and each triangle containing hundreds of pixels that have to be shaded and rendered. A traditional GPU implementation that implements preemption at a high level in the graphics pipeline would have to complete all of this work before switching tasks, resulting in a potentially very long delay.

To address this issue, Pascal is the first GPU architecture to implement Pixel Level Preemption. The graphics units of Pascal have been enhanced to keep track of their intermediate progress on rendering work, so that when preemption is requested, they can stop where they are, save off context information about where to start up again later, and preempt quickly. The illustration below shows a preemption request being executed."
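
To give a feel for what preemption granularity means for latency, here's a small toy simulation (Python, with made-up workload numbers; it isn't Nvidia's scheduler, just an illustration of stopping at draw-call boundaries versus pixel boundaries):

```python
# Toy illustration of preemption granularity (invented numbers, not real hardware).
# A "draw" is a batch of pixels. Draw-level preemption has to finish the whole
# in-flight draw before switching; pixel-level only has to finish the current pixel.

PIXELS_PER_DRAW = 100_000
US_PER_PIXEL = 0.001  # 1 nanosecond per pixel, a made-up figure

def preemption_wait_us(pixels_done: int, granularity: str) -> float:
    """Rough wait from 'please preempt' until the unit is free
    (context-save time is ignored to keep the sketch simple)."""
    if granularity == "draw":
        # Every remaining pixel of the current draw call must be finished first.
        return (PIXELS_PER_DRAW - pixels_done) * US_PER_PIXEL
    if granularity == "pixel":
        # Only the pixel currently in flight has to finish.
        return US_PER_PIXEL
    raise ValueError(granularity)

# Preemption request arrives when the current draw is only 10% complete.
done = PIXELS_PER_DRAW // 10
print(f"draw-level  wait: {preemption_wait_us(done, 'draw'):8.3f} us")
print(f"pixel-level wait: {preemption_wait_us(done, 'pixel'):8.3f} us")
```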
 
They'll also make sure to push for its inclusion in any game development they're involved with, as it can then be used to highlight why all the Pascal owners now need to move to Volta :D

YAY! AMD might be getting async used after all, people :D

Unless they block it when their cards are detected :p

'If AMD card detected - async = **** the hell no!'
 
Async shaders = marketing term created by AMD, so of course AMD are the only ones with "proper, full scale" support. Why don't you spend 10 minutes reading the Pascal whitepaper instead of posting things you read online that just aren't true?

You are absolutely right that preemption is mostly useful for timewarp though.

Well no, 'async shaders' is absolutely not a marketing term. The compute engine handles the shading 'computing' where applicable. This is literally one of its biggest tasks in terms of modern graphics rendering.

And doing it 'asynchronously' means being able to do that while simultaneously doing other graphics tasks on other parts of the GPU (whereas normally, the command queue has everything lined up sequentially). Unlike Nvidia's solution, where this can only be applied when there are 'idle resources available', this setup can run at any time, given the programming is done to facilitate it. It's a fairly significant difference and, depending on the application of course, can ultimately provide much better efficiency/performance.
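
A rough way to picture the difference being described, as a toy timeline (Python, invented pass lengths; it only contrasts a single in-order queue with idealised graphics/compute overlap, not either vendor's actual scheduler):

```python
# Toy frame timeline with invented durations in milliseconds.
graphics_passes = [3.0, 2.0, 1.5]  # e.g. shadows, geometry, lighting
compute_passes = [1.0, 0.8]        # e.g. ambient occlusion, particle sim

# Single in-order command queue: everything runs back to back.
serial_ms = sum(graphics_passes) + sum(compute_passes)

# Idealised async compute: compute overlaps with graphics, so the frame is
# limited by the longer of the two streams (assumes perfect overlap and enough
# spare execution units, which real GPUs only sometimes have).
overlapped_ms = max(sum(graphics_passes), sum(compute_passes))

print(f"serial queue    : {serial_ms:.1f} ms")
print(f"perfect overlap : {overlapped_ms:.1f} ms")
print(f"best-case saving: {serial_ms - overlapped_ms:.1f} ms")
```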
 
They'll also make sure to push for its inclusion in any game development they're involved with, as it can then be used to highlight why all the Pascal owners now need to move to Volta :D

EDIT: As you sort of mentioned at the end.

Maybe.

Right now Broadwell-E and Ryzen owners would get more out of their CPU-bound performance with Vega, or CF 580s :D

That's assuming Vega is at least as fast as a 1080; no guarantees of that.
 
Well no, 'async shaders' is absolutely not a marketing term. The compute engine handles the shading 'computing' where applicable. This is literally one of its biggest tasks in terms of modern graphics rendering.

And doing it 'asynchronously' means being able to do that while simultaneously doing other graphics tasks on other parts of the GPU (whereas normally, the command queue has everything lined up sequentially). Unlike Nvidia's solution, where this can only be applied when there are 'idle resources available', this setup can run at any time, given the programming is done to facilitate it. It's a fairly significant difference and, depending on the application of course, can ultimately provide much better efficiency/performance.

No use in discussing this further as you seem to have no understanding of how this stuff works nor any interest in actually learning something. I mean, how the hell are you supposed to be running something on resources that are already busy doing something else? Defies all logic. AMD needs 'idle' resources to do async compute just like Nvidia. Async shaders = AMD's implementation and nothing to do with async compute.

Here's yet another thread discussing and explaining everything for those that just refuse to grasp the concept: https://www.reddit.com/r/nvidia/comments/4mn0e3/can_someone_help_me_understand_the_difference/
 
I feel like Raj should probably stick with engineering and stop putting himself forward as the PR face of the company.

He regularly says some nonsense or overhypes products. This comment makes no sense - it's good for Vega that new content is bringing it to its knees? Naw, that doesn't make Vega look good. It makes it sound like Vega is struggling with it.

Besides, anybody can write a program that brings a GPU down to its knees. That's not an inherently good thing. It all depends on the actual level of visuals/experience achieved.
 
No use in discussing this further as you seem to have no understanding of how this stuff works nor any interest in actually learning something. I mean, how the hell are you supposed to be running something on resources that are already busy doing something else? Defies all logic. AMD needs 'idle' resources to do async compute just like Nvidia. Async shaders = AMD's implementation and nothing to do with async compute.
The point is that you don't have to rely on a single command queue to get everything done. All GPUs have 'idle resources' in this scenario because only one bit of the GPU can be doing anything at one time (in terms of executed processes). Asynchronous compute changes this by allowing more parts of the GPU to be doing things at the same time instead of just waiting for their turn.
 
The point is that you don't have to rely on a single command queue to get everything done. All GPUs have 'idle resources' in this scenario because only one bit of the GPU can be doing anything at one time (in terms of executed processes). Asynchronous compute changes this by allowing more parts of the GPU to be doing things at the same time instead of just waiting for their turn.
Exactly, and Nvidia and AMD have taken different approaches, yet both support running graphics and compute tasks in parallel. AMD supports fast context switching, which allows them to run tasks on the same CUs concurrently, reducing idle time, while Nvidia's hardware doesn't (not necessary, as there is little holding the GPUs back, and implementing such a thing would only hurt performance overall).

AMD benefits from async due to hardware inefficiencies, while Nvidia doesn't have any such problems. Check out AMD's abysmal geometry performance, for instance: https://hardforum.com/attachments/screenshot-www-pcgameshardware-de-2016-05-25-19-02-59-png.3726/
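
As a back-of-envelope way to see why the two arguments aren't incompatible, here's a toy model (Python, placeholder numbers): the more shader time a GPU leaves idle during pure graphics work, the more there is for async compute to fill.

```python
# Back-of-envelope model: async compute can only "hide" work in capacity that
# graphics would otherwise leave idle. All numbers are placeholders.

def ideal_async_speedup(idle_fraction: float, compute_share: float) -> float:
    """idle_fraction : share of shader capacity left idle by graphics alone
                       (e.g. shaders stalled behind geometry/fixed-function work).
       compute_share : share of the frame's work that is async-friendly compute.
       Returns an idealised frame-time speedup from overlapping the two."""
    hidden = min(compute_share, idle_fraction)
    return 1.0 / (1.0 - hidden)

# Same 20% of async-friendly compute work per frame:
print(f"GPU leaving 25% idle: {ideal_async_speedup(0.25, 0.20):.2f}x")
print(f"GPU leaving  5% idle: {ideal_async_speedup(0.05, 0.20):.2f}x")
```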
 
Just checking in as I do periodically: do we know anything about this yet, or are we still just arguing about nothing again?

Look in again at the beginning of May and hopefully we'll either be counting down the days or we'll know it's not coming in May at which point you should pop in at the beginning of June and hopefully we'll either be counting down the days or we'll know that it's not coming in June at which point you should pop in at the beginning of July and hopefully we'll either be counting down the days or we'll know it's not coming in July at which point you should pop in at the beginning of August and hopefully we'll either be counting down the days or we'll know it's not coming in August at which point you should pop in at the beginning of September yada yada yada...
 