
AMD VEGA confirmed for 2017 H1

AMD really need to start making noises with regards to Vega. Building up some momentum and excitement prior to release.

This will not only increase sales but stop people going Nvidia in the interim.

Or more likely it'll simply open the regulator on the hype train full steam ahead only to derail, crash and burn at some point as people's ridiculous expectations aren't met... and it won't sway green people anyway unless they pull off another Ryzen.
 
Nvidia do not have the hardware to properly implement async compute. This is not a case of them just 'not wanting' to do something. These architectures are designed many years ahead of time and they can't just go and throw async compute-capable engines onto their GPUs late in the process.
That is not true at all and shows a big misunderstanding of what async compute is. You are probably confused by the fact that Nvidia moved from a fixed hardware scheduler in Fermi to a mixed hardware-software scheduler in Maxwell and misunderstood that to mean that Nvidia removed scheduling hardware. But moving the scheduler into the driver can actually enable much greater flexibility and the ability to optimize at a much coarser level with much more sophisticated optimization. That isn't software emulation or any such nonsense; it's a design decision that can improve performance and functionality. Maxwell's async performance suffered for other reasons, such as the granularity of its preemption and static compute partitioning. Much of this was rectified in Pascal, which is shown in benchmarks:

https://www.pcper.com/reviews/Graph...Looking-DX12-Asynchronous-Compute-Performance
GTX1080 gets a 6.8% performance improvement with async
RX480 gets an 8.5% improvement.


It would be interesting to see results with the latest Nvidia drivers and how the 1080 Ti does, but I would be expecting at least a 10% performance improvement from enabling async.
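
For anyone who hasn't seen it, here's roughly what "using async compute" looks like from the application side under D3D12: you just create a second, compute-type command queue next to the normal direct queue and submit work to both. How much of that actually overlaps on the GPU is down to the hardware and driver, which is exactly why the gains differ between architectures. A rough sketch (D3D12 on Windows, all error handling and actual work submission left out):

```cpp
// Minimal sketch: create a graphics (direct) queue and a separate compute
// queue on the default adapter. "Async compute" is simply work submitted on
// the compute queue while the direct queue is busy; how much actually runs
// concurrently is decided by the hardware and driver.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <cstdio>
#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return 1;

    // Graphics/direct queue: accepts draw + compute + copy work.
    D3D12_COMMAND_QUEUE_DESC directDesc = {};
    directDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> directQueue;
    device->CreateCommandQueue(&directDesc, IID_PPV_ARGS(&directQueue));

    // Dedicated compute queue: command lists recorded as
    // D3D12_COMMAND_LIST_TYPE_COMPUTE go here and *may* overlap with
    // graphics work from the direct queue.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    std::printf("Created direct + compute queues; overlap is up to the GPU/driver.\n");
    return 0;
}
```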


EDIT:
After some googling, here are some Gears of War numbers:
http://www.tweaktown.com/tweakipedia/113/gears-war-tested-dx12-gtx-1080-rx-480-more/index.html



The Titan Pascal (the first iteration, stupid Nvidia naming) gets a 10% performance increase with async on at 1080. The RX 480 loses 2% performance. The Fury X gains 3.8%. One might be very tempted to say from such results that it is AMD that doesn't have a very good async compute solution; certainly if the shoe were on the other foot such results would be used to say company X can't do async! The reality is that async compute is very complex, is never guaranteed to improve performance, and is highly dependent on the hardware and software. There is a good reason why even in Ashes of the Singularity async compute was removed for a lot of AMD hardware: it degraded performance and was hard to maintain.


EDIT2: And before someone starts some childish name calling: no, my opinion is that Polaris does have a stronger async design than Pascal, but I don't think it is relevant in the slightest. The only metric a user should worry about is performance. Async compute can help improve performance when the card is getting bottlenecked and not used to full capacity. With a very efficient and well-balanced GPU, async compute has less value at the current time, and Nvidia cards are known to be extremely efficient.
 
DX12 is not a good measure of A-Sync given that it doesn't a

That is not true at all and shows a big misunderstanding of what async compute is. You are probably confused by the fact that Nvidia moved from a fixed hardware scheduler in Fermi to a mixed hardware-software scheduler in Maxwell and misunderstood that to mean that Nvidia removed scheduling hardware. But moving the scheduler into the driver can actually enable much greater flexibility and the ability to optimize at a much coarser level with much more sophisticated optimization. That isn't software emulation or any such nonsense; it's a design decision that can improve performance and functionality. Maxwell's async performance suffered for other reasons, such as the granularity of its preemption and static compute partitioning. Much of this was rectified in Pascal, which is shown in benchmarks:

https://www.pcper.com/reviews/Graph...Looking-DX12-Asynchronous-Compute-Performance
GTX1080 gets a 6.8% performance improvement with async
RX480 gets an 8.5% improvement.


It would be interesting to see results with the latest Nvidia drivers and how the 1080 Ti does, but I would be expecting at least a 10% performance improvement from enabling async.

Nvidia's 'software solution' is limited to fewer threads than AMD's 'hardware solution', and there is also a CPU overhead cost with Nvidia's software async; it's why Nvidia's CPU-bound performance doesn't scale with more than 4 cores, whereas AMD's does.

It's why Nvidia are so bottlenecked on Intel and AMD lower-MHz 6- and 8-core CPUs vs higher-MHz 4-core CPUs. AMD will make use of the extra threads; with Nvidia they sit idle, as Nvidia's async can't make use of them.
 
That is not true at all and shows a big misunderstanding of what async compute is.
It is essentially true. I'm simplifying things a bit for the sake of not writing out a ton on it, but your comments are not really disproving my point. I realize async compute can work on Pascal, but it doesn't have specific hardware support for it. As you say, they've moved the functionality largely to a driver-level function which is not going to be nearly as efficient.

And yes, async compute programming is complex and can differ from title to title and whatnot, but AMD's cards, with specific hardware support for it, should generally be better able to achieve gains here with less overhead/effort. And with consoles being based on AMD's GPU architecture, I would expect more async compute effort to be optimized towards their specific hardware solution *generally*.

I'm also not claiming this is a massive deal, but I would suggest you're trying to downplay it as much as possible if you say it's not 'at all' relevant. Truth is in the middle and I imagine it will *slowly* become more and more relevant.
 
It is essentially true. I'm simplifying things a bit for the sake of not writing out a ton on it, but your comments are not really disproving my point. I realize async compute can work on Pascal, but it doesn't have specific hardware support for it.

Stopped reading there because again you are wrong.
Nvidia added hardware support for pixel level preemption and dynamic load balancing for starters:
https://www.bit-tech.net/hardware/graphics/2016/06/15/evga-geforce-gtx-1080-ftw-review/6
 
It's why Nvidia are so bottlenecked on Intel and AMD lower-MHz 6- and 8-core CPUs vs higher-MHz 4-core CPUs. AMD will make use of the extra threads; with Nvidia they sit idle, as Nvidia's async can't make use of them.

It's more complicated than that, and probably why Ryzen struggles: its performance and ability to take advantage of additional cores is bound by tick rate and, to some extent, memory latency/bandwidth. To take advantage of threaded capabilities to the same extent you need increasingly fast tick-over and communication between the worker threads and the main scheduler/marshalling thread.
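
To make that concrete, here's a rough, generic sketch (plain C++, not any particular engine) of the pattern I mean: a main thread forks per-tick jobs to workers and has to join them all before the next tick can start, so the fork/join and communication cost is paid every single tick and caps how much extra cores can actually help.

```cpp
// Rough illustration (not any specific engine): a main "scheduler" thread
// forks work to worker threads each tick and joins them before the next
// tick. The fork/join cost is paid every tick, so a higher tick rate leaves
// proportionally less time for useful work per core, no matter how many
// cores are available.
#include <chrono>
#include <cstdio>
#include <future>
#include <vector>

static void simulate_entities(int begin, int end)
{
    volatile double x = 0.0;
    for (int i = begin; i < end; ++i)
        x += i * 0.5;                       // stand-in for per-entity work
}

int main()
{
    const int entities = 100000;
    const int workers  = 4;                 // worker threads per tick
    const int ticks    = 600;               // e.g. 10 s of a 60 Hz simulation

    auto start = std::chrono::steady_clock::now();
    for (int t = 0; t < ticks; ++t)
    {
        std::vector<std::future<void>> jobs;
        const int chunk = entities / workers;
        for (int w = 0; w < workers; ++w)   // fork: hand a slice to each worker
            jobs.push_back(std::async(std::launch::async, simulate_entities,
                                      w * chunk, (w + 1) * chunk));
        for (auto& j : jobs)                // join: the main thread must wait
            j.get();                        // before the next tick can start
    }
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - start).count();
    std::printf("%d ticks with %d workers took %lld ms\n",
                ticks, workers, static_cast<long long>(ms));
    return 0;
}
```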
 
DX12 is not a good measure of A-Sync given that it doesn't a



Nvidia's 'software solution' is limited to fewer threads than AMD's 'hardware solution', and there is also a CPU overhead cost with Nvidia's software async; it's why Nvidia's CPU-bound performance doesn't scale with more than 4 cores, whereas AMD's does.

It's why Nvidia are so bottlenecked on Intel and AMD lower-MHz 6- and 8-core CPUs vs higher-MHz 4-core CPUs. AMD will make use of the extra threads; with Nvidia they sit idle, as Nvidia's async can't make use of them.


Nvidia don't have a software solution, they have a hardware solution. It is impossible to do async compute in software; it would be utterly pointless. You still don't seem to understand the basics of what async compute is.


Nvidia driver overhead on different CPUs is completely irrelevant. Perhaps Nvidia's drivers only launch 4 threads, so performance won't scale with more cores. Considering most users only have 4 cores or fewer, it makes absolute sense to optimize performance for the most common users. Dividing work across more threads does not necessarily result in more performance; in fact it quite often results in lower performance if there are data races and mutex fighting. Your post stinks of that stupid Nvidia DX12 driver BS on Ryzen that was utterly disproven.
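
That point is trivial to demonstrate outside of any driver. A throwaway C++ toy (nothing to do with Nvidia or AMD internals, just the general effect): tiny work items behind one shared mutex, where adding threads mostly adds contention rather than throughput.

```cpp
// Toy demonstration of "more threads != more performance": each thread
// increments a shared counter behind a mutex. The work per item is tiny,
// so the run is dominated by lock contention, and adding threads can make
// it slower rather than faster. (Illustrative only.)
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

static long long run(int threads, long long total_increments)
{
    long long counter = 0;
    std::mutex m;
    auto worker = [&](long long n) {
        for (long long i = 0; i < n; ++i) {
            std::lock_guard<std::mutex> lock(m);   // every iteration contends
            ++counter;
        }
    };

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back(worker, total_increments / threads);
    for (auto& t : pool)
        t.join();
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               std::chrono::steady_clock::now() - start).count();
}

int main()
{
    const long long n = 4'000'000;
    for (int threads : {1, 2, 4, 8})
        std::printf("%d thread(s): %lld ms\n", threads, run(threads, n));
    return 0;
}
```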
 
It's more complicated than that, and probably why Ryzen struggles: its performance and ability to take advantage of additional cores is bound by tick rate and, to some extent, memory latency/bandwidth. To take advantage of threaded capabilities to the same extent you need increasingly fast tick-over and communication between the worker threads and the main scheduler/marshalling thread.

Watch this from here... https://youtu.be/0tfTZjugDeg?t=10m53s

Despite this, with all the microcode patches and so on, the IPC in games is actually roughly equal to Intel now, just as it always has been in productivity work.

The Ryzen chips are clocked about 20% lower than the 7700K, and with Nvidia GPUs that actually shows: the average review has the 7700K 20% ahead, apart from Tomb Raider, which is more like 40%.

AdoredTV picked up on this; it didn't make any sense to him, so he investigated it. While the reason for Tomb Raider's odd performance is a conundrum, what is obvious is that all reviewers are using Nvidia GPUs, for good reason. But if you can get enough AMD GPU power, like CrossFire 480s, what you find is that with AMD GPUs the much lower clocked Ryzen chips catch right up with the 7700K, even in Tomb Raider, the reason being that AMD's async compute makes use of the extra threads on Ryzen while Nvidia's doesn't.

With the 295X2 the 3GHz Ryzen 1700 is faster than the 4.2GHz 7700K, again in Tomb Raider.
https://www.youtube.com/watch?v=nLRCK7RfbUg&feature=youtu.be&t=7m47s

Again, with an AMD GPU (RX 480), 1800X vs 6900K, the 1800X is overall faster...

As it should be, with a slightly higher IPC than Broadwell-E.

Nvidia's async, while good in limiting APIs like DX11, is itself limiting the more powerful, higher thread count CPUs in DX12 and Vulkan.

[attached chart: Ryzen vs Intel gaming results with AMD GPUs]
 
Watch this from here... https://youtu.be/0tfTZjugDeg?t=10m53s

Despite this, with all the microcode patches and so on, the IPC in games is actually roughly equal to Intel now, just as it always has been in productivity work.

The Ryzen chips are clocked about 20% lower than the 7700K, and with Nvidia GPUs that actually shows: the average review has the 7700K 20% ahead, apart from Tomb Raider, which is more like 40%.

AdoredTV picked up on this; it didn't make any sense to him, so he investigated it. While the reason for Tomb Raider's odd performance is a conundrum, what is obvious is that all reviewers are using Nvidia GPUs, for good reason. But if you can get enough AMD GPU power, like CrossFire 480s, what you find is that with AMD GPUs the much lower clocked Ryzen chips catch right up with the 7700K, even in Tomb Raider, the reason being that AMD's async compute makes use of the extra threads on Ryzen while Nvidia's doesn't.

With the 295X2 the 3GHz Ryzen 1700 is faster than the 4.2GHz 7700K, again in Tomb Raider.
https://www.youtube.com/watch?v=nLRCK7RfbUg&feature=youtu.be&t=7m47s

Again, with an AMD GPU (RX 480), 1800X vs 6900K, the 1800X is overall faster...

As it should be, with a slightly higher IPC than Broadwell-E.

Nvidia's async, while good in limiting APIs like DX11, is itself limiting the more powerful, higher thread count CPUs in DX12 and Vulkan.

[attached chart: Ryzen vs Intel gaming results with AMD GPUs]
A couple of new games will answer the question of how close Ryzen is to Intel in gaming once and for all, starting with Bethesda's Prey, since AMD partnered with them; we are like 2 weeks away from the release.
 
Nvidia are just killing PC gaming, and all us ******* mugs are just letting 'em; we should all be ashamed :(

NVIDIA, THE CANCER OF PC GAMING!
Developers are making things worse for everyone tbh. The state of the gaming industry in terms of quality of games has nosedived; games are getting worse. Dumbed-down AI, X-ray vision in every game, cutscenes where you just press buttons rather than using skill...
 
A couple of new games will answer the question of how close Ryzen is to Intel in gaming once and for all, starting with Bethesda's Prey, since AMD partnered with them; we are like 2 weeks away from the release.

Very true :)

But look at that chart: Ryzen and Broadwell-E are close to each other, but the 1800X is clearly faster overall.

In every benchmark outside of gaming the 1800X was at least as fast as the 6900K; even in Intel's favourite benchmark, Cinebench, the 1800X was between the 6900K and the 10-core 6950X.

It never made any sense to me that Ryzen could be as fast clock-for-clock, core-for-core as Intel in everything but games. Slower exclusively in games, really?

That slide just shows that if you use an AMD GPU, Ryzen's performance in games matches its performance in productivity. Of course, when the CPU is being used to its full potential it would, wouldn't it?
 
I do agree with you halfway on what you said; my reply to loadsamoney was just that, tailored to loadsamoney, with a bit of sarcasm.
And I said pretty much what you said in another post, that it's mostly due to hardware and OS migration, but some companies can abuse that. Nvidia, for example, is taking one major feature of DX12, async compute, and they just don't want to implement it, mostly because AMD gains an edge over them, and they do just what you said: they follow up with multiple generations of GPUs even a couple of years after the feature was added to the API and some games and consoles already use it. It is really rare to see a GPU manufacturer drag their feet on new API features; usually they race to be the first to implement them, way before they start being used.
In 2 years we will get 70-80% DX12-capable hardware, with 20-30% async-capable, making it a tough decision for devs to implement instead of being automatic.

I have only read your post and some others where it's always Nvidia's fault, so I wanted to clear that up a bit.

As for async, it's just not true anymore that they didn't implement it or are blocking it. That was the way in Maxwell times, because Maxwell can't do it and their communication was damn ****y, instead of just admitting that Maxwell isn't able to do async. But Pascal's async implementation is enough for every developer out there to use it. It's worse than AMD's and they don't gain as much as AMD, but that's also to some degree because their shaders are easier to utilize fully. They implemented as much as was possible into the Maxwell architecture at that late time. AMD also didn't implement the DX12_1 features into Polaris, which would have been nice, but it seems there was no time for it. Pascal is gaining in 3DMark and in Sniper Elite 4: https://www.computerbase.de/2017-02/sniper-elite-4-benchmark/
And they even made GameWorks with async and are promoting 2x the speed with async on the 1080. 2x the speed sounds alright for a feature which they supposedly don't implement :)

But enough of Nvidia here, we need more Vega in this topic.

Only a bit more than a month left till Computex, where I would expect them to launch, but we still aren't getting any rumours.
 
Wait, are people still arguing about whether Pascal supports Async or not? If in doubt, please read this: https://www.reddit.com/r/nvidia/comments/50dqd5/demystifying_asynchronous_compute/
Should help clear things up (tl;dr: Pascal does async just fine).

Pretty sure AMD are only touting async because they don't support any other useful DX12 feature: https://en.wikipedia.org/wiki/Feature_levels_in_Direct3D
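
For anyone wanting to see where their own card actually sits on that chart, D3D12 will tell you directly. A rough sketch (Windows/D3D12, minimal error handling) that queries the maximum supported feature level plus the two headline 12_1 features, conservative rasterization and rasterizer-ordered views:

```cpp
// Query the maximum supported D3D12 feature level and two headline
// FL 12_1 features (conservative rasterization and rasterizer-ordered
// views) on the default adapter.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <cstdio>
#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return 1;

    const D3D_FEATURE_LEVEL requested[] = {
        D3D_FEATURE_LEVEL_11_0, D3D_FEATURE_LEVEL_11_1,
        D3D_FEATURE_LEVEL_12_0, D3D_FEATURE_LEVEL_12_1,
    };
    D3D12_FEATURE_DATA_FEATURE_LEVELS levels = {};
    levels.NumFeatureLevels = _countof(requested);
    levels.pFeatureLevelsRequested = requested;
    device->CheckFeatureSupport(D3D12_FEATURE_FEATURE_LEVELS,
                                &levels, sizeof(levels));

    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                &options, sizeof(options));

    std::printf("Max feature level: 0x%x\n", levels.MaxSupportedFeatureLevel);
    std::printf("Conservative rasterization tier: %d\n",
                options.ConservativeRasterizationTier);
    std::printf("Rasterizer-ordered views: %s\n",
                options.ROVsSupported ? "yes" : "no");
    return 0;
}
```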

lol, don't you just love it when people title a post as an 'understandable simplification of a subject' and then go on to write a 5,000-word academic essay? He even used an academic word to describe his thread, "Demystifying" :D
Anywho... all that Nvidia reddit post does is bamboozle people...

What that article goes on to say in about 2,000 words is that Nvidia use pre-emption and AMD use simultaneous hardware command queues.

Now, this short video here actually is a layman's explanation of asynchronous compute; in about 3 minutes it explains both parallel command queues and pre-emption.

 
lol, don't you just love it when people title a post as an 'understandable simplification of a subject' and then go on to write a 5,000-word academic essay? He even used an academic word to describe his thread, "Demystifying" :D
Anywho... all that Nvidia reddit post does is bamboozle people...

What that article goes on to say in about 2,000 words is that Nvidia use pre-emption and AMD use simultaneous hardware command queues.

Now, this short video here actually is a layman's explanation of asynchronous compute; in about 3 minutes it explains both parallel command queues and pre-emption.

It would be great if you actually read the post before making incorrect claims, because that is not at all what it says:

"On Maxwell what would happen is Task A is assigned to 8 SMs such that execution time is 1.25ms and the FFU does not stall the SMs at all. Simple, right? However we now have 20% of our SMs going unused.
So we assign task B to those 2 SMs which will complete it in 1.5ms, in parallel with Task A's execution on the other 8 SMs.


Here is the problem; when Task A completes, Task B will still have 0.25ms to go, and on Maxwell there's no way of reassigning those 8 SMs before Task B completes. Partitioning of resources is static (unchanging) and happens at the draw call boundary, controlled by the driver.

So if the driver estimates the execution times of Tasks A and B incorrectly, the partitioning of execution units between them will lead to idle time as outlined above.

Pascal solves this problem with 'dynamic load balancing'; the 8 SMs assigned to A can be reassigned to other tasks while Task B is still running, thus saturating the SMs and improving utilization.

For some reason many people have decided that Pascal uses preemption instead of async compute.

This makes no sense at all. Preemption is the act of telling a unit to halt execution of its running task. Preemption latency measures the time between the halt command being issued and the unit being ready for another assignment."

You couldn't be more wrong here.
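
The numbers in that quote are enough to do the arithmetic yourself: under static partitioning the frame is held to the slower task and the early-finishing SMs sit idle; dynamic load balancing lets them be reassigned. A quick sketch using only the quoted figures (8 SMs on Task A for 1.25 ms, 2 SMs on Task B for 1.5 ms):

```cpp
// Working through the quoted Maxwell example: Task A runs on 8 SMs for
// 1.25 ms, Task B runs on 2 SMs for 1.5 ms. With static partitioning the
// frame waits for the slower task while the early finishers idle; dynamic
// load balancing (Pascal) lets those SMs pick up other queued work.
#include <algorithm>
#include <cstdio>

int main()
{
    const double sms_a = 8.0, time_a = 1.25;   // ms, from the quoted example
    const double sms_b = 2.0, time_b = 1.50;   // ms
    const double total_sms = sms_a + sms_b;

    const double wall     = std::max(time_a, time_b);       // 1.5 ms
    const double busy     = sms_a * time_a + sms_b * time_b; // useful SM*ms
    const double capacity = total_sms * wall;                 // available SM*ms
    const double idle     = capacity - busy;                  // wasted SM*ms

    std::printf("Static partitioning: %.2f ms wall, %.1f SM*ms idle, "
                "%.1f%% utilisation\n", wall, idle, 100.0 * busy / capacity);
    std::printf("Dynamic load balancing: the %.0f SMs freed at %.2f ms can be "
                "reassigned, so utilisation approaches 100%% if more work is "
                "queued.\n", sms_a, time_a);
    return 0;
}
```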
 