The Official Nvidia GeForce 'Pascal' Thread - for general gossip and discussions

easyrider · 24 Mar 2016 at 23:29

JediFragger said:
Wouldn't surprise me tbh, Pascal has been speculated to be heavily based on Maxwell.

Drip...Drip...Drip...Feeeeeeeeeeeeeeeeeed

I'm hedging by bets and keeping my 980ti I think...

CAT-THE-FIFTH · 24 Mar 2016 at 23:30

LoadsaMoney said:
As do i, and if Pascal doesn't do a-sync like that above article suggests, i don't see them in trouble, as it will hardly get used, Nvidia have over 80% of the market, just about every game coming out will be a GameWorks, so no a-sync, a-sync won't make a jot of difference, its not the great white hope some are expecting, nothing will change, Nvidia are just too strong.

The problem is sadly the 55+ million consoles,by the time Pascal launches, which will be using it in all the big cross platform titles!

The fact that the next Total War uses it is a shock for me.

But AMD is more likely to delay and eff up the launch going by their track record.

andybird123 · 24 Mar 2016 at 23:31

easyrider said:
Drip...Drip...Drip...Feeeeeeeeeeeeeeeeeed

I'm hedging by bets and keeping my 980ti I think...

Hedging my bets is sold one, keeping one, though as Ive just found out EVE Valkyrie supports VR-SLI, i may end up regretting than in a week or twos time

LoadsaMoney · 24 Mar 2016 at 23:34

CAT-THE-FIFTH said:
The problem is sadly 55 million consoles which will be using it in all the big cross platform titles!

It'll only be used on the consoles, do you really think Nvidia are gona let game after game after game come onto the PC with a-sync if they can't do it, not a bloody chance, they had Dx10.1 ripped out, this'll be no different imo.

If they ever do a-sync, then the games will flood out, but not before.

JediFragger · 24 Mar 2016 at 23:38

LoadsaMoney said:
It'll only be used on the consoles, do you really think Nvidia are gona let game after game after game come onto the PC with a-sync if they can't do it, not a bloody chance, they had Dx10.1 ripped out, this'll be no different imo, besides, they'll all be their GameWorks as i said, so won't have it in anyway if they still don't do it.

Yup, nice guys come last (20%) in business

CAT-THE-FIFTH · 24 Mar 2016 at 23:43

LoadsaMoney said:
It'll only be used on the consoles, do you really think Nvidia are gona let game after game after game come onto the PC with a-sync if they can't do it, not a bloody chance, they had Dx10.1 ripped out, this'll be no different imo.

If they ever do a-sync, then the games will flood out, but not before.

It hasn't stopped a few games already having it though and even Nvidia won't be able to try and patch every game out there to stop it being used,as it will cost too much.

People overhype Gameworks too much - the effect is all in people's minds. No different than all the crap about PhysX a decade ago.

You either get a non-buggy or buggy Gameworks game - if you have buggy crap like the last Batman it runs terribad on all cards.

You only need to look at the Nvidia sponsored The Division - even with all the Gameworks nonsense and Nvidia game ready drivers,the AMD cards were doing really well especially under the GTX980TI.

Cards like the GTX960 were smashed to bits by a similar speed R9 380!!

ARK is the same on my GTX960 compared to what I saw with my mate's R9 280.

Look at the GTX970 against an R9 290 with a new lick of paint,ie,the R9 390.

I could not give a rats arse about Gameworks since it seems to make bug all difference now.

Look at all the DX12 games released so far - even TR and GoW run fine on them too,even though they had NV involvement for some of the effects.

You could see something similar even before Gameworks. Metro2033 was used by Nvidia to show how much better they were at tessellation - except the ATI cards ended up being better in many segments,which made me LOL.

Edit!!

Also remember Nvidia sucking at DX9 with the FX??

It didn't stop major titles like HL2 having it either,and the ATI cards running it better.

Nvidia sponsored more games back then too and still sold more than ATI too.

Plus Nvidia and DX10.1 didn't help them when the HD4000 series came along after the tepid HD2000 and HD3000 series,and no amount of PhysX/Gameworks 2008 really helped back then too.

bru · 25 Mar 2016 at 00:18

People keep saying that Maxwell and therefore Pascal cannot do Async, when it has been shown that it can, but only at cue depths of 32.

Remember this.

All these articles saying AMD will be ahead when using Async, well only if higher cue depths of Async are used.

I know the technology isn't the same nor are the results, but the discussion has loads of similarities to the tessellation/ over tessellation discussion's of recent years. if it is used to much it will be very biased to one side. Was tessellation over used, only in a handful of titles, much like Async, when titles arrive there will be a bit of a stink one way or the other and by the next refresh it will all be forgotten as both sets of architectures will be comparable again. ( except for whatever one side is ahead of the other in next time

)

drunkenmaster · 25 Mar 2016 at 01:03

Can Async be used for pointless effects at a performance cost, sure, can it be used for IQ improving effects, sure. Tessellation beyond a level can't improve IQ, having higher tessellation level hurts performance for absolutely no reason. Tessellation would be like a SINGLE effect that uses async compute and if that effect provides no IQ benefit, it can't remotely be compared to Async compute itself.

More to the point that graph doesn't show what you think it shows. The entire point of the graph is to show that running compute along with graphics doesn't increase the time taken on AMD. On Nvidia at EVERY cue depth it reduced performance. It's about compute taken X time, graphics taking Y time, and having compute + graphics not take X + Y + Z(overhead) but being X + Y or even less than X + Y.

Nvidia, compute + graphics = X + Y + Z... it takes a performance hit for switching.... that is not showing async working, the very point of async is to eliminate at the very least Z if not get less than X+Y by increasing efficiency of shader utilisation.

AMD is showing compute + graphics < X+Y.... which is the entire point of the graph. NO one forgot the graph, it's just that you have absolutely no understanding of what that graph is showing at all.

Orangey · 25 Mar 2016 at 01:52

I made a post detailing how NV might rush to market and then some hours later that Italian bloke steals it as "news"! :mad:

bru · 25 Mar 2016 at 02:02

bru said:
I know the technology isn't the same nor are the results, but the discussion has loads of similarities to the tessellation/ over tessellation discussion's of recent years.

As I just said I know the technology isn't the same and in GPU term the two techs are completely different.
But the discussion this forum is having has some similarities to the whole tessellation/over tessellation discussion that happen a while back, just this time it is the other way around with AMD having the tech performance advantage, if lots of Async is used.

Combat Fighter · 25 Mar 2016 at 09:48

andybird123 said:
If mid-range Pascal beats a 980ti, then obviously they will want to shift as many 980ti's as they can before hand

And I'll still be keeping my 980Ti.

I wont be falling for a mid range card marketed as high end. Once bitten twice shy. I only want the next Ti variant.

drunkenmaster · 25 Mar 2016 at 10:26

bru said:
As I just said I know the technology isn't the same and in GPU term the two techs are completely different.
But the discussion this forum is having has some similarities to the whole tessellation/over tessellation discussion that happen a while back, just this time it is the other way around with AMD having the tech performance advantage, if lots of Async is used.

Async can't be overdone. You can't have lots of async, you can have lots of effects being done, lots of post processing effects, these effects can be implemented with compute functions and async compute can more effectively utilise the gpu. There is no more or less of it, as the graph shows Nvidia has a latency and performance hit at every queue depth level... largely because it doesn't appear to support async compute at all. Async compute is simply describing the ability to switch between tasks more efficiently.

You could equate it to hyper-threading, a way to maximise usage of all available shaders by pushing more requests through the various CUs. Async compute is like adding HT logic into a CPU core to allow these requests to be pushed through without hurting the throughput of the initial thread.

Can you have 'more HT' or 'less HT'... no, it's just a hardware capability. If you have HT you can better utilise your cores if you have parallel code. Everything pushed to the GPU is parallel code.

Async compute can not be overused just like HT can't. You can run 8 threads on a 4 core CPU with HT on or off, or you can run 30 threads on a 4 core with HT on or off. More threads running slows down the processing overall(if you constantly switch between them) but it's got nothing to do with HT being enabled or not. Tessellation would equate to one of those threads, absolutely 100% completely unrelated to the concept of HT being enabled or not.

andybird123 · 25 Mar 2016 at 10:31

If thats all entirely 100% accurate, why did oxide disable async for nvidia GPU's?
You are suggesting that enabling async compute only represents a performance bonus for hardware that supports it. You are saying that enabling it and increasing the queue depth should have no penalty on nvidia hardware with a smaller queue capability. Yet the evidence completely obliterates that argument.

Oxide didnt disable the effects, they found that disabling async actually improved performance using the same settings, same number of effects.

Mtom · 25 Mar 2016 at 10:40

andybird123 said:
If thats all entirely 100% accurate, why did oxide disable async for nvidia GPU's?
You are suggesting that enabling async compute only represents a performance bonus for hardware that supports it. You are saying that enabling it and increasing the queue depth should have no penalty on nvidia hardware with a smaller queue capability. Yet the evidence completely obliterates that argument.

Oxide didnt disable the effects, they found that disabling async actually improved performance using the same settings, same number of effects.

They haven't disabled it, they run a single engine codepath, while amd runs a multi engine codepath as far as i know.
As i see it:
NV slows down with async code is because technically Maxwell can do async compute, but it needs a context switch for it, and switching between compute and graphics workload takes time. While the GCN architecture is capable of stateless compute, so it can do the task simultaneously.
DM is right in that you cannot do too much async, as you can do with tessellation. It is a basic DX12 feature. Either you use it or not.

Image this (lets stay at the tessellation example). You have tessellation in a game, and have two cards, one is natively support tessellation, and one that natively don't, but can do tessellation with a workaround. Naturally the one which needs a workaround will be slower with tessellation on. It is not slower because the devs used too much tessellation where it is not needed...it is slower because it was not prepared to do this type of work.

drunkenmaster · 25 Mar 2016 at 10:50

andybird123 said:
If thats all entirely 100% accurate, why did oxide disable async for nvidia GPU's?
You are suggesting that enabling async compute only represents a performance bonus for hardware that supports it. You are saying that enabling it and increasing the queue depth should have no penalty on nvidia hardware with a smaller queue capability. Yet the evidence completely obliterates that argument.

Oxide didnt disable the effects, they found that disabling async actually improved performance using the same settings, same number of effects.

If over 60ms you work on a single problem 100% of the time, woo, if a context switch costs you 5ms where the gpu goes idle and you have one context switch then you would only get 55ms of actual gpu usage within 60ms. If you context switch 10 times... you get only 5ms of actual gpu usage. Bru suggested that below a certain queue depth that 5ms was lower and it only got higher at high queue depth. His own graph proves that to be not true and is the fundamental problem. The entire concept of async compute is to make that context switch small enough that switching context improves overall utilisation, not reduces it.

I didn't at all suggest that more context switching doesn't have an aggregate increase in performance loss for Nvidia, I merely said that it has a problem at all queue depths, not that it doesn't get worse at higher queue depths. If a context switch costs you performance, then the more you do it the more performance you lose.

There is to date absolutely no real evidence Nvidia properly supports async compute. They have had access to daily builds of the oxide engine games, they have promised a driver for like a year, they still don't appear to have one. Can you imagine Intel providing a driver for HT a year after a year launches, it's a basic hardware feature, fundamental to how it operates. Every time Nvidia says they supports it and will bring a driver, the driver is delayed, another benchmark shows they suck using it or a game disables it because it's not working well on Nvidia hardware.

Every single piece of evidence points to Nvidia trying to emulate it in software and failing repeatedly, which shouldn't be surprising.

andybird123 · 25 Mar 2016 at 10:55

Either its like hyperthteading, or its not. You cant really write 4 paragraphs comparing it to HT saying it doesnt cause a performance hit and then just say, oh well actually it kinda does but its still fine for people to optimise for AMD hardware and not nvidia.

drunkenmaster · 25 Mar 2016 at 10:58

andybird123 said:
Either its like hyperthteading, or its not. You cant really write 4 paragraphs comparing it to HT saying it doesnt cause a performance hit and then just say, oh well actually it kinda does but its still fine for people to optimise for AMD hardware and not nvidia.

Where did I say any of that?

I didn't say it doesn't have a performance hit but, ah, it kinda does, nor anything like it. Everything suggests that AMD actually have async compute, and as such can push 8 threads through the 'gpu' without a significant performance hit and thus overall increase performance. It also suggests that Nvidia do NOT have async compute at all, and are as such trying to jam 8 threads through a 4 thread 'gpu', and every time the threads switch context they have a significant performance hit thus overall decreasing performance.

The entire point is that in this comparison, AMD definitely has HT enabled, Nvidia do not appear to.

Mauller · 25 Mar 2016 at 11:29

drunkenmaster said:
Where did I say any of that?

I didn't say it doesn't have a performance hit but, ah, it kinda does, nor anything like it. Everything suggests that AMD actually have async compute, and as such can push 8 threads through the 'gpu' without a significant performance hit and thus overall increase performance. It also suggests that Nvidia do NOT have async compute at all, and are as such trying to jam 8 threads through a 4 thread 'gpu', and every time the threads switch context they have a significant performance hit thus overall decreasing performance.

The entire point is that in this comparison, AMD definitely has HT enabled, Nvidia do not appear to.

I believe the context switching issues are when it switches between the Graphics queue and Compute queues. Maxwell can perform Async compute in a similar manner to GCN, but only when in pure compute mode. It is the context switching issues when jumping between the graphics and compute queues that it has issues with.

It has been shown that Maxwell can perform lighter loads of async compute, but only when the code is made to conform with this context switching to prevent stalls. But code written with software side scheduling to get around the context switching can cause performance issues on GCN. And in general, GCN is far happier when you just throw compute at it with the hardware self scheduling.

easyrider · 25 Mar 2016 at 14:27

Combat Fighter said:
And I'll still be keeping my 980Ti.

I wont be falling for a mid range card marketed as high end. Once bitten twice shy. I only want the next Ti variant.

Indeed...

Buying a Titan (I didn't ) and then the 980 ti coming out for half the price was ugly for consumers...

When pascal comes ,and if it's at £250 , and it's on par with 980 ti performance,I'm not buying into it....

This won't happen though

andybird123 · 25 Mar 2016 at 14:52

easyrider said:
Indeed...

Buying a Titan (I didn't ) and then the 980 ti coming out for half the price was ugly for consumers...

When pascal comes ,and if it's at £250 , and it's on par with 980 ti performance,I'm not buying into it....

This won't happen though

People who buy Titans know what the score is, even when the originals came out we all knew the 780 would be out at some point, even if it was a little sooner than expected.

Competitor rules

** The Official Nvidia GeForce 'Pascal' Thread - for general gossip and discussions **

The Official Nvidia GeForce 'Pascal' Thread - for general gossip and discussions