
Doom to use async compute

They'd be stupid to use it; only about 20 people are able to use it on the PC. :p

I'd expect it to be stripped out of just about every port if Nvidia haven't got it on Pascal.

Possibly GP104 isn't going to have async because it's a die shrink of the current architecture.


We should be looking forward and utilising the tech available.
However, I remember many here boasting that the TX and 980Ti are fully DX12 compatible and the AMD 290/290X aren't, because the boxes never had it printed on them.

Here we are: the 290X supports DX12 much better than NV, and you're asking to cripple the AMD cards' hardware abilities because NV cannot keep up?

Let Nvidia burn...
 
Possibly GP104 isn't going to have async because it's a die shrink of the current architecture.


We should be looking forward and utilising the tech available.
However, I remember many here boasting that the TX and 980Ti are fully DX12 compatible and the AMD 290/290X aren't, because the boxes never had it printed on them.

Here we are: the 290X supports DX12 much better than NV, and you're asking to cripple the AMD cards' hardware abilities because NV cannot keep up?

Let Nvidia burn...

NV do actually support more features in hardware. Whether they are the right features is for another thread...
 
In the GPU space, AMD (and ATI before them) have always developed very forward-leaning products, but they've also struggled immensely to get the balance right and capitalise on it, often jumping the gun to their own detriment. This time around it's something they could really work to their advantage, as the architecture has great synergy with the nature of nodes below 20nm planar, and with DX12 and similar APIs.

Regarding Kepler though, outside of maybe The Witcher 3 (which I've never played), I'm not really seeing Kepler falling behind. That isn't to say it couldn't be running better than it is if they were spending more time on optimising it, but that isn't something I can quantify. If you dig into a lot of benchmarks, you'll find they are either using old Kepler numbers without actually retesting the older cards when they say they have, using numbers given to them by AMD or nVidia, or testing Kepler cards limited to the reference on-paper clocks. The numbers you see are often anything up to 20% (though normally more like 10-15%) lower than what users will be experiencing in the real world, before you add in any end-user overclocking.
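
To put a number on that clock gap, here's a trivial sketch. The clock figures are my own illustrative assumptions (roughly an on-paper GTX780 boost versus what retail cards typically sustain), not measured values:

    #include <cstdio>

    int main() {
        const double paper_boost_mhz   = 900.0;  // on-paper boost clock (assumed, GTX780-ish)
        const double typical_boost_mhz = 1006.0; // boost many retail cards actually sustain (assumed)

        // To a first approximation, GPU-limited frame rates scale with core clock.
        double delta = (typical_boost_mhz - paper_boost_mhz) / paper_boost_mhz;
        std::printf("Testing at on-paper clocks under-reports by ~%.0f%%\n", delta * 100.0);
        return 0;
    }

With those assumed clocks the gap comes out around 12%, which sits right in the 10-15% band described above.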

Kepler is falling behind though, since everyone only looks at the GTX780 series cards and not any of the other ones. My GTX660 is now consistently much slower than an HD7870, when they were quite closely matched at launch, and the rest of the range really didn't overclock anywhere near as well in percentage terms as the GTX780. One of the reasons the review sites had to change their testing methodology (and I was talking with someone who knew a reviewer) was because Boost V1 tended to have issues over longer time periods, i.e. throttling, which a few German and French websites discovered, and it made the results look higher than they actually were.

Then you add in certain review sites, like pcgameshardware, which use pre-overclocked cards in their benchmarks, and even a highly overclocked GTX770 is not doing as well. Even a pre-overclocked GTX770 running at a 1.2GHz clockspeed is losing to a stock-clockspeed R9 280X in The Division:

http://www.pcgameshardware.de/The-Division-Spiel-37399/Specials/PC-Benchmarks-1188384/

The R9 280 and R9 280X had very minimal boost, i.e. around 50MHz, and these cards could hit 1.2GHz, which meant they could easily bypass an overclocked GTX770 in the game.

An overclocked GTX670 is slower than an overclocked HD7870 in the game too.
 
Possibly GP104 isn't going to have async because it's a die shrink of the current architecture.


Wrong on 2 counts:
1) GP104 isn't a die shrink, it's a modified architecture.
2) Maxwell supports async compute anyway.


Async compute on Maxwell just works differently to GCN; it works best with larger compute tasks, where it actually ends up very fast. It has a higher switching time, so it doesn't perform well with numerous small tasks.
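
For anyone wondering what async compute actually looks like at the API level, here's a minimal D3D12-style sketch (the helper name is hypothetical, and it assumes you already have a valid device and a recorded compute command list). The point is just that compute work is submitted on its own queue; everything discussed above about concurrency is the hardware scheduler's business, not the API's:

    #include <d3d12.h>

    // Hypothetical helper: submit already-recorded compute work on its own queue.
    // Assumes 'device' and 'computeWork' are valid; real code would also keep the
    // queue alive and synchronise with a fence before releasing anything.
    void SubmitAsyncCompute(ID3D12Device* device, ID3D12CommandList* computeWork)
    {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE; // separate from the DIRECT (graphics) queue

        ID3D12CommandQueue* computeQueue = nullptr;
        device->CreateCommandQueue(&desc, __uuidof(ID3D12CommandQueue),
                                   reinterpret_cast<void**>(&computeQueue));

        // The API only marks this work as independent; whether it truly runs
        // concurrently with graphics (GCN) or is context-switched in coarser
        // chunks (Maxwell) is decided by the hardware scheduler.
        ID3D12CommandList* lists[] = { computeWork };
        computeQueue->ExecuteCommandLists(1, lists);
    }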
 
Kepler is falling behind though, since everyone only looks at the GTX780 series cards and not any of the other ones. My GTX660 is now consistently much slower than an HD7870, when they were quite closely matched at launch, and the rest of the range really didn't overclock anywhere near as well in percentage terms as the GTX780. One of the reasons the review sites had to change their testing methodology (and I was talking with someone who knew a reviewer) was because Boost V1 tended to have issues over longer time periods, i.e. throttling, which a few German and French websites discovered, and it made the results look higher than they actually were.

Then you add in certain review sites, like pcgameshardware, which use pre-overclocked cards in their benchmarks, and even a highly overclocked GTX770 is not doing as well. Even a pre-overclocked GTX770 running at a 1.2GHz clockspeed is losing to a stock-clockspeed R9 280X in The Division:

http://www.pcgameshardware.de/The-Division-Spiel-37399/Specials/PC-Benchmarks-1188384/

The R9 280 and R9 280X had very minimal boost, i.e. around 50MHz, and these cards could hit 1.2GHz, which meant they could easily bypass an overclocked GTX770 in the game.

An overclocked GTX670 is slower than an overclocked HD7870 in the game too.
Unfortunately that's how it is.

Nvidia is more interested in winning the drag race (benchmarks) than the Le Mans 24 Hours (gaming performance over time) :p
 
One of the reasons the review sites had to change their testing methodology (and I was talking with someone who knew a reviewer) was because Boost V1 tended to have issues over longer time periods, i.e. throttling, which a few German and French websites discovered, and it made the results look higher than they actually were.

You might have hit on something there that I was overlooking. I do remember the drama with Boost V1. It could be that (some) sites are still testing Kepler Boost 2.0 cards (which in 99% of cases can sustain their boosts indefinitely) in the same way they do Kepler Boost 1.0 cards, hence why the results generally match up with the on-paper boost clock instead of the actual boost clocks that the cards will be running in the real world.
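
A toy simulation of that methodology problem, with entirely made-up numbers, just to show why run length matters once a Boost 1.0 card throttles after warming up:

    #include <cstdio>
    #include <initializer_list>

    int main() {
        const double boost_fps     = 60.0;  // fps while at full boost (assumed)
        const double throttled_fps = 52.0;  // fps after thermal throttling (assumed)
        const int throttle_after_s = 120;   // card throttles after ~2 minutes (assumed)

        for (int run_s : {30, 600, 3600}) {
            int hot_s = run_s > throttle_after_s ? run_s - throttle_after_s : 0;
            double avg = (boost_fps * (run_s - hot_s) + throttled_fps * hot_s) / run_s;
            std::printf("%5d s run: average %.1f FPS\n", run_s, avg);
        }
        return 0;
    }

With these assumed figures a 30-second run reports the full 60FPS, while a one-hour session averages about 52FPS, which is exactly the kind of flattering short-run result those German and French sites caught.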

In fact, the other day I had someone argue until they were blue in the face that Boost 2.0 came out with the first Maxwell cards, and they were a professional tech site writer, lol.
 
The fact that Maxwell had any proper compute tech sliced out of it to focus on pure DX11 gaming performance is probably going to be the main reason it falls behind, even more so than Kepler did.

Pascal is a full GPU architecture with plenty of compute shoved back in, and with DX12 and the way it likes to handle things, I'm sure Maxwell is going to look mighty bad once even more games switch to DX12 and Vulkan.

They were good cards, for the short term it seems.

Fiji also lost a lot of compute:
Fiji went from 1/2 DP down to 1/16th.
Maxwell went from 1/24 to 1/32 DP for the consumer part (the Tesla GPUs had 1/3 DP).
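
A back-of-envelope check of what those ratios mean in practice. Shader counts and clocks are approximate public figures, so treat the output as rough (FLOPS = shaders × 2 FMA ops × clock):

    #include <cstdio>

    int main() {
        struct Gpu { const char* name; int shaders; double mhz; double dp_ratio; };
        const Gpu gpus[] = {
            { "Hawaii (FirePro, 1/2 DP)",   2816, 1000.0, 1.0 / 2.0  },
            { "Fiji (Fury X, 1/16 DP)",     4096, 1050.0, 1.0 / 16.0 },
            { "GM200 (Maxwell, 1/32 DP)",   3072, 1000.0, 1.0 / 32.0 },
            { "GK110 (Tesla K20X, 1/3 DP)", 2688,  732.0, 1.0 / 3.0  },
        };
        for (const Gpu& g : gpus) {
            double sp_tflops = g.shaders * 2.0 * g.mhz * 1e6 / 1e12; // FMA = 2 ops/clock
            std::printf("%-28s SP %5.2f TFLOPS | DP %4.2f TFLOPS\n",
                        g.name, sp_tflops, sp_tflops * g.dp_ratio);
        }
        return 0;
    }

So despite Fiji's much bigger SP throughput, at 1/16 it manages only around 0.54 DP TFLOPS, versus roughly 2.8 for a 1/2-rate Hawaii and about 1.3 for the old GK110 Tesla.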
 
Kepler is falling behind though, since everyone only looks at the GTX780 series cards and not any of the other ones. My GTX660 is now consistently much slower than an HD7870, when they were quite closely matched at launch, and the rest of the range really didn't overclock anywhere near as well in percentage terms as the GTX780. One of the reasons the review sites had to change their testing methodology (and I was talking with someone who knew a reviewer) was because Boost V1 tended to have issues over longer time periods, i.e. throttling, which a few German and French websites discovered, and it made the results look higher than they actually were.

Then you add in certain review sites, like pcgameshardware, which use pre-overclocked cards in their benchmarks, and even a highly overclocked GTX770 is not doing as well. Even a pre-overclocked GTX770 running at a 1.2GHz clockspeed is losing to a stock-clockspeed R9 280X in The Division:

http://www.pcgameshardware.de/The-Division-Spiel-37399/Specials/PC-Benchmarks-1188384/

The R9 280 and R9 280X had very minimal boost, i.e. around 50MHz, and these cards could hit 1.2GHz, which meant they could easily bypass an overclocked GTX770 in the game.

An overclocked GTX670 is slower than an overclocked HD7870 in the game too.

This one has a much bigger gap between the 770 and 280X.

http://www.techspot.com/review/1148-tom-clancys-the-division-benchmarks/page2.html

Maybe they used the 16.3 drivers, but it doesn't actually say.
 
Regarding Kepler though, outside of maybe The Witcher 3 (which I've never played), I'm not really seeing Kepler falling behind. That isn't to say it couldn't be running better than it is if they were spending more time on optimising it, but that isn't something I can quantify. If you dig into a lot of benchmarks, you'll find they are either using old Kepler numbers without actually retesting the older cards when they say they have, using numbers given to them by AMD or nVidia, or testing Kepler cards limited to the reference on-paper clocks. The numbers you see are often anything up to 20% (though normally more like 10-15%) lower than what users will be experiencing in the real world, before you add in any end-user overclocking.

What's happening here in The Division, then? Kepler performance is dire at best compared to when it was released.

http://www.techspot.com/review/1148-tom-clancys-the-division-benchmarks/page3.html

The Kepler architecture just doesn't seem to cut it in this day and age, for whatever reason. As you said, it looks like GCN was a very forward-looking architecture, and it's proving it by still competing well now. Here a 7970 GHz Edition card is above the GTX780.
 
I'm sure it'll be a huge boon for the Xbox and PS4, but otherwise it seems like a hollow victory. The game is going to be locked to 60FPS yet again.

This probably only matters for low-end AMD cards, or high-end ones at 4K, and this year I'm hoping to be able to run 4K or 3440x1440 at well over 60FPS on titles like this.

NVIDIA obviously won't benefit at all, as their hardware can only emulate the function.

Were the game's frame rate not locked, I'd be very happy about this.
 
What's happening here in The Division, then? Kepler performance is dire at best compared to when it was released.

http://www.techspot.com/review/1148-tom-clancys-the-division-benchmarks/page3.html

The Kepler architecture just doesn't seem to cut it in this day and age, for whatever reason. As you said, it looks like GCN was a very forward-looking architecture, and it's proving it by still competing well now. Here a 7970 GHz Edition card is above the GTX780.

Don't you worry yourself about it; on Ultra and High settings my 780 is holding 60FPS, with occasional drops to 52.

It runs like a dream, along with Fallout 4.
 
1080p: I see a 7FPS gain, which is 9%; the Fury X gains only 5% (4FPS).

1440p: I see a 4FPS gain on your 390, which is 7%; the Fury X gains 2FPS, which is 3%.

4K: the Fury X and the rest of the AMD cards gain absolutely nothing.

And I remember (I might be wrong) that AMD said Hitman would use async shaders quite a lot more than Ashes.

http://www.computerbase.de/2016-03/...2/2/#diagramm-hitman-mit-directx-12-3840-2160

I don't look at benchmarks but run it on my own computer, and get 76-77FPS in DX11 and 87FPS in DX12 on max settings.
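
For reference, the percentages above are just (DX12 − DX11) / DX11. A quick sanity check with the quoted figures; note the 78FPS baseline in the first row is inferred from the "7FPS = 9%" claim, so treat it as an assumption:

    #include <cstdio>

    int main() {
        struct Result { const char* label; double dx11_fps; double dx12_fps; };
        const Result results[] = {
            { "quoted +7FPS / 9% case (baseline inferred)", 78.0, 85.0 },
            { "my own run, 76-77FPS -> 87FPS",              76.5, 87.0 },
        };
        for (const Result& r : results) {
            double gain = (r.dx12_fps - r.dx11_fps) / r.dx11_fps * 100.0;
            std::printf("%-44s +%.1f%%\n", r.label, gain);
        }
        return 0;
    }

That puts my own DX11-to-DX12 uplift at roughly 14%, noticeably bigger than the single-digit gains quoted from the computerbase numbers.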
 
Fiji also lost a lot of compute:
Fiji went from 1/2 DP down to 1/16th.
Maxwell went from 1/24 to 1/32 DP for the consumer part (the Tesla GPUs had 1/3 DP).

Fiji never lost any compute; it is still capable of 1/2 DP in hardware due to how GCN performs DP. They just limited the DP in software, like the desktop Hawaii parts are limited.

NVIDIA literally removed silicon to reduce the DP of Maxwell, which is why Maxwell never replaced Kepler in the Tesla parts, and why they don't list the DP specs of their Maxwell Quadro parts on their site.

Also, due to memory constraints, we will only see Hawaii replaced in the FirePro parts once Vega drops with HBM. No FirePro Fiji due to memory constraints.
 
Fiji never lost any compute; it is still capable of 1/2 DP in hardware due to how GCN performs DP. They just limited the DP in software, like the desktop Hawaii parts are limited.

NVIDIA literally removed silicon to reduce the DP of Maxwell, which is why Maxwell never replaced Kepler in the Tesla parts, and why they don't list the DP specs of their Maxwell Quadro parts on their site.

Also, due to memory constraints, we will only see Hawaii replaced in the FirePro parts once Vega drops with HBM. No FirePro Fiji due to memory constraints.

You need to provide some evidence for that, as everything I have read suggests AMD stripped transistors in Fiji, reducing the DP performance. Why on earth would they limit it in software? Nvidia did with Kepler for specific business reasons, to separate Tesla hardware sales. AMD doesn't have that reason; there is no compute-orientated Fiji.
 
You need to provide some evidence for that, as everything I have read suggests AMD stripped transistors in Fiji, reducing the DP performance. Why on earth would they limit it in software? Nvidia did with Kepler for specific business reasons, to separate Tesla hardware sales. AMD doesn't have that reason; there is no compute-orientated Fiji.

Read some white papers on how GCN performs DP. Also, desktop Hawaii is limited to 1/8 DP but is unlocked to 1/2 in the FirePro parts.
 