Concerned by Nvidia's current state of DX12 and Vulkan performance

humbug · 7 Oct 2016 at 17:32

Gregster said:
No it doesn't. I ran that benchmark time and again on both NVidia and AMD and I thought there was something up with it when running on AMD, as it was so far behind, I ran it several times with varying tweaks.

You are also not using any common sense and ignoring a CPU bottleneck to try and justify your lack of knowledge. I can't explain it any simpler than I have done, so if you can't grasp it, not much I can add. You can lead a horse to water but you can't make it drink!

Kaap gets it and understands what is what.

Your obsession with this is too obvious, Greg; it's ##### insane.

Kaap's post had nothing at all to do with AMD or Nvidia, nor did my response to him.

You're so obvious, Greg. Watching you is sometimes almost comedy, absolutely surreal. You will try to shoehorn an AMD vs Nvidia argument into literally anything at all, with such desperation that you can't even see what the conversation you're targeting for such an insane purpose is actually about.

Put simply, you're mad.

Griffildur · 7 Oct 2016 at 17:39

love this forum, sometimes you just don't need any other entertainment

Gregster · 7 Oct 2016 at 17:42

humbug said:
Your obsession with this is too obvious, Greg; it's ##### insane.

Kaap's post had nothing at all to do with AMD or Nvidia, nor did my response to him.

You're so obvious, Greg. Watching you is sometimes almost comedy, absolutely surreal. You will try to shoehorn an AMD vs Nvidia argument into literally anything at all, with such desperation that you can't even see what the conversation you're targeting for such an insane purpose is actually about.

Put simply, you're mad.

Why can't you grasp what a bottleneck is? This isn't about AMD versus NVidia; it applies to both. Regardless of vendor, a GPU working well under its capacity is going to end up as slow as a much slower GPU when the CPU is maxed out. The Fury X has huge memory bandwidth, so I assume this is why it shows a nice lead when a crappy CPU is used.

Put a Ferrari and a Fiat 500 on an Autobahn with no traffic and the Ferrari will leave the Fiat 500 standing, but chuck in tons of slow traffic with no room to pass and they will both go the same speed. This is a bottleneck situation. Not sure how else I can explain it to you, but calling me mad because you don't understand it is kind of ironic.
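
The traffic analogy boils down to a frame rate capped by whichever side, CPU or GPU, is slower. A minimal sketch of that idea, assuming made-up limits (the function name and every number here are hypothetical, not measurements of any real card or CPU):

```python
def delivered_fps(cpu_fps_limit: float, gpu_fps_limit: float) -> float:
    """Frame rate is capped by whichever of the CPU or GPU is slower."""
    return min(cpu_fps_limit, gpu_fps_limit)


# Hypothetical limits, for illustration only.
FERRARI_GPU_FPS = 140.0   # the fast card
FIAT_GPU_FPS = 70.0       # the slow card
OPEN_ROAD_CPU_FPS = 200.0      # fast CPU, "no traffic"
HEAVY_TRAFFIC_CPU_FPS = 45.0   # slow CPU, "tons of slow traffic"

for label, cpu in (("open road", OPEN_ROAD_CPU_FPS),
                   ("heavy traffic", HEAVY_TRAFFIC_CPU_FPS)):
    ferrari = delivered_fps(cpu, FERRARI_GPU_FPS)
    fiat = delivered_fps(cpu, FIAT_GPU_FPS)
    print(f"{label}: Ferrari {ferrari:.0f} fps, Fiat {fiat:.0f} fps")
```

With the fast CPU the two cards differ; with the slow CPU both land on the CPU's limit.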

humbug · 7 Oct 2016 at 17:48

CPU performance and its relation to GPU bottlenecking has nothing to do with GPU memory bandwidth and you know it, Greg. ^^^^ Anything... anything but reality will do, eh?

Actually, explain your reasoning. This...

Griffildur said:
love this forum, sometimes you just don't need any other entertainment

Should be real fun.

humbug · 7 Oct 2016 at 18:10

Gregster said:
Why can't you grasp what a bottleneck is? This isn't about AMD versus NVidia; it applies to both. Regardless of vendor, a GPU working well under its capacity is going to end up as slow as a much slower GPU when the CPU is maxed out. The Fury X has huge memory bandwidth, so I assume this is why it shows a nice lead when a crappy CPU is used.

Put a Ferrari and a Fiat 500 on an Autobahn with no traffic and the Ferrari will leave the Fiat 500 standing, but chuck in tons of slow traffic with no room to pass and they will both go the same speed. This is a bottleneck situation. Not sure how else I can explain it to you, but calling me mad because you don't understand it is kind of ironic.

In this case the Fiat 500 is going faster than the Ferrari, a lot faster.
The Fiat 500 is not affected by traffic; it is able to cut through that traffic and continue on unimpeded.

The traffic here, BTW, is how well each GPU (the car) handles the communication lines between the CPU (the destination) and the GPU (the car).

The Fiat 500, here, is not slowed by traffic like the Ferrari is; it will simply continue between, or over, that traffic.

Gregster · 7 Oct 2016 at 18:12

humbug said:
CPU performance and its relation to GPU bottlenecking has nothing to do with GPU memory bandwidth and you know it, Greg. ^^^^ Anything... anything but reality will do, eh?

Actually, explain your reasoning. This...

Right, play at 480p and you will see a CPU-heavy game bottleneck on pretty much every single system going, assuming you are running a semi-decent GPU. In fact, the GPU will sit mostly idle whilst the CPU is sitting at 99%. But the memory bandwidth of said GPUs will play a part in helping keep frames up, so a 256-bit bus will be slower than a 384-bit bus, which again will be slower than a 512-bit bus. If you need proof of that, start upping the resolution and, as long as the VRAM limit isn't breached, you will see the cards with faster VRAM showing their strength. An example is a Fury X vs a GTX 1080: as the res gets to, say, 4K, the Fury X closes the gap to the GTX 1080. This is why you see closer frame rates at 4K between a Fury X and a GTX 1080, as opposed to a bigger difference at 1080p in favour of the GTX 1080.
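
The "start upping the resolution" test can be sketched as a toy model: CPU time per frame stays fixed while GPU time grows with pixel count, so the bottleneck flips from CPU to GPU somewhere along the way. Everything here is an assumption for illustration (the 10 ms CPU cost, the 3 ms-per-megapixel GPU cost, the function names), and the model deliberately says nothing about memory bandwidth or bus width:

```python
RESOLUTIONS = {
    "480p": 854 * 480,
    "1080p": 1920 * 1080,
    "4K": 3840 * 2160,
}


def gpu_time_ms(gpu_ms_per_megapixel: float, pixels: int) -> float:
    """Toy assumption: GPU work scales linearly with pixel count."""
    return gpu_ms_per_megapixel * pixels / 1e6


def frame_time_ms(cpu_ms: float, gpu_ms_per_megapixel: float, pixels: int) -> float:
    """The slower of the two stages sets the frame time."""
    return max(cpu_ms, gpu_time_ms(gpu_ms_per_megapixel, pixels))


def bound_by(cpu_ms: float, gpu_ms_per_megapixel: float, pixels: int) -> str:
    """Report which side is the limit at this resolution."""
    return "CPU" if cpu_ms >= gpu_time_ms(gpu_ms_per_megapixel, pixels) else "GPU"


CPU_MS = 10.0                # hypothetical CPU cost per frame
GPU_MS_PER_MEGAPIXEL = 3.0   # hypothetical GPU cost per million pixels

for name, pixels in RESOLUTIONS.items():
    ms = frame_time_ms(CPU_MS, GPU_MS_PER_MEGAPIXEL, pixels)
    limit = bound_by(CPU_MS, GPU_MS_PER_MEGAPIXEL, pixels)
    print(f"{name}: {1000.0 / ms:.1f} fps ({limit}-bound)")
```

In this sketch the low resolutions sit on the CPU limit no matter how fast the card is, and only the highest resolution exposes the GPU.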

humbug · 7 Oct 2016 at 18:29

Gregster said:
Right, play at 480p and you will see a CPU-heavy game bottleneck on pretty much every single system going, assuming you are running a semi-decent GPU. In fact, the GPU will sit mostly idle whilst the CPU is sitting at 99%. But the memory bandwidth of said GPUs will play a part in helping keep frames up, so a 256-bit bus will be slower than a 384-bit bus, which again will be slower than a 512-bit bus. If you need proof of that, start upping the resolution and, as long as the VRAM limit isn't breached, you will see the cards with faster VRAM showing their strength. An example is a Fury X vs a GTX 1080: as the res gets to, say, 4K, the Fury X closes the gap to the GTX 1080. This is why you see closer frame rates at 4K between a Fury X and a GTX 1080, as opposed to a bigger difference at 1080p in favour of the GTX 1080.

No... you have it all mixed up... give me a few minutes to write out an explanation. I already did once, but Firefox decided to lose it because it loves keyboard shortcuts: if you hit any two keys at once by mistake, it's invariably some form of keyboard shortcut that then loses you everything you wrote, by refreshing or by one of a million ways to go back.......

Gregster · 7 Oct 2016 at 18:34

Look, I give up and I'm going out for a beer.

LoadsaMoney · 7 Oct 2016 at 18:42

humbug · 7 Oct 2016 at 18:43

It's like this.

The GPU renders objects on screen, but it doesn't know where to place those objects. If you have an AI, a tree... anything that you see on screen is there because the CPU ran the calculations for the GPU to put it there.

So, it may go something like this...

GPU: I'm ready to render this decal, but where does it go?
CPU: Let me crunch the numbers, ah.... yes. The player is here, he fired a bullet in this direction and it hit that wall, so the decal needs to go to these coordinates: Y 37688563 - X 7657365876 - Z 5689567243.

GPU: Thank you, now I will take this texture from this directory, overlay it on the mesh in VRAM and then display it at those given coordinates.

Now...

If you have 2 CPUs crunching numbers and communicating with the GPU over one phone line, then there is a certain speed at which you can do that before the GPU needs to slow down for the CPUs to catch up, or for that communication line to be free to use again.

If you have 6 CPUs communicating over 8 phone lines, then you can push through a lot more communication before the CPUs get too slow or the communication lines are overwhelmed.

The former is Nvidia, the latter AMD.
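
The phone-line picture can be put into numbers with a toy throughput model: the commands reaching the GPU per second are limited either by how fast the CPUs can produce them or by how much the lines can carry. This is only a sketch of the analogy as stated; the per-CPU and per-line rates are invented, and it is not a description of how either vendor's driver or hardware actually works:

```python
def commands_per_second(cpus: int, per_cpu_rate: float,
                        lines: int, per_line_rate: float) -> float:
    """Throughput is capped by the CPUs or by the phone lines, whichever is lower."""
    cpu_capacity = cpus * per_cpu_rate
    line_capacity = lines * per_line_rate
    return min(cpu_capacity, line_capacity)


# Hypothetical rates, in commands per second.
PER_CPU_RATE = 1000.0
PER_LINE_RATE = 1500.0

two_cpus_one_line = commands_per_second(2, PER_CPU_RATE, 1, PER_LINE_RATE)
six_cpus_eight_lines = commands_per_second(6, PER_CPU_RATE, 8, PER_LINE_RATE)

print(f"2 CPUs, 1 line:  {two_cpus_one_line:.0f} commands/s")
print(f"6 CPUs, 8 lines: {six_cpus_eight_lines:.0f} commands/s")
```

With these made-up rates the first setup is limited by its single line and the second by its CPUs, which is the distinction the analogy is drawing.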

maonayze · 7 Oct 2016 at 20:00

Gregster said:
Look, I give up and I'm going out for a beer.

Too late Humbug, Greg's bottle(neck)d it.

Rroff · 7 Oct 2016 at 20:13

humbug said:
It's like this.

The GPU renders objects on screen, but it doesn't know where to place those objects. If you have an AI, a tree... anything that you see on screen is there because the CPU ran the calculations for the GPU to put it there.

So, it may go something like this...

GPU: I'm ready to render this decal, but where does it go?
CPU: Let me crunch the numbers, ah.... yes. The player is here, he fired a bullet in this direction and it hit that wall, so the decal needs to go to these coordinates: Y 37688563 - X 7657365876 - Z 5689567243.

GPU: Thank you, now I will take this texture from this directory, overlay it on the mesh in VRAM and then display it at those given coordinates.

Now...

If you have 2 CPUs crunching numbers and communicating with the GPU over one phone line, then there is a certain speed at which you can do that before the GPU needs to slow down for the CPUs to catch up, or for that communication line to be free to use again.

If you have 6 CPUs communicating over 8 phone lines, then you can push through a lot more communication before the CPUs get too slow or the communication lines are overwhelmed.

The former is Nvidia, the latter AMD.

The reasons why you are seeing nVidia performance fall off against the FX in situations with an underpowered CPU are quite complex. For one, with e.g. tessellation and some aspects of triangle setup, nVidia repurposes its reprogrammable shader architecture to carry out those tasks rather than dedicating an area of hardware to them - so AMD can just despatch the task and pretty much forget about it CPU-wise until it's done, while nVidia will have a fair amount of back and forth with the CPU as the various GPU threads complete, etc.

Then you have all the DX12 stuff, where in some areas nVidia is using software scheduling, so again the GPU and CPU are communicating back and forth more often, whereas AMD hands off a task list to the GPU with much less back and forth.

That is before you get into anything to do with the number of concurrent threads/pipelines for DX12/Vulkan functionality.

LoadsaMoney · 7 Oct 2016 at 20:24

maonayze said:
Too late Humbug, Greg's bottle(neck)d it.

Final8y · 7 Oct 2016 at 20:27

Kaapstad said:
Great example of GPUs being bottlenecked in this game.

970 SLI beating 980 SLI at low resolutions.

The bottlenecking of multi-GPU does not apply to single GPUs.
With multi-GPU the same information has to be sent multiple times, and with quadfire that's 4 times minimum.

If a faster single GPU falls below a slower single GPU, then that's because the faster GPU has more CPU overhead.
A CPU bottleneck with single GPUs normally shows up as the FPS being pretty much the same between the faster and slower cards at a given resolution and settings.

A faster GPU should not require more CPU to equal the same FPS as a slower GPU at the same settings.
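
The CPU-overhead point can be illustrated with a toy model in which each card adds its own driver cost to the CPU side of every frame. A minimal sketch, assuming invented timings (the function name, the per-card driver costs and the render times are all hypothetical, not measured values):

```python
def fps(game_cpu_ms: float, driver_cpu_ms: float, gpu_ms: float) -> float:
    """Frame rate when the CPU does game work plus driver work for each frame."""
    cpu_ms = game_cpu_ms + driver_cpu_ms
    return 1000.0 / max(cpu_ms, gpu_ms)


# Hypothetical cards: (driver CPU cost per frame, GPU render time per frame), in ms.
FAST_CARD = (3.0, 5.0)   # renders quicker, but costs the CPU more
SLOW_CARD = (1.0, 8.0)   # renders slower, but is lighter on the CPU

for label, game_cpu_ms in (("weak CPU", 10.0), ("strong CPU", 2.0)):
    fast = fps(game_cpu_ms, *FAST_CARD)
    slow = fps(game_cpu_ms, *SLOW_CARD)
    print(f"{label}: fast card {fast:.1f} fps, slow card {slow:.1f} fps")
```

Under these assumptions the faster card only falls behind on the weak CPU, where its extra CPU cost is what sets the frame time.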

Gregster · 8 Oct 2016 at 09:53

Final8y said:
The bottlenecking of multi-GPU does not apply to single GPUs.
With multi-GPU the same information has to be sent multiple times, and with quadfire that's 4 times minimum.

If a faster single GPU falls below a slower single GPU, then that's because the faster GPU has more CPU overhead.
A CPU bottleneck with single GPUs normally shows up as the FPS being pretty much the same between the faster and slower cards at a given resolution and settings.

A faster GPU should not require more CPU to equal the same FPS as a slower GPU at the same settings.

What a reallllllly bizarre thing to say. How on earth does the bottlenecking of multi GPUs not apply to single GPUs? They will bottleneck like a single GPU will if the CPU isn't up to the job.

humbug · 8 Oct 2016 at 12:32

Rroff said:
The reasons why you are seeing nVidia performance fall off against the FX in situations with an underpowered CPU are quite complex. For one, with e.g. tessellation and some aspects of triangle setup, nVidia repurposes its reprogrammable shader architecture to carry out those tasks rather than dedicating an area of hardware to them - so AMD can just despatch the task and pretty much forget about it CPU-wise until it's done, while nVidia will have a fair amount of back and forth with the CPU as the various GPU threads complete, etc.

Then you have all the DX12 stuff, where in some areas nVidia is using software scheduling, so again the GPU and CPU are communicating back and forth more often, whereas AMD hands off a task list to the GPU with much less back and forth.

That is before you get into anything to do with the number of concurrent threads/pipelines for DX12/Vulkan functionality.

Since Tonga (R9 285 - GCN 1.2) AMD employ similar systems.

Rroff · 8 Oct 2016 at 14:56

humbug said:
Since Tonga (R9 285 - GCN 1.2) AMD employ similar systems.

Not like nVidia does - hence why with Polaris they've taken extra steps to boost throughput of some aspects of triangle rendering, etc.

JediFragger · 8 Oct 2016 at 15:15

For me Pascal is still mainly a DX11 card with some DX12 bits and bobs. Fine with me as DX11 titles will be around for a while yet!

Volta is where it'll be with DX12 and nVidia :cool:

TheRealDeal · 8 Oct 2016 at 16:14

JediFragger said:
For me Pascal is still mainly a DX11 card with some DX12 bits and bobs. Fine with me as DX11 titles will be around for a while yet!

Volta is where it'll be with DX12 and nVidia

Yep. For people like yourself who switch cards most gens, Nvidia are fine, but for people like me who keep cards around 2-3 years, AMD make a lot more sense due to their forward-looking architecture and the added bonus of more VRAM vs Nvidia's offerings. Add to that they are usually cheaper and it's a no-brainer for me.

DarrenM343 · 8 Oct 2016 at 16:42

TheRealDeal said:
Yep. For people like yourself who switch cards most gens, Nvidia are fine, but for people like me who keep cards around 2-3 years, AMD make a lot more sense due to their forward-looking architecture and the added bonus of more VRAM vs Nvidia's offerings. Add to that they are usually cheaper and it's a no-brainer for me.

Not a bad plan but 2-3 years is a long time to keep a mid-range card.
I wouldn't be surprised if Volta actually blows away AMD in DX12, but for that we'll have to wait. It's similar to DX11, if I remember correctly: that time AMD were first out with DX11 cards, but when it mattered, when DX11 was being more utilised, Nvidia came along and did it properly. I do think we'll see Volta in 2017, when more games are built from the ground up for DX12 and when devs and hardware companies have learned how best to utilise it. 2018 is too far away, even if Pascal has the grunt to make up for its lack of other things.

Even when games are built from scratch for DX12, DX12 games will still get better over time as developers get more experienced with it. GPUs will likely get better too at running DX12 games, with both AMD and Nvidia optimising their hardware for it and for how the developers are using it.