Will we see low-end GPUs with DDR4 memory instead of DDR3?

Mauller · 16 Aug 2015 at 08:37

Kaapstad said:
Remember with graphics cards they produce one frame at a time and then move on to the next. What that means is each frame produced at 1080p on a Fury X will not come close to using all the bus width available as there is simply not enough data in each frame to do it.

As to the Thief bench I think you have just dug a hole for yourself.

Let me put it this way then: which method is faster at transferring 100mb from each of 8 memory columns within a chip, so 800mb in total?

A 4gb line, 2x 2gb lines, 4x 1gb lines or 8x 0.5gb lines?

Because all of them are equal, even if you think the single 4gb line would be best. Once the data goes from ram data line to gpu core, parts of the data are routed to core cache or elsewhere.
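
A quick sketch of that arithmetic, for anyone who wants to check it. The units are an assumption (the "mb" and "gb" figures are read here as megabits and gigabits per second per line), and `transfer_time_s` is just an illustrative helper, not part of any real tool. It only models aggregate line throughput and ignores latency, protocol overhead and routing inside the GPU.

```python
def transfer_time_s(total_megabits, lines, gbps_per_line):
    """Idealised time to move total_megabits over `lines` parallel lines."""
    # Convert Gb/s to Mb/s and sum the rate over all lines
    aggregate_mbps = lines * gbps_per_line * 1000
    return total_megabits / aggregate_mbps

# (number of lines, assumed rate per line in Gb/s) for the four layouts in the post
layouts = [(1, 4.0), (2, 2.0), (4, 1.0), (8, 0.5)]
times = {lines: transfer_time_s(800, lines, rate) for lines, rate in layouts}

for lines, seconds in times.items():
    print(f"{lines} line(s): {seconds:.3f} s")
```

Under these assumptions every layout has the same 4 Gb/s aggregate rate, so all four take the same time (0.2 s) to move the 800 megabits.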

And the Mantle Thief comment is perfectly valid as it refers back to the AMD overhead issue. You can watch Greg's Thief video if you'd like to see. It is no different to if a DirectX 12 bench was done; the case is that with a lower-overhead API it beat itself by a large margin, even with the flaky Mantle support.

But as I said, a DX12 benchmark will show if the issue is to do with overhead, which it more likely is. Fiji has far more shaders that need feeding to keep high fps than a 290, besides the architecture of each core being different. And they can't run if the driver is not feeding them fast enough.

Geeman1979 · 16 Aug 2015 at 08:37

You should've just posted the scores, after all that's what the benchmark threads are for to make comparisons between cards.

@ Kaap

Kaapstad · 16 Aug 2015 at 08:43

Mauller said:
Let me put it this way then: which method is faster at transferring 100mb from each of 8 memory columns within a chip, so 800mb in total?

A 4gb line, 2x 2gb lines, 4x 1gb lines or 8x 0.5gb lines?

Because all of them are equal, even if you think the single 4gb line would be best. Once the data goes from ram data line to gpu core, parts of the data are routed to core cache or elsewhere.

And the Mantle Thief comment is perfectly valid as it refers back to the AMD overhead issue. You can watch Greg's Thief video if you'd like to see. It is no different to if a DirectX 12 bench was done; the case is that with a lower-overhead API it beat itself by a large margin, even with the flaky Mantle support.

But as I said, a DX12 benchmark will show if the issue is to do with overhead, which it more likely is. Fiji has far more shaders that need feeding to keep high clocks than a 290. And they can't run if the driver is not feeding them fast enough.

It does not matter how you dress it up 1080p needs high clockspeed on the memory to push out the fps, it does not need a wide bus as there is not enough data in each frame to use it.

Kaapstad · 16 Aug 2015 at 08:45

Geeman1979 said:
You should've just posted the scores, after all that's what the benchmark threads are for to make comparisons between cards.

@ Kaap

I was only using the Thief bench and a few others to test my Fury Xs before putting waterblocks on them.

I was going to do some proper runs but I have hit a few snags at the moment.

Mauller · 16 Aug 2015 at 08:52

Kaapstad said:
It does not matter how you dress it up 1080p needs high clockspeed on the memory to push out the fps, it does not need a wide bus as there is not enough data in each frame to use it.

Memory clocks don't work the way you are thinking. It refers back to the point I just made about line numbers and line speeds. And the internal gddr5 memory frequency is nowhere near the external frequency.

The frequency relates to the line's data rate. Either you have multiple memory columns sending data down a single fast line, or you have a slower data rate and more lines, so fewer columns send data down each line.

If anything it is more efficient to have a wider bus in the way HBM works, since it increases the number of simultaneous reads/writes per chip, thus reducing latency.
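
To put numbers on the clock-versus-width point: peak bandwidth is bus width times per-pin data rate. A small sketch follows; the spec figures are the commonly listed ones for these two cards rather than anything stated in this post, and `peak_bandwidth_gb_s` is a made-up name for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    """Theoretical peak memory bandwidth in GB/s (8 bits per byte)."""
    return bus_width_bits * gbps_per_pin / 8

# Fury X (HBM): 4096-bit bus, 500 MHz double data rate = 1 Gb/s per pin
fury_x = peak_bandwidth_gb_s(4096, 1.0)

# R9 290X (GDDR5): 512-bit bus, 5 Gb/s per pin
r9_290x = peak_bandwidth_gb_s(512, 5.0)

print(f"Fury X:  {fury_x:.0f} GB/s")
print(f"R9 290X: {r9_290x:.0f} GB/s")
```

On those figures the low-clocked, wide HBM bus comes out at 512 GB/s against 320 GB/s for the fast, narrow GDDR5 bus. This is theoretical peak only and says nothing about how well either card is actually kept fed.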

Geeman1979 · 16 Aug 2015 at 08:58

Kaapstad said:
I was only using the Thief bench and a few others to test my Fury Xs before putting waterblocks on them.

I was going to do some proper runs but I have hit a few snags at the moment.

Ok sound, look forward to your results. Up to now, I've seen nothing that impresses me about the Fury X. I'm also curious to see how much coil whine your cards have, if any, once you're up and running.

Silent_Scone · 16 Aug 2015 at 09:53

Kaapstad said:
Total rubbish !!!

Most of the market is @1080p, I would like to think that AMD are not stupid and write their drivers to get the best out of their cards at the resolution.

No it's not, it actually makes sense, although it won't be the whole reason. There is still a fair amount of CPU overhead with AMD drivers within DX11, so it makes perfect sense for the performance to suffer as a direct result, especially as things progress. This may change with W10 and DX12; there are some results that suggest heavy groundwork has been put in place there, so with limited manpower it again makes perfect sense that 1080p optimisation on a flagship product takes a backseat.

The one thing that can be agreed on is that HBM isn't doing a lot for this card. I won't pretend to be overly educated on the matter, but the capacity is definitely a problem for it at UHD and high resolutions. All this talk about memory optimisation is essentially unwittingly holding back temporary buffers in system memory, creating packet traffic unnecessarily. This will have a performance penalty of its own, given the 4GB capacity.

Currently the only saving grace for AMD's flagship is XDMA, and I'm not entirely convinced that, past raw frame scaling, there's much benefit there either over the opposition.

Of course I wouldn't know for myself, as that ship had sailed when looking for stock a few weeks ago.

P.B · 16 Aug 2015 at 10:16

But surely you wouldn't buy a Titan X or Fury X to only play at 1080p. That's why these are touted as 4K cards, so even if the card is crippled at 1080p it wouldn't matter, as they are not designed for that.

Yes they can do it, but they were never intended for it; that's what their midrange cards are designed for.

humbug · 16 Aug 2015 at 13:57

Kaap, explain why the GTX 660 Ti was slower than the GTX 670 @ 1080p despite having the same number of shaders (1344).

Explain why Tahiti LE with 1536 shaders was some 20% slower than Tahiti Pro with 1792 shaders.

David Bisset · 16 Aug 2015 at 15:32

Kaapstad said:
It does not matter how you dress it up 1080p needs high clockspeed on the memory to push out the fps, it does not need a wide bus as there is not enough data in each frame to use it.

Not at all how memory works Kaap sorry!

Clockspeed is just a number - this is true in GPUs, CPUs, DDR, everywhere. All the GPU cares about is throughput and latency, both of which are improved by HBM (latency hardly at all though). Sure, the Fury X only shines at higher resolutions, and there have been many articles on why. Some of them talk - amongst numerous other factors like driver overhead, idle shaders etc - about how the gains from HBM are more noticeable at higher resolutions. This doesn't mean it's slowing performance compared to previous gen memory tech though. HBM isn't to 'blame' for the 1080p performance (which is still well above the low-end market that is being talked about, so how it's going to hold these cards back even if you were right I don't know!)

Kaapstad · 16 Aug 2015 at 15:56

David Bisset said:
Not at all how memory works Kaap sorry!

Clockspeed is just a number - this is true in GPUs, CPUs, DDR, everywhere.

You guys can argue all you want but the bottom line is every review shows the Fury X has poor 1080p performance.

This is despite the fact the card comes with a 9 billion transistor core compared to just over 6 billion for the 290X. This begs the question how does the older card get so close to the newer one @1080p ?

Perhaps you guys can answer it ?

Would the answer be HBM by any chance.

humbug · 16 Aug 2015 at 16:15

Kaapstad said:
You guys can argue all you want but the bottom line is every review shows the Fury X has poor 1080p performance.

This is despite the fact the card comes with a 9 billion transistor core compared to just over 6 billion for the 290X. This begs the question how does the older card get so close to the newer one @1080p ?

Perhaps you guys can answer it ?

Would the answer be HBM by any chance.

Driver overheads + ROP's

Kaapstad · 16 Aug 2015 at 16:44

humbug said:
Driver overheads + ROP's

Don't forget the 290X uses very similar drivers.

Mauller · 16 Aug 2015 at 16:52

Kaapstad said:
Don't forget the 290X uses very similar drivers.

Which is why Hawaii saw a nice bump in performance when the 390/Fury X driver dropped, because it contained driver overhead optimisations over previous drivers.

Also the 290X has far fewer shaders than the Fury X, so the driver is now at the point where it can load the 290X well, but is still not enough to fully load the Fury X yet.

You have to consider that the Nvidia drivers can pump out near double the number of draw calls compared to AMD's. So they can load the 980s and Titan X far better than the AMD driver can load the Fury X at lower resolutions.

Kaapstad · 16 Aug 2015 at 17:10

Mauller said:
Which is why Hawaii saw a nice bump in performance when the 390/Fury X driver dropped, because it contained driver overhead optimisations over previous drivers.

Also the 290X has far fewer shaders than the Fury X, so the driver is now at the point where it can load the 290X well, but is still not enough to fully load the Fury X yet.

You have to consider that the Nvidia drivers can pump out near double the number of draw calls compared to AMD's. So they can load the 980s and Titan X far better than the AMD driver can load the Fury X at lower resolutions.

And this is the reason you are giving for a much smaller card performing nearly as well as a much larger one? I don't think so.

Mauller · 16 Aug 2015 at 17:15

Kaapstad said:
And this is the reason you are giving for a much smaller card performing nearly as well as a much larger one? I don't think so.

Because the driver can't feed the larger card fast enough to allow it to use all of its shaders effectively... how hard is it to understand?

If the card needs to render at a faster rate then it needs to be fed faster. Nothing hard to understand about that. It can only process the data as fast as it receives it.

It is no different to any type of processor, no matter how large the cores are. It can only process the data if it is available.
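
A toy model of that argument, with entirely made-up numbers in arbitrary "work units per second". The function names and figures are hypothetical and are only meant to show the shape of a feed-rate bottleneck, not to measure any real driver or card.

```python
def delivered_rate(driver_feed, gpu_capacity):
    """Work completed per second: limited by whichever side is slower."""
    return min(driver_feed, gpu_capacity)

def utilisation(driver_feed, gpu_capacity):
    """Fraction of the GPU's capacity that is actually being used."""
    return delivered_rate(driver_feed, gpu_capacity) / gpu_capacity

# Illustrative capacities only: the "large" GPU has 60% more capacity
small_gpu, large_gpu = 100, 160

for feed in (110, 200):
    small = delivered_rate(feed, small_gpu)
    large = delivered_rate(feed, large_gpu)
    print(f"feed={feed}: small={small}, large={large}, "
          f"large utilisation={utilisation(feed, large_gpu):.0%}")
```

With a feed of 110 the larger card finishes only 10% more work than the smaller one despite having 60% more capacity; raise the feed to 200 and the full gap appears. The model does not settle whether the real limit is the driver, the ROPs or the memory; it only shows why a starved larger chip can land close to a smaller one.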

P.B · 16 Aug 2015 at 17:37

Then if that's the case, would you see the same trend with BF4 using Mantle?

Kaapstad · 16 Aug 2015 at 17:42

Mauller said:
Because the driver can't feed the larger card fast enough to allow it to use all of its shaders effectively... how hard is it to understand?

If the card needs to render at a faster rate then it needs to be fed faster. Nothing hard to understand about that. It can only process the data as fast as it receives it.

It is no different to any type of processor, no matter how large the cores are. It can only process the data if it is available.

I can not believe this thread, AMD owners blaming the drivers !!!!

The problem is a hardware one and there is only so much the drivers can do however well written they are.

The problem is the low clocked HBM bottlenecking things @1080p. I could name people who know more than any of us debating in this thread who have also said the problem is clockspeed but I won't as it would be unfair on them to get drawn into this.

As for my Fury Xs I am seriously considering throwing them in the bin and walking away as they have caused me way too much trouble with HBM.

The first card I tested straight out the box went pop on the HBM as soon as I ran the Thief bench lol.

The RMA replacement was a very used one with a noisy cooler.

When I finally got all 4 of them together and tried to run them another card went pop on the HBM with memory artifacts all over the screen. This all happened running the cards @stock.

Now do I waste my time and money RMAing another card or do I bin them lol.

HBM is slow and unreliable or maybe my cards came from a bad batch.

Mauller · 16 Aug 2015 at 17:57

And would you like to know something kaap?
Even when HBM2 comes out, it will still be clocked lower than mid range GDDR5.

And humbug asked you a very good question which you tactically ignored.

But I am done now, I have said my bit. You went and jumped the shark and called fanboyism. I was explaining the most likely reasons for the problem and giving concise reasons as to why. But you, as you have done before, added nothing to the conversation, made no attempt to constructively discuss the points I made, and in the end kept browbeating. So there is no point in this discussion continuing.

And it is a shame that you had a bad set of cards and it is unfortunate for you.

Better luck next time old chap.

humbug · 16 Aug 2015 at 18:42

Kaapstad said:
I can not believe this thread, AMD owners blaming the drivers !!!!

The problem is a hardware one and there is only so much the drivers can do however well written they are.

The problem is the low clocked HBM bottlenecking things @1080p. I could name people who know more than any of us debating in this thread who have also said the problem is clockspeed but I won't as it would be unfair on them to get drawn into this.

As for my Fury Xs I am seriously considering throwing them in the bin and walking away as they have caused me way too much trouble with HBM.

The first card I tested straight out the box went pop on the HBM as soon as I ran the Thief bench lol.

The RMA replacement was a very used one with a noisy cooler.

When I finally got all 4 of them together and tried to run them another card went pop on the HBM with memory artifacts all over the screen. This all happened running the cards @stock.

Now do I waste my time and money RMAing another card or do I bin them lol.

HBM is slow and unreliable or maybe my cards came from a bad batch.

That sux

I thought it was HBM too, you may remember me moaning "WTF is it doing????"

But after some debate around here I have changed my mind.

Now I don't know what is holding it back. At the end of the day the performance of Fiji's buffer is measured at 512GB/s, so I don't think it is the memory....

There are two other possibilities.

It's a much bigger core than Hawaii and yet it only has the same number of ROPs (64). Is that a bottleneck? Possibly.

AMD's Drivers are inefficient compared with Nvidia, Mauller makes some very good points on that, no need to repeat what he said, i couldn't do it better.

I think AMD were forced to release it before it was ready, I don't think the architecture is fully completed and i don't think the Drivers are ready, it will come into its own over time.