Remember that graphics cards produce one frame at a time and then move on to the next. That means each frame rendered at 1080p on a Fury X will not come close to using all the available bus width, as there is simply not enough data in each frame to do it.
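A rough way to see the scale involved is to divide peak bandwidth by frame rate, which gives the most data the bus could move during one frame. The 512 GB/s figure is the Fury X's rated HBM bandwidth. The 60 fps target and the 4-bytes-per-pixel color buffer are assumptions for illustration. A real frame also reads textures, geometry and other buffers, so this sketch only shows the size of the per-frame budget, not actual per-frame traffic.

```python
# Rough per-frame memory-bus budget for a Fury X (illustrative assumptions).
PEAK_BANDWIDTH_GBPS = 512   # Fury X rated HBM bandwidth, GB/s
TARGET_FPS = 60             # assumed frame rate
WIDTH, HEIGHT = 1920, 1080  # 1080p
BYTES_PER_PIXEL = 4         # assumed 32-bit color buffer


def per_frame_budget_gb(bandwidth_gbps: float, fps: float) -> float:
    """Most data (GB) the bus could move during one frame at the given rate."""
    return bandwidth_gbps / fps


def color_buffer_mb(width: int, height: int, bpp: int) -> float:
    """Size of one color buffer in MB (10**6 bytes)."""
    return width * height * bpp / 1e6


if __name__ == "__main__":
    budget = per_frame_budget_gb(PEAK_BANDWIDTH_GBPS, TARGET_FPS)
    buffer = color_buffer_mb(WIDTH, HEIGHT, BYTES_PER_PIXEL)
    print(f"Bus budget per frame: {budget:.2f} GB")
    print(f"One 1080p color buffer: {buffer:.2f} MB")
```

At 60 fps that works out to roughly 8.5 GB of bus capacity per frame, against about 8.3 MB for a single 1080p color buffer.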
As to the Thief bench, I think you have just dug a hole for yourself.
Let me put it this way, then: which method is faster at transferring 100 MB from each of 8 memory columns within a chip, so 800 MB in total?
A 4 GB line, 2x 2 GB lines, 4x 1 GB lines or 8x 0.5 GB lines?
They are all equal, even if you think the single 4 GB line would be best. Once the data goes from the RAM data line to the GPU core, parts of it are routed to the core cache or elsewhere.
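The arithmetic behind "they are all equal" can be sketched as follows. This assumes each "GB" line figure means GB/s of throughput, that the 800 MB splits evenly across the lines, and that splitting adds no overhead. Under those assumptions, every split gives the same transfer time, because the data per line and the rate per line shrink by the same factor.

```python
# Hypothetical model: a fixed total link capacity split into N equal lines.
# Assumptions: "4gb" is read as 4 GB/s aggregate, data is split evenly,
# and there is no per-line overhead.
TOTAL_DATA_MB = 8 * 100         # 8 memory columns x 100 MB each = 800 MB
TOTAL_RATE_MB_PER_S = 4000      # assumed aggregate capacity, 4 GB/s


def transfer_time_s(total_mb: float, total_rate_mb_s: float, lines: int) -> float:
    """Time for all lines to finish when data and capacity are split evenly."""
    per_line_data = total_mb / lines
    per_line_rate = total_rate_mb_s / lines
    return per_line_data / per_line_rate


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        rate_gb = TOTAL_RATE_MB_PER_S / n / 1000
        t = transfer_time_s(TOTAL_DATA_MB, TOTAL_RATE_MB_PER_S, n)
        print(f"{n} x {rate_gb} GB/s lines: {t} s")
```

Each configuration takes 0.2 s in this model; any real difference would come from factors the sketch leaves out, such as uneven data placement or routing overhead.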
And the Mantle Thief comment is perfectly valid, as it refers back to the AMD overhead issue. You can watch Greg's Thief video if you'd like to see it. It is no different than if a DirectX 12 bench were done: with a lower-overhead API, the card beat its own result by a large margin, even with the flaky Mantle support.
But as I said, a DX12 benchmark will show whether the issue is to do with overhead, which it more likely is. The Fiji has far more shaders than a 290 that need feeding to keep fps high, besides the architecture of each core being different. And they can't run if the driver is not feeding them fast enough.