As we all know, HBM1 performance can apparently be made to work entirely differently by drivers... and yet somehow HBM1 is still the problem.
From what I recall, doesn't the Fury X beat a 980 Ti at 1080p, and at every resolution and every setting except Hyper (a de-optimised memory storage mode Nvidia is paying devs to use to hurt 4GB cards, Nvidia's and AMD's alike) and the lowest settings? Funny, that.
But still, you know, HBM1 sucks; the fact that it's beating the 980 Ti is apparently irrelevant.
Fury is an unoptimised, overly large core built on an architecture not really designed for that number of shaders. More than anything it was a test bed for HBM: implement it, establish the HBM1 production chain, start ramping it up, learn how HBM behaves, and work out how to optimise the next architecture that will use it. Vega will bring chips with architectural tweaks designed for both HBM and a higher shader count.
Fury X is the reason AMD is able to bring HBM2 to higher volume and cheaper products than Nvidia will manage this generation.
512GB/s via GDDR5 splits roughly into 35W for the chips and 50W for the PHY layer; via HBM1 it's roughly 17W for the chips and 12W for the PHY. GDDR5X barely reduces PHY power at all: regardless of bus width or chip power, moving X amount of data off a chip takes Y power, so a 512-bit bus with 6Gbps chips and a 256-bit bus with 12Gbps chips use about the same PHY power. You'll save maybe 10W at 512GB/s using GDDR5X over GDDR5, because you use half as many chips, but each chip runs faster and draws more power.
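Back-of-the-envelope, using the figures above (these are the post's rough estimates, not measured values):

```python
# Rough memory-subsystem power at 512GB/s, per the estimates above.
gddr5 = {"chips_w": 35, "phy_w": 50}   # GDDR5: chips + PHY layer
hbm1 = {"chips_w": 17, "phy_w": 12}    # HBM1: chips + PHY layer

total_gddr5 = sum(gddr5.values())  # 85W for the whole memory subsystem
total_hbm1 = sum(hbm1.values())    # 29W for the whole memory subsystem

# HBM1 comes out ~56W ahead at the same bandwidth, which is where the
# "roughly 50W" saving quoted later in the post comes from.
print(total_gddr5 - total_hbm1)
```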
HBM2 makes the comparison even worse, since it cuts chip power further to reach 512GB/s.
HBM will always use significantly less power; that ~50W difference can't go away. Take a 250W high-end chip that will probably need around 512GB/s or more: with GDDR5/X, roughly 75W of that goes purely to memory, leaving 175W for the GPU. Within the same power budget, AMD will spend maybe 25W on memory, leaving 225W for the GPU.
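The budget split above works out like this (again a sketch using the post's estimated figures, not vendor numbers):

```python
# Hypothetical 250W high-end card needing ~512GB/s:
# how much of the power budget is left for the GPU itself?
tdp_w = 250

mem_gddr_w = 75  # estimated GDDR5/X chips + PHY at that bandwidth
mem_hbm_w = 25   # estimated HBM2 chips + PHY at that bandwidth

gpu_gddr_w = tdp_w - mem_gddr_w  # 175W left for the GPU with GDDR5/X
gpu_hbm_w = tdp_w - mem_hbm_w    # 225W left for the GPU with HBM2

print(gpu_gddr_w, gpu_hbm_w)
```

Same TDP, but the HBM card gets an extra ~50W of silicon budget, which is the whole argument.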
The bandwidth achievable with GDDR5/X is not the problem and never was. The problem is the power it takes to achieve it. HBM will always use significantly less power than GDDR5.
The sole reason an older GCN architecture could remotely rival Maxwell, a much more updated one, in performance/W was that HBM used far less power: it saved 50W. With GDDR5, Fury X would have needed over 300W or, more likely, 500-1000 fewer shaders; either way performance/watt drops massively and the Nano becomes completely unachievable. Not the size itself, but the compelling performance in that sized package; it would have been so loud and hot it wouldn't have worked.
HBM2 will do the same, and the gap will be even bigger against this generation's even-higher-bandwidth high-end chips.