Keep in mind I'm not saying that GDDR5 is better than, the same as, or has any future compared to HBM generations, but it's going to be at least one more generation until the strengths of HBM are really needed, or mature enough to really make the difference.
I think you're stuck a bit on the older 60-40nm GDDR5 chips; the new stuff is significantly better. Sure, it's still a last hurrah, but it can do 8GHz as standard with 8GB, in the same kind of situation where you're looking at 5.5GHz now, and potentially quite a bit more depending on how much you're prepared to pay for binning in a gaming GPU context. Unfortunately the power savings aren't as dramatic, but the feasible speed increases are just shy of 50%.
Create an architecture designed around 60% more bandwidth and it would suck with less. Not needing that bandwidth on today's products doesn't mean you won't need it on tomorrow's.
I'll also point out that Hawaii chose a bigger bus and lower-clocked chips because that also cuts down on signalling power usage (at lower clock speeds). They decided 512-bit + 5GHz was more power efficient than 384-bit + 7GHz. It's not that the chips couldn't do 7GHz; they used lower clocks on purpose to reduce total memory system power usage, and the chips themselves are the smaller part of the equation.
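A rough back-of-envelope sketch of that trade-off (treating the "GHz" figures as effective data rates in Gbps per pin, the way they're usually marketed):

```python
def gddr5_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: pins * Gbps per pin / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8

# Hawaii's choice: wide bus, lower clocks
hawaii = gddr5_bandwidth_gbs(512, 5)       # 320.0 GB/s
# The narrow-and-fast alternative it passed on
alternative = gddr5_bandwidth_gbs(384, 7)  # 336.0 GB/s

# Comparable bandwidth either way, but the lower clocks cut signalling power
print(hawaii, alternative)
```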
Then we get to the speed increase: chips already did 7GHz "as standard", and this is moving to 8GHz on a new process node. At just over 14%, it's certainly not a 50% increase, and it fails to take into account that the rest of the memory system's power usage will also rise at 8GHz vs 7GHz. Whether the power drop in the 20nm chips even outweighs the increase from the memory controller/signalling is questionable, particularly with everyone's 20nm chips basically sucking balls power-wise, with no massive gains vs previous nodes.
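The two percentages being argued over come from picking different baselines:

```python
# 7GHz -> 8GHz effective data rate: the step GDDR5 is actually taking
generational = (8 - 7) / 7 * 100        # ~14.3%, "just over 14%"

# 5.5GHz -> 8GHz: comparing against today's common clocks instead,
# which is what makes it look like a near-50% jump
vs_current = (8 - 5.5) / 5.5 * 100      # ~45.5%, "just shy of 50%"

print(round(generational, 1), round(vs_current, 1))
```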
Think about the other downsides of GDDR5 even at 20nm. A 32-bit connection per chip means 16 chips are needed to get a 512-bit bus, and with a 20nm minimum density of 1GB, that's 16GB of low-yield memory on the most expensive process yet, plus a huge memory controller to connect it all. HBM can achieve the same bandwidth in only 4 stacks: 4 connections to the GPU, with those connections using a fraction of the power. Yields on 1GHz HBM chips at 1.2V for lower power, then scale it: 2GHz for 256GB/s per stack is only a year or two away, because scaling to 2GHz is trivial for these chips, while scaling GDDR5 to 8GHz is not trivial (power/yield/price wise). It's 16 sets of traces on a PCB, with huge power wasted on signalling, vs 4 on-package connections that save a huge amount of power.
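To put numbers on the "same bandwidth in 4 stacks" claim, here's a sketch assuming the 1024-bit-per-stack interface of first-generation HBM:

```python
def bandwidth_gbs(interface_bits, data_rate_gbps):
    """Peak bandwidth in GB/s for a memory interface."""
    return interface_bits * data_rate_gbps / 8

# GDDR5 at 20nm: 16 chips x 32-bit = 512-bit bus at 8Gbps effective
gddr5 = bandwidth_gbs(16 * 32, 8)         # 512.0 GB/s over 16 sets of PCB traces

# First-gen HBM: 4 stacks, each with a 1024-bit interface, at 1Gbps
hbm_gen1 = bandwidth_gbs(4 * 1024, 1)     # 512.0 GB/s over 4 on-package links

# Double the HBM clock to 2Gbps: 256GB/s per stack, 1TB/s from 4 stacks
hbm_per_stack_2gbps = bandwidth_gbs(1024, 2)  # 256.0 GB/s
```

The point being that HBM gets its bandwidth from interface width at low clocks, so doubling from 1Gbps to 2Gbps is the easy direction to scale.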
HBM also has some early downsides: 1GB per stack is potentially a limitation at the other end of the scale, because they MIGHT be limited to 4 stacks (or they might just go with 6 or 8 stacks and even more bandwidth). However, HBM will scale brilliantly in a short space of time. Two years from now it should be pretty easy to put 16GB of memory providing 1TB/s of bandwidth in only 4 stacks. Not long after that we'll be looking at 32GB in only 4 stacks at the same bandwidth, or using 2-4 more stacks. This will all come with vastly simplified PCBs, making power components cheaper and layout cheap, and the process of making custom cooling will be quicker since the PCBs will be so simple. There are lots of other areas HBM will improve as a by-product of a very simple PCB.
There are many more negatives to GDDR5 that you aren't considering; the speed increase you suggested simply isn't close to accurate, and you've also ignored the power increase from the other parts running at higher speeds.
HBM's biggest advantage is not the stacking directly, nor the clock speeds, nor the lower power per GB that the memory itself uses. It's the connection method: the main power saving comes from being on-package, and GDDR5 will never claw that back. If the memory were magically made on a 0.1nm process and used 0.01W per chip, it would still use more power to run GDDR5 than current HBM. HBM clock speeds are so low they'll double with the next process drop, not go up 14%, and they'll do so at roughly the same power.