Kaap, please reread the post of mine that you quoted. Nothing you're seeing suggests HBM is poor. In fact, what you're seeing supports what DM says the problem is: if ROPs are constraining performance, then memory bandwidth becomes even more important. As Mauller also pointed out, if it were HBM's 'fault', driver changes wouldn't be closing the gap. I'd suggest both of their explanations are relevant and both are contributing to what we see, but HBM is not a factor in any way.
To repeat: clock speed means nothing. At all. As a number on its own it is literally irrelevant when looking at performance. There are two main metrics, latency and bandwidth. Latency is similar between GDDR5 and HBM (and system RAM), as it's bound by the limits of the memory chips themselves. Bandwidth is better with HBM. Clock speed is simply a mechanism for getting bandwidth through your interface, not something that gives any performance in itself; raising clocks helps only because it raises bandwidth. GDDR5 would be worse here: even though a clock speed of 7000 sounds impressive next to 500, comparing the two is completely pointless because HBM's interface is far wider.
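To make the "7000 vs 500" point concrete, here's a rough sketch of how peak bandwidth falls out of per-pin data rate times bus width. The bus widths and data rates below are assumptions chosen as typical examples (a 4096-bit HBM1 interface at 500 MHz double data rate, and a 384-bit GDDR5 interface at 7 Gb/s per pin), not figures from any specific benchmark in this thread:

```python
def bandwidth_gb_s(data_rate_gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s = per-pin data rate (Gb/s) x bus width (bits) / 8 bits per byte."""
    return data_rate_gbps_per_pin * bus_width_bits / 8

# Assumed example configurations, for illustration only:
# HBM1: 500 MHz clock, double data rate -> 1 Gb/s per pin, 4096-bit bus (4 stacks x 1024 bits)
hbm = bandwidth_gb_s(data_rate_gbps_per_pin=2 * 0.5, bus_width_bits=4096)
# GDDR5: "7000" effective -> 7 Gb/s per pin, 384-bit bus
gddr5 = bandwidth_gb_s(data_rate_gbps_per_pin=7.0, bus_width_bits=384)

print(f"HBM:   {hbm:.0f} GB/s")    # HBM:   512 GB/s
print(f"GDDR5: {gddr5:.0f} GB/s")  # GDDR5: 336 GB/s
```

The much lower clock still wins on bandwidth because the bus is more than ten times wider, which is why the clock number alone tells you nothing.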
The only time clock speed can serve as even a slight performance indicator is when comparing two systems using the same technology, and even then it's a poor one. We don't generally see GDDR timings, so sometimes a higher clock speed comes with increased latency: you're trading one relevant metric for another, which means situationally worse performance.
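A quick sketch of that trade-off: real latency is a timing (counted in clock cycles) divided by the clock frequency. The two parts and their timings below are purely hypothetical, picked only to show that the higher-clocked part can end up with worse latency:

```python
def latency_ns(cycles: int, clock_mhz: float) -> float:
    """Convert a timing expressed in clock cycles into nanoseconds."""
    return cycles * 1000 / clock_mhz

# Hypothetical timings for two parts of the SAME memory type
part_a = latency_ns(cycles=15, clock_mhz=1500)  # lower clock, tighter timing
part_b = latency_ns(cycles=20, clock_mhz=1750)  # higher clock, looser timing

print(f"Part A: {part_a:.2f} ns")  # Part A: 10.00 ns
print(f"Part B: {part_b:.2f} ns")  # Part B: 11.43 ns
```

Part B has the "faster" clock but takes longer to answer a request, so clock speed alone would point you the wrong way.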
You might find this series useful. Though written a while ago, it's still fairly relevant to today's cards:
https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/
(Edit, for any pedants: yes, bandwidth and latency can alternatively be represented by the trio of frequency, bus width and timings. But as bus width is clearly better with HBM, we're just combining width with frequency for bandwidth, and timings with frequency for latency, which makes frequency on its own not a relevant stat. As we don't normally see timings we can't compare those easily, but we can compare latency, and doing so reveals matching latency even when HBM is used as 'wide GDDR' rather than taking advantage of its better addressing.)