If it's HBM, it comes in stacks of 128bit with 128GB/s of bandwidth each. Each stack is at least 1GB but could be 2GB. Initially only 1GB stacks were being made, with 2GB to come later, so it depends on whether the delay in 20nm pushes the release date out to when 2GB stacks are available.
If you presume a 512bit bus, then you could fit 4 HBM stacks, which would give 512GB/s of bandwidth with options of 4GB or 8GB.
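To make the stack arithmetic explicit, here is a rough sketch using only the figures quoted above (128GB/s per stack, 1GB or 2GB per stack). The function and constant names are made up for illustration, and the numbers are the ones from this post, not confirmed specs for any product.

```python
# Figures quoted in the post (assumptions, not confirmed product specs).
GBPS_PER_STACK = 128        # bandwidth per HBM stack, in GB/s
STACK_SIZES_GB = (1, 2)     # capacity options per stack, in GB


def hbm_config(stacks):
    """Return (total bandwidth in GB/s, capacity options in GB) for a stack count."""
    bandwidth = stacks * GBPS_PER_STACK
    capacities = tuple(stacks * size for size in STACK_SIZES_GB)
    return bandwidth, capacities


# Print every configuration from 1 to 4 stacks.
for n in range(1, 5):
    bw, caps = hbm_config(n)
    print(f"{n} stack(s): {bw} GB/s, {caps[0]} or {caps[1]} GB")
```

Bandwidth and capacity both scale with the stack count, so you can't add one without the other.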
Whether they will need this much bandwidth is unknown. On the one hand you have new compression that vastly improves efficiency, meaning what perhaps required 300GB/s of bandwidth before now only requires 200GB/s. On the other hand, moving to 20nm we'll have bigger cores with more shaders and more brute power.
Each extra chip on an interposer also decreases yields, and the differences in yield are quite large, to the point that if they can use fewer stacks they will.
It's quite possible we'll see a midrange with 256GB/s of bandwidth from 2 stacks and 4GB, and a high end with 384GB/s from 3 stacks of HBM giving 6GB. That 384GB/s would probably be more or less equivalent to 512GB/s on the pre-Tonga architecture.
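As a rough sanity check on that equivalence, the sketch below scales raw bandwidth by a compression gain. Both gains are only back-of-envelope guesses taken from this post: 300/200 from the earlier "300GB/s now only needs 200GB/s" example, and 512/384 from the equivalence claimed here. The function name is made up for illustration.

```python
from fractions import Fraction


def equivalent_bandwidth(raw_gbps, compression_gain):
    """Bandwidth an uncompressed design would need to match raw_gbps with compression."""
    return raw_gbps * compression_gain


# Gains implied by the post's own numbers (guesses, not measured figures).
gain_from_example = Fraction(300, 200)   # "300GB/s before, 200GB/s now" -> 3/2
gain_implied_here = Fraction(512, 384)   # "384GB/s ~ 512GB/s pre-Tonga" -> 4/3

for raw in (256, 384):
    low = equivalent_bandwidth(raw, gain_implied_here)
    high = equivalent_bandwidth(raw, gain_from_example)
    print(f"{raw} GB/s raw is roughly {float(low):.0f} to {float(high):.0f} GB/s pre-Tonga equivalent")
```

The two gains don't quite agree (4/3 versus 3/2), which is why "more or less" is doing real work in that estimate.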
I would say it's exceptionally unlikely that desktop products will have more than 4 stacks, and they might be limited to 4GB with that. I can't remember the dates precisely, but I was under the impression the 2GB stacks were due 8-12 months after 1GB stacks became available, and 4GB stacks looked like they would be another 2+ years after that. Ultimately Nvidia and AMD will be limited by the same availability of stacks and won't be able to just double up memory by doubling the number of chips as they've been able to do in the past (once both are doing HBM, that is).
AMD are obviously WAY closer to 20nm production than Nvidia, and as such Nvidia put a LOT of extra work into pushing a 20nm architecture into a 28nm design. AMD went with a test product in Tonga, by the looks of things, to play around with compression, giving them both a chance to try out a large architecture improvement and time to optimise it before the next set of cards on 20nm.
It would seem to be a very good idea to do Tonga with the new memory compression to get a huge amount of data on how it works now, because HBM will be a monumental change in the memory architecture. Spreading these two changes across a longer period and different cards is a very good idea. Lots of brand new ideas all on the same chip on a new process is something almost everyone avoids now. Intel, AMD and Nvidia have frequently tried to make some of the changes on a test product first: sometimes it's new process + old architecture, sometimes it's new architecture + old process. Even looking at Apple with the A8, it's a fairly minor upgrade to Cyclone on a new process. A9 is likely to be a bigger architectural change on a similar process.
If Nvidia had 20nm cards coming as early as February, it's almost certain they wouldn't have put as much money into an incredibly short-term 28nm product.