It's a bit like RAID 1 (mirroring).
When going to two GPUs, you are generally doing it to (hopefully) double your performance.
Two GPUs means twice the processing power, but each GPU keeps a complete copy of all the data, so that it can use all of its own memory bandwidth for itself, without having to worry about its partner.
If we moved to the model you are suggesting, 4+4=8, it implies no duplication of data for textures etc. So GPU A needs as much access to GPU B's VRAM as to its own, and vice versa, and all of that data would have to travel over the PCIe bus. Let's take the 290X, with 320 GB/s of memory bandwidth; a PCIe 3.0 x16 connector gives you only about 16 GB/s in each direction.
So, in the time the GPU can pull 1 GB from the other GPU's VRAM, it can pull 20 GB from its own. And given that with an LGA 1155 board you are talking PCIe 3.0 x8 for each card, it's actually a 1:40 ratio; that is, it's 40 times quicker to read from local VRAM than from remote VRAM, and that's assuming the PCIe bus is used for NOTHING ELSE.
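If you want to sanity-check those ratios yourself, here's a back-of-envelope sketch using the CUDA runtime API (purely illustrative; the 290X is an AMD card, so run it on whatever NVIDIA GPU you have, and note that the PCIe figures are hard-coded assumptions, not measured values):

```cpp
// Back-of-envelope comparison of local VRAM vs. PCIe bandwidth.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Theoretical peak VRAM bandwidth in GB/s:
    // memory clock (kHz) * 2 (double data rate) * bus width (bytes) / 1e6
    double vram = 2.0 * prop.memoryClockRate * (prop.memoryBusWidth / 8.0) / 1.0e6;

    const double x16 = 16.0;  // assumed PCIe 3.0 x16, GB/s per direction
    const double x8  = 8.0;   // assumed PCIe 3.0 x8, GB/s per direction

    printf("local VRAM: %.0f GB/s\n", vram);
    printf("x16 ratio : 1:%.0f\n", vram / x16);
    printf("x8 ratio  : 1:%.0f\n", vram / x8);
    return 0;
}
```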
Given this, pooling the memory would slow the system down so much that the only practical option is for each GPU to have its own copy of all the data.
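To make that concrete, here's a minimal sketch of what "one full copy per GPU" looks like in code; the 64 MB buffer standing in for texture data is an arbitrary placeholder:

```cpp
// Minimal sketch of the duplication model: the same data is uploaded
// once per GPU, so each device only ever reads its own VRAM.
#include <cuda_runtime.h>
#include <vector>

int main() {
    const size_t kBytes = 64u * 1024 * 1024;   // hypothetical 64 MB "texture"
    std::vector<char> host(kBytes, 0);

    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);
        char* d_copy = nullptr;
        cudaMalloc(&d_copy, kBytes);           // one full copy per GPU
        cudaMemcpy(d_copy, host.data(), kBytes, cudaMemcpyHostToDevice);
        // ... each GPU renders/computes from its local copy ...
        cudaFree(d_copy);
    }
    return 0;
}
```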
All of this also applies to dual-GPU cards such as the 295X2: they have a PCIe bridge on the card, and the two GPUs still communicate over PCIe.
I don't foresee it ever being cheaper to build a replacement for PCIe that's as fast as the GPU memory bus than it is to just add more VRAM, so I don't think this will ever happen.
That said, in some GPGPU workloads memory bandwidth isn't as crucial, and there this approach can work.
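CUDA actually exposes exactly this over PCIe through peer-to-peer access. A hedged sketch, assuming two NVIDIA GPUs at device IDs 0 and 1 on a P2P-capable platform, with error checking omitted for brevity:

```cpp
// Peer-to-peer access: one GPU reading another's VRAM over PCIe.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t kBytes = 1 << 20;               // 1 MB demo buffer

    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);   // can GPU 0 reach GPU 1's VRAM?
    if (!canAccess) { printf("no P2P path between GPU 0 and GPU 1\n"); return 1; }

    cudaSetDevice(1);
    float* d_remote = nullptr;
    cudaMalloc(&d_remote, kBytes);               // lives in GPU 1's VRAM

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);            // GPU 0 may now map GPU 1's VRAM
    // From here, kernels launched on GPU 0 can dereference d_remote
    // directly, but every access crosses the PCIe bus.

    float* d_local = nullptr;
    cudaMalloc(&d_local, kBytes);
    cudaMemcpyPeer(d_local, 0, d_remote, 1, kBytes); // explicit remote-to-local copy
    return 0;
}
```

Every touch of the remote pointer pays the PCIe toll, which is why this only makes sense for kernels that are compute-bound rather than bandwidth-bound.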