Here is a video of the Cycles render engine rendering a scene using 2 Nvidia GPUs (of different models).
https://youtu.be/dreR2z8Kgyk?t=6m34s
The cards don't need to be in SLI to work together; they just need to have drivers installed.
The scene is copied onto both GPUs and rendered out. From memory, Cycles is a path tracer: it fires rays out from the camera and bounces them around the scene until they reach a light. Because each GPU holds a full copy of the scene, the reflections in a tile that is currently being rendered can include geometry whose own pixels haven't been rendered out yet. I must admit I'm not entirely sure how it all works from a coding perspective, but it does show that it's possible for 2 cards to render different parts of a scene simultaneously.
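To make the camera-to-light idea concrete, here's a toy sketch of that loop. Everything in it (the one-sphere scene, the constant "sky" light, the albedo value, the crude hemisphere sampling) is made up for illustration; Cycles' real kernels are far more involved.

```python
import math
import random

# Hypothetical minimal scene: one diffuse sphere lit by a constant "sky".
SPHERE_CENTER = (0.0, 0.0, -3.0)
SPHERE_RADIUS = 1.0
ALBEDO = 0.7   # fraction of light the surface reflects per bounce
SKY = 1.0      # radiance picked up when a ray escapes the scene

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def hit_sphere(origin, direction):
    """Distance to the sphere along a unit-length ray, or None on a miss."""
    oc = sub(origin, SPHERE_CENTER)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - SPHERE_RADIUS ** 2
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 1e-4 else None

def scatter(normal):
    """Random direction in the hemisphere around the surface normal
    (a crude stand-in for proper cosine-weighted diffuse sampling)."""
    while True:
        v = tuple(random.uniform(-1.0, 1.0) for _ in range(3))
        n2 = dot(v, v)
        if 1e-6 < n2 <= 1.0:
            n = math.sqrt(n2)
            d = tuple(x / n for x in v)
            return d if dot(d, normal) > 0.0 else tuple(-x for x in d)

def trace(origin, direction, max_bounces=4):
    """Follow one camera ray, attenuating its throughput at each bounce."""
    throughput = 1.0
    for _ in range(max_bounces):
        t = hit_sphere(origin, direction)
        if t is None:
            return throughput * SKY   # ray escaped: it "found" the sky light
        origin = tuple(o + t * d for o, d in zip(origin, direction))
        normal = tuple((h - c) / SPHERE_RADIUS
                       for h, c in zip(origin, SPHERE_CENTER))
        direction = scatter(normal)
        throughput *= ALBEDO
    return 0.0   # path terminated before reaching any light
```

The point is that each pixel's rays depend only on the scene data, not on other pixels, which is why a full scene copy per GPU lets each card render its own tiles independently.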
The biggest issue seems to be distributing the workload and syncing up memory. HBCC could potentially take care of the memory side. As for distributing the workload, either AdoredTV or NerdTechGasm mentioned that with the NCU coming in Vega there should be an improved load-balancing system, since load balancing was one of the things that bottlenecked the Fury cards.
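The "distribute the workload" part can be sketched very simply when every device has a full scene copy: cut the frame into tiles, hand tiles out across devices, and merge the results. The names and numbers below (the 8x8 frame, `render_tile` returning which device painted each pixel, round-robin assignment) are all invented for illustration; a real renderer would launch GPU kernels and balance tiles by measured device speed.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scene: each "device" holds its own full copy,
# so a tile only needs pixel coordinates to be rendered.
SCENE = {"width": 8, "height": 8}

def render_tile(device_id, tile):
    """Placeholder kernel: tag each pixel with the device that rendered it."""
    x0, y0, x1, y1 = tile
    return {(x, y): device_id for x in range(x0, x1) for y in range(y0, y1)}

def split_tiles(width, height, tile_size):
    """Cut the frame into independent rectangular tiles."""
    return [(x, y, min(x + tile_size, width), min(y + tile_size, height))
            for y in range(0, height, tile_size)
            for x in range(0, width, tile_size)]

def render(num_devices=2, tile_size=4):
    tiles = split_tiles(SCENE["width"], SCENE["height"], tile_size)
    image = {}
    # One worker per device; tiles are dealt out round-robin and the
    # finished pieces are merged back into a single frame buffer.
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        futures = [pool.submit(render_tile, i % num_devices, tile)
                   for i, tile in enumerate(tiles)]
        for f in futures:
            image.update(f.result())
    return image
```

A smarter scheduler would give faster cards more tiles (or let idle devices steal tiles from a shared queue), which is roughly what improved load balancing would buy.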
Edit: Found the video. Watch until 16:25.
https://youtu.be/m5EFbIhslKU?t=13m33s
Look at the AMD slide on improved load balancing. I believe that each row after the Intelligent Workgroup Distributor block represents a geometry processing engine. If you keep watching, he says that GCN can support a maximum of 4 of these.
So a few things to consider: why haven't they shown only 4 rows? It would fit on the slide. A few theories:
1. Vega has more than 4 shader engines and AMD didn't want to reveal this.
2. The new IWD can scale to as many geometry engines as are present.
3. AMD just thought it looked better and it means nothing.