I can't find a block diagram showing how the nF200 is connected. But anyway, I don't get why it should be faster than, or even as fast as, having the GPUs connected directly to the CPU.
The nF200 must be connected somehow; it doesn't magically add 32 PCIe lanes. Looking at the block diagram of the P67, it must be connected either to 8 of the CPU's 16 lanes or to all 8 of the P67 chipset's own lanes. In the first scenario you'll have 8x, 4x, 4x: the nF200 is connected to 8 PCIe lanes, which is a bottleneck, and how it can magically add another 24x is beyond me. In the second scenario you'll have 16x, 4x, 4x, with added latency because of the P67.
Looking at the graphic from the review, where only 2 cards are connected, the first one is running at 16x through the nF200, so I guess the second scenario applies. I know that the links between the nF200 and the cards run at 16x, but if the nF200 is connected to the P67 at 8x PCIe 2.0, then nothing will be faster than that...
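To put rough numbers on the bottleneck argument, here is a small back-of-the-envelope sketch. It assumes PCIe 2.0 signalling (5 GT/s per lane with 8b/10b encoding, so about 500 MB/s usable per lane in each direction) and an 8x uplink as guessed above. The function names are my own, and the model only covers traffic between the GPUs and the host, shared equally; it ignores any GPU-to-GPU traffic a switch might handle without going upstream.

```python
# PCIe 2.0: 5000 Mbit/s raw per lane, 8b/10b encoding (80% efficient),
# 8 bits per byte -> 500 MB/s usable per lane, per direction.
PCIE2_MB_PER_LANE = 5000 * 8 // 10 // 8


def link_bandwidth_mb(lanes):
    """Theoretical one-direction bandwidth of a PCIe 2.0 link, in MB/s."""
    return lanes * PCIE2_MB_PER_LANE


def per_gpu_host_bandwidth_mb(uplink_lanes, gpu_lanes, active_gpus):
    """Upper bound on host bandwidth per GPU behind a switch.

    Simplifying assumption: all active GPUs talk to the host at the same
    time and share the uplink equally.
    """
    uplink = link_bandwidth_mb(uplink_lanes)
    gpu_link = link_bandwidth_mb(gpu_lanes)
    return min(gpu_link, uplink / active_gpus)


if __name__ == "__main__":
    # One card on a 16x slot behind a switch with an 8x uplink (assumed).
    print(per_gpu_host_bandwidth_mb(8, 16, 1))
    # Two cards behind the same switch, both busy.
    print(per_gpu_host_bandwidth_mb(8, 16, 2))
    # One card on a 16x link straight to the CPU, for comparison.
    print(link_bandwidth_mb(16))
```

Under these assumptions the 16x link between the switch and a card never gets more host bandwidth than the 8x uplink can carry, which is the point made above.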
Edit:
I forgot to add the P67 block diagram I was talking about:
http://www.itcode.org/wp-content/plugins/RSSPoster_PRO/cache/087c7_p67_block_diagram.png