Isn't the cache there because they are using the FSB and there isn't enough bandwidth for more cores?
Yes, exactly, but with a huge L3 cache there is no need for such large L2 caches on each pair of cores. It's just a "hack" to get Penryn cores working in an "almost" native way. It should work well, but it's expensive in terms of die space.
The truly native Nehalems won't have the same issue, as they are a native design in the first place and (I believe) use a single shared L2 cache with no L3 at all. And of course they will have CSI (similar in concept to HyperTransport), so inter-core communication will be very good, without the same cost in terms of die space.