The problem is that, with the cards we've seen, a 60% advantage in one benchmark may or may not take a huge hit in tessellation performance in a real game; we've yet to see. The bigger problem comes if the actual release cards aren't anywhere near as powerful as the "architecture" we've been shown (not cards, the architecture). A 448sp part with significantly lower clocks won't have anywhere near a 60% advantage to begin with: it loses over 10% of its raw tessellation power with 2 fewer clusters, and the remaining clusters run at lower clocks too. If it's to be believed that we're looking at 25% clock drops and 448sp for any "available" parts, that could be a hit of roughly a third in raw tessellation power BEFORE any real-world performance loss on top of that.
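To put rough numbers on that, here's a quick back-of-envelope sketch. It assumes raw tessellation throughput scales linearly with both the number of active shader processors and the clock, and that the benchmark lead scales the same way; neither is a measured fact, and the names are just made up for illustration.

```python
# Back-of-envelope estimate only. Assumes raw tessellation throughput scales
# linearly with active shader processors and with clock speed (an assumption,
# not a measured result).

FULL_SPS = 512       # full part, as shown in the architecture previews
CUT_SPS = 448        # rumoured "available" part (2 fewer clusters)
CLOCK_DROP = 0.25    # rumoured 25% clock reduction
BENCH_LEAD = 0.60    # the 60% lead quoted from one benchmark


def relative_throughput(sps, full_sps, clock_drop):
    """Fraction of the full part's raw throughput left after both cuts."""
    cluster_factor = sps / full_sps     # share of shader processors kept
    clock_factor = 1.0 - clock_drop     # share of clock speed kept
    return cluster_factor * clock_factor


remaining = relative_throughput(CUT_SPS, FULL_SPS, CLOCK_DROP)
cluster_loss = 1.0 - CUT_SPS / FULL_SPS
total_loss = 1.0 - remaining
# Lead left over if the benchmark result scaled the same way (assumption).
scaled_lead = (1.0 + BENCH_LEAD) * remaining - 1.0

print(f"loss from fewer clusters alone: {cluster_loss:.1%}")
print(f"combined loss with clock drop:  {total_loss:.1%}")
print(f"benchmark lead after scaling:   {scaled_lead:.1%}")
```

Under those assumptions the cut part keeps about two thirds of the full part's raw throughput, and most of the benchmark lead disappears before any real-game overhead is counted.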
As everyone's said a million times, without card specs we have NO idea about anything. There's nothing to suggest we'll see a 512sp card at whatever clocks were used in those benchies at all.
I think the problem with non-fixed-function tessellation comes from the devs not easily knowing how much they can add. AMD's implementation pretty much lets the devs know to an exact degree how much tessellation every single card in the series can handle, so they can optimise games knowing exactly how much they can add without harming performance elsewhere. With a variable output ability, and a changing amount of power from one card to another, it will be far harder to scale tessellation.
A game targeting AMD cards might find it can tessellate all characters to X depth, plus buildings, but has to leave the ground flat for this generation. Still, it will work on most hardware and won't give you changing framerates depending on what area of the game you're in.
This is the problem: power-wise it will be very hard for any card to just tessellate every last thing like in the Unigine demo, so a fixed level to work to should make it fairly easy to implement smoothly.
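As a rough illustration of working to a fixed level, here's a minimal sketch of how a dev might split a known tessellation budget across asset classes. Everything in it is hypothetical: the budget, the triangle counts, the priority order and the function name are made up, and it assumes a uniform tessellation factor f turns one triangle into roughly f * f triangles.

```python
# Hypothetical sketch: all names and numbers are illustrative, not taken from
# any real engine or GPU spec.

# Assumed fixed per-frame budget of output triangles the tessellator can
# sustain (made-up figure).
TRIANGLE_BUDGET = 2_400_000

# (asset class, base triangles on screen), listed in priority order.
ASSETS = [
    ("characters", 50_000),
    ("buildings", 80_000),
    ("ground", 120_000),
]

MAX_FACTOR = 4  # highest tessellation factor we are willing to use


def plan_tessellation(assets, budget, max_factor):
    """Greedily give each asset class the highest factor that still fits.

    Assumes a uniform factor f turns one triangle into roughly f * f
    triangles. A factor of 1 means the asset is left flat.
    """
    # Everything gets drawn untessellated at the very least.
    remaining = budget - sum(base for _, base in assets)
    plan = {}
    for name, base in assets:
        chosen = 1
        for factor in range(max_factor, 1, -1):
            extra = base * (factor * factor - 1)  # triangles added on top
            if extra <= remaining:
                chosen = factor
                remaining -= extra
                break
        plan[name] = chosen
    return plan


plan = plan_tessellation(ASSETS, TRIANGLE_BUDGET, MAX_FACTOR)
for name, factor in plan.items():
    print(f"{name}: factor {factor}")
```

With these made-up numbers the characters and buildings get the full factor and the ground stays flat. The point is only that a known, fixed budget lets that split be decided once up front rather than per card.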
But as I've said before, it will be great if tessellation becomes a massively used thing. It's definitely not going to sit completely unused the way AMD's tessellator has since the 2900 XT, so next gen they'll know they want tessellation and that all game devs want it (which seems to be the case). Dedicating an extra X amount of transistors then isn't a huge risk at all, while this gen it was.
If both companies move to 28nm next rather than 32nm, there's going to be a HUGE increase in the number of transistors they can stuff in while still ending up with tiny cores compared to this generation. On that process we should be back to good yields, tiny cores and low prices, which simply aren't possible on 40nm like they were at 55nm. Even with a vastly enlarged tessellator unit and a huge bump in raw shader power next gen, they'll be small cores if they skip 32nm.
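For a feel of the difference between the two nodes, here's an idealised sketch. It assumes transistor density scales with the inverse square of the node name, which real processes only loosely follow, so treat the output as a ballpark and nothing more.

```python
# Idealised scaling only: assumes transistor density goes with the inverse
# square of the node name, which real processes only approximate.

def density_gain(old_nm, new_nm):
    """Ideal factor by which transistors per unit area increase."""
    return (old_nm / new_nm) ** 2


for new_nm in (32, 28):
    gain = density_gain(40, new_nm)
    print(f"40nm -> {new_nm}nm: ~{gain:.2f}x transistors in the same area")
```

On that ideal scaling, 28nm fits roughly twice as many transistors in the same area as 40nm, against about one and a half times for 32nm.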