Because these things never scale linearly; other bottlenecks come into play. 11% would be the increase in theoretical compute, but ROPs, TMUs, cache, the command processor, schedulers, geometry and tessellation units, and memory bandwidth all matter too.
You see this with all the cut-down cards: they are never as slow as the theoretical difference suggests. The 1070 is much faster than its reduction in shader units would indicate, likewise the 980 Ti compared to the Titan X, etc.
So yeah, a full 40 CU part would in theory be 11% faster, in reality somewhat short of that, something like 7%, all depending on the exact game and where the bottleneck lies.
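
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is only a toy Amdahl-style model, not a measurement. It assumes the cut-down part has 36 CUs (which is what makes 40 CUs come out to about 11%) and that some fraction of frame time scales with CU count while the rest is limited by something else. The function names and the 0.65 fraction are made up for illustration; the fraction is simply picked so the result lands near the 7% guess.

```python
def theoretical_gain(full_cus, cut_cus):
    """Speedup if performance scaled perfectly with CU count."""
    return full_cus / cut_cus - 1.0


def effective_gain(full_cus, cut_cus, cu_bound_fraction):
    """Toy Amdahl-style estimate.

    cu_bound_fraction is the (assumed) share of frame time that actually
    scales with CU count; the rest is limited by ROPs, bandwidth, geometry
    and so on, and does not get faster with more CUs.
    """
    scale = full_cus / cut_cus
    # Normalised frame time after adding CUs: the non-CU part is unchanged,
    # the CU-bound part shrinks by the CU ratio.
    new_time = (1.0 - cu_bound_fraction) + cu_bound_fraction / scale
    return 1.0 / new_time - 1.0


# 36 CUs for the cut-down part is an assumption, as is the 0.65 fraction.
print(f"theoretical: {theoretical_gain(40, 36):.1%}")
print(f"effective:   {effective_gain(40, 36, 0.65):.1%}")
```

With those assumed inputs the theoretical figure comes out around 11% and the effective one around 7%. A game that is more shader-bound pushes the fraction (and the gain) up, while one limited by bandwidth or geometry pushes it down.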