Er, that is exactly what is happening, isn't it?
Mantle doesn't run on Nvidia hardware, so a game has to fall back to the default DirectX path, which is a lot slower.
GameWorks doesn't run on non-CUDA hardware, so it has to fall back to the default CPU path, which is a lot slower.
A lot of GameWorks stuff has been ported to DirectCompute instead of running on CUDA like the original versions (hence Flex, etc., rather than the original PhysX-based versions of the functionality), so people aren't forced onto the CPU path on hardware that doesn't support CUDA.