The CU can't push its calculations through a 128-bit pipe as well as it can through a 256-bit pipe; that's an actual bottleneck.
It's only Cinebench that operates this way, and it does so to give you a single-threaded performance option. But it's flawed, as even Cinema 4D itself doesn't operate this way. It's fully threaded.
But in the single-core bench, you're saying it's got one 128-bit pipe for one core, while in the multi-core bench it will have one 256-bit pipe shared by 2 cores. Is that any different? And why does it result in worse performance in the multi-core bench?
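To put rough numbers on the question: here's a toy Python sketch of peak SIMD width per core. It assumes (hypothetically) that the pipe is fully utilised, shared evenly between the cores using it, and handles one op per cycle. It ignores latency, scheduling and contention, so it's only the raw arithmetic, not a model of the actual chip.

```python
# Toy model only. Assumptions (hypothetical): pipe fully utilised,
# shared evenly between cores, one op issued per cycle.

def bits_per_core(pipe_width_bits: int, cores_sharing: int) -> float:
    """Peak SIMD bits per cycle each core gets from a pipe shared by cores_sharing cores."""
    return pipe_width_bits / cores_sharing


def passes_per_op(op_width_bits: int, pipe_width_bits: int) -> int:
    """Pipe passes needed for one op of op_width_bits (ceiling division)."""
    return -(-op_width_bits // pipe_width_bits)


single = bits_per_core(128, 1)  # one 128-bit pipe, one core
multi = bits_per_core(256, 2)   # one 256-bit pipe, two cores
print(single, multi)            # 128.0 128.0
print(passes_per_op(256, 128))  # 2: a 256-bit op takes two passes on a 128-bit pipe
```

In this simplified model both cases give each core the same 128 bits per cycle, so raw width alone doesn't explain a multi-core difference.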