I have been doing some research into the effect of HTT/SMT on gaming and ended up finding this thread, which links to other threads. It's a GPU discussion, but what is interesting is the effect of AMD and Nvidia policies on CPU usage, and it answers a lot of the questions I had.
https://forums.guru3d.com/threads/discussion-on-async-compute-from-amd-prospective.423079/
So, to summarise:
On DX11, Nvidia's driver looks at how many hardware threads are available (so CPUs with logical threads have more of them) and then offloads draw calls from the main CPU core to those threads to boost performance. (AMD not doing this has been the main thing holding it back on DX11 as well.)
Also, AMD prefer developers to use parallel execution, whilst Nvidia prefer concurrent execution, and from reading how it works, concurrent seems much less efficient. Some of you on here may remember I have commented a few times that, when games are loaded, I have observed 2/3 of my CPU utilisation going on context switches. Yet when I did things like limiting games to a single core to get rid of that overhead, performance often dropped. After reading up on GeForce CMDLIST, I now have all the answers to this.
So basically, Nvidia's drivers are very wasteful with CPU resources, but in DX11 this often gains performance because it bypasses a bottleneck (the single main thread having to submit all the draw calls), at the cost of needing more CPU power. This explains why CPUs with logical threads have seen gains in certain games: the amount Nvidia offloads from the main CPU core is directly determined by the number of CPU threads available to the driver. It isn't because SMT/HTT CPUs are faster (which I already knew they aren't); it's because the driver offloads more when more threads are available. Basically, the pie is cut into more slices, so each slice is smaller, including the one left on the main core/thread.
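To illustrate the "pie slices" point, here is a rough toy model in Python. It is not how Nvidia's driver actually schedules work: the even split across threads, the fixed per-thread overhead, the numbers, and the function name `split_draw_call_work` are all hypothetical assumptions. It just shows how the main thread's share can shrink as threads are added while total CPU usage goes up.

```python
def split_draw_call_work(work_ms: float, hw_threads: int,
                         overhead_ms_per_worker: float) -> tuple[float, float]:
    """Hypothetical toy model of DX11 draw-call offloading (not Nvidia's real algorithm).

    Assumes one frame's draw-call work is split evenly across every available
    hardware thread, and that each extra worker thread adds a fixed coordination
    cost (context switches, syncing).
    Returns (time left on the main thread, total CPU time spent).
    """
    if hw_threads < 1:
        raise ValueError("need at least one hardware thread")
    # More threads means more slices of the pie, so the main thread's slice shrinks.
    main_thread_ms = work_ms / hw_threads
    # Every worker beyond the main thread adds overhead, so total CPU use grows.
    total_cpu_ms = work_ms + overhead_ms_per_worker * (hw_threads - 1)
    return main_thread_ms, total_cpu_ms


if __name__ == "__main__":
    # Made-up numbers: 12 ms of draw-call work per frame, 1 ms overhead per worker.
    for threads in (1, 4, 8):
        main_ms, total_ms = split_draw_call_work(12.0, threads, 1.0)
        print(f"threads={threads}: main thread {main_ms:.1f} ms, "
              f"total CPU {total_ms:.1f} ms")
```

With these made-up numbers, going from 4 to 8 threads halves the work left on the main thread (3 ms to 1.5 ms) while total CPU time rises from 15 ms to 19 ms, which is the "wasteful but faster" trade-off described above.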
I recommend reading that thread as well as many of the spin-off links; there is some very interesting information there, mainly on GPUs and the way they interact with CPUs for gaming performance.
How much of this applies to the older DX9 I have no idea, but of course we know Vulkan and DX12 are engineered for high thread counts from the ground up, so those will benefit from more CPU cores pretty much all the time.