Please remember that any mention of competitors, hinting at competitors or offering to provide details of competitors will result in an account suspension. The full rules can be found under the 'Terms and Rules' link in the bottom right corner of your screen. Just don't mention competitors in any way, shape or form and you'll be OK.
Don't be silly Roff, how dare we even suggest that Nvidia may be capable of doing the same thing that AMD did and bring out a driver that gives a nice boost to performance.
I will be the first to moan at them for taking so long (as soon as I've finished benching, that is).
My gut instinct is we aren't seeing the full performance GK110 can bring to gaming... IMO we should be seeing something closer to the ~70% (at the current clocks rather than clock for clock) that humbug originally mentioned, not the ~50% it is - not sure if I can explain this very well...
When you're dealing with crunching through gigabytes of data, the optimal approach is often to batch up large amounts of data at once and delay/reorder some operations to get the best long-term throughput. That's great for plowing through lots of data, but not so optimal for typical gaming scenarios where you want to quickly process smaller amounts of data. In a simplistic sense this is why AMD's old VLIW architecture had such high theoretical performance and does so well at some things, but struggles to bring that level of performance to gaming.
I think we are seeing something similar with Titan. It might be that some level of performance is unavoidably lost by dispatching data sub-optimally for game-type processing on a compute-focused design. It's possible, though, that they still haven't fully optimised to get the best out of it, and in shader-heavy games/benchmarks we are likely to see up to ~20% increases (up to 60% in the context of the figures humbug mentioned) with future drivers.
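In code terms (just a toy CPU-side sketch of the batching idea, nothing to do with the actual hardware - the sizes and workload here are made up):

```python
import numpy as np

# The same total work done piece by piece versus in one big batch.
# Batching amortises per-call overhead, which is what you want when
# crunching gigabytes, but a game would rather get each small chunk
# back quickly than queue everything up into huge batches.
rng = np.random.default_rng(0)
work_items = rng.random((10_000, 256))   # pretend these are small work items

def per_item(items):
    # latency-friendly: handle each small piece as soon as it turns up
    return [float(row @ row) for row in items]

def batched(items):
    # throughput-friendly: one vectorised pass over the whole lot
    return (items * items).sum(axis=1)

# Both give the same answers; they just trade latency against throughput.
assert np.allclose(per_item(work_items), batched(work_items))
```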
Different architectures and different jobs: if a GPU has 4Tflops then it might be really good at compute, but that's not to say it's going to be any better at GFX rendering than another 2Tflop GPU.
Adding more performance to a GPU is not as simple as beefing up one part of it.
The GTX Titan does not scale 70% with 75% extra SP's compared to the GTX 680 because it is not 75% more GPU, not even close.
In exactly the same way the 2048 SP 7970 does not scale 15% up from the 1792 SP 7950; as we all know it's about 5 to 7%.
Reason: they both have exactly the same memory bandwidth and ROP's (the rest of the GPU)
The 1792 SP 7950 scales much better from the 1280 SP 7870 because it has a wider bus than the 7870 (384Bit vs 256Bit), but it's not the 40% that the difference in SP count would suggest, it's about 30%. Tahiti LE with 1536 SP's is about ~20% slower than the 7950 with 15% fewer SP's, and about ~10% faster than the 7870 with 20% more SP's.
Long story short, for a GPU to scale +75% you need to scale the rest of the GPU up by the same amount, not just the SP's.
And that's not what the GTX Titan is: it has 75% more SP's held back by only 50% extra bandwidth and ROP's.
Take 50% off 75 and you have ~37; add the gain from the extra 50% on the rest of the GPU and you end up at about 55% total gain clock for clock, which is exactly what it is.
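To put very rough numbers on that (this is just a toy blend of the two gains, not a real GPU model - the 75%/50% figures are from above, the weighting is a guess):

```python
def blended_gain(sp_gain, rest_gain, sp_weight):
    """Toy estimate: assume sp_weight of the frame time scales with the
    shaders and the rest with bandwidth/ROPs. Purely illustrative."""
    return sp_weight * sp_gain + (1.0 - sp_weight) * rest_gain

# GTX Titan vs GTX 680, clock for clock: +75% SP's, +50% bandwidth/ROP's.
for sp_weight in (0.2, 0.5, 0.8):
    gain = blended_gain(0.75, 0.50, sp_weight)
    print(f"shader weight {sp_weight:.1f}: ~{gain * 100:.0f}% faster")
# Anywhere between +50% and +75% depending on the weighting;
# a low shader weight (~0.2) lands on the ~55% quoted above.
```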
It will depend on how much shader workload makes up the overall performance, the different architectures (you can't transplant AMD shader performance scaling ratios directly onto nVidia), and some more complicated issues with pipeline depth/latency that can start to impact scaling on higher-end GPUs in some cases. Don't forget that the 680 has relatively poor compute/shader performance for what it is while still having more than adequate pixel- and polygon-pushing capability, so you don't need to scale those up as much to get more performance overall out of the GPU as you would with, say, the 7970.
Then let's use Nvidia.
GTX 680: 1536 SP's / 256Bit = 100%
GTX 670: 1344 SP's / 256Bit = ~95% with 20% less SP's and the same memory bandwidth.
GTX 660TI: 1344 SP's / 192Bit = ~80%, with the same number of SP's as the GTX 670 but a slower bus = ~10% slower than the GTX 670.
Besides, while GK110 has a multitude of complex internal interactions that aren't present in GK104, for gaming very few of these are ever utilised. The data pathways for gaming-type data will be very similar between GK110 and GK104.
From what I've heard, the differences in cache and scheduler behavior on GK110 over GK104, geared towards better handling of ILP etc., result in some loss of efficiency in handling gaming-type data, but it's not really an area I'm an expert on.
You're forgetting the effects of the boost clock.
It's not just the number of shaders, but the clockspeed. The number of shaders and the clockspeed together give you the floating-point performance, which is the thing you want to be comparing:
Floating point performance = number of parallel threads × clocks per second × computations per clock.
e.g:
- GTX680: 1536 SPs × 1.006GHz × 2 FLOPs per cycle = 3090 GFLOPS
(etc)
- 7970 (original): 2048 SPs × 0.925GHz × 2 FLOPs per cycle = 3789 GFLOPS
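If you want to play with the numbers yourself, the sum is trivial - something like this (the clocks are the reference base clocks, so boost and factory-OC cards will differ):

```python
def gflops(shaders, clock_ghz, flops_per_cycle=2):
    # peak single-precision throughput = shaders x clock x FLOPs per cycle
    return shaders * clock_ghz * flops_per_cycle

cards = {
    "GTX 680": (1536, 1.006),
    "HD 7970": (2048, 0.925),
}
for name, (sps, clk) in cards.items():
    print(f"{name}: {gflops(sps, clk):.0f} GFLOPS")
# GTX 680: 3090 GFLOPS, HD 7970: 3789 GFLOPS - matching the figures above.
```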
That's the primary measure of performance ("pixel pushing power"), but you also have to consider memory bandwidth. For a true comparison you want both to increase in ratio. That's why the GTX780 is an interesting case - a 50% bump in SPs AND a 50% bump in memory bandwidth over the GTX680. When the clocks are set the same the GTX780 is pretty much a 50% bump over the GTX680, and in fully GPU-limited cases you would expect to see framerates increase by 50% as a result. Right now we're not too far off that, so I can't see any driver-related miracles coming through for gaming-type data.
My illustration is quite obviously clock for clock. We all know the GTX 670 is only about ~5% behind the GTX 680 at the same clocks - I mean, how many times has that been said in this room? - and that the GTX 660Ti is slower clock for clock than the GTX 670 with the same SP's because it has a slower bus, again something that is widely known.
You can see that for yourself every time you play with your clocks to get the highest scores or highest FPS. When you increase your memory speed you are increasing the other half of the GPU's performance, and with that you get the full effect: increase your GPU clocks by 10% and you get 5%; increase your memory clocks by 10% as well and you get the other 5% to add to your 5%, giving you the full 10%.
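As a toy formula (the 50/50 split between core and memory is the assumption doing all the work here - real games will sit all over the place):

```python
def expected_gain(core_oc, mem_oc, core_share=0.5):
    # rule of thumb: part of the frame time scales with the core clock,
    # the rest with the memory clock; core_share=0.5 assumes an even split
    return core_share * core_oc + (1.0 - core_share) * mem_oc

print(expected_gain(0.10, 0.00))   # core +10% only        -> 0.05 (a 5% gain)
print(expected_gain(0.10, 0.10))   # core and memory +10%  -> 0.1 (the full 10%)
```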
Clock for clock the 670 does not have 20% lower compute performance than the 680 (let alone 20% fewer SPs) - it's barely 10% slower in compute (GFLOPS) clock for clock (~12% fewer SPs). It's only 20% less GFLOPS if you compare out-of-the-box stock clocks without taking boost clocks into account.
Unless I'm missing something, pretty much everything you've said about Kepler is inaccurate because you're not allowing for the way boost works.
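Quick numbers to show what I mean, using the reference clocks (1006MHz for the 680, 915MHz for the 670 - real cards boost above these, which is the whole point):

```python
def gflops(shaders, clock_mhz):
    # peak single precision: shaders x clock x 2 FLOPs per cycle
    return shaders * (clock_mhz / 1000.0) * 2

gtx680_sps, gtx670_sps = 1536, 1344

# Clock for clock (same clock on both) the gap is just the SP ratio: ~12.5%
print(1 - gtx670_sps / gtx680_sps)                              # 0.125

# On reference base clocks the gap looks like ~20%...
print(1 - gflops(gtx670_sps, 915) / gflops(gtx680_sps, 1006))   # ~0.20
# ...but once boost kicks in on both cards the real clocks sit much closer
# together, so the on-paper 20% overstates the difference.
```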
This will depend on where the bottleneck is, both at a software level and in the ratios on the hardware. E.g. in Tomb Raider I can raise the core clocks 10% and get a 10% increase in performance without even touching the memory clocks, until I'm considerably above stock clocks, because it's not memory bandwidth limited.
Some GPUs come with barely adequate memory bandwidth out of the box; others come with considerably more memory bandwidth than they need until you've massively increased the core clocks over stock.
AMD's GPU's have more stream processors and a higher compute performance rating than the GTX 680 - about 3 times as powerful. You wouldn't think it looking at the figures (3Tflops vs 4), yet in practice they are.
It's because they use those resources differently and are geared to work with different instructions, CUDA vs OpenCL for example.
AMD's 7970 has a third more SP's than the GTX 680 (2048 vs 1536), yet it is not any better at rendering GFX.
The closest example would be my 7870 having 1536 SP's, 32 ROP's and a 256Bit bus, exactly the same as the GTX 680, yet in GFX rendering the GTX 680 eats my 7870 for breakfast, some 25% faster. But with that same number of SP's my 7870 eats the GTX 680 alive in compute.
They are different GPU's with different ways of getting to result A.
For the record, you didn't say "may". And if you don't like our reasons why Nvidia won't have a driver to increase performance like AMD did for GCN, then please share with us your reasons why they will.
Well AMD squeezed out a driver that gave a nice solid boost to performance, so there is no reason why Nvidia cannot do the same.
It's been said already, and not just by me: you can't compare different architectures.
[etc]
Compute performance has little to do with game performance unless it's a compute-heavy game, like Tomb Raider.
Mr humbug... what are you chatting about?!
That was to demonstrate how the computational performance is calculated, so you can work out the relative performance between cards - instead of using the clumsy "oh this is 100%, so this one is about 120%" method. Quite obviously performance is different across different architectures - they have entirely different internal pathways, scheduling, and internal inefficiencies. If that wasn't the case, then we would just be comparing total compute performance and not bothering with benchmarks, wouldn't we?!
When comparing GK104 to GK110 it's a reasonably valid comparison. While GK110 has a lot of additional transistors dedicated to general purpose computing applications, the core architecture used for simple and predictable parallel workloads (as we encounter in gaming) is very similar.
Not sure if you're joking or not here.
You realise that the process of applying shaders to pixels is a series of multiply-add ("MADD") computations, right? And that's the vast majority of what games do these days (lighting, reflections, refraction, translucency effects, subsurface scattering - all shaders). GPUs are massively parallel computing devices, designed to perform simple computational operations very quickly - it's why they're ideal for processing graphical effects.
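If it helps picture it, here's a toy version of that per-pixel work - a basic diffuse (N·L) lighting pass in NumPy, which really is just multiplies and adds repeated for every pixel (illustrative only, obviously nothing like real shader code):

```python
import numpy as np

# A toy "pixel shader": diffuse lighting for a 1080p frame. Every pixel is
# the same handful of multiply-adds - simple, predictable, massively
# parallel work, which is exactly what GPUs are built for.
h, w = 1080, 1920
rng = np.random.default_rng(0)

normals = rng.normal(size=(h, w, 3))
normals /= np.linalg.norm(normals, axis=2, keepdims=True)  # unit surface normals
albedo = rng.random((h, w, 3))                             # surface colour
light_dir = np.array([0.3, 0.8, 0.5])
light_dir /= np.linalg.norm(light_dir)

n_dot_l = np.clip(normals @ light_dir, 0.0, None)  # one multiply-add per component, clamped
shaded = albedo * n_dot_l[..., None]               # scale each colour channel
print(shaded.shape)                                # (1080, 1920, 3)
```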
Other processes such as geometry setup or tessellation are all floating point computations as well, but they're generally slightly less well-ordered than shader data, and so are more susceptible to inefficiencies in the architecture of the GPU.
Physics would be considered further along the scale of "complexity", in that the data coming in is less predictable and more "lumpy". To perform physics computations effectively on a GPU you require access to a much wider range of data, which requires better branch prediction, and needs the GPU (and perhaps more importantly the controlling software) to be designed in such a way as to handle it efficiently. This is why CPU-based physics can still compete with GPU physics, but if you tried to render or pixel-shade a typical graphics scene the CPU would do so at well under 1fps. CPUs are good at handling complexity in data-structures - GPUs prefer a steady stream of predictable data (as generally encountered in gaming).
At the far end of the scale you have "general purpose compute", or GPGPU activities. These can cover a vast range of scientific and financial simulations, and due to this wide range of different data requirements, improving performance in these areas is the most challenging task. GPUs need complex internal links between components to allow the data-structures that are stored in the GPU memory to be assigned efficiently to the optimal pipelines. Cache and fast interconnects, as well as efficient scheduling and branch prediction are key to performance in these areas (something Nvidia first took seriously with Fermi).
... So, I'd love to know what you mean by "a compute heavy game like Tomb Raider". They're all "compute heavy". That's what modern games are!