Caporegime
- Joined
- 18 Oct 2002
- Posts
- 33,188
Computer science is basically the answer to how and why they would cull more triangles. With every iteration of hardware and software you learn more about the algorithms used, their weaknesses and their strengths, and you find new and better ways to implement an algorithm, or just a way to implement an existing algorithm better within the hardware.
In exactly the same way that TressFX 2.0 dramatically improved speed over TressFX 1.0, and quality for that matter, you can usually find a way to improve things.
You also get more die area and more transistors to play with. So a hardware culling unit that needs 1 billion transistors to improve efficiency by 15% is worthwhile when you have 15+ billion transistors in the processor, but in a 5-8 billion transistor GPU it is just too big. Same with memory compression: the compression in Maxwell, and in whatever the heck the last GCN version was, wasn't the first improvement in memory compression and won't be the last. Better, more efficient ways of compressing memory will be found in the future, and again the die area needed to implement them in hardware factors in. Higher transistor counts mean you can add more features to a new generation without the space/power hit those features would have cost on the previous process, so what wasn't feasible at, for instance, 28nm often fits within the room/power budget at 14nm.
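To put rough numbers on the transistor budget point, here is a minimal Python sketch. It only uses the illustrative figures from the paragraph above (a 1 billion transistor culling unit and GPUs of 5, 8 and 15 billion transistors); the function name `budget_share` is made up for this example and the figures are not real design data.

```python
def budget_share(unit_transistors: float, total_transistors: float) -> float:
    """Fraction of the whole transistor budget consumed by one hardware unit."""
    return unit_transistors / total_transistors


# Illustrative figure from the post: a culling unit costing 1 billion transistors.
CULLING_UNIT = 1e9

# Illustrative GPU sizes from the post: 5-8 billion vs 15 billion transistors.
for total in (5e9, 8e9, 15e9):
    share = budget_share(CULLING_UNIT, total)
    print(f"{total / 1e9:.0f}B-transistor GPU: culling unit takes {share:.1%} of the budget")
```

The same unit eats a fifth of the budget on the smallest chip but well under a tenth on the largest, which is why the same 15% efficiency gain can be a bad trade on one process and a good one on the next.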
There is also the simple human cost of improving parts of a GPU. With limited manpower you can't research every single part of a GPU every generation, and you can't dedicate unlimited money or time to every part, so you take an educated guess at which parts of the GPU can be improved to give the biggest efficiency boost for the time and money spent. So say that for the 290X they thought culling or memory compression would bring 9% gains but a change in ROP efficiency could bring 12%, so they chose to improve the ROPs. Now this gen, with the ROPs having been improved, culling can bring a, say, 11% improvement (because a previous bottleneck in the ROPs was removed, making culling a bigger bottleneck), while the ROPs have less room for improvement and can only add another guesstimated 5%, so this time you focus on culling improvements.
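The prioritisation described above can be sketched in a few lines of Python. This is only a toy model: the percentages are the guesstimates from the paragraph, the helper name `pick_focus` is hypothetical, and real planning would weigh cost and risk as well as expected gain.

```python
def pick_focus(estimated_gains: dict[str, float]) -> str:
    """Return the subsystem with the largest estimated performance gain (in %)."""
    return max(estimated_gains, key=estimated_gains.get)


# Guesstimated gains from the post for the 290X generation.
previous_gen = {"culling": 9, "memory compression": 9, "ROPs": 12}

# Guesstimated gains for the current generation, after the ROP bottleneck was removed.
this_gen = {"culling": 11, "ROPs": 5}

print(pick_focus(previous_gen))  # ROPs
print(pick_focus(this_gen))      # culling
```

The point is that the ranking shifts between generations: fixing one bottleneck changes the estimated payoff of everything else, so the "best" thing to work on is not fixed.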
CPU, GPU, it's all balancing and, frankly, gambling on where the biggest gains will come from and where to focus time. This gen they improve A, D, E and F, next gen they focus on B, C, G and H. There are teams doing ongoing research, teams who decide which way the next GPU architecture should go based on that research, and teams who implement that in a shipping GPU.