What are the implications for the rest of the architecture if there are 128 ROPs? Does there need to be a minimum amount of other components to feed the ROPs? I had a quick read about how GPUs work but it's not a 5 minute job and seems to vary generation to generation.
I can only provide a general framework. What you need to do is basically simulate the pipeline. There are roughly 3 stages:
1. The CPU sends instructions to the graphics card
2. The graphics card loads data into VRAM
3. The graphics card uses the data in VRAM to generate the final output, which I guess is written back to VRAM
ROPs and CUs fall in the third stage.
So a framework for sizing a GPU at this stage basically entails the following:
a. estimating the set of instructions that must be executed, on average, to generate one frame
b. estimating the storage required to execute those instructions (both inputs and outputs)
c. targeting an output rate of "x" frames per second
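To make steps a to c a bit more concrete, here is a rough Python sketch of the arithmetic. It assumes each unit (a ROP or a CU) completes a fixed number of operations per clock, which is a big simplification. The function names and every number in it are my own placeholders, not vendor figures.

```python
import math


def units_needed(ops_per_frame, fps, ops_per_unit_per_clock, clock_hz):
    """Smallest whole number of units that can sustain the target frame rate."""
    required_ops_per_second = ops_per_frame * fps                # steps a and c
    ops_per_second_per_unit = ops_per_unit_per_clock * clock_hz
    return math.ceil(required_ops_per_second / ops_per_second_per_unit)


def bandwidth_needed(bytes_per_frame, fps):
    """Bytes per second that must be moved to keep up with the target rate."""
    return bytes_per_frame * fps                                 # steps b and c


# Made-up workload, for illustration only.
pixel_writes_per_frame = 3840 * 2160 * 4   # 4K frame with an assumed 4x overdraw
print(units_needed(pixel_writes_per_frame, fps=60,
                   ops_per_unit_per_clock=1, clock_hz=1.5e9))
print(bandwidth_needed(bytes_per_frame=2_000_000_000, fps=60))
```

The same function works for ROPs or CUs. Only the per-frame operation count and the per-unit rate change, and those are the estimates from step a.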
Now, since Nvidia and AMD are presumably using almost the same sample of games to estimate the required instructions per frame, the overall architecture should scale roughly proportionately from a ROP/CU standpoint. That's just a hypothesis supported by the thought process above.
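One way to read that hypothesis is that the frame rate is set by whichever stage runs out of capacity first, so adding ROPs alone stops helping once something else becomes the limit. Here is a minimal sketch of that idea, assuming each stage can be summarised by a single "work per frame" and "capacity per second" number. The stage names and figures are hypothetical.

```python
def achievable_fps(work_per_frame, capacity_per_second):
    """Return (fps, stage) for the stage that limits the frame rate."""
    # Frame rate each stage could sustain if it were the only constraint.
    fps_by_stage = {
        stage: capacity_per_second[stage] / work_per_frame[stage]
        for stage in work_per_frame
    }
    limiting_stage = min(fps_by_stage, key=fps_by_stage.get)
    return fps_by_stage[limiting_stage], limiting_stage


# Hypothetical per-frame work and per-second capacity for two stages.
work = {"shading": 100.0, "rop": 50.0}
capacity = {"shading": 1000.0, "rop": 1000.0}
print(achievable_fps(work, capacity))

# Doubling ROP capacity alone leaves the shading stage as the limit.
capacity["rop"] = 2000.0
print(achievable_fps(work, capacity))
```

If the per-frame workload mix is similar for both vendors, keeping every stage off the limit means growing the stages together, which is where the proportional ROP/CU scaling would come from.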
I'm not a CPU/GPU expert, just a math guy.