There are really three possibilities:
- The chips came out of the fab defective, with missing/non-functional ROPs.
- The chips were intentionally cut to have fewer ROPs.
- Both 1 and 2 could have occurred, i.e. the chips were defective, then were cut so they would function without issue.
I find #1 less likely because it would mean some variance in the number of non-functional ROPs. And from what I understand, all the 5090s and 5080s that have shown this issue so far have the same number of missing ROPs. It's therefore much more likely that these chips were cut to have fewer ROPs.
Either way, the issue originated at TSMC, then was "missed" by nV in validation, then was "missed" by (multiple, it now seems) AIBs in QC. My theory is that nV caught these in validation and let them go to AIBs, and the AIBs then intentionally disregarded QC results (either on their own or under instructions from nV). The fact that there are reports of these issues across *both* the 5080 and 5090, and across multiple AIBs, points to intent here, i.e. someone knew about the issue. Much less probable is an endemic failure of validation and QC at multiple steps.