=.= Going by Nai's Benchmark, where the bandwidth differs from one GPU to another once the memory allocation reaches the 2816 MiB to 3500 MiB range, I can only assume this is caused by the way the SMM units are disabled.
Allow me to elaborate my assumption. As we know, there are four raster engines for GTX 970 and GTX 980.
Each raster engine has four SMM units. The GTX 980 has every SMM unit enabled in each raster engine, so there are 16 SMM units.
The GTX 970 is made by disabling 3 of those SMM units, leaving 13. What Nvidia has refused to tell us is which raster engines have their SMM units disabled.
I found that most reviewers simply modified the GTX 980's high-level architecture diagram by removing one SMM unit from each of three raster engines, leaving one raster engine with all four SMM units intact.
First scenario
What if the first (or the second, third or fourth) raster engine has 3 of its SMM units disabled, instead of the cuts being spread evenly across the raster engines?
Second scenario
Or what if the first raster engine has two SMM units disabled and the second raster engine has one SMM unit disabled?
Oh, and please note the memory controller drawn next to each raster engine in the diagram too. >.< If we follow the first scenario, that raster engine will definitely not be able to make full use of its memory controller's bandwidth.
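To make the scenarios concrete, here is a minimal Python sketch that enumerates every way 3 disabled SMM units could be distributed over 4 raster engines with 4 SMM units each. It is only counting arithmetic based on the numbers above; the names (RASTER_ENGINES, disabled_layouts and so on) are made up for illustration, and nothing here reflects how Nvidia actually fuses off units.

```python
from collections import Counter
from itertools import product

# Figures taken from the post; the names are hypothetical.
RASTER_ENGINES = 4   # raster engines on GM204
SMM_PER_ENGINE = 4   # SMM units per raster engine
DISABLED = 3         # SMM units disabled on the GTX 970


def disabled_layouts():
    """Yield tuples giving the number of disabled SMMs in each raster engine."""
    for layout in product(range(SMM_PER_ENGINE + 1), repeat=RASTER_ENGINES):
        if sum(layout) == DISABLED:
            yield layout


layouts = list(disabled_layouts())

# Group layouts that differ only in which raster engine is affected.
pattern_counts = Counter(tuple(sorted(l, reverse=True)) for l in layouts)
patterns = sorted(pattern_counts, reverse=True)

active_smm = RASTER_ENGINES * SMM_PER_ENGINE - DISABLED

if __name__ == "__main__":
    print("active SMM units:", active_smm)
    for p in patterns:
        print(p, "->", pattern_counts[p], "placements")
```

The three patterns it finds are (3, 0, 0, 0), which is the first scenario, (2, 1, 0, 0), which is the second scenario, and (1, 1, 1, 0), which is the layout most reviewers drew.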
I agree that this is the most likely explanation, which would mean the issue is hardware-related and cannot be fixed. Here's an illustration of GM204:
http://images.bit-tech.net/content_images/2014/09/nvidia-geforce-gtx-980-review/gtx980-17b.jpg
It does seem that each of the four 64-bit memory controllers corresponds to one of the four raster engines. In the same way that the 970's effective pixel fillrate has been demonstrated to be considerably lower than the 980's even though cutting SMMs leaves the ROPs fully intact (
http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980), the same situation may apply to bandwidth on Maxwell. However, the issue may be completely independent of which SMMs are cut and may simply depend on how many.
GM206's block diagram demonstrates the same raster engine to memory controller ratio/physical proximity:
http://cdn3.wccftech.com/wp-content/uploads/2015/01/GM206-Block-Diagram.jpg
As a result, I expect a cut-down GM206 part and even a GM200 part will exhibit the same issue; it might be intrinsically tied to how Maxwell operates as an architecture. Cutting down SMMs -> effectively messing up ROP and memory controller behavior as well as shaders and TMUs. I also don't think there's a chance in hell Nvidia was unaware of this, but I could be wrong.