Something interesting about the specs.
The leaked specs said 12,288 shaders; it's actually 6,144, exactly half of the leaked figure.
When I originally saw this, my thinking was that AMD had doubled the shader count per compute unit but with lower IPC per shader, which is exactly what Nvidia did going from Turing to Ampere. It's how Nvidia got more than double the FP32 of Turing.
Navi 21 FP32 is 23.04 TFLOPS
Navi 31 FP32 is 61.56 TFLOPS
Like Turing to Ampere, it's over 2x the compute throughput.
So what's going on here? Are AMD being obtuse because they don't want to look like they copied what Nvidia did, or are the shaders individually just that much more powerful than those in Navi 21, which has 5,120 shaders?
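For anyone who wants to check the arithmetic, here is a rough sketch of where the two TFLOPS figures could come from. The clocks (2.25 GHz and 2.505 GHz) are assumptions back-solved from the numbers above, not official figures, and `fp32_tflops` is just a made-up helper name. It counts one FMA as 2 FLOPs and treats Navi 31's shaders as dual issue.

```python
def fp32_tflops(shaders, clock_ghz, issue_width=1, flops_per_fma=2):
    """Peak FP32 TFLOPS = shaders * issue width * FLOPs per FMA * clock (GHz) / 1000."""
    return shaders * issue_width * flops_per_fma * clock_ghz / 1000.0

# Assumed clocks: chosen so the results match the figures quoted in the post.
navi21 = fp32_tflops(5120, 2.25)                  # single issue
navi31 = fp32_tflops(6144, 2.505, issue_width=2)  # dual issue (assumption)

print(f"Navi 21: {navi21:.2f} TFLOPS")
print(f"Navi 31: {navi31:.2f} TFLOPS")
print(f"Ratio:   {navi31 / navi21:.2f}x")
```

With the same 6,144 shaders at single issue, the result would be half the Navi 31 figure, which is why the 6,144 vs 12,288 question matters for the on-paper number.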
Sounds like they added hyperthreading (or rather something like it) to their shaders?
Many willies being waved here, but they all look pretty small to me.
I bought the dip, so I'm up about 30% on share price at the moment
My dad is a rock star and drives a Lamborghini
Hmm, rumor has it that when RDNA 3 is released, version 1 will be the AMD version, better known as MBA (Made by AMD). But they have since fixed whatever errata the GPU has, and the fix includes higher clocks. AIB variants will use version 2.
I say version 2 because I haven't read what they are calling it yet. Hmm, if I were buying, I'd wait until next year to see whether all of this pans out.
None bought a 4090... money's tight.
This came up a while ago. Yes, they added another Float / Matrix SIMD32 block so you double FP32 compute.
toms said: You can choose to look at things in one of two ways: Either each CU now has 128 Stream Processors (SPs, or GPU shaders), and you get 12,288 total shader ALUs (Arithmetic Logic Units), or you can view it as 64 "full" SPs that just happen to have double the FP32 throughput compared to the previous generation RDNA 2 CUs.
This is sort of funny because some places are saying that Navi 31 has 6,144 shaders, and others are saying 12,288 shaders, so I specifically asked AMD's Mike Mantor — the Chief GPU Architect and the main guy behind the RDNA 3 design — whether it was 6,144 or 12,288. He pulled out a calculator, punched in some numbers, and said, "Yeah, it should be 12,288." And yet, in some ways, it's not.
AMD's own slides in a different presentation (above) say 6,144 SPs and 96 CUs for the 7900 XTX, and 84 CUs with 5,376 SPs for the 7900 XT, so AMD is taking the approach of using the lower number. However, raw FP32 compute (and matrix compute) has doubled. Personally, it makes more sense to me to call it 128 SPs per CU rather than 64, and the overall design looks similar to Nvidia's Ampere and Ada Lovelace architectures. Those now have 128 FP32 CUDA cores per Streaming Multiprocessor (SM), but also 64 INT32 units.
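The two ways of counting in that quote come down to simple multiplication. A quick sketch, using the CU counts from the AMD slides quoted above (`shader_count` is a hypothetical helper, nothing official):

```python
def shader_count(compute_units, sps_per_cu):
    """Total shader ALUs for a given CU count and per-CU SP count."""
    return compute_units * sps_per_cu

# CU counts as given in the quoted AMD slides.
cards = {"7900 XTX": 96, "7900 XT": 84}

for name, cus in cards.items():
    as_64 = shader_count(cus, 64)    # AMD's "lower number" view
    as_128 = shader_count(cus, 128)  # counting every FP32 ALU
    print(f"{name}: {cus} CUs -> {as_64} SPs (64/CU) or {as_128} ALUs (128/CU)")
```

Both numbers describe the same silicon; the only difference is whether the second FP32 lane in each SP is counted as its own shader.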
RDNA 3 shaders are dual issue. Great when you can extract ILP out of game code, not so great when you can't. Could see some nice gains over the next 12 months as the driver team find more ways to extract ILP and performance improves.
Is that similar to Ampere?
It's actually a step back towards how things were done in GCN. AnandTech has a decent description if you're interested: https://www.anandtech.com/show/1763...first-rdna-3-parts-to-hit-shelves-in-december
But, as with all dual-issue configurations, there is a trade-off involved. The SIMDs can only issue a second instruction when AMD’s hardware and software can extract a second instruction from the current wavefront. This means that RDNA 3 is now explicitly reliant on extracting Instruction Level Parallelism (ILP) from wavefronts in order to hit maximum utilization. If the next instruction in a wavefront cannot be executed in parallel with the current instruction, then those additional ALUs will go unfilled.
This is a notable change because AMD developed RDNA (1) in part to get away from a reliance on ILP, which was identified as a weakness of GCN – which was why AMD’s real-world throughput was not as fast as their on-paper FLOPS numbers would indicate. So AMD has, in some respects, walked backwards on that change by re-introducing an ILP dependence.
We’re still waiting on more information from AMD outlining why they made this change. But dual-issue is typically a cheap way to add more throughput to a processor design (you don’t have to do all the instruction tracking required for a fully separate Dual Compute Unit), and it can be a worthwhile tradeoff if you can ensure you’ll be able to dual-issue most of the time. But it means that AMD’s real-world ALU utilization rate is likely lower on RDNA 3 than RDNA 2, due to the bubbles from not being able to dual-issue.
Which to bring things back to gaming and the products at hand, it means that the FLOPS numbers between RDNA 3 and RDNA 2 parts are not going to be entirely comparable. 7900 XTX may push 2.6x as many FP32 FLOPs as 6950 XT on paper, but the real world advantage on anything less than ideal code is going to be less. Which is one of the reasons why AMD is only promoting a real-world performance uplift of 1.7x for the 7900 XTX.
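To make the ILP point concrete, here is a toy model of in-order dual issue. It is only an illustration under simplifying assumptions (one dependency flag per instruction, pairing only with the immediate neighbour), not a description of how RDNA 3 hardware or AMD's compiler actually schedules work. `cycles_needed` is a made-up name.

```python
def cycles_needed(depends_on_prev, dual_issue):
    """Count issue cycles for an in-order instruction stream.

    depends_on_prev[i] is True when instruction i needs the result of
    instruction i-1, so the two cannot be issued in the same cycle.
    """
    cycles = 0
    i = 0
    n = len(depends_on_prev)
    while i < n:
        # Pair the next instruction only if it is independent of this one.
        can_pair = dual_issue and i + 1 < n and not depends_on_prev[i + 1]
        i += 2 if can_pair else 1
        cycles += 1
    return cycles

independent = [False] * 8       # every instruction can pair with its neighbour
chained = [False] + [True] * 7  # each instruction needs the previous result

for name, stream in (("independent", independent), ("chained", chained)):
    single = cycles_needed(stream, dual_issue=False)
    dual = cycles_needed(stream, dual_issue=True)
    print(f"{name}: {single} cycles single-issue, {dual} dual-issue, "
          f"speedup {single / dual:.1f}x")
```

In this model the fully independent stream gets the full 2x from dual issue, while the fully chained one gets nothing, and real shader code sits somewhere in between. On the article's own numbers, 1.7x real against 2.6x on paper is roughly 65% of the paper gain, though not all of that gap is necessarily down to dual issue.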