AMD RDNA3 unveiling event

Mr Evil · 25 Nov 2022 at 14:42

Many willies being waved here, but they all look pretty small to me.

humbug · 25 Nov 2022 at 15:18

Something interesting about the specs.

The leaked specs were 12,288 shaders; it's actually 6,144, exactly half of the leaked figure.

When I originally saw this, my thinking was that AMD had doubled the shader count per compute unit but with lower IPC per shader, which is exactly what Nvidia did going from Turing to Ampere; it's how Nvidia got more than double the FP32 vs Turing.

Navi 21 FP32 is 23.04 TFLOPS
Navi 31 FP32 is 61.56 TFLOPS

Like Turing to Ampere, it's over 2X the compute throughput.

So what's going on here? Are AMD being obtuse because they don't want to look like they copied what Nvidia did, or are the shaders individually just that much more powerful vs Navi 21, which has 5,120 shaders?
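
For anyone who wants to check the arithmetic, here is a rough Python sketch. It uses only the shader counts and TFLOPS figures quoted in this post, plus the usual convention (an assumption here) that one FMA counts as 2 FLOPs per shader per clock. The function name is made up for illustration.

```python
def implied_clock_ghz(tflops, shaders, flops_per_shader_per_clock=2):
    """Clock (GHz) implied by a TFLOPS figure.

    Assumes 1 FMA = 2 FLOPs per shader per clock unless told otherwise.
    """
    return tflops * 1e3 / (shaders * flops_per_shader_per_clock)


NAVI21 = {"tflops": 23.04, "shaders": 5120}
NAVI31 = {"tflops": 61.56, "shaders": 6144}

throughput_ratio = NAVI31["tflops"] / NAVI21["tflops"]
shader_ratio = NAVI31["shaders"] / NAVI21["shaders"]
per_shader_ratio = throughput_ratio / shader_ratio

print(f"FP32 throughput ratio: {throughput_ratio:.2f}x")
print(f"Shader count ratio:    {shader_ratio:.2f}x")
print(f"Per-shader ratio:      {per_shader_ratio:.2f}x")

# Clock each figure implies at 2 FLOPs per shader per clock
print(f"Navi 21 implied clock: {implied_clock_ghz(**NAVI21):.2f} GHz")
print(f"Navi 31 implied clock: {implied_clock_ghz(**NAVI31):.2f} GHz")
# Same Navi 31 figure if each of the 6,144 shaders does 4 FLOPs per clock
print(f"Navi 31 at 4 FLOPs/clk: {implied_clock_ghz(**NAVI31, flops_per_shader_per_clock=4):.2f} GHz")
```

At 2 FLOPs per clock the Navi 31 figure implies a clock of around 5 GHz, which is clearly not real, so the 61.56 number only makes sense if each of the 6,144 shaders is counted as doing double the work per clock.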

Twinz · 25 Nov 2022 at 16:08

humbug said:
Something interesting about the specs.

The leaked specs were 12,288 shaders; it's actually 6,144, exactly half of the leaked figure.

When I originally saw this, my thinking was that AMD had doubled the shader count per compute unit but with lower IPC per shader, which is exactly what Nvidia did going from Turing to Ampere; it's how Nvidia got more than double the FP32 vs Turing.

Navi 21 FP32 is 23.04 TFLOPS
Navi 31 FP32 is 61.56 TFLOPS

Like Turing to Ampere, it's over 2X the compute throughput.

So what's going on here? Are AMD being obtuse because they don't want to look like they copied what Nvidia did, or are the shaders individually just that much more powerful vs Navi 21, which has 5,120 shaders?

Sounds like they added hyperthreading (or rather something like it) to their shaders?

humbug · 25 Nov 2022 at 16:39

Twinz said:
Sounds like they added hyperthreading (or rather something like it) to their shaders?

Hmm... maybe.

kieran_read · 25 Nov 2022 at 18:55

Mr Evil said:
Many willies being waved here, but they all look pretty small to me.

I had a friend at school once upon a time who always told us he had a theme park in the back garden of his terraced house.

SpudMaster · 25 Nov 2022 at 19:01

My dad is a rock star and drives a Lamborghini

Grim5 · 25 Nov 2022 at 19:12

andybird123 said:
I bought the dip, so I'm up about 30% on share price at the moment

Framechasers doesn't call it the "AMDip" for nothing

EastCoastHandle · 25 Nov 2022 at 19:54

Hmm, rumor has it that when RDNA 3 is released, the first version will be AMD's own, better known as MBA (made by AMD). But they have since fixed whatever errata the GPU has, which allows higher clocks, and AIB variants will use that version 2.

I say version 2 because I haven't read what they are calling it yet. Hmm, if I were buying, I would wait until next year to see whether all of this pans out.

Deleted member 76686 · 25 Nov 2022 at 20:59

SpudMaster said:
My dad is a rock star and drives a Lamborghini

Your dad plays rockstar and you've got a Lambo keychain

humbug · 25 Nov 2022 at 21:16

EastCoastHandle said:
Hmm, rumor has it that when RDNA 3 is released, the first version will be AMD's own, better known as MBA (made by AMD). But they have since fixed whatever errata the GPU has, which allows higher clocks, and AIB variants will use that version 2.

I say version 2 because I haven't read what they are calling it yet. Hmm, if I were buying, I would wait until next year to see whether all of this pans out.

What? As in they are physically different?

Tech journalists have leaked AMD products that never materialised, and those tech journalists then said it's because AMD cancelled them. Now tech journalists are saying the reason RDNA3 is only clocked to 2.5GHz is because there is a problem with it. Those same tech journalists had said it would clock over 3GHz, some even said as high as 4GHz. That didn't happen, hence the "because something is wrong with it".
Oh but wait, AIBs will launch RDNA3 GPUs that clock higher, crap...damn! Erm? There are two different versions.....

ffs..... AMD gave us a hint long before they launched it: they said they didn't want to make large inefficient GPUs, but they would let AIBs do what they want.

Freddie1980 · 26 Nov 2022 at 00:06

Sorry wrong thread

Woodsta888 · 26 Nov 2022 at 01:02

Mr Evil said:
Many willies being waved here, but they all look pretty small to me.

No one bought a 4090... money's tight.

Grim5 · 26 Nov 2022 at 03:04

Navi 32 has about 62% of Navi 31 XTX's cores and Navi 33 has about 33% of the cores (60 and 32 CUs against 96).

Assuming the 7900 XTX is 80% faster than the 6900 XT, Navi 32 is about 10% faster than the 6900 XT and Navi 33 is about 40% slower. Basically, the performance gap between each chip is big.

This assumes 1:1 scaling and the same clock speeds. But I don't actually think the other RDNA3 chips have the MCM design with large cache, so they may not be directly comparable. If the scaling is there, then Navi 33 looks a bit weak, as it would only be 10% faster than Navi 23.
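
A quick Python sketch of that scaling estimate, for anyone who wants to play with the inputs. The 1.8x uplift and perfect 1:1 scaling with CU count are the assumptions from this post, not measured results, and the function name is just for illustration.

```python
NAVI31_CUS = 96  # CU count of the full Navi 31 (7900 XTX)


def relative_to_6900xt(cus, xtx_uplift=1.8):
    """Performance relative to a 6900 XT (1.0 = equal).

    Assumes perfect 1:1 scaling with CU count at equal clocks,
    and that the full 96 CU chip is `xtx_uplift` times a 6900 XT.
    """
    return xtx_uplift * cus / NAVI31_CUS


for name, cus in [("Navi 31", 96), ("Navi 32", 60), ("Navi 33", 32)]:
    rel = relative_to_6900xt(cus)
    print(f"{name}: {cus / NAVI31_CUS:.1%} of the CUs, {rel - 1:+.0%} vs 6900 XT")
```

With 60 and 32 CUs this lands at roughly 1.12x and 0.60x a 6900 XT, close to the figures above.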

AMD ROCm Software update confirms Navi 32 GPU has 60 Compute Units - VideoCardz.com

AMD ‘confirms’ Navi 32 with 60 Compute Units, Navi 33 has 32. The rumor from Angstronomics on possible lower core count for Navi 32 GPU has now been indirectly confirmed by AMD. The site alleged that Navi 32 GPU has 30 Workgroup...

videocardz.com

kalniel · 26 Nov 2022 at 11:04

humbug said:
Something interesting about the specs.

The leaked specs were 12,288 shaders; it's actually 6,144, exactly half of the leaked figure.

When I originally saw this, my thinking was that AMD had doubled the shader count per compute unit but with lower IPC per shader, which is exactly what Nvidia did going from Turing to Ampere; it's how Nvidia got more than double the FP32 vs Turing.

Navi 21 FP32 is 23.04 TFLOPS
Navi 31 FP32 is 61.56 TFLOPS

Like Turing to Ampere, it's over 2X the compute throughput.

So what's going on here? Are AMD being obtuse because they don't want to look like they copied what Nvidia did, or are the shaders individually just that much more powerful vs Navi 21, which has 5,120 shaders?

This came up a while ago. Yes, they added another Float / Matrix SIMD32 block so you double FP32 compute.

toms said:
You can choose to look at things in one of two ways: Either each CU now has 128 Stream Processors (SPs, or GPU shaders), and you get 12,288 total shader ALUs (Arithmetic Logic Units), or you can view it as 64 "full" SPs that just happen to have double the FP32 throughput compared to the previous generation RDNA 2 CUs.

This is sort of funny because some places are saying that Navi 31 has 6,144 shaders, and others are saying 12,288 shaders, so I specifically asked AMD's Mike Mantor — the Chief GPU Architect and the main guy behind the RDNA 3 design — whether it was 6,144 or 12,288. He pulled out a calculator, punched in some numbers, and said, "Yeah, it should be 12,288." And yet, in some ways, it's not.
AMD's own slides in a different presentation (above) say 6,144 SPs and 96 CUs for the 7900 XTX, and 84 CUs with 5,376 SPs for the 7900 XT, so AMD is taking the approach of using the lower number. However, raw FP32 compute (and matrix compute) has doubled. Personally, it makes more sense to me to call it 128 SPs per CU rather than 64, and the overall design looks similar to Nvidia's Ampere and Ada Lovelace architectures. Those now have 128 FP32 CUDA cores per Streaming Multiprocessor (SM), but also 64 INT32 units.
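
The two ways of counting described in that quote can be sketched in a few lines of Python. The CU counts and the 64 SPs per CU come from the quote; the function name is hypothetical.

```python
def shader_counts(cus, sps_per_cu=64):
    """Return (advertised SP count, FP32 ALU count with dual-issue lanes included)."""
    advertised = cus * sps_per_cu
    return advertised, advertised * 2


for card, cus in [("7900 XTX", 96), ("7900 XT", 84)]:
    advertised, with_dual_issue = shader_counts(cus)
    print(f"{card}: {cus} CUs -> {advertised:,} SPs advertised, "
          f"{with_dual_issue:,} FP32 ALUs if you count 128 per CU")
```

Both numbers describe the same silicon; which one you quote depends on whether the second FP32 block in each SP counts as its own shader.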

timorous · 28 Nov 2022 at 07:27

RDNA 3 shaders are dual issue. Great when you can extract ILP out of game code, not so great when you can't. Could see some nice gains over the next 12 months as the driver team find more ways to extract ILP and performance improves.

humbug · 28 Nov 2022 at 10:37

timorous said:
RDNA 3 shaders are dual issue. Great when you can extract ILP out of game code, not so great when you can't. Could see some nice gains over the next 12 months as the driver team find more ways to extract ILP and performance improves.

Is that similar to Ampere?

Zarax · 28 Nov 2022 at 10:59

humbug said:
Is that similar to Ampere?

It's actually a step back towards how things were done in GCN. AnandTech has a decent description if you're interested: https://www.anandtech.com/show/1763...first-rdna-3-parts-to-hit-shelves-in-december

humbug · 28 Nov 2022 at 11:00

Zarax said:
It's actually a step back towards how things were done in GCN. AnandTech has a decent description if you're interested: https://www.anandtech.com/show/1763...first-rdna-3-parts-to-hit-shelves-in-december

I am, thanks

humbug · 28 Nov 2022 at 11:30

here it is....

But, as with all dual-issue configurations, there is a trade-off involved. The SIMDs can only issue a second instruction when AMD’s hardware and software can extract a second instruction from the current wavefront. This means that RDNA 3 is now explicitly reliant on extracting Instruction Level Parallelism (ILP) from wavefronts in order to hit maximum utilization. If the next instruction in a wavefront cannot be executed in parallel with the current instruction, then those additional ALUs will go unfilled.

This is a notable change because AMD developed RDNA (1) in part to get away from a reliance on ILP, which was identified as a weakness of GCN – which was why AMD’s real-world throughput was not as fast as their on-paper FLOPS numbers would indicate. So AMD has, in some respects, walked backwards on that change by re-introducing an ILP dependence.

We’re still waiting on more information from AMD outlining why they made this change. But dual-issue is typically a cheap way to add more throughput to a processor design (you don’t have to do all the instruction tracking required for a fully separate Dual Compute Unit), and it can be a worthwhile tradeoff if you can ensure you’ll be able to dual-issue most of the time. But it means that AMD’s real-world ALU utilization rate is likely lower on RDNA 3 than RDNA 2, due to the bubbles from not being able to dual-issue.

Which to bring things back to gaming and the products at hand, it means that the FLOPS numbers between RDNA 3 and RDNA 2 parts are not going to be entirely comparable. 7900 XTX may push 2.6x as many FP32 FLOPs as 6950 XTX on paper, but the real world advantage on anything less than ideal code is going to be less. Which is one of the reasons why AMD is only promoting a real-world performance uplift of 1.7x for the 7900 XTX.
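
To make the ILP point concrete, here is a toy Python model of in-order dual issue. It is only an illustration: the instruction format and the pairing rule (the second instruction must not read or overwrite the first one's destination) are simplifying assumptions, and real RDNA 3 hardware has further restrictions that are not modelled here.

```python
def cycles_single_issue(instrs):
    """One instruction per cycle."""
    return len(instrs)


def cycles_dual_issue(instrs):
    """Greedy in-order pairing.

    Each instruction is (dest, sources). Instruction i+1 is issued alongside
    instruction i only if it neither reads nor overwrites i's destination.
    """
    cycles = 0
    i = 0
    while i < len(instrs):
        dest, _ = instrs[i]
        if i + 1 < len(instrs):
            next_dest, next_sources = instrs[i + 1]
            if dest not in next_sources and next_dest != dest:
                i += 2  # both instructions go out this cycle
                cycles += 1
                continue
        i += 1  # second slot goes unfilled (a "bubble")
        cycles += 1
    return cycles


# Four independent operations: every pair can be dual-issued
independent = [("a", ("x", "y")), ("b", ("x", "z")),
               ("c", ("y", "z")), ("d", ("x", "w"))]

# A dependency chain: each operation needs the previous result
chain = [("a", ("x", "y")), ("b", ("a", "z")),
         ("c", ("b", "z")), ("d", ("c", "w"))]

for label, code in [("independent", independent), ("chain", chain)]:
    single = cycles_single_issue(code)
    dual = cycles_dual_issue(code)
    print(f"{label}: {single} cycles single-issue, {dual} cycles dual-issue")
```

The independent sequence finishes in half the cycles, while the dependency chain gets no benefit at all, which is the gap between paper FLOPS and real-world throughput that the article describes.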

Seems like a risky move.

If you take AMD's claimed 54% greater PPW vs the 6950XT, which was 335 watts vs 355 watts for the 7900XTX (+6%), you get to roughly +60% performance, right in the middle of their claimed 1.5X to 1.7X.

The actual ALU instructions on the 7900XTX are 'up to' 2.4X at slightly higher clock speeds. Not that I think this will scale 1:1 even at its very best, but it's a large chunk higher than the 1.6X the PPW would suggest.

It's an interesting one to watch, as it remains to be seen if AMD are getting these numbers from optimistic dual-issue results or if they are being realistic, and if we are back to the fine wine effect.
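
The performance-per-watt arithmetic can be written out as a short Python sketch. The 54% PPW gain and the 335 W and 355 W board powers are the figures quoted in this post; the function name is made up, and this assumes both cards actually run at those power levels.

```python
def perf_uplift(ppw_gain, old_watts, new_watts):
    """Performance ratio implied by a perf-per-watt gain and two power levels."""
    return (1 + ppw_gain) * new_watts / old_watts


uplift = perf_uplift(0.54, old_watts=335, new_watts=355)
print(f"Implied uplift over the 6950XT: {uplift:.2f}x")
```

Multiplying the two ratios gives a little over 1.6X, which sits inside the claimed 1.5X to 1.7X range.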

humbug · 28 Nov 2022 at 11:44

Just a thought....

Worst case: assuming the dual-issue ALUs are of no help, the 7900XTX has 20% more shaders at 2.5GHz vs the 6950XT's 2.3GHz (+8.7%), so about 1.30X a 6950XT (1.2 × 1.087).
The memory architecture needs to be factored in on top of that, but who knows what that actually means; it's impossible to calculate. With a much improved fabric link, less but much faster cache, a 384-bit bus and faster GDDR6, the total effective memory bandwidth is, according to AMD, 2.7X higher. That must account for something.
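
The worst-case estimate as a Python sketch, using the shader counts and clocks from this post. It assumes performance scales linearly with both shader count and clock and ignores memory entirely; the function name is just for illustration.

```python
def worst_case_uplift(shaders_new, shaders_old, clock_new_ghz, clock_old_ghz):
    """Uplift if dual issue never helps: shader ratio times clock ratio."""
    return (shaders_new / shaders_old) * (clock_new_ghz / clock_old_ghz)


worst = worst_case_uplift(6144, 5120, clock_new_ghz=2.5, clock_old_ghz=2.3)
print(f"Worst case vs 6950XT: {worst:.2f}x")
```

The two gains multiply rather than add, so this comes out at about 1.30X, a touch above the 1.29X you get from adding 20% and 8.7%.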
