
RDNA 3 rumours Q3/4 2022

@nvidiamd how did AMD more than double the performance of the 5700XT at similar power consumption on the same node? How did they do that?

How is it that a 16-core CPU can exist with higher per-core performance than its 8-core competitor at half the power consumption? That's 4 times the performance per watt - how is that possible?
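Spelled out as rough arithmetic (a quick sketch, conservatively assuming per-core performance merely matches - any per-core advantage only pushes the number higher):

```python
# Rough arithmetic behind the "4 times the performance per watt" claim above.
cores_old, cores_new = 8, 16
per_core_perf_ratio = 1.0   # conservative assumption: equal per-core performance
power_ratio = 0.5           # "half the power consumption"

perf_ratio = (cores_new / cores_old) * per_core_perf_ratio
perf_per_watt_ratio = perf_ratio / power_ratio

print(f"~{perf_ratio:.1f}x performance at {power_ratio:.0%} of the power "
      f"=> ~{perf_per_watt_ratio:.1f}x perf/watt")   # ~2.0x perf at 50% power => ~4.0x perf/watt
```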

What you say cannot be done has been done more than once.

It is not the same power consumption. The 6900XT is almost 100W over the 5700XT. That is around 30% more power than the 5700XT for 80-100% more performance. I am not sure they can afford that every generation. Maybe for the next gen, I hope they will go full power, something like 15-20% more power for a 40% raster / 100% RT increase.
I already told you the best thing RDNA2 did was the improvement in perf/watt. It is very impressive, and when we look at Ampere we see the perf/watt improvement over Turing is almost nonexistent. The biggest gain from Nvidia comes from adding more hardware and increasing the power usage.
 
Actually, the 6900XT eats more than 40% more power than the 5700XT - from 220W to over 300W. So again, I am betting on a 40% raster / 100% RT performance increase if they increase the power usage by 15-20%. I think the numbers that we see on the internet are unrealistic.
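As a rough sketch using the figures quoted above (estimates from the thread, not measured specs):

```python
# Perf-per-watt sketch using the board power and performance ratios quoted in this post.
power_5700xt = 220                           # W, as quoted above
power_6900xt = 300                           # W, "over 300"
perf_gains = (1.8, 2.0)                      # "80-100% more performance"

power_ratio = power_6900xt / power_5700xt    # ~1.36x the power
for perf in perf_gains:
    print(f"{perf:.1f}x perf at {power_ratio:.2f}x power => {perf / power_ratio:.2f}x perf/watt")
```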
 
If they go MCM with 2 x 80CU chiplets then just a 40% raster increase would be a big fail, as I would expect around that from just a single chiplet.
 
I honestly think we are expecting too much. AMD needs to do this generation what Nvidia did with Ampere: pack in as much hardware as they can, especially for RT, and increase the power usage.
Nvidia needs to do exactly what AMD did with RDNA2: focus more on perf/watt.

Of course AMD will probably also squeeze out more perf/watt, but I don't expect the same gains as they had with RDNA2. Anyway, it is good news that they are ahead of Nvidia in perf/watt, and maybe this is why they can bring their next gen out faster. Just like Nvidia did with Ampere.
 

40% higher performance from a 160CU graphics card, like you said, would be a fail and highly unlikely - the scaling between the two chiplets would have to be historically terrible, OR the heat output and power draw would have to be so bad that they would have to run the chiplets under 2GHz compared to the 2.9GHz you see now on the 6900XT.
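A naive back-of-the-envelope check of that, assuming performance scaled purely with CU count times clock (which it doesn't, but it shows the ballpark):

```python
# If performance scaled purely with CU count x clock (ignoring bandwidth, scaling losses,
# etc.), what clock would a 160CU part need to land only 40% above an 80CU 6900XT at 2.9GHz?
cu_old, clk_old = 80, 2.9     # 6900XT CU count and boost clock (GHz) quoted above
cu_new = 160                  # hypothetical dual-chiplet part
target_uplift = 1.40          # "just a 40% raster increase"

clk_needed = target_uplift * cu_old * clk_old / cu_new
print(f"Implied clock: ~{clk_needed:.2f} GHz")   # ~2.03 GHz - far below 2.9 GHz
```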
 
Isn't the large cache something AMD has already employed on RDNA2, probably as a test run preparing for RDNA3?

Also, AMD submitted a patent for an MCM design in which each GPU chiplet can simultaneously communicate with the CPU.

On-die communication between parts of the same die is one thing; die-to-die communication will always have a latency hit. A large amount of cache per MCM die will of course help with offline work, but in real-time games it will only get you so far.

MCM for general compute on GPUs is "easy" using chiplets - but gaming is another matter again, and you have all the issues, almost no matter the hardware design, that plague Crossfire/SLI.

One potential way around it would be to have blocks of "functionality" spread out on the substrate, using command processors, with high-speed interconnects, where the compute blocks can be rapidly repurposed (somewhat like Intel tried, but in software, with Larrabee). That way you can build multiple virtual GPUs on the fly with varying resource levels depending on what part of the scene they are dealing with - but you still have problems there if one GPU has data resident, and/or is in the middle of processing output, that another GPU requires to finish its work, etc.
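As a purely illustrative sketch of that idea - a toy allocator splitting a pool of compute blocks between virtual GPUs according to how heavy each part of the scene is; the names and numbers are invented, not any real hardware or driver interface:

```python
# Toy illustration of the "build virtual GPUs on the fly" idea above: a command
# processor splits a fixed pool of compute blocks between virtual GPUs in proportion
# to how heavy each part of the scene is.
def partition_blocks(total_blocks, scene_costs):
    total_cost = sum(scene_costs.values())
    alloc = {part: int(total_blocks * cost / total_cost)
             for part, cost in scene_costs.items()}
    # hand any blocks lost to rounding down to the heaviest part of the scene
    alloc[max(scene_costs, key=scene_costs.get)] += total_blocks - sum(alloc.values())
    return alloc

# e.g. a geometry-heavy foreground, a mid-cost shadow pass and a cheap skybox
print(partition_blocks(96, {"foreground": 5.0, "shadows": 2.0, "skybox": 1.0}))
# {'foreground': 60, 'shadows': 24, 'skybox': 12}
```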

Again, I still think the issue in games will be as soon as data needs to be communicated between one MCM and another, or with some central die controlling it all, even if it's on the same package.

I'm not a chip designer or anything, but maybe each MCM could have its own access to the VRAM while being given tasks by a command processor to render whole frames, then send them back to the central die, and the central die then sends out the output frames from all the MCMs.
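A very rough sketch of that scheme - essentially alternate-frame rendering, with a command processor handing whole frames to chiplets in turn; the names are made up for illustration only:

```python
# A command processor hands whole frames to each chiplet in turn and a central die
# collects and outputs them in order. Purely illustrative, not a real driver interface.
from itertools import cycle

def dispatch_frames(num_frames, chiplets):
    assignment = cycle(chiplets)          # round-robin: frame N -> chiplet N % len(chiplets)
    return [f"frame {n} rendered on {next(assignment)}" for n in range(num_frames)]

for line in dispatch_frames(6, ["chiplet0", "chiplet1"]):
    print(line)
```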

Also, Larrabee was extremely flexible - in theory you could have just loaded any new API - but it came at the price of speed, and it was a massive ~700mm² die, albeit on 45nm.
 
I'm sure people were saying the same thing about CPUs a few years back, but this is why they employ really smart people to solve these issues.
 

There is no easy way around the issues with gaming - even short-cutting transactions and improving memory access only nets you 5-10% over what is possible with Crossfire/SLI.
 

Not to poo-poo @Rroff's knowledge, but his semiconductor engineering understanding is much like the rest of ours: very much backseat logic.

On a rudimentary understanding, Zen 3 shouldn't work, with the IMC being entirely separate from the L3 and even the cores divided up into separate chiplets - but it does work. Even Zen 2, with its latency, still worked. Zen 3, which is only AMD's second MCM CPU design, is already so good that there are no drawbacks: there is no latency penalty of the kind that the rudimentary understanding dictates there should be. It's separate chips behaving as though they were one monolith.

This is why my attitude towards this is "don't make any assumptions" - too many backseat semiconductor engineers have, and ended up with egg on their faces. There is a reason these people don't work at AMD.

 

Just watching now.

That has a much more reasonable expectation of performance and is very, very different to the previous "leak" about it being 300% faster.

From Moore's Law Is Dead:

* The 7900XT (Navi 31) is targeting a performance improvement over the 6900XT (Navi 21) of at least 40%, though 60% to 80% is also likely.

* The absolute maximum performance improvement would be 100%, though it's unlikely - so in summary: at least 40% faster, 60 to 80% more likely, and 100% highly unlikely but not impossible.

* MCM design has large performance penalties due to inter-chiplet latency; you gain overall performance from having multiple chiplets, but you also face a penalty, and that penalty is much heavier than on a Ryzen CPU. This explains how you can potentially have a 160CU GPU split across multiple chiplets but only see a 40 to 80% improvement instead of the 100%+ you would expect if scaling were perfect.
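One crude way to read those numbers, assuming Navi 31 really does carry roughly twice the shader hardware of Navi 21 as speculated in this thread (ignoring clocks, bandwidth and IPC changes):

```python
# Each rumoured uplift implies a different effective MCM scaling efficiency if the
# chip has ~2x the shader hardware. Very rough - clocks/bandwidth/IPC ignored.
hardware_ratio = 2.0                     # e.g. 160CU vs 80CU
for uplift in (1.4, 1.6, 1.8, 2.0):
    efficiency = uplift / hardware_ratio
    print(f"+{(uplift - 1) * 100:.0f}% performance => ~{efficiency:.0%} effective scaling")
```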
 
If Moore's Law Is Dead is right, the MCM penalties are quite heavy - is it even worth it? I assume a large MCM GPU will cost more than the 6900XT, right?

6900XT: 1 x monolithic 7nm die with 80CU = 100% baseline performance

7900XT: 2 x 5nm chiplets containing 80CU each (plus a 7nm IO chiplet) and a 50% performance-per-watt architectural improvement = just 160% to 180% of baseline performance


So you have a very large IPC improvement and you're doubling your core count for just 60 to 80% gains.
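Taking those same figures at face value, the implied board power is easy to back out - all the inputs here are the speculation above, not confirmed specs:

```python
# If the architecture really gains +50% perf/watt and lands 1.6-1.8x a 6900XT,
# the implied board power follows directly from the ratio of the two.
baseline_power = 300           # W, roughly a 6900XT
perf_per_watt_gain = 1.5       # the "50% performance per watt" claim
for perf in (1.6, 1.8):
    implied_power = baseline_power * perf / perf_per_watt_gain
    print(f"{perf:.1f}x performance => ~{implied_power:.0f} W board power")
```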
 
I said 40%, and I don't have any inside information, so all these YouTubers claiming to have sources are useless. They make a lot of videos with different numbers and then claim they were right because they said that on May 3rd or February 30th. :)
MLID is saying that the perf will be 40% minimum but we might expect 60 to 80%, even 100% if we are lucky. Yeah, let's throw out a lot of numbers; one of them may be true.
 


That's the way these videos are usually done: it might be MCM, then again it might not; it might be this amount faster, but then again it might be that amount faster.

They're like psychics: they chuck out a lot of guesswork, and when a bit of their **** actually sticks they point to that and make a hoopla about it, and ignore what they got wrong. The vast majority of what they post is just stuff that's expected of the next gen, but they frame it like it needed "insider info" to know anything about it.
 
Kopite says both Lovelace and RDNA 3 are targeting roughly double the performance of Ampere and RDNA 2.

However, both cards are also currently running into cost issues - that's not to say people won't accept higher prices anyway, as we've seen, just that the new cards are more expensive to make than the current ones.

The big kicker is memory bandwidth: GDDR6 and GDDR6X just aren't cutting it - both Lovelace and RDNA 3 are taking large performance penalties when mated with G6 and G6X modules. Getting the doubling of performance that's being estimated requires a large improvement in bandwidth, which is what they are struggling with.

It will be interesting to see what Nvidia and AMD come up with to solve the bandwidth issue. Will AMD increase its L3 game cache for RDNA 3 to try and fix the bandwidth limitation? What's Nvidia's plan - expensive HBM2 modules? HBM2e supports up to 24GB capacity and up to 2500GB/s of bandwidth, but it is several times more expensive than G6X.
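For rough context, peak bandwidth is just bus width times per-pin data rate; the example speeds below are typical current GDDR6/GDDR6X ratings, not confirmed next-gen specs:

```python
# Quick bandwidth arithmetic: GB/s = (bus width in bits / 8) x per-pin data rate in Gbps.
def bandwidth_gb_s(bus_bits, data_rate_gbps):
    return bus_bits / 8 * data_rate_gbps

configs = [
    ("256-bit GDDR6 @ 16 Gbps (6900XT-like)",   256, 16.0),
    ("384-bit GDDR6X @ 21 Gbps (3090 Ti-like)", 384, 21.0),
    ("512-bit GDDR6 @ 16 Gbps (wider bus)",     512, 16.0),
]
for name, bus, rate in configs:
    print(f"{name}: {bandwidth_gb_s(bus, rate):.0f} GB/s")
```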
 
Can't they just increase the bus width to, say, 512-bit?
 