
AMD Navi 23 ‘NVIDIA Killer’ GPU Rumored to Support Hardware Ray Tracing, Coming Next Year

Here's an oversimplified analysis of why I think AMD did not show their top card last week.

Since the 5700XT is about 30% slower than a 2080Ti, we could assume that to match the 2080Ti we need to increase the CU count by 30%.

So 40CU + 30% = 52CU.

The Xbox Series X GPU actually has 52 CUs and is supposed to be similar to a 2080Ti in performance, which makes sense since it has exactly 30% more CUs than the 5700XT, albeit at a lower clock speed.
If we want to match the 3080 we need to add a further 30% more CUs.

52CU + 30% = 68CU

This tells us that AMD has probably shown the mid-level RDNA2 GPU with around 68 CUs. The bigger 80CU chip (68CU + ~20%) will be shown on the 28th of October imo and will probably come close to the RTX 3090.

The 7nm process allows for an even bigger chip if AMD wanted, since the GPU portion of the 52CU Xbox chip is only ~171mm² (smaller than the 5700XT's 251mm²) and we know the GA102 in the RTX 3090 is a huge 628mm². If AMD made Big Navi 3x bigger than the XSX GPU it would still be only 513mm² but pack in 156 CUs :eek::eek:. We can only dream.....
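A minimal sketch of that back-of-envelope maths, assuming performance scales linearly with CU count and die area scales linearly per CU (a deliberate oversimplification, as noted above):

```python
# Back-of-envelope RDNA2 CU scaling, assuming linear performance per CU
# (an oversimplification; it ignores clocks, bandwidth and scaling losses).

base_cus = 40            # Navi 10 / 5700XT
step = 1.30              # "+30% performance per tier" in this rough model

cus_2080ti_class = round(base_cus * step)          # ~52 CUs (Xbox Series X class)
cus_3080_class = round(cus_2080ti_class * step)    # ~68 CUs

# Die-area guess: scale the assumed ~171 mm² GPU portion of the XSX chip linearly.
area_per_cu = 171 / 52                             # ~3.3 mm² per CU

for cus in (52, 68, 80, 156):
    print(f"{cus:>3} CUs -> ~{cus * area_per_cu:.0f} mm² of CU area")
```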
Now we just need the usual @Rroff debunking of this, stand by, popcorn ready chaps! Haha
 
Not quite sure where you are going with that first bit, but it never turns out to be a thing: there are just too many differences between console and desktop architectures, so console games coming to PC tend either to be optimised for PC with little regard for the console hardware, or to be "ports" which run badly on any vendor's hardware regardless. I don't see that changing.

It takes a long while, like the lifetime of a console, but most games do start to default to the new standard. There is still some bias towards AMD in DX12 (in that AMD are most competitive there).
You're right that usually there's not a lot of bias, because game engines have to work well on a lot of different platforms and architectures, and games tuned for the Xbox and PlayStation usually don't bring those optimisations over. On the other hand, game engine devs are exposed to AMD hardware and architecture far more than their market share alone would warrant.

Tessellation is an interesting one - there was a bit of an awkward crossover there, in that the amount of space that could be used for fixed-function hardware didn't give much scope for it, and ATI/AMD never really got behind pushing it anyway (they had it there but just sat on it, which is one of the reasons I don't have much time for ATI/AMD). By the time you could feasibly include enough fixed-function hardware to do the job properly, the general shader architecture and GPU performance overall had progressed to the point where fixed function wasn't needed for broad usage. That won't be true for ray tracing any time soon - you'd need absolute monster shader architectures to do the job, 10 or even 20 times more powerful than today's, or more.

You're right, of course, that it took a long time for tessellation to take off.

Nvidia have set aside ~25% (?) of the die for RT; AMD's whole CU die area is also used for RT (from what I recall of the patents from some time ago, it's just an addition to the render pipeline). How much of the CU die area that amounts to, I don't think anyone outside of AMD knows, but I'd expect it to be competitive.
 
Some more rumours (via 3dcenter.org)
https://twitter.com/Avery78/status/1316145669741051905

- DXR support is apparently only on N21 and cut variants
- Implementing DXR has meant big design changes, with ramifications for how it is accommodated
- Benchmarks with DXR enabled should show increased power consumption (dual clocks?)
- N21 "XTX" consumes a lot of power, more than the RTX 3080
- N21 XTX can compete with the RTX 3080 but not the RTX 3090
- Expensive to make and expensive to buy, more so than the RTX 3080
- Proper niche card - the big daddy
- RT performance less than the 3080
- Lots of ties/trades with the 3080 in traditional 4K raster performance; really depends on the game, possibly ahead of the 3080 overall
- As mentioned in the Sept 20 pinned tweet, there is something funky going on with how throughput performance works, in particular with DXR RT. It takes a hit like Turing does, but maybe more so, is what I am hearing.
- I am unsure about the DXR stuff being on N21 only, but that is what was said
 
Here's an oversimplified analysis of why I think AMD did not show their top card last week.

Since the 5700XT is about 30% slower than a 2080Ti, we could assume that to match the 2080Ti we need to increase the CU count by 30%.

So 40CU + 30% = 52CU.

The Xbox Series X GPU actually has 52 CUs and is supposed to be similar to a 2080Ti in performance, which makes sense since it has exactly 30% more CUs than the 5700XT, albeit at a lower clock speed.
If we want to match the 3080 we need to add a further 30% more CUs.

52CU + 30% = 68CU

This tells us that AMD has probably shown the mid-level RDNA2 GPU with around 68 CUs. The bigger 80CU chip (68CU + ~20%) will be shown on the 28th of October imo and will probably come close to the RTX 3090.

The 7nm process allows for an even bigger chip if AMD wanted, since the GPU portion of the 52CU Xbox chip is only ~171mm² (smaller than the 5700XT's 251mm²) and we know the GA102 in the RTX 3090 is a huge 628mm². If AMD made Big Navi 3x bigger than the XSX GPU it would still be only 513mm² but pack in 156 CUs :eek::eek:. We can only dream.....

Is that including memory controllers? They tend to occupy a lot of floor space and don't shrink especially well. Feeding 156 CUs would be a challenge, cooling them even more so (so no 2.3+GHz clocks).

Mark Cerny did an interesting talk on why Sony went fewer CUs and higher clocks, it's all about balancing throughput.

Sometimes, just because you can doesn't mean you should!
 
It takes a long while, like the lifetime of a console, but most games do start to default to the new standard. There is still some bias towards AMD in DX12 (in that AMD are most competitive there).



You're right, of course, that it took a long time for tessellation to take off.

Nvidia have set aside ~25% (?) of the die for RT; AMD's whole CU die area is also used for RT (from what I recall of the patents from some time ago, it's just an addition to the render pipeline). How much of the CU die area that amounts to, I don't think anyone outside of AMD knows, but I'd expect it to be competitive.


ehh, the die area on Turing for RT cores was under 5%.
 
Some more rumours (via 3dcenter.org)
https://twitter.com/Avery78/status/1316145669741051905

- DXR support is apparently only on N21 and cut variants
- Implementing DXR has meant big design changes, with ramifications for how it is accommodated
- Benchmarks with DXR enabled should show increased power consumption (dual clocks?)
- N21 "XTX" consumes a lot of power, more than the RTX 3080
- N21 XTX can compete with the RTX 3080 but not the RTX 3090
- Expensive to make and expensive to buy, more so than the RTX 3080
- Proper niche card - the big daddy
- RT performance less than the 3080
- Lots of ties/trades with the 3080 in traditional 4K raster performance; really depends on the game, possibly ahead of the 3080 overall
- As mentioned in the Sept 20 pinned tweet, there is something funky going on with how throughput performance works, in particular with DXR RT. It takes a hit like Turing does, but maybe more so, is what I am hearing.
- I am unsure about the DXR stuff being on N21 only, but that is what was said
Utter guff. Avery has a history of spouting utter rubbish; pretty sure he just makes stuff up and speculates. I'd ignore all of that above.

His source is probably his pizza delivery driver.
 
It is more scalable in that regard but I don't see that as relevant any time soon - you'd need to scale up by orders of magnitude before that was an advantage.

Thing with ray tracing is that a lot of the meat of it is extremely simple calculations, but lots of them - you want some kind of context switching or dedicated hardware so you can batch up vast amounts of them (or in some cases serially compute them very fast) in a way that general-purpose architectures are always second best at. A lot of the vector operations are really simple a+b+c stuff until you hit plane intersections, etc., which can be really nasty - for some reason I always struggle a bit mentally with dot and cross products, though they aren't really that difficult.
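Purely as an illustration of the kind of arithmetic involved (a textbook ray-plane test, nothing specific to any vendor's hardware):

```python
# Hypothetical illustration: a basic ray-plane intersection, the kind of
# simple-but-numerous vector arithmetic ray tracing has to batch up per ray.

def ray_plane_hit(origin, direction, plane_point, plane_normal, eps=1e-8):
    """Return the distance t along the ray to the plane, or None if no hit."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))

    denom = dot(direction, plane_normal)
    if abs(denom) < eps:                 # ray runs parallel to the plane
        return None
    diff = [p - o for p, o in zip(plane_point, origin)]
    t = dot(diff, plane_normal) / denom
    return t if t >= 0 else None         # hit must be in front of the origin

# Example: ray from the origin pointing down +Z, plane at z = 5 facing the camera.
print(ray_plane_hit((0, 0, 0), (0, 0, 1), (0, 0, 5), (0, 0, -1)))  # -> 5.0
```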

Years ago I found a way to cheat the first intersection test for each ray, which meant even 10-year-old GPUs went from something like 10 seconds per frame to 300 FPS for that portion - but sadly I never found a way to then implement the rest of ray tracing without all the normal performance hit.

https://www.youtube.com/watch?v=ydJCmXEHLrY

I believe it to be the other way around. Look at Ampere's RT performance without DLSS - the RT has a massive hit on frame rate. If it was using a hybrid approach, the part of the GPU that isn't busy rendering something could be used instead of standing idle due to the bottleneck caused by the limited fixed-function hardware. Until the fixed-function hardware becomes powerful enough not to cause a bottleneck, I personally believe a hybrid solution is better, simply because the bottleneck would then be the entire GPU. Only want some RT, or lower-quality RT? Fine - you now have more shader power and the GPU is still being utilised 100%. If I've missed something then feel free to enlighten me :)
 
Here's an oversimplified analysis of why I think AMD did not show their top card last week.

Since the 5700XT is about 30% slower than a 2080Ti, we could assume that to match the 2080Ti we need to increase the CU count by 30%.

So 40CU + 30% = 52CU.

The Xbox Series X GPU actually has 52 CUs and is supposed to be similar to a 2080Ti in performance, which makes sense since it has exactly 30% more CUs than the 5700XT, albeit at a lower clock speed.
If we want to match the 3080 we need to add a further 30% more CUs.

52CU + 30% = 68CU

This tells us that AMD has probably shown the mid-level RDNA2 GPU with around 68 CUs. The bigger 80CU chip (68CU + ~20%) will be shown on the 28th of October imo and will probably come close to the RTX 3090.

The 7nm process allows for an even bigger chip if AMD wanted, since the GPU portion of the 52CU Xbox chip is only ~171mm² (smaller than the 5700XT's 251mm²) and we know the GA102 in the RTX 3090 is a huge 628mm². If AMD made Big Navi 3x bigger than the XSX GPU it would still be only 513mm² but pack in 156 CUs :eek::eek:. We can only dream.....

The uplift in Borderlands is basically double the performance of the 5700XT (actually a sliver more), and the uplift in the other benchmarks, at least on the nVidia side, pretty much needs double the hardware (raw performance) to get the same kind of framerate uplift. To get that kind of performance from the 5700XT would require more than just doubling up on everything, due to the diminishing returns you'd encounter scaling an architecture that far (at least historically). Add in some realistic estimates for refinements between RDNA1 and 2, plus node refinements, and you are looking at more like 72 CUs or equivalent - I'd say your approach is more or less on the right path, but it underestimates how much you'd have to increase the hardware by to see a given percentage increase in game performance, which tends to run into diminishing returns beyond a certain point.
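As a rough illustration of that diminishing-returns point (the scaling exponent here is a made-up figure for the example, not anything measured):

```python
# Hypothetical illustration of diminishing returns when scaling up CU count.
# The 0.85 exponent is an assumption for the example, not a measured figure.

def estimated_speedup(cu_ratio, scaling_exponent=0.85):
    """Estimated performance gain from multiplying the CU count by cu_ratio."""
    return cu_ratio ** scaling_exponent

# Doubling the CUs gives less than double the performance...
print(f"2.0x CUs -> ~{estimated_speedup(2.0):.2f}x performance")

# ...so to actually double performance you need more than double the hardware.
needed = 2.0 ** (1 / 0.85)
print(f"~{needed:.2f}x CUs needed for 2.0x performance")
```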

I believe it to be the other way around. Look at Ampere's RT performance without DLSS - the RT has a massive hit on frame rate. If it was using a hybrid approach, the part of the GPU that isn't busy rendering something could be used instead of standing idle due to the bottleneck caused by the limited fixed-function hardware. Until the fixed-function hardware becomes powerful enough not to cause a bottleneck, I personally believe a hybrid solution is better, simply because the bottleneck would then be the entire GPU. Only want some RT, or lower-quality RT? Fine - you now have more shader power and the GPU is still being utilised 100%. If I've missed something then feel free to enlighten me :)

The rest of the hardware is involved in shading ray trace hits, etc. and there is a limit to what you can process concurrently with ray tracing due to the serial nature of parts of a game rendering pipeline. I suspect there is still some optimisation possible in that respect though.
 
Also, not to mention that AMD already stated previously they would have ray tracing on all their RDNA2 cards, and as we know there is more than just Navi 21.

There seem to be conflicting reports on that - some think that a respun Navi 10 core, which wouldn't have ray tracing, will be used for the 5700XT replacement and below in the 6000 series instead of Navi 22-based GPUs - but I'm not sure AMD would do that (as per previous comments, they wanted ray tracing across the whole stack).
 
Looks like we are converging on a solution here...

Here's an oversimplified analysis of why I think AMD did not show their top card last week.

Since the 5700XT is about 30% slower than a 2080Ti, we could assume that to match the 2080Ti we need to increase the CU count by 30%.

So 40CU + 30% = 52CU.

The Xbox Series X GPU actually has 52 CUs and is supposed to be similar to a 2080Ti in performance, which makes sense since it has exactly 30% more CUs than the 5700XT, albeit at a lower clock speed.
If we want to match the 3080 we need to add a further 30% more CUs.

52CU + 30% = 68CU

This tells us that AMD has probably shown the mid-level RDNA2 GPU with around 68 CUs. The bigger 80CU chip (68CU + ~20%) will be shown on the 28th of October imo and will probably come close to the RTX 3090.

The 7nm process allows for an even bigger chip if AMD wanted, since the GPU portion of the 52CU Xbox chip is only ~171mm² (smaller than the 5700XT's 251mm²) and we know the GA102 in the RTX 3090 is a huge 628mm². If AMD made Big Navi 3x bigger than the XSX GPU it would still be only 513mm² but pack in 156 CUs :eek::eek:. We can only dream.....


I looked at this article on the Xbox:
https://www.eurogamer.net/amp/digitalfoundry-2020-xbox-series-x-silicon-hot-chips-analysis

Total area: 360.4 mm²
56 CUs take 47% of the area (assuming this includes the RBs and TMUs, going by the die shot)
Per-CU area: ~3 mm²
Memory controllers take 11% of the area, which is 39.6 mm²
I don't know what the multimedia accel zone contains, but it looks like 4.8%, which is 17.3 mm².. I'm counting this as part of the GPU die

Total available area: 536 mm²
Total fixed area: 56.9 mm²
Total available area for CUs: 479.1 mm²
Works out to 160 CUs.. this might also mean that the PC CU is twice the size of the Xbox CU
Hard to believe, but if true the largest SKU would have 10240 shaders to match Nvidia
Or some of that die space is taken by Infinity Cache.. too many variables :(
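A minimal sketch of that arithmetic, using the percentages quoted from the die-shot analysis (the 536mm² target die size is the poster's assumption, not a confirmed figure):

```python
# Reproducing the back-of-envelope die-area maths above.
# Percentages are from the XSX die-shot analysis linked above; the 536 mm²
# "Big Navi" die size is an assumption for the example, not a confirmed figure.

xsx_die_mm2 = 360.4
cu_fraction = 0.47            # 56 physical CUs incl. RB/TMU share
mem_ctrl_fraction = 0.11
multimedia_fraction = 0.048

area_per_cu = (xsx_die_mm2 * cu_fraction) / 56                         # ~3.0 mm²
fixed_area = xsx_die_mm2 * (mem_ctrl_fraction + multimedia_fraction)   # ~56.9 mm²

big_navi_mm2 = 536                          # assumed target die size
cu_budget = big_navi_mm2 - fixed_area       # ~479 mm²
max_cus = cu_budget / area_per_cu           # ~158-160 CUs

print(f"per-CU area ~{area_per_cu:.1f} mm², fixed area ~{fixed_area:.1f} mm²")
print(f"CU budget ~{cu_budget:.1f} mm² -> ~{max_cus:.0f} CUs (ignoring Infinity Cache)")
```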
 
Some more rumours (via 3dcenter.org)
https://twitter.com/Avery78/status/1316145669741051905

- DXR support is apparently only on N21 and cut variants
- Implementing DXR has meant big design changes, with ramifications for how it is accommodated
- Benchmarks with DXR enabled should show increased power consumption (dual clocks?)
- N21 "XTX" consumes a lot of power, more than the RTX 3080
- N21 XTX can compete with the RTX 3080 but not the RTX 3090
- Expensive to make and expensive to buy, more so than the RTX 3080
- Proper niche card - the big daddy
- RT performance less than the 3080
- Lots of ties/trades with the 3080 in traditional 4K raster performance; really depends on the game, possibly ahead of the 3080 overall
- As mentioned in the Sept 20 pinned tweet, there is something funky going on with how throughput performance works, in particular with DXR RT. It takes a hit like Turing does, but maybe more so, is what I am hearing.
- I am unsure about the DXR stuff being on N21 only, but that is what was said

Completely made up
 
Some more rumours (via 3dcenter.org)
https://twitter.com/Avery78/status/1316145669741051905

- DXR support is apparently only on N21 and cut variants
- Implementing DXR has meant big design changes, with ramifications for how it is accommodated
- Benchmarks with DXR enabled should show increased power consumption (dual clocks?)
- N21 "XTX" consumes a lot of power, more than the RTX 3080
- N21 XTX can compete with the RTX 3080 but not the RTX 3090
- Expensive to make and expensive to buy, more so than the RTX 3080
- Proper niche card - the big daddy
- RT performance less than the 3080
- Lots of ties/trades with the 3080 in traditional 4K raster performance; really depends on the game, possibly ahead of the 3080 overall
- As mentioned in the Sept 20 pinned tweet, there is something funky going on with how throughput performance works, in particular with DXR RT. It takes a hit like Turing does, but maybe more so, is what I am hearing.
- I am unsure about the DXR stuff being on N21 only, but that is what was said
When Turing came out, AMD directly answered "where's your RT?" with "when we can do it across the full stack". So that immediately debunks any ray tracing features being limited to Navi 21, unless AMD have really screwed themselves up. And I sincerely doubt power consumption will be the same as, let alone higher than, Ampere's.
The only thing in that list I do believe is that RT performance is less than the 3080's, because that's what we're all expecting anyway.
 
When Turing came out, AMD directly answered "where's your RT?" with "when we can do it across the full stack". So that immediately debunks any ray tracing features being limited to Navi 21, unless AMD have really screwed themselves up. And I sincerely doubt power consumption will be the same as, let alone higher than, Ampere's.
The only thing in that list I do believe is that RT performance is less than the 3080's, because that's what we're all expecting anyway.

Yeah, it's all trash. The XSX and PS5 can do DXR on small dies with low power consumption - both consoles have something like 350W power bricks, and that's for the whole system.
 
And I sincerely doubt power consumption will be the same as, let alone higher than, Ampere's.

It would be a bit weird on TSMC 7nm vs Samsung 8nm; however, I can imagine that in RT-heavy workloads you could see quite a bit higher power consumption than normal if AMD's approach is basically to light up the whole shader array to do RT, which would be closer to a synthetic workload than your typical gaming scenario.
 
Yeah, it's all trash. The XSX and PS5 can do DXR on small dies with low power consumption - both consoles have something like 350W power bricks, and that's for the whole system.

So far we have seen that a 4K30 game does quarter-resolution (1080p) RT at best, with reduced detail. Consoles can do it, they just can't do it well. If you consider that good enough, then AMD is onto a winner.
 
I believe the shader has to do some operations to keep track of diverging searches (and I feel it would be occupied until an RT core reports a hit).. I don't see how two RT cores can decide on the correct hit just by looking up cached values.

As a more general comment, but along the same kind of lines - it is somewhat going to depend on the application's approach. Initially we saw a lot of brute-force chucking it at the hardware, but now developers are starting to look at using the hardware power more surgically, while using software/shader functions to guide the approach for better performance.

I just dabble with this stuff as a hobby so my knowledge is a bit all over the place on the stuff going on under the hood.
 