*** The AMD RDNA 4 Rumour Mill ***

Associate
Joined
19 Sep 2022
Posts
615
Location
Pyongyang
I don't know where this reasoning that Nvidia has "dedicated RT cores" and AMD doesn't comes from. It is not correct and it is not how this works: they both have "dedicated RT hardware". There is nothing emulated about AMD's RT, it is physical.

AMD doesn't have a dedicated RT pipeline. Available documentation suggests the TMU performs ray-intersection tests in hybrid mode, but it can only do one type of operation per clock cycle.


The difference is in how Nvidia and AMD build the BVH. This is really oversimplified because I'm not going to write a wall of text to explain it: AMD construct the BVH over many branches, Nvidia do it over a very wide tree. It's a bit like 8 slow cores vs 4 fast cores; both can be equally fast, but not by the same method.
Nvidia also has a BVH traversal co-processor, which AMD lacks, and wider isn't always better in RT, because a deeper BVH lets you reject whole clusters of triangles in a single op, saving a lot of work. Nvidia seems to have adopted a wider BVH structure for the 40 series, but there might be something else that goes into their optimization algorithm (Nvidia has been leading innovation in the industry, so it's reasonable to assume they have a secret sauce).
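
As a rough way to see the width versus depth trade-off, here is a toy Python count. It assumes a perfectly balanced tree with one triangle per leaf and a ray that follows a single root-to-leaf path, testing every child box at each level. The function names and the scene size are made up for illustration; real BVHs from either vendor are nowhere near this regular.

```python
def bvh_depth(num_triangles: int, width: int) -> int:
    """Levels needed for a balanced tree with `width` children per node."""
    depth, capacity = 0, 1
    while capacity < num_triangles:
        capacity *= width
        depth += 1
    return depth


def box_tests_single_path(num_triangles: int, width: int) -> int:
    """Box tests for one ray following one root-to-leaf path (toy model)."""
    return width * bvh_depth(num_triangles, width)


if __name__ == "__main__":
    n = 1_000_000  # hypothetical scene size, not taken from any real game
    for width in (2, 4, 8):
        print(width, bvh_depth(n, width), box_tests_single_path(n, width))
```

In this simplified model a wider tree needs fewer steps but not fewer box tests overall, so any benefit would have to come from issuing the tests at each step together rather than from doing less work.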

And I believe the 50 series will further widen the performance/efficiency gap with respect to AMD's offerings.
 
Man of Honour
Joined
22 Jun 2006
Posts
12,540
I don't know where this reasoning that Nvidia has "dedicated RT cores" and AMD doesn't comes from. It is not correct and it is not how this works: they both have "dedicated RT hardware". There is nothing emulated about AMD's RT, it is physical.
The way I understood it (back when RDNA 2 launched) was that Nvidia had more dedicated RT hardware that could be considered 'independent', whereas AMD had accelerators for ray tracing that were not independent (or were much less so).

This article describes it similarly to the way I understood it:
The most notable change for the general consumer is that AMD now offer hardware acceleration for specific routines within ray tracing.

This part of the CU performs ray-box or ray-triangle intersection checks – the same as the RT Cores in Ampere. However, the latter also accelerates BVH traversal algorithms, whereas in RDNA 2 this is done via compute shaders using the SIMD 32 units.

...

The RA units are next to the texture processors, because they're actually part of the same structure. Back in July 2019, we reported on the appearance of a patent filed by AMD which detailed using a 'hybrid' approach to handling the key algorithms in ray tracing...

While this system does offer greater flexibility and removes the need to have portions of the die doing nothing when there's no ray tracing workload, AMD's first implementation of this does have some drawbacks. The most notable of which is that the texture processors can only handle operations involving textures or ray-primitive intersections at any one time.

Given that Nvidia's RT Cores now operate fully independently of the rest of the SM, this would seem to give Ampere a distinct lead, compared to RDNA 2, when it comes to grinding through the acceleration structures and intersection tests required in ray tracing.

Like I said earlier though, this stuff goes way over my head, "BVH traversal algorithm" - Whut? Is that for helping trains turn around?
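
For anyone else wondering what "BVH traversal" means: a bounding volume hierarchy is a tree of boxes wrapped around groups of triangles, and traversal is the walk down that tree, skipping every box the ray misses. A minimal Python sketch follows, assuming axis-aligned boxes and a simple stack-based walk. The class and function names are invented for illustration and do not correspond to either vendor's hardware. Going by the quoted article, the box test is the kind of work the RDNA 2 ray accelerators handle, while the loop around it is the part said to run on the shaders.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3                       # box minimum corner
    hi: Vec3                       # box maximum corner
    children: List["Node"] = field(default_factory=list)
    triangles: List[str] = field(default_factory=list)  # leaf payload (labels only)


def ray_hits_box(origin: Vec3, direction: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test: intersect the ray's entry/exit intervals on each axis."""
    t_near, t_far = 0.0, float("inf")
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:     # parallel to this slab and outside it
                return False
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(root: Node, origin: Vec3, direction: Vec3):
    """Return candidate triangles and the number of box tests performed."""
    candidates, box_tests, stack = [], 0, [root]
    while stack:
        node = stack.pop()
        box_tests += 1
        if not ray_hits_box(origin, direction, node.lo, node.hi):
            continue               # whole subtree rejected by one test
        candidates.extend(node.triangles)
        stack.extend(node.children)
    return candidates, box_tests


if __name__ == "__main__":
    left = Node((0, 0, 0), (1, 1, 1), triangles=["tri_a", "tri_b"])
    right = Node((5, 0, 0), (6, 1, 1), triangles=["tri_c", "tri_d"])
    root = Node((0, 0, 0), (6, 1, 1), children=[left, right])
    print(traverse(root, (0.5, 0.5, -1.0), (0.0, 0.0, 1.0)))
```

In the example scene the ray misses the right-hand box, so its two triangles are never looked at; only the surviving candidates would go on to the more expensive ray-triangle test.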
 
Caporegime
OP
Joined
17 Mar 2012
Posts
48,521
Location
ARC-L1, Stanton System
^^^^ :D

I don't want to drag this off topic given the complaint already... AMD's shaders double up as RT cores, while Nvidia has specific, or "dedicated", RT cores. So in terms of wording, yes, it is accurate to say that, but in terms of functionality AMD's is still hardware with specific hardware extension functionality, i.e. "not emulated".

That's the last I will say on it here :)
 
Soldato
Joined
19 Dec 2010
Posts
12,062
I don't know where this reasoning that Nvidia has "dedicated RT cores" and AMD doesn't comes from. It is not correct and it is not how this works: they both have "dedicated RT hardware". There is nothing emulated about AMD's RT, it is physical.

The difference is in how Nvidia and AMD build the BVH. This is really oversimplified because I'm not going to write a wall of text to explain it: AMD construct the BVH over many branches, Nvidia do it over a very wide tree. It's a bit like 8 slow cores vs 4 fast cores; both can be equally fast, but not by the same method.

The advantage of the wide approach is that it doesn't really matter which BVH construction you code for, you will always get the most out of being wide. The disadvantage is that wide requires more caching, which is why Ada has so much L2 cache. The advantage of the branch approach is that you don't need so much cache, but unless you're specifically going to code for it, it's going to be slower.

Now I'm sure AMD's thinking was: keep the die size down, it doesn't matter as we own consoles and they are going to code for us. Hmm... well, they don't have to, and if the studio is packed with Nvidia cards they aren't going to.

Also, any game that AMD does RT well in must be fake RT? No, not necessarily.

Ah, the AMD defence force arrives.

I was keeping my previous post very simple and even clarified that AMD has some dedicated hardware. What they leave out is hardware dedicated to BVH traversal.

And I don't need you to write a wall of text explaining it, because I can sum up your wall of text right now: it will be a load of technical-sounding gibberish, basically giving a bunch of excuses for why AMD doesn't perform as well as Nvidia in ray tracing.

I'm sorry, but this isn't going to descend into a silly argument. If you actually know what you say you know, then you know that there is no getting around the hardware limitation that AMD's solution has compared to Nvidia's. Nvidia's will always be faster, hence why it's changing for RDNA 4.

And I never said Fake RT either. Where did you even get that from?
 
Caporegime
OP
Joined
17 Mar 2012
Posts
48,521
Location
ARC-L1, Stanton System
Ah, the AMD defence force arrives.

I was keeping my previous post very simple and even clarified that AMD has some dedicated hardware. What they leave out is hardware dedicated to BVH traversal.

And I don't need you to write a wall of text explaining it, because I can sum up your wall of text right now: it will be a load of technical-sounding gibberish, basically giving a bunch of excuses for why AMD doesn't perform as well as Nvidia in ray tracing.

I'm sorry, but this isn't going to descend into a silly argument. If you actually know what you say you know, then you know that there is no getting around the hardware limitation that AMD's solution has compared to Nvidia's. Nvidia's will always be faster, hence why it's changing for RDNA 4.

And I never said Fake RT either. Where did you even get that from?


Read first line, ignores rest... goes back to reading something interesting.
 
Soldato
Joined
12 Sep 2003
Posts
10,263
Location
Newcastle, UK
Oh boy, I’ll get the popcorn ready. In the meantime, please just post RDNA 4 rumours in this thread. You know, like the title suggests.
I was just about to post the same. Please can we keep this one on track solely for RDNA 4 rumours and info. We don't need any more Nvidia stuff clogging it up.
 
Associate
Joined
19 Sep 2022
Posts
615
Location
Pyongyang
I was keeping my previous post very simple and even clarified that AMD has some dedicated hardware. What they leave out is hardware dedicated to BVH traversal.

This has not changed for RDNA 3 and isn't expected to change for RDNA 4. AMD doesn't have any dedicated hardware for RT; the intersection tests are also done using hybrid units, which cannot perform more than one of these ops in a clock cycle:
- 4 texture mapping ops (TMU)
- 4 ray-box intersection tests
- 1 ray-triangle intersection test (AMD counts this as one Ray Accelerator op)
source: https://www.reddit.com/r/Amd/comments/ic4bn1/amd_ray_tracing_implementation/
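
To put a number on what "one op type per clock" could cost, here is a toy Python model. It takes the per-clock figures from the list above at face value (they are the post's figures, not verified specs) and compares a shared unit against a purely hypothetical design where texturing and intersection testing overlap. The function names and the workload mix are invented for illustration.

```python
import math

# Per-clock throughput figures as listed in the post above (not verified specs).
TEX_PER_CLK = 4
BOX_PER_CLK = 4
TRI_PER_CLK = 1


def hybrid_cycles(tex_ops: int, box_tests: int, tri_tests: int) -> int:
    """Clocks needed if one unit handles only one op type per clock."""
    return (math.ceil(tex_ops / TEX_PER_CLK)
            + math.ceil(box_tests / BOX_PER_CLK)
            + math.ceil(tri_tests / TRI_PER_CLK))


def independent_cycles(tex_ops: int, box_tests: int, tri_tests: int) -> int:
    """Clocks needed if texturing and intersection tests could overlap (hypothetical)."""
    tex = math.ceil(tex_ops / TEX_PER_CLK)
    rt = math.ceil(box_tests / BOX_PER_CLK) + math.ceil(tri_tests / TRI_PER_CLK)
    return max(tex, rt)


if __name__ == "__main__":
    workload = (64, 32, 4)  # made-up mix: texture ops, box tests, triangle tests
    print(hybrid_cycles(*workload), independent_cycles(*workload))
```

The gap between the two numbers depends entirely on the mix you feed in; this ignores latency hiding, caches and scheduling, so treat it as an illustration of the argument rather than a performance estimate.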

This can also be corroborated with aggregate numbers: the 7900 XTX, for example, has 384 TMUs and 96 Ray Accelerators (a 4:1 ratio), and the ratio must have been the same for the 6800 XT as well. RDNA 4 is not expected to have a big structural change in the pipeline; they are expected to make minor tweaks, like the number of intersection tests that can be executed per clock cycle.
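
The arithmetic behind that corroboration is short enough to check in a few lines of Python. This only uses the 7900 XTX figures quoted above plus the 4 texture ops per clock claimed earlier in the post; the helper name is made up, and matching numbers are consistent with the shared-unit reading rather than proof of it.

```python
TMUS = 384              # 7900 XTX figure quoted in the post
RAY_ACCELERATORS = 96   # 7900 XTX figure quoted in the post
TEX_OPS_PER_UNIT = 4    # per-clock figure claimed in the post


def tmus_per_ray_accelerator(tmus: int, ray_accelerators: int) -> int:
    """Whole-number ratio of TMUs to Ray Accelerators, or an error if uneven."""
    if tmus % ray_accelerators != 0:
        raise ValueError("TMU count is not a whole multiple of the RA count")
    return tmus // ray_accelerators


if __name__ == "__main__":
    ratio = tmus_per_ray_accelerator(TMUS, RAY_ACCELERATORS)
    print(ratio, ratio == TEX_OPS_PER_UNIT)
```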

There's no separate mention of BVH hardware.
So in summary, they don't have a separate RT pipeline and no big changes are expected in RDNA 4.
 