I find it pretty hard to believe that scene complexity has little impact on performance. Is it because it uses a low number of bounce passes? How many bounce passes are used? Or are there significant optimisations going on under the hood to enable this small performance loss?
Also, what texture resolution do you use? Does the resolution affect performance?
This is a misunderstanding of how path tracing works.
The number of rays required depends mostly on the screen resolution. At a particular resolution, to get a certain quality after denoising, perhaps you have to shoot 1 million primary rays, and each of those will bounce 2 times on average, so you trace about 3 million rays in total. This is the same whether the environment is a simple cube or has millions of polygons. The number of rays is the same. This is kind of the whole big deal about ray tracing: it doesn't suffer from scaling issues with regard to scene complexity.
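To make the arithmetic concrete, here is a minimal Python sketch of that ray budget. The function name `total_rays` and its parameters are made up for illustration, and the 1 sample per pixel and 2 average bounces are just the example figures from above, not measurements from any real renderer.

```python
def total_rays(width, height, samples_per_pixel=1, avg_bounces=2):
    """Rays traced per frame: each primary ray plus its average bounces."""
    primary = width * height * samples_per_pixel
    return primary * (1 + avg_bounces)


# Note that the triangle count never appears in the formula.
for name, (w, h) in {"1000x1000": (1000, 1000), "1080p": (1920, 1080)}.items():
    print(name, total_rays(w, h))
```

Only the resolution, the samples per pixel and the bounce count change the result, which is the point being made.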
Each ray does have to determine which geometry it intersects, but this is computed efficiently with a Bounding Volume Hierarchy (BVH), which is a recursive subdivision of the geometry into a tree. This allows logarithmic lookup.
So for a simple scene with 100,000 tris, the cost might be at most 5 units of time. For a scene 10x more complex, with 1 million tris, that increases to only 6 time units; for 10 million tris it is only 7, and so on.
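Those numbers are just the depth of an idealized, perfectly balanced tree. A rough sketch, assuming a hypothetical `tree_depth` helper and a branching factor of 10 to match the "units" used above (real BVHs are usually binary or 4 to 8 wide, and a real ray can visit more than one branch):

```python
def tree_depth(n_tris, branching=10):
    """Levels needed for a balanced tree with one triangle per leaf."""
    depth = 0
    capacity = 1
    while capacity < n_tris:
        capacity *= branching
        depth += 1
    return depth


for n in (100_000, 1_000_000, 10_000_000):
    print(n, tree_depth(n), tree_depth(n, branching=2))
```

With `branching=2` the depths come out as 17, 20 and 24, so the absolute numbers differ but the growth is still only a few extra steps per 10x increase in triangles.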
Further, the natural world doesn't have uniform spacing of geometry. When modeled in a computer game, many of the triangles in the scene are concentrated in a small area, such as a character. Any ray that doesn't go close to such models skips millions of triangle tests with a single quick bounding-box test. All of this is done in hardware with lots of optimization.
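For a feel of how cheap that bounding-box rejection is, here is a sketch of the common "slab" ray/box test in plain Python. The name `ray_hits_box` is illustrative, and this is only the software idea, not what any particular GPU does internally.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: does a ray (t >= 0) hit an axis-aligned bounding box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False  # Slab intervals do not overlap: the box is missed.
    return True


# A box around a detailed character, somewhere down the +x axis.
box = ((5.0, -1.0, -1.0), (6.0, 1.0, 1.0))
print(ray_hits_box((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), *box))  # ray towards the box
print(ray_hits_box((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), *box))  # ray going elsewhere
```

A miss costs a handful of comparisons, no matter how many triangles are inside the box.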
The same goes for texture detail. When a ray intersects a triangle, it just has to fetch the interpolated texel at that point in texture space, and that cost is independent of the texture size (although larger textures do increase memory bandwidth).
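As an illustration, a bilinear lookup reads four texels whether the texture is 2x2 or 8192x8192. The sketch below assumes a texture stored as a plain list of rows of floats and a made-up `sample_bilinear` function; real hardware also does mipmapping and caching, which are left out here.

```python
def sample_bilinear(texture, u, v):
    """Bilinearly interpolate a texture (list of rows) at u, v in [0, 1]."""
    h, w = len(texture), len(texture[0])
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Exactly four texel reads, regardless of texture dimensions.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


small = [[0.0, 1.0], [0.0, 1.0]]
large = [[col / 1023 for col in range(1024)] for _ in range(1024)]
print(sample_bilinear(small, 0.5, 0.5), sample_bilinear(large, 0.5, 0.5))
```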
Where things get interesting is the surface reflectivity. Completely matte surfaces show no visible reflections, so no reflection rays are bounced off them. Smooth reflective materials like a pool of water have a single reflection path, so to get good results you only have to shoot 1 ray, because every ray reflects the same way. But in between the two, the rays get reflected in random directions, so to get good quality, enough rays have to be cast for the reflected secondary rays to sample the scene adequately.
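A toy sketch of the difference: the mirror direction is fully determined by the incoming ray and the surface normal, while a rough surface spreads rays around it. The `scatter` function and its `roughness` jitter are a crude stand-in made up for illustration, not a physically based material model.

```python
import random


def reflect(d, n):
    """Mirror reflection of direction d about unit normal n: d - 2(d.n)n."""
    dot = sum(a * b for a, b in zip(d, n))
    return tuple(a - 2 * dot * b for a, b in zip(d, n))


def scatter(d, n, roughness, rng):
    """Crude illustration: perturb the mirror direction by a random offset."""
    return tuple(c + roughness * rng.uniform(-1, 1) for c in reflect(d, n))


rng = random.Random(0)
incoming, normal = (1.0, -1.0, 0.0), (0.0, 1.0, 0.0)
mirror = {scatter(incoming, normal, 0.0, rng) for _ in range(100)}
glossy = {scatter(incoming, normal, 0.3, rng) for _ in range(100)}
print(len(mirror), len(glossy))
```

With zero roughness all 100 rays collapse to one direction, so one ray is enough. With some roughness every ray goes somewhere slightly different, and many are needed to average out the noise.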
This is where scene optimization comes in. It has nothing to do with geometric complexity, but with how many additional rays are required. That can be controlled by changing how many slightly reflective materials are used in a scene. This is also why some of the early games used lots of glass and puddles of water, as the reflections are simple. More natural scenes with balanced reflections require more hands-on tuning of material properties.
Then there are also the usual optimizations. E.g. in a very long corridor, a light that is around several corners could in theory have a light ray that bounces off multiple walls, ping-ponging down the corridor and around the corners to end up at the camera (and the same goes for any material that happens to be on the ray path way downstream). In reality, the rays get attenuated enough that their statistical impact on the render is insignificant. So, as is standard for computer games, things a long, long way away and out of sight are pruned from the computation, even if theoretically they make a non-zero contribution to the lighting.
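The attenuation argument can be sketched with a few lines of Python. The function name, the 1% threshold and the reflectance values are assumptions picked for illustration; an actual engine may use a fixed bounce limit, a distance cull or something more sophisticated.

```python
def bounces_until_negligible(reflectance, threshold=0.01, max_bounces=64):
    """Count bounces until the carried energy drops below the threshold."""
    throughput = 1.0
    bounces = 0
    while throughput >= threshold and bounces < max_bounces:
        throughput *= reflectance  # each wall absorbs part of the light
        bounces += 1
    return bounces


for r in (0.25, 0.5, 0.75):
    print(r, bounces_until_negligible(r))
```

With walls that reflect half the light, a path contributes less than 1% after 7 bounces, so a ray ping-ponging down a long corridor can be dropped well before it reaches the camera.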