He does explain the results. Double performance on tests with lots of reflective surfaces. Not double in the matte scenes and the animation, where a lot of CPU initial setup time (i.e. half) is included.
There are a number of issues with both his explanation and yours.
Scenes with reflective surfaces need more bounces and samples to achieve a clean render in comparison to the same scene without those reflective surfaces, hence they take longer. The difference between GPUs is simply how quickly they can calculate the bounces. Unless the 3rd gen RT cores have systems in place that specifically speed up recursive bounces on reflective surfaces only (or deep recursive bounces in general), we should be seeing a consistent speedup in render times regardless of the material used.
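To put that argument in concrete terms, here is a toy throughput model. It is only a sketch: the sample counts, bounce counts and rays-per-second figures are made-up numbers, and it assumes render time is simply the total number of rays traced divided by the GPU's ray throughput. That ignores shading cost and any hardware that might favour particular ray types.

```python
# Toy model (an assumption, not how Cycles actually schedules work):
# render time = total rays traced / rays per second the GPU can handle.

def render_time_s(samples, avg_bounces, pixels, rays_per_second):
    """Seconds to trace every ray in the frame under the toy model."""
    total_rays = samples * avg_bounces * pixels
    return total_rays / rays_per_second

PIXELS = 1920 * 1080

# Hypothetical throughputs: the faster card traces twice as many rays per second.
SLOW_GPU = 2.0e9
FAST_GPU = 4.0e9

# Hypothetical scenes: the reflective one needs more samples and deeper bounces.
SCENES = {
    "matte": {"samples": 256, "avg_bounces": 4},
    "reflective": {"samples": 1024, "avg_bounces": 12},
}

def speedups():
    """Speedup of the fast GPU over the slow GPU for each scene."""
    result = {}
    for name, params in SCENES.items():
        slow = render_time_s(pixels=PIXELS, rays_per_second=SLOW_GPU, **params)
        fast = render_time_s(pixels=PIXELS, rays_per_second=FAST_GPU, **params)
        result[name] = slow / fast
    return result

if __name__ == "__main__":
    for name, ratio in speedups().items():
        print(f"{name}: {ratio:.2f}x")
```

In this model the reflective scene takes far longer on both cards, but the ratio between the cards is the same for both scenes. A material-dependent speedup would need something beyond raw throughput to explain it.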
As for your explanation about CPU initial setup time: setup time is dictated by the following:
The number and total size of textures in the scene
Polygon density of models
Complexity of shaders (how many nodes does it use)
Render size does play a part, but it is tiny compared to the factors listed above.
The junkshop scene isn't particularly demanding for the CPU to set up. On my system (a 5900X and a SATA SSD) setup took 20 seconds at 8K, the same as for a 1080p render. That is a tiny amount of time compared to the total render time he quotes of 30-ish minutes, so it doesn't explain the discrepancy.
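A quick back-of-the-envelope check of how little a 20 second setup matters in a 30 minute render. This is a sketch using Amdahl's-law style arithmetic; it assumes the setup time stays fixed and that everything else scales with the GPU, and the 2x GPU factor is an assumed value for illustration.

```python
def overall_speedup(total_s, setup_s, gpu_speedup):
    """Overall speedup when only the non-setup part of the render gets faster.

    total_s     -- total render time on the slower card, in seconds
    setup_s     -- fixed CPU setup time that does not scale with the GPU
    gpu_speedup -- assumed speedup of the GPU portion (2.0 = twice as fast)
    """
    gpu_s = total_s - setup_s
    new_total_s = setup_s + gpu_s / gpu_speedup
    return total_s / new_total_s

total_s = 30 * 60   # roughly the 30 minute render time quoted
setup_s = 20        # setup time measured on my system

print(round(overall_speedup(total_s, setup_s, 2.0), 2))  # 1.98
```

With these numbers a card that is twice as fast at the GPU work should still come out at about 1.98x overall, so 20 seconds of setup cannot account for a result that falls well short of double.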
The junkshop scene is present in both his benchmark and the Blender benchmark. The Blender benchmark results that he shows have the 4090 at double the samples per minute of the 3090, so in theory you should get half (or close to half) the render time.
Edit: Look again at the first-frame results for the animation. The difference between the 4090 and 3090 is almost negligible. That frame does appear to take 1-3 minutes to set up, but I still would not expect the results to be so close between the 4090 and 3090. The frame 48 render time is still not particularly impressive.