Intel’s OneAPI unlocks ray tracing for any DX11 GPU in World of Tanks

shankly1985 · 17 Sep 2019 at 17:58

Intel’s OneAPI will soon enable ray tracing for DX11-compatible graphics cards in World of Tanks. Reportedly rolling out very soon, Intel’s seemingly GPU vendor-agnostic ray tracing solution will enhance the Core engine in-game with the ability to ray trace shadows in very, very specific circumstances.

Soon enough players will be able to enjoy “super-realistic shadows” thanks to EnCore RT… but only on intact vehicles in direct sunlight. Yes, the implementation is rather limited at this point in time. But, however nascent and limited in scope this implementation may be, it’s one of few alternatives to Microsoft’s DirectX Raytracing API hand-in-hand with Nvidia’s RTX-capable GPUs.

It’s all thanks to Intel’s OneAPI Rendering Toolkit, which is a part of a wider move by the company to incorporate its entire software stack – from FPGAs and CPUs to graphics cards – under a single API umbrella. It’s a bold undertaking, and has already received some flak from Nvidia’s CEO Jen-Hsun Huang. If successful, however, the API could see the company integrate its upcoming Intel Xe GPUs succinctly alongside its many other products, most notably its broad CPU product lineup, for improved co-operative performance.

The performance degradation of Intel’s implementation is not yet known. If it’s anything like Nvidia RTX titles, it could be quite severe. However, with such a limited scope for effects, there’s a good chance it has a lessened impact to what we expect with today’s ray tracing experiences in-game.

You can take a look at the EnCore RT implementation in World of Tanks over at Dom1n.com.

And this isn’t technically the first time Intel’s tried its hand at ray tracing – in fact it’s been ray tracing for years on Xeon servers. Similar to Radeon Rays, AMD’s own ray tracing implementation, the company has been offering the stack necessary to compute ray tracing, yet so far only in the professional space. This would be the first time we see Intel venture fourth into the gaming sphere with the tech.

There have also been instances of game engine developers taking the burden of ray tracing unto themselves. Crytek did so with CryEngine, showing off a capable ray tracing tech demo running on AMD’s RX Vega 56.

https://worldoftanks.eu/en/news/general-news/wargaming-fest-2019/

https://www.pcgamesn.com/intel/ray-tracing-world-of-tanks

https://twitter.com/PhilippGerasim/...pcgamesn.com/intel/ray-tracing-world-of-tanks

Rofflay · 17 Sep 2019 at 18:14

What happens then with RT cores was it a scam? Do they deliver more perf than RT done on just like any other feature? I still think this is too early it all looks great but the last thing we need is Raytracing when you are trying to get 4k 144hz HDR first this was the holy grail.

After that go nuts on Raytracing it will be great i actually think wow classic would still be the best game out there to benefit from RT but i doubt i can even turn this on at 4k in wot even on my planned 2080ti.

Rroff · 17 Sep 2019 at 18:28

Rofflay said:
Do they deliver more perf than RT done on just like any other feature?

Dedicated hardware is a far more efficient way to do ray tracing than the same core estate doing general purpose compute.

Rofflay · 17 Sep 2019 at 18:31

Rroff said:
Dedicated hardware is a far more efficient way to do ray tracing than the same core estate doing general purpose compute.

Maybe the 2080ti will not be such a waste then thanks Rroff i never do read into RT much i just gaze at the pretty images i can not get at 4k 120fps.

Nasher · 17 Sep 2019 at 18:32

So, it just works? (without paying Nvidia tax)?

4K8KW10 · 17 Sep 2019 at 18:40

Rroff said:
Dedicated hardware is a far more efficient way to do ray tracing than the same core estate doing general purpose compute.

What is the so called dedicated in the RT units?

Rroff · 17 Sep 2019 at 18:45

4K8KW10 said:
What is the so called dedicated in the RT units?

How do you mean?

4K8KW10 · 17 Sep 2019 at 18:47

Rroff said:
How do you mean?

What is the special thing in them that makes them "dedicated" for this task and not useful for example for physics acceleration or another compute problem?

Rofflay · 17 Sep 2019 at 18:47

Rroff said:
How do you mean?

I think he wants proof of secret sauce RT cores and Nvidia trademarks and ten years worth of research and thus the very high costs.

Rroff · 17 Sep 2019 at 18:53

4K8KW10 said:
What is the special thing in them that makes them "dedicated" for this task and not useful for example for physics acceleration or another compute problem?

General compute usually has a lot of overheads that make them more efficient for mixed workloads but less efficient at batching up massive amounts of the same kind of calculations. In Turing each SM has a block with specialised processing for bounding box evaluation and ray intersect testing hanging off it as the "RT" core (techniques which actually are reasonably similar to physics processing and you probably could with a bit of tweaking use them to accelerate physics) the combination of BVH optimisations via dedicated hardware for that and having the specialised hardware for ray/triangle intersect testing means you can do ray tracing relevant calculations at least 6x and upto around 10x faster than would be possible with using the same silicon estate via general compute pipelines.

EDIT: At a very simplified level the RT hardware in Turing not only allows you to accelerate ray tracing calculations themselves faster than general compute use of the same space would allow but also has specialised hardware that allows you to accelerate narrowing down where you actually need to use ray tracing to be more efficient with the hardware capabilities you do have so you aren't spending a lot of time needlessly testing rays where they aren't going to do anything useful for your scene anyway.

4K8KW10 · 17 Sep 2019 at 18:55

Rroff said:
General compute usually has a lot of overheads that make them more efficient for mixed workloads but less efficient at batching up massive amounts of the same kind of calculations. In Turing each SM has a block with specialised processing for bounding box evaluation and ray intersect testing hanging off it as the "RT" core (techniques which actually are reasonably similar to physics processing and you probably could with a bit of tweaking use them to accelerate physics) the combination of BVH optimisations via dedicated hardware for that and having the specialised hardware for ray/triangle intersect testing means you can do ray tracing relevant calculations at least 6x and upto around 10x faster than would be possible with using the same silicon estate via general compute pipelines.

Thanks for the clarification

GloriousMess · 17 Sep 2019 at 18:59

Rroff is pretty much bang on. Hardware acceleration for a particular task is usually silicon and the appropriate software driver designed to accelerate particular calculations, usually in bulk. It's what a graphics card does, it's what a PhysX card does, and it's what the floating-point co-processor chips used to do with the older processor lines. CPUs are 'slower' because they need to handle the general case of calculating any arbitrary combination of instructions, which could require any amount of memory, any data structure, etc.

Nasher · 18 Sep 2019 at 15:55

4K8KW10 said:
What is the special thing in them that makes them "dedicated" for this task and not useful for example for physics acceleration or another compute problem?

The special thing is...Nsnake oil

Panos · 18 Sep 2019 at 17:05

GloriousMess said:
Rroff is pretty much bang on. Hardware acceleration for a particular task is usually silicon and the appropriate software driver designed to accelerate particular calculations, usually in bulk. It's what a graphics card does, it's what a PhysX card does, and it's what the floating-point co-processor chips used to do with the older processor lines. CPUs are 'slower' because they need to handle the general case of calculating any arbitrary combination of instructions, which could require any amount of memory, any data structure, etc.

There is issue with current solution. Both tensor & RT cores are much slower than Shaders in Turing cards. So the Shaders need to be slowed down quite a lot, so the Tensor & RT to work in line. And is the reason the RT is so limited and not fully implement it because the RT cores cannot do a whole image. And that the RT cores of the 2080Ti. All lesser cards have slower & less RT cores.

The Intel proposal @shankly1985 wrote above, is no different than the AMD Hybrid solution where things would be rendered in CPU & GPU Shaders at full speed.
Also WG had ingame video about this on the Twitch stream on Sunday (I wrote about this elsewhere), and they said that it would run without requirement of expensive new hardware (Turing cards) with plans to be feasible on laptops also. And the new client with the Ray Tracing will be out next month.

Also is hardware agnostic. It would work on AMD CPUs as well as on Intel CPUs, similarly it would be GPU agnostic also

Panos · 18 Sep 2019 at 17:17

@shankly1985 i thing this image you used is from the existing client, as is exactly how it looks atm.
The Ray Tracing client had amazing shadows casting of shadows.

LeMson · 18 Sep 2019 at 18:15

MY RTX DREAMS ARE COMING TRUE! THE FUTURE!

Rroff · 18 Sep 2019 at 18:44

Panos said:
There is issue with current solution. Both tensor & RT cores are much slower than Shaders in Turing cards. So the Shaders need to be slowed down quite a lot, so the Tensor & RT to work in line. And is the reason the RT is so limited and not fully implement it because the RT cores cannot do a whole image. And that the RT cores of the 2080Ti. All lesser cards have slower & less RT cores.

The way you are wording it there is only true of the Tensor cores (affecting DLSS) - the RT cores hang off the shader cores and their workload can co-exist alongside shading, etc. running at their max possible throughput they only take up half the frametime of the shading of a typical graphically advanced rasterised scene. See appendix C here https://www.nvidia.com/content/dam/...ure/NVIDIA-Turing-Architecture-Whitepaper.pdf

Sure if you pumped up the performance of the RT cores in some way and/or added more of them you could significantly increase the quality of ray tracing being done but the limit isn't that the RT cores are a bottleneck to shader performance in a relatively sense in that respect (having them running flat out doesn't inhibit the performance of your other shading significantly) - unlike Tensor cores where heavy use of them for DLSS especially you can encounter latency that makes them prohibitive to high framerate gaming without careful implementation (which is probably half the reason it looks smeary at times) and as can be seen in the link above provide a degree of bottleneck additional to the main rasterisation process.

DrCrabHands · 18 Sep 2019 at 19:08

I always enjoy reading your posts @Rroff, very informative and well explained and balanced. And it helps cut through all the fud that gets posted.

Rroff · 18 Sep 2019 at 19:13

DrCrabHands said:
I always enjoy reading your posts @Rroff, very informative and well explained and balanced. And it helps cut through all the fud that gets posted.

I've been a bit clumsy wording it there - but essentially RT cores running flat out don't in any significant way inhibit shader performance as Panos was implying - sure if they ran faster or there was more of them we could ramp up RT quality. While DLSS functionality does tend to have an impact like he is saying and can provide a bottleneck to high framerates - so people don't have to dive in the whitepaper this is one example of how that works within the rendering of a frame:

DNN processing here being the DLSS bit.

DrCrabHands · 18 Sep 2019 at 19:16

Rroff said:
I've been a bit clumsy wording it there - but essentially RT cores running flat out don't in any significant way inhibit shader performance as Panos was implying - sure if they ran faster or there was more of them we could ramp up RT quality. While DLSS functionality does tend to have an impact like he is saying and can provide a bottleneck to high framerates - so people don't have to dive in the whitepaper this is one example of how that works within the rendering of a frame:

DNN processing here being the DLSS bit.

That was my basic understanding of it but I wasn't 100% sure I had it right. I'm still learning all this stuff. Thanks for the clarification