
A different take on SLI / MGPU: pipelining?

Is it? The throughput of NVLink is 100 GB/s and it's bidirectional, so you can simultaneously have 100 GB/s from card A to card B and 100 GB/s from B to C. If you assume that all the cards have all the textures in VRAM, how much data actually needs to be transferred from card to card?

Look at it this way:

1) You have a processor (GPU/CPU, whatever) running at 1 GHz.
2) That's one billion cycles per second.
3) So one cycle is one billionth of a second.
4) In that time light/electricity can only travel around 30 cm (due to the speed of light).
5) So if you were 30 cm apart you'd already have a problem.

In reality, there are many factors making the journey slower than the speed of light, and your GPU and future GPUs are trying to do things much faster than 1 GHz. By ballpark maths, some fast fixed-function operations would already need distances down to around 0.2 cm, simply due to the speed of light.
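The back-of-envelope numbers above can be checked in a few lines of Python. The clock rates other than 1 GHz are illustrative assumptions, picked to show roughly where the ~0.2 cm figure comes from (real signals in copper and silicon travel slower than c, so actual distances are tighter still):

```python
# Distance light travels in one clock cycle, for several clock rates.
# 1 GHz reproduces the ~30 cm figure from the post; the higher rates are
# hypothetical, to show how the distance budget shrinks.

C = 299_792_458  # speed of light in vacuum, m/s

for ghz in (1, 2, 5, 150):
    cycle_s = 1 / (ghz * 1e9)       # duration of one cycle in seconds
    dist_cm = C * cycle_s * 100     # distance light covers in that cycle, cm
    print(f"{ghz:>4} GHz: {cycle_s * 1e12:7.2f} ps per cycle, light travels {dist_cm:.2f} cm")
```

At 1 GHz this prints roughly 30 cm per cycle; around 150 GHz it drops to about 0.2 cm, matching the ballpark in the post.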

There's more to it than that but hopefully it sort of directs the scope of the problem. The speed of electron flow is a limitation driving the development of quantum computing and other technologies.
 
I'm sorry, but I'm still not seeing the problem. So long as the composited image gets to the next stage sufficiently quickly, I don't see an issue. NVLink has a bandwidth of 100 GB/s and the image data is going to be far less than 1 GB, so you can transfer at well over 100 fps. Probably over 200. Remember that the textures are preloaded into all cards. So the added latency is very small.
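As a rough sanity check of that claim, here is the arithmetic for one frame over the link. The frame format (3840x2160 at 4 bytes per pixel, RGBA8) is an assumption for illustration, not something stated in the thread:

```python
# Time to push one composited frame over a 100 GB/s link.
# Assumed frame: 3840x2160, 4 bytes/pixel (RGBA8) -- an illustrative choice.

LINK_BYTES_PER_S = 100e9           # 100 GB/s, the figure quoted in the thread
frame_bytes = 3840 * 2160 * 4      # ~33 MB per frame

transfer_s = frame_bytes / LINK_BYTES_PER_S
print(f"Frame size:    {frame_bytes / 1e6:.1f} MB")
print(f"Transfer time: {transfer_s * 1e6:.0f} us")
print(f"Upper bound:   {1 / transfer_s:.0f} frame transfers per second")
```

On these assumptions a frame is about 33 MB and moves in roughly a third of a millisecond, i.e. thousands of transfers per second, so raw bandwidth for the final image isn't the bottleneck; the counter-argument below is about the intermediate state that would also have to move.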

I think you aren't understanding how games are processed, and it's beyond my ability to put it concisely into a post here. These days, static textures loaded into VRAM are just the base for shader materials and the like, with lots of on-the-fly computation that would need to be mirrored between devices. And once you involve things like geometry shaders and various deferred effects, you don't tend to have one defined moment where transformation, clipping and lighting happen that you can then just dump to the next stage.
 
This article covers some of the graphics pipeline of a modern game

http://www.adriancourreges.com/blog/2015/03/10/deus-ex-human-revolution-graphics-study/

You'd need to be farming out a lot of this stuff to make decent use of multiple GPUs - at least until the load shifts to waiting on ray tracing results but the heavy weight bits of ray tracing are another story and can be farmed out fairly effectively even now to multiple GPUs.
 
I'm well aware that there's much that cannot be separated out. But there are parts which can - RTX and Hairworks are two.


Hairworks is not any kind of fixed function; it is just a program run on CUDA cores, the rendering of which can easily depend on large amounts of calculation performed elsewhere in the scene.

RTX has fixed-function acceleration of ray-intersection tests, but most of the general RTX computation, including the real-time processing of the BVH, is done using CUDA cores. So although it has data independence, there is a hardware dependence.
 
I'd imagine if latency is the issue, this won't be solved until quantum computing becomes commonplace, with quantum entanglement.

I don't actually know what I'm on about, I just thought it would sound cool :)
 