
NVIDIA Launches NVLINK - High-Speed GPU Interconnect

Not sure if this has been posted???

Source http://www.guru3d.com/news_story/nvidia_launches_nvlink_high_speed_gpu_interconnect.html

NVIDIA announced that it plans to integrate a high-speed interconnect, called NVIDIA NVLink, into its future GPUs, enabling GPUs and CPUs to share data five to 12 times faster than they can today. This will eliminate a longstanding bottleneck and help pave the way for a new generation of exascale supercomputers that are 50-100 times faster than today's most powerful systems.

NVIDIA will add NVLink technology into its Pascal GPU architecture -- expected to be introduced in 2016 -- following this year's new NVIDIA Maxwell compute architecture. The new interconnect was co-developed with IBM, which is incorporating it in future versions of its POWER CPUs.




"NVLink technology unlocks the GPU's full potential by dramatically improving data movement between the CPU and GPU, minimizing the time that the GPU has to wait for data to be processed," said Brian Kelleher, senior vice president of GPU Engineering at NVIDIA.

"NVLink enables fast data exchange between CPU and GPU, thereby improving data throughput through the computing system and overcoming a key bottleneck for accelerated computing today," said Bradley McCredie, vice president and IBM Fellow at IBM. "NVLink makes it easier for developers to modify high-performance and data analytics applications to take advantage of accelerated CPU-GPU systems. We think this technology represents another significant contribution to our OpenPOWER ecosystem."

With NVLink technology tightly coupling IBM POWER CPUs with NVIDIA Tesla® GPUs, the POWER data center ecosystem will be able to fully leverage GPU acceleration for a diverse set of applications, such as high performance computing, data analytics and machine learning.

Advantages Over PCI Express 3.0
Today's GPUs are connected to x86-based CPUs through the PCI Express (PCIe) interface, which limits the GPU's ability to access the CPU memory system and is four- to five-times slower than typical CPU memory systems. PCIe is an even greater bottleneck between the GPU and IBM POWER CPUs, which have more bandwidth than x86 CPUs. As the NVLink interface will match the bandwidth of typical CPU memory systems, it will enable GPUs to access CPU memory at its full bandwidth.
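As a rough back-of-envelope illustration of that claim (the 16 GB/s figure is the theoretical peak for a PCIe 3.0 x16 link, the NVLink rates are simply the "five to 12 times" multipliers from the announcement applied to it, and the 4 GB working set is an arbitrary example size, not anything from the press release):

```python
# Idealised transfer-time comparison for moving a working set from
# CPU memory to GPU memory over different links. Rates are theoretical
# peaks and ignore protocol overhead and latency.

PCIE3_X16_GBPS = 16.0                    # PCIe 3.0 x16 theoretical peak (GB/s)
NVLINK_LOW_GBPS = 5 * PCIE3_X16_GBPS     # low end of the "5x" claim
NVLINK_HIGH_GBPS = 12 * PCIE3_X16_GBPS   # high end of the "12x" claim

data_gb = 4.0  # example working set size

def transfer_time_s(size_gb, rate_gbps):
    """Idealised transfer time in seconds at a given link rate."""
    return size_gb / rate_gbps

print(f"PCIe 3.0 x16: {transfer_time_s(data_gb, PCIE3_X16_GBPS)*1000:.0f} ms")
print(f"NVLink (5x):  {transfer_time_s(data_gb, NVLINK_LOW_GBPS)*1000:.0f} ms")
print(f"NVLink (12x): {transfer_time_s(data_gb, NVLINK_HIGH_GBPS)*1000:.0f} ms")
```

Even at the low end of the claimed range, the time the GPU sits waiting on the copy drops from a quarter of a second to around 50 ms.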

This high-bandwidth interconnect will dramatically improve accelerated software application performance. Because of memory system differences -- GPUs have fast but small memories, and CPUs have large but slow memories -- accelerated computing applications typically move data from the network or disk storage to CPU memory, and then copy the data to GPU memory before it can be crunched by the GPU. With NVLink, the data moves between the CPU memory and GPU memory at much faster speeds, making GPU-accelerated applications run much faster.

Unified Memory Feature
Faster data movement, coupled with another feature known as Unified Memory, will simplify GPU accelerator programming. Unified Memory allows the programmer to treat the CPU and GPU memories as one block of memory. The programmer can operate on the data without worrying about whether it resides in the CPU's or GPU's memory.
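A toy analogy of what that buys the programmer, with NumPy arrays standing in for the two memories (this is purely illustrative pseudocode for the programming model, not actual CUDA code):

```python
import numpy as np

# Explicit-copy model: what PCIe-era accelerator code looks like.
# The programmer manages two allocations and copies between them.
host = np.arange(1_000_000, dtype=np.float32)   # "CPU memory"
device = host.copy()                            # copy CPU -> GPU
device *= 2.0                                   # kernel runs on the GPU copy
host = device.copy()                            # copy result GPU -> CPU

# Unified Memory model: one logical allocation, no explicit copies.
# The runtime decides where the data physically lives.
unified = np.arange(1_000_000, dtype=np.float32)
unified *= 2.0   # programmer no longer tracks which memory holds the data

assert np.array_equal(host, unified)
```

The point of NVLink here is that hiding the copies is only practical if moving the data is cheap; Unified Memory plus a slow link would just hide a bottleneck rather than remove it.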

Although future NVIDIA GPUs will continue to support PCIe, NVLink technology will be used for connecting GPUs to NVLink-enabled CPUs as well as providing high-bandwidth connections directly between multiple GPUs. Also, despite its very high bandwidth, NVLink is substantially more energy efficient per bit transferred than PCIe.

NVIDIA has designed a module to house GPUs based on the Pascal architecture with NVLink. This new GPU module is one-third the size of the standard PCIe boards used for GPUs today. Connectors at the bottom of the Pascal module enable it to be plugged into the motherboard, improving system design and signal integrity.

NVLink high-speed interconnect will enable the tightly coupled systems that present a path to highly energy-efficient and scalable exascale supercomputers, running at 1,000 petaflops (1 × 10^18 floating point operations per second), or 50 to 100 times faster than today's fastest systems.
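As a quick sanity check on that arithmetic (the 10-20 petaflop range below is just what the 50-100x figure implies about the fastest systems of the time; it is not a number quoted in the announcement):

```python
# 1,000 petaflops = 1 exaflop = 1e18 floating point operations per second
petaflop = 1e15
exaflop_target = 1000 * petaflop
assert exaflop_target == 1e18

# Working backwards from "50 to 100 times faster than today's fastest
# systems" gives the implied performance of those systems:
implied_today_low = exaflop_target / 100    # 10 petaflops
implied_today_high = exaflop_target / 50    # 20 petaflops
print(implied_today_low / petaflop, implied_today_high / petaflop)  # 10.0 20.0
```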
 
Sounds like another proprietary block we'll be plugging into our motherboards if we want to take advantage of this.

...And if it quadruples the performance of our graphics cards, I don't give a damn.

An interesting look to the future.
 
So will we now need more expensive motherboards with this feature (much like having to have a G-Sync monitor), or have to buy an IBM CPU?
 
I know this is an old thread, but what does this mean for the new 20 series? Now that they use NVLink, does that mean SLI is changing to become more useful, e.g. accessing the memory of both cards? Does this mean the price/performance could be better with 2x 2080 than one 2080 Ti?
 

I'm guessing so, as theoretically it's not about splitting frames up between cards any more, which is what inherently leads to the problems. If an application could 'see it as one card/resource', it may well be really interesting...

Time will tell, I’ll confess I bought a pair of Ti and the link so looking forward to playing around with it!
 
Yes, I've seen that; it makes me wonder if multi-GPU might become a thing again.
 

NVLink doesn't do any of that though, it is just a faster version of the SLI link.
 
I'm interested in how NVLink performs.
I've mentioned in the past that mGPU needs to move away from driver and game-developer control and just work. I think someone gave a good explanation of why that's not possible, but in my head I still think it can/should happen :p. A bit like with RAID drives: add two drives and they work as one from the OS side of things.
There have been complaints about the RT performance of the new cards (pre-release), but all GPUs suffer slowdowns at times. Far Cry 5 works great with a 1070 Ti, for example, but there have been a few occasions with lots happening on screen where the performance dips below 30fps, and it's very noticeable with G-Sync. So one thing I've wondered is whether a second GPU could be used to help smooth gameplay, rather than just increase average FPS - to try to remove the troughs in FPS. Once you hit a decent FPS, it's the deviations away from the average that hurt the smoothness/experience of a game.

NVLink, I believe, will work differently from SLI. I cannot find much info yet though.
 
NVLink doesn't do any of that though, it is just a faster version of the SLI link.

Is it? I can't find a definitive answer anywhere for it in relation to 20 series cards.

It does do that; it's essentially what stitches the massive 8-GPU board together. Yes, the link is a much faster version of the other link, but the way it works from a software point of view is fundamentally different from simply sharing rendering.

The mega £60,000 board Jensen was waving about has added benefits with the IBM CPU but the Turing cards should function like Quadro with the link and hopefully extend to enabled CPUs in future!

What silly price are they charging for it?

It’s not £350 anymore it’s £75... so not a mile from a HB SLI bridge.
 
So the big question is: do we think 2x 2080 at £1,570 will be better on price/performance than a 2080 Ti at £1,050?
 

I did think that, but based on past card performance in Pascal, and with the expected jump in the CUDA core portion not quite set to blow us away (or is it?), I would guess it would be slightly ahead but not by enough. On the compute side, the tensor cores and RTX cores could potentially manage it though, as they seem to give more linear gains.

I suppose, to be more accurate about my previous post, my main interest is in the NVLink fabric in future hardware - it would be amazing if we saw some consumer-grade NVSwitch-style hardware. Information is pretty non-existent on how much the RTX cards will make use of it all, though.
 
I just watched the NVLink section of this: https://www.youtube.com/watch?v=YNnDRtZ_ODM (it's at about the 30 minute mark).

It does appear that this is a bit of a future technology; the guy from Nvidia plays down any expectations of sharing memory or of eliminating microstutter, and outright dismisses the idea that the two cards will be seen as one. I guess you are right re the RTX cores, as presumably they will increase in performance much like the current scaling of the shaders.
 
First off, NVLink is built as a superset of PCIe; it's just a higher-powered version of it that has more bandwidth and skips the whole waiting-for-PCIe-4/5/6 stage.

Second, it's only faster if you want it to be: if it uses fewer links at lower speed, it can be as low-bandwidth as you want. You don't want a 200GB/s link to pass 200MB/s of data, because you'd be burning 100 times more power than you needed to, power that then couldn't be used by the GPU.

Local memory has a bandwidth pushing, what, over 500GB/s; a much higher-latency distant connection, even at 250GB/s, would lead to a massive performance degradation, let alone the likely <10GB/s it probably uses for the SLI connection, both for power reasons and because there's simply no need for it to be higher. Even maxed out it would be far, far too slow to enable sharing memory.
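Putting rough numbers on that argument (all rates are the round figures guessed in this post, not measured specs, and latency is ignored entirely, which understates the problem):

```python
# Time to move 1 GB at each of the rough link speeds mentioned above.
#   500 GB/s -> local VRAM bandwidth (ballpark)
#   250 GB/s -> a hypothetical very fast external link
#   10 GB/s  -> the guessed ceiling for an SLI-style bridge

def ms_per_gb(rate_gb_per_s):
    """Milliseconds to transfer 1 GB at a given rate, ignoring latency."""
    return 1000.0 / rate_gb_per_s

for label, rate in [("local VRAM", 500), ("fast link", 250), ("SLI-style", 10)]:
    print(f"{label:10s}: {ms_per_gb(rate):6.1f} ms per GB")

# At 60 fps a frame budget is ~16.7 ms: even the 250 GB/s link spends
# about a quarter of a frame per GB moved, while the 10 GB/s bridge
# takes ~100 ms per GB, i.e. several whole frames.
```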

It's on there because it's compatible with the gpu core already so if you have to use something why not use a compatible link.

Card to card links are never going to let several gpus run as well as a single core, basically within any kind of technology that exists anywhere in the world today.

Using PCIe lanes as usual, as AMD does for Crossfire, theoretically takes away maybe 50MB/s of bandwidth that the GPU can't then use to talk to the CPU, but given that 8x PCIe barely makes a dent in performance, meh.

Honestly it sounds like a cynical ploy to me: SLI bridges had people buying expensive custom versions to look cool, and now the same people will be convinced to buy expensive, silly NVLink connectors.
 