NVLink vs InfiniBand: Comparative Analysis and Future Trends
In today's high-performance computing (HPC) landscape, interconnect technology plays an essential role in linking compute nodes for efficient data transfer. Among these technologies, NVIDIA's NVLink and InfiniBand stand out, each offering distinct advantages for specific use cases. This article compares the two in detail and discusses their potential future developments.
Insight into NVLink Technology
NVLink is a protocol that addresses the communication limitations between GPUs within a server. Unlike traditional PCIe, whose switched topology offers limited bandwidth between devices, NVLink enables high-speed direct interconnection between GPUs inside the server.
NVLink Bandwidth Calculation
Understanding how NVLink bandwidth is calculated is key to comprehending its capabilities and optimizing its use in various applications. Here we walk through the calculation, taking NVLink 3.0 as an example.
In NVLink 3.0, each differential signal pair runs at 50Gbps, and four pairs are bundled per direction to form a "sub-link" (NVIDIA variously calls these Port or Link, so there is a bit of ambiguity in the terminology). A sub-link transmits and receives data simultaneously, just as in networking a 400Gbps interface denotes the capacity to both send and receive 400Gbps concurrently.
Counted the network way, a NVLink 3.0 sub-link is therefore a unidirectional 200Gbps link. Memory bandwidth, by contrast, is normally quoted bidirectionally, so a single link contributes 400Gbps, or 50GB/s; an A100 GPU with 12 such links reaches its headline 600GB/s of NVLink bandwidth. For more details about NVLink, you can read the post An Overview of NVIDIA NVLink.
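The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope calculator; the 50Gbps-per-pair, 4-pairs-per-sub-link, and 12-links-per-GPU figures are the publicly quoted NVLink 3.0 / A100 numbers.

```python
# NVLink 3.0 bandwidth arithmetic (nominal rates, for illustration).
PAIR_RATE_GBPS = 50      # one differential pair, one direction
PAIRS_PER_SUBLINK = 4    # pairs bundled per direction in a sub-link

def sublink_gbps(direction="one-way"):
    """Sub-link throughput in Gbps, one-way or counting both directions."""
    one_way = PAIR_RATE_GBPS * PAIRS_PER_SUBLINK   # 200 Gbps per direction
    return one_way if direction == "one-way" else 2 * one_way

def gpu_bandwidth_gbs(links=12):
    """Bidirectional NVLink bandwidth in GB/s for a GPU with `links` links."""
    return sublink_gbps("both") / 8 * links        # 50 GB/s per link

print(sublink_gbps())          # 200 (Gbps, one direction)
print(gpu_bandwidth_gbs())     # 600.0 (GB/s, A100-class GPU with 12 links)
```

The divide-by-8 step is where network-style Gbps figures turn into the GB/s figures used for memory bandwidth, which is exactly the unit mismatch that makes NVLink numbers look inconsistent at first glance.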
Overview of InfiniBand Technology
InfiniBand (IB) is a communication network that allows data to flow between CPUs and I/O devices, with up to 64,000 addressable devices. It uses a point-to-point connection in which each node communicates directly with other nodes over dedicated channels, thereby minimizing network congestion and boosting overall performance. This architecture supports Remote Direct Memory Access (RDMA) technology, which allows data to be transferred directly between memories without the involvement of the host CPU, hence increasing transfer efficiency.
A subnet is the smallest complete unit in the InfiniBand architecture; routers connect multiple subnets to build a larger InfiniBand network. Each subnet consists of end nodes, switches, links, and a subnet manager. InfiniBand networks are used in data centers, cloud computing, high-performance computing (HPC), and other fields.
Comparison between NVLink and InfiniBand
NVLink and InfiniBand are significantly different in design.
- Bandwidth: NVLink can offer higher data transfer speeds in certain configurations, notably GPU-to-GPU within a server, while InfiniBand holds its place in large-scale clusters thanks to its excellent scalability and mature ecosystem.
- Latency: both have been optimized for low latency, but InfiniBand's open standards and broad vendor support give it better adaptability in diverse environments.
- Cost: NVLink usually involves a higher investment because it is tied to NVIDIA GPUs, while InfiniBand, as a well-established market player, offers more pricing options and configuration flexibility.
- Application: in AI and machine learning, NVLink's adoption is growing, with its optimized data exchange providing significant speed advantages for model training. InfiniBand sees wider use in scientific research and academia, where its support for large-scale clusters and strong network performance are critical for running complex simulations and data-intensive tasks.
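The scalability point above has a simple numeric side: an InfiniBand port's speed is the per-lane data rate (which roughly doubles each generation) multiplied by the lane count. A minimal sketch using nominal, rounded per-lane rates; actual line rates differ slightly due to encoding overhead (FDR, for example, signals at 14.0625Gbps per lane):

```python
# Nominal per-lane InfiniBand data rates in Gbps, by generation
# (rounded illustrative values, not exact line rates).
LANE_GBPS = {"SDR": 2.5, "DDR": 5, "QDR": 10, "FDR": 14,
             "EDR": 25, "HDR": 50, "NDR": 100}

def link_speed_gbps(generation, lanes=4):
    """Port speed in Gbps; 4x (four lanes) is the most common port width."""
    return LANE_GBPS[generation] * lanes

print(link_speed_gbps("HDR"))  # 200
print(link_speed_gbps("NDR"))  # 400
```

A 4x HDR port thus gives the familiar 200Gb/s figure, and a 4x NDR port doubles that to 400Gb/s.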
In fact, large-scale data centers and supercomputing systems often opt for a hybrid interconnect architecture that embraces both NVLink and InfiniBand, capitalizing on the strengths of each technology.
NVLink is frequently employed to interconnect GPU nodes, enhancing the performance of compute-intensive and deep learning tasks. Meanwhile, InfiniBand takes charge of connecting general-purpose server nodes, storage devices, and other critical equipment within the data center. This combination ensures seamless coordination and efficient operation across the entire system.
Future Trends
With the growing demands for computation, both NVLink and InfiniBand are evolving continuously to meet the higher performance requirements of future data centers. NVLink may focus on deepening integration within the NVIDIA ecosystem, while InfiniBand might concentrate more on enhancing openness and compatibility. With emerging technologies, there could also be a convergence of the two in some scenarios.
InfiniBand Products Provided by FS
InfiniBand Switches
| Product | | | | |
|---|---|---|---|---|
| Link Speed | 200Gb/s | 200Gb/s | 800Gb/s | 800Gb/s |
| Ports | 40 | 40 | 32 | 32 |
| Fan | 5+1 Hot-swappable | 5+1 Hot-swappable | 6+1 Hot-swappable | 6+1 Hot-swappable |
| Power Supply | 1+1 Hot-swappable | 1+1 Hot-swappable | 1+1 Hot-swappable | 1+1 Hot-swappable |
InfiniBand Adapters
| Product | | | | | | | |
|---|---|---|---|---|---|---|---|
| Ports | Single-Port OSFP | Single-Port QSFP112 | Single-Port QSFP56 | Dual-Port QSFP56 | Single-Port QSFP56 | Dual-Port QSFP56 | Single-Port OSFP |
| PCIe Interface | PCIe 5.0 x16 | PCIe 5.0 x16 | PCIe 4.0 x16 | PCIe 4.0 x16 | PCIe 4.0 x16 | PCIe 4.0 x16 | PCIe 5.0 x16 |
Conclusion
FS's expertise in tailored networking solutions enables enterprises to optimize their interconnect designs for unique workloads and operational requirements. Whether establishing high-speed InfiniBand fabrics, improving network topologies, or implementing custom interconnect solutions, FS's commitment to quality helps businesses maximize the potential of their data ecosystems.