
Exploring InfiniBand Networks, HDR, and the Significance of IB Applications in Supercomputing

Posted on Dec 21, 2023

InfiniBand (IB) stands as a cutting-edge computer network communication standard established by the InfiniBand Trade Association (IBTA). Its widespread adoption in high performance computing (HPC) is attributed to its ability to deliver exceptional throughput, bandwidth, and low latency for network transmission.

InfiniBand serves as a critical data interconnect within and between computing systems. Whether through direct links or interconnection via network switches, InfiniBand enables high-performance networks for server-to-storage and storage-to-storage data transmission. Its scalability allows horizontal expansion through switched fabrics to meet diverse networking needs. Amid the rapid progress in scientific computing, artificial intelligence (AI), and cloud data centers, InfiniBand is increasingly favored in HPC supercomputing applications for end-to-end high-performance networking.

The Prevalence of InfiniBand in Supercomputers and HPC Data Centers

In June 2015, InfiniBand constituted a remarkable 51.8% of the Top500 list of the world's most powerful supercomputers, showcasing a substantial 15.8% year-on-year growth.

TOP500 Interconnect Trends

In the June 2022 Top500 list, InfiniBand networks once again claimed the top spot among supercomputer interconnect devices, demonstrating both numerical and performance advantages compared to the previous list. Key trends include:

    • InfiniBand-based supercomputers leading significantly with 189 systems.

    • InfiniBand-based supercomputers dominating the Top 100 systems, with 59 units.

    • NVIDIA GPU and networking products, particularly Mellanox HDR Quantum QM87xx switches and BlueField DPU, establishing themselves as dominant interconnects in over two-thirds of the supercomputers.

Beyond traditional HPC applications, InfiniBand networks find extensive use in enterprise-class data centers and public clouds. For instance, NVIDIA Selene, the leading enterprise supercomputer, and Microsoft's Azure public cloud leverage InfiniBand networks to deliver exceptional business performance.

In the latest Top500 list in November 2023, InfiniBand maintained its leadership, underscoring its continuous growth. The high regard for InfiniBand in the Top500 stems from its performance benefits, which play a pivotal role.

Advantages of InfiniBand Network

InfiniBand technology, positioned as a future-proof standard for High Performance Computing (HPC), commands high esteem in HPC connectivity for supercomputers, storage, and even LAN networks. InfiniBand boasts a myriad of advantages, including simplified management, high bandwidth, full CPU offload, ultra-low latency, cluster scalability and flexibility, Quality of Service (QoS), SHARP support, and more.

Easy Network Management

InfiniBand represents the pioneering network architecture specifically crafted for Software-Defined Networking (SDN) and is overseen by a subnet manager. The subnet manager is responsible for configuring the local subnet, ensuring seamless operation. To manage traffic, all channel adapters and switches are mandated to implement a Subnet Management Agent (SMA) that collaborates with the subnet manager. Each subnet necessitates at least one subnet manager for initial setup and reconfiguration when links are established or severed. An arbitration mechanism is employed to designate a master subnet manager, with other subnet managers functioning in standby mode. In standby mode, each subnet manager retains backup topology information and verifies the operational status of the subnet. In the event of the primary subnet manager's failure, a standby subnet manager assumes control, guaranteeing uninterrupted subnet management.

InfiniBand efficient and simple networking
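
The master election itself is a simple arbitration. Below is a hedged sketch of that logic, assuming the common rule that the highest-priority subnet manager wins and that ties are broken by the lowest port GUID; the struct and function names are illustrative, not OpenSM internals.

```c
/* Sketch of master subnet manager arbitration: highest priority wins,
 * ties broken by the lowest port GUID. Illustrative only. */
#include <stdint.h>
#include <stdio.h>

struct sm_candidate {
    uint64_t port_guid;   /* 64-bit GUID of the SM's port */
    uint8_t  priority;    /* 0..15, higher value preferred */
};

/* Return the index of the candidate that should become master. */
static int elect_master(const struct sm_candidate *sm, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++) {
        if (sm[i].priority > sm[best].priority ||
            (sm[i].priority == sm[best].priority &&
             sm[i].port_guid < sm[best].port_guid))
            best = i;
    }
    return best;
}

int main(void)
{
    struct sm_candidate sms[] = {
        { 0x0002c90300a1b2c3ULL, 14 },
        { 0x0002c90300a1b2c4ULL, 14 },   /* same priority, higher GUID -> standby */
        { 0x0002c90300a1b2c5ULL,  8 },
    };
    printf("master SM index: %d\n", elect_master(sms, 3));
    return 0;
}
```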

Higher Bandwidth

Since the inception of InfiniBand, its network data rate has consistently outpaced Ethernet, primarily because it is used for server interconnection in high-performance computing, which demands higher bandwidth. In early 2014, the prevalent InfiniBand speeds were 40Gb/s QDR and 56Gb/s FDR. Today, higher InfiniBand speeds such as 100Gb/s EDR and 200Gb/s HDR are widely deployed in supercomputers worldwide. The introduction of the latest OpenAI tool, ChatGPT, has prompted businesses to consider deploying cutting-edge InfiniBand networking products with a 400Gb/s NDR data rate, including InfiniBand NDR switches and optical connectivity cables, in their high-performance computing (HPC) systems.

InfiniBand speed type

The abbreviations for each InfiniBand speed type are as follows (the per-lane arithmetic behind these figures is sketched after the list):

      • SDR - Single Data Rate, 10Gb/s (8Gb/s effective with 8b/10b encoding).

      • DDR - Double Data Rate, 20Gb/s (16Gb/s effective).

      • QDR - Quad Data Rate, 40Gb/s (32Gb/s effective).

      • FDR - Fourteen Data Rate, 56Gb/s.

      • EDR - Enhanced Data Rate, 100Gb/s.

      • HDR - High Data Rate, 200Gb/s.

      • NDR - Next Data Rate, 400Gb/s.

      • XDR - eXtreme Data Rate, 800Gb/s.
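
For quick reference, each of these 4x link rates is simply four lanes running at the generation's per-lane rate. The sketch below prints that arithmetic; the lane rates are the commonly quoted nominal figures, and for SDR/DDR/QDR the effective data throughput is about 80% of them due to 8b/10b encoding.

```c
/* Per-lane rates behind the 4x InfiniBand link speeds listed above.
 * SDR/DDR/QDR carry 8b/10b encoding overhead; later generations use
 * 64b/66b or PAM4 signaling. Figures are nominal. */
#include <stdio.h>

int main(void)
{
    struct { const char *gen; double lane_gbps; } gens[] = {
        { "SDR", 2.5 }, { "DDR", 5 }, { "QDR", 10 }, { "FDR", 14 },
        { "EDR", 25 }, { "HDR", 50 }, { "NDR", 100 }, { "XDR", 200 },
    };
    for (size_t i = 0; i < sizeof gens / sizeof gens[0]; i++)
        printf("%-3s: 4 lanes x %5.1f Gb/s = %6.1f Gb/s\n",
               gens[i].gen, gens[i].lane_gbps, 4 * gens[i].lane_gbps);
    return 0;
}
```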

Efficient CPU Offload

A pivotal technology for enhanced computing performance is CPU offload, and the InfiniBand network architecture facilitates data transfer with minimal CPU involvement through the following (a minimal code sketch follows this list):

      • Hardware offload of the complete transport layer protocol stack.

      • Kernel bypass, zero copy.

      • RDMA (Remote Direct Memory Access), a process that directly writes data from one server's memory to another's memory without CPU involvement.
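
To make the offload model concrete, the sketch below uses the standard libibverbs API from rdma-core to open the first available HCA, allocate a protection domain, and register a user-space buffer for RDMA. The device choice and buffer size are arbitrary, error handling is trimmed, and the queue-pair setup and rkey exchange needed for an actual transfer are omitted.

```c
/* Minimal libibverbs sketch: register a user buffer so the HCA can DMA
 * into/out of it directly, bypassing the kernel and avoiding copies.
 * A full RDMA exchange also needs a queue pair and an out-of-band
 * exchange of the buffer address and rkey, which is omitted here. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no RDMA devices found\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);   /* first HCA */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);                /* protection domain */

    size_t len = 1 << 20;                                 /* 1 MiB buffer */
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);   /* pin + map for the HCA */

    printf("registered %zu bytes on %s, lkey=0x%x rkey=0x%x\n",
           len, ibv_get_device_name(devs[0]), mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;   /* build with: gcc rdma_reg.c -libverbs */
}
```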

GPUDirect

Utilizing GPUDirect technology is an option as well, enabling direct access to data in GPU memory and facilitating the transfer of data from GPU memory to other nodes. This capability enhances the performance of computational applications such as Artificial Intelligence (AI), Deep Learning training, Machine Learning, and more.
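
Under GPUDirect RDMA, the same memory-registration call can target GPU memory: assuming a CUDA-capable GPU, an NVIDIA HCA, and the nvidia-peermem (or legacy nv_peer_mem) kernel module loaded, a buffer allocated with cudaMalloc can be registered so the HCA moves data to and from device memory without bouncing through host RAM. A hedged sketch, reusing the protection-domain setup from the previous example:

```c
/* Sketch: register GPU memory for RDMA. Requires GPUDirect RDMA support
 * (e.g. the nvidia-peermem module), an NVIDIA HCA, and a CUDA GPU.
 * Context/PD setup is the same as in the previous example. */
#include <cuda_runtime.h>
#include <infiniband/verbs.h>

struct ibv_mr *register_gpu_buffer(struct ibv_pd *pd, size_t len)
{
    void *gpu_buf = NULL;
    if (cudaMalloc(&gpu_buf, len) != cudaSuccess)   /* allocate on the GPU */
        return NULL;

    /* Via the peer-memory interface, the verbs stack pins the GPU pages so
     * the HCA can DMA to/from them directly - no bounce buffer in host RAM. */
    return ibv_reg_mr(pd, gpu_buf, len,
                      IBV_ACCESS_LOCAL_WRITE |
                      IBV_ACCESS_REMOTE_READ |
                      IBV_ACCESS_REMOTE_WRITE);
}
```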

RDMA

Low Latency

The latency contrast between InfiniBand and Ethernet has two main components. First, at the switch level, Ethernet switches, operating as layer 2 devices in the network transport model, typically employ MAC table lookup and store-and-forward mechanisms (though some products adopt cut-through forwarding). The additional processing for complex services such as IP, MPLS, and QinQ extends the processing time of Ethernet switches, with latency often measured in microseconds (even cut-through implementations typically exceed 200ns). In contrast, InfiniBand switches streamline layer 2 processing, relying solely on the 16-bit LID for forwarding, and use cut-through technology to reduce forwarding delay to less than 100ns, significantly faster than Ethernet switches.

At the network interface card (NIC) level, as previously mentioned, RDMA technology eliminates the need to traverse the CPU for message forwarding, which minimizes the delay spent on encapsulation and decapsulation. In general, InfiniBand NICs exhibit a send/receive latency (write, send) of around 600ns, while the send/receive latency of TCP/UDP applications over Ethernet typically hovers around 10us, a difference of more than tenfold between InfiniBand and Ethernet.

Measured latency for MPI
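
As a rough sanity check on those figures, a one-way latency estimate can be composed from the NIC and per-hop switch delays quoted above. The numbers below are illustrative only and ignore serialization time, congestion, and software overheads.

```c
/* Back-of-the-envelope one-way latency across a 3-hop fabric, using the
 * rough figures quoted in the text. Illustrative only. */
#include <stdio.h>

int main(void)
{
    int hops = 3;
    double ib_ns  = 600 + hops * 100;      /* RDMA NIC ~600 ns + ~100 ns per switch */
    double eth_ns = 10000 + hops * 1000;   /* TCP/UDP ~10 us + ~1 us per switch     */
    printf("InfiniBand: ~%.1f us   Ethernet: ~%.1f us   ratio: ~%.0fx\n",
           ib_ns / 1000, eth_ns / 1000, eth_ns / ib_ns);
    return 0;
}
```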

Scalability and Flexibility

A significant advantage of the InfiniBand network lies in its capability to deploy a substantial 48,000 nodes within a single subnet, forming an extensive Layer 2 network. Furthermore, InfiniBand networks steer clear of broadcast mechanisms like ARP, thus avoiding broadcast storms and the associated waste of additional bandwidth. The connectivity of multiple InfiniBand subnets is achievable through routers and switches, showcasing the technology's versatility in supporting various network topologies.


For smaller scales, a 2-layer fat-tree topology is recommended, while a larger scale may opt for a 3-layer fat-tree network topology. Beyond a specific scale, the cost-effective Dragonfly+ topology can be employed to further enhance scalability.
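
A common sizing rule of thumb for these topologies: with radix-k switches, a non-blocking two-level fat tree supports up to k²/2 end nodes and a three-level tree up to k³/4. The sketch below applies this to a 40-port HDR switch and to the same switch run in 80-port HDR100 mode; the radix values are assumptions for illustration.

```c
/* Non-blocking fat-tree capacity with radix-k switches:
 *   2 levels -> k*k/2 hosts,  3 levels -> k*k*k/4 hosts. */
#include <stdio.h>

int main(void)
{
    long radix[] = { 40, 80 };   /* QM87xx: 40 HDR ports, or 80 HDR100 ports when split */
    for (int i = 0; i < 2; i++) {
        long k = radix[i];
        printf("k=%-3ld  2-level: %6ld hosts   3-level: %8ld hosts\n",
               k, k * k / 2, k * k * k / 4);
    }
    return 0;
}
```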


Quality of Service (QoS) Support

In managing an InfiniBand network where various applications coexist on the same subnet with distinct priority requirements, the provision of Quality of Service (QoS) becomes a pivotal concern. QoS denotes the capacity to offer distinct priority services tailored for different applications, users, or data flows. In the InfiniBand context, high-priority applications can be assigned to specific port queues, ensuring that messages within these queues receive preferential treatment.

InfiniBand achieves QoS through the implementation of Virtual Lanes (VLs). Virtual Lanes are discrete logical communication links that share a common physical link. Each physical link can support up to 15 standard virtual lanes (VL0-VL14) alongside one management lane, designated VL15. This approach enables the effective segregation of traffic based on priority, allowing high-priority applications to be transmitted preferentially within the InfiniBand network.

Virtual Lanes
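
In practice, each packet carries a Service Level (SL) in its header, and switch ports map SLs onto VLs through an SL-to-VL table. The sketch below shows such a mapping with eight data VLs in use; the table values are purely illustrative, not a recommended configuration.

```c
/* Illustrative SL-to-VL mapping: 16 service levels folded onto 8 data VLs.
 * VL15 is not part of this table - subnet management packets always use it. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* index = service level (0-15), value = data virtual lane (0-14) */
    uint8_t sl2vl[16] = { 0, 0, 1, 1, 2, 2, 3, 3,
                          4, 4, 5, 5, 6, 6, 7, 7 };
    for (int sl = 0; sl < 16; sl++)
        printf("SL %2d -> VL %d\n", sl, sl2vl[sl]);
    return 0;
}
```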

Stability and Resilience

In an ideal scenario, a network operates with stability and devoid of failures. However, the reality of long-running networks entails occasional failures. In addressing these challenges and ensuring swift recovery, InfiniBand employs a mechanism known as Self-Healing Networking, a hardware capability integrated into InfiniBand switches.

NVIDIA Mellanox InfiniBand solutions, encompassing hardware elements such as InfiniBand switches, NICs, and Mellanox cables, leverage Self-Healing Networking to achieve rapid recovery from link failures. This hardware-based capability can restore a failed link in as little as 1 millisecond, about 5,000 times faster than normal recovery times.

NVIDIA Mellanox InfiniBand solution

Optimized Load Balancing

Enhancing network utilization is a crucial requirement within a high-performance data center. In the InfiniBand network, one effective approach is through the implementation of load balancing.

Load balancing, a routing strategy, facilitates the distribution of traffic across multiple available ports. Adaptive Routing, a key feature, ensures the even distribution of traffic across switch ports. This feature is hardware-supported on the switch and is under the management of the Adaptive Routing Manager.

When Adaptive Routing is active, the Queue Manager on the switch monitors traffic on all GROUP EXIT ports, equalizes the load on each queue, and steers traffic toward underutilized ports. Adaptive Routing dynamically balances loads, preventing network congestion and optimizing network bandwidth utilization.
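
In spirit, adaptive routing picks the least-loaded egress port from the group of ports that can reach the destination. The toy model below shows only that selection step; the queue-depth counters and group membership are assumptions, and the real logic runs in switch hardware under the Adaptive Routing Manager.

```c
/* Toy model of adaptive routing's egress selection: among the candidate
 * ports of a routing group, pick the one with the shallowest output queue.
 * Real switches do this in hardware with richer congestion signals. */
#include <stdio.h>

struct port { int id; int queue_depth; };

static int pick_egress(const struct port *group, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (group[i].queue_depth < group[best].queue_depth)
            best = i;
    return group[best].id;
}

int main(void)
{
    struct port group[] = { { 17, 42 }, { 18, 3 }, { 19, 27 }, { 20, 8 } };
    printf("forward on port %d\n", pick_egress(group, 4));
    return 0;
}
```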

In-Network Computing Technology - SHARP

InfiniBand switches additionally feature the in-network computing technology known as SHARP, which stands for Scalable Hierarchical Aggregation and Reduction Protocol. SHARP is a centrally managed software package whose aggregation engines are integrated into the switch hardware.

By offloading aggregate communication tasks from CPUs and GPUs to the switch, SHARP optimizes these communications. It prevents redundant data transfers between nodes, thereby reducing the volume of data that must traverse the network. Consequently, SHARP significantly enhances the performance of accelerated computing, particularly in MPI applications like AI and machine learning.

NVIDIA SHARP converged communication architecture block diagram
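
The benefit is easy to quantify for an allreduce, the collective at the heart of data-parallel training. Under a simplified model, a host-based ring allreduce injects roughly 2·S·(N−1)/N bytes per node for an S-byte buffer across N nodes, while in-network aggregation lets each node send the buffer up the tree once and receive the result once. Real savings depend on the algorithm, message size, and topology; the sketch below only illustrates the rough comparison.

```c
/* Simplified per-node data volume for an allreduce of S bytes over N nodes:
 * ring allreduce injects ~2*S*(N-1)/N per node, while in-network aggregation
 * (SHARP-style) sends S up the tree and receives S back. Rough model only. */
#include <stdio.h>

int main(void)
{
    double S = 1e9;   /* 1 GB gradient buffer */
    for (int N = 8; N <= 512; N *= 4) {
        double ring = 2.0 * S * (N - 1) / N;   /* bytes sent per node */
        double tree = S;                       /* one pass up the aggregation tree */
        printf("N=%4d  ring: %.2f GB/node   in-network: %.2f GB/node\n",
               N, ring / 1e9, tree / 1e9);
    }
    return 0;
}
```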

Diverse Network Topologies

InfiniBand supports various network topologies like Fat Tree, Torus, Dragonfly+, Hypercube, and HyperX, catering to different needs such as network scaling, reduced total cost of ownership (TCO), minimized latency, and extended transmission distance.

Diverse Network Topologies

InfiniBand, leveraging its unparalleled technical advantages, significantly streamlines high-performance network architecture, mitigating latency arising from multi-level architectural hierarchies. This capability offers robust support for seamlessly upgrading access bandwidth for critical computing nodes. The InfiniBand network is increasingly finding application in various scenarios due to its high bandwidth, low latency, and compatibility with Ethernet.

Introduction to InfiniBand HDR Product Solutions

With rising client-side demands, 100Gb/s EDR is gradually being phased out of the market. While the NDR data rate is currently deemed higher than most deployments need, HDR, with its flexibility to run as HDR100 (100G) or HDR200 (200G), has gained widespread adoption.

InfiniBand HDR Switch

NVIDIA offers two types of InfiniBand HDR switches. The first is the HDR CS8500 modular chassis switch, a 29U switch providing up to 800 HDR 200Gb/s ports. Each 200G port supports splitting into 2X100G, enabling up to 1600 HDR100 (100Gb/s) ports. The second is the QM87xx series fixed switch, a 1U unit integrating 40 200G QSFP56 ports. These ports can be split into up to 80 HDR100 (100G) ports to connect 100G HDR network cards, and each port is also backward-compatible with EDR for connecting 100G EDR NICs. Note that a single 200G HDR port can only be rate-reduced to 100G to connect an EDR network card; it cannot be split into 2X100G to connect two EDR network cards.

There are two variants of the 200G HDR QM87xx switch: MQM8700-HS2F and MQM8790-HS2F. The sole distinction between the two models lies in the management approach: the QM8700 features a management port supporting out-of-band management, whereas the QM8790 requires the NVIDIA Unified Fabric Manager (UFM) platform for management.

Both the QM8700 and the QM8790 are offered with two airflow options. The MQM8790-HS2F features P2C (Power-to-Cable) airflow, identifiable by a blue mark on the fan module; if the color mark is not visible, airflow direction can also be checked by holding a hand in front of the switch's air inlet and outlet. The MQM8790-HS2R adopts C2P (Cable-to-Power) airflow, with a red mark on the fan module. The QM87xx series switch models are detailed as follows:

Switch Model | Ports | Interface Type | Link Speed | Rack Units | Management
MQM8790-HS2F | 40 | QSFP56 | 200Gb/s | 1RU | In-band/Out-of-band
MQM8790-HS2R | 40 | QSFP56 | 200Gb/s | 1RU | In-band

QM8700 and QM8790 switches commonly serve two connectivity applications. One involves linking with 200G HDR network cards, enabling a direct connection using 200G-to-200G AOC/DAC cables. The other common application is to connect with 100G HDR100 network cards, which requires a 200G-to-2X100G splitter cable to divide a physical 200G (4X50G) QSFP56 switch port into two virtual 100G (2X50G) ports. Following the split, the port symbol changes from x/y to x/y/z, where x/y denotes the original symbol of the port before the split and z (1 or 2) indicates the number of the split port, with each sub-physical port treated as an individual port.

Typical topology of HDR two layer fat tree
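
The renaming is mechanical; a tiny sketch of the x/y to x/y/z convention, using a hypothetical physical port 1/13 as an example:

```c
/* Illustrate the x/y -> x/y/z naming used when a 200G QSFP56 port is split
 * into two 100G HDR100 ports. The port label is a made-up example. */
#include <stdio.h>

int main(void)
{
    const char *physical = "1/13";   /* original 200G HDR port */
    for (int z = 1; z <= 2; z++)
        printf("split port: %s/%d (100G HDR100)\n", physical, z);
    return 0;
}
```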

InfiniBand HDR Network Interface Cards (NICs)

In comparison to HDR switches, there is a variety of HDR network interface cards (NICs). Regarding speed, there are two options: HDR100 and HDR.

The HDR100 NIC card supports a transmission rate of 100Gb/s, and two HDR100 ports can connect to the HDR switch using a 200G HDR to 2X100G HDR100 cable. In contrast to the 100G EDR network adapter, the 100G port of the HDR100 NIC card supports both 4X25G NRZ transmission and 2X50G PAM4 transmission.

The 200G HDR network card supports a transmission rate of 200G and can be directly connected to the switch using a 200G direct cable.
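
The lane arithmetic behind these modes is straightforward, as the sketch below shows: EDR-style 100G is 4 lanes of 25G NRZ, HDR100 is 2 lanes of 50G PAM4, and full HDR is 4 lanes of 50G PAM4 (nominal rates).

```c
/* Lane arithmetic for 100G/200G InfiniBand ports (nominal data rates). */
#include <stdio.h>

int main(void)
{
    struct { const char *mode; int lanes; int gbps_per_lane; } m[] = {
        { "EDR    (4x25G NRZ)",  4, 25 },
        { "HDR100 (2x50G PAM4)", 2, 50 },
        { "HDR    (4x50G PAM4)", 4, 50 },
    };
    for (int i = 0; i < 3; i++)
        printf("%-22s -> %d Gb/s\n", m[i].mode, m[i].lanes * m[i].gbps_per_lane);
    return 0;
}
```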

Besides the two interface data rates, network cards at each rate are available in single-port or dual-port versions and with different PCIe host interfaces, according to business requirements. Commonly used IB HDR network card models are as follows:

NIC Model | Ports | Supported InfiniBand Data Rates | Supported Ethernet Data Rates | Interface | Host Interface (PCIe)
MCX653105A-HDAT | Single-Port | SDR/DDR/QDR/FDR/EDR/HDR | 1/10/25/40/50/200Gb/s | QSFP56 | PCIe 3.0/4.0 x16
MCX653106A-ECAT | Dual-Port | SDR/DDR/QDR/FDR/EDR/HDR100 | 1/10/25/40/50/100Gb/s | QSFP56 | PCIe 3.0/4.0 x16
MCX653105A-ECAT | Single-Port | SDR/DDR/QDR/FDR/EDR/HDR100 | 1/10/25/40/50/100Gb/s | QSFP56 | PCIe 3.0/4.0 x16
MCX653106A-HDAT | Dual-Port | SDR/DDR/QDR/FDR/EDR/HDR | 1/10/25/40/50/200Gb/s | QSFP56 | PCIe 3.0/4.0 x16

The HDR InfiniBand network architecture is straightforward, yet it offers various hardware options. For the 100Gb/s rate, there are both 100G EDR and 100G HDR100 solutions; the 200Gb/s rate includes HDR and 200G NDR200 options. The switches, network cards, and accessories used differ significantly across applications. InfiniBand high-performance HDR and EDR switches, SmartNICs, and AOC/DAC cables and optical modules from vendors such as NADDOD, Mellanox, Cisco, and HPE offer cost-effective optical networking products and comprehensive solutions for data centers, high-performance computing, edge computing, artificial intelligence, and other application scenarios, significantly enhancing customers' business acceleration capabilities at low cost and with excellent performance.

What's the Difference Between InfiniBand and Ethernet, Fibre Channel, and Omni-Path?

InfiniBand vs Ethernet

      • Distinguishing Technologies: InfiniBand and Ethernet serve as crucial communication technologies for data transfer, each catering to distinct applications.

      • Historical Speeds: InfiniBand's historical data transfer speed commenced at InfiniBand SDR 10Gb/s, surpassing the initial speed of Gigabit Ethernet.

      • Current Dominance: InfiniBand has evolved to dominate with network speeds of 100G EDR or 200G HDR, and a trajectory towards faster speeds like 400G NDR and 800G XDR.

      • Strict Latency Requirements: InfiniBand adheres to stringent latency requirements, approaching near-zero latency.

      • Ideal Applications: InfiniBand excels in applications demanding rapid and precise data processing, prevalent in supercomputing for tasks such as large-volume data analysis, machine learning, deep learning training, inference, conversational AI, prediction, and forecasting.

      • Ethernet's Role: Ethernet, while comparatively slower, is characterized by high reliability, making it well-suited for LAN network applications requiring consistent and dependable data transfer.

      • Divergence in Speed and Reliability: The primary divergence between these technologies lies in their speed and reliability. In HPC networking, InfiniBand takes precedence for applications necessitating swift data transfer, whereas Ethernet's reliability makes it preferable for consistent data transfer in LAN networks.

InfiniBand vs Fibre Channel

      • Fibre Channel in Storage Area Networks (SANs): Fibre Channel is primarily employed in Storage Area Networks (SANs), specializing in high-speed data transfer between servers, storage devices, or client nodes within data center environments.

      • Secure Channel Technology: Fibre Channel employs dedicated and secure channel technology, ensuring quick and reliable data transfers.

      • Versatility in Storage Solutions: Fibre Channel serves as a reliable and expandable technology extensively used in business storage solutions.

      • Distinguishing Data Transfer Types: The primary distinction between InfiniBand and Fibre Channel lies in the types of data transfer each technology typically facilitates.

      • Optimal Choices: Ethernet is favored for client-server connections in LAN environments, Fibre Channel excels in storage applications within SANs, while InfiniBand stands out as an innovative technology for interconnecting CPU and memory components across server clusters, supporting clustering and connections to I/O controllers.

InfiniBand vs Omni-Path

    • Evolution of Data Center Networks: Despite NVIDIA's introduction of the InfiniBand 400G NDR solution, some users continue to utilize the 100G EDR solution. Both Omni-Path and InfiniBand are common choices for high-performance data center networks operating at 100Gb/s speeds.

    • Network Structure Distinctions: While both technologies offer similar performance, the network structures of Omni-Path and InfiniBand differ significantly. For example, a 400-node cluster built on InfiniBand requires only 15 NVIDIA Quantum 8000 series switches and the corresponding cables, whereas Omni-Path demands 24 switches and a larger quantity of active optical cables (the switch-count arithmetic is sketched after this list).

    • Advantages of InfiniBand EDR Solution: The InfiniBand EDR solution demonstrates notable advantages in terms of equipment cost, operational and maintenance costs, and overall power consumption compared to Omni-Path. This positions InfiniBand as a more environmentally friendly option.
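
For intuition on where a figure like 15 switches comes from, the hedged sketch below sizes a non-blocking two-level fat tree for 400 HDR100 nodes using 40-port HDR switches run in 80-port HDR100 mode: 10 leaf switches with 40 node-facing ports and 40 uplinks each, plus 5 spines. The radix and blocking assumptions are illustrative; the Omni-Path count quoted above is taken from the vendor comparison, not recomputed here.

```c
/* Size a non-blocking 2-level fat tree for 400 nodes with radix-80 switches
 * (a 40-port HDR switch split into 80 HDR100 ports). Illustrative only. */
#include <stdio.h>

int main(void)
{
    int nodes = 400, radix = 80;
    int down    = radix / 2;                       /* node-facing ports per leaf */
    int leaves  = (nodes + down - 1) / down;       /* 10 leaf switches */
    int uplinks = leaves * (radix - down);         /* 400 uplinks in total */
    int spines  = (uplinks + radix - 1) / radix;   /* 5 spine switches */
    printf("leaves=%d spines=%d total=%d switches\n", leaves, spines, leaves + spines);
    return 0;
}
```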
