
InfiniBand Networking: Exploring Features, Components, and Benefits

Posted on Dec 18, 2023

InfiniBand is an open standard that facilitates high-performance connectivity among CPU/GPU servers, storage servers, and other devices. This article provides insights into the fundamental aspects of InfiniBand networking, including its structural components, core features, and advantages over traditional Ethernet solutions.

What is InfiniBand?

InfiniBand is an open industry standard that defines a high-speed network for interconnecting servers, storage devices, and other equipment. It uses point-to-point, bidirectional links so that processors on different servers can communicate directly. InfiniBand is supported on a range of operating systems, including Linux, Windows, and VMware ESXi.

Structural Components of an InfiniBand Network

An InfiniBand network consists of the following elements:

Host Channel Adapter (HCA)

The HCA is the network adapter (network card) that connects an end node, such as a server or storage system, to the InfiniBand network. It implements transport-layer functions and supports the verbs interface, the programming interface applications use to drive InfiniBand devices.

InfiniBand to Ethernet Gateway/Bridge

This device converts between InfiniBand and Ethernet messages, allowing an InfiniBand network to communicate with an Ethernet network when needed.

InfiniBand Switch

An InfiniBand switch forwards packets between nodes within an InfiniBand subnet, using forwarding tables programmed by the Subnet Manager. (Traffic between different subnets goes through an InfiniBand router.)
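
Within a subnet, forwarding comes down to a table lookup: each packet's Local Route Header carries a destination LID (DLID), and the switch sends the packet out of whichever port its forwarding table lists for that DLID. The Python sketch below is a simplified model, not switch firmware. The `Switch` and `Packet` classes are hypothetical, the table is a plain dict, and real switches perform this lookup in hardware using a linear forwarding table (LFT) that the Subnet Manager programs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Packet:
    """Hypothetical packet model: only the field a switch needs to forward."""
    dlid: int       # destination Local Identifier (from the Local Route Header)
    payload: bytes


@dataclass
class Switch:
    """Hypothetical switch model with a DLID -> egress-port forwarding table."""
    name: str
    lft: dict[int, int] = field(default_factory=dict)

    def program_route(self, dlid: int, port: int) -> None:
        # In a real fabric, the Subnet Manager installs these entries.
        self.lft[dlid] = port

    def forward(self, pkt: Packet) -> int | None:
        # Look up the egress port; None means no route, so the packet is dropped.
        return self.lft.get(pkt.dlid)


leaf = Switch("leaf1")
leaf.program_route(dlid=0x0005, port=1)
leaf.program_route(dlid=0x0007, port=12)
print(leaf.forward(Packet(dlid=0x0007, payload=b"hello")))  # 12
print(leaf.forward(Packet(dlid=0x0042, payload=b"hello")))  # None
```

Keeping the switch's job to a single lookup is part of what lets InfiniBand switches forward at very low latency. The routing decisions themselves are computed centrally by the Subnet Manager.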

Subnet Manager (SM)

The Subnet Manager configures and manages the InfiniBand subnet. It can run on a host or on a switch, or be deployed together with NVIDIA Unified Fabric Manager (UFM) for comprehensive fabric management.
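
Each subnet runs one master SM, and any other SMs stay in standby. A commonly documented election rule (OpenSM, for example, describes the master as chosen by priority and GUID) is that the SM with the highest priority, from 0 to 15, becomes master, and a tie goes to the SM with the lowest GUID. The sketch below models only that ranking rule. The real protocol discovers peer SMs and exchanges state through management packets, which is not modelled here. The SM names and GUIDs are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SubnetManager:
    name: str       # hypothetical label for where the SM runs
    guid: int       # 64-bit GUID of the port the SM runs on
    priority: int   # 0 (lowest) .. 15 (highest)


def elect_master(sms: list[SubnetManager]) -> tuple[SubnetManager, list[SubnetManager]]:
    """Return (master, standbys): highest priority wins, lowest GUID breaks ties."""
    if not sms:
        raise ValueError("a subnet needs at least one Subnet Manager")
    for sm in sms:
        if not 0 <= sm.priority <= 15:
            raise ValueError(f"{sm.name}: priority must be in 0..15")
    # Sort by descending priority, then ascending GUID.
    ranked = sorted(sms, key=lambda sm: (-sm.priority, sm.guid))
    return ranked[0], ranked[1:]


sms = [
    SubnetManager("sm-on-switch", guid=0x0002C90300A1B2C3, priority=14),
    SubnetManager("sm-on-host-a", guid=0x0002C90300000010, priority=15),
    SubnetManager("sm-on-host-b", guid=0x0002C90300000001, priority=15),
]
master, standby = elect_master(sms)
print(master.name)                  # sm-on-host-b
print([sm.name for sm in standby])  # ['sm-on-host-a', 'sm-on-switch']
```

Because the ranking is deterministic, every SM that sees the same set of peers reaches the same conclusion about which one should be master. This is what allows a standby SM to take over cleanly if the master disappears.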

InfiniBand Router

An InfiniBand router forwards messages between different InfiniBand subnets.
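
Between subnets, forwarding is keyed on global addresses rather than LIDs. A GID is a 128-bit address, written like an IPv6 address. Its upper 64 bits are the subnet prefix, and its lower 64 bits are the port's interface ID. A router reads the destination GID from the packet's Global Route Header and picks an egress port based on the subnet prefix. The sketch below models only that lookup, using a hypothetical `Router` class and made-up prefixes. A real router also rebuilds the local header with LIDs that are valid in the next subnet.

```python
from __future__ import annotations

import ipaddress

MASK64 = (1 << 64) - 1


def split_gid(gid: str) -> tuple[int, int]:
    """Split a GID in IPv6 notation into (subnet_prefix, interface_id)."""
    value = int(ipaddress.IPv6Address(gid))
    return value >> 64, value & MASK64


class Router:
    """Hypothetical router model: subnet prefix -> egress port."""

    def __init__(self) -> None:
        self.routes: dict[int, int] = {}

    def add_route(self, subnet_prefix: int, port: int) -> None:
        self.routes[subnet_prefix] = port

    def forward(self, dgid: str) -> int | None:
        prefix, _interface_id = split_gid(dgid)
        return self.routes.get(prefix)  # None: no route to that subnet


r = Router()
r.add_route(0xFEC0000000000001, port=1)  # made-up prefix for subnet 1
r.add_route(0xFEC0000000000002, port=2)  # made-up prefix for subnet 2
print(r.forward("fec0:0:0:2:2c9:300:a1:b2c3"))  # 2
```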

Core Features of InfiniBand

  • Subnet Manager (SM): The Subnet Manager provides centralized routing management, which gives every node in the network plug-and-play connectivity. Each subnet has one master SM, and any additional SMs run in standby mode.

  • GPUDirect: GPUDirect enables direct data transfers between GPUs, and between GPU memory and the network adapter, without staging copies in host memory. This reduces latency and improves performance, particularly in GPU-based computing. NVIDIA GPUs also support compute task offloading.

  • Low Latency: InfiniBand achieves extremely low latency through hardware offloading and acceleration mechanisms. In cut-through forwarding mode, InfiniBand switches forward packets with latency as low as 130 ns. RDMA further reduces end-to-end transport latency.

  • Network Scalability: A single InfiniBand subnet can connect more than 48,000 nodes, and InfiniBand routers interconnect multiple subnets so that the network can scale beyond that.

  • Fault-Tolerant, Stable Network: InfiniBand networks restore traffic quickly after a failure, thanks to the subnet manager's routing algorithm and efficient flow reordering.

  • Self-Healing Network: NVIDIA InfiniBand switches include a hardware-based self-healing mechanism that can recover from failures in as little as one millisecond.

  • Adaptive Routing: Adaptive routing balances traffic across switch ports. NVIDIA switches implement this feature in hardware, and it is managed through the Adaptive Routing Manager.

  • SHARP (Scalable Hierarchical Aggregation and Reduction Protocol): SHARP runs on NVIDIA switch hardware under a central manager. It offloads collective operations, such as MPI reductions, to the switches, which reduces the data exchanged between nodes in MPI-based workloads such as HPC, AI, and machine learning.
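
A back-of-the-envelope model shows why moving reductions into the switches cuts traffic. In a bandwidth-optimal ring allreduce (a reduce-scatter followed by an allgather), each of N ranks sends 2(N − 1)/N × S bytes for an S-byte vector, which approaches 2S as N grows. If the switches perform the reduction, as SHARP does, each rank sends its vector into the aggregation tree only once, or about S bytes. The sketch counts only bytes sent per rank and ignores latency, pipelining, and SHARP's actual tree configuration, so treat it as an illustration rather than a performance prediction.

```python
def ring_allreduce_bytes(n_ranks: int, message_bytes: int) -> float:
    """Bytes each rank sends in a ring allreduce: 2 * (N - 1) / N * S."""
    if n_ranks < 2:
        return 0.0  # a single rank has nothing to exchange
    return 2 * (n_ranks - 1) / n_ranks * message_bytes


def in_network_allreduce_bytes(n_ranks: int, message_bytes: int) -> float:
    """Bytes each rank sends when switches reduce the data: about S."""
    if n_ranks < 2:
        return 0.0
    return float(message_bytes)


MiB = 1 << 20
for n in (8, 64, 512):
    ring = ring_allreduce_bytes(n, 256 * MiB) / MiB
    in_net = in_network_allreduce_bytes(n, 256 * MiB) / MiB
    print(f"N={n:>3}: ring {ring:6.1f} MiB/rank, in-network {in_net:6.1f} MiB/rank")
```

For example, with 64 ranks and a 256 MiB vector, the model gives 504 MiB sent per rank for the ring, compared with 256 MiB when the reduction is done in the network.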

Advantages of InfiniBand vs. Ethernet

InfiniBand offers several advantages over traditional Ethernet networking solutions, making it highly suitable for applications requiring fast communication and large-scale data transfer. The benefits of InfiniBand include:

  • High Bandwidth and Low Latency: InfiniBand provides higher bandwidth and lower latency than traditional Ethernet, meeting the performance demands of large-scale data transfer and real-time communication applications.

  • RDMA Support: InfiniBand supports Remote Direct Memory Access (RDMA), which moves data directly between the memory of different nodes and bypasses the operating system kernel on the data path. This reduces CPU overhead and improves transfer efficiency.

  • Scalability: An InfiniBand fabric scales easily, connecting large numbers of nodes and supporting high-density server layouts. Adding InfiniBand switches and cables expands both network scale and bandwidth capacity.

  • High Reliability: An InfiniBand fabric incorporates redundant designs and fault-isolation mechanisms that improve network availability and fault tolerance. If a node or connection fails, alternate paths keep the network connected.
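
To put "connecting a large number of nodes" into numbers, consider a non-blocking fat tree built from identical switches with k ports each and L switch tiers. It supports 2·(k/2)^L hosts, which is k²/2 for a two-tier leaf/spine design and k³/4 for three tiers, and it needs (2L − 1)·(k/2)^(L−1) switches. The sketch below simply evaluates these textbook formulas. The 64-port radix in the example is an assumption, and real deployments may use oversubscription or other topologies.

```python
def _check(radix: int, tiers: int) -> None:
    """Validate inputs: a fat tree needs an even switch radix and >= 1 tier."""
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number >= 2")
    if tiers < 1:
        raise ValueError("tiers must be >= 1")


def fat_tree_hosts(radix: int, tiers: int) -> int:
    """Max hosts in a non-blocking fat tree of radix-k switches: 2 * (k/2)**L."""
    _check(radix, tiers)
    return 2 * (radix // 2) ** tiers


def fat_tree_switches(radix: int, tiers: int) -> int:
    """Switches needed for the same tree: (2L - 1) * (k/2)**(L - 1)."""
    _check(radix, tiers)
    return (2 * tiers - 1) * (radix // 2) ** (tiers - 1)


RADIX = 64  # assumption: 64-port switches
for tiers in (1, 2, 3):
    print(f"{tiers} tier(s): {fat_tree_hosts(RADIX, tiers):>6} hosts, "
          f"{fat_tree_switches(RADIX, tiers):>5} switches")
```

With 64-port switches, two tiers reach 2,048 hosts using 96 switches, and three tiers reach 65,536 hosts using 5,120 switches. This growth is why adding switch tiers is the usual way to scale an InfiniBand fabric.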

FS InfiniBand Solution

No.  Type                     Products
1    Optical Modules/DAC/AOC  800G NDR InfiniBand
2    Optical Modules/DAC/AOC  400G NDR InfiniBand
3    Optical Modules/DAC/AOC  200G HDR InfiniBand
4    Optical Modules/DAC/AOC  100G EDR InfiniBand
5    Optical Modules/DAC/AOC  56/40G FDR InfiniBand
6    NICs                     NVIDIA® InfiniBand Adapters
7    Switches                 NVIDIA® InfiniBand Switches

FS solutions harness InfiniBand's cutting-edge networking technology to give users high-performance computing capabilities. With solutions tailored to different applications and user requirements, FS optimizes performance and delivers high bandwidth, low latency, and seamless data transfer.

By partnering with FS and deploying a stable InfiniBand network, you can unlock new opportunities, accelerate business growth, and enhance the overall user experience.
