InfiniBand Network and Architecture Overview
With the rapid development of computing power, demand from data centers and high-performance computing continues to surge. Against this backdrop, the InfiniBand architecture has attracted significant attention as a high-performance networking solution. This article delves into the concepts and features of InfiniBand and explores why it has grown so quickly amid today's demand for computing power. We focus on the main components of InfiniBand networks, compare them with traditional TCP/IP, and highlight their advantages in high-performance computing environments. Through an analysis of the layers of the InfiniBand architecture, including the upper layer, transport layer, network layer, link layer, and physical layer, we will gain a comprehensive understanding of its role in constructing efficient, low-latency data transmission networks.
InfiniBand Architecture Basics
InfiniBand (IB) is a communication link for data flow between processors and I/O devices, supporting up to 64,000 addressable devices. The InfiniBand Architecture (IBA) is the industry-standard specification that defines a point-to-point switched input/output framework, typically used to interconnect servers, communication infrastructure, storage devices, and embedded systems. InfiniBand features universality, low latency, high bandwidth, and low management cost, making it an ideal interconnect for carrying multiple data streams (clustering, communication, storage, management) over a single connection, with networks scaling to thousands of interconnected nodes.
InfiniBand networks utilize a point-to-point connection where each node communicates directly with other nodes through dedicated channels, reducing network congestion and improving overall performance. This architecture supports Remote Direct Memory Access (RDMA) technology, enabling data transfer directly between memories without involving the host CPU, further enhancing transfer efficiency.
The smallest complete unit in the InfiniBand Architecture is a subnet, and multiple subnets are connected by routers to form a large InfiniBand network. Each subnet comprises end-nodes, switches, links, and subnet managers. InfiniBand networks find applications in scenarios such as data centers, cloud computing, high-performance computing (HPC), machine learning, and artificial intelligence. The core objectives include maximizing network utilization, CPU utilization, and application performance.
Data Transfer Over InfiniBand
In traditional interconnect fabrics, the operating system is the sole owner of shared network resources, meaning that applications cannot access the network directly. Instead, an application must rely on the operating system to move data from its virtual buffers through the network stack and onto the wire.
InfiniBand enables applications to exchange data over the network without involving the operating system in the data path. This application-centric approach is the key differentiator between InfiniBand networks and traditional networks, and it is precisely where InfiniBand's advantages lie.
InfiniBand Architecture vs TCP/IP
The InfiniBand architecture is organized into five layers, similar to the traditional TCP/IP model. However, there are many differences between them.
In the realm of distributed storage, IB is often utilized in the storage front-end network of DPC (Distributed Parallel Computing) scenarios. On the other hand, TCP/IP is commonly employed in business networks.
This is because the existing TCP/IP software and hardware architecture struggles to meet the demands of high-concurrency, low-latency applications: traditional TCP/IP communication passes messages through the kernel, incurring high data movement and copying costs. RDMA technologies such as InfiniBand and RoCE remove this server-side processing delay from the data path. They allow the network adapter to access memory directly, without kernel intervention, enabling high-throughput, low-latency communication. This makes RDMA particularly suitable for large-scale parallel computing clusters.
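The difference between the two data paths can be sketched as a toy model. The classes below are illustrative inventions, not a real networking API: they simply count the host-side copies each path makes before data reaches the wire.

```python
# Toy model contrasting the kernel-mediated TCP path with an RDMA-style path.
# These classes are illustrative only; they are not a real networking API.

class TcpPath:
    """Traditional path: data is copied app buffer -> kernel buffer -> NIC."""
    def __init__(self):
        self.copies = 0

    def send(self, app_buffer: bytes) -> bytes:
        kernel_buffer = bytes(app_buffer)   # copy into the kernel socket buffer
        self.copies += 1
        nic_buffer = bytes(kernel_buffer)   # copy toward the NIC
        self.copies += 1
        return nic_buffer

class RdmaPath:
    """RDMA-style path: the NIC reads the registered app buffer directly."""
    def __init__(self):
        self.copies = 0                     # no host-side copies are made

    def send(self, registered_buffer: bytes) -> bytes:
        return registered_buffer            # NIC DMAs straight from app memory

msg = b"payload"
tcp, rdma = TcpPath(), RdmaPath()
assert tcp.send(msg) == rdma.send(msg) == msg
assert (tcp.copies, rdma.copies) == (2, 0)
```

The point of the sketch is only the copy count: the kernel-mediated path touches the payload repeatedly on the host, while the RDMA path leaves the transfer entirely to the adapter.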
InfiniBand Architecture Layers
The InfiniBand architecture consists of the following layers:
Upper Layer: The Upper Layer includes protocols and services such as SCSI (Small Computer System Interface), IPoIB (IP over InfiniBand), and others, supporting different applications and services.
Transport Layer: The Transport Layer provides end-to-end communication services, including reliable data transmission, flow control, and error handling. It manages the sequence, integrity, and reliability of data transmission.
Network Layer: The Network Layer handles end-to-end communication and routing, ensuring the correct transmission of data packets from source nodes to destination nodes. It defines the addresses and routing rules for nodes in the InfiniBand subnet.
Link Layer: The Link Layer is responsible for error detection and correction, manages the flow of data packets, and encapsulates and decapsulates data frames before they are handed to the physical layer.
Physical Layer: The Physical Layer defines the hardware specifications of InfiniBand connections, including cables, optical fibers, connectors, and interfaces.
Some of the supported upper layer protocols are:
SCSI Protocol (Small Computer System Interface): A standard interface protocol for data transfer between computers and external devices, supporting disk drives, printers, and other peripherals.
IPoIB Protocol (IP over InfiniBand): A protocol that allows IP data transfer over the InfiniBand network, enabling InfiniBand to support the TCP/IP protocol stack.
SDP Protocol (Sockets Direct Protocol): A protocol used for socket communication over the InfiniBand network, providing high-performance, low-latency data transfer.
MPI (Message Passing Interface): A standard protocol for interprocess communication in parallel computing, commonly used in HPC applications.
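IPoIB's practical appeal is transparency: because it presents the InfiniBand port as an ordinary IP interface, unmodified socket code runs over it. The sketch below is standard TCP socket code; it uses loopback purely so it is self-contained, but on a real cluster the same code would run over IPoIB simply by binding to the IPoIB interface's IP address.

```python
import socket

# Standard TCP socket code. Over IPoIB, the only change would be the address:
# the IPoIB interface's IP instead of loopback.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))      # on a real cluster: the IPoIB interface's IP
server.listen(1)
port = server.getsockname()[1]

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
conn, _ = server.accept()

client.sendall(b"hello over IPoIB")
data = conn.recv(1024)
assert data == b"hello over IPoIB"

for s in (client, conn, server):
    s.close()
```

Note that IPoIB trades away the kernel-bypass benefits of native InfiniBand for this compatibility; latency-sensitive applications typically use the native verbs interface or SDP instead.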
At the transport layer, InfiniBand's transport services differ from those of TCP/IP.
InfiniBand employs Remote Direct Memory Access (RDMA) technology, enabling data to move directly between the memories of communicating nodes via their network adapters, bypassing the host CPU and operating-system kernel. This approach achieves low latency and high throughput.
TCP/IP utilizes an end-to-end transport mechanism, establishing connections between sending and receiving hosts for data transmission. This mechanism may introduce additional transmission latency and system overhead.
Transport Layer Connections in Hardware
In the transport layer, a 'virtual channel' (realized in InfiniBand as a pair of send and receive work queues, known as a queue pair) is established to connect two applications, allowing them to communicate even though they reside in entirely separate address spaces. During message transmission, this design leverages direct hardware data transfer, enhancing communication efficiency and performance.
The message is transferred directly through hardware without the need for intervention by the host processor. Consequently, upon reaching the receiving end, the message is delivered directly to the receiving application's buffer without requiring additional processing steps.
The InfiniBand network is divided into multiple subnets, each typically connected by one or more InfiniBand switches. A subnet is an independent communication domain with its own topology and configuration. Routers and switches are employed to route and switch packets between different subnets. Routers are responsible for connecting distinct subnets, while switches handle packet switching within the same subnet.
All devices in a subnet have a Local Identifier (LID), a 16-bit address assigned by the Subnet Manager. All packets sent within a subnet use the LID as the destination address for forwarding and switching packets at the Link Level.
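As a small illustration of the 16-bit LID address space, the sketch below packs source and destination LIDs into a simplified header. A real Local Route Header carries additional fields (virtual lane, service level, packet length, and so on) that are omitted here.

```python
import struct

# Simplified sketch: pack 16-bit source and destination LIDs into a header.
# Real InfiniBand Local Route Headers carry more fields than shown here.

def pack_lids(slid: int, dlid: int) -> bytes:
    for lid in (slid, dlid):
        if not 0 <= lid <= 0xFFFF:
            raise ValueError("LID must fit in 16 bits")
    return struct.pack(">HH", slid, dlid)   # big-endian, 2 bytes each

def unpack_lids(header: bytes):
    return struct.unpack(">HH", header)

hdr = pack_lids(0x0001, 0x00C4)
assert unpack_lids(hdr) == (0x0001, 0x00C4)
assert len(hdr) == 4
```

The 16-bit width is what bounds a subnet's directly addressable devices; networks larger than one subnet rely on routers and a separate global addressing scheme.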
In an InfiniBand network, each switch maintains a forwarding table that records the mapping between its connected ports and the associated destination Local ID (LID). Such forwarding tables are dynamically calculated and configured by the Subnet Manager.
The Subnet Manager is responsible for monitoring changes in the network topology, generating appropriate forwarding tables for each switch, ensuring that data packets can be transmitted correctly and swiftly to their destination nodes.
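The Subnet Manager's table computation can be sketched as a shortest-path search over the discovered topology: for each switch, find the output port that leads toward each host LID. The topology, LIDs, and function names below are invented for the sketch; real subnet managers use more sophisticated routing algorithms.

```python
from collections import deque

# Toy subnet-manager computation: per-switch forwarding tables mapping
# destination LID -> output port, via BFS. Topology and LIDs are invented.

# links[switch] = {port: neighbor}; names starting with "H" are hosts
links = {
    "SW1": {1: "H1", 2: "SW2"},
    "SW2": {1: "SW1", 2: "SW3", 3: "H2"},
    "SW3": {1: "SW2", 2: "H3"},
}
lids = {"H1": 10, "H2": 11, "H3": 12}   # LIDs assigned to end nodes

def build_forwarding_table(switch):
    table = {}
    visited = {switch}
    queue = deque((nbr, port) for port, nbr in links[switch].items())
    while queue:                          # BFS, remembering the first-hop port
        node, first_port = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        if node in lids:                  # reached a host: record its LID
            table[lids[node]] = first_port
        else:                             # another switch: keep exploring
            for nbr in links[node].values():
                queue.append((nbr, first_port))
    return table

tables = {sw: build_forwarding_table(sw) for sw in links}
assert tables["SW1"] == {10: 1, 11: 2, 12: 2}
assert tables["SW3"][10] == 1
```

When the Subnet Manager detects a topology change (a link failure, a new node), it reruns this kind of computation and pushes updated tables to the affected switches.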
Switching InfiniBand Packets
In an InfiniBand network, data packets are transmitted and routed through network switches. Employing switches for packet switching allows the construction of flexible, high-performance network topologies, ensuring that data can be rapidly and efficiently transmitted from source nodes to destination nodes.
The link layer of InfiniBand incorporates a credit-based flow-control protocol, which adjusts the transmission rate between sender and receiver so that a fast sender cannot overwhelm the processing capability of a slower receiver.
This is what makes InfiniBand a lossless network: the sending node dynamically monitors the state of the receiving buffer and transmits data only when there is space available in the receiving node's buffer. Even in the event of network congestion, data packets are not dropped, which is crucial for applications requiring high reliability and low latency.
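A minimal simulation makes the lossless property concrete: the sender transmits only while it holds credits, and the receiver returns a credit each time it drains a packet from its buffer. The class names and buffer sizes are illustrative, not drawn from the InfiniBand specification.

```python
from collections import deque

# Toy simulation of credit-based, lossless flow control. Illustrative only.

class Receiver:
    def __init__(self, buffer_slots):
        self.buffer = deque()
        self.credits_to_return = 0
        self.initial_credits = buffer_slots

    def accept(self, packet):
        self.buffer.append(packet)       # guaranteed space: sender held a credit

    def drain_one(self):
        packet = self.buffer.popleft()   # application consumes the packet
        self.credits_to_return += 1      # freed slot -> credit flows back
        return packet

receiver = Receiver(buffer_slots=2)
credits = receiver.initial_credits
delivered = []

for seq in range(6):                     # sender wants to push 6 packets
    while credits == 0:                  # no credit: wait instead of dropping
        delivered.append(receiver.drain_one())
        credits += receiver.credits_to_return
        receiver.credits_to_return = 0
    receiver.accept(seq)
    credits -= 1

while receiver.buffer:                   # receiver finishes draining
    delivered.append(receiver.drain_one())

# every packet arrives in order and none is lost, despite the tiny buffer
assert delivered == list(range(6))
```

Contrast this with Ethernet's default behavior, where a full receive buffer simply drops packets and leaves recovery to higher-layer retransmission.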
The Physical Layer defines electrical and mechanical characteristics, including cables and connectors for both optical-fiber and copper media, backplane connectors, hot-swap capability, and so on, ensuring interoperability between devices from different manufacturers. This standardization contributes to the construction of high-performance computing and data center networks, providing reliable, high-bandwidth data transmission. The following picture shows examples of NVIDIA InfiniBand DAC and NVIDIA InfiniBand AOC cables.
In summary, the InfiniBand architecture has emerged as a preferred choice for high-performance computing and data center networks due to its outstanding performance and low-latency characteristics. Its unique design positions it as an ideal solution for handling large-scale data transfers and complex computational tasks. With the ever-increasing demands for computing power and the expansion of data center scale, InfiniBand, as a high-performance interconnect technology, will continue to play a crucial role in scientific, engineering, and business domains.