
How Much Do You Know About InfiniBand In-Network Computing?

Posted on Dec 30, 2023

InfiniBand plays a crucial role in high-performance computing (HPC) and artificial intelligence (AI) applications, providing the high-speed, low-latency network communication these workloads need for large-scale data transfer and complex computational tasks. Its significance extends to In-Network Computing, where its applications are steadily expanding. By executing computational tasks within the network itself, InfiniBand further reduces latency and enhances overall system efficiency, propelling HPC and AI toward higher performance and greater intelligence.

For more details about InfiniBand, you can check this post: InfiniBand, What Exactly Is It?

InfiniBand In-Network Computing

InfiniBand In-Network Computing: What Is It?

InfiniBand In-Network Computing (INC) is an extension of InfiniBand technology that enhances system performance by introducing computational capabilities into the network. It effectively addresses collective-communication and point-to-point communication bottlenecks in AI and HPC applications, offering a fresh approach to data center scalability.

The philosophy of In-Network Computing is to build computational capability into the switches and adapters of the InfiniBand network. Simple computing tasks can then be executed while data is in transit, eliminating the need to move the data to end nodes such as servers for processing.

InfiniBand In-Network Computing in Data Center

In recent years, driven by cloud computing, big data, high-performance computing, and artificial intelligence, modern data centers have evolved toward a distributed parallel-processing architecture. Resources such as CPUs, memory, and storage are dispersed throughout the data center and interconnected via high-speed networking technologies like InfiniBand, Ethernet, Fibre Channel, and Omni-Path. Through collaborative design and division of labor, these resources complete data processing tasks together, forming a balanced system architecture centered on business data.

InfiniBand In-Network Computing executes computational tasks within the network itself, shifting data-processing work from the CPU to the network to reduce latency and enhance system performance. Through key technologies such as network protocol offloading, RDMA, and GPUDirect RDMA, InfiniBand delivers in-network computation, lower communication latency, and more efficient data transfer, providing effective support for high-performance computing and artificial intelligence applications.

Key Technologies of InfiniBand In-Network Computing

Network Protocol Offloading

Network protocol offloading relieves the CPU of the burden of processing network protocols by moving those tasks to dedicated hardware.

InfiniBand network adapters and InfiniBand switches handle the processing of the entire network communication protocol stack, including the physical layer, link layer, network layer, and transport layer. This offloading eliminates the need for additional software and CPU processing resources during data transmission, significantly improving communication performance.
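As a concrete illustration, the minimal sketch below uses the libibverbs user-space API from rdma-core to enumerate the InfiniBand adapters on a host and report a few of the limits the hardware handles on its own. It assumes a machine with an InfiniBand HCA and the rdma-core library installed; it is a sketch, not a complete diagnostic tool.

```c
/* Minimal sketch: list RDMA devices and query hardware capabilities.
 * Compile with: gcc query_hca.c -libverbs
 * Assumes an InfiniBand HCA and the rdma-core user library. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    for (int i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(list[i]);
        struct ibv_device_attr attr;
        if (ctx && ibv_query_device(ctx, &attr) == 0) {
            /* These limits are enforced and serviced by the adapter
             * hardware, not by host software. */
            printf("%s: max QPs %d, max MR size %llu bytes\n",
                   ibv_get_device_name(list[i]),
                   attr.max_qp,
                   (unsigned long long)attr.max_mr_size);
        }
        if (ctx)
            ibv_close_device(ctx);
    }
    ibv_free_device_list(list);
    return 0;
}
```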

RDMA

Remote Direct Memory Access (RDMA) was developed to address server-side data-processing latency in network transmission. RDMA enables direct data transfer from the memory of one computer to the memory of another without involving the CPU, reducing data-processing latency and improving network transmission efficiency.

RDMA allows data to move directly from an application's memory buffers, through the network adapter, into the remote system's memory. This eliminates the repeated data copies and context switches of the traditional transmission path, resulting in a significant reduction in CPU load.
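The key step that makes this zero-copy path possible is memory registration: the application pins a buffer with the adapter and receives keys that let the NIC, and a remote peer, access it directly. Below is a minimal sketch using libibverbs, assuming an RDMA-capable NIC and rdma-core; the buffer size and access flags are illustrative.

```c
/* Minimal sketch: register application memory for RDMA access.
 * The rkey printed here is what a remote peer would use to read or
 * write this buffer directly, with no CPU copy on this host. */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

#define BUF_SIZE 4096  /* illustrative buffer size */

int main(void)
{
    int num;
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(list[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;  /* protection domain */
    void *buf = malloc(BUF_SIZE);
    if (!pd || !buf) {
        fprintf(stderr, "setup failed\n");
        return 1;
    }

    /* Pin the buffer and grant the NIC (and remote peers) direct access. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, BUF_SIZE,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        perror("ibv_reg_mr");
        return 1;
    }
    /* In a real application, the address and rkey are exchanged with the
     * peer out of band; the peer then posts RDMA read/write work requests
     * against them. */
    printf("buffer %p registered: lkey=0x%x rkey=0x%x\n", buf, mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(list);
    return 0;
}
```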

GPUDirect RDMA

GPUDirect RDMA is a technology that leverages RDMA capability to facilitate direct communication between GPU nodes, enhancing communication efficiency in GPU clusters.

In scenarios where two GPU processes on different nodes within a cluster need to communicate, GPUDirect RDMA enables the RDMA network adapter to transfer data directly between the GPU memories of the two nodes. This removes the CPU from the copy path, reduces traffic across the PCIe bus, and significantly enhances communication performance.
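From the application's point of view, the change is small: the buffer handed to the verbs library lives in GPU memory instead of host memory. The hedged sketch below assumes a CUDA-capable GPU, an RDMA NIC, and a driver stack with peer-memory support (for example, the nvidia-peermem kernel module); the buffer size is illustrative.

```c
/* Minimal sketch: register GPU memory for RDMA (GPUDirect RDMA).
 * Compile (assumed): gcc gdr_sketch.c -libverbs -lcudart \
 *   -I/usr/local/cuda/include -L/usr/local/cuda/lib64 */
#include <stdio.h>
#include <cuda_runtime_api.h>
#include <infiniband/verbs.h>

#define BUF_SIZE (1 << 20)  /* illustrative 1 MiB buffer */

int main(void)
{
    int num;
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list || num == 0) { fprintf(stderr, "no RDMA devices\n"); return 1; }
    struct ibv_context *ctx = ibv_open_device(list[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
    if (!pd) { fprintf(stderr, "setup failed\n"); return 1; }

    /* Allocate the communication buffer directly on the GPU. */
    void *gpu_buf = NULL;
    if (cudaMalloc(&gpu_buf, BUF_SIZE) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    /* With peer-memory support loaded, the verbs library pins GPU memory
     * just like host memory; RDMA then bypasses both CPUs and host RAM. */
    struct ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, BUF_SIZE,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        perror("ibv_reg_mr on GPU memory (is peer-memory support loaded?)");
        return 1;
    }
    printf("GPU buffer %p registered, rkey=0x%x\n", gpu_buf, mr->rkey);

    ibv_dereg_mr(mr);
    cudaFree(gpu_buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(list);
    return 0;
}
```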

SHARP

Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) is a collective communication network offloading technology designed to optimize efficiency in high-performance computing and artificial intelligence applications that involve collective communications.

SHARP integrates a compute engine into the InfiniBand switch chip, supporting various fixed-point and floating-point calculations. In a cluster with multiple switches, SHARP builds a logical tree over the physical topology, so that the switches process collective communication operations in a parallel, distributed manner. This SHARP-tree processing significantly reduces collective-communication latency, minimizes network congestion, and improves the scalability of the cluster system. The protocol supports operations such as Barrier, Reduce, and All-Reduce, enhancing the efficiency of collective communications in large-scale computing environments.
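Notably, applications do not call SHARP directly; they keep using standard collectives, and the fabric offloads them when SHARP is enabled. The sketch below shows the kind of MPI All-Reduce that SHARP accelerates. How SHARP is switched on depends on the software stack; with NVIDIA HPC-X, for instance, environment variables such as HCOLL_ENABLE_SHARP are commonly used, which is an assumption outside this code.

```c
/* Minimal sketch: the All-Reduce collective that SHARP can offload.
 * The application code is unchanged whether the reduction runs on the
 * hosts or in the switch ASICs. Build with: mpicc allreduce.c */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank contributes one value; every rank receives the sum.
     * Without SHARP the reduction is computed on the hosts; with SHARP
     * it is computed by the switches as the data crosses the fabric. */
    double local = (double)rank, global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("All-Reduce sum of ranks = %g\n", global);

    MPI_Finalize();
    return 0;
}
```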

InfiniBand In-Network Computing Applications: HPC & AI

InfiniBand In-Network Computing finds prominent applications in HPC and AI due to its ability to enhance overall system performance and efficiency.

InfiniBand In-Network Computing in HPC

In the field of HPC, where computing-intensive tasks are predominant, InfiniBand is instrumental in mitigating CPU/GPU resource contention. The communication-intensive nature of HPC tasks, involving both point-to-point and collective communications, necessitates effective communication protocols. In this context, offloading techniques, RDMA, GPUDirect, and SHARP technologies are widely employed to optimize computing performance.

InfiniBand In-Network Computing in AI

Artificial intelligence, as a forefront technology, relies heavily on InfiniBand In-Network Computing to expedite training and obtain highly accurate models. In the current landscape, GPUs or dedicated AI chips serve as the computational core of AI training platforms, which leverage InfiniBand to accelerate the compute-intensive training process. Offloading the application communication protocol is crucial for reducing latency during AI training, and GPUDirect RDMA technology enhances communication bandwidth between GPU nodes, effectively reducing communication delays.

Conclusion

InfiniBand In-Network Computing, as an innovative network computing technology, provides efficient and reliable computational support for the HPC and AI fields. As one of the significant innovations in information technology, InfiniBand In-Network Computing will lead the continuous advancement and evolution of network computing. FS can provide AI solution-related InfiniBand products, such as IB switches, IB network cards, and IB modules and cables, which are available for purchase on FS.com.
