
Building Effective HPC Networks: A Detailed Comparison of InfiniBand Solution and RoCEv2 Solution

Posted on Jul 1, 2024

FS InfiniBand Solution

As the demand for more efficient computing grows, many enterprises are looking for a suitable network solution on which to build a powerful, future-proof High-Performance Computing (HPC) network. Two solutions dominate the HPC network landscape: InfiniBand and RoCEv2. Each has distinct technical advantages, network architectures, and suitable products, making them appropriate for different HPC network scenarios.

InfiniBand Network Solution Overview

The InfiniBand network is recognized as a high-performance, low-latency solution well suited to HPC, large enterprise data centers, and extensive cloud infrastructures. It uses a layered architecture that separates the physical and link layers from the network layer. The physical layer employs high-bandwidth serial links for direct point-to-point connections between devices, while the link layer manages packet transmission and reception. The network layer oversees end-to-end communication and routing, ensuring accurate packet delivery from source to destination.

The InfiniBand network solution enables point-to-point connections and supports essential features such as virtualization, Quality of Service (QoS), and Remote Direct Memory Access (RDMA), making it ideal for HPC workloads that must process large volumes of data quickly.

To learn more about the benefits of InfiniBand networks, read What Makes InfiniBand Stand Out?

InfiniBand Network Architecture

The following is a diagram of the InfiniBand network architecture provided by FS, encompassing the InfiniBand compute network, in-band and out-of-band management network, and storage network. The primary distinction between InfiniBand networks and RoCEv2 networks is that the InfiniBand compute network for HPC workloads uses dedicated InfiniBand infrastructure.

For the InfiniBand compute network, FS employs NVIDIA® InfiniBand devices, including the MQM9790-NS2F InfiniBand data center switch (64 x NDR 400G ports on 32 OSFP connectors, each connector carrying 800G), 800G and 400G InfiniBand modules, and the MCX75510AAS-NEAT ConnectX®-7 InfiniBand adapter card. In the management network, FS switches run PicOS® software and the AmpCon™ management platform, enabling customers to efficiently provision, monitor, manage, troubleshoot, and maintain HPC infrastructure. For the storage network, FS PicOS® switches support BGP with robust routing control, ensuring optimal forwarding paths and low-latency performance.

InfiniBand Network

InfiniBand Switches

NVIDIA currently leads the InfiniBand (IB) switch market, holding the largest market share. NVIDIA Quantum InfiniBand switches combine self-healing network capabilities, enhanced quality of service (QoS), congestion control, and adaptive routing to deliver the highest overall application throughput. In 2021, NVIDIA introduced the Quantum-2 InfiniBand platform, which equips the world’s leading supercomputing data centers with software-defined networking, In-Network Computing, performance isolation, advanced acceleration engines, RDMA, and port speeds of up to 400Gb/s.

As a trusted Elite Partner in the NVIDIA Partner Network, FS offers a range of high-performance NVIDIA® Quantum and Quantum-2 InfiniBand switches. The optimized 400G InfiniBand switches boast a bidirectional throughput of 51.2 terabits per second (Tb/s) and a capacity exceeding 66.5 billion packets per second (BPPS). These InfiniBand switches provide a high-speed, ultra-low latency, and scalable solution for HPC networks.

 

| Product | MQM8790-HS2F | MQM8700-HS2F | MQM9700-NS2F | MQM9790-NS2F |
| --- | --- | --- | --- | --- |
| Ports | 40 x HDR 200G | 40 x HDR 200G | 64 x NDR 400G | 64 x NDR 400G |
| Connectors | 40 QSFP56 | 40 QSFP56 | 32 OSFP | 32 OSFP |
| CPU | Broadwell ComEx D-1508 2.2GHz | Broadwell ComEx D-1508 2.2GHz | x86 Coffee Lake i3 | x86 Coffee Lake i3 |
| Switch Chip | NVIDIA Quantum | NVIDIA Quantum | NVIDIA Quantum-2 | NVIDIA Quantum-2 |
| Switching Capacity | 16Tbps | 16Tbps | 51.2Tbps | 51.2Tbps |
| Management Type | Unmanaged | Managed | Managed | Unmanaged |
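
The switching-capacity figures in the table follow from the port counts: the quoted capacity is bidirectional, so it equals ports x per-port rate x 2. A minimal Python check is shown here; the function name is chosen only for illustration, and a non-blocking switch is assumed.

```python
def switching_capacity_tbps(ports: int, port_rate_gbps: int) -> float:
    """Bidirectional switching capacity in Tb/s, assuming a non-blocking switch."""
    # Every port can send and receive at line rate at the same time, hence the factor 2.
    return ports * port_rate_gbps * 2 / 1000


# Port counts and rates taken from the table above.
quantum = switching_capacity_tbps(ports=40, port_rate_gbps=200)   # HDR switches
quantum2 = switching_capacity_tbps(ports=64, port_rate_gbps=400)  # NDR switches

print(f"Quantum   (40 x 200G): {quantum} Tb/s")
print(f"Quantum-2 (64 x 400G): {quantum2} Tb/s")
```

Both results match the 16Tbps and 51.2Tbps values listed for the Quantum and Quantum-2 models.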

InfiniBand Transceivers and Cables

InfiniBand networks require dedicated InfiniBand transceivers and cables for switch-to-switch and switch-to-NIC connections.

FS InfiniBand optics and transceivers are designed to meet the demanding requirements of modern data centers and HPC environments. They span a broad range of speeds, from 40G to 800G, and are available in form factors such as QSFP+, QSFP28, QSFP56, and OSFP. They are 100% verified on original equipment and are compatible with NVIDIA Quantum/Quantum-2 InfiniBand switches and ConnectX HCAs.

For more details, check FS InfiniBand Transceivers and Cables Complete Guide.

FS InfiniBand Transceivers and Cables

InfiniBand Network Adapters

The InfiniBand NIC market is led predominantly by NVIDIA. FS offers a diverse selection of NVIDIA® ConnectX®-6 and ConnectX®-7 InfiniBand adapters, with port options such as 100G QSFP56, 200G QSFP56, 400G QSFP112, and 400G OSFP. By combining higher speeds with In-Network Computing, NVIDIA® ConnectX® InfiniBand smart adapters deliver exceptional performance and scalability. They reduce the cost per operation, thereby improving ROI for HPC, ML, advanced storage, low-latency embedded I/O applications, and more.

 

| Model | MCX653105A-ECAT | MCX653106A-ECAT | MCX653105A-HDAT | MCX653106A-HDAT | MCX75510AAS-NEAT | MCX715105AS-WEAT |
| --- | --- | --- | --- | --- | --- | --- |
| Product Family | ConnectX®-6 VPI | ConnectX®-6 VPI | ConnectX®-6 VPI | ConnectX®-6 | ConnectX®-7 | ConnectX®-7 VPI |
| Ports | 100G Single-Port QSFP56 | 100G Dual-Port QSFP56 | 200G Single-Port QSFP56 | 200G Dual-Port QSFP56 | 400G Single-Port OSFP | 400G Single-Port QSFP112 |
| Host Interface | PCIe 4.0 x16 | PCIe 4.0 x16 | PCIe 4.0 x16 | PCIe 4.0 x16 | PCIe 5.0 x16 | PCIe 5.0 x16 |

RoCEv2 Network Solution Overview

RoCE (RDMA over Converged Ethernet) is a network protocol that allows RDMA over an Ethernet network. RoCEv2, the second version of the protocol, encapsulates RDMA traffic in UDP/IP packets, so it can be routed across Layer 3 networks rather than being confined to a single Ethernet broadcast domain.

The RoCEv2 network solution reduces CPU workload by giving applications direct access to remote memory that bypasses the CPU. Because packet processing and memory access are handled in hardware, a RoCEv2 network delivers higher throughput, lower latency, and lower CPU utilization on both the sender and the receiver side, all of which are critical for HPC applications.
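
To make "RDMA over Ethernet" more concrete: RoCEv2 carries the InfiniBand transport header inside a UDP/IP packet (UDP destination port 4791), which is what lets ordinary Ethernet switches and routers forward it. The sketch below estimates the per-packet header overhead of an IPv4 RoCEv2 packet. The header sizes are the standard ones; the function name and the decision to ignore VLAN tags, extended transport headers, preamble, and inter-frame gap are simplifying assumptions.

```python
ROCEV2_UDP_DST_PORT = 4791  # IANA-assigned UDP destination port for RoCEv2

# Header and trailer sizes in bytes for an untagged IPv4 RoCEv2 packet.
HEADERS = {
    "ethernet": 14,  # destination MAC, source MAC, EtherType
    "ipv4": 20,      # no IP options
    "udp": 8,
    "ib_bth": 12,    # InfiniBand Base Transport Header
    "icrc": 4,       # invariant CRC carried over from the InfiniBand transport
    "fcs": 4,        # Ethernet frame check sequence
}


def payload_efficiency(payload_bytes: int) -> float:
    """Fraction of frame bytes that carry RDMA payload (simplified model)."""
    overhead = sum(HEADERS.values())
    return payload_bytes / (payload_bytes + overhead)


for payload in (1024, 4096):
    share = payload_efficiency(payload)
    print(f"{payload}-byte payload: {share:.1%} of frame bytes are payload")
```

Under these assumptions the fixed overhead is a few dozen bytes per packet, which is why larger payload sizes make better use of the link.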

RoCE networks use advanced technologies like PFC and ECN to ensure zero packet loss and create a lossless Ethernet environment.

PFC (Priority-based Flow Control): PFC is a link-layer flow control mechanism between directly connected peers. Defined in IEEE 802.1Qbb, it extends the IEEE 802.3x PAUSE frame so that each of the eight traffic classes can be paused independently. Switches can still discard less critical traffic, while signaling peer devices to halt traffic only in specific classes, so that crucial data is not dropped and can pass through the same port without limitations.
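
The per-class pause behaviour can be sketched as an ingress queue with two thresholds per priority. This is an illustrative model only: the class name, the XOFF/XON threshold values, the byte counts, and the mapping of RoCE traffic to priority 3 are assumptions, not values taken from any switch or from PicOS®.

```python
from dataclasses import dataclass


@dataclass
class PfcQueue:
    """Toy model of one priority's ingress buffer on a PFC-enabled port."""
    xoff_bytes: int            # ask the peer to pause at or above this occupancy
    xon_bytes: int             # let the peer resume at or below this occupancy
    occupancy: int = 0
    peer_paused: bool = False

    def enqueue(self, size: int) -> None:
        self.occupancy += size
        if not self.peer_paused and self.occupancy >= self.xoff_bytes:
            self.peer_paused = True    # pause frame for this priority only

    def dequeue(self, size: int) -> None:
        self.occupancy = max(0, self.occupancy - size)
        if self.peer_paused and self.occupancy <= self.xon_bytes:
            self.peer_paused = False   # resume (pause time of zero)


# Eight independent priorities; only the congested one pauses its sender.
queues = {prio: PfcQueue(xoff_bytes=8000, xon_bytes=4000) for prio in range(8)}
rdma = queues[3]                       # assumed priority for RoCE traffic
for _ in range(3):
    rdma.enqueue(3000)                 # 9000 bytes queued, which crosses XOFF
print("priority 3 paused:", rdma.peer_paused, "| priority 0 paused:", queues[0].peer_paused)
rdma.dequeue(6000)                     # drains to 3000 bytes, below XON
print("priority 3 paused after drain:", rdma.peer_paused)
```

The gap between the two thresholds avoids rapid pause/resume flapping; real switches also reserve headroom above XOFF for data already in flight, which this sketch leaves out.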

ECN (Explicit Congestion Notification): ECN defines flow control and end-to-end congestion notification mechanisms based on the IP and transport layers. When a device experiences congestion, it marks the ECN field in the IP header of the data packets it forwards. On receiving a marked packet, the receiver sends a Congestion Notification Packet (CNP) back to the sender, which then reduces its transmission rate. ECN thus provides end-to-end congestion management, minimizing the spread and intensification of congestion.
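
The mark, notify, and slow-down loop can be sketched as follows. The two-bit ECN codepoints are the standard ones from RFC 3168; the queue thresholds, the linear marking curve, and halving the sender's rate on each CNP are simplifying assumptions for illustration, not the behaviour of a specific NIC or switch.

```python
import random

ECN_MASK = 0b11   # ECN uses the two low-order bits of the IP TOS / Traffic Class byte
NOT_ECT = 0b00    # sender is not ECN-capable
CE = 0b11         # Congestion Experienced


def mark_probability(queue_kb: float, kmin_kb: float, kmax_kb: float, pmax: float) -> float:
    """Probability of marking a packet, given the egress queue depth."""
    if queue_kb <= kmin_kb:
        return 0.0
    if queue_kb >= kmax_kb:
        return 1.0
    return pmax * (queue_kb - kmin_kb) / (kmax_kb - kmin_kb)


def switch_forward(tos: int, queue_kb: float, rng: random.Random) -> int:
    """Return the TOS byte after a congested switch has possibly marked it."""
    if tos & ECN_MASK == NOT_ECT:
        return tos  # a packet that is not ECN-capable cannot be marked
    if rng.random() < mark_probability(queue_kb, kmin_kb=100, kmax_kb=400, pmax=0.2):
        return tos | CE
    return tos


def sender_rate_after_cnp(rate_gbps: float) -> float:
    """Hypothetical sender reaction: halve the rate when a CNP arrives."""
    return rate_gbps / 2


rng = random.Random(0)
tos = 0b10                                           # ECT(0): ECN-capable transport
marked = switch_forward(tos, queue_kb=500, rng=rng)  # queue above kmax, so always marked
if marked & ECN_MASK == CE:                          # receiver sees CE and returns a CNP
    print("CNP sent; new rate:", sender_rate_after_cnp(400.0), "Gb/s")
```

Because the sender backs off before buffers overflow, ECN normally acts first and PFC serves as the last line of defence against loss.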

For more details about RoCE, you can check RDMA over Converged Ethernet Guide.

RoCEv2 Network Architecture

In contrast to InfiniBand networks, RoCEv2 networks utilize high-performance Ethernet data center infrastructure, leading to better interoperability between the RoCE compute network, management network, and storage network. As illustrated in FS's RoCE solution below, these various network segments can all deploy FS's PicOS® switches and the AmpCon™ management platform to achieve unified network management.

To meet the high throughput demands of high-performance networks, the devices in RoCEv2 deployments achieve speeds of up to 400G. In the FS RoCE network solution, the FS 400G Ethernet transceivers and N9550-64D data center switches with 64 x 400Gb QSFP-DD ports are deployed in the RoCE compute network to ensure high throughput.

RoCE Network

Data Center Switches

The core component of a high-performance switch is its switching chip. Broadcom's Tomahawk series chips, which are designed for hyper-scale cloud networks, storage networks, and HPC environments, are currently used extensively in data center switches for RoCEv2 networks.

FS offers high-performance, highly reliable data center switches for building RoCEv2 networks, ranging from 10G to 400G. The image below showcases some of FS's data center switches powered by Tomahawk series chips, tailored for HPC applications, offering high density and availability. These switches come pre-installed with PicOS® software and feature support for PFC and ECN, delivering low-latency, zero packet loss, and non-blocking lossless Ethernet networks. The PicOS® switch software provides comprehensive SDN capabilities and is compatible with the AmpCon™ management platform, delivering a resilient, programmable, and scalable network operating system (NOS) at a lower TCO.

FS Data Center Switches

High-Speed Ethernet Transceivers and Cables

RoCEv2 networks operate over Ethernet, which allows the deployment of traditional Ethernet transceivers and DAC/AOC cables. As a result, there are more suppliers of RoCEv2 network devices to choose from, and the RoCEv2 solution offers a wider variety of products and deployment strategies.

For HPC networks, FS offers high-speed 200G/400G/800G modules and cables that are compatible with devices from brands such as Cisco, Arista, Dell, and Juniper. Modules and cables are available for a range of interface types and transmission distances, including SR4, SR8, DR4, LR4, and ER8.

Ethernet Network Adapters for RoCEv2 Networks

The NVIDIA ConnectX series network cards, which support RoCE, currently dominate the market. With unmatched RoCE performance, ConnectX NICs deliver efficient, high-performance RDMA services to bandwidth-sensitive and latency-sensitive applications. FS offers a variety of NVIDIA® ConnectX® Ethernet adapters that deliver low latency and high throughput at speeds of 25G, 100G, 200G, and up to 400G.

  • The ConnectX®-4 adapter cards provide cost-effective solutions for data centers, combining performance and scalability to ensure infrastructures operate efficiently, meeting the demands of various critical applications.

  • The ConnectX®-5 network adapters feature advanced hardware offloading capabilities to reduce CPU resource consumption and achieve extremely high packet rates and throughput, enhancing the efficiency of data center infrastructure.

  • The ConnectX®-6 adapter cards incorporate all the innovations of previous versions, along with numerous enhancements, delivering unmatched performance and efficiency at any scale.

  • With a throughput of up to 400Gb/s, the NVIDIA® ConnectX®-7 NICs offer hardware-accelerated networking, storage, security, and manageability services at a data center scale, catering to cloud, telecommunications, HPC data centers, and enterprise workloads.

FS Ethernet NICs

InfiniBand Network Solution VS. RoCEv2 Network Solution

From a technical standpoint, InfiniBand network solutions employ advanced technologies to boost network forwarding performance, minimize fault recovery time, enhance scalability, and reduce operational complexity.

InfiniBand vs RoCEv2

InfiniBand solutions are ideal for dedicated HPC environments where extreme performance and low latency are critical. However, they entail higher hardware costs and a more limited selection of suppliers. RoCEv2 solutions, on the other hand, are well suited to HPC and big data applications that require compatibility with existing Ethernet infrastructure. RoCEv2 hardware is easier to integrate, though its performance is slightly inferior to that of InfiniBand.

  • Performance: The InfiniBand solution's lower end-to-end latency gives it an advantage in application performance over RoCEv2. Nonetheless, RoCEv2 can meet the performance requirements of most intelligent computing tasks.

  • Function & Scale: InfiniBand networks can support single-cluster scales with tens of thousands of GPUs without a decline in overall performance. RoCEv2 networks can support clusters with thousands of GPUs without significantly compromising overall network performance.

  • Operations and Maintenance: InfiniBand is more mature than RoCEv2, offering features such as multi-tenant isolation and advanced operational diagnostics. RoCEv2 networks rely on PFC to turn Ethernet into a lossless fabric. However, PFC can lead to management issues, including PFC storms and deadlocks.

  • Cost: InfiniBand is more expensive than RoCEv2, primarily due to the higher cost of InfiniBand switches compared to Ethernet switches.

  • Supplier: The InfiniBand market is led predominantly by NVIDIA, whereas the RoCEv2 solution has a broader range of suppliers.

InfiniBand vs RoCEv2 Overview

Conclusion

The choice between the InfiniBand solution and the RoCEv2 solution ultimately depends on the specific needs and constraints of the HPC environment. Organizations with extensive HPC requirements and a specialized infrastructure budget may find InfiniBand the superior choice. Meanwhile, those looking for a more cost-effective solution that integrates seamlessly with existing systems may prefer RoCEv2.

By understanding the strengths and limitations of each solution, organizations can make informed decisions that optimize their HPC networks for performance, reliability, and cost-efficiency.

 

