
An In-Depth Guide to RoCE v2 Networking

Posted on Dec 20, 2023

In the ever-evolving landscape of networking technologies, Remote Direct Memory Access (RDMA) has emerged as a crucial player, streamlining data transfers and enhancing overall network efficiency. One prominent RDMA technology is RoCE (RDMA over Converged Ethernet), whose second version, RoCE v2, has made significant strides in performance and versatility. This article delves into the intricacies of RoCE v2, exploring its technology, the network hardware that supports it, and how it compares with InfiniBand.

What is RoCE v2?

RoCE v2 is an RDMA protocol designed to facilitate low-latency, high-throughput data transfers over Ethernet networks. Unlike traditional data transfer methods that involve multiple layers of processing, RoCE v2 enables direct memory access between systems, minimizing CPU involvement and reducing latency. This makes RoCE v2 particularly advantageous in scenarios demanding swift and efficient data communication, such as high-performance computing (HPC) environments, data centers, and cloud computing.
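
To make the direct-memory-access model concrete, here is a minimal C sketch against libibverbs, the standard userspace RDMA (and thus RoCE v2) API: it opens the first RDMA-capable device and registers a buffer so the NIC can read and write that memory directly, bypassing the CPU on the data path. Device selection and error handling are simplified for brevity.

```c
/* Minimal libibverbs sketch: open an RDMA device and register memory.
 * Build with: gcc rdma_open.c -libverbs
 * Assumes an RDMA-capable (e.g., RoCE) NIC is present. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int num_devices;
    struct ibv_device **dev_list = ibv_get_device_list(&num_devices);
    if (!dev_list || num_devices == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    /* Open the first device; a real application would pick by name. */
    struct ibv_context *ctx = ibv_open_device(dev_list[0]);
    if (!ctx) { perror("ibv_open_device"); return 1; }

    /* A protection domain scopes which resources may touch which memory. */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    if (!pd) { perror("ibv_alloc_pd"); return 1; }

    /* Register a buffer so the NIC can DMA to and from it directly. */
    size_t len = 4096;
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_REMOTE_READ);
    if (!mr) { perror("ibv_reg_mr"); return 1; }

    printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n",
           len, mr->lkey, mr->rkey);

    /* A remote peer that learns buf's address and rkey can now issue
     * RDMA reads/writes to it without involving this host's CPU. */
    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(dev_list);
    return 0;
}
```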

The protocol builds upon the foundation of its predecessor, RoCE v1, by introducing enhancements that address certain limitations and improve overall performance. RoCE v2 utilizes a Converged Ethernet infrastructure, enabling the coexistence of traditional Ethernet traffic with RDMA traffic on the same network. This convergence streamlines network management and eliminates the need for a separate RDMA fabric, making RoCE v2 more accessible and cost-effective.

RoCE v2 Network Infrastructure

RoCE Network Card

Central to the RoCE v2 ecosystem is the RoCE network card, a specialized network interface card (NIC) designed to support RDMA operations. These cards, also known as RoCE adapters, are pivotal in enabling direct memory access between systems. RoCE network cards are equipped with the necessary hardware capabilities to offload RDMA operations from the CPU, resulting in lower latency and improved overall system performance.
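
As a rough illustration of what such an adapter exposes to software, the minimal C program below queries a device's hardware limits through the verbs API; the attributes printed here are a small, arbitrary selection of the many that are available.

```c
/* Query RDMA capabilities of the first device (build: gcc -libverbs). */
#include <infiniband/verbs.h>
#include <stdio.h>

int main(void)
{
    struct ibv_device **list = ibv_get_device_list(NULL);
    if (!list || !list[0]) return 1;

    struct ibv_context *ctx = ibv_open_device(list[0]);
    struct ibv_device_attr attr;
    if (!ctx || ibv_query_device(ctx, &attr)) return 1;

    /* These limits bound how many queue pairs, completion queues, and
     * registered memory regions the NIC can manage in hardware. */
    printf("device:  %s\n", ibv_get_device_name(list[0]));
    printf("max QPs: %d\n", attr.max_qp);
    printf("max CQs: %d\n", attr.max_cq);
    printf("max MRs: %d\n", attr.max_mr);

    ibv_close_device(ctx);
    ibv_free_device_list(list);
    return 0;
}
```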

RoCE Switch

The core of a high-performance switch is its forwarding chip. Broadcom's Tomahawk 3 series chips are widely deployed in commercial switches for data forwarding today, with a growing shift toward switches built on the newer Tomahawk 4 series as bandwidth demands continue to climb.

RoCE v2 vs. InfiniBand

RoCE v2 (RDMA over Converged Ethernet version 2) and InfiniBand are both technologies designed to provide high-speed, low-latency communication in data centers and high-performance computing environments. Here are some key differences across various aspects.


Physical Layer

  • RoCE v2: Relies on Ethernet infrastructure, allowing for the convergence of storage and regular data traffic on the same network. This also makes it easier to integrate into existing data center setups.

  • InfiniBand: Uses a dedicated fabric for communication, separate from Ethernet. It often requires a specialized InfiniBand network, which might necessitate separate cabling and switches.

Protocol Stack & Network Stack

  • RoCE v2: Carries RDMA (Remote Direct Memory Access) traffic in standard UDP/IP packets (UDP destination port 4791), so it is routable across ordinary IP networks and compatible with standard networking equipment; a simplified header sketch follows this list.

  • InfiniBand: Has its own protocol and network stack, optimized for high-speed, low-latency communication, which may require specialized drivers and configuration.
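
To make the layering concrete: a RoCE v2 packet is an ordinary UDP/IP datagram addressed to UDP destination port 4791, whose payload begins with the InfiniBand Base Transport Header (BTH), followed by the RDMA payload and an invariant CRC. The C sketch below is a simplified, illustrative layout, not a byte-accurate header definition.

```c
/* Simplified view of RoCE v2 encapsulation (illustrative only):
 *
 *   Ethernet | IPv4/IPv6 | UDP (dst port 4791) | IB BTH | payload | ICRC
 *
 * The outer UDP/IP headers are what make RoCE v2 routable. */
#include <stdint.h>

#define ROCEV2_UDP_DPORT 4791 /* IANA-assigned UDP port for RoCE v2 */

/* InfiniBand Base Transport Header (12 bytes), simplified. */
struct rocev2_bth {
    uint8_t  opcode;  /* operation, e.g. RDMA WRITE or RDMA READ REQUEST */
    uint8_t  flags;   /* solicited-event, pad count, transport version */
    uint16_t pkey;    /* partition key */
    uint32_t dest_qp; /* 8 reserved bits + 24-bit destination queue pair */
    uint32_t psn;     /* ack-request bit + 24-bit packet sequence number */
};
```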

Switching

  • RoCE v2: Can operate over standard Ethernet switches whose Data Center Bridging (DCB) features provide lossless Ethernet.

  • InfiniBand: Requires InfiniBand switches that are specifically designed for low-latency, high-throughput communication.

Congestion

RoCE v2:

  • Handling Congestion: RoCE v2 relies on Data Center Bridging (DCB) features of Ethernet switches, chiefly Priority Flow Control (PFC, IEEE 802.1Qbb), to handle congestion. PFC pauses individual traffic classes before buffers overflow, providing a lossless Ethernet environment and preventing packet loss due to congestion.

  • Congestion Control: RoCE v2 also defines an end-to-end mechanism: switches mark packets with ECN as queues build, and the receiver returns Congestion Notification Packets (CNPs) that prompt the sender to throttle, as in algorithms such as DCQCN. Losslessness itself, however, is still delegated to the underlying Ethernet infrastructure (a toy rate-control sketch follows this section).

InfiniBand:

  • Handling Congestion: InfiniBand has native support for congestion management. It employs mechanisms such as credit-based flow control to prevent congestion and ensure lossless communication.

  • Congestion Control: InfiniBand includes adaptive routing and congestion control algorithms to dynamically adjust traffic routes and prevent congestion in the network.
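
For intuition about how senders react to those congestion signals in practice, here is a toy C sketch of the rate-adjustment pattern used by DCQCN-style algorithms on RoCE v2: cut the rate multiplicatively when a CNP arrives, and recover gradually when the network is quiet. The constants and the recovery rule are illustrative simplifications, not the published DCQCN algorithm.

```c
/* Toy DCQCN-style sender rate control (illustrative, not the real spec). */
#include <stdio.h>

static double rate  = 100.0;        /* current send rate, Gbps */
static double alpha = 1.0;          /* congestion-severity estimate, [0,1] */
static const double g = 1.0 / 16.0; /* alpha smoothing gain (assumed) */

/* Called when a CNP (ECN echo) arrives: cut rate, raise alpha. */
static void on_cnp(void)
{
    rate  = rate * (1.0 - alpha / 2.0); /* multiplicative decrease */
    alpha = (1.0 - g) * alpha + g;      /* congestion looks worse */
}

/* Called each timer tick with no CNP: decay alpha, recover rate. */
static void on_quiet_tick(void)
{
    alpha = (1.0 - g) * alpha;          /* congestion looks better */
    rate  = rate + 1.0;                 /* simplified additive recovery */
    if (rate > 100.0) rate = 100.0;     /* cap at line rate */
}

int main(void)
{
    on_cnp(); /* two congestion notifications arrive... */
    on_cnp();
    for (int i = 0; i < 5; i++) on_quiet_tick(); /* ...then quiet ticks */
    printf("rate=%.1f Gbps alpha=%.3f\n", rate, alpha);
    return 0;
}
```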

Routing

RoCE v2:

  • Routing Mechanism: RoCE v2 packets are forwarded like any other UDP/IP traffic, so routing decisions come from conventional IP routing protocols such as OSPF or BGP, commonly combined with ECMP to spread flows across parallel paths (see the sketch after this section).

  • Topology: RoCE is often used in standard Ethernet topologies, and the routing decisions are influenced by the underlying Ethernet infrastructure.

InfiniBand:

  • Routing Mechanism: InfiniBand routing is computed by a subnet manager, which programs forwarding tables optimized for low-latency, high-throughput communication and supports multiple paths for redundancy and load balancing.

  • Topology: InfiniBand supports a variety of topologies, including fat-tree, hypercube, and multi-rail configurations. The choice of topology can influence routing decisions.
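
A practical consequence of RoCE v2's UDP encapsulation for routing: senders vary the UDP source port per flow, giving standard ECMP hashing in the IP fabric the entropy it needs to spread RDMA flows across equal-cost paths. The C sketch below uses a generic FNV-1a hash as a stand-in for a switch's vendor-specific ECMP hash.

```c
/* Toy ECMP path selection: hash the 5-tuple, pick one of N equal paths.
 * FNV-1a is a stand-in; real switches use vendor-specific hashes. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t fnv1a(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    while (n--) { h ^= *p++; h *= 16777619u; }
    return h;
}

int main(void)
{
    const uint32_t n_paths  = 4;
    uint32_t src_ip = 0x0a000001, dst_ip = 0x0a000002; /* 10.0.0.1 -> .2 */
    uint16_t dst_port = 4791; /* RoCE v2 */
    uint8_t  proto    = 17;   /* UDP */

    /* Same endpoints, different UDP source ports -> different paths. */
    for (uint16_t sport = 49152; sport < 49156; sport++) {
        uint8_t key[13];
        memcpy(key + 0,  &src_ip,   4);
        memcpy(key + 4,  &dst_ip,   4);
        memcpy(key + 8,  &sport,    2);
        memcpy(key + 10, &dst_port, 2);
        key[12] = proto;
        printf("sport %u -> path %u\n",
               sport, fnv1a(key, sizeof key) % n_paths);
    }
    return 0;
}
```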

Choosing between RoCE v2 and InfiniBand depends on factors such as existing infrastructure, application requirements, and the specific needs of the environment. RoCE v2 provides a more seamless integration path into existing Ethernet networks, while InfiniBand may be preferred in high-performance computing environments demanding the highest levels of performance and scalability.

UEC Brings New Transport Protocol

The Ultra Ethernet Consortium (UEC) was formally founded on July 19, 2023, with the primary objective of surpassing current Ethernet capabilities. The founding members include AMD, Arista, Broadcom, Cisco, Eviden, HPE, Intel, Meta, and Microsoft. These companies collectively bring decades of expertise in network infrastructure, artificial intelligence, cloud technologies, and high-performance computing deployments.

The consortium contends that Remote Direct Memory Access (RDMA), specified decades ago, no longer meets the rigorous demands of AI/ML network traffic: because RDMA moves data in large, long-lived flows, traffic can balance unevenly across links and overload individual paths. UEC therefore advocates developing a modern transport protocol that preserves RDMA semantics for these emerging workloads.

Summary

RoCE v2 stands as a formidable force in the realm of RDMA technologies, offering a powerful solution for organizations seeking high-performance, low-latency data communication. Its convergence over Ethernet infrastructure, coupled with the transport-layer advancements the UEC is pursuing, positions RoCE v2 as a versatile and cost-effective choice for applications ranging from HPC environments to cloud computing.

While comparisons with InfiniBand highlight the strengths of RoCE v2, organizations need to consider their specific requirements and existing infrastructures when choosing the most suitable RDMA solution. As technology continues to evolve, RoCE v2 and its associated innovations are poised to play a pivotal role in shaping the future of high-performance networking.

Related Article:

Optimizing Data Center Performance with RoCE Switches
