
RDMA over Converged Ethernet Guide

Updated on Dec 16, 2023

In the era of data, the demand for faster, more efficient, and more scalable networks has never been greater. Traditional TCP/IP Ethernet connections are CPU-intensive and require extra processing and copying of data, so they can no longer meet current network needs. This is the context in which RDMA over Converged Ethernet (RoCE) emerged. To understand what RoCE is, it's worth looking at RDMA first.

 

RDMA (Remote Direct Memory Access) enables direct data transfer between devices in a network, and RoCE (RDMA over Converged Ethernet) is a leading implementation of this technology. RoCE improves data transmission with high speed and low latency, making it ideal for high-performance computing and cloud environments. This article explores the fundamentals of RDMA and how RoCE enhances data transfer over Ethernet networks.

What Is RDMA?

Remote Direct Memory Access (RDMA) is a technology that enables direct access from the memory of one host or server to the memory of another host or server without involving the CPU. This frees the CPUs to do the work they are meant to do, such as running applications and processing massive amounts of data. As a result, lower latency, lower CPU load, and higher bandwidth can be achieved cost-effectively for both the network and the host.

RDMA technology combines intelligent network cards with an optimized software architecture to support high-speed direct access to remote memory. By embedding the RDMA protocol in hardware (i.e., the network card) and using methods such as zero-copy and kernel bypass, high-performance remote data access is achieved.

Figure 1: RDMA Technology

 

What Is RoCE?

As a type of RDMA, RoCE is a network protocol defined in the InfiniBand Trade Association (IBTA) standard that allows RDMA over a converged Ethernet network. In short, it can be regarded as the application of RDMA technology in hyper-converged data centers, cloud, storage, and virtualized environments. It combines the benefits of RDMA technology with the familiarity of Ethernet. To understand the differences between RoCE and InfiniBand, you can read the article RoCE vs Infiniband vs TCP/IP. The encapsulation of InfiniBand (IB) and RoCE is compared as follows:


Figure 2: InfiniBand Vs. RoCEv2

 

Types of RoCE

Generally, there are two versions of RDMA over Converged Ethernet: RoCE v1 and RoCE v2. Which version is available depends on the network adapter or card used.

  • RoCE v1: The RoCE v1 protocol is an Ethernet link layer protocol that allows two hosts in the same Ethernet broadcast domain (VLAN) to communicate. It uses Ethertype 0x8915. Because it is carried directly in Ethernet frames, the frame length is limited to 1500 bytes for a standard Ethernet frame and 9000 bytes for an Ethernet jumbo frame.

  • RoCE v2: The RoCE v2 protocol overcomes the limitation of version 1 being bound to a single broadcast domain (VLAN). By changing the packet encapsulation to include IP and UDP headers, RoCE v2 can be used across both L2 and L3 networks. This enables Layer 3 routing, which brings RDMA to networks with multiple subnets for greater scalability. For this reason, RoCE v2 is also known as Routable RoCE (RRoCE). With RoCE v2, IP multicast also becomes possible.

Figure 3: RoCE v1 Vs. RoCE v2 Packet Format
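
The difference in encapsulation can be illustrated with a small Python sketch that tells the two versions apart by looking at raw frame bytes. It is a simplified illustration, not a production parser: it assumes untagged Ethernet frames (no VLAN tag) and IPv4 only, and it relies on UDP destination port 4791, the port registered for RoCE v2, which is not stated in the text above. The names classify_frame and make_ipv4_udp_frame are hypothetical helpers introduced for this example.

```python
import struct

ETHERTYPE_ROCE_V1 = 0x8915  # RoCE v1 Ethertype, as given in the text
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17
ROCE_V2_UDP_PORT = 4791     # assumption: registered UDP destination port for RoCE v2

ETH_HEADER_LEN = 14         # dst MAC (6) + src MAC (6) + Ethertype (2)


def classify_frame(frame: bytes) -> str:
    """Return 'RoCE v1', 'RoCE v2', or 'other' for an untagged Ethernet frame."""
    if len(frame) < ETH_HEADER_LEN:
        return "other"
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_ROCE_V1:
        return "RoCE v1"  # RDMA headers follow the Ethernet header directly
    if ethertype != ETHERTYPE_IPV4 or len(frame) < ETH_HEADER_LEN + 20:
        return "other"
    ihl = (frame[ETH_HEADER_LEN] & 0x0F) * 4   # IPv4 header length in bytes
    protocol = frame[ETH_HEADER_LEN + 9]       # IPv4 protocol field
    udp_offset = ETH_HEADER_LEN + ihl
    if ihl < 20 or protocol != IPPROTO_UDP or len(frame) < udp_offset + 8:
        return "other"
    (dst_port,) = struct.unpack_from("!H", frame, udp_offset + 2)
    return "RoCE v2" if dst_port == ROCE_V2_UDP_PORT else "other"


def make_ipv4_udp_frame(dst_port: int) -> bytes:
    """Build a minimal, zero-filled Ethernet/IPv4/UDP frame for demonstration."""
    eth = b"\x00" * 12 + struct.pack("!H", ETHERTYPE_IPV4)
    ip = bytearray(20)
    ip[0] = 0x45          # version 4, header length 5 * 4 = 20 bytes
    ip[9] = IPPROTO_UDP
    udp = struct.pack("!HHHH", 49152, dst_port, 8, 0)
    return eth + bytes(ip) + udp


v1_frame = b"\x00" * 12 + struct.pack("!H", ETHERTYPE_ROCE_V1)
print(classify_frame(v1_frame))
print(classify_frame(make_ipv4_udp_frame(ROCE_V2_UDP_PORT)))
print(classify_frame(make_ipv4_udp_frame(53)))
```

The v1 check needs only the Ethertype, which is why v1 traffic cannot leave its broadcast domain, while the v2 check walks through the IP and UDP headers that make the traffic routable.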

Benefits of RoCE

Since RDMA over Converged Ethernet accesses memory data directly via the network interface rather than through the kernel, it enables low-latency, high-performance transmission.

    • Low CPU involvement: Access a remote switch's or server's memory without consuming CPU cycles on the remote server, which enables full use of the available bandwidth and higher scalability.

    • Zero-copy: Send and receive data directly to and from remote buffers, without intermediate copies.

    • High performance: RoCE improves both latency and throughput, which raises overall network performance.

    • Cost-saving: With RoCE there is no need to buy new equipment or replace Ethernet infrastructure to handle the massive amount of data, which greatly saves capital expenditures for companies.

Limitations of RoCE

Due to the performance bottlenecks of traditional Ethernet networks, general RoCE applications still suffer from problems such as congestion, packet loss, and latency jitter in high-performance services.

      • Congestion: In multicast scenarios, queues can become congested, and the resulting queuing latency cannot be ignored.

      • Packet loss: Compared to Fibre Channel (FC), traditional Ethernet is prone to congestion and packet loss, and retransmitting lost packets can easily lead to out-of-order data.

      • Latency jitter: Ethernet networks experience large amounts of jitter, and the store-and-forward mode leads to complex lookup processes and high forwarding latency.
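
To get a feel for why queuing latency matters, the following Python sketch estimates how long a packet waits behind data that is already queued on a congested port. It is a back-of-the-envelope model under stated assumptions: the port drains at full line rate, nothing pauses it, and the queue depth and link speed are illustrative numbers that do not come from this article. The function name queue_drain_time_us is hypothetical.

```python
def queue_drain_time_us(queued_bytes: int, link_gbps: float) -> float:
    """Time in microseconds for a port to transmit queued_bytes at line rate."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    bits = queued_bytes * 8
    # bits divided by Gbit/s gives nanoseconds; the extra factor of 1000
    # converts the result to microseconds.
    return bits / (link_gbps * 1000)


# Illustrative numbers only: 1 MB already queued on a 25 Gbit/s port.
delay = queue_drain_time_us(1_000_000, 25)
print(f"{delay:.0f} us of added queuing delay")
```

With these example numbers the added delay is 320 microseconds, which is large next to the microsecond-scale latency that RDMA is meant to deliver.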

Figure 4: Before Vs. After RoCE

 

How to Realize RoCE?

Generally, to realize RDMA over Converged Ethernet in a data center, both the hosts and the network need to support it. On the host side, each rack server or host needs a network adapter card that supports RoCE, such as ConnectX-3 Pro or ConnectX-4 and above, together with the matching RoCE drivers. RoCE drivers are available for Red Hat and other Linux distributions, Microsoft Windows, and other common operating systems. On the network side, you can choose switches whose operating system supports PFC (Priority Flow Control).
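
As a first check on a Linux host, the sketch below lists RDMA ports whose link layer is Ethernet. It assumes the RDMA drivers are loaded and expose devices under /sys/class/infiniband with a link_layer file per port. The function name list_ethernet_rdma_ports is hypothetical, and an Ethernet link layer is only a hint: other Ethernet-based RDMA adapters may report the same value, so confirm RoCE support in the adapter documentation.

```python
from pathlib import Path


def list_ethernet_rdma_ports(sysfs_root: str = "/sys/class/infiniband"):
    """Return (device, port) pairs whose RDMA link layer is reported as Ethernet."""
    root = Path(sysfs_root)
    found = []
    if not root.is_dir():
        return found  # no RDMA devices registered, or not a Linux host
    for dev in sorted(root.iterdir()):
        ports_dir = dev / "ports"
        if not ports_dir.is_dir():
            continue
        for port in sorted(ports_dir.iterdir()):
            link_layer = port / "link_layer"
            if link_layer.is_file() and link_layer.read_text().strip() == "Ethernet":
                found.append((dev.name, port.name))
    return found


for dev, port in list_ethernet_rdma_ports():
    print(f"{dev} port {port}: Ethernet link layer (RoCE candidate)")
```

An empty result means either that no RDMA-capable adapter is installed or that its driver is not loaded, so it is worth checking the driver before concluding that the hardware is missing.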


FAQs About RoCE

Here we list some frequently asked questions about RDMA over converged Ethernet so that you can understand it better.

1. Which FS switches or network cards/adapters support RoCE?

Up to now, all FS N series switches and S58/80 series switches support RoCE v1 and v2, except the S5860 series, S5850-24S2Q, S5850-24S2Q-DC, and S5800-8TF12S. Customers need to enable the PFC function after buying an RDMA switch. FS also provides NVIDIA Ethernet Adapters that support RoCE.

2. Can RoCE adapters communicate with other adapter types, like iWARP?

RoCE adapters can only communicate with other RDMA over converged Ethernet adapters. Any configurations that attempt to mix adapter types, say RoCE adapters combined with iWARP adapters, will probably revert to traditional TCP/IP connections.

3. What’s the difference between RoCE and iWARP?

Like RoCE, iWARP (Internet Wide Area RDMA Protocol) also supports RDMA with low latency, but the two do have some differences.

On the one hand, RoCE is the only industry-standard Ethernet-based RDMA solution with a multi-vendor ecosystem delivering network adapters and operating over standard Layer 2 and Layer 3 Ethernet switches, whereas iWARP has seen only minimal support.

On the other hand, iWARP uses a complex mix of layers, including DDP (Direct Data Placement), a tweak known as MPA (Marker PDU Aligned framing), and a separate RDMA protocol (RDMAP) to deliver RDMA services over TCP/IP. With such a complex architecture, it is hard for the iWARP protocol to apply RDMA to existing software transport frameworks. As a result of this compromise, iWARP's throughput, latency, and CPU utilization suffer.

Figure 5: iWARP's Complex Network Layers Vs. RoCE’s Simpler Model

 

Conclusion

By running RDMA in data centers, data movement can be offloaded and more CPU resources made available to applications. Adopters of RoCE can benefit from RDMA’s capabilities without changing their network infrastructure. By reducing Ethernet network latency and offloading CPU overhead, RoCE increases performance in search, storage, database, financial, and high-transaction-rate applications. By increasing CPU efficiency and improving application performance, RoCE can reduce the number of servers needed, thereby producing energy savings and reducing the footprint of Ethernet-based data centers.
