In the era of data, the demand for faster, more efficient, and more scalable networks has never let up. Because traditional TCP/IP Ethernet connections are CPU-intensive and require extra processing and copying of data, they can no longer meet current network needs. In that context, RDMA over Converged Ethernet (RoCE) has arrived.
Background Knowledge: What Is RDMA?
RDMA (remote direct memory access) is a technology that lets one host or server access the memory of another directly, without involving the CPU. This frees the CPUs to do the work they are meant to do, such as running applications and processing massive amounts of data. As a result, lower latency, lower CPU load, and higher bandwidth can be achieved cost-effectively for both the network and the host.
Figure 1: RDMA Technology
What Is RoCE?
As its name suggests, RoCE is a network protocol, defined in an InfiniBand Trade Association (IBTA) standard, that allows RDMA over a converged Ethernet network. In short, it can be regarded as the application of RDMA technology in hyper-converged data centers, cloud, storage, and virtualized environments. It combines all the benefits of RDMA technology with the familiarity of Ethernet.
Types of RoCE
Generally, there are two RoCE versions: RoCE v1 and RoCE v2. Which one is available depends on the network adapter or card used.
RoCE v1: The RoCE v1 protocol is an Ethernet link layer protocol that allows two hosts in the same Ethernet broadcast domain (VLAN) to communicate. It uses Ethertype 0x8915, and its frames are subject to the usual Ethernet length limits: 1500 bytes for a standard Ethernet frame and 9000 bytes for an Ethernet jumbo frame.
RoCE v2: The RoCE v2 protocol overcomes the limitation of version 1 being bound to a single broadcast domain (VLAN). By changing the packet encapsulation to include IP and UDP headers, RoCE v2 can be used across both L2 and L3 networks. This enables Layer 3 routing, which brings RDMA to networks with multiple subnets and so improves scalability. For this reason, RoCE v2 is also known as Routable RoCE (RRoCE). With RoCE v2, IP multicast also becomes possible.
Figure 2: RoCE v1 vs RoCE v2 Packet Format
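The encapsulation difference between the two versions can be sketched in a few lines of Python. This is an illustrative sketch, not a working RoCE stack: the function names, MAC and IP addresses, and the zero-filled payloads are all made up for the example, checksums are left at zero, and VLAN-tagged frames are not handled. The Ethertype 0x8915 comes from the text above; UDP destination port 4791 is the port assigned to RoCE v2.

```python
import struct

ROCE_V1_ETHERTYPE = 0x8915  # RoCE v1 rides directly on Ethernet
IPV4_ETHERTYPE = 0x0800     # RoCE v2 rides on IP + UDP
ROCE_V2_UDP_PORT = 4791     # UDP destination port assigned to RoCE v2
UDP_PROTOCOL = 17


def ethernet_header(dst_mac: bytes, src_mac: bytes, ethertype: int) -> bytes:
    """14-byte untagged Ethernet II header."""
    return struct.pack("!6s6sH", dst_mac, src_mac, ethertype)


def ipv4_header(src_ip: bytes, dst_ip: bytes, payload_len: int) -> bytes:
    """Minimal 20-byte IPv4 header carrying UDP (checksum left at 0 here)."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len,
                       0, 0, 64, UDP_PROTOCOL, 0, src_ip, dst_ip)


def udp_header(src_port: int, dst_port: int, payload_len: int) -> bytes:
    """8-byte UDP header (checksum left at 0 here)."""
    return struct.pack("!HHHH", src_port, dst_port, 8 + payload_len, 0)


def classify(frame: bytes) -> str:
    """Tell RoCE v1 from RoCE v2 by looking at the encapsulation only."""
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype == ROCE_V1_ETHERTYPE:
        return "RoCE v1"
    if ethertype == IPV4_ETHERTYPE:
        ip_header_len = (frame[14] & 0x0F) * 4
        protocol = frame[14 + 9]
        if protocol == UDP_PROTOCOL:
            dst_port = struct.unpack_from("!H", frame, 14 + ip_header_len + 2)[0]
            if dst_port == ROCE_V2_UDP_PORT:
                return "RoCE v2"
    return "other"


# Hypothetical addresses and a zero-filled stand-in for the RDMA payload.
mac_a = bytes.fromhex("020000000001")
mac_b = bytes.fromhex("020000000002")
rdma_payload = bytes(32)

v1_frame = ethernet_header(mac_b, mac_a, ROCE_V1_ETHERTYPE) + rdma_payload
v2_frame = (ethernet_header(mac_b, mac_a, IPV4_ETHERTYPE)
            + ipv4_header(bytes([10, 0, 0, 1]), bytes([10, 0, 1, 1]),
                          8 + len(rdma_payload))
            + udp_header(49152, ROCE_V2_UDP_PORT, len(rdma_payload))
            + rdma_payload)

print(classify(v1_frame))  # RoCE v1
print(classify(v2_frame))  # RoCE v2
```

Because the v2 frame carries an ordinary IP header, any Layer 3 router can forward it, whereas the v1 frame has nothing but MAC addresses to route on.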
How to Realize RoCE?
Generally, to realize RDMA over Converged Ethernet in a data center, you need RoCE-capable network adapter cards on the hosts, together with drivers that support RoCE; an Ethernet NIC without RoCE support is not enough. RoCE drivers are available for Red Hat and other Linux distributions, Microsoft Windows, and other common operating systems. RoCE has to be supported in two places. On the network side, choose a switch whose operating system supports PFC (priority flow control). On the rack server or host side, you need a RoCE-capable network adapter card, such as ConnectX-3 Pro or ConnectX-4 and above.
Figure 3: RoCE
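On a Linux host, one way to check whether an installed adapter and its driver expose RoCE is to read the GID type entries that the kernel RDMA subsystem publishes in sysfs. The sketch below assumes the layout /sys/class/infiniband/<device>/ports/<port>/gid_attrs/types/<index>, where each readable entry holds a string such as "IB/RoCE v1" or "RoCE v2". The function name roce_gid_types is made up for this example, and the exact layout may differ between kernel and driver versions, so treat it as a starting point rather than a definitive check.

```python
from pathlib import Path


def roce_gid_types(sysfs_root: str = "/sys/class/infiniband") -> dict:
    """Map "device/port" to the set of GID type strings found for that port.

    Assumes the sysfs layout described in the text; returns an empty dict
    when no RDMA devices are present.
    """
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result
    for types_dir in sorted(root.glob("*/ports/*/gid_attrs/types")):
        # Path ends with <device>/ports/<port>/gid_attrs/types
        device = types_dir.parts[-5]
        port = types_dir.parts[-3]
        found = set()
        for entry in types_dir.iterdir():
            try:
                text = entry.read_text().strip()
            except OSError:
                continue  # unpopulated GID slots may not be readable
            if text:
                found.add(text)
        result[f"{device}/{port}"] = found
    return result


if __name__ == "__main__":
    ports = roce_gid_types()
    if not ports:
        print("No RDMA devices found")
    for name, types in ports.items():
        print(name, sorted(types))
```

A port that lists "RoCE v2" can use the routable encapsulation; a port that lists only "IB/RoCE v1" is limited to a single broadcast domain.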
Benefits of RoCE
Figure 4: Before Vs. After RoCE
FAQs About RoCE
Here are some frequently asked questions about RoCE to help you understand it better.
1. Which FS switches or network cards/adapters support RoCE?
Up to now, FS N series switches with Cumulus OS and the S58/80 series all support RoCE v1 and v2. Customers need to enable the PFC function after buying an RDMA switch. As for adapters and cards, FS does not yet offer RoCE-capable models.
2. What’s the difference between RoCE and iWARP?
Like RoCE, iWARP (Internet Wide Area RDMA Protocol) also supports RDMA with low latency. However, RoCE is the only industry-standard Ethernet-based RDMA solution with a multi-vendor ecosystem delivering network adapters and operating over standard Layer 2 and Layer 3 Ethernet switches, while iWARP has seen only minimal support. iWARP uses a complex mix of layers, including DDP (Direct Data Placement), a tweak known as MPA (Marker PDU Aligned framing), and a separate RDMA protocol (RDMAP), to deliver RDMA services over TCP/IP. With such a complex architecture, it is hard to fit iWARP into existing software transport frameworks, and its throughput, latency, and CPU utilization suffer as a result.
Figure 5: iWARP's Complex Network Layers Vs. RoCE’s Simpler Model
3. Can RoCE adapters communicate with other adapter types, like iWARP?
RoCE adapters can only communicate with other RoCE adapters. Any configuration that attempts to mix adapter types, say RoCE adapters combined with iWARP adapters, will probably revert to traditional TCP/IP connections.
Running RDMA in data centers offloads data movement from the CPU and makes more CPU resources available to applications. Adopters of RoCE can benefit from RDMA’s capabilities without changing their network infrastructure. By reducing Ethernet network latency and offloading CPU overhead, RoCE increases performance in search, storage, database, financial, and high-transaction-rate applications. By increasing CPU efficiency and improving application performance, RoCE can reduce the number of servers needed, thereby saving energy and reducing the footprint of Ethernet-based data centers.
Copyright © 2002-2019. All Rights Reserved.