
QSFP-DD 400G SR4 Optical Module: The New Choice for High-performance Networks

Posted on May 31, 2024

As AI supercomputing grows, data centers are evolving quickly. They face increasing demands not only for larger scale but also for faster data transmission, and 400G/800G transmission rates are reshaping data center infrastructure as a result. This article introduces a new 400G product, the 400G QSFP-DD SR4 optical module, and describes where it fits in contemporary data centers.

400G SR4 Optical Module: The Need for Speed and Efficiency

As large-scale data models and cloud computing continue to evolve, the need for efficient data transmission is critical. Traditional network solutions often fall short, struggling with latency and power consumption issues. This is where the 400G SR4 module steps in, offering a solution that meets the rigorous demands of modern network infrastructures.

The Emergence of Distributed Training Technology

The evolution of technologies like large-scale models, cloud computing, and big data has emphasized the need for robust network connectivity. Training models with immense computational requirements calls for distributed training: instead of training a model on a single machine, the workload is divided across multiple computing resources, such as GPUs, TPUs, or even entire servers, and processed in parallel, dramatically reducing the time required to train large models. Because the machines and accelerator cards must exchange data continuously, the performance of such a cluster depends heavily on the efficiency of the interconnecting network.

Distributed training methods create clusters with significant computational and storage power, where the network's performance directly affects communication efficiency between nodes. This impacts the overall throughput and performance of the cluster. Consequently, low-latency, high-speed networks are essential for data centers.
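To make the link between network speed and cluster throughput concrete, here is a minimal back-of-the-envelope sketch. It assumes a ring all-reduce (a common gradient-synchronization pattern that the article itself does not name), ignores latency and protocol overhead, and uses a hypothetical 10 GB gradient payload across 8 nodes. The function name and parameters are illustrative only.

```python
def ring_allreduce_seconds(payload_bytes: float, nodes: int, link_gbps: float) -> float:
    """Bandwidth-only estimate for one ring all-reduce (latency ignored)."""
    if nodes < 2:
        return 0.0
    # In a ring all-reduce, each node sends 2 * (N - 1) / N times the payload.
    bytes_on_wire = 2 * (nodes - 1) / nodes * payload_bytes
    # Convert bytes to bits and divide by the per-node link rate in bit/s.
    return bytes_on_wire * 8 / (link_gbps * 1e9)


# Hypothetical example: 10 GB of gradients synchronized across 8 nodes.
GRADIENT_BYTES = 10e9
for gbps in (100, 400):
    seconds = ring_allreduce_seconds(GRADIENT_BYTES, nodes=8, link_gbps=gbps)
    print(f"{gbps}G link: {seconds:.2f} s per all-reduce")
```

Under these assumptions the estimate drops from about 1.40 s on a 100G link to about 0.35 s on a 400G link. Real clusters overlap communication with computation, so this is only a rough bound on the communication step.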

Remote Direct Memory Access (RDMA) technology is crucial for reducing communication latency between machines and accelerator cards. RDMA bypasses the operating system kernel, allowing direct memory access between hosts and significantly reducing latency compared to traditional TCP/IP networks. RDMA is mainly implemented through InfiniBand and RoCE (RDMA over Converged Ethernet). InfiniBand offers superior performance and maturity but at higher cost, and is typically supplied by vendors like NVIDIA. RoCE, by contrast, has a more diverse and open ecosystem, with multiple vendors contributing to a wide range of device options.

For more information about the differences between RDMA and TCP/IP, please read this article: A Quick Look at the Differences: RDMA vs TCP/IP.

Current 400G Network Architecture and Its Problem

The high-speed network in a GPU cluster primarily consists of network cards, optical interconnects (transceivers and cables), and high-speed switches. Servers with H800 GPUs typically use NVIDIA CX7 400G adapter cards. Switch options include 64x400G QSFP-DD switches with Broadcom 25.6T Tomahawk4 chips, 64x800G switches with Broadcom 51.2T chips, Cisco Nexus-based switches, and 400G/800G switches with NVIDIA Spectrum chips. Currently, 400G QSFP-DD switches are the most popular. The following figure illustrates the commonly used network architecture in clusters:

400G Network Architecture

Note that the mainstream 400G switches on the market use 56G SerDes, whereas the CX7 network card uses 112G SerDes. The two ends therefore present different electrical lane layouts, and the optical transceivers on each side must be chosen with this mismatch in mind so that the link comes up and works properly. As illustrated in the picture:

112G SerDes
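The lane arithmetic behind this mismatch can be sketched in a few lines. The example below is illustrative only: the class and function names are hypothetical, and the lane counts use the 8x50G PAM4 (QSFP-DD switch port) and 4x100G PAM4 (OSFP network card) figures quoted later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElectricalInterface:
    """Hypothetical description of a host-side electrical interface."""
    name: str
    lanes: int
    gbps_per_lane: int  # payload rate per lane, not the raw SerDes rate

    @property
    def total_gbps(self) -> int:
        return self.lanes * self.gbps_per_lane


def compare(a: ElectricalInterface, b: ElectricalInterface) -> str:
    """Classify how two interfaces relate in rate and lane layout."""
    if a.total_gbps != b.total_gbps:
        return "aggregate rates differ"
    if a.lanes == b.lanes:
        return "same lane layout"
    return "same aggregate rate, different lane layout"


switch_port = ElectricalInterface("QSFP-DD switch port (56G SerDes)", 8, 50)
nic_port = ElectricalInterface("CX7 adapter port (112G SerDes)", 4, 100)

print(switch_port.total_gbps, nic_port.total_gbps)  # 400 400
print(compare(switch_port, nic_port))
```

Both ends carry 400G in aggregate, but the lane layouts differ, which is why the module on each side has to match its own host port while presenting a compatible optical interface to the other end.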

To address this issue, the current approach is to use 400G QSFP-DD DR4 optical modules on the switch side and 400G OSFP DR4 optical modules on the server side. The connection setup is illustrated below:

400G OSFP DR4 - 400G QSFP-DD DR4

QSFP-DD 400G SR4 Optical Module: The New Choice for High-performance Networks

400G DR4 is an effective solution for connection distances of up to 500m. However, for distances not exceeding 100m, the DR4 solution can be unnecessarily expensive. In such cases, the short-distance 400G QSFP-DD SR4, which uses low-cost 850nm VCSELs, is a more economical choice. It is significantly cheaper than the DR4 and meets the transmission distance requirement of 100m. The 400G QSFP-DD SR4 module plugs into a 400G QSFP-DD 8x50G PAM4 port and interoperates with a 400G OSFP SR4 module installed in a 400G OSFP (4x100G PAM4) network card, so the switch and the server can be connected directly over multi-mode fiber. As depicted:

Connectivity Solution

FS QSFP-DD 400G SR4 Optical Module Introduction

FS’s newly launched 400G QSFP-DD SR4 module is designed for better performance and efficiency. The electrical port uses 8x50G PAM4 modulation, while the optical port uses 4x100G PAM4 modulation. This configuration lets the SR4 halve the number of lasers, from eight to four, compared to the 400G QSFP-DD SR8 model. As a result, it comes in at a lower price point and draws less power, with consumption below 8W.

Built around a MaxLinear chip and equipped with an MPO-12 optical port, the 400G QSFP-DD SR4 module supports a transmission distance of up to 60 meters over OM3 multi-mode fiber and up to 100 meters over OM4 multi-mode fiber. This makes it an efficient and practical solution for short-distance data transmission within data centers.
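The reach figures quoted in this article (SR4: 60m on OM3, 100m on OM4; DR4: up to 500m) can be turned into a simple selection helper. This is a minimal sketch, not an FS tool: the function name, its parameters, and the returned labels are hypothetical, and it considers reach only, not cost, power, or port type.

```python
from typing import Optional

# Reach figures as quoted in the article, in meters.
SR4_REACH_M = {"OM3": 60, "OM4": 100}
DR4_REACH_M = 500


def suggest_module(distance_m: float, multimode_grade: Optional[str] = None) -> str:
    """Return a hypothetical module suggestion for a given link length.

    multimode_grade is "OM3" or "OM4" when multi-mode fiber is available,
    or None when it is not.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    if multimode_grade is not None:
        if multimode_grade not in SR4_REACH_M:
            raise ValueError(f"unknown fiber grade: {multimode_grade}")
        if distance_m <= SR4_REACH_M[multimode_grade]:
            return "400G SR4"
    # Beyond SR4 reach the link falls back to DR4, which runs over
    # single-mode fiber rather than the multi-mode fiber used by SR4.
    if distance_m <= DR4_REACH_M:
        return "400G DR4"
    return "beyond SR4/DR4 reach"


print(suggest_module(80, "OM4"))  # 400G SR4
print(suggest_module(80, "OM3"))  # 400G DR4
print(suggest_module(300))        # 400G DR4
```

An 80m run illustrates why the fiber grade matters: it is within SR4 reach on OM4 but not on OM3, where the helper falls back to DR4.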

QDD-SR4-400G

Category | Model | Description
InfiniBand | QDD-SR4-400G | NVIDIA Ethernet Compatible QSFP-DD 400GBASE-SR4 PAM4 850nm 100m DOM MPO-12/APC MMF Optical Transceiver Module, Support 2 x 200G-SR and 4 x 100G-SR
Ethernet | QDD-SR4-400G | Cisco Compatible QSFP-DD 400GBASE-SR4 PAM4 850nm 100m DOM MPO-12/APC MMF Optical Transceiver Module, Support 2 x 200G-SR and 4 x 100G-SR

The introduction of FS’s 400G QSFP-DD SR4 module effectively fills a critical gap in short-distance 400G data center interconnection solutions. It offers a more comprehensive and versatile choice for modern data centers, enabling them to meet the growing demand for higher bandwidth and efficient data handling capabilities. With this addition, FS continues to lead the way in providing innovative and reliable networking solutions tailored to the needs of today’s data-driven world.
