Revolutionizing Data Center Networks: 800G Optical Modules and NDR Switches
With advancements in technologies such as large AI models, cloud computing, and big data analytics, data centers are undergoing a period of explosive growth. The burgeoning need to train and deploy large models is presenting new challenges to the supporting computing, storage, and networking infrastructure. Sophisticated deep learning models like GPT-4, the intensive workloads managed on cloud platforms, and the demands of large-scale data analysis and high-performance computing all necessitate robust data center networks capable of delivering swift, reliable data transmission.
Building high-speed networks in data centers involves multiple key components, including high-rate network cards, optical modules, switches, and high-performance network interconnect technologies. In this complex network ecosystem, InfiniBand (IB) network technology has emerged as the market leader, becoming a crucial means of achieving high-speed data transfer and low-latency communication.
NDR (400G) devices within InfiniBand network technology have been widely deployed, establishing them as the preferred option for high-speed data center networks serving large models and high-performance computing workloads. On the switch front, NVIDIA's QM9700 and QM9790 series are the leading equipment. Built on the NVIDIA Quantum-2 architecture, these switches deliver an exceptional 64 NDR 400Gb/s InfiniBand ports within a standard 1U chassis. This translates to a single switch providing a total bidirectional bandwidth of 51.2 terabits per second (Tb/s), along with a handling capacity exceeding 66.5 billion packets per second (BPPS).
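The headline figures above follow from simple arithmetic; a quick sanity check (a sketch for illustration, not vendor tooling):

```python
# Sanity-check the QM9700/QM9790 headline bandwidth figure.
PORTS = 64             # NDR InfiniBand ports per 1U switch
PORT_RATE_GBPS = 400   # per-port line rate, one direction

# Bidirectional aggregate: each port carries 400G in and 400G out.
bidir_tbps = PORTS * PORT_RATE_GBPS * 2 / 1000
print(bidir_tbps)  # 51.2 (Tb/s), matching the quoted figure
```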
The NVIDIA Quantum-2 InfiniBand switches go beyond NDR high-speed data transfer, incorporating high throughput, in-network computing, advanced intelligent acceleration features, flexibility, and sturdy construction. These attributes make them an ideal choice for sectors involving high-performance computing (HPC), artificial intelligence, and large-scale cloud infrastructures. Additionally, the adoption of NDR switches helps minimize overall cost and complexity, propelling the progression and evolution of data center network technologies.
The Difference Between the QM9700 and QM9790
As with previous generations of IB switches, among the NDR switches the QM9700 is a managed switch, while the QM9790 is an unmanaged switch. The functional difference is that the managed switch runs a Network Operating System (NOS), much like a regular Ethernet switch: it can be accessed and configured directly through a dedicated management port and provides subnet manager functionality (enabled as needed). The unmanaged switch, by contrast, has no CPU at the hardware level and does not run a NOS; configuration is done through a remote configuration tool called mlxconfig. Below are the images depicting the QM9700 (with a management interface on the far right) and the QM9790:
There are also operational differences between the two. The QM9700, being a managed switch, allows direct login for configuration management. Port and module information can be queried using commands, as shown in the examples below:
Querying port information: show interface ib 1/1/1 (using port 1/1/1 as an example).
Querying port module information: show interface ib 1/1/1 transceiver.
Querying port module DDM (Digital Diagnostic Monitoring): show interface ib 1/1/1 transceiver diagnostics.
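Taken together, a typical interactive session on the managed QM9700 might look like the following sketch (the prompt is illustrative; exact output shape depends on the NOS version):

```
switch# show interface ib 1/1/1                          # port state and counters
switch# show interface ib 1/1/1 transceiver              # module vendor/type info
switch# show interface ib 1/1/1 transceiver diagnostics  # DDM: temp, voltage, power
```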
For the unmanaged QM9790, configuration management is done by logging into the connected server (or another managed switch). The following steps outline the process:
Enter the "fae" mode.
Enter "ibswitches" to obtain the lid (using lid-1 as an example) of the connected device.
Query module information: mlxlink -d lid-1 -p 1 -m (query module information for port 1).
Enable/disable port splitting: mlxconfig -d lid-1 set SPLIT_MODE=1 (0 to disable).
Enable/disable splitting functionality for a specific port: mlxconfig -d lid-1 set SPLIT_PORT[1..32]=1 (0 to disable).
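Putting these steps together, a session against the unmanaged QM9790 might look like this sketch (the lid-1 handle is the example from above; exact syntax can vary with the firmware tools version):

```
fae                                          # enter engineering mode on the managed peer
ibswitches                                   # list fabric switches and their LIDs
mlxlink   -d lid-1 -p 1 -m                   # module information for port 1
mlxconfig -d lid-1 set SPLIT_MODE=1          # enable port splitting (0 to disable)
mlxconfig -d lid-1 set SPLIT_PORT[1..32]=1   # enable splitting on ports 1-32
```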
Switch-Side Module: OSFP 800G Optical Transceiver
Due to size and power constraints, the QM9700/9790 series switches are limited to 32 OSFP cages. Each physical OSFP interface actually provides two independent 400G interfaces, referred to by NVIDIA as twin-port 400G. To complement these switches, FS has introduced the OSFP-800G module.
The OSFP-800G SR8 module is designed for 800Gb/s 2xNDR InfiniBand links of up to 30m over OM3 or 50m over OM4 multimode fiber (MMF), operating at an 850nm wavelength via dual MTP/MPO-12 connectors. The dual-port design is a key innovation: two internal transceiver engines fully unleash the potential of the switch, allowing the 32 physical interfaces to provide up to 64 400G NDR interfaces. This high-density, high-bandwidth design enables data centers to meet the growing network demands of applications such as high-performance computing, artificial intelligence, and cloud infrastructure.
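The module and switch fan-out arithmetic can be checked directly; the per-lane breakdown below assumes the usual NDR signaling of 100Gb/s per optical lane:

```python
# OSFP 800G SR8: 8 optical lanes at 100 Gb/s, split into two
# independent 4-lane 400G NDR ports (one per MTP/MPO-12 connector).
LANES = 8
LANE_RATE_GBPS = 100     # assumed 100G-per-lane NDR signaling
PORTS_PER_MODULE = 2

module_gbps = LANES * LANE_RATE_GBPS
per_port_gbps = module_gbps // PORTS_PER_MODULE
print(module_gbps, per_port_gbps)  # 800 400

# Switch-level fan-out: 32 twin-port OSFP cages yield 64 NDR interfaces.
CAGES = 32
ndr_ports = CAGES * PORTS_PER_MODULE
print(ndr_ports)  # 64
```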
FS's OSFP-800G SR8 module delivers superior performance and dependability, providing strong optical interconnect options for data centers. It empowers data centers to harness the full performance of the QM9700/9790 switch series, supporting data transmission with both high bandwidth and low latency.