Revolutionizing Data Centers with 800G Optical Transceivers
What is Driving the Surge in 800G Optical Transceivers for Data Centers?
The rise of advanced technologies has sparked transformative changes across industries, notably in the optical transceiver market. AI models like ChatGPT, powered by deep learning techniques, are reshaping data center networks and driving the need for faster, more reliable, and higher-capacity optical transceivers. As these technologies continue to enhance communication and streamline data processing, they have become key drivers of the surge in 800G optical transceivers expected in 2024.
The operation of models such as ChatGPT demands significant cloud computing resources. For instance, GPT-3, with its 175 billion parameters, was trained on roughly 45 TB of text data and consumed approximately 3,640 PF-days of compute. Supporting ChatGPT's current user base alone is estimated to require a $3-4 billion investment in computing infrastructure, highlighting the critical role of high-performance optical transceivers in sustaining this technological wave.
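As a rough back-of-the-envelope check, the sketch below reproduces the ~3,640 PF-days figure using the common approximation that training compute ≈ 6 × parameters × tokens; the ~300 billion token count comes from the GPT-3 paper and is an assumption of this sketch, not a figure quoted above.

```python
# Rough sanity check of the ~3,640 PF-days figure for GPT-3 training.
# Assumes the common approximation: training FLOPs ~= 6 * parameters * tokens.
# The ~300 billion token count is an assumption of this sketch.

params = 175e9          # 175 billion parameters
tokens = 300e9          # ~300 billion training tokens (assumed)

total_flops = 6 * params * tokens             # ~3.15e23 FLOPs

pf_day = 1e15 * 86_400                        # one petaFLOP/s sustained for a day
print(f"Total training compute: {total_flops:.2e} FLOPs")
print(f"Equivalent PF-days:     {total_flops / pf_day:,.0f}")
# -> roughly 3,600+ PF-days, in line with the ~3,640 PF-days cited above.
```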
How HPC Reshapes Data Center Networks
The integration of HPC into data centers has redefined the landscape of data transmission. Traditional data centers, designed for conventional computing workloads, are undergoing a metamorphosis to meet the demands of HPC-driven applications. The key differentiator lies in the way data is processed and transmitted.
Traditional Data Centers vs. HPC Data Centers
In a traditional data center, data flows through a hierarchical network architecture, with each layer introducing latency and potential bottlenecks. Initially, data centers adopted the traditional three-tier model, comprising the access layer, aggregation layer, and core layer. The access layer linked computing nodes to cabinet switches, the aggregation layer facilitated interconnections between access layers, and the core layer managed connections between aggregation layers and external networks.
However, as the volume of east-west traffic within data centers grew rapidly, the core and aggregation layers of the three-tier architecture faced heavier workloads and higher performance requirements, driving equipment costs significantly upward. Consequently, a more streamlined leaf-spine network architecture tailored for east-west traffic emerged. In this architecture, leaf switches connect directly to compute nodes, while spine switches act as core switches and distribute traffic dynamically across multiple equal-cost paths using Equal-Cost Multipath (ECMP) routing.
The leaf-spine network architecture brings several advantages, including high bandwidth utilization, excellent scalability, predictable network latency, and enhanced security. These features make it widely applicable and advantageous for deployment in various data center scenarios.
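To make the ECMP behavior concrete, here is a minimal Python sketch of hash-based path selection over a hypothetical set of spine uplinks; real switch ASICs use vendor-specific hardware hash functions, so this is illustrative only.

```python
import hashlib

# Minimal sketch of ECMP path selection in a leaf-spine fabric.
# A leaf switch hashes each flow's five-tuple and uses the result to pick
# one of the equal-cost uplinks toward the spine layer.

SPINE_UPLINKS = ["spine-1", "spine-2", "spine-3", "spine-4"]  # hypothetical fabric

def ecmp_pick(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              proto: str = "tcp") -> str:
    """Deterministically map a flow's five-tuple to one spine uplink."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return SPINE_UPLINKS[digest % len(SPINE_UPLINKS)]

# All packets of one flow take the same path (avoiding reordering),
# while different flows spread across the spines.
print(ecmp_pick("10.0.1.5", "10.0.2.9", 49152, 443))
print(ecmp_pick("10.0.1.6", "10.0.2.9", 49153, 443))
```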
HPC data centers, on the other hand, leverage parallel processing, distributed computing, and high-speed interconnects to ensure seamless data flow and minimal latency. Because of the substantial internal data traffic, a non-blocking fat-tree network architecture has become essential. NVIDIA's HPC data centers employ a fat-tree network architecture to guarantee non-blocking operation.
The fundamental idea is to use a large number of low-performance switches to construct an extensive non-blocking network. This design ensures that, for any communication pattern, paths exist whose aggregate bandwidth matches the bandwidth of the network interface cards (NICs), and that all switches in the architecture are identical. The fat-tree architecture is widely used in data centers with demanding network requirements, particularly high-performance computing (HPC) data centers.
Take NVIDIA's DGX A100 SuperPOD HPC system as an example: all three switch tiers are built from NVIDIA Quantum QM8790 40-port switches. The first-tier switches connect to 1120 Mellanox HDR 200G InfiniBand NICs. The second-tier switches' downlink ports connect to the first tier, while their uplink ports connect to the third tier. The third-tier switches have only downlink ports, all of which connect to the second tier.
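As a rough illustration of why three switch tiers are required, the sketch below applies the standard k-ary fat-tree capacity formulas (k²/2 end points for two tiers, k³/4 for three) to 40-port switches and compares the result with the 1120 NICs above; the actual SuperPOD cabling plan differs in detail, so treat these numbers as approximations.

```python
# Why three switch tiers? A non-blocking fat tree built from identical
# radix-k switches supports at most k^2/2 end points with two tiers and
# k^3/4 end points with three tiers (standard k-ary fat-tree results).
# The real SuperPOD cabling differs in detail; this is an approximation.

RADIX = 40        # ports per NVIDIA Quantum QM8790 switch
NICS = 1120       # HDR 200G InfiniBand NICs on the compute side

two_tier_max = RADIX ** 2 // 2      # 800 end points
three_tier_max = RADIX ** 3 // 4    # 16,000 end points

print(f"Two-tier capacity:   {two_tier_max}")
print(f"Three-tier capacity: {three_tier_max}")
print(f"{NICS} NICs fit in two tiers?   {NICS <= two_tier_max}")
print(f"{NICS} NICs fit in three tiers? {NICS <= three_tier_max}")
# -> 1120 NICs exceed the 800-port limit of a two-tier design,
#    which is why the fabric uses three switch tiers.
```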
Furthermore, the storage network is kept separate from the compute network and requires its own switches and optical transceivers. As a result, HPC data centers use substantially more switches and optical transceivers than conventional data centers.
For more details, check The Rise of HPC Data Centers: FS Empowering Next-gen Data Centers.
800G Optical Transceivers Play a Pivotal Role
800G optical transceivers play a pivotal role in this transformation. On the optical side, a single 800G transceiver can replace two 400G transceivers. On the electrical side, eight SerDes lanes can be integrated to align with the eight 100G lanes on the optical side. This design increases the channel density of switches while markedly reducing their physical footprint.
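The lane arithmetic behind this can be sketched as follows; the breakout shown is illustrative, since the exact configurations supported depend on the specific module and switch.

```python
# Illustrative lane arithmetic for an 800G module: 8 electrical SerDes
# lanes at 100 Gb/s feed 8 x 100G optical lanes, and the same port can be
# logically split into two 400G interfaces (4 lanes each).

LANES = 8
LANE_RATE_GBPS = 100

aggregate = LANES * LANE_RATE_GBPS
print(f"Aggregate port rate: {aggregate} Gb/s")                 # 800 Gb/s

breakout_400g = aggregate // 400
print(f"Equivalent 400G interfaces per port: {breakout_400g}")  # 2

# Doubling the per-port rate at the same lane count raises front-panel
# bandwidth density without enlarging the switch faceplate.
```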
The optical transceiver rate is determined by the network card, and the network card speed is constrained by the PCIe link speed. In NVIDIA's DGX A100 servers, GPUs communicate internally over NVLink 3 with a unidirectional bandwidth of 300 GB/s. The A100 GPUs, however, connect to ConnectX-6 network cards via 16 PCIe 4.0 lanes, yielding a total bandwidth of roughly 200G per card. Consequently, a 200G optical transceiver or DAC cable is used to match the network card.
In DGX H100 servers, internal connections use NVLink 4 with a unidirectional bandwidth of 450 GB/s. The H100 GPUs connect to ConnectX-7 network cards through 16 PCIe 5.0 lanes, giving each network card a total bandwidth of around 400G. In both cases, the transceiver speed is ultimately bounded by the PCIe bandwidth between the GPU and the network card.
If the PCIe link speed in these servers were to reach roughly 800G per x16 slot (PCIe 6.0), it would become feasible to deploy 800G network cards together with 800G optical transceivers, which would significantly enhance the computational efficiency of the system.
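To see where the roughly 200G, 400G, and 800G figures come from, the sketch below estimates the usable unidirectional bandwidth of an x16 link for PCIe 4.0, 5.0, and 6.0 from per-lane transfer rates and line-encoding efficiency; protocol overheads beyond line encoding are ignored, so the numbers are approximate.

```python
# Approximate usable unidirectional bandwidth of a x16 PCIe link.
# Only line-encoding overhead is modelled; FLIT/TLP framing, flow control,
# and other protocol overheads are ignored, so figures are rough.

GENERATIONS = {
    # name: (per-lane transfer rate in GT/s, line-encoding efficiency)
    "PCIe 4.0": (16, 128 / 130),   # 128b/130b encoding
    "PCIe 5.0": (32, 128 / 130),   # 128b/130b encoding
    "PCIe 6.0": (64, 1.0),         # PAM4 + FLIT mode, no 128b/130b overhead
}

LANES = 16

for name, (rate_gt, eff) in GENERATIONS.items():
    gbps = rate_gt * eff * LANES
    print(f"{name}: ~{gbps:.0f} Gb/s per direction on x16")

# -> ~252 Gb/s for PCIe 4.0 (pairs with a 200G NIC/transceiver),
#    ~504 Gb/s for PCIe 5.0 (pairs with a 400G NIC/transceiver),
#    ~1024 Gb/s raw for PCIe 6.0, enough to feed an 800G NIC.
```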
2024 — The Year of 800G Optical Transceivers
Looking ahead, 2024 is poised to be a significant year for the optical transceiver market, with the spotlight on 800G solutions. In 2019, when the market transitioned to 100G optical transceivers, two upgrade paths were available: 200G and 400G. The next generation of high-speed optical transceivers, however, is geared exclusively toward 800G. Combined with escalating computational demands and competition driven by AI and AI-generated content (AIGC) workloads, major cloud providers and technology giants in North America are expected to make substantial purchases of 800G optical transceivers in 2024.
Amidst this transformative landscape, having a reliable and innovative partner becomes crucial. FS, a provider of networking solutions, offers a complete 800G portfolio designed for ultra-large-scale cloud data centers worldwide. In 2023, we unveiled a new series of 800G NDR InfiniBand solutions. Our product range covers both 800G OSFP and 800G QSFP-DD transceiver types, and extends to 800G AOCs and DACs, broadening our support for customers across industries and ensuring a continuous supply of reliable, high-quality optical network products and solutions.
Conclusion
In conclusion, the confluence of HPC advancements and the optical transceiver market heralds a new era of high-speed and efficient data transmission. The transformative impact of HPC on data center networks underscores the pivotal role of optical transceivers. As we anticipate 2024, the year of 800G optical transceivers, businesses can always rely on FS to navigate the complexities of the HPC era and build resilient, high-performance networks that pave the way for a future of limitless possibilities. Discover how FS can help you build a future-proof HPC data center with our advanced 800G optical transceivers. Learn more today!