


Analyzing GPU to Optical Module Ratios and Demand in HPC Networks

Posted on Dec 22, 2023 by


Several methods for calculating the ratio of optical modules to GPUs circulate in the market, and they produce inconsistent results. The main reason is that the number of optical modules varies with the network architecture. The exact quantity required depends on several key factors: the network card model, the switch model, and the number of scalable units.

Network Card Model

Two network cards are mainly involved: ConnectX-6 (200Gb/s, mainly used with A100) and ConnectX-7 (400Gb/s, mainly used with H100).


The next-generation ConnectX-8 (800Gb/s) is expected to be released in 2024.

Switch Model

Two types of switches are mainly involved. The first is the QM9700 (32 OSFP ports at 2x400Gb/s each, for a total of 64 channels of 400Gb/s and a total throughput of 51.2Tb/s).


The second is the QM8700 (40 QSFP56 ports, for a total of 40 channels of 200Gb/s and a total throughput of 16Tb/s).
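As a quick sanity check on the switch figures, the sketch below multiplies the channel count by the per-channel rate. It assumes the quoted total throughput counts both directions, which the text does not state explicitly; the function name is illustrative only.

```python
def aggregate_throughput_tbps(channels: int, gbps_per_channel: int) -> float:
    """Total throughput in Tb/s, assuming both directions are counted."""
    unidirectional_gbps = channels * gbps_per_channel
    return 2 * unidirectional_gbps / 1000


qm9700_tbps = aggregate_throughput_tbps(64, 400)  # 64 channels of 400Gb/s
qm8700_tbps = aggregate_throughput_tbps(40, 200)  # 40 channels of 200Gb/s
print(qm9700_tbps, qm8700_tbps)
```

Under that assumption, the results line up with the 51.2Tb/s and 16Tb/s figures quoted for the two switches.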


Number of Units (Scalable Unit)

The number of units dictates the configuration of the switch network's architecture. A two-tier structure is adopted for smaller quantities, whereas a three-tier architecture is implemented to accommodate larger quantities.

H100 SuperPOD: Each unit consists of 32 nodes (DGX H100 servers) and supports a maximum of 4 units to form a cluster, using a two-layer switching architecture.

A100 SuperPOD: Each unit consists of 20 nodes (DGX A100 servers) and supports a maximum of 7 units to form a cluster. If the number of units exceeds 5, a three-layer switching architecture is required.
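The sizing rules above can be sketched in a few lines. This is only an illustration under the figures stated in the text (8 GPUs per node, the per-unit node counts, and the unit limits); the names `SUPERPOD`, `cluster_gpus`, and `a100_switch_layers` are hypothetical.

```python
GPUS_PER_NODE = 8  # each DGX server carries 8 GPUs

# (nodes per unit, maximum number of units), as stated in the text
SUPERPOD = {"H100": (32, 4), "A100": (20, 7)}


def cluster_gpus(gpu: str, units: int) -> int:
    """Number of GPUs in a cluster built from the given number of units."""
    nodes_per_unit, max_units = SUPERPOD[gpu]
    if not 1 <= units <= max_units:
        raise ValueError(f"{gpu} SuperPOD supports 1 to {max_units} units")
    return units * nodes_per_unit * GPUS_PER_NODE


def a100_switch_layers(units: int) -> int:
    """Three switching layers once an A100 cluster exceeds 5 units, else two."""
    return 3 if units > 5 else 2


print(cluster_gpus("H100", 4), cluster_gpus("A100", 7), a100_switch_layers(7))
```

A full H100 SuperPOD therefore holds 1,024 GPUs and a full A100 SuperPOD 1,120, which are the GPU counts used in the ratio calculations later on.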


Optical Module Demand under Four Network Configurations

A100+ConnectX6+QM8700 Three-layer Network: Ratio 1:6, all using 200G optical modules.
A100+ConnectX6+QM9700 Two-layer Network: 1:0.75 of 800G optical modules + 1:1 of 200G optical modules.
H100+ConnectX7+QM9700 Two-layer Network: 1:1.5 of 800G optical modules + 1:1 of 400G optical modules.
H100+ConnectX8 (yet to be released)+QM9700 Three-layer Network: Ratio 1:6, all using 800G optical modules.

Optical Transceivers incremental market:

Assuming shipments of 300,000 H100 and 900,000 A100 units in 2023, total demand would be 3.15 million 200G, 300,000 400G, and 787,500 800G optical modules. This would represent a significant incremental market in HPC, estimated at $1.38 billion.

In the case of 1.5 million units of H100 and 1.5 million units of A100 being shipped in 2024, the total demand would be 750,000 units of 200G, 750,000 units of 400G, and 6.75 million units of 800G optical modules. This would result in a remarkable growth in the HPC market, estimated at $4.97 billion, which is approximately equal to the combined market size of the optical module industry in 2021.

Below is a detailed breakdown of the calculations for each scenario:

The First Case: A100+ConnectX6+QM8700 Three-Layer Network

Each DGX A100 server has eight compute network interfaces, four on the left and four on the right, as depicted in the diagram. At present, most A100 shipments are paired with ConnectX-6 for external communication, at connection speeds of up to 200Gb/s.


In the first layer, each node has 8 interfaces (ports) and connects to 8 leaf switches. Every 20 nodes form one scalable unit (SU). With SU denoting the number of units, the first layer therefore requires 8xSU leaf switches, 8xSUx20 cables, and 2x8xSUx20 200G optical modules (one at each end of every cable).


In the second layer, the design is non-blocking, so the upstream speed equals the downstream speed. In the first layer, the total unidirectional transmission speed is 200G multiplied by the number of cables. Since the second layer also runs at 200G per cable, it needs the same number of cables as the first layer: 8xSUx20 cables and 2x8xSUx20 200G optical modules. The number of spine switches is the number of cables divided by the number of leaf switches, that is, (8xSUx20) / (8xSU) = 20. However, when there are few leaf switches, multiple links can be run between each leaf and spine switch to save on spine switches (as long as the limit of 40 interfaces per switch is not exceeded). Therefore, for 1/2/4/5 units, the required number of spine switches is 4/10/20/20, and the required number of optical modules is 320/640/1280/1600. The number of spine switches does not grow proportionally, but the number of optical modules does.
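The per-layer cable and module counts follow directly from the 8xSUx20 expression. A minimal sketch, assuming two optical modules per cable (one at each end) as in the text; the constant and function names are illustrative.

```python
NODES_PER_SU = 20    # DGX A100 servers per scalable unit
PORTS_PER_NODE = 8   # 200G compute interfaces per server


def layer_cables(su: int) -> int:
    """Cables in one non-blocking layer for the given number of units."""
    return PORTS_PER_NODE * su * NODES_PER_SU


def layer_modules(su: int) -> int:
    """200G optical modules in one layer: one at each end of every cable."""
    return 2 * layer_cables(su)


for su in (1, 2, 4, 5):
    print(su, layer_cables(su), layer_modules(su))
```

For 1, 2, 4, and 5 units this gives 320, 640, 1280, and 1600 modules per layer, matching the figures listed for the second layer.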

When the system expands to encompass seven units, the implementation of a third-layer architectural setup becomes necessary. Owing to its non-blocking configuration, the requisite number of cables in the third layer remains unchanged from that of the second layer.

NVIDIA's suggested blueprint for a SuperPOD entails the integration of networking across seven units, incorporating a third-layer architecture, and the adoption of core switches. The detailed graphic illustrates the varying quantities of switches across different layers and the associated cabling required for diverse unit counts.


For a setup of 140 servers, the total number of A100 GPUs involved is 1,120, achieved by multiplying 140 servers by eight. To support this configuration, 140 QM8790 switches are deployed, alongside 3,360 cables. In addition, the setup necessitates the use of 6,720 200G optical modules. The proportion of A100 GPUs to 200G optical modules stands at a 1:6 ratio, correlating 1,120 GPUs to 6,720 optical modules.
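The 1:6 ratio can be reproduced by summing the three non-blocking layers. The sketch below is an illustration under the assumptions already used in the text (equal cable counts in every layer, two modules per cable); the function name and parameters are hypothetical.

```python
def three_layer_totals(su: int, nodes_per_su: int = 20,
                       ports_per_node: int = 8, gpus_per_node: int = 8,
                       layers: int = 3) -> tuple[int, int, int]:
    """Return (GPUs, cables, 200G modules) for a non-blocking layered fabric."""
    cables_per_layer = ports_per_node * su * nodes_per_su
    cables = layers * cables_per_layer   # every layer carries the same cable count
    modules = 2 * cables                 # one module at each end of a cable
    gpus = su * nodes_per_su * gpus_per_node
    return gpus, cables, modules


gpus, cables, modules = three_layer_totals(su=7)
print(gpus, cables, modules, modules / gpus)
```

With seven units this yields 1,120 GPUs, 3,360 cables, and 6,720 modules, that is, six 200G modules per GPU.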

The Second Case: A100+ConnectX6+QM9700 Two-Layer Network

At present, this arrangement is not among the recommended configurations. Over time, however, a growing number of A100 deployments may connect through QM9700 switches. Such a shift would reduce the number of optical modules required but would create demand for 800G optical modules. The main difference lies in the first-layer connections: instead of eight separate 200G cables, QSFP-to-OSFP adapters are used, each carrying two connections, which enables 1-to-4 connectivity.


In the first layer: For a cluster with 7 units and 140 servers, there are 140x8 = 1,120 interfaces in total. This corresponds to 280 1-to-4 cables, resulting in demand for 280 800G and 1,120 200G optical modules. This requires 12 QM9700 switches.

In the second layer: Using only 800G connections, 280x2 = 560 800G optical modules are needed, along with 9 QM9700 switches.

Therefore, for 140 servers and 1,120 A100 GPUs, a total of 21 switches (12 + 9) is required, along with 840 800G optical modules and 1,120 200G optical modules.

The ratio of A100 GPUs to 800G optical modules is 1,120:840, which simplifies to 1:0.75. The ratio of A100 GPUs to 200G optical modules is 1:1.
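The arithmetic for this case can be laid out step by step. This is a sketch under the assumptions described above: one 800G module per 1-to-4 cable and one 200G module per server interface in the first layer, and 800G links with a module at each end in the second layer. The variable names are illustrative.

```python
servers = 140
ports_per_server = 8
gpus = servers * 8                        # 1,120 A100 GPUs

interfaces = servers * ports_per_server   # 1,120 200G server interfaces
breakout_cables = interfaces // 4         # each 1-to-4 cable serves four interfaces

layer1_800g = breakout_cables             # one 800G module per 1-to-4 cable
layer1_200g = interfaces                  # one 200G module per server interface
layer2_800g = 2 * breakout_cables         # 800G links, a module at each end

total_800g = layer1_800g + layer2_800g
print(total_800g, layer1_200g, total_800g / gpus, layer1_200g / gpus)
```

The totals come to 840 800G modules and 1,120 200G modules, or 0.75 and 1 modules per GPU respectively.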

The Third Case: H100+ConnectX7+QM9700 Two-Layer Network

A distinctive feature of the DGX H100 architecture is that each server, although it houses 8 GPUs, is fitted with 8 400G network cards that are combined into 4 800G interfaces. This creates considerable demand for 800G optical modules.

In the first layer, NVIDIA's recommended configuration is to connect one 800G optical module to each server interface. This is done with a twin-port connection using two optical cables (MPO), each of which is plugged into a separate switch.


Therefore, in the first layer, each unit consists of 32 servers, and each server is connected to 2x4=8 switches. In a SuperPOD with 4 units, a total of 4x8=32 leaf switches are required in the first layer.

NVIDIA recommends reserving one node for management purposes (UFM). Since the impact on optical module usage is limited, the calculation here is approximated using 4 units with a total of 128 servers.

In the first layer, a total of 4x128=512 units of 800G optical modules and 2x4x128=1024 units of 400G optical modules are needed.


In the second layer, the switches are directly connected using 800G optical modules. Each leaf switch has a downward unidirectional speed of 32x400G. To keep upstream and downstream speeds equal, the upward connection requires a unidirectional speed of 16x800G. This necessitates 16 spine switches, resulting in a total of 4x8x16x2=1024 800G optical modules.

In this architecture, the infrastructure demands a total of 1536 units of 800G optical modules and 1024 units of 400G optical modules. Factoring in the full composition of the SuperPOD, which includes 128 (4x32) servers equipped with 8 H100 GPUs each, there are 1024 H100 GPUs in total. The resulting ratio of GPUs to 800G optical modules is 1:1.5, translating to 1024 GPUs requiring 1536 optical modules. The ratio for GPUs to 400G optical modules is 1:1, with an equal count of 1024 GPUs to 1024 optical modules.
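The module counts for this case can be reproduced as follows. It is a sketch that simply follows the counting used in the text (one 800G and two 400G modules per server interface in the first layer, 16 uplinks of 800G per leaf switch with a module at each end in the second layer); the variable names are illustrative.

```python
units = 4
servers = units * 32                       # 128 DGX H100 servers
gpus = servers * 8                         # 1,024 H100 GPUs
interfaces_per_server = 4                  # four 800G interfaces per server

layer1_800g = interfaces_per_server * servers        # one 800G module per interface
layer1_400g = 2 * interfaces_per_server * servers    # two 400G modules per interface

leaf_switches = units * 8                  # 8 leaf switches per unit
uplinks_per_leaf = 16                      # 16 x 800G upward per leaf switch
layer2_800g = leaf_switches * uplinks_per_leaf * 2   # a module at each end

total_800g = layer1_800g + layer2_800g
print(total_800g, layer1_400g, total_800g / gpus, layer1_400g / gpus)
```

This gives 1,536 800G modules and 1,024 400G modules for 1,024 GPUs, that is, 1.5 and 1 modules per GPU.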

The Fourth Case: H100+ConnectX8 (not yet released)+QM9700 Three-Layer Network

In this hypothetical scenario, where the H100 network cards are upgraded to 800G, the external interfaces would need to be expanded from four to eight OSFP interfaces. The inter-layer connections would likewise use 800G optical modules. The basic network design is the same as in the first case, the only change being that 200G optical modules are replaced with 800G ones. Thus, in this network, the ratio of GPUs to optical modules remains 1:6, identical to the first case.

We organize the above four situations into the following table:

Projected shipments of 300,000 H100 GPUs and 900,000 A100 GPUs in 2023 would create a combined requirement for 3.15 million 200G optical modules, 300,000 400G optical modules, and 787,500 800G optical modules. Looking toward 2024, with expected deliveries of 1.5 million H100 GPUs and 1.5 million A100 GPUs, the demand would include 750,000 units of 200G optical modules, 750,000 units of 400G optical modules, and a substantial 6.75 million units of 800G optical modules.

For the A100 GPUs, connections are split evenly between 200G switches and 400G switches.
The H100 GPUs are equally divided in their connections to 400G switches and 800G switches.
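The projection is a weighted sum of the per-GPU ratios from the four cases. The sketch below is illustrative only: the `RATIOS` table restates the ratios derived earlier, and the 2023 allocation (A100 split evenly between its two configurations, all H100 on the two-layer QM9700 design) is an assumption chosen because it reproduces the 2023 totals quoted above. Other splits give different totals, and the 2024 figures are not recomputed here.

```python
from fractions import Fraction

# Optical modules per GPU as (200G, 400G, 800G), from the four cases
RATIOS = {
    "A100+QM8700 three-layer": (6, 0, 0),
    "A100+QM9700 two-layer": (1, 0, Fraction(3, 4)),
    "H100+QM9700 two-layer": (0, 1, Fraction(3, 2)),
    "H100+ConnectX8 three-layer": (0, 0, 6),
}


def module_demand(allocation: dict[str, int]) -> dict[str, int]:
    """Total module demand for GPUs allocated to each configuration."""
    totals = [Fraction(0)] * 3
    for config, gpu_count in allocation.items():
        for i, per_gpu in enumerate(RATIOS[config]):
            totals[i] += gpu_count * per_gpu
    return {speed: int(t) for speed, t in zip(("200G", "400G", "800G"), totals)}


# Assumed 2023 allocation (hypothetical split, see the note above)
demand_2023 = module_demand({
    "A100+QM8700 three-layer": 450_000,
    "A100+QM9700 two-layer": 450_000,
    "H100+QM9700 two-layer": 300_000,
})
print(demand_2023)
```

Changing the allocation dictionary shows how sensitive 800G demand is to the share of GPUs placed on three-layer 800G networks, since that configuration needs six modules per GPU.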

In Conclusion

As technology continues to advance, 400G multimode optical modules, AOCs, and DACs are emerging in the networking field. These high-speed solutions are expected to drive further development and provide robust support for the network demands of the digital era. FS produces a wide range of optical modules, from 1G to 800G. We warmly invite everyone to explore and purchase our products.