
Inquiries and Answers about InfiniBand Technology

Posted on Dec 26, 2023

The demand for high-performance computing is constantly rising with the advancements in big data and artificial intelligence technologies. To meet this demand, the NVIDIA Quantum-2 InfiniBand platform offers users exceptional distributed computing performance, enabling high-speed, low-latency data transmission and processing.


Below are common questions and answers about InfiniBand (IB) technology.

Q: Is the CX7 NDR200 QSFP112 port compatible with HDR/EDR cables?

A: Yes, it is compatible.

Q: How can the CX7 NDR network card be connected to the Quantum-2 QM97XX series switch?

A: The CX7 NDR network card uses NVIDIA's 400GBASE-SR4 or 400GBASE-DR4 optical modules, while the QM97XX series switch uses 800GBASE-SR8 (equivalent to 2x400GBASE-SR4) or 800GBASE-DR8 (equivalent to 2x400GBASE-DR4) optical modules. The modules are connected with a 12-fiber, universal-polarity patch cord with APC end faces (multimode fiber for the SR variants, single-mode fiber for the DR variants).

Q: Can the CX7 Dual-port 400G achieve 800G through bonding? Why can 200G achieve 400G through bonding?

A: Overall network performance is determined by factors such as the PCIe bandwidth ceiling, the network card's processing capacity, and the physical port bandwidth. The CX7 network card has a PCIe 5.0 x16 host interface, with a theoretical bandwidth limit of about 512Gbps. Two 400G ports would require roughly 800Gbps, which exceeds that limit, so a dual-port 400G CX7 is not offered. Two bonded 200G ports need only about 400Gbps, which fits within the PCIe 5.0 x16 budget, which is why bonding 2x200G to 400G works.
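
The arithmetic behind this answer can be sketched in a few lines. The snippet below is only an illustration: it uses the nominal per-lane rates for PCIe Gen4/Gen5 and the 128b/130b line-coding factor, and ignores other protocol overheads.

```python
# Rough PCIe bandwidth estimate (a sketch; nominal per-lane rates, not measured throughput).

PCIE_GT_PER_LANE = {4: 16, 5: 32}   # GT/s per lane for Gen4 and Gen5
ENCODING_EFFICIENCY = 128 / 130      # Gen4/Gen5 use 128b/130b line coding

def pcie_bandwidth_gbps(gen: int, lanes: int = 16) -> float:
    """Approximate one-direction PCIe bandwidth in Gbps."""
    return PCIE_GT_PER_LANE[gen] * lanes * ENCODING_EFFICIENCY

if __name__ == "__main__":
    gen5_x16 = pcie_bandwidth_gbps(5)   # ~504 Gbps usable, ~512 Gbps raw
    print(f"PCIe 5.0 x16: ~{gen5_x16:.0f} Gbps")
    print(f"PCIe 4.0 x16: ~{pcie_bandwidth_gbps(4):.0f} Gbps")
    # Two bonded 400G ports would need ~800 Gbps, beyond the Gen5 x16 ceiling,
    # while two bonded 200G ports (~400 Gbps) fit below it.
    print("2 x 400G =", 2 * 400, "Gbps ->", "exceeds" if 2 * 400 > gen5_x16 else "fits", "PCIe 5.0 x16")
    print("2 x 200G =", 2 * 200, "Gbps ->", "exceeds" if 2 * 200 > gen5_x16 else "fits", "PCIe 5.0 x16")
```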

Q: How to connect a one-to-two cable?

A: To achieve optimal performance, a one-to-two cable (800G to 2x400G) should be connected to two different servers, so that both branches do not terminate on network cards in the same server; GPU servers typically have multiple network cards.

Q: How are one-to-two cables connected in InfiniBand NDR scenarios?

A: In InfiniBand NDR scenarios, there are two types of one-to-two cables. The first type uses optical modules with one-to-two patch cords (400G split into 2x200G), such as MMS4X00-NS400 + MFP7E20-NXXX + MMS4X00-NS400 (downgraded for 200G use). The second type utilizes one-to-two DAC copper cables (800G split into 2x400G), such as MCP7Y00-NXXX or MCP7Y10-NXXX.

Q: In a Superpod network, can four NDR200 cards on each server be directly connected to the same switch using a 1x4 cable, or should two 1x2 cables be used to connect to different switches?

A: It is not recommended to connect the four NDR200 ports on each server to the same switch using a one-to-four cable in a Superpod network. This connection method does not comply with the Superpod network rules. To ensure optimal performance of NCCL/SHARP, the leaf switches should use one-to-four cables to connect the NDR200 ports of different servers in a specific pattern.

Q: Concerning the latest Superpod network, the Superpod Network White Paper calls for configuring two IB switches with UFM software separately in the computing network. However, this configuration leaves my cluster with one fewer GPU node. If I choose not to set up a separate UFM switch and instead deploy UFM software solely on the management node, can I manage the cluster through a separate storage network without affecting the computing network?

A: It is recommended to configure UFM equipment, including software. Deploying UFM software on the management node within the computing network is an alternative solution, but it should not bear the GPU computing workload. The storage network operates independently as a distinct network plane, and it cannot be used for managing the computing cluster.

Q: What are the distinctions between UFM Enterprise, SDN, Telemetry, and Cyber-AI? Is it necessary to purchase UFM?

A: For simple management and monitoring, it is possible to use OpenSM and the command-line tools included in OFED, but they lack UFM's user-friendly graphical interface and many of its functions.
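
As a rough illustration of the kind of command-line monitoring OFED enables, the sketch below polls two standard infiniband-diags tools, ibstat and perfquery, from Python. The polling interval and the counter names being filtered are assumptions for illustration only; adjust them to your environment.

```python
# Minimal polling sketch using the infiniband-diags tools shipped with OFED.
# Assumes `ibstat` and `perfquery` are installed and the user has permission to run them.
import subprocess
import time

def run(cmd: list[str]) -> str:
    """Run a command and return its stdout as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def poll_port_counters(interval_s: int = 10) -> None:
    """Print link state and a few error counters in a loop."""
    while True:
        state = run(["ibstat"])        # overall HCA/port state
        counters = run(["perfquery"])  # per-port performance counters
        for line in counters.splitlines():
            # Surface only counters that usually indicate trouble.
            if any(k in line for k in ("SymbolError", "LinkDowned", "RcvErrors")):
                print(line.strip())
        if "State: Active" not in state:
            print("WARNING: no active IB port found")
        time.sleep(interval_s)

if __name__ == "__main__":
    poll_port_counters()
```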


Q: Is there any difference in the number of subnet managers required for the switch, OFED, and UFM? Which one is more suitable for customer deployment?

A: The switch's built-in subnet manager is suitable for networks of up to about 2K nodes. UFM and OFED's OpenSM have no fixed node limit, but their scale depends on the CPU and hardware capabilities of the management node.

Q: Why does a switch with 64 400Gb ports have 32 OSFP ports?

A: The constraint lies in the size and power consumption limitations of the 2U panel, which can only accommodate 32 cages. This configuration is designed for OSFP interfaces that support two 400G ports. It is important to differentiate between the concepts of cage and port for the NDR switch.

Q: Is it possible to connect two modules with different interfaces using a cable to transmit data? For example, connecting an OSFP port on a server to a QSFP112 port on a switch using a cable?

A: The interconnection of modules is independent of packaging. OSFP and QSFP112 primarily describe the physical size of the module. As long as the Ethernet media type is the same (i.e., both ends of the link are 400G-DR4 or 400G-FR4, etc.), OSFP and QSFP112 modules can be mutually compatible.

Q: Can UFM be used to monitor RoCE networks?

A: No, UFM only supports InfiniBand networks.

Q: Are the functionalities of UFM the same for managed and unmanaged switches?

A: Yes, the functionalities remain the same.

Q: What is the maximum transmission distance supported by IB cables without impacting the transmission bandwidth and latency?

A: Optical modules + jumpers can achieve approximately 500m, while passive DAC cables have a range of around 3m, and active ACC cables can reach up to 5m.

Q: Can CX7 network cards be connected to other 400G Ethernet switches that support RDMA in Ethernet mode?

A: It is possible to establish a 400G Ethernet connection, and RDMA (RoCE) can operate under these circumstances, but the performance is not guaranteed. For 400G Ethernet, it is recommended to use the Spectrum-X platform consisting of BF3+Spectrum-4.

Q: If NDR is compatible with HDR and EDR, are these cables and modules only available in one piece?

A: Yes, typically OSFP to 2xQSFP56 DAC/AOC cables are used to ensure compatibility with HDR or EDR.

Q: Should the module on the OSFP network card side be a flat module?

A: The network card cage comes with a heat sink, so a flat module can be used directly. Finned modules are mainly employed on the air-cooled switch side.

Q: Does the IB network card support RDMA in Ethernet mode?

A: RDMA over Ethernet (RoCE) can be enabled, and it is recommended to use the Nvidia Spectrum-X solution.
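
As an illustration of how a ConnectX port is typically switched to Ethernet mode before RoCE is used, the sketch below calls NVIDIA's mlxconfig tool from Python. The PCI address is a placeholder and must be replaced with your adapter's address; a host reboot (or firmware reset) is needed afterwards. This is a sketch, not an official procedure.

```python
# Sketch: switch a ConnectX port from InfiniBand to Ethernet mode so RoCE can be used.
# Assumes the NVIDIA MFT tool `mlxconfig` is installed.
# LINK_TYPE values: 1 = InfiniBand, 2 = Ethernet.
import subprocess

DEVICE = "0000:3b:00.0"   # placeholder PCI address; replace with your adapter's

def set_port_to_ethernet(device: str = DEVICE) -> None:
    """Query the current configuration, then set port 1 to Ethernet mode."""
    subprocess.run(["mlxconfig", "-d", device, "query"], check=True)
    subprocess.run(["mlxconfig", "-d", device, "-y", "set", "LINK_TYPE_P1=2"], check=True)
    print("Port 1 set to Ethernet mode; reboot the host to apply the change.")

if __name__ == "__main__":
    set_port_to_ethernet()
```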

Q: Why are there no NDR AOCs?

A: OSFP modules are large and heavy, making optical fibers more susceptible to damage. A two-branch cable would have three large transceiver ends, and a four-branch cable would have five transceivers. This increases the risk of fiber breakage during installation, particularly for 30-meter AOCs.

Q: Are the cables the same for 400G IB and 400G Ethernet, apart from the different optical modules?

A: The optical cables are the same, but note that they are APC type, with an 8-degree angled end face.

Q: Are there specific requirements for the latency performance of CX7 network cards? What is the network latency requirement under optimal debug environments, such as full memory and bound cores? What is an acceptable latency value, e.g., less than how many microseconds?

A: The latency depends on the frequency and configuration of the test machine, as well as the testing tools used, such as perftest and MPI benchmarks.
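
For reference, a latency test with the perftest suite is typically run between two nodes with the process pinned to a core. The sketch below wraps ib_write_lat in Python; the HCA name, core number, and iteration count are placeholders for illustration, not recommended values.

```python
# Sketch: run a perftest latency measurement (ib_write_lat) pinned to one core with taskset.
# Start the server side first (no address), then run the client with the server's address.
import subprocess
import sys

DEVICE = "mlx5_0"   # placeholder HCA name; check with `ibstat`
CORE = "0"          # CPU core to bind to

def run_latency(server_addr: str | None = None) -> None:
    """Run ib_write_lat as server (no address) or client (with server address)."""
    cmd = ["taskset", "-c", CORE, "ib_write_lat", "-d", DEVICE, "-n", "10000", "-F"]
    if server_addr:
        cmd.append(server_addr)   # client mode: connect to the server node
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_latency(sys.argv[1] if len(sys.argv) > 1 else None)
```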

Q: Should the module on the OSFP network card side be an OSFP-flat module? Why is there a mention of OSFP-Riding Heatsink?

A: "Riding heatsink" refers to a heat sink integrated into the cage.


Q: Where does UFM fit into this cluster solution? I would like to understand its role.

A: UFM operates separately on a server and can be treated as a node. It supports high availability using two servers. However, it is not recommended to run UFM on a node that also handles compute workloads.

Q: For what scale of network clusters is UFM recommended?

A: It is recommended to configure UFM for all InfiniBand networks, as UFM provides not only the open subnet manager (OpenSM) but also other powerful management and interface functions.

Q: Does PCIe 5 only support up to 512G? What about PCIe 4?

A: PCIe Gen5 runs at 32 GT/s per lane, so an x16 link provides a maximum of roughly 512Gbps. PCIe Gen4 runs at 16 GT/s per lane, giving roughly 256Gbps over x16.

Q: Do IB network cards support simplex or duplex modes?

A: IB network cards are all full duplex. Simplex versus duplex is largely a legacy concept for modern devices, since the physical channels for transmitting and receiving data are already separate.

Q: Can FS provide technical support and high-quality products for building IB network clusters?

A: Of course, FS specializes in providing high-performance computing and data center solutions. It has rich experience and expertise in building IB network clusters and provides a variety of hardware connectivity solutions to meet the needs of different customers.

The FS InfiniBand solution includes AOC/DAC cables and modules at 800G, 400G, 200G, 100G, and 56/40G speeds, as well as NVIDIA InfiniBand adapters and NVIDIA InfiniBand switches. For IB network cluster solutions, FS's professional team will recommend the appropriate hardware connectivity based on your needs and network scale, ensuring network stability and high performance.

For more information and support, please visit FS.COM.
