Questions and Answers about InfiniBand Technology

Posted on Dec 26, 2023 by

Demand for high-performance computing keeps rising with advances in big data and artificial intelligence. To meet this demand, the NVIDIA Quantum-2 InfiniBand platform offers distributed computing performance with high-speed, low-latency data transmission and processing.

The following are common questions and answers about InfiniBand (IB) technology.

InfiniBand Switch FAQs

Q1: Why does a switch with 64 400Gb ports have 32 OSFP ports?

A1: The constraint lies in the size and power consumption limits of the 2U panel, which can accommodate only 32 cages. Each OSFP cage carries two 400G ports, so 32 cages provide 64 ports. For the NDR switch it is important to distinguish between a cage (the physical slot) and a port (the logical 400G interface).
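
The cage-versus-port arithmetic can be sketched as follows. This is only an illustration: the function name is hypothetical, and the default of two ports per cage is taken from the figures in the answer above.

```python
def port_count(cages: int, ports_per_cage: int = 2) -> int:
    """Return the number of logical ports exposed by a set of physical cages."""
    if cages < 0 or ports_per_cage < 1:
        raise ValueError("cages must be >= 0 and ports_per_cage must be >= 1")
    return cages * ports_per_cage


# 32 OSFP cages, each carrying two 400G ports
print(port_count(32))  # 64
```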

Q2: Concerning the latest Superpod network, as stated in the Superpod Network White Paper, it involves configuring two IB switches with UFM software separately in the computing network. However, this configuration results in one fewer GPU node in my cluster. If I choose not to set up a separate UFM switch and instead deploy UFM software solely on the management node, can I manage the cluster through another set of storage network without affecting the computing network?

A2: It is recommended to configure UFM equipment, including software. Deploying UFM software on the management node within the computing network is an alternative solution, but it should not bear the GPU computing workload. The storage network operates independently as a distinct network plane, and it cannot be used for managing the computing cluster.

Q3: Can FS provide technical support and high-quality products for building IB network clusters?

A3: Of course, FS specializes in providing high-performance computing and data center solutions. It has rich experience and expertise in building IB network clusters and provides a variety of hardware connectivity solutions to meet the needs of different customers.

The FS InfiniBand solution includes AOC/DAC cables and modules at 800G, 400G, 200G, 100G and 56/40G, as well as NVIDIA InfiniBand adapters and NVIDIA InfiniBand switches. For IB network clusters, the FS team provides hardware connectivity solutions matched to your needs and network scale, with the aim of ensuring network stability and high performance.

For more information and support, please visit FS.COM.

InfiniBand Cable FAQs

Q1: Is the CX7 NDR200 QSFP112 port compatible with HDR/EDR cables?

A1: Yes, it is compatible.

Q2: How to connect a one-to-two cable?

A2: To achieve optimal performance, a one-to-two cable (800G to 2x400G) should be connected to two different servers. Even though GPU servers typically have multiple network cards, both branches of the cable should not land on network cards in the same server.

Q3: In a Superpod network, can four NDR200 cards on each server be directly connected to the same switch using a 1x4 cable, or should two 1x2 cables be used to connect to different switches?

A3: It is not recommended to connect the four NDR200 ports on each server to the same switch using a one-to-four cable in a Superpod network. This connection method does not comply with the Superpod network rules. To ensure optimal performance of NCCL/SHARP, the leaf switches should use one-to-four cables to connect the NDR200 ports of different servers in a specific pattern.

Q4: If NDR is compatible with HDR and EDR, are these cables and modules only available in one piece?

A4: Yes, typically OSFP to 2xQSFP56 DAC/AOC cables are used to ensure compatibility with HDR or EDR.

Q5: Why are there no NDR AOCs?

A5: OSFP modules are large and heavy, making optical fibers more susceptible to damage. A two-branch cable would have three large transceiver ends, and a four-branch cable would have five transceivers. This increases the risk of fiber breakage during installation, particularly for 30-meter AOCs.

Q6: Are the cables the same for 400G IB and 400G Ethernet, apart from the different optical modules?

A6: The optical cables are the same, but it's important to note that they are APC type with an 8-degree angle.

Q7: What is the maximum transmission distance supported by IB cables without impacting the transmission bandwidth and latency?

A7: Optical modules + jumpers can achieve approximately 500m, while passive DAC cables have a range of around 3m, and active ACC cables can reach up to 5m.
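
As a rough aid, the reach figures above can be turned into a small lookup. This is a sketch only: the function name is hypothetical, and the distances are the approximate values quoted in the answer, not guaranteed limits for any specific product.

```python
from typing import Optional

# Approximate reach figures quoted in the answer above, in metres,
# ordered from shortest to longest reach.
REACH_M = [
    ("passive DAC", 3),
    ("active ACC", 5),
    ("optical module + jumper", 500),
]


def shortest_reach_option(distance_m: float) -> Optional[str]:
    """Return the first listed option whose quoted reach covers the distance."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    for name, reach in REACH_M:
        if distance_m <= reach:
            return name
    return None  # beyond the reach figures quoted above


print(shortest_reach_option(4))    # active ACC
print(shortest_reach_option(120))  # optical module + jumper
```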

Q8: How are one-to-two cables connected in InfiniBand NDR scenarios?

A8: In InfiniBand NDR scenarios, there are two types of one-to-two cables. The first type uses optical modules with one-to-two patch cords (400G split into 2x200G), such as MMS4X00-NS400 + MFP7E20-NXXX + MMS4X00-NS400 (downgraded for 200G use). The second type utilizes one-to-two DAC copper cables (800G split into 2x400G), such as MCP7Y00-NXXX or MCP7Y10-NXXX.

InfiniBand Adapter FAQs

Q1: How can the CX7 NDR network card be connected to the Quantum-2 QM97XX series switch?

A1: The CX7 NDR network card utilizes NVIDIA's 400GBASE-SR4 or 400GBASE-DR4 optical modules, while the QM97XX series switch uses 800GBASE-SR8 (equivalent to 2x400GBASE-SR4) or 800GBASE-DR8 (equivalent to 2x400GBASE-DR4) optical modules. These modules are connected using a 12-core multimode universal polarity APC end face patch cord.

Q2: Can the CX7 Dual-port 400G achieve 800G through bonding? Why can 200G achieve 400G through bonding?

A2: The overall network performance is determined by factors such as PCIe bandwidth bottleneck, network card processing capacity, and physical network port bandwidth. The CX7 network card has a PCIe specification of 5.0 x16, with a theoretical bandwidth limit of 512Gbps. Due to the maximum bandwidth limitation of PCIe 5.0 x16, the hardware for Dual-port 400G is not available on the CX7 network card.
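
One way to read this answer is as a simple comparison between the combined port speed and the PCIe limit. The sketch below uses the 512 Gbps figure quoted above; the function name is hypothetical, and the check ignores protocol overhead and the card's processing capacity.

```python
PCIE5_X16_LIMIT_GBPS = 512  # theoretical limit quoted in the answer above


def fits_pcie(port_speeds_gbps, limit_gbps=PCIE5_X16_LIMIT_GBPS):
    """Return True if the combined port speed stays within the PCIe limit."""
    return sum(port_speeds_gbps) <= limit_gbps


print(fits_pcie([400, 400]))  # False: 800 Gbps exceeds the 512 Gbps limit
print(fits_pcie([200, 200]))  # True: 400 Gbps fits within the limit
```

Under this simplified view, bonding two 200G ports stays below the host interface limit, while bonding two 400G ports does not.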

Q3: Can CX7 network cards be connected to other 400G Ethernet switches that support RDMA in Ethernet mode?

A3: It is possible to establish a 400G Ethernet connection, and RDMA (RoCE) can operate under these circumstances, but the performance is not guaranteed. For 400G Ethernet, it is recommended to use the Spectrum-X platform consisting of BF3+Spectrum-4.

Q4: Should the module on the OSFP network card side be a flat module?

A4: The network card comes with a heat sink, so a flat module can be used directly. Finned modules are mainly used on the liquid-cooled switch side.

Q5: Does the IB network card support RDMA in Ethernet mode?

A5: Yes. RDMA over Converged Ethernet (RoCE) can be enabled, and it is recommended to use the NVIDIA Spectrum-X solution.

Q6: Are there specific requirements for the latency performance of CX7 network cards? What is the network latency requirement under optimal debug environments, such as full memory and bound cores? What is an acceptable latency value, e.g., less than how many microseconds?

A6: Latency depends on the clock frequency and configuration of the test machine, as well as on the test tools used, such as perftest and MPI.

Q7: Should the module on the OSFP network card side be an OSFP-flat module? Why is there a mention of OSFP-Riding Heatsink?

A7: "Riding heatsink" refers to a heat sink integrated into the cage.

Q8: Does PCIe 5 only support up to 512G? What about PCIe 4?

A8: PCIe Gen5 runs at up to 32 GT/s per lane, so an x16 link has a maximum raw bandwidth of 512 Gb/s. PCIe Gen4 runs at up to 16 GT/s per lane, giving an x16 link a maximum raw bandwidth of 256 Gb/s.
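
The figures above are per-lane rate multiplied by lane count. The sketch below reproduces them and, as an addition not stated in the original answer, also applies the 128b/130b line encoding used by PCIe Gen4 and Gen5 to estimate the bandwidth left after encoding. The function and constant names are hypothetical.

```python
# Per-lane signalling rates from the answer above, in GT/s.
PCIE_GT_PER_LANE = {"gen4": 16, "gen5": 32}

# Assumption not in the original answer: 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def raw_gbps(gen: str, lanes: int = 16) -> int:
    """Raw link bandwidth: per-lane rate times lane count."""
    return PCIE_GT_PER_LANE[gen] * lanes


def encoded_gbps(gen: str, lanes: int = 16) -> float:
    """Bandwidth remaining after 128b/130b encoding overhead."""
    return raw_gbps(gen, lanes) * ENCODING_EFFICIENCY


print(raw_gbps("gen5"))  # 512
print(raw_gbps("gen4"))  # 256
print(round(encoded_gbps("gen5"), 1))
```

Packet and protocol overhead reduce the usable figure further, so these values are upper bounds rather than achievable throughput.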

Q9: Do IB network cards support simplex or duplex modes?

A9: IB network cards are all duplex. Simplex or duplex is merely a concept for current devices, as the physical channels for transmitting and receiving data are already separated.

InfiniBand Module FAQs

Q1: Is it possible to connect two modules with different interfaces using a cable to transmit data? For example, connecting an OSFP port on a server to a QSFP112 port on a switch using a cable?

A1: The interconnection of modules is independent of packaging. OSFP and QSFP112 primarily describe the physical size of the module. As long as the Ethernet media type is the same (i.e., both ends of the link are 400G-DR4 or 400G-FR4, etc.), OSFP and QSFP112 modules can be mutually compatible.
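
The rule in this answer can be expressed as a small check: the form factor is ignored and only the media type is compared. The class and function names below are hypothetical and exist only to illustrate the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    form_factor: str  # physical packaging, e.g. "OSFP" or "QSFP112"
    media_type: str   # optical interface type, e.g. "400G-DR4"


def can_interconnect(a: Module, b: Module) -> bool:
    """Modules interoperate when their media types match; packaging is ignored."""
    return a.media_type == b.media_type


server_side = Module("OSFP", "400G-DR4")
switch_side = Module("QSFP112", "400G-DR4")
print(can_interconnect(server_side, switch_side))  # True
```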

UFM FAQs

Q1: Can UFM be used to monitor RoCE networks?

A1: No, UFM only supports InfiniBand networks.

Q2: Are the functionalities of UFM the same for managed and unmanaged switches?

A2: Yes, the functionalities remain the same.

Q3: Is there any difference in the number of subnet managers required for the switch, OFED, and UFM? Which one is more suitable for customer deployment?

A3: Switch management is suitable for networks with up to 2K nodes. UFM and OFED's openSM node management capabilities are unlimited but require coordination with the CPU and hardware processing capabilities of the management node.
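
The sizing guidance above can be sketched as a simple selection rule. The threshold of 2000 is an assumed reading of "2K nodes", and the function name and option labels are hypothetical.

```python
# Assumption: "up to 2K nodes" is read here as 2000 nodes.
SWITCH_SM_MAX_NODES = 2000


def subnet_manager_options(node_count: int) -> list:
    """List the subnet manager placements that fit a cluster of this size."""
    if node_count < 1:
        raise ValueError("node_count must be positive")
    # UFM and OpenSM have no fixed node limit, but depend on the CPU and
    # hardware capacity of the management node they run on.
    options = ["UFM", "OpenSM (OFED)"]
    if node_count <= SWITCH_SM_MAX_NODES:
        options.insert(0, "switch-embedded SM")
    return options


print(subnet_manager_options(512))   # ['switch-embedded SM', 'UFM', 'OpenSM (OFED)']
print(subnet_manager_options(4096))  # ['UFM', 'OpenSM (OFED)']
```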

Q4: What are the distinctions between UFM Enterprise, SDN, Telemetry, and Cyber-AI? Is it necessary to purchase UFM?

A4: It is possible to use the OpenSM and command-line script tools included in OFED for simple management and monitoring, but they lack UFM's user-friendly graphical interface and many of its functions.

Q5: Where does UFM fit into this cluster solution? I would like to understand its role.

A5: UFM operates separately on a server and can be treated as a node. It supports high availability using two servers. However, it is not recommended to run UFM on a node that also handles compute workloads.

Q6: For what scale of network clusters is UFM recommended?

A6: It is recommended to configure UFM for all InfiniBand networks, as UFM provides not only the Open Subnet Manager (OpenSM) but also other powerful management and interface functions.

Protocol FAQs

Q1: Is InfiniBand a layered protocol?

A1: Yes. The InfiniBand specification defines the protocol in modular layers, which are largely based on the OSI 7-layer model and include layers 1-4. The standard specifies the interfaces between a given layer and the layers immediately above and below. Thus, the lowest physical layer only communicates with the link layer above. The InfiniBand link layer defines two interfaces: one to the physical layer below and another to the network layer above.

Q2: What is Virtual Protocol Interconnect® and how does it relate to InfiniBand?

A2: Mellanox's Virtual Protocol Interconnect® (VPI) technology provides customers with the finest available connectivity while assuring affordable scalability. VPI enables an HCA or switch's ports to transition between InfiniBand and Ethernet protocols as needed. This provides maximum networking flexibility for servers and storage systems. This creates an ideal gateway for the switch fabric right out of the box, allowing for integration of InfiniBand and Ethernet fabric and clusters.

Q3: Does InfiniBand support Quality of Service?

A3: Quality of Service (QoS) is the ability of a network to provide different priorities to applications and to guarantee a certain level of performance in the flow of data to those endpoints. InfiniBand supports QoS by creating Virtual Lanes (VL).
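
To illustrate how Virtual Lanes can separate traffic classes, the sketch below maps a packet's Service Level (SL) to a VL through a lookup table. Details that are not in the original answer are assumptions added for illustration: the 16 SL values, VL15 being reserved for subnet management traffic, and the table contents, which are a hypothetical two-lane configuration rather than a recommended policy.

```python
NUM_SERVICE_LEVELS = 16  # SL is a 4-bit field, so values run 0..15
MANAGEMENT_VL = 15       # VL15 is reserved for subnet management traffic

# Hypothetical SL-to-VL table for a port with two data VLs:
# SL 0-7 share VL0, SL 8-15 share VL1.
SL_TO_VL = {sl: (1 if sl >= 8 else 0) for sl in range(NUM_SERVICE_LEVELS)}


def vl_for_packet(sl: int, table=SL_TO_VL) -> int:
    """Return the data Virtual Lane a packet with this SL would travel on."""
    if not 0 <= sl < NUM_SERVICE_LEVELS:
        raise ValueError("service level must be in the range 0..15")
    return table[sl]


print(vl_for_packet(0))   # 0
print(vl_for_packet(12))  # 1
```

Because each VL has its own buffering, traffic assigned to one lane does not have to wait behind traffic queued on another, which is what allows different priorities to be given to different applications.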