
NVIDIA H100 GPU: Uncovering the Engine Behind Next-Generation AI and HPC

Posted on Apr 7, 2024

As artificial intelligence (AI), high-performance computing (HPC), and big data analytics grow ever more demanding, existing computing resources can no longer keep up with the market. The NVIDIA H100 GPU quickly set off a market craze thanks to its excellent workload-processing capabilities. Read this article to learn how NVIDIA H100 GPU interconnect solutions can help you achieve performance improvements and business growth.

What is the NVIDIA H100 GPU?

The NVIDIA H100 is NVIDIA's latest data-centre GPU, built on the Hopper architecture and at the heart of the newest DGX systems, and it is designed to provide robust support for high-performance computing and data centre applications. The H100 leverages a dedicated Transformer Engine tailored for large language models, accelerating models with billions to trillions of parameters. This marks a significant leap in the scale of artificial intelligence and high-performance computing, offering unprecedented performance, scalability, and security to every data centre. It delivers unparalleled acceleration across AI, HPC, and graphics workloads, addressing the most challenging computational problems, and has consequently become the preferred choice for many supercomputing data centres.

For more about the H100 GPU, you can read: Introduction to NVIDIA DGX H100

NVIDIA H100 GPU vs. A100 GPU

The A100, released in 2020, is the predecessor of the H100. Built on a 7-nanometre process, it supports both AI inference and training. In terms of performance, the H100 represents a generational leap over the A100.

Differences in Performance

Compared with the previous-generation A100, the H100 raises both throughput and overall performance substantially. The NVIDIA A100 delivers up to 9.7 TFLOPS of double-precision (FP64) performance (19.5 TFLOPS using its FP64 Tensor Cores) and up to 19.5 TFLOPS of single-precision (FP32) performance. The H100 SXM roughly triples these figures, at up to 34 TFLOPS FP64 and 67 TFLOPS FP32, a jump in computational throughput that is critical for data analysis in scientific simulation and other high-performance computing applications.

In terms of AI computing, the A100's Tensor Cores deliver up to 312 TFLOPS at FP16 precision and up to 156 TFLOPS for TensorFloat-32 (TF32) operations. The H100's fourth-generation Tensor Cores add FP8 support and roughly triple per-precision throughput, making it an extremely powerful tool for AI modelling and deep learning.
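To make these precision figures concrete, here is a minimal benchmarking sketch using PyTorch (an assumption; the article does not prescribe a framework). It times a large matrix multiplication at TF32 and FP16 and reports achieved TFLOPS; absolute numbers will vary with the GPU model, clocks, and library versions.

```python
# Minimal, hedged sketch: measuring matmul throughput at the Tensor Core
# precisions discussed above. Results are illustrative, not spec numbers.
import torch

def matmul_tflops(dtype: torch.dtype, n: int = 8192, iters: int = 50) -> float:
    """Time an n x n matmul on the GPU and return achieved TFLOPS."""
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1000  # elapsed_time() reports milliseconds
    return (2 * n**3 * iters) / seconds / 1e12  # ~2*n^3 FLOPs per matmul

if __name__ == "__main__":
    torch.backends.cuda.matmul.allow_tf32 = True  # route FP32 matmuls to TF32 Tensor Cores
    print(f"TF32: {matmul_tflops(torch.float32):.1f} TFLOPS")
    print(f"FP16: {matmul_tflops(torch.float16):.1f} TFLOPS")
```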


Design Power Comparison

In addition to the differences in baseline performance, the NVIDIA A100 and H100 differ in thermal design and power efficiency. The A100 PCIe ships with 40 GB of HBM2 memory and a modest 250W TDP, while the A100 80GB PCIe is rated at 300W; the H100 PCIe version has a TDP of 350W, only slightly above its A100 80GB PCIe counterpart. The A100 therefore draws less power and needs less cooling, but the H100 is the more energy-efficient GPU: although its SXM variant can be configured up to 700W, it completes AI and deep-learning tasks so much faster that it delivers more computing per watt.

Overall, the H100 delivers roughly three times the performance of the A100 at only 1.5 to 2 times the cost, which makes its price-performance ratio all the more attractive. At the technical level, the H100's 16-bit inference throughput is approximately 3.5 times that of the A100, and its 16-bit training throughput is about 2.3 times higher.
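As a rough illustration of those ratios, the arithmetic below works through performance per dollar and per watt using the estimates above (about 3x performance at 1.5 to 2x cost) and the published SXM board powers; these are back-of-the-envelope figures, not measurements.

```python
# Illustrative arithmetic only: ratios come from the estimates above,
# TDPs are the published SXM board powers.
h100_perf, h100_tdp = 3.0, 700   # performance normalised to A100 = 1.0
a100_perf, a100_tdp = 1.0, 400

for cost_ratio in (1.5, 2.0):
    print(f"H100 at {cost_ratio}x the cost: "
          f"{h100_perf / cost_ratio:.2f}x performance per dollar")

perf_per_watt_gain = (h100_perf / h100_tdp) / (a100_perf / a100_tdp)
print(f"Performance per watt vs A100: {perf_per_watt_gain:.2f}x")  # ~1.71x
```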

How to Interconnect NVIDIA H100 GPUs?

Having covered the advantages of the NVIDIA H100 GPU, the next step is to look at how H100 nodes are connected into a network. NVIDIA interconnects GPUs through NVLink and NVSwitch, bypassing the traditional PCIe bus to achieve higher bandwidth and lower latency.

NVSwitch Connection

NVIDIA's third-generation NVSwitch and fourth-generation NVLink give the H100 a faster point-to-point interconnect than the A100's. NVLink's main purpose is to provide a high-speed, point-to-point network for GPU interconnection, and it has evolved alongside each GPU architecture.

In this network architecture, each H100 has 18 NVLink connections, split into four groups that attach to the four NVSwitch chips on the baseboard. These four NVSwitch chips collectively expose 18 OSFP interfaces for interconnection with other GPU nodes. Each NVLink connection provides 50GB/s of bandwidth, which is equivalent to 400Gb/s for one OSFP port. A DGX H100 server has 18 OSFP ports, while an NVLink switch has 128 NVLink ports and 32 OSFP ports. For a single scalable unit (SU) containing 32 GPU servers, 18 NVLink switches are required for interconnection.
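A quick sanity check of these figures: multiplying the link count by the per-link bandwidth recovers the H100's headline 900GB/s of NVLink bandwidth. The short sketch below simply restates the arithmetic of this section.

```python
# Restating the NVLink arithmetic of this section as a sanity check.
links_per_gpu = 18      # NVLink connections per H100
gbytes_per_link = 50    # GB/s per NVLink connection
gpus_per_server = 8     # GPUs in a DGX H100

print(f"Per-GPU NVLink bandwidth: {links_per_gpu * gbytes_per_link} GB/s")    # 900 GB/s
print(f"NVLink connections per server: {links_per_gpu * gpus_per_server}")    # 144
print(f"OSFP-equivalent rate per link: {gbytes_per_link * 8} Gb/s")           # 400 Gb/s
```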


RDMA-InfiniBand Connection

In the IB network architecture, a single HGX H100 8-GPU baseboard connects to four PCIe switches through eight PCIe Gen5 x16 links. Interconnection between GPU nodes is achieved through eight network interface cards (NICs) attached to those PCIe switches; these are typically ConnectX-7 400G cards, interconnected through 400G InfiniBand switches. This gives each server 8 x 400G, or 4 x 800G, of inter-node bandwidth.
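For software running on top of this fabric, the details are largely hidden behind the communication library. The sketch below is a hedged example of a multi-node job using torch.distributed with the NCCL backend, which picks up RDMA over the ConnectX-7 NICs automatically when they are present; the launcher conventions (ranks, rendezvous address, LOCAL_RANK) assume a typical torchrun setup.

```python
# Hedged sketch: a minimal multi-node collective over the IB fabric,
# launched with e.g. `torchrun --nnodes=2 --nproc-per-node=8 this_script.py`.
import os
import torch
import torch.distributed as dist

def main() -> None:
    # Rank, world size, and rendezvous address are supplied by the launcher;
    # NCCL selects the IB/RDMA transport on its own.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # A collective like this all-reduce is what actually traverses the 400G links.
    x = torch.ones(1024, device="cuda")
    dist.all_reduce(x)
    if dist.get_rank() == 0:
        print(f"world size {dist.get_world_size()}, x[0] = {x[0].item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```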


RDMA-RoCE Connection

The third connection solution uses RoCE v2 (RDMA over Converged Ethernet), which carries RDMA traffic over the UDP layer of the Ethernet TCP/IP stack. As the name suggests, it swaps the InfiniBand switches for Ethernet switches while keeping the compute network architecture and device counts consistent with the IB design.
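In practice, steering NCCL onto a RoCE v2 fabric is mostly a matter of environment configuration. The sketch below shows commonly used knobs; the interface name and GID index are placeholders, and the correct values depend on the host's NIC configuration.

```python
# Hedged sketch: environment knobs to point NCCL at a RoCE v2 fabric.
# Values marked as placeholders must match the actual host NIC setup.
import os
import torch.distributed as dist

# Must be set before NCCL communicators are created.
os.environ.setdefault("NCCL_IB_HCA", "mlx5")         # use the RDMA-capable NICs
os.environ.setdefault("NCCL_IB_GID_INDEX", "3")      # GID index for RoCE v2 (placeholder)
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")  # bootstrap/control interface (placeholder)

dist.init_process_group(backend="nccl")  # data path now runs RDMA over Ethernet
```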


Empowering the Future with Advanced H100 Solutions

Explore the FS H100 InfiniBand Solution

The FS H100 InfiniBand solution is built for compatibility with NVIDIA H100 GPU servers, forming a robust, reliable, and scalable computing network. This network is not just tailored for AI workloads but is also adept at handling a wide array of intensive computing tasks, including high-performance computing, machine learning, and big data analytics.

The FS H100 InfiniBand solution revolutionises AI network architecture by integrating state-of-the-art technology with high-performance computing capabilities, tailored specifically for AI-optimised data centre networking. At the heart of this solution is the FS NVIDIA® Quantum-2 MQM9790 InfiniBand switch, offering 64 400Gb/s ports across 32 physical OSFP ports. This setup not only provides outstanding performance and port density but also harnesses NVIDIA's cutting-edge 400Gb/s high-speed interconnect technology. The integration of NVIDIA Quantum-2 InfiniBand ensures a network architecture that is high-speed, extremely low-latency, and highly scalable. This is further enhanced by technologies such as RDMA (Remote Direct Memory Access), adaptive routing, and the NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™, all designed to facilitate efficient and stable data transmission.
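As a hedged illustration of the SHARP feature mentioned above, in-network reductions are typically opted into at the job level. The snippet below assumes the NVIDIA nccl-rdma-sharp-plugins package is installed on the hosts; without it, the setting is simply ignored and NCCL falls back to its ordinary algorithms.

```python
# Hedged sketch: opting an NCCL job into SHARP in-network aggregation on a
# Quantum-2 fabric. Assumes nccl-rdma-sharp-plugins is installed on the hosts.
import os
import torch.distributed as dist

# Must be set before the process group (and NCCL communicators) are created.
os.environ["NCCL_COLLNET_ENABLE"] = "1"  # enable NCCL's CollNet path, implemented by the SHARP plugin

dist.init_process_group(backend="nccl")  # large all-reduces may now be aggregated in the switch
```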

Advantages of FS H100 InfiniBand Solution

This solution is based on the NVIDIA H100 GPU, with customised configurations tailored to the network topology of the AI architecture (the compute, management, and storage networks) to meet the requirements of various business scenarios.

Streamlined Management: The AmpCon™ unified platform enables one-click configuration, monitoring, and maintenance of the entire InfiniBand H100 network, enhancing network security through automated configuration and comprehensive security policies.

Professional IB Network Architecture: With a global presence spanning over 200 countries and regions, in strategic collaboration with NVIDIA, we offer bespoke solutions and professional technical services, including requirement analysis, solution design, and validation.

Cost-Effective Solutions: Compared to RoCE solutions, IB network architecture delivers superior stability and reliability, reducing network failures and maintenance costs while offering an average cost advantage of approximately 30%.

Global Warehousing: With over 50,000 square meters of global warehouse space, we ensure abundant inventory coverage across 200+ countries, providing seamless supply services with over 90% of orders shipped the same day.

Localized Services: We offer comprehensive localized services, including on-site surveys, installations, and troubleshooting, supplemented by remote online maintenance to help save installation costs and minimize system downtime.

Final Thought

The NVIDIA H100 GPU will continue to drive innovation in artificial intelligence and large-scale computing, bringing major performance and efficiency gains to future scientific research and engineering. More solutions built around the H100 will keep being refined and developed.

How FS Can Help

As a global technology leader specialising in high-speed network systems, we deliver top-quality products and services for HPC, Data Centre, Enterprise, and Telecom solutions. FS is committed to providing tailored H100 solutions. If you're interested, please don't hesitate to contact us.
