
Introduction to NVIDIA DGX H100

Posted on Jan 25, 2024

The NVIDIA DGX H100 System stands as a dedicated and versatile solution designed for all AI infrastructure and workloads, spanning from analytics and training to inference. It includes NVIDIA Base Command™ and the NVIDIA AI Enterprise software suite, plus expert advice from NVIDIA DGXperts.

DGX H100 Hardware and Component Features

Hardware Overview

The NVIDIA DGX H100 640GB system includes the following components.


Front Panel Connections and Controls

On the left is an image of the DGX H100 system with bezel, on the right is an image of the DGX H100 system without bezel.


Here is an image that shows the rear panel modules on the DGX H100.

  • Dimensions: 8U rack-mounted, height 14 inches, maximum width 19 inches, maximum depth 35.3 inches.

  • Weight: Maximum 287.6 pounds (130.45 kg).

  • Input voltage: 200-240 volts AC.

  • Power: maximum system power 10.2 kW; each power supply rated at 3,300 W @ 200-240 V, 16 A, 50-60 Hz.

  • Supports high-speed network connections, including InfiniBand and Ethernet, with speeds up to 400Gbps.
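To put those link speeds in perspective, here is a back-of-the-envelope transfer-time calculation. This is only a sketch: it ignores protocol overhead, so real-world throughput will be somewhat lower, and the 1 TB checkpoint size is a hypothetical example.

```python
def transfer_seconds(data_tb: float, link_gbps: float) -> float:
    """Ideal time to move data_tb terabytes over a link_gbps link (no overhead)."""
    bits = data_tb * 1e12 * 8          # decimal TB -> bits
    return bits / (link_gbps * 1e9)    # divide by link rate in bits per second

# Moving a hypothetical 1 TB training checkpoint:
print(f"400 Gbps InfiniBand/Ethernet: {transfer_seconds(1.0, 400):.0f} s")  # 20 s
print(f"100 Gbps Ethernet:            {transfer_seconds(1.0, 100):.0f} s")  # 80 s
```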

External Port Connections and Controls

The following diagram shows the motherboard connections and controls in a DGX H100 system.


  • Slot 1: Dual port ConnectX-7 card

  • Slot 2: Dual port ConnectX-7 card

  • Slot 3: 100 Gb/s Ethernet NIC

  • Slot 4: M.2 PCIe carrier for dual 1.92 TB NVMe boot drives

  • 2x USB ports (keyboard or storage)

  • Serial I/O port

  • VGA port (monitor)

  • 1 GbE RJ-45 for Remote System Management

  • 10 GbE RJ-45 for Remote Host

Motherboard Tray Components

The CPU motherboard tray is the central assembly of a server, in both standard designs and those built for artificial intelligence. It houses essential elements including the CPU motherboard, system memory, network cards, PCIe switches, and various other components. Here is an image that shows the motherboard tray components in the DGX H100.


  • System memory: 32 DIMMs providing 2 TB of memory.

  • Out-of-band system management (BMC): Supports Redfish, IPMI, SNMP, KVM, and Web user interface.

  • In-band system management: 3 dual-port 100 GbE and 10 GbE RJ45 interfaces.

  • Storage: 

      ◦ Operating system storage: 2x 1.92 TB NVMe M.2 SSDs (RAID 1 array).

      ◦ Data cache storage: 8x 3.84 TB NVMe U.2 SEDs (RAID 0 array).
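The usable capacities of the two arrays follow directly from their RAID levels; a quick sketch of the arithmetic, using the decimal-TB drive sizes listed above:

```python
# Usable capacity of the DGX H100 drive arrays (sizes in decimal TB, per the spec).
os_drives = [1.92, 1.92]     # M.2 NVMe SSDs in RAID 1 (mirrored)
cache_drives = [3.84] * 8    # U.2 NVMe SEDs in RAID 0 (striped)

# RAID 1 mirrors the data: usable space equals one drive.
os_usable_tb = min(os_drives)
# RAID 0 stripes across all drives: usable space is the sum.
cache_usable_tb = sum(cache_drives)

print(f"OS array (RAID 1):    {os_usable_tb:.2f} TB usable")     # 1.92 TB
print(f"Cache array (RAID 0): {cache_usable_tb:.2f} TB usable")  # 30.72 TB
```

RAID 1 trades half the raw boot capacity for redundancy, while the cache array maximizes capacity and throughput at the cost of any fault tolerance.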


  • Network: 

      ◦ Cluster network: 4 OSFP ports, supporting InfiniBand (up to 400 Gbps) and Ethernet (up to 400 GbE).

      ◦ Storage network: 2 NVIDIA ConnectX-7 dual-port Ethernet cards, supporting Ethernet (up to 400 GbE) and InfiniBand (up to 400 Gbps).
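Because the BMC listed above supports Redfish, system health can be read as JSON over HTTPS from the standard DMTF endpoint (e.g. `GET https://<bmc-ip>/redfish/v1/Systems/<id>` with BMC credentials). A minimal parsing sketch follows; the sample payload and its values are illustrative, not captured from a real DGX.

```python
import json

# Illustrative Redfish ComputerSystem payload; in practice this JSON comes
# from the BMC over HTTPS. Field names follow the DMTF Redfish schema.
sample_payload = """
{
  "PowerState": "On",
  "Status": {"Health": "OK", "State": "Enabled"},
  "MemorySummary": {"TotalSystemMemoryGiB": 2048}
}
"""

def summarize_system(payload: str) -> dict:
    """Extract a few health fields from a Redfish ComputerSystem resource."""
    data = json.loads(payload)
    return {
        "power": data["PowerState"],
        "health": data["Status"]["Health"],
        "memory_gib": data["MemorySummary"]["TotalSystemMemoryGiB"],
    }

print(summarize_system(sample_payload))
```

The same data is reachable through IPMI or SNMP, but Redfish's JSON output is the easiest to script against.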

GPU Tray Components

Here is an image of the GPU tray components in a DGX H100 system.


  • Graphics processor: 8 NVIDIA H100 GPUs, providing 640 GB of GPU memory.


  • NVLink: four fourth-generation NVSwitches, providing 900 GB/s of GPU-to-GPU NVLink bandwidth.

The GPU board tray is the pivotal assembly within the AI server, housing the GPU modules, module boards, and NVSwitches.
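The headline GPU figures are easy to sanity-check. Note the 80 GB per-GPU value is the H100 SXM specification, implied by the 640 GB total above rather than stated directly:

```python
NUM_GPUS = 8
GPU_MEMORY_GB = 80     # H100 SXM per-GPU HBM3 (implied by the 640 GB total)
NVLINK_GBPS = 900      # GB/s GPU-to-GPU bandwidth, per the spec above

total_memory_gb = NUM_GPUS * GPU_MEMORY_GB
# Ideal time for one GPU to stream its entire 80 GB to a peer over NVLink:
copy_seconds = GPU_MEMORY_GB / NVLINK_GBPS

print(f"Total GPU memory: {total_memory_gb} GB")                          # 640 GB
print(f"Full-HBM peer copy at {NVLINK_GBPS} GB/s: {copy_seconds*1000:.0f} ms")  # ~89 ms
```

On a running system, the actual GPU-to-GPU link layout can be inspected with `nvidia-smi topo -m`.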

DGX H100 System Topology

Here is an image of the DGX H100 system topology, illustrating the connections, configurations, and interrelationships among various hardware components within a system.


The Functional Advantages of DGX H100

Artificial intelligence has become the preferred solution for addressing challenging business problems. For enterprises, AI is not just about performance and functionality; it also involves close integration with the organization's IT architecture and practices. As a pioneer in AI infrastructure, NVIDIA's DGX system provides the most powerful and comprehensive AI platform to realize these fundamental ideas.

The system is engineered to optimize AI throughput, offering enterprises a highly refined, systematically organized, and scalable platform to enable breakthroughs in natural language processing, recommender systems, data analytics, and more.

The DGX H100 offers versatile deployment options: on-premises for direct management, colocated in NVIDIA DGX-Ready data centers, rented through NVIDIA DGX Foundry, or accessed via NVIDIA-certified managed service providers. The DGX-Ready Lifecycle Management program gives organizations a predictable financial model and keeps their deployment at the forefront of technology. This makes the DGX H100 as user-friendly and accessible as traditional IT infrastructure, easing the burden on busy IT staff.
