Introduction to NVIDIA DGX H100
The NVIDIA DGX H100 system is a purpose-built yet versatile platform for HPC infrastructure and workloads, from analytics to training to inference. It ships with NVIDIA Base Command™ and the NVIDIA AI Enterprise software suite, along with expert advice from NVIDIA DGXperts.
DGX H100 Hardware and Component Features
Hardware Overview
The NVIDIA DGX H100 640GB system includes the following components.
Front Panel Connections and Controls
The image on the left shows the DGX H100 system with the bezel attached; the image on the right shows it with the bezel removed.
The following image shows the rear panel modules on the DGX H100.
- Dimensions: 8U rack-mounted; height 14 in, maximum width 19 in, maximum depth 35.3 in.
- Weight: maximum 287.6 lb (130.45 kg).
- Input voltage: 200-240 V AC.
- Power: maximum system power usage of 10.2 kW; each power supply is rated at 3300 W at 200-240 V, 16 A, 50-60 Hz.
- Networking: high-speed InfiniBand and Ethernet connections at up to 400 Gb/s.
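The physical and electrical figures above can be cross-checked with a little arithmetic. The Python sketch below converts the 14-inch height to rack units (1U = 1.75 in per the EIA-310 rack standard), converts the weight from pounds to kilograms, and estimates how many 3300 W supplies are needed to cover the 10.2 kW maximum. It is a back-of-the-envelope sketch only: the helper names are hypothetical, and it assumes the load is shared evenly while ignoring conversion losses and redundancy, so it does not replace NVIDIA's site-planning guidance.

```python
import math

# Figures taken from the specification list above.
HEIGHT_IN = 14.0        # chassis height, inches
RACK_UNIT_IN = 1.75     # height of one rack unit (1U), inches
WEIGHT_LB = 287.6       # maximum weight, pounds
LB_TO_KG = 0.45359237   # exact definition of the international pound
SYSTEM_MAX_W = 10_200   # maximum system power usage, watts
PSU_RATED_W = 3_300     # rating of a single power supply, watts


def rack_units(height_in: float) -> float:
    """Convert a chassis height in inches to rack units."""
    return height_in / RACK_UNIT_IN


def pounds_to_kg(lb: float) -> float:
    """Convert pounds to kilograms."""
    return lb * LB_TO_KG


def min_supplies(load_w: float, psu_w: float) -> int:
    """Smallest number of supplies whose combined rating covers the load.

    Simplifying assumption: even load sharing, no conversion losses,
    and no redundancy requirement.
    """
    return math.ceil(load_w / psu_w)


if __name__ == "__main__":
    print(f"{rack_units(HEIGHT_IN):.0f}U")            # 8U
    print(f"{pounds_to_kg(WEIGHT_LB):.2f} kg")        # 130.45 kg
    print(min_supplies(SYSTEM_MAX_W, PSU_RATED_W))    # 4
```

The 8U and 130.45 kg results match the listed figures. Any supplies installed beyond the computed minimum of four provide headroom for redundancy.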
External Port Connections and Controls
The following diagram shows the motherboard connections and controls in a DGX H100 system.
- Slot 1: Dual-port ConnectX-7 card
- Slot 2: Dual-port ConnectX-7 card
- Slot 3: 100 Gb/s Ethernet NIC
- Slot 4: M.2 PCIe carrier for dual 1.92 TB NVMe boot drives
- 2× USB ports (keyboard or storage)
- Serial I/O port
- VGA port (monitor)
- 1 GbE RJ-45 for remote system management
- 10 GbE RJ-45 for remote host
Motherboard Tray Components
The CPU motherboard tray is the central component of a server, in standard servers and HPC servers alike. It houses the CPU motherboard, system memory, network cards, PCIe switches, and other components. The following image shows the motherboard tray components in the DGX H100.
- System memory: 32 DIMMs providing 2 TB in total (64 GB per DIMM).
- Out-of-band system management (BMC): supports Redfish, IPMI, SNMP, KVM, and a web user interface.
- In-band system management: a dual-port 100 GbE NIC and a 10 GbE RJ-45 interface.
- Storage:
Operating system storage: 2× 1.92 TB NVMe M.2 SSDs (RAID 1 array).
Data cache storage: 8× 3.84 TB NVMe U.2 SEDs (self-encrypting drives, RAID 0 array).
- Network:
Cluster network: 4 OSFP ports, supporting InfiniBand (up to 400 Gb/s) and Ethernet (up to 400 GbE).
Storage network: 2 NVIDIA ConnectX-7 dual-port cards, supporting Ethernet (up to 400 GbE) and InfiniBand (up to 400 Gb/s).
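Usable storage depends on the RAID level as much as on the raw drive count. Assuming identical drives and ignoring file-system overhead, the Python sketch below computes the usable capacity of the two arrays in the storage list. The function name `usable_tb` is hypothetical, and only RAID 0 and RAID 1 are handled.

```python
def usable_tb(level: int, drives: int, size_tb: float) -> float:
    """Usable capacity (decimal TB) of a simple array of identical drives.

    RAID 0 stripes data across every drive, so all raw capacity is usable.
    RAID 1 mirrors the same data onto every drive, so only one drive's
    worth of capacity is usable.
    """
    if drives < 1:
        raise ValueError("need at least one drive")
    if level == 0:
        return drives * size_tb
    if level == 1:
        if drives < 2:
            raise ValueError("RAID 1 needs at least two drives")
        return size_tb
    raise ValueError(f"RAID level {level} is not handled in this sketch")


if __name__ == "__main__":
    os_tb = usable_tb(1, 2, 1.92)      # mirrored M.2 boot drives
    cache_tb = usable_tb(0, 8, 3.84)   # striped U.2 cache drives
    print(f"OS volume: {os_tb:.2f} TB, data cache: {cache_tb:.2f} TB")
```

This prints 1.92 TB for the OS volume and 30.72 TB for the data cache (about 27.9 TiB as many operating systems report it). Because RAID 0 has no redundancy, a single drive failure loses the cache contents, which is tolerable for a cache but not for primary data.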
GPU Tray Components
Here is an image of the GPU tray components in a DGX H100 system.
- Graphics processors: 8 NVIDIA H100 GPUs, providing 640 GB of GPU memory in total.
- NVLink: 4 NVIDIA NVSwitch chips connect the GPUs over fourth-generation NVLink, providing 900 GB/s of GPU-to-GPU bandwidth per GPU.
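A quick calculation puts the GPU figures in perspective. The Python sketch below derives per-GPU memory from the 640 GB total and compares NVLink bandwidth with a single 400 Gb/s cluster-network port. It assumes each H100 has 18 NVLink links carrying 25 GB/s per direction (the breakdown behind the 900 GB/s bidirectional figure) and that the 400 Gb/s network rate is per direction; the constant and function names are illustrative only.

```python
GPU_COUNT = 8
TOTAL_GPU_MEMORY_GB = 640

# Assumption: 18 fourth-generation NVLink links per GPU,
# 25 GB/s per direction each (50 GB/s bidirectional).
NVLINK_LINKS_PER_GPU = 18
NVLINK_GB_S_PER_DIRECTION = 25

# Cluster-network port speed, gigabits per second per direction.
NETWORK_PORT_GBIT_S = 400


def memory_per_gpu_gb() -> float:
    """Split the total GPU memory evenly across the GPUs."""
    return TOTAL_GPU_MEMORY_GB / GPU_COUNT


def nvlink_gb_s(bidirectional: bool = True) -> int:
    """NVLink bandwidth of one GPU in GB/s."""
    per_direction = NVLINK_LINKS_PER_GPU * NVLINK_GB_S_PER_DIRECTION
    return 2 * per_direction if bidirectional else per_direction


def gbit_to_gbyte(gbit_s: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbit_s / 8


if __name__ == "__main__":
    port = gbit_to_gbyte(NETWORK_PORT_GBIT_S)
    print(memory_per_gpu_gb())                       # 80.0
    print(nvlink_gb_s())                             # 900
    print(port)                                      # 50.0
    print(nvlink_gb_s(bidirectional=False) / port)   # 9.0
```

Comparing like with like (per direction), one GPU's NVLink bandwidth of 450 GB/s is about nine times that of a single 400 Gb/s (50 GB/s) network port, which is one reason communication-heavy work is kept inside the NVLink domain where possible.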
The GPU board tray is the core assembly of the HPC server. It holds the GPUs, their module boards, and the NVSwitch chips.
DGX H100 System Topology
The following image shows the DGX H100 system topology: how the hardware components in the system are connected, configured, and related to one another.
The Functional Advantages of DGX H100
HPC has become the preferred way to tackle difficult business problems. For enterprises, HPC is not only about performance and functionality; it must also integrate closely with the organization's IT architecture and practices. As a pioneer in HPC infrastructure, NVIDIA designs its DGX systems as a complete HPC platform that meets both requirements.
The system is engineered to maximize HPC throughput, giving enterprises a refined, well-organized, and scalable platform for breakthroughs in natural language processing, recommender systems, data analytics, and more.
The DGX H100 can be deployed in several ways: on-premises under direct management, colocated in NVIDIA DGX-Ready data centers, rented through NVIDIA DGX Foundry, or accessed through NVIDIA-certified managed service providers. The DGX-Ready Lifecycle Management program gives organizations a predictable financial model and keeps their deployment current with the latest technology. Together, these options make DGX H100 as easy to adopt as traditional IT infrastructure, without adding to the workload of busy IT staff.