# NVIDIA GB200 NVL72

Powering the era of accelerated computing.

[Read Datasheet](https://nvdam.widen.net/s/wwnsxrhm2w/blackwell-datasheet-3384703)

Overview

## Unlocking Real-Time Trillion-Parameter Models

The NVIDIA GB200 NVL72 connects 36 Grace CPUs and 72 Blackwell GPUs in a rack-scale, liquid-cooled design. It boasts a 72-GPU NVIDIA NVLink™ domain that acts as a single, massive GPU and delivers 30x faster real-time trillion-parameter large language model (LLM) inference, with 10x greater performance for [mixture-of-experts (MoE) architectures](https://blogs.nvidia.com/blog/mixture-of-experts-frontier-models/).

The GB200 Grace Blackwell Superchip is a key component of the [NVIDIA GB200 NVL72](https://developer.nvidia.com/blog/nvidia-gb200-nvl72-delivers-trillion-parameter-llm-training-and-real-time-inference/), connecting two high-performance NVIDIA Blackwell Tensor Core GPUs to an NVIDIA Grace™ CPU over the NVLink-C2C interconnect.

### The Blackwell Rack-Scale Architecture for Real-Time Trillion-Parameter Inference and Training

The NVIDIA GB200 NVL72 is an exascale computer in a single rack. Its 72 NVIDIA Blackwell GPUs are interconnected by the largest NVIDIA NVLink domain ever offered, with the NVLink Switch System providing 130 terabytes per second (TB/s) of low-latency GPU-to-GPU communication for AI and high-performance computing (HPC) workloads.

[Tech Blog](https://developer.nvidia.com/blog/nvidia-gb200-nvl72-delivers-trillion-parameter-llm-training-and-real-time-inference/)

Highlights

## Supercharging Next-Generation AI and Accelerated Computing

### LLM Inference

30x vs. NVIDIA H100 GPU

### LLM Training

4x vs. H100

### Energy Efficiency

25x vs. H100

### Data Processing

18x vs. CPU

LLM inference and energy efficiency: TTL = 50 milliseconds (ms) real time, FTL = 5 s, 32,768 input / 1,024 output tokens, NVIDIA HGX™ H100 scaled over InfiniBand (IB) vs. GB200 NVL72. Training: 1.8T-parameter MoE, 4,096x HGX H100 scaled over IB vs. 456x GB200 NVL72 scaled over IB; cluster size: 32,768.  
Data processing: a database join-and-aggregation workload with Snappy/Deflate compression derived from the TPC-H Q4 query; custom query implementations for x86, a single H100 GPU, and a single GPU from GB200 NVL72 vs. Intel Xeon 8480+.  
Projected performance subject to change.

### Real-Time LLM Inference

GB200 NVL72 introduces cutting-edge capabilities and a second-generation Transformer Engine, which enables FP4 AI. Coupled with fifth-generation NVIDIA NVLink, it delivers 30x faster real-time LLM inference performance for trillion-parameter language models. This advancement is made possible by a new generation of Tensor Cores, which introduce new microscaling formats optimized for high-throughput, low-latency AI inference. Additionally, the GB200 NVL72 uses NVLink and liquid cooling to create a single massive 72-GPU rack that can overcome communication bottlenecks.
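To make the "microscaling format" idea concrete, here is a minimal NumPy sketch of block-scaled FP4 quantization: each small block of values shares one scale factor so the narrow 4-bit grid covers that block's range. This is an illustrative assumption-laden toy (the E2M1 value grid and 32-element blocks follow the common MX convention), not NVIDIA's Transformer Engine implementation.

```python
import numpy as np

# Representable magnitudes of the FP4 (E2M1) format.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_blocked(x: np.ndarray, block: int = 32) -> np.ndarray:
    """Round-trip x through a block-scaled FP4 value grid.

    x.size must be divisible by `block`. Each block of consecutive values
    shares one scale so that the block's largest magnitude maps to the top
    of the FP4 range (6.0).
    """
    x = np.asarray(x, dtype=np.float64).reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale = np.where(scale == 0, 1.0, scale)      # avoid /0 on all-zero blocks
    mags = np.abs(x) / scale                      # magnitudes now in [0, 6]
    # Snap each scaled magnitude to the nearest representable FP4 value.
    idx = np.abs(mags[..., None] - FP4_GRID).argmin(axis=-1)
    return (np.sign(x) * FP4_GRID[idx] * scale).reshape(-1)
```

Because the scale is chosen per block rather than per tensor, a few large outliers only degrade their own block, which is why block scaling keeps 4-bit inference usable.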

### Massive-Scale Training

GB200 NVL72 features a faster second-generation Transformer Engine, offering FP8 precision and enabling a remarkable 4x faster training for large language models at scale. This breakthrough is complemented by the fifth-generation NVLink, which provides 1.8 TB/s of GPU-to-GPU interconnect, InfiniBand networking, and NVIDIA Magnum IO™ software.
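The per-GPU and rack-level bandwidth figures on this page are consistent with each other: 72 GPUs at 1.8 TB/s each gives roughly the 130 TB/s NVLink-domain aggregate quoted above. A quick arithmetic check:

```python
# Figures quoted on this page.
nvlink_per_gpu_tb_s = 1.8   # fifth-generation NVLink, TB/s per Blackwell GPU
gpus_per_rack = 72          # GB200 NVL72 NVLink domain

aggregate_tb_s = nvlink_per_gpu_tb_s * gpus_per_rack
# 129.6 TB/s, quoted on this page as 130 TB/s
print(f"Aggregate NVLink bandwidth: {aggregate_tb_s:.1f} TB/s")
```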

### Energy-Efficient Infrastructure

Liquid-cooled GB200 NVL72 racks reduce a data center’s carbon footprint and energy consumption. Liquid cooling increases compute density, reduces the amount of floor space used, and facilitates high-bandwidth, low-latency GPU communication with large [NVLink domain architectures](https://www.nvidia.com/en-us/data-center/nvlink.md). Compared to NVIDIA H100 air-cooled infrastructure, GB200 delivers 25x more performance at the same power, while reducing water consumption.

### Data Processing

Databases play critical roles in handling, processing, and analyzing large volumes of data for enterprises. GB200 takes advantage of the high-bandwidth memory performance, [NVLink-C2C](https://www.nvidia.com/en-us/data-center/nvlink-c2c.md), and dedicated decompression engines in the [NVIDIA Blackwell architecture](https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture.md) to speed up key database queries by 18x compared to CPU and deliver a 5x better TCO.

## NVIDIA GB200 NVL4

NVIDIA GB200 NVL4 unlocks the future of converged HPC and AI, delivering revolutionary performance through four NVLink-connected NVIDIA Blackwell GPUs unified with two Grace CPUs over the NVLink-C2C interconnect. Compatible with liquid-cooled NVIDIA MGX™ modular servers, it provides up to 2x performance for scientific computing, AI for science training, and inference applications over the prior generation.

[Read Datasheet](https://nvdam.widen.net/s/wwnsxrhm2w/blackwell-datasheet-3384703)

Features

## Technological Breakthroughs

### Blackwell Architecture

The NVIDIA Blackwell architecture delivers groundbreaking advancements in accelerated computing, powering a new era of computing with unparalleled performance, efficiency, and scale.

[Learn More](https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture.md)

### NVIDIA Grace CPU

The NVIDIA Grace CPU is a breakthrough processor designed for modern data centers running AI, cloud, and HPC applications. It provides outstanding performance and memory bandwidth with 2x the energy efficiency of today’s leading server processors.

[Learn More](https://www.nvidia.com/en-us/data-center/grace-cpu-superchip.md)

### Fifth-Generation NVIDIA NVLink

Unlocking the full potential of exascale computing and trillion-parameter AI models requires swift, seamless communication between every GPU in a server cluster. The fifth generation of NVLink is a scale-up interconnect that unleashes accelerated performance for trillion- and multi-trillion-parameter AI models.

[Learn About NVLink and NVLink Switch](https://www.nvidia.com/en-us/data-center/nvlink.md)

### NVIDIA Networking

The data center’s network plays a crucial role in driving AI advancements and performance, serving as the backbone for distributed AI model training and generative AI performance.  [NVIDIA Quantum-X800 InfiniBand](https://nvdam.widen.net/s/hbp8zz7fvt/solution-overview-gtcspring24-quantum-x800-3175164), [NVIDIA Spectrum™-X800 Ethernet](https://nvdam.widen.net/s/xfmlcbklg5/ethernet-solution-overview-spectrum-x800-gtcspring24-3175614), and [NVIDIA® BlueField®-3 DPUs](https://www.nvidia.com/en-us/networking/products/data-processing-unit.md) enable efficient scalability across hundreds and thousands of Blackwell GPUs for optimal application performance.

[Learn End-to-End Networking Solutions](https://www.nvidia.com/en-us/networking.md)

### AI Factory for the New Industrial Revolution

![GB200 NVL72](https://www.nvidia.com/content/dam/en-zz/Solutions/data-center/gb200-nvl2/gb200-nvl72-datacenter-vid-thumb.jpg)


## NVIDIA Mission Control

NVIDIA Mission Control streamlines AI factory operations, from workloads to infrastructure, with world-class expertise delivered as software. It powers NVIDIA Grace Blackwell data centers, bringing instant agility for inference and training while providing full-stack intelligence for infrastructure resilience. Every enterprise can run AI with hyperscale efficiency, simplifying and accelerating AI experimentation.

[Run Models, Automate the Essentials](https://www.nvidia.com/en-us/data-center/mission-control.md)

Specifications

## GB200 NVL72 Specs¹

|  | **GB200 NVL72** | **GB200 Grace Blackwell Superchip** |
| --- | --- | --- |
| Configuration | 36 Grace CPUs, 72 Blackwell GPUs | 1 Grace CPU, 2 Blackwell GPUs |
| NVFP4 Tensor Core² | 1,440 / 720 PFLOPS | 40 / 20 PFLOPS |
| FP8/FP6 Tensor Core² | 720 PFLOPS | 20 PFLOPS |
| INT8 Tensor Core² | 720 POPS | 20 POPS |
| FP16/BF16 Tensor Core² | 360 PFLOPS | 10 PFLOPS |
| TF32 Tensor Core² | 180 PFLOPS | 5 PFLOPS |
| FP32 | 5,760 TFLOPS | 160 TFLOPS |
| FP64 / FP64 Tensor Core | 2,880 TFLOPS | 80 TFLOPS |
| GPU Memory / Bandwidth | 13.4 TB HBM3E / 576 TB/s | 372 GB HBM3E / 16 TB/s |
| NVLink Bandwidth | 130 TB/s | 3.6 TB/s |
| CPU Core Count | 2,592 Arm® Neoverse V2 cores | 72 Arm Neoverse V2 cores |
| CPU Memory / Bandwidth | 17 TB LPDDR5X / 14 TB/s | Up to 480 GB LPDDR5X / up to 512 GB/s |

1. Specifications shown as sparse / dense.  
2. Specification in sparse; the dense value is one-half the sparse value shown.

## NVIDIA GB300 NVL72

The NVIDIA GB300 NVL72 features a fully liquid-cooled, rack-scale architecture that integrates 72 NVIDIA Blackwell Ultra GPUs and 36 Arm®-based NVIDIA Grace™ CPUs into a single platform, purpose-built for test-time scaling inference and AI reasoning tasks. AI factories accelerated by the GB300 NVL72—leveraging NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet, ConnectX-8 SuperNICs, and NVIDIA Mission Control management—deliver up to a 50x overall increase in AI factory output performance compared to NVIDIA Hopper-based platforms.

[Learn More](https://www.nvidia.com/en-us/data-center/gb300-nvl72.md)

Get Started

## Stay Up to Date

Sign up to hear when NVIDIA Blackwell becomes available.

[Notify Me](#notify-me)