# NVIDIA Vera Rubin NVL72

Building the next frontier of AI.

[Read Press Release](https://nvidianews.nvidia.com/news/rubin-platform-ai-supercomputer)

Overview

## Seven New Chips, One AI Supercomputer

NVIDIA Vera Rubin NVL72 unifies leading-edge technologies from NVIDIA: 72 Rubin GPUs, 36 Vera CPUs, ConnectX®-9 SuperNICs™, and BlueField®-4 DPUs. It scales up intelligence within the rack through the NVIDIA NVLink™ 6 switch and scales out with [NVIDIA Quantum-X800 InfiniBand](https://www.nvidia.com/en-us/networking/products/infiniband/quantum-x800.md) and Spectrum-X™ Ethernet to power the AI industrial revolution at scale. When deployed with NVIDIA Groq 3 LPX racks, Vera Rubin NVL72 delivers a new class of inference performance for trillion-parameter models and million-token contexts.

Vera Rubin NVL72 is built on the third-generation [NVIDIA MGX™ NVL72 rack](https://www.nvidia.com/en-us/data-center/gb200-nvl72.md) design, offering a seamless transition from prior generations. It delivers AI training with one-fourth the GPUs and AI inference at one-tenth the cost per million tokens versus NVIDIA Blackwell. Featuring cable‑free modular tray designs and support from over 80 MGX ecosystem partners, the rack-scale AI supercomputer delivers world‑class performance with rapid deployment.

### NVIDIA Kicks Off the Next Generation of AI With Rubin

The leading-edge platform drives mainstream adoption, slashing cost per token with five breakthroughs for reasoning and agentic AI models.

[Read the Press Release](https://nvidianews.nvidia.com/news/rubin-platform-ai-supercomputer)

### NVIDIA Vera Rubin Opens the Agentic AI Frontier

The NVIDIA Vera Rubin platform offers seven new chips, now in full production, to scale the world’s largest AI factories.

[Read the Press Release](https://nvidianews.nvidia.com/news/nvidia-vera-rubin-platform)

Performance

## Massive Efficiency Gains in AI Training and Inference

### Boosting Training Efficiency

NVIDIA Rubin trains mixture-of-experts (MoE) models with one-fourth the number of GPUs required by the NVIDIA Blackwell architecture.

Projected performance subject to change. Number of GPUs based on a 10T MoE model trained on 100T tokens in a fixed timeframe of one month.
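For intuition on where a GPU count like this comes from, here is a minimal back-of-the-envelope sketch in Python. The 6 × active-parameters × tokens FLOP rule, the 10% active-parameter fraction, and the 40% sustained utilization are illustrative assumptions, not NVIDIA's methodology; only the 35 PFLOPS dense NVFP4 training figure comes from the spec table below.

```python
# Back-of-the-envelope GPU-count estimate for the footnote's training scenario:
# a 10T-parameter MoE trained on 100T tokens within one month.
# Assumptions (illustrative only): 6 * N_active * D training FLOPs,
# 10% of parameters active per token, 40% sustained utilization.

SECONDS_PER_MONTH = 30 * 24 * 3600


def gpus_needed(total_params, active_fraction, tokens, peak_flops_per_gpu, mfu):
    """GPUs required to finish the run within one month."""
    active_params = total_params * active_fraction
    total_flops = 6 * active_params * tokens            # forward + backward
    sustained_flops_per_gpu = peak_flops_per_gpu * mfu
    return total_flops / (sustained_flops_per_gpu * SECONDS_PER_MONTH)


# 35 PFLOPS dense NVFP4 training per Rubin GPU (see the spec table below).
estimate = gpus_needed(10e12, 0.10, 100e12, 35e15, 0.40)
print(f"~{estimate:,.0f} GPUs (order of magnitude only)")
```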

### Driving Down Inference Costs

NVIDIA Rubin delivers one-tenth the cost per million tokens compared to NVIDIA Blackwell for highly interactive, deep-reasoning agentic AI.

LLM inference performance subject to change. Cost per 1 million tokens based on the Kimi-K2-Thinking model with 32K/8K ISL/OSL, comparing Blackwell NVL72 and Rubin NVL72.

Technology Breakthroughs

## Inside the AI Supercomputer

### NVIDIA Rubin GPU

Rubin GPUs pair HBM4 memory with a 50-petaFLOP NVFP4 Transformer Engine, built for the next generation of AI.

[Learn More](https://www.nvidia.com/en-us/data-center/technologies/rubin.md)

### NVIDIA Vera CPU

Vera CPUs are purpose-built for data movement and agentic reasoning, delivering high-bandwidth, energy-efficient compute with deterministic performance.

[Learn More](https://www.nvidia.com/en-us/data-center/vera-cpu.md)

### NVIDIA NVLink 6 Switch

NVLink 6 switches feature 3.6 terabytes per second (TB/s) of all-to-all, scale-up bandwidth per GPU, enabling high-speed GPU-to-GPU communications for AI.

[Learn More](https://www.nvidia.com/en-us/data-center/nvlink.md)

### NVIDIA ConnectX-9 SuperNIC

ConnectX‑9 SuperNICs deliver 1.6 terabits per second (Tb/s) of per-GPU bandwidth, with programmable remote direct-memory access (RDMA) for low‑latency, GPU‑direct networking at massive scale.

[Learn More](https://www.nvidia.com/en-us/networking/products/ethernet/supernic.md)

### NVIDIA BlueField-4 DPU

BlueField-4 DPUs accelerate data processing across storage, networking, cybersecurity, and elastic scaling in AI factories.

[Learn More](https://www.nvidia.com/en-us/networking/products/data-processing-unit.md)

### NVIDIA Spectrum-X Ethernet Co-Packaged Optics

Spectrum‑X Ethernet scale‑out switches with integrated silicon photonics deliver 5x better power efficiency, 10x higher network resiliency, and up to 5x more uptime over traditional networking with pluggable transceivers.

[Learn More](https://www.nvidia.com/en-us/networking/products/silicon-photonics.md)

### NVIDIA Groq 3 LPU

The NVIDIA Groq 3 LPU is the inference accelerator for NVIDIA Vera Rubin NVL72, designed to meet the low-latency and large-context demands of agentic systems. The NVIDIA Groq 3 LPX rack features 256 LPUs with 128 GB of SRAM, 40 PB/s of memory bandwidth, and 640 TB/s of scale-up bandwidth per rack. Co-designed with Vera Rubin NVL72, it delivers 35x inference performance per watt and up to 10x more revenue opportunity for trillion-parameter models relative to Blackwell.

[Learn More](https://www.nvidia.com/en-us/data-center/lpx.md)

Specifications¹

## NVIDIA Vera Rubin NVL72 Specs

|  | NVIDIA Vera Rubin NVL72 | NVIDIA Vera Rubin Superchip | NVIDIA Rubin GPU |
| --- | --- | --- | --- |
| Configuration | 72 NVIDIA Rubin GPUs, 36 NVIDIA Vera CPUs | 2 NVIDIA Rubin GPUs, 1 NVIDIA Vera CPU | 1 NVIDIA Rubin GPU |
| NVFP4 Inference | 3,600 PFLOPS | 100 PFLOPS | 50 PFLOPS |
| NVFP4 Training² | 2,520 PFLOPS | 70 PFLOPS | 35 PFLOPS |
| FP8/FP6 Training² | 1,260 PFLOPS | 35 PFLOPS | 17.5 PFLOPS |
| INT8² | 18 POPS | 0.5 POPS | 0.25 POPS |
| FP16/BF16² | 288 PFLOPS | 8 PFLOPS | 4 PFLOPS |
| TF32² | 144 PFLOPS | 4 PFLOPS | 2 PFLOPS |
| FP32 | 9,360 TFLOPS | 260 TFLOPS | 130 TFLOPS |
| FP64 | 2,400 TFLOPS | 67 TFLOPS | 33 TFLOPS |
| FP32 SGEMM³ | 28,800 TFLOPS | 800 TFLOPS | 400 TFLOPS |
| FP64 DGEMM³ | 14,400 TFLOPS | 400 TFLOPS | 200 TFLOPS |
| GPU Memory | 20.7 TB HBM4 | 576 GB HBM4 | 288 GB HBM4 |
| GPU Memory Bandwidth | 1,580 TB/s | 44 TB/s | 22 TB/s |
| NVLink Bandwidth | 260 TB/s | 7.2 TB/s | 3.6 TB/s |
| NVLink-C2C Bandwidth | 65 TB/s | 1.8 TB/s | - |
| CPU Core Count | 3,168 custom NVIDIA Olympus cores (Arm® compatible) | 88 custom NVIDIA Olympus cores (Arm compatible) | - |
| CPU Memory | 54 TB LPDDR5X | 1.5 TB LPDDR5X | - |
| Total NVIDIA + HBM4 Chips | 1,296 | 30 | 12 |

1. Preliminary information. All values are "up to" figures and subject to change.
2. Dense specification.
3. Peak performance using Tensor Core-based emulation algorithms.
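As a cross-check on the table, most rack-level figures are simply the per-GPU (or per-CPU) figures scaled by the 72 Rubin GPUs and 36 Vera CPUs in the rack, with rounding in the published numbers. A minimal sketch using only values from the table above:

```python
# Rack-level specs reproduced from the per-GPU / per-CPU columns of the table.
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36

per_gpu = {
    "NVFP4 inference (PFLOPS)": 50,   # -> 3,600 PFLOPS per rack
    "NVFP4 training (PFLOPS)": 35,    # -> 2,520 PFLOPS per rack
    "HBM4 capacity (GB)": 288,        # -> 20,736 GB, ~20.7 TB per rack
    "HBM4 bandwidth (TB/s)": 22,      # -> 1,584 TB/s, published as 1,580 TB/s
    "NVLink bandwidth (TB/s)": 3.6,   # -> 259.2 TB/s, published as 260 TB/s
}

for name, value in per_gpu.items():
    print(f"{name}: {value} per GPU -> {value * GPUS_PER_RACK:,} per rack")

# CPU-side figures scale with the 36 Vera CPUs instead.
print(f"Olympus cores: 88 per CPU -> {88 * CPUS_PER_RACK:,} per rack")
print(f"LPDDR5X (TB): 1.5 per CPU -> {1.5 * CPUS_PER_RACK} per rack")
```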

Get Started

## Stay Up to Date on NVIDIA News

Sign up for the latest news, updates, and more from NVIDIA.

[Stay Informed](https://www.nvidia.com/en-us/preferences/email-signup.md)