
# NVIDIA Cloud Accelerator (NCX)


---

![dgx-cloud-diagram-nvidia-cloud-accelerator-cc](https://docscontent.nvidia.com/dims4/default/2ca92da/2147483647/strip/true/crop/2719x1133+28+0/resize/1440x600!/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2F22%2Fa2%2Fe14417134d93b2d76b72ab86eeea%2Fdgx-cloud-diagram-nvidia-cloud-accelerator-cc-3.png)

**What is NVIDIA Cloud Accelerator?**

NVIDIA Cloud Accelerator (NCX) is a portfolio of open, modular infrastructure software components that cloud partners can use to build and operate NVIDIA-powered AI clouds. Built from NVIDIA’s own AI factory learnings, it consists of composable building blocks across infrastructure and platform layers, including hardware lifecycle management, health monitoring, and operational automation.

* [Software Components](#nvidiatab-software-components)
* [Ecosystem Partners](#nvidiatab-ecosystem-partners)
* [Reference Guides](#nvidiatab-reference-guides)

[NVIDIA Cloud Functions](https://docs.nvidia.com/cloud-functions/current/latest/)

NVIDIA Cloud Functions (NVCF) is a unified API layer for scaling inference and simulation workloads across one or more Kubernetes clusters.

[Browse](https://docs.nvidia.com/cloud-functions/current/latest/)
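Once a function is deployed, NVCF exposes it over plain HTTPS. The sketch below builds (but does not send) an invocation request, assuming the `pexec` invocation endpoint and Bearer-token authentication; the function ID and payload are hypothetical placeholders, so check the NVCF documentation for the current API shape.

```python
import json
import urllib.request

# Invocation endpoint pattern for NVCF's HTTPS API; verify the current
# path against the NVCF documentation before relying on it.
NVCF_INVOKE_URL = "https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/{function_id}"


def build_invoke_request(function_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTPS request invoking a deployed function."""
    return urllib.request.Request(
        NVCF_INVOKE_URL.format(function_id=function_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # NVCF API key
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


# Hypothetical function ID, key, and payload, for illustration only.
req = build_invoke_request("my-function-id", "nvapi-REDACTED", {"prompt": "hello"})
```

Keeping request construction separate from transmission makes the call easy to inspect or retry with any HTTP client.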

[KAI Scheduler](https://github.com/NVIDIA/KAI-Scheduler/blob/main/README.md)

KAI Scheduler is a scalable Kubernetes scheduler optimized for GPU resource allocation across large-scale AI and machine learning systems.

[Browse](https://github.com/NVIDIA/KAI-Scheduler/blob/main/README.md)
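Workloads opt into KAI through the standard Kubernetes `schedulerName` field in the pod spec. A minimal sketch follows; the scheduler name and queue-label key are assumptions based on the project README and may differ between releases, so confirm them against the linked repository.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-pod
  labels:
    # Queue assignment; label key assumed from the KAI Scheduler README
    kai.scheduler/queue: team-a
spec:
  # Standard Kubernetes field that routes this pod to the KAI scheduler
  # instead of the default kube-scheduler
  schedulerName: kai-scheduler
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.08-py3
      resources:
        limits:
          nvidia.com/gpu: 1
```

Because scheduling is selected per pod, KAI can run alongside the default scheduler and be adopted one workload at a time.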

[Grove](https://github.com/ai-dynamo/grove/blob/main/README.md)

Grove, a modular component of NVIDIA Dynamo, provides a Kubernetes API for defining and scaling multi-component AI inference workloads.

[Browse](https://github.com/ai-dynamo/grove/blob/main/README.md)

[Dynamo](https://docs.nvidia.com/dynamo)

Dynamo is a distributed inference-serving framework built to deploy models in multi-node environments at data center scale.

[Browse](https://docs.nvidia.com/dynamo)

[NVIDIA Fleet Intelligence](https://docs.nvidia.com/fleet-intelligence/latest/index.html)

NVIDIA Fleet Intelligence is an agent-based managed service that provides continuous GPU health monitoring and predictive failure signals to maximize uptime and stability. Read about [data center fleet management software](https://blogs.nvidia.com/blog/optional-data-center-fleet-management-software/).

[Browse](https://docs.nvidia.com/fleet-intelligence/latest/index.html)

[NCX Infra Controller](https://github.com/NVIDIA/ncx-infra-controller-core/blob/main/README.md)

The NCX Infra Controller provides bare-metal provisioning and secure lifecycle management for multi-tenant GPU infrastructure.

[Browse](https://github.com/NVIDIA/ncx-infra-controller-core/blob/main/README.md)

[AI Cluster Runtime](https://github.com/NVIDIA/aicr/blob/main/README.md)

AI Cluster Runtime provides a canonical, continuously validated definition of the NVIDIA-accelerated Kubernetes runtime for reproducible AI infrastructure. Read [more](https://developer.nvidia.com/blog/validate-kubernetes-for-gpu-infrastructure-with-layered-reproducible-recipes/).

[Browse](https://github.com/NVIDIA/aicr/blob/main/README.md)

[NVSentinel](https://github.com/NVIDIA/NVSentinel/blob/main/README.md)

Open-source, Kubernetes-native GPU monitoring and fault remediation. NVSentinel helps detect issues early and automate recovery to keep GPU fleets productive. Read about [automating Kubernetes AI cluster health with NVSentinel](https://developer.nvidia.com/blog/automate-kubernetes-ai-cluster-health-with-nvsentinel/).

[Browse](https://github.com/NVIDIA/NVSentinel/blob/main/README.md)

[NVIDIA DOCA Platform Framework (DPF)](https://github.com/NVIDIA/doca-platform/blob/public-main/README.md)

The NVIDIA DOCA Platform Framework (DPF) is an orchestration system for building, deploying, and operating BlueField-accelerated infrastructure services, enabling partners to build secure, multi-tenant cloud infrastructure for AI and other modern applications.

[Browse](https://github.com/NVIDIA/doca-platform/blob/public-main/README.md)

[NVIDIA Project GPUd](https://github.com/leptonai/gpud)

Project GPUd is a lightweight, production-proven GPU telemetry agent. It integrates with Docker, Kubernetes, and the broader NVIDIA ecosystem while providing a unified view of critical GPU metrics.

[Browse](https://github.com/leptonai/gpud)

[![ov-dgx-cloud-ari-blog](https://docscontent.nvidia.com/dims4/default/52d2a2c/2147483647/strip/true/crop/1280x533+0+73/resize/1440x600!/quality/90/?url=https%3A%2F%2Fk3-prod-nvidia-docs.s3.us-west-2.amazonaws.com%2Fbrightspot%2F39%2Fd5%2F03e4893844d9b25dc3b54365329f%2Fov-dgx-cloud-ari-blog-1280x680.jpg)](https://www.nvidia.com/en-us/data-center/isv-validation-program/)

[NVIDIA AI Cloud-Ready ISV Validation Initiative](https://www.nvidia.com/en-us/data-center/isv-validation-program/)

The NVIDIA AI Cloud-Ready ISV Validation Initiative qualifies and validates AI infrastructure and platform software from ISVs for deployment on NVIDIA Cloud Partners (NCPs).

[Learn More](https://www.nvidia.com/en-us/data-center/isv-validation-program/)

[NVIDIA Cloud Partner Software Reference Guide](https://docs.nvidia.com/ncx/ncp-software-reference-guide/latest/index.html)

This guide provides NVIDIA Cloud Partners (NCPs), Cloud Service Providers (CSPs), and Independent Software Vendors (ISVs) with an infrastructure-native Northstar reference for building AI cloud services on NCP hardware platforms that support multi-tenancy and elastic resource allocation.

[Browse](https://docs.nvidia.com/ncx/ncp-software-reference-guide/latest/index.html)

[NVIDIA Cloud Partner Inference Reference Architecture](https://docs.nvidia.com/ncx/ncp-inference-ra/latest/index.html)

This document outlines a software architecture to help NVIDIA Cloud Partners (NCPs), sometimes called operators, build a performant, cost-effective solution for large-scale AI inference workloads. It gives NCPs and ISVs a Northstar definition that serves AI practitioners and cloud operators alike.

[Browse](https://docs.nvidia.com/ncx/ncp-inference-ra/latest/index.html)

[NVIDIA Requirements for AI Clouds v2.1](https://docs.nvidia.com/ncx/nvidia-requirements-for-ai-clouds-v2.1.pdf)

These are the standards and expectations for NVIDIA Cloud Partners (NCPs) operating NVIDIA GPU-accelerated AI cloud infrastructure. They cover the full operational stack, from compute and Kubernetes to storage, networking, security, telemetry, and fleet management, and expand on the NVIDIA hardware reference design and NCP Software Reference Guide.

[Browse](https://docs.nvidia.com/ncx/nvidia-requirements-for-ai-clouds-v2.1.pdf)