# Data Processing

Data engines ready for AI.

[NVIDIA cuDF](https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries/cudf) | [NVIDIA cuVS](https://developer.nvidia.com/topics/ai/generative-ai/cuvs)

## New Data Demands

AI agents that transform your enterprise need continuous access to your data, which strains data infrastructure that was not designed for agentic reasoning loops.

By accelerating unstructured and structured data processing with [NVIDIA cuDF](https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries/cudf) and [NVIDIA cuVS](https://developer.nvidia.com/topics/ai/generative-ai/cuvs), enterprises can meet the new volume and velocity of data demands from AI, while leveraging the data infrastructure they've invested in for years.

The world's most popular data engines run on the accelerated computing platform—helping agents access structured data living in tables and unstructured data living as PDFs, emails, images, and videos across the enterprise.

#### NVIDIA cuDF and cuVS Adopted by World's Leading Data Platforms

Learn how leading data platforms are using NVIDIA cuDF and cuVS to accelerate structured analytics and unstructured vector search for AI-ready data.

[Read the Blog](https://blogs.nvidia.com/blog/gtc-2026-news/#data-processing)

### Benefits

## Transform Your Data for AI

### Massive Performance Gains

The accelerated computing platform delivers up to 20x speedup for data processing, enabling enterprises to take action faster with new use cases.

### Significant Cost Savings

By running on the NVIDIA-optimized stack, organizations have saved 80% or more in costs, helping their data infrastructure do more with less.

### Easy to Adopt

The world’s most popular analytics and vector data engines have drop-in accelerators to make adoption straightforward, including Apache Spark, OpenSearch, and more.

### AI-Ready Data

With NVIDIA cuVS supplying context from the 90% of enterprise data stored in PDFs, messages, and emails, and NVIDIA cuDF supplying ground truth from terabytes of structured data processed in minutes, your data is ready for agentic AI.

## CUDA-X for Data Processing

cuDF and cuVS are [CUDA-X™ toolkits](https://developer.nvidia.com/cuda/cuda-x-libraries), built on highly optimized CUDA® primitives, to accelerate the data processing ecosystem.

### cuDF for Structured Data

* Accelerates analytics engines on NVIDIA GPUs
* Includes drop-in accelerators for Apache Spark, Presto, Polars, and DuckDB
* Reduces analytical query times from hours to minutes

[Learn More About cuDF](https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries/cudf)

### cuVS for Unstructured Data

* GPU-accelerated vector search and index building for RAG and AI pipelines
* Integrates with OpenSearch, Elastic, Milvus, and more
* Reduces vector index build times from hours to minutes

[Learn More About cuVS](https://developer.nvidia.com/topics/ai/generative-ai/cuvs)
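To make the vector search operation concrete, here is a minimal CPU-only sketch of exact nearest-neighbor search by cosine similarity. It is not the cuVS API and does not use a GPU. The document names, the three-dimensional embeddings, and the `top_k` helper are all hypothetical, chosen only to show what a vector search returns. A production system would rely on a prebuilt index rather than scanning every vector as this sketch does.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Brute-force search: score every stored vector, keep the k best ids."""
    scored = [
        (cosine_similarity(query, vec), doc_id)
        for doc_id, vec in vectors.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical embeddings for three unstructured documents.
embeddings = {
    "invoice.pdf": [1.0, 0.0, 0.0],
    "meeting_notes.eml": [0.0, 1.0, 0.0],
    "receipt.pdf": [0.9, 0.1, 0.0],
}

results = top_k([1.0, 0.0, 0.0], embeddings, k=2)
print(results)
```

The brute-force scan costs one similarity computation per stored vector for every query, which is why index build time and search throughput become the bottleneck at enterprise scale.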

## Data Processing Ecosystem

From analytical SQL queries to vector search, organizations are adopting [NVIDIA's accelerated computing platform](https://www.nvidia.com/en-us/data-center/solutions/accelerated-computing.md) into their existing data platforms to accelerate AI-ready pipelines.

## Data Processing on NVIDIA Vera

In enterprises running agentic AI workloads at scale, AI agents dramatically increase the number of concurrent, continuous, small-scale queries against structured enterprise data. [NVIDIA Vera](https://www.nvidia.com/en-us/data-center/vera-cpu.md) has 1.2 TB/s of memory bandwidth and a high-speed on-chip fabric, offering the per-core performance, high throughput, and predictability under load needed to support the increased volume and velocity of queries. For the Starburst analytics engine, NVIDIA Vera processed queries up to 3x faster than x86, reducing query execution from minutes to seconds, while the Redpanda streaming engine saw 6x lower p99 latency than x86, improving the reliability of the data engine.
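For readers unfamiliar with the p99 metric quoted above, the following sketch shows one common way to compute it, the nearest-rank method. The latency values are hypothetical and are not taken from any Redpanda or Starburst benchmark. The `percentile_nearest_rank` helper is an illustrative name, and benchmark tools may use a different percentile definition.

```python
def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile (integer pct, 1 to 100), nearest-rank."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 gives the 1-based rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds:
# 98 fast requests plus two slow outliers.
latencies_ms = [5.0] * 98 + [120.0, 300.0]

p50 = percentile_nearest_rank(latencies_ms, 50)
p99 = percentile_nearest_rank(latencies_ms, 99)
print(f"p50 = {p50} ms, p99 = {p99} ms")
```

In this made-up sample the median looks healthy while the p99 exposes the slow tail, which is why tail latency is often used as a measure of predictability under load.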

Coming soon.

[CPU for the Age of AI](https://www.nvidia.com/en-us/data-center/vera-cpu.md)

**Resources**

## The Latest in Data Processing

### NVIDIA cuDF and cuVS Adopted by World's Leading Data Platforms

NVIDIA's accelerated computing platform is fueling modern enterprise data processing. Integrated with the world's most widely used open source data engines—downloaded over 200 million times monthly by developers—these libraries are harnessed across enterprise data platforms, databases, and data lakes.

[Learn More](https://blogs.nvidia.com/blog/gtc-2026-news/#data-processing)

### How Snap Scaled A/B Testing With NVIDIA cuDF

Snap processes 10+ petabytes daily for A/B testing across 940M+ users. Accelerating Apache Spark with NVIDIA cuDF on Google Cloud delivered 4x faster runtimes and 76% cost savings.

[Learn More](https://blogs.nvidia.com/blog/snap-accelerated-data-processing/)

### Accelerating Large-Scale Analytics With Velox and NVIDIA cuDF

IBM and NVIDIA integrate cuDF with the Velox execution engine, enabling GPU-native query execution for Presto and Apache Spark—delivering up to 12x faster analytics than CPU-only systems.

[Learn More](https://developer.nvidia.com/blog/accelerating-large-scale-data-analytics-with-gpu-native-velox-and-nvidia-cudf/)

### Data Is the Ground Truth and Context for AI

Hear CEO Jensen Huang's thoughts on the role of the data processing ecosystem in the age of agentic AI.

[Watch Keynote](https://www.youtube.com/live/jw_o0xr8MWU?t=979&si=JW4hXo0TTky8QHAS)

### IBM Reinvents Data Processing

Presto, the SQL analytics engine in IBM watsonx.data, is accelerated by cuDF for a 5x speedup and 83% cost savings.

[Watch Demo](https://www.youtube.com/watch?v=83cyZerHRaA)

### Processing 100 Million Rows of Data in Under 2 Seconds With Polars

The Polars GPU engine executes Polars code on GPUs for massive speedups.

[Watch Demo](https://youtu.be/AoKeit2Fbmw?si=zyLmw2gNSg4zfrU_)

## Next Steps

## Ready to Learn More?

Get the latest on data processing news, content, and events.

[Stay Informed](#stay-informed)

## cuDF

Open source toolkit for structured data using GPU parallelism and memory bandwidth to accelerate data processing and analytics workflows.

[Get Started With cuDF](https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries/cudf)

## cuVS

Open source library for unstructured vector search and data clustering that enables faster vector searches and index builds.

[Get Started With cuVS](https://developer.nvidia.com/topics/ai/generative-ai/cuvs)

### AWS

With NVIDIA CUDA-X, AWS delivers 3x faster Apache Spark performance on Amazon EMR using NVIDIA RTX PRO™ 6000 Blackwell GPUs with cuDF acceleration, and AWS OpenSearch can now build billion-scale vector databases in under an hour with cuVS acceleration.

[Read the Blog](https://aws.amazon.com/blogs/machine-learning/aws-and-nvidia-deepen-strategic-collaboration-to-accelerate-ai-from-pilot-to-production/)

### Dell AI Data Platform

Dell AI Data Platform with NVIDIA unifies Dell high-performance storage and computing infrastructure with NVIDIA accelerated computing, software, and libraries to power data processing for enterprise-scale agentic AI workflows. With cuDF and cuVS, it delivers up to 12x speedup for vector indexing and 3x for data processing.

[Read the Blog](https://www.dell.com/en-us/blog/ai-at-scale-starts-with-your-data-introducing-the-supercharged-dell-ai-data-platform-with-nvidia/)

### Google Cloud

Google Cloud and NVIDIA deliver GPU-accelerated analytics on G4 VMs, with Snap achieving 76% daily cost savings using Apache Spark accelerated by cuDF on GKE.

[Read the Blog](https://cloud.google.com/blog/products/compute/google-cloud-ai-infrastructure-at-nvidia-gtc-2026)

### HPE

With NVIDIA Vera in HPE ProLiant Compute DL394 Gen12, HPE servers can run data processing workloads with the performance and consistency required for agentic AI workloads.

[Read the Press Release](https://www.hpe.com/us/en/newsroom/press-release/2026/03/hpe-accelerates-secure-scalable-production-ready-ai-through-new-innovations-with-nvidia.html)

### IBM watsonx.data

IBM watsonx.data accelerated Presto with NVIDIA cuDF on NVIDIA A100 GPUs to execute Nestlé’s queries on terabytes of global operations data 5x faster with 83% cost savings.

[Read the Press Release](https://newsroom.ibm.com/2026-03-16-ibm-and-nvidia-announce-expanded-collaboration-at-gtc-2026-to-advance-ai-for-the-enterprise)

### Oracle

Oracle AI Database integrates NVIDIA cuVS and NVIDIA GPUs to accelerate vector index generation by up to 10x, helping enterprises build and refresh indexes faster for semantic search, RAG, and agentic AI applications grounded in governed enterprise data.

[Read the Blog](https://blogs.oracle.com/database/oracle-ai-database-nvidia-collaboration-advances-enterprise-ai-at-nvidia-gtc-2026)

### Redpanda

Redpanda’s real-time data streaming engine, an alternative to Apache Kafka, achieves 6x lower p99 latencies on NVIDIA Vera, delivering the consistent performance enterprises need for their agentic AI workloads.

[Read the Blog](https://www.redpanda.com/blog/nvidia-vera-cpu-performance-benchmark)

### Starburst

On NVIDIA Vera, Starburst’s Trino-powered platform runs up to 3x faster compared to x86 on 1 terabyte of data, supporting AI agents gathering the ground truth of the enterprise.

[Read the Press Release](https://www.starburst.io/press-releases/starburst-announces-day-one-support-for-delivering-unmatched-ai-inference-and-analytics-performance-with-nvidia-vera-cpu/?utm_campaign=Oktopost-Press+FY27&utm_content=Oktopost-LinkedIn&utm_medium=social&utm_source=LinkedIn)