5 things you need to know about the NVIDIA Vera Rubin Platform 

IREN – 4/24/2026


Redefining a unit of compute


The NVIDIA Blackwell architecture and the NVIDIA Vera Rubin platform mark a departure from how a unit of compute has historically been defined. For years, the industry treated the individual GPU or a single server as the primary building block for AI. With NVIDIA Blackwell, NVIDIA introduced a rack-scale architecture that treats the entire data center as a single unified platform, an approach that continues with the NVIDIA Vera Rubin platform.


This evolution is the result of extreme co-design. When AI models grow to trillions of parameters, the physical distance between chips and the speed of the connections between them become the most significant constraints. NVIDIA has addressed this with the Vera Rubin NVL72 system, where every component, from the processors to the cooling, acts as one cohesive unit.


Named after the astronomer who provided evidence for dark matter, the Vera Rubin platform is built for a future defined by Agentic AI. These are systems that move beyond simple chat to reason, plan, and execute multi-step tasks. To power this level of intelligence, the new hardware moves data with a level of efficiency that previous server designs have not been able to reach. 


1. The NVIDIA Vera CPU and the orchestration of data 


The NVIDIA Vera CPU is a processor designed specifically to complement the NVIDIA Rubin GPU. In previous systems, the CPU often acted as a separate controller that could struggle to feed data to the GPU fast enough. This often resulted in stalls where the GPU sat idle while waiting for the CPU to catch up. 


NVIDIA Vera CPUs address this with 88 NVIDIA-designed Olympus cores, optimized for data movement rather than just general-purpose tasks. The Vera CPU also introduces a feature called Spatial Multithreading, which allows the processor to handle 176 threads simultaneously.


Unlike traditional "time-slicing" where a core flips back and forth between tasks, Spatial Multithreading physically partitions the core’s resources. This means two threads can execute at the exact same time without waiting for each other. This architectural change provides the predictable, "zero-lag" performance required for reasoning models that must juggle multiple streams of logic at once. By keeping the data path open, the Vera CPU ensures GPUs stay fully utilized during intensive workloads.
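To make the distinction concrete, here is a deliberately simplified Python sketch of the two scheduling approaches. The core and thread counts come from the figures above; the overhead value and latency formulas are illustrative assumptions of ours, not NVIDIA measurements.

```python
# Simplified model of time-slicing vs. spatial multithreading.
# Core/thread counts are from NVIDIA's Vera CPU announcement; the
# overhead figure and formulas below are illustrative assumptions only.

CORES = 88             # NVIDIA-designed Olympus cores per Vera CPU
THREADS_PER_CORE = 2   # Spatial Multithreading -> 176 hardware threads

def time_sliced_latency(work_units: float, switch_overhead: float = 0.1) -> float:
    """Two threads alternate on one core; every slice adds switching cost."""
    # Each thread gets the core half the time, plus per-unit switch overhead.
    return work_units * 2 * (1 + switch_overhead)

def spatially_partitioned_latency(work_units: float) -> float:
    """Each thread permanently owns half the core's resources."""
    # Half the resources means 2x the raw time, but no switching jitter.
    return work_units * 2

print(f"Hardware threads per CPU: {CORES * THREADS_PER_CORE}")              # 176
print(f"Time-sliced: {time_sliced_latency(100):.0f} time units")            # 220
print(f"Partitioned: {spatially_partitioned_latency(100):.0f} time units")  # 200
```

The point of the model is not the exact numbers but the shape of the result: partitioned execution removes context-switch jitter, which is what makes per-thread latency predictable.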

 

2. Doubling down on memory with HBM4 


AI performance is often limited not by how fast a chip can think, but by how fast it can access data. The Vera Rubin platform moves to HBM4 (High Bandwidth Memory 4) to address this specific hurdle. This is a critical transition because as models grow in complexity, the "memory wall"—the gap between processor speed and memory retrieval—becomes a major bottleneck. 


Each NVIDIA Rubin GPU features 288GB of HBM4, providing a memory bandwidth of 22TB/s. This represents nearly a 3x increase over the previous NVIDIA Blackwell generation. This jump is particularly important for long-context AI, where a model must process and hold massive amounts of information in its active memory at once, such as thousands of pages of technical documentation or high-resolution video streams. 
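A quick back-of-the-envelope calculation shows why that bandwidth figure matters for inference. The sketch below assumes a memory-bound decode in which every weight is streamed from HBM once per token; the 70B-parameter FP8 model is a hypothetical example, and the capacity and bandwidth figures are the ones quoted above.

```python
# Lower-bound decode latency for a memory-bound model on one Rubin GPU.
# HBM figures are from this post; the model size and FP8 weights are
# hypothetical assumptions for illustration.

HBM_CAPACITY_GB = 288      # HBM4 capacity per Rubin GPU
HBM_BANDWIDTH_TBS = 22     # HBM4 bandwidth per Rubin GPU

def min_token_latency_ms(params_billion: float, bytes_per_param: float = 1.0) -> float:
    """Floor on per-token latency if decoding streams all weights once.
    bytes_per_param=1.0 assumes FP8 weights (an illustrative choice)."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return weight_bytes / (HBM_BANDWIDTH_TBS * 1e12) * 1_000

# Hypothetical 70B-parameter dense model in FP8 -> 70 GB of weights
print(f"Fits in HBM:   {70 <= HBM_CAPACITY_GB}")                   # True
print(f"Latency floor: {min_token_latency_ms(70):.2f} ms/token")   # ~3.18 ms
```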


By widening the data highway, the NVIDIA Vera Rubin platform can handle larger models with fewer physical chips. This changes the economics of AI by allowing organizations to achieve higher throughput without adding more servers to the data center. 


3. NVLink 6 and the 260TB/s fabric 


The concept of treating the rack as a single computer relies entirely on the speed of the connections between the components. In the Vera Rubin platform, this is handled by NVLink 6 switches. This interconnect technology allows all 72 GPUs in a single rack to communicate as one giant processor. 


NVLink 6 switches provide 3.6TB/s of bidirectional bandwidth per GPU. When scaled across the NVIDIA Vera Rubin NVL72 rack, the total aggregate bandwidth reaches 260TB/s. This high-speed fabric is what enables the system to maintain efficiency when training or running inference on models with trillions of parameters. 
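The aggregate figure follows directly from the per-GPU number, and it is worth sanity-checking, since it is what the "rack as one processor" claim rests on. The snippet below reproduces that arithmetic; the 500GB data exchange at the end is a hypothetical workload included purely for scale.

```python
# Reproducing the fabric arithmetic quoted above.

GPUS_PER_RACK = 72
NVLINK6_PER_GPU_TBS = 3.6   # bidirectional bandwidth per GPU

aggregate_tbs = GPUS_PER_RACK * NVLINK6_PER_GPU_TBS
print(f"Aggregate fabric bandwidth: {aggregate_tbs:.1f} TB/s")  # 259.2, ~260

# For scale (hypothetical workload): shuffling 500 GB of activations or
# gradients across the whole rack at full fabric speed takes roughly:
print(f"500 GB across the fabric: {500e9 / (aggregate_tbs * 1e12) * 1e3:.2f} ms")
```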


At IREN, we specialize in building the high-performance computing environments needed to support these large GPU clusters, ensuring that the physical infrastructure can keep up with the speed of the latest hardware. 


4. Optimized token economics for MoE models 


The NVIDIA Vera Rubin platform introduces a major shift in how efficiently AI models are trained and deployed, specifically for Mixture of Experts (MoE) architectures. In an MoE model, only a specialized subset of the neural network is activated for any given task, which is a more efficient approach than running the entire model for every query. 
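The sketch below is a generic top-k MoE router in Python, not NVIDIA's software; the expert count, top-k value, and dimensions are arbitrary illustrative choices. It demonstrates the key property described above: each token activates only a small subset of experts, so most of the network stays idle for any given query.

```python
# A toy top-k Mixture of Experts router (generic sketch, not NVIDIA code).
# NUM_EXPERTS, TOP_K, and the dimensions are arbitrary illustrative choices.
import numpy as np

NUM_EXPERTS = 8   # total experts in the layer
TOP_K = 2         # experts actually run per token

def route(tokens: np.ndarray, gate: np.ndarray):
    """Return the top-k expert indices and normalized weights per token."""
    logits = tokens @ gate                                # (n_tokens, n_experts)
    top_idx = np.argsort(logits, axis=1)[:, -TOP_K:]      # best k experts/token
    top = np.take_along_axis(logits, top_idx, axis=1)
    weights = np.exp(top - top.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)         # softmax over top-k
    return top_idx, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 16))          # 4 tokens, 16-dim embeddings
gate = rng.normal(size=(16, NUM_EXPERTS))  # gating matrix (random stand-in)
idx, w = route(tokens, gate)
print(idx)  # each row: the 2 of 8 experts this token is sent to
```

In a real deployment the selected experts live on different GPUs, so every routing decision becomes traffic on the rack fabric, which is why the NVLink speeds discussed in the previous section matter so much here.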


However, coordinating these experts requires massive internal communication speeds. Because the NVIDIA Vera Rubin platform treats the entire rack as a single system, it can manage these data flows much faster than previous designs. NVIDIA has indicated that training a trillion-parameter MoE model on Vera Rubin takes only one quarter the number of GPUs compared to the NVIDIA Blackwell generation.


For inference, this efficiency translates to a 10x reduction in the cost per token. By lowering the computational tax required to generate each word or decision, the Vera Rubin platform makes it more economical for organizations to deploy complex, high-reasoning AI agents at scale.
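Here is a simple way to reason about that claim. In the sketch below, the GPU-hour price and per-GPU throughput are hypothetical placeholders; only the 10x ratio comes from the figures above.

```python
# Illustrative cost-per-token model. The price and throughput inputs are
# hypothetical placeholders; only the 10x ratio is quoted from NVIDIA.

def cost_per_million_tokens(gpu_hour_usd: float, tokens_per_sec: float) -> float:
    """Serving cost per million tokens for one GPU at a given throughput."""
    return gpu_hour_usd / (tokens_per_sec * 3600) * 1e6

baseline = cost_per_million_tokens(gpu_hour_usd=4.00, tokens_per_sec=50)
rubin = baseline / 10   # the 10x cost-per-token reduction cited above

print(f"Baseline:   ${baseline:.2f} per million tokens")   # $22.22
print(f"Vera Rubin: ${rubin:.2f} per million tokens")      # $2.22
```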


5. Thermal innovations for high-density compute


As compute density increases, the methods used to manage thermal output are evolving. The NVIDIA Vera Rubin NVL72 is designed around a fully liquid-cooled configuration as a core part of its architecture. While liquid cooling has historically been an optional retrofit in data centers, the Vera Rubin platform is engineered with liquid cooling as a fundamental component of the system.


The platform utilizes a modular tray design intended to circulate coolant directly to components through high-conductivity cold plates. This generation is also compatible with warm-water cooling standards, allowing operation with water temperatures as high as 45°C. This design reduces the energy spent on cooling, allowing a larger portion of a facility's power capacity to be dedicated to compute.
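The operational payoff can be expressed in one line of arithmetic. In the sketch below, the PUE (power usage effectiveness) values are illustrative assumptions of ours, neither IREN nor NVIDIA figures; the point is how the compute share of a fixed power budget shifts as cooling overhead falls.

```python
# How cooling overhead changes the compute share of a fixed power budget.
# PUE values are illustrative assumptions, not IREN or NVIDIA figures.

def compute_share(facility_mw: float, pue: float):
    """IT power and its fraction of total facility power at a given PUE."""
    it_mw = facility_mw / pue
    return it_mw, it_mw / facility_mw

for label, pue in [("conventional air/chilled water", 1.5),
                   ("warm-water liquid cooling", 1.1)]:
    it_mw, share = compute_share(facility_mw=100, pue=pue)
    print(f"{label:32s} PUE {pue}: {it_mw:5.1f} MW to compute ({share:.0%})")
```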


Building the infrastructure for NVIDIA Vera Rubin NVL72 


IREN Cloud™ is developed with a focus on vertical integration and next-generation infrastructure. IREN designs data centers with the intent to support high-density, liquid-cooled configurations that next-generation platforms like NVIDIA Vera Rubin require.  


The NVIDIA Vera Rubin platform represents a move toward an integrated model in which the data center is viewed as a single unified system. Achieving the performance targets associated with HBM4 memory and NVLink 6 switches requires an environment capable of providing high-density power and advanced thermal management at scale.


By maintaining ownership of the underlying infrastructure and focusing on purpose-built facilities, IREN aims to provide an environment suitable for the requirements of these architectural advancements. 


Have questions about this post?

Reach out and our team will be happy to help.