Industry | June 2, 2026

Nvidia Unveils Nemotron 3 Ultra: A 550-Billion Parameter Open Model Built for Agentic AI

Nvidia unveils Nemotron 3 Ultra at Computex 2026, a 550B parameter open-weights AI model built for agentic workflows with 5x higher throughput.

Nvidia CEO Jensen Huang took the stage at Computex 2026 in Taipei, Taiwan on June 1 to unveil Nemotron 3 Ultra, an open AI model with 500 to 550 billion parameters. As the flagship of Nvidia’s Nemotron 3 lineup, the model marks a significant shift in the company's strategy toward being a full-stack AI platform provider rather than just a silicon manufacturer. Designed explicitly for advanced reasoning, complex planning, and long-running agentic workflows, the model aims to establish a new open-weights baseline for corporate and developer ecosystems.

The model’s release rounds out the Nemotron 3 family, which began with the 30-billion-parameter Nemotron 3 Nano in December 2025 and continued with the 100-to-120-billion-parameter Nemotron 3 Super in March 2026. With the Ultra variant, Nvidia is targeting the high-end enterprise sector where deep reasoning and autonomous system execution are paramount.

An informative infographic detailing the Nemotron 3 model family timeline.

Hybrid Architecture and Technical Achievements

At the heart of Nemotron 3 Ultra is a hybrid Mamba-Transformer Mixture-of-Experts (MoE) architecture. By combining the linear scaling efficiency of Mamba with the expressive attention mechanism of Transformers, Nvidia aims to keep the compute spent per token low relative to the model's total size. Of its 500 to 550 billion total parameters, only 50 to 55 billion (roughly 10%) are active at any given step during inference.
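To make the idea of sparse activation concrete, here is a minimal sketch of generic top-k expert routing, the common mechanism behind MoE layers. It is not Nvidia's implementation: the expert count, the value of k, and the function names are hypothetical, and the article does not say how many experts Nemotron 3 Ultra uses. The last line only reproduces the article's own figures (55 billion active out of 550 billion total).

```python
import math
import random


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_fraction(total_params, active_params):
    """Share of parameters that take part in a single inference step."""
    return active_params / total_params


NUM_EXPERTS = 64  # hypothetical, for illustration only
TOP_K = 4         # hypothetical, for illustration only

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = top_k_route(logits, TOP_K)

# Only TOP_K of NUM_EXPERTS experts would run for this token.
print(f"experts used: {len(routes)} of {NUM_EXPERTS}")
# Upper end of the article's figures: 55B active out of 550B total.
print(f"active share: {active_fraction(550e9, 55e9):.0%}")
```

In a real model, always-on components such as attention or Mamba layers also count toward the active total, so the active share is not simply k divided by the number of experts.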

A technical architectural diagram explaining the hybrid Mamba-Transformer Mixture-of-Experts architecture of the Nemotron 3 Ultra.

This selective activation is driven by "LatentMoE," an advanced routing technique that groups experts more effectively. The model also uses Multi-Token Prediction (MTP), which accelerates generation by predicting more than one token per step. Furthermore, the model was trained using NVFP4, a 4-bit floating-point format that Nvidia says keeps accuracy high while drastically reducing memory requirements.
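A rough back-of-the-envelope calculation shows why 4-bit precision matters at this scale. The sketch below estimates weight storage only, for the upper-end 550 billion parameter figure. The per-block scale overhead (one 8-bit scale per 16 values) is an assumption about how block-scaled 4-bit formats are typically laid out, not a figure from the announcement, and the function name is hypothetical.

```python
def weight_bytes(num_params, bits_per_value, block_size=None, scale_bits=0):
    """Estimate bytes needed to store weights.

    If block_size is given, add one scale factor of scale_bits per block.
    """
    bits = num_params * bits_per_value
    if block_size:
        bits += (num_params / block_size) * scale_bits
    return bits / 8


GB = 1e9           # decimal gigabytes
PARAMS = 550e9     # upper end of the article's parameter range

bf16_gb = weight_bytes(PARAMS, 16) / GB
fp4_gb = weight_bytes(PARAMS, 4) / GB
# Assumed layout: one 8-bit scale shared by every 16 four-bit values.
fp4_scaled_gb = weight_bytes(PARAMS, 4, block_size=16, scale_bits=8) / GB

print(f"16-bit weights:           {bf16_gb:.1f} GB")
print(f"4-bit weights:            {fp4_gb:.1f} GB")
print(f"4-bit with block scales:  {fp4_scaled_gb:.1f} GB")
```

Under these assumptions, 4-bit storage cuts weight memory to roughly a quarter of a 16-bit baseline. Activations, optimizer state, and KV caches are not counted here.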

Nvidia claims these architectural choices deliver up to 5x higher inference throughput and make Nemotron 3 Ultra up to 30% cheaper to operate compared to contemporary open models of a similar scale. The model also boasts a massive context length of up to 1 million tokens, a capacity essential for long-running agent memory and multi-document reasoning.
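The appeal of a hybrid design at a 1 million token context comes down to how work grows with sequence length. The following sketch compares two idealized counts: the token pairs a causal attention layer scores (quadratic growth) and the sequential state updates a recurrent, Mamba-style layer performs (linear growth). It ignores constants, hidden sizes, and how the real model interleaves layer types, so it illustrates scaling only and is not a performance estimate for Nemotron 3 Ultra.

```python
def attention_pair_count(n_tokens):
    """Token pairs scored by causal self-attention.

    Each token attends to itself and every earlier token.
    """
    return n_tokens * (n_tokens + 1) // 2


def recurrent_step_count(n_tokens):
    """State updates in a recurrent scan: one per token."""
    return n_tokens


for n in (8_000, 128_000, 1_000_000):
    pairs = attention_pair_count(n)
    steps = recurrent_step_count(n)
    print(f"{n:>9,} tokens: {pairs:>15,} attention pairs, "
          f"{steps:>9,} recurrent steps, ratio {pairs / steps:,.1f}")
```

The ratio between the two counts grows in proportion to the context length, which is why layers with linear scaling become more attractive as contexts get longer.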

A bar chart comparing inference throughput and operational costs of open-weights AI models.

Performance Benchmarks and Market Positioning

In benchmark testing, Nemotron 3 Ultra scored 48 on the Artificial Analysis Intelligence Index, placing it in the top tier of US open-weights models. However, the international landscape remains fiercely competitive: the Chinese-developed Kimi K2.6 has reportedly scored higher on the same index, underscoring the tight race in global open-weights intelligence.

Nvidia’s open approach is central to its market strategy. Huang emphasized that Nvidia is committed to releasing the model's weights, training data, and recipes. "We're dedicated to building open models for the world, so you can take all of it, add to it, make it even better, make it yours," Huang stated during his keynote speech. "With Nemotron, we're transforming advanced AI into an open platform that gives developers the transparency and efficiency they need to build agentic systems at scale."

A conceptual digital illustration of an autonomous multi-step agentic AI workflow.

The focus on "agentic" capabilities reflects a transition from simple query-and-response chatbots to systems that can plan, execute, and iterate on complex, multi-step tasks with minimal human intervention. Huang dismissed concerns about the technology's impact on employment, declaring that the talk of AI reducing jobs is absolute nonsense and pointing out that more software engineers are currently being hired. He also remarked that AI has evolved to become a direct generator of GDP.

A Broadening AI Ecosystem

The announcement of Nemotron 3 Ultra was accompanied by several other key hardware and software introductions at Computex 2026. Nvidia unveiled the RTX Spark SoC, an Arm-based superchip designed to bring personal AI agents to Windows PCs. For enterprise data centers, the company introduced the Vera CPU and the Vera Rubin AI computing platform, which are built specifically to handle the dense computing demands of agentic workloads.

Nvidia also introduced Cosmos 3, an open-world foundation model geared toward physical AI reasoning, world simulation, and action generation, alongside DLSS 4.5 Ray Reconstruction, which utilizes a second-generation transformer model to improve ray-traced graphics.

Delivery Timeline and Outlook

Nvidia expects to release the public weights, detailed benchmarks, and concrete real-world operational cost breakdowns for Nemotron 3 Ultra around June 4, 2026.

By providing a highly competitive open model with low operational overhead and a 1 million token context window, Nvidia is challenging proprietary model providers. The move allows enterprises to avoid vendor lock-in, customize their agents locally, and use Nvidia's full ecosystem, from the RTX Spark chips on the edge to the Vera Rubin systems in the cloud.
