Now We’re Talking: NVIDIA Releases Open Dataset, Models for Multilingual Speech AI


Of around 7,000 languages in the world, a tiny fraction are supported by AI language models. NVIDIA is tackling the problem with a new dataset and models that support the development of high-quality speech recognition and translation AI for 25 European languages — including languages with limited available data like Croatian, Estonian and Maltese.

These tools will enable developers to more easily scale AI applications to support global users with fast, accurate speech technology for production-scale use cases such as multilingual chatbots, customer service voice agents and near-real-time translation services. They include:

  • Granary, a massive, open-source corpus of multilingual speech datasets that contains around a million hours of audio, including nearly 650,000 hours for speech recognition and over 350,000 hours for speech translation.
  • NVIDIA Canary-1b-v2, a billion-parameter model trained on Granary for high-quality transcription of European languages, plus translation between English and two dozen supported languages.
  • NVIDIA Parakeet-tdt-0.6b-v3, a streamlined, 600-million-parameter model designed for real-time or large-volume transcription of Granary’s supported languages.

The paper behind Granary will be presented at Interspeech, a language processing conference taking place in the Netherlands, Aug. 17-21. The dataset and the new Canary and Parakeet models are now available on Hugging Face.

How Granary Addresses Data Scarcity

To develop the Granary dataset, the NVIDIA speech AI team collaborated with researchers from Carnegie Mellon University and Fondazione Bruno Kessler. The team passed unlabeled audio through an innovative processing pipeline, powered by the NVIDIA NeMo Speech Data Processor toolkit, that turned it into structured, high-quality data.

This pipeline allowed the researchers to enhance public speech data into a usable format for AI training, without the need for resource-intensive human annotation. It’s available in open source on GitHub.
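The published pipeline chains many processors, but its core idea is a quality gate: keep only the machine-labeled clips the pipeline trusts. A minimal sketch of that gating step, with field names and thresholds that are illustrative assumptions rather than Granary's real values:

```python
from dataclasses import dataclass

@dataclass
class Sample:
    audio_path: str
    transcript: str
    duration_s: float
    confidence: float   # pseudo-label confidence from the labeling pass
    is_synthetic: bool  # flagged synthetic/TTS audio

def filter_samples(samples, min_duration=1.0, max_duration=40.0, min_confidence=0.9):
    # Keep non-synthetic clips of reasonable length whose machine-generated
    # transcript clears a confidence bar; all thresholds here are illustrative.
    return [
        s for s in samples
        if not s.is_synthetic
        and min_duration <= s.duration_s <= max_duration
        and s.confidence >= min_confidence
    ]
```

Gating on pseudo-label confidence is what lets such a pipeline scale without human annotation: low-confidence clips are simply dropped rather than hand-corrected.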

With Granary’s clean, ready-to-use data, developers can get a head start building models that tackle transcription and translation tasks in nearly all of the European Union’s 24 official languages, plus Russian and Ukrainian.

For European languages underrepresented in human-annotated datasets, Granary provides a critical resource to develop more inclusive speech technologies that better reflect the linguistic diversity of the continent — all while using less training data.

The team demonstrated in their Interspeech paper that, compared to other popular datasets, it takes around half as much Granary training data to achieve a target accuracy level for automatic speech recognition (ASR) and automatic speech translation (AST).

Tapping NVIDIA NeMo to Turbocharge Transcription

The new Canary and Parakeet models offer examples of the kinds of models developers can build with Granary, customized to their target applications. Canary-1b-v2 is optimized for accuracy on complex tasks, while Parakeet-tdt-0.6b-v3 is designed for high-speed, low-latency tasks.

By sharing the methodology behind the Granary dataset and these two models, NVIDIA is enabling the global speech AI developer community to adapt this data processing workflow to other ASR or AST models or additional languages, accelerating speech AI innovation.

Canary-1b-v2, available under a permissive license, expands the Canary family’s supported languages from four to 25. It offers transcription and translation quality comparable to models 3x larger while running inference up to 10x faster.

NVIDIA NeMo, a modular software suite for managing the AI agent lifecycle, accelerated speech AI model development. NeMo Curator, part of the software suite, enabled the team to filter out synthetic examples from the source data so that only high-quality samples were used for model training. The team also harnessed the NeMo Speech Data Processor toolkit for tasks like aligning transcripts with audio files and converting data into the required formats.

Parakeet-tdt-0.6b-v3 prioritizes high throughput and is capable of transcribing 24-minute audio segments in a single inference pass. The model automatically detects the input audio language and transcribes without additional prompting steps.
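For recordings longer than that single-pass window, a common approach is to split the audio into overlapping chunks and transcribe each one. The sketch below only plans chunk boundaries; the 24-minute limit comes from the article, while the 5-second overlap (used to avoid cutting words at a boundary) is an illustrative assumption, not a Parakeet parameter:

```python
def plan_chunks(total_s: float, max_chunk_s: float = 24 * 60, overlap_s: float = 5.0):
    # Return (start, end) windows in seconds, each no longer than the
    # model's single-pass limit, with a small overlap between neighbors.
    chunks = []
    start = 0.0
    while start < total_s:
        end = min(start + max_chunk_s, total_s)
        chunks.append((start, end))
        if end >= total_s:
            break
        start = end - overlap_s
    return chunks
```

Each planned window would then be passed to the model as one inference call; overlapping regions can be reconciled with the word-level timestamps both models emit.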

Both Canary and Parakeet models provide accurate punctuation, capitalization and word-level timestamps in their outputs.

Read more on GitHub and get started with Granary on Hugging Face.


‘Warhammer 40,000: Dawn of War – Definitive Edition’ Storms GeForce NOW at Launch


Warhammer 40,000: Dawn of War – Definitive Edition is marching onto GeForce NOW, expanding the cloud gaming platform’s library to over 2,300 supported titles.

Battle is just a click away, as the iconic real-time strategy game joins seven new releases this week. Commanders can prepare their squads and steel their nerves on any device — including laptops, Macs, Steam Decks and NVIDIA SHIELD TVs.

Microsoft’s surprise announcement at QuakeCon is now available in the cloud: legendary fantasy shooters Heretic + Hexen have been conjured out of the shadows and are streaming on GeForce NOW.

And don’t miss out on in-game rewards for the popular, free-to-play, massively multiplayer online game World of Tanks as publisher Wargaming celebrates the title’s 15-year anniversary.

GeForce NOW will be at Gamescom 2025 — the world’s largest gaming tradeshow — starting Wednesday, Aug. 20. Stay tuned to GFN Thursday for all the latest updates.

The Emperor’s Call 


The grimdark future calls. Warhammer 40,000: Dawn of War – Definitive Edition storms onto the battlefield with ferocious, squad-based real-time strategy. Command the Space Marines, Orks, Chaos, Eldar and more across four legendary campaigns and nine playable armies. From bolter roars to Waaagh! cries, battles erupt with uncompromising brutality, tactical depth and a healthy dose of swagger.

Fully remastered with enhanced 4K visuals, a refined camera, an improved user interface and more, Dawn of War – Definitive Edition preserves the iconic chaos of the original game while throwing open the gates for creative mayhem. Every charge, psychic blast and last stand is rendered sharper than ever as cunning, courage and unrelenting war decide the fate of worlds.

GeForce NOW delivers the firepower needed to join the frontlines without having to wait for downloads or lengthy installs. Gamers can leap straight into battle, resume campaigns and join multiplayer chaos with just a few clicks. No frames lost to underpowered hardware — every skirmish, every decisive strike is rendered in full glory in the cloud.

Time to Celebrate


Roll out the tanks for World of Tanks’ 15th-anniversary celebration. Join the party by logging into the game every day through Sunday, Aug. 31 for exclusive commemorative rewards.

Here’s what’s on deck: daily in-game giveaways, deep discounts, a pulse-pounding limited-time game mode and a special Battle Pass chapter packed with surprises. Watch for Twitch drops, enjoy increased credit earnings when playing with veteran tankers and dive into a unique photo-album event where each day reveals a new chapter in the evolution of maps, vehicles and epic memories.

Enjoy smooth, lightning-fast gameplay on GeForce NOW — even on modest hardware — and share every explosive moment with friends, fans and fellow commanders. No download hassles, just pure, seamless action.

Get Hexed


Step into the shadowy worlds that shaped fantasy shooters — fully restored by Nightdive Studios. Heretic + Hexen, the cult classics forged by Raven Software, are back with a vengeance, bringing their spell-slinging attitude and dark magic to a whole new generation.

This definitive collection brings together Heretic: Shadow of the Serpent Riders, Hexen: Beyond Heretic and Hexen: Deathkings of the Dark Citadel — plus two brand-new episodes, Heretic: Faith Renewed and Hexen: Vestiges of Grandeur, crafted with id Software and Nightdive Studios.

Dive into over 110 campaign maps, 120 deathmatch arenas, online and split-screen multiplayer modes, 4K 120 frames-per-second (fps) visuals, modern controls and more spell-slinging action than ever.

Experience the arcane might of Heretic + Hexen with GeForce NOW, which offers instant gameplay on nearly any device, with cloud-powered graphics, ultrasmooth performance and zero downloads. Ultimate members can crank up the magic and stream at up to 4K 120 fps — even without the latest hardware, so every exploding tome and fireball looks spellbindingly sharp.

All Aboard for New Games


All aboard, Trailblazers. Honkai Star Rail’s new Version 3.5 “Before Their Deaths” is available to stream on GeForce NOW — no need to wait for patches or to download updates.

The latest version brings two new playable characters, Hysilens and Imperator Cerydra, who bring fresh abilities and strategies to the game. Journey back a thousand years to ancient Okhema, face the ever-shifting menace Lygus and explore the dazzling streets of Styxia, the City of Infinite Revelry. Between epic battles, serve fairy patrons in the Chrysos Maze Grand Restaurant, mix drinks with old friends and uncover secrets that could change everything. Get ready — the next stop on the Astral Express is about to be unforgettable.

In addition, members can look for the following:

  • Echoes of the End (New release on Steam, Aug. 12)
  • 9 Kings (New release on Xbox, available on PC Game Pass, Aug. 14)
  • Warhammer 40,000: Dawn of War – Definitive Edition (New release on Steam, Aug. 14)
  • Supraworld (New release on Steam, Aug. 15)
  • Crash Bandicoot 4: It’s About Time (New release on Steam and Battle.net)
  • Guntouchables (Steam)
  • Heretic + Hexen (Steam and Xbox, available on PC Game Pass)

What are you planning to play this weekend? Let us know on X or in the comments below.


NVIDIA, National Science Foundation Support Ai2 Development of Open AI Models to Drive U.S. Scientific Leadership


NVIDIA is partnering with the U.S. National Science Foundation (NSF) to create an AI system that supports the development of multimodal language models for advancing scientific research in the United States.

The partnership supports the NSF Mid-Scale Research Infrastructure project, called Open Multimodal AI Infrastructure to Accelerate Science (OMAI).

“Bringing AI into scientific research has been a game changer,” said Brian Stone, performing the duties of the NSF director. “NSF is proud to partner with NVIDIA to equip America’s scientists with the tools to accelerate breakthroughs. These investments are not just about enabling innovation; they are about securing U.S. global leadership in science and technology and tackling challenges once thought impossible.”

OMAI, part of the work of the Allen Institute for AI, or Ai2, aims to build a national fully open AI ecosystem to drive scientific discovery through AI, while also advancing the science of AI itself.

NVIDIA’s support of OMAI includes providing NVIDIA HGX B300 systems — state-of-the-art AI infrastructure built to accelerate model training and inference with exceptional efficiency — along with the NVIDIA AI Enterprise software platform, empowering OMAI to transform massive datasets into actionable intelligence and breakthrough innovations.

NVIDIA HGX B300 systems are built with NVIDIA Blackwell Ultra GPUs and feature industry-leading high-bandwidth memory and interconnect technologies to deliver groundbreaking acceleration, scalability and efficiency to run the world’s largest models and most demanding workloads.

“AI is the engine of modern science — and large, open models for America’s researchers will ignite the next industrial revolution,” said Jensen Huang, founder and CEO of NVIDIA. “In collaboration with NSF and Ai2, we’re accelerating innovation with state-of-the-art infrastructure that empowers U.S. scientists to generate limitless intelligence, making it America’s most powerful and renewable resource.”

The contributions will support research teams from the University of Washington, the University of Hawaii at Hilo, the University of New Hampshire and the University of New Mexico. The public-private partnership investment in U.S. technology aligns with recent initiatives outlined by the White House AI Action Plan, which supports America’s global AI leadership.

“The models are part of the national research infrastructure — but we can’t build the models without compute, and that’s why NVIDIA is so important to this project,” said Noah Smith, senior director of natural language processing research at Ai2.

Opening Language Models to Advance American Researchers 

Driving some of the fastest-growing applications in history, today’s large language models (LLMs) have many billions of parameters, or internal weights and biases learned in training. LLMs are trained on trillions of words, and multimodal LLMs can ingest images, graphs, tables and more.

But the power of these so-called frontier models can sometimes be out of reach for scientific research when the parameters, training data, code and documentation are not openly available.

“With the model training data in hand, you have the opportunity to trace back to particular training instances similar to a response, and also more systematically study how emerging behaviors relate to the training data,” said Smith.

NVIDIA’s partnership with NSF to support Ai2’s OMAI initiative provides fully open access to models and data, open-source data interrogation tools to help refine datasets, and documentation and training for early-career researchers — advancing U.S. global leadership in science and engineering.

The Ai2 project — supported by NVIDIA technologies — pledges to make the software and models available at low or zero cost to researchers, similar to open-source code repositories and science-oriented digital libraries. It’s in line with Ai2’s previous work in creating fully open language models and multimodal models, maximizing access.

Driving U.S. Global Leadership in Science and Engineering 

“Winning the AI Race: America’s AI Action Plan” was announced in July by the White House, supported with executive orders to accelerate federal permitting of data center infrastructure and promote exportation of the American AI technology stack.

The OMAI initiative aligns with White House AI Action Plan priorities, emphasizing the acceleration of AI-enabled science and supporting the creation of leading open models to enhance America’s global AI leadership in academic research and education.


Applications Now Open for $60,000 NVIDIA Graduate Fellowship Awards


Bringing together the world’s brightest minds and the latest accelerated computing technology leads to powerful breakthroughs that help tackle some of the biggest research problems.

To foster such innovation, the NVIDIA Graduate Fellowship Program provides grants, mentors and technical support to doctoral students doing outstanding research relevant to NVIDIA technologies. The program, in its 25th year, is now accepting applications worldwide.

It focuses on supporting students working in AI, machine learning, autonomous vehicles, computer graphics, robotics, healthcare, high-performance computing and related fields. Awards are up to $60,000 per student.

Since its start in 2002, the Graduate Fellowship Program has awarded over 200 grants worth more than $7.3 million.

Students must have completed at least their first year of Ph.D.-level studies at the time of application.

The application deadline for the 2026-2027 academic year is Monday, Sept. 15, 2025. An in-person internship at an NVIDIA research office preceding the fellowship year is mandatory; eligible candidates must be available for the internship in summer 2026.

For more on eligibility and how to apply, visit the program website.


FLUX.1 Kontext NVIDIA NIM Microservice Now Available for Download


Black Forest Labs’ FLUX.1 Kontext [dev] image editing model is now available as an NVIDIA NIM microservice.

FLUX.1 models allow users to edit existing images with simple language, without the need for fine-tuning or complex workflows.

Deploying powerful AI requires curation of model variants, adaptation to manage all input and output data, and quantization to reduce VRAM requirements. Models must be converted to work with optimized inference backend software and connected to new AI application programming interfaces.

The FLUX.1 Kontext [dev] NIM microservice simplifies this process, unlocking faster generative AI workflows, and is optimized for RTX AI PCs.

Generative AI in Kontext

FLUX.1 Kontext [dev] is an open-weight generative model built for image editing. It features a guided, step-by-step generation process that makes it easier to control how an image evolves, whether refining small details or transforming an entire scene.

Image generated by FLUX.1 Kontext [dev] with a simple text prompt.

Because the model accepts both text and image inputs, users can easily reference a visual concept and guide how it evolves in a natural and intuitive way. This enables coherent, high-quality image edits that stay true to the original concept.


The FLUX.1 Kontext [dev] NIM microservice provides prepackaged, optimized files that are ready for one-click download through ComfyUI NIM nodes — making them easily accessible to users.

The original image is revised with six prompts to reach the desired result.

NVIDIA and Black Forest Labs worked together to quantize FLUX.1 Kontext [dev], reducing the model size from 24GB to 12GB for FP8 (NVIDIA Ada Generation GPUs) and 7GB for FP4 (NVIDIA Blackwell architecture). The FP8 checkpoint is optimized for GeForce RTX 40 Series GPUs, which have FP8 accelerators in their Tensor Cores. The FP4 checkpoint is optimized for GeForce RTX 50 Series GPUs and uses a new method called SVDQuant, which preserves image quality while reducing model size.
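The size reductions follow directly from bits per weight. FLUX.1 models have roughly 12 billion parameters, so a back-of-the-envelope estimate (weights only, ignoring the per-tensor quantization scales and other overhead that push the real FP4 checkpoint to 7GB) looks like this:

```python
def estimated_weight_size_gb(n_params_billions: float, bits_per_weight: float) -> float:
    # Weight memory only: parameters x bits, converted to gigabytes.
    # Ignores activations, scale factors and runtime buffers.
    return n_params_billions * 1e9 * bits_per_weight / 8 / 1e9

# ~12B parameters is an approximation for FLUX.1 Kontext [dev].
for name, bits in [("BF16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: ~{estimated_weight_size_gb(12, bits):.0f} GB")
```

Halving the bits halves the weight footprint, which is why the FP8 checkpoint lands near 12GB and the FP4 checkpoint near half of that before overhead.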

Speedup compared with BF16 GPU (left, higher is better), and memory usage required to run FLUX.1 Kontext [dev] in different precisions (right, lower is better).

In addition, NVIDIA TensorRT — a framework to access the Tensor Cores in NVIDIA RTX GPUs for maximum performance — provides over 2x acceleration compared with running the original BF16 model with PyTorch.

These dramatic performance gains were previously limited to AI specialists and developers with advanced AI infrastructure knowledge. With the FLUX.1 Kontext [dev] NIM microservice, even enthusiasts can now achieve them.

Get NIMble

FLUX.1 Kontext [dev] is available on Hugging Face with TensorRT optimizations and ComfyUI.

To get started, follow the directions on ComfyUI’s NIM nodes GitHub:

  1. Install NVIDIA AI Workbench.
  2. Get ComfyUI.
  3. Install NIM nodes through the ComfyUI Manager within the app.
  4. Accept the model licenses on Black Forest Labs’ FLUX.1 Kontext [dev] Hugging Face page.
  5. Click “Run,” and the node will prepare the desired workflow and download all necessary models.

NIM microservices are optimized for performance on NVIDIA GeForce RTX and RTX PRO GPUs and include popular models from the AI community. Explore NIM microservices on GitHub and build.nvidia.com.

Each week, the RTX AI Garage blog series features community-driven AI innovations and content for those looking to learn more about NVIDIA NIM microservices and AI Blueprints, as well as building AI agents, creative workflows, productivity apps and more on AI PCs and workstations. 

Plug in to NVIDIA AI PC on Facebook, Instagram, TikTok and X — and stay informed by subscribing to the RTX AI PC newsletter. Join NVIDIA’s Discord server to connect with community developers and AI enthusiasts for discussions on what’s possible with RTX AI.

Follow NVIDIA Workstation on LinkedIn and X

See notice regarding software product information.


Making Safer Spaces: NVIDIA and Partners Bring Physical AI to Cities and Industrial Infrastructure


Physical AI is becoming the foundation of smart cities, facilities and industrial processes across the globe.

NVIDIA is working with companies including Accenture, Avathon, Belden, DeepHow, Milestone Systems and Telit Cinterion to enhance operations across the globe with physical AI-based perception and reasoning.

The continuous loop of simulating, training and deploying physical AI offers sophisticated industrial automation capabilities, making cities and infrastructure safer, smarter and more efficient.

For example, physical AI applications can automate potentially dangerous tasks for workers, such as working with heavy machinery. Physical AI can also improve transportation services and public safety, detect defective products in factories and more.

The need for this is greater than ever. The numbers tell the story:

  • $7 trillion lost annually to poor quality and defects in manufacturing.
  • Roughly 2.8 million workers die annually from occupational accidents and work-related diseases.
  • 514,000 industrial robots installed worldwide in 2024.
  • $300 billion spent per year on public order and safety in the EU.
  • A projected global labor shortage of 50 million workers by 2030.

Infrastructure that can perceive, reason and act relies on video sensors and the latest vision AI capabilities. Using the NVIDIA Metropolis platform — which simplifies the development, deployment and scaling of video analytics AI agents and services from the edge to the cloud — developers can build visual perception into their facilities faster to enhance productivity and improve safety across environments.

Below are five leading companies advancing physical AI — and five key NVIDIA Metropolis updates, announced today at the SIGGRAPH computer graphics conference, making such advancements possible.

Five Companies Advancing Physical AI

Global professional services company Accenture is collaborating with Belden, a leading provider of complete connection solutions, to enhance worker safety by creating smart virtual fences that factories can place around large robots to prevent accidents with human operators.


The smart virtual fence is a physical AI safety system that uses an OpenUSD-based digital twin and physics-grounded simulation to model complex industrial environments. Using computer vision-based mapping and 3D spatial intelligence, the system is adaptive to increased variability in the dynamic human-robot interactions that occur in a modern shopfloor environment.

Accenture taps into the NVIDIA Omniverse platform and Metropolis to build and simulate these smart fences. With Omniverse, Accenture created a digital twin of a robot arm and workers moving in a space. And with Metropolis, the company trained its AI models and deployed them at the edge with video ingestion and the NVIDIA DeepStream software development kit (SDK)’s real-time inference capabilities.

Avathon, an industrial automation platform provider, uses the NVIDIA Blueprint for video search and summarization (VSS), part of NVIDIA Metropolis, to provide manufacturing and energy facilities with real-time insights that improve operational efficiency and worker safety.

Reliance British Petroleum Mobility Limited, a leader in India’s fuel and mobility sector, used the Avathon video intelligence product during the construction of its gas stations to achieve higher standards of safety compliance, a reduction in safety noncompliance incidents and higher productivity by saving thousands of work hours.

DeepHow has developed a “Smart Know-How Companion” for employees in manufacturing and other industries. The companion uses the Metropolis VSS blueprint to transform key workflows into bite-sized, multilingual videos and digital instructions, improving onboarding, safety and floor operator efficiency.

Facing upskilling needs and retiring skilled workers, beverage company Anheuser-Busch InBev turned to the DeepHow platform to convert standard operating procedures into easy-to-understand visual guides. This has slashed onboarding time by 80%, boosted training consistency and improved long-term knowledge retention for employees.

Milestone Systems, which offers one of the world’s largest platforms for managing IP video sensor data in complex industrial and city deployments, is creating the world’s largest real-world computer vision data library through its platform, Project Hafnia. Among its capabilities, the platform provides physical AI developers with access to customized vision language models (VLMs).

Tapping NVIDIA NeMo Curator, Milestone Systems built a VLM fine-tuned for intelligent transportation systems for use within the VSS blueprint to help develop AI agents that better manage city roadways. Milestone Systems is also looking to use the new open, customizable NVIDIA Cosmos Reason VLM for physical AI.

Internet-of-things company Telit Cinterion has integrated NVIDIA TAO Toolkit 6 into its AI-powered visual inspection platform, which uses vision foundation models like FoundationPose, alongside other NVIDIA models, to support multimodal AI and deliver high-performance inferencing. TAO brings low-code AI capabilities to the Telit platform, enabling manufacturers to quickly develop and deploy accurate, custom AI models for defect detection and quality control.

Five NVIDIA Metropolis Updates for Physical AI

Key updates to NVIDIA Metropolis are enhancing developers’ capabilities to build physical AI applications more quickly and easily:

Cosmos Reason VLM

The latest version of Cosmos Reason — NVIDIA’s advanced open, customizable, 7-billion-parameter reasoning VLM for physical AI — enables contextual video understanding and temporal event reasoning for Metropolis use cases. Its compact size makes it easy to deploy from edge to cloud and ideal for automating traffic monitoring, public safety, visual inspection and intelligent decision-making.

VSS Blueprint 2.4

VSS 2.4 makes it easy to quickly augment existing vision AI applications with Cosmos Reason and deliver powerful new features to smart infrastructure. An expanded set of application programming interfaces in the blueprint gives users more flexibility in choosing specific VSS components and capabilities to augment computer vision pipelines with generative AI.

New Vision Foundation Models

The NVIDIA TAO Toolkit includes a new suite of vision foundation models, along with advanced fine-tuning methods, self-supervised learning and knowledge distillation capabilities, to optimize deployment of physical AI solutions across edge and cloud environments. The NVIDIA DeepStream SDK includes a new Inference Builder to enable seamless deployment of TAO 6 models.

Companies around the world — including Advex AI, Instrumental AI and Spingence — are experimenting with these new models and NVIDIA TAO to build intelligent solutions that optimize industrial operations and drive efficiency.

NVIDIA Isaac Sim Extensions

New extensions in the NVIDIA Isaac Sim reference application help solve common challenges in vision AI development — such as limited labeled data and rare edge-case scenarios. These tools simulate human and robot interactions, generate rich object-detection datasets, and create incident-based scenes and image-caption pairs to train VLMs, accelerating development and improving AI performance in real-world conditions.

Expanded Hardware Support

All of these Metropolis components can now run on NVIDIA RTX PRO 6000 Blackwell GPUs, the NVIDIA DGX Spark desktop supercomputer and the NVIDIA Jetson Thor platform for physical AI and humanoid robotics — so users can develop and deploy from the edge to the cloud.

Cosmos Reason 1 and NVIDIA TAO 6.0 are now available for download. Sign up to be alerted when VSS 2.4, the Cosmos Reason VLM fine-tuning update and NVIDIA DeepStream 8.0 become available.

Watch the NVIDIA Research special address at SIGGRAPH and learn more about how graphics and simulation innovations come together to drive industrial digitalization by joining NVIDIA at the conference, running through Thursday, Aug. 14.

See notice regarding software product information.


CrowdStrike, Uber, Zoom Among Industry Pioneers Building Smarter Agents With NVIDIA Nemotron and Cosmos Reasoning Models for Enterprise and Physical AI Applications


AI agents are poised to deliver as much as $450 billion in revenue gains and cost savings by 2028, according to Capgemini. Developers building these agents are turning to higher-performing reasoning models to improve AI agent platforms and physical AI systems.

At SIGGRAPH, NVIDIA today announced an expansion of two model families with reasoning capabilities — NVIDIA Nemotron and NVIDIA Cosmos — that leaders across industries are using to drive productivity via teams of AI agents and humanoid robots.

CrowdStrike, Uber, Magna, NetApp and Zoom are among the enterprises tapping into these model families.

New NVIDIA Nemotron Nano 2 and Llama Nemotron Super 1.5 models offer the highest accuracy in their size categories for scientific reasoning, math, coding, tool-calling, instruction-following and chat. These new models give AI agents the power to think more deeply and work more efficiently — exploring broader options, speeding up research and delivering smarter results within set time limits.

Think of the model as the brain of an AI agent — it provides the core intelligence. But to make that brain useful for a business, it must be embedded into an agent that understands specific workflows, in addition to industry and business jargon, and operates safely. NVIDIA helps enterprises bridge that gap with leading libraries and AI blueprints for onboarding, customizing and governing AI agents at scale.

Cosmos Reason is a new reasoning vision language model (VLM) for physical AI applications that excels in understanding how the real world works, using structured reasoning to understand concepts like physics, object permanence and space-time alignment.

Cosmos Reason is purpose-built to serve as the reasoning backbone of a robot vision language action (VLA) model, to critique and caption training data for robotics and autonomous vehicles, and to equip runtime visual AI agents with spatial-temporal understanding and reasoning about physical operations, such as in factories or cities.

Nemotron: Highest Accuracy and Efficiency for Agentic Enterprise AI

As enterprises develop AI agents to tackle complex, multistep tasks, models that can provide strong reasoning accuracy with efficient token generation enable intelligent, autonomous decision-making at scale.

NVIDIA Nemotron is a family of advanced open reasoning models that use leading models, NVIDIA-curated open datasets and advanced AI techniques to provide an accurate and efficient starting point for AI agents.

The latest Nemotron models deliver leading efficiency in three ways: a new hybrid model architecture, compact quantized models and a configurable thinking budget that gives developers control over token generation, resulting in 60% lower reasoning costs. Together, these let the models reason more deeply and respond faster without extra time or computing power, producing better results at a lower cost.
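The thinking-budget idea can be sketched in a few lines: cap how many intermediate reasoning tokens the model may spend before it must commit to an answer. The two-phase loop and the stand-in model below are illustrative assumptions for this article, not Nemotron's actual decoding interface.

```python
# Toy sketch of a configurable "thinking budget": the agent may spend at most
# `think_budget` tokens on intermediate reasoning before it must produce an
# answer. `next_token` stands in for one decoding step of an autoregressive model.

def generate_with_budget(next_token, prompt, think_budget, max_answer_tokens=32):
    thinking = []
    # Phase 1: reasoning tokens, hard-capped by the budget.
    while len(thinking) < think_budget:
        tok = next_token(prompt + thinking, phase="think")
        if tok == "</think>":  # the model may also end its reasoning early
            break
        thinking.append(tok)
    # Phase 2: the final answer, generated once reasoning is closed off.
    answer = []
    while len(answer) < max_answer_tokens:
        tok = next_token(prompt + thinking + answer, phase="answer")
        if tok == "<eos>":
            break
        answer.append(tok)
    return thinking, answer

# A stand-in "model" that would reason indefinitely unless cut off, then answers.
def fake_model(context, phase):
    if phase == "think":
        return "step"
    return "<eos>" if context[-1] == "42" else "42"

thinking, answer = generate_with_budget(fake_model, ["question"], think_budget=8)
print(len(thinking), answer)  # → 8 ['42']
```

Raising or lowering `think_budget` trades reasoning depth for latency and token cost, which is the lever the article describes.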

Nemotron Nano 2 provides as much as 6x higher token generation throughput compared with other leading models of its size.

Llama Nemotron Super 1.5 achieves leading performance and the highest reasoning accuracy in its class, empowering AI agents to reason better, make smarter decisions and handle complex tasks independently. It’s now available in NVFP4, or 4-bit floating point, which delivers as much as 6x higher throughput on NVIDIA B200 GPUs compared with NVIDIA H100 GPUs.
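As a rough illustration of what 4-bit floating point means, the sketch below rounds weights onto an FP4 E2M1-style grid of representable magnitudes with a single per-tensor scale. NVIDIA's production NVFP4 recipe adds finer-grained block scaling; this is a simplification to show why the format is so compact.

```python
# Illustrative 4-bit floating-point quantization: weights are rounded onto the
# FP4 E2M1 grid of representable magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6} after
# a per-tensor rescale. Each weight then needs only 4 bits plus a shared scale.

MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
GRID = sorted({m for m in MAGNITUDES} | {-m for m in MAGNITUDES})

def quantize_fp4(values):
    scale = max(abs(v) for v in values) / 6.0 or 1.0  # map the largest weight to ±6
    codes = [min(GRID, key=lambda g: abs(v / scale - g)) for v in values]
    return codes, scale

def dequantize_fp4(codes, scale):
    return [c * scale for c in codes]

weights = [0.03, -0.6, 1.2, -0.02]
codes, scale = quantize_fp4(weights)
print(codes)  # → [0.0, -3.0, 6.0, 0.0]
approx = dequantize_fp4(codes, scale)
```

Small weights collapse to zero and large ones land near their true value, which is why quantized models keep most accuracy while cutting memory traffic, the main driver of the throughput gains cited above.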


The chart above shows that the Nemotron model achieves top reasoning accuracy in the same timeframe and on the same compute budget, delivering the highest accuracy per dollar.

Along with the two new Nemotron models, NVIDIA is also announcing its first open VLM training dataset — Llama Nemotron VLM dataset v1 — with 3 million samples of optical character recognition, visual QA and captioning data that power the previously released Llama 3.1 Nemotron Nano VL 8B model.

In addition to the accuracy of the reasoning models, agents also rely on retrieval-augmented generation to fetch the latest and most relevant information from connected data across disparate sources to make informed decisions. The recently released Llama 3.2 NeMo Retriever embedding model tops three visual document retrieval leaderboards — ViDoRe V1, ViDoRe V2 and MTEB VisualDocumentRetrieval — for boosting agentic system accuracy.
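The retrieval step itself reduces to nearest-neighbor search over embedding vectors. In the toy sketch below, the hand-made vectors and file names are invented placeholders standing in for the output of an embedding model such as a NeMo Retriever model.

```python
# Minimal sketch of the embedding-retrieval step behind retrieval-augmented
# generation: documents and the query are embedded as vectors, and the closest
# documents by cosine similarity are fetched for the agent to read.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, doc_vecs, k=1):
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]

# Invented example corpus; real embeddings would have hundreds of dimensions.
docs = {
    "invoice_2024.pdf": [0.9, 0.1, 0.0],
    "roadmap.pptx": [0.1, 0.8, 0.3],
    "hr_policy.docx": [0.0, 0.2, 0.9],
}
print(retrieve([0.85, 0.2, 0.05], docs))  # → ['invoice_2024.pdf']
```

The leaderboards cited above measure how well a model's embeddings make this nearest-neighbor step return the truly relevant documents.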

Using these reasoning and information retrieval models, a deep research agent built using the AI-Q NVIDIA Blueprint is currently No. 1 for open and portable agents on DeepResearch Bench.

NVIDIA NeMo and NVIDIA NIM microservices support the entire AI agent lifecycle — from development and deployment to monitoring and optimization of the agentic systems.

Cosmos Reason: A Breakthrough in Physical AI

VLMs marked a breakthrough for computer vision and robotics, empowering machines to identify objects and patterns. However, nonreasoning VLMs lack the ability to understand and interact with the real world — meaning they can’t handle ambiguity or novel experiences, nor solve complex multistep tasks.

NVIDIA Cosmos Reason is a new open, customizable, 7-billion-parameter reasoning VLM for physical AI and robotics. Cosmos Reason lets robots and vision AI agents reason like humans, using prior knowledge, physics understanding and common sense to understand and act in the physical world.

Cosmos Reason enables advanced capabilities across robotics and physical AI applications such as training data critiquing and captioning, robot decision-making and video analytics AI agents.

It can help automate the curation and annotation of large, diverse training datasets, accelerating the development of high-accuracy AI models. It can also serve as a sophisticated reasoning engine for robot planning, parsing complex instructions into actionable steps for VLA models, even in new environments.

It also powers video analytics AI agents built on the NVIDIA Blueprint for video search and summarization (VSS), enabled by the NVIDIA Metropolis platform, gleaning valuable insights from massive volumes of stored or live video data. These visually perceptive and interactive AI agents can help streamline operations in factories, warehouses, retail stores, airports, traffic intersections and more by spotting anomalies.
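At its simplest, anomaly spotting means flagging frames that deviate sharply from an expected baseline. The thresholding toy below only illustrates that detect-by-deviation idea; VSS-based agents use learned vision language models, not pixel differencing.

```python
# Toy illustration of one primitive behind video anomaly detection: flag any
# frame whose pixels deviate sharply from a running background estimate.

def flag_anomalies(frames, threshold=30.0, alpha=0.1):
    """frames: list of equal-length grayscale pixel lists (values 0-255)."""
    background = list(frames[0])
    flagged = []
    for i, frame in enumerate(frames[1:], start=1):
        # mean absolute deviation of this frame from the background estimate
        diff = sum(abs(p - b) for p, b in zip(frame, background)) / len(frame)
        if diff > threshold:
            flagged.append(i)
        # exponential moving average keeps the background estimate current
        background = [(1 - alpha) * b + alpha * p for b, p in zip(background, frame)]
    return flagged

quiet = [10, 10, 10, 10]
event = [200, 210, 10, 10]  # something large enters the scene
print(flag_anomalies([quiet, quiet, event, quiet]))  # → [2]
```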

NVIDIA’s robotics research team uses Cosmos Reason for data filtration and curation, and as the “System 2” reasoning VLM behind VLA models such as the next versions of NVIDIA Isaac GR00T NX.

Now Serving: NVIDIA Reasoning Models for AI Agents and Robots Everywhere

Diverse enterprises and consulting leaders are adopting NVIDIA’s latest reasoning models. Leaders spanning cybersecurity to telecommunications are among those working with Nemotron to build enterprise AI agents.

Zoom plans to harness Nemotron reasoning models with Zoom AI Companion to make decisions and manage multistep tasks to take action for users across Zoom Meetings, Zoom Chat and Zoom Docs.

CrowdStrike is testing Nemotron models to enable its Charlotte AI agents to write queries on the CrowdStrike Falcon platform.

Amdocs is using NVIDIA Nemotron models in its amAIz Suite to drive AI agents to handle complex, multistep automation spanning care, sales, network and customer support.

EY is adopting Nemotron Nano 2, given its high throughput, to support agentic AI in large organizations for tax, risk management and finance use cases.

NetApp is currently testing Nemotron reasoning models so that AI agents can search and analyze business data.

DataRobot is working with Nemotron models for its Agent Workforce Platform for end-to-end agent lifecycle management.

Tabnine is working with Nemotron models for suggesting and automating coding tasks on behalf of developers.

Automation Anywhere, CrewAI and Dataiku are among the additional agentic AI software developers integrating Nemotron models into their platforms.

Leading companies across transportation, safety and AI intelligence are using Cosmos Reason to advance autonomous driving, video analytics, and road and workplace safety.

Uber is exploring Cosmos Reason to analyze autonomous vehicle behavior. In addition, Uber is post-training Cosmos Reason to summarize visual data and analyze scenarios like pedestrians walking across highways to perform quality analysis and inform autonomous driving behavior.

Cosmos Reason can also serve as the brain of autonomous vehicles. It lets robots interpret environments and, given complex commands, break them down into tasks and execute them using common sense, even in unfamiliar environments.

Centific is testing Cosmos Reason to enhance its AI-powered video intelligence platform. The VLM enables the platform to process complex video data into actionable insights, helping reduce false positives and improve decision-making efficiency.

VAST is advancing real-time urban intelligence using NVIDIA Cosmos Reason with its AI operating system to process massive video streams at scale. With the VSS Blueprint, VAST can build agents that can identify incidents and trigger responses, turning video streams and metadata into actionable, proactive public safety tools.

Ambient.ai is working with Cosmos Reason's temporal, physics-aware reasoning to enable automated detection of missing personal protective equipment and monitoring of hazardous conditions, helping enhance environmental health and safety across construction, manufacturing, logistics and other industrial settings.

Magna is developing with Cosmos Reason as part of its City Delivery Platform — a fully autonomous, low-cost solution for instant delivery — to help vehicles adapt more quickly to new cities. The model adds world understanding to the vehicles’ long-term trajectory planning.

These models are expected to be available as NVIDIA NIM microservices for secure, reliable deployment on any NVIDIA-accelerated infrastructure for maximum privacy and control. They are planned to be available soon through Amazon Bedrock and Amazon SageMaker AI for Nemotron models, as well as through Azure AI Foundry, Oracle Data Science Platform and Google Vertex AI.

Try Cosmos Reason on build.nvidia.com or download it from Hugging Face or GitHub.

Nemotron Nano 2 and Llama Nemotron Super 1.5 (NVFP4) will be available soon for download. Meanwhile, learn more about Nemotron models and download previous versions.

Download the Llama Nemotron VLM Dataset v1 from Hugging Face.

Watch the NVIDIA Research special address at SIGGRAPH and learn more about how graphics and simulation innovations come together to drive industrial digitalization by joining NVIDIA at the conference, running through Thursday, Aug. 14.

See notice regarding software product information.


Mini Footprint, Mighty AI: NVIDIA Blackwell Architecture Powers AI Acceleration in Compact Workstations


Packing the power of the NVIDIA Blackwell architecture in compact, energy-efficient form factors, the NVIDIA RTX PRO 4000 Blackwell SFF Edition and NVIDIA RTX PRO 2000 Blackwell GPUs are coming soon — delivering AI acceleration for professional workflows across industries.

Applications are becoming increasingly AI accelerated, and more users need AI performance, no matter the size or shape of their workstation.

The RTX PRO 4000 SFF and RTX PRO 2000 feature fourth-generation RT Cores and fifth-generation Tensor Cores, drawing less power in half the size of a traditional GPU.

The new GPUs are designed to bring next-generation performance to a range of professional workflows, providing incredible speedups for engineering, design, content creation, AI and 3D visualization.

Compared with the previous-generation architecture, the RTX PRO 4000 SFF offers up to 2.5x higher AI performance, 1.7x higher ray-tracing performance and 1.5x more bandwidth, delivering greater efficiency at the same 70-watt maximum power consumption.

Optimized for mainstream design and AI workflows, the RTX PRO 2000 offers up to 1.6x faster 3D modeling, 1.4x faster computer-aided design (CAD) performance and 1.6x quicker rendering speeds compared with the previous generation.

The NVIDIA RTX PRO 2000 Blackwell.

CAD and product engineers as well as creatives will benefit from the RTX PRO 2000 GPU’s 1.4x boost in image generation and 2.3x leap in text generation, enabling faster iteration, rapid prototyping and seamless collaboration.

Businesses Tap NVIDIA RTX PRO for Speedups

Businesses across fields including engineering, construction, architecture, media and entertainment, and healthcare are using RTX PRO Blackwell GPUs to instantly accomplish tasks that previously took hours.

The Mile High Flood District protects people, property and the environment in the Denver, Colorado, metro area by managing flood risks with regional watershed planning, early warning systems, stream restoration and stormwater control, in collaboration with local governments.

“Mile High Flood District runs complex flood simulations, massive 3D visualizations and real-time AI workflows — and with nearly double the CUDA cores, NVIDIA RTX PRO 2000 Blackwell is a big step up in performance compared with the NVIDIA RTX 2000 Ada Generation GPU,” said Jon Villines, innovation manager at Mile High Flood District. “NVIDIA RTX PRO allows us to more easily handle increasingly larger geographic information systems, as well as hydraulic and hydrologic datasets.”

The Government of Cantabria Geospatial Office is responsible for analyzing and visualizing high-resolution geographic information system data for government and public use.

“We tested the NVIDIA RTX PRO 2000 Blackwell and were very impressed with its performance on geospatial workloads with Esri ArcGIS Pro,” said Gabriel Ortiz Rico, chief of service of cartography and geographic information systems at the Government of Cantabria. “Fine-tuning of AI models is 2x faster compared with using the RTX 2000 Ada due to the RTX 2000 Blackwell’s additional Tensor Cores and GDDR7 memory.”

Studio Tim Fu (STF) is a London-based design studio specializing in the integration of human creativity and AI with architecture and design.

“The RTX PRO 2000 Blackwell powers our UrbanGPT application for real-time text-to-3D urban design, which can be used to generate dynamic city layouts, track vital metrics like program and floor areas, and produce realistic massing distribution across complex urban design scenarios,” said Tim Fu, director of STF. “From zoning simulations to large-scale massing studies, this technology accelerates our AI-driven design engine with the stability and responsiveness needed for city-scale planning.”

New York-based Thornton Tomasetti is a global engineering and design consulting firm integrating engineering, science, technology and forensic analysis to advance performance, resilience and innovation in the built environment and beyond.

“At Thornton Tomasetti, we’re constantly advancing computational engineering,” said Rob Otani, chief technology officer of Thornton Tomasetti. “We benchmarked the RTX PRO 2000 Blackwell on CORE.Matrix — our in-house, GPU-based Finite Element Analysis solver — running almost 3x faster than with the RTX 2000 Ada and 27x faster than with a standard CPU. This enabled us to accelerate our structural analysis workflows for more iterative, design-integrated engineering.”

Glüxkind is a technology company that creates AI-powered smart baby strollers designed to improve safety, convenience and accessibility for parents and their children.

“Integrating the latest generation of advanced GPUs like the RTX PRO 2000 enables Glüxkind to push the boundaries of what’s possible in AI-powered parenting solutions,” said Kevin Huang, CEO of Glüxkind. “The RTX PRO 2000’s enhanced AI and graphics performance give us the real-time processing power needed to make our smart strollers safer, more responsive and more convenient for families everywhere.”

The Software Driving Innovation

NVIDIA’s software ecosystem enables creators, developers and enterprises to harness the full power of AI and advanced graphics.

The NVIDIA AI Enterprise software suite delivers enterprise-grade tools for building, deploying and scaling production AI — from generative AI and computer vision to speech and natural language solutions — on virtually any infrastructure.

The NVIDIA Cosmos platform offers world foundation models optimized for fast, efficient inference and edge deployment, enabling high-performance AI for robotics, automation and physical AI applications. The Cosmos-Reason1-7B model can run seamlessly on the RTX PRO 4000 SFF, delivering powerful physical AI reasoning capabilities to edge devices, compact workstations and industrial systems.

NVIDIA’s graphics and visualization tools, including the NVIDIA Omniverse platform, bring generative physical AI and simulation to 3D design teams, facilitating digital twins and visual workflows.

In addition, the Blackwell platform builds on NVIDIA’s ecosystem of powerful development tools, NVIDIA CUDA-X libraries, over 6 million developers and close to 6,000 applications to scale performance across thousands of GPUs.

Availability

The NVIDIA RTX PRO 2000 Blackwell and NVIDIA RTX PRO 4000 Blackwell SFF Edition GPUs are coming later this year.

The RTX PRO 2000 is expected to be available from PNY and TD SYNNEX, as well as system builders such as BOXX, Dell Technologies, HP and Lenovo.

The NVIDIA RTX PRO 4000 Blackwell SFF Edition is expected to be available from global distribution partners and leading manufacturing partners such as Dell Technologies, HP and Lenovo.



Amazon Devices & Services Achieves Major Step Toward Zero-Touch Manufacturing With NVIDIA AI and Digital Twins


Using NVIDIA digital twin technologies, Amazon Devices & Services is powering big leaps in manufacturing with a new physical AI software solution.

Deployed this month at an Amazon Devices facility, the company’s innovative, simulation-first approach for zero-touch manufacturing trains robotic arms to inspect diverse devices for product-quality auditing and integrate new goods into the production line — all based on synthetic data, without requiring hardware changes.

This new technology brings together Amazon Devices-created software that simulates processes on the assembly line with products in NVIDIA-powered digital twins. Using a modular, AI-powered workflow, the technology offers faster, more efficient inspections compared with the previously used audit machinery.

Simulating processes and products in digital twins eliminates the need for expensive, time-consuming physical prototyping. This eases manufacturer workflows and reduces the time it takes to get new products into consumers’ hands.

To enable zero-shot manufacturing for the robotic operations, the solution uses photorealistic, physics-enabled representations of Amazon devices and factory work stations to generate synthetic data. This factory-specific data is then used to enhance AI model performance both in simulation and at the real work station, minimizing the simulation-to-real gap before deployment.

It’s a huge step toward generalized manufacturing: the use of automated systems and technologies to flexibly handle a wide variety of products and production processes — even without physical prototypes.

AI, Digital Twins for Robot Understanding

By training robots in digital twins to recognize and handle new devices, Amazon Devices & Services is equipped to build faster, more modular and easily controllable manufacturing pipelines, allowing lines to change from auditing one product to another simply via software.

Robotic actions can be configured to manufacture products purely based on training performed in simulation — including for steps involved in assembly, testing, packaging and auditing.

A suite of NVIDIA Isaac technologies enables Amazon Devices & Services' physically accurate, simulation-first approach.

When a new device is introduced, Amazon Devices & Services puts its computer-aided design (CAD) model into NVIDIA Isaac Sim, an open-source robotics simulation reference application built on the NVIDIA Omniverse platform.

NVIDIA Isaac is used to generate over 50,000 diverse synthetic images from the CAD models for each device, which is crucial for training object- and defect-detection models.
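Generating tens of thousands of varied synthetic images typically relies on domain randomization: each render samples different lighting, pose and background so the trained model doesn't overfit to one simulated appearance. The parameter names and ranges below are illustrative assumptions, not Amazon's actual configuration.

```python
# Hedged sketch of domain randomization for synthetic training data: every
# rendered image gets independently sampled scene parameters. The specific
# parameters and ranges here are invented for illustration.
import random

def sample_scene(rng):
    return {
        "light_intensity": rng.uniform(200.0, 1200.0),     # lux
        "light_temperature": rng.uniform(3000.0, 6500.0),  # kelvin
        "device_yaw_deg": rng.uniform(-180.0, 180.0),
        "camera_jitter_mm": rng.uniform(0.0, 5.0),
        "background_texture": rng.choice(["conveyor", "tray", "foam"]),
    }

rng = random.Random(0)  # fixed seed for reproducibility
scenes = [sample_scene(rng) for _ in range(50_000)]  # one per synthetic image
print(len(scenes), sorted({s["background_texture"] for s in scenes}))
```

In practice a tool like Isaac Sim consumes each sampled parameter set to render one photorealistic, automatically labeled image.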

Then, Isaac Sim processes the data and taps into NVIDIA Isaac ROS to generate robotic arm trajectories for handling the product.

The robot is trained purely on synthetic data and can pick up packages and products of different shapes and sizes to perform cosmetic inspection. Real station (left) and simulated station (right). Image courtesy of Amazon Devices & Services.

The development of this technology was significantly accelerated by AWS through distributed AI model training on Amazon devices’ product specifications using Amazon EC2 G6 instances via AWS Batch, as well as NVIDIA Isaac Sim physics-based simulation and synthetic data generation on Amazon EC2 G6 family instances.

The solution uses Amazon Bedrock — a service for building generative AI applications and agents — to plan high-level tasks and specific audit test cases at the factory based on analyses of product-specification documents. Amazon Bedrock AgentCore will be used for autonomous-workflow planning for multiple factory stations on the production line, with the ability to ingest multimodal product-specification inputs such as 3D designs and surface properties.

To help robots understand their environment, the solution uses NVIDIA cuMotion, a CUDA-accelerated motion-planning library that can generate collision-free trajectories in a fraction of a second on the NVIDIA Jetson AGX Orin module. The nvblox library, part of Isaac ROS, generates distance fields that cuMotion uses for collision-free trajectory planning.
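The distance-field idea can be sketched simply: a trajectory is collision-free if every waypoint keeps a safety margin from the nearest obstacle, as measured by a distance function. The analytic sphere obstacle below is a stand-in for the voxel-based field nvblox reconstructs from sensor data, not the cuMotion API itself.

```python
# Minimal sketch of distance-field collision checking, the role a distance
# field plays in collision-free trajectory planning: accept a trajectory only
# if every waypoint stays a safety margin away from all obstacles.
import math

OBSTACLES = [((0.5, 0.0, 0.3), 0.10)]  # (center, radius) in meters

def distance_to_nearest(point):
    # signed distance: negative means the point is inside an obstacle
    return min(math.dist(point, center) - radius for center, radius in OBSTACLES)

def trajectory_is_safe(waypoints, margin=0.05):
    return all(distance_to_nearest(p) > margin for p in waypoints)

direct = [(0.2, 0.0, 0.3), (0.5, 0.0, 0.3), (0.8, 0.0, 0.3)]  # passes through obstacle
detour = [(0.2, 0.0, 0.3), (0.5, 0.0, 0.6), (0.8, 0.0, 0.3)]  # arcs over it
print(trajectory_is_safe(direct), trajectory_is_safe(detour))  # → False True
```

A planner searches over candidate trajectories and keeps only those that pass this check; GPU acceleration makes it feasible to evaluate many candidates in a fraction of a second.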

FoundationPose, an NVIDIA foundation model trained on 5 million synthetic images for pose estimation and object tracking, helps ensure the Amazon Devices & Services robots know the accurate position and orientation of the devices.

Crucial for the new manufacturing solution, FoundationPose can generalize to entirely new objects without prior exposure, allowing seamless transitions between different products and eliminating the need to collect new data to retrain models for each change.

As part of product auditing, the new solution’s approach is used for defect detection on the manufacturing line. Its modular design allows for future integration of advanced reasoning models like NVIDIA Cosmos Reason.

