Building effective agentic AI systems requires rethinking how technology interacts and delivers value across organizations.
Bartley Richardson, senior director of engineering and AI infrastructure at NVIDIA, joined the NVIDIA AI Podcast to discuss how enterprises can successfully deploy agentic AI systems.
“When I talk with people about agents and agentic AI, what I really want to say is automation,” Richardson said. “It is that next level of automation.”
Richardson explains that AI reasoning models play a critical role in these systems by “thinking out loud” and enabling better planning capabilities.
“Reasoning models have been trained and tuned in a very specific way to think — almost like thinking out loud,” Richardson said. “It’s kind of like when you’re brainstorming with your colleagues or family.”
What makes NVIDIA’s Llama Nemotron models distinctive is that they give users the ability to toggle reasoning on or off within the same model, optimizing for specific tasks.
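In practice, the toggle is driven by the system prompt. Here is a minimal sketch of an OpenAI-style request builder; the "detailed thinking on/off" strings and the model identifier are assumptions based on NVIDIA's published usage examples, so check the model card for the exact convention:

```python
# Sketch: toggling reasoning in a Llama Nemotron chat request.
# The "detailed thinking on/off" system prompt and the model ID below are
# assumptions -- consult the model card for the documented convention.

def build_request(user_prompt: str, reasoning: bool) -> dict:
    """Build an OpenAI-style chat payload with reasoning toggled."""
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return {
        "model": "nvidia/llama-3.1-nemotron-70b-instruct",  # hypothetical ID
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_prompt},
        ],
    }

# Same model, two behaviors: planning-heavy vs. fast direct answers.
plan = build_request("Draft a rollout plan for our agents.", reasoning=True)
fast = build_request("What's the capital of France?", reasoning=False)
print(plan["messages"][0]["content"])  # detailed thinking on
```

The point of the toggle is operational: one deployed model serves both latency-sensitive lookups and planning-heavy agent steps.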
Enterprise IT leaders must acknowledge the multi-vendor reality of modern environments, Richardson explained, saying organizations will have agent systems from various sources working together simultaneously.
“You’re going to have all these agents working together, and the trick is discovering how to let them all mesh together in a somewhat seamless way for your employees,” Richardson said.
To address this challenge, NVIDIA developed the AI-Q Blueprint for building advanced agentic AI systems. Teams can use it to build AI agents that automate complex tasks, break down operational silos and drive efficiency across industries. The blueprint uses the open-source NVIDIA Agent Intelligence (AIQ) toolkit to evaluate and profile agent workflows, making it easier to optimize and ensure interoperability among agents, tools and data sources.
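The profiling idea behind such speedups is easy to picture: measure where time goes in a tool-calling chain, then attack the slowest step. This generic sketch is not the AIQ toolkit's actual API, just an illustration of per-tool timing:

```python
# Generic illustration of per-tool profiling in an agent workflow.
# This is NOT the AIQ toolkit's API -- it only shows the concept.
import time
from functools import wraps

PROFILE: dict[str, float] = {}

def profiled(fn):
    """Record cumulative wall-clock time per tool call."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            PROFILE[fn.__name__] = PROFILE.get(fn.__name__, 0.0) + (
                time.perf_counter() - start
            )
    return wrapper

@profiled
def search_tool(query: str) -> str:
    return f"results for {query}"   # stand-in for a real retrieval call

@profiled
def summarize_tool(text: str) -> str:
    return text.upper()             # stand-in for an LLM summarization call

# Run a small tool chain, then ask which step dominated.
summarize_tool(search_tool("supply chain"))
slowest = max(PROFILE, key=PROFILE.get)
print(sorted(PROFILE))
```

A real profiler also tracks token counts and concurrency, but the optimization loop is the same: measure, find the bottleneck, restructure the chain.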
“We have customers that optimize their tool-calling chains and get 15x speedups through their pipeline using AI-Q,” Richardson said.
He also emphasized the importance of maintaining realistic expectations that still provide significant business value.
“Agentic systems will make mistakes,” Richardson added. “But if it gets you 60%, 70%, 80% of the way there, that’s amazing.”
Time Stamps
1:15 – Defining agentic AI as the next evolution of enterprise automation.
4:06 – How reasoning models enhance agentic system capabilities.
12:41 – Enterprise considerations for implementing multi-vendor agent systems.
19:33 – Introduction to the NVIDIA Agent Intelligence toolkit for observability and traceability.
Enterprises are exploring AI to rethink problem-solving and business processes. These initiatives require the right infrastructure, such as AI factories, which allow businesses to convert data into tokens and outcomes. Rama Akkiraju, vice president of IT for AI and machine learning at NVIDIA, joined the AI Podcast to discuss how enterprises can build the right foundations for AI success, and the critical role of AI platform architects in designing and building AI infrastructure based on specific business needs.
Roboflow’s mission is to make the world programmable through computer vision. By simplifying computer vision development, the company helps bridge the gap between AI and people looking to harness it. Cofounder and CEO Joseph Nelson discusses how Roboflow empowers users in manufacturing, healthcare and automotive to solve complex problems with visual AI.
Agentic AI enables developers to create intelligent multi-agent systems that reason, act and execute complex tasks with a degree of autonomy. Jacob Liberman, director of product management at NVIDIA, explains how agentic AI bridges the gap between powerful AI models and practical enterprise applications.
NVIDIA and Google share a long-standing relationship rooted in advancing AI innovation and empowering the global developer community. This partnership goes beyond infrastructure, encompassing deep engineering collaboration to optimize the computing stack.
The latest innovations stemming from this partnership include significant contributions to community software efforts like JAX, OpenXLA, MaxText and llm-d. These foundational optimizations directly support serving of Google’s cutting-edge Gemini models and Gemma family of open models.
Additionally, performance-optimized NVIDIA AI software like NVIDIA NeMo, NVIDIA TensorRT-LLM, NVIDIA Dynamo and NVIDIA NIM microservices are tightly integrated across Google Cloud, including Vertex AI, Google Kubernetes Engine (GKE) and Cloud Run, to accelerate performance and simplify AI deployments.
NVIDIA Blackwell in Production on Google Cloud
Google Cloud was the first cloud service provider to offer both NVIDIA HGX B200 and NVIDIA GB200 NVL72 with its A4 and A4X virtual machines (VMs).
These new VMs with Google Cloud’s AI Hypercomputer architecture are accessible through managed services like Vertex AI and GKE, enabling organizations to choose the right path to develop and deploy agentic AI applications at scale. Google Cloud’s A4 VMs, accelerated by NVIDIA HGX B200, are now generally available.
Google Cloud’s A4X VMs deliver over one exaflop of compute per rack and support seamless scaling to tens of thousands of GPUs, enabled by Google’s Jupiter network fabric and advanced networking with NVIDIA ConnectX-7 NICs. Google’s third-generation liquid cooling infrastructure delivers sustained, efficient performance even for the largest AI workloads.
Google Gemini Can Now Be Deployed On-Premises With NVIDIA Blackwell on Google Distributed Cloud
Gemini’s advanced reasoning capabilities are already powering cloud-based agentic AI applications — however, some customers in public sector, healthcare and financial services with strict data residency, regulatory or security requirements have so far been unable to tap into the technology.
With NVIDIA Blackwell platforms coming to Google Distributed Cloud — Google Cloud’s fully managed solution for on-premises, air-gapped environments and edge — organizations will now be able to deploy Gemini models securely within their own data centers, unlocking agentic AI for these customers.
NVIDIA Blackwell’s unique combination of breakthrough performance and confidential computing capabilities makes this possible — ensuring that user prompts and fine-tuning data remain protected. This enables customers to innovate with Gemini while maintaining full control over their information, meeting the highest standards of privacy and compliance. Google Distributed Cloud expands the reach of Gemini, empowering more organizations than ever to tap into next-generation agentic AI.
Optimizing AI Inference Performance for Google Gemini and Gemma
Designed for the agentic era, the Gemini family of models represents Google’s most advanced and versatile AI models to date, excelling at complex reasoning, coding and multimodal understanding.
NVIDIA and Google have worked on performance optimizations to ensure that Gemini-based inference workloads run efficiently on NVIDIA GPUs, particularly within Google Cloud’s Vertex AI platform. This enables Google to serve a significant amount of user queries for Gemini models on NVIDIA-accelerated infrastructure across Vertex AI and Google Distributed Cloud.
In addition, the Gemma family of lightweight, open models has been optimized for inference using the NVIDIA TensorRT-LLM library and is expected to be offered as easy-to-deploy NVIDIA NIM microservices. These optimizations maximize performance and make advanced AI more accessible, enabling developers to run their workloads on deployment architectures ranging from data centers to local NVIDIA RTX-powered PCs and workstations.
Building a Strong Developer Community and Ecosystem
NVIDIA and Google Cloud are also supporting the developer community by optimizing open-source frameworks like JAX for seamless scaling and breakthrough performance on Blackwell GPUs — enabling AI workloads to run efficiently across tens of thousands of nodes.
The collaboration extends beyond technology, with the launch of a new joint Google Cloud and NVIDIA developer community that brings experts and peers together to accelerate cross-skilling and innovation.
By combining engineering excellence, open-source leadership and a vibrant developer ecosystem, the companies are making it easier than ever for developers to build, scale and deploy the next generation of AI applications.
See notice regarding software product information.
Over a century ago, Henry Ford pioneered the mass production of cars and engines to provide transportation at an affordable price. Today, the technology industry manufactures the engines for a new kind of factory — those that produce intelligence.
As companies and countries increasingly focus on AI and move from experimentation to implementation, the demand for AI technologies continues to grow exponentially. Leading system builders are racing to ramp up production of AI servers, the engines of AI factories, to meet the world’s exploding demand for intelligence.
Dell Technologies is a leader in this renaissance. Dell and NVIDIA have partnered for decades and continue to push the pace of innovation. In its last earnings call, Dell projected that its AI server business will grow to at least $15 billion this year.
“We’re on a mission to bring AI to millions of customers around the world,” said Michael Dell, chairman and chief executive officer, Dell Technologies, in a recent announcement at Dell Technologies World. “With the Dell AI Factory with NVIDIA, enterprises can manage the entire AI lifecycle across use cases, from training to deployment, at any scale.”
The latest Dell AI servers, powered by NVIDIA Blackwell, offer up to 50x more AI reasoning inference output and 5x improvement in throughput compared with the Hopper platform. Customers use them to generate tokens for new AI applications that will help solve some of the world’s biggest challenges, from disease prevention to advanced manufacturing.
Dell servers with NVIDIA GB200 are shipping at scale to a variety of customers, including CoreWeave for its new NVIDIA GB200 NVL72 system. One of Dell’s U.S. factories can ship thousands of NVIDIA Blackwell GPUs to customers in a week. That speed is why one of Dell’s largest customers chose the company to deploy 100,000 NVIDIA GPUs in just six weeks.
But how is an AI server made? We visited a facility to find out.
Building the Engines of Intelligence
We visited one of Dell’s U.S. facilities that builds the most compute-dense NVIDIA Blackwell generation servers ever manufactured.
Modern automobile engines have more than 200 major components and take three to seven years to roll out to market. NVIDIA GB200 NVL72 servers have 1.2 million parts and were designed just a year ago.
Amid a forest of racks, grouped by phases of assembly, Dell employees quickly slide in GB200 trays and NVLink Switch networking trays, then test the systems. The company said its ability to engineer the compute, network and storage assembly under one roof, and to fine-tune, deploy and integrate complete systems, is a powerful differentiator. Speed also matters. The Dell team can build, test and ship a rack, test it again on site at a customer location, and turn it over in 24 hours.
The servers are destined for state-of-the-art data centers that require a dizzying quantity of cables, pipes and hoses to operate. One data center can have 27,000 miles of network cable — enough to wrap around the Earth. It can also pack about six miles of water pipes and 77 miles of rubber hoses, and can circulate 100,000 gallons of water per minute for cooling.
With new AI factories being announced each week – the European Union has plans for seven AI factories, while India, Japan, Saudi Arabia, the UAE and Norway are also developing them – the demand for these engines of intelligence will only grow in the months and years ahead.
GeForce NOW is turning up the heat this summer with a hot new deal. For a limited time, save 40% on six-month Performance memberships and enjoy premium GeForce RTX-powered gaming for half a year.
Members can jump into all the action this summer, whether traveling or staying cool at home. Eleven new games join the cloud this week, headlined by the highly anticipated launch of Capcom’s Onimusha 2 and Netmarble’s Game of Thrones: Kingsroad.
That’s not all — Honkai Star Rail’s latest update is available to play in the cloud this week.
Level Up for Summer
With the Performance membership, gamers can stream at up to 1440p resolution, experience ultrawide support and play in sessions of up to six hours — all without the need for the latest hardware. For a limited time, gamers can get all of these premium features for just $29.99 when signing up for a six-month Performance membership.
Let’s make a deal.
Dive into an ever-expanding library of over 2,000 games, with ray tracing and NVIDIA DLSS technologies in supported titles. Stream top titles such as The Elder Scrolls IV: Oblivion Remastered, DOOM: The Dark Ages, Clair Obscur: Expedition 33 and more. Stream across any device, whether PCs, Macs, mobile devices, SHIELD TVs or Samsung and LG smart TVs.
The deal is available through Sunday, July 6 — first come, first served — so don’t miss out on the chance to upgrade to the cloud at a fraction of the price.
Sharpen the Blade
Slash first, ask questions later.
Onimusha 2: Samurai’s Destiny returns with high-definition graphics and modernized controls for even more vivid counterattacks and intense swordplay. Play as Jubei Yagyu and battle through feudal Japan with allies.
Dive into the epic, demon-slaying adventure with GeForce NOW — no downloads or high-end hardware required. Enjoy crisp visuals and ultra-responsive controls, and stream the pure samurai action across devices.
Remember to Zigzag
Game of Thrones: Kingsroad is Netmarble’s action-adventure role-playing game licensed by Warner Bros. Interactive Entertainment on behalf of HBO.
The king’s road leads to the throne for those bold enough to seize it.
The game brings the continent of Westeros to life with remarkable detail and scale. Encounter familiar characters from the TV series and freely roam iconic regions on an immersive journey through Westeros, including King’s Landing — the continent’s capital — the Castle Black stronghold and the massive icy Wall, which stretches along the northern border.
Players must navigate the complex power struggles between the noble houses of Westeros as they embark on a mission to restore their family’s former glory — all while aiding the Night’s Watch for the final confrontation with the White Walkers and the army of the dead that awaits beyond the Wall.
Choose the cloud gaming house that fits best: a GeForce NOW Performance or Ultimate membership. Premium members command longer gaming sessions, higher resolutions and ultra-low latency compared with free members, and Ultimate members get the mightiest graphics streaming capabilities: up to 4K resolution and 120 frames per second. Join today — because the North remembers, and only those who seize the moment rule the game.
Hammer Time
Praise the emperor — and the discounts.
The Warhammer Skulls Festival is the ultimate annual celebration of Warhammer video games. GeForce NOW members can join the festivities with weeklong discounts on a wide selection of Warhammer titles, all available to stream instantly via the cloud:
Warhammer 40,000: Space Marine 2
Warhammer 40,000: Darktide
Warhammer 40,000: Rogue Trader
Warhammer 40,000: Boltgun
Warhammer: Vermintide 2
Total War: Warhammer III
Blood Bowl 3
Warhammer Battlesector
Warhammer 40,000: Chaos Gate – Daemonhunters
Warhammer 40,000: Gladius
Whether a veteran or new to the Warhammer universe, gamers can experience these iconic titles on GeForce NOW at a fraction of the usual price during the Warhammer Skulls Festival.
More Games, More Glory
Just another day in space.
Honkai: Star Rail version 3.3, “The Fall at Dawn’s Rise,” is now available for members to stream. The update features an epic finale to the Flame-Chase Journey as Trailblazers and the Chrysos Heirs face off against the legendary Sky Titan Aquila. It also introduces two new five-star characters: Hyacine, a compassionate healer with a knack for keeping the team alive, and Cipher, a cunning debuffer who turns enemy strength against them. Fan-favorite characters The Herta and Aglaea also return. Plus, players can dive into fresh limited-time events like the high-speed Penacony Speed Cup and a quirky baseball minigame.
Look for the following games available to stream in the cloud this week:
As robots increasingly make their way to the largest enterprises’ manufacturing plants and warehouses, the need for access to critical business and operational data has never been more crucial.
At its Sapphire conference, SAP announced it is collaborating with NEURA Robotics and NVIDIA to enable its SAP Joule agents to connect enterprise data and processes with NEURA’s advanced cognitive robots.
The integration will enable robots to support tasks including adaptive manufacturing, autonomous replenishment, compliance monitoring and predictive maintenance. Using the Mega NVIDIA Omniverse Blueprint, SAP customers will be able to simulate and validate large robotic fleets in digital twins before deploying them in real-world facilities.
Virtual Assistants Become Physical Helpers
AI agents are traditionally confined to the digital world, leaving them unable to act on physical tasks and do real-world work in warehouses, factories and other industrial workplaces.
SAP’s collaboration with NVIDIA and NEURA Robotics shows how enterprises will be able to use Joule to plan and simulate complex, dynamic scenarios involving physical AI and autonomous humanoid robots. These simulations help address critical planning, safety and project requirements, streamline operations and embody business intelligence in the physical world.
Revolutionizing Supply Chain Management: NVIDIA, SAP Tackle Complex Challenges
Today’s supply chains have become more fragile and complex with ever-evolving constraints around consumer, economic, political and environmental dimensions. The partnership between NVIDIA and SAP aims to enhance supply chain planning capabilities by integrating NVIDIA cuOpt technology — for real-time route optimization — with SAP Integrated Business Planning (IBP) to enable customers to plan and simulate the most complex and dynamic scenarios for more rapid and confident decision making.
By extending SAP IBP with NVIDIA’s highly scalable, GPU-powered platform, customers can accelerate time-to-value by facilitating the management of larger and more intricate models without sacrificing runtime. This groundbreaking collaboration will empower businesses to address unique planning requirements, streamline operations and drive better business outcomes.
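To make the optimization concrete, here is a toy version of the vehicle-routing problem that cuOpt tackles, solved with a simple nearest-neighbor heuristic. cuOpt's GPU solvers are far more sophisticated and handle constraints like time windows and fleet capacity; the coordinates below are made up for illustration:

```python
# Illustrative only: a nearest-neighbor heuristic for the kind of vehicle
# routing problem cuOpt solves at scale on GPUs. This sketch shows the
# problem shape, not cuOpt's actual algorithms or API.
import math

def route(depot, stops):
    """Greedy nearest-neighbor tour starting and ending at the depot."""
    tour, current, remaining = [depot], depot, list(stops)
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nxt)
        tour.append(nxt)
        current = nxt
    tour.append(depot)  # return to the warehouse
    return tour

depot = (0.0, 0.0)
stops = [(2.0, 1.0), (5.0, 0.0), (1.0, 5.0)]
print(route(depot, stops))
```

Real supply-chain instances involve thousands of stops and constraints, which is where GPU acceleration pays off: the search space grows combinatorially with fleet and stop count.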
Supporting Supply Chains With AI and Humanoid Robots
In a keynote demonstration at SAP Sapphire in Orlando, SAP Chief Technology Officer Philipp Herzig offered a preview of the technology integration, showing how Joule — SAP’s generative AI co-pilot — works with real-world data in combination with humanoid robots, and the transformative potential of physical AI for businesses.
“Robots and autonomous agents are at the heart of the next wave of industrial AI, seamlessly connecting people, data, and processes to unlock new levels of efficiency and innovation,” said Herzig. “SAP, NVIDIA and NEURA Robotics share a vision for uniting AI and robotics to improve safety, efficiency and productivity across industries.”
With the integration of the Mega NVIDIA Omniverse Blueprint, enterprises can harness physical AI and digital twins to enable new AI agents that can deliver real-time, contextual insights and automate routine tasks.
The collaboration between the three companies allows NEURA robots to see, learn and adapt in real time — whether restocking shelves, inspecting infrastructure or resolving supply chain disruptions.
SAP Joule AI Agents are deployed directly onto NEURA’s cognitive robots, enabling real-time decision-making and the execution of physical tasks such as inventory audits or equipment repairs. The robots learn continuously in physically accurate digital twins, powered by NVIDIA Omniverse libraries and technologies, and using business-critical data from SAP applications, while the Mega NVIDIA Omniverse Blueprint helps evaluate and validate deployment options and large-scale task interactions.
Showcasing Joule’s Data-Driven Insights on NEURA Robots
The technology demonstration showcases the abilities of NEURA robots to handle tasks guided by Joule’s data-driven insights.
Businesses bridging simulation and real-world deployment can use the integrated technology stack for zero-shot navigation and inference, addressing defect rates and production line inefficiencies without interrupting operations.
Herzig showed how Joule, powered by AI agents, can direct a NEURA 4NE1 robot to inspect a machine in the showroom, scanning equipment and triggering an SAP Asset Performance Management alert before a failure disrupts operations.
It’s but a glimpse into the future of what’s possible with AI agents and robotics.
Industrial AI is transforming how factories operate, innovate and scale.
The convergence of AI, simulation and digital twins is poised to unlock new levels of productivity, flexibility and insight for manufacturers worldwide — and NVIDIA’s collaboration with Siemens is bringing these technologies directly to factory and shop floors, making advanced automation more accessible.
Matthias Loskyll, head of virtual control and industrial AI at Siemens Factory Automation, joined the NVIDIA AI Podcast to discuss how Siemens’ work with NVIDIA is reshaping manufacturing, as the industry hits a turning point.
Manufacturing companies are facing a shortage of skilled labor, widening skills gaps as experts retire and increased demand for resilient, efficient production.
At the same time, AI advancements offer ways to automate tasks previously deemed too complex or variable for traditional programming — and digital twins open a path to designing and optimizing safe, efficient interactions between AI-powered robots and smart spaces.
Siemens’ Inspekto, an AI-driven visual quality inspection system, enables even small manufacturers to automate defect detection in their production lines. Inspekto can be trained in under an hour using as few as 20 product samples, making it ideal for fields like electronics and metal forming.
Meanwhile, automaker Audi is using industrial AI in its car body shops, where 5 million welds are made daily. Training AI models to automate weld-spot inspection and integrating them with Siemens’ Industrial AI Suite helped Audi achieve up to 25x faster inference directly on the shop floor, where the defects can be addressed.
Siemens is creating AI-driven vision software that enables robots to handle arbitrary, previously unseen objects. The company is also developing Industrial Copilots with NVIDIA NIM microservices to bring generative AI-powered assistance directly to shopfloor operators and service technicians. Loskyll noted that the Industrial Copilots will run on premises to keep sensitive production data secure while enabling rapid troubleshooting and process optimization.
To learn more about the latest in industrial AI, watch the COMPUTEX keynote by NVIDIA founder and CEO Jensen Huang. Hear more from Siemens at NVIDIA GTC Paris, running June 10-12.
Time Stamps
1:00 – Overview of NVIDIA’s collaboration with Siemens.
5:00 – Challenges faced by manufacturing companies.
15:00 – How Inspekto makes automated visual quality inspection more accessible.
24:00 – How Audi achieved up to 25x faster inference with Siemens’ Industrial AI Suite.
37:00 – Future directions with industrial copilots and AI-enhanced robotics.
Yum! Brands, the parent company of KFC, Taco Bell, Pizza Hut and Habit Burger & Grill, is partnering with NVIDIA to streamline order taking, optimize operations and enhance service across its restaurants. Joe Park, chief digital and technology officer at Yum! Brands, Inc. and president of Byte by Yum!, shares how the company is further accelerating AI deployment.
Agentic AI is redefining scientific discovery and unlocking research breakthroughs and innovations across industries. Through deepened collaboration, NVIDIA and Microsoft are delivering advancements that accelerate agentic AI-powered applications from the cloud to the PC.
At Microsoft Build, Microsoft unveiled Microsoft Discovery, an extensible platform built to empower researchers to transform the entire discovery process with agentic AI. This will help research and development departments across industries accelerate time to market for new products, while speeding up and expanding the end-to-end discovery process for scientists.
Microsoft Discovery will integrate the NVIDIA ALCHEMI NIM microservice, which optimizes AI inference for chemical simulations, to accelerate materials science research with property prediction and candidate recommendation. The platform will also integrate NVIDIA BioNeMo NIM microservices, tapping into pretrained AI workflows to speed up AI model development for drug discovery. These integrations equip researchers with accelerated performance for faster scientific discoveries.
In testing, researchers at Microsoft used Microsoft Discovery to identify a novel coolant prototype with promising properties for immersion cooling in data centers in under 200 hours, rather than the months or years traditional methods can take.
Advancing Agentic AI With NVIDIA GB200 Deployments at Scale
Microsoft is rapidly deploying tens of thousands of NVIDIA GB200 NVL72 rack-scale systems across its Azure data centers, boosting both performance and efficiency.
Azure’s ND GB200 v6 virtual machines — built on a rack-scale architecture with up to 72 NVIDIA Blackwell GPUs per rack and advanced liquid cooling — deliver up to 35x more inference throughput compared with previous ND H100 v5 VMs accelerated by eight NVIDIA H100 GPUs, setting a new benchmark for AI workloads.
These innovations are underpinned by custom server designs, high-speed NVIDIA NVLink interconnects and NVIDIA Quantum InfiniBand networking — enabling seamless scaling to tens of thousands of Blackwell GPUs for demanding generative and agentic AI applications.
Microsoft chairman and CEO Satya Nadella and NVIDIA founder and CEO Jensen Huang also highlighted how Microsoft and NVIDIA’s collaboration is compounding performance gains through continuous software optimizations across NVIDIA architectures on Azure. This approach maximizes developer productivity, lowers total cost of ownership and accelerates all workloads, including AI and data processing — all while driving greater efficiency per dollar and per watt for customers.
NVIDIA AI Reasoning and Healthcare Microservices on Azure AI Foundry
Building on the NIM integration in Azure AI Foundry, announced at NVIDIA GTC, Microsoft and NVIDIA are expanding the platform with the NVIDIA Llama Nemotron family of open reasoning models and NVIDIA BioNeMo NIM microservices, which deliver enterprise-grade, containerized inferencing for complex decision-making and domain-specific AI workloads.
Developers can now access optimized NIM microservices for advanced reasoning in Azure AI Foundry. These include the NVIDIA Llama Nemotron Super and Nano models, which offer advanced multistep reasoning, coding and agentic capabilities, delivering up to 20% higher accuracy and 5x faster inference than previous models.
Healthcare-focused BioNeMo NIM microservices like ProteinMPNN, RFDiffusion and OpenFold2 address critical applications in digital biology, drug discovery and medical imaging, enabling researchers and clinicians to accelerate protein science, molecular modeling and genomic analysis for improved patient care and faster scientific innovation.
This expanded integration empowers organizations to rapidly deploy high-performance AI agents, connecting to these models and other specialized healthcare solutions with robust reliability and simplified scaling.
Accelerating Generative AI on Windows 11 With RTX AI PCs
Generative AI is reshaping PC software with entirely new experiences — from digital humans to writing assistants, intelligent agents and creative tools. NVIDIA RTX AI PCs make it easy to get started experimenting with generative AI and unlock greater performance on Windows 11.
At Microsoft Build, NVIDIA and Microsoft are unveiling an AI inferencing stack to simplify development and boost inference performance for Windows 11 PCs.
NVIDIA TensorRT has been reimagined for RTX AI PCs, combining industry-leading TensorRT performance with just-in-time, on-device engine building and an 8x smaller package size for seamless AI deployment to the more than 100 million RTX AI PCs.
Announced at Microsoft Build, TensorRT for RTX is natively supported by Windows ML — a new inference stack that provides app developers with both broad hardware compatibility and state-of-the-art performance. TensorRT for RTX is available in the Windows ML preview starting today, and will be available as a standalone software development kit from NVIDIA Developer in June.
Generative AI is transforming PC software into breakthrough experiences — from digital humans to writing assistants, intelligent agents and creative tools.
NVIDIA RTX AI PCs are powering this transformation with technology that makes it simpler to get started experimenting with generative AI and unlock greater performance on Windows 11.
For developers looking for AI features ready to integrate, NVIDIA software development kits (SDKs) offer a wide array of options, from NVIDIA DLSS to multimedia enhancements like NVIDIA RTX Video. This month, top software applications from Autodesk, Bilibili, Chaos, LM Studio and Topaz Labs are releasing updates to unlock RTX AI features and acceleration.
AI enthusiasts and developers can easily get started with AI using NVIDIA NIM — prepackaged, optimized AI models that can run in popular apps like AnythingLLM, Microsoft VS Code and ComfyUI. Releasing this week, the FLUX.1-schnell image generation model will be available as a NIM microservice, and the popular FLUX.1-dev NIM microservice has been updated to support more RTX GPUs.
Those looking for a simple, no-code way to dive into AI development can tap into Project G-Assist — the RTX PC AI assistant in the NVIDIA app — to build plug-ins to control PC apps and peripherals using natural language AI. New community plug-ins are now available, including Google Gemini web search, Spotify, Twitch, IFTTT and SignalRGB.
Accelerated AI Inference With TensorRT for RTX
Today’s AI PC software stack requires developers to compromise on performance or invest in custom optimizations for specific hardware.
Windows ML was built to solve these challenges. Powered by ONNX Runtime, it seamlessly connects to an optimized AI execution layer provided and maintained by each hardware manufacturer.
For GeForce RTX GPUs, Windows ML automatically uses the TensorRT for RTX inference library for high performance and rapid deployment. Compared with DirectML, TensorRT delivers over 50% faster performance for AI workloads on PCs, as measured on a GeForce RTX 5090.
Windows ML also delivers quality-of-life benefits for developers. It can automatically select the right hardware — GPU, CPU or NPU — to run each AI feature, and download the execution provider for that hardware, removing the need to package those files into the app. This allows for the latest TensorRT performance optimizations to be delivered to users as soon as they’re ready.
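The automatic hardware selection described above can be pictured as a simple fallback chain: try each device class in preference order and pick the first one that is present and meets the feature's requirements. The sketch below is a hypothetical illustration of that behavior, not the Windows ML API; the device names, preference order, and capability sets are all assumptions.

```python
# Hypothetical sketch of automatic device selection, in the spirit of the
# behavior described above. Not the Windows ML API: device names, the
# preference order, and capability sets are assumptions for illustration.

PREFERENCE = ["gpu", "npu", "cpu"]  # assumed order; CPU as universal fallback

def select_device(available, feature_requirements=None):
    """Pick the first preferred device that is present and meets requirements.

    available: dict mapping device name -> set of supported capabilities
    feature_requirements: set of capabilities the AI feature needs
    """
    feature_requirements = feature_requirements or set()
    for device in PREFERENCE:
        caps = available.get(device)
        if caps is not None and feature_requirements <= caps:
            return device
    raise RuntimeError("no suitable execution device found")

# Example: a machine with a GPU and CPU, where the feature needs "fp16"
devices = {"gpu": {"fp16", "int8"}, "cpu": {"fp32"}}
print(select_device(devices, {"fp16"}))  # gpu
```

In the real stack, the runtime also downloads the matching execution provider on demand, so the app itself never has to bundle per-hardware binaries.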
TensorRT, a library originally built for data centers, has been redesigned for RTX AI PCs. Instead of pre-generating TensorRT engines and packaging them with the app, TensorRT for RTX uses just-in-time, on-device engine building to optimize how the AI model is run for the user’s specific RTX GPU in mere seconds. And the library’s packaging has been streamlined, reducing its file size by 8x.
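The just-in-time pattern described above follows a familiar shape: do the expensive, device-specific optimization once on first use, cache the result, and reuse it on every later run. The toy sketch below illustrates that pattern; all names are illustrative, and this is not the TensorRT for RTX API.

```python
# Toy sketch of just-in-time, on-device engine building: build an optimized
# "engine" for the local GPU on first use, cache it, and reuse it afterward.
# Illustrative only -- this is not the TensorRT for RTX API.

_engine_cache = {}
build_count = 0

def build_engine(model_id, gpu_arch):
    """Stand-in for an expensive, device-specific optimization step."""
    global build_count
    build_count += 1
    return f"engine({model_id}@{gpu_arch})"

def get_engine(model_id, gpu_arch):
    key = (model_id, gpu_arch)
    if key not in _engine_cache:       # first run on this GPU: build just-in-time
        _engine_cache[key] = build_engine(model_id, gpu_arch)
    return _engine_cache[key]          # later runs: reuse the cached engine

first = get_engine("example-model", "sm_arch")
again = get_engine("example-model", "sm_arch")  # cache hit, no rebuild
```

The payoff is the one the article describes: apps no longer need to ship pre-built engines for every GPU variant, because the engine is specialized for the user's exact hardware at first run.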
TensorRT for RTX is available to developers through the Windows ML preview today, and will be available as a standalone SDK at NVIDIA Developer in June.
Developers looking to add AI features or boost app performance can tap into a broad range of NVIDIA SDKs. These include NVIDIA CUDA and TensorRT for GPU acceleration; NVIDIA DLSS and OptiX for 3D graphics; NVIDIA RTX Video and Maxine for multimedia; and NVIDIA Riva and ACE for generative AI.
Top applications are releasing updates this month to enable unique features using these NVIDIA SDKs, including:
LM Studio, which released an update to its app to upgrade to the latest CUDA version, increasing performance by over 30%.
Topaz Labs, which is releasing a generative AI video model to enhance video quality, accelerated by CUDA.
Chaos Enscape and Autodesk VRED, which are adding DLSS 4 for faster performance and better image quality.
Bilibili, which is integrating NVIDIA Broadcast features such as Virtual Background to enhance the quality of livestreams.
NVIDIA looks forward to continuing to work with Microsoft and top AI app developers to help them accelerate their AI features on RTX-powered machines through the Windows ML and TensorRT integration.
Local AI Made Easy With NIM Microservices and AI Blueprints
Getting started with developing AI on PCs can be daunting. AI developers and enthusiasts have to choose from over 1.2 million AI models on Hugging Face, quantize the model into a format that runs well on PC, find and install all the dependencies to run it, and more.
NVIDIA NIM makes it easy to get started by providing a curated list of AI models, prepackaged with all the files needed to run them and optimized to achieve full performance on RTX GPUs. And since they’re containerized, the same NIM microservice can be run seamlessly across PCs or the cloud.
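Because NIM microservices serve models over standard HTTP endpoints, calling one from an app takes only a few lines. Many LLM NIMs follow the OpenAI-compatible chat completions convention; the sketch below assembles such a request, with the URL, port, and model name as assumptions for illustration (the payload is built but not sent).

```python
import json

# Hypothetical request to a locally running LLM NIM microservice. Many LLM
# NIMs expose an OpenAI-compatible endpoint; the URL, port, and model name
# below are assumptions for illustration.
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model, prompt, max_tokens=256):
    """Assemble an OpenAI-style chat completion payload for a NIM endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("meta/llama-3.1-8b-instruct", "Hello!")
body = json.dumps(payload)  # send with any HTTP client, e.g. requests.post(NIM_URL, ...)
```

Because the same microservice runs identically on a PC or in the cloud, only `NIM_URL` changes between local and hosted deployments.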
NVIDIA NIM microservices are available to download through build.nvidia.com or through top AI apps like Anything LLM, ComfyUI and AI Toolkit for Visual Studio Code.
During COMPUTEX, NVIDIA will release the FLUX.1-schnell NIM microservice — an image generation model from Black Forest Labs for fast image generation — and update the FLUX.1-dev NIM microservice to add compatibility for a wide range of GeForce RTX 50 and 40 Series GPUs.
These NIM microservices enable faster performance with TensorRT and quantized models. On NVIDIA Blackwell GPUs, they run over twice as fast as the same models running natively, thanks to FP4 and RTX optimizations.
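The FP4 speedups come from quantization: storing weights in 4 bits instead of 16 or 32, which shrinks memory traffic and lets tensor cores process more values per cycle. The toy sketch below shows symmetric 4-bit integer quantization to illustrate the idea; it is not NVIDIA's actual FP4 format, which uses a floating-point encoding.

```python
# Toy symmetric 4-bit integer quantization, to illustrate the idea behind
# low-precision formats like FP4. This is NOT NVIDIA's FP4 format (which is
# floating-point); it is a plain int4-style scheme for illustration only.

def quantize4(weights):
    """Map floats to integers in [-7, 7] with a single shared scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize4(q, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [v * scale for v in q]

weights = [0.82, -0.41, 0.05, -0.97]
q, scale = quantize4(weights)
restored = dequantize4(q, scale)
# Each restored value lands within half a quantization step of the original.
```

Four bits per weight instead of sixteen means a quarter of the memory and bandwidth, which is where much of the "over twice as fast" headroom comes from.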
AI developers can also jumpstart their work with NVIDIA AI Blueprints — sample workflows and projects using NIM microservices.
NVIDIA last month released the NVIDIA AI Blueprint for 3D-guided generative AI, a powerful way to control composition and camera angles of generated images by using a 3D scene as a reference. Developers can modify the open-source blueprint for their needs or extend it with additional functionality.
New Project G-Assist Plug-Ins and Sample Projects Now Available
NVIDIA recently released Project G-Assist as an experimental AI assistant integrated into the NVIDIA app. G-Assist enables users to control their GeForce RTX system using simple voice and text commands, offering a more convenient interface compared to manual controls spread across numerous legacy control panels.
Developers can also use Project G-Assist to easily build plug-ins, test assistant use cases and publish them through NVIDIA’s Discord and GitHub.
The Project G-Assist Plug-in Builder — a ChatGPT-based app that allows no-code or low-code development with natural language commands — makes it easy to start creating plug-ins. These lightweight, community-driven add-ons use straightforward JSON definitions and Python logic.
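The combination of a JSON definition plus Python logic can be pictured with a minimal sketch: a manifest declares the callable functions, and a Python handler table routes invocations parsed from the user's natural-language request. The real G-Assist plug-in schema is defined by NVIDIA; the field names and handler signature below are assumptions for illustration.

```python
import json

# Hypothetical sketch of the plug-in shape described above: a JSON definition
# naming the functions a user can invoke, plus Python logic that handles them.
# Field names and the handler signature are assumptions, not the real schema.

MANIFEST = json.loads("""
{
  "name": "hello_lights",
  "description": "Set the keyboard lighting color by name",
  "functions": [
    {"name": "set_color", "parameters": {"color": "string"}}
  ]
}
""")

def set_color(color):
    """Plug-in logic: a real plug-in would drive the hardware here."""
    return f"lighting set to {color}"

HANDLERS = {"set_color": set_color}

def dispatch(function_name, **kwargs):
    """Route a function call parsed from the user's natural-language request."""
    return HANDLERS[function_name](**kwargs)

result = dispatch("set_color", color="teal")
```

The split keeps the assistant-facing contract (JSON) separate from the behavior (Python), so community plug-ins stay lightweight and easy to review.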
New open-source plug-in samples are available now on GitHub, showcasing diverse ways on-device AI can enhance PC and gaming workflows. They include:
Gemini: The existing Gemini plug-in that uses Google’s cloud-based free-to-use large language model has been updated to include real-time web search capabilities.
IFTTT: A plug-in that lets users create automations across hundreds of compatible endpoints to trigger IoT routines — such as adjusting room lights or smart shades, or pushing the latest gaming news to a mobile device.
Discord: A plug-in that enables users to easily share game highlights or messages directly to Discord servers without disrupting gameplay.
Explore the GitHub repository for more examples — including hands-free music control via Spotify, livestream status checks with Twitch, and more.
Companies are adopting AI as the new PC interface. For example, SignalRGB is developing a G-Assist plug-in that enables unified lighting control across multiple manufacturers. Users will soon be able to install this plug-in directly from the SignalRGB app.
Starting this week, the AI community will also be able to use G-Assist as a custom component in Langflow — enabling users to integrate function-calling capabilities in low-code or no-code workflows, AI applications and agentic flows.
Enthusiasts interested in developing and experimenting with Project G-Assist plug-ins are invited to join the NVIDIA Developer Discord channel to collaborate, share creations and gain support.
Each week, the RTX AI Garage blog series features community-driven AI innovations and content for those looking to learn more about NIM microservices and AI Blueprints, as well as building AI agents, creative workflows, digital humans, productivity apps and more on AI PCs and workstations.
“ICRA has played a pivotal role in shaping the direction of robotics and automation, marking key milestones in the field’s evolution and celebrating achievements that have had a lasting impact on technology and society,” said Dieter Fox, senior director of robotics research at NVIDIA. “The research we’re contributing this year will further advance the development of autonomous vehicles and humanoid robots by helping close the data gap and improve robot safety and control.”
Generative AI for Scalable Robotic Learning
NVIDIA-authored papers showcased at ICRA give a glimpse into the future of robotics. They include:
DreamDrive: This 4D spatial-temporal scene generation approach creates realistic, controllable 4D driving scenes using video diffusion and 3D Gaussian splatting for autonomous vehicles.
DexMimicGen: This system can generate large-scale bimanual dexterous manipulation datasets from just a few human demonstrations.
HOVER: A unified neural controller for humanoid robots that seamlessly transitions between locomotion, manipulation and other modes.
MatchMaker: This pipeline automates generation of diverse 3D assembly assets for simulation-based training, enabling robots to learn insertion tasks without manual asset curation.
SPOT: This learning framework uses SE(3) pose trajectory diffusion for object-centric manipulation, enabling cross-embodiment generalization.
Electricity. The Internet. Now it’s time for another major technology, AI, to sweep the globe.
NVIDIA founder and CEO Jensen Huang took the stage at a packed Taipei Music Center Monday to kick off COMPUTEX 2025, captivating the audience of more than 4,000 with a vision for a technology revolution that will sweep every country, every industry and every company.
“AI is now infrastructure, and this infrastructure, just like the internet, just like electricity, needs factories,” Huang said. “These factories are essentially what we build today.”
“They’re not data centers of the past,” Huang added. “These AI data centers, if you will, are improperly described. They are, in fact, AI factories. You apply energy to it, and it produces something incredibly valuable, and these things are called tokens.”
NVIDIA CUDA-X Everywhere: After showing a towering wall of partner logos, Huang described how companies are using NVIDIA’s CUDA-X platform for a dizzying array of applications, how NVIDIA and its partners are building 6G using AI, and revealed NVIDIA’s latest work to accelerate quantum supercomputing.
“The larger the install base, the more developers want to create libraries, the more libraries, the more amazing things are done,” Huang said, describing CUDA-X’s growing popularity and power. “Better applications, more benefits to users.”
More’s coming, Huang said, describing the growing power of AI to reason and perceive. That leads us to agentic AI — AI able to understand, think and act. Beyond that is physical AI — AI that understands the world. The phase after that, he said, is general robotics.
All of this has created demand for much more computing power. To meet those needs, Huang detailed the latest NVIDIA innovations from Grace Blackwell NVL72 systems to advanced networking technology, and detailed huge new AI installations from CoreWeave, Oracle, Microsoft, xAI and others across the globe.
“These are gigantic factory investments, and the reason why people build factories is because you know, you know the answer,” Huang said with a grin. “The more you buy, the more you make.”
Building AI for Taiwan: It all starts in Taiwan, Huang said, highlighting the key role Taiwan plays in the global technology ecosystem. But Taiwan isn’t just building AI for the world; NVIDIA is helping build AI for Taiwan. Huang announced that NVIDIA and Foxconn Hon Hai Technology Group are deepening their longstanding partnership and are working with the Taiwan government to build an AI factory supercomputer that will deliver state-of-the-art NVIDIA Blackwell infrastructure to researchers, startups and industries – including TSMC.
“Having a world-class AI infrastructure here in Taiwan is really important,” Huang said.
NVIDIA NVLink Fusion: And moving to help its partners scale up their systems however they choose, Huang announced NVLink Fusion, a new architecture that enables hyperscalers to create semi-custom compute solutions with NVIDIA’s NVLink interconnect.
This technology aims to break down traditional data center bottlenecks, enabling a new level of AI scale and more flexible, optimized system designs tailored to specific AI workloads.
“This incredible body of work now becomes flexible and open for anybody to integrate into,” Huang said.
Blackwell Everywhere: And the engine now powering this entire AI ecosystem is NVIDIA Blackwell, with Huang showing a slide explaining how NVIDIA offers “one architecture,” from cloud AI to enterprise AI, from personal AI to edge AI.
DGX Spark: Now in full production, this personal AI supercomputer for developers will be available in a “few weeks.” DGX Spark partners include ASUS, Dell, Gigabyte, Lenovo and MSI.
DGX Station: DGX Station is a powerful system with up to 20 petaflops of performance powered from a wall socket. Huang said it has the capacity to run a 1 trillion parameter model, which is like having your “own personal DGX supercomputer.”
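A quick back-of-envelope calculation shows why a 1-trillion-parameter model is plausible on a single system, and why low-precision formats matter: weight storage alone scales linearly with bits per parameter. The sketch below counts weights only, ignoring activations, KV cache, and runtime overhead.

```python
# Back-of-envelope memory footprint for model weights alone (no activations,
# KV cache, or runtime overhead), at common precisions.

def weight_memory_gb(params, bits_per_param):
    """Gigabytes needed to store `params` weights at the given precision."""
    return params * bits_per_param / 8 / 1e9

PARAMS = 1_000_000_000_000  # 1 trillion parameters
for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: {weight_memory_gb(PARAMS, bits):,.0f} GB")
# FP16: 2,000 GB; FP8: 1,000 GB; FP4: 500 GB (weights only)
```

At 16 bits per weight the model needs roughly 2 TB just for parameters, while 4-bit formats cut that to about 500 GB, which is what brings trillion-parameter inference within reach of a deskside machine.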
NVIDIA RTX PRO Servers: Huang also announced a new line of enterprise servers for agentic AI. NVIDIA RTX PRO Servers, part of a new NVIDIA Enterprise AI Factory validated design, are now in volume production. Delivering universal acceleration for AI, design, engineering and business, RTX PRO Servers provide a foundation for NVIDIA partners to build and operate on-premises AI factories.
NVIDIA AI Data Platform: The compute platform is different, so the storage platform for modern AI is different. To that end, Huang showcased the latest NVIDIA partners building intelligent storage infrastructure with NVIDIA RTX 6000 PRO Blackwell Server Edition GPUs and the NVIDIA AI Data Platform reference design.
Physical AI: Agents are “essentially digital robots,” Huang said, able to “perceive, understand and plan.” To speed up the development of physical robots, the industry needs to train robots in a simulated environment. Huang said that NVIDIA partnered with DeepMind and Disney to build Newton, the world’s most advanced physics training engine for robotics.
Huang introduced new tools to speed the development of humanoid robots: The Isaac GR00T-Dreams blueprint will help generate synthetic training data. And the Isaac GR00T N1.5 Humanoid Robot Foundation Model will power robotic intelligence.
Industrial Physical AI: Huang said that companies are in the process of building $5 trillion worth of factories worldwide. Optimizing the design of those factories is critical to boosting their output. Taiwan’s leading manufacturers — TSMC, Foxconn, Wistron, Pegatron, Delta Electronics, Quanta, GIGABYTE and others — are harnessing NVIDIA Omniverse to build digital twins to drive the next wave of industrial physical AI for semiconductor and electronics manufacturing.
NVIDIA Constellation: Lastly, building anticipation, Huang introduced a dramatic video showing NVIDIA’s Santa Clara office launching into space and landing in Taiwan. The big reveal: NVIDIA Constellation, a brand new Taiwan office for NVIDIA’s growing Taiwan workforce.
In closing, Huang emphasized that the work Taiwanese companies are doing has changed the world. He thanked NVIDIA’s ecosystem partners and described the industry’s opportunity as “extraordinary” and “once in a lifetime.”
“We are in fact creating a whole new industry to support AI factories, AI agents, and robotics, with one architecture,” Huang said.