How Agentic AI Enables the Next Leap in Cybersecurity

How Agentic AI Enables the Next Leap in Cybersecurity

Agentic AI is redefining the cybersecurity landscape — introducing new opportunities that demand rethinking how to secure AI while offering the keys to addressing those challenges.

Unlike standard AI systems, AI agents can take autonomous actions — interacting with tools, environments, other agents and sensitive data. This provides new opportunities for defenders but also introduces new classes of risks. Enterprises must now take a dual approach: defend both with and against agentic AI.

Building Cybersecurity Defense With Agentic AI 

Cybersecurity teams are increasingly overwhelmed by talent shortages and growing alert volume. Agentic AI offers new ways to bolster threat detection, response and AI security — and requires a fundamental pivot in the foundations of the cybersecurity ecosystem.

Agentic AI systems can perceive, reason and act autonomously to solve complex problems. They can also serve as intelligent collaborators for cyber experts to safeguard digital assets, mitigate risks in enterprise environments and boost efficiency in security operations centers. This frees up cybersecurity teams to focus on high-impact decisions, helping them scale their expertise while potentially reducing workforce burnout.

For example, AI agents can cut the time needed to respond to software security vulnerabilities by investigating the risk of a new common vulnerability or exposure in just seconds. They can search external resources, evaluate environments and summarize and prioritize findings so human analysts can take swift, informed action.

Leading organizations like Deloitte are using the NVIDIA AI Blueprint for vulnerability analysis, NVIDIA NIM and NVIDIA Morpheus to enable their customers to accelerate software patching and vulnerability management. AWS also collaborated with NVIDIA to build an open-source reference architecture using this NVIDIA AI Blueprint for software security patching on AWS cloud environments.

AI agents can also improve security alert triaging. Most security operations centers face an overwhelming number of alerts every day, and sorting critical signals from noise is slow, repetitive and dependent on institutional knowledge and experience.

Top security providers are using NVIDIA AI software to advance agentic AI in cybersecurity, including CrowdStrike and Trend Micro. CrowdStrike’s Charlotte AI Detection Triage delivers 2x faster detection triage with 50% less compute, cutting alert fatigue and optimizing security operation center efficiency.

Agentic systems can help accelerate the entire workflow, analyzing alerts, gathering context from tools, reasoning about root causes and acting on findings — all in real time. They can even help onboard new analysts by capturing expert knowledge from experienced analysts and turning it into action.

Enterprises can build alert triage agents using the NVIDIA AI-Q Blueprint for connecting AI agents to enterprise data and the NVIDIA Agent Intelligence toolkit — an open-source library that accelerates AI agent development and optimizes workflows.

Protecting Agentic AI Applications

Agentic AI systems don’t just analyze information — they reason and act on it. This introduces new security challenges: agents may access tools, generate outputs that trigger downstream effects or interact with sensitive data in real time. To ensure they behave safely and predictably, organizations need both pre-deployment testing and runtime controls.

Red teaming and testing help identify weaknesses in how agents interpret prompts, use tools or handle unexpected inputs — before they go into production. This also includes probing how well agents follow constraints, recover from failures and resist manipulative or adversarial attacks.

Garak, a large language model vulnerability scanner, enables automated testing of LLM-based agents by simulating adversarial behavior such as prompt injection, tool misuse and reasoning errors.

Runtime guardrails provide a way to enforce policy boundaries, limit unsafe behaviors and swiftly align agent outputs with enterprise goals. NVIDIA NeMo Guardrails software enables developers to easily define, deploy and rapidly update rules governing what AI agents can say and do. This low-cost, low-effort adaptability ensures quick and effective response when issues are detected, keeping agent behavior consistent and safe in production.

Leading companies such as Amdocs, Cerence AI and Palo Alto Networks are tapping into NeMo Guardrails to deliver trusted agentic experiences to their customers.

Runtime protections help safeguard sensitive data and agent actions during execution, ensuring secure and trustworthy operations. NVIDIA Confidential Computing helps protect data while it’s being processed at runtime, aka protecting data in use. This reduces the risk of exposure during training and inference for AI models of every size.

NVIDIA Confidential Computing is available from major service providers globally, including Google Cloud and Microsoft Azure, with availability from other cloud service providers to come.

The foundation for any agentic AI application is the set of software tools, libraries and services used to build the inferencing stack. The NVIDIA AI Enterprise software platform is produced using a software lifecycle process that maintains application programming interface stability while addressing vulnerabilities throughout the lifecycle of the software. This includes regular code scans and timely publication of security patches or mitigations.

Authenticity and integrity of AI components in the supply chain is critical for scaling trust across agentic AI systems. The NVIDIA AI Enterprise software stack includes container signatures, model signing and a software bill of materials to enable verification of these components.

Each of these technologies provides additional layers of security to protect critical data and valuable models across multiple deployment environments, from on premises to the cloud.

Securing Agentic Infrastructure

As agentic AI systems become more autonomous and integrated into enterprise workflows, the infrastructure they rely on becomes a critical part of the security equation. Whether deployed in a data center, at the edge or on a factory floor, agentic AI needs infrastructure that can enforce isolation, visibility and control — by design.

Agentic systems, by design, operate with significant autonomy, enabling them to perform impactful actions that can be both beneficial or potentially harmful. This inherent autonomy requires protecting runtime workloads, operational monitoring and strict enforcement of zero-trust principles to secure these systems effectively.

NVIDIA BlueField DPUs, combined with NVIDIA DOCA Argus, provides a framework that enables applications to access comprehensive, real-time visibility into agent workload behavior and accurately pinpoint threats through advanced memory forensics. Deploying security controls directly onto BlueField DPUs, rather than server CPUs, further isolates threats at the infrastructure level, substantially reducing the blast radius of potential compromises and reinforcing a comprehensive, security-everywhere architecture.

Integrators also use NVIDIA Confidential Computing to strengthen security foundations for agentic infrastructure. For example, EQTYLab developed a new cryptographic certificate system that provides the first on-silicon governance to ensure AI agents are compliant at runtime. It will be featured at RSA this week as a top 10 RSA Innovation Sandbox recipient.

NVIDIA Confidential Computing is supported on NVIDIA Hopper and NVIDIA Blackwell GPUs, so isolation technologies can now be extended to the confidential virtual machine when users are moving from a single GPU to multi-GPUs.

Secure AI is provided by Protected PCIe and builds upon NVIDIA Confidential Computing, allowing customers to scale workloads from a single GPU to eight GPUs. This lets companies adapt to their agentic AI needs while delivering security in the most performant way.

These infrastructure components support both local and remote attestation, enabling customers to verify the integrity of the platform before deploying sensitive workloads.

These security capabilities are especially important in environments like AI factories — where agentic systems are beginning to power automation, monitoring and real-world decision-making. Cisco is pioneering secure AI infrastructure by integrating NVIDIA BlueField DPUs, forming the foundation of the Cisco Secure AI Factory with NVIDIA to deliver scalable, secure and efficient AI deployments for enterprises.

Extending agentic AI to cyber-physical systems heightens the stakes, as compromises can directly impact uptime, safety and the integrity of physical operations. Leading partners like Armis, Check Point, CrowdStrike, Deloitte, Forescout, Nozomi Networks and World Wide Technology are integrating NVIDIA’s full-stack cybersecurity AI technologies to help customers bolster critical infrastructure against cyber threats across industries such as energy, utilities and manufacturing.

Building Trust as AI Takes Action

Every enterprise today must ensure their investments in cybersecurity are incorporating AI to protect the workflows of the future. Every workload must be accelerated to finally give defenders the tools to operate at the speed of AI.

NVIDIA is building AI and security capabilities into technological foundations for ecosystem partners to deliver AI-powered cybersecurity solutions. This new ecosystem will allow enterprises to build secure, scalable agentic AI systems.

Join NVIDIA at the RSA Conference to learn about its collaborations with industry leaders to advance cybersecurity.

See notice regarding software product information.

Read More

NVIDIA Brings Cybersecurity to Every AI Factory

NVIDIA Brings Cybersecurity to Every AI Factory

As enterprises increasingly adopt AI, securing AI factories — where complex, agentic workflows are executed — has never been more critical.

NVIDIA is bringing runtime cybersecurity to every AI factory with a new NVIDIA DOCA software framework, part of the NVIDIA cybersecurity AI platform. Running on the NVIDIA BlueField networking platform, NVIDIA DOCA Argus operates on every node to immediately detect and respond to attacks on AI workloads, integrating seamlessly with enterprise security systems to deliver instant threat insights.

The DOCA Argus framework provides runtime threat detection by using advanced memory forensics to monitor threats in real time, delivering detection speeds up to 1,000x faster than existing agentless solutions — without impacting system performance.

Unlike conventional tools, Argus runs independently of the host, requiring no agents, integration or reliance on host-based resources. This agentless, zero-overhead design enhances system efficiency and ensures resilient security in any AI compute environment, including containerized and multi-tenant infrastructures. By operating outside the host, Argus remains invisible to attackers — even in the event of a system compromise.

Cybersecurity professionals can seamlessly integrate the framework with their SIEM, SOAR and XDR security platforms, enabling continuous monitoring and automated threat mitigation and extending their existing cybersecurity capabilities for AI infrastructure.

NVIDIA BlueField is a foundational security component for every AI factory, providing built-in, data-centric protection for AI workloads at scale. By combining BlueField’s acceleration capabilities with DOCA Argus’ proactive threat detection, enterprises can secure AI factories without compromising performance or efficiency.

Cisco is collaborating with NVIDIA to deliver a Secure AI Factory with NVIDIA architecture that simplifies how enterprises deploy and protect AI infrastructure at scale. The architecture embeds security into every layer of the AI factory, ensuring runtime protection is built in from the start rather than bolted on after deployment.

“Now is the time for enterprises to be driving forward with AI, but the key to unlocking innovative use cases and enabling broad adoption is safety and security,” said Jeetu Patel, executive vice president and chief product officer at Cisco. “NVIDIA and Cisco are providing enterprises with the infrastructure they need to confidently scale AI while safeguarding their most valuable data.”

DOCA Argus and BlueField are part of the NVIDIA cybersecurity AI platform — a full-stack, accelerated computing platform purpose-built for AI-driven protection. It combines BlueField’s data-centric security and Argus’ real-time threat detection with NVIDIA AI Enterprise software — including the NVIDIA Morpheus cybersecurity AI framework — to deliver visibility and control across an AI factory. It also taps into agentic AI to autonomously perceive, reason and respond to threats in real time.

NVIDIA cybersecurity AI platform.

Optimized AI Workload Threat Detection

Enterprises are inundated with massive volumes of data, making it difficult to pinpoint real threats. The growing adoption of agentic AI, with AI models and autonomous agents operating at enterprise scale to seamlessly connect data, applications and users, brings unprecedented opportunities for gleaning insights from data — while introducing the need for advanced protection that can keep pace.

DOCA Argus is fine-tuned and optimized using insights from NVIDIA’s own security team, surfacing only real, validated threats. By focusing on well-known threat actors and eliminating false positives, the framework provides enterprises with actionable intelligence, reducing alert fatigue and streamlining security operations.

Argus is purpose-built to protect containerized workloads like NVIDIA NIM microservices, incorporating real-world threat intelligence and validation to secure every layer of the AI application stack.

“Cyber defenders need robust tools to effectively protect AI factories, which serve as the foundation for agentic reasoning,” said David Reber, chief security officer at NVIDIA. “The DOCA Argus framework delivers real-time security insights to enable autonomous detection and response — equipping defenders with a data advantage through actionable intelligence.”

Get started with DOCA Argus and meet NVIDIA at the RSA Conference in San Francisco, running through Thursday, May 1.

Read More

Oracle Cloud Infrastructure Deploys Thousands of NVIDIA Blackwell GPUs for Agentic AI and Reasoning Models

Oracle Cloud Infrastructure Deploys Thousands of NVIDIA Blackwell GPUs for Agentic AI and Reasoning Models

Oracle has stood up and optimized its first wave of liquid-cooled NVIDIA GB200 NVL72 racks in its data centers. Thousands of NVIDIA Blackwell GPUs are now being deployed and ready for customer use on NVIDIA DGX Cloud and Oracle Cloud Infrastructure (OCI) to develop and run next-generation reasoning models and AI agents.

Oracle’s state-of-the-art GB200 deployment includes high-speed NVIDIA Quantum-2 InfiniBand and NVIDIA Spectrum-X Ethernet networking to enable scalable, low-latency performance, as well as a full stack of software and database integrations from NVIDIA and OCI.

OCI, one of the world’s largest and fastest-growing cloud service providers, is among the first to deploy NVIDIA GB200 NVL72 systems. The company has ambitious plans to build one of the world’s largest Blackwell clusters. OCI Superclusters will scale beyond 100,000 NVIDIA Blackwell GPUs to meet the world’s skyrocketing need for inference tokens and accelerated computing. The torrid pace of AI innovation continues as several companies including OpenAI have released new reasoning models in the past few weeks.

OCI’s installation is the latest example of NVIDIA Grace Blackwell systems going online worldwide, transforming cloud data centers into AI factories that manufacture intelligence at scale. These new AI factories leverage the NVIDIA GB200 NVL72 platform, a rack-scale system that combines 36 NVIDIA Grace CPUs and 72 NVIDIA Blackwell GPUs, delivering exceptional performance and energy efficiency for agentic AI powered by advanced AI reasoning models.

OCI offers flexible deployment options to bring Blackwell to customers across public, government and sovereign clouds, as well as customer-owned data centers through OCI Dedicated Region and OCI Alloy at any scale.

A number of customers are planning to deploy workloads right away on the OCI GB200 systems including major technology companies, enterprise customers, government agencies and contractors, and regional cloud providers.

These new racks are the first systems available from NVIDIA DGX Cloud, an optimized platform with software, services and technical support to develop and deploy AI workloads on leading clouds such as OCI. NVIDIA will use the racks for a variety of projects including training reasoning models, autonomous vehicle development, accelerating chip design and manufacturing, and developing AI tools.

GB200 NVL72 racks are live and available now from DGX Cloud and OCI.

Read More

All Roads Lead Back to Oblivion: Bethesda’s ‘The Elder Scrolls IV: Oblivion Remastered’ Arrives on GeForce NOW

All Roads Lead Back to Oblivion: Bethesda’s ‘The Elder Scrolls IV: Oblivion Remastered’ Arrives on GeForce NOW

Get the controllers ready and clear the calendar — it’s a jam-packed GFN Thursday.

Time to revisit a timeless classic for a dose of remastered nostalgia. GeForce NOW is bringing members a surprise from Bethesda — The Elder Scrolls IV: Oblivion Remastered is now available in the cloud.

Clair Obscur: Expedition 33, the spellbinding turn-based role-playing game, is ready to paint its adventure across GeForce NOW for members to stream in style.

Sunderfolk, from Dreamhaven’s Secret Door studio, launches on GeForce NOW, following an exclusive First Look Demo for members.

And get ready to crack the case with the sharpest minds in the business — Capcom’s Ace Attorney Investigations Collection heads to the cloud this week, offering members the thrilling adventures of prosecutor Miles Edgeworth.

Stream it all across devices, along with eight other games added to the cloud this week, including Zenless Zone Zero’s latest update.

A Legendary Quest

Elder Scrolls IV Oblivion Remastered on GeForce NOW
Forge your path in the cloud.

Step back into the world of Cyrodiil in style with the award-winning The Elder Scrolls IV: Oblivion Remastered in the cloud. The revitalization of the iconic 2006 role-playing game offers updated visuals, gameplay and plenty of more content.

Explore a meticulously recreated world, navigate story paths as diverse character archetypes and engage in an epic quest to save Tamriel from a Daedric invasion. The remaster includes all previously released expansions — Shivering Isles, Knights of the Nine and additional downloadable content — providing a comprehensive experience for new and returning fans.

Rediscover the vast landscape of Cyrodiil like never before with a GeForce NOW membership and stop the forces of Oblivion from overtaking the land. Ultimate and Performance members enjoy higher resolutions and longer gaming sessions for immersive gaming anytime, anywhere.

A Whole New World

Sunderfolk is a turn-based tactical role-playing adventure for up to four players that offers an engaging couch co-op experience. Control characters using a smartphone app, which serves as both a controller and a hub for cards, inventory and rules.

Sunderfolk on GeForce NOW
Make game night unforgettable with the cloud.

In the underground fantasy world of Arden, take on the roles of anthropomorphic animal heroes tasked with defending their town from the corruption of shadowstone. Six unique classes — from the fiery Pyromancer salamander to the tactical Bard bat — are equipped with distinct skill cards. Missions range from combat and exploration to puzzles and rescues, requiring teamwork and coordination.

Get into the mischief streaming it on GeForce NOW. Gather the squad and rekindle the spirit of game night from the comfort of the couch, streaming on the big screen with GeForce NOW and using a mobile device as a controller for a unique, immersive co-op experience.

No Objections Here

Ace Attorney Investigations Collection
Channel your inner Miles Edgeworth.

Experience both Ace Attorney Investigations games in one gorgeous collection, stepping into the shoes of Miles Edgeworth, the prosecutor of prosecutors from the Ace Attorney mainline games.

Leave the courtroom behind and walk with Edgeworth around the crime scene to gather evidence and clues, including by talking with persons of interest. Solve tough, intriguing cases through wit, logic and deduction.

Members can level up their detective work across devices with a premium GeForce NOW membership. Ultimate and Performance members get extended session times to crack cases without interruptions.

Tears, Fears and Parasol Spears

Zenless Zone Zero 1.7 on GeForce nOW
Zeroing in on secrets.

Zenless Zone Zero v1.7, “Bury Your Tears With the Past,” marks the dramatic conclusion of the first season’s storyline. Team with a special investigator to infiltrate enemy ranks, uncover the truth behind the Exaltists’ conspiracy and explore the mysteries of the Sacrifice Core, adding new depth to the game’s lore and characters.

The update also introduces two new S-Rank Agents — Vivian, a versatile Ether Anomaly fighter, and Hugo, an Ice Attack specialist — each bringing unique combat abilities to the roster. Alongside limited-time events, quality-of-life improvements and more, the update offers fresh gameplay modes and exclusive rewards.

Quest for Fresh Adventures

Clair Obscur: Expedition 33
Defy the monolith.

Clair Obscur: Expedition 33 is a visually stunning, dark fantasy role-playing game available now for members to stream. A mysterious entity called the Paintress erases everyone of a certain age each year after painting their number on a monolith. Join a desperate band of survivors — most with only a year left to live — on the 33rd expedition to end this cycle of death by confronting the Paintress and her monstrous creations. Dodge, parry and counterattack in battle while exploring a richly imagined world inspired by French Belle Époque art and filled with complex, emotionally driven characters.

Look for the following games available to stream in the cloud this week:

  • The Elder Scrolls IV: Oblivion Remastered (New release on Steam and Xbox, available on PC Game Pass, April 22)
  • Sunderfolk (New release on Steam, April 23)
  • Clair Obscur: Expedition 33 (New release on Steam and Xbox, available on PC Game Pass, April 24)
  • Ace Attorney Investigations Collection (Steam and Xbox, available on the Microsoft Store)
  • Ace Attorney Investigations Collection Demo (Steam and Xbox, available on the Microsoft Store)
  • Dead Rising Deluxe Remaster Demo (Steam)
  • EXFIL (Steam)
  • Sands of Aura (Epic Games Store)

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

NVIDIA Research at ICLR — Pioneering the Next Wave of Multimodal Generative AI

NVIDIA Research at ICLR — Pioneering the Next Wave of Multimodal Generative AI

Advancing AI requires a full-stack approach, with a powerful foundation of computing infrastructure — including accelerated processors and networking technologies — connected to optimized compilers, algorithms and applications.

NVIDIA Research is innovating across this spectrum, supporting virtually every industry in the process. At this week’s International Conference on Learning Representations (ICLR), taking place April 24-28 in Singapore, more than 70 NVIDIA-authored papers introduce AI developments with applications in autonomous vehicles, healthcare, multimodal content creation, robotics and more.

“ICLR is one of the world’s most impactful AI conferences, where researchers introduce important technical innovations that move every industry forward,” said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA. “The research we’re contributing this year aims to accelerate every level of the computing stack to amplify the impact and utility of AI across industries.”

Research That Tackles Real-World Challenges

Several NVIDIA-authored papers at ICLR cover groundbreaking work in multimodal generative AI and novel methods for AI training and synthetic data generation, including: 

  • Fugatto: The world’s most flexible audio generative AI model, Fugatto generates or transforms any mix of music, voices and sounds described with prompts using any combination of text and audio files. Other NVIDIA models at ICLR improve audio large language models (LLMs) to better understand speech.
  • HAMSTER: This paper demonstrates that a hierarchical design for vision-language-action models can improve their ability to transfer knowledge from off-domain fine-tuning data — inexpensive data that doesn’t need to be collected on actual robot hardware — to improve a robot’s skills in testing scenarios.   
  • Hymba: This family of small language models uses a hybrid model architecture to create LLMs that blend the benefits of transformer models and state space models, enabling high-resolution recall, efficient context summarization and common-sense reasoning tasks. With its hybrid approach, Hymba improves throughput by 3x and reduces cache by almost 4x without sacrificing performance.
  • LongVILA: This training pipeline enables efficient visual language model training and inference for long video understanding. Training AI models on long videos is compute and memory-intensive — so this paper introduces a system that efficiently parallelizes long video training and inference, with training scalability up to 2 million tokens on 256 GPUs. LongVILA achieves state-of-the-art performance across nine popular video benchmarks.
  • LLaMaFlex: This paper introduces a new zero-shot generation technique to create a family of compressed LLMs based on one large model. The researchers found that LLaMaFlex can generate compressed models that are as accurate or better than state-of-the art pruned, flexible and trained-from-scratch models — a capability that could be applied to significantly reduce the cost of training model families compared to techniques like pruning and knowledge distillation.
  • Proteina: This model can generate diverse and designable protein backbones, the framework that holds a protein together. It uses a transformer model architecture with up to 5x as many parameters as previous models.
  • SRSA: This framework addresses the challenge of teaching robots new tasks using a preexisting skill library — so instead of learning from scratch, a robot can apply and adapt its existing skills to the new task. By developing a framework to predict which preexisting skill would be most relevant to a new task, the researchers were able to improve zero-shot success rates on unseen tasks by 19%.
  • STORM: This model can reconstruct dynamic outdoor scenes — like cars driving or trees swaying in the wind — with a precise 3D representation inferred from just a few snapshots. The model, which can reconstruct large-scale outdoor scenes in 200 milliseconds, has potential applications in autonomous vehicle development.

Discover the latest work from NVIDIA Research, a global team of around 400 experts in fields including computer architecture, generative AI, graphics, self-driving cars and robotics. 

Read More

Capital One Banks on AI for Financial Services

Capital One Banks on AI for Financial Services

Financial services has long been at the forefront of adopting technological innovations. Today, generative AI and agentic systems are redefining the industry, from customer interactions to enterprise operations.

Prem Natarajan, executive vice president, chief scientist and head of AI at Capital One, joined the NVIDIA AI Podcast to discuss how his organization is building proprietary AI systems that deliver value to over 100 million customers.

“AI is at its best when it transfers cognitive burden from the human to the system,” Natarajan said. “It allows the human to have that much more fun and experience that magic.”

Capital One’s strategy centers on a “test, iterate, refine” approach that balances innovation with rigorous risk management. The company’s first agentic AI deployment is a chat concierge that helps customers navigate the car-buying process, such as by scheduling test drives.

Rather than simply integrating third-party solutions, Capital One builds proprietary AI technologies that tap into its vast data repositories.

“Your data advantage is your AI advantage,” Natarajan emphasized. “Proprietary data allows you to build proprietary AI that provides enduring differentiated services for your customers.”

Capital One’s AI architecture combines open-weight foundation models with deep customizations using proprietary data. This approach, Natarajan explained, supports the creation of specialized models that excel at financial services tasks and integrate into multi-agent workflows that can take actions.

Natarajan stressed that responsible AI is fundamental to Capital One’s design process. His teams take a “responsibility through design” approach, implementing robust guardrails — both technological and human-in-the-loop — to ensure safe deployment.

The concept of an AI factory — where raw data is processed and refined to produce actionable intelligence — aligns naturally with Capital One’s cloud-native technology stack. AI factories incorporate all the components required for financial institutions to generate intelligence, combining hardware, software, networking and development tools for AI applications in financial services.

Time Stamps

1:10 – Natarajan’s background and journey to Capital One.

4:50 – Capital One’s approach to generative AI and agentic systems.

15:56 – Challenges in implementing responsible AI in financial services.

28:46 – AI factories and Capital One’s cloud-native advantage.

You Might Also Like… 

NVIDIA’s Jacob Liberman on Bringing Agentic AI to Enterprises

Agentic AI enables developers to create intelligent multi-agent systems that reason, act and execute complex tasks with a degree of autonomy. Jacob Liberman, director of product management at NVIDIA, explains how agentic AI bridges the gap between powerful AI models and practical enterprise applications.

Telenor Builds Norway’s First AI Factory, Offering Sustainable and Sovereign Data Processing

Telenor opened Norway’s first AI factory in November 2024, enabling organizations to process sensitive data securely on Norwegian soil while prioritizing environmental responsibility. Telenor’s Chief Innovation Officer and Head of the AI Factory Kaaren Hilsen discusses the AI factory’s rapid development, going from concept to reality in under a year.

Imbue CEO Kanjun Qiu on Transforming AI Agents Into Personal Collaborators

Kanjun Qiu, CEO of Imbue, explores the emerging era where individuals can create and use their own AI agents. Drawing a parallel to the PC revolution of the late 1970s and ‘80s, Qiu discusses how modern AI systems are evolving to work collaboratively with users, enhancing their capabilities rather than just automating tasks.

Read More

How the Economics of Inference Can Maximize AI Value

How the Economics of Inference Can Maximize AI Value

As AI models evolve and adoption grows, enterprises must perform a delicate balancing act to achieve maximum value.

That’s because inference — the process of running data through a model to get an output — offers a different computational challenge than training a model.

Pretraining a model — the process of ingesting data, breaking it down into tokens and finding patterns — is essentially a one-time cost. But in inference, every prompt to a model generates tokens, each of which incur a cost.

That means that as AI model performance and use increases, so do the amount of tokens generated and their associated computational costs. For companies looking to build AI capabilities, the key is generating as many tokens as possible — with maximum speed, accuracy and quality of service — without sending computational costs skyrocketing.

As such, the AI ecosystem has been working to make inference cheaper and more efficient. Inference costs have been trending down for the past year thanks to major leaps in model optimization, leading to increasingly advanced, energy-efficient accelerated computing infrastructure and full-stack solutions.

According to the Stanford University Institute for Human-Centered AI’s 2025 AI Index Report, “the inference cost for a system performing at the level of GPT-3.5 dropped over 280-fold between November 2022 and October 2024. At the hardware level, costs have declined by 30% annually, while energy efficiency has improved by 40% each year. Open-weight models are also closing the gap with closed models, reducing the performance difference from 8% to just 1.7% on some benchmarks in a single year. Together, these trends are rapidly lowering the barriers to advanced AI.”

As models evolve and generate more demand and create more tokens, enterprises need to scale their accelerated computing resources to deliver the next generation of AI reasoning tools or risk rising costs and energy consumption.

What follows is a primer to understand the concepts of the economics of inference, enterprises can position themselves to achieve efficient, cost-effective and profitable AI solutions at scale.

Key Terminology for the Economics of AI Inference

Knowing key terms of the economics of inference helps set the foundation for understanding its importance.

Tokens are the fundamental unit of data in an AI model. They’re derived from data during training as text, images, audio clips and videos. Through a process called tokenization, each piece of data is broken down into smaller constituent units. During training, the model learns the relationships between tokens so it can perform inference and generate an accurate, relevant output.

Throughput refers to the amount of data — typically measured in tokens — that the model can output in a specific amount of time, which itself is a function of the infrastructure running the model. Throughput is often measured in tokens per second, with higher throughput meaning greater return on infrastructure.

Latency is a measure of the amount of time between inputting a prompt and the start of the model’s response. Lower latency means faster responses. The two main ways of measuring latency are:

  • Time to First Token: A measurement of the initial processing time required by the model to generate its first output token after a user prompt.
  • Time per Output Token: The average time between consecutive tokens — or the time it takes to generate a completion token for each user querying the model at the same time. It’s also known as “inter-token latency” or token-to-token latency.

Time to first token and time per output token are helpful benchmarks, but they’re just two pieces of a larger equation. Focusing solely on them can still lead to a deterioration of performance or cost.

To account for other interdependencies, IT leaders are starting to measure “goodput,” which is defined as the throughput achieved by a system while maintaining target time to first token and time per output token levels. This metric allows organizations to evaluate performance in a more holistic manner, ensuring that throughput, latency and cost are aligned to support both operational efficiency and an exceptional user experience.

Energy efficiency is the measure of how effectively an AI system converts power into computational output, expressed as performance per watt. By using accelerated computing platforms, organizations can maximize tokens per watt while minimizing energy consumption.

How the Scaling Laws Apply to Inference Cost

The three AI scaling laws are also core to understanding the economics of inference:

  • Pretraining scaling: The original scaling law that demonstrated that by increasing training dataset size, model parameter count and computational resources, models can achieve predictable improvements in intelligence and accuracy.
  • Post-training: A process where models are fine-tuned for accuracy and specificity so they can be applied to application development. Techniques like retrieval-augmented generation can be used to return more relevant answers from an enterprise database.
  • Test-time scaling (aka “long thinking” or “reasoning”): A technique by which models allocate additional computational resources during inference to evaluate multiple possible outcomes before arriving at the best answer.

While AI is evolving and post-training and test-time scaling techniques become more sophisticated, pretraining isn’t disappearing and remains an important way to scale models. Pretraining will still be needed to support post-training and test-time scaling.

Profitable AI Takes a Full-Stack Approach

In comparison to inference from a model that’s only gone through pretraining and post-training, models that harness test-time scaling generate multiple tokens to solve a complex problem. This results in more accurate and relevant model outputs — but is also much more computationally expensive.

Smarter AI means generating more tokens to solve a problem. And a quality user experience means generating those tokens as fast as possible. The smarter and faster an AI model is, the more utility it will have to companies and customers.

Enterprises need to scale their accelerated computing resources to deliver the next generation of AI reasoning tools that can support complex problem-solving, coding and multistep planning without skyrocketing costs.

This requires both advanced hardware and a fully optimized software stack. NVIDIA’s AI factory product roadmap is designed to deliver the computational demand and help solve for the complexity of inference, while achieving greater efficiency.

AI factories integrate high-performance AI infrastructure, high-speed networking and optimized software to produce intelligence at scale. These components are designed to be flexible and programmable, allowing businesses to prioritize the areas most critical to their models or inference needs.

To further streamline operations when deploying massive AI reasoning models, AI factories run on a high-performance, low-latency inference management system that ensures the speed and throughput required for AI reasoning are met at the lowest possible cost to maximize token revenue generation.

Learn more by reading the ebook “AI Inference: Balancing Cost, Latency and Performance.”

Read More

Enterprises Onboard AI Teammates Faster With NVIDIA NeMo Tools to Scale Employee Productivity

Enterprises Onboard AI Teammates Faster With NVIDIA NeMo Tools to Scale Employee Productivity

An AI agent is only as accurate, relevant and timely as the data that powers it.

Now generally available, NVIDIA NeMo microservices are helping enterprise IT quickly build AI teammates that tap into data flywheels to scale employee productivity. The microservices provide an end-to-end developer platform for creating state-of-the-art agentic AI systems and continually optimizing them with data flywheels informed by inference and business data, as well as user preferences.

With a data flywheel, enterprise IT can onboard AI agents as digital teammates. These agents can tap into user interactions and data generated during AI inference to continuously improve model performance — turning usage into insight and insight into action.

Building Powerful Data Flywheels for Agentic AI

Without a constant stream of high-quality inputs — from databases, user interactions or real-world signals — an agent’s understanding can weaken, making responses less reliable and agents less productive.

Maintaining and improving the models that power AI agents in production requires three types of data: inference data to gather insights and adapt to evolving data patterns, up-to-date business data to provide intelligence, and user feedback data to advise if the model and application are performing as expected. NeMo microservices help developers tap into these three data types.

NeMo microservices speed AI agent development with end-to-end tools for curating, customizing, evaluating and guardrailing the models that drive their agents.

NVIDIA NeMo microservices — including NeMo Customizer, NeMo Evaluator and NeMo Guardrails — can be used alongside NeMo Retriever and NeMo Curator to ease enterprises’ experiences building, optimizing and scaling AI agents through custom enterprise data flywheels. For example:

  • NeMo Customizer accelerates large language model fine-tuning, delivering up to 1.8x higher training throughput. This high-performance, scalable microservice uses popular post-training techniques including supervised fine-tuning and low-rank adaptation.
  • NeMo Evaluator simplifies the evaluation of AI models and workflows on custom and industry benchmarks with just five application programming interface (API) calls.
  • NeMo Guardrails improves compliance protection by up to 1.4x with only half a second of additional latency, helping organizations implement robust safety and security measures that align with organizational policies and guidelines.

With NeMo microservices, developers can build data flywheels that boost AI agent accuracy and efficiency. Deployed through the NVIDIA AI Enterprise software platform, NeMo microservices are easy to operate and can run on any accelerated computing infrastructure, on premises or in the cloud, with enterprise-grade security, stability and support.

The microservices have become generally available at a time when enterprises are building large-scale multi-agent systems, where hundreds of specialized agents — with distinct goals and workflows — collaborate to tackle complex tasks as digital teammates, working alongside employees to assist, augment and accelerate work across functions.

This enterprise-wide impact positions AI agents as a trillion-dollar opportunity — with applications spanning automated fraud detection, shopping assistants, predictive machine maintenance and document review — and underscores the critical role data flywheels play in transforming business data into actionable insights.

Data flywheels built with NVIDIA NeMo microservices constantly curate data, retrain models and evaluate their performance, all with minimal human interactions and maximum autonomy.

Industry Pioneers Boost AI Agent Accuracy With NeMo Microservices

NVIDIA partners and industry pioneers are using NeMo microservices to build responsive AI agent platforms so that digital teammates can help get more done.

Working with Arize and Quantiphi, AT&T has built an advanced AI-powered agent using NVIDIA NeMo, designed to process a knowledge base of nearly 10,000 documents, refreshed weekly. The scalable, high-performance AI agent is fine-tuned for three key business priorities: speed, cost efficiency and accuracy — all increasingly critical as adoption scales.

AT&T boosted AI agent accuracy by up to 40% using NeMo Customizer and Evaluator by fine-tuning a Mistral 7B model to help deliver personalized services, prevent fraud and optimize network performance.

BlackRock is working with NeMo microservices for agentic AI capabilities in its Aladdin tech platform, which unifies the investment management process through a common data language.

Teaming with Galileo, Cisco’s Outshift team is using NVIDIA NeMo microservices to power a coding assistant that delivers 40% fewer tool selection errors and achieves up to 10x faster response times.

Nasdaq is accelerating its Nasdaq Gen AI Platform with NeMo Retriever microservices and NVIDIA NIM microservices. NeMo Retriever enhanced the platform’s search capabilities, leading to up to 30% improved accuracy and response times, in addition to cost savings.

Broad Model and Partner Ecosystem Support for NeMo Microservices

NeMo microservices support a broad range of popular open models, including Llama, the Microsoft Phi family of small language models, Google Gemma, Mistral and Llama Nemotron Ultra, currently the top open model on scientific reasoning, coding and complex math benchmarks.

Meta has tapped NVIDIA NeMo microservices through new connectors for Meta Llamastack. Users can access the same capabilities — including Customizer, Evaluator and Guardrails — via APIs, enabling them to run the full suite of agent-building workflows within their environment.

“With Llamastack integration, agent builders can implement data flywheels powered by NeMo microservices,” said Raghotham Murthy, software engineer, GenAI, at Meta. “This allows them to continuously optimize models to improve accuracy, boost efficiency and reduce total cost of ownership.”

Leading AI software providers such as Cloudera, Datadog, Dataiku, DataRobot, DataStax, SuperAnnotate, Weights & Biases and more have integrated NeMo microservices into their platforms. Developers can use NeMo microservices in popular AI frameworks including CrewAI, Haystack by deepset, LangChain, LlamaIndex and Llamastack.

Enterprises can build data flywheels with NeMo Retriever microservices using NVIDIA AI Data Platform offerings from NVIDIA-Certified Storage partners including DDN, Dell Technologies, Hewlett Packard Enterprise, Hitachi Vantara, IBM, NetApp, Nutanix, Pure Storage, VAST Data and WEKA.

Leading enterprise platforms including Amdocs, Cadence, Cohesity, SAP, ServiceNow and Synopsys are using NeMo Retriever microservices in their AI agent solutions.

Enterprises can run AI agents on NVIDIA-accelerated infrastructure, networking and software from leading system providers including Cisco, Dell, Hewlett Packard Enterprise and Lenovo.

Consulting giants including Accenture, Deloitte and EY are building AI agent platforms for enterprises using NeMo microservices.

Developers can download NeMo microservices from the NVIDIA NGC catalog. The microservices can be deployed as part of NVIDIA AI Enterprise with extended-life software branches for API stability, proactive security remediation and enterprise-grade support.

Read More

Project G-Assist Plug-In Builder Lets Anyone Customize AI on GeForce RTX AI PCs

Project G-Assist Plug-In Builder Lets Anyone Customize AI on GeForce RTX AI PCs

AI is rapidly reshaping what’s possible on a PC — whether for real-time image generation or voice-controlled workflows. As AI capabilities grow, so does their complexity. Tapping into the power of AI can entail navigating a maze of system settings, software and hardware configurations.

Enabling users to explore how on-device AI can simplify and enhance the PC experience, Project G-Assist — an AI assistant that helps tune, control and optimize GeForce RTX systems — is now available as an experimental feature in the NVIDIA app. Developers can try out AI-powered voice and text commands for tasks like monitoring performance, adjusting settings and interacting with supporting peripherals. Users can even summon other AIs powered by GeForce RTX AI PCs.

And it doesn’t stop there. For those looking to expand Project G-Assist capabilities in creative ways, the AI supports custom plug-ins. With the new ChatGPT-based G-Assist Plug-In Builder, developers and enthusiasts can create and customize G-Assist’s functionality, adding new commands, connecting external tools and building AI workflows tailored to specific needs. With the plug-in builder, users can generate properly formatted code with AI, then integrate the code into G-Assist — enabling quick, AI-assisted functionality that responds to text and voice commands.

Teaching PCs New Tricks: Plug-Ins and APIs Explained

Plug-ins are lightweight add-ons that give software new capabilities. G-Assist plug-ins can control music, connect with large language models and much more.

Under the hood, these plug-ins tap into application programming interfaces (APIs), which allow different software and services to talk to each other. Developers can define functions in simple JSON formats, write logic in Python and quickly integrate new tools or features into G-Assist.

With the G-Assist Plug-In Builder, users can:

  • Use a responsive small language model running locally on GeForce RTX GPUs for fast, private inference.
  • Extend G-Assist’s capabilities with custom functionality tailored to specific workflows, games and tools.
  • Interact with G-Assist directly from the NVIDIA overlay, without tabbing out of an application or workflow.
  • Invoke AI-powered GPU and system controls from applications using C++ and Python bindings.
  • Integrate with agentic frameworks using tools like Langflow, letting G-Assist function as a component in larger AI pipelines and multi-agent systems.

Built for Builders: Using Free APIs to Expand AI PC Capabilities 

NVIDIA’s GitHub repository provides everything needed to get started on developing with G-Assist — including sample plug-ins, step-by-step instructions and documentation for building custom functionalities.

Developers can define functions in JSON and drop config files into a designated directory, where G-Assist can automatically load and interpret them. Users can even submit plug-ins for review and potential inclusion in the NVIDIA GitHub repository to make new capabilities available for others.

Hundreds of free, developer-friendly APIs are available today to extend G-Assist capabilities — from automating workflows to optimizing PC setups to boosting online shopping. For ideas, find searchable indices of free APIs for use across entertainment, productivity, smart home, hardware and more on publicapis.dev, free-apis.github.io, apilist.fun and APILayer.

Available sample plug-ins include Spotify, which enables hands-free music and volume control, and Google Gemini, which allows G-Assist to invoke a much larger cloud-based AI for more complex conversations, brainstorming and web searches using a free Google AI Studio API key.

In the clip below, G-Assist asks Gemini for advice on which Legend to pick in the hit game Apex Legends when solo queueing, as well as whether it’s wise to jump into Nightmare mode for level 25 in Diablo IV:

And in the following clip, a developer uses the new plug-in builder to create a Twitch plug-in for G-Assist that checks if a streamer is live. After generating the necessary JSON manifest and Python files, the developer simply drops them into the G-Assist directory to enable voice commands like, “Hey, Twitch, is [streamer] live?”

In addition, users can customize G-Assist to control select peripherals and software applications with simple commands, such as to benchmark or adjust fan speeds, or to change lighting on supported Logitech G, Corsair, MSI and Nanoleaf devices.

Other examples include a Stock Checker plug-in that lets users quickly look up real-time stock prices and performance data, or a Weather plug-in allows users to ask G-Assist for current weather conditions in any city.

Details on how to build, share and load plug-ins are available on the NVIDIA GitHub repository.

Start Building Today

With the G-Assist Plugin Builder and open API support, anyone can extend G-Assist to fit their exact needs. Explore the GitHub repository and submit features for review to help shape the next wave of AI-powered PC experiences.

Plug in to NVIDIA AI PC on Facebook, Instagram, TikTok and X — and stay informed by subscribing to the RTX AI PC newsletter.

Follow NVIDIA Workstation on LinkedIn and X.

See notice regarding software product information.

Read More

Making Brain Waves: AI Startup Speeds Disease Research With Lab in the Loop

Making Brain Waves: AI Startup Speeds Disease Research With Lab in the Loop

About 15% of the world’s population — over a billion people — are affected by neurological disorders, from commonly known diseases like Alzheimer’s and Parkinson’s to hundreds of lesser-known, rare conditions.

BrainStorm Therapeutics, a San Diego-based startup, is accelerating the development of cures for these conditions using AI-powered computational drug discovery paired with lab experiments using organoids: tiny, 3D bundles of brain cells created from patient-derived stem cells. This hybrid, iterative method, where clinical data and AI models inform one another to accelerate drug development, is known as lab in the loop.

“The brain is the last frontier in modern biology,” said BrainStorm’s founder and CEO Robert Fremeau, who was previously a scientific director in neuroscience at Amgen and a faculty member at Duke University and the University of California, San Francisco. “By combining our organoid disease models with the power of generative AI, we now have the ability to start to unravel the underlying complex biology of disease networks.”

The company aims to lower the failure rate of drug candidates for brain diseases during clinical trials — currently over 93% — and identify therapeutics that can be applied to multiple diseases. Achieving these goals would make it faster and more economically viable to develop treatments for rare and common conditions.

“This alarmingly high clinical trial failure rate is mainly due to the inability of traditional preclinical models with rodents or 2D cells to predict human efficacy,” said Jun Yin, cofounder and chief technology officer at BrainStorm. “By integrating human-derived brain organoids with AI-driven analysis, we’re building a platform that better reflects the complexity of human neurobiology and improves the likelihood of clinical success.”

Fremeau and Yin believe that BrainStorm’s platform has the potential to accelerate development timelines, reduce research and development costs, and significantly increase the probability of bringing effective therapies to patients.

BrainStorm Therapeutics’ AI models, which run on NVIDIA GPUs in the cloud, were developed using the NVIDIA BioNeMo Framework, a set of programming tools, libraries and models for computational drug discovery. The company is a member of NVIDIA Inception, a global network of cutting-edge startups.

Clinical Trial in a Dish

BrainStorm Therapeutics uses AI models to develop gene maps of brain diseases, which they can use to identify promising targets for potential drugs and clinical biomarkers. Organoids allow them to screen thousands of drug molecules per day directly on human brain cells, enabling them to test the effectiveness of potential therapies before starting clinical trials.

“Brains have brain waves that can be picked up in a scan like an EEG, or electroencephalogram, which measures the electrical activity of neurons,” said Maya Gosztyla, the company’s cofounder and chief operating officer. “Our organoids also have spontaneous brain waves, allowing us to model the complex activity that you would see in the human brain in this much smaller system. We treat it like a clinical trial in a dish for studying brain diseases.”

BrainStorm Therapeutics is currently using patient-derived organoids for its work on drug discovery for Parkinson’s disease, a condition tied to the loss of neurons that produce dopamine, a neurotransmitter that helps with physical movement and cognition.

“In Parkinson’s disease, multiple genetic variants contribute to dysfunction across different cellular pathways, but they converge on a common outcome — the loss of dopamine neurons,” Fremeau said. “By using AI models to map and analyze the biological effects of these variants, we can discover disease-modifying treatments that have the potential to slow, halt or even reverse the progression of Parkinson’s.”

The BrainStorm team used single-cell sequencing data from brain organoids to fine-tune foundation models available through the BioNeMo Framework, including the Geneformer model for gene expression analysis. The organoids were derived from patients with mutations in the GBA1 gene, the most common genetic risk factor for Parkinson’s disease.

BrainStorm is also collaborating with the NVIDIA BioNeMo team to help optimize open-source access to the Geneformer model.

Accelerating Drug Discovery Research

With its proprietary platform, BrainStorm can mirror human brain biology and simulate how different treatments might work in a patient’s brain.

“This can be done thousands of times, much quicker and much cheaper than can be done in a wet lab — so we can narrow down therapeutic options very quickly,” Gosztyla said. “Then we can go in with organoids and test the subset of drugs the AI model thinks will be effective. Only after it gets through those steps will we actually test these drugs in humans.”

View of an organoid using Fluorescence Imaging Plate Reader, or FLIPR — a technique used to study the effect of compounds on cells during drug screening.

This technology led to the discovery that Donepezil, a drug prescribed for Alzheimer’s disease, could also be effective in treating Rett syndrome, a rare genetic neurodevelopmental disorder. Within nine months, the BrainStorm team was able to go from organoid screening to applying for a phase 2 clinical trial of the drug in Rett patients. This application was recently cleared by the U.S. Food and Drug Administration.

BrainStorm also plans to develop multimodal AI models that integrate data from cell sequencing, cell imaging, EEG scans and more.

“You need high-quality, multimodal input data to design the right drugs,” said Yin. “AI models trained on this data will help us understand disease better, find more effective drug candidates and, eventually, find prognostic biomarkers for specific patients that enable the delivery of precision medicine.”

The company’s next project is an initiative with the CURE5 Foundation to conduct the most comprehensive repurposed drug screen to date for CDKL5 Deficiency Disorder, another rare genetic neurodevelopmental disorder.

“Rare disease research is transforming from a high-risk niche to a dynamic frontier,” said Fremeau. “The integration of BrainStorm’s AI-powered organoid technology with NVIDIA accelerated computing resources and the NVIDIA BioNeMo platform is dramatically accelerating the pace of innovation while reducing the cost — so what once required a decade and billions of dollars can now be investigated with significantly leaner resources in a matter of months.”

Get started with NVIDIA BioNeMo for AI-accelerated drug discovery.

Read More