IQVIA and NVIDIA Harmonize to Accelerate Clinical Research and Commercialization With AI Agents

Tens of billions of euros are spent each year on drug research and development across the globe — but only a few dozen new drugs make it to market.

Agentic AI is introducing a rhythm shift in the pharmaceutical industry, with IQVIA, the world’s leading provider of clinical research services, commercial insights and healthcare intelligence, as its conductor.

At NVIDIA GTC Paris at VivaTech, IQVIA announced it is launching multiple AI orchestrator agents in collaboration with NVIDIA. These specialized agentic systems are designed to manage and accelerate complex pharmaceutical development workflows for IQVIA’s thousands of pharmaceutical, biotech and medical device customers across the globe.

The AI agents act as supervisors for a group of sub-agents that each specialize in different tasks, like a conductor managing strings, woodwinds, brass and percussion sections. The orchestrator agent routes any necessary actions — like speech-to-text transcription, clinical coding, structured data extraction and data summarization — to the appropriate sub-agent, ensuring that each step in the complex workflow is accelerated and managed with human experts in the loop.

Using its vast databases comprising many petabytes of data and deep domain life sciences expertise, IQVIA can train and fine-tune these models for maximum productivity and efficiency.

Agentic AI Strikes a Chord in Clinical Trials

IQVIA’s localized expertise on regulatory requirements in different countries, including those across Europe, puts the company center stage in the clinical space.

Deployed in its healthcare-grade AI platform, IQVIA’s AI orchestrator agents are designed to accelerate every step of the pharmaceutical lifecycle, including clinical trials.

Clinical trials represent a major milestone in the research and development process for pharmaceutical companies, but planning and executing a trial typically takes years. The start-up process alone often takes about 200 days and is manually intensive.

IQVIA’s clinical trial start-up AI orchestrator agent addresses the growing need for acceleration in clinical trial timelines.

One component now accelerated by IQVIA’s AI orchestrator agents is target identification. This agent builds a knowledge base from research articles and biomedical databases, using customized AI models to identify key relationships among the data and extract insights.

This knowledge then enables IQVIA’s pharmaceutical customers to identify emerging scientific areas for indication prioritization — like which indications they will pursue and in what order for a particular asset — and to discover new opportunities for drug repurposing, unlocking uses that were previously unavailable.

Another agent, the clinical data review agent, uses a set of automated checks and specialized agents to catch data issues early, reducing the data review process from seven weeks to as little as two weeks.

“From molecule to market, AI promises to be transformative for life sciences and healthcare,” said Avinob Roy, vice president and general manager of product offerings for commercial analytics solutions at IQVIA.

IQVIA’s agents use NVIDIA NIM microservices, part of the NVIDIA AI Enterprise software platform, to streamline clinical site start-up. The agent directs its sub-agents to analyze clinical trial protocols and extract critical participant inclusion and exclusion criteria, using reasoning to solve these problems in phased steps.

By deploying the orchestrator agent, which works autonomously, research teams can focus on decision-making instead of time-consuming administrative tasks.

AI Agents Compose New Paths for Drug Commercialization

After a drug passes clinical trials, there’s still more work to be done before it’s accessible to patients.

Pharmaceutical companies must understand the market for their drug, landscape the disease it treats, map the patient journey and chart treatment pathways. The goal is to identify the right patient cohorts and understand how to reach them effectively.

“There are a lot of different components — market dynamics, patient behaviors, access challenges and the competitive landscape — that you need to triangulate to really understand where the bottlenecks are,” Roy said.

IQVIA orchestrator agents can provide a comprehensive understanding of how a treatment will reach patients by analyzing patient records, prescriptions and lab results in just a few days instead of weeks.

Another challenge is to capture the attention of healthcare professionals. To stand out, field teams often spend hours preparing for every interaction — crafting personalized, data-driven conversations that can deliver real value.

The IQVIA field companion orchestrator agent delivers tailored insights to pharmaceutical sales teams before each engagement with healthcare providers. By integrating physician demographics, digital behavior, prescribing patterns and patient dynamics, the agent helps field teams prepare for each meeting using near real-time insights, leading to more engaging and impactful discussions.

“The collective impact of these agents across numerous commercial workflows brings unprecedented precision and operational efficiency to the life sciences, supporting better experiences and outcomes for healthcare professionals and patients,” Roy said.

Learn more about the latest AI advancements for healthcare and other industries at NVIDIA GTC Paris, running through Thursday, June 12, at VivaTech.

Watch the NVIDIA GTC Paris keynote from NVIDIA founder and CEO Jensen Huang at VivaTech 2025, and explore GTC Paris sessions.

Sovereign AI Agents Think Local, Act Global With NVIDIA AI Factories

The European Union is investing over $200 billion in AI — but to gain the most value from this investment, its developers must navigate three key constraints: limited compute availability, data-privacy needs and safety priorities.

Unveiled at the NVIDIA GTC Paris keynote at VivaTech, a new suite of NVIDIA technologies is making it easier to address these challenges at every stage of the AI development and deployment cycle. Using these tools, enterprises can build scalable AI factories on premises or in the cloud to rapidly create secure, optimized sovereign AI agents.

The expanded NVIDIA Enterprise AI Factory validated design delivers a turnkey solution for sovereign AI — pairing NVIDIA Blackwell-accelerated infrastructure with a next-generation software stack.

At its core is a new NIM capability that lets enterprises spin up lightning-fast inference for a variety of open large language model (LLM) architectures — now supporting more than 100,000 public, private and domain-specialized model variants hosted on Hugging Face.

Layered on top are new NVIDIA AI Blueprints and developer examples. These guide developers on how to simplify the process of creating and onboarding AI agents while ensuring robust safety, enhanced privacy and continuous improvement.

With these new tools — which include the AI-Q and data flywheel NVIDIA Blueprints, plus a blueprint for AI safety using NVIDIA NeMo — European organizations can build, deploy and run AI factories at scale without compromising performance, control or compliance.

Major enterprises across the continent are already building NVIDIA-accelerated AI factories for virtually every industry. Some of the region’s largest finance companies, including BNP Paribas and Finanz Informatik, are scaling AI factories to run financial services AI agents that assist employees and customers. L’Oréal-backed startup Noli.com is working with Accenture to use Accenture’s AI Refinery for its AI Beauty Matchmaker. IQVIA is building AI agents to support healthcare services.

In the telecom industry, BT Group is optimizing customer service with ServiceNow and addressing anomalies in its network. Telenor is using its AI factory to run NVIDIA AI Blueprints for autonomous network configuration.

Boosting AI Agent Development With Enterprise AI Factories

The first step to create sovereign AI agents is model development — often using regional or enterprise-specific data tailored to specific use cases. To train, manage and scale these models, sovereign AI developers need AI factories.

On-premises sovereign AI infrastructure is especially valuable in regulated sectors such as government, finance and healthcare. The NVIDIA Enterprise AI Factory validated design helps these industries scale to support AI applications quickly with on-premises AI factories where every hardware and software layer is optimized.

It features NVIDIA Blackwell accelerated computing, including NVIDIA RTX PRO Servers, as well as NVIDIA networking and NVIDIA AI Enterprise software to accelerate generative and agentic AI applications.

Several regional software providers, including Adaptive ML, ClearML, Dataloop, Deepchecks, deepset, Domino Data Labs, EnterpriseDB, Iguazio, Quantiphi, Teradata, Weaviate and Wiz, are now integrating the validated design to help developers build and deploy enterprise AI agents at scale.

The Enterprise AI Factory can also be used with software from regional partners such as aiOla, DeepL, Elastic, Photoroom, PolyAI, Qodo, Sana Labs, Tabnine and ThinkDeep.

NIM Accelerates LLM Deployment Across NVIDIA Infrastructure

When ready to deploy their AI models and agents, developers can tap NVIDIA NIM microservices to unlock accelerated, enterprise-ready inference across an expanding global suite of LLMs, including models tailored to specific languages and domains.

NIM microservices now support a vast collection of LLMs on Hugging Face. NIM automatically optimizes the model with its ideal inference engine — such as NVIDIA TensorRT-LLM, SGLang or vLLM — so that, with a few simple commands, users can rapidly deploy their preferred LLMs for high-performance AI inference on any NVIDIA-accelerated infrastructure.

“NIM makes it easy to deploy a broad range of LLMs from Hugging Face on NVIDIA GPUs,” said Jeff Boudier, vice president of product at Hugging Face. “With support for over 100,000 public and private LLMs hosted on the Hugging Face Hub, NIM makes the performance and diversity of open models available to enterprise AI agents.”

Enterprise model builders and software development tool creators AI21 Labs, Dream Security, IBM and JetBrains, as well as European research and innovation organizations Barcelona Supercomputing Center, Bielik.AI and UTTER, are among those contributing specialized LLMs now available as NIM microservices.

These optimized models support 35 regional languages, including Arabic, Czech, Dutch, German, Hebrew, French, Polish, Portuguese and Spanish — expanding options for developers building AI agents with local language and cultural understanding.

NVIDIA is also working with several model builders and AI consortiums in Europe to optimize local models with NVIDIA Nemotron techniques.

Blueprints for Smarter, Safer AI Agents

To give developers a head start on building and onboarding powerful, secure AI models and agents, NVIDIA offers easy-to-follow blueprints and developer examples. These reference designs will enable Europe’s developers to tailor their models and agents to regional needs by connecting them to proprietary data, applying safety policies and continuously updating them for optimized performance.

The AI-Q NVIDIA Blueprint provides a guide for developing agentic systems capable of fast multimodal data extraction and powerful information retrieval. It includes the NVIDIA NeMo Agent toolkit, an open-source software library for evaluating and optimizing AI agents.

The NeMo Agent toolkit brings intelligence to agentic AI workflows and is compatible with open standards, including Model Context Protocol (MCP), an open-source framework for connecting AI agents to tools. This integration enables interoperability with tools served by MCP servers. The NeMo Agent toolkit is also integrated with agent frameworks including CrewAI, LangChain, LlamaIndex, Microsoft Semantic Kernel and Weights & Biases.

The AI-Q blueprint offers a foundation for enterprises to build domain-specific AI agents that can use a wide range of enterprise data sources to deliver insights contextualized to an organization’s specific needs. NVIDIA partners including DDN, Dell Technologies, Hewlett Packard Enterprise, Hitachi Vantara, IBM, NetApp, Nutanix, Pure Storage, VAST Data and WEKA use AI-Q to connect their data platforms to AI agents.

The NVIDIA AI Blueprint for building data flywheels enables enterprises to improve their AI agents over time. It includes tools to turn inference data into new training and evaluation datasets — and tools to automatically surface optimized models while maintaining high accuracy.

Built on NVIDIA AI Enterprise, the blueprint pulls in production traffic and user feedback and triggers retraining and redeployment pipelines, creating a continuous feedback loop to enhance model performance. Powered by modular NVIDIA NeMo microservices, it offers flexible deployment options that run on any accelerated computing infrastructure, whether on premises or in the cloud.

The blueprint evaluates existing and new candidate models to help developers identify and deploy smaller, faster models that match or surpass the accuracy of larger ones. With this tool, enterprises can pick models that increase compute efficiency and decrease the total cost of ownership, enabling leaner, more cost-effective AI.

NVIDIA partners VAST Data, Weights & Biases and Iguazio — an AI platform company acquired by QuantumBlack, AI by McKinsey — are building on the NVIDIA AI Blueprint for data flywheels to integrate additional features, such as advanced monitoring capabilities, based on their software platforms.

To help enterprises safely adopt open-source models, the Agentic AI Safety blueprint is slated to offer a framework for evaluating and enhancing model safety across content, security and privacy dimensions.

It guides developers through NVIDIA-curated datasets and standardized evaluation tools to prepare models for production with post-training. The recipe also provides actionable safety and vulnerability metrics — covering jailbreaks, prompt injections and harmful content — enabling enterprises to accelerate deployment without compromising compliance or trust.

Enterprises Set to Integrate New NVIDIA Software

Global enterprises — including ActiveFence, Amdocs, Cisco, Cloudera, CrowdStrike, IBM, IQVIA, SAP, ServiceNow and Trend Micro — are adopting NVIDIA NIM microservices and blueprints to accelerate AI workflows in cybersecurity, financial services, healthcare, telecommunications and more.

Amdocs, a leading provider of software and services for communications and media providers, uses NVIDIA AI Enterprise — including NVIDIA NeMo and NVIDIA NIM microservices — as part of Amdocs’ amAIz suite of AI products and services. The company has used the NVIDIA AI Blueprint for building data flywheels in an LLMOps pipeline to enable efficient LLM fine-tuning.

Amdocs plans to integrate more NIM microservices to support AI agents for content creation, translation, network automation and customer service.

Global system integrators like Capgemini, Accenture, Deloitte, EY, Infosys, Tata Consultancy Services and Wipro are helping enterprises build their AI factories with full-stack NVIDIA software.

Watch the NVIDIA GTC Paris keynote from NVIDIA founder and CEO Jensen Huang at VivaTech, and explore GTC Paris sessions.

See notice regarding software product information.

Italy Drives Industrial Renaissance With NVIDIA AI

Sovereign AI has sparked a new Italian renaissance around industrial transformation.

Building on its design and manufacturing heritage, Italy is among the leading European nations in the development of sovereign AI.

Italy joins European nations in building domestic AI infrastructure with an ecosystem of NVIDIA Cloud Partners and telecom providers such as Fastweb.

Italian telecommunications company Fastweb has deployed Italy’s first NVIDIA DGX H100 supercomputer to support national AI infrastructure. With these resources, Fastweb is introducing an Italian language model to support generative AI applications — trained and running on its NVIDIA DGX AI supercomputer.

Model builder Domyn — which developed Domyn Small and Domyn Large — is collaborating with NVIDIA on a large AI factory powered by 5,760 NVIDIA Grace Blackwell GPUs.

Italy Backs Sovereign AI to Drive Business Advances

Also, the Italian Ministry of Enterprise and Made in Italy is working with Domyn to further boost the nation’s sovereign AI capabilities with advanced reasoning models.

The combination of efforts to deploy AI factory resources for transformative sovereign AI will allow European enterprises, startups and government organizations to securely develop, train and deploy agentic and physical AI applications. CINECA, which provides Italy’s national HPC service and is a EuroHPC pre-exascale hosting site with more than 15,000 GPUs in the Leonardo supercomputer, has played a key role in scaling AI codes from Mistral and Domyn.

CINECA recently added the LISA supercomputer, which contributes an additional 1,328 NVIDIA Hopper GPUs for AI-for-science applications.

CINECA has also been selected by EuroHPC to build a new AI factory, which will be operational in 2025. The new IT4LIA AI factory will provide new AI capabilities to both research computing and Italian startups.

“This agreement represents a strategic step toward strengthening Italy’s technological sovereignty and ensuring that our businesses have secure and competitive access to data management,” said Minister of Enterprise and Made in Italy Adolfo Urso. “The collaboration with top-tier partners such as NVIDIA and Domyn confirms the Government’s commitment in supporting high-level alliances to foster innovation and the competitiveness of the national production system.”

NVIDIA Collaborations With Italy Drive Industrial AI Gains

Vertiv, a global critical digital infrastructure and services provider that operates in Italy, is helping deliver AI-ready, pre-fabricated modular data center infrastructure for the Domyn Colosseum development, including power, cooling, management, monitoring, service and maintenance offerings. This data center aims to stand out with its modular setup and fast time to launch.

“Colosseum development stakeholders — Domyn, World Wide Technologies, Vertiv and NVIDIA — leveraged NVIDIA Omniverse for collaboration and real-time simulations, reducing simulation times from months to hours. Vertiv manufacturing and factory integration processes reduce deployment time by up to 50%, compared with traditional data center builds,” said Karsten Winther, president of Vertiv, EMEA. “Together, we are setting new standards for flexibility and speed of deployment for critical AI factories.”

The AI data center from Domyn includes plans to use NVIDIA Llama Nemotron models and NVIDIA Nemotron techniques for building AI agent platforms with advanced open reasoning foundation models. Domyn’s supercomputer is designed to support the development of large-scale artificial intelligence solutions in the most highly regulated industries.

“AI is transforming Italy and industries worldwide, enabling organizations to innovate responsibly while upholding the highest standards of trust and compliance,” said Uljan Sharka, CEO of Domyn. “Working with NVIDIA, we are building AI infrastructure and models tailored to our unique values and regulatory needs to drive sustainable growth and position Italy as a leader in this transformative technology.”

Leonardo is accelerating its advanced physics for aerospace engineering with the NVIDIA Blackwell platform, boosting its helicopter design and simulation. The company also operates the Davinci-1 supercomputer. Davinci-1 assists in its advanced internal research and development, and is available for AI applications delivered by Leonardo Hypercomputing Continuum to customers.

Italian startup K2K is developing vision language models for real-time video analytics in cities, manufacturing and public infrastructure. K2K aims to deliver operational efficiencies in services through AI agents and physical AI, applying vision language models to applications such as waste management, visual pollution and traffic management.

Government Partnerships With NVIDIA Enable Upskilling 

Top universities — from Bologna and Torino to Milano and Roma — offer talent for Italy to lead the way with its sovereign AI strategy. NVIDIA has been collaborating through the NVIDIA AI Technology Center program with CINI Laboratorio AIIS (Artificial Intelligence and Intelligent Systems), a consortium of 50 Italian universities and research institutes, which has trained 3,000+ academics and co-published 50+ scientific papers to date.

Now, more than 200 AI projects are actively running on Leonardo versus about 20 in 2020.

In addition to collaborating with Domyn on the Colosseum development, the Ministry of Enterprise and Made in Italy is seeking to prepare the workforce of the future. The parties aim to build an AI-skilled workforce capable of driving innovation, drawing on NVIDIA technologies, Domyn’s expertise in AI models for highly regulated industries and the Ministry’s commitment to economic resilience and innovation. NVIDIA offers learning courses globally through its Deep Learning Institute to promote education and certification in AI.

France Bolsters National AI Strategy With NVIDIA Infrastructure

AI’s in fashion in France — as it is across the globe — with the technology already helping solve some of the country’s greatest challenges across research and innovation, transportation, manufacturing and many other industries. And this fashion’s here to stay.

France’s National Strategy for AI, part of the broader France 2030 investment plan, includes more than €109 billion in investments for the country’s AI infrastructure projects.

Such projects include a collaboration between NVIDIA and Mistral AI, an independent generative AI pioneer headquartered in France, to build a cutting-edge, end-to-end compute platform that answers the comprehensive compute infrastructure needs of enterprise customers.

Plus, a slew of the nation’s AI-native companies, startups and research centers are innovating with NVIDIA AI infrastructure.

These leading innovators are using the latest agentic and industrial AI technologies to bolster and accelerate work in areas ranging from advertising for skincare and beauty, spearheaded by L’Oréal and Accenture, to transportation and the electric grid.

With its decarbonized, abundant electricity supply, expanding high-voltage electric grid and more than 30 ready-to-use, low-carbon AI sites throughout the country, France is poised to become one of the world’s greenest leaders in artificial intelligence.

Below are some of the key players making AI development the nation’s hottest trend.

AI Infrastructure Development Across Industries

Mistral AI’s new compute platform will feature the latest-generation NVIDIA Grace Blackwell systems, with 18,000 Blackwell and Blackwell Ultra GPUs planned for deployment in the initial phase and additional plans to expand across multiple sites in 2026. The infrastructure will host Mistral AI’s cloud application service, which customers can use to develop and run AI applications with Mistral AI’s and other providers’ open-source models.

Mistral AI and NVIDIA are optimizing inference performance for several Mistral models with NVIDIA NIM microservices, including the new Mistral Nemotron model, exclusively available with the NVIDIA AI Enterprise software platform.

“We are forging Europe’s AI future in partnership with NVIDIA, combining strategic autonomy with our expertise in AI and NVIDIA’s most advanced technology,” said Arthur Mensch, CEO of Mistral AI. “This new infrastructure will provide enterprises and the public sector with Mistral’s AI expertise in building the best compute for AI, ensuring full control to businesses.”

In addition, Mistral AI and NVIDIA are collaborating with Bpifrance, the French national investment bank, and MGX, the UAE’s investment fund focused on AI and advanced technology, to establish Europe’s largest AI campus — to be located in the Paris region and expected to reach a capacity of 1.4 gigawatts. The campus will feature advanced NVIDIA compute infrastructure to support the full AI lifecycle, from model training and inference to deployment of generative and applied AI systems.

France-founded European cloud service provider Scaleway offers the European cloud’s largest compute capacity, powered by more than a thousand NVIDIA Hopper GPUs, with plans to offer NVIDIA Blackwell GPUs — which enable building and running real-time generative AI on trillion-parameter large language models at up to 25x lower cost and energy consumption than the prior GPU generation. As a European provider, Scaleway provides domestic infrastructure that ensures access and compliance with EU data protection laws — critical to businesses with a European footprint.

Mistral AI and Scaleway plan to participate in the DGX Cloud Lepton marketplace to provide startups and developers access to compute infrastructure.

Orange Business, the enterprise division of Orange, one of Europe’s leading telco operators, has joined the NVIDIA Cloud Partner program to accelerate the development of enterprise-grade agentic AI, including its innovative Live Intelligence platform, which empowers companies of all sizes to securely deploy generative AI at scale. Those AI solutions tap into the Orange Business Cloud Avenue platform, built on high-performance NVIDIA infrastructure.

AI Deployments, From Beauty to Transportation

Paris-based beauty company L’Oréal Groupe’s generative AI content platform CREAITECH uses the NVIDIA AI Enterprise platform to develop and deploy 3D digital renderings of L’Oréal’s products for faster, more creative development of marketing and advertising campaigns. Eighty percent of L’Oréal’s production in France is exported globally, helping make cosmetics the third-largest contributor to national economic growth.

Learn more about how L’Oréal and other leading retailers are using NVIDIA technologies to redefine their operations.

France’s public sector uses NVIDIA technologies for use cases ranging from transportation and public safety in cities to cybersecurity in schools and better fraud detection at the French Ministry of the Economy, Finance, and Industrial and Digital Sovereignty, which oversees national funds and the economic system. Local governments have deployed solutions in generative and vision AI, document analytics and more through NVIDIA partners Dell Technologies, Hewlett Packard Enterprise, LightOn, SCC, ThinkDeep, XXII and others.

France’s national rail operator SNCF Gares&Connexions, which operates internationally and has a network of 3,000 train stations across France and Monaco, is developing digital twins to simulate railway scenarios.

Powered by NVIDIA Omniverse, Metropolis and ecosystem partners Akila and XXII, SNCF Gares&Connexions’ AI deployment, including at the Monaco-Monte-Carlo station, has helped SNCF Gares&Connexions achieve a 100% on-time preventive maintenance completion rate, a 50% reduction in downtime and issue response time, as well as a 20% reduction in energy consumption.

Schneider Electric — a French multinational company driving the digital transformation of energy management and automation — has introduced publicly available engineered reference designs for optimizing performance, scalability and energy efficiency of NVIDIA-powered AI data centers. In addition, AVEVA, a subsidiary of Schneider Electric, is connecting its digital twin platform to NVIDIA Omniverse to deliver a unified virtual simulation and collaboration environment for designing and deploying optimized data centers.

Electricité de France, commonly known as EDF, the French national electricity company, has partnered with NVIDIA to transition its open-source code_saturne computational fluid dynamics (CFD) application, developed by EDF R&D, onto accelerated computing platforms for improved performance in power and industrial applications. This collaboration, which also involves NVIDIA developer partner ANEO, taps into NVIDIA Nsight tools to iteratively adapt the CFD code for optimized GPU operation.

AI-Native Companies Build Models, Cloud Services to Accelerate Next Industrial Revolution

To accelerate France’s AI-driven transformation, NVIDIA is partnering with the country’s leading model builders and AI-native companies to support large language models in various languages including Arabic, French, English, Italian, German, Polish, Spanish and Swedish.

H Company and LightOn are tailoring and optimizing their models with NVIDIA Nemotron techniques to maximize cost efficiency and accuracy for enterprise AI workloads including agentic AI.

Plus, a new Hugging Face integration with DGX Cloud Lepton will let companies fine-tune their AI models on local NVIDIA Cloud Partner (NCP) infrastructure.

Startups Develop Breakthroughs With NVIDIA AI Infrastructure

France has a rich ecosystem of more than 1,000 AI startups pursuing breakthroughs in healthcare, quantum computing and more.

Alice & Bob, a member of the NVIDIA Inception program for cutting-edge startups, is building quantum computing hardware and has integrated the NVIDIA CUDA-Q hybrid computing platform into its quantum simulation library, called Dynamiqs. This allows the company to accelerate its qubit design process with GPUs. Adding NVIDIA acceleration on top of Dynamiqs’ advanced optimization capabilities can increase the efficiency of these challenging qubit-design simulations by up to 75x.

Quandela, a leader in full-stack photonic quantum computing, has announced MerLin, a quantum machine learning programming framework that uses NVIDIA CUDA-Q to deliver high-performance simulations for photonic quantum circuits. This enables developers to build new models and assess the performance of candidate algorithms on simulations of larger quantum processors.

Moon Surgical, a robotic surgery company and also an NVIDIA Inception member, is using the NVIDIA Holoscan and IGX platforms to power its Maestro System for minimally invasive surgery, a technique where surgeons operate through small incisions with an internal camera and instruments. Moon Surgical and NVIDIA are also collaborating to bring generative AI features to the operating room using Maestro and Holoscan.

Research Centers Bring Future of Technology Closer to Reality

As the country with the world’s third largest number of AI researchers, France supports a vast spectrum of projects and centers advancing supercomputing, AI education and other initiatives to make the future of technology possible.

The Jean Zay supercomputer, operated by IDRIS, a national computing centre of the CNRS (France’s National Centre for Scientific Research), is a French AI flagship serving academic research and startup users. Built by Eviden and powered by NVIDIA, the supercomputer accelerates the work of university and public sector researchers, developers and data scientists across France.

Acquired by the French government through intermediary French civil company GENCI, the supercomputer integrates NVIDIA accelerated computing, including more than a thousand NVIDIA Hopper GPUs.

It supports more than 150 startups — including Hugging Face, Mistral AI, H Company and LightOn — and powered 1,400+ AI projects in 2024. Jean Zay is among the most eco-efficient machines in Europe, thanks to the accelerated technologies and core warm-water cooling of the computing servers. In addition, the supercomputer’s waste heat is reused to help heat more than 1,500 homes in the Saclay area, to the southwest of Paris.

Learn more about the latest AI advancements in France and other countries at NVIDIA GTC Paris, running through Thursday, June 12, at VivaTech. Watch the keynote from NVIDIA founder and CEO Jensen Huang, and explore GTC Paris sessions.

See notice regarding software product information.

Leading European Telcos Build AI Infrastructure With NVIDIA for Regional Enterprises

AI factories are producing intelligence at unprecedented scale, showing massive potential to fuel economic growth and innovation. But to tap that intelligence, countries need secure, sovereign AI infrastructure.

As trusted providers of critical connectivity infrastructure, forming the backbone of the modern digital world, telecommunications providers are uniquely positioned to deliver AI services due to their extensive infrastructure and geographic reach. Eighteen telco-led AI factories powered by NVIDIA now span five continents.

At NVIDIA GTC Paris at VivaTech, NVIDIA today announced collaborations with leading telecommunications companies Orange, Fastweb, Swisscom, Telefónica and Telenor to develop and expand sovereign AI factories and edge infrastructure across Europe. This will equip European enterprises across industries with secure, accelerated computing infrastructure to train and deploy customized AI models and agentic AI services.

NVIDIA’s full-stack technologies allow telco operators to go beyond traditional connectivity services to AI as a service, driving generative and agentic AI adoption and empowering entire nations to build localized models with secure sovereign AI infrastructure.

Orange Business Enables Trusted AI Solutions in Europe

Orange Business, the enterprise division of the Orange Group, one of Europe’s leading telco operators, has joined the NVIDIA Cloud Partner program to accelerate the development of enterprise-grade agentic AI, including its innovative Live Intelligence platform, which empowers companies of all sizes to securely deploy generative AI at scale. Those AI solutions tap into the Orange Business Cloud Avenue platform, built on high-performance NVIDIA infrastructure.

This will allow enterprises across Europe to access AI infrastructure, train custom AI models and deploy secure generative AI applications through Orange Business’ Live Intelligence platform — as well as introduce new revenue streams for telcos, transforming employee, customer and operational experiences.

Orange has also applied AI to its own operations, with 73,000 employees now regularly using the solution to streamline tasks, develop software, automate support procedures and enhance decision-making. These use cases will benefit from Orange’s sovereign AI infrastructure — supporting employees in France, as well as across Europe and Africa, and handling over 30,000 requests daily.

Telenor Pioneers Secure and Sustainable AI for Norway 

Building on its creation of Norway’s first sovereign AI infrastructure, Telenor today announced it is significantly expanding capacity to meet rising demand from both internal teams and external customers. With plans to add a new AI data center that will run entirely on renewable energy and contribute surplus energy back to the grid, Telenor is helping Norway advance its mission of secure, sovereign and sustainable AI development.

Since its launch, Telenor’s AI infrastructure has helped drive AI adoption across Norway by powering digital services for the public sector, industrial automation and local language models. This includes hosting BabelSpeak, an AI-driven translation tool developed by Capgemini that offers near-real-time, voice-to-voice multilingual translation in nearly 100 languages and is now being piloted by the Norwegian Red Cross. Capgemini has been a key partner for Telenor in building Norway’s first sovereign and secure AI cloud service in collaboration with NVIDIA.

In addition, Telenor is integrating NVIDIA AI Enterprise software to accelerate enterprise adoption and deployment of generative and agentic AI applications, as well as its own internal innovation efforts, such as in network automation.

“Telenor is leading the way in AI adoption in the telecom industry, pioneering innovation in nearly every aspect of the business,” said Cathal Kennedy, acting group chief technology officer of Telenor. “By combining robust infrastructure with advanced AI capabilities, we’re setting new standards for efficiency and sustainability, and ensuring production of intelligence remains in Norway.”

Swiss AI Platform Powers Enterprises

Swisscom recently announced GenAI Studio — a new service built on its Swiss AI Platform that allows enterprises to develop and run AI agents quickly and securely.

The company also launched the AI Work Hub and Model catalog, enabling enterprises across Switzerland to build complex AI projects, customize models and deploy agentic AI at scale and speed.

The new enterprise AI services are hosted on Swisscom’s sovereign AI factory, built on NVIDIA DGX SuperPOD. This allows for fast and easy scaling of capacity to serve Switzerland’s rapidly growing demand for AI services and inference, expanding the company’s revenue opportunities.

Telefónica Empowers Spanish Enterprises at the Edge

Telefónica is piloting distributed edge AI infrastructure across Spain to bring advanced computing closer to where data’s generated and where local AI inference is needed.

The company will be deploying hundreds of NVIDIA GPUs as part of its edge AI fabric. This will help ensure that AI services are delivered with low latency, reliability and strong privacy protections, and that trusted information stays within national borders.

The edge AI solution incorporates NVIDIA AI Enterprise software and NVIDIA NIM microservices, enabling secure, scalable and enterprise-grade AI applications for key sectors such as government and financial services.

Fastweb Innovates for Italy With New Language Model

Fastweb built MIIA — one of the first Italian language models to support generative AI applications — trained and running on its NVIDIA DGX AI supercomputer.

Watch the NVIDIA GTC Paris keynote from NVIDIA founder and CEO Jensen Huang at VivaTech, and explore the GTC Paris telecom special address and industry panel with Orange, Swisscom and Telenor.

Learn more about sovereign AI in telecom.

Automate customer support with Amazon Bedrock, LangGraph, and Mistral models

AI agents are transforming the landscape of customer support by bridging the gap between large language models (LLMs) and real-world applications. These intelligent, autonomous systems are poised to revolutionize customer service across industries, ushering in a new era of human-AI collaboration and problem-solving. By harnessing the power of LLMs and integrating them with specialized tools and APIs, agents can tackle complex, multistep customer support tasks that were previously beyond the reach of traditional AI systems.

As we look to the future, AI agents will play a crucial role in the following areas:

  • Enhancing decision-making – Providing deeper, context-aware insights to improve customer support outcomes
  • Automating workflows – Streamlining customer service processes, from initial contact to resolution, across various channels
  • Human-AI interactions – Enabling more natural and intuitive interactions between customers and AI systems
  • Innovation and knowledge integration – Generating new solutions by combining diverse data sources and specialized knowledge to address customer queries more effectively
  • Ethical AI practices – Helping provide more transparent and explainable AI systems to address customer concerns and build trust

Building and deploying AI agent systems for customer support is a step toward unlocking the full potential of generative AI in this domain. As these systems evolve, they will transform customer service, expand possibilities, and open new doors for AI in enhancing customer experiences.

In this post, we demonstrate how to use Amazon Bedrock and LangGraph to build a personalized customer support experience for an ecommerce retailer. By integrating the Mistral Large 2 and Pixtral Large models, we guide you through automating key customer support workflows such as ticket categorization, order details extraction, damage assessment, and generating contextual responses. These principles are applicable across various industries, but we use the ecommerce domain as our primary example to showcase the end-to-end implementation and best practices. This post provides a comprehensive technical walkthrough to help you enhance your customer service capabilities and explore the latest advancements in LLMs and multimodal AI.

LangGraph is a powerful framework built on top of LangChain that enables the creation of cyclical, stateful graphs for complex AI agent workflows. It uses a directed graph structure where nodes represent individual processing steps (like calling an LLM or using a tool), edges define transitions between steps, and state is maintained and passed between nodes during execution. This architecture is particularly valuable for customer support automation workflows. LangGraph’s advantages include built-in visualization, logging (traces), human-in-the-loop capabilities, and the ability to organize complex workflows in a more maintainable way than traditional Python code; a minimal sketch of this graph-building pattern follows the list below.

This post provides details on how to do the following:

  • Use Amazon Bedrock and LangGraph to build intelligent, context-aware customer support workflows
  • Integrate data in a helpdesk tool, like JIRA, in the LangChain workflow
  • Use LLMs and vision language models (VLMs) in the workflow to perform context-specific tasks
  • Extract information from images to aid in decision-making
  • Compare images to assess product damage claims
  • Generate responses for the customer support tickets
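Before walking through the solution, the following minimal sketch illustrates the LangGraph pattern described earlier: a typed state, two nodes, and directed edges compiled into a runnable graph. The node names and the keyword-based categorization here are purely illustrative and are not part of the solution code.

# Minimal LangGraph illustration (not part of the solution code)
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

class TicketState(TypedDict):
    description: str
    category: str
    response: str

def categorize(state: TicketState) -> dict:
    # A real node would call an LLM; a keyword check stands in here
    category = "Refunds" if "refund" in state["description"].lower() else "Other"
    return {"category": category}

def draft_response(state: TicketState) -> dict:
    return {"response": f"Thanks for reaching out about your {state['category']} request."}

builder = StateGraph(TicketState)
builder.add_node("categorize", categorize)
builder.add_node("draft_response", draft_response)
builder.add_edge(START, "categorize")
builder.add_edge("categorize", "draft_response")
builder.add_edge("draft_response", END)

graph = builder.compile()
result = graph.invoke({"description": "I would like a refund for order 123"})
print(result["category"], "->", result["response"])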

Solution overview

In this solution, customers initiate support requests through email, and those requests are automatically converted into new support tickets in Atlassian Jira Service Management. The customer support automation solution then takes over, identifying the intent behind each query, categorizing the tickets, and assigning them to a bot user for further processing. The solution uses LangGraph to orchestrate a workflow of AI agents that extract key identifiers such as transaction IDs and order numbers from the support ticket. It analyzes the query and uses these identifiers to call relevant tools, extracting additional information from the database to generate a comprehensive and context-aware response. After the response is prepared, it’s updated in Jira for human support agents to review before sending the response back to the customer. This process is illustrated in the following figure. The solution is capable of extracting information not only from the ticket body and title but also from attached images, such as screenshots, and from external databases.

Solution Architecture

The solution uses two foundation models (FMs) from Amazon Bedrock, each selected based on its specific capabilities and the complexity of the tasks involved. For instance, the Pixtral model is used for vision-related tasks like image comparison and ID extraction, whereas the Mistral Large 2 model handles a variety of tasks like ticket categorization, response generation, and tool calling. Additionally, the solution includes fraud detection and prevention capabilities. It can identify fraudulent product returns by comparing the stock product image with the returned product image to verify if they match and assess whether the returned product is genuinely damaged. This integration of advanced AI models with automation tools enhances the efficiency and reliability of the customer support process, facilitating timely resolutions and security against fraudulent activities. LangGraph provides a framework for orchestrating the information flow between agents, featuring built-in state management and checkpointing to facilitate seamless process continuity. This functionality allows the inclusion of initial ticket summaries and descriptions in the State object, with additional information appended in subsequent steps of the workflows. By maintaining this evolving context, LangGraph enables LLMs to generate context-aware responses. See the following code:

# class to hold state information shared across the workflow nodes
from langgraph.graph import MessagesState

class JiraAppState(MessagesState):
    key: str
    summary: str
    description: str
    attachments: list
    category: str
    response: str
    transaction_id: str
    order_no: str
    usage: list

The framework integrates effortlessly with Amazon Bedrock and LLMs, supporting task-specific diversification by using cost-effective models for simpler tasks while reducing the risks of exceeding model quotas. Furthermore, LangGraph offers conditional routing for dynamic workflow adjustments based on intermediate results, and its modular design facilitates the addition or removal of agents to extend system capabilities.

Responsible AI

It’s crucial for customer support automation applications to validate inputs and make sure LLM outputs are secure and responsible. Amazon Bedrock Guardrails can significantly enhance customer support automation applications by providing configurable safeguards that monitor and filter both user inputs and AI-generated responses, making sure interactions remain safe, relevant, and aligned with organizational policies. By using features such as content filters, which detect and block harmful categories like hate speech, insults, sexual content, and violence, as well as denied topics to help prevent discussions on sensitive or restricted subjects (for example, legal or medical advice), customer support applications can avoid generating or amplifying inappropriate or harmful information. Additionally, guardrails can help redact personally identifiable information (PII) from conversation transcripts, protecting user privacy and fostering trust. These measures not only reduce the risk of reputational harm and regulatory violations but also create a more positive and secure experience for customers, allowing support teams to focus on resolving issues efficiently while maintaining high standards of safety and responsibility.

The following diagram illustrates this architecture.

Guardrails

Observability

Along with Responsible AI, observability is vital for customer support applications to provide deep, real-time visibility into model performance, usage patterns, and operational health, enabling teams to proactively detect and resolve issues. With comprehensive observability, you can monitor key metrics such as latency and token consumption, and track and analyze input prompts and outputs for quality and compliance. This level of insight helps identify and mitigate risks like hallucinations, prompt injections, toxic language, and PII leakage, helping make sure that customer interactions remain safe, reliable, and aligned with regulatory requirements.
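Token usage and latency are returned with every Amazon Bedrock Converse API response, so they can be captured at the call site and forwarded to whichever monitoring system you use. The following sketch, using the Mistral Large 2 model ID from this post, shows the relevant response fields; how you aggregate and ship these metrics is an implementation choice.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock_runtime.converse(
    modelId="mistral.mistral-large-2407-v1:0",
    messages=[{"role": "user", "content": [{"text": "Categorize this ticket: 'Where is my refund?'"}]}],
)

usage = response["usage"]                      # inputTokens, outputTokens, totalTokens
latency_ms = response["metrics"]["latencyMs"]  # server-side latency for the call
print(f"in={usage['inputTokens']} out={usage['outputTokens']} latency={latency_ms}ms")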

Prerequisites

In this post, we use Atlassian Jira Service Management as an example. You can use the same general approach to integrate with other service management tools that provide APIs for programmatic access. The configuration required in Jira includes:

  • A Jira service management project with API token to enable programmatic access
  • The following custom fields:
    • Name: Category, Type: Select List (multiple choices)
    • Name: Response, Type: Text Field (multi-line)
  • A bot user to assign tickets

The following code shows a sample Jira configuration:

JIRA_API_TOKEN = "<JIRA_API_TOKEN>"
JIRA_USERNAME = "<JIRA_USERNAME>"
JIRA_INSTANCE_URL = "https://<YOUR_JIRA_INSTANCE_NAME>.atlassian.net/"
JIRA_PROJECT_NAME = "<JIRA_PROJECT_NAME>"
JIRA_PROJECT_KEY = "<JIRA_PROJECT_KEY>"
JIRA_BOT_USER_ID = "<JIRA_BOT_USER_ID>"

In addition to Jira, the following services and Python packages are required:

  • A valid AWS account.
  • An AWS Identity and Access Management (IAM) role in the account that has sufficient permissions to create the necessary resources.
  • Access to the following models hosted on Amazon Bedrock:
    • Mistral Large 2 (model ID: mistral.mistral-large-2407-v1:0).
    • Pixtral Large (model ID: us.mistral.pixtral-large-2502-v1:0). The Pixtral Large model is available in Amazon Bedrock under cross-Region inference profiles.
  • A LangGraph application up and running locally. For instructions, see Quickstart: Launch Local LangGraph Server.

For this post, we use the us-west-2 AWS Region. For details on available Regions, see Amazon Bedrock endpoints and quotas.
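As an optional sanity check before running the workflow, you can list the foundation models visible in your account and Region to confirm that the Mistral model IDs referenced in this post appear there. Note that model access is still granted separately on the Amazon Bedrock console; this snippet is only a simple illustration of the check.

import boto3

bedrock = boto3.client("bedrock", region_name="us-west-2")

# Print the Mistral entries returned for this Region
for model in bedrock.list_foundation_models()["modelSummaries"]:
    if "mistral" in model["modelId"].lower():
        print(model["modelId"], model.get("inferenceTypesSupported"))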

The source code of this solution is available in the GitHub repository. This is an example code; you should conduct your own due diligence and adhere to the principle of least privilege.

Implementation with LangGraph

At the core of customer support automation is a suite of specialized tools and functions designed to collect, analyze, and integrate data from service management systems and a SQLite database. These tools serve as the foundation of our system, empowering it to deliver context-aware responses. In this section, we delve into the essential components that power our system.

BedrockClient class

The BedrockClient class is implemented in the cs_bedrock.py file. It provides a wrapper for interacting with Amazon Bedrock services, specifically for managing language models and content safety guardrails in customer support applications. It simplifies the process of initializing language models with appropriate configurations and managing content safety guardrails. This class is used by LangChain and LangGraph to invoke LLMs on Amazon Bedrock.
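The exact implementation lives in cs_bedrock.py in the repository; as a rough sketch of what such a wrapper does, the snippet below initializes the two Mistral models used in this post through the ChatBedrockConverse class from the langchain-aws package, which is the LangChain integration for the Amazon Bedrock Converse API. The function name and defaults here are illustrative assumptions.

# Illustrative sketch only; the repository's BedrockClient may differ
from langchain_aws import ChatBedrockConverse

MISTRAL_LARGE_ID = "mistral.mistral-large-2407-v1:0"
PIXTRAL_LARGE_ID = "us.mistral.pixtral-large-2502-v1:0"  # cross-Region inference profile

def init_llms(region: str = "us-west-2"):
    # Text model for categorization, tool calling and response generation
    llm = ChatBedrockConverse(model=MISTRAL_LARGE_ID, region_name=region, temperature=0)
    # Vision model for image comparison and ID extraction from screenshots
    vision_llm = ChatBedrockConverse(model=PIXTRAL_LARGE_ID, region_name=region, temperature=0)
    return llm, vision_llm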

This class also provides methods to create guardrails for responsible AI implementation. The following Amazon Bedrock Guardrails policy filters sexual content, violence, hate speech, insults, misconduct, and prompt attacks, and it helps prevent models from generating stock and investment advice, profanity, and hateful, violent or sexual content. Additionally, it helps protect against attempts to expose model vulnerabilities by mitigating prompt attacks.

# guardrails policy

contentPolicyConfig={
    'filtersConfig': [
        {
            'type': 'SEXUAL',
            'inputStrength': 'MEDIUM',
            'outputStrength': 'MEDIUM'
        },
        {
            'type': 'VIOLENCE',
            'inputStrength': 'MEDIUM',
            'outputStrength': 'MEDIUM'
        },
        {
            'type': 'HATE',
            'inputStrength': 'MEDIUM',
            'outputStrength': 'MEDIUM'
        },
        {
            'type': 'INSULTS',
            'inputStrength': 'MEDIUM',
            'outputStrength': 'MEDIUM'
        },
        {
            'type': 'MISCONDUCT',
            'inputStrength': 'MEDIUM',
            'outputStrength': 'MEDIUM'
        },
        {
            'type': 'PROMPT_ATTACK',
            'inputStrength': 'LOW',
            'outputStrength': 'NONE'
        }
    ]
},
wordPolicyConfig={
    'wordsConfig': [
        {'text': 'stock and investment advice'}
    ],
    'managedWordListsConfig': [
        {'type': 'PROFANITY'}
    ]
},
contextualGroundingPolicyConfig={
    'filtersConfig': [
        {
            'type': 'GROUNDING',
            'threshold': 0.65
        },
        {
            'type': 'RELEVANCE',
            'threshold': 0.75
        }
    ]
}
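These policy dictionaries are passed to the Amazon Bedrock create_guardrail API. The following sketch shows one way to do that with boto3, assuming the three dictionaries above are bound to variables of the same names; the guardrail name and blocked messages are placeholders.

import boto3

bedrock = boto3.client("bedrock", region_name="us-west-2")

# Assumes contentPolicyConfig, wordPolicyConfig and contextualGroundingPolicyConfig
# hold the dictionaries shown above; name and messages are placeholders
guardrail = bedrock.create_guardrail(
    name="customer-support-guardrail",
    description="Content, word and grounding filters for support responses",
    contentPolicyConfig=contentPolicyConfig,
    wordPolicyConfig=wordPolicyConfig,
    contextualGroundingPolicyConfig=contextualGroundingPolicyConfig,
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that response.",
)
guardrail_id = guardrail["guardrailId"]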

Database class

The Database class is defined in the cs_db.py file. This class is designed to facilitate interactions with a SQLite database. It’s responsible for creating a local SQLite database and importing synthetic data related to customers, orders, refunds, and transactions. By doing so, it makes sure that the necessary data is readily available for various operations. Furthermore, the class includes convenient wrapper functions that simplify the process of querying the database.
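For illustration, a wrapper of this kind might look like the following sketch using Python's built-in sqlite3 module. The table and column names here are assumptions; the actual schema and helper functions ship in cs_db.py in the repository.

# Sketch of a SQLite helper in the spirit of cs_db.py (hypothetical schema)
import sqlite3

class Database:
    def __init__(self, path: str = "customer_support.db"):
        self.conn = sqlite3.connect(path)
        self.conn.row_factory = sqlite3.Row

    def find_order(self, order_no: str):
        # Parameterized query to avoid SQL injection from extracted identifiers
        cur = self.conn.execute("SELECT * FROM orders WHERE order_no = ?", (order_no,))
        row = cur.fetchone()
        return dict(row) if row else None

    def find_transaction(self, transaction_id: str):
        cur = self.conn.execute(
            "SELECT * FROM transactions WHERE transaction_id = ?", (transaction_id,)
        )
        row = cur.fetchone()
        return dict(row) if row else None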

JiraSM class

The JiraSM class is implemented in the cs_jira_sm.py file. It serves as an interface for interacting with Jira Service Management. It establishes a connection to Jira by using the API token, user name, and instance URL, all of which are configured in the .env file. This setup provides secure and flexible access to the Jira instance. The class is designed to handle various ticket operations, including reading tickets and assigning them to a preconfigured bot user. Additionally, it supports downloading attachments from tickets and updating custom fields as needed.
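As a rough sketch, such an interface can be built on the jira Python package as shown below; the actual cs_jira_sm.py implementation may differ in method names and error handling.

# Hypothetical sketch of a Jira Service Management wrapper using the jira package
import os

from jira import JIRA

class JiraSM:
    def __init__(self):
        self.client = JIRA(
            server=os.environ["JIRA_INSTANCE_URL"],
            basic_auth=(os.environ["JIRA_USERNAME"], os.environ["JIRA_API_TOKEN"]),
        )

    def read_ticket(self, key: str):
        issue = self.client.issue(key)
        return issue.fields.summary, issue.fields.description

    def assign_to_bot(self, key: str, bot_account_id: str):
        self.client.assign_issue(key, bot_account_id)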

CustomerSupport class

The CustomerSupport class is implemented in the cs_cust_support_flow.py file. This class encapsulates the customer support processing logic by using LangGraph and Amazon Bedrock. Using LangGraph nodes and tools, this class orchestrates the customer support workflow. The workflow initially determines the category of the ticket by analyzing its content and classifying it as related to transactions, deliveries, refunds, or other issues. It updates the support ticket with the category detected. Following this, the workflow extracts pertinent information such as transaction IDs or order numbers, which might involve analyzing both text and images, and queries the database for relevant details. The next step is response generation, which is context-aware and adheres to content safety guidelines while maintaining a professional tone. Finally, the workflow integrates with Jira, assigning categories, updating responses, and managing attachments as needed.
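For illustration, the category-determination step might look like the following hypothetical sketch, where llm is one of the LangChain chat models initialized earlier; the actual prompts and node implementation live in cs_cust_support_flow.py.

# Hypothetical sketch of ticket categorization; not the repository's exact code
CATEGORIES = ["Transactions", "Deliveries", "Refunds", "Other"]

def determine_ticket_category(llm, summary: str, description: str) -> str:
    prompt = (
        "Classify the following support ticket into exactly one category from "
        f"{CATEGORIES}. Reply with the category name only.\n\n"
        f"Summary: {summary}\nDescription: {description}"
    )
    reply = llm.invoke(prompt)
    # Chat model content may be a string or a list of content blocks
    text = reply.content if isinstance(reply.content, str) else reply.content[0]["text"]
    category = text.strip()
    return category if category in CATEGORIES else "Other"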

The LangGraph orchestration is implemented in the build_graph function, as illustrated in the following code. This function also generates a visual representation of the workflow using a Mermaid graph for better clarity and understanding. This setup supports an efficient and structured approach to handling customer support tasks.

def build_graph(self):
    """
    This function prepares LangGraph nodes, edges, conditional edges, compiles the graph and displays it 
    """

    # create StateGraph object
    graph_builder = StateGraph(JiraAppState)

    # add nodes to the graph
    graph_builder.add_node("Determine Ticket Category", self.determine_ticket_category_tool)
    graph_builder.add_node("Assign Ticket Category in JIRA", self.assign_ticket_category_in_jira_tool)
    graph_builder.add_node("Extract Transaction ID", self.extract_transaction_id_tool)
    graph_builder.add_node("Extract Order Number", self.extract_order_number_tool)
    graph_builder.add_node("Find Transaction Details", self.find_transaction_details_tool)
    
    graph_builder.add_node("Find Order Details", self.find_order_details_tool)
    graph_builder.add_node("Generate Response", self.generate_response_tool)
    graph_builder.add_node("Update Response in JIRA", self.update_response_in_jira_tool)

    graph_builder.add_node("tools", ToolNode([StructuredTool.from_function(self.assess_damaged_delivery), StructuredTool.from_function(self.find_refund_status)]))
    
    # add edges to connect nodes
    graph_builder.add_edge(START, "Determine Ticket Category")
    graph_builder.add_edge("Determine Ticket Category", "Assign Ticket Category in JIRA")
    graph_builder.add_conditional_edges("Assign Ticket Category in JIRA", self.decide_ticket_flow_condition)
    graph_builder.add_edge("Extract Order Number", "Find Order Details")
    
    graph_builder.add_edge("Extract Transaction ID", "Find Transaction Details")
    graph_builder.add_conditional_edges("Find Order Details", self.order_query_decision, ["Generate Response", "tools"])
    graph_builder.add_edge("tools", "Generate Response")
    graph_builder.add_edge("Find Transaction Details", "Generate Response")
    
    graph_builder.add_edge("Generate Response", "Update Response in JIRA")
    graph_builder.add_edge("Update Response in JIRA", END)

    # compile the graph
    checkpoint = MemorySaver()
    app = graph_builder.compile(checkpointer=checkpoint)
    self.graph_app = app
    self.util.log_data(data="Workflow compiled successfully", ticket_id='NA')

    # Visualize the graph
    display(Image(app.get_graph().draw_mermaid_png(draw_method=MermaidDrawMethod.API)))

    return app

LangGraph generates the following Mermaid diagram to visually represent the workflow.

Mermaid diagram

Utility class

The Utility class, implemented in the cs_util.py file, provides essential functions to support the customer support automation. It includes utilities for logging, file handling, usage metric tracking, and image processing, acting as a central hub for helper methods that streamline common tasks across the application and promote code reusability and maintainability.

A key feature of this class is its logging capability. It provides methods to log informational messages, errors, and significant events to the cs_logs.log file. It also tracks Amazon Bedrock LLM token usage and latency metrics for performance monitoring, and logs the execution flow of application-generated prompts and LLM-generated responses to aid troubleshooting and debugging. These log files can be integrated with standard log shipping agents for automated transfer to your preferred log monitoring system.
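
The following is an illustrative sketch of the logging helpers described above. The method names mirror the calls used in this post (log_data, log_usage), but the bodies are assumptions rather than the repository's exact implementation.

import logging

class Utility:
    def __init__(self, log_file: str = "cs_logs.log"):
        # Write informational messages, errors, and metrics to cs_logs.log
        self.logger = logging.getLogger("customer_support")
        if not self.logger.handlers:
            handler = logging.FileHandler(log_file)
            handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
            self.logger.addHandler(handler)
            self.logger.setLevel(logging.INFO)

    def log_data(self, data: str, ticket_id: str):
        # General-purpose event logging tied to a ticket
        self.logger.info("ticket=%s %s", ticket_id, data)

    def log_usage(self, usage, ticket_id: str):
        # Record the per-call token counts and latency collected in the workflow state
        for record in usage:
            self.logger.info("ticket=%s usage=%s", ticket_id, record)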

Run the agentic workflow

Now that the customer support workflow is defined, it can be executed for various ticket types. The following functions use the provided ticket key to fetch the corresponding Jira ticket and download available attachments. Additionally, they initialize the State object with details such as the ticket key, summary, description, attachment file path, and a system prompt for the LLM. This State object is used throughout the workflow execution.

def generate_response_for_ticket(ticket_id: str):
    # Initialize the LLMs (text, vision, and guardrail-protected) for this ticket
    llm, vision_llm, llm_with_guardrails = bedrock_client.init_llms(ticket_id=ticket_id)
    cust_support = CustomerSupport(llm=llm, vision_llm=vision_llm, llm_with_guardrails=llm_with_guardrails)
    app = cust_support.build_graph()

    # Fetch the Jira ticket and its attachments, then run the LangGraph workflow.
    # 'thread' is the LangGraph thread configuration (for example,
    # {"configurable": {"thread_id": ...}}) defined elsewhere in the notebook.
    state = cust_support.get_jira_ticket(key=ticket_id)
    state = app.invoke(state, thread)

    # Record token usage, latency, and the execution flow for this ticket
    util.log_usage(state['usage'], ticket_id=ticket_id)
    util.log_execution_flow(state["messages"], ticket_id=ticket_id)
    

The following code snippet invokes the workflow for the Jira ticket with key AS-6:

# initialize classes and create bedrock guardrails
bedrock_client = BedrockClient()
util = Utility()
guardrail_id = bedrock_client.create_guardrail()

# process a JIRA ticket
generate_response_for_ticket(ticket_id='AS-6')

The following screenshot shows the Jira ticket before processing. Notice that the Response and Category fields are empty, and the ticket is unassigned.

Support Ticket - Initial

The following screenshot shows the Jira ticket after processing. The Category field is updated as Refunds and the Response field is updated by the AI-generated content.

Support Ticket - updated

This logs LLM usage information as follows:

Model                               Input Tokens  Output Tokens  Latency
mistral.mistral-large-2407-v1:0              385              2      653
mistral.mistral-large-2407-v1:0              452             27      884
mistral.mistral-large-2407-v1:0             1039             36     1197
us.mistral.pixtral-large-2502-v1:0          4632            425     5952
mistral.mistral-large-2407-v1:0             1770            144     4556

Clean up

Delete any IAM roles and policies created specifically for this post. Delete the local copy of this post’s code.

If you no longer need access to an Amazon Bedrock FM, you can remove access from it. For instructions, see Add or remove access to Amazon Bedrock foundation models.

Delete the temporary files and guardrails used in this post with the following code:

shutil.rmtree(util.get_temp_path())
bedrock_client.delete_guardrail()

Conclusion

In this post, we developed an AI-driven customer support solution using Amazon Bedrock, LangGraph, and Mistral models. This advanced agent-based workflow efficiently handles diverse customer queries by integrating multiple data sources and extracting relevant information from tickets or screenshots. It also evaluates damage claims to mitigate fraudulent returns. The solution is designed for flexibility, so you can add new conditions and data sources as business needs evolve. With this multi-agent approach, you can build robust, scalable, and intelligent systems that redefine the capabilities of generative AI in customer support.

Want to explore further? Check out the following GitHub repo. There, you can observe the code in action and experiment with the solution yourself. The repository includes step-by-step instructions for setting up and running the multi-agent system, along with code for interacting with data sources and agents, routing data, and visualizing workflows.


About the authors

Deepesh Dhapola is a Senior Solutions Architect at AWS India, specializing in helping financial services and fintech clients optimize and scale their applications on the AWS Cloud. With a strong focus on trending AI technologies, including generative AI, AI agents, and the Model Context Protocol (MCP), Deepesh uses his expertise in machine learning to design innovative, scalable, and secure solutions. Passionate about the transformative potential of AI, he actively explores cutting-edge advancements to drive efficiency and innovation for AWS customers. Outside of work, Deepesh enjoys spending quality time with his family and experimenting with diverse culinary creations.

Read More

Build responsible AI applications with Amazon Bedrock Guardrails

Build responsible AI applications with Amazon Bedrock Guardrails

As organizations embrace generative AI, they face critical challenges in making sure their applications align with their designed safeguards. Although foundation models (FMs) offer powerful capabilities, they can also introduce unique risks, such as generating harmful content, exposing sensitive information, being vulnerable to prompt injection attacks, and returning model hallucinations.

Amazon Bedrock Guardrails has helped address these challenges for multiple organizations, such as MAPRE, KONE, Fiserv, PagerDuty, Aha, and more. Just as traditional applications require multi-layered security, Amazon Bedrock Guardrails implements essential safeguards across model, prompt, and application levels, blocking up to 88% more undesirable and harmful multimodal content. Amazon Bedrock Guardrails helps filter over 75% of hallucinated responses in Retrieval Augmented Generation (RAG) and summarization use cases, and is the first and only safeguard that uses Automated Reasoning to prevent factual errors from hallucinations.

In this post, we show how to implement safeguards using Amazon Bedrock Guardrails in a healthcare insurance use case.

Solution overview

We consider an innovative AI assistant designed to streamline interactions of policyholders with the healthcare insurance firm. With this AI-powered solution, policyholders can check coverage details, submit claims, find in-network providers, and understand their benefits through natural, conversational interactions. The assistant provides all-day support, handling routine inquiries while allowing human agents to focus on complex cases. To help enable secure and compliant operations of our assistant, we use Amazon Bedrock Guardrails to serve as a critical safety framework. Amazon Bedrock Guardrails can help maintain high standards of blocking undesirable and harmful multimodal content. This not only protects the users, but also builds trust in the AI system, encouraging wider adoption and improving overall customer experience in healthcare insurance interactions.

This post walks you through the capabilities of Amazon Bedrock Guardrails from the AWS Management Console. Refer to the following GitHub repo for information about creating, updating, and testing Amazon Bedrock Guardrails using the SDK.

Amazon Bedrock Guardrails provides configurable safeguards to help safely build generative AI applications at scale. It evaluates user inputs and model responses based on specific policies, working with all large language models (LLMs) on Amazon Bedrock, fine-tuned models, and external FMs using the ApplyGuardrail API. The solution integrates seamlessly with Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases, so organizations can apply multiple guardrails across applications with tailored controls.

Guardrails can be implemented in two ways: direct integration with Invoke APIs (InvokeModel and InvokeModelWithResponseStream) and Converse APIs (Converse and ConverseStream) for models hosted on Amazon Bedrock, applying safeguards during inference, or through the flexible ApplyGuardrail API, which enables independent content evaluation without model invocation. This second method is ideal for assessing inputs or outputs at various application stages and works with custom or third-party models that are not hosted on Amazon Bedrock. Both approaches empower developers to implement use case-specific safeguards aligned with responsible AI policies, helping to block undesirable and harmful multimodal content from generative AI applications.
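
As a minimal sketch of the first integration path, the following snippet attaches a preconfigured guardrail to a Converse API call for a model hosted on Amazon Bedrock. The guardrail ID, version, and model ID are placeholders for your own values.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

response = bedrock_runtime.converse(
    modelId="us.amazon.nova-lite-v1:0",   # placeholder; use a model available in your Region
    messages=[{"role": "user",
               "content": [{"text": "What does my plan cover for physiotherapy?"}]}],
    guardrailConfig={
        "guardrailIdentifier": "<your-guardrail-id>",
        "guardrailVersion": "1",
        "trace": "enabled",   # include the trace to inspect which policy intervened
    },
)
print(response["output"]["message"]["content"][0]["text"])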

The following diagram depicts the six safeguarding policies offered by Amazon Bedrock Guardrails.

Diagram showing Amazon Bedrock Guardrails system flow from user input to final response with content filtering steps

Prerequisites

Before we begin, make sure you have access to the console with appropriate permissions for Amazon Bedrock. If you haven’t set up Amazon Bedrock yet, refer to Getting started in the Amazon Bedrock console.

Create a guardrail

To create a guardrail for our healthcare insurance assistant, complete the following steps:

  1. On the Amazon Bedrock console, choose Guardrails in the navigation pane.
  2. Choose Create guardrail.
  3. In the Provide guardrail details section, enter a name (for this post, we use MyHealthCareGuardrail), an optional description, and a message to display if your guardrail blocks the user prompt, then choose Next.

Amazon Bedrock Guardrails configuration interface for MyHealthCareGuardrail with multi-step setup process and customizable options

Configure multimodal content filters

Security is paramount when building AI applications. With image content filters in Amazon Bedrock Guardrails, you can detect and filter both text and image content across six protection categories: Hate, Insults, Sexual, Violence, Misconduct, and Prompt Attacks.

  1. In the Configure content filters section, for maximum protection, especially in sensitive sectors like healthcare in our example use case, set your confidence thresholds to High across all categories for both text and image content.
  2. Enable prompt attack protection to prevent system instruction tampering, and use input tagging to maintain accurate classification of system prompts, then choose Next.

AWS guardrail configuration interface for content filtering showing harmful content categories, threshold controls, and prompt attack prevention settings

Denied topics

In healthcare applications, we need clear boundaries around medical advice. Let’s configure Amazon Bedrock Guardrails to prevent users from attempting disease diagnosis, which should be handled by qualified healthcare professionals.

  1. In the Add denied topics section, create a new topic called Disease Diagnosis, add example phrases that represent diagnostic queries, and choose Confirm.

This setting helps make sure our application stays within appropriate boundaries for insurance-related queries while avoiding medical diagnosis discussions. For example, when users ask questions like “Do I have diabetes?” or “What’s causing my headache?”, the guardrail detects these as diagnosis-related queries and blocks them with an appropriate response.

Amazon Bedrock Guardrails interface showing Disease Diagnosis denied topic setup with sample phrases

  1. After you set up your denied topics, choose Next to proceed with word filters.

Amazon Bedrock Guardrails configuration interface with Disease Diagnosis as denied topic

Word filters

Configuring word filters in Amazon Bedrock Guardrails helps keep our healthcare insurance application focused and professional. These filters help maintain conversation boundaries and make sure responses stay relevant to health insurance queries.

Let’s set up word filters for two key purposes:

  • Block inappropriate language to maintain professional discourse
  • Filter irrelevant topics that fall outside the healthcare insurance scope

To set them up, do the following:

  1. In the Add word filters section, add custom words or phrases to filter (in our example, we include off-topic terms like “stocks,” “investment strategies,” and “financial performance”), then choose Next.

Amazon Bedrock guardrail creation interface showing word filter configuration steps and options

Sensitive information filters

With sensitive information filters, you can configure filters to block email addresses, phone numbers, and other personally identifiable information (PII), as well as set up custom regex patterns for industry-specific data requirements. For example, healthcare providers can use these filters to help maintain HIPAA compliance by automatically blocking the PII types they specify. This way, they can use AI capabilities while helping to maintain strict patient privacy standards.

  1. For our example, configure filters for blocking the email address and phone number of healthcare insurance users, then choose Next.

Amazon Bedrock interface for configuring sensitive information filters with PII and regex options

Contextual grounding checks

We use Amazon Bedrock Guardrails contextual grounding and relevance checks in our application to help validate model responses, detect hallucinations, and support alignment with reference sources.

  1. Set up the thresholds for contextual grounding and relevance checks (we set them to 0.7), then choose Next.

Amazon Bedrock guardrail configuration for contextual grounding and relevance checks

Automated Reasoning checks

Automated Reasoning checks help detect hallucinations and provide a verifiable proof that our application’s model (LLM) response is accurate.

The first step to incorporate Automated Reasoning checks for our application is to create an Automated Reasoning policy that is composed of a set of variables, defined with a name, type, and description, and the logical rules that operate on the variables. These rules are expressed in formal logic, but they’re translated to natural language to make it straightforward for a user without formal logic expertise to refine a model. Automated Reasoning checks use the variable descriptions to extract their values when validating a Q&A.

  1. To create an Automated Reasoning policy, choose the new Automated Reasoning menu option under Safeguards.
  2. Create a new policy and give it a name, then upload an existing document that defines the right solution space, such as an HR guideline or an operational manual. For this demo, we use an example healthcare insurance policy document that includes the insurance coverage policies applicable to insurance holders.

Automated Reasoning checks is in preview in Amazon Bedrock Guardrails in the US West (Oregon) AWS Region. To request to be considered for access to the preview today, contact your AWS account team.

  1. Define the policy’s intent and processing parameters and choose Create policy.

Amazon Bedrock interface showing HealthCareCoveragePolicy creation page with policy details, generation settings, and file upload

The system now initiates an automated process to create your Automated Reasoning policy. This process involves analyzing your document, identifying key concepts, breaking down the document into individual units, translating these natural language units into formal logic, validating the translations, and finally combining them into a comprehensive logical model. You can review the generated structure, including the rules and variables, and edit these for accuracy through the UI.

Amazon Bedrock policy editor displaying comprehensive healthcare coverage rules and variables with types, descriptions, and configuration options

  1. To attach the Automated Reasoning policy to your guardrail, turn on Enable Automated Reasoning policy, choose the policy and policy version you want to use, then choose Next.

Amazon Bedrock guardrail creation wizard on step 7, showing HealthCareCoveragePolicy Automated Reasoning configuration options

  1. Review the configurations set in the previous steps and choose Create guardrail.

Amazon Bedrock Guardrail 8-step configuration summary showing MyHealthCareGuardrail setup with safety measures and blocked response messages

Amazon Bedrock Guardrail content filter configuration showing harmful categories and denied topics

Amazon Bedrock Guardrail Steps 4-5 showing enabled profanity filter, word lists, and PII blocking settings

Amazon Bedrock Guardrail setup steps 6-7 with enabled grounding checks and HealthCareCoveragePolicy settings

Test your guardrail

We can now test our healthcare insurance call center application with different inputs and see how the configured guardrail intervenes for harmful and undesirable multimodal content.

  1. On the Amazon Bedrock console, on the guardrail details page, choose Select model in the Test panel.

Amazon Bedrock healthcare guardrail dashboard displaying overview, status, and test interface

  1. Choose your model, then choose Apply.

For our example, we use the Amazon Nova Lite FM, which is a low-cost multimodal model that is lightning fast for processing image, video, and text input. For your use case, you can use another model of your choice.

AWS Guardrail configuration interface showing model categories, providers, and inference options with Nova Lite selected

  1. Enter a query prompt with a denied topic.

For example, if we ask “I have cold and sore throat, do you think I have Covid, and if so please provide me information on what is the coverage,” the system recognizes this as a request for a disease diagnosis. Because Disease Diagnosis is configured as a denied topic in the guardrail settings, the system blocks the response.

Amazon Bedrock interface with Nova Lite model blocking COVID-19 related question

  1. Choose View trace to see the details of the intervention.

Amazon Bedrock Guardrails interface with Nova Lite model, blocked response for COVID-19 query

You can test with other queries. For example, if we ask “What is the financial performance of your insurance company in 2024?”, the word filter guardrail that we configured earlier intervenes. You can choose View trace to see that the word filter was invoked.

Amazon Bedrock interface showing blocked response due to guardrail word filter detection

Next, we use a prompt to validate if PII data in input can be blocked using the guardrail. We ask “Can you send my lab test report to abc@gmail.com?” Because the guardrail was set up to block email addresses, the trace shows an intervention due to PII detection in the input prompt.

Amazon Bedrock healthcare guardrail demonstration showing blocked response due to sensitive information filter detecting email

If we enter the prompt “I am frustrated on someone, and feel like hurting the person,” the text content filter is invoked for Violence, because we set the Violence category to a high threshold for detecting harmful content when creating the guardrail.

Amazon Bedrock guardrail test interface showing blocked response due to detected violence in prompt

If we provide an image file in the prompt that contains content of the category Violence, the image content filter gets invoked for Violence.

Amazon Bedrock guardrail test interface showing blocked response due to detected violence

Finally, we test the Automated Reasoning policy by using the Test playground on the Amazon Bedrock console. You can input a sample user question and an incorrect answer to check if your Automated Reasoning policy works correctly. In our example, according to the insurance policy provided, new insurance claims take a minimum of 7 days to process. Here, we input the question “Can you process my new insurance claim in less than 3 days?” and the incorrect answer “Yes, I can process it in 3 days.”

Amazon Bedrock Automated Reasoning interface showing HealthCareCoveragePolicy test playground and guardrail configuration

The Automated Reasoning checks marked the answer as Invalid and provided details about why, including which specific rule was broken, the relevant variables it found, and recommendations for fixing the issue.

Invalid validation result for electronic claims processing rule showing 7-10 day requirement with extracted CLAIM variable logic

Independent API

In addition to using Amazon Bedrock Guardrails as shown in the preceding section for Amazon Bedrock hosted models, you can now use Amazon Bedrock Guardrails to apply safeguards on input prompts and model responses for FMs available in other services (such as Amazon SageMaker), on infrastructure such as Amazon Elastic Compute Cloud (Amazon EC2), on on-premises deployments, and other third-party FMs beyond Amazon Bedrock. The ApplyGuardrail API assesses text using your preconfigured guardrails in Amazon Bedrock, without invoking the FMs.
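
The following is a minimal sketch of calling the ApplyGuardrail API directly, without invoking any model. The guardrail ID and version are placeholders for your own values.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

result = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="<your-guardrail-id>",
    guardrailVersion="1",
    source="INPUT",   # use "OUTPUT" to assess a model response instead
    content=[{"text": {"text": "Can you send my lab test report to abc@gmail.com?"}}],
)
print(result["action"])   # "GUARDRAIL_INTERVENED" or "NONE"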

While testing Amazon Bedrock Guardrails, select Use ApplyGuardrail API to validate user inputs with MyHealthCareGuardrail. This test doesn’t require you to choose an Amazon Bedrock hosted model; you can test the configured guardrail as an independent API.

Amazon Bedrock Guardrail API test interface with health-related prompt and safety intervention

Conclusion

In this post, we demonstrated how Amazon Bedrock Guardrails helps block harmful and undesirable multimodal content. Using a healthcare insurance call center scenario, we walked through the process of configuring and testing various guardrails. We also highlighted the flexibility of our ApplyGuardrail API, which implements guardrail checks on any input prompt, regardless of the FM in use. You can seamlessly integrate safeguards across models deployed on Amazon Bedrock or external platforms.

Ready to take your AI applications to the next level of safety and compliance? Check out Amazon Bedrock Guardrails announces IAM Policy-based enforcement to deliver safe AI interactions, which enables security and compliance teams to establish mandatory guardrails for model inference calls, helping to consistently enforce your guardrails across AI interactions. To dive deeper into Amazon Bedrock Guardrails, refer to Use guardrails for your use case, which includes advanced use cases with Amazon Bedrock Knowledge Bases and Amazon Bedrock Agents.

This guidance is for informational purposes only. You should still perform your own independent assessment and take measures to ensure that you comply with your own specific quality control practices and standards, and the local rules, laws, regulations, licenses and terms of use that apply to you, your content, and the third-party model referenced in this guidance. AWS has no control or authority over the third-party model referenced in this guidance and does not make any representations or warranties that the third-party model is secure, virus-free, operational, or compatible with your production environment and standards. AWS does not make any representations, warranties, or guarantees that any information in this guidance will result in a particular outcome or result.


About the authors

Divya Muralidharan is a Solutions Architect at AWS, supporting a strategic customer. Divya is an aspiring member of the AI/ML technical field community at AWS. She is passionate about using technology to accelerate growth, provide value to customers, and achieve business outcomes. Outside of work, she spends time cooking, singing, and growing plants.

Rachna Chadha is a Principal Technologist at AWS, where she helps customers leverage generative AI solutions to drive business value. With decades of experience in helping organizations adopt and implement emerging technologies, particularly within the healthcare domain, Rachna is passionate about the ethical and responsible use of artificial intelligence. She believes AI has the power to create positive societal change and foster both economic and social progress. Outside of work, Rachna enjoys spending time with her family, hiking, and listening to music.

Read More

Effective cost optimization strategies for Amazon Bedrock

Effective cost optimization strategies for Amazon Bedrock

Customers are increasingly using generative AI to enhance efficiency, personalize experiences, and drive innovation across industries. For instance, generative AI can be used to perform text summarization, support personalized marketing strategies, and power business-critical chat-based assistants. However, as generative AI adoption grows, associated costs can escalate across inference, deployment, and model customization. Effective cost optimization helps make sure that generative AI initiatives remain financially sustainable and deliver a positive return on investment, and proactive cost management lets businesses capture generative AI’s transformative potential while maintaining their financial health.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, DeepSeek, Luma, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources.

With the increasing adoption of Amazon Bedrock, optimizing costs is a must to help keep the expenses associated with deploying and running generative AI applications manageable and aligned with your organization’s budget. In this post, you’ll learn about strategic cost optimization techniques while using Amazon Bedrock.

Understanding Amazon Bedrock pricing

Amazon Bedrock offers a comprehensive pricing model based on actual usage of FMs and related services. The core pricing components include model inference (available in On-Demand, Batch, and Provisioned Throughput options), model customization (charging for training, storage, and inference), and Custom Model Import (free import but charges for inference and storage). Through Amazon Bedrock Marketplace, you can access over 100 models with varying pricing structures for proprietary and public models. You can check out Amazon Bedrock pricing for a pricing overview and more details on pricing models.

Cost monitoring in Amazon Bedrock

You can monitor the cost of your Amazon Bedrock usage with standard AWS cost management and monitoring tools, such as AWS Cost Explorer for cost analysis and Amazon CloudWatch metrics for tracking model invocations and token counts.

Cost optimization strategies for Amazon Bedrock

When building generative AI applications with Amazon Bedrock, implementing thoughtful cost optimization strategies can significantly reduce your expenses while maintaining application performance. In this section, you’ll find key approaches to consider in the following order:

  1. Select the appropriate model
  2. Determine if it needs customization
    1. If yes, explore options in the correct order
    2. If no, proceed to the next step
  3. Perform prompt engineering and management
  4. Design efficient agents
  5. Select the correct consumption option

This flow is shown in the following flow diagram.

Choose an appropriate model for your use case

Amazon Bedrock provides access to a diverse portfolio of FMs through a single API. The service continually expands its offerings with new models and providers, each with different pricing structures and capabilities.

For example, consider the on-demand pricing variation among Amazon Nova models in the US East (Ohio) AWS Region. This pricing is current as of May 21, 2025; refer to the Amazon Bedrock pricing page for the latest data.

As shown in the following table, the price varies significantly between the Amazon Nova Micro, Amazon Nova Lite, and Amazon Nova Pro models. For example, Amazon Nova Micro is approximately 1.71 times cheaper than Amazon Nova Lite based on the price per 1,000 input tokens as of this writing. If you don’t need multimodal capability and the accuracy of Amazon Nova Micro meets your use case, you need not opt for Amazon Nova Lite. This demonstrates why selecting the right model for your use case is critical. The largest or most advanced model isn’t always necessary for every application.

Amazon Nova models Price per 1,000 input tokens Price per 1,000 output tokens
Amazon Nova Micro $0.000035 $0.00014
Amazon Nova Lite $0.00006 $0.00024
Amazon Nova Pro $0.0008 $0.0032

One of the key advantages of Amazon Bedrock is its unified API, which abstracts the complexity of working with different models. You can switch between models by changing the model ID in your request with minimal code modifications. With this flexibility, you can select the most cost and performance optimized model that meets your requirements and upgrade only when necessary.
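
As a simple illustration, the following sketch switches between two Amazon Nova models by changing only the model ID passed to the Converse API. The model IDs are examples; use the model or inference profile IDs available in your Region.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def ask(model_id: str, prompt: str) -> str:
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

# Start with the smaller, cheaper model and move up only if quality requires it
print(ask("us.amazon.nova-micro-v1:0", "Summarize our return policy in two sentences."))
print(ask("us.amazon.nova-pro-v1:0", "Summarize our return policy in two sentences."))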

Best practice: Use Amazon Bedrock native features to evaluate the performance of the foundation model for your use case. Begin with an automatic model evaluation job to narrow down the scope. Follow it up by using LLM as a judge or human-based evaluation as required for your use case.

Perform model customization in the right order

When customizing FMs in Amazon Bedrock to contextualize responses, choosing the strategies in the correct order can significantly reduce your expenses while maximizing performance. You have four primary strategies available, each with different cost implications:

  1. Prompt Engineering – Start by crafting high-quality prompts that effectively condition the model to generate desired responses. This approach requires minimal resources and no additional infrastructure costs beyond your standard inference calls.
  2. RAG – Amazon Bedrock Knowledge Bases is a fully managed feature with built-in session context management and source attribution that helps you implement the entire RAG workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources or manage data flows.
  3. Fine-tuning – This approach involves providing labeled training data to improve model performance on specific tasks. Although it’s effective, fine-tuning requires additional compute resources and creates custom model versions with associated hosting costs.
  4. Continued pre-training – The most resource-intensive option involves providing unlabeled data to further train an FM on domain-specific content. This approach incurs the highest costs and longest implementation time.

The following graph shows the escalation of the complexity, quality, cost, and time of these four approaches.

Common approaches for customization

Best practice: Implement these strategies progressively. Begin with prompt engineering as your foundation—it’s cost-effective and can often deliver impressive results with minimal investment. Refer to the Optimize for clear and concise prompts section to learn about different strategies that you can follow to write good prompts. Next, integrate RAG when you need to incorporate proprietary information into responses. These two approaches together should address most use cases while maintaining efficient cost structures. Explore fine-tuning and continued pre-training only when you have specific requirements that can’t be addressed through the first two methods and your use case justifies the additional expense.

By following this implementation hierarchy, shown in the following figure, you can optimize both your Amazon Bedrock performance and your budget allocation. Here is the high-level mental model for choosing different options:

Mental model for choosing Amazon Bedrock options for cost optimization

Use Amazon Bedrock native model distillation feature

Amazon Bedrock Model Distillation is a powerful feature that you can use to access smaller, more cost-effective models without sacrificing performance and accuracy for your specific use cases.

  • Enhance accuracy of smaller (student) cost-effective models – With Amazon Bedrock Model Distillation, you can select a teacher model whose accuracy you want to achieve for your use case and then select a student model that you want to fine-tune. Model distillation automates the process of generating responses from the teacher and using those responses to fine-tune the student model.
  • Maximize distilled model performance with proprietary data synthesis – Fine-tuning a smaller, cost-efficient model to achieve accuracy similar to a larger model for your specific use case is an iterative process. To remove some of the burden of iteration needed to achieve better results, Amazon Bedrock Model Distillation might apply different data synthesis methods that are best suited for your use case. For example, Amazon Bedrock might expand the training dataset by generating similar prompts, or it might generate high-quality synthetic responses using customer-provided prompt-response pairs as golden examples.
  • Reduce cost by bringing your production data – With traditional fine-tuning, you’re required to create prompts and responses. With Amazon Bedrock Model Distillation, you only need to provide prompts, which are used to generate synthetic responses and fine-tune student models.

Best practice: Consider model distillation when you have a specific, well-defined use case where a larger model performs well but costs more than desired. This approach is particularly valuable for high-volume inference scenarios where the ongoing cost savings will quickly offset the initial investment in distillation.

Use Amazon Bedrock intelligent prompt routing

With Amazon Bedrock Intelligent Prompt Routing, you can use a combination of FMs from the same model family to help optimize for quality and cost when invoking a model. For example, you can route requests within Anthropic’s Claude model family, choosing between Claude 3.5 Sonnet and Claude 3 Haiku depending on the complexity of the prompt. This is particularly useful for applications like customer service assistants, where uncomplicated queries can be handled by smaller, faster, and more cost-effective models, and complex queries are routed to more capable models. Intelligent prompt routing can reduce costs by up to 30% without compromising accuracy.

Best practice: Implement intelligent prompt routing for applications that handle a wide range of query complexities.

Optimize for clear and concise prompts

Optimizing prompts for clarity and conciseness in Amazon Bedrock focuses on structured, efficient communication with the model to minimize token usage and maximize response quality. Through techniques such as clear instructions, specific output formats, and precise role definitions, you can achieve better results while reducing costs associated with token consumption.

  • Structured instructions – Break down complex prompts into clear, numbered steps or bullet points. This helps the model follow a logical sequence and improves the consistency of responses while reducing token usage.
  • Output specifications – Explicitly define the desired format and constraints for the response. For example, specify word limits, format requirements, or use indicators like Please provide a brief summary in 2-3 sentences to control output length.
  • Avoid redundancy – Remove unnecessary context and repetitive instructions. Keep prompts focused on essential information and requirements because superfluous content can increase costs and potentially confuse the model.
  • Use separators – Employ clear delimiters (such as triple quotes, dashes, or XML-style tags) to separate different parts of the prompt, helping the model distinguish between context, instructions, and examples.
  • Role and context precision – Start with a clear role definition and specific context that’s relevant to the task. For example, You are a technical documentation specialist focused on explaining complex concepts in simple terms provides better guidance than a generic role description.

Best practice: Amazon Bedrock offers a fully managed feature to optimize prompts for a select model. This helps to reduce costs by improving prompt efficiency and effectiveness, leading to better results with fewer tokens and model invocations. The prompt optimization feature automatically refines your prompts to follow best practices for each specific model, eliminating the need for extensive manual prompt engineering that could take months of experimentation. Use this built-in prompt optimization feature in Amazon Bedrock to get started and optimize further to get better results as needed. Experiment with prompts to make them clear and concise to reduce the number of tokens without compromising the quality of the responses.

Optimize cost and performance using Amazon Bedrock prompt caching

You can use prompt caching with supported models on Amazon Bedrock to reduce inference response latency and input token costs. By adding portions of your context to a cache, the model can use the cache to skip recomputation of inputs, enabling Amazon Bedrock to share in the compute savings and lower your response latencies.

  • Significant cost reduction – Prompt caching can reduce costs by up to 90% compared to standard model inference costs, because cached tokens are charged at a reduced rate compared to non-cached input tokens.
  • Ideal use cases – Prompt caching is particularly valuable for applications with long and repeated contexts, such as document Q&A systems where users ask multiple questions about the same document or coding assistants that maintain context about code files.
  • Improved latency – Implementing prompt caching can decrease response latency by up to 85% for supported models by eliminating the need to reprocess previously seen content, making applications more responsive.
  • Cache retention period – Cached content remains available for up to 5 minutes after each access, with the timer resetting upon each successful cache hit, making it ideal for multiturn conversations about the same context.
  • Implementation approach – To implement prompt caching, developers identify frequently reused prompt portions, tag these sections using the cachePoint block in API calls, and monitor cache usage metrics (cacheReadInputTokenCount and cacheWriteInputTokenCount) in response metadata to optimize performance.

Best practice: Prompt caching is valuable in scenarios where applications repeatedly process the same context, such as document Q&A systems where multiple users query the same content. The technique delivers maximum benefit when dealing with stable contexts that don’t change frequently, multiturn conversations about identical information, applications that require fast response times, high-volume services with repetitive requests, or systems where cost optimization is critical without sacrificing model performance.
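
The following sketch shows the cachePoint approach described above with the Converse API: a large, stable document is placed before a cache checkpoint so that repeated questions about it reuse the cached prefix. The model ID and file name are placeholders, and prompt caching is only available on supported models.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

with open("policy_document.txt") as f:   # stable context reused across many questions
    long_document = f.read()

response = bedrock_runtime.converse(
    modelId="<a-prompt-caching-supported-model-id>",
    messages=[{
        "role": "user",
        "content": [
            {"text": long_document},
            {"cachePoint": {"type": "default"}},   # content before this block is cached
            {"text": "Does this policy cover dental implants?"},
        ],
    }],
)
print(response["usage"])   # check the cache read/write token counts to confirm cache hits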

Cache prompts within the client application

Client-side prompt caching helps reduce costs by storing frequently used prompts and responses locally within your application. This approach minimizes API calls to Amazon Bedrock models, resulting in significant cost savings and improved application performance.

  • Local storage implementation – Implement a caching mechanism within your application to store common prompts and their corresponding responses, using techniques such as in-memory caching (Redis, Memcached) or application-level caching systems.
  • Cache hit optimization – Before making an API call to Amazon Bedrock, check if the prompt or similar variations exist in the local cache. This reduces the number of billable API calls to the FMs, directly impacting costs. You can check Caching Best Practices to learn more.
  • Expiration strategy – Implement a time-based cache expiration strategy such as Time To Live (TTL) to help make sure that cached responses remain relevant while maintaining cost benefits. This aligns with the 5-minute cache window used by Amazon Bedrock for optimal cost savings.
  • Hybrid caching approach – Combine client-side caching with the built-in prompt caching of Amazon Bedrock for maximum cost optimization. Use the local cache for exact matches and the Amazon Bedrock cache for partial context reuse.
  • Cache monitoring – Implement cache hit/miss ratio monitoring to continually optimize your caching strategy and identify opportunities for further cost reduction through cached prompt reuse.

Best practice: In performance-critical systems and high-traffic websites, client-side caching enhances response times and user experience while minimizing dependency on ongoing Amazon Bedrock API interactions.
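
The following is a minimal sketch of an application-level cache in front of Amazon Bedrock, assuming an in-memory dictionary and a 5-minute TTL; a production system might use Redis or Memcached instead. Exact-match prompts are served locally, and everything else falls through to the Converse API.

import hashlib
import time

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
_cache = {}          # key -> (timestamp, response text)
TTL_SECONDS = 300

def cached_completion(model_id: str, prompt: str) -> str:
    key = hashlib.sha256(f"{model_id}:{prompt}".encode()).hexdigest()
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]                       # cache hit: no billable API call
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    text = response["output"]["message"]["content"][0]["text"]
    _cache[key] = (time.time(), text)       # cache miss: store the result for reuse
    return text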

Build small and focused agents that interact with each other rather than a single large monolithic agent

Creating small, specialized agents that interact with each other in Amazon Bedrock can lead to significant cost savings while improving solution quality. This approach uses the multi-agent collaboration capability of Amazon Bedrock to build more efficient and cost-effective generative AI applications.

The multi-agent architecture advantage: You can use Amazon Bedrock multi-agent collaboration to orchestrate multiple specialized AI agents that work together to tackle complex business problems. By creating smaller, purpose-built agents instead of a single large one, you can:

  • Optimize model selection based on specific tasks – Use more economical FMs for simpler tasks and reserve premium models for complex reasoning tasks
  • Enable parallel processing – Multiple specialized agents can work simultaneously on different aspects of a problem, reducing overall response time
  • Improve solution quality – Each agent focuses on its specialty, leading to more accurate and relevant responses

Best practice: Select appropriate models for each specialized agent, matching capabilities to task requirements while optimizing for cost. Based on the complexity of the task, you can choose either a low-cost model or a high-cost model to optimize the cost. Use AWS Lambda functions that retrieve only the essential data to reduce unnecessary cost in Lambda execution. Orchestrate your system with a lightweight supervisor agent that efficiently handles coordination without consuming premium resources.

Choose the desired throughput depending on the usage

Amazon Bedrock offers two distinct throughput options, each designed for different usage patterns and requirements:

  • On-Demand mode – Provides a pay-as-you-go approach with no upfront commitments, making it ideal for early-stage proofs of concept (POCs) in development and test environments and for applications with unpredictable, seasonal, or sporadic traffic that varies significantly.

With On-Demand pricing, you’re charged based on actual usage:

    • Text generation models – Pay per input token processed and output token generated
    • Embedding models – Pay per input token processed
    • Image generation models – Pay per image generated
  • Provisioned Throughput mode – By using Provisioned Throughput, you can purchase dedicated model units for specific FMs to get a higher level of throughput for a model at a fixed cost. This makes Provisioned Throughput suitable for production workloads requiring predictable performance without throttling. If you customized a model, you must purchase Provisioned Throughput to be able to use it.

Each model unit delivers a defined throughput capacity measured by the maximum number of tokens processed per minute. Provisioned Throughput is billed hourly with commitment options of 1-month or 6-month terms, with longer commitments offering greater discounts.

Best practice: If you’re working on a POC or on a use case that has a sporadic workload using one of the base FMs from Amazon Bedrock, use On-Demand mode to take the benefit of pay-as-you-go pricing. However, if you’re working on a steady state workload where throttling must be avoided, or if you’re using custom models, you should opt for provisioned throughput that matches your workload. Calculate your token processing requirements carefully to avoid over-provisioning.

Use batch inference

With batch mode, you can get simultaneous large-scale predictions by providing a set of prompts as a single input file and receiving responses as a single output file. The responses are processed and stored in your Amazon Simple Storage Service (Amazon S3) bucket so you can access them later. Amazon Bedrock offers select FMs from leading AI providers like Anthropic, Meta, Mistral AI, and Amazon for batch inference at a 50% lower price compared to On-Demand inference pricing. Refer to Supported AWS Regions and models for batch inference for more details. This approach is ideal for non-real-time workloads where you need to process large volumes of content efficiently.

Best practice: Identify workloads in your application that don’t require real-time responses and migrate them to batch processing. For example, instead of generating product descriptions on-demand when users view them, pre-generate descriptions for new products in a nightly batch job and store the results. This approach can dramatically reduce your FM costs while maintaining the same output quality.
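
The following sketch submits a batch inference job over a JSONL file of prompts stored in Amazon S3. The bucket names, role ARN, and model ID are placeholders, and the input file must follow the batch inference record format for the chosen model.

import boto3

bedrock = boto3.client("bedrock")

job = bedrock.create_model_invocation_job(
    jobName="nightly-product-descriptions",
    roleArn="arn:aws:iam::<account-id>:role/<batch-inference-role>",
    modelId="<a-batch-supported-model-id>",
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://<input-bucket>/prompts.jsonl"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://<output-bucket>/results/"}},
)
print(job["jobArn"])   # poll get_model_invocation_job(jobIdentifier=...) for completion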

Conclusion

As organizations increasingly adopt Amazon Bedrock for their generative AI applications, implementing effective cost optimization strategies becomes crucial for maintaining financial efficiency. The key to successful cost optimization lies in taking a systematic approach: start with basic optimizations such as proper model selection and prompt engineering, then progressively implement more advanced techniques such as caching and batch processing as your use cases mature. Regular monitoring of costs and usage patterns, combined with continuous optimization of these strategies, will help make sure that your generative AI initiatives remain both effective and economically sustainable.

Remember that cost optimization is an ongoing process that should evolve with your application’s needs and usage patterns, making it essential to regularly review and adjust your implementation of these strategies. For more information about Amazon Bedrock pricing and the cost optimization strategies discussed in this post, refer to Amazon Bedrock pricing.


About the authors

Biswanath Mukherjee is a Senior Solutions Architect at Amazon Web Services. He works with large strategic customers of AWS by providing them technical guidance to migrate and modernize their applications on AWS Cloud. With his extensive experience in cloud architecture and migration, he partners with customers to develop innovative solutions that leverage the scalability, reliability, and agility of AWS to meet their business needs. His expertise spans diverse industries and use cases, enabling customers to unlock the full potential of the AWS Cloud.

Upendra V is a Senior Solutions Architect at Amazon Web Services, specializing in Generative AI and cloud solutions. He helps enterprise customers design and deploy production-ready Generative AI workloads, implement Large Language Models (LLMs) and Agentic AI systems, and optimize cloud deployments. With expertise in cloud adoption and machine learning, he enables organizations to build and scale AI-driven applications efficiently.

Read More

How E.ON saves £10 million annually with AI diagnostics for smart meters powered by Amazon Textract

How E.ON saves £10 million annually with AI diagnostics for smart meters powered by Amazon Textract

E.ON—headquartered in Essen, Germany—is one of Europe’s largest energy companies, with over 72,000 employees serving more than 50 million customers across 15 countries. As a leading provider of energy networks and customer solutions, E.ON focuses on accelerating the energy transition across Europe. A key part of this mission involves the Smart Energy Solutions division, which manages over 5 million smart meters in the UK alone. These devices help millions of customers track their energy consumption in near real time, receive accurate bills without manual readings, reduce their carbon footprints through more efficient energy management, and access flexible tariffs aligned with their usage.

Historically, diagnosing errors on smart meters required an on-site visit, an approach that was both time-consuming and logistically challenging. To address this challenge, E.ON partnered with AWS to develop a remote diagnostic solution powered by Amazon Textract, a machine learning (ML) service that automatically extracts printed text, handwriting, and structure from scanned documents and images. Instead of dispatching engineers, the customer captures a 7-second video of their smart meter, which is automatically uploaded to AWS through the E.ON application for remote analysis. In real-world testing, the solution delivers an impressive 84% accuracy. Beyond cost savings, this ML-powered solution enhances consistency in diagnostics and can detect malfunctioning meters before issues escalate.

By transforming on-site inspections into quick-turnaround video analysis, E.ON aims to reduce site visits, accelerate repair times, make sure assets achieve their full lifecycle expectation, and cut annual costs by £10 million. This solution also helps E.ON maintain its 95% smart meter connectivity target, further demonstrating the company’s commitment to customer satisfaction and operational excellence.

In this post, we dive into how this solution works and the impact it’s making.

The challenge: Smart meter diagnostics at scale

Smart meters are designed to provide near real-time billing data and support better energy management. But when something goes wrong, such as a Wide Area Network (WAN) connectivity error, resolving it has traditionally required dispatching a field technician. With 135,000 on-site appointments annually and costs exceeding £20 million, this approach is neither scalable nor sustainable.

The process is also inconvenient for customers, who often need to take time off work or rearrange their schedules. Even then, resolution isn’t guaranteed. Engineers diagnose faults by visually interpreting a set of LED indicators on the Communications Hub, the device that sits directly on top of the smart meter. These LEDs (SW, WAN, HAN, MESH, and GAS) blink at one of four frequencies (Off, Low, Medium, or High), and accurate diagnosis requires matching these blink patterns to a technical manual. With no standardized digital output and thousands of possible combinations, the risk of human error is high, and without a confirmed fault in advance, engineers might arrive without the tools needed to resolve the issue.

The following visuals make these differences clear. The first is an animation that mimics how the four states blink in real time, with each pulse lasting 0.1 seconds.

Animation showing the four LED pulse states (Off, Low, Medium, High) and the wait time between each 0.1-second flash.

The following diagram presents a simplified 7-second timeline for each state, showing exactly when pulses occur and how they differ in count and spacing.

Timeline visualization of LED pulse patterns over 7 seconds.

E.ON wanted to change this. They set out to alleviate unnecessary visits, reduce diagnostic errors, and improve customer experience. Partnering with AWS, they developed a more automated, scalable, and cost-effective way to detect smart meter faults, without needing to send an engineer on-site.

From manual to automated diagnostics

In partnership with AWS, E.ON developed a solution where customers record and upload short, 7-second videos of their smart meter. These videos are analyzed by a diagnostic tool, which returns the error and a natural language explanation of the issue directly to the customer’s smartphone. If an engineer visit is necessary, the technician arrives equipped with the right tools, having already received an accurate diagnosis.

The following image shows a typical Communications Hub, mounted above the smart meter. The labeled indicators—SW, WAN, HAN, MESH, and GAS—highlight the LEDs used in diagnostics, illustrating how the system identifies and isolates each region for analysis.

A typical Communications Hub, with LED indicators labeled SW, WAN, MESH, HAN, and GAS.

Solution overview

The diagnostic tool follows three main steps, as outlined in the following data flow diagram:

  1. Upon receiving a 7-second video, the solution breaks it into individual frames. A Signal Intensity metric flags frames where an LED is likely active, drastically reducing the total number of frames requiring deeper analysis.
  2. Next, the tool uses Amazon Textract to find the text labels (SW, WAN, MESH, HAN, GAS). These labels, serving as landmarks, guide the system to the corresponding LED regions, where custom signal- and brightness-based heuristics determine whether each LED is on or off.
  3. Finally, the tool counts pulses for each LED over 7 seconds. This pulse count maps directly to Off, Low, Medium, or High frequencies, which in turn align with error codes from the meter’s reference manual. The error code can either be returned directly as shown in the conceptual view or translated into a natural language explanation using a dictionary lookup created from the meter’s reference manual.
A conceptual view of the remote diagnostic pipeline, centered around the use of Textract to extract insights from video input and drive error detection.

A 7-second clip is essential to reduce ambiguity around LED pulse frequency. For instance, the Low frequency might flash once or twice in a five-second window, which could be mistaken for Off. By extending to 7 seconds, each frequency (Off, Low, Medium, or High) becomes unambiguous:

  • Off: 0 pulses
  • Low: 1–2 pulses
  • Medium: 3–4 pulses
  • High: 11–12 pulses

Because there’s no overlap among these pulse counts, the system can now accurately classify each LED’s frequency.

In the following sections, we discuss the three key steps of the solution workflow in more detail.

Step 1: Identify key frames

A modern smartphone typically captures 30 frames per second, resulting in 210 frames over a 7-second video. As seen in the earlier image, many of these frames appear as though the LEDs are off, either because the LEDs are inactive or between pulses, highlighting the need for key frame detection. In practice, only a small subset of the 210 frames will contain a visible lit LED, making it unnecessarily expensive to analyze every frame.

To address this, we introduced a Signal Intensity metric. This simple heuristic examines color channels and assigns each frame a likelihood score of containing an active LED. Frames with a score below a certain threshold are discarded because they’re unlikely to contain active LEDs. Although the metric might generate a few false positives, it effectively trims down the volume of frames for further processing. Testing in field conditions has shown robust performance across various lighting scenarios and angles.
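
E.ON’s exact metric isn’t published, but the following sketch illustrates the idea: score each frame by how much bright, saturated color it contains and keep only frames above a relative threshold. It assumes OpenCV and NumPy, and the scoring formula is an illustrative stand-in.

import cv2
import numpy as np

def key_frames(video_path: str, keep_ratio: float = 1.5):
    capture = cv2.VideoCapture(video_path)
    frames, scores = [], []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Mean of (saturation * brightness) rewards bright, colorful pixels such as lit LEDs
        score = float(np.mean(hsv[:, :, 1].astype(np.float32) * hsv[:, :, 2] / 255.0))
        frames.append(frame)
        scores.append(score)
    capture.release()
    threshold = keep_ratio * float(np.median(scores))   # relative, not absolute, threshold
    return [f for f, s in zip(frames, scores) if s > threshold]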

Step 2: Inspect light status

With key frames identified, the system next determines which LEDs are active. It uses Amazon Textract to treat the meter’s panel like a document. Amazon Textract identifies all visible text in the frame, and the diagnostic system then parses this output to isolate only the relevant labels: “SW,” “WAN,” “MESH,” “HAN,” and “GAS,” filtering out unrelated text.

The following image shows a key frame processed by Amazon Textract. The bounding boxes show detected text; LED labels appear in red after text matching.

A key frame processed by Amazon Textract. The bounding boxes show detected text; LED labels appear in red after text matching.

Because each Communications Hub follows standard dimensions, the LED for each label is consistently located just above it. Using the bounding box coordinates from Amazon Textract as our landmark, the system calculates an “upward” direction for the meter and places a new bounding region above each label, pinpointing the pixels corresponding to each LED. The resulting key frame highlights exactly where to look for LED activity.
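
The following sketch shows how this step might look in code: Amazon Textract detects the words in a key frame, the output is filtered to the five LED labels, and a region directly above each label is derived from its bounding box. The fixed offset is an assumption for illustration; the production system computes the “upward” direction geometrically rather than assuming an upright frame.

import boto3
import cv2

textract = boto3.client("textract")
LABELS = {"SW", "WAN", "MESH", "HAN", "GAS"}

def led_regions(frame):
    _, encoded = cv2.imencode(".png", frame)
    result = textract.detect_document_text(Document={"Bytes": encoded.tobytes()})
    height, width = frame.shape[:2]
    regions = {}
    for block in result["Blocks"]:
        if block["BlockType"] == "WORD" and block["Text"].upper() in LABELS:
            box = block["Geometry"]["BoundingBox"]   # normalized coordinates
            x, w = int(box["Left"] * width), int(box["Width"] * width)
            y, h = int(box["Top"] * height), int(box["Height"] * height)
            # Assume the LED sits in a box of the same size directly above its label
            regions[block["Text"].upper()] = (x, max(0, y - h), w, h)
    return regions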

To illustrate this, the following image of a key frame shows how the system maps each detected label (“SW,” “WAN,” “MESH,” “HAN,” “GAS”) to its corresponding LED region. Each region is automatically defined using the Amazon Textract output and geometric rules, allowing the system to isolate just the areas that matter for diagnosis.

A key frame showing the exact LED regions for “SW,” “WAN,” “MESH,” “HAN,” and “GAS.”

With the LED regions now precisely defined, the tool evaluates whether each one is on or off. Because E.ON didn’t have a labeled dataset large enough to train a supervised ML model, we opted for a heuristic approach, combining the Signal Intensity metric from Step 1 with a brightness threshold to determine LED status. By using relative rather than absolute thresholds, the method remains robust across different lighting conditions and angles, even if an LED’s glow reflects off neighboring surfaces. The end result is a simple on/off status for each LED in every key frame, which feeds into the final error classification in Step 3.
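A minimal sketch of that check, assuming the relative regions produced in the previous step, a grayscale key frame, and a placeholder ratio threshold (E.ON’s exact values aren’t published), could look like this:

```python
import numpy as np


def led_is_on(frame_gray: np.ndarray, region: dict, ratio: float = 2.0) -> bool:
    """Decide whether the LED inside `region` is lit in a grayscale key frame.

    Uses a relative test: the brightest pixel in the region must stand out
    against the overall scene brightness, which keeps the check stable across
    different lighting conditions and camera angles.
    """
    h, w = frame_gray.shape[:2]
    x0, y0 = int(region["Left"] * w), int(region["Top"] * h)
    x1 = int((region["Left"] + region["Width"]) * w)
    y1 = int((region["Top"] + region["Height"]) * h)
    patch = frame_gray[y0:y1, x0:x1]
    if patch.size == 0:
        return False
    background = float(np.median(frame_gray)) + 1e-6
    return float(patch.max()) / background >= ratio
```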

Step 3: Aggregate results to determine the error

Now that each key frame has an on/off status for each LED, the final step is to determine how many times each light pulses during the 7-second clip. This pulse count reveals which frequency (Off, Low, Medium, or High) each LED is blinking at, allowing the solution to identify the appropriate error code from the Communications Hub’s reference manual, just like a field engineer would, but in a fully automated way.

To calculate the number of pulses, the system first groups consecutive “on” frames. Because one pulse of light typically lasts 0.1 seconds, or about 2–3 frames, a continuous block of “on” frames represents a single pulse. After grouping these blocks, the total number of pulses for each LED can be counted. Thanks to the 7-second recording window, the mapping from pulse count to frequency is unambiguous.
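Counting rising edges in the per-frame on/off sequence is enough to implement this grouping. The sketch below assumes one boolean per video frame for a single LED, with discarded frames treated as off.

```python
def count_pulses(on_off: list) -> int:
    """Count pulses for one LED from its per-frame on/off sequence.

    A run of consecutive "on" frames (a pulse lasts roughly 0.1 s, i.e. about
    2-3 frames at 30 fps) counts as a single pulse, so only rising edges matter.
    """
    pulses = 0
    previously_on = False
    for is_on in on_off:
        if is_on and not previously_on:
            pulses += 1
        previously_on = is_on
    return pulses
```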

After each LED’s frequency is determined, the system simply looks up the corresponding error in the meter’s manual. This final diagnostic result is then relayed back to the customer.
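The final lookup can be as simple as a pair of mappings, sketched below using the pulse-count bands listed earlier in this post. The error table itself is a placeholder to be populated from the Communications Hub’s reference manual; no real error codes are shown here.

```python
def classify_frequency(pulse_count: int) -> str:
    """Map a 7-second pulse count to a frequency band.

    Anything above the Medium band is treated as High in this sketch.
    """
    if pulse_count == 0:
        return "Off"
    if pulse_count <= 2:
        return "Low"
    if pulse_count <= 4:
        return "Medium"
    return "High"


# Placeholder table: keys are (LED label, frequency) pairs and values are the
# corresponding entries from the meter's reference manual.
ERROR_TABLE: dict = {}


def diagnose(pulse_counts: dict) -> list:
    """Map each LED's pulse count to a frequency and look up the error."""
    findings = []
    for led, count in pulse_counts.items():
        error = ERROR_TABLE.get((led, classify_frequency(count)))
        if error is not None:
            findings.append(error)
    return findings
```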

The following demo video shows this process in action, with a user uploading a 7-second clip of their meter. In just 5.77 seconds, the application detects a WAN error, explains how it arrived at that conclusion, and outlines the steps an engineer would take to address the issue.

Conclusion

E.ON’s story highlights how a creative application of Amazon Textract, combined with custom image analysis and pulse counting, can solve a real-world challenge at scale. By diagnosing smart meter errors through brief smartphone videos, E.ON aims to lower costs, improve customer satisfaction, and enhance overall energy service reliability.

Although the system is still being field tested, initial results are encouraging: approximately 350 cases per week (18,200 annually) can now be diagnosed remotely, with an estimated £10 million in projected annual savings. Real-world accuracy stands at 84%, without extensive tuning, while controlled environments have shown a 100% success rate. Notably, the tool has even caught errors that field engineers initially missed, pointing to opportunities for refined training and proactive fault detection.

Looking ahead, E.ON plans to expand this approach to other devices and integrate advanced computer vision techniques to further boost accuracy. If you’re interested in exploring a similar solution, consider the following next steps:

  • Explore the Amazon Textract documentation to learn how you can streamline text extraction for your own use cases
  • Consider Amazon Bedrock Data Automation as a generative AI-powered alternative for extracting insights from multimodal content in audio, documents, images, and video
  • Browse the Amazon Machine Learning Blog to discover innovative ways customers use AWS ML services to drive efficiency and reduce costs
  • Contact your AWS Account Manager to discuss your specific needs to design a proof of concept or production-ready solution

By combining domain expertise with AWS services, E.ON demonstrates how an AI-driven strategy can transform operational efficiency, even in early stages. If you’re considering a similar path, these resources can help you unlock the power of AWS AI and ML to meet your unique business goals.


About the Authors

Sam Charlton is a Product Manager at E.ON who looks for innovative ways to apply existing technology to entrenched issues that are often ignored. Starting in the contact center, he has worked across the breadth and depth of E.ON, giving him a holistic view of the business’s needs.

Tanrajbir Takher is a Data Scientist at the AWS Generative AI Innovation Center, where he works with enterprise customers to implement high-impact generative AI solutions. Prior to AWS, he led research for new products at a computer vision unicorn and founded an early generative AI startup.

Satyam Saxena is an Applied Science Manager at the AWS Generative AI Innovation Center. He leads generative AI customer engagements, driving innovative ML/AI initiatives from ideation to production, with over a decade of experience in machine learning and data science. His research interests include deep learning, computer vision, NLP, recommender systems, and generative AI.

Tom Chester is an AI Strategist at the AWS Generative AI Innovation Center, working directly with AWS customers to understand the business problems they are trying to solve with generative AI and helping them scope and prioritize use cases. Tom has over a decade of experience in data and AI strategy and data science consulting.

Amit Dhingra is a GenAI/ML Sr. Sales Specialist in the UK. He works as a trusted advisor to customers by providing guidance on how they can unlock new value streams, solve key business problems, and deliver results for their customers using AWS generative AI and ML services.
