Asure’s approach to enhancing their call center experience using generative AI and Amazon Q in QuickSight

Asure, a company of over 600 employees, is a leading provider of cloud-based workforce management solutions designed to help small and midsized businesses streamline payroll and human resources (HR) operations and ensure compliance. Their offerings include a comprehensive suite of human capital management (HCM) solutions for payroll and tax, HR compliance services, time tracking, 401(k) plans, and more.

Asure anticipated that generative AI could help contact center leaders understand their team’s support performance, identify gaps and pain points in their products, and recognize the most effective strategies for training customer support representatives, all from call transcripts. The Asure team was manually analyzing thousands of call transcripts to uncover themes and trends, a process that lacked scalability. The overarching goal of this engagement was to improve upon this manual approach. Failing to adopt a more automated approach could have potentially led to decreased customer satisfaction scores and, consequently, a loss in future revenue. Therefore, it was valuable to provide Asure a post-call analytics pipeline capable of providing beneficial insights, thereby enhancing the overall customer support experience and driving business growth.

Asure recognized the potential of generative AI to further enhance the user experience and better understand the needs of the customer and wanted to find a partner to help realize it.

“We are thrilled to partner with AWS on this groundbreaking generative AI project. The robust AWS infrastructure and advanced AI capabilities provide the perfect foundation for us to innovate and push the boundaries of what’s possible. This collaboration will enable us to deliver cutting-edge solutions that not only meet but exceed our customers’ expectations. Together, we are poised to transform the landscape of AI-driven technology and create unprecedented value for our clients.”

—Yasmine Rodriguez, CTO of Asure.

“As we embarked on our journey at Asure to integrate generative AI into our solutions, finding the right partner was crucial. Being able to partner with the Gen AI Innovation Center at AWS brings not only technical expertise with AI but the experience of developing solutions at scale. This collaboration confirms that our AI solutions are not just innovative but also resilient. Together, we believe that we can harness the power of AI to drive efficiency, enhance customer experiences, and stay ahead in a rapidly evolving market.”

—John Canada, VP of Engineering at Asure.

In this post, we explore how Asure used the Amazon Web Services (AWS) post-call analytics (PCA) pipeline to generate insights across call centers at scale with the advanced capabilities of generative AI-powered services such as Amazon Bedrock and Amazon Q in QuickSight. Asure chose this approach because it provided in-depth consumer analytics, categorized call transcripts around common themes, and empowered contact center leaders to answer queries in natural language. This ultimately allowed Asure to provide its customers with improvements in product and customer experiences.

Solution overview

At a high level, the solution consists of first converting audio into transcripts using Amazon Transcribe and generating and evaluating summary fields for each transcript using Amazon Bedrock. In addition, Q&A can be done at a single call level using Amazon Bedrock or for many calls using Amazon Q in QuickSight. In the rest of this section, we describe these components and the services used in greater detail.

We extended the existing PCA solution with Amazon Bedrock and Amazon Q in QuickSight, described in the following sections.

Customer service and call center operations are highly dynamic, with evolving customer expectations, market trends, and technological advancements reshaping the industry at a rapid pace. Staying ahead in this competitive landscape demands agile, scalable, and intelligent solutions that can adapt to changing demands.

In this context, Amazon Bedrock emerges as an exceptional choice for developing a generative AI-powered solution to analyze customer service call transcripts. This fully managed service provides access to cutting-edge foundation models (FMs) from leading AI providers, enabling the seamless integration of state-of-the-art language models tailored for text analysis tasks. Amazon Bedrock offers fine-tuning capabilities that allow you to customize these pre-trained models using proprietary call transcript data, facilitating high accuracy and relevance without the need for extensive machine learning (ML) expertise. Moreover, Amazon Bedrock offers integration with other AWS services like Amazon SageMaker, which streamlines the deployment process, and its scalable architecture makes sure the solution can adapt to increasing call volumes effortlessly.

With robust security measures, data privacy safeguards, and a cost-effective pay-as-you-go model, Amazon Bedrock offers a secure, flexible, and cost-efficient service to harness generative AI’s potential in enhancing customer service analytics, ultimately leading to improved customer experiences and operational efficiencies.

Furthermore, by integrating a knowledge base containing organizational data, policies, and domain-specific information, the generative AI models can deliver more contextual, accurate, and relevant insights from the call transcripts. This knowledge base allows the models to understand and respond based on the company’s unique terminology, products, and processes, enabling deeper analysis and more actionable intelligence from customer interactions.

In this use case, Amazon Bedrock is used for both generation of summary fields for sample call transcripts and evaluation of these summary fields against a ground truth dataset. Its value comes from its simple integration into existing pipelines and various evaluation frameworks. Amazon Bedrock also allows you to choose various models for different use cases, making it an obvious choice for the solution due to its flexibility. Using Amazon Bedrock allows for iteration of the solution using knowledge bases for simple storage and access of call transcripts as well as guardrails for building responsible AI applications.

Amazon Bedrock

Amazon Bedrock is a fully managed service that makes FMs available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and quickly integrate and deploy them into your applications using AWS tools without having to manage the infrastructure.

Amazon Q in QuickSight

Amazon Q in QuickSight is a generative AI assistant that accelerates decision-making and enhances business productivity with generative business intelligence (BI) capabilities.

The solution consisted of the following components:

  • Call metadata generation – After the file ingestion step, in which Amazon Transcribe generates a transcript for each call, Anthropic’s Claude 3 Haiku FM in Amazon Bedrock is used to generate call-related metadata. This includes a summary, the category, the root cause, and other high-level fields derived from the call transcript. This is orchestrated using AWS Step Functions.
  • Individual call Q&A – For questions about a specific call, such as “How did the customer react in call ID X?”, Anthropic’s Claude 3 Haiku is used to power a Q&A assistant in an application delivered through Amazon CloudFront. This is powered by the web app portion of the architecture diagram (provided in the next section).
  • Aggregate call Q&A – To answer questions spanning multiple calls, such as “What are the most common issues detected?”, Amazon Q in QuickSight is used to enhance the Agent Assist interface. This is shown in the storage and visualization step of the architecture, where business analysts interact with QuickSight through natural language.

To learn more about the architectural components of the PCA solution, including file ingestion, insight extraction, storage and visualization, and web application components, refer to Post call analytics for your contact center with Amazon language AI services.

Architecture

The following diagram illustrates the solution architecture. The evaluation framework, call metadata generation, and Amazon Q in QuickSight were new components added on top of the original PCA solution.

Architecture Diagram for Asure

Ragas and a human-in-the-loop UI (as described in the customer blogpost with Tealium) were used to evaluate the metadata generation and individual call Q&A portions. Ragas is an open source evaluation framework that helps evaluate FM-generated text.

The high-level takeaways from this work are the following:

  • Anthropic’s Claude 3 Haiku successfully took in a call transcript and determined its summary, root cause, whether the issue was resolved, whether the call was a callback, and the next steps for the customer and agent (the generative AI-powered fields). When using Anthropic’s Claude 3 Haiku as opposed to Anthropic’s Claude Instant, there was a reduction in latency. With chain-of-thought reasoning, there was an increase in overall quality (covering how factual, understandable, and relevant responses are on a 1–5 scale, described in more detail later in this post) as measured by subject matter experts (SMEs). With Amazon Bedrock, various models can be chosen based on different use cases, illustrating its flexibility in this application.
  • Amazon Q in QuickSight proved to be a powerful analytical tool in understanding and generating relevant insights from data through intuitive chart and table visualizations. It can perform simple calculations whenever necessary while also facilitating deep dives into issues and exploring data from multiple perspectives, demonstrating great value in insight generation.
  • The human-in-the-loop UI plus Ragas metrics proved effective for evaluating outputs of FMs used throughout the pipeline. In particular, answer correctness, answer relevance, faithfulness, and summarization metrics (alignment and coverage score) were used to evaluate the call metadata generation and individual call Q&A components using Amazon Bedrock. Its flexibility across FMs allowed many types of models, including Anthropic’s Claude 3.5 Sonnet and Anthropic’s Claude 3 Haiku, to be tested for generating evaluation metrics.

Call metadata generation

The call metadata generation pipeline consisted of converting an audio file to a call transcript in a JSON format using Amazon Transcribe and then generating key information for each transcript using Amazon Bedrock and Amazon Comprehend. The following diagram shows a subset of the preceding architecture diagram that demonstrates this.

Mini Arch Diagram

The original PCA post linked previously shows how Amazon Transcribe and Amazon Comprehend are used in the metadata generation pipeline.

The call transcript output from the Amazon Transcribe step of the Step Functions workflow, which serves as input to the metadata generation step, followed the format in the following code example:

{
  "call_id": "<call id>",
  "agent_id": "<agent id>",
  "customer_id": "<customer id>",
  "transcript": """
    Agent: <Agent message>.
    Customer: <Customer message>
    Agent: <Agent message>.
    Customer: <Customer message>
    Agent: <Agent message>.
    Customer: <Customer message>
    ...
    """
}

Metadata was generated using Amazon Bedrock. Specifically, it extracted the summary, root cause, topic, and next steps, and answered key questions such as whether the call was a callback and whether the issue was ultimately resolved.
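
The following is a minimal sketch of how such an extraction step can be implemented with the Amazon Bedrock Converse API. The prompt wording, model ID, and helper function are illustrative assumptions, not the exact code used in the pipeline; the production prompts live in DynamoDB, as described next.

import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Illustrative prompt; the production prompts are stored in DynamoDB.
METADATA_PROMPT = """Analyze the call transcript below and return only valid JSON
with these keys: summary, root_cause, topic, next_steps,
is_callback (Yes/No), issue_resolved (Yes/No).

<transcript>
{transcript}
</transcript>"""


def generate_call_metadata(transcript: str) -> dict:
    """Ask an Amazon Bedrock FM for the generative AI-powered metadata fields."""
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[
            {
                "role": "user",
                "content": [{"text": METADATA_PROMPT.format(transcript=transcript)}],
            }
        ],
        inferenceConfig={"maxTokens": 1024, "temperature": 0},
    )
    return json.loads(response["output"]["message"]["content"][0]["text"])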

Prompts were stored in Amazon DynamoDB, allowing Asure to quickly modify prompts or add new generative AI-powered fields based on future enhancements. The following screenshot shows how prompts can be modified through DynamoDB.

Full DynamoDB Prompts
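
A sketch of how the pipeline might read one of these prompt templates at run time follows; the table name and key schema are hypothetical and will differ from Asure’s actual DynamoDB layout.

import boto3

dynamodb = boto3.resource("dynamodb")
# Hypothetical table and attribute names; the actual schema may differ.
prompt_table = dynamodb.Table("pca-prompt-templates")


def get_prompt(prompt_name: str) -> str:
    """Fetch a prompt template so prompts can change without redeploying code."""
    item = prompt_table.get_item(Key={"prompt_name": prompt_name})["Item"]
    return item["template"]


summary_prompt = get_prompt("call_summary").format(transcript="<call transcript>")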

Individual call Q&A

The chat assistant, powered by Anthropic’s Claude 3 Haiku, was used to answer natural language queries on a single transcript. This assistant, the call metadata values generated in the previous section, and sentiments generated by Amazon Comprehend were displayed in an application hosted through CloudFront.
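
A single Q&A turn can be sketched as follows, assuming the transcript is passed in as context and earlier turns are replayed to preserve conversation state. The function and message layout are illustrative, not the application’s actual code.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")


def answer_question(transcript: str, history: list, question: str) -> str:
    """One Q&A turn over a single call transcript (illustrative sketch)."""
    messages = history + [
        {
            "role": "user",
            "content": [{"text": f"Transcript:\n{transcript}\n\nQuestion: {question}"}],
        }
    ]
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=messages,
        inferenceConfig={"maxTokens": 512},
    )
    return response["output"]["message"]["content"][0]["text"]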

The user of the final chat assistant can modify the prompt in DynamoDB. The following screenshot shows the general prompt for an individual call Q&A.

DynamoDB Prompt for Chat

The UI hosted by CloudFront allows an agent or supervisor to analyze a specific call to extract additional details. The following screenshot shows the insights Asure gleaned for a sample customer service call.

Img of UI with Call Stats

The following screenshot shows the chat assistant, which is available on the same webpage.

Evaluation framework

This section outlines the components of the evaluation framework used. The framework ultimately allows Asure to highlight the metrics that matter for their use case and provides visibility into the generative AI application’s strengths and weaknesses. This was done using automated quantitative metrics provided by Ragas and DeepEval, traditional ML metrics, and human-in-the-loop evaluation performed by SMEs.

Quantitative metrics

The results of the generated call metadata and individual call Q&A were evaluated using quantitative metrics powered by FMs from Amazon Bedrock: answer correctness, answer relevance, and faithfulness from Ragas, and alignment and coverage from DeepEval. Amazon Bedrock’s simple integration with external libraries allowed it to be configured within these existing evaluation frameworks. In addition, traditional ML metrics were used for “Yes/No” answers. The following are the metrics used for different components of the solution:

  • Call metadata generation – This included the following:
    • Summary – Alignment and coverage (find a description of these metrics in the DeepEval repository) and answer correctness
    • Issue resolved, callback – F1-score and accuracy
    • Topic, next steps, root cause – Answer correctness, answer relevance, and faithfulness
  • Individual call Q&A – Answer correctness, answer relevance, and faithfulness
  • Human in the loop – Both individual call Q&A and call metadata generation used human-in-the-loop metrics

For a description of answer correctness, answer relevance, and faithfulness, refer to the customer blogpost with Tealium.

The use of Amazon Bedrock in the evaluation framework allowed flexibility in choosing models for different use cases. For example, Anthropic’s Claude 3.5 Sonnet was used to generate DeepEval metrics, whereas Anthropic’s Claude 3 Haiku (with its low latency) was ideal for Ragas.
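
The following sketch shows what wiring Amazon Bedrock into these libraries can look like, assuming a ragas 0.1-style API and LangChain’s Bedrock wrapper; imports and parameters vary by library version, and the sample data is made up. Traditional metrics for the “Yes/No” fields are computed with scikit-learn.

from datasets import Dataset
from langchain_aws import ChatBedrock
from ragas import evaluate
from ragas.metrics import answer_correctness, answer_relevancy, faithfulness
from sklearn.metrics import accuracy_score, f1_score

# An Amazon Bedrock FM acts as the judge; some metrics also need embeddings.
evaluator_llm = ChatBedrock(model_id="anthropic.claude-3-haiku-20240307-v1:0")

eval_data = Dataset.from_dict(
    {
        "question": ["What was the root cause of the issue?"],
        "answer": ["The payroll export failed after a plan change."],
        "contexts": [["<call transcript>"]],
        "ground_truth": ["The payroll export failed following a plan change."],
    }
)

scores = evaluate(
    eval_data,
    metrics=[answer_correctness, answer_relevancy, faithfulness],
    llm=evaluator_llm,
)

# Traditional ML metrics for the Yes/No fields (issue resolved, callback).
y_true = ["Yes", "No", "Yes"]
y_pred = ["Yes", "Yes", "Yes"]
print(f1_score(y_true, y_pred, pos_label="Yes"), accuracy_score(y_true, y_pred))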

Human in the loop

The human-in-the-loop UI is described in the Human-in-the-Loop section of the customer blogpost with Tealium. To use it to evaluate this solution, some changes had to be made:

  • The user can choose to analyze either one of the generated metadata fields for a call (such as a summary) or a specific Q&A pair.
  • The user can bring in two model outputs for comparison. These can come from the same FM with different prompts, different FMs with the same prompt, or different FMs with different prompts.
  • Additional checks for fluency, coherence, creativity, toxicity, relevance, completeness, and overall quality were added, where the user rates the model output on each metric on a 0–4 scale.

The following screenshots show the UI.

Human in the Loop UI Home Screen

Human in the Loop UI Metrics

The human-in-the-loop system establishes a feedback mechanism between domain expertise and Amazon Bedrock outputs. This in turn leads to improved generative AI applications and, ultimately, to higher customer trust in such systems.

To demo the human-in-the-loop UI, follow the instructions in the GitHub repo.

Natural language Q&A using Amazon Q in QuickSight

QuickSight, integrated with Amazon Q, enabled Asure to use natural language queries for comprehensive customer analytics. By interpreting queries on sentiments, call volumes, issue resolutions, and agent performance, the service delivered data-driven visualizations. This empowered Asure to quickly identify pain points, optimize operations, and deliver exceptional customer experiences through a streamlined, scalable analytics solution tailored for call center operations.

Integrate Amazon Q in QuickSight with the PCA solution

The Amazon Q in QuickSight integration was done by following three high-level steps:

  1. Create a dataset on QuickSight.
  2. Create a topic on QuickSight from the dataset.
  3. Query using natural language.

Create a dataset on QuickSight

We used Amazon Athena as the data source, which queries data from Amazon S3. QuickSight can be configured with multiple data sources (for more information, refer to Supported data sources). For this use case, we used the data generated by the PCA pipeline as the source for further analytics and natural language queries in Amazon Q in QuickSight. The PCA pipeline stores data in Amazon S3, which can be queried through Athena, an interactive query service that allows you to analyze data directly in Amazon S3 using standard SQL.
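
As a minimal sketch, the PCA output can be queried from Athena with a few lines of boto3; the database, table, and bucket names below are placeholders for whatever the PCA stack creates in your account.

import time

import boto3

athena = boto3.client("athena")

# Placeholder database, table, and output bucket names.
query = """
SELECT call_id, summary, root_cause, issue_resolved
FROM pca_database.call_metadata
WHERE issue_resolved = 'No'
LIMIT 10
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "pca_database"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)

# Poll until the query finishes, then fetch the first page of results.
query_id = execution["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"
    ]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]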

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose Create new.
  3. Choose Athena as the data source and input the particular catalog, database, and table that Amazon Q in QuickSight will reference.

Confirm the dataset was created successfully and proceed to the next step.

Quicksight Add Dataset
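
The same dataset can also be created programmatically. The following is a sketch using the QuickSight API; the account ID, names, and column list are hypothetical, and the parameter shapes should be checked against the current documentation.

import boto3

quicksight = boto3.client("quicksight")
ACCOUNT_ID = "123456789012"  # placeholder account ID

# Hypothetical IDs and names; verify parameter shapes against the QuickSight API docs.
data_source = quicksight.create_data_source(
    AwsAccountId=ACCOUNT_ID,
    DataSourceId="pca-athena-source",
    Name="PCA Athena source",
    Type="ATHENA",
    DataSourceParameters={"AthenaParameters": {"WorkGroup": "primary"}},
)

dataset = quicksight.create_data_set(
    AwsAccountId=ACCOUNT_ID,
    DataSetId="pca-call-metadata",
    Name="PCA call metadata",
    ImportMode="DIRECT_QUERY",
    PhysicalTableMap={
        "pca-table": {
            "RelationalTable": {
                "DataSourceArn": data_source["Arn"],
                "Schema": "pca_database",
                "Name": "call_metadata",
                "InputColumns": [
                    {"Name": "call_id", "Type": "STRING"},
                    {"Name": "summary", "Type": "STRING"},
                    {"Name": "issue_resolved", "Type": "STRING"},
                ],
            }
        }
    },
)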

Create a topic on Amazon QuickSight from the dataset

Users can use topics in QuickSight, powered by Amazon Q integration, to perform natural language queries on their data. This feature allows for intuitive data exploration and analysis by posing questions in plain language, alleviating the need for complex SQL queries or specialized technical skills. Before setting up a topic, make sure that the users have Pro level access. To set up a topic, follow these steps:

  1. On the QuickSight console, choose Topics in the navigation pane.
  2. Choose New topic.
  3. Enter a name for the topic and choose the data source created.
  4. Choose the created topic and then choose Open Q&A to start querying in natural language.

Query using natural language

We performed intuitive natural language queries to gain actionable insights into customer analytics. This capability allows users to analyze sentiments, call volumes, issue resolutions, and agent performance through conversational queries, enabling data-driven decision-making, operational optimization, and enhanced customer experiences within a scalable, call center-tailored analytics solution. Examples of the simple natural language queries “Which customer had positive sentiments and a complex query?” and “What are the most common issues and which agents dealt with them?” are shown in the following screenshots.

Quicksight Dashboard

Quicksight Dashboard and Statistics

These capabilities are helpful when business leaders want to dive deep into a particular issue, empowering them to make informed decisions.

Success metrics

The primary success metric of this solution is boosting employee productivity, primarily by quickly understanding customer interactions from calls to uncover themes and trends while also identifying gaps and pain points in Asure’s products. Before the engagement, analysts took 14 days to manually go through call transcripts to retrieve insights. After the engagement, Asure observed how Amazon Bedrock and Amazon Q in QuickSight could reduce this time to minutes, even seconds, to obtain both insights queried directly from all stored call transcripts and visualizations that can be used for report generation.

In the pipeline, Anthropic’s Claude 3 Haiku was used to obtain the initial call metadata fields (such as summary, root cause, next steps, and sentiments), which were stored and queried through Athena. This allowed each call transcript to be queried using natural language in Amazon Q in QuickSight, letting business analysts answer high-level questions about issues, themes, and customer and agent insights in seconds.

Pat Goepel, chairman and CEO of Asure, shares,

“In collaboration with the AWS Generative AI Innovation Center, we have improved upon a post-call analytics solution to help us identify and prioritize features that will be the most impactful for our customers. We are utilizing Amazon Bedrock, Amazon Comprehend, and Amazon Q in QuickSight to understand trends in our own customer interactions, prioritize items for product development, and detect issues sooner so that we can be even more proactive in our support for our customers. Our partnership with AWS and our commitment to be early adopters of innovative technologies like Amazon Bedrock underscore our dedication to making advanced HCM technology accessible for businesses of any size.”

Takeaways

We had the following takeaways:

  • Enabling chain-of-thought reasoning and specific assistant prompts for each prompt in the call metadata generation component, called using Anthropic’s Claude 3 Haiku, improved metadata generation for each transcript (an illustrative prompt follows this list). The flexibility of Amazon Bedrock in supporting various FMs allowed full experimentation with many types of models with minimal changes, making it a natural choice for this application.
  • Ragas metrics, particularly faithfulness, answer correctness, and answer relevance, were used to evaluate call metadata generation and individual Q&A. However, summarization required different metrics, alignment and coverage, which don’t need ground truth summaries; therefore, DeepEval was used to calculate the summarization metrics. Overall, the ease of integrating Amazon Bedrock allowed it to power the calculation of quantitative metrics with minimal changes to the evaluation libraries, and it also allowed different types of models to be used with different evaluation libraries.
  • The human-in-the-loop approach can be used by SMEs to further evaluate Amazon Bedrock outputs. There is an opportunity to improve an Amazon Bedrock FM based on this feedback, although that was out of scope for this engagement.
  • The post-call analytics workflow, with the use of Amazon Bedrock, can be iterated upon in the future using features such as Amazon Bedrock Knowledge Bases to perform Q&A over a specific number of call transcripts as well as Amazon Bedrock Guardrails to detect harmful and hallucinated responses while also creating more responsible AI applications.
  • Amazon Q in QuickSight was able to answer natural language questions on customer analytics, root cause, and agent analytics, but some questions required reframing to get meaningful responses.
  • Data fields within Amazon Q in QuickSight needed to be defined properly and synonyms needed to be added to make Amazon Q more robust with natural language queries.
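
To make the first takeaway concrete, the following is an illustrative chain-of-thought prompt shape for one metadata field; the real prompts are stored in DynamoDB and differ in wording.

# Illustrative only: the production prompts live in DynamoDB and differ.
COT_ROOT_CAUSE_PROMPT = """You are analyzing a customer support call transcript.

First, think step by step inside <thinking> tags: identify the customer's issue,
what the agent tried, and whether the issue was resolved by the end of the call.
Then give your final answer inside <root_cause> tags in one or two sentences.

<transcript>
{transcript}
</transcript>"""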

Security best practices

We recommend following AWS security best practices for building secure applications on AWS.

Conclusion

In this post, we showcased how Asure used the PCA solution powered by Amazon Bedrock and Amazon Q in QuickSight to generate consumer and agent insights both at individual and aggregate levels. Specific insights included those centered around a common theme or issue. With these services, Asure was able to improve employee productivity to generate these insights in minutes instead of weeks.

This is one of the many ways builders can deliver great solutions using Amazon Bedrock and Amazon Q in QuickSight. To learn more, refer to Amazon Bedrock and Amazon Q in QuickSight.


About the Authors

Suren Gunturu is a Data Scientist working in the Generative AI Innovation Center, where he works with various AWS customers to solve high-value business problems. He specializes in building ML pipelines using large language models, primarily through Amazon Bedrock and other AWS Cloud services.

Avinash Yadav is a Deep Learning Architect at the Generative AI Innovation Center, where he designs and implements cutting-edge GenAI solutions for diverse enterprise needs. He specializes in building ML pipelines using large language models, with expertise in cloud architecture, Infrastructure as Code (IaC), and automation. His focus lies in creating scalable, end-to-end applications that leverage the power of deep learning and cloud technologies.

John Canada is the VP of Engineering at Asure Software, where he leverages his experience in building innovative, reliable, and performant solutions and his passion for AI/ML to lead a talented team dedicated to using Machine Learning to enhance the capabilities of Asure’s software and meet the evolving needs of businesses.

Yasmine Rodriguez Wakim is the Chief Technology Officer at Asure Software. She is an innovative Software Architect & Product Leader with deep expertise in creating payroll, tax, and workforce software development. As a results-driven tech strategist, she builds and leads technology vision to deliver efficient, reliable, and customer-centric software that optimizes business operations through automation.

Vidya Sagar Ravipati is a Science Manager at the Generative AI Innovation Center, where he leverages his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption.

Unleashing the multimodal power of Amazon Bedrock Data Automation to transform unstructured data into actionable insights

Gartner predicts that “by 2027, 40% of generative AI solutions will be multimodal (text, image, audio and video), up from 1% in 2023.”

The McKinsey 2023 State of AI Report identifies data management as a major obstacle to AI adoption and scaling. Enterprises generate massive volumes of unstructured data, from legal contracts to customer interactions, yet extracting meaningful insights remains a challenge. Traditionally, transforming raw data into actionable intelligence has demanded significant engineering effort. It often requires managing multiple machine learning (ML) models, designing complex workflows, and integrating diverse data sources into production-ready formats.

The result is expensive, brittle workflows that demand constant maintenance and engineering resources. In a world where—according to Gartner—over 80% of enterprise data is unstructured, enterprises need a better way to extract meaningful information to fuel innovation.

Today, we’re excited to announce the general availability of Amazon Bedrock Data Automation, a powerful, fully managed feature within Amazon Bedrock that automates the generation of useful insights from unstructured multimodal content such as documents, images, audio, and video for your AI-powered applications. It enables organizations to extract valuable information from multimodal content, unlocking the full potential of their data without requiring deep AI expertise or managing complex multimodal ML pipelines. With Amazon Bedrock Data Automation, enterprises can accelerate AI adoption and develop solutions that are secure, scalable, and responsible.

The benefits of using Amazon Bedrock Data Automation

Amazon Bedrock Data Automation provides a single, unified API that automates the processing of unstructured multimodal content, minimizing the complexity of orchestrating multiple models, fine-tuning prompts, and stitching outputs together. It helps ensure high accuracy while significantly lowering processing costs.
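
The following is a sketch of that unified API call from boto3, based on the Amazon Bedrock Data Automation runtime; the ARNs and S3 URIs are placeholders, and the parameter names are assumptions that should be verified against the latest SDK documentation.

import boto3

bda_runtime = boto3.client("bedrock-data-automation-runtime")

# Placeholder URIs and ARN; parameter names are assumptions to verify
# against the current Amazon Bedrock Data Automation documentation.
response = bda_runtime.invoke_data_automation_async(
    inputConfiguration={"s3Uri": "s3://my-input-bucket/loan-package.pdf"},
    outputConfiguration={"s3Uri": "s3://my-output-bucket/results/"},
    dataAutomationConfiguration={
        "dataAutomationProjectArn": (
            "arn:aws:bedrock:us-west-2:123456789012:"
            "data-automation-project/my-project"
        ),
    },
)

# The call is asynchronous; poll until the results land in Amazon S3.
status = bda_runtime.get_data_automation_status(
    invocationArn=response["invocationArn"]
)
print(status["status"])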

Built with responsible AI, Amazon Bedrock Data Automation enhances transparency with visual grounding and confidence scores, allowing outputs to be validated before integration into mission-critical workflows. It adheres to enterprise-grade security and compliance standards, enabling you to deploy AI solutions with confidence. It also enables you to define when data should be extracted as-is and when it should be inferred, giving complete control over the process.

Cross-Region inference enables seamless management of unplanned traffic bursts by using compute across different AWS Regions. Amazon Bedrock Data Automation optimizes for available AWS Regional capacity by automatically routing across Regions within the same geographic area to maximize throughput at no additional cost. For example, a request made in the US stays within Regions in the US. Amazon Bedrock Data Automation is currently available in the US West (Oregon) and US East (N. Virginia) AWS Regions, helping to ensure seamless request routing and enhanced reliability. Amazon Bedrock Data Automation is expanding to additional Regions, so be sure to check the documentation for the latest updates.

Amazon Bedrock Data Automation offers transparent and predictable pricing based on the modality of processed content and the type of output used (standard vs. custom output). You pay according to the number of pages, quantity of images, and duration of audio and video files. This straightforward pricing model makes cost calculation easier compared to token-based pricing models.

Use cases for Amazon Bedrock Data Automation

Key use cases such as intelligent document processing, media asset analysis and monetization, speech analytics, search and discovery, and agent-driven operations highlight how Amazon Bedrock Data Automation enhances innovation, efficiency, and data-driven decision-making across industries.

Intelligent document processing

According to Fortune Business Insights, the intelligent document processing (IDP) industry is projected to grow from USD 10.57 billion in 2025 to USD 66.68 billion by 2032, with a CAGR of 30.1%. IDP is powering critical workflows across industries and enabling businesses to scale with speed and accuracy. Financial institutions use IDP to automate tax forms and fraud detection, while healthcare providers streamline claims processing and medical record digitization. Legal teams accelerate contract analysis and compliance reviews, and in oil and gas, IDP enhances safety reporting. Manufacturers and retailers optimize supply chain and invoice processing, helping to ensure seamless operations. In the public sector, IDP improves citizen services, legislative document management, and compliance tracking. As businesses strive for greater automation, IDP is no longer an option; it’s a necessity for cost reduction, operational efficiency, and data-driven decision-making.

Let’s explore a real-world use case showcasing how Amazon Bedrock Data Automation enhances efficiency in loan processing.

Loan processing is a complex, multi-step process that involves document verification, credit assessments, policy compliance checks, and approval workflows, requiring precision and efficiency at every stage. Loan processing with traditional AWS AI services is shown in the following figure.

As shown in the preceding figure, loan processing is a multi-step workflow that involves handling diverse document types, managing model outputs, and stitching results across multiple services. Traditionally, documents from portals, email, or scans are stored in Amazon Simple Storage Service (Amazon S3), requiring custom logic to split multi-document packages. Next, Amazon Comprehend or custom classifiers categorize them into types such as W2s, bank statements, and closing disclosures, while Amazon Textract extracts key details. Additional processing is needed to standardize formats, manage JSON outputs, and align data fields, often requiring manual integration and multiple API calls. In some cases, foundation models (FMs) generate document summaries, adding further complexity. Additionally, human-in-the-loop verification may be required for outputs that fall below a confidence threshold.

With Amazon Bedrock Data Automation, this entire process is now simplified into a single unified API call. It automates document classification, data extraction, validation, and structuring, removing the need for manual stitching, API orchestration, and custom integration efforts, significantly reducing complexity and accelerating loan processing workflows as shown in the following figure.

As shown in the preceding figure, when using Amazon Bedrock Data Automation, loan packages from third-party systems, portals, email, or scanned documents are stored in Amazon S3, where Amazon Bedrock Data Automation automates document splitting and processing, removing the need for custom logic. After the loan packages are ingested, Amazon Bedrock Data Automation classifies documents such as W2s, bank statements, and closing disclosures in a single step, alleviating the need for separate classifier model calls. Amazon Bedrock Data Automation then extracts key information based on the customer requirement, capturing critical details such as employer information from W2s, transaction history from bank statements, and loan terms from closing disclosures.

Unlike traditional workflows that require manual data normalization, Amazon Bedrock Data Automation automatically standardizes extracted data, helping to ensure consistent date formats, currency values, and field names without additional processing based on the customer provided output schema. Moreover, Amazon Bedrock Data Automation enhances compliance and accuracy by providing summarized outputs, bounding boxes for extracted fields, and confidence scores, delivering structured, validated, and ready-to-use data for downstream applications with minimal effort.

In summary, Amazon Bedrock Data Automation enables financial institutions to seamlessly process loan documents from ingestion to final output through a single unified API call, eliminating the need for multiple independent steps.

While this example highlights financial services, the same principles apply across industries to streamline complex document processing workflows. Built for scale, security, and transparency, Amazon Bedrock Data Automation adheres to enterprise-grade compliance standards, providing robust data protection. With visual grounding, confidence scores, and seamless integration into knowledge bases, it powers Retrieval Augmented Generation (RAG)-driven document retrieval and enables deployment of production-ready AI workflows in days, not months.

It also offers flexibility in data extraction by supporting both explicit and implicit extractions. Explicit extraction is used for clearly stated information, such as names, dates, or specific values, while implicit extraction infers insights that aren’t directly stated but can be derived through context and reasoning. This ability to toggle between extraction types enables more comprehensive and nuanced data processing across various document types.

This is achieved through responsible AI, with Amazon Bedrock Data Automation passing every process through a responsible AI model to help ensure fairness, accuracy, and compliance in document automation.

By automating document classification, extraction, and normalization, it not only accelerates document processing but also enhances downstream applications, such as knowledge management and intelligent search. With structured, validated data readily available, organizations can unlock deeper insights and improve decision-making.

This seamless integration extends to efficient document search and retrieval, transforming business operations by enabling quick access to critical information across vast repositories. By converting unstructured document collections into searchable knowledge bases, organizations can seamlessly find, analyze, and use their data. This is particularly valuable for industries handling large document volumes, where rapid access to specific information is crucial. Legal teams can efficiently search through case files, healthcare providers can retrieve patient histories and research papers, and government agencies can manage legislative records and policy documents. Powered by Amazon Bedrock Data Automation and Amazon Bedrock Knowledge Bases, this integration streamlines investment research, regulatory filings, clinical protocols, and public sector record management, significantly improving efficiency across industries.

The following figure shows how Amazon Bedrock Data Automation seamlessly integrates with Amazon Bedrock Knowledge Bases to extract insights from unstructured datasets and ingest them into a vector database for efficient retrieval. This integration enables organizations to unlock valuable knowledge from their data, making it accessible for downstream applications. By using these structured insights, businesses can build generative AI applications, such as assistants that dynamically answer questions and provide context-aware responses based on the extracted information. This approach enhances knowledge retrieval, accelerates decision-making, and enables more intelligent, AI-driven interactions.

The preceding architecture diagram showcases a pipeline for processing and retrieving insights from multimodal content using Amazon Bedrock Data Automation and Amazon Bedrock Knowledge Bases. Unstructured data, such as documents, images, videos, and audio, is first ingested into an Amazon S3 bucket. Amazon Bedrock Data Automation then processes this content, extracting key insights and transforming it for further use. The processed data is stored in Amazon Bedrock Knowledge Bases, where an embedding model converts it into vector representations, which are then stored in a vector database for efficient semantic search. Amazon API Gateway (WebSocket API) facilitates real-time interactions, enabling users to query the knowledge base dynamically via a chatbot or other interfaces. This architecture enhances automated data processing, efficient retrieval, and seamless real-time access to insights.
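
Once the extracted insights are ingested, querying the knowledge base is a single call from the AWS SDK. The following is a minimal sketch; the knowledge base ID, model ARN, and question are placeholders.

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

# Placeholder IDs; assumes Amazon Bedrock Data Automation output has already
# been ingested into an Amazon Bedrock knowledge base backed by a vector store.
response = agent_runtime.retrieve_and_generate(
    input={"text": "Which loan applications are missing a closing disclosure?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID12345",
            "modelArn": (
                "arn:aws:bedrock:us-east-1::foundation-model/"
                "anthropic.claude-3-haiku-20240307-v1:0"
            ),
        },
    },
)
print(response["output"]["text"])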

Beyond intelligent search and retrieval, Amazon Bedrock Data Automation enables organizations to automate complex decision-making processes, providing greater accuracy and compliance in document-driven workflows. By using structured data, businesses can move beyond simple document processing to intelligent, policy-aware automation.

Amazon Bedrock Data Automation can also be used with Amazon Bedrock Agents to take the next step in automation. Going beyond traditional IDP, this approach enables autonomous workflows that assist knowledge workers and streamline decision-making. For example, in insurance claims processing, agents validate claims against policy documents; while in loan processing, they assess mortgage applications against underwriting policies. With multi-agent workflows, policy validation, automated decision support, and document generation, this approach enhances efficiency, accuracy, and compliance across industries.

Similarly, Amazon Bedrock Data Automation is simplifying media and entertainment use cases, seamlessly integrating workflows through its unified API. Let’s take a closer look at how it’s driving this transformation.

Media asset analysis and monetization

Companies in media and entertainment (M&E), advertising, gaming, and education own vast digital assets, such as videos, images, and audio files, and require efficient ways to analyze them. Gaining insights from these assets enables better indexing, deeper analysis, and supports monetization and compliance efforts.

The image and video modalities of Amazon Bedrock Data Automation provide advanced features for efficient extraction and analysis.

  • Image modality: Supports image summarization, IAB taxonomy, and content moderation. It also includes text detection and logo detection with bounding boxes and confidence scores. Additionally, it enables customizable analysis via blueprints for use cases like scene classification.
  • Video modality: Automates video analysis workflows, chapter segmentation, and both visual and audio processing. It generates full video summaries, chapter summaries, IAB taxonomy, text detection, visual and audio moderation, logo detection, and audio transcripts.

The customized approach to extracting and analyzing video content involves a sophisticated process that gathers information from both the visual and audio components of the video, making it complex to build and manage.

As shown in the preceding figure, a customized video analysis pipeline involves sampling image frames from the visual portion of the video and applying both specialized models and FMs to extract information, which is then aggregated at the shot level. It also transcribes the audio into text and combines both visual and audio data for chapter-level analysis. Additionally, large language model (LLM)-based analysis is applied to derive further insights, such as video summaries and classifications. Finally, the data is stored in a database for downstream applications to consume.

Media video analysis with Amazon Bedrock Data Automation now simplifies this workflow into a single unified API call, minimizing complexity and reducing integration effort, as shown in the following figure.

Customers can use Amazon Bedrock Data Automation to support popular media analysis use cases such as:

  • Digital asset management: In the M&E industry, digital asset management (DAM) refers to the organized storage, retrieval, and management of digital content such as videos, images, audio files, and metadata. With growing content libraries, media companies need efficient ways to categorize, search, and repurpose assets for production, distribution, and monetization.

Amazon Bedrock Data Automation automates video, image, and audio analysis, making DAM more scalable, efficient, and intelligent.

  • Contextual ad placement: Contextual advertising enhances digital marketing by aligning ads with content, but implementing it for video on demand (VOD) is challenging. Traditional methods rely on manual tagging, making the process slow and unscalable.

Amazon Bedrock Data Automation automates content analysis across video, audio, and images, eliminating complex workflows. It extracts scene summaries, audio segments, and IAB taxonomies to power video ad solutions, improving contextual ad placement and ad campaign performance.

  • Compliance and moderation: Media compliance and moderation make sure that digital content adheres to legal, ethical, and environment-specific guidelines to protect users and maintain brand integrity. This is especially important in industries such as M&E, gaming, advertising, and social media, where large volumes of content need to be reviewed for harmful content, copyright violations, brand safety, and regulatory compliance.

Amazon Bedrock Data Automation streamlines compliance by using AI-driven content moderation to analyze both the visual and audio components of media. This enables users to define and apply customized policies to evaluate content against their specific compliance requirements.

Intelligent speech analytics

Amazon Bedrock Data Automation is used in intelligent speech analytics to derive insights from audio data across multiple industries with speed and accuracy. Financial institutions rely on intelligent speech analytics to monitor call centers for compliance and detect potential fraud, while healthcare providers use it to capture patient interactions and optimize telehealth communications. In retail and hospitality, speech analytics drives customer engagement by uncovering insights from live feedback and recorded interactions. With the exponential growth of voice data, intelligent speech analytics is no longer a luxury—it’s a vital tool for reducing costs, improving efficiency, and driving smarter decision-making.

Customer service – AI-driven call analytics for better customer experience

Businesses can analyze call recordings at scale to gain actionable insights into customer sentiment, compliance, and service quality. Contact centers can use Amazon Bedrock Data Automation to:

  • Transcribe and summarize thousands of calls daily with speaker separation and key moment detection.
  • Extract sentiment insights and categorize customer complaints for proactive issue resolution.
  • Improve agent coaching by detecting compliance gaps and training needs.

A traditional call analytics approach is shown in the following figure.

Processing customer service call recordings involves multiple steps, from audio capture to advanced AI-driven analysis as highlighted below:

  • Audio capture and storage: Call recordings from customer service interactions are collected and stored across disparate systems (for example, multiple S3 buckets and call center service provider output). Each file might require custom handling because of varying formats and qualities.
  • Multi-step processing: Multiple, separate AI and machine learning (AI/ML) services and models are needed for each processing stage:
    1. Transcription: Audio files are sent to a speech-to-text ML model, such as Amazon Transcribe, to generate transcripts of the audio segments.
    2. Call summary: Summary of the call with main issue description, action items, and outcomes using either Amazon Transcribe Call Analytics or other generative AI FMs.
    3. Speaker diarization and identification: Determining who spoke when involves Amazon Transcribe or similar third-party tools.
    4. Compliance analysis: Separate ML models must be orchestrated to detect compliance issues (such as identifying profanity or escalated emotions), implement personally identifiable information (PII) redaction, and flag critical moments. These analytics are implemented with either Amazon Comprehend, or separate prompt engineering with FMs.
    5. Entity detection: Discovers entities referenced in the call using Amazon Comprehend, custom entity detection models, or configurable string matching.
    6. Audio metadata extraction: Extraction of file properties such as format, duration, and bit rate is handled by either Amazon Transcribe Analytics or another call center solution.
  • Fragmented workflows: The disparate nature of these processes leads to increased latency, higher integration complexity, and a greater risk of errors. Stitching of outputs is required to form a comprehensive view, complicating dashboard integration and decision-making. A sketch of the first stages of this traditional pipeline follows this list.
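
As a minimal sketch of the traditional approach (bucket, job, and file names are placeholders), the first two stages might look like the following, with each service called and stitched together separately:

import boto3

transcribe = boto3.client("transcribe")
comprehend = boto3.client("comprehend")

# Stage 1: speech-to-text with Amazon Transcribe (names and URI are placeholders).
transcribe.start_transcription_job(
    TranscriptionJobName="call-0001",
    Media={"MediaFileUri": "s3://call-recordings/call-0001.wav"},
    MediaFormat="wav",
    LanguageCode="en-US",
)

# Stage 2 (after the job completes): sentiment and entities with Amazon Comprehend.
transcript_text = "..."  # fetched from the completed transcription job's output
sentiment = comprehend.detect_sentiment(Text=transcript_text, LanguageCode="en")
entities = comprehend.detect_entities(Text=transcript_text, LanguageCode="en")
print(sentiment["Sentiment"], [e["Text"] for e in entities["Entities"]])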

Unified, API-driven speech analytics with Amazon Bedrock Data Automation

The following figure shows customer service call analytics using intelligent speech analytics powered by Amazon Bedrock Data Automation.

Optimizing customer service call analysis requires a seamless, automated pipeline that efficiently ingests, processes, and extracts insights from audio recordings as mentioned below:

  • Streamlined data capture and processing: A single, unified API call ingests call recordings directly from storage—regardless of the file format or source—automatically handling any necessary file splitting or pre-processing.
  • End-to-end automation: Intelligent speech analytics with Amazon Bedrock Data Automation now encapsulates the entire call analysis workflow:
    1. Comprehensive transcription: Generates turn-by-turn transcripts with speaker identification, providing a clear record of every interaction.
    2. Detailed call summary: Created using the generative AI capability of Amazon Bedrock Data Automation, the detailed call summary enables an operator to quickly gain insights from the files.
    3. Automated speaker diarization and identification: Seamlessly distinguishes between multiple speakers, accurately mapping out who spoke when.
    4. Compliance scoring: In one step, the system flags key compliance indicators (such as profanity, violence, or other content moderation metrics) to help ensure regulatory adherence.
    5. Rich audio metadata: Amazon Bedrock Data Automation automatically extracts detailed metadata—including format, duration, sample rate, channels, and bit rate—supporting further analytics and quality assurance.

By consolidating multiple steps into a single API call, customer service centers benefit from faster processing, reduced error rates, and significantly lower integration complexity. This streamlined approach enables real-time monitoring and proactive agent coaching, ultimately driving improved customer experience and operational agility.

Before the availability of Amazon Bedrock Data Automation for intelligent speech analytics, customer service call analysis was a fragmented, multi-step process that required juggling various tools and models. Now, with the unified API of Amazon Bedrock Data Automation, organizations can quickly transform raw voice data into actionable insights—cutting through complexity, reducing costs, and empowering teams to enhance service quality and compliance.

When to choose Amazon Bedrock Data Automation instead of traditional AI/ML services

You should choose Amazon Bedrock Data Automation when you need a simple, API-driven solution for multimodal content processing without the complexity of managing and orchestrating multiple models or prompt engineering. With a single API call, Amazon Bedrock Data Automation seamlessly handles asset splitting, classification, information extraction, visual grounding, and confidence scoring, eliminating the need for manual orchestration.

On the other hand, the core capabilities of Amazon Bedrock are ideal if you require full control over models and workflows to tailor solutions to your organization’s specific business needs. Developers can use Amazon Bedrock to select FMs based on price-performance, fine-tune prompt engineering for data extraction, train custom classification models, implement responsible AI guardrails, and build an orchestration pipeline to provide consistent output.

Amazon Bedrock Data Automation streamlines multimodal processing, while Amazon Bedrock offers building blocks for deeper customization and control.

Conclusion

Amazon Bedrock Data Automation provides enterprises with scalability, security, and transparency, enabling seamless processing of unstructured data with confidence. Designed for rapid deployment, it helps developers transition from prototype to production in days, accelerating time-to-value while maintaining cost efficiency. Start using Amazon Bedrock Data Automation today and unlock the full potential of your unstructured data. For solution guidance, see Guidance for Multimodal Data Processing with Bedrock Data Automation.


About the Authors

Wrick Talukdar is a Tech Lead – Generative AI Specialist focused on Intelligent Document Processing. He leads machine learning initiatives and projects across business domains, leveraging multimodal AI, generative models, computer vision, and natural language processing. He speaks at conferences such as AWS re:Invent, IEEE, Consumer Technology Society (CTSoc), YouTube webinars, and other industry conferences like CERAWEEK and ADIPEC. In his free time, he enjoys writing and birding photography.

Lana Zhang is a Senior Solutions Architect on the AWS World Wide Specialist Organization AI Services team, specializing in AI and generative AI with a focus on use cases including content moderation and media analysis. With her expertise, she is dedicated to promoting AWS AI and generative AI solutions, demonstrating how generative AI can transform classic use cases with advanced business value. She assists customers in transforming their business solutions across diverse industries, including social media, gaming, e-commerce, media, advertising, and marketing.

Julia Hu is a Specialist Solutions Architect who helps AWS customers and partners build generative AI solutions using Amazon Q Business on AWS. Julia has over 4 years of experience developing solutions for customers adopting AWS services at the forefront of cloud technology.

Keith Mascarenhas leads worldwide GTM strategy for Generative AI at AWS, developing enterprise use cases and adoption frameworks for Amazon Bedrock. Prior to this, he drove AI/ML solutions and product growth at AWS, and held key roles in Business Development, Solution Consulting and Architecture across Analytics, CX and Information Security.

‘Assassin’s Creed Shadows’ Emerges From the Mist on GeForce NOW

Time to sharpen the blade. GeForce NOW brings a legendary addition to the cloud: Ubisoft’s highly anticipated Assassin’s Creed Shadows is now available for members to stream.

Plus, dive into the updated version of the iconic Fable Anniversary — part of 11 games joining the cloud this week.

Silent as a Shadow

Assassin's Creed Shadows on GeForce NOW
Take the Leap of Faith from the cloud.

Explore 16th-century Japan, uncover conspiracies and shape the destiny of a nation — all from the cloud.

Assassin’s Creed Shadows unfolds in 1579, during the turbulent Azuchi-Momoyama period of feudal Japan, a time of civil war and cultural exchange.

Step into the roles of Naoe, a fictional shinobi assassin and daughter of Fujibayashi Nagato, and Yasuke, a character based on the historical African samurai. Their stories intertwine as they find themselves on opposite sides of a conflict.

The game’s dynamic stealth system enables players to hide in shadows and use a new “Observe” mechanic to identify targets, tag enemies and highlight objectives. Yasuke and Naoe each have unique abilities and playstyles: Naoe excels in stealth, equipped with classic Assassin techniques and shinobi skills, while Yasuke offers a more combat-focused approach.

Navigate the turbulent Sengoku period on GeForce NOW, and experience the game’s breathtaking landscapes and intense combat at up to 4K resolution and 120 frames per second with an Ultimate membership. Every sword clash and sweeping vista is delivered with exceptional smoothness and clarity.

A Classic Reborn

Fable Anniversary revitalizes the original Fable: The Lost Chapters with enhanced graphics, a new save system and Xbox achievements. This action role-playing game invites players to shape their heroes’ destinies in the whimsical world of Albion.

Fable Anniversary on GeForce NOW
Make every choice from the cloud.

Fable Anniversary weaves an epic tale of destiny and choice, following the journey of a young boy whose life is forever changed when bandits raid his peaceful village of Oakvale. Recruited to the Heroes’ Guild, he embarks on a quest to uncover the truth about his family and confront the mysterious Jack of Blades.

Players shape their hero’s destiny through a series of moral choices. These decisions influence the story’s progression and even manifest physically on the character.

Stream the title with a GeForce NOW membership across PCs that may not be game-ready, Macs, mobile devices, and Samsung and LG smart TVs. GeForce NOW transforms these devices into powerful gaming rigs, with up to eight-hour gaming sessions for Ultimate members.

Unleash the Games

Wreckfest 2 on GeForce NOW
Crash, smash, repeat.

Wreckfest 2, Bugbear Entertainment’s highly anticipated sequel to the original demolition derby racing game, promises an even more intense and chaotic experience. The game features a range of customizable cars, from muscle cars to novelty vehicles, each with a story to tell.

Play around with multiple modes, including traditional racing with physics-driven handling, and explore demolition derby arenas where the goal is to cause maximum destruction. With enhanced multiplayer features, including skills-based matchmaking and split-screen mode, Wreckfest 2 is the ultimate playground for destruction-racing enthusiasts.

Look for the following games available to stream in the cloud this week:

What are you planning to play this weekend? Let us know on X or in the comments below.

EPRI, NVIDIA and Collaborators Launch Open Power AI Consortium to Transform the Future of Energy

The power and utilities sector keeps the lights on for the world’s populations and industries. As the global energy landscape evolves, so must the tools it relies on.

To advance the next generation of electricity generation and distribution, many of the industry’s members are joining forces through the creation of the Open Power AI Consortium. The consortium includes energy companies, technology companies and researchers developing AI applications to tackle domain-specific challenges, such as adapting to an increased deployment of distributed energy resources and significant load growth on electric grids.

Led by independent, nonprofit energy R&D organization EPRI, the consortium aims to spur AI adoption in the power sector through a collaborative effort to build open models using curated, industry-specific data. The initiative was launched today at NVIDIA GTC, a global AI conference taking place through Friday, March 21, in San Jose, California.

“Over the next decade, AI has the great potential to revolutionize the power sector by delivering the capability to enhance grid reliability, optimize asset performance, and enable more efficient energy management,” said Arshad Mansoor, EPRI’s president and CEO. “With the Open Power AI Consortium, EPRI and its collaborators will lead this transformation, driving innovation toward a more resilient and affordable energy future.”

As part of the consortium, EPRI, NVIDIA and Articul8, a member of the NVIDIA Inception program for cutting-edge startups, are developing a set of domain-specific, multimodal large language models trained on massive libraries of proprietary energy and electrical engineering data from EPRI that can help utilities streamline operations, boost energy efficiency and improve grid resiliency.

The first version of an industry-first open AI model for electric and power systems was developed using hundreds of NVIDIA H100 GPUs and is expected to soon be available in early access as an NVIDIA NIM microservice.

“Working with EPRI, we aim to leverage advanced AI tools to address today’s unique industry challenges, positioning us at the forefront of innovation and operational excellence,” said Vincent Sorgi, CEO of PPL Corporation and EPRI board chair.

PPL is a leading U.S. energy company that provides electricity and natural gas to more than 3.6 million customers in Pennsylvania, Kentucky, Rhode Island and Virginia.

The Open AI Consortium’s Executive Advisory Committee includes executives from over 20 energy companies such as Duke Energy, Pacific Gas & Electric Company and Portland General Electric, as well as leading tech companies such as AWS, Oracle and Microsoft. The consortium plans to further expand its global member base.

Powering Up AI to Energize Operations, Drive Innovation

Global energy consumption is projected to grow by nearly 4% annually through 2027, according to the International Energy Agency. To support this surge in demand, electricity providers are looking to enhance the resiliency of power infrastructure, balance diverse energy sources and expand the grid’s capacity.

AI agents trained on thousands of documents specific to this sector — including academic research, industry regulations and standards, and technical documents — can enable utility and energy companies to more quickly assess energy needs and prepare the studies and permits required to improve infrastructure.

“We can bring AI to the global power sector in a much more accelerated way by working together to develop foundation models for the industry, and collaborating with the power sector to y apply solutions tailored to its unique needs,” Mansoor said.

Utilities could tap the consortium’s model to help accelerate interconnection studies, which analyze the feasibility and potential impact of connecting new generators to the existing electric grid. The process varies by region but can take up to four years to complete. By introducing AI agents that can support the analysis, the consortium aims to cut this timeline down by at least 5x.

The AI model could also be used to support the preparation of licenses, permits, environmental studies and utility rate cases, where energy companies seek regulatory approval and public comment on proposed changes to electricity rates.

Beyond releasing datasets and models, the consortium also aims to develop a standardized framework of benchmarks to help utilities, researchers and other energy sector stakeholders evaluate the performance and reliability of AI technologies.

Learn more about the Open Power AI Consortium online and in EPRI’s sessions at GTC:

To learn more about advancements in AI across industries, watch the GTC keynote by NVIDIA founder and CEO Jensen Huang:

See notice regarding software product information.

Read More

Innovation to Impact: How NVIDIA Research Fuels Transformative Work in AI, Graphics and Beyond

Innovation to Impact: How NVIDIA Research Fuels Transformative Work in AI, Graphics and Beyond

The roots of many of NVIDIA’s landmark innovations — the foundational technology that powers AI, accelerated computing, real-time ray tracing and seamlessly connected data centers — can be found in the company’s research organization, a global team of around 400 experts in fields including computer architecture, generative AI, graphics and robotics.

Established in 2006 and led since 2009 by Bill Dally, former chair of Stanford University’s computer science department, NVIDIA Research is unique among corporate research organizations — set up with a mission to pursue complex technological challenges while having a profound impact on the company and the world.

“We make a deliberate effort to do great research while being relevant to the company,” said Dally, chief scientist and senior vice president of NVIDIA Research. “It’s easy to do one or the other. It’s hard to do both.”

Dally is among NVIDIA Research leaders sharing the group’s innovations at NVIDIA GTC, the premier developer conference at the heart of AI, taking place this week in San Jose, California.

“We make a deliberate effort to do great research while being relevant to the company.” — Bill Dally, chief scientist and senior vice president

While many research organizations may describe their mission as pursuing projects with a longer time horizon than those of a product team, NVIDIA researchers seek out projects with a larger “risk horizon” — and a huge potential payoff if they succeed.

“Our mission is to do the right thing for the company. It’s not about building a trophy case of best paper awards or a museum of famous researchers,” said David Luebke, vice president of graphics research and NVIDIA’s first researcher. “We are a small group of people who are privileged to be able to work on ideas that could fail. And so it is incumbent upon us to not waste that opportunity and to do our best on projects that, if they succeed, will make a big difference.”

Innovating as One Team

One of NVIDIA’s core values is “one team” — a deep commitment to collaboration that helps researchers work closely with product teams and industry stakeholders to transform their ideas into real-world impact.

“Everybody at NVIDIA is incentivized to figure out how to work together because the accelerated computing work that NVIDIA does requires full-stack optimization,” said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA. “You can’t do that if each piece of technology exists in isolation and everybody’s staying in silos. You have to work together as one team to achieve acceleration.”

When evaluating potential projects, NVIDIA researchers consider whether the challenge is a better fit for a research or product team, whether the work merits publication at a top conference, and whether there’s a clear potential benefit to NVIDIA. If they decide to pursue the project, they do so while engaging with key stakeholders.

“We are a small group of people who are privileged to be able to work on ideas that could fail. And so it is incumbent upon us to not waste that opportunity.” — David Luebke, vice president of graphics research

“We work with people to make something real, and often, in the process, we discover that the great ideas we had in the lab don’t actually work in the real world,” Catanzaro said. “It’s a tight collaboration where the research team needs to be humble enough to learn from the rest of the company what they need to do to make their ideas work.”

The team shares much of its work through papers, technical conferences and open-source platforms like GitHub and Hugging Face. But its focus remains on industry impact.

“We think of publishing as a really important side effect of what we do, but it’s not the point of what we do,” Luebke said.

NVIDIA Research’s first effort was focused on ray tracing, which after a decade of sustained work led directly to the launch of NVIDIA RTX and redefined real-time computer graphics. The organization now includes teams specializing in chip design, networking, programming systems, large language models, physics-based simulation, climate science, humanoid robotics and self-driving cars — and continues expanding to tackle additional areas of study and tap expertise across the globe.

“You have to work together as one team to achieve acceleration.” — Bryan Catanzaro, vice president of applied deep learning research

Transforming NVIDIA — and the Industry

NVIDIA Research didn’t just lay the groundwork for some of the company’s most well-known products — its innovations have propelled and enabled today’s era of AI and accelerated computing.

It began with CUDA, a parallel computing software platform and programming model that enables researchers to tap GPU acceleration for myriad applications. Launched in 2006, CUDA made it easy for developers to harness the parallel processing power of GPUs to speed up scientific simulations, gaming applications and the creation of AI models.

“Developing CUDA was the single most transformative thing for NVIDIA,” Luebke said. “It happened before we had a formal research group, but it happened because we hired top researchers and had them work with top architects.”

Making Ray Tracing a Reality

Once NVIDIA Research was founded, its members began working on GPU-accelerated ray tracing, spending years developing the algorithms and the hardware to make it possible. In 2009, the project — led by the late Steven Parker, a real-time ray tracing pioneer who was vice president of professional graphics at NVIDIA — reached the product stage with the NVIDIA OptiX application framework, detailed in a 2010 SIGGRAPH paper.

The researchers’ work expanded and, in collaboration with NVIDIA’s architecture group, eventually led to the development of NVIDIA RTX ray-tracing technology, including RT Cores that enabled real-time ray tracing for gamers and professional creators.

Unveiled in 2018, NVIDIA RTX also marked the launch of another NVIDIA Research innovation: NVIDIA DLSS, or Deep Learning Super Sampling. With DLSS, the graphics pipeline no longer needs to draw all the pixels in a video. Instead, it draws a fraction of the pixels and gives an AI pipeline the information needed to create the image in crisp, high resolution.

Accelerating AI for Virtually Any Application

NVIDIA’s research contributions in AI software kicked off with the NVIDIA cuDNN library for GPU-accelerated neural networks, which was developed as a research project when the deep learning field was still in its initial stages — then released as a product in 2014.

As deep learning soared in popularity and evolved into generative AI, NVIDIA Research was at the forefront — exemplified by NVIDIA StyleGAN, a groundbreaking visual generative AI model that demonstrated how neural networks could rapidly generate photorealistic imagery.

While generative adversarial networks, or GANs, were first introduced in 2014, “StyleGAN was the first model to generate visuals that could completely pass muster as a photograph,” Luebke said. “It was a watershed moment.”

NVIDIA StyleGAN
NVIDIA StyleGAN

NVIDIA researchers introduced a slew of popular GAN models such as the AI painting tool GauGAN, which later developed into the NVIDIA Canvas application. And with the rise of diffusion models, neural radiance fields and Gaussian splatting, they’re still advancing visual generative AI — including in 3D with recent models like Edify 3D and 3DGUT.

NVIDIA GauGAN
NVIDIA GauGAN

In the field of large language models, Megatron-LM was an applied research initiative that enabled the efficient training and inference of massive LLMs for language-based tasks such as content generation, translation and conversational AI. It’s integrated into the NVIDIA NeMo platform for developing custom generative AI, which also features speech recognition and speech synthesis models that originated in NVIDIA Research.

Achieving Breakthroughs in Chip Design, Networking, Quantum and More

AI and graphics are only some of the fields NVIDIA Research tackles — several teams are achieving breakthroughs in chip architecture, electronic design automation, programming systems, quantum computing and more.

In 2012, Dally submitted a research proposal to the U.S. Department of Energy for a project that would become NVIDIA NVLink and NVSwitch, the high-speed interconnect that enables rapid communication between GPU and CPU processors in accelerated computing systems.

NVLink Switch tray
NVLink Switch tray

In 2013, the circuit research team published work on chip-to-chip links that introduced a signaling system co-designed with the interconnect to enable a high-speed, low-area and low-power link between dies. The project eventually became the link between the NVIDIA Grace CPU and NVIDIA Hopper GPU.

In 2021, the ASIC and VLSI Research group developed a software-hardware codesign technique for AI accelerators called VS-Quant that enabled many machine learning models to run with 4-bit weights and 4-bit activations at high accuracy. Their work influenced the development of FP4 precision support in the NVIDIA Blackwell architecture.

And unveiled this year at the CES trade show was NVIDIA Cosmos, a platform created by NVIDIA Research to accelerate the development of physical AI for next-generation robots and autonomous vehicles. Read the research paper and check out the AI Podcast episode on Cosmos for details.

Learn more about NVIDIA Research at GTC. Watch the keynote by NVIDIA founder and CEO Jensen Huang below:

See notice regarding software product information.

Read More

Tool choice with Amazon Nova models

Tool choice with Amazon Nova models

In many generative AI applications, a large language model (LLM) like Amazon Nova is used to respond to a user query based on the model’s own knowledge or context that it is provided. However, as use cases have matured, the ability for a model to have access to tools or structures that would be inherently outside of the model’s frame of reference has become paramount. This could be APIs, code functions, or schemas and structures required by your end application. This capability has developed into what is referred to as tool use or function calling.

To add fine-grained control to how tools are used, we have released a feature for tool choice for Amazon Nova models. Instead of relying on prompt engineering, tool choice forces the model to adhere to the settings in place.

In this post, we discuss tool use and the new tool choice feature, with example use cases.

Tool use with Amazon Nova

To illustrate the concept of tool use, we can imagine a situation where we provide Amazon Nova access to a few different tools, such as a calculator or a weather API. Based on the user’s query, Amazon Nova will select the appropriate tool and tell you how to use it. For example, if a user asks “What is the weather in Seattle?” Amazon Nova will use the weather tool.

The following diagram illustrates an example workflow between an Amazon Nova model, its available tools, and related external resources.

Tool use at the core is the selection of the tool and its parameters. The responsibility to execute the external functionality is left to application or developer. After the tool is executed by the application, you can return the results to the model for the generation of the final response.

Let’s explore some examples in more detail. The following diagram illustrates the workflow of an Amazon Nova model using a function call to access a weather API, and returning the response to the user.

The following diagram illustrates the workflow of an Amazon Nova model using a function call to access a calculator tool.

Tool choice with Amazon Nova

The toolChoice API parameter allows you to control when a tool is called. There are three supported options for this parameter:

  • Any – With tool choice Any, the model will select at least one of the available tools each time:
    {
       "toolChoice": {
            "any": {}
        }
    }

  • Tool – With tool choice Tool, the model will always use the requested tool:
    {
       "toolChoice": {
            "tool": {
                "name": "name_of_tool"
            }
        }
    }

  • Auto – Tool choice Auto is the default behavior and will leave the tool selection completely up to the model:
    {
       "toolChoice": {
            "auto": {}
        }
    }

A popular tactic to improve the reasoning capabilities of a model is to use chain of thought. When using the tool choice of auto, Amazon Nova will use chain of thought and the response of the model will include both the reasoning and the tool that was selected.

This behavior will differ depending on the use case. When tool or any are selected as the tool choice, Amazon Nova will output only the tools and not output chain of thought.

Use cases

In this section, we explore different use cases for tool choice.

Structured output/JSON mode

In certain scenarios, you might want Amazon Nova to use a specific tool to answer the user’s question, even if Amazon Nova believes it can provide a response without the use of a tool. A common use case for this approach is enforcing structured output/JSON mode. It’s often critical to have LLMs return structured output, because this enables downstream use cases to more effectively consume and process the generated outputs. In these instances, the tools employed don’t necessarily need to be client-side functions—they can be used whenever the model is required to return JSON output adhering to a predefined schema, thereby compelling Amazon Nova to use the specified tool.

When using tools for enforcing structured output, you provide a single tool with a descriptive JSON inputSchema. You specify the tool with {"tool" : {"name" : "Your tool name"}}. The model will pass the input to the tool, so the name of the tool and its description should be from the model’s perspective.

For example, consider a food website. When provided with a dish description, the website can extract the recipe details, such as cooking time, ingredients, dish name, and difficulty level, in order to facilitate user search and filtering capabilities. See the following example code:

import boto3
import json

tool_config = {
    "toolChoice": {
        "name": { "tool" : "extract_recipe"}
    },
    "tools": [
        {
            "toolSpec": {
                "name": "extract_recipe",
                "description": "Extract recipe for cooking instructions",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "recipe": {
                                "type": "object",
                                "properties": {
                                    "name": {
                                        "type": "string",
                                        "description": "Name of the recipe"
                                    },
                                    "description": {
                                        "type": "string",
                                        "description": "Brief description of the dish"
                                    },
                                    "prep_time": {
                                        "type": "integer",
                                        "description": "Preparation time in minutes"
                                    },
                                    "cook_time": {
                                        "type": "integer",
                                        "description": "Cooking time in minutes"
                                    },
                                    "servings": {
                                        "type": "integer",
                                        "description": "Number of servings"
                                    },
                                    "difficulty": {
                                        "type": "string",
                                        "enum": ["easy", "medium", "hard"],
                                        "description": "Difficulty level of the recipe"
                                    },
                                    "ingredients": {
                                        "type": "array",
                                        "items": {
                                            "type": "object",
                                            "properties": {
                                                "item": {
                                                    "type": "string",
                                                    "description": "Name of ingredient"
                                                },
                                                "amount": {
                                                    "type": "number",
                                                    "description": "Quantity of ingredient"
                                                },
                                                "unit": {
                                                    "type": "string",
                                                    "description": "Unit of measurement"
                                                }
                                            },
                                            "required": ["item", "amount", "unit"]
                                        }
                                    },
                                    "instructions": {
                                        "type": "array",
                                        "items": {
                                            "type": "string",
                                            "description": "Step-by-step cooking instructions"
                                        }
                                    },
                                    "tags": {
                                        "type": "array",
                                        "items": {
                                            "type": "string",
                                            "description": "Categories or labels for the recipe"
                                        }
                                    }
                                },
                                "required": ["name", "ingredients", "instructions"]
                            }
                        },
                        "required": ["recipe"]
                    }
                }
            }
        }
    ]
}

messages = [{
    "role": "user",
    "content": [
        {"text": input_text},
    ]
}]

inf_params = {"topP": 1, "temperature": 1}

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="us.amazon.nova-micro-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig=inf_params,
    additionalModelRequestFields= {"inferenceConfig": { "topK": 1 } }
)
print(json.dumps(response['output']['message']['content'][0][], indent=2))

We can provide a detailed description of a dish as text input:

Legend has it that this decadent chocolate lava cake was born out of a baking mistake in New York's Any Kitchen back in 1987, when chef John Doe pulled a chocolate sponge cake out of the oven too early, only to discover that the dessert world would never be the same. Today I'm sharing my foolproof version, refined over countless dinner parties. Picture a delicate chocolate cake that, when pierced with a fork, releases a stream of warm, velvety chocolate sauce – it's pure theater at the table. While it looks like a restaurant-worthy masterpiece, the beauty lies in its simplicity: just six ingredients (good quality dark chocolate, unsalted butter, eggs, sugar, flour, and a pinch of salt) transform into individual cakes in under 15 minutes. The secret? Precise timing is everything. Pull them from the oven a minute too late, and you'll miss that magical molten center; too early, and they'll be raw. But hit that sweet spot at exactly 12 minutes, when the edges are set but the center still wobbles slightly, and you've achieved dessert perfection. I love serving these straight from the oven, dusted with powdered sugar and topped with a small scoop of vanilla bean ice cream that slowly melts into the warm cake. The contrast of temperatures and textures – warm and cold, crisp and gooey – makes this simple dessert absolutely unforgettable.

We can force Amazon Nova to use the tool extract_recipe, which will generate a structured JSON output that adheres to the predefined schema provided as the tool input schema:

 {
  "toolUseId": "tooluse_4YT_DYwGQlicsNYMbWFGPA",
  "name": "extract_recipe",
  "input": {
    "recipe": {
      "name": "Decadent Chocolate Lava Cake",
      "description": "A delicate chocolate cake that releases a stream of warm, velvety chocolate sauce when pierced with a fork. It's pure theater at the table.",
      "difficulty": "medium",
      "ingredients": [
        {
          "item": "good quality dark chocolate",
          "amount": 125,
          "unit": "g"
        },
        {
          "item": "unsalted butter",
          "amount": 125,
          "unit": "g"
        },
        {
          "item": "eggs",
          "amount": 4,
          "unit": ""
        },
        {
          "item": "sugar",
          "amount": 100,
          "unit": "g"
        },
        {
          "item": "flour",
          "amount": 50,
          "unit": "g"
        },
        {
          "item": "salt",
          "amount": 0.5,
          "unit": "pinch"
        }
      ],
      "instructions": [
        "Preheat the oven to 200u00b0C (400u00b0F).",
        "Melt the chocolate and butter together in a heatproof bowl over a saucepan of simmering water.",
        "In a separate bowl, whisk the eggs and sugar until pale and creamy.",
        "Fold the melted chocolate mixture into the egg and sugar mixture.",
        "Sift the flour and salt into the mixture and gently fold until just combined.",
        "Divide the mixture among six ramekins and bake for 12 minutes.",
        "Serve straight from the oven, dusted with powdered sugar and topped with a small scoop of vanilla bean ice cream."
      ],
      "prep_time": 10,
      "cook_time": 12,
      "servings": 6,
      "tags": [
        "dessert",
        "chocolate",
        "cake"
      ]
    }
  }
}

API generation

Another common scenario is to require Amazon Nova to select a tool from the available options no matter the context of the user query. One example of this is with API endpoint selection. In this situation, we don’t know the specific tool to use, and we allow the model to choose between the ones available.

With the tool choice of any, you can make sure that the model will always use at least one of the available tools. Because of this, we provide a tool that can be used for when an API is not relevant. Another example would be to provide a tool that allows clarifying questions.

In this example, we provide the model with two different APIs, and an unsupported API tool that it will select based on the user query:

import boto3
import json

tool_config = {
    "toolChoice": {
        "any": {}
    },
    "tools": [
         {
            "toolSpec": {
                "name": "get_all_products",
                "description": "API to retrieve multiple products with filtering and pagination options",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "sort_by": {
                                "type": "string",
                                "description": "Field to sort results by. One of: price, name, created_date, popularity",
                                "default": "created_date"
                            },
                            "sort_order": {
                                "type": "string",
                                "description": "Order of sorting (ascending or descending). One of: asc, desc",
                                "default": "desc"
                            },
                        },
                        "required": []
                    }
                }
            }
        },
        {
            "toolSpec": {
                "name": "get_products_by_id",
                "description": "API to retrieve retail products based on search criteria",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "product_id": {
                                "type": "string",
                                "description": "Unique identifier of the product"
                            },
                        },
                        "required": ["product_id"]
                    }
                }
            }
        },
        {
            "toolSpec": {
                "name": "unsupported_api",
                "description": "API to use when the user query does not relate to the other available APIs",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "reasoning": {
                                "type": "string",
                                "description": "The reasoning for why the user query did not have a valid API available"
                            },
                        },
                        "required": ["reasoning"]
                    }
                }
            }
        }
    ]
}


messages = [{
    "role": "user",
    "content": [
        {"text": input_text},
    ]
}]

inf_params = {"topP": 1, "temperature": 1}

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="us.amazon.nova-micro-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig=inf_params,
    additionalModelRequestFields= {"inferenceConfig": { "topK": 1 } }
)

print(json.dumps(response['output']['message']['content'][0], indent=2))

A user input of “Can you get all of the available products?” would output the following:

{
  "toolUse": {
    "toolUseId": "tooluse_YCNbT0GwSAyjIYOuWnDhkw",
    "name": "get_all_products",
    "input": {}
  }
}

Whereas “Can you get my most recent orders?” would output the following:

{
  "toolUse": {
    "toolUseId": "tooluse_jpiZnrVcQDS1sAa-qPwIQw",
    "name": "unsupported_api",
    "input": {
      "reasoning": "The available tools do not support retrieving user orders. The user's request is for personal order information, which is not covered by the provided APIs."
    }
  }
}

Chat with search

The final option for tool choice is auto. This is the default behavior, so it is consistent with providing no tool choice at all.

Using this tool choice will allow the option of tool use or just text output. If the model selects a tool, there will be a tool block and text block. If the model responds with no tool, only a text block is returned. In the following example, we want to allow the model to respond to the user or call a tool if necessary:

import boto3
import json

tool_config = {
    "toolChoice": {
        "auto": {}
    },
    "tools": [
         {
            "toolSpec": {
                "name": "search",
                "description": "API that provides access to the internet",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "query": {
                                "type": "string",
                                "description": "Query to search by",
                            },
                        },
                        "required": ["query"]
                    }
                }
            }
        }
    ]
}

messages = [{
    "role": "user",
    "content": [
        {"text": input_text},
    ]
}]

system = [{
    "text": "ou are a helpful chatbot. You can use a tool if necessary or respond to the user query"
}]

inf_params = {"topP": 1, "temperature": 1}

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="us.amazon.nova-micro-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig=inf_params,
    additionalModelRequestFields= {"inferenceConfig": { "topK": 1 } }
)


if (response["stopReason"] == "tool_use"):
    tool_use = next(
        block["toolUse"]
        for block in response["output"]["message"]["content"]
            if "toolUse" in block
    )
   print(json.dumps(tool_use, indent=2))
 else:
    pattern = r'<thinking>.*?</thinking>\n\n|<thinking>.*?</thinking>'
    text_response = response["output"]["message"]["content"][0]["text"]
    stripped_text = re.sub(pattern, '', text_response, flags=re.DOTALL)
    
    print(stripped_text)

A user input of “What is the weather in San Francisco?” would result in a tool call:

{
  "toolUseId": "tooluse_IwtBnbuuSoynn1qFiGtmHA",
  "name": "search",
  "input": {
    "query": "what is the weather in san francisco"
  }
}

Whereas asking the model a direct question like “How many months are in a year?” would respond with a text response to the user:

There are 12 months in a year.

Considerations

There are a few best practices that are required for tool calling with Nova models. The first is to use greedy decoding parameters. With Amazon Nova models, that requires setting a temperature, top p, and top k of 1. You can refer to the previous code examples for how to set these. Using greedy decoding parameters forces the models to produce deterministic responses and improves the success rate of tool calling.

The second consideration is the JSON schema you are using for the tool consideration. At the time of writing, Amazon Nova models support a limited subset of JSON schemas, so they might not be picked up as expected by the model. Common fields would be $def and $ref fields. Make sure that your schema has the following top-level fields set: type (must be object), properties, and required.

Lastly, for the most impact on the success of tool calling, you should optimize your tool configurations. Descriptions and names should be very clear. If there are nuances to when one tool should be called over the other, make sure to have that concisely included in the tool descriptions.

Conclusion

Using tool choice in tool calling workflows is a scalable way to control how a model invokes tools. Instead of relying on prompt engineering, tool choice forces the model to adhere to the settings in place. However, there are complexities to tool calling; for more information, refer to Tool use (function calling) with Amazon Nova, Tool calling systems, and Troubleshooting tool calls.

Explore how Amazon Nova models can enhance your generative AI use cases today.


About the Authors

Jean Farmer is a Generative AI Solutions Architect on the Amazon Artificial General Intelligence (AGI) team, specializing in agentic applications. Based in Seattle, Washington, she works at the intersection of autonomous AI systems and practical business solutions, helping to shape the future of AGI at Amazon.

Sharon Li is an AI/ML Specialist Solutions Architect at Amazon Web Services (AWS) based in Boston, Massachusetts. With a passion for leveraging cutting-edge technology, Sharon is at the forefront of developing and deploying innovative generative AI solutions on the AWS cloud platform.

Lulu Wong is an AI UX designer on the Amazon Artificial General Intelligence (AGI) team. With a background in computer science, learning design, and user experience, she bridges the technical and user experience domains by shaping how AI systems interact with humans, refining model input-output behaviors, and creating resources to make AI products more accessible to users.

Read More

Integrate generative AI capabilities into Microsoft Office using Amazon Bedrock

Integrate generative AI capabilities into Microsoft Office using Amazon Bedrock

Generative AI is rapidly transforming the modern workplace, offering unprecedented capabilities that augment how we interact with text and data. At Amazon Web Services (AWS), we recognize that many of our customers rely on the familiar Microsoft Office suite of applications, including Word, Excel, and Outlook, as the backbone of their daily workflows. In this blog post, we showcase a powerful solution that seamlessly integrates AWS generative AI capabilities in the form of large language models (LLMs) based on Amazon Bedrock into the Office experience. By harnessing the latest advancements in generative AI, we empower employees to unlock new levels of efficiency and creativity within the tools they already use every day. Whether it’s drafting compelling text, analyzing complex datasets, or gaining more in-depth insights from information, integrating generative AI with Office suite transforms the way teams approach their essential work. Join us as we explore how your organization can leverage this transformative technology to drive innovation and boost employee productivity.

Solution overview


Figure 1: Solution architecture overview

The solution architecture in Figure 1 shows how Office applications interact with a serverless backend hosted on the AWS Cloud through an Add-In. This architecture allows users to leverage Amazon Bedrock’s generative AI capabilities directly from within the Office suite, enabling enhanced productivity and insights within their existing workflows.

Components deep-dive

Office Add-ins

Office Add-ins allow extending Office products with custom extensions built on standard web technologies. Using AWS, organizations can host and serve Office Add-ins for users worldwide with minimal infrastructure overhead.

An Office Add-in is composed of two elements:

The code snippet below demonstrates part of a function that could run whenever a user invokes the plugin, performing the following actions:

  1. Initiate a request to the generative AI backend, providing the user prompt and available context in the request body
  2. Integrate the results from the backend response into the Word document using Microsoft’s Office JavaScript APIs. Note that these APIs use objects as namespaces, alleviating the need for explicit imports. Instead, we use the globally available namespaces, such as Word, to directly access relevant APIs, as shown in following example snippet.
// Initiate backend request (optional context)
const response = await sendPrompt({ user_message: prompt, context: selectedContext });

// Modify Word content with responses from the Backend
await Word.run(async (context) => {
  let documentBody;

  // Target for the document modifications
  if (response.location === 'Replace') {
    documentBody = context.document.getSelection(); // active text selection
  } else {
    documentBody = context.document.body; // entire document body
  }

  // Markdown support for preserving original content layout
  // Dependencies used: React markdown
  const content = renderToString(<Markdown>{ response.content } < /Markdown>);
  const operation = documentBody.insertHtml(content, response.location);

  // set properties for the output content (font, size, color, etc.)
  operation.font.set({ name: 'Arial' });

  // flush changes to the Word document
  await context.sync();
});

Generative AI backend infrastructure

The AWS Cloud backend consists of three components:

  1. Amazon API Gateway acts as an entry point, receiving requests from the Office applications’ Add-in. API Gateway supports multiple mechanisms for controlling and managing access to an API.
  2. AWS Lambda handles the REST API integration, processing the requests and invoking the appropriate AWS services.
  3. Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With Bedrock’s serverless experience, you can get started quickly, privately customize FMs with your own data, and quickly integrate and deploy them into your applications using the AWS tools without having to manage infrastructure.

LLM prompting

Amazon Bedrock allows you to choose from a wide selection of foundation models for prompting. Here, we use Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock for completions. The system prompt we used in this example is as follows:

You are an office assistant helping humans to write text for their documents.

[When preparing the answer, take into account the following text: <text>{context}</text>]
Before answering the question, think through it step-by-step within the <thinking></thinking> tags.
Then, detect the user's language from their question and store it in the form of an ISO 639-1 code within the <user_language></user_language> tags.
Then, develop your answer in the user’s language within the <response></response> tags.

In the prompt, we first give the LLM a persona, indicating that it is an office assistant helping humans. The second, optional line contains text that has been selected by the user in the document and is provided as context to the LLM. We specifically instruct the LLM to first mimic a step-by-step thought process for arriving at the answer (chain-of-thought reasoning), an effective measure of prompt-engineering to improve the output quality. Next, we instruct it to detect the user’s language from their question so we can later refer to it. Finally, we instruct the LLM to develop its answer using the previously detected user language within response tags, which are used as the final response. While here, we use the default configuration for inference parameters such as temperature, that can quickly be configured with every LLM prompt. The user input is then added as a user message to the prompt and sent via the Amazon Bedrock Messages API to the LLM.

Implementation details and demo setup in an AWS account

As a prerequisite, we need to make sure that we are working in an AWS Region with Amazon Bedrock support for the foundation model (here, we use Anthropic’s Claude 3.5 Sonnet). Also, access to the required relevant Amazon Bedrock foundation models needs to be added. For this demo setup, we describe the manual steps taken in the AWS console. If required, this setup can also be defined in Infrastructure as Code.

To set up the integration, follow these steps:

  1. Create an AWS Lambda function with Python runtime and below code to be the backend for the API. Make sure that we have Powertools for AWS Lambda (Python) available in our runtime, for example, by attaching aLambda layer to our function. Make sure that the Lambda function’s IAM role provides access to the required FM, for example:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",
                "Resource": [
                    "arn:aws:bedrock:*::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0"
                ]
            }
        ]
    }
    

    The following code block shows a sample implementation for the REST API Lambda integration based on a Powertools for AWS Lambda (Python) REST API event handler:

    import json
    import re
    from typing import Optional
    
    import boto3
    from aws_lambda_powertools import Logger
    from aws_lambda_powertools.event_handler import APIGatewayRestResolver, CORSConfig
    from aws_lambda_powertools.logging import correlation_paths
    from aws_lambda_powertools.utilities.typing import LambdaContext
    from pydantic import BaseModel
    
    logger = Logger()
    app = APIGatewayRestResolver(
        enable_validation=True,
        cors=CORSConfig(allow_origin="http://localhost:3000"),  # for testing purposes
    )
    
    bedrock_runtime_client = boto3.client("bedrock-runtime")
    
    
    SYSTEM_PROMPT = """
    You are an office assistant helping humans to write text for their documents.
    
    {context}
    Before answering the question, think through it step-by-step within the <thinking></thinking> tags.
    Then, detect the user's language from their question and store it in the form of an ISO 639-1 code within the <user_language></user_language> tags.
    Then, develop your answer in the user's language in markdown format within the <response></response> tags.
    """
    
    class Query(BaseModel):
        user_message: str  # required
        context: Optional[str] = None  # optional
        max_tokens: int = 1000  # default value
        model_id: str = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # default value
    
    def wrap_context(context: Optional[str]) -> str:
        if context is None:
            return ""
        else:
            return f"When preparing the answer take into account the following text: <text>{context}</text>"
    
    def parse_completion(completion: str) -> dict:
        response = {"completion": completion}
        try:
            tags = ["thinking", "user_language", "response"]
            tag_matches = re.finditer(
                f"<(?P<tag>{'|'.join(tags)})>(?P<content>.*?)</(?P=tag)>",
                completion,
                re.MULTILINE | re.DOTALL,
            )
            for match in tag_matches:
                response[match.group("tag")] = match.group("content").strip()
        except Exception:
            logger.exception("Unable to parse LLM response")
            response["response"] = completion
    
        return response
    
    
    @app.post("/query")
    def query(query: Query):
        bedrock_response = bedrock_runtime_client.invoke_model(
            modelId=query.model_id,
            body=json.dumps(
                {
                    "anthropic_version": "bedrock-2023-05-31",
                    "max_tokens": query.max_tokens,
                    "system": SYSTEM_PROMPT.format(context=wrap_context(query.context)),
                    "messages": [{"role": "user", "content": query.user_message}],
                }
            ),
        )
        response_body = json.loads(bedrock_response.get("body").read())
        logger.info("Received LLM response", response_body=response_body)
        response_text = response_body.get("content", [{}])[0].get(
            "text", "LLM did not respond with text"
        )
        return parse_completion(response_text)
    
    @logger.inject_lambda_context(correlation_id_path=correlation_paths.API_GATEWAY_REST)
    def lambda_handler(event: dict, context: LambdaContext) -> dict:
        return app.resolve(event, context)
    

  2. Create an API Gateway REST API with a Lambda proxy integration to expose the Lambda function via a REST API. You can follow this tutorial for creating a REST API for the Lambda function by using the API Gateway console. By creating a Lambda proxy integration with a proxy resource, we can route requests to the resources to the Lambda function. Follow the tutorial to deploy the API and take note of the API’s invoke URL. Make sure to configure adequate access control for the REST API.

We can now invoke and test our function via the API’s invoke URL. The following example uses curl to send a request (make sure to replace all placeholders in curly braces as required), and the response generated by the LLM:

$ curl --header "Authorization: {token}" 
     --header "Content-Type: application/json" 
     --request POST 
     --data '{"user_message": "Write a 2 sentence summary about AWS."}' 
     https://{restapi_id}.execute-api.{region}.amazonaws.com/{stage_name}/query | jq .
{
 "completion": "<thinking>nTo summarize AWS in 2 sentences:n1. AWS (Amazon Web Services) is a comprehensive cloud computing platform offering a wide range of services like computing power, database storage, content delivery, and more.n2. It allows organizations and individuals to access these services over the internet on a pay-as-you-go basis without needing to invest in on-premises infrastructure.n</thinking>nn<user_language>en</user_language>nn<response>nnAWS (Amazon Web Services) is a cloud computing platform that offers a broad set of global services including computing, storage, databases, analytics, machine learning, and more. It enables companies of all sizes to access these services over the internet on a pay-as-you-go pricing model, eliminating the need for upfront capital expenditure or on-premises infrastructure management.nn</response>",
 "thinking": "To summarize AWS in 2 sentences:n1. AWS (Amazon Web Services) is a comprehensive cloud computing platform offering a wide range of services like computing power, database storage, content delivery, and more.n2. It allows organizations and individuals to access these services over the internet on a pay-as-you-go basis without needing to invest in on-premises infrastructure.",
 "user_language": "en",
 "response": "AWS (Amazon Web Services) is a cloud computing platform that offers a broad set of global services including computing, storage, databases, analytics, machine learning, and more. It enables companies of all sizes to access these services over the internet on a pay-as-you-go pricing model, eliminating the need for upfront capital expenditure or on-premises infrastructure management."
} 

If required, the created resources can be cleaned up by 1) deleting the API Gateway REST API, and 2) deleting the REST API Lambda function and associated IAM role.

Example use cases

To create an interactive experience, the Office Add-in integrates with the cloud back-end that implements conversational capabilities with support for additional context retrieved from the Office JavaScript API.

Next, we demonstrate two different use cases supported by the proposed solution, text generation and text refinement.

Text generation


Figure 2: Text generation use-case demo

In the demo in Figure 2, we show how the plug-in is prompting the LLM to produce a text from scratch. The user enters their query with some context into the Add-In text input area. Upon sending, the backend will prompt the LLM to generate respective text, and return it back to the frontend. From the Add-in, it is inserted into the Word document at the cursor position using the Office JavaScript API.

Text refinement


Figure 3: Text refinement use-case demo

In Figure 3, the user highlighted a text segment in the work area and entered a prompt into the Add-In text input area to rephrase the text segment. Again, the user input and highlighted text are processed by the backend and returned to the Add-In, thereby replacing the previously highlighted text.

Conclusion

This blog post showcases how the transformative power of generative AI can be incorporated into Office processes. We described an end-to-end sample of integrating Office products with an Add-in for text generation and manipulation with the power of LLMs. In our example, we used managed LLMs on Amazon Bedrock for text generation. The backend is hosted as a fully serverless application on the AWS cloud.

Text generation with LLMs in Office supports employees by streamlining their writing process and boosting productivity. Employees can leverage the power of generative AI to generate and edit high-quality content quickly, freeing up time for other tasks. Additionally, the integration with a familiar tool like Word provides a seamless user experience, minimizing disruptions to existing workflows.

To learn more about boosting productivity, building differentiated experiences, and innovating faster with AWS visit the Generative AI on AWS page.


About the Authors

Martin Maritsch is a Generative AI Architect at AWS ProServe focusing on Generative AI and MLOps. He helps enterprise customers to achieve business outcomes by unlocking the full potential of AI/ML services on the AWS Cloud.

Miguel Pestana is a Cloud Application Architect in the AWS Professional Services team with over 4 years of experience in the automotive industry delivering cloud native solutions. Outside of work Miguel enjoys spending its days at the beach or with a padel racket in one hand and a glass of sangria on the other.

Carlos Antonio Perea Gomez is a Builder with AWS Professional Services. He enables customers to become AWSome during their journey to the cloud. When not up in the cloud he enjoys scuba diving deep in the waters.

Read More

From innovation to impact: How AWS and NVIDIA enable real-world generative AI success

From innovation to impact: How AWS and NVIDIA enable real-world generative AI success

As we gather for NVIDIA GTC, organizations of all sizes are at a pivotal moment in their AI journey. The question is no longer whether to adopt generative AI, but how to move from promising pilots to production-ready systems that deliver real business value. The organizations that figure this out first will have a significant competitive advantage—and we’re already seeing compelling examples of what’s possible.

Consider Hippocratic AI’s work to develop AI-powered clinical assistants to support healthcare teams as doctors, nurses, and other clinicians face unprecedented levels of burnout. During a recent hurricane in Florida, their system called 100,000 patients in a day to check on medications and provide preventative healthcare guidance–the kind of coordinated outreach that would be nearly impossible to achieve manually. They aren’t just building another chatbot; they are reimagining healthcare delivery at scale.

Production-ready AI like this requires more than just cutting-edge models or powerful GPUs. In my decade working with customers’ data journeys, I’ve seen that an organization’s most valuable asset is its domain-specific data and expertise. And now leading our data and AI go-to-market, I hear customers consistently emphasize what they need to transform their domain advantage into AI success: infrastructure and services they can trust—with performance, cost-efficiency, security, and flexibility—all delivered at scale. When the stakes are high, success requires not just cutting-edge technology, but the ability to operationalize it at scale—a challenge that AWS has consistently solved for customers. As the world’s most comprehensive and broadly adopted cloud, our partnership with NVIDIA’s pioneering accelerated computing platform for generative AI amplifies this capability. It’s inspiring to see how, together, we’re enabling customers across industries to confidently move AI into production.

In this post, I will share some of these customers’ remarkable journeys, offering practical insights for any organization looking to harness the power of generative AI.

Transforming content creation with generative AI

Content creation represents one of the most visible and immediate applications of generative AI today. Adobe, a pioneer that has shaped creative workflows for over four decades, has moved with remarkable speed to integrate generative AI across its flagship products, helping millions of creators work in entirely new ways.

Adobe’s approach to generative AI infrastructure exemplifies what their VP of Generative AI, Alexandru Costin, calls an “AI superhighway”—a sophisticated technical foundation that enables rapid iteration of AI models and seamless integration into their creative applications. The success of their Firefly family of generative AI models, integrated across flagship products like Photoshop, demonstrates the power of this approach. For their AI training and inference workloads, Adobe uses NVIDIA GPU-accelerated Amazon Elastic Compute Cloud (Amazon EC2) P5en (NVIDIA H200 GPUs), P5 (NVIDIA H100 GPUs), P4de (NVIDIA A100 GPUs), and G5 (NVIDIA A10G GPUs) instances. They also use NVIDIA software such as NVIDIA TensorRT and NVIDIA Triton Inference Server for faster, scalable inference. Adobe needed maximum flexibility to build their AI infrastructure, and AWS provided the complete stack of services needed—from Amazon FSx for Lustre for high-performance storage, to Amazon Elastic Kubernetes Service (Amazon EKS) for container orchestration, to Elastic Fabric Adapter (EFA) for high-throughput networking—to create a production environment that could reliably serve millions of creative professionals.

Key takeaway

If you’re building and managing your own AI pipelines, Adobe’s success highlights a critical insight: although GPU-accelerated compute often gets the spotlight in AI infrastructure discussions, what’s equally important is the NVIDIA software stack along with the foundation of orchestration, storage, and networking services that enable production-ready AI. Their results speak for themselves—Adobe achieved a 20-fold scale-up in model training while maintaining the enterprise-grade performance and reliability their customers expect.

Pioneering new AI applications from the ground up

Throughout my career, I’ve been particularly energized by startups that take on audacious challenges—those that aren’t just building incremental improvements but are fundamentally reimagining how things work. Perplexity exemplifies this spirit. They’ve taken on a technology most of us now take for granted: search. It’s the kind of ambitious mission that excites me, not just because of its bold vision, but because of the incredible technical challenges it presents. When you’re processing 340 million queries monthly and serving over 1,500 organizations, transforming search isn’t just about having great ideas—it’s about building robust, scalable systems that can deliver consistent performance in production.

Perplexity’s innovative approach earned them membership in both AWS Activate and NVIDIA Inception, flagship programs designed to accelerate startup innovation and success. These programs provided the resources, technical guidance, and support needed to build at scale. An early adopter of Amazon SageMaker HyperPod, Perplexity continues to use its distributed training capabilities to cut model training time by up to 40%. A highly optimized inference stack built with NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server serves both their search application and pplx-api, the public API service that gives developers access to their proprietary models. That engineering pays off: their inference stack achieves up to 3.1 times lower latency compared to other platforms. Both training and inference workloads run on NVIDIA GPU-accelerated EC2 P5 instances, delivering the performance and reliability needed to operate at scale. To give users even more flexibility, Perplexity complements its own models with services such as Amazon Bedrock, providing access to additional state-of-the-art models through its API. Amazon Bedrock’s ease of use and reliability are crucial for the team; as they note, it allows them to effectively maintain the reliability and latency their product demands.

What I find particularly compelling about Perplexity’s journey is their commitment to technical excellence, exemplified by their work optimizing GPU memory transfer with EFA networking. The team achieved 97.1% of the theoretical maximum bandwidth of 3200 Gbps and open sourced their innovations, enabling other organizations to benefit from their learnings.

For those interested in the technical details, I encourage you to read their fascinating post Journey to 3200 Gbps: High-Performance GPU Memory Transfer on AWS SageMaker HyperPod.

Key takeaway

For organizations with complex AI workloads and specific performance requirements, Perplexity’s approach offers a valuable lesson. Sometimes, the path to production-ready AI isn’t about choosing between self-hosted infrastructure and managed services—it’s about strategically combining both. This hybrid strategy can deliver both exceptional performance (evidenced by Perplexity’s 3.1 times lower latency) and the flexibility to evolve.

Transforming enterprise workflows with AI

Enterprise workflows represent the backbone of business operations, and they’re a crucial proving ground for AI’s ability to deliver immediate business value. ServiceNow, which describes itself as the AI platform for business transformation, is rapidly integrating AI to reimagine core business processes at scale.

ServiceNow’s innovative AI solutions showcase their vision for enterprise-specific AI optimization. As Srinivas Sunkara, ServiceNow’s Vice President, explains, their approach focuses on deep AI integration with technology workflows, core business processes, and CRM systems—areas where traditional large language models (LLMs) often lack domain-specific knowledge. To train generative AI models at enterprise scale, ServiceNow uses NVIDIA DGX Cloud on AWS. Their architecture combines high-performance FSx for Lustre storage with NVIDIA GPU clusters for training, and NVIDIA Triton Inference Server handles production deployment. This robust technology platform allows ServiceNow to focus on domain-specific AI development and customer value rather than infrastructure management.

Key takeaway

ServiceNow offers an important lesson about enterprise AI adoption: while foundation models (FMs) provide powerful general capabilities, the greatest business value often comes from optimizing models for specific enterprise use cases and workflows. In many cases, it’s precisely this deliberate specialization that transforms AI from an interesting technology into a true business accelerator.

Scaling AI across enterprise applications

Cisco’s Webex team’s journey with generative AI exemplifies how large organizations can methodically transform their applications while maintaining enterprise standards for reliability and efficiency. With a comprehensive suite of telecommunications applications serving customers globally, they needed an approach that would allow them to incorporate LLMs across their portfolio—from AI assistants to speech recognition—without compromising performance or increasing operational complexity.

The Webex team’s key insight was to separate their models from their applications. Previously, they had embedded AI models into the container images for applications running on Amazon EKS, but as their models grew in sophistication and size, this approach became increasingly inefficient. By migrating their LLMs to Amazon SageMaker AI and using NVIDIA Triton Inference Server, they created a clean architectural break between their relatively lean applications and the underlying models, which require more substantial compute resources. This separation allows applications and models to scale independently, significantly reducing development cycle time and increasing resource utilization. The team deployed dozens of models on SageMaker AI endpoints, using Triton Inference Server’s model concurrency capabilities to scale globally across AWS data centers.
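As a rough illustration of this pattern, the sketch below deploys a model artifact behind its own SageMaker endpoint using the SageMaker Python SDK. This is not Cisco’s code; the container image URI, S3 path, and IAM role are placeholders.

```python
# Minimal sketch of the "models behind independent endpoints" pattern.
# All URIs and ARNs below are placeholders.
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()

model = Model(
    image_uri="<serving-container-image-uri>",         # e.g., a Triton serving image
    model_data="s3://example-bucket/models/model.tar.gz",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    sagemaker_session=session,
)

# Applications call this endpoint over HTTPS, so application containers stay
# lean and the model endpoint scales (up or down) independently.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
```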

The results validate Cisco’s methodical approach to AI transformation. By separating applications from models, their development teams can now fix bugs, perform tests, and add features to applications much faster, without having to manage large models in their workstation memory. The architecture also enables significant cost optimization—applications remain available during off-peak hours for reliability, and model endpoints can scale down when not needed, all without impacting application performance. Looking ahead, the team is evaluating Amazon Bedrock to further improve their price-performance, demonstrating how thoughtful architecture decisions create a foundation for continuous optimization.

Key takeaway

For enterprises with large application portfolios looking to integrate AI at scale, Cisco’s methodical approach offers an important lesson: separating LLMs from applications creates a cleaner architectural boundary that improves both development velocity and cost optimization. By treating models and applications as independent components, Cisco significantly improved development cycle time while reducing costs through more efficient resource utilization.

Building mission-critical AI for healthcare

Earlier, we highlighted how Hippocratic AI reached 100,000 patients during a crisis. Behind this achievement lies a story of rigorous engineering for safety and reliability—essential in healthcare where stakes are extraordinarily high.

Hippocratic AI’s approach to this challenge is both innovative and rigorous. They’ve developed what they call a “constellation architecture”: a sophisticated system of over 20 specialized models working in concert, each focused on specific safety aspects like prescription adherence, lab analysis, and over-the-counter medication guidance. This distributed approach to safety requires training many models and managing significant computational resources. That’s why they use SageMaker HyperPod for their training infrastructure, using Amazon FSx and Amazon Simple Storage Service (Amazon S3) for high-speed storage access to NVIDIA GPUs, while Grafana and Prometheus provide the comprehensive monitoring needed to maintain optimal GPU utilization. They build upon NVIDIA’s low-latency inference stack, are enhancing conversational AI capabilities using NVIDIA Riva models for speech recognition and text-to-speech, and are also using NVIDIA NIM microservices to deploy these models. Given the sensitive nature of healthcare data and HIPAA compliance requirements, they’ve implemented a sophisticated multi-account, multi-cluster strategy on AWS, running production inference workloads with patient data on completely separate accounts and clusters from their development and training environments. This careful attention to both security and performance allows them to handle thousands of patient interactions while maintaining precise control over clinical safety and accuracy.
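The coordination pattern behind such an ensemble can be sketched simply. The toy example below is not Hippocratic AI’s system; it only illustrates the general idea of specialist checkers reviewing a draft response in parallel, with any single veto blocking it. The checker functions are hypothetical rules standing in for dedicated models.

```python
# Toy "constellation" pattern: specialist safety checks run in parallel and
# any veto blocks the draft. Real systems would call dedicated models.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def check_prescription_adherence(draft: str) -> bool:
    # Hypothetical rule standing in for a specialized model.
    return "double your dose" not in draft.lower()

def check_otc_guidance(draft: str) -> bool:
    # Hypothetical rule standing in for a specialized model.
    return "exceed the recommended dose" not in draft.lower()

SPECIALISTS: list[Callable[[str], bool]] = [
    check_prescription_adherence,
    check_otc_guidance,
]

def constellation_review(draft: str) -> str:
    # Run every specialist concurrently; a single veto blocks the draft.
    with ThreadPoolExecutor() as pool:
        verdicts = list(pool.map(lambda check: check(draft), SPECIALISTS))
    return draft if all(verdicts) else "Escalated to a human clinician."

print(constellation_review("Please take your medication with food."))
```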

The impact of Hippocratic AI’s work extends far beyond technical achievements. Their AI-powered clinical assistants address critical healthcare workforce burnout by handling burdensome administrative tasks—from pre-operative preparation to post-discharge follow-ups. For example, during weather emergencies, their system can rapidly assess heat risks and coordinate transport for vulnerable patients—the kind of comprehensive care that would be too burdensome and resource-intensive to coordinate manually at scale.

Key takeaway

For organizations building AI solutions for complex, regulated, and high-stakes environments, Hippocratic AI’s constellation architecture reinforces what we’ve consistently emphasized: there’s rarely a one-size-fits-all model for every use case. Just as Amazon Bedrock offers a choice of models to meet diverse needs, Hippocratic AI’s approach of combining over 20 specialized models—each focused on specific safety aspects—demonstrates how a thoughtfully designed ensemble can achieve both precision and scale.

Conclusion

As the technology partners enabling these and countless other customer innovations, AWS and NVIDIA’s long-standing collaboration continues to evolve to meet the demands of the generative AI era. Our partnership, which began over 14 years ago with the world’s first GPU cloud instance, has grown to offer the industry’s widest range of NVIDIA accelerated computing solutions and software services for optimizing AI deployments. Through initiatives like Project Ceiba—one of the world’s fastest AI supercomputers hosted exclusively on AWS using NVIDIA DGX Cloud for NVIDIA’s own research and development use—we continue to push the boundaries of what’s possible.

As all the examples we’ve covered illustrate, it isn’t just about the technology we build together—it’s how organizations of all sizes are using these capabilities to transform their industries and create new possibilities. These stories ultimately reveal something more fundamental: when we make powerful AI capabilities accessible and reliable, people find remarkable ways to use them to solve meaningful problems. That’s the true promise of our partnership with NVIDIA—enabling innovators to create positive change at scale. I’m excited to continue inventing and partnering with NVIDIA and can’t wait to see what our mutual customers are going to do next.


About the Author

Rahul Pathak is Vice President, Data and AI Go-to-Market at AWS, where he leads the global go-to-market and specialist teams helping customers create differentiated value with AWS AI capabilities such as Amazon Bedrock, Amazon Q, Amazon SageMaker, and Amazon EC2, and data services such as Amazon S3, AWS Glue, and Amazon Redshift. Rahul believes that generative AI will transform virtually every customer experience and that data is a key differentiator for customers as they build AI applications. Prior to his current role, he was Vice President, Relational Database Engines, where he led Amazon Aurora, Amazon Redshift, and Amazon DSQL. During his 13+ years at AWS, Rahul has focused on launching, building, and growing managed database and analytics services, all aimed at making it easy for customers to get value from their data. Rahul has over twenty years of experience in technology and has co-founded two companies, one focused on analytics and the other on IP geolocation. He holds a degree in computer science from MIT and an Executive MBA from the University of Washington.

Read More

Claimify: Extracting high-quality claims from language model outputs



While large language models (LLMs) are capable of synthesizing vast amounts of information, they sometimes produce inaccurate or unsubstantiated content. To mitigate this risk, tools like Azure AI’s Groundedness Detection can be used to verify LLM outputs.

A common strategy for fact-checking LLM-generated texts – especially complex, highly detailed outputs – is claim extraction: instead of evaluating the entire text at once, it’s broken down into simple factual statements that can be verified independently. However, the effectiveness of this approach depends on the quality of the extracted claims: if they are inaccurate or incomplete, the fact-checking results may be compromised. 

Our paper “Towards Effective Extraction and Evaluation of Factual Claims” proposes a framework for evaluating claim extraction and introduces Claimify, a novel LLM-based method that outperforms prior solutions. In this blog post, we discuss common claim extraction issues and explore Claimify’s approach.

Key issues in claim extraction

As a case study, we prompted OpenAI’s GPT-4o model to “provide an overview of challenges in emerging markets” based on a collection of news articles. Here is its output:

# Emerging market challenges
Emerging markets face a myriad of challenges that span economic, social, and environmental dimensions. These challenges are often interconnected, exacerbating the difficulties faced by these regions.

## Economic instability and inflation
Several emerging markets are grappling with severe economic instability. For instance, Argentina’s rampant inflation, with monthly rates reaching as high as 25.5%, has made many goods unobtainable and plunged the value of the currency, causing severe economic hardship. Some experts estimate that the annual inflation rate could potentially double to 300%, while others predict even higher rates.

## Food insecurity and agricultural challenges
Food insecurity is another critical issue in emerging markets. Nigeria, for example, is striving to become self-sufficient in wheat production but is hindered by climate change and violence, exacerbated by high grain prices due to the suspension of the Black Sea Grain Initiative. Climate change has played a pivotal role in creating food insecurity and economic instability in farming-dependent economies, such as Zambia and Mozambique.

## Natural disasters and public health emergencies
Public health crises, as demonstrated by the experience of Derna, Libya, remain a significant concern in emerging markets. Catastrophic flooding caused by the collapse of two dams has led to significant destruction, complicating access to healthcare. The U.N. found that the resulting contaminated water caused many residents to fall ill, highlighting the need for improved water management. Countries like Afghanistan and Sudan have experienced similar challenges to those of Libya.

## Conclusion
Addressing emerging market challenges will require comprehensive strategies that consider economic stability, food security, and public health. International cooperation and targeted interventions are crucial in fostering sustainable development in emerging markets.

We then applied a simple prompt[1] that extracted the following claims from the answer:

1. Emerging markets face economic challenges.
2. Emerging markets face social challenges.
3. Emerging markets face environmental challenges.
4. Argentina experiences severe economic instability.
5. Argentina has a monthly inflation rate as high as 25.5%.
6. Argentina’s inflation has made many goods unobtainable.
7. Argentina’s currency value has plunged.
8. Some experts estimate Argentina’s annual inflation rate could reach 300%.
9. Food insecurity is a critical issue in emerging markets.
10. Nigeria is striving to become self-sufficient in wheat production.
11. Nigeria’s wheat production is hindered by climate change.
12. Nigeria’s wheat production is hindered by violence.
13. High grain prices affect Nigeria due to the suspension of the Black Sea Grain Initiative.
14. Climate change creates food insecurity in Zambia.
15. Climate change creates food insecurity in Mozambique.
16. Climate change causes economic instability in farming-dependent economies.
17. Public health crises are a concern in emerging markets.
18. Derna, Libya, experienced catastrophic flooding.
19. The collapse of two dams caused flooding in Derna, Libya.
20. Flooding in Derna, Libya, complicated access to healthcare.
21. The U.N. found contaminated water in Derna, Libya.
22. Contaminated water in Derna, Libya, caused many residents to fall ill.
23. Afghanistan has experienced challenges similar to those in Libya.
24. Sudan has experienced challenges similar to those in Libya.
25. Addressing emerging market challenges requires comprehensive strategies.
26. Comprehensive strategies should consider economic stability.
27. Comprehensive strategies should consider food security.
28. Comprehensive strategies should consider public health.
29. International cooperation is crucial for sustainable development in emerging markets.
30. Targeted interventions are crucial for sustainable development in emerging markets.
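For reference, a baseline extractor of this kind can be only a few lines of code. The sketch below is a simplified stand-in for the prompt we used (see footnote 1), not the exact prompt, and it assumes the OpenAI Python SDK with an API key configured.

```python
# Naive baseline claim extraction: one prompt, with no selection,
# disambiguation, or decomposition stages.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_claims_naive(text: str) -> list[str]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Break the following text into a numbered list of "
                       "simple, self-contained factual claims:\n\n" + text,
        }],
    )
    content = response.choices[0].message.content
    # Keep only numbered lines, stripping the "N." prefix.
    return [
        line.split(".", 1)[1].strip()
        for line in content.splitlines()
        if line.strip() and line.strip()[0].isdigit() and "." in line
    ]
```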

These claims appear to reflect the answer, but a closer look reveals four critical issues:

Issue 1: Some claims are not factual statements that can be verified as true or false.

Excerpt: Addressing emerging market challenges will require comprehensive strategies that consider economic stability, food security, and public health.

Claims:

  • Addressing emerging market challenges requires comprehensive strategies.
  • Comprehensive strategies should consider economic stability.
  • Comprehensive strategies should consider food security.
  • Comprehensive strategies should consider public health.

Explanation: These claims are not verifiable because they are opinions.

Issue 2: Some claims are missing or incomplete.

Excerpt: Argentina’s rampant inflation, with monthly rates reaching as high as 25.5%, has made many goods unobtainable and plunged the value of the currency, causing severe economic hardship. Some experts estimate that the annual inflation rate could potentially double to 300%, while others predict even higher rates.

Claims:

  • Argentina has a monthly inflation rate as high as 25.5%.
  • Argentina’s inflation has made many goods unobtainable.
  • Argentina’s currency value has plunged.
  • Some experts estimate Argentina’s annual inflation rate could reach 300%.

Explanation: The phrases “causing severe economic hardship” and “others predict even higher rates” are not reflected in any of the claims. The third claim also omits the fact that inflation caused the currency depreciation.

Issue 3: Some claims are inaccurate.

Excerpt: The U.N. found that the resulting contaminated water caused many residents to fall ill, highlighting the need for improved water management.

Claims:

  • The U.N. found contaminated water in Derna, Libya.
  • Contaminated water in Derna, Libya, caused many residents to fall ill.

Explanation: The first claim is inaccurate because the U.N. found the link between contaminated water and illness, not the contaminated water itself. The second claim also misrepresents the sentence since it shifts the meaning from a viewpoint of a specific entity (the U.N.) to a general assertion about the effects of contaminated water in Derna, Libya.

Issue 4: Some claims cannot be understood without additional context.

Excerpt: Countries like Afghanistan and Sudan have experienced similar challenges to those of Libya.

Claims:

  • Afghanistan has experienced challenges similar to those in Libya.
  • Sudan has experienced challenges similar to those in Libya.

Explanation: These claims cannot be understood on their own because “those” is not defined.

Introducing Claimify

The case study highlights that claim extraction is surprisingly error-prone. Our paper demonstrates that the issues identified above are common across LLM-based claim extraction methods. To minimize these errors, we created a system called Claimify[2].

Core principles

Claimify is an LLM-based claim extraction system built on the following principles:

Principle 1: The claims should capture all verifiable content in the source text and exclude unverifiable content. In the sentence “The partnership between John and Jane illustrates the importance of collaboration,” the only verifiable content is the existence of a partnership between John and Jane. The rest is subjective interpretation.

Principle 2: Each claim should be entailed (i.e., fully supported) by the source text. Consider the sentence “Governments are curtailing emissions from cars and trucks, which are the largest source of greenhouse gases from transportation.” The following claims are incorrect:

  • Cars are the largest source of greenhouse gases from transportation.
  • Trucks are the largest source of greenhouse gases from transportation.

The sentence attributes the highest emissions to cars and trucks collectively, not individually.

Principle 3: Each claim should be understandable on its own, without additional context. The claim “They will update the policy next year” is not understandable on its own because it’s unclear what “They,” “the policy,” and “next year” refer to.

Principle 4: Each claim should minimize the risk of excluding critical context. Suppose the claim “The World Trade Organization has supported trade barriers” was extracted from the sentence “An exception to the World Trade Organization’s open-market philosophy is its history of supporting trade barriers when member countries have failed to comply with their obligations.” A fact-checking system would likely classify the claim as false, since there is extensive evidence that the WTO aims to reduce trade barriers. However, if the claim had specified that the WTO has supported trade barriers “when member countries have failed to comply with their obligations,” it would likely have been classified as true. This example demonstrates that missing context can distort the fact-checking verdict.

Principle 5: The system should flag cases where ambiguity cannot be resolved. The sentence “AI has advanced renewable energy and sustainable agriculture at Company A and Company B” has two mutually exclusive interpretations:

  • AI has advanced renewable energy and sustainable agriculture at both Company A and Company B.
  • AI has advanced renewable energy at Company A and sustainable agriculture at Company B.

If the context does not clearly indicate that one of these interpretations is correct, the system should flag the ambiguity instead of picking one interpretation arbitrarily.

Implementation

Claimify accepts a question-answer pair as input and performs claim extraction in four stages, illustrated in Figure 1:

Stage 1 (Sentence splitting and context creation): The answer is split into sentences, with “context” – a configurable combination of surrounding sentences and metadata (e.g., the header hierarchy in a Markdown-style answer) – created for each sentence.

Stage 2 (Selection): An LLM identifies sentences that do not contain verifiable content. These sentences are labeled “No verifiable claims” and excluded from subsequent stages. When sentences contain both verifiable and unverifiable components, the LLM rewrites the sentence, retaining only the verifiable components.

Stage 3 (Disambiguation): For sentences that passed the Selection stage, an LLM detects ambiguity and determines whether it can be resolved using the context. If all ambiguity is resolvable, the LLM returns a disambiguated version of the sentence. Otherwise, the sentence is labeled “Cannot be disambiguated” and excluded from the Decomposition stage.

Stage 4 (Decomposition): For sentences that are unambiguous or were disambiguated, an LLM creates standalone claims that preserve critical context. If no claims are extracted, the sentence is labeled “No verifiable claims.”
Figure 1: Overview of Claimify’s stages
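The flow in Figure 1 can be summarized in pseudocode. The sketch below is illustrative rather than the released Claimify implementation: `call_llm` stands in for any chat model client, the prompts paraphrase the stage descriptions above, and the helper functions are deliberately naive.

```python
# Illustrative four-stage flow: splitting/context, selection, disambiguation,
# decomposition. Not the actual Claimify implementation.

def split_into_sentences(text: str) -> list[str]:
    # Naive splitter for illustration only.
    return [s.strip() + "." for s in text.split(".") if s.strip()]

def build_context(sentences: list[str], i: int, window: int = 2) -> str:
    # Stage 1: context = a configurable window of surrounding sentences.
    return " ".join(sentences[max(0, i - window): i + window + 1])

def claimify(answer: str, call_llm) -> dict[str, list[str]]:
    results: dict[str, list[str]] = {}
    sentences = split_into_sentences(answer)
    for i, sentence in enumerate(sentences):
        context = build_context(sentences, i)

        # Stage 2: Selection -- keep only verifiable content.
        verifiable = call_llm(
            "Rewrite the sentence keeping only its verifiable content, or reply "
            f"NONE if there is none.\nContext: {context}\nSentence: {sentence}"
        )
        if verifiable.strip() == "NONE":
            results[sentence] = ["No verifiable claims"]
            continue

        # Stage 3: Disambiguation -- resolve ambiguity from context, or give up.
        resolved = call_llm(
            "Resolve any ambiguity using the context, or reply AMBIGUOUS if "
            "readers could not agree on one interpretation.\n"
            f"Context: {context}\nSentence: {verifiable}"
        )
        if resolved.strip() == "AMBIGUOUS":
            results[sentence] = ["Cannot be disambiguated"]
            continue

        # Stage 4: Decomposition -- standalone claims preserving critical context.
        claims = [c.strip() for c in call_llm(
            f"Decompose into standalone factual claims, one per line:\n{resolved}"
        ).splitlines() if c.strip()]
        results[sentence] = claims or ["No verifiable claims"]
    return results
```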

Results

In our paper, we demonstrate that Claimify outperforms existing LLM-based methods[3]. Specifically, we show that: (1) 99% of claims extracted by Claimify are entailed by their source sentence, (2) Claimify strikes the best balance between including verifiable content and excluding unverifiable content, and (3) Claimify is least likely to omit context critical to the fact-checking verdict.

For the above case study on challenges in emerging markets, here are Claimify’s outputs, with source sentences preceded by a letter and claims numbered[4]:

A. Several emerging markets are grappling with severe economic instability.
1. Several emerging markets are grappling with severe economic instability.

B. For instance, Argentina’s rampant inflation, with monthly rates reaching as high as 25.5%, has made many goods unobtainable and plunged the value of the currency, causing severe economic hardship.
1. Argentina has rampant inflation.
2. The monthly inflation rates in Argentina have reached as high as 25.5%.
3. Inflation has made many goods unobtainable in Argentina.
4. Inflation has plunged the value of the currency in Argentina.
5. Inflation has caused severe economic hardship in Argentina.

C. Some experts estimate that the annual inflation rate could potentially double to 300%, while others predict even higher rates.
1. Some experts estimate that Argentina’s annual inflation rate could double to 300% in the future.
2. Some experts predict that Argentina’s annual inflation rate could be higher than 300% in the future.

D. Nigeria, for example, is striving to become self-sufficient in wheat production but is hindered by climate change and violence, exacerbated by high grain prices due to the suspension of the Black Sea Grain Initiative.
1. Nigeria is striving to become self-sufficient in wheat production.
2. Nigeria is hindered by climate change in becoming self-sufficient in wheat production.
3. Nigeria is hindered by violence in becoming self-sufficient in wheat production.
4. High grain prices exacerbate the hindrance to Nigeria’s efforts to become self-sufficient in wheat production.
5. The suspension of the Black Sea Grain Initiative is a reason for high grain prices.

E. Climate change has played a pivotal role in creating food insecurity and economic instability in farming-dependent economies, such as Zambia and Mozambique.
1. Climate change has played a role in creating food insecurity in farming-dependent economies.
2. Zambia is a farming-dependent economy where climate change has played a role in creating food insecurity.
3. Mozambique is a farming-dependent economy where climate change has played a role in creating food insecurity.
4. Climate change has played a role in creating economic instability in farming-dependent economies.
5. Zambia is a farming-dependent economy where climate change has played a role in creating economic instability.
6. Mozambique is a farming-dependent economy where climate change has played a role in creating economic instability.

F. Public health crises, as demonstrated by the experience of Derna, Libya, remain a significant concern in emerging markets.
1. Public health crises are a concern in emerging markets.
2. Derna, Libya, is an example of a public health crisis in emerging markets.

G. Catastrophic flooding caused by the collapse of two dams has led to significant destruction, complicating access to healthcare.
1. There was catastrophic flooding in Derna, Libya.
2. The flooding in Derna, Libya, was caused by the collapse of two dams.
3. The flooding in Derna, Libya, has led to significant destruction.
4. The flooding in Derna, Libya, has complicated access to healthcare.

H. Countries like Afghanistan and Sudan have experienced similar challenges to those of Libya.
1. Afghanistan has experienced challenges related to public health crises.
2. Afghanistan has experienced challenges related to catastrophic flooding.
3. Afghanistan has experienced challenges related to contaminated water.
4. Sudan has experienced challenges related to public health crises.
5. Sudan has experienced challenges related to catastrophic flooding.
6. Sudan has experienced challenges related to contaminated water.

Note that the baseline prompt extracted several claims from the sentence “The U.N. found that the resulting contaminated water caused many residents to fall ill, highlighting the need for improved water management,” but it ignored the phrase “highlighting the need for improved water management.” It also failed to capture that the contaminated water resulted from flooding, as implied by “resulting” in the original sentence.

Claimify took a different approach. First, it found two instances of ambiguity – “resulting contaminated water” and “many residents” – that it determined could be resolved using the context. Here’s an excerpt from its reasoning: “…the context specifies that the contaminated water is a result of the catastrophic flooding in Derna, Libya, and the residents are those of Derna, Libya.”

However, it also found an instance of ambiguity – “highlighting the need for improved water management” – where it concluded that the context does not definitively support a single interpretation: “The sentence could be interpreted as: (1) The U.N. found that the contaminated water caused illness and also highlighted the need for improved water management, (2) The U.N. only found that the contaminated water caused illness, while the need for improved water management is an implication or conclusion drawn by the writer. Readers … would likely fail to reach consensus about the correct interpretation of this ambiguity.” As a result, Claimify labeled the sentence “Cannot be disambiguated” at the Disambiguation stage and did not proceed to the Decomposition stage. 

To the best of our knowledge, Claimify is the first claim extraction system that identifies when the source text has multiple possible interpretations and extracts claims only when there is high confidence in the correct interpretation.

Next steps

We’re currently working on new methods for evaluating LLM-generated texts. We anticipate that the high-quality claims extracted by Claimify will help not only in verifying the veracity of LLM outputs, but also in assessing their overall quality – especially when gold-standard references are difficult to create (e.g., long-form texts where people may disagree on what defines “good” content). For example, we recently used Claimify to evaluate the comprehensiveness and diversity of answers generated by GraphRAG, showing that GraphRAG outperforms traditional Retrieval Augmented Generation (RAG) in these areas.

For an in-depth discussion of Claimify and our evaluation framework, please see our paper “Towards Effective Extraction and Evaluation of Factual Claims.”


[1] We used the “proposition chunking” prompt from NirDiamant’s RAG Techniques repository. We generated multiple responses using GPT-4o, then picked the response that was most representative of the samples.

[2] Claimify is currently used for research purposes only and is not available commercially.

[3] We benchmarked Claimify against VeriScore, DnD, SAFE, AFaCTA, and Factcheck-GPT.

[4] The outputs were generated using GPT-4o. Sentences not shown were either labeled “No verifiable claims” or “Cannot be disambiguated.”

The post Claimify: Extracting high-quality claims from language model outputs appeared first on Microsoft Research.

Read More

Metasurface: Unlocking the future of wireless sensing and communication



As the demand for faster, more reliable wireless communication continues to grow, traditional systems face limitations in efficiency and adaptability. To keep up with evolving needs, researchers are investigating new ways to manipulate electromagnetic waves to improve wireless performance.

One solution involves metasurfaces—engineered materials that can control wave propagation in unprecedented ways. By dynamically shaping and directing electromagnetic waves, they can overcome the constraints of conventional wireless systems.

Building on these capabilities, we are developing metasurfaces for a wide range of wireless application scenarios. Notably, we have developed metasurfaces for enhancing low Earth orbit satellite communication, optimizing acoustic sensing, and realizing acoustic and mmWave imaging with commodity devices. More recently, we have designed metasurfaces to enable indoor Global Navigation Satellite System (GNSS) positioning, provide good mmWave coverage over a target environment, optimize heat distribution inside a microwave oven, and deliver directional sound to a user without headphones.

All these works, published at top networking conferences, including MobiCom 2023 & 2024, MobiSys 2024 & 2025, and NSDI 2023, demonstrate the transformative potential of metasurfaces in advancing wireless communication and sensing technologies. This blog post explores some of these technologies in more detail.



Metasurfaces optimize GNSS for accurate indoor positioning

While GNSS is widely used for outdoor positioning and navigation, its indoor performance is often hindered by signal blockage, reflection, and attenuation caused by physical obstacles. Additional technologies like Wi-Fi and Bluetooth Low Energy (BLE) are often employed to address these issues. However, these solutions require extra infrastructure, are costly, and are complicated to deploy. Accurate positioning also typically depends on specialized hardware and software on mobile devices. 

Despite these challenges, GNSS signals hold promise for accurate indoor positioning. By leveraging the vast number of available satellites, GNSS-based solutions eliminate the need for base station deployment and maintenance required by Wi-Fi and BLE systems. This approach also allows seamless integration between indoor and outdoor environments, supporting continuous positioning in scenarios like guiding smart vehicles through indoor and outdoor industrial environments. 

To explore this potential, we conducted indoor measurements and found that GNSS satellite signals can penetrate windows at different angles and reflect or diffract from surfaces like floors and ceilings, resulting in uneven signal strength. Because metasurfaces are structured arrays of elements that can control electromagnetic waves, they can capture and redirect more GNSS signals, steering them into buildings along paths parallel to the ground to achieve broader coverage. Using this capability, we developed a GNSS positioning metasurface system (GPMS) based on passive metasurface technology.

One limitation of passive metasurfaces is their lack of programmability. To overcome this and enable them to effectively guide signals from different angles and scatter them in parallel, we designed a two-layer metasurface system. As shown in Figure 1, this design ensures that electromagnetic waves from different angles follow similar emission trajectories.  

Figure 1: The GPMS two-layer metasurface structure. The design is driven by an optimization problem: the radiation output of the metasurfaces should be close to the target radiation for GNSS signal input at all incidence angles.

To improve positioning accuracy, we developed new algorithms that allow signals to pass through metasurfaces, using them as anchor points. Traditional GPS positioning requires signals from at least four satellites to decode location information. In the GPMS system, illustrated in Figure 2, each deployed metasurface functions as a virtual satellite. By deploying at least three metasurfaces indoors, we achieved high-precision positioning through a triangulation algorithm.

Figure 2. Diagram of the GPMS system. Passive metasurfaces guide GNSS signals indoors, while enhanced positioning algorithms provide precise indoor positioning on mobile devices. 
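Under simplifying assumptions (known 2D anchor positions and distance estimates to each metasurface), the triangulation step reduces to a standard least-squares problem. The actual GPMS algorithms are more involved, but the sketch below shows the core idea; the anchor positions and ranges are made-up examples.

```python
# Least-squares trilateration from >= 3 anchors (the metasurfaces) and
# measured distances to each of them.
import numpy as np

def trilaterate(anchors: np.ndarray, ranges: np.ndarray) -> np.ndarray:
    # Linearize ||x - a_i||^2 = r_i^2 against the first anchor a_0.
    a0, r0 = anchors[0], ranges[0]
    A = 2 * (anchors[1:] - a0)
    b = (r0**2 - ranges[1:]**2
         + np.sum(anchors[1:]**2, axis=1) - np.sum(a0**2))
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 20.0]])  # metasurface positions (m)
ranges = np.array([5.0, 8.6, 16.2])                          # estimated distances (m)
print(trilaterate(anchors, ranges))
```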

To evaluate the system, we deployed the GPMS with six metasurfaces on a 10×50-meter office floor and a 15×20-meter conference hall. The results show significant improvements in signal quality and availability. C/N₀, the carrier-to-noise density ratio that measures GNSS signal quality, increased from 9.1 dB-Hz to 32.2 dB-Hz. The number of visible satellites increased from 3.6 to 21.5. Finally, the absolute positioning error decreased from 30.6 meters to 3.2 meters in the office and from 11.2 meters to 2.7 meters in the conference hall. These findings are promising and highlight the feasibility and advantages of metasurface-assisted GNSS for indoor positioning.

Metasurfaces extend millimeter-wave coverage

Millimeter waves enable the high-speed, low-latency performance needed for 5G and 6G communication systems. While commercial products like 60 GHz Wi-Fi routers and mobile devices are becoming popular, their limited coverage and susceptibility to signal obstruction restrict their widespread application. 

Traditional solutions include deploying multiple millimeter-wave access points, such as routers or base stations, or placing reflective metal panels in room corners to reflect electromagnetic waves. However, these approaches are both costly and offer limited performance. Metasurfaces offer a promising alternative for improving millimeter-wave applications. Previous research has shown that programmable metasurfaces can enhance signal coverage in blind spots and significantly improve signal quality and efficiency.  

To maximize the benefits of metasurfaces, we developed the AutoMS automation service framework, shown in Figure 3. This proposed framework can optimize millimeter-wave coverage using low-cost passive metasurface design and strategic placement. 

The three main components of AutoMS can address the limitations of traditional solutions: 

  1. Automated joint optimization: AutoMS determines the optimal network deployment by jointly analyzing metasurface phase settings, metasurface placement, and access point positioning, and it refines beamforming configurations accordingly. By iteratively optimizing the number, size, and placement of metasurfaces together with the phase settings and access point configuration, AutoMS converges on the deployment that maximizes signal coverage (a toy sketch of this search loop appears after Figure 4).
A flowchart diagram illustrating the AutoMS framework, which generates optimized passive metasurface and access point deployment plans for a specific 3D model based on environmental scanning results. The process starts with an environment scan, producing a 3D model and reflection coefficients. This information feeds into wireless channel modeling, which along with deployment configurations, is optimized by a hyper-configuration tuner. The output includes phase maps used by the surface and AP optimizer. The optimized deployment configurations are then used for metasurface fabrication and network deployment.
Figure 3. The AutoMS framework generates optimized deployment plans for passive metasurface and access points based on environment scanning results. 
  2. Fast 3D ray tracing simulator: Using hardware and software acceleration, our simulator efficiently calculates channel matrices resulting from metasurfaces with tens of thousands of elements. This simulator, capable of tracing 1.3 billion rays in just three minutes on an A100 GPU, significantly accelerates calculations for complex environments.
  3. Low-cost passive metasurface design: We designed a high-reflectivity passive metasurface with near-2π phase control and broadband compatibility for the millimeter-wave frequency band. The metasurface is compatible with low-precision, cost-effective thermoforming processes, enabling users to create metasurfaces at minimal cost and significantly reducing deployment expenses.

    As shown in Figure 3, users can capture the environment using existing 3D scanning apps on mobile devices, generate a 3D layout model, and upload it to the cloud. AutoMS then generates metasurface settings and placement guidelines.

    Users can print metasurface patterns using hot stamping and customize them without affecting functionality, as millimeter waves penetrate paint and paper. 

Figure 4: The low-cost passive metasurface creation process: (1) print patterns on paper with a laser printer; (2) hot-stamp aluminum foil onto the paper with a laminator; (3) peel off the foil, leaving the metallic patterns; (4) paste the patterns onto a plastic sheet and an aluminum board.
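As referenced above, the joint optimization in AutoMS’s first component can be pictured as a black-box search over discrete element phases. The toy loop below is a stand-in, not the AutoMS algorithm: `simulate_coverage` is a hypothetical callback (AutoMS uses its fast 3D ray tracer here), and simple hill climbing replaces the real hyper-configuration tuner.

```python
# Toy phase-setting search: mutate one element at a time, keep improvements.
import numpy as np

rng = np.random.default_rng(0)
PHASE_LEVELS = 4  # e.g., 2-bit phase quantization per metasurface element

def optimize_phases(simulate_coverage, n_elements: int, iters: int = 2000):
    phases = rng.integers(0, PHASE_LEVELS, size=n_elements)
    best = simulate_coverage(phases)
    for _ in range(iters):
        candidate = phases.copy()
        candidate[rng.integers(n_elements)] = rng.integers(PHASE_LEVELS)
        score = simulate_coverage(candidate)  # black-box coverage metric
        if score > best:
            phases, best = candidate, score
    return phases, best
```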

Evaluation using publicly available 3D layout datasets and real-world tests shows that AutoMS significantly improves millimeter-wave coverage across various scenarios. Compared to a single router setup, AutoMS increased signal strength by 12.1 dB. Onsite tests further confirmed gains of 11 dB in target areas and over 20 dB in blind spots, with signal throughput increasing from 77 Mbps to 373 Mbps. AutoMS adapts to diverse environments, ensuring reliable and flexible deployment in real-world applications. 

Metasurfaces support uniform heating in microwave ovens 

Microwave ovens often heat unevenly, creating cold spots in food. These can allow harmful bacteria and other pathogens to survive, increasing the risk of foodborne illnesses. Uneven heating can cause eggs to burst or create “hot spots” that can scald.

Uneven heating is inherent to the appliance’s heating mechanism. Microwave ovens heat food dielectrically with high-power radio frequency (RF) electromagnetic waves, which form standing waves inside the cavity. Standing waves have nodes, points of zero amplitude where no heating occurs, and antinodes, where the field is strongest and heating occurs most rapidly.
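A quick back-of-the-envelope calculation shows why these cold spots are unavoidable without intervention: at the 2.45 GHz frequency typical of household ovens, nodes recur every half wavelength, roughly every 6 cm, as this small illustrative script shows.

```python
# Relative field strength of an idealized 1D standing wave in a microwave cavity.
import numpy as np

c = 3e8                       # speed of light (m/s)
f = 2.45e9                    # typical magnetron frequency (Hz)
wavelength = c / f            # ~0.122 m, so nodes every ~6.1 cm
x = np.linspace(0, 0.3, 7)    # sample points across a 30 cm cavity
amplitude = np.abs(np.sin(2 * np.pi * x / wavelength))
for xi, a in zip(x, amplitude):
    print(f"x = {xi:.2f} m  relative field = {a:.2f}")
```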

To address this issue, we developed MicroSurf, a low-cost solution that improves heating by using passive metasurfaces to control electromagnetic energy inside the microwave oven. It uses the resonance effect between the metasurface and electromagnetic waves to modify the standing-wave distribution and achieve more uniform heating. This is shown in Figure 5. 

Figure 5: MicroSurf’s working principle. A. Uneven electric field distribution inside the microwave oven leads to uneven heating. B. Accurately modeling the microwave oven. C. Designing and optimizing a metasurface that can function in a high-power environment to change the standing-wave distribution. D. Achieving uniform heating of different foods and selectively heating specific parts.

Tests across four different microwave oven brands demonstrate that MicroSurf effectively optimizes heating for various liquids and solids, uniformly heating water, milk, bread, and meat. It concentrates heat on specific areas and adapts to differently shaped foods. MicroSurf offers a promising solution for even heating in microwave ovens, demonstrating the potential of metasurface technology in everyday applications. This innovation paves the way for smarter, more efficient home appliances.  

Advancing wireless innovation

Wireless sensing and communication technologies are evolving rapidly, driving innovation across a wide range of applications. We are continuing to push the boundaries of these technologies—particularly in metasurface development—while working to create practical solutions for a variety of use cases. 

The post Metasurface: Unlocking the future of wireless sensing and communication appeared first on Microsoft Research.

Read More