
Deploy a serverless web application to edit images using Amazon Bedrock

October 21, 2024

by Salman Ahmed Amazon AWS

Generative AI adoption across industries is transforming many types of applications, including image editing. Image editing is used in sectors such as graphic design, marketing, and social media, and users often rely on specialized tools for the task. Building a custom solution can be complex. However, by using AWS services, you can quickly deploy a serverless image editing solution that gives your teams access to image editing foundation models (FMs) through Amazon Bedrock.

Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that’s best suited for your use case. Amazon Bedrock is serverless, so you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage infrastructure.

Amazon Titan Image Generator G1 is an FM available in Amazon Bedrock that you can use to generate an image from text or to upload and edit your own image. In this post, we focus on two of its editing features: inpainting and outpainting.

This post introduces a solution that simplifies the deployment of a web application for image editing using AWS serverless services. We use AWS Amplify, Amazon Cognito, Amazon API Gateway, AWS Lambda, and Amazon Bedrock with the Amazon Titan Image Generator G1 model to build an application to edit images using prompts. We cover the inner workings of the solution to help you understand the function of each service and how they are connected to give you a complete solution. At the time of writing this post, Amazon Titan Image Generator G1 comes in two versions; for this post, we use version 2.

Solution overview

The following diagram provides an overview and highlights the key components. The architecture uses Amazon Cognito for user authentication and Amplify as the hosting environment for our frontend application. A combination of API Gateway and a Lambda function is used for our backend services, and Amazon Bedrock provides access to the FM, enabling users to edit images using prompts.

Prerequisites

You must have the following in place to complete the solution in this post:

An AWS account
FM access in Amazon Bedrock for Amazon Titan Image Generator G1 v2 in the same AWS Region where you will deploy this solution
The accompanying AWS CloudFormation template downloaded from the aws-samples GitHub repo.

Deploy solution resources using AWS CloudFormation

When you run the AWS CloudFormation template, the following resources are deployed:

Amazon Cognito resources:
- User pool: CognitoUserPoolforImageEditApp
- App client: ImageEditApp
Lambda resources:
- Function: <Stack name>-ImageEditBackend-<auto-generated>
AWS Identity and Access Management (IAM) resources:
- IAM role: <Stack name>-ImageEditBackendRole-<auto-generated>
- IAM inline policy: AmazonBedrockAccess (this policy allows Lambda to invoke the Amazon Bedrock FM amazon.titan-image-generator-v2:0)
API Gateway resources:
- REST API: ImageEditingAppBackendAPI
- Methods:
  - OPTIONS – Added header mapping for CORS
  - POST – Lambda integration
- Authorization: Through Amazon Cognito using CognitoAuthorizer

After you deploy the CloudFormation template, copy the following from the Outputs tab to be used during the deployment of Amplify:

userPoolId
userPoolClientId
invokeUrl

Deploy the Amplify application

You have to manually deploy the Amplify application using the frontend code found on GitHub. Complete the following steps:

Download the frontend code from the GitHub repo.
Unzip the downloaded file and navigate to the folder.
In the js folder, find the config.js file and replace the values of XYZ for userPoolId, userPoolClientId, and invokeUrl with the values you collected from the CloudFormation stack outputs. Set the region value based on the Region where you’re deploying the solution.

The following is an example config.js file:

window._config = {
    cognito: {
        userPoolId: 'XYZ', // e.g. us-west-2_uXboG5pAb
        userPoolClientId: 'XYZ', // e.g. 25ddkmj4v6hfsfvruhpfi7n4hv
        region: 'XYZ' // e.g. us-west-2
    },
    api: {
        invokeUrl: 'XYZ' // e.g. https://rc7nyt4tql.execute-api.us-west-2.amazonaws.com/prod
    }
};
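If you prefer to script this step, the following minimal sketch renders config.js from the stack outputs instead of editing the XYZ placeholders by hand. It assumes the output keys are exactly userPoolId, userPoolClientId, and invokeUrl, as listed on the Outputs tab; the function names are hypothetical and not part of the sample repo.

```python
import json

# Assumption: these are the exact OutputKey names shown on the stack's Outputs tab.
REQUIRED_KEYS = ("userPoolId", "userPoolClientId", "invokeUrl")


def outputs_to_dict(outputs):
    """Convert CloudFormation outputs [{'OutputKey': ..., 'OutputValue': ...}] to a dict."""
    return {o["OutputKey"]: o["OutputValue"] for o in outputs}


def render_config(outputs, region):
    """Return the text of js/config.js with the placeholders filled in."""
    values = outputs_to_dict(outputs)
    missing = [k for k in REQUIRED_KEYS if k not in values]
    if missing:
        raise KeyError(f"missing stack outputs: {missing}")
    config = {
        "cognito": {
            "userPoolId": values["userPoolId"],
            "userPoolClientId": values["userPoolClientId"],
            "region": region,
        },
        "api": {"invokeUrl": values["invokeUrl"]},
    }
    # JSON is valid JavaScript object-literal syntax, so it can be assigned directly.
    return "window._config = " + json.dumps(config, indent=4) + ";\n"
```

To feed it, you could pass the Outputs list returned by the boto3 CloudFormation describe_stacks call (response["Stacks"][0]["Outputs"]) and write the result to js/config.js. The generated file is equivalent to the hand-edited example.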

Select all the files and compress them as shown in the following screenshot.

Make sure you zip the contents and not the top-level folder. For example, if your build output generates a folder named AWS-Amplify-Code, navigate into that folder and select all the contents, and then zip the contents.
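The same rule can be expressed in code. This sketch, using only the Python standard library, stores every file under the folder with a path relative to the folder itself, so index.html lands at the root of the archive instead of under AWS-Amplify-Code/. The folder and archive names are illustrative.

```python
import os
import zipfile
from pathlib import Path


def zip_folder_contents(folder, zip_path):
    """Zip everything inside `folder` so its files sit at the archive root."""
    folder = Path(folder)
    zip_path = Path(zip_path)
    # Writing the archive inside the folder would make it include itself.
    if folder.resolve() in zip_path.resolve().parents:
        raise ValueError("write the .zip outside the folder being zipped")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = Path(root) / name
                # arcname relative to `folder` drops the top-level folder name.
                zf.write(path, arcname=path.relative_to(folder).as_posix())
    return zip_path
```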

Use the new .zip file to manually deploy the application in Amplify.

After it’s deployed, you will receive a domain that you can use in later steps to access the application.

Create a test user in the Amazon Cognito user pool.

An email address is required for this user because you will need to mark the email address as verified.
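You can create the user in the Amazon Cognito console, or script it. The following sketch assumes a boto3 cognito-idp client and calls admin_create_user with the email_verified attribute set to true. Whether the email address can double as the username depends on how the user pool is configured, so treat that default as an assumption.

```python
def create_test_user(client, user_pool_id, email, username=None):
    """Create a Cognito test user whose email address is marked as verified.

    `client` is assumed to be boto3.client("cognito-idp"). Cognito emails a
    temporary password, which the app asks the user to change at first sign-in.
    """
    return client.admin_create_user(
        UserPoolId=user_pool_id,
        # Assumption: the user pool accepts the email address as the username.
        Username=username or email,
        UserAttributes=[
            {"Name": "email", "Value": email},
            # Marks the address as verified so no verification code is needed.
            {"Name": "email_verified", "Value": "true"},
        ],
        DesiredDeliveryMediums=["EMAIL"],
    )
```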

Return to the Amplify page and use the domain it automatically generated to access the application.

Use Amazon Cognito for user authentication

Amazon Cognito is an identity platform that you can use to authenticate and authorize users. We use Amazon Cognito in our solution to verify the user before they can use the image editing application.

When you open the Image Editing Tool URL, you're prompted to sign in with the test user you created earlier. At first sign-in, you're asked to update the password. Amazon Cognito then validates the user's credentials against the records stored in the user pool. If the credentials match, Amazon Cognito issues a JSON Web Token (JWT). In the API Payload to be Sent section of the page, you will notice that the Authorization field has been updated with the newly issued JWT.
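To see what the issued token contains, you can decode its payload. The sketch below base64url-decodes the middle segment of a JWT. It deliberately does not verify the signature (in this architecture, the API Gateway Cognito authorizer is responsible for validating tokens), so use it only for inspection and debugging.

```python
import base64
import json


def jwt_claims(token):
    """Decode the payload (middle segment) of a JWT WITHOUT verifying it."""
    try:
        _header, payload, _signature = token.split(".")
    except ValueError:
        raise ValueError("a JWT has three dot-separated segments") from None
    # JWT segments are base64url without padding; restore padding before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

For a Cognito ID token, the decoded claims typically include fields such as token_use and exp, which are useful when debugging authorization errors.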

Use Lambda for backend code and Amazon Bedrock for generative AI function

The backend code is hosted on Lambda and is invoked by user requests routed through API Gateway. The Lambda function processes the request payload and forwards it to Amazon Bedrock. The response from Amazon Bedrock returns to the frontend along the same path as the request.

Use API Gateway for API management

API Gateway streamlines API management, allowing developers to deploy, maintain, monitor, secure, and scale their APIs. In our use case, API Gateway routes frontend requests to the application logic in Lambda and provides throttling to manage the load on the backend. Without API Gateway, the frontend would need to use the AWS SDK for JavaScript to call the Amazon Bedrock API directly, which moves more work into the frontend.

Use Amplify for frontend code

Amplify offers a development environment for building secure, scalable mobile and web applications. It allows developers to focus on their code rather than worrying about the underlying infrastructure. Amplify also integrates with many Git providers. For this solution, we manually upload our frontend code using the method outlined earlier in this post.

Image editing tool walkthrough

Navigate to the URL provided after you created the application in Amplify and sign in. At first login attempt, you’ll be asked to reset your password.

As you follow the steps for this tool, you will notice the API Payload to be Sent section on the right side updating dynamically, reflecting the details mentioned in the corresponding steps that follow.

Step 1: Create a mask on your image

To create a mask, first choose an image file (JPEG, JPG, or PNG).

After the image loads, the frontend converts the file to base64 and updates the base_image value.

As you select the portion of the image you want to edit, the frontend creates a mask and updates the mask value with the mask's base64 encoding. You can use the stroke size option to adjust how much of the image each stroke selects.

You now have the original image and the mask image encoded in base64. (The Amazon Titan Image Generator G1 model requires the inputs to be in base64 encoding.)
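The frontend does this encoding in the browser, but if you want to prepare test inputs outside the app, the following sketch shows the equivalent in Python. It also strips a data URL prefix such as data:image/png;base64, which browser APIs typically add. We assume the model expects only the raw base64 string, so check this against the Amazon Titan Image Generator documentation.

```python
import base64
from pathlib import Path


def file_to_base64(path):
    """Read an image file and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def strip_data_url(value):
    """Remove a 'data:<mime>;base64,' prefix if present.

    Assumption: the model wants only the raw base64 part, not a data URL.
    """
    prefix, sep, rest = value.partition(",")
    if sep and prefix.startswith("data:") and prefix.endswith(";base64"):
        return rest
    return value
```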

Step 2: Write a prompt and set your options

Write a prompt that describes what you want to do with the image. For this example, we enter Make the driveway clear and empty. This is reflected in the prompt on the right.

You can choose from the following image editing options: inpainting and outpainting. The value for mode is updated depending on your selection.

Use inpainting to remove masked elements and replace them with background pixels
Use outpainting to extend the pixels of the masked image to the image boundaries
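On the backend, the selected mode decides which Amazon Titan Image Generator task the request uses. This sketch maps the payload fields shown in the API Payload to be Sent section (base_image, mask, prompt, mode) to a Titan request body with the INPAINTING or OUTPAINTING task type. The exact mode strings the frontend sends are an assumption, and numberOfImages is set to 2 to match the two outputs described in this walkthrough.

```python
# Assumption: the frontend sends "inpainting" or "outpainting" (any case) as mode.
TASK_TYPES = {"inpainting": "INPAINTING", "outpainting": "OUTPAINTING"}


def build_titan_request(payload, number_of_images=2):
    """Translate the app's payload into a Titan Image Generator request body."""
    mode = str(payload.get("mode") or "").lower()
    if mode not in TASK_TYPES:
        raise ValueError(f"unsupported mode: {payload.get('mode')!r}")
    task_type = TASK_TYPES[mode]
    params = {
        "image": payload["base_image"],  # base64 original image
        "maskImage": payload["mask"],    # base64 mask drawn in Step 1
        "text": payload["prompt"],       # e.g. "Make the driveway clear and empty"
    }
    params_key = "inPaintingParams" if task_type == "INPAINTING" else "outPaintingParams"
    return {
        "taskType": task_type,
        params_key: params,
        "imageGenerationConfig": {"numberOfImages": number_of_images},
    }
```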

Choose Send to API to send the payload to API Gateway. This action invokes the Lambda function, which validates the received payload. If validation succeeds, the Lambda function invokes the Amazon Bedrock API.

The Amazon Bedrock API generates two image outputs in base64 format, which are transmitted back to the frontend application and rendered as visual images.
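A sketch of the invoke-and-decode step, assuming a boto3 bedrock-runtime client and the model ID from the IAM policy (amazon.titan-image-generator-v2:0). Titan returns its results as a list of base64 strings under images; here they are decoded to raw bytes, which is what you would write to disk if you call the model outside the web app.

```python
import base64
import json

MODEL_ID = "amazon.titan-image-generator-v2:0"


def invoke_and_decode(client, request_body):
    """Call the model and return the generated images as raw bytes.

    `client` is assumed to be boto3.client("bedrock-runtime"); `request_body`
    is a Titan request dict such as {"taskType": "INPAINTING", ...}.
    """
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps(request_body),
        contentType="application/json",
        accept="application/json",
    )
    result = json.loads(response["body"].read())
    if result.get("error"):
        raise RuntimeError(result["error"])
    # Each entry in "images" is one base64-encoded output image.
    return [base64.b64decode(img) for img in result["images"]]
```

For example, looping over the returned list and calling Path(f"result_{i}.png").write_bytes(data) saves each result; the file names and extension are illustrative.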

Step 3: View and download the result

The following screenshot shows the results of our test. You can download the results or provide an updated prompt to get a new output.

Testing and troubleshooting

When you initiate the Send to API action, the system performs a validation check. If required information is missing or incorrect, it will display an error notification. For instance, if you attempt to send an image to the API without providing a prompt, an error message will appear on the right side of the interface, alerting you to the missing input, as shown in the following screenshot.
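The check can be sketched as a function that returns one message per missing or invalid field, so the interface can show all problems at once. The field names come from the API payload described earlier; the message wording and the accepted mode values are hypothetical.

```python
# Hypothetical messages keyed by the payload fields used in the walkthrough.
REQUIRED_FIELDS = {
    "base_image": "Choose an image file first.",
    "mask": "Select the area of the image you want to edit.",
    "prompt": "Enter a prompt describing the edit.",
    "mode": "Choose inpainting or outpainting.",
}


def validate_payload(payload):
    """Return a list of human-readable errors; an empty list means the payload is valid."""
    errors = []
    for field, message in REQUIRED_FIELDS.items():
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{field}: {message}")
    mode = payload.get("mode")
    # Assumption: only these two mode values are accepted.
    if isinstance(mode, str) and mode.strip() and mode.lower() not in ("inpainting", "outpainting"):
        errors.append(f"mode: unsupported value {mode!r}")
    return errors
```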

Clean up

If you decide to discontinue using the Image Editing Tool, you can follow these steps to remove the Image Editing Tool, its associated resources deployed using AWS CloudFormation, and the Amplify deployment:

Delete the CloudFormation stack:
1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
2. Locate the stack you created during the deployment process (you assigned a name to it).
3. Select the stack and choose Delete.
Delete the Amplify application and its resources. For instructions, refer to Clean Up Resources.

Conclusion

In this post, we explored a sample solution that you can use to deploy an image editing application by using AWS serverless services and generative AI services. We used Amazon Bedrock and an Amazon Titan FM that allows you to edit images by using prompts. By adopting this solution, you gain the advantage of using AWS managed services, so you don’t have to maintain the underlying infrastructure. Get started today by deploying this sample solution.


About the Authors

Salman Ahmed is a Senior Technical Account Manager in AWS Enterprise Support. He enjoys helping customers in the travel and hospitality industry to design, implement, and support cloud infrastructure. With a passion for networking services and years of experience, he helps customers adopt various AWS networking services. Outside of work, Salman enjoys photography, traveling, and watching his favorite sports teams.

Sergio Barraza is a Senior Enterprise Support Lead at AWS, helping energy customers design and optimize cloud solutions. With a passion for software development, he guides energy customers through AWS service adoption. Outside work, Sergio is a multi-instrument musician playing guitar, piano, and drums, and he also practices Wing Chun Kung Fu.

Ravi Kumar is a Senior Technical Account Manager in AWS Enterprise Support who helps customers in the travel and hospitality industry to streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience. In his free time, Ravi enjoys creative activities like painting. He also likes playing cricket and traveling to new places.

Ankush Goyal is an Enterprise Support Lead in AWS Enterprise Support who helps customers streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience.

Brilliant words, brilliant writing: Using AWS AI chips to quickly deploy Meta Llama 3-powered applications

October 21, 2024

by Zheng Zhang Amazon AWS

Many organizations are building generative AI applications powered by large language models (LLMs) to boost productivity and build differentiated experiences. These LLMs are large and complex, so deploying them requires powerful computing resources and results in high inference costs. For businesses and researchers with limited resources, these high inference costs can be a barrier to entry, so more efficient and cost-effective solutions are needed. Most generative AI use cases involve human interaction, which requires AI accelerators that can deliver real-time responses with low latency. At the same time, the pace of innovation in generative AI is increasing, and it's becoming more challenging for developers and researchers to quickly evaluate and adopt new models to keep pace with the market.

One way to get started with LLMs such as Llama and Mistral is to use Amazon Bedrock. However, customers who want to deploy LLMs in their own self-managed workflows, for greater control and flexibility over the underlying resources, can run these LLMs optimized on AWS Inferentia2-powered Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances. In this post, we show how to use an Amazon EC2 Inf2 instance to cost-effectively deploy multiple industry-leading LLMs on AWS Inferentia2, a purpose-built AWS AI chip, so that customers can quickly test the models and expose an API that supports both performance benchmarking and downstream application calls.

Model introduction

There are many popular open source LLMs to choose from. In this post, we review three use cases based on the expertise of three models: Meta-Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.2, and CodeLlama-7b-Instruct-hf.

Model name	Release company	Number of parameters	Release date	Model capabilities
Meta-Llama-3-8B-Instruct	Meta	8 billion	April 2024	Language understanding, translation, code generation, reasoning, chat
Mistral-7B-Instruct-v0.2	Mistral AI	7.3 billion	March 2024	Language understanding, translation, code generation, reasoning, chat
CodeLlama-7b-Instruct-hf	Meta	7 billion	August 2023	Code generation, code completion, chat

Meta-Llama-3-8B-Instruct is a popular language model released by Meta AI in April 2024. The Llama 3 model has improved pre-training, instruction following, output generation, coding, reasoning, and math skills. The Meta AI team says that Llama 3 has the potential to start a new wave of innovation in AI. The Llama 3 model is available in two publicly released versions, 8B and 70B. At the time of writing, Llama 3.1 instruction-tuned models are available in 8B, 70B, and 405B versions. In this post, we use the Meta-Llama-3-8B-Instruct model, but the same process can be followed for Llama 3.1 models.

Mistral-7B-Instruct-v0.2, released by Mistral AI in March 2024, marks a major milestone in the development of publicly available foundation models. With its strong performance, efficient architecture, and wide range of features, Mistral 7B v0.2 sets a new standard for user-friendly and powerful AI tools. The model excels at tasks ranging from natural language processing to coding, making it a valuable resource for researchers, developers, and businesses. In this post, we use the Mistral-7B-Instruct-v0.2 model, but the same process can be followed for the Mistral-7B-Instruct-v0.3 model.

CodeLlama-7b-Instruct-hf is part of Code Llama, a collection of models published by Meta AI that use text prompts to generate code. Code Llama is aimed at coding tasks, making developers' workflows faster and more efficient and lowering the barrier to entry for people learning to code. Code Llama has the potential to be used as a productivity and educational tool to help programmers write more robust and well-documented software.

Solution architecture

The solution uses a client-server architecture. On the client side, the Hugging Face Chat UI provides a chat page that can be accessed on a PC or mobile device. Server-side model inference uses Hugging Face's Text Generation Inference, an efficient LLM inference framework that runs in a Docker container. We precompiled the models using Hugging Face's Optimum Neuron and uploaded the compilation results to the Hugging Face Hub. We also added a model switching mechanism to the Hugging Face Chat UI that uses a scheduler to control which model is loaded in the Text Generation Inference container.

Solution highlights

All components are deployed on a single-chip Inf2 instance (inf2.xlarge or inf2.8xlarge), so users can try multiple models on one instance.
With the client-server architecture, users can flexibly replace either the client or the server side according to their needs. For example, the model can be deployed in Amazon SageMaker, and the frontend Chat UI can be deployed on a Node.js server. For ease of demonstration, we deployed both the frontend and the backend on the same Inf2 server.
Because the solution uses publicly available frameworks, users can customize the frontend pages or models to suit their own needs.
The Text Generation Inference API gives users quick programmatic access to the models.
Deployment uses AWS CloudFormation, which makes it suitable for businesses of all types and for developers within the enterprise.

Main components

The following are the main components of the solution.

Hugging Face Optimum Neuron

Optimum Neuron is an interface between the Hugging Face Transformers library and the AWS Neuron SDK. It provides a set of tools for model loading, training, and inference on single- and multi-accelerator setups for different downstream tasks. In this post, we mainly used Optimum Neuron's export interface. To deploy a Hugging Face Transformers model on Neuron devices, the model must be compiled and exported to a serialized format before inference is performed. The export interface performs ahead-of-time (AOT) compilation using the Neuron compiler (neuronx-cc) and converts the model into a serialized and optimized TorchScript module. This is shown in the following figure.

During the compilation process, we introduced a tensor parallelism mechanism to split the weights, data, and computations between the two NeuronCores. For more compilation parameters, see Export a model to Inferentia.
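As a back-of-the-envelope illustration of why the weights are split: an 8-billion-parameter model stored at 16-bit precision needs about 8 × 10^9 × 2 bytes ≈ 16 GB for its weights alone, so splitting them two ways leaves each NeuronCore with roughly 8 GB. The toy sketch below shows the core idea of tensor parallelism on a single matrix-vector product: each core holds a slice of the weight rows, computes its share of the output, and the partial results are concatenated. It is a conceptual model only, not how the Neuron SDK implements sharding.

```python
def matvec(weights, x):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def split_rows(weights, num_cores):
    """Shard the output rows of a weight matrix as evenly as possible across cores."""
    base, extra = divmod(len(weights), num_cores)
    shards, start = [], 0
    for core in range(num_cores):
        end = start + base + (1 if core < extra else 0)
        shards.append(weights[start:end])
        start = end
    return shards


def tensor_parallel_matvec(weights, x, num_cores=2):
    """Each core multiplies only its shard; concatenating the pieces gives the full output."""
    partials = [matvec(shard, x) for shard in split_rows(weights, num_cores)]
    # Gather step: stitch the per-core outputs back together in order.
    return [value for part in partials for value in part]
```

The result is identical to the unsplit product, while each core stores and multiplies only its own fraction of the weights.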

Hugging Face’s Text Generation Inference (TGI)

Text Generation Inference (TGI) is a framework written in Rust and Python for deploying and serving LLMs. TGI provides high performance text generation services for the most popular publicly available foundation LLMs. Its main features are:

Simple launcher that provides inference services for many LLMs
Supports both generate and stream interfaces
Token stream using server-sent events (SSE)
Supports AWS Inferentia, Trainium, NVIDIA GPUs and other accelerators

HuggingFace Chat UI

Hugging Face Chat UI is an open source chat tool built with SvelteKit that can be deployed to Cloudflare, Netlify, Node.js, and other platforms. It has the following main features:

Page can be customized
Conversation records can be stored, and chat records are stored in MongoDB
Supports operation on PC and mobile terminals
The backend can connect to Text Generation Inference and supports API interfaces such as Anthropic, Amazon SageMaker, and Cohere
Compatible with various publicly available foundation models (Llama series, Mistral/Mixtral series, Falcon, and so on)

Thanks to the page customization capabilities of the Hugging Face Chat UI, we added a model switching function so that users can switch between different models on the same EC2 Inf2 instance.

Solution deployment

1. Before deploying the solution, make sure you have a usage quota for inf2.xlarge or inf2.8xlarge instances in the us-east-1 (N. Virginia) or us-west-2 (Oregon) AWS Region. See the reference link for how to apply for a quota.
2. Sign in to the AWS Management Console and switch the Region to us-east-1 (N. Virginia) or us-west-2 (Oregon) in the upper right corner of the console page.
3. Enter CloudFormation in the service search box and choose Create stack.
4. Select Choose an existing template, and then select Amazon S3 URL.
5. If you plan to use an existing virtual private cloud (VPC), follow option a; if you plan to create a new VPC for the deployment, follow option b.
  a. Use an existing VPC.
    1. Enter https://zz-common.s3.amazonaws.com/tmp/tgiui/20240501/launch_server_default_vpc_ubuntu22.04.yaml in the Amazon S3 URL field.
    2. Stack name: Enter the stack name.
    3. InstanceType: Select inf2.xlarge (lower cost) or inf2.8xlarge (better performance).
    4. KeyPairName (optional): If you want to sign in to the Inf2 instance, enter the name of your key pair.
    5. VpcId: Select your VPC.
    6. PublicSubnetId: Select a public subnet.
    7. VolumeSize: Enter the size of the EC2 instance EBS storage volume. The minimum value is 80 GB.
    8. Choose Next, then Next again. Choose Submit.
  b. Create a new VPC.
    1. Enter https://zz-common.s3.amazonaws.com/tmp/tgiui/20240501/launch_server_new_vpc_ubuntu22.04.yaml in the Amazon S3 URL field.
    2. Stack name: Enter the stack name.
    3. InstanceType: Select inf2.xlarge or inf2.8xlarge.
    4. KeyPairName (optional): If you want to sign in to the Inf2 instance, enter the name of your key pair.
    5. VpcId: Leave as New.
    6. PublicSubnetId: Leave as New.
    7. VolumeSize: Enter the size of the EC2 instance EBS storage volume. The minimum value is 80 GB.
    8. Choose Next, then Next again. Choose Submit.
6. After you create the stack, wait for the resources to be created and started (about 15 minutes). When the stack status shows CREATE_COMPLETE, choose Outputs and open the URL shown as the value for the key Public endpoint for the web server. (Close any VPN connections and firewall programs that might block access.)
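If you would rather script the stack creation than click through the console, the following sketch builds the same parameters and calls CloudFormation's create_stack through a boto3 client. The parameter keys and template URLs are the ones listed in the deployment steps; whether the templates need IAM capabilities, and which exact InstanceType values they accept, are assumptions to confirm against the templates.

```python
TEMPLATE_URLS = {
    "existing_vpc": "https://zz-common.s3.amazonaws.com/tmp/tgiui/20240501/launch_server_default_vpc_ubuntu22.04.yaml",
    "new_vpc": "https://zz-common.s3.amazonaws.com/tmp/tgiui/20240501/launch_server_new_vpc_ubuntu22.04.yaml",
}


def stack_parameters(instance_type, volume_size=80, key_pair_name=None,
                     vpc_id=None, public_subnet_id=None):
    """Build the CloudFormation Parameters list, omitting unset optional values."""
    if volume_size < 80:
        raise ValueError("VolumeSize must be at least 80 GB")
    values = {
        "InstanceType": instance_type,
        "VolumeSize": str(volume_size),
        "KeyPairName": key_pair_name,
        # Left unset for a new VPC so the template default ("New") applies.
        "VpcId": vpc_id,
        "PublicSubnetId": public_subnet_id,
    }
    return [{"ParameterKey": k, "ParameterValue": v}
            for k, v in values.items() if v is not None]


def create_inf2_stack(client, stack_name, use_existing_vpc, capabilities=(), **params):
    """Start stack creation; `client` is assumed to be boto3.client("cloudformation")."""
    if use_existing_vpc and not (params.get("vpc_id") and params.get("public_subnet_id")):
        raise ValueError("an existing VPC needs vpc_id and public_subnet_id")
    template = TEMPLATE_URLS["existing_vpc" if use_existing_vpc else "new_vpc"]
    return client.create_stack(
        StackName=stack_name,
        TemplateURL=template,
        Parameters=stack_parameters(**params),
        # Assumption: pass e.g. ["CAPABILITY_IAM"] if the template creates IAM resources.
        Capabilities=list(capabilities),
    )
```

After the call returns, client.get_waiter("stack_create_complete").wait(StackName=stack_name) blocks until the stack reaches CREATE_COMPLETE, after which describe_stacks returns the stack outputs, including the public endpoint.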

User interface

After the solution is deployed, users can open the preceding URL on a PC or mobile phone. The page loads the Llama3-8B model by default. To switch models, open the menu settings, select the model to activate from the model list, and choose Activate. Switching models requires reloading the new model into the Inferentia2 accelerator memory, which takes about 1 minute. During this time, you can check the loading status of the new model by choosing Retrieve model status. A status of Available indicates that the new model has loaded successfully.

The effects of the different models are shown in the following figure:

The following figures show the solution in a browser on a PC:

API interface and performance testing

The solution uses a Text Generation Inference server, which supports the /generate and /generate_stream interfaces and listens on port 8080 by default. To make API calls, replace <IP> in the following examples with the IP address of the instance you deployed previously.

The /generate interface returns the complete response to the client at once, after the server has generated all tokens.

curl http://<IP>:8080/generate \
    -X POST \
    -d '{"inputs": "Calculate the distance from Beijing to Shanghai"}' \
    -H 'Content-Type: application/json'

/generate_stream returns tokens one by one as they are generated, which reduces waiting time and improves the user experience when the model output is long.

curl http://<IP>:8080/generate_stream \
    -X POST \
    -d '{"inputs": "Write an essay on the mental health of elementary school students with no more than 300 words."}' \
    -H 'Content-Type: application/json'

The following sample code calls the /generate interface using the Python requests library:

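import requests

# Replace <IP> with the public IP address of the Inf2 instance.
url = "http://<IP>:8080/generate"
data = {
    "inputs": "Calculate the distance from Beijing to Shanghai",
    "parameters": {"max_new_tokens": 200},  # cap the length of the generated answer
}
# json= serializes the body and sets the Content-Type: application/json header.
response = requests.post(url, json=data, timeout=120)
response.raise_for_status()
# /generate returns a JSON object whose generated_text field holds the full answer.
print(response.json()["generated_text"])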
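For /generate_stream, the server sends server-sent events, one data: line per token. The sketch below uses only the standard library, so it has no dependency on requests; it parses those lines and prints tokens as they arrive. It assumes each event has the shape {"token": {"text": ..., "special": ...}, ...} that TGI uses for streamed tokens, and it skips special tokens such as end-of-sequence markers.

```python
import json
import urllib.request


def iter_tokens(lines):
    """Yield token text from TGI /generate_stream server-sent-event lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and other SSE fields
        # Assumption: events look like {"token": {"text": ..., "special": ...}, ...}.
        event = json.loads(line[len("data:"):])
        token = event.get("token") or {}
        if not token.get("special", False):
            yield token.get("text", "")


def stream_generate(host, prompt, max_new_tokens=300):
    """POST to /generate_stream on port 8080 and print tokens as they arrive."""
    body = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    request = urllib.request.Request(
        f"http://{host}:8080/generate_stream",
        data=json.dumps(body).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        # Iterating over the HTTP response yields one line at a time.
        for text in iter_tokens(response):
            print(text, end="", flush=True)
    print()
```

For example, stream_generate("<IP>", "Write an essay on the mental health of elementary school students with no more than 300 words.") prints the essay incrementally instead of waiting for the whole answer.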

Summary

In this post, we introduced methods and examples for deploying popular LLMs on AWS AI chips, so that users can quickly experience the productivity improvements that LLMs provide. The models deployed on Inf2 instances have been validated by multiple users across multiple scenarios, showing strong performance and wide applicability. AWS continues to expand the supported scenarios and features to provide users with efficient and economical computing capabilities. See Inf2 Inference Performance for the types and list of models supported on the Inferentia2 chip. Contact us to give feedback on your needs or ask questions about deploying LLMs on AWS AI chips.

References

About the authors

Zheng Zhang is a technical expert for Amazon Web Services machine learning products, focusing on accelerated computing and GPU instances on Amazon Web Services. He has extensive experience in large-scale model training and inference acceleration for machine learning.

Bingyang Huang is a Go-To-Market Specialist of Accelerated Computing at GCR SSO GenAI team. She has experience deploying AI accelerators in customers' production environments. Outside of work, she enjoys watching films and exploring good food.

Tian Shi is a Senior Solution Architect at Amazon Web Services. He has rich experience in cloud computing, data analysis, and machine learning and is currently dedicated to research and practice in the fields of data science, machine learning, and serverless. His translations include Machine Learning as a Service, DevOps Practices Based on Kubernetes, Practical Kubernetes Microservices, Prometheus Monitoring Practice, and CoreDNS Study Guide in the Cloud Native Era.

Chuan Xie is a Senior Solution Architect at Amazon Web Services Generative AI, responsible for the design, implementation, and optimization of generative artificial intelligence solutions based on the Amazon Cloud. River has many years of production and research experience in the communications, ecommerce, internet and other industries, and rich practical experience in data science, recommendation systems, LLM RAG, and others. He has multiple AI-related product technology invention patents.

Best practices for building robust generative AI applications with Amazon Bedrock Agents – Part 2

October 21, 2024

by Maira Ladeira Tanke Amazon AWS

In Part 1 of this series, we explored best practices for creating accurate and reliable agents using Amazon Bedrock Agents. Amazon Bedrock Agents help you accelerate generative AI application development by orchestrating multistep tasks. Agents use the reasoning capability of foundation models (FMs) to create a plan that decomposes the problem into multiple steps. The model is augmented with the developer-provided instruction to create an orchestration plan and then carry out the plan. The agent can use company APIs and external knowledge through Retrieval Augmented Generation (RAG).

In this second part, we dive into the architectural considerations and development lifecycle practices that can help you build robust, scalable, and secure intelligent agents. Whether you are just starting to explore the world of conversational AI or looking to optimize your existing agent deployments, this comprehensive guide can provide valuable long-term insights and practical tips to help you achieve your goals.

Enable comprehensive logging and observability

From the outset of your agent development journey, you should implement thorough logging and observability practices. This is crucial for debugging, auditing, and troubleshooting your agents. The first step to achieve comprehensive logging is to enable Amazon Bedrock model invocation logging to capture prompts and responses securely in your account.
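Invocation logging is configured once per account and Region. The following sketch builds the logging configuration and applies it through the put_model_invocation_logging_configuration call of a boto3 bedrock (control plane) client. The log group, IAM role, and bucket are placeholders that must already exist, with permissions that let Amazon Bedrock write to them.

```python
def build_logging_config(log_group, role_arn, bucket=None, key_prefix="bedrock-invocations"):
    """Build a loggingConfig that sends prompts and responses to CloudWatch Logs (and optionally S3)."""
    config = {
        # Placeholders: the log group and role must exist before you call the API.
        "cloudWatchConfig": {"logGroupName": log_group, "roleArn": role_arn},
        # Which payload types to capture in the logs.
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": True,
        "embeddingDataDeliveryEnabled": True,
    }
    if bucket:
        config["s3Config"] = {"bucketName": bucket, "keyPrefix": key_prefix}
    return config


def enable_invocation_logging(client, **kwargs):
    """`client` is assumed to be boto3.client("bedrock"), not "bedrock-runtime"."""
    return client.put_model_invocation_logging_configuration(
        loggingConfig=build_logging_config(**kwargs)
    )
```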

Amazon Bedrock Agents also provides you with traces, a detailed overview of the steps being orchestrated by the agents, the underlying prompts invoking the FM, the references being returned from the knowledge bases, and code being generated by the agent. Trace events are streamed in real time, which allows you to customize UX cues to keep the end-user informed about the progress of their request. You can log your agent’s traces and use them to track and troubleshoot your agents.
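To capture traces programmatically, enable them on the InvokeAgent call and separate trace events from answer chunks as the stream arrives. This sketch assumes a boto3 bedrock-agent-runtime client; the on_trace callback is a hypothetical hook where you could log the trace or update a progress indicator for the end user.

```python
import json
import logging

logger = logging.getLogger("agent-traces")


def collect_agent_response(event_stream, on_trace=None):
    """Split an InvokeAgent event stream into the final answer and trace events."""
    answer, traces = bytearray(), []
    for event in event_stream:
        if "chunk" in event:
            answer.extend(event["chunk"]["bytes"])  # answer text arrives in byte chunks
        elif "trace" in event:
            traces.append(event["trace"])
            if on_trace:
                on_trace(event["trace"])  # hypothetical hook: log or update the UI
    # Decode once at the end so multi-byte characters split across chunks stay intact.
    return answer.decode("utf-8"), traces


def ask_agent(client, agent_id, alias_id, session_id, prompt):
    """Invoke an agent with tracing enabled and log each trace as JSON."""
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id,
        inputText=prompt,
        enableTrace=True,
    )
    return collect_agent_response(
        response["completion"],
        on_trace=lambda t: logger.info(json.dumps(t, default=str)),
    )
```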

When moving agent applications to production, it’s a best practice to set up a monitoring workflow to continuously analyze your logs. You can do so by either creating a custom solution or using an open source solution such as Bedrock-ICYM.

Use infrastructure as code

Just as you would with any other software development project, you should use infrastructure as code (IaC) frameworks to facilitate iterative and reliable deployment. This lets you create repeatable and production-ready agents that can be readily reproduced, tested, and monitored. Amazon Bedrock Agents allows you to write IaC code with AWS CloudFormation, the AWS Cloud Development Kit (AWS CDK), or Terraform. We also recommend that you get started using our Agent Blueprints construct. We provide blueprint templates of the most common capabilities of Amazon Bedrock Agents, which can be deployed and updated with a single AWS CDK command.

When creating agents that use action groups, you can specify your function definitions as a JSON object to the agent or provide an API schema in the OpenAPI schema format. If you already have an OpenAPI schema for your application, the best practice is to start with it. Make sure the functions have proper natural language descriptions, because your agent will use them to understand when to use each function. If you’re starting with no existing schema, the simplest way to provide tool metadata for your agent is to use simple JSON function definitions. Either way, you can use the Amazon Bedrock console to quickly create a default AWS Lambda function to get started implementing your actions or tools.
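As an illustration of the JSON route, the sketch below builds a function schema in the shape accepted by the functionSchema argument of the bedrock-agent create_agent_action_group API. It refuses entries without descriptions, because the agent relies on them to decide when to call each function. The book_holiday function and its parameters are hypothetical.

```python
ALLOWED_TYPES = {"string", "number", "integer", "boolean", "array"}


def function_definition(name, description, params):
    """Build one function entry; `params` maps name -> (type, description, required)."""
    if not description.strip():
        raise ValueError(f"function {name} needs a natural language description")
    parameters = {}
    for pname, (ptype, pdesc, required) in params.items():
        if ptype not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type {ptype!r} for parameter {pname}")
        if not pdesc.strip():
            raise ValueError(f"parameter {pname} needs a description")
        parameters[pname] = {"type": ptype, "description": pdesc, "required": required}
    return {"name": name, "description": description, "parameters": parameters}


# Hypothetical HR action: pass this dict as functionSchema when creating the action group.
function_schema = {
    "functions": [
        function_definition(
            "book_holiday",
            "Book vacation days for the current employee starting on a given date.",
            {
                "start_date": ("string", "First day of leave in YYYY-MM-DD format.", True),
                "days": ("integer", "Number of working days to book.", True),
            },
        )
    ]
}
```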

After you start to scale the development of agents, you should consider the reusability of the agent’s components. Using IaC will allow you to have predefined guardrails using Amazon Bedrock Guardrails, knowledge bases using Amazon Bedrock Knowledge Bases, and action groups that are reused over multiple agents.

Building agents that run tasks requires function definitions and Lambda functions. Another best practice is to use generative AI to accelerate the development and maintenance of this code. You can do so directly with the invoke model functionality in Amazon Bedrock, with Amazon Q Developer, or even by creating an AWS PartyRock application that scaffolds your Lambda function based on your action group metadata. You can also use generative AI to directly generate the IaC required for creating your agents with function definitions and Lambda connections. Regardless of the approach you choose, creating a test pipeline that validates and runs the IaC will help you optimize your agent solutions.
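For reference, this is the kind of Lambda scaffold such tools typically produce for a function-based action group: it reads the function name and parameters from the agent's event and returns the response envelope that Amazon Bedrock Agents expects (messageVersion 1.0 with a functionResponse). The book_holiday logic is a hypothetical stub, and because the handler is plain Python, it is easy to exercise in a test pipeline.

```python
import json


def book_holiday(start_date, days):
    """Hypothetical business logic; a real version would call an HR system."""
    return {"status": "booked", "start_date": start_date, "days": int(days)}


HANDLERS = {"book_holiday": book_holiday}


def lambda_handler(event, context):
    # Parameters arrive as a list of {"name", "type", "value"} with string values.
    args = {p["name"]: p["value"] for p in event.get("parameters", [])}
    handler = HANDLERS.get(event["function"])
    if handler is None:
        body = f"Unknown function: {event['function']}"
    else:
        body = json.dumps(handler(**args))
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {"responseBody": {"TEXT": {"body": body}}},
        },
    }
```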

Use SessionState for additional agent context

You can use session state to provide additional context to your agent. Pass information that should only be available to the Lambda functions in your action groups as session attributes (sessionAttributes), and information that should be available to your prompt as prompt session attributes (promptSessionAttributes). For example, if you want to pass a user authentication token for your action to use, it's best placed in sessionAttributes. If you want to pass information that the large language model (LLM) needs to reason about, such as the current date and timestamp to resolve relative dates, it's best placed in promptSessionAttributes. This lets your agent use the reasoning capabilities of the underlying LLM to infer things like the number of days before your next payment due date or how many hours have passed since you placed your order.
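A sketch of building that session state, assuming a boto3 bedrock-agent-runtime client: the token goes into sessionAttributes, which the action group Lambda function receives, while the current date goes into promptSessionAttributes so the model can reason about relative dates. The attribute names authToken, currentDate, and currentTimestamp are hypothetical; both maps take string values.

```python
from datetime import datetime, timezone


def build_session_state(auth_token, now=None):
    """Split context between the action group Lambda function and the prompt."""
    now = now or datetime.now(timezone.utc)
    return {
        # Hypothetical key; passed to action group Lambda functions rather than the prompt.
        "sessionAttributes": {"authToken": auth_token},
        # Hypothetical keys; added to the prompt so the LLM can resolve relative dates.
        "promptSessionAttributes": {
            "currentDate": now.strftime("%Y-%m-%d"),
            "currentTimestamp": now.isoformat(),
        },
    }
```

You would then pass the result as the sessionState argument of invoke_agent on each request.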

Optimize model selection for cost and performance

A key part of the agent building process is selecting the underlying FM for your agent (or for each sub-agent). Experiment with the available FMs to select the best one for your application based on cost, latency, and accuracy requirements. Implement automated testing pipelines to collect evaluation metrics, enabling data-driven decisions on model selection. This approach lets you use faster, cheaper models such as Anthropic’s Claude 3 Haiku on Amazon Bedrock for simple agents, while more complex applications can use more advanced models such as Anthropic’s Claude 3.5 Sonnet or Anthropic’s Claude 3 Opus.
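Once an automated pipeline produces evaluation metrics, the selection rule itself can be simple. The sketch below uses a hypothetical EvalResult record and select_model helper to pick the cheapest candidate that meets both an accuracy floor and a latency ceiling. The thresholds, and how you measure each metric, are assumptions you would set for your own application.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    model_id: str
    accuracy: float              # fraction of test cases passed, 0.0 to 1.0
    p90_latency_s: float         # 90th-percentile end-to-end latency, seconds
    cost_per_1k_requests: float  # estimated cost in USD


def select_model(results, min_accuracy, max_latency_s):
    """Return the cheapest result that meets both targets, or None."""
    eligible = [
        r for r in results
        if r.accuracy >= min_accuracy and r.p90_latency_s <= max_latency_s
    ]
    return min(eligible, key=lambda r: r.cost_per_1k_requests, default=None)
```

A None result signals that no candidate meets your requirements as stated, so you would revisit the thresholds or the agent’s instructions rather than pick a model arbitrarily.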

Implement robust testing frameworks

Automating the evaluation of your agent, or any generative AI-powered system, can accelerate the development process and make sure you provide your customers with the best possible solution. You should evaluate on multiple dimensions, including cost, latency, and accuracy of your agents. Use frameworks like Agent Evaluation to assess agent behavior against predefined criteria. By using the Amazon Bedrock agent versioning and alias features, you can unlock A/B testing as part of your deployment stages. You should define different aspects of agent behavior, such as formal or informal HR assistant tone, that can be tested with a subset of your user group. You can then make different agent versions available for each group during initial deployments and evaluate the agent behavior for each group. Amazon Bedrock Agents has built-in versioning capabilities to help you with this key part of testing. The following figure shows how the HR agent can be updated after a testing and evaluation phase to create a new alias pointing to the selected version of the agent for the model invocation.
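For A/B tests across aliases, each user should consistently reach the same agent version. The following is a minimal sketch that assumes you route requests in your own application layer. The hypothetical assign_alias helper hashes the user ID into a bucket and maps that bucket to an alias according to traffic weights. The alias names are placeholders for the alias IDs you would pass to InvokeAgent.

```python
import hashlib


def assign_alias(user_id, aliases, weights):
    """Deterministically map a user to one agent alias.

    Hashing the user ID (rather than choosing randomly per request) keeps each
    user on the same agent version for the whole experiment.
    """
    if not aliases or len(aliases) != len(weights):
        raise ValueError("aliases and weights must be non-empty and the same length")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    total = sum(weights)
    cumulative = 0.0
    for alias, weight in zip(aliases, weights):
        cumulative += weight / total
        if bucket < cumulative:
            return alias
    return aliases[-1]  # guard against floating-point rounding at the top end
```

For example, assign_alias("employee-123", ["formal-tone", "informal-tone"], [0.9, 0.1]) sends roughly 10% of users to the informal HR assistant, and the same employee always lands on the same alias.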

Use LLMs for test case generation

You can use LLMs to generate test cases based on the expected use cases for your agent. As a best practice, select a different LLM to generate the data than the one that powers your agent. This approach can significantly accelerate building comprehensive test suites that cover a wide range of potential scenarios. For example, you could use the following prompt to create test cases for an HR assistant agent that helps employees book holidays:

Generate the conversation back and forth between an employee and an employee 
assistant agent. The employee is trying to reserve time off. 
The agent has access to functions for checking the available employee's time off, 
booking and updating time off, and sending notifications that a new time off booking 
has been completed. Here's a sample conversation between an employee and an employee 
assistant agent for booking time off. Your conversation should have at least 3 
interactions between the agent and the employee. The employee starts by saying hello.
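You could send this prompt through the Amazon Bedrock Converse API. The following sketch, built around a hypothetical build_test_generation_request helper, only assembles the request. It also enforces the best practice above by rejecting a generator model that matches the agent’s model. The model IDs and inference settings are assumptions, and the actual call appears only as a comment.

```python
def build_test_generation_request(prompt, generator_model_id, agent_model_id,
                                  temperature=0.7, max_tokens=2000):
    """Build keyword arguments for the Bedrock Runtime Converse API."""
    # Generating tests with the agent's own model tends to mirror its blind spots.
    if generator_model_id == agent_model_id:
        raise ValueError("Use a different model to generate test cases than the agent's model")
    return {
        "modelId": generator_model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        # A moderately high temperature yields more varied conversations.
        "inferenceConfig": {"temperature": temperature, "maxTokens": max_tokens},
    }

# Usage (requires AWS credentials and model access):
# request = build_test_generation_request(prompt, generator_model_id, agent_model_id)
# response = boto3.client("bedrock-runtime").converse(**request)
# conversation = response["output"]["message"]["content"][0]["text"]
```

Calling the helper several times with the same prompt and a nonzero temperature is a simple way to collect a batch of distinct conversations for the test suite.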

Design robust confirmation and security mechanisms

Implement robust confirmation mechanisms for critical actions in your agent’s workflow. Clearly state in your instructions that the agent should ask for user confirmation before running certain functions, especially those that modify data or perform sensitive operations. This step helps move beyond proof of concept or prototype stages, verifying that your agent operates reliably in production environments. For instance, the following instruction tells your agent to confirm that a vacation request action should be run before updating the database for the user:

You are an HR agent, helping employees … [other instructions removed for brevity]

Before creating, editing or deleting a time-off request, ask for user confirmation
for your actions. Include sufficient information with that ask to be clear about
the action that will be taken. DO NOT provide the function name itself but rather focus
on the actions being executed using natural language.

You can also use the requireConfirmation field in a function schema definition, or the x-requireConfirmation field in an API schema definition, when you create a new action. This enables the built-in Amazon Bedrock Agents functionality that requests user confirmation before invoking an action in an action group.
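As a sketch of both options, the following builds a minimal, hypothetical OpenAPI 3.0 document in Python with x-requireConfirmation set on the operation, plus the equivalent requireConfirmation field on a function definition. The path, operation, and parameter names are illustrative. The serialized JSON is what you would supply as the schema payload when creating the action group.

```python
import json

# Hypothetical OpenAPI 3.0 schema with a single time-off booking operation.
openapi_schema = {
    "openapi": "3.0.0",
    "info": {"title": "Time-off API", "version": "1.0.0"},
    "paths": {
        "/time-off": {
            "post": {
                "operationId": "bookTimeOff",
                "description": "Create a time-off request for an employee.",
                # Ask the user to confirm before the agent invokes this operation.
                "x-requireConfirmation": "ENABLED",
                "requestBody": {
                    "required": True,
                    "content": {"application/json": {"schema": {
                        "type": "object",
                        "properties": {
                            "start_date": {"type": "string", "description": "YYYY-MM-DD"},
                            "end_date": {"type": "string", "description": "YYYY-MM-DD"},
                        },
                        "required": ["start_date", "end_date"],
                    }}},
                },
                "responses": {"200": {"description": "Request created"}},
            }
        }
    },
}

# The equivalent setting for a function schema is a per-function field.
book_function = {
    "name": "book_time_off",
    "description": "Create a time-off request for an employee.",
    "parameters": {
        "start_date": {"type": "string", "description": "YYYY-MM-DD", "required": True},
        "end_date": {"type": "string", "description": "YYYY-MM-DD", "required": True},
    },
    "requireConfirmation": "ENABLED",
}

# Serialized form, for example as apiSchema={"payload": api_schema_payload}.
api_schema_payload = json.dumps(openapi_schema)
```

Only state-changing operations such as booking need the flag; read-only lookups can stay unconfirmed so the conversation does not become tedious.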

Implement flexible authorization and encryption

You should provide customer managed keys to encrypt your agent’s resources, and confirm that your AWS Identity and Access Management (IAM) permissions follow the least privilege approach, limiting your agent to only have access to required resources and actions. When implementing action groups, take advantage of the sessionAttributes parameter of your sessionState to provide information about your user roles and permissions so that your action can implement fine-grained permissions (see the following sample code). Another best practice is to use the knowledgeBaseConfigurations parameter of the sessionState to provide extra configurations to your knowledge base, such as the user group defining the documents that a user should have access to through knowledge base metadata filtering.
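The following is a minimal sketch of an action group Lambda function that enforces role-based permissions from sessionAttributes. It assumes the event and response shapes used by function-based action groups (actionGroup, function, parameters, and a TEXT response body). The userRole attribute, the ALLOWED_FUNCTIONS mapping, and the placeholder business logic are hypothetical.

```python
# Hypothetical mapping from user role to the functions that role may call.
ALLOWED_FUNCTIONS = {
    "employee": {"get_available_vacation_days", "book_vacation"},
    "manager": {"get_available_vacation_days", "book_vacation", "approve_vacation"},
}


def _response(event, body):
    """Wrap a text result in the function-based action group response shape."""
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {"responseBody": {"TEXT": {"body": body}}},
        },
    }


def lambda_handler(event, context):
    """Action group handler that checks the caller's role before acting."""
    # sessionAttributes arrive from sessionState at invocation time; they reach
    # this function but are not added to the LLM prompt.
    role = event.get("sessionAttributes", {}).get("userRole", "")
    function = event["function"]
    if function not in ALLOWED_FUNCTIONS.get(role, set()):
        return _response(event, f"User role '{role}' is not allowed to call {function}.")
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    # Placeholder business logic; a real implementation would call your HR system.
    return _response(event, f"{function} executed with {params}")
```

Returning a plain-text denial, rather than raising an error, lets the agent explain to the user why the action was not taken.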

Integrate responsible AI practices

When developing generative AI applications, you should apply responsible AI practices to create systems in an ethical, transparent, and accountable manner. Amazon Bedrock features help you develop your responsible AI practices in a scalable manner. When creating agents, you should implement Amazon Bedrock Guardrails to avoid sensitive topics, filter user input and agent output from harmful content, and redact sensitive information to protect user privacy. You can create organization-level guardrails that can be reused across multiple generative AI applications, thereby preserving consistent responsible AI practices. After you create a guardrail, you can associate it with your agent using the Amazon Bedrock Agents built-in guardrails connection (see the following sample code).
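A short sketch of reusing one organization-level guardrail across agents follows. It assumes the guardrailConfiguration argument of the bedrock-agent create_agent and update_agent APIs takes a guardrail identifier and a guardrail version. The guardrail ID, version, agent names, and the guardrail_config helper are placeholders.

```python
def guardrail_config(guardrail_id, guardrail_version):
    """Build a guardrailConfiguration value for create_agent or update_agent."""
    if not guardrail_id or not guardrail_version:
        raise ValueError("Both a guardrail ID and a guardrail version are required")
    return {"guardrailIdentifier": guardrail_id, "guardrailVersion": guardrail_version}


# Placeholder ID and version of one organization-level guardrail.
SHARED_GUARDRAIL = guardrail_config("org-guardrail-id", "1")

# Both agents reference the same guardrail definition and version.
hr_agent_kwargs = {"agentName": "hr-assistant", "guardrailConfiguration": SHARED_GUARDRAIL}
bank_agent_kwargs = {"agentName": "banking-assistant", "guardrailConfiguration": SHARED_GUARDRAIL}
# Each dict would be combined with the agent's other settings (for example
# foundationModel and instruction) and passed to
# boto3.client("bedrock-agent").create_agent(**kwargs).
```

Pinning an explicit version means a guardrail change reaches an agent only when you update that agent to the new version, which keeps rollouts deliberate.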

Build a reusable actions catalog and scale gradually

After the successful deployment of your first agent, you can plan to reuse common functionalities, such as action groups, knowledge bases, and guardrails, for other applications. Amazon Bedrock Agents supports creating agents manually in the AWS Management Console, in code with the SDKs available for the agent API, or with IaC using CloudFormation templates, the AWS CDK, or Terraform templates. To reuse functionality, the best practice is to create and deploy these components with IaC and reuse them across applications. The following figure shows an example of the reusability of a utilities action group across two agents: an HR assistant and a banking assistant.
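In production you would express this reuse in CloudFormation, the AWS CDK, or Terraform. As a stand-in, the following plain-Python sketch keeps a hypothetical catalog of action group definitions and composes per-agent lists from it, mirroring the utilities action group shared by the HR and banking assistants. The catalog entries, names, and ARNs are illustrative.

```python
import copy

# Hypothetical catalog of reusable action group definitions.
ACTION_GROUP_CATALOG = {
    "utilities": {
        "actionGroupName": "utilities",
        "description": "Shared helpers such as date lookups and notifications.",
        "functionSchema": {"functions": [
            {"name": "get_current_date", "description": "Returns today's date.",
             "parameters": {}},
            {"name": "send_notification", "description": "Sends a message to the user.",
             "parameters": {"message": {"type": "string", "description": "Text to send",
                                        "required": True}}},
        ]},
    },
    "time_off": {
        "actionGroupName": "time_off",
        "description": "Books and updates employee time off.",
        "functionSchema": {"functions": [
            {"name": "book_vacation", "description": "Books time off.", "parameters": {}},
        ]},
    },
    "accounts": {
        "actionGroupName": "accounts",
        "description": "Looks up bank account balances.",
        "functionSchema": {"functions": [
            {"name": "get_balance", "description": "Returns an account balance.",
             "parameters": {}},
        ]},
    },
}


def action_groups_for(names, lambda_arns):
    """Return independent copies of catalog entries, each wired to a Lambda ARN."""
    groups = []
    for name in names:
        # Deep copy so per-agent changes never leak into the shared catalog entry.
        group = copy.deepcopy(ACTION_GROUP_CATALOG[name])
        group["actionGroupExecutor"] = {"lambda": lambda_arns[name]}
        groups.append(group)
    return groups
```

For example, action_groups_for(["utilities", "time_off"], arns) and action_groups_for(["utilities", "accounts"], arns) yield the HR and banking assistants’ action groups, with the utilities definition maintained in one place.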

Follow a crawl-walk-run methodology when scaling agent usage

The final best practice that we would like to highlight is to follow the crawl-walk-run methodology. Start with an internal application (crawl), followed by applications made available to a smaller, controlled set of external users (walk), and finally scale your applications to all customers (run) and eventually use multi-agent collaboration. This approach helps you build reliable agents that support mission-critical business operations while minimizing the risks associated with rolling out new technology. The following figure illustrates this process.

Conclusion

By following these architectural and development lifecycle best practices, you’ll be well-equipped to create robust, scalable, and secure agents that can effectively serve your users and integrate seamlessly with your existing systems.

For examples to get started, check out the Amazon Bedrock samples repository. To learn more about Amazon Bedrock Agents, get started with the Amazon Bedrock Workshop and the standalone Amazon Bedrock Agents Workshop, which provides a deeper dive. Additionally, check out the service introduction video from AWS re:Invent 2023.

About the Authors

Maira Ladeira Tanke is a Senior Generative AI Data Scientist at AWS. With a background in machine learning, she has over 10 years of experience architecting and building AI applications with customers across industries. As a technical lead, she helps customers accelerate their achievement of business value through generative AI solutions on Amazon Bedrock. In her free time, Maira enjoys traveling, playing with her cat, and spending time with her family someplace warm.

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build generative AI solutions. His focus since early 2023 has been leading solution architecture efforts for the launch of Amazon Bedrock, the flagship generative AI offering from AWS for builders. Mark’s work covers a wide range of use cases, with a primary interest in generative AI, agents, and scaling ML across the enterprise. He has helped companies in insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services. Mark holds six AWS certifications, including the ML Specialty Certification.

Navneet Sabbineni is a Software Development Manager for Amazon Bedrock at AWS. With over 9 years of industry experience as a software developer and manager, he has worked on building and maintaining scalable distributed services for AWS, including generative AI services like Amazon Bedrock Agents and conversational AI services like Amazon Lex. Outside of work, he enjoys traveling and exploring the Pacific Northwest with his family and friends.

Monica Sunkara is a Senior Applied Scientist at AWS, where she works on Amazon Bedrock Agents. With over 10 years of industry experience, including 6 years at AWS, Monica has contributed to various AI and ML initiatives such as Alexa Speech Recognition, Amazon Transcribe, and Amazon Lex ASR. Her work spans speech recognition, natural language processing, and large language models. Recently, she worked on adding function calling capabilities to Amazon Titan text models. Monica holds a degree from Cornell University, where she conducted research on object localization under the supervision of Prof. Andrew Gordon Wilson before joining Amazon in 2018.

Removing selection bias from evaluation of recommendations

October 21, 2024

by Amazon AWS

Causal machine learning provides a powerful tool for estimating the effectiveness of Fulfillment by Amazon’s recommendations to selling partners.

Train, optimize, and deploy models on edge devices using Amazon SageMaker and Qualcomm AI Hub

October 18, 2024

by Rodrigo Amaral Amazon AWS

This post is co-written with Rodrigo Amaral, Ashwin Murthy and Meghan Stronach from Qualcomm.

In this post, we introduce an innovative solution for end-to-end model customization and deployment at the edge using Amazon SageMaker and Qualcomm AI Hub. This seamless cloud-to-edge AI development experience enables developers to create optimized, highly performant, custom machine learning solutions in which you can bring your own model (BYOM) and bring your own data (BYOD) to meet varied business requirements across industries. From real-time analytics and predictive maintenance to personalized customer experiences and autonomous systems, this approach caters to diverse needs.

We demonstrate this solution by walking you through a comprehensive step-by-step guide on how to fine-tune YOLOv8, a real-time object detection model, on Amazon Web Services (AWS) using a custom dataset. The process uses a single ml.g5.2xlarge instance (providing one NVIDIA A10G Tensor Core GPU) with SageMaker for fine-tuning. After fine-tuning, we show you how to optimize the model with Qualcomm AI Hub so that it’s ready for deployment across edge devices powered by Snapdragon and Qualcomm platforms.

Business challenge

Today, many developers use AI and machine learning (ML) models to tackle a variety of business cases, from smart identification and natural language processing (NLP) to AI assistants. While open source models offer a good starting point, they often don’t meet the specific needs of the applications being developed. This is where model customization becomes essential, allowing developers to tailor models to their unique requirements and ensure optimal performance for specific use cases.

In addition, on-device AI deployment is a game-changer for developers crafting use cases that demand immediacy, privacy, and reliability. By processing data locally, edge AI minimizes latency, keeps sensitive information on the device, and continues to work even with poor connectivity. Developers are therefore looking for an end-to-end solution where they can not only customize a model but also optimize it for on-device deployment. This enables them to offer responsive, secure, and robust AI applications that deliver exceptional user experiences.

How can Amazon SageMaker and Qualcomm AI Hub help?

BYOM and BYOD offer exciting opportunities for you to customize the model of your choice, use your own dataset, and deploy it on your target edge device. Through this solution, we propose using SageMaker for model fine-tuning and Qualcomm AI Hub for edge deployments, creating a comprehensive end-to-end model deployment pipeline. This opens new possibilities for model customization and deployment, enabling developers to tailor their AI solutions to specific use cases and datasets.

SageMaker is an excellent choice for model training because it reduces the time and cost to train and tune ML models at scale without the need to manage infrastructure. You can take advantage of the highest-performing ML compute infrastructure currently available, and SageMaker can scale infrastructure from one to thousands of GPUs. Because you pay only for what you use, you can manage your training costs more effectively. SageMaker distributed training libraries can automatically split large models and training datasets across AWS GPU instances, or you can use third-party libraries such as DeepSpeed, Horovod, Fully Sharded Data Parallel (FSDP), or Megatron. SageMaker can also automatically monitor and repair training clusters, so you can train foundation models (FMs) for weeks or months without disruption.

After the model is trained, you can use Qualcomm AI Hub to optimize, validate, and deploy these customized models on hosted devices with Snapdragon and Qualcomm Technologies within minutes. Qualcomm AI Hub is a developer-centric platform designed to streamline on-device AI development and deployment. AI Hub offers automatic conversion and optimization of PyTorch or ONNX models for efficient on-device deployment using TensorFlow Lite, ONNX Runtime, or Qualcomm AI Engine Direct SDK. It also has an existing library of over 100 pre-optimized models for Qualcomm and Snapdragon platforms.

Qualcomm AI Hub has served more than 800 companies and continues to expand its offerings in terms of models available, platforms supported, and more.

Using SageMaker and Qualcomm AI Hub together can create new opportunities for rapid iteration on model customization, providing access to powerful development tools and enabling a smooth workflow from cloud training to on-device deployment.

Solution architecture

The following diagram illustrates the solution architecture. Developers working in their local environment initiate the following steps:

Select an open source model and a dataset for model customization from the Hugging Face repository.
Pre-process the data into the format required by your model for training, then upload the processed data to Amazon Simple Storage Service (Amazon S3). Amazon S3 provides a highly scalable, durable, and secure object storage solution for your machine learning use case.
Call the SageMaker control plane API using the SageMaker Python SDK for model training. In response, SageMaker provisions a resilient distributed training cluster with the requested number and type of compute instances to run the model training. SageMaker also handles orchestration and monitors the infrastructure for any faults.
After the training is complete, SageMaker spins down the cluster, and you’re billed for the net training time in seconds. The final model artifact is saved to an S3 bucket.
Pull the fine-tuned model artifact from Amazon S3 to the local development environment and validate the model accuracy.
Use Qualcomm AI Hub to compile and profile the model, running it on cloud-hosted devices to deliver performance metrics ahead of downloading for deployment across edge devices.

Use case walkthrough

Imagine a leading electronics manufacturer aiming to enhance its quality control process for printed circuit boards (PCBs) by implementing an automated visual inspection system. Initially, using an open source vision model, the manufacturer collects and annotates a large dataset of PCB images, including both defective and non-defective samples.

This dataset, similar to the keremberke/pcb-defect-segmentation dataset from Hugging Face, contains annotations for common defect classes such as dry joints, incorrect installations, PCB damage, and short circuits. With SageMaker, the manufacturer trains a custom YOLOv8 (You Only Look Once) model, developed by Ultralytics, to recognize these specific PCB defects. The model is then optimized for edge deployment using Qualcomm AI Hub, providing efficient performance on chosen platforms such as industrial cameras or handheld devices used on the production line.

This customized model significantly improves the quality control process by accurately detecting PCB defects in real-time. It reduces the need for manual inspections and minimizes the risk of defective PCBs progressing through the manufacturing process. This leads to improved product quality, increased efficiency, and substantial cost savings.

Let’s walk through this scenario with an implementation example.

Prerequisites

For this walkthrough, you should have the following:

Jupyter Notebook – The example has been tested in Visual Studio Code with Jupyter Notebook using the Python 3.11.7 environment.
An AWS account.
Create an AWS Identity and Access Management (IAM) user with the AmazonSageMakerFullAccess policy to enable you to run SageMaker APIs. Set up your security credentials for the AWS CLI.
Install AWS Command Line Interface (AWS CLI) and use aws configure to set up your IAM credentials securely.
Create a role with the name sagemakerrole to be assumed by SageMaker. Add the AmazonS3FullAccess managed policy to give SageMaker access to your S3 buckets.
Make sure your account has the SageMaker Training resource type limit for ml.g5.2xlarge increased to 1 using the Service Quotas console.
Follow the get started instructions to install the necessary Qualcomm AI Hub library and set up your unique API token for Qualcomm AI Hub.
Use the following command to clone the GitHub repository with the assets for this use case. This repository consists of a notebook that references training assets.
```
$ git clone https://github.com/aws-samples/sm-qai-hub-examples.git
$ cd sm-qai-hub-examples/yolo
```

The sm-qai-hub-examples/yolo directory contains all the training scripts that you might need to deploy this sample.

Next, you will run the sagemaker_qai_hub_finetuning.ipynb notebook to fine-tune the YOLOv8 model on SageMaker and deploy it on the edge using AI Hub. See the notebook for more details on each step. In the following sections, we walk you through the key components of fine-tuning the model.

Step 1: Access the model and data

Begin by installing the necessary packages in your Python environment. At the top of the notebook, include the following code snippet, which uses Python’s pip package manager to install the required packages in your local runtime environment.
```
%pip install -Uq sagemaker==2.232.0 ultralytics==8.2.100 datasets==2.18.0
```
Import the necessary libraries for the project. Specifically, import the Dataset class from the Hugging Face datasets library and the YOLO class from the ultralytics library. These libraries are crucial for your work, because they provide the tools you need to access and manipulate the dataset and work with the YOLO object detection model.
```
from datasets import Dataset

from ultralytics import YOLO
```

Step 2: Pre-process and upload data to S3

To fine-tune your YOLOv8 model for detecting PCB defects, you will use the keremberke/pcb-defect-segmentation dataset from Hugging Face. This dataset includes 189 images of chip defects (train: 128 images, validation: 25 images and test: 36 images). These defects are annotated in COCO format.

YOLOv8 doesn’t recognize these classes out of the box, so you will map YOLOv8’s logits to identify these classes during model fine-tuning, as shown in the following image.

Begin by downloading the dataset from Hugging Face to the local disk and converting it to the required YOLO dataset structure using the utility function CreateYoloHFDataset. This structure ensures that the YOLO API correctly loads and processes the images and labels during the training phase.
```
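# CreateYoloHFDataset is a utility from the sample repository that downloads the
# Hugging Face dataset and converts it to the YOLO directory structure
dataset_name = "keremberke/pcb-defect-segmentation"
dataset_labels = [
    'dry_joint', 
    'incorrect_installation', 
    'pcb_damage', 
    'short_circuit'
]

data = CreateYoloHFDataset(
    hf_dataset_name=dataset_name, 
    labels_names=dataset_labels
)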
```
Upload the dataset to Amazon S3. This step is crucial because the dataset stored in S3 will serve as the input data channel for the SageMaker training job. SageMaker will efficiently manage the process of distributing this data across the training cluster, allowing each node to access the necessary information for model training.
```
import sagemaker

# data_path (local dataset folder) and s3_bucket are defined in the notebook
uploaded_s3_uri = sagemaker.s3.S3Uploader.upload(
    local_path=data_path, 
    desired_s3_uri=f"s3://{s3_bucket}/qualcomm-aihub..."
)
```

Alternatively, you can use your own custom dataset (non-Hugging Face) to fine-tune the YOLOv8 model, as long as the dataset complies with the YOLOv8 dataset format.
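The source annotations are in COCO format (pixel [x_min, y_min, width, height]), while YOLO label files use normalized center coordinates, so a custom dataset usually needs a conversion step. The following is a minimal sketch of that conversion and of a minimal Ultralytics dataset YAML. It is not the repository’s CreateYoloHFDataset utility, and the images/train, images/val, and images/test folder layout is an assumption.

```python
def coco_to_yolo(bbox, img_width, img_height):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to YOLO's
    normalized [x_center, y_center, width, height], each in the range 0 to 1."""
    x_min, y_min, w, h = bbox
    return [
        (x_min + w / 2) / img_width,
        (y_min + h / 2) / img_height,
        w / img_width,
        h / img_height,
    ]


def yolo_label_line(class_id, bbox, img_width, img_height):
    """One line of a YOLO .txt label file: class index, then the box."""
    coords = coco_to_yolo(bbox, img_width, img_height)
    return " ".join([str(class_id)] + [f"{c:.6f}" for c in coords])


def dataset_yaml(root, labels):
    """Minimal dataset YAML text; labels are read from the parallel labels/ folders."""
    lines = [f"path: {root}", "train: images/train", "val: images/val",
             "test: images/test", "names:"]
    lines += [f"  {i}: {name}" for i, name in enumerate(labels)]
    return "\n".join(lines) + "\n"
```

Each image gets one .txt file with one yolo_label_line per object, and the class indexes must follow the same order as the names list in the YAML (for example, the dataset_labels list above).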

Step 3: Fine-tune your YOLOv8 model

3.1: Review the training script

You’re now prepared to fine-tune the model using the model.train method from the Ultralytics YOLO library.

We’ve prepared a script called train_yolov8.py that will perform the following tasks. Let’s quickly review the key points in this script before you launch the training job.

The training script does the following.

First, it loads a YOLOv8 model from the Ultralytics library:
```
model = YOLO(args.yolov8_model)
```
Next, it uses the train method to fine-tune the model on your dataset, adjusting the model’s parameters to improve its ability to predict object classes and locations in images:
```
tuned_model = model.train(
    data=dataset_yaml,        # path to the YOLO dataset YAML
    batch=args.batch_size,
    imgsz=args.img_size,
    epochs=args.epochs,
    # ... (additional training arguments omitted)
)
```

After the model is trained, the script runs inference to test the model output and saves the model artifacts to a local folder that is mapped to Amazon S3:

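```
# Evaluate the fine-tuned model on the validation split defined in dataset_yaml
results = model.val(
    data=dataset_yaml,
    imgsz=args.img_size,
    batch=args.batch_size
)

# Save the fine-tuned weights; replace <model_name> with your file name
model.save("<model_name>.pt")
```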

3.2: Launch the training

You’re now ready to launch the training. You will use the SageMaker PyTorch training estimator to initiate training. The estimator simplifies the training process by automating several of the key tasks in this example:

The SageMaker estimator spins up a training cluster of one ml.g5.2xlarge instance. SageMaker handles the setup and management of these compute instances, which reduces the total cost of ownership.
The estimator also uses one of the prebuilt PyTorch containers managed by SageMaker, which includes an optimized, compiled version of the PyTorch framework along with its required dependencies and GPU-specific libraries for accelerated computation.

The estimator.fit() method starts the training process with the specified input data channels. The following code launches the training job with the necessary parameters.

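```
import sagemaker
from sagemaker.pytorch import PyTorch

# role, instance_count, instance_type, training_image_uri, hyperparameters,
# s3_bucket, and uploaded_s3_uri are defined earlier in the notebook
estimator = PyTorch(
    entry_point='train_yolov8.py',    # training script reviewed in step 3.1
    source_dir='scripts',             # local folder uploaded with the job
    role=role,
    instance_count=instance_count,
    instance_type=instance_type,
    image_uri=training_image_uri,     # prebuilt SageMaker PyTorch training image
    hyperparameters=hyperparameters,  # passed to the script as command-line arguments
    base_job_name="yolov8-finetuning",
    output_path=f"s3://{s3_bucket}/…"
)

# The 'training' channel is copied from S3 to each instance before training starts
estimator.fit(
    {
        'training': sagemaker.inputs.TrainingInput(
            s3_data=uploaded_s3_uri,
            distribution='FullyReplicated',
            s3_data_type='S3Prefix'
        )
    }
)
```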

You can track a SageMaker training job by monitoring its status using the AWS Management Console, AWS CLI, or AWS SDKs. To determine when the job is completed, check for the Completed status or set up Amazon CloudWatch alarms to notify you when the job transitions to the Completed state.
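By default, estimator.fit() blocks until the job finishes. If you launch with wait=False, or check on the job from another process, a polling loop such as the following sketch works with a boto3 SageMaker client, which provides describe_training_job. The helper name, polling interval, and timeout are assumptions. The sleep function is injectable so the loop can be tested without AWS.

```python
import time

# DescribeTrainingJob statuses after which the job will not change again.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(sm_client, job_name, poll_seconds=60,
                          timeout_seconds=6 * 3600, sleep=time.sleep):
    """Poll a training job until it reaches a terminal status.

    Returns "Completed" or "Stopped"; raises RuntimeError if the job failed
    and TimeoutError if it is still running after timeout_seconds.
    """
    waited = 0
    while True:
        desc = sm_client.describe_training_job(TrainingJobName=job_name)
        status = desc["TrainingJobStatus"]
        if status == "Failed":
            reason = desc.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Training job {job_name} failed: {reason}")
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Training job {job_name} still {status} after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds
```

For example: wait_for_training_job(boto3.client("sagemaker"), estimator.latest_training_job.name).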

Steps 4 and 5: Save, download, and validate the trained model

The training process generates model artifacts that will be saved to the S3 bucket specified in output_path location. This example uses the download_tar_and_untar utility to download the model to a local drive.
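The repository’s download_tar_and_untar utility isn’t reproduced here. As a rough equivalent, the following sketch downloads a model.tar.gz from an S3 URI with a boto3 S3 client and extracts it. The function name download_and_extract is hypothetical, and it uses tarfile’s data filter, when available, to reject unsafe archive paths.

```python
import tarfile
from pathlib import Path
from urllib.parse import urlparse


def download_and_extract(s3_client, s3_uri, dest_dir):
    """Download a .tar.gz model archive from S3 and extract it into dest_dir.

    Returns the sorted names of the extracted top-level entries.
    """
    parsed = urlparse(s3_uri)  # s3://bucket/key -> netloc="bucket", path="/key"
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / Path(key).name
    s3_client.download_file(bucket, key, str(archive))
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Available in Python 3.12 and in security releases of 3.8 to 3.11.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(p.name for p in dest.iterdir() if p != archive)
```

After training, estimator.model_data typically holds the S3 URI of the model archive, so a call could look like download_and_extract(boto3.client("s3"), estimator.model_data, "model").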

Run inference on this model and visually check how closely the ground-truth and predicted bounding boxes align on test images. The following code generates an image mosaic using a custom utility function, draw_bounding_boxes, that overlays each image with the ground truth and the model’s predicted classes, along with a confidence value for each class prediction.

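```
import numpy as np

# model, image_label_pairs, and draw_bounding_boxes come from earlier notebook cells
image_mosaics = []
for _key in image_label_pairs:
    img_path = image_label_pairs[_key]["image_path"]
    lbl_path = image_label_pairs[_key]["label_path"]
    # Run the fine-tuned model on a single test image
    result = model([img_path], save=False)
    # Read the ground-truth YOLO label lines for the same image
    with open(lbl_path) as f:
        ground_truth = f.read().splitlines()
    # Overlay ground-truth and predicted boxes on the image
    image_with_boxes = draw_bounding_boxes(
        yolo_result=result[0],
        ground_truth=ground_truth,
        confidence_threshold=0.2
    )
    image_mosaics.append(np.array(image_with_boxes))
```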

From the preceding image mosaic, you can observe two distinct sets of bounding boxes: the cyan boxes indicate human annotations of defects on the PCB image, while the red boxes represent the model’s predictions of defects. Along with the predicted class, you can also see the confidence value for each prediction, which reflects the quality of the YOLOv8 model’s output.

After fine-tuning, YOLOv8 begins to accurately predict the PCB defect classes present in the custom dataset, even though it hadn’t encountered these classes during model pretraining. Additionally, the predicted bounding boxes are closely aligned with the ground truth, with confidence scores of greater than or equal to 0.5 in most cases. You can further improve the model’s performance without the need for hyperparameter guesswork by using a SageMaker hyperparameter tuning job.
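To go beyond visual inspection, you could quantify alignment with intersection over union (IoU). The following sketch computes IoU for boxes in YOLO’s normalized format and a simplified match rate: the fraction of ground-truth boxes that have a same-class prediction at an IoU of 0.5 or above. The helper names are hypothetical. Unlike standard detection metrics, the sketch does not enforce one-to-one matching; Ultralytics’ model.val() reports mAP for a full evaluation.

```python
def yolo_to_corners(box):
    """[x_center, y_center, width, height] -> (x_min, y_min, x_max, y_max)."""
    xc, yc, w, h = box
    return xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


def iou(box_a, box_b):
    """Intersection over union of two boxes in YOLO normalized format."""
    ax1, ay1, ax2, ay2 = yolo_to_corners(box_a)
    bx1, by1, bx2, by2 = yolo_to_corners(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def match_rate(predictions, ground_truth, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by a same-class prediction.

    Each item is (class_id, box). A prediction may match several ground-truth
    boxes in this simplified sketch.
    """
    if not ground_truth:
        return 1.0
    matched = sum(
        any(pc == gc and iou(pb, gb) >= iou_threshold for pc, pb in predictions)
        for gc, gb in ground_truth
    )
    return matched / len(ground_truth)
```

The predicted boxes can be read from result[0].boxes in the preceding loop, and the ground-truth boxes from the label lines, so the same test images yield a number you can track across training runs.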

Step 6: Run the model on a real device with Qualcomm AI Hub

Now that you’ve validated the fine-tuned model on PyTorch, you want to run the model on a real device.

Qualcomm AI Hub enables you to do the following:

Compile and optimize the PyTorch model into a format that can be run on a device
Run the compiled model on a device with a Snapdragon processor hosted in AWS device farm
Verify on-device model accuracy
Measure on-device model latency

To run the model:

Compile the model.

The first step is converting the PyTorch model into a format that can run on the device.

This example uses a Windows laptop powered by the Snapdragon X Elite processor. This device uses the ONNX model format, which you will configure during compilation.

As you get started, you can list all the devices supported on Qualcomm AI Hub by running qai-hub list-devices.

See Compiling Models to learn more about compilation on Qualcomm AI Hub.

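```
import qai_hub as hub

# traced_model, model_input, target_device, and model_name are defined
# earlier in the notebook
compile_job = hub.submit_compile_job(
    model=traced_model,
    input_specs={"image": (model_input.shape, "float32")},
    device=target_device,
    name=model_name,
    options="--target_runtime onnx"  # compile for ONNX Runtime on the target device
)
```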

Run inference on a real device

Run the compiled model on a real cloud-hosted device with Snapdragon using the same model input you verified locally with PyTorch.

See Running Inference to learn more about on-device inference on Qualcomm AI Hub.

```
inference_job = hub.submit_inference_job(
    model=compile_job.get_target_model(),
    inputs={"image": [model_input.numpy()]},
    device=target_device,
    name=model_name,
)
```

Profile the model on a real device.

Profiling measures the latency of the model when run on a device. It reports the minimum value over 100 invocations of the model to best isolate model inference time from other processes on the device.

See Profiling Models to learn more about profiling on Qualcomm AI Hub.

```
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=target_device,
    name=model_name,
)
```
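To sanity-check latency on your development machine before profiling on device, you can apply the same idea locally: repeat the call many times and report the minimum. The following sketch is a hypothetical profile_latency helper, not how Qualcomm AI Hub implements profiling. It also reports the median, because the minimum alone hides typical-case variance.

```python
import statistics
import time


def profile_latency(fn, runs=100, warmup=5):
    """Call fn repeatedly and summarize wall-clock latency in milliseconds.

    The minimum is the sample least affected by other work on the machine,
    so it best approximates pure inference time; the median shows a typical call.
    """
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization before timing
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"min_ms": min(samples), "median_ms": statistics.median(samples), "runs": runs}
```

For example, assuming traced_model accepts model_input directly, profile_latency(lambda: traced_model(model_input)) times the local PyTorch model with the same input you send to AI Hub, which gives a baseline for comparing the on-device numbers.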

Deploy the compiled model to your device

Run the command below to download the compiled model.

The compiled model can be used with the AI Hub sample application. This application uses the model to run object detection on a Snapdragon-powered Windows laptop that you have locally.

```
compile_job.download_target_model()
```

Conclusion

Model customization with your own data through Amazon SageMaker, with over 250 models available on SageMaker JumpStart, complements the existing features of Qualcomm AI Hub, which include BYOM and access to a growing library of over 100 pre-optimized models. Together, these features create a rich environment for developers aiming to build and deploy customized on-device AI models across Snapdragon and Qualcomm platforms.

The collaboration between Amazon SageMaker and Qualcomm AI Hub will help enhance the user experience and streamline machine learning workflows, enabling more efficient model development and deployment across any application at the edge. With this effort, Qualcomm Technologies and AWS are empowering their users to create more personalized, context-aware, and privacy-focused AI experiences.

To learn more, visit Qualcomm AI Hub and Amazon SageMaker. For queries and updates, join the Qualcomm AI Hub community on Slack.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. or its subsidiaries.

About the authors

Rodrigo Amaral currently serves as the Lead for Qualcomm AI Hub Marketing at Qualcomm Technologies, Inc. In this role, he spearheads go-to-market strategies, product marketing, and developer activities, with a focus on AI and ML for edge devices. He brings almost a decade of experience in AI, complemented by a strong background in business. Rodrigo holds a BA in Business and a Master’s degree in International Management.

Ashwin Murthy is a Machine Learning Engineer working on Qualcomm AI Hub. He works on adding new models to the public AI Hub Models collection, with a special focus on quantized models. He previously worked on machine learning at Meta and Groq.

Meghan Stronach is a PM on Qualcomm AI Hub. She works to support our external community and customers, delivering new features across Qualcomm AI Hub and enabling adoption of ML on device. Born and raised in the Toronto area, she graduated from the University of Waterloo in Management Engineering and has spent her time at companies of various sizes.

Kanwaljit Khurmi is a Principal Generative AI/ML Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them improve the value of their solutions when using AWS. Kanwaljit specializes in helping customers with containerized and machine learning applications.

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state-of-the-art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Karan Jain is a Senior Machine Learning Specialist at AWS, where he leads the worldwide Go-To-Market strategy for Amazon SageMaker Inference. He helps customers accelerate their generative AI and ML journey on AWS by providing guidance on deployment, cost-optimization, and GTM strategy. He has led product, marketing, and business development efforts across industries for over 10 years, and is passionate about mapping complex service features to customer solutions.

Enhancing repository-level code completion with selective retrieval

October 17, 2024

by Amazon AWS

A self-supervised method for learning when to retrieve contextual information from a code repository speeds up code completion by 70% while increasing accuracy.

Using Amazon Q Business with AWS HealthScribe to gain insights from patient consultations

October 17, 2024

by Laura Salinas Amazon AWS

With the advent of generative AI and machine learning, new opportunities for improvement have opened up across industries and processes. During re:Invent 2023, we launched AWS HealthScribe, a HIPAA-eligible service that empowers healthcare software vendors to build clinical applications that use speech recognition and generative AI to automatically create preliminary clinician documentation. In addition to AWS HealthScribe, we also launched Amazon Q Business, a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on the data and information in your enterprise systems.

AWS HealthScribe combines speech recognition and generative AI trained specifically for healthcare documentation to accelerate clinical documentation and enhance the consultation experience.

Key features of AWS HealthScribe include:

Rich consultation transcripts with word-level timestamps.
Speaker role identification (clinician or patient).
Transcript segmentation into relevant sections such as subjective, objective, assessment, and plan.
Summarized clinical notes for sections such as chief complaint, history of present illness, assessment, and plan.
Evidence mapping that references the original transcript for each sentence in the AI-generated notes.
Extraction of structured medical terms for entries such as conditions, medications, and treatments.

AWS HealthScribe provides a suite of AI-powered features to streamline clinical documentation while maintaining security and privacy. It doesn’t retain audio or output text, and users have control over data storage with encryption in transit and at rest.

With Amazon Q Business, we provide a new generative AI-powered assistant designed specifically for business and workplace use cases. It can be customized and integrated with an organization’s data, systems, and repositories. Through its AI capabilities, Amazon Q lets users have conversations, solve problems, generate content, gain insights, and take actions. Amazon Q offers user-based pricing plans tailored to how the product is used. It can adapt interactions based on individual user identities, roles, and permissions within the organization. Importantly, AWS never uses customer content from Amazon Q to train its underlying AI models, so company information remains private and secure.

In this blog post, we show you how AWS HealthScribe and Amazon Q Business work together to analyze patient consultations and provide summaries and trends from clinician conversations, simplifying documentation workflows. Applying this automation and machine learning to clinician-patient interactions with AWS HealthScribe and Amazon Q can help improve patient outcomes by enhancing communication, leading to more personalized care for patients and increased efficiency for clinicians.

Benefits and use cases

Gaining insight from patient-clinician interactions alongside a chatbot can help in a variety of ways such as:

Enhanced communication: By analyzing consultations with AWS HealthScribe, clinicians can more readily identify patterns and trends across large patient datasets, which can help improve communication between clinicians and patients. For example, a clinician could identify common trends in their patients’ symptoms and consider them in new consultations.
Personalized care: Using machine learning, clinicians can tailor their care to individual patients by analyzing the specific needs and concerns of each patient. This can lead to more personalized and effective care.
Streamlined workflows: Clinicians can use machine learning to help streamline their workflows by automating tasks such as appointment scheduling and consultation summarization. This can give clinicians more time to focus on providing high-quality care to their patients. An example would be using clinician summaries together with agentic workflows to perform these tasks on a routine basis.

Architecture diagram

The architecture diagram we present for this demo shows two user workflows. To kick off the first, a clinician uploads the recording of a consultation to Amazon Simple Storage Service (Amazon S3). AWS HealthScribe then ingests this audio file, analyzes the consultation conversation, and writes two output files, which are also stored in Amazon S3. In the second workflow, an authenticated user signs in through AWS IAM Identity Center to an Amazon Q web front end hosted by Amazon Q Business. In this scenario, Amazon Q Business uses the output S3 bucket as the data source for its web app.

Prerequisites

AWS IAM Identity Center will be used as the SAML 2.0-compliant identity provider (IdP). You’ll need to enable an IAM Identity Center instance. Under this instance, be sure to provision a user with a valid email address because this will be the user you will use to sign in to Amazon Q Business. For more details, see Configure user access with the default IAM Identity Center directory.
Amazon Simple Storage Service (Amazon S3) buckets that will be the input and output buckets for the clinician-patient conversations and AWS HealthScribe.

Implementation

To start using AWS HealthScribe you must first start a transcription job that takes a source audio file and outputs summary and transcription JSON files with the analyzed conversation. You’ll then connect these output files to Amazon Q.

Creating the AWS HealthScribe job

In the AWS HealthScribe console, choose Transcription jobs in the navigation pane, and then choose Create job to get started.
Enter a name for the job—in this example, we use FatigueConsult—and select the S3 bucket where the audio file of the clinician-patient conversation is stored.
Next, use the S3 URI search field to select the Amazon S3 bucket where you want the transcription job to save its output files. Keep the default options for audio settings, customization, and content removal.
Create a new AWS Identity and Access Management (IAM) role for AWS HealthScribe to use for access to the S3 input and output buckets by choosing Create an IAM role. In our example, we entered HealthScribeRole as the Role name. To complete the job creation, choose Create job.
This will take a few minutes to finish. When it’s complete, you will see the status change from In Progress to Complete and can inspect the results by selecting the job name.
AWS HealthScribe will create two files: a word-for-word transcript of the conversation with the suffix /transcript.json and a summary of the conversation with the suffix /summary.json. This summary uses the underlying power of generative AI to highlight key topics in the conversation, extract medical terminology, and more.
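If you prefer to script the job rather than use the console, the following minimal sketch mirrors the preceding steps with the Amazon Transcribe client from the AWS SDK for Python (Boto3), which exposes HealthScribe through the StartMedicalScribeJob and GetMedicalScribeJob APIs. The helper names, the two-speaker setting, and the 15-second polling interval are our own assumptions; the client is passed in rather than created inside the functions.

```python
import time
from typing import Any, Callable, Dict

# Job states after which polling stops
TERMINAL_STATES = ("COMPLETED", "FAILED")


def build_scribe_job_request(
    job_name: str,
    media_uri: str,
    output_bucket: str,
    role_arn: str,
    max_speakers: int = 2,
) -> Dict[str, Any]:
    """Keyword arguments for the Amazon Transcribe StartMedicalScribeJob API."""
    return {
        "MedicalScribeJobName": job_name,
        "Media": {"MediaFileUri": media_uri},  # s3:// URI of the recording
        "OutputBucketName": output_bucket,
        "DataAccessRoleArn": role_arn,  # for example, the HealthScribeRole ARN
        # Assumption: one clinician and one patient, separated by speaker labels
        "Settings": {"ShowSpeakerLabels": True, "MaxSpeakerLabels": max_speakers},
    }


def run_scribe_job(
    client: Any,
    request: Dict[str, Any],
    poll_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start the job, then poll GetMedicalScribeJob until it completes or fails."""
    client.start_medical_scribe_job(**request)
    name = request["MedicalScribeJobName"]
    while True:
        job = client.get_medical_scribe_job(MedicalScribeJobName=name)["MedicalScribeJob"]
        if job["MedicalScribeJobStatus"] in TERMINAL_STATES:
            return job
        sleep(poll_seconds)
```

For example, you might call run_scribe_job(boto3.client("transcribe", region_name="us-east-1"), build_scribe_job_request("FatigueConsult", "s3://<input-bucket>/<recording>", "<output-bucket>", "<HealthScribeRole ARN>")), where the angle-bracket values are placeholders for your own resources. Injecting the client and the sleep function keeps the polling logic testable without calling AWS.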

In this workflow, AWS HealthScribe analyzes the patient-clinician conversation audio to:

Transcribe the consultation
Identify speaker roles (for example, clinician and patient)
Segment the transcript (for example, small talk, visit flow management, assessment, and treatment plan)
Extract medical terms (for example, medication name and medical condition name)
Summarize notes for key sections of the clinical document (for example, history of present illness and treatment plan)
Create evidence mapping (linking every sentence in the AI-generated note with corresponding transcript dialogues)
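To see what Amazon Q Business will index, you can load the two output files locally. The following sketch assumes the field names we understand HealthScribe to use (ClinicalDocumentation, Sections, SectionName, Summary, SummarizedSegment, and EvidenceLinks in summary.json; Conversation and TranscriptSegments in transcript.json), so check them against a file from your own job before relying on it. It groups summary sentences by section and follows each sentence’s evidence links back to the transcript segments, which is the evidence mapping described above. The function names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def summary_sections(summary_doc: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each clinical note section name to its summarized sentences."""
    sections = summary_doc["ClinicalDocumentation"]["Sections"]
    return {
        section["SectionName"]: [
            item["SummarizedSegment"].strip() for item in section.get("Summary", [])
        ]
        for section in sections
    }


def evidence_for(
    summary_doc: Dict[str, Any], transcript_doc: Dict[str, Any]
) -> List[Tuple[str, List[str]]]:
    """Pair each summary sentence with the transcript dialogue it cites."""
    # Index transcript segments by ID so evidence links can be resolved
    segments = {
        seg["SegmentId"]: seg["Content"]
        for seg in transcript_doc["Conversation"]["TranscriptSegments"]
    }
    pairs = []
    for section in summary_doc["ClinicalDocumentation"]["Sections"]:
        for item in section.get("Summary", []):
            cited = [
                segments[link["SegmentId"]]
                for link in item.get("EvidenceLinks", [])
                if link["SegmentId"] in segments  # skip links with no match
            ]
            pairs.append((item["SummarizedSegment"].strip(), cited))
    return pairs
```

You would load each file with json.load before passing the resulting dictionaries to these functions.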

Connecting an AWS HealthScribe job to Amazon Q

To use Amazon Q with the summarized notes and transcripts from AWS HealthScribe, we first need to create an Amazon Q Business application and set its data source to the S3 bucket where the HealthScribe job stored its output files. Amazon Q can then index the files, and users can ask questions about the data.

In the Amazon Q Business console, choose Get Started, then choose Create Application.
Enter a name for your application and select Create and use a new service-linked role (SLR).
Choose Create when you’re ready to select a data source.
In the Add data source pane select Amazon S3.
To configure the S3 bucket with Amazon Q, enter a name for the data source. In our example we use my-s3-bucket.
Next, locate the S3 bucket with the JSON outputs from HealthScribe using the Browse S3 button. Select Full sync for the sync mode and select a cadence of your preference. Once you complete these steps, Amazon Q Business will run a full sync of the objects in your S3 bucket and be ready for use.
In the main applications dashboard, navigate to the URL under Web experience URL. This is how you will access the Amazon Q web front end to interact with the assistant.
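With a scheduled cadence, new HealthScribe outputs reach Amazon Q only after the next sync. If you want them available sooner, a sketch like the following could request an on-demand sync through the Amazon Q Business StartDataSourceSyncJob API. The application, index, and data source IDs are placeholders you would copy from the console, and the helper name is our own.

```python
from typing import Any


def start_sync(client: Any, application_id: str, index_id: str, data_source_id: str) -> str:
    """Request an on-demand sync of the S3 data source and return its execution ID."""
    response = client.start_data_source_sync_job(
        applicationId=application_id,
        indexId=index_id,
        dataSourceId=data_source_id,
    )
    return response["executionId"]
```

Here client would be boto3.client("qbusiness", region_name="us-east-1"); calling start_sync after a HealthScribe job completes avoids waiting for the scheduled sync.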

After a user signs in to the web experience, they can start asking questions directly in the chat box as shown in the sample frontend that follows.

Sample frontend workflow

With the AWS HealthScribe results integrated into Amazon Q Business, users can go to the web experience to gain insights from their patient conversations. For example, you can use Amazon Q to identify trends in patient symptoms or check which medications patients are taking, as shown in the following figures.

The workflow starts with a question about issues patients had, as shown in the following figure. In this example, a clinician asks what symptoms were reported by patients who complained of stomach pain. Amazon Q responds with common symptoms, like bloating and bowel problems, from the data it has access to. Each answer cites the source files in Amazon S3 that it was based on, which you can inspect by choosing Sources.

In the following example, a clinician asks what medications patients with knee pain are taking. Using our sample data of various consultations for knee pain, Amazon Q reports that patients are taking over-the-counter ibuprofen, but that it often does not provide them relief.

This application can also help clinicians understand common trends in their patient data, such as asking what the common symptoms are for patients with chest pain.

In the final example for this post, a clinician asks Amazon Q whether there are common symptoms among patients complaining of knee and elbow pain. Amazon Q responds that both sets of patients describe their pain as exacerbated by movement, but that it cannot conclusively point to any common symptoms across both consultation types. In this case, Amazon Q stays grounded in the source data instead of hallucinating a connection.

Considerations

The UI for Amazon Q has limited customization. At the time of writing this post, the Amazon Q frontend cannot be embedded in other tools. Supported customization of the web experience includes the addition of a title and subtitle, adding a welcome message, and displaying sample prompts. For updates on web experience customizations, see Customizing an Amazon Q Business web experience. If this kind of customization is critical to your application and business needs, you can explore custom large language model chatbot designs using Amazon Bedrock or Amazon SageMaker.

AWS HealthScribe uses conversational and generative AI to transcribe patient-clinician conversations and generate clinical notes. The results produced by AWS HealthScribe are probabilistic and might not always be accurate because of various factors, including audio quality, background noise, speaker clarity, the complexity of medical terminology, and context-specific language nuances. AWS HealthScribe is designed to be used in an assistive role for clinicians and medical scribes rather than as a substitute for their clinical expertise. As such, AWS HealthScribe output should not be employed to fully automate clinical documentation workflows, but rather to provide additional assistance to clinicians or medical scribes in their documentation process. Make sure that your application provides a workflow for reviewing the clinical notes produced by AWS HealthScribe and sets the expectation that human review is required before clinical notes are finalized.

Amazon Q Business uses machine learning models that generate predictions based on patterns in data, and generate insights and recommendations from your content. Outputs are probabilistic and should be evaluated for accuracy as appropriate for your use case, including by employing human review of the output. You and your users are responsible for all decisions made, advice given, actions taken, and failures to take action based on your use of these features.

This proof of concept can also be extended into a patient-facing application, in which patients review their own conversations with physicians and access their medical records and consultation notes, making it easy for them to ask questions about the trends and data in their own medical history.

At the time of writing, AWS HealthScribe supports only US English and is available only in the US East (N. Virginia) Region. Amazon Q Business is available only in the US East (N. Virginia) and US West (Oregon) Regions.

Clean up

To ensure that you don’t continue to accrue charges from this solution, you must complete the following clean-up steps.

AWS HealthScribe

Navigate to the AWS HealthScribe console and choose Transcription jobs. Select the HealthScribe jobs you want to clean up and choose Delete at the top right of the console page.

Amazon S3

To clean up your Amazon S3 resources, navigate to the Amazon S3 console and choose the buckets that you used or created while following this post. To empty each bucket, follow the instructions for Emptying a bucket. After a bucket is empty, you can delete it.

Amazon Q Business

To delete your Amazon Q Business application, follow the instructions on Managing Amazon Q Business applications.
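The clean-up steps can also be scripted. The following sketch assumes Boto3 clients for Amazon Transcribe, Amazon S3, and Amazon Q Business are passed in, and that versioning is disabled on the buckets (a versioned bucket also needs its object versions and delete markers removed before it can be deleted). The helper names are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def delete_scribe_jobs(transcribe: Any, job_names: Iterable[str]) -> None:
    """Delete the named HealthScribe jobs."""
    for name in job_names:
        transcribe.delete_medical_scribe_job(MedicalScribeJobName=name)


def empty_and_delete_bucket(s3: Any, bucket: str) -> int:
    """Delete every object in an unversioned bucket, then the bucket itself."""
    keys: List[Dict[str, str]] = []
    kwargs: Dict[str, str] = {"Bucket": bucket}
    # Collect all keys first, following continuation tokens across pages
    while True:
        page = s3.list_objects_v2(**kwargs)
        keys.extend({"Key": obj["Key"]} for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
    # DeleteObjects accepts at most 1,000 keys per request
    for start in range(0, len(keys), 1000):
        s3.delete_objects(
            Bucket=bucket, Delete={"Objects": keys[start:start + 1000], "Quiet": True}
        )
    s3.delete_bucket(Bucket=bucket)
    return len(keys)


def delete_q_application(qbusiness: Any, application_id: str) -> None:
    """Delete the Amazon Q Business application."""
    qbusiness.delete_application(applicationId=application_id)
```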

Conclusion

In this post, we discussed how you can use AWS HealthScribe with Amazon Q Business to create a chatbot that quickly surfaces insights from patient-clinician conversations. To learn more, reach out to your AWS account team or check out the links that follow.

About the Authors

Laura Salinas is a Startup Solution Architect supporting customers whose core business involves machine learning. She is passionate about guiding her customers on their cloud journey and finding solutions that help them innovate. Outside of work she loves boxing, watching the latest movie at the theater and playing competitive dodgeball.

Tiffany Chen is a Solutions Architect on the CSC team at AWS. She has supported AWS customers with their deployment workloads and currently works with Enterprise customers to build well-architected and cost-optimized solutions. In her spare time, she enjoys traveling, gardening, baking, and watching basketball.

Art Tuazon is a Partner Solutions Architect focused on enabling AWS Partners through technical best practices and is passionate about helping customers build on AWS. In her free time, she enjoys running and cooking.

Winnie Chen is a Solutions Architect currently on the CSC team at AWS supporting greenfield customers. She supports customers across industries and of all sizes, from small and medium businesses to enterprises. She has helped customers migrate and build their infrastructure on AWS. In her free time, she enjoys traveling and spending time outdoors through activities like hiking, biking, and rock climbing.

Use Amazon SageMaker Studio with a custom file system in Amazon EFS

October 17, 2024

by Irene Arroyo Delgado Amazon AWS

Amazon SageMaker Studio is the latest web-based experience for running end-to-end machine learning (ML) workflows. SageMaker Studio offers a suite of integrated development environments (IDEs), including JupyterLab, Code Editor, and RStudio. Data scientists and ML engineers can spin up SageMaker Studio private and shared spaces, which manage the storage and resource needs of the JupyterLab and Code Editor applications, let you stop the applications when they’re not in use to save on compute costs, and resume work from where you stopped.

The storage resources for SageMaker Studio spaces are Amazon Elastic Block Store (Amazon EBS) volumes, which offer low-latency access to user data like notebooks, sample data, or Python/Conda virtual environments. However, in several scenarios it’s convenient to use a distributed file system shared across private JupyterLab and Code Editor spaces, which you can enable by configuring an Amazon Elastic File System (Amazon EFS) file system in SageMaker Studio. Amazon EFS provides a scalable, fully managed, elastic NFS file system for AWS compute instances.

Amazon SageMaker supports automatically mounting a folder in an EFS volume for each user in a domain. Using this folder, users can share data between their own private spaces. However, users can’t share data with other users in the domain; they only have access to their own folder user-default-efs in the $HOME directory of the SageMaker Studio application.

In this post, we explore three distinct scenarios that demonstrate the versatility of integrating custom Amazon EFS with SageMaker Studio.

For further information on configuring Amazon EFS in SageMaker Studio, refer to Attaching a custom file system to a domain or user profile.

Solution overview

In the first scenario, an AWS infrastructure admin wants to set up an EFS file system that can be shared across the private spaces of a given user profile in SageMaker Studio. This means that each user within the domain has their own private directory on the EFS file system, where they can store and access their own data and files. The automation described in this post enables new team members who join the data science team to quickly set up their private directory on the EFS file system and access the resources they need to start contributing to the ongoing project.

The following diagram illustrates this architecture.

This scenario offers the following benefits:

Individual data storage and analysis – Users can store their personal datasets, models, and other files in their private spaces, allowing them to work on their own projects independently. Segregation is made by their user profile.
Centralized data management – The administrator can manage the EFS file system centrally, maintaining data security, backup, and direct access for all users. By setting up an EFS file system with a private space, users can effortlessly track and maintain their work.
Cross-instance file sharing – Users can access their files from multiple SageMaker Studio spaces, because the EFS file system provides a persistent storage solution.

The second scenario is related to the creation of a single EFS directory that is shared across all the spaces of a given SageMaker Studio domain. This means that all users within the domain can access and use the same shared directory on the EFS file system, allowing for better collaboration and centralized data management (for example, to share common artifacts). This is a more generic use case, because there is no specific segregated folder for each user profile.

The following diagram illustrates this architecture.

This scenario offers the following benefits:

Shared project directories – Suppose the data science team is working on a large-scale project that requires collaboration among multiple team members. By setting up a shared EFS directory at project level, the team can collaborate on the same projects by accessing and working on files in the shared directory. The data science team can, for example, use the shared EFS directory to store their Jupyter notebooks, analysis scripts, and other project-related files.
Simplified file management – Users don’t need to manage their own private file storage, because they can rely on the shared directory for their file-related needs.
Improved data governance and security – The shared EFS directory, being centrally managed by the AWS infrastructure admin, can provide improved data governance and security. The admin can implement access controls and other data management policies to maintain the integrity and security of the shared resources.

The third scenario explores the configuration of an EFS file system that can be shared across multiple SageMaker Studio domains within the same VPC. This allows users from different domains to access and work with the same set of files and data, enabling cross-domain collaboration and centralized data management.

The following diagram illustrates this architecture.

This scenario offers the following benefits:

Enterprise-level data science collaboration – Imagine a large organization with multiple data science teams working on various projects across different departments or business units. By setting up a shared EFS file system accessible across the organization’s SageMaker Studio domains, these teams can collaborate on cross-functional projects, share artifacts, and use a centralized data repository for their work.
Shared infrastructure and resources – The EFS file system can be used as a shared resource across multiple SageMaker Studio domains, promoting efficiency and cost-effectiveness.
Scalable data storage – As the number of users or domains increases, the EFS file system automatically scales to accommodate the growing storage and access requirements.
Data governance – The shared EFS file system, being managed centrally, can be subject to stricter data governance policies, access controls, and compliance requirements. This can help the organization meet regulatory and security standards while still enabling cross-domain collaboration and data sharing.

Prerequisites

This post provides an AWS CloudFormation template to deploy the main resources for the solution. In addition to this, the solution expects that the AWS account in which the template is deployed already has the following configuration and resources:

You should have a SageMaker Studio domain. Refer to Quick setup to Amazon SageMaker for instructions to set up a domain with default settings.
You should have an AWS CloudTrail log file that logs the SageMaker API CreateUserProfile. Refer to Creating a trail for your AWS account for additional information.
The CloudFormation resources are deployed in a virtual private cloud (VPC). Make sure the selected VPC allows outbound traffic through a NAT gateway and has proper routing to Amazon Simple Storage Service (Amazon S3) endpoints, which is required for AWS CloudFormation. Refer to How do I troubleshoot custom resource failures in CloudFormation? for additional information.
The CloudFormation template deploys an AWS Lambda function in a VPC. If the access to AWS services in the selected VPC is restricted using AWS PrivateLink, make sure the Lambda security group can connect to the interface VPC endpoints for SageMaker (API), Amazon EFS, and Amazon Elastic Compute Cloud (Amazon EC2). Refer to Connecting inbound interface VPC endpoints for Lambda for additional information.
You should have the necessary AWS Identity and Access Management permissions to deploy the CloudFormation template in your account.

Refer to Attaching a custom file system to a domain or user profile for additional prerequisites.

Configure an EFS directory shared across private spaces of a given user profile

In this scenario, an administrator wants to provision an EFS file system for all users of a SageMaker Studio domain, creating a private file system directory for each user. We can distinguish two use cases:

Create new SageMaker Studio user profiles – A new team member joins a preexisting SageMaker Studio domain and wants to attach a custom EFS file system to the JupyterLab or Code Editor spaces.
Use preexisting SageMaker Studio user profiles – A team member is already working on a specific SageMaker Studio domain and wants to attach a custom EFS file system to the JupyterLab or Code Editor spaces.

The solution provided in this post focuses on the first use case. We discuss how to adapt the solution for preexisting SageMaker Studio domain user profiles later in this post.

The following diagram illustrates the high-level architecture of the solution.

In this solution, we use CloudTrail, Amazon EventBridge, and Lambda to automatically create a private EFS directory when a new SageMaker Studio user profile is created. The high-level steps to set up this architecture are as follows:

Create an EventBridge rule that invokes the Lambda function when a new SageMaker user profile is created and logged in CloudTrail.
Create an EFS file system with an access point for the Lambda function and with a mount target in every Availability Zone in which the SageMaker Studio domain is located.
Use a Lambda function to create a private EFS directory with the required POSIX permissions for the profile. The function will also update the profile with the new file system configuration.

Deploy the solution using AWS CloudFormation

To use the solution, you can deploy the infrastructure using the following CloudFormation template. This template deploys three main resources in your account: Amazon EFS resources (file system, access points, mount targets), an EventBridge rule, and a Lambda function.

Refer to Create a stack from the CloudFormation console for additional information. The input parameters for this template are:

SageMakerDomainId – The SageMaker Studio domain ID that will be associated with the EFS file system.
SageMakerStudioVpc – The VPC associated with the SageMaker Studio domain.
SageMakerStudioSubnetId – One or multiple subnets associated with the SageMaker Studio domain. The template deploys its resources in these subnets.
SageMakerStudioSecurityGroupId – The security group associated with the SageMaker Studio domain. The template configures the Lambda function with this security group.
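If you deploy the stack from code instead of the console, the four input parameters can be read from the domain itself. This sketch assumes you pass in the response of the SageMaker DescribeDomain API, that the subnet parameter accepts a comma-separated string (as CloudFormation list-type parameters do), and that the domain’s default user security group is the one the template expects; verify these assumptions against the template before use. The function name is hypothetical.

```python
from typing import Any, Dict, List


def stack_parameters(domain: Dict[str, Any]) -> List[Dict[str, str]]:
    """Translate a DescribeDomain response into the template's input parameters."""
    security_groups = domain.get("DefaultUserSettings", {}).get("SecurityGroups", [])
    if not security_groups:
        raise ValueError("domain has no default security group; pass one explicitly")
    values = {
        "SageMakerDomainId": domain["DomainId"],
        "SageMakerStudioVpc": domain["VpcId"],
        # Assumption: a list-type parameter, supplied as comma-separated IDs
        "SageMakerStudioSubnetId": ",".join(domain["SubnetIds"]),
        # Assumption: the first default security group is the domain's group
        "SageMakerStudioSecurityGroupId": security_groups[0],
    }
    return [{"ParameterKey": key, "ParameterValue": value} for key, value in values.items()]
```

You could then pass the result as the Parameters argument of the CloudFormation create_stack call, together with the template body and, assuming the template creates an IAM role for its Lambda function, Capabilities=["CAPABILITY_IAM"].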

Amazon EFS resources

After you deploy the template, navigate to the Amazon EFS console and confirm that the EFS file system has been created. The file system has a mount target in every Availability Zone that your SageMaker domain connects to.

Note that each mount target uses the EC2 security group that SageMaker created in your AWS account when you first created the domain, which allows NFS traffic on port 2049. The provided template retrieves this security group automatically when it is first deployed, using a Lambda-backed custom resource.

You can also observe that the file system has an EFS access point. This access point grants root access on the file system for the Lambda function that will create the directories for the SageMaker Studio user profiles.

EventBridge rule

The second main resource is an EventBridge rule that is invoked when a new SageMaker Studio user profile is created. Its target is the Lambda function that creates the folder in the EFS file system and updates the newly created profile. The Lambda function receives the matched event as input, from which it reads the SageMaker Studio domain ID and the SageMaker user profile name.
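The rule’s event pattern and the extraction step can be sketched as follows. The pattern matches the CreateUserProfile call as delivered to EventBridge by CloudTrail. The requestParameters keys (domainId and userProfileName) reflect how we expect CloudTrail to record the request; treat them as an assumption and confirm them against a captured event. The constant and function names are hypothetical.

```python
from typing import Any, Dict, Tuple

# EventBridge pattern for CreateUserProfile calls recorded by CloudTrail
CREATE_USER_PROFILE_PATTERN: Dict[str, Any] = {
    "source": ["aws.sagemaker"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["sagemaker.amazonaws.com"],
        "eventName": ["CreateUserProfile"],
    },
}


def profile_from_event(event: Dict[str, Any]) -> Tuple[str, str]:
    """Return (domain_id, user_profile_name) from a matched event."""
    # Assumption: CloudTrail records the request parameters in camelCase
    params = event["detail"]["requestParameters"]
    return params["domainId"], params["userProfileName"]
```

To create the rule yourself, you would serialize the pattern with json.dumps and pass it as the EventPattern argument of the EventBridge put_rule call.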

Lambda function

Lastly, the template creates a Lambda function that creates a directory in the EFS file system with the required POSIX permissions for the user profile and updates the user profile with the new file system configuration.

At a POSIX permissions level, you can control which users can access the file system and which files or data they can access. The POSIX user and group ID for SageMaker apps are:

UID – The POSIX user ID. The default is 200001. A valid range is a minimum value of 10000 and maximum value of 4000000.
GID – The POSIX group ID. The default is 1001. A valid range is a minimum value of 1001 and maximum value of 4000000.

The Lambda function runs in the same VPC as the EFS file system and has the previously created file system and access point attached.
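A minimal sketch of the function’s core logic follows. It assumes the EFS access point is mounted at /mnt/efs and that its root directory is the file system root, so /mnt/efs/<profile> corresponds to the FileSystemPath /<profile>. It also checks the UID and GID against the ranges listed above before creating anything. The helper names are hypothetical, and the directory and ownership calls are injectable only so the logic can be exercised without a real mount.

```python
import os
from typing import Any, Callable

DEFAULT_UID = 200001  # default SageMaker POSIX user ID
DEFAULT_GID = 1001    # default SageMaker POSIX group ID


def validate_posix_ids(uid: int, gid: int) -> None:
    """Reject IDs outside the ranges SageMaker accepts."""
    if not 10000 <= uid <= 4000000:
        raise ValueError(f"UID {uid} is outside 10000-4000000")
    if not 1001 <= gid <= 4000000:
        raise ValueError(f"GID {gid} is outside 1001-4000000")


def provision_profile_dir(
    sm_client: Any,
    file_system_id: str,
    domain_id: str,
    profile_name: str,
    mount_root: str = "/mnt/efs",
    uid: int = DEFAULT_UID,
    gid: int = DEFAULT_GID,
    makedirs: Callable[..., None] = os.makedirs,
    chown: Callable[[str, int, int], None] = os.chown,
) -> str:
    """Create the profile's private directory and attach it to the profile."""
    validate_posix_ids(uid, gid)
    path = os.path.join(mount_root, profile_name)
    makedirs(path, exist_ok=True)  # safe if the event is delivered twice
    chown(path, uid, gid)          # hand the directory to the SageMaker user
    sm_client.update_user_profile(
        DomainId=domain_id,
        UserProfileName=profile_name,
        UserSettings={
            "CustomFileSystemConfigs": [
                {
                    "EFSFileSystemConfig": {
                        "FileSystemId": file_system_id,
                        "FileSystemPath": f"/{profile_name}",
                    }
                }
            ]
        },
    )
    return path
```

In the handler, domain_id and profile_name would come from the matched event, and file_system_id from the function’s configuration.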

Adapt the solution for preexisting SageMaker Studio domain user profiles

We can reuse the previous solution for scenarios in which the domain already has user profiles created. For that, you can create an additional Lambda function in Python that lists all the user profiles for the given SageMaker Studio domain and creates a dedicated EFS directory for each user profile.

The Lambda function should run in the same VPC as the EFS file system and have the previously created file system and access point attached. You need to add the efs_id and domain_id values as environment variables for the function.

You can include the following code as part of this new Lambda function and run it manually:

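import os
import boto3

sm_client = boto3.client('sagemaker')

# Default POSIX user and group IDs that SageMaker Studio apps run as
POSIX_UID = 200001
POSIX_GID = 1001

def lambda_handler(event, context):
    
    # Get EFS and Domain ID from the function's environment variables
    file_system = os.environ['efs_id']
    domain_id = os.environ['domain_id']
    
    # Get all Domain user profiles; ListUserProfiles is paginated,
    # so follow NextToken until every page has been read
    domain_users = []
    kwargs = {'DomainIdEquals': domain_id}
    while True:
        list_user_profiles_response = sm_client.list_user_profiles(**kwargs)
        domain_users.extend(list_user_profiles_response["UserProfiles"])
        next_token = list_user_profiles_response.get("NextToken")
        if not next_token:
            break
        kwargs['NextToken'] = next_token
    
    # Create directories for each user
    for user in domain_users:

        user_profile_name = user["UserProfileName"]

        # Create the user's directory on the EFS mount (no-op if it exists)
        # and give ownership to the SageMaker POSIX user and group
        repository = f'/mnt/efs/{user_profile_name}'
        os.makedirs(repository, exist_ok=True)
        os.chown(repository, POSIX_UID, POSIX_GID)
        
        # Point the SageMaker user profile at its private EFS directory
        sm_client.update_user_profile(
            DomainId=domain_id,
            UserProfileName=user_profile_name,
            UserSettings={
                'CustomFileSystemConfigs': [
                    {
                        'EFSFileSystemConfig': {
                            'FileSystemId': file_system,
                            'FileSystemPath': f'/{user_profile_name}'
                        }
                    }
                ]
            }
        )

    return {'updatedProfiles': [u["UserProfileName"] for u in domain_users]}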

Configure an EFS directory shared across all spaces of a given domain

In this scenario, an administrator wants to provision an EFS file system for all users of a SageMaker Studio domain, using the same file system directory for all the users.

To achieve this, in addition to the prerequisites described earlier in this post, you need to complete the following steps.

Create the EFS file system

The file system needs to be in the same VPC as the SageMaker Studio domain. Refer to Creating EFS file systems for additional information.
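Creating the file system can also be scripted. In Amazon EFS, the VPC is determined by the subnets of the file system’s mount targets rather than by the file system itself, so this sketch only creates the file system and waits until it is available. The encryption setting, Name tag, polling interval, and helper name are our own assumptions. Reusing the same CreationToken makes EFS reject a repeated request instead of creating a second file system.

```python
import time
from typing import Any, Callable


def create_shared_file_system(
    efs: Any,
    creation_token: str,
    name: str,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Create an encrypted EFS file system and wait until it is available."""
    file_system_id = efs.create_file_system(
        CreationToken=creation_token,  # a repeated token is rejected, not duplicated
        PerformanceMode="generalPurpose",
        Encrypted=True,
        Tags=[{"Key": "Name", "Value": name}],
    )["FileSystemId"]
    while True:
        described = efs.describe_file_systems(FileSystemId=file_system_id)
        state = described["FileSystems"][0]["LifeCycleState"]
        if state == "available":
            return file_system_id
        if state in ("deleting", "deleted", "error"):
            raise RuntimeError(f"{file_system_id} entered state {state}")
        sleep(poll_seconds)
```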

Add mount targets to the EFS file system

Before SageMaker Studio can access the new EFS file system, the file system must have a mount target in each of the subnets associated with the domain. You can find these subnets on the SageMaker Studio console under Network. For more information about assigning mount targets to subnets, see Managing mount targets.

Additionally, for each mount target, you must add the security group that SageMaker created in your AWS account when you created the SageMaker Studio domain. The security group name has the format security-group-for-inbound-nfs-<domain-id>, where <domain-id> is the ID of your domain.

The following screenshot shows an example of an EFS file system with two mount targets for a SageMaker Studio domain associated with two subnets. Note the security group associated with both mount targets.
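The security group lookup and the mount targets can be sketched with the Amazon EC2 and Amazon EFS APIs. The sketch assumes the security group name follows the security-group-for-inbound-nfs format given above, and that you pass the domain’s subnet IDs (for example, the SubnetIds field returned by DescribeDomain). Because EFS allows only one mount target per Availability Zone for a file system, it also assumes each subnet is in a different Availability Zone. The helper names are hypothetical.

```python
from typing import Any, Dict, Iterable


def nfs_security_group_id(ec2: Any, domain_id: str) -> str:
    """Look up the SageMaker-created NFS security group by name."""
    name = f"security-group-for-inbound-nfs-{domain_id}"
    groups = ec2.describe_security_groups(
        Filters=[{"Name": "group-name", "Values": [name]}]
    )["SecurityGroups"]
    if len(groups) != 1:
        raise LookupError(f"expected one security group named {name}, found {len(groups)}")
    return groups[0]["GroupId"]


def create_mount_targets(
    efs: Any, file_system_id: str, subnet_ids: Iterable[str], security_group_id: str
) -> Dict[str, str]:
    """Create one mount target per domain subnet; returns subnet ID -> mount target ID."""
    created = {}
    for subnet_id in subnet_ids:
        response = efs.create_mount_target(
            FileSystemId=file_system_id,
            SubnetId=subnet_id,
            SecurityGroups=[security_group_id],  # allows NFS traffic on port 2049
        )
        created[subnet_id] = response["MountTargetId"]
    return created
```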

Create an EFS access point

The Lambda function accesses the EFS file system as root using this access point. See Creating access points for additional information.
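An access point that lets the function act as root can be sketched as follows. Setting PosixUser to UID and GID 0 is what allows the function to create directories and change their ownership; the root directory of "/", the Name tag, and the helper name are assumptions. The returned ARN is what you attach to the Lambda function’s file system configuration.

```python
from typing import Any


def create_root_access_point(efs: Any, file_system_id: str, name: str) -> str:
    """Create an access point that acts as root at the file system root."""
    response = efs.create_access_point(
        FileSystemId=file_system_id,
        PosixUser={"Uid": 0, "Gid": 0},  # root, so the function can chown
        RootDirectory={"Path": "/"},     # expose the whole file system
        Tags=[{"Key": "Name", "Value": name}],
    )
    return response["AccessPointArn"]
```

For example, you could attach the access point with the Lambda update_function_configuration call, passing FileSystemConfigs=[{"Arn": access_point_arn, "LocalMountPath": "/mnt/efs"}] so that the mount path matches the one used in the function code.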

Create a new Lambda function

Define a new Lambda function with the name LambdaManageEFSUsers. This function updates the default space settings of the SageMaker Studio domain, configuring the file system settings to use a specific EFS file system shared repository path. This configuration is automatically applied to all spaces within the domain.

The Lambda function must run in the same VPC as the EFS file system, with the file system and access point you created earlier attached to it. The function code assumes the file system is mounted at the local path /mnt/efs. Additionally, you need to add efs_id and domain_id as environment variables for the function.
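As a sketch, the networking, file system, and environment settings can be applied in one UpdateFunctionConfiguration call. The subnet and security group IDs are placeholders you supply. The mount targets' security group must accept NFS (TCP 2049) traffic from the function's security group. The execution role also needs VPC networking permissions (for example, the AWSLambdaVPCAccessExecutionRole managed policy) and EFS client permissions such as elasticfilesystem:ClientMount and elasticfilesystem:ClientWrite. The function name configure_lambda_for_efs is hypothetical.

```python
def configure_lambda_for_efs(lambda_client, function_name, access_point_arn,
                             subnet_ids, security_group_ids, efs_id, domain_id):
    """Attach VPC networking, the EFS access point, and environment variables to a function."""
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        # Same VPC as the EFS mount targets, which must accept NFS from these security groups
        VpcConfig={"SubnetIds": subnet_ids, "SecurityGroupIds": security_group_ids},
        # Lambda mounts the access point here; the handler builds paths under /mnt/efs
        FileSystemConfigs=[{"Arn": access_point_arn, "LocalMountPath": "/mnt/efs"}],
        # Read by the handler through os.environ
        Environment={"Variables": {"efs_id": efs_id, "domain_id": domain_id}},
    )
```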

At a POSIX permissions level, you can control which users can access the file system and which files or data they can access. The POSIX user and group ID for SageMaker apps are:

UID – The POSIX user ID. The default is 200001.
GID – The POSIX group ID. The default is 1001.

The function updates the default space settings of the SageMaker Studio domain, configuring the EFS file system to be used by all users. See the following code:

import json
import os
import logging

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)
sm_client = boto3.client('sagemaker')

def lambda_handler(event, context):

    # Environment variables
    file_system = os.environ['efs_id']
    domain_id = os.environ['domain_id']

    # EFS directory name, under the function's local mount path /mnt/efs
    repository_name = 'shared_repository'
    repository = f'/mnt/efs/{repository_name}'

    # Create the directory (no error if it already exists) and give ownership
    # to the default SageMaker POSIX user (200001) and group (1001)
    os.makedirs(repository, exist_ok=True)
    os.chown(repository, 200001, 1001)

    # Update the SageMaker domain to enable access to the new directory
    sm_client.update_domain(
        DomainId=domain_id,
        DefaultUserSettings={
            'CustomFileSystemConfigs': [
                {
                    'EFSFileSystemConfig': {
                        'FileSystemId': file_system,
                        'FileSystemPath': f'/{repository_name}'
                    }
                }
            ]
        }
    )
    logger.info(f"Updated Studio Domain {domain_id} and EFS {file_system}")
    return {
        'statusCode': 200,
        'body': json.dumps(f"Created dir and modified permissions for Studio Domain {domain_id}")
    }

The execution role of the Lambda function needs to have permissions to update the SageMaker Studio domain:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:UpdateDomain"
            ],
            "Resource": "*"
        }
    ]
}

Configure an EFS directory shared across multiple domains under the same VPC

In this scenario, an administrator wants to provision an EFS file system for all users of multiple SageMaker Studio domains, using the same file system directory for all the users. The idea in this case is to assign the same EFS file system to all users of all domains that are within the same VPC. To test the solution, the account should ideally have at least two SageMaker Studio domains in the same VPC and subnets.

Create the EFS file system, add mount targets, and create an access point

Complete the steps in the previous section to set up your file system, mount targets, and access point.

Create a new Lambda function

Define a Lambda function called LambdaManageEFSUsers. This function is responsible for automating the configuration of SageMaker Studio domains to use a shared EFS file system within a specific VPC. This can be useful for organizations that want to provide a centralized storage solution for their ML projects across multiple SageMaker Studio domains. See the following code:

import json
import os
import logging

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

sm_client = boto3.client('sagemaker')

def lambda_handler(event, context):

    # Environment variables
    file_system = os.environ['efs_id']
    env_vpc_id = os.environ['vpc_id']

    # EFS directory name, under the function's local mount path /mnt/efs
    repository_name = 'shared_repository'
    repository = f'/mnt/efs/{repository_name}'
    domains = []

    # List all SageMaker domains (following pagination) and keep those in the configured VPC
    list_kwargs = {}
    while True:
        response = sm_client.list_domains(**list_kwargs)
        for domain in response['Domains']:
            domain_id = domain['DomainId']
            data = sm_client.describe_domain(DomainId=domain_id)
            if data['VpcId'] == env_vpc_id:
                domains.append(domain_id)
        next_token = response.get('NextToken')
        if not next_token:
            break
        list_kwargs['NextToken'] = next_token

    # Create the directory (no error if it already exists) and give ownership
    # to the default SageMaker POSIX user (200001) and group (1001)
    os.makedirs(repository, exist_ok=True)
    os.chown(repository, 200001, 1001)

    # Update every SageMaker domain in the VPC to use the shared directory
    if domains:
        for domain_id in domains:
            sm_client.update_domain(
                DomainId=domain_id,
                DefaultUserSettings={
                    'CustomFileSystemConfigs': [
                        {
                            'EFSFileSystemConfig': {
                                'FileSystemId': file_system,
                                'FileSystemPath': f'/{repository_name}'
                            }
                        }
                    ]
                }
            )

        logger.info(f"Updated Studio for Domains {domains} and EFS {file_system}")
        return {
            'statusCode': 200,
            'body': json.dumps(f"Created dir and modified permissions for Domains {domains}")
        }

    else:
        return {
            'statusCode': 400,
            'body': json.dumps(f"No SageMaker Studio domains found in the configured VPC {env_vpc_id}")
        }

The execution role of the Lambda function needs permissions to list, describe, and update the SageMaker Studio domains:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:ListDomains",
                "sagemaker:DescribeDomain",
                "sagemaker:UpdateDomain"
            ],
            "Resource": "*"
        }
    ]
}

Clean up

To clean up the solution you implemented and avoid further costs, delete the CloudFormation stack you deployed in your AWS account. When you delete the stack, you also delete the EFS file system and its storage. For additional information, refer to Delete a stack from the CloudFormation console.

Conclusion

In this post, we have explored three scenarios demonstrating the versatility of integrating Amazon EFS with SageMaker Studio. These scenarios highlight how Amazon EFS can provide a scalable, secure, and collaborative data storage solution for data science teams.

The first scenario focused on configuring an EFS directory with private spaces for individual user profiles, allowing users to store and access their own data while the administrator manages the EFS file system centrally.

The second scenario showcased a shared EFS directory across all spaces within a SageMaker Studio domain, enabling better collaboration and centralized data management.

The third scenario explored an EFS file system shared across multiple SageMaker Studio domains, empowering enterprise-level data science collaboration and promoting efficient use of shared resources.

By implementing these Amazon EFS integration scenarios, organizations can unlock the full potential of their data science teams, improve data governance, and enhance the overall efficiency of their data-driven initiatives. The integration of Amazon EFS with SageMaker Studio provides a versatile platform for data science teams to thrive in the evolving landscape of ML and AI.

About the Authors

Irene Arroyo Delgado is an AI/ML and GenAI Specialist Solutions Architect at AWS. She focuses on bringing out the potential of generative AI for each use case and productionizing ML workloads, to achieve customers’ desired business outcomes by automating end-to-end ML lifecycles. In her free time, Irene enjoys traveling and hiking.

Itziar Molina Fernandez is an AI/ML Consultant in the AWS Professional Services team. In her role, she works with customers building large-scale machine learning platforms and generative AI use cases on AWS. In her free time, she enjoys exploring new places.

Matteo Amadei is a Data Scientist Consultant in the AWS Professional Services team. He uses his expertise in artificial intelligence and advanced analytics to extract valuable insights and drive meaningful business outcomes for customers. He has worked on a wide range of projects spanning NLP, computer vision, and generative AI. He also has experience with building end-to-end MLOps pipelines to productionize analytical models. In his free time, Matteo enjoys traveling and reading.

Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering and an ML background, he works with customers of any size to understand their business and technical needs and design AI and ML solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.

Summarize call transcriptions securely with Amazon Transcribe and Amazon Bedrock Guardrails

October 17, 2024

by Yash Yamsanwar Amazon AWS

Given the volume of meetings, interviews, and customer interactions in modern business environments, audio recordings play a crucial role in capturing valuable information. Manually transcribing and summarizing these recordings can be a time-consuming and tedious task. Fortunately, advancements in generative AI and automatic speech recognition (ASR) have paved the way for automated solutions that can streamline this process.

Customer service representatives receive a high volume of calls each day. Previously, calls were recorded and manually reviewed later for compliance, regulations, and company policies. Call recordings had to be transcribed, summarized, and then redacted for personally identifiable information (PII) before analyzing calls, resulting in delayed access to insights.

Redacting PII is a critical practice in security for several reasons. Maintaining the privacy and protection of individuals’ personal information is not only a matter of ethical responsibility, but also a legal requirement. In this post, we show you how to use Amazon Transcribe to get near real-time transcriptions of calls, which are then sent to Amazon Bedrock for summarization and sensitive data redaction. We walk through an architecture that uses AWS Step Functions to orchestrate the process, providing seamless integration and efficient processing.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading model providers such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, Mistral AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. You can use Amazon Bedrock Guardrails to redact sensitive information such as PII found in the generated call transcription summaries. Clean, summarized transcripts are then sent to analysts. This provides quicker access to call trends while protecting customer privacy.

Solution overview

The architecture of this solution is designed to be scalable, efficient, and compliant with privacy regulations. It includes the following key components:

Recording – An audio file, such as a meeting or support call, to be transcribed and summarized
Step Functions workflow – Coordinates the transcription and summarization process
Amazon Transcribe – Converts audio recordings into text
Amazon Bedrock – Summarizes the transcription and removes PII
Amazon SNS – Delivers the summary to the designated recipient
Recipient – Receives the summarized, PII-redacted transcript

The following diagram shows the solution architecture.

The workflow orchestrated by Step Functions is as follows:

An audio recording is provided as an input to the Step Functions workflow. This could be done manually or automatically depending on the specific use case and integration requirements.
The workflow invokes Amazon Transcribe, which converts the multi-speaker audio recording into a textual, speaker-partitioned transcription. Amazon Transcribe uses advanced speech recognition algorithms and machine learning (ML) models to accurately partition speakers and transcribe the audio, handling various accents, background noise, and other challenges.
The transcription output from Amazon Transcribe is then passed to Anthropic’s Claude 3 Haiku model on Amazon Bedrock through AWS Lambda. This model was chosen for its relatively low latency and cost compared with other models. The model first summarizes the transcript according to its summary instructions, and then the summarized output (the model response) is evaluated by Amazon Bedrock Guardrails to redact PII. To learn how guardrails detect and block content, refer to How Amazon Bedrock Guardrails works. The instructions and transcript are both passed to the model as context.
The output from Amazon Bedrock is stored in Amazon Simple Storage Service (Amazon S3) and sent to the designated recipient using Amazon Simple Notification Service (Amazon SNS). Amazon SNS supports various delivery channels, including email, SMS, and mobile push notifications, making sure that the summary reaches the intended recipient in a timely and reliable manner.

The recipient can then review the concise summary, quickly grasping the key points and insights from the original audio recording. Additionally, sensitive information has been redacted, maintaining privacy and compliance with relevant regulations.
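The transcription step can be sketched with Boto3 as follows. This is a minimal sketch, not the stack's actual Lambda code: it starts a job with speaker partitioning enabled, then flattens the JSON output into one line per speaker turn, a compact format to place in the model prompt. It assumes each entry in the output's results.items carries a speaker_label field when ShowSpeakerLabels is enabled (punctuation items without one are attached to the current speaker). The function names, the transcripts/ output key, and the two-speaker default are illustrative.

```python
def start_speaker_transcription(transcribe_client, job_name, media_uri, output_bucket,
                                language_code="en-US", max_speakers=2):
    """Start an Amazon Transcribe job with speaker partitioning enabled (sketch)."""
    return transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        LanguageCode=language_code,
        OutputBucketName=output_bucket,
        OutputKey=f"transcripts/{job_name}.json",
        Settings={"ShowSpeakerLabels": True, "MaxSpeakerLabels": max_speakers},
    )


def format_speaker_transcript(transcript_json):
    """Turn Transcribe output items into 'spk_N: text' lines, one per speaker turn."""
    lines = []
    current_speaker, words = None, []
    for item in transcript_json["results"]["items"]:
        content = item["alternatives"][0]["content"]
        # Items without a label (typically punctuation) stay with the current speaker
        speaker = item.get("speaker_label", current_speaker)
        if speaker != current_speaker and words:
            lines.append(f"{current_speaker}: {' '.join(words)}")
            words = []
        current_speaker = speaker
        if item["type"] == "punctuation" and words:
            words[-1] += content  # attach punctuation to the previous word
        else:
            words.append(content)
    if words:
        lines.append(f"{current_speaker}: {' '.join(words)}")
    return "\n".join(lines)
```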

The following diagram shows the Step Functions workflow.

Prerequisites

Follow these steps before starting:

Amazon Bedrock users need to request access to models before they’re available for use. This is a one-time action. For this solution, you need to enable access to Anthropic’s Claude 3 Haiku model on Amazon Bedrock. For more information, refer to Access Amazon Bedrock foundation models. Deployment, as described below, is currently supported only in the US West (Oregon) us-west-2 AWS Region. Users may explore other models if desired. You might need some customizations to deploy to alternative Regions with different model availability (such as us-east-1, which hosts Anthropic’s Claude 3.5 Sonnet). Make sure you consider model quality, speed, and cost tradeoffs before choosing a model.
Create a guardrail for PII redaction. Configure filters to block or mask sensitive information. This option can be found on the Amazon Bedrock console on the Add sensitive information filters page when creating a guardrail. To learn how to configure filters for other use cases, refer to Remove PII from conversations by using sensitive information filters.
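If you prefer to create the guardrail programmatically, the following minimal sketch uses the Amazon Bedrock control-plane client (boto3.client("bedrock")). It assumes you want detected PII masked (ANONYMIZE) rather than blocked, and the entity list is only an illustrative subset. It also publishes a numbered version, because the invocation code in this post references guardrail version "1". The function name and messages are hypothetical.

```python
# Illustrative subset of PII entity types supported by sensitive information filters
PII_TYPES = ["NAME", "EMAIL", "PHONE", "ADDRESS", "US_SOCIAL_SECURITY_NUMBER"]


def create_pii_guardrail(bedrock_client, name, pii_types=PII_TYPES):
    """Create a guardrail that masks PII, publish a version, and return (id, version)."""
    guardrail = bedrock_client.create_guardrail(
        name=name,
        description="Masks PII in call transcript summaries",
        sensitiveInformationPolicyConfig={
            # ANONYMIZE masks detected entities; BLOCK would reject the response instead
            "piiEntitiesConfig": [{"type": t, "action": "ANONYMIZE"} for t in pii_types]
        },
        blockedInputMessaging="Sorry, this request can't be processed.",
        blockedOutputsMessaging="Sorry, the summary can't be returned.",
    )
    # Publishing a numbered version freezes the current DRAFT configuration
    version = bedrock_client.create_guardrail_version(
        guardrailIdentifier=guardrail["guardrailId"],
        description="Initial version",
    )
    return guardrail["guardrailId"], version["version"]
```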

Deploy solution resources

To deploy the solution, download an AWS CloudFormation template to automatically provision the necessary resources in your AWS account. The template sets up the following components:

A Step Functions workflow
Lambda functions
An SNS topic
An S3 bucket
AWS Key Management Service (AWS KMS) keys for data encryption and decryption

By using this template, you can quickly deploy the sample solution with minimal manual configuration. The template requires the following parameters:

Email address used to send summary – The summary will be sent to this address. You must acknowledge the initial Amazon SNS confirmation email before receiving additional notifications.
Summary instructions – These are the instructions given to the Amazon Bedrock model to generate the summary.
Guardrail ID – This is the ID of your recently created guardrail, which can be found on the Amazon Bedrock Guardrails console in Guardrail overview.

The Summary instructions are read into your Lambda function as an environment variable.

 
import os

# Read the summary instructions from the SUMMARY_INSTRUCTIONS environment variable
# (os.getenv returns None if the variable isn't set).
SUMMARY_INSTRUCTIONS = os.getenv('SUMMARY_INSTRUCTIONS')
 
These instructions are then included in the payload sent to Anthropic’s Claude 3 Haiku model. The following snippet shows how the instructions and transcript are passed to the model.
 
# Create the payload to provide to the Anthropic model: the instructions followed by the transcript.
user_message = {"role": "user", "content": f"{SUMMARY_INSTRUCTIONS}\n\n{transcript}"}
messages = [user_message]
response = generate_message(bedrock_client, 'anthropic.claude-3-haiku-20240307-v1:0', "", messages, 1000)
 
The generate_message() function contains the invocation to Amazon Bedrock with the guardrail ID and other relevant parameters.
 
def generate_message(bedrock_runtime, model_id, system_prompt, messages, max_tokens):
    # Build the Anthropic Messages API request body
    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "system": system_prompt,
            "messages": messages
        }
    )
    print(f'Invoking model: {model_id}')

    # Invoke the model with the guardrail applied
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId=model_id,
        contentType="application/json",
        guardrailIdentifier=BEDROCK_GUARDRAIL_ID,
        guardrailVersion="1",
        trace="ENABLED")
    response_body = json.loads(response.get('body').read())
    print(f'response: {response_body}')
    return response_body
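After generate_message() returns, the summary text still has to be pulled out of the Anthropic Messages response. The following hedged sketch joins the text content blocks and reports whether the guardrail intervened. It assumes the response body carries an amazon-bedrock-guardrailAction field (INTERVENED or NONE) when a guardrail is passed to InvokeModel, and that masked entities appear as placeholders such as {NAME}. The function name extract_summary is hypothetical.

```python
def extract_summary(response_body):
    """Return (summary_text, guardrail_intervened) from an Anthropic Messages response."""
    # The Messages API returns a list of content blocks; keep only the text blocks
    text = "".join(
        block.get("text", "")
        for block in response_body.get("content", [])
        if block.get("type") == "text"
    )
    # Assumed field, set when a guardrail is attached to the InvokeModel call
    intervened = response_body.get("amazon-bedrock-guardrailAction") == "INTERVENED"
    return text, intervened
```

An intervention is expected whenever PII was masked, so a True flag alone does not mean the summary is unusable; it is a signal you might log or route for review.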

Deploy the solution

After you deploy the resources using AWS CloudFormation, complete these steps:

Add a Lambda layer.

Although AWS Lambda regularly updates the version of Boto3 included in its runtime, at the time of writing this post, it still provides version 1.34.126. To use Amazon Bedrock Guardrails, you need version 1.34.90 or higher, so we add a Lambda layer that contains an updated Boto3. You can follow the official developer guide on how to add a Lambda layer.

There are different ways to create a Lambda layer. A simple method is to follow the steps outlined in Packaging the layer content, which references a sample application repo. Replace requests==2.31.0 in the requirements.txt file with boto3, which installs the latest available version, and then create the layer.

To add the layer to your Lambda function, make sure that the parameters specified in Creating the layer match the deployed function. That is, you need to set compatible-architectures to x86_64.
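As an alternative to the sample repo, the layer can be built and published from a short Python script. This is a minimal sketch under stated assumptions: it installs boto3 into a python/ folder (the directory layout Lambda expects for Python layers), zips that folder, and publishes it with PublishLayerVersion. The layer name boto3-latest and the python3.12 runtime are placeholders; set the runtime to match your deployed function. Because boto3 and its dependencies are pure Python, packaging on a non-Linux machine should still work.

```python
import io
import subprocess
import sys
import zipfile
from pathlib import Path


def install_boto3(build_dir):
    """pip-install the latest boto3 into <build_dir>/python."""
    site_dir = Path(build_dir) / "python"
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "boto3", "--target", str(site_dir)],
        check=True,
    )


def zip_layer(build_dir):
    """Zip every file under build_dir, keeping the top-level python/ folder."""
    buffer = io.BytesIO()
    root = Path(build_dir)
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(root).as_posix())
    return buffer.getvalue()


def publish_layer(lambda_client, layer_zip, layer_name="boto3-latest"):
    """Publish the zip as a new layer version and return its ARN."""
    response = lambda_client.publish_layer_version(
        LayerName=layer_name,
        Content={"ZipFile": layer_zip},
        CompatibleRuntimes=["python3.12"],  # assumption: match your function's runtime
        CompatibleArchitectures=["x86_64"],
    )
    return response["LayerVersionArn"]
```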

Acknowledge the Amazon SNS email confirmation that you should receive a few moments after creating the CloudFormation stack.
On the AWS CloudFormation console, find the stack you just created.
On the stack’s Outputs tab, look for the value associated with AssetBucketName. It will look something like summary-generator-assetbucket-xxxxxxxxxxxxx.
On the Amazon S3 console, find your S3 assets bucket.

This is where you’ll upload your recordings. Valid file formats are MP3, MP4, WAV, FLAC, AMR, OGG, and WebM.

Upload your recording to the recordings folder in the S3 assets bucket.

Uploading a recording automatically triggers the AWS Step Functions state machine. For this example, we use a sample team meeting recording.
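The upload can also be scripted. The following minimal sketch checks the file extension against the formats listed above and uploads the file under the recordings/ prefix with the S3 upload_file method. The helper names are hypothetical, and the bucket name is the AssetBucketName value from the stack outputs.

```python
from pathlib import PurePath

# File extensions for the supported recording formats
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".wav", ".flac", ".amr", ".ogg", ".webm"}


def recording_key(filename, prefix="recordings/"):
    """Return the S3 key for a recording, rejecting unsupported formats."""
    ext = PurePath(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported recording format: {ext or '(none)'}")
    return prefix + PurePath(filename).name


def upload_recording(s3_client, local_path, bucket):
    """Upload a local recording to the assets bucket, which starts the workflow."""
    key = recording_key(local_path)
    s3_client.upload_file(local_path, bucket, key)
    return key
```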

On the AWS Step Functions console, find the summary-generator state machine. Choose the name of the state machine run with the status Running.

Here, you can watch the progress of the state machine as it processes the recording. After it reaches its Success state, you should receive an emailed summary of the recording. Alternatively, you can navigate to the S3 assets bucket and view the transcript there in the transcripts folder.

Expand the solution

Now that you have a working solution, here are some potential ideas to customize the solution for your specific use cases:

Try altering the process to fit your available source content and desired outputs:
- For situations where transcripts are available, create an alternate AWS Step Functions workflow to ingest existing text-based or PDF-based transcriptions
- Instead of using Amazon SNS to notify recipients through email, you can use it to send the output to a different endpoint, such as a team collaboration site or to the team’s chat channel
Try changing the summary instructions for the AWS CloudFormation stack parameter provided to Amazon Bedrock to produce outputs specific to your use case. The following are some examples:
- When summarizing a company’s earnings call, you could have the model focus on potential promising opportunities, areas of concern, and things that you should continue to monitor
- If you’re using the model to summarize a course lecture, it could identify upcoming assignments, summarize key concepts, list facts, and filter out small talk from the recording
For the same recording, create different summaries for different audiences:
- Engineers’ summaries focus on design decisions, technical challenges, and upcoming deliverables
- Project managers’ summaries focus on timelines, costs, deliverables, and action items
- Project sponsors get a brief update on project status and escalations
- For longer recordings, try generating summaries for different levels of interest and time commitment. For example, create a single sentence, single paragraph, single page, or in-depth summary. In addition to the prompt, you might want to adjust the max_tokens parameter to accommodate different content lengths.
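To produce audience-specific summaries from one transcript, you could keep a small table of instructions and output budgets and build one Messages API request per audience. The following sketch assumes that layout. The audience names, instruction wording, and max_tokens values are hypothetical. Each body would be serialized with json.dumps and sent with invoke_model, as in generate_message().

```python
# Hypothetical per-audience instructions and output budgets (max_tokens)
AUDIENCE_PROFILES = {
    "engineers": ("Summarize design decisions, technical challenges, and upcoming deliverables.", 1000),
    "project_managers": ("Summarize timelines, costs, deliverables, and action items.", 800),
    "sponsors": ("Give a brief update on project status and any escalations.", 300),
}


def build_requests(transcript):
    """Return one Anthropic Messages API request body per audience."""
    requests = {}
    for audience, (instructions, max_tokens) in AUDIENCE_PROFILES.items():
        requests[audience] = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": f"{instructions}\n\n{transcript}"}],
        }
    return requests
```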

Clean up

Clean up the resources you created for this solution to avoid incurring costs. You can use an AWS SDK, the AWS Command Line Interface (AWS CLI), or the console.

Delete Amazon Bedrock Guardrails and the Lambda layer you created
Delete the CloudFormation stack

To use the console, follow these steps:

On the Amazon Bedrock console, in the navigation menu, select Guardrails. Choose your guardrail, then select Delete.
On the AWS Lambda console, in the navigation menu, select Layers. Choose your layer, then select Delete.
On the AWS CloudFormation console, in the navigation menu, select Stacks. Choose the stack you created, then select Delete.

Deleting the stack won’t delete the associated S3 bucket. If you no longer require the recordings or transcripts, you can delete the bucket separately. Amazon Transcribe is designed to automatically delete transcription jobs after 90 days. However, you can opt to manually delete these jobs before the 90-day retention period expires.
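If you prefer the AWS SDK, the same cleanup can be sketched with Boto3 as follows. The guardrail ID, layer name and version, and stack name are placeholders for your own values. Deleting a guardrail without specifying a version removes the guardrail and all of its versions, and layers are deleted one version at a time. As noted above, the S3 bucket is retained and must be deleted separately.

```python
def clean_up(bedrock_client, lambda_client, cfn_client, guardrail_id,
             layer_name, layer_version, stack_name):
    """Delete the guardrail, the Boto3 layer version, and the CloudFormation stack."""
    # Deletes the guardrail together with all of its versions
    bedrock_client.delete_guardrail(guardrailIdentifier=guardrail_id)
    # Removes only the given layer version
    lambda_client.delete_layer_version(LayerName=layer_name, VersionNumber=layer_version)
    # Stack deletion is asynchronous, so wait until it completes
    cfn_client.delete_stack(StackName=stack_name)
    cfn_client.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```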

Conclusion

As businesses turn to data as a foundation for decision-making, having the ability to efficiently extract insights from audio recordings is invaluable. By using the power of generative AI with Amazon Bedrock and Amazon Transcribe, your organization can create concise summaries of audio recordings while maintaining privacy and compliance. The proposed architecture demonstrates how AWS services can be orchestrated using AWS Step Functions to streamline and automate complex workflows, enabling organizations to focus on their core business activities.

This solution not only saves time and effort, but also makes sure that sensitive information is redacted, mitigating potential risks and promoting compliance with data protection regulations. As organizations continue to generate and process large volumes of audio data, solutions like this will become increasingly important for gaining insights, making informed decisions, and maintaining a competitive edge.

About the authors

Yash Yamsanwar is a Machine Learning Architect at Amazon Web Services (AWS). He is responsible for designing high-performance, scalable machine learning infrastructure that optimizes the full lifecycle of machine learning models, from training to deployment. Yash collaborates closely with ML research teams to push the boundaries of what is possible with LLMs and other cutting-edge machine learning technologies.

Sawyer Hirt is a Solutions Architect at AWS, specializing in AI/ML and cloud architectures, with a passion for helping businesses leverage cutting-edge technologies to overcome complex challenges. His expertise lies in designing and optimizing ML workflows, enhancing system performance, and making advanced AI solutions more accessible and cost-effective, with a particular focus on Generative AI. Outside of work, Sawyer enjoys traveling, spending time with family, and staying current with the latest developments in cloud computing and artificial intelligence.

How DPG Media uses Amazon Bedrock and Amazon Transcribe to enhance video metadata with AI-powered pipelines

October 16, 2024

by Lucas Desard Amazon AWS

This post was co-written with Lucas Desard, Tom Lauwers, and Sam Landuydt from DPG Media.

DPG Media is a leading media company in Benelux operating multiple online platforms and TV channels. DPG Media’s VTM GO platform alone offers over 500 days of non-stop content.

With a growing library of long-form video content, DPG Media recognizes the importance of efficiently managing and enhancing video metadata such as actor information, genre, summary of episodes, the mood of the video, and more. Having descriptive metadata is key to providing accurate TV guide descriptions, improving content recommendations, and enhancing the consumer’s ability to explore content that aligns with their interests and current mood.

This post shows how DPG Media introduced AI-powered processes using Amazon Bedrock and Amazon Transcribe into its video publication pipelines in just 4 weeks, as an evolution towards more automated annotation systems.

The challenge: Extracting and generating metadata at scale

DPG Media receives video productions accompanied by a wide range of marketing materials such as visual media and brief descriptions. These materials often lack standardization and vary in quality. As a result, DPG Media producers have to screen the content and understand it well enough to write the missing metadata, such as brief summaries. For some content, additional screening is performed to generate subtitles and captions.

As DPG Media grows, they need a more scalable way of capturing metadata that enhances the consumer experience on online video services and aids in understanding key content characteristics.

The following were some initial challenges in automation:

Language diversity – The services host both Dutch and English shows. Some local shows feature Flemish dialects, which can be difficult for some large language models (LLMs) to understand.
Variability in content volume – Content ranges from single-episode films to multi-season series.
Release frequency – New shows, episodes, and movies are released daily.
Data aggregation – Metadata needs to be available at the top-level asset (program or movie) and must be reliably aggregated across different seasons.

Solution overview

To address the challenges of automation, DPG Media decided to implement a combination of AI techniques and existing metadata to generate new, accurate content and category descriptions, mood, and context.

The project focused solely on audio processing due to its cost-efficiency and faster processing time. Video data analysis with AI wasn’t required for generating detailed, accurate, and high-quality metadata.

The following diagram shows the metadata generation pipeline from audio transcription to detailed metadata.

The general architecture of the metadata pipeline consists of two primary steps:

Generate transcriptions of audio tracks: use speech recognition models to generate accurate transcripts of the audio content.
Generate metadata: use LLMs to extract and generate detailed metadata from the transcriptions.

In the following sections, we discuss the components of the pipeline in more detail.

Step 1. Generate transcriptions of audio tracks

To generate the necessary audio transcripts for metadata extraction, the DPG Media team evaluated two transcription strategies: Whisper-v3-large, which requires at least 10 GB of VRAM and significant operational effort to run, and Amazon Transcribe, a managed service that adds speaker diarization and automatic model updates from AWS over time. The evaluation focused on two key factors: price-performance and transcription quality.

To evaluate the transcription accuracy quality, the team compared the results against ground truth subtitles on a large test set, using the following metrics:

Word error rate (WER) – This metric measures the percentage of words that are incorrectly transcribed compared to the ground truth. A lower WER indicates a more accurate transcription.
Match error rate (MER) – MER measures the proportion of aligned word matches that are errors (substitutions, deletions, and insertions). A lower MER signifies better accuracy.
Word information lost (WIL) – This metric quantifies the amount of information lost due to transcription errors. A lower WIL suggests fewer errors and better retention of the original content.
Word information preserved (WIP) – WIP is the opposite of WIL, indicating the amount of information correctly captured. A higher WIP score reflects more accurate transcription.
Hits – This metric counts the number of correctly transcribed words, giving a straightforward measure of accuracy.
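To make these metrics concrete, the following sketch computes them from a word-level Levenshtein alignment using the standard definitions. With H hits, S substitutions, D deletions, I insertions, N = H + S + D reference words, and P = H + S + I hypothesis words: WER = (S + D + I) / N, MER = (S + D + I) / (H + S + D + I), WIP = (H / N) × (H / P), and WIL = 1 − WIP. This is an illustration of the formulas, not DPG Media's evaluation code; in practice a library such as jiwer provides these metrics.

```python
def align_counts(reference, hypothesis):
    """Count hits, substitutions, deletions, insertions via word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Walk back from the bottom-right cell to classify each step of an optimal alignment
    h = s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                h += 1
            else:
                s += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return h, s, d, ins


def transcription_metrics(reference, hypothesis):
    """Compute WER, MER, WIL, WIP, and hits for one reference/hypothesis pair."""
    h, s, d, ins = align_counts(reference, hypothesis)
    errors = s + d + ins
    n_ref = h + s + d    # words in the reference (ground truth)
    n_hyp = h + s + ins  # words in the hypothesis (transcription)
    wip = (h / n_ref) * (h / n_hyp) if n_ref and n_hyp else 0.0
    return {
        "wer": errors / n_ref if n_ref else 0.0,
        "mer": errors / (h + errors) if (h + errors) else 0.0,
        "wil": 1.0 - wip,
        "wip": wip,
        "hits": h,
    }
```

For example, the reference "the cat sat on the mat" against the transcription "the cat sit on mat" has 4 hits, 1 substitution, and 1 deletion, so WER and MER are both 2/6, while WIP is (4/6) × (4/5).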

Both experiments transcribing audio yielded high-quality results without the need to incorporate video or further speaker diarization. For further insights into speaker diarization in other use cases, see Streamline diarization using AI as an assistive technology: ZOO Digital’s story.

Considering the varying development and maintenance efforts required by different alternatives, DPG Media chose Amazon Transcribe for the transcription component of their system. This managed service offered convenience, allowing them to concentrate their resources on obtaining comprehensive and highly accurate data from their assets, with the goal of achieving 100% qualitative precision.

Step 2. Generate metadata

Now that DPG Media has the transcription of the audio files, they use LLMs through Amazon Bedrock to generate the various categories of metadata (summaries, genre, mood, key events, and so on). Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Through Amazon Bedrock, DPG Media selected the Anthropic Claude 3 Sonnet model for its reasoning and Dutch language performance, based on internal testing and the Hugging Face LMSYS Chatbot Arena Leaderboard. Working closely with end-consumers, the DPG Media team tuned the prompts to make sure the generated metadata matched the expected format and style.

After the team had generated metadata at the individual video level, the next step was to aggregate this metadata across an entire series of episodes. This was a critical requirement, because content recommendations on a streaming service are typically made at the series or movie level, rather than the episode level.

To generate summaries and metadata at the series level, the DPG Media team reused the previously generated video-level metadata. They fed the summaries in an ordered and structured manner, along with a specifically tailored system prompt, back through Amazon Bedrock to Anthropic Claude 3 Sonnet.

Using the summaries instead of the full transcriptions of the episodes was sufficient for high-quality aggregated data and was more cost-efficient, because many of DPG Media’s series have extended runs.
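The series-level step can be pictured as building a single request from the ordered episode summaries. The following is a hypothetical sketch of that idea, not DPG Media's implementation: the function name, the (season, episode, summary) tuple format, the section headers, and the max_tokens default are all assumptions. The returned dictionary is a Messages API body that could be sent to Anthropic Claude 3 Sonnet through invoke_model.

```python
def build_series_request(series_title, episode_summaries, system_prompt, max_tokens=1024):
    """Build one Messages API body that summarizes a series from its episode summaries.

    episode_summaries: iterable of (season, episode, summary) tuples, in any order.
    """
    # Sort by season, then episode, so the model reads the story in broadcast order
    ordered = sorted(episode_summaries, key=lambda e: (e[0], e[1]))
    sections = [f"Season {s}, Episode {e}:\n{text}" for s, e, text in ordered]
    user_content = f"Series: {series_title}\n\n" + "\n\n".join(sections)
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": system_prompt,
        "messages": [{"role": "user", "content": user_content}],
    }
```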

The solution also stores the direct association between each type of metadata and its corresponding system prompt, making it straightforward to tune, remove, or add prompts as needed—similar to the adjustments made during the development process. This flexibility allows them to tailor the metadata generation to evolving business requirements.

To evaluate the metadata quality, the team used reference-free LLM metrics, inspired by LangSmith. This approach used a secondary LLM to evaluate the outputs against tailored metrics, such as whether the summary is simple to understand, whether it contains all important events from the transcription, and whether the generated summary contains any hallucinations. Using a secondary LLM makes it possible to evaluate summaries at scale.
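A reference-free judge of this kind can be sketched as a prompt builder plus a strict parser for the judge's reply. The criteria keys, the 1 to 5 scale, and the JSON reply format are assumptions for illustration. The prompt would be sent to the secondary LLM (for example, through Amazon Bedrock), and its text reply passed to parse_judge_scores.

```python
import json

# Hypothetical criteria mirroring the checks described above
CRITERIA = {
    "clarity": "Is the summary simple to understand?",
    "coverage": "Does it contain all important events from the transcription?",
    "faithfulness": "Is it free of hallucinations, that is, claims not supported by the transcription?",
}


def build_judge_prompt(transcript, summary):
    """Ask the judge model to score the summary on each criterion."""
    questions = "\n".join(f'- "{name}": {question}' for name, question in CRITERIA.items())
    return (
        "You are evaluating a summary of a video transcription.\n"
        f"Score each question from 1 (poor) to 5 (excellent):\n{questions}\n"
        "Respond only with a JSON object mapping each key to its score.\n\n"
        f"<transcript>\n{transcript}\n</transcript>\n\n<summary>\n{summary}\n</summary>"
    )


def parse_judge_scores(judge_output):
    """Parse the judge's JSON reply and check that every criterion has a 1-5 score."""
    scores = json.loads(judge_output)
    for name in CRITERIA:
        value = scores.get(name)
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"Missing or invalid score for {name!r}: {value!r}")
    return {name: scores[name] for name in CRITERIA}
```

Rejecting malformed replies instead of guessing keeps large-scale evaluation results comparable, because a failed parse can be retried or counted separately.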

Results and lessons learned

The implementation of the AI-powered metadata pipeline has been a transformative journey for DPG Media. Their approach saves days of work generating metadata for a TV series.

DPG Media chose Amazon Transcribe for its ease of transcription and low maintenance, with the added benefit of the incremental improvements AWS has made to the service over the years. For metadata generation, DPG Media used Anthropic Claude 3 Sonnet through Amazon Bedrock rather than building direct integrations with individual model providers. The team appreciated the flexibility to experiment with multiple models and plans to try Anthropic Claude Opus when it becomes available in their desired AWS Region.

DPG Media decided to strike a balance between AI and human expertise by having the results generated by the pipeline validated by humans. This approach was chosen because the results would be exposed to end-customers, and AI systems can sometimes make mistakes. The goal was not to replace people but to enhance their capabilities through a combination of human curation and automation.

Transforming the video viewing experience is not merely about adding more descriptions; it's about creating a richer, more engaging user experience. By implementing AI-driven processes, DPG Media aims to offer users better content recommendations, foster a deeper understanding of its content library, and progress toward more automated and efficient annotation systems. This evolution promises not only to streamline operations but also to align content delivery with modern consumption habits and technological advancements.

Conclusion

In this post, we shared how DPG Media introduced AI-powered processes using Amazon Bedrock into its video publication pipelines. This solution can help accelerate audio metadata extraction, create a more engaging user experience, and save time.

We encourage you to learn more about how to gain a competitive advantage with powerful generative AI applications by visiting Amazon Bedrock and trying this solution out on a dataset relevant to your business.

About the Authors

Lucas Desard is a GenAI Engineer at DPG Media. He helps DPG Media integrate generative AI efficiently and meaningfully into various company processes.

Tom Lauwers is a machine learning engineer on the video personalization team for DPG Media. He builds and architects the recommendation systems for DPG Media’s long-form video platforms, supporting brands like VTM GO, Streamz, and RTL play.

Sam Landuydt is the Area Manager Recommendation & Search at DPG Media. As the manager of the team, he guides ML and software engineers in building recommendation systems and generative AI solutions for the company.

Irina Radu is a Prototyping Engagement Manager, part of AWS EMEA Prototyping and Cloud Engineering. She helps customers get the most out of the latest tech, innovate faster, and think bigger.

Fernanda Machado, AWS Prototyping Architect, helps customers bring ideas to life and use the latest best practices for modern applications.

Andrew Shved, Senior AWS Prototyping Architect, helps customers build business solutions that use innovations in modern applications, big data, and AI.