Accelerate video Q&A workflows using Amazon Bedrock Knowledge Bases, Amazon Transcribe, and thoughtful UX design

Accelerate video Q&A workflows using Amazon Bedrock Knowledge Bases, Amazon Transcribe, and thoughtful UX design

Organizations are often inundated with video and audio content that contains valuable insights. However, extracting those insights efficiently and with high accuracy remains a challenge. This post explores an innovative solution to accelerate video and audio review workflows through a thoughtfully designed user experience that enables human and AI collaboration. By approaching the problem from the user’s point of view, we can create a powerful tool that allows people to quickly find relevant information within long recordings without the risk of AI hallucinations.

Many professionals, from lawyers and journalists to content creators and medical practitioners, need to review hours of recorded content regularly to extract verifiably accurate insights. Traditional methods of manual review or simple keyword searches over transcripts are time-consuming and often miss important context. More advanced AI-powered summarization tools exist, but they risk producing hallucinations or inaccurate information, which can be dangerous in high-stakes environments like healthcare or legal proceedings.

Our solution, the Recorded Voice Insight Extraction Webapp (ReVIEW), addresses these challenges by providing a seamless method for humans to collaborate with AI, accelerating the review process while maintaining accuracy and trust in the results. The application is built on top of Amazon Transcribe and Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

User experience

To accelerate a user’s review of a long-form audio or video while mitigating the risk of hallucinations, we introduce the concept of timestamped citations. Not only are large language models (LLMs) capable of answering a user’s question based on the transcript of the file, they are also capable of identifying the timestamp (or timestamps) of the transcript during which the answer was discussed. By using a combination of transcript preprocessing, prompt engineering, and structured LLM output, we enable the user experience shown in the following screenshot, which demonstrates the conversion of LLM-generated timestamp citations into clickable buttons (shown underlined in red) that navigate to the correct portion of the source video.

ReVIEW Application screenshot demonstrating citations linking to specific timestamps in a video.

The user in this example has uploaded a number of videos, including some recordings of AWS re:Invent talks. You’ll notice that the preceding answer actually contains a hallucination originating from an error in the transcript; the AI assistant replied that “Hyperpaths” was announced, when in reality the service is called Amazon SageMaker HyperPod.

The user in the preceding screenshot had the following journey:

  1. The user asks the AI assistant “What’s new with SageMaker?” The assistant searches the timestamped transcripts of the uploaded re:Invent videos.
  2. The assistant provides an answer with citations. Those citations contain both the name of the video and a timestamp, and the frontend displays buttons corresponding to the citations. Each citation can point to a different video, or to different timestamps within the same video.
  3. The user reads that SageMaker “Hyperpaths” was announced. They proceed to verify the accuracy of the generated answer by selecting the buttons, which auto play the source video starting at that timestamp.
  4. The user sees that the product is actually called Amazon SageMaker HyperPod, and can be confident that SageMaker HyperPod was the product announced at re:Invent.

This experience, which is at the heart of the ReVIEW application, enables users to efficiently get answers to questions based on uploaded audio or video files and to verify the accuracy of the answers by rewatching the source media for themselves.

Solution overview

The full code for this application is available on the GitHub repo.

The architecture of the solution is shown in the following diagram, showcasing the flow of data through the application.

Architecture diagram of ReVIEW application

The workflow consists of the following steps:

  1. A user accesses the application through an Amazon CloudFront distribution, which adds a custom header and forwards HTTPS traffic to an Elastic Load Balancing application load balancer. Behind the load balancer is a containerized Streamlit application running on Amazon Elastic Container Service (Amazon ECS).
  2. Amazon Cognito handles user logins to the frontend application and Amazon API Gateway.
  3. When a user uploads a media file through the frontend, a pre-signed URL is generated for the frontend to upload the file to Amazon Simple Storage Service (Amazon S3).
  4. The frontend posts the file to an application S3 bucket, at which point a file processing flow is initiated through a triggered AWS Lambda. The file is sent to Amazon Transcribe and the resulting transcript is stored in Amazon S3. The transcript gets postprocessed into a text form more appropriate for use by an LLM, and an AWS Step Functions state machine syncs the transcript to a knowledge base configured in Amazon Bedrock Knowledge Bases. The knowledge base sync process handles chunking and embedding of the transcript, and storing embedding vectors and file metadata in an Amazon OpenSearch Serverless vector database.
  5. If a user asks a question of one specific transcript (designated by the “pick media file” dropdown menu in the UI), the entire transcript is used to generate the response, so a retrieval step using the knowledge base is not required and an LLM is called directly through Amazon Bedrock.
  6. If the user is asking a question whose answer might appear in any number of source videos (by choosing Chat with all media files on the dropdown menu in the UI), the Amazon Bedrock Knowledge Bases RetrieveAndGenerate API is used to embed the user query, find semantically similar chunks in the vector database, input those chunks into an LLM prompt, and generate a specially formatted response.
  7. Throughout the process, application data from tracking transcription and ingestion status, mapping user names to uploaded files, and caching responses are accomplished with Amazon DynamoDB.

One important characteristic of the architecture is the clear separation of frontend and backend logic through an API Gateway deployed REST API. This was a design decision to enable users of this application to replace the Streamlit frontend with a custom frontend. There are instructions for replacing the frontend in the README of the GitHub repository.

Timestamped citations

The key to this solution lies in the prompt engineering and structured output format. When generating a response to a user’s question, the LLM is instructed to not only provide an answer to the question (if possible), but also to cite its sources in a specific way.

The full prompt can be seen in the GitHub repository, but a shortened pseudo prompt (for brevity) is shown here:

You are an intelligent AI which attempts to answer questions based on retrieved chunks of automatically generated transcripts.

Below are retrieved chunks of transcript with metadata including the file name. Each chunk includes a <media_name> and lines of a transcript, each line beginning with a timestamp.

$$ retrieved transcript chunks $$

Your answer should be in json format, including a list of partial answers, each of which has a citation. The citation should include the source file name and timestamp. Here is the user’s question:

$$ user question $$

The frontend then parses the LLM response into a fixed schema data model, described with Pydantic BaseModels:

from pydantic import BaseModel

class Citation(BaseModel):
    """A single citation from a transcript"""
    media_name: str
    timestamp: int

class PartialQAnswer(BaseModel):
    """Part of a complete answer, to be concatenated with other partial answers"""
    partial_answer: str
    citations: List[Citation]

class FullQAnswer(BaseModel):
    """Full user query response including citations and one or more partial answers"""
    answer: List[PartialQAnswer]

This format allows the frontend to parse the response and display buttons for each citation that cue up the relevant media segment for user review.

Deployment details

The solution is deployed in the form of one AWS Cloud Development Kit (AWS CDK) stack, which contains four nested stacks:

  • A backend that handles transcribing uploaded media and tracking job statuses
  • A Retrieval Augmented Generation (RAG) stack that handles setting up OpenSearch Serverless and Amazon Bedrock Knowledge Bases
  • An API stack that stands up an Amazon Cognito authorized REST API and various Lambda functions to logically separate the frontend from the backend
  • A frontend stack that consists of a containerized Streamlit application running as a load balanced service in an ECS cluster, with a CloudFront distribution connected to the load balancer

Prerequisites

The solution requires the following prerequisites:

  • You need to have an AWS account and an AWS Identity and Access Management (IAM) role and user with permissions to create and manage the necessary resources and components for this application. If you don’t have an AWS account, see How do I create and activate a new Amazon Web Services account?
  • You also need to request access to at least one Amazon Bedrock LLM (to generate answers to questions) and one embedding model (to find transcript chunks that are semantically similar to a user question). The following Amazon Bedrock models are the default, but can be changed using a configuration file at the application deployment time as described later in this post:
    • Amazon Titan Embeddings V2 – Text
    • Amazon’s Nova Pro
  • You need a Python environment with AWS CDK dependencies installed. For instructions, see Working with the AWS CDK in Python.
  • Docker is required to build the Streamlit frontend container at deployment time.
  • The minimal IAM permissions needed to bootstrap and deploy the AWS CDK are described in the ReVIEW/infra/minimal-iam-policy.json file in the GitHub repository. Make sure the IAM user or role deploying the stacks has these permissions.

Clone the repository

Fork the repository, and clone it to the location of your choice. For example:

$ git clone https://github.com/aws-samples/recorded-voice-insight-extraction-webapp.git

Edit the deployment config file

Optionally, edit the infra/config.yaml file to provide a descriptive base name for your stack. This file is also where you can choose specific Amazon Bedrock embedding models for semantic retrieval and LLMs for response generation, and define chunking strategies for the knowledge base that will ingest transcriptions of uploaded media files. This file is also where you can reuse an existing Amazon Cognito user pool if you want to bootstrap your application with an existing user base.

Deploy the AWS CDK stacks

Deploy the AWS CDK stacks with the following code:

$ cd infra
$ cdk bootstrap
$ cdk deploy –-all

You only need to use the preceding command one time per AWS account. The deploy command will deploy the parent stack and four nested stacks. The process takes approximately 20 minutes to complete.

When the deployment is complete, a CloudFront distribution URL of the form xxx.cloudfront.net will be printed on the console screen to access the application. This URL can also be found on the AWS CloudFormation console by locating the stack whose name matches the value in the config file, then choosing the Outputs tab and locating the value associated with the key ReVIEWFrontendURL. That URL will lead you to a login screen like the following screenshot.

Screenshot of ReVIEW application login page.

Create an Amazon Cognito user to access the app

To log in to the running web application, you have to create an Amazon Cognito user. Complete the following steps:

  1. On the Amazon Cognito console, navigate to the recently created user pool.
  2. In the Users section under User Management¸ choose Create user.
  3. Create a user name and password to log in to the ReVIEW application deployed in the account.

When the application deployment is destroyed (as described in the cleanup section), the Amazon Cognito pool remains to preserve the user base. The pool can be fully removed manually using the Amazon Cognito console.

Test the application

Test the application by uploading one or more audio or video files on the File Upload tab. The application supports media formats supported by Amazon Transcribe. If you are looking for a sample video, consider downloading a TED talk. After uploading, you will see the file appear on the Job Status tab. You can track processing progress through transcription, postprocessing, and knowledge base syncing steps on this tab. After at least one file is marked Complete, you can chat with it on the Chat With Your Media tab.

The Analyze Your Media tab allows you to create and apply custom LLM template prompts to individual uploaded files. For example, you can create a basic summary template, or an extract key information template, and apply it to your uploaded files here. This functionality was not described in detail in this post.

Clean up

The deployed application will incur ongoing costs even if it isn’t used, for example from OpenSearch Serverless indexing and search OCU minimums. To delete all resources created when deploying the application, run the following command:

$ cdk destroy –-all

Conclusion

The solution presented in this post demonstrates a powerful pattern for accelerating video and audio review workflows while maintaining human oversight. By combining the power of AI models in Amazon Bedrock with human expertise, you can create tools that not only boost productivity but also maintain the critical element of human judgment in important decision-making processes.

We encourage you to explore this fully open sourced solution, adapt it to your specific use cases, and provide feedback on your experiences.

For expert assistance, the AWS Generative AI Innovation Center, AWS Professional Services, and our AWS Partners are here to help.


About the Author

David Kaleko is a Senior Applied Scientist in the AWS Generative AI Innovation Center.

Read More

Boost team innovation, productivity, and knowledge sharing with Amazon Q Apps

Boost team innovation, productivity, and knowledge sharing with Amazon Q Apps

As enterprises rapidly expand their applications, platforms, and infrastructure, it becomes increasingly challenging to keep up with technology trends, best practices, and programming standards. Enterprises typically provide their developers, engineers, and architects with a variety of knowledge resources such as user guides, technical wikis, code repositories, and specialized tools. However, over time these resources often become siloed within individual teams or organizational silos, making it difficult for employees to easily access relevant information across the broader organization. This lack of knowledge sharing can lead to duplicated efforts, reduced productivity, and missed opportunities to use institutional expertise.

Imagine you’re a developer tasked with troubleshooting a complex issue in your company’s cloud infrastructure. You scour through outdated user guides and scattered conversations, but can’t find the right answer. Minutes turn into hours, sometimes days, as you struggle to piece together the information you need, all while your project falls behind.

To address these challenges, the MuleSoft team integrated Amazon Q Apps, a capability within Amazon Q Business, a generative AI-powered assistant service, directly into their Cloud Central portal—an individualized portal that shows assets owned, costs and usage, and AWS Well-Architected recommendations to over 100 engineer teams. Amazon Q Apps is designed to use Amazon Q Business and its ability to draw upon an enterprise’s own internal data, documents, and systems to provide conversational assistance to users. By tapping into these rich information sources, you can enable your users to create Amazon Q Apps that can answer questions, summarize key points, generate custom content, and even securely complete certain tasks—all without the user having to navigate through disparate repositories or systems. Prior to Amazon Q Apps, MuleSoft was using a chatbot that used Slack, Amazon Lex V2, and Amazon Kendra. The chatbot solution didn’t meet the needs of the engineering and development teams, which prompted the exploration of Amazon Q Apps.

In this post, we demonstrate how Amazon Q Apps can help maximize the value of existing knowledge resources and improve productivity among various teams, ranging from finance to DevOps to support engineers. We share specific examples of how the generative AI assistant can enable surface relevant information, distill complex topics, generate custom content, and execute workflows—all while maintaining robust security and data governance controls.

In addition to demonstrating the power of Amazon Q Apps, we provide guidance on prompt engineering and system prompts reflective of real-world use cases using the rich features of Amazon Q Apps. For instance, let’s consider the scenario of troubleshooting network connectivity. By considering personas and their specific lines of business, we can derive the optimal tone and language to provide a targeted, actionable response. This level of personalization is key to delivering optimized customer experiences and building trust.

Improve production with Amazon Q Apps

Amazon Q Apps is a feature within Amazon Q Business that assists you in creating lightweight, purpose-built applications within Amazon Q Business. You can create these apps in several ways like creating applications with your own words to fit specific requirements, or by transforming your conversations with an Amazon Q Business assistant into prompts that then can be used to generate an application.

With Amazon Q Apps, you can build, share, and customize applications on enterprise data to streamline tasks and boost individual and team productivity. You can also publish applications to an admin-managed library and share them with their coworkers. Amazon Q Apps inherits user permissions, access controls, and enterprise guardrails from Amazon Q Business for secure sharing and adherence to data governance policies.

Amazon Q Apps is only available to users with a Pro subscription. If you have the Lite subscription, you will not be able to view or use Amazon Q Apps.

MuleSoft’s use case with Amazon Q Apps

The team needed a more personalized approach to Amazon Q Business. Upon the announcement of Amazon Q Apps, the team determined it could solve an immediate need across teams. Their Cloud Central portal is already geared for a personalized experience for its users. MuleSoft completed a successful proof of concept integrating Amazon Q Apps into their overall Cloud Central portal. Cloud Central (see the following screenshot) serves as a single pane of glass for both managers and team members to visualize and understand each persona’s personalized cloud assets, cost metrics, and Well-Architected status based on application or infrastructure.

Fig 1 Salesforce MuleSoft Cloud Central Portal

Fig 1: Salesforce MuleSoft Cloud Central Portal

The MuleSoft support team was looking for a way to help them troubleshoot network traffic latency when they rolled out a new customer into their production environment. The MuleSoft team found Amazon Q Apps helpful in providing possible causes for network latency for virtual private clouds (VPCs) as well as in providing prescriptive guidance on how to troubleshoot VPC network latencies. We explore a similar network latency use case in this post.

Solution overview

In this post, we focus on creating Amazon Q applications from the Amazon Q Business Chat and Amazon Q Apps Creator:

  • Amazon Q Business Chat – You can use the Amazon Q Apps icon in the Amazon Q Business Chat assistant to generate a prompt that can be used to create an application. This feature summarizes the Amazon Q Business Chat conversation to create a prompt that you can review and edit before generating an application.
  • Amazon Q Apps Creator – With Amazon Q Apps Creator, you can describe the type of application you want to build using your own words to generate an application. Amazon Q Apps will generate an application for you based on the provided prompt.

Pre-requisites

Make sure you have an AWS account. If not, you can sign up one. Refer to Pre-requisites for Amazon Q Apps for the steps to complete prior to deploying Amazon Q Apps. For more information, see Getting started with Amazon Q Business.

Create an application using Amazon Q Business Chat

You can choose the Amazon Q Apps icon from an Amazon Q chat conversation to generate an application prompt and using it to create an Amazon Q application. The icon is available in the conversations pane on the left, above the Amazon Q Assistant Chat conversation in the upper-right corner, or on the prompt dropdown menu.

Let’s explore an example of using an Amazon Q chat assistant conversation to create an application.

  1. Begin by asking the Amazon Q Business assistant a question related to the data that is provided in the Amazon Q Business application.

For this example, we ask about steps to troubleshoot network latency.

  1. After you’ve finished your conversation, choose the Amazon Q Apps icon in either the conversation pane or in the upper-right corner to launch Amazon Q App Creator.
  2. Review the generated prompt from the conversation and update the prompt to match your application purpose as needed.
  3. Choose Generate to create the application.
  4. To test the application, we enter under User input “I am unable to reach my EC2 host via port 22,” and choose Run.
  5. Review the generated text output and confirm that the troubleshooting steps look correct.
  6. Share the app with all in the library, choose Publish.

The Amazon Q Apps library will show all published applications shared by your teammates. Only users who have access to the Amazon Q Business application will be able to view your published application.

  1. You can choose labels where the application will reside, relating to teams, personas, or categories.

Create an application using Amazon Q Apps Creator

You can start building an Amazon Q application with Amazon Q Apps Creator by describing the task you want to create an application for. Complete the following steps:

  1. Choose Apps in the navigation pane.
  2. Enter your prompt or use an example prompt.

For this post, we enter the prompt “Create an app that crafts insightful content for users to troubleshoot AWS services. It takes inputs like as a use case to work backwards from on a solution. Based on these inputs, the app generates a tailored response for resolving the AWS service use case, providing steps to remediate and content links.”

  1. Choose Generate to create the application.

The Amazon Q application was created with AWS Use Case, Troubleshooting Steps, and Additional Resources sections translated from your prompt.

  1. To test the application, we enter under User input “which AWS tool to manage many AWS accounts and take advantage of consolidated billing,” and choose Run.

The Troubleshooting Steps section highlights using AWS Organizations and provides a walkthrough. The Additional Resources section provides more information about your use case, while citing AWS customer references.

  1. Choose Share to publish your application and choose the appropriate labels.

Results

MuleSoft offers a prime example of the transformative impact of Amazon Q Apps. With this solution, MuleSoft was able to realize a 50% reduction in team inquiries—from 100 down to just 50. These inquiries spanned a wide range, from basic AWS service information to complex networking troubleshooting and even Amazon Elastic Block Store (Amazon EBS) volume migrations from gp2 to gp3.

Pricing

Amazon Q Business offers subscription options for you to customize your access. For more details, see Amazon Q Business pricing.

Conclusion

Amazon Q Business empowers enterprises to maximize the value of their knowledge resources by democratizing access to powerful conversational AI capabilities. Through Amazon Q Apps, organizations can create purpose-built applications using internal data and systems, unlocking new solutions and accelerating innovation.

The MuleSoft team demonstrated this by integrating Amazon Q Apps into their Cloud Central portal, enhancing user experience, streamlining collaboration, and optimizing cloud infrastructure while maintaining robust security and data governance.

Amazon Q Apps provides flexible generative AI application development using natural language, allowing organizations to build and securely publish custom applications tailored to their unique needs. This approach enables teams to boost innovation, productivity, and knowledge sharing across job functions.

By leveraging Amazon Q Business, enterprises can find answers, build applications, and drive productivity using their own enterprise data and conversational AI capabilities.

To learn about other Amazon Q Business customers’ success stories, see Amazon Q Developer customers.

*Note Amazon Q Apps is only available to users with the Pro subscription, if you have the Lite subscription you will not be able to view or use Amazon Q Apps.


About the Authors

Rueben Jimenez is an AWS Sr Solutions Architect. Designing and implementing complex Data Analytics, Machine learning, Generative AI, and cloud infrastructure solutions.

Tiffany Myers is an AWS Product Manager for Amazon Q Apps. Launching generative AI solutions for business users.

Summer Petersil is a Strategic Account Representative (SAR) on the AWS Salesforce team, where she leads Generative AI (GenAI) enablement efforts.

Read More

Harnessing Amazon Bedrock generative AI for resilient supply chain

Harnessing Amazon Bedrock generative AI for resilient supply chain

From pandemic shutdowns to geopolitical tensions, recent years have thrown our global supply chains into unexpected chaos. This turbulent period has taught both governments and organizations a crucial lesson: supply chain excellence depends not just on efficiency but on the ability to navigate disruptions through strategic risk management. By leveraging the generative AI capabilities and tooling of Amazon Bedrock, you can create an intelligent nerve center that connects diverse data sources, converts data into actionable insights, and creates a comprehensive plan to mitigate supply chain risks.

Amazon Bedrock is a fully managed service that enables the development and deployment of generative AI applications using high-performance foundation models (FMs) from leading AI companies through a single API.

Amazon Bedrock Flows affords you the ability to use supported FMs to build workflows by linking prompts, FMs, data sources, and other Amazon Web Services (AWS) services to create end-to-end solutions. Its visual workflow builder and serverless infrastructure enables organizations to accelerate the development and deployment of AI-powered supply chain solutions, improving agility and resilience in the face of evolving challenges. The drag and drop capability of Amazon Bedrock Flows efficiently integrates with Amazon Bedrock Knowledge Bases, Amazon Bedrock Agents and other ever-growing AWS services such as Amazon Simple Storage Service (Amazon S3), AWS Lambda and Amazon Lex.

This post walks through how Amazon Bedrock Flows connects your business systems, monitors medical device shortages, and provides mitigation strategies based on knowledge from Amazon Bedrock Knowledge Bases or data stored in Amazon S3 directly. You’ll learn how to create a system that stays ahead of supply chain risks.

Business workflow

The following is the supply chain business workflow implemented as an Amazon Bedrock flow.

workflow

The following are the steps of the workflow in detail:

  1. The JSON request with the medical device name is submitted to the prompt flow.
  2. The workflow determines if the medical device needs review by following these steps:
    1. The assistant invokes a Lambda function to check the device classification and any shortages.
    2. If there is no shortage, the workflow informs the user that no action is required.
    3. If the device classification is 3 (high-risk medical devices that are essential for sustaining life or health) and there is a shortage, the assistant determines the necessary mitigation steps Devices with classification 3 are treated as high-risk devices and require a comprehensive mitigation strategy. The following steps are followed in this scenario.
      1. Amazon Bedrock Knowledge Bases RetrieveAndGenerate API creates a comprehensive strategy.
      2. The flow emails the mitigation to the given email address.
    4. If the device classification is 2 (medium-risk medical devices that can pose harm to patients) and there is a shortage, the flow lists the mitigation steps as output. Classification device 2 doesn’t require a comprehensive mitigation strategy. We recommend to use these when the information retrieved fits the context size of the model. Mitigation is fetched from Amazon S3 directly.
    5. If the device classification is 1(low-risk devices that don’t pose significant risk to patients) and there is a shortage, the flow outputs only the details of the shortage because no action is required.

Solution overview

The following diagram illustrates the solution architecture. The solution uses Amazon Bedrock Flows to orchestrate the generative AI workflow. An Amazon Bedrock flow consists of nodes, which is a step in the flow and connections to connect to various data sources or to execute various conditions.

architecture

The system workflow includes the following steps:

  1. The user interacts with generative AI applications, which connect with Amazon Bedrock Flows. The user provides information about the device.
  2. A workflow in Amazon Bedrock Flows is a construct consisting of a name, description, permissions, a collection of nodes, and connections between nodes.
  3. A Lambda function node in Amazon Bedrock Flows is used to invoke AWS Lambda to get supply shortage and device classifications. AWS Lambda calculates this information based on the data from Amazon DynamoDB.
  4. If the device classification is 3, the flow queries the knowledge base node to find mitigations and create a comprehensive plan. Amazon Bedrock Guardrails can be applied in a knowledge base node.
  5. A Lambda function node in Amazon Bedrock Flows invokes another Lambda function to email the mitigation plan to the users. AWS Lambda uses Amazon Simple Email Service (Amazon SES) SDK to send emails to verified identities.
  6. Lambda functions are within the private subnet of Amazon Virtual Private Cloud (Amazon VPC) and provide least privilege access to the services using roles and permissions policies. AWS Lambda uses gateway endpoints or NAT gateways to connect to Amazon DynamoDB or Amazon SES, respectively
  7. If the device classification is 2, the flow queries Amazon S3 to fetch the mitigation. In this case, comprehensive mitigation isn’t needed, and it can fit in the model context. This reduces overall cost and simplifies maintenance.

Prerequisites

The following prerequisites need to be completed before you can build the solution.

  1. Have an AWS account.
  2. Have an Amazon VPC with private subnet and public subnet and egress internet access.
  3. This solution is supported only in US East (N. Virginia) us-east-1 AWS Region. You can make the necessary changes to your AWS CloudFormation template to deploy to other Regions.
  4. Have permission to create Lambda functions and configure AWS Identity and Access Management (IAM)
  5. Have permissions to create Amazon Bedrock prompts.
  6. Sign up for model access on the Amazon Bedrock console (for more information, refer to model access in the Amazon Bedrock documentation). For information about pricing for using Amazon Bedrock, refer to Amazon Bedrock pricing. For this post, we use Anthropic’s Claude 3.5 Sonnet, and all instructions pertain to that model.
  7. Enable AWS CloudTrail logging for operational and risk auditing.
  8. Enable budget policy notification to protect the customer from unwanted billing.

Deployment with AWS CloudFormation console

In this step, you deploy the CloudFormation template.

  1. Navigate to the CloudFormation console us-east-1
  2. Download the CloudFormation template and upload it in the Specify template Choose Next.
  3. Enter a name with the following details, as shown in the following screenshot:
    • Stack name
    • Fromemailaddress
    • Toemailaddress
    • VPCId
    • VPCCecurityGroupIds
    • VPCSubnets

cfn

  1. Keep the other values as default. Under Capabilities on the last page, select I acknowledge that AWS CloudFormation might create IAM resources. Choose Submit to create the CloudFormation stack.
  2. After the successful deployment of the whole stack, from the Resources tab, make a note of the following output key values. You’ll need them later.
    • BedrockKBQDataSourceBucket
    • Device2MitigationsBucket
    • KMSKey

cfn resource

This is a sample code for nonproduction use. You should work with your security and legal teams to align with your organizational security, regulatory, and compliance requirements before deployment.

Upload mitigation documents to Amazon S3

In this step, you upload the mitigation documents to Amazon S3.

  1. Download the device 2 mitigation strategy documents
  2. On the Amazon S3 console, search for the Device2MitigationsBucket captured earlier
  3. Upload the downloaded file to the bucket
  4. Download the device 3 mitigation strategy documents
  5. On the Amazon S3 console, search for the BedrockKBQDataSourceBucket captured earlier
  6. Upload these documents to the S3 bucket

Configure Amazon Bedrock Knowledge Bases

In this section, you create an Amazon Bedrock knowledge base and sync it.

  1. Create a knowledge base in Amazon Bedrock Knowledge Bases with BedrockKBQDataSourceBucket as a data source.
  2. Add an inline policy to the service role for Amazon Bedrock Knowledge Bases to decrypt the AWS Key Management Service (AWS KMS) key.
  3. Sync the data with the knowledge base.

Create an Amazon Bedrock workflow

In this section, you create a workflow in Amazon Bedrock Flows.

  1. On the Amazon Bedrock console, select Amazon Bedrock Flows from the left navigation pane. Choose Create flow to create a flow, as shown in the following screenshot.createflow
  1. Enter a Name for the flow and an optional Description.
  2. For the Service role name, choose Create and use a new service role to create a service role for you to use.
  3. Choose Create, as shown in the following screenshot. Your flow is created, and you’ll be taken to the flow builder where you can build your flow.

createflow2

Amazon Bedrock Flow configurations

This section walks through the process of creating the flow. Using Amazon Bedrock Flows, you can quickly build complex generative AI workflows using a visual flow builder. The following steps walk through configuring different components of the business process.

  1. On the Amazon Bedrock console, select Flows from the left navigation pane.
  2. Choose a flow in the Amazon Bedrock Flows
  3. Choose Edit in flow builder.
  4. In the Flow builder section, the center pane displays a Flow input node and a Flow output These are the input and output nodes for your flow.flow builder
  1. Select the Flow Input
  2. In Configure in the left-hand menu, change the Type of the Output to Object, as shown in the following screenshot.
    configure
  1. In the Flow builder pane, select Nodes.

Add prompt node to process the incoming data

A prompt node defines a prompt to use in the flow. You use this node to refine the input for Lambda processing.

  1. Drag the Prompts node and drop it in the center pane.prompt-node
  1. Select the node you just added.
  2. In the Configure section of the Flow builder pane, choose Define in node.
  3. Define the following values:
    • Choose Select model and Anthropic Claude 3 Sonnet.
    • In the Message section add the following prompt:
      Given a supply chain issue description enclosed in description tag <desc> </desc>, classify the device and problem type. Respond only with a JSON object in the following format: { "device": "<device_name>", "problem_type": "<problem_type>" } Device types include but are not limited to: Oxygen Mask Ventilator Hospital Bed Surgical Gloves Defibrillator pacemaker Problem types include but are not limited to: scarcity malfunction quality_issue If an unknown device type is provided respond with unknown for any of the fields <desc> {{description}}</desc>prompt-configure
  1. In the Input section, change the Expression of the input variable description to the following, as shown in the following screenshot:
    • $.data.description
      prompt-inputexpression
  1. The circles on the nodes are connection points. To connect the Prompt node to the input node, drag a line from the circle on the Flow input node to the circle in the Input section of the Prompt
  2. Delete the connection between the Flow Input node and the Flow Output node by double clicking on it. The following video illustrates steps 6 and 7.

Add Lambda node to fetch classifications from database

A Lambda node lets you call a Lambda function in which you can define code to carry out business logic. This solution uses a Lambda node to fetch the shortage information, classification of the device, Amazon S3 object key, and instructions for retrieving information from the knowledge base.

  1. Add the Lambda node by dragging to the center.
  2. From configuration of the node, choose the Lambda function with the name containing SupplyChainMgmt from the dropdown menu, as shown in the following screenshot.

lambda getsupply

  1. Update the Output type as Object, as shown in the following screenshot.

lambda getsupply to output

  1. Connect the Lambda node input to the Prompt node output.

Add condition node to determine the need for mitigation

A condition node sends data from the previous node to different nodes, depending on the conditions that are defined. A condition node can take multiple inputs. This node determines if there is a shortage and follows the appropriate path.

  1. Add the Condition node by dragging it to the center.

condition node

  1. From configuration of the Condition node, in the Input section, update the first input with the following details:
    • Name: classification
    • Type: Number
    • Expression: $.data.classification
      condition1
  1. Choose Add input to add the new input with the following details:
    • Name: shortage
    • Type: Number
    • Expression: $.data.shortage
  2. Connect the output of the Lambda node to the two inputs of the Condition

lambdacondition

  1. From configuration of the Condition node, in the Conditions section, add the following details:
    • Name: Device2Condition
    • Condition: (classification == 2) and (shortage >10)
  2. Choose Add condition and enter the following details:
    • Name: Device3Condition
    • Condition: (classification == 3) and (shortage >10)condition 2 and 3
  1. Connect the circle from If all conditions are false to input of default Flow output
  2. Connect output of Lambda node to default Flow output input node.condition to defaultoutput
  1. In the configurations of the default Flow output node, update the expression to the following:
    • $.data.message

Fetch mitigation using the S3 Retrieval Node

An S3 retrieval node lets you retrieve data from an Amazon S3 location to introduce to the flow. This node will retrieve mitigations directly from Amazon S3 for type 2 devices.

  1. Add an S3 Retrieval node by dragging it to the center.
  2. In the configurations of the node, choose the newly created S3 bucket with a name containing device2mitigationsbucket.
  3. Update the Expression of the input to the following:
    • $.data.S3instructions3retrieval
  1. Connect the circle from the Device2Condition condition of the Condition node to the S3 Retrieval.
  2. Connect the output of the Lambda node to the input of the S3 Retrieval.lambda to s3retrieval
  1. Add the Flow output node by dragging it to the center.
  2. In the configuration of the node, give the node the name
  3. Connect the output of the S3 Retrieval node to S3Output node.
     s3retrieval to ouput

Fetch mitigations using the Knowledge Base Node

A Knowledge Base node lets you send a query to a knowledge base from Amazon Bedrock Knowledge Bases. This node will fetch a comprehensive mitigation strategy from Amazon Bedrock Knowledge Bases for type 3 devices.

  1. Add the Knowledge Base node by dragging it to the center.
  2. From the configuration of the Knowledge Base node, select the knowledge base created earlier.
  3. Select Generate responses based on retrieved results and select Claude 3 Sonnet from the dropdown menu of Select model.
    knowledge base node
  1. In the Input section, update the input expression as the following:
    • Expression: $.data.retrievalQuery

 kb node input

  1. Connect the circle from the Device3Condition condition of the Condition node to the Knowledge base
  2. Connect the output of the Knowledge base node to the Lambda node input with the name codeHookInput.
  3. Add the Flow output node by dragging it to the center.
  4. In the configuration of the node, give the Node name KBOutput.
  5. Connect the output of the Knowledge Base node to KBOutput nodekb to outputnode
  1. Add the Lambda node by dragging it to the center.
  2. From the configuration of the node, choose the Lambda function with the name containing EmailReviewersFunction from the dropdown menu.
    senemailLambda
  1. Choose Add input to add the new input with the following details:
    • Name: email
    • Type: String
    • Expression: $.data.email
      senemailLambda input
  1. Change output Type to Object.

senemailLambda output

  1. Connect the output of the Knowledge base to the new Lambda node input with the name codeHookInput.
  2. Connect the output of the Flow input node to the new Lambda node input with the name email.
  3. Add the Flow output node by dragging it to the center.
  4. In the configuration of the node, give the Node name
  5. In the configurations of the emailOutput Flow output node, update the expression to the following:
    • $.data.message
  1. Connect the output of the Lambda node node to emailOutput Flow Output node
  2. Choose Save to save the flow.

Testing

To test the agent, use the Amazon Bedrock flow builder console. You can embed the API calls into your applications.

  1. In the test window of the newly created flow, give the following prompt by replacing the “To email address” with Toemail provided in the CloudFormation template.
    {"description": "Cochlear implants are in shortage ","retrievalQuery":"find the mitigation for device shortage", "email": "<To email address>"}
  1. SupplyChainManagement Lambda randomly generates shortages. If a shortage is detected, you’ll see an answer from Amazon Bedrock Knowledge Bases.
  2. An email is also sent to the email address provided in the context.teKB generated mitigation strategy
  1. Test the solution for classification 2 devices by giving the following prompt. Replace the To email address with Toemail provided in the CloudFormation template.
    {"description": " oxygen mask are in shortage ","retrievalQuery":"find the mitigation for device shortage", "email": "<To email address>"}
  1. The flow will fetch the results from Amazon S3 directly.
    S3 mitigation-strategy

Clean up

To avoid incurring future charges, delete the resources you created. To clean up the AWS environment, use the following steps:

  1. Empty the contents of the S3 bucket you created as part of the CloudFormation stack.
  2. Delete the flow from Amazon Bedrock.
  3. Delete the Amazon Bedrock knowledge base.
  4. Delete the CloudFormation stack you created.

Conclusion

As we navigate an increasingly unpredictable global business landscape, the ability to anticipate and respond to supply chain disruptions isn’t just a competitive advantage—it’s a necessity for survival. The Amazon Bedrock suite of generative AI–powered tools offers organizations the capability to transform their supply chain management from reactive to proactive, from fragmented to integrated, and from rigid to resilient.

By implementing the solutions outlined in this guide, organizations can:

  • Build automated, intelligent monitoring systems
  • Create predictive risk management frameworks
  • Use AI-driven insights for faster decision-making
  • Develop adaptive supply chain strategies that evolve with emerging challenges

Stay up to date with the latest advancements in generative AI and start building on AWS. If you’re seeking assistance on how to begin, check out the Generative AI Innovation Center.


About the Authors

Marcelo Silva is a Principal Product Manager at Amazon Web Services, leading strategy and growth for Amazon Bedrock Knowledge Bases and Amazon Lex.

Sujatha Dantuluri is a Senior Solutions Architect in the US federal civilian team at AWS. Her expertise lies in architecting mission-critical solutions and working closely with customers to ensure their success. Sujatha is an accomplished public speaker, frequently sharing her insights and knowledge at industry events and conferences.

Ishan Gupta is a Software Engineer at Amazon Bedrock, where he focuses on developing cutting-edge generative AI applications. His interests lie in exploring the potential of large language models and creating innovative solutions that leverage the power of AI.

Read More

How Travelers Insurance classified emails with Amazon Bedrock and prompt engineering

How Travelers Insurance classified emails with Amazon Bedrock and prompt engineering

This is a guest blog post co-written with Jordan Knight, Sara Reynolds, George Lee from Travelers.

Foundation models (FMs) are used in many ways and perform well on tasks including text generation, text summarization, and question answering. Increasingly, FMs are completing tasks that were previously solved by supervised learning, which is a subset of machine learning (ML) that involves training algorithms using a labeled dataset. In some cases, smaller supervised models have shown the ability to perform in production environments while meeting latency requirements. However, there are benefits to building an FM-based classifier using an API service such as Amazon Bedrock, such as the speed to develop the system, the ability to switch between models, rapid experimentation for prompt engineering iterations, and the extensibility into other related classification tasks. An FM-driven solution can also provide rationale for outputs, whereas a traditional classifier lacks this capability. In addition to these features, modern FMs are powerful enough to meet accuracy and latency requirements to replace supervised learning models.

In this post, we walk through how the Generative AI Innovation Center (GenAIIC) collaborated with leading property and casualty insurance carrier Travelers to develop an FM-based classifier through prompt engineering. Travelers receives millions of emails a year with agent or customer requests to service policies. The system GenAIIC and Travelers built uses the predictive capabilities of FMs to classify complex, and sometimes ambiguous, service request emails into several categories. This FM classifier powers the automation system that can save tens of thousands of hours of manual processing and redirect that time toward more complex tasks. With Anthropic’s Claude models on Amazon Bedrock, we formulated the problem as a classification task, and through prompt engineering and partnership with the business subject matter experts, we achieved 91% classification accuracy.

Problem Formulation

The main task was classifying emails received by Travelers into a service request category. Requests involved areas like address changes, coverage adjustments, payroll updates, or exposure changes. Although we used a pre-trained FM, the problem was formulated as a text classification task. However, instead of using supervised learning, which normally involves training resources, we used prompt engineering with few-shot prompting to predict the class of an email. This allowed us to use a pre-trained FM without having to incur the costs of training. The workflow started with an email, then, given the email’s text and any PDF attachments, the email was given a classification by the model.

It should be noted that fine-tuning an FM is another approach that could have improved the performance of the classifier with an additional cost. By curating a longer list of examples and expected outputs, an FM can be trained to perform better on a specific task. In this case, given the accuracy was already high by just using prompt engineering, the accuracy after fine-tuning would have to justify the cost. Although at the time of the engagement, Anthropic’s Claude models weren’t available for fine-tuning on Amazon Bedrock, now Anthropic’s Claude Haiku fine-tuning is in beta testing through Amazon Bedrock.

Overview of solution

The following diagram illustrates the solution pipeline to classify an email.

The workflow consists of the following steps:

  1. The raw email is ingested into the pipeline. The body text is extracted from the email text files.
  2. If the email has a PDF attachment, the PDF is parsed.
  3. The PDF is split into individual pages. Each page is saved as an image.
  4. The PDF page images are processed by Amazon Textract to extract text, specific entities, and table data using Optical Character Recognition (OCR).
  5. Text from the email is parsed.
  6. The text is then cleaned of HTML tags, if necessary.
  7. The text from the email body and PDF attachment are combined into a single prompt for the large language model (LLM).
  8. Anthropic’s Claude classifies this content into one of 13 defined categories and then returns that class. The predictions for each email are further used for analysis of performance.

Amazon Textract served multiple purposes, such as extracting the raw text of the forms included in as attachments in emails. Additional entity extraction and table data detection was included to identify names, policy numbers, dates, and more. The Amazon Textract output was then combined with the email text and given to the model to decide the appropriate class.

This solution is serverless, which has many benefits for the organization. With a serverless solution, AWS provides a managed solution, facilitating lower cost of ownership and reduced complexity of maintenance.

Data

The ground truth dataset contained over 4,000 labeled email examples. The raw emails were in Outlook .msg format and raw .eml format. Approximately 25% of the emails had PDF attachments, of which most were ACORD insurance forms. The PDF forms included additional details that provided a signal for the classifier. Only PDF attachments were processed to limit the scope; other attachments were ignored. For most examples, the body text contained the majority of the predictive signal that aligned with one of the 13 classes.

Prompt engineering

To build a strong prompt, we needed to fully understand the differences between categories to provide sufficient explanations for the FM. Through manually analyzing email texts and consulting with business experts, the prompt included a list of explicit instructions on how to classify an email. Additional instructions showed Anthropic’s Claude how to identify key phrases that help distinguish an email’s class from the others. The prompt also included few-shot examples that demonstrated how to perform the classification, and output examples that showed how the FM is to format its response. By providing the FM with examples and other prompting techniques, we were able to significantly reduce the variance in the structure and content of the FM output, leading to explainable, predictable, and repeatable results.

The structure of the prompt was as follows:

  • Persona definition
  • Overall instruction
  • Few-shot examples
  • Detailed definitions for each class
  • Email data input
  • Final output instruction

To learn more about prompt engineering for Anthropic’s Claude, refer to Prompt engineering in the Anthropic documentation.

“Claude’s ability to understand complex insurance terminology and nuanced policy language makes it particularly adept at tasks like email classification. Its capacity to interpret context and intent, even in ambiguous communications, aligns perfectly with the challenges faced in insurance operations. We’re excited to see how Travelers and AWS have harnessed these capabilities to create such an efficient solution, demonstrating the potential for AI to transform insurance processes.”

– Jonathan Pelosi, Anthropic

Results

For an FM-based classifier to be used in production, it must show a high level of accuracy. Initial testing without prompt engineering yielded 68% accuracy. After using a variety of techniques with Anthropic’s Claude v2, such as prompt engineering, condensing categories, adjusting document processing process, and improving instructions, accuracy increased to 91%. Anthropic’s Claude Instant on Amazon Bedrock also performed well, with 90% accuracy, with additional areas of improvement identified.

Conclusion

In this post, we discussed how FMs can reliably automate the classification of insurance service emails through prompt engineering. When formulating the problem as a classification task, an FM can perform well enough for production environments, while maintaining extensibility into other tasks and getting up and running quickly. All experiments were conducted using Anthropic’s Claude models on Amazon Bedrock.


About the Authors

Jordan Knight is a Senior Data Scientist working for Travelers in the Business Insurance Analytics & Research Department. His passion is for solving challenging real-world computer vision problems and exploring new state-of-the-art methods to do so. He has a particular interest in the social impact of ML models and how we can continue to improve modeling processes to develop ML solutions that are equitable for all. In his free time you can find him either rock climbing, hiking, or continuing to develop his somewhat rudimentary cooking skills.

Sara Reynolds is a Product Owner at Travelers. As a member of the Enterprise AI team, she has advanced efforts to transform processing within Operations using AI and cloud-based technologies. She recently earned her MBA and PhD in Learning Technologies and is serving as an Adjunct Professor at the University of North Texas.

George Lee is AVP, Data Science & Generative AI Lead for International at Travelers Insurance. He specializes in developing enterprise AI solutions, with expertise in Generative AI and Large Language Models. George has led several successful AI initiatives and holds two patents in AI-powered risk assessment. He received his Master’s in Computer Science from the University of Illinois at Urbana-Champaign.

Francisco Calderon is a Data Scientist at the Generative AI Innovation Center (GAIIC). As a member of the GAIIC, he helps discover the art of the possible with AWS customers using generative AI technologies. In his spare time, Francisco likes playing music and guitar, playing soccer with his daughters, and enjoying time with his family.

Isaac Privitera is a Principal Data Scientist with the AWS Generative AI Innovation Center, where he develops bespoke generative AI-based solutions to address customers’ business problems. His primary focus lies in building responsible AI systems, using techniques such as RAG, multi-agent systems, and model fine-tuning. When not immersed in the world of AI, Isaac can be found on the golf course, enjoying a football game, or hiking trails with his loyal canine companion, Barry.

Read More

Accelerate digital pathology slide annotation workflows on AWS using H-optimus-0

Accelerate digital pathology slide annotation workflows on AWS using H-optimus-0

Digital pathology is essential for the diagnosis and treatment of cancer, playing a critical role in healthcare delivery and pharmaceutical research and development. Pathology traditionally relies heavily on pathologist expertise and experience to conduct meticulous examination of tissue samples to identify abnormalities. However, the increasing complexity and volume of cases necessitate advanced tools to assist pathologists in making faster, more accurate diagnoses.

The digitization of pathology slides, known as whole slide images (WSIs), gave rise to the new field of computational pathology. By applying AI to these digitized WSIs, researchers are working to unlock new insights and enhance current annotations workflows. A pivotal advancement in the field of computational pathology has been the emergence of large-scale deep neural network architectures, known as foundation models (FMs). These models are trained using self-supervised learning algorithms on expansive datasets, enabling them to capture a comprehensive repertoire of visual representations and patterns inherent within pathology images. The power of FMs lies in their ability to learn robust and generalizable data embeddings that can be effectively transferred and fine-tuned for a wide variety of downstream tasks, ranging from automated disease detection and tissue characterization to quantitative biomarker analysis and pathological subtyping.

Recently, French startup Bioptimus announced the release of a new pathology vision FM: H-optimus-0, the world’s largest publicly available FM for pathology. With 1.1 billion parameters, H-optimus-0 was trained on a proprietary dataset of several hundreds of millions of images extracted from over 500,000 histopathology slides. This sets a new benchmark for state-of-the-art performance in critical medical diagnostic tasks, from identifying cancerous cells to detecting genetic abnormalities in tumors.

The recent addition of H-optimus-0 to Amazon SageMaker JumpStart marks a significant milestone in making advanced AI capabilities accessible to healthcare organizations. This powerful FM, with its comprehensive training on over 500,000 histopathology slides, represents a valuable tool for organizations looking to enhance their digital pathology workflows.

In this post, we demonstrate how to use H-optimus-0 for two common digital pathology tasks: patch-level analysis for detailed tissue examination, and slide-level analysis for broader diagnostic assessment. Through practical examples, we show you how to adapt this FM to these specific use cases while optimizing computational resources.

Solution overview

Our solution uses the AWS integrated ecosystem to create an efficient scalable pipeline for digital pathology AI workflows. The architecture combines the following services:

The following diagram illustrates the solution architecture for training and deploying fine-tuned FMs using H-optimus-0.

The following diagram illustrates the solution architecture for training and deploying fine-tuned FMs using H-optimus-0

This diagram illustrates the solution architecture for training and deploying fine-tuned FMs using H-optimus-0

This post provides example scripts and training notebooks in the following GitHub repository.

Prerequisites

We assume you have access to and are authenticated in an AWS account. The AWS CloudFormation template for this solution uses t3.medium instances to host the SageMaker notebook. Feature extraction uses g5.2xlarge instance types powered by NVIDIA T4 GPU tested in the us-west-2 AWS Region. Training jobs are run on p3.2xlarge and g5.2xlarge instances. Check your AWS service quotas to make sure you have sufficient access to these instance types.

Create the AWS infrastructure

To get started with pathology AI workflows, we use AWS CloudFormation to automate the setup of our core infrastructure. The provided infra-stack.yml template creates a complete environment ready for model fine-tuning and training.

Our CloudFormation stack configures a secure networking environment using Amazon Virtual Private Cloud (Amazon VPC), establishing both public and private subnets with appropriate gateways for internet connectivity. Within this network, it creates an EFS file system to efficiently store and serve large pathology slide images. The stack also provisions a SageMaker notebook instance that automatically connects to the EFS storage, providing seamless access to training data.

The template handles all necessary security configurations, including AWS Identity and Access Management (IAM) roles. When deploying the stack, make note of the private subnet and security group identifiers; you will need to make sure your training jobs can access the EFS data storage.

For detailed setup instructions and configuration options, refer to the README in our GitHub repository.

Use FMs for patch-level prediction tasks

Patch-level analysis is fundamental to digital pathology AI workflows. Instead of processing entire WSIs that can exceed several gigabytes, patch-level analysis focuses on specific tissue regions. This targeted approach enables efficient resource utilization and faster model development cycles. The following diagram illustrates the workflow of patch-level prediction tasks on a WSI.

 The following diagram illustrates the workflow of patch-level prediction tasks on a WSI

This diagram illustrates the workflow of patch-level prediction tasks on a WSI

Classification task: MHIST dataset

We demonstrate patch-level classification using the MHIST dataset, which contains colorectal polyp images. Early detection of potentially cancerous polyps directly impacts patient survival rates, making this a clinically relevant use case. By adding a simple classification head on top of H-optimus-0’s pretrained features and using linear probing, we achieve 83% accuracy. The implementation uses Amazon EFS for efficient data streaming and p3.2xlarge instances for optimal GPU utilization.

To access the MHIST dataset, submit a data request through their portal to obtain the annotations.csv file and images.zip file. Our repository includes a download_mhist.sh script that automatically downloads and organizes the data in your EFS storage.

Segmentation task: Lizard dataset

For our second patch-level task, we demonstrate nuclear segmentation using the Lizard dataset, which requires precise pixel-level predictions of nuclear boundaries in colon tissue. We adapt H-optimus-0 for segmentation by adding a Mask2Former ViT adapter head, allowing the model to generate detailed segmentation masks while using the FM’s powerful feature extraction capabilities.

The Lizard dataset is available on Kaggle, and our repository includes scripts to automatically download and prepare the data for training. The segmentation implementation runs on g5.16xlarge instances to handle the computational demands of pixel-level predictions.

Use FMs for WSI-level tasks

Analyzing entire WSIs presents unique challenges due to their massive size, often exceeding 50,000 x 50,000 pixels. To address this, we implement multiple instance learning (MIL), which treats each WSI as a collection of smaller patches. Our attention-based MIL approach automatically learns which regions are most relevant for the final prediction. The following diagram illustrates the workflow for WSI-level prediction tasks using MIL.

The following diagram illustrates the workflow for WSI-level prediction tasks using MIL

This diagram illustrates the workflow for WSI-level prediction tasks using MIL

WSI processing pipeline

Our implementation optimizes WSI analysis through the following methods:

  • Intelligent patching – We use the GPU-accelerated CuCIM library to efficiently load WSIs and apply Canny edge detection to identify and extract only tissue-containing regions
  • Feature extraction – The selected patches are processed in parallel using GPU acceleration, with features stored in space-efficient HDF5 format for downstream analysis

MSI status prediction

We demonstrate our WSI pipeline by predicting microsatellite instability (MSI) status, a crucial biomarker that guides immunotherapy decisions in cancer treatment. The TCGA-COAD dataset used for this task can be accessed through the GDC Data Portal, and our repository provides detailed instructions for downloading the WSIs and corresponding MSI labels.

Clean up

After you’ve finished, don’t forget to delete the associated resources (Amazon EFS storage and SageMaker notebook instances) to avoid unexpected costs.

Conclusion

In this post, we demonstrated how you can use AWS services to build scalable digital pathology AI workflows using the H-optimus-0 FM. Through practical examples of both patch-level tasks (MHIST classification and Lizard nuclear segmentation) and WSI analysis (MSI status prediction), we showed how to efficiently handle the unique challenges of computational pathology.

Our implementation highlights the seamless integration between AWS services for handling large-scale pathology data processing. Although we used Amazon EFS for this demonstration to enable high-throughput training workflows, production deployments might consider AWS HealthImaging for long-term storage of medical imaging data.

We hope this pipeline serves as a starting point for your own pathology AI initiatives. The provided GitHub repository contains the necessary components to help you begin building and scaling pathology workflows for your specific use cases. You can clone the repository and set up the infrastructure using the provided CloudFormation template. Then try fine-tuning H-optimus-0 on your own pathology datasets and downstream tasks and compare the results with your current methods.

We’d love to hear about your experiences and insights. Reach out to us or contribute to the publicly available FMs to help advance the field of computational pathology.


About the Authors

Pierre de Malliard is a Senior AI/ML Solutions Architect at Amazon Web Services and supports customers in the healthcare and life sciences industry. Pierre de Malliard is a Senior AI/ML Solutions Architect at Amazon Web Services and supports customers in the healthcare and life sciences industry. In his free time, Pierre enjoys skiing and exploring the New York food scene.

Christopher is a senior partner account manager at Amazon Web Services (AWS), helping independent software vendors (ISVs) innovate, build, and co-sell cloud-based healthcare software-as-a-service (SaaS) solutions in public sector. Part of the Healthcare and Life Sciences Technical Field Community (TFC), Christopher aims to accelerate the digitization and utilization of healthcare data to drive improved outcomes and personalized care delivery.

Read More

DeepSeek-R1 model now available in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart

DeepSeek-R1 model now available in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart

Today, we are announcing that DeepSeek AI’s first-generation frontier model, DeepSeek-R1, is available through Amazon SageMaker JumpStart and Amazon Bedrock Marketplace to deploy for inference. You can now use DeepSeek-R1 to build, experiment, and responsibly scale your generative AI ideas on AWS.

In this post, we demonstrate how to get started with DeepSeek-R1 on Amazon Bedrock and SageMaker JumpStart.

Overview of DeepSeek-R1

DeepSeek-R1 is a large language model (LLM) developed by DeepSeek-AI that uses reinforcement learning to enhance reasoning capabilities through a multi-stage training process from a DeepSeek-V3-Base foundation. A key distinguishing feature is its reinforcement learning (RL) step, which was used to refine the model’s responses beyond the standard pre-training and fine-tuning process. By incorporating RL, DeepSeek-R1 can adapt more effectively to user feedback and objectives, ultimately enhancing both relevance and clarity. In addition, DeepSeek-R1 employs a chain-of-thought (CoT) approach, meaning it’s equipped to break down complex queries and reason through them in a step-by-step manner. This guided reasoning process allows the model to produce more accurate, transparent, and detailed answers. This model combines RL-based fine-tuning with CoT capabilities, aiming to generate structured responses while focusing on interpretability and user interaction. With its wide-ranging capabilities DeepSeek-R1 has captured the industry’s attention as a versatile text-generation model that can be integrated into various workflows such as agents, logical reasoning and data interpretation tasks

DeepSeek-R1 uses a Mixture of Experts (MoE) architecture and is 671 billion parameters in size. The MoE architecture allows activation of 37 billion parameters, enabling efficient inference by routing queries to the most relevant expert “clusters.” This approach allows the model to specialize in different problem domains while maintaining overall efficiency. DeepSeek-R1 requires at least 800 GB of HBM memory in FP8 format for inference. In this post, we will use an ml.p5e.48xlarge instance to deploy the model. ml.p5e.48xlarge comes with 8 Nvidia H200 GPUs providing 1128 GB of GPU memory.

You can deploy DeepSeek-R1 model either through SageMaker JumpStart or Bedrock Marketplace. Because DeepSeek-R1 is an emerging model, we recommend deploying this model with guardrails in place. In this blog, we will use Amazon Bedrock Guardrails to introduce safeguards, prevent harmful content, and evaluate models against key safety criteria. At the time of writing this blog, for DeepSeek-R1 deployments on SageMaker JumpStart and Bedrock Marketplace, Bedrock Guardrails supports only the ApplyGuardrail API. You can create multiple guardrails tailored to different use cases and apply them to the DeepSeek-R1 model, improving user experiences and standardizing safety controls across your generative AI applications.

Prerequisites

To deploy the DeepSeek-R1 model, you need access to an ml.p5e instance. To check if you have quotas for P5e, open the Service Quotas console and under AWS Services, choose Amazon SageMaker, and confirm you’re using ml.p5e.48xlarge for endpoint usage. Make sure that you have at least one ml.P5e.48xlarge instance in the AWS Region you are deploying. To request a limit increase, create a limit increase request and reach out to your account team.

Because you will be deploying this model with Amazon Bedrock Guardrails, make sure you have the correct AWS Identity and Access Management (IAM) permissions to use Amazon Bedrock Guardrails. For instructions, see Set up permissions to use guardrails for content filtering.

Implementing guardrails with the ApplyGuardrail API

Amazon Bedrock Guardrails allows you to introduce safeguards, prevent harmful content, and evaluate models against key safety criteria. You can implement safety measures for the DeepSeek-R1 model using the Amazon Bedrock ApplyGuardrail API. This allows you to apply guardrails to evaluate user inputs and model responses deployed on Amazon Bedrock Marketplace and SageMaker JumpStart. You can create a guardrail using the Amazon Bedrock console or the API. For the example code to create the guardrail, see the GitHub repo.

The general flow involves the following steps: First, the system receives an input for the model. This input is then processed through the ApplyGuardrail API. If the input passes the guardrail check, it’s sent to the model for inference. After receiving the model’s output, another guardrail check is applied. If the output passes this final check, it’s returned as the final result. However, if either the input or output is intervened by the guardrail, a message is returned indicating the nature of the intervention and whether it occurred at the input or output stage. The examples showcased in the following sections demonstrate inference using this API.

Deploy DeepSeek-R1 in Amazon Bedrock Marketplace

Amazon Bedrock Marketplace gives you access to over 100 popular, emerging, and specialized foundation models (FMs) through Amazon Bedrock. To access DeepSeek-R1 in Amazon Bedrock, complete the following steps:

  1. On the Amazon Bedrock console, choose Model catalog under Foundation models in the navigation pane.
    At the time of writing this post, you can use the InvokeModel API to invoke the model. It doesn’t support Converse APIs and other Amazon Bedrock tooling.
  2. Filter for DeepSeek as a provider and choose the DeepSeek-R1 model.

    The model detail page provides essential information about the model’s capabilities, pricing structure, and implementation guidelines. You can find detailed usage instructions, including sample API calls and code snippets for integration. The model supports various text generation tasks, including content creation, code generation, and question answering, using its reinforcement learning optimization and CoT reasoning capabilities.
    The page also includes deployment options and licensing information to help you get started with DeepSeek-R1 in your applications.
  3. To begin using DeepSeek-R1, choose Deploy.

    You will be prompted to configure the deployment details for DeepSeek-R1. The model ID will be pre-populated.
  4. For Endpoint name, enter an endpoint name (between 1–50 alphanumeric characters).
  5. For Number of instances, enter a number of instances (between 1–100).
  6. For Instance type, choose your instance type. For optimal performance with DeepSeek-R1, a GPU-based instance type like ml.p5e.48xlarge is recommended.
    Optionally, you can configure advanced security and infrastructure settings, including virtual private cloud (VPC) networking, service role permissions, and encryption settings. For most use cases, the default settings will work well. However, for production deployments, you might want to review these settings to align with your organization’s security and compliance requirements.
  7. Choose Deploy to begin using the model.

    When the deployment is complete, you can test DeepSeek-R1’s capabilities directly in the Amazon Bedrock playground.
  8. Choose Open in playground to access an interactive interface where you can experiment with different prompts and adjust model parameters like temperature and maximum length.

This is an excellent way to explore the model’s reasoning and text generation abilities before integrating it into your applications. The playground provides immediate feedback, helping you understand how the model responds to various inputs and letting you fine-tune your prompts for optimal results.

You can quickly test the model in the playground through the UI. However, to invoke the deployed model programmatically with any Amazon Bedrock APIs, you need to get the endpoint ARN.

Run inference using guardrails with the deployed DeepSeek-R1 endpoint

The following code example demonstrates how to perform inference using a deployed DeepSeek-R1 model through Amazon Bedrock using the invoke_model and ApplyGuardrail API. You can create a guardrail using the Amazon Bedrock console or the API. For the example code to create the guardrail, see the GitHub repo. After you have created the guardrail, use the following code to implement guardrails. The script initializes the bedrock_runtime client, configures inference parameters, and sends a request to generate text based on a user prompt.

import boto3
import json

# Initialize Bedrock client
bedrock_runtime = boto3.client("bedrock-runtime")

# Configuration
MODEL_ID = "your-model-id"  # Bedrock model ID
GUARDRAIL_ID = "your-guardrail-id"
GUARDRAIL_VERSION = "your-guardrail-version"

def invoke_with_guardrails(prompt, max_tokens=1000, temperature=0.6, top_p=0.9):
    """
    Invoke Bedrock model with input and output guardrails
    """
    # Apply input guardrails
    input_guardrail = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        source='INPUT',
        content=[{"text": {"text": prompt}}]
    )
    
    if input_guardrail['action'] == 'GUARDRAIL_INTERVENED':
        return f"Input blocked: {input_guardrail['outputs'][0]['text']}"

    # Prepare model input
    request_body = {
        "inputs": f"""You are an AI assistant. Do as the user asks.
### Instruction: {prompt}
### Response: <think>""",
        "parameters": {
            "max_new_tokens": max_tokens,
            "top_p": top_p,
            "temperature": temperature
        }
    }

    # Invoke model
    response = bedrock_runtime.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps(request_body)
    )
    
    # Parse model response
    model_output = json.loads(response['body'].read())['generated_text']

    # Apply output guardrails
    output_guardrail = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        source='OUTPUT',
        content=[{"text": {"text": model_output}}]
    )

    if output_guardrail['action'] == 'GUARDRAIL_INTERVENED':
        return f"Output blocked: {output_guardrail['outputs'][0]['text']}"
    
    return model_output

# Example usage
if __name__ == "__main__":
    prompt = "What's 1+1?"
    result = invoke_with_guardrails(prompt)
    print(result)

Deploy DeepSeek-R1 with SageMaker JumpStart

SageMaker JumpStart is a machine learning (ML) hub with FMs, built-in algorithms, and prebuilt ML solutions that you can deploy with just a few clicks. With SageMaker JumpStart, you can customize pre-trained models to your use case, with your data, and deploy them into production using either the UI or SDK.

Deploying DeepSeek-R1 model through SageMaker JumpStart offers two convenient approaches: using the intuitive SageMaker JumpStart UI or implementing programmatically through the SageMaker Python SDK. Let’s explore both methods to help you choose the approach that best suits your needs.

Deploy DeepSeek-R1 through SageMaker JumpStart UI

Complete the following steps to deploy DeepSeek-R1 using SageMaker JumpStart:

  1. On the SageMaker console, choose Studio in the navigation pane.
  2. First-time users will be prompted to create a domain.
  3. On the SageMaker Studio console, choose JumpStart in the navigation pane.

    The model browser displays available models, with details like the provider name and model capabilities.
  4. Search for DeepSeek-R1 to view the DeepSeek-R1 model card.
    Each model card shows key information, including:

    • Model name
    • Provider name
    • Task category (for example, Text Generation)
    • Bedrock Ready badge (if applicable), indicating that this model can be registered with Amazon Bedrock, allowing you to use Amazon Bedrock APIs to invoke the model
  5. Choose the model card to view the model details page.

    The model details page includes the following information:

    • The model name and provider information
    • Deploy button to deploy the model
    • About and Notebooks tabs with detailed information

    The About tab includes important details, such as:

    • Model description
    • License information
    • Technical specifications
    • Usage guidelines

    Before you deploy the model, it’s recommended to review the model details and license terms to confirm compatibility with your use case.

  6. Choose Deploy to proceed with deployment.
  7. For Endpoint name, use the automatically generated name or create a custom one.
  8. For Instance type¸ choose an instance type (default: ml.p5e.48xlarge).
  9. For Initial instance count, enter the number of instances (default: 1).
    Selecting appropriate instance types and counts is crucial for cost and performance optimization. Monitor your deployment to adjust these settings as needed.Under Inference type, Real-time inference is selected by default. This is optimized for sustained traffic and low latency.
  10. Review all configurations for accuracy. For this model, we strongly recommend adhering to SageMaker JumpStart default settings and making sure that network isolation remains in place.
  11. Choose Deploy to deploy the model.

The deployment process can take several minutes to complete.

When deployment is complete, your endpoint status will change to InService. At this point, the model is ready to accept inference requests through the endpoint. You can monitor the deployment progress on the SageMaker console Endpoints page, which will display relevant metrics and status information. When the deployment is complete, you can invoke the model using a SageMaker runtime client and integrate it with your applications.

Deploy DeepSeek-R1 using the SageMaker Python SDK

To get started with DeepSeek-R1 using the SageMaker Python SDK, you will need to install the SageMaker Python SDK and make sure you have the necessary AWS permissions and environment setup. The following is a step-by-step code example that demonstrates how to deploy and use DeepSeek-R1 for inference programmatically. The code for deploying the model is provided in the Github here . You can clone the notebook and run from SageMaker Studio.

!pip install --force-reinstall --no-cache-dir sagemaker==2.235.2

from sagemaker.serve.builder.model_builder import ModelBuilder 
from sagemaker.serve.builder.schema_builder import SchemaBuilder 
from sagemaker.jumpstart.model import ModelAccessConfig 
from sagemaker.session import Session 
import logging 

sagemaker_session = Session()
 
artifacts_bucket_name = sagemaker_session.default_bucket() 
execution_role_arn = sagemaker_session.get_caller_identity_arn()
 
js_model_id = "deepseek-llm-r1"

gpu_instance_type = "ml.p5e.48xlarge"
 
response = "Hello, I'm a language model, and I'm here to help you with your English."

 sample_input = {
 "inputs": "Hello, I'm a language model,",
 "parameters": {"max_new_tokens": 128, "top_p": 0.9, "temperature": 0.6},
 }
  
 sample_output = [{"generated_text": response}]
  
 schema_builder = SchemaBuilder(sample_input, sample_output)
  
 model_builder = ModelBuilder( 
 model=js_model_id, 
 schema_builder=schema_builder, 
 sagemaker_session=sagemaker_session, 
 role_arn=execution_role_arn, 
 log_level=logging.ERROR ) 
 
 model= model_builder.build() 
 predictor = model.deploy(model_access_configs={js_model_id:ModelAccessConfig(accept_eula=True)}, accept_eula=True) 
 
 
 predictor.predict(sample_input)

You can run additional requests against the predictor:

new_input = {
    "inputs": "What is Amazon doing in Generative AI?",
    "parameters": {"max_new_tokens": 64, "top_p": 0.8, "temperature": 0.7},
}

prediction = predictor.predict(new_input)
print(prediction)

Implement guardrails and run inference with your SageMaker JumpStart predictor

Similar to Amazon Bedrock, you can also use the ApplyGuardrail API with your SageMaker JumpStart predictor. You can create a guardrail using the Amazon Bedrock console or the API, and implement it as shown in the following code:

import boto3
import json
bedrock_runtime = boto3.client('bedrock-runtime')
sagemaker_runtime = boto3.client('sagemaker-runtime')

# Add your guardrail identifier and version created from Bedrock Console or AWSCLI
guardrail_id = "" # Your Guardrail ID
guardrail_version = "" # Your Guardrail Version
endpoint_name = "" # Endpoint Name

prompt = "What's 1+1 equal?"

# Apply guardrail to input before sending to model
input_guardrail_response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier=guardrail_id,
    guardrailVersion=guardrail_version,
    source='INPUT',
    content=[{ "text": { "text": prompt }}]
)

# If input guardrail passes, proceed with model inference
if input_guardrail_response['action'] != 'GUARDRAIL_INTERVENED':
    # Prepare the input for the SageMaker endpoint
    template = f"""You are an AI assistant. Do as the user asks.
### Instruction: {prompt}
### Response: <think>"""
    
    input_payload = {
        "inputs": template,
        "parameters": {
            "max_new_tokens": 1000,
            "top_p": 0.9,
            "temperature": 0.6
        }
    }
    
    # Convert the payload to JSON string
    input_payload_json = json.dumps(input_payload)
    
    # Invoke the SageMaker endpoint
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/json',
        Body=input_payload_json
    )
    
    # Get the response from the model
    model_response = json.loads(response['Body'].read().decode())
    
    # Apply guardrail to output
    output_guardrail_response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source='OUTPUT',
        content=[{ "text": { "text": model_response['generated_text'] }}]
    )
    
    # Check if output passes guardrails
    if output_guardrail_response['action'] != 'GUARDRAIL_INTERVENED':
        print(model_response['generated_text'])
    else:
        print("Output blocked: ", output_guardrail_response['outputs'][0]['text'])
else:
    print("Input blocked: ", input_guardrail_response['outputs'][0]['text'])

Clean up

To avoid unwanted charges, complete the steps in this section to clean up your resources.

Delete the Amazon Bedrock Marketplace deployment

If you deployed the model using Amazon Bedrock Marketplace, complete the following steps:

  1. On the Amazon Bedrock console, under Foundation models in the navigation pane, choose Marketplace deployments.
  2. In the Managed deployments section, locate the endpoint you want to delete.
  3. Select the endpoint, and on the Actions menu, choose Delete.
  4. Verify the endpoint details to make sure you’re deleting the correct deployment:
    1. Endpoint name
    2. Model name
    3. Endpoint status
  5. Choose Delete to delete the endpoint.
  6. In the deletion confirmation dialog, review the warning message, enter confirm, and choose Delete to permanently remove the endpoint.

Delete the SageMaker JumpStart predictor

The SageMaker JumpStart model you deployed will incur costs if you leave it running. Use the following code to delete the endpoint if you want to stop incurring charges. For more details, see Delete Endpoints and Resources.

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we explored how you can access and deploy the DeepSeek-R1 model using Bedrock Marketplace and SageMaker JumpStart. Visit SageMaker JumpStart in SageMaker Studio or Amazon Bedrock Marketplace now to get started. For more information, refer to Use Amazon Bedrock tooling with Amazon SageMaker JumpStart models, SageMaker JumpStart pretrained models, Amazon SageMaker JumpStart Foundation Models, Amazon Bedrock Marketplace, and Getting started with Amazon SageMaker JumpStart.


About the Authors

Vivek Gangasani is a Lead Specialist Solutions Architect for Inference at AWS. He helps emerging generative AI companies build innovative solutions using AWS services and accelerated compute. Currently, he is focused on developing strategies for fine-tuning and optimizing the inference performance of large language models. In his free time, Vivek enjoys hiking, watching movies, and trying different cuisines.

Niithiyn Vijeaswaran is a Generative AI Specialist Solutions Architect with the Third-Party Model Science team at AWS. His area of focus is AWS AI accelerators (AWS Neuron). He holds a Bachelor’s degree in Computer Science and Bioinformatics.

Jonathan Evans is a Specialist Solutions Architect working on generative AI with the Third-Party Model Science team at AWS.

Banu Nagasundaram leads product, engineering, and strategic partnerships for Amazon SageMaker JumpStart, SageMaker’s machine learning and generative AI hub. She is passionate about building solutions that help customers accelerate their AI journey and unlock business value.

Read More

Streamline grant proposal reviews using Amazon Bedrock

Streamline grant proposal reviews using Amazon Bedrock

Government and non-profit organizations evaluating grant proposals face a significant challenge: sifting through hundreds of detailed submissions, each with unique merits, to identify the most promising initiatives. This arduous, time-consuming process is typically the first step in the grant management process, which is critical to driving meaningful social impact.

The AWS Social Responsibility & Impact (SRI) team recognized an opportunity to augment this function using generative AI. The team developed an innovative solution to streamline grant proposal review and evaluation by using the natural language processing (NLP) capabilities of Amazon Bedrock. Amazon Bedrock is a fully managed service that lets you use your choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities that you need to build generative AI applications with security, privacy, and responsible AI.

Historically, AWS Health Equity Initiative applications were reviewed manually by a review committee. It took 14 or more days each cycle for all applications to be fully reviewed. On average, the program received 90 applications per cycle. The June 2024 AWS Health Equity Initiative application cycle received 139 applications, the program’s largest influx to date. It would have taken an estimated 21 days for the review committee to manually process these many applications. The Amazon Bedrock centered approach reduced the review time to 2 days (a 90% reduction).

The goal was to enhance the efficiency and consistency of the review process, empowering customers to build impactful solutions faster. By combining the advanced NLP capabilities of Amazon Bedrock with thoughtful prompt engineering, the team created a dynamic, data-driven, and equitable solution demonstrating the transformative potential of large language models (LLMs) in the social impact domain.

In this post, we explore the technical implementation details and key learnings from the team’s Amazon Bedrock powered grant proposal review solution, providing a blueprint for organizations seeking to optimize their grants management processes.

Building an effective prompt for reviewing grant proposals using generative AI

Prompt engineering is the art of crafting effective prompts to instruct and guide generative AI models, such as LLMs, to produce the desired outputs. By thoughtfully designing prompts, practitioners can unlock the full potential of generative AI systems and apply them to a wide range of real-world scenarios.

When building a prompt for our Amazon Bedrock model to review grant proposals, we used multiple prompt engineering techniques to make sure the model’s responses were tailored, structured, and actionable. This included assigning the model a specific persona, providing step-by-step instructions, and specifying the desired output format.

First, we assigned the model the persona of an expert in public health, with a focus on improving healthcare outcomes for underserved populations. This context helps prime the model to evaluate the proposal from the perspective of a subject matter expert (SME) who thinks holistically about global challenges and community-level impact. By clearly defining the persona, we make sure the model’s responses are tailored to the desired evaluation lens.

Your task is to review a proposal document from the perspective of a given persona, and assess it based on dimensions defined in a rubric. Here are the steps to follow:

1. Review the provided proposal document: {PROPOSAL}

2. Adopt the perspective of the given persona: {PERSONA}

Multiple personas can be assigned against the same rubric to account for various perspectives. For example, when the persona “Public Health Subject Matter Expert” was assigned, the model provided keen insights on the project’s impact potential and evidence basis. When the persona “Venture Capitalist” was assigned, the model provided more robust feedback on the organization’s articulated milestones and sustainability plan for post funding. Similarly, when the persona “Software Development Engineer” was assigned, the model relayed subject matter expertise on the proposed use of AWS technology.

Next, we broke down the review process into a structured set of instructions for the model to follow. This includes reviewing the proposal, assessing it across specific dimensions (impact potential, innovation, feasibility, sustainability), and then providing an overall summary and score. Outlining these step-by-step directives gives the model clear guidance on the required task elements and helps produce a comprehensive and consistent assessment.

3. Assess the proposal based on each dimension in the provided rubric: {RUBRIC}

For each dimension, follow this structure:
<Dimension Name>
 <Summary> Provide a brief summary (2-3 sentences) of your assessment of how well the proposal meets the criteria for this dimension from the perspective of the given persona. </Summary>
 <Score> Provide a score from 0 to 100 for this dimension. Start with a default score of 0 and increase it based on the information in the proposal. </Score>
 <Recommendations> Provide 2-3 specific recommendations for how the author could improve the proposal in this dimension. </Recommendations>
</Dimension Name>

4. After assessing each dimension, provide an <Overall Summary> section with:
 - An overall assessment summary (3-4 sentences) of the proposal's strengths and weaknesses across all dimensions from the persona's perspective
 - Any additional feedback beyond the rubric dimensions
 - Identification of any potential risks or biases in the proposal or your assessment

5. Finally, calculate the <Overall Weighted Score> by applying the weightings specified in the rubric to your scores for each dimension.

Finally, we specified the desired output format as JSON, with distinct sections for the dimensional assessments, overall summary, and overall score. Prescribing this structured response format makes sure that the model’s output can be ingested, stored, and analyzed by our grant review team, rather than being delivered in free-form text. This level of control over the output helps streamline the downstream use of the model’s evaluations.

6. Return your assessment in JSON format with the following structure:

{{ "dimensions": [ {{ "name": "<Dimension Name>", "summary": "<Summary>", "score": <Score>, "recommendations": [ "<Recommendation 1>", "<Recommendation 2>", ... ] }}, ... ], "overall_summary": "<Overall Summary>","overall_score": <Overall Weighted Score> }}

Do not include any other commentary beyond following the specified structure. Focus solely on providing the assessment based on the given inputs.

By combining these prompt engineering techniques—role assignment, step-by-step instructions, and output formatting—we were able to craft a prompt that elicits thorough, objective, and actionable grant proposal assessments from our generative AI model. This structured approach enables us to effectively use the model’s capabilities to support our grant review process in a scalable and efficient manner.

Building a dynamic proposal review application with Streamlit and generative AI

To demonstrate and test the capabilities of a dynamic proposal review solution, we built a rapid prototype implementation using Streamlit, Amazon Bedrock, and Amazon DynamoDB. It’s important to note that this implementation isn’t intended for production use, but rather serves as a proof of concept and a starting point for further development. The application allows users to define and save various personas and evaluation rubrics, which can then be dynamically applied when reviewing proposal submissions. This approach enables a tailored and relevant assessment of each proposal, based on the specified criteria.

The application’s architecture consists of several key components, which we discuss in this section.

The team used DynamoDB, a NoSQL database, to store the personas, rubrics, and submitted proposals. The data stored in DynamoDB was sent to Streamlit, a web application interface. On Streamlit, the team added the persona and rubric to the prompt and sent the prompt to Amazon Bedrock.

import boto3
import json

from api.personas import Persona
from api.rubrics import Rubric
from api.submissions import Submission

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def _construct_prompt(persona: Persona, rubric: Rubric, submission: Submission):
    rubric_dimensions = [
        f"{dimension['name']}|{dimension['description']}|{dimension['weight']}"
        for dimension in rubric.dimensions
    ]

    # add the table headers the prompt is expecting to the front of the dimensions list
    rubric_dimensions[:0] = ["dimension_name|dimension_description|dimension_weight"]
    rubric_string = "n".join(rubric_dimensions)
    print(rubric_string)

    with open("prompt/prompt_template.txt", "r") as prompt:
        prompt = prompt.read()
        print(prompt)
        return prompt.format(
            PROPOSAL=submission.content,
            PERSONA=persona.description,
            RUBRIC=rubric_string,
        )

Amazon Bedrock used Anthropic’s Claude 3 Sonnet FM to evaluate the submitted proposals against the prompt. The model’s prompts are dynamically generated based on the selected persona and rubric. Amazon Bedrock would send the evaluation results back to Streamlit for team review.

def get_assessment(submission: Submission, persona: Persona, rubric: Rubric):
    prompt = _construct_prompt(persona, rubric, submission)

    body = json.dumps(
        {
            "anthropic_version": "",
            "max_tokens": 2000,
            "temperature": 0.5,
            "top_p": 1,
            "messages": [{"role": "user", "content": prompt}],
        }
    )
    response = bedrock.invoke_model(
        body=body, modelId="anthropic.claude-3-haiku-20240307-v1:0"
    )
    response_body = json.loads(response.get("body").read())
    return response_body.get("content")[0].get("text")

The following diagram illustrates the show of the preceding figure.

The workflow consists of the following steps:

  1. Users can create and manage personas and rubrics through the Streamlit application. These are stored in the DynamoDB database.

  2. When a user submits a proposal for review, they choose the desired persona and rubric from the available options.
  3. The Streamlit application generates a dynamic prompt for the Amazon Bedrock model, incorporating the selected persona and rubric details.
  4. The Amazon Bedrock model evaluates the proposal based on the dynamic prompt and returns the assessment results.
  5. The evaluation results are stored in the DynamoDB database and presented to the user through the Streamlit application.

Impact

This rapid prototype demonstrates the potential for a scalable and flexible proposal review process, allowing organizations to:

  • Reduce application processing time by up to 90%
  • Streamline the review process by automating the evaluation tasks
  • Capture structured data on the proposals and assessments for further analysis
  • Incorporate diverse perspectives by enabling the use of multiple personas and rubrics

Throughout the implementation, the AWS SRI team focused on creating an interactive and user-friendly experience. By working hands-on with the Streamlit application and observing the impact of dynamic persona and rubric selection, users can gain practical experience in building AI-powered applications that address real-world challenges.

Considerations for a production-grade implementation

Although the rapid prototype demonstrates the potential of this solution, a production-grade implementation requires additional considerations and the use of additional AWS services. Some key considerations include:

  • Scalability and performance – For handling large volumes of proposals and concurrent users, a serverless architecture using AWS Lambda, Amazon API Gateway, DynamoDB, and Amazon Simple Storage Service (Amazon S3) would provide improved scalability, availability, and reliability.
  • Security and compliance – Depending on the sensitivity of the data involved, additional security measures such as encryption, authentication and access control, and auditing are necessary. Services like AWS Key Management Service (KMS), Amazon Cognito, AWS Identity and Access Management (IAM), and AWS CloudTrail can help meet these requirements.
  • Monitoring and logging – Implementing robust monitoring and logging mechanisms using services like Amazon CloudWatch and AWS X-Ray enable tracking performance, identifying issues, and maintaining compliance.
  • Automated testing and deployment – Implementing automated testing and deployment pipelines using services like AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy help provide consistent and reliable deployments, reducing the risk of errors and downtime.
  • Cost optimization – Implementing cost optimization strategies, such as using AWS Cost Explorer and AWS Budgets, can help manage costs and help maintain efficient resource utilization.
  • Responsible AI considerations – Implementing safeguards—such as Amazon Bedrock Guardrails—and monitoring mechanisms can help enforce the responsible and ethical use of the generative AI model, including bias detection, content moderation, and human oversight. Although the AWS Health Equity Initiative application form collected customer information such as name, email address, and country of operation, this was systematically omitted when sent to the Amazon Bedrock enabled tool to avoid bias in the model and protect customer data.

By using the full suite of AWS services and following best practices for security, scalability, and responsible AI, organizations can build a production-ready solution that meets their specific requirements while achieving compliance, reliability, and cost-effectiveness.

Conclusion

Amazon Bedrock—coupled with effective prompt engineering—enabled AWS SRI to review grant proposals and deliver awards to customers in days instead of weeks. The skills developed in this project—such as building web applications with Streamlit, integrating with NoSQL databases like DynamoDB, and customizing generative AI prompts—are highly transferable and applicable to a wide range of industries and use cases.


About the authors

Carolyn Vigil  is a Global Lead for AWS Social Responsibility & Impact Customer Engagement. She drives strategic initiatives that leverage cloud computing for social impact worldwide. A passionate advocate for underserved communities, she has co-founded two non-profit organizations serving individuals with developmental disabilities and their families. Carolyn enjoys Mountain adventures with her family and friends in her free time.

Lauren Hollis is a Program Manager for AWS Social Responsibility and Impact. She leverages her background in economics, healthcare research, and technology to support mission-driven organizations deliver social impact using AWS cloud technology.  In her free time, Lauren enjoys reading an playing the piano and cello.

 Ben West is a hands-on builder with experience in machine learning, big data analytics, and full-stack software development. As a technical program manager on the AWS Social Responsibility & Impact team, Ben leverages a wide variety of cloud, edge, and Internet of Things (IoT) technologies to develop innovative prototypes and help public sector organizations make a positive impact in the world.  Ben is an Army Veteran that enjoys cooking and being outdoors.

Mike Haggerty is a Senior Systems Development Engineer (Sr. SysDE) at Amazon Web Services (AWS), working within the PACE-EDGE team. In this role, he contributes to AWS’s edge computing initiatives as part of the Worldwide Public Sector (WWPS) organization’s PACE (Prototyping and Customer Engineering) team. Beyond his professional duties, Mike is a pet therapy volunteer who, together with his dog Gnocchi, provides support services at local community facilities.

Read More

How Aetion is using generative AI and Amazon Bedrock to unlock hidden insights about patient populations

How Aetion is using generative AI and Amazon Bedrock to unlock hidden insights about patient populations

The real-world data collected and derived from patient journeys offers a wealth of insights into patient characteristics and outcomes and the effectiveness and safety of medical innovations. Researchers ask questions about patient populations in the form of structured queries; however, without the right choice of structured query and deep familiarity with complex real-world patient datasets, many trends and patterns can remain undiscovered.

Aetion is a leading provider of decision-grade real-world evidence software to biopharma, payors, and regulatory agencies. The company provides comprehensive solutions to healthcare and life science customers to transform real-world data into real-world evidence.

The use of unsupervised learning methods on semi-structured data along with generative AI has been transformative in unlocking hidden insights. With Aetion Discover, users can conduct rapid, exploratory analyses with real-world data while experiencing a structured approach to research questions. To help accelerate data exploration and hypothesis generation, Discover uses unsupervised learning methods to uncover Smart Subgroups. These subgroups of patients within a larger population display similar characteristics or profiles across a vast range of factors, including diagnoses, procedures, and therapies.

In this post, we review how Aetion’s Smart Subgroups Interpreter enables users to interact with Smart Subgroups using natural language queries. Powered by Amazon Bedrock and Anthropic’s Claude 3 large language models (LLMs), the interpreter responds to user questions expressed in conversational language about patient subgroups and provides insights to generate further hypotheses and evidence. Aetion chose to use Amazon Bedrock for working with LLMs due to its vast model selection from multiple providers, security posture, extensibility, and ease of use.

Amazon Bedrock is a fully managed service that provides access to high-performing foundation models (FMs) from leading AI startups and Amazon through a unified API. It offers a wide range of FMs, allowing you to choose the model that best suits your specific use case.

Aetion’s technology

Aetion uses the science of causal inference to generate real-world evidence on the safety, effectiveness, and value of medications and clinical interventions. Aetion has partnered with the majority of top 20 biopharma, leading payors, and regulatory agencies.

Aetion brings deep scientific expertise and technology to life sciences, regulatory agencies (including FDA and EMA), payors, and health technology assessment (HTA) customers in the US, Canada, Europe, and Japan with analytics that can achieve the following:

  • Optimize clinical trials by identifying target populations, creating external control arms, and contextualizing settings and populations underrepresented in controlled settings
  • Expand industry access through label changes, pricing, coverage, and formulary decisions
  • Conduct safety and effectiveness studies for medications, treatments, and diagnostics

Aetion’s applications, including Discover and Aetion Substantiate, are powered by the Aetion Evidence Platform (AEP), a core longitudinal analytic engine capable of applying rigorous causal inference and statistical methods to hundreds of millions of patient journeys.

AetionAI is a set of generative AI capabilities embedded across the core environment and applications. Smart Subgroups Interpreter is an AetionAI feature in Discover.

The following figure illustrates the organization of Aetion’s services.

Aetion Services

Smart Subgroups

For a user-specified patient population, the Smart Subgroups feature identifies clusters of patients with similar characteristics (for example, similar prevalence profiles of diagnoses, procedures, and therapies).

These subgroups are further classified and labeled by generative AI models based on each subgroup’s prevalent characteristics. For example, as shown in the following generated heat map, the first two Smart Subgroups within a population of patients who were prescribed GLP-1 agonists are labeled “Cataract and Retinal Disease” and “Inflammatory Skin Conditions,” respectively, to capture their defining characteristics.

Smart Subgroups

After the subgroups are displayed, a user engages with AetionAI to probe further with inquiries expressed in natural language. The user can express questions about the subgroups, such as “What are the most common characteristics for patients in the cataract disorders subgroup?” As shown in the following screenshot, AetionAI responds to the user in natural language, citing relevant subgroup statistics in its response.

A user might also ask AetionAI detailed questions such as “Compare the prevalence of cardiovascular diseases or conditions among the ‘Dulaglutide’ group vs the overall population.” The following screenshot shows AetionAI’s response.

In this example, the insights enable the user to hypothesize that Dulaglutide patients might experience fewer circulatory signs and symptoms. They can explore this further in Aetion Substantiate to produce decision-grade evidence with causal inference to assess the effectiveness of Dulaglutide use in cardiovascular disease outcomes.

Solution overview

Smart Subgroups Interpreter combines elements of unsupervised machine learning with generative AI to uncover hidden patterns in real-world data. The following diagram illustrates the workflow.

Let’s review each step in detail:

  • Create the patient population – Users define a patient population using the Aetion Measure Library (AML) features. The AML feature store standardizes variable definitions using scientifically validated algorithms. The user selects the AML features that define the patient population for analysis.
  • Generate features for the patient population – The AEP computes over 1,000 AML features for each patient across various categories, such as diagnoses, therapies, and procedures.
  • Build clusters and summarize cluster features – The Smart Subgroups component trains a topic model using the patient features to determine the optimal number of clusters and assign patients to clusters. The prevalences of the most distinctive features within each cluster, as determined by a trained classification model, are used to describe the cluster characteristics.
  • Generate cluster names and answer user queries – A prompt engineering technique for Anthropic’s Claude 3 Haiku on Amazon Bedrock generates descriptive cluster names and answers user queries. Amazon Bedrock provides access to LLMs from a variety of model providers. Anthropic’s Claude 3 Haiku was selected as the model due to its speed and satisfactory intelligence level.

The solution uses Amazon Simple Storage Service (Amazon S3) and Amazon Aurora for data persistence and data exchange, and Amazon Bedrock with Anthropic’s Claude 3 Haiku models for cluster names generation. Discover and its transactional and batch applications are deployed and scaled on a Kubernetes on AWS cluster to optimize performance, user experience, and portability.

The following diagram illustrates the solution architecture.

Solution - flow
Solution Architecture

The workflow includes the following steps:

  1. Users create Smart Subgroups for their patient population of interest.
  2. AEP uses real-world data and a custom query language to compute over 1,000 science-validated features for the user-selected population. The features are stored in Amazon S3 and encrypted with AWS Key Management Service (AWS KMS) for downstream use.
  3. The Smart Subgroups component trains the clustering algorithm and summarizes the most important features of each cluster. The cluster feature summaries are stored in Amazon S3 and displayed as a heat map to the user. Smart Subgroups is deployed as a Kubernetes job and is run on demand.
  4. Users interact with the Interpreter API microservice by using questions expressed in natural language to retrieve descriptive subgroup names. The data transmitted to the service is encrypted using Transport Layer Security 1.2 (TLS). The Interpreter API uses composite prompt engineering techniques with Anthropic’s Claude 3 Haiku to answer user queries:
    • Versioned prompt templates generate descriptive subgroup names and answer user queries.
    • AML features are added to the prompt template. For example, the description of the feature “Benign Ovarian Cyst” is expanded in a prompt to the LLM as “This measure covers different types of cysts that can form in or on a woman’s ovaries, including follicular cysts, corpus luteum cysts, endometriosis, and unspecified ovarian cysts.”
    • Lastly, the top feature prevalences of each subgroup are added to the prompt template. For example: “In Smart Subgroup 1 the relative prevalence of ‘Cornea and external disease (EYE001)’ is 30.32% In Smart Subgroup 1 the relative prevalence of ‘Glaucoma (EYE003)’ is 9.94%…”
  5. Amazon Bedrock responds back to the application that displays the heat map to the user.

Outcomes

Smart Subgroups Interpreter enables users of the AEP who are unfamiliar with real-world data to discover patterns among patient populations using natural language queries. Users now can turn findings from such discoveries into hypotheses for further analyses across Aetion’s software to generate decision-grade evidence in a matter of minutes, as opposed to days, and without the need of support staff.

Conclusion

In this post, we demonstrated how Aetion uses Amazon Bedrock and other AWS services to help users uncover meaningful patterns within patient populations, even without prior expertise in real-world data. These discoveries lay the groundwork for deeper analysis within Aetion’s Evidence Platform, generating decision-grade evidence that drives smarter, data-informed outcomes.

As we continue expanding our generative AI capabilities, Aetion remains committed to enhancing user experiences and accelerating the journey from real-world data to real-world evidence.

With Amazon Bedrock, the future of innovation is at your fingertips. Explore Generative AI Application Builder on AWS to learn more about building generative AI capabilities to unlock new insights, build transformative solutions, and shape the future of healthcare today.


About the Authors

Javier Beltrán is a Senior Machine Learning Engineer at Aetion. His career has focused on natural language processing, and he has experience applying machine learning solutions to various domains, from healthcare to social media.

Ornela Xhelili is a Staff Machine Learning Architect at Aetion. Ornela specializes in natural language processing, predictive analytics, and MLOps, and holds a Master’s of Science in Statistics. Ornela has spent the past 8 years building AI/ML products for tech startups across various domains, including healthcare, finance, analytics, and ecommerce.

Prasidh Chhabri is a Product Manager at Aetion, leading the Aetion Evidence Platform, core analytics, and AI/ML capabilities. He has extensive experience building quantitative and statistical methods to solve problems in human health.

Mikhail Vaynshteyn is a Solutions Architect with Amazon Web Services. Mikhail works with healthcare life sciences customers and specializes in data analytics services. Mikhail has more than 20 years of industry experience covering a wide range of technologies and sectors.

Read More

Deploy DeepSeek-R1 Distilled Llama models in Amazon Bedrock

Deploy DeepSeek-R1 Distilled Llama models in Amazon Bedrock

Open foundation models (FMs) have become a cornerstone of generative AI innovation, enabling organizations to build and customize AI applications while maintaining control over their costs and deployment strategies. By providing high-quality, openly available models, the AI community fosters rapid iteration, knowledge sharing, and cost-effective solutions that benefit both developers and end-users. DeepSeek AI, a research company focused on advancing AI technology, has emerged as a significant contributor to this ecosystem. Their DeepSeek-R1 models represent a family of large language models (LLMs) designed to handle a wide range of tasks, from code generation to general reasoning, while maintaining competitive performance and efficiency.

Amazon Bedrock Custom Model Import enables the import and use of your customized models alongside existing FMs through a single serverless, unified API. You can access your imported custom models on-demand and without the need to manage underlying infrastructure. Accelerate your generative AI application development by integrating your supported custom models with native Bedrock tools and features like Knowledge Bases, Guardrails, and Agents.

In this post, we explore how to deploy distilled versions of DeepSeek-R1 with Amazon Bedrock Custom Model Import, making them accessible to organizations looking to use state-of-the-art AI capabilities within the secure and scalable AWS infrastructure at an effective cost.

DeepSeek-R1 distilled variations

From the foundation of DeepSeek-R1, DeepSeek AI has created a series of distilled models based on both Meta’s Llama and Qwen architectures, ranging from 1.5–70 billion parameters. The distillation process involves training smaller, more efficient models to mimic the behavior and reasoning patterns of the larger DeepSeek-R1 model by using it as a teacher—essentially transferring the knowledge and capabilities of the 671 billion parameter model into more compact architectures. The resulting distilled models, such as DeepSeek-R1-Distill-Llama-8B (from base model Llama-3.1-8B) and DeepSeek-R1-Distill-Llama-70B (from base model Llama-3.3-70B-Instruct), offer different trade-offs between performance and resource requirements. Although distilled models might show some reduction in reasoning capabilities compared to the original 671B model, they significantly improve inference speed and reduce computational costs. For instance, smaller distilled models like the 8B version can process requests much faster and consume fewer resources, making them more cost-effective for production deployments, whereas larger distilled versions like the 70B model maintain closer performance to the original while still offering meaningful efficiency gains.

Solution overview

In this post, we demonstrate how to deploy distilled versions of DeepSeek-R1 models using Amazon Bedrock Custom Model Import. We focus on importing the variants currently supported DeepSeek-R1-Distill-Llama-8B and DeepSeek-R1-Distill-Llama-70B, which offer an optimal balance between performance and resource efficiency. You can import these models from Amazon Simple Storage Service (Amazon S3) or an Amazon SageMaker AI model repo, and deploy them in a fully managed and serverless environment through Amazon Bedrock. The following diagram illustrates the end-to-end flow.

In this workflow, model artifacts stored in Amazon S3 are imported into Amazon Bedrock, which then handles the deployment and scaling of the model automatically. This serverless approach eliminates the need for infrastructure management while providing enterprise-grade security and scalability.

You can use the Amazon Bedrock console for deploying using the graphical interface and following the instructions in this post, or alternatively use the following notebook to deploy programmatically with the Amazon Bedrock SDK.

Prerequisites

You should have the following prerequisites:

Prepare the model package

Complete the following steps to prepare the model package:

  1. Download the DeepSeek-R1-Distill-Llama model artifacts from Hugging Face, from one of the following links, depending on the model you want to deploy:
    1. https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/tree/main
    2. https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B/tree/main

For more information, you can follow the Hugging Face’s Downloading models or Download files from the hub instructions.

You typically need the following files:

    • Model configuration file: config.json
    • Tokenizer files: tokenizer.json, tokenizer_config.json, and tokenizer.mode
    • Model weights files in .safetensors format
  1. Upload these files to a folder in your S3 bucket, in the same AWS Region where you plan to use Amazon Bedrock. Take note of the S3 path you’re using.

Import the model

Complete the following steps to import the model:

  1. On the Amazon Bedrock console, choose Imported models under Foundation models in the navigation pane.
  1. Choose Import model.
  1. For Model name, enter a name for your model (it’s recommended to use a versioning scheme in your name, for tracking your imported model).
  2. For Import job name, enter a name for your import job.
  3. For Model import settings, select Amazon S3 bucket as your import source, and enter the S3 path you noted earlier (provide the full path in the form s3://<your-bucket>/folder-with-model-artifacts/).
  4. For Encryption, optionally choose to customize your encryption settings.
  5. For Service access role, choose to either create a new IAM role or provide your own.
  6. Choose Import model.

Importing the model will take several minutes depending on the model being imported (for example, the Distill-Llama-8B model could take 5–20 minutes to complete).

Watch this video demo for a step-by-step guide.

Test the imported model

After you import the model, you can test it by using the Amazon Bedrock Playground or directly through the Amazon Bedrock invocation APIs. To use the Playground, complete the following steps:

  1. On the Amazon Bedrock console, choose Chat / Text under Playgrounds in the navigation pane.
  2. From the model selector, choose your imported model name.
  3. Adjust the inference parameters as needed and write your test prompt. For example:
    <|begin▁of▁sentence|><|User|>Given the following financial data: - Company A's revenue grew from $10M to $15M in 2023 - Operating costs increased by 20% - Initial operating costs were $7M Calculate the company's operating margin for 2023. Please reason step by step, and put your final answer within \boxed{}<|Assistant|>

As we’re using an imported model in the playground, we must include the “beginning_of_sentence” and “user/assistant” tags to properly format the context for DeepSeek models; these tags help the model understand the structure of the conversation and provide more accurate responses. If you’re following the programmatic approach in the following notebook then this is being automatically taken care of by configuring the model.

  1. Review the model response and metrics provided.

Note: When you invoke the model for the first time, if you encounter a ModelNotReadyException error the SDK automatically retries the request with exponential backoff. The restoration time varies depending on the on-demand fleet size and model size. You can customize the retry behavior using the AWS SDK for Python (Boto3) Config object. For more information, see Handling ModelNotReadyException.

Once you are ready to import the model, use this step-by-step video demo to help you get started.

Pricing

Custom Model Import enables you to use your custom model weights within Amazon Bedrock for supported architectures, serving them alongside Amazon Bedrock hosted FMs in a fully managed way through On-Demand mode. Custom Model Import does not charge for model import, you are charged for inference based on two factors: the number of active model copies and their duration of activity.

Billing occurs in 5-minute windows, starting from the first successful invocation of each model copy. The pricing per model copy per minute varies based on factors including architecture, context length, region, and compute unit version, and is tiered by model copy size. The Custom Model Units required for hosting depends on the model’s architecture, parameter count, and context length, with examples ranging from 2 Units for a Llama 3.1 8B 128K model to 8 Units for a Llama 3.1 70B 128K model.

Amazon Bedrock automatically manages scaling, maintaining zero to three model copies by default (adjustable through Service Quotas) based on your usage patterns. If there are no invocations for 5 minutes, it scales to zero and scales up when needed, though this may involve cold-start latency of tens of seconds. Additional copies are added if inference volume consistently exceeds single-copy concurrency limits. The maximum throughput and concurrency per copy is determined during import, based on factors such as input/output token mix, hardware type, model size, architecture, and inference optimizations.

Consider the following pricing example: An application developer imports a customized Llama 3.1 type model that is 8B parameter in size with a 128K sequence length in us-east-1 region and deletes the model after 1 month. This requires 2 Custom Model Units. So, the price per minute will be $0.1570 and the model storage costs will be $3.90 for the month.

For more information, see Amazon Bedrock pricing.

Benchmarks

DeepSeek has published benchmarks comparing their distilled models against the original DeepSeek-R1 and base Llama models, available in the model repositories. The benchmarks show that depending on the task DeepSeek-R1-Distill-Llama-70B maintains between 80-90% of the original model’s reasoning capabilities, while the 8B version achieves between 59-92% performance with significantly reduced resource requirements. Both distilled versions demonstrate improvements over their corresponding base Llama models in specific reasoning tasks.

Other considerations

When deploying DeepSeek models in Amazon Bedrock, consider the following aspects:

  • Model versioning is essential. Because Custom Model Import creates unique models for each import, implement a clear versioning strategy in your model names to track different versions and variations.
  • The current supported model formats focus on Llama-based architectures. Although DeepSeek-R1 distilled versions offer excellent performance, the AI ecosystem continues evolving rapidly. Keep an eye on the Amazon Bedrock model catalog as new architectures and larger models become available through the platform.
  • Evaluate your use case requirements carefully. Although larger models like DeepSeek-R1-Distill-Llama-70B provide better performance, the 8B version might offer sufficient capability for many applications at a lower cost.
  • Consider implementing monitoring and observability. Amazon CloudWatch provides metrics for your imported models, helping you track usage patterns and performance. You can monitor costs with AWS Cost Explorer.
  • Start with a lower concurrency quota and scale up based on actual usage patterns. The default limit of three concurrent model copies per account is suitable for most initial deployments.

Conclusion

Amazon Bedrock Custom Model Import empowers organizations to use powerful publicly available models like DeepSeek-R1 distilled versions, among others, while benefiting from enterprise-grade infrastructure. The serverless nature of Amazon Bedrock eliminates the complexity of managing model deployments and operations, allowing teams to focus on building applications rather than infrastructure. With features like auto scaling, pay-per-use pricing, and seamless integration with AWS services, Amazon Bedrock provides a production-ready environment for AI workloads. The combination of DeepSeek’s innovative distillation approach and the Amazon Bedrock managed infrastructure offers an optimal balance of performance, cost, and operational efficiency. Organizations can start with smaller models and scale up as needed, while maintaining full control over their model deployments and benefiting from AWS security and compliance capabilities.

The ability to choose between proprietary and open FMs Amazon Bedrock gives organizations the flexibility to optimize for their specific needs. Open models enable cost-effective deployment with full control over the model artifacts, making them ideal for scenarios where customization, cost optimization, or model transparency are crucial. This flexibility, combined with the Amazon Bedrock unified API and enterprise-grade infrastructure, allows organizations to build resilient AI strategies that can adapt as their requirements evolve.

For more information, refer to the Amazon Bedrock User Guide.


About the Authors

Raj Pathak is a Principal Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Generative AI, Natural Language Processing, Intelligent Document Processing, and MLOps.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.

Morgan Rankey is a Solutions Architect based in New York City, specializing in Hedge Funds. He excels in assisting customers to build resilient workloads within the AWS ecosystem. Prior to joining AWS, Morgan led the Sales Engineering team at Riskified through its IPO. He began his career by focusing on AI/ML solutions for machine asset management, serving some of the largest automotive companies globally.

Harsh Patel is an AWS Solutions Architect supporting 200+ SMB customers across the United States to drive digital transformation through cloud-native solutions. As an AI&ML Specialist, he focuses on Generative AI, Computer Vision, Reinforcement Learning and Anomaly Detection. Outside the tech world, he recharges by hitting the golf course and embarking on scenic hikes with his dog.

Read More

Generative AI operating models in enterprise organizations with Amazon Bedrock

Generative AI operating models in enterprise organizations with Amazon Bedrock

Generative AI can revolutionize organizations by enabling the creation of innovative applications that offer enhanced customer and employee experiences. Intelligent document processing, translation and summarization, flexible and insightful responses for customer support agents, personalized marketing content, and image and code generation are a few use cases using generative AI that organizations are rolling out in production.

Large organizations often have many business units with multiple lines of business (LOBs), with a central governing entity, and typically use AWS Organizations with an Amazon Web Services (AWS) multi-account strategy. They implement landing zones to automate secure account creation and streamline management across accounts, including logging, monitoring, and auditing. Although LOBs operate their own accounts and workloads, a central team, such as the Cloud Center of Excellence (CCoE), manages identity, guardrails, and access policies

As generative AI adoption grows, organizations should establish a generative AI operating model. An operating model defines the organizational design, core processes, technologies, roles and responsibilities, governance structures, and financial models that drive a business’s operations.

In this post, we evaluate different generative AI operating model architectures that could be adopted.

Operating model patterns

Organizations can adopt different operating models for generative AI, depending on their priorities around agility, governance, and centralized control. Governance in the context of generative AI refers to the frameworks, policies, and processes that streamline the responsible development, deployment, and use of these technologies. It encompasses a range of measures aimed at mitigating risks, promoting accountability, and aligning generative AI systems with ethical principles and organizational objectives. Three common operating model patterns are decentralized, centralized, and federated, as shown in the following diagram.

common operating model patterns, decentralized, centralized, and federated

Decentralized model

In a decentralized approach, generative AI development and deployment are initiated and managed by the individual LOBs themselves. LOBs have autonomy over their AI workflows, models, and data within their respective AWS accounts.

This enables faster time-to-market and agility because LOBs can rapidly experiment and roll out generative AI solutions tailored to their needs. However, even in a decentralized model, often LOBs must align with central governance controls and obtain approvals from the CCoE team for production deployment, adhering to global enterprise standards for areas such as access policies, model risk management, data privacy, and compliance posture, which can introduce governance complexities.

Centralized model

In a centralized operating model, all generative AI activities go through a central generative artificial intelligence and machine learning (AI/ML) team that provisions and manages end-to-end AI workflows, models, and data across the enterprise.

LOBs interact with the central team for their AI needs, trading off agility and potentially increased time-to-market for stronger top-down governance. A centralized model may introduce bottlenecks that slow down time-to-market, so organizations need to adequately resource the team with sufficient personnel and automated processes to meet the demand from various LOBs efficiently. Failure to scale the team can negate the governance benefits of a centralized approach.

Federated model

A federated model strikes a balance by having key activities of the generative AI processes managed by a central generative AI/ML platform team.

While LOBs drive their AI use cases, the central team governs guardrails, model risk management, data privacy, and compliance posture. This enables agile LOB innovation while providing centralized oversight on governance areas.

Generative AI architecture components

Before diving deeper into the common operating model patterns, this section provides a brief overview of a few components and AWS services used in the featured architectures.

Large language models

Large language models (LLMs) are large-scale ML models that contain billions of parameters and are pre-trained on vast amounts of data. LLMs may hallucinate, which means a model can provide a confident but factually incorrect response. Furthermore, the data that the model was trained on might be out of date, which leads to providing inaccurate responses. One way to mitigate LLMs from giving incorrect information is by using a technique known as Retrieval Augmented Generation (RAG). RAG is an advanced natural language processing technique that combines knowledge retrieval with generative text models. RAG combines the powers of pre-trained language models with a retrieval-based approach to generate more informed and accurate responses. To set up RAG, you need to have a vector database to provide your model with related source documents. Using RAG, the relevant document segments or other texts are retrieved and shared with LLMs to generate targeted responses with enhanced content quality and relevance.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies, including AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon using a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

Amazon SageMaker JumpStart provides access to proprietary FMs from third-party providers such as AI21 Labs, Cohere, and LightOn. In addition, Amazon SageMaker JumpStart onboards and maintains open source FMs from third-party sources such as Hugging Face.

Data sources, embeddings, and vector store

Organizations’ domain-specific data, which provides context and relevance, typically resides in internal databases, data lakes, unstructured data repositories, or document stores, collectively referred to as organizational data sources or proprietary data stores.

A vector store is a system you can use to store and query vectors at scale, with efficient nearest neighbor query algorithms and appropriate indexes to improve data retrieval. It includes not only the embeddings of an organization’s data (mathematical representation of data in the form of vectors) but also raw text from the data in chunks. These vectors are generated by specialized embedding LLMs, which process the organization’s text chunks to create numerical representations (vectors), which are stored with the text chunks in the vector store. For a comprehensive read about vector store and embeddings, you can refer to The role of vector databases in generative AI applications.

With Amazon Bedrock Knowledge Bases, you securely connect FMs in Amazon Bedrock to your company data for RAG. Amazon Bedrock Knowledge Bases facilitates data ingestion from various supported data sources; manages data chunking, parsing, and embeddings; and populates the vector store with the embeddings. With all that provided as a service, you can think of Amazon Bedrock Knowledge Bases as a fully managed and serverless option to build powerful conversational AI systems using RAG.

Guardrails

Content filtering mechanisms are implemented as safeguards to control user-AI interactions, aligning with application requirements and responsible AI policies by minimizing undesirable and harmful content. Guardrails can check user inputs and FM outputs and filter or deny topics that are unsafe, redact personally identifiable information (PII), and enhance content safety and privacy in generative AI applications.

Amazon Bedrock Guardrails is a feature of Amazon Bedrock that you can use to put safeguards in place. You determine what qualifies based on your company policies. These safeguards are FM agnostic. You can create multiple guardrails with different configurations tailored to specific use cases. For a review on Amazon Bedrock Guardrails, you can refer to these blog posts: Guardrails for Amazon Bedrock helps implement safeguards customized to your use cases and responsible AI policies and Guardrails for Amazon Bedrock now available with new safety filters and privacy controls.

Operating model architectures

This section provides an overview of the three kinds of operating models.

Decentralized operating model

In a decentralized operating model, LOB teams maintain control and ownership of their AWS accounts. Each LOB configures and orchestrates generative AI components, common functionalities, applications, and Amazon Bedrock configurations within their respective AWS accounts. This model empowers LOBs to tailor their generative AI solutions according to their specific requirements, while taking advantage of the power of Amazon Bedrock.

With this model, the LOBs configure the core components, such as LLMs and guardrails, and the Amazon Bedrock service account manages the hosting, execution, and provisioning of interface endpoints. These endpoints enable LOBs to access and interact with the Amazon Bedrock services they’ve configured.

Each LOB performs monitoring and auditing of their configured Amazon Bedrock services within their account, using Amazon CloudWatch Logs and AWS CloudTrail for log capture, analysis, and auditing tailored to their needs. Amazon Bedrock cost and usage will be recorded in each LOB’s AWS accounts. By adopting this decentralized model, LOBs retain control over their generative AI solutions through a decentralized configuration, while benefiting from the scalability, reliability, and security of Amazon Bedrock.

The following diagram shows the architecture of the decentralized operating model.

decentralized model architecture

Centralized operating model

The centralized AWS account serves as the primary hub for configuring and managing the core generative AI functionalities, including reusable agents, prompt flows, and shared libraries. LOB teams contribute their business-specific requirements and use cases to the centralized team, which then integrates and orchestrates the appropriate generative AI components within the centralized account.

Although the orchestration and configuration of generative AI solutions reside in the centralized account, they often require interaction with LOB-specific resources and services. To facilitate this, the centralized account uses API gateways or other integration points provided by the LOBs’ AWS accounts. These integration points enable secure and controlled communication between the centralized generative AI orchestration and the LOBs’ business-specific applications, data sources, or services. This centralized operating model promotes consistency, governance, and scalability of generative AI solutions across the organization.

The centralized team maintains adherence to common standards, best practices, and organizational policies, while also enabling efficient sharing and reuse of generative AI components. Furthermore, the core components of Amazon Bedrock, such as LLMs and guardrails, continue to be hosted and executed by AWS in the Amazon Bedrock service account, promoting secure, scalable, and high-performance execution environments for these critical components. In this centralized model, monitoring and auditing of Amazon Bedrock can be achieved within the centralized account, allowing for comprehensive monitoring, auditing, and analysis of all generative AI activities and configurations. Amazon CloudWatch Logs provides a unified view of generative AI operations across the organization.

By consolidating the orchestration and configuration of generative AI solutions in a centralized account while enabling secure integration with LOB-specific resources, this operating model promotes standardization, governance, and centralized control over generative AI operations. It uses the scalability, reliability, security, and centralized monitoring capabilities of AWS managed infrastructure and services, while still allowing for integration with LOB-specific requirements and use cases.

The following is the architecture for a centralized operating model.

centralized model architecture

Federated operating model

In a federated model, Amazon Bedrock enables a collaborative approach where LOB teams can develop and contribute common generative AI functionalities within their respective AWS accounts. These common functionalities, such as reusable agents, prompt flows, or shared libraries, can then be migrated to a centralized AWS account managed by a dedicated team or CCoE.

The centralized AWS account acts as a hub for integrating and orchestrating these common generative AI components, providing a unified platform for action groups and prompt flows. Although the orchestration and configuration of generative AI solutions remain within the LOBs’ AWS accounts, they can use the centralized Amazon Bedrock agents, prompt flows, and other shared components defined in the centralized account.

This federated model allows LOBs to retain control over their generative AI solutions, tailoring them to specific business requirements while benefiting from the reusable and centrally managed components. The centralized account maintains consistency, governance, and scalability of these shared generative AI components, promoting collaboration and standardization across the organization.

Organizations frequently prefer storing sensitive data, including Payment Card Industry (PCI), PII, General Data Protection Regulation (GDPR), and Health Insurance Portability and Accountability Act (HIPAA) information, within their respective LOB AWS accounts. This approach makes sure that LOBs maintain control over their sensitive business data in the vector store while preventing centralized teams from accessing it without proper governance and security measures.

A federated model combines decentralized development, centralized integration, and centralized monitoring. This operating model fosters collaboration, reusability, and standardization while empowering LOBs to retain control over their generative AI solutions. It uses the scalability, reliability, security, and centralized monitoring capabilities of AWS managed infrastructure and services, promoting a harmonious balance between autonomy and governance.

The following is the architecture for a federated operating model.

federated model architecture

Cost management

Organizations may want to analyze Amazon Bedrock usage and costs per LOB. To track the cost and usage of FMs across LOBs’ AWS accounts, solutions that record model invocations per LOB can be implemented.

Amazon Bedrock now supports model invocation resources that use inference profiles. Inference profiles can be defined to track Amazon Bedrock usage metrics, monitor model invocation requests, or route model invocation requests to multiple AWS Regions for increased throughput.

There are two types of inference profiles. Cross-Region inference profiles, which are predefined in Amazon Bedrock and include multiple AWS Regions to which requests for a model can be routed. The other is application inference profiles, which are user created to track cost and model usage when submitting on-demand model invocation requests. You can attach custom tags, such as cost allocation tags, to your application inference profiles. When submitting a prompt, you can include an inference profile ID or its Amazon Resource Name (ARN). This capability enables organizations to track and monitor costs for various LOBs, cost centers, or applications. For a detailed explanation of application inference profiles refer to this post: Track, allocate, and manage your generative AI cost and usage with Amazon Bedrock.

Conclusion

Although enterprises often begin with a centralized operating model, the rapid pace of development in generative AI technologies, the need for agility, and the desire to quickly capture value often lead organizations to converge on a federated operating model.

In a federated operating model, lines of business have the freedom to innovate and experiment with generative AI solutions, taking advantage of their domain expertise and proximity to business problems. Key aspects of the AI workflow, such as data access policies, model risk management, and compliance monitoring, are managed by a central cloud governance team. Successful generative AI solutions developed by a line of business can be promoted and productionized by the central team for enterprise-wide re-use.

This federated model fosters innovation from the lines of business closest to domain problems. Simultaneously, it allows the central team to curate, harden, and scale those solutions adherent to organizational policies, then redeploy them efficiently to other relevant areas of the business.

To sustain this operating model, enterprises often establish a dedicated product team with a business owner that works in partnership with lines of business. This team is responsible for continually evolving the operating model, refactoring and enhancing the generative AI services to help meet the changing needs of the lines of business and keep up with the rapid advancements in LLMs and other generative AI technologies.

Federated operating models strike a balance, mitigating the risks of fully decentralized initiatives while minimizing bottlenecks from overly centralized approaches. By empowering business agility with curation by a central team, enterprises can accelerate compliant, high-quality generative AI capabilities aligned with their innovation goals, risk tolerances, and need for rapid value delivery in the evolving AI landscape.

As enterprises look to capitalize on the generative AI revolution, Amazon Bedrock provides the ideal foundation to establish a flexible operating model tailored to their organization’s needs. Whether you’re starting with a centralized, decentralized, or federated approach, AWS offers a comprehensive suite of services to support the full generative AI lifecycle.

Try Amazon Bedrock and let us know your feedback on how you’re planning to implement the operating model that suits your organization.


About the Authors

Martin Tunstall is a Principal Solutions Architect at AWS. With over three decades of experience in the finance sector, he helps global finance and insurance customers unlock the full potential of Amazon Web Services (AWS).

Yashar Araghi is a Senior Solutions Architect at AWS. He has over 20 years of experience designing and building infrastructure and application security solutions. He has worked with customers across various industries such as government, education, finance, energy, and utilities. In the last 6 years at AWS, Yashar has helped customers design, build, and operate their cloud solutions that are secure, reliable, performant and cost optimized in the AWS Cloud.

Read More