Unlocking the power of Model Context Protocol (MCP) on AWS

We’ve witnessed remarkable advances in model capabilities as generative AI companies have invested in developing their offerings. Language models such as Anthropic’s Claude Opus 4 and Claude Sonnet 4 and Amazon Nova, available through Amazon Bedrock, can reason, write, and generate responses with increasing sophistication. But even as these models grow more powerful, they can only work with the information available to them.

No matter how impressive a model might be, it’s confined to the data it was trained on or what’s manually provided in its context window. It’s like having the world’s best analyst locked in a room with incomplete files—brilliant, but isolated from your organization’s most current and relevant information.

This isolation creates three critical challenges for enterprises using generative AI:

  1. Information silos trap valuable data behind custom APIs and proprietary interfaces
  2. Integration complexity requires building and maintaining bespoke connectors and glue code for every data source or tool provided to the language model
  3. Scalability bottlenecks appear as organizations attempt to connect more models to more systems and tools

Sound familiar? If you’re an AI-focused developer, technical decision-maker, or solution architect working with Amazon Web Services (AWS) and language models, you’ve likely encountered these obstacles firsthand. Let’s explore how the Model Context Protocol (MCP) offers a path forward.

What is the MCP?

The MCP is an open standard that creates a universal language for AI systems to communicate with external data sources, tools, and services. Conceptually, MCP functions as a universal translator, enabling seamless dialogue between language models and the diverse systems where your valuable information resides.

Developed by Anthropic and released as an open source project, MCP addresses a fundamental challenge: how to provide AI models with consistent, secure access to the information they need, when they need it, regardless of where that information lives.

MCP deployment diagram showing client interaction with local and internet-based MCP servers

At its core, MCP implements a client-server architecture:

  • MCP clients are AI applications like Anthropic’s Claude Desktop or custom solutions built on Amazon Bedrock that need access to external data
  • MCP servers provide standardized access to specific data sources, whether that’s a GitHub repository, Slack workspace, or AWS service
  • Communication flow between clients and servers follows a well-defined protocol that can run locally or remotely

This architecture supports three essential primitives that form the foundation of MCP:

  1. Tools – Functions that models can call to retrieve information or perform actions
  2. Resources – Data that can be included in the model’s context such as database records, images, or file contents
  3. Prompts – Templates that guide how models interact with specific tools or resources

What makes MCP especially powerful is its ability to work across both local and remote implementations. You can run MCP servers directly on your development machine for testing or deploy them as distributed services across your AWS infrastructure for enterprise-scale applications.
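To make these primitives concrete, the following is a minimal sketch of a local MCP server written with the Python MCP SDK’s FastMCP class (the same style used by the knowledge base server later in this post). The tool, resource, and prompt bodies are placeholder examples rather than part of any AWS implementation, so treat this as an illustration of the shape of an MCP server, not production code.

from mcp.server.fastmcp import FastMCP

# A hypothetical local MCP server exposing one of each primitive
mcp = FastMCP('demo-server')

@mcp.tool(name='get_open_tickets')
def get_open_tickets(team: str) -> str:
    """Tool: a function the model can call (placeholder logic)."""
    return f'No open tickets found for team {team}'

@mcp.resource(uri='resource://teams', name='Teams', mime_type='application/json')
def teams_resource() -> str:
    """Resource: data the model can pull into its context."""
    return '["platform", "payments", "mobile"]'

@mcp.prompt()
def triage_prompt(ticket_id: str) -> str:
    """Prompt: a template that guides how the model uses the tool above."""
    return f'Summarize ticket {ticket_id}, then call get_open_tickets for the owning team.'

if __name__ == '__main__':
    mcp.run()  # Defaults to the stdio transport for local development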

Solving the M×N integration problem

Before diving deeper into the AWS specific implementation details, it’s worth understanding the fundamental integration challenge MCP solves.

Imagine you’re building AI applications that need to access multiple data sources in your organization. Without a standardized protocol, you face what we call the “M×N problem”: for M different AI applications connecting to N different data sources, you need to build and maintain M×N custom integrations.

This creates an integration matrix that quickly becomes unmanageable as your organization adds more AI applications and data sources. Each new system requires multiple custom integrations, with development teams duplicating efforts across projects. MCP transforms this M×N problem into a simpler M+N equation: you build M clients and N servers, for a total of only M+N implementations. For example, three AI applications and three data sources require nine custom integrations without MCP, but only six implementations with it. This reduction is shown in the following diagram.

Visualization showing how MCP reduces integration complexity from 9 to 6 implementations

This approach draws inspiration from other successful protocols that solved similar challenges:

  • APIs standardized how web applications interact with the backend
  • Language Server Protocol (LSP) standardizes how integrated development environments (IDEs) interact with language-specific tools for coding

In the same way that these protocols revolutionized their domains, MCP is poised to transform how AI applications interact with the diverse landscape of data sources in modern enterprises.

Why MCP matters for AWS users

For AWS customers, MCP represents a particularly compelling opportunity. AWS offers hundreds of services, each with its own APIs and data formats. By adopting MCP as a standardized protocol for AI interactions, you can:

  1. Streamline integration between Amazon Bedrock language models and AWS data services
  2. Use existing AWS security mechanisms such as AWS Identity and Access Management (IAM) for consistent access control
  3. Build composable, scalable AI solutions that align with AWS architectural best practices

MCP and the AWS service landscape

What makes MCP particularly powerful in the AWS context is how it can interface with the broader AWS service landscape, giving AI applications a consistent way to reach the data spread across your AWS services and accounts.

MCP servers act as consistent interfaces to these diverse data sources, providing language models with a unified access pattern regardless of the underlying AWS service architecture. This alleviates the need for custom integration code for each service and enables AI systems to work with your AWS resources in a way that respects your existing security boundaries and access controls.

In the remaining sections of this post, we explore how MCP works with AWS services, examine specific implementation examples, and provide guidance for technical decision-makers considering adopting MCP in their organizations.

How MCP works with AWS services, particularly Amazon Bedrock

Now that we’ve shown the fundamental value proposition of MCP, we dive into how it integrates with AWS services, with a special focus on Amazon Bedrock. This integration creates a powerful foundation for building context-aware AI applications that can securely access your organization’s data and tools.

Amazon Bedrock and language models

Amazon Bedrock represents the strategic commitment by AWS to make foundation models (FMs) accessible, secure, and enterprise-ready. It’s a fully managed service that provides a unified API across multiple leading language models, including:

  • Anthropic’s Claude
  • Meta’s Llama
  • Amazon Titan and Amazon Nova

What makes Amazon Bedrock particularly compelling for enterprise deployments is its integration with the broader AWS landscape. You can run FMs with the same security, compliance, and operational tools you already use for your AWS workloads, including IAM for access control and Amazon CloudWatch for monitoring.

At the heart of the versatility of Amazon Bedrock is the Converse API—the interface that enables multiturn conversations with language models. The Converse API includes built-in support for what AWS calls “tool use,” allowing models to:

  1. Recognize when they need information outside their training data
  2. Request that information from external systems using well-defined function calls
  3. Incorporate the returned data into their responses

This tool use capability in the Amazon Bedrock Converse API dovetails perfectly with MCP’s design, creating a natural integration point.

MCP and Amazon Bedrock integration architecture

Integrating MCP with Amazon Bedrock involves creating a bridge between the model’s ability to request information (through the Converse API) and MCP’s standardized protocol for accessing external systems.
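One way to build that bridge is to ask the MCP server for its tool definitions and reshape them into the toolSpec entries the Converse API expects. The following sketch assumes an already initialized MCP ClientSession named session; the field mapping is our interpretation of the two specifications rather than an official utility.

# Sketch: translate MCP tool definitions into Converse API tool specifications
async def build_available_tools(session):
    mcp_tools = await session.list_tools()  # Ask the MCP server what it offers
    available_tools = []
    for tool in mcp_tools.tools:
        available_tools.append({
            'toolSpec': {
                'name': tool.name,
                'description': tool.description or tool.name,
                # MCP publishes a JSON Schema for tool inputs; Converse expects it under inputSchema.json
                'inputSchema': {'json': tool.inputSchema},
            }
        })
    return available_tools

The resulting list is what the walkthrough below passes to the Converse API as available_tools.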

Integration flow walkthrough

To help you understand how MCP and Amazon Bedrock work together in practice, we walk through a typical interaction flow, step-by-step:

  1. The user initiates a query through your application interface:

"What were our Q1 sales figures for the Northwest region?"

  2. Your application forwards the query to Amazon Bedrock through the Converse API:
   import boto3

   # Initialize the Bedrock runtime client with your AWS credentials
   bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')
   
   # Define the query from the user
   user_query = "What were our Q1 sales figures for the Northwest region?"
   
   # available_tools contains tool definitions that match MCP server capabilities
   # These will be exposed to the model through the Converse API
   
   # Call the Converse API with the user's query and available tools
   response = bedrock.converse(
       modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",  # Specify which language model to use
       messages=[{"role": "user", "content": [{"text": user_query}]}],  # Format the user's message
       toolConfig={"tools": available_tools}  # Pass the tool definitions to the model
   )
  3. Amazon Bedrock processes the query and determines that it needs financial data that isn’t in its training data
  4. Amazon Bedrock returns a toolUse message, requesting access to a specific tool:
   {
     "role": "assistant",  // Indicates this message is from the model
     "content": [{
       "toolUse": {  // The model is requesting to use a tool
         "toolUseId": "tu_01234567",  // Unique identifier for this tool use request
         "name": "query_sales_data",  // Name of the tool the model wants to use
         "input": {  // Parameters for the tool call
           "quarter": "Q1",  // The model extracted this parameter from the user query
           "region": "Northwest"  // Another parameter extracted from the user query
         }
       }
     }]
   }
  5. Your MCP client application receives this toolUse message and translates it into an MCP protocol tool call
  6. The MCP client routes the request to the appropriate MCP server (in this case, a server connected to your financial database)
  7. The MCP server executes the tool, retrieving the requested data from your systems:
   # Call the tool through the MCP protocol
   # session is the MCP client session established earlier
   result = await session.call_tool(
       "query_sales_data",  # The tool name from the toolUse message
       {
           "quarter": "Q1",  # Pass through the parameters from the toolUse message
           "region": "Northwest"
       }
   )
   # The MCP server handles authentication, data access, and result formatting
   # This abstracts away the complexity of accessing different data sources
  8. The tool results are returned through the MCP protocol to your client application
  9. Your application sends the results back to Amazon Bedrock as a toolResult message:
   {
     "role": "user",  // This is sent as if from the user, but contains tool results
     "content": [{
       "toolResult": {  // Indicates this is a result from a tool
         "toolUseId": "tu_01234567",  // Must match the ID from the original toolUse
         "content": [{
           "json": {  // Results are formatted as JSON
             "total_sales": 12450000,  // Numerical data accessible to the model
             "growth": 0.12,  // Percentage growth for analysis
             "top_products": ["Product A", "Product B", "Product C"]  // List data
           }
         }]
       }
     }]
   }
  10. Amazon Bedrock generates a final response incorporating the tool results:
“Based on the data I've retrieved, our Q1 sales figures for the Northwest region were $12.45 million, 
representing a 12% growth compared to the previous quarter. 
The top-performing products were Product A, Product B, and Product C.”
  11. Your application returns the final response to the user
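
To show how steps 9 and 10 come together in code, the following is a minimal sketch of the second Converse call that replays the conversation history along with the toolResult message. It reuses the bedrock client, user_query, and available_tools variables from the earlier snippet and assumes tool_result_json holds the data returned by the MCP server.

# Sketch: return the tool result to the model and read its final answer
follow_up = bedrock.converse(
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[
        {"role": "user", "content": [{"text": user_query}]},
        response["output"]["message"],  # The assistant turn that requested the tool
        {
            "role": "user",
            "content": [{
                "toolResult": {
                    "toolUseId": "tu_01234567",  # Must match the original toolUse request
                    "content": [{"json": tool_result_json}]
                }
            }]
        },
    ],
    toolConfig={"tools": available_tools},
)
final_text = follow_up["output"]["message"]["content"][0]["text"]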

This entire process, illustrated in the following diagram, happens in seconds, giving users the impression of a seamless conversation with an AI that has direct access to their organization’s data. Behind the scenes, MCP is handling the complex work of securely routing requests to the right tools and data sources.

Streamlined sequence diagram showing core MCP message flow from user query to final response

In the next section, we explore a practical implementation example that shows how to connect an MCP server to Amazon Bedrock Knowledge Bases, providing a blueprint for your own implementations.

Practical implementation example: Amazon Bedrock Knowledge Bases integration

As you might recall from our earlier discussion of strategic use cases, enterprise knowledge bases represent one of the most valuable applications of MCP on AWS. Now, we explore a concrete implementation of MCP that connects language models to Amazon Bedrock Knowledge Bases. The code for the MCP server is available in the AWS Labs MCP repository on GitHub, and the client code is in the same repository’s samples directory. This example brings to life the “universal translator” concept we introduced earlier, demonstrating how MCP can transform the way AI systems interact with enterprise knowledge repositories.

Understanding the challenge

Enterprise knowledge bases contain vast repositories of information—from documentation and policies to technical guides and product specifications. Traditional search approaches are often inadequate when users ask natural language questions, failing to understand context or identify the most relevant content.

Amazon Bedrock Knowledge Bases provide vector search capabilities that improve upon traditional keyword search, but even this approach has limitations:

  1. Manual filter configuration requires predefined knowledge of metadata structures
  2. Query-result mismatch occurs when users don’t use the exact terminology in the knowledge base
  3. Relevance challenges arise when similar documents compete for attention
  4. Context switching between searching and reasoning disrupts user experience

The MCP server we explore addresses these challenges by creating an intelligent layer between language models and knowledge bases.

Architecture overview

At a high level, our MCP server for Amazon Bedrock Knowledge Bases follows a clean, well-organized architecture that builds upon the client-server pattern we outlined previously. The server exposes two key interfaces to language models:

  1. A knowledge bases resource that provides discovery capabilities for available knowledge bases
  2. A query tool that enables dynamic searching across these knowledge bases

Detailed MCP Bedrock architecture with intelligent query processing workflow and AWS service connections

Remember the M×N integration problem we discussed earlier? This implementation provides a tangible example of how MCP solves it – creating a standardized interface between a large language model and your Amazon Bedrock Knowledge Base repositories.

Knowledge base discovery resource

The server begins with a resource that enables language models to discover available knowledge bases:

@mcp.resource(uri='resource://knowledgebases', name='KnowledgeBases', mime_type='application/json')
async def knowledgebases_resource() -> str:
    """List all available Amazon Bedrock Knowledge Bases and their data sources.
 
    This resource returns a mapping of knowledge base IDs to their details, including:
    - name: The human-readable name of the knowledge base
    - data_sources: A list of data sources within the knowledge base, each with:
      - id: The unique identifier of the data source
      - name: The human-readable name of the data source
 
    ## Example response structure:
    ```json
    {
        "kb-12345": {
            "name": "Customer Support KB",
            "data_sources": [
                {"id": "ds-abc123", "name": "Technical Documentation"},
                {"id": "ds-def456", "name": "FAQs"}
            ]
        },
        "kb-67890": {
            "name": "Product Information KB",
            "data_sources": [
                {"id": "ds-ghi789", "name": "Product Specifications"}
            ]
        }
    }
    ```
 
    ## How to use this information:
    1. Extract the knowledge base IDs (like "kb-12345") for use with the QueryKnowledgeBases tool
    2. Note the data source IDs if you want to filter queries to specific data sources
    3. Use the names to determine which knowledge base and data source(s) are most relevant to the user's query
    """
    return json.dumps(await discover_knowledge_bases(kb_agent_mgmt_client, kb_inclusion_tag_key)) 

This resource serves as both documentation and a discovery mechanism that language models can use to identify available knowledge bases before querying them.
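
On the client side, an application can read this resource before deciding which knowledge base to query. The following is a minimal sketch, assuming an established MCP ClientSession named session (the exact URI type accepted by read_resource can vary by SDK version):

import json
from pydantic import AnyUrl

# Discover the available knowledge bases and their data sources
resource = await session.read_resource(AnyUrl('resource://knowledgebases'))
knowledge_bases = json.loads(resource.contents[0].text)
for kb_id, details in knowledge_bases.items():
    print(kb_id, details['name'], [ds['name'] for ds in details['data_sources']])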

Querying knowledge bases with the MCP tool

The core functionality of this MCP server resides in its QueryKnowledgeBases tool:

@mcp.tool(name='QueryKnowledgeBases')
async def query_knowledge_bases_tool(
    query: str = Field(
        ..., description='A natural language query to search the knowledge base with'
    ),
    knowledge_base_id: str = Field(
        ...,
        description='The knowledge base ID to query. It must be a valid ID from the resource://knowledgebases MCP resource',
    ),
    number_of_results: int = Field(
        10,
        description='The number of results to return. Use smaller values for focused results and larger values for broader coverage.',
    ),
    reranking: bool = Field(
        kb_reranking_enabled,
        description='Whether to rerank the results. Useful for improving relevance and sorting. Can be globally configured with BEDROCK_KB_RERANKING_ENABLED environment variable.',
    ),
    reranking_model_name: Literal['COHERE', 'AMAZON'] = Field(
        'AMAZON',
        description="The name of the reranking model to use. Options: 'COHERE', 'AMAZON'",
    ),
    data_source_ids: Optional[List[str]] = Field(
        None,
        description='The data source IDs to filter the knowledge base by. It must be a list of valid data source IDs from the resource://knowledgebases MCP resource',
    ),
) -> str:
    """Query an Amazon Bedrock Knowledge Base using natural language.
 
    ## Usage Requirements
    - You MUST first use the `resource://knowledgebases` resource to get valid knowledge base IDs
    - You can query different knowledge bases or make multiple queries to the same knowledge base
 
    ## Query Tips
    - Use clear, specific natural language queries for best results
    - You can use this tool MULTIPLE TIMES with different queries to gather comprehensive information
    - Break complex questions into multiple focused queries
    - Consider querying for factual information and explanations separately
     """
    # Additional implementation details …

What makes this tool powerful is its flexibility in querying knowledge bases with natural language. It supports several key features:

  1. Configurable result sizes – Adjust the number of results based on whether you need focused or comprehensive information
  2. Optional reranking – Improve relevance using language models (such as reranking models from Amazon or Cohere)
  3. Data source filtering – Target specific sections of the knowledge base when needed

Reranking is disabled by default in this implementation but can be quickly enabled through environment variables or direct parameter configuration.
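
From a client’s perspective, invoking the tool looks like any other MCP tool call. The following hedged sketch reuses the session from the earlier snippets and a knowledge base ID discovered through the resource above:

# Query a knowledge base through the MCP server, opting in to reranking
result = await session.call_tool(
    'QueryKnowledgeBases',
    {
        'query': 'What is our quarterly IT security audit procedure?',
        'knowledge_base_id': 'kb-12345abcde',  # Discovered via resource://knowledgebases
        'number_of_results': 5,                # Fewer, more focused passages
        'reranking': True,                     # Override the default, which is disabled
        'reranking_model_name': 'AMAZON',
    },
)
print(result.content[0].text)  # Formatted results, including sources and relevance scores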

Enhanced relevance with reranking

A notable feature of this implementation is the ability to rerank search results using language models available through Amazon Bedrock. This capability allows the system to rescore search results based on deeper semantic understanding:

import os

# Parse reranking enabled environment variable
kb_reranking_enabled_raw = os.getenv('BEDROCK_KB_RERANKING_ENABLED')
kb_reranking_enabled = False  # Default value is now False (off)
if kb_reranking_enabled_raw is not None:
    kb_reranking_enabled_raw = kb_reranking_enabled_raw.strip().lower()
    if kb_reranking_enabled_raw in ('true', '1', 'yes', 'on'):
        kb_reranking_enabled = True

Reranking is particularly valuable for queries where semantic similarity might not be enough to determine the
most relevant content. For example, when answering a specific question, the most relevant document isn’t necessarily
the one with the most keyword matches, but the one that directly addresses the question being asked.

Full interaction flow

This section walks through a complete interaction flow to show how all these components work
together:

  1. The user asks a question to a language model such as Anthropic’s Claude through an application:
   "What's our quarterly IT security audit procedure?"
  2. The language model recognizes it needs to access the knowledge base and calls the MCP tool:
{
  "toolUse": {
    "name": "QueryKnowledgeBases",
    "input": {
      "query": "What's our quarterly IT security audit procedure?",
      "knowledge_base_id": "kb-12345abcde",
      "reranking": true
    }
  }
} 
  3. The MCP server processes the request by querying the knowledge base with the specified parameters
  4. The MCP server returns formatted results to the language model, including content, location, and relevance scores:
Applied metadata filters: department=IT, document_type=Procedure, category=Security Audit, frequency=Quarterly
 
   Results have been reranked to improve relevance.
 
   Result 1 [Score: 0.92]:
   Source: S3 - s3://company-docs/procedures/it/security/quarterly-audit-procedure-v2.pdf
   Content: The IT Security Quarterly Audit Procedure outlines the steps for conducting systematic evaluations of the organization's information security controls. This document provides guidance on scope, methodology, reporting requirements, and remediation tracking...
   Metadata:
     • department: IT
     • document_type: Procedure
     • category: Security Audit
     • frequency: Quarterly
     • last_updated: 2024-11-15
  5. The language model incorporates these results into its response to the user:
   Based on our IT Security Quarterly Audit Procedure, the process involves four main phases:
 
   1. Planning: Define scope, objectives, and schedule for the audit
   2. Execution: Conduct the technical assessments and control evaluations
   3. Reporting: Document findings, risks, and recommendations
   4. Remediation: Track and verify resolution of identified issues
 
   The procedure was last updated on November 15, 2024, and specifies that the Cybersecurity team leads the effort with support from IT Operations.

This interaction, illustrated in the following diagram, demonstrates the seamless fusion of language model capabilities with enterprise knowledge, enabled by the MCP protocol. The user doesn’t need to specify complex search parameters or know the structure of the knowledge base—the integration layer handles these details automatically.

End-to-end sequence diagram of IT security audit query processing through Bedrock Claude and MCP Server with metadata analysis

Looking ahead: The MCP journey continues

As we’ve explored throughout this post, the Model Context Protocol provides a powerful framework for connecting language models to your enterprise data and tools on AWS. But this is just the beginning of the journey.

The MCP landscape is rapidly evolving, with new capabilities and implementations emerging regularly. In future posts in this series, we’ll dive deeper into advanced MCP architectures and use cases, with a particular focus on remote MCP implementation.

The introduction of the new Streamable HTTP transport layer represents a significant advancement for MCP, enabling truly enterprise-scale deployments with features such as:

  • Stateless server options for simplified scaling
  • Session ID management for request routing
  • Robust authentication and authorization mechanisms for secure access control
  • Horizontal scaling across server nodes
  • Enhanced resilience and fault tolerance

These capabilities will be essential as organizations move from proof-of-concept implementations to production-grade MCP deployments that serve multiple teams and use cases.
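
As a preview of what this looks like in the Python MCP SDK, a FastMCP server can be switched from the default stdio transport to Streamable HTTP with a transport argument. The stateless option shown here is an assumption based on the SDK’s configuration surface, so verify it against the version you deploy.

from mcp.server.fastmcp import FastMCP

# stateless_http=True lets any server node handle any request, which simplifies
# horizontal scaling behind a load balancer (assumed to be available in recent SDK versions)
mcp = FastMCP('kb-server', stateless_http=True)

# ... register tools and resources as shown earlier ...

if __name__ == '__main__':
    mcp.run(transport='streamable-http')  # Serve MCP over HTTP instead of stdio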

We invite you to follow this blog post series as we continue to explore how MCP and AWS services can work together to create more powerful, context-aware AI applications for your organization.

Conclusion

As language models continue to transform how we interact with technology, the ability to connect these models to enterprise data and systems becomes increasingly critical. The Model Context Protocol (MCP) offers a standardized, secure, and scalable approach to integration.

Through MCP, AWS customers can:

  • Establish a standardized protocol for AI-data connections
  • Reduce development overhead and maintenance costs
  • Enforce consistent security and governance policies
  • Create more powerful, context-aware AI experiences

The Amazon Bedrock Knowledge Bases implementation we explored demonstrates how MCP can transform simple retrieval into intelligent discovery, adding value far beyond what either component could deliver independently.

Getting started

Ready to begin your MCP journey on AWS? Here are some resources to help you get started:

Learning resources:

Implementation steps:

  1. Identify a high-value use case where AI needs access to enterprise data
  2. Select the appropriate MCP servers for your data sources
  3. Set up a development environment with local MCP implementations
  4. Integrate with Amazon Bedrock using the patterns described in this post
  5. Deploy to production with appropriate security and scaling considerations

Remember that MCP offers a “start small, scale incrementally” approach. You can begin with a single server connecting to one data source, then expand your implementation as you validate the value and establish patterns for your organization.

We encourage you to try the MCP with AWS services today. Start with a simple implementation, perhaps connecting a language model to your documentation or code repositories, and experience firsthand the power of context-aware AI.

Share your experiences, challenges, and successes with the community. The open source nature of MCP means that your contributions—whether code, use cases, or feedback—can help shape the future of this important protocol.

In a world where AI capabilities are advancing rapidly, the difference between good and great implementations often comes down to context. With MCP and AWS, you have the tools to make sure your AI systems have the right context at the right time, unlocking their full potential for your organization.

This blog post is part of a series exploring the Model Context Protocol (MCP) on AWS. In our next installment, we’ll explore the world of agentic AI, demonstrating how to build autonomous agents using the open-source Strands Agents SDK with MCP to create intelligent systems that can reason, plan, and execute complex multi-step workflows. We’ll also cover advanced implementation patterns, remote MCP architectures, and additional use cases for MCP.


About the authors

Aditya Addepalli is a Delivery Consultant at AWS, where he works to lead, architect, and build applications directly with customers. With a strong passion for Applied AI, he builds bespoke solutions and contributes to the ecosystem while consistently keeping himself at the edge of technology. Outside of work, you can find him meeting new people, working out, playing video games and basketball, or feeding his curiosity through personal projects.

Elie Schoppik leads live education at Anthropic as their Head of Technical Training. He has spent over a decade in technical education, working with multiple coding schools and starting one of his own. With a background in consulting, education, and software engineering, Elie brings a practical approach to teaching Software Engineering and AI. He’s shared his insights at a variety of technical conferences as well as universities including MIT, Columbia, Wharton, and UC Berkeley.

Jawhny Cooke is a Senior Anthropic Specialist Solutions Architect for Generative AI at AWS. He specializes in integrating and deploying Anthropic models on AWS infrastructure. He partners with customers and AI providers to implement production-grade generative AI solutions through Amazon Bedrock, offering expert guidance on architecture design and system implementation to maximize the potential of these advanced models.

Kenton Blacutt is an AI Consultant within the GenAI Innovation Center. He works hands-on with customers helping them solve real-world business problems with cutting edge AWS technologies, especially Amazon Q and Bedrock. In his free time, he likes to travel, experiment with new AI techniques, and run an occasional marathon.

Mani Khanuja is a Principal Generative AI Specialist Solutions Architect, author of the book Applied Machine Learning and High-Performance Computing on AWS, and a member of the Board of Directors for the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Nicolai van der Smagt is a Senior Specialist Solutions Architect for Generative AI at AWS, focusing on third-party model integration and deployment. He collaborates with AWS’ biggest AI partners to bring their models to Amazon Bedrock, while helping customers architect and implement production-ready generative AI solutions with these models.


Build a scalable AI assistant to help refugees using AWS

This post is co-written with Taras Tsarenko, Vitalii Bozadzhy, and Vladyslav Horbatenko.

As organizations worldwide seek to use AI for social impact, the Danish humanitarian organization Bevar Ukraine has developed a comprehensive virtual generative AI-powered assistant called Victor, aimed at addressing the pressing needs of Ukrainian refugees integrating into Danish society. This post details our technical implementation using AWS services to create a scalable, multilingual AI assistant system that provides automated assistance while maintaining data security and GDPR compliance.

Bevar Ukraine was established in 2014 and has been at the forefront of supporting Ukrainian refugees in Denmark since the full-scale war in 2022, providing assistance to over 30,000 Ukrainians with housing, job search, and integration services. The organization has also delivered more than 200 tons of humanitarian aid to Ukraine, including medical supplies, generators, and essential items for civilians affected by the war.

Background and challenges

The integration of refugees into host countries presents multiple challenges, particularly in accessing public services and navigating complex legal procedures. Traditional support systems, relying heavily on human social workers, often face scalability limitations and language barriers. Bevar Ukraine’s solution addresses these challenges through an AI-powered system that operates continuously while maintaining high standards of service quality.

Solution overview

The solution’s backbone comprises several AWS services to deliver a reliable, secure, and efficient generative AI-powered digital assistant for Ukrainian refugees. A team of three volunteer software developers built the solution within weeks.

The following diagram illustrates the solution architecture.

Amazon Elastic Compute Cloud (Amazon EC2) serves as the primary compute layer, using Spot Instances to optimize costs. Amazon Simple Storage Service (Amazon S3) provides secure storage for conversation logs and supporting documents, and Amazon Bedrock powers the core natural language processing capabilities. Bevar Ukraine uses Amazon DynamoDB for real-time data access and session management, providing low-latency responses even under high load.

During implementation, we found that Anthropic’s Claude 3.5 large language model (LLM) was best suited for Victor because of its advanced dialogue logic and ability to maintain a human-like tone. It excels at thorough, reasoned responses and more creative content, which makes Victor’s replies more natural and engaging.

Amazon Titan Embeddings G1 – Text v1.2 excels at producing high-quality vector representations of multilingual text, enabling efficient semantic search and similarity comparisons. This is particularly valuable when Victor needs to retrieve relevant information from a large knowledge base or match users’ queries to previously seen inputs. Amazon Titan Embeddings also integrates smoothly with AWS, simplifying tasks like indexing, search, and retrieval.

In real-world interactions with Victor, some queries require short, specific answers, whereas others need creative generation or contextual understanding. By combining Anthropic’s Claude 3.5 for generation and Amazon Titan Embeddings G1 for semantic retrieval, Victor can route each query through the most appropriate pipeline, retrieving relevant context through embeddings and then generating a response, resulting in more accurate and context-aware answers.

Amazon Bedrock provides a single interface to call Anthropic’s Claude 3.5 and Amazon Titan Embeddings G1 (along with other models) without creating separate integrations for each provider, simplifying development and maintenance.
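
As an illustration of that single-interface pattern, the sketch below calls Amazon Titan Embeddings for retrieval-side vectors and Anthropic’s Claude 3.5 Sonnet for generation through the same Bedrock runtime client. It is a simplified outline of the approach described here, not Bevar Ukraine’s production code, and the model IDs shown are the publicly documented ones rather than the exact versions Victor uses.

import json
import boto3

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

def embed(text: str) -> list:
    """Create a vector representation with Amazon Titan Embeddings G1 - Text."""
    response = bedrock.invoke_model(
        modelId='amazon.titan-embed-text-v1',
        body=json.dumps({'inputText': text}),
    )
    return json.loads(response['body'].read())['embedding']

def generate(question: str, context: str) -> str:
    """Generate an answer with Anthropic's Claude 3.5 Sonnet, grounded in retrieved context."""
    response = bedrock.converse(
        modelId='anthropic.claude-3-5-sonnet-20240620-v1:0',
        messages=[{
            'role': 'user',
            'content': [{'text': f'Context:\n{context}\n\nQuestion: {question}'}],
        }],
    )
    return response['output']['message']['content'][0]['text']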

For multilingual support, we used embedders that support multi-language embeddings and translated our materials using Amazon Translate. This enhances the resilience of our Retrieval Augmented Generation (RAG) system. The application is built securely on AWS services: AWS Key Management Service (AWS KMS) simplifies encrypting data within the application, and Amazon API Gateway supports the application’s REST endpoints. User authentication and authorization are handled by Amazon Cognito, which provides secure and scalable customer identity and access management (CIAM) capabilities.

The application runs on AWS infrastructure using services that are designed to be secure and scalable like Amazon S3, AWS Lambda, and DynamoDB.

Tips and recommendations

Building an AI assistant solution for refugees using Amazon Bedrock and other AWS services has provided valuable insights into creating impactful AI-powered humanitarian solutions. Through this implementation, we discovered key considerations that organizations should keep in mind when developing similar solutions. The experience highlighted the importance of balancing technical capabilities with human-centric design, providing multilingual support, maintaining data privacy, and creating scalable yet cost-effective solutions. These learnings can serve as a foundation for organizations looking to use AI and cloud technologies to support humanitarian causes, particularly in creating accessible and helpful digital assistance for displaced populations. The following are the main recommendations:

  • Use the Amazon Bedrock playground to test multiple LLMs side by side using the same prompt. This helps you find the model that gives the best quality, style, and tone of response for your specific use case (for example, factual accuracy vs. conversational tone).
  • Experiment with prompts and settings to improve responses.
  • Keep costs in mind; set up monitoring and budgets in AWS.
  • For tasks involving information retrieval or semantic search, select an embedding model while making sure to pick the appropriate settings. Pay attention to the size of the embeddings, because larger vectors can capture more meaning but might increase costs. Also, check that the model supports the languages your application requires.
  • If you’re using a knowledge base, use the Amazon Bedrock knowledge base playground to experiment with how content is chunked and how many passages are retrieved for each query. Finding the right number of retrieved passages can make a big difference in how clear and focused the final answers are—sometimes fewer, high-quality chunks work better than sending too much context.
  • To enforce safety and privacy, use Amazon Bedrock Guardrails (a minimal example of attaching one to a request follows this list). Guardrails can help prevent the model from leaking sensitive information, such as personal data or internal business content, and you can block harmful responses or enforce a specific tone and formatting style.
  • Start with a simple prototype, test the embedding quality in your domain, and expand iteratively.
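
Following up on the guardrails tip above, attaching a guardrail to a Converse request only requires its identifier and version. The sketch below is a minimal example; the guardrail ID and version are placeholders for values you would get after creating a guardrail in Amazon Bedrock.

import boto3

bedrock = boto3.client('bedrock-runtime')

response = bedrock.converse(
    modelId='anthropic.claude-3-5-sonnet-20240620-v1:0',
    messages=[{'role': 'user', 'content': [{'text': 'How do I apply for housing support?'}]}],
    guardrailConfig={
        'guardrailIdentifier': 'gr-placeholder-id',  # Placeholder: your guardrail ID
        'guardrailVersion': '1',                     # Placeholder: your guardrail version
    },
)
print(response['output']['message']['content'][0]['text'])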

Integration and enhancement layer

Bevar Ukraine has extended the core AWS infrastructure with several complementary technologies:

  • Pinecone vector database – For efficient storage and retrieval of semantic embeddings
  • DSPy framework – For structured prompt engineering and optimization of Anthropic’s Claude 3.5 Sonnet responses
  • EasyWeek – For appointment scheduling and resource management
  • Telegram API – For UI delivery
  • Amazon Bedrock Guardrails – For security policy enforcement
  • Amazon Rekognition – For document verification
  • GitHub-based continuous integration and delivery (CI/CD) pipeline – For rapid feature deployment

Key technical insights

The implementation revealed several crucial technical considerations. The DSPy framework was instrumental in optimizing and enhancing our language model prompts. By integrating additional layers of reasoning and context awareness tools, DSPy notably improved response accuracy, consistency, and depth. The team found that designing a robust knowledge base with comprehensive metadata was fundamental to the system’s effectiveness.

GDPR compliance required careful architectural decisions, including data minimization, secure storage, and clear user consent mechanisms. Cost optimization was achieved through strategic use of EC2 Spot Instances and implementation of API request throttling, resulting in significant operational savings without compromising performance.

Future enhancements

Our roadmap includes several technical improvements to enhance the system’s capabilities:

  • Implementing advanced context dispatching using machine learning algorithms to improve service coordination across multiple domains
  • Developing a sophisticated human-in-the-loop validation system for complex cases requiring expert oversight
  • Migrating suitable components to a serverless architecture using Lambda to optimize resource utilization and costs
  • Enhancing the knowledge base with advanced semantic search capabilities and automated content updates

Results

This solution, which serves hundreds of Ukrainian refugees in Denmark daily, demonstrates the potential of AWS services in creating scalable, secure, and efficient AI-powered systems for social impact. As a result, volunteers and employees of Bevar Ukraine have saved thousands of hours and, instead of answering repetitive questions from refugees, can support them in more complicated life situations. For refugees, the virtual assistant Victor is a lifeline that lets them get answers to the most pressing questions about public services in Denmark, and many other questions, in seconds instead of waiting for an available volunteer to help. Given the vast knowledge base Victor uses to generate responses, the quality of support has improved as well.

Conclusion

Through careful architecture design and integration of complementary technologies, we’ve created a platform that effectively addresses the challenges faced by refugees while maintaining high standards of security and data protection.

The success of this implementation provides a blueprint for similar solutions in other social service domains, potentially supporting refugees and other people in need around the world, highlighting the importance of combining robust cloud infrastructure with thoughtful system design to create meaningful social impact.


About the Authors

Taras Tsarenko is a Program Manager at Bevar Ukraine. For over a decade in the world of technology, Taras has led everything from tight-knit agile teams of 5 or more to a company of 90 people that became the best small IT company in Ukraine under 100 people in 2015. Taras is a builder who thrives at the intersection of strategy and execution, where technical expertise meets human impact, whether it’s streamlining workflows, solving complex problems, or empowering teams to create meaningful products. Taras specializes in AI-driven solutions and data engineering, leveraging technologies like machine learning and generative AI using Amazon SageMaker AI, Amazon Bedrock, Amazon OpenSearch Service, and more. Taras is an AWS Certified ML Engineer Associate.

Anton Garvanko is a Senior Analytics Sales Specialist for Europe North at AWS. As a finance professional turned salesman, Anton spent 15 years in various finance leadership roles in supply chain and logistics as well as financial services industries. Anton joined Amazon over 5 years ago and has been part of specialist sales teams focusing on business intelligence, analytics, and generative AI for over 3 years. He is passionate about connecting the worlds of finance and IT by making sure that business intelligence and analytics powered by generative AI support everyday decision-making across industries and use cases.

Vitalii Bozadzhy is a Senior Developer with extensive experience in building high-load, cloud-based solutions, specializing in Java, Golang, SWIFT, and Python. He specializes in scalable backend systems, microservice architectures designed to automate business processes, as well as building reliable and secure cloud infrastructures. Furthermore, he has experience in optimizing compute resources and building advanced solutions integrated into products. His expertise covers the full development cycle—from design and architecture to deployment and maintenance—with a strong focus on performance, fault tolerance, and innovation.

Vladyslav Horbatenko is a computer science student, Professor Assistant, and Data Scientist with a strong focus on artificial intelligence. Vladyslav began his journey with machine learning, reinforcement learning, and deep learning, and gradually became more interested in large language models (LLMs) and their potential impact. This led him to deepen his understanding of LLMs, and now he works on developing, maintaining, and improving LLM-based solutions. He contributes to innovative projects while staying up to date with the latest advancements in AI.


Enhanced diagnostics flow with LLM and Amazon Bedrock agent integration

Noodoe is a global leader in EV charging innovation, offering advanced solutions that empower operators to optimize their charging station operations and provide exceptional user experiences. Their universal charging stations are compatible with all EV brands and feature intuitive payment options, including credit cards and Apple Pay. Powered by the Noodoe EV OS cloud management system, the company delivers round-the-clock automated monitoring, diagnostics, and maintenance, achieving a market-leading uptime of 99.83%. With operations in over 15 countries and a robust commitment to sustainability, Noodoe is transforming the EV charging industry through cutting-edge technology and a user-first approach.

Despite its technological strengths, Noodoe has encountered key challenges in helping station operators optimize performance and select the most cost-effective electricity pricing strategies across diverse markets. Traditional systems lack the capability to efficiently process vast amounts of real-time and historical data or provide personalized, station-level recommendations. This limits operators’ ability to make timely, informed decisions—resulting in higher electricity costs, underutilized assets, and a subpar customer experience. These inefficiencies not only reduce profitability but also hinder the ability to scale in a competitive and fast-evolving EV charging landscape.

To solve this, Noodoe has integrated large language models (LLMs) through Amazon Bedrock and Amazon Bedrock Agents to deliver intelligent automation, real-time data access, and multilingual support. These AI-powered tools analyze usage patterns, station diagnostics, and external variables like weather or grid conditions to generate highly tailored pricing recommendations. By using the structured orchestration and prompt-based reasoning of Amazon Bedrock, Noodoe equips operators with actionable insights that improve margins, enhance station utilization, and allow them to offer more competitive rates to users—ultimately boosting customer satisfaction. This service is delivered through a subscription model, creating a new, scalable revenue stream for Noodoe while reinforcing its leadership and innovation in the EV infrastructure space.

In this post, we explore how Noodoe uses AI and Amazon Bedrock to optimize EV charging operations. By integrating LLMs, Noodoe enhances station diagnostics, enables dynamic pricing, and delivers multilingual support. These innovations reduce downtime, maximize efficiency, and improve sustainability. Read on to discover how AI is transforming EV charging management.

Solution overview

The Noodoe AI-enhanced diagnostics flow is built on a multi-step process that combines data collection, AI-powered analytics, and seamless translation for global accessibility, as illustrated in the following figure.

Solution overview

The physical charging station network currently operates over 1,000 sites across more than 20 countries, with plans to expand by more than 50 additional sites by the end of 2025. As illustrated in the following image, it uses the EV Cloud and LLMs to generate relevant recommendations following backend processing.

Photo of an EV charging station

The following screenshot shows an example of the results in the UI.


The following diagram illustrates the solution data flow.

Overview of Noodoe AI-enhanced diagnostics

To meet the feature requirements, the system operation process includes the following steps:

  1. Charging data is processed through the EV service before entering the database.
  2. The charging history data and pricing data are stored in the EV database.
  3. Amazon EventBridge Scheduler periodically triggers the EV service to perform analysis.
  4. The EV service calls the AI service to analyze historical data and provide pricing recommendations.
  5. The AI service collects the organized historical data to prepare the prompt template.
  6. This information, combined with appropriate prompts, is used in conjunction with Amazon Bedrock Agents as an AI-pricing agent to extract relevant information. The AI-pricing agent analyzes this combined data to identify daily peak and off-peak periods and provide recommendations for user pricing plans, as sketched in the example after this list.
  7. Optionally, if translation is needed for non-English users, these results from the AI-pricing agent are further processed through another Amazon Bedrock agent for translation.
  8. Optionally, the translation agent uses Anthropic’s Claude Sonnet 3.5 on Amazon Bedrock to get the result in the corresponding language.
  9. Finally, the AI service collects the results in the user’s language for formatting and other processing, then inserts them into a template to create a comprehensive report that is pushed to the user’s end.
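
To make step 6 concrete, the following sketch shows what a call to the AI-pricing agent could look like through the Bedrock agent runtime API. The agent ID, alias ID, and prompt contents are placeholders rather than Noodoe’s actual configuration.

import uuid
import boto3

agent_runtime = boto3.client('bedrock-agent-runtime')

# Placeholder prompt combining the prepared history with pricing instructions
prompt = (
    'Given the charging history below, identify daily peak and off-peak periods '
    'and recommend a user pricing plan.\n'
    '2025-05-01 18:00-21:00: 42 sessions\n2025-05-02 02:00-05:00: 3 sessions'
)

response = agent_runtime.invoke_agent(
    agentId='AGENT_ID_PLACEHOLDER',
    agentAliasId='ALIAS_ID_PLACEHOLDER',
    sessionId=str(uuid.uuid4()),
    inputText=prompt,
)

# The agent streams its answer back as chunks of bytes
recommendation = ''.join(
    event['chunk']['bytes'].decode('utf-8')
    for event in response['completion']
    if 'chunk' in event
)
print(recommendation)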

In the following section, we dive deep into these steps and the AWS services used.

Architecture of Noodoe AI-enhanced diagnostics

Noodoe faced key challenges in building a globally scalable, reliable, and cost-efficient architecture. They needed a solution that could support rapid expansion, handle high data volumes, and deliver consistent performance across AWS Regions. Addressing these requirements required careful architectural planning to provide flexibility and resilience.

The following diagram illustrates the solution architecture Noodoe built to overcome these challenges and support global growth.

The EV charging optimization platform structures the data flow across multiple AWS services, providing efficient data ingestion, processing, and AI-driven decision-making. Amazon Elastic Kubernetes Service (Amazon EKS) retrieves data from Amazon DocumentDB, processes it, and invokes Amazon Bedrock Agents for reasoning and analysis. This structured data pipeline enables optimized pricing strategies and multilingual customer interactions. By using containerized applications, event-driven workflows, and AI capabilities, the system provides scalable and flexible insights to EV station operators.

Data ingestion and processing

EV charging stations send real-time charging data to AWS IoT Core, which acts as the initial entry point for data processing. The data is then transmitted to Amazon Managed Streaming for Apache Kafka (Amazon MSK) to facilitate high-throughput, reliable streaming. From Amazon MSK, data flows into Amazon EKS, where the EV service processes it before storing the charging history and trend records in DocumentDB. This structured storage provides efficient retrieval for analysis and prediction.

AI-powered pricing analysis

To optimize pricing strategies, Amazon EventBridge triggers a pricing prediction function at regular intervals. This function retrieves historical charging data from DocumentDB and sends it, along with predefined prompts, to the Amazon Bedrock AI-pricing agent. The AI agent, powered by Anthropic’s Claude on Amazon Bedrock, evaluates station usage trends, peak and off-peak periods, and pricing inefficiencies to generate optimal pricing recommendations. Although the pricing agent doesn’t access an Amazon Bedrock knowledge base or trigger action groups, it uses preprocessing and post processing features to refine predictions and improve decision-making.

Multilingual support and report generation

If translation is required, the pricing analysis results are forwarded to the Amazon Bedrock translate agent, which converts the insights into the operator’s preferred language. The translated and structured data is then formatted into a predefined report template and stored in a designated database for later retrieval. This provides seamless access to actionable insights across diverse markets.

UI, monitoring, and performance optimization

Operators access the system through a web-based UI, with Amazon Route 53 and Amazon CloudFront providing fast and efficient content delivery. An Application Load Balancer distributes incoming requests across multiple EKS instances, providing high availability. To optimize performance, Amazon ElastiCache accelerates data retrieval while reducing database load. Amazon CloudWatch provides system monitoring and observability, and Noodoe’s administrators use Amazon Managed Service for Prometheus and Amazon Managed Grafana for additional monitoring and visualization.

This architecture empowers Noodoe with an AI-driven, scalable, and intelligent EV charging management solution, enhancing station utilization, revenue optimization, and customer experience worldwide.

Summary

The Noodoe AI-enhanced diagnostics flow transforms EV charging operations by integrating Amazon Bedrock Agents, blending rule-based automation, real-time user input, and LLM-powered insights for smarter decision-making. Backed by a comprehensive knowledge base and streamlined APIs, the solution empowers operators to automate workflows, optimize pricing, and boost station performance at scale. Ongoing expansion of the knowledge base, workflow refinement, and real-world testing further improve efficiency and reliability. This approach has delivered a 15% increase in revenue and reduced implementation time by 10%. Continuous feedback and clear documentation equip users to effectively use AI-driven diagnostics for more intelligent charging management.

Roman Kleinerman, Vice President of Products at Noodoe, shares: “We’ve seen revenue increases of 10–25% depending on the location and number of stations, as customers use our AI solution to optimize pricing strategies.”

Noodoe is dedicated to delivering smarter, more intelligent EV charging services that benefit both end-users and operators. Currently, Noodoe operates over 1,000 charging sites across more than 20 countries, with plans to expand by more than 50 additional sites by the end of 2025. Looking ahead, the system is being enhanced to support near real-time dynamic pricing optimization by incorporating factors such as demand, grid conditions, time of day, and weather. Amazon Bedrock Agents help enable these intelligent capabilities, powering dynamic pricing, load balancing, and grid-aware routing to optimize energy distribution and guide users to the most efficient stations. Future enhancements will offer personalized charging recommendations and incentives based on user preferences, maximizing value for both customers and operators. Start building intelligent, AI-powered EV charging solutions with Amazon Bedrock.


About the Authors

Ray Wang is a Senior Solutions Architect at AWS. With 12 years of experience in the IT industry, Ray is dedicated to building modern solutions on the cloud, especially in NoSQL, big data, machine learning, and Generative AI. As a hungry go-getter, he has earned all 14 AWS certifications, making his technical expertise both deep and broad. He loves to read and watch sci-fi movies in his spare time.

Howard Su is a Solutions Architect at AWS. With many years of experience in software development and system operations, Howard has served in various roles including R&D, QA, and SRE, and has been responsible for the architectural design of numerous large-scale systems, as well as participating in several cloud migrations. After accumulating years of development and operations experience, Howard is dedicated to promoting cloud-native service technologies and becoming an advocate for DevOps.

Tony Trinh is a Senior AIML Specialist Architect at AWS. With 13+ years of experience in the IT industry, Tony specializes in architecting scalable, compliance-driven AI and ML solutions—particularly in generative AI, MLOps, and cloud-native data platforms. As a perpetual learner, he’s doing research in Visual Language Model, Responsible AI & Computer Vision and authoring a book in ML engineering. In his spare time, Tony enjoys outdoor activity, experimenting with home improvement, and exploring Melbourne’s vibrant coffee scene.


Bring Receipts: New NVIDIA AI Blueprint Detects Fraudulent Credit Card Transactions With Precision

Editor’s note: This blog, originally published on October 28, 2024, has been updated.

Financial losses from worldwide credit card transaction fraud are projected to reach more than $403 billion over the next decade.

The new NVIDIA AI Blueprint for financial fraud detection can help combat this burgeoning epidemic — using accelerated data processing and advanced algorithms to improve AI’s ability to detect and prevent credit card transaction fraud.

Launched this week at the Money20/20 financial services conference, the blueprint provides a reference example for financial institutions to identify subtle patterns and anomalies in transaction data based on user behavior to improve accuracy and reduce false positives compared with traditional methods.

It shows developers how to build a financial fraud detection workflow by providing reference code, deployment tools and a reference architecture.

Companies can streamline the migration of their fraud detection workflows from traditional compute to accelerated compute using the NVIDIA AI Enterprise software platform and NVIDIA accelerated computing. The NVIDIA AI Blueprint is available for customers to run on Amazon Web Services, with availability coming soon on Dell Technologies and Hewlett Packard Enterprise. Customers can also use the blueprint through service offerings from NVIDIA partners including Cloudera, EXL, Infosys and SHI International.

Businesses embracing comprehensive machine learning (ML) tools and strategies can observe up to an estimated 40% improvement in fraud detection accuracy, boosting their ability to identify and stop fraudsters faster and mitigate harm.

As such, leading financial organizations like American Express and Capital One have been using AI to build proprietary solutions that mitigate fraud and enhance customer protection.

The new AI Blueprint accelerates model training and inference, and demonstrates how these components can be wrapped into a single, easy-to-use software offering, powered by NVIDIA AI.

Currently optimized for credit card transaction fraud, the blueprint could be adapted for use cases such as new account fraud, account takeover and money laundering.

Using Accelerated Computing and Graph Neural Networks for Fraud Detection

Traditional data science pipelines lack the compute acceleration to handle the massive data volumes required for effective fraud detection. ML models like XGBoost are effective for detecting anomalies in individual transactions but fall short when fraud involves complex networks of linked accounts and devices.

Helping address these gaps, NVIDIA RAPIDS — part of the NVIDIA CUDA-X collection of microservices, libraries, tools and technologies — enables payment companies to speed up data processing and transform raw data into powerful features at scale. These companies can fuel their AI models and integrate them with graph neural networks (GNNs) to uncover hidden, large-scale fraud patterns by analyzing relationships across different transactions, users and devices.

The use of gradient-boosted decision trees — a type of ML algorithm — tapping into libraries such as XGBoost, has long been the standard for fraud detection.

The new AI Blueprint for financial fraud detection enhances the XGBoost ML model with NVIDIA CUDA-X Data Science libraries including GNNs to generate embeddings that can be used as additional features to help reduce false positives.

The GNN embeddings are fed into XGBoost to create and train a model that can then be orchestrated. In addition, NVIDIA Dynamo-Triton, formerly NVIDIA Triton Inference Server, boosts real-time inferencing while optimizing AI model throughput, latency and utilization.
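The following is a minimal, library-agnostic sketch of that pattern (not the blueprint's reference code): precomputed GNN embeddings are concatenated with tabular transaction features before training an XGBoost classifier. The array shapes, random placeholder data, and hyperparameters are illustrative assumptions.

import numpy as np
import xgboost as xgb

# Hypothetical inputs: tabular features from the transaction table and
# embeddings produced separately by a graph neural network.
tabular_features = np.random.rand(10_000, 20)
gnn_embeddings = np.random.rand(10_000, 64)
labels = np.random.randint(0, 2, size=10_000)  # 1 = fraud, 0 = legitimate

# Concatenate the embeddings with the original features so the booster sees both.
X = np.hstack([tabular_features, gnn_embeddings])

model = xgb.XGBClassifier(n_estimators=200, max_depth=6)
model.fit(X, labels)
fraud_scores = model.predict_proba(X)[:, 1]  # fraud probability per transaction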

NVIDIA CUDA-X Data Science and Dynamo-Triton are included with NVIDIA AI Enterprise.

Leading Financial Services Organizations Adopt AI

At a time when many large North American financial institutions report that online and mobile fraud losses continue to increase, AI is helping to combat this trend.

American Express, which began using AI to fight fraud in 2010, leverages fraud detection algorithms to monitor all customer transactions globally in real time, generating fraud decisions in just milliseconds. Using a combination of advanced algorithms, one of which tapped into the NVIDIA AI platform, American Express enhanced model accuracy, advancing the company’s ability to better fight fraud.

European digital bank bunq uses generative AI and large language models to help detect fraud and money laundering. Its AI-powered transaction-monitoring system achieved nearly 100x faster model training speeds with NVIDIA accelerated computing.

BNY announced in March 2024 that it became the first major bank to deploy an NVIDIA DGX SuperPOD with DGX H100 systems, which will help build solutions that support fraud detection and other use cases.

And now, systems integrators, software vendors and cloud service providers can integrate the new NVIDIA blueprint for fraud detection to boost their financial services applications and help keep customers’ money, identities and digital accounts safe.

Explore the NVIDIA AI Blueprint for financial fraud detection and read this NVIDIA Technical Blog on supercharging fraud detection with GNNs.

Learn more about AI for fraud detection by visiting the AI Summit at Money20/20, running this week in Amsterdam.

See notice regarding software product information.

Read More

Distillation Scaling Laws

We propose a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher. Our findings mitigate the risks associated with large-scale distillation by enabling compute-optimal allocation for both the teacher and student to maximize student performance. We provide compute-optimal distillation recipes for two key scenarios: when a teacher already exists, and when a teacher needs training. In settings involving many students or an existing teacher, distillation outperforms supervised learning up to a compute level…Apple Machine Learning Research

Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection

*Equal Contributors
Identifying mistakes (i.e., miscues) made while reading aloud is commonly approached post-hoc by comparing automatic speech recognition (ASR) transcriptions to the target reading text. However, post-hoc methods perform poorly when ASR inaccurately transcribes verbatim speech. To improve on current methods for reading error annotation, we propose a novel end-to-end architecture that incorporates the target reading text via prompting and is trained for both improved verbatim transcription and direct miscue detection. Our contributions include: first, demonstrating that…Apple Machine Learning Research

Build GraphRAG applications using Amazon Bedrock Knowledge Bases

Build GraphRAG applications using Amazon Bedrock Knowledge Bases

These days, it is increasingly common for companies to adopt an AI-first strategy to stay competitive and efficient. As generative AI adoption grows, so does the technology’s ability to solve complex problems (one example is generating a comprehensive market report). One way to simplify the growing complexity of problems to be solved is through graphs, which excel at modeling relationships and extracting meaningful insights from interconnected data and entities.

In this post, we explore how to use Graph-based Retrieval-Augmented Generation (GraphRAG) in Amazon Bedrock Knowledge Bases to build intelligent applications. Unlike traditional vector search, which retrieves documents based on similarity scores, knowledge graphs encode relationships between entities, allowing large language models (LLMs) to retrieve information with context-aware reasoning. This means that instead of only finding the most relevant document, the system can infer connections between entities and concepts, improving response accuracy and reducing hallucinations. Graph Explorer is a great tool for inspecting the graph after it is built.

Introduction to GraphRAG

Traditional Retrieval-Augmented Generation (RAG) approaches improve generative AI by fetching relevant documents from a knowledge source, but they often struggle with context fragmentation, where relevant information is spread across multiple documents or sources.

This is where GraphRAG comes in. GraphRAG was created to enhance knowledge retrieval and reasoning by leveraging knowledge graphs, which structure information as entities and their relationships. Unlike traditional RAG methods that rely solely on vector search or keyword matching, GraphRAG enables multi-hop reasoning (logical connections between different pieces of context), better entity linking, and contextual retrieval. This makes it particularly valuable for complex document interpretation, such as legal contracts, research papers, compliance guidelines, and technical documentation.

Amazon Bedrock Knowledge Bases GraphRAG

Amazon Bedrock Knowledge Bases is a managed service for storing, retrieving, and structuring enterprise knowledge. It seamlessly integrates with the foundation models available through Amazon Bedrock, enabling AI applications to generate more informed and trustworthy responses. Amazon Bedrock Knowledge Bases now supports GraphRAG, an advanced feature that enhances traditional RAG by integrating graph-based retrieval. This allows LLMs to understand relationships between entities, facts, and concepts, making responses more contextually relevant and explainable.

How Amazon Bedrock Knowledge Bases GraphRAG works

Graphs are generated by creating a structured representation of data as nodes (entities) and edges (relationships) between those nodes. The process typically involves identifying key entities within the data, determining how these entities relate to each other, and then modeling these relationships as connections in the graph. After the traditional RAG process, Amazon Bedrock Knowledge Bases GraphRAG performs additional steps to improve the quality of the generated response:

  1. It identifies and retrieves related graph nodes or chunk identifiers that are linked to the initially retrieved document chunks.
  2. The system then expands on this information by traversing the graph structure, retrieving additional details about these related chunks from the vector store.
  3. By using this enriched context, which includes relevant entities and their key connections, GraphRAG can generate more comprehensive responses.
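From an application, you can exercise this end to end through the Amazon Bedrock RetrieveAndGenerate API. The following is a minimal sketch using boto3; the knowledge base ID, Region, and model ARN are placeholders you would replace with your own values.

import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.retrieve_and_generate(
    input={"text": "Which companies has Amazon invested in or acquired in recent years?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)

print(response["output"]["text"])  # generated answer
for citation in response.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        print(ref.get("location"))  # where each supporting, graph-expanded chunk came from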

How graphs are constructed

Imagine extracting information from unstructured data such as PDF files. In Amazon Bedrock Knowledge Bases, graphs are constructed through a process that extends traditional PDF ingestion. The system creates three types of nodes: chunk, document, and entity. The ingestion pipeline begins by splitting documents from an Amazon Simple Storage Service (Amazon S3) folder into chunks using customizable methods (you can choose from basic fixed-size chunking to more complex LLM-based chunking mechanisms). Each chunk is then embedded, and an ExtractChunkEntity step uses an LLM to identify key entities within the chunk. This information, along with the chunk’s embedding, text, and document ID, is sent to Amazon Neptune Analytics for storage. The insertion process creates interconnected nodes and edges, linking chunks to their source documents and extracted entities using the bulk load API in Amazon Neptune. The following figure illustrates this process.

Document processing workflow showing chunking and embedding steps leading to Neptune Analytics database
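If you want to inspect the resulting structure programmatically rather than visually, you can run openCypher queries against the Neptune Analytics graph. The following is a minimal sketch using boto3; the graph identifier is a placeholder, and the query simply counts nodes by label (expected to include the chunk, document, and entity node types described above).

import json
import boto3

neptune = boto3.client("neptune-graph")

result = neptune.execute_query(
    graphIdentifier="g-xxxxxxxxxx",  # placeholder Neptune Analytics graph ID
    language="OPEN_CYPHER",
    queryString="MATCH (n) RETURN labels(n) AS label, count(*) AS nodes",
)

# The response payload is a streaming body containing the JSON result set.
print(json.loads(result["payload"].read()))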

Use case

Consider a company that needs to analyze a wide range of documents and correlate entities spread across those documents to answer questions (for example, Which companies has Amazon invested in or acquired in recent years?). Extracting meaningful insights from this unstructured data and connecting it with other internal and external information poses a significant challenge. To address this, the company decides to build a GraphRAG application using Amazon Bedrock Knowledge Bases, using graph databases to represent complex relationships within the data.

One business requirement for the company is to generate a comprehensive market report that provides a detailed analysis of how internal and external information are correlated with industry trends, the company’s actions, and performance metrics. By using Amazon Bedrock Knowledge Bases, the company can create a knowledge graph that represents the intricate connections between press releases, products, companies, people, financial data, external documents and industry events. The Graph Explorer tool becomes invaluable in this process, helping data scientists and analysts to visualize those connections, export relevant subgraphs, and seamlessly integrate them with the LLMs in Amazon Bedrock. After the graph is well structured, anyone in the company can ask questions in natural language using Amazon Bedrock LLMs and generate deeper insights from a knowledge base with correlated information across multiple documents and entities.

Solution overview

In this GraphRAG application using Amazon Bedrock Knowledge Bases, we’ve designed a streamlined process to transform raw documents into a rich, interconnected graph of knowledge. Here’s how it works:

  1. Document ingestion: Users can upload documents manually to Amazon S3 or set up automatic ingestion pipelines.
  2. Chunk, entity extraction, and embeddings generation: In the knowledge base, documents are first split into chunks using fixed size chunking or customizable methods, then embeddings are computed for each chunk. Finally, an LLM is prompted to extract key entities from each chunk, creating a GraphDocument that includes the entity list, chunk embedding, chunked text, and document ID.
  3. Graph construction: The embeddings, along with the extracted entities and their relationships, are used to construct a knowledge graph. The constructed graph data, including nodes (entities) and edges (relationships), is automatically inserted into Amazon Neptune.
  4. Data exploration: With the graph database populated, users can quickly explore the data using Graph Explorer. This intuitive interface allows for visual navigation of the knowledge graph, helping users understand relationships and connections within the data.
  5. LLM-powered application: Finally, users can leverage LLMs through Amazon Bedrock to query the graph and retrieve correlated information across documents. This enables powerful, context-aware responses that draw insights from the entire corpus of ingested documents.

The following figure illustrates this solution.

AWS architecture showing data flow from S3 through Bedrock to Neptune with user query interaction

Prerequisites

The example solution in this post uses datasets from the following websites:

Also, you need to:

  • Create an S3 bucket to store the files on AWS. In this example, we named this bucket: blog-graphrag-s3.
  • Download and upload the PDF and XLS files from the websites into the S3 bucket.

Building the GraphRAG application

  1. Open the AWS Management Console for Amazon Bedrock.
  2. In the navigation pane, under Knowledge Bases, choose Create.
  3. Select Knowledge Base with vector store, and choose Create.
    Amazon Bedrock Knowledge Bases interface showing creation and testing options
  4. Enter a Knowledge Base name (for example, knowledge-base-graphrag-demo) and an optional description.
  5. Select Create and use a new service role.
  6. Select Data source as Amazon S3.
  7. Leave everything else as default and choose Next to continue.
  8. Enter a Data source name (for example: knowledge-base-graphrag-data-source).
  9. Select an S3 bucket by choosing Browse S3. (If you don’t have an S3 bucket in your account, create one. Make sure to upload all the necessary files.)
  10. After the S3 bucket is created and files are uploaded, choose blog-graphrag-s3 bucket.
  11. Leave everything else as default and choose Next.
  12. Choose Select model and then select an embeddings model (in this example, we chose the Titan Text Embeddings V2 model).
  13. In the Vector database section, under Vector store creation method, select Quick create a new vector store; for the Vector store, select Amazon Neptune Analytics (GraphRAG), and choose Next to continue.
  14. Review all the details.
  15. Choose Create Knowledge Base after reviewing all the details.
  16. Creating a knowledge base on Amazon Bedrock might take several minutes to complete depending on the size of the data present in the data source. You should see the status of the knowledge base as Available after it is created successfully.

provide kb details and source configuration

Update and sync the graph with your data

  1. Select the Data source name (in this example, knowledge-base-graphrag-data-source ) to view the synchronization history.
  2. Choose Sync to update the data source.

Amazon Bedrock Knowledge Bases interface showing one active vector store implementation
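You can also trigger the same sync programmatically, which is useful when new documents land in the S3 bucket on a schedule. The following is a minimal sketch; the knowledge base ID and data source ID are placeholders you can copy from the console.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="YOUR_KB_ID",
    dataSourceId="YOUR_DATA_SOURCE_ID",
)
print(job["ingestionJob"]["status"])  # for example, STARTING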

Visualize the graph using Graph Explorer

Let’s look at the graph created by the knowledge base by navigating to the Amazon Neptune console. Make sure that you’re in the same AWS Region where you created the knowledge base.

  1. Open the Amazon Neptune console.
  2. In the navigation pane, choose Analytics and then Graphs.
  3. You should see the graph created by the knowledge base.

Neptune database graphs interface showing single active graph instance

To view the graph in Graph Explorer, you need to create a notebook by going to the Notebooks section.

You can create the notebook instance manually or by using an AWS CloudFormation template. In this post, we will show you how to do it using the Amazon Neptune console (manual).

To create a notebook instance:

  1. Choose Notebooks.
  2. Choose Create notebook.
    • Select Analytics as the Neptune service.
    • Associate the notebook with the graph you just created (in this case: bedrock-knowledge-base-imwhqu).
    • Select the notebook instance type.
    • Enter a name for the notebook instance in the Notebook name field.
    • Create an AWS Identity and Access Management (IAM) role and use the Neptune default configuration.
    • Select VPC, Subnet, and Security group.
  3. Leave Internet access as default and choose Create notebook.

Neptune Notebooks interface showing two active Jupyter notebook instances

Notebook instance creation might take a few minutes. After the Notebook is created, you should see the status as Ready.

To see the Graph Explorer:

  • Go to Actions and choose Open Graph Explorer.

Neptune notebook details page showing aws-neptune-graphraq configuration

By default, public connectivity is disabled for the graph database. To connect to the graph, you must either have a private graph endpoint or enable public connectivity. For this post, you will enable public connectivity for this graph.

To set up a public connection to view the graph (optional):

  1. Go back to the graph you created earlier (under Analytics, Graphs).
  2. Select your graph by choosing the round button to the left of the Graph Identifier.
  3. Choose Modify.
  4. Select the Enable public connectivity check box in the Network section.
  5. Choose Next.
  6. Review changes and choose Submit.

page showing how to set up public connectivity

To open the Graph Explorer:

  1. Go back to Notebooks.
  2. After the notebook instance is created, choose the instance name (in this case: aws-neptune-analytics-neptune-analytics-demo-notebook).
  3. Then choose Actions, and then choose Open Graph Explorer.

image showing how to open graph explorer

  4. You should now see Graph Explorer. To see the graph, add a node to the canvas, then explore and navigate the graph.

image showing graph explorer

Playground: Working with LLMs to extract insights from the knowledge base using GraphRAG

You’re ready to test the knowledge base.

  1. Choose the knowledge base, select a model, and choose Apply.
  2. Choose Run after adding the prompt. In the example shown in the following screenshot, we asked: How is AWS increasing energy efficiency?

example prompt

  3. Choose Show details to see the Source chunk.
  4. Choose Metadata associated with this chunk to view the chunk ID, data source ID, and source URI.

image showing chunk id, resource id and resource uri

  5. In the next example, we asked a more complex question: Which companies has AMAZON invested in or acquired in recent years?

example with a complex prompt

Another way to improve the relevance of query responses is to use a reranker model. Using the reranker model in GraphRAG involves providing a query and a list of documents to be reordered based on relevance. The reranker calculates relevance scores for each document in relation to the query, improving the accuracy and pertinence of retrieved results for subsequent use in generating responses or prompts. In the Amazon Bedrock Playgrounds, you can see the results generated by the reranking model in two ways: the data ranked by the reranking model alone (the following figure), or a combination of the reranking model and the LLM to generate new insights.

rerank model example

To use the reranker model:

  1. Check the availability of the reranker model.
  2. Go to the AWS Management Console for Amazon Bedrock.
  3. In the navigation pane, under Builder tools, choose Knowledge Bases.
  4. Choose the same knowledge base we created in the earlier steps, knowledge-base-graphrag-demo.
  5. Choose Test Knowledge Base.
  6. Choose Configurations, expand the Reranking section, choose Select model, and select a reranker model (in this post, we chose Cohere Rerank 3.5).

Clean up

To clean up your resources, complete the following tasks:

  1. Delete the Neptune notebooks: aws-neptune-graphrag.
  2. Delete the Amazon Bedrock Knowledge Bases: knowledge-base-graphrag-demo.
  3. Delete content from the Amazon S3 bucket blog-graphrag-s3.

Conclusion

Using Graph Explorer in combination with Amazon Neptune and Amazon Bedrock LLMs provides a solution for building sophisticated GraphRAG applications. Graph Explorer offers intuitive visualization and exploration of complex relationships within data, making it straightforward to understand and analyze company connections and investments. You can use Amazon Neptune graph database capabilities to set up efficient querying of interconnected data, allowing for rapid correlation of information across various entities and relationships.

By using this approach to analyze Amazon’s investment and acquisition history, we can quickly identify patterns and insights that might otherwise be overlooked. For instance, when examining the questions “Which companies has Amazon invested in or acquired in recent years?” or “How is AWS increasing energy efficiency?”, the GraphRAG application can traverse the knowledge graph, correlating press releases, investor relations information, entities, and financial data to provide a comprehensive overview of Amazon’s strategic moves.

The integration of Amazon Bedrock LLMs further enhances the accuracy and relevance of generated results. These models can contextualize the graph data, helping you to understand the nuances in company relationships and investment trends, and supporting the generation of comprehensive market reports. This combination of graph-based knowledge and natural language processing enables more precise answers and data interpretation, going beyond basic fact retrieval to offer analysis of Amazon’s investment strategy.

In summary, the synergy between Graph Explorer, Amazon Neptune, and Amazon Bedrock LLMs creates a framework for building GraphRAG applications that can extract meaningful insights from complex datasets. This approach streamlines the process of analyzing corporate investments and creates new ways to analyze unstructured data across various industries and use cases.


About the authors

Ruan Roloff is a ProServe Cloud Architect specializing in Data & AI at AWS. During his time at AWS, he was responsible for the data journey and data product strategy of customers across a range of industries, including finance, oil and gas, manufacturing, digital natives and public sector — helping these organizations achieve multi-million dollar use cases. Outside of work, Ruan likes to assemble and disassemble things, fish on the beach with friends, play SFII, and go hiking in the woods with his family.

Sai Devisetty is a Technical Account Manager at AWS. He helps customers in the Financial Services industry with their operations in AWS. Outside of work, Sai cherishes family time and enjoys exploring new destinations.

Madhur Prashant is a Generative AI Solutions Architect at Amazon Web Services. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and most of all optimal for customers. Outside of work, he loves doing yoga, hiking, spending time with his twin, and playing the guitar.

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

Read More

Streamline personalization development: How automated ML workflows accelerate Amazon Personalize implementation

Streamline personalization development: How automated ML workflows accelerate Amazon Personalize implementation

Crafting unique, customized experiences that resonate with customers is a potent strategy for boosting engagement and fostering brand loyalty. However, creating dynamic personalized content is challenging and time-consuming because of the need for real-time data processing, complex algorithms for customer segmentation, and continuous optimization to adapt to shifting behaviors and preferences, all while providing scalability and accuracy. Despite these challenges, the potential rewards make personalization a worthwhile pursuit for many businesses. Amazon Personalize is a fully managed machine learning (ML) service that uses your data to generate product and content recommendations for your users. Amazon Personalize helps accelerate time-to-value with custom models that are trained on data you provide, such as your users, catalog items, and the interactions between users and items, to generate personalized content and product recommendations. You can choose from various recipes (algorithms for specific use cases) to find the ones that fit your needs, such as recommending items that a user is most likely to engage with next given their past interactions, or the next best action that a user is most likely to take.

To maintain a personalized user experience, it’s crucial to implement machine learning operations (MLOps) practices, including continuous integration, deployment, and training of your ML models. MLOps facilitates seamless integration across various ML tools and frameworks, streamlining the development process. A robust machine learning solution for maintaining personalized experiences typically includes automated pipeline construction, as well as automated configuration, training, retraining, and deployment of personalization models. While services like Amazon Personalize offer a ready-to-use recommendation engine, establishing a comprehensive MLOps lifecycle for a personalization solution remains a complex undertaking. This process involves intricate steps to make sure that models remain accurate and relevant as user behaviors and preferences evolve over time.

This blog post presents an MLOps solution that uses AWS Cloud Development Kit (AWS CDK) and services like AWS Step Functions, Amazon EventBridge and Amazon Personalize to automate provisioning resources for data preparation, model training, deployment, and monitoring for Amazon Personalize.

Features and benefits

Deploying this solution offers improved scalability and traceability and allows you to quickly set up a production-ready environment to seamlessly deliver tailored recommendations to users using Amazon Personalize. This solution:

  • Streamlines the creation and management of Amazon Personalize resources.
  • Provides greater flexibility in resource management and selective feature activation.
  • Enhances readability and comprehensibility of complex workflows.
  • Enables event-driven architecture by publishing key Amazon Personalize events, allowing real-time monitoring, and enabling automated responses and integrations with other systems.
  • Includes automated creation of Amazon Personalize resources, including recommenders, solutions, and solution versions.
  • Facilitates end-to-end workflow automation for dataset import, model training, and deployment in Amazon Personalize.
  • Improves organization and modularity of complex processes through nested step functions.
  • Provides flexible activation of specific solution components using AWS CDK.

Solution overview

This solution uses AWS CDK layer 3 constructs. Constructs are the basic building blocks of AWS CDK applications. A construct is a component within your application that represents one or more AWS CloudFormation resources and their configuration.

Solution overview

The solution architecture is shown in the preceding figure and includes:

  1. An Amazon Simple Storage Service (Amazon S3) bucket is used to store interactions, users, and items datasets. In this step, you need to configure your bucket permissions so that Amazon Personalize and AWS Glue can access the datasets and input files.
  2. AWS Glue is used to preprocess the interactions, users, and item datasets. This step helps ensure that the datasets comply with the training data requirements of Amazon Personalize. For more information, see Preparing training data for Amazon Personalize.
  3. EventBridge is used to schedule regular updates by triggering the workflow, and to publish events related to resource provisioning. Because the Step Functions workflow is driven by the input configuration file, you use that configuration when setting up the scheduled start of Step Functions.
  4. The Step Functions workflow manages all resource provisioning of the Amazon Personalize dataset group (including datasets, schemas, event tracker, filters, solutions, campaigns, and batch inference jobs). Step Functions provides monitoring across the solution through event logs. You can also visually track the stages of your workflow in the Step Functions console. You can adjust the input configuration file to better fit your use case by defining schemas, recipes, and inference options. The solution workflow has the following steps:
    1. A preprocessing job that runs an AWS Glue job, if provided. This step facilitates any preprocessing of the data that might be required.
    2. Create a dataset group, which is a container for Amazon Personalize resources.
    3. Create a dataset import job for the datasets based on the defined S3 bucket.
    4. Create filters that define any filtering that you want to apply on top of the recommendations.
    5. Create an event tracker for ingesting real-time events, such as user interactions, which in turn influence the recommendations provided.
    6. Create solutions and recommenders for creating custom resources and domain recommenders.
    7. Create a campaign, batch inference job, or batch segment job for generating inferences for real-time, batch, and segmentation use cases, respectively.
  5. If you have a batch inference use case, then recommendations that match your inputs will be output into the S3 bucket that you defined in the input configuration file.
  6. An Amazon EventBridge event bus, where resource status notification updates are posted throughout the AWS Step Functions workflow.

Prerequisites

Before you deploy the AWS CDK stack, make sure that you have the following prerequisites in place:

  1. Install and configure AWS Command Line Interface (AWS CLI).
  2. Install Python 3.12 or newer.
  3. Install Node.js 20.16.0 or newer.
  4. Install AWS CDK 2.88.0 or newer.
  5. Docker 27.5.1 or newer (required for AWS Lambda function bundling).

Although newer versions of the AWS CLI, Python, Node.js, and the AWS CDK are generally compatible, this solution has been tested with the versions listed.

Deploy the solution

With the prerequisites in place, use the following steps to deploy the solution:

  1. Clone the repository to a new folder on your desktop using the following command:
git clone https://github.com/aws-samples/automate-mlops-personalize-cdk-pipeline.git
  2. Create a Python virtual environment for development:
python3 -m venv .venv
source .venv/bin/activate 
pip install -r requirements.txt
  3. Define an Amazon Personalize MLOps pipeline instance PersonalizeMlOpsPipeline (see personalize_pipeline_stack.py for the complete example, which also includes different inference options). In this walkthrough, you create a custom solution with an associated campaign and batch inference job:
PersonalizeMlOpsPipeline(
    self, 'PersonalizePipelineSolution',
    pre_processing_config={
        "job_class": PreprocessingGlueJobFlow
    },
    enable_filters=True,
    enable_event_tracker=True,
    recommendation_config=[
        {
            "type": "solutions",
            "inference_options": ["campaigns", "batchInferenceJobs"]
        },
        {
            "type": "recommenders"
        }
    ]

)

Where:

  • 'PersonalizePipelineSolution' – The name of the pipeline solution stack
  • pre_processing_config – Configuration for the pre-processing job that transforms raw data into a format usable by Amazon Personalize. To use AWS Glue jobs for preprocessing, specify the AWS Glue job class (PreprocessingGlueJobFlow) as the value of the job_class parameter. Currently, only AWS Glue jobs are supported. You pass the name of the AWS Glue job to run as part of the input configuration. This doesn’t deploy the actual AWS Glue job responsible for pre-processing the files; that job must be created outside of this solution and its name passed as an input to the state machine. A sample AWS Glue job is supplied in the accompanying repo, which shows how pre-processing can be done.
  • enable_filters – A Boolean value to enable dataset filters for pre-processing. When set to true, the pipeline will create the state machines needed to create filters. Supported options are true or false. If you specify this value as false, the corresponding state machine is not deployed.
  • enable_event_tracker – A Boolean value to enable the Amazon Personalize event tracker. When set to true, the pipeline will create the state machines needed to create an event tracker. Supported options are true or false. If you specify this value as false, the corresponding state machine is not deployed.
  • recommendation_config – Configuration options for recommendations. The two types currently supported are solutions and recommenders. Within the solutions type, you can have multiple options such as campaigns, batchInferenceJobs, and batchSegmentJobs. Based on the selected options, the corresponding state machine and components are created. In the earlier example, we used campaigns and batchInferenceJobs as the option, which means that only the campaigns and batch inference job state machines will be deployed with the AWS CDK.

After the infrastructure is deployed, you can also enable and disable certain options through the state machine input configuration file. You use the AWS CDK code to control which components are deployed in your AWS environment, and the input configuration to select which components run.

Preprocessing: As an optional step, you can use an existing AWS Glue job for preprocessing your data before feeding it into Amazon Personalize, which uses this data to generate recommendations for your end users. While this post demonstrates the process using the Movie Lens dataset, you can adapt it for your own datasets or custom processing needs. To do so, navigate to the glue_job folder and modify the movie_script.py file accordingly, or create an entirely new AWS Glue job tailored to your specific requirements. This preprocessing step, though optional, can be crucial in making sure that your data is optimally formatted for Amazon Personalize to generate accurate recommendations.

  1. Make sure that the AWS Glue job is configured to write its output to an S3 bucket. This bucket should then be specified as an input source in the Step Functions input configuration file.
  2. Verify that the AWS Glue service has the necessary permissions to access the S3 bucket mentioned in your script.
  3. In the input configuration, you’ll need to provide the name of the AWS Glue job that will be executed by the main state machine workflow. It’s crucial that this specified AWS Glue job runs without any errors, because any failures could potentially cause the entire state machine execution to fail.

Package and deploy the solution with AWS CDK, allowing for the most flexibility in development:

Before you can deploy the pipeline using AWS CDK, you need to set up AWS credentials on your local machine. You can refer to Set up AWS temporary credentials for more details.

# bootstrap CDK (required once - deploys a CDK bootstrap CloudFormation stack for assets)
cdk bootstrap
# build the solution
cdk synth
# build and deploy the solution
cdk deploy

Run the pipeline

Before initiating the pipeline, create the following resources and note the resource names for future reference.

  1. Set up an S3 bucket for dataset storage. If you plan to use the preprocessing step, this should be the same bucket as the output destination.
  2. Update the S3 bucket policy to grant Amazon Personalize the necessary access permissions. See Giving Amazon Personalize access to Amazon S3 resources for policy examples.
  3. Create an AWS Identity and Access Management (IAM) role to be used by the state machine for accessing Amazon Personalize resources.

You can find detailed instructions and policy examples in the GitHub repository.

After you’ve set up these resources, you can create the input configuration file for the Step Functions state machine. If you configure the optional AWS Glue job, it will create the input files that are required as input to the pipeline; refer to Configure the Glue Job to create the output files for more details.

Create input configuration

This input file is crucial because it contains all the essential information needed to create and manage your Amazon Personalize resources; this input configuration JSON acts as the input to the Step Functions state machine. The file can contain the following top-level objects:

  • datasetGroup
  • datasets
  • eventTracker
  • filters
  • solutions (can contain campaigns, batchInferenceJobs and batchSegmentJobs)
  • recommenders

Customize the configuration file according to your specific requirements and include or exclude sections based on the Amazon Personalize artifacts that you want to create. For the dataset import jobs in the datasets section, replace AWS_ACCOUNT_ID, S3_BUCKET_NAME and IAM_ROLE_ARN with the appropriate values. The following is a snippet of the input configuration file. For a complete sample, see input_media.json.

"datasetImportJob": {
    "serviceConfig": {
        "dataSource": {
            "dataLocation": "s3://S3_BUCKET_NAME/users.csv"
        },
        "roleArn": "arn:aws:iam::AWS_ACCOUNT_ID:role/IAM_ROLE_NAME",
        "tags": [
            {
                "tagKey": "string",
                "tagValue": "string"
            }
        ],
        "importMode": "FULL",
        "jobName": "UsersDatasetImportJob"
    },
    "createNewJob": true
}

Likewise, if you’re using batch inference or batch segment jobs, remember to also update the BUCKET_NAME and IAM ROLE ARN in those sections. It’s important to verify that you have the required input files for batch inference stored in your S3 bucket. Adjust the file paths in your configuration to accurately reflect the location of these files within your bucket structure. This helps ensure that Amazon Personalize can access the correct data when executing these batch processes.

Adjust the AWS Glue Job name in the configuration file if you have configured it as a part of the AWS CDK stack.

See the property table for a deep dive into each property and identify whether it’s optional or required.

Execute the pipeline

You can run the pipeline using the main state machine named PersonalizePipelineSolution from the Step Functions console, or set up a schedule in EventBridge (find the step-by-step process in the Schedule the workflow for continued maintenance of the solution section of this post).

  1. In the AWS Management Console for Step Functions, navigate to State machines and select the PersonalizePipelineSolution.

Personalize Pipeline Solution

  2. Choose Start Execution and enter the configuration file that you created for your use case based on the steps in the Create input configuration section.

Start Execution

  3. Choose Start execution and monitor the State Machine execution. In the Step Functions console, you will find a visual representation of the workflow and can track at what stage the execution is. Event logs will give you insight into the progress of the stages and information if there are any errors. The following figure is an example of a completed workflow:

Graph view of AWS Step Functions

  4. After the workflow finishes, you can view the resources in the Amazon Personalize console. For batch inference jobs specifically, you can locate the corresponding step under the Inference tasks section of the graph, and within the Custom Resources area of the Amazon Personalize console.

Get recommendations (real-time inference)

After your pipeline has completed its run successfully, you can obtain recommendations. In the example configuration, we chose to deploy campaigns as the inference option. As a result, you’ll have access to a campaign that can provide real-time recommendations.

We use the Amazon Personalize console to get recommendations. Choose Dataset groups and select your dataset group name. Choose Campaigns and select your campaign name. Enter a user ID and item IDs of your choice to test personalized ranking; you can get the user ID and item IDs from the input file in the Amazon S3 bucket you configured.

Test Campaign Results
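You can run the same test programmatically with the Amazon Personalize runtime API. The following is a minimal sketch, assuming a campaign deployed from a personalized-ranking recipe; the campaign ARN, user ID, and item IDs are placeholders.

import boto3

personalize_runtime = boto3.client("personalize-runtime")

response = personalize_runtime.get_personalized_ranking(
    campaignArn="arn:aws:personalize:us-east-1:111122223333:campaign/your-campaign",
    userId="12345",
    inputList=["item-1", "item-2", "item-3"],
)

for item in response["personalizedRanking"]:
    print(item["itemId"], item.get("score"))  # items reordered for this user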

 

Get recommendations (batch inference)

If you have configured batch inference to run, start by verifying that the batch inference step has successfully completed in the Step Functions workflow. Then, use the Amazon S3 console to navigate to the destination S3 bucket for your batch inference job. If you don’t see an output file there, verify that you’ve provided the correct path for the input file in your input configuration.

Media Inference Output - S3 Bucket
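Batch inference output is written as JSON Lines, with one record per input line. The following is a minimal sketch for reading it back; the bucket and object key are placeholders, and the exact key depends on your job configuration.

import json
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="your-output-bucket", Key="batch-output/batch-input.json.out")

# Each line pairs the original input with the generated recommendations.
for line in obj["Body"].read().decode("utf-8").splitlines():
    record = json.loads(line)
    user = record.get("input", {}).get("userId")
    items = record.get("output", {}).get("recommendedItems", [])
    print(user, items)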

 

Schedule the workflow for continued maintenance of the solution

While Amazon Personalize offers automatic training for solutions through its console or SDK, allowing users to set retraining frequencies such as every three days, this MLOps workflow provides an enhanced approach. By using EventBridge schedules you gain more precise control over the timing of retraining processes. Using this method, you can specify exact dates and times for retraining executions. To implement this advanced scheduling, you can configure an EventBridge schedule to trigger the Step Functions execution, giving you finer granularity in managing your machine learning model updates.
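As an alternative to the console steps that follow, you can define the schedule programmatically with the EventBridge Scheduler API. The following is a minimal sketch; the state machine ARN, IAM role ARN, schedule expression, and input file path are placeholders to adapt to your environment.

import boto3

scheduler = boto3.client("scheduler")

with open("input_media.json") as f:  # the Step Functions input configuration
    pipeline_input = f.read()

scheduler.create_schedule(
    Name="personalize-mlops-weekly-retraining",
    ScheduleExpression="cron(0 2 ? * MON *)",  # 02:00 UTC every Monday
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={
        "Arn": "arn:aws:states:us-east-1:111122223333:stateMachine:PersonalizePipelineSolution",
        "RoleArn": "arn:aws:iam::111122223333:role/scheduler-invoke-stepfunctions-role",
        "Input": pipeline_input,
    },
)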

  1. Navigate to the Amazon EventBridge Console and select EventBridge Schedule and then choose Create schedule.

Amazon EventBridge

  2. You can establish a recurring schedule for executing your entire workflow. A key benefit of this solution is the enhanced control it offers over the specific date and time you want your workflow to run. This allows for precise timing of your processes, which you can use to align the workflow execution with your operational needs or optimal data processing windows.

Specify schedule detail

  3. Select AWS Step Functions (as shown below) as your target.

Select Target - Event Bridge

  4. Insert the input configuration file that you prepared previously as the input and choose Next.

Start Execution

An additional step you can take is to set up a dead-letter queue with Amazon Simple Queue Service (Amazon SQS) to handle failed Step Functions executions.

Monitoring and notification

To maintain the reliability, availability, and performance of Step Functions and your solution, set up monitoring and logging. You can set up an EventBridge rule to receive notifications about events that are of interest, such as batch inference being ready in the S3 bucket. Here is how you can set that up:

  1. Navigate to Amazon Simple Notification Service (Amazon SNS) console and create an SNS topic that will be the target for your event.

Create Topic

  2. Amazon SNS supports subscriptions for different endpoint types such as HTTP/HTTPS, email, Lambda, SMS, and so on. For this example, use an email endpoint.

Create subscription

  3. After you create the topic and the subscription, navigate to the EventBridge console and select Create Rule. Define the details associated with the event such as the name, description, and the event bus.

Define rule detail

  4. To set up the event rule, you’ll use the pattern form. You use this form to define the specific events that will trigger notifications. For the batch segment job completion step, you should configure the source and detail-type fields as follows:
{
"source": ["solutions.aws.personalize"],
"detail-type": ["Personalize BatchSegmentJob status change"]
}

Event pattern

  5. Select the SNS topic as your target and proceed.

Select target(s)

With this procedure, you have set up an EventBridge rule to receive email notifications when an object is created in your batch inference bucket. You can also set up logic based on your use case to trigger any downstream processes, such as creation of email campaigns with the results of your inference, by choosing different targets such as Lambda.

Additionally, you can use Step Functions and Amazon Personalize monitoring through Amazon CloudWatch metrics. See Logging and Monitoring AWS Step Functions and Monitoring Amazon Personalize for more information.

Handling schema updates

Schema updates are available in Amazon Personalize for adding columns to the existing schema. Note that deleting columns from existing schemas isn’t currently supported. To update the schema, make sure that you’re modifying the schema in the input configuration passed to Step Functions. See Replacing a dataset’s schema to add new columns for more information.

Clean up

To avoid incurring additional costs, delete the resources you created during this solution walkthrough. You can clean up the solution by deleting the CloudFormation stack you deployed as part of the setup.

Using the console

  1. Sign in to the AWS CloudFormation console.
  2. On the Stacks page, select this solution’s installation stack.
  3. Choose Delete.

Using the AWS CLI

$ aws cloudformation delete-stack --stack-name <installation-stack-name>

Conclusion

This MLOps solution for Amazon Personalize offers a powerful, automated approach to creating and maintaining personalized user experiences at scale. By using AWS services like AWS CDK, Step Functions, and EventBridge, the solution streamlines the entire process from data preparation through model deployment and monitoring. The flexibility of this solution means that you can customize it to fit various use cases, and integration with EventBridge keeps models up to date. Delivering exceptional personalized experiences is critical for business growth, and this solution provides an efficient way to harness the power of Amazon Personalize to improve user engagement, customer loyalty, and business results. We encourage you to explore and adapt this solution to enhance your personalization efforts and stay ahead in the competitive digital landscape.

To learn more about the capabilities discussed in this post, check out Amazon Personalize features and the Amazon Personalize Developer Guide.

Additional resources:


About the Authors

Reagan Rosario brings over a decade of technical expertise to his role as a Sr. Specialist Solutions Architect in Generative AI at AWS. Reagan transforms enterprise systems through strategic implementation of AI-powered cloud solutions, automated workflows, and innovative architecture design. His specialty lies in guiding organizations through digital evolution—preserving core business value while implementing cutting-edge generative AI capabilities that dramatically enhance operations and create new possibilities.

Nensi Hakobjanyan is a Solutions Architect at Amazon Web Services, where she supports enterprise Retail and CPG customers in designing and implementing cloud solutions. In addition to her deep expertise in cloud architecture, Nensi brings extensive experience in Machine Learning and Artificial Intelligence, helping organizations unlock the full potential of data-driven innovation. She is passionate about helping customers through digital transformation and building scalable, future-ready solutions in the cloud.

Read More

Fast-track SOP processing using Amazon Bedrock

Fast-track SOP processing using Amazon Bedrock

Standard operating procedures (SOPs) are essential documents in the context of regulations and compliance. SOPs outline specific steps for various processes, making sure practices are consistent, efficient, and compliant with regulatory standards.

SOP documents typically include key sections such as the title, scope, purpose, responsibilities, procedures, documentation, citations (references), and a detailed approval and revision history. In FDA-regulated industries such as healthcare and life sciences, SOPs play a crucial role in defining manufacturing, clinical, laboratory, quality control, quality assurance, and regulatory compliance practices.

When a regulatory body like the US Food and Drug Administration (FDA) introduces changes to regulations, organizations are required to evaluate the changes against their internal SOPs. When necessary, they must update their SOPs to align with the regulation changes and maintain compliance.

In this post, we show different approaches using Amazon Bedrock to identify relationships between regulation changes and SOPs.

Challenge

In the healthcare and life sciences industry, regulatory authorities like the FDA and the European Medicines Agency (EMA) frequently update regulations across various areas, such as clinical trials, medical devices, drug development and approvals, quality risk management, systems and data management, and technology adoption. These regulatory updates often require organizations to correspondingly update their internal SOPs to align with the changes. This process is typically manual, requiring a team of subject matter experts to review the regulatory changes, screen the SOPs to identify relevance, determine the impact, and specify what needs to be updated. This manual approach adds significant overhead for companies and can result in review cycles lasting several days to months.

To address this challenge, we explore approaches that can help automate the identification of relationships between regulatory changes and SOPs. These approaches can also be extended to assess the impact of regulatory changes on an organization’s internal processes and documentation. By using automation, companies can streamline the SOP update process, reducing the time and resources required to maintain alignment with evolving regulatory requirements.

Sample Data

For this post, we used SOPs published by the FDA’s Center for Biologics Evaluation and Research. These publicly available SOPs are used by the FDA staff to guide their duties.

Specifically, we focused on the following SOPs related to biologics procedures. This narrow scope allowed us to dive deeper into a specific regulatory domain within the larger healthcare and life sciences industry.

In addition to the SOPs, we also used three of the FDA’s Biologics Guidance Documents to test the relationship between the regulatory documents and the SOPs.

These guidance documents describe the FDA’s policy interpretations on regulatory issues related to the biologics domain. They cover a wide range of topics, including processing, content, evaluation, approval, inspection, and enforcement of policies. The guidance documents also discuss specific products or issues relating to the design, production, labeling, promotion, manufacturing, and testing of regulated products.

We used the following specific FDA Biologics Guidance Documents for this analysis:

Approaches

A key step in assessing the impact of regulatory changes is to identify if a regulatory guidance is related to an organization’s SOPs. We used Amazon Bedrock along with Amazon Simple Storage Service (Amazon S3) to store the input dataset.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Our experiments used Anthropic’s Claude 3 Opus large language model (LLM) on Amazon Bedrock. However, you can use the broad selection of models available on Amazon Bedrock to experiment with alternative models and choose the one that best suits your specific requirements. Amazon Bedrock frequently releases updated versions of existing AI models that can be accessed and used by simply applying a configuration change, making it a highly flexible choice for deploying the latest AI capabilities.

We focused on the following approaches:

  1. Full document match – Comparing the full text of the regulatory guidance and SOP documents
  2. Text similarity – This approach consists of two options:
    • Vector embeddings – Measuring the semantic similarity between the guidance and SOP texts
    • Keyword Search – Identifying relevant keywords and their occurrences in the documents
  3. Taxonomy topic match – Mapping the guidance and SOP content to a taxonomic structure to identify topical relationships

This post details the approaches we explored and the learnings from our experiments.

Full document match

The following diagram illustrates the full document match architecture.

In this approach, we compared each regulatory change to every SOP by passing the full contents of the SOP and the regulatory change to the model. The goal was to identify relationship between the regulatory change and the SOP.

The following is a sample prompt to check if an SOP is related to a regulation change:

Human:

Act as an expert who understands the Food and Drug Administration's Center for Biologics Evaluation and Research (CBER) internal Biologics procedures, which help their staff in performing their duties in assuring the safety, purity, potency, and effectiveness of biologics and related products (such as vaccines, live biotherapeutics (probiotics), blood products, and cell, tissue, and gene therapies).

Given a regulatory guidance change, and a document describing a CBER Biologics procedure of evaluation, research or review, tell me if the regulatory change is related to the procedure.
<regulatory_change>
{regchange}
</regulatory_change>

<biologics_procedure>
{procedure}
</biologics_procedure>
To answer this question, first, determine what the subject of the regulatory change is. For example, the subject might be something like "regulatory considerations for human cells and tissues" or "processing of clinical holds for new drug applications".

In order for the regulatory change to be related to the biologics procedure, the procedure description must include information about the topic. In other words, you should have a specific section from the procedure that is clearly related to the regulatory change in order to say that it is related.

For example, if a procedure description mentions cells or tissues, it could be related to how they need to be handled or manipulated. Similarly, if a procedure mentions dispute resolution, it could be related to how a biologics procedure handles dispute resolution. Tell me whether the regulatory change is related to the biologics procedure. Don't guess, and explain your thinking. If you can't find specific elements in the procedure that are clearly related to the regulations, then the answer is that the procedure is not related to the regulatory change.

Your answer should end with the word "RELATED" or "UNRELATED", reflecting your overall evaluation of whether the procedure description is related to the regulation change or not.
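The following is a minimal sketch of how a prompt like this can be sent to Anthropic's Claude 3 Opus on Amazon Bedrock using the Converse API. The abbreviated template, placeholder document text, and inference settings are assumptions; the full prompt shown above would be substituted in.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

regchange = "full text of the regulatory guidance document"
procedure = "full text of the SOP (biologics procedure)"

# Abbreviated stand-in for the full prompt shown above.
prompt_template = (
    "Act as an expert who understands CBER's internal Biologics procedures. ...\n"
    "<regulatory_change>\n{regchange}\n</regulatory_change>\n"
    "<biologics_procedure>\n{procedure}\n</biologics_procedure>\n"
    'Your answer should end with the word "RELATED" or "UNRELATED".'
)
prompt = prompt_template.format(regchange=regchange, procedure=procedure)

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-opus-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": prompt}]}],
    inferenceConfig={"maxTokens": 1024, "temperature": 0},
)

answer = response["output"]["message"]["content"][0]["text"]
print(answer.strip().split()[-1])  # expected to be RELATED or UNRELATED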

When we ran the full document matching approach using Amazon Bedrock across all the SOPs and the regulatory guidance documents in the dataset, the results showed accurate identification of related SOPs. For example, SOPP 9151 was correctly identified as the only SOP related to the Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry regulation change, with others being identified as unrelated:

Assistant:

SOPP9151-061306.txt:

The regulatory change is about the regulation of human cells, tissues, and cellular and tissue-based products (HCT/Ps). It provides guidance on the criteria for determining how HCT/Ps will be regulated, registration and listing requirements, donor eligibility requirements, current good tissue practice requirements, and FDA inspection and enforcement of HCT/P establishments.

The biologics procedure is specifically about handling requests for exemptions and alternative procedures for HCT/Ps under 21 CFR 1271.155. The procedure outlines the responsibilities and steps for CBER staff to process and respond to these exemption requests.

There is a clear relationship between the regulatory change and the biologics procedure. The regulatory change provides the overall framework and requirements for regulating HCT/Ps, while the biologics procedure focuses on one specific aspect within that framework - handling exemption requests from certain HCT/P requirements.

The procedure directly references and provides instructions for implementing 21 CFR 1271.155, which is part of the HCT/P regulations covered in the regulatory change document. So the biologics procedure is a detailed process that enables the agency to grant exemptions and alternatives to the HCT/P requirements established in the regulations.

In summary, the regulatory change establishes the rules for HCT/Ps, and the biologics procedure describes how the agency will process requests for exceptions to those rules. Therefore, the two documents are closely related.

RELATED

Similarly, SOPP 8005 was correctly identified as the only SOP related to the Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulation change, and the other SOPs were identified as unrelated.

Finally, SOPP 8201 was also correctly identified as the only SOP related to the Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry regulation change.

These results demonstrate the effectiveness of the full document matching approach in accurately linking the relevant SOPs to their corresponding regulatory guidance documents.

Text similarity

The following diagram illustrates the text similarity match workflow.

In our second approach, we indexed the SOPs using either vector embeddings for semantic similarity or a keyword-based similarity approach. This allowed us to submit the contents of a regulatory change as a query and return the most similar SOP documents.

The steps involved in this text similarity approach are:

  1. Index the SOPs:
    • For a vector embeddings approach, we generated vector representations of the SOP contents using an embedding model to capture semantic similarity.
    • For a keyword-based approach, we identified the most relevant keywords in each SOP and built an index based on their occurrences.
  2. Query the index:
    • For a given regulatory change, we submitted the text as a query to the SOP index.
    • The index then returned the most similar SOPs based on the chosen similarity metric (semantic or keyword-based).

Vector search

For the text similarity approach, we used the open source in-memory database ChromaDB to generate the vector embeddings and perform the search.

We created a collection within ChromaDB containing all the SOP documents. We then independently queried each regulation guidance document text against this SOP collection. We used the default L2 distance algorithm, where a lower distance score indicates a closer match between the query and the indexed SOP documents.
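
The following is a minimal sketch of this indexing and querying flow with the chromadb client; the collection name and the sop_docs and regulation_text variables are assumptions (ChromaDB embeds the documents with its default embedding function and uses L2 distance by default):

import chromadb

client = chromadb.Client()  # ephemeral, in-memory instance

# Index all SOP documents; sop_docs is an assumed dict of {filename: text}
collection = client.create_collection(name="sops")  # the default distance metric is L2
collection.add(
    ids=list(sop_docs.keys()),
    documents=list(sop_docs.values()),
)

# Query the collection with the full text of one regulatory guidance document
results = collection.query(query_texts=[regulation_text], n_results=8)

# Lower distance scores indicate closer matches
for doc_id, distance in zip(results["ids"][0], results["distances"][0]):
    print([doc_id, distance])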

Although the vector embedding-based text similarity approach identified the top matching SOP document in some cases, it also produced some inaccurate results.

For example, when querying with the Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry regulation, SOPP 9151 was correctly identified as the top match. However, a few other unrelated SOP documents also had low distance scores, which could potentially lead to them being misidentified as relevant:

[
    [
        "SOPP9151-061306.txt",
        0.640560507774353
    ],
    [
        "SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt",
        0.8971723914146423
    ],
    [
        "SOPP-8403-Issuance-Reissuance-and-Voluntary-Revocation-of-Biological-Product-Licenses-V6.txt",
        0.9069873690605164
    ],
    [
        "SOPP-8117-Issuing-Tracking-Numbers-eCTD-Format-V9.txt",
        1.069345235824585
    ],
    [
        "SOPP-8507V4-IllegitimateProds-Final.txt",
        1.1143898963928223
    ],
    [
        "SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt",
        1.157564640045166
    ],
    [
        "SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt",
        1.2106068134307861
    ],
    [
        "SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt",
        1.3174282312393188
    ]
]

Similarly, when querying with the Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulation, the vector search incorrectly identified SOPP 8717 as the best match, whereas SOPP 8005, which is more directly related to formal dispute resolution, had a higher distance score:

[
    [
        "SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt",
        0.848071277141571
    ],
       …
       …
]

Finally, for the regulation Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry, the vector search again identified SOPP 8717 as the top match, rather than the more relevant SOPP 8201:

[
    [
        "SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt",
        0.8028254508972168
    ],
    …
    …
]

Keyword search

We also explored a keyword-based similarity method using the Whoosh Python search library. We first created an index of all the SOP documents using the Whoosh library. Then, for each regulation guidance document, we separately searched the index using a Whoosh query parser.

The Whoosh library returns a search score for each matched SOP document, where a higher score indicates a better match to the query.
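
The following is a minimal sketch of the Whoosh indexing and search flow; the schema, index directory, OR-grouping query parser, and the sop_docs and regulation_text variables are assumptions:

import os

from whoosh import index
from whoosh.fields import ID, TEXT, Schema
from whoosh.qparser import OrGroup, QueryParser

# Build a keyword index over the SOP documents; sop_docs is an assumed dict of {filename: text}
schema = Schema(fname=ID(stored=True), content=TEXT)
os.makedirs("sop_index", exist_ok=True)
ix = index.create_in("sop_index", schema)
writer = ix.writer()
for fname, text in sop_docs.items():
    writer.add_document(fname=fname, content=text)
writer.commit()

# Search the index with the text of one regulatory guidance document
with ix.searcher() as searcher:
    # OR grouping lets any overlapping keyword contribute to the score (an assumption)
    parser = QueryParser("content", ix.schema, group=OrGroup)
    results = searcher.search(parser.parse(regulation_text), limit=8)
    print(f"{len(results)} results found")
    for hit in results:
        print(hit, "with score of", hit.score)  # a higher score indicates a better match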

When searching for the Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry regulation, the top matching document was incorrectly identified as SOPP 8301 instead of the more relevant SOPP 9151:

8 results found
<Hit {'fname': 'SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt'}> with score of 744.420492560645
<Hit {'fname': 'SOPP-8507V4-IllegitimateProds-Final.txt'}> with score of 609.3431135327576
<Hit {'fname': 'SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt'}> with score of 588.5899332740212
<Hit {'fname': 'SOPP-8403-Issuance-Reissuance-and-Voluntary-Revocation-of-Biological-Product-Licenses-V6.txt'}> with score of 582.278310231856
<Hit {'fname': 'SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt'}> with score of 449.11608593886564
<Hit {'fname': 'SOPP-8117-Issuing-Tracking-Numbers-eCTD-Format-V9.txt'}> with score of 377.9723456561558
<Hit {'fname': 'SOPP9151-061306.txt'}> with score of 328.67972016789844
<Hit {'fname': 'SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt'}> with score of 160.6985781375766

Similarly, when searching for the Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulation using the keyword-based similarity approach, the Whoosh library again incorrectly identified SOPP 8301 as the top matching document, instead of the more relevant SOPP 8005:

8 results found
<Hit {'fname': 'SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt'}> with score of 721.9535225922815
…
…

In contrast with the previous examples, when searching for the Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry regulation, SOPP 8201 was correctly identified as the top matching document:

8 results found
<Hit {'fname': 'SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt'}> with score of 503.3284407067554
…
…

These results indicate that although the keyword-based similarity approach can be effective in some cases, it might also struggle to accurately identify the most relevant SOPs, similar to the challenges faced with the vector embedding-based method.

Taxonomy-based topic matching

The following diagram illustrates the taxonomy-based topic matching architecture.

In our third approach, we focused on building a hierarchical taxonomy of topics associated with the subject of biologics procedures. This taxonomy-based approach aimed to capture the topical relationships between the regulatory guidance documents and the SOPs. It also has the potential to be more cost-effective, because the entire SOP or input document doesn’t need to be passed with the prompt for every query to the LLM; full text matching, by contrast, becomes increasingly expensive as the input documents grow in size.

The key steps in this approach were:

Step 1: We constructed a multi-level taxonomy that organized the topics related to biologics procedures. The taxonomy included a hierarchy of main topics, each of which could include subtopics.

The following is an example prompt for building the taxonomy:

Human:

Act as an expert in Biologics procedures for the Food & Drug Administration's Center for Biologics Evaluation and Research (CBER), which help their staff in performing their duties in assuring the safety, purity, potency, and effectiveness of biologics and related products (such as vaccines, live biotherapeutics (probiotics), blood products, and cell, tissue, and gene therapies). Biologics procedures help CBER staff in regulating administration and management of biologics evaluations and reviews including but not limited to clinical studies, electronic submissions, dispute resolutions and management of biologics and related products

I want you to create a hierarchy or taxonomy of topics relating to the biologics procedures. For example, some of the topics may be related to

    - Administrative Guidances

    - Adverse Events and Product Deviation Guidances

    - Application Submission Guidances

    - Biosimilars Guidances

    - Clinical Guidances

    - CMC & GMP Guidances

    - Devices Guidances

    - Generics Guidances

    - Labeling and Promotion Guidances

    - Allergenics Guidances

    - Blood Guidances

    - Cellular and Gene Therapy Guidances

    - Tissue Guidances

    - Vaccine and Related Biological Product Guidances

    - Xenotransplantation Guidances

    - Communication

    - Marketing Applications

    - Compliance

    - Medical Devices

    - Master Files

    - Investigational New Drugs

    - Communication

    - General Information

    - Therapeutic Products

Here's an example of how a topic might be organized:

<topic>
  <name>Master Files</name>
  <subtopic>
      <name>Drug Substance</name>
  </subtopic>

  <subtopic>
      <name>Packaging Material</name>
  </subtopic>

  <subtopic>
      <name>Preparation Information</name>
  </subtopic>

  <subtopic>
      <name>Clinical Study Data</name>
  </subtopic>

  <subtopic>
      <name>Non-Clinical Study Data</name>
  </subtopic>
</topic>

Your hierarchy should include all the topics covering the CBER's biologics procedures, and should be sufficiently exhaustive that it covers many different types of activities and duties in different areas of biologics administration, evaluation and review.

Think step by step, using these steps:

  1. Think about all of the important topics that enable CBER's staff to perform their duties in administering, evaluating, reviewing, and managing biologics and related products.
  2. Organize those topics into a hierarchy or taxonomy. There should be a single top-level item called <biologics_topics>.
  3. Output that hierarchy in an XML format. Make sure that each item in the XML is in plain English and doesn't use special characters like the underscore.

The following is a sample taxonomy hierarchy returned by Amazon Bedrock when using Anthropic’s Claude 3 Opus LLM:

Assistant:

<biologics_topics>

  <topic>
    <name>Administrative Guidances</name>
    <subtopic>
      <name>Forms</name>
    </subtopic>
    <subtopic> 
      <name>Meetings</name>
    </subtopic>
    <subtopic>
      <name>Dispute Resolution</name> 
    </subtopic>
  </topic>

  <topic>
    <name>Submission Guidances</name>
    <subtopic>
      <name>Investigational New Drug Applications</name>
    </subtopic>
    <subtopic>
      <name>Biologics License Applications</name>
    </subtopic>
    <subtopic> 
      <name>Supplements and Other Changes to an Approved Application</name>
    </subtopic>
    <subtopic>
      <name>Biosimilars</name>
    </subtopic>
  </topic>

  <topic> 
    <name>Clinical Guidances</name>
    <subtopic>
      <name>Clinical Trial Design</name>
    </subtopic>
    <subtopic>
      <name>Human Subject Protection</name> 
    </subtopic>
    <subtopic>
      <name>Pediatric Studies</name>
    </subtopic>
  </topic>

  <topic>
    <name>Safety Guidances</name> 
    <subtopic>
      <name>Adverse Event Reporting</name>
    </subtopic>
    <subtopic>
      <name>Safety Assessments</name>
    </subtopic>
    <subtopic>
      <name>Pharmacovigilance</name>
    </subtopic>
  </topic>

  <topic>
    <name>Product Quality Guidances</name>
    <subtopic> 
      <name>Chemistry Manufacturing and Controls</name>
    </subtopic>
    <subtopic>
      <name>Good Manufacturing Practices</name>
    </subtopic>
    <subtopic>
      <name>Analytical Methods</name>
    </subtopic>
  </topic>

  <topic>
    <name>Labeling Guidances</name>
    <subtopic>
      <name>Prescribing Information</name> 
    </subtopic>
    <subtopic>
      <name>Medication Guides</name>
    </subtopic>
    <subtopic>
      <name>Promotional Labeling and Advertising</name>
    </subtopic>
  </topic>

  <topic>
    <name>Product Specific Guidances</name>
    <subtopic>
      <name>Vaccines</name>
    </subtopic> 
    <subtopic>
      <name>Blood and Blood Components</name>
    </subtopic>
    <subtopic>
      <name>Cellular and Gene Therapies</name> 
    </subtopic>
    <subtopic>
      <name>Allergenics</name>
    </subtopic>
    <subtopic> 
      <name>Tissues</name>
    </subtopic>
    <subtopic>
      <name>Xenotransplantation Products</name>
    </subtopic>
  </topic>

  <topic>
    <name>Compliance and Enforcement</name>
    <subtopic>
      <name>Inspections</name>
    </subtopic>
    <subtopic>
      <name>Recalls</name> 
    </subtopic>
    <subtopic>
      <name>Warning Letters</name>
    </subtopic>
  </topic>

</biologics_topics>

This hierarchy covers the key areas involved in CBER’s regulation of biologics, from administrative processes, to investigational products, marketing applications, clinical development, safety monitoring, quality control, labeling, and compliance for the major categories of biological products. The structure aims to capture the breadth of guidances and procedures that CBER staff rely on to ensure the safety, purity, potency and effectiveness of biologics and related products.

Step 2: Using the taxonomy we built, we labeled each SOP with the topics that are covered in the SOP. We instructed the model to select the most specific ones from the list of topics and subtopics.
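
Before labeling, the taxonomy XML returned in Step 1 can be flattened into the list of topic names that is substituted for {possible_topics} in the prompt that follows. The following is a minimal sketch using Python's standard library; how the taxonomy is consumed is an assumption:

import xml.etree.ElementTree as ET


def flatten_taxonomy(taxonomy_xml: str) -> list[str]:
    """Collect every <name> in the taxonomy: topics, subtopics, and sub-subtopics."""
    root = ET.fromstring(taxonomy_xml)
    return [name.text.strip() for name in root.iter("name") if name.text]


# taxonomy_xml holds the <biologics_topics> document returned by the model
possible_topics = flatten_taxonomy(taxonomy_xml)
# e.g. ['Administrative Guidances', 'Forms', 'Meetings', 'Dispute Resolution', ...]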

The following is an example prompt to label SOPs with taxonomy topics using Anthropic’s Claude 3 Opus model:

Human:

Act as an expert who understands Biologics procedures.

The following information describes the details of a Biologics procedure. Choose the topics covered in the description only from the following list of possible topics:

{possible_topics}

<procedure_description>

{procedure}

</procedure_description>

Explain your thinking, and end the explanation by generating a JSON string array containing all of the matching topic names.

All topic names must come from the list of possible topics. Do not return a topic unless it is found within the list of valid topics, subtopics, and sub-subtopics. 

If you generate a topic name that is not found in the list of possible topics, it is an error.

Make sure to select the most specific leaf level sub-subtopic or subtopic available from the list of possible topics, subtopics, and sub-subtopics. Be sure your list covers all of the main points of the Biologics procedure description.

The following is the example SOP topic labeling output from Amazon Bedrock:

Assistant:

SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt: ['Administrative Guidances', 'Dispute Resolution']

SOPP-8117-Issuing-Tracking-Numbers-eCTD-Format-V9.txt: ['Administrative Guidances', 'Submission Guidances', 'Investigational New Drug Applications', 'Biologics License Applications']

SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt: ['Investigational New Drug Applications', 'Meetings', 'Clinical Trial Design', 'Adverse Event Reporting']

SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt: ['Administrative Guidances', 'Submission Guidances', 'Compliance and Enforcement']

SOPP-8403-Issuance-Reissuance-and-Voluntary-Revocation-of-Biological-Product-Licenses-V6.txt: ['Submission Guidances', 'Compliance and Enforcement', 'Labeling Guidances']

SOPP-8507V4-IllegitimateProds-Final.txt: ['Compliance and Enforcement', 'Inspections', 'Recalls']

SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt: ['Product Quality Guidances', 'Submission Guidances', 'Administrative Guidances']

SOPP9151-061306.txt: ['Cellular and Gene Therapies', 'Inspections', 'Tissues']
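
The following is a minimal sketch of pulling the string array out of each labeling response and collecting the labels per SOP; the extraction heuristic and the sop_label_responses variable are assumptions:

import ast


def extract_topics(model_response: str) -> list[str]:
    """Parse the string array that the labeling prompt asks the model to end with."""
    start, end = model_response.rfind("["), model_response.rfind("]")
    if start == -1 or end <= start:
        return []
    # literal_eval accepts both JSON-style and Python-style quoted string arrays
    return list(ast.literal_eval(model_response[start:end + 1]))


# sop_label_responses is an assumed dict of {SOP file name: raw model response}
sop_topics = {fname: extract_topics(resp) for fname, resp in sop_label_responses.items()}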

Step 3: To find the relationships between the regulatory guidance documents and the SOPs, we followed a similar approach to label the regulatory changes with the most specific topics or subtopics from the built taxonomy.

The following is an example prompt to label regulatory guidance documents with taxonomy topics:

Human:

Act as an expert who understands Biologics procedures. The following information describes a regulatory guidance or change that affects certain Biologics procedures. Please choose the main topic covered in the change description from the following list of possible topics:

{topics_list}

<regulatory_guidance_description>
{regulatory_guidance}
</regulatory_guidance_description>

Explain your thinking, and end the explanation by generating an XML item called <topic> with the relevant topic string in it. Make sure the topic is the most specific one available from the list of possible topics, subtopics, and descendants of those.

You must choose a single leaf-level XML node from the list of possible topics. The correct answer may be in a <topic> node, or a <subtopic> or <subsubtopic>. Choose only one, and return the <name> of the selected node inside of an XML item called <topic>.

Assistant:

Step 4: If the topic assigned to a regulatory change document matches one of the topics of an SOP, we can conclude the regulatory change is related to the SOP.
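
Continuing the sketches above, the matching step itself reduces to comparing the regulation's assigned topic against each SOP's topic list; the <topic> extraction heuristic and the variable names are assumptions:

import re


def assigned_topic(alignment_response: str) -> str:
    """Pull the final single-line <topic>...</topic> element from the alignment response."""
    matches = re.findall(r"<topic>([^<\n]+)</topic>", alignment_response)
    return matches[-1].strip() if matches else ""


def related_sops(regulation_topic: str, sop_topics: dict[str, list[str]]) -> list[str]:
    """Return the SOPs whose taxonomy labels include the regulation's assigned topic."""
    return [fname for fname, topics in sop_topics.items() if regulation_topic in topics]


# Example with the Step 2 labels: a regulation aligned to 'Dispute Resolution'
# would match SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt only.
topic = assigned_topic(alignment_response)
print(related_sops(topic, sop_topics))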

The following are example outputs of regulatory guidance topic alignment using Amazon Bedrock with the Anthropic Claude 3 Opus model.

The first example uses regulatory guidance Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry. The aligned taxonomy topic is Cellular and Gene Therapies. This topic aligns with SOPP 9151.

For reg change Regs/Regulation_Of_HCT_Ps_Small_Entity_Compliance_Guide.txt:

The regulatory guidance is focused on human cells, tissues, and cellular and tissue-based products (HCT/Ps). It provides an overview of the regulations in 21 CFR Part 1271 that apply to HCT/Ps.

The key topics covered include:

  • Criteria for regulation solely under section 361 of the PHS Act vs as drugs, devices and/or biological products
  • Registration and listing requirements for HCT/P establishments
  • Donor eligibility requirements
  • Current Good Tissue Practice (CGTP) requirements
  • FDA inspection and enforcement related to HCT/P establishments

Looking at the list of possible topics, the most relevant high-level topic is:

<topic>

  <name>Product Specific Guidances</name>

Within that, the most applicable subtopic is:

<subtopic>

  <name>Cellular and Gene Therapies</name>

Therefore, the most specific and relevant topic for this regulatory guidance is:

<topic>Cellular and Gene Therapies</topic>

The Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulatory guidance was aligned with the Dispute Resolution topic, which was also identified as a key topic covered in SOPP 8005.

For reg change Regs/Formal_Dispute_Resolution.txt:

The regulatory guidance description is about formal dispute resolution procedures for sponsors to appeal scientific and/or medical disputes to management above the division level at CDER and CBER. This falls under the broad topic of "Administrative Guidances", and the specific subtopic of "Dispute Resolution".

Therefore, the most specific and appropriate topic for this guidance is:

<topic>Dispute Resolution</topic>

The Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry regulatory guidance was mapped to the Investigational New Drug Applications topic, which was found to be aligned with the content covered in SOPP 8201, indicating a strong relationship between the guidance and this particular SOP:

For reg change Regs/Submitting_And_Reviewing_Complete_Responses_To_Clinical_Holds.txt:

The regulatory guidance is about the process for submitting and reviewing responses to clinical holds on INDs. The key points are:

  • When FDA imposes a clinical hold on an IND, the study cannot proceed until the sponsor submits a complete response addressing all the clinical hold issues, and FDA notifies the sponsor they can proceed.
  • The guidance describes what the sponsor should include in the complete response, how to submit it, and how FDA will review and respond to it within 30 days.
  • It also covers procedural details like how FDA will track and measure the 30-day response timeline for PDUFA goals.
Looking at the list of possible topics, this guidance falls under:

<topic>

  <name>Submission Guidances</name>

  <subtopic>

    <name>Investigational New Drug Applications</name>

  </subtopic>

</topic>

Since it is specifically about the process for responding to clinical holds on INDs, the most relevant leaf-level topic is:

<topic>Investigational New Drug Applications</topic>

The taxonomy-based topic matching approach was effective in accurately identifying the relationships between the regulatory changes and the SOPs in the test dataset.

Learnings

The following summarizes our observations for each regulation guidance document; SOPs that an approach misidentified are marked as such.

Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry

  • Full text match: SOPP 9151
  • Taxonomy topic match: SOPP 9151
  • Text similarity (vector embeddings): SOPP 9151
  • Text similarity (keyword search): SOPP 8301 (misidentified)

Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry

  • Full text match: SOPP 8005
  • Taxonomy topic match: SOPP 8005
  • Text similarity (vector embeddings): SOPP 8717 (misidentified)
  • Text similarity (keyword search): SOPP 8301 (misidentified)

Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry

  • Full text match: SOPP 8201
  • Taxonomy topic match: SOPP 8201
  • Text similarity (vector embeddings): SOPP 8717 (misidentified)
  • Text similarity (keyword search): SOPP 8201

Both the full text matching and the taxonomy-based topic matching approaches, using Amazon Bedrock with Anthropic's Claude 3 Opus model, accurately identified the SOPs most closely related to the regulation guidance documents in the dataset. In contrast, the text similarity methods using vector embeddings and keyword search were less successful in correctly matching the SOPs to the relevant regulatory guidance documents.

Conclusion

In this post, we explored various approaches to quickly identify the relationships between regulatory changes and an organization’s SOPs, using Amazon Bedrock and Anthropic’s Claude 3 Opus model. The methods we evaluated included full text matching, text similarity using vector embeddings and keyword search, and a taxonomy-based topic alignment approach.

Our findings indicate that the full text matching and taxonomy-based topic matching were the most effective in accurately identifying the SOPs most closely related to the regulation guidance documents in the test dataset. In contrast, the text similarity techniques using vector embeddings and keyword search were less reliable in consistently matching the SOPs to the relevant regulatory documents.

Both the full text matching and taxonomy-based approaches can be viable options for organizations to assess the relationships between regulatory changes and their internal SOPs. Full text matching might provide more accurate results, but it requires passing the complete text of the SOP or input document with every query, which could have cost implications.

The taxonomy-based approach, on the other hand, offers a structured way to map the content of the documents to a customizable topic hierarchy. Although the initial taxonomy might not be complete or fully accurate, it can be further enriched and tailored to an organization’s specific needs. If you choose a taxonomy-based approach, you can use a machine-generated starting point and then refine it to better suit your domain and use case requirements.

By adopting the taxonomy-based approach and adapting it to their specific needs, organizations can not only identify the relationships between regulatory changes and SOPs but also assess the potential impact of regulatory changes on their internal procedures. This can help streamline the process of screening SOPs against new regulations and fast-track the impact assessment in regulated industries such as life sciences.

If you want to implement a similar solution in your AWS environment, reach out to your AWS account team for assistance.

I would like to acknowledge Greg Sommerville, Thomaz Silva and Murtuza Bootwala for their contributions to this blog. It couldn’t have been done without them.


About the Author

Ganesh Raam Ramadurai is a Senior Technical Program Manager at Amazon Web Services (AWS), where he leads the PACE (Prototyping and Cloud Engineering) team. He specializes in delivering innovative, AI/ML and Generative AI-driven prototypes that help AWS customers explore emerging technologies and unlock real-world business value. With a strong focus on experimentation, scalability, and impact, Ganesh works at the intersection of strategy and engineering—accelerating customer innovation and enabling transformative outcomes across industries.
