Unlocking the power of Model Context Protocol (MCP) on AWS

We’ve witnessed remarkable advances in model capabilities as generative AI companies have invested in developing their offerings. Language models such as Anthropic’s Claude Opus 4 and Claude Sonnet 4 and Amazon Nova, available through Amazon Bedrock, can reason, write, and generate responses with increasing sophistication. But even as these models grow more powerful, they can only work with the information available to them.

No matter how impressive a model might be, it’s confined to the data it was trained on or what’s manually provided in its context window. It’s like having the world’s best analyst locked in a room with incomplete files—brilliant, but isolated from your organization’s most current and relevant information.

This isolation creates three critical challenges for enterprises using generative AI:

  1. Information silos trap valuable data behind custom APIs and proprietary interfaces
  2. Integration complexity requires building and maintaining bespoke connectors and glue code for every data source or tool provided to the language model
  3. Scalability bottlenecks appear as organizations attempt to connect more models to more systems and tools

Sound familiar? If you’re an AI-focused developer, technical decision-maker, or solution architect working with Amazon Web Services (AWS) and language models, you’ve likely encountered these obstacles firsthand. Let’s explore how the Model Context Protocol (MCP) offers a path forward.

What is the MCP?

The MCP is an open standard that creates a universal language for AI systems to communicate with external data sources, tools, and services. Conceptually, MCP functions as a universal translator, enabling seamless dialogue between language models and the diverse systems where your valuable information resides.

Developed by Anthropic and released as an open source project, MCP addresses a fundamental challenge: how to provide AI models with consistent, secure access to the information they need, when they need it, regardless of where that information lives.

MCP deployment diagram showing client interaction with local and internet-based MCP servers

At its core, MCP implements a client-server architecture:

  • MCP clients are AI applications like Anthropic’s Claude Desktop or custom solutions built on Amazon Bedrock that need access to external data
  • MCP servers provide standardized access to specific data sources, whether that’s a GitHub repository, Slack workspace, or AWS service
  • Communication flow between clients and servers follows a well-defined protocol that can run locally or remotely

This architecture supports three essential primitives that form the foundation of MCP:

  1. Tools – Functions that models can call to retrieve information or perform actions
  2. Resources – Data that can be included in the model’s context such as database records, images, or file contents
  3. Prompts – Templates that guide how models interact with specific tools or resources

What makes MCP especially powerful is its ability to work across both local and remote implementations. You can run MCP servers directly on your development machine for testing or deploy them as distributed services across your AWS infrastructure for enterprise-scale applications.
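
To make these primitives concrete, the following minimal sketch shows how an MCP server might expose all three using the FastMCP class from the open source MCP Python SDK. The server name, tool, resource, and prompt are hypothetical examples for illustration, not part of any AWS sample.

from mcp.server.fastmcp import FastMCP

# Hypothetical server exposing all three MCP primitives
mcp = FastMCP('sales-data')

@mcp.tool()
def query_sales_data(quarter: str, region: str) -> dict:
    """Tool: a function the model can call (stubbed here for illustration)."""
    return {'total_sales': 12450000, 'growth': 0.12}

@mcp.resource('resource://sales/regions')
def list_regions() -> str:
    """Resource: data the model can pull into its context."""
    return 'Northwest, Southwest, Northeast, Southeast'

@mcp.prompt()
def quarterly_report(quarter: str) -> str:
    """Prompt: a template that guides how the model uses the tool above."""
    return f'Summarize {quarter} sales by region using the query_sales_data tool.'

if __name__ == '__main__':
    mcp.run()  # defaults to the stdio transport for local development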

Solving the M×N integration problem

Before diving deeper into the AWS specific implementation details, it’s worth understanding the fundamental integration challenge MCP solves.

Imagine you’re building AI applications that need to access multiple data sources in your organization. Without a standardized protocol, you face what we call the “M×N problem”: for M different AI applications connecting to N different data sources, you need to build and maintain M×N custom integrations.

This creates an integration matrix that quickly becomes unmanageable as your organization adds more AI applications and data sources. Each new system requires multiple custom integrations, with development teams duplicating efforts across projects. MCP transforms this M×N problem into a simpler M+N equation: you build M clients and N servers, requiring only M+N implementations in total. This reduction is shown in the following diagram.

Visualization showing how MCP reduces integration complexity from 9 to 6 implementations

This approach draws inspiration from other successful protocols that solved similar challenges:

  • APIs standardized how web applications interact with the backend
  • Language Server Protocol (LSP) standardizes how integrated development environments (IDEs) interact with language-specific tools for coding

In the same way that these protocols revolutionized their domains, MCP is poised to transform how AI applications interact with the diverse landscape of data sources in modern enterprises.

Why MCP matters for AWS users

For AWS customers, MCP represents a particularly compelling opportunity. AWS offers hundreds of services, each with its own APIs and data formats. By adopting MCP as a standardized protocol for AI interactions, you can:

  1. Streamline integration between Amazon Bedrock language models and AWS data services
  2. Use existing AWS security mechanisms such as AWS Identity and Access Management (IAM) for consistent access control
  3. Build composable, scalable AI solutions that align with AWS architectural best practices

MCP and the AWS service landscape

What makes MCP particularly powerful in the AWS context is how it can interface with the broader AWS service landscape. Imagine AI applications that can seamlessly access information from the AWS services where your data and tools already live.

MCP servers act as consistent interfaces to these diverse data sources, providing language models with a unified access pattern regardless of the underlying AWS service architecture. This alleviates the need for custom integration code for each service and enables AI systems to work with your AWS resources in a way that respects your existing security boundaries and access controls.

In the remaining sections of this post, we explore how MCP works with AWS services, examine specific implementation examples, and provide guidance for technical decision-makers considering adopting MCP in their organizations.

How MCP works with AWS services, particularly Amazon Bedrock

Now that we’ve shown the fundamental value proposition of MCP, we dive into how it integrates with AWS services, with a special focus on Amazon Bedrock. This integration creates a powerful foundation for building context-aware AI applications that can securely access your organization’s data and tools.

Amazon Bedrock and language models

Amazon Bedrock represents the strategic commitment by AWS to make foundation models (FMs) accessible, secure, and enterprise-ready. It’s a fully managed service that provides a unified API across multiple leading language models, including:

  • Anthropic’s Claude
  • Meta’s Llama
  • Amazon Titan and Amazon Nova

What makes Amazon Bedrock particularly compelling for enterprise deployments is its integration with the broader AWS landscape. You can run FMs with the same security, compliance, and operational tools you already use for your AWS workloads. This includes IAM for access control and Amazon CloudWatch for monitoring.

At the heart of the versatility of Amazon Bedrock is the Converse API—the interface that enables multiturn conversations with language models. The Converse API includes built-in support for what AWS calls “tool use,” allowing models to:

  1. Recognize when they need information outside their training data
  2. Request that information from external systems using well-defined function calls
  3. Incorporate the returned data into their responses

This tool use capability in the Amazon Bedrock Converse API dovetails perfectly with MCP’s design, creating a natural integration point.
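
As a hedged illustration of that integration point, the following sketch shows one way an application might convert the tools advertised by an MCP server into the tool definitions the Converse API expects. The helper name build_tool_config is hypothetical; the toolSpec structure follows the Converse API tool definition format, and the MCP calls come from the open source MCP Python SDK.

async def build_tool_config(session) -> dict:
    """Translate MCP tool definitions into a Converse API toolConfig (illustrative helper)."""
    # session is an initialized MCP ClientSession connected to an MCP server
    mcp_tools = await session.list_tools()
    return {
        'tools': [
            {
                'toolSpec': {
                    'name': tool.name,
                    'description': tool.description or '',
                    'inputSchema': {'json': tool.inputSchema},
                }
            }
            for tool in mcp_tools.tools
        ]
    }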

MCP and Amazon Bedrock integration architecture

Integrating MCP with Amazon Bedrock involves creating a bridge between the model’s ability to request information (through the Converse API) and MCP’s standardized protocol for accessing external systems.

Integration flow walkthrough

To help you understand how MCP and Amazon Bedrock work together in practice, we walk through a typical interaction flow, step-by-step:

  1. The user initiates a query through your application interface:

"What were our Q1 sales figures for the Northwest region?"

  2. Your application forwards the query to Amazon Bedrock through the Converse API:
   import boto3

   # Initialize the Bedrock runtime client with your AWS credentials
   bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')
   
   # Define the query from the user
   user_query = "What were our Q1 sales figures for the Northwest region?"
   
   # available_tools contains tool definitions that match MCP server capabilities
   # These will be exposed to the model through the Converse API
   
   # Call the Converse API with the user's query and available tools
   response = bedrock.converse(
       modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",  # Specify which language model to use
       messages=[{"role": "user", "content": [{"text": user_query}]}],  # Format the user's message
       toolConfig={"tools": available_tools}  # Pass the tool definitions to the model
   )
  3. Amazon Bedrock processes the query and determines that it needs financial data that isn’t in its training data
  4. Amazon Bedrock returns a toolUse message, requesting access to a specific tool:
   {
     "role": "assistant",  // Indicates this message is from the model
     "content": [{
       "toolUse": {  // The model is requesting to use a tool
         "toolUseId": "tu_01234567",  // Unique identifier for this tool use request
         "name": "query_sales_data",  // Name of the tool the model wants to use
         "input": {  // Parameters for the tool call
           "quarter": "Q1",  // The model extracted this parameter from the user query
           "region": "Northwest"  // Another parameter extracted from the user query
         }
       }
     }]
   }
  5. Your MCP client application receives this toolUse message and translates it into an MCP protocol tool call
  6. The MCP client routes the request to the appropriate MCP server (in this case, a server connected to your financial database)
  7. The MCP server executes the tool, retrieving the requested data from your systems:
   # Call the tool through the MCP protocol
   # session is the MCP client session established earlier
   result = await session.call_tool(
       "query_sales_data",  # The tool name from the toolUse message
       {
           "quarter": "Q1",  # Pass through the parameters from the toolUse message
           "region": "Northwest"
       }
   )
   # The MCP server handles authentication, data access, and result formatting
   # This abstracts away the complexity of accessing different data sources
  8. The tool results are returned through the MCP protocol to your client application
  9. Your application sends the results back to Amazon Bedrock as a toolResult message:
   {
     "role": "user",  // This is sent as if from the user, but contains tool results
     "content": [{
       "toolResult": {  // Indicates this is a result from a tool
         "toolUseId": "tu_01234567",  // Must match the ID from the original toolUse
         "content": [{
           "json": {  // Results are formatted as JSON
             "total_sales": 12450000,  // Numerical data accessible to the model
             "growth": 0.12,  // Percentage growth for analysis
             "top_products": ["Product A", "Product B", "Product C"]  // List data
           }
         }]
       }
     }]
   }
  10. Amazon Bedrock generates a final response incorporating the tool results:
“Based on the data I've retrieved, our Q1 sales figures for the Northwest region were $12.45 million, 
representing a 12% growth compared to the previous quarter. 
The top-performing products were Product A, Product B, and Product C.”
  11. Your application returns the final response to the user

This entire process, illustrated in the following diagram, happens in seconds, giving users the impression of a seamless conversation with an AI that has direct access to their organization’s data. Behind the scenes, MCP is handling the complex work of securely routing requests to the right tools and data sources.

Streamlined sequence diagram showing core MCP message flow from user query to final response
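
To tie the walkthrough together, here is a hedged sketch of the orchestration loop an MCP client application might run. It assumes an initialized MCP ClientSession and a toolConfig built as shown earlier; the function name and structure are illustrative rather than taken from the AWS Labs samples.

import boto3

async def answer_with_tools(session, user_query: str, tool_config: dict) -> str:
    """Run the Converse API loop, satisfying toolUse requests through MCP (illustrative)."""
    bedrock = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')
    messages = [{'role': 'user', 'content': [{'text': user_query}]}]

    while True:
        response = bedrock.converse(
            modelId='us.anthropic.claude-3-7-sonnet-20250219-v1:0',
            messages=messages,
            toolConfig=tool_config,
        )
        output_message = response['output']['message']
        messages.append(output_message)

        if response['stopReason'] != 'tool_use':
            # The model has produced its final answer
            return output_message['content'][0]['text']

        # Satisfy each toolUse request by calling the corresponding MCP tool
        tool_results = []
        for block in output_message['content']:
            if 'toolUse' in block:
                tool_use = block['toolUse']
                result = await session.call_tool(tool_use['name'], tool_use['input'])
                tool_results.append({
                    'toolResult': {
                        'toolUseId': tool_use['toolUseId'],
                        # Assumes the MCP tool returned text content; adapt for JSON payloads
                        'content': [{'text': result.content[0].text}],
                    }
                })
        messages.append({'role': 'user', 'content': tool_results})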

In the next section, we explore a practical implementation example that shows how to connect an MCP server to Amazon Bedrock Knowledge Bases, providing a blueprint for your own implementations.

Practical implementation example: Amazon Bedrock Knowledge Bases integration

Enterprise knowledge bases represent one of the most valuable applications of MCP on AWS. In this section, we explore a concrete implementation of MCP that connects language models to Amazon Bedrock Knowledge Bases. The code for the MCP server can be found in the AWS Labs MCP repository on GitHub, and the client code is available in the samples directory of the same repository. This example brings to life the “universal translator” concept we introduced earlier, demonstrating how MCP can transform the way AI systems interact with enterprise knowledge repositories.

Understanding the challenge

Enterprise knowledge bases contain vast repositories of information—from documentation and policies to technical guides and product specifications. Traditional search approaches are often inadequate when users ask natural language questions, failing to understand context or identify the most relevant content.

Amazon Bedrock Knowledge Bases provide vector search capabilities that improve upon traditional keyword search, but even this approach has limitations:

  1. Manual filter configuration requires predefined knowledge of metadata structures
  2. Query-result mismatch occurs when users don’t use the exact terminology in the knowledge base
  3. Relevance challenges arise when similar documents compete for attention
  4. Context switching between searching and reasoning disrupts user experience

The MCP server we explore addresses these challenges by creating an intelligent layer between language models and knowledge bases.

Architecture overview

At a high level, our MCP server for Amazon Bedrock Knowledge Bases follows a clean, well-organized architecture that builds upon the client-server pattern we outlined previously. The server exposes two key interfaces to language models:

  1. A knowledge bases resource that provides discovery capabilities for available knowledge bases
  2. A query tool that enables dynamic searching across these knowledge bases

Detailed MCP Bedrock architecture with intelligent query processing workflow and AWS service connections

Remember the M×N integration problem we discussed earlier? This implementation provides a tangible example of how MCP solves it by creating a standardized interface between a large language model and your Amazon Bedrock Knowledge Bases.

Knowledge base discovery resource

The server begins with a resource that enables language models to discover available knowledge bases:

@mcp.resource(uri='resource://knowledgebases', name='KnowledgeBases', mime_type='application/json')
async def knowledgebases_resource() -> str:
    """List all available Amazon Bedrock Knowledge Bases and their data sources.
 
    This resource returns a mapping of knowledge base IDs to their details, including:
    - name: The human-readable name of the knowledge base
    - data_sources: A list of data sources within the knowledge base, each with:
      - id: The unique identifier of the data source
      - name: The human-readable name of the data source
 
    ## Example response structure:
    ```json
    {
        "kb-12345": {
            "name": "Customer Support KB",
            "data_sources": [
                {"id": "ds-abc123", "name": "Technical Documentation"},
                {"id": "ds-def456", "name": "FAQs"}
            ]
        },
        "kb-67890": {
            "name": "Product Information KB",
            "data_sources": [
                {"id": "ds-ghi789", "name": "Product Specifications"}
            ]
        }
    }
    ```
 
    ## How to use this information:
    1. Extract the knowledge base IDs (like "kb-12345") for use with the QueryKnowledgeBases tool
    2. Note the data source IDs if you want to filter queries to specific data sources
    3. Use the names to determine which knowledge base and data source(s) are most relevant to the user's query
    """
    return json.dumps(await discover_knowledge_bases(kb_agent_mgmt_client, kb_inclusion_tag_key)) 

This resource serves as both documentation and a discovery mechanism that language models can use to identify available knowledge bases before querying them.

Querying knowledge bases with the MCP tool

The core functionality of this MCP server resides in its QueryKnowledgeBases tool:

@mcp.tool(name='QueryKnowledgeBases')
async def query_knowledge_bases_tool(
    query: str = Field(
        ..., description='A natural language query to search the knowledge base with'
    ),
    knowledge_base_id: str = Field(
        ...,
        description='The knowledge base ID to query. It must be a valid ID from the resource://knowledgebases MCP resource',
    ),
    number_of_results: int = Field(
        10,
        description='The number of results to return. Use smaller values for focused results and larger values for broader coverage.',
    ),
    reranking: bool = Field(
        kb_reranking_enabled,
        description='Whether to rerank the results. Useful for improving relevance and sorting. Can be globally configured with BEDROCK_KB_RERANKING_ENABLED environment variable.',
    ),
    reranking_model_name: Literal['COHERE', 'AMAZON'] = Field(
        'AMAZON',
        description="The name of the reranking model to use. Options: 'COHERE', 'AMAZON'",
    ),
    data_source_ids: Optional[List[str]] = Field(
        None,
        description='The data source IDs to filter the knowledge base by. It must be a list of valid data source IDs from the resource://knowledgebases MCP resource',
    ),
) -> str:
    """Query an Amazon Bedrock Knowledge Base using natural language.
 
    ## Usage Requirements
    - You MUST first use the `resource://knowledgebases` resource to get valid knowledge base IDs
    - You can query different knowledge bases or make multiple queries to the same knowledge base
 
    ## Query Tips
    - Use clear, specific natural language queries for best results
    - You can use this tool MULTIPLE TIMES with different queries to gather comprehensive information
    - Break complex questions into multiple focused queries
    - Consider querying for factual information and explanations separately
     """
## Additional Implementation details …

What makes this tool powerful is its flexibility in querying knowledge bases with natural language. It supports several key features:

  1. Configurable result sizes – Adjust the number of results based on whether you need focused or comprehensive information
  2. Optional reranking – Improve relevance using reranking models available through Amazon Bedrock (from Amazon or Cohere)
  3. Data source filtering – Target specific sections of the knowledge base when needed

Reranking is disabled by default in this implementation but can be quickly enabled through environment variables or direct parameter configuration.

Enhanced relevance with reranking

A notable feature of this implementation is the ability to rerank search results using language models available through Amazon Bedrock. This capability allows the system to rescore search results based on deeper semantic understanding:

import os

# Parse the reranking enabled environment variable
kb_reranking_enabled_raw = os.getenv('BEDROCK_KB_RERANKING_ENABLED')
kb_reranking_enabled = False  # Default is False (reranking off)
if kb_reranking_enabled_raw is not None:
    kb_reranking_enabled_raw = kb_reranking_enabled_raw.strip().lower()
    if kb_reranking_enabled_raw in ('true', '1', 'yes', 'on'):
        kb_reranking_enabled = True

Reranking is particularly valuable for queries where semantic similarity might not be enough to determine the
most relevant content. For example, when answering a specific question, the most relevant document isn’t necessarily
the one with the most keyword matches, but the one that directly addresses the question being asked.
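
Under the hood, the MCP server queries the knowledge base through the Amazon Bedrock Retrieve API, which accepts a reranking configuration. The following is a simplified sketch of such a call; the knowledge base ID and reranker model ARN are placeholders, and the exact request shape used by the AWS Labs server may differ.

import boto3

kb_runtime = boto3.client('bedrock-agent-runtime', region_name='us-west-2')

response = kb_runtime.retrieve(
    knowledgeBaseId='kb-12345abcde',  # placeholder ID from the knowledge bases resource
    retrievalQuery={'text': "What's our quarterly IT security audit procedure?"},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 10,
            'rerankingConfiguration': {
                'type': 'BEDROCK_RERANKING_MODEL',
                'bedrockRerankingConfiguration': {
                    'modelConfiguration': {
                        # Placeholder ARN for the Amazon reranker; use the model available in your Region
                        'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.rerank-v1:0'
                    },
                    'numberOfRerankedResults': 5,
                },
            },
        }
    },
)

for item in response['retrievalResults']:
    print(item['score'], item['content']['text'])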

Full interaction flow

This section walks through a complete interaction flow to show how all these components work
together:

  1. The user asks a question to a language model such as Anthropic’s Claude through an application:
   "What's our quarterly IT security audit procedure?"
  2. The language model recognizes it needs to access the knowledge base and calls the MCP tool:
{
  "toolUse": {
    "name": "QueryKnowledgeBases",
    "input": {
      "query": "What's our quarterly IT security audit procedure?",
      "knowledge_base_id": "kb-12345abcde",
      "reranking": true
    }
  }
} 
  3. The MCP server processes the request by querying the knowledge base with the specified parameters
  4. The MCP server returns formatted results to the language model, including content, location, and relevance scores:
Applied metadata filters: department=IT, document_type=Procedure, category=Security Audit, frequency=Quarterly
 
   Results have been reranked to improve relevance.
 
   Result 1 [Score: 0.92]:
   Source: S3 - s3://company-docs/procedures/it/security/quarterly-audit-procedure-v2.pdf
   Content: The IT Security Quarterly Audit Procedure outlines the steps for conducting systematic evaluations of the organization's information security controls. This document provides guidance on scope, methodology, reporting requirements, and remediation tracking...
   Metadata:
     • department: IT
     • document_type: Procedure
     • category: Security Audit
     • frequency: Quarterly
     • last_updated: 2024-11-15
  5. The language model incorporates these results into its response to the user:
   Based on our IT Security Quarterly Audit Procedure, the process involves four main phases:
 
   1. Planning: Define scope, objectives, and schedule for the audit
   2. Execution: Conduct the technical assessments and control evaluations
   3. Reporting: Document findings, risks, and recommendations
   4. Remediation: Track and verify resolution of identified issues
 
   The procedure was last updated on November 15, 2024, and specifies that the Cybersecurity team leads the effort with support from IT Operations.

This interaction, illustrated in the following diagram, demonstrates the seamless fusion of language model capabilities with enterprise knowledge, enabled by the MCP protocol. The user doesn’t need to specify complex search parameters or know the structure of the knowledge base—the integration layer handles these details automatically.

End-to-end sequence diagram of IT security audit query processing through Bedrock Claude and MCP Server with metadata analysis
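
For completeness, here is a minimal sketch of how a client application might connect to the knowledge base MCP server over the stdio transport and call the QueryKnowledgeBases tool using the open source MCP Python SDK. The server launch command and knowledge base ID are assumptions for illustration; check the AWS Labs repository for the exact package name and configuration.

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumed launch command for the knowledge base MCP server; verify against the AWS Labs repository
server_params = StdioServerParameters(
    command='uvx',
    args=['awslabs.bedrock-kb-retrieval-mcp-server@latest'],
)

async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover available knowledge bases through the MCP resource
            kb_listing = await session.read_resource('resource://knowledgebases')
            print(kb_listing)

            # Query a knowledge base (placeholder ID taken from the listing above)
            result = await session.call_tool(
                'QueryKnowledgeBases',
                {
                    'query': "What's our quarterly IT security audit procedure?",
                    'knowledge_base_id': 'kb-12345abcde',
                    'reranking': True,
                },
            )
            print(result.content)

asyncio.run(main())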

Looking ahead: The MCP journey continues

As we’ve explored throughout this post, the Model Context Protocol provides a powerful framework for connecting language models to your enterprise data and tools on AWS. But this is just the beginning of the journey.

The MCP landscape is rapidly evolving, with new capabilities and implementations emerging regularly. In future posts in this series, we’ll dive deeper into advanced MCP architectures and use cases, with a particular focus on remote MCP implementation.

The introduction of the new Streamable HTTP transport layer represents a significant advancement for MCP, enabling truly enterprise-scale deployments with features such as:

  • Stateless server options for simplified scaling
  • Session ID management for request routing
  • Robust authentication and authorization mechanisms for secure access control
  • Horizontal scaling across server nodes
  • Enhanced resilience and fault tolerance

These capabilities will be essential as organizations move from proof-of-concept implementations to production-grade MCP deployments that serve multiple teams and use cases.
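
As a rough sketch of what this looks like in code, recent versions of the MCP Python SDK let a FastMCP server run over the Streamable HTTP transport, optionally in stateless mode. The parameter and transport names below follow the SDK at the time of writing and may evolve.

from mcp.server.fastmcp import FastMCP

# Stateless mode lets any server node handle any request, which suits
# horizontal scaling behind a load balancer
mcp = FastMCP('enterprise-mcp-server', stateless_http=True)

# ... register tools and resources as shown earlier ...

if __name__ == '__main__':
    mcp.run(transport='streamable-http')  # serve over HTTP instead of stdio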

We invite you to follow this blog post series as we continue to explore how MCP and AWS services can work together to create more powerful, context-aware AI applications for your organization.

Conclusion

As language models continue to transform how we interact with technology, the ability to connect these models to enterprise data and systems becomes increasingly critical. The Model Context Protocol (MCP) offers a standardized, secure, and scalable approach to integration.

Through MCP, AWS customers can:

  • Establish a standardized protocol for AI-data connections
  • Reduce development overhead and maintenance costs
  • Enforce consistent security and governance policies
  • Create more powerful, context-aware AI experiences

The Amazon Bedrock Knowledge Bases implementation we explored demonstrates how MCP can transform simple retrieval into intelligent discovery, adding value far beyond what either component could deliver independently.

Getting started

Ready to begin your MCP journey on AWS? Here are some resources to help you get started:

Implementation steps:

  1. Identify a high-value use case where AI needs access to enterprise data
  2. Select the appropriate MCP servers for your data sources
  3. Set up a development environment with local MCP implementations
  4. Integrate with Amazon Bedrock using the patterns described in this post
  5. Deploy to production with appropriate security and scaling considerations

Remember that MCP offers a “start small, scale incrementally” approach. You can begin with a single server connecting to one data source, then expand your implementation as you validate the value and establish patterns for your organization.

We encourage you to try the MCP with AWS services today. Start with a simple implementation, perhaps connecting a language model to your documentation or code repositories, and experience firsthand the power of context-aware AI.

Share your experiences, challenges, and successes with the community. The open source nature of MCP means that your contributions—whether code, use cases, or feedback—can help shape the future of this important protocol.

In a world where AI capabilities are advancing rapidly, the difference between good and great implementations often comes down to context. With MCP and AWS, you have the tools to make sure your AI systems have the right context at the right time, unlocking their full potential for your organization.

This blog post is part of a series exploring the Model Context Protocol (MCP) on AWS. In our next installment, we’ll explore the world of agentic AI, demonstrating how to build autonomous agents using the open-source Strands Agents SDK with MCP to create intelligent systems that can reason, plan, and execute complex multi-step workflows. We’ll also cover advanced implementation patterns, remote MCP architectures, and additional use cases for MCP.


About the authors

Aditya Addepalli is a Delivery Consultant at AWS, where he works to lead, architect, and build applications directly with customers. With a strong passion for Applied AI, he builds bespoke solutions and contributes to the ecosystem while consistently keeping himself at the edge of technology. Outside of work, you can find him meeting new people, working out, playing video games and basketball, or feeding his curiosity through personal projects.

Elie Schoppik leads live education at Anthropic as their Head of Technical Training. He has spent over a decade in technical education, working with multiple coding schools and starting one of his own. With a background in consulting, education, and software engineering, Elie brings a practical approach to teaching Software Engineering and AI. He’s shared his insights at a variety of technical conferences as well as universities including MIT, Columbia, Wharton, and UC Berkeley.

Jawhny Cooke is a Senior Anthropic Specialist Solutions Architect for Generative AI at AWS. He specializes in integrating and deploying Anthropic models on AWS infrastructure. He partners with customers and AI providers to implement production-grade generative AI solutions through Amazon Bedrock, offering expert guidance on architecture design and system implementation to maximize the potential of these advanced models.

Kenton Blacutt is an AI Consultant within the GenAI Innovation Center. He works hands-on with customers helping them solve real-world business problems with cutting edge AWS technologies, especially Amazon Q and Bedrock. In his free time, he likes to travel, experiment with new AI techniques, and run an occasional marathon.

Mani Khanuja is a Principal Generative AI Specialist Solutions Architect, author of the book Applied Machine Learning and High-Performance Computing on AWS, and a member of the Board of Directors for Women in Manufacturing Education Foundation Board. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Nicolai van der Smagt is a Senior Specialist Solutions Architect for Generative AI at AWS, focusing on third-party model integration and deployment. He collaborates with AWS’ biggest AI partners to bring their models to Amazon Bedrock, while helping customers architect and implement production-ready generative AI solutions with these models.

Read More

Build a scalable AI assistant to help refugees using AWS

This post is co-written with Taras Tsarenko, Vitalii Bozadzhy, and Vladyslav Horbatenko.

As organizations worldwide seek to use AI for social impact, the Danish humanitarian organization Bevar Ukraine has developed a comprehensive virtual generative AI-powered assistant called Victor, aimed at addressing the pressing needs of Ukrainian refugees integrating into Danish society. This post details our technical implementation using AWS services to create a scalable, multilingual AI assistant system that provides automated assistance while maintaining data security and GDPR compliance.

Bevar Ukraine was established in 2014 and has been at the forefront of supporting Ukrainian refugees in Denmark since the full-scale war in 2022, providing assistance to over 30,000 Ukrainians with housing, job search, and integration services. The organization has also delivered more than 200 tons of humanitarian aid to Ukraine, including medical supplies, generators, and essential items for civilians affected by the war.

Background and challenges

The integration of refugees into host countries presents multiple challenges, particularly in accessing public services and navigating complex legal procedures. Traditional support systems, relying heavily on human social workers, often face scalability limitations and language barriers. Bevar Ukraine’s solution addresses these challenges through an AI-powered system that operates continuously while maintaining high standards of service quality.

Solution overview

The solution’s backbone comprises several AWS services to deliver a reliable, secure, and efficient generative AI-powered digital assistant for Ukrainian refugees. A team of three volunteer software developers built the solution within weeks.

The following diagram illustrates the solution architecture.

Amazon Elastic Compute Cloud (Amazon EC2) serves as the primary compute layer, using Spot Instances to optimize costs. Amazon Simple Storage Service (Amazon S3) provides secure storage for conversation logs and supporting documents, and Amazon Bedrock powers the core natural language processing capabilities. Bevar Ukraine uses Amazon DynamoDB for real-time data access and session management, providing low-latency responses even under high load.

During implementation, we discovered that Anthropic’s Claude 3.5 large language model (LLM) is best suited for this use case due to its advanced dialogue logic and ability to maintain a human-like tone. It excels at thorough, reasoned responses and more creative content, which makes Victor’s replies more natural and engaging.

Amazon Titan Embeddings G1 – Text v1.2 excels at producing high-quality vector representations of multilingual text, enabling efficient semantic search and similarity comparisons. This is particularly valuable when Victor needs to retrieve relevant information from a large knowledge base or match users’ queries to previously seen inputs. Amazon Titan Embeddings also integrates smoothly with AWS, simplifying tasks like indexing, search, and retrieval.

In real-world interactions with Victor, some queries require short, specific answers, whereas others need creative generation or contextual understanding. By combining Anthropic’s Claude 3.5 for generation and Amazon Titan Embeddings G1 for semantic retrieval, Victor can route each query through the most appropriate pipeline, retrieving relevant context through embeddings and generating a response, resulting in more accurate and context-aware answers.

Amazon Bedrock provides a remarkable interface to call Anthropic’s Claude 3.5 and Amazon Titan Embeddings G1 (along with other models) without creating separate integrations for each provider, simplifying development and maintenance.
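
To illustrate that single-interface approach, the following sketch embeds a user question with Amazon Titan Embeddings and generates an answer with Anthropic’s Claude through the same Amazon Bedrock runtime client. The model IDs, Region, and question are placeholders for illustration, and the vector search step is elided.

import json

import boto3

bedrock = boto3.client(service_name='bedrock-runtime', region_name='eu-central-1')
question = 'How do I register my address after arriving in Denmark?'  # example question

# 1. Embed the question with Amazon Titan Embeddings (model ID assumed; verify in your account)
embedding_response = bedrock.invoke_model(
    modelId='amazon.titan-embed-text-v1',
    body=json.dumps({'inputText': question}),
)
query_vector = json.loads(embedding_response['body'].read())['embedding']

# 2. Use query_vector for semantic search against the vector store (for example, Pinecone) ...
retrieved_context = '...'  # placeholder for the retrieved passages

# 3. Generate the answer with Anthropic's Claude through the Converse API
answer = bedrock.converse(
    modelId='anthropic.claude-3-5-sonnet-20240620-v1:0',  # assumed model ID
    messages=[{
        'role': 'user',
        'content': [{'text': f'Context:\n{retrieved_context}\n\nQuestion: {question}'}],
    }],
)
print(answer['output']['message']['content'][0]['text'])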

For multilingual support, we used embedding models that support multiple languages and translated our materials using Amazon Translate. This enhances the resilience of our Retrieval Augmented Generation (RAG) system. The application is built securely using AWS services. AWS Key Management Service (AWS KMS) simplifies the process of encrypting data within the application, and Amazon API Gateway supports the application’s REST endpoints. User authentication and authorization are handled by Amazon Cognito, which provides secure and scalable customer identity and access management (CIAM) capabilities.

The application runs on AWS infrastructure using services that are designed to be secure and scalable like Amazon S3, AWS Lambda, and DynamoDB.

Tips and recommendations

Building an AI assistant solution for refugees using Amazon Bedrock and other AWS services has provided valuable insights into creating impactful AI-powered humanitarian solutions. Through this implementation, we discovered key considerations that organizations should keep in mind when developing similar solutions. The experience highlighted the importance of balancing technical capabilities with human-centric design, providing multilingual support, maintaining data privacy, and creating scalable yet cost-effective solutions. These learnings can serve as a foundation for organizations looking to use AI and cloud technologies to support humanitarian causes, particularly in creating accessible and helpful digital assistance for displaced populations. The following are our main recommendations:

  • Use the Amazon Bedrock playground to test multiple LLMs side by side using the same prompt. This helps you find the model that gives the best quality, style, and tone of response for your specific use case (for example, factual accuracy vs. conversational tone).
  • Experiment with prompts and settings to improve responses.
  • Keep costs in mind; set up monitoring and budgets in AWS.
  • For tasks involving information retrieval or semantic search, select an embedding model while making sure to pick the appropriate settings. Pay attention to the size of the embeddings, because larger vectors can capture more meaning but might increase costs. Also, check that the model supports the languages your application requires.
  • If you’re using a knowledge base, use the Amazon Bedrock knowledge base playground to experiment with how content is chunked and how many passages are retrieved for each query. Finding the right number of retrieved passages can make a big difference in how clear and focused the final answers are—sometimes fewer, high-quality chunks work better than sending too much context.
  • To enforce safety and privacy, use Amazon Bedrock Guardrails. Guardrails can help prevent the model from leaking sensitive information, such as personal data or internal business content, and you can block harmful responses or enforce a specific tone and formatting style.
  • Start with a simple prototype, test the embedding quality in your domain, and expand iteratively.

Integration and enhancement layer

Bevar Ukraine has extended the core AWS infrastructure with several complementary technologies:

  • Pinecone vector database – For efficient storage and retrieval of semantic embeddings
  • DSPy framework – For structured prompt engineering and optimization of Anthropic’s Claude 3.5 Sonnet responses
  • EasyWeek – For appointment scheduling and resource management
  • Telegram API – For UI delivery
  • Amazon Bedrock Guardrails – For security policy enforcement
  • Amazon Rekognition – For document verification
  • GitHub-based continuous integration and delivery (CI/CD) pipeline – For rapid feature deployment

Key technical insights

The implementation revealed several important technical considerations. The DSPy framework was crucial in optimizing and enhancing our language model prompts. By integrating additional layers of reasoning and context-awareness tools, DSPy notably improved response accuracy, consistency, and depth. The team found that designing a robust knowledge base with comprehensive metadata was fundamental to the system’s effectiveness.

GDPR compliance required careful architectural decisions, including data minimization, secure storage, and clear user consent mechanisms. Cost optimization was achieved through strategic use of EC2 Spot Instances and implementation of API request throttling, resulting in significant operational savings without compromising performance.

Future enhancements

Our roadmap includes several technical improvements to enhance the system’s capabilities:

  • Implementing advanced context dispatching using machine learning algorithms to improve service coordination across multiple domains
  • Developing a sophisticated human-in-the-loop validation system for complex cases requiring expert oversight
  • Migrating suitable components to a serverless architecture using Lambda to optimize resource utilization and costs
  • Enhancing the knowledge base with advanced semantic search capabilities and automated content updates

Results

This solution, which serves hundreds of Ukrainian refugees in Denmark daily, demonstrates the potential of AWS services in creating scalable, secure, and efficient AI-powered systems for social impact. As a result, volunteers and employees of Bevar Ukraine have saved thousands of hours; instead of answering repetitive questions from refugees, they can support them in more complicated life situations. For refugees, the virtual assistant Victor is a lifeline that provides answers to the most pressing questions about public services in Denmark, and many other questions, in seconds instead of waiting for an available volunteer. Given the vast knowledge base Victor uses to generate responses, the quality of support has improved as well.

Conclusion

Through careful architecture design and integration of complementary technologies, we’ve created a platform that effectively addresses the challenges faced by refugees while maintaining high standards of security and data protection.

The success of this implementation provides a blueprint for similar solutions in other social service domains, potentially supporting refugees and other people in need around the world, highlighting the importance of combining robust cloud infrastructure with thoughtful system design to create meaningful social impact.


About the Authors

Taras Tsarenko is a Program Manager at Bevar Ukraine. For over a decade in the world of technology, Taras has led everything from tight-knit agile teams of 5 or more to a company of 90 people that became the best small IT company in Ukraine under 100 people in 2015. Taras is a builder who thrives at the intersection of strategy and execution, where technical expertise meets human impact, whether it’s streamlining workflows, solving complex problems, or empowering teams to create meaningful products. Taras specializes in AI-driven solutions and data engineering, leveraging technologies like machine learning and generative AI using Amazon SageMaker AI, Amazon Bedrock, Amazon OpenSearch Service, and more. Taras is an AWS Certified ML Engineer Associate.

Anton Garvanko is a Senior Analytics Sales Specialist for Europe North at AWS. As a finance professional turned salesman, Anton spent 15 years in various finance leadership roles in supply chain and logistics as well as financial services industries. Anton joined Amazon over 5 years ago and has been part of specialist sales teams focusing on business intelligence, analytics, and generative AI for over 3 years. He is passionate about connecting the worlds of finance and IT by making sure that business intelligence and analytics powered by generative AI support everyday decision-making across industries and use cases.

Vitalii Bozadzhy is a Senior Developer with extensive experience in building high-load, cloud-based solutions, specializing in Java, Golang, SWIFT, and Python. He specializes in scalable backend systems, microservice architectures designed to automate business processes, as well as building reliable and secure cloud infrastructures. Furthermore, he has experience in optimizing compute resources and building advanced solutions integrated into products. His expertise covers the full development cycle—from design and architecture to deployment and maintenance—with a strong focus on performance, fault tolerance, and innovation.

Vladyslav Horbatenko is a computer science student, Professor Assistant, and Data Scientist with a strong focus on artificial intelligence. Vladyslav began his journey with machine learning, reinforcement learning, and deep learning, and gradually became more interested in large language models (LLMs) and their potential impact. This led him to deepen his understanding of LLMs, and now he works on developing, maintaining, and improving LLM-based solutions. He contributes to innovative projects while staying up to date with the latest advancements in AI.

Read More

Enhanced diagnostics flow with LLM and Amazon Bedrock agent integration

Noodoe is a global leader in EV charging innovation, offering advanced solutions that empower operators to optimize their charging station operations and provide exceptional user experiences. Their universal charging stations are compatible with all EV brands and feature intuitive payment options, including credit cards and Apple Pay. Powered by the Noodoe EV OS cloud management system, the company delivers round-the-clock automated monitoring, diagnostics, and maintenance, achieving a market-leading uptime of 99.83%. With operations in over 15 countries and a robust commitment to sustainability, Noodoe is transforming the EV charging industry through cutting-edge technology and a user-first approach.

Despite its technological strengths, Noodoe has encountered key challenges in helping station operators optimize performance and select the most cost-effective electricity pricing strategies across diverse markets. Traditional systems lack the capability to efficiently process vast amounts of real-time and historical data or provide personalized, station-level recommendations. This limits operators’ ability to make timely, informed decisions—resulting in higher electricity costs, underutilized assets, and a subpar customer experience. These inefficiencies not only reduce profitability but also hinder the ability to scale in a competitive and fast-evolving EV charging landscape.

To solve this, Noodoe has integrated large language models (LLMs) through Amazon Bedrock and Amazon Bedrock Agents to deliver intelligent automation, real-time data access, and multilingual support. These AI-powered tools analyze usage patterns, station diagnostics, and external variables like weather or grid conditions to generate highly tailored pricing recommendations. By using the structured orchestration and prompt-based reasoning of Amazon Bedrock, Noodoe equips operators with actionable insights that improve margins, enhance station utilization, and allow them to offer more competitive rates to users—ultimately boosting customer satisfaction. This service is delivered through a subscription model, creating a new, scalable revenue stream for Noodoe while reinforcing its leadership and innovation in the EV infrastructure space.

In this post, we explore how Noodoe uses AI and Amazon Bedrock to optimize EV charging operations. By integrating LLMs, Noodoe enhances station diagnostics, enables dynamic pricing, and delivers multilingual support. These innovations reduce downtime, maximize efficiency, and improve sustainability. Read on to discover how AI is transforming EV charging management.

Solution overview

The Noodoe AI-enhanced diagnostics flow is built on a multi-step process that combines data collection, AI-powered analytics, and seamless translation for global accessibility, as illustrated in the following figure.

Solution overview

The physical charging station network currently operates over 1,000 sites across more than 20 countries, with plans to expand by more than 50 additional sites by the end of 2025. As illustrated in the following image, it uses the EV Cloud and LLMs to generate relevant recommendations following backend processing.

Photo of an EV charging station

The following screenshot shows an example of the results in the UI.

The following diagram illustrates the solution data flow.

Overview of Noodoe AI-enhanced diagnostics

To meet the feature requirements, the system operation process includes the following steps:

  1. Charging data is processed through the EV service before entering the database.
  2. The charging history data and pricing data are stored in the EV database.
  3. Amazon EventBridge Scheduler periodically triggers the EV service to perform analysis.
  4. The EV service calls the AI service to analyze historical data and provide pricing recommendations.
  5. The AI service collects the organized historical data to prepare the prompt template.
  6. This information, combined with appropriate prompts, is passed to Amazon Bedrock Agents acting as an AI-pricing agent. The AI-pricing agent analyzes the combined data to identify daily peak and off-peak periods and provide recommendations for user pricing plans.
  7. Optionally, if translation is needed for non-English users, these results from the AI-pricing agent are further processed through another Amazon Bedrock agent for translation.
  8. Optionally, the translation agent uses Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock to get the result in the corresponding language.
  9. Finally, the AI service collects the results in the user’s language for formatting and other processing, then inserts them into a template to create a comprehensive report that is pushed to the user’s end.

In the following section, we dive deep into these steps and the AWS services used.

Architecture of Noodoe AI-enhanced diagnostics

Noodoe faced key challenges in building a globally scalable, reliable, and cost-efficient architecture. They needed a solution that could support rapid expansion, handle high data volumes, and deliver consistent performance across AWS Regions. Addressing these requirements required careful architectural planning to provide flexibility and resilience.

The following diagram illustrates the solution architecture Noodoe built to overcome these challenges and support global growth.

Architecture of Noodoe AI-enhanced diagnostics

The EV charging optimization platform structures the data flow across multiple AWS services, providing efficient data ingestion, processing, and AI-driven decision-making. Amazon Elastic Kubernetes Service (Amazon EKS) retrieves data from Amazon DocumentDB, processes it, and invokes Amazon Bedrock Agents for reasoning and analysis. This structured data pipeline enables optimized pricing strategies and multilingual customer interactions. By using containerized applications, event-driven workflows, and AI capabilities, the system provides scalable and flexible insights to EV station operators.

Data ingestion and processing

EV charging stations send real-time charging data to AWS IoT Core, which acts as the initial entry point for data processing. The data is then transmitted to Amazon Managed Streaming for Apache Kafka (Amazon MSK) to facilitate high-throughput, reliable streaming. From Amazon MSK, data flows into Amazon EKS, where the EV service processes it before storing the charging history and trend records in DocumentDB. This structured storage provides efficient retrieval for analysis and prediction.

AI-powered pricing analysis

To optimize pricing strategies, Amazon EventBridge triggers a pricing prediction function at regular intervals. This function retrieves historical charging data from DocumentDB and sends it, along with predefined prompts, to the Amazon Bedrock AI-pricing agent. The AI agent, powered by Anthropic’s Claude on Amazon Bedrock, evaluates station usage trends, peak and off-peak periods, and pricing inefficiencies to generate optimal pricing recommendations. Although the pricing agent doesn’t access an Amazon Bedrock knowledge base or trigger action groups, it uses preprocessing and postprocessing features to refine predictions and improve decision-making.
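
A simplified sketch of that agent invocation is shown below, using the InvokeAgent API from the bedrock-agent-runtime client. The agent ID, alias, session ID, and prompt text are placeholders; Noodoe’s actual prompt templates and identifiers are not shown here.

import boto3

agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-west-2')

response = agent_runtime.invoke_agent(
    agentId='PRICINGAGENTID',      # placeholder AI-pricing agent ID
    agentAliasId='PRODALIASID',    # placeholder agent alias
    sessionId='station-0042-2025-06',
    inputText=(
        'Here is the recent charging history for station 0042 (prepared by the EV service): ...\n'
        'Identify daily peak and off-peak periods and recommend a pricing plan.'
    ),
)

# InvokeAgent returns a streaming response; collect the completion chunks
recommendation = ''
for event in response['completion']:
    if 'chunk' in event:
        recommendation += event['chunk']['bytes'].decode('utf-8')
print(recommendation)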

Multilingual support and report generation

If translation is required, the pricing analysis results are forwarded to the Amazon Bedrock translate agent, which converts the insights into the operator’s preferred language. The translated and structured data is then formatted into a predefined report template and stored in a designated database for later retrieval. This provides seamless access to actionable insights across diverse markets.

UI, monitoring, and performance optimization

Operators access the system through a web-based UI, with Amazon Route 53 and Amazon CloudFront providing fast and efficient content delivery. An Application Load Balancer distributes incoming requests across multiple EKS instances, providing high availability. To optimize performance, Amazon ElastiCache accelerates data retrieval while reducing database load. For system monitoring and observability, Amazon CloudWatch provides baseline metrics and logs, and Noodoe’s administrators use Amazon Managed Service for Prometheus and Amazon Managed Grafana for deeper monitoring and visualization.

This architecture empowers Noodoe with an AI-driven, scalable, and intelligent EV charging management solution, enhancing station utilization, revenue optimization, and customer experience worldwide.

Summary

The Noodoe AI-enhanced diagnostics flow transforms EV charging operations by integrating Amazon Bedrock Agents, blending rule-based automation, real-time user input, and LLM-powered insights for smarter decision-making. Backed by a comprehensive knowledge base and streamlined APIs, the solution empowers operators to automate workflows, optimize pricing, and boost station performance at scale. Ongoing expansion of the knowledge base, workflow refinement, and real-world testing further improve efficiency and reliability. This approach has delivered a 15% increase in revenue and reduced implementation time by 10%. Continuous feedback and clear documentation equip users to effectively use AI-driven diagnostics for more intelligent charging management.

Roman Kleinerman, Vice President of Products at Noodoe, shares: “We’ve seen revenue increases of 10–25% depending on the location and number of stations, as customers use our AI solution to optimize pricing strategies.”

Noodoe is dedicated to delivering smarter, more intelligent EV charging services that benefit both end-users and operators. Currently, Noodoe operates over 1,000 charging sites across more than 20 countries, with plans to expand by more than 50 additional sites by the end of 2025. Looking ahead, the system is being enhanced to support near real-time dynamic pricing optimization by incorporating factors such as demand, grid conditions, time of day, and weather. Amazon Bedrock Agents help enable these intelligent capabilities, powering dynamic pricing, load balancing, and grid-aware routing to optimize energy distribution and guide users to the most efficient stations. Future enhancements will offer personalized charging recommendations and incentives based on user preferences, maximizing value for both customers and operators. Start building intelligent, AI-powered EV charging solutions with Amazon Bedrock.


About the Authors

Ray Wang is a Senior Solutions Architect at AWS. With 12 years of experience in the IT industry, Ray is dedicated to building modern solutions on the cloud, especially in NoSQL, big data, machine learning, and generative AI. As a hungry go-getter, he has earned all 14 AWS certifications, making his technical expertise both deep and broad. He loves to read and watch sci-fi movies in his spare time.

Howard Su is a Solutions Architect at AWS. With many years of experience in software development and system operations, Howard has served in various roles including R&D, QA, and SRE, and has been responsible for the architectural design of numerous large-scale systems, as well as participating in several cloud migrations. After accumulating years of development and operations experience, Howard is dedicated to promoting cloud-native service technologies and becoming an advocate for DevOps.

Tony Trinh is a Senior AIML Specialist Architect at AWS. With 13+ years of experience in the IT industry, Tony specializes in architecting scalable, compliance-driven AI and ML solutions—particularly in generative AI, MLOps, and cloud-native data platforms. As a perpetual learner, he’s doing research in Visual Language Model, Responsible AI & Computer Vision and authoring a book in ML engineering. In his spare time, Tony enjoys outdoor activity, experimenting with home improvement, and exploring Melbourne’s vibrant coffee scene.

Read More

Build GraphRAG applications using Amazon Bedrock Knowledge Bases

These days, it is increasingly common for companies to adopt an AI-first strategy to stay competitive and efficient. As generative AI adoption grows, the technology’s ability to solve problems is also improving (one example is generating a comprehensive market report). One way to simplify the growing complexity of problems to be solved is through graphs, which excel at modeling relationships and extracting meaningful insights from interconnected data and entities.

In this post, we explore how to use Graph-based Retrieval-Augmented Generation (GraphRAG) in Amazon Bedrock Knowledge Bases to build intelligent applications. Unlike traditional vector search, which retrieves documents based on similarity scores, knowledge graphs encode relationships between entities, allowing large language models (LLMs) to retrieve information with context-aware reasoning. This means that instead of only finding the most relevant document, the system can infer connections between entities and concepts, improving response accuracy and reducing hallucinations. Graph Explorer is a great tool for inspecting the graph that is built.

Introduction to GraphRAG

Traditional Retrieval-Augmented Generation (RAG) approaches improve generative AI by fetching relevant documents from a knowledge source, but they often struggle with context fragmentation, when relevant information is spread across multiple documents or sources.

This is where GraphRAG comes in. GraphRAG was created to enhance knowledge retrieval and reasoning by leveraging knowledge graphs, which structure information as entities and their relationships. Unlike traditional RAG methods that rely solely on vector search or keyword matching, GraphRAG enables multi-hop reasoning (logical connections between different pieces of context), better entity linking, and contextual retrieval. This makes it particularly valuable for complex document interpretation, such as legal contracts, research papers, compliance guidelines, and technical documentation.

Amazon Bedrock Knowledge Bases GraphRAG

Amazon Bedrock Knowledge Bases is a managed service for storing, retrieving, and structuring enterprise knowledge. It seamlessly integrates with the foundation models available through Amazon Bedrock, enabling AI applications to generate more informed and trustworthy responses. Amazon Bedrock Knowledge Bases now supports GraphRAG, an advanced feature that enhances traditional RAG by integrating graph-based retrieval. This allows LLMs to understand relationships between entities, facts, and concepts, making responses more contextually relevant and explainable.

How Amazon Bedrock Knowledge Bases GraphRAG works

Graphs are generated by creating a structured representation of data as nodes (entities) and edges (relationships) between those nodes. The process typically involves identifying key entities within the data, determining how these entities relate to each other, and then modeling these relationships as connections in the graph. After the traditional RAG process, Amazon Bedrock Knowledge Bases GraphRAG performs additional steps to improve the quality of the generated response:

  1. It identifies and retrieves related graph nodes or chunk identifiers that are linked to the initially retrieved document chunks.
  2. The system then expands on this information by traversing the graph structure, retrieving additional details about these related chunks from the vector store.
  3. By using this enriched context, which includes relevant entities and their key connections, GraphRAG can generate more comprehensive responses.
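
From an application’s perspective, these traversal steps are encapsulated behind the Amazon Bedrock Knowledge Bases retrieval APIs, so you don’t orchestrate them yourself. The following is a minimal sketch (not part of the original walkthrough) of querying a GraphRAG-enabled knowledge base with the RetrieveAndGenerate API in boto3; the Region, knowledge base ID, and model ARN are placeholders you would replace with your own values.

import boto3

# Minimal sketch: query a GraphRAG-enabled knowledge base and let the service
# handle chunk retrieval, graph expansion, and response generation.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "Which companies has Amazon invested in or acquired in recent years?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBEXAMPLE123",  # placeholder knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)

print(response["output"]["text"])               # generated answer
for citation in response.get("citations", []):  # chunks used as context
    for ref in citation["retrievedReferences"]:
        print(ref["location"])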

How graphs are constructed

Imagine extracting information from unstructured data such as PDF files. In Amazon Bedrock Knowledge Bases, graphs are constructed through a process that extends traditional PDF ingestion. The system creates three types of nodes: chunk, document, and entity. The ingestion pipeline begins by splitting documents from an Amazon Simple Storage Service (Amazon S3) folder into chunks using customizable methods (you can choose from basic fixed-size chunking to more complex LLM-based chunking mechanisms). Each chunk is then embedded, and an ExtractChunkEntity step uses an LLM to identify key entities within the chunk. This information, along with the chunk’s embedding, text, and document ID, is sent to Amazon Neptune Analytics for storage. The insertion process creates interconnected nodes and edges, linking chunks to their source documents and extracted entities using the bulk load API in Amazon Neptune. The following figure illustrates this process.

Document processing workflow showing chunking and embedding steps leading to Neptune Analytics database

Use case

Consider a company that needs to analyze a large range of documents and correlate entities that are spread across those documents to answer questions (for example, Which companies has Amazon invested in or acquired in recent years?). Extracting meaningful insights from this unstructured data and connecting it with other internal and external information poses a significant challenge. To address this, the company decides to build a GraphRAG application using Amazon Bedrock Knowledge Bases, using a graph database to represent complex relationships within the data.

One business requirement for the company is to generate a comprehensive market report that provides a detailed analysis of how internal and external information are correlated with industry trends, the company’s actions, and performance metrics. By using Amazon Bedrock Knowledge Bases, the company can create a knowledge graph that represents the intricate connections between press releases, products, companies, people, financial data, external documents and industry events. The Graph Explorer tool becomes invaluable in this process, helping data scientists and analysts to visualize those connections, export relevant subgraphs, and seamlessly integrate them with the LLMs in Amazon Bedrock. After the graph is well structured, anyone in the company can ask questions in natural language using Amazon Bedrock LLMs and generate deeper insights from a knowledge base with correlated information across multiple documents and entities.

Solution overview

In this GraphRAG application using Amazon Bedrock Knowledge Bases, we’ve designed a streamlined process to transform raw documents into a rich, interconnected graph of knowledge. Here’s how it works:

  1. Document ingestion: Users can upload documents manually to Amazon S3 or set up automatic ingestion pipelines.
  2. Chunk, entity extraction, and embeddings generation: In the knowledge base, documents are first split into chunks using fixed size chunking or customizable methods, then embeddings are computed for each chunk. Finally, an LLM is prompted to extract key entities from each chunk, creating a GraphDocument that includes the entity list, chunk embedding, chunked text, and document ID.
  3. Graph construction: The embeddings, along with the extracted entities and their relationships, are used to construct a knowledge graph. The constructed graph data, including nodes (entities) and edges (relationships), is automatically inserted into Amazon Neptune.
  4. Data exploration: With the graph database populated, users can quickly explore the data using Graph Explorer. This intuitive interface allows for visual navigation of the knowledge graph, helping users understand relationships and connections within the data.
  5. LLM-powered application: Finally, users can leverage LLMs through Amazon Bedrock to query the graph and retrieve correlated information across documents. This enables powerful, context-aware responses that draw insights from the entire corpus of ingested documents.

The following figure illustrates this solution.

AWS architecture showing data flow from S3 through Bedrock to Neptune with user query interaction

Prerequisites

The example solution in this post uses datasets from the following websites:

Also, you need to:

  • Create an S3 bucket to store the files on AWS. In this example, we named this bucket: blog-graphrag-s3.
  • Download and upload the PDF and XLS files from the websites into the S3 bucket.

Build the GraphRAG application

  1. Open the AWS Management Console for Amazon Bedrock.
  2. In the navigation pane, under Knowledge Bases, choose Create.
  3. Select Knowledge Base with vector store, and choose Create.
    Amazon Bedrock Knowledge Bases interface showing creation and testing options
  4. Enter a Knowledge Base name (for example, knowledge-base-graphrag-demo) and an optional description.
  5. Select Create and use a new service role.
  6. Select Data source as Amazon S3.
  7. Leave everything else as default and choose Next to continue.
  8. Enter a Data source name (for example: knowledge-base-graphrag-data-source).
  9. Select an S3 bucket by choosing Browse S3. (If you don’t have an S3 bucket in your account, create one. Make sure to upload all the necessary files.)
  10. After the S3 bucket is created and files are uploaded, choose blog-graphrag-s3 bucket.
  11. Leave everything else as default and choose Next.
  12. Choose Select model and then select an embeddings model (in this example, we chose the Titan Text Embeddings V2 model).
  13. In the Vector database section, under Vector store creation method, select Quick create a new vector store, for Vector store select Amazon Neptune Analytics (GraphRAG), and choose Next to continue.
  14. Review all the details.
  15. Choose Create Knowledge Base after reviewing all the details.
  16. Creating a knowledge base on Amazon Bedrock might take several minutes to complete depending on the size of the data present in the data source. You should see the status of the knowledge base as Available after it is created successfully.

Provide knowledge base details and data source configuration

Update and sync the graph with your data

  1. Select the Data source name (in this example, knowledge-base-graphrag-data-source) to view the synchronization history.
  2. Choose Sync to update the data source.

Amazon Bedrock Knowledge Bases interface showing one active vector store implementation
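
If you want to schedule or automate this sync instead of using the console, the same operation is exposed as an ingestion job in the Amazon Bedrock API. The following boto3 sketch is illustrative only; the knowledge base and data source IDs are placeholders (you can find yours on the knowledge base details page).

import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

# Start an ingestion job -- the programmatic equivalent of choosing Sync.
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KBEXAMPLE123",   # placeholder IDs
    dataSourceId="DSEXAMPLE123",
)

# Check the job status; it moves through STARTING, IN_PROGRESS, and COMPLETE.
status = bedrock_agent.get_ingestion_job(
    knowledgeBaseId="KBEXAMPLE123",
    dataSourceId="DSEXAMPLE123",
    ingestionJobId=job["ingestionJob"]["ingestionJobId"],
)["ingestionJob"]["status"]
print(status)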

Visualize the graph using Graph Explorer

Let’s look at the graph created by the knowledge base by navigating to the Amazon Neptune console. Make sure that you’re in the same AWS Region where you created the knowledge base.

  1. Open the Amazon Neptune console.
  2. In the navigation pane, choose Analytics and then Graphs.
  3. You should see the graph created by the knowledge base.

Neptune database graphs interface showing single active graph instance

To view the graph in Graph Explorer, you need to create a notebook by going to the Notebooks section.

You can create the notebook instance manually or by using an AWS CloudFormation template. In this post, we show how to create it manually using the Amazon Neptune console.

To create a notebook instance:

  1. Choose Notebooks.
  2. Choose Create notebook.
    • Select Analytics as the Neptune service.
    • Associate the notebook with the graph you just created (in this case: bedrock-knowledge-base-imwhqu).
    • Select the notebook instance type.
    • Enter a name for the notebook instance in the Notebook name field.
    • Create an AWS Identity and Access Management (IAM) role and use the Neptune default configuration.
    • Select a VPC, subnet, and security group.
  3. Leave Internet access as default and choose Create notebook.

Neptune Notebooks interface showing two active Jupyter notebook instances

Notebook instance creation might take a few minutes. After the Notebook is created, you should see the status as Ready.

To see the Graph Explorer:

  • Go to Actions and choose Open Graph Explorer.

Neptune notebook details page showing aws-neptune-graphraq configuration

By default, public connectivity is disabled for the graph database. To connect to the graph, you must either have a private graph endpoint or enable public connectivity. For this post, you will enable public connectivity for this graph.

To set up a public connection to view the graph (optional):

  1. Go back to the graph you created earlier (under Analytics, Graphs).
  2. Select your graph by choosing the round button to the left of the Graph Identifier.
  3. Choose Modify.
  4. Select the Enable public connectivity check box in the Network section.
  5. Choose Next.
  6. Review changes and choose Submit.

page showing how to set up public connectivity

To open the Graph Explorer:

  1. Go back to Notebooks.
  2. After the notebook instance is created, choose the instance name (in this case, aws-neptune-analytics-neptune-analytics-demo-notebook).
  3. Then choose Actions, and then choose Open Graph Explorer.

image showing how to open graph explorer

  4. You should now see Graph Explorer. To see the graph, add a node to the canvas, then explore and navigate into the graph.

image showing graph explorer
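
If you prefer to inspect the graph programmatically rather than through the Graph Explorer canvas, Neptune Analytics also exposes an openCypher query API. The following sketch assumes the neptune-graph boto3 client and a hypothetical graph identifier; the response parsing may differ slightly across SDK versions, so treat it as illustrative.

import json
import boto3

neptune = boto3.client("neptune-graph", region_name="us-east-1")

# Count nodes by label (chunk, document, entity) in the knowledge base graph.
response = neptune.execute_query(
    graphIdentifier="g-abcdefghij",   # placeholder Neptune Analytics graph ID
    queryString="MATCH (n) RETURN labels(n) AS label, count(*) AS nodes",
    language="OPEN_CYPHER",
)

result = json.loads(response["payload"].read())  # streaming JSON payload
print(result)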

Playground: Working with LLMs to extract insights from the knowledge base using GraphRAG

You’re ready to test the knowledge base.

  1. Choose the knowledge base, select a model, and choose Apply.
  2. Choose Run after adding the prompt. In the example shown in the following screenshot, we asked How is AWS increasing energy efficiency?

example prompt

  3. Choose Show details to see the Source chunk.
  4. Choose Metadata associated with this chunk to view the chunk ID, data source ID, and source URI.

image showing chunk id, resource id and resource uri

  5. In the next example, we asked a more complex question: Which companies has AMAZON invested in or acquired in recent years?

example with a complex prompt

Another way to improve the relevance of query responses is to use a reranker model. Using the reranker model in GraphRAG involves providing a query and a list of documents to be reordered based on relevance. The reranker calculates relevance scores for each document in relation to the query, improving the accuracy and pertinence of retrieved results for subsequent use in generating responses or prompts. In the Amazon Bedrock Playgrounds, you can see the results generated by the reranking model in two ways: the data ranked by the reranking model alone (the following figure), or a combination of the reranking model and the LLM to generate new insights.

rerank model example

To use the reranker model:

  1. Check the availability of the reranker model.
  2. Go to the AWS Management Console for Amazon Bedrock.
  3. From the navigation pane, under Builder tools, choose Knowledge Bases.
  4. Choose the same knowledge base that you created earlier (knowledge-base-graphrag-demo).
  5. Choose Test Knowledge Base.
  6. Choose Configurations, expand the Reranking section, choose Select model, and select a reranker model (in this post, we chose Cohere Rerank 3.5).
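
Outside the console, reranking is also available as a standalone Rerank API in the Bedrock agent runtime. The sketch below shows the general request shape with placeholder documents and a placeholder model ARN; the exact field names can evolve between SDK versions, so verify them against the current boto3 documentation before relying on this.

import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

documents = [
    "Chunk about AWS renewable energy and efficiency projects ...",
    "Chunk about Amazon acquisitions and investments ...",
]

# Ask the reranker to score each document against the query.
response = runtime.rerank(
    queries=[{"type": "TEXT", "textQuery": {"text": "How is AWS increasing energy efficiency?"}}],
    sources=[
        {"type": "INLINE", "inlineDocumentSource": {"type": "TEXT", "textDocument": {"text": doc}}}
        for doc in documents
    ],
    rerankingConfiguration={
        "type": "BEDROCK_RERANKING_MODEL",
        "bedrockRerankingConfiguration": {
            # Placeholder ARN -- check which reranker models are available in your Region.
            "modelConfiguration": {"modelArn": "arn:aws:bedrock:us-west-2::foundation-model/cohere.rerank-v3-5:0"},
            "numberOfResults": 2,
        },
    },
)

for item in response["results"]:
    print(item["index"], item["relevanceScore"])   # higher score = more relevant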

Clean up

To clean up your resources, complete the following tasks:

  1. Delete the Neptune notebooks: aws-neptune-graphrag.
  2. Delete the Amazon Bedrock knowledge base: knowledge-base-graphrag-demo.
  3. Delete content from the Amazon S3 bucket blog-graphrag-s3.

Conclusion

Using Graph Explorer in combination with Amazon Neptune and Amazon Bedrock LLMs provides a solution for building sophisticated GraphRAG applications. Graph Explorer offers intuitive visualization and exploration of complex relationships within data, making it straightforward to understand and analyze company connections and investments. You can use Amazon Neptune graph database capabilities to set up efficient querying of interconnected data, allowing for rapid correlation of information across various entities and relationships.

By using this approach to analyze Amazon’s investment and acquisition history, we can quickly identify patterns and insights that might otherwise be overlooked. For instance, when examining the questions "Which companies has Amazon invested in or acquired in recent years?" or "How is AWS increasing energy efficiency?", the GraphRAG application can traverse the knowledge graph, correlating press releases, investor relations information, entities, and financial data to provide a comprehensive overview of Amazon’s strategic moves.

The integration of Amazon Bedrock LLMs further enhances the accuracy and relevance of generated results. These models can contextualize the graph data, helping you to understand the nuances in company relationships and investment trends, and supporting the generation of comprehensive market reports. This combination of graph-based knowledge and natural language processing enables more precise answers and data interpretation, going beyond basic fact retrieval to offer analysis of Amazon’s investment strategy.

In summary, the synergy between Graph Explorer, Amazon Neptune, and Amazon Bedrock LLMs creates a framework for building GraphRAG applications that can extract meaningful insights from complex datasets. This approach streamlines the process of analyzing corporate investments and creates new ways to analyze unstructured data across various industries and use cases.


About the authors

Ruan Roloff is a ProServe Cloud Architect specializing in Data & AI at AWS. During his time at AWS, he was responsible for the data journey and data product strategy of customers across a range of industries, including finance, oil and gas, manufacturing, digital natives and public sector — helping these organizations achieve multi-million dollar use cases. Outside of work, Ruan likes to assemble and disassemble things, fish on the beach with friends, play SFII, and go hiking in the woods with his family.

Sai Devisetty is a Technical Account Manager at AWS. He helps customers in the Financial Services industry with their operations in AWS. Outside of work, Sai cherishes family time and enjoys exploring new destinations.

Madhur Prashant is a Generative AI Solutions Architect at Amazon Web Services. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and most of all optimal for customers. Outside of work, he loves doing yoga, hiking, spending time with his twin, and playing the guitar.

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

Read More

Streamline personalization development: How automated ML workflows accelerate Amazon Personalize implementation

Streamline personalization development: How automated ML workflows accelerate Amazon Personalize implementation

Crafting unique, customized experiences that resonate with customers is a potent strategy for boosting engagement and fostering brand loyalty. However, creating dynamic personalized content is challenging and time-consuming because of the need for real-time data processing, complex algorithms for customer segmentation, and continuous optimization to adapt to shifting behaviors and preferences—all while providing scalability and accuracy. Despite these challenges, the potential rewards make personalization a worthwhile pursuit for many businesses. Amazon Personalize is a fully managed machine learning (ML) service that uses your data to generate product and content recommendations for your users. Amazon Personalize helps accelerate time-to-value with custom models that are trained on data you provide, such as your users, catalog items, and the interactions between users and items, to generate personalized content and product recommendations. You can choose from various recipes—algorithms for specific use cases—to find the ones that fit your needs, such as recommending items that a user is most likely to engage with next given their past interactions, or the next best action that a user is most likely to take.

To maintain a personalized user experience, it’s crucial to implement machine learning operations (MLOps) practices, including continuous integration, deployment, and training of your ML models. MLOps facilitates seamless integration across various ML tools and frameworks, streamlining the development process. A robust machine learning solution for maintaining personalized experiences typically includes automated pipeline construction, as well as automated configuration, training, retraining, and deployment of personalization models. While services like Amazon Personalize offer a ready-to-use recommendation engine, establishing a comprehensive MLOps lifecycle for a personalization solution remains a complex undertaking. This process involves intricate steps to make sure that models remain accurate and relevant as user behaviors and preferences evolve over time.

This blog post presents an MLOps solution that uses AWS Cloud Development Kit (AWS CDK) and services like AWS Step Functions, Amazon EventBridge and Amazon Personalize to automate provisioning resources for data preparation, model training, deployment, and monitoring for Amazon Personalize.

Features and benefits

Deploying this solution offers improved scalability and traceability and allows you to quickly set up a production-ready environment to seamlessly deliver tailored recommendations to users using Amazon Personalize. This solution:

  • Streamlines the creation and management of Amazon Personalize resources.
  • Provides greater flexibility in resource management and selective feature activation.
  • Enhances readability and comprehensibility of complex workflows.
  • Enables event-driven architecture by publishing key Amazon Personalize events, allowing real-time monitoring, and enabling automated responses and integrations with other systems.
  • Includes automated creation of Amazon Personalize resources, including recommenders, solutions, and solution versions.
  • Facilitates end-to-end workflow automation for dataset import, model training, and deployment in Amazon Personalize.
  • Improves organization and modularity of complex processes through nested step functions.
  • Provides flexible activation of specific solution components using AWS CDK.

Solution overview

This solution uses AWS CDK layer 3 constructs. Constructs are the basic building blocks of AWS CDK applications. A construct is a component within your application that represents one or more AWS CloudFormation resources and their configuration.

Solution overview

The solution architecture is shown in the preceding figure and includes:

  1. An Amazon Simple Storage Service (Amazon S3) bucket is used to store interactions, users, and items datasets. In this step, you need to configure your bucket permissions so that Amazon Personalize and AWS Glue can access the datasets and input files.
  2. AWS Glue is used to preprocess the interactions, users, and item datasets. This step helps ensure that the datasets comply with the training data requirements of Amazon Personalize. For more information, see Preparing training data for Amazon Personalize.
  3. EventBridge is used to schedule regular updates by triggering the workflow, and to publish events related to resource provisioning. Because the Step Functions workflow is orchestrated based on the input configuration file, you use that configuration when setting up the scheduled start of Step Functions.
  4. The Step Functions workflow manages all resource provisioning of the Amazon Personalize dataset group (including datasets, schemas, event tracker, filters, solutions, campaigns, and batch inference jobs). Step Functions provides monitoring across the solution through event logs. You can also visually track the stages of your workflow in the Step Functions console. You can adjust the input configuration file to better fit your use case by defining schemas, recipes, and inference options. The solution workflow has the following steps:
    1. A preprocessing job that runs an AWS Glue job, if provided. This step facilitates any preprocessing of the data that might be required.
    2. Create a dataset group, which is a container for Amazon Personalize resources.
    3. Create a dataset import job for the datasets based on the defined S3 bucket.
    4. Create filters that define any filtering that you want to apply on top of the recommendations.
    5. Create an event tracker for ingesting real-time events, such as user interactions, which in turn influence the recommendations provided.
    6. Create solutions and recommenders for creating custom resources and domain recommenders.
    7. Create a campaign, batch inference job, or batch segment job for generating inferences for real-time, batch, and segmentation use cases, respectively.
  5. If you have a batch inference use case, then recommendations that match your inputs will be output into the S3 bucket that you defined in the input configuration file.
  6. An Amazon EventBridge event bus, where resource status notification updates are posted throughout the AWS Step Functions workflow.

Prerequisites

Before you deploy the AWS CDK stack, make sure that you have the following prerequisites in place:

  1. Install and configure AWS Command Line Interface (AWS CLI).
  2. Install Python 3.12 or newer.
  3. Install Node.js 20.16.0 or newer.
  4. Install AWS CDK 2.88.0 or newer.
  5. Install Docker 27.5.1 or newer (required for AWS Lambda function bundling).

Newer versions of the AWS CLI, Python, Node.js, and the AWS CDK are generally compatible, but this solution has been tested with the versions listed.

Deploy the solution

With the prerequisites in place, use the following steps to deploy the solution:

  1. Clone the repository to a new folder on your desktop using the following command:
git clone https://github.com/aws-samples/automate-mlops-personalize-cdk-pipeline.git
  2. Create a Python virtual environment for development:
python3 -m venv .venv
source .venv/bin/activate 
pip install -r requirements.txt
  3. Define an Amazon Personalize MLOps pipeline instance PersonalizeMlOpsPipeline (see personalize_pipeline_stack.py for the complete example, which also includes different inference options). In this walkthrough, you create a custom solution with an associated campaign and batch inference job:
PersonalizeMlOpsPipeline(
    self, 'PersonalizePipelineSolution',
    pre_processing_config={
        "job_class": PreprocessingGlueJobFlow
    },
    enable_filters=True,
    enable_event_tracker=True,
    recommendation_config=[
        {
            "type": "solutions",
            "inference_options": ["campaigns", "batchInferenceJobs"]
        },
        {
            "type": "recommenders"
        }
    ]

)

Where:

  • PersonalizePipelineSolution – The name of the pipeline solution stack
  • pre_processing_config – Configuration for the pre-processing job to transform raw data into a format usable by Amazon Personalize. To use AWS Glue jobs for preprocessing, specify the AWS Glue job class (PreprocessingGlueJobFlow) as the value of the job_class parameter. Currently, only AWS Glue jobs are supported. You can pass the name of the AWS Glue job that you need to run as a part of the input config. This doesn’t deploy the actual AWS Glue job responsible for pre-processing the files; the actual AWS Glue job must be created outside of this solution and its name passed as an input to the state machine. A sample AWS Glue job is supplied in the accompanying repo, which shows how pre-processing can be done.
  • enable_filters – A Boolean value to enable dataset filters for pre-processing. When set to true, the pipeline will create the state machines needed to create filters. Supported options are true or false. If you specify this value as false, the corresponding state machine is not deployed.
  • enable_event_tracker – A Boolean value to enable the Amazon Personalize event tracker. When set to true, the pipeline will create the state machines needed to create an event tracker. Supported options are true or false. If you specify this value as false, the corresponding state machine is not deployed.
  • recommendation_config – Configuration options for recommendations. The two types currently supported are solutions and recommenders. Within the solutions type, you can have multiple options such as campaigns, batchInferenceJobs, and batchSegmentJobs. Based on the selected options, the corresponding state machine and components are created. In the earlier example, we used campaigns and batchInferenceJobs as the option, which means that only the campaigns and batch inference job state machines will be deployed with the AWS CDK.

After the infrastructure is deployed you can also enable and disable certain options through the state machine input configuration file. You can use this AWS CDK code to control what components are deployed in your AWS environment and with the input config, you can select what components run.

Preprocessing: As an optional step, you can use an existing AWS Glue job for preprocessing your data before feeding it into Amazon Personalize, which uses this data to generate recommendations for your end users. While this post demonstrates the process using the MovieLens dataset, you can adapt it for your own datasets or custom processing needs. To do so, navigate to the glue_job folder and modify the movie_script.py file accordingly, or create an entirely new AWS Glue job tailored to your specific requirements. This preprocessing step, though optional, can be crucial in making sure that your data is optimally formatted for Amazon Personalize to generate accurate recommendations.

  1. Make sure that the AWS Glue job is configured to write its output to an S3 bucket. This bucket should then be specified as an input source in the Step Functions input configuration file.
  2. Verify that the AWS Glue service has the necessary permissions to access the S3 bucket mentioned in your script.
  3. In the input configuration, you’ll need to provide the name of the AWS Glue job that will be executed by the main state machine workflow. It’s crucial that this specified AWS Glue job runs without any errors, because any failures could potentially cause the entire state machine execution to fail.

Package and deploy the solution with AWS CDK, allowing for the most flexibility in development:

Before you can deploy the pipeline using AWS CDK, you need to set up AWS credentials on your local machine. You can refer to Set up AWS temporary credentials for more details.

# bootstrap CDK (required once - deploys a CDK bootstrap CloudFormation stack for assets)
cdk bootstrap
# build the solution
cdk synth
# build and deploy the solution
cdk deploy

Run the pipeline

Before initiating the pipeline, create the following resources and note the resource names for future reference.

  1. Set up an S3 bucket for dataset storage. If you plan to use the preprocessing step, this should be the same bucket as the output destination.
  2. Update the S3 bucket policy to grant Amazon Personalize the necessary access permissions. See Giving Amazon Personalize access to Amazon S3 resources for policy examples.
  3. Create an AWS Identity and Access Management (IAM) role to be used by the state machine for accessing Amazon Personalize resources.

You can find detailed instructions and policy examples in the GitHub repository.
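
As an illustration of step 2, the following sketch attaches the kind of bucket policy that grants Amazon Personalize read access to your datasets. The bucket name is hypothetical; treat the linked documentation and the repository as the authoritative sources for the policy.

import json
import boto3

bucket = "my-personalize-datasets-bucket"   # placeholder bucket name

policy = {
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {"Service": "personalize.amazonaws.com"},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        }
    ],
}

# Attach the policy so Amazon Personalize can read the dataset files.
boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))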

After you’ve set up these resources, you can create the input configuration file for the Step Functions state machine. If you configure the optional AWS Glue job, it creates the input files that the pipeline requires; refer to Configure the Glue Job to create the output files for more details.

Create input configuration

This input file is crucial because it contains all the essential information needed to create and manage your Amazon Personalize resources. The input configuration JSON acts as the input to the Step Functions state machine. The file can contain the following top-level objects:

  • datasetGroup
  • datasets
  • eventTracker
  • filters
  • solutions (can contain campaigns, batchInferenceJobs and batchSegmentJobs)
  • recommenders

Customize the configuration file according to your specific requirements and include or exclude sections based on the Amazon Personalize artifacts that you want to create. For the dataset import jobs in the datasets section, replace AWS_ACCOUNT_ID, S3_BUCKET_NAME, and IAM_ROLE_NAME with the appropriate values. The following is a snippet of the input configuration file. For a complete sample, see input_media.json.

"datasetImportJob": {
    "serviceConfig": {
        "dataSource": {
            "dataLocation": "s3://S3_BUCKET_NAME/users.csv"
        },
        "roleArn": "arn:aws:iam::AWS_ACCOUNT_ID:role/IAM_ROLE_NAME",
        "tags": [
            {
                "tagKey": "string",
                "tagValue": "string"
            }
        ],
        "importMode": "FULL",
        "jobName": "UsersDatasetImportJob"
    },
    "createNewJob": true
}

Likewise, if you’re using batch inference or batch segment jobs, remember to also update the S3 bucket name and IAM role ARN in those sections. It’s important to verify that you have the required input files for batch inference stored in your S3 bucket. Adjust the file paths in your configuration to accurately reflect the location of these files within your bucket structure. This helps ensure that Amazon Personalize can access the correct data when executing these batch processes.

Adjust the AWS Glue Job name in the configuration file if you have configured it as a part of the AWS CDK stack.

See the property table for a deep dive into each property and identify whether it’s optional or required.

Execute the pipeline

You can run the pipeline using the main state machine, named PersonalizePipelineSolution, from the Step Functions console, or set up a schedule in EventBridge (find the step-by-step process in the Schedule the workflow for continued maintenance of the solution section of this post).

  1. In the AWS Management Console for Step Functions, navigate to State machines and select the PersonalizePipelineSolution.

Personalize Pipeline Solution

  2. Choose Start Execution and enter the configuration file that you created for your use case based on the steps in the Create input configuration section.

Start Execution

  3. Choose Start execution and monitor the state machine execution. In the Step Functions console, you will find a visual representation of the workflow and can track what stage the execution is at. Event logs give you insight into the progress of the stages and information if there are any errors. The following figure is an example of a completed workflow:

Graph view of AWS Step Functions

  4. After the workflow finishes, you can view the resources in the Amazon Personalize console. For batch inference jobs specifically, you can locate the corresponding step under the Inference tasks section of the graph, and within the Custom Resources area of the Amazon Personalize console.
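
You can also start the same execution from code instead of the console, which is useful for testing or for wiring the pipeline into other automation. The following is a minimal boto3 sketch; the state machine ARN and the configuration file name are placeholders.

import boto3

sfn = boto3.client("stepfunctions")

# Read the input configuration created in the previous section.
with open("input_media.json") as f:
    input_config = f.read()

execution = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:PersonalizePipelineSolution",  # placeholder ARN
    input=input_config,
)
print(execution["executionArn"])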

Get recommendations (real-time inference)

After your pipeline has completed its run successfully, you can obtain recommendations. In the example configuration, we chose to deploy campaigns as the inference option. As a result, you’ll have access to a campaign that can provide real-time recommendations.

We use the Amazon Personalize console to get recommendations. Choose Dataset groups and select your dataset group name. Choose Campaigns and select your campaign name. Enter a user ID and item IDs of your choice to test personalized ranking; you can get the user ID and item IDs from the input file in the Amazon S3 bucket you configured.

Test Campaign Results
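
The same lookups can be done programmatically with the Amazon Personalize runtime API, which is how an application would typically consume the campaign. The campaign ARNs, user ID, and item IDs below are placeholders for illustration.

import boto3

runtime = boto3.client("personalize-runtime")

# Personalized ranking: reorder a candidate item list for a given user.
ranking = runtime.get_personalized_ranking(
    campaignArn="arn:aws:personalize:us-east-1:123456789012:campaign/my-ranking-campaign",  # placeholder
    userId="123",
    inputList=["item-1", "item-2", "item-3"],
)
for item in ranking["personalizedRanking"]:
    print(item["itemId"], item.get("score"))

# User personalization: top-N recommendations for the same user.
recs = runtime.get_recommendations(
    campaignArn="arn:aws:personalize:us-east-1:123456789012:campaign/my-user-personalization-campaign",  # placeholder
    userId="123",
    numResults=10,
)
print([item["itemId"] for item in recs["itemList"]])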


Get recommendations (batch inference)

If you have configured batch inference to run, start by verifying that the batch inference step has successfully completed in the Step Functions workflow. Then, use the Amazon S3 console to navigate to the destination S3 bucket for your batch inference job. If you don’t see an output file there, verify that you’ve provided the correct path for the input file in your input configuration.

Media Inference Output - S3 Bucket


Schedule the workflow for continued maintenance of the solution

While Amazon Personalize offers automatic training for solutions through its console or SDK, allowing users to set retraining frequencies such as every three days, this MLOps workflow provides an enhanced approach. By using EventBridge schedules you gain more precise control over the timing of retraining processes. Using this method, you can specify exact dates and times for retraining executions. To implement this advanced scheduling, you can configure an EventBridge schedule to trigger the Step Functions execution, giving you finer granularity in managing your machine learning model updates.

  1. Navigate to the Amazon EventBridge console, select EventBridge Schedule, and then choose Create schedule.

Amazon EventBridge

  2. You can establish a recurring schedule for executing your entire workflow. A key benefit of this solution is the enhanced control it offers over the specific date and time you want your workflow to run. This allows for precise timing of your processes, which you can use to align the workflow execution with your operational needs or optimal data processing windows.

Specify schedule detail

  3. Select AWS Step Functions (as shown below) as your target.

Select Target - Event Bridge

  4. Insert the input configuration file that you prepared previously as the input and choose Next.

Start Execution

An additional step you can take is to set up a dead-letter queue with Amazon Simple Queue Service (Amazon SQS) to handle failed Step Functions executions.

Monitoring and notification

To maintain the reliability, availability, and performance of Step Functions and your solution, set up monitoring and logging. You can set up an EventBridge rule to receive notifications about events that are of interest, such as batch inference being ready in the S3 bucket. Here is how you can set that up:

  1. Navigate to Amazon Simple Notification Service (Amazon SNS) console and create an SNS topic that will be the target for your event.

Create Topic

  2. Amazon SNS supports subscriptions for different endpoint types such as HTTP/HTTPS, email, Lambda, SMS, and so on. For this example, use an email endpoint.

Create subscription

  3. After you create the topic and the subscription, navigate to the EventBridge console and choose Create rule. Define the details associated with the event such as the name, description, and the event bus.

Define rule detail

  4. To set up the event rule, you’ll use the pattern form. You use this form to define the specific events that will trigger notifications. For the batch segment job completion step, you should configure the source and detail-type fields as follows:
{
    "source": ["solutions.aws.personalize"],
    "detail-type": ["Personalize BatchSegmentJob status change"]
}

Event pattern

  5. Select the SNS topic as your target and proceed.

Select target(s)

With this procedure, you have set up an EventBridge rule to receive email notifications when the batch segment job status changes. You can also set up logic based on your use case to trigger downstream processes, such as creating email campaigns with the results of your inference, by choosing different targets such as Lambda.
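
If you want to create the same rule and target from code rather than the console, a minimal boto3 sketch follows. The rule name and SNS topic ARN are placeholders, the example assumes events arrive on the default event bus (pass EventBusName if the solution publishes to a custom bus), and the topic's access policy must allow events.amazonaws.com to publish to it.

import json
import boto3

events = boto3.client("events")

event_pattern = {
    "source": ["solutions.aws.personalize"],
    "detail-type": ["Personalize BatchSegmentJob status change"],
}

# Create (or update) the rule that matches batch segment job status changes.
events.put_rule(
    Name="personalize-batch-segment-status",   # placeholder rule name
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)

# Send matched events to the SNS topic created earlier.
events.put_targets(
    Rule="personalize-batch-segment-status",
    Targets=[{"Id": "sns-notify", "Arn": "arn:aws:sns:us-east-1:123456789012:personalize-notifications"}],  # placeholder ARN
)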

Additionally, you can use Step Functions and Amazon Personalize monitoring through Amazon CloudWatch metrics. See Logging and Monitoring AWS Step Functions and Monitoring Amazon Personalize for more information.

Handling schema updates

Schema updates are available in Amazon Personalize for adding columns to the existing schema. Note that deleting columns from existing schemas isn’t currently supported. To update the schema, make sure that you’re modifying the schema in the input configuration passed to Step Functions. See Replacing a dataset’s schema to add new columns for more information.

Clean up

To avoid incurring additional costs, delete the resources you created during this solution walkthrough. You can clean up the solution by deleting the CloudFormation stack you deployed as part of the setup.

Using the console

  1. Sign in to the AWS CloudFormation console.
  2. On the Stacks page, select this solution’s installation stack.
  3. Choose Delete.

Using the AWS CLI

$ aws cloudformation delete-stack --stack-name <installation-stack-name>

Conclusion

This MLOps solution for Amazon Personalize offers a powerful, automated approach to creating and maintaining personalized user experiences at scale. By using AWS services like AWS CDK, Step Functions, and EventBridge, the solution streamlines the entire process from data preparation through model deployment and monitoring. The flexibility of this solution means that you can customize it to fit various use cases, and integration with EventBridge keeps models up to date. Delivering exceptional personalized experiences is critical for business growth, and this solution provides an efficient way to harness the power of Amazon Personalize to improve user engagement, customer loyalty, and business results. We encourage you to explore and adapt this solution to enhance your personalization efforts and stay ahead in the competitive digital landscape.

To learn more about the capabilities discussed in this post, check out Amazon Personalize features and the Amazon Personalize Developer Guide.

Additional resources:


About the Authors

Reagan Rosario brings over a decade of technical expertise to his role as a Sr. Specialist Solutions Architect in Generative AI at AWS. Reagan transforms enterprise systems through strategic implementation of AI-powered cloud solutions, automated workflows, and innovative architecture design. His specialty lies in guiding organizations through digital evolution—preserving core business value while implementing cutting-edge generative AI capabilities that dramatically enhance operations and create new possibilities.

Nensi Hakobjanyan is a Solutions Architect at Amazon Web Services, where she supports enterprise Retail and CPG customers in designing and implementing cloud solutions. In addition to her deep expertise in cloud architecture, Nensi brings extensive experience in Machine Learning and Artificial Intelligence, helping organizations unlock the full potential of data-driven innovation. She is passionate about helping customers through digital transformation and building scalable, future-ready solutions in the cloud.

Read More

Fast-track SOP processing using Amazon Bedrock

Fast-track SOP processing using Amazon Bedrock

Standard operating procedures (SOPs) are essential documents in the context of regulations and compliance. SOPs outline specific steps for various processes, making sure practices are consistent, efficient, and compliant with regulatory standards.

SOP documents typically include key sections such as the title, scope, purpose, responsibilities, procedures, documentation, citations (references), and a detailed approval and revision history. In FDA-regulated industries such as healthcare and life sciences, SOPs play a crucial role in defining manufacturing, clinical, laboratory, quality control, quality assurance, and regulatory compliance practices.

When a regulatory body like the US Food and Drug Administration (FDA) introduces changes to regulations, organizations are required to evaluate the changes against their internal SOPs. When necessary, they must update their SOPs to align with the regulation changes and maintain compliance.

In this post, we show different approaches using Amazon Bedrock to identify relationships between regulation changes and SOPs.

Challenge

In the healthcare and life sciences industry, regulatory authorities like the FDA and the European Medicines Agency (EMA) frequently update regulations across various areas, such as clinical trials, medical devices, drug development and approvals, quality risk management, systems and data management, and technology adoption. These regulatory updates often require organizations to correspondingly update their internal SOPs to align with the changes. This process is typically manual, requiring a team of subject matter experts to review the regulatory changes, screen the SOPs to identify relevance, determine the impact, and specify what needs to be updated. This manual approach adds significant overhead for companies and can result in review cycles lasting several days to months.

To address this challenge, we explore approaches that can help automate the identification of relationships between regulatory changes and SOPs. These approaches can also be extended to assess the impact of regulatory changes on an organization’s internal processes and documentation. By using automation, companies can streamline the SOP update process, reducing the time and resources required to maintain alignment with evolving regulatory requirements.

Sample Data

For this post, we used SOPs published by the FDA’s Center for Biologics Evaluation and Research. These publicly available SOPs are used by the FDA staff to guide their duties.

Specifically, we focused on the following SOPs related to biologics procedures. This narrow scope allowed us to dive deeper into a specific regulatory domain within the larger healthcare and life sciences industry.

In addition to the SOPs, we also used three of the FDA’s Biologics Guidance Documents to test the relationship between the regulatory documents and the SOPs.

These guidance documents describe the FDA’s policy interpretations on regulatory issues related to the biologics domain. They cover a wide range of topics, including processing, content, evaluation, approval, inspection, and enforcement of policies. The guidance documents also discuss specific products or issues relating to the design, production, labeling, promotion, manufacturing, and testing of regulated products.

We used the following specific FDA Biologics Guidance Documents for this analysis:

Approaches

A key step in assessing the impact of regulatory changes is to identify if a regulatory guidance is related to an organization’s SOPs. We used Amazon Bedrock along with Amazon Simple Storage Service (Amazon S3) to store the input dataset.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Our experiments used Anthropic’s Claude 3 Opus large language model (LLM) on Amazon Bedrock. However, you can use the broad selection of models available on Amazon Bedrock to experiment with alternative models and choose the one that best suits your specific requirements. Amazon Bedrock frequently releases updated versions of existing AI models that can be accessed and used by simply applying a configuration change, making it a highly flexible choice for deploying the latest AI capabilities.

We focused on the following approaches:

  1. Full document match – Comparing the full text of the regulatory guidance and SOP documents
  2. Text similarity – This approach consists of two options:
    • Vector embeddings – Measuring the semantic similarity between the guidance and SOP texts
    • Keyword search – Identifying relevant keywords and their occurrences in the documents
  3. Taxonomy topic match – Mapping the guidance and SOP content to a taxonomic structure to identify topical relationships

This post details the approaches we explored and the learnings from our experiments.

Full document match

The following diagram illustrates the full document match architecture.

In this approach, we compared each regulatory change to every SOP by passing the full contents of the SOP and the regulatory change to the model. The goal was to identify relationship between the regulatory change and the SOP.

The following is a sample prompt to check if an SOP is related to a regulation change:

Human:

Act as an expert who understands the Food and Drug Administration's Center for Biologics Evaluation and Research (CBER) internal Biologics procedures, which help their staff in performing their duties in assuring the safety, purity, potency, and effectiveness of biologics and related products (such as vaccines, live biotherapeutics (probiotics), blood products, and cell, tissue, and gene therapies).

Given a regulatory guidance change, and a document describing a CBER Biologics procedure of evaluation, research or review, tell me if the regulatory change is related to the procedure.
<regulatory_change>
{regchange}
</regulatory_change>

<biologics_procedure>
{procedure}
</biologics_procedure>
To answer this question, first, determine what the subject of the regulatory change is. For example, the subject might be something like "regulatory considerations for human cells and tissues" or "processing of clinical holds for new drug applications".

In order for the regulatory change to be related to the biologics procedure, the procedure description must include information about the topic. In other words, you should have a specific section from the procedure that is clearly related to the regulatory change in order to say that it is related.

For example, if a procedure description mentions cells or tissues, it could be related to how they need to be handled or manipulated. Similarly if a procedure mentions dispute resolution, it could be related to how a biologics procedure handles dispute resolution. Tell me whether the regulatory change is related to the biologics procedure. Don't guess, and explain your thinking. If you can't find specific elements in the procedure that are clearly related to the regulations, then the answer is that the procedure is not related to the regulatory change.

Your answer should end with the word "RELATED" or "UNRELATED", reflecting your overall evaluation of whether the procedure description is related to the regulation change or not.
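
To run this comparison across many document pairs, you can send the prompt to the model programmatically. The following sketch uses the Amazon Bedrock Converse API with Anthropic's Claude 3 Opus; the prompt template file and the guidance and SOP file paths are hypothetical stand-ins for the prompt shown above and your extracted document texts, and model availability varies by Region.

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

# Hypothetical inputs: the prompt template above (with {regchange} and {procedure}
# placeholders) plus the extracted text of one guidance document and one SOP.
prompt_template = open("full_document_match_prompt.txt").read()
reg_change_text = open("guidance/hctp_small_entity_guide.txt").read()
sop_text = open("sops/SOPP9151-061306.txt").read()
prompt = prompt_template.format(regchange=reg_change_text, procedure=sop_text)

response = bedrock.converse(
    modelId="anthropic.claude-3-opus-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": prompt}]}],
    inferenceConfig={"maxTokens": 1024, "temperature": 0},
)

answer = response["output"]["message"]["content"][0]["text"].strip()
# The prompt asks the model to end with RELATED or UNRELATED.
is_related = answer.endswith("RELATED") and not answer.endswith("UNRELATED")
print(is_related, answer)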

When we ran the full document matching approach using Amazon Bedrock across all the SOPs and the regulatory guidance documents in the dataset, the results showed accurate identification of related SOPs. For example, SOPP 9151 was correctly identified as the only SOP related to the Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry regulation change, with others being identified as unrelated:

Assistant:

SOPP9151-061306.txt:

The regulatory change is about the regulation of human cells, tissues, and cellular and tissue-based products (HCT/Ps). It provides guidance on the criteria for determining how HCT/Ps will be regulated, registration and listing requirements, donor eligibility requirements, current good tissue practice requirements, and FDA inspection and enforcement of HCT/P establishments.

The biologics procedure is specifically about handling requests for exemptions and alternative procedures for HCT/Ps under 21 CFR 1271.155. The procedure outlines the responsibilities and steps for CBER staff to process and respond to these exemption requests.

There is a clear relationship between the regulatory change and the biologics procedure. The regulatory change provides the overall framework and requirements for regulating HCT/Ps, while the biologics procedure focuses on one specific aspect within that framework - handling exemption requests from certain HCT/P requirements.

The procedure directly references and provides instructions for implementing 21 CFR 1271.155, which is part of the HCT/P regulations covered in the regulatory change document. So the biologics procedure is a detailed process that enables the agency to grant exemptions and alternatives to the HCT/P requirements established in the regulations.

In summary, the regulatory change establishes the rules for HCT/Ps, and the biologics procedure describes how the agency will process requests for exceptions to those rules. Therefore, the two documents are closely related.

RELATED

Similarly, SOPP 8005 was correctly identified as the only SOP related to the Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulation change, and the other SOPs were identified as unrelated.

Finally, SOP 8201 was also correctly identified as the only SOP related to the Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry regulation change.

These results demonstrate the effectiveness of the full document matching approach in accurately linking the relevant SOPs to their corresponding regulatory guidance documents.

Text similarity

The following diagram illustrates the text similarity match workflow.

In our second approach, we indexed the SOPs using either vector embeddings for semantic similarity or a keyword-based similarity approach. This allowed us to submit the contents of a regulatory change as a query and return the most similar SOP documents.

The steps involved in this text similarity approach are:

  1. Index the SOPs:
    • For a vector embeddings approach, we generated vector representations of the SOP contents using an LLM to capture semantic similarities.
    • For a keyword-based approach, we identified the most relevant keywords in each SOP and built an index based on their occurrences.
  2. Query the index:
    • For a given regulatory change, we submitted the text as a query to the SOP index.
    • The index then returned the most similar SOPs based on the chosen similarity metric (semantic or keyword-based).

Vector Search

For the text similarity approach, we used the open source in-memory vector database ChromaDB to generate the vector embeddings and perform the search.

We created a collection within ChromaDB containing all the SOP documents. We then independently queried each regulation guidance document text against this SOP collection. We used the default L2 distance algorithm, where a lower distance score indicates a closer match between the query and the indexed SOP documents.
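
The following is a minimal sketch of that indexing and query flow. The sop_documents dictionary (file name to extracted text) and the file paths are hypothetical stand-ins for your own loading code, and ChromaDB's default embedding function stands in for whichever embedding model you choose; the default L2 distance matches the scoring described above.

import chromadb

# Hypothetical stand-in: map of SOP file names to their extracted text.
sop_documents = {
    "SOPP9151-061306.txt": open("sops/SOPP9151-061306.txt").read(),
    "SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt": open("sops/SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt").read(),
}

client = chromadb.Client()
collection = client.create_collection(name="sop-collection")  # default embedding function, L2 distance

collection.add(
    ids=list(sop_documents.keys()),
    documents=list(sop_documents.values()),
)

# Query with the full text of one regulatory guidance document.
guidance_text = open("guidance/hctp_small_entity_guide.txt").read()  # hypothetical path
results = collection.query(query_texts=[guidance_text], n_results=len(sop_documents))

for doc_id, distance in zip(results["ids"][0], results["distances"][0]):
    print(doc_id, distance)   # lower distance = closer match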

Although the vector embedding-based text similarity approach identified the top matching SOP document in some cases, it also produced some inaccurate results.

For example, when querying with the Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry regulation, SOPP 9151 was correctly identified as the top match. However, a few other unrelated SOP documents also had low distance scores, which could potentially lead to them being misidentified as relevant:

[
    [
        "SOPP9151-061306.txt",
        0.640560507774353
    ],
    [
        "SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt",
        0.8971723914146423
    ],
    [
        "SOPP-8403-Issuance-Reissuance-and-Voluntary-Revocation-of-Biological-Product-Licenses-V6.txt",
        0.9069873690605164
    ],
    [
        "SOPP-8117-Issuing-Tracking-Numbers-eCTD-Format-V9.txt",
        1.069345235824585
    ],
    [
        "SOPP-8507V4-IllegitimateProds-Final.txt",
        1.1143898963928223
    ],
    [
        "SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt",
        1.157564640045166
    ],
    [
        "SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt",
        1.2106068134307861
    ],
    [
        "SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt",
        1.3174282312393188
    ]
]

Similarly, when querying with the Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulation, the vector search incorrectly identified SOPP 8717 as the best match, whereas SOPP 8005, which is more directly related to formal dispute resolution, had a higher distance score:

[
    [
        "SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt",
        0.848071277141571
    ],
       …
       …
]

Finally, for the regulation Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry, the vector search again identified SOPP 8717 as the top match, rather than the more relevant SOP 8201:

[
    [
        "SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt",
        0.8028254508972168
    ],
    …
    …
]

Keyword search

We also explored a keyword-based similarity method using the Whoosh Python search library. We first created an index of all the SOP documents using the Whoosh library. Then, for each regulation guidance document, we separately searched the index using a Whoosh query parser.

The Whoosh library returns a search score for each matched SOP document, where a higher score indicates a better match to the query.
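
A minimal sketch of that flow follows. As with the previous example, sop_documents and the guidance file path are hypothetical stand-ins for your own loading code; OR grouping is used so a long guidance text can match documents that share any of its keywords, and the score is printed alongside each hit.

import os
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID
from whoosh.qparser import QueryParser, OrGroup

# Hypothetical stand-in: map of SOP file names to their extracted text.
sop_documents = {
    "SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt": "...",
    "SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt": "...",
}

# Build the index with a stored file name field and a searchable content field.
schema = Schema(fname=ID(stored=True), content=TEXT)
os.makedirs("sop_index", exist_ok=True)
ix = create_in("sop_index", schema)

writer = ix.writer()
for fname, text in sop_documents.items():
    writer.add_document(fname=fname, content=text)
writer.commit()

# Search the index with the text of one regulatory guidance document.
guidance_text = open("guidance/clinical_holds_guidance.txt").read()  # hypothetical path
parser = QueryParser("content", ix.schema, group=OrGroup)
with ix.searcher() as searcher:
    for hit in searcher.search(parser.parse(guidance_text), limit=8):
        print(hit["fname"], hit.score)   # higher score = better keyword match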

When searching for the Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry regulation, the top matching document was incorrectly identified as SOPP 8301 instead of the more relevant SOPP 9151:

8 results found
<Hit {'fname': 'SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt'}> with score of 744.420492560645
<Hit {'fname': 'SOPP-8507V4-IllegitimateProds-Final.txt'}> with score of 609.3431135327576
<Hit {'fname': 'SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt'}> with score of 588.5899332740212
<Hit {'fname': 'SOPP-8403-Issuance-Reissuance-and-Voluntary-Revocation-of-Biological-Product-Licenses-V6.txt'}> with score of 582.278310231856
<Hit {'fname': 'SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt'}> with score of 449.11608593886564
<Hit {'fname': 'SOPP-8117-Issuing-Tracking-Numbers-eCTD-Format-V9.txt'}> with score of 377.9723456561558
<Hit {'fname': 'SOPP9151-061306.txt'}> with score of 328.67972016789844
<Hit {'fname': 'SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt'}> with score of 160.6985781375766

Similarly, when searching for the Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulation using the keyword-based similarity approach, the Whoosh library again incorrectly identified SOPP 8301 as the top matching document, instead of the more relevant SOPP 8005:

8 results found
<Hit {'fname': 'SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt'}> with score of 721.9535225922815
…
…

In contrast with the previous examples, when searching for the Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry regulation, SOP 8201 was correctly identified as the top matching document:

8 results found
<Hit {'fname': 'SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt'}> with score of 503.3284407067554
…
…

These results indicate that although the keyword-based similarity approach can be effective in some cases, it might also struggle to accurately identify the most relevant SOPs, similar to the challenges faced with the vector embedding-based method.

Taxonomy-based topic matching

The following diagram illustrates the taxonomy-based topic matching architecture.

In our third approach, we focused on building a hierarchical taxonomy of topics associated with the subject of biologics procedures. This taxonomy-based approach aimed to capture the topical relationships between the regulatory guidance documents and the SOPs. It also has the potential to be cost-effective, because the entire SOP or input document doesn’t need to be passed with the prompt for every query to the LLM; performing a full text match between an SOP and a regulatory change becomes increasingly expensive as the input documents grow in size.

The key steps in this approach were:

Step 1: We constructed a multi-level taxonomy that organized the topics related to biologics procedures. The taxonomy included a hierarchy of main topics, each of which could include subtopics.

The following is an example prompt for building the taxonomy:

Human:

Act as an expert in Biologics procedures for the Food & Drug Administration's Center for Biologics Evaluation and Research (CBER), which help their staff in performing their duties in assuring the safety, purity, potency, and effectiveness of biologics and related products (such as vaccines, live biotherapeutics (probiotics), blood products, and cell, tissue, and gene therapies). Biologics procedures help CBER staff in regulating administration and management of biologics evaluations and reviews including but not limited to clinical studies, electronic submissions, dispute resolutions and management of biologics and related products

I want you to create a hierarchy or taxonomy of topics relating to the biologics procedures. For example, some of the topics may be related to

    - Administrative Guidances

    - Adverse Events and Product Deviation Guidances

    - Application Submission Guidances

    - Biosimilars Guidances

    - Clinical Guidances

    - CMC & GMP Guidances

    - Devices Guidances

    - Generics Guidances

    - Labeling and Promotion Guidances

    - Allergenics Guidances

    - Blood Guidances

    - Cellular and Gene Therapy Guidances

    - Tissue Guidances

    - Vaccine and Related Biological Product Guidances

    - Xenotransplantation Guidances

    - Communication

    - Marketing Applications

    - Compliance

    - Medical Devices

    - Master Files

    - Investigational New Drugs

    - Communication

    - General Information

    - Therapeutic Products

Here's an example of how a topic might be organized:

<topic>
  <name>Master Files</name>
  <subtopic>
      <name>Drug Substance</name>
  </subtopic>

  <subtopic>
      <name>Packaging Material</name>
  </subtopic>

  <subtopic>
      <name>Preparation Information</name>
  </subtopic>

  <subtopic>
      <name>Clinical Study Data</name>
  </subtopic>

  <subtopic>
      <name>Non-Clinical Study Data</name>
  </subtopic>
</topic>

Your hierarchy should include all the topics covering the CBER's biologics procedures, and should be sufficiently exhaustive that it covers many different types of activities and duties in different areas of biologics administration, evaluation and review.

Think step by step, using these steps:

1. Think about all of the important topics that enable CBER's staff to perform their duties in administering, evaluating, reviewing and managing of biologics and related products.
2. Organize those topics into a hierarchy or taxonomy. There should be a single top-level item called <biologics_topics>.
3. Output that hierarchy in an XML format. Make sure that each item in the XML is in plain English and doesn't use special characters like the underscore.

The following is a sample taxonomy hierarchy returned by Amazon Bedrock when using Anthropic’s Claude 3 Opus LLM:

Assistant:

<biologics_topics>

  <topic>
    <name>Administrative Guidances</name>
    <subtopic>
      <name>Forms</name>
    </subtopic>
    <subtopic> 
      <name>Meetings</name>
    </subtopic>
    <subtopic>
      <name>Dispute Resolution</name> 
    </subtopic>
  </topic>

  <topic>
    <name>Submission Guidances</name>
    <subtopic>
      <name>Investigational New Drug Applications</name>
    </subtopic>
    <subtopic>
      <name>Biologics License Applications</name>
    </subtopic>
    <subtopic> 
      <name>Supplements and Other Changes to an Approved Application</name>
    </subtopic>
    <subtopic>
      <name>Biosimilars</name>
    </subtopic>
  </topic>

  <topic> 
    <name>Clinical Guidances</name>
    <subtopic>
      <name>Clinical Trial Design</name>
    </subtopic>
    <subtopic>
      <name>Human Subject Protection</name> 
    </subtopic>
    <subtopic>
      <name>Pediatric Studies</name>
    </subtopic>
  </topic>

  <topic>
    <name>Safety Guidances</name> 
    <subtopic>
      <name>Adverse Event Reporting</name>
    </subtopic>
    <subtopic>
      <name>Safety Assessments</name>
    </subtopic>
    <subtopic>
      <name>Pharmacovigilance</name>
    </subtopic>
  </topic>

  <topic>
    <name>Product Quality Guidances</name>
    <subtopic> 
      <name>Chemistry Manufacturing and Controls</name>
    </subtopic>
    <subtopic>
      <name>Good Manufacturing Practices</name>
    </subtopic>
    <subtopic>
      <name>Analytical Methods</name>
    </subtopic>
  </topic>

  <topic>
    <name>Labeling Guidances</name>
    <subtopic>
      <name>Prescribing Information</name> 
    </subtopic>
    <subtopic>
      <name>Medication Guides</name>
    </subtopic>
    <subtopic>
      <name>Promotional Labeling and Advertising</name>
    </subtopic>
  </topic>

  <topic>
    <name>Product Specific Guidances</name>
    <subtopic>
      <name>Vaccines</name>
    </subtopic> 
    <subtopic>
      <name>Blood and Blood Components</name>
    </subtopic>
    <subtopic>
      <name>Cellular and Gene Therapies</name> 
    </subtopic>
    <subtopic>
      <name>Allergenics</name>
    </subtopic>
    <subtopic> 
      <name>Tissues</name>
    </subtopic>
    <subtopic>
      <name>Xenotransplantation Products</name>
    </subtopic>
  </topic>

  <topic>
    <name>Compliance and Enforcement</name>
    <subtopic>
      <name>Inspections</name>
    </subtopic>
    <subtopic>
      <name>Recalls</name> 
    </subtopic>
    <subtopic>
      <name>Warning Letters</name>
    </subtopic>
  </topic>

</biologics_topics>

This hierarchy covers the key areas involved in CBER’s regulation of biologics, from administrative processes, to investigational products, marketing applications, clinical development, safety monitoring, quality control, labeling, and compliance for the major categories of biological products. The structure aims to capture the breadth of guidances and procedures that CBER staff rely on to ensure the safety, purity, potency and effectiveness of biologics and related products.
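
To feed this hierarchy into the labeling prompts used in the next steps, the XML can be flattened into a simple list of topic and subtopic names. The following is a minimal sketch, assuming the generated XML has been saved to a local file (the file name is an illustrative assumption):

import xml.etree.ElementTree as ET

# Parse the <biologics_topics> document produced by the model
with open("biologics_taxonomy.xml", encoding="utf-8") as f:
    root = ET.fromstring(f.read())

possible_topics = []
for topic in root.findall("topic"):
    possible_topics.append(topic.findtext("name").strip())
    for subtopic in topic.findall("subtopic"):
        possible_topics.append(subtopic.findtext("name").strip())

print(possible_topics[:5])
# ['Administrative Guidances', 'Forms', 'Meetings', 'Dispute Resolution', 'Submission Guidances']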

Step 2: Using the taxonomy we built, we labeled each SOP with the topics that are covered in the SOP. We instructed the model to select the most specific ones from the list of topics and subtopics.

The following is an example prompt to label SOPs with taxonomy topics using Anthropic’s Claude 3 Opus model:

Human:

Act as an expert who understands Biologics procedures.

The following information describes the details of a Biologics procedure. Choose the topics covered in the description only from the following list of possible topics:

{possible_topics}

<procedure_description>

{procedure}

</procedure_description>

Explain your thinking, and end the explanation by generating a JSON string array containing all of the matching topic names.

All topic names must come from the list of possible topics. Do not return a topic unless it is found within the list of valid topics, subtopics, and sub-subtopics. 

If you generate a topic name that is not found in the list of possible topics, it is an error.

Make sure to select the most specific leaf level sub-subtopic or subtopic available from the list of possible topics, subtopics, and sub-subtopics. Be sure your list covers all of the main points of the Biologics procedure description.
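
The following is a minimal sketch of sending this labeling prompt to Anthropic's Claude 3 Opus through the Amazon Bedrock Converse API with boto3; the model ID, inference settings, and prompt assembly are illustrative and may vary by Region and account setup:

import boto3

bedrock = boto3.client("bedrock-runtime")  # assumes AWS credentials and Region are configured

def label_sop_with_topics(procedure_text: str, possible_topics: str) -> str:
    """Ask the model to label one SOP with the most specific taxonomy topics."""
    prompt = (
        "Act as an expert who understands Biologics procedures.\n\n"
        "The following information describes the details of a Biologics procedure. "
        "Choose the topics covered in the description only from the following list of possible topics:\n\n"
        f"{possible_topics}\n\n"
        f"<procedure_description>\n{procedure_text}\n</procedure_description>\n\n"
        "Explain your thinking, and end the explanation by generating a JSON string array "
        "containing all of the matching topic names."
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-opus-20240229-v1:0",  # model ID may differ by Region or version
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0},
    )
    return response["output"]["message"]["content"][0]["text"]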

The following is the example SOP topic labeling output from Amazon Bedrock:

Assistant:

SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt: ['Administrative Guidances', 'Dispute Resolution']

SOPP-8117-Issuing-Tracking-Numbers-eCTD-Format-V9.txt: ['Administrative Guidances', 'Submission Guidances', 'Investigational New Drug Applications', 'Biologics License Applications']

SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt: ['Investigational New Drug Applications', 'Meetings', 'Clinical Trial Design', 'Adverse Event Reporting']

SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt: ['Administrative Guidances', 'Submission Guidances', 'Compliance and Enforcement']

SOPP-8403-Issuance-Reissuance-and-Voluntary-Revocation-of-Biological-Product-Licenses-V6.txt: ['Submission Guidances', 'Compliance and Enforcement', 'Labeling Guidances']

SOPP-8507V4-IllegitimateProds-Final.txt: ['Compliance and Enforcement', 'Inspections', 'Recalls']

SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt: ['Product Quality Guidances', 'Submission Guidances', 'Administrative Guidances']

SOPP9151-061306.txt: ['Cellular and Gene Therapies', 'Inspections', 'Tissues']

Step 3: To find the relationships between the regulatory guidance documents and the SOPs, we followed a similar approach to label the regulatory changes with the most specific topics or subtopics from the built taxonomy.

The following is an example prompt to label regulatory guidance documents with taxonomy topics:

Human:

Act as an expert who understands Biologics procedures. The following information describes a regulatory guidance or change that affects how certain Biologics procedures are performed. Please choose the main topic covered in the change description from the following list of possible topics:

{topics_list}

<regulatory_guidance_description>

{regulatory_guidance}

</regulatory_guidance_description>

Explain your thinking, and end the explanation by generating an XML item called <topic> with the relevant topic string in it. Make sure the topic is the most specific one available from the list of possible topics, subtopics, and descendants of those.

You must choose a single leaf-level XML node from the list of possible topics. The correct answer may be in a <topic> node, or <subtopic> or <subsubtopic>. Choose only one, and return the <name> of the selected node inside of an XML item called <topic>.

Assistant:

Step 4: If the topic assigned to a regulatory change document matches one of the topics of an SOP, we can conclude the regulatory change is related to the SOP.
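
In code, this final step is a simple membership check over the labels produced in Steps 2 and 3. A minimal sketch, using the labels shown earlier, might look like the following:

# SOP topic labels from Step 2 (abbreviated to two entries for illustration)
sop_topics = {
    "SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt": ["Administrative Guidances", "Dispute Resolution"],
    "SOPP9151-061306.txt": ["Cellular and Gene Therapies", "Inspections", "Tissues"],
}

def related_sops(reg_change_topic: str, sop_topics: dict) -> list:
    """Return the SOPs whose topic labels include the topic assigned to the regulatory change."""
    return [sop for sop, topics in sop_topics.items() if reg_change_topic in topics]

print(related_sops("Dispute Resolution", sop_topics))
# ['SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt']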

The following are example outputs of regulatory guidance topic alignment using Amazon Bedrock with the Anthropic Claude 3 Opus model.

The first example uses regulatory guidance Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry. The aligned taxonomy topic is Cellular and Gene Therapies. This topic aligns with SOPP 9151.

For reg change Regs/Regulation_Of_HCT_Ps_Small_Entity_Compliance_Guide.txt:

The regulatory guidance is focused on human cells, tissues, and cellular and tissue-based products (HCT/Ps). It provides an overview of the regulations in 21 CFR Part 1271 that apply to HCT/Ps.

The key topics covered include:

  • Criteria for regulation solely under section 361 of the PHS Act vs as drugs, devices and/or biological products
  • Registration and listing requirements for HCT/P establishments
  • Donor eligibility requirements
  • Current Good Tissue Practice (CGTP) requirements
  • FDA inspection and enforcement related to HCT/P establishments

Looking at the list of possible topics, the most relevant high-level topic is:

<topic>

  <name>Product Specific Guidances</name>

Within that, the most applicable subtopic is:

<subtopic>

  <name>Cellular and Gene Therapies</name>

Therefore, the most specific and relevant topic for this regulatory guidance is:

<topic>Cellular and Gene Therapies</topic>

The Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulatory guidance was aligned with the Dispute Resolution topic, which was also identified as a key topic covered in SOPP 8005.

For reg change Regs/Formal_Dispute_Resolution.txt:

The regulatory guidance description is about formal dispute resolution procedures for sponsors to appeal scientific and/or medical disputes to management above the division level at CDER and CBER. This falls under the broad topic of "Administrative Guidances", and the specific subtopic of "Dispute Resolution".

Therefore, the most specific and appropriate topic for this guidance is:

<topic>Dispute Resolution</topic>

The Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry regulatory guidance was mapped to the Investigational New Drug Applications topic, which was found to be aligned with the content covered in SOPP 8201, indicating a strong relationship between the guidance and this particular SOP:

For reg change Regs/Submitting_And_Reviewing_Complete_Responses_To_Clinical_Holds.txt:

The regulatory guidance is about the process for submitting and reviewing responses to clinical holds on INDs. The key points are:

  • When FDA imposes a clinical hold on an IND, the study cannot proceed until the sponsor submits a complete response addressing all the clinical hold issues, and FDA notifies the sponsor they can proceed.
  • The guidance describes what the sponsor should include in the complete response, how to submit it, and how FDA will review and respond to it within 30 days.
  • It also covers procedural details like how FDA will track and measure the 30-day response timeline for PDUFA goals.

Looking at the list of possible topics, this guidance falls under:

<topic>

  <name>Submission Guidances</name>

  <subtopic>

    <name>Investigational New Drug Applications</name>

  </subtopic>

</topic>

Since it is specifically about the process for responding to clinical holds on INDs, the most relevant leaf-level topic is:

<topic>Investigational New Drug Applications</topic>

The taxonomic alignment approach was effective in accurately identifying the relationships between the regulatory changes and the SOPs in the test dataset.

Learnings

The following table summarizes our observations. Misidentified SOPs are marked with an asterisk (*).

Regulation | Full text match | Taxonomy topic match | Text similarity: vector embeddings | Text similarity: keyword search
Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry | SOPP 9151 | SOPP 9151 | SOPP 9151 | SOPP 8301*
Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry | SOPP 8005 | SOPP 8005 | SOPP 8717* | SOPP 8301*
Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry | SOPP 8201 | SOPP 8201 | SOPP 8717* | SOPP 8201

The combination of the full text matching and taxonomy-based topic matching approaches, using Amazon Bedrock and the Anthropic Claude 3 Opus model, enabled accurate identification of the SOPs most closely related to the regulation guidance documents in the dataset. In contrast, the text similarity methods using vector embeddings and keyword search were less successful in correctly matching the SOPs to the relevant regulatory guidance documents.

Conclusion

In this post, we explored various approaches to quickly identify the relationships between regulatory changes and an organization’s SOPs, using Amazon Bedrock and Anthropic’s Claude 3 Opus model. The methods we evaluated included full text matching, text similarity using vector embeddings and keyword search, and a taxonomy-based topic alignment approach.

Our findings indicate that the full text matching and taxonomy-based topic matching were the most effective in accurately identifying the SOPs most closely related to the regulation guidance documents in the test dataset. In contrast, the text similarity techniques using vector embeddings and keyword search were less reliable in consistently matching the SOPs to the relevant regulatory documents.

Both the full text matching and taxonomy-based approaches can be viable options for organizations to assess the relationships between regulatory changes and their internal SOPs. The full text matching might provide more accurate results but requires providing the complete text of the SOP or input document, which could have cost implications.

The taxonomy-based approach, on the other hand, offers a structured way to map the content of the documents to a customizable topic hierarchy. Although the initial taxonomy might not be complete or fully accurate, it can be further enriched and tailored to an organization’s specific needs. If you choose a taxonomy-based approach, you can use a machine-generated starting point and then refine it to better suit your domain and use case requirements.

By adopting the taxonomy-based approach and adapting it to their specific needs, organizations can not only identify the relationships between regulatory changes and SOPs but also assess the potential impact of regulatory changes on their internal procedures. This can help streamline the process of screening SOPs against new regulations and fast-track impact assessments in regulated industries such as life sciences.

If you want to implement a similar solution in your AWS environment, reach out to your AWS account team for assistance.

I would like to acknowledge Greg Sommerville, Thomaz Silva and Murtuza Bootwala for their contributions to this blog. It couldn’t have been done without them.


About the Author

Ganesh Raam Ramadurai is a Senior Technical Program Manager at Amazon Web Services (AWS), where he leads the PACE (Prototyping and Cloud Engineering) team. He specializes in delivering innovative, AI/ML and Generative AI-driven prototypes that help AWS customers explore emerging technologies and unlock real-world business value. With a strong focus on experimentation, scalability, and impact, Ganesh works at the intersection of strategy and engineering—accelerating customer innovation and enabling transformative outcomes across industries.


Deploy Amazon SageMaker Projects with Terraform Cloud

Amazon SageMaker Projects empower data scientists to self-serve Amazon Web Services (AWS) tooling and infrastructure to organize all entities of the machine learning (ML) lifecycle, and further enable organizations to standardize and constrain the resources available to their data science teams in pre-packaged templates.

For AWS customers using Terraform to define and manage their infrastructure-as-code (IaC), the current best practice for enabling Amazon SageMaker Projects carries a dependency on AWS CloudFormation to facilitate integration between AWS Service Catalog and Terraform. This blocks enterprise customers whose IT governance prohibits the use of vendor-specific IaC, such as CloudFormation, from enabling SageMaker Projects with Terraform Cloud.

This post outlines how you can enable SageMaker Projects with Terraform Cloud, removing the CloudFormation dependency.

AWS Service Catalog engine for Terraform Cloud

SageMaker Projects are directly mapped to AWS Service Catalog products. To obviate the use of CloudFormation, these products must be designated as Terraform products that use the AWS Service Catalog Engine (SCE) for Terraform Cloud. This module, actively maintained by HashiCorp, contains AWS-native infrastructure for integrating Service Catalog with Terraform Cloud so that your Service Catalog products are deployed using the Terraform Cloud platform.

By following the steps in this post, you can use the Service Catalog engine to deploy SageMaker Projects directly from Terraform Cloud.

Prerequisites

To successfully deploy the example, you must have the following:

  1. An AWS account with the necessary permissions to create and manage SageMaker Projects and Service Catalog products. See the Service Catalog documentation for more information on Service Catalog permissions.
  2. An existing Amazon SageMaker Studio domain with an associated Amazon SageMaker user profile. The SageMaker Studio domain must have SageMaker Projects enabled. See Use quick setup for Amazon SageMaker AI.
  3. A Unix terminal with the AWS Command Line Interface (AWS CLI) and Terraform installed. See Installing or updating to the latest version of the AWS CLI and Install Terraform for more information about installation.
  4. An existing Terraform Cloud account with the necessary permissions to create and manage workspaces. See the following tutorials to quickly create your own account:
    1. HCP Terraform – Intro and Sign Up
    2. Log In to HCP Terraform from the CLI

See Terraform teams and organizations documentation for more information about Terraform Cloud permissions.

Deployment steps

  1. Clone the sagemaker-custom-project-templates repository from the AWS Samples GitHub to your local machine, update the submodules, and navigate to the mlops-terraform-cloud directory.
    $ git clone https://github.com/aws-samples/sagemaker-custom-project-templates.git
    $ cd sagemaker-custom-project-templates
    $ git submodule update --init --recursive
    $ cd mlops-terraform-cloud

The preceding code base creates a Service Catalog portfolio, adds the SageMaker Project template as a Service Catalog product to the portfolio, allows the SageMaker Studio role to access the Service Catalog product, and adds the necessary tags to make the product visible in SageMaker Studio. See Create Custom Project Templates in the SageMaker Projects documentation for more information about this process.

  2. Log in to your Terraform Cloud account
    $ terraform login

This prompts your browser to sign in to your HCP account and generates a security token. Copy this security token and paste it back into your terminal.

  3. Navigate to your AWS account and retrieve the SageMaker user role Amazon Resource Name (ARN) for the SageMaker user profile associated with your SageMaker Studio domain. This role is used to grant SageMaker Studio users permissions to create and manage SageMaker Projects.
    • In the AWS Management Console for Amazon SageMaker, choose Domains from the navigation pane
    • Select your Studio domain
    • Under User Profiles, select your user profile
    • In the User Details, copy the ARN
  4. Create a tfvars file with the necessary variables for the Terraform Cloud workspace
    $ cp terraform.tfvars.example terraform.tfvars

  5. Set the appropriate values in the newly created tfvars file. The following variables are required:
    tfc_organization = "my-tfc-organization"
    tfc_team = "aws-service-catalog"
    token_rotation_interval_in_days = 30
    sagemaker_user_role_arns = ["arn:aws:iam::XXXXXXXXXXX:role/service-role/AmazonSageMaker-ExecutionRole"]

Make sure that your desired Terraform Cloud (TFC) organization has the proper entitlements and that your tfc_team is unique for this deployment. See the Terraform Organizations Overview for more information on creating organizations.

  6. Initialize the Terraform Cloud workspace
    $ terraform init

  7. Apply the Terraform Cloud workspace
    $ terraform apply

  8. Go back to the SageMaker console using the user profile associated with the SageMaker user role ARN that you copied previously and choose Open Studio application
  9. In the navigation pane, choose Deployments and then choose Projects
  10. Choose Create project, select the mlops-tf-cloud-example product, and then choose Next
  11. In Project details, enter a unique name for the template and (optional) a project description. Choose Create
  12. In a separate tab or window, go back to your Terraform Cloud account’s Workspaces and you’ll see a workspace being provisioned directly from your SageMaker Project deployment. The naming convention of the workspace will be <ACCOUNT_ID>-<SAGEMAKER_PROJECT_ID>

Further customization

This example can be modified to include custom Terraform in your SageMaker Project template. To do so, define your Terraform in the mlops-product/product directory. When you're ready to deploy, archive and compress this Terraform using the following commands:

$ cd mlops-product
$ tar -czf product.tar.gz product

Cleanup

To remove the resources deployed by this example, run the following from the project directory:

$ terraform destroy

Conclusion

In this post, you defined, deployed, and provisioned a SageMaker Project custom template purely in Terraform. With no dependencies on other IaC tools, you can now enable SageMaker Projects strictly within your Terraform Enterprise infrastructure.


About the author

Max Copeland is a Machine Learning Engineer for AWS, leading customer engagements spanning ML-Ops, data science, data engineering, and generative AI.


How ZURU improved the accuracy of floor plan generation by 109% using Amazon Bedrock and Amazon SageMaker

ZURU Tech is on a mission to change the way we build, from town houses and hospitals to office towers, schools, apartment blocks, and more. Dreamcatcher is a user-friendly platform developed by ZURU that allows users with any level of experience to collaborate in the building design and construction process. With the simple click of a button, an entire building can be ordered, manufactured and delivered to the construction site for assembly.

ZURU collaborated with AWS Generative AI Innovation Center and AWS Professional Services to implement a more accurate text-to-floor plan generator using generative AI. With it, users can specify a description of the building they want to design using natural language. For example, instead of designing the foundation, walls, and key aspects of a building from scratch, a user could enter, “Create a house with three bedrooms, two bathrooms, and an outdoor space for entertainment.” The solution would generate a unique floor plan within the 3D design space, allowing users with a non-technical understanding of architecture and construction to create a well-designed house.

In this post, we show you why a solution using a large language model (LLM) was chosen. We explore how model selection, prompt engineering, and fine-tuning can be used to improve results. And we explain how the team made sure they could iterate quickly through an evaluation framework using key services such as Amazon Bedrock and Amazon SageMaker.

Understanding the challenge

The foundation for generating a house within Dreamcatcher’s 3D building system is to first confirm we can generate a 2D floor plan based on the user’s prompt. The ZURU team found that generating 2D floor plans, such as the one in the following image, using different machine learning (ML) techniques requires success across two key criteria.

First, the model must understand rooms, the purpose of each room, and their orientation to one another within a two-dimensional vector system. This can also be described as how well the model can adhere to the features described in a user’s prompt. Second, there is also a mathematical component: making sure rooms adhere to criteria such as specific dimensions and floor space. To be certain that they were on the right track and to allow for fast R&D iteration cycles, the ZURU team created a novel evaluation framework that measures the output of different models against these two key metrics.

The ZURU team initially looked at using generative adversarial networks (GAN) for floor plan generation, but experimentation with a GPT2 LLM had positive results based on the test framework. This reinforced the idea that an LLM-based approach could provide the required accuracy for a text-to–floor plan generator.

Improving the results

To improve on the results of the GPT2 model, we worked together and defined two further experiments. The first was a prompt engineering approach: using Anthropic’s Claude 3.5 Sonnet in Amazon Bedrock, the team evaluated the impact of a leading proprietary model with contextual examples included in the prompts. The second approach focused on fine-tuning Llama 3 8B variants to evaluate the improvement in accuracy when the model weights are directly influenced using high-quality examples.

Dataset preparation and analysis

To create the initial dataset, floor plans from thousands of houses were gathered from publicly available sources and reviewed by a team of in-house architects. To streamline the review process, the ZURU team built a custom application with a simple yes/no decision mechanism similar to those found in popular social matching applications, allowing architects to quickly approve plans compatible with the ZURU building system or reject those with disqualifying features. This intuitive approach significantly accelerated ZURU’s evaluation process while maintaining clear decision criteria for each floor plan.

To further enhance this dataset, we began with careful dataset preparation, filtering out low-quality data (30%) by evaluating the metric score of the ground truth dataset. Following this filtering mechanism, data points not achieving 100% accuracy on instruction adherence were removed from the training dataset. This data preparation technique helped to improve the efficiency and quality of the fine-tuning and prompt engineering by more than 20%.

During our exploratory data analysis we found that the dataset contained prompts that can match multiple floor plans as well as floor plans that could match multiple prompts. By moving all related prompt and floor plan combinations to the same data split (either training, validation, or testing) we were able to prevent data leakage and promote robust evaluation.

Prompt engineering approach

As part of our approach, we implemented dynamic matching for few-shot prompting that is different from traditional static sampling methods. Combining this with the implementation of prompt decomposition, we could increase the overall accuracy of the generated floor plan content.

With a dynamic few-shot prompting methodology, we retrieve the most relevant examples at run time based on the details of the input prompt from a high-quality dataset and provide this as part of the prompt to the generative AI model.

The dynamic few-shot prompting approach is further enhanced by prompt decomposition, where we break down complex tasks into smaller, more manageable components to achieve better results from language models. By decomposing queries, each component can be optimized for its specific purpose. We found that combining these methods resulted in improved relevancy in example selection and lower latency in retrieving the example data, leading to better performance and higher quality results.
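
The following is a simplified sketch of the dynamic few-shot pattern using the Amazon Bedrock Knowledge Bases Retrieve API and the Converse API; the knowledge base ID, model ID, and prompt wording are illustrative assumptions, and the sketch injects retrieved text directly rather than performing the two-step identifier lookup described in the architecture that follows:

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")
bedrock = boto3.client("bedrock-runtime")

def generate_floor_plan(user_prompt: str, knowledge_base_id: str) -> str:
    # 1. Retrieve the most relevant floor plan examples for this request at run time
    retrieval = agent_runtime.retrieve(
        knowledgeBaseId=knowledge_base_id,  # placeholder knowledge base ID
        retrievalQuery={"text": user_prompt},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3}},
    )
    examples = "\n\n".join(r["content"]["text"] for r in retrieval["retrievalResults"])

    # 2. Inject the retrieved examples into the generation prompt (dynamic few-shot prompting)
    prompt = (
        "You generate 2D floor plans from natural language requests.\n\n"
        f"Here are example floor plans for similar requests:\n{examples}\n\n"
        f"Request: {user_prompt}\nGenerate a new floor plan in the same format."
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # model ID may vary by Region
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]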

Prompt engineering architecture

The workflow and architecture implemented for prototyping shown in the following figure demonstrates a systematic approach to AI model optimization. When a user query such as “Build me a house with three bedrooms and two bathrooms” is entered, the workflow follows these steps:

  1. We use prompt decomposition to execute three smaller tasks that retrieve highly relevant examples that match the same features for a house that the user has requested
  2. We use the relevant examples and inject it into the prompt to perform dynamic few-shot prompting to generate a floor plan
  3. We use the reflection technique to ask the generative AI model to self-reflect and assess that the generated content adheres to our requirements

Deep dive on workflow and architecture

The first step in our workflow is to understand the unique features of the house, which we can use as search criteria to find the most relevant examples in the subsequent steps. For this step, we use Amazon Bedrock, which provides a serverless API-driven endpoint for inference. From the wide range of generative AI models offered by Amazon Bedrock, we choose Mistral 7B, which provides the right balance between cost, latency, and accuracy required for this small decomposed step.

The second step is to search for the most relevant examples using the unique features we found. We use Amazon Bedrock Knowledge Bases backed by Amazon OpenSearch Serverless as a vector database to implement metadata filtering and hybrid search to retrieve the most relevant record identifiers. Amazon Simple Storage Service (Amazon S3) is used for storage of the data set, and Amazon Bedrock Knowledge Bases provides a managed solution for vectorizing and indexing the metadata into the vector database.

In the third step, we retrieve the actual floor plan data by record identifier using Amazon DynamoDB. By splitting the search and retrieval of floor plan examples into two steps, we were able to use purpose-built services: Amazon OpenSearch for low-latency search and DynamoDB for low-latency key-value data retrieval, leading to optimized performance.

After retrieving the most relevant examples for the user’s prompt, in step four we use Amazon Bedrock and Anthropic’s Claude 3.5 Sonnet as a model with leading benchmarks in deep reasoning and mathematics to generate our new floor plan.

Finally, in step five, we implement reflection. We use Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock again and pass the original prompt, instructions, examples, and newly generated floor plan back with a final instruction for the model to reflect, double-check its generated floor plan, and correct mistakes.

Fine-tuning approach

We explored two methods for optimizing LLMs for automated floorplan generation: full parameter fine-tuning and Low-Rank Adaptation (LoRA)–based fine-tuning. Full fine-tuning adjusts all LLM parameters, which requires significant memory and training time. In contrast, LoRA tunes only a small subset of parameters, reducing memory requirements and training time.
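
The post doesn't specify the exact training stack, so the following is a generic sketch of a LoRA setup using the Hugging Face PEFT library, included only to illustrate how few parameters LoRA trains compared to full fine-tuning; the model ID and hyperparameters are assumptions:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model_id = "meta-llama/Meta-Llama-3.1-8B"  # illustrative model ID (requires model access on Hugging Face)
model = AutoModelForCausalLM.from_pretrained(base_model_id)

lora_config = LoraConfig(
    r=16,                                 # low-rank dimension of the adapter matrices
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attach adapters to the attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints the small fraction of parameters LoRA actually updates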

Workflow and architecture

We implemented our workflow containing the data processing, fine-tuning, and inference and testing steps shown in the following figure, all within a SageMaker JupyterLab notebook provisioned with an ml.p4d.24xlarge instance, giving us access to NVIDIA A100 GPUs. Because we used a Jupyter notebook and ran all parts of our workflow interactively, we were able to iterate quickly and debug our experiments while maturing the training and testing scripts.

Deep dive on fine tuning workflow

One key insight from our experiments was the critical importance of dataset quality and diversity. Further to our initial dataset preparation, when fine-tuning a model, we found that carefully selecting training samples with larger diversity helped the model learn more robust representations. Additionally, although larger batch sizes generally improved performance (within memory constraints), we had to carefully balance this against computational resources (320 GB of GPU memory in an ml.p4d.24xlarge instance) and training time (ideally within 1–2 days).

We conducted several iterations to optimize performance, experimenting with various approaches including initial few-sample quick instruction fine-tuning, larger dataset fine-tuning, fine-tuning with early stopping, comparing Llama 3.1 8B and Llama 3 8B models, and varying instruction length in fine-tuning samples. Through these iterations, we found that full fine-tuning of the Llama 3.1 8B model using a curated dataset of 200,000 samples produced the best results.

The training process for full fine-tuning Llama 3.1 8B with BF16 and a microbatch size of three involved eight epochs with 30,000 steps, taking 25 hours to complete. In contrast, the LoRA approach showed significant computational efficiency, requiring only 2 hours of training time and producing an 89 MB checkpoint.

Evaluation framework

The testing framework implements an efficient evaluation methodology that optimizes resource utilization and time while maintaining statistical validity. Key components include:

  1. A prompt deduplication system that identifies and consolidates duplicate instructions in the test dataset, reducing computational overhead and enabling faster iteration cycles for model improvement
  2. A distribution-based performance assessment that filters unique test cases, promotes representative sampling through statistical analysis, and projects results across the full dataset
  3. A metric-based evaluation that implements scoring across key criteria enabling comparative analysis against both the baseline GPT2 model and other approaches.

Results and business impact

To understand how well each approach in our experiment performed, we used the evaluation framework and compared several key metrics. For the purposes of this post, we focus on two of them. The first reflects how well the model was able to follow users’ instructions to include the features required in the house. The second looks at how well the features of the house adhered to the instructions in terms of mathematics, positioning, and orientation. The following image shows these results in a graph.

We found that the prompt engineering approach with Anthropic’s Claude 3.5 Sonnet as well as the full fine-tuning approach with Llama 3.1 8B increased the instruction adherence quality over the baseline GPT2 model by 109%, showing that, depending on a team’s skill sets, either approach can be used to improve how well an LLM understands requirements when generating content such as floor plans.

When looking at mathematical correctness, our prompt engineering approach wasn’t able to create significant improvements over the baseline, but full fine-tuning was a clear winner with a 54% increase over the baseline GPT2 results.

The LoRA-based tuning approach achieved lower performance scores: about 20% less on instruction adherence and 50% less on mathematical correctness compared to full fine-tuning, demonstrating the tradeoffs that can be made between time, cost, and hardware on one hand and model accuracy on the other.

Conclusion

ZURU Tech has set its vision on fundamentally transforming the way we design and construct buildings. In this post, we highlighted the approach to building and improving a text-to–floor plan generator based on LLMs to create a highly usable and streamlined workflow within a 3D-modeling system. We dived into advanced concepts of prompt engineering using Amazon Bedrock and detailed approaches to fine-tuning LLMs using Amazon SageMaker, showing the different tradeoffs you can make to significantly improve the accuracy of the generated content.

To learn more about the Generative AI Innovation Center program, get in touch with your account team.


About the Authors

Federico Di Mattia is the team leader and Product Owner of ZURU AI at ZURU Tech in Modena, Italy. With a focus on AI-driven innovation, he leads the development of Generative AI solutions that enhance business processes and drive ZURU’s growth.

Niro Amerasinghe is a Senior Solutions Architect based out of Auckland, New Zealand. With experience in architecture, product development, and engineering, he helps customers in using Amazon Web Services (AWS) to grow their businesses.

Haofei Feng is a Senior Cloud Architect at AWS with over 18 years of expertise in DevOps, IT Infrastructure, Data Analytics, and AI. He specializes in guiding organizations through cloud transformation and generative AI initiatives, designing scalable and secure GenAI solutions on AWS. Based in Sydney, Australia, when not architecting solutions for clients, he cherishes time with his family and Border Collies.

Sheldon Liu is an applied scientist, ANZ Tech Lead at the AWS Generative AI Innovation Center. He partners with enterprise customers across diverse industries to develop and implement innovative generative AI solutions, accelerating their AI adoption journey while driving significant business outcomes.

Xuefeng Liu leads a science team at the AWS Generative AI Innovation Center in the Asia Pacific regions. His team partners with AWS customers on generative AI projects, with the goal of accelerating customers’ adoption of generative AI.

Simone Bartoli is a Machine Learning Software Engineer at ZURU Tech, in Modena, Italy. With a background in computer vision, machine learning, and full-stack web development, Simone specializes in creating innovative solutions that leverage cutting-edge technologies to enhance business processes and drive growth.

Marco Venturelli is a Senior Machine Learning Engineer at ZURU Tech in Modena, Italy. With a background in computer vision and AI, he leverages his experience to innovate with generative AI, enriching the Dreamcatcher software with smart features.

Stefano Pellegrini is a Generative AI Software Engineer at ZURU Tech in Italy. Specializing in GAN and diffusion-based image generation, he creates tailored image-generation solutions for various departments across ZURU.

Enrico Petrucci is a Machine Learning Software Engineer at ZURU Tech, based in Modena, Italy. With a strong background in machine learning and NLP tasks, he currently focuses on leveraging Generative AI and Large Language Models to develop innovative agentic systems that provide tailored solutions for specific business cases.


Going beyond AI assistants: Examples from Amazon.com reinventing industries with generative AI

Generative AI revolutionizes business operations through various applications, including conversational assistants such as Amazon’s Rufus and Amazon Seller Assistant. Additionally, some of the most impactful generative AI applications operate autonomously behind the scenes, an essential capability that empowers enterprises to transform their operations, data processing, and content creation at scale. These non-conversational implementations, often in the form of agentic workflows powered by large language models (LLMs), execute specific business objectives across industries without direct user interaction.

Non-conversational applications offer unique advantages such as higher latency tolerance, batch processing, and caching, but their autonomous nature requires stronger guardrails and exhaustive quality assurance compared to conversational applications, which benefit from real-time user feedback and supervision.

This post examines four diverse Amazon.com examples of such generative AI applications:

  1. Creating high-quality product listings on Amazon.com
  2. Generative AI-powered prescription processing in Amazon Pharmacy
  3. Generative AI-powered customer review highlights
  4. Amazon Ads AI-powered creative image and video generation

Each case study reveals different aspects of implementing non-conversational generative AI applications, from technical architecture to operational considerations. Throughout these examples, you will learn how the comprehensive suite of AWS services, including Amazon Bedrock and Amazon SageMaker, are the key to success. Finally, we list key learnings commonly shared across these use cases.

Creating high-quality product listings on Amazon.com

Creating high-quality product listings with comprehensive details helps customers make informed purchase decisions. Traditionally, selling partners manually entered dozens of attributes per product. The new generative AI solution, launched in 2024, transforms this process by proactively acquiring product information from brand websites and other sources to improve the customer experience across numerous product categories.

Generative AI simplifies the selling partner experience by enabling information input in various formats such as URLs, product images, or spreadsheets and automatically translating this into the required structure and format. Over 900,000 selling partners have used it, with nearly 80% of generated listing drafts accepted with minimal edits. AI-generated content provides comprehensive product details that help with clarity and accuracy, which can contribute to product discoverability in customer searches.

For new listings, the workflow begins with selling partners providing initial information. The system then generates comprehensive listings using multiple information sources, including titles, descriptions, and detailed attributes. Generated listings are shared with selling partners for approval or editing.

For existing listings, the system identifies products that can be enriched with additional data.

Data integration and processing for a large variety of outputs

The Amazon team built robust connectors for internal and external sources with LLM-friendly APIs using Amazon Bedrock and other AWS services to seamlessly integrate into Amazon.com backend systems.

A key challenge is synthesizing diverse data into cohesive listings across more than 50 attributes, both textual and numerical. LLMs require specific control mechanisms and instructions to accurately interpret ecommerce concepts because they might not perform optimally with such complex, varied data. For example, LLMs might misinterpret “capacity” in a knife block as dimensions rather than number of slots, or mistake “Fit Wear” as a style description instead of a brand name. Prompt engineering and fine-tuning were extensively used to address these cases.

Generation and validation with LLMs

The generated product listings should be complete and correct. To help achieve this, the solution implements a multistep workflow using LLMs for both generation and validation of attributes. This dual-LLM approach helps prevent hallucinations, which is critical when dealing with safety hazards or technical specifications. The team developed advanced self-reflection techniques to make sure the generation and validation processes complement each other effectively.
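
The exact models and prompts used internally aren't public, but the dual-LLM pattern itself can be sketched with two Amazon Bedrock calls, one to generate and one to validate; the model IDs, prompts, and retry policy below are illustrative assumptions:

import boto3

bedrock = boto3.client("bedrock-runtime")

GENERATOR_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # illustrative generator model
VALIDATOR_ID = "anthropic.claude-3-haiku-20240307-v1:0"     # illustrative validator model

def ask(model_id: str, prompt: str) -> str:
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

def generate_listing(source_info: str, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        draft = ask(
            GENERATOR_ID,
            f"Draft a product listing (title, description, key attributes) from this source:\n{source_info}",
        )
        verdict = ask(
            VALIDATOR_ID,
            "Check the listing against the source for unsupported or hallucinated claims. "
            f"Answer PASS or FAIL with reasons.\n\nSource:\n{source_info}\n\nListing:\n{draft}",
        )
        if verdict.strip().upper().startswith("PASS"):
            return draft
    raise RuntimeError("Validation failed after retries; route the listing to human review")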

The following figure illustrates the generation process with validation both performed by LLMs.

Figure 1. Product Listing creation workflow

Multi-layer quality assurance with human feedback

Human feedback is central to the solution’s quality assurance. The process includes Amazon.com experts for initial evaluation and selling partner input for acceptance or edits. This provides high-quality output and enables ongoing enhancement of AI models.

The quality assurance process includes automated testing methods combining ML-, algorithm-, or LLM-based evaluations. Failed listings undergo regeneration, and successful listings proceed to further testing. Using causal inference models, we identify underlying features affecting listing performance and opportunities for enrichment. Ultimately, listings that pass quality checks and receive selling partner acceptance are published, making sure customers receive accurate and comprehensive product information.

The following figure illustrates the workflow of going to production with testing, evaluation, and monitoring of product listing generation.

Figure 2. Product Listing testing and human in the loop workflow

Application-level system optimization for accuracy and cost

Given the high standards for accuracy and completeness, the team adopted a comprehensive experimentation approach with an automated optimization system. This system explores various combinations of LLMs, prompts, playbooks, workflows, and AI tools to iterate for higher business metrics, including cost. Through continuous evaluation and automated testing, the product listing generator effectively balances performance, cost, and efficiency while staying adaptable to new AI developments. This approach means customers benefit from high-quality product information, and selling partners have access to cutting-edge tools for creating listings efficiently.

Generative AI-powered prescription processing in Amazon Pharmacy

Building upon the human-AI hybrid workflows previously discussed in the seller listing example, Amazon Pharmacy demonstrates how these principles can be applied in a Health Insurance Portability and Accountability Act (HIPAA)-regulated industry. Having shared a conversational assistant for patient care specialists in the post Learn how Amazon Pharmacy created their LLM-based chat-bot using Amazon SageMaker, we now focus on automated prescription processing, which you can read about in The life of a prescription at Amazon Pharmacy and the following research paper in Nature Magazine.

At Amazon Pharmacy, we developed an AI system built on Amazon Bedrock and SageMaker to help pharmacy technicians process medication directions more accurately and efficiently. This solution integrates human experts with LLMs in creation and validation roles to enhance precision in medication instructions for our patients.

Agentic workflow design for healthcare accuracy

The prescription processing system combines human expertise (data entry technicians and pharmacists) with AI support for direction suggestions and feedback. The workflow, shown in the following diagram, begins with a pharmacy knowledge-based preprocessor standardizing raw prescription text in Amazon DynamoDB, followed by fine-tuned small language models (SLMs) on SageMaker identifying critical components (dosage, frequency).

Figure 3. (a) Data entry technician and pharmacist workflow with two GenAI modules, (b) Suggestion module workflow and (c) Flagging module workflow

The system seamlessly integrates experts such as data entry technicians and pharmacists, where generative AI complements the overall workflow towards agility and accuracy to better serve our patients. A direction assembly system with safety guardrails then generates instructions for data entry technicians to create their typed directions through the suggestion module. The flagging module flags or corrects errors and enforces further safety measures as feedback provided to the data entry technician. The technician finalizes highly accurate, safe-typed directions for pharmacists who can either provide feedback or execute the directions to the downstream service.

One highlight from the solution is the use of task decomposition, which empowers engineers and scientists to break the overall process into a multitude of steps with individual modules made of substeps. The team extensively used fine-tuned SLMs. In addition, the process employs traditional ML procedures such as named entity recognition (NER) or estimation of final confidence with regression models. Using SLMs and traditional ML in such contained, well-defined procedures significantly improved processing speed while maintaining rigorous safety standards due to incorporation of appropriate guardrails on specific steps.
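
The following toy sketch illustrates the decomposition pattern only: each substep is a small, testable function with its own check. Simple regular expressions stand in for the fine-tuned SLMs and guardrails; this is not the Amazon Pharmacy implementation.

import re

def extract_sig_components(raw_text: str) -> dict:
    # Stand-in for a fine-tuned SLM step that identifies critical components such as dose and frequency
    dose = re.search(r"\d+\s*(?:mg|mcg|tablets?)", raw_text, re.I)
    freq = re.search(r"(?:once|twice|three times)\s+daily", raw_text, re.I)
    return {"dose": dose.group(0) if dose else None, "frequency": freq.group(0) if freq else None}

def passes_safety_guardrails(components: dict) -> bool:
    # Stand-in for step-specific guardrails: require every component before suggesting anything
    return all(components.values())

def suggest_directions(raw_text: str):
    components = extract_sig_components(raw_text)
    if not passes_safety_guardrails(components):
        return None  # no suggestion; the technician types the directions manually
    return f"Take {components['dose']} {components['frequency']}."

print(suggest_directions("Amoxicillin 500 mg capsule, take twice daily"))
# Take 500 mg twice daily.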

The system comprises multiple well-defined substeps, with each subprocess operating as a specialized component working semi-autonomously yet collaboratively within the workflow toward the overall objective. This decomposed approach, with specific validations at each stage, proved more effective than end-to-end solutions while enabling the use of fine-tuned SLMs. The team used AWS Fargate to orchestrate the workflow given its current integration into existing backend systems.

In their product development journey, the team turned to Amazon Bedrock, which provided high-performing LLMs with ease-of-use features tailored to generative AI applications. SageMaker enabled further LLM selections, deeper customizability, and traditional ML methods. To learn more about this technique, see How task decomposition and smaller LLMs can make AI more affordable and read about the Amazon Pharmacy business case study.

Building a reliable application with guardrails and HITL

To comply with HIPAA standards and provide patient privacy, we implemented strict data governance practices alongside a hybrid approach that combines fine-tuned LLMs using Amazon Bedrock APIs with Retrieval Augmented Generation (RAG) using Amazon OpenSearch Service. This combination enables efficient knowledge retrieval while maintaining high accuracy for specific subtasks.

Managing LLM hallucinations—which is critical in healthcare—required more than just fine-tuning on large datasets. Our solution implements domain-specific guardrails built on Amazon Bedrock Guardrails, complemented by human-in-the-loop (HITL) oversight to promote system reliability.

The Amazon Pharmacy team continues to enhance this system through real-time pharmacist feedback and expanded prescription format capabilities. This balanced approach of innovation, domain expertise, advanced AI services, and human oversight not only improves operational efficiency, but means that the AI system properly augments healthcare professionals in delivering optimal patient care.

Generative AI-powered customer review highlights

Whereas our previous example showcased how Amazon Pharmacy integrates LLMs into real-time workflows for prescription processing, this next use case demonstrates how similar techniques—SLMs, traditional ML, and thoughtful workflow design—can be applied to offline batch inferencing at massive scale.

Amazon has introduced AI-generated customer review highlights to process over 200 million annual product reviews and ratings. This feature distills shared customer opinions into concise paragraphs highlighting positive, neutral, and negative feedback about products and their features. Shoppers can quickly grasp consensus while maintaining transparency by providing access to related customer reviews and keeping original reviews available.

The system enhances shopping decisions through an interface where customers can explore review highlights by selecting specific features (such as picture quality, remote functionality, or ease of installation for a Fire TV). Features are visually coded with green check marks for positive sentiment, orange minus signs for negative, and gray for neutral—which means shoppers can quickly identify product strengths and weaknesses based on verified purchase reviews. The following screenshot shows review highlights regarding noise level for a product.

Figure 4. An example product review highlights for a product.

A recipe for cost-effective use of LLMs for offline use cases

The team developed a cost-effective hybrid architecture combining traditional ML methods with specialized SLMs. This approach assigns sentiment analysis and keyword extraction to traditional ML while using optimized SLMs for complex text generation tasks, improving both accuracy and processing efficiency. The following diagram shows traditional ML and LLMs working together in the overall workflow.

Figure 5. Use of traditional ML and LLMs in a workflow.

The feature employs SageMaker batch transform for asynchronous processing, significantly reducing costs compared to real-time endpoints. To deliver a near zero-latency experience, the solution caches extracted insights alongside existing reviews, reducing wait times and enabling simultaneous access by multiple customers without additional computation. The system processes new reviews incrementally, updating insights without reprocessing the complete dataset. For optimal performance and cost-effectiveness, the feature uses Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances for batch transform jobs, providing up to 40% better price-performance compared to alternatives.
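
The following is a hypothetical sketch of submitting such an offline job with the SageMaker CreateTransformJob API via boto3; the job name, model name, S3 locations, and instance type are illustrative assumptions, and instance availability varies by Region:

import boto3

sagemaker = boto3.client("sagemaker")
sagemaker.create_transform_job(
    TransformJobName="review-highlights-batch-001",   # illustrative job name
    ModelName="review-highlights-slm",                # assumes a model already created in SageMaker
    TransformInput={
        "DataSource": {
            "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": "s3://my-bucket/reviews/input/"}
        },
        "ContentType": "application/jsonlines",
        "SplitType": "Line",                          # one review record per line
    },
    TransformOutput={"S3OutputPath": "s3://my-bucket/reviews/highlights/"},
    TransformResources={"InstanceType": "ml.inf2.xlarge", "InstanceCount": 1},
)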

By following this comprehensive approach, the team effectively managed costs while handling the massive scale of reviews and products so that the solution remained both efficient and scalable.

Amazon Ads AI-powered creative image and video generation

Having explored mostly text-centric generative AI applications in previous examples, we now turn to multimodal generative AI with Amazon Ads creative content generation for sponsored ads. The solution provides both image and video generation capabilities, which we detail in this section; both are built around Amazon Nova creative content generation models.

Working backward from customer needs, a March 2023 Amazon survey revealed that nearly 75% of advertisers struggling with campaign success cited creative content generation as their primary challenge. Many advertisers—particularly those without in-house capabilities or agency support—face significant barriers due to the expertise and costs of producing quality visuals. The Amazon Ads solution democratizes visual content creation, making it accessible and efficient for advertisers of all sizes. The impact has been substantial: advertisers using AI-generated images in Sponsored Brands campaigns saw nearly 8% click-through rates (CTR) and submitted 88% more campaigns than non-users.

Last year, the AWS Machine Learning Blog published a post detailing the image generation solution. Since then, Amazon has adopted Amazon Nova Canvas as its foundation for creative image generation, creating professional-grade images from text or image prompts with features for text-based editing and controls for color scheme and layout adjustments.
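
As a rough illustration, the following sketch requests a single text-to-image generation from Amazon Nova Canvas through the Amazon Bedrock InvokeModel API. The prompt is invented, and the request schema should be checked against the current Nova Canvas documentation.

```python
import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Text-to-image request for Amazon Nova Canvas (model ID and schema are illustrative).
response = bedrock.invoke_model(
    modelId="amazon.nova-canvas-v1:0",
    body=json.dumps({
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {
            "text": "A stainless steel espresso machine on a sunlit marble kitchen counter",
        },
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "width": 1024,
            "height": 1024,
            "cfgScale": 7.0,
        },
    }),
)

payload = json.loads(response["body"].read())
image_bytes = base64.b64decode(payload["images"][0])  # base64-encoded image
with open("ad-background.png", "wb") as f:
    f.write(image_bytes)
```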

In September 2024, the Amazon Ads team added the ability to create short-form video ads from product images. This feature uses foundation models available on Amazon Bedrock to give customers control over visual style, pacing, camera motion, rotation, and zooming through natural language, using an agentic workflow that first describes video storyboards and then generates the content for the story. The following screenshot shows an example of creative image generation for product backgrounds on Amazon Ads.

Figure 6. Ads image generation example for a product.
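
Video generation of this kind runs as an asynchronous job. The following sketch is one plausible way to submit such a job with the Bedrock asynchronous invocation API and Amazon Nova Reel; the model ID, request fields, and output location are assumptions rather than the production workflow.

```python
import time
import boto3

bedrock = boto3.client("bedrock-runtime")

# Kick off an asynchronous text-to-video job (model ID and schema are assumptions).
job = bedrock.start_async_invoke(
    modelId="amazon.nova-reel-v1:0",
    modelInput={
        "taskType": "TEXT_VIDEO",
        "textToVideoParams": {"text": "Slow camera pan around the espresso machine, warm morning light"},
        "videoGenerationConfig": {"durationSeconds": 6, "fps": 24, "dimension": "1280x720"},
    },
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://example-bucket/ad-videos/"}},
)

# Poll until the video has been written to Amazon S3.
while True:
    status = bedrock.get_async_invoke(invocationArn=job["invocationArn"])["status"]
    if status in ("Completed", "Failed"):
        break
    time.sleep(15)
print(status)
```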

As discussed in the original post, responsible AI is at the center of the solution, and Amazon Nova creative models come with built-in controls to support safety and responsible AI use, including watermarking and content moderation.

The solution uses AWS Step Functions with AWS Lambda functions for serverless orchestration of both the image and video generation processes. Generated content is stored in Amazon Simple Storage Service (Amazon S3) with metadata in Amazon DynamoDB, and Amazon API Gateway provides customer access to the generation capabilities. The solution now employs Amazon Bedrock Guardrails in addition to maintaining Amazon Rekognition and Amazon Comprehend integration at various steps for additional safety checks. The following screenshot shows creative AI-generated videos in the Amazon Ads campaign builder.

Figure 7. Ads video generation for a product.
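
As one example of these additional checks, a generated creative could be screened with Amazon Rekognition content moderation before it is surfaced to advertisers; the following sketch and its confidence threshold are illustrative.

```python
import boto3

rekognition = boto3.client("rekognition")

def is_image_safe(image_bytes: bytes, min_confidence: float = 60.0) -> bool:
    """Reject generated creatives that trigger Rekognition content moderation labels."""
    result = rekognition.detect_moderation_labels(
        Image={"Bytes": image_bytes},
        MinConfidence=min_confidence,
    )
    return len(result["ModerationLabels"]) == 0

with open("ad-background.png", "rb") as f:
    if not is_image_safe(f.read()):
        # Fall back to human review or regenerate with a revised prompt.
        print("Creative flagged by content moderation")
```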

Creating high-quality ad creatives at scale presented complex challenges. The generative AI model needed to produce appealing, brand-appropriate images across diverse product categories and advertising contexts while remaining accessible to advertisers regardless of technical expertise. Quality assurance and improvement are fundamental to both image and video generation capabilities. The system undergoes continual enhancement through extensive HITL processes enabled by Amazon SageMaker Ground Truth. This implementation delivers a powerful tool that transforms advertisers’ creative process, making high-quality visual content creation more accessible across diverse product categories and contexts.

This is just the beginning of Amazon Ads using generative AI to empower advertising customers to create the content they need to drive their advertising objectives. The solution demonstrates how reducing creative barriers directly increases advertising activity while maintaining high standards for responsible AI use.

Key technical learnings and discussions

Non-conversational applications benefit from higher latency tolerance, enabling batch processing and caching, but they require robust validation mechanisms and stronger guardrails because of their autonomous nature. The following insights apply to both non-conversational and conversational AI implementations:

  • Task decomposition and agentic workflows – Breaking complex problems into smaller components has proven valuable across implementations. This deliberate decomposition by domain experts enables specialized models for specific subtasks, as demonstrated in Amazon Pharmacy prescription processing, where fine-tuned SLMs handle discrete tasks such as dosage identification. This strategy allows for specialized agents with clear validation steps, improving reliability and simplifying maintenance. The Amazon seller listing use case exemplifies this through its multistep workflow with separate generation and validation processes. Additionally, the review highlights use case showcased cost-effective and controlled use of LLMs by using traditional ML for preprocessing and for parts of the workflow that could otherwise be handled by an LLM.
  • Hybrid architectures and model selection – Combining traditional ML with LLMs provides better control and cost-effectiveness than pure LLM approaches. Traditional ML excels at well-defined tasks, as shown in the review highlights system for sentiment analysis and information extraction. Amazon teams have strategically deployed both large and small language models based on requirements, integrating RAG with fine-tuning for effective domain-specific applications like the Amazon Pharmacy implementation.
  • Cost optimization strategies – Amazon teams achieved efficiency through batch processing, caching mechanisms for high-volume operations, specialized instance types such as AWS Inferentia and AWS Trainium, and optimized model selection. The review highlights use case demonstrates how incremental processing reduces computational needs, and Amazon Ads used Amazon Nova foundation models (FMs) to cost-effectively create creative content.
  • Quality assurance and control mechanisms – Quality control relies on domain-specific guardrails through Amazon Bedrock Guardrails and multilayered validation combining automated testing with human evaluation. Dual-LLM approaches for generation and validation help prevent hallucinations in Amazon seller listings (see the sketch after this list), and self-reflection techniques improve accuracy. Amazon Nova creative FMs provide inherent responsible AI controls, complemented by continual A/B testing and performance measurement.
  • HITL implementation – The HITL approach spans multiple layers, from expert evaluation by pharmacists to end-user feedback from selling partners. Amazon teams established structured improvement workflows, balancing automation and human oversight based on specific domain requirements and risk profiles.
  • Responsible AI and compliance – Responsible AI practices include content ingestion guardrails for regulated environments and adherence to regulations such as HIPAA. Amazon teams integrated content moderation for user-facing applications, maintained transparency in review highlights by providing access to source information, and implemented data governance with monitoring to promote quality and compliance.
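
A minimal sketch of the dual-LLM generation and validation pattern referenced above might look like the following, assuming one Bedrock model drafts content and a second model verifies it against the source text; the model IDs, prompts, and retry policy are illustrative.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

def _ask(model_id: str, prompt: str) -> str:
    resp = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]

def generate_with_validation(source_text: str, max_attempts: int = 2) -> str:
    """Generator drafts listing attributes; a second model checks them against the source."""
    for _ in range(max_attempts):
        draft = _ask(
            "amazon.nova-lite-v1:0",  # illustrative generator model ID
            f"Extract the product's key attributes from this text as bullet points:\n{source_text}",
        )
        verdict = _ask(
            "amazon.nova-pro-v1:0",   # illustrative validator model ID
            "Answer only VALID or INVALID. Are all of these attributes supported by the source text?\n"
            f"Attributes:\n{draft}\n\nSource:\n{source_text}",
        )
        if verdict.strip().upper().startswith("VALID"):
            return draft
    raise ValueError("Validation failed; route to human review")
```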

These patterns enable scalable, reliable, and cost-effective generative AI solutions while maintaining quality and responsibility standards. The implementations demonstrate that effective solutions require not just sophisticated models, but careful attention to architecture, operations, and governance, supported by AWS services and established practices.

Next steps

The examples from Amazon.com shared in this post illustrate how generative AI can create value beyond traditional conversational assistants. We invite you to follow these examples or create your own solution to discover how generative AI can reinvent your business or even your industry. You can visit the AWS generative AI use cases page to start the ideation process.

These examples showed that effective generative AI implementations often benefit from combining different types of models and workflows. To learn which FMs are supported by AWS services, refer to Supported foundation models in Amazon Bedrock and Amazon SageMaker JumpStart Foundation Models. We also suggest exploring Amazon Bedrock Flows, which can ease the path toward building workflows. Additionally, AWS Trainium and AWS Inferentia accelerators can provide significant cost savings in these applications.

Agentic workflows, as illustrated in our examples, have proven particularly valuable. We recommend exploring Amazon Bedrock Agents for quickly building agentic workflows.

Successful generative AI implementation extends beyond model selection—it represents a comprehensive software development process from experimentation to application monitoring. To begin building your foundation across these essential services, we invite you to explore Amazon QuickStart.

Conclusion

These examples demonstrate how generative AI extends beyond conversational assistants to drive innovation and efficiency across industries. Success comes from combining AWS services with strong engineering practices and business understanding. Ultimately, effective generative AI solutions focus on solving real business problems while maintaining high standards of quality and responsibility.

To learn more about how Amazon uses AI, refer to Artificial Intelligence in Amazon News.


About the Authors

Burak Gozluklu is a Principal AI/ML Specialist Solutions Architect and lead GenAI Scientist Architect for Amazon.com on AWS, based in Boston, MA. He helps strategic customers adopt AWS technologies and specifically Generative AI solutions to achieve their business objectives. Burak has a PhD in Aerospace Engineering from METU, an MS in Systems Engineering, and a post-doc in system dynamics from MIT in Cambridge, MA. He maintains his connection to academia as a research affiliate at MIT. Outside of work, Burak is an enthusiast of yoga.

Emilio Maldonado is a Senior leader at Amazon responsible for Product Knowledge, oriented at building systems to scale the e-commerce Catalog metadata, organize all product attributes, and leverage GenAI to infer precise information that guides Sellers and Shoppers to interact with products. He’s passionate about developing dynamic teams and forming partnerships. He holds a Bachelor of Science in C.S. from Tecnologico de Monterrey (ITESM) and an MBA from Wharton, University of Pennsylvania.

Wenchao Tong is a Sr. Principal Technologist at Amazon Ads in Palo Alto, CA, where he spearheads the development of GenAI applications for creative building and performance optimization. His work empowers customers to enhance product and brand awareness and drive sales by leveraging innovative AI technologies to improve creative performance and quality. Wenchao holds a Master’s degree in Computer Science from Tongji University. Outside of work, he enjoys hiking, board games, and spending time with his family.

Alexandre Alves is a Sr. Principal Engineer at Amazon Health Services, specializing in ML, optimization, and distributed systems. He helps deliver wellness-forward health experiences.

Puneet Sahni is a Sr. Principal Engineer at Amazon. He works on improving the data quality of all products available in the Amazon catalog. He is passionate about leveraging product data to improve our customer experiences. He has a Master’s degree in Electrical Engineering from Indian Institute of Technology (IIT) Bombay. Outside of work he enjoys spending time with his young kids and travelling.

Vaughn Schermerhorn is a Director at Amazon, where he leads Shopping Discovery and Evaluation—spanning Customer Reviews, content moderation, and site navigation across Amazon’s global marketplaces. He manages a multidisciplinary organization of applied scientists, engineers, and product leaders focused on surfacing trustworthy customer insights through scalable ML models, multimodal information retrieval, and real-time system architecture. His team develops and operates large-scale distributed systems that power billions of shopping decisions daily. Vaughn holds degrees from Georgetown University and San Diego State University and has lived and worked in the U.S., Germany, and Argentina. Outside of work, he enjoys reading, travel, and time with his family.

Tarik Arici is a Principal Applied Scientist at Amazon Selection and Catalog Systems (ASCS), working on Catalog Quality Enhancement using GenAI workflows. He has a PhD in Electrical and Computer Engineering from Georgia Tech. Outside of work, Tarik enjoys swimming and biking.


Architect a mature generative AI foundation on AWS

Generative AI applications seem simple—invoke a foundation model (FM) with the right context to generate a response. In reality, it’s a much more complex system involving workflows that invoke FMs, tools, and APIs and that use domain-specific data to ground responses with patterns such as Retrieval Augmented Generation (RAG) and workflows involving agents. Safety controls need to be applied to input and output to prevent harmful content, and foundational elements have to be established such as monitoring, automation, and continuous integration and delivery (CI/CD), which are needed to operationalize these systems in production.

Many organizations have siloed generative AI initiatives, with development managed independently by various departments and lines of businesses (LOBs). This often results in fragmented efforts, redundant processes, and the emergence of inconsistent governance frameworks and policies. Inefficiencies in resource allocation and utilization drive up costs.

To address these challenges, organizations are increasingly adopting a unified approach to build applications where foundational building blocks are offered as services to LOBs and teams for developing generative AI applications. This approach facilitates centralized governance and operations. Some organizations use the term “generative AI platform” to describe this approach. This can be adapted to different operating models of an organization: centralized, decentralized, and federated. A generative AI foundation offers core services, reusable components, and blueprints, while applying standardized security and governance policies.

This approach gives organizations many key benefits, such as streamlined development, the ability to scale generative AI development and operations across the organization, mitigated risk because central management simplifies the implementation of governance frameworks, optimized costs because of reuse, and accelerated innovation as teams can quickly build and ship use cases.

In this post, we give an overview of a well-established generative AI foundation, dive into its components, and present an end-to-end perspective. We look at different operating models and explore how such a foundation can operate within those boundaries. Lastly, we present a maturity model that helps enterprises assess their evolution path.

Overview

Laying out a strong generative AI foundation includes offering a comprehensive set of components to support the end-to-end generative AI application lifecycle. The following diagram illustrates these components.

Mature Generative AI Platform

In this section, we discuss the key components in more detail.

Hub

At the core of the foundation are multiple hubs that include:

  • Model hub – Provides access to enterprise FMs. As a system matures, a broad range of off-the-shelf or customized models can be supported. Most organizations conduct thorough security and legal reviews before models are approved for use. The model hub acts as a central place to access approved models.
  • Tool/Agent hub – Enables discovery of and connectivity to tool catalogs and agents. This can be enabled through protocols such as MCP and Agent2Agent (A2A).

Gateway

A model gateway offers secure access to the model hub through standardized APIs. The gateway is built as a multi-tenant component to provide isolation across the teams and business units that are onboarded. Key features of a gateway include:

  • Access and authorization – The gateway facilitates authentication, authorization, and secure communication between users and the system. It helps verify that only authorized users can use specific models, and can also enforce fine-grained access control.
  • Unified API – The gateway provides unified APIs to models and features such as guardrails and evaluation. It can also support automated prompt translation to different prompt templates across different models.
  • Rate limiting and throttling – It handles API requests efficiently by controlling the number of requests allowed in a given time period, preventing overload and managing traffic spikes.
  • Cost attribution – The gateway monitors usage across the organization and allocates costs to the teams. Because these models can be resource-intensive, tracking model usage helps allocate costs properly, optimize resources, and avoid overspending.
  • Scaling and load balancing – The gateway can handle load balancing across different servers, model instances, or AWS Regions so that applications remain responsive.
  • Guardrails – The gateway applies content filters to requests and responses through guardrails and helps adhere to organizational security and compliance standards.
  • Caching – The cache layer stores prompts and responses that can help improve performance and reduce costs.

The AWS Solutions Library offers solution guidance to set up a multi-provider generative AI gateway. The solution uses an open source LiteLLM proxy wrapped in a container that can be deployed on Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). This offers organizations a building block to develop an enterprise-wide model hub and gateway. The generative AI foundation can start with the gateway and offer additional features as it matures.
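
Because the LiteLLM proxy exposes an OpenAI-compatible API, consuming teams can call the gateway with a standard client. The following sketch assumes a placeholder gateway endpoint, a team-scoped API key, and a model alias that the gateway maps to an approved FM.

```python
from openai import OpenAI

# Placeholder gateway endpoint and team-scoped API key issued by the platform team.
client = OpenAI(
    base_url="https://genai-gateway.example.internal/v1",
    api_key="sk-team-a-xxxx",
)

# The model name is an alias the gateway maps to an approved FM (for example, a Bedrock model).
response = client.chat.completions.create(
    model="enterprise-claude",
    messages=[{"role": "user", "content": "Draft a two-sentence summary of our Q3 launch plan."}],
)
print(response.choices[0].message.content)
```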

Gateway patterns for the tool/agent hub are still evolving. The model gateway can serve as a universal gateway to all the hubs, or individual hubs can have their own purpose-built gateways.

Orchestration

Orchestration encapsulates generative AI workflows, which are usually a multi-step process. The steps could involve invocation of models, integrating data sources, using tools, or calling APIs. Workflows can be deterministic, where they are created as predefined templates. An example of a deterministic flow is a RAG pattern. In this pattern, a search engine is used to retrieve relevant sources and augment the data into the prompt context, before the model attempts to generate the response for the user prompt. This aims to reduce hallucination and encourage the generation of responses grounded in verified content.

Alternatively, complex workflows can be designed using agents where a large language model (LLM) decides the flow by planning and reasoning. During reasoning, the agent can decide when to continue thinking, call external tools (such as APIs or search engines), or submit its final response. Multi-agent orchestration is used to tackle even more complex problem domains by defining multiple specialized subagents, which can interact with each other to decompose and complete a task requiring different knowledge or skills. A generative AI foundation can provide primitives such as models, vector databases, and guardrails as a service and higher-level services for defining AI workflows, agents and multi-agents, tools, and also a catalog to encourage reuse.
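
A minimal sketch of such an agentic loop, using the Amazon Bedrock Converse API tool-use interface, is shown below. The tool, its schema, and the model ID are illustrative; a production agent would add memory, guardrails, and error handling.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

TOOLS = {"tools": [{"toolSpec": {
    "name": "lookup_order",  # illustrative tool
    "description": "Look up an order's status by order ID.",
    "inputSchema": {"json": {"type": "object",
                             "properties": {"order_id": {"type": "string"}},
                             "required": ["order_id"]}},
}}]}

def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}  # stand-in for a real API call

def run_agent(user_text: str, model_id: str = "amazon.nova-pro-v1:0") -> str:
    messages = [{"role": "user", "content": [{"text": user_text}]}]
    while True:
        resp = bedrock.converse(modelId=model_id, messages=messages, toolConfig=TOOLS)
        message = resp["output"]["message"]
        messages.append(message)
        if resp["stopReason"] != "tool_use":
            return message["content"][0]["text"]
        # The model asked for a tool: execute it and feed the result back.
        results = []
        for block in message["content"]:
            if "toolUse" in block:
                tool = block["toolUse"]
                output = lookup_order(**tool["input"])
                results.append({"toolResult": {"toolUseId": tool["toolUseId"],
                                               "content": [{"json": output}]}})
        messages.append({"role": "user", "content": results})
```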

Model customization

A key foundational capability that can be offered is model customization, including the following techniques:

  • Continued pre-training – Domain-adaptive pre-training, where existing models are further trained on domain-specific data. This approach can offer a balance between customization depth and resource requirements, necessitating fewer resources than training from scratch.
  • Fine-tuning – Model adaptation techniques such as instruction fine-tuning and supervised fine-tuning to learn task-specific capabilities. Though less intensive than pre-training, this approach still requires significant computational resources.
  • Alignment – Training models with user-generated data using techniques such as Reinforcement Learning with Human Feedback (RLHF) and Direct Preference Optimization (DPO).

For the preceding techniques, the foundation should provide scalable infrastructure for data storage and training, a mechanism to orchestrate tuning and training pipelines, a model registry to centrally register and govern the model, and infrastructure to host the model.
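
As one possible building block, a foundation team could wrap the Amazon Bedrock model customization API to launch fine-tuning jobs on governed data. The sketch below uses placeholder names, a placeholder role ARN and S3 locations, and illustrative hyperparameters.

```python
import boto3

bedrock = boto3.client("bedrock")  # control-plane client (not bedrock-runtime)

bedrock.create_model_customization_job(
    jobName="support-summarizer-ft-001",
    customModelName="support-summarizer-v1",
    roleArn="arn:aws:iam::111122223333:role/BedrockCustomizationRole",  # placeholder
    baseModelIdentifier="amazon.titan-text-lite-v1",                    # illustrative base model
    customizationType="FINE_TUNING",
    trainingDataConfig={"s3Uri": "s3://example-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://example-bucket/ft-output/"},
    hyperParameters={"epochCount": "2", "learningRate": "0.00001", "batchSize": "1"},
)
```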

Data management

Organizations typically have multiple data sources, and data from these sources is mostly aggregated in data lakes and data warehouses. Common datasets can be made available as a foundational offering to different teams. The following are additional foundational components that can be offered:

  • Integration with enterprise data sources and external sources to bring in the data needed for patterns such as RAG or model customization
  • Fully managed or pre-built templates and blueprints for RAG that include a choice of vector databases and cover chunking data, converting chunks into embeddings, and indexing them in the vector database (see the sketch after this list)
  • Data processing pipelines for model customization, including tools to create labeled and synthetic datasets
  • Tools to catalog data, making it quick to search, discover, access, and govern data
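
The RAG ingestion blueprint mentioned above could be sketched as follows, assuming a Bedrock embedding model and an OpenSearch index with a k-NN mapping on the embedding field. The chunking strategy, model ID, endpoint, and index name are illustrative, and authentication is omitted.

```python
import json
import boto3
from opensearchpy import OpenSearch, helpers

bedrock = boto3.client("bedrock-runtime")
search = OpenSearch(hosts=[{"host": "vectors.example.internal", "port": 443}], use_ssl=True)

def chunk(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Naive fixed-size chunking with overlap; production blueprints would be smarter."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

def embed(text: str) -> list[float]:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",  # illustrative embedding model
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def ingest(doc_id: str, text: str, index: str = "enterprise-kb") -> None:
    actions = (
        {"_index": index, "_id": f"{doc_id}-{i}",
         "_source": {"doc_id": doc_id, "text": c, "embedding": embed(c)}}
        for i, c in enumerate(chunk(text))
    )
    helpers.bulk(search, actions)  # assumes a k-NN mapping exists on the 'embedding' field
```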

GenAIOps

Generative AI operations (GenAIOps) encompasses overarching practices of managing and automating operations of generative AI systems. The following diagram illustrates its components.

Generative AI Ops

Fundamentally, GenAIOps falls into two broad categories:

  • Operationalizing applications that consume FMs – Although operationalizing RAG or agentic applications shares core principles with DevOps, it requires additional, AI-specific considerations and practices. RAGOps addresses operational practices for managing the lifecycle of RAG systems, which combine generative models with information retrieval mechanisms. Considerations here are choice of vector database, optimizing indexing pipelines, and retrieval strategies. AgentOps helps facilitate efficient operation of autonomous agentic systems. The key concerns here are tool management, agent coordination using state machines, and short-term and long-term memory management.
  • Operationalizing FM training and tuning – ModelOps is a category under GenAIOps that focuses on governance and lifecycle management of models, including model selection, continuous tuning and training of models, experiment tracking, a central model registry, prompt management and evaluation, model deployment, and model governance. FMOps, which is operationalizing FMs, and LLMOps, which is specifically operationalizing LLMs, fall under this category.

In addition, operationalization involves implementing CI/CD processes for automating deployments, integrating evaluation and prompt management systems, and collecting logs, traces, and metrics to optimize operations.

Observability

Observability for generative AI needs to account for the probabilistic nature of these systems—models might hallucinate, responses can be subjective, and troubleshooting is harder. Like other software systems, logs, metrics, and traces should be collected and centrally aggregated. There should be tools to generate insights out of this data that can be used to optimize the applications even further. In addition to component-level monitoring, as generative AI applications mature, deeper observability should be implemented, such as instrumenting traces, collecting real-world feedback, and looping it back to improve models and systems. Evaluation should be offered as a core foundational component, and this includes automated and human evaluation and LLM-as-a-judge pipelines along with storage of ground truth data.
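
An LLM-as-a-judge step could be sketched as follows, where a judge model scores a candidate answer against ground truth and returns structured JSON. The judge model ID, rubric, and scoring scale are illustrative.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def judge(question: str, reference: str, candidate: str,
          model_id: str = "anthropic.claude-3-5-sonnet-20240620-v1:0") -> dict:
    """Score a generated answer against ground truth on a 1-5 scale, returning JSON."""
    rubric = (
        "You are an evaluator. Compare the candidate answer to the reference answer.\n"
        "Return JSON like {\"score\": 1-5, \"reason\": \"...\"} and nothing else.\n\n"
        f"Question: {question}\nReference: {reference}\nCandidate: {candidate}"
    )
    resp = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": rubric}]}],
        inferenceConfig={"temperature": 0.0},
    )
    return json.loads(resp["output"]["message"]["content"][0]["text"])
```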

Responsible AI

To balance the benefits of generative AI with the challenges that arise from it, it’s important to incorporate tools, techniques, and mechanisms that align to a broad set of responsible AI dimensions. At AWS, these Responsible AI dimensions include privacy and security, safety, transparency, explainability, veracity and robustness, fairness, controllability, and governance. Each organization would have its own governing set of responsible AI dimensions that can be centrally incorporated as best practices through the generative AI foundation.

Security and privacy

Communication should be over TLS, and private network access should be supported. User access should be secure, and the system should support fine-grained access control. Rate limiting and throttling should be in place to help prevent abuse. For data security, data should be encrypted at rest and in transit, and tenant data isolation patterns should be implemented. Embeddings stored in vector stores should be encrypted. For model security, custom model weights should be encrypted and isolated for different tenants. Guardrails should be applied to input and output to filter topics and harmful content. Telemetry should be collected for actions that users take on the central system. Data quality is the responsibility of the consuming applications or data producers, and the consuming applications should integrate observability into their applications.

Governance

The two key areas of governance are model and data:

  • Model governance – Monitor models for performance, robustness, and fairness. Model versions should be managed centrally in a model registry. Appropriate permissions and policies should be in place for model deployments. Access controls to models should be established.
  • Data governance – Apply fine-grained access control to data managed by the system, including training data, vector stores, evaluation data, prompt templates, workflows, and agent definitions. Establish data privacy policies for the data managed by the system, such as handling sensitive data (for example, redacting personally identifiable information (PII)) and protecting prompts and data so they are not used to improve models.

Tools landscape

A variety of AWS services, AWS partner solutions, and third-party tools and frameworks are available to architect a comprehensive generative AI foundation. The following figure might not cover the entire gamut of tools, but we have created a landscape based on our experience with these tools.

Generative AI platform heatmap

Operational boundaries

One of the challenges to solve for is who owns the foundational components and how they operate within the organization’s operating model. Let’s look at three common operating models:

  • Centralized – Operations are centralized to one team. Some organizations refer to this team as the platform team or platform engineering team. In this model, foundational components are managed by a central team and offered to LOBs and enterprise teams.

Centralized operating model

  • Decentralized – LOBs and teams build their respective systems and operate independently. The central team takes on a role of a Center of Excellence (COE) that defines best practices, standards, and governance frameworks. Logs and metrics can be aggregated in a central place.

Decentralized operating model

  • Federated – A more flexible model is a hybrid of the two. A central team manages the foundation that offers building blocks for model access, evaluation, guardrails, central logs, and metrics aggregation to teams. LOBs and teams use the foundational components but also build and manage their own components as necessary.

Federated operating model

Multi-tenant architecture

Irrespective of the operating model, it’s important to define how tenants are isolated and managed within the system. The multi-tenant pattern depends on a number of factors:

  • Tenant and data isolation – Data ownership is critical for building generative AI systems. A system should establish clear policies on data ownership and access rights, making sure data is accessible only to authorized users. Tenant data should be securely isolated from others to maintain privacy and confidentiality. This can be through physical isolation of data, for example, setting up isolated vector databases for each tenant for a RAG application, or by logical separation, for example, using different indexes within a shared database. Role-based access control should be set up to make sure users within a tenant can access resources and data specific to their organization.
  • Scalability and performance – Noisy neighbors can be a real problem, where one tenant is extremely chatty compared to others. Proper resource allocation according to tenant needs should be established. Containerization of workloads can be a good strategy to isolate and scale tenants individually. This also ties into the deployment strategy described later in this section, by means of which a chatty tenant can be completely isolated from others.
  • Deployment strategy – If strict isolation is required for use cases, then each tenant can have dedicated instances of compute, storage, and model access. This means gateway, data pipelines, data storage, training infrastructure, and other components are deployed on an isolated infrastructure per tenant. For tenants who don’t need strict isolation, shared infrastructure can be used and partitioning of resources can be achieved by a tenant identifier. A hybrid model can also be used, where the core foundation is deployed on shared infrastructure and specific components are isolated by tenant. The following diagram illustrates an example architecture.
  • Observability – A mature generative AI system should provide detailed visibility into operations at both the central and tenant-specific level. The foundation offers a central place for collecting logs, metrics, and traces, so you can set up reporting based on tenant needs.
  • Cost management – A metered billing system should be set up based on usage. This requires establishing cost tracking based on the resource usage of different components plus model inference costs. Model inference costs vary by model and provider, but there should be a common mechanism for allocating costs per tenant. System administrators should be able to track and monitor usage across teams.

Multi tenant generative AI Platform federated architecture

Let’s break this down by taking a RAG application as an example. In the hybrid model, the tenant deployment contains instances of a vector database that stores the embeddings, which supports strict data isolation requirements. The deployment will additionally include the application layer that contains the frontend code and orchestration logic to take the user query, augment the prompt with context from the vector database, and invoke FMs on the central system. The foundational components that offer services such as evaluation and guardrails for applications to consume to build a production-ready application are in a separate shared deployment. Logs, metrics, and traces from the applications can be fed into a central aggregation place.
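
To illustrate the isolation choices for retrieval in this example, the following sketch contrasts physical isolation (an index or database per tenant) with logical isolation (a shared index where every query carries a tenant filter). The endpoint, index names, and field names are placeholders.

```python
from opensearchpy import OpenSearch

search = OpenSearch(hosts=[{"host": "vectors.example.internal", "port": 443}], use_ssl=True)

def retrieve_physical(tenant_id: str, query_vector: list[float], k: int = 3):
    """Physical isolation: every tenant gets its own index (or its own domain)."""
    return search.search(
        index=f"kb-{tenant_id}",
        body={"size": k, "query": {"knn": {"embedding": {"vector": query_vector, "k": k}}}},
    )

def retrieve_logical(tenant_id: str, query_vector: list[float], k: int = 3):
    """Logical isolation: shared index, every query is constrained by a tenant filter."""
    return search.search(
        index="kb-shared",
        body={"size": k, "query": {"bool": {
            "filter": [{"term": {"tenant_id": tenant_id}}],
            "must": [{"knn": {"embedding": {"vector": query_vector, "k": k}}}],
        }}},
    )
```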

Generative AI foundation maturity model

We have defined a maturity model to track the evolution of the generative AI foundation across different stages of adoption. The maturity model can be used to assess where you are in the development journey and plan for expansion. We define the curve along four stages of adoption: emerging, advanced, mature, and established.

Generative AI platform maturity stages

The details for each stage are as follows:

  • Emerging – The foundation offers a playground for model exploration and assessment. Teams are able to develop proofs of concept using enterprise approved models.
  • Advanced – The foundation facilitates first production use cases. Multiple environments exist for development, testing, and production deployment. Monitoring and alerts are established.
  • Mature – Multiple teams are using the foundation and are able to develop complex use cases. CI/CD and infrastructure as code (IaC) practices accelerate the rollout of reusable components. Deeper observability such as tracing is established.
  • Established – A best-in-class foundation, fully automated, governed, and operating at scale with responsible AI practices in place. It enables diverse use cases, and most enterprise teams are onboarded onto it.

The evolution might not be exactly linear along the curve in terms of specific capabilities, but certain key performance indicators can be used to evaluate the adoption and growth.

Generative AI platform maturity KPIs

Conclusion

Establishing a comprehensive generative AI foundation can be a critical step in harnessing the power of AI at scale. Enterprise AI development brings unique challenges spanning agility, reliability, governance, scale, and collaboration. Therefore, a well-constructed foundation with the right components, adapted to the operating model of the business, aids in building and scaling generative AI applications across the enterprise.

The rapidly evolving generative AI landscape means there might be cutting-edge tools we haven’t covered in the tools landscape. If you’re using or aware of state-of-the-art solutions that align with the foundational components, we encourage you to share them in the comments section.

Our team is dedicated to helping customers solve challenges in generative AI development at scale—whether it’s architecting a generative AI foundation, setting up operational best practices, or implementing responsible AI practices. Leave us a comment and we will be glad to collaborate.


About the authors

Chaitra Mathur is a GenAI Specialist Solutions Architect at AWS. She works with customers across industries in building scalable generative AI platforms and operationalizing them. Throughout her career, she has shared her expertise at numerous conferences and has authored several blogs in the Machine Learning and Generative AI domains.

Dr. Alessandro Cerè is a GenAI Evaluation Specialist and Solutions Architect at AWS. He assists customers across industries and regions in operationalizing and governing their generative AI systems at scale, ensuring they meet the highest standards of performance, safety, and ethical considerations. Bringing a unique perspective to the field of AI, Alessandro has a background in quantum physics and research experience in quantum communications and quantum memories. In his spare time, he pursues his passion for landscape and underwater photography.

Aamna Najmi is a GenAI and Data Specialist at AWS. She assists customers across industries and regions in operationalizing and governing their generative AI systems at scale, ensuring they meet the highest standards of performance, safety, and ethical considerations, bringing a unique perspective of modern data strategies to complement the field of AI. In her spare time, she pursues her passion of experimenting with food and discovering new places.

Dr. Andrew Kane is the WW Tech Leader for Security and Compliance for AWS Generative AI Services, leading the delivery of under-the-hood technical assets for customers around security, as well as working with CISOs around the adoption of generative AI services within their organizations. Before joining AWS at the beginning of 2015, Andrew spent two decades working in the fields of signal processing, financial payments systems, weapons tracking, and editorial and publishing systems. He is a keen karate enthusiast (just one belt away from Black Belt) and is also an avid home-brewer, using automated brewing hardware and other IoT sensors. He was the legal licensee in his ancient (AD 1468) English countryside village pub until early 2020.

Bharathi Srinivasan is a Generative AI Data Scientist at the AWS Worldwide Specialist Organization. She works on developing solutions for Responsible AI, focusing on algorithmic fairness, veracity of large language models, and explainability. Bharathi guides internal teams and AWS customers on their responsible AI journey. She has presented her work at various learning conferences.

Denis V. Batalov is a 17-year Amazon veteran with a PhD in Machine Learning. Denis worked on such exciting projects as Search Inside the Book, Amazon Mobile apps, and Kindle Direct Publishing. Since 2013 he has helped AWS customers adopt AI/ML technology as a Solutions Architect. Currently, Denis is a Worldwide Tech Leader for AI/ML responsible for the functioning of AWS ML Specialist Solutions Architects globally. Denis is a frequent public speaker; you can follow him on Twitter @dbatalov.

Nick McCarthy is a Generative AI Specialist at AWS. He has worked with AWS clients across various industries including healthcare, finance, sports, telecoms and energy to accelerate their business outcomes through the use of AI/ML. Outside of work he loves to spend time traveling, trying new cuisines and reading about science and technology. Nick has a Bachelors degree in Astrophysics and a Masters degree in Machine Learning.

Alex Thewsey is a Generative AI Specialist Solutions Architect at AWS, based in Singapore. Alex helps customers across Southeast Asia to design and implement solutions with ML and Generative AI. He also enjoys karting, working with open source projects, and trying to keep up with new ML research.

Willie Lee is a Senior Tech PM for the AWS worldwide specialists team focusing on GenAI. He is passionate about machine learning and the many ways it can impact our lives, especially in the area of language comprehension.
