Unlock retail intelligence by transforming data into actionable insights using generative AI with Amazon Q Business

Businesses often face challenges in managing and deriving value from their data. According to McKinsey, 78% of organizations now use AI in at least one business function (as of 2024), and 21% of organizations using generative AI have fundamentally redesigned their workflows, underscoring how quickly AI is reshaping business operations.

Gartner identifies AI-powered analytics and reporting as a core investment area for retail organizations, with most large retailers expected to deploy or scale such solutions within the next 12–18 months. The retail sector’s data complexity demands sophisticated solutions that can integrate seamlessly with existing systems. Amazon Q Business offers features that can be tailored to meet specific business needs, including integration capabilities with popular retail management systems, point-of-sale systems, inventory management software, and ecommerce systems. Through advanced AI algorithms, the system analyzes historical data and current trends, helping businesses prepare effectively for seasonal fluctuations in demand and make data-driven decisions.

Amazon Q Business for Retail Intelligence is an AI-powered assistant designed to help retail businesses streamline operations, improve customer service, and enhance decision-making processes. This solution is specifically engineered to be scalable and adaptable to businesses of various sizes, helping them compete more effectively. In this post, we show how you can use Amazon Q Business for Retail Intelligence to transform your data into actionable insights.

Solution overview

Amazon Q Business for Retail Intelligence is a comprehensive solution that transforms how retailers interact with their data using generative AI. The solution architecture combines the powerful generative AI capabilities of Amazon Q Business and Amazon QuickSight visualizations to deliver actionable insights across the entire retail value chain. Our solution also uses Amazon Q Apps so retail personas and users can create custom AI-powered applications to streamline day-to-day tasks and automate workflows and business processes.

The following diagram illustrates the solution architecture.

Solution architecture diagram

The solution uses the preceding AWS architecture to deliver a secure, high-performance, and reliable retail intelligence solution. Amazon Q Business serves as the primary generative AI engine, enabling natural language interactions and powering custom retail-specific applications. The architecture incorporates AWS IAM Identity Center for robust authentication and access control, and Amazon Simple Storage Service (Amazon S3) provides secure data lake storage for retail data sources. We use QuickSight for interactive visualizations, enhancing data interpretation. The solution’s flexibility is further enhanced by AWS Lambda for serverless processing, Amazon API Gateway for efficient endpoint management, and Amazon CloudFront for optimized content delivery. An Amazon Q Business custom plugin calls the API endpoints to start automated workflows directly from the Amazon Q Business web application interface, based on user queries and interactions.

This setup implements a three-tier architecture: a data integration layer that securely ingests data from multiple retail sources, a processing layer where Amazon Q Business analyzes queries and generates insights, and a presentation layer that delivers personalized, role-based insights through a unified interface.

We have provided an AWS CloudFormation template, sample datasets, and scripts that you can use to set up the environment for this demonstration.

In the following sections, we dive deeper on how this solution works.

Deployment

We have provided the Amazon Q Business for Retail Intelligence solution as open source—you can use it as a starting point for your own solution and help us make it better by contributing fixes and features through GitHub pull requests. Visit the GitHub repository to explore the code, choose Watch to be notified of new releases, and check the README for the latest documentation updates.

After you set up the environment, you can access the Amazon Q Business for Retail Intelligence dashboard, as shown in the following screenshot.

Retail Intelligence dashboard

You can interact with the QuickSight visualizations and Amazon Q Business chat interface to ask questions using natural language.
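Beyond the web interface, you can send the same natural language questions programmatically through the Amazon Q Business ChatSync API. The following is a minimal sketch using boto3; the application ID and question are placeholders, and it assumes the calling identity is an authorized user of the application (for example, through IAM Identity Center).

import boto3

# Placeholder application ID from your Amazon Q Business deployment
Q_BUSINESS_APP_ID = "your-q-business-application-id"

qbusiness = boto3.client("qbusiness")

# Ask a question in natural language; the caller's credentials must map to an
# authorized Amazon Q Business user for this application.
response = qbusiness.chat_sync(
    applicationId=Q_BUSINESS_APP_ID,
    userMessage="Which product categories drove the most revenue last quarter?",
)

print(response["systemMessage"])           # generated answer
for source in response.get("sourceAttributions", []):
    print("Source:", source.get("title"))  # documents that grounded the answer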

Key features and capabilities

Retail users can interact with this solution in many ways. In this section, we explore the key features.

For C-suite executives and senior leaders who want to know how the business is performing, our solution provides a single pane of glass that makes it straightforward to access and interact with your enterprise’s qualitative and quantitative data using natural language. For example, users can analyze quantitative data such as product sales or marketing campaign performance through the interactive visualizations powered by QuickSight, and qualitative data such as customer feedback through Amazon Q Business, all from a single interface.

Consider that you are a marketing analyst who wants to evaluate campaign performance and reach across channels and analyze ad spend vs. revenue. With Amazon Q Business, you can run complex queries as natural language questions and share Q Apps with multiple teams. The solution provides automated insights about customer behavior and campaign effectiveness, helping marketing teams make faster decisions and quick adjustments to maximize ROI.

Marketing campaign information

Similarly, assume you are a merchandising planner or vendor manager who wants to understand the impact of cost-prohibitive events on an international business that imports and exports goods and services. You can add inputs to Amazon Q Apps and get responses for a specific product or product family.

Alternative products

Users can also send requests through APIs using Amazon Q Business custom plugins for real-time interactions with downstream applications. For example, a store manager might want to know which items in the current inventory they need to replenish or rebalance for the next week based on weather predictions or local sporting events.
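To illustrate how such a plugin-backed workflow could be wired, the following is a minimal sketch of an AWS Lambda handler fronted by Amazon API Gateway that a custom plugin might call for a hypothetical replenishment check. The payload fields, SKUs, and reorder logic are illustrative assumptions; the real contract is defined by the plugin’s OpenAPI schema in your deployment.

import json

# Hypothetical reorder points; in a real deployment these would come from your
# inventory system, demand forecasts, or local event calendars.
REORDER_POINTS = {"umbrellas": 40, "bottled-water": 120, "team-jerseys": 60}

def lambda_handler(event, context):
    """Handle a replenishment-check request proxied by Amazon API Gateway."""
    body = json.loads(event.get("body") or "{}")
    store_id = body.get("store_id", "unknown")
    current_stock = body.get("current_stock", {})  # e.g. {"umbrellas": 12}

    # Flag any item whose on-hand quantity is below its reorder point.
    to_replenish = [
        {"sku": sku, "on_hand": qty, "reorder_point": REORDER_POINTS[sku]}
        for sku, qty in current_stock.items()
        if sku in REORDER_POINTS and qty < REORDER_POINTS[sku]
    ]

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"store_id": store_id, "replenish": to_replenish}),
    }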

To learn more, refer to the following complete demo.

For this post, we haven’t used the generative business intelligence (BI) capabilities of Amazon Q with our QuickSight visualizations. To learn more, see Amazon Q in QuickSight.

Empowering retail personas with AI-driven intelligence

Amazon Q Business for Retail Intelligence transforms how retailers handle their data challenges through a generative AI-powered assistant. This solution integrates seamlessly with existing systems, using Retrieval Augmented Generation (RAG) to unify disparate data sources and deliver actionable insights in real time. The following are some of the key benefits for various roles:

  • C-Suite executives – Access comprehensive real-time dashboards for company-wide metrics and KPIs while using AI-driven recommendations for strategic decisions. Use predictive analytics to anticipate consumer shifts and enable proactive strategy adjustments for business growth.
  • Merchandisers – Gain immediate insights into sales trends, profit margins, and inventory turnover rates through automated analysis tools and AI-powered pricing strategies. Identify and capitalize on emerging trends through predictive analytics for optimal product mix and category management.
  • Inventory managers – Implement data-driven stock level optimization across multiple store locations while streamlining operations with automated reorder point calculations. Accurately predict and prepare for seasonal demand fluctuations to maintain optimal inventory levels during peak periods.
  • Store managers – Maximize operational efficiency through AI-predicted staffing optimization while accessing detailed insights about local conditions affecting store performance. Compare store metrics against other locations using sophisticated benchmarking tools to identify improvement opportunities.
  • Marketing analysts – Monitor and analyze marketing campaign effectiveness across channels in real time while developing sophisticated customer segments using AI-driven analysis. Calculate and optimize marketing ROI across channels for efficient budget allocation and improved campaign performance.

Amazon Q Business for Retail Intelligence makes complex data analysis accessible to different users through its natural language interface. This solution enables data-driven decision-making across organizations by providing role-specific insights that break down traditional data silos. By providing each retail persona tailored analytics and actionable recommendations, organizations can achieve greater operational efficiency and maintain a competitive edge in the dynamic retail landscape.

Conclusion

Amazon Q Business for Retail Intelligence combines generative AI capabilities with powerful visualization tools to revolutionize retail operations. By enabling natural language interactions with complex data systems, this solution democratizes data access across organizational levels, from C-suite executives to store managers. The system’s ability to provide role-specific insights, automate workflows, and facilitate real-time decision-making positions it as a crucial tool for retail businesses seeking to maintain competitiveness in today’s dynamic landscape. As retailers continue to embrace AI-driven solutions, Amazon Q Business for Retail Intelligence can help meet the industry’s growing needs for sophisticated data analysis and operational efficiency.

To learn more about our solutions and offerings, refer to Amazon Q Business and Generative AI on AWS. For expert assistance, AWS Professional Services, AWS Generative AI partner solutions, and AWS Generative AI Competency Partners are here to help.


About the authors

Suprakash Dutta is a Senior Solutions Architect at Amazon Web Services, leading strategic cloud transformations for Fortune 500 retailers and large enterprises. He specializes in architecting mission-critical retail solutions that drive significant business outcomes, including cloud-native based systems, generative AI implementations, and retail modernization initiatives. He’s a multi-cloud certified architect and has delivered transformative solutions that modernized operations across thousands of retail locations while driving breakthrough efficiencies through AI-powered retail intelligence solutions.

Alberto Alonso is a Specialist Solutions Architect at Amazon Web Services. He focuses on generative AI and how it can be applied to business challenges.

Abhijit Dutta is a Sr. Solutions Architect in the Retail/CPG vertical at AWS, focusing on key areas like migration and modernization of legacy applications, data-driven decision-making, and implementing AI/ML capabilities. His expertise lies in helping organizations use cloud technologies for their digital transformation initiatives, with particular emphasis on analytics and generative AI solutions.

Ramesh Venkataraman is a Solutions Architect who enjoys working with customers to solve their technical challenges using AWS services. Outside of work, Ramesh enjoys following Stack Overflow questions and answering them in any way he can.

Girish Nazhiyath is a Sr. Solutions Architect in the Amazon Web Services Retail/CPG vertical. He enjoys working with retail/CPG customers to enable technology-driven retail innovation, with over 20 years of expertise in multiple retail segments and domains worldwide.

Krishnan Hariharan is a Sr. Manager, Solutions Architecture at AWS based out of Chicago. In his current role, he uses his diverse blend of customer, product, technology, and operations skills to help retail/CPG customers build the best solutions using AWS. Prior to AWS, Krishnan was President/CEO at Kespry, and COO at LightGuide. He has an MBA from The Fuqua School of Business, Duke University and a Bachelor of Science in Electronics from Delhi University.

Democratize data for timely decisions with text-to-SQL at Parcel Perform

This post was co-written with Le Vy from Parcel Perform.

Access to accurate data is often the true differentiator of excellent and timely decisions. This is even more crucial for customer-facing decisions and actions. A correctly implemented state-of-the-art AI can help your organization simplify access to data for accurate and timely decision-making for the customer-facing business team, while reducing the undifferentiated heavy lifting done by your data team. In this post, we share how Parcel Perform, a leading AI Delivery Experience Platform for e-commerce businesses worldwide, implemented such a solution.

Accurate post-purchase delivery tracking is crucial for many ecommerce merchants. Parcel Perform provides an AI-driven, intelligent, end-to-end data and delivery experience and software as a service (SaaS) system for ecommerce merchants. The system uses AWS services and state-of-the-art AI to process hundreds of millions of parcel delivery movement data points daily and provide a unified tracking capability across couriers for the merchants, with emphasis on accuracy and simplicity.

The business team in Parcel Perform often needs access to data to answer questions related to merchants’ parcel deliveries, such as “Did we see a spike in delivery delays last week? If so, in which transit facilities were this observed, and what was the primary cause of the issue?” Previously, the data team had to manually form the query and run it to fetch the data. With the new generative AI-powered text-to-SQL capability in Parcel Perform, the business team can self-serve their data needs by using an AI assistant interface. In this post, we discuss how Parcel Perform incorporated generative AI, data storage, and data access through AWS services to make timely decisions.

Data analytics architecture

The solution starts with data ingestion, storage, and access. Parcel Perform adopted the data analytics architecture shown in the following diagram.

Architecture diagram of the parcel event data ingestion at Parcel Perform

One key data type in the Parcel Perform parcel monitoring application is the parcel event data, which can reach billions of rows. This includes the parcel’s shipment status change, location change, and much more. This day-to-day data from multiple business units lands in relational databases hosted on Amazon Relational Database Service (Amazon RDS).

Although relational databases are suitable for rapid data ingestion and consumption from the application, a separate analytics stack is needed to handle analytics in a scalable and performant way without disrupting the main application. These analytics needs include answering aggregation queries from questions like “How many parcels were delayed last week?”

Parcel Perform uses Amazon Simple Storage Service (Amazon S3) with a query engine provided by Amazon Athena to meet their analytics needs. With this approach, Parcel Perform benefits from cost-effective storage while still being able to run SQL queries as needed on the data through Athena, which is priced on usage.
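As a minimal sketch of this consumption pattern, the following boto3 snippet runs an aggregation query through Athena; the database name, table name, column names, and results bucket are assumptions for illustration, not Parcel Perform’s actual schema.

import time
import boto3

athena = boto3.client("athena")

# Assumed names; replace with your Glue database, table, and results bucket.
QUERY = """
SELECT COUNT(*) AS delayed_parcels
FROM parcel_events
WHERE event_type = 'delay'
  AND event_time >= date_add('day', -7, current_date)
"""

execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "parcel_analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then read the single result row.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
print(rows[1]["Data"][0]["VarCharValue"])  # rows[0] is the header row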

Data in Amazon S3 is stored in the Apache Iceberg format, which allows data updates; this is useful here because parcel events sometimes get updated. Iceberg also supports partitioning for better performance. Amazon S3 Tables, launched in late 2024, provides managed Iceberg tables and is another option.

Parcel Perform uses an Apache Kafka cluster managed by Amazon Managed Streaming for Apache Kafka (Amazon MSK) as the stream to move the data from the source to the S3 bucket. Amazon MSK Connect with a Debezium connector streams data with change data capture (CDC) from Amazon RDS to Amazon MSK.

Apache Flink, running on Amazon Elastic Kubernetes Service (Amazon EKS), processes data streams from Amazon MSK. It writes this data to an S3 bucket according to the Iceberg format, and updates the data schema in the AWS Glue Data Catalog. The data schema enables Athena to correctly query the data in the S3 bucket.

Now that you understand how the data is ingested and stored, let’s show how the data is consumed using the generative AI-powered data serving assistant for the business teams in Parcel Perform.

AI agent that can query data

The users of the data serving AI agent in Parcel Perform are customer-facing business team members who often query the parcel event data to answer questions from ecommerce merchants regarding the parcel deliveries and to proactively assist them. The following screenshot shows the UI experience for the AI agent assistant, powered by text-to-SQL with generative AI.

A screenshot of the AI assistant

This functionality helped the Parcel Perform team and their customers save time, which we discuss later in this post. In the following section, we present the architecture that powers this feature.

Text-to-SQL AI agent architecture

The data serving AI assistant architecture in Parcel Perform is shown in the following diagram.

Architecture diagram of the AI assistant

The AI assistant UI is powered by an application built with the FastAPI framework, hosted on Amazon EKS. It is also fronted by an Application Load Balancer to allow for potential horizontal scalability.

The application uses LangGraph to orchestrate the workflow of large language model (LLM) invocations, the use of tools, and the memory checkpointing. The graph uses multiple tools, including those from SQLDatabase Toolkit, to automatically fetch the data schema through Athena. The graph also uses an Amazon Bedrock Knowledge Bases retriever to retrieve business information from a knowledge base. Parcel Perform uses Anthropic’s Claude models in Amazon Bedrock to generate SQL.

Although the function of Athena as a query engine to query the parcel event data on Amazon S3 is clear, Parcel Perform still needs a knowledge base. In this use case, the SQL generation performs better when the LLM has more business contextual information to help interpret database fields and translate logistics terminology into data representations. This is better illustrated with the following two examples:

  • Parcel Perform’s data lake operations use specific codes: c for create and u for update. When analyzing data, Parcel Perform sometimes needs to focus only on initial creation records, where operation code is equal to c. Because this business logic might not be inherent in the training of LLMs in general, Parcel Perform explicitly defines this in their business context.
  • In logistics terminology, transit time has specific industry conventions. It’s measured in days, and same-day deliveries are recorded as transit_time = 0. Although this is intuitive for logistics professionals, an LLM might incorrectly interpret a request like “Get me all shipments with same-day delivery” by using WHERE transit_time = 1 instead of WHERE transit_time = 0 in the generated SQL statement.

Therefore, each incoming question goes to a Retrieval Augmented Generation (RAG) workflow to find potentially relevant stored business information, to enrich the context. This mechanism helps provide the specific rules and interpretations that even advanced LLMs might not be able to derive from general training data.

Parcel Perform uses Amazon Bedrock Knowledge Bases as a managed solution for the RAG workflow. They ingest business contextual information by uploading files to Amazon S3. Amazon Bedrock Knowledge Bases processes the files, converts them to chunks, uses embedding models to generate vectors, and stores the vectors in a vector database to make them searchable. The steps are fully managed by Amazon Bedrock Knowledge Bases. Parcel Perform stores the vectors in Amazon OpenSearch Serverless as the vector database of choice to simplify infrastructure management.

Amazon Bedrock Knowledge Bases provides the Retrieve API, which takes in an input (such as a question from the AI assistant), converts it into a vector embedding, searches for relevant chunks of business context information in the vector database, and returns the top relevant document chunks. It is integrated with the LangChain Amazon Bedrock Knowledge Bases retriever by calling the invoke method.
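The following is a minimal sketch of how the Retrieve API can be called with boto3 to fetch business context for a question; the knowledge base ID is a placeholder assumption.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="KB1234567890",  # placeholder knowledge base ID
    retrievalQuery={"text": "Get me all shipments with same-day delivery"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 4}},
)

# Each result carries a chunk of business context and its relevance score.
rag_context = "\n".join(
    result["content"]["text"] for result in response["retrievalResults"]
)
print(rag_context)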

The next step involves invoking an AI agent with the supplied business contextual information and the SQL generation prompt. The prompt was inspired by a prompt in LangChain Hub. The following is a code snippet of the prompt:

You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
Relevant context:
{rag_context}
You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the relevant columns given the question.
You have access to tools for interacting with the database.
- Only use the below tools. Only use the information returned by the below tools to construct your final answer.
- DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.
- To start querying for final answer you should ALWAYS look at the tables in the database to see what you can query. Do NOT skip this step.
- Then you should query the schema of the most relevant tables

The prompt sample is part of the initial instruction for the agent. The data schema is automatically inserted by the tools from the SQLDatabase Toolkit at a later step of this agentic workflow. The following steps occur after a user enters a question in the AI assistant UI:

  1. The question triggers a run of the LangGraph graph.
  2. The following processes happen in parallel:
    1. The graph fetches the database schema from Athena through SQLDatabase Toolkit.
    2. The graph passes the question to the Amazon Bedrock Knowledge Bases retriever and gets a list of relevant business information regarding the question.
  3. The graph invokes an LLM using Amazon Bedrock by passing the question, the conversation context, data schema, and business context information. The result is the generated SQL.
  4. The graph uses SQLDatabase Toolkit again to run the SQL through Athena and fetch the data output.
  5. The data output is passed into an LLM to generate the final response based on the initial question asked. Amazon Bedrock Guardrails is used as a safeguard to avoid inappropriate inputs and responses.
  6. The final response is returned to the user through the AI assistant UI.

The following diagram illustrates these steps.

Architecture diagram of the AI assistant with numbered steps
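The following condensed sketch approximates steps 2 through 5 using plain boto3 calls rather than LangGraph; the knowledge base ID, model ID, database, table, and S3 output location are placeholder assumptions, and the production workflow also adds conversation memory, automatic schema discovery through SQLDatabase Toolkit, and Amazon Bedrock Guardrails.

import time
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")
bedrock_runtime = boto3.client("bedrock-runtime")
athena = boto3.client("athena")

KB_ID = "KB1234567890"                                   # placeholder knowledge base ID
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"   # placeholder model ID
SCHEMA = "parcel_events(parcel_id, merchant_name, event_type, transit_time, event_time)"  # simplified

def run_athena(sql: str) -> list:
    """Run SQL through Athena and return the raw result rows."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "parcel_analytics"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )["QueryExecutionId"]
    while athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"] \
            not in ("SUCCEEDED", "FAILED", "CANCELLED"):
        time.sleep(1)
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

def ask(question: str) -> str:
    # Step 2b: fetch relevant business context from the knowledge base.
    chunks = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=KB_ID,
        retrievalQuery={"text": question},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 4}},
    )["retrievalResults"]
    rag_context = "\n".join(c["content"]["text"] for c in chunks)

    # Step 3: ask the LLM to generate SQL from the question, schema, and context.
    prompt = (f"Schema: {SCHEMA}\nBusiness context:\n{rag_context}\n"
              f"Write a single Athena SQL query (LIMIT 100) answering: {question}\n"
              "Return only the SQL.")
    sql = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )["output"]["message"]["content"][0]["text"]

    # Step 4: execute the generated SQL.
    rows = run_athena(sql)

    # Step 5: summarize the data output as the final answer.
    summary_prompt = f"Question: {question}\nQuery results: {rows}\nAnswer the question concisely."
    return bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": summary_prompt}]}],
    )["output"]["message"]["content"][0]["text"]

print(ask("Did we see a spike in delivery delays last week?"))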

This implementation demonstrates how Parcel Perform transforms raw inquiries into actionable data for timely decision-making. Security is also implemented in multiple components. From a network perspective, the EKS pods are placed in private subnets in Amazon Virtual Private Cloud (Amazon VPC) to improve network security of the AI assistant application. This AI agent is placed behind a backend layer that requires authentication. For data security, sensitive data is masked at rest in the S3 bucket. Parcel Perform also limits the permissions of the AWS Identity and Access Management (IAM) role used to access the S3 bucket so it can only access certain tables.

In the following sections, we discuss Parcel Perform’s approach to building this data transformation solution.

From idea to production

Parcel Perform started with the idea of freeing their data team from manually serving the request from the business team, while also improving the timeliness of the data availability to support the business team’s decision-making.

With the help of the AWS Solutions Architect team, Parcel Perform completed a proof of concept using AWS services and a Jupyter notebook in Amazon SageMaker Studio. After an initial success, Parcel Perform integrated the solution with their orchestration tool of choice, LangGraph.

Before going into production, Parcel Perform conducted extensive testing to verify the results were consistent. They added LangSmith Tracing to log the AI agent’s steps and results to evaluate its performance.

The Parcel Perform team discovered challenges during their journey, which we discuss in the following section. They performed prompt engineering to address those challenges. Eventually, the AI agent was integrated into production to be used by the business team. Afterward, Parcel Perform collected user feedback internally and monitored logs from LangSmith Tracing to verify performance was maintained.

The challenges

This journey isn’t free from challenges. Firstly, some ecommerce merchants might have several records in the data lake under various names. For example, a merchant named “ABC” might have multiple records, such as “ABC Singapore Holdings Pte. Ltd.,” “ABC Demo Account,” and “ABC Test Group.” For a question like “Was there any parcel shipment delay by ABC last week?”, the generated SQL contains a clause like WHERE merchant_name LIKE '%ABC%', which might result in ambiguity. During the proof of concept stage, this problem caused incorrect matching of results.

For this challenge, Parcel Perform relies on careful prompt engineering to instruct the LLM to identify when the name was potentially ambiguous. The AI agent then calls Athena again to look for matching names. The LLM decides which merchant name to use based on multiple factors, including the significance in data volume contribution and the account status in the data lake. In the future, Parcel Perform intends to implement a more sophisticated technique by prompting the user to resolve the ambiguity.
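A simplified sketch of such a disambiguation lookup might query Athena for candidate merchant names and return them to the LLM; the table and column names below are assumptions for illustration.

from typing import Callable, List

def candidate_merchants(partial_name: str, run_sql: Callable[[str], List[dict]]) -> List[str]:
    """Return merchant name records that loosely match the user's wording.

    run_sql is any helper that executes SQL on Athena and returns result rows,
    such as the run_athena function from the earlier sketch.
    """
    sql = (
        "SELECT merchant_name, COUNT(*) AS parcel_volume "
        "FROM parcel_events "
        f"WHERE merchant_name LIKE '%{partial_name}%' "
        "GROUP BY merchant_name ORDER BY parcel_volume DESC LIMIT 20"
    )
    rows = run_sql(sql)
    return [row["Data"][0]["VarCharValue"] for row in rows[1:]]  # skip the header row

# The candidate list is passed back to the LLM, which either picks the most
# plausible record (for example, by data volume and account status) or asks
# the user to resolve the ambiguity.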

The second challenge is about unrestricted questions that might yield expensive queries running across large amounts of data and resulting in longer query waiting time. Some of these questions might not have a LIMIT clause imposed in the query. To solve this, Parcel Perform instructs the LLM to add a LIMIT clause with a certain number of maximum results if the user doesn’t specify the intended number of results. In the future, Parcel Perform plans to use the query EXPLAIN results to identify heavy queries.

The third challenge is related to tracking usage and incurred cost of this particular solution. Having started multiple generative AI projects using Amazon Bedrock and sometimes with the same LLM ID, Parcel Perform must distinguish usage incurred by projects. Parcel Perform creates an inference profile for each project, associates the profile with tags, and includes that profile in each LLM call for that project. With this setup, Parcel Perform is able to segregate costs based on projects to improve cost visibility and monitoring.
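The following is a hedged sketch of this approach, assuming the Amazon Bedrock CreateInferenceProfile API; the profile name, tag, and model ARN are illustrative.

import boto3

bedrock = boto3.client("bedrock")
bedrock_runtime = boto3.client("bedrock-runtime")

# Create an application inference profile for the text-to-SQL project,
# tagged so its usage and cost can be attributed to that project.
profile = bedrock.create_inference_profile(
    inferenceProfileName="text-to-sql-agent",  # illustrative name
    modelSource={
        "copyFrom": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0"
    },
    tags=[{"key": "project", "value": "text-to-sql-agent"}],
)

# Invoke the model through the profile ARN so usage is attributed to this profile.
response = bedrock_runtime.converse(
    modelId=profile["inferenceProfileArn"],
    messages=[{"role": "user", "content": [{"text": "Generate SQL for: delayed parcels last week"}]}],
)
print(response["output"]["message"]["content"][0]["text"])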

The impact

Previously, to extract data, the business team had to clarify details with the data team, make a request, check feasibility, and wait for the data team’s bandwidth. This process lengthened when requirements came from customers or teams in different time zones, with each clarification adding 12–24 hours due to asynchronous communication. Simpler requests made early in the workday might have completed within 24 hours, whereas more complex requests or those during busy periods could take 3–5 business days.

With the text-to-SQL AI agent, this process is dramatically streamlined—minimizing the back-and-forth communication for requirement clarification, removing the dependency on data team bandwidth, and automating result interpretation.

Parcel Perform’s measurements show that the text-to-SQL AI agent reduces the average time-to-insight by 99%, from 2.3 days to an average of 10 minutes, saving approximately 3,850 total hours of wait time per month across requesters while maintaining data accuracy.

Users can directly query the data without intermediaries, receiving results in minutes rather than days. Teams across time zones can now access insights any time of day, alleviating the frustrating “wait until Asia wakes up” or “catch EMEA before they leave” delays, leading to happier customers and faster problem-solving.

This transformation has profoundly impacted the data analytics team’s capacity and focus, freeing the data team for more strategic work and helping everyone make faster, more informed decisions. Before, the analysts spent approximately 25% of their working hours handling routine data extraction requests—equivalent to over 260 hours monthly across the team. Now, with basic and intermediate queries automated, this number has dropped to just 10%, freeing up nearly 160 hours each month for high-impact work. Analysts now focus on complex data analysis rather than spending time on basic data retrieval tasks.

Conclusion

Parcel Perform’s solution demonstrates how you can use generative AI to enhance productivity and customer experience. Parcel Perform has built a text-to-SQL AI agent that transforms a business team’s question into SQL that can fetch the actual data. This improves the timeliness of data availability for decision-making that involves customers. Furthermore, the data team can avoid the undifferentiated heavy lifting to focus on complex data analysis tasks.

This solution uses multiple AWS services like Amazon Bedrock and tools like LangGraph. You can start with a proof of concept and consult your AWS Solutions Architect or engage with AWS Partners. If you have questions, post them on AWS re:Post. You can also make the development more straightforward with the help of Amazon Q Developer. When you face challenges, you can iterate to find the solution, which might include prompt engineering or adding additional steps to your workflow.

Security is a top priority. Make sure your AI assistant has proper guardrails in place to protect against prompt threats, inappropriate topics, profanity, leaked data, and other security issues. You can integrate Amazon Bedrock Guardrails with your generative AI application through an API.


About the authors

Yudho Ahmad Diponegoro profile pictureYudho Ahmad Diponegoro is a Senior Solutions Architect at AWS. Having been part of Amazon for 10+ years, he has had various roles from software development to solutions architecture. He helps startups in Singapore when it comes to architecting in the cloud. While he keeps his breadth of knowledge across technologies and industries, he focuses in AI and machine learning where he has been guiding various startups in ASEAN to adopt machine learning and generative AI at AWS.

Le Vy is the AI Team Lead at Parcel Perform, where she drives the development of AI applications and explores emerging AI research. She started her career in data analysis and deepened her focus on AI through a Master’s in Artificial Intelligence. Passionate about applying data and AI to solve real business problems, she also dedicates time to mentoring aspiring technologists and building a supportive community for youth in tech. Through her work, Vy actively challenges gender norms in the industry and champions lifelong learning as a key to innovation.

Loke Jun Kai is a GenAI/ML Specialist Solutions Architect in AWS, covering strategic customers across the ASEAN region. He works with customers ranging from Start-up to Enterprise to build cutting-edge use cases and scalable GenAI Platforms. His passion in the AI space, constant research and reading, have led to many innovative solutions built with concrete business outcomes. Outside of work, he enjoys a good game of tennis and chess.

Query Amazon Aurora PostgreSQL using Amazon Bedrock Knowledge Bases structured data

Amazon Bedrock Knowledge Bases offers a fully managed Retrieval Augmented Generation (RAG) feature that connects large language models (LLMs) to internal data sources. This feature enhances foundation model (FM) outputs with contextual information from private data, making responses more relevant and accurate.

At AWS re:Invent 2024, we announced Amazon Bedrock Knowledge Bases support for natural language querying to retrieve structured data from Amazon Redshift and Amazon SageMaker Lakehouse. This feature provides a managed workflow for building generative AI applications that access and incorporate information from structured and unstructured data sources. Through natural language processing, Amazon Bedrock Knowledge Bases transforms natural language queries into SQL queries, so users can retrieve data directly from supported sources without understanding database structure or SQL syntax.

In this post, we discuss how to make your Amazon Aurora PostgreSQL-Compatible Edition data available for natural language querying through Amazon Bedrock Knowledge Bases while maintaining data freshness.

Structured data retrieval in Amazon Bedrock Knowledge Bases and Amazon Redshift Zero-ETL

Structured data retrieval in Amazon Bedrock Knowledge Bases enables natural language interactions with your database by converting user queries into SQL statements. When you connect a supported data source like Amazon Redshift, Amazon Bedrock Knowledge Bases analyzes your database schema, table relationships, query engine, and historical queries to understand the context and structure of your information. This understanding allows the service to generate accurate SQL queries from natural language questions.

At the time of writing, Amazon Bedrock Knowledge Bases supports structured data retrieval directly from Amazon Redshift and SageMaker Lakehouse. Although direct support for Aurora PostgreSQL-Compatible isn’t currently available, you can use the zero-ETL integration between Aurora PostgreSQL-Compatible and Amazon Redshift to make your data accessible to Amazon Bedrock Knowledge Bases structured data retrieval. Zero-ETL integration automatically replicates your Aurora PostgreSQL tables to Amazon Redshift in near real time, alleviating the need for complex extract, transform, and load (ETL) pipelines or data movement processes.

This architectural pattern is particularly valuable for organizations seeking to enable natural language querying of their structured application data stored in Amazon Aurora database tables. By combining zero-ETL integration with Amazon Bedrock Knowledge Bases, you can create powerful applications like AI assistants that use LLMs to provide natural language responses based on their operational data.

Solution overview

The following diagram illustrates the architecture we will implement to connect Aurora PostgreSQL-Compatible to Amazon Bedrock Knowledge Bases using zero-ETL.

Architecture Diagram

The workflow consists of the following steps:

  1. Data is stored in Aurora PostgreSQL-Compatible within the private subnet. We use a bastion host to connect securely to the database from the public subnet.
  2. Using zero-ETL integration, this data is made available in Amazon Redshift, also located in the private subnet.
  3. Amazon Bedrock Knowledge Bases uses Amazon Redshift as its structured data source.
  4. Users can interact with Amazon Bedrock Knowledge Bases using the AWS Management Console or an AWS SDK client, which sends natural language queries. These queries are processed by Amazon Bedrock Knowledge Bases to retrieve information stored in Amazon Redshift (sourced from Aurora).

Prerequisites

Make sure you’re logged in with a user or role that has permissions to create an Aurora database, run DDL (CREATE, ALTER, DROP, RENAME) and DML (SELECT, INSERT, UPDATE, DELETE) statements, create a Redshift database, set up zero-ETL integration, and create an Amazon Bedrock knowledge base.

Set up the Aurora PostgreSQL database

In this section, we walk through creating and configuring an Aurora PostgreSQL database with a sample schema for our demonstration. We create three interconnected tables: product, customer, and orders.

Provision the database

Let’s begin by setting up our database environment. Create a new Aurora PostgreSQL database cluster and launch an Amazon Elastic Compute Cloud (Amazon EC2) instance that will serve as our access point for managing the database. The EC2 instance will make it straightforward to create tables and manage data throughout this post.

The following screenshot shows the details of our database cluster and EC2 instance.

Aurora PostgreSQL cluster

For instructions to set up your database, refer to Creating and connecting to an Aurora PostgreSQL DB cluster.

Create the database schema

After you connect to your database using SSH on your EC2 instance (described in Creating and connecting to an Aurora PostgreSQL DB cluster), it’s time to create your data structure. We use the following DDL statements to create three tables:

-- Create Product table
CREATE TABLE product (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    price DECIMAL(10, 2) NOT NULL
);

-- Create Customer table
CREATE TABLE customer (
    customer_id SERIAL PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL,
    pincode VARCHAR(10) NOT NULL
);

-- Create Orders table
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL,
    customer_id INTEGER NOT NULL,
    FOREIGN KEY (product_id) REFERENCES product(product_id),
    FOREIGN KEY (customer_id) REFERENCES customer(customer_id)
);

Populate the tables with data

After you create the tables, you can populate them with sample data. When inserting data into the orders table, remember to maintain referential integrity by verifying the following:

  • The product_id exists in the product table
  • The customer_id exists in the customer table

We use the following example code to populate the tables:

INSERT INTO product (product_id, product_name, price) VALUES (1, 'Smartphone X', 699.99);
INSERT INTO product (product_id, product_name, price) VALUES (2, 'Laptop Pro', 1299.99);
INSERT INTO product (product_id, product_name, price) VALUES (3, 'Wireless Earbuds', 129.99);
INSERT INTO customer (customer_id, customer_name, pincode) VALUES (1, 'John Doe', '12345');
INSERT INTO customer (customer_id, customer_name, pincode) VALUES (2, 'Jane Smith', '23456');
INSERT INTO customer (customer_id, customer_name, pincode) VALUES (3, 'Robert Johnson', '34567');
INSERT INTO orders (order_id, product_id, customer_id) VALUES (1, 1, 1);
INSERT INTO orders (order_id, product_id, customer_id) VALUES (2, 1, 2);
INSERT INTO orders (order_id, product_id, customer_id) VALUES (3, 2, 3);
INSERT INTO orders (order_id, product_id, customer_id) VALUES (4, 2, 1);
INSERT INTO orders (order_id, product_id, customer_id) VALUES (5, 3, 2);
INSERT INTO orders (order_id, product_id, customer_id) VALUES (6, 3, 3);

Make sure to maintain referential integrity when populating the orders table to avoid foreign key constraint violations.

You can use similar statements to build your own schema and populate it with data for this walkthrough.

Set up the Redshift cluster and configure zero-ETL

Now that you have set up your Aurora PostgreSQL database, you can establish the zero-ETL integration with Amazon Redshift. This integration automatically syncs your data between Aurora PostgreSQL-Compatible and Amazon Redshift.

Set up Amazon Redshift

First, create an Amazon Redshift Serverless workgroup and namespace. For instructions, see Creating a data warehouse with Amazon Redshift Serverless.

Create a zero-ETL integration

The zero-ETL integration process involves two main steps:

  1. Create the zero-ETL integration from your Aurora PostgreSQL database to Redshift Serverless.
  2. After you establish the integration on the Aurora side, create the corresponding mapping database in Amazon Redshift. This step is crucial for facilitating proper data synchronization between the two services.

The following screenshot shows our zero-ETL integration details.

Zero ETL Integration
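If you prefer to script these two steps instead of using the console, the following is a hedged sketch that assumes the RDS CreateIntegration API and the Redshift CREATE DATABASE ... FROM INTEGRATION statement run through the Redshift Data API; the ARNs, workgroup, and database names are placeholders.

import boto3

rds = boto3.client("rds")

# Step 1: create the zero-ETL integration from the Aurora cluster to Redshift Serverless.
rds.create_integration(
    SourceArn="arn:aws:rds:us-east-1:111122223333:cluster:aurora-pg-cluster",                 # placeholder
    TargetArn="arn:aws:redshift-serverless:us-east-1:111122223333:namespace/ns-placeholder",  # placeholder
    IntegrationName="aurora-pg-to-redshift",
)

# Step 2: once the integration is active, create the mapping database in Redshift.
# The integration ID is visible on the console or in the SVV_INTEGRATION system view;
# for Aurora PostgreSQL sources you also name the source database (here 'postgres').
redshift_data = boto3.client("redshift-data")
redshift_data.execute_statement(
    WorkgroupName="my-serverless-workgroup",  # placeholder
    Database="dev",
    Sql="CREATE DATABASE aurora_zeroetl FROM INTEGRATION '<integration-id>' DATABASE 'postgres';",
)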

Verify the integration

After you complete the integration, you can verify its success through several checks.

Firstly, you can check the zero-ETL integration details in the Amazon Redshift console. You should see an Active status for your integration, along with source and destination information, as shown in the following screenshot.

Redshift Zero ETL

Additionally, you can use the Redshift Query Editor v2 to verify that your data has been successfully populated. A simple query like SELECT * FROM customer; should return the synchronized data from your Aurora PostgreSQL database, as shown in the following screenshot.

Amazon Redshift Query Editor

Set up the Amazon Bedrock knowledge base with structured data

The final step is to create an Amazon Bedrock knowledge base that will enable natural language querying of our data.

Create the Amazon Bedrock knowledge base

Create a new Amazon Bedrock knowledge base with the structured data option. For instructions, see Build a knowledge base by connecting to a structured data store. Then you must synchronize the query engine to enable data access.

Configure data access permissions

Before the sync process can succeed, you need to grant appropriate permissions to the Amazon Bedrock Knowledge Bases AWS Identity and Access Management (IAM) role. This involves executing GRANT SELECT commands for each table in your Redshift database.

Run the following command in Redshift Query Editor v2 for each table:

GRANT SELECT ON <table_name> TO "IAMR:<KB Role name>";

For example:

GRANT SELECT ON customer TO "IAMR:AmazonBedrockExecutionRoleForKnowledgeBase_ej0f0";

For production setups, integrating the end-user identity into the data access flow requires identity federation. Refer to AWS documentation on structured database access for the role-based access model. For federating identities from web clients, Amazon Cognito or SAML federation with AWS Security Token Service (AWS STS) might be required depending on your architecture.

Verify the setup

After you complete the configuration, your knowledge base should show the following details:

  • Status as Available
  • Query engine successfully synced with Amazon Redshift
  • COMPLETE status for the database synchronization

You can now start querying your data using natural language.

Example natural language queries

Now that you have set up your Amazon Bedrock knowledge base, you can begin testing its capabilities by running natural language queries against your structured data. Amazon Bedrock Knowledge Bases structured data translates plain English questions into SQL and uses FMs to generate human-readable responses.

You can test your Amazon Bedrock knowledge base in two ways:

  • Amazon Bedrock console – On the Amazon Bedrock console, choose Knowledge Bases in the navigation pane, and select your configured knowledge base. Open the Test tab to interact with your data.
  • AWS SDK or API – Use the AWS SDK (for example, Python’s Boto3) or directly call the RetrieveAndGenerate API to send queries programmatically.
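For the SDK option, the following is a minimal sketch that calls the RetrieveAndGenerate API with boto3; the knowledge base ID and model ARN are placeholders to replace with values from your account.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "How many customers do we have?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234567890",  # placeholder knowledge base ID
            # Foundation model or inference profile ARN available in your Region
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-lite-v1:0",
        },
    },
)
print(response["output"]["text"])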

In this section, we illustrate the console experience. On the Amazon Bedrock console, you can interact with your Amazon Bedrock knowledge base in two modes:

  • Generate SQL queries – This option shows you the raw SQL query that gets generated from your natural language question

Amazon Bedrock Knowledge base

  • Retrieval and response generation – This provides a natural language response using an LLM

Retrieval and response generation

The following examples show natural language queries, the SQL produced by the Generate SQL option, the response produced in retrieval and response generation mode, and the model used to generate the response.

Natural language query: How many customers do we have?

Generated SQL:

SELECT COUNT(DISTINCT customer_id) AS num_customers
FROM public.customer;

Response (Amazon Nova Lite): We currently have 11 unique customers.

Natural language query: Which all customers have purchased the most products?

Generated SQL:

SELECT c."customer_name", COUNT(o."product_id") AS "total_products_purchased"
FROM public.customer c
INNER JOIN public.orders o
ON c."customer_id" = o."customer_id"
GROUP BY c."customer_name"
ORDER BY "total_products_purchased" DESC;

Response (Amazon Nova Lite): Based on the data, the customers who have purchased the most products are Charlie Davis, Alice Brown, and John Doe, each having purchased 14 products. Following closely are Jane Smith, Grace Lee, and Bob Johnson, who have each purchased 13 products. Henry Taylor, Frank Miller, and Eva Wilson have each purchased 12 products, while Ivy Anderson has purchased 11 products.

Natural language query: Who all have purchased more than one number of the most expensive product?

Generated SQL:

SELECT c.customer_name
FROM public.customer c
INNER JOIN public.orders o
ON c.customer_id = o.customer_id
INNER JOIN public.product p
ON o.product_id = p.product_id
WHERE p.price = (SELECT MAX(price) FROM public.product)
GROUP BY c.customer_name
HAVING COUNT(DISTINCT o.order_id) > 1;

Response (Amazon Nova Micro): The customers who have purchased more than one number of the most expensive product are Grace Lee, Jane Smith, Alice Brown, and Eva Wilson.

Clean up

When you’re done using this solution, clean up the resources you created to avoid ongoing charges.

Conclusion

In this post, we demonstrated how to enable natural language querying of Aurora PostgreSQL data using Amazon Bedrock Knowledge Bases through zero-ETL integration with Amazon Redshift. We showed how to set up the database, configure zero-ETL integration, and establish the knowledge base connection for seamless data access. Although this solution provides an effective way to interact with your data using natural language, you should consider the additional storage costs in Amazon Redshift when implementing this architecture for your use case.

Please try out this solution for yourself and share your feedback in the comments.


About the authors

Girish B is a Senior Solutions Architect at AWS India Pvt Ltd based in Bengaluru. Girish works with many ISV customers to design and architect innovative solutions on AWS.

Dani Mitchell is a Generative AI Specialist Solutions Architect at AWS. He is focused on helping accelerate enterprises across the world on their generative AI journeys with Amazon Bedrock.

Configure fine-grained access to Amazon Bedrock models using Amazon SageMaker Unified Studio

Enterprises adopting advanced AI solutions recognize that robust security and precise access control are essential for protecting valuable data, maintaining compliance, and preserving user trust. As organizations expand AI usage across teams and applications, they require granular permissions to safeguard sensitive information and manage who can access powerful models. Amazon SageMaker Unified Studio addresses these needs so organizations can configure fine-grained access policies, making sure that only authorized users can interact with foundation models (FMs) while supporting secure, collaborative innovation at scale.

Launched in 2025, SageMaker Unified Studio is a single data and AI development environment where you can find and access the data in your organization and act on it using the best tools across use cases. SageMaker Unified Studio brings together the functionality and tools from existing AWS analytics and AI/ML services, including Amazon EMR, AWS Glue, Amazon Athena, Amazon Redshift, Amazon Bedrock, and Amazon SageMaker AI.

Amazon Bedrock in SageMaker Unified Studio provides various options for discovering and experimenting with Amazon Bedrock models and applications. For example, you can use a chat playground to try a prompt with Anthropic’s Claude without having to write code. You can also create a generative AI application that uses an Amazon Bedrock model and features, such as a knowledge base or a guardrail. To learn more, refer to Amazon Bedrock in SageMaker Unified Studio.

In this post, we demonstrate how to use SageMaker Unified Studio and AWS Identity and Access Management (IAM) to establish a robust permission framework for Amazon Bedrock models. We show how administrators can precisely manage which users and teams have access to specific models within a secure, collaborative environment. We guide you through creating granular permissions to control model access, with code examples for common enterprise governance scenarios. By the end, you will understand how to tailor access to generative AI capabilities to meet your organization’s requirements—addressing a core challenge in enterprise AI adoption by enabling developer flexibility while maintaining strong security standards.

Solution overview

In SageMaker Unified Studio, a domain serves as the primary organizational structure, so you can oversee multiple AWS Regions, accounts, and workloads from a single interface. Each domain is assigned a unique URL and offers centralized control over studio configurations, user accounts, and network settings.

Inside each domain, projects facilitate streamlined collaboration. Projects can span different Regions or accounts within a Region, and their metadata includes details such as the associated Git repository, team members, and their permissions. Every account where a project has resources is assigned at least one project role, which determines the tools, compute resources, datasets, and artificial intelligence and machine learning (AI/ML) assets accessible to project members. To manage data access, you can adjust the IAM permissions tied to the project’s role. SageMaker Unified Studio uses several IAM roles. For a complete list, refer to Identity and access management for Amazon SageMaker Unified Studio.

There are two primary methods for users to interact with Amazon Bedrock models in SageMaker Unified Studio: the SageMaker Unified Studio playground and SageMaker Unified Studio projects.

In the SageMaker Unified Studio playground scenario, model consumption roles provide secure access to Amazon Bedrock FMs. You can choose between automatic role creation for individual models or configuring a single role for all models. The default AmazonSageMakerBedrockModelConsumptionRole comes with preconfigured permissions to consume Amazon Bedrock models, including invoking the Amazon Bedrock application inference profile created for a particular SageMaker Unified Studio domain. To fine-tune access control, you can add inline policies to these roles that explicitly allow or deny access to specific Amazon Bedrock models.

The following diagram illustrates this architecture.

Playground Access

The workflow consists of the following steps:

  1. Initial access path:
    1. The SageMakerDomain execution role connects to the SageMaker Unified Studio domain.
    2. The connection flows to the SageMaker user profile.
    3. The user profile accesses SageMaker Unified Studio.
  2. Restricted access path (top flow):
    1. The direct access attempt from SageMaker Unified Studio to Amazon Bedrock is denied.
    2. The IAM policy blocks access to FMs.
    3. Anthropic’s Claude 3.7 Sonnet access is denied (marked with X).
    4. Anthropic’s Claude 3.5 Haiku access is denied (marked with X).
  3. Permitted access path (bottom flow):
    1. SageMakerBedrockModelConsumptionRole is used.
    2. The appropriate IAM policy allows access (marked with a checkmark).
    3. Amazon Bedrock access is permitted.
    4. Anthropic’s Claude 3.7 Sonnet access is allowed (marked with checkmark).
    5. Anthropic’s Claude 3.5 Haiku access is allowed (marked with checkmark).
  4. Governance mechanism:
    1. IAM policies serve as the control point for model access.
    2. Different roles determine different levels of access permission.
    3. Access controls are explicitly defined for each FM.

In the SageMaker Unified Studio project scenario, SageMaker Unified Studio uses a model provisioning role to create an inference profile for an Amazon Bedrock model in a project. The inference profile is required for the project to interact with the model. You can either let SageMaker Unified Studio automatically create a unique model provisioning role, or you can provide a custom model provisioning role. The default AmazonSageMakerBedrockModelManagementRole has the AWS policy AmazonDataZoneBedrockModelManagementPolicy attached. You can restrict access to specific account IDs through custom trust policies. You can also attach inline policies and use the statement CreateApplicationInferenceProfileUsingFoundationModels to allow or deny access to specific Amazon Bedrock models in your project.

The following diagram illustrates this architecture.

Projects Access

The workflow consists of the following steps:

  1. Initial access path:
    1. The SageMakerDomain execution role connects to the SageMaker Unified Studio domain.
    2. The connection flows to the SageMaker user profile.
    3. The user profile accesses SageMaker Unified Studio.
  2. Restricted access path (top flow):
    1. The direct access attempt from SageMaker Unified Studio to Amazon Bedrock is denied.
    2. The IAM policy blocks access to FMs.
    3. Anthropic’s Claude 3.7 Sonnet access is denied (marked with X).
    4. Anthropic’s Claude 3.5 Haiku access is denied (marked with X).
  3. Permitted access path (bottom flow):
    1. SageMakerBedrockModelManagementRole is used.
    2. The appropriate IAM policy allows access (marked with a checkmark).
    3. Amazon Bedrock access is permitted.
    4. Anthropic’s Claude 3.7 Sonnet access is allowed (marked with checkmark).
    5. Anthropic’s Claude 3.5 Haiku access is allowed (marked with checkmark).
  4. Governance mechanism:
    1. IAM policies serve as the control point for model access.
    2. Different roles determine different levels of access permission.
    3. Access controls are explicitly defined for each FM.

By customizing the policies attached to these roles, you can control which actions are permitted or denied, thereby governing access to generative AI capabilities.
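As a hedged illustration of this customization, the following sketch uses boto3 to attach an inline policy that denies a consumption role access to one specific foundation model while its other policies continue to govern the rest; the role name and model ARN are placeholders, and the policy shape mirrors the examples later in this post.

import json
import boto3

iam = boto3.client("iam")

# Illustrative deny statement: block DeepSeek-R1 for this consumption role
# while the role's other attached policies keep allowing the remaining models.
deny_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenySpecificFoundationModel",
            "Effect": "Deny",
            "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
            "Resource": "arn:aws:bedrock:*::foundation-model/deepseek.r1-v1:0",
        }
    ],
}

iam.put_role_policy(
    RoleName="AmazonSageMakerBedrockModelConsumptionRole-placeholder",  # placeholder role name
    PolicyName="DenyDeepSeekR1Policy",
    PolicyDocument=json.dumps(deny_policy),
)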

To use a specific model from Amazon Bedrock, SageMaker Unified Studio includes the model ID of the chosen model in its API calls. At the time of writing, SageMaker Unified Studio supports the following Amazon Bedrock models (refer to the SageMaker Unified Studio documentation for the current list), grouped by model provider:

  • Amazon:
    • Amazon Titan Text G1 – Premier: amazon.titan-text-premier-v1:0
    • Amazon Nova Pro: amazon.nova-pro-v1:0
    • Amazon Nova Lite: amazon.nova-lite-v1:0
    • Amazon Nova Canvas: amazon.nova-canvas-v1:0
  • Stability AI:
    • SDXL 1.0: stability.stable-diffusion-xl-v1
  • AI21 Labs:
    • Jamba-Instruct: ai21.jamba-instruct-v1:0
    • Jamba 1.5 Large: ai21.jamba-1-5-large-v1:0
    • Jamba 1.5 Mini: ai21.jamba-1-5-mini-v1:0
  • Anthropic:
    • Claude 3.7 Sonnet: anthropic.claude-3-7-sonnet-20250219-v1:0
  • Cohere:
    • Command R+: cohere.command-r-plus-v1:0
    • Command Light: cohere.command-light-text-v14
    • Embed Multilingual: cohere.embed-multilingual-v3
  • DeepSeek:
    • DeepSeek-R1: deepseek.r1-v1:0
  • Meta:
    • Llama 3.3 70B Instruct: meta.llama3-3-70b-instruct-v1:0
    • Llama 4 Scout 17B Instruct: meta.llama4-scout-17b-instruct-v1:0
    • Llama 4 Maverick 17B Instruct: meta.llama4-maverick-17b-instruct-v1:0
  • Mistral AI:
    • Mistral 7B Instruct: mistral.mistral-7b-instruct-v0:2
    • Pixtral Large (25.02): mistral.pixtral-large-2502-v1:0

Create a model consumption role for the playground scenario

In the following steps, you create an IAM role with a trust policy, add two inline policies, and attach them to the role.

Create the IAM role with a trust policy

Complete the following steps to create an IAM role with a trust policy:

  1. On the IAM console, in the navigation pane, choose Roles, then choose Create role.
  2. For Trusted entity type, select Custom trust policy.
  3. Delete the default policy in the editor and enter the following trust policy (replace [account-id] in the aws:SourceAccount field with your AWS account ID):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "datazone.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:SetContext"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "[account-id]"
                }
            }
        }
    ]
}
  4. Choose Next.
  5. Skip the Add permissions page by choosing Next.
  6. Enter a name for the role (for example, DataZoneBedrock-Role) and an optional description.
  7. Choose Create role.
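
If you prefer to script the role creation, the following is a minimal sketch using the AWS SDK for Python (Boto3); the role name matches the console example, and the account ID is a placeholder you must replace.

import json

import boto3

iam = boto3.client("iam")

# Trust policy from the previous step; replace the account ID with your own
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "datazone.amazonaws.com"},
            "Action": ["sts:AssumeRole", "sts:SetContext"],
            "Condition": {"StringEquals": {"aws:SourceAccount": "111122223333"}},
        }
    ],
}

iam.create_role(
    RoleName="DataZoneBedrock-Role",  # example name from the console steps
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Model consumption role for the SageMaker Unified Studio playground",
)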

Add the first inline policy

Complete the following steps to add an inline policy:

  1. On the IAM console, open the newly created role details page.
  2. On the Permissions tab, choose Add permissions and then Create inline policy.
  3. On the JSON tab, delete the default policy and enter the first inline policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "InvokeDomainInferenceProfiles",
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": "arn:aws:bedrock:*:*:application-inference-profile/*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/AmazonDataZoneDomain": "${datazone:domainId}",
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                },
                "Null": {
                    "aws:ResourceTag/AmazonDataZoneProject": "true"
                }
            }
        }
    ]
}
  4. Choose Review policy.
  5. Name the policy (for example, DataZoneDomainInferencePolicy) and choose Create policy.

Add the second inline policy

Complete the following steps to add another inline policy:

  1. On the role’s Permissions tab, choose Add permissions and then Create inline policy.
  2. On the JSON tab, delete the default policy and enter the second inline policy (replace account-id in the bedrock:InferenceProfileArn field with your account ID):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowInferenceProfileToInvokeFoundationModels",
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": [
                "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0",
                "arn:aws:bedrock:us-east-2::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0",
                "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0"
            ],
            "Condition": {
                "ArnLike": {
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:*:[account-id]:application-inference-profile/*"
                }
            }
        }
    ]
}
  3. Choose Review policy.
  4. Name the policy (for example, BedrockFoundationModelAccessPolicy) and choose Create policy.
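
Both inline policies can also be attached programmatically. The following is a minimal sketch using the IAM put_role_policy API; the file names holding the policy JSON are hypothetical, and the policy documents are the two JSON documents shown earlier.

import json

import boto3

iam = boto3.client("iam")

# Load the two policy documents shown earlier in this section (hypothetical file names)
with open("datazone-domain-inference-policy.json") as f:
    domain_inference_policy = json.load(f)
with open("bedrock-foundation-model-access-policy.json") as f:
    fm_access_policy = json.load(f)

# Attach both inline policies to the model consumption role
iam.put_role_policy(
    RoleName="DataZoneBedrock-Role",
    PolicyName="DataZoneDomainInferencePolicy",
    PolicyDocument=json.dumps(domain_inference_policy),
)
iam.put_role_policy(
    RoleName="DataZoneBedrock-Role",
    PolicyName="BedrockFoundationModelAccessPolicy",
    PolicyDocument=json.dumps(fm_access_policy),
)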

Explanation of the policies

In this section, we discuss the details of the policies.

Trust policy explanation

This trust policy defines who can assume the IAM role:

  • It allows the Amazon DataZone service (datazone.amazonaws.com) to assume this role
  • The service can perform sts:AssumeRole and sts:SetContext actions
  • A condition restricts this to only when the AWS account you specify is your AWS account

This makes sure that only Amazon DataZone from your specified account can assume this role.

First inline policy explanation

This policy controls access to Amazon Bedrock inference profiles:

  • It allows invoking Amazon Bedrock models (bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream), but only for resources that are application inference profiles (arn:aws:bedrock:*:*:application-inference-profile/*)
  • It has three important conditions:
    • The profile must be tagged with AmazonDataZoneDomain matching the domain ID of the caller
    • The resource must be in the same AWS account as the principal making the request
    • The resource must not have an AmazonDataZoneProject tag (the Null condition requires the tag key to be absent)

This effectively limits access to only those inference profiles that belong to the same Amazon DataZone domain as the caller and are not associated with a specific project.

Second inline policy explanation

This policy controls which specific FMs can be accessed:

  • It allows the same Amazon Bedrock model invocation actions, but only for specific Anthropic Claude 3.5 Haiku models in three Regions:
    • us-east-1
    • us-east-2
    • us-west-2
  • It has a condition that the request must come through an inference profile from your account

Combined effect on the SageMaker Unified Studio domain generative AI playground

Together, these policies create a secure, controlled environment for using Amazon Bedrock models in the SageMaker Unified Studio domain through the following methods:

  • Limiting model access – Only the specified Anthropic Claude 3.5 Haiku model can be used, not other Amazon Bedrock models
  • Enforcing access through inference profiles – Models can only be accessed through properly configured application inference profiles
  • Maintaining domain isolation – Access is restricted to inference profiles tagged with the user’s Amazon DataZone domain
  • Helping to prevent cross-account access – Resources must be in the same account as the principal
  • Regional restrictions – Access to the model is only allowed in three specific AWS Regions

This implementation follows the principle of least privilege by providing only the minimum permissions needed for the intended use case, while maintaining proper security boundaries between different domains and projects.

Create a model provisioning role for the project scenario

In this section, we walk through the steps to create an IAM role with a trust policy and add the required inline policy to make sure that the models are limited to the approved ones.

Create the IAM role with a trust policy

Complete the following steps to create an IAM role with a trust policy:

  1. On the IAM console, in the navigation pane, choose Roles, then choose Create role.
  2. For Trusted entity type, select Custom trust policy.
  3. Delete the default policy in the editor and enter the following trust policy (replace [account-id] in the aws:SourceAccount condition with your account ID):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "datazone.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:SetContext"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "[account-id]"
                }
            }
        }
    ]
}
  4. Choose Next.
  5. Skip the Add permissions page by choosing Next.
  6. Enter a name for the role (for example, SageMakerModelManagementRole) and an optional description, such as Role for managing Bedrock model access in SageMaker Unified Studio.
  7. Choose Create role.

Add the inline policy

Complete the following steps to add an inline policy:

  1. On the IAM console, open the details page of the newly created role.
  2. On the Permissions tab, choose Add permissions and then Create inline policy.
  3. On the JSON tab, delete the default policy and enter the following inline policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ManageApplicationInferenceProfile",
            "Effect": "Allow",
            "Action": [
                "bedrock:CreateInferenceProfile",
                "bedrock:TagResource"
            ],
            "Resource": [
                "arn:aws:bedrock:*:*:application-inference-profile/*"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                },
                "ForAnyValue:StringEquals": {
                    "aws:TagKeys": [
                        "AmazonDataZoneProject"
                    ]
                },
                "Null": {
                    "aws:ResourceTag/AmazonDataZoneProject": "false",
                    "aws:RequestTag/AmazonDataZoneProject": "false"
                }
            }
        },
        {
            "Sid": "DeleteApplicationInferenceProfile",
            "Effect": "Allow",
            "Action": [
                "bedrock:DeleteInferenceProfile"
            ],
            "Resource": [
                "arn:aws:bedrock:*:*:application-inference-profile/*"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                },
                "Null": {
                    "aws:ResourceTag/AmazonDataZoneProject": "false"
                }
            }
        },
        {
            "Sid": "CreateApplicationInferenceProfileUsingFoundationModels",
            "Effect": "Allow",
            "Action": [
                "bedrock:CreateInferenceProfile"
            ],
            "Resource": [
                "arn:aws:bedrock:*::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0"
            ]
        },
        {
            "Sid": "CreateApplicationInferenceProfileUsingBedrockModels",
            "Effect": "Allow",
            "Action": [
                "bedrock:CreateInferenceProfile"
            ],
            "Resource": [
                "arn:aws:bedrock:*:*:inference-profile/*"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                }
            }
        }
    ]
}
  4. Choose Review policy.
  5. Name the policy (for example, BedrockModelManagementPolicy) and choose Create policy.
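
To illustrate what this policy permits, the following is a minimal sketch of the kind of CreateInferenceProfile call that the model provisioning role authorizes: creating an application inference profile for the approved model and tagging it with a project identifier. The profile name, tag value, and Region are example values, and the exact parameters that SageMaker Unified Studio uses may differ.

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Create an application inference profile for the approved model and tag it with a
# project identifier (example name, Region, and tag value)
response = bedrock.create_inference_profile(
    inferenceProfileName="my-project-claude-3-5-haiku",
    modelSource={
        "copyFrom": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0"
    },
    tags=[{"key": "AmazonDataZoneProject", "value": "my-project-id"}],
)

print(response["inferenceProfileArn"])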

Explanation of the policies

In this section, we discuss the details of the policies.

Trust policy explanation

This trust policy defines who can assume the IAM role:

  • It allows the Amazon DataZone service to assume this role
  • The service can perform sts:AssumeRole and sts:SetContext actions
  • A condition restricts this to only when the source AWS account is your AWS account

This makes sure that only Amazon DataZone from your specific account can assume this role.

Inline policy explanation

This policy controls access to Amazon Bedrock inference profiles and models:

  • It allows creating and managing application inference profiles (bedrock:CreateInferenceProfile, bedrock:TagResource, bedrock:DeleteInferenceProfile)
  • It specifically permits creating inference profiles for only the anthropic.claude-3-5-haiku-20241022-v1:0 model
  • Access is controlled through several conditions:
    • Resources must be in the same AWS account as the principal making the request
    • Operations are restricted based on the AmazonDataZoneProject tag
    • Creating profiles requires proper tagging with AmazonDataZoneProject
    • Deleting profiles is only allowed for resources with the appropriate AmazonDataZoneProject tag

Combined effect on SageMaker Unified Studio domain project

Together, these policies create a secure, controlled environment for using Amazon Bedrock models in SageMaker Unified Studio domain projects through the following methods:

  • Limiting model access – Only the specified Anthropic Claude 3.5 Haiku model can be used
  • Enforcing proper tagging – Resources must be properly tagged with AmazonDataZoneProject identifiers
  • Maintaining account isolation – Resources must be in the same account as the principal
  • Implementing least privilege – Only specific actions (create, tag, delete) are permitted on inference profiles
  • Providing project-level isolation – Access is restricted to inference profiles tagged with appropriate project identifiers

This implementation follows the principle of least privilege by providing only the minimum permissions needed for project-specific use cases, while maintaining proper security boundaries between different projects and making sure that only approved FMs can be accessed.

Configure a SageMaker Unified Studio domain to use the roles

Complete the following steps to create a SageMaker Unified Studio domain and configure it to use the roles you created:

  1. On the SageMaker console, choose the appropriate Region.
  2. Choose Create a Unified Studio domain and then choose Quick setup.
  3. For Name, enter a meaningful domain name.
  4. Scroll down to Generative AI resources.
  5. Under Model provisioning role, choose the model management role created in the previous section.
  6. Under Model consumption role, select Use a single existing role for all models and choose the model consumption role created in the previous section.
  7. Complete the remaining steps according to your AWS IAM Identity Center configurations and create your domain.

Clean up

To avoid incurring future charges related to the various services used in SageMaker Unified Studio, log out of the SageMaker Unified Studio domain and then delete the domain.

Conclusion

In this post, we demonstrated how the SageMaker Unified Studio playground and SageMaker Unified Studio projects invoke large language models powered by Amazon Bedrock, and how enterprises can govern access to these models, whether you want to limit access to specific models or to every model from the service. You can combine the IAM policies shown in this post in the same IAM role to provide complete control. By following these guidelines, enterprises can make sure their use of generative AI models is both secure and aligned with organizational policies. This approach not only safeguards sensitive data but also empowers business analysts and data scientists to harness the full potential of AI within a controlled environment.

Now that your environment is configured with strong identity-based policies, we suggest reading the following posts to learn how Amazon SageMaker Unified Studio enables you to securely innovate quickly, and at scale, with generative AI:


About the authors

Varun Jasti is a Solutions Architect at Amazon Web Services, working with AWS Partners to design and scale artificial intelligence solutions for public sector use cases to meet compliance standards. With a background in Computer Science, his work covers a broad range of ML use cases primarily focusing on LLM training/inferencing and computer vision. In his spare time, he loves playing tennis and swimming.

Saptarshi Banarjee serves as a Senior Solutions Architect at AWS, collaborating closely with AWS Partners to design and architect mission-critical solutions. With a specialization in generative AI, AI/ML, serverless architecture, Next-Gen Developer Experience tools and cloud-based solutions, Saptarshi is dedicated to enhancing performance, innovation, scalability, and cost-efficiency for AWS Partners within the cloud ecosystem.

Jon Turdiev is a Senior Solutions Architect at Amazon Web Services, where he helps startup customers build well-architected products in the cloud. With over 20 years of experience creating innovative solutions in cybersecurity, AI/ML, healthcare, and Internet of Things (IoT), Jon brings deep technical expertise to his role. Previously, Jon founded Zehntec, a technology consulting company, and developed award-winning medical bedside terminals deployed in hospitals worldwide. Jon holds a Master’s degree in Computer Science and shares his knowledge through webinars, workshops, and as a judge at hackathons.

Lijan Kuniyil is a Senior Technical Account Manager at AWS. Lijan enjoys helping AWS enterprise customers build highly reliable and cost-effective systems with operational excellence. Lijan has over 25 years of experience in developing solutions for financial, healthcare and consulting companies.


Improve conversational AI response times for enterprise applications with the Amazon Bedrock streaming API and AWS AppSync

Improve conversational AI response times for enterprise applications with the Amazon Bedrock streaming API and AWS AppSync

Many enterprises are using large language models (LLMs) in Amazon Bedrock to gain insights from their internal data sources. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Organizations implementing conversational AI systems often face a common challenge: although their APIs can quickly find answers to targeted questions, more complex queries requiring reasoning-actioning (ReAct) logic can take substantial time to process, negatively impacting user experience. This issue is particularly pronounced in regulated industries where security requirements add additional complexity. For instance, a global financial services organization with over $1.5 trillion in assets under management encountered this exact challenge. Despite successfully implementing a conversational AI system that integrated with multiple LLMs and data sources, they needed a solution that could maintain their rigorous security protocols—including AWS services operating within virtual private cloud (VPC) environments and enterprise OAuth integration—while improving response times for complex queries.

AWS AppSync is a fully managed service that enables developers to build serverless GraphQL APIs with real-time capabilities. This post demonstrates how to combine AWS AppSync subscriptions with Amazon Bedrock streaming endpoints to deliver LLM responses incrementally. We provide an enterprise-grade implementation blueprint that helps organizations in regulated industries maintain security compliance while optimizing user experience through immediate real-time response delivery.

Solution overview

The solution discussed in this post uses AWS AppSync to start the asynchronous conversational workflow. An AWS Lambda function does the heavy lifting of interacting with the Amazon Bedrock streaming API. As the LLM produces tokens, they are streamed to the frontend using AWS AppSync mutations and subscriptions.

A reference implementation of the Lambda function and AWS AppSync API is provided in the sample code in this post. The following diagram illustrates the reference architecture. It provides a high-level overview of how the various AWS services are integrated to achieve the desired outcome.

Solution Architecture

Let’s traverse how a user’s request is handled in the solution, and how the user receives real-time responses from an LLM in Amazon Bedrock:

  1. When the user loads the UI application, the application subscribes to the GraphQL subscription onSendMessage(), which returns whether the WebSocket connection was successful or not.
  2. After the user enters a query, the application invokes the GraphQL query getLlmResponse, which triggers the Data Source Lambda function.
  3. The Data Source Lambda function publishes an event to the Amazon Simple Notification Service (Amazon SNS) topic, and a 201 message is sent to the user, completing the synchronous flow.

These steps are better illustrated in the following sequence diagram.

Sequence Diagram 1

  4. The Orchestrator Lambda function gets triggered by a published SNS event and initiates the stream with the Amazon Bedrock API call InvokeModelWithResponseStream.
  5. Amazon Bedrock receives the user query, initiates the stream, and starts sending stream tokens back to the Lambda function.
  6. When the Orchestrator Lambda function receives a stream token from Amazon Bedrock, the function invokes the GraphQL mutation sendMessage.
  7. The mutation triggers the onSendMessage subscription containing the LLM partial response, and the UI prints those stream tokens as it receives them.

The following diagram illustrates these steps in more detail.

Sequence Diagram 2

In the following sections, we discuss the components that make up the solution in more detail.

Data and API design

The AppSync API GraphQL schema consists of query, subscription, and mutation operations.

The following code is the query operation:

input GetLlmResponseInput {
	sessionId: String!
	message: String!
	locale: String!
}
type Query {
	getLlmResponse(input: GetLlmResponseInput!): GetLlmResponse
		@aws_api_key
}

The query operation, getLlmResponse, is synchronous and accepts sessionId, locale, and a user-provided message.

The frontend must send a unique sessionId; this session ID uniquely identifies the user’s chat session. The session ID doesn’t change for the duration of an active conversation. For example, if the user reloads the frontend, a new sessionId is generated and sent to the query operation.

The frontend must also send locale, which indicates the user’s preferred language. For a list of supported locales, see Languages and locales supported by Amazon Lex V2. For example, we use en_US for North American English.

Finally, the user’s message (or query) is set in the message attribute. The value of the message attribute is passed to the LLM for analysis.

The following code is the subscription operation:

type Subscription {
	onSendMessage(sessionId: String!): SendMessageResponse
		@aws_subscribe(mutations: ["sendMessage"])
        @aws_api_key
}

The AWS AppSync subscription operation, onSendMessage, accepts sessionId as a parameter. The frontend calls the onSendMessage subscription operation to subscribe to a WebSocket connection using sessionId. This allows the frontend to receive messages from the AWS AppSync API whenever a mutation operation successfully executes for the given sessionId.

The following code is the mutation operation:

input SendMessageInput {
	sessionId: String!
	message: String!
	locale: String!
}
type Mutation {
	sendMessage(input: SendMessageInput!): SendMessageResponse
		@aws_api_key
        @aws_iam
}

The mutation operation, sendMessage, accepts a payload of type SendMessageInput. The caller must provide all required attributes in the SendMessageInput type, indicated by the exclamation point in the GraphQL schema excerpt, to successfully send a message to the frontend using the mutation operation.

The Orchestrator Lambda function calls the sendMessage mutation to send partially received LLM tokens to the frontend. We discuss the Orchestrator Lambda function in more detail later in this post.
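
The following is a minimal sketch of what such an invoke_mutation helper could look like, signing the GraphQL request with the Lambda function’s IAM credentials (the mutation is annotated with @aws_iam). The AppSync endpoint variable and the fields selected on SendMessageResponse are assumptions for illustration; the reference implementation in the GitHub repo may differ.

import json
import os

import boto3
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

APPSYNC_API_URL = os.environ["APPSYNC_API_URL"]  # hypothetical environment variable

MUTATION = """
mutation SendMessage($input: SendMessageInput!) {
  sendMessage(input: $input) { sessionId message locale }
}
"""

def invoke_mutation(session_id: str, message: str, locale: str = "en_US") -> dict:
    """Send one partial LLM token to the frontend through the sendMessage mutation."""
    payload = json.dumps({
        "query": MUTATION,
        "variables": {"input": {"sessionId": session_id, "message": message, "locale": locale}},
    })

    # Sign the request with the function's IAM credentials using SigV4
    session = boto3.Session()
    request = AWSRequest(
        method="POST",
        url=APPSYNC_API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    SigV4Auth(session.get_credentials(), "appsync", session.region_name).add_auth(request)

    response = requests.post(APPSYNC_API_URL, data=payload, headers=dict(request.headers))
    return response.json()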

AWS AppSync Data Source Lambda function

AWS AppSync invokes the AWS AppSync Data Source Lambda function when the frontend calls the GraphQL query operation, getLlmResponse. The GraphQL query is a synchronous operation.

The implementation of the AWS AppSync Data Source Lambda function is in the following GitHub repo, called bedrock-appsync-ds-lambda. This Lambda function extracts the user’s message from the incoming GraphQL query operation and sends the value to the SNS topic. The Lambda function then returns a success status code to the caller, indicating that the message has been submitted to the backend for processing.

AWS AppSync Orchestrator Lambda function

The AWS AppSync Orchestrator Lambda function runs whenever an event is published to the SNS topic. This function initiates the Amazon Bedrock streaming API using the converse_stream Boto3 API call.

The following code snippet shows how the Orchestrator Lambda function receives the SNS event, processes it, and then calls the Boto3 API:

import boto3

# Create the Amazon Bedrock Runtime client
brt = boto3.client(service_name='bedrock-runtime', region_name="us-west-2")

# parsed_event is the SNS message payload, parsed with the Python json library
messages = []
message = {
    "role": "user",
    "content": [{"text": parsed_event["message"]}]
}
messages.append(message)

# Start the Amazon Bedrock response stream
response = brt.converse_stream(
    modelId=model_id,
    messages=messages
)

The code first instantiates the Boto3 client using the bedrock-runtime service name. The Lambda function receives the SNS event and parses it using the Python json library; the parsed contents are stored in the parsed_event dictionary. The code then creates an Amazon Bedrock Messages API style prompt with role and content attributes:

message = {
    "role": "user",
    "content": [{"text": parsed_event["message"]}]
}

The content attribute’s value comes from the parsed_event["message"] attribute in the SNS event. Refer to the converse_stream Boto3 API documentation for a list of role values.

The converse_stream API accepts modelId and messages parameters. The value of modelId comes from an environment variable set on the Lambda function. The messages parameter is a list of message dictionaries, and it must only contain Amazon Bedrock Messages API style prompts.

When the converse_stream API successfully runs, it returns an object that the Lambda code further analyzes to send partial tokens to the frontend:

# The converse_stream response exposes the token stream under the 'stream' key
stream = response.get('stream')
if stream:
    self.appsync = AppSync(locale="en_US", session_id=session_id)
    self.appsync.invoke_mutation(DEFAULT_STREAM_START_TOKEN)
    event_count = 0
    buffer = ""
    for event in stream:
        if event:
            if list(event)[0] == "contentBlockDelta":
                event_count += 1
                buffer += event["contentBlockDelta"]["delta"]["text"]
            if event_count > 5:
                # Flush the buffered tokens to the frontend
                self.appsync.invoke_mutation(buffer)
                event_count = 0
                buffer = ""
    # After the stream ends, flush any remaining buffered tokens
    if len(buffer) != 0:
        self.appsync.invoke_mutation(buffer)

Before the LLM response is streamed, the Lambda function first sends DEFAULT_STREAM_START_TOKEN to the frontend using the AWS AppSync mutation operation. This token is a mechanism to alert the frontend to start rendering tokens. As the Lambda function receives chunks from the converse_stream API, it calls the AWS AppSync mutation operation, sending partial tokens to the frontend to render.

To improve the user experience and reduce network overhead, the Lambda function doesn’t invoke the AWS AppSync mutation operation for every chunk it receives from the Amazon Bedrock converse_stream API. Instead, the Lambda code buffers partial tokens and invokes the AWS AppSync mutation operation after receiving five chunks. This avoids the overhead of AWS AppSync network calls, thereby reducing latency and improving the user experience.

After the Lambda function has finished sending the tokens, it sends DEFAULT_STREAM_END_TOKEN:

self.appsync.invoke_mutation(DEFAULT_STREAM_END_TOKEN)

This token alerts the frontend that LLM streaming is complete.

For more details, refer to the GitHub repo. It contains a reference implementation of the Orchestrator Lambda function called bedrock-orchestrator-lambda.

Prerequisites

To deploy the solution, you must have the Terraform CLI installed in your environment. Complete all the steps in the Prerequisites section in the accompanying GitHub documentation.

Deploy the solution

Complete the following steps to deploy the solution:

  1. Open a command line terminal window.
  2. Change to the deployment folder.
  3. Edit the sample.tfvars file. Replace the variable values to match your AWS environment.
region = "us-west-2"
lambda_s3_source_bucket_name = "YOUR_DEPLOYMENT_BUCKET"
lambda_s3_source_bucket_key  = "PREFIX_WITHIN_THE_BUCKET"
  4. Run the following commands to deploy the solution:
$ terraform init
$ terraform apply -var-file="sample.tfvars"

Detailed deployment steps are in the Deploy the solution section in the accompanying GitHub repository.

Test the solution

To test the solution, use the provided sample web UI and run it inside VS Code. For more information, refer to the accompanying README documentation.

Clean up

Use the following command to remove the resources deployed in the previous section from your AWS environment. You must use the same sample.tfvars file that you used to deploy the solution.

$ terraform destroy -var-file="sample.tfvars"

Conclusion

This post demonstrated how integrating an Amazon Bedrock streaming API with AWS AppSync subscriptions significantly enhances AI assistant responsiveness and user satisfaction. By implementing this streaming approach, the global financial services organization reduced initial response times for complex queries by approximately 75%—from 10 seconds to just 2–3 seconds—empowering users to view responses as they’re generated rather than waiting for complete answers. The business benefits are clear: reduced abandonment rates, improved user engagement, and a more responsive AI experience. Organizations can quickly implement this solution using the provided Lambda and Terraform code, quickly bringing these improvements to their own environments.

For even greater flexibility, AWS AppSync Events offers an alternative implementation pattern that can further enhance real-time capabilities using a fully managed WebSocket API. By addressing the fundamental tension between comprehensive AI responses and speed, this streaming approach enables organizations to maintain high-quality interactions while delivering the responsive experience modern users expect.


About the authors

Salman Moghal, a Principal Consultant at AWS Professional Services Canada, specializes in crafting secure generative AI solutions for enterprises. With extensive experience in full-stack development, he excels in transforming complex technical challenges into practical business outcomes across banking, finance, and insurance sectors. In his downtime, he enjoys racquet sports and practicing Funakoshi Genshin’s teachings at his martial arts dojo.

Philippe Duplessis-Guindon is a cloud consultant at AWS, where he has worked on a wide range of generative AI projects. He has touched on most aspects of these projects, from infrastructure and DevOps to software development and AI/ML. After earning his bachelor’s degree in software engineering and a master’s in computer vision and machine learning from Polytechnique Montreal, Philippe joined AWS to put his expertise to work for customers. When he’s not at work, you’re likely to find Philippe outdoors—either rock climbing or going for a run.


Scale generative AI use cases, Part 1: Multi-tenant hub and spoke architecture using AWS Transit Gateway

Scale generative AI use cases, Part 1: Multi-tenant hub and spoke architecture using AWS Transit Gateway

Generative AI continues to reshape how businesses approach innovation and problem-solving. Customers are moving from experimentation to scaling generative AI use cases across their organizations, with more businesses fully integrating these technologies into their core processes. This evolution spans across lines of business (LOBs), teams, and software as a service (SaaS) providers. Although many AWS customers typically started with a single AWS account for running generative AI proof of concept use cases, the growing adoption and transition to production environments have introduced new challenges.

These challenges include effectively managing and scaling implementations, as well as abstracting and reusing common concerns such as multi-tenancy, isolation, authentication, authorization, secure networking, rate limiting, and caching. To address these challenges effectively, a multi-account architecture proves beneficial, particularly for SaaS providers serving multiple enterprise customers, large enterprises with distinct divisions, and organizations with strict compliance requirements. This multi-account approach helps maintain a well-architected system by providing better organization, security, and scalability for your AWS environment. It also enables you to more efficiently manage these common concerns across your expanding generative AI implementations.

In this two-part series, we discuss a hub and spoke architecture pattern for building a multi-tenant and multi-account architecture. This pattern supports abstractions for shared services across use cases and teams, helping create secure, scalable, and reliable generative AI systems. In Part 1, we present a centralized hub for generative AI service abstractions and tenant-specific spokes, using AWS Transit Gateway for cross-account interoperability. The hub account serves as the entry point for end-user requests, centralizing shared functions such as authentication, authorization, model access, and routing decisions. This approach alleviates the need to implement these functions separately in each spoke account. Where applicable, we use virtual private cloud (VPC) endpoints for accessing AWS services.

In Part 2, we discuss a variation of this architecture using AWS PrivateLink to securely share the centralized endpoint in the hub account to teams within your organization or with external partners.

The focus in both posts is on centralizing authentication, authorization, model access, and multi-account secure networking for onboarding and scaling generative AI use cases with Amazon Bedrock. We don’t discuss other system capabilities such as prompt catalog, prompt caching, versioning, model registry, and cost. However, those could be extensions of this architecture.

Solution overview

Our solution implements a hub and spoke pattern that provides a secure, scalable system for managing generative AI implementations across multiple accounts. At its core, the architecture consists of a centralized hub account that serves as the entry point for requests, complemented by spoke accounts that contain tenant-specific resources. The following diagram illustrates this architecture.

Architecture Diagram

The hub account is the centralized account that provides common services across tenants and serves as the entry point for end-user requests. It centralizes shared functions such as authentication, authorization, and routing decisions, alleviating the need to implement these functions separately for each tenant. The hub account is operated and maintained by a core engineering team.

The hub infrastructure includes public and private VPCs, an internet-facing Application Load Balancer (ALB), Amazon Cognito for authentication, and necessary VPC endpoints for AWS services.

The spoke accounts contain tenant-specific resources, such as AWS Identity and Access Management (IAM) role permissions and Amazon Bedrock resources. Spoke accounts can be managed by either the core engineering team or the tenant, depending on organizational needs.

Each spoke account maintains its own private VPC, VPC interface endpoints for Amazon Bedrock, specific IAM roles and permissions, and account-level controls. These components are connected through Transit Gateway, which provides secure cross-account networking and manages traffic flow between hub and spoke VPCs. The flow of requests through the system as shown in the preceding architecture includes the following steps:

  1. A user (representing Tenant 1, 2, or N) accesses the client application.
  2. The client application in the hub account’s public subnet authenticates the user and receives an ID/JWT token. In our example, we use an Amazon Cognito user pool as the identity provider (IdP).
  3. The client application uses custom attributes in the JWT token to determine the corresponding route in the ALB. The ALB, based on the context path, routes the request to the tenant’s AWS Lambda function target group.
  4. The tenant-specific Lambda function in the hub account’s private subnet is invoked.
  5. The function assumes a cross-account role in the tenant’s account and invokes Amazon Bedrock in the spoke account by referring to the regional DNS name of the Amazon Bedrock VPC endpoint. The model is invoked and the result is sent back to the user (see the sketch after this list).
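
The following is a minimal sketch of step 5, assuming a hypothetical spoke-account role ARN, model ID, and endpoint URL; the actual Lambda implementation in the sample repository may differ.

import boto3

SPOKE_ROLE_ARN = "arn:aws:iam::111122223333:role/Tenant1BedrockInvokeRole"  # hypothetical
BEDROCK_ENDPOINT_URL = "https://bedrock-runtime.us-east-1.amazonaws.com"  # regional DNS name; resolves to the spoke VPC endpoint when private DNS is enabled
MODEL_ID = "amazon.titan-text-lite-v1"  # model mapped to tenant1 in this example

def invoke_spoke_model(prompt: str) -> str:
    # Assume the cross-account role in the tenant's spoke account
    sts = boto3.client("sts")
    creds = sts.assume_role(RoleArn=SPOKE_ROLE_ARN, RoleSessionName="tenant1-invoke")["Credentials"]

    # Create a Bedrock Runtime client with the temporary credentials
    bedrock = boto3.client(
        "bedrock-runtime",
        region_name="us-east-1",
        endpoint_url=BEDROCK_ENDPOINT_URL,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]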

This architecture makes sure that requests flow through a central entry point while maintaining tenant isolation. By invoking Amazon Bedrock in the spoke account, each request inherits that account’s limits, access control, cost assignments, service control policies (SCPs), and other account-level controls.

The sample code for this solution is separated into two sections. The first section shows the solution for a single hub and spoke account. The second section extends the solution by deploying another spoke account. Detailed instructions for each step are provided in the repository README. In the following sections, we provide an outline of the deployment steps.

Prerequisites

We assume you are familiar with the fundamentals of AWS networking, including Amazon Virtual Private Cloud (Amazon VPC) and VPC constructs like route tables and VPC interconnectivity options. We assume you are also familiar with multi-tenant architectures and their core principles of serving multiple tenants from a shared infrastructure while maintaining isolation.

To implement the solution, you must have the following prerequisites:

  • Hub and spoke accounts (required):
    • Two AWS accounts: one hub account and one spoke account
    • Access to the amazon.titan-text-lite-v1 model in the spoke account
  • Additional spoke account (optional):
    • A third AWS account (spoke account for a second tenant)
    • Access to the anthropic.claude-3-haiku-20240307-v1:0 model in the second spoke account

Design considerations

The implementation of this architecture involves several important design choices that affect how the solution operates, scales, and can be maintained. In this section, we explore these considerations across different components, explaining the rationale behind each choice and potential alternatives where applicable.

Lambda functions

In our design, we have the ALB target group as Lambda functions running in the hub account instead of the spoke account. This allows for centralized management of business logic and centralized logging and monitoring. As the architecture evolves to include shared functionality such as prompt caching, semantic routing, or using large language model (LLM) proxies (middleware services that provide unified access to multiple models while handling rate limiting and request routing, as discussed in Part 2), implementing these features in the hub account provides consistency across tenants. We chose Lambda functions to implement the token validation and routing logic, but you can use other compute options such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS) depending on your organization’s preferences.

We use 1-to-1 mapping for Lambda functions to each tenant. Even though the current logic in each function is similar, having a dedicated function for each tenant can minimize noisy neighbor issues and supports tenant tier-specific configurations such as memory and concurrency.

VPC endpoints

In this solution, we use dedicated Amazon Bedrock runtime VPC endpoints in the spoke accounts. Dedicated VPC endpoints for each spoke account are suited for organizations where operators of the spoke account manage the tenant features, such as allowing access to models, setting up knowledge bases, and guardrails. Depending on your organization’s policies, a different variation of this architecture can be achieved by using a centralized Amazon Bedrock runtime VPC endpoint in the hub account (as shown in Part 2). Centralized VPC endpoints are suited for organizations where a central engineering team manages the features for the tenants.

Other factors such as costs, access control, and endpoint quotas need to be considered when choosing a centralized or dedicated approach for the location of the Amazon Bedrock VPC endpoints. VPC endpoint policies with the centralized approach might run into the 20,480-character limit as the number of tenants grows. VPC endpoints and Transit Gateway attachments incur hourly fees regardless of usage, so if VPC endpoints are provisioned in spoke accounts, each tenant will incur additional hourly fees.

Client application

For demonstration purposes, the client application in this solution is deployed in the public subnet in the hub VPC. The application can be deployed in an account outside of either the hub or spoke VPCs, or deployed at the edge as a single-page application using Amazon CloudFront and Amazon Simple Storage Service (Amazon S3).

Tenancy

Enterprises use various tenancy models when scaling generative AI, each with distinct advantages and disadvantages. Our solution implements a silo model, assigning each tenant to a dedicated spoke account. For smaller organizations with fewer tenants and less stringent isolation requirements, an alternative approach using a pooled model (multiple tenants per spoke account) might be more appropriate unless they plan to scale significantly in the future or have specific compliance requirements. For more information on multi-tenancy design, see Let’s Architect! Designing architectures for multi-tenancy. Cell-based architectures for multi-tenant applications can provide benefits such as fault isolation and scaling. See Reducing the Scope of Impact with Cell-Based Architecture for more information.

Frontend gateway

In this solution, we chose ALB as the entry point for requests. ALB offers several advantages for our generative AI use case:

  • Long-running connections – ALB supports connections up to 4,000 seconds, which is beneficial for LLM responses that might take longer than 30 seconds to complete
  • Scalability – ALB can handle a high volume of concurrent connections, making it suitable for enterprise-scale deployments
  • Integration with AWS WAF – ALB seamlessly integrates with AWS WAF, providing enhanced security and protection against common web exploits

Amazon API Gateway is an alternative option when API versioning, usage plans, or granular API management capabilities are required, and when expected message sizes and response times align with its quotas. AWS AppSync is another option suitable when exposing the LLMs through a GraphQL interface.

Choose the gateway that best serves your customers. ALB handles high-volume, long-running connections efficiently. API Gateway provides comprehensive REST API management. AWS AppSync delivers real-time GraphQL capabilities. Evaluate each based on your application’s response time requirements, API needs, scale demands, and specific use case.

Although this post demonstrates connectivity using HTTP for simplicity, this is not recommended for production use. Production deployments should always implement HTTPS with proper SSL/TLS certificates to maintain secure communication.

IP addressing

The AWS CloudFormation template to deploy solution resources uses example CIDRs. When deploying this architecture in a second spoke account, use unique IP addresses that don’t overlap with your existing environments. Transit Gateway operates at Layer 3 and requires distinct IP spaces to route traffic between VPCs.

Deploy a hub and spoke account

In this section, we set up the local AWS Command Line Interface (AWS CLI) environment to deploy this solution in two AWS accounts. Detailed instructions are provided in the repository README.

  1. Deploy a CloudFormation stack in the hub account, and another stack in the spoke account.
  2. Configure connectivity between the hub and spoke VPCs using Transit Gateway attachments.
  3. Create an Amazon Cognito user with tenant1 as the value for a custom user attribute, tenant_id.
  4. Create an item in an Amazon DynamoDB table that maps the tenant ID to model access and routing information specific to a tenant, tenant1 in our case.

The following screenshots show the custom attribute value tenant1 for the user, and the item in the DynamoDB table that maps spoke account details for tenant1.

Tenant user attributes

Tenant mappings

Validate connectivity

In this section, we validate connectivity from a test application in the hub account to the Amazon Bedrock model in the spoke account. We do so by sending a curl request from an EC2 instance (representing our client application) to the ALB. Both the EC2 instance and the ALB are located in the public subnet of the hub account. The request and response are then routed through Transit Gateway attachments between the hub and spoke VPCs. The following screenshot shows the execution of a utility script on your local workstation that authenticates a user and exports the necessary variables. These variables will be used to construct the curl request on the EC2 instance.

Tenant user attributes

The following screenshot shows the curl request being executed from the EC2 instance to the ALB. The response confirms that the request was successfully processed and served by the amazon.titan-text-lite-v1 model, which is the model mapped to this user (tenant1). The model is hosted in the spoke account.

Tenant1 validation

Deploy a second spoke account

In this section, we extend the deployment to include a second spoke account for an additional tenant. We validate the multi-tenant connectivity by sending another curl request from the same EC2 instance to the ALB in the hub account. Detailed instructions are provided in the repository README.

The following screenshot shows the response to this request, demonstrating that the system correctly identifies and routes requests based on tenant information. In this case, the user’s tenant_id attribute value is tenant2, and the request is successfully routed to the anthropic.claude-3-haiku-20240307-v1:0 model, which is mapped to tenant2 in the second spoke account.

Tenant2 validation

Clean up

To clean up your resources, complete the following steps:

  1. If you created the optional resources for a second spoke account, delete them:
    1. Change the directory to genai-secure-patterns/hub-spoke-transit-gateway/scripts/optional
    2. Run the cleanup script ./cleanupOptionalStack.sh.
  2. Clean up the main stack:
    1. Change the directory to genai-secure-patterns/hub-spoke-transit-gateway/scripts/
    2. Run the cleanup script ./cleanup.sh.

Conclusion

As organizations increasingly adopt and scale generative AI use cases across different teams and LOBs, there is a growing need for secure, scalable, and reliable multi-tenant architectures. This two-part series addresses this need by providing guidance on implementing a hub and spoke architecture pattern. By adopting such well-architected practices from the outset, you can build scalable and robust solutions that unlock the full potential of generative AI across your organization.

In this post, we covered how to set up a centralized hub account hosting shared services like authentication, authorization, and networking using Transit Gateway. We also demonstrated how to configure spoke accounts to host tenant-specific resources like Amazon Bedrock. Try out the provided code samples to see this architecture in action.

Part 2 will explore an alternative implementation using PrivateLink to interconnect the VPCs in the hub and spoke accounts.


About the Authors

Nikhil Penmetsa is a Senior Solutions Architect at AWS. He helps organizations understand best practices around advanced cloud-based solutions. He is passionate about diving deep with customers to create solutions that are cost-effective, secure, and performant. Away from the office, you can often find him putting in miles on his road bike or hitting the open road on his motorbike.

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 3-year-old Sheepadoodle!


Accelerate AI development with Amazon Bedrock API keys

Accelerate AI development with Amazon Bedrock API keys

Today, we’re excited to announce a significant improvement to the developer experience of Amazon Bedrock: API keys. API keys provide quick access to the Amazon Bedrock APIs, streamlining the authentication process so that developers can focus on building rather than configuration.

CamelAI is an open-source, modular framework for building intelligent multi-agent systems for data generation, world simulation, and task automation.

“As a startup with limited resources, streamlined customer onboarding is critical to our success. The Amazon Bedrock API keys enable us to onboard enterprise customers in minutes rather than hours. With Bedrock, our customers can quickly provision access to leading AI models and seamlessly integrate them into CamelAI,”

said Miguel Salinas, CTO, CamelAI.

In this post, we explore how API keys work and how you can start using them today.

API key authentication

Amazon Bedrock now provides API key access to streamline integration with tools and frameworks that expect API key-based authentication. The Amazon Bedrock and Amazon Bedrock runtime SDKs support API key authentication for methods including on-demand inference, provisioned throughput inference, model fine-tuning, distillation, and evaluation.

The diagram compares the default authentication process to Amazon Bedrock (in orange) with the API keys approach (in blue). In the default process, you must create an identity in AWS IAM Identity Center or IAM, attach IAM policies to provide permissions to perform API operations, and generate credentials, which you can then use to make API calls. The grey boxes in the diagram highlight the steps that Amazon Bedrock now streamlines when generating an API key. Developers can now authenticate and access Amazon Bedrock APIs with minimal setup overhead.

You can generate API keys in the Amazon Bedrock console, choosing between two types.

With long-term API keys, you can set expiration times ranging from 1 day to no expiration. These keys are associated with an IAM user that Amazon Bedrock automatically creates for you. The system attaches the AmazonBedrockLimitedAccess managed policy to this IAM user, and you can then modify permissions as needed through the IAM service. We recommend using long-term keys primarily for exploration of Amazon Bedrock.

Short-term API keys use the IAM permissions from your current IAM principal and expire when your session ends, or after a maximum of 12 hours. Short-term API keys use AWS Signature Version 4 for authentication. For continuous application use, you can implement API key refreshing with a script as shown in this example. We recommend that you use short-term API keys for setups that require a higher level of security.

Making your first API call

Once you have access to foundation models, getting started with Amazon Bedrock API keys is straightforward. Here’s how to make your first API call using the AWS SDK for Python (Boto3 SDK) and API keys:

Generate an API key

To generate an API key, follow these steps:

  1. Sign in to the AWS Management Console and open the Amazon Bedrock console
  2. In the left navigation panel, select API keys
  3. Choose either Generate short-term API key or Generate long-term API key
  4. For long-term keys, set your desired expiration time and optionally configure advanced permissions
  5. Choose Generate and copy your API key

Set your API key as an environment variable

You can set your API key as an environment variable so that it’s automatically recognized when you make API requests:

# To set the API key as an environment variable, you can open a terminal and run the following command:
export AWS_BEARER_TOKEN_BEDROCK=${api-key}

The Boto3 SDK automatically detects your environment variable when you create an Amazon Bedrock client.

Make your first API call

You can now make API calls to Amazon Bedrock in multiple ways:

  1. Using curl
    curl -X POST "https://bedrock-runtime.us-east-1.amazonaws.com/model/us.anthropic.claude-3-5-haiku-20241022-v1:0/converse" 
      -H "Content-Type: application/json" 
      -H "Authorization: Bearer $AWS_BEARER_TOKEN_BEDROCK" 
      -d '{
        "messages": [
            {
                "role": "user",
                "content": [{"text": "Hello"}]
            }
        ]
      }'

  2. Using the Amazon Bedrock SDK:
    import boto3
    
    # Create an Amazon Bedrock client
    client = boto3.client(
        service_name="bedrock-runtime",
        region_name="us-east-1"     # If you've configured a default region, you can omit this line
    ) 
    
    # Define the model and message
    model_id = "us.anthropic.claude-3-5-haiku-20241022-v1:0"
    messages = [{"role": "user", "content": [{"text": "Hello"}]}]
       
    response = client.converse(
        modelId=model_id,
        messages=messages,
    )
    
    # Print the response
    print(response['output']['message']['content'][0]['text'])

  3. You can also use native libraries like Python Requests:
    import requests
    import os
    
    url = "https://bedrock-runtime.us-east-1.amazonaws.com/model/us.anthropic.claude-3-5-haiku-20241022-v1:0/converse"
    
    payload = {
        "messages": [
            {
                "role": "user",
                "content": [{"text": "Hello"}]
            }
        ]
    }
    
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['AWS_BEARER_TOKEN_BEDROCK']}"
    }
    
    response = requests.request("POST", url, json=payload, headers=headers)
    
    print(response.text)

Bridging developer experience and enterprise security requirements

Enterprise administrators can now streamline their user onboarding to Amazon Bedrock foundation models. With setups that require a higher level of security, administrators can enable short-term API keys for their users. Short-term API keys use AWS Signature Version 4 and existing IAM principals, maintaining established access controls implemented by administrators.

For audit and compliance purposes, all API calls are logged in AWS CloudTrail. API keys are passed as authorization headers to API requests and aren’t logged.

Conclusion

Amazon Bedrock API keys are available in 20 AWS Regions where Amazon Bedrock is available: US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Hyderabad, Mumbai, Osaka, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Milan, Paris, Spain, Stockholm, Zurich), and South America (São Paulo). To learn more about API keys in Amazon Bedrock, visit the API Keys documentation in the Amazon Bedrock user guide.

Give API keys a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.


About the Authors

Sofian Hamiti is a technology leader with over 10 years of experience building AI solutions, and leading high-performing teams to maximize customer outcomes. He is passionate about empowering diverse talent to drive global impact and achieve their career aspirations.

Ajit Mahareddy is an experienced Product and Go-To-Market (GTM) leader with over 20 years of experience in product management, engineering, and go-to-market. Prior to his current role, Ajit led product management building AI/ML products at leading technology companies, including Uber, Turing, and eHealth. He is passionate about advancing generative AI technologies and driving real-world impact with generative AI.

Nakul Vankadari Ramesh is a Software Development Engineer with over 7 years of experience building large-scale distributed systems. He currently works on the Amazon Bedrock team, helping accelerate the development of generative AI capabilities. Previously, he contributed to Amazon Managed Blockchain, focusing on scalable and reliable infrastructure.

Huong Nguyen is a Principal Product Manager at AWS. She is a product leader at Amazon Bedrock, with 18 years of experience building customer-centric and data-driven products. She is passionate about democratizing responsible machine learning and generative AI to enable customer experience and business innovation. Outside of work, she enjoys spending time with family and friends, listening to audiobooks, traveling, and gardening.

Massimiliano Angelino is Lead Architect for the EMEA Prototyping team. During the last 3 and a half years he has been an IoT Specialist Solution Architect with a particular focus on edge computing, and he contributed to the launch of the AWS IoT Greengrass v2 service and its integration with Amazon SageMaker Edge Manager. Based in Stockholm, he enjoys skating on frozen lakes.


Accelerating data science innovation: How Bayer Crop Science used AWS AI/ML services to build their next-generation MLOps service

Accelerating data science innovation: How Bayer Crop Science used AWS AI/ML services to build their next-generation MLOps service

The world’s population is expanding at a rapid rate. The growing global population requires innovative solutions to produce food, fiber, and fuel, while restoring natural resources like soil and water and addressing climate change. Bayer Crop Science estimates farmers need to increase crop production by 50% by 2050 to meet these demands. To support their mission, Bayer Crop Science is collaborating with farmers and partners to promote and scale regenerative agriculture—a future where farming can produce more while restoring the environment.

Regenerative agriculture is a sustainable farming philosophy that aims to improve soil health by incorporating nature to create healthy ecosystems. It’s based on the idea that agriculture should restore degraded soils and reverse degradation, rather than sustain current conditions. The Crop Science Division at Bayer believes regenerative agriculture is foundational to the future of farming. Their vision is to produce 50% more food by restoring nature and scaling regenerative agriculture. To make this mission a reality, Bayer Crop Science is driving model training with Amazon SageMaker and accelerating code documentation with Amazon Q.

In this post, we show how Bayer Crop Science manages large-scale data science operations by training models for their data analytics needs and maintaining high-quality code documentation to support developers. Through these solutions, Bayer Crop Science projects up to a 70% reduction in developer onboarding time and up to a 30% improvement in developer productivity.

Challenges

Bayer Crop Science faced the challenge of scaling genomic predictive modeling to increase its speed to market. It also needed data scientists to focus on building high-value foundation models (FMs), rather than worrying about constructing and engineering the solution itself. Before the company built its solution, the Decision Science Ecosystem, provisioning a data science environment could take a data team within Bayer Crop Science several days.

Solution overview

Bayer Crop Science’s Decision Science Ecosystem (DSE) is a next-generation machine learning operations (MLOps) solution built on AWS to accelerate data-driven decision making for data science teams at scale across the organization. AWS services assist Bayer Crop Science in creating a connected decision-making system accessible to thousands of data scientists. The company is using the solution for generative AI, product pipeline advancements, geospatial imagery analytics of field data, and large-scale genomic predictive modeling that will allow Bayer Crop Science to become more data-driven and increase speed to market. This solution helps the data scientist at every step, from ideation to model output, including the entire business decision record made using DSE. Other divisions within Bayer are also beginning to build a similar solution on AWS based on the success of DSE.

Bayer Crop Science teams’ DSE integrates cohesively with SageMaker, a fully managed service that lets data scientists quickly build, train, and deploy machine learning (ML) models for different use cases so they can make data-informed decisions quickly. This boosts collaboration within Bayer Crop Science across product supply, R&D, and commercial. Their data science strategy no longer needs self-service data engineering, but rather provides an effective resource to drive fast data engineering at scale. Bayer Crop Science chose SageMaker because it provides a single cohesive experience where data scientists can focus on building high-value models, without having to worry about constructing and engineering the resource itself. With the help of AWS services, cross-functional teams can align quickly to reduce operational costs by minimizing redundancy, addressing bugs early and often, and quickly identifying issues in automated workflows. The DSE solution uses SageMaker, Amazon Elastic Kubernetes Service (Amazon EKS), AWS Lambda, and Amazon Simple Storage Service (Amazon S3) to accelerate innovation at Bayer Crop Science and to create a customized, seamless, end-to-end user experience.

The following diagram illustrates the DSE architecture.

Comprehensive AWS DSE architecture showing data scientist workflow from platform through deployment, with monitoring and security controls

Solution walkthrough

Bayer Crop Science had two key challenges in managing large-scale data science operations: maintaining high-quality code documentation and optimizing existing documentation across multiple repositories. With Amazon Q, Bayer Crop Science tackled both challenges, which empowered them to onboard developers more rapidly and improve developer productivity.

The company’s first use case focused on automatically creating high-quality code documentation. When a developer pushes code to a GitHub repository, a webhook—a lightweight, event-driven communication that automatically sends data between applications using HTTP—triggers a Lambda function through Amazon API Gateway. This function then uses Amazon Q to analyze the code changes and generate comprehensive documentation and change summaries. The updated documentation is then stored in Amazon S3. The same Lambda function also creates a pull request with the AI-generated summary of code changes. To maintain security and flexibility, Bayer Crop Science uses Parameter Store, a capability of AWS Systems Manager, to manage prompts for Amazon Q, allowing for quick updates without redeployment, and AWS Secrets Manager to securely handle repository tokens.
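The following is a minimal sketch of what such a webhook handler might look like. The parameter, secret, bucket, and application identifiers are illustrative placeholders (not the names Bayer Crop Science uses), and depending on how the Amazon Q Business application is configured, the chat call may require additional parameters such as a user identifier.

import json
import os

import boto3

ssm = boto3.client("ssm")
secrets = boto3.client("secretsmanager")
qbusiness = boto3.client("qbusiness")
s3 = boto3.client("s3")

# Illustrative placeholders only; real names come from the deployment's configuration.
PROMPT_PARAMETER = os.environ.get("PROMPT_PARAMETER", "/docgen/prompt")
TOKEN_SECRET = os.environ.get("TOKEN_SECRET", "docgen/repo-token")
DOCS_BUCKET = os.environ.get("DOCS_BUCKET", "docgen-output-bucket")
Q_APP_ID = os.environ.get("Q_APP_ID", "q-business-application-id")


def handler(event, context):
    # API Gateway (proxy integration) passes the webhook payload in the request body.
    payload = json.loads(event["body"])
    repo = payload["repository"]["full_name"]
    commit_messages = "\n".join(c.get("message", "") for c in payload.get("commits", []))

    # The prompt lives in Parameter Store so it can be updated without redeploying the function.
    prompt = ssm.get_parameter(Name=PROMPT_PARAMETER)["Parameter"]["Value"]
    # The repository token is kept in Secrets Manager and would be used to open the pull request.
    token = secrets.get_secret_value(SecretId=TOKEN_SECRET)["SecretString"]

    # Ask Amazon Q Business to draft documentation and a change summary for the push.
    answer = qbusiness.chat_sync(
        applicationId=Q_APP_ID,
        userMessage=f"{prompt}\n\nRepository: {repo}\n\nCommits:\n{commit_messages}",
    )["systemMessage"]

    # Store the generated documentation; a follow-up step would create the pull request using `token`.
    s3.put_object(Bucket=DOCS_BUCKET, Key=f"{repo}/README.generated.md", Body=answer.encode("utf-8"))

    return {"statusCode": 200, "body": json.dumps({"repository": repo})}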

This automation significantly reduces the time developers spend creating documentation and pull request descriptions. The generated documentation is also ingested into Amazon Q, so developers can quickly answer questions they have about a repository and onboard onto projects.

The second use case addresses the challenge of maintaining and improving existing code documentation quality. An AWS Batch job, triggered by Amazon EventBridge, processes the code repository. Amazon Q generates new documentation for each code file, which is then indexed along with the source code. The system also generates high-level documentation for each module or functionality and compares the AI-generated documentation with existing human-written documentation. This process makes it possible for Bayer Crop Science to systematically evaluate and enhance their documentation quality over time.
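A rough sketch of the scheduling piece is shown below. The rule name, job queue ARN, role ARN, and job definition are placeholders; the actual pipeline also indexes and compares the generated documentation after the Batch job runs.

import boto3

events = boto3.client("events")

# Placeholders for illustration; the real rule, queue, and job definition live in the
# team's infrastructure-as-code, not shown here.
events.put_rule(
    Name="nightly-doc-refresh",
    ScheduleExpression="rate(1 day)",
    State="ENABLED",
)

events.put_targets(
    Rule="nightly-doc-refresh",
    Targets=[
        {
            "Id": "doc-refresh-batch-job",
            "Arn": "arn:aws:batch:us-east-1:123456789012:job-queue/doc-refresh-queue",
            "RoleArn": "arn:aws:iam::123456789012:role/eventbridge-batch-submit-role",
            "BatchParameters": {
                "JobDefinition": "doc-refresh-job-definition",
                "JobName": "regenerate-and-compare-docs",
            },
        }
    ],
)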

To improve search capabilities, Bayer Crop Science added repository names as custom attributes in the Amazon Q index and prefixed them to indexed content. This enhancement improved the accuracy and relevance of documentation searches. The development team also implemented strategies to handle API throttling and variability in AI responses, maintaining robustness in production environments. Bayer Crop Science is considering developing a management plane to streamline the addition of new repositories and centralize the management of settings, tokens, and prompts. This would further enhance the scalability and ease of use of the system.
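As a rough illustration of this indexing pattern (the application ID, index ID, and attribute name are assumptions, and the real pipeline ingests documents in bulk), a single generated document could be indexed with the repository name as a custom attribute like this:

import boto3

qbusiness = boto3.client("qbusiness")

# Placeholders for illustration only.
APP_ID = "q-business-application-id"
INDEX_ID = "q-business-index-id"


def index_documentation(repo_name: str, doc_key: str, doc_text: str) -> None:
    """Index a generated documentation file, tagging it with its repository name and
    prefixing that name to the content so repository-scoped searches rank it correctly."""
    qbusiness.batch_put_document(
        applicationId=APP_ID,
        indexId=INDEX_ID,
        documents=[
            {
                "id": f"{repo_name}/{doc_key}",
                "attributes": [
                    {"name": "repository", "value": {"stringValue": repo_name}}
                ],
                "content": {"blob": f"[{repo_name}] {doc_text}".encode("utf-8")},
                "contentType": "PLAIN_TEXT",
            }
        ],
    )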

Organizations looking to replicate Bayer Crop Science’s success can implement similar webhook-triggered documentation generation, use Amazon Q Business for both generating and evaluating documentation quality, and integrate the solution with existing version control and code review processes. By using AWS services like Lambda, Amazon S3, and Systems Manager, companies can create a scalable and manageable architecture for their documentation needs. Amazon Q Developer also helps organizations further accelerate their development timelines by providing real-time code suggestions and a built-in next-generation chat experience.

“One of the lessons we’ve learned over the last 10 years is that we want to write less code. We want to focus our time and investment on only the things that provide differentiated value to Bayer, and we want to leverage everything we can that AWS provides out of the box. Part of our goal is reducing the development cycles required to transition a model from proof-of-concept phase, to production, and ultimately business adoption. That’s where the value is.”

– Will McQueen, VP, Head of CS Global Data Assets and Analytics at Bayer Crop Science.

Summary

Bayer Crop Science’s approach aligns with modern MLOps practices, enabling data science teams to focus more on high-value modeling tasks rather than time-consuming documentation processes and infrastructure management. By adopting these practices, organizations can significantly reduce the time and effort required for code documentation while improving overall code quality and team collaboration.

Learn more about Bayer Crop Science’s generative AI journey, and discover how Bayer Crop Science is redesigning sustainable practices through cutting-edge technology.

About Bayer

Bayer is a global enterprise with core competencies in the life science fields of health care and nutrition. In line with its mission, “Health for all, Hunger for none,” the company’s products and services are designed to help people and the planet thrive by supporting efforts to understand the major challenges presented by a growing and aging global population. Bayer is committed to driving sustainable development and generating a positive impact with its businesses. At the same time, Bayer aims to increase its earning power and create value through innovation and growth. The Bayer brand stands for trust, reliability, and quality throughout the world. In fiscal 2023, the Group employed around 100,000 people and had sales of 47.6 billion euros. R&D expenses before special items amounted to 5.8 billion euros. For more information, go to www.bayer.com.


About the authors

Lance Smith is a Senior Solutions Architect and part of the Global Healthcare and Life Sciences industry division at AWS. He has spent the last 2 decades helping life sciences companies apply technology in pursuit of their missions to help patients. Outside of work, he loves traveling, backpacking, and spending time with his family.

Kenton Blacutt is an AI Consultant within the Amazon Q Customer Success team. He works hands-on with customers, helping them solve real-world business problems with cutting-edge AWS technologies. In his free time, he likes to travel and run an occasional marathon.

Karthik Prabhakar is a Senior Applications Architect within the AWS Professional Services team. In this role, he collaborates with customers to design and implement cutting-edge solutions for their mission-critical business systems, focusing on areas such as scalability, reliability, and cost optimization in digital transformation and modernization projects.

Jake Malmad is a Senior DevOps Consultant within the AWS Professional Services team, specializing in infrastructure as code, security, containers, and orchestration. As a DevOps consultant, he uses this expertise to work collaboratively with customers, architecting and implementing solutions for automation, scalability, reliability, and security across a wide variety of cloud adoption and transformation engagements.

Nicole Brown is a Senior Engagement Manager within the AWS Professional Services team based in Minneapolis, MN. With over 10 years of professional experience, she has led multidisciplinary, global teams across the healthcare and life sciences industries. She is also a supporter of women in tech and currently holds a board position within the Women at Global Services affinity group.

Read More

Combat financial fraud with GraphRAG on Amazon Bedrock Knowledge Bases

Combat financial fraud with GraphRAG on Amazon Bedrock Knowledge Bases

Financial fraud detection isn’t just important to banks—it’s essential. With global fraud losses surpassing $40 billion annually and sophisticated criminal networks constantly evolving their tactics, financial institutions face an increasingly complex threat landscape. Today’s fraud schemes operate across multiple accounts, institutions, and channels, creating intricate webs designed specifically to evade detection systems.

Financial institutions have invested heavily in detection capabilities, but the core challenge remains: how to connect the dots across fragmented information landscapes where the evidence of fraud exists not within individual documents or transactions, but in the relationships between them.

In this post, we show how to use Amazon Bedrock Knowledge Bases GraphRAG with Amazon Neptune Analytics to build a financial fraud detection solution.

The limitations of traditional RAG systems

In recent years, Retrieval Augmented Generation (RAG) has emerged as a promising approach for building AI systems grounded in organizational knowledge. However, traditional RAG-based systems have limitations when it comes to complex financial fraud detection.

The fundamental limitation lies in how conventional RAG processes information. Standard RAG retrieves and processes document chunks as isolated units, looking for semantic similarities between a query and individual text passages. This approach works well for straightforward information retrieval, but falls critically short in the following scenarios:

  • Evidence is distributed across multiple documents and systems
  • The connections between entities matter more than the entities themselves
  • Complex relationship chains require multi-hop reasoning
  • Structural context (like hierarchical document organization) provides critical clues
  • Entity resolution across disparate references is essential

A fraud analyst intuitively follows connection paths—linking an account to a phone number, that phone number to another customer, and that customer to a known fraud ring. Traditional RAG systems, however, lack this relational reasoning capability, leaving sophisticated fraud networks undetected until losses have already occurred.

Amazon Bedrock Knowledge Bases with GraphRAG for financial fraud detection

Amazon Bedrock Knowledge Bases GraphRAG helps financial institutions implement fraud detection systems without building complex graph infrastructure from scratch. By offering a fully managed service that seamlessly integrates knowledge graph construction, maintenance, and querying with powerful foundation models (FMs), Amazon Bedrock Knowledge Bases dramatically lowers the technical barriers to implementing relationship-aware fraud detection. Financial organizations can now use their existing transaction data, customer profiles, and risk signals within a graph context that preserves the critical connections between entities while benefiting from the natural language understanding of FMs. This powerful combination enables fraud analysts to query complex financial relationships using intuitive natural language to detect suspicious patterns that can result in financial fraud.

Example fraud detection use case

To demonstrate this use case, we use a fictitious bank (AnyCompany Bank) in Australia whose customers hold savings, checking, and credit card accounts with the bank. These customers perform transactions to buy goods and services from merchants across the country using their debit and credit cards. AnyCompany Bank is looking to use the latest advancements in GraphRAG and generative AI technologies to detect subtle patterns in fraudulent behavior that will yield higher accuracy and reduce false positives.

A fraud analyst at AnyCompany Bank wants to use natural language queries to get answers to the following types of queries:

  • Basic queries – For example, “Show me all the transactions processed by ABC Electronics” or “What accounts does Michael Green own?”
  • Relationship exploration queries – For example, “Which devices have accessed account A003?” or “Show all relationships between Jane Smith and her devices.”
  • Temporal pattern detection queries – For example, “Which accounts had transactions and device access on the same day?” or “Which accounts had transactions outside their usual location pattern?”
  • Fraud detection queries – For example, “Find unusual transaction amounts compared to account history” or “Are there any accounts with failed transactions followed by successful ones within 24 hours?”

Solution overview

To help illustrate the core GraphRAG principles, we have simplified the data model to six key tables: accounts, transactions, individuals, devices, merchants, and relationships. Real-world financial fraud detection systems are much more complex, with hundreds of entity types and intricate relationships, but this example demonstrates the essential concepts that scale to enterprise implementations. The following figures show examples of these six tables.

The following diagram shows the relationships among these entities: accounts, individuals, devices, transactions, and merchants. For example, the individual John Doe uses device D001 to access account A001 to execute transaction T001, which is processed by merchant ABC Electronics.

In the following sections, we demonstrate how to upload documents to Amazon Simple Storage Service (Amazon S3), create a knowledge base using Amazon Bedrock Knowledge Bases, and test the knowledge base by running natural language queries.

Prerequisites

To follow along with this post, make sure you have an active AWS account with appropriate permissions to access Amazon Bedrock and create an S3 bucket to be the data source. Additionally, verify that you have enabled access to both Anthropic’s Claude 3.5 Haiku and an embeddings model, such as Amazon Titan Text Embeddings V2.

Upload documents to Amazon S3

In this step, you create an S3 bucket as the data source and upload the six tables (accounts, individuals, devices, transactions, merchants, and relationships) as Excel data sheets. The following screenshot shows our S3 bucket and its contents.
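If you prefer the AWS SDK to the console for this step, a minimal sketch looks like the following; the bucket name and local file names are placeholders for your own data source.

import boto3

s3 = boto3.client("s3")

# Placeholder bucket name; use the bucket you created as the knowledge base data source.
BUCKET = "anycompany-bank-fraud-data"

for table in ["accounts", "individuals", "devices", "transactions", "merchants", "relationships"]:
    s3.upload_file(f"{table}.xlsx", BUCKET, f"{table}.xlsx")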

Create a knowledge base

Complete the following steps to create the knowledge base:

  1. On the Amazon Bedrock console, choose Knowledge Bases under Builder tools in the navigation pane.
  2. Choose Create, and then choose Knowledge Base with vector store.
  3. In the Knowledge Base details section, provide the following information:
    1. Enter a meaningful name for the knowledge base.
    2. For IAM permissions, select Create and use a new service role to create a new AWS Identity and Access Management (IAM) role.
    3. For Choose data source, select Amazon S3.
    4. Choose Next.
  4. In the Configure data source section, provide the following information:
    1. Enter a data source name.
    2. For Data source location, select the location of your data source (for example, we select This AWS account).
    3. For S3 source, choose Browse S3 and choose the location where you uploaded the files.
    4. For Parsing strategy, select Amazon Bedrock default parser.
    5. For Chunking strategy, choose Default chunking.
    6. Choose Next.
  5. In the Configure data storage and processing section, provide the following information:
    1. For Embeddings model, choose Titan Text Embeddings V2.
    2. For Vector store creation method, select Quick create a new vector store.
    3. For Vector store type, select Amazon Neptune Analytics (GraphRAG).
    4. Choose Next.

Amazon Bedrock uses Anthropic’s Claude 3 Haiku v1 as the FM to automatically build the graph for the knowledge base. This automatically enables contextual enrichment.

  6. Choose Create knowledge base.
  7. Choose the knowledge base when it’s in Available status.
  8. Select the data source and choose Sync, then wait for the sync process to complete.

In the sync process, Amazon Bedrock ingests data files from Amazon S3, creates chunks and embeddings, and automatically extracts entities and relationships, creating the graph.

Test the knowledge base and run natural language queries

When the sync is complete, you can test the knowledge base.

  1. In the Test Knowledge Base section, choose Select model.
  2. Set the model as Anthropic’s Claude 3.5 Haiku (or another model of your choice) and then choose Apply.

  1. Enter a sample query and choose Run.

Let’s start with some basic queries, such as “Show me all transactions processed by ABC Electronics” or “What accounts does Michael Green own?” The generated responses are shown in the following screenshot.

We can also run some relationship exploration queries, such as “Which devices have accessed account A003?” or “Show all relationships between Jane Smith and her devices.” The generated responses are shown in the following screenshot. To arrive at the response, the model performs multi-hop reasoning, traversing multiple files.

The model can also perform temporal pattern detection queries, such as “Which accounts had transactions and device access on the same day?” or “Which accounts had transactions outside their usual location pattern?” The generated responses are shown in the following screenshot.

Let’s try out some fraud detection queries, such as “Find unusual transaction amounts compared to account history” or “Are there any accounts with failed transactions followed by successful ones within 24 hours?” The generated responses are shown in the following screenshot.

The GraphRAG solution also enables complex relationship queries, such as “Show the complete path from Emma Brown to Pacific Fresh Market” or “Map all connections between the individuals and merchants in the system.” The generated responses are shown in the following screenshot.
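You can also run the same natural language queries programmatically instead of through the console test window. The following is a minimal sketch using the RetrieveAndGenerate API; the knowledge base ID and model ARN are placeholders that you copy from your own environment.

import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Placeholders: copy the knowledge base ID from the Amazon Bedrock console and use the ARN
# of a model you enabled, for example Anthropic's Claude 3.5 Haiku.
KNOWLEDGE_BASE_ID = "XXXXXXXXXX"
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0"

response = runtime.retrieve_and_generate(
    input={"text": "Which devices have accessed account A003?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": KNOWLEDGE_BASE_ID,
            "modelArn": MODEL_ARN,
        },
    },
)
print(response["output"]["text"])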

Clean up

To avoid incurring additional costs, clean up the resources you created. This includes deleting the Amazon Bedrock knowledge base, its associated IAM role, and the S3 bucket used for source documents. Additionally, you must separately delete the Neptune Analytics graph that was automatically created by Amazon Bedrock Knowledge Bases during the setup process.

Conclusion

GraphRAG in Amazon Bedrock emerges as a game-changing feature in the fight against financial fraud. By automatically connecting relationships across transaction data, customer profiles, historical patterns, and fraud reports, it significantly enhances financial institutions’ ability to detect complex fraud schemes that traditional systems might miss. Its unique capability to understand and link information across multiple documents and data sources proves invaluable when investigating sophisticated fraud patterns that span various touchpoints and time periods.

For financial institutions and fraud detection teams, GraphRAG’s intelligent document processing means faster, more accurate fraud investigations. It can quickly piece together related incidents, identify common patterns in fraud reports, and connect seemingly unrelated activities that might indicate organized fraud rings. This deeper level of insight, combined with its ability to provide comprehensive, context-aware responses, enables security teams to stay one step ahead of fraudsters who continuously evolve their tactics.

As financial crimes become increasingly sophisticated, GraphRAG in Amazon Bedrock stands as a powerful tool for fraud prevention, transforming how you can analyze, connect, and act on fraud-related information. The future of fraud detection demands tools that can think and connect like humans—and GraphRAG is leading the way in making this possible.


About the Authors

Senaka Ariyasinghe is a Senior Partner Solutions Architect at AWS. He collaborates with Global Systems Integrators to drive cloud innovation across the Asia-Pacific and Japan region. He specializes in helping AWS partners develop and implement scalable, well-architected solutions, with particular emphasis on generative AI, machine learning, cloud migration strategies, and the modernization of enterprise applications.

Senthil Nathan is a Senior Partner Solutions Architect working with Global Systems Integrators at AWS. In his role, Senthil works closely with global partners to help them maximize the value and potential of the AWS Cloud landscape. He is passionate about using the transformative power of cloud computing and emerging technologies to drive innovation and business impact.

Deependra Shekhawat is a Senior Energy and Utilities Industry Specialist Solutions Architect based in Sydney, Australia. In his role, Deependra helps energy companies across the Asia-Pacific and Japan region use cloud technologies to drive sustainability and operational efficiency. He specializes in creating robust data foundations and advanced workflows that enable organizations to harness the power of big data, analytics, and machine learning for solving critical industry challenges.

Aaron Sempf is Next Gen Tech Lead for the AWS Partner Organization in Asia-Pacific and Japan. With over 20 years in distributed system engineering design and development, he focuses on solving for large-scale complex integration and event-driven systems. In his spare time, he can be found coding prototypes for autonomous robots, IoT devices, distributed solutions, and designing agentic architecture patterns for generative AI-assisted business automation.

Ozan Eken is a Product Manager at AWS, passionate about building cutting-edge generative AI and graph analytics products. With a focus on simplifying complex data challenges, Ozan helps customers unlock deeper insights and accelerate innovation. Outside of work, he enjoys trying new foods, exploring different countries, and watching soccer.

JaiPrakash Dave is a Partner Solutions Architect working with Global Systems Integrators at AWS based in India. In his role, JaiPrakash guides AWS partners in the India region to design and scale well-architected solutions, focusing on generative AI, machine learning, DevOps, and application and data modernization initiatives.

Read More

Classify call center conversations with Amazon Bedrock batch inference

Classify call center conversations with Amazon Bedrock batch inference

In this post, we demonstrate how to build an end-to-end solution for text classification using the Amazon Bedrock batch inference capability with Anthropic’s Claude Haiku model. Amazon Bedrock batch inference offers a 50% discount compared to the on-demand price, which is an important factor when dealing with a large number of requests. We walk through classifying travel agency call center conversations into categories, showcasing how to generate synthetic training data, process large volumes of text data, and automate the entire workflow using AWS services.

Challenges with high-volume text classification

Organizations across various sectors face a common challenge: the need to efficiently handle high-volume classification tasks. From travel agency call centers categorizing customer inquiries to sales teams analyzing lost opportunities and finance departments classifying invoices, these manual processes are a daily necessity. But these tasks come with significant challenges.

The manual approach to analyzing and categorizing these classification requests is not only time-intensive but also prone to inconsistencies. As teams process the high volume of data, the potential for errors and inefficiencies grows. By implementing automated systems to classify these interactions, multiple departments stand to gain substantial benefits. They can uncover hidden trends in their data, significantly enhance the quality of their customer service, and streamline their operations for greater efficiency.

However, the path to effective automated classification has its own challenges. Organizations must grapple with the complexities of efficiently processing vast amounts of textual information while maintaining consistent accuracy in their classification results. In this post, we demonstrate how to create a fully automated workflow while keeping operational costs under control.

Data

For this solution, we used synthetic call center conversation data. To obtain realistic training data that maintains user privacy, we generated synthetic conversations using Anthropic’s Claude 3.7 Sonnet. We used the following prompt to generate the synthetic data:

Task: Generate <N> synthetic conversations from customer calls to an imaginary travel
company. Come up with 10 most probable categories that calls of this nature can come 
from and treat them as classification categories for these calls. For each generated 
call create a column that indicates the category for that call. 
Conversations should follow the following format:
"User: ...
Agent: ...
User: ...
Agent: ...
...

Class: One of the 10 following categories that is most relevant to the conversation."
Ten acceptable classes:
1. Booking Inquiry - Customer asking about making new reservations
2. Reservation Change - Customer wanting to modify existing bookings
3. Cancellation Request - Customer seeking to cancel their travel plans
4. Refund Issues - Customer inquiring about getting money back
5. Travel Information - Customer seeking details about destinations, documentation, etc.
6. Complaint - Customer expressing dissatisfaction with service
7. Payment Problem - Customer having issues with billing or payments
8. Loyalty Program - Customer asking about rewards points or membership status
9. Special Accommodation - Customer requesting special arrangements
10. Technical Support - Customer having issues with website, app or booking systems

Instructions:
- Keep conversations concise
- Use John Doe for male names and Jane Doe for female names
- Use john.doe@email.com for male email address, jane.doe@email.com for female email
address and corporate@email.com for corporate email address, whenever you need to
generate emails.
- Use " or ' instead of " whenever there is a quote within the conversation

The synthetic dataset includes the following information:

  • Customer inquiries about flight bookings
  • Hotel reservation discussions
  • Travel package negotiations
  • Customer service complaints
  • General travel inquiries
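For reference, the prompt above can be sent to the model programmatically rather than through a playground. The following is a minimal sketch; the model identifier and local file name are placeholders to replace with the Claude 3.7 Sonnet ID enabled in your account and Region.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholders: save the prompt shown above to a local file and substitute the identifier
# for Anthropic's Claude 3.7 Sonnet that is enabled in your account.
MODEL_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"

with open("synthetic_data_prompt.txt") as f:
    prompt = f.read().replace("<N>", "50")  # number of conversations to generate

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": prompt}]}],
    inferenceConfig={"maxTokens": 4096, "temperature": 0.7},
)
print(response["output"]["message"]["content"][0]["text"])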

Solution overview

The solution architecture uses a serverless, event-driven, scalable design to effectively handle and classify large quantities of classification requests. Built on AWS, it automatically starts working when new classification request data arrives in an Amazon Simple Storage Service (Amazon S3) bucket. The system then uses Amazon Bedrock batch processing to analyze and categorize the content at scale, minimizing the need for constant manual oversight.

The following diagram illustrates the solution architecture.

Architecture Diagram

The architecture follows a well-structured flow that facilitates reliable processing of classification requests:

  • Data preparation – The process begins when the user or application submits classification requests into the S3 bucket (Step 1). These requests are ingested into an Amazon Simple Queue Service (Amazon SQS) queue, providing a reliable buffer for incoming data and making sure no requests are lost during peak loads. A serverless data processor, implemented using an AWS Lambda function, reads messages from the queue and begins its data processing work (Step 2). It prepares the data for batch inference, crafting it into the JSONL format with schema that Amazon Bedrock requires. It stores files in a separate S3 bucket to maintain a clear separation from the original S3 bucket shared with the customer’s application, enhancing security and data management.
  • Batch inference – When the data arrives in the S3 bucket, it initiates a notification to an SQS queue. This queue activates the Lambda function batch initiator, which starts the batch inference process. The function submits Amazon Bedrock batch inference jobs through the CreateModelInvocationJob API (Step 3; a minimal sketch of this call follows this list). This initiator acts as the bridge between the queued data and the powerful classification capabilities of Amazon Bedrock. Amazon Bedrock then efficiently processes the data in batches. This batch processing approach allows for optimal use of resources while maintaining high throughput. When Amazon Bedrock completes its task, the classification results are stored in an output S3 bucket (Step 4) for postprocessing and analysis.
  • Classification results processing – After classification is complete, the system processes the results through another SQS queue (Step 5) and specialized Lambda function, which organizes the classifications into simple-to-read files, such as CSV, JSON, or XLSX (Step 6). These files are immediately available to both the customer’s applications and support teams who need to access this information (Step 7).
  • Analytics – We built an analytics layer that automatically catalogs and organizes the classification results, transforming raw classification data into actionable insights. An AWS Glue crawler catalogs everything so it can be quickly found later (Step 8). Now your business teams can use Amazon Athena to run SQL queries against the data, uncovering patterns and trends in the classified categories. We also built an Amazon QuickSight dashboard that provides visualization capabilities, so stakeholders can transform datasets into actionable reports ready for decision-making (Step 9).
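The following is a minimal sketch of the CreateModelInvocationJob call made by the batch initiator (Step 3); the job name, role ARN, model ID, and S3 URIs are placeholders, and the deployed solution supplies the real values through the Lambda function's configuration.

import boto3

bedrock = boto3.client("bedrock")

# Placeholders for illustration; the deployed solution wires these up via environment variables.
response = bedrock.create_model_invocation_job(
    jobName="travel-call-classification-batch",
    roleArn="arn:aws:iam::123456789012:role/bedrock-batch-inference-role",
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    inputDataConfig={
        "s3InputDataConfig": {"s3Uri": "s3://my-internal-bucket/batch-input/requests.jsonl"}
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://my-internal-bucket/batch-output/"}
    },
)
print(response["jobArn"])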

We use AWS best practices in this solution, including event-driven and batch processing for optimal resource utilization, batch operations for cost-effectiveness, decoupled components for independent scaling, and least privilege access patterns. We implemented the system using the AWS Cloud Development Kit (AWS CDK) with TypeScript for infrastructure as code (IaC) and Python for application logic, making sure we achieve seamless automation, dynamic scaling, and efficient processing of classification requests, positioning it to effectively address both current requirements and future demands.

Prerequisites

To deploy and run the solution, you must have the following prerequisites:

  • An active AWS account.
  • An AWS Region from the list of batch inference supported Regions for Amazon Bedrock.
  • Access to your selected models hosted on Amazon Bedrock. Make sure the selected model has been enabled in Amazon Bedrock. The solution is configured to use Anthropic’s Claude 3 Haiku by default.
  • Sign up for QuickSight in the same Region where the main application will be deployed. While subscribing, make sure to configure access to Athena and Amazon S3.
  • In QuickSight, create a group named quicksight-access for managing dashboard access permissions. Make sure to add your own role to this group so you can access the dashboard after it’s deployed. If you use a different group name, modify the corresponding name in the code accordingly.
  • To set up the AWS CDK, install the AWS CDK Command Line Interface (CLI). For instructions, see AWS CDK CLI reference.

Deploy the solution

The solution is accessible in the GitHub repository.

Complete the following steps to set up and deploy the solution:

  1. Clone the Repository: Run the following command: git clone git@github.com:aws-samples/sample-genai-bedrock-batch-classifier.git
  2. Set Up AWS Credentials: Create an AWS Identity and Access Management (IAM) user with appropriate permissions, generate credentials for AWS Command Line Interface (AWS CLI) access, and create a profile. For instructions, see Authenticating using IAM user credentials for the AWS CLI. You can use the Admin Role for testing purposes, although it violates the principle of least privilege and should be avoided in production environments in favor of custom roles with minimal required permissions.
  3. Bootstrap the Application: In the CDK folder, run the command npm install && cdk bootstrap --profile {your_profile_name}, replacing {your_profile_name} with your AWS profile name.
  4. Deploy the Solution: Run the command cdk deploy --all --profile {your_profile_name}, replacing {your_profile_name} with your AWS profile name.

After you complete the deployment process, you will see a total of six stacks created in your AWS account, as illustrated in the following screenshot.

List of stacks

SharedStack acts as a central hub for resources that multiple parts of the system need to access. Within this stack, there are two S3 buckets: one handles internal operations behind the scenes, and the other serves as a bridge between the system and customers, so they can both submit their classification requests and retrieve their results.

DataPreparationStack serves as a data transformation engine. It’s designed to handle incoming files in three specific formats: XLSX, CSV, and JSON, which at the time of writing are the only supported input formats. This stack’s primary role is to convert these inputs into the specialized JSONL format required by Amazon Bedrock. The data processing script is available in the GitHub repo. This transformation makes sure that incoming data, regardless of its original format, is properly structured before being processed by Amazon Bedrock. The format is as follows:

{
  "recordId": ${unique_id},
  "modelInput": {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "system": ${prompt},
    "messages": [
      {
        "role": "user",
        "content": [{"type": "text", "text": ${initial_text}}]
      }
    ]
  }
}

where:
initial_text - text that you want to classify
prompt       - instructions to Bedrock service how to classify
unique_id    - id coming from the upstream service, otherwise it will be 
               automatically generated by the code
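The repository's data processing script performs this conversion; as a rough illustration (the function name and example values below are ours, not taken from the repository), one JSONL record can be assembled like this:

import json
import uuid


def to_bedrock_record(text, prompt, record_id=None):
    """Build one JSONL line in the schema shown above (names and defaults are illustrative)."""
    record = {
        "recordId": record_id or str(uuid.uuid4()),
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "system": prompt,
            "messages": [
                {"role": "user", "content": [{"type": "text", "text": text}]}
            ],
        },
    }
    return json.dumps(record)


# One line per classification request; the resulting .jsonl file is what gets written to the
# internal S3 bucket for batch inference.
line = to_bedrock_record(
    "User: Hi, I need to change my hotel reservation...",
    "Classify the conversation into one of the ten categories...",
)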

BatchClassifierStack handles the classification operations. Although currently powered by Anthropic’s Claude Haiku, the system maintains flexibility by allowing straightforward switches to alternative models as needed. This adaptability is made possible through a comprehensive constants file that serves as the system’s control center. The following configurations are available:

  • PREFIX – Resource naming convention (genai by default).
  • BEDROCK_AGENT_MODEL – Model selection.
  • BATCH_SIZE – Number of classifications per output file (enables parallel processing); the minimum should be 100.
  • CLASSIFICATION_INPUT_FOLDER – Input folder name in the S3 bucket that will be used for uploading incoming classification requests.
  • CLASSIFICATION_OUTPUT_FOLDER – Output folder name in the S3 bucket where the output files will be available after the classification completes.
  • OUTPUT_FORMAT – Supported formats (CSV, JSON, XLSX).
  • INPUT_MAPPING – A flexible data integration approach that adapts to your existing file structures rather than requiring you to adapt to ours. It consists of two key fields:
    • record_id – Optional unique identifier (auto-generated if not provided).
    • record_text – Text content for classification.
  • PROMPT – Template for guiding the model’s classification behavior. A sample prompt template is available in the GitHub repo. Pay attention to the structure of the template that guides the AI model through its decision-making process. The template not only combines a set of possible categories, but also contains instructions, requiring the model to select a single category and present it within <class> tags. These instructions help maintain consistency in how the model processes incoming requests and saves the output.

BatchResultsProcessingStack functions as the data postprocessing stage, transforming the Amazon Bedrock JSONL output into user-friendly formats. At the time of writing, the system supports CSV, JSON, and XLSX. These processed files are then stored in a designated output folder in the S3 bucket, organized by date for quick retrieval and management. The conversion scripts are available in the GitHub repo. The output files have the following schema:

  • ID – Unique identifier of the classification record (taken from the upstream service or auto-generated)
  • INPUT_TEXT – Initial text that was used for classification
  • CLASS – The classification category
  • RATIONALE – Reasoning or explanation of given classification

Excel File Sample

AnalyticsStack provides a business intelligence (BI) dashboard that displays a list of classifications and allows filtering based on the categories defined in the prompt. It offers the following key configuration options:

  • ATHENA_DATABASE_NAME – Defines the name of Athena database that is used as a main data source for the QuickSight dashboard.
  • QUICKSIGHT_DATA_SCHEMA – Defines how labels should be displayed on the dashboard and specifies which columns are filterable.
  • QUICKSIGHT_PRINCIPAL_NAME – Designates the principal group that will have access to the QuickSight dashboard. The group should be created manually before deploying the stack.
  • QUICKSIGHT_QUERY_MODE – You can choose between SPICE or direct query for fetching data, depending on your use case, data volume, and data freshness requirements. The default setting is direct query.

Now that you’ve successfully deployed the system, you can prepare your data file—this can be either real customer data or the synthetic dataset we provided for testing. When your file is ready, go to the S3 bucket named {prefix}-{account_id}-customer-requests-bucket-{region} and upload your file to the input_data folder. After the batch inference job is complete, you can view the classification results on the dashboard. You can find it under the name {prefix}-{account_id}-classifications-dashboard-{region}. The following screenshot shows a preview of what you can expect.

BI Dashboard

The dashboard will not display data until Amazon Bedrock finishes processing the batch inference jobs and the AWS Glue crawler creates the Athena table. Without these steps completed, the dashboard can’t connect to the table because it doesn’t exist yet. Additionally, you must update the QuickSight role permissions that were set up during pre-deployment. To update permissions, complete the following steps:

  1. On the QuickSight console, choose the user icon in the top navigation bar and choose Manage QuickSight.
  2. In the navigation pane, choose Security & Permissions.
  3. Verify that the role has been granted proper access to the S3 bucket with the following path format: {prefix}-{account_id}-internal-classifications-{region}.
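If you also want to inspect the classification results with SQL outside of the dashboard, you can query the Athena table created by the AWS Glue crawler. The following is a minimal sketch; the database name, table name, and query results location are placeholders for the resources created in your account.

import boto3

athena = boto3.client("athena")

# Placeholders: substitute the database and table created by the crawler and an S3 location
# where Athena can write query results.
query = 'SELECT "class", COUNT(*) AS total FROM classifications GROUP BY "class" ORDER BY total DESC'

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "genai_classifications_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/queries/"},
)
print("Query execution ID:", execution["QueryExecutionId"])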

Results

To test the solution’s performance and reliability, we processed 1,190 synthetically generated travel agency conversations from a single Excel file across multiple runs. The results were remarkably consistent across 10 consecutive runs, with processing times of 11–12 minutes per batch (200 classifications in a single batch). Our solution achieved the following:

  • Speed – Maintained consistent processing times around 11–12 minutes
  • Accuracy – Achieved 100% classification accuracy on our synthetic dataset
  • Cost-effectiveness – Optimized expenses through efficient batch processing

Challenges

For certain cases, the generated class didn’t exactly match the class name given in the prompt. For instance, in multiple cases, it output “Hotel/Flight Booking Inquiry” instead of “Booking Inquiry,” which was defined as the class in the prompt. This was addressed by prompt engineering and asking the model to check the final class output to match exactly with one of the provided classes.

Error handling

For troubleshooting purposes, the solution includes an Amazon DynamoDB table that tracks batch processing status, along with Amazon CloudWatch Logs. Error tracking is not automated and requires manual monitoring and validation.

Key takeaways

Although our testing focused on travel agency scenarios, the solution’s architecture is flexible and can be adapted to various classification needs across different industries and use cases.

Known limitations

The following are key limitations of the classification solution and should be considered when planning its use:

  • Minimum batch size – Amazon Bedrock batch inference requires at least 100 classifications per batch.
  • Processing time – The completion time of a batch inference job depends on various factors, such as job size. Although Amazon Bedrock strives to complete a typical job within 24 hours, this time frame is a best-effort estimate and not guaranteed.
  • Input file formats – The solution currently supports only CSV, JSON, and XLSX file formats for input data.

Clean up

To avoid additional charges, clean up your AWS resources when they’re no longer needed by running the command cdk destroy --all --profile {your_profile_name}, replacing {your_profile_name} with your AWS profile name.

To remove resources associated with this project, complete the following steps:

  1. Delete the S3 buckets:
    1. On the Amazon S3 console, choose Buckets in the navigation pane.
    2. Locate your buckets by searching for your {prefix}.
    3. Delete these buckets to facilitate proper cleanup.
  2. Clean up the DynamoDB resources:
    1. On the DynamoDB console, choose Tables in the navigation pane.
    2. Delete the table {prefix}-{account_id}-batch-processing-status-{region}.

This comprehensive cleanup helps make sure residual resources don’t remain in your AWS account from this project.

Conclusion

In this post, we explored how Amazon Bedrock batch inference can transform your large-scale text classification workflows. You can now automate time-consuming tasks your teams handle daily, such as analyzing lost sales opportunities, categorizing travel requests, and processing insurance claims. This solution frees your teams to focus on growing and improving your business.

Furthermore, this solution gives you the opportunity to create a system that provides real-time classifications, integrates seamlessly with your communication channels, offers enhanced monitoring capabilities, and supports multiple languages for global operations.

This solution was developed for internal use in test and non-production environments only. It is the responsibility of the customer to perform their due diligence to verify the solution aligns with their compliance obligations.

We’re excited to see how you will adapt this solution to your unique challenges. Share your experience or questions in the comments—we’re here to help you get started on your automation journey.


About the authors

Nika Mishurina is a Senior Solutions Architect with Amazon Web Services. She is passionate about delighting customers through building end-to-end production-ready solutions for Amazon. Outside of work, she loves traveling, working out, and exploring new things.

Farshad Harirchi is a Principal Data Scientist at AWS Professional Services. He helps customers across industries, from retail to industrial and financial services, with the design and development of generative AI and machine learning solutions. Farshad brings extensive experience in the entire machine learning and MLOps stack. Outside of work, he enjoys traveling, playing outdoor sports, and exploring board games.

Read More