Transform text-based prompts into high-resolution eight-second videos in Gemini Advanced and use Whisk Animate to turn images into eight-second animated clips.
TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization
Direct Preference Optimization (DPO) has been widely adopted for preference alignment of Large Language Models (LLMs) due to its simplicity and effectiveness. However, DPO is derived as a bandit problem in which the whole response is treated as a single arm, ignoring the importance differences between tokens, which may affect optimization efficiency and make it difficult to achieve optimal results. In this work, we propose that the optimal data for DPO has equal expected rewards for each token in winning and losing responses, as there is no difference in token importance. However, since the… (Apple Machine Learning Research)
CoMotion: Concurrent Multi-Person 3D Motion
We introduce an approach for detecting and tracking detailed 3D poses of multiple people from a single monocular camera stream. Our system maintains temporally coherent predictions in crowded scenes filled with difficult poses and occlusions. Our model performs both strong per-frame detection and a learned pose update to track people from frame to frame. Rather than match detections across time, poses are updated directly from a new input image, which enables online tracking through occlusion. We train on numerous image and video datasets leveraging pseudo-labeled annotations to produce a… (Apple Machine Learning Research)
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
Diffusion transformers have been widely adopted for text-to-image synthesis. While scaling these models up to billions of parameters shows promise, the effectiveness of scaling beyond current sizes remains underexplored and challenging. By explicitly exploiting the computational heterogeneity of image generations, we develop a new family of Mixture-of-Experts (MoE) models (EC-DIT) for diffusion transformers with expert-choice routing. EC-DIT learns to adaptively optimize the compute allocated to understand the input texts and generate the respective image patches, enabling heterogeneous… (Apple Machine Learning Research)
Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project

The Semantic Telemetry Project aims to better understand complex, turn-based human-AI interactions in Microsoft Copilot using a new data science approach.
This understanding is crucial for recognizing how individuals utilize AI systems to address real-world tasks. It provides actionable insights, enhances key use cases, and identifies opportunities for system improvement.
In a recent blog post, we shared our approach for classifying chat log data using large language models (LLMs), which allows us to analyze these interactions at scale and in near real time. We also introduced two of our LLM-generated classifiers: Topics and Task Complexity.
This blog post will examine how our suite of LLM-generated classifiers can serve as early indicators for user engagement and highlight how usage and satisfaction vary based on AI and user expertise.
The key findings from our research are:
- When users engage in more professional, technical, and complex tasks, they are more likely to continue utilizing the tool and increase their level of interaction with it.
- Novice users currently engage in simpler tasks, but their work is gradually becoming more complex over time.
- More expert users are satisfied with AI responses only when the AI's expertise on the topic is on par with their own, while novice users report low satisfaction rates regardless of AI expertise.
Read on for more information on these findings. Note that all analyses were conducted on anonymous Copilot in Bing interactions containing no personal information.
Classifiers mentioned in article:
Knowledge work classifier: Tasks that involve creating artifacts related to information work typically requiring creative and analytical thinking. Examples include strategic business planning, software design, and scientific research.
Task complexity classifier: Assesses the cognitive complexity of a task if a user performs it without the use of AI. We group tasks into two categories: low complexity and high complexity.
Topics classifier: A single label for the primary topic of the conversation.
User expertise: Labels the user’s expertise on the primary topic within the conversation as one of the following categories: Novice (no familiarity with the topic), Beginner (little prior knowledge or experience), Intermediate (some basic knowledge or familiarity with the topic), Proficient (can apply relevant concepts from conversation), and Expert (deep and comprehensive understanding of the topic).
AI expertise: Labels the AI agent expertise based on the same criteria as user expertise above.
User satisfaction: A 20-question satisfaction/dissatisfaction rubric that the LLM evaluates to create an aggregate score for overall user satisfaction.
What keeps Bing Chat users engaged?
We conducted a study of a random sample of 45,000 anonymous Bing Chat users during May 2024. The data was grouped into three cohorts based on user activity over the course of the month:
- Light (1 active chat session per week)
- Medium (2-3 active chat sessions per week)
- Heavy (4+ active chat sessions per week)
The key finding is that heavy users are doing more professional, complex work.
We used our knowledge work classifier to label chat log data relating to knowledge work tasks. We found that the share of knowledge work tasks was high across all cohorts, with the highest percentage among heavy users.

Analyzing task complexity, we observed that users with higher engagement performed more high-complexity tasks, while users with lower engagement performed more low-complexity tasks.

Looking at the overall data, we can filter on heavy users and see higher numbers of chats where the user was performing knowledge work tasks. Based on task complexity, we see that most knowledge work tasks seek to apply a solution to an existing problem, primarily within programming and scripting. This is in line with our top overall topic, technology, which we discussed in the previous post.

In contrast, light users tended to do more low complexity tasks (“Remember”), using Bing Chat like a traditional search engine and engaging more in topics like business and finance and computers and electronics.

Novice queries are becoming more complex
We looked at Bing Chat data from January through August 2024 and we classified chats using our User Expertise classifier. When we looked at how the different user expertise groups were using the tool for professional tasks, we discovered that proficient and expert users tend to do more professional tasks with high complexity in topics like programming and scripting, professional writing and editing, and physics and chemistry.



In contrast, novice users engaged more in professional tasks relating to business and finance and education and learning, mainly using the tool to recall information.

However, novices are targeting increasingly more complex tasks over time. Over the eight-month period, we see the percentage of high complexity tasks rise from about 36% to 67%, revealing that novices are learning and adapting quickly (see Figure 9).

How does user satisfaction vary according to expertise?
We classified both the user expertise and AI agent expertise for anonymous interactions in Copilot in Bing. We compared the level of user and AI agent expertise with our user satisfaction classifier.
The key takeaways are:
- Experts and proficient users are only satisfied with AI agents with similar expertise (expert/proficient).
- Novices are least satisfied, regardless of the expertise of the AI agent.

Conclusion
Understanding these metrics is vital for grasping user behavior over time and relating it to real-world business indicators. Users are finding value in complex professional knowledge work tasks, and novices are quickly adapting to the tool and finding these high-value use cases. By analyzing user satisfaction in conjunction with expertise levels, we can tailor our tools to better meet the needs of different user groups. Ultimately, these insights can help improve user understanding across a variety of tasks.
In our next post, we will examine the engineering processes involved in LLM-generated classification.
The post Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project appeared first on Microsoft Research.
DolphinGemma: How Google AI is helping decode dolphin communication
DolphinGemma, a large language model developed by Google, is helping scientists study how dolphins communicate — and hopefully find out what they’re saying, too.
Build multi-agent systems with LangGraph and Amazon Bedrock
Large language models (LLMs) have raised the bar for human-computer interaction where the expectation from users is that they can communicate with their applications through natural language. Beyond simple language understanding, real-world applications require managing complex workflows, connecting to external data, and coordinating multiple AI capabilities. Imagine scheduling a doctor’s appointment where an AI agent checks your calendar, accesses your provider’s system, verifies insurance, and confirms everything in one go—no more app-switching or hold times. In these real-world scenarios, agents can be a game changer, delivering more customized generative AI applications.
LLM agents serve as decision-making systems for application control flow. However, these systems face several operational challenges during scaling and development. The primary issues include tool selection inefficiency, where agents with access to numerous tools struggle with optimal tool selection and sequencing, context management limitations that prevent single agents from effectively managing increasingly complex contextual information, and specialization requirements as complex applications demand diverse expertise areas such as planning, research, and analysis. The solution lies in implementing a multi-agent architecture, which involves decomposing the main system into smaller, specialized agents that operate independently. Implementation options range from basic prompt-LLM combinations to sophisticated ReAct (Reasoning and Acting) agents, allowing for more efficient task distribution and specialized handling of different application components. This modular approach enhances system manageability and allows for better scaling of LLM-based applications while maintaining functional efficiency through specialized components.
This post demonstrates how to integrate open-source multi-agent framework, LangGraph, with Amazon Bedrock. It explains how to use LangGraph and Amazon Bedrock to build powerful, interactive multi-agent applications that use graph-based orchestration.
AWS has introduced a multi-agent collaboration capability for Amazon Bedrock Agents, enabling developers to build, deploy, and manage multiple AI agents working together on complex tasks. This feature allows for the creation of specialized agents that handle different aspects of a process, coordinated by a supervisor agent that breaks down requests, delegates tasks, and consolidates outputs. This approach improves task success rates, accuracy, and productivity, especially for complex, multi-step tasks.
Challenges with multi-agent systems
In a single-agent system, planning involves the LLM agent breaking down tasks into a sequence of small tasks, whereas a multi-agent system must have workflow management involving task distribution across multiple agents. Unlike single-agent environments, multi-agent systems require a coordination mechanism where each agent must maintain alignment with others while contributing to the overall objective. This introduces unique challenges in managing inter-agent dependencies, resource allocation, and synchronization, necessitating robust frameworks that maintain system-wide consistency while optimizing performance.
Memory management in AI systems differs between single-agent and multi-agent architectures. Single-agent systems use a three-tier structure: short-term conversational memory, long-term historical storage, and external data sources like Retrieval Augmented Generation (RAG). Multi-agent systems require more advanced frameworks to manage contextual data, track interactions, and synchronize historical records across agents. These systems must handle real-time interactions, context synchronization, and efficient data retrieval, necessitating careful design of memory hierarchies, access patterns, and inter-agent sharing.
Agent frameworks are essential for multi-agent systems because they provide the infrastructure for coordinating autonomous agents, managing communication and resources, and orchestrating workflows. Agent frameworks alleviate the need to build these complex components from scratch.
LangGraph, part of LangChain, orchestrates agentic workflows through a graph-based architecture that handles complex processes and maintains context across agent interactions. It uses supervisory control patterns and memory systems for coordination.
LangGraph Studio enhances development with graph visualization, execution monitoring, and runtime debugging capabilities. The integration of LangGraph with Amazon Bedrock empowers you to take advantage of the strengths of multiple agents seamlessly, fostering a collaborative environment that enhances the efficiency and effectiveness of LLM-based systems.
Understanding LangGraph and LangGraph Studio
LangGraph implements state machines and directed graphs for multi-agent orchestration. The framework provides fine-grained control over both the flow and state of your agent applications. LangGraph models agent workflows as graphs. You define the behavior of your agents using three key components, illustrated in the sketch that follows this list:
- State – A shared data structure that represents the current snapshot of your application.
- Nodes – Python functions that encode the logic of your agents.
- Edges – Python functions that determine which Node to execute next based on the current state. They can be conditional branches or fixed transitions.
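To make these three components concrete, here is a minimal sketch of a LangGraph graph; the state fields, node logic, and routing function are illustrative assumptions rather than code from this post's repository.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

# State: a shared data structure holding the current snapshot of the application
class AgentState(TypedDict, total=False):
    question: str
    answer: str

# Node: a Python function that encodes agent logic and returns state updates
def answer_node(state: AgentState) -> dict:
    # A real node would call an LLM or a tool; this stub returns a fixed value
    return {"answer": f"Echo: {state['question']}"}

# Edge: a Python function that decides which node runs next from the state
def route(state: AgentState) -> str:
    return END if state.get("answer") else "answer"

builder = StateGraph(AgentState)
builder.add_node("answer", answer_node)
builder.add_edge(START, "answer")               # fixed transition
builder.add_conditional_edges("answer", route)  # conditional branch
graph = builder.compile()

print(graph.invoke({"question": "Where should I travel in March?"}))
```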
LangGraph implements a central persistence layer, enabling features that are common to most agent architectures (a short sketch follows this list), including:
- Memory – LangGraph persists arbitrary aspects of your application’s state, supporting memory of conversations and other updates within and across user interactions.
- Human-in-the-loop – Because state is checkpointed, execution can be interrupted and resumed, allowing for decisions, validation, and corrections at key stages through human input.
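Continuing the earlier sketch, enabling this persistence layer is a matter of compiling the graph with a checkpointer; the in-memory saver and thread ID below are illustrative choices, not the ones used in the solution.

```python
from langgraph.checkpoint.memory import MemorySaver

# Checkpointing persists graph state, which enables conversational memory and
# lets execution be interrupted and resumed for human-in-the-loop review.
checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)

# Invocations that share a thread_id share the same persisted state.
config = {"configurable": {"thread_id": "travel-session-1"}}
graph.invoke({"question": "Where should I travel in March?"}, config=config)
```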
LangGraph Studio is an integrated development environment (IDE) specifically designed for AI agent development. It provides developers with powerful tools for visualization, real-time interaction, and debugging capabilities. The key features of LangGraph Studio are:
- Visual agent graphs – The IDE’s visualization tools allow developers to represent agent flows as intuitive graphs, making it straightforward to understand and modify complex system architectures.
- Real-time debugging – The ability to interact with agents in real time and modify responses mid-execution creates a more dynamic development experience.
- Stateful architecture – Support for stateful and adaptive agents within a graph-based architecture enables more sophisticated behaviors and interactions.
The following screenshot shows the nodes, edges, and state of a typical LangGraph agent workflow as viewed in LangGraph Studio.
Figure 1: LangGraph Studio UI
In the preceding example, the state begins with __start__ and ends with __end__. The nodes for invoking the model and tools are defined by you, and the edges tell you which paths can be followed by the workflow.
LangGraph Studio is available as a desktop application for macOS users. Alternatively, you can run a local in-memory development server that can be used to connect a local LangGraph application with a web version of the studio.
Solution overview
This example demonstrates the supervisor agentic pattern, where a supervisor agent coordinates multiple specialized agents. Each agent maintains its own scratchpad while the supervisor orchestrates communication and delegates tasks based on agent capabilities. This distributed approach improves efficiency by allowing agents to focus on specific tasks while enabling parallel processing and system scalability.
Let’s walk through an example with the following user query: “Suggest a travel destination and search flight and hotel for me. I want to travel on 15-March-2025 for 5 days.” The workflow consists of the following steps:
- The Supervisor Agent receives the initial query and breaks it down into sequential tasks:
- Destination recommendation required.
- Flight search needed for March 15, 2025.
- Hotel booking required for 5 days.
- The Destination Agent begins its work by accessing the user’s stored profile. It searches its historical database, analyzing patterns from similar user profiles to recommend the destination. Then it passes the destination back to the Supervisor Agent.
- The Supervisor Agent forwards the chosen destination to the Flight Agent, which searches available flights for the given date.
- The Supervisor Agent activates the Hotel Agent, which searches for hotels in the destination city.
- The Supervisor Agent compiles the recommendations into a comprehensive travel plan, presenting the user with a complete itinerary including destination rationale, flight options, and hotel suggestions.
The following figure shows a multi-agent workflow of how these agents connect to each other and which tools are involved with each agent.
Figure 2: Multi-agent workflow
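For orientation, a highly simplified version of this supervisor pattern might look like the following; the state fields, routing rules, and stub agents are illustrative assumptions rather than the repository's implementation.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class TripState(TypedDict, total=False):
    request: str
    destination: str
    flight: str
    hotel: str

# Stub specialized agents; the real agents call LLMs and tools
def destination_agent(state: TripState) -> dict:
    return {"destination": "Lisbon"}

def flight_agent(state: TripState) -> dict:
    return {"flight": f"Flight to {state['destination']} on 15-March-2025"}

def hotel_agent(state: TripState) -> dict:
    return {"hotel": f"Hotel in {state['destination']} for 5 nights"}

# The supervisor delegates the next task based on what is still missing
def supervisor(state: TripState) -> str:
    if not state.get("destination"):
        return "destination_agent"
    if not state.get("flight"):
        return "flight_agent"
    if not state.get("hotel"):
        return "hotel_agent"
    return END

builder = StateGraph(TripState)
builder.add_node("destination_agent", destination_agent)
builder.add_node("flight_agent", flight_agent)
builder.add_node("hotel_agent", hotel_agent)

# Control always returns to the supervisor, which picks the next agent or ends
builder.add_conditional_edges(START, supervisor)
builder.add_conditional_edges("destination_agent", supervisor)
builder.add_conditional_edges("flight_agent", supervisor)
builder.add_conditional_edges("hotel_agent", supervisor)

graph = builder.compile()
print(graph.invoke({"request": "Suggest a destination, flight, and hotel"}))
```

In the actual solution, the Supervisor Agent is itself a graph that decides delegation with an LLM rather than the hardcoded rules shown here.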
Prerequisites
You will need the following prerequisites before you can proceed with this solution. For this post, we use the us-west-2 AWS Region. For details on available Regions, see Amazon Bedrock endpoints and quotas.
- A valid AWS account.
- An AWS Identity and Access Management (IAM) role in the account that has sufficient permissions to create the necessary resources.
- Access to Anthropic’s Claude 3 Sonnet and Claude 3.5 Sonnet in Amazon Bedrock. For instructions, see Access Amazon Bedrock foundation models.
- A LangGraph application up and running locally. For instructions, see Quickstart: Launch Local LangGraph Server.
Core components
Each agent is structured with two primary components:
- graph.py – This script defines the agent’s workflow and decision-making logic. It implements the LangGraph state machine for managing agent behavior and configures the communication flow between different components. For example:
- The Flight Agent’s graph manages the flow between chat and tool operations.
- The Hotel Agent’s graph handles conditional routing between search, booking, and modification operations.
- The Supervisor Agent’s graph orchestrates the overall multi-agent workflow.
- tools.py – This script contains the concrete implementations of agent capabilities. It implements the business logic for each operation and handles data access and manipulation. It provides specific functionalities like:
- Flight tools: search_flights, book_flights, change_flight_booking, cancel_flight_booking
- Hotel tools: suggest_hotels, book_hotels, change_hotel_booking, cancel_hotel_booking
This separation between graph (workflow) and tools (implementation) allows for a clean architecture where the decision-making process is separate from the actual execution of tasks. The agents communicate through a state-based graph system implemented using LangGraph, where the Supervisor Agent directs the flow of information and tasks between the specialized agents.
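To illustrate the tools side, a tools.py entry might look like the following; the function bodies are stand-ins, because the real tools in the repository query actual data sources.

```python
from langchain_core.tools import tool

@tool
def search_flights(destination: str, departure_date: str) -> str:
    """Search available flights for a destination and departure date."""
    # Placeholder logic; a real implementation would query a flight inventory
    return f"Found 3 flights to {destination} departing on {departure_date}"

@tool
def book_flights(flight_id: str) -> str:
    """Book the flight identified by flight_id and return a confirmation."""
    return f"Flight {flight_id} booked successfully"
```

Registering the functions as LangChain tools lets each agent's graph bind them to the LLM and invoke them by name.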
To set up Amazon Bedrock with LangGraph, refer to the following GitHub repo. The high-level steps are as follows:
- Install the required packages (see the sketch after this list). These packages are essential for the Amazon Bedrock integration:
- boto3: AWS SDK for Python, which handles AWS service communication
- langchain-aws: Provides LangChain integrations for AWS services
- Import the modules.
- Create an LLM object.
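A minimal version of these steps might look like the following; the model ID and Region are assumptions based on the prerequisites above, not values prescribed by the repository.

```python
# pip install boto3 langchain-aws langgraph

import boto3
from langchain_aws import ChatBedrock

# Bedrock runtime client in the Region used for this post
bedrock_client = boto3.client("bedrock-runtime", region_name="us-west-2")

# LLM object backed by an Anthropic Claude model on Amazon Bedrock
llm = ChatBedrock(
    client=bedrock_client,
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
    model_kwargs={"temperature": 0},
)
```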
LangGraph Studio configuration
This project uses a langgraph.json configuration file to define the application structure and dependencies. This file is essential for LangGraph Studio to understand how to run and visualize your agent graphs.
LangGraph Studio uses this file to build and visualize the agent workflows, allowing you to monitor and debug the multi-agent interactions in real time.
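A representative langgraph.json might look like the following; the graph names and file paths are illustrative and should match your own project layout.

```json
{
  "dependencies": ["."],
  "graphs": {
    "supervisor_agent": "./supervisor_agent/graph.py:graph",
    "flight_agent": "./flight_agent/graph.py:graph",
    "hotel_agent": "./hotel_agent/graph.py:graph"
  },
  "env": ".env"
}
```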
Testing and debugging
You’re now ready to test the multi-agent travel assistant. You can start the graph using the langgraph dev command. It will start the LangGraph API server in development mode with hot reloading and debugging capabilities. As shown in the following screenshot, the interface provides a straightforward way to select which graph you want to test through the dropdown menu at the top left. The Manage Configuration button at the bottom lets you set up specific testing parameters before you begin. This development environment provides everything you need to thoroughly test and debug your multi-agent system with real-time feedback and monitoring capabilities.
Figure 3: LangGraph studio with Destination Agent recommendation
LangGraph Studio offers flexible configuration management through its intuitive interface. As shown in the following screenshot, you can create and manage multiple configuration versions (v1, v2, v3) for your graph execution. For example, in this scenario, we want to use user_id to fetch historical user information. This versioning system makes it simple to track and switch between different test configurations while debugging your multi-agent system.
Figure 4: Runnable configuration details
In the preceding example, we set up the user_id that tools can use to retrieve history or other details.
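As a rough sketch of how a node or tool can pick up that value, LangGraph passes the runnable configuration to node functions that declare a second config argument; the state shape and history lookup below are purely illustrative.

```python
from typing import TypedDict
from langchain_core.runnables import RunnableConfig

class TripState(TypedDict, total=False):
    destination: str

# Hypothetical travel-history store keyed by user_id
TRAVEL_HISTORY = {"user-123": ["Lisbon", "Kyoto"]}

def destination_agent(state: TripState, config: RunnableConfig) -> dict:
    # Values set in the runnable configuration arrive under "configurable"
    user_id = config["configurable"]["user_id"]
    past_trips = TRAVEL_HISTORY.get(user_id, [])
    # Recommend somewhere the user has not visited yet
    suggestion = "Marrakesh" if "Lisbon" in past_trips else "Lisbon"
    return {"destination": suggestion}
```

Outside Studio, the same value can be supplied when invoking the graph, for example with config={"configurable": {"user_id": "user-123"}}.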
Let’s test the Planner Agent. This agent has the compare_and_recommend_destination tool, which can check past travel data and recommend travel destinations based on the user profile. We use user_id in the configuration so that it can be used by the tool.
LangGraph has a concept of checkpoint memory that is managed using a thread. The following screenshot shows that you can quickly manage threads in LangGraph Studio.
Figure 5: View graph state in the thread
In this example, destination_agent is using a tool; you can also check the tool’s output. Similarly, you can test flight_agent and hotel_agent to verify each agent.
When all the agents are working well, you’re ready to test the full workflow. You can evaluate the state and verify the input and output of each agent.
The following screenshot shows the full view of the Supervisor Agent with its sub-agents.
Figure 6: Supervisor Agent with complete workflow
Considerations
Multi-agent architectures must account for agent coordination, state management, communication, output consolidation, and guardrails, as well as maintaining processing context, error handling, and orchestration. Graph-based architectures offer significant advantages over linear pipelines, enabling complex workflows with nonlinear communication patterns and clearer system visualization. These structures allow for dynamic pathways and adaptive communication, ideal for large-scale deployments with simultaneous agent interactions. They excel in parallel processing and resource allocation but require sophisticated setup and might demand higher computational resources. Implementing these systems necessitates careful planning of system topology, robust monitoring, and well-designed fallback mechanisms for failed interactions.
When implementing multi-agent architectures in your organization, it’s crucial to align with your company’s established generative AI operations and governance frameworks. Prior to deployment, verify alignment with your organization’s AI safety protocols, data handling policies, and model deployment guidelines. Although this architectural pattern offers significant benefits, its implementation should be tailored to fit within your organization’s specific AI governance structure and risk management frameworks.
Clean up
Delete any IAM roles and policies created specifically for this post. Delete the local copy of this post’s code. If you no longer need access to an Amazon Bedrock FM, you can remove access from it. For instructions, see Add or remove access to Amazon Bedrock foundation models.
Conclusion
The integration of LangGraph with Amazon Bedrock significantly advances multi-agent system development by providing a robust framework for sophisticated AI applications. This combination uses LangGraph’s orchestration capabilities and FMs in Amazon Bedrock to create scalable, efficient systems. It addresses challenges in multi-agent architectures through state management, agent coordination, and workflow orchestration, offering features like memory management, error handling, and human-in-the-loop capabilities. LangGraph Studio’s visualization and debugging tools enable efficient design and maintenance of complex agent interactions. This integration offers a powerful foundation for next-generation multi-agent systems, providing effective workflow handling, context maintenance, reliable results, and optimal resource utilization.
For the example code and demonstration discussed in this post, refer to the accompanying GitHub repository. You can also refer to the following GitHub repo for Amazon Bedrock multi-agent collaboration code samples.
About the Authors
Jagdeep Singh Soni is a Senior Partner Solutions Architect at AWS based in the Netherlands. He uses his passion for generative AI to help customers and partners build generative AI applications using AWS services. Jagdeep has 15 years of experience in innovation, experience engineering, digital transformation, cloud architecture, and ML applications.
Ajeet Tewari is a Senior Solutions Architect for Amazon Web Services. He works with enterprise customers to help them navigate their journey to AWS. His specialties include architecting and implementing scalable OLTP systems and leading strategic AWS initiatives.
Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS. He currently focuses on serving of models and MLOps on Amazon SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.
Dynamic text-to-SQL for enterprise workloads with Amazon Bedrock Agents
Generative AI enables us to accomplish more in less time. Text-to-SQL empowers people to explore data and draw insights using natural language, without requiring specialized database knowledge. Amazon Web Services (AWS) has helped many customers connect this text-to-SQL capability with their own data, which means more employees can generate insights. In this process, we discovered that a different approach is needed in enterprise environments where there are over 100 tables, each with dozens of columns. We also learned that robust error handling is critical when errors occur in the generated SQL query based on users’ questions.
This post demonstrates how enterprises can implement a scalable agentic text-to-SQL solution using Amazon Bedrock Agents, with advanced error-handling tools and automated schema discovery to enhance database query efficiency. Our agent-based solution offers two key strengths:
- Automated scalable schema discovery – The schema and table metadata can be dynamically updated to generate SQL when the initial attempt to execute the query fails. This is important for enterprise customers who have a lot of tables and columns and many query patterns.
- Automated error handling – The error message is directly fed back to the agent to improve the success rate of running queries.
You’ll find that these features help you tackle enterprise-scale database challenges while making your text-to-SQL experience more robust and efficient.
Use case
An agentic text-to-SQL solution can benefit enterprises with complex data structures. In this post, to understand the mechanics and benefits of the agentic text-to-SQL solution in a complex enterprise environment, imagine you’re a business analyst on the risk management team in a bank. You need to answer questions such as “Find all transactions that occurred in the United States and were flagged as fraudulent, along with the device information used for those transactions,” or “Retrieve all transactions for John Doe that occurred between January 1, 2023, and December 31, 2023, including fraud flags and merchant details.” For this, there are dozens—or sometimes hundreds—of tables that you need to not only be aware of but also use to craft complex JOIN queries. The following diagram illustrates a sample table schema that might be needed for fraud investigations.
The key pain points of implementing a text-to-SQL solution in this complex environment include the following, but aren’t limited to:
- The amount of table and schema information becomes excessive, which entails manual updates to the prompts and limits the solution’s scale.
- As a result, the solution might require additional validation, impacting the quality and performance of SQL generation.
Now, consider our solution and how it addresses these problems.
Solution overview
Amazon Bedrock Agents manages the entire process from question interpretation to query execution and result interpretation, without manual intervention. It seamlessly incorporates multiple tools, and the agent analyzes and responds to unexpected results. When queries fail, the agent autonomously analyzes error messages, modifies queries, and retries—a key benefit over static systems.
As of December 2024, the Amazon Bedrock with structured data feature provides built-in support for Amazon Redshift, offering seamless text-to-SQL capabilities without custom implementation. This is recommended as the primary solution for Amazon Redshift users.
Here are the capabilities that this solution offers:
- Executing text-to-SQL with autonomous troubleshooting:
- The agent can interpret natural language questions and convert them into SQL queries. It then executes these queries against an Amazon Athena database and returns the results.
- If a query execution fails, the agent can analyze the error messages returned by AWS Lambda and automatically retries the modified query when appropriate.
- Dynamic schema discovery
- Listing tables – The agent can provide a comprehensive list of the tables in the fraud detection database. This helps users understand the available data structures.
- Describing table schemas – Users can request detailed information about the schema of specific tables. The agent will provide column names, data types, and associated comments, giving users a clear understanding of the data structure.
The solution uses direct database tools for schema discovery instead of vector store–based retrieval or static schema definitions. This approach provides complete accuracy with lower operational overhead because it doesn’t require a synchronization mechanism and continually reflects the current database structure. Direct schema access through tools is more maintainable than hardcoded approaches that require manual updates, and it provides better performance and cost-efficiency through real-time database interaction.
The workflow is as follows:
- A user asks questions to Amazon Bedrock Agents.
- To serve the user’s questions, the agent determines the appropriate action to invoke:
- To execute the generated query with confidence, the agent will invoke the athena-query tool.
- To confirm the database schema first, the agent will invoke the athena-schema-reader tool:
- Retrieve a list of available tables using its /list_tables endpoint.
- Obtain the specific schema of a certain table using its /describe_table endpoint.
- The Lambda function sends the query to Athena to execute.
- Athena queries the data from the Amazon Simple Storage Service (Amazon S3) data bucket and stores the query results in the S3 output bucket.
- The Lambda function retrieves and processes the results. If an error occurs:
- The Lambda function captures and formats the error message for the agent to understand.
- The error message is returned to Amazon Bedrock Agents.
- The agent analyzes the error message and tries to resolve it. To retry with the modified query, the agent may repeat steps 2–5.
- The agent formats and presents the final responses to the user.
The following architecture diagram shows this workflow.
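To make the Lambda side of steps 2–5 concrete, here is a minimal, hypothetical handler for a Bedrock Agents action group defined with an API schema; the /athena_query path, helper function, and error fields are stand-ins rather than the repository's code.

```python
import json

def run_query(sql: str) -> dict:
    """Hypothetical helper that would submit the SQL to Athena and collect rows."""
    return {"rows": [], "note": f"executed: {sql}"}

def lambda_handler(event, context):
    """Minimal Amazon Bedrock Agents action group handler (API schema style)."""
    api_path = event["apiPath"]
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}

    if api_path == "/athena_query":
        try:
            body = run_query(params["query"])
        except Exception as err:
            # Surface the failure as structured data the agent can act on
            body = {"error": {"type": "QUERY_EXECUTION_FAILED",
                              "message": str(err),
                              "hint": "Check table names with /list_tables and retry."}}
    else:
        body = {"error": {"type": "UNKNOWN_API_PATH", "message": api_path}}

    # Non-breaking response so the agent keeps reasoning instead of aborting
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": api_path,
            "httpMethod": event["httpMethod"],
            "httpStatusCode": 200,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }
```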
Implementation walkthrough
To implement the solution, use the instructions in the following sections.
Intelligent error handling
Our agentic text-to-SQL solution implements practical error handling that helps agents understand and recover from issues. By structuring errors with consistent elements, returning nonbreaking errors where possible, and providing contextual hints, the system enables agents to self-correct and continue their reasoning process.
Agent instructions
Consider the key prompt components that make this solution unique. Intelligent error handling helps automate troubleshooting and refine the query by letting the agent understand the types of errors and what to do when an error happens:
The prompt gives guidance on how to approach the errors. It also states that the error types and hints will be provided by Lambda. In the next section, we explain how Lambda processes the errors and passes them to the agent.
Implementation details
Here are some key examples from our error handling system:
These error types cover the main scenarios in text-to-SQL interactions:
- Query execution failures – Handles syntax errors and table reference issues, guiding the agent to use the correct table names and SQL syntax
- Result retrieval issues – Addresses permission problems and invalid column references, helping the agent verify the schema and access rights
- API validation – Verifies that basic requirements are met before query execution, minimizing unnecessary API calls
Each error type includes both an explanatory message and an actionable hint, enabling the agent to take appropriate corrective steps. This implementation shows how straightforward it can be to enable intelligent error handling; instead of handling errors traditionally within Lambda, we return structured error messages that the agent can understand and act upon.
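As a rough illustration of this pattern, a Lambda tool can return a structured, non-breaking error instead of raising an exception; the field names and hint text below are assumptions rather than the repository's exact schema.

```python
def build_agent_error(error_type: str, message: str, hint: str) -> dict:
    """Wrap a failure as data the agent can reason about instead of an exception."""
    return {"error": {"type": error_type, "message": message, "hint": hint}}

# Example: a failed table lookup becomes guidance the agent can act on
error_payload = build_agent_error(
    error_type="QUERY_EXECUTION_FAILED",
    message="Table 'transaction' does not exist in database 'fraud_db'.",
    hint="Call /list_tables to confirm the table name, then retry the query.",
)
```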
Dynamic schema discovery
The schema discovery is pivotal to keeping Amazon Bedrock Agents consuming the most recent and relevant schema information.
Agent instructions
Instead of hardcoded database schema information, we allow the agent to discover the database schema dynamically. We’ve created two API endpoints for this purpose:
Implementation details
Based on the agent instructions, the agent will invoke the appropriate API endpoint.
The /list_tables endpoint lists the tables in a specified database. This is particularly useful when you have multiple databases or frequently add new tables.
The /describe_table endpoint reads a specific table’s schema with details. We use the DESCRIBE command, which includes column comments along with other schema details. These comments help the agent better understand the meaning of the individual columns.
When implementing a dynamic schema reader, consider including comprehensive column descriptions to enhance the agent’s understanding of the data model.
These endpoints enable the agent to maintain an up-to-date understanding of the database structure, improving its ability to generate accurate queries and adapt to changes in the schema.
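A simplified sketch of the Athena calls behind these endpoints might look like the following; the database name, output location, and polling details are assumptions, and the real Lambda handlers in the repository include more error handling.

```python
import time
import boto3

athena = boto3.client("athena")
OUTPUT_S3 = "s3://my-athena-results-bucket/"  # hypothetical results bucket

def run_athena_query(query: str, database: str) -> list:
    """Run a query and return the result rows once execution succeeds."""
    execution = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": OUTPUT_S3},
    )
    query_id = execution["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} finished with state {state}")
    results = athena.get_query_results(QueryExecutionId=query_id)
    return results["ResultSet"]["Rows"]

def list_tables(database: str) -> list:
    """Back the /list_tables endpoint with a SHOW TABLES query."""
    return run_athena_query("SHOW TABLES", database)

def describe_table(database: str, table: str) -> list:
    """Back the /describe_table endpoint; DESCRIBE includes column comments."""
    return run_athena_query(f"DESCRIBE {table}", database)
```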
Demonstration
You might not experience exactly the same responses as shown in the screenshots due to the nondeterministic nature of large language models (LLMs).
The solution is available for you to deploy in your environment with sample data. Clone the repository from this GitHub link and follow the README guidance. After you deploy the two stacks—AwsText2Sql-DbStack and AwsText2Sql-AgentStack—follow these steps to put the solution in action:
- Go to Amazon Bedrock and select Agents.
- Select AwsText-to-SQL-AgentStack-DynamicAgent and test by asking questions in the Test window on the right.
- Example interactions:
- Which demographic groups or industries are most frequently targeted by fraudsters? Present aggregated data.
- What specific methods or techniques are commonly used by perpetrators in the reported fraud cases?
- What patterns or trends can we identify in the timing and location of fraud incidents?
- Show the details of customers who have made transactions with merchants located in Denver.
- Provide a list of all merchants along with the total number of transactions they’ve processed and the number of those transactions that were flagged as fraudulent.
- List the top five customers based on the highest transaction amounts they’ve made.
- Choose Show trace and examine each step to understand what tools are used and the agent’s rationale for approaching your question, as shown in the following screenshot.
- (Optional) You can test the Amazon Bedrock Agents code interpreter by enabling it in Agent settings. Follow the instructions at Enable code interpretation in Amazon Bedrock and ask the agent “Create a bar chart showing the top three cities that have the most fraud cases.”
Best practices
Building on our discussion of dynamic schema discovery and intelligent error handling, here are key practices to optimize your agentic text-to-SQL solution:
- Use dynamic schema discovery and error handling – Use endpoints such as /list_tables and /describe_table to allow the agent to dynamically adapt to your database structure. Implement comprehensive error handling as demonstrated earlier, enabling the agent to interpret and respond to various error types effectively.
- Balance static and dynamic information – Although dynamic discovery is powerful, consider including crucial, stable information in the prompt. This might include database names, key table relationships, or frequently used tables that rarely change. Striking this balance can improve performance without sacrificing flexibility.
- Tailor to your environment – We designed the sample to always invoke /list_tables and /describe_table, and your implementation might need adjustments. Consider your specific database engine’s capabilities and limitations. You might need to provide additional context beyond only column comments. Think about including database descriptions, table relationships, or common query patterns. The key is to give your agent as much relevant information as possible about your data model and business context, whether through extended metadata, custom endpoints, or detailed instructions.
- Implement robust data protection – Although our solution uses Athena, which inherently doesn’t support write operations, it’s crucial to consider data protection in your specific environment. Start with clear instructions in the prompt (for example, “read-only operations only”), and consider additional layers such as Amazon Bedrock Guardrails or an LLM-based review system to make sure that generated queries align with your security policies.
- Implement layered authorization – To enhance data privacy when using Amazon Bedrock Agents, you can use services such as Amazon Verified Permissions to validate user access before the agent processes sensitive data. Pass user identity information, such as a JWT token, to the agent and its associated Lambda function, enabling fine-grained authorization checks against pre-built policies. By enforcing access control at the application level based on the Verified Permissions decision, you can mitigate unintended data disclosure and maintain strong data isolation. To learn more, refer to Enhancing data privacy with layered authorization for Amazon Bedrock Agents in the AWS Security Blog.
- Identify the best orchestration strategy for your agent – Amazon Bedrock provides you with an option to customize your agent’s orchestration strategy. Custom orchestration gives you full control of how you want your agents to handle multistep tasks, make decisions, and execute workflows.
By implementing these practices, you can create a text-to-SQL solution that not only uses the full potential of AI agents but also maintains the security and integrity of your data systems.
Conclusion
In conclusion, the implementation of a scalable agentic text-to-SQL solution using AWS services offers significant advantages for enterprise workloads. By using automated schema discovery and robust error handling, organizations can efficiently manage complex databases with numerous tables and columns. The agent-based approach promotes dynamic query generation and refinement, leading to higher success rates in data querying. We’d like to invite you to try this solution out today! Visit GitHub to dive deeper into the details of the solution, and follow the deployment guide to test in your AWS account.
About the Authors
Jimin Kim is a Prototyping Architect on the AWS Prototyping and Cloud Engineering (PACE) team, based in Los Angeles. With specialties in Generative AI and SaaS, she loves helping her customers succeed in their business. Outside of work, she cherishes moments with her wife and three adorable calico cats.
Jiwon Yeom is a Solutions Architect at AWS, based in New York City. She focuses on Generative AI in the financial services industry and is passionate about helping customers build scalable, secure, and human-centered AI solutions. Outside of work, she enjoys writing, and exploring hidden bookstores.
DolphinGemma: How Google AI is helping decode dolphin communication
Dolphin researchers are using Gemma and Google Pixel phones to try to decipher how dolphins talk to one another.
NVIDIA to Manufacture American-Made AI Supercomputers in US for First Time
NVIDIA is working with its manufacturing partners to design and build factories that, for the first time, will produce NVIDIA AI supercomputers entirely in the U.S.
Together with leading manufacturing partners, the company has commissioned more than a million square feet of manufacturing space to build and test NVIDIA Blackwell chips in Arizona and AI supercomputers in Texas.
NVIDIA Blackwell chips have started production at TSMC’s chip plants in Phoenix, Arizona. NVIDIA is building supercomputer manufacturing plants in Texas, with Foxconn in Houston and with Wistron in Dallas. Mass production at both plants is expected to ramp up in the next 12-15 months.
The AI chip and supercomputer supply chain is complex and demands the most advanced manufacturing, packaging, assembly and test technologies. NVIDIA is partnering with Amkor and SPIL for packaging and testing operations in Arizona.
Within the next four years, NVIDIA plans to produce up to half a trillion dollars of AI infrastructure in the United States through partnerships with TSMC, Foxconn, Wistron, Amkor and SPIL. These world-leading companies are deepening their partnership with NVIDIA, growing their businesses while expanding their global footprint and hardening supply chain resilience.
NVIDIA AI supercomputers are the engines of a new type of data center created for the sole purpose of processing artificial intelligence — AI factories that are the infrastructure powering a new AI industry. Tens of “gigawatt AI factories” are expected to be built in the coming years. Manufacturing NVIDIA AI chips and supercomputers for American AI factories is expected to create hundreds of thousands of jobs and drive trillions of dollars in economic security over the coming decades.
“The engines of the world’s AI infrastructure are being built in the United States for the first time,” said Jensen Huang, founder and CEO of NVIDIA. “Adding American manufacturing helps us better meet the incredible and growing demand for AI chips and supercomputers, strengthens our supply chain and boosts our resiliency.”
The company will utilize its advanced AI, robotics and digital twin technologies to design and operate the facilities, including NVIDIA Omniverse to create digital twins of factories and NVIDIA Isaac GR00T to build robots to automate manufacturing.