Enterprises Onboard AI Teammates Faster With NVIDIA NeMo Tools to Scale Employee Productivity

An AI agent is only as accurate, relevant and timely as the data that powers it.

Now generally available, NVIDIA NeMo microservices are helping enterprise IT quickly build AI teammates that tap into data flywheels to scale employee productivity. The microservices provide an end-to-end developer platform for creating state-of-the-art agentic AI systems and continually optimizing them with data flywheels informed by inference and business data, as well as user preferences.

With a data flywheel, enterprise IT can onboard AI agents as digital teammates. These agents can tap into user interactions and data generated during AI inference to continuously improve model performance — turning usage into insight and insight into action.

Building Powerful Data Flywheels for Agentic AI

Without a constant stream of high-quality inputs — from databases, user interactions or real-world signals — an agent’s understanding can weaken, making responses less reliable and agents less productive.

Maintaining and improving the models that power AI agents in production requires three types of data: inference data to gather insights and adapt to evolving data patterns, up-to-date business data to provide intelligence, and user feedback data to indicate whether the model and application are performing as expected. NeMo microservices help developers tap into these three data types.

NeMo microservices speed AI agent development with end-to-end tools for curating, customizing, evaluating and guardrailing the models that drive their agents.

NVIDIA NeMo microservices — including NeMo Customizer, NeMo Evaluator and NeMo Guardrails — can be used alongside NeMo Retriever and NeMo Curator to ease enterprises’ experiences building, optimizing and scaling AI agents through custom enterprise data flywheels. For example:

  • NeMo Customizer accelerates large language model fine-tuning, delivering up to 1.8x higher training throughput. This high-performance, scalable microservice uses popular post-training techniques including supervised fine-tuning and low-rank adaptation.
  • NeMo Evaluator simplifies the evaluation of AI models and workflows on custom and industry benchmarks with just five application programming interface (API) calls.
  • NeMo Guardrails improves compliance protection by up to 1.4x with only half a second of additional latency, helping organizations implement robust safety and security measures that align with organizational policies and guidelines.

With NeMo microservices, developers can build data flywheels that boost AI agent accuracy and efficiency. Deployed through the NVIDIA AI Enterprise software platform, NeMo microservices are easy to operate and can run on any accelerated computing infrastructure, on premises or in the cloud, with enterprise-grade security, stability and support.

The microservices have become generally available at a time when enterprises are building large-scale multi-agent systems, where hundreds of specialized agents — with distinct goals and workflows — collaborate to tackle complex tasks as digital teammates, working alongside employees to assist, augment and accelerate work across functions.

This enterprise-wide impact positions AI agents as a trillion-dollar opportunity — with applications spanning automated fraud detection, shopping assistants, predictive machine maintenance and document review — and underscores the critical role data flywheels play in transforming business data into actionable insights.

Data flywheels built with NVIDIA NeMo microservices constantly curate data, retrain models and evaluate their performance, all with minimal human intervention and maximum autonomy.

Industry Pioneers Boost AI Agent Accuracy With NeMo Microservices

NVIDIA partners and industry pioneers are using NeMo microservices to build responsive AI agent platforms so that digital teammates can help get more done.

Working with Arize and Quantiphi, AT&T has built an advanced AI-powered agent using NVIDIA NeMo, designed to process a knowledge base of nearly 10,000 documents, refreshed weekly. The scalable, high-performance AI agent is fine-tuned for three key business priorities: speed, cost efficiency and accuracy — all increasingly critical as adoption scales.

AT&T boosted AI agent accuracy by up to 40% using NeMo Customizer and Evaluator by fine-tuning a Mistral 7B model to help deliver personalized services, prevent fraud and optimize network performance.

BlackRock is working with NeMo microservices for agentic AI capabilities in its Aladdin tech platform, which unifies the investment management process through a common data language.

Teaming with Galileo, Cisco’s Outshift team is using NVIDIA NeMo microservices to power a coding assistant that delivers 40% fewer tool selection errors and achieves up to 10x faster response times.

Nasdaq is accelerating its Nasdaq Gen AI Platform with NeMo Retriever microservices and NVIDIA NIM microservices. NeMo Retriever enhanced the platform’s search capabilities, leading to up to 30% improved accuracy and response times, in addition to cost savings.

Broad Model and Partner Ecosystem Support for NeMo Microservices

NeMo microservices support a broad range of popular open models, including Llama, the Microsoft Phi family of small language models, Google Gemma, Mistral and Llama Nemotron Ultra, currently the top open model on scientific reasoning, coding and complex math benchmarks.

Meta has tapped NVIDIA NeMo microservices through new connectors for Meta Llamastack. Users can access the same capabilities — including Customizer, Evaluator and Guardrails — via APIs, enabling them to run the full suite of agent-building workflows within their environment.

“With Llamastack integration, agent builders can implement data flywheels powered by NeMo microservices,” said Raghotham Murthy, software engineer, GenAI, at Meta. “This allows them to continuously optimize models to improve accuracy, boost efficiency and reduce total cost of ownership.”

Leading AI software providers such as Cloudera, Datadog, Dataiku, DataRobot, DataStax, SuperAnnotate, Weights & Biases and more have integrated NeMo microservices into their platforms. Developers can use NeMo microservices in popular AI frameworks including CrewAI, Haystack by deepset, LangChain, LlamaIndex and Llamastack.

Enterprises can build data flywheels with NeMo Retriever microservices using NVIDIA AI Data Platform offerings from NVIDIA-Certified Storage partners including DDN, Dell Technologies, Hewlett Packard Enterprise, Hitachi Vantara, IBM, NetApp, Nutanix, Pure Storage, VAST Data and WEKA.

Leading enterprise platforms including Amdocs, Cadence, Cohesity, SAP, ServiceNow and Synopsys are using NeMo Retriever microservices in their AI agent solutions.

Enterprises can run AI agents on NVIDIA-accelerated infrastructure, networking and software from leading system providers including Cisco, Dell, Hewlett Packard Enterprise and Lenovo.

Consulting giants including Accenture, Deloitte and EY are building AI agent platforms for enterprises using NeMo microservices.

Developers can download NeMo microservices from the NVIDIA NGC catalog. The microservices can be deployed as part of NVIDIA AI Enterprise with extended-life software branches for API stability, proactive security remediation and enterprise-grade support.

Read More

Project G-Assist Plug-In Builder Lets Anyone Customize AI on GeForce RTX AI PCs

AI is rapidly reshaping what’s possible on a PC — whether for real-time image generation or voice-controlled workflows. As AI capabilities grow, so does their complexity. Tapping into the power of AI can entail navigating a maze of system settings, software and hardware configurations.

Enabling users to explore how on-device AI can simplify and enhance the PC experience, Project G-Assist — an AI assistant that helps tune, control and optimize GeForce RTX systems — is now available as an experimental feature in the NVIDIA app. Developers can try out AI-powered voice and text commands for tasks like monitoring performance, adjusting settings and interacting with supporting peripherals. Users can even summon other AIs powered by GeForce RTX AI PCs.

And it doesn’t stop there. For those looking to expand Project G-Assist capabilities in creative ways, the AI supports custom plug-ins. With the new ChatGPT-based G-Assist Plug-In Builder, developers and enthusiasts can create and customize G-Assist’s functionality, adding new commands, connecting external tools and building AI workflows tailored to specific needs. With the plug-in builder, users can generate properly formatted code with AI, then integrate the code into G-Assist — enabling quick, AI-assisted functionality that responds to text and voice commands.

Teaching PCs New Tricks: Plug-Ins and APIs Explained

Plug-ins are lightweight add-ons that give software new capabilities. G-Assist plug-ins can control music, connect with large language models and much more.

Under the hood, these plug-ins tap into application programming interfaces (APIs), which allow different software and services to talk to each other. Developers can define functions in simple JSON formats, write logic in Python and quickly integrate new tools or features into G-Assist.
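As a rough illustration of that pattern, the sketch below pairs a JSON-style function definition with a small Python handler. The field names, file names and directory layout here are assumptions; the authoritative plug-in schema lives in NVIDIA's G-Assist GitHub samples.

    import json

    # Hypothetical function definition that G-Assist could expose to its language model
    manifest = {
        "name": "check_streamer_status",
        "description": "Check whether a given Twitch streamer is currently live",
        "parameters": {"streamer": {"type": "string"}},
    }

    with open("manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)

    def check_streamer_status(streamer: str) -> str:
        # Real plug-in logic would call the Twitch API here; this stub just echoes the input.
        return f"Status lookup for {streamer} is not implemented in this sketch."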

With the G-Assist Plug-In Builder, users can:

  • Use a responsive small language model running locally on GeForce RTX GPUs for fast, private inference.
  • Extend G-Assist’s capabilities with custom functionality tailored to specific workflows, games and tools.
  • Interact with G-Assist directly from the NVIDIA overlay, without tabbing out of an application or workflow.
  • Invoke AI-powered GPU and system controls from applications using C++ and Python bindings.
  • Integrate with agentic frameworks using tools like Langflow, letting G-Assist function as a component in larger AI pipelines and multi-agent systems.

Built for Builders: Using Free APIs to Expand AI PC Capabilities 

NVIDIA’s GitHub repository provides everything needed to get started on developing with G-Assist — including sample plug-ins, step-by-step instructions and documentation for building custom functionalities.

Developers can define functions in JSON and drop config files into a designated directory, where G-Assist can automatically load and interpret them. Users can even submit plug-ins for review and potential inclusion in the NVIDIA GitHub repository to make new capabilities available for others.

Hundreds of free, developer-friendly APIs are available today to extend G-Assist capabilities — from automating workflows to optimizing PC setups to boosting online shopping. For ideas, find searchable indices of free APIs for use across entertainment, productivity, smart home, hardware and more on publicapis.dev, free-apis.github.io, apilist.fun and APILayer.

Available sample plug-ins include Spotify, which enables hands-free music and volume control, and Google Gemini, which allows G-Assist to invoke a much larger cloud-based AI for more complex conversations, brainstorming and web searches using a free Google AI Studio API key.

In the clip below, G-Assist asks Gemini for advice on which Legend to pick in the hit game Apex Legends when solo queueing, as well as whether it’s wise to jump into Nightmare mode for level 25 in Diablo IV:

And in the following clip, a developer uses the new plug-in builder to create a Twitch plug-in for G-Assist that checks if a streamer is live. After generating the necessary JSON manifest and Python files, the developer simply drops them into the G-Assist directory to enable voice commands like, “Hey, Twitch, is [streamer] live?”

In addition, users can customize G-Assist to control select peripherals and software applications with simple commands, such as to benchmark or adjust fan speeds, or to change lighting on supported Logitech G, Corsair, MSI and Nanoleaf devices.

Other examples include a Stock Checker plug-in that lets users quickly look up real-time stock prices and performance data, or a Weather plug-in that allows users to ask G-Assist for current weather conditions in any city.

Details on how to build, share and load plug-ins are available on the NVIDIA GitHub repository.

Start Building Today

With the G-Assist Plug-In Builder and open API support, anyone can extend G-Assist to fit their exact needs. Explore the GitHub repository and submit features for review to help shape the next wave of AI-powered PC experiences.

Plug in to NVIDIA AI PC on Facebook, Instagram, TikTok and X — and stay informed by subscribing to the RTX AI PC newsletter.

Follow NVIDIA Workstation on LinkedIn and X.

See notice regarding software product information.

Read More

PyTorch 2.7 Release

We are excited to announce the release of PyTorch® 2.7 (release notes)! This release features:

  • support for the NVIDIA Blackwell GPU architecture and pre-built wheels for CUDA 12.8 across Linux x86 and arm64 architectures.
  • torch.compile support for Torch Function Modes, which enables users to override any torch.* operation to implement custom user-defined behavior.
  • Mega Cache, which allows users to have end-to-end portable caching for torch.
  • new features for FlexAttention – LLM first token processing, LLM throughput mode optimization and Flex Attention for Inference.

This release is composed of 3262 commits from 457 contributors since PyTorch 2.6. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.7. More information about how to get started with the PyTorch 2-series can be found at our Getting Started page.

Beta
  • Torch.Compile support for Torch Function Modes
  • Mega Cache

Prototype
  • NVIDIA Blackwell Architecture Support
  • PyTorch Native Context Parallel
  • Enhancing Intel GPU Acceleration
  • FlexAttention LLM first token processing on X86 CPUs
  • FlexAttention LLM throughput mode optimization on X86 CPUs
  • Foreach Map
  • Flex Attention for Inference
  • Prologue Fusion Support in Inductor

To see a full list of public feature submissions, click here.

BETA FEATURES

[Beta] Torch.Compile support for Torch Function Modes

This feature enables users to override any torch.* operation to implement custom user-defined behavior. For example, ops can be rewritten to accommodate a specific backend. This is used in FlexAttention to rewrite indexing ops.

See the tutorial for more information.
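For illustration, here is a minimal sketch of the pattern (the toy rewrite of torch.add is made up; only torch.overrides.TorchFunctionMode and torch.compile are assumed):

    import torch
    from torch.overrides import TorchFunctionMode

    class ReplaceAddWithSub(TorchFunctionMode):
        # Toy override: route torch.add calls to torch.sub while the mode is active
        def __torch_function__(self, func, types, args=(), kwargs=None):
            kwargs = kwargs or {}
            if func is torch.add:
                return torch.sub(*args, **kwargs)
            return func(*args, **kwargs)

    @torch.compile
    def f(x, y):
        return torch.add(x, y)

    with ReplaceAddWithSub():
        print(f(torch.ones(4), torch.ones(4)))  # zeros: the add was rewritten to a sub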

[Beta] Mega Cache

Mega Cache allows users to have end-to-end portable caching for torch. The intended use case is that, after compiling and executing a model, the user calls torch.compiler.save_cache_artifacts(), which returns the compiler artifacts in a portable form. Later, potentially on a different machine, the user can call torch.compiler.load_cache_artifacts() with these artifacts to pre-populate the torch.compile caches and jump-start their cache.

See the tutorial for more information.
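A minimal sketch of that workflow (the toy model is illustrative, and the exact return shape of save_cache_artifacts is an assumption based on current releases):

    import torch

    model = torch.compile(torch.nn.Linear(16, 16))
    model(torch.randn(4, 16))  # compile and run once so the caches are populated

    # Serialize the compiler artifacts into a portable form
    artifacts = torch.compiler.save_cache_artifacts()
    assert artifacts is not None
    artifact_bytes, cache_info = artifacts

    # Later, potentially on a different machine, pre-populate the torch.compile caches
    torch.compiler.load_cache_artifacts(artifact_bytes)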

PROTOTYPE FEATURES

[Prototype] NVIDIA Blackwell Architecture Support

PyTorch 2.7 introduces support for NVIDIA’s new Blackwell GPU architecture and ships pre-built wheels for CUDA 12.8. For more details on CUDA 12.8 see CUDA Toolkit Release.

  • Core components and libraries including cuDNN, NCCL, and CUTLASS have been upgraded to ensure compatibility with Blackwell platforms.
  • PyTorch 2.7 includes Triton 3.3, which adds support for the Blackwell architecture with torch.compile compatibility.
  • To utilize these new features, install PyTorch with CUDA 12.8 using: pip install torch==2.7.0 --index-url https://download.pytorch.org/whl/cu128

More context can also be found here.

[Prototype] PyTorch Native Context Parallel

The PyTorch Context Parallel API allows users to create a Python context so that every torch.nn.functional.scaled_dot_product_attention() call within it will run with context parallelism. Currently, PyTorch Context Parallel supports three attention backends: Flash attention, Efficient attention, and cuDNN attention.

As an example, this is used within TorchTitan as the Context Parallel solution for LLM training.

See tutorial here.
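A rough sketch of the API shape, launched with torchrun (the experimental module path, keyword names and shapes follow the tutorial but should be treated as assumptions):

    # torchrun --nproc-per-node=<world_size> context_parallel_example.py
    import torch
    import torch.distributed as dist
    import torch.nn.functional as F
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor.experimental import context_parallel

    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank())
    mesh = init_device_mesh("cuda", (dist.get_world_size(),))

    # (batch, heads, seq_len, head_dim); the sequence dimension (2) is sharded across ranks
    q, k, v = (torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.bfloat16) for _ in range(3))

    with context_parallel(mesh, buffers=[q, k, v], buffer_seq_dims=[2, 2, 2]):
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

    dist.destroy_process_group()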

[Prototype] Enhancing Intel GPU Acceleration

This latest release introduces enhanced performance optimizations for Intel GPU architectures. These improvements accelerate workloads across various Intel GPUs through the following key enhancements:

  • Enable torch.compile on Windows 11 for Intel GPUs, delivering the same performance advantages over eager mode as on Linux.
  • Optimize the performance of PyTorch 2 Export Post Training Quantization (PT2E) on Intel GPU to provide a full graph mode quantization pipeline with enhanced computational efficiency.
  • Improve Scaled Dot-Product Attention (SDPA) inference performance with bfloat16 and float16 to accelerate attention-based models on Intel GPUs.
  • Enable AOTInductor and torch.export on Linux to simplify deployment workflows.
  • Implement more Aten operators to enhance the continuity of operator execution on Intel GPU and increase the performance on Intel GPU in eager mode.
  • Enable profiler on both Windows and Linux to facilitate model performance analysis.
  • Expand the Intel GPUs support to Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics, and Intel® Arc™ B-Series graphics on both Windows and Linux.

For more information regarding Intel GPU support, please refer to Getting Started Guide.

See also the tutorials here and here.

[Prototype] FlexAttention LLM first token processing on X86 CPUs

FlexAttention X86 CPU support was first introduced in PyTorch 2.6, offering optimized implementations — such as PageAttention, which is critical for LLM inference — via the TorchInductor C++ backend. In PyTorch 2.7, more attention variants for first token processing of LLMs are supported. With this feature, users can have a smoother experience running FlexAttention on x86 CPUs, replacing specific scaled_dot_product_attention operators with a unified FlexAttention API, and benefiting from general support and good performance when using torch.compile.
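For illustration, a minimal sketch of the unified FlexAttention API under torch.compile (the shapes and the causal score_mod are made up; on x86 CPUs execution goes through the TorchInductor C++ backend):

    import torch
    from torch.nn.attention.flex_attention import flex_attention

    def causal(score, b, h, q_idx, kv_idx):
        # Mask out future positions by sending their scores to -inf
        return torch.where(q_idx >= kv_idx, score, -float("inf"))

    # (batch, heads, seq_len, head_dim)
    q, k, v = (torch.randn(1, 8, 128, 64) for _ in range(3))

    compiled_flex = torch.compile(flex_attention)
    out = compiled_flex(q, k, v, score_mod=causal)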

[Prototype] FlexAttention LLM throughput mode optimization

The performance of FlexAttention on x86 CPUs for LLM inference throughput scenarios has been further improved by adopting the new C++ micro-GEMM template ability. This addresses the performance bottlenecks for large batch size scenarios present in PyTorch 2.6. With this enhancement, users can transparently benefit from better performance and a smoother experience when using FlexAttention APIs and torch.compile for LLM throughput serving on x86 CPUs.

[Prototype] Foreach Map

This feature uses torch.compile to allow users to apply any pointwise or user-defined function (e.g. torch.add) to lists of tensors, akin to the existing torch._foreach_* ops. The main advantage over the existing torch._foreach_* ops is that any mix of scalars or lists of tensors can be supplied as arguments, and even user-defined Python functions can be lifted to apply to lists of tensors. torch.compile will automatically generate a horizontally fused kernel for optimal performance.

See tutorial here.

[Prototype] Flex Attention for Inference

In release 2.5.0, FlexAttention (torch.nn.attention.flex_attention) was introduced for ML researchers who’d like to customize their attention kernels without writing kernel code. This update introduces a decoding backend optimized for inference, supporting GQA and PagedAttention, along with feature updates including nested jagged tensor support, performance tuning guides and trainable biases support.

[Prototype] Prologue Fusion Support in Inductor

Prologue fusion optimizes matrix multiplication (matmul) operations by fusing operations that come before the matmul into the matmul kernel itself, improving performance by reducing global memory bandwidth.

Read More

Supercharge your LLM performance with Amazon SageMaker Large Model Inference container v15

Today, we’re excited to announce the launch of Amazon SageMaker Large Model Inference (LMI) container v15, powered by vLLM 0.8.4 with support for the vLLM V1 engine. This version now supports the latest open-source models, such as Meta’s Llama 4 models Scout and Maverick, Google’s Gemma 3, Alibaba’s Qwen, Mistral AI, DeepSeek-R1, and many more. Amazon SageMaker AI continues to evolve its generative AI inference capabilities to meet the growing demands in performance and model support for foundation models (FMs).

This release introduces significant performance improvements, expanded model compatibility with multimodality (that is, the ability to understand and analyze text-to-text, image-to-text, and text-to-image data), and provides built-in integration with vLLM to help you seamlessly deploy and serve large language models (LLMs) with the highest performance at scale.

What’s new?

LMI v15 brings several enhancements that improve throughput, latency, and usability:

  1. An async mode that directly integrates with vLLM’s AsyncLLMEngine for improved request handling. This mode creates a more efficient background loop that continuously processes incoming requests, enabling it to handle multiple concurrent requests and stream outputs with higher throughput than the previous Rolling-Batch implementation in v14.
  2. Support for the vLLM V1 engine, which delivers up to 111% higher throughput compared to the previous V0 engine for smaller models at high concurrency. This performance improvement comes from reduced CPU overhead, optimized execution paths, and more efficient resource utilization in the V1 architecture. LMI v15 supports both V1 and V0 engines, with V1 being the default. If you have a need to use V0, you can use the V0 engine by specifying VLLM_USE_V1=0. vLLM V1’s engine also comes with a core re-architecture of the serving engine with simplified scheduling, zero-overhead prefix caching, clean tensor-parallel inference, efficient input preparation, and advanced optimizations with torch.compile and Flash Attention 3. For more information, see the vLLM Blog.
  3. Expanded API schema support with three flexible options to allow seamless integration with applications built on popular API patterns:
    1. Message format compatible with the OpenAI Chat Completions API.
    2. OpenAI Completions format.
    3. Text Generation Inference (TGI) schema to support backward compatibility with older models.
  4. Multimodal support, with enhanced capabilities for vision-language models, including optimizations such as multimodal prefix caching.
  5. Built-in support for function calling and tool calling, enabling sophisticated agent-based workflows.

Enhanced model support

LMI v15 supports an expanding roster of state-of-the-art models, including the latest releases from leading model providers. The container offers ready-to-deploy compatibility for models including, but not limited to:

  • Llama 4 – Llama-4-Scout-17B-16E and Llama-4-Maverick-17B-128E-Instruct
  • Gemma 3 – Google’s lightweight and efficient models, known for their strong performance despite smaller size
  • Qwen 2.5 – Alibaba’s advanced models including QwQ 2.5 and Qwen2-VL with multimodal capabilities
  • Mistral AI models – High-performance models from Mistral AI that offer efficient scaling and specialized capabilities
  • DeepSeek-R1/V3 – State of the art reasoning models

Each model family can be deployed using the LMI v15 container by specifying the appropriate model ID, for example, meta-llama/Llama-4-Scout-17B-16E, and configuration parameters as environment variables, without requiring custom code or optimization work.

Benchmarks

Our benchmarks demonstrate the performance advantages of LMI v15’s V1 engine compared to previous versions:

# | Model | Batch size | Instance type | LMI v14 throughput, tokens/s (V0 engine) | LMI v15 throughput, tokens/s (V1 engine) | Improvement
1 | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 128 | p4d.24xlarge | 1768 | 2198 | 24%
2 | meta-llama/Llama-3.1-8B-Instruct | 64 | ml.g6e.2xlarge | 1548 | 2128 | 37%
3 | mistralai/Mistral-7B-Instruct-v0.3 | 64 | ml.g6e.2xlarge | 942 | 1988 | 111%

DeepSeek-R1 Llama 70B for various levels of concurrency

Llama 3.1 8B Instruct for various levels of concurrency

Mistral 7B for various levels of concurrency

The async engine in LMI v15 shows strength in high-concurrency scenarios, where multiple simultaneous requests benefit from the optimized request handling. These benchmarks highlight that the V1 engine in async mode delivers between 24% and 111% higher throughput compared to LMI v14 using rolling batch for the models tested in high-concurrency scenarios at batch sizes of 64 and 128. We suggest keeping in mind the following considerations for optimal performance:

  • Higher batch sizes increase concurrency but come with a natural tradeoff in terms of latency
  • Batch sizes of 4 and 8 provide the best latency for most use cases
  • Batch sizes up to 64 and 128 achieve maximum throughput with acceptable latency trade-offs

API formats

LMI v15 supports three API schemas: OpenAI Chat Completions, OpenAI Completions, and TGI.

  • Chat Completions – Message format is compatible with OpenAI Chat Completions API. Use this schema for tool calling, reasoning, and multimodal use cases. Here is a sample of the invocation with the Messages API:
    body = {
        "messages": [
            {"role": "user", "content": "Name popular places to visit in London?"}
        ],
        "temperature": 0.9,
        "max_tokens": 256,
        "stream": True,
    }

  • OpenAI Completions format – The Completions API endpoint is no longer receiving updates:
    body = {
     "prompt": "Name popular places to visit in London?",
     "temperature": 0.9,
     "max_tokens": 256,
     "stream": True,
    } 

  • TGI – Supports backward compatibility with older models:
    body = {
        "inputs": "Name popular places to visit in London?",
        "parameters": {
            "max_new_tokens": 256,
            "temperature": 0.9,
        },
        "stream": True,
    }

Getting started with LMI v15

Getting started with LMI v15 is seamless, and you can deploy with LMI v15 in only a few lines of code. The container is available through Amazon Elastic Container Registry (Amazon ECR), and deployments can be managed through SageMaker AI endpoints. To deploy models, you need to specify the Hugging Face model ID, instance type, and configuration options as environment variables.

For optimal performance, we recommend the following instances:

  • Llama 4 Scout: ml.p5.48xlarge
  • DeepSeek R1/V3: ml.p5e.48xlarge
  • Qwen 2.5 VL-32B: ml.g5.12xlarge
  • Qwen QwQ 32B: ml.g5.12xlarge
  • Mistral Large: ml.g6e.48xlarge
  • Gemma3-27B: ml.g5.12xlarge
  • Llama 3.3-70B: ml.p4d.24xlarge

To deploy with LMI v15, follow these steps:

  1. Clone the notebook to your Amazon SageMaker Studio notebook or to Visual Studio Code (VS Code). You can then run the notebook to do the initial setup and deploy the model from the Hugging Face repository to the SageMaker AI endpoint. We walk through the key blocks here.
  2. LMI v15 maintains the same configuration pattern as previous versions, using environment variables in the form OPTION_<CONFIG_NAME>. This consistent approach makes it straightforward for users familiar with earlier LMI versions to migrate to v15.
    vllm_config = {
        "HF_MODEL_ID": "meta-llama/Llama-4-Scout-17B-16E",
        "HF_TOKEN": "entertoken",
        "OPTION_MAX_MODEL_LEN": "250000",
        "OPTION_MAX_ROLLING_BATCH_SIZE": "8",
        "OPTION_MODEL_LOADING_TIMEOUT": "1500",
        "SERVING_FAIL_FAST": "true",
        "OPTION_ROLLING_BATCH": "disable",
        "OPTION_ASYNC_MODE": "true",
        "OPTION_ENTRYPOINT": "djl_python.lmi_vllm.vllm_async_service"
    }

    • HF_MODEL_ID sets the model ID from Hugging Face. You can also download the model from Amazon Simple Storage Service (Amazon S3).
    • HF_TOKEN sets the token to download the model. This is required for gated models like Llama 4.
    • OPTION_MAX_MODEL_LEN sets the maximum model context length.
    • OPTION_MAX_ROLLING_BATCH_SIZE sets the batch size for the model.
    • OPTION_MODEL_LOADING_TIMEOUT sets the timeout value for SageMaker to load the model and run health checks.
    • SERVING_FAIL_FAST=true is recommended because it allows SageMaker to gracefully restart the container when an unrecoverable engine error occurs.
    • OPTION_ROLLING_BATCH=disable turns off the rolling batch implementation of LMI, which was the default offering in LMI v14. We recommend using async mode instead, because it is the latest implementation and provides better performance.
    • OPTION_ASYNC_MODE=true enables async mode.
    • OPTION_ENTRYPOINT provides the entry point for vLLM’s async integrations.
  3. Set the latest container (in this example we used 0.33.0-lmi15.0.0-cu128), AWS Region (us-east-1), and create a model artifact with all the configurations. To review the latest available container version, see Available Deep Learning Containers Images.
  4. Deploy the model to the endpoint using model.deploy().
    import sagemaker
    from sagemaker import Model

    # Execution role for SageMaker (assumed to be set up earlier in the notebook)
    role = sagemaker.get_execution_role()

    CONTAINER_VERSION = '0.33.0-lmi15.0.0-cu128'
    REGION = 'us-east-1'

    # Construct container URI
    container_uri = f'763104351884.dkr.ecr.{REGION}.amazonaws.com/djl-inference:{CONTAINER_VERSION}'

    # Select instance type
    instance_type = "ml.p5.48xlarge"

    # Create the model object with the LMI v15 container and the vLLM configuration
    model = Model(image_uri=container_uri,
                  role=role,
                  env=vllm_config)
    endpoint_name = sagemaker.utils.name_from_base("Llama-4")

    print(endpoint_name)
    model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        endpoint_name=endpoint_name,
        container_startup_health_check_timeout=1800
    )

  5. Invoke the model. SageMaker inference provides two APIs to invoke the model: InvokeEndpoint and InvokeEndpointWithResponseStream. You can choose either option based on your needs; a sketch for reading the streamed response follows the code.
    import boto3
    import json

    # Create SageMaker Runtime client
    smr_client = boto3.client('sagemaker-runtime')

    ## Add your endpoint here
    endpoint_name = ''

    # Invoke with messages format
    body = {
        "messages": [
            {"role": "user", "content": "Name popular places to visit in London?"}
        ],
        "temperature": 0.9,
        "max_tokens": 256,
        "stream": True,
    }

    # Invoke with endpoint streaming
    resp = smr_client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        Body=json.dumps(body),
        ContentType="application/json",
    )
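The streaming call returns an event stream rather than a single payload. A minimal sketch for printing streamed chunks as they arrive is shown below (the exact payload framing depends on the API schema you chose, so treat the parsing as illustrative):

    # Read the event stream returned by InvokeEndpointWithResponseStream
    for event in resp["Body"]:
        chunk = event.get("PayloadPart", {}).get("Bytes")
        if chunk:
            print(chunk.decode("utf-8"), end="")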

To run multi-modal inference with Llama-4 Scout, see the notebook for the full code sample to run inference requests with images.

Conclusion

Amazon SageMaker LMI container v15 represents a significant step forward in large model inference capabilities. With the new vLLM V1 engine, async operating mode, expanded model support, and optimized performance, you can deploy cutting-edge LLMs with greater performance and flexibility. The container’s configurable options give you the flexibility to fine-tune deployments for your specific needs, whether optimizing for latency, throughput, or cost.

We encourage you to explore this release for deploying your generative AI models.

Check out the provided example notebooks to start deploying models with LMI v15.


About the authors

Vivek Gangasani is a Lead Specialist Solutions Architect for Inference at AWS. He helps emerging generative AI companies build innovative solutions using AWS services and accelerated compute. Currently, he is focused on developing strategies for fine-tuning and optimizing the inference performance of large language models. In his free time, Vivek enjoys hiking, watching movies, and trying different cuisines.

Siddharth Venkatesan is a Software Engineer in AWS Deep Learning. He currently focuses on building solutions for large model inference. Prior to AWS, he worked in the Amazon Grocery org building new payment features for customers worldwide. Outside of work, he enjoys skiing, the outdoors, and watching sports.

Felipe Lopez is a Senior AI/ML Specialist Solutions Architect at AWS. Prior to joining AWS, Felipe worked with GE Digital and SLB, where he focused on modeling and optimization products for industrial applications.

Banu Nagasundaram leads product, engineering, and strategic partnerships for Amazon SageMaker JumpStart, the SageMaker machine learning and generative AI hub. She is passionate about building solutions that help customers accelerate their AI journey and unlock business value.

Dmitry Soldatkin is a Senior AI/ML Solutions Architect at Amazon Web Services (AWS), helping customers design and build AI/ML solutions. Dmitry’s work covers a wide range of ML use cases, with a primary interest in Generative AI, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, utilities, and telecommunications. You can connect with Dmitry on LinkedIn.

Read More

Accuracy evaluation framework for Amazon Q Business – Part 2

In the first post of this series, we introduced a comprehensive evaluation framework for Amazon Q Business, a fully managed Retrieval Augmented Generation (RAG) solution that uses your company’s proprietary data without the complexity of managing large language models (LLMs). The first post focused on selecting appropriate use cases, preparing data, and implementing metrics to support a human-in-the-loop evaluation process.

In this post, we dive into the solution architecture necessary to implement this evaluation framework for your Amazon Q Business application. We explore two distinct evaluation solutions:

  • Comprehensive evaluation workflow – This ready-to-deploy solution uses AWS CloudFormation stacks to set up an Amazon Q Business application, complete with user access, a custom UI for review and evaluation, and the supporting evaluation infrastructure
  • Lightweight AWS Lambda based evaluation – Designed for users with an existing Amazon Q Business application, this streamlined solution employs an AWS Lambda function to efficiently assess the application’s accuracy

By the end of this post, you will have a clear understanding of how to implement an evaluation framework that aligns with your specific needs with a detailed walkthrough, so your Amazon Q Business application delivers accurate and reliable results.

Challenges in evaluating Amazon Q Business

Evaluating the performance of Amazon Q Business, which uses a RAG model, presents several challenges due to its integration of retrieval and generation components. It’s crucial to identify which aspects of the solution need evaluation. For Amazon Q Business, both the retrieval accuracy and the quality of the answer output are important factors to assess. In this section, we discuss key metrics that need to be included for a RAG generative AI solution.

Context recall

Context recall measures the extent to which all relevant content is retrieved. High recall provides comprehensive information gathering but might introduce extraneous data.

For example, a user might ask the question “What can you tell me about the geography of the United States?” They could get the following responses:

  • Expected: The United States is the third-largest country in the world by land area, covering approximately 9.8 million square kilometers. It has a diverse range of geographical features.
  • High context recall: The United States spans approximately 9.8 million square kilometers, making it the third-largest nation globally by land area. The country’s geography is incredibly diverse, featuring the Rocky Mountains stretching from New Mexico to Alaska, the Appalachian Mountains along the eastern states, the expansive Great Plains in the central region, and arid deserts like the Mojave in the southwest.
  • Low context recall: The United States features significant geographical landmarks. Additionally, the country is home to unique ecosystems like the Everglades in Florida, a vast network of wetlands.

The following diagram illustrates the context recall workflow.

Context precision

Context precision assesses the relevance and conciseness of retrieved information. High precision indicates that the retrieved information closely matches the query intent, reducing irrelevant data.

For example, “Why is Silicon Valley great for tech startups?” might give the following answers:

  • Ground truth answer: Silicon Valley is famous for fostering innovation and entrepreneurship in the technology sector.
  • High precision context: Many groundbreaking startups originate from Silicon Valley, benefiting from a culture that encourages innovation and risk-taking.
  • Low precision context: Silicon Valley experiences a Mediterranean climate, with mild, wet winters and warm, dry summers, contributing to its appeal as a place to live and work.

The following diagram illustrates the context precision workflow.

Answer relevancy

Answer relevancy evaluates whether responses fully address the query without unnecessary details. Relevant answers enhance user satisfaction and trust in the system.

For example, a user might ask the question “What are the key features of Amazon Q Business Service, and how can it benefit enterprise customers?” They could get the following answers:

  • High relevance answer: Amazon Q Business Service is a RAG Generative AI solution designed for enterprise use. Key features include a fully managed Generative AI solutions, integration with enterprise data sources, robust security protocols, and customizable virtual assistants. It benefits enterprise customers by enabling efficient information retrieval, automating customer support tasks, enhancing employee productivity through quick access to data, and providing insights through analytics on user interactions.
  • Low relevance answer: Amazon Q Business Service is part of Amazon’s suite of cloud services. Amazon also offers online shopping and streaming services.

The following diagram illustrates the answer relevancy workflow.

Truthfulness

Truthfulness verifies factual accuracy by comparing responses to verified sources. Truthfulness is crucial to maintain the system’s credibility and reliability.

For example, a user might ask “What is the capital of Canada?” They could get the following responses:

  • Context: Canada’s capital city is Ottawa, located in the province of Ontario. Ottawa is known for its historic Parliament Hill, the center of government, and the scenic Rideau Canal, a UNESCO World Heritage site
  • High truthfulness answer: The capital of Canada is Ottawa
  • Low truthfulness answer: The capital of Canada is Toronto

The following diagram illustrates the truthfulness workflow.

Evaluation methods

Deciding on who should conduct the evaluation can significantly impact results. Options include:

  • Human-in-the-Loop (HITL) – Human evaluators manually assess the accuracy and relevance of responses, offering nuanced insights that automated systems might miss. However, it is a slow process and difficult to scale.
  • LLM-aided evaluation – Automated methods, such as the Ragas framework, use language models to streamline the evaluation process. However, these might not fully capture the complexities of domain-specific knowledge. A minimal Ragas sketch follows this list.
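For illustration, here is a rough sketch of LLM-aided scoring with the Ragas library (column and metric names follow the classic Ragas API and vary across versions; Ragas calls a judge LLM behind the scenes, and faithfulness is Ragas's name for what this post calls truthfulness):

    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import answer_relevancy, context_precision, context_recall, faithfulness

    # One evaluation record: question, retrieved context, generated answer, and ground truth
    eval_data = Dataset.from_dict({
        "question": ["What is the capital of Canada?"],
        "contexts": [["Canada's capital city is Ottawa, located in the province of Ontario."]],
        "answer": ["The capital of Canada is Ottawa."],
        "ground_truth": ["The capital of Canada is Ottawa."],
    })

    # Each metric is scored between 0 and 1 by a judge LLM (credentials must be configured)
    scores = evaluate(
        eval_data,
        metrics=[context_recall, context_precision, answer_relevancy, faithfulness],
    )
    print(scores)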

Each of these preparatory and evaluative steps contributes to a structured approach to evaluating the accuracy and effectiveness of Amazon Q Business in supporting enterprise needs.

Solution overview

In this post, we explore two different solutions to provide you the details of an evaluation framework, so you can use it and adapt it for your own use case.

Solution 1: End-to-end evaluation solution

For a quick start evaluation framework, this solution uses a hybrid approach with Ragas (automated scoring) and HITL evaluation for robust accuracy and reliability. The architecture includes the following components:

  • User access and UI – Authenticated users interact with a frontend UI to upload datasets, review Ragas output, and provide human feedback
  • Evaluation solution infrastructure – Core components include:
    • Ragas scoring – Automated metrics provide an initial layer of evaluation
    • HITL review – Human evaluators refine Ragas scores through the UI, providing nuanced accuracy and reliability

By integrating a metric-based approach with human validation, this architecture makes sure Amazon Q Business delivers accurate, relevant, and trustworthy responses for enterprise users. This solution further enhances the evaluation process by incorporating HITL reviews, enabling human feedback to refine automated scores for higher precision.

A quick video demo of this solution is shown below:

Solution architecture

The solution architecture is designed with the following core functionalities to support an evaluation framework for Amazon Q Business:

  1. User access and UI – Users authenticate through Amazon Cognito, and upon successful login, interact with a Streamlit-based custom UI. This frontend allows users to upload CSV datasets to Amazon Simple Storage Service (Amazon S3), review Ragas evaluation outputs, and provide human feedback for refinement. The application exchanges the Amazon Cognito token for an AWS IAM Identity Center token, granting scoped access to Amazon Q Business.
  2. UI infrastructure – The UI is hosted behind an Application Load Balancer, supported by Amazon Elastic Compute Cloud (Amazon EC2) instances running in an Auto Scaling group for high availability and scalability.
  3. Upload dataset and trigger evaluation – Users upload a CSV file containing queries and ground truth answers to Amazon S3, which triggers an evaluation process. A Lambda function reads the CSV, stores its content in a DynamoDB table, and initiates further processing through a DynamoDB stream.
  4. Consuming DynamoDB stream – A separate Lambda function processes new entries from the DynamoDB stream and publishes messages to an SQS queue, which serves as a trigger for the evaluation Lambda function.
  5. Ragas scoring – The evaluation Lambda function consumes SQS messages, sending queries (prompts) to Amazon Q Business for generating answers. It then evaluates the prompt, ground truth, and generated answer using the Ragas evaluation framework. Ragas computes automated evaluation metrics such as context recall, context precision, answer relevancy, and truthfulness. The results are stored in DynamoDB and visualized in the UI.
  6. HITL review – Authenticated users can review and refine Ragas scores directly through the UI, providing nuanced and accurate evaluations by incorporating human insights into the process.

This architecture uses AWS services to deliver a scalable, secure, and efficient evaluation solution for Amazon Q Business, combining automated and human-driven evaluations.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Additionally, make sure that all the resources you deploy are in the same AWS Region.

Deploy the CloudFormation stack

Complete the following steps to deploy the CloudFormation stack:

  1. Clone the repository or download the files to your local computer.
  2. Unzip the downloaded file (if you used this option).
  3. Using your local computer command line, use the ‘cd’ command and change directory into ./sample-code-for-evaluating-amazon-q-business-applications-using-ragas-main/end-to-end-solution
  4. Make sure the ./deploy.sh script can run by executing the command chmod 755 ./deploy.sh.
  5. Execute the CloudFormation deployment script provided as follows:
    ./deploy.sh -s [CNF_STACK_NAME] -r [AWS_REGION]

You can follow the deployment progress on the AWS CloudFormation console. It takes approximately 15 minutes to complete the deployment, after which you will see a page similar to the following screenshot.

Add users to Amazon Q Business

You need to provision users for the pre-created Amazon Q Business application. Refer to Setting up for Amazon Q Business for instructions to add users.

Upload the evaluation dataset through the UI

In this section, you review and upload the following CSV file containing an evaluation dataset through the deployed custom UI.

This CSV file contains two columns: prompt and ground_truth. There are four prompts and their associated ground truth in this dataset:

  • What are the index types of Amazon Q Business and the features of each?
  • I want to use Q Apps, which subscription tier is required to use Q Apps?
  • What is the file size limit for Amazon Q Business via file upload?
  • What data encryption does Amazon Q Business support?

To upload the evaluation dataset, complete the following steps:

  1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  2. Choose the evals stack that you already launched.
  3. On the Outputs tab, take note of the user name and password to log in to the UI application, and choose the UI URL.

The custom UI will redirect you to the Amazon Cognito login page for authentication.

The UI application authenticates the user with Amazon Cognito, and initiates the token exchange workflow to implement a secure Chatsync API call with Amazon Q Business.

  4. Use the credentials you noted earlier to log in.

For more information about the token exchange flow between IAM Identity Center and the identity provider (IdP), refer to Building a Custom UI for Amazon Q Business.

  5. After you log in to the custom UI used for Amazon Q evaluation, choose Upload Dataset, then upload the dataset CSV file.

After the file is uploaded, the evaluation framework will send the prompt to Amazon Q Business to generate the answer, and then send the prompt, ground truth, and answer to Ragas to evaluate. During this process, you can also review the uploaded dataset (including the four questions and associated ground truth) on the Amazon Q Business console, as shown in the following screenshot.

After about 7 minutes, the workflow will finish, and you should see the evaluation result for the first question.

Perform HITL evaluation

After the Lambda function has completed its execution, the Ragas scoring will be shown in the custom UI. Now you can review the metric scores generated using Ragas (an LLM-aided evaluation method) and provide human feedback as an evaluator for further calibration. This human-in-the-loop calibration can further improve the evaluation accuracy, because the HITL process is particularly valuable in fields where human judgment, expertise, or ethical considerations are crucial.

Let’s review the first question: “What are the index types of Amazon Q Business and the features of each?” You can read the question, Amazon Q Business generated answers, ground truth, and context.

Next, review the evaluation metrics scored by using Ragas. As discussed earlier, there are four metrics:

  • Answer relevancy – Measures relevancy of answers. Higher scores indicate better alignment with the user input, and lower scores are given if the response is incomplete or includes redundant information.
  • Truthfulness – Verifies factual accuracy by comparing responses to verified sources. Higher scores indicate a better consistency with verified sources.
  • Context precision – Assesses the relevance and conciseness of retrieved information. Higher scores indicate that the retrieved information closely matches the query intent, reducing irrelevant data.
  • Context recall – Measures how many of the relevant documents (or pieces of information) were successfully retrieved. It focuses on not missing important results. Higher recall means fewer relevant documents were left out.

For this question, all metrics showed Amazon Q Business achieved a high-quality response. It’s worthwhile to compare your own evaluation with these scores generated by Ragas.

Next, let’s review a question that returned with a low answer relevancy score. For example: “I want to use Q Apps, which subscription tier is required to use Q Apps?”

Analyzing both question and answer, we can consider the answer relevant and aligned with the user question, but the answer relevancy score from Ragas doesn’t reflect this human analysis, showing a lower score than expected. It’s important to calibrate the Ragas evaluation judgment with a human in the loop. You should read the question and answer carefully, and make the necessary changes to the metric score to reflect the HITL analysis. Finally, the results will be updated in DynamoDB.

Lastly, save the metric score in the CSV file, and you can download and review the final metric scores.

Solution 2: Lambda based evaluation

If you’re already using Amazon Q Business, AmazonQEvaluationLambda allows for quick integration of evaluation methods into your application without setting up a custom UI application. It offers the following key features:

  • Evaluates responses from Amazon Q Business using Ragas against a predefined test set of questions and ground truth data
  • Outputs evaluation metrics that can be visualized directly in Amazon CloudWatch
  • Like the end-to-end solution, it provides results based on the input dataset and the responses from the Amazon Q Business application, using Ragas to evaluate four key evaluation metrics (context recall, context precision, answer relevancy, and truthfulness).

This solution provides you sample code to evaluate the Amazon Q Business application response. To use this solution, you need to have or create a working Amazon Q Business application integrated with IAM Identity Center or Amazon Cognito as an IdP. This Lambda function works in the same way as the Lambda function in the end-to-end evaluation solution, using Ragas against a test set of questions and ground truth. This lightweight solution doesn’t have a custom UI, but it can provide result metrics (context recall, context precision, answer relevancy, truthfulness) for visualization in CloudWatch. For deployment instructions, refer to the following GitHub repo.

Using evaluation results to improve Amazon Q Business application accuracy

This section outlines strategies to enhance key evaluation metrics—context recall, context precision, answer relevance, and truthfulness—for a RAG solution in the context of Amazon Q Business.

Context recall

Let’s examine the following problems and troubleshooting tips:

  1. Aggressive query filtering – Overly strict search filters or metadata constraints might exclude relevant records. You should review the metadata filters or boosting settings applied in Amazon Q Business to make sure they don’t unnecessarily restrict results.
  2. Data source ingestion errors – Documents from certain data sources aren’t successfully ingested into Amazon Q Business. To address this, check the document sync history report in Amazon Q Business to confirm successful ingestion and resolve ingestion errors.

Context precision

Consider the following potential issues:

  • Over-retrieval of documents – Large top-K values might retrieve semi-related or off-topic passages, which the LLM might incorporate unnecessarily. To address this, refine metadata filters or apply boosting to improve passage relevance and reduce noise in the retrieved context.
  • Poor query specificity – Broad or poorly formed user queries can yield loosely related results. You should make sure user queries are clear and specific. Train users or implement query refinement mechanisms to optimize query quality.

Answer relevance

Consider the following troubleshooting methods:

  • Partial coverage – Retrieved context addresses parts of the question but fails to cover all aspects, especially in multi-part queries. To address this, decompose complex queries into sub-questions. Instruct the LLM or a dedicated module to retrieve and answer each sub-question before composing the final response. For example:
    • Break down the query into sub-questions.
    • Retrieve relevant passages for each sub-question.
    • Compose a final answer addressing each part.
  • Context/answer mismatch – The LLM might misinterpret retrieved passages, omit relevant information, or merge content incorrectly due to hallucination. You can use prompt engineering to guide the LLM more effectively. For example, for the original query “What are the top 3 reasons for X?” you can use the rewritten prompt “List the top 3 reasons for X clearly labeled as #1, #2, and #3, based strictly on the retrieved context.”

Truthfulness

Consider the following:

  • Stale or inaccurate data sources – Outdated or conflicting information in the knowledge corpus might lead to incorrect answers. To address this, compare the retrieved context with verified sources to confirm accuracy. Collaborate with SMEs to validate the data.
  • LLM hallucination – The model might fabricate or embellish details, even with accurate retrieved context. Although Amazon Q Business is a RAG generative AI solution, and should significantly reduce the hallucination, it’s not possible to eliminate hallucination totally. You can measure the frequency of low context precision answers to identify patterns and quantify the impact of hallucinations to gain an aggregated view with the evaluation solution.

By systematically examining and addressing the root causes of low evaluation metrics, you can optimize your Amazon Q Business application. From document retrieval and ranking to prompt engineering and validation, these strategies will help enhance the effectiveness of your RAG solution.

Clean up

Don’t forget to go back to the CloudFormation console and delete the CloudFormation stack to delete the underlying infrastructure that you set up, to avoid additional costs on your AWS account.

Conclusion

In this post, we outlined two evaluation solutions for Amazon Q Business: a comprehensive evaluation workflow and a lightweight Lambda based evaluation. These approaches combine automated evaluation approaches such as Ragas with human-in-the-loop validation, providing reliable and accurate assessments.

By using our guidance on how to improve evaluation metrics, you can continuously optimize your Amazon Q Business application to meet enterprise needs with Amazon Q Business. Whether you’re using the end-to-end solution or the lightweight approach, these frameworks provide a scalable and efficient path to improve accuracy and relevance.

To learn more about Amazon Q Business and how to evaluate Amazon Q Business results, explore these hands-on workshops:


About the authors

Rui Cardoso is a partner solutions architect at Amazon Web Services (AWS). He is focusing on AI/ML and IoT. He works with AWS Partners and supports them in developing solutions in AWS. When not working, he enjoys cycling, hiking and learning new things.

Julia Hu is a Sr. AI/ML Solutions Architect at Amazon Web Services. She is specialized in Generative AI, Applied Data Science and IoT architecture. Currently she is part of the Amazon Bedrock team, and a Gold member/mentor in Machine Learning Technical Field Community. She works with customers, ranging from start-ups to enterprises, to develop AWSome generative AI solutions. She is particularly passionate about leveraging Large Language Models for advanced data analytics and exploring practical applications that address real-world challenges.

Amit Gupta is a Senior Q Business Solutions Architect at AWS. He is passionate about enabling customers with well-architected generative AI solutions at scale.

Neil Desai is a technology executive with over 20 years of experience in artificial intelligence (AI), data science, software engineering, and enterprise architecture. At AWS, he leads a team of Worldwide AI services specialist solutions architects who help customers build innovative Generative AI-powered solutions, share best practices with customers, and drive product roadmap. He is passionate about using technology to solve real-world problems and is a strategic thinker with a proven track record of success.

Ricardo Aldao is a Senior Partner Solutions Architect at AWS. He is a passionate AI/ML enthusiast who focuses on supporting partners in building generative AI solutions on AWS.

Read More

Use Amazon Bedrock Intelligent Prompt Routing for cost and latency benefits

In December, we announced the preview availability for Amazon Bedrock Intelligent Prompt Routing, which provides a single serverless endpoint to efficiently route requests between different foundation models within the same model family. To do this, Amazon Bedrock Intelligent Prompt Routing dynamically predicts the response quality of each model for a request and routes the request to the model it determines is most appropriate based on cost and response quality, as shown in the following figure.

Today, we’re happy to announce the general availability of Amazon Bedrock Intelligent Prompt Routing. Over the past several months, we have made several improvements to intelligent prompt routing based on customer feedback and extensive internal testing. Our goal is to enable you to set up automated, optimal routing between large language models (LLMs). Amazon Bedrock Intelligent Prompt Routing builds a deep understanding of model behaviors within each model family, incorporating state-of-the-art methods for training routers across different sets of models, tasks and prompts.

In this blog post, we detail highlights from our internal testing, explain how to get started, and point out some caveats and best practices. We encourage you to incorporate Amazon Bedrock Intelligent Prompt Routing into your new and existing generative AI applications. Let’s dive in!

Highlights and improvements

Today, you can either use Amazon Bedrock Intelligent Prompt Routing with the default prompt routers provided by Amazon Bedrock, or configure your own prompt routers to adjust the performance trade-off between the two candidate LLMs. Default prompt routers are pre-configured routing systems, provided by Amazon Bedrock for each model family, that aim to match the performance of the more capable of the two models while lowering costs by sending easier prompts to the less expensive model. They come with predefined settings and are designed to work out of the box with specific foundation models, providing a straightforward, ready-to-use solution without any routing configuration. Customers who tested Amazon Bedrock Intelligent Prompt Routing in preview (thank you!) could choose models from the Anthropic and Meta families. Today, you can choose more models from within the Amazon Nova, Anthropic, and Meta families, including:

  • Anthropic’s Claude family: Claude 3 Haiku, Claude 3.5 Sonnet v1, Claude 3.5 Haiku and Claude 3.5 Sonnet v2
  • Meta’s Llama family: Llama 3.1 8B and 70B, Llama 3.2 11B and 90B, and Llama 3.3 70B
  • Amazon Nova family: Nova Pro and Nova Lite

You can also configure your own prompt routers to define your own routing configurations tailored to specific needs and preferences. These are more suitable when you require more control over how to route your requests and which models to use. In GA, you can configure your own router by selecting any two models from the same model family and then configuring the response quality difference of your router.

Adding components before invoking the selected LLM with the original prompt can add overhead. We reduced the overhead of these added components by over 20%, to approximately 85 ms (P90). Because the router preferentially invokes the less expensive model while maintaining the same baseline accuracy on the task, you can expect an overall latency and cost benefit compared to always using the larger, more expensive model, despite the additional overhead. This is discussed further in the benchmark results section that follows.

We conducted several internal tests with proprietary and public data to evaluate Amazon Bedrock Intelligent Prompt Routing metrics. First, we used average response quality gain under cost constraints (ARQGC), a normalized (0–1) performance metric for measuring routing system quality for various cost constraints, referenced against a reward model, where 0.5 represents random routing and 1 represents optimal oracle routing performance. We also captured the cost savings with intelligent prompt routing relative to using the largest model in the family, and estimated latency benefit based on average recorded time to first token (TTFT) to showcase the advantages and report them in the following table.

Model family | Average ARQGC (router overall performance) | Cost savings (%) | Latency benefit (%)
Nova         | 0.75                                       | 35%              | 9.98%
Anthropic    | 0.86                                       | 56%              | 6.15%
Meta         | 0.78                                       | 16%              | 9.38%

Cost savings and latency benefit are measured when configuring the router to match the performance of the strong model in the family.

How to read this table?

It’s important to pause and understand these metrics. First, the results shown in the preceding table are only meant for comparison against random routing within the family (that is, improvement in ARQGC over 0.5), not across families. Second, the results are relevant only within a family of models and differ from the model benchmarks you might be familiar with that are used to compare models. Third, because real costs and prices change frequently and depend on input and output token counts, comparing real costs is challenging. To solve this problem, we define the cost savings metric as the maximum cost saved, relative to the cost of the strongest LLM, for a router to achieve a certain level of response quality. Specifically, in the example shown in the table, there is an average 35% cost savings using the Nova family router compared to using Nova Pro for all prompts without the router.
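One way to make this definition concrete (our own framing, not a formula from the Amazon Bedrock documentation) is:

cost_savings(q) = max over routing configurations achieving response quality >= q of ( 1 - C_router / C_strongest )

where C_router is the total inference cost when routing and C_strongest is the cost of sending every prompt to the strongest model in the family. Under this reading, the 35% figure for the Nova router corresponds to a routed cost of roughly 65% of the all-Nova-Pro cost at the matched quality level.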

You can expect to see varying levels of benefit based on your use case. For example, in an internal test with hundreds of prompts, we achieved 60% cost savings using Amazon Bedrock Intelligent Prompt Routing with the Anthropic family, with response quality matching that of Claude 3.5 Sonnet v2.

What is response quality difference?

The response quality difference measures the disparity between the responses of the fallback model and the other models. A smaller value indicates that the responses are similar, while a higher value indicates a significant difference between the fallback model and the other models. Your choice of fallback model matters. When you configure a response quality difference of 10% with Anthropic’s Claude 3 Sonnet as the fallback model, the router dynamically selects an LLM to achieve overall performance within a 10% drop in response quality relative to Claude 3 Sonnet. Conversely, if you use a less expensive model such as Claude 3 Haiku as the fallback model, the router dynamically selects an LLM to achieve overall performance with more than a 10% increase over Claude 3 Haiku.

In the following figure, you can see that the response quality difference is set at 10% with Haiku as the fallback model. If customers want to explore optimal configurations beyond the default settings described previously, they can experiment with different response quality difference thresholds, analyze the router’s response quality, cost, and latency on their development dataset, and select the configuration that best fits their application’s requirements.

When configuring your own prompt router, you can set the threshold for response quality difference as shown in the following image of the Configure prompt router page, under Response quality difference (%) in the Amazon Bedrock console. To do this by using APIs, see How to use intelligent prompt routing.

Benchmark results

When using different model pairings, the ability of the smaller model to serve a larger share of input prompts can yield significant latency and cost benefits, depending on the model choice and the use case. For example, when comparing routing with Claude 3 Haiku versus Claude 3.5 Haiku, each paired with Claude 3.5 Sonnet, we observed the following with one of our internal datasets:

Case 1: Routing between Claude 3 Haiku and Claude 3.5 Sonnet V2: Cost savings of 48% while maintaining the same response quality as Claude 3.5 Sonnet v2

Case 2: Routing between Claude 3.5 Haiku and Claude 3.5 Sonnet V2: Cost savings of 56% while maintaining the same response quality as Claude 3.5 Sonnet v2

As cases 1 and 2 show, when the capabilities of less expensive models improve relative to more expensive models in the same family (for example, from Claude 3 Haiku to Claude 3.5 Haiku), more complex tasks can be reliably solved by them, resulting in a higher percentage of prompts routed to the less expensive model while still maintaining the same overall accuracy on the task.

We encourage you to test the effectiveness of Amazon Bedrock Intelligent Prompt Routing on your specialized task and domain because results can vary. For example, when we tested Amazon Bedrock Intelligent Prompt Routing with open-source and internal Retrieval Augmented Generation (RAG) datasets, we saw an average 63.6% cost savings, because a higher percentage (87%) of prompts were routed to Claude 3.5 Haiku while still maintaining the baseline accuracy of the larger, more expensive model (Claude 3.5 Sonnet v2 in the following figure) alone, averaged across RAG datasets.

Getting started

You can get started using the AWS Management Console for Amazon Bedrock. As mentioned earlier, you can create your own router or use a default router:

Use the console to configure a router:

  1. In the Amazon Bedrock console, choose Prompt Routers in the navigation pane, and then choose Configure prompt router.
  2. You can then use a previously configured router or a default router in the console-based playground. For example, in the following figure, we attached a 10-K document from Amazon.com and asked a specific question about the cost of sales.
  3. Choose the router metrics icon (next to the refresh icon) to see which model the request was routed to. Because this is a nuanced question, Amazon Bedrock Intelligent Prompt Routing correctly routes to Claude 3.5 Sonnet V2 in this case, as shown in the following figure.

You can also use the AWS Command Line Interface (AWS CLI) or API to configure and use a prompt router.

To use the AWS CLI or API to configure a router:

AWS CLI:

aws bedrock create-prompt-router \
    --prompt-router-name my-prompt-router \
    --models '[{"modelArn": "arn:aws:bedrock:<region>::foundation-model/<modelA>"}, {"modelArn": "arn:aws:bedrock:<region>::foundation-model/<modelB>"}]' \
    --fallback-model '{"modelArn": "arn:aws:bedrock:<region>::foundation-model/<modelA>"}' \
    --routing-criteria '{"responseQualityDifference": 0.5}'

Boto3 SDK:

import boto3

# Amazon Bedrock control-plane client (prompt routers are created here;
# inference calls go through the "bedrock-runtime" client).
client = boto3.client("bedrock", region_name="<region>")

response = client.create_prompt_router(
    promptRouterName='my-prompt-router',
    # The two candidate models to route between (pairwise routing).
    models=[
        {
            'modelArn': 'arn:aws:bedrock:<region>::foundation-model/<modelA>'
        },
        {
            'modelArn': 'arn:aws:bedrock:<region>::foundation-model/<modelB>'
        },
    ],
    description='Routes between <modelA> and <modelB>',
    # Acceptable response quality difference relative to the fallback model.
    routingCriteria={
        'responseQualityDifference': 0.5
    },
    # The model to fall back to.
    fallbackModel={
        'modelArn': 'arn:aws:bedrock:<region>::foundation-model/<modelA>'
    },
    tags=[
        {
            'key': 'project',
            'value': 'intelligent-prompt-routing'
        },
    ]
)
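After the router is created, you can invoke it much like a foundation model by passing the prompt router ARN as the modelId in the Amazon Bedrock runtime Converse API. The following is a minimal sketch; the ARN is a placeholder (create_prompt_router returns the actual ARN), and the exact shape of the trace information showing the routed model is our assumption rather than a documented contract.

import boto3

# Runtime client for inference calls.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    # Placeholder ARN of the prompt router created above (or of a default router).
    modelId="arn:aws:bedrock:us-east-1:<account-id>:prompt-router/<router-id>",
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the key risks discussed in the attached 10-K excerpt."}],
        }
    ],
)

print(response["output"]["message"]["content"][0]["text"])

# Assumption: the response trace indicates which candidate model served the request.
print(response.get("trace", {}))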

Caveats and best practices

When using intelligent prompt routing in Amazon Bedrock, note that:

  • Amazon Bedrock Intelligent Prompt Routing is optimized for English prompts for typical chat assistant use cases. For use with other languages or customized use cases, conduct your own tests before implementing prompt routing in production applications or reach out to your AWS account team for help designing and conducting these tests.
  • You can select only two models to be part of the router (pairwise routing), with one of these two models being the fallback model. These two models have to be in the same AWS Region.
  • When starting with Amazon Bedrock Intelligent Prompt Routing, we recommend that you experiment using the default routers provided by Amazon Bedrock before trying to configure custom routers. After you’ve experimented with default routers, you can configure your own routers as needed for your use cases, evaluate the response quality in the playground, and use them in production applications if they meet your requirements.
  • Amazon Bedrock Intelligent Prompt Routing can’t adjust routing decisions or responses based on application-specific performance data currently and might not always provide the most optimal routing for unique or specialized, domain-specific use cases. Contact your AWS account team for customization help on specific use cases.

Conclusion

In this post, we explored Amazon Bedrock Intelligent Prompt Routing, highlighting its ability to help optimize both response quality and cost by dynamically routing requests between different foundation models. Benchmark results demonstrate significant cost savings and latency benefits while maintaining high-quality responses across model families. Whether you implement the pre-configured default routers or create custom configurations, Amazon Bedrock Intelligent Prompt Routing offers a powerful way to balance performance and efficiency in generative AI applications. As you implement this feature in your workflows, we recommend testing its effectiveness for your specific use cases to take full advantage of the flexibility it provides. To get started, see Understanding intelligent prompt routing in Amazon Bedrock.


About the authors

Shreyas Subramanian is a Principal Data Scientist who helps customers solve their business challenges with generative AI and deep learning using AWS services. Shreyas has a background in large-scale optimization and ML and in the use of ML and reinforcement learning for accelerating optimization tasks.

Balasubramaniam Srinivasan is a Senior Applied Scientist at AWS, working on post-training methods for generative AI models. He enjoys enriching ML models with domain-specific knowledge and inductive biases to delight customers. Outside of work, he enjoys playing and watching tennis and football (soccer).

Yun Zhou is an Applied Scientist at AWS where he helps with research and development to ensure the success of AWS customers. He works on pioneering solutions for various industries using statistical modeling and machine learning techniques. His interest includes generative models and sequential data modeling.

Haibo Ding is a senior applied scientist at Amazon Machine Learning Solutions Lab. He is broadly interested in Deep Learning and Natural Language Processing. His research focuses on developing new explainable machine learning models, with the goal of making them more efficient and trustworthy for real-world problems. He obtained his Ph.D. from University of Utah and worked as a senior research scientist at Bosch Research North America before joining Amazon. Apart from work, he enjoys hiking, running, and spending time with his family.

Read More

How Infosys improved accessibility for Event Knowledge using Amazon Nova Pro, Amazon Bedrock and Amazon Elemental Media Services

How Infosys improved accessibility for Event Knowledge using Amazon Nova Pro, Amazon Bedrock and Amazon Elemental Media Services

This post is co-written with Saibal Samaddar, Tanushree Halder, and Lokesh Joshi from Infosys Consulting.

Critical insights and expertise are concentrated among thought leaders and experts across the globe. Language barriers often hinder the distribution and comprehension of this knowledge during crucial encounters. Workshops, conferences, and training sessions serve as platforms for collaboration and knowledge sharing, but they deliver the most value when attendees can understand the information being conveyed in real time and in their preferred language.

Infosys, a leading global IT services and consulting organization, used its digital expertise to tackle this challenge by pioneering Infosys Event AI, an innovative AI-based event assistant. Infosys Event AI is designed to make knowledge universally accessible, making sure that valuable insights are not lost and can be efficiently used by individuals and organizations across diverse industries, both during an event and after it has concluded. Without such a system, effective knowledge sharing and utilization are hindered, limiting the overall impact of events and workshops. By transforming ephemeral event content into a persistent and searchable knowledge asset, Infosys Event AI seeks to enhance knowledge utilization and impact.

Some of the challenges in capturing and accessing event knowledge include:

  • Knowledge from events and workshops is often lost due to inadequate capture methods, with traditional note-taking being incomplete and subjective.
  • Reviewing lengthy recordings to find specific information is time-consuming and inefficient, creating barriers to knowledge retention and sharing.
  • People who miss events face significant obstacles accessing the knowledge shared, impacting sectors like education, media, and public sector where information recall is crucial.

To address these challenges, Infosys partnered with Amazon Web Services (AWS) to develop the Infosys Event AI to unlock the insights generated during events. In this post, we explain how Infosys built the Infosys Event AI solution using several AWS services including:

Solution Architecture

In this section, we present an overview of Event AI, highlighting its key features and workflow. Event AI delivers these core functionalities, as illustrated in the architecture diagram that follows:

  1. Seamless live stream acquisition from on-premises sources
  2. Real-time transcription processing for speech-to-text conversion
  3. Post-event processing and knowledge base indexing for structured information retrieval
  4. Automated generation of session summaries and key insights to enhance accessibility
  5. AI-powered chat-based assistant for interactive Q&A and efficient knowledge retrieval from the event session

Solution walkthrough

Next, we break down each functionality in detail. The services used in the solution are granted least-privilege permissions through AWS Identity and Access Management (IAM) policies for security purposes.

Seamless live stream acquisition

The solution begins with an IP-enabled camera capturing the live event feed, as shown in the following section of the architecture diagram. This stream is securely and reliably transported to the cloud using the Secure Reliable Transport (SRT) protocol through MediaConnect. The ingested stream is then received and processed by MediaLive, which encodes the video in real time and generates the necessary outputs.

The workflow follows these steps:

  1. Use an IP-enabled camera or ground encoder to convert non-IP streams into IP streams and transmit them through SRT protocol to MediaConnect for live event ingestion.
  2. MediaConnect securely transmits the stream to MediaLive for processing.

Real-time transcription processing

To facilitate real-time accessibility, the system uses MediaLive to isolate audio from the live video stream. This audio-only stream is then forwarded to a real-time transcriber module. The real-time transcriber module, hosted on an Amazon Elastic Compute Cloud (Amazon EC2) instance, uses the Amazon Transcribe streaming API to generate transcriptions with minimal latency. These real-time transcriptions are subsequently delivered to an on-premises web client through secure WebSocket connections. The following screenshot shows a brief demo based on a fictitious scenario to illustrate Event AI’s real-time streaming capability.

The workflow for this part of the solution follows these steps:

  1. MediaLive extracts the audio from the live stream and creates an audio-only stream, which it then sends to the real-time transcriber module running on an EC2 instance. MediaLive also extracts the audio-only output and stores it in an Amazon Simple Storage Service (Amazon S3) bucket, facilitating a subsequent postprocessing workflow.
  2. The real-time transcriber module receives the audio-only stream and employs the Amazon Transcribe streaming API to produce real-time transcriptions with low latency.
  3. The real-time transcriber module uses a secure WebSocket to transmit the transcribed text.
  4. The on-premises web client receives the transcribed text through a secure WebSocket connection through Amazon CloudFront and displays it on the web client’s UI.

The following diagram shows the live-stream acquisition and real-time transcription.
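To make step 2 concrete, the following is a minimal sketch using the open-source amazon-transcribe streaming SDK for Python. The PCM file name stands in for the MediaLive audio-only output, and forwarding text over the WebSocket is reduced to a print statement; both are illustrative assumptions, not details from the Infosys deployment.

import asyncio

from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent


class PrintHandler(TranscriptResultStreamHandler):
    async def handle_transcript_event(self, transcript_event: TranscriptEvent):
        # Forward only finalized segments (a real deployment would push these over the WebSocket).
        for result in transcript_event.transcript.results:
            if not result.is_partial:
                print(result.alternatives[0].transcript)


async def transcribe_live_audio():
    client = TranscribeStreamingClient(region="us-east-1")
    stream = await client.start_stream_transcription(
        language_code="en-US",
        media_sample_rate_hz=16000,
        media_encoding="pcm",
    )

    async def send_audio():
        # Stand-in for the audio-only stream produced by MediaLive: raw 16-bit mono PCM at 16 kHz.
        with open("session_audio.pcm", "rb") as audio:
            while chunk := audio.read(1024 * 8):
                await stream.input_stream.send_audio_event(audio_chunk=chunk)
        await stream.input_stream.end_stream()

    handler = PrintHandler(stream.output_stream)
    await asyncio.gather(send_audio(), handler.handle_events())


if __name__ == "__main__":
    asyncio.run(transcribe_live_audio())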

Post-event processing and knowledge base indexing

After the event concludes, recorded media and transcriptions are securely stored in Amazon S3 for further analysis. A serverless, event-driven workflow using Amazon EventBridge and AWS Lambda automates the post-event processing. Amazon Transcribe processes the recorded content to generate the final transcripts, which are then indexed and stored in an Amazon Bedrock knowledge base for seamless retrieval. Additionally, Amazon Nova Pro enables multilingual translation of the transcripts, providing global accessibility when needed. With its quality and speed, Amazon Nova Pro is ideally suited for this global use case.

The workflow for this part of the process follows these steps:

  1. After the event concludes, MediaLive sends a channel stopped notification to EventBridge
  2. A Lambda function, subscribed to the channel stopped event, triggers post-event transcription using Amazon Transcribe
  3. The transcribed content is processed and stored in an S3 bucket
  4. (Optional) Amazon Nova Pro translates transcripts into multiple languages for broader accessibility using Amazon Bedrock
  5. Amazon Transcribe generates a transcription complete event and sends it to EventBridge
  6. A Lambda function, subscribed to the transcription complete event, triggers the synchronization process with Amazon Bedrock Knowledge Bases
  7. The knowledge is then indexed and stored in Amazon Bedrock knowledge base for efficient retrieval

These steps are shown in the following diagram.
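As a minimal sketch of steps 2 and 3, the Lambda function subscribed to the channel stopped event could look like the following. The bucket names, file name, and media format are illustrative placeholders, not details from the Infosys deployment.

import time

import boto3

transcribe = boto3.client("transcribe")

# Illustrative placeholders for the recording and transcript buckets.
RECORDINGS_BUCKET = "<recordings-bucket>"
TRANSCRIPTS_BUCKET = "<transcripts-bucket>"


def handler(event, context):
    """Triggered by the EventBridge rule for the MediaLive channel stopped event."""
    job_name = f"event-session-{int(time.time())}"
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{RECORDINGS_BUCKET}/session_audio.mp4"},
        MediaFormat="mp4",
        LanguageCode="en-US",
        OutputBucketName=TRANSCRIPTS_BUCKET,
    )
    return {"transcriptionJob": job_name}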

Automated generation of session summaries and key insights

To enhance user experience, the solution uses Amazon Bedrock to analyze the transcriptions and generate concise session summaries and key insights. These insights help users quickly understand the essence of the event without going through lengthy transcripts. The following screenshot shows Infosys Event AI’s summarization capability.

The workflow for this part of the solution follows these steps:

  1. Users authenticate to the web client portal using Amazon Cognito. Once authenticated, the user selects an option in the portal UI to view the summaries and key insights.
  2. The user request is delegated to the AI assistant module, where it fetches the complete transcript from the S3 bucket.
  3. The transcript is processed by Amazon Nova Pro on Amazon Bedrock, guided by Amazon Bedrock Guardrails. In line with responsible AI policies, this produces summaries and key insights that are safeguarded for the user, as illustrated in the sketch that follows.
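A minimal sketch of step 3, assuming the transcript has already been fetched from Amazon S3 and a guardrail has already been created in Amazon Bedrock Guardrails; the file name and guardrail identifier are placeholders.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Transcript previously fetched from the S3 bucket by the AI assistant module.
with open("session_transcript.txt") as f:
    transcript = f.read()

response = bedrock_runtime.converse(
    # Amazon Nova Pro (in some Regions an inference profile ID such as us.amazon.nova-pro-v1:0 is used instead).
    modelId="amazon.nova-pro-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": f"Summarize this session and list the key insights:\n\n{transcript}"}],
        }
    ],
    # Placeholder guardrail created in Amazon Bedrock Guardrails.
    guardrailConfig={
        "guardrailIdentifier": "<guardrail-id>",
        "guardrailVersion": "1",
    },
)

summary = response["output"]["message"]["content"][0]["text"]
print(summary)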

AI-powered chat-based assistant

A key feature of this architecture is an AI-powered chat assistant, which is used to interactively query the event knowledge base. The chat assistant is powered by Amazon Bedrock and retrieves information from the Amazon OpenSearch Serverless index, enabling seamless access to session insights.

The workflow for this part of the solution follows these steps:

  1. Authenticated users engage with the chat assistant using natural language to request specific event messaging details from the client web portal.
  2. The user prompt is directed to the AI assistant module for processing.
  3. The AI assistant module queries Amazon Bedrock Knowledge Bases for relevant answers.
  4. The retrieved content is processed by Amazon Nova Pro, guided by Amazon Bedrock Guardrails, to generate grounded responses and safeguard key insights. The integration of Amazon Bedrock Guardrails promotes professional, respectful interactions by blocking undesirable and harmful content during user interactions, in line with responsible AI policies.

The following demo demonstrates Event AI’s Q&A capability.

The steps for automated generation of insights and AI-chat assistant are shown in the following diagram.
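For steps 3 and 4, a minimal sketch using the Amazon Bedrock Knowledge Bases RetrieveAndGenerate API could look like the following; the knowledge base ID, Region, model ARN, and question are placeholders, and guardrails are omitted for brevity.

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What were the key announcements in the responsible AI keynote?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            # Placeholder ID of the knowledge base synchronized after the event.
            "knowledgeBaseId": "<knowledge-base-id>",
            # Foundation model (or inference profile) used for generation; placeholder ARN.
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0",
        },
    },
)

print(response["output"]["text"])

# Each response can also carry citations pointing back to the indexed transcript chunks.
for citation in response.get("citations", []):
    print(citation)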

Results and Impact

Infosys Event AI was launched in February 2025 during a responsible AI conference in Bangalore, India, hosted by Infosys in partnership with the British High Commission.

  • Infosys Event AI was used by more than 800 conference attendees
  • It was used by around 230 people every minute during the event
  • The intelligent chat assistant was queried an average of 57 times every minute during the event
  • A total of more than 9,000 event session summaries were generated during the event

By using the solution, Infosys was able to realize the following key benefits for their internal users and for their customers:

  • Enhanced knowledge retention – During events, Infosys Event AI was accessible from both mobile and laptop devices, providing an immersive participation experience for both online and in-person attendees.
  • Improved accessibility – Session knowledge became quickly accessible after the event through transcripts, summaries, and the intelligent chat assistant. The event information is readily available for attendees and for those who couldn’t attend. Furthermore, Infosys Event AI aggregates the session information from previous events, creating a knowledge archival system for information retrieval.
  • Increased engagement – The interactive chat assistant provides deeper engagement during the event sessions, which means users can ask specific questions and receive immediate, contextually relevant answers.
  • Time efficiency – Quick access to summaries and chat responses saves time compared to reviewing full session recordings or manual notes when seeking specific information.

Impacting Multiple Industries

Infosys is positioned to accelerate the adoption of Infosys Event AI across diverse industries:

  • AI-powered meeting management for enterprises – Businesses can use the system for generating meeting minutes, creating training documentation from workshops, and facilitating knowledge sharing within teams. Summaries provide quick recaps of meetings for executives, and transcripts offer detailed records for compliance and reference.
  • Improved transparency and accessibility in the public sector – Parliamentary debates, public hearings, and government briefings are made accessible to the general public through transcripts and summaries, improving transparency and citizen engagement. The platform enables searchable archives of parliamentary proceedings for researchers, policymakers, and the public, creating accessible records for historical reference.
  • Accelerated learnings and knowledge retention in the education sector – Students effectively review lectures, seminars, and workshops through transcripts and summaries, reinforcing learning, and improving knowledge retention. The chat assistant allows for interactive learning and clarification of doubts, acting as a virtual teaching assistant. This is particularly useful in online and hybrid learning environments.
  • Improved media reporting and efficiency in the media industry – Journalists can use Infosys Event AI to rapidly transcribe press conferences, speeches, and interviews, accelerating news cycles and improving reporting accuracy. Summaries provide quick overviews of events, enabling faster news dissemination. The chat assistant facilitates quick fact-checking (with source citation) and information retrieval from event recordings.
  • Improved accessibility and inclusivity across the industry – Real-time transcription provides accessibility for hearing-challenged individuals. Multilingual translation of event transcripts allows participation by attendees for whom the event sessions aren’t in their native language. This promotes inclusivity and a wider participation during events for the purposes of knowledge sharing.

Conclusion

In this post, we explored how Infosys developed Infosys Event AI to unlock the insights generated from events and conferences. Through its suite of features—including real-time transcription, intelligent summaries, and an interactive chat assistant—Infosys Event AI makes event knowledge accessible and provides an immersive engagement solution for the attendees, during and after the event.

Infosys is planning to offer the Infosys Event AI solution to its internal teams and global customers in two versions: as a multi-tenanted, software-as-a-service (SaaS) solution and as a single-deployment solution. Infosys is also adding capabilities to include an event catalogue, knowledge lake, and event archival system to make the event information accessible beyond the scope of the current event. By using AWS managed services, Infosys has made Event AI a readily available, interactive, immersive and valuable resource for students, journalists, policymakers, enterprises, and the public sector. As organizations and institutions increasingly rely on events for knowledge dissemination, collaboration, and public engagement, Event AI is well positioned to unlock the full potential of the events.

Stay updated with new Amazon AI features and releases to advance your AI journey on AWS.


About the Authors

Aparajithan Vaidyanathan is a Principal Enterprise Solutions Architect at AWS. He helps enterprise customers migrate and modernize their workloads on the AWS cloud. He is a Cloud Architect with 24+ years of experience designing and developing enterprise, large-scale and distributed software systems. He specializes in generative AI and machine learning with a focus on the data and feature engineering domain. He is an aspiring marathon runner and his hobbies include hiking, bike riding and spending time with his wife and two boys.

Maheshwaran G is a Specialist Solutions Architect working with Media and Entertainment, supporting media companies in India to accelerate growth in innovative ways by leveraging the power of cloud technologies. He is passionate about innovation and currently holds 8 USPTO and 8 IPO granted patents in diversified domains.

Saibal Samaddar is a senior principal consultant at Infosys Consulting and heads the AI Transformation Consulting (AIX) practice in India. He has over eighteen years of business consulting experience, including 11 years in PwC and KPMG, helping organizations drive strategic transformation by harnessing Digital and AI technologies. Known to be a visionary who can navigate complex transformations and make things happen, he has played a pivotal role in winning multiple new accounts for Infosys Consulting (IC).

Tanushree Halder is a principal consultant with Infosys Consulting and is the Lead – CX and Gen AI capability for AI Transformation Consulting (AIX). She has 11 years of experience working with clients in their transformational journeys. She has travelled to over 10 countries to provide her advisory services in AI with clients in BFSI, retail and logistics, hospitality, healthcare and shared services.

Lokesh Joshi is a consultant at Infosys Consulting. He has worked with multiple clients to strategize and integrate AI based solutions for workflow enhancements. He has over 4 years of experience in AI/ML, GenAI development, full Stack development, and cloud services. He specializes in Machine Learning and Data Science with a focus on Deep Learning and NLP. A fitness enthusiast, his hobbies include programming, hiking, and traveling.

Read More

Making Brain Waves: AI Startup Speeds Disease Research With Lab in the Loop

Making Brain Waves: AI Startup Speeds Disease Research With Lab in the Loop

About 15% of the world’s population — over a billion people — are affected by neurological disorders, from commonly known diseases like Alzheimer’s and Parkinson’s to hundreds of lesser-known, rare conditions.

BrainStorm Therapeutics, a San Diego-based startup, is accelerating the development of cures for these conditions using AI-powered computational drug discovery paired with lab experiments using organoids: tiny, 3D bundles of brain cells created from patient-derived stem cells. This hybrid, iterative method, where clinical data and AI models inform one another to accelerate drug development, is known as lab in the loop.

“The brain is the last frontier in modern biology,” said BrainStorm’s founder and CEO Robert Fremeau, who was previously a scientific director in neuroscience at Amgen and a faculty member at Duke University and the University of California, San Francisco. “By combining our organoid disease models with the power of generative AI, we now have the ability to start to unravel the underlying complex biology of disease networks.”

The company aims to lower the failure rate of drug candidates for brain diseases during clinical trials — currently over 93% — and identify therapeutics that can be applied to multiple diseases. Achieving these goals would make it faster and more economically viable to develop treatments for rare and common conditions.

“This alarmingly high clinical trial failure rate is mainly due to the inability of traditional preclinical models with rodents or 2D cells to predict human efficacy,” said Jun Yin, cofounder and chief technology officer at BrainStorm. “By integrating human-derived brain organoids with AI-driven analysis, we’re building a platform that better reflects the complexity of human neurobiology and improves the likelihood of clinical success.”

Fremeau and Yin believe that BrainStorm’s platform has the potential to accelerate development timelines, reduce research and development costs, and significantly increase the probability of bringing effective therapies to patients.

BrainStorm Therapeutics’ AI models, which run on NVIDIA GPUs in the cloud, were developed using the NVIDIA BioNeMo Framework, a set of programming tools, libraries and models for computational drug discovery. The company is a member of NVIDIA Inception, a global network of cutting-edge startups.

Clinical Trial in a Dish

BrainStorm Therapeutics uses AI models to develop gene maps of brain diseases, which they can use to identify promising targets for potential drugs and clinical biomarkers. Organoids allow them to screen thousands of drug molecules per day directly on human brain cells, enabling them to test the effectiveness of potential therapies before starting clinical trials.

“Brains have brain waves that can be picked up in a scan like an EEG, or electroencephalogram, which measures the electrical activity of neurons,” said Maya Gosztyla, the company’s cofounder and chief operating officer. “Our organoids also have spontaneous brain waves, allowing us to model the complex activity that you would see in the human brain in this much smaller system. We treat it like a clinical trial in a dish for studying brain diseases.”

BrainStorm Therapeutics is currently using patient-derived organoids for its work on drug discovery for Parkinson’s disease, a condition tied to the loss of neurons that produce dopamine, a neurotransmitter that helps with physical movement and cognition.

“In Parkinson’s disease, multiple genetic variants contribute to dysfunction across different cellular pathways, but they converge on a common outcome — the loss of dopamine neurons,” Fremeau said. “By using AI models to map and analyze the biological effects of these variants, we can discover disease-modifying treatments that have the potential to slow, halt or even reverse the progression of Parkinson’s.”

The BrainStorm team used single-cell sequencing data from brain organoids to fine-tune foundation models available through the BioNeMo Framework, including the Geneformer model for gene expression analysis. The organoids were derived from patients with mutations in the GBA1 gene, the most common genetic risk factor for Parkinson’s disease.

BrainStorm is also collaborating with the NVIDIA BioNeMo team to help optimize open-source access to the Geneformer model.

Accelerating Drug Discovery Research

With its proprietary platform, BrainStorm can mirror human brain biology and simulate how different treatments might work in a patient’s brain.

“This can be done thousands of times, much quicker and much cheaper than can be done in a wet lab — so we can narrow down therapeutic options very quickly,” Gosztyla said. “Then we can go in with organoids and test the subset of drugs the AI model thinks will be effective. Only after it gets through those steps will we actually test these drugs in humans.”

View of an organoid using Fluorescence Imaging Plate Reader, or FLIPR — a technique used to study the effect of compounds on cells during drug screening.

This technology led to the discovery that Donepezil, a drug prescribed for Alzheimer’s disease, could also be effective in treating Rett syndrome, a rare genetic neurodevelopmental disorder. Within nine months, the BrainStorm team was able to go from organoid screening to applying for a phase 2 clinical trial of the drug in Rett patients. This application was recently cleared by the U.S. Food and Drug Administration.

BrainStorm also plans to develop multimodal AI models that integrate data from cell sequencing, cell imaging, EEG scans and more.

“You need high-quality, multimodal input data to design the right drugs,” said Yin. “AI models trained on this data will help us understand disease better, find more effective drug candidates and, eventually, find prognostic biomarkers for specific patients that enable the delivery of precision medicine.”

The company’s next project is an initiative with the CURE5 Foundation to conduct the most comprehensive repurposed drug screen to date for CDKL5 Deficiency Disorder, another rare genetic neurodevelopmental disorder.

“Rare disease research is transforming from a high-risk niche to a dynamic frontier,” said Fremeau. “The integration of BrainStorm’s AI-powered organoid technology with NVIDIA accelerated computing resources and the NVIDIA BioNeMo platform is dramatically accelerating the pace of innovation while reducing the cost — so what once required a decade and billions of dollars can now be investigated with significantly leaner resources in a matter of months.”

Get started with NVIDIA BioNeMo for AI-accelerated drug discovery.

Read More

Chill Factor: NVIDIA Blackwell Platform Boosts Water Efficiency by Over 300x

Chill Factor: NVIDIA Blackwell Platform Boosts Water Efficiency by Over 300x

Traditionally, data centers have relied on air cooling — where mechanical chillers circulate chilled air to absorb heat from servers, helping them maintain optimal conditions. But as AI models increase in size, and the use of AI reasoning models rises, maintaining those optimal conditions is not only getting harder and more expensive — but more energy-intensive.

While data centers once operated at 20 kW per rack, today’s hyperscale facilities can support over 135 kW per rack, making it an order of magnitude harder to dissipate the heat generated by high-density racks. To keep AI servers running at peak performance, a new approach is needed for efficiency and scalability.

One key solution is liquid cooling — by reducing dependence on chillers and enabling more efficient heat rejection, liquid cooling is driving the next generation of high-performance, energy-efficient AI infrastructure.

The NVIDIA GB200 NVL72 and the NVIDIA GB300 NVL72 are rack-scale, liquid-cooled systems designed to handle the demanding tasks of trillion-parameter large language model inference. Their architecture is also specifically optimized for test-time scaling accuracy and performance, making it an ideal choice for running AI reasoning models while efficiently managing energy costs and heat.

Liquid-cooled NVIDIA Blackwell compute tray.

Driving Unprecedented Water Efficiency and Cost Savings in AI Data Centers

Historically, cooling alone has accounted for up to 40% of a data center’s electricity consumption, making it one of the most significant areas where efficiency improvements can drive down both operational expenses and energy demands.

Liquid cooling helps mitigate costs and energy use by capturing heat directly at the source. Instead of relying on air as an intermediary, direct-to-chip liquid cooling transfers heat in a technology cooling system loop. That heat is then cycled through a coolant distribution unit via liquid-to-liquid heat exchanger, and ultimately transferred to a facility cooling loop. Because of the higher efficiency of this heat transfer, data centers and AI factories can operate effectively with warmer water temperatures — reducing or eliminating the need for mechanical chillers in a wide range of climates.

The NVIDIA GB200 NVL72 rack-scale, liquid-cooled system, built on the NVIDIA Blackwell platform, offers exceptional performance while balancing energy costs and heat. It packs unprecedented compute density into each server rack, delivering 40x higher revenue potential, 30x higher throughput, 25x more energy efficiency and 300x more water efficiency than traditional air-cooled architectures. Newer NVIDIA GB300 NVL72 systems built on the Blackwell Ultra platform boast a 50x higher revenue potential and 35x higher throughput with 30x more energy efficiency.

Data centers spend an estimated $1.9 million to $2.8 million per megawatt (MW) per year, of which nearly $500,000 is spent annually on cooling-related energy and water costs. By deploying the liquid-cooled GB200 NVL72 system, hyperscale data centers and AI factories can achieve up to 25x cost savings, leading to over $4 million in annual savings for a 50 MW hyperscale data center.

For data center and AI factory operators, this means lower operational costs, enhanced energy efficiency metrics and a future-proof infrastructure that scales AI workloads efficiently — without the unsustainable water footprint of legacy cooling methods.

Moving Heat Outside the Data Center

As compute density rises and AI workloads drive unprecedented thermal loads, data centers and AI factories must rethink how they remove heat from their infrastructure. The traditional methods of heat rejection that supported predictable CPU-based scaling are no longer sufficient on their own. Today, there are multiple options for moving heat outside the facility, but four major categories dominate current and emerging deployments.

Key Cooling Methods in a Changing Landscape

  • Mechanical Chillers: Mechanical chillers use a vapor compression cycle to cool water, which is then circulated through the data center to absorb heat. These systems are typically air-cooled or water-cooled, with the latter often paired with cooling towers to reject heat. While chillers are reliable and effective across diverse climates, they are also highly energy-intensive. In AI-scale facilities where power consumption and sustainability are top priorities, reliance on chillers can significantly impact both operational costs and carbon footprint.
  • Evaporative Cooling: Evaporative cooling uses the evaporation of water to absorb and remove heat. This can be achieved through direct or indirect systems, or hybrid designs. These systems are much more energy-efficient than chillers but come with high water consumption. In large facilities, they can consume millions of gallons of water per megawatt annually. Their performance is also climate-dependent, making them less effective in humid or water-restricted regions.
  • Dry Coolers: Dry coolers remove heat by transferring it from a closed liquid loop to the ambient air using large finned coils, much like an automotive radiator. These systems don’t rely on water and are ideal for facilities aiming to reduce water usage or operate in dry climates. However, their effectiveness depends heavily on the temperature of the surrounding air. In warmer environments, they may struggle to keep up with high-density cooling demands unless paired with liquid-cooled IT systems that can tolerate higher operating temperatures.
  • Pumped Refrigerant Systems: Pumped refrigerant systems use liquid refrigerants to move heat from the data center to outdoor heat exchangers. Unlike chillers, these systems don’t rely on large compressors inside the facility and they operate without the use of water. This method offers a thermodynamically efficient, compact and scalable solution that works especially well for edge deployments and water-constrained environments. Proper refrigerant handling and monitoring are required, but the benefits in power and water savings are significant.

Each of these methods offers different advantages depending on factors like climate, rack density, facility design and sustainability goals. As liquid cooling becomes more common and servers are designed to operate with warmer water, the door opens to more efficient and environmentally friendly cooling strategies — reducing both energy and water use while enabling higher compute performance.

Optimizing Data Centers for AI Infrastructure

As AI workloads grow exponentially, operators are reimagining data center design with infrastructure built specifically for high-performance AI and energy efficiency. Whether they’re transforming their entire setup into dedicated AI factories or upgrading modular components, optimizing inference performance is crucial for managing costs and operational efficiency.

To get the best performance, high compute capacity GPUs aren’t enough — they need to be able to communicate with each other at lightning speed.

NVIDIA NVLink boosts communication, enabling GPUs to operate as a massive, tightly integrated processing unit for maximum performance with a full-rack power density of 120 kW. This tight, high-speed communication is crucial for today’s AI tasks, where every second saved on transferring data can mean more tokens per second and more efficient AI models.

Traditional air cooling struggles at these power levels. To keep up, data center air would need to be either cooled to below-freezing temperatures or flow at near-gale speeds to carry the heat away, making it increasingly impractical to cool dense racks with air alone.

At nearly 1,000x the density of air, liquid cooling excels at carrying heat away thanks to its superior heat capacity and thermal conductivity. By efficiently transferring heat away from high-performance GPUs, liquid cooling reduces reliance on energy-intensive and noisy cooling fans, allowing more power to be allocated to computation rather than cooling overhead.

Liquid Cooling in Action

Innovators across the industry are leveraging liquid cooling to slash energy costs, improve density and drive AI efficiency.

Cloud service providers are also adopting cutting-edge cooling and power innovations. Next-generation AWS data centers, featuring jointly developed liquid cooling solutions, increase compute power by 12% while reducing energy consumption by up to 46% — all while maintaining water efficiency.

Cooling the AI Infrastructure of the Future

As AI continues to push the limits of computational scale, innovations in cooling will be essential to meeting the thermal management challenges of the post-Moore’s law era.

NVIDIA is leading this transformation through initiatives like the COOLERCHIPS program, a U.S. Department of Energy-backed effort to develop modular data centers with next-generation cooling systems that are projected to reduce costs by at least 5% and improve efficiency by 20% over traditional air-cooled designs.

Looking ahead, data centers must evolve not only to support AI’s growing demands but do so sustainably — maximizing energy and water efficiency while minimizing environmental impact. By embracing high-density architectures and advanced liquid cooling, the industry is paving the way for a more efficient AI-powered future.

Learn more about breakthrough solutions for data center energy and water efficiency presented at NVIDIA GTC 2025 and discover how accelerated computing is driving a more efficient future with NVIDIA Blackwell.

Read More

Keeping AI on the Planet: NVIDIA Technologies Make Every Day About Earth Day

Keeping AI on the Planet: NVIDIA Technologies Make Every Day About Earth Day

Whether at sea, land or in the sky — even outer space — NVIDIA technology is helping research scientists and developers alike explore and understand oceans, wildlife, the climate and far out existential risks like asteroids.

These increasingly intelligent developments are helping to analyze environmental pollutants, damage to habitats and natural disaster risks at an accelerated pace. This, in turn, enables partnerships with local governments to take climate mitigation steps like pollution prevention and proactive planting.

Sailing the Seas of AI

Amphitrite, based in France, uses satellite data with AI to simulate and predict ocean currents and weather. Its AI models, driven by the NVIDIA AI and Earth-2 platforms, offer insights for positioning vessels to best harness the power of ocean currents. This helps determine when it’s best to travel, as well as the optimal course, reducing travel times, fuel consumption and carbon emissions. Amphitrite is a member of the NVIDIA Inception program for cutting-edge startups.

Watching Over Wildlife With AI

Munich, Germany-based OroraTech monitors animal poaching and wildfires with NVIDIA CUDA and Jetson. The NVIDIA Inception program member uses the EarthRanger platform to offer a wildfire detection and monitoring service that uses satellite imagery and AI to safeguard the environment and prevent poaching.

Keeping AI on the Weather

Weather agencies and climate scientists worldwide are using NVIDIA CorrDiff, a generative AI weather model enabling kilometer-scale forecasts of wind, temperature and precipitation type and amount. CorrDiff is part of the NVIDIA Earth-2 platform for simulating weather and climate conditions. It’s available as an easy-to-deploy NVIDIA NIM microservice.

In another climate effort, NVIDIA Research announced a new generative AI model, called StormCast, for reliable weather prediction at a scale larger than storms.

The model, outlined in a paper, can help with disaster and mitigation planning, saving lives.

Avoiding Mass Extinction Events

Researchers reported in Nature how a new method was able to spot 10-meter asteroids within the main asteroid belt between Mars and Jupiter. Such space rocks can range from bus-sized to several Costco stores in width and can deliver destruction to cities. The method used views of these asteroids captured by NASA’s James Webb Space Telescope (JWST) in previous research and was enabled by NVIDIA accelerated computing.

Boosting Energy Efficiency With Liquid-Cooled Blackwell

NVIDIA GB200 NVL72 rack-scale, liquid-cooled systems, built on the Blackwell platform, offer exceptional performance while balancing energy costs and heat. They deliver 40x higher revenue potential, 30x higher throughput, 25x more energy efficiency and 300x more water efficiency than air-cooled architectures. NVIDIA GB300 NVL72 systems, built on the Blackwell Ultra platform, offer 50x higher revenue potential and 35x higher throughput with 30x more energy efficiency.

Learn more about NVIDIA Earth-2 and NVIDIA Blackwell.

Read More