Fine-tune a BGE embedding model using synthetic data from Amazon Bedrock


Have you ever faced the challenge of obtaining high-quality data for fine-tuning your machine learning (ML) models? Generating synthetic data can provide a robust solution, especially when real-world data is scarce or sensitive. For instance, when developing a medical search engine, obtaining a large dataset of real user queries and relevant documents is often infeasible due to privacy concerns surrounding personal health information. However, synthetic data generation techniques can be employed to create realistic query-document pairs that resemble authentic user searches and relevant medical content, enabling the training of accurate retrieval models while preserving user privacy.

In this post, we demonstrate how to use Amazon Bedrock to create synthetic data, fine-tune a BAAI General Embeddings (BGE) model, and deploy it using Amazon SageMaker.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

You can find the full code associated with this post at the accompanying GitHub repository.

Solution overview

BGE stands for BAAI (Beijing Academy of Artificial Intelligence) General Embeddings. It is a family of embedding models with a BERT-like architecture, designed to produce high-quality embeddings from text data. The BGE models come in three sizes:

  • bge-large-en-v1.5: 1.34 GB, 1,024 embedding dimensions
  • bge-base-en-v1.5: 0.44 GB, 768 embedding dimensions
  • bge-small-en-v1.5: 0.13 GB, 384 embedding dimensions

To compare two pieces of text, BGE uses a bi-encoder architecture: each piece of text is processed independently through the same model to obtain its embedding, and the embeddings are then compared (typically with cosine similarity).
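
As a quick illustration, the following minimal sketch (using the sentence-transformers library, with an example query and passage that aren't part of this post's dataset) embeds two texts independently and scores them with cosine similarity:

    # Minimal bi-encoder sketch: embed two texts independently, then compare them.
    # Assumes the sentence-transformers package is installed.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("BAAI/bge-base-en-v1.5")

    query = "How do ride-sharing companies report revenue?"
    passage = "Lyft generates revenue primarily from its ridesharing marketplace."

    # Each text passes through the same encoder to produce a fixed-size embedding
    # (768 dimensions for bge-base-en-v1.5).
    query_emb, passage_emb = model.encode([query, passage], normalize_embeddings=True)

    # Cosine similarity between the two embeddings scores their relatedness.
    score = util.cos_sim(query_emb, passage_emb).item()
    print(f"Cosine similarity: {score:.3f}")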

Generating synthetic data can significantly enhance the performance of your models by providing ample, high-quality training data without the constraints of traditional data collection methods. This post guides you through generating synthetic data using Amazon Bedrock, fine-tuning a BGE model, evaluating its performance, and deploying it with SageMaker.

The high-level steps are as follows:

  1. Set up an Amazon SageMaker Studio environment with the necessary AWS Identity and Access Management (IAM) policies.
  2. Open SageMaker Studio.
  3. Create a Conda environment for dependencies.
  4. Generate synthetic data using Meta Llama 3 on Amazon Bedrock.
  5. Fine-tune the BGE embedding model with the generated data.
  6. Merge the model weights.
  7. Test the model locally.
  8. Evaluate and compare the fine-tuned model.
  9. Deploy the model using SageMaker and Hugging Face Text Embeddings Inference (TEI).
  10. Test the deployed model.

Prerequisites

First-time users need an AWS account and an IAM user role with the following permission policies attached:

  • AmazonSageMakerFullAccess
  • IAMFullAccess (or a custom IAM policy that grants iam:GetRole and iam:AttachRolePolicy permissions for the specific SageMaker execution role and the required policies: AmazonBedrockFullAccess, AmazonS3FullAccess, and AmazonEC2ContainerRegistryFullAccess)

Create a SageMaker Studio domain and user

Complete the following steps to create a SageMaker Studio domain and user:

  1. On the SageMaker console, under Admin configurations in the navigation pane, choose Domains.
  2. Choose Create domain.

SageMaker Domains

  3. Choose Set up for single user (Quick setup). Your domain, along with an IAM role with the AmazonSageMakerFullAccess policy, will be created automatically.
  4. After the domain is prepared, choose Add user.
  5. Provide a name for the new user profile and choose the IAM role (use the default role created during the quick setup in step 3).
  6. Choose Next on the next three screens, then choose Submit.

After you add the user profile, update the IAM role.

  1. On the Domain settings page of your newly created domain, locate the name of the IAM role created earlier (it should be similar to AmazonSageMaker-ExecutionRole-YYYYMMDDTHHMMSS).
  2. On the IAM console, choose Roles in the navigation pane and open that role.
  3. On the role details page, on the Add permissions drop-down menu, choose Attach policies.
  4. Select the following policies and choose Add permissions to attach them to the role:
    • AmazonBedrockFullAccess
    • AmazonS3FullAccess
    • AmazonEC2ContainerRegistryFullAccess

Open SageMaker Studio

To open SageMaker Studio, complete the following steps:

  1. On the SageMaker console, choose Studio in the navigation pane.
  2. On the SageMaker Studio landing page, select the newly created user profile and choose Open Studio.
  3. After you launch SageMaker Studio, choose JupyterLab.
  4. In the top-right corner, choose Create JupyterLab Space.
  5. Give the space a name, such as embedding-finetuning, and choose Create space.
  6. Change the instance type to ml.g5.2xlarge and the Storage (GB) value to 100.

You may need to request a service quota increase before being able to select the ml.g5.2xlarge instance type.

  7. Choose Run space and wait a few minutes for the space to start.
  8. Choose Open JupyterLab.

Set up a Conda environment in SageMaker Studio

Next, you create a Conda environment with the necessary dependencies for running the code in this post. You can use the environment.yml file provided in the code repository to create this.

  1. Open the previous terminal, or choose Terminal in Launcher to open a new one.
  2. Clone the code repository and enter its directory:
    # TODO: replace this with final public version
    git clone https://gitlab.aws.dev/austinmw/Embedding-Finetuning-Blog
    cd Embedding-Finetuning-Blog

  3. Create the Conda environment by running the following command (this step will take several minutes to complete):
    conda env create -f environment.yml

  4. Activate the environment by running the following commands one by one:
    conda init
    source ~/.bashrc
    conda activate ft-embedding-blog

  5. Add the newly created Conda environment to Jupyter:
    python -m ipykernel install --user --name=ft-embedding-blog

  6. From the Launcher, open the repository folder named embedding-finetuning-blog and open the file Embedding Blog.ipynb.
  7. On the Kernel drop down menu in the notebook, choose Change Kernel, then choose ft-embedding-blog.

You may need to refresh your browser if the kernel doesn’t show up as available.

Now you have a Jupyter notebook that includes the necessary dependencies required to run the code in this post.

Generate synthetic data using Amazon Bedrock

We start by adapting LlamaIndex’s embedding model fine-tuning guide to use Amazon Bedrock to generate synthetic data for fine-tuning. We use the sample data and evaluation procedures outlined in this guide.

To generate synthetic data, we use the Meta Llama3-70B-Instruct model on Amazon Bedrock, which offers a strong balance of price and performance. The process involves the following steps:

  1. Download the training and validation data, which consists of the Uber and Lyft 10-K filings in PDF form. These PDFs will serve as the source for generating document chunks.
  2. Parse the PDFs into plain text chunks using LlamaIndex functionality. The Lyft corpus will be used as the training dataset, and the Uber corpus will be used as the evaluation dataset.
  3. Clean the parsed data by removing samples that are too short or contain special characters that could cause errors during training.
  4. Set up the large language model (LLM) Meta Llama3-70B-Instruct and define a prompt template for generating questions based on the context provided by the document chunks.
  5. Use the LLM to generate synthetic question-answer pairs for each document chunk. The document chunks serve as the context, and the generated questions are designed to be answerable using the information within the corresponding chunk.
  6. Save the generated synthetic data in JSONL format, where each line is a dictionary containing the query (generated question), positive passages (the document chunk used as context), and negative passages (if available). This format is compatible with the FlagEmbedding library, which will be used for fine-tuning the BGE model.

By generating synthetic question-answer pairs using the Meta Llama3-70B-Instruct model and the document chunks from the Uber and Lyft datasets, you create a high-quality dataset that can be used to fine-tune the BGE embedding model for improved performance in retrieval tasks.
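
The following is a simplified sketch of the core generation step, assuming boto3 access to Amazon Bedrock in a Region where Meta Llama3-70B-Instruct is available; the prompt and the chunks list are illustrative placeholders rather than the exact prompt template used in the accompanying notebook:

    # Simplified sketch of the synthetic-question generation step.
    # Assumes boto3 access to Amazon Bedrock in a Region where the model is available;
    # `chunks` stands in for the plain-text passages parsed from the 10-K PDFs.
    import json

    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
    MODEL_ID = "meta.llama3-70b-instruct-v1:0"

    PROMPT = (
        "Context information is below.\n---------------------\n{context}\n"
        "---------------------\nGiven the context and no prior knowledge, generate one "
        "question that can be answered using only this context. Return only the question."
    )

    chunks = ["Lyft's revenue grew year over year, driven by ..."]  # placeholder chunks

    with open("train_data.jsonl", "w") as f:
        for chunk in chunks:
            response = bedrock.converse(
                modelId=MODEL_ID,
                messages=[{"role": "user", "content": [{"text": PROMPT.format(context=chunk)}]}],
                inferenceConfig={"maxTokens": 256, "temperature": 0.7},
            )
            question = response["output"]["message"]["content"][0]["text"].strip()
            # FlagEmbedding expects one JSON object per line with "query" and "pos" keys;
            # "neg" entries are added later by hard-negative mining.
            f.write(json.dumps({"query": question, "pos": [chunk]}) + "\n")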

Fine-tune the BGE embedding model

For fine-tuning, you can use the bge-base-en-v1.5 model, which offers a good balance between performance and resource requirements. You define retrieval instructions for the query to enhance the model’s performance during fine-tuning and inference.

Before fine-tuning, generate hard negatives using a predefined script available from the FlagEmbedding library. Hard negative mining is an essential step that helps improve the model’s ability to distinguish between similar but not identical text pairs. By including hard negatives in the training data, you encourage the model to learn more discriminative embeddings.

You then initiate the fine-tuning process using the FlagEmbedding library, which trains the model with InfoNCE contrastive loss. The library provides a convenient way to fine-tune the BGE model using the synthetic data you generated earlier. During fine-tuning, the model learns to produce embeddings that bring similar query-document pairs closer together in the embedding space while pushing dissimilar pairs further apart.
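
The following sketch shows how the two FlagEmbedding steps might be launched from Python; the module paths and arguments follow the FlagEmbedding 1.x examples, and the file names and hyperparameters are illustrative, so verify them against the version you install and the accompanying notebook:

    # Sketch of hard-negative mining followed by contrastive (InfoNCE) fine-tuning
    # with the FlagEmbedding v1.x training scripts. File names and hyperparameters
    # are illustrative.
    import subprocess

    # 1) Mine hard negatives for each query using the base model's own retrievals.
    subprocess.run([
        "python", "-m", "FlagEmbedding.baai_general_embedding.finetune.hn_mine",
        "--model_name_or_path", "BAAI/bge-base-en-v1.5",
        "--input_file", "train_data.jsonl",
        "--output_file", "train_data_minedHN.jsonl",
        "--range_for_sampling", "2-200",
        "--negative_number", "15",
    ], check=True)

    # 2) Fine-tune on the mined data; the query instruction is the standard BGE prefix.
    subprocess.run([
        "torchrun", "--nproc_per_node", "1",
        "-m", "FlagEmbedding.baai_general_embedding.finetune.run",
        "--output_dir", "bge-base-finetuned",
        "--model_name_or_path", "BAAI/bge-base-en-v1.5",
        "--train_data", "train_data_minedHN.jsonl",
        "--learning_rate", "1e-5",
        "--fp16",
        "--num_train_epochs", "5",
        "--per_device_train_batch_size", "8",
        "--train_group_size", "8",
        "--query_max_len", "64",
        "--passage_max_len", "512",
        "--query_instruction_for_retrieval", "Represent this sentence for searching relevant passages: ",
    ], check=True)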

Merge the model weights

After fine-tuning, you can use the LM-Cocktail library to merge the fine-tuned weights with the original weights of the BGE model. LM-Cocktail creates new model parameters by calculating a weighted average of the parameters from two or more models. This process helps mitigate the problem of catastrophic forgetting, where the model might lose its previously learned knowledge during fine-tuning.

By merging the fine-tuned weights with the original weights, you obtain a model that benefits from the specialized knowledge acquired during fine-tuning while retaining the general language understanding capabilities of the original model. This approach often leads to improved performance compared to using either the fine-tuned or the original model alone.
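
A minimal sketch of the merge step follows, assuming the fine-tuned model was saved to a local directory; the 50/50 weighting is illustrative and worth tuning on your validation set:

    # Sketch of merging the fine-tuned weights back into the base BGE model with LM-Cocktail.
    from LM_Cocktail import mix_models

    merged_model = mix_models(
        model_names_or_paths=["BAAI/bge-base-en-v1.5", "bge-base-finetuned"],
        model_type="encoder",   # BGE is an encoder-only (BERT-like) model
        weights=[0.5, 0.5],     # weighted average of the two parameter sets
        output_path="bge-base-merged",
    )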

Test the model locally

Before you evaluate the fine-tuned BGE model on the validation set, it’s a good idea to perform a quick local test to make sure the model behaves as expected. You can do this by comparing the cosine similarity scores for pairs of queries and documents that you expect to have high similarity and those that you expect to have low similarity.

To test the model, prepare two small sets of document-query pairs:

  • Similar document-query pairs – These are pairs where the document and query are closely related and should have a high cosine similarity score
  • Different document-query pairs – These are pairs where the document and query are not closely related and should have a lower cosine similarity score

Then use the fine-tuned BGE model to generate embeddings for each document and query in both sets of pairs. By calculating the cosine similarity between the document and query embeddings for each pair, you can assess how well the model captures the semantic similarity between them.

When comparing the cosine similarity scores, we expect to see higher scores for the similar document-query pairs compared to the different document-query pairs. This would indicate that the fine-tuned model is able to effectively distinguish between similar and dissimilar pairs, assigning higher similarity scores to the pairs that are more closely related.
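
The following sketch shows one way to run this check with FlagEmbedding's FlagModel, assuming the merged model directory from the previous step; the example query and passages are made up for illustration:

    # Quick local sanity check: a related query/passage pair should score higher
    # than an unrelated one. The example texts are made up for illustration.
    from FlagEmbedding import FlagModel

    model = FlagModel(
        "bge-base-merged",
        query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
        use_fp16=True,
    )

    query = "What risks does Lyft identify related to driver classification?"
    passages = [
        "Reclassification of drivers as employees could adversely affect our business.",  # related
        "The company repurchased shares under its buyback program during the year.",      # unrelated
    ]

    q_emb = model.encode_queries([query])   # the query instruction is prepended automatically
    p_emb = model.encode(passages)          # passages are encoded without an instruction
    scores = q_emb @ p_emb.T                # embeddings are normalized by default, so this is cosine
    print(scores)                           # expect the first (related) score to exceed the second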

If the local testing results align with your expectations, it provides a quick confirmation that the fine-tuned model is performing as intended. You can then move on to a more comprehensive evaluation of the model’s performance using the validation set.

However, if the local testing results are not satisfactory, it may be necessary to investigate further and identify potential issues with the fine-tuning process or the model architecture before proceeding to the evaluation step.

This local testing step serves as a quick sanity check to make sure the fine-tuned model is behaving reasonably before investing time and resources in a full evaluation on the validation set. It can help catch obvious issues early on and provide confidence in the model’s performance before moving forward with more extensive testing.

Evaluate the model

We evaluate the performance of the fine-tuned BGE model using two procedures:

  • Hit rate – This straightforward metric assesses the model’s performance by checking if the retrieved results for a given query include the relevant document. You calculate the hit rate by taking each query-document pair from the validation set, retrieving the top-K documents using the fine-tuned model, and verifying if the relevant document is present in the retrieved results (a minimal sketch follows this list).
  • InformationRetrievalEvaluator – This procedure, provided by the sentence-transformers library, offers a more comprehensive suite of metrics for detailed performance analysis. It evaluates the model on various information retrieval tasks and provides metrics such as Mean Average Precision (MAP), Normalized Discounted Cumulative Gain (NDCG), and more. However, InformationRetrievalEvaluator is only compatible with sentence-transformers models.
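
The following is a minimal hit-rate sketch, assuming a hypothetical list of validation (query, relevant passage) pairs and a passage corpus; it is not the exact evaluation code from the accompanying notebook:

    # Minimal top-K hit-rate sketch. `corpus` and `val_pairs` are placeholders for
    # the validation passages and (query, relevant passage) pairs.
    import numpy as np
    from FlagEmbedding import FlagModel

    model = FlagModel(
        "bge-base-merged",
        query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
    )

    corpus = ["passage one ...", "passage two ..."]              # placeholder corpus
    val_pairs = [("example question ...", "passage one ...")]    # placeholder pairs
    K = 5

    corpus_emb = model.encode(corpus)                            # shape: (num_passages, dim)
    hits = 0
    for query, relevant in val_pairs:
        q_emb = model.encode_queries([query])[0]                 # shape: (dim,)
        scores = corpus_emb @ q_emb                              # cosine scores (normalized embeddings)
        top_k = [corpus[i] for i in np.argsort(-scores)[:K]]
        hits += int(relevant in top_k)

    print(f"Hit rate@{K}: {hits / len(val_pairs):.3f}")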

To get a better understanding of the fine-tuned model’s performance, you can compare it against the base (non-fine-tuned) BGE model and the Amazon Titan Text Embeddings V2 model on Amazon Bedrock. This comparison helps you assess the effectiveness of the fine-tuning process and determine if the fine-tuned model outperforms the baseline models.

By evaluating the model using both the hit rate and InformationRetrievalEvaluator (when applicable), you gain insights into its performance on different aspects of retrieval tasks and can make informed decisions about its suitability for your specific use case.

Deploy the model

To deploy the fine-tuned BGE model, you can use the Hugging Face Text Embeddings Inference (TEI) container on SageMaker. TEI is a high-performance toolkit for deploying and serving popular text embeddings and sequence classification models, including FlagEmbedding models. It provides a fast and efficient serving framework for your fine-tuned model on SageMaker.

The deployment process involves the following steps:

  1. Upload the fine-tuned model to the Hugging Face Hub or Amazon Simple Storage Service (Amazon S3).
  2. Retrieve the new Hugging Face Embedding Container image URI.
  3. Deploy the model to SageMaker.
  4. Optionally, set up auto scaling for the endpoint to automatically adjust the number of instances based on the incoming request traffic. Auto scaling helps make sure the endpoint can handle varying workloads efficiently.

By deploying the fine-tuned BGE model using TEI on SageMaker, you can integrate it into your applications and use it for efficient text embedding and retrieval tasks. The deployment process outlined in this post provides a scalable and manageable solution for serving the model in production environments.
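
The following sketch outlines the deployment with the SageMaker Python SDK, assuming the merged model has been pushed to a hypothetical Hugging Face Hub repository; the instance type and endpoint name are also assumptions:

    # Sketch of deploying the merged model with the TEI container on SageMaker.
    # The Hub repository name and instance type are assumptions; HF_MODEL_ID can
    # also point at model artifacts uploaded to Amazon S3.
    import sagemaker
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    role = sagemaker.get_execution_role()

    # Retrieve the Hugging Face Text Embeddings Inference (TEI) container image URI.
    image_uri = get_huggingface_llm_image_uri("huggingface-tei")

    model = HuggingFaceModel(
        role=role,
        image_uri=image_uri,
        env={"HF_MODEL_ID": "your-hf-username/bge-base-merged"},  # hypothetical Hub repo
    )

    tei_endpoint = model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.xlarge",
        endpoint_name="bge-finetuned-tei",
    )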

Test the deployed model

After you deploy the fine-tuned BGE model using TEI on SageMaker, you can test the model by sending requests to the SageMaker endpoint and evaluating the model’s responses.

To test the deployed model, you can invoke it with or without instructions. If the model was fine-tuned with instructions for queries or passages, it’s important to use the same instructions when performing inference. In this case, instructions were used for queries but not for passages, so follow the same approach during testing.

To test the deployed model, you send queries to the SageMaker endpoint using the tei_endpoint.predict() method provided by the SageMaker SDK. You prepare a batch of queries, optionally prepending any instructions used during fine-tuning, and pass them to the predict() method. The model generates embeddings for each query, which are returned in the response.
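
The following sketch, assuming the tei_endpoint predictor returned by the deployment above, sends a small batch of illustrative queries with the query instruction prepended and reports the latency:

    # Invoke the TEI endpoint through the predictor returned by model.deploy().
    # TEI accepts a JSON body with an "inputs" field and returns one embedding per input.
    import time

    import numpy as np

    instruction = "Represent this sentence for searching relevant passages: "
    queries = [
        "What factors affected Uber's operating expenses?",
        "How does Lyft describe competition in its filings?",
    ]

    start = time.time()
    embeddings = np.array(tei_endpoint.predict({"inputs": [instruction + q for q in queries]}))
    print(embeddings.shape, f"latency: {time.time() - start:.3f}s")

    # Cosine similarity between the two query embeddings (normalizing is a no-op
    # if the vectors are already unit length).
    a, b = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    print("Cosine similarity:", float(a @ b))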

By examining the generated embeddings, you can assess the quality and relevance of the model’s output. You can compare the embeddings of similar queries and verify that they have high cosine similarity scores, indicating that the model accurately captures the semantic meaning of the queries.

Additionally, you can measure the average response time of the deployed model to evaluate its performance and make sure it adheres to the required latency constraints for your application.

Integrate the model with LangChain

Additionally, you can integrate the deployed BGE model with LangChain, a library for building applications with language models. To do this, you create a custom content handler that inherits from LangChain’s EmbeddingsContentHandler. This handler implements methods to convert input data into a format compatible with the SageMaker endpoint and to convert the endpoint’s output into embeddings.

You then create a SagemakerEndpointEmbeddings instance, specifying the endpoint name, SageMaker runtime client, and custom content handler. This instance wraps the deployed BGE model and integrates it with LangChain workflows.

Using the embed_documents method of the SagemakerEndpointEmbeddings instance, you generate embeddings for documents or queries, which can be used for downstream tasks like similarity search, clustering, or classification.
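
The following is a minimal sketch of such an integration, using the SagemakerEndpointEmbeddings and EmbeddingsContentHandler classes from langchain_community and assuming the endpoint name from the deployment above (the Region is passed here instead of an explicit runtime client):

    # Minimal LangChain integration sketch for the TEI endpoint.
    import json
    from typing import Dict, List

    from langchain_community.embeddings import SagemakerEndpointEmbeddings
    from langchain_community.embeddings.sagemaker_endpoint import EmbeddingsContentHandler


    class TEIContentHandler(EmbeddingsContentHandler):
        content_type = "application/json"
        accepts = "application/json"

        def transform_input(self, inputs: List[str], model_kwargs: Dict) -> bytes:
            # TEI expects {"inputs": [...]} as the request body.
            return json.dumps({"inputs": inputs, **model_kwargs}).encode("utf-8")

        def transform_output(self, output: bytes) -> List[List[float]]:
            # TEI returns a list of embedding vectors, one per input string.
            return json.loads(output.read().decode("utf-8"))


    embeddings = SagemakerEndpointEmbeddings(
        endpoint_name="bge-finetuned-tei",
        region_name="us-east-1",
        content_handler=TEIContentHandler(),
    )

    vectors = embeddings.embed_documents(["Lyft's 10-K discusses insurance reserves."])
    print(len(vectors), len(vectors[0]))  # 1 document, one value per embedding dimension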

Integrating the deployed BGE model with LangChain allows you to take advantage of LangChain’s features and abstractions to build sophisticated language model applications that utilize the fine-tuned BGE embeddings. Testing the integration makes sure the model performs as expected and can be seamlessly incorporated into real-world workflows and applications.

Clean up

After you’re finished with the deployed endpoint, don’t forget to delete it to prevent unexpected SageMaker costs.

Conclusion

In this post, we walked through the process of fine-tuning a BGE embedding model using synthetic data generated from Amazon Bedrock. We covered key steps, including generating high-quality synthetic data, fine-tuning the model, evaluating performance, and deploying the optimized model using Amazon SageMaker.

By using synthetic data and advanced fine-tuning techniques like hard negative mining and model merging, you can significantly enhance the performance of embedding models for your specific use cases. This approach is especially valuable when real-world data is limited or difficult to obtain.

To get started, we encourage you to experiment with the code and techniques demonstrated in this post. Adapt them to your own datasets and models to unlock performance improvements in your applications. You can find all the code used in this post in our GitHub repository.

About the Authors

Austin Welch is a Senior Applied Scientist at the Amazon Web Services Generative AI Innovation Center.

Bryan Yost is a Principal Deep Learning Architect at the Amazon Web Services Generative AI Innovation Center.

Mehdi Noori is a Senior Applied Scientist at the Amazon Web Services Generative AI Innovation Center.


Boost post-call analytics with Amazon Q in QuickSight


In today’s customer-centric business world, providing exceptional customer service is crucial for success. Contact centers play a vital role in shaping customer experiences, and analyzing post-call interactions can provide valuable insights to improve agent performance, identify areas for improvement, and enhance overall customer satisfaction.

Amazon Web Services (AWS) has AI and generative AI solutions that you can integrate into your existing contact centers to improve post-call analysis.

Post Call Analytics (PCA) is a solution that does most of the heavy lifting of processing call recordings from your existing contact center end to end. PCA provides actionable insights to spot emerging trends, identify agent coaching opportunities, and assess the general sentiment of calls.

Complementing PCA, Live Call Analytics with Agent Assist (LCA) provides AI and generative AI capabilities for real-time analysis while calls are in progress.

In this post, we show you how to unlock powerful post-call analytics and visualizations, empowering your organization to make data-driven decisions and drive continuous improvement.

Enrich and boost your post-call recording files with Amazon Q and Amazon QuickSight

Amazon QuickSight is a unified business intelligence (BI) service that provides modern interactive dashboards, natural language querying, paginated reports, machine learning (ML) insights, and embedded analytics at scale.

Amazon Q is a powerful, new capability in Amazon QuickSight that you can use to ask questions about your data using natural language and share presentation-ready data stories to communicate insights to others.

These capabilities can significantly enhance your post-call analytics workflow, making it easier to derive insights from your contact center data.

To get started using Amazon Q in QuickSight, you first need QuickSight Enterprise Edition, which you can sign up for by following this process.

Amazon Q in QuickSight provides users a suite of new generative BI capabilities.

Depending on the user’s role, they will have access to different sets of capabilities. For instance, a Reader Pro user can create data stories and executive summaries. An Author Pro user can additionally create topics and build dashboards using natural language. The following figure shows the available roles and their capabilities.

The following are some key ways that Amazon Q in QuickSight can boost your post-call analytics productivity.

  • Quick insights: Instead of spending time building complex dashboards and visualizations, users can quickly get answers to their questions about call volumes, agent performance, customer sentiment, and more. Amazon Q in QuickSight understands the context of your data and generates relevant visualizations on the fly.
  • One-time analysis: With Amazon Q in QuickSight, you can perform one-time analysis on your post-call data without any prior setup. Ask your questions using natural language, and QuickSight will provide the relevant insights, allowing you to explore your data in new ways and uncover hidden patterns.
  • Natural language interface: Amazon Q in QuickSight has a natural language interface that makes it accessible to non-technical users. Business analysts, managers, and executives can ask questions about post-call data without needing to learn complex querying languages or data visualization tools.
  • Contextual recommendations: Amazon Q in QuickSight can provide contextual recommendations based on your questions and the data available. For example, if you ask about customer sentiment, it might suggest analyzing sentiment by agent, call duration, or other relevant dimensions.
  • Automated dashboards: Amazon Q can help accelerate dashboard development based on your questions, saving you the effort of manually building and maintaining dashboards for post-call analytics.

By using Amazon Q in QuickSight, your organization can streamline post-call analytics, enabling faster insights, better decision-making, and improved customer experiences. With its natural language interface and automated visualizations, Amazon Q empowers users at all levels to explore and understand post-call data more efficiently.

Let’s dive into a couple of the capabilities available to Pro users, such as building executive summaries and data stories for post-call analytics.

Executive summaries

When a user is just starting to explore a new dashboard that has been shared with them, it often takes time to familiarize themselves with what is contained in the dashboard and where they should be looking for key insights. Executive summaries are a great way to use AI to highlight key insights and draw the user’s attention to specific visuals that contain metrics worth looking into further.

You can build an executive summary on any dashboard that you have access to, such as the dashboard shown in the following figure.

As shown in the following figure, you can change to another sheet, or even apply filters and regenerate the summary to get a fresh set of highlights for the filtered set of data.

The key benefits of using executive summaries include:

  • Automated insights: Amazon Q can automatically surface key insights and trends from your post-call data, making it possible to quickly create executive summaries that highlight the most important information.
  • Customized views: Executives can customize the visualizations and summaries generated by Amazon Q to align with their specific requirements and preferences, ensuring that the executive summaries are tailored to their needs.

Data storytelling

After a user has found an interesting trend or insight within a dashboard, they often need to communicate with others to drive a decision on what to do next. That decision might be made in a meeting or offline, but a presentation with key metrics and a structured narrative is often the basis for presenting the argument. This is exactly what data stories are designed to support. Rather than taking screenshots and pasting into a document or email, at which point you lose all governance and the data becomes static, stories in QuickSight are interactive, governed, and can be updated in a click.

To build a story, you always start from a dashboard. You then select visuals to support your story and input a prompt of what you want the story to be about. In the example, we want to generate a story to get insights and recommendations to improve call center operations (shown in the following figure).

As the following figure shows, after a few moments, you will see a fully structured story including visuals and insights, including recommendations for next steps.

The key benefits of using data stories include:

  1. Narrative exploration: With Amazon Q, you can explore your post-call data through a narrative approach, asking follow-up questions based on the insights generated. This allows you to build a compelling data story that uncovers the underlying patterns and trends in your contact center operations.
  2. Contextual recommendations: Amazon Q can provide contextual recommendations for additional visualizations or analyses based on your questions and the data available. These recommendations can help you uncover new perspectives and enrich your data storytelling.
  3. Automated narratives: Amazon Q can generate automated narratives that explain the visualizations and insights, making it easier to communicate the data story to stakeholders who might not be familiar with the technical details.
  4. Interactive presentations: By integrating Amazon Q with QuickSight presentation mode, you can create interactive data storytelling experiences. Executives and stakeholders can ask questions during the presentation, and Amazon Q will generate visualizations and insights in real time, enabling a more engaging and dynamic data storytelling experience.

Conclusion

By using the capabilities of Amazon Q in QuickSight, you can uncover valuable insights from your call recordings and post-call analytics data. These insights can then inform data-driven decisions to improve customer experiences, optimize contact center operations, and drive overall business performance.

In the era of customer-centricity, post-call analytics has become a game-changer for contact center operations. By using the power of Amazon Q and Amazon QuickSight on top of your PCA data, you can unlock a wealth of insights, optimize agent performance, and deliver exceptional customer experiences. Embrace the future of customer service with cutting-edge AI and analytics solutions from AWS, and stay ahead of the competition in today’s customer-centric landscape.


About the Author

Daniel Martinez is a Solutions Architect in Iberia Enterprise, part of the worldwide commercial sales organization (WWCS) at AWS.


Create a next generation chat assistant with Amazon Bedrock, Amazon Connect, Amazon Lex, LangChain, and WhatsApp


This post is co-written with Harrison Chase, Erick Friis, and Linda Ye from LangChain.

Generative AI is set to revolutionize user experiences over the next few years. A crucial step in that journey involves bringing in AI assistants that intelligently use tools to help customers navigate the digital landscape. In this post, we demonstrate how to deploy a contextual AI assistant. Built using Amazon Bedrock Knowledge Bases, Amazon Lex, and Amazon Connect, with WhatsApp as the channel, our solution provides users with a familiar and convenient interface.

Amazon Bedrock Knowledge Bases gives foundation models (FMs) and agents contextual information from your company’s private data sources for Retrieval Augmented Generation (RAG) to deliver more relevant, accurate, and customized responses. It also offers a powerful solution for organizations seeking to enhance their generative AI–powered applications. This feature simplifies the integration of domain-specific knowledge into conversational AI through native compatibility with Amazon Lex and Amazon Connect. By automating document ingestion, chunking, and embedding, it eliminates the need to manually set up complex vector databases or custom retrieval systems, significantly reducing development complexity and time.

The result is improved accuracy in FM responses, with reduced hallucinations due to grounding in verified data. Cost efficiency is achieved through minimized development resources and lower operational costs compared to maintaining custom knowledge management systems. The solution’s scalability quickly accommodates growing data volumes and user queries thanks to AWS serverless offerings. It also uses the robust security infrastructure of AWS to maintain data privacy and regulatory compliance. With the ability to continuously update and add to the knowledge base, AI applications stay current with the latest information. By choosing Amazon Bedrock Knowledge Bases, organizations can focus on creating value-added AI applications while AWS handles the intricacies of knowledge management and retrieval, enabling faster deployment of more accurate and capable AI solutions with less effort.

Prerequisites

To implement this solution, you need the following:

Solution overview

This solution uses several key AWS AI services to build and deploy the AI assistant:

  • Amazon Bedrock – Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI
  • Amazon Bedrock Knowledge Bases – Gives the AI assistant contextual information from a company’s private data sources
  • Amazon OpenSearch Service – Works as the vector store that is natively supported by Amazon Bedrock Knowledge Bases
  • Amazon Lex – Enables building the conversational interface for the AI assistant, including defining intents and slots
  • Amazon Connect – Powers the integration with WhatsApp to make the AI assistant available to users on the popular messaging application
  • AWS Lambda – Runs the code to integrate the services and implement the LangChain agent that forms the core logic of the AI assistant
  • Amazon API Gateway – Receives the incoming requests triggered from WhatsApp and routes the request to AWS Lambda for further processing
  • Amazon DynamoDB – Stores the messages received and generated to enable conversation memory
  • Amazon SNS – Handles the routing of the outgoing response from Amazon Connect
  • LangChain – Provides a powerful abstraction layer for building the LangChain agent that helps your FMs perform context-aware reasoning
  • LangSmith – Uploads agent traces to LangSmith for added observability, including debugging, monitoring, and testing and evaluation capabilities

The following diagram illustrates the architecture.

Solution Architecture

Flow description

Numbers in red on the right side of the diagram illustrate the data ingestion process:

  1. Upload files to Amazon Simple Storage Service (Amazon S3) Data Source
  2. New files trigger Lambda Function
  3. Lambda Function invokes sync operation of the knowledge base data source
  4. Amazon Bedrock Knowledge Bases fetches the data from Amazon S3, chunks it, and generates the embeddings through the embeddings model you selected
  5. Amazon Bedrock Knowledge Bases stores the embeddings in Amazon OpenSearch Service

Numbers on the left side of the diagram illustrate the messaging process:

  1. User initiates communication by sending a message through WhatsApp to the webhook hosted on Amazon API Gateway.
  2. Amazon API Gateway routes the incoming message to the inbound message handler, executed on AWS Lambda.
  3. The inbound message handler records the user’s contact details in Amazon DynamoDB.
  4. For first-time users, the inbound message handler establishes a new session in Amazon Connect and logs it in DynamoDB. For returning users, it resumes their existing Amazon Connect session.
  5. Amazon Connect forwards the user’s message to Amazon Lex for natural language processing.
  6. Amazon Lex triggers the LangChain AI assistant, implemented as a Lambda function.
  7. The LangChain AI assistant retrieves the conversation history from DynamoDB.
  8. Using Amazon Bedrock Knowledge Bases, the LangChain AI assistant fetches relevant contextual information.
  9. The LangChain AI assistant compiles a prompt, incorporating context data and the user’s query, and submits it to an FM running on Amazon Bedrock.
  10. Amazon Bedrock processes the input and returns the model’s response to the LangChain AI assistant.
  11. The LangChain AI assistant relays the model’s response back to Amazon Lex.
  12. Amazon Lex transmits the model’s response to Amazon Connect.
  13. Amazon Connect publishes the model’s response to Amazon Simple Notification Service (Amazon SNS).
  14. Amazon SNS triggers the outbound message handler Lambda function.
  15. The outbound message handler retrieves the relevant chat contact information from Amazon DynamoDB.
  16. The outbound message handler dispatches the response to the user through Meta’s WhatsApp API.

Deploying this AI assistant involves three main steps:

  1. Create the knowledge base using Amazon Bedrock Knowledge Bases and ingest relevant product documentation, FAQs, knowledge articles, and other useful data that the AI assistant can use to answer user questions. The data should cover the key use cases and topics the AI assistant will support.
  2. Create a LangChain agent that powers the AI assistant’s logic. The agent is implemented in a Lambda function and uses the knowledge base as its primary tool to look up information (a minimal sketch of this agent logic follows this list). Deploying the agent with other resources is automated through the provided AWS CloudFormation template. See the list of resources in the next section.
  3. Create the Amazon Connect instance and configure the WhatsApp integration. This allows users to chat with the AI assistant using WhatsApp, providing a familiar interface and enabling rich interactions such as images and buttons. WhatsApp’s popularity improves the accessibility of the AI assistant.
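
The following is a minimal sketch of that agent logic, assuming the langchain-aws package, a placeholder knowledge base ID, and Anthropic Claude 3 Sonnet as the FM; the deployed solution additionally persists chat history in DynamoDB and runs inside a Lambda function:

    # Minimal sketch of the agent's core logic: a Bedrock chat model with the
    # knowledge base exposed as a retrieval tool. The knowledge base ID and model
    # ID are placeholders.
    from langchain.agents import AgentExecutor, create_tool_calling_agent
    from langchain.tools.retriever import create_retriever_tool
    from langchain_aws import ChatBedrock
    from langchain_aws.retrievers import AmazonKnowledgeBasesRetriever
    from langchain_core.prompts import ChatPromptTemplate

    llm = ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0", region_name="us-east-1")

    retriever = AmazonKnowledgeBasesRetriever(
        knowledge_base_id="KBID12345",  # placeholder knowledge base ID
        retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 4}},
    )

    kb_tool = create_retriever_tool(
        retriever,
        name="company_knowledge_base",
        description="Searches the company's product documentation and FAQs.",
    )

    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful product assistant. Use the knowledge base tool to answer questions."),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ])

    agent = create_tool_calling_agent(llm, [kb_tool], prompt)
    executor = AgentExecutor(agent=agent, tools=[kb_tool])

    print(executor.invoke({"input": "What support plans are available?"})["output"])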

Solution deployment

We’ve provided pre-built AWS CloudFormation templates that deploy everything you need in your AWS account.

  1. Sign in to the AWS console if you aren’t already signed in.
  2. Choose the following Launch Stack button to open the CloudFormation console and create a new stack.
  3. Enter the following parameters:
    • StackName: Name your Stack, for example, WhatsAppAIStack
    • LangchainAPIKey: The API key generated through LangChain
Region: N. Virginia (us-east-1)
Deploy button: Launch Stack button
Template URL (use to upgrade an existing stack to a new release): YML
AWS CDK stack (customize as needed): GitHub
  4. Check the box to acknowledge that you are creating AWS Identity and Access Management (IAM) resources and choose Create Stack.
  5. Wait for the stack creation to complete, which takes approximately 10 minutes.
  6. Upload files to the data source (Amazon S3) created for WhatsApp. As soon as you upload a file, the data source will synchronize automatically.
  7. To test the agent, on the Amazon Lex console, select the most recently created assistant. Choose English, choose Test, and send it a message.

Create the Amazon Connect instance and integrate WhatsApp

Configure Amazon Connect to integrate with your WhatsApp business account and enable the WhatsApp channel for the AI assistant:

  1. Navigate to Amazon Connect in the AWS console. If you haven’t already, create an instance. Copy your Instance ARN under Distribution settings. You will need this information later to link your WhatsApp business account.
  2. Choose your instance, then in the navigation panel, choose Flows. Scroll down and select Amazon Lex. Select your bot and choose Add Amazon Lex Bot.
  3. In the navigation panel, choose Overview. Under Access Information, choose Log in for emergency access.
  4. On the Amazon Connect console, under Routing in the navigation panel, choose Flows. Choose Create flow. Drag a Get customer input block onto the flow. Select the block. Select Text-to-speech or chat text and add an intro message such as, “Hello, how can I help you today?” Scroll down and choose Amazon Lex, then select the Amazon Lex bot you created in step 2.
  5. After you save the block, add a Disconnect block. Drag the Entry arrow to the Get customer input block, and the Get customer input block’s arrow to the Disconnect block. Choose Publish.
  6. After it’s published, choose Show additional flow information at the bottom of the navigation panel. Copy the flow’s Amazon Resource Name (ARN), which you will need to deploy the WhatsApp integration. The following screenshot shows the Amazon Connect console with the flow.

Connect Flow Diagram

  7. Deploy the WhatsApp integration as detailed in Provide WhatsApp messaging as a channel with Amazon Connect.

Testing the solution

Interact with the AI assistant through WhatsApp, as shown in the following video.

Clean up

To avoid incurring ongoing costs, delete the resources after you are done:

  1. Delete the CloudFormation stacks.
  2. Delete the Amazon Connect instance.

Conclusion

This post showed you how to create an intelligent conversational AI assistant by integrating Amazon Bedrock, Amazon Lex, and Amazon Connect and deploying it on WhatsApp.

The solution ingests relevant data into a knowledge base on Amazon Bedrock Knowledge Bases, implements a LangChain agent that uses the knowledge base to answer questions, and makes the agent available to users through WhatsApp. This provides an accessible, intelligent AI assistant that can guide users through your company’s products and services.

Possible next steps include customizing the AI assistant for your specific use case, expanding the knowledge base, and analyzing conversation logs using LangSmith to identify issues, improve errors, and break down performance bottlenecks in your FM call sequence.


About the Authors

Kenton Blacutt is an AI Consultant within the GenAI Innovation Center. He works hands-on with customers helping them solve real-world business problems with cutting edge AWS technologies, especially Amazon Q and Bedrock. In his free time, he likes to travel, experiment with new AI techniques, and run an occasional marathon.

Lifeth Álvarez is a Cloud Application Architect at Amazon. She enjoys working closely with others, embracing teamwork and autonomous learning. She likes to develop creative and innovative solutions, applying special emphasis on details. She enjoys spending time with family and friends, reading, playing volleyball, and teaching others.

Mani Khanuja is a Tech Lead – Generative AI Specialist, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Linda Ye leads product marketing at LangChain. Previously, she worked at Sentry, Splunk, and Harness, driving product and business value for technical audiences, and studied economics at Stanford. In her free time, Linda enjoys writing half-baked novels, playing tennis, and reading.

Erick Friis, Founding Engineer at LangChain, currently spends most of his time on the open source side of the company. He’s an ex-founder with a passion for language-based applications. He spends his free time outdoors on skis or training for triathlons.

Harrison Chase is the CEO and cofounder of LangChain, an open source framework and toolkit that helps developers build context-aware reasoning applications. Prior to starting LangChain, he led the ML team at Robust Intelligence, led the entity linking team at Kensho, and studied statistics and computer science at Harvard.
