Amazon AWS – Page 50

Advance environmental sustainability in clinical trials using AWS

November 1, 2024

by Sidharth Rampally Amazon AWS

Traditionally, clinical trials not only place a significant burden on patients and participants due to the costs associated with transportation, lodging, meals, and dependent care, but also have an environmental impact. With the advancement of available technologies, decentralized clinical trials have become a widely popular topic of discussion and offer a more sustainable approach. Decentralized clinical trials reduce the need to travel to study sites by lowering the financial burden on all parties involved, thereby accelerating patient recruitment and reducing dropout rates. Decentralized clinical trials use technologies such as wearable devices, patient apps, smartphones, and telemedicine to accelerate recruitment, reduce dropout, and minimize the carbon footprint of clinical research. AWS can play a key role in enabling fast implementation of these decentralized clinical trials.

In this post, we discuss how to use AWS to support a decentralized clinical trial across the four main pillars of a decentralized clinical trial (virtual trials, personalized patient engagement, patient-centric trial design, and centralized data management). By exploring these AWS powered alternatives, we aim to demonstrate how organizations can drive progress towards more environmentally friendly clinical research practices.

The challenge and impact of sustainability on clinical trials

With the rise of greenhouse gas emissions globally, finding ways to become more sustainable is quickly becoming a challenge across all industries. At the same time, global health awareness and investments in clinical research have increased as a result of motivations by major events like the COVID-19 pandemic. For instance, in 2021, we saw a significant increase in awareness of clinical research studies seeking volunteers, which was reported at 63% compared to 54% in 2019 by Applied Clinical Trials. This suggests that the COVID-19 pandemic brought increased attention to clinical trials among the public and magnified the importance of including diverse populations in clinical research.

These clinical research trials study new tests and treatments while evaluating their effects on human health outcomes. People often volunteer to take part in clinical trials to test medical interventions, including drugs, biological products, surgical procedures, radiological procedures, devices, behavioral treatments, and preventive care. The rise of clinical trials presents a major sustainability challenge—they are often not sustainable and can contribute substantially to greenhouse gas emissions due to how they are being implemented. The main sources of these are usually associated with the intensive energy use associated with research premises and air travel.

This post discusses an alternative to clinical trials—by decentralizing clinical trials, we can reduce the major greenhouse gas emissions caused by human activities present in clinical trials today.

The CRASH trial case study

We can further examine the impact of carbon emissions associated with clinical trials through the carbon audit of the CRASH trial case lead by medical research journal, BMJ. The CRASH trial was a clinical trial conducted from 1999–2004 and recruited patients from 49 countries in the span of 5 years. In the study, the effect of intravenous corticosteroids (a drug produced by Pfizer) on death within 14 days in 10,008 adults with clinically significant head injuries was examined. BMJ conducted an audit on the total emissions of greenhouse gases that were produced by the trials and calculated that roughly 126 metric tons (carbon dioxide equivalent) was emitted during a 1-year period. Over a 5-year period, it would mean that the entire trial would be responsible for about 630 metric tons of carbon dioxide equivalent.

Much of these greenhouse gas emissions can be attributed to travel (such as air travel, hotel, meetings), distribution associated for drugs and documents, and electricity used in coordination centers. According to the EPA, the average passenger vehicle emits about 4.6 metric tons of carbon dioxide per year. In comparison, 630 tons of carbon dioxide would be equivalent to the annual emissions of around 137 passenger vehicles. Similarly, the average US household generates about 20 metric tons of carbon dioxide per year from energy use. 630 tons of carbon dioxide would also be equal to the annual emissions of around 31 average US homes. 630 tons of carbon dioxide already represents a very substantial amount of greenhouse gas for one clinical trial. According to sources from government databases and research institutions, there are around 300,000–600,000 clinical trials conducted globally each year, amplifying this impact by several hundred thousand times.

Clinical trials vs. decentralized clinical trials

Decentralized clinical trials present opportunities to address the sustainability challenges associated with traditional clinical trial models. As a byproduct of decentralized trials, there are also improvements in the patient experience by reducing their burden, making the process more convenient and sustainable.

Today, clinical trials can contribute significantly to greenhouse gas emissions, primarily through energy use in research facilities and air travel. In contrast to the energy-intensive nature of centralized trial sites, the distributed nature of decentralized clinical trials offers a more practical and cost-effective approach to implementing renewable energy solutions.

For centralized clinical trials, many are conducted in energy-intensive healthcare facilities. Traditional trial sites, such as hospitals and dedicated research centers, can have high energy demands for equipment, lighting, and climate control. These facilities often rely on regional or national power grids for their energy needs. Integrating renewable energy solutions in these facilities can also be costly and challenging, because it can involve significant investments into new equipment, renewable energy projects, and more.

In decentralized clinical trials, the reduction in infrastructure and onsite resources will allow for a lower energy demand overall. This, in turn, will result in benefits such as simplified trial designs, reduced bureaucracy, and less human travel required for video conferencing. Furthermore, the additional appointments required for clinical trials might create additional time and financial burdens for participants. Decentralized clinical trials can reduce the burden on patients for in-person visits and increase patient retention and long-term follow-up.

Core pillars on how AWS can power sustainable decentralized clinical trials

AWS customers have developed proven solutions that power sustainable decentralized clinical trials. SourceFuse is an AWS partner that has developed a mobile app and web interface that enables patients to participate in decentralized clinical trials remotely from their homes, eliminating the environmental impact of travel and paper-based data collection. The platform’s cloud-centered architecture, built on AWS services, supports the scalable and sustainable operation of these remote clinical trials.

In this post, we provide sustainability-oriented guidance focused on four key areas: virtual trials, personalized patient engagement, patient-centric trial design, and centralized data management. The following figure showcases the AWS services that can help in these four areas.

Personalized remote patient engagement

The average dropout rate for clinical trials is 30%, so providing an omnichannel experience for subjects to interact with trial facilitators is imperative. Because decentralized clinical trials provide flexibility for patients to participate at home, the experience for patients to collect and report data should be seamless. One solution is to use voice applications to enable patient data reporting, using Amazon Alexa and Amazon Connect. For example, a patient can report symptoms to their Amazon Echo device, invoking an automated patient outreach scheduler using Amazon Connect.

Trial facilitators can also use Amazon Pinpoint to connect with customers through multiple channels. They can use Amazon Pinpoint to send medication reminders, automate surveys, or push other communications without the need for paper mail delivery.

Virtual trials

Decentralized clinical trials reduce emissions compared to regular clinical trials by eliminating the need for travel and physical infrastructure. Instead, a core component of decentralized clinical trials is a secure, scalable data infrastructure with strong data analytics capabilities. Amazon Redshift is a fully managed cloud data warehouse that trial scientists can use to perform analytics.

Clinical Research Organizations (CROs) and life sciences organizations can also use AWS for mobile device and wearable data capture. Patients, in the comfort of their own home, can collect data passively through wearables, activity trackers, and other smart devices. This data is streamed to AWS IoT Core, which can write data to Amazon Data Firehose in real time. This data can then be sent to services like Amazon Simple Storage Service (Amazon S3) and AWS Glue for data processing and insight extraction.

Patient-centric trial design

A key characteristic of decentralized clinical trials is patient-centric protocol design, which prioritizes the patients’ needs throughout the entire clinical trial process. This involves patient-reported outcomes and often implement flexible participation, which can complicate protocol development and necessitate more extensive regulatory documentation. This can add days or even weeks to the lifespan of a trial, leading to avoidable costs. Amazon SageMaker enables trial developers to build and train machine learning (ML) models that reduce the likelihood of protocol amendments and inconsistencies. Models can also be built to determine the appropriate sample size and recruitment timelines.

With SageMaker, you can optimize your ML environment for sustainability. Amazon SageMaker Debugger provides profiler capabilities to detect under-utilization of system resources, which helps right-size your environment and avoid unnecessary carbon emissions. Organizations can further reduce emissions by choosing deployment regions near renewable energy projects. Currently, there are 22 AWS data center regions where 100% of the electricity consumed is matched by renewable energy sources. Additionally, you can use Amazon Q, a generative AI-powered assistant, to surface and generate potential amendments to avoid expensive costs associated with protocol revisions.

Centralized data management

CROs and bio-pharmaceutical companies are striving to achieve end-to-end data linearity for all clinical trials within an organization. They want to see traceability across the board, while achieving data harmonization for regulatory clinical trial guardrails. The pipeline approach to data management in clinical trials has led to siloed, disconnected data across an organization, because separate storage is used for each trial. Decentralized clinical trials, however, often employ a singular data lake for all of an organization’s clinical trials.

With a centralized data lake, organizations can avoid the duplication of data across separate trial databases. This leads to savings in storage costs and computing resources, as well as a reduction in the environmental impact of maintaining multiple data silos. To build a data management platform, the process could begin with ingesting and normalizing clinical trial data using AWS HealthLake. HealthLake is designed to ingest data from various sources, such as electronic health records, medical imaging, and laboratory results, and automatically transform the data into the industry-standard FHIR format. This clinical voice application solution built entirely on AWS showcases the advantages of having a centralized location for clinical data, such as avoiding data drift and redundant storage.

With the normalized data now available in HealthLake, the next step would be to orchestrate the various data processing and analysis workflows using AWS Step Functions. You can use Step Functions to coordinate the integration of the HealthLake data into a centralized data lake, as well as invoke subsequent processing and analysis tasks. This could involve using serverless computing with AWS Lambda to perform event-driven data transformation, quality checks, and enrichment activities. By combining the power powerful data normalization capabilities of HealthLake and the orchestration features of Step Functions, the platform can provide a robust, scalable, and streamlined approach to managing decentralized clinical trial data within the organization.

Conclusion

In this post, we discussed the critical importance of sustainability in clinical trials. We provided an overview of the key distinctions between traditional centralized clinical trials and decentralized clinical trials. Importantly, we explored how AWS technologies can enable the development of more sustainable clinical trials, addressing the four main pillars that underpin a successful decentralized trial approach.

To learn more about how AWS can power sustainable clinical trials for your organization, reach out to your AWS Account representatives. For more information about optimizing your workloads for sustainability, see Optimizing Deep Learning Workloads for Sustainability on AWS.

References

[1] https://www.appliedclinicaltrialsonline.com/view/awareness-of-clinical-research-increases-among-underrepresented-groups

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839193/

[3] https://pubmed.ncbi.nlm.nih.gov/15474134/

[4] ClinicalTrials.gov and https://www.iqvia.com/insights/the-iqvia-institute/reports/the-global-use-of-medicines-2022

[5] https://aws.amazon.com/startups/learn/next-generation-data-management-for-clinical-trials-research-built-on-aws?lang=en-US#overview

[6] https://pubmed.ncbi.nlm.nih.gov/39148198/

About the Authors

Sid Rampally is a Customer Solutions Manager at AWS driving GenAI acceleration for Life Sciences customers. He writes about topics relevant to his customers, focusing on data engineering and machine learning. In his spare time, Sid enjoys walking his dog in Central Park and playing hockey.

Nina Chen is a Customer Solutions Manager at AWS specializing in leading software companies to leverage the power of the AWS cloud to accelerate their product innovation and growth. With over 4 years of experience working in the strategic Independent Software Vendor (ISV) vertical, Nina enjoys guiding ISV partners through their cloud transformation journeys, helping them optimize their cloud infrastructure, driving product innovation, and deliver exceptional customer experiences.

Use Amazon Q to find answers on Google Drive in an enterprise

November 1, 2024

by Glen Ireland Amazon AWS

Amazon Q Business is a generative AI-powered assistant designed to enhance enterprise operations. It’s a fully managed service that helps provide accurate answers to users’ questions while adhering to the security and access restrictions of the content. You can tailor Amazon Q Business to your specific business needs by connecting to your company’s information and enterprise systems using built-in connectors to a variety of enterprise data sources. It enables users in various roles, such as marketing managers, project managers, and sales representatives, to have tailored conversations, solve business problems, generate content, take action, and more, through a web interface. This service aims to help make employees work smarter, move faster, and drive significant impact by providing immediate and relevant information to help them with their tasks.

One such enterprise data repository you can use to store and manage content is Google Drive. Google Drive is a cloud-based storage service that provides a centralized location for storing digital assets, including documents, knowledge articles, and spreadsheets. This service helps your teams collaborate effectively by enabling the sharing and organization of important files across the enterprise. To use Google Drive within Amazon Q Business, you can configure the Amazon Q Business Google Drive connector. This connector allows Amazon Q Business to securely index files stored in Google Drive using access control lists (ACLs). These ACLs make sure that users only access the documents they’re permitted to view, allowing them to ask questions and retrieve information relevant to their work directly through Amazon Q Business.

This post covers the steps to configure the Amazon Q Business Google Drive connector, including authentication setup and verifying the secure indexing of your Google Drive content.

Index Google Drive documents using the Amazon Q Google Drive connector

The Amazon Q Google Drive connector can index Google Drive documents hosted in a Google Workspace account. The connector can’t index documents stored on Google Drive in a personal Google Gmail account. Amazon Q Business can authenticate with your Google Workspace using a service account or OAuth 2.0 authentication. A service account enables indexing files for user accounts across an enterprise in a Google Workspace. Using OAuth 2.0 authentication allows for crawling and indexing files in a single Google Workspace account. This post shows you how to configure Amazon Q Business to authenticate using a Google service account.

Google prescribes that in order to index multiple users’ documents, the crawler must support the capability to authenticate with a service account with domain-wide delegation. This allows the connector to index the documents of all users in your drive and shared drives. Amazon Q Business connectors only crawl the documents that the Amazon Q Business application administrator specifies need to be crawled. Administrators can specify the paths to crawl, specific file name patterns, or types. Amazon Q Business doesn’t use customer data to train any models. All customer data is indexed only in the customer account. Also, Amazon Q Business Connectors will only index content specified by the administrator. It won’t index any content on its own without explicitly being configured to do so by the administrator of Amazon Q Business.

You can configure the Amazon Q Google Drive connector to crawl and index file types supported by Amazon Q Business. Google Write documents are exported as Microsoft Word and Google Sheet documents are exported as Microsoft Excel during the crawling phase.

Metadata

Every document has structural attributes—or metadata—attached to it. Document attributes can include information such as document title, document author, time created, time updated, and document type.

When you connect Amazon Q Business to a data source, it automatically maps specific data source document attributes to fields within an Amazon Q Business index. If a document attribute in your data source doesn’t have an attribute mapping already available, or if you want to map additional document attributes to index fields, you can use the custom field mappings to specify how a data source attribute maps to an Amazon Q Business index field. You can create field mappings by editing your data source after your application and retriever are created.

There are four default metadata attributes indexed for each Google Drive document: authors, source URL, creation date, and last update date. You can also select additional reserved data field mappings.

Amazon Q Business crawls Google Drive ACLs defined in a Google Workspace for document security. Google Workspace users and groups are mapped to the _user_id and _group_ids fields associated with the Amazon Q Business application in AWS IAM Identity Center. These user and group associations are persisted in the user store associated with the Amazon Q Business index created for crawled Google Drive documents.

Overview of ACLs in Amazon Q Business

In the context of knowledge management and generative AI chatbot applications, an ACL plays a crucial role in managing who can access information and what actions they can perform within the system. They also facilitate knowledge sharing within specific groups or teams while restricting access to others.

In this solution, we deploy an Amazon Q web experience to demonstrate that two business users can only ask questions about documents they have access to according to the ACL. With the Amazon Q Business Google Drive connector, the Google Workspace ACL will be ingested with documents. This enables Amazon Q Business to control the scope of documents that each user can access in the Amazon Q web experience.

Authentication types

An Amazon Q Business application requires you to use IAM Identity Center to manage user access. Although it’s recommended to have an IAM Identity Center instance configured (with users federated and groups added) before you start, you can also choose to create and configure an IAM Identity Center instance for your Amazon Q Business application using the Amazon Q console.

You can also add users to your IAM Identity Center instance from the Amazon Q Business console, if you aren’t federating identity. When you add a new user, make sure that the user is enabled in your IAM Identity Center instance and that they have verified their email ID. They need to complete these steps before they can log in to your Amazon Q Business web experience.

Your identity source in IAM Identity Center defines where your users and groups are managed. After you configure your identity source, you can look up users or groups to grant them single sign-on access to AWS accounts, applications, or both.

You can have only one identity source per organization in AWS Organizations. You can choose one of the following as your identity source:

IAM Identity Center directory – When you enable IAM Identity Center for the first time, it’s automatically configured with an IAM Identity Center directory as your default identity source. This is where you create your users and groups, and assign their level of access to your AWS accounts and applications. For more details, see Manage identities in IAM Identity Center.
Active Directory – Choose this option if you want to continue managing users in either your AWS Managed Microsoft AD directory using AWS Directory Service or your self- managed directory in Active Directory (AD).
External identity provider – Choose this option if you want to manage users in other external identity providers (IdPs) through the SAML 2.0 standard, such as Okta.
IAM identity provider – Amazon Q Business applications can now federate with an enterprise’s IAM IdP. For more information, refer to Build private and secure enterprise generative AI applications with Amazon Q Business using IAM Federation.

Overview of solution

With Amazon Q Business, you can configure multiple data sources to provide a central place to search across your document repository. For our solution, we demonstrate how to index Google Drive data using the Amazon Q Business Google Drive connector. We complete the following steps:

Configure Google Workspace prerequisites.
Configure an Amazon Q Business application.
Connect Google Drive to Amazon Q Business.
Create users and index the data in the Google Drive.
Run a sample query to test the solution.

Configure Google Workspace prerequisites

For this solution, Amazon Q will connect to a Google Workspace and crawl Google Drive documents owned by business users in different groups using a service account. Complete the following steps to configure your Google Workspace:

Log in to the Google API console as an admin user.
Choose the dropdown menu next to the search box, then choose New Project.
Enter the project name, choose the Google organization, and choose Create.

The Google Drive and Admin SDK APIs need to be enabled for Amazon Q to crawl Google Drive files.

Search for each API on the Google Cloud console and choose Enable.
Search for Service Accounts to access the IAM & Admin navigation pane and choose Create Service Account.
Enter the service account name, service account ID, and description, and choose Done.
Choose the email of the service account created in the previous step.
On the Keys tab, choose Add Key, then choose Create New Key.
For Key type, select JSON, and choose Create to download and locally save a new private key.

Now we enable domain-wide delegation for the five required API scopes on the Domain-wide Delegation page.

Choose Add new.
Add the following comma delimited API scopes for client ID generated for the private key created in the previous step:
https://www.googleapis.com/auth/drive.readonly,
https://www.googleapis.com/auth/drive.metadata.readonly,
https://www.googleapis.com/auth/admin.directory.group.readonly,
https://www.googleapis.com/auth/admin.directory.user.readonly,
https://www.googleapis.com/auth/cloud-platform
Choose Authorize.

Now we create users and add them to groups.

Navigate to the Google Workspace Admin console and choose Users in the navigation pane.
Choose Add new user to create two new business users.
Choose Groups in the navigation pane.
Choose Create group to create two Google groups and add one business user to each group.
Upload files that Amazon Q supports into each business user’s Google Drive.

In this solution, we upload the Amazon 2020 annual report to the first business user’s Google Drive and upload the Amazon 2021 annual report and Amazon 2022 annual report to the second business user’s Google Drive.

The business user that uploaded the Amazon 2021 annual report can also share it with the other business user’s Google group.

Choose the options menu (three vertical dots) for the Google Drive file and choose Share.
Enter the name of the other Google group and choose Send.

Create an Amazon Q Business application with a Google Drive connector

An Amazon Q Business application needs to be created with a Google Drive connector to crawl and index Google Drive files. To create an Amazon Q application, complete the following steps:

On the Amazon Q console, choose Applications in the navigation pane.
Choose Create application.
For Application name, enter a name.
Leave application configuration settings as defaults.
Choose Create.
After the application is created, choose Data Sources.
Then choose Select retriever and Confirm to use a Native retriever and Enterprise provisioning.
After confirming retriever settings, choose Add data source, and then choose the plus sign next to Google Drive.
Under Name and description, enter a data source name and optional description.
Under Authentication, select Google service account and choose Create a new secret from the AWS Secrets Manager secret drop down to create an AWS Secrets Manager secret.
Enter a secret name, admin account email, client email, and the JSON key you downloaded earlier, then choose Save.
Under IAM role, choose Create a new service role.
Under Additional Configuration, choose User email, and add the two recently created Google Workspace business user email addresses.
Under Sync run schedule, for Frequency, choose Run on demand.
Choose Add data source.

Create and manage users

To create an Amazon Q web experience accessible by Google Workspace users, you need to create corresponding users in IAM Identity Center. Amazon Q applications are only accessible by IAM Identity Center users with user identities that own indexed documents. To create the IAM Identity Center users, complete the following steps:

On the IAM Identity Center console, choose Users in the navigation pane.
Choose Add user.
Create IAM Identity Center users that mirror your Google Workspace users by entering the required user information.
Accept the IAM Identity Center invitation sent through email to each new business user and set each business user’s IAM Identity Center password.
On the Amazon Q Business console, navigate to the application with the Google Drive data source.
Choose Manage user access.
Choose Add groups and users, select Assign existing users and groups, and choose Next.
Assign users to the Amazon Q application, choose Assign, and choose Confirm if each business user is subscribed to Q Business Pro.

After you add IAM Identity Center users to your Amazon Q application, its web experience URL will appear in the Q Business applications list. You can use the URL to connect to the Amazon Q web experience with either of your Google business users. By default, each user can only ask questions about documents in their Google Drive.

Run sample queries in Amazon Q

To test the Amazon Q application with the Amazon annual reports you uploaded to Google Drive, complete the following steps:

On the Amazon Q Business console, navigate to the data source you created.
Run an on-demand sync of the data source by choosing Sync now.
Navigate to the web experience URL in a new private browser window and log in as the first business user.
Ask Amazon Q a question, such as how many employees work at Amazon.

The source documents should be the Amazon 2020 and 2021 annual reports, assuming the first business user uploaded the Amazon 2020 annual report and the second business user shared the Amazon 2021 annual report with the first business user.

Navigate to the web experience URL in a new private browser window and log in as the second business user.
Ask Amazon Q the same question (how many employees work at Amazon).

The source documents should be the Amazon 2021 and 2022 annual reports.

Troubleshooting

In this section, we share some common issues and troubleshooting tips.

IAM Identity Center login error

You might receive an error on the IAM Identity Center login page that says “We couldn’t verify your sign-in credentials.”

To troubleshoot, complete the following steps:

Confirm that the business users that mirror the Google Workspace users were created in IAM Identity Center.
If the users exist, navigate to the user in IAM Identity Center and choose Reset password, then select Generate a one-time password and share the password with the user.

A password will be provided for login and the user will be asked to change their password after a successful login.

Google Drive data source crawling or indexing failure

If the Google Drive data source crawling or indexing fails, complete the following steps:

Confirm the business users provisioned in the Google Workspace are members of the Google groups.
Inspect the Amazon CloudWatch logs for the last time the Google Drive data source was crawled for users with Google Drive files in the Google Workspace.
If the crawler didn’t successfully log the indexing of an expected user’s files, check the IAM Identity Center users, then compare the attributes in the Secrets Manager secret to the corresponding Google Workspace attributes, including client ID, service account email, and service account private key.
Use the Amazon Q Business document-level sync reports to confirm the intended Google Drive documents were indexed by Amazon Q.

Google Drive data source crawling and indexing job doesn’t crawl and index documents

If the Google Drive data source crawling and indexing job doesn’t crawl and index any documents, complete the following steps:

Confirm the business users provisioned in the Google Workspace are members of the Google groups.
Confirm there are IAM Identity Center users that mirror the Google Workspace users.
Confirm both IAM Identity Center users subscribe to Q Business Pro.
Confirm the Google Workspace admin user has enabled the Google Drive API.

Amazon Q web experience doesn’t return expected answers from the expected source

If the Amazon Q web experience doesn’t return expected answers from the expected source, complete the following steps:

Upload the expected source document into an Amazon Q Business chat session by choosing the paperclip icon in the Amazon Q chat interface and then choosing the file.

After you upload the document into the session, if the expected answers are generated from the expected document, the document wasn’t successfully indexed from the Google Drive data source.

If Amazon Q doesn’t return the expected answer for the uploaded document, modify the prompt used to ask the question.

Clean up

To prevent incurring additional costs, it’s essential to clean up and remove any resources created during the implementation of this solution. Specifically, you should delete the Amazon Q application, which will consequently remove the associated index and data connectors. However, any Secrets Manager secrets created during the Amazon Q application setup process need to be removed separately. Failing to clean up these resources may result in ongoing charges, so it’s crucial to take the necessary steps to completely remove all components related to this solution.

Complete the following steps to delete the Amazon Q application, secret, and IAM Identity Center users in your AWS account:

On the Amazon Q Business console, choose Applications in the navigation pane.
Select the application that you created and on the Actions menu, choose Delete and confirm the deletion.
On the Secrets Manager console, choose Secrets in the navigation pane.
Select the secret that was created for the Google Drive connector and on the Actions menu, choose Delete.
Specify the waiting period as 7 days and choose Schedule deletion.
On the IAM Identity Center console, choose Users in the navigation pane.
Select the two users that you created and choose Delete users to remove these users.

Additionally, you should remove the business users added to your Google Workspace during the implementation of this solution because Google Workspaces costs are billed on a per-user basis.

Conclusion

In this post, you created an Amazon Q application that indexed Google Drive documents using the Google Drive connector. You were able to connect to the Amazon Q conversational interface as each of your business users and ask questions about the documents each user could access in accordance with the ACL.

You can continue to experiment by adding more PDF documents to your business users’ Google Drives and re-syncing your Amazon Q Google Drive data source.

Amazon Q Business offers other connectors, such as for Confluence Cloud. To learn more about the Amazon Q Business Confluence Cloud connector, refer to Connecting Confluence (Cloud) to Amazon Q Business.

About the Authors

Glen Ireland is a Senior Enterprise Account Engineer at AWS in the Worldwide Public Sector. Glen’s areas of focus include empowering customers interested in building generative AI solutions using Amazon Q.

Julia Hu is a Specialist Solutions Architect who helps AWS customers and partners build generative AI solutions using Amazon Q Business on AWS. Julia has over 4 years of experience developing solutions for customers adopting AWS services on the forefront of cloud technology.

How Druva used Amazon Bedrock to address foundation model complexity when building Dru, Druva’s backup AI copilot

November 1, 2024

by David Gildea Amazon AWS

This post is co-written with David Gildea and Tom Nijs from Druva.

Druva enables cyber, data, and operational resilience for thousands of enterprises, and is trusted by 60 of the Fortune 500. Customers use Druva Data Resiliency Cloud to simplify data protection, streamline data governance, and gain data visibility and insights. Independent software vendors (ISVs) like Druva are integrating AI assistants into their user applications to make software more accessible.

Dru, the Druva backup AI copilot, enables real-time interaction and personalized responses, with users engaging in a natural conversation with the software. From finding inconsistencies and errors across the environment to scheduling backup jobs and setting retention policies, users need only ask and Dru responds. Dru can also recommend actions to improve the environment, remedy backup failures, and identify opportunities to enhance security.

In this post, we show how Druva approached natural language querying (NLQ)—asking questions in English and getting tabular data as answers—using Amazon Bedrock, the challenges they faced, sample prompts, and key learnings.

Use case overview

The following screenshot illustrates the Dru conversation interface.

In a single conversation interface, Dru provides the following:

Interactive reporting with real-time insights – Users can request data or customized reports without extensive searching or navigating through multiple screens. Dru also suggests follow-up questions to enhance user experience.
Intelligent responses and a direct conduit to Druva’s documentation – Users can gain in-depth knowledge about product features and functionalities without manual searches or watching training videos. Dru also suggests resources for further learning.
Assisted troubleshooting – Users can request summaries of top failure reasons and receive suggested corrective measures. Dru on the backend decodes log data, deciphers error codes, and invokes API calls to troubleshoot.
Simplified admin operations, with increased seamlessness and accessibility – Users can perform tasks like creating a new backup policy or triggering a backup, managed by Druva’s existing role-based access control (RBAC) mechanism.
Customized website navigation through conversational commands – Users can instruct Dru to navigate to specific website locations, eliminating the need for manual menu exploration. Dru also suggests follow-up actions to speed up task completion.

Challenges and key learnings

In this section, we discuss the challenges and key learnings of Druva’s journey.

Overall orchestration

Originally, we adopted an AI agent approach and relied on the foundation model (FM) to make plans and invoke tools using the reasoning and acting (ReAct) method to answer user questions. However, we found the objective too broad and complicated for the AI agent. The AI agent would take more than 60 seconds to plan and respond to a user question. Sometimes it would even get stuck in a thought-loop, and the overall success rate wasn’t satisfactory.

We decided to move to the prompt chaining approach using a directed acyclic graph (DAG). This approach allowed us to break the problem down into multiple steps:

Identify the API route.
Generate and invoke private API calls.
Generate and run data transformation Python code.

Each step became an independent stream, so our engineers could iteratively develop and evaluate the performance and speed until they worked well in isolation. The workflow also became more controllable by defining proper error paths.

Stream 1: Identify the API route

Out of the hundreds of APIs that power Druva products, we needed to match the exact API the application needs to call to answer the user question. For example, “Show me my backup failures for the past 72 hours, grouped by server.” Having similar names and synonyms in API routes make this retrieval problem more complex.

Originally, we formulated this task as a retrieval problem. We tried different methods, including k-nearest neighbor (k-NN) search of vector embeddings, BM25 with synonyms, and a hybrid of both across fields including API routes, descriptions, and hypothetical questions. We found that the simplest and most accurate way was to formulate it as a classification task to the FM. We curated a small list of examples in question-API route pairs, which helped improve the accuracy and make the output format more consistent.

Stream 2: Generate and invoke private API calls

Next, we API call with the correct parameters and invoke it. FM hallucination of parameters, particularly those with free-form JSON object, is one of the major challenges in the whole workflow. For example, the unsupported key server can appear in the generated parameter:

"filter": {
    "and": [
        {
            "gte": {
                "key": "dt",
                "value": 1704067200
            }
        },
        {
            "eq": {
                "key": "server",
                "value": "xyz"
            }
        }
    ]
}

We tried different prompting techniques, such as few-shot prompting and chain of thought (CoT), but the success rate was still unsatisfactory. To make API call generation and invocation more robust, we separated this task into two steps:

First, we used an FM to generate parameters in a JSON dictionary instead of a full API request headers and body.
Afterwards, we wrote a postprocessing function to remove parameters that didn’t conform to the API schema.

This method provided a successful API invocation, at the expense of getting more data than required for downstream processing.

Stream 3: Generate and run data transformation Python code

Next, we took the response from the API call and transformed it to answer the user question. For example, “Create a pandas dataframe and group it by server column.” Similar to stream 2, FM hallucination is again an obstacle. Generated code can contain syntax errors, such as confusing PySpark functions with Pandas functions.

After trying many different prompting techniques without success, we looked at the reflection pattern, asking the FM to self-correct code in a loop. This improved the success rate at the expense of more FM invocations, which were slower and more expensive. We found that although smaller models are faster and more cost-effective, at times they had inconsistent results. Anthropic’s Claude 2.1 on Amazon Bedrock gave more accurate results on the second try.

Model choices

Druva selected Amazon Bedrock for several compelling reasons, with security and latency being the most important. A key factor in this decision was the seamless integration with Druva’s services. Using Amazon Bedrock aligned naturally with Druva’s existing environment on AWS, maintaining a secure and efficient extension of their capabilities.

Additionally, one of our primary challenges in developing Dru involved selecting the optimal FMs for specific tasks. Amazon Bedrock effectively addresses this challenge with its extensive array of available FMs, each offering unique capabilities. This variety enabled Druva to conduct the rapid and comprehensive testing of various FMs and their parameters, facilitating the selection of the most suitable one. The process was streamlined because Druva didn’t need to delve into the complexities of running or managing these diverse FMs, thanks to the robust infrastructure provided by Amazon Bedrock.

Through the experiments, we found that different models performed better in specific tasks. For example, Meta Llama 2 performed better with code generation task; Anthropic Claude Instance was good in efficient and cost-effective conversation; whereas Anthropic Claude 2.1 was good in getting desired responses in retry flows.

These were the latest models from Anthropic and Meta at the time of this writing.

Solution overview

The following diagram shows how the three streams work together as a single workflow to answer user questions with tabular data.

The following are the steps of the workflow:

The authenticated user submits a question to Dru, for example, “Show me my backup job failures for the last 72 hours,” as an API call.
The request arrives at the microservice on our existing Amazon Elastic Container Service (Amazon ECS) cluster. This process consists of the following steps:
1. A classification task using the FM provides the available API routes in the prompt and asks for the one that best matches with user question.
2. An API parameters generation task using the FM gets the corresponding API swagger, then asks the FM to suggest key-value pairs to the API call that can retrieve data to answer the question.
3. A custom Python function verifies, formats, and invokes the API call, then passes the data in JSON format to the next step.
4. A Python code generation task using the FM samples a few records of data from the previous step, then asks the FM to write Python code to transform the data to answer the question.
5. A custom Python function runs the Python code and returns the answer in tabular format.

To maintain user and system security, we make sure in our design that:

The FM can’t directly connect to any Druva backend services.
The FM resides in a separate AWS account and virtual private cloud (VPC) from the backend services.
The FM can’t initiate actions independently.
The FM can only respond to questions sent from Druva’s API.
Normal customer permissions apply to the API calls made by Dru.
The call to the API (Step 1) is only possible for authenticated user. The authentication component lives outside the Dru solution and is used across other internal solutions.
To avoid prompt injection, jailbreaking, and other malicious activities, a separate module checks for these before the request reaches this service (Amazon API Gateway in Step 1).

For more details, refer to Druva’s Secret Sauce: Meet the Technology Behind Dru’s GenAI Magic.

Implementation details

In this section, we discuss Steps 2a–2e in the solution workflow.

2a. Look up the API definition

This step uses an FM to perform classification. It takes the user question and a full list of available API routes with meaningful names and descriptions as the input, and responds The following is a sample prompt:

Please read the following API routes carefully as I’ll ask you a question about them:
<api_routes>{api_routes}</api_routes>
Which API route can best answer “{question}”?

2b. Generate the API call

This step uses an FM to generate API parameters. It first looks up the corresponding swagger for the API route (from Step 2a). Next, it passes the swagger and the user question to an FM and responds with some key-value pairs to the API route that can retrieve relevant data. The following is a sample prompt:

Please read the following swagger carefully as I’ll ask you a question about it:
<swagger>{swagger}</swagger>
Produce a key-value JSON dict of the available request parameters based on “{question}” with reference to the swagger.

2c. Validate and invoke the API call

In the previous step, even with an attempt to ground responses with swagger, the FM can still hallucinate wrong or nonexistent API parameters. This step uses a programmatic way to verify, format, and invoke the API call to get data. The following is the pseudo code:

for each input parameter (key/value)
  if parameter key not in swagger then
    drop parameter
  else if parameter value data type not match swagger then
    drop parameter
  else
    URL encode parameter
  end if
end for

2d. Generate Python code to transform data

This step uses an FM to generate Python code. It first samples a few records of input data to reduce input tokens. Then it passes the sample data and the user question to an FM and responds with a Python script that transforms data to answer the question. The following is a sample prompt:

Please read the following sample data carefully as I’ll ask you a question about them:
<sample_data>{5_rows_of_data_in_json}</sample_data>
Write a Python script using pandas to transform the data to answer the question “{question}”.

2e. Run the Python code

This step involves a Python script, which imports the generated Python package, runs the transformation, and returns the tabular data as the final response. If an error occurs, it will invoke the FM to try to correct the code. When everything fails, it returns the input data. The following is the pseudo code:

for maximum number of retries
  run data transformation function
  if error then
    invoke foundation model to correct code
  end if
end for
if success then
  return transformed data
else
  return input data
end if

Conclusion

Using Amazon Bedrock for the solution foundation led to remarkable achievements in accuracy, as evidenced by the following metrics in our evaluations using an internal dataset:

Stream 1: Identify the API route – Achieved a perfect accuracy rate of 100%
Stream 2: Generate and invoke private API calls – Maintained this standard with a 100% accuracy rate
Stream 3: Generate and run data transformation Python code – Attained a highly commendable accuracy of 90%

These results are not just numbers; they are a testament to the robustness and efficiency of the Amazon Bedrock based solution. With such high levels of accuracy, Druva is now poised to confidently broaden their horizons. Our next goal is to extend this solution to encompass a wider range of APIs across Druva products. The next expansion will be scaling up usage and substantially enrich the experience of Druva customers. By integrating more APIs, Druva will offer a more seamless, responsive, and contextual interaction with Druva products, further enhancing the value delivered to Druva users.

To learn more about Druva’s AI solutions, visit the Dru solution page, where you can see some of these capabilities in action through recorded demos. Visit the AWS Machine Learning blog to see how other customers are using Amazon Bedrock to solve their business problems.

About the Authors

David Gildea is the VP of Product for Generative AI at Druva. With over 20 years of experience in cloud automation and emerging technologies, David has led transformative projects in data management and cloud infrastructure. As the founder and former CEO of CloudRanger, he pioneered innovative solutions to optimize cloud operations, later leading to its acquisition by Druva. Currently, David leads the Labs team in the Office of the CTO, spearheading R&D into generative AI initiatives across the organization, including projects like Dru Copilot, Dru Investigate, and Amazon Q. His expertise spans technical research, commercial planning, and product development, making him a prominent figure in the field of cloud technology and generative AI.

Tom Nijs is an experienced backend and AI engineer at Druva, passionate about both learning and sharing knowledge. With a focus on optimizing systems and using AI, he’s dedicated to helping teams and developers bring innovative solutions to life.

Corvus Lee is a Senior GenAI Labs Solutions Architect at AWS. He is passionate about designing and developing prototypes that use generative AI to solve customer problems. He also keeps up with the latest developments in generative AI and retrieval techniques by applying them to real-world scenarios.

Fahad Ahmed is a Senior Solutions Architect at AWS and assists financial services customers. He has over 17 years of experience building and designing software applications. He recently found a new passion of making AI services accessible to the masses.

Create a generative AI–powered custom Google Chat application using Amazon Bedrock

October 31, 2024

by Nizar Kheir Amazon AWS

AWS offers powerful generative AI services, including Amazon Bedrock, which allows organizations to create tailored use cases such as AI chat-based assistants that give answers based on knowledge contained in the customers’ documents, and much more. Many businesses want to integrate these cutting-edge AI capabilities with their existing collaboration tools, such as Google Chat, to enhance productivity and decision-making processes.

This post shows how you can implement an AI-powered business assistant, such as a custom Google Chat app, using the power of Amazon Bedrock. The solution integrates large language models (LLMs) with your organization’s data and provides an intelligent chat assistant that understands conversation context and provides relevant, interactive responses directly within the Google Chat interface.

This solution showcases how to bridge the gap between Google Workspace and AWS services, offering a practical approach to enhancing employee efficiency through conversational AI. By implementing this architectural pattern, organizations that use Google Workspace can empower their workforce to access groundbreaking AI solutions powered by Amazon Web Services (AWS) and make informed decisions without leaving their collaboration tool.

With this solution, you can interact directly with the chat assistant powered by AWS from your Google Chat environment, as shown in the following example.

Solution overview

We use the following key services to build this intelligent chat assistant:

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI
AWS Lambda, a serverless computing service, lets you handle the application logic, processing requests, and interaction with Amazon Bedrock
Amazon DynamoDB lets you store session memory data to maintain context across conversations
Amazon API Gateway lets you create a secure API endpoint for the custom Google Chat app to communicate with our AWS based solution.

The following figure illustrates the high-level design of the solution.

The workflow includes the following steps:

The process begins when a user sends a message through Google Chat, either in a direct message or in a chat space where the application is installed.
The custom Google Chat app, configured for HTTP integration, sends an HTTP request to an API Gateway endpoint. This request contains the user’s message and relevant metadata.
Before processing the request, a Lambda authorizer function associated with the API Gateway authenticates the incoming message. This verifies that only legitimate requests from the custom Google Chat app are processed.
After it’s authenticated, the request is forwarded to another Lambda function that contains our core application logic. This function is responsible for interpreting the user’s request and formulating an appropriate response.
The Lambda function interacts with Amazon Bedrock through its runtime APIs, using either the RetrieveAndGenerate API that connects to a knowledge base, or the Converse API to chat directly with an LLM available on Amazon Bedrock. This also allows the Lambda function to search through the organization’s knowledge base and generate an intelligent, context-aware response using the power of LLMs. The Lambda function also uses a DynamoDB table to keep track of the conversation history, either directly with a user or within a Google Chat space.
After receiving the generated response from Amazon Bedrock, the Lambda function sends this answer back through API Gateway to the Google Chat app.
Finally, the AI-generated response appears in the user’s Google Chat interface, providing the answer to their question.

This architecture allows for a seamless integration between Google Workspace and AWS services, creating an AI-driven assistant that enhances information accessibility within the familiar Google Chat environment. You can customize this architecture to connect other solutions that you develop in AWS to Google Chat.

In the following sections, we explain how to deploy this architecture.

Prerequisites

To implement the solution outlined in this post, you must have the following:

A Linux or MacOS development environment with at least 20 GB of free disk space. It can be a local machine or a cloud instance. If you use an AWS Cloud9 instance, make sure you have increased the disk size to 20 GB.
The AWS Command Line Interface (AWS CLI) installed on your development environment. This tool allows you to interact with AWS services through command line commands.
An AWS account and an AWS Identity and Access Management (IAM) principal with sufficient permissions to create and manage the resources needed for this application. If you don’t have an AWS account, refer to How do I create and activate a new Amazon Web Services account? To configure the AWS CLI with the associated credentials, typically, you set up an AWS access key ID and secret access key for a designated IAM user with appropriate permissions.
Request access to Amazon Bedrock FMs. In this post, we use either Anthropic’s Claude Sonnet 3 or Amazon Titan Text G1 Premier available in Amazon Bedrock, but you can also choose other models that are supported for Amazon Bedrock knowledge bases.
Optionally, an Amazon Bedrock knowledge base created in your account, which allows you to integrate your own documents into your generative AI applications. If you don’t have an existing knowledge base, refer to Create an Amazon Bedrock knowledge base. Alternatively, the solution proposes an option without a knowledge base, with answers generated only by the FM on the backend.
A Business or Enterprise Google Workspace account with access to Google Chat. You also need a Google Cloud project with billing enabled. To check that an existing project has billing enabled, see Verify the billing status of your projects.
Docker installed on your development environment.

Deploy the solution

The application presented in this post is available in the accompanying GitHub repository and provided as an AWS Cloud Development Kit (AWS CDK) project. Complete the following steps to deploy the AWS CDK project in your AWS account:

Clone the GitHub repository on your local machine.
Install the Python package dependencies that are needed to build and deploy the project. This project is set up like a standard Python project. We recommend that you create a virtual environment within this project, stored under the .venv. To manually create a virtual environment on MacOS and Linux, use the following command:
```
python3 -m venv .venv
```

After the initialization process is complete and the virtual environment is created, you can use the following command to activate your virtual environment:
```
source .venv/bin/activate
```

Install the Python package dependencies that are needed to build and deploy the project. In the root directory, run the following command:
```
pip install -r requirements.txt
```

Run the cdk bootstrap command to prepare an AWS environment for deploying the AWS CDK application.
Run the script init-script.bash:

chmod u+x init-script.bash
./init-script.bash

This script prompts you for the following:

The Amazon Bedrock knowledge base ID to associate with your Google Chat app (refer to the prerequisites section). Keep this blank if you decide not to use an existing knowledge base.
Which LLM you want to use in Amazon Bedrock for text generation. For this solution, you can choose between Anthropic’s Claude Sonnet 3 or Amazon Titan Text G1 – Premier

The following screenshot shows the input variables to the init-script.bash script.

The script deploys the AWS CDK project in your account. After it runs successfully, it outputs the parameter ApiEndpoint, whose value designates the invoke URL for the HTTP API endpoint deployed as part of this project. Note the value of this parameter because you use it later in the Google Chat app configuration.
The following screenshot shows the output of the init-script.bash script.

You can also find this parameter on the AWS CloudFormation console, on the stack’s Outputs tab.

Register a new app in Google Chat

To integrate the AWS powered chat assistant into Google Chat, you create a custom Google Chat app. Google Chat apps are extensions that bring external services and resources directly into the Google Chat environment. These apps can participate in direct messages, group conversations, or dedicated chat spaces, allowing users to access information and take actions without leaving their chat interface.

For our AI-powered business assistant, we create an interactive custom Google Chat app that uses the HTTP integration method. This approach allows our app to receive and respond to user messages in real time, providing a seamless conversational experience.

After you have deployed the AWS CDK stack in the previous section, complete the following steps to register a Google Chat app in the Google Cloud portal:

Open the Google Cloud portal and log in with your Google account.
Search for “Google Chat API” and navigate to the Google Chat API page, which lets you build Google Chat apps to integrate your services with Google Chat.
If this is your first time using the Google Chat API, choose ACTIVATE. Otherwise, choose MANAGE.
On the Configuration tab, under Application info, provide the following information, as shown in the following screenshot:
1. For App name, enter an app name (for example, bedrock-chat).
2. For Avatar URL, enter the URL for your app’s avatar image. As a default, you can provide the Google chat product icon.
3. For Description, enter a description of the app (for example, Chat App with Amazon Bedrock).

Under Interactive features, turn on Enable Interactive features.
Under Functionality, select Receive 1:1 messages and Join spaces and group conversations, as shown in the following screenshot.

Under Connection settings, provide the following information:
1. Select App URL.
2. For App URL, enter the Invoke URL associated with the deployment stage of the HTTP API gateway. This is the ApiEndpoint parameter that you noted at the end of the deployment of the AWS CDK template.
3. For Authentication Audience, select App URL, as shown in the following screenshot.

Under Visibility, select Make this Chat app available to specific people and groups in <your-company-name> and provide email addresses for individuals and groups who will be authorized to use your app. You need to add at least your own email if you want to access the app.

Choose Save.

The following animation illustrates these steps on the Google Cloud console.

By completing these steps, the new Amazon Bedrock chat app should be accessible on the Google Chat console for the persons or groups that you authorized in your Google Workspace.

To dispatch interaction events to the solution deployed in this post, Google Chat sends requests to your API Gateway endpoint. To verify the authenticity of these requests, Google Chat includes a bearer token in the Authorization header of every HTTPS request to your endpoint. The Lambda authorizer function provided with this solution verifies that the bearer token was issued by Google Chat and targeted at your specific app using the Google OAuth client library. You can further customize the Lambda authorizer function to implement additional control rules based on User or Space objects included in the request from Google Chat to your API Gateway endpoint. This allows you to fine-tune access control, for example, by restricting certain features to specific users or limiting the app’s functionality in particular chat spaces, enhancing security and customization options for your organization.

Converse with your custom Google Chat app

You can now converse with the new app within your Google Chat interface. Connect to Google Chat with an email that you authorized during the configuration of your app and initiate a conversation by finding the app:

Choose New chat in the chat pane, then enter the name of the application (bedrock-chat) in the search field.
Choose Chat and enter a natural language phrase to interact with the application.

Although we previously demonstrated a usage scenario that involves a direct chat with the Amazon Bedrock application, you can also invoke the application from within a Google chat space, as illustrated in the following demo.

Customize the solution

In this post, we used Amazon Bedrock to power the chat-based assistant. However, you can customize the solution to use a variety of AWS services and create a solution that fits your specific business needs.

To customize the application, complete the following steps:

Edit the file lambda/lambda-chat-app/lambda-chatapp-code.py in the GitHub repository you cloned to your local machine during deployment.
Implement your business logic in this file.

The code runs in a Lambda function. Each time a request is processed, Lambda runs the lambda_handler function:

def lambda_handler(event, context):
    if event['requestContext']['http']['method'] == 'POST':
        # A POST request indicates a Google Chat App Event sent by the application        
        data = json.loads(event['body'])
        # Invoke handle_post function that includes the logic to process Google chat app events
        response = handle_post(data)
        return { 'text': response }
    else:
        return {
            'statusCode': 405,
            'body': json.dumps("Method Not Allowed. This function must be called from Google Chat.")
        }

When Google Chat sends a request, the lambda_handler function calls the handle_post function.

Let’s replace the handle_post function with the following code:

def handle_post(data):
    if data['type'] == 'MESSAGE':
        user_message = data['message']['text']  
        space_name = data['space']['name']
        return f"Hello! You said: {user_message}nThe space name is: {space_name}"

Save your file, then run the following command in your terminal to deploy your new code:

cdk deploy

The deployment should take about a minute. When it’s complete, you can go to Google Chat and test your new business logic. The following screenshot shows an example chat.

As the image shows, your function gets the user message and a space name. You can use this space name as a unique ID for the conversation, which lets you to manage history.

As you become more familiar with the solution, you may want to explore advanced Amazon Bedrock features to significantly expand its capabilities and make it more robust and versatile. Consider integrating Amazon Bedrock Guardrails to implement safeguards customized to your application requirements and responsible AI policies. Consider also expanding the assistant’s capabilities through function calling, to perform actions on behalf of users, such as scheduling meetings or initiating workflows. You could also use Amazon Bedrock Prompt Flows to accelerate the creation, testing, and deployment of workflows through an intuitive visual builder. For more advanced interactions, you could explore implementing Amazon Bedrock Agents capable of reasoning about complex problems, making decisions, and executing multistep tasks autonomously.

Performance optimization

The serverless architecture used in this post provides a scalable solution out of the box. As your user base grows or if you have specific performance requirements, there are several ways to further optimize performance. You can implement API caching to speed up repeated requests or use provisioned concurrency for Lambda functions to eliminate cold starts. To overcome API Gateway timeout limitations in scenarios requiring longer processing times, you can increase the integration timeout on API Gateway, or you might replace it with an Application Load Balancer, which allows for extended connection durations. You can also fine-tune your choice of Amazon Bedrock model to balance accuracy and speed. Finally, Provisioned Throughput in Amazon Bedrock lets you provision a higher level of throughput for a model at a fixed cost.

Clean up

In this post, you deployed a solution that lets you interact directly with a chat assistant powered by AWS from your Google Chat environment. The architecture incurs usage cost for several AWS services. First, you will be charged for model inference and for the vector databases you use with Amazon Bedrock Knowledge Bases. AWS Lambda costs are based on the number of requests and compute time, and Amazon DynamoDB charges depend on read/write capacity units and storage used. Additionally, Amazon API Gateway incurs charges based on the number of API calls and data transfer. For more details about pricing, refer to Amazon Bedrock pricing.

There might also be costs associated with using Google services. For detailed information about potential charges related to Google Chat, refer to the Google Chat product documentation.

To avoid unnecessary costs, clean up the resources created in your AWS environment when you’re finished exploring this solution. Use the cdk destroy command to delete the AWS CDK stack previously deployed in this post. Alternatively, open the AWS CloudFormation console and delete the stack you deployed.

Conclusion

In this post, we demonstrated a practical solution for creating an AI-powered business assistant for Google Chat. This solution seamlessly integrates Google Workspace with AWS hosted data by using LLMs on Amazon Bedrock, Lambda for application logic, DynamoDB for session management, and API Gateway for secure communication. By implementing this solution, organizations can provide their workforce with a streamlined way to access AI-driven insights and knowledge bases directly within their familiar Google Chat interface, enabling natural language interaction and data-driven discussions without the need to switch between different applications or platforms.

Furthermore, we showcased how to customize the application to implement tailored business logic that can use other AWS services. This flexibility empowers you to tailor the assistant’s capabilities to their specific requirements, providing a seamless integration with your existing AWS infrastructure and data sources.

AWS offers a comprehensive suite of cutting-edge AI services to meet your organization’s unique needs, including Amazon Bedrock and Amazon Q. Now that you know how to integrate AWS services with Google Chat, you can explore their capabilities and build awesome applications!

About the Authors

Nizar Kheir is a Senior Solutions Architect at AWS with more than 15 years of experience spanning various industry segments. He currently works with public sector customers in France and across EMEA to help them modernize their IT infrastructure and foster innovation by harnessing the power of the AWS Cloud.

Lior Perez is a Principal Solutions Architect on the construction team based in Toulouse, France. He enjoys supporting customers in their digital transformation journey, using big data, machine learning, and generative AI to help solve their business challenges. He is also personally passionate about robotics and Internet of Things (IoT), and he constantly looks for new ways to use technologies for innovation.

Discover insights from Gmail using the Gmail connector for Amazon Q Business

October 31, 2024

by Divyajeet Singh Amazon AWS

A number of organizations use Gmail for their business email needs. Gmail for business is part of Google Workspace, which provides a set of productivity and collaboration tools like Google Drive, Gmail, and Google Calendar. Google Drive supports storing documents such as Emails contain a wealth of information found in different places, such as within the subject of an email, the message content, or even attachments. Performing an intelligent search on emails with co-workers can help you find answers to questions, improving productivity and enhancing the overall customer experience for the organization.

Amazon Q Business is a fully managed, generative AI-powered assistant designed to enhance enterprise operations. It can be tailored to specific business needs by connecting to company data, information, and systems through over 40 built-in connectors.

Amazon Q Business enables users in various roles, such as marketers, project managers, and sales representatives, to have tailored conversations, solve problems, generate content, take action, and more, all through a web-based interface. This tool aims to make employees work smarter, move faster, and drive more significant impact by providing immediate and relevant information and streamlining tasks.

With the Gmail connector for Amazon Q Business, you can enhance productivity and streamline communication processes within your organization. This integration empowers you to use advanced search capabilities and intelligent email management using natural language.

In this post, we guide you through the process of setting up the Gmail connector, enabling seamless interaction between Gmail and Amazon Q Business. Whether you’re a small startup or a large enterprise, this solution can help you maximize the potential of your Gmail data and empower your team with actionable insights.

Finding accurate answers from content in Gmail mailbox using Amazon Q Business

After you integrate Amazon Q Business with Gmail, you can ask a question and Amazon Q Business can index through your mailbox and find relevant answers. For example, you can make the following queries:

Natural language search – You can search for emails and attachments within your mailbox using natural language, making it effortless to find your desired information without having to remember specific keywords or filters
Summarization – You can request a concise summary of the conversations and attachments matching your search query, allowing you to quickly grasp the key points without having to manually sift through individual items
Query clarification – If your query is ambiguous or lacks sufficient context, Amazon Q Business can engage in a dialogue to clarify the intent, so you receive the most relevant and accurate results

Overview of the Gmail connector for Amazon Q Business

To crawl and index contents in Gmail, you can configure the Gmail connector for Amazon Q Business as a data source in your Amazon Q Business application. When you connect Amazon Q Business to a data source and initiate the sync process, Amazon Q Business crawls and indexes documents from the data source into its index.

A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. A data source is a data repository or location that Amazon Q Business connects to in order to retrieve your email data. After you set up the connector, you can create one or multiple data sources within Amazon Q Business and configure them to start indexing emails from your Gmail account.

Types of documents

Gmail messages can be sorted and stored inside your email inbox using folders and labels.

Let’s looks at what are considered as documents in the context of the Gmail connector for Amazon Q Business. The connector supports the crawling of the following entities in Gmail:

Email – Each email is considered a single document
Attachment – Each email attachment is considered a single document

Additionally, supported custom metadata and custom objects are also crawled during the sync process.

The Gmail connector for Amazon Q Business also supports the indexing of a rich set of metadata from the various entities in Gmail. It further provides the ability to map these source metadata fields to Amazon Q index fields for indexing. These field mappings allow you to map Gmail field names to Amazon Q index field names. There are three types of metadata fields that Amazon Q connectors support:

Default fields – These are required with each document, such as the title, creation date, or author
Optional fields – These are provided by the data source, and the administrator can optionally choose one or more of these fields if they contain important and relevant information to produce accurate answers
Custom metadata fields – These are fields created in the data source in addition to what the data source already provides

Refer to Gmail data source connector field mappings for more information.

Authentication

Before we index the content from Gmail, we need to first establish a secure connection between the Gmail connector for Amazon Q Business with your Google service account. To establish a secure connection, we need to authenticate with the data source.

The connector supports authentication using a Google service account. We describe the process of creating an account later in this post. For more information about authentication, see Gmail connector overview.

Secure querying with ACL crawling and identity crawling

Secure querying is when a user runs a query and is returned answers only from documents that the user has access to. To enable users to do secure querying, Amazon Q Business honors the access control lists (ACLs) of the documents. Amazon Q Business does this by first supporting the indexing of ACLs. Indexing documents with ACLs is crucial for maintaining data security, because documents without ACLs are considered public. Additionally, the user’s credentials (email address) are passed along with the query so that answers from documents that are relevant and which user is authorized to access are displayed.

When connecting a Gmail data source, Amazon Q Business crawls the ACL information attached to a document (user and group information) from your Gmail instance. In Gmail, user IDs are mapped to _user_id. User IDs exist in Gmail on files with set access permissions. They’re mapped from the user emails as the IDs in Gmail.

When a user logs in to a web application to conduct a search, the user’s credentials, such as an email address, need to match what is in the ACL of the document to return results from that document. The web application that the user uses to retrieve answers is connected to an identity provider (IdP) or AWS IAM Identity Center. The user’s credentials from the IdP or IAM Identity Center are referred to here as the federated user credentials. The federated user credentials are passed along with the query so that Amazon Q can return the answers from the documents that this user has access to.

Refer to How Amazon Q Business connector crawls Gmail ACLs for more information.

Solution overview

In the following sections, we demonstrate how to set up the Gmail connector for Amazon Q Business. Then we provide examples of how to use the AI-powered chat interface to gain insights from the connected data source.

In our solution, we index emails from Gmail by configuring the Gmail data source connector. This connector allows you to query your Gmail data using Amazon Q Business as your query engine.

After the configuration is complete, you can configure how often Amazon Q Business should synchronize with your Gmail account to keep up to date with the email content. This process makes sure that your email interactions are systematically updated within Amazon Q Business, enabling you to query and uncover valuable insights from your Gmail data.

The following diagram illustrates the solution architecture. Google Workspace is the data source. Emails and attachments along with the ACL information are passed to Amazon Q Business from the Google workspace. The user submits a query to the Amazon Q Business application. Amazon Q Business retrieves the ACL of the user and provides answers based on the emails and attachments that the user has access to.

Prerequisites

You should have the following:

An Amazon Q Business application. If you haven’t created one yet, refer to Build private and secure enterprise generative AI apps with Amazon Q Business and AWS IAM Identity Center for instructions.
A Google Workspace account and an organization for your business with one or many users that have access to Gmail.
Administrator account credentials to Google Workspace and the Google Cloud console.
Access to AWS Secrets Manager.
Privileges to create a new Amazon Q application (or add data sources to existing applications), AWS resources, and AWS Identity and Access Management (IAM) roles and policies.

Configure the Gmail connector for an Amazon Q Business application

To enable Amazon Q Business to access and index emails from Gmail accounts within the organization, it’s essential to configure the organization’s Google workspace. In the steps that follow, we create a service account that will be used by the Gmail connector for Amazon Q Business to index emails.

We provide the service account with authorization scopes to allow access to the required Gmail APIs. The authorization scopes express the permissions you request users to authorize for your application and are applicable to emails within your organization’s Google workspace.

Complete the following steps:

Log in to your organization’s Google Cloud account.
Create a new project with an appropriate name and assign it to your organization. In our example, we name the project GmailConnector.
Choose Create.

After you create the project, on the navigation menu, choose APIs and Services and Library to view the API Library.

On the API Library page, search for and choose Admin SDK API.

The Admin SDK API enables managing the Google workspace account resources and audit usage.

Choose Enable.

Similarly, search for the Gmail API on the API Library

The Gmail API can help in viewing and managing the Gmail mailbox data like threads, messages, and labels.

Choose Enable to enable this API.

We now create a service account. The service account will be used by the Amazon Q Business Gmail data source connector to access the organization’s emails based on the allowed API scope.

On the navigation menu, choose IAM and Admin and Service accounts.

Choose Create service account.

Name the service account Amazon-q-integration-gmail, enter a description, and choose Create and continue.
Skip the optional sections Grant this service account access to project and Grant users access to this service account.
Choose Done.

Choose the service account you created to navigate to the service account details page.
Note the unique ID for the service account—the unique ID is also known as the client ID, and will be used in later steps.

Next, we create the keys for the service account, which will allow it to be used by the Gmail connector for Amazon Q Business.

On the Keys tab, choose Add key and Create new key.

When prompted for the key type, select the recommended option JSON and choose Create.

This will download the private key to your computer, which must be kept safe to allow configuration within the Amazon Q console. The following screenshot shows an example of the credentials JSON file.

On the Details tab, expand the Advanced settings section and choose View Google Workspace Admin console in the Domain-wide Delegation

Granting access to the service account using a domain-wide delegation to your organization’s data must be treated as a privileged operation and done with caution. You can reverse the access grant by disabling or deleting the service account or removing access through the Google Workspace Admin console.

Use the Google Workspace Admin credentials to log in to the Google Workspace Admin console.
Under Security on the navigation menu, under Access and data control, choose API controls.
In the Domain-wide delegation section, choose Manage domain-wide delegation.

Choose Add new.

In the Add a new client ID dialog, enter the unique ID for the service account you created.
Enter the following scopes to allow the service account to access the emails from Gmail:
- https://www.googleapis.com/auth/gmail.readonly – This scope allows to you to view your email messages and settings.
- https://www.googleapis.com/auth/admin.directory.user.readonly – This scope allows to see and download your organization’s Google Workspace directory.

For more details about all the scopes available, refer to OAuth 2.0 Scopes for Google APIs.

Choose Authorize.

This concludes the configuration within the Google Cloud console and Google Workspace Admin console.

Create the Gmail connector for an Amazon Q Business application

This post assumes that an Amazon Q Business application has already been created beforehand. If you haven’t created one yet, refer to Build private and secure enterprise generative AI apps with Amazon Q Business and AWS IAM Identity Center for instructions.

Complete the following steps to configure the connector:

On the Amazon Q Business console, choose Applications in the navigation pane.
Select the application that you want to add the Gmail connector to.
On the Actions menu, choose Edit.

On the Update application page, leave all values unchanged and choose Update.

On the Update retriever page, leave all values as default and choose Next.

On the Connect data sources page, on the All tab, search for Gmail in the search field.
Choose the plus sign next to Gmail, which will open up a page to set up the data source.

In the Name and description section, enter a name and description.

In the Authentication section, choose Create and add new secret.

In the Create an AWS Secrets Manager secret pop-up, provide the following information:
- Enter a name for your Secrets Manager secret.
- For Client email and Private key, refer to the JSON file that you downloaded to your local machine earlier.
- For Admin account email, enter the admin account for your Google
- For Private key, enter the private key details.
- Choose Save.

In the IAM role section, for IAM role, choose Create a new service role (recommended).

In the Sync scope section, select Message attachments and enter a value for Maximum file size.
Optionally, configure the following under Additional configuration (we leave everything as default for this post):
- For Date range, enter the start and end dates for emails to be crawled. Emails received on or after the start date and before the end date are included in the sync scope.
- For Email domains, enter the email from domains, email to domains, subject, CC emails, and BCC emails you want to include or exclude in your index.
- For Keywords in subjects, include or exclude any documents with at least one keyword mentioned in their subjects
- For Labels, add regular expression patterns to include or exclude certain labels or attachment types. You can add up to 100 patterns.
- For Attachments, add regular expression patterns to include or exclude certain attachments. You can add up to 100 patterns.

In the Sync mode section, select New, modified, or deleted content sync.
In the Sync run schedule section, choose the frequency that works best for your use case. For this post, we choose Run on demand.

Choose Add data source and wait for the retriever to be created.

After the data source is created, you’re redirected to the Connect data sources page to add more data sources as needed.

Verify your data source is added and choose Next.

On the Update groups and users page, choose Add groups and users.

The users and groups that you add in this section are from the IAM Identity Center users and groups set up by your administrator.

In the Add or assign users and groups pop-up window, select Assign existing users and groups to add existing users configured in your connected IAM Identity Center, then choose Next.

Optionally, if you have permissions to add users to connected IAM Identity Center, you can select Add new users.

Choose Get started.

Search for users by user display name or groups by group name.
Choose the users or groups you want you add and choose Assign.

The groups and users that you added should now be available on the Groups or Users tabs.

Choose Assign.

For each group or user entry, an Amazon Q Business subscription tier needs to be assigned.

To enable a subscription for a group, on the Update groups and users page, choose the Groups tab (if individual users need to be assigned a subscription, choose the Users tab).
Under the Subscription column, select Choose subscription and choose a subscription (Q Business Lite or Q Business Pro).
Choose Update application to complete adding and setting up the Gmail connector for Amazon Q Business.

Configure Gmail field mappings

To help you structure data for retrieval and chat filtering, Amazon Q Business crawls data source document attributes or metadata and maps them to fields in your Amazon Q index. Amazon Q has reserved fields that it uses when querying your application. When possible, Amazon Q automatically maps these built-in fields to attributes in your data source.

If a built-in field doesn’t have a default mapping, or if you want to map additional index fields, use the custom field mappings to specify how a data source attribute maps to your Amazon Q application.

On the Amazon Q Business console, choose your application.
Under Data sources, select your data source.
On the Actions menu, choose Edit.

In the Field mappings section, select the required fields to crawl under Messages and Message attachments and any types that are available.

The Gmail connector setup for Amazon Q Business is now complete.

To test the connectivity to Gmail and initiate the data synchronization, choose Sync now. The initial sync process may take several minutes to complete.

When the sync is complete, in the Sync run history section, you can see the sync status along with a summary of how may total items were added, deleted, modified, and failed during the sync process.

Query Gmail data using the Amazon Q web experience

Now that the data synchronization is complete, you can start exploring insights from Amazon Q. In the newly created Amazon Q application, choose Customize web experience to open a new tab with a preview of the UI and options to customize as per your needs.

You can customize the Title, Subtitle, and Welcome message fields according to your needs, which will be reflected in the UI.

For this walkthrough, we use the defaults and choose View web experience to be redirected to the login page for the Amazon Q application.

Log in to the application using the credentials for the user that were added to the Amazon Q application. After the login is successful, you’re redirected to the Amazon Q assistant UI, where you can ask questions using natural language and get insights from your Gmail index.

The Gmail data source connected to this Amazon Q Business application has email and Gmail attachments. We demonstrate how the Amazon Q application lets you ask questions on your email using natural language and receive responses and insights for those queries.

Let’s begin by asking Amazon Q to summarize key points from Matt Garma’s (CEO of AWS) email. The following screenshot displays the response and it also includes the email source from where it is generating the response.

For our next example, let’s ask Amazon Q to provide details about return issue customer is facing for a bicycle order they placed with Amazon. Following screenshot shows the details about the issue being faced by the customer and includes the email source from where Amazon Q is generating the response.

Troubleshooting

Troubleshooting your Amazon Q Business Gmail connector provides information about error codes you might see for the Gmail connector and suggested troubleshooting actions. If you encounter an HTTP status code 403 (Forbidden) error when you open your Amazon Q Business application, it means that the user is unable to access the application. . See Troubleshooting Amazon Q Business and identity provider integration for common causes and how to address them.

Frequently asked questions

In this section, we provide guidance to frequently asked questions.

Amazon Q Business is unable to answer your questions

This could happen due to a several reasons:

No permissions – ACLs applied to your account doesn’t allow you to query certain data sources. If this is the case, reach out to your application administrator to make sure your ACLs are configured to access the data sources.
Data connector sync failed – The data connector might have failed to sync information from the source to the Amazon Q Business application. Verify the data connector’s sync run schedule and sync history to confirm the sync is successful.

If neither of these reasons are true in your case, open a support case to get this resolved.

How to generate responses from authoritative data sources

You can configure these options using Amazon Q Business application global controls under Admin controls and guardrails.

Log in as an Amazon Q Business application administrator.
Navigate to the application and choose Admin controls and guardrails in the navigation pane.
Choose Edit in the Global controls section to control these options.

For more information, refer to Admin controls and guardrails in Amazon Q Business.

Amazon Q Business responds using old (stale) data even though your data source is updated

Each Amazon Q Business data connector can be configured with unique sync run schedule frequency. Verify the sync status and sync schedule frequency for your data connector to see when the last sync ran successfully. Your data connector’s sync run schedule could be set to sync at a scheduled time of day, week, or month. If it’s set to run on demand, the sync has to be run manually. When the sync run is complete, verify the sync history to make sure the run has successfully synced all new issues. Refer to Sync run schedule for more information on each option.

How to set up Amazon Q Business using a different IdP

You can set up Amazon Q Business with another SAML 2.0-compliant IdP, such as Okta, Entra ID, or Ping Identity. For more information, see Creating an Amazon Q Business application using Identity Federation through IAM.

Expand the solution

You can explore other features in Amazon Q Business. For example, the Amazon Q Business document enrichment feature helps you control both which documents and document attributes are ingested into your index and how they’re ingested. With document enrichment, you can create, modify, or delete document attributes and document content when you ingest them into your Amazon Q Business index. For example, you can scrub personally identifiable information (PII) by choosing to delete any document attributes related to PII.

Amazon Q Business also offers the following features:

Filtering using metadata – Use document attributes to customize and control users’ chat experience. This is currently supported only if you use the Amazon Q Business API.
Source attribution with citations – Verify responses using Amazon Q Business source attributions.
Upload files and chat – Let users upload files directly into chat and use uploaded file data to perform web experience tasks.
Quick prompts – Feature sample prompts to inform users of the capabilities of their Amazon Q Business web experience.

To improve retrieved results and customize the user chat experience, you can map document attributes from your data sources to fields in your Amazon Q index. To learn more, see Gmail data source connector field mappings.

Clean up

To avoid incurring future charges, clean up any resources you created as part of this solution, including the Amazon Q application:

On the Amazon Q console, choose Applications in the navigation pane.
Select the dashboard you created.
On the Actions menu, choose Delete.
Delete the IAM roles created for the application and data retriever.
If you used IAM Identity Center for this walkthrough, delete your IAM Identity Center instance.

Conclusion

In this post, we discussed how to configure the Gmail connector for Amazon Q Business and use the AI-powered chat interface to gain insights from the connected data source.

To learn more about the Gmail connector for Amazon Q Business, refer to Connecting Gmail to Amazon Q Business, the Amazon Q User Guide, and the Amazon Q Developer Guide.

About the Authors

Divyajeet (DJ) Singh is a Sr. Solutions Architect at AWS Canada. He loves working with customers to help them solve their unique business challenges using the cloud. In his free time, he enjoys spending time with family and friends, and exploring new places.

Temi Aremu is a Solutions Architect at AWS Canada. She is passionate about helping customers solve their business problems with the power of the AWS Cloud. Temi’s areas of interest are analytics, machine learning, and empowering the next generation of women in STEM.

Vineet Kachhawaha is a Sr. Solutions Architect at AWS focusing on AI/ML and generative AI. He co-leads the AWS for Legal Tech team within AWS. He is passionate about working with enterprise customers and partners to design, deploy, and scale AI/ML applications to derive business value.

Vijai Gandikota is a Principal Product Manager in the Amazon Q and Amazon Kendra organization of Amazon Web Services. He is responsible for the Amazon Q and Amazon Kendra connectors, ingestion, security, and other aspects of the Amazon Q and Amazon Kendra services.

Dipti Kulkarni is a Software Development Manager on the Amazon Q and Amazon Kendra engineering team of Amazon Web Services, where she manages the connector development and integration teams.

Accelerate custom labeling workflows in Amazon SageMaker Ground Truth without using AWS Lambda

October 31, 2024

by Sundar Raghavan Amazon AWS

Amazon SageMaker Ground Truth enables the creation of high-quality, large-scale training datasets, essential for fine-tuning across a wide range of applications, including large language models (LLMs) and generative AI. By integrating human annotators with machine learning, SageMaker Ground Truth significantly reduces the cost and time required for data labeling. Whether it’s annotating images, videos, or text, SageMaker Ground Truth allows you to build accurate datasets while maintaining human oversight and feedback at scale. This human-in-the-loop approach is crucial for aligning foundation models with human preferences, enhancing their ability to perform tasks tailored to your specific requirements.

To support various labeling needs, SageMaker Ground Truth provides built-in workflows for common tasks like image classification, object detection, and semantic segmentation. Additionally, it offers the flexibility to create custom workflows, enabling you to design your own UI templates for specialized data labeling tasks, tailored to your unique requirements.

Previously, setting up a custom labeling job required specifying two AWS Lambda functions: a pre-annotation function, which is run on each dataset object before it’s sent to workers, and a post-annotation function, which is run on the annotations of each dataset object and consolidates multiple worker annotations if needed. Although these functions offer valuable customization capabilities, they also add complexity for users who don’t require additional data manipulation. In these cases, you would have to write functions that merely returned your input unchanged, increasing development effort and the potential for errors when integrating the Lambda functions with the UI template and input manifest file.

Today, we’re pleased to announce that you no longer need to provide pre-annotation and post-annotation Lambda functions when creating custom SageMaker Ground Truth labeling jobs. These functions are now optional on both the SageMaker console and the CreateLabelingJob API. This means you can create custom labeling workflows more efficiently when you don’t require extra data processing.

In this post, we show you how to set up a custom labeling job without Lambda functions using SageMaker Ground Truth. We guide you through configuring the workflow using a multimodal content evaluation template, explain how it works without Lambda functions, and highlight the benefits of this new capability.

Solution overview

When you omit the Lambda functions in a custom labeling job, the workflow simplifies:

No pre-annotation function – The data from the input manifest file is inserted directly into the UI template. You can reference the data object fields in your template without needing a Lambda function to map them.
No post-annotation function – Each worker’s annotation is saved directly to your specified Amazon Simple Storage Service (Amazon S3) bucket as an individual JSON file, with the annotation stored under a worker-response key. Without a post-annotation Lambda function, the output manifest file references these worker response files instead of including all annotations directly within the manifest.

In the following sections, we walk through how to set up a custom labeling job without Lambda functions using a multimodal content evaluation template, which allows you to evaluate model-generated descriptions of images. Annotators can review an image, a prompt, and the model’s response, then evaluate the response based on criteria such as accuracy, relevance, and clarity. This provides crucial human feedback for fine-tuning models using Reinforcement Learning from Human Feedback (RLHF) or evaluating LLMs.

Prepare the input manifest file

To set up our labeling job, we begin by preparing the input manifest file that the template will use. The input manifest is a JSON Lines file where each line represents a dataset item to be labeled. Each line contains a source field for embedded data or a source-ref field for references to data stored in Amazon S3. These fields are used to provide the data objects that annotators will label. For detailed information on the input manifest file structure, refer to Input manifest files.

For our specific task—evaluating model-generated descriptions of images—we structure the input manifest to include the following fields:

“source” – The prompt provided to the model
“image” – The S3 URI of the image associated with the prompt
“modelResponse” – The model’s generated description of the image

By including these fields, we’re able to present both the prompt and the related data directly to the annotators within the UI template. This approach eliminates the need for a pre-annotation Lambda function because all necessary information is readily accessible in the manifest file.

The following code is an example of what a line in our input manifest might look like:

{
  "source": "Describe the following image in four lines",
  "image": "s3://your-bucket-name/path-to-image/image.jpeg",
  "modelResponse": "The image features a stylish pair of over-ear headphones with cushioned ear cups and a tan leather headband on a wooden desk. Soft natural light fills a cozy home office, with a laptop, smartphone, and notebook nearby. A cup of coffee and a pen add to the workspace's relaxed vibe. The setting blends modern tech with a warm, inviting atmosphere."
}

Insert the prompt in the UI template

In your UI template, you can insert the prompt using {{ task.input.source }}, display the image using an <img> tag with src="{{ task.input.image | grant_read_access }}" (the grant_read_access Liquid filter provides the worker with access to the S3 object), and show the model’s response with {{ task.input.modelResponse }}. Annotators can then evaluate the model’s response based on predefined criteria, such as accuracy, relevance, and clarity, using tools like sliders or text input fields for additional comments. You can find the complete UI template for this task in our GitHub repository.

Create the labeling job on the SageMaker console

To configure the labeling job using the AWS Management Console, complete the following steps:

On the SageMaker console, under Ground Truth in the navigation pane, choose Labeling job.
Choose Create labeling job.
Specify your input manifest location and output path.
Select Custom as the task type.
Choose Next.
Enter a task title and description.
Under Template, upload your UI template.

The annotation Lambda functions are now an optional setting under Additional configuration.

Choose Preview to display the UI template for review.

Choose Create to create the labeling job.

Create the labeling job using the CreateLabelingJob API

You can also create the custom labeling job programmatically by using the AWS SDK to invoke the CreateLabelingJob API. After uploading the input manifest files to an S3 bucket and setting up a work team, you can define your labeling job in code, omitting the Lambda function parameters if they’re not needed. The following example demonstrates how to do this using Python and Boto3.

In the API, the pre-annotation Lambda function is specified using the PreHumanTaskLambdaArn parameter within the HumanTaskConfig structure. The post-annotation Lambda function is specified using the AnnotationConsolidationLambdaArn parameter within the AnnotationConsolidationConfig structure. With the recent update, both PreHumanTaskLambdaArn and AnnotationConsolidationConfig are now optional. This means you can omit them if your labeling workflow doesn’t require additional data preprocessing or postprocessing.

The following code is an example of how to create a labeling job without specifying the Lambda functions:

response = sagemaker.create_labeling_job(
    LabelingJobName="Lambda-free-job-demo",
    LabelAttributeName="label",
    InputConfig={
        "DataSource": {
            "S3DataSource": {
                "ManifestS3Uri": "s3://customer-bucket/path-to-manifest"
            }
        }
    },
    OutputConfig={
        "S3OutputPath": "s3://customer-bucket/path-to-output-file"
    },
    RoleArn="arn:aws:iam::012345678910:role/CustomerRole",

    # Notice, no PreHumanTaskLambdaArn or AnnotationConsolidationConfig!
    HumanTaskConfig={
        "TaskAvailabilityLifetimeInSeconds": 21600,
        "TaskTimeLimitInSeconds": 3600,
        "WorkteamArn": "arn:aws:sagemaker:us-west-2:058264523720:workteam/private-crowd/customer-work-team-name",
        "TaskDescription": " Evaluate model-generated text responses based on a reference image.",
        "MaxConcurrentTaskCount": 1000,
        "TaskTitle": " Evaluate Model Responses Based on Image References",
        "NumberOfHumanWorkersPerDataObject": 1,
        "UiConfig": {
            "UiTemplateS3Uri": "s3://customer-bucket/path-to-ui-template"
        }
    }
)

When the annotators submit their evaluations, their responses are saved directly to your specified S3 bucket. The output manifest file includes the original data fields and a worker-response-ref that points to a worker response file in S3. This worker response file contains all the annotations for that data object. If multiple annotators have worked on the same data object, their individual annotations are included within this file under an answers key, which is an array of responses. Each response includes the annotator’s input and metadata such as acceptance time, submission time, and worker ID.

This means that all annotations for a given data object are collected in one place, allowing you to process or analyze them later according to your specific requirements, without needing a post-annotation Lambda function. You have access to all the raw annotations and can perform any necessary consolidation or aggregation as part of your post-processing workflow.

Benefits of labeling jobs without Lambda functions

Creating custom labeling jobs without Lambda functions offers several benefits:

Simplified setup – You can create custom labeling jobs more quickly by skipping the creation and configuration of Lambda functions when they’re not needed.
Time savings – Reducing the number of components in your labeling workflow saves development and debugging time.
Reduced complexity – Fewer moving parts mean a lower chance of encountering configuration errors or integration issues.
Cost reduction – By not using Lambda functions, you reduce the associated costs of deploying and invoking these resources.
Flexibility – You retain the ability to use Lambda functions for preprocessing and annotation consolidation when your project requires these capabilities. This update offers simplicity for straightforward tasks and flexibility for more complex requirements.

This feature is currently available in all AWS Regions that support SageMaker Ground Truth. In the future, look out for built-in task types that don’t require annotation Lambda functions, providing a simplified experience for SageMaker Ground Truth across the board.

Conclusion

The introduction of workflows for custom labeling jobs in SageMaker Ground Truth without Lambda functions significantly simplifies the data labeling process. By making Lambda functions optional, we’ve made it simpler and faster to set up custom labeling jobs, reducing potential errors and saving valuable time.

This update maintains the flexibility of custom workflows while removing unnecessary steps for those who don’t require specialized data processing. Whether you’re conducting simple labeling tasks or complex multi-stage annotations, SageMaker Ground Truth now offers a more streamlined path to high-quality labeled data.

We encourage you to explore this new feature and see how it can enhance your data labeling workflows. To get started, check out the following resources:

Browse over 80 available UI templates to suit your labeling needs on GitHub
Follow the step-by-step guide on creating custom labeling workflows to tailor your data labeling tasks

About the Authors

Sundar Raghavan is an AI/ML Specialist Solutions Architect at AWS, helping customers leverage SageMaker and Bedrock to build scalable and cost-efficient pipelines for computer vision applications, natural language processing, and generative AI. In his free time, Sundar loves exploring new places, sampling local eateries and embracing the great outdoors.

Alan Ismaiel is a software engineer at AWS based in New York City. He focuses on building and maintaining scalable AI/ML products, like Amazon SageMaker Ground Truth and Amazon Bedrock Model Evaluation. Outside of work, Alan is learning how to play pickleball, with mixed results.

Yinan Lang is a software engineer at AWS GroundTruth. He worked on GroundTruth, MechanicalTurk and Bedrock infrastructure, as well as customer facing projects for GroundTruth Plus. He also focuses on product security and worked on fixing risks and creating security tests. In leisure time, he is an audiophile and particularly loves to practice keyboard compositions by Bach.

George King is a summer 2024 intern at Amazon AI. He studies Computer Science and Math at the University of Washington and is currently between his second and third year. George loves being outdoors, playing games (chess and all kinds of card games), and exploring Seattle, where he has lived his entire life.

Unlock organizational wisdom using voice-driven knowledge capture with Amazon Transcribe and Amazon Bedrock

October 30, 2024

by Jundong Qiao Amazon AWS

Preserving and taking advantage of institutional knowledge is critical for organizational success and adaptability. This collective wisdom, comprising insights and experiences accumulated by employees over time, often exists as tacit knowledge passed down informally. Formalizing and documenting this invaluable resource can help organizations maintain institutional memory, drive innovation, enhance decision-making processes, and accelerate onboarding for new employees. However, effectively capturing and documenting this knowledge presents significant challenges. Traditional methods, such as manual documentation or interviews, are often time-consuming, inconsistent, and prone to errors. Moreover, the most valuable knowledge frequently resides in the minds of seasoned employees, who may find it difficult to articulate or lack the time to document their expertise comprehensively.

This post introduces an innovative voice-based application workflow that harnesses the power of Amazon Bedrock, Amazon Transcribe, and React to systematically capture and document institutional knowledge through voice recordings from experienced staff members. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Our solution uses Amazon Transcribe for real-time speech-to-text conversion, enabling accurate and immediate documentation of spoken knowledge. We then use generative AI, powered by Amazon Bedrock, to analyze and summarize the transcribed content, extracting key insights and generating comprehensive documentation.

The front-end of our application is built using React, a popular JavaScript library for creating dynamic UIs. This React-based UI seamlessly integrates with Amazon Transcribe, providing users with a real-time transcription experience. As employees speak, they can observe their words converted to text in real-time, permitting immediate review and editing.

By combining the React front-end UI with Amazon Transcribe and Amazon Bedrock, we’ve created a comprehensive solution for capturing, processing, and preserving valuable institutional knowledge. This approach not only streamlines the documentation process but also enhances the quality and accessibility of the captured information, supporting operational excellence and fostering a culture of continuous learning and improvement within organizations.

Solution overview

This solution uses a combination of AWS services, including Amazon Transcribe, Amazon Bedrock, AWS Lambda, Amazon Simple Storage Service (Amazon S3), and Amazon CloudFront, to deliver real-time transcription and document generation. This solution uses a combination of cutting-edge technologies to create a seamless knowledge capture process:

User interface – A React-based front-end, distributed through Amazon CloudFront, provides an intuitive interface for employees to input voice data.
Real-time transcription – Amazon Transcribe streaming converts speech to text in real time, providing accurate and immediate transcription of spoken knowledge.
Intelligent processing – A Lambda function, powered by generative AI models through Amazon Bedrock, analyzes and summarizes the transcribed text. It goes beyond simple summarization by performing the following actions:
- Extracting key concepts and terminologies.
- Structuring the information into a coherent, well-organized document.
Secure storage – Raw audio files, processed information, summaries, and generated content are securely stored in Amazon S3, providing scalable and durable storage for this valuable knowledge repository. S3 bucket policies and encryption are implemented to enforce data security and compliance.

This solution uses a custom authorization Lambda function with Amazon API Gateway instead of more comprehensive identity management solutions such as Amazon Cognito. This approach was chosen for several reasons:

Simplicity – As a sample application, it doesn’t demand full user management or login functionality
Minimal user friction – Users don’t need to create accounts or log in, simplifying the user experience
Quick implementation – For rapid prototyping, this approach can be faster to implement than setting up a full user management system
Temporary credential management – Businesses can use this approach to offer secure, temporary access to AWS services without embedding long-term credentials in the application

Although this solution works well for this specific use case, it’s important to note that for production applications, especially those dealing with sensitive data or needing user-specific functionality, a more robust identity solution such as Amazon Cognito would typically be recommended.

The following diagram illustrates the architecture of our solution.

The workflow includes the following steps:

Users access the front-end UI application, which is distributed through CloudFront
The React web application sends an initial request to Amazon API Gateway
API Gateway forwards the request to the authorization Lambda function
The authorization function checks the request against the AWS Identity and Access Management (IAM) role to confirm proper permissions
The authorization function sends temporary credentials back to the front-end application through API Gateway
With the temporary credentials, the React web application communicates directly with Amazon Transcribe for real-time speech-to-text conversion as the user records their input
After recording and transcription, the user sends (through the front-end UI) the transcribed texts and audio files to the backend through API Gateway
API Gateway routes the authorized request (containing transcribed text and audio files) to the orchestration Lambda function
The orchestration function sends the transcribed text for summarization
The orchestration function receives summarized text from Amazon Bedrock to generate content
The orchestration function stores the generated PDF files and recorded audio files in the artifacts S3 bucket

Prerequisites

You need the following prerequisites:

An active AWS account
Docker installed
The AWS CDK Toolkit 2.114.1+ installed and bootstrapped to the us-east-1 AWS Region
Python 3.12+ installed
Model access to Anthropic’s Claude enabled in Amazon Bedrock
An IAM user or role with access to Amazon Transcribe, Amazon Bedrock, Amazon S3, and Lambda

Deploy the solution with the AWS CDK

The AWS Cloud Development Kit (AWS CDK) is an open source software development framework for defining cloud infrastructure as code and provisioning it through AWS CloudFormation. Our AWS CDK stack deploys resources from the following AWS services:

Amazon Bedrock
Amazon CloudFront
AWS CodeBuild
Amazon EventBridge
IAM
AWS Key Management Service (AWS KMS)
AWS Lambda
Amazon S3
AWS Systems Manager Parameter Store
Amazon Transcribe
AWS WAF

To deploy the solution, complete the following steps:

Clone the GitHub repository: genai-knowledge-capture-webapp
Follow the Prerequisites section in the README.md file to set up your local environment

As of this writing, this solution supports deployment to the us-east-1 Region. The CloudFront distribution in this solution is geo-restricted to the US and Canada by default. To change this configuration, refer to the react-app-deploy.ts GitHub repo.

Invoke npm install to install the dependencies
Invoke cdk deploy to deploy the solution

The deployment process typically takes 20–30 minutes. When the deployment is complete, CodeBuild will build and deploy the React application, which typically takes 2–3 minutes. After that, you can access the UI at the ReactAppUrl URL that is output by the AWS CDK.

Amazon Transcribe Streaming within React application

Our solution’s front-end is built using React, a popular JavaScript library for creating dynamic user interfaces. We integrate Amazon Transcribe streaming into our React application using the aws-sdk/client-transcribe-streaming library. This integration enables real-time speech-to-text functionality, so users can observe their spoken words converted to text instantly.

The real-time transcription offers several benefits for knowledge capture:

With the immediate feedback, speakers can correct or clarify their statements in the moment
The visual representation of spoken words can help maintain focus and structure in the knowledge sharing process
It reduces the cognitive load on the speaker, who doesn’t need to worry about note-taking or remembering key points

In this solution, the Amazon Transcribe client is managed in a reusable React hook, useAudioTranscription.ts. An additional React hook, useAudioProcessing.ts, implements the necessary audio stream processing. Refer to the GitHub repo for more information. The following is a simplified code snippet demonstrating the Amazon Transcribe client integration:

// Create Transcribe client
transcribeClientRef.current = new TranscribeStreamingClient({
  region: credentials.Region,
  credentials: {
    accessKeyId: credentials.AccessKeyId,
    secretAccessKey: credentials.SecretAccessKey,
    sessionToken: credentials.SessionToken,
  },
});

// Create Transcribe Start Command
const transcribeStartCommand = new StartStreamTranscriptionCommand({
  LanguageCode: transcribeLanguage,
  MediaEncoding: audioEncodingType,
  MediaSampleRateHertz: audioSampleRate,
  AudioStream: getAudioStreamGenerator(),
});

// Start Transcribe session
const data = await transcribeClientRef.current.send(
  transcribeStartCommand
);
console.log("Transcribe session established ", data.SessionId);
setIsTranscribing(true);

// Process Transcribe result stream
if (data.TranscriptResultStream) {
  try {
    for await (const event of data.TranscriptResultStream) {
      handleTranscriptEvent(event, setTranscribeResponse);
    }
  } catch (error) {
    console.error("Error processing transcript result stream:", error);
  }
}

For optimal results, we recommend using a good-quality microphone and speaking clearly. At the time of writing, the system supports major dialects of English, with plans to expand language support in future updates.

Use the application

After deployment, open the ReactAppUrl link (https://<cloud front domain name>.cloudfront.net) in your browser (the solution supports Chrome, Firefox, Edge, Safari, and Brave browsers on Mac and Windows). A web UI opens, as shown in the following screenshot.

To use this application, complete the following steps:

Enter a question or topic.
Enter a file name for the document.
Choose Start Transcription and start recording your input for the given question or topic. The transcribed text will be shown in the Transcription box in real time.
After recording, you can edit the transcribed text.
You can also choose the play icon to play the recorded audio clips.
Choose Generate Document to invoke the backend service to generate a document from the input question and associated transcription. Meanwhile, the recorded audio clips are sent to an S3 bucket for future analysis.

The document generation process uses FMs from Amazon Bedrock to create a well-structured, professional document. The FM model performs the following actions:

Organizes the content into logical sections with appropriate headings
Identifies and highlights important concepts or terminologies
Generates a brief executive summary at the beginning of the document
Applies consistent formatting and styling

The audio files and generated documents are stored in a dedicated S3 bucket, as shown in the following screenshot, with appropriate encryption and access controls in place.

Choose View Document after you generate the document, and you will notice a professional PDF document generated with the user’s input in your browser, accessed through a presigned URL.

Additional information

To further enhance your knowledge capture solution and address specific use cases, consider the additional features and best practices discussed in this section.

Custom vocabulary with Amazon Transcribe

For industries with specialized terminology, Amazon Transcribe offers a custom vocabulary feature. You can define industry-specific terms, acronyms, and phrases to improve transcription accuracy. To implement this, complete the following steps:

Create a custom vocabulary file with your specialized terms
Use the Amazon Transcribe API to add this vocabulary to your account
Specify the custom vocabulary in your transcription requests

Asynchronous file uploads

For handling large audio files or improving user experience, implement an asynchronous upload process:

Create a separate Lambda function for file uploads
Use Amazon S3 presigned URLs to allow direct uploads from the client to Amazon S3
Invoke the upload Lambda function using S3 Event Notifications

Multi-topic document generation

For generating comprehensive documents covering multiple topics, refer to the following AWS Prescriptive Guidance pattern: Document institutional knowledge from voice inputs by using Amazon Bedrock and Amazon Transcribe. This pattern provides a scalable approach to combining multiple voice inputs into a single, coherent document.

Key benefits of this approach include:

Efficient capture of complex, multifaceted knowledge
Improved document structure and coherence
Reduced cognitive load on subject matter experts (SMEs)

Use captured knowledge as a knowledge base

The knowledge captured through this solution can serve as a valuable, searchable knowledge base for your organization. To maximize its utility, you can integrate with enterprise search solutions such as Amazon Bedrock Knowledge Bases to make the captured knowledge quickly discoverable. Additionally, you can set up regular review and update cycles to keep the knowledge base current and relevant.

Clean up

When you’re done testing the solution, remove it from your AWS account to avoid future costs:

Invoke cdk destroy to remove the solution
You may also need to manually remove the S3 buckets created by the solution

Summary

This post demonstrates the power of combining AWS services such as Amazon Transcribe and Amazon Bedrock with popular front-end frameworks such as React to create a robust knowledge capture solution. By using real-time transcription and generative AI, organizations can efficiently document and preserve valuable institutional knowledge, fostering innovation, improving decision-making, and maintaining a competitive edge in dynamic business environments.

We encourage you to explore this solution further by deploying it in your own environment and adapting it to your organization’s specific needs. The source code and detailed instructions are available in our genai-knowledge-capture-webapp GitHub repository, providing a solid foundation for your knowledge capture initiatives.

By embracing this innovative approach to knowledge capture, organizations can unlock the full potential of their collective wisdom, driving continuous improvement and maintaining their competitive edge.

About the Authors

Jundong Qiao is a Machine Learning Engineer at AWS Professional Service, where he specializes in implementing and enhancing AI/ML capabilities across various sectors. His expertise encompasses building next-generation AI solutions, including chatbots and predictive models that drive efficiency and innovation.

Michael Massey is a Cloud Application Architect at Amazon Web Services. He helps AWS customers achieve their goals by building highly-available and highly-scalable solutions on the AWS Cloud.

Praveen Kumar Jeyarajan is a Principal DevOps Consultant at AWS, supporting Enterprise customers and their journey to the cloud. He has 13+ years of DevOps experience and is skilled in solving myriad technical challenges using the latest technologies. He holds a Masters degree in Software Engineering. Outside of work, he enjoys watching movies and playing tennis.

Achieve multi-Region resiliency for your conversational AI chatbots with Amazon Lex

October 30, 2024

by Sanjeet Sanda Amazon AWS

Global Resiliency is a new Amazon Lex capability that enables near real-time replication of your Amazon Lex V2 bots in a second AWS Region. When you activate this feature, all resources, versions, and aliases associated after activation will be synchronized across the chosen Regions. With Global Resiliency, the replicated bot resources and aliases in the second Region will have the same identifiers as those in the source Region. This consistency allows you to seamlessly route traffic to any Region by simply changing the Region identifier, providing uninterrupted service availability. In the event of a Regional outage or disruption, you can swiftly redirect your bot traffic to a different Region. Applications now have the ability to use replicated Amazon Lex bots across Regions in an active-active or active-passive manner for improved availability and resiliency. With Global Resiliency, you no longer need to manually manage separate bots across Regions, because the feature automatically replicates and keeps Regional configurations in sync. With just a few clicks or commands, you gain robust Amazon Lex bot replication capabilities. Applications that are using Amazon Lex bots can now fail over from an impaired Region seamlessly, minimizing the risk of costly downtime and maintaining business continuity. This feature streamlines the process of maintaining robust and highly available conversational applications. These include interactive voice response (IVR) systems, chatbots for digital channels, and messaging platforms, providing a seamless and resilient customer experience.

In this post, we walk you through enabling Global Resiliency for a sample Amazon Lex V2 bot. We showcase the replication process of bot versions and aliases across multiple Regions. Additionally, we discuss how to handle integrations with AWS Lambda and Amazon CloudWatch after enabling Global Resiliency.

Solution overview

For this exercise, we create a BookHotel bot as our sample bot. We use an AWS CloudFormation template to build this bot, including defining intents, slots, and other required components such as a version and alias. Throughout our demonstration, we use the us-east-1 Region as the source Region, and we replicate the bot in the us-west-2 Region, which serves as the replica Region. We then replicate this bot, enable logging, and integrate it with a Lambda function.

To better understand the solution, refer to the following architecture diagram.

Enabling Global Resiliency for an Amazon Lex bot is straightforward using the AWS Management Console, AWS Command Line Interface (AWS CLI), or APIs. We walk through the instructions to replicate the bot later in this post.
After replication is successfully enabled, the bot will be replicated across Regions, providing a unified experience. This allows you to distribute IVR or chat application requests between Regions in either an active-active or active-passive setup, depending on your use case.
A key benefit of Global Resiliency is that developers can continuously work on bot improvements in the source Region, and changes are automatically synchronized to the replica Region. This streamlines the development workflow without compromising resiliency.

At the time of writing, Global Resiliency only works with predetermined pairs of Regions. For more information, see Use Global Resiliency to deploy bots to other Regions.

Prerequisites

You should have the following prerequisites:

An AWS account with administrator access
Access to Amazon Lex Global Resiliency (contact your Amazon Connect Solutions Architect or Technical Account Manager)
Working knowledge of the following services:
- AWS CloudFormation
- Amazon CloudWatch
- AWS Lambda
- Amazon Lex

Create a sample Amazon Lex bot

To set up a sample bot for our use case, refer to Manage your Amazon Lex bot via AWS CloudFormation templates. For this example, we create a bot named BookHotel in the source Region (us-east-1). Complete the following steps:

Download the CloudFormation template and deploy it in the source Region (us-east-1). For instructions, see Create a stack from the CloudFormation console.

Upon successful deployment, the BookHotel bot will be created in the source Region.

On the Amazon Lex console, choose Bots in the navigation pane and locate the BookHotel.

Verify that the Global Resiliency option is available under Deployment in the navigation pane. If this option isn’t visible, the Global Resiliency feature may not be enabled for your account. In this case, refer to the prerequisites section for enabling the Global Resiliency feature.

Our sample BookHotel bot has one version (Version 1, in addition to the draft version) and an alias named BookHotelDemoAlias (in addition to the TestBotAlias).

Enable Global Resiliency

To activate Global Resiliency and set up bot replication in a replica Region, complete the following steps:

On the Amazon Lex console, choose us-east-1 as your Region.
Choose Bots in the navigation pane and locate the BookHotel.
Under Deployment in the navigation pane, choose Global Resiliency.

You can see the replication details here. Because you haven’t enabled Global Resiliency yet, all the details are blank.

Choose Create replica to create a draft version of your bot.

In your source Region (us-east-1), after the bot replication is complete, you will see Replication status as Enabled.

Switch to the replica Region (us-west-2).

You can see that the BookHotel bot is replicated. This is a read-only replica and the bot ID in the replica Region matches the bot ID in the source Region.

Under Deployment in the navigation pane, choose Global Resiliency.

You can see the replication details here, which are the same as that in the source Region BookHotel bot.

You have verified that the bot is replicated successfully after Global Resiliency is enabled. Only new versions and aliases created from this point onward will be replicated. As a next step, we create a bot version and alias to demonstrate the replication.

Create a new bot version and alias

Complete the following steps to create a new bot version and alias:

On the Amazon Lex console in your source Region (us-east-1), navigate to the BookHotel.
Choose Bot versions in the navigation pane, and choose Create new version to create Version 2.

Version 2 now has Global Resiliency enabled, whereas Version 1 and the draft version do not, because they were created prior to enabling Global Resiliency.

Choose Aliases in the navigation pane, then choose Create new alias.
Create a new alias for the BookHotel bot called BookHotelDemoAlias_GR and point that to the new version.

Similarly, the BookHotelDemoAlias_GR now has Global Resiliency enabled, whereas aliases created before enabling Global Resiliency, such as BookHotelDemoAlias and TestBotAlias, don’t have Global Resiliency enabled.

Choose Global Resiliency in the navigation pane to view the source and replication details.

The details for Last replicated version are now updated to Version 2.

Switch to the replica Region (us-west-2) and choose Global Resiliency in the navigation pane.

You can see that the new Global Resiliency enabled version (Version 2) is replicated and the new alias BookHotelDemoAlias_GR is also present.

You have verified that the new version and alias were created after Global Resiliency is replicated to the replica Region. You can now make Amazon Lex runtime calls to both Regions.

Handling integrations with Lambda and CloudWatch after enabling Global Resiliency

Amazon Lex has integrations with other AWS services such as enabling custom logic with Lambda functions and logging with conversation logs using CloudWatch and Amazon Simple Storage Service (Amazon S3). In this section, we associate a Lambda function and CloudWatch group for the BookHotel bot in the source Region (us-east-1) and validate its association in the replica Region (us-west-2).

Download the CloudFormation template to deploy a sample Lambda and CloudWatch log group.
Deploy the CloudFormation stack to the source Region (us-east-1). For instructions, see Create a stack from the CloudFormation console.

This will deploy a Lambda function (book-hotel-lambda) and a CloudWatch log group (/lex/book-hotel-bot) in the us-east-1 Region.

Deploy the CloudFormation stack to the replica Region (us-west-2).

This will deploy a Lambda function (book-hotel-lambda) and a CloudWatch log group (/lex/book-hotel-bot) in the us-west-2 Region. The Lambda function name and CloudWatch log group name must be the same in both Regions.

On the Amazon Lex console in the source Region (us-east-1), navigate to the BookHotel.
Choose Aliases in the navigation pane, and choose the BookHotelDemoAlias_GR.
In the Languages section, choose English (US).
Select the book-hotel-lambda function and associate it with the BookHotel bot by choosing Save.
Navigate back to the BookHotelDemoAlias_GR alias, and in the Conversation logs section, choose Manage conversation logs.
Enable Text logs and select the /lex/book-hotel-bot log group, then choose Save.

Conversation text logs are now enabled for the BookHotel bot in us-east-1.

Switch to the replica Region (us-west-2) and navigate to the BookHotel.
Choose Aliases in the navigation pane, and choose the BookHotelDemoAlias_GR.

You can see that the conversation logs are already associated with the /lex/book-hotel-bot CloudWatch group the us-west-2 Region.

In the Languages section, choose English (US).

You can see that the book-hotel-lambda function is associated with the BookHotel alias.

Through this process, we have demonstrated how Lambda functions and CloudWatch log groups are automatically associated with the corresponding bot resources in the replica Region for the replicated bots, providing a seamless and consistent integration across both Regions.

Disabling Global Resiliency

You have the flexibility to disable Global Resiliency at any time. By disabling Global Resiliency, your source bot, along with its associated aliases and versions, will no longer be replicated across other Regions. In this section, we demonstrate the process to disable Global Resiliency.

On the Amazon Lex console in your source Region (us-east-1), choose Bots in the navigation pane and locate the BookHotel.
Under Deployment in the navigation pane, choose Global Resiliency.
Choose Disable Global Resiliency.

Enter confirm in the confirmation box and choose Delete.

This action initiates the deletion of the replicated BookHotel bot in the replica Region.

The replication status will change to Deleting, and after a few minutes, the deletion process will be complete. You will then see the Create replica option available again. If you don’t see it, try refreshing the page.

Check the Bot versions page of the BookHotel bot to confirm that Version 2 is still the latest version.
Check the Aliases page to confirm that the BookHotelDemoAlias_GR alias is still present on the source bot.

Applications referring to this alias can continue to function as normal in the source Region.

Switch to the replica Region (us-west-2) to confirm that the BookHotel bot has been deleted from this Region.

You can reenable Global Resiliency on the source Region (us-east-1) by going through the process described earlier in this post.

Clean up

To prevent incurring charges, complete the following steps to clean up the resources created during this demonstration:

Disable Global Resiliency for the bot by following the instructions detailed earlier in this post.
Delete the book-hotel-lambda-cw-stack CloudFormation stack from the us-west-2. For instructions, see Delete a stack on the CloudFormation console.
Delete the book-hotel-lambda-cw-stack CloudFormation stack from the us-east-1.
Delete the book-hotel-stack CloudFormation stack from the us-east-1.

Integrations with Amazon Connect

Amazon Lex Global Resiliency seamlessly complements Amazon Connect Global Resiliency, providing you with a comprehensive solution for maintaining business continuity and resilience across your conversational AI and contact center infrastructure. Amazon Connect Global Resiliency enables you to automatically maintain your instances synchronized across two Regions, making sure that all configuration resources, such as contact flows, queues, and agents, are true replicas of each other.

With the addition of Amazon Lex Global Resiliency, Amazon Connect customers gain the added benefit of automated synchronization of their Amazon Lex V2 bots associated with their contact flows. This integration provides a consistent and uninterrupted experience during failover scenarios, because your Amazon Lex interactions seamlessly transition between Regions without any disruption. By combining these complementary features, you can achieve end-to-end resilience. This minimizes the risk of downtime and makes sure your conversational AI and contact center operations remain highly available and responsive, even in the case of Regional failures or capacity constraints.

Global Resiliency APIs

Global Resiliency provides API support to create and manage replicas. These are supported in the AWS CLI and AWS SDKs. In this section, we demonstrate usage with the AWS CLI.

Create a bot replica in the replica Region using the CreateBotReplica.
Monitor the bot replication status using the DescribeBotReplica.
List the replicated bots using the ListBotReplicas.
List all the version replication statuses applicable for Global Resiliency using the ListBotVersionReplicas.

This list includes only the replicated bot versions, which were created after Global Resiliency was enabled. In the API response, a botVersionReplicationStatus of Available indicates that the bot version was replicated successfully.

List all the alias replication statuses applicable for Global Resiliency using the ListBotAliasReplicas.

This list includes only the replicated bot aliases, which were created after Global Resiliency was enabled. In the API response, a botAliasReplicationStatus of Available indicates that the bot alias was replicated successfully.

Conclusion

In this post, we introduced the Global Resiliency feature for Amazon Lex V2 bots. We discussed the process to enable Global Resiliency using the console and reviewed some of the new APIs released as part of this feature.

As the next step, you can explore Global Resiliency and apply the techniques discussed in this post to replicate bots and bot versions across Regions. This hands-on practice will solidify your understanding of managing and replicating Amazon Lex V2 bots in your solution architecture.

About the Authors

Priti Aryamane is a Specialty Consultant at AWS Professional Services. With over 15 years of experience in contact centers and telecommunications, Priti specializes in helping customers achieve their desired business outcomes with customer experience on AWS using Amazon Lex, Amazon Connect, and generative AI features.

Sanjeet Sanda is a Specialty Consultant at AWS Professional Services with over 20 years of experience in telecommunications, contact center technology, and customer experience. He specializes in designing and delivering customer-centric solutions with a focus on integrating and adapting existing enterprise call centers into Amazon Connect and Amazon Lex environments. Sanjeet is passionate about streamlining adoption processes by using automation wherever possible. Outside of work, Sanjeet enjoys hanging out with his family, having barbecues, and going to the beach.

Yogesh Khemka is a Senior Software Development Engineer at AWS, where he works on large language models and natural language processing. He focuses on building systems and tooling for scalable distributed deep learning training and real-time inference.

Create and fine-tune sentence transformers for enhanced classification accuracy

October 30, 2024

by Kara Yang Amazon AWS

Sentence transformers are powerful deep learning models that convert sentences into high-quality, fixed-length embeddings, capturing their semantic meaning. These embeddings are useful for various natural language processing (NLP) tasks such as text classification, clustering, semantic search, and information retrieval.

In this post, we showcase how to fine-tune a sentence transformer specifically for classifying an Amazon product into its product category (such as toys or sporting goods). We showcase two different sentence transformers, paraphrase-MiniLM-L6-v2 and a proprietary Amazon large language model (LLM) called M5_ASIN_SMALL_V2.0, and compare their results. M5 LLMS are BERT-based LLMs fine-tuned on internal Amazon product catalog data using product title, bullet points, description, and more. They are currently being used for use cases such as automated product classification and similar product recommendations. Our hypothesis is that M5_ASIN_SMALL_V2.0 will perform better for the use case of Amazon product category classification due to it being fine-tuned with Amazon product data. We prove this hypothesis in the following experiment illustrated in this post.

Solution overview

In this post, we demonstrate how to fine-tune a sentence transformer with Amazon product data and how to use the resulting sentence transformer to improve classification accuracy of product categories using an XGBoost decision tree. For this demonstration, we use a public Amazon product dataset called Amazon Product Dataset 2020 from a kaggle competition. This dataset contains the following attributes and fields:

Domain name – amazon.com
Date range – January 1, 2020, through January 31, 2020
File extension – CSV
Available fields – Uniq Id, Product Name, Brand Name, Asin, Category, Upc Ean Code, List Price, Selling Price, Quantity, Model Number, About Product, Product Specification, Technical Details, Shipping Weight, Product Dimensions, Image, Variants, SKU, Product Url, Stock, Product Details, Dimensions, Color, Ingredients, Direction To Use, Is Amazon Seller, Size Quantity Variant, and Product Description
Label field – Category

Prerequisites

Before you begin, install the following packages. You can do this in either an Amazon SageMaker notebook or your local Jupyter notebook by running the following commands:

!pip install sentencepiece --quiet
!pip install sentence_transformers --quiet
!pip install xgboost –-quiet
!pip install scikit-learn –-quiet/

Preprocess the data

The first step needed for fine-tuning a sentence transformer is to preprocess the Amazon product data for the sentence transformer to be able to consume the data and fine-tune effectively. It involves normalizing the text data, defining the product’s main category by extracting the first category from the Category field, and selecting the most important fields from the dataset that contribute to classifying the product’s main category accurately. We use the following code for preprocessing:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = pd.read_csv('marketing_sample_for_amazon_com-ecommerce__20200101_20200131__10k_data.csv')
data.columns = data.columns.str.lower().str.replace(' ', '_')
data['main_category'] = data['category'].str.split("|").str[0]
data["all_text"] = data.apply(
    lambda r: " ".join(
        [
            str(r["product_name"]) if pd.notnull(r["product_name"]) else "",
            str(r["about_product"]) if pd.notnull(r["about_product"]) else "",
            str(r["product_specification"]) if pd.notnull(r["product_specification"]) else "",
            str(r["technical_details"]) if pd.notnull(r["technical_details"]) else ""
        ]
    ),
    axis=1
)
label_encoder = LabelEncoder()
labels_transform = label_encoder.fit_transform(data['main_category'])
data['label']=labels_transform
data[['all_text','label']]

The following screenshot shows an example of what our dataset looks like after it has been preprocessed.

Fine-tune the sentence transformer paraphrase-MiniLM-L6-v2

The first sentence transformer we fine-tune is called paraphrase-MiniLM-L6-v2. It uses the popular BERT model as its underlying architecture to transform product description text into a 384-dimensional dense vector embedding that will be consumed by our XGBoost classifier for product category classification. We use the following code to fine-tune paraphrase-MiniLM-L6-v2 using the preprocessed Amazon product data:

from sentence_transformers import SentenceTransformer
model_name='paraphrase-MiniLM-L6-v2'
model = SentenceTransformer(model_name)

The first step is to define a classification head that represents the 24 product categories that an Amazon product can be classified into. This classification head will be used to train the sentence transformer specifically to be more effective at transforming product descriptions according to the 24 product categories. The idea is that all product descriptions that are within the same category should be transformed into a vector embedding that is closer in distance compared to product descriptions that belong in different categories.

The following code is for fine-tuning sentence transformer 1:

import torch.nn as nn

# Define classification head
class ClassificationHead(nn.Module):
    def __init__(self, embedding_dim, num_classes):
        super(ClassificationHead, self).__init__()
        self.linear = nn.Linear(embedding_dim, num_classes)

    def forward(self, features):
        x = features['sentence_embedding']
        x = self.linear(x)
        return x

# Define the number of classes for a classification task.
num_classes = 24
print('class number:', num_classes)
classification_head = ClassificationHead(model.get_sentence_embedding_dimension(), num_classes)

# Combine SentenceTransformer model and classification head."
class SentenceTransformerWithHead(nn.Module):
    def __init__(self, transformer, head):
        super(SentenceTransformerWithHead, self).__init__()
        self.transformer = transformer
        self.head = head

    def forward(self, input):
        features = self.transformer(input)
        logits = self.head(features)
        return logits

model_with_head = SentenceTransformerWithHead(model, classification_head)

We then set the fine-tuning parameters. For this post, we train on five epochs, optimize for cross-entropy loss, and use the AdamW optimization method. We chose epoch 5 because, after testing various epoch values, we observed that the loss minimized at epoch 5. This made it the optimal number of training iterations for achieving the best classification results.

The following code is for fine-tuning sentence transformer 2:

import os
os.environ["TORCH_USE_CUDA_DSA"] = "1"
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from sentence_transformers import SentenceTransformer, InputExample, LoggingHandler
import torch
from torch.utils.data import DataLoader
from transformers import AdamW, get_linear_schedule_with_warmup

train_sentences = data['all_text']
train_labels = data['label']
# training parameters
num_epochs = 5
batch_size = 2
learning_rate = 2e-5

# Convert the dataset to PyTorch tensors.
train_examples = [InputExample(texts=[s], label=l) for s, l in zip(train_sentences, train_labels)]

# Customize collate_fn to convert InputExample objects into tensors.
def collate_fn(batch):
    texts = [example.texts[0] for example in batch]
    labels = torch.tensor([example.label for example in batch])
    return texts, labels

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=batch_size, collate_fn=collate_fn)

# Define the loss function, optimizer, and learning rate scheduler.
criterion = nn.CrossEntropyLoss()
optimizer = AdamW(model_with_head.parameters(), lr=learning_rate)
total_steps = len(train_dataloader) * num_epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=total_steps)

# Training loop
loss_list=[]
for epoch in range(num_epochs):
    model_with_head.train()
    for step, (texts, labels) in enumerate(train_dataloader):
        labels = labels.to(model.device)
        optimizer.zero_grad()

        # Encode text and pass through classification head.
        inputs = model.tokenize(texts)
        input_ids = inputs['input_ids'].to(model.device)
        input_attention_mask = inputs['attention_mask'].to(model.device)
        inputs_final = {'input_ids': input_ids, 'attention_mask': input_attention_mask}
        
        # move model_with_head to the same device
        model_with_head = model_with_head.to(model.device)
        logits = model_with_head(inputs_final)
        
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        scheduler.step()
        if step % 100 == 0:
            print(f"Epoch {epoch}, Step {step}, Loss: {loss.item()}")

    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss.item()}')
    model_save_path = f'./intermediate-output/epoch-{epoch}'
    model.save(model_save_path)
    loss_list.append(loss.item())
# Save the final model
model_final_save_path='st_ft_epoch_5'
model.save(model_final_save_path)

To observe whether our resulting fine-tuned sentence transformer improves our product category classification accuracy, we use it as our text embedder in the XGBoost classifier in the next step.

XGBoost classification

XGBoost (Extreme Gradient Boosting) classification is a machine learning technique used for classification tasks. It’s an implementation of the gradient boosting framework designed to be efficient, flexible, and portable. For this post, we have XGBoost consume the product description text embedding output of our sentence transformers and observe product category classification accuracy. We use the following code to use the standard paraphrase-MiniLM-L6-v2 sentence transformer before it was fine-tuned to classify Amazon products to their respective categories:

from sklearn.model_selection import train_test_split
import xgboost as xgb
from sklearn.metrics import accuracy_score

model = SentenceTransformer('paraphrase-MiniLM-L6-v2')  
data['text_embedding'] = data['all_text'].apply(lambda x: model.encode(str(x)))
text_embeddings = pd.DataFrame(data['text_embedding'].tolist(), index=data.index, dtype=float)

# Convert numeric columns stored as strings to floats
numeric_columns = ['selling_price', 'shipping_weight', 'product_dimensions']  # Add more columns as needed
for col in numeric_columns:
    data[col] = pd.to_numeric(data[col], errors='coerce')

# Convert categorical columns to category type
categorical_columns = ['model_number', 'is_amazon_seller']  # Add more columns as needed
for col in categorical_columns:
    data[col] = data[col].astype('category')
    
X_0 = data[['selling_price','model_number','is_amazon_seller']]
X = pd.concat([X_0, text_embeddings], axis=1)
label_encoder = LabelEncoder()
data['main_category_encoded'] = label_encoder.fit_transform(data['main_category'])
y = data['main_category_encoded']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Re-encode the labels to ensure they are consecutive integers starting from 0
unique_labels = sorted(set(y_train) | set(y_test))
label_mapping = {label: idx for idx, label in enumerate(unique_labels)}

y_train = y_train.map(label_mapping)
y_test = y_test.map(label_mapping)

# Enable categorical support for XGBoost
dtrain = xgb.DMatrix(X_train, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=y_test, enable_categorical=True)

param = {
    'max_depth': 6,
    'eta': 0.3,
    'objective': 'multi:softmax',
    'num_class': len(label_mapping),
    'eval_metric': 'mlogloss'
}

num_round = 100
bst = xgb.train(param, dtrain, num_round)

# Evaluate the model
y_pred = bst.predict(dtest)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

Accuracy: 0.78

We observe a 78% accuracy using the stock paraphrase-MiniLM-L6-v2 sentence transformer. To observe the results of the fine-tuned paraphrase-MiniLM-L6-v2 sentence transformer, we need to update the beginning of the code as follows. All other code remains the same.

model = SentenceTransformer('st_ft_epoch_5')  
data['text_embedding_miniLM_ft10'] = data['all_text'].apply(lambda x: model.encode(str(x)))
text_embeddings = pd.DataFrame(data['text_embedding_finetuned'].tolist(), index=data.index, dtype=float)
X_pa_finetuned = pd.concat([X_0, text_embeddings], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X_pa_finetuned, y, test_size=0.2, random_state=42)

# Re-encode the labels to ensure they are consecutive integers starting from 0
unique_labels = sorted(set(y_train) | set(y_test))
label_mapping = {label: idx for idx, label in enumerate(unique_labels)}

y_train = y_train.map(label_mapping)
y_test = y_test.map(label_mapping)

# Build and train the XGBoost model
# Enable categorical support for XGBoost
dtrain = xgb.DMatrix(X_train, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=y_test, enable_categorical=True)

param = {
    'max_depth': 6,
    'eta': 0.3,
    'objective': 'multi:softmax',
    'num_class': len(label_mapping),
    'eval_metric': 'mlogloss'
}

num_round = 100
bst = xgb.train(param, dtrain, num_round)

y_pred = bst.predict(dtest)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Optionally, convert the predicted labels back to the original category labels
inverse_label_mapping = {idx: label for label, idx in label_mapping.items()}
y_pred_labels = pd.Series(y_pred).map(inverse_label_mapping)

Accuracy: 0.94

With the fine-tuned paraphrase-MiniLM-L6-v2 sentence transformer, we observe a 94% accuracy, a 16% increase from the baseline of 78% accuracy. From this observation, we conclude that fine-tuning paraphrase-MiniLM-L6-v2 is effective for classifying Amazon product data into product categories.

Fine-tune the sentence transformer M5_ASIN_SMALL_V20

Now we create a sentence transformer from a BERT-based model called M5_ASIN_SMALL_V2.0. It’s a 40-million-parameter BERT-based model trained at M5, an internal team at Amazon specializing in fine-tuning LLMs using Amazon product data. It was distilled from a larger teacher model (approximately 5 billion parameters), which was pre-trained on a large amount of unlabeled ASIN data and pre-fine-tuned on a set of Amazon supervised learning tasks (multi-task pre-fine-tuning). It is a multi-task, multi-lingual, multi-locale, and multi-modal BERT-based encoder-only model trained on text and structured data input. Its neural network architectural details are as follows:

Model backbone:
Hidden size: 384
Number of hidden layers: 24
Number of attention heads: 16
Intermediate size: 1536
Vocabulary size: 256,035
Number of backbone parameters: 42,587,904
Number of word embedding parameters (bert.embedding.*): 98,517,504
Total number of parameters: 141,259,023

Because M5_ASIN_SMALL_V20 was pre-trained on Amazon product data specifically, we hypothesize that building a sentence transformer from it will increase the accuracy of product category classification. We complete the following steps to build a sentence transformer from M5_ASIN_SMALL_V20, fine-tune it, and input it into an XGBoost classifier to observe accuracy impact:

Load a pre-trained M5 model that you want to use as the base encoder.
Use the M5 model within the SentenceTransformer framework to create a sentence transformer.
Add a pooling layer to create fixed-size sentence embeddings from the variable-length output of the BERT model.
Combine the M5 model and pooling layer into a single model.
Fine-tune the model on a relevant dataset.

See the following code for Steps 1–3:

from sentence_transformers import models 
from transformers import AutoTokenizer

# Step 1: Load Pre-trained M5 Model
model_path = 'M5_ASIN_SMALL_V20'  # or your custom model path
transformer_model = models.Transformer(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Step 2: Define Pooling Layer
pooling_model = models.Pooling(transformer_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True)

# Step 3: Create SentenceTransformer Model
model_mean_m5_base = SentenceTransformer(modules=[transformer_model, pooling_model])

The rest of the code remains the same as fine-tuning for the paraphrase-MiniLM-L6-v2 sentence transformer, except that we use the fine-tuned M5 sentence transformer instead to create embeddings for the texts in the dataset:

loaded_model = SentenceTransformer('m5_ft_epoch_5_mean')
data['text_embedding_m5'] = data['all_text'].apply(lambda x: loaded_model.encode(str(x)))

Result

We observe similar results to paraphrase-MiniLM-L6-v2 when looking at accuracy before fine-tuning, observing a 78% accuracy for M5_ASIN_SMALL_V20. However, we observe that the fine-tuned M5_ASIN_SMALL_V20 sentence transformer performs better than the fine-tuned paraphrase-MiniLM-L6-v2. Its accuracy is 98%, compared to 94% for the fine-tuned paraphrase-MiniLM-L6-v2. We fine-tuned the sentence transformers for 5 epochs, because experiments showed this was the optimal number to minimize loss. The following graph summarizes our observations of accuracy improvement with fine-tuning for 5 epochs in a single comparison chart.

Clean up

We recommend using GPUs to fine-tune the sentence transformers, for example, ml.g5.4xlarge or ml.g4dn.16xlarge. Be sure to clean up resources to avoid incurring additional costs.

If you’re using a SageMaker notebook instance, refer to Clean up Amazon SageMaker notebook instance resources. If you’re using Amazon SageMaker Studio, refer to Delete or stop your Studio running instances, applications, and spaces.

Conclusion

In this post, we explored sentence transformers and how to use them effectively for text classification tasks. We dived deep into the sentence transformer paraphrase-MiniLM-L6-v2, demonstrated how to use a BERT-based model like M5_ASIN_SMALL_V20 to create a sentence transformer, showed how to fine-tune sentence transformers, and showed the accuracy effects of fine-tuning sentence transformers.

Fine-tuning sentence transformers has proven to be highly effective for classifying product descriptions into categories, significantly enhancing prediction accuracy. As a next step, we encourage you to explore different sentence transformers from Hugging Face.

Lastly, if you want to explore M5, note that it is proprietary to Amazon and you can only access it as an Amazon partner or customer as of the time of this publication. Connect with your Amazon point of contact if you’re an Amazon partner or customer wanting to use M5, and they will guide you through M5’s offerings and how it can be used for your use case.

About the Authors

Kara Yang is a Data Scientist at AWS Professional Services in the San Francisco Bay Area, with extensive experience in AI/ML. She specializes in leveraging cloud computing, machine learning, and Generative AI to help customers address complex business challenges across various industries. Kara is passionate about innovation and continuous learning.

Farshad Harirchi is a Principal Data Scientist at AWS Professional Services. He helps customers across industries, from retail to industrial and financial services, with the design and development of generative AI and machine learning solutions. Farshad brings extensive experience in the entire machine learning and MLOps stack. Outside of work, he enjoys traveling, playing outdoor sports, and exploring board games.

James Poquiz is a Data Scientist with AWS Professional Services based in Orange County, California. He has a BS in Computer Science from the University of California, Irvine and has several years of experience working in the data domain having played many different roles. Today he works on implementing and deploying scalable ML solutions to achieve business outcomes for AWS clients.

Empower your generative AI application with a comprehensive custom observability solution

October 29, 2024

by Ishan Singh Amazon AWS

Recently, we’ve been witnessing the rapid development and evolution of generative AI applications, with observability and evaluation emerging as critical aspects for developers, data scientists, and stakeholders. Observability refers to the ability to understand the internal state and behavior of a system by analyzing its outputs, logs, and metrics. Evaluation, on the other hand, involves assessing the quality and relevance of the generated outputs, enabling continual improvement.

Comprehensive observability and evaluation are essential for troubleshooting, identifying bottlenecks, optimizing applications, and providing relevant, high-quality responses. Observability empowers you to proactively monitor and analyze your generative AI applications, and evaluation helps you collect feedback, refine models, and enhance output quality.

In the context of Amazon Bedrock, observability and evaluation become even more crucial. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. As the complexity and scale of these applications grow, providing comprehensive observability and robust evaluation mechanisms are essential for maintaining high performance, quality, and user satisfaction.

We have built a custom observability solution that Amazon Bedrock users can quickly implement using just a few key building blocks and existing logs using FMs, Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, and Amazon Bedrock Agents. This solution uses decorators in your application code to capture and log metadata such as input prompts, output results, run time, and custom metadata, offering enhanced security, ease of use, flexibility, and integration with native AWS services.

Notably, the solution supports comprehensive Retrieval Augmented Generation (RAG) evaluation so you can assess the quality and relevance of generated responses, identify areas for improvement, and refine the knowledge base or model accordingly.

In this post, we set up the custom solution for observability and evaluation of Amazon Bedrock applications. Through code examples and step-by-step guidance, we demonstrate how you can seamlessly integrate this solution into your Amazon Bedrock application, unlocking a new level of visibility, control, and continual improvement for your generative AI applications.

By the end of this post, you will:

Understand the importance of observability and evaluation in generative AI applications
Learn about the key features and benefits of this solution
Gain hands-on experience in implementing the solution through step-by-step demonstrations
Explore best practices for integrating observability and evaluation into your Amazon Bedrock workflows

Prerequisites

To implement the observability solution discussed in this post, you need the following prerequisites:

An active Amazon Web Services (AWS) account and AWS Identity and Access Management (IAM) role with Amazon Bedrock access
Access to the FMs you plan to use
Basic understanding of decorators in your preferred programming language (Python or Node.js)
A clone of the amazon-bedrock-samples GitHub repository
Basic familiarity with AWS services such as Amazon Data Firehose, Amazon Athena, and AWS Glue crawlers (optional, depending on the specific components used in the solution)

Solution overview

The observability solution for Amazon Bedrock empowers users to track and analyze interactions with FMs, knowledge bases, guardrails, and agents using decorators in their source code. Key highlights of the solution include:

Decorator – Decorators are applied to functions invoking Amazon Bedrock APIs, capturing input prompt, output results, custom metadata, custom metrics, and latency related metrics.
Flexible logging –You can use this solution to store logs either locally or in Amazon Simple Storage Service (Amazon S3) using Amazon Data Firehose, enabling integration with existing monitoring infrastructure. Additionally, you can choose what gets logged.
Dynamic data partitioning – The solution enables dynamic partitioning of observability data based on different workflows or components of your application, such as prompt preparation, data preprocessing, feedback collection, and inference. This feature allows you to separate data into logical partitions, making it easier to analyze and process data later.
Security – The solution uses AWS services and adheres to AWS Cloud Security best practices so your data remains within your AWS account.
Cost optimization – This solution uses serverless technologies, making it cost-effective for the observability infrastructure. However, some components may incur additional usage-based costs.
Multiple programming language support – The GitHub repository provides the observability solution in both Python and Node.js versions, catering to different programming preferences.

Here’s a high-level overview of the observability solution architecture:

The following steps explain how the solution works:

Application code using Amazon Bedrock is decorated with @bedrock_logs.watch to save the log
Logged data streams through Amazon Data Firehose
AWS Lambda transforms the data and applies dynamic partitioning based on call_type variable
Amazon S3 stores the data securely
Optional components for advanced analytics
AWS Glue creates tables from S3 data
Amazon Athena enables data querying
Visualize logs and insights in your favorite dashboard tool

This architecture provides comprehensive logging, efficient data processing, and powerful analytics capabilities for your Amazon Bedrock applications.

Getting started

To help you get started with the observability solution, we have provided example notebooks in the attached GitHub repository, covering knowledge bases, evaluation, and agents for Amazon Bedrock. These notebooks demonstrate how to integrate the solution into your Amazon Bedrock application and showcase various use cases and features including feedback collected from users or quality assurance (QA) teams.

The repository contains well-documented notebooks that cover topics such as:

Setting up the observability infrastructure
Integrating the decorator pattern into your application code
Logging model inputs, outputs, and custom metadata
Collecting and analyzing feedback data
Evaluating model responses and knowledge base performance
Example visualization for observability data using AWS services

To get started with the example notebooks, follow these steps:

Clone the GitHub repository

git clone https://github.com/aws-samples/amazon-bedrock-samples.git

Navigate to the observability solution directory

cd amazon-bedrock-samples/evaluation-observe/Custom-Observability-Solution

Follow the instructions in the README file to set up the required AWS resources and configure the solution
Open the provided Jupyter notebooks and follow along with the examples and demonstrations

These notebooks provide a hands-on learning experience and serve as a starting point for integrating our solution into your generative AI applications. Feel free to explore, modify, and adapt the code examples to suit your specific requirements.

Key features

The solution offers a range of powerful features to streamline observability and evaluation for your generative AI applications on Amazon Bedrock:

Decorator-based implementation – Use decorators to seamlessly integrate observability logging into your application functions, capturing inputs, outputs, and metadata without modifying the core logic
Selective logging – Choose what to log by selectively capturing function inputs, outputs, or excluding sensitive information or large data structures that might not be relevant for observability
Logical data partitioning – Create logical partitions in the observability data based on different workflows or application components, enabling easier analysis and processing of specific data subsets
Human-in-the-loop evaluation – Collect and associate human feedback with specific model responses or sessions, facilitating comprehensive evaluation and continual improvement of your application’s performance and output quality
Multi-component support – Support observability and evaluation for various Amazon Bedrock components, including InvokeModel, batch inference, knowledge bases, agents, and guardrails, providing a unified solution for your generative AI applications
Comprehensive evaluation – Evaluate the quality and relevance of generated responses, including RAG evaluation for knowledge base applications, using the open source RAGAS library to compute evaluation metrics

This concise list highlights the key features you can use to gain insights, optimize performance, and drive continual improvement for your generative AI applications on Amazon Bedrock. For a detailed breakdown of the features and implementation specifics, refer to the comprehensive documentation in the GitHub repository.

Implementation and best practices

The solution is designed to be modular and flexible so you can customize it according to your specific requirements. Although the implementation is straightforward, following best practices is crucial for the scalability, security, and maintainability of your observability infrastructure.

Solution deployment

This solution includes an AWS CloudFormation template that streamlines the deployment of required AWS resources, providing consistent and repeatable deployments across environments. The CloudFormation template provisions resources such as Amazon Data Firehose delivery streams, AWS Lambda functions, Amazon S3 buckets, and AWS Glue crawlers and databases.

Decorator pattern

The solution uses the decorator pattern to integrate observability logging into your application functions seamlessly. The @bedrock_logs.watch decorator wraps your functions, automatically logging inputs, outputs, and metadata to Amazon Kinesis Firehose. Here’s an example of how to use the decorator:

# import observability
from observability import BedrockLogs

# instantiate BedrockLogs in Firehose mode
bedrock_logs = BedrockLogs(delivery_stream_name='your-firehose-delivery-stream', feedback_variables=True)

# decorate your function
@bedrock_logs.watch(capture_input=True, capture_output=True, call_type='<your-custom-dataset-name>')
def your_function(arg1, arg2):
    # Your function code here along with any custom metric of your choosing
    return output

Human-in-the-loop evaluation

The solution supports human-in-the-loop evaluation so you can incorporate human feedback into the performance evaluation of your generative AI application. You can involve end users, experts, or QA teams in the evaluation process, providing insights to enhance output quality and relevance. Here’s an example of how you can implement human-in-the-loop evaluation:

@bedrock_logs.watch(call_type='Retrieve-and-Generate-with-KB')
def main(input_arguments):
    # Your code to interact with Amazon Bedrock Knowledge Base or Agent
    return response, custom_metric, etc.

@bedrock_logs.watch(call_type='observation-feedback')
def observation_level_feedback(feedback):
    pass

# Invoke main function with user input and get run_id and observation_id
tuple_of_function_outputs, run_id, observation_id = main(input_arguments)

# Collect human feedback on model response in your application
user_feedback = 'thumbs-up'

observation_feedback_from_front_end = {
    'user_id': 'User-1',
    'f_run_id': run_id,
    'f_observation_id': observation_id,
    'actual_feedback': user_feedback
}

# Log the human-in-loop feedback using observation_level_feedback function
observation_level_feedback(observation_feedback_from_front_end)

By using the run_id and observation_id generated, you can associate human feedback with specific model responses or sessions. This feedback can then be analyzed and used to refine the knowledge base, fine-tune models, or identify areas for improvement.

Best practices

It’s recommended to follow these best practices:

Plan call types in advance – Determine the logical partitions (call_type) for your observability data based on different workflows or application components. This enables easier analysis and processing of specific data subsets.
Use feedback variables – Configure feedback_variables=True when initializing BedrockLogs to generate run_id and observation_id. These IDs can be used to join logically partitioned datasets, associating feedback data with corresponding model responses.
Extend for general steps – Although the solution is designed for Amazon Bedrock, you can use the decorator pattern to log observability data for general steps such as prompt preparation, postprocessing, or other custom workflows.
Log custom metrics – If you need to calculate custom metrics such as latency, context relevance, faithfulness, or any other metric, you can pass these values in the response of your decorated function, and the solution will log them alongside the observability data.
Selective logging – Use the capture_input and capture_output parameters to selectively log function inputs or outputs or exclude sensitive information or large data structures that might not be relevant for observability.
Comprehensive evaluation – Evaluate the quality and relevance of generated responses, including RAG evaluation for knowledge base applications, using the KnowledgeBasesEvaluations

By following these best practices and using the features of the solution, you can set up comprehensive observability and evaluation for your generative AI applications to gain valuable insights, identify areas for improvement, and enhance the overall user experience.

In the next post in this three-part series, we dive deeper into observability and evaluation for RAG and agent-based generative AI applications, providing in-depth insights and guidance.

Clean up

To avoid incurring costs and maintain a clean AWS account, you can remove the associated resources by deleting the AWS CloudFormation stack you created for this walkthrough. You can follow the steps provided in the Deleting a stack on the AWS CloudFormation console documentation to delete the resources created for this solution.

Conclusion and next steps

This comprehensive solution empowers you to seamlessly integrate comprehensive observability into your generative AI applications in Amazon Bedrock. Key benefits include streamlined integration, selective logging, custom metadata tracking, and comprehensive evaluation capabilities, including RAG evaluation. Use AWS services such as Athena to analyze observability data, drive continual improvement, and connect with your favorite dashboard tool to visualize the data.

This post focused is on Amazon Bedrock, but it can be extended to broader machine learning operations (MLOps) workflows or integrated with other AWS services such as AWS Lambda or Amazon SageMaker. We encourage you to explore this solution and integrate it into your workflows. Access the source code and documentation in our GitHub repository and start your integration journey. Embrace the power of observability and unlock new heights for your generative AI applications.

About the authors

Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.

Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. He is passionate about building innovative products and solutions while also focused on customer-obsessed science. When not running experiments and keeping up with the latest developments in generative AI, he loves spending time with his kids.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for Women in Manufacturing Education Foundation Board. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.