Unearth insights from audio transcripts generated by Amazon Transcribe using Amazon Bedrock

November 6, 2024

by Ana Echeverri Amazon AWS

Generative AI continues to push the boundaries of what’s possible. One area garnering significant attention is the use of generative AI to analyze audio and video transcripts, increasing our ability to extract valuable insights from content stored in audio or video files. Speech data is unique and complex, which makes it difficult to analyze and extract insights from. Manually transcribing and analyzing it can be time-consuming and resource-intensive.

Existing methods for extracting insights from speech data often require tedious human transcription and review. You can use automatic speech recognition tools to convert your audio and video data to text. However, you still have to rely on manual processes to extract specific insights and data points or to summarize the content. This approach is time-consuming, and as organizations amass vast amounts of this content, the need for a more efficient and insightful solution becomes increasingly pressing. Given the amount of data organizations store in these formats and the valuable insights that might otherwise go undiscovered, there is a significant opportunity to add business value. The following are some of the new insights and capabilities that can be obtained by using large language models (LLMs) with audio transcripts:

LLMs can analyze and understand the context of a conversation, not just the words spoken, but also the implied meaning, intent, and emotions. Previously, this would have required extensive human interpretation.
LLMs can perform advanced sentiment analysis. Earlier tools could capture basic sentiment, but LLMs can detect subtler signals, such as sarcasm, ambivalence, or mixed feelings, by understanding the context of the conversation.
LLMs can generate concise summarizations not just by extracting content, but by understanding the context of the conversation.
Users can now ask complex, natural language questions and receive insightful answers.
LLMs can infer personas or roles in a conversation, enabling targeted insights and actions.
LLMs can support the creation of new content based on audio assets or conversations following predetermined templates or flows.

In this post, we examine how to create business value through speech analytics with some examples focused on the following:

Automatically summarizing, categorizing, and analyzing marketing content such as podcasts, recorded interviews, or videos, and creating new marketing materials based on those assets
Automatically extracting key points, summaries, and sentiment from a recorded meeting (such as an earnings call)
Transcribing and analyzing contact center calls to improve customer experience

The first step in getting these audio data insights involves transcribing the audio file using Amazon Transcribe. Amazon Transcribe is a machine learning (ML) based managed service that automatically converts speech to text, enabling developers to seamlessly integrate speech-to-text capabilities into their applications. It also recognizes multiple speakers, automatically redacts personally identifiable information (PII), and allows you to enhance the accuracy of a transcription by providing custom vocabularies specific to your industry or use case, or by using custom language models.
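
The optional features mentioned above map to request parameters of the `start_transcription_job` call. The following is a minimal sketch, not part of the original notebook, of how such a request could be assembled. The helper name `build_transcription_request` and its arguments are hypothetical, and the custom vocabulary is assumed to already exist in your account.

```python
def build_transcription_request(job_name, media_uri, media_format,
                                max_speakers=None, vocabulary_name=None,
                                redact_pii=False):
    """Assemble keyword arguments for transcribe.start_transcription_job.

    Hypothetical helper for illustration; only the dictionary keys are
    Amazon Transcribe request parameters.
    """
    request = {
        "TranscriptionJobName": job_name,
        "LanguageCode": "en-US",
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": media_uri},
    }
    settings = {}
    if max_speakers is not None:
        # Speaker diarization: label which speaker said each part.
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    if vocabulary_name:
        # Assumes a custom vocabulary with this name was created beforehand.
        settings["VocabularyName"] = vocabulary_name
    if settings:
        request["Settings"] = settings
    if redact_pii:
        # Ask Transcribe to return a transcript with PII redacted.
        request["ContentRedaction"] = {
            "RedactionType": "PII",
            "RedactionOutput": "redacted",
        }
    return request


# Usage (requires AWS credentials, so it is left commented out):
# transcribe.start_transcription_job(**build_transcription_request(
#     "demo-job", "s3://my-bucket/talk.mp4", "mp4", max_speakers=3))
```

Building the request as a plain dictionary keeps the optional features in one place and lets you inspect the parameters before any job is started.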

The second step involves using foundation models (FMs) with Amazon Bedrock to summarize the content, identify topics, and recognize conclusions, extracting valuable insights that can guide strategic decisions and innovations. Automatic generation of new content also adds value, increasing creativity and productivity.

Generative AI is reshaping the way we analyze audio transcripts, enabling you to unlock insights such as customer sentiment, pain points, common themes, avenues for risk mitigation, and more, that were previously obfuscated.

Use case overview

In this post, we discuss three example use cases in detail. The code artifacts are in Python. We used a Jupyter notebook to run the code snippets. You can follow along by creating and running a notebook in Amazon SageMaker Studio.

Audio summarization and insights, and automated generation of new content using Amazon Transcribe and Amazon Bedrock

Through this use case, we demonstrate how to take an existing marketing asset (a video) and create a new blog post to announce the launch of the video, create an abstract, and extract the main topics and the search engine optimization (SEO) keywords present in the post for documenting and categorizing the asset.

Transcribe audio with Amazon Transcribe

In this case, we use an AWS re:Invent 2023 technical talk as a sample. For the purpose of this notebook, we downloaded the MP4 file for the recording and stored it in an Amazon Simple Storage Service (Amazon S3) bucket.
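
Because the `MediaFormat` passed to Amazon Transcribe should match the file you uploaded, one option is to derive it from the S3 URI instead of hard-coding it. The following sketch is an illustration only: the helper name `media_format_from_uri` is hypothetical, and the set of accepted formats reflects our reading of the Amazon Transcribe documentation, so check it against the current API reference.

```python
import os
from urllib.parse import urlparse

# Assumed list of formats accepted by Amazon Transcribe batch jobs.
SUPPORTED_MEDIA_FORMATS = {"amr", "flac", "m4a", "mp3", "mp4", "ogg", "wav", "webm"}


def media_format_from_uri(media_uri):
    """Return the lowercase file extension of a media URI as the MediaFormat."""
    path = urlparse(media_uri).path
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    if extension not in SUPPORTED_MEDIA_FORMATS:
        raise ValueError(f"Unsupported or missing media extension in {media_uri!r}")
    return extension


print(media_format_from_uri("s3://my-bucket/talks/session.MP4"))  # mp4
```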

The first step is to transcribe the audio file using Amazon Transcribe:

# Create an Amazon Transcribe transcription job by specifying the audio/video file's S3 location
import boto3
import time
import random
transcribe = boto3.client('transcribe')
response = transcribe.start_transcription_job(
    TranscriptionJobName=f"podcast-transcription-{int(time.time())}_{random.randint(1000, 9999)}",
    LanguageCode='en-US',
    MediaFormat='mp4',
    Media={
        'MediaFileUri': '<S3 URI of the media file>'
    },
    OutputBucketName='<name of the S3 bucket that will store the output>',
    OutputKey='transcribe_bedrock_blog/output-files/',
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 3
    }
)
max_tries = 60
while max_tries > 0:
    max_tries -= 1
    job = transcribe.get_transcription_job(TranscriptionJobName=response['TranscriptionJob']['TranscriptionJobName'])
    job_status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if job_status in ["COMPLETED", "FAILED"]:
        if job_status == "COMPLETED":
            print(
                f"Download the transcript from\n"
                f"\t{job['TranscriptionJob']['Transcript']['TranscriptFileUri']}."
            )
        break
    else:
        print(f"Waiting for {response['TranscriptionJob']['TranscriptionJobName']}. Current status is {job_status}.")
    time.sleep(10)

The transcription job will take a few minutes to complete.
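
The polling loop above stops silently when a job fails or when the retries run out. As a sketch of one alternative, not the original notebook's code, the following hypothetical helper `wait_for_transcription` wraps the same `get_transcription_job` call and raises an error in both cases. The `sleep` parameter exists only so the helper can be exercised without real delays.

```python
import time


def wait_for_transcription(transcribe_client, job_name,
                           poll_seconds=10, max_tries=60, sleep=time.sleep):
    """Poll a transcription job until it completes, fails, or times out."""
    for _ in range(max_tries):
        job = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name
        )["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job
        if status == "FAILED":
            # FailureReason is reported by Transcribe for failed jobs.
            reason = job.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Transcription job {job_name} failed: {reason}")
        sleep(poll_seconds)
    raise TimeoutError(f"Transcription job {job_name} did not finish in time")


# Usage (requires AWS credentials, so it is left commented out):
# job = wait_for_transcription(
#     transcribe, response["TranscriptionJob"]["TranscriptionJobName"])
# print(job["Transcript"]["TranscriptFileUri"])
```

Note that this helper returns the inner `TranscriptionJob` dictionary, whereas the `job` variable in the loop above holds the full API response.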

When the job is complete, you can inspect the transcription output and check the plain text transcript that was generated (the following has been trimmed for brevity):

import json
# Get the Transcribe Output JSON file
s3 = boto3.client('s3')
output_bucket = job['TranscriptionJob']['Transcript']['TranscriptFileUri'].split('https://')[1].split('/',2)[1]
output_file_key = job['TranscriptionJob']['Transcript']['TranscriptFileUri'].split('https://')[1].split('/',2)[2]
s3_response_object = s3.get_object(Bucket=output_bucket, Key=output_file_key)
object_content = s3_response_object['Body'].read()

transcription_output = json.loads(object_content)

# Let's see what we have inside the job output JSON
print(transcription_output['results']['transcripts'][0]['transcript'])

……….Once the alert comes, how do you kind of correlate these alerts, not by just text signals and text passing, but understanding the topology, the infrastructure topology that is supporting that application or that business service. It is the topology that ultimately gives the source of truth when an alert comes, right. That’s what we mean by a correlation that is assisted with topology in in in the thing that ultimately results in finding a probable root cause. And once ……….
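
The chained `split` calls used to find the output bucket and key can be hard to read. The following sketch shows one way to do the same parsing with `urllib.parse`. It assumes, as the code above does, that `TranscriptFileUri` is a path-style URL of the form `https://<S3 endpoint>/<bucket>/<key>`; the helper name `split_transcript_uri` and the example URI are hypothetical.

```python
from urllib.parse import urlparse


def split_transcript_uri(transcript_file_uri):
    """Split a path-style S3 HTTPS URI into (bucket, key)."""
    path = urlparse(transcript_file_uri).path.lstrip("/")
    bucket, _, key = path.partition("/")
    if not bucket or not key:
        raise ValueError(f"Cannot find bucket and key in {transcript_file_uri!r}")
    return bucket, key


bucket, key = split_transcript_uri(
    "https://s3.us-east-1.amazonaws.com/my-bucket/transcribe_bedrock_blog/output-files/job.json"
)
print(bucket)  # my-bucket
print(key)     # transcribe_bedrock_blog/output-files/job.json
```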

After you have validated the existence of the text, you can use Amazon Bedrock to analyze the output:

# Now let's use this transcript to extract insights with the help of a large language model on Amazon Bedrock
# First let's initialize the bedrock runtime client to invoke the model. 
bedrock_runtime = boto3.client('bedrock-runtime')
# Selecting Claude 3 Sonnet
model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'

Using the transcription from the technical talk, we use Amazon Bedrock to call an FM (we use Anthropic’s Claude 3 Sonnet on Amazon Bedrock in this case). You can choose from the language models available on Amazon Bedrock from AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon.
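
Every task in this post sends a single user prompt to the model and reads back the first text block of the response. As a sketch of how that repeated pattern could be factored out, the following hypothetical helper `invoke_claude` wraps the `invoke_model` call with the Anthropic messages request format used in this post. It is an illustration under those assumptions, not code from the original notebook.

```python
import json


def invoke_claude(bedrock_runtime, model_id, prompt, max_tokens=1000):
    """Send one user prompt to an Anthropic Claude model on Amazon Bedrock
    and return the text of the first content block in the response."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })
    response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response["body"].read())
    return response_body["content"][0]["text"]


# Usage (requires AWS credentials and model access, so it is left commented out):
# transcript = transcription_output['results']['transcripts'][0]['transcript']
# print(invoke_claude(bedrock_runtime, model_id,
#                     f"Summarize <transcript>{transcript}</transcript>"))
```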

You can now perform additional tasks.

Extract the main topics with Amazon Bedrock

The following prompt provides instructions to ask the LLM for the main topics in the technical talk:

# Extracting the main topics with Amazon Bedrock
main_topics_prompt = """Based on the contents of <transcript></transcript>, what are the main topics being discussed? Display the topics as a list.

<transcript>
{transcript}
</transcript>
"""

user_message = {"role": "user", "content": main_topics_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
{
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1000,
"messages": messages
}
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
main_topics = response_body['content'][0]['text']
print(main_topics)

We have created a prompt that uses prompting best practices for Anthropic’s Claude. In this case, we pass the transcript within the <transcript></transcript> XML tags and ask for main topics discussed. Based on that, we get the following output:

Based on the contents of the transcript, the main topics being discussed are:

AI and Machine Learning in IT Operations

Challenges faced by IT Operations teams

Observable data explosion from cloud, on-premise, and edge infrastructure

The convergence of IT, SRE, and DevOps roles

Discovery to Resolution pipeline for IT Operations

Alert correlation and intelligent alerting using AI/ML

Probable root cause analysis with AI/ML

Automated remediation and runbook automation

Predictive analytics and preventive maintenance using AI/ML

Benefits of AI/ML for IT Operations (proactive operations, reduced human effort, improved service health, cost optimization)

Generate a blog post to announce the video with Amazon Bedrock

The following prompt provides instructions to ask the LLM for the generation of a new blog post announcing the launch of the preceding recording. Just like the preceding prompt, we use prompting best practices for Anthropic’s Claude:

# Generating a blog post to announce the technical talk recording with Amazon Bedrock
generate_blog_prompt = """The contents of <transcript></transcript> are a transcription of a technical talk video recording. Based on this information write a blogpost that summarizes the technical talk and announces that the recording of this talk is now available for public consumption. Limit the blogpost to 1500 words. Do not include any information that is not mentioned in <transcript></transcript>. 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": generate_blog_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4000,  # a 1,500-word post needs roughly 2,000 tokens; 1,000 would truncate it
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
generate_blog = response_body['content'][0]['text']
print(generate_blog)

Based on that, we get the following output:

Here is a draft 1462 word blogpost summarizing the technical talk and announcing the recording is now available:

Title: Revolutionizing IT Operations with AI/ML: A Future-Ready Approach

The world of IT operations is undergoing a seismic shift, driven by the explosion of observable data, the convergence of IT roles, and the advent of hybrid multi-cloud environments. Traditional reactive approaches are no longer sufficient to keep pace with the complexity and scale of modern IT systems. This is where AI and machine learning (ML) come into play, offering a future-ready approach to revolutionize IT operations.

In a recent technical talk, Verma from HPE OpsRamp outlined the drivers behind this AI/ML imperative and demonstrated how this cutting-edge technology is fundamentally transforming IT operations. The recording of this insightful session is now available for public consumption.

The Drivers Behind the AI/ML Imperative

The first driving force behind the adoption of AI/ML in IT operations is the sheer volume of observable data that………………..

Create an abstract with Amazon Bedrock

The following prompt provides instructions to create an abstract summarizing the content of the technical talk:

# Creating an Abstract with Amazon Bedrock
abstract_prompt = """The contents of <transcript></transcript> are a transcription of a technical talk video recording. Based on this information write an abstract for this technical talk in no more than 100 words. Do not include any information that is not mentioned in <transcript></transcript>.

<transcript>
{transcript}
</transcript>
"""

user_message = {"role": "user", "content": abstract_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
{
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1000,
"messages": messages
}
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
abstract = response_body['content'][0]['text']
print(abstract)

Based on this, we get the following output:

Here is a 96 word abstract for the technical talk:

This talk discussed how AI and machine learning innovations are transforming IT operations for hybrid multi-cloud workloads. The speakers outlined the challenges of dealing with exploding observable data from modern infrastructure and the convergence of IT roles. They demonstrated how AI/ML techniques like intelligent alerting, alert correlation, probable root cause analysis, and automated remediation can drive more proactive, predictive operations. Key benefits showcased included reduced human effort, improved service health, and cost optimization. The talk featured a demo of the OpsRamp platform leveraging AI/ML models to streamline the discovery-to-resolution pipeline for managing edge-to-cloud environments.

Extract SEO keywords from the generated blog post with Amazon Bedrock

The following prompt provides instructions to extract the main SEO keywords from the generated blog post. The list of instructions in the prompt is based on online research on how to extract SEO keywords from long-form text. This demonstrates how you can guide an LLM like Anthropic’s Claude to follow instructions and best practices for a particular task or domain. The prompt also specifies that the output should be in JSON, which is helpful when you want to consume results from an LLM programmatically and therefore require consistent formatting. Based on best practices for Anthropic’s Claude, we use the Assistant message in the messages API to pre-fill the model’s response for further control over the output format:

# Extracting SEO keywords from the generated blog post
SEO_keywords_prompt = """Extract the most relevant keywords and phrases from the given blog post text present in <blog></blog> that would be valuable for SEO (search engine optimization) based on the instructions present in <instructions></instructions> below. The ideal keywords should capture the main topics, concepts, entities, and high-value terms present in the content. Use JSON format with key "keywords" and value as an array of keywords. Skip the preamble; go straight into the JSON. 

<blog>
{textblog}
</blog>

<instructions>
1. Carefully read through the entire blog post text to understand the main topics, concepts, and ideas covered.
2. Identify important nouns, noun phrases, multi-word phrases, and relevant adjective-noun combinations that relate to the core subject matter of the post.
3. Look for words and phrases that potential searchers might use to find content like this.
4. Prioritize terms that are highly specific and relevant to the blog topic over generic words.
5. Vary the keyword length and include both head terms (shorter, more popular keywords) and long-tail terms (longer, more specific phrases).
6. Aim to extract around 10-20 of the most valuable, high-impact keywords and phrases for SEO.
</instructions>
"""

user_message =  {"role": "user", "content": SEO_keywords_prompt.format(textblog = generate_blog)}
assistant_message = {"role": "assistant", "content": '{"keywords": ['}
messages = [user_message, assistant_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
SEO_keywords = response_body['content'][0]['text']
SEO_keywords = '{"keywords": [' + SEO_keywords
SEO_keywords_json = json.loads(SEO_keywords)
print(SEO_keywords_json)

Based on this, we get the following output:

{'keywords': ['AI-driven IT operations', 'machine learning IT operations', 'proactive IT operations', 'predictive IT operations', 'AI for hybrid cloud', 'AI for multi-cloud', 'FutureOps', 'AI-assisted IT operations', 'AI-powered event correlation', 'intelligent alerting', 'automated remediation workflows', 'predictive analytics for IT', 'AI anomaly detection', 'AI root cause analysis', 'AI-driven observability', 'AI for DevOps', 'AI for SRE', 'AI IT operations management']}
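
Joining the pre-filled text and the completion and passing the result to `json.loads` fails if the model appends anything after the closing brace. The following sketch, an assumption about how you might harden this step rather than part of the original notebook, uses `json.JSONDecoder.raw_decode` from the standard library, which parses the first JSON value and ignores trailing text. The helper name `parse_prefilled_json` is hypothetical.

```python
import json


def parse_prefilled_json(prefill, completion):
    """Parse the JSON object formed by the pre-filled text plus the model's
    completion, ignoring any text that follows the JSON value."""
    parsed, _end = json.JSONDecoder().raw_decode(prefill + completion)
    return parsed


completion = '"intelligent alerting", "AI for SRE"]}\n\nLet me know if you need more.'
print(parse_prefilled_json('{"keywords": [', completion))
# {'keywords': ['intelligent alerting', 'AI for SRE']}
```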

For consistent formatting and structured output, you can also use the Converse and ConverseStream APIs in Amazon Bedrock and use the tool calling capabilities of the LLMs that offer it.
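
As a sketch of the Converse approach, the following example defines a single tool whose input schema describes the keyword list and forces the model to call it, so the keywords come back as structured data instead of free text. The tool name `record_keywords`, the helper name, and the prompt wording are hypothetical, and the example assumes the selected model supports tool use and a forced tool choice.

```python
def extract_keywords_with_tool(bedrock_runtime, model_id, blog_text):
    """Ask the model for SEO keywords through a forced tool call (Converse API)."""
    tool_config = {
        "tools": [{
            "toolSpec": {
                "name": "record_keywords",  # hypothetical tool name
                "description": "Record SEO keywords extracted from a blog post.",
                "inputSchema": {"json": {
                    "type": "object",
                    "properties": {
                        "keywords": {"type": "array", "items": {"type": "string"}}
                    },
                    "required": ["keywords"],
                }},
            }
        }],
        # Force the model to answer by calling the tool.
        "toolChoice": {"tool": {"name": "record_keywords"}},
    }
    prompt = (
        "Extract 10-20 keywords and phrases valuable for SEO from the blog "
        f"post in <blog></blog>.\n\n<blog>\n{blog_text}\n</blog>"
    )
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        toolConfig=tool_config,
        inferenceConfig={"maxTokens": 1000},
    )
    for block in response["output"]["message"]["content"]:
        if "toolUse" in block:
            return block["toolUse"]["input"]["keywords"]
    raise ValueError("The model response did not contain a tool call.")


# Usage (requires AWS credentials and model access, so it is left commented out):
# print(extract_keywords_with_tool(bedrock_runtime, model_id, generate_blog))
```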

Generate a new blog post version emphasizing specific SEO keywords with Amazon Bedrock

The following prompt provides instructions to generate a new blog post emphasizing three keywords or phrases from the ones extracted in the previous step. In this case, we use the first three keywords in the returned keywords list. The instructions specified in the prompt are derived from SEO content best practices:

# Rewriting the blog post to emphasize specific SEO keywords
rewrite_SEO_keywords_prompt = """Rewrite the given blog post text present in <blog></blog> to improve its ranking for 3 target keywords listed in <keywords></keywords>. Rewrite the blog post, naturally incorporating the target keywords multiple times throughout the text. Keep the rewritten content high-quality, readable, and relevant, while strategically emphasizing the keywords to improve SEO.

<blog>
{textblog}
</blog>

<keywords>
{kw1},
{kw2},
{kw3}
</keywords>
"""

user_message = {"role": "user", "content": rewrite_SEO_keywords_prompt.format(textblog = generate_blog, kw1 = SEO_keywords_json['keywords'][0], kw2 = SEO_keywords_json['keywords'][1], kw3 = SEO_keywords_json['keywords'][2])}
messages = [user_message]

body=json.dumps(
{
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 4000,  # the rewrite is about as long as the original post, so 1,000 tokens would truncate it
"messages": messages
}
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
rewrite_SEO_blog = response_body['content'][0]['text']
print(rewrite_SEO_blog)

Based on this, we get the following output:

<blog>

Title: Unleash the Power of AI-Driven and Machine Learning IT Operations for Proactive IT Operations

In today’s rapidly evolving digital landscape, traditional IT operations approaches are struggling to keep pace. The explosion of data from cloud workloads, Kubernetes clusters, and distributed edge locations has overwhelmed conventional monitoring tools, leading to fragmented views and reactive operations that respond to issues after they occur. To stay ahead of the curve, organizations must embrace AI-driven and machine learning IT operations, paving the way for proactive IT operations that enhance efficiency and minimize downtime.

At a recent technical talk, Verma and Radha Krishna Kunni from OpsRamp (recently acquired by HPE) delved into the transformative impact of AI and machine learning on IT operations, DevOps, and SRE for hybrid multi-cloud environments. They highlighted the key challenges ops teams face today and introduced the innovative “FutureOps” approach, which leverages AI and machine learning to revolutionize IT operations.

The full video recording of this insightful technical talk is now available [link], providing a comprehensive understanding of…………

Summarize content discussed in a recorded meeting using Amazon Transcribe and Amazon Bedrock

Through this use case, we demonstrate how to take an existing recording from a meeting (we use a recording from an Amazon earnings call) to summarize the content discussed, extract the key points, and provide details on the sentiment of the meeting. For additional information on this use case, see Live Meeting Assistant with Amazon Transcribe, Amazon Bedrock, and Amazon Bedrock Knowledge Bases or Amazon Q Business.

Transcribe audio with Amazon Transcribe

In this use case, we use an Amazon 2024 Q1 earnings call as a sample. For the purpose of this notebook, we downloaded the WAV file for the recording and stored it in an S3 bucket.

The first step is to transcribe the audio file using Amazon Transcribe:

# Create an Amazon Transcribe transcription job by specifying the audio/video file's S3 location
import boto3
import time
import random
transcribe = boto3.client('transcribe')
response = transcribe.start_transcription_job(
    TranscriptionJobName=f"meeting-transcription-{int(time.time())}_{random.randint(1000, 9999)}",
    LanguageCode='en-US',
    MediaFormat='wav',
    Media={
        'MediaFileUri': '<S3 URI of the media file>'
    },
    OutputBucketName='<name of the S3 bucket that will store the output>',
    OutputKey='transcribe_bedrock_blog/output-files/',
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 10
    }
)
# Check whether the transcribe job is complete

max_tries = 60
while max_tries > 0:
    max_tries -= 1
    job = transcribe.get_transcription_job(TranscriptionJobName=response['TranscriptionJob']['TranscriptionJobName'])
    job_status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if job_status in ["COMPLETED", "FAILED"]:
        if job_status == "COMPLETED":
            print(
                f"Download the transcript from\n"
                f"\t{job['TranscriptionJob']['Transcript']['TranscriptFileUri']}."
            )
        break
    else:
        print(f"Waiting for {response['TranscriptionJob']['TranscriptionJobName']}. Current status is {job_status}.")
    time.sleep(10)

The transcription job will take a few minutes to complete.

When the job is complete, you can inspect the transcription output and check for the plain text transcript that was generated:

import json
# Get the Transcribe Output JSON file
s3 = boto3.client('s3')
output_bucket = job['TranscriptionJob']['Transcript']['TranscriptFileUri'].split('https://')[1].split('/',2)[1]
output_file_key = job['TranscriptionJob']['Transcript']['TranscriptFileUri'].split('https://')[1].split('/',2)[2]
s3_response_object = s3.get_object(Bucket=output_bucket, Key=output_file_key)
object_content = s3_response_object['Body'].read()

transcription_output = json.loads(object_content)

# Let's see what we have inside the job output JSON
print(transcription_output['results']['transcripts'][0]['transcript'])

Thank you for standing by. Good day, everyone and welcome to the amazon.com first quarter, 2024 financial results teleconference. At this time, all participants are in a listen only mode. After the presentation, we will conduct a question and answer session. Today’s call is being recorded and for opening remarks, I’ll be turning the call over to the Vice President of Investor………

After you have validated the existence of the text, you can use Amazon Bedrock to analyze the output:

# Now let's use this transcript to extract insights with the help of a large language model on Amazon Bedrock
# First let's initialize the bedrock runtime client to invoke the model. 
bedrock_runtime = boto3.client('bedrock-runtime')
# Selecting Claude 3 Sonnet
model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'

Using the transcription from the earnings call recording, we use Amazon Bedrock to call an FM (we use Anthropic’s Claude 3 Sonnet in this case). You can choose from other FMs available on Amazon Bedrock.

You can now perform additional tasks.

Identify the financial ratios highlighted during this earnings call

The following prompt provides instructions to identify financial ratios highlighted during the earnings call and their implications:

# Identify the financial ratios highlighted during this earnings call 
financial_ratios_prompt = """Based on the contents of <transcript></transcript>, identify the financial ratios highlighted during this earnings call and their implications.

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": financial_ratios_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
financial_ratios = response_body['content'][0]['text']
print(financial_ratios)

Based on this, we get the following output:

Based on the earnings call transcript, the following financial ratios and their implications were highlighted:

Operating Income Margin:

Amazon reported its highest ever quarterly operating income of $15.3 billion, which was $3.3 billion above the high end of their guidance range. This was driven by strong operational performance across all three reportable segments (North America, International, and AWS) and better-than-expected operating leverage, including lower cost to serve.

North America segment operating income was $5 billion with an operating margin of 5.8%, up 460 basis points year-over-year, driven by improvements in cost to serve, including benefits from regionalization efforts, more consolidated customer shipments, and improved leverage.

International segment operating income was $903 million with an operating margin of 2.8%, up 710 basis points year-over-year, primarily driven by cost efficiencies through network design enhancements and improved volume leverage in established countries, as well as progress in emerging countries.

AWS operating income was $9.4 billion, an increase of $4.3 billion year-over-year, with improved leverage from managing infrastructure and fixed costs while growing at a healthy rate.

Implication: The higher operating income margins across all segments indicate Amazon’s focus on driving efficiencies and improving profitability while continuing to invest in growth opportunities.

Revenue Growth:

Worldwide revenue was $143.3 billion, up 13% year-over-year (excluding the impact of foreign exchange).

AWS revenue grew 17.2% year-over-year, accelerating from 13.2% in Q4 2023, driven by strong demand for both generative AI and non-generative AI workloads.

Advertising revenue grew 24% year-over-year (excluding the impact of foreign exchange), primarily driven by sponsored products and improvements in relevancy and measurement capabilities.

Implication: The strong revenue growth, particularly in AWS and advertising, highlights Amazon’s diversified revenue streams and the growth opportunities in cloud computing and digital advertising.

Capital Expenditures (Capex):

Amazon anticipates a meaningful increase in overall capital expenditures in 2024, primarily driven by higher infrastructure Capex for growth in AWS, including generative AI investments.

In Q1 2024, Capex was $14 billion, expected to be the lowest quarter of the year.

Implication: The increase in Capex signals Amazon’s confidence in the strong demand for AWS and their commitment to investing in emerging technologies like generative AI to drive future growth.

Overall, the financial ratios and commentary indicate Amazon’s focus on improving profitability, driving operational efficiencies, and investing in growth opportunities, particularly in AWS and generative AI, while maintaining a diversified revenue stream and managing costs effectively.

Identify the speakers from the earnings call with Amazon Bedrock

The following prompt provides instructions to identify the speakers in the meeting from the transcription:

# Identify the speakers from the earnings call 
speakers_prompt = """Based on the contents of <transcript></transcript>, identify the speakers on this earnings call.

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": speakers_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
speakers = response_body['content'][0]['text']
print(speakers)

Based on this, we get the following output:

Based on the transcript, the key speakers on this Amazon earnings call appear to be:

Andy Jassy – CEO of Amazon

Brian Olsavsky – CFO of Amazon

Dave Fildes – Vice President of Investor Relations at Amazon

The call begins with opening remarks from Dave Fildes, followed by prepared statements from Andy Jassy and Brian Olsavsky. They then take questions from analysts, with Andy and Brian providing the responses.
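
The prompt above passes only the plain transcript, so the model infers speakers from what is said. Because the transcription job was started with `ShowSpeakerLabels`, the output JSON also carries speaker labels that could be included in the prompt. The following sketch rebuilds a speaker-tagged transcript. It assumes each entry in `results.items` has a `type`, an `alternatives` list, and (for spoken words) a `speaker_label`, which is our reading of the Amazon Transcribe output format; verify this against your own output file. The helper name `speaker_turns` is hypothetical.

```python
def speaker_turns(transcription_output):
    """Group transcript items into consecutive turns, one line per turn."""
    turns = []  # each entry is [speaker_label, text]
    for item in transcription_output["results"]["items"]:
        token = item["alternatives"][0]["content"]
        speaker = item.get("speaker_label")
        if item["type"] == "punctuation" and turns:
            # Attach punctuation directly to the preceding word.
            turns[-1][1] += token
        elif turns and (speaker is None or turns[-1][0] == speaker):
            turns[-1][1] += " " + token
        else:
            # A different speaker starts a new turn.
            turns.append([speaker, token])
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


sample = {"results": {"items": [
    {"type": "pronunciation", "speaker_label": "spk_0", "alternatives": [{"content": "Hello"}]},
    {"type": "punctuation", "alternatives": [{"content": ","}]},
    {"type": "pronunciation", "speaker_label": "spk_0", "alternatives": [{"content": "everyone"}]},
    {"type": "punctuation", "alternatives": [{"content": "."}]},
    {"type": "pronunciation", "speaker_label": "spk_1", "alternatives": [{"content": "Thanks"}]},
    {"type": "punctuation", "alternatives": [{"content": "."}]},
]}}
print(speaker_turns(sample))
# spk_0: Hello, everyone.
# spk_1: Thanks.
```

Passing the tagged text inside the `<transcript></transcript>` tags gives the model explicit turn boundaries to work with when it maps labels such as `spk_0` to named speakers.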

Obtain the challenges or negative areas discussed on the earnings call with Amazon Bedrock

The following prompt provides instructions to obtain the challenges or negative areas discussed from the transcription:

# Obtain the challenges or negative areas discussed on earnings call

challenges_prompt = """Based on the contents of <transcript></transcript>, identify the challenges or negative areas discussed on this earnings call.

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": challenges_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
challenges = response_body['content'][0]['text']
print(challenges)

Based on this, we get the following output:

Based on the transcript, some of the key challenges or negative areas discussed include:

Foreign exchange headwinds: Amazon faced an unfavorable impact from global currencies weakening against the U.S. dollar in Q1, leading to a $700 million or 50 basis point headwind to revenue compared to guidance.

Increasing capital expenditures: Amazon expects to meaningfully increase its capital expenditures year-over-year in 2024, primarily driven by higher infrastructure spending for AWS growth, including investments in generative AI capabilities.

Consumer spending concerns: Amazon mentioned keeping an eye on consumer spending trends, specifically in Europe, where it appears weaker relative to the U.S.

International segment profitability: While the international segment’s profitability improved, with an operating margin of 2.8%, Amazon acknowledged the need to continue working on cost efficiencies and profitability, particularly in emerging countries.

Cost optimization challenges: Although Amazon believes the majority of cost optimization efforts are behind them, there is still a need to continually streamline processes, optimize inventory placement, and invest in automation to further reduce the cost to serve.

Overall, the challenges centered around foreign exchange impacts, increasing capital intensity for AWS and generative AI investments, consumer demand uncertainties, and ongoing efforts to improve operational efficiencies and international profitability.

Get insights from a call center call between an agent and a customer using Amazon Transcribe and Amazon Bedrock

Through this use case, we demonstrate how to take an existing call recording from a contact center and summarize the content discussed, extract the main topic, key phrases, call reason, customer satisfaction, overall call sentiment, and sentiment about the products and services discussed. For additional details about this use case, see Live call analytics and agent assist for your contact center with Amazon language AI services and Post call analytics for your contact center with Amazon language AI services.

Transcribe audio with Amazon Transcribe

The first step is to transcribe the audio file using Amazon Transcribe. In this case, we use a sample from the Amazon Transcribe Post Call Analytics Solution GitHub repository. For the purpose of this notebook, we downloaded the WAV file and stored it in an S3 bucket.

# Create an Amazon Transcribe transcription job by specifying the audio/video file's S3 location
import boto3
import time
import random
transcribe = boto3.client('transcribe')
response = transcribe.start_transcription_job(
    TranscriptionJobName=f"call_center-transcription-{int(time.time())}_{random.randint(1000, 9999)}",
    LanguageCode='en-US',
    MediaFormat='wav',  # the sample call recording is a WAV file
    Media={
        'MediaFileUri': '<S3 URI of the media file>'
    },
    OutputBucketName='<name of the S3 bucket that will store the output>',
    OutputKey='transcribe_bedrock_blog/output-files/',
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 3
    }
)
max_tries = 60
while max_tries > 0:
    max_tries -= 1
    job = transcribe.get_transcription_job(TranscriptionJobName=response['TranscriptionJob']['TranscriptionJobName'])
    job_status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if job_status in ["COMPLETED", "FAILED"]:
        if job_status == "COMPLETED":
            print(
                f"Download the transcript from\n"
                f"\t{job['TranscriptionJob']['Transcript']['TranscriptFileUri']}."
            )
        break
    else:
        print(f"Waiting for {response['TranscriptionJob']['TranscriptionJobName']}. Current status is {job_status}.")
    time.sleep(10)

The transcription job will take a few minutes to complete.
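The polling loop above exits silently when the job fails or when the retry budget runs out. The following is a minimal sketch of the same loop wrapped in a reusable function that surfaces both cases. The function name wait_for_transcription and its parameters are our own illustration, not part of the original notebook; it assumes a boto3 Amazon Transcribe client is passed in.

```python
import time


def wait_for_transcription(transcribe_client, job_name, poll_seconds=10, max_tries=60):
    """Poll a transcription job until it completes, fails, or the retry budget runs out."""
    for _ in range(max_tries):
        job = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
        details = job["TranscriptionJob"]
        status = details["TranscriptionJobStatus"]
        if status == "COMPLETED":
            # Return the location of the transcript JSON file.
            return details["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            # FailureReason is included in the job details when a job fails.
            reason = details.get("FailureReason", "unknown reason")
            raise RuntimeError(f"{job_name} failed: {reason}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_name} did not finish after {max_tries} polls")
```

With this helper, the caller either receives the transcript URI or gets an exception that explains why no transcript is available, instead of continuing with an incomplete job.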

When the job’s complete, you can inspect the transcription output and check for the plain text transcript that was generated:

import json
# Get the Transcribe Output JSON file
s3 = boto3.client('s3')
output_bucket = job['TranscriptionJob']['Transcript']['TranscriptFileUri'].split('https://')[1].split('/',2)[1]
output_file_key = job['TranscriptionJob']['Transcript']['TranscriptFileUri'].split('https://')[1].split('/',2)[2]
s3_response_object = s3.get_object(Bucket=output_bucket, Key=output_file_key)
object_content = s3_response_object['Body'].read()

transcription_output = json.loads(object_content)

# Let's see what we have inside the job output JSON
print(transcription_output['results']['transcripts'][0]['transcript'])

Thank you for calling Big Jim’s Auto. This is Travis. How can I help you today? Hello, my name is Violet King and I bought a car not too long ago and a light is coming on um a light on the dashboard. And so I was wondering what I should do about that. Ok. It may depend on what kind of light we’re looking at here today, ma’am. Uh Could I get your first and last name spelled out for me so I can just get some information pulled up? Yes. My name is Violet Vviolet. My last name is King King. Ok, I got the call last week. Ok. Uh And what kind of car are we examining today? It’s, it’s a Ford Fusion. It’s 2017, 2017 Ford Fusion. OK. And for verification, ma’am. Do you happen to know the purchase date of the car? Yes, it, it, it was last Tuesday, August 10th. You say the 10th? Ok. And can you describe to me what kind of light we’re looking at? Y yes, it’s uh I, I think it’s a, an oil, an oil light, an oil light? Ok. Ok. And uh just for clarity on my end, ma’am. Um, uh, is this the first call you’ve made regarding this? Yes. Ok. And, uh, this might be kind of a silly question because I know you just got the car. But sometimes they make me ask a silly question about how many miles has the car been driven since you bought it? Oh. I’m not sure. Should I check? Uh, no, that’s ok. I’ll just, I’ll just put in that. We don’t know at this time. It’s ok. Um, so, um, uh, under the warranty we offer, um, we, the, we don’t handle in house oil changes. Um, we basically, when, when someone buys a car from us, the warranty, we have, it, it covers some stuff like, um, weather damage. Um, and, uh, if the engine light comes on, we take a look at that, but the oil change is something that we just don’t have, uh, here at the dealership that’s a little bit outsourced out and they’re pretty backed up right now because a lot of people have been, uh, staying in due to the recent pandemic and now everyone’s just starting to get out and a whole bunch of places are just completely bogged down. 
So we have a place that we typically outsource to, um, and they’re, they’re pretty reasonable. They’re about, I wanna say somewhere between 25 and $35 to do an oil change. So it’s really not that bad, but they’re a little bit backed up right now from what I’ve heard, I would recommend giving them a call as soon as you can before they close. Ok. What is their number? Uh, give me a second. Let me just rustle through the desk here. See if I can find their information. Uh, ok. Ok. Yes, I’m all right with them one moment, please. Sure. Ok. All their number is 888 333 2222. Ok, and they can fix my car. Yeah, they should be able to handle the oil change. I’m sorry, that’s not something that we cover under the warranty that uh, we have, um, but they should be able to get you settled and, uh, sorted. Ok. Ok. Thank you. Cool. No problem. Have a good one. Thank you. Yup. Bye.
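The plain text transcript does not indicate who is speaking, even though the job was started with ShowSpeakerLabels enabled. The following is a rough sketch of how the word-level items in the output JSON could be grouped into speaker turns. It assumes the output contains results.speaker_labels.segments and results.items in the layout Amazon Transcribe produces when speaker labels are requested; the function name speaker_turns is hypothetical and is not part of the original notebook.

```python
def speaker_turns(transcription_output):
    """Group the word-level items of a Transcribe result into (speaker, text) turns."""
    results = transcription_output["results"]

    # Map each word's start time to the speaker label assigned to it.
    speaker_at = {}
    for segment in results["speaker_labels"]["segments"]:
        for item in segment["items"]:
            speaker_at[item["start_time"]] = item["speaker_label"]

    turns = []
    for item in results["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation carries no timestamp, so attach it to the current turn.
            if turns:
                turns[-1][1] += content
            continue
        speaker = speaker_at.get(item["start_time"], "unknown")
        if turns and turns[-1][0] == speaker:
            turns[-1][1] += " " + content
        else:
            turns.append([speaker, content])
    return [(speaker, text) for speaker, text in turns]
```

Joining the turns as lines such as "spk_0: ..." gives a transcript that may help the model tell the agent from the customer. We have not measured whether this changes the results shown in this post.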

After you have validated the existence of the text, you can use Amazon Bedrock to analyze the output:

# Now let's use this transcript to extract insights with the help of a large language model on Amazon Bedrock
# First let's initialize the bedrock runtime client to invoke the model. 
bedrock_runtime = boto3.client('bedrock-runtime')
# Selecting Claude 3 Sonnet
model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'

With the transcription of the recorded call in hand, we use Amazon Bedrock to call a foundation model (FM). We use Anthropic’s Claude 3 Sonnet in this case, but you can choose any of the other language models available on Amazon Bedrock.

You can now perform additional tasks.
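Each of the tasks in this use case builds the same request body and reads the reply in the same way; only the prompt changes. As a sketch, the shared steps could be collected into one helper. The name ask_claude and its parameters are our own and do not appear in the original notebook; the request fields are the ones used throughout this post.

```python
import json


def ask_claude(bedrock_runtime, model_id, prompt_template, transcript, max_tokens=1000):
    """Fill the {transcript} placeholder, invoke the model, and return the reply text."""
    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [
                {"role": "user", "content": prompt_template.format(transcript=transcript)}
            ],
        }
    )
    response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
    response_body = json.loads(response["body"].read())
    # The reply text is in the first content block of the response.
    return response_body["content"][0]["text"]
```

A call then takes the Amazon Bedrock runtime client, the model ID, one of the prompt templates, and the transcript text, and returns the text that the individual steps print. Note that any literal curly braces in a prompt template would need to be doubled, because the helper relies on str.format.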

Summarize the call between agent and client with Amazon Bedrock

The following prompt provides instructions to summarize the call discussion from the transcription:

#Summarize the call between agent and customer
summarize_prompt = """Based on the contents of <transcript></transcript>, summarize the call between agent and customer with a focus on resolution <transcript></transcript>.

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": summarize_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
summarization = response_body['content'][0]['text']
print(summarization)

We get the following output:

Based on the contents of the transcript, here is a summary of the call between the agent (Travis) and the customer (Violet King) with a focus on resolution:

Violet King called about a light on the dashboard of her recently purchased 2017 Ford Fusion from Big Jim’s Auto. The light appeared to be an oil light. Travis explained that while their warranty covers certain issues like weather damage and engine lights, it does not cover oil changes. He recommended calling an outsourced oil change service that Big Jim’s Auto typically uses, which charges between $25-35 for an oil change.

Travis provided the phone number for the oil change service (888-333-2222) and mentioned that they are currently backed up due to the recent pandemic. He advised Violet to call them as soon as possible before they close to get her oil changed and resolve the issue with the oil light on her dashboard.

The resolution was for Violet to contact the recommended third-party oil change service to have her car’s oil changed, which should address the oil light issue she was experiencing with her newly purchased vehicle.

Extract the main topics with Amazon Bedrock

The following prompt provides instructions to extract the main topics discussed in the conversation from the transcription:

# Extracting the main topics from the conversation
maintopic_prompt = """The contents of <transcript></transcript> are a transcription of a conversation between agent and client. Based on the information, extract the main topics from the conversation  <transcript></transcript>. 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": maintopic_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
main_topic = response_body['content'][0]['text']
print(main_topic)

We get the following output:

Based on the conversation transcript, the main topics appear to be:

Dashboard warning light (specifically an oil light) on a recently purchased 2017 Ford Fusion.

Determining if the issue is covered under the warranty provided by the dealership (Big Jim’s Auto).

Recommendation to contact an external auto service provider (phone number provided) for an oil change service, as the dealership does not handle oil changes in-house.

Confirming that the external auto service provider can likely resolve the oil light issue by performing an oil change.

Extract the key phrases with Amazon Bedrock

The following prompt provides instructions to extract the key phrases discussed in the conversation from the transcription:

# Extracting the key phrases with Amazon Bedrock
keyphrase_prompt = """The contents of <transcript></transcript> are a transcription of a conversation between agent and client. Based on the information, extract the key phrases discussed in the conversation <transcript></transcript>. 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": keyphrase_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
keyphrases = response_body['content'][0]['text']
print(keyphrases)

We get the following output:

Based on the conversation transcript, here are the key phrases discussed:

Oil light

2017 Ford Fusion

Purchase date: August 10th

First call regarding the issue

Car mileage unknown

Warranty does not cover oil changes

Outsourced oil change service recommended

Oil change service contact number: 888-333-2222

Oil change service cost: $25 – $35

Service is backed up due to the pandemic

Extract the reason why the client called the call center with Amazon Bedrock

The following prompt provides instructions to extract the reason for this client call to the call center from the transcription:

# Extracting the reason why client called the call center
reason_prompt = """The content of <transcript></transcript> is a transcription of a conversation between agent and client. Based on the information, extract the reason why the client called the call center <transcript></transcript>.

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": reason_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
reason = response_body['content'][0]['text']
print(reason)

We get the following output:

Based on the transcription, the client called the call center because a light (specifically an oil light) was coming on the dashboard of their recently purchased 2017 Ford Fusion car. The client was seeking guidance on what to do about the oil light being on.

Extract the level of customer satisfaction with Amazon Bedrock

The following prompt provides instructions to extract the level of customer satisfaction experienced by the client from the transcription:

# Extracting the level of Customer Satisfaction 
satisfaction_prompt = """The content of <transcript></transcript> is a transcription of a conversation between agent and client. Based on the information, extract the level of customer satisfaction <transcript></transcript>.

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": satisfaction_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
csat = response_body['content'][0]['text']
print(csat)

We get the following output:

Based on the transcript, the level of customer satisfaction seems to be moderate.

Evidence:

The agent provided clear explanations regarding the issue with the oil light and why oil changes are not covered under their warranty.

The agent offered a recommendation for an external service provider that could perform the oil change, along with their contact information.

The customer acknowledged the information provided by the agent, indicating some level of satisfaction with the response.

However, there are no explicit statements from the customer expressing high satisfaction or dissatisfaction. The interaction remains polite and resolves the customer’s initial query, but there is no strong indication of exceptional satisfaction or disappointment.

Obtain the overall customer sentiment with Amazon Bedrock

The following prompt provides instructions to obtain the overall customer sentiment from the transcription:

# Extracting the overall customer sentiment.
sentiment_prompt = """The content of <transcript></transcript> is a transcription of a conversation between agent and client. Based on the information, what is the overall customer sentiment of the conversation <transcript></transcript>.

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": sentiment_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
sentiment = response_body['content'][0]['text']
print(sentiment)

We get the following output:

Based on the conversation transcript, the overall customer sentiment seems neutral to slightly positive. Although the customer, Violet King, was initially concerned about a warning light on her recently purchased car’s dashboard, the agent (Travis) explained the situation clearly and provided a recommendation for getting an oil change from a third-party service provider. The customer acknowledged and accepted the suggestion without expressing significant frustration or dissatisfaction. The conversation ended on a polite note with the customer thanking the agent.
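The free-text answer above is easy to read but hard to aggregate across many calls. One possible approach, sketched here, is to ask the model for a single label inside XML tags and parse that label in code. The prompt labeled_sentiment_prompt, the label set, and the function parse_sentiment are hypothetical and were not part of the original notebook; the output shown earlier was produced with the original prompt, not this one.

```python
import re

# Assumed label set for illustration; adjust it to your own reporting needs.
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative", "mixed"}

# Hypothetical variant of the sentiment prompt that asks for a machine-readable label.
labeled_sentiment_prompt = """The content of <transcript></transcript> is a transcription of a conversation between an agent and a client. Classify the overall customer sentiment as positive, neutral, negative, or mixed. Put the single label inside <sentiment></sentiment> tags, then explain your reasoning.

<transcript>
{transcript}
</transcript>
"""


def parse_sentiment(model_text):
    """Return the label found in <sentiment></sentiment>, or None if missing or unexpected."""
    match = re.search(r"<sentiment>\s*(.*?)\s*</sentiment>", model_text, re.DOTALL | re.IGNORECASE)
    if match is None:
        return None
    label = match.group(1).strip().lower()
    return label if label in ALLOWED_SENTIMENTS else None
```

Returning None for a missing or unexpected label lets the caller decide whether to retry the request or route the call for manual review.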

Obtain sentiment about the products or services discussed during the call with Amazon Bedrock

The following prompt provides instructions to obtain sentiment about products and services discussed during the call from the transcription:

## Obtaining sentiment about the products or services discussed.
sentiment_product_prompt = """The content of <transcript></transcript> is a transcription of a conversation between agent and client. Based on the information, what is the overall sentiment about the products discussed in the conversation <transcript></transcript>.

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": sentiment_product_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
sentiment_product = response_body['content'][0]['text']
print(sentiment_product)

We get the following output:

Based on the conversation transcript, the overall sentiment about the products (the 2017 Ford Fusion car) discussed seems neutral to slightly positive. There are no major complaints or strong negative sentiments expressed about the car itself. The issue raised is regarding an oil light coming on, which seems to be a routine maintenance matter that needs to be addressed by getting an oil change. The agent helpfully provides the customer with information on where to get an oil change done, although mentioning that the warranty does not cover routine oil changes at the dealership. Overall, there is no indication of major dissatisfaction with the car itself from the conversation.

Create an email to send to the client with the call summary

The following prompt provides instructions to create an email to send to the client after the call is completed, summarizing the call:

## Creating an email to send to the client with the call summary
email_prompt = """The content of <transcript></transcript> is a transcription of a conversation between agent and client. Based on the information, create an email to send to the client after the call is completed summarizing the call.
Use a formal but friendly English style. <transcript></transcript>.

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": email_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
email_conversation = response_body['content'][0]['text']
print(email_conversation)

We get the following output:

Dear Ms. Violet King,

I hope this email finds you well. I’m writing to summarize our conversation regarding the issue with your 2017 Ford Fusion that you purchased last Tuesday, August 10th.

During our call, you mentioned that an oil light was illuminated on the dashboard of your vehicle. As discussed, our dealership’s warranty does not cover routine maintenance services like oil changes. However, we have a trusted partner that we typically recommend for such services.

The recommended auto service provider is reachable at 888-333-2222. They are experienced in handling oil changes and should be able to assist you with your vehicle’s needs. Please note that they have been experiencing a high volume of requests due to the recent pandemic, so it’s advisable to call them as soon as possible to schedule an appointment.

Conclusion

Using generative AI through Amazon Bedrock to analyze audio transcripts generated by Amazon Transcribe unlocks valuable insights that would otherwise remain hidden within the audio data. By combining the powerful speech-to-text capabilities of Amazon Transcribe with the natural language understanding and generation capabilities of LLMs like those available through Amazon Bedrock, businesses can more efficiently extract key information, generate summaries, identify topics and sentiments, and create new content from their audio and video assets. This approach not only saves time and resources compared to manual transcription and analysis, but also opens up new opportunities for using existing content in innovative ways.

Whether it’s repurposing marketing materials, quickly capturing key points from meetings, or improving customer experience through call center analytics, the combination of Amazon Transcribe and large language models (LLMs) on Amazon Bedrock provides a powerful solution for unlocking the full potential of audio data. As these use cases have demonstrated, this technology can be applied across various domains, from content creation and SEO to business intelligence and customer service. By staying at the forefront of these advancements, organizations can gain a competitive edge by effectively harnessing the wealth of information contained within their audio and video repositories, driving insights, and making more informed decisions.

About the Authors

Ana Maria Echeverri is an AI/ML Worldwide Service Specialist at AWS, focused on driving adoption of generative AI speech analytics use cases. She has worked in the data and AI industry for over 30 years, with over 10 years focused on helping organizations grow their AI maturity and capabilities for successful execution of AI strategies.

Vishesh Jha is a Senior Solutions Architect at AWS. His area of interest lies in generative AI, and he has helped customers and partners get started with NLP using AWS services such as Amazon Bedrock, Amazon Transcribe, and Amazon SageMaker. He is an avid soccer fan, and in his free time enjoys watching and playing the sport. He also loves cooking, gaming, and traveling with his family.

Bala Krishna Jakka is a Technical Account Manager at AWS, with a passion for contact center and generative AI technologies. With extensive expertise in helping organizations use cutting-edge solutions, he thrives on staying ahead of the curve in this rapidly evolving field. When not immersed in the realms of AI and customer experience, he finds joy in the game of cricket, showcasing his skills on the pitch. A devoted family man, he cherishes the moments spent with his loved ones, creating lasting memories and finding balance amidst the demands of his professional pursuits.

Best practices and lessons for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock

November 1, 2024

by Yanyan Zhang Amazon AWS

Fine-tuning is a powerful approach in natural language processing (NLP) and generative AI, allowing businesses to tailor pre-trained large language models (LLMs) for specific tasks. This process involves updating the model’s weights to improve its performance on targeted applications. By fine-tuning, the LLM can adapt its knowledge base to specific data and tasks, resulting in enhanced task-specific capabilities. To achieve optimal results, having a clean, high-quality dataset is of paramount importance. A well-curated dataset forms the foundation for successful fine-tuning. Additionally, careful adjustment of hyperparameters such as learning rate multiplier and batch size plays a crucial role in optimizing the model’s adaptation to the target task.

The capabilities in Amazon Bedrock for fine-tuning LLMs offer substantial benefits for enterprises. This feature enables companies to optimize models like Anthropic’s Claude 3 Haiku on Amazon Bedrock for custom use cases, potentially achieving performance levels comparable to or even surpassing more advanced models such as Anthropic’s Claude 3 Opus or Anthropic’s Claude 3.5 Sonnet. The result is a significant improvement in task-specific performance, while potentially reducing costs and latency. This approach offers a versatile solution to satisfy your goals for performance and response time, allowing you to balance capability, domain knowledge, and efficiency in your AI-powered applications.

In this post, we explore the best practices and lessons learned for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock. We discuss the important components of fine-tuning, including use case definition, data preparation, model customization, and performance evaluation. This post dives deep into key aspects such as hyperparameter optimization, data cleaning techniques, and the effectiveness of fine-tuning compared to base models. We also provide insights on how to achieve optimal results for different dataset sizes and use cases, backed by experimental data and performance metrics.

As part of this post, we first introduce general best practices for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock, and then present specific examples with the TAT-QA dataset (Tabular And Textual dataset for Question Answering).

Recommended use cases for fine-tuning

The use cases that are the most well-suited for fine-tuning Anthropic’s Claude 3 Haiku include the following:

Classification – For example, when you have 10,000 labeled examples and want Anthropic’s Claude 3 Haiku to do well at this task.
Structured outputs – For example, when you have 10,000 labeled examples specific to your use case and need Anthropic’s Claude 3 Haiku to accurately identify them.
Tools and APIs – For example, when you need to teach Anthropic’s Claude 3 Haiku how to use your APIs well.
Particular tone or language – For example, when you need Anthropic’s Claude 3 Haiku to respond with a particular tone or language specific to your brand.

Fine-tuning Anthropic’s Claude 3 Haiku has demonstrated superior performance compared to few-shot prompt engineering on base Anthropic’s Claude 3 Haiku, Anthropic’s Claude 3 Sonnet, and Anthropic’s Claude 3.5 Sonnet across various tasks. These tasks include summarization, classification, information retrieval, open-book Q&A, and custom language generation such as SQL. However, achieving optimal performance with fine-tuning requires effort and adherence to best practices.

To better illustrate the effectiveness of fine-tuning compared to other approaches, the following table provides a comprehensive overview of various problem types, examples, and their likelihood of success when using fine-tuning versus prompting with Retrieval Augmented Generation (RAG). This comparison can help you understand when and how to apply these different techniques effectively.

Problem	Examples	Likelihood of Success with Fine-tuning	Likelihood of Success with Prompting + RAG
Make the model follow a specific format or tone	Instruct the model to use a specific JSON schema or talk like the organization’s customer service reps	Very High	High
Teach the model a new skill	Teach the model how to call APIs, fill out proprietary documents, or classify customer support tickets	High	Medium
Teach the model a new skill, and hope it learns similar skills	Teach the model to summarize contract documents, in order to learn how to write better contract documents	Low	Medium
Teach the model new knowledge, and expect it to use that knowledge for general tasks	Teach the model the organizations’ acronyms or more music facts	Low	Medium

Prerequisites

Before diving into the best practices and optimizing fine-tuning LLMs on Amazon Bedrock, familiarize yourself with the general process and how-to outlined in Fine-tune Anthropic’s Claude 3 Haiku in Amazon Bedrock to boost model accuracy and quality. The post provides essential background information and context for the fine-tuning process, including step-by-step guidance on fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock both through the Amazon Bedrock console and Amazon Bedrock API.

LLM fine-tuning lifecycle

The process of fine-tuning an LLM like Anthropic’s Claude 3 Haiku on Amazon Bedrock typically follows these key stages:

Use case definition – Clearly define the specific task or knowledge domain for fine-tuning
Data preparation – Gather and clean high-quality datasets relevant to the use case
Data formatting – Structure the data following best practices, including semantic blocks and system prompts where appropriate
Model customization – Configure the fine-tuning job on Amazon Bedrock, setting parameters like learning rate and batch size, enabling features like early stopping to prevent overfitting
Training and monitoring – Run the training job and monitor its status
Performance evaluation – Assess the fine-tuned model’s performance against relevant metrics, comparing it to base models
Iteration and deployment – Based on the result, refine the process if needed, then deploy the model for production

Throughout this journey, depending on the business case, you may choose to combine fine-tuning with techniques like prompt engineering for optimal results. The process is inherently iterative, allowing for continuous improvement as new data or requirements emerge.

Use case and dataset

The TAT-QA dataset is related to a use case for question answering on a hybrid of tabular and textual content in finance where tabular data is organized in table formats such as HTML, JSON, Markdown, and LaTeX. We focus on the task of answering questions about the table. The evaluation metric is the F1 score that measures the word-to-word matching of the extracted content between the generated output and the ground truth answer. The TAT-QA dataset has been divided into train (28,832 rows), dev (3,632 rows), and test (3,572 rows).
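To make the metric concrete, the following is a minimal sketch of a word-overlap F1 score between a generated answer and a ground truth answer. Precision is the share of generated words that also appear in the reference, recall is the share of reference words that appear in the generated answer, and F1 is their harmonic mean. This is a simplified illustration with lowercase, whitespace-based tokenization that we chose for brevity; the official TAT-QA evaluation may normalize numbers and scales differently, so scores from this sketch are not guaranteed to match it.

```python
from collections import Counter


def token_f1(prediction, ground_truth):
    """Word-overlap F1 between a generated answer and the reference answer."""
    pred_tokens = prediction.lower().split()
    truth_tokens = ground_truth.lower().split()
    if not pred_tokens or not truth_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == truth_tokens)
    # Multiset intersection counts each shared word at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, a generated answer of "net income increased" against a reference of "net income" has a precision of 2/3 and a recall of 1, which gives an F1 of 0.8.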

The following screenshot provides a snapshot of the TAT-QA data, which comprises a table with tabular and textual financial data. Following this financial data table, a detailed question-answer set is presented to demonstrate the complexity and depth of analysis possible with the TAT-QA dataset. This comprehensive table is from the paper TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance, and it includes several key components:

Reasoning types – Each question is categorized by the type of reasoning required
Questions – A variety of questions that test different aspects of understanding and interpreting the financial data
Answers – The correct responses to each question, showcasing the precision required in financial analysis
Scale – Where applicable, the unit of measurement for the answer
Derivation – For some questions, the calculation or logic used to arrive at the answer is provided

The following screenshot shows the same data formatted as JSONL, which is passed to Anthropic’s Claude 3 Haiku as fine-tuning training data. The preceding table has been structured in JSONL format with a system prompt, a user role (which contains the data and the question), and an assistant role (which contains the answers). The table is enclosed within the XML tags <table></table>, helping Anthropic’s Claude 3 Haiku parse the prompt with the data from the table. For the model fine-tuning and performance evaluation, we randomly selected 10,000 examples from the TAT-QA dataset to fine-tune the model, and randomly picked 3,572 records from the remainder of the dataset as testing data.
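As a rough sketch of the formatting step described above, the following code builds one training record with a system prompt, a user turn that wraps the table in <table></table> tags, and an assistant turn with the answer, then serializes records as JSON Lines. The function names and the sample values used with them are hypothetical. The field layout reflects our understanding of the description in this post, so verify it against the Amazon Bedrock fine-tuning documentation before preparing real training data.

```python
import json


def to_training_record(system_prompt, table_text, question, answer):
    """Build one fine-tuning example with system, user, and assistant parts."""
    # Wrap the table in XML tags so the model can separate data from the question.
    user_content = f"<table>\n{table_text}\n</table>\n\n{question}"
    return {
        "system": system_prompt,
        "messages": [
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": answer},
        ],
    }


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
```

Because json.dumps escapes the newlines inside each string, every record stays on a single line even when the table spans several rows.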

Best practices for data cleaning and data validation

When fine-tuning Anthropic’s Claude 3 Haiku, the quality of training data is paramount and serves as the primary determinant of the output quality, surpassing the importance of any other step in the fine-tuning process. Our experiments have consistently shown that high-quality datasets, even if smaller in size, yield better results than larger but less refined ones. This “quality over quantity” approach should guide the entire data preparation process. Data cleaning and validation are essential steps in maintaining the quality of the training set. The following are two effective methods:

Human evaluation – This method involves subject matter experts (SMEs) manually reviewing each data point for quality and relevance. Though time-consuming, it provides unparalleled insight into the nuances of the specific tasks.
LLM as a judge – For large datasets, using Anthropic’s Claude models as a judge can be more efficient. For example, you can use Anthropic’s Claude 3.5 Sonnet as a judge to decide whether each provided training record meets the high quality requirement. The following is an example prompt template:

{'prompt': {
'system': "You are a reliable and impartial expert judge in question/answering data assessment. ",
'messages': [
{'role': 'user', 'content': [{'type': 'text', 'text': "Your task is to take a question, an answer, and a context which may include multiple documents, and provide a judgment on whether the answer to the question is correct or not. This decision should be based either on the provided context or your general knowledge and memory. If the answer contradicts the information in context, it's incorrect. A correct answer is ideally derived from the given context. If no context is given, a correct answer should be factually true and directly and unambiguously address the question.\n\nProvide a short step-by-step reasoning with a maximum of 4 sentences within the <reason></reason> xml tags and provide a single correct or incorrect response within the <judgement></judgement> xml tags.\n <context>\n...\n</context>\n<question>\n...\n</question>\n<answer>\n...\n</answer>\n"}]}]}}

The following is a sample output from Anthropic’s Claude 3.5 Sonnet:

{'id': 'job_id',
'type': 'message',
'role': 'assistant',
'model': 'claude-3-5-sonnet-20240620',
'content': [{'type': 'text',
'text': "<reason>\n1. I'll check the table for information... </reason>\n\n<judgement>correct</judgement>"}],
'stop_reason': 'end_turn',
'stop_sequence': None,
'usage': {'input_tokens': 923, 'output_tokens': 90}}

This LLM-as-a-judge approach is effective for large datasets, allowing for efficient and consistent quality assessment across a wide range of examples. It can help identify and filter out low-quality or irrelevant data points, making sure only the most suitable examples are used for fine-tuning.
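To act on the judge's verdict, the response text has to be parsed. The following is a minimal sketch, assuming the judge follows the prompt and emits a single <judgement> tag containing either correct or incorrect. The function names are hypothetical, and a record with a missing or malformed verdict is dropped rather than guessed at.

```python
import re

# Matches the verdict tag requested in the judge prompt.
JUDGEMENT_RE = re.compile(
    r"<judgement>\s*(correct|incorrect)\s*</judgement>", re.IGNORECASE
)

def is_judged_correct(judge_text):
    """Return True/False for a parsed verdict, or None if no verdict is found."""
    match = JUDGEMENT_RE.search(judge_text)
    if match is None:
        return None
    return match.group(1).lower() == "correct"

def filter_records(records, judge_outputs):
    """Keep only the records whose judge output says 'correct'."""
    return [
        record
        for record, output in zip(records, judge_outputs)
        if is_judged_correct(output) is True
    ]

sample = "<reason>\n1. I'll check the table for information... </reason>\n\n<judgement>correct</judgement>"
print(is_judged_correct(sample))
```

Records that come back as None are worth routing to human review, because a missing verdict usually means the judge did not follow the requested output format.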

The format of your training data is equally important. Although it’s optional, it’s highly recommended to include a system prompt that clearly defines the model’s role and tasks. In addition, including rationales within XML tags can provide valuable context for the model and facilitate extraction of key information. Prompt optimization is one of the key factors in improving model performance. Following established guidelines, such as those provided by Anthropic, can significantly enhance results. This might include structuring prompts with semantic blocks within XML tags, both in training samples and at inference time.

By adhering to these best practices in data cleaning, validation, and formatting, you can create a high-quality dataset that forms the foundation for successful fine-tuning. In the world of model training, quality outweighs quantity, and a well-prepared dataset is key to unlocking the full potential of fine-tuning Anthropic’s Claude 3 Haiku.

Best practices for performing model customization training jobs

When fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock, it’s crucial to optimize your training parameters to achieve the best possible performance. Our experiments have revealed several key insights that can guide you in effectively setting up your customization training jobs.

One of the most critical aspects of fine-tuning is selecting the right hyperparameters, particularly learning rate multiplier and batch size (see the appendix in this post for definitions). Our experiment results have shown that these two factors can significantly impact the model’s performance, with improvements ranging from 2–10% across different tasks. For the learning rate multiplier, the value ranges between 0.1–2.0, with a default value of 1.0. We suggest starting with the default value and adjusting it based on your evaluation results. Batch size is another important parameter, and its optimal value can vary depending on your dataset size. The API allows a range of 4–256, with a default of 32. However, based on our hyperparameter tuning experiments across different use cases, we’ve observed that adjusting the batch size to your dataset size can lead to better results:

For datasets with 1,000 or more examples, aim for a batch size between 32–64
For datasets between 500–1,000 examples, a batch size between 16–32 is generally suitable
For smaller datasets with fewer than 500 examples, consider a batch size between 4–16
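The guidance above can be captured in a small helper. This is only a sketch of the rule of thumb from this post, not an official API, and the function name is hypothetical. It returns a suggested range that you would still tune against your own evaluation results.

```python
def suggest_batch_size_range(num_examples):
    """Map a training set size to the batch size range suggested in this post."""
    if num_examples >= 1000:
        return (32, 64)
    if num_examples >= 500:
        return (16, 32)
    return (4, 16)

for size in (10_000, 750, 100):
    low, high = suggest_batch_size_range(size)
    print(f"{size} examples: try a batch size between {low} and {high}")
```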

The following chart illustrates how model performance improves as the size of the training dataset increases, as well as the change of optimal parameters, using the TAT-QA dataset. Each data point is annotated with the optimal learning rate multiplier (LRM), batch size (BS), and number of epochs (Epoch) used to achieve the best performance with the dataset size. We can observe that larger datasets tend to benefit from higher learning rates and batch sizes, whereas smaller datasets require more training epochs. The red dashed line is the baseline Anthropic’s Claude 3 Haiku performance without fine-tuning efforts.

By following these guidelines, you can configure a fine-tuning job for Anthropic’s Claude 3 Haiku with a higher chance of success. However, remember that these are general recommendations and the optimal settings may vary depending on your specific use case and dataset characteristics.

In scenarios with large amounts of data (1,000–10,000 examples), the learning rate tends to have a more significant impact on performance. Conversely, for smaller datasets (32–100 examples), the batch size becomes the dominant factor.

Performance evaluations

The fine-tuned Anthropic’s Claude 3 Haiku model demonstrated substantial performance improvements over base models when evaluated on the financial Q&A task, highlighting the effectiveness of the fine-tuning process on specialized data. Based on the evaluation results, we found the following:

Fine-tuned Anthropic’s Claude 3 Haiku performed better than Anthropic’s Claude 3 Haiku, Anthropic’s Claude 3 Sonnet, and Anthropic’s Claude 3.5 Sonnet on the TAT-QA dataset for the target use case of question answering on financial text and tabular content.
For the performance evaluation metric F1 score (see the appendix for definition), fine-tuned Anthropic’s Claude 3 Haiku achieved a score of 91.2%, which is a 24.6% relative improvement over the Anthropic’s Claude 3 Haiku base model’s score of 73.2%. Fine-tuned Anthropic’s Claude 3 Haiku also achieved a 19.6% improvement over the Anthropic’s Claude 3 Sonnet base model, which obtained an F1 score of 76.3%. Fine-tuned Anthropic’s Claude 3 Haiku even outperformed the Anthropic’s Claude 3.5 Sonnet base model, which scored 83.0%, a 9.9% improvement.

The following table provides a detailed comparison of the performance metrics for the fine-tuned Claude 3 Haiku model against various base models, illustrating the significant improvements achieved through fine-tuning.

.	.	.	.	.	Fine-Tuned Model Performance	Base Model Performance			Improvement: Fine-Tuned Anthropic’s Claude 3 Haiku vs. Base Models
Target Use Case	Task Type	Fine-Tuning Data Size	Test Data Size	Eval Metric	Anthropic’s Claude 3 Haiku	Anthropic’s Claude 3 Haiku (Base Model)	Anthropic’s Claude 3 Sonnet	Anthropic’s Claude 3.5 Sonnet	vs. Anthropic’s Claude 3 Haiku Base	vs. Anthropic’s Claude 3 Sonnet Base	vs. Anthropic’s Claude 3.5 Sonnet Base
TAT-QA	Q&A on financial text and tabular content	10,000	3,572	F1 score	91.2%	73.2%	76.3%	83.0%	24.6%	19.6%	9.9%
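The improvement columns are relative gains over each base model's score, not differences in percentage points. The following sketch shows the arithmetic using the rounded F1 scores from the table. The function name is ours, and figures recomputed from rounded scores can differ from the published ones by about a tenth of a percent.

```python
def relative_improvement(fine_tuned_score, base_score):
    """Relative gain of the fine-tuned score over the base score, in percent."""
    return (fine_tuned_score - base_score) / base_score * 100

# Fine-tuned Claude 3 Haiku (91.2) vs. the Claude 3 Haiku base model (73.2)
print(round(relative_improvement(91.2, 73.2), 1))  # 24.6
# Fine-tuned Claude 3 Haiku (91.2) vs. the Claude 3.5 Sonnet base model (83.0)
print(round(relative_improvement(91.2, 83.0), 1))  # 9.9
```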

Few-shot examples improve performance not only on the base model, but also on fine-tuned models, especially when the fine-tuning data is small.

Fine-tuning also demonstrated significant benefits in reducing token usage. On the TAT-QA HTML test set (893 examples), the fine-tuned Anthropic’s Claude 3 Haiku model reduced the average output token count by 35% compared to the base model, as shown in the following table.

Model	Average Output Token	% Reduced	Median	% Reduced	Standard Deviation	Minimum Token	Maximum Token
Anthropic’s Claude 3 Haiku Base	34	–	28	–	27	13	245
Anthropic’s Claude 3 Haiku Fine-Tuned	22	35%	17	39%	14	13	179

We use the following figures to illustrate the token count distribution for both the base Anthropic’s Claude 3 Haiku and fine-tuned Anthropic’s Claude 3 Haiku models. The left graph shows the distribution for the base model, and the right graph displays the distribution for the fine-tuned model. These histograms demonstrate a shift towards more concise output in the fine-tuned model, with a notable reduction in the frequency of longer token sequences.

To further illustrate this improvement, consider the following example from the test set:

Question: "How did the company adopt Topic 606?"
Ground truth answer: "the modified retrospective method"
Base Anthropic’s Claude 3 Haiku response: "The company adopted the provisions of Topic 606 in fiscal 2019 utilizing the modified retrospective method"
Fine-tuned Anthropic’s Claude 3 Haiku response: "the modified retrospective method"

As evident from this example, the fine-tuned model produces a more concise and precise answer, matching the ground truth exactly, whereas the base model includes additional, unnecessary information. This reduction in token usage, combined with improved accuracy, can lead to enhanced efficiency and reduced costs in production deployments.

Conclusion

Fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock offers significant performance improvements for specialized tasks. Our experiments demonstrate that careful attention to data quality, hyperparameter optimization, and best practices in the fine-tuning process can yield substantial gains over base models. Key takeaways include the following:

The importance of high-quality, task-specific datasets, even if smaller in size
Optimal hyperparameter settings vary based on dataset size and task complexity
Fine-tuned models consistently outperform base models across various metrics
The process is iterative, allowing for continuous improvement as new data or requirements emerge

Although fine-tuning provides impressive results, combining it with other techniques like prompt engineering may lead to even better outcomes. As LLM technology continues to evolve, mastering fine-tuning techniques will be crucial for organizations looking to use these powerful models for specific use cases and tasks.

Now you’re ready to fine-tune Anthropic’s Claude 3 Haiku on Amazon Bedrock for your use case. We look forward to seeing what you build when you put this new technology to work for your business.

Appendix

We used the following hyperparameters as part of our fine-tuning:

Learning rate multiplier – Learning rate multiplier is one of the most critical hyperparameters in LLM fine-tuning. It influences the learning rate at which model parameters are updated after each batch.
Batch size – Batch size is the number of training examples processed in one iteration. It directly impacts GPU memory consumption and training dynamics.
Epoch – One epoch means the model has seen every example in the dataset one time. The number of epochs is a crucial hyperparameter that affects model performance and training efficiency.

For our evaluation, we used the F1 score, which is an evaluation metric to assess the performance of LLMs and traditional ML models.

To compute the F1 score for LLM evaluation, we need to define precision and recall at the token level. Precision measures the proportion of generated tokens that match the reference tokens, and recall measures the proportion of reference tokens that are captured by the generated tokens. The F1 score ranges from 0–100, with 100 being the best possible score and 0 being the lowest. However, interpretation can vary depending on the specific task and requirements.

We calculate these metrics as follows:

Precision = (Number of matching tokens in generated text) / (Total number of tokens in generated text)
Recall = (Number of matching tokens in generated text) / (Total number of tokens in reference text)
F1 = (2 * (Precision * Recall) / (Precision + Recall)) * 100

For example, let’s say the LLM generates the sentence “The cat sits on the mat in the sun” and the reference sentence is “The cat sits on the soft mat under the warm sun.” The precision would be 6/9 (6 matching tokens out of 9 generated tokens), and the recall would be 6/11 (6 matching tokens out of 11 reference tokens).

Precision = 6/9 ≈ 0.667
Recall = 6/11 ≈ 0.545
F1 score = (2 * (0.667 * 0.545) / (0.667 + 0.545)) * 100 ≈ 60.0
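The same calculation can be written as a short function. This sketch takes the token counts as inputs, because how matching tokens are counted (for example, whether a repeated word such as "the" counts once or several times) is a convention the formulas above leave open. With unrounded precision and recall, the example works out to 60.0.

```python
def f1_from_counts(matching, generated_total, reference_total):
    """Token-level F1 on a 0-100 scale, computed from raw token counts."""
    if matching == 0:
        return 0.0  # avoids dividing by zero when nothing matches
    precision = matching / generated_total
    recall = matching / reference_total
    return 2 * precision * recall / (precision + recall) * 100

# Counts from the example: 6 matching, 9 generated, 11 reference tokens.
score = f1_from_counts(6, 9, 11)
print(round(score, 2))  # 60.0
```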

About the Authors

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Sovik Kumar Nath is an AI/ML and Generative AI Senior Solutions Architect with AWS. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. He has double master’s degrees from the University of South Florida and University of Fribourg, Switzerland, and a bachelor’s degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling and adventures.

Jennifer Zhu is a Senior Applied Scientist at AWS Bedrock, where she helps build and scale generative AI applications with foundation models. Jennifer holds a PhD degree from Cornell University and a master’s degree from the University of San Francisco. Outside of work, she enjoys reading books and watching tennis games.

Fang Liu is a principal machine learning engineer at Amazon Web Services, where he has extensive experience in building AI/ML products using cutting-edge technologies. He has worked on notable projects such as Amazon Transcribe and Amazon Bedrock. Fang Liu holds a master’s degree in computer science from Tsinghua University.

Yanjun Qi is a Senior Applied Science Manager at Amazon Bedrock Science. She innovates and applies machine learning to help AWS customers speed up their AI and cloud adoption.

Track, allocate, and manage your generative AI cost and usage with Amazon Bedrock

November 1, 2024

by Kyle Blocksom Amazon AWS

As enterprises increasingly embrace generative AI, they face challenges in managing the associated costs. With demand for generative AI applications surging across projects and multiple lines of business, accurately allocating and tracking spend becomes more complex. Organizations need to prioritize their generative AI spending based on business impact and criticality while maintaining cost transparency across customer and user segments. This visibility is essential for setting accurate pricing for generative AI offerings, implementing chargebacks, and establishing usage-based billing models.

Without a scalable approach to controlling costs, organizations risk unbudgeted usage and cost overruns. Manual spend monitoring and periodic usage limit adjustments are inefficient and prone to human error, leading to potential overspending. Although tagging is supported on a variety of Amazon Bedrock resources—including provisioned models, custom models, agents and agent aliases, model evaluations, prompts, prompt flows, knowledge bases, batch inference jobs, custom model jobs, and model duplication jobs—there was previously no capability for tagging on-demand foundation models. This limitation has added complexity to cost management for generative AI initiatives.

To address these challenges, Amazon Bedrock has launched a capability that organizations can use to tag on-demand models and monitor associated costs. Organizations can now label all Amazon Bedrock models with AWS cost allocation tags, aligning usage to specific organizational taxonomies such as cost centers, business units, and applications. To manage their generative AI spend judiciously, organizations can use services like AWS Budgets to set tag-based budgets and alarms to monitor usage, and receive alerts for anomalies or predefined thresholds. This scalable, programmatic approach eliminates inefficient manual processes, reduces the risk of excess spending, and ensures that critical applications receive priority. Enhanced visibility and control over AI-related expenses enables organizations to maximize their generative AI investments and foster innovation.

Introducing Amazon Bedrock application inference profiles

Amazon Bedrock recently introduced cross-region inference, enabling automatic routing of inference requests across AWS Regions. This feature uses system-defined inference profiles (predefined by Amazon Bedrock), which configure different model Amazon Resource Names (ARNs) from various Regions and unify them under a single model identifier (both model ID and ARN). While this enhances flexibility in model usage, it doesn’t support attaching custom tags for tracking, managing, and controlling costs across workloads and tenants.

To bridge this gap, Amazon Bedrock now introduces application inference profiles, a new capability that allows organizations to apply custom cost allocation tags to track, manage, and control their Amazon Bedrock on-demand model costs and usage. This capability enables organizations to create custom inference profiles for Bedrock base foundation models, adding metadata specific to tenants, thereby streamlining resource allocation and cost monitoring across varied AI applications.

Creating application inference profiles

Application inference profiles allow users to define customized settings for inference requests and resource management. These profiles can be created in two ways:

Single model ARN configuration: Directly create an application inference profile using a single on-demand base model ARN, allowing quick setup with a chosen model.
Copy from system-defined inference profile: Copy an existing system-defined inference profile to create an application inference profile, which will inherit configurations such as cross-Region inference capabilities for enhanced scalability and resilience.

The application inference profile ARN has the following format, where the inference profile ID component is a unique 12-character alphanumeric string generated by Amazon Bedrock upon profile creation.

arn:aws:bedrock:<region>:<account_id>:application-inference-profile/<inference_profile_id>
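When a gateway or proxy needs to check that an identifier really is an application inference profile ARN, the format above can be built and parsed with plain string handling. The following is a minimal sketch, assuming the standard aws partition. The helper names, account ID, and profile ID are hypothetical.

```python
RESOURCE_TYPE = "application-inference-profile"

def build_profile_arn(region, account_id, profile_id):
    """Assemble an application inference profile ARN from its components."""
    return f"arn:aws:bedrock:{region}:{account_id}:{RESOURCE_TYPE}/{profile_id}"

def parse_profile_arn(arn):
    """Split an application inference profile ARN into its components."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[:3] != ["arn", "aws", "bedrock"]:
        raise ValueError(f"Not an Amazon Bedrock ARN: {arn}")
    resource_type, _, profile_id = parts[5].partition("/")
    if resource_type != RESOURCE_TYPE or not profile_id:
        raise ValueError(f"Not an application inference profile ARN: {arn}")
    return {"region": parts[3], "account_id": parts[4], "profile_id": profile_id}

# Made-up account ID and profile ID for illustration.
example_arn = build_profile_arn("us-east-1", "123456789012", "abc123def456")
print(parse_profile_arn(example_arn))
```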

System-defined compared to application inference profiles

The primary distinction between system-defined and application inference profiles lies in their type attribute and resource specifications within the ARN namespace:

System-defined inference profiles: These have a type attribute of SYSTEM_DEFINED and utilize the inference-profile resource type. They’re designed to support cross-Region and multi-model capabilities but are managed centrally by AWS.

{
 …
"inferenceProfileArn": "arn:aws:bedrock:us-east-1:<Account ID>:inference-profile/us-1.anthropic.claude-3-sonnet-20240229-v1:0",
"inferenceProfileId": "us-1.anthropic.claude-3-sonnet-20240229-v1:0",
"inferenceProfileName": "US-1 Anthropic Claude 3 Sonnet",
"status": "ACTIVE",
"type": "SYSTEM_DEFINED",
…
}

Application inference profiles: These profiles have a type attribute of APPLICATION and use the application-inference-profile resource type. They’re user-defined, providing granular control and flexibility over model configurations and allowing organizations to tailor policies with attribute-based access control (ABAC) using AWS Identity and Access Management (IAM). This enables more precise IAM policy authoring to manage Amazon Bedrock access more securely and efficiently.
```
{
…
"inferenceProfileArn": "arn:aws:bedrock:us-east-1:<Account ID>:application-inference-profile/<Auto generated ID>",
"inferenceProfileId": "<Auto generated ID>",
"inferenceProfileName": "<User defined name>",
"status": "ACTIVE",
"type": "APPLICATION",
…
}
```

These differences are important when integrating with Amazon API Gateway or other API clients to help ensure correct model invocation, resource allocation, and workload prioritization. Organizations can apply customized policies based on profile type, enhancing control and security for distributed AI workloads. Both profile types are shown in the following figure.

Establishing application inference profiles for cost management

Imagine an insurance provider embarking on a journey to enhance customer experience through generative AI. The company identifies opportunities to automate claims processing, provide personalized policy recommendations, and improve risk assessment for clients across various regions. However, to realize this vision, the organization must adopt a robust framework for effectively managing their generative AI workloads.

The journey begins with the insurance provider creating application inference profiles that are tailored to their diverse business units. By assigning AWS cost allocation tags, the organization can effectively monitor and track their Bedrock spend patterns. For example, the claims processing team establishes an application inference profile with tags such as dept:claims, team:automation, and app:claims_chatbot. This tagging structure categorizes costs and allows assessment of usage against budgets.

Users can manage and use application inference profiles using Bedrock APIs or the boto3 SDK:

CreateInferenceProfile: Initiates a new inference profile, allowing users to configure the parameters for the profile.
GetInferenceProfile: Retrieves the details of a specific inference profile, including its configuration and current status.
ListInferenceProfiles: Lists all available inference profiles within the user’s account, providing an overview of the profiles that have been created.
TagResource: Allows users to attach tags to specific Bedrock resources, including application inference profiles, for better organization and cost tracking.
ListTagsForResource: Fetches the tags associated with a specific Bedrock resource, helping users understand how their resources are categorized.
UntagResource: Removes specified tags from a resource, allowing for management of resource organization.
Invoke models with application inference profiles:

- Converse API: Invokes the model using a specified inference profile for conversational interactions.
- ConverseStream API: Similar to the Converse API but supports streaming responses for real-time interactions.
- InvokeModel API: Invokes the model with a specified inference profile for general use cases.
- InvokeModelWithResponseStream API: Invokes the model and streams the response, useful for handling large data outputs or long-running processes.

Note that application inference profile APIs cannot be accessed through the AWS Management Console.
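Because these APIs are only reachable programmatically, a small helper for listing the profiles in an account is a useful starting point. The following sketch calls ListInferenceProfiles through a boto3 Bedrock client passed in by the caller and follows pagination. It assumes the typeEquals filter and the inferenceProfileSummaries and nextToken response fields, so check them against your boto3 version. The function name is hypothetical.

```python
def list_application_profiles(bedrock_client):
    """Return summaries of all application inference profiles in the account."""
    profiles = []
    kwargs = {"typeEquals": "APPLICATION"}  # skip system-defined profiles
    while True:
        page = bedrock_client.list_inference_profiles(**kwargs)
        profiles.extend(page.get("inferenceProfileSummaries", []))
        token = page.get("nextToken")
        if not token:
            return profiles
        kwargs["nextToken"] = token  # request the next page

# Usage with real AWS credentials (not executed here):
#   import boto3
#   for profile in list_application_profiles(boto3.client("bedrock")):
#       print(profile["inferenceProfileName"], profile["inferenceProfileArn"])
```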

Invoke model with application inference profile using Converse API

The following example demonstrates how to create an application inference profile and then invoke the Converse API to engage in a conversation using that profile:

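import boto3

# Control-plane client (profile management) and runtime client (model invocation).
# Region and credentials come from your AWS configuration.
bedrock = boto3.client("bedrock")
bedrock_runtime = boto3.client("bedrock-runtime")

def create_inference_profile(profile_name, model_arn, tags):
    """Create Inference Profile using base model ARN"""
    response = bedrock.create_inference_profile(
        inferenceProfileName=profile_name,
        description="test",
        modelSource={'copyFrom': model_arn},
        tags=tags
    )
    print("CreateInferenceProfile Response:", response['ResponseMetadata']['HTTPStatusCode'])
    print(f"{response}\n")
    return response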

# Create Inference Profile
print("Testing CreateInferenceProfile...")
tags = [{'key': 'dept', 'value': 'claims'}]
base_model_arn = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
claims_dept_claude_3_sonnet_profile = create_inference_profile("claims_dept_claude_3_sonnet_profile", base_model_arn, tags)

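# Extract the application inference profile ARN from the response
claims_dept_claude_3_sonnet_profile_arn = claims_dept_claude_3_sonnet_profile['inferenceProfileArn']

def parse_converse_response(response):
    """Return the reply text (minimal stand-in for a helper not shown in the original example)"""
    content = response.get('output', {}).get('message', {}).get('content', [])
    return "".join(block.get('text', '') for block in content)

def converse(model_id, messages):
    """Use the Converse API to engage in a conversation with the specified model"""
    response = bedrock_runtime.converse(
        modelId=model_id,  # an application inference profile ARN is accepted here
        messages=messages,
        inferenceConfig={
            'maxTokens': 300,  # Specify max tokens if needed
        }
    )

    status_code = response.get('ResponseMetadata', {}).get('HTTPStatusCode')
    print("Converse Response:", status_code)
    parsed_response = parse_converse_response(response)
    print(parsed_response)
    return response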

# Example of Converse API with Application Inference Profile
print("\nTesting Converse...")
prompt = "\n\nHuman: Tell me about Amazon Bedrock.\n\nAssistant:"
messages = [{"role": "user", "content": [{"text": prompt}]}]
response = converse(claims_dept_claude_3_sonnet_profile_arn, messages)

Tagging, resource management, and cost management with application inference profiles

Tagging within application inference profiles allows organizations to allocate costs to specific generative AI initiatives, ensuring precise expense tracking. Application inference profiles enable organizations to apply cost allocation tags at creation and support additional tagging through the existing TagResource and UntagResource APIs, which allow metadata association with various AWS resources. Custom tags such as project_id, cost_center, model_version, and environment help categorize resources, improving cost transparency and allowing teams to monitor spend and usage against budgets.
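As a sketch of how tags might be adjusted after a profile exists, the following helper wraps the TagResource, UntagResource, and ListTagsForResource calls on a boto3 Bedrock client supplied by the caller. The helper names are hypothetical, and the tag keys in the usage comment simply reuse the examples above.

```python
def to_bedrock_tags(tag_dict):
    """Convert {'key': 'value'} pairs into the list-of-dicts shape used for Bedrock tags."""
    return [{"key": key, "value": value} for key, value in tag_dict.items()]

def retag_profile(bedrock_client, profile_arn, add=None, remove=None):
    """Add and/or remove tags on a profile, then return its current tags as a dict."""
    if add:
        bedrock_client.tag_resource(resourceARN=profile_arn, tags=to_bedrock_tags(add))
    if remove:
        bedrock_client.untag_resource(resourceARN=profile_arn, tagKeys=list(remove))
    response = bedrock_client.list_tags_for_resource(resourceARN=profile_arn)
    return {tag["key"]: tag["value"] for tag in response.get("tags", [])}

# Usage with real AWS credentials (not executed here):
#   import boto3
#   retag_profile(boto3.client("bedrock"), profile_arn,
#                 add={"cost_center": "1234", "environment": "prod"},
#                 remove=["model_version"])
```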

Visualize cost and usage with application inference profiles and cost allocation tags

Leveraging cost allocation tags with tools like AWS Budgets, AWS Cost Anomaly Detection, AWS Cost Explorer, AWS Cost and Usage Reports (CUR), and Amazon CloudWatch provides organizations insights into spending trends, helping detect and address cost spikes early to stay within budget.

With AWS Budgets, organizations can set tag-based thresholds and receive alerts as spending approaches budget limits, offering a proactive approach to maintaining control over AI resource costs and quickly addressing any unexpected surges. For example, a $10,000 per month budget could be applied on a specific chatbot application for the Support Team in the Sales Department by applying the following tags to the application inference profile: dept:sales, team:support, and app:chat_app. AWS Cost Anomaly Detection can also monitor tagged resources for unusual spending patterns, making it easier to operationalize cost allocation tags by automatically identifying and flagging irregular costs.
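The budget in this example could be created programmatically. The following sketch only builds the request for the AWS Budgets CreateBudget API, filtered on a single cost allocation tag. The budget name, account ID, email address, and 80% alert threshold are assumptions for illustration, and the user:key$value filter syntax should be confirmed against the AWS Budgets documentation. The tag must also be activated as a cost allocation tag before it can be used in a filter.

```python
def build_tag_budget_request(account_id, budget_name, monthly_limit_usd,
                             tag_key, tag_value, alert_email, threshold_pct=80.0):
    """Build keyword arguments for the AWS Budgets create_budget call."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": budget_name,
            "BudgetLimit": {"Amount": str(monthly_limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
            # User-defined cost allocation tags are referenced as user:<key>$<value>.
            "CostFilters": {"TagKeyValue": [f"user:{tag_key}${tag_value}"]},
        },
        "NotificationsWithSubscribers": [{
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": threshold_pct,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": alert_email}],
        }],
    }

request = build_tag_budget_request(
    account_id="123456789012",         # placeholder account ID
    budget_name="sales-support-chat",  # hypothetical budget name
    monthly_limit_usd=10000,
    tag_key="app",
    tag_value="chat_app",
    alert_email="finops@example.com",  # placeholder address
)
# With real AWS credentials: boto3.client("budgets").create_budget(**request)
print(request["Budget"]["CostFilters"])
```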

The following AWS Budgets console screenshot illustrates an exceeded budget threshold:

For deeper analysis, AWS Cost Explorer and CUR enable organizations to analyze tagged resources daily, weekly, and monthly, supporting informed decisions on resource allocation and cost optimization. By visualizing cost and usage based on metadata attributes, such as tag key/value and ARN, organizations gain an actionable, granular view of their spending.
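The same tag-filtered view is available through the Cost Explorer API. As a minimal sketch, the following helper builds the arguments for a GetCostAndUsage call filtered by one cost allocation tag. The tag key and value, the dates, and the choice of the UnblendedCost metric are assumptions for illustration.

```python
from datetime import date

def build_cost_query(tag_key, tag_value, start, end, granularity="DAILY"):
    """Build keyword arguments for the Cost Explorer get_cost_and_usage call."""
    return {
        # Start is inclusive and End is exclusive, both as YYYY-MM-DD strings.
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": granularity,
        "Metrics": ["UnblendedCost"],
        "Filter": {"Tags": {"Key": tag_key, "Values": [tag_value]}},
    }

query = build_cost_query("dept", "claims", date(2024, 10, 1), date(2024, 11, 1))
# With real AWS credentials: boto3.client("ce").get_cost_and_usage(**query)
print(query)
```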

The following AWS Cost Explorer console screenshot illustrates a cost and usage graph filtered by tag key and value:

The following AWS Cost Explorer console screenshot illustrates a cost and usage graph filtered by Bedrock application inference profile ARN:

Organizations can also use Amazon CloudWatch to monitor runtime metrics for Bedrock applications, providing additional insights into performance and cost management. Metrics can be graphed by application inference profile, and teams can set alarms based on thresholds for tagged resources. Notifications and automated responses triggered by these alarms enable real-time management of cost and resource usage, preventing budget overruns and maintaining financial stability for generative AI workloads.

The following Amazon CloudWatch console screenshot highlights Bedrock runtime metrics filtered by Bedrock application inference profile ARN:

The following Amazon CloudWatch console screenshot highlights an invocation limit alarm filtered by Bedrock application inference profile ARN:

Through the combined use of tagging, budgeting, anomaly detection, and detailed cost analysis, organizations can effectively manage their AI investments. By leveraging these AWS tools, teams can maintain a clear view of spending patterns, enabling more informed decision-making and maximizing the value of their generative AI initiatives while ensuring critical applications remain within budget.

Retrieving the application inference profile ARN based on tags for model invocation

Organizations often use a generative AI gateway or large language model proxy when calling Amazon Bedrock APIs, including model inference calls. With the introduction of application inference profiles, organizations need to retrieve the inference profile ARN to invoke model inference for on-demand foundation models. There are two primary approaches to obtain the appropriate inference profile ARN.

Static configuration approach: This method involves maintaining a static configuration file in the AWS Systems Manager Parameter Store or AWS Secrets Manager that maps tenant/workload keys to their corresponding application inference profile ARNs. While this approach offers simplicity in implementation, it has significant limitations. As the number of inference profiles scales from tens to hundreds or even thousands, managing and updating this configuration file becomes increasingly cumbersome. The static nature of this method requires manual updates whenever changes occur, which can lead to inconsistencies and increased maintenance overhead, especially in large-scale deployments where organizations need to dynamically retrieve the correct inference profile based on tags.
Dynamic retrieval using the Resource Groups Tagging API: The second, more robust approach uses the GetResources operation of the AWS Resource Groups Tagging API to dynamically retrieve application inference profile ARNs based on resource and tag filters. This method allows for flexible querying using various tag keys such as tenant ID, project ID, department ID, workload ID, model ID, and Region. The primary advantage of this approach is its scalability and dynamic nature, enabling real-time retrieval of application inference profile ARNs based on current tag configurations.

However, there are considerations to keep in mind. The GetResources API has throttling limits, necessitating the implementation of a caching mechanism. Organizations should maintain a cache with a Time-To-Live (TTL) based on the API’s output to optimize performance and reduce API calls. Additionally, implementing thread safety is crucial to help ensure that organizations always read the most up-to-date inference profile ARNs when the cache is being refreshed based on the TTL.
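The caching and thread-safety considerations above can be sketched as a small class. It wraps a boto3 resourcegroupstaggingapi client supplied by the caller, keeps results for a TTL, and uses a lock so concurrent callers never read a half-refreshed entry. The class name, the 300-second TTL, and the resource type filter string bedrock:application-inference-profile are assumptions for illustration. Pagination of GetResources is omitted for brevity.

```python
import threading
import time

class ProfileArnCache:
    """Thread-safe TTL cache of inference profile ARNs looked up by tags."""

    def __init__(self, tagging_client, ttl_seconds=300, clock=time.monotonic):
        self._client = tagging_client
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}  # sorted tag tuple -> (expiry time, list of ARNs)

    def get_arns(self, tags):
        key = tuple(sorted(tags.items()))
        # Holding the lock during the refresh keeps the logic simple: only one
        # thread calls GetResources while the others wait for the fresh value.
        with self._lock:
            entry = self._entries.get(key)
            now = self._clock()
            if entry is not None and entry[0] > now:
                return entry[1]
            arns = self._fetch(tags)
            self._entries[key] = (now + self._ttl, arns)
            return arns

    def _fetch(self, tags):
        response = self._client.get_resources(
            ResourceTypeFilters=["bedrock:application-inference-profile"],  # assumed filter value
            TagFilters=[{"Key": k, "Values": [v]} for k, v in tags.items()],
        )
        return [m["ResourceARN"] for m in response.get("ResourceTagMappingList", [])]

# Usage with real AWS credentials (not executed here):
#   import boto3
#   cache = ProfileArnCache(boto3.client("resourcegroupstaggingapi"))
#   arns = cache.get_arns({"dept": "claims"})
```

Serializing refreshes behind a single lock trades some concurrency for simplicity. A gateway with high request volume might refresh in the background instead.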

As illustrated in the following diagram, this dynamic approach involves a client making a request to the Resource Groups service with specific resource type and tag filters. The service returns the corresponding application inference profile ARN, which is then cached for a set period. The client can then use this ARN to invoke the Bedrock model through the InvokeModel or Converse API.

By adopting this dynamic retrieval method, organizations can create a more flexible and scalable system for managing application inference profiles, allowing for more straightforward adaptation to changing requirements and growth in the number of profiles.

The architecture in the preceding figure illustrates two methods for dynamically retrieving inference profile ARNs based on tags. Let’s describe both approaches with their pros and cons:

Bedrock client maintaining the cache with TTL: This method involves the client directly querying the AWS Resource Groups service using the GetResources API based on resource type and tag filters. The client stores the retrieved ARNs in a client-maintained cache with a TTL. The client is responsible for refreshing the cache by calling the GetResources API in a thread-safe way.
Lambda-based method: This approach uses AWS Lambda as an intermediary between the calling client and the Resource Groups API. This method employs a Lambda extension with an in-memory cache, potentially reducing the number of API calls to Resource Groups. It also interacts with Parameter Store, which can be used for configuration management or for storing cached data persistently.

Both methods use similar filtering criteria (resource-type-filter and tag-filters) to query the Resource Groups API, allowing for precise retrieval of inference profile ARNs based on attributes such as tenant, model, and Region. The choice between these methods depends on factors such as the expected request volume, desired latency, cost considerations, and the need for additional processing or security measures. The Lambda-based approach offers more flexibility and optimization potential, while the direct API method is simpler to implement and maintain.
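As a rough sketch of the filtering step, the following Python builds the resource type and tag filters and reads ARNs from the response. It assumes a boto3 `resourcegroupstaggingapi` client is passed in, because in boto3 the GetResources operation lives on that client. The resource type string `bedrock:application-inference-profile` and the tag keys are assumptions to verify against your account, and pagination is not handled.

```python
from typing import Any, Dict, List

# Assumed resource type string for application inference profiles.
DEFAULT_RESOURCE_TYPE = "bedrock:application-inference-profile"


def build_get_resources_params(
    tags: Dict[str, str], resource_type: str = DEFAULT_RESOURCE_TYPE
) -> Dict[str, Any]:
    """Translate tag key/value pairs into GetResources request parameters."""
    return {
        "ResourceTypeFilters": [resource_type],
        "TagFilters": [
            {"Key": key, "Values": [value]} for key, value in sorted(tags.items())
        ],
    }


def find_profile_arns(client: Any, tags: Dict[str, str]) -> List[str]:
    """Return the ARNs of resources matching every tag filter.

    `client` is expected to be boto3.client("resourcegroupstaggingapi").
    Only the first page of results is read in this sketch.
    """
    response = client.get_resources(**build_get_resources_params(tags))
    return [m["ResourceARN"] for m in response.get("ResourceTagMappingList", [])]
```

A caller might use `find_profile_arns(client, {"tenant": "tenant-a", "model": "claude"})` and pass the first ARN as the `modelId` argument of the Converse or InvokeModel call. The tag names here are illustrative only.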

Overview of Amazon Bedrock resources tagging capabilities

The tagging capabilities of Amazon Bedrock have evolved significantly, providing a comprehensive framework for resource management across multi-account AWS Control Tower setups. This evolution enables organizations to manage resources across development, staging, and production environments, helping organizations track, manage, and allocate costs for their AI/ML workloads.

At its core, the Amazon Bedrock resource tagging system spans multiple operational components. Organizations can effectively tag their batch inference jobs, agents, custom model jobs, knowledge bases, prompts, and prompt flows. This foundational level of tagging supports granular control over operational resources, enabling precise tracking and management of different workload components. The model management aspect of Amazon Bedrock introduces another layer of tagging capabilities, encompassing both custom and base models, and distinguishes between provisioned and on-demand models, each with its own tagging requirements and capabilities.

With the introduction of application inference profiles, organizations can now manage and track their on-demand Bedrock base foundation models. Because teams can create application inference profiles derived from system-defined inference profiles, they can configure more precise resource tracking and cost allocation at the application level. This capability is particularly valuable for organizations that are running multiple AI applications across different environments, because it provides clear visibility into resource usage and costs at a granular level.

The following diagram visualizes the multi-account structure and demonstrates how these tagging capabilities can be implemented across different AWS accounts.

Conclusion

In this post, we introduced the latest feature from Amazon Bedrock, application inference profiles. We explored how it operates and discussed key considerations. The code sample for this feature is available in this GitHub repository. This new capability enables organizations to tag, allocate, and track on-demand model inference workloads and spending across their operations. Organizations can label all Amazon Bedrock models using tags and monitor usage according to their specific organizational taxonomy, such as tenants, workloads, cost centers, business units, teams, and applications. This feature is now generally available in all AWS Regions where Amazon Bedrock is offered.

About the authors

Kyle T. Blocksom is a Sr. Solutions Architect with AWS based in Southern California. Kyle’s passion is to bring people together and leverage technology to deliver solutions that customers love. Outside of work, he enjoys surfing, eating, wrestling with his dog, and spoiling his niece and nephew.

Dhawal Patel is a Principal Machine Learning Architect at AWS. He has worked with organizations ranging from large enterprises to mid-sized startups on problems related to distributed computing and artificial intelligence. He focuses on deep learning, including the NLP and computer vision domains. He helps customers achieve high-performance model inference on SageMaker.

Advance environmental sustainability in clinical trials using AWS

November 1, 2024

by Sidharth Rampally Amazon AWS

Traditionally, clinical trials not only place a significant burden on patients and participants due to the costs associated with transportation, lodging, meals, and dependent care, but also have an environmental impact. With the advancement of available technologies, decentralized clinical trials have become a widely popular topic of discussion and offer a more sustainable approach. Decentralized clinical trials reduce the need to travel to study sites, which lowers the financial burden on all parties involved. They use technologies such as wearable devices, patient apps, smartphones, and telemedicine to accelerate recruitment, reduce dropout, and minimize the carbon footprint of clinical research. AWS can play a key role in enabling fast implementation of these decentralized clinical trials.

In this post, we discuss how to use AWS to support a decentralized clinical trial across the four main pillars of a decentralized clinical trial (virtual trials, personalized patient engagement, patient-centric trial design, and centralized data management). By exploring these AWS powered alternatives, we aim to demonstrate how organizations can drive progress towards more environmentally friendly clinical research practices.

The challenge and impact of sustainability on clinical trials

With the rise of greenhouse gas emissions globally, finding ways to become more sustainable is quickly becoming a challenge across all industries. At the same time, global health awareness and investments in clinical research have increased, motivated by major events like the COVID-19 pandemic. For instance, awareness of clinical research studies seeking volunteers rose to 63% in 2021, compared to 54% in 2019, as reported by Applied Clinical Trials. This suggests that the COVID-19 pandemic brought increased attention to clinical trials among the public and magnified the importance of including diverse populations in clinical research.

These clinical research trials study new tests and treatments while evaluating their effects on human health outcomes. People often volunteer to take part in clinical trials to test medical interventions, including drugs, biological products, surgical procedures, radiological procedures, devices, behavioral treatments, and preventive care. The rise of clinical trials presents a major sustainability challenge: as currently implemented, they can contribute substantially to greenhouse gas emissions. The main sources of these emissions are usually the intensive energy use of research premises and air travel.

This post discusses an alternative to traditional clinical trials: by decentralizing clinical trials, we can reduce the greenhouse gas emissions caused by the human activities involved in clinical trials today.

The CRASH trial case study

We can further examine the carbon emissions associated with clinical trials through the carbon audit of the CRASH trial led by the medical research journal BMJ. The CRASH trial was a clinical trial conducted from 1999–2004 that recruited patients from 49 countries over the span of 5 years. The study examined the effect of intravenous corticosteroids (a drug produced by Pfizer) on death within 14 days in 10,008 adults with clinically significant head injuries. BMJ conducted an audit of the total greenhouse gas emissions produced by the trial and calculated that roughly 126 metric tons (carbon dioxide equivalent) were emitted during a 1-year period. Extrapolated over the 5-year period, the entire trial would be responsible for about 630 metric tons of carbon dioxide equivalent.

Much of these greenhouse gas emissions can be attributed to travel (such as air travel, hotels, and meetings), the distribution of drugs and documents, and electricity used in coordination centers. According to the EPA, the average passenger vehicle emits about 4.6 metric tons of carbon dioxide per year. In comparison, 630 tons of carbon dioxide would be equivalent to the annual emissions of around 137 passenger vehicles. Similarly, the average US household generates about 20 metric tons of carbon dioxide per year from energy use, so 630 tons of carbon dioxide would also equal the annual emissions of around 31 average US homes. This is already a very substantial amount of greenhouse gas for one clinical trial. According to sources from government databases and research institutions, there are around 300,000–600,000 clinical trials conducted globally each year, potentially amplifying this impact several hundred thousand times over.
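The comparisons above follow from simple arithmetic on the figures quoted in this section. The short Python sketch below reproduces them. The variable names are illustrative, and the 5-year total is a straight extrapolation of the audited 1-year figure, as in the text.

```python
# Figures quoted in this section (metric tons of CO2 equivalent).
AUDITED_TONS_PER_YEAR = 126        # CRASH trial audit, 1-year period
TRIAL_DURATION_YEARS = 5
VEHICLE_TONS_PER_YEAR = 4.6        # average passenger vehicle (EPA)
HOUSEHOLD_TONS_PER_YEAR = 20       # average US household energy use

# Extrapolate the audited year to the full trial duration.
total_trial_tons = AUDITED_TONS_PER_YEAR * TRIAL_DURATION_YEARS

# Express the total as equivalent annual emissions of vehicles and homes.
vehicle_equivalents = total_trial_tons / VEHICLE_TONS_PER_YEAR
household_equivalents = total_trial_tons / HOUSEHOLD_TONS_PER_YEAR

print(f"Total over trial: {total_trial_tons} t CO2e")
print(f"Passenger vehicle-years: {vehicle_equivalents:.0f}")
print(f"US household-years: {household_equivalents:.1f}")
```

The household figure works out to 31.5, which the text rounds down to around 31.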

Clinical trials vs. decentralized clinical trials

Decentralized clinical trials present opportunities to address the sustainability challenges associated with traditional clinical trial models. As a byproduct, decentralized trials also improve the patient experience by reducing the burden on patients, making the process more convenient and sustainable.

Today, clinical trials can contribute significantly to greenhouse gas emissions, primarily through energy use in research facilities and air travel. In contrast to the energy-intensive nature of centralized trial sites, the distributed nature of decentralized clinical trials offers a more practical and cost-effective approach to implementing renewable energy solutions.

For centralized clinical trials, many are conducted in energy-intensive healthcare facilities. Traditional trial sites, such as hospitals and dedicated research centers, can have high energy demands for equipment, lighting, and climate control. These facilities often rely on regional or national power grids for their energy needs. Integrating renewable energy solutions in these facilities can also be costly and challenging, because it can involve significant investments into new equipment, renewable energy projects, and more.

In decentralized clinical trials, the reduction in infrastructure and onsite resources allows for a lower energy demand overall. It also brings benefits such as simplified trial designs, reduced bureaucracy, and less human travel, because in-person meetings can be replaced by video conferencing. Furthermore, the additional in-person appointments required for traditional clinical trials might create additional time and financial burdens for participants. Decentralized clinical trials can reduce the burden of in-person visits on patients and increase patient retention and long-term follow-up.

Core pillars on how AWS can power sustainable decentralized clinical trials

AWS customers have developed proven solutions that power sustainable decentralized clinical trials. SourceFuse is an AWS partner that has developed a mobile app and web interface that enables patients to participate in decentralized clinical trials remotely from their homes, eliminating the environmental impact of travel and paper-based data collection. The platform’s cloud-centered architecture, built on AWS services, supports the scalable and sustainable operation of these remote clinical trials.

In this post, we provide sustainability-oriented guidance focused on four key areas: virtual trials, personalized patient engagement, patient-centric trial design, and centralized data management. The following figure showcases the AWS services that can help in these four areas.

Personalized remote patient engagement

The average dropout rate for clinical trials is 30%, so providing an omnichannel experience for subjects to interact with trial facilitators is imperative. Because decentralized clinical trials provide flexibility for patients to participate at home, the experience for patients to collect and report data should be seamless. One solution is to use voice applications to enable patient data reporting, using Amazon Alexa and Amazon Connect. For example, a patient can report symptoms to their Amazon Echo device, invoking an automated patient outreach scheduler using Amazon Connect.

Trial facilitators can also use Amazon Pinpoint to connect with customers through multiple channels. They can use Amazon Pinpoint to send medication reminders, automate surveys, or push other communications without the need for paper mail delivery.
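As an illustration of the reminder use case, the following Python sketch builds the request body for an SMS medication reminder in the shape accepted by the Amazon Pinpoint SendMessages operation. The helper name, phone number, and message text are hypothetical. Sending it would additionally require a Pinpoint project ID and a boto3 `pinpoint` client, which are not shown being created here.

```python
from typing import Any, Dict


def build_sms_reminder(phone_number: str, body: str) -> Dict[str, Any]:
    """Build a MessageRequest for a single transactional SMS (hypothetical helper)."""
    return {
        "Addresses": {phone_number: {"ChannelType": "SMS"}},
        "MessageConfiguration": {
            "SMSMessage": {
                "Body": body,
                # Reminders are time-sensitive, so they are marked transactional.
                "MessageType": "TRANSACTIONAL",
            }
        },
    }


request = build_sms_reminder(
    "+15555550100",  # placeholder number
    "Reminder: please take your study medication and log it in the app.",
)
# With a configured client, the request could be sent as:
#   boto3.client("pinpoint").send_messages(
#       ApplicationId="<your-project-id>", MessageRequest=request)
```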

Virtual trials

Decentralized clinical trials reduce emissions compared to regular clinical trials by eliminating the need for travel and physical infrastructure. Instead, a core component of decentralized clinical trials is a secure, scalable data infrastructure with strong data analytics capabilities. Amazon Redshift is a fully managed cloud data warehouse that trial scientists can use to perform analytics.

Clinical Research Organizations (CROs) and life sciences organizations can also use AWS for mobile device and wearable data capture. Patients, in the comfort of their own home, can collect data passively through wearables, activity trackers, and other smart devices. This data is streamed to AWS IoT Core, which can write data to Amazon Data Firehose in real time. This data can then be sent to services like Amazon Simple Storage Service (Amazon S3) and AWS Glue for data processing and insight extraction.
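The first hop of this pipeline, a device reading arriving at AWS IoT Core, might look like the sketch below. The payload fields, the topic layout, and the helper names are assumptions made for illustration. The `publish` call matches the boto3 `iot-data` client, though real wearables would more typically publish over MQTT with a device SDK. Forwarding to Amazon Data Firehose would be configured separately with an IoT rule.

```python
import json
from typing import Any, Dict


def build_reading(participant_id: str, heart_rate_bpm: int, recorded_at: str) -> Dict[str, Any]:
    """Assemble one wearable reading (hypothetical schema)."""
    return {
        "participant_id": participant_id,   # pseudonymous study identifier
        "heart_rate_bpm": heart_rate_bpm,
        "recorded_at": recorded_at,         # ISO 8601 timestamp
    }


def publish_reading(iot_data_client: Any, reading: Dict[str, Any]) -> str:
    """Publish a reading to a per-participant topic and return the topic name.

    `iot_data_client` is expected to be boto3.client("iot-data").
    """
    topic = f"trials/{reading['participant_id']}/vitals"  # hypothetical topic layout
    iot_data_client.publish(
        topic=topic,
        qos=1,
        payload=json.dumps(reading).encode("utf-8"),
    )
    return topic
```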

Patient-centric trial design

A key characteristic of decentralized clinical trials is patient-centric protocol design, which prioritizes the patients’ needs throughout the entire clinical trial process. This involves patient-reported outcomes and often implements flexible participation, which can complicate protocol development and necessitate more extensive regulatory documentation. This can add days or even weeks to the lifespan of a trial, leading to avoidable costs. Amazon SageMaker enables trial developers to build and train machine learning (ML) models that reduce the likelihood of protocol amendments and inconsistencies. Models can also be built to determine the appropriate sample size and recruitment timelines.

With SageMaker, you can optimize your ML environment for sustainability. Amazon SageMaker Debugger provides profiler capabilities to detect under-utilization of system resources, which helps right-size your environment and avoid unnecessary carbon emissions. Organizations can further reduce emissions by choosing deployment regions near renewable energy projects. Currently, there are 22 AWS data center regions where 100% of the electricity consumed is matched by renewable energy sources. Additionally, you can use Amazon Q, a generative AI-powered assistant, to surface and generate potential amendments to avoid expensive costs associated with protocol revisions.

Centralized data management

CROs and bio-pharmaceutical companies are striving to achieve end-to-end data lineage for all clinical trials within an organization. They want to see traceability across the board, while achieving data harmonization for regulatory clinical trial guardrails. The pipeline approach to data management in clinical trials has led to siloed, disconnected data across an organization, because separate storage is used for each trial. Decentralized clinical trials, however, often employ a single data lake for all of an organization’s clinical trials.

With a centralized data lake, organizations can avoid the duplication of data across separate trial databases. This leads to savings in storage costs and computing resources, as well as a reduction in the environmental impact of maintaining multiple data silos. To build a data management platform, the process could begin with ingesting and normalizing clinical trial data using AWS HealthLake. HealthLake is designed to ingest data from various sources, such as electronic health records, medical imaging, and laboratory results, and automatically transform the data into the industry-standard FHIR format. This clinical voice application solution built entirely on AWS showcases the advantages of having a centralized location for clinical data, such as avoiding data drift and redundant storage.

With the normalized data now available in HealthLake, the next step would be to orchestrate the various data processing and analysis workflows using AWS Step Functions. You can use Step Functions to coordinate the integration of the HealthLake data into a centralized data lake, as well as invoke subsequent processing and analysis tasks. This could involve using serverless computing with AWS Lambda to perform event-driven data transformation, quality checks, and enrichment activities. By combining the powerful data normalization capabilities of HealthLake and the orchestration features of Step Functions, the platform can provide a robust, scalable, and streamlined approach to managing decentralized clinical trial data within the organization.
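To make the orchestration step concrete, the sketch below generates a minimal Amazon States Language definition that chains Lambda tasks for transformation, quality checks, and enrichment. The state names and Lambda ARNs are placeholders, and error handling and retries are left out.

```python
import json
from typing import Any, Dict, List, Tuple


def build_definition(steps: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Chain (state_name, lambda_arn) pairs into a sequential state machine."""
    if not steps:
        raise ValueError("at least one step is required")
    states: Dict[str, Any] = {}
    for index, (name, lambda_arn) in enumerate(steps):
        state: Dict[str, Any] = {"Type": "Task", "Resource": lambda_arn}
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]   # hand off to the next task
        else:
            state["End"] = True                   # last task ends the execution
        states[name] = state
    return {
        "Comment": "Sketch of a clinical data processing workflow",
        "StartAt": steps[0][0],
        "States": states,
    }


# Placeholder ARNs; replace with the functions deployed in your account.
definition = build_definition([
    ("TransformData", "arn:aws:lambda:us-east-1:111122223333:function:transform"),
    ("RunQualityChecks", "arn:aws:lambda:us-east-1:111122223333:function:quality"),
    ("EnrichData", "arn:aws:lambda:us-east-1:111122223333:function:enrich"),
])
print(json.dumps(definition, indent=2))
```

The resulting JSON can be supplied as the definition when creating a state machine. A production workflow would likely add Retry and Catch blocks to each task.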

Conclusion

In this post, we discussed the critical importance of sustainability in clinical trials. We provided an overview of the key distinctions between traditional centralized clinical trials and decentralized clinical trials. Importantly, we explored how AWS technologies can enable the development of more sustainable clinical trials, addressing the four main pillars that underpin a successful decentralized trial approach.

To learn more about how AWS can power sustainable clinical trials for your organization, reach out to your AWS Account representatives. For more information about optimizing your workloads for sustainability, see Optimizing Deep Learning Workloads for Sustainability on AWS.

References

[1] https://www.appliedclinicaltrialsonline.com/view/awareness-of-clinical-research-increases-among-underrepresented-groups

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839193/

[3] https://pubmed.ncbi.nlm.nih.gov/15474134/

[4] ClinicalTrials.gov and https://www.iqvia.com/insights/the-iqvia-institute/reports/the-global-use-of-medicines-2022

[5] https://aws.amazon.com/startups/learn/next-generation-data-management-for-clinical-trials-research-built-on-aws?lang=en-US#overview

[6] https://pubmed.ncbi.nlm.nih.gov/39148198/

About the Authors

Sid Rampally is a Customer Solutions Manager at AWS driving GenAI acceleration for Life Sciences customers. He writes about topics relevant to his customers, focusing on data engineering and machine learning. In his spare time, Sid enjoys walking his dog in Central Park and playing hockey.

Nina Chen is a Customer Solutions Manager at AWS specializing in leading software companies to leverage the power of the AWS cloud to accelerate their product innovation and growth. With over 4 years of experience working in the strategic Independent Software Vendor (ISV) vertical, Nina enjoys guiding ISV partners through their cloud transformation journeys, helping them optimize their cloud infrastructure, drive product innovation, and deliver exceptional customer experiences.

Use Amazon Q to find answers on Google Drive in an enterprise

November 1, 2024

by Glen Ireland Amazon AWS

Amazon Q Business is a generative AI-powered assistant designed to enhance enterprise operations. It’s a fully managed service that helps provide accurate answers to users’ questions while adhering to the security and access restrictions of the content. You can tailor Amazon Q Business to your specific business needs by connecting to your company’s information and enterprise systems using built-in connectors to a variety of enterprise data sources. It enables users in various roles, such as marketing managers, project managers, and sales representatives, to have tailored conversations, solve business problems, generate content, take action, and more, through a web interface. This service aims to help make employees work smarter, move faster, and drive significant impact by providing immediate and relevant information to help them with their tasks.

One such enterprise data repository you can use to store and manage content is Google Drive. Google Drive is a cloud-based storage service that provides a centralized location for storing digital assets, including documents, knowledge articles, and spreadsheets. This service helps your teams collaborate effectively by enabling the sharing and organization of important files across the enterprise. To use Google Drive within Amazon Q Business, you can configure the Amazon Q Business Google Drive connector. This connector allows Amazon Q Business to securely index files stored in Google Drive using access control lists (ACLs). These ACLs make sure that users only access the documents they’re permitted to view, allowing them to ask questions and retrieve information relevant to their work directly through Amazon Q Business.

This post covers the steps to configure the Amazon Q Business Google Drive connector, including authentication setup and verifying the secure indexing of your Google Drive content.

Index Google Drive documents using the Amazon Q Google Drive connector

The Amazon Q Google Drive connector can index Google Drive documents hosted in a Google Workspace account. The connector can’t index documents stored on Google Drive in a personal Google Gmail account. Amazon Q Business can authenticate with your Google Workspace using a service account or OAuth 2.0 authentication. A service account enables indexing files for user accounts across an enterprise in a Google Workspace. Using OAuth 2.0 authentication allows for crawling and indexing files in a single Google Workspace account. This post shows you how to configure Amazon Q Business to authenticate using a Google service account.

Google prescribes that in order to index multiple users’ documents, the crawler must support the capability to authenticate with a service account with domain-wide delegation. This allows the connector to index the documents of all users in your drive and shared drives. Amazon Q Business connectors only crawl the documents that the Amazon Q Business application administrator specifies need to be crawled. Administrators can specify the paths to crawl, specific file name patterns, or types. Amazon Q Business doesn’t use customer data to train any models. All customer data is indexed only in the customer account. Also, Amazon Q Business Connectors will only index content specified by the administrator. It won’t index any content on its own without explicitly being configured to do so by the administrator of Amazon Q Business.

You can configure the Amazon Q Google Drive connector to crawl and index file types supported by Amazon Q Business. Google Docs documents are exported as Microsoft Word and Google Sheets documents are exported as Microsoft Excel during the crawling phase.

Metadata

Every document has structural attributes—or metadata—attached to it. Document attributes can include information such as document title, document author, time created, time updated, and document type.

When you connect Amazon Q Business to a data source, it automatically maps specific data source document attributes to fields within an Amazon Q Business index. If a document attribute in your data source doesn’t have an attribute mapping already available, or if you want to map additional document attributes to index fields, you can use the custom field mappings to specify how a data source attribute maps to an Amazon Q Business index field. You can create field mappings by editing your data source after your application and retriever are created.

There are four default metadata attributes indexed for each Google Drive document: authors, source URL, creation date, and last update date. You can also select additional reserved data field mappings.

Amazon Q Business crawls Google Drive ACLs defined in a Google Workspace for document security. Google Workspace users and groups are mapped to the _user_id and _group_ids fields associated with the Amazon Q Business application in AWS IAM Identity Center. These user and group associations are persisted in the user store associated with the Amazon Q Business index created for crawled Google Drive documents.

Overview of ACLs in Amazon Q Business

In the context of knowledge management and generative AI chatbot applications, an ACL plays a crucial role in managing who can access information and what actions they can perform within the system. They also facilitate knowledge sharing within specific groups or teams while restricting access to others.

In this solution, we deploy an Amazon Q web experience to demonstrate that two business users can only ask questions about documents they have access to according to the ACL. With the Amazon Q Business Google Drive connector, the Google Workspace ACL will be ingested with documents. This enables Amazon Q Business to control the scope of documents that each user can access in the Amazon Q web experience.
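Conceptually, ACL-based scoping means a user only sees documents whose ACL lists that user or one of the user's groups. The Python sketch below models that idea in memory. It is an illustration of the concept only, not how Amazon Q Business implements it, and the class, field names, and sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class IndexedDocument:
    """A document plus the identities allowed to read it (hypothetical model)."""
    title: str
    allowed_user_ids: FrozenSet[str] = frozenset()
    allowed_group_ids: FrozenSet[str] = frozenset()


def visible_documents(
    documents: Iterable[IndexedDocument], user_id: str, group_ids: Iterable[str]
) -> List[IndexedDocument]:
    """Return the documents the user may see, directly or through a group."""
    groups = frozenset(group_ids)
    return [
        doc for doc in documents
        if user_id in doc.allowed_user_ids or groups & doc.allowed_group_ids
    ]


documents = [
    IndexedDocument("Report A", allowed_user_ids=frozenset({"user1@example.com"})),
    IndexedDocument(
        "Report B",
        allowed_user_ids=frozenset({"user2@example.com"}),
        allowed_group_ids=frozenset({"group1"}),   # shared with user1's group
    ),
    IndexedDocument("Report C", allowed_user_ids=frozenset({"user2@example.com"})),
]
titles = [d.title for d in visible_documents(documents, "user1@example.com", ["group1"])]
```

Here the first user sees Report A through direct ownership and Report B through group membership, but not Report C.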

Authentication types

An Amazon Q Business application requires you to use IAM Identity Center to manage user access. Although it’s recommended to have an IAM Identity Center instance configured (with users federated and groups added) before you start, you can also choose to create and configure an IAM Identity Center instance for your Amazon Q Business application using the Amazon Q console.

You can also add users to your IAM Identity Center instance from the Amazon Q Business console, if you aren’t federating identity. When you add a new user, make sure that the user is enabled in your IAM Identity Center instance and that they have verified their email ID. They need to complete these steps before they can log in to your Amazon Q Business web experience.

Your identity source in IAM Identity Center defines where your users and groups are managed. After you configure your identity source, you can look up users or groups to grant them single sign-on access to AWS accounts, applications, or both.

You can have only one identity source per organization in AWS Organizations. You can choose one of the following as your identity source:

IAM Identity Center directory – When you enable IAM Identity Center for the first time, it’s automatically configured with an IAM Identity Center directory as your default identity source. This is where you create your users and groups, and assign their level of access to your AWS accounts and applications. For more details, see Manage identities in IAM Identity Center.
Active Directory – Choose this option if you want to continue managing users in either your AWS Managed Microsoft AD directory using AWS Directory Service or your self-managed directory in Active Directory (AD).
External identity provider – Choose this option if you want to manage users in other external identity providers (IdPs) through the SAML 2.0 standard, such as Okta.
IAM identity provider – Amazon Q Business applications can now federate with an enterprise’s IAM IdP. For more information, refer to Build private and secure enterprise generative AI applications with Amazon Q Business using IAM Federation.

Overview of solution

With Amazon Q Business, you can configure multiple data sources to provide a central place to search across your document repository. For our solution, we demonstrate how to index Google Drive data using the Amazon Q Business Google Drive connector. We complete the following steps:

Configure Google Workspace prerequisites.
Configure an Amazon Q Business application.
Connect Google Drive to Amazon Q Business.
Create users and index the data in the Google Drive.
Run a sample query to test the solution.

Configure Google Workspace prerequisites

For this solution, Amazon Q will connect to a Google Workspace and crawl Google Drive documents owned by business users in different groups using a service account. Complete the following steps to configure your Google Workspace:

Log in to the Google API console as an admin user.
Choose the dropdown menu next to the search box, then choose New Project.
Enter the project name, choose the Google organization, and choose Create.

The Google Drive and Admin SDK APIs need to be enabled for Amazon Q to crawl Google Drive files.

Search for each API on the Google Cloud console and choose Enable.
Search for Service Accounts to access the IAM & Admin navigation pane and choose Create Service Account.
Enter the service account name, service account ID, and description, and choose Done.
Choose the email of the service account created in the previous step.
On the Keys tab, choose Add Key, then choose Create New Key.
For Key type, select JSON, and choose Create to download and locally save a new private key.

Now we enable domain-wide delegation for the five required API scopes on the Domain-wide Delegation page.

Choose Add new.
Add the following comma-delimited API scopes for the client ID generated for the private key created in the previous step:
https://www.googleapis.com/auth/drive.readonly,
https://www.googleapis.com/auth/drive.metadata.readonly,
https://www.googleapis.com/auth/admin.directory.group.readonly,
https://www.googleapis.com/auth/admin.directory.user.readonly,
https://www.googleapis.com/auth/cloud-platform
Choose Authorize.
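The following Python sketch shows one way to prepare the values for the preceding steps: it joins the five scopes into the comma-delimited string expected by the Domain-wide Delegation page and checks that a downloaded service account key contains the fields used later. The helper names are hypothetical, and the sample key contents are placeholders, not real credentials.

```python
from typing import Any, Dict, List

REQUIRED_SCOPES: List[str] = [
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/drive.metadata.readonly",
    "https://www.googleapis.com/auth/admin.directory.group.readonly",
    "https://www.googleapis.com/auth/admin.directory.user.readonly",
    "https://www.googleapis.com/auth/cloud-platform",
]

# Fields of a Google service account JSON key that this setup relies on.
REQUIRED_KEY_FIELDS = ("type", "client_email", "client_id", "private_key")


def scopes_for_delegation_form(scopes: List[str]) -> str:
    """Join scopes into a single comma-delimited string."""
    return ",".join(scopes)


def missing_key_fields(key: Dict[str, Any]) -> List[str]:
    """Return the names of required fields that are absent or empty in the key."""
    return [name for name in REQUIRED_KEY_FIELDS if not key.get(name)]


# In practice the key would be loaded with json.load() from the downloaded file.
sample_key = {
    "type": "service_account",
    "client_email": "crawler@example-project.iam.gserviceaccount.com",
    "client_id": "000000000000000000000",
    "private_key": "<redacted>",
}
scope_string = scopes_for_delegation_form(REQUIRED_SCOPES)
```

The `client_id` value in the key file is the client ID to enter on the Domain-wide Delegation page.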

Now we create users and add them to groups.

Navigate to the Google Workspace Admin console and choose Users in the navigation pane.
Choose Add new user to create two new business users.
Choose Groups in the navigation pane.
Choose Create group to create two Google groups and add one business user to each group.
Upload files that Amazon Q supports into each business user’s Google Drive.

In this solution, we upload the Amazon 2020 annual report to the first business user’s Google Drive and upload the Amazon 2021 annual report and Amazon 2022 annual report to the second business user’s Google Drive.

The business user that uploaded the Amazon 2021 annual report can also share it with the other business user’s Google group.

Choose the options menu (three vertical dots) for the Google Drive file and choose Share.
Enter the name of the other Google group and choose Send.

Create an Amazon Q Business application with a Google Drive connector

An Amazon Q Business application needs to be created with a Google Drive connector to crawl and index Google Drive files. To create an Amazon Q application, complete the following steps:

On the Amazon Q console, choose Applications in the navigation pane.
Choose Create application.
For Application name, enter a name.
Leave application configuration settings as defaults.
Choose Create.
After the application is created, choose Data Sources.
Then choose Select retriever and Confirm to use a Native retriever and Enterprise provisioning.
After confirming retriever settings, choose Add data source, and then choose the plus sign next to Google Drive.
Under Name and description, enter a data source name and optional description.
Under Authentication, select Google service account and choose Create a new secret from the AWS Secrets Manager secret drop down to create an AWS Secrets Manager secret.
Enter a secret name, admin account email, client email, and the JSON key you downloaded earlier, then choose Save.
Under IAM role, choose Create a new service role.
Under Additional Configuration, choose User email, and add the two recently created Google Workspace business user email addresses.
Under Sync run schedule, for Frequency, choose Run on demand.
Choose Add data source.

Create and manage users

To create an Amazon Q web experience accessible by Google Workspace users, you need to create corresponding users in IAM Identity Center. Amazon Q applications are only accessible by IAM Identity Center users with user identities that own indexed documents. To create the IAM Identity Center users, complete the following steps:

On the IAM Identity Center console, choose Users in the navigation pane.
Choose Add user.
Create IAM Identity Center users that mirror your Google Workspace users by entering the required user information.
Accept the IAM Identity Center invitation sent through email to each new business user and set each business user’s IAM Identity Center password.
On the Amazon Q Business console, navigate to the application with the Google Drive data source.
Choose Manage user access.
Choose Add groups and users, select Assign existing users and groups, and choose Next.
Assign users to the Amazon Q application, choose Assign, and choose Confirm if each business user is subscribed to Q Business Pro.

After you add IAM Identity Center users to your Amazon Q application, its web experience URL will appear in the Q Business applications list. You can use the URL to connect to the Amazon Q web experience with either of your Google business users. By default, each user can only ask questions about documents in their Google Drive.

Run sample queries in Amazon Q

To test the Amazon Q application with the Amazon annual reports you uploaded to Google Drive, complete the following steps:

On the Amazon Q Business console, navigate to the data source you created.
Run an on-demand sync of the data source by choosing Sync now.
Navigate to the web experience URL in a new private browser window and log in as the first business user.
Ask Amazon Q a question, such as how many employees work at Amazon.

The source documents should be the Amazon 2020 and 2021 annual reports, assuming the first business user uploaded the Amazon 2020 annual report and the second business user shared the Amazon 2021 annual report with the first business user.

Navigate to the web experience URL in a new private browser window and log in as the second business user.
Ask Amazon Q the same question (how many employees work at Amazon).

The source documents should be the Amazon 2021 and 2022 annual reports.
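The expected sources for each user follow from file ownership plus group sharing. The following is a toy model of that reasoning, not a description of how Amazon Q Business evaluates ACLs internally; the email addresses and group names are hypothetical placeholders for the two business users and Google groups created earlier.

```python
# Toy model of document-level access; all names are illustrative placeholders.
GROUPS = {
    "group-a": {"user1@example.com"},
    "group-b": {"user2@example.com"},
}

# Each document records its owner and the groups it has been shared with.
DOCUMENTS = {
    "Amazon 2020 annual report": {"owner": "user1@example.com", "shared_groups": set()},
    "Amazon 2021 annual report": {"owner": "user2@example.com", "shared_groups": {"group-a"}},
    "Amazon 2022 annual report": {"owner": "user2@example.com", "shared_groups": set()},
}


def visible_documents(user):
    """Return sorted titles the user owns or that are shared with one of their groups."""
    user_groups = {name for name, members in GROUPS.items() if user in members}
    return sorted(
        title
        for title, acl in DOCUMENTS.items()
        if acl["owner"] == user or acl["shared_groups"] & user_groups
    )


print(visible_documents("user1@example.com"))
print(visible_documents("user2@example.com"))
```

Under these assumptions, the first user sees the 2020 and 2021 reports and the second user sees the 2021 and 2022 reports, which matches the source documents expected in the steps above.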

Troubleshooting

In this section, we share some common issues and troubleshooting tips.

IAM Identity Center login error

You might receive an error on the IAM Identity Center login page that says “We couldn’t verify your sign-in credentials.”

To troubleshoot, complete the following steps:

Confirm that the business users that mirror the Google Workspace users were created in IAM Identity Center.
If the users exist, navigate to the user in IAM Identity Center and choose Reset password, then select Generate a one-time password and share the password with the user.

A password will be provided for login and the user will be asked to change their password after a successful login.

Google Drive data source crawling or indexing failure

If the Google Drive data source crawling or indexing fails, complete the following steps:

Confirm the business users provisioned in the Google Workspace are members of the Google groups.
Inspect the Amazon CloudWatch logs for the last time the Google Drive data source was crawled for users with Google Drive files in the Google Workspace.
If the crawler didn’t successfully log the indexing of an expected user’s files, check the IAM Identity Center users, then compare the attributes in the Secrets Manager secret to the corresponding Google Workspace attributes, including client ID, service account email, and service account private key.
Use the Amazon Q Business document-level sync reports to confirm the intended Google Drive documents were indexed by Amazon Q.

Google Drive data source crawling and indexing job doesn’t crawl and index documents

If the Google Drive data source crawling and indexing job doesn’t crawl and index any documents, complete the following steps:

Confirm the business users provisioned in the Google Workspace are members of the Google groups.
Confirm there are IAM Identity Center users that mirror the Google Workspace users.
Confirm both IAM Identity Center users subscribe to Q Business Pro.
Confirm the Google Workspace admin user has enabled the Google Drive API.

Amazon Q web experience doesn’t return expected answers from the expected source

If the Amazon Q web experience doesn’t return expected answers from the expected source, complete the following steps:

Upload the expected source document into an Amazon Q Business chat session by choosing the paperclip icon in the Amazon Q chat interface and then choosing the file.

After you upload the document into the session, if the expected answers are generated from the expected document, the document wasn’t successfully indexed from the Google Drive data source.

If Amazon Q doesn’t return the expected answer for the uploaded document, modify the prompt used to ask the question.

Clean up

To avoid incurring additional costs, clean up the resources created during the implementation of this solution. Deleting the Amazon Q application also removes the associated index and data connectors. However, any Secrets Manager secrets created during the Amazon Q application setup process must be removed separately.

Complete the following steps to delete the Amazon Q application, secret, and IAM Identity Center users in your AWS account:

On the Amazon Q Business console, choose Applications in the navigation pane.
Select the application that you created and on the Actions menu, choose Delete and confirm the deletion.
On the Secrets Manager console, choose Secrets in the navigation pane.
Select the secret that was created for the Google Drive connector and on the Actions menu, choose Delete.
Specify the waiting period as 7 days and choose Schedule deletion.
On the IAM Identity Center console, choose Users in the navigation pane.
Select the two users that you created and choose Delete users to remove these users.

Additionally, you should remove the business users added to your Google Workspace during the implementation of this solution because Google Workspace costs are billed on a per-user basis.

Conclusion

In this post, you created an Amazon Q application that indexed Google Drive documents using the Google Drive connector. You were able to connect to the Amazon Q conversational interface as each of your business users and ask questions about the documents each user could access in accordance with the ACL.

You can continue to experiment by adding more PDF documents to your business users’ Google Drives and re-syncing your Amazon Q Google Drive data source.

Amazon Q Business offers other connectors, such as for Confluence Cloud. To learn more about the Amazon Q Business Confluence Cloud connector, refer to Connecting Confluence (Cloud) to Amazon Q Business.

About the Authors

Glen Ireland is a Senior Enterprise Account Engineer at AWS in the Worldwide Public Sector. Glen’s areas of focus include empowering customers interested in building generative AI solutions using Amazon Q.

Julia Hu is a Specialist Solutions Architect who helps AWS customers and partners build generative AI solutions using Amazon Q Business on AWS. Julia has over 4 years of experience developing solutions for customers adopting AWS services on the forefront of cloud technology.

How Druva used Amazon Bedrock to address foundation model complexity when building Dru, Druva’s backup AI copilot

November 1, 2024

by David Gildea Amazon AWS

This post is co-written with David Gildea and Tom Nijs from Druva.

Druva enables cyber, data, and operational resilience for thousands of enterprises, and is trusted by 60 of the Fortune 500. Customers use Druva Data Resiliency Cloud to simplify data protection, streamline data governance, and gain data visibility and insights. Independent software vendors (ISVs) like Druva are integrating AI assistants into their user applications to make software more accessible.

Dru, the Druva backup AI copilot, enables real-time interaction and personalized responses, with users engaging in a natural conversation with the software. From finding inconsistencies and errors across the environment to scheduling backup jobs and setting retention policies, users need only ask and Dru responds. Dru can also recommend actions to improve the environment, remedy backup failures, and identify opportunities to enhance security.

In this post, we show how Druva approached natural language querying (NLQ)—asking questions in English and getting tabular data as answers—using Amazon Bedrock, the challenges they faced, sample prompts, and key learnings.

Use case overview

The following screenshot illustrates the Dru conversation interface.

In a single conversation interface, Dru provides the following:

Interactive reporting with real-time insights – Users can request data or customized reports without extensive searching or navigating through multiple screens. Dru also suggests follow-up questions to enhance user experience.
Intelligent responses and a direct conduit to Druva’s documentation – Users can gain in-depth knowledge about product features and functionalities without manual searches or watching training videos. Dru also suggests resources for further learning.
Assisted troubleshooting – Users can request summaries of top failure reasons and receive suggested corrective measures. Dru on the backend decodes log data, deciphers error codes, and invokes API calls to troubleshoot.
Simplified admin operations, with increased seamlessness and accessibility – Users can perform tasks like creating a new backup policy or triggering a backup, managed by Druva’s existing role-based access control (RBAC) mechanism.
Customized website navigation through conversational commands – Users can instruct Dru to navigate to specific website locations, eliminating the need for manual menu exploration. Dru also suggests follow-up actions to speed up task completion.

Challenges and key learnings

In this section, we discuss the challenges and key learnings of Druva’s journey.

Overall orchestration

Originally, we adopted an AI agent approach and relied on the foundation model (FM) to make plans and invoke tools using the reasoning and acting (ReAct) method to answer user questions. However, we found the objective too broad and complicated for the AI agent. The AI agent would take more than 60 seconds to plan and respond to a user question. Sometimes it would even get stuck in a thought-loop, and the overall success rate wasn’t satisfactory.

We decided to move to the prompt chaining approach using a directed acyclic graph (DAG). This approach allowed us to break the problem down into multiple steps:

Identify the API route.
Generate and invoke private API calls.
Generate and run data transformation Python code.

Each step became an independent stream, so our engineers could iteratively develop and evaluate the performance and speed until they worked well in isolation. The workflow also became more controllable by defining proper error paths.
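To make the idea of chained steps with explicit error paths concrete, the following is a minimal sketch of a linear chain (the simplest form of a DAG). It is not Druva's implementation: the step functions, the route name, and the sample rows are hypothetical stand-ins for the three streams.

```python
from typing import Any, Callable

Step = Callable[[dict[str, Any]], dict[str, Any]]


def run_chain(steps: list[tuple[str, Step]], state: dict[str, Any]) -> dict[str, Any]:
    """Run steps in order; on the first failure, record which step failed and stop."""
    for name, step in steps:
        try:
            state = step(state)
        except Exception as exc:
            # Explicit error path: the caller can see where the chain broke.
            return {**state, "failed_step": name, "error": str(exc)}
    return state


# Hypothetical stand-ins for the three streams.
def identify_route(state):
    return {**state, "route": "/reports/backup-failures"}


def call_api(state):
    rows = [{"server": "srv-a"}, {"server": "srv-a"}, {"server": "srv-b"}]
    return {**state, "rows": rows}


def transform(state):
    counts = {}
    for row in state["rows"]:
        counts[row["server"]] = counts.get(row["server"], 0) + 1
    return {**state, "answer": counts}


STEPS = [("identify_route", identify_route), ("call_api", call_api), ("transform", transform)]

print(run_chain(STEPS, {"question": "Show me my backup failures grouped by server"}))
```

Because each step only reads and returns a state dictionary, a step can be developed and evaluated on its own before it is wired into the chain.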

Stream 1: Identify the API route

Out of the hundreds of APIs that power Druva products, we needed to match the exact API the application needs to call to answer the user question. For example, “Show me my backup failures for the past 72 hours, grouped by server.” Having similar names and synonyms in API routes makes this retrieval problem more complex.

Originally, we formulated this task as a retrieval problem. We tried different methods, including k-nearest neighbor (k-NN) search of vector embeddings, BM25 with synonyms, and a hybrid of both across fields including API routes, descriptions, and hypothetical questions. We found that the simplest and most accurate way was to formulate it as a classification task to the FM. We curated a small list of examples in question-API route pairs, which helped improve the accuracy and make the output format more consistent.
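As an illustration of framing route selection as classification with a few examples, the following sketch builds a prompt from question-API route pairs and accepts the model's answer only if it is a known route. The routes, descriptions, and examples are hypothetical; the post does not publish Druva's actual list or prompt wording.

```python
# Hypothetical routes and few-shot examples; the real lists are not published.
ROUTES = {
    "/reports/backup-failures": "List failed backup jobs",
    "/policies/retention": "List retention policies",
}

EXAMPLES = [
    ("Show me failed backups from yesterday", "/reports/backup-failures"),
    ("Which retention policies do I have?", "/policies/retention"),
]


def build_classification_prompt(question, routes=ROUTES, examples=EXAMPLES):
    """Build a classification prompt listing the routes and the few-shot pairs."""
    route_lines = "\n".join(f"- {route}: {desc}" for route, desc in routes.items())
    example_lines = "\n\n".join(f"Question: {q}\nAPI route: {r}" for q, r in examples)
    return (
        f"<api_routes>\n{route_lines}\n</api_routes>\n\n"
        f"{example_lines}\n\n"
        f"Question: {question}\nAPI route:"
    )


def parse_route(model_output, routes=ROUTES):
    """Return the route only if the model answered with exactly one known route."""
    candidate = model_output.strip()
    return candidate if candidate in routes else None


print(build_classification_prompt("Show me my backup failures for the past 72 hours"))
```

Checking the answer against the known routes gives the workflow a clear error path when the model replies with something that is not a valid class.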

Stream 2: Generate and invoke private API calls

Next, we generate the API call with the correct parameters and invoke it. FM hallucination of parameters, particularly those with free-form JSON objects, is one of the major challenges in the whole workflow. For example, the unsupported key server can appear in the generated parameter:

"filter": {
    "and": [
        {
            "gte": {
                "key": "dt",
                "value": 1704067200
            }
        },
        {
            "eq": {
                "key": "server",
                "value": "xyz"
            }
        }
    ]
}

We tried different prompting techniques, such as few-shot prompting and chain of thought (CoT), but the success rate was still unsatisfactory. To make API call generation and invocation more robust, we separated this task into two steps:

First, we used an FM to generate parameters in a JSON dictionary instead of full API request headers and body.
Afterwards, we wrote a postprocessing function to remove parameters that didn’t conform to the API schema.

This method provided a successful API invocation, at the expense of getting more data than required for downstream processing.
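The following sketch shows what such a postprocessing function could look like for the filter example above. It assumes the flat "and" list of single-operator clauses shown in that example, and the set of allowed keys is hypothetical.

```python
FILTER = {
    "and": [
        {"gte": {"key": "dt", "value": 1704067200}},
        {"eq": {"key": "server", "value": "xyz"}},
    ]
}

# Hypothetical set of keys the API schema supports for filtering.
ALLOWED_KEYS = {"dt", "status"}


def prune_filter(filter_obj, allowed_keys):
    """Drop comparison clauses whose "key" is not supported by the API schema."""
    kept = []
    for clause in filter_obj.get("and", []):
        # Each clause looks like {"<operator>": {"key": ..., "value": ...}}.
        if all(body.get("key") in allowed_keys for body in clause.values()):
            kept.append(clause)
    return {"and": kept}


print(prune_filter(FILTER, ALLOWED_KEYS))
```

Dropping the unsupported server clause leaves only the date filter, so the API call succeeds but returns rows for every server. This is the trade-off mentioned above: the extra data has to be narrowed down in the downstream transformation step.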

Stream 3: Generate and run data transformation Python code

Next, we took the response from the API call and transformed it to answer the user question. For example, “Create a pandas dataframe and group it by server column.” Similar to stream 2, FM hallucination is again an obstacle. Generated code can contain errors, such as confusing PySpark functions with pandas functions.

After trying many different prompting techniques without success, we looked at the reflection pattern, asking the FM to self-correct code in a loop. This improved the success rate at the expense of more FM invocations, which were slower and more expensive. We found that although smaller models are faster and more cost-effective, at times they had inconsistent results. Anthropic’s Claude 2.1 on Amazon Bedrock gave more accurate results on the second try.

Model choices

Druva selected Amazon Bedrock for several compelling reasons, with security and latency being the most important. A key factor in this decision was the seamless integration with Druva’s services. Using Amazon Bedrock aligned naturally with Druva’s existing environment on AWS, maintaining a secure and efficient extension of their capabilities.

Additionally, one of our primary challenges in developing Dru involved selecting the optimal FMs for specific tasks. Amazon Bedrock effectively addresses this challenge with its extensive array of available FMs, each offering unique capabilities. This variety enabled Druva to conduct rapid and comprehensive testing of various FMs and their parameters, facilitating the selection of the most suitable one. The process was streamlined because Druva didn’t need to delve into the complexities of running or managing these diverse FMs, thanks to the robust infrastructure provided by Amazon Bedrock.

Through the experiments, we found that different models performed better in specific tasks. For example, Meta Llama 2 performed better with code generation tasks; Anthropic Claude Instant was good at efficient and cost-effective conversation; whereas Anthropic Claude 2.1 was good at getting desired responses in retry flows.

These were the latest models from Anthropic and Meta at the time of this writing.

Solution overview

The following diagram shows how the three streams work together as a single workflow to answer user questions with tabular data.

The following are the steps of the workflow:

The authenticated user submits a question to Dru, for example, “Show me my backup job failures for the last 72 hours,” as an API call.
The request arrives at the microservice on our existing Amazon Elastic Container Service (Amazon ECS) cluster. This process consists of the following steps:
a. A classification task using the FM provides the available API routes in the prompt and asks for the one that best matches the user question.
b. An API parameters generation task using the FM gets the corresponding API swagger, then asks the FM to suggest key-value pairs to the API call that can retrieve data to answer the question.
c. A custom Python function verifies, formats, and invokes the API call, then passes the data in JSON format to the next step.
d. A Python code generation task using the FM samples a few records of data from the previous step, then asks the FM to write Python code to transform the data to answer the question.
e. A custom Python function runs the Python code and returns the answer in tabular format.

To maintain user and system security, we make sure in our design that:

The FM can’t directly connect to any Druva backend services.
The FM resides in a separate AWS account and virtual private cloud (VPC) from the backend services.
The FM can’t initiate actions independently.
The FM can only respond to questions sent from Druva’s API.
Normal customer permissions apply to the API calls made by Dru.
The call to the API (Step 1) is only possible for authenticated users. The authentication component lives outside the Dru solution and is used across other internal solutions.
To avoid prompt injection, jailbreaking, and other malicious activities, a separate module checks for these before the request reaches this service (Amazon API Gateway in Step 1).

For more details, refer to Druva’s Secret Sauce: Meet the Technology Behind Dru’s GenAI Magic.

Implementation details

In this section, we discuss Steps 2a–2e in the solution workflow.

2a. Look up the API definition

This step uses an FM to perform classification. It takes the user question and a full list of available API routes with meaningful names and descriptions as the input, and responds with the API route that best matches the question. The following is a sample prompt:

Please read the following API routes carefully as I’ll ask you a question about them:
<api_routes>{api_routes}</api_routes>
Which API route can best answer “{question}”?

2b. Generate the API call

This step uses an FM to generate API parameters. It first looks up the corresponding swagger for the API route (from Step 2a). Next, it passes the swagger and the user question to an FM and responds with some key-value pairs to the API route that can retrieve relevant data. The following is a sample prompt:

Please read the following swagger carefully as I’ll ask you a question about it:
<swagger>{swagger}</swagger>
Produce a key-value JSON dict of the available request parameters based on “{question}” with reference to the swagger.
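Model responses often wrap the requested JSON in extra prose, so the dictionary has to be extracted before it can be validated. The following sketch fills in the sample prompt above and pulls the first JSON object out of a response. The function names are hypothetical and the response text is made up for illustration.

```python
import json

PROMPT_TEMPLATE = (
    "Please read the following swagger carefully as I'll ask you a question about it:\n"
    "<swagger>{swagger}</swagger>\n"
    'Produce a key-value JSON dict of the available request parameters based on "{question}" '
    "with reference to the swagger."
)


def build_parameter_prompt(swagger: str, question: str) -> str:
    """Fill the sample prompt with the swagger text and the user question."""
    return PROMPT_TEMPLATE.format(swagger=swagger, question=question)


def extract_json_dict(model_output: str) -> dict:
    """Return the first JSON object found in the model output, or {} if none parses."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(model_output):
        if char != "{":
            continue
        try:
            value, _ = decoder.raw_decode(model_output, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; try the next one
        if isinstance(value, dict):
            return value
    return {}


# Made-up response text, for illustration only.
reply = 'Here are the parameters:\n{"hours": 72, "status": "failed"}\nLet me know if you need more.'
print(extract_json_dict(reply))
```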

2c. Validate and invoke the API call

In the previous step, even with an attempt to ground responses with swagger, the FM can still hallucinate wrong or nonexistent API parameters. This step uses a programmatic way to verify, format, and invoke the API call to get data. The following is the pseudo code:

for each input parameter (key/value)
  if parameter key not in swagger then
    drop parameter
  else if parameter value data type does not match swagger then
    drop parameter
  else
    URL encode parameter
  end if
end for
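The pseudo code above can be sketched in Python as follows. This is an illustrative version under simplifying assumptions: the swagger is reduced to a hypothetical dictionary that maps each parameter name to its declared type, and only a few primitive types are handled.

```python
from urllib.parse import quote

# Map swagger type names to Python types (a simplification for this sketch).
SWAGGER_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_parameters(params, swagger_params):
    """Keep parameters whose key and value type match the swagger, URL-encoded."""
    valid = {}
    for key, value in params.items():
        spec = swagger_params.get(key)
        if spec is None:
            continue  # key not in swagger: drop parameter
        expected = SWAGGER_TYPES.get(spec.get("type"))
        if expected is None or not isinstance(value, expected):
            continue  # value data type does not match swagger: drop parameter
        if isinstance(value, bool) and spec["type"] != "boolean":
            continue  # bool is a subclass of int in Python, so check it explicitly
        valid[key] = quote(str(value), safe="")
    return valid


# Hypothetical swagger excerpt and generated parameters.
swagger_params = {
    "status": {"type": "string"},
    "hours": {"type": "integer"},
    "limit": {"type": "integer"},
}
generated = {"status": "failed jobs", "hours": 72, "server": "xyz", "limit": "ten"}
print(validate_parameters(generated, swagger_params))
```

In this example, server is dropped because it is not in the swagger, and limit is dropped because its value is a string rather than an integer.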

2d. Generate Python code to transform data

This step uses an FM to generate Python code. It first samples a few records of input data to reduce input tokens. Then it passes the sample data and the user question to an FM and responds with a Python script that transforms data to answer the question. The following is a sample prompt:

Please read the following sample data carefully as I’ll ask you a question about them:
<sample_data>{5_rows_of_data_in_json}</sample_data>
Write a Python script using pandas to transform the data to answer the question “{question}”.
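The sampling step can be sketched as follows: only the first few records are serialized into the prompt, so the prompt size stays roughly constant regardless of how much data the API returned. The function name and the record fields are hypothetical.

```python
import json


def build_transform_prompt(records, question, sample_size=5):
    """Build the code generation prompt from a small sample of the API response."""
    sample_json = json.dumps(records[:sample_size])
    return (
        "Please read the following sample data carefully as I'll ask you a question about them:\n"
        f"<sample_data>{sample_json}</sample_data>\n"
        "Write a Python script using pandas to transform the data "
        f'to answer the question "{question}".'
    )


# Hypothetical API response with 100 records; only 5 reach the prompt.
records = [{"id": i, "server": f"srv-{i % 3}"} for i in range(100)]
print(build_transform_prompt(records, "Group backup failures by server"))
```

The generated script still runs against the full dataset in the next step; the sample only shows the FM the shape of the data.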

2e. Run the Python code

This step involves a Python script, which imports the generated Python package, runs the transformation, and returns the tabular data as the final response. If an error occurs, it will invoke the FM to try to correct the code. When everything fails, it returns the input data. The following is the pseudo code:

for maximum number of retries
  run data transformation function
  if error then
    invoke foundation model to correct code
  end if
end for
if success then
  return transformed data
else
  return input data
end if
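A Python sketch of this retry loop follows. It assumes the generated code defines a function named transform, and it takes the FM call as a hypothetical fix_code callable so that the loop can be exercised without a model. Running generated code with exec is shown only for illustration; isolation of that code is outside the scope of this sketch.

```python
def run_with_reflection(code, input_data, fix_code, max_attempts=3):
    """Run generated code; on error, ask fix_code for a corrected version and retry."""
    for attempt in range(max_attempts):
        try:
            namespace = {}
            exec(code, namespace)  # assumption: the code defines transform(data)
            return namespace["transform"](input_data)
        except Exception as exc:
            if attempt < max_attempts - 1:
                # Reflection: pass the failing code and the error back for correction.
                code = fix_code(code, repr(exc))
    return input_data  # when everything fails, return the input data


# Stand-in for the FM: returns a corrected script regardless of the error.
def fake_fix_code(code, error):
    return "def transform(data):\n    return sorted(data, key=lambda r: r['server'])"


broken = "def transform(data):\n    return sorted(data, key=lambda r: r['srv'])"
rows = [{"server": "b"}, {"server": "a"}]
print(run_with_reflection(broken, rows, fake_fix_code))
```

Each extra attempt costs another FM invocation, so the maximum number of attempts is the knob that trades success rate against latency and cost.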

Conclusion

Using Amazon Bedrock for the solution foundation led to remarkable achievements in accuracy, as evidenced by the following metrics in our evaluations using an internal dataset:

Stream 1: Identify the API route – Achieved a perfect accuracy rate of 100%
Stream 2: Generate and invoke private API calls – Maintained this standard with a 100% accuracy rate
Stream 3: Generate and run data transformation Python code – Attained a highly commendable accuracy of 90%

These results are not just numbers; they are a testament to the robustness and efficiency of the Amazon Bedrock-based solution. With such high levels of accuracy, Druva is now poised to confidently broaden their horizons. Our next goal is to extend this solution to encompass a wider range of APIs across Druva products. The next expansion will scale up usage and substantially enrich the experience of Druva customers. By integrating more APIs, Druva will offer a more seamless, responsive, and contextual interaction with Druva products, further enhancing the value delivered to Druva users.

To learn more about Druva’s AI solutions, visit the Dru solution page, where you can see some of these capabilities in action through recorded demos. Visit the AWS Machine Learning blog to see how other customers are using Amazon Bedrock to solve their business problems.

About the Authors

David Gildea is the VP of Product for Generative AI at Druva. With over 20 years of experience in cloud automation and emerging technologies, David has led transformative projects in data management and cloud infrastructure. As the founder and former CEO of CloudRanger, he pioneered innovative solutions to optimize cloud operations, later leading to its acquisition by Druva. Currently, David leads the Labs team in the Office of the CTO, spearheading R&D into generative AI initiatives across the organization, including projects like Dru Copilot, Dru Investigate, and Amazon Q. His expertise spans technical research, commercial planning, and product development, making him a prominent figure in the field of cloud technology and generative AI.

Tom Nijs is an experienced backend and AI engineer at Druva, passionate about both learning and sharing knowledge. With a focus on optimizing systems and using AI, he’s dedicated to helping teams and developers bring innovative solutions to life.

Corvus Lee is a Senior GenAI Labs Solutions Architect at AWS. He is passionate about designing and developing prototypes that use generative AI to solve customer problems. He also keeps up with the latest developments in generative AI and retrieval techniques by applying them to real-world scenarios.

Fahad Ahmed is a Senior Solutions Architect at AWS and assists financial services customers. He has over 17 years of experience building and designing software applications. He recently found a new passion of making AI services accessible to the masses.

Create a generative AI–powered custom Google Chat application using Amazon Bedrock

October 31, 2024

by Nizar Kheir Amazon AWS

AWS offers powerful generative AI services, including Amazon Bedrock, which allows organizations to create tailored use cases such as AI chat-based assistants that give answers based on knowledge contained in the customers’ documents, and much more. Many businesses want to integrate these cutting-edge AI capabilities with their existing collaboration tools, such as Google Chat, to enhance productivity and decision-making processes.

This post shows how you can implement an AI-powered business assistant, such as a custom Google Chat app, using the power of Amazon Bedrock. The solution integrates large language models (LLMs) with your organization’s data and provides an intelligent chat assistant that understands conversation context and provides relevant, interactive responses directly within the Google Chat interface.

This solution showcases how to bridge the gap between Google Workspace and AWS services, offering a practical approach to enhancing employee efficiency through conversational AI. By implementing this architectural pattern, organizations that use Google Workspace can empower their workforce to access groundbreaking AI solutions powered by Amazon Web Services (AWS) and make informed decisions without leaving their collaboration tool.

With this solution, you can interact directly with the chat assistant powered by AWS from your Google Chat environment, as shown in the following example.

Solution overview

We use the following key services to build this intelligent chat assistant:

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI
AWS Lambda, a serverless computing service, lets you handle the application logic, processing requests, and interaction with Amazon Bedrock
Amazon DynamoDB lets you store session memory data to maintain context across conversations
Amazon API Gateway lets you create a secure API endpoint for the custom Google Chat app to communicate with our AWS-based solution.

The following figure illustrates the high-level design of the solution.

The workflow includes the following steps:

The process begins when a user sends a message through Google Chat, either in a direct message or in a chat space where the application is installed.
The custom Google Chat app, configured for HTTP integration, sends an HTTP request to an API Gateway endpoint. This request contains the user’s message and relevant metadata.
Before processing the request, a Lambda authorizer function associated with the API Gateway authenticates the incoming message. This verifies that only legitimate requests from the custom Google Chat app are processed.
After it’s authenticated, the request is forwarded to another Lambda function that contains our core application logic. This function is responsible for interpreting the user’s request and formulating an appropriate response.
The Lambda function interacts with Amazon Bedrock through its runtime APIs, using either the RetrieveAndGenerate API that connects to a knowledge base, or the Converse API to chat directly with an LLM available on Amazon Bedrock. This also allows the Lambda function to search through the organization’s knowledge base and generate an intelligent, context-aware response using the power of LLMs. The Lambda function also uses a DynamoDB table to keep track of the conversation history, either directly with a user or within a Google Chat space.
After receiving the generated response from Amazon Bedrock, the Lambda function sends this answer back through API Gateway to the Google Chat app.
Finally, the AI-generated response appears in the user’s Google Chat interface, providing the answer to their question.
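The core of the workflow above can be sketched as follows. This is not the code from the accompanying repository: it assumes the Google Chat event has already been parsed from the HTTP request body, uses an in-memory dictionary as a stand-in for the DynamoDB table, and takes the model call as a hypothetical generate callable. The bedrock_generate helper shows how that callable might wrap the Converse API.

```python
from collections import defaultdict

# In-memory stand-in for the DynamoDB conversation history table (assumption).
HISTORY = defaultdict(list)


def handle_chat_event(event, generate):
    """Turn a parsed Google Chat event into a reply payload.

    generate is a hypothetical callable: it receives the message history in
    Converse API format and returns the reply text.
    """
    if event.get("type") != "MESSAGE":
        return {"text": "Hello! Send me a message to get started."}
    session_id = event["space"]["name"]  # one history per DM or chat space
    messages = HISTORY[session_id]
    messages.append({"role": "user", "content": [{"text": event["message"]["text"]}]})
    reply = generate(messages)
    messages.append({"role": "assistant", "content": [{"text": reply}]})
    return {"text": reply}


def bedrock_generate(messages, model_id):
    """Call the Amazon Bedrock Converse API; model_id is supplied by the caller."""
    import boto3  # imported lazily so the rest of the sketch runs without AWS access

    client = boto3.client("bedrock-runtime")
    response = client.converse(modelId=model_id, messages=messages)
    return response["output"]["message"]["content"][0]["text"]


# Local dry run with a stub instead of a model.
event = {"type": "MESSAGE", "space": {"name": "spaces/demo"}, "message": {"text": "Hi"}}
print(handle_chat_event(event, lambda msgs: f"You have sent {len(msgs)} message(s)."))
```

Keeping the model call behind a callable makes it straightforward to swap the Converse API for the RetrieveAndGenerate API when a knowledge base is configured.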

This architecture allows for a seamless integration between Google Workspace and AWS services, creating an AI-driven assistant that enhances information accessibility within the familiar Google Chat environment. You can customize this architecture to connect other solutions that you develop in AWS to Google Chat.

In the following sections, we explain how to deploy this architecture.

Prerequisites

To implement the solution outlined in this post, you must have the following:

A Linux or macOS development environment with at least 20 GB of free disk space. It can be a local machine or a cloud instance. If you use an AWS Cloud9 instance, make sure you have increased the disk size to 20 GB.
The AWS Command Line Interface (AWS CLI) installed on your development environment. This tool allows you to interact with AWS services through command line commands.
An AWS account and an AWS Identity and Access Management (IAM) principal with sufficient permissions to create and manage the resources needed for this application. If you don’t have an AWS account, refer to How do I create and activate a new Amazon Web Services account? To configure the AWS CLI with the associated credentials, typically, you set up an AWS access key ID and secret access key for a designated IAM user with appropriate permissions.
Access to Amazon Bedrock FMs. In this post, we use either Anthropic’s Claude 3 Sonnet or Amazon Titan Text G1 – Premier available in Amazon Bedrock, but you can also choose other models that are supported for Amazon Bedrock knowledge bases.
Optionally, an Amazon Bedrock knowledge base created in your account, which allows you to integrate your own documents into your generative AI applications. If you don’t have an existing knowledge base, refer to Create an Amazon Bedrock knowledge base. Alternatively, the solution proposes an option without a knowledge base, with answers generated only by the FM on the backend.
A Business or Enterprise Google Workspace account with access to Google Chat. You also need a Google Cloud project with billing enabled. To check that an existing project has billing enabled, see Verify the billing status of your projects.
Docker installed on your development environment.

Deploy the solution

The application presented in this post is available in the accompanying GitHub repository and provided as an AWS Cloud Development Kit (AWS CDK) project. Complete the following steps to deploy the AWS CDK project in your AWS account:

Clone the GitHub repository on your local machine.
Create a Python virtual environment. This project is set up like a standard Python project. We recommend that you create a virtual environment within this project, stored under the .venv directory. To manually create a virtual environment on macOS and Linux, use the following command:
```
python3 -m venv .venv
```

After the initialization process is complete and the virtual environment is created, you can use the following command to activate your virtual environment:
```
source .venv/bin/activate
```

Install the Python package dependencies that are needed to build and deploy the project. In the root directory, run the following command:
```
pip install -r requirements.txt
```

Run the cdk bootstrap command to prepare an AWS environment for deploying the AWS CDK application.
Run the script init-script.bash:

chmod u+x init-script.bash
./init-script.bash

This script prompts you for the following:

The Amazon Bedrock knowledge base ID to associate with your Google Chat app (refer to the prerequisites section). Keep this blank if you decide not to use an existing knowledge base.
Which LLM you want to use in Amazon Bedrock for text generation. For this solution, you can choose between Anthropic’s Claude 3 Sonnet and Amazon Titan Text G1 – Premier.

The following screenshot shows the input variables to the init-script.bash script.

The script deploys the AWS CDK project in your account. After it runs successfully, it outputs the parameter ApiEndpoint, whose value designates the invoke URL for the HTTP API endpoint deployed as part of this project. Note the value of this parameter because you use it later in the Google Chat app configuration.
The following screenshot shows the output of the init-script.bash script.

You can also find this parameter on the AWS CloudFormation console, on the stack’s Outputs tab.
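
If you prefer to read the output value programmatically, the following is a minimal sketch. It assumes the response shape of the CloudFormation DescribeStacks API (Stacks, then Outputs, each with an OutputKey and OutputValue). The helper name find_output and the sample stack name and URL are hypothetical and are not part of the accompanying repository.

```python
from typing import Optional


def find_output(describe_stacks_response: dict, key: str = "ApiEndpoint") -> Optional[str]:
    """Return the value of the named stack output, or None if it is absent."""
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == key:
                return output.get("OutputValue")
    return None


# Hypothetical response, shaped like a DescribeStacks result.
sample_response = {
    "Stacks": [
        {
            "StackName": "example-stack",
            "Outputs": [
                {"OutputKey": "ApiEndpoint", "OutputValue": "https://example.invalid/"},
            ],
        }
    ]
}

api_endpoint = find_output(sample_response)
print(api_endpoint)
```

With boto3 installed and AWS credentials configured, you could pass the result of `boto3.client("cloudformation").describe_stacks(StackName=...)` to the same helper, using the name of the stack you deployed.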

Register a new app in Google Chat

To integrate the AWS-powered chat assistant into Google Chat, you create a custom Google Chat app. Google Chat apps are extensions that bring external services and resources directly into the Google Chat environment. These apps can participate in direct messages, group conversations, or dedicated chat spaces, allowing users to access information and take actions without leaving their chat interface.

For our AI-powered business assistant, we create an interactive custom Google Chat app that uses the HTTP integration method. This approach allows our app to receive and respond to user messages in real time, providing a seamless conversational experience.

After you have deployed the AWS CDK stack in the previous section, complete the following steps to register a Google Chat app in the Google Cloud portal:

Open the Google Cloud portal and log in with your Google account.
Search for “Google Chat API” and navigate to the Google Chat API page, which lets you build Google Chat apps to integrate your services with Google Chat.
If this is your first time using the Google Chat API, choose ACTIVATE. Otherwise, choose MANAGE.
On the Configuration tab, under Application info, provide the following information, as shown in the following screenshot:
1. For App name, enter an app name (for example, bedrock-chat).
2. For Avatar URL, enter the URL for your app’s avatar image. As a default, you can provide the Google Chat product icon.
3. For Description, enter a description of the app (for example, Chat App with Amazon Bedrock).

Under Interactive features, turn on Enable Interactive features.
Under Functionality, select Receive 1:1 messages and Join spaces and group conversations, as shown in the following screenshot.

Under Connection settings, provide the following information:
1. Select App URL.
2. For App URL, enter the Invoke URL associated with the deployment stage of the HTTP API gateway. This is the ApiEndpoint parameter that you noted at the end of the deployment of the AWS CDK template.
3. For Authentication Audience, select App URL, as shown in the following screenshot.

Under Visibility, select Make this Chat app available to specific people and groups in <your-company-name> and provide email addresses for individuals and groups who will be authorized to use your app. You need to add at least your own email if you want to access the app.

Choose Save.

The following animation illustrates these steps on the Google Cloud console.

By completing these steps, the new Amazon Bedrock chat app should be accessible on the Google Chat console for the persons or groups that you authorized in your Google Workspace.

To dispatch interaction events to the solution deployed in this post, Google Chat sends requests to your API Gateway endpoint. To verify the authenticity of these requests, Google Chat includes a bearer token in the Authorization header of every HTTPS request to your endpoint. The Lambda authorizer function provided with this solution verifies that the bearer token was issued by Google Chat and targeted at your specific app using the Google OAuth client library. You can further customize the Lambda authorizer function to implement additional control rules based on User or Space objects included in the request from Google Chat to your API Gateway endpoint. This allows you to fine-tune access control, for example, by restricting certain features to specific users or limiting the app’s functionality in particular chat spaces, enhancing security and customization options for your organization.
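
As an illustration of such additional control rules, the following sketch checks the user and space objects of a parsed Google Chat event against an allow list of email domains and a block list of spaces. It is not the authorizer shipped with the solution. The function name is_request_allowed, the rule set, and the sample values are hypothetical, and the check should run wherever in your code the parsed event body is available.

```python
from typing import Iterable


def is_request_allowed(
    chat_event: dict,
    allowed_email_domains: Iterable[str],
    blocked_spaces: Iterable[str] = (),
) -> bool:
    """Apply simple access rules to a parsed Google Chat event."""
    user_email = chat_event.get("user", {}).get("email", "")
    space_name = chat_event.get("space", {}).get("name", "")

    # Reject events that carry no usable email address.
    if "@" not in user_email:
        return False

    # Rule 1 (assumed policy): the sender must belong to an allowed domain.
    domain = user_email.rpartition("@")[2].lower()
    if domain not in {d.lower() for d in allowed_email_domains}:
        return False

    # Rule 2 (assumed policy): the app stays silent in blocked spaces.
    if space_name in set(blocked_spaces):
        return False

    return True


# Hypothetical event fragment with only the fields used above.
sample_event = {
    "type": "MESSAGE",
    "user": {"email": "alice@example.com"},
    "space": {"name": "spaces/AAAA"},
}

print(is_request_allowed(sample_event, ["example.com"]))
```

Keeping the rules in a small pure function like this makes them easy to unit test separately from the token verification.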

Converse with your custom Google Chat app

You can now converse with the new app within your Google Chat interface. Connect to Google Chat with an email that you authorized during the configuration of your app and initiate a conversation by finding the app:

Choose New chat in the chat pane, then enter the name of the application (bedrock-chat) in the search field.
Choose Chat and enter a natural language phrase to interact with the application.

Although we previously demonstrated a usage scenario that involves a direct chat with the Amazon Bedrock application, you can also invoke the application from within a Google Chat space, as illustrated in the following demo.

Customize the solution

In this post, we used Amazon Bedrock to power the chat-based assistant. However, you can customize the solution to use a variety of AWS services and create a solution that fits your specific business needs.

To customize the application, complete the following steps:

Edit the file lambda/lambda-chat-app/lambda-chatapp-code.py in the GitHub repository you cloned to your local machine during deployment.
Implement your business logic in this file.

The code runs in a Lambda function. Each time a request is processed, Lambda runs the lambda_handler function:

import json

def lambda_handler(event, context):
    if event['requestContext']['http']['method'] == 'POST':
        # A POST request indicates a Google Chat app event sent by the application
        data = json.loads(event['body'])
        # Invoke the handle_post function, which contains the logic to process Google Chat app events
        response = handle_post(data)
        return { 'text': response }
    else:
        # Any other HTTP method is rejected
        return {
            'statusCode': 405,
            'body': json.dumps("Method Not Allowed. This function must be called from Google Chat.")
        }

When Google Chat sends a request, the lambda_handler function calls the handle_post function.

Let’s replace the handle_post function with the following code:

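def handle_post(data):
    if data['type'] == 'MESSAGE':
        user_message = data['message']['text']
        space_name = data['space']['name']
        return f"Hello! You said: {user_message}\nThe space name is: {space_name}"
    # Other event types (for example, ADDED_TO_SPACE) get a generic reply instead of None
    return "Sorry, I can only respond to messages."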

Save your file, then run the following command in your terminal to deploy your new code:

cdk deploy

The deployment should take about a minute. When it’s complete, you can go to Google Chat and test your new business logic. The following screenshot shows an example chat.

As the image shows, your function gets the user message and a space name. You can use this space name as a unique ID for the conversation, which lets you manage conversation history.
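
To illustrate how the space name can key conversation history, here is a minimal sketch. It keeps history in process memory purely for illustration. The helper names record_turn and get_history and the MAX_TURNS cap are hypothetical. In a Lambda function you would typically persist this state in a store such as DynamoDB, because memory is not reliably shared across invocations.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

MAX_TURNS = 10  # hypothetical cap on stored turns per conversation

# Maps a Google Chat space name to its most recent (user_message, reply) turns.
_history: Dict[str, Deque[Tuple[str, str]]] = {}


def record_turn(space_name: str, user_message: str, reply: str) -> None:
    """Append one exchange to the history of the given space."""
    turns = _history.setdefault(space_name, deque(maxlen=MAX_TURNS))
    turns.append((user_message, reply))


def get_history(space_name: str) -> List[Tuple[str, str]]:
    """Return the stored turns for a space, oldest first."""
    return list(_history.get(space_name, ()))


record_turn("spaces/AAAA", "Hello", "Hi! How can I help?")
record_turn("spaces/BBBB", "Status?", "All systems are running.")
print(get_history("spaces/AAAA"))
```

Because each space has its own bounded queue, conversations stay isolated from each other and old turns are dropped automatically once the cap is reached.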

As you become more familiar with the solution, you may want to explore advanced Amazon Bedrock features to significantly expand its capabilities and make it more robust and versatile. Consider integrating Amazon Bedrock Guardrails to implement safeguards customized to your application requirements and responsible AI policies. Consider also expanding the assistant’s capabilities through function calling, to perform actions on behalf of users, such as scheduling meetings or initiating workflows. You could also use Amazon Bedrock Prompt Flows to accelerate the creation, testing, and deployment of workflows through an intuitive visual builder. For more advanced interactions, you could explore implementing Amazon Bedrock Agents capable of reasoning about complex problems, making decisions, and executing multistep tasks autonomously.

Performance optimization

The serverless architecture used in this post provides a scalable solution out of the box. As your user base grows or if you have specific performance requirements, there are several ways to further optimize performance. You can implement API caching to speed up repeated requests or use provisioned concurrency for Lambda functions to eliminate cold starts. To overcome API Gateway timeout limitations in scenarios requiring longer processing times, you can increase the integration timeout on API Gateway, or you might replace it with an Application Load Balancer, which allows for extended connection durations. You can also fine-tune your choice of Amazon Bedrock model to balance accuracy and speed. Finally, Provisioned Throughput in Amazon Bedrock lets you provision a higher level of throughput for a model at a fixed cost.

Clean up

In this post, you deployed a solution that lets you interact directly with a chat assistant powered by AWS from your Google Chat environment. The architecture incurs usage cost for several AWS services. First, you will be charged for model inference and for the vector databases you use with Amazon Bedrock Knowledge Bases. AWS Lambda costs are based on the number of requests and compute time, and Amazon DynamoDB charges depend on read/write capacity units and storage used. Additionally, Amazon API Gateway incurs charges based on the number of API calls and data transfer. For more details about pricing, refer to Amazon Bedrock pricing.

There might also be costs associated with using Google services. For detailed information about potential charges related to Google Chat, refer to the Google Chat product documentation.

To avoid unnecessary costs, clean up the resources created in your AWS environment when you’re finished exploring this solution. Use the cdk destroy command to delete the AWS CDK stack previously deployed in this post. Alternatively, open the AWS CloudFormation console and delete the stack you deployed.

Conclusion

In this post, we demonstrated a practical solution for creating an AI-powered business assistant for Google Chat. This solution seamlessly integrates Google Workspace with AWS-hosted data by using LLMs on Amazon Bedrock, Lambda for application logic, DynamoDB for session management, and API Gateway for secure communication. By implementing this solution, organizations can provide their workforce with a streamlined way to access AI-driven insights and knowledge bases directly within their familiar Google Chat interface, enabling natural language interaction and data-driven discussions without the need to switch between different applications or platforms.

Furthermore, we showcased how to customize the application to implement tailored business logic that can use other AWS services. This flexibility empowers you to tailor the assistant’s capabilities to your specific requirements, providing a seamless integration with your existing AWS infrastructure and data sources.

AWS offers a comprehensive suite of cutting-edge AI services to meet your organization’s unique needs, including Amazon Bedrock and Amazon Q. Now that you know how to integrate AWS services with Google Chat, you can explore their capabilities and build awesome applications!

About the Authors

Nizar Kheir is a Senior Solutions Architect at AWS with more than 15 years of experience spanning various industry segments. He currently works with public sector customers in France and across EMEA to help them modernize their IT infrastructure and foster innovation by harnessing the power of the AWS Cloud.

Lior Perez is a Principal Solutions Architect on the construction team based in Toulouse, France. He enjoys supporting customers in their digital transformation journey, using big data, machine learning, and generative AI to help solve their business challenges. He is also personally passionate about robotics and Internet of Things (IoT), and he constantly looks for new ways to use technologies for innovation.

Discover insights from Gmail using the Gmail connector for Amazon Q Business

October 31, 2024

by Divyajeet Singh Amazon AWS

A number of organizations use Gmail for their business email needs. Gmail for business is part of Google Workspace, which provides a set of productivity and collaboration tools like Google Drive, Gmail, and Google Calendar. Emails contain a wealth of information found in different places, such as within the subject of an email, the message content, or even attachments. Performing an intelligent search on emails with co-workers can help you find answers to questions, improving productivity and enhancing the overall customer experience for the organization.

Amazon Q Business is a fully managed, generative AI-powered assistant designed to enhance enterprise operations. It can be tailored to specific business needs by connecting to company data, information, and systems through over 40 built-in connectors.

Amazon Q Business enables users in various roles, such as marketers, project managers, and sales representatives, to have tailored conversations, solve problems, generate content, take action, and more, all through a web-based interface. This tool aims to make employees work smarter, move faster, and drive more significant impact by providing immediate and relevant information and streamlining tasks.

With the Gmail connector for Amazon Q Business, you can enhance productivity and streamline communication processes within your organization. This integration empowers you to use advanced search capabilities and intelligent email management using natural language.

In this post, we guide you through the process of setting up the Gmail connector, enabling seamless interaction between Gmail and Amazon Q Business. Whether you’re a small startup or a large enterprise, this solution can help you maximize the potential of your Gmail data and empower your team with actionable insights.

Finding accurate answers from content in Gmail mailbox using Amazon Q Business

After you integrate Amazon Q Business with Gmail, you can ask a question and Amazon Q Business can index through your mailbox and find relevant answers. For example, you can make the following queries:

Natural language search – You can search for emails and attachments within your mailbox using natural language, making it effortless to find your desired information without having to remember specific keywords or filters
Summarization – You can request a concise summary of the conversations and attachments matching your search query, allowing you to quickly grasp the key points without having to manually sift through individual items
Query clarification – If your query is ambiguous or lacks sufficient context, Amazon Q Business can engage in a dialogue to clarify the intent, so you receive the most relevant and accurate results

Overview of the Gmail connector for Amazon Q Business

To crawl and index contents in Gmail, you can configure the Gmail connector for Amazon Q Business as a data source in your Amazon Q Business application. When you connect Amazon Q Business to a data source and initiate the sync process, Amazon Q Business crawls and indexes documents from the data source into its index.

A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. A data source is a data repository or location that Amazon Q Business connects to in order to retrieve your email data. After you set up the connector, you can create one or multiple data sources within Amazon Q Business and configure them to start indexing emails from your Gmail account.

Types of documents

Gmail messages can be sorted and stored inside your email inbox using folders and labels.

Let’s look at what is considered a document in the context of the Gmail connector for Amazon Q Business. The connector supports the crawling of the following entities in Gmail:

Email – Each email is considered a single document
Attachment – Each email attachment is considered a single document

Additionally, supported custom metadata and custom objects are also crawled during the sync process.

The Gmail connector for Amazon Q Business also supports the indexing of a rich set of metadata from the various entities in Gmail. It further provides the ability to map these source metadata fields to Amazon Q index fields for indexing. These field mappings allow you to map Gmail field names to Amazon Q index field names. There are three types of metadata fields that Amazon Q connectors support:

Default fields – These are required with each document, such as the title, creation date, or author
Optional fields – These are provided by the data source, and the administrator can optionally choose one or more of these fields if they contain important and relevant information to produce accurate answers
Custom metadata fields – These are fields created in the data source in addition to what the data source already provides

Refer to Gmail data source connector field mappings for more information.

Authentication

Before we index the content from Gmail, we first need to establish a secure connection between the Gmail connector for Amazon Q Business and your Google service account. To establish a secure connection, we need to authenticate with the data source.

The connector supports authentication using a Google service account. We describe the process of creating an account later in this post. For more information about authentication, see Gmail connector overview.

Secure querying with ACL crawling and identity crawling

Secure querying is when a user runs a query and is returned answers only from documents that the user has access to. To enable users to do secure querying, Amazon Q Business honors the access control lists (ACLs) of the documents. Amazon Q Business does this by first supporting the indexing of ACLs. Indexing documents with ACLs is crucial for maintaining data security, because documents without ACLs are considered public. Additionally, the user’s credentials (email address) are passed along with the query so that only answers from documents that are relevant and that the user is authorized to access are displayed.

When connecting a Gmail data source, Amazon Q Business crawls the ACL information attached to a document (user and group information) from your Gmail instance. In Gmail, user IDs are mapped to _user_id. User IDs exist in Gmail on files with set access permissions. They’re mapped from the user emails as the IDs in Gmail.

When a user logs in to a web application to conduct a search, the user’s credentials, such as an email address, need to match what is in the ACL of the document to return results from that document. The web application that the user uses to retrieve answers is connected to an identity provider (IdP) or AWS IAM Identity Center. The user’s credentials from the IdP or IAM Identity Center are referred to here as the federated user credentials. The federated user credentials are passed along with the query so that Amazon Q can return the answers from the documents that this user has access to.
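
The matching rule described above can be sketched in a few lines. This is a conceptual illustration only, not how Amazon Q Business is implemented. The IndexedDocument class, the visible_documents function, and the sample data are hypothetical. The sketch follows two points from the text: a document without an ACL is treated as public, and a user sees a restricted document only if their email address appears in its ACL.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class IndexedDocument:
    title: str
    # Email addresses allowed to read the document; empty means no ACL (public).
    allowed_users: FrozenSet[str] = frozenset()


def visible_documents(documents: Iterable[IndexedDocument], user_email: str) -> List[IndexedDocument]:
    """Return only the documents the given user is allowed to see."""
    email = user_email.lower()
    visible = []
    for doc in documents:
        acl = {entry.lower() for entry in doc.allowed_users}
        if not acl or email in acl:
            visible.append(doc)
    return visible


documents = [
    IndexedDocument("Public notice"),
    IndexedDocument("Order 123 return request", frozenset({"alice@example.com"})),
]

print([doc.title for doc in visible_documents(documents, "bob@example.com")])
```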

Refer to How Amazon Q Business connector crawls Gmail ACLs for more information.

Solution overview

In the following sections, we demonstrate how to set up the Gmail connector for Amazon Q Business. Then we provide examples of how to use the AI-powered chat interface to gain insights from the connected data source.

In our solution, we index emails from Gmail by configuring the Gmail data source connector. This connector allows you to query your Gmail data using Amazon Q Business as your query engine.

After the configuration is complete, you can configure how often Amazon Q Business should synchronize with your Gmail account to keep up to date with the email content. This process makes sure that your email interactions are systematically updated within Amazon Q Business, enabling you to query and uncover valuable insights from your Gmail data.

The following diagram illustrates the solution architecture. Google Workspace is the data source. Emails and attachments along with the ACL information are passed to Amazon Q Business from Google Workspace. The user submits a query to the Amazon Q Business application. Amazon Q Business retrieves the ACL of the user and provides answers based on the emails and attachments that the user has access to.

Prerequisites

You should have the following:

An Amazon Q Business application. If you haven’t created one yet, refer to Build private and secure enterprise generative AI apps with Amazon Q Business and AWS IAM Identity Center for instructions.
A Google Workspace account and an organization for your business with one or many users that have access to Gmail.
Administrator account credentials to Google Workspace and the Google Cloud console.
Access to AWS Secrets Manager.
Privileges to create a new Amazon Q application (or add data sources to existing applications), AWS resources, and AWS Identity and Access Management (IAM) roles and policies.

Configure the Gmail connector for an Amazon Q Business application

To enable Amazon Q Business to access and index emails from Gmail accounts within the organization, it’s essential to configure the organization’s Google Workspace. In the steps that follow, we create a service account that will be used by the Gmail connector for Amazon Q Business to index emails.

We provide the service account with authorization scopes to allow access to the required Gmail APIs. The authorization scopes express the permissions you request users to authorize for your application and are applicable to emails within your organization’s Google Workspace.

Complete the following steps:

Log in to your organization’s Google Cloud account.
Create a new project with an appropriate name and assign it to your organization. In our example, we name the project GmailConnector.
Choose Create.

After you create the project, on the navigation menu, choose APIs and Services and Library to view the API Library.

On the API Library page, search for and choose Admin SDK API.

The Admin SDK API lets you manage Google Workspace account resources and audit usage.

Choose Enable.

Similarly, search for and choose Gmail API on the API Library page.

The Gmail API can help in viewing and managing the Gmail mailbox data like threads, messages, and labels.

Choose Enable to enable this API.

We now create a service account. The service account will be used by the Amazon Q Business Gmail data source connector to access the organization’s emails based on the allowed API scope.

On the navigation menu, choose IAM and Admin and Service accounts.

Choose Create service account.

Name the service account Amazon-q-integration-gmail, enter a description, and choose Create and continue.
Skip the optional sections Grant this service account access to project and Grant users access to this service account.
Choose Done.

Choose the service account you created to navigate to the service account details page.
Note the unique ID for the service account—the unique ID is also known as the client ID, and will be used in later steps.

Next, we create the keys for the service account, which will allow it to be used by the Gmail connector for Amazon Q Business.

On the Keys tab, choose Add key and Create new key.

When prompted for the key type, select the recommended option JSON and choose Create.

This will download the private key to your computer, which must be kept safe to allow configuration within the Amazon Q console. The following screenshot shows an example of the credentials JSON file.

On the Details tab, expand the Advanced settings section and choose View Google Workspace Admin console in the Domain-wide Delegation section.

Granting access to the service account using a domain-wide delegation to your organization’s data must be treated as a privileged operation and done with caution. You can reverse the access grant by disabling or deleting the service account or removing access through the Google Workspace Admin console.

Use the Google Workspace Admin credentials to log in to the Google Workspace Admin console.
Under Security on the navigation menu, under Access and data control, choose API controls.
In the Domain-wide delegation section, choose Manage domain-wide delegation.

Choose Add new.

In the Add a new client ID dialog, enter the unique ID for the service account you created.
Enter the following scopes to allow the service account to access the emails from Gmail:
- https://www.googleapis.com/auth/gmail.readonly – This scope allows you to view your email messages and settings.
- https://www.googleapis.com/auth/admin.directory.user.readonly – This scope allows you to see and download your organization’s Google Workspace directory.

For more details about all the scopes available, refer to OAuth 2.0 Scopes for Google APIs.

Choose Authorize.

This concludes the configuration within the Google Cloud console and Google Workspace Admin console.

Create the Gmail connector for an Amazon Q Business application

This post assumes that an Amazon Q Business application has already been created beforehand. If you haven’t created one yet, refer to Build private and secure enterprise generative AI apps with Amazon Q Business and AWS IAM Identity Center for instructions.

Complete the following steps to configure the connector:

On the Amazon Q Business console, choose Applications in the navigation pane.
Select the application that you want to add the Gmail connector to.
On the Actions menu, choose Edit.

On the Update application page, leave all values unchanged and choose Update.

On the Update retriever page, leave all values as default and choose Next.

On the Connect data sources page, on the All tab, search for Gmail in the search field.
Choose the plus sign next to Gmail, which will open up a page to set up the data source.

In the Name and description section, enter a name and description.

In the Authentication section, choose Create and add new secret.

In the Create an AWS Secrets Manager secret pop-up, provide the following information:
- Enter a name for your Secrets Manager secret.
- For Client email and Private key, refer to the JSON file that you downloaded to your local machine earlier.
- For Admin account email, enter the admin account for your Google Workspace.
- For Private key, enter the private key details.
- Choose Save.
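
The Client email and Private key values come from the service account key file you downloaded earlier. As a convenience, the following sketch pulls those two values out of the file content. It assumes the standard field names client_email and private_key found in Google service account key files. The function name extract_connector_credentials and the placeholder values are hypothetical.

```python
import json
from typing import Dict

REQUIRED_FIELDS = ("client_email", "private_key")


def extract_connector_credentials(key_json_text: str) -> Dict[str, str]:
    """Return the two key-file fields needed for the Secrets Manager secret."""
    key_data = json.loads(key_json_text)
    missing = [name for name in REQUIRED_FIELDS if not key_data.get(name)]
    if missing:
        raise ValueError(f"Key file is missing fields: {', '.join(missing)}")
    return {name: key_data[name] for name in REQUIRED_FIELDS}


# Placeholder content shaped like a service account key file; not a real key.
sample_key_file = json.dumps({
    "type": "service_account",
    "client_email": "connector@example-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
})

credentials = extract_connector_credentials(sample_key_file)
print(credentials["client_email"])
```

To use it with the real file, read the file content first (for example, with `pathlib.Path(path).read_text()`), and avoid printing or logging the private key.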

In the IAM role section, for IAM role, choose Create a new service role (recommended).

In the Sync scope section, select Message attachments and enter a value for Maximum file size.
Optionally, configure the following under Additional configuration (we leave everything as default for this post):
- For Date range, enter the start and end dates for emails to be crawled. Emails received on or after the start date and before the end date are included in the sync scope.
- For Email domains, enter the email from domains, email to domains, subject, CC emails, and BCC emails you want to include or exclude in your index.
- For Keywords in subjects, include or exclude any documents with at least one keyword mentioned in their subjects
- For Labels, add regular expression patterns to include or exclude certain labels or attachment types. You can add up to 100 patterns.
- For Attachments, add regular expression patterns to include or exclude certain attachments. You can add up to 100 patterns.
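
To make the filter semantics concrete, the following sketch models two of the options above: the date range, where the start date is inclusive and the end date is exclusive as described in the list, and regular expression include and exclude patterns. It is only an illustration. The function names are hypothetical, and giving exclude patterns precedence over include patterns is an assumption, not documented connector behavior.

```python
import re
from datetime import date
from typing import Sequence


def in_date_range(received: date, start: date, end: date) -> bool:
    """True if the email was received on or after start and before end."""
    return start <= received < end


def matches_patterns(
    value: str,
    include_patterns: Sequence[str] = (),
    exclude_patterns: Sequence[str] = (),
) -> bool:
    """Apply include and exclude regular expressions to a label or attachment name."""
    # Assumption: an exclude match always wins.
    if any(re.search(pattern, value) for pattern in exclude_patterns):
        return False
    # With include patterns present, at least one must match.
    if include_patterns:
        return any(re.search(pattern, value) for pattern in include_patterns)
    # No patterns configured: everything is in scope.
    return True


print(in_date_range(date(2024, 1, 15), date(2024, 1, 1), date(2024, 2, 1)))
print(matches_patterns("invoice.pdf", include_patterns=[r"\.pdf$"]))
```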

In the Sync mode section, select New, modified, or deleted content sync.
In the Sync run schedule section, choose the frequency that works best for your use case. For this post, we choose Run on demand.

Choose Add data source and wait for the retriever to be created.

After the data source is created, you’re redirected to the Connect data sources page to add more data sources as needed.

Verify your data source is added and choose Next.

On the Update groups and users page, choose Add groups and users.

The users and groups that you add in this section are from the IAM Identity Center users and groups set up by your administrator.

In the Add or assign users and groups pop-up window, select Assign existing users and groups to add existing users configured in your connected IAM Identity Center, then choose Next.

Optionally, if you have permissions to add users to connected IAM Identity Center, you can select Add new users.

Choose Get started.

Search for users by user display name or groups by group name.
Choose the users or groups you want to add and choose Assign.

The groups and users that you added should now be available on the Groups or Users tabs.

Choose Assign.

For each group or user entry, an Amazon Q Business subscription tier needs to be assigned.

To enable a subscription for a group, on the Update groups and users page, choose the Groups tab (if individual users need to be assigned a subscription, choose the Users tab).
Under the Subscription column, select Choose subscription and choose a subscription (Q Business Lite or Q Business Pro).
Choose Update application to complete adding and setting up the Gmail connector for Amazon Q Business.

Configure Gmail field mappings

To help you structure data for retrieval and chat filtering, Amazon Q Business crawls data source document attributes or metadata and maps them to fields in your Amazon Q index. Amazon Q has reserved fields that it uses when querying your application. When possible, Amazon Q automatically maps these built-in fields to attributes in your data source.

If a built-in field doesn’t have a default mapping, or if you want to map additional index fields, use the custom field mappings to specify how a data source attribute maps to your Amazon Q application.

On the Amazon Q Business console, choose your application.
Under Data sources, select your data source.
On the Actions menu, choose Edit.

In the Field mappings section, select the required fields to crawl under Messages and Message attachments and any types that are available.

The Gmail connector setup for Amazon Q Business is now complete.

To test the connectivity to Gmail and initiate the data synchronization, choose Sync now. The initial sync process may take several minutes to complete.

When the sync is complete, in the Sync run history section, you can see the sync status along with a summary of how many total items were added, deleted, modified, and failed during the sync process.

Query Gmail data using the Amazon Q web experience

Now that the data synchronization is complete, you can start exploring insights from Amazon Q. In the newly created Amazon Q application, choose Customize web experience to open a new tab with a preview of the UI and options to customize as per your needs.

You can customize the Title, Subtitle, and Welcome message fields according to your needs, which will be reflected in the UI.

For this walkthrough, we use the defaults and choose View web experience to be redirected to the login page for the Amazon Q application.

Log in to the application using the credentials for a user that was added to the Amazon Q application. After the login is successful, you’re redirected to the Amazon Q assistant UI, where you can ask questions using natural language and get insights from your Gmail index.

The Gmail data source connected to this Amazon Q Business application has email and Gmail attachments. We demonstrate how the Amazon Q application lets you ask questions on your email using natural language and receive responses and insights for those queries.

Let’s begin by asking Amazon Q to summarize key points from an email from Matt Garman (CEO of AWS). The following screenshot displays the response, which also includes the email source used to generate it.

For our next example, let’s ask Amazon Q to provide details about a return issue that a customer is facing for a bicycle order they placed with Amazon. The following screenshot shows the details about the issue and includes the email source used to generate the response.

Troubleshooting

Troubleshooting your Amazon Q Business Gmail connector provides information about error codes you might see for the Gmail connector and suggested troubleshooting actions. If you encounter an HTTP status code 403 (Forbidden) error when you open your Amazon Q Business application, it means that the user is unable to access the application. See Troubleshooting Amazon Q Business and identity provider integration for common causes and how to address them.

Frequently asked questions

In this section, we provide answers to frequently asked questions.

Amazon Q Business is unable to answer your questions

This could happen due to a several reasons:

No permissions – ACLs applied to your account doesn’t allow you to query certain data sources. If this is the case, reach out to your application administrator to make sure your ACLs are configured to access the data sources.
Data connector sync failed – The data connector might have failed to sync information from the source to the Amazon Q Business application. Verify the data connector’s sync run schedule and sync history to confirm the sync is successful.

If neither of these reasons are true in your case, open a support case to get this resolved.

How to generate responses from authoritative data sources

You can configure these options using Amazon Q Business application global controls under Admin controls and guardrails.

Log in as an Amazon Q Business application administrator.
Navigate to the application and choose Admin controls and guardrails in the navigation pane.
Choose Edit in the Global controls section to control these options.

For more information, refer to Admin controls and guardrails in Amazon Q Business.

Amazon Q Business responds using old (stale) data even though your data source is updated

Each Amazon Q Business data connector can be configured with its own sync run schedule frequency. Verify the sync status and sync schedule frequency for your data connector to see when the last sync ran successfully. Your data connector’s sync run schedule could be set to sync at a scheduled time of day, week, or month. If it’s set to run on demand, the sync has to be run manually. When the sync run is complete, verify the sync history to make sure the run has successfully synced all new content. Refer to Sync run schedule for more information on each option.
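
If you prefer to check the sync history from code rather than the console, the following is a minimal sketch in Python. It assumes a Boto3 qbusiness client is passed in and uses its list_data_source_sync_jobs operation; the application, index, and data source IDs are placeholders, and the helper name is our own. It reads only the first page of results.

```python
def last_successful_sync(qbusiness_client, application_id, index_id, data_source_id):
    """Return the end time of the most recent successful sync run, or None.

    Only the first page of sync history is inspected in this sketch.
    """
    response = qbusiness_client.list_data_source_sync_jobs(
        applicationId=application_id,
        indexId=index_id,
        dataSourceId=data_source_id,
    )
    # Keep the end times of runs that finished successfully.
    finished = [
        job["endTime"]
        for job in response.get("history", [])
        if job.get("status") == "SUCCEEDED" and "endTime" in job
    ]
    return max(finished) if finished else None


# Usage (requires boto3 and AWS credentials; the IDs are placeholders):
# import boto3
# client = boto3.client("qbusiness")
# print(last_successful_sync(client, "application-id", "index-id", "data-source-id"))
```

If the function returns None or an older timestamp than you expect, the stale answers are most likely explained by the sync schedule rather than by the model.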

How to set up Amazon Q Business using a different IdP

You can set up Amazon Q Business with another SAML 2.0-compliant IdP, such as Okta, Entra ID, or Ping Identity. For more information, see Creating an Amazon Q Business application using Identity Federation through IAM.

Expand the solution

You can explore other features in Amazon Q Business. For example, the Amazon Q Business document enrichment feature helps you control both which documents and document attributes are ingested into your index and how they’re ingested. With document enrichment, you can create, modify, or delete document attributes and document content when you ingest them into your Amazon Q Business index. For example, you can scrub personally identifiable information (PII) by choosing to delete any document attributes related to PII.

Amazon Q Business also offers the following features:

Filtering using metadata – Use document attributes to customize and control users’ chat experience. This is currently supported only if you use the Amazon Q Business API.
Source attribution with citations – Verify responses using Amazon Q Business source attributions.
Upload files and chat – Let users upload files directly into chat and use uploaded file data to perform web experience tasks.
Quick prompts – Feature sample prompts to inform users of the capabilities of their Amazon Q Business web experience.
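
As an illustration of filtering using metadata through the API, the following sketch builds the arguments for a chat_sync call with an attributeFilter. The attribute name and value are hypothetical, the helper name is our own, and depending on how your application handles identity, the call might need additional parameters or identity-aware credentials.

```python
def build_filtered_chat_request(application_id, question, attribute_name, attribute_value):
    """Build keyword arguments for a chat_sync call restricted by one string attribute."""
    return {
        "applicationId": application_id,
        "userMessage": question,
        # Only documents whose attribute equals the given value are considered.
        "attributeFilter": {
            "equalsTo": {
                "name": attribute_name,
                "value": {"stringValue": attribute_value},
            }
        },
    }


# Usage (requires boto3 and AWS credentials; all values are placeholders):
# import boto3
# request = build_filtered_chat_request(
#     "application-id", "Summarize open return issues", "department", "support"
# )
# response = boto3.client("qbusiness").chat_sync(**request)
# print(response["systemMessage"])
```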

To improve retrieved results and customize the user chat experience, you can map document attributes from your data sources to fields in your Amazon Q index. To learn more, see Gmail data source connector field mappings.

Clean up

To avoid incurring future charges, clean up any resources you created as part of this solution, including the Amazon Q application:

On the Amazon Q console, choose Applications in the navigation pane.
Select the application you created.
On the Actions menu, choose Delete.
Delete the IAM roles created for the application and data retriever.
If you used IAM Identity Center for this walkthrough, delete your IAM Identity Center instance.

Conclusion

In this post, we discussed how to configure the Gmail connector for Amazon Q Business and use the AI-powered chat interface to gain insights from the connected data source.

To learn more about the Gmail connector for Amazon Q Business, refer to Connecting Gmail to Amazon Q Business, the Amazon Q User Guide, and the Amazon Q Developer Guide.

About the Authors

Divyajeet (DJ) Singh is a Sr. Solutions Architect at AWS Canada. He loves working with customers to help them solve their unique business challenges using the cloud. In his free time, he enjoys spending time with family and friends, and exploring new places.

Temi Aremu is a Solutions Architect at AWS Canada. She is passionate about helping customers solve their business problems with the power of the AWS Cloud. Temi’s areas of interest are analytics, machine learning, and empowering the next generation of women in STEM.

Vineet Kachhawaha is a Sr. Solutions Architect at AWS focusing on AI/ML and generative AI. He co-leads the AWS for Legal Tech team within AWS. He is passionate about working with enterprise customers and partners to design, deploy, and scale AI/ML applications to derive business value.

Vijai Gandikota is a Principal Product Manager in the Amazon Q and Amazon Kendra organization of Amazon Web Services. He is responsible for the Amazon Q and Amazon Kendra connectors, ingestion, security, and other aspects of the Amazon Q and Amazon Kendra services.

Dipti Kulkarni is a Software Development Manager on the Amazon Q and Amazon Kendra engineering team of Amazon Web Services, where she manages the connector development and integration teams.

Accelerate custom labeling workflows in Amazon SageMaker Ground Truth without using AWS Lambda

October 31, 2024

by Sundar Raghavan Amazon AWS

Amazon SageMaker Ground Truth enables the creation of high-quality, large-scale training datasets, essential for fine-tuning across a wide range of applications, including large language models (LLMs) and generative AI. By integrating human annotators with machine learning, SageMaker Ground Truth significantly reduces the cost and time required for data labeling. Whether it’s annotating images, videos, or text, SageMaker Ground Truth allows you to build accurate datasets while maintaining human oversight and feedback at scale. This human-in-the-loop approach is crucial for aligning foundation models with human preferences, enhancing their ability to perform tasks tailored to your specific requirements.

To support various labeling needs, SageMaker Ground Truth provides built-in workflows for common tasks like image classification, object detection, and semantic segmentation. Additionally, it offers the flexibility to create custom workflows, enabling you to design your own UI templates for specialized data labeling tasks, tailored to your unique requirements.

Previously, setting up a custom labeling job required specifying two AWS Lambda functions: a pre-annotation function, which is run on each dataset object before it’s sent to workers, and a post-annotation function, which is run on the annotations of each dataset object and consolidates multiple worker annotations if needed. Although these functions offer valuable customization capabilities, they also add complexity for users who don’t require additional data manipulation. In these cases, you would have to write functions that merely returned your input unchanged, increasing development effort and the potential for errors when integrating the Lambda functions with the UI template and input manifest file.

Today, we’re pleased to announce that you no longer need to provide pre-annotation and post-annotation Lambda functions when creating custom SageMaker Ground Truth labeling jobs. These functions are now optional on both the SageMaker console and the CreateLabelingJob API. This means you can create custom labeling workflows more efficiently when you don’t require extra data processing.

In this post, we show you how to set up a custom labeling job without Lambda functions using SageMaker Ground Truth. We guide you through configuring the workflow using a multimodal content evaluation template, explain how it works without Lambda functions, and highlight the benefits of this new capability.

Solution overview

When you omit the Lambda functions in a custom labeling job, the workflow simplifies:

No pre-annotation function – The data from the input manifest file is inserted directly into the UI template. You can reference the data object fields in your template without needing a Lambda function to map them.
No post-annotation function – Each worker’s annotation is saved directly to your specified Amazon Simple Storage Service (Amazon S3) bucket as an individual JSON file, with the annotation stored under a worker-response key. Without a post-annotation Lambda function, the output manifest file references these worker response files instead of including all annotations directly within the manifest.

In the following sections, we walk through how to set up a custom labeling job without Lambda functions using a multimodal content evaluation template, which allows you to evaluate model-generated descriptions of images. Annotators can review an image, a prompt, and the model’s response, then evaluate the response based on criteria such as accuracy, relevance, and clarity. This provides crucial human feedback for fine-tuning models using Reinforcement Learning from Human Feedback (RLHF) or evaluating LLMs.

Prepare the input manifest file

To set up our labeling job, we begin by preparing the input manifest file that the template will use. The input manifest is a JSON Lines file where each line represents a dataset item to be labeled. Each line contains a source field for embedded data or a source-ref field for references to data stored in Amazon S3. These fields are used to provide the data objects that annotators will label. For detailed information on the input manifest file structure, refer to Input manifest files.

For our specific task—evaluating model-generated descriptions of images—we structure the input manifest to include the following fields:

“source” – The prompt provided to the model
“image” – The S3 URI of the image associated with the prompt
“modelResponse” – The model’s generated description of the image

By including these fields, we’re able to present both the prompt and the related data directly to the annotators within the UI template. This approach eliminates the need for a pre-annotation Lambda function because all necessary information is readily accessible in the manifest file.

The following code is an example of what a line in our input manifest might look like:

{
  "source": "Describe the following image in four lines",
  "image": "s3://your-bucket-name/path-to-image/image.jpeg",
  "modelResponse": "The image features a stylish pair of over-ear headphones with cushioned ear cups and a tan leather headband on a wooden desk. Soft natural light fills a cozy home office, with a laptop, smartphone, and notebook nearby. A cup of coffee and a pen add to the workspace's relaxed vibe. The setting blends modern tech with a warm, inviting atmosphere."
}
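
The example above is spread over several lines for readability, but in a JSON Lines manifest each record must occupy exactly one line. The following is a minimal sketch of how you might generate such a file in Python; the function names are our own, and the field names match the ones described earlier.

```python
import json


def manifest_line(prompt, image_s3_uri, model_response):
    """Serialize one dataset item as a single JSON Lines record."""
    record = {
        "source": prompt,
        "image": image_s3_uri,
        "modelResponse": model_response,
    }
    # json.dumps escapes newlines inside strings, so the record stays on one line.
    return json.dumps(record)


def build_manifest(items):
    """Turn (prompt, image URI, model response) tuples into manifest text."""
    return "".join(manifest_line(*item) + "\n" for item in items)


# Usage: write the manifest locally, then upload it to your S3 bucket.
# with open("input.manifest", "w", encoding="utf-8") as f:
#     f.write(build_manifest([
#         ("Describe the following image in four lines",
#          "s3://your-bucket-name/path-to-image/image.jpeg",
#          "The image features a stylish pair of over-ear headphones..."),
#     ]))
```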

Insert the prompt in the UI template

In your UI template, you can insert the prompt using {{ task.input.source }}, display the image using an <img> tag with src="{{ task.input.image | grant_read_access }}" (the grant_read_access Liquid filter provides the worker with access to the S3 object), and show the model’s response with {{ task.input.modelResponse }}. Annotators can then evaluate the model’s response based on predefined criteria, such as accuracy, relevance, and clarity, using tools like sliders or text input fields for additional comments. You can find the complete UI template for this task in our GitHub repository.
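
To make the placeholders concrete, the following sketch holds a stripped-down template in a Python string and writes it to disk for upload to Amazon S3. It is not the template from the GitHub repository: the form field names (accuracy, comments) and the layout are hypothetical, and only the three task.input placeholders come from this post.

```python
from pathlib import Path

# Illustrative template only; the complete template is in the GitHub repository.
TEMPLATE = """\
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
  <p><strong>Prompt:</strong> {{ task.input.source }}</p>
  <img src="{{ task.input.image | grant_read_access }}" style="max-width: 100%">
  <p><strong>Model response:</strong> {{ task.input.modelResponse }}</p>

  <!-- The field names below (accuracy, comments) are hypothetical -->
  <p>Accuracy (1 = poor, 5 = excellent)</p>
  <crowd-slider name="accuracy" min="1" max="5" step="1" pin="true" required></crowd-slider>
  <crowd-text-area name="comments" placeholder="Optional comments"></crowd-text-area>
</crowd-form>
"""


def write_template(path):
    """Write the template to disk so it can be uploaded to Amazon S3."""
    Path(path).write_text(TEMPLATE, encoding="utf-8")
    return path


# Usage:
# write_template("template.liquid.html")
```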

Create the labeling job on the SageMaker console

To configure the labeling job using the AWS Management Console, complete the following steps:

On the SageMaker console, under Ground Truth in the navigation pane, choose Labeling job.
Choose Create labeling job.
Specify your input manifest location and output path.
Select Custom as the task type.
Choose Next.
Enter a task title and description.
Under Template, upload your UI template.

The annotation Lambda functions are now an optional setting under Additional configuration.

Choose Preview to display the UI template for review.

Choose Create to create the labeling job.

Create the labeling job using the CreateLabelingJob API

You can also create the custom labeling job programmatically by using the AWS SDK to invoke the CreateLabelingJob API. After uploading the input manifest files to an S3 bucket and setting up a work team, you can define your labeling job in code, omitting the Lambda function parameters if they’re not needed. The following example demonstrates how to do this using Python and Boto3.

In the API, the pre-annotation Lambda function is specified using the PreHumanTaskLambdaArn parameter within the HumanTaskConfig structure. The post-annotation Lambda function is specified using the AnnotationConsolidationLambdaArn parameter within the AnnotationConsolidationConfig structure. With the recent update, both PreHumanTaskLambdaArn and AnnotationConsolidationConfig are now optional. This means you can omit them if your labeling workflow doesn’t require additional data preprocessing or postprocessing.

The following code is an example of how to create a labeling job without specifying the Lambda functions:

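import boto3

# Create the SageMaker client used to call the CreateLabelingJob API
sagemaker = boto3.client("sagemaker")

response = sagemaker.create_labeling_job(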
    LabelingJobName="Lambda-free-job-demo",
    LabelAttributeName="label",
    InputConfig={
        "DataSource": {
            "S3DataSource": {
                "ManifestS3Uri": "s3://customer-bucket/path-to-manifest"
            }
        }
    },
    OutputConfig={
        "S3OutputPath": "s3://customer-bucket/path-to-output-file"
    },
    RoleArn="arn:aws:iam::012345678910:role/CustomerRole",

    # Notice, no PreHumanTaskLambdaArn or AnnotationConsolidationConfig!
    HumanTaskConfig={
        "TaskAvailabilityLifetimeInSeconds": 21600,
        "TaskTimeLimitInSeconds": 3600,
        "WorkteamArn": "arn:aws:sagemaker:us-west-2:058264523720:workteam/private-crowd/customer-work-team-name",
        "TaskDescription": "Evaluate model-generated text responses based on a reference image.",
        "MaxConcurrentTaskCount": 1000,
        "TaskTitle": "Evaluate Model Responses Based on Image References",
        "NumberOfHumanWorkersPerDataObject": 1,
        "UiConfig": {
            "UiTemplateS3Uri": "s3://customer-bucket/path-to-ui-template"
        }
    }
)

When the annotators submit their evaluations, their responses are saved directly to your specified S3 bucket. The output manifest file includes the original data fields and a worker-response-ref that points to a worker response file in S3. This worker response file contains all the annotations for that data object. If multiple annotators have worked on the same data object, their individual annotations are included within this file under an answers key, which is an array of responses. Each response includes the annotator’s input and metadata such as acceptance time, submission time, and worker ID.

This means that all annotations for a given data object are collected in one place, allowing you to process or analyze them later according to your specific requirements, without needing a post-annotation Lambda function. You have access to all the raw annotations and can perform any necessary consolidation or aggregation as part of your post-processing workflow.
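
As a sketch of what such post-processing could look like, the following Python helpers locate the worker-response-ref entries in an output manifest record and average one numeric rating across annotators. The answers key comes from the description above; the answerContent key and the accuracy field are assumptions about how the worker response file and your template are structured, so adjust them to match your actual output.

```python
import json
from statistics import mean


def find_worker_response_refs(manifest_record):
    """Collect every "worker-response-ref" value found anywhere in a manifest record."""
    refs = []
    if isinstance(manifest_record, dict):
        for key, value in manifest_record.items():
            if key == "worker-response-ref":
                refs.append(value)
            else:
                refs.extend(find_worker_response_refs(value))
    elif isinstance(manifest_record, list):
        for item in manifest_record:
            refs.extend(find_worker_response_refs(item))
    return refs


def average_rating(worker_response, field):
    """Average one numeric field across all answers, or return None if absent."""
    values = []
    for answer in worker_response.get("answers", []):
        content = answer.get("answerContent", {})  # assumed key for the annotator's input
        if field in content:
            values.append(float(content[field]))
    return mean(values) if values else None


# Usage: parse one output manifest line, download each referenced file from S3,
# then consolidate it.
# record = json.loads(manifest_line_text)
# for ref in find_worker_response_refs(record):
#     ...  # download and json.load the file at ref, then call average_rating(...)
```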

Benefits of labeling jobs without Lambda functions

Creating custom labeling jobs without Lambda functions offers several benefits:

Simplified setup – You can create custom labeling jobs more quickly by skipping the creation and configuration of Lambda functions when they’re not needed.
Time savings – Reducing the number of components in your labeling workflow saves development and debugging time.
Reduced complexity – Fewer moving parts mean a lower chance of encountering configuration errors or integration issues.
Cost reduction – By not using Lambda functions, you reduce the associated costs of deploying and invoking these resources.
Flexibility – You retain the ability to use Lambda functions for preprocessing and annotation consolidation when your project requires these capabilities. This update offers simplicity for straightforward tasks and flexibility for more complex requirements.

This feature is currently available in all AWS Regions that support SageMaker Ground Truth. In the future, look out for built-in task types that don’t require annotation Lambda functions, providing a simplified experience for SageMaker Ground Truth across the board.

Conclusion

The introduction of workflows for custom labeling jobs in SageMaker Ground Truth without Lambda functions significantly simplifies the data labeling process. By making Lambda functions optional, we’ve made it simpler and faster to set up custom labeling jobs, reducing potential errors and saving valuable time.

This update maintains the flexibility of custom workflows while removing unnecessary steps for those who don’t require specialized data processing. Whether you’re conducting simple labeling tasks or complex multi-stage annotations, SageMaker Ground Truth now offers a more streamlined path to high-quality labeled data.

We encourage you to explore this new feature and see how it can enhance your data labeling workflows. To get started, check out the following resources:

Browse over 80 available UI templates to suit your labeling needs on GitHub
Follow the step-by-step guide on creating custom labeling workflows to tailor your data labeling tasks

About the Authors

Sundar Raghavan is an AI/ML Specialist Solutions Architect at AWS, helping customers leverage SageMaker and Bedrock to build scalable and cost-efficient pipelines for computer vision applications, natural language processing, and generative AI. In his free time, Sundar loves exploring new places, sampling local eateries and embracing the great outdoors.

Alan Ismaiel is a software engineer at AWS based in New York City. He focuses on building and maintaining scalable AI/ML products, like Amazon SageMaker Ground Truth and Amazon Bedrock Model Evaluation. Outside of work, Alan is learning how to play pickleball, with mixed results.

Yinan Lang is a software engineer at AWS GroundTruth. He worked on GroundTruth, MechanicalTurk and Bedrock infrastructure, as well as customer facing projects for GroundTruth Plus. He also focuses on product security and worked on fixing risks and creating security tests. In his leisure time, he is an audiophile and particularly loves to practice keyboard compositions by Bach.

George King is a summer 2024 intern at Amazon AI. He studies Computer Science and Math at the University of Washington and is currently between his second and third year. George loves being outdoors, playing games (chess and all kinds of card games), and exploring Seattle, where he has lived his entire life.

Unlock organizational wisdom using voice-driven knowledge capture with Amazon Transcribe and Amazon Bedrock

October 30, 2024

by Jundong Qiao Amazon AWS

Preserving and taking advantage of institutional knowledge is critical for organizational success and adaptability. This collective wisdom, comprising insights and experiences accumulated by employees over time, often exists as tacit knowledge passed down informally. Formalizing and documenting this invaluable resource can help organizations maintain institutional memory, drive innovation, enhance decision-making processes, and accelerate onboarding for new employees. However, effectively capturing and documenting this knowledge presents significant challenges. Traditional methods, such as manual documentation or interviews, are often time-consuming, inconsistent, and prone to errors. Moreover, the most valuable knowledge frequently resides in the minds of seasoned employees, who may find it difficult to articulate or lack the time to document their expertise comprehensively.

This post introduces an innovative voice-based application workflow that harnesses the power of Amazon Bedrock, Amazon Transcribe, and React to systematically capture and document institutional knowledge through voice recordings from experienced staff members. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Our solution uses Amazon Transcribe for real-time speech-to-text conversion, enabling accurate and immediate documentation of spoken knowledge. We then use generative AI, powered by Amazon Bedrock, to analyze and summarize the transcribed content, extracting key insights and generating comprehensive documentation.

The front-end of our application is built using React, a popular JavaScript library for creating dynamic UIs. This React-based UI seamlessly integrates with Amazon Transcribe, providing users with a real-time transcription experience. As employees speak, they can observe their words converted to text in real time, permitting immediate review and editing.

By combining the React front-end UI with Amazon Transcribe and Amazon Bedrock, we’ve created a comprehensive solution for capturing, processing, and preserving valuable institutional knowledge. This approach not only streamlines the documentation process but also enhances the quality and accessibility of the captured information, supporting operational excellence and fostering a culture of continuous learning and improvement within organizations.

Solution overview

This solution combines several AWS services, including Amazon Transcribe, Amazon Bedrock, AWS Lambda, Amazon Simple Storage Service (Amazon S3), and Amazon CloudFront, to deliver real-time transcription and document generation. Together, these components create a seamless knowledge capture process:

User interface – A React-based front-end, distributed through Amazon CloudFront, provides an intuitive interface for employees to input voice data.
Real-time transcription – Amazon Transcribe streaming converts speech to text in real time, providing accurate and immediate transcription of spoken knowledge.
Intelligent processing – A Lambda function, powered by generative AI models through Amazon Bedrock, analyzes and summarizes the transcribed text. It goes beyond simple summarization by performing the following actions:
- Extracting key concepts and terminologies.
- Structuring the information into a coherent, well-organized document.
Secure storage – Raw audio files, processed information, summaries, and generated content are securely stored in Amazon S3, providing scalable and durable storage for this valuable knowledge repository. S3 bucket policies and encryption are implemented to enforce data security and compliance.

This solution uses a custom authorization Lambda function with Amazon API Gateway instead of more comprehensive identity management solutions such as Amazon Cognito. This approach was chosen for several reasons:

Simplicity – As a sample application, it doesn’t demand full user management or login functionality
Minimal user friction – Users don’t need to create accounts or log in, simplifying the user experience
Quick implementation – For rapid prototyping, this approach can be faster to implement than setting up a full user management system
Temporary credential management – Businesses can use this approach to offer secure, temporary access to AWS services without embedding long-term credentials in the application

Although this solution works well for this specific use case, it’s important to note that for production applications, especially those dealing with sensitive data or needing user-specific functionality, a more robust identity solution such as Amazon Cognito would typically be recommended.

The following diagram illustrates the architecture of our solution.

The workflow includes the following steps:

Users access the front-end UI application, which is distributed through CloudFront
The React web application sends an initial request to Amazon API Gateway
API Gateway forwards the request to the authorization Lambda function
The authorization function checks the request against the AWS Identity and Access Management (IAM) role to confirm proper permissions
The authorization function sends temporary credentials back to the front-end application through API Gateway
With the temporary credentials, the React web application communicates directly with Amazon Transcribe for real-time speech-to-text conversion as the user records their input
After recording and transcription, the user sends (through the front-end UI) the transcribed texts and audio files to the backend through API Gateway
API Gateway routes the authorized request (containing transcribed text and audio files) to the orchestration Lambda function
The orchestration function sends the transcribed text to Amazon Bedrock for summarization
The orchestration function receives the summarized text from Amazon Bedrock and uses it to generate content
The orchestration function stores the generated PDF files and recorded audio files in the artifacts S3 bucket
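
To illustrate the temporary-credentials steps in this workflow, the following is a minimal Python sketch of how an authorization function could assume an IAM role through AWS STS and return short-lived credentials to the front end. This is an assumption about one possible implementation, not the code from the repository: the function name, role ARN, and session name are hypothetical, and the response field names simply mirror the ones the React snippet later in this post reads.

```python
import json


def temporary_credentials_response(sts_client, role_arn, region, duration_seconds=900):
    """Assume a role and shape the temporary credentials for the front end."""
    assumed = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName="transcribe-streaming-session",  # hypothetical session name
        DurationSeconds=duration_seconds,  # 900 seconds is the minimum STS allows
    )
    creds = assumed["Credentials"]
    body = {
        "AccessKeyId": creds["AccessKeyId"],
        "SecretAccessKey": creds["SecretAccessKey"],
        "SessionToken": creds["SessionToken"],
        "Region": region,
    }
    # Lambda proxy integrations with API Gateway expect a statusCode and a string body.
    return {"statusCode": 200, "body": json.dumps(body)}


# Usage inside a Lambda handler (requires boto3; the role ARN is a placeholder):
# import boto3
# def lambda_handler(event, context):
#     return temporary_credentials_response(
#         boto3.client("sts"), "arn:aws:iam::111122223333:role/ExampleRole", "us-east-1"
#     )
```

Scoping the assumed role to Amazon Transcribe streaming only keeps the credentials handed to the browser as narrow as possible.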

Prerequisites

You need the following prerequisites:

An active AWS account
Docker installed
The AWS CDK Toolkit 2.114.1+ installed and bootstrapped to the us-east-1 AWS Region
Python 3.12+ installed
Model access to Anthropic’s Claude enabled in Amazon Bedrock
An IAM user or role with access to Amazon Transcribe, Amazon Bedrock, Amazon S3, and Lambda

Deploy the solution with the AWS CDK

The AWS Cloud Development Kit (AWS CDK) is an open source software development framework for defining cloud infrastructure as code and provisioning it through AWS CloudFormation. Our AWS CDK stack deploys resources from the following AWS services:

Amazon Bedrock
Amazon CloudFront
AWS CodeBuild
Amazon EventBridge
IAM
AWS Key Management Service (AWS KMS)
AWS Lambda
Amazon S3
AWS Systems Manager Parameter Store
Amazon Transcribe
AWS WAF

To deploy the solution, complete the following steps:

Clone the GitHub repository: genai-knowledge-capture-webapp
Follow the Prerequisites section in the README.md file to set up your local environment

As of this writing, this solution supports deployment to the us-east-1 Region. The CloudFront distribution in this solution is geo-restricted to the US and Canada by default. To change this configuration, refer to the react-app-deploy.ts file in the GitHub repo.

Invoke npm install to install the dependencies
Invoke cdk deploy to deploy the solution

The deployment process typically takes 20–30 minutes. When the deployment is complete, CodeBuild will build and deploy the React application, which typically takes 2–3 minutes. After that, you can access the UI at the ReactAppUrl URL that is output by the AWS CDK.

Amazon Transcribe Streaming within React application

Our solution’s front-end is built using React, a popular JavaScript library for creating dynamic user interfaces. We integrate Amazon Transcribe streaming into our React application using the @aws-sdk/client-transcribe-streaming library. This integration enables real-time speech-to-text functionality, so users can observe their spoken words converted to text instantly.

The real-time transcription offers several benefits for knowledge capture:

With the immediate feedback, speakers can correct or clarify their statements in the moment
The visual representation of spoken words can help maintain focus and structure in the knowledge sharing process
It reduces the cognitive load on the speaker, who doesn’t need to worry about note-taking or remembering key points

In this solution, the Amazon Transcribe client is managed in a reusable React hook, useAudioTranscription.ts. An additional React hook, useAudioProcessing.ts, implements the necessary audio stream processing. Refer to the GitHub repo for more information. The following is a simplified code snippet demonstrating the Amazon Transcribe client integration:

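// Import the streaming client and command from the AWS SDK for JavaScript v3
import {
  TranscribeStreamingClient,
  StartStreamTranscriptionCommand,
} from "@aws-sdk/client-transcribe-streaming";

// Create Transcribe client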
transcribeClientRef.current = new TranscribeStreamingClient({
  region: credentials.Region,
  credentials: {
    accessKeyId: credentials.AccessKeyId,
    secretAccessKey: credentials.SecretAccessKey,
    sessionToken: credentials.SessionToken,
  },
});

// Create Transcribe Start Command
const transcribeStartCommand = new StartStreamTranscriptionCommand({
  LanguageCode: transcribeLanguage,
  MediaEncoding: audioEncodingType,
  MediaSampleRateHertz: audioSampleRate,
  AudioStream: getAudioStreamGenerator(),
});

// Start Transcribe session
const data = await transcribeClientRef.current.send(
  transcribeStartCommand
);
console.log("Transcribe session established ", data.SessionId);
setIsTranscribing(true);

// Process Transcribe result stream
if (data.TranscriptResultStream) {
  try {
    for await (const event of data.TranscriptResultStream) {
      handleTranscriptEvent(event, setTranscribeResponse);
    }
  } catch (error) {
    console.error("Error processing transcript result stream:", error);
  }
}

For optimal results, we recommend using a good-quality microphone and speaking clearly. At the time of writing, the system supports major dialects of English, with plans to expand language support in future updates.

Use the application

After deployment, open the ReactAppUrl link (https://<cloud front domain name>.cloudfront.net) in your browser (the solution supports Chrome, Firefox, Edge, Safari, and Brave browsers on Mac and Windows). A web UI opens, as shown in the following screenshot.

To use this application, complete the following steps:

Enter a question or topic.
Enter a file name for the document.
Choose Start Transcription and start recording your input for the given question or topic. The transcribed text will be shown in the Transcription box in real time.
After recording, you can edit the transcribed text.
You can also choose the play icon to play the recorded audio clips.
Choose Generate Document to invoke the backend service to generate a document from the input question and associated transcription. Meanwhile, the recorded audio clips are sent to an S3 bucket for future analysis.

The document generation process uses FMs from Amazon Bedrock to create a well-structured, professional document. The FM performs the following actions:

Organizes the content into logical sections with appropriate headings
Identifies and highlights important concepts or terminologies
Generates a brief executive summary at the beginning of the document
Applies consistent formatting and styling
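
The following sketch shows one way the actions listed above could be expressed as a request to the Amazon Bedrock Converse API. The prompt wording, helper names, and inference settings are our own assumptions rather than the prompt used by the solution, and the model ID is left as a placeholder.

```python
def build_document_request(question, transcript, model_id):
    """Build keyword arguments for a bedrock-runtime converse call."""
    prompt = (
        "You are documenting institutional knowledge.\n"
        f"Topic or question: {question}\n"
        f"Transcript: {transcript}\n\n"
        "Write a professional document that:\n"
        "1. Starts with a brief executive summary.\n"
        "2. Organizes the content into logical sections with headings.\n"
        "3. Highlights important concepts and terminology.\n"
        "4. Uses consistent formatting throughout."
    )
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 2048, "temperature": 0.2},  # illustrative values
    }


def extract_text(converse_response):
    """Concatenate the text blocks of a converse response."""
    parts = converse_response["output"]["message"]["content"]
    return "".join(part.get("text", "") for part in parts)


# Usage (requires boto3, AWS credentials, and model access; the model ID is a placeholder):
# import boto3
# request = build_document_request("How do we onboard vendors?", transcript_text, "<model-id>")
# print(extract_text(boto3.client("bedrock-runtime").converse(**request)))
```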

The audio files and generated documents are stored in a dedicated S3 bucket, as shown in the following screenshot, with appropriate encryption and access controls in place.

After you generate the document, choose View Document to open the resulting PDF in your browser. The document is generated from the user’s input and is accessed through a presigned URL.

Additional information

To further enhance your knowledge capture solution and address specific use cases, consider the additional features and best practices discussed in this section.

Custom vocabulary with Amazon Transcribe

For industries with specialized terminology, Amazon Transcribe offers a custom vocabulary feature. You can define industry-specific terms, acronyms, and phrases to improve transcription accuracy. To implement this, complete the following steps:

Create a custom vocabulary file with your specialized terms
Use the Amazon Transcribe API to add this vocabulary to your account
Specify the custom vocabulary in your transcription requests
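
The steps above can be sketched in Python as follows. As a simpler alternative to a vocabulary file, this example passes the terms inline through the Phrases parameter of the Amazon Transcribe CreateVocabulary operation; the vocabulary name and terms are hypothetical, and the helper name is our own. Multi-word terms are joined with hyphens because phrases can’t contain spaces.

```python
def create_vocabulary_request(name, language_code, terms):
    """Build keyword arguments for a transcribe create_vocabulary call."""
    # Trim whitespace and join multi-word terms with hyphens.
    phrases = [term.strip().replace(" ", "-") for term in terms if term.strip()]
    return {
        "VocabularyName": name,
        "LanguageCode": language_code,
        "Phrases": phrases,
    }


# Usage (requires boto3 and AWS credentials; the names are placeholders):
# import boto3
# transcribe = boto3.client("transcribe")
# transcribe.create_vocabulary(
#     **create_vocabulary_request("knowledge-capture-terms", "en-US", ["Amazon Bedrock", "RLHF"])
# )
# # Wait until get_vocabulary reports VocabularyState == "READY" before using it.
```

After the vocabulary is ready, you can reference it in a streaming session by adding the VocabularyName parameter to the StartStreamTranscriptionCommand input.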

Asynchronous file uploads

For handling large audio files or improving user experience, implement an asynchronous upload process:

Create a separate Lambda function for file uploads
Use Amazon S3 presigned URLs to allow direct uploads from the client to Amazon S3
Invoke the upload Lambda function using S3 Event Notifications
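
As a rough sketch of this flow in Python, one function can issue a presigned PUT URL for the client, and a Lambda handler can read the bucket and key from the S3 event notification after the upload completes. The function names, the expiry time, and the handler's return value are assumptions for illustration, and s3_client is expected to be a client created with boto3.client("s3").

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def create_upload_url(
    s3_client: Any, bucket: str, key: str, expires_in: int = 300
) -> str:
    """Return a presigned URL that the browser can HTTP PUT the file to."""
    return s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # seconds; an assumed default
    )


def extract_uploaded_objects(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    uploaded = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3_info["object"]["key"])
        uploaded.append((s3_info["bucket"]["name"], key))
    return uploaded


def handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
    """Lambda entry point invoked by the S3 event notification."""
    uploaded = extract_uploaded_objects(event)
    for bucket, key in uploaded:
        # Placeholder for real work, such as starting a transcription job.
        print(f"New upload: s3://{bucket}/{key}")
    return {"processed": len(uploaded)}
```

Because the client uploads straight to Amazon S3, the audio payload does not pass through the Lambda function that issues the URL, and the follow-up processing runs only after the object exists.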

Multi-topic document generation

For generating comprehensive documents covering multiple topics, refer to the following AWS Prescriptive Guidance pattern: Document institutional knowledge from voice inputs by using Amazon Bedrock and Amazon Transcribe. This pattern provides a scalable approach to combining multiple voice inputs into a single, coherent document.

Key benefits of this approach include:

Efficient capture of complex, multifaceted knowledge
Improved document structure and coherence
Reduced cognitive load on subject matter experts (SMEs)

Use captured knowledge as a knowledge base

The knowledge captured through this solution can serve as a valuable, searchable knowledge base for your organization. To maximize its utility, you can integrate with enterprise search solutions such as Amazon Bedrock Knowledge Bases to make the captured knowledge quickly discoverable. Additionally, you can set up regular review and update cycles to keep the knowledge base current and relevant.
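
As an illustration, assuming the generated documents have already been ingested into an Amazon Bedrock knowledge base, a query against it could look like the following Python sketch. The function name and the shape of the returned records are hypothetical, and agent_runtime is expected to be a client created with boto3.client("bedrock-agent-runtime").

```python
from typing import Any, Dict, List


def search_knowledge_base(
    agent_runtime: Any, knowledge_base_id: str, query: str, max_results: int = 5
) -> List[Dict[str, Any]]:
    """Return the text and relevance score of the passages matching a query."""
    response = agent_runtime.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": max_results}
        },
    )
    return [
        {"text": result["content"]["text"], "score": result.get("score")}
        for result in response.get("retrievalResults", [])
    ]
```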

Clean up

When you’re done testing the solution, remove it from your AWS account to avoid future costs:

Invoke cdk destroy to remove the solution
You may also need to manually remove the S3 buckets created by the solution
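
If a bucket remains after the stack is destroyed, one way to remove it is the Python sketch below. The function name is hypothetical, and bucket is expected to be a boto3 Bucket resource, for example boto3.resource("s3").Bucket("your-bucket-name"). The operation permanently deletes the recorded audio and generated documents, so confirm the bucket name before running it.

```python
from typing import Any


def empty_and_delete_bucket(bucket: Any) -> None:
    """Permanently delete every object in the bucket, then the bucket itself."""
    # Delete current objects first.
    bucket.objects.all().delete()
    # Delete remaining object versions and delete markers (versioned buckets).
    bucket.object_versions.delete()
    # A bucket can only be deleted once it is empty.
    bucket.delete()
```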

Summary

This post demonstrates the power of combining AWS services such as Amazon Transcribe and Amazon Bedrock with popular front-end frameworks such as React to create a robust knowledge capture solution. By using real-time transcription and generative AI, organizations can efficiently document and preserve valuable institutional knowledge, fostering innovation, improving decision-making, and maintaining a competitive edge in dynamic business environments.

We encourage you to explore this solution further by deploying it in your own environment and adapting it to your organization’s specific needs. The source code and detailed instructions are available in our genai-knowledge-capture-webapp GitHub repository, providing a solid foundation for your knowledge capture initiatives.

By embracing this approach to knowledge capture, organizations can put their collective expertise to work and drive continuous improvement.

About the Authors

Jundong Qiao is a Machine Learning Engineer at AWS Professional Services, where he specializes in implementing and enhancing AI/ML capabilities across various sectors. His expertise encompasses building next-generation AI solutions, including chatbots and predictive models that drive efficiency and innovation.

Michael Massey is a Cloud Application Architect at Amazon Web Services. He helps AWS customers achieve their goals by building highly available and highly scalable solutions on the AWS Cloud.

Praveen Kumar Jeyarajan is a Principal DevOps Consultant at AWS, supporting enterprise customers on their journey to the cloud. He has 13+ years of DevOps experience and is skilled in solving myriad technical challenges using the latest technologies. He holds a Master's degree in Software Engineering. Outside of work, he enjoys watching movies and playing tennis.