Amazon AWS – Page 52

Achieve multi-Region resiliency for your conversational AI chatbots with Amazon Lex

October 30, 2024

by Sanjeet Sanda Amazon AWS

Global Resiliency is a new Amazon Lex capability that enables near real-time replication of your Amazon Lex V2 bots in a second AWS Region. When you activate this feature, all resources, versions, and aliases associated after activation will be synchronized across the chosen Regions. With Global Resiliency, the replicated bot resources and aliases in the second Region will have the same identifiers as those in the source Region. This consistency allows you to seamlessly route traffic to any Region by simply changing the Region identifier, providing uninterrupted service availability. In the event of a Regional outage or disruption, you can swiftly redirect your bot traffic to a different Region. Applications now have the ability to use replicated Amazon Lex bots across Regions in an active-active or active-passive manner for improved availability and resiliency. With Global Resiliency, you no longer need to manually manage separate bots across Regions, because the feature automatically replicates and keeps Regional configurations in sync. With just a few clicks or commands, you gain robust Amazon Lex bot replication capabilities. Applications that are using Amazon Lex bots can now fail over from an impaired Region seamlessly, minimizing the risk of costly downtime and maintaining business continuity. This feature streamlines the process of maintaining robust and highly available conversational applications. These include interactive voice response (IVR) systems, chatbots for digital channels, and messaging platforms, providing a seamless and resilient customer experience.

In this post, we walk you through enabling Global Resiliency for a sample Amazon Lex V2 bot. We showcase the replication process of bot versions and aliases across multiple Regions. Additionally, we discuss how to handle integrations with AWS Lambda and Amazon CloudWatch after enabling Global Resiliency.

Solution overview

For this exercise, we create a BookHotel bot as our sample bot. We use an AWS CloudFormation template to build this bot, including defining intents, slots, and other required components such as a version and alias. Throughout our demonstration, we use the us-east-1 Region as the source Region, and we replicate the bot in the us-west-2 Region, which serves as the replica Region. We then replicate this bot, enable logging, and integrate it with a Lambda function.

To better understand the solution, refer to the following architecture diagram.

Enabling Global Resiliency for an Amazon Lex bot is straightforward using the AWS Management Console, AWS Command Line Interface (AWS CLI), or APIs. We walk through the instructions to replicate the bot later in this post.
After replication is successfully enabled, the bot will be replicated across Regions, providing a unified experience. This allows you to distribute IVR or chat application requests between Regions in either an active-active or active-passive setup, depending on your use case.
A key benefit of Global Resiliency is that developers can continuously work on bot improvements in the source Region, and changes are automatically synchronized to the replica Region. This streamlines the development workflow without compromising resiliency.

At the time of writing, Global Resiliency only works with predetermined pairs of Regions. For more information, see Use Global Resiliency to deploy bots to other Regions.

Prerequisites

You should have the following prerequisites:

An AWS account with administrator access
Access to Amazon Lex Global Resiliency (contact your Amazon Connect Solutions Architect or Technical Account Manager)
Working knowledge of the following services:
- AWS CloudFormation
- Amazon CloudWatch
- AWS Lambda
- Amazon Lex

Create a sample Amazon Lex bot

To set up a sample bot for our use case, refer to Manage your Amazon Lex bot via AWS CloudFormation templates. For this example, we create a bot named BookHotel in the source Region (us-east-1). Complete the following steps:

Download the CloudFormation template and deploy it in the source Region (us-east-1). For instructions, see Create a stack from the CloudFormation console.

Upon successful deployment, the BookHotel bot will be created in the source Region.

On the Amazon Lex console, choose Bots in the navigation pane and locate the BookHotel.

Verify that the Global Resiliency option is available under Deployment in the navigation pane. If this option isn’t visible, the Global Resiliency feature may not be enabled for your account. In this case, refer to the prerequisites section for enabling the Global Resiliency feature.

Our sample BookHotel bot has one version (Version 1, in addition to the draft version) and an alias named BookHotelDemoAlias (in addition to the TestBotAlias).

Enable Global Resiliency

To activate Global Resiliency and set up bot replication in a replica Region, complete the following steps:

On the Amazon Lex console, choose us-east-1 as your Region.
Choose Bots in the navigation pane and locate the BookHotel.
Under Deployment in the navigation pane, choose Global Resiliency.

You can see the replication details here. Because you haven’t enabled Global Resiliency yet, all the details are blank.

Choose Create replica to create a draft version of your bot.

In your source Region (us-east-1), after the bot replication is complete, you will see Replication status as Enabled.

Switch to the replica Region (us-west-2).

You can see that the BookHotel bot is replicated. This is a read-only replica and the bot ID in the replica Region matches the bot ID in the source Region.

Under Deployment in the navigation pane, choose Global Resiliency.

You can see the replication details here, which are the same as that in the source Region BookHotel bot.

You have verified that the bot is replicated successfully after Global Resiliency is enabled. Only new versions and aliases created from this point onward will be replicated. As a next step, we create a bot version and alias to demonstrate the replication.

Create a new bot version and alias

Complete the following steps to create a new bot version and alias:

On the Amazon Lex console in your source Region (us-east-1), navigate to the BookHotel.
Choose Bot versions in the navigation pane, and choose Create new version to create Version 2.

Version 2 now has Global Resiliency enabled, whereas Version 1 and the draft version do not, because they were created prior to enabling Global Resiliency.

Choose Aliases in the navigation pane, then choose Create new alias.
Create a new alias for the BookHotel bot called BookHotelDemoAlias_GR and point that to the new version.

Similarly, the BookHotelDemoAlias_GR now has Global Resiliency enabled, whereas aliases created before enabling Global Resiliency, such as BookHotelDemoAlias and TestBotAlias, don’t have Global Resiliency enabled.

Choose Global Resiliency in the navigation pane to view the source and replication details.

The details for Last replicated version are now updated to Version 2.

Switch to the replica Region (us-west-2) and choose Global Resiliency in the navigation pane.

You can see that the new Global Resiliency enabled version (Version 2) is replicated and the new alias BookHotelDemoAlias_GR is also present.

You have verified that the new version and alias were created after Global Resiliency is replicated to the replica Region. You can now make Amazon Lex runtime calls to both Regions.

Handling integrations with Lambda and CloudWatch after enabling Global Resiliency

Amazon Lex has integrations with other AWS services such as enabling custom logic with Lambda functions and logging with conversation logs using CloudWatch and Amazon Simple Storage Service (Amazon S3). In this section, we associate a Lambda function and CloudWatch group for the BookHotel bot in the source Region (us-east-1) and validate its association in the replica Region (us-west-2).

Download the CloudFormation template to deploy a sample Lambda and CloudWatch log group.
Deploy the CloudFormation stack to the source Region (us-east-1). For instructions, see Create a stack from the CloudFormation console.

This will deploy a Lambda function (book-hotel-lambda) and a CloudWatch log group (/lex/book-hotel-bot) in the us-east-1 Region.

Deploy the CloudFormation stack to the replica Region (us-west-2).

This will deploy a Lambda function (book-hotel-lambda) and a CloudWatch log group (/lex/book-hotel-bot) in the us-west-2 Region. The Lambda function name and CloudWatch log group name must be the same in both Regions.

On the Amazon Lex console in the source Region (us-east-1), navigate to the BookHotel.
Choose Aliases in the navigation pane, and choose the BookHotelDemoAlias_GR.
In the Languages section, choose English (US).
Select the book-hotel-lambda function and associate it with the BookHotel bot by choosing Save.
Navigate back to the BookHotelDemoAlias_GR alias, and in the Conversation logs section, choose Manage conversation logs.
Enable Text logs and select the /lex/book-hotel-bot log group, then choose Save.

Conversation text logs are now enabled for the BookHotel bot in us-east-1.

Switch to the replica Region (us-west-2) and navigate to the BookHotel.
Choose Aliases in the navigation pane, and choose the BookHotelDemoAlias_GR.

You can see that the conversation logs are already associated with the /lex/book-hotel-bot CloudWatch group the us-west-2 Region.

In the Languages section, choose English (US).

You can see that the book-hotel-lambda function is associated with the BookHotel alias.

Through this process, we have demonstrated how Lambda functions and CloudWatch log groups are automatically associated with the corresponding bot resources in the replica Region for the replicated bots, providing a seamless and consistent integration across both Regions.

Disabling Global Resiliency

You have the flexibility to disable Global Resiliency at any time. By disabling Global Resiliency, your source bot, along with its associated aliases and versions, will no longer be replicated across other Regions. In this section, we demonstrate the process to disable Global Resiliency.

On the Amazon Lex console in your source Region (us-east-1), choose Bots in the navigation pane and locate the BookHotel.
Under Deployment in the navigation pane, choose Global Resiliency.
Choose Disable Global Resiliency.

Enter confirm in the confirmation box and choose Delete.

This action initiates the deletion of the replicated BookHotel bot in the replica Region.

The replication status will change to Deleting, and after a few minutes, the deletion process will be complete. You will then see the Create replica option available again. If you don’t see it, try refreshing the page.

Check the Bot versions page of the BookHotel bot to confirm that Version 2 is still the latest version.
Check the Aliases page to confirm that the BookHotelDemoAlias_GR alias is still present on the source bot.

Applications referring to this alias can continue to function as normal in the source Region.

Switch to the replica Region (us-west-2) to confirm that the BookHotel bot has been deleted from this Region.

You can reenable Global Resiliency on the source Region (us-east-1) by going through the process described earlier in this post.

Clean up

To prevent incurring charges, complete the following steps to clean up the resources created during this demonstration:

Disable Global Resiliency for the bot by following the instructions detailed earlier in this post.
Delete the book-hotel-lambda-cw-stack CloudFormation stack from the us-west-2. For instructions, see Delete a stack on the CloudFormation console.
Delete the book-hotel-lambda-cw-stack CloudFormation stack from the us-east-1.
Delete the book-hotel-stack CloudFormation stack from the us-east-1.

Integrations with Amazon Connect

Amazon Lex Global Resiliency seamlessly complements Amazon Connect Global Resiliency, providing you with a comprehensive solution for maintaining business continuity and resilience across your conversational AI and contact center infrastructure. Amazon Connect Global Resiliency enables you to automatically maintain your instances synchronized across two Regions, making sure that all configuration resources, such as contact flows, queues, and agents, are true replicas of each other.

With the addition of Amazon Lex Global Resiliency, Amazon Connect customers gain the added benefit of automated synchronization of their Amazon Lex V2 bots associated with their contact flows. This integration provides a consistent and uninterrupted experience during failover scenarios, because your Amazon Lex interactions seamlessly transition between Regions without any disruption. By combining these complementary features, you can achieve end-to-end resilience. This minimizes the risk of downtime and makes sure your conversational AI and contact center operations remain highly available and responsive, even in the case of Regional failures or capacity constraints.

Global Resiliency APIs

Global Resiliency provides API support to create and manage replicas. These are supported in the AWS CLI and AWS SDKs. In this section, we demonstrate usage with the AWS CLI.

Create a bot replica in the replica Region using the CreateBotReplica.
Monitor the bot replication status using the DescribeBotReplica.
List the replicated bots using the ListBotReplicas.
List all the version replication statuses applicable for Global Resiliency using the ListBotVersionReplicas.

This list includes only the replicated bot versions, which were created after Global Resiliency was enabled. In the API response, a botVersionReplicationStatus of Available indicates that the bot version was replicated successfully.

List all the alias replication statuses applicable for Global Resiliency using the ListBotAliasReplicas.

This list includes only the replicated bot aliases, which were created after Global Resiliency was enabled. In the API response, a botAliasReplicationStatus of Available indicates that the bot alias was replicated successfully.

Conclusion

In this post, we introduced the Global Resiliency feature for Amazon Lex V2 bots. We discussed the process to enable Global Resiliency using the console and reviewed some of the new APIs released as part of this feature.

As the next step, you can explore Global Resiliency and apply the techniques discussed in this post to replicate bots and bot versions across Regions. This hands-on practice will solidify your understanding of managing and replicating Amazon Lex V2 bots in your solution architecture.

About the Authors

Priti Aryamane is a Specialty Consultant at AWS Professional Services. With over 15 years of experience in contact centers and telecommunications, Priti specializes in helping customers achieve their desired business outcomes with customer experience on AWS using Amazon Lex, Amazon Connect, and generative AI features.

Sanjeet Sanda is a Specialty Consultant at AWS Professional Services with over 20 years of experience in telecommunications, contact center technology, and customer experience. He specializes in designing and delivering customer-centric solutions with a focus on integrating and adapting existing enterprise call centers into Amazon Connect and Amazon Lex environments. Sanjeet is passionate about streamlining adoption processes by using automation wherever possible. Outside of work, Sanjeet enjoys hanging out with his family, having barbecues, and going to the beach.

Yogesh Khemka is a Senior Software Development Engineer at AWS, where he works on large language models and natural language processing. He focuses on building systems and tooling for scalable distributed deep learning training and real-time inference.

Create and fine-tune sentence transformers for enhanced classification accuracy

October 30, 2024

by Kara Yang Amazon AWS

Sentence transformers are powerful deep learning models that convert sentences into high-quality, fixed-length embeddings, capturing their semantic meaning. These embeddings are useful for various natural language processing (NLP) tasks such as text classification, clustering, semantic search, and information retrieval.

In this post, we showcase how to fine-tune a sentence transformer specifically for classifying an Amazon product into its product category (such as toys or sporting goods). We showcase two different sentence transformers, paraphrase-MiniLM-L6-v2 and a proprietary Amazon large language model (LLM) called M5_ASIN_SMALL_V2.0, and compare their results. M5 LLMS are BERT-based LLMs fine-tuned on internal Amazon product catalog data using product title, bullet points, description, and more. They are currently being used for use cases such as automated product classification and similar product recommendations. Our hypothesis is that M5_ASIN_SMALL_V2.0 will perform better for the use case of Amazon product category classification due to it being fine-tuned with Amazon product data. We prove this hypothesis in the following experiment illustrated in this post.

Solution overview

In this post, we demonstrate how to fine-tune a sentence transformer with Amazon product data and how to use the resulting sentence transformer to improve classification accuracy of product categories using an XGBoost decision tree. For this demonstration, we use a public Amazon product dataset called Amazon Product Dataset 2020 from a kaggle competition. This dataset contains the following attributes and fields:

Domain name – amazon.com
Date range – January 1, 2020, through January 31, 2020
File extension – CSV
Available fields – Uniq Id, Product Name, Brand Name, Asin, Category, Upc Ean Code, List Price, Selling Price, Quantity, Model Number, About Product, Product Specification, Technical Details, Shipping Weight, Product Dimensions, Image, Variants, SKU, Product Url, Stock, Product Details, Dimensions, Color, Ingredients, Direction To Use, Is Amazon Seller, Size Quantity Variant, and Product Description
Label field – Category

Prerequisites

Before you begin, install the following packages. You can do this in either an Amazon SageMaker notebook or your local Jupyter notebook by running the following commands:

!pip install sentencepiece --quiet
!pip install sentence_transformers --quiet
!pip install xgboost –-quiet
!pip install scikit-learn –-quiet/

Preprocess the data

The first step needed for fine-tuning a sentence transformer is to preprocess the Amazon product data for the sentence transformer to be able to consume the data and fine-tune effectively. It involves normalizing the text data, defining the product’s main category by extracting the first category from the Category field, and selecting the most important fields from the dataset that contribute to classifying the product’s main category accurately. We use the following code for preprocessing:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = pd.read_csv('marketing_sample_for_amazon_com-ecommerce__20200101_20200131__10k_data.csv')
data.columns = data.columns.str.lower().str.replace(' ', '_')
data['main_category'] = data['category'].str.split("|").str[0]
data["all_text"] = data.apply(
    lambda r: " ".join(
        [
            str(r["product_name"]) if pd.notnull(r["product_name"]) else "",
            str(r["about_product"]) if pd.notnull(r["about_product"]) else "",
            str(r["product_specification"]) if pd.notnull(r["product_specification"]) else "",
            str(r["technical_details"]) if pd.notnull(r["technical_details"]) else ""
        ]
    ),
    axis=1
)
label_encoder = LabelEncoder()
labels_transform = label_encoder.fit_transform(data['main_category'])
data['label']=labels_transform
data[['all_text','label']]

The following screenshot shows an example of what our dataset looks like after it has been preprocessed.

Fine-tune the sentence transformer paraphrase-MiniLM-L6-v2

The first sentence transformer we fine-tune is called paraphrase-MiniLM-L6-v2. It uses the popular BERT model as its underlying architecture to transform product description text into a 384-dimensional dense vector embedding that will be consumed by our XGBoost classifier for product category classification. We use the following code to fine-tune paraphrase-MiniLM-L6-v2 using the preprocessed Amazon product data:

from sentence_transformers import SentenceTransformer
model_name='paraphrase-MiniLM-L6-v2'
model = SentenceTransformer(model_name)

The first step is to define a classification head that represents the 24 product categories that an Amazon product can be classified into. This classification head will be used to train the sentence transformer specifically to be more effective at transforming product descriptions according to the 24 product categories. The idea is that all product descriptions that are within the same category should be transformed into a vector embedding that is closer in distance compared to product descriptions that belong in different categories.

The following code is for fine-tuning sentence transformer 1:

import torch.nn as nn

# Define classification head
class ClassificationHead(nn.Module):
    def __init__(self, embedding_dim, num_classes):
        super(ClassificationHead, self).__init__()
        self.linear = nn.Linear(embedding_dim, num_classes)

    def forward(self, features):
        x = features['sentence_embedding']
        x = self.linear(x)
        return x

# Define the number of classes for a classification task.
num_classes = 24
print('class number:', num_classes)
classification_head = ClassificationHead(model.get_sentence_embedding_dimension(), num_classes)

# Combine SentenceTransformer model and classification head."
class SentenceTransformerWithHead(nn.Module):
    def __init__(self, transformer, head):
        super(SentenceTransformerWithHead, self).__init__()
        self.transformer = transformer
        self.head = head

    def forward(self, input):
        features = self.transformer(input)
        logits = self.head(features)
        return logits

model_with_head = SentenceTransformerWithHead(model, classification_head)

We then set the fine-tuning parameters. For this post, we train on five epochs, optimize for cross-entropy loss, and use the AdamW optimization method. We chose epoch 5 because, after testing various epoch values, we observed that the loss minimized at epoch 5. This made it the optimal number of training iterations for achieving the best classification results.

The following code is for fine-tuning sentence transformer 2:

import os
os.environ["TORCH_USE_CUDA_DSA"] = "1"
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from sentence_transformers import SentenceTransformer, InputExample, LoggingHandler
import torch
from torch.utils.data import DataLoader
from transformers import AdamW, get_linear_schedule_with_warmup

train_sentences = data['all_text']
train_labels = data['label']
# training parameters
num_epochs = 5
batch_size = 2
learning_rate = 2e-5

# Convert the dataset to PyTorch tensors.
train_examples = [InputExample(texts=[s], label=l) for s, l in zip(train_sentences, train_labels)]

# Customize collate_fn to convert InputExample objects into tensors.
def collate_fn(batch):
    texts = [example.texts[0] for example in batch]
    labels = torch.tensor([example.label for example in batch])
    return texts, labels

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=batch_size, collate_fn=collate_fn)

# Define the loss function, optimizer, and learning rate scheduler.
criterion = nn.CrossEntropyLoss()
optimizer = AdamW(model_with_head.parameters(), lr=learning_rate)
total_steps = len(train_dataloader) * num_epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=total_steps)

# Training loop
loss_list=[]
for epoch in range(num_epochs):
    model_with_head.train()
    for step, (texts, labels) in enumerate(train_dataloader):
        labels = labels.to(model.device)
        optimizer.zero_grad()

        # Encode text and pass through classification head.
        inputs = model.tokenize(texts)
        input_ids = inputs['input_ids'].to(model.device)
        input_attention_mask = inputs['attention_mask'].to(model.device)
        inputs_final = {'input_ids': input_ids, 'attention_mask': input_attention_mask}
        
        # move model_with_head to the same device
        model_with_head = model_with_head.to(model.device)
        logits = model_with_head(inputs_final)
        
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        scheduler.step()
        if step % 100 == 0:
            print(f"Epoch {epoch}, Step {step}, Loss: {loss.item()}")

    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss.item()}')
    model_save_path = f'./intermediate-output/epoch-{epoch}'
    model.save(model_save_path)
    loss_list.append(loss.item())
# Save the final model
model_final_save_path='st_ft_epoch_5'
model.save(model_final_save_path)

To observe whether our resulting fine-tuned sentence transformer improves our product category classification accuracy, we use it as our text embedder in the XGBoost classifier in the next step.

XGBoost classification

XGBoost (Extreme Gradient Boosting) classification is a machine learning technique used for classification tasks. It’s an implementation of the gradient boosting framework designed to be efficient, flexible, and portable. For this post, we have XGBoost consume the product description text embedding output of our sentence transformers and observe product category classification accuracy. We use the following code to use the standard paraphrase-MiniLM-L6-v2 sentence transformer before it was fine-tuned to classify Amazon products to their respective categories:

from sklearn.model_selection import train_test_split
import xgboost as xgb
from sklearn.metrics import accuracy_score

model = SentenceTransformer('paraphrase-MiniLM-L6-v2')  
data['text_embedding'] = data['all_text'].apply(lambda x: model.encode(str(x)))
text_embeddings = pd.DataFrame(data['text_embedding'].tolist(), index=data.index, dtype=float)

# Convert numeric columns stored as strings to floats
numeric_columns = ['selling_price', 'shipping_weight', 'product_dimensions']  # Add more columns as needed
for col in numeric_columns:
    data[col] = pd.to_numeric(data[col], errors='coerce')

# Convert categorical columns to category type
categorical_columns = ['model_number', 'is_amazon_seller']  # Add more columns as needed
for col in categorical_columns:
    data[col] = data[col].astype('category')
    
X_0 = data[['selling_price','model_number','is_amazon_seller']]
X = pd.concat([X_0, text_embeddings], axis=1)
label_encoder = LabelEncoder()
data['main_category_encoded'] = label_encoder.fit_transform(data['main_category'])
y = data['main_category_encoded']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Re-encode the labels to ensure they are consecutive integers starting from 0
unique_labels = sorted(set(y_train) | set(y_test))
label_mapping = {label: idx for idx, label in enumerate(unique_labels)}

y_train = y_train.map(label_mapping)
y_test = y_test.map(label_mapping)

# Enable categorical support for XGBoost
dtrain = xgb.DMatrix(X_train, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=y_test, enable_categorical=True)

param = {
    'max_depth': 6,
    'eta': 0.3,
    'objective': 'multi:softmax',
    'num_class': len(label_mapping),
    'eval_metric': 'mlogloss'
}

num_round = 100
bst = xgb.train(param, dtrain, num_round)

# Evaluate the model
y_pred = bst.predict(dtest)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

Accuracy: 0.78

We observe a 78% accuracy using the stock paraphrase-MiniLM-L6-v2 sentence transformer. To observe the results of the fine-tuned paraphrase-MiniLM-L6-v2 sentence transformer, we need to update the beginning of the code as follows. All other code remains the same.

model = SentenceTransformer('st_ft_epoch_5')  
data['text_embedding_miniLM_ft10'] = data['all_text'].apply(lambda x: model.encode(str(x)))
text_embeddings = pd.DataFrame(data['text_embedding_finetuned'].tolist(), index=data.index, dtype=float)
X_pa_finetuned = pd.concat([X_0, text_embeddings], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X_pa_finetuned, y, test_size=0.2, random_state=42)

# Re-encode the labels to ensure they are consecutive integers starting from 0
unique_labels = sorted(set(y_train) | set(y_test))
label_mapping = {label: idx for idx, label in enumerate(unique_labels)}

y_train = y_train.map(label_mapping)
y_test = y_test.map(label_mapping)

# Build and train the XGBoost model
# Enable categorical support for XGBoost
dtrain = xgb.DMatrix(X_train, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=y_test, enable_categorical=True)

param = {
    'max_depth': 6,
    'eta': 0.3,
    'objective': 'multi:softmax',
    'num_class': len(label_mapping),
    'eval_metric': 'mlogloss'
}

num_round = 100
bst = xgb.train(param, dtrain, num_round)

y_pred = bst.predict(dtest)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Optionally, convert the predicted labels back to the original category labels
inverse_label_mapping = {idx: label for label, idx in label_mapping.items()}
y_pred_labels = pd.Series(y_pred).map(inverse_label_mapping)

Accuracy: 0.94

With the fine-tuned paraphrase-MiniLM-L6-v2 sentence transformer, we observe a 94% accuracy, a 16% increase from the baseline of 78% accuracy. From this observation, we conclude that fine-tuning paraphrase-MiniLM-L6-v2 is effective for classifying Amazon product data into product categories.

Fine-tune the sentence transformer M5_ASIN_SMALL_V20

Now we create a sentence transformer from a BERT-based model called M5_ASIN_SMALL_V2.0. It’s a 40-million-parameter BERT-based model trained at M5, an internal team at Amazon specializing in fine-tuning LLMs using Amazon product data. It was distilled from a larger teacher model (approximately 5 billion parameters), which was pre-trained on a large amount of unlabeled ASIN data and pre-fine-tuned on a set of Amazon supervised learning tasks (multi-task pre-fine-tuning). It is a multi-task, multi-lingual, multi-locale, and multi-modal BERT-based encoder-only model trained on text and structured data input. Its neural network architectural details are as follows:

Model backbone:
Hidden size: 384
Number of hidden layers: 24
Number of attention heads: 16
Intermediate size: 1536
Vocabulary size: 256,035
Number of backbone parameters: 42,587,904
Number of word embedding parameters (bert.embedding.*): 98,517,504
Total number of parameters: 141,259,023

Because M5_ASIN_SMALL_V20 was pre-trained on Amazon product data specifically, we hypothesize that building a sentence transformer from it will increase the accuracy of product category classification. We complete the following steps to build a sentence transformer from M5_ASIN_SMALL_V20, fine-tune it, and input it into an XGBoost classifier to observe accuracy impact:

Load a pre-trained M5 model that you want to use as the base encoder.
Use the M5 model within the SentenceTransformer framework to create a sentence transformer.
Add a pooling layer to create fixed-size sentence embeddings from the variable-length output of the BERT model.
Combine the M5 model and pooling layer into a single model.
Fine-tune the model on a relevant dataset.

See the following code for Steps 1–3:

from sentence_transformers import models 
from transformers import AutoTokenizer

# Step 1: Load Pre-trained M5 Model
model_path = 'M5_ASIN_SMALL_V20'  # or your custom model path
transformer_model = models.Transformer(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Step 2: Define Pooling Layer
pooling_model = models.Pooling(transformer_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True)

# Step 3: Create SentenceTransformer Model
model_mean_m5_base = SentenceTransformer(modules=[transformer_model, pooling_model])

The rest of the code remains the same as fine-tuning for the paraphrase-MiniLM-L6-v2 sentence transformer, except that we use the fine-tuned M5 sentence transformer instead to create embeddings for the texts in the dataset:

loaded_model = SentenceTransformer('m5_ft_epoch_5_mean')
data['text_embedding_m5'] = data['all_text'].apply(lambda x: loaded_model.encode(str(x)))

Result

We observe similar results to paraphrase-MiniLM-L6-v2 when looking at accuracy before fine-tuning, observing a 78% accuracy for M5_ASIN_SMALL_V20. However, we observe that the fine-tuned M5_ASIN_SMALL_V20 sentence transformer performs better than the fine-tuned paraphrase-MiniLM-L6-v2. Its accuracy is 98%, compared to 94% for the fine-tuned paraphrase-MiniLM-L6-v2. We fine-tuned the sentence transformers for 5 epochs, because experiments showed this was the optimal number to minimize loss. The following graph summarizes our observations of accuracy improvement with fine-tuning for 5 epochs in a single comparison chart.

Clean up

We recommend using GPUs to fine-tune the sentence transformers, for example, ml.g5.4xlarge or ml.g4dn.16xlarge. Be sure to clean up resources to avoid incurring additional costs.

If you’re using a SageMaker notebook instance, refer to Clean up Amazon SageMaker notebook instance resources. If you’re using Amazon SageMaker Studio, refer to Delete or stop your Studio running instances, applications, and spaces.

Conclusion

In this post, we explored sentence transformers and how to use them effectively for text classification tasks. We dived deep into the sentence transformer paraphrase-MiniLM-L6-v2, demonstrated how to use a BERT-based model like M5_ASIN_SMALL_V20 to create a sentence transformer, showed how to fine-tune sentence transformers, and showed the accuracy effects of fine-tuning sentence transformers.

Fine-tuning sentence transformers has proven to be highly effective for classifying product descriptions into categories, significantly enhancing prediction accuracy. As a next step, we encourage you to explore different sentence transformers from Hugging Face.

Lastly, if you want to explore M5, note that it is proprietary to Amazon and you can only access it as an Amazon partner or customer as of the time of this publication. Connect with your Amazon point of contact if you’re an Amazon partner or customer wanting to use M5, and they will guide you through M5’s offerings and how it can be used for your use case.

About the Authors

Kara Yang is a Data Scientist at AWS Professional Services in the San Francisco Bay Area, with extensive experience in AI/ML. She specializes in leveraging cloud computing, machine learning, and Generative AI to help customers address complex business challenges across various industries. Kara is passionate about innovation and continuous learning.

Farshad Harirchi is a Principal Data Scientist at AWS Professional Services. He helps customers across industries, from retail to industrial and financial services, with the design and development of generative AI and machine learning solutions. Farshad brings extensive experience in the entire machine learning and MLOps stack. Outside of work, he enjoys traveling, playing outdoor sports, and exploring board games.

James Poquiz is a Data Scientist with AWS Professional Services based in Orange County, California. He has a BS in Computer Science from the University of California, Irvine and has several years of experience working in the data domain having played many different roles. Today he works on implementing and deploying scalable ML solutions to achieve business outcomes for AWS clients.

Empower your generative AI application with a comprehensive custom observability solution

October 29, 2024

by Ishan Singh Amazon AWS

Recently, we’ve been witnessing the rapid development and evolution of generative AI applications, with observability and evaluation emerging as critical aspects for developers, data scientists, and stakeholders. Observability refers to the ability to understand the internal state and behavior of a system by analyzing its outputs, logs, and metrics. Evaluation, on the other hand, involves assessing the quality and relevance of the generated outputs, enabling continual improvement.

Comprehensive observability and evaluation are essential for troubleshooting, identifying bottlenecks, optimizing applications, and providing relevant, high-quality responses. Observability empowers you to proactively monitor and analyze your generative AI applications, and evaluation helps you collect feedback, refine models, and enhance output quality.

In the context of Amazon Bedrock, observability and evaluation become even more crucial. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. As the complexity and scale of these applications grow, providing comprehensive observability and robust evaluation mechanisms are essential for maintaining high performance, quality, and user satisfaction.

We have built a custom observability solution that Amazon Bedrock users can quickly implement using just a few key building blocks and existing logs using FMs, Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, and Amazon Bedrock Agents. This solution uses decorators in your application code to capture and log metadata such as input prompts, output results, run time, and custom metadata, offering enhanced security, ease of use, flexibility, and integration with native AWS services.

Notably, the solution supports comprehensive Retrieval Augmented Generation (RAG) evaluation so you can assess the quality and relevance of generated responses, identify areas for improvement, and refine the knowledge base or model accordingly.

In this post, we set up the custom solution for observability and evaluation of Amazon Bedrock applications. Through code examples and step-by-step guidance, we demonstrate how you can seamlessly integrate this solution into your Amazon Bedrock application, unlocking a new level of visibility, control, and continual improvement for your generative AI applications.

By the end of this post, you will:

Understand the importance of observability and evaluation in generative AI applications
Learn about the key features and benefits of this solution
Gain hands-on experience in implementing the solution through step-by-step demonstrations
Explore best practices for integrating observability and evaluation into your Amazon Bedrock workflows

Prerequisites

To implement the observability solution discussed in this post, you need the following prerequisites:

An active Amazon Web Services (AWS) account and AWS Identity and Access Management (IAM) role with Amazon Bedrock access
Access to the FMs you plan to use
Basic understanding of decorators in your preferred programming language (Python or Node.js)
A clone of the amazon-bedrock-samples GitHub repository
Basic familiarity with AWS services such as Amazon Data Firehose, Amazon Athena, and AWS Glue crawlers (optional, depending on the specific components used in the solution)

Solution overview

The observability solution for Amazon Bedrock empowers users to track and analyze interactions with FMs, knowledge bases, guardrails, and agents using decorators in their source code. Key highlights of the solution include:

Decorator – Decorators are applied to functions invoking Amazon Bedrock APIs, capturing input prompt, output results, custom metadata, custom metrics, and latency related metrics.
Flexible logging –You can use this solution to store logs either locally or in Amazon Simple Storage Service (Amazon S3) using Amazon Data Firehose, enabling integration with existing monitoring infrastructure. Additionally, you can choose what gets logged.
Dynamic data partitioning – The solution enables dynamic partitioning of observability data based on different workflows or components of your application, such as prompt preparation, data preprocessing, feedback collection, and inference. This feature allows you to separate data into logical partitions, making it easier to analyze and process data later.
Security – The solution uses AWS services and adheres to AWS Cloud Security best practices so your data remains within your AWS account.
Cost optimization – This solution uses serverless technologies, making it cost-effective for the observability infrastructure. However, some components may incur additional usage-based costs.
Multiple programming language support – The GitHub repository provides the observability solution in both Python and Node.js versions, catering to different programming preferences.

Here’s a high-level overview of the observability solution architecture:

The following steps explain how the solution works:

Application code using Amazon Bedrock is decorated with @bedrock_logs.watch to save the log
Logged data streams through Amazon Data Firehose
AWS Lambda transforms the data and applies dynamic partitioning based on call_type variable
Amazon S3 stores the data securely
Optional components for advanced analytics
AWS Glue creates tables from S3 data
Amazon Athena enables data querying
Visualize logs and insights in your favorite dashboard tool

This architecture provides comprehensive logging, efficient data processing, and powerful analytics capabilities for your Amazon Bedrock applications.

Getting started

To help you get started with the observability solution, we have provided example notebooks in the attached GitHub repository, covering knowledge bases, evaluation, and agents for Amazon Bedrock. These notebooks demonstrate how to integrate the solution into your Amazon Bedrock application and showcase various use cases and features including feedback collected from users or quality assurance (QA) teams.

The repository contains well-documented notebooks that cover topics such as:

Setting up the observability infrastructure
Integrating the decorator pattern into your application code
Logging model inputs, outputs, and custom metadata
Collecting and analyzing feedback data
Evaluating model responses and knowledge base performance
Example visualization for observability data using AWS services

To get started with the example notebooks, follow these steps:

Clone the GitHub repository

git clone https://github.com/aws-samples/amazon-bedrock-samples.git

Navigate to the observability solution directory

cd amazon-bedrock-samples/evaluation-observe/Custom-Observability-Solution

Follow the instructions in the README file to set up the required AWS resources and configure the solution
Open the provided Jupyter notebooks and follow along with the examples and demonstrations

These notebooks provide a hands-on learning experience and serve as a starting point for integrating our solution into your generative AI applications. Feel free to explore, modify, and adapt the code examples to suit your specific requirements.

Key features

The solution offers a range of powerful features to streamline observability and evaluation for your generative AI applications on Amazon Bedrock:

Decorator-based implementation – Use decorators to seamlessly integrate observability logging into your application functions, capturing inputs, outputs, and metadata without modifying the core logic
Selective logging – Choose what to log by selectively capturing function inputs, outputs, or excluding sensitive information or large data structures that might not be relevant for observability
Logical data partitioning – Create logical partitions in the observability data based on different workflows or application components, enabling easier analysis and processing of specific data subsets
Human-in-the-loop evaluation – Collect and associate human feedback with specific model responses or sessions, facilitating comprehensive evaluation and continual improvement of your application’s performance and output quality
Multi-component support – Support observability and evaluation for various Amazon Bedrock components, including InvokeModel, batch inference, knowledge bases, agents, and guardrails, providing a unified solution for your generative AI applications
Comprehensive evaluation – Evaluate the quality and relevance of generated responses, including RAG evaluation for knowledge base applications, using the open source RAGAS library to compute evaluation metrics

This concise list highlights the key features you can use to gain insights, optimize performance, and drive continual improvement for your generative AI applications on Amazon Bedrock. For a detailed breakdown of the features and implementation specifics, refer to the comprehensive documentation in the GitHub repository.

Implementation and best practices

The solution is designed to be modular and flexible so you can customize it according to your specific requirements. Although the implementation is straightforward, following best practices is crucial for the scalability, security, and maintainability of your observability infrastructure.

Solution deployment

This solution includes an AWS CloudFormation template that streamlines the deployment of required AWS resources, providing consistent and repeatable deployments across environments. The CloudFormation template provisions resources such as Amazon Data Firehose delivery streams, AWS Lambda functions, Amazon S3 buckets, and AWS Glue crawlers and databases.

Decorator pattern

The solution uses the decorator pattern to integrate observability logging into your application functions seamlessly. The @bedrock_logs.watch decorator wraps your functions, automatically logging inputs, outputs, and metadata to Amazon Kinesis Firehose. Here’s an example of how to use the decorator:

# import observability
from observability import BedrockLogs

# instantiate BedrockLogs in Firehose mode
bedrock_logs = BedrockLogs(delivery_stream_name='your-firehose-delivery-stream', feedback_variables=True)

# decorate your function
@bedrock_logs.watch(capture_input=True, capture_output=True, call_type='<your-custom-dataset-name>')
def your_function(arg1, arg2):
    # Your function code here along with any custom metric of your choosing
    return output

Human-in-the-loop evaluation

The solution supports human-in-the-loop evaluation so you can incorporate human feedback into the performance evaluation of your generative AI application. You can involve end users, experts, or QA teams in the evaluation process, providing insights to enhance output quality and relevance. Here’s an example of how you can implement human-in-the-loop evaluation:

@bedrock_logs.watch(call_type='Retrieve-and-Generate-with-KB')
def main(input_arguments):
    # Your code to interact with Amazon Bedrock Knowledge Base or Agent
    return response, custom_metric, etc.

@bedrock_logs.watch(call_type='observation-feedback')
def observation_level_feedback(feedback):
    pass

# Invoke main function with user input and get run_id and observation_id
tuple_of_function_outputs, run_id, observation_id = main(input_arguments)

# Collect human feedback on model response in your application
user_feedback = 'thumbs-up'

observation_feedback_from_front_end = {
    'user_id': 'User-1',
    'f_run_id': run_id,
    'f_observation_id': observation_id,
    'actual_feedback': user_feedback
}

# Log the human-in-loop feedback using observation_level_feedback function
observation_level_feedback(observation_feedback_from_front_end)

By using the run_id and observation_id generated, you can associate human feedback with specific model responses or sessions. This feedback can then be analyzed and used to refine the knowledge base, fine-tune models, or identify areas for improvement.

Best practices

It’s recommended to follow these best practices:

Plan call types in advance – Determine the logical partitions (call_type) for your observability data based on different workflows or application components. This enables easier analysis and processing of specific data subsets.
Use feedback variables – Configure feedback_variables=True when initializing BedrockLogs to generate run_id and observation_id. These IDs can be used to join logically partitioned datasets, associating feedback data with corresponding model responses.
Extend for general steps – Although the solution is designed for Amazon Bedrock, you can use the decorator pattern to log observability data for general steps such as prompt preparation, postprocessing, or other custom workflows.
Log custom metrics – If you need to calculate custom metrics such as latency, context relevance, faithfulness, or any other metric, you can pass these values in the response of your decorated function, and the solution will log them alongside the observability data.
Selective logging – Use the capture_input and capture_output parameters to selectively log function inputs or outputs or exclude sensitive information or large data structures that might not be relevant for observability.
Comprehensive evaluation – Evaluate the quality and relevance of generated responses, including RAG evaluation for knowledge base applications, using the KnowledgeBasesEvaluations

By following these best practices and using the features of the solution, you can set up comprehensive observability and evaluation for your generative AI applications to gain valuable insights, identify areas for improvement, and enhance the overall user experience.

In the next post in this three-part series, we dive deeper into observability and evaluation for RAG and agent-based generative AI applications, providing in-depth insights and guidance.

Clean up

To avoid incurring costs and maintain a clean AWS account, you can remove the associated resources by deleting the AWS CloudFormation stack you created for this walkthrough. You can follow the steps provided in the Deleting a stack on the AWS CloudFormation console documentation to delete the resources created for this solution.

Conclusion and next steps

This comprehensive solution empowers you to seamlessly integrate comprehensive observability into your generative AI applications in Amazon Bedrock. Key benefits include streamlined integration, selective logging, custom metadata tracking, and comprehensive evaluation capabilities, including RAG evaluation. Use AWS services such as Athena to analyze observability data, drive continual improvement, and connect with your favorite dashboard tool to visualize the data.

This post focused is on Amazon Bedrock, but it can be extended to broader machine learning operations (MLOps) workflows or integrated with other AWS services such as AWS Lambda or Amazon SageMaker. We encourage you to explore this solution and integrate it into your workflows. Access the source code and documentation in our GitHub repository and start your integration journey. Embrace the power of observability and unlock new heights for your generative AI applications.

About the authors

Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.

Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. He is passionate about building innovative products and solutions while also focused on customer-obsessed science. When not running experiments and keeping up with the latest developments in generative AI, he loves spending time with his kids.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for Women in Manufacturing Education Foundation Board. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Automate Amazon Bedrock batch inference: Building a scalable and efficient pipeline

October 29, 2024

by Yanyan Zhang Amazon AWS

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

Batch inference in Amazon Bedrock efficiently processes large volumes of data using foundation models (FMs) when real-time results aren’t necessary. It’s ideal for workloads that aren’t latency sensitive, such as obtaining embeddings, entity extraction, FM-as-judge evaluations, and text categorization and summarization for business reporting tasks. A key advantage is its cost-effectiveness, with batch inference workloads charged at a 50% discount compared to On-Demand pricing. Refer to Supported Regions and models for batch inference for current supporting AWS Regions and models.

Although batch inference offers numerous benefits, it’s limited to 10 batch inference jobs submitted per model per Region. To address this consideration and enhance your use of batch inference, we’ve developed a scalable solution using AWS Lambda and Amazon DynamoDB. This post guides you through implementing a queue management system that automatically monitors available job slots and submits new jobs as slots become available.

We walk you through our solution, detailing the core logic of the Lambda functions. By the end, you’ll understand how to implement this solution so you can maximize the efficiency of your batch inference workflows on Amazon Bedrock. For instructions on how to start your Amazon Bedrock batch inference job, refer to Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock.

The power of batch inference

Organizations can use batch inference to process large volumes of data asynchronously, making it ideal for scenarios where real-time results are not critical. This capability is particularly useful for tasks such as asynchronous embedding generation, large-scale text classification, and bulk content analysis. For instance, businesses can use batch inference to generate embeddings for vast document collections, classify extensive datasets, or analyze substantial amounts of user-generated content efficiently.

One of the key advantages of batch inference is its cost-effectiveness. Amazon Bedrock offers select FMs for batch inference at 50% of the On-Demand inference price. Organizations can process large datasets more economically because of this significant cost reduction, making it an attractive option for businesses looking to optimize their generative AI processing expenses while maintaining the ability to handle substantial data volumes.

Solution overview

The solution presented in this post uses batch inference in Amazon Bedrock to process many requests efficiently using the following solution architecture.

This architecture workflow includes the following steps:

A user uploads files to be processed to an Amazon Simple Storage Service (Amazon S3) bucket br-batch-inference-{Account_Id}-{AWS-Region} in the to-process folder. Amazon S3 invokes the {stack_name}-create-batch-queue-{AWS-Region} Lambda function.
The invoked Lambda function creates new job entries in a DynamoDB table with the status as Pending. The DynamoDB table is crucial for tracking and managing the batch inference jobs throughout their lifecycle. It stores information such as job ID, status, creation time, and other metadata.
The Amazon EventBridge rule scheduled to run every 15 minutes invokes the {stack_name}-process-batch-jobs-{AWS-Region} Lambda function.
The {stack_name}-process-batch-jobs-{AWS-Region} Lambda function performs several key tasks:
- Scans the DynamoDB table for jobs in InProgress, Submitted, Validation and Scheduled status
- Updates job status in DynamoDB based on the latest information from Amazon Bedrock
- Calculates available job slots and submits new jobs from the Pending queue if slots are available
- Handles error scenarios by updating job status to Failed and logging error details for troubleshooting
The Lambda function makes the GetModelInvocationJob API call to get the latest status of the batch inference jobs from Amazon Bedrock
The Lambda function then updates the status of the jobs in DynamoDB using the UpdateItem API call, making sure that the table always reflects the most current state of each job
The Lambda function calculates the number of available slots before the Service Quota Limit for batch inference jobs is reached. Based on this, it queries for jobs in the Pending state that can be submitted
If there is a slot available, the Lambda function will make CreateModelInvocationJob API calls to create new batch inference jobs for the pending jobs
It updates the DynamoDB table with the status of the batch inference jobs created in the previous step
After one batch job is complete, its output files will be available in the S3 bucket br-batch-inference-{Account_Id}-{AWS-Region} processed folder

Prerequisites

To perform the solution, you need the following prerequisites:

An active AWS account.
An AWS Region from the list of batch inference supported Regions for Amazon Bedrock.
Access to your selected models hosted on Amazon Bedrock. Make sure the selected model has been enabled in Amazon Bedrock.
If you plan to use your own AWS Identity and Access Management (IAM) role for batch inference, create it with a trust policy and Amazon S3 access (read access to the folder containing input data and write access to the folder storing output data).

Deployment guide

To deploy the pipeline, complete the following steps:

Choose the Launch Stack button:
Choose Next, as shown in the following screenshot
Specify the pipeline details with the options fitting your use case:
- Stack name (Required) – The name you specified for this AWS CloudFormation. The name must be unique in the region in which you’re creating it.
- ModelId (Required) – Provide the model ID that you need your batch job to run with.
- RoleArn (Optional) – By default, the CloudFormation stack will deploy a new IAM role with the required permissions. If you have a role you want to use instead of creating a new role, provide the IAM role Amazon Resource Name (ARN) that has sufficient permission to create a batch inference job in Amazon Bedrock and read/write in the created S3 bucket br-batch-inference-{Account_Id}-{AWS-Region}. Follow the instructions in the prerequisites section to create this role.

In the Amazon Configure stack options section, add optional tags, permissions, and other advanced settings if needed. Or you can just leave it blank and choose Next, as shown in the following screenshot.
Review the stack details and select I acknowledge that AWS CloudFormation might create AWS IAM resources, as shown in the following screenshot.
Choose Submit. This initiates the pipeline deployment in your AWS account.
After the stack is deployed successfully, you can start using the pipeline. First, create a /to-process folder under the created Amazon S3 location for input. A .jsonl uploaded to this folder will have a batch job created with the selected model. The following is a screenshot of the DynamoDB table where you can track the job status and other types of metadata related to the job.
After your first batch job from the pipeline is complete, the pipeline will create a /processed folder under the same bucket, as shown in the following screenshot. Outputs from the batch jobs created by this pipeline will be stored in this folder.
To start using this pipeline, upload the .jsonl files you’ve prepared for batch inference in Amazon Bedrock

You’re done! You’ve successfully deployed your pipeline and you can check the batch job status in the Amazon Bedrock console. If you want to have more insights about each .jsonl file’s status, navigate to the created DynamoDB table {StackName}-DynamoDBTable-{UniqueString} and check the status there. You may need to wait up to 15 minutes to observe the batch jobs created because EventBridge is scheduled to scan DynamoDB every 15 minutes.

Clean up

If you no longer need this automated pipeline, follow these steps to delete the resources it created to avoid additional cost:

On the Amazon S3 console, manually delete the contents inside buckets. Make sure the bucket is empty before moving to step 2.
On the AWS CloudFormation console, choose Stacks in the navigation pane.
Select the created stack and choose Delete, as shown in the following screenshot.

This automatically deletes the deployed stack.

Conclusion

In this post, we’ve introduced a scalable and efficient solution for automating batch inference jobs in Amazon Bedrock. By using AWS Lambda, Amazon DynamoDB, and Amazon EventBridge, we’ve addressed key challenges in managing large-scale batch processing workflows.

This solution offers several significant benefits:

Automated queue management – Maximizes throughput by dynamically managing job slots and submissions
Cost optimization – Uses the 50% discount on batch inference pricing for economical large-scale processing

This automated pipeline significantly enhances your ability to process large amounts of data using batch inference for Amazon Bedrock. Whether you’re generating embeddings, classifying text, or analyzing content in bulk, this solution offers a scalable, efficient, and cost-effective approach to batch inference.

As you implement this solution, remember to regularly review and optimize your configuration based on your specific workload patterns and requirements. With this automated pipeline and the power of Amazon Bedrock, you’re well-equipped to tackle large-scale AI inference tasks efficiently and effectively. We encourage you to try it out and share your feedback to help us continually improve this solution.

For additional resources, refer to the following:

User guide – Process multiple prompts with batch inference
Code sample – Sample for building your batch inference job
Blog post – Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock

About the authors

Neeraj Lamba is a Cloud Infrastructure Architect with Amazon Web Services (AWS) Worldwide Public Sector Professional Services. He helps customers transform their business by helping design their cloud solutions and offering technical guidance. Outside of work, he likes to travel, play Tennis and experimenting with new technologies.

Build a video insights and summarization engine using generative AI with Amazon Bedrock

October 29, 2024

by Simone Zucchet Amazon AWS

Professionals in a wide variety of industries have adopted digital video conferencing tools as part of their regular meetings with suppliers, colleagues, and customers. These meetings often involve exchanging information and discussing actions that one or more parties must take after the session. The traditional way to make sure information and actions aren’t forgotten is to take notes during the session; a manual and tedious process that can be error-prone, particularly in a high-activity or high-pressure scenario. Furthermore, these notes are usually personal and not stored in a central location, which is a lost opportunity for businesses to learn what does and doesn’t work, as well as how to improve their sales, purchasing, and communication processes.

This post presents a solution where you can upload a recording of your meeting (a feature available in most modern digital communication services such as Amazon Chime) to a centralized video insights and summarization engine. This engine uses artificial intelligence (AI) and machine learning (ML) services and generative AI on AWS to extract transcripts, produce a summary, and provide a sentiment for the call. The solution notes the logged actions per individual and provides suggested actions for the uploader. All of this data is centralized and can be used to improve metrics in scenarios such as sales or call centers. Many commercial generative AI solutions available are expensive and require user-based licenses. In contrast, our solution is an open-source project powered by Amazon Bedrock, offering a cost-effective alternative without those limitations.

This solution can help your organizations’ sales, sales engineering, and support functions become more efficient and customer-focused by reducing the need to take notes during customer calls.

Use case overview

The organization in this scenario has noticed that during customer calls, some actions often get skipped due to the complexity of the discussions, and that there might be potential to centralize customer data to better understand how to improve customer interactions in the long run. The organization already records sessions in video format, but these videos are often kept in individual repositories, and a review of the access logs has shown that employees rarely use them in their day-to-day activities.

To increase efficiency, reduce the load, and gain better insights, this solution looks at how to use generative AI to analyze recorded videos and provide employees with valuable insights relating to their calls. It also supports audio files so you have flexibility around the type of call recordings you use. Generated call transcripts and insights include conversation summary, sentiment, a list of logged actions, and a set of suggested next best actions. These insights are stored in a central repository, unlocking the ability for analytics teams to have a single view of interactions and use the data to formulate better sales and support strategies.

Organizations typically can’t predict their call patterns, so the solution relies on AWS serverless services to scale during busy times. This enables you to keep up with peak demands, but also scale down to reduce costs during times such as seasonal holidays when the sales, engineering, and support teams are away.

This post provides guidance on how you can create a video insights and summarization engine using AWS AI/ML services. We walk through the key components and services needed to build the end-to-end architecture, offering example code snippets and explanations for each critical element that help achieve the core functionality. This approach should enable you to understand the underlying architectural concepts and provides flexibility for you to either integrate these into existing workloads or use them as a foundation to build a new workload.

Solution overview

The following diagram illustrates the pipeline for the video insights and summarization engine.

To enable the video insights solution, the architecture uses a combination of AWS services, including the following:

Amazon API Gateway is a fully managed service that makes it straightforward for developers to create, publish, maintain, monitor, and secure APIs at scale.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.
AWS Lambda is an event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. You can invoke Lambda functions from over 200 AWS services and software-as-a-service (SaaS) applications.
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. You can use Amazon S3 to securely store objects and also serve static websites.
Amazon Transcribe is an automatic speech recognition (ASR) service that makes it straightforward for developers to add speech-to-text capability to their applications.

For integration between services, we use API Gateway as an event trigger for our Lambda function, and DynamoDB as a highly scalable database to store our customer details. Finally, video or audio files uploaded are stored securely in an S3 bucket.

The end-to-end solution for the video insights and summarization engine starts with the UI. We build a simple static web application hosted in Amazon S3 and deploy an Amazon CloudFront distribution to serve the static website for low latency and high transfer speeds. We use CloudFront origin access control (OAC) to secure Amazon S3 origins and permit access to the designated CloudFront distributions only. With Amazon Cognito, we are able to protect the web application from unauthenticated users.

We use API Gateway as the entry point for real-time communications between the frontend and backend of the video insights and summarization engine, while controlling access using Amazon Cognito as the authorizer. With Lambda integration, we can create a web API with an endpoint to the Lambda function.

To start the workflow, upload a raw video file directly into an S3 bucket with the pre-signed URL given through API Gateway and a Lambda function. The updated video is fed into Amazon Transcribe, which converts the speech of the video into a video transcript in text format. Finally, we use large language models (LLMs) available through Amazon Bedrock to summarize the video transcript and extract insights from the video content.

The solution stores uploaded videos and video transcripts in Amazon S3, which offers durable, highly available, and scalable data storage at a low cost. We also store the video summaries, sentiments, insights, and other workflow metadata in DynamoDB, a NoSQL database service that allows you to quickly keep track of the workflow status and retrieve relevant information from the original video.

We also use Amazon CloudWatch and Amazon EventBridge to monitor every component of the workflow in real time and respond as necessary.

AI/ML workflow

In this post, we focus on the workflow using AWS AI/ML services to generate the summarized content and extract insights from the video transcript.

Starting with the Amazon Transcribe StartTranscriptionJob API, we transcribe the original video stored in Amazon S3 into a JSON file. The following code shows an example of this using Python:

job_args = {
    'TranscriptionJobName': jobId,
    'Media': {'MediaFileUri': media_uri},
    'MediaFormat': media_format,
    'LanguageCode': language_code,
    'Subtitles': {'Formats': ['srt']},
    'OutputBucketName': output_bucket_name,
    'OutputKey': jobId + ".json"
}
if vocabulary_name is not None:
    job_args['Settings'] = {'VocabularyName': vocabulary_name}
response = transcribe_client.start_transcription_job(**job_args)

The following is an example of our workload’s Amazon Transcribe output in JSON format:

{
    "jobName": "a37f0f27-0908-45eb-8d98-8efc3a9d4590-1698392975",
    "accountId": "8469761*****",
    "results": {
        "transcripts": [{
                "transcript": "Thank you for calling, my name is Ivy. Can I have your name?..."}],
        "items": [{
                "start_time": "7.809","end_time": "8.21",
                "alternatives": [{
                        "confidence": "0.998","content": "Thank"}],
                "type": "pronunciation"
            },
            ...
        ]
    },
    "status": "COMPLETED"
}

As the output from Amazon Transcribe is created and stored in Amazon S3, we use Amazon S3 Event Notifications to invoke an event to a Lambda function when the transcription job is finished and a video transcript file object has been created.

In the next step of the workflow, we use LLMs available through Amazon Bedrock. LLMs are neural network-based language models containing hundreds of millions to over a trillion parameters. The ability to generate content has resulted in LLMs being widely utilized for use cases such as text generation, summarization, translation, sentiment analysis, conversational chatbots, and more. For this solution, we use Anthropic’s Claude 3 on Amazon Bedrock to summarize the original text, get the sentiment of the conversation, extract logged actions, and suggest further actions for the sales team. In Amazon Bedrock, you can also use other LLMs for text summarization such as Amazon Titan, Meta Llama 3, and others, which can be invoked using the Amazon Bedrock API.

As shown in the following Python code to summarize the video transcript, you can call the InvokeEndpoint API to invoke the specified Amazon Bedrock model to run inference using the input provided in the request body:

modelId = 'anthropic.claude-3-sonnet-20240229-v1:0'
accept = 'application/json'
contentType = 'application/json'
    
prompt_template = """
The following is the transcript from one of our sales representatives and our customer.
The AI is a tool that the sales representative uses to obtain a brief summary of what the conversation was about. The AI based this summary on the contents of the conversation and does not make up events that did not happen.
     The transcript is:
     <text>
       {}
     </text>
What is the 2 paragraphs summary of the conversation?
"""
    
PROMPT = prompt_template.format(raw_text)
   	
body = json.dumps(
     {
     	"messages": [
            {
              "role": "user",
              "content": [
                 {"type": "text", "text": PROMPT}
              ],
             }
            ],
           "anthropic_version": "bedrock-2023-05-31",
           "max_tokens": 512,
           "temperature": 0.1,
           "top_p": 0.9
        }
    )
response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response["body"].read())
summary = response_body["content"][0]["text"]

You can invoke the endpoint with different parameters defined in the payload to impact the text summarization:

temperature – temperature is used in text generation to control the level of randomness of the output. A lower temperature value results in a more conservative and deterministic output; a higher temperature value encourages more diverse and creative outputs.
top_p – top_p, also known as nucleus sampling, is another parameter to control the diversity of the summaries text. It indicates the cumulative probability threshold to select the next token during the text generation process. Lower values of top_p result in a narrower selection of tokens with high probabilities, leading to more deterministic outputs. Conversely, higher values of top_p introduce more randomness and diversity into the generated summaries.

Although there’s no universal optimal combination of top_p and temperature for all scenarios, in the preceding code, we demonstrate sample values with high top_p and low temperature in order to generate summaries focused on key information, maintaining fidelity to the original video transcript while still introducing some degree of wording variation.

The following is another example of using the Anthropic’s Claude 3 model through the Amazon Bedrock API to provide suggested actions to sales representatives based on the video transcript:

prompt_template = """
The following is the transcript from one of our sales representatives and our customer.
The AI is a tool that the sales representative uses to look into what additional actions they can use to increase sales after the session. The AI bases the suggested actions on the contents of the conversation and what it thinks might help increase the customers satisfaction and loyalty.

The transcript is:
     <text>
      {}
     </text>

     Using the transcript above, provide a bullet point format for suggested actions the sales representative could do to increase follow on sales.
    """


PROMPT = prompt_template.format(raw_text)
    
body = json.dumps(
   	{
     	"messages": [
         	  {
              "role": "user",
              "content": [
                 {"type": "text", "text": PROMPT}
               ],
             }
            ],
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "temperature": 0.1,
            "top_p": 0.9
        }
    )

response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response["body"].read())
suggested_actions = response_body["content"][0]["text"]

After we successfully generate video summaries, sentiments, logged actions, and suggested actions from the original video transcript, we store these insights in a DynamoDB table, which is then updated in the UI through API Gateway.

The following screenshot shows a simple UI for the video insights and summarization engine. The frontend is built on Cloudscape, an open source design system for the cloud. On average, it takes less than 5 minutes and costs no more than $2 to process 1 hour of video, assuming the video’s transcript contains approximately 8,000 words.

Future improvements

The solution in this post shows how you can use AWS services with Amazon Bedrock to build a cost-effective and powerful generative AI application that allows you to analyze video content and extract insights to help teams become more efficient. This solution is just the beginning of the value you can unlock with AWS generative AI and broader ML services.

One example of how this solution could be taken further is to expand the scope to help tackle some of the logged actions from calls. The addition of services such as Amazon Bedrock Agents could help automate some of the responses, such as forwarding relevant documentation like product specifications, price lists, or even a simple recap email. All of these could save effort and time, enabling you to focus more on value-added activities.

Similarly, the centralization of all this data could allow you to create an analytics layer on top of a centralized database to help formulate more effective sales and support strategies. This data is usually lost or misplaced within organizations because people prefer different methods for note collection. The proposed solution gives you the freedom to centralize data but also augment organization data with the voice of the customer. For example, the analytics team could analyze what employees did well in calls that have a positive sentiment and offer training or guidance to help everyone achieve more positive customer interactions.

Conclusion

In this post, we described how to create a solution that ingests video and audio files to create powerful, actionable, and accurate insights that an organization can use through the power of Amazon Bedrock generative AI capabilities on AWS. The insights provided can help reduce the undifferentiated heavy lifting that customer-facing teams encounter, and also provide a centralized dataset of customer conversations that an organization can use to further improve performance.

For further information on how you can use Amazon Bedrock for your workloads, see Amazon Bedrock.

About the Authors

Simone Zucchet is a Solutions Architect Manager at AWS. With over 6 years of experience as a Cloud Architect, Simone enjoys working on innovative projects that help transform the way organizations approach business problems. He helps support large enterprise customers at AWS and is part of the Machine Learning TFC. Outside of his professional life, he enjoys working on cars and photography.

Vu San Ha Huynh is a Solutions Architect at AWS. He has a PhD in computer science and enjoys working on different innovative projects to help support large enterprise customers.

Adam Raffe is a Principal Solutions Architect at AWS. With over 8 years of experience in cloud architecture, Adam helps large enterprise customers solve their business problems using AWS.

Ahmed Raafat is a Principal Solutions Architect at AWS, with 20 years of field experience and a dedicated focus of 6 years within the AWS ecosystem. He specializes in AI/ML solutions. His extensive experience spans various industry verticals, making him a trusted advisor for numerous enterprise customers, helping them seamlessly navigate and accelerate their cloud journey.

Automate document processing with Amazon Bedrock Prompt Flows (preview)

October 29, 2024

by Erik Cordsen Amazon AWS

Enterprises in industries like manufacturing, finance, and healthcare are inundated with a constant flow of documents—from financial reports and contracts to patient records and supply chain documents. Historically, processing and extracting insights from these unstructured data sources has been a manual, time-consuming, and error-prone task. However, the rise of intelligent document processing (IDP), which uses the power of artificial intelligence and machine learning (AI/ML) to automate the extraction, classification, and analysis of data from various document types is transforming the game. For manufacturers, this means streamlining processes like purchase order management, invoice processing, and supply chain documentation. Financial services firms can accelerate workflows around loan applications, account openings, and regulatory reporting. And in healthcare, IDP revolutionizes patient onboarding, claims processing, and medical record keeping.

By integrating IDP into their operations, organizations across these key industries experience transformative benefits: increased efficiency and productivity through the reduction of manual data entry, improved accuracy and compliance by reducing human errors, enhanced customer experiences due to faster document processing, greater scalability to handle growing volumes of documents, and lower operational costs associated with document management.

This post demonstrates how to build an IDP pipeline for automatically extracting and processing data from documents using Amazon Bedrock Prompt Flows, a fully managed service that enables you to build generative AI workflow using Amazon Bedrock and other services in an intuitive visual builder. Amazon Bedrock Prompt Flows allows you to quickly update your pipelines as your business changes, scaling your document processing workflows to help meet evolving demands.

Solution overview

To be scalable and cost-effective, this solution uses serverless technologies and managed services. In addition to Amazon Bedrock Prompt Flows, the solution uses the following services:

Amazon Textract – Automatically extracts printed text, handwriting, and data from
Amazon Simple Storage Service (Amazon S3) – Object storage built to retrieve data from anywhere.
Amazon Simple Notification Service (Amazon SNS) – A highly available, durable, secure, and fully managed publish-subscribe (pub/sub) messaging service to decouple microservices, distributed systems, and serverless applications.
AWS Lambda – A compute service that runs code in response to triggers such as changes in data, changes in application state, or user actions. Because services such as Amazon S3 and Amazon SNS can directly trigger an AWS Lambda function, you can build a variety of real-time serverless data-processing systems.
Amazon DynamoDB – a serverless, NoSQL, fully-managed database with single-digit millisecond performance at

Solution architecture

The solution proposed contains the following steps:

Users upload a PDF for analysis to Amazon S3.
The Amazon S3 upload triggers an AWS Lambda function execution.
The function invokes Amazon Textract to extract text from the PDF in batch mode.
Amazon Textract sends an SNS notification when the job is complete.
An AWS Lambda function reads the Amazon Textract response and calls an Amazon Bedrock prompt flow to classify the document.
Results of the classification are stored in Amazon S3 and sent to a destination AWS Lambda function.
The destination AWS Lambda function calls an Amazon Bedrock prompt flow to extract and analyze data based on the document class provided.
Results of the extraction and analysis are stored in Amazon S3.

This workflow is shown in the following diagram.

In the following sections, we dive deep into how to build your IDP pipeline with Amazon Bedrock Prompt Flows.

Prerequisites

To complete the activities described in this post, ensure that you complete the following prerequisites in your local environment:

An AWS account with sufficient permissions to access the console and execute CLI commands.
Install and configure the AWS Command Line Interface (AWS CLI).
Install the AWS Serverless Application Model Command Line Interface (AWS SAM CLI).
Access to an AWS Region that supports Amazon Bedrock Prompt Flows.
To gain model access to Anthropic Claude 3 Sonnet on Amazon Bedrock, follow the instructions at Access Amazon Bedrock foundation models.

Implementation time and cost estimation

Time to complete	~ 60 minutes
Cost to run 1000 pages	Under $25
Time to cleanup	~20 minutes
Learning level	Advanced (300)

Deploy the solution

To deploy the solution, follow these steps:

Clone the GitHub repository
Use the shell script to build and deploy the solution by running the following commands from your project root directory:

chmod +x deploy.sh
./deploy.sh

This will trigger the AWS CloudFormation template in your AWS account.

Test the solution

Once the template is deployed successfully, follow these steps to test the solution:

On the AWS CloudFormation console, select the stack that was deployed
Select the Resources tab
Locate the resources labeled SourceS3Bucket and DestinationS3Bucket, as shown in the following screenshot. Select the link to open the SourceS3Bucket in a new tab

Select Upload and then Add folder
Under sample_files, select the folder customer123, then choose Upload

Alternatively, you can upload the folder using the following AWS CLI command from the root of the project:

aws s3 sync ./sample_files/customer123 s3://[SourceS3Bucket_NAME]/customer123

After a few minutes the uploaded files will be processed. To view the results, follow these steps:

Open the DestinationS3Bucket
Under customer123, you should see a folder for documents for the processing jobs. Download and review the files locally using the console or with the following AWS CLI command

aws s3 sync s3://[DestinationS3Bucket_NAME]/customer123 ./result_files/customer123

Inside the folder for customer123 you will see several subfolders, as shown in the following diagram:

customer123
└── [Long Textract Job ID]
    ├── classify_response.txt
    ├── input_doc.txt
    └── FOR_REVIEW
        ├── pages_0.txt
        └── report.txt
└── [Long Textract Job ID]
    ├── classify_response.txt
    ├── input_doc.txt
    └── URLA_1003
        ├── pages_0.json
        ├── pages_0.txt
        └── report.txt
└── [Long Textract Job ID]
    ├── classify_response.txt
    ├── input_doc.txt
    └── BANK_STATEMENT
        ├── pages_0.json
        ├── pages_0.txt
        └── report.txt
└── [Long Textract Job ID]
    ├── classify_response.txt
    ├── input_doc.txt
    └── DRIVERS_LICENSE
        ├── pages_0.json
        ├── pages_0.txt
        └── report.txt

How it works

After the document text is extracted, it is sent to a classify prompt flow along with a list of classes, as shown in the following screenshot:

The list of classes is generated in the AWS Lambda function by using the API to identify existing prompt flows that contain class definitions in their description. This approach allows us to expand the solution to new document types by adding a new prompt flow supporting the new document class, as shown in the following screenshot:

For each document type, you can implement an extract and analyze flow that is appropriate to this document type. The following screenshot shows an example flow from the URLA_1003 flow. In this case, a prompt is used to convert the text to a standardized JSON format, and a second prompt then analyzes that JSON document to generate a report to the processing agent.

Expand the solution using Amazon Bedrock Prompt Flows

To adapt to new use cases without changing the underlying code, use Amazon Bedrock Prompt Flows as described in the following steps.

Create a new prompt

From the files you downloaded, look for a folder named FOR_REVIEW. This folder contains documents that were processed and did not fit into an existing class. Open report.txt and review the suggested document class and proposed JSON template.

In the navigation pane in Amazon Bedrock, open Prompt management and select Create prompt, as shown in the following screenshot:

Name the new prompt IDP_PAYSTUB_JSON and then choose Create
In the Prompt box, enter the following text. Replace COPY YOUR JSON HERE with the JSON template from your txt file

Analyze the provided paystub
<PAYSTUB>
{{doc_text}}
</PAYSTUB>

Provide a structured JSON object containing the following information:

[COPY YOUR JSON HERE]

The following screenshot demonstrates this step.

Choose Select model and choose Anthropic Claude 3 Sonnet
Save your changes by choosing Save draft
To test your prompt, open the pages_[n].txt file FOR_REVIEW folder and copy the content into the doc_text input box. Choose Run and the model should return a response

The following screenshot demonstrates this step.

When you are satisfied with the results, choose Create Version. Note the version number because you will need it in the next section

Create a prompt flow

Now we will create a prompt flow using the prompt you created in the previous section.

In the navigation menu, choose Prompt flows and then choose Create prompt flow, as shown in the following screenshot:

Name the new flow IDP_PAYSTUB
Choose Create and use a new service role and then choose Save

Next, create the flow using the following steps. When you are done, the flow should resemble the following screenshot.

Configure the Flow input node:
1. Choose the Flow input node and select the Configure
2. Select Object as the Type. This means that flow invocation will expect to receive a JSON object.
Add the S3 Retrieval node:
1. In the Prompt flow builder navigation pane, select the Nodes tab
2. Drag an S3 Retrieval node into your flow in the center pane
3. In the Prompt flow builder pane, select the Configure tab
4. Enter get_doc_text as the Node name
5. Expand the Inputs Set the input express for objectKey to $.data.doc_text_s3key
6. Drag a connection from the output of the Flow input node to the objectKey input of this node
Add the Prompt node:
1. Drag a Prompt node into your flow in the center pane
2. In the Prompt flow builder pane, select the Configure tab
3. Enter map_to_json as the Node name
4. Choose Use a prompt from your Prompt Management
5. Select IDP_PAYSTUB_JSON from the dropdown
6. Choose the version you noted previously
7. Drag a connection from the output of the get_doc_text node to the doc_text input of this node
Add the S3 Storage node:
1. In the Prompt flow builder navigation pane, select the Nodes tab
2. Drag an S3 Storage node into your flow in the center pane
3. In the Prompt flow builder pane, select the Configure tab in
4. Enter save_json as the Node name
5. Expand the Inputs Set the input express for objectKey to $.data.JSON_s3key
6. Drag a connection from the output of the Flow input node to the objectKey input of this node
7. Drag a connection from the output of the map_to_json node to the content input of this node
Configure the Flow output node:
1. Drag a connection from the output of the save_json node to the input of this node
Choose Save to save your flow. Your flow should now be prepared for testing
1. To test your flow, in the Test prompt flow pane on the right, enter the following JSON object. Choose Run and the flow should return a model response
2. When you are satisfied with the result, choose Save and exit

{
"doc_text_s3key": "[PATH TO YOUR TEXT FILE IN S3].txt",
"JSON_s3key": "[PATH TO YOUR TEXT FILE IN S3].json"
}

To get the path to your file, follow these steps:

Navigate to FOR_REVIEW in S3 and choose the pages_[n].txt file
Choose the Properties tab
Copy the key path by selecting the copy icon to the left of the key value, as shown in the following screenshot. Be sure to replace .txt with .json in the second line of input as noted previously.

Publish a version and alias

On the flow management screen, choose Publish version. A success banner appears at the top
At the top of the screen, choose Create alias
Enter latest for the Alias name
Choose Use an existing version to associate this alias. From the dropdown menu, choose the version that you just published
Select Create alias. A success banner appears at the top.
Get the FlowId and AliasId to use in the step below
1. Choose the Alias you just created
2. From the ARN, copy the FlowId and AliasId

Add your new class to DynamoDB

Open the AWS Management Console and navigate to the DynamoDB service.
Select the table document-processing-bedrock-prompt-flows-IDP_CLASS_LIST
Choose Actions then Create item
Choose JSON view for entering the item data.
Paste the following JSON into the editor:

{
    "class_name": {
        "S": "PAYSTUB"
    },
    "expected_inputs": {
        "S": "Should contain Gross Pay, Net Pay, Pay Date "
    },
    "flow_alias_id": {
        "S": "[Your flow Alias ID]"
    },
    "flow_id": {
        "S": "[Your flow ID]"
    },
    "flow_name": {
        "S": "[The name of your flow]"
    }
}

Review the JSON to ensure all details are correct.
Choose Create item to add the new class to your DynamoDB table.

Test by repeating the upload of the test file

Use the console to repeat the upload of the paystub.jpg file from your customer123 folder into Amazon S3. Alternatively, you can enter the following command into the command line:

aws s3 cp ./sample_files/customer123/paystub.jpeg s3://[INPUT_BUCKET_NAME]/customer123/

In a few minutes, check the report in the output location to see that you successfully added support for the new document type.

Clean up

Use these steps to delete the resources you created to avoid incurring charges on your AWS account:

Empty the SourceS3Bucket and DestinationS3Bucket buckets including all versions
Use the following shell script to delete the CloudFormation stack and test resources from your account:

chmod +x cleanup.sh
./cleanup.sh

Return to the Expand the solution using Amazon Bedrock Prompt Flows section and follow these steps:
1. In the Create a prompt flow section:
  1. Choose the flow idp_paystub that you created and choose Delete
  2. Follow the instructions to permanently delete the flow
2. In the Create a new prompt section:
  1. Choose the prompt paystub_json that you created and choose Delete
  2. Follow the instructions to permanently delete the prompt

Conclusion

This solution demonstrates how customers can use Amazon Bedrock Prompt Flows to deploy and expand a scalable, low-code IDP pipeline. By taking advantage of the flexibility of Amazon Bedrock Prompt Flows, organizations can rapidly implement and adapt their document processing workflows to help meet evolving business needs. The low-code nature of Amazon Bedrock Prompt Flows makes it possible for business users and developers alike to create, modify, and extend IDP pipelines without extensive programming knowledge. This significantly reduces the time and resources required to deploy new document processing capabilities or adjust existing ones.

By adopting this integrated IDP solution, businesses across industries can accelerate their digital transformation initiatives, improve operational efficiency, and enhance their ability to extract valuable insights from document-based processes, driving significant competitive advantages.

Review your current manual document processing processes and identify where Amazon Bedrock Prompt Flows can help you automate these workflows for your business.

For further exploration and learning, we recommend checking out the following resources:

About the Authors

Erik Cordsen is a Solutions Architect at AWS serving customers in Georgia. He is passionate about applying cloud technologies and ML to solve real life problems. When he is not designing cloud solutions, Erik enjoys travel, cooking, and cycling.

Vivek Mittal is a Solution Architect at Amazon Web Services. He is passionate about serverless and machine learning technologies. Vivek takes great joy in assisting customers with building innovative solutions on the AWS cloud.

Brijesh Pati is an Enterprise Solutions Architect at AWS. His primary focus is helping enterprise customers adopt cloud technologies for their workloads. He has a background in application development and enterprise architecture and has worked with customers from various industries such as sports, finance, energy, and professional services. His interests include serverless architectures and AI/ML.

Governing the ML lifecycle at scale: Centralized observability with Amazon SageMaker and Amazon CloudWatch

October 29, 2024

by Abhishek Doppalapudi Amazon AWS

This post is part of an ongoing series on governing the machine learning (ML) lifecycle at scale. To start from the beginning, refer to Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker.

A multi-account strategy is essential not only for improving governance but also for enhancing security and control over the resources that support your organization’s business. This approach enables various teams within your organization to experiment, innovate, and integrate more rapidly while keeping the production environment secure and available for your customers. However, because multiple teams might use your ML platform in the cloud, monitoring large ML workloads across a scaling multi-account environment presents challenges in setting up and monitoring telemetry data that is scattered across multiple accounts. In this post, we dive into setting up observability in a multi-account environment with Amazon SageMaker.

Amazon SageMaker Model Monitor allows you to automatically monitor ML models in production, and alerts you when data and model quality issues appear. SageMaker Model Monitor emits per-feature metrics to Amazon CloudWatch, which you can use to set up dashboards and alerts. You can use cross-account observability in CloudWatch to search, analyze, and correlate cross-account telemetry data stored in CloudWatch such as metrics, logs, and traces from one centralized account. You can now set up a central observability AWS account and connect your other accounts as sources. Then you can search, audit, and analyze logs across your applications to drill down into operational issues in a matter of seconds. You can discover and visualize operational and model metrics from many accounts in a single place and create alarms that evaluate metrics belonging to other accounts.

AWS CloudTrail is also essential for maintaining security and compliance in your AWS environment by providing a comprehensive log of all API calls and actions taken across your AWS account, enabling you to track changes, monitor user activities, and detect suspicious behavior. This post also dives into how you can centralize CloudTrail logging so that you have visibility into user activities within all of your SageMaker environments.

Solution overview

Customers often struggle with monitoring their ML workloads across multiple AWS accounts, because each account manages its own metrics, resulting in data silos and limited visibility. ML models across different accounts need real-time monitoring for performance and drift detection, with key metrics like accuracy, CPU utilization, and AUC scores tracked to maintain model reliability.

To solve this, we implement a solution that uses SageMaker Model Monitor and CloudWatch cross-account observability. This approach enables centralized monitoring and governance, allowing your ML team to gain comprehensive insights into logs and performance metrics across all accounts. With this unified view, your team can effectively monitor and manage their ML workloads, improving operational efficiency.

Implementing the solution consists of the following steps:

Deploy the model and set up SageMaker Model Monitor.
Enable CloudWatch cross-account observability.
Consolidate metrics across source accounts and build unified dashboards.
Configure centralized logging to API calls across multiple accounts using CloudTrail.

The following architecture diagram showcases the centralized observability solution in a multi-account setup. We deploy ML models across two AWS environments, production and test, which serve as our source accounts. We use SageMaker Model Monitor to assess these models’ performance. Additionally, we enhance centralized management and oversight by using cross-account observability in CloudWatch to aggregate metrics from the ML workloads in these source accounts into the observability account.

Deploy the model and set up SageMaker Model Monitor

We deploy an XGBoost classifier model, trained on publicly available banking marketing data, to identify potential customers likely to subscribe to term deposits. This model is deployed in both production and test source accounts, where its real-time performance is continually validated against baseline metrics using SageMaker Model Monitor to detect deviations in model performance. Additionally, we use CloudWatch to centralize and share the data and performance metrics of these ML workloads in the observability account, providing a comprehensive view across different accounts. You can find the full source code for this post in the accompanying GitHub repo.

The first step is to deploy the model to an SageMaker endpoint with data capture enabled:

endpoint_name = f"BankMarketingTarget-endpoint-{datetime.utcnow():%Y-%m-%d-%H%M}"
print("EndpointName =", endpoint_name)

data_capture_config = DataCaptureConfig(
enable_capture=True, sampling_percentage=100, destination_s3_uri=s3_capture_upload_path)

model.deploy(
initial_instance_count=1,
instance_type="ml.m4.xlarge",
endpoint_name=endpoint_name,
data_capture_config=data_capture_config,)

For real-time model performance evaluation, it’s essential to establish a baseline. This baseline is created by invoking the endpoint with validation data. We use SageMaker Model Monitor to perform baseline analysis, compute performance metrics, and propose quality constraints for effective real-time performance evaluation.

Next, we define the model quality monitoring object and run the model quality monitoring baseline job. The model monitor automatically generates baseline statistics and constraints based on the provided validation data. The monitoring job evaluates the model’s predictions against ground truth labels to make sure the model maintains its performance over time.

Banking_Quality_Monitor = ModelQualityMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=1800,
    sagemaker_session=session,
)
job = Banking_Quality_Monitor.suggest_baseline(
    job_name=baseline_job_name,
    baseline_dataset=baseline_dataset_uri,
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=baseline_results_uri,
    problem_type="BinaryClassification",
    inference_attribute="prediction",
    probability_attribute="probability",
    ground_truth_attribute="label",
)
job.wait(logs=False)

In addition to the generated baseline, SageMaker Model Monitor requires two additional inputs: predictions from the deployed model endpoint and ground truth data provided by the model-consuming application. Because data capture is enabled on the endpoint, we first generate traffic to make sure prediction data is captured. When listing the data capture files stored, you should expect to see various files from different time periods, organized based on the hour in which the invocation occurred. When viewing the contents of a single file, you will notice the following details. The inferenceId attribute is set as part of the invoke_endpoint call. When ingesting ground truth labels and merging them with predictions for performance metrics, SageMaker Model Monitor uses inferenceId, which is included in captured data records. It’s used to merge these captured records with ground truth records, making sure the inferenceId in both datasets matches. If inferenceId is absent, it uses the eventId from captured data to correlate with the ground truth record.

{
"captureData": {
"endpointInput": {
"observedContentType": "text/csv",
"mode": "INPUT",
"data": "162,1,0.1,25,1.4,94.465,-41.8,4.961,0.2,0.3,0.4,0.5,0.6,0.7,0.8,1.1,0.9,0.10,0.11,0.12,0.13,0.14,0.15,1.2,0.16,0.17,0.18,0.19,0.20,1.3",
"encoding": "CSV"
},
"endpointOutput": {
"observedContentType": "text/csv; charset=utf-8",
"mode": "OUTPUT",
"data": "0.000508524535689503",
"encoding": "CSV"
}
},
"eventMetadata": {
"eventId": "527cfbb1-d945-4de8-8155-a570894493ca",
"inferenceId": "0",
"inferenceTime": "2024-08-18T20:25:54Z"
},
"eventVersion": "0"
}

SageMaker Model Monitor ingests ground truth data collected periodically and merges it with prediction data to calculate performance metrics. This monitoring process uses baseline constraints from the initial setup to continuously assess the model’s performance. By enabling enable_cloudwatch_metrics=True, SageMaker Model Monitor uses CloudWatch to monitor the quality and performance of our ML models, thereby emitting these performance metrics to CloudWatch for comprehensive tracking.

from sagemaker.model_monitor import CronExpressionGenerator

response = Banking_Quality_Monitor.create_monitoring_schedule(
monitor_schedule_name=Banking_monitor_schedule_name,
endpoint_input=endpointInput,
output_s3_uri=baseline_results_uri,
problem_type="BinaryClassification",
ground_truth_input=ground_truth_upload_path,
constraints=baseline_job.suggested_constraints(),
schedule_cron_expression=CronExpressionGenerator.hourly(),
enable_cloudwatch_metrics=True,
)

Each time the model quality monitoring job runs, it begins with a merge job that combines two datasets: the inference data captured at the endpoint and the ground truth data provided by the application. This is followed by a monitoring job that assesses the data for insights into model performance using the baseline setup.

Waiting for execution to finish......................................................!
groundtruth-merge-202408182100-7460007b77e6223a3f739740 job status: Completed
groundtruth-merge-202408182100-7460007b77e6223a3f739740 job exit message, if any: None
groundtruth-merge-202408182100-7460007b77e6223a3f739740 job failure reason, if any: None
Waiting for execution to finish......................................................!
model-quality-monitoring-202408182100-7460007b77e6223a3f739740 job status: Completed
model-quality-monitoring-202408182100-7460007b77e6223a3f739740 job exit message, if any: CompletedWithViolations: Job completed successfully with 8 violations.
model-quality-monitoring-202408182100-7460007b77e6223a3f739740 job failure reason, if any: None
Execution status is: CompletedWithViolations
{'MonitoringScheduleName': 'BankMarketingTarget-monitoring-schedule-2024-08-18-2029', 'ScheduledTime': datetime.datetime(2024, 8, 18, 21, 0, tzinfo=tzlocal()), 'CreationTime': datetime.datetime(2024, 8, 18, 21, 2, 21, 198000, tzinfo=tzlocal()), 'LastModifiedTime': datetime.datetime(2024, 8, 18, 21, 12, 53, 253000, tzinfo=tzlocal()), 'MonitoringExecutionStatus': 'CompletedWithViolations', 'ProcessingJobArn': 'arn:aws:sagemaker:us-west-2:730335512115:processing-job/model-quality-monitoring-202408182100-7460007b77e6223a3f739740', 'EndpointName': 'BankMarketingTarget-endpoint-2024-08-18-1958'}
====STOP====
No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures

Check for deviations from the baseline constraints to effectively set appropriate thresholds in your monitoring process. As you can see in the following the screenshot, various metrics such as AUC, accuracy, recall, and F2 score are closely monitored, each subject to specific threshold checks like LessThanThreshold or GreaterThanThreshold. By actively monitoring these metrics, you can detect significant deviations and make informed decisions promptly, making sure your ML models perform optimally within established parameters.

Enable CloudWatch cross-account observability

With CloudWatch integrated into SageMaker Model Monitor to track the metrics of ML workloads running in the source accounts (production and test), the next step involves enabling CloudWatch cross-account observability. CloudWatch cross-account observability allows you to monitor and troubleshoot applications spanning multiple AWS accounts within an AWS Region. This feature enables seamless searching, visualization, and analysis of metrics, logs, traces, and Application Insights across linked accounts, eliminating account boundaries. You can use this feature to consolidate CloudWatch metrics from these source accounts into the observability account.

To achieve this centralized governance and monitoring, we establish two types of accounts:

Observability account – This central AWS account aggregates and interacts with ML workload metrics from the source accounts
Source accounts (production and test) – These individual AWS accounts share their ML workload metrics and logging resources with the central observability account, enabling centralized oversight and analysis

Configure the observability account

Complete the following steps to configure the observability account:

On the CloudWatch console of the observability account, choose Settings in the navigation pane.
In the Monitoring account configuration section, choose Configure.

Select which telemetry data can be shared with the observability account.

Under List source accounts, enter the source accounts that will share data with the observability account.

To link the source accounts, you can use account IDs, organization IDs, or organization paths. You can use an organization ID to include all accounts within the organization, or an organization path can target all accounts within a specific department or business unit. In this case, because we have two source accounts to link, we enter the account IDs of those two accounts.

Choose Configure.

After the setup is complete, the message “Monitoring account enabled” appears in the CloudWatch settings.

Additionally, your source accounts are listed on the Configuration policy tab.

Link source accounts

Now that the observability account has been enabled with source accounts, you can link these source accounts within an AWS organization. You can choose from two methods:

For organizations using AWS CloudFormation, you can download a CloudFormation template and deploy it in a CloudFormation delegated administration account. This method facilitates the bulk addition of source accounts.
For linking individual accounts, two options are available:
- Download a CloudFormation template that can be deployed directly within each source account.
- Copy a provided URL, which simplifies the setup process using the AWS Management Console.

Complete the following steps to use the provided URL:

Copy the URL and open it in a new browser window where you’re logged in as the source account.

Configure the telemetry data you want to share. This can include logs, metrics, traces, Application Insights, or Internet Monitor.

During this process, you’ll notice that the Amazon Resource Name (ARN) of the observability account configuration is automatically filled in. This convenience is due to copying and pasting the URL provided in the earlier step. If, however, you choose not to use the URL, you can manually enter the ARN. Copy the ARN from the observability account settings and enter it into the designated field in the source account configuration page.

Define the label that identifies your source accounts. This label is crucial for organizing and distinguishing your accounts within the monitoring system.

Choose Link to finalize the connection between your source accounts and the observability account.

Repeat these steps for both source accounts.

You should see those accounts listed on the Linked source accounts tab within the observability account CloudWatch settings configuration.

Consolidate metrics across source accounts and build unified dashboards

In the observability account, you can access and monitor detailed metrics related to your ML workloads and endpoints deployed across the source accounts. This centralized view allows you to track a variety of metrics, including those from SageMaker endpoints and processing jobs, all within a single interface.

The following screenshot displays CloudWatch model metrics for endpoints in your source accounts. Because you linked the production and test source accounts using the label as the account name, CloudWatch categorizes metrics by account label, effectively distinguishing between the production and test environments. It organizes key details into columns, including account labels, metric names, endpoints, and performance metrics like accuracy and AUC, all captured by scheduled monitoring jobs. These metrics offer valuable insights into the performance of your models across these environments.

The observability account allows you to monitor key metrics of ML workloads and endpoints. The following screenshots display CPU utilization metrics associated with the BankMarketingTarget model and BankMarketing model endpoints you deployed in the source accounts. This view provides detailed insights into critical performance indicators, including:

CPU utilization
Memory utilization
Disk utilization

Furthermore, you can create dashboards that offer a consolidated view of key metrics related to your ML workloads running across the linked source accounts. These centralized dashboards are pivotal for overseeing the performance, reliability, and quality of your ML models on a large scale.

Let’s look at a consolidated view of the ML workload metrics running in our production and test source accounts. This dashboard provides us with immediate access to critical information:

AUC scores – Indicating model performance, giving insights into the trade-off between true positives and false positives
Accuracy rates – Showing prediction correctness, which helps in assessing the overall reliability of the model
F2 scores – Offering a balance between precision and recall, particularly valuable when false negatives are more critical to minimize
Total number of violations – Highlighting any breaches in predefined thresholds or constraints, making sure the model adheres to expected behavior
CPU usage levels – Helping you manage resource allocation by monitoring the processing power utilized by the ML workloads
Disk utilization percentages – Providing efficient storage management by keeping track of how much disk space is being consumed

This following screenshots show CloudWatch dashboards for the models deployed in our production and test source accounts. We track metrics for accuracy, AUC, CPU and disk utilization, and violation counts, providing insights into model performance and resource usage.

You can configure CloudWatch alarms to proactively monitor and receive notifications on critical ML workload metrics from your source accounts. The following screenshot shows an alarm configured to track the accuracy of our bank marketing prediction model in the production account. This alarm is set to trigger if the model’s accuracy falls below a specified threshold, so any significant degradation in performance is promptly detected and addressed. By using such alarms, you can maintain high standards of model performance and quickly respond to potential issues within your ML infrastructure.

You can also create a comprehensive CloudWatch dashboard for monitoring various aspects of Amazon SageMaker Studio, including the number of domains, apps, and user profiles across different AWS accounts. The following screenshot illustrates a dashboard that centralizes key metrics from the production and test source accounts.

Configure centralized logging of API calls across multiple accounts with CloudTrail

If AWS Control Tower has been configured to automatically create an organization-wide trail, each account will send a copy of its CloudTrail event trail to a centralized Amazon Simple Storage Service (Amazon S3) bucket. This bucket is typically created in the log archive account and is configured with limited access, where it serves as a single source of truth for security personnel. If you want to set up a separate account to allow the ML admin team to have access, you can configure replication from the log archive account. You can create the destination bucket in the observability account.

After you create the bucket for replicated logs, you can configure Amazon S3 replication by defining the source and destination bucket, and attaching the required AWS Identity and Access Management (IAM) permissions. Then you update the destination bucket policy to allow replication.

Complete the following steps:

Create an S3 bucket in the observability account.
Log in to the log archive account.
On the Amazon S3 console, open the Control Tower logs bucket, which will have the format aws-controltower-logs-{ACCOUNT-ID}-{REGION}.

You should see an existing key that corresponds to your organization ID. The trail logs are stored under /{ORG-ID}/AWSLogs/{ACCOUNT-ID}/CloudTrail/{REGION}/YYYY/MM/DD.

On the Management tab, choose Create replication rule.
For Replication rule name, enter a name, such as replicate-ml-workloads-to-observability.
Under Source bucket, select Limit the scope of the rule using one or more filters, and enter a path the corresponds to the account you want to enable querying against.

Select Specify a bucket in another account and enter the observability account ID and the bucket name.
Select Change object ownership to destination bucket owner.
For IAM role, choose Create new role.

After you set the cross-account replication, the logs being stored in the S3 bucket in the log archive account will be replicated in the observability account. You can now use Amazon Athena to query and analyze the data being stored in Amazon S3. If you don’t have Control Tower configured, you have to manually configure CloudTrail in each account to write to the S3 bucket in the centralized observability account for analysis. If your organization has more stringent security and compliance requirements, you can configure replication of just the SageMaker logs from the log archive account to the bucket in the observability account by integrating Amazon S3 Event Notifications with AWS Lambda functions.

The following is a sample query run against the logs stored in the observability account bucket and the associated result in Athena:

SELECT useridentity.arn, useridentity.sessioncontext.sourceidentity, requestparametersFROM observability_replicated_logs
WHERE eventname = 'CreateEndpoint'
AND eventsource = 'sagemaker.amazonaws.com'

Conclusion

Centralized observability in a multi-account setup empowers organizations to manage ML workloads at scale. By integrating SageMaker Model Monitor with cross-account observability in CloudWatch, you can build a robust framework for real-time monitoring and governance across multiple environments.

This architecture not only provides continuous oversight of model performance, but also significantly enhances your ability to quickly identify and resolve potential issues, thereby improving governance and security throughout our ML ecosystem.

In this post, we outlined the essential steps for implementing centralized observability within your AWS environment, from setting up SageMaker Model Monitor to using cross-account features in CloudWatch. We also demonstrated centralizing CloudTrail logs by replicating them from the log archive account and querying them using Athena to get insights into user activity within SageMaker environments across the organization.

As you implement this solution, remember that achieving optimal observability is an ongoing process. Continually refining and expanding your monitoring capabilities is crucial to making sure your ML models remain reliable, efficient, and aligned with business objectives. As ML practices evolve, blending cutting-edge technology with sound governance principles is key. Run the code yourself using the following notebook or try out the observability module in the following workshop.

About the Authors

Abhishek Doppalapudi is a Solutions Architect at Amazon Web Services (AWS), where he assists startups in building and scaling their products using AWS services. Currently, he is focused on helping AWS customers adopt Generative AI solutions. In his free time, Abhishek enjoys playing soccer, watching Premier League matches, and reading.

Venu Kanamatareddy is a Startup Solutions Architect at AWS. He brings 16 years of extensive IT experience working with both Fortune 100 companies and startups. Currently, Venu is helping guide and assist Machine Learning and Artificial Intelligence-based startups to innovate, scale, and succeed.

Vivek Gangasani is a Senior GenAI Specialist Solutions Architect at AWS. He helps emerging GenAI companies build innovative solutions using AWS services and accelerated compute. Currently, he is focused on developing strategies for fine-tuning and optimizing the inference performance of Large Language Models. In his free time, Vivek enjoys hiking, watching movies and trying different cuisines.

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure, scalable, reliable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides motorcycle and walks with his three-year old sheep-a-doodle!

Import data from Google Cloud Platform BigQuery for no-code machine learning with Amazon SageMaker Canvas

October 28, 2024

by Amit Gautam Amazon AWS

In the modern, cloud-centric business landscape, data is often scattered across numerous clouds and on-site systems. This fragmentation can complicate efforts by organizations to consolidate and analyze data for their machine learning (ML) initiatives.

This post presents an architectural approach to extract data from different cloud environments, such as Google Cloud Platform (GCP) BigQuery, without the need for data movement. This minimizes the complexity and overhead associated with moving data between cloud environments, enabling organizations to access and utilize their disparate data assets for ML projects.

We highlight the process of using Amazon Athena Federated Query to extract data from GCP BigQuery, using Amazon SageMaker Data Wrangler to perform data preparation, and then using the prepared data to build ML models within Amazon SageMaker Canvas, a no-code ML interface.

SageMaker Canvas allows business analysts to access and import data from over 50 sources, prepare data using natural language and over 300 built-in transforms, build and train highly accurate models, generate predictions, and deploy models to production without requiring coding or extensive ML experience.

Solution overview

The solution outlines two main steps:

Set up Amazon Athena for federated queries from GCP BigQuery, which enables running live queries in GCP BigQuery directly from Athena
Import the data into SageMaker Canvas from BigQuery using Athena as an intermediate

After the data is imported into SageMaker Canvas, you can use the no-code interface to build ML models and generate predictions based on the imported data.

You can use SageMaker Canvas to build the initial data preparation routine and generate accurate predictions without writing code. However, as your ML needs evolve or require more advanced customization, you may want to transition from a no-code environment to a code-first approach. The integration between SageMaker Canvas and Amazon SageMaker Studio allows you to operationalize the data preparation routine for production-scale deployments. For more details, refer to Seamlessly transition between no-code and code-first machine learning with Amazon SageMaker Canvas and Amazon SageMaker Studio

The overall architecture, as seen below, demonstrates how to use AWS services to seamlessly access and integrate data from a GCP BigQuery data warehouse into SageMaker Canvas for building and deploying ML models.

The workflow includes the following steps:

Within the SageMaker Canvas interface, the user composes a SQL query to run against the GCP BigQuery data warehouse. SageMaker Canvas relays this query to Athena, which acts as an intermediary service, facilitating the communication between SageMaker Canvas and BigQuery.
Athena uses the Athena Google BigQuery connector, which uses a pre-built AWS Lambda function to enable Athena federated query capabilities. This Lambda function retrieves the necessary BigQuery credentials (service account private key) from AWS Secrets Manager for authentication purposes.
After authentication, the Lambda function uses the retrieved credentials to query BigQuery and obtain the desired result set. It parses this result set and sends it back to Athena.
Athena returns the queried data from BigQuery to SageMaker Canvas, where you can use it for ML model training and development purposes within the no-code interface.

This solution offers the following benefits:

Seamless integration – SageMaker Canvas empowers you to integrate and use data from various sources, including cloud data warehouses like BigQuery, directly within its no-code ML environment. This integration eliminates the need for additional data movement or complex integrations, enabling you to focus on building and deploying ML models without the overhead of data engineering tasks.
Secure access – The use of Secrets Manager makes sure BigQuery credentials are securely stored and accessed, enhancing the overall security of the solution.
Scalability – The serverless nature of the Lambda function and the ability in Athena to handle large datasets make this solution scalable and able to accommodate growing data volumes. Additionally, you can use multiple queries to partition the data to source in parallel.

In the next sections, we dive deeper into the technical implementation details and walk through a step-by-step demonstration of this solution.

Dataset

The steps outlined in this post provide an example of how to import data into SageMaker Canvas for no-code ML. In this example, we demonstrate how to import data through Athena from GCP BigQuery.

For our dataset, we use a synthetic dataset from a telecommunications mobile phone carrier. This sample dataset contains 5,000 records, where each record uses 21 attributes to describe the customer profile. The Churn column in the dataset indicates whether the customer left service (true/false). This Churn attribute is the target variable that the ML model should aim to predict.

The following screenshot shows an example of the dataset on the BigQuery console.

Prerequisites

Complete the following prerequisite steps:

Create a service account in GCP and a service account key.
Download the private key JSON file.
Store the JSON file in Secrets Manager:
1. On the Secrets Manager console, choose Secrets in the navigation pane, then choose Store a new secret.
2. For Secret type¸ select Other type of secret.
3. Copy the contents of the JSON file and enter it under Key/value pairs on the Plaintext tab.

If you don’t have a SageMaker domain already created, create it along with the user profile. For instructions, see Quick setup to Amazon SageMaker.

Make sure the user profile has permission to invoke Athena by confirming that the AWS Identity and Access Management (IAM) role has glue:GetDatabase and athena:GetDataCatalog permission on the resource. See the following example:

{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"glue:GetDatabase",
"athena:GetDataCatalog"
],
"Resource": [
"arn:aws:glue:*:<AWS account id>:catalog",
"arn:aws:glue:*:<AWS account id>:database/*",
"arn:aws:athena:*:<AWS account id>:datacatalog/*"
]
}
]
}

Register the Athena data source connector

Complete the following steps to set up the Athena data source connector:

On the Athena console, choose Data sources in the navigation pane.
Choose Create data source.
On the Choose a data source page, search for and select Google BigQuery, then choose Next.

On the Enter data source details page, provide the following information:
1. For Data source name¸ enter a name.
2. For Description, enter an optional description.
3. For Lambda function, choose Create Lambda function to configure the connection.

Under Application settings¸ enter the following details:
1. For SpillBucket, enter the name of the bucket where the function can spill data.
2. For GCPProjectID, enter the project ID within GCP.
3. For LambdaFunctionName, enter the name of the Lambda function that you’re creating.
4. For SecretNamePrefix, enter the secret name stored in Secrets Manager that contains GCP credentials.

Choose Deploy.

You’re returned to the Enter data source details page.

In the Connection details section, choose the refresh icon under Lambda function.
Choose the Lambda function you just created. The ARN of the Lambda function is displayed.
Optionally, for Tags, add key-value pairs to associate with this data source.

For more information about tags, see Tagging Athena resources.

Choose Next.
On the Review and create page, review the data source details, then choose Create data source.

The Data source details section of the page for your data source shows information about your new connector. You can now use the connector in your Athena queries. For information about using data connectors in queries, see Running federated queries.

To query from Athena, launch the Athena SQL editor and choose the data source you created. You should be able to run live queries against the BigQuery database.

Connect to SageMaker Canvas with Athena as a data source

To import data from Athena, complete the following steps:

On the SageMaker Canvas console, choose Data Wrangler in the navigation pane.
Choose Import data and prepare.
Select the Tabular
Choose Athena as the data source.

SageMaker Data Wrangler in SageMaker Canvas allows you to prepare, featurize, and analyze your data. You can integrate a SageMaker Data Wrangler data preparation flow into your ML workflows to simplify and streamline data preprocessing and feature engineering using little to no coding.

Choose an Athena table in the left pane from AwsDataCatalog and drag and drop the table into the right pane.

Choose Edit in SQL and enter the following SQL query:

SELECT 
state,
account_length,
area_code,
phone,
intl_plan,
vmail_plan,vmail_message,day_mins,
day_calls,
day_charge,
eve_mins,
eve_calls,
eve_charge,
night_mins,
night_calls,
night_charge,
intl_mins,
intl_calls,
intl_charge,
custserv_calls,
churn FROM "bigquery"."athenabigquery"."customer_churn" order by random() limit 50 ;

In the preceding query, bigquery is the data source name created in Athena, athenabigquery is the database name, and customer_churn is the table name.

Choose Run SQL to preview the dataset and when you’re satisfied with the data, choose Import.

When working with ML, it’s crucial to randomize or shuffle the dataset. This step is essential because you may have access to millions or billions of data points, but you don’t necessarily need to use the entire dataset for training the model. Instead, you can limit the data to a smaller subset specifically for training purposes. After you’ve shuffled and prepared the data, you can begin the iterative process of data preparation, feature evaluation, model training, and ultimately hosting the trained model.

You can process or export your data to a location that is suitable for your ML workflows. For example, you can export the transformed data as a SageMaker Canvas dataset and create an ML model from it.
After you export your data, choose Create model to create an ML model from your data.

The data is imported into SageMaker Canvas as a dataset from the specific table in Athena. You can now use this dataset to create a model.

Train a model

After your data is imported, it shows up on the Datasets page in SageMaker Canvas. At this stage, you can build a model. To do so, complete the following steps:

Select your dataset and choose Create a model.

For Model name, enter your model name (for this post, my_first_model).

SageMaker Canvas enables you to create models for predictive analysis, image analysis, and text analysis.

Because we want to categorize customers, select Predictive analysis for Problem type.
Choose Create.

On the Build page, you can see statistics about your dataset, such as the percentage of missing values and mode of the data.

For Target column, choose a column that you want to predict (for this post, churn).

SageMaker Canvas offers two types of models that can generate predictions. Quick build prioritizes speed over accuracy, providing a model in 2–15 minutes. Standard build prioritizes accuracy over speed, providing a model in 30 minutes–2 hours.

For this example, choose Quick build.

After the model is trained, you can analyze the model accuracy.

The Overview tab shows us the column impact, or the estimated importance of each column in predicting the target column. In this example, the Night_calls column has the most significant impact in predicting if a customer will churn. This information can help the marketing team gain insights that lead to taking actions to reduce customer churn. For example, we can see that both low and high CustServ_Calls increase the likelihood of churn. The marketing team can take actions to help prevent customer churn based on these learnings. Examples include creating a detailed FAQ on websites to reduce customer service calls, and running education campaigns with customers on the FAQ that can keep engagement up.

Generate predictions

On the Predict tab, you can generate both batch predictions and single predictions. Complete the following steps to generate a batch prediction:

Download the following sample inference dataset for generating predictions.
To test batch predictions, choose Batch prediction.

SageMaker Canvas allows you to generate batch predictions either manually or automatically on a schedule. To learn how to automate batch predictions on a schedule, refer to Manage automations.

For this post, choose Manual.
Upload the file you downloaded.
Choose Generate predictions.

After a few seconds, the prediction is complete, and you can choose View to see the prediction.

Optionally, choose Download to download a CSV file containing the full output. SageMaker Canvas will return a prediction for each row of data and the probability of the prediction being correct.

Optionally, you can deploy your models to an endpoint to make predictions. For more information, refer to Deploy your models to an endpoint.

Clean up

To avoid future charges, log out of SageMaker Canvas.

Conclusion

In this post, we showcased a solution to extract the data from BigQuery using Athena federated queries and a sample dataset. We then used the extracted data to build an ML model using SageMaker Canvas to predict customers at risk of churning—without writing code. SageMaker Canvas enables business analysts to build and deploy ML models effortlessly through its no-code interface, democratizing ML across the organization. This enables you to harness the power of advanced analytics and ML to drive business insights and innovation, without the need for specialized technical skills.

For more information, see Query any data source with Amazon Athena’s new federated query and Import data from over 40 data sources for no-code machine learning with Amazon SageMaker Canvas. If you’re new to SageMaker Canvas, refer to Build, Share, Deploy: how business analysts and data scientists achieve faster time-to-market using no-code ML and Amazon SageMaker Canvas.

About the authors

Amit Gautam is an AWS senior solutions architect supporting enterprise customers in the UK on their cloud journeys, providing them with architectural advice and guidance that helps them achieve their business outcomes.

Sujata Singh is an AWS senior solutions architect supporting enterprise customers in the UK on their cloud journeys, providing them with architectural advice and guidance that helps them achieve their business outcomes.

Customized model monitoring for near real-time batch inference with Amazon SageMaker

October 28, 2024

by Joe King Amazon AWS

Real-world applications vary in inference requirements for their artificial intelligence and machine learning (AI/ML) solutions to optimize performance and reduce costs. Examples include financial systems processing transaction data streams, recommendation engines processing user activity data, and computer vision models processing video frames. In these scenarios, customized model monitoring for near real-time batch inference with Amazon SageMaker is essential, making sure the quality of predictions is continuously monitored and any deviations are promptly detected.

In this post, we present a framework to customize the use of Amazon SageMaker Model Monitor for handling multi-payload inference requests for near real-time inference scenarios. SageMaker Model Monitor monitors the quality of SageMaker ML models in production. Early and proactive detection of deviations in model quality enables you to take corrective actions, such as retraining models, auditing upstream systems, or fixing quality issues without having to monitor models manually or build additional tooling. SageMaker Model Monitor provides monitoring capabilities for data quality, model quality, bias drift in a model’s predictions, and drift in feature attribution. SageMaker Model Monitor adapts well to common AI/ML use cases and provides advanced capabilities given edge case requirements such as monitoring custom metrics, handling ground truth data, or processing inference data capture.

You can deploy your ML model to SageMaker hosting services and get a SageMaker endpoint for real-time inference. Your client applications invoke this endpoint to get inferences from the model. To reduce the number of invocations and meet custom business objectives, AI/ML developers can customize inference code to send multiple inference records in one payload to the endpoint for near real-time model predictions. Rather than using a SageMaker Model Monitoring schedule with native configurations, a SageMaker Model Monitor Bring Your Own Container (BYOC) approach meets these custom requirements. Although this advanced BYOC topic can appear overwhelming to AI/ML developers, with the right framework, there is opportunity to accelerate SageMaker Model Monitor BYOC development for customized model monitoring requirements.

In this post, we provide a BYOC framework with SageMaker Model Monitor to enable customized payload handling (such as multi-payload requests) from SageMaker endpoint data capture, use ground truth data, and output custom business metrics for model quality.

Overview of solution

SageMaker Model Monitor uses a SageMaker pre-built image using Spark Deequ, which accelerates the usage of model monitoring. Using this pre-built image occasionally becomes problematic when customization is required. For example, the pre-built image requires one inference payload per inference invocation (request to a SageMaker endpoint). However, if you’re sending multiple payloads in one invocation to reduce the number of invocations and setting up model monitoring with SageMaker Model Monitor, then you will need to explore additional capabilities within SageMaker Model Monitor.

A preprocessor script is a capability of SageMaker Model Monitor to preprocess SageMaker endpoint data capture before creating metrics for model quality. However, even with a preprocessor script, you still face a mismatch in the designed behavior of SageMaker Model Monitor, which expects one inference payload per request.

Given these requirements, we create the BYOC framework shown in the following diagram. In this example, we demonstrate setting up a SageMaker Model Monitor job for monitoring model quality.

The workflow includes the following steps:

Before and after training an AI/ML model, an AI/ML developer creates baseline and validation data that is used downstream for monitoring model quality. For example, users can save the accuracy score of a model, or create custom metrics, to validate model quality.
An AI/ML developer creates a SageMaker endpoint including custom inference scripts. Data capture must be enabled for the SageMaker endpoint to save real-time inference data to Amazon Simple Storage Service (Amazon S3) and support downstream SageMaker Model Monitor.
A user or application sends a request including multiple inference payloads. If you have a large volume of inference records, SageMaker batch transform may be a suitable option for your use case.
The SageMaker endpoint (which includes the custom inference code to preprocesses the multi-payload request) passes the inference data to the ML model, postprocesses the predictions, and sends a response to the user or application. The information pertaining to the request and response is stored in Amazon S3.
Independent of calling the SageMaker endpoint, the user or application generates ground truth for the predictions returned by the SageMaker endpoint.
A customer image (BYOC) is pushed to Amazon Elastic Container Registry (Amazon ECR) that contains code to perform the following actions:
- Read input and output contracts required for SageMaker Model Monitor.
- Read ground truth data.
- Optionally, read any baseline constraint or validation data (such as accuracy score threshold).
- Process data capture stored in Amazon S3 from the SageMaker endpoint.
- Compare real-time data with ground truth and create model quality metrics.
- Publish metrics to Amazon CloudWatch Logs and output a model quality report.
The AI/ML developer creates a SageMaker Model Monitor schedule and sets the custom image (BYOC) as the referable image URI.

This post uses code provided in the following GitHub repo to demonstrate the solution. The process includes the following steps:

Train a multi-classification XGBoost model using the public forest coverage dataset.
Create an inference script for the SageMaker endpoint for custom inference logic.
Create a SageMaker endpoint with data capture enabled.
Create a constraint file that contains metrics used to determine if model quality alerts should be generated.
Create a custom Docker image for SageMaker Model Monitor by using the SageMaker Docker Build CLI and push it to Amazon ECR.
Create a SageMaker Model Monitor schedule with the BYOC image.
View the custom model quality report generated by the SageMaker Model Monitor job.

Prerequisites

To follow along with this walkthrough, make sure you have the following prerequisites:

An AWS account
Access to Amazon SageMaker Studio
Applicable AWS Identity and Access Management (IAM) roles for SageMaker, Amazon S3, and Amazon ECR
Familiarity with model deployment and model monitoring concepts
The GitHub repository cloned to an environment within SageMaker Studio

Train the model

In the SageMaker Studio environment, launch a SageMaker training job to train a multi-classification model and output model artifacts to Amazon S3:


from sagemaker.xgboost.estimator import XGBoost
from sagemaker.estimator import Estimator

hyperparameters = {
    "max_depth": 5,
    "eta": 0.36,
    "gamma": 2.88,
    "min_child_weight": 9.89,
    "subsample": 0.77,
    "objective": "multi:softprob",
    "num_class": 7,
    "num_round": 50
}

xgb_estimator = XGBoost(
    entry_point="./src/train.py",
    hyperparameters=hyperparameters,
    role=role,
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    framework_version="1.5-1",
    output_path=f's3://{bucket}/{prefix_name}/models'
)

xgb_estimator.fit(
    {
        "train": train_data_path,
        "validation": validation_data_path
    },
    wait=True,
    logs=True
)

Create Inference Code

Before you deploy the SageMaker endpoint, create an inference script (inference.py) that contains a function to preprocess the request with multiple payloads, invoke the model, and postprocess results.

For output_fn, a payload index is created for each inference record found in the request. This enables you to merge ground truth records with data capture within the SageMaker Model Monitor job.

See the following code:

def input_fn(input_data, content_type):
    """Take request data and de-serializes the data into an object for prediction.
        When an InvokeEndpoint operation is made against an Endpoint running SageMaker model server,
        the model server receives two pieces of information:
            - The request Content-Type, for example "application/json"
            - The request data, which is at most 5 MB (5 * 1024 * 1024 bytes) in size.
    Args:
        input_data (obj): the request data.
        content_type (str): the request Content-Type.
    Returns:
        (obj): data ready for prediction. For XGBoost, this defaults to DMatrix.
    """
    
    if content_type == "application/json":
        request_json = json.loads(input_data)
        prediction_df = pd.DataFrame.from_dict(request_json)
        return xgb.DMatrix(prediction_df)
    else:
        raise ValueError


def predict_fn(input_data, model):
    """A predict_fn for XGBooost Framework. Calls a model on data deserialized in input_fn.
    Args:
        input_data: input data (DMatrix) for prediction deserialized by input_fn
        model: XGBoost model loaded in memory by model_fn
    Returns: a prediction
    """
    output = model.predict(input_data, validate_features=True)
    return output


def output_fn(prediction, accept):
    """Function responsible to serialize the prediction for the response.
    Args:
        prediction (obj): prediction returned by predict_fn .
        accept (str): accept content-type expected by the client.
    Returns: JSON output
    """
    
    if accept == "application/json":
        prediction_labels = np.argmax(prediction, axis=1)
        prediction_scores = np.max(prediction, axis=1)
        output_returns = [
            {
                "payload_index": int(index), 
                "label": int(label), 
                "score": float(score)} for label, score, index in zip(
                prediction_labels, prediction_scores, range(len(prediction_labels))
            )
        ]
        return worker.Response(encoders.encode(output_returns, accept), mimetype=accept)
    
    else:
        raise ValueError

Deploy the SageMaker endpoint

Now that you have created the inference script, you can create the SageMaker endpoint:


from sagemaker.model_monitor import DataCaptureConfig

predictor = xgb_estimator.deploy(
    instance_type="ml.m5.large",
    initial_instance_count=1,
    wait=True,
    data_capture_config=DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=100,
        destination_s3_uri=f"s3://{bucket}/{prefix_name}/model-monitor/data-capture"
    ),
    source_dir="./src",
    entry_point="inference.py"
)

Create constraints for model quality monitoring

In model quality monitoring, you need to compare your metric generated from ground truth and data capture with a pre-specified threshold. In this example, we use the accuracy value of the trained model on the test set as a threshold. If the newly computed accuracy metric (generated using ground truth and data capture) is lower than this threshold, a violation report will be generated and the metrics will be published to CloudWatch.

See the following code:

constraints_dict = {
    "accuracy":{
        "threshold": accuracy_value
    }
}
    

# Serializing json
json_object = json.dumps(constraints_dict, indent=4)
 
# Writing to sample.json
with open("constraints.json", "w") as outfile:
    outfile.write(json_object)

This contraints.json file is written to Amazon S3 and will be the input for the processing job for the SageMaker Model Monitor job downstream.

Publish the BYOC image to Amazon ECR

Create a script named model_quality_monitoring.py to perform the following functions:

Read environment variables and any arguments passed to the SageMaker Model Monitor job
Read SageMaker endpoint data capture and constraint metadata configured with the SageMaker Model Monitor job
Read ground truth data from Amazon S3 using the AWS SDK for pandas
Create accuracy metrics with data capture and ground truth
Create metrics and violation reports given constraint violations
Publish metrics to CloudWatch if violations are present

This script serves as the entry point for the SageMaker Model Monitor job. With a custom image, the entry point script needs to be specified in the Docker image, as shown in the following code. This way, when the SageMaker Model Monitor job initiates, the specified script is run. The sm-mm-mqm-byoc:1.0 image URI is passed to the image_uri argument when you define the SageMaker Model Monitor job downstream.

FROM 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3

RUN python3 -m pip install awswrangler

ENV PYTHONUNBUFFERED=TRUE

ADD ./src/model_quality_monitoring.py /

ENTRYPOINT ["python3", "/model_quality_monitoring.py"]

The custom BYOC image is pushed to Amazon ECR using the SageMaker Docker Build CLI:

sm-docker build . --file ./docker/Dockerfile --repository sm-mm-mqm-byoc:1.0

Create a SageMaker Model Monitor schedule

Next, you use the Amazon SageMaker Python SDK to create a model monitoring schedule. You can define the BYOC ECR image created in the previous section as the image_uri parameter.

You can customize the environment variables and arguments passed to the SageMaker Processing job when SageMaker Model Monitor runs the model quality monitoring job. In this example, the ground truth Amazon S3 URI path is passed as an environment variable and is used within the SageMaker Processing job:


sm_mm_mqm = ModelMonitor(
    role=role, 
    image_uri=f"{account_id}.dkr.ecr.us-east-1.amazonaws.com/sm-mm-mqm-byoc:1.0", 
    instance_count=1, 
    instance_type='ml.m5.xlarge', 
    base_job_name="sm-mm-mqm-byoc",
    sagemaker_session=sess,
    env={
        "ground_truth_s3_uri_path": f"s3://{bucket}/{prefix_name}/model-monitor/mqm/ground_truth/{predictor.endpoint_name}"
    }
)

Before you create the schedule, specify the endpoint name, the Amazon S3 URI output location you want to send violation reports to, the statistics and constraints metadata files (if applicable), and any custom arguments you want to pass to your entry script within your BYOC SageMaker Processing job. In this example, the argument –-create-violation-tests is passed, which creates a mock violation for demonstration purposes. SageMaker Model Monitor accepts the rest of the parameters and translates them into environment variables, which you can use within your custom monitoring job.

sm_mm_mqm.create_monitoring_schedule(
    endpoint_input=predictor.endpoint_name,
    output=MonitoringOutput(
        source="/opt/ml/processing/output",
        destination=f"s3://{bucket}/{prefix_name}/model-monitor/mqm/reports"
    ),
    statistics=f"s3://{bucket}/{prefix_name}/model-monitor/mqm/baseline-data/statistics.json",
    constraints=f"s3://{bucket}/{prefix_name}/model-monitor/mqm/baseline-data/constraints.json",
    monitor_schedule_name="sm-mm-byoc-batch-inf-schedule",
    schedule_cron_expression=CronExpressionGenerator().hourly(),
    arguments=[
        "--create-violation-tests"
    ]
)

Review the entry point script model_quallity_monitoring.py to better understand how to use custom arguments and environment variables provided by the SageMaker Model Monitor job.

Observe the SageMaker Model Monitor job output

Now that the SageMaker Model Monitor resource is created, the SageMaker endpoint is invoked.

In this example, a request is provided that includes a list of two payloads in which we want to collect predictions:

sm_runtime = boto3.client("sagemaker-runtime")

response = sm_runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType="application/json",
    Accept="application/json",
    Body=test_records,
    InferenceId="0"
)

InferenceId is passed as an argument to the invoke_endpoint method. This ID is used downstream when merging the ground truth data to the real-time SageMaker endpoint data capture. In this example, we want to collect ground truth with the following structure.

InferenceI	payload_index	groundTruthLabel
0	0	1
0	1	0

This makes it simpler when merging the ground truth data with real-time data within the SageMaker Model Monitor custom job.

Because we set the CRON schedule for the SageMaker Model Monitor job to an hourly schedule, we can view the results at the end of the hour. In SageMaker Studio Classic, by navigating the SageMaker endpoint details page, you can choose the Monitoring job history tab to view status reports of the SageMaker Model Monitor job.

If an issue is found, you can choose the monitoring job name to review the report.

In this example, the custom model monitoring metric created in the BYOC flagged an accuracy score violation of -1 (this was done purposely for demonstration with the argument --create-violation-tests).

This gives you the ability to monitor model quality violations for your custom SageMaker Model Monitor job within the SageMaker Studio console. If you want to invoke CloudWatch alarms based on published CloudWatch metrics, you must create these CloudWatch metrics with your BYOC job. You can review how this is done within the monitor_quality_monitoring.py script. For automated alerts for model monitoring, creating an Amazon Simple Notification Service (Amazon SNS) topic is recommended, which email user groups will subscribe to for alerts on a given CloudWatch metric alarm.

Clean up

To avoid incurring future charges, delete all resources related to the SageMaker Model Monitor schedule by completing the following steps:

Delete data capture and any ground truth data:

! aws s3 rm s3://{bucket}/{prefix_name}/model-monitor/data-capture/{predictor.endpoint_name} --recursive
! aws s3 rm s3://{bucket}/{prefix_name}/model-monitor/mqm/ground_truth/{predictor.endpoint_name} --recursive

Delete the monitoring schedule:
```
sm_mm_mqm.delete_monitoring_schedule()
```

Delete the SageMaker model and SageMaker endpoint:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

Custom business or technical requirements for a SageMaker endpoint frequently have an impact on downstream efforts in model monitoring. In this post, we provided a framework that enables you to customize SageMaker Model Monitor jobs (in this case, for monitoring model quality) to handle the use case of passing multiple inference payloads to a SageMaker endpoint.

Explore the provided GitHub repository to implement this customized model monitoring framework with SageMaker Model Monitor. You can use this framework as a starting point to monitor your custom metrics or handle other unique requirements for model quality monitoring in your AI/ML applications.

About the Authors

Joe King is a Sr. Data Scientist at AWS, bringing a breadth of data science, ML engineering, MLOps, and AI/ML architecting to help businesses create scalable solutions on AWS.

Ajay Raghunathan is a Machine Learning Engineer at AWS. His current work focuses on architecting and implementing ML solutions at scale. He is a technology enthusiast and a builder with a core area of interest in AI/ML, data analytics, serverless, and DevOps. Outside of work, he enjoys spending time with family, traveling, and playing football.

Raju Patil is a Sr. Data Scientist with AWS Professional Services. He architects, builds, and deploys AI/ML solutions to help AWS customers across different verticals overcome business challenges in a variety of AI/ML use cases.

How Planview built a scalable AI Assistant for portfolio and project management using Amazon Bedrock

October 25, 2024

by Sunil Ramachandra Amazon AWS

This post is co-written with Lee Rehwinkel from Planview.

Businesses today face numerous challenges in managing intricate projects and programs, deriving valuable insights from massive data volumes, and making timely decisions. These hurdles frequently lead to productivity bottlenecks for program managers and executives, hindering their ability to drive organizational success efficiently.

Planview, a leading provider of connected work management solutions, embarked on an ambitious plan in 2023 to revolutionize how 3 million global users interact with their project management applications. To realize this vision, Planview developed an AI assistant called Planview Copilot, using a multi-agent system powered by Amazon Bedrock.

Developing this multi-agent system posed several challenges:

Reliably routing tasks to appropriate AI agents
Accessing data from various sources and formats
Interacting with multiple application APIs
Enabling the self-serve creation of new AI skills by different product teams

To overcome these challenges, Planview developed a multi-agent architecture built using Amazon Bedrock. Amazon Bedrock is a fully managed service that provides API access to foundation models (FMs) from Amazon and other leading AI startups. This allows developers to choose the FM that is best suited for their use case. This approach is both architecturally and organizationally scalable, enabling Planview to rapidly develop and deploy new AI skills to meet the evolving needs of their customers.

This post focuses primarily on the first challenge: routing tasks and managing multiple agents in a generative AI architecture. We explore Planview’s approach to this challenge during the development of Planview Copilot, sharing insights into the design decisions that provide efficient and reliable task routing.

We describe customized home-grown agents in this post because this project was implemented before Amazon Bedrock Agents was generally available. However, Amazon Bedrock Agents is now the recommended solution for organizations looking to use AI-powered agents in their operations. Amazon Bedrock Agents can retain memory across interactions, offering more personalized and seamless user experiences. You can benefit from improved recommendations and recall of prior context where required, enjoying a more cohesive and efficient interaction with the agent. We share our learnings in our solution to help you understanding how to use AWS technology to build solutions to meet your goals.

Solution overview

Planview’s multi-agent architecture consists of multiple generative AI components collaborating as a single system. At its core, an orchestrator is responsible for routing questions to various agents, collecting the learned information, and providing users with a synthesized response. The orchestrator is managed by a central development team, and the agents are managed by each application team.

The orchestrator comprises two main components called the router and responder, which are powered by a large language model (LLM). The router uses AI to intelligently route user questions to various application agents with specialized capabilities. The agents can be categorized into three main types:

Help agent – Uses Retrieval Augmented Generation (RAG) to provide application help
Data agent – Dynamically accesses and analyzes customer data
Action agent – Runs actions within the application on the user’s behalf

After the agents have processed the questions and provided their responses, the responder, also powered by an LLM, synthesizes the learned information and formulates a coherent response to the user. This architecture allows for a seamless collaboration between the centralized orchestrator and the specialized agents, which provides users an accurate and comprehensive answers to their questions. The following diagram illustrates the end-to-end workflow.

Technical overview

Planview used key AWS services to build its multi-agent architecture. The central Copilot service, powered by Amazon Elastic Kubernetes Service (Amazon EKS), is responsible for coordinating activities among the various services. Its responsibilities include:

Managing user session chat history using Amazon Relational Database Service (Amazon RDS)
Coordinating traffic between the router, application agents, and responder
Handling logging, monitoring, and collecting user-submitted feedback

The router and responder are AWS Lambda functions that interact with Amazon Bedrock. The router considers the user’s question and chat history from the central Copilot service, and the responder considers the user’s question, chat history, and responses from each agent.

Application teams manage their agents using Lambda functions that interact with Amazon Bedrock. For improved visibility, evaluation, and monitoring, Planview has adopted a centralized prompt repository service to store LLM prompts.

Agents can interact with applications using various methods depending on the use case and data availability:

Existing application APIs – Agents can communicate with applications through their existing API endpoints
Amazon Athena or traditional SQL data stores – Agents can retrieve data from Amazon Athena or other SQL-based data stores to provide relevant information
Amazon Neptune for graph data – Agents can access graph data stored in Amazon Neptune to support complex dependency analysis
Amazon OpenSearch Service for document RAG – Agents can use Amazon OpenSearch Service to perform RAG on documents

The following diagram illustrates the generative AI assistant architecture on AWS.

Router and responder sample prompts

The router and responder components work together to process user queries and generate appropriate responses. The following prompts provide illustrative router and responder prompt templates. Additional prompt engineering would be required to improve reliability for a production implementation.

First, the available tools are described, including their purpose and sample questions that can be asked of each tool. The example questions help guide the natural language interactions between the orchestrator and the available agents, as represented by tools.

tools = '''
<tool>
<toolName>applicationHelp</toolName>
<toolDescription>
Use this tool to answer application help related questions.
Example questions:
How do I reset my password?
How do I add a new user?
How do I create a task?
</toolDescription>
</tool>
<tool>
<toolName>dataQuery</toolName>
<toolDescription>
Use this tool to answer questions using application data.
Example questions:
Which tasks are assigned to me?
How many tasks are due next week?
Which task is most at risk?
</toolDescription>
</tool>

Next, the router prompt outlines the guidelines for the agent to either respond directly to user queries or request information through specific tools before formulating a response:

system_prompt_router = f'''
<role>
Your job is to decide if you need additional information to fully answer the User's 
questions.
You achieve your goal by choosing either 'respond' or 'callTool'.
You have access to your chat history in <chatHistory></chatHistory> tags.
You also have a list of available tools to assist you in <tools></tools> tags.
</role>
<chatHistory>
{chatHistory}
</chatHistory>
<tools>
{tools}
</tools>
<rules>
- If the chat history contains sufficient information to answer the User's questions, 
choose the 'respond' action.
- To gather more information before responding, choose the 'callTool' action.
- You many only choose from the tools in the <tools></tools> tags.
- If no tool can assist with the question, choose the 'respond' action.
- Place your chosen action within <action></action> tags.
- When you chose the 'callTool' action, provide the <toolName> and the <toolQuestion> you
would like to ask.
- Your <toolQuestion> should be verbose and avoid using pronouns.
- Start by providing your step-by-step thinking in <thinking></thinking> tags.
- Then you will give your answer in <answer></answer> tags.
- Your answer should follow the format of one of these three examples:
When choosing the 'respond' action, your answer should follow the below example EXACTLY:
<example>
<answer>
<action>respond</action>
</answer>
</example>
When choosing the 'callTool' action for a single Tool:
<example>
<answer>
<action>callTool</action>
<tools>
<tool>
<toolName>applicationHelp</toolName>
<toolQuestion>How do I reset my password?</toolQuestion>
</tool>
</tools>
</answer>
</example>
Executing the above, would produce the following result:
You can also call multiple Tools using this format:
<example>
<answer>
<action>callTool</action>
<tools>
<tool>
<toolName>dataQuery</toolName>
<toolQuestion>How many tasks are assigned to me?</toolQuestion>
</tool>
<tool>
<toolName>applicationHelp</toolName>
<toolQuestion>How do I add a new task?</toolQuestion>
</tool>
</tools>
</answer>
</example>
</rules>
'''

The following is a sample response from the router component that initiates the dataQuery tool to retrieve and analyze task assignments for each user:

<thinking>
To determine who has the most tasks assigned, I will need to query the application data. The "dataQuery" tool seems most appropriate for this question.
</thinking>

<answer>
<action>callTool</action>
<tools>
    <tool>
        <toolName>dataQuery</toolName>
        <toolQuestion>Which user has the most tasks currently assigned to them?   </toolQuestion>
    </tool>
</tools>
</answer>

The following is a sample response from the responder component that uses the dataQuery tool to fetch information about the user’s assigned tasks. It reports that the user has five tasks assigned to them.

<thinking>
Based on the chat history, I previously called the dataQuery tool to ask "How many tasks are currently assigned to the user?". The tool responded that the user has 5 tasks assigned to them.
</thinking>

<answer>
According to the data I queried previously, you have 5 tasks assigned to you.
</answer>

Model evaluation and selection

Evaluating and monitoring generative AI model performance is crucial in any AI system. Planview’s multi-agent architecture enables assessment at various component levels, providing comprehensive quality control despite the system’s complexity. Planview evaluates components at three levels:

Prompts – Assessing LLM prompts for effectiveness and accuracy
AI agents – Evaluating complete prompt chains to maintain optimal task handling and response relevance
AI system – Testing user-facing interactions to verify seamless integration of all components

The following figure illustrates the evaluation framework for prompts and scoring.

To conduct these evaluations, Planview uses a set of carefully crafted test questions that cover typical user queries and edge cases. These evaluations are performed during the development phase and continue in production to track the quality of responses over time. Currently, human evaluators play a crucial role in scoring responses. To aid in the evaluation, Planview has developed an internal evaluation tool to store the library of questions and track the responses over time.

To assess each component and determine the most suitable Amazon Bedrock model for a given task, Planview established the following prioritized evaluation criteria:

Quality of response – Assuring accuracy, relevance, and helpfulness of system responses
Time of response – Minimizing latency between user queries and system responses
Scale – Making sure the system can scale to thousands of concurrent users
Cost of response – Optimizing operational costs, including AWS services and generative AI models, to maintain economic viability

Based on these criteria and the current use case, Planview selected Anthropic’s Claude 3 Sonnet on Amazon Bedrock for the router and responder components.

Results and impact

Over the past year, Planview Copilot’s performance has significantly improved through the implementation of a multi-agent architecture, development of a robust evaluation framework, and adoption of the latest FMs available through Amazon Bedrock. Planview saw the following results between the first generation of Planview Copilot developed mid-2023 and the latest version:

Accuracy – Human-evaluated accuracy has improved from 50% answer acceptance to now exceeding 95%
Response time – Average response times have been reduced from over 1 minute to 20 seconds
Load testing – The AI assistant has successfully passed load tests, where 1,000 questions were submitted simultaneous with no noticeable impact on response time or quality
Cost-efficiency – The cost per customer interaction has been slashed to one tenth of the initial expense
Time-to-market – New agent development and deployment time has been reduced from months to weeks

Conclusion

In this post, we explored how Planview was able to develop a generative AI assistant to address complex work management process by adopting the following strategies:

Modular development – Planview built a multi-agent architecture with a centralized orchestrator. The solution enables efficient task handling and system scalability, while allowing different product teams to rapidly develop and deploy new AI skills through specialized agents.
Evaluation framework – Planview implemented a robust evaluation process at multiple levels, which was crucial for maintaining and improving performance.
Amazon Bedrock integration – Planview used Amazon Bedrock to innovate faster with broad model choice and access to various FMs, allowing for flexible model selection based on specific task requirements.

Planview is migrating to Amazon Bedrock Agents, which enables the integration of intelligent autonomous agents within their application ecosystem. Amazon Bedrock Agents automate processes by orchestrating interactions between foundation models, data sources, applications, and user conversations.

As next steps, you can explore Planview’s AI assistant feature built on Amazon Bedrock and stay updated with new Amazon Bedrock features and releases to advance your AI journey on AWS.

About Authors

Sunil Ramachandra is a Senior Solutions Architect enabling hyper-growth Independent Software Vendors (ISVs) to innovate and accelerate on AWS. He partners with customers to build highly scalable and resilient cloud architectures. When not collaborating with customers, Sunil enjoys spending time with family, running, meditating, and watching movies on Prime Video.

Benedict Augustine is a thought leader in Generative AI and Machine Learning, serving as a Senior Specialist at AWS. He advises customer CxOs on AI strategy, to build long-term visions while delivering immediate ROI.As VP of Machine Learning, Benedict spent the last decade building seven AI-first SaaS products, now used by Fortune 100 companies, driving significant business impact. His work has earned him 5 patents.

Lee Rehwinkel is a Principal Data Scientist at Planview with 20 years of experience in incorporating AI & ML into Enterprise software. He holds advanced degrees from both Carnegie Mellon University and Columbia University. Lee spearheads Planview’s R&D efforts on AI capabilities within Planview Copilot. Outside of work, he enjoys rowing on Austin’s Lady Bird Lake.