Scale YOLOv5 inference with Amazon SageMaker endpoints and AWS Lambda
After data scientists have carefully developed a satisfactory machine learning (ML) model, the model must be deployed to be easily accessible for inference by other members of the organization. However, deploying models at scale with optimized cost and compute efficiencies can be a daunting and cumbersome task. Amazon SageMaker endpoints provide an easily scalable and cost-optimized solution for model deployment. The YOLOv5 model, distributed under the GPLv3 license, is a popular object detection model known for its runtime efficiency as well as detection accuracy. In this post, we demonstrate how to host a pre-trained YOLOv5 model on SageMaker endpoints and use AWS Lambda functions to invoke these endpoints.
Solution overview
The following image outlines the AWS services used to host the YOLOv5 model using a SageMaker endpoint and invoke the endpoint using Lambda. The SageMaker notebook accesses a YOLOv5 PyTorch model from an Amazon Simple Storage Service (Amazon S3) bucket, converts it to the YOLOv5 TensorFlow SavedModel format, and stores it back to the S3 bucket. This model is then used when hosting the endpoint. When an image is uploaded to Amazon S3, it acts as a trigger to run the Lambda function. The function utilizes OpenCV Lambda layers to read the uploaded image and run inference using the endpoint. After the inference is run, you can use the results obtained from it as needed.
In this post, we walk through the process of utilizing a YOLOv5 default model in PyTorch and converting it to a TensorFlow SavedModel. This model is hosted using a SageMaker endpoint. Then we create and publish a Lambda function that invokes the endpoint to run inference. Pre-trained YOLOv5 models are available on GitHub. For the purpose of this post, we use the yolov5l model.
Prerequisites
As a prerequisite, we need to set up the following AWS Identity and Access Management (IAM) roles with appropriate access policies for SageMaker, Lambda, and Amazon S3:
- SageMaker IAM role – This requires the AmazonS3FullAccess policy attached for storing and accessing the model in the S3 bucket
- Lambda IAM role – This role needs multiple policies:
  - To access images stored in Amazon S3, we require the following IAM permissions:
    - s3:GetObject
    - s3:ListBucket
  - To run the SageMaker endpoint, we need access to the following IAM permissions:
    - sagemaker:ListEndpoints
    - sagemaker:DescribeEndpoint
    - sagemaker:InvokeEndpoint
    - sagemaker:InvokeEndpointAsync
You also need the following resources and services:
- The AWS Command Line Interface (AWS CLI), which we use to create and configure Lambda.
- A SageMaker notebook instance. These come with Docker pre-installed, and we use this to create the Lambda layers. To set up the notebook instance, complete the following steps:
- On the SageMaker console, create a notebook instance and provide the notebook name, instance type (for this post, we use ml.c5.large), IAM role, and other parameters.
- Clone the public repository and add the YOLOv5 repository provided by Ultralytics.
Host YOLOv5 on a SageMaker endpoint
Before we can host the pre-trained YOLOv5 model on SageMaker, we must export and package it in the correct directory structure inside model.tar.gz. For this post, we demonstrate how to host YOLOv5 in the saved_model format. The YOLOv5 repo provides an export.py file that can export the model in many different ways. After you clone the YOLOv5 repository and enter the yolov5 directory from the command line, you can export the model with the following commands:
$ cd yolov5
$ pip install -r requirements.txt tensorflow-cpu
$ python export.py --weights yolov5l.pt --include saved_model --nms
This command creates a new directory called yolov5l_saved_model inside the yolov5 directory. Inside the yolov5l_saved_model directory, we should see the following items:
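(The exact files depend on the YOLOv5 and TensorFlow versions; a typical TensorFlow SavedModel export looks something like the following, shown here as a representative listing rather than the exact output of this command.)

yolov5l_saved_model/
├── assets/
├── variables/
│   ├── variables.data-00000-of-00001
│   └── variables.index
└── saved_model.pb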
To create a model.tar.gz file, move the contents of yolov5l_saved_model to export/Servo/1. From the command line, we can compress the export directory and upload the model to the S3 bucket by running the following commands:
$ mkdir export && mkdir export/Servo
$ mv yolov5l_saved_model export/Servo/1
$ tar -czvf model.tar.gz export/
$ aws s3 cp model.tar.gz "<s3://BUCKET/PATH/model.tar.gz>"
Then, we can deploy a SageMaker endpoint from a SageMaker notebook by running the following code:
import os
import tensorflow as tf
from tensorflow.keras import backend
from sagemaker.tensorflow import TensorFlowModel
model_data = '<s3://BUCKET/PATH/model.tar.gz>'
role = '<IAM ROLE>'
model = TensorFlowModel(model_data=model_data,
framework_version='2.8', role=role)
INSTANCE_TYPE = 'ml.m5.xlarge'
ENDPOINT_NAME = 'yolov5l-demo'
predictor = model.deploy(initial_instance_count=1,
instance_type=INSTANCE_TYPE,
endpoint_name=ENDPOINT_NAME)
The preceding script takes approximately 2–3 minutes to fully deploy the model to the SageMaker endpoint. You can monitor the status of the deployment on the SageMaker console. After the model is hosted successfully, the model is ready for inference.
Test the SageMaker endpoint
After the model is successfully hosted on a SageMaker endpoint, we can test it out, which we do using a blank image. The testing code is as follows:
import json
import boto3
import numpy as np

ENDPOINT_NAME = 'yolov5l-demo'
runtime = boto3.client('runtime.sagemaker')

modelHeight, modelWidth = 640, 640
blank_image = np.zeros((modelHeight, modelWidth, 3), np.uint8)
data = np.array(blank_image.astype(np.float32)/255.)
payload = json.dumps([data.tolist()])

response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='application/json',
                                   Body=payload)
result = json.loads(response['Body'].read().decode())
print('Results: ', result)
Set up Lambda with layers and triggers
We use OpenCV to demonstrate the model by passing an image and getting the inference results. Lambda doesn’t come with external libraries like OpenCV pre-built, therefore we need to build them before we can invoke the Lambda code. Furthermore, we want to make sure that we don’t build external libraries like OpenCV every time the Lambda function is invoked. For this purpose, Lambda provides a functionality to create Lambda layers. We can define what goes in these layers, and they can be consumed by the Lambda code every time it’s invoked. We demonstrate how to create the Lambda layers for OpenCV. For this post, we use an Amazon Elastic Compute Cloud (Amazon EC2) instance to create the layers.
After we have the layers in place, we create the app.py script, which is the Lambda code that uses the layers, runs the inference, and gets results. The following diagram illustrates this workflow.
Create Lambda layers for OpenCV using Docker
Use the following Dockerfile to create the Docker image using Python 3.7:
FROM amazonlinux
RUN yum update -y
RUN yum install gcc openssl-devel bzip2-devel libffi-devel wget tar gzip zip make -y
# Install Python 3.7
WORKDIR /
RUN wget https://www.python.org/ftp/python/3.7.12/Python-3.7.12.tgz
RUN tar -xzvf Python-3.7.12.tgz
WORKDIR /Python-3.7.12
RUN ./configure --enable-optimizations
RUN make altinstall
# Install Python packages
RUN mkdir /packages
RUN echo "opencv-python" >> /packages/requirements.txt
RUN mkdir -p /packages/opencv-python-3.7/python/lib/python3.7/site-packages
RUN pip3.7 install -r /packages/requirements.txt -t /packages/opencv-python-3.7/python/lib/python3.7/site-packages
# Create zip files for Lambda Layer deployment
WORKDIR /packages/opencv-python-3.7/
RUN zip -r9 /packages/cv2-python37.zip .
WORKDIR /packages/
RUN rm -rf /packages/opencv-python-3.7/
Build and run Docker and store the output ZIP file in the current directory under layers:
$ docker build --tag aws-lambda-layers:latest <PATH/TO/Dockerfile>
$ docker run --rm -it -v $(pwd):/layers aws-lambda-layers cp /packages/cv2-python37.zip /layers
Now we can upload the OpenCV layer artifacts to Amazon S3 and create the Lambda layer:
$ aws s3 cp layers/cv2-python37.zip s3://<BUCKET>/<PATH/TO/STORE/ARTIFACTS>
$ aws lambda publish-layer-version --layer-name cv2 --description "Open CV" --content S3Bucket=<BUCKET>,S3Key=<PATH/TO/STORE/ARTIFACTS>/cv2-python37.zip --compatible-runtimes python3.7
After the preceding commands run successfully, you have an OpenCV layer in Lambda, which you can review on the Lambda console.
Create the Lambda function
We utilize the app.py script to create the Lambda function and use OpenCV. In the following code, change the values for BUCKET_NAME and IMAGE_LOCATION to the location for accessing the image:
import os, logging, json, time, urllib.parse
import boto3, botocore
import numpy as np, cv2
logger = logging.getLogger()
logger.setLevel(logging.INFO)
client = boto3.client('lambda')
# S3 BUCKETS DETAILS
s3 = boto3.resource('s3')
BUCKET_NAME = "<NAME OF S3 BUCKET FOR INPUT IMAGE>"
IMAGE_LOCATION = "<S3 PATH TO IMAGE>/image.png"
# INFERENCE ENDPOINT DETAILS
ENDPOINT_NAME = 'yolov5l-demo'
config = botocore.config.Config(read_timeout=80)
runtime = boto3.client('runtime.sagemaker', config=config)
modelHeight, modelWidth = 640, 640
# RUNNING LAMBDA
def lambda_handler(event, context):
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')

    # INPUTS - Download the image file from S3 to Lambda /tmp/
    input_imagename = key.split('/')[-1]
    logger.info(f'Input Imagename: {input_imagename}')
    s3.Bucket(BUCKET_NAME).download_file(IMAGE_LOCATION + '/' + input_imagename, '/tmp/' + input_imagename)

    # INFERENCE - Invoke the SageMaker inference endpoint
    logger.info('Starting Inference ... ')
    result, inference_time = None, None
    orig_image = cv2.imread('/tmp/' + input_imagename)
    if orig_image is not None:
        start_time_iter = time.time()
        # Pre-process the input image
        image = cv2.resize(orig_image.copy(), (modelWidth, modelHeight), interpolation=cv2.INTER_AREA)
        data = np.array(image.astype(np.float32)/255.)
        payload = json.dumps([data.tolist()])
        # Run inference
        response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME, ContentType='application/json', Body=payload)
        # Get the output results
        result = json.loads(response['Body'].read().decode())
        end_time_iter = time.time()
        # Get the total time taken for inference
        inference_time = round((end_time_iter - start_time_iter)*100)/100
    logger.info('Inference Completed ... ')

    # OUTPUTS - Return the results for use by other services downstream
    return {
        "statusCode": 200,
        "body": json.dumps({
            "message": "Inference Time: " + str(inference_time) + " seconds.",
            "results": result
        }),
    }
Deploy the Lambda function with the following code:
$ zip app.zip app.py
$ aws s3 cp app.zip s3://<BUCKET>/<PATH/TO/STORE/FUNCTION>
$ aws lambda create-function --function-name yolov5-lambda --handler app.lambda_handler --region us-east-1 --runtime python3.7 --environment "Variables={BUCKET_NAME=$BUCKET_NAME,S3_KEY=$S3_KEY}" --code S3Bucket=<BUCKET>,S3Key="<PATH/TO/STORE/FUNCTION/app.zip>"
Attach the OpenCV layer to the Lambda function
After we have the Lambda function and layer in place, we can attach the layer to the function. Note that the --layers option expects the layer version ARN returned by the publish-layer-version command:
$ aws lambda update-function-configuration --function-name yolov5-lambda --layers <LAYER-VERSION-ARN>
We can review the layer settings via the Lambda console.
Trigger Lambda when an image is uploaded to Amazon S3
We use an image upload to Amazon S3 as a trigger to run the Lambda function. For instructions, refer to Tutorial: Using an Amazon S3 trigger to invoke a Lambda function.
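If you prefer the AWS CLI over the console steps in that tutorial, the trigger can be wired up roughly as follows; this is a hedged sketch, and the bucket name and function ARN are placeholders you need to replace:

$ aws lambda add-permission --function-name yolov5-lambda --statement-id s3-trigger --action lambda:InvokeFunction --principal s3.amazonaws.com --source-arn arn:aws:s3:::<BUCKET>
$ aws s3api put-bucket-notification-configuration --bucket <BUCKET> --notification-configuration '{"LambdaFunctionConfigurations":[{"LambdaFunctionArn":"<LAMBDA-FUNCTION-ARN>","Events":["s3:ObjectCreated:*"]}]}'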
You should see the following function details on the Lambda console.
Run inference
After you set up Lambda and the SageMaker endpoint, you can test the output by invoking the Lambda function. We use an image upload to Amazon S3 as a trigger to invoke Lambda, which in turn invokes the endpoint for inference. As an example, we upload the following image to the Amazon S3 location <S3 PATH TO IMAGE>/test_image.png configured in the previous section.
After the image is uploaded, the Lambda function is triggered to download and read the image data and send it to the SageMaker endpoint for inference. The output result from the SageMaker endpoint is obtained and returned by the function in JSON format, which we can use in different ways. The following image shows example output overlayed on the image.
Clean up
Depending on the instance type, SageMaker notebooks can require significant compute usage and cost. To avoid unnecessary costs, we advise stopping the notebook instance when it’s not in use. Lambda functions incur charges only when they’re invoked; a SageMaker real-time endpoint, however, is billed for as long as the instance behind it is running. Therefore, if an endpoint isn’t being used any longer, it’s good practice to remove the endpoint and the model.
Conclusion
In this post, we demonstrated how to host a pre-trained YOLOv5 model on a SageMaker endpoint and use Lambda to invoke inference and process the output. The detailed code is available on GitHub.
To learn more about SageMaker endpoints, check out Create your endpoint and deploy your model and Build, test, and deploy your Amazon SageMaker inference models to AWS Lambda, which highlights how you can automate the process of deploying YOLOv5 models.
About the authors
Kevin Song is an IoT Edge Data Scientist at AWS Professional Services. Kevin holds a PhD in Biophysics from The University of Chicago. He has over 4 years of industry experience in Computer Vision and Machine Learning. He is involved in helping customers in the sports and life sciences industry deploy Machine Learning models.
Romil Shah is an IoT Edge Data Scientist at AWS Professional Services. Romil has over 6 years of industry experience in Computer Vision, Machine Learning and IoT edge devices. He is involved in helping customers optimize and deploy their Machine Learning models for edge devices for industrial setup.
20B-parameter Alexa model sets new marks in few-shot learning
With an encoder-decoder architecture — rather than decoder only — the Alexa Teacher Model outperforms other large language models on few-shot tasks such as summarization and machine translation.
Amazon Scholar Kathleen McKeown receives dual honors
McKeown awarded IEEE Innovation in Societal Infrastructure Award and named a member of the American Philosophical Society.
Simplify iterative machine learning model development by adding features to existing feature groups in Amazon SageMaker Feature Store
Feature engineering is one of the most challenging aspects of the machine learning (ML) lifecycle and a phase where the most amount of time is spent—data scientists and ML engineers spend 60–70% of their time on feature engineering. AWS introduced Amazon SageMaker Feature Store during AWS re:Invent 2020, which is a purpose-built, fully managed, centralized store for features and associated metadata. Features are signals extracted from data to train ML models. The advantage of Feature Store is that the feature engineering logic is authored one time, and the features generated are stored on a central platform. The central store of features can be used for training and inference and be reused across different data engineering teams.
Features in a feature store are stored in a collection called feature group. A feature group is analogous to a database table schema where columns represent features and rows represent individual records. Feature groups have been immutable since Feature Store was introduced. If we had to add features to an existing feature group, the process was cumbersome—we had to create a new feature group, backfill the new feature group with historical data, and modify downstream systems to use this new feature group. ML development is an iterative process of trial and error where we may identify new features continuously that can improve model performance. It’s evident that not being able to add features to feature groups can lead to a complex ML model development lifecycle.
Feature Store recently introduced the ability to add new features to existing feature groups. A feature group schema evolves over time as a result of new business requirements or because new features have been identified that yield better model performance. Data scientists and ML engineers need to easily add features to an existing feature group. This ability reduces the overhead associated with creating and maintaining multiple feature groups and therefore lends itself to iterative ML model development. Model training and inference can take advantage of new features using the same feature group by making minimal changes.
In this post, we demonstrate how to add features to a feature group using the newly released UpdateFeatureGroup API.
Overview of solution
Feature Store acts as a single source of truth for feature engineered data that is used in ML training and inference. When we store features in Feature Store, we store them in feature groups.
We can enable feature groups for offline only mode, online only mode, or online and offline modes.
An online store is a low-latency data store and always has the latest snapshot of the data. An offline store has a historical set of records persisted in Amazon Simple Storage Service (Amazon S3). Feature Store automatically creates an AWS Glue Data Catalog for the offline store, which enables us to run SQL queries against the offline data using Amazon Athena.
The following diagram illustrates the process of feature creation and ingestion into Feature Store.
The workflow contains the following steps:
- Define a feature group and create the feature group in Feature Store.
- Ingest data into the feature group, which writes to the online store immediately and then to the offline store.
- Use the offline store data stored in Amazon S3 for training one or more models.
- Use the offline store for batch inference.
- Use the online store supporting low-latency reads for real-time inference.
- To update the feature group to add a new feature, we use the new Amazon SageMaker UpdateFeatureGroup API. This also updates the underlying AWS Glue Data Catalog. After the schema has been updated, we can ingest data into this updated feature group and use the updated offline and online store for inference and model training.
Dataset
To demonstrate this new functionality, we use a synthetically generated customer dataset. The dataset has unique IDs for customer, sex, marital status, age range, and how long they have been actively purchasing.
Let’s assume a scenario where a business is trying to predict the propensity of a customer purchasing a certain product, and data scientists have developed a model to predict this intended outcome. Let’s also assume that the data scientists have identified a new signal for the customer that could potentially improve model performance and better predict the outcome. We work through this use case to understand how to update feature group definition to add the new feature, ingest data into this new feature, and finally explore the online and offline feature store to verify the changes.
Prerequisites
For this walkthrough, you should have the following prerequisites:
- An AWS account.
- A SageMaker Jupyter notebook instance. Access the code from the Amazon SageMaker Feature Store Update Feature Group GitHub repository and upload it to your notebook instance.
- You can also run the notebook in the Amazon SageMaker Studio environment, which is an IDE for ML development. You can clone the GitHub repo via a terminal inside the Studio environment using the following command:
Add features to a feature group
In this post, we walk through the update_feature_group.ipynb notebook, in which we create a feature group, ingest an initial dataset, update the feature group to add a new feature, and re-ingest data that includes the new feature. At the end, we verify the online and offline store for the updates. The fully functional notebook and sample data can be found in the GitHub repository. Let’s explore some of the key parts of the notebook here.
- We create a feature group to store the feature-engineered customer data using the FeatureGroup.create API of the SageMaker SDK.
- We create a Pandas DataFrame with the initial CSV data. We use the current time as the timestamp for the event_time feature. This corresponds to the time when the event occurred, which implies when the record is added or updated in the feature group.
- We ingest the DataFrame into the feature group using the SageMaker SDK FeatureGroup.ingest API. This is a small dataset and therefore can be loaded into a Pandas DataFrame. When we work with large amounts of data and millions of rows, there are other scalable mechanisms to ingest data into Feature Store, such as batch ingestion with Apache Spark.
- We can verify that data has been ingested into the feature group by running Athena queries in the notebook or running queries on the Athena console.
- After we verify that the offline feature store has the initial data, we add the new feature has_kids to the feature group using the Boto3 update_feature_group API, as shown in the sketch after this list. The Data Catalog gets automatically updated as part of this API call. The API supports adding multiple features at a time by specifying them in the FeatureAdditions parameter.
- We verify that the feature has been added by checking the updated feature group definition. The LastUpdateStatus in the describe_feature_group API response initially shows the status InProgress. After the operation is successful, the LastUpdateStatus changes to Successful. If for any reason the operation encounters an error, the LastUpdateStatus shows as Failed, with the detailed error message in FailureReason.
When the update_feature_group API is invoked, the control plane reflects the schema change immediately, but the data plane takes up to 5 minutes to update its feature group schema. We must ensure that enough time is given for the update operation before proceeding to data ingestion.
- We prepare data for the has_kids feature by generating random 1s and 0s to indicate whether a customer has kids or not.
- We ingest the DataFrame that has the newly added column into the feature group using the SageMaker SDK FeatureGroup.ingest API.
- Next, we verify the feature record in the online store for a single customer using the Boto3 get_record API.
- Let’s query the same customer record on the Athena console to verify the offline data store. The data is appended to the offline store to maintain historical writes and updates. Therefore, we see two records here: a newer record that has the feature updated to value 1, and an older record that doesn’t have this feature and therefore shows the value as empty. The offline store persistence happens in batches within 15 minutes, so this step could take time.
Now that we have this feature added to our feature group, we can extract this new feature into our training dataset and retrain models. The goal of the post is to highlight the ease of modifying a feature group, ingesting data into the new feature, and then using the updated data in the feature group for model training and inference.
Clean up
Don’t forget to clean up the resources created as part of this post to avoid incurring ongoing charges.
- Delete the S3 objects in the offline store:
- Delete the feature group:
- Stop the SageMaker Jupyter notebook instance. For instructions, refer to Clean Up.
Conclusion
Feature Store is a fully managed, purpose-built repository to store, share, and manage features for ML models. Being able to add features to existing feature groups simplifies iterative model development and alleviates the challenges we see in creating and maintaining multiple feature groups.
In this post, we showed you how to add features to existing feature groups via the newly released SageMaker UpdateFeatureGroup API. The steps shown in this post are available as a Jupyter notebook in the GitHub repository. Give it a try and let us know your feedback in the comments.
Further reading
If you’re interested in exploring the complete scenario mentioned earlier in this post of predicting a customer ordering a certain product, check out the following notebook, which modifies the feature group, ingests data, and trains an XGBoost model with the data from the updated offline store. This notebook is part of a comprehensive workshop developed to demonstrate Feature Store functionality.
References
More information is available at the following resources:
- Create, Store, and Share Features with Amazon SageMaker Feature Store
- Amazon Athena User Guide
- Get Started with Amazon SageMaker Notebook Instances
- UpdateFeatureGroup API
- SageMaker Boto3 update_feature_group API
- Getting started with Amazon SageMaker Feature Store
About the authors
Chaitra Mathur is a Principal Solutions Architect at AWS. She guides customers and partners in building highly scalable, reliable, secure, and cost-effective solutions on AWS. She is passionate about Machine Learning and helps customers translate their ML needs into solutions using AWS AI/ML services. She holds 5 certifications including the ML Specialty certification. In her spare time, she enjoys reading, yoga, and spending time with her daughters.
Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Mark holds six AWS certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services.
Charu Sareen is a Sr. Product Manager for Amazon SageMaker Feature Store. Prior to AWS, she was leading growth and monetization strategy for SaaS services at VMware. She is a data and machine learning enthusiast and has over a decade of experience spanning product management, data engineering, and advanced analytics. She has a bachelor’s degree in Information Technology from National Institute of Technology, India and an MBA from University of Michigan, Ross School of Business.
Frank McQuillan is a Principal Product Manager for Amazon SageMaker Feature Store.
The path to carbon reductions in high-growth economic sectors
Confronting climate change requires the participation of governments, companies, academics, civil-society organizations, and the public.
Add conversational AI to any contact center with Amazon Lex and the Amazon Chime SDK
Customer satisfaction is a potent metric that directly influences the profitability of an organization. With rapid technological advances in the past decade or so, it’s even more important to elevate customer focus in the following ways:
- Making your organization accessible to your customers across multiple modalities, including voice, text, social media, and more
- Providing your customers with a highly efficient post-sales and service experience
- Continuously improving the quality of your service as business trends and dynamics change
Establishing highly efficient contact centers requires significant automation, the ability to scale, and a mechanism of active learning through customer feedback. There is a challenge at every point in the contact center customer journey—from long hold times at the beginning to operational costs associated with long average handle times.
In traditional contact centers, one solution for long hold times is enabling self-service options for customers using an Interactive Voice Response system (IVR). An IVR uses a set of automated menu options to help reduce agent call volumes by addressing common frequently asked requests without involving a live agent. Traditional IVRs, however, typically follow a pre-determined sequence, without the ability to respond intelligently to customer requests. A non-conversational IVR such as this can frustrate your customers and lead them to attempt to contact an agent as soon as possible, which lowers your call deflection rates. You can solve this challenge by adding artificial intelligence (AI) to your IVR. An AI-enabled IVR can more quickly and accurately help your customer resolve issues without human intervention. When an agent is needed, the AI-enabled IVR can route your customer to the correct agent with the correct information already collected, thereby saving the customer from having to repeat the information. With AWS AI services, it’s even easier because there is no machine learning (ML) training or expertise required to use powerful, pre-trained ML models.
AI-powered automated applications are a natural choice for IVRs because they can understand and respond in natural language. Additionally, you can add enhanced capabilities to your IVR to learn and evolve based on how customers interact with it. With Amazon Lex, you can build powerful, multi-lingual conversational AI systems and elevate the self-service experience for your customers with no ML skills required. With the Amazon Chime SDK, you can easily integrate your existing contact center to Amazon Lex using an Amazon Chime SDK SIP media application. This includes contact centers such as Avaya, Cisco, Genesys, and others. Amazon Chime SDK integration with Amazon Lex is available in US East (N. Virginia) and US West (Oregon) AWS Regions.
This allows you the flexibility of native integration with Amazon Lex for AI-powered self-service, and the ability to integrate with a host of other AWS AI services to transform your entire contact center operations.
In this post, we provide a walkthrough of how you can add AI-powered IVRs to any contact center that supports SIP trunking using Amazon Chime SDK and Amazon Lex, via the recently launched Amazon Chime SDK PSTN audio integration with Amazon Lex. We cover the following topics in this post:
- Reference solution architecture for the self-service AI
- Deploying the solution
- Reviewing the Account Balance chatbot
- Reviewing the Amazon Chime SDK Voice Connector
- Testing the solution
- Cleaning up resources
Solution overview
As described in the previous section, we use two key AWS services, Amazon Lex and the Amazon Chime SDK, to build the self-service AI solution. We also use AWS Lambda (a fully managed serverless compute service), Amazon Elastic Compute Cloud (Amazon EC2, a compute infrastructure), and Amazon DynamoDB (a fully managed NoSQL database) to create a working example. The code base for this solution is available in the accompanying GitHub repository. Instructions to deploy and test this solution are provided in the next section.
The following diagram illustrates the solution architecture.
The solution workflow consists of the following steps:
- When we make a phone call using a landline or cell phone, the Public Switched Telephone Network (PSTN) connects us to the other party. In this demo, we use an Asterisk server (a free contact center framework) deployed on an Amazon EC2 server to emulate a contact center connected to the PSTN through an Amazon Chime Voice Connector. Asterisk is a software implementation of a private branch exchange (PBX)— a controller of a private telephone network used within a company or organization.
- As part of this demo, a phone number is acquired via the Amazon Chime SDK and associated with the Asterisk PBX. When a call is made to this number, it’s delivered as SIP (Session Initiation Protocol) to the Asterisk PBX server. The Asterisk PBX then routes this call to the Amazon Chime Voice Connector using SIP, where it triggers an Amazon Chime SIP media application.
- Amazon Chime PSTN audio uses a SIP media application to create a programmable VoIP application. The Amazon Chime SIP media application works with a Lambda function to programmatically handle the call.
- When the call arrives at the Amazon Chime SIP media application, the associated Lambda function is invoked. The function stores the call information in a DynamoDB table and returns a StartBotConversation action (see the sketch after this list). The StartBotConversation action establishes a voice conversation between the end user on the PSTN and the Amazon Lex bot.
- Amazon Lex is a fully managed AWS AI service with advanced natural language models to design, build, test, and deploy conversational interfaces in applications. It combines automatic speech recognition and natural language understanding technologies to create a human-like interaction for your applications. As an example, this demo deploys a bot to perform three automated tasks, or intents: Check Balance, Transfer Funds, and Open Account. An intent represents an action that the user wants to perform.
- The conversation starts with the caller interacting with the Amazon Lex bot by telling the bot what they want to do. The automatic speech recognition (ASR) and natural language understanding (NLU) capabilities of the bot help it understand the user input. Amazon Lex determines the requested intent based on the caller input and the sample utterances configured for each intent.
- After the intent is determined, Amazon Lex interacts with the caller to gather information for all the slots configured for that intent. For example, the Open Account intent includes four slots:
  - First Name
  - Last Name
  - Account Type
  - Phone Number
- Amazon Lex works with the caller to capture information for all of these required slots of the selected intent. After these have been captured and the intent has been fulfilled, Amazon Lex returns call processing to the Amazon Chime SIP media application, along with the full results of the Amazon Lex bot conversation.
- The subsequent processing steps are performed by the PSTN audio handler Lambda function. This includes parsing the results, determining the next call route action, storing the results in a DynamoDB table, and returning the hang up action.
- The subsequent processing steps are performed by the PSTN audio handler Lambda function. This includes parsing the results, determining the next call route action, storing the results in a DynamoDB table, and returning the hang up action.
- The Asterisk PBX uses the information stored in the DynamoDB table to determine the next action. For example, if the caller wanted to check their balance, the call ends. However, if the caller wanted to open an account, the call is sent to the agent and includes the information captured in the Amazon Lex bot.
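The following is a minimal sketch of how the PSTN audio handler Lambda function might return the StartBotConversation action when a new inbound call arrives. The environment variable, bot alias ARN, and welcome message are placeholders, and the full handler in the GitHub repository also covers the other SIP media application invocation events.

import os

def handler(event, context):
    # Hedged sketch: hand the caller over to the Amazon Lex bot
    return {
        'SchemaVersion': '1.0',
        'Actions': [
            {
                'Type': 'StartBotConversation',
                'Parameters': {
                    'BotAliasArn': os.environ['BOT_ALIAS_ARN'],  # placeholder environment variable
                    'LocaleId': 'en_US',
                    'Configuration': {
                        'SessionState': {
                            'DialogAction': {'Type': 'ElicitIntent'}
                        },
                        'WelcomeMessages': [
                            {'Content': 'Welcome. How can I help you today?', 'ContentType': 'PlainText'}
                        ]
                    }
                }
            }
        ]
    }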
We have used AWS Cloud Development Kit (AWS CDK) to package this application for easy deployment in your account. The AWS CDK is an open-source software development framework to define your cloud application resources using familiar programming languages. It provides high-level components called constructs that preconfigure cloud resources with proven defaults, so you can build cloud applications with ease.
Prerequisites
Before we deploy the solution, we need to have an AWS account and a local machine to run the AWS CDK stack. Complete the following steps:
- Log in to your AWS account.
If you don’t have an AWS account, you can sign up for one. For new customers, AWS provides a Free Tier, which provides the ability to explore and try out AWS services free of charge (up to the specified limits for each service). This can help you gain hands-on experience with the AWS platform, products, and services. We use a local machine, such as a laptop or a desktop computer, to deploy the stack using the AWS CDK.
- Open a new terminal window (macOS) or PuTTY (Windows) to install all the prerequisites required to deploy the solution.
- Install the following prerequisite software:
- AWS Command Line Interface (AWS CLI) – A command line tool for interacting with AWS services. For installation instructions, refer to Installing, updating, and uninstalling the AWS CLI.
- Node.js > 16 – Open-source JavaScript backend engine for application development and deployment. For installation instructions, refer to Tutorial: Setting Up Node.js on an Amazon EC2 Instance.
- Yarn – Yarn is a package manager for your code. It allows easy access to use and share the code between developers. Run the following command to install Yarn:
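Assuming Node.js and npm are already installed from the previous step, Yarn can be installed globally with npm:

$ npm install --global yarn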
Now we run the following commands to set up the AWS access keys we need. For more information, refer to Managing access keys for IAM users.
- Run the following command:
- Run the following command:
- Provide the values for your AWS account’s access key ID and secret access key.
- Change the Region name or leave the default Region as it is.
- Accept the default value of JSON for the output format.
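The preceding prompts correspond to the aws configure command; a typical session looks like the following, with placeholder values:

$ aws configure
AWS Access Key ID [None]: <YOUR-ACCESS-KEY-ID>
AWS Secret Access Key [None]: <YOUR-SECRET-ACCESS-KEY>
Default region name [None]: us-east-1
Default output format [None]: json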
Deploy the solution
You can also customize this solution for your requirements. Review the output resources this deployment contains and modify the Lambda function to add the custom business logic you need for your own solution.
Run the following steps in the same terminal to deploy the application:
- Clone the git repository:
- Enter the project directory:
- Deploy the AWS CDK application:
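As a hedged sketch of the three steps above, the commands look roughly like the following; the repository URL and directory name are placeholders for the project GitHub repository linked in this post, and the exact install and deploy scripts in the repository may differ:

$ git clone <PROJECT-REPOSITORY-URL>
$ cd <PROJECT-DIRECTORY>
$ yarn install
$ yarn cdk deploy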
After a few minutes, your stack deployment should be complete. The following screenshot shows the sample output.
- Install the web client SIP phone with the following commands:
Review the Amazon Chime SDK Voice Connector
In this post, we use the Amazon Chime SDK to route the calls received on the Asterisk PBX server (or your existing contact centers) to Amazon Lex. This is done using Amazon Chime SIP PSTN audio and the Amazon Chime Voice Connector. Amazon Chime PSTN audio enables you to create programmable telephony applications using Lambda functions. These Amazon Chime SIP media applications are triggered by either a PSTN phone number or Amazon Chime Voice Connector. The following screenshot shows the SIP rule that is triggered by an Amazon Chime SDK Voice Connector and targets a SIP media application.
Review the Account Balance chatbot
The Amazon Lex bot in this demo includes three intents. These intents can be requested through natural language speech from the caller. For example, the Check Balance intent is seeded with the following sample utterances.
An intent can require zero or more parameters, which are called slots. We add slots as part of the intent configuration while building the bot. At runtime, Amazon Lex prompts the user for specific slot values. The user must provide values for all required slots before Amazon Lex can fulfill the intent.
For the Check Balance intent, Amazon Lex prompts for slot data, such as:
After the Amazon Lex bot gathers all the required slot information, it fulfills the intent by invoking the appropriate response. In this case, it queries the account balance related to the account and provides it to the customer.
In this post, we’re using a Lambda function to help initialize, validate, and fulfill the intent. The following is the sample Python code showing how the function handles invocations depending on which intent is being used:
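The full function is available in the project GitHub repository; shown here is a condensed sketch of the dispatch logic for an Amazon Lex V2 bot. The intent names and helper functions are illustrative assumptions rather than the exact names used in the repository.

def dispatch(intent_request):
    # Route the invocation to the handler for the requested intent
    intent_name = intent_request['sessionState']['intent']['name']

    if intent_name == 'CheckBalance':
        return check_balance(intent_request)
    elif intent_name == 'TransferFunds':
        return transfer_funds(intent_request)
    elif intent_name == 'OpenAccount':
        return open_account(intent_request)

    raise Exception('Intent with name ' + intent_name + ' not supported')


def lambda_handler(event, context):
    # Amazon Lex invokes this function for initialization, validation, and fulfillment
    return dispatch(event)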
The following is the sample code that explains the code block for the Check Balance intent in the Lambda function. In this example, we generate a random number as the account balance, but this could be integrated with your existing database to provide accurate caller information.
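A hedged sketch of such a fulfillment block follows, pairing with the dispatch sketch above. The response shape follows the Lex V2 Lambda interface, and the slot name accountType and the random balance are illustrative assumptions; adjust them to your bot’s actual slot names and data source.

import random

def check_balance(intent_request):
    # In this demo the balance is randomly generated; replace with a database lookup in production
    balance = str(random.randint(0, 10000))
    account = intent_request['sessionState']['intent']['slots']['accountType']['value']['interpretedValue']

    message = f'Thank you. The balance on your {account} account is ${balance}.'

    # Close the intent and return the message that Amazon Lex speaks back to the caller
    return {
        'sessionState': {
            'dialogAction': {'type': 'Close'},
            'intent': {
                'name': intent_request['sessionState']['intent']['name'],
                'state': 'Fulfilled'
            }
        },
        'messages': [{'contentType': 'PlainText', 'content': message}]
    }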
Test the solution
Let’s walk through the solution by following the path of a single user request:
- Get the phone number from the output after deploying the AWS CDK:
- Dial into the phone number from any PSTN-based phone.
- Now you can try the menu options.
For the Amazon Lex bot to understand the Check Balance intent, you can speak any of the following utterances:
- What’s the balance in my account?
- Check my account balance?
- I want to check the balance?
Amazon Lex prompts for the slot data that’s required to fulfill this intent. For the Check Balance intent, Amazon Lex prompts for the account and date of birth:
- For which account would you like to check the balance?
- For verification purposes, what is your date of birth?
After you provide the required information, the bot fulfills the intent and provides the account balance information. The following is a sample output message for the Check Balance intent: Thank you. The balance on your <account> account is $<balance>.
- Complete the call by hanging up or being transferred to an agent.
When the conversation with the Amazon Lex bot is complete, the call returns to the SIP media application and associated Lambda function with the results from the bot conversation.
The Amazon Chime SIP media application performs the post-processing steps and returns the call to the Asterisk PBX. For the Open Account intent, this causes the Asterisk PBX to call an agent using a web client-based SIP phone. The following screenshot shows the dashboard with the agent call information. This call can be answered on the web client to establish two-way audio between the caller and the agent. As shown in the screenshot, the information provided by the caller has been preserved and presented to the agent.
Watch the following video for an example of a partner solution on how to integrate Amazon Lex with Cisco Unified Contact Center using Amazon Chime SDK:
Clean up resources
To clean up the resources used in this demo and avoid incurring further charges, run the following command in the terminal window:
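For an AWS CDK project such as this one, the cleanup command is typically the following, run from the project directory; if the repository wraps the CDK CLI in Yarn scripts, prefix it with yarn:

$ cdk destroy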
The AWS CloudFormation stack created by the AWS CDK is destroyed, removing all the allocated resources.
Conclusion
In this post, we demonstrated a solution with a reference architecture to add self-service AI to any contact center using Amazon Lex and the Amazon Chime SDK. We showed how the solution works and provided a detailed walkthrough of the code and deployment steps. This solution is meant to be a reference architecture or a quick start guide that you can customize for your own needs.
Give it a whirl and let us know how this solved your use case by leaving feedback in the comments section. For more information, see the project GitHub repository.
About the authors
Prem Ranga is an NLP domain lead and a Sr. AI/ML Specialist SA at AWS and an author who frequently publishes blogs, research papers, and recently an NLP textbook. When he is not helping customers adopt AWS AI/ML, Prem dabbles with building Simple Beer Service units for AWS offices, running competitive gaming events with DeepRacer and DeepComposer, and educating students and young professionals on career-building AI/ML skills. You can follow Prem’s work on LinkedIn.
Court Schuett is the Lead Evangelist for the Amazon Chime SDK with a background in telephony and now loves to build things that build things. Court is focused on teaching developers and non-developers alike how to build with AWS.
Vamshi Krishna Enabothala is a Senior AI/ML Specialist SA at AWS with expertise in big data, analytics, and orchestrating scalable AI/ML architectures for startups and enterprises. Vamshi is focused on Language AI and innovates in building world-class recommender engines. Outside of work, Vamshi is an RC enthusiast, building and playing with RC equipment (planes, cars, and drones), and also enjoys gardening.
Identify the location of anomalies using Amazon Lookout for Vision at the edge without using a GPU
Automated defect detection using computer vision helps improve quality and lower the cost of inspection. Defect detection involves identifying the presence of a defect, classifying types of defects, and identifying where the defects are located. Many manufacturing processes require detection at a low latency, with limited compute resources, and with limited connectivity.
Amazon Lookout for Vision is a machine learning (ML) service that helps spot product defects using computer vision to automate the quality inspection process in your manufacturing lines, with no ML expertise required. Lookout for Vision now includes the ability to provide the location and type of anomalies using semantic segmentation ML models. These customized ML models can either be deployed to the AWS Cloud using cloud APIs or to custom edge hardware using AWS IoT Greengrass. Lookout for Vision now supports inference on an x86 compute platform running Linux with or without an NVIDIA GPU accelerator and on any NVIDIA Jetson-based edge appliance. This flexibility allows detection of defects on existing or new hardware.
In this post, we show you how to detect defective parts using Lookout for Vision ML models running on an edge appliance, which we simulate using an Amazon Elastic Compute Cloud (Amazon EC2) instance. We walk through training the new semantic segmentation models, exporting them as AWS IoT Greengrass components, and running inference in CPU-only mode with Python example code.
Solution overview
In this post, we use a set of pictures of toy aliens composed of normal and defective images such as missing limbs, eyes, or other parts. We train a Lookout for Vision model in the cloud to identify defective toy aliens. We compile the model to a target X86 CPU, package the trained Lookout for Vision model as an AWS IoT Greengrass component, and deploy the model to an EC2 instance without a GPU using the AWS IoT Greengrass console. Finally, we demonstrate a Python-based sample application running on the EC2 (C5a.2xl) instance that sources the toy alien images from the edge device file system, runs the inference on the Lookout for Vision model using the gRPC interface, and sends the inference data to an MQTT topic in the AWS Cloud. The script outputs an image that includes the color and location of the defects on the anomalous image.
The following diagram illustrates the solution architecture. It’s important to note that for each defect type you want to localize, you must have 10 marked anomaly images in training and 10 in test data, for a total of 20 images of that type. For this post, we search for missing limbs on the toy.
The solution has the following workflow:
- Upload a training dataset and a test dataset to Amazon Simple Storage Service (Amazon S3).
- Use the new Lookout for Vision UI to add an anomaly type and mark where those anomalies are in the training and test images.
- Train a Lookout for Vision model in the cloud.
- Compile the model to the target architecture (X86) and deploy the model to the EC2 (C5a.2xl) instance using the AWS IoT Greengrass console.
- Source images from local disk.
- Run inferences on the deployed model via the gRPC interface and retrieve an image of anomaly masks overlaid on the original image.
- Post the inference results to an MQTT client running on the edge instance.
- Receive the MQTT message on a topic in AWS IoT Core in the AWS Cloud for further monitoring and visualization.
Steps 5, 6, and 7 are coordinated with the sample Python application.
Prerequisites
Before you get started, complete the following prerequisites. For this post, we use an EC2 c5a.2xl instance and install AWS IoT Greengrass V2 on it to try out the new features. If you want to run on an NVIDIA Jetson, follow the steps in our previous post, Amazon Lookout for Vision now supports visual inspection of product defects at the edge.
- Create an AWS account.
- Start an EC2 instance that we can install AWS IoT Greengrass on and use the new CPU-only inference mode. You can also use an Intel x86 64-bit machine with 8 GB of RAM or more (we use a c5a.2xl, but anything with more than 8 GB of RAM on the x86 platform should work) running Ubuntu 20.04.
- Install AWS IoT Greengrass V2:
- Install the needed system and Python 3 dependencies (Ubuntu 20.04):
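The definitive package list lives in the GitHub repository; as an assumption-laden sketch, a representative set of dependencies for the gRPC- and MQTT-based client used later looks like this:

$ sudo apt-get update && sudo apt-get install -y python3-pip python3-venv
$ pip3 install grpcio grpcio-tools protobuf Pillow numpy awsiotsdk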
Upload the dataset and train the model
We use the toy aliens dataset to demonstrate the solution. The dataset contains normal and anomalous images. Here are a few sample images from the dataset.
The following image shows a normal toy alien.
The following image shows a toy alien missing a leg.
The following image shows a toy alien missing a head.
In this post, we look for missing limbs. We use the new user interface to draw a mask around the defects in our training and test data. This tells the semantic segmentation models how to identify this type of defect.
- Start by uploading your dataset, either via Amazon S3 or from your computer.
- Sort them into folders titled normal and anomaly.
- When creating your dataset, select Automatically attach labels to images based on the folder name. This allows us to sort out the anomalous images later and draw in the areas to be labeled with a defect.
- Try to hold back some images of both normal and anomaly for testing later.
- After all the images have been added to the dataset, choose Add anomaly labels.
- Begin labeling data by choosing Start labeling.
- To speed up the process, you can select multiple images and classify them as Normal or Anomaly.
If you want to highlight anomalies in addition to classifying them, you need to highlight where the anomalies are located.
- Choose the image you want to annotate.
If you want to highlight anomalies in addition to classifying them, you need to highlight where the anomalies are located. - Choose the image you want to annotate.
- Use the drawing tools to show the area where part of the subject is missing, or draw a mask over the defect.
- Choose Submit and close to keep these changes.
- Repeat this process for all your images.
- When you’re done, choose Save to persist your changes. Now you’re ready to train your model.
- Choose Train model.
After you complete these steps, you can navigate to the project and the Models page to check the performance of the trained model. You can start the process of exporting the model to the target edge device any time after the model is trained.
Retrain the model with corrected images
Sometimes the anomaly tagging may not be quite correct. You have the chance to help your model learn your anomalies better. For example, the following image is identified as an anomaly, but doesn’t show the missing_limbs tag.
Let’s open the editor and fix this.
Go through any images you find like this. If an anomaly tag has been applied incorrectly, you can use the eraser tool to remove it.
You can now train your model again and achieve better accuracy.
Compile and package the model as an AWS IoT Greengrass component
In this section, we walk through the steps to compile the toy alien model to our target edge device and package the model as an AWS IoT Greengrass component.
- On the Lookout for Vision console, choose your project.
- In the navigation pane, choose Edge model packages.
- Choose Create model packaging job.
- For Job name, enter a name.
- For Job description, enter an optional description.
- Choose Browse models.
- Select the model version (the toy alien model built in the previous section).
- Choose Choose.
- If you’re running this on Amazon EC2 or an X86-64 device, select Target platform and choose Linux, X86, and CPU.
If using CPU, you can leave the compiler options empty if you’re not sure and don’t have an NVIDIA GPU. If you have an Intel-based platform that supports AVX512, you can add these compiler options to optimize for better performance: {"mcpu": "skylake-avx512"}.
You can see your job name and status showing as In progress. The model packaging job may take a few minutes to complete. When the model packaging job is complete, the status shows as Success.
- Choose your job name (in our case it’s aliensblogcpux86) to see the job details.
- Choose Create model packaging job.
- Enter the details for Component name, Component description (optional), Component version, and Component location. Lookout for Vision stores the component recipes and artifacts in this Amazon S3 location.
- Choose Continue deployment in Greengrass to deploy the component to the target edge device.
The AWS IoT Greengrass component and model artifacts have been created in your AWS account.
Deploy the model
Be sure you have AWS IoT Greengrass V2 installed on your target device for your account before you continue. For instructions, refer to Install the AWS IoT Greengrass Core software.
In this section, we walk through the steps to deploy the toy alien model to the edge device using the AWS IoT Greengrass console.
- On the AWS IoT Greengrass console, navigate to your edge device.
- Choose Deploy to initiate the deployment steps.
- Select Core device (because the deployment is to a single device) and enter a name for Target name. The target name is the same name you used to name the core device during the AWS IoT Greengrass V2 installation process.
- Choose your component. In our case, the component name is aliensblogcpux86, which contains the toy alien model.
- Choose Next.
- Configure the component (optional).
- Choose Next.
- Expand Deployment policies.
- For Component update policy, select Notify components. This allows the already deployed component (a prior version of the component) to defer an update until you’re ready to update.
- For Failure handling policy, select Don’t roll back. In case of a failure, this option allows us to investigate the errors in deployment.
- Choose Next.
- Review the list of components that will be deployed on the target (edge) device.
- Choose Next. You should see the message Deployment successfully created.
- To validate the model deployment was successful, run the following command on your edge device:
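Assuming the Greengrass CLI component is installed under the default root, the running components can be listed as follows:

$ sudo /greengrass/v2/bin/greengrass-cli component list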
You should see a similar output running the aliensblogcpux86 lifecycle startup script:
Components currently running in Greengrass:
Run inferences on the model
Note: If you are running Greengrass as a different user than the one you are logged in as, you will need to change the permissions of the file /tmp/aws.iot.lookoutvision.EdgeAgent.sock:
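A common fix, assuming the default socket path shown above, is to relax the socket’s permissions; adjust this to your own security requirements:

$ sudo chmod 666 /tmp/aws.iot.lookoutvision.EdgeAgent.sock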
We’re now ready to run inferences on the model. On your edge device, run the following command to load the model (replace <modelName> with the model name used in your component):
To generate inferences, run the following command with the source file name (replace <path/to/images> with the path and file name of the image to check and replace <modelName> with the model name used for your component):
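The repository provides scripts for both of these steps; the sketch below shows the underlying calls against the Lookout for Vision edge agent gRPC API, assuming Python stubs generated from the published edge-agent.proto file. Treat the stub, message, and field names as assumptions and verify them against the edge agent API reference before use.

import grpc
from PIL import Image

# Stubs assumed to be generated from the Lookout for Vision edge-agent.proto
import edge_agent_pb2 as pb2
import edge_agent_pb2_grpc as pb2_grpc

MODEL_COMPONENT = '<modelName>'  # the component name used when packaging the model

channel = grpc.insecure_channel('unix:///tmp/aws.iot.lookoutvision.EdgeAgent.sock')
stub = pb2_grpc.EdgeAgentStub(channel)

# Load (start) the model on the edge device
stub.StartModel(pb2.StartModelRequest(model_component=MODEL_COMPONENT))

# Run inference on a single image
image = Image.open('<path/to/images>').convert('RGB')
response = stub.DetectAnomalies(
    pb2.DetectAnomaliesRequest(
        model_component=MODEL_COMPONENT,
        bitmap=pb2.Bitmap(
            width=image.width,
            height=image.height,
            byte_data=image.tobytes(),
        ),
    )
)
print(response.detect_anomaly_result)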
The model correctly predicts the image as anomalous (missing_limbs) with a confidence score of 0.9996867775917053. It tells us the mask of the anomaly tag missing_limbs and the percentage area it covers. The response also contains bitmap data that you can decode to see what it found.
Download and open the file blended.png, which looks like the following image. Note the area highlighted with the defect around the legs.
Customer stories
With AWS IoT Greengrass and Lookout for Vision, you can now automate visual inspection with computer vision for processes like quality control and defect assessment—all on the edge and in real time. You can proactively identify problems such as parts damage (like dents, scratches, or poor welding), missing product components, or defects with repeating patterns on the production line itself—saving you time and money. Customers like Tyson and Baxter are discovering the power of Lookout for Vision to increase quality and reduce operational costs by automating visual inspection.
“Operational excellence is a key priority at Tyson Foods. Predictive maintenance is an essential asset for achieving this objective by continuously improving overall equipment effectiveness (OEE). In 2021, Tyson Foods launched a machine learning-based computer vision project to identify failing product carriers during production to prevent them from impacting team member safety, operations, or product quality. The models trained using Amazon Lookout for Vision performed well. The pin detection model achieved 95% accuracy across both classes. The Amazon Lookout for Vision model was tuned to perform at 99.1% accuracy for failing pin detection. By far the most exciting result of this project was the speedup in development time. Although this project utilizes two models and a more complex application code, it took 12% less developer time to complete. This project for monitoring the condition of the product carriers at Tyson Foods was completed in record time using AWS managed services such as Amazon Lookout for Vision.”
—Audrey Timmerman, Sr Applications Developer, Tyson Foods.
“Latency and inferencing speed is critical for real-time assessment and critical quality checks of our manufacturing processes. Amazon Lookout for Vision edge on a CPU device gives us the ability to achieve this on production-grade equipment, enabling us to deliver cost-effective AI vision solutions at scale.”
—A.K. Karan, Global Senior Director – Digital Transformation, Integrated Supply Chain, Baxter International Inc.
Cleanup
Complete the following steps to remove the assets you created from your account and avoid any ongoing billing:
- On the Lookout for Vision console, navigate to your project.
- On the Actions menu, delete your datasets.
- Delete your models.
- On the Amazon S3 console, empty the buckets you created, then delete the buckets.
- On the Amazon EC2 console, delete the instance you started to run AWS IoT Greengrass.
- On the AWS IoT Greengrass console, choose Deployments in the navigation pane.
- Delete your component versions.
- On the AWS IoT Greengrass console, delete the AWS IoT things, groups, and devices.
Conclusion
In this post, we described a typical scenario for industrial defect detection at the edge using defect localization and deployed to a CPU-only device. We walked through the key components of the cloud and edge lifecycle with an end-to-end example using Lookout for Vision and AWS IoT Greengrass. With Lookout for Vision, we trained an anomaly detection model in the cloud using the toy alien dataset, compiled the model to a target architecture, and packaged the model as an AWS IoT Greengrass component. With AWS IoT Greengrass, we deployed the model to an edge device. We demonstrated a Python-based sample application that sources toy alien images from the edge device local file system, runs the inferences on the Lookout for Vision model at the edge using the gRPC interface, and sends the inference data to an MQTT topic in the AWS Cloud.
In a future post, we will show how to run inferences on a real-time stream of images using a GStreamer media pipeline.
Start your journey towards industrial anomaly detection and identification by visiting the Amazon Lookout for Vision and AWS IoT Greengrass resource pages.
About the authors
Manish Talreja is a Senior Industrial ML Practice Manager with AWS Professional Services. He helps AWS customers achieve their business goals by architecting and building innovative solutions that use AWS ML and IoT services on the AWS Cloud.
Ryan Vanderwerf is a partner solutions architect at Amazon Web Services. He previously provided Java virtual machine-focused consulting and project development as a software engineer at OCI on the Grails and Micronaut team. He was chief architect/director of products at ReachForce, with a focus on software and system architecture for AWS Cloud SaaS solutions for marketing data management. Ryan has built several SaaS solutions in several domains such as financial, media, telecom, and e-learning companies since 1996.
Prakash Krishnan is a Senior Software Development Manager at Amazon Web Services. He leads the engineering teams that are building large-scale distributed systems to apply fast, efficient, and highly scalable algorithms to deep learning-based image and video recognition problems.
Fine-tune and deploy a summarizer model using the Hugging Face Amazon SageMaker containers bringing your own script
There have been many recent advancements in the NLP domain. Pre-trained models and fully managed NLP services have democratized access and adoption of NLP. Amazon Comprehend is a fully managed service that can perform NLP tasks like custom entity recognition, topic modeling, sentiment analysis, and more to extract insights from data without the need for any prior ML experience.
Last year, AWS announced a partnership with Hugging Face to help bring natural language processing (NLP) models to production faster. Hugging Face is an open-source AI community, focused on NLP. Their Python-based library (Transformers) provides tools to easily use popular state-of-the-art Transformer architectures like BERT, RoBERTa, and GPT. You can apply these models to a variety of NLP tasks, such as text classification, information extraction, and question answering, among others.
Amazon SageMaker is a fully managed service that provides developers and data scientists the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the ML process, making it easier to develop high-quality models. The SageMaker Python SDK provides open-source APIs and containers to train and deploy models on SageMaker, using several different ML and deep learning frameworks.
The Hugging Face integration with SageMaker allows you to build Hugging Face models at scale for your own domain-specific use cases.
In this post, we walk you through an example of how to build and deploy a custom Hugging Face text summarizer on SageMaker. We use Pegasus [1] for this purpose, the first Transformer-based model specifically pre-trained on an objective tailored for abstractive text summarization. BERT is pre-trained on masking random words in a sentence; in contrast, during Pegasus’s pre-training, sentences are masked from an input document. The model then generates the missing sentences as a single output sequence using all the unmasked sentences as context, creating an executive summary of the document as a result.
Thanks to the flexibility of the HuggingFace library, you can easily adapt the code shown in this post for other types of transformer models, such as t5, BART, and more.
Load your own dataset to fine-tune a Hugging Face model
To load a custom dataset from a CSV file, we use the `load_dataset` method from the Hugging Face Datasets library. We can apply tokenization to the loaded dataset using the `datasets.Dataset.map` function. The `map` function iterates over the loaded dataset and applies the tokenize function to each example. The tokenized dataset can then be passed to the trainer for fine-tuning the model. See the following code:
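The original snippet isn't reproduced here, so the following is a minimal sketch of this step. The CSV file names, the `text` and `summary` column names, and the Pegasus checkpoint are illustrative placeholders rather than the exact code used in this post.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Hypothetical CSV files with a "text" column (reviews) and a "summary" column (titles)
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

tokenizer = AutoTokenizer.from_pretrained("google/pegasus-xsum")  # placeholder checkpoint

def tokenize(batch):
    # Tokenize the input texts and the target summaries
    model_inputs = tokenizer(batch["text"], truncation=True, max_length=512)
    labels = tokenizer(batch["summary"], truncation=True, max_length=64)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# map() applies the tokenize function to every example in each split
tokenized_dataset = dataset.map(tokenize, batched=True)

# In the SageMaker workflow, the tokenized splits are typically saved with
# save_to_disk() and uploaded to Amazon S3 before training.
```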
Build your training script for the Hugging Face SageMaker estimator
As explained in the post AWS and Hugging Face collaborate to simplify and accelerate adoption of Natural Language Processing models, training a Hugging Face model on SageMaker has never been easier. We can do so by using the Hugging Face estimator from the SageMaker SDK.
The following code snippet fine-tunes Pegasus on our dataset. You can also find many sample notebooks that guide you through fine-tuning different types of models, available directly in the transformers GitHub repository. To enable distributed training, we can use the Data Parallelism Library in SageMaker, which is built into the Hugging Face Trainer API. To enable data parallelism, we need to define the `distribution` parameter in our Hugging Face estimator.
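The exact snippet used in this post isn't shown above; the following is a minimal sketch of a Hugging Face estimator with data parallelism enabled. The hyperparameters, container versions, `source_dir`, and S3 paths are assumptions.

```python
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()

# Hypothetical hyperparameters passed to the train.py script shown later
hyperparameters = {
    "model_name": "google/pegasus-xsum",
    "epochs": 5,
    "per_device_train_batch_size": 2,
}

# Enable the SageMaker data parallelism library
distribution = {"smdistributed": {"dataparallel": {"enabled": True}}}

huggingface_estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./scripts",             # hypothetical directory containing train.py
    instance_type="ml.p3.16xlarge",     # multi-GPU instance for data parallelism
    instance_count=1,
    transformers_version="4.6.1",       # version combination is an assumption
    pytorch_version="1.7.1",
    py_version="py36",
    role=role,
    hyperparameters=hyperparameters,
    distribution=distribution,
)

# Hypothetical S3 location of the tokenized dataset saved with save_to_disk()
huggingface_estimator.fit({"train": "s3://<your-bucket>/pegasus/train"})
```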
The maximum training batch size you can configure depends on the model size and the GPU memory of the instance used. If SageMaker distributed training is enabled, the total batch size is the sum of the per-device batches across all devices/GPUs. If we use an ml.g4dn.16xlarge instance with distributed training instead of an ml.g4dn.xlarge instance, we have eight times (8 GPUs) as much GPU memory as an ml.g4dn.xlarge instance (1 GPU). The batch size per device remains the same, but eight devices are training in parallel.
As usual with SageMaker, we create a `train.py` script to use with Script Mode and pass hyperparameters for training. The following code snippet for Pegasus loads the model and trains it using the Transformers `Trainer` class:
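The full script isn't reproduced here (see the GitHub link that follows); this is a condensed sketch of what such a `train.py` could look like. The argument names, data channel layout, and base checkpoint are assumptions consistent with the estimator sketch above.

```python
# train.py -- a minimal sketch of a Script Mode training entry point
import argparse
import os

from datasets import load_from_disk
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Trainer,
    TrainingArguments,
)

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", type=str, default="google/pegasus-xsum")
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--per_device_train_batch_size", type=int, default=2)
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    # SageMaker environment variables for the data channel and model output
    parser.add_argument("--train_dir", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--model_dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()

    # The tokenized dataset was saved with save_to_disk() and uploaded to S3
    train_dataset = load_from_disk(args.train_dir)

    tokenizer = AutoTokenizer.from_pretrained(args.model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(args.model_name)

    training_args = TrainingArguments(
        output_dir=args.model_dir,
        num_train_epochs=args.epochs,
        per_device_train_batch_size=args.per_device_train_batch_size,
        learning_rate=args.learning_rate,
        logging_steps=100,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
        tokenizer=tokenizer,
    )
    trainer.train()

    # Save the fine-tuned model and tokenizer so SageMaker packages them into model.tar.gz
    trainer.save_model(args.model_dir)
    tokenizer.save_pretrained(args.model_dir)
```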
The full code is available on GitHub.
Deploy the trained Hugging Face model to SageMaker
Our friends at Hugging Face have made inference on SageMaker for Transformers models simpler than ever thanks to the SageMaker Hugging Face Inference Toolkit. You can directly deploy the previously trained model by simply setting up the environment variable `"HF_TASK":"summarization"` (for instructions, see Pegasus Models), choosing Deploy, and then choosing Amazon SageMaker, without needing to write an inference script.
However, if you need some specific way to generate or postprocess predictions, for example generating several summary suggestions based on a list of different text generation parameters, writing your own inference script can be useful and relatively straightforward:
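The script itself isn't reproduced above, so the following is a minimal sketch of what such a custom inference script could look like. The JSON payload format (a `text` field plus a `parameters_list` of generation parameter sets) is an assumption used for illustration.

```python
# inference.py -- a minimal sketch of a custom Hugging Face inference script
import json

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def model_fn(model_dir):
    # Load the fine-tuned model and tokenizer that were saved in SM_MODEL_DIR
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
    return model, tokenizer

def input_fn(request_body, request_content_type):
    # Expect JSON such as {"text": "...", "parameters_list": [{"length_penalty": 0.6}, ...]}
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    return json.loads(request_body)

def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["text"], truncation=True, max_length=512, return_tensors="pt")

    summaries = []
    for params in data.get("parameters_list", [{}]):
        # Each parameter set (for example a different length_penalty) yields one summary
        output_ids = model.generate(**inputs, **params)
        summaries.append(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    return {"summaries": summaries}
```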
As shown in the preceding code, such an inference script for HuggingFace on SageMaker only needs the following template functions:
- `model_fn()` – Reads the content of what was saved at the end of the training job inside `SM_MODEL_DIR`, or from an existing model weights directory saved as a tar.gz file in Amazon Simple Storage Service (Amazon S3). It’s used to load the trained model and associated tokenizer.
- `input_fn()` – Formats the data received from a request made to the endpoint.
- `predict_fn()` – Calls the output of `model_fn()` (the model and tokenizer) to run inference on the output of `input_fn()` (the formatted data).

Optionally, you can create an `output_fn()` function for inference formatting, using the output of `predict_fn()`, which we didn’t demonstrate in this post.
We can then deploy the trained Hugging Face model with its associated inference script to SageMaker using the Hugging Face SageMaker Model class:
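A minimal sketch of this deployment step follows; the `model_data` S3 path, container versions, source directory, and instance type are placeholders.

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# model_data and the entry_point/source_dir paths are assumptions
huggingface_model = HuggingFaceModel(
    model_data="s3://<your-bucket>/pegasus-summarizer/model.tar.gz",
    role=role,
    transformers_version="4.6.1",
    pytorch_version="1.7.1",
    py_version="py36",
    entry_point="inference.py",
    source_dir="./scripts",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)
```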
Test the deployed model
For this demo, we trained the model on the Women’s E-Commerce Clothing Reviews dataset, which contains reviews of clothing articles (which we consider as the input text) and their associated titles (which we consider as summaries). After we remove articles with missing titles, the dataset contains 19,675 reviews. Fine-tuning the Pegasus model on a training set containing 70% of those articles for five epochs took approximately 3.5 hours on an ml.p3.16xlarge instance.
We can then deploy the model and test it with some example data from the test set. The following is an example review describing a sweater:
Thanks to our custom inference script hosted in a SageMaker endpoint, we can generate several summaries for this review with different text generation parameters. For example, we can ask the endpoint to generate a range of very short to moderately long summaries specifying different length penalties (the smaller the length penalty, the shorter the generated summary). The following are some parameter input examples, and the subsequent machine-generated summaries:
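For illustration, a hypothetical request consistent with the inference script sketched earlier might look like the following; the machine-generated summaries themselves depend on the fine-tuned model and aren't reproduced here.

```python
# Hypothetical invocation of the endpoint with several generation parameter sets.
# Smaller length_penalty values push beam search toward shorter summaries.
review_text = "..."  # the example sweater review from the test set (not reproduced here)

payload = {
    "text": review_text,
    "parameters_list": [
        {"num_beams": 4, "length_penalty": 2.0, "max_length": 30},
        {"num_beams": 4, "length_penalty": 1.0, "max_length": 20},
        {"num_beams": 4, "length_penalty": 0.6, "max_length": 10},
    ],
}

response = predictor.predict(payload)
print(response["summaries"])
```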
Which summary do you prefer? The first generated title captures all the important facts about the review, with a quarter the number of words. In contrast, the last one only uses three words (less than 1/10th the length of the original review) to focus on the most important feature of the sweater.
Conclusion
You can fine-tune a text summarizer on your custom dataset and deploy it to production on SageMaker with this simple example available on GitHub. Additional sample notebooks to train and deploy Hugging Face models on SageMaker are also available.
As always, AWS welcomes feedback. Please submit any comments or questions.
References
[1] PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
About the authors
Viktor Malesevic is a Machine Learning Engineer with AWS Professional Services, passionate about Natural Language Processing and MLOps. He works with customers to develop and put challenging deep learning models to production on AWS. In his spare time, he enjoys sharing a glass of red wine and some cheese with friends.
Aamna Najmi is a Data Scientist with AWS Professional Services. She is passionate about helping customers innovate with Big Data and Artificial Intelligence technologies to tap business value and insights from data. In her spare time, she enjoys gardening and traveling to new places.
Team and user management with Amazon SageMaker and AWS SSO
Amazon SageMaker Studio is a web-based integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models. Each onboarded user in Studio has their own dedicated set of resources, such as compute instances, a home directory on an Amazon Elastic File System (Amazon EFS) volume, and a dedicated AWS Identity and Access Management (IAM) execution role.
One of the most common real-world challenges in setting up user access for Studio is how to manage multiple users, groups, and data science teams for data access and resource isolation.
Many customers implement user management using federated identities with AWS Single Sign-On (AWS SSO) and an external identity provider (IdP), such as Active Directory (AD) or AWS Managed Microsoft AD directory. It’s aligned with the AWS recommended practice of using temporary credentials to access AWS accounts.
An Amazon SageMaker domain supports AWS SSO and can be configured in AWS SSO authentication mode. In this case, each entitled AWS SSO user has their own Studio user profile. Users given access to Studio have a unique sign-in URL that directly opens Studio, and they sign in with their AWS SSO credentials. Organizations manage their users in AWS SSO instead of the SageMaker domain. You can assign multiple users access to the domain at the same time. You can use Studio user profiles for each user to define their security permissions in Studio notebooks via an IAM role attached to the user profile, called an execution role. This role controls permissions for SageMaker operations according to its IAM permission policies.
In AWS SSO authentication mode, there is always a one-to-one mapping between users and user profiles. The SageMaker domain manages the creation of user profiles based on the AWS SSO user ID. You can’t create user profiles via the AWS Management Console. This works well when a user is a member of only one data science team, or when users have the same or very similar access requirements across their projects and teams. In a more common use case, when a user participates in multiple ML projects and is a member of multiple teams with slightly different permission requirements, the user requires access to different Studio user profiles with different execution roles and permission policies. Because you can’t manage user profiles independently of AWS SSO in AWS SSO authentication mode, you can’t implement a one-to-many mapping between users and Studio user profiles.
If you need to establish a strong separation of security contexts, for example for different data categories, or need to entirely prevent the visibility of one group of users’ activity and resources to another, the recommended approach is to create multiple SageMaker domains. At the time of this writing, you can create only one domain per AWS account per Region. To implement the strong separation, you can use multiple AWS accounts with one domain per account as a workaround.
The second challenge is to restrict access to the Studio IDE to only users from inside a corporate network or a designated VPC. You can achieve this by using IAM-based access control policies. In this case, the SageMaker domain must be configured with IAM authentication mode, because IAM identity-based policies aren’t supported by the sign-in mechanism in AWS SSO mode. The post Secure access to Amazon SageMaker Studio with AWS SSO and a SAML application solves this challenge and demonstrates how to control network access to a SageMaker domain.
This solution addresses these challenges of AWS SSO user management for Studio for a common use case of multiple user groups and a many-to-many mapping between users and teams. The solution outlines how to use a custom SAML 2.0 application as the mechanism to trigger the user authentication for Studio and support multiple Studio user profiles per one AWS SSO user.
You can use this approach to implement a custom user portal with applications backed by the SAML 2.0 authorization process. Your custom user portal can have maximum flexibility on how to manage and display user applications. For example, the user portal can show some ML project metadata to facilitate identifying an application to access.
You can find the solution’s source code in our GitHub repository.
Solution overview
The solution implements the following architecture.
The main high-level architecture components are as follows:
- Identity provider – Users and groups are managed in an external identity source, for example in Azure AD. User assignments to AD groups define what permissions a particular user has and which Studio team they have access to. The identity source must be synchronized with AWS SSO.
- AWS SSO – AWS SSO manages SSO users, SSO permission sets, and applications. This solution uses a custom SAML 2.0 application to provide access to Studio for entitled AWS SSO users. The solution also uses SAML attribute mapping to populate the SAML assertion with specific access-relevant data, such as user ID and user team. Because the solution creates a SAML API, you can use any IdP supporting SAML assertions to create this architecture. For example, you can use Okta or even your own web application that provides a landing page with a user portal and applications. For this post, we use AWS SSO.
- Custom SAML 2.0 applications – The solution creates one application per Studio team and assigns one or multiple applications to a user or a user group based on entitlements. Users can access these applications from within their AWS SSO user portal based on assigned permissions. Each application is configured with the Amazon API Gateway endpoint URL as its SAML backend.
- SageMaker domain – The solution provisions a SageMaker domain in an AWS account and creates a dedicated user profile for each combination of AWS SSO user and Studio team the user is assigned to. The domain must be configured in IAM authentication mode.
- Studio user profiles – The solution automatically creates a dedicated user profile for each user-team combination. For example, if a user is a member of two Studio teams and has corresponding permissions, the solution provisions two separate user profiles for this user. Each profile always belongs to one and only one user. Because you have a Studio user profile for each possible combination of a user and a team, you must consider your account limits for user profiles before implementing this approach. For example, if your limit is 500 user profiles, and each user is a member of two teams, you consume that limit 2.5 times faster, and as a result you can onboard 250 users. With a high number of users, we recommend implementing multiple domains and accounts for security context separation. To demonstrate the proof of concept, we use two users, User 1 and User 2, and two Studio teams, Team 1 and Team 2. User 1 belongs to both teams, whereas User 2 belongs to Team 2 only. User 1 can access Studio environments for both teams, whereas User 2 can access only the Studio environment for Team 2.
- Studio execution roles – Each Studio user profile uses a dedicated execution role with permission policies granting the required level of access for the specific team the user belongs to. Studio execution roles implement an effective permission isolation between individual users and their team roles. You manage data and resource access for each role and not at an individual user level.
The solution also implements an attribute-based access control (ABAC) using SAML 2.0 attributes, tags on Studio user profiles, and tags on SageMaker execution roles.
In this particular configuration, we assume that AWS SSO users don’t have permissions to sign in to the AWS account and don’t have corresponding AWS SSO-controlled IAM roles in the account. Each user signs in to their Studio environment via a presigned URL from an AWS SSO portal without the need to go to the console in their AWS account. In a real-world environment, you might need to set up AWS SSO permission sets for users to allow the authorized users to assume an IAM role and to sign in to an AWS account. For example, you can provide data scientist role permissions for a user to be able to interact with account resources and have the level of access they need to fulfill their role.
Solution architecture and workflow
The following diagram presents the end-to-end sign-in flow for an AWS SSO user.
An AWS SSO user chooses a corresponding Studio application in their AWS SSO portal. AWS SSO prepares a SAML assertion (1) with configured SAML attribute mappings. A custom SAML application is configured with the API Gateway endpoint URL as its Assertion Consumer Service (ACS), and needs mapping attributes containing the AWS SSO user ID and team ID. We use the `ssouserid` and `teamid` custom attributes to send all needed information to the SAML backend.
API Gateway invokes the SAML backend API. An AWS Lambda function (2) implements the API and parses the SAML response to extract the user ID and team ID. The function uses them to retrieve a team-specific configuration, such as an execution role and SageMaker domain ID. The function checks if the required user profile exists in the domain, and creates a new one with the corresponding configuration settings if no profile exists. Afterwards, the function generates a Studio presigned URL for the specific Studio user profile by calling the CreatePresignedDomainUrl API (3) via a SageMaker API VPC endpoint. The Lambda function finally returns the presigned URL in an HTTP 302 redirection response to sign the user in to Studio.
The solution implements a non-production sample version of a SAML backend. The Lambda function parses the SAML assertion and uses only attributes in the `<saml2:AttributeStatement>` element to construct a `CreatePresignedDomainUrl` API call. In your production solution, you must use a proper SAML backend implementation, which must include validation of the SAML response, signature, and certificates, replay and redirect prevention, and any other features of a SAML authentication process. For example, you can use a python3-saml based SAML backend implementation or the OneLogin open-source SAML toolkit to implement a secure SAML backend.
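For orientation, the following is a compressed, non-production sketch of such a backend. The domain ID, role ARNs, profile naming convention, and error handling are placeholders, the SAML validation steps described above are omitted, and a real implementation should also wait for a newly created user profile to become available before generating the presigned URL.

```python
# A minimal, non-production sketch of the SAML backend Lambda function
import base64
import urllib.parse
import xml.etree.ElementTree as ET

import boto3

sm = boto3.client("sagemaker")

DOMAIN_ID = "d-xxxxxxxxxxxx"  # hypothetical SageMaker domain ID
TEAM_CONFIG = {               # hypothetical team-to-execution-role mapping
    "Team1": "arn:aws:iam::111122223333:role/SageMakerStudioExecutionRoleTeam1",
    "Team2": "arn:aws:iam::111122223333:role/SageMakerStudioExecutionRoleTeam2",
}

def get_saml_attributes(event):
    # API Gateway passes the HTML form body; extract and decode the SAMLResponse field
    body = base64.b64decode(event["body"]).decode() if event.get("isBase64Encoded") else event["body"]
    saml_response = urllib.parse.parse_qs(body)["SAMLResponse"][0]
    root = ET.fromstring(base64.b64decode(saml_response))
    ns = {"saml2": "urn:oasis:names:tc:SAML:2.0:assertion"}
    attrs = {}
    for attr in root.iter("{urn:oasis:names:tc:SAML:2.0:assertion}Attribute"):
        values = [v.text for v in attr.findall("saml2:AttributeValue", ns)]
        attrs[attr.get("Name")] = values[0]
    return attrs["ssouserid"], attrs["teamid"]

def ensure_user_profile(user_id, team_id):
    # One profile per user-team combination (naming convention is an assumption)
    profile_name = f"{user_id}-{team_id}"
    try:
        sm.describe_user_profile(DomainId=DOMAIN_ID, UserProfileName=profile_name)
    except sm.exceptions.ResourceNotFound:
        sm.create_user_profile(
            DomainId=DOMAIN_ID,
            UserProfileName=profile_name,
            UserSettings={"ExecutionRole": TEAM_CONFIG[team_id]},
            Tags=[{"Key": "Team", "Value": team_id}],
        )
    return profile_name

def lambda_handler(event, context):
    user_id, team_id = get_saml_attributes(event)
    profile_name = ensure_user_profile(user_id, team_id)
    presigned = sm.create_presigned_domain_url(
        DomainId=DOMAIN_ID, UserProfileName=profile_name
    )
    # Redirect the browser straight into the Studio user profile
    return {"statusCode": 302, "headers": {"Location": presigned["AuthorizedUrl"]}}
```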
Dynamic creation of Studio user profiles
The solution automatically creates a Studio user profile for each user-team combination, as soon as the AWS SSO sign-in process requests a presigned URL. For this proof of concept and simplicity, the solution creates user profiles based on the configured metadata in the AWS SAM template:
You can configure your own teams, custom settings, and tags by adding them to the metadata configuration for the AWS CloudFormation resource `GetUserProfileMetadata`.

For more information on the configuration elements of `UserSettings`, refer to create_user_profile in boto3.
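For example, beyond the execution role, `UserSettings` can carry security groups and sharing settings. The following is a sketch only; the domain ID, role ARN, security group, and profile name are placeholders.

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholder values; adapt to your own domain, roles, and network setup
sm.create_user_profile(
    DomainId="d-xxxxxxxxxxxx",
    UserProfileName="user1-team1",
    UserSettings={
        "ExecutionRole": "arn:aws:iam::111122223333:role/SageMakerStudioExecutionRoleTeam1",
        "SecurityGroups": ["sg-0123456789abcdef0"],
        "SharingSettings": {"NotebookOutputOption": "Disabled"},
    },
    Tags=[{"Key": "Team", "Value": "Team1"}],
)
```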
IAM roles
The following diagram shows the IAM roles in this solution.
The roles are as follows:
- Studio execution role – A Studio user profile uses a dedicated Studio execution role with data and resource permissions specific for each team or user group. This role can also use tags to implement ABAC for data and resource access. For more information, refer to SageMaker Roles.
- SAML backend Lambda execution role – This execution role contains permission to call the `CreatePresignedDomainUrl` API. You can configure the permission policy to include additional conditional checks using `Condition` keys, for example to allow access to Studio only from a designated range of IP addresses within your private corporate network (a sketch of such a condition appears after this list). For more examples on how to use conditions in IAM policies, refer to Control Access to the SageMaker API by Using Identity-based Policies.
- SageMaker – SageMaker assumes the Studio execution role on your behalf, as controlled by a corresponding trust policy on the execution role. This allows the service to access data and resources, and perform actions on your behalf. The Studio execution role must contain a trust policy allowing SageMaker to assume this role.
- AWS SSO permission set IAM role – You can assign your AWS SSO users to AWS accounts in your AWS organization via AWS SSO permission sets. A permission set is a template that defines a collection of user role-specific IAM policies. You manage permission sets in AWS SSO, and AWS SSO controls the corresponding IAM roles in each account.
- AWS Organizations Service Control Policies – If you use AWS Organizations, you can implement Service Control Policies (SCPs) to centrally control the maximum available permissions for all accounts and all IAM roles in your organization. For example, to centrally prevent access to Studio via the console, you can implement the following SCP and attach it to the accounts with the SageMaker domain:
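The exact policies ship with the solution's templates and aren't reproduced here; the following are minimal sketches expressed as Python dictionaries (for example, to serialize with `json.dumps()` when attaching them). The account ID, role names, CIDR range, and the idea of exempting only the SAML backend role from the deny are assumptions. The first dictionary sketches an SCP that blocks presigned URL creation for every principal except the SAML backend role; the second sketches the IP-range condition for the SAML backend Lambda execution role mentioned earlier in this list.

```python
import json

# Sketch of an SCP that denies creating Studio presigned URLs (and therefore console
# access to Studio) for every principal except the SAML backend Lambda role.
scp_no_console_studio_access = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyStudioPresignedUrlOutsideSamlBackend",
            "Effect": "Deny",
            "Action": "sagemaker:CreatePresignedDomainUrl",
            "Resource": "*",
            "Condition": {
                "ArnNotLike": {
                    "aws:PrincipalArn": "arn:aws:iam::111122223333:role/SAMLBackendLambdaRole*"
                }
            },
        }
    ],
}

# Sketch of the SAML backend Lambda role permission, restricting presigned URL
# creation to calls originating from a designated private CIDR range.
lambda_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sagemaker:CreatePresignedDomainUrl",
            "Resource": "arn:aws:sagemaker:*:111122223333:user-profile/*/*",
            "Condition": {"IpAddress": {"aws:SourceIp": ["10.100.0.0/16"]}},
        }
    ],
}

print(json.dumps(scp_no_console_studio_access, indent=2))
```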
Solution provisioned roles
The AWS CloudFormation stack for this solution creates three Studio execution roles used in the SageMaker domain:
- `SageMakerStudioExecutionRoleDefault`
- `SageMakerStudioExecutionRoleTeam1`
- `SageMakerStudioExecutionRoleTeam2`
None of the roles have the AmazonSageMakerFullAccess policy attached, and each has only a limited set of permissions. In your real-world SageMaker environment, you need to amend the role’s permissions based on your specific requirements.
`SageMakerStudioExecutionRoleDefault` has only the custom policy `SageMakerReadOnlyPolicy` attached with a restrictive list of allowed actions.
Both team roles, `SageMakerStudioExecutionRoleTeam1` and `SageMakerStudioExecutionRoleTeam2`, additionally have two custom policies, `SageMakerAccessSupportingServicesPolicy` and `SageMakerStudioDeveloperAccessPolicy`, allowing usage of particular services, and one deny-only policy, `SageMakerDeniedServicesPolicy`, with an explicit deny on some SageMaker API calls.
The Studio developer access policy enforces that the `Team` tag on any resource created via a SageMaker `Create*` API call is set to the same value as the `Team` tag on the user's own execution role:
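The policy itself isn't reproduced here; a minimal sketch of such a statement (written as a Python dictionary, with the action scope as an assumption) could look like the following.

```python
# Require every SageMaker Create* call to tag the new resource with the
# caller's own Team tag (standard ABAC pattern using IAM policy variables).
enforce_team_tag_on_create = {
    "Effect": "Allow",
    "Action": "sagemaker:Create*",
    "Resource": "*",
    "Condition": {
        "StringEquals": {"aws:RequestTag/Team": "${aws:PrincipalTag/Team}"}
    },
}
```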
Furthermore, it allows using delete, stop, update, and start operations only on resources tagged with the same Team tag as the user’s execution role:
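Again as a sketch only, the corresponding statement conditions the lifecycle actions on the resource's `Team` tag matching the caller's `Team` tag.

```python
# Allow lifecycle operations only on resources whose Team tag matches the caller's Team tag
restrict_lifecycle_to_own_team = {
    "Effect": "Allow",
    "Action": [
        "sagemaker:Delete*",
        "sagemaker:Stop*",
        "sagemaker:Update*",
        "sagemaker:Start*",
    ],
    "Resource": "*",
    "Condition": {
        "StringEquals": {"aws:ResourceTag/Team": "${aws:PrincipalTag/Team}"}
    },
}
```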
For more information on roles and policies, refer to Configuring Amazon SageMaker Studio for teams and groups with complete resource isolation.
Network infrastructure
The solution implements a fully isolated SageMaker domain environment with all network traffic going through AWS PrivateLink connections. You may optionally enable internet access from the Studio notebooks. The solution also creates three VPC security groups to control traffic between all solution components such as the SAML backend Lambda function, VPC endpoints, and Studio notebooks.
For this proof of concept and simplicity, the solution creates a SageMaker subnet in a single Availability Zone. For your production setup, you must use multiple private subnets across multiple Availability Zones and ensure that each subnet is appropriately sized, assuming a minimum of five IP addresses per user.
This solution provisions all required network infrastructure. The CloudFormation template ./cfn-templates/vpc.yaml contains the source code.
Deployment steps
To deploy and test the solution, you must complete the following steps:
- Deploy the solution’s stack via an AWS Serverless Application Model (AWS SAM) template.
- Create AWS SSO users, or use existing AWS SSO users.
- Create custom SAML 2.0 applications and assign AWS SSO users to the applications.
The full source code for the solution is provided in our GitHub repository.
Prerequisites
To use this solution, the AWS Command Line Interface (AWS CLI), the AWS SAM CLI, and Python 3.8 or later must be installed.

The deployment procedure assumes that you have enabled AWS SSO and configured it for your AWS organization in the account where the solution is deployed.
To set up AWS SSO, refer to the instructions in GitHub.
Solution deployment options
You can choose from several solution deployment options to have the best fit for your existing AWS environment. You can also select the network and SageMaker domain provisioning options. For detailed information about the different deployment choices, refer to the README file.
Deploy the AWS SAM template
To deploy the AWS SAM template, complete the following steps:
- Clone the source code repository to your local environment:
- Build the AWS SAM application:
- Deploy the application:
- Provide stack parameters according to your existing environment and desired deployment options, such as existing VPC, existing private and public subnets, and existing SageMaker domain, as discussed in the Solution deployment options chapter of the README file.
You can leave all parameters at their default values to provision new network resources and a new SageMaker domain. Refer to detailed parameter usage in the README file if you need to change any default settings.
Wait until the stack deployment is complete. The end-to-end deployment including provisioning all network resources and a SageMaker domain takes about 20 minutes.
To see the stack output, run the following command in the terminal:
Create SSO users
Follow the instructions to add AWS SSO users and create two users named User 1 and User 2, or use any two of your existing AWS SSO users to test the solution. Make sure you use AWS SSO in the same AWS Region in which you deployed the solution.
Create custom SAML 2.0 applications
To create the required custom SAML 2.0 applications for Team 1 and for Team 2, complete the following steps:
- Open the AWS SSO console in the AWS management account of your AWS organization, in the same Region where you deployed the solution stack.
- Choose Applications in the navigation pane.
- Choose Add a new application.
- Choose Add a custom SAML 2.0 application.
- For Display name, enter an application name, for example `SageMaker Studio Team 1`.
- Leave Application start URL and Relay state empty.
- Choose If you don’t have a metadata file, you can manually enter your metadata values.
- For Application ACS URL, enter the URL provided in the `SAMLBackendEndpoint` key of the AWS SAM stack output.
- For Application SAML audience, enter the URL provided in the `SAMLAudience` key of the AWS SAM stack output.
- Choose Save changes.
- Navigate to the Attribute mappings tab.
- Set the Subject to email and Format to emailAddress.
- Add the following new attributes:
- Choose Save changes.
- On the Assigned users tab, choose Assign users.
- Choose User 1 for the Team 1 application and both User 1 and User 2 for the Team 2 application.
- Choose Assign users.
Test the solution
To test the solution, complete the following steps:
- Go to the AWS SSO user portal `https://<Identity Store ID>.awsapps.com/start` and sign in as User 1.
Two SageMaker applications are shown in the portal.
- Choose SageMaker Studio Team 1.
You’re redirected to the Studio instance for Team 1 in a new browser window.
The first time you start Studio, SageMaker creates a JupyterServer application. This process takes a few minutes.
- In Studio, on the File menu, choose New and Terminal to start a new terminal.
- In the terminal command line, enter the following command:
The command returns the Studio execution role.
In our setup, this role must be different for each team. You can also check that each user in each instance of Studio has their own home directory on a mounted Amazon EFS volume.
- Return to the AWS SSO portal, still signed in as User 1, and choose SageMaker Studio Team 2.
You’re redirected to a Team 2 Studio instance.
The start process can again take several minutes, because SageMaker starts a new JupyterServer application for this new user profile.
- Sign in as User 2 in the AWS SSO portal.
User 2 has only one application assigned: SageMaker Studio Team 2.
If you start an instance of Studio via this user application, you can verify that it uses the same SageMaker execution role as User 1’s Team 2 instance. However, each Studio instance is completely isolated. User 2 has their own home directory on an Amazon EFS volume and their own instance of the JupyterServer application. You can verify this by creating a folder and some files for each of the users and seeing that each user’s home directory is isolated.
Now you can sign in to the SageMaker console and see that there are three user profiles created.
You just implemented a proof of concept solution to manage multiple users and teams with Studio.
Clean up
To avoid charges, you must remove all project-provisioned and generated resources from your AWS account. Use the following SAM CLI command to delete the solution CloudFormation stack:
For security reasons and to prevent data loss, the Amazon EFS mount and the content associated with the Studio domain deployed in this solution are not deleted. The VPC and subnets associated with the SageMaker domain remain in your AWS account. For instructions to delete the file system and VPC, refer to Deleting an Amazon EFS file system and Work with VPCs, respectively.
To delete the custom SAML application, complete the following steps:
- Open the AWS SSO console in the AWS SSO management account.
- Choose Applications.
- Select SageMaker Studio Team 1.
- On the Actions menu, choose Remove.
- Repeat these steps for SageMaker Studio Team 2.
Conclusion
This solution demonstrated how you can create a flexible and customizable environment using AWS SSO and Studio user profiles to support your own organization structure. Possible next steps toward a production-ready solution include:
- Implement automated Studio user profile management as a dedicated microservice to support an automated profile provisioning workflow and to handle metadata and configuration for user profiles, for example in Amazon DynamoDB.
- Use the same mechanism in a more general case of multiple SageMaker domains and multiple AWS accounts. The same SAML backend can vend a corresponding presigned URL redirecting to a user profile-domain-account combination according to your custom logic based on user entitlements and team setup.
- Implement a synchronization mechanism between your IdP and AWS SSO and automate creation of custom SAML 2.0 applications.
- Implement scalable data and resource access management with attribute-based access control (ABAC).
If you have any feedback or questions, please leave them in the comments.
Further reading
Documentation
Blog posts
- Onboarding Amazon SageMaker Studio with AWS SSO and Okta Universal Directory
- Configuring Amazon SageMaker Studio for teams and groups with complete resource isolation
- Secure access to Amazon SageMaker Studio with AWS SSO and a SAML application
About the Author
Yevgeniy Ilyin is a Solutions Architect at AWS. He has over 20 years of experience working at all levels of software development and solutions architecture and has used programming languages from COBOL and Assembler to .NET, Java, and Python. He develops and codes cloud native solutions with a focus on big data, analytics, and data engineering.