Two Alexa AI papers present novel methodologies that use vision and language understanding to improve embodied task completion in simulated environments.
How CCC Intelligent Solutions created a custom approach for hosting complex AI models using Amazon SageMaker
This post is co-written by Christopher Diaz, Sam Kinard, Jaime Hidalgo and Daniel Suarez from CCC Intelligent Solutions.
In this post, we discuss how CCC Intelligent Solutions (CCC) combined Amazon SageMaker with other AWS services to create a custom solution capable of hosting the types of complex artificial intelligence (AI) models envisioned. CCC is a leading software-as-a-service (SaaS) platform for the multi-trillion-dollar property and casualty insurance economy powering operations for insurers, repairers, automakers, part suppliers, lenders, and more. CCC cloud technology connects more than 30,000 businesses digitizing mission-critical workflows, commerce, and customer experiences. A trusted leader in AI, Internet of Things (IoT), customer experience, and network and workflow management, CCC delivers innovations that keep people’s lives moving forward when it matters most.
The challenge
CCC processes more than $1 trillion in claims transactions annually. As the company evolves to integrate AI into its existing and new product catalog, it requires sophisticated approaches to train and deploy multi-modal machine learning (ML) ensemble models that solve complex business needs. These are a class of models that encapsulate proprietary algorithms and subject matter expertise that CCC has honed over the years. These models must be able to ingest new layers of nuanced data and customer rules to produce single prediction outcomes. In this post, we describe how CCC used Amazon SageMaker hosting and other AWS services to deploy multiple multi-modal models in an ensemble inference pipeline.
As shown in the following diagram, an ensemble is a collection of two or more models that are orchestrated to run in a linear or nonlinear fashion to produce a single prediction. When stacked linearly, the individual models of an ensemble can be directly invoked for predictions and later consolidated for unification. At times, ensemble models can also be implemented as a serial inference pipeline.
For our use case, the ensemble pipeline is strictly nonlinear, as depicted in the following diagram. Nonlinear ensemble pipelines are, in principle, directed acyclic graphs (DAGs). For our use case, this DAG pipeline had both independent models that run in parallel (Services B, C) and models that consume predictions from previous steps (Service D).
A practice that comes out of the research-driven culture at CCC is the continuous review of technologies that can be leveraged to bring more value to customers. As CCC faced this ensemble challenge, leadership launched a proof-of-concept (POC) initiative to thoroughly assess the offerings from AWS to discover, specifically, whether Amazon SageMaker and other AWS tools could manage the hosting of individual AI models in complex, nonlinear ensembles.
Ensemble explained: In this context, an ensemble is a group of 2 or more AI models that work together to produce 1 overall prediction.
Questions driving the research
Can Amazon SageMaker be used to host complex ensembles of AI models that work together to provide one overall prediction? If so, can SageMaker offer other benefits out of the box, such as increased automation, reliability, monitoring, automatic scaling, and cost-saving measures?
Finding alternative ways to deploy CCC’s AI models using the technological advancements from cloud providers will allow CCC to bring AI solutions to market faster than its competition. Additionally, having more than one deployment architecture provides flexibility when finding the balance between cost and performance based on business priorities.
Based on our requirements, we finalized the following list of features as a checklist for a production-grade deployment architecture:
- Support for complex ensembles
- Guaranteed uptime for all components
- Customizable automatic scaling for deployed AI models
- Preservation of AI model input and output
- Usage metrics and logs for all components
- Cost-saving mechanisms
With a majority of CCC’s AI solutions relying on computer vision models, a new architecture was required to support image and video files that continue to increase in resolution. There was a strong need to design and implement this architecture as an asynchronous model.
After cycles of research and initial benchmarking efforts, CCC determined that SageMaker was a perfect fit to meet a majority of their production requirements, especially the guaranteed uptime SageMaker provides for most of its inference components. The default behavior of Amazon SageMaker Asynchronous Inference endpoints, which saves inputs and outputs in Amazon S3, simplifies the task of preserving data generated from complex ensembles. Additionally, with each AI model hosted on its own endpoint, managing automatic scaling policies at the model or endpoint level becomes easier. This simplified management offers a potential cost-saving benefit: development teams can allocate more time to fine-tuning scaling policies that minimize over-provisioning of compute resources.
Having decided to proceed with using SageMaker as the pivotal component of the architecture, we also realized SageMaker can be part of an even larger architecture, supplemented with many other serverless AWS-managed services. This choice was needed to facilitate the higher-order orchestration and observability needs of this complex architecture.
First, to remove payload size limitations and greatly reduce timeout risk during high-traffic scenarios, CCC implemented an architecture that runs predictions asynchronously using SageMaker Asynchronous Inference endpoints coupled with other AWS-managed services as the core building blocks. Additionally, the user interface for the system follows the fire-and-forget design pattern. In other words, once a user has uploaded their input to the system, nothing more needs to be done. They are notified when the prediction is available. The figure below illustrates a high-level overview of our asynchronous event-driven architecture. In the following sections, we do a deep dive into the execution flow of the designed architecture.
Step-by-step solution
Step 1
A client makes a request to the AWS API Gateway endpoint. The content of the request contains the name of the AI service from which they need a prediction and the desired method of notification.
This request is passed to a Lambda function called New Prediction, whose main tasks are to:
- Check if the requested service by the client is available.
- Assign a unique prediction ID to the request. This prediction ID can be used by the user to check the status of the prediction throughout the entire process.
- Generate an Amazon S3 pre-signed URL that the user will need to use in the next step to upload the input content of the prediction request.
- Create an entry in Amazon DynamoDB with the information of the received request.
The Lambda function will then return a response through the API Gateway endpoint with a message that includes the prediction ID assigned to the request and the Amazon S3 pre-signed URL.
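The following is a minimal sketch of what such a New Prediction Lambda handler might look like in Python, assuming hypothetical resource names (a predictions DynamoDB table and an input bucket) and a simplified request payload; the actual implementation and field names at CCC may differ.

```python
import json
import os
import uuid

import boto3

# Hypothetical resource names for illustration only
TABLE_NAME = os.environ.get("PREDICTIONS_TABLE", "predictions")
UPLOAD_BUCKET = os.environ.get("UPLOAD_BUCKET", "prediction-inputs")

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")


def handler(event, context):
    body = json.loads(event["body"])
    service_name = body["service"]                       # AI service requested by the client
    callback = body.get("callback", {"mode": "webhook"})  # desired notification method
    # (A check that the requested service is available is omitted for brevity.)

    # 1. Assign a unique prediction ID
    prediction_id = str(uuid.uuid4())

    # 2. Pre-signed URL the client will use to upload the input content
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": UPLOAD_BUCKET, "Key": f"inputs/{prediction_id}"},
        ExpiresIn=3600,
    )

    # 3. Track the request in DynamoDB
    dynamodb.Table(TABLE_NAME).put_item(
        Item={
            "prediction_id": prediction_id,
            "service": service_name,
            "callback": callback,
            "status": "awaiting_input",
        }
    )

    return {
        "statusCode": 200,
        "body": json.dumps({"prediction_id": prediction_id, "upload_url": upload_url}),
    }
```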
Step 2
The client securely uploads the prediction input content to an S3 bucket using the pre-signed URL generated in the previous step. Input content depends on the AI service and can be composed of images, tabular data, or a combination of both.
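As an illustration, a client-side upload to the pre-signed URL could be as simple as the following sketch; the URL, prediction ID, and file name are placeholders.

```python
import requests

# Values returned by the Step 1 response (placeholders for illustration)
prediction_id = "example-prediction-id"
upload_url = "https://prediction-inputs.s3.amazonaws.com/inputs/example-prediction-id?X-Amz-Signature=example"

# Upload the input content; the S3 event notification drives the rest of the flow
with open("claim_photo.jpg", "rb") as f:
    response = requests.put(upload_url, data=f)
response.raise_for_status()
```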
Step 3
The S3 bucket is configured to trigger an event when the user uploads the input content. This notification is sent to an Amazon SQS queue and handled by a Lambda function called Process Input. The Process Input Lambda will obtain the information related to that prediction ID from DynamoDB to get the name of the service to which the request is to be made.
This service can either be a single AI model, in which case the Process Input Lambda will make a request to the SageMaker endpoint that hosts that model (Step 3-A), or it can be an ensemble AI service in which case the Process Input Lambda will make a request to the state machine of the step functions that hosts the ensemble logic (Step 3-B).
In either option (single AI model or ensemble AI service), when the final prediction is ready, it will be stored in the appropriate S3 bucket, and the caller will be notified via the method specified in Step 1 (more details about notifications in Step 4).
Step 3-A
If the prediction ID is associated with a single AI model, the Process Input Lambda will make a request to the SageMaker endpoint that serves the model. In this system, two types of SageMaker endpoints are supported:
- Asynchronous: The Process Input Lambda makes the request to the SageMaker asynchronous endpoint. The immediate response includes the S3 location where SageMaker will save the prediction output. This request is asynchronous, following the fire-and-forget pattern, and does not block the execution flow of the Lambda function.
- Synchronous: The Process Input Lambda makes the request to the SageMaker synchronous endpoint. Because the request is synchronous, Process Input waits for the response and, once obtained, stores it in S3 in the same way that SageMaker asynchronous endpoints do.
In both cases (synchronous or asynchronous endpoints), the prediction is processed in an equivalent way, storing the output in an S3 bucket. When the asynchronous SageMaker endpoint completes a prediction, an Amazon SNS event is triggered. This behavior is also replicated for synchronous endpoints with additional logic in the Lambda function.
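The following sketch illustrates the two invocation paths from the Process Input Lambda using the boto3 SageMaker runtime client; the endpoint names, bucket names, and payload shapes are assumptions for illustration, not CCC's actual configuration.

```python
import json

import boto3

smr = boto3.client("sagemaker-runtime")

# Asynchronous endpoint: pass a reference to the input already staged in S3.
# The call returns immediately with the S3 location of the future output.
async_response = smr.invoke_endpoint_async(
    EndpointName="service-a-async",          # hypothetical endpoint name
    InputLocation="s3://prediction-inputs/inputs/example-prediction-id",
    ContentType="application/json",
)
output_location = async_response["OutputLocation"]

# Synchronous endpoint: the Lambda waits for the prediction and persists it
# itself, mirroring what the asynchronous endpoint does automatically.
sync_response = smr.invoke_endpoint(
    EndpointName="service-b-sync",           # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"features": [1.0, 2.0, 3.0]}),
)
prediction = json.loads(sync_response["Body"].read())

s3 = boto3.client("s3")
s3.put_object(
    Bucket="prediction-outputs",             # hypothetical output bucket
    Key="outputs/example-prediction-id.json",
    Body=json.dumps(prediction),
)
```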
Step 3-B
If the prediction ID is associated with an AI ensemble, the Process Input Lambda will make the request to the step function associated with that AI ensemble. As mentioned above, an AI ensemble is an architecture based on a group of AI models working together to generate a single overall prediction. The orchestration of an AI ensemble is done through a step function.
The step function has one step per AI service that comprises the ensemble. Each step invokes a Lambda function that prepares its corresponding AI service’s input using different combinations of the outputs of AI service calls from previous steps. It then makes a call to each AI service, which in this context can either be a single AI model or another AI ensemble.
The same Lambda function, called GetTransformCall, is used to handle the intermediate predictions of an AI ensemble throughout the step function, but with different input parameters for each step. This input includes the name of the AI service to be called. It also includes the mapping definition to construct the input for the specified AI service. This is done using a custom syntax that the Lambda can decode, which, in summary, is a JSON dictionary where the values are replaced with the content from previous AI predictions. The Lambda downloads these previous predictions from Amazon S3.
In each step, the GetTransformCall Lambda reads from Amazon S3 the previous outputs that are needed to build the input of the specified AI service. It will then invoke the New Prediction Lambda code previously used in Step 1 and provide the service name, callback method (“step function”), and token needed for the callback in the request payload, which is then saved in DynamoDB as a new prediction record. The Lambda also stores the created input of that stage in an S3 bucket. Depending on whether that stage is a single AI model or an AI ensemble, the Lambda makes a request to a SageMaker endpoint or a different step function that manages an AI ensemble that is a dependency of the parent ensemble.
Once the request is made, the step function enters a pending state until it receives the callback token indicating it can move to the next stage. The action of sending a callback token is performed by a Lambda function called notifications (more details in Step 4) when the intermediate prediction is ready. This process is repeated for each stage defined in the step function until the final prediction is ready.
Step 4
When a prediction is ready and stored in the S3 bucket, an SNS notification is triggered. This event can be triggered in different ways depending on the flow:
- Automatically when a SageMaker asynchronous endpoint completes a prediction.
- As the very last step of the step function.
- By Process Input or GetTransformCall Lambda when a synchronous SageMaker endpoint has returned a prediction.
For the last two cases, we create an SNS message similar to the one the asynchronous SageMaker endpoint sends automatically in the first case.
A Lambda function called notifications is subscribed to this SNS topic. The notifications Lambda will get the information related to the prediction ID from DynamoDB, update the entry with status value to “completed” or “error,” and perform the necessary action depending on the callback mode saved in the database record.
If this prediction is an intermediate prediction of an AI ensemble, as described in Step 3-B, the callback mode associated with this prediction will be “step function,” and the database record will have a callback token associated with the specific step in the step function. The notifications Lambda will call the AWS Step Functions API using the SendTaskSuccess or SendTaskFailure method. This allows the step function to continue to the next step or exit.
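A minimal sketch of that callback logic inside the notifications Lambda might look like the following; the DynamoDB field names are assumptions for illustration.

```python
import json

import boto3

sfn = boto3.client("stepfunctions")


def resume_step_function(record, output_location, succeeded):
    """Resume the waiting state machine using the callback token stored in DynamoDB.

    `record` is the DynamoDB item for the intermediate prediction; its field
    names here are illustrative.
    """
    token = record["callback_token"]
    if succeeded:
        # Pass the S3 location of the intermediate prediction to the next step
        sfn.send_task_success(
            taskToken=token,
            output=json.dumps({"output_location": output_location}),
        )
    else:
        # Mark the step as failed so the state machine can exit or handle the error
        sfn.send_task_failure(
            taskToken=token,
            error="PredictionFailed",
            cause=f"Prediction {record['prediction_id']} ended in error",
        )
```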
If the prediction is the final output of the step function and the callback mode is “Webhook” [or email, message brokers (Kafka), etc.], then the notifications Lambda will notify the client in the specified way. At any point, the user can request the status of their prediction. The request must include the prediction ID that was assigned in Step 1 and point to the correct URL within API Gateway to route the request to the Lambda function called results.
The results Lambda will make a request to DynamoDB, obtaining the status of the request and returning the information to the user. If the status of the prediction is error, then the relevant details on the failure will be included in the response. If the prediction status is success, an S3 pre-signed URL will be returned for the user to download the prediction content.
Outcomes
Preliminary performance testing results are promising and support the case for CCC to extend the implementation of this new deployment architecture.
Notable observations:
- Tests reveal strength in processing batch or concurrent requests with high throughput and a 0 percent failure rate during high-traffic scenarios.
- Message queues provide stability within the system during sudden influxes of requests until scaling triggers can provision additional compute resources. When increasing traffic by 3x, average request latency only increased by 5 percent.
- The price of stability is increased latency due to the communication overhead between the various system components. When user traffic is above the baseline threshold, the added latency can be partially mitigated by providing more compute resources if performance is a higher priority than cost.
- SageMaker asynchronous inference endpoints allow the instance count to be scaled down to zero while keeping the endpoint active to receive requests. This functionality enables deployments to continue running without incurring compute costs and to scale up from zero when needed. It is especially useful in two scenarios: service deployments in lower test environments, and services with minimal traffic that don’t require immediate processing.
Conclusion
As observed during the POC process, the innovative design jointly created by CCC and AWS provides a solid foundation for using Amazon SageMaker with other AWS managed services to host complex multi-modal AI ensembles and orchestrate inference pipelines effectively and seamlessly. By leveraging Amazon SageMaker’s out-of-the-box functionalities like Asynchronous Inference, CCC has more opportunities to focus on specialized business-critical tasks. In the spirit of CCC’s research-driven culture, this novel architecture will continue to evolve as CCC leads the way forward, alongside AWS, in unleashing powerful new AI solutions for clients.
For detailed steps on how to create, invoke, and monitor asynchronous inference endpoints, refer to the documentation, which also contains a sample notebook to help you get started. For pricing information, visit Amazon SageMaker Pricing.
For examples on using asynchronous inference with unstructured data such as computer vision and natural language processing (NLP), refer to Run computer vision inference on large videos with Amazon SageMaker asynchronous endpoints and Improve high-value research with Hugging Face and Amazon SageMaker asynchronous inference endpoints, respectively.
About the Authors
Christopher Diaz is a Lead R&D Engineer at CCC Intelligent Solutions. As a member of the R&D team, he has worked on a variety of projects ranging from ETL tooling, backend web development, collaborating with researchers to train AI models on distributed systems, and facilitating the delivery of new AI services between research and operations teams. His recent focus has been on researching cloud tooling solutions to enhance various aspects of the company’s AI model development lifecycle. In his spare time, he enjoys trying new restaurants in his hometown of Chicago and collecting as many LEGO sets as his home can fit. Christopher earned his Bachelor of Science in Computer Science from Northeastern Illinois University.
Emmy Award winner Sam Kinard is a Senior Manager of Software Engineering at CCC Intelligent Solutions. Based in Austin, Texas, he wrangles the AI Runtime Team, which is responsible for serving CCC’s AI products at high availability and large scale. In his spare time, Sam enjoys being sleep deprived because of his two wonderful children. Sam has a Bachelor of Science in Computer Science and a Bachelor of Science in Mathematics from the University of Texas at Austin.
Jaime Hidalgo is a Senior Systems Engineer at CCC Intelligent Solutions. Before joining the AI research team, he led the company’s global migration to Microservices Architecture, designing, building, and automating the infrastructure in AWS to support the deployment of cloud products and services. Currently, he builds and supports an on-premises data center cluster built for AI training and also designs and builds cloud solutions for the company’s future of AI research and deployment.
Daniel Suarez is a Data Science Engineer at CCC Intelligent Solutions. As a member of the AI Engineering team, he works on the automation and preparation of AI Models in the production, evaluation, and monitoring of metrics and other aspects of ML operations. Daniel received a Master’s in Computer Science from the Illinois Institute of Technology and a Master’s and Bachelor’s in Telecommunication Engineering from Universidad Politecnica de Madrid.
Arunprasath Shankar is a Senior AI/ML Specialist Solutions Architect with AWS, helping global customers scale their AI solutions effectively and efficiently in the cloud. In his spare time, Arun enjoys watching sci-fi movies and listening to classical music.
Justin McWhirter is a Solutions Architect Manager at AWS. He works with a team of amazing Solutions Architects who help customers have a positive experience while adopting the AWS platform. When not at work, Justin enjoys playing video games with his two boys, ice hockey, and off-roading in his Jeep.
Using large language models (LLMs) to synthesize training data
Prompt engineering enables researchers to generate customized training examples for lightweight “student” models.
Domain data trumps teacher knowledge for distilling NLU models
On natural-language-understanding tasks, student models trained only on task-specific data outperform those trained on a mix that includes generic data.
Set up Amazon SageMaker Studio with Jupyter Lab 3 using the AWS CDK
Amazon SageMaker Studio is a fully integrated development environment (IDE) for machine learning (ML) partly based on JupyterLab 3. Studio provides a web-based interface to interactively perform ML development tasks required to prepare data and build, train, and deploy ML models. In Studio, you can load data, adjust ML models, move in between steps to adjust experiments, compare results, and deploy ML models for inference.
The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework for creating AWS CloudFormation stacks through automatic CloudFormation template generation. A stack is a collection of AWS resources that can be programmatically updated, moved, or deleted. AWS CDK constructs are the building blocks of AWS CDK applications, representing the blueprint to define cloud architectures.
Setting up Studio with AWS CDK has become a streamlined process. The AWS CDK allows you to use native constructs to define and deploy Studio using infrastructure as code (IaC), including AWS Identity and Access Management (AWS IAM) permissions and desired cloud resource configurations, all in one place. This development approach can be used in combination with other common software engineering best practices such as automated code deployments, tests, and CI/CD pipelines. The AWS CDK reduces the time required to perform typical infrastructure deployment tasks while shrinking the surface area for human error through automation.
This post guides you through the steps to get started with setting up and deploying Studio to standardize ML model development and collaboration with fellow ML engineers and ML scientists. All examples in the post are written in the Python programming language. However, the AWS CDK offers built-in support for multiple other programming languages like JavaScript, Java and C#.
Prerequisites
To get started, the following prerequisites apply:
- The AWS Command Line Interface (AWS CLI) is installed
- The AWS CDK is installed
- You have permissions to create and deploy AWS CDK and AWS CloudFormation resources as defined in the scripts outlined in the post
- Python 3+
Clone the GitHub repository
First, let’s clone the GitHub repository.
When the repository is successfully pulled, you may inspect the cdk directory containing the following resources:
- cdk – Contains the main cdk resources
- app.py – Where the AWS CDK stack is defined
- cdk.json – Contains metadata and feature flags
AWS CDK scripts
The two main files we want to look at in the cdk subdirectory are sagemaker_studio_construct.py and sagemaker_studio_stack.py. Let’s look at each file in more detail.
Studio construct file
The Studio construct is defined in the sagemaker_studio_construct.py file.
The Studio construct takes in the virtual private cloud (VPC), listed users, AWS Region, and underlying default instance type as parameters. This AWS CDK construct serves the following functions:
- Creates the Studio domain (SageMakerStudioDomain)
- Sets the IAM role sagemaker_studio_execution_role with AmazonSageMakerFullAccess permissions required to create resources. Permissions need to be scoped down further to follow the least privilege principle for improved security.
- Sets Jupyter server app settings – takes in JUPYTER_SERVER_APP_IMAGE_NAME, defining the jupyter-server-3 container image to be used.
- Sets kernel gateway app settings – takes in KERNEL_GATEWAY_APP_IMAGE_NAME, defining the datascience-2.0 container image to be used.
- Creates a user profile for each listed user
The following code snippet shows the relevant Studio domain AWS CloudFormation resources defined in AWS CDK:
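A minimal sketch of such a construct is shown below, using AWS CDK v2 for Python; the class, property, and resource names are illustrative and may differ from the repository’s actual code.

```python
from aws_cdk import aws_iam as iam
from aws_cdk import aws_sagemaker as sagemaker
from constructs import Construct


class SageMakerStudio(Construct):
    """Illustrative Studio construct; the actual class and parameters may differ."""

    def __init__(self, scope: Construct, construct_id: str, *,
                 vpc_id: str, subnet_ids: list, user_names: list) -> None:
        super().__init__(scope, construct_id)

        # Execution role with AmazonSageMakerFullAccess (scope down for production use)
        self.execution_role = iam.Role(
            self, "SageMakerStudioExecutionRole",
            assumed_by=iam.ServicePrincipal("sagemaker.amazonaws.com"),
            managed_policies=[
                iam.ManagedPolicy.from_aws_managed_policy_name("AmazonSageMakerFullAccess")
            ],
        )

        # The Studio domain itself, attached to the provided VPC and subnets
        self.domain = sagemaker.CfnDomain(
            self, "SageMakerStudioDomain",
            auth_mode="IAM",
            domain_name="sagemaker-studio-domain",
            vpc_id=vpc_id,
            subnet_ids=subnet_ids,
            default_user_settings=sagemaker.CfnDomain.UserSettingsProperty(
                execution_role=self.execution_role.role_arn,
            ),
        )
```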
The following code snippet shows the user profiles created from AWS CloudFormation resources:
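Continuing the sketch above, the per-user profiles could be created as follows (again, names are illustrative):

```python
        # Continuing inside __init__: one user profile per listed user
        for user_name in user_names:
            sagemaker.CfnUserProfile(
                self, f"UserProfile-{user_name}",
                domain_id=self.domain.attr_domain_id,
                user_profile_name=user_name,
                user_settings=sagemaker.CfnUserProfile.UserSettingsProperty(
                    execution_role=self.execution_role.role_arn,
                ),
            )
```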
Studio stack file
After the construct has been defined, you can add it by creating an instance of the class and passing the required arguments inside the stack. The stack creates the AWS CloudFormation resources as part of one coherent deployment. This means that if at least one cloud resource fails to be created, the CloudFormation stack rolls back any changes performed. The following code snippet shows how the Studio construct is instantiated inside the Studio stack:
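A minimal sketch of such an instantiation, reusing the illustrative SageMakerStudio construct from the previous sketches, might look like this; the import path, VPC configuration, and user names are assumptions.

```python
from aws_cdk import Stack
from aws_cdk import aws_ec2 as ec2
from constructs import Construct

from cdk.sagemaker_studio_construct import SageMakerStudio  # import path is illustrative


class SageMakerStudioStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # A VPC for the Studio domain (the repository may instead look up an existing VPC)
        vpc = ec2.Vpc(self, "SageMakerStudioVpc", max_azs=2)

        SageMakerStudio(
            self, "SageMakerStudio",
            vpc_id=vpc.vpc_id,
            subnet_ids=[subnet.subnet_id for subnet in vpc.private_subnets],
            user_names=["data-scientist-1", "data-scientist-2"],  # example user names
        )
```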
Deploy the AWS CDK stack
To deploy your AWS CDK stack, run the following commands from the project’s root directory within your terminal window:
aws configure
pip3 install -r requirements.txt
cdk bootstrap --app "python3 -m cdk.app"
cdk deploy --app "python3 -m cdk.app"
Review the resources the AWS CDK creates in your AWS account and select yes when prompted to deploy the stack. Wait for your stack deployment to finish. This typically takes less than 5 minutes; however, adding more resources will prolong deployment time. You can also check the deployment status on the AWS CloudFormation console.
When the stack has been successfully deployed, check its information by going to the Studio Control Panel. You should see the SageMaker Studio user profile you created.
If you redeploy the stack, it checks for changes and performs only the necessary cloud resource updates. For example, this can be used to add users or change their permissions without having to recreate all of the defined cloud resources.
Cleanup
To delete a stack, complete the following steps:
- On the AWS CloudFormation console, choose Stacks in the navigation pane.
- Open the stack you want to delete.
- In the stack details pane, choose Delete.
- Choose Delete stack when prompted.
AWS CloudFormation will delete the resources created when the stack was deployed. This may take some time depending on the amount of resources created.
If you encounter any issues going through these cleanup steps, you may need to manually delete the Studio domain first before repeating the steps in this section.
Conclusion
In this post, we showed how to use AWS cloud-native IaC resources to build an easily reusable template for Studio deployments. SageMaker Studio is a fully integrated web-based IDE that provides a visual interface for ML development tasks based on JupyterLab 3. With AWS CDK stacks, we were able to define constructs for building out cloud components that can be easily modified, edited, or deleted by making changes to the underlying CloudFormation stack.
For more information about Amazon SageMaker Studio, see Amazon SageMaker Studio.
About the Authors
Cory Hairston is a Software Engineer at the Amazon ML Solutions Lab. He is ardent about learning new technologies and leveraging that information to build reusable software solutions. He is an avid power-lifter and spends his free time making digital art.
Marcelo Aberle is an ML Engineer in the AWS AI organization. He is leading MLOps efforts at the Amazon ML Solutions Lab, helping customers design and implement scalable ML systems. His mission is to guide customers on their enterprise ML journey and accelerate their ML path to production.
Yash Shah is a Science Manager in the Amazon ML Solutions Lab. He and his team of applied scientists and machine learning engineers work on a range of machine learning use cases from healthcare, sports, automotive and manufacturing.
Churn prediction using multimodality of text and tabular features with Amazon SageMaker Jumpstart
Amazon SageMaker JumpStart is the Machine Learning (ML) hub of SageMaker providing pre-trained, publicly available models for a wide range of problem types to help you get started with machine learning.
Understanding customer behavior is top of mind for every business today. Gaining insights into why and how customers buy can help grow revenue. Customer churn is a problem faced by a wide range of companies, from telecommunications to banking, where customers are typically lost to competitors. It’s in a company’s best interest to retain existing customers instead of acquiring new customers, because it usually costs significantly more to attract new customers. When trying to retain customers, companies often focus their efforts on customers who are more likely to leave. User behavior and customer support chat logs can contain valuable indicators on the likelihood of a customer ending the service. In this solution, we train and deploy a churn prediction model that uses a state-of-the-art natural language processing (NLP) model to find useful signals in text. In addition to textual inputs, this model uses traditional structured data inputs such as numerical and categorical fields.
Multimodality is a multi-disciplinary research field that addresses some of the original goals of artificial intelligence by integrating and modeling multiple modalities. This post aims to build a model that can process and relate information from multiple modalities such as tabular and textual features.
We show you how to train, deploy and use a churn prediction model that has processed numerical, categorical, and textual features to make its prediction. Although we dive deep into a churn prediction use case in this post, you can use this solution as a template to generalize fine-tuning pre-trained models with your own dataset, and subsequently run hyperparameter optimization (HPO) to improve accuracy. You can even replace the example dataset with your own and run it end to end to solve your own use cases. The solution outlined in the post is available on GitHub.
JumpStart solution templates
Amazon SageMaker JumpStart provides one-click, end-to-end solutions for many common ML use cases. Explore the following use cases for more information on available solution templates:
- Demand forecasting
- Credit rating prediction
- Fraud detection
- Computer vision
- Extract and analyze data from documents
- Predictive maintenance
- Churn prediction
- Personalized recommendations
- Reinforcement learning
- Healthcare and life sciences
- Financial pricing
The JumpStart solution templates cover a variety of use cases, under each of which several different solution templates are offered (for example, the churn prediction solution used in this post is offered under the “Churn prediction” use case).
Choose the solution template that best fits your use case from the JumpStart landing page. For more information on specific solutions under each use case and how to launch a JumpStart solution, see Solution Templates.
Solution overview
The following figure demonstrates how you can use this solution with Amazon SageMaker components. The SageMaker training jobs are used to train the various NLP models, and SageMaker endpoints are used to deploy the models in each stage. We use Amazon Simple Storage Service (Amazon S3) alongside SageMaker to store the training data and model artifacts, and Amazon CloudWatch to log training and endpoint outputs.
We approach solving the churn prediction problem with the following steps:
- Data exploration to prepare the data to be ML ready.
- Train a multimodal model with a Hugging Face sentence transformer and Scikit-learn random forest classifier.
- Further improve the model performance with HPO using SageMaker automatic model tuning.
- Train two AutoGluon multimodal models: an AutoGluon multimodal weighted/stacked ensemble model, and an AutoGluon multimodal fusion model.
- Evaluate and compare the model performances on the holdout test data.
Prerequisites
To try out the solution in your own account, make sure that you have the following in place:
- An AWS account. If you don’t have an account, you can sign up for one.
- The solution outlined in the post is part of SageMaker JumpStart. To run this JumpStart solution and have the infrastructure deploy to your AWS account, you must create an active Amazon SageMaker Studio instance (see Onboard to Amazon SageMaker Studio). When your Studio instance is ready, use the instructions in JumpStart to launch the solution.
- When running this notebook on Studio, you should make sure the Python 3 (PyTorch 1.10 Python 3.8 CPU Optimized) image/kernel is used.
You can install the required packages as outlined in the solution to run this notebook:
Open the churn prediction use case
On the Studio console, choose Solutions, models, example notebooks under Quick start solutions in the navigation pane. Navigate to the Churn Prediction with Text solution in JumpStart.
Now we can take a closer look at some of the assets that are included in this solution.
Data exploration
First let’s download the test, validate, and train dataset from the source S3 bucket and upload it to our S3 bucket. The following screenshot shows us 10 observations of the training data.
Let’s begin exploring the train and validation dataset.
As you can see, we have different features such as CustServ Calls, Day Charge, and Day Calls that we use to predict the target column y (whether the customer left the service).

y is known as the target attribute: the attribute that we want the ML model to predict. Because the target attribute is binary, our model performs binary prediction, also known as binary classification.
There are 21 features, including the target variable. The training and validation datasets contain 43,000 and 5,000 examples, respectively.
The following screenshot shows the summary statistics of the training dataset.
We have explored the dataset and split it into training, validation, and test sets. The training and validation set is used for training and HPO. The test set is used as the holdout set for model performance evaluation. We now carry out feature engineering steps and then fit the model.
Fit a multimodal model with a Hugging Face sentence transformer and Scikit-learn random forest classifier
The model training consists of two components: a feature engineering step that processes numerical, categorical, and text features, and a model fitting step that fits the transformed features into a Scikit-learn random forest classifier.
For the feature engineering, we complete the following steps:
- Fill in the missing values for numerical features.
- Encode categorical features into one-hot values, where the missing values are counted as one of the categories for each feature.
- Use a Hugging Face sentence transformer to encode the text feature into an X-dimensional dense vector, where the value of X depends on the particular sentence transformer.
We choose the top three most downloaded sentence transformer models and use them in the following model fitting and HPO. Specifically, we use all-MiniLM-L6-v2, multi-qa-mpnet-base-dot-v1, and paraphrase-MiniLM-L6-v2. For hyperparameters of the random forest classifier, refer to the GitHub repo.
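To make the flow concrete, the following sketch shows the core idea, encoding the text feature with a sentence transformer and fitting a random forest on the concatenated features, using synthetic stand-in data; it is not the solution's actual training script.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for the preprocessed inputs
X_structured = np.random.rand(100, 20)                # imputed numerical + one-hot categorical features
texts = ["customer asked to cancel the plan"] * 100   # customer support chat logs
y = np.random.randint(0, 2, size=100)                 # churn labels

# Encode the text feature into a dense vector (384 dimensions for all-MiniLM-L6-v2)
encoder = SentenceTransformer("all-MiniLM-L6-v2")
text_embeddings = encoder.encode(texts)

# Concatenate structured features with text embeddings and fit the classifier
X = np.hstack([X_structured, text_embeddings])
clf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=0)
clf.fit(X, y)

# Probability of churn for each example
churn_probability = clf.predict_proba(X)[:, 1]
```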
The following figure depicts the model architecture diagram.
There are many hyperparameters you can tune, such as n-estimators, max-depth, and bootstrap. For more details, refer to the GitHub repo.
For demonstration purposes, we only use the numerical features CustServ Calls and Account Length, the categorical features plan and limit, and the text feature text to fit the model. Multiple features should be separated by commas.
We deploy the model after training is complete:
When calling our new endpoint from the notebook, we use a SageMaker SDK Predictor. A Predictor is used to send data to an endpoint (as part of a request) and interpret the response. JSON is used as the format for both input data and output response because it’s a standard endpoint format and the endpoint response can contain nested data structures.
With our model successfully deployed and our predictor configured, we can try out the churn prediction model on an example input:
The following code shows the response (probability of churn) from querying the endpoint:
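A sketch of that interaction with the SageMaker Python SDK follows; the endpoint name, payload fields, and response shape are illustrative and depend on the deployed inference script.

```python
from sagemaker.deserializers import JSONDeserializer
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer

predictor = Predictor(
    endpoint_name="churn-prediction-endpoint",  # hypothetical endpoint name
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

# Example payload; field names follow the features discussed above
sample = {
    "CustServ Calls": 4,
    "Account Length": 120,
    "plan": "international",
    "limit": "unlimited",
    "text": "I have been waiting on hold for an hour and still have no resolution.",
}

response = predictor.predict(sample)
print(response)  # e.g. a churn probability; the exact shape depends on the inference script
```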
Note that the probability returned by this model has not been calibrated. When the model gives a probability of churn of 20%, for example, this doesn’t necessarily mean that 20% of the customers who receive that score actually churn. Calibration is a useful property in certain circumstances, but isn’t required in cases where discrimination between cases of churn and non-churn is sufficient. CalibratedClassifierCV from Scikit-learn can be used to calibrate a model.
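For reference, a minimal calibration sketch with Scikit-learn, using synthetic stand-ins for the transformed features, might look like this:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for the transformed training features and churn labels
X_train = np.random.rand(500, 404)   # structured features + 384-dim text embedding
y_train = np.random.randint(0, 2, 500)

# Wrap the classifier so predicted probabilities better reflect observed churn rates
base_clf = RandomForestClassifier(n_estimators=100, random_state=0)
calibrated_clf = CalibratedClassifierCV(base_clf, method="isotonic", cv=5)
calibrated_clf.fit(X_train, y_train)

calibrated_probs = calibrated_clf.predict_proba(X_train)[:, 1]
```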
Now we query the endpoint using the hold-out test data, which consists of 1,939 examples. The following table summarizes the evaluation results for our multimodal model with a Hugging Face sentence transformer and Scikit-learn random forest classifier.
Metric | BERT + Random Forest |
Accuracy | 0.77463 |
ROC AUC | 0.75905 |
Model performance is dependent on hyperparameter configurations. Training a model with one set of hyperparameter configurations will not guarantee an optimal model. As a result, we run the HPO process in the following section to further improve model performance.
Fit a multimodal model with HPO
In this section, we further improve the model performance by adding HPO tuning with SageMaker automatic model tuning. SageMaker automatic model tuning, also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset using the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that result in a model that performs the best, as measured by a metric that you choose. The best model and its corresponding hyperparameters are selected on the validation data. Next, the best model is evaluated on the hold-out test data, which is the same test data we created in the previous section. Finally, we show that the performance of the model trained with HPO is significantly better than the one trained without HPO.
The following are the static hyperparameters we don’t tune and the dynamic hyperparameters we want to tune, along with their search ranges:
We define the objective metric name, metric definition (with regex pattern), and objective type for the tuning job.
First, we set the objective as the ROC AUC score on the validation data (roc auc score on validation data) and defined metrics for the tuning job by specifying the objective metric name and a regular expression (regex). The regular expression is used to match the algorithm’s log output and capture the numeric values of metrics.
Next, we specify the hyperparameter ranges to select the best hyperparameter values from. We set the total number of tuning jobs to 10 and distribute these jobs across five Amazon Elastic Compute Cloud (Amazon EC2) instances to run tuning jobs in parallel.
Finally, we pass those values to instantiate a SageMaker Estimator object, similar to what we did in the previous training step. Instead of calling the fit function of the Estimator object, we pass the Estimator object in as a parameter to the HyperparameterTuner constructor and call the fit function of it to launch tuning jobs:
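The following sketch outlines that pattern with the SageMaker Python SDK; the image URI, role, S3 paths, hyperparameter names, ranges, and log regex are placeholders for illustration rather than the solution's exact values.

```python
from sagemaker.estimator import Estimator
from sagemaker.tuner import (
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
    IntegerParameter,
)

# Placeholders -- substitute your own values
training_image_uri = "<training-image-uri>"
role = "<sagemaker-execution-role-arn>"
train_s3_uri = "s3://<bucket>/churn/train/"
validation_s3_uri = "s3://<bucket>/churn/validation/"

# Static hyperparameters stay fixed on the estimator
estimator = Estimator(
    image_uri=training_image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    hyperparameters={"sentence-transformer": "all-MiniLM-L6-v2"},
)

# Dynamic hyperparameters and their search ranges (illustrative)
hyperparameter_ranges = {
    "n-estimators": IntegerParameter(50, 400),
    "max-depth": IntegerParameter(3, 20),
    "bootstrap": CategoricalParameter(["True", "False"]),
    "min-samples-leaf": ContinuousParameter(0.0001, 0.01),
}

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="roc auc score on validation data",
    metric_definitions=[{
        "Name": "roc auc score on validation data",
        "Regex": "roc auc score on validation data: ([0-9\\.]+)",  # assumed log format
    }],
    hyperparameter_ranges=hyperparameter_ranges,
    objective_type="Maximize",
    max_jobs=10,          # total training jobs
    max_parallel_jobs=5,  # jobs run concurrently on separate instances
)

tuner.fit({"train": train_s3_uri, "validation": validation_s3_uri})
```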
When the tuning job is complete, we can generate the summary table of all the tuning jobs.
After the tuning jobs are complete, we deploy the model that gives the best evaluation metric score on the validation dataset, perform inference on the same hold-out test dataset we did in the previous section, and compute evaluation metrics.
Metric | BERT + Random Forest | BERT + Random Forest with HPO |
Accuracy | 0.77463 | 0.9278 |
ROC AUC | 0.75905 | 0.79861 |
We can see that running HPO with SageMaker automatic model tuning significantly improves the model performance.

In addition to HPO, model performance also depends on the algorithm. It’s important to train multiple state-of-the-art algorithms, compare their performance on the same hold-out test data, and pick the optimal one. Therefore, we train two more AutoGluon multimodal models in the following sections.
Fit an AutoGluon multimodal weighted/stacked ensemble model
There are two types of AutoGluon multimodality:
- Train multiple tabular models as well as the TextPredictor model (utilizing the TextPredictor model inside of TabularPredictor), and then combine them via either a weighted ensemble or stacked ensemble, as explained in AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data
- Fuse multiple neural network models directly and handle raw text (these models are also capable of handling additional numerical and categorical columns)
We train a multimodal weighted or stacked ensemble model first in this section, and train a fusion neural network model in the next section.
First, we retrieve the AutoGluon training image:
Next, we pass in hyperparameters. Unlike existing AutoML frameworks that primarily focus on model or hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers. Therefore, HPO is usually not required for AutoGluon ensemble models.
Finally, we create a SageMaker Estimator and call estimator.fit() to start a training job:
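A sketch of those steps with the SageMaker Python SDK follows; the AutoGluon image version, Python version, hyperparameters, role, and S3 paths are illustrative and may differ from the solution’s notebook and your SDK release.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "<sagemaker-execution-role-arn>"  # placeholder

# Retrieve a SageMaker-provided AutoGluon training image (version details are illustrative)
train_image_uri = image_uris.retrieve(
    framework="autogluon",
    region=session.boto_region_name,
    version="0.5",
    py_version="py38",
    image_scope="training",
    instance_type="ml.m5.2xlarge",
)

estimator = Estimator(
    image_uri=train_image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    hyperparameters={"label": "y", "presets": "best_quality"},  # illustrative
    sagemaker_session=session,
)

estimator.fit({"training": "s3://<bucket>/churn/train/"})  # placeholder S3 path
```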
After training is complete, we retrieve the AutoGluon inference image and deploy the model:
After we deploy the endpoints, we query the endpoint using the same test set and compute evaluation metrics. In the following table, we can see AutoGluon multimodal ensemble improves about 3% in ROC AUC compared with the BERT sentence transformer and random forest with HPO.
Metric | BERT + Random Forest | BERT + Random Forest with HPO | AutoGluon Multimodal Ensemble |
Accuracy | 0.77463 | 0.9278 | 0.92625 |
ROC AUC | 0.75905 | 0.79861 | 0.82918 |
Fit an AutoGluon multimodal fusion model
The following diagram illustrates the architecture of the model. For details, see AutoMM for Text + Tabular – Quick Start.
Internally, we use different networks to encode the text columns, categorical columns, and numerical columns. The features generated by the individual networks are aggregated by a late-fusion aggregator. The aggregator can output either logits or score predictions.

Here, we use a pretrained NLP backbone to extract the text features and then use two other towers to extract features from the categorical and numerical columns.
In addition, to deal with multiple text fields, we separate these fields with the [SEP] token and alternate 0s and 1s as the segment IDs, as shown in the following diagram.
Similarly, we follow instructions in the previous section to train and deploy the AutoGluon multimodal fusion model:
The following table summarizes the evaluation results for the AutoGluon multimodal fusion model, along with those of three models that we evaluated in the previous sections. We can see the AutoGluon multimodal ensemble and multimodal fusion models achieve the best performance.
Metrics | BERT + Random Forest | BERT + Random Forest with HPO | AutoGluon Multimodal Ensemble | AutoGluon Multimodal Fusion |
Accuracy | 0.77463 | 0.9278 | 0.92625 | 0.9247 |
ROC AUC | 0.75905 | 0.79861 | 0.82918 | 0.81115 |
Note that the results and the relative performance of these models depend on the dataset you use for training. These results are representative; even though certain algorithms tend to perform better for relevant reasons, the balance in performance might change given a different data distribution. You can replace the example dataset with your own data to determine which model works best for you.
Demo notebook
You can use the demo notebook to send example data to already-deployed model endpoints. The demo notebook quickly allows you to get hands-on experience by querying the example data. After you launch the Churn Prediction with Text solution, open the demo notebook by choosing Use Endpoint in Notebook.
Clean up
When you’ve finished with this solution, make sure that you delete all unwanted AWS resources by choosing Delete all resources.
Note that you need to manually delete any additional resources that you may have created in this notebook.
Conclusion
In this post, we showed how you can use SageMaker JumpStart to predict churn using multimodality of text and tabular features.
If you’re interested in learning more about customer churn models, check out the following posts:
- Analyze customer churn probability using call transcription and customer profiles with Amazon SageMaker
- Preventing customer churn by optimizing incentive programs using stochastic programming
- Build, tune, and deploy an end-to-end churn prediction model using Amazon SageMaker Pipelines
About the Authors
Dr. Xin Huang is an Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A journal.
Rajakumar Sampathkumar is a Principal Technical Account Manager at AWS, providing customers guidance on business-technology alignment and supporting the reinvention of their cloud operation models and processes. He is passionate about cloud and machine learning. Raj is also a machine learning specialist and works with AWS customers to design, deploy, and manage their AWS workloads and architectures.
Leveraging artificial intelligence and machine learning at Parsons with AWS DeepRacer
This post is co-written with Jennifer Bergstrom, Sr. Technical Director, ParsonsX.
Parsons Corporation (NYSE:PSN) is a leading disruptive technology company in critical infrastructure, national defense, space, intelligence, and security markets providing solutions across the globe to help make the world safer, healthier, and more connected. Parsons provides services and capabilities across cybersecurity, missile defense, space ground station technology, transportation, environmental remediation, and water/wastewater treatment to name a few.
Parsons is a builder community and invests heavily in employee development programs and upskilling. With programs such as ParsonsX, Parsons’s digital transformation initiative, and The Guild, an employee-focused community, Parsons strives to be an employer of choice and engages its employees in career development programs year-round to create a workforce of the future.
In this post, we show you how Parsons is building its next generation workforce by using machine learning (ML) and artificial intelligence (AI) with AWS DeepRacer in a fun and collaborative way.
As Parsons’ footprint moves into the cloud, their leadership recognized the need for a change in culture and a fundamental requirement to educate their engineering task force on the new cloud operating model, tools, and technologies. Parsons is observing industry trends that make it imperative to incorporate AI and ML capabilities into the organization’s strategic and tactical decision-making processes. To best serve the customer’s needs, Parsons must upskill its workforce across the board in AI/ML tooling and how to scale it in an enterprise organization. Parsons is on a mission to make AI/ML a foundation of business across the company.
Parsons chose AWS DeepRacer because it’s a fun, interactive, and exciting challenge that appealed to a broad range of its employees and didn’t require a significant level of expertise to compete. Parsons found that AWS has many dedicated AWS DeepRacer experts in the field who would help plan, set up, and run a series of AI/ML events and challenges. Parsons realized that the success of this event would be driven by the efficient mechanisms and processes the AWS DeepRacer community has in place.
Parsons’ goal was to upskill their employees in an enjoyable and competitive way, with virtual leagues among peer groups and an in-person event for the top racers. The education initiative, in partnership with AWS, comprised four phases.
First, Parsons hosted a virtual live workshop with AWS experts in the AI/ML and DeepRacer community. The workshop taught the basics of reinforcement learning, reward functions, hyperparameter tuning, and accessing the AWS DeepRacer console to train and submit a model.
In the next phase, they hosted a virtual community league race for all participating Parsons employees. Models were optimized, submitted, and raced, and winners were announced at the end of racing. Participants in the virtual leagues included individual contributors and frontline managers from various job roles across Parsons, including civil engineers, bridge engineers, systems and software engineers, data analysts, project managers, and program managers. Joining them as participants in the league were business unit presidents, SVPs, VPs, senior directors, and directors.
In the third phase, an in-person league was held in Maryland. The top four participants from the virtual leagues saw their models loaded into and raced in physical AWS DeepRacer cars on a track built onsite. The top four competitors at this event included a market CTO, a project signaling engineer, a project engineer, and an engineer intern.
The fourth and final phase of the event had each of the four competitors provide a technical walkthrough of the techniques used to develop, train, and test their models. Through AWS DeepRacer, Parsons not only showcased the impact this event was able to make globally across all the divisions, but also that they were able to create a memorable experience for participants.
Over 500 employees registered from various business units and service organizations across Parsons worldwide right after the AWS DeepRacer challenge was announced internally. The AWS DeepRacer workshop saw unprecedented interest, with over 470 Parsons employees joining the initial workshop. The virtual workshop generated significant engagement: 245 active users developed more than 1,500 models and spent over 500 hours training them on the AWS DeepRacer console. The virtual league was a resounding success, with 185 racers from across the country participating and submitting 1,415 models into the competition!
The virtual AWS DeepRacer league at Parsons provided a fun and inviting environment with lots of iterations, learning, and experimentation. Parsons’ Market CTO, John Statuli, who was one of the top four contenders at the race said, “It was a lot of fun to participate at the AWS DeepRacer event. I have not done any programming in a long time, but the combination of the AWS DeepRacer virtual workshop and the AWS DeepRacer program provided an easy way by which I could participate and compete for the top spot.”
At the final race held in Maryland, Parsons broadcasted a companywide virtual event that showcased a tough competition between their top four competitors from three different business units. Parsons top leadership joined the event, including CTO Rico Lorenzo, D&I CTO Ryan Gabrielle, President of Connected Communities Peter Torrellas, CDO Tim LaChapelle, and ParsonsX Sr. Director Jennifer Bergstrom. At the event, Parsons hosted a webinar with over 100 attendees and a winner’s walkthrough of their models.
With such an overwhelming response from employees across the globe and an interest in AI/ML learning, Parsons is now planning several additional events to continue growing their employees’ knowledgebase. To continue to upskill and educate their workforce, Parsons intends to run more AWS DeepRacer events and workshops focused on object avoidance, an Amazon SageMaker deep dive workshop, and an AWS DeepRacer head-to-head race. Parsons continues to engage with AWS on AI/ML services to build world-class solutions in the fields of critical infrastructure, national defense, space, and cybersecurity.
Whether your organization is new to machine learning or ready to build on existing skills, AWS DeepRacer can help you get there. To learn more, visit Getting Started with AWS DeepRacer.
About the Authors
Jenn Bergstrom is a Parsons Fellow and Senior Technical Director. She is passionate about innovative technological solutions and strategies and enjoys designing well-architected cloud solutions for programs across all of Parsons’s domains. When not driving innovation at Parsons, she loves exploring the world with her husband and daughters, and mentoring diverse individuals transitioning into the tech industry. You can reach her on LinkedIn.
Deval Parikh is a Sr. Enterprise Solutions Architect at Amazon Web Services. She is passionate about helping enterprises reimagine their businesses in the cloud by leading them with strategic architectural guidance and building prototypes as an AWS expert. She is also an active board member of the Women at AWS affinity group where she oversees university programs to educate students on cloud technology and careers. She is also an avid hiker and a painter of oil on canvas. You can see many of her paintings at www.devalparikh.com. You can reach her on LinkedIn.
Teaching speech recognizers new words — without retraining
Using lists of rare or out-of-vocabulary words to bias connectionist temporal classification models enables personalization.
How Thomson Reuters built an AI platform using Amazon SageMaker to accelerate delivery of ML projects
This post is co-written by Ramdev Wudali and Kiran Mantripragada from Thomson Reuters.
In 1992, Thomson Reuters (TR) released its first AI legal research service, WIN (Westlaw Is Natural), an innovation at the time, as most search engines only supported Boolean terms and connectors. Since then, TR has achieved many more milestones as its AI products and services are continuously growing in number and variety, supporting legal, tax, accounting, compliance, and news service professionals worldwide, with billions of machine learning (ML) insights generated every year.
With this tremendous increase in AI services, the next milestone for TR was to streamline innovation and facilitate collaboration by standardizing the building and reuse of AI solutions across business functions and AI practitioner personas, while ensuring adherence to the following enterprise best practices:
- Automate and standardize the repetitive undifferentiated engineering effort
- Ensure the required isolation and control of sensitive data according to common governance standards
- Provide easy access to scalable computing resources
To fulfill these requirements, TR built the Enterprise AI platform around the following five pillars: a data service, experimentation workspace, central model registry, model deployment service, and model monitoring.
In this post, we discuss how TR and AWS collaborated to develop TR’s first ever Enterprise AI Platform, a web-based tool that would provide capabilities ranging from ML experimentation, training, a central model registry, model deployment, and model monitoring. All these capabilities are built to address TR’s ever-evolving security standards and provide simple, secure, and compliant services to end-users. We also share how TR enabled monitoring and governance for ML models created across different business units with a single pane of glass.
The challenges
Historically at TR, ML has been a capability reserved for teams with advanced data scientists and engineers. Teams with highly skilled resources were able to implement complex ML processes as per their needs, but quickly became very siloed. Siloed approaches provided no visibility or governance over extremely critical predictions used in decision-making.
TR business teams have vast domain knowledge; however, the technical skills and heavy engineering effort required for ML make it difficult for them to apply their deep expertise to business problems with the power of ML. TR wants to democratize these skills, making ML accessible to more people within the organization.
Different teams in TR follow their own practices and methodologies. TR wants to offer its users capabilities that span the ML lifecycle to accelerate the delivery of ML projects by enabling teams to focus on business goals rather than on repetitive, undifferentiated engineering effort.
Additionally, regulations around data and ethical AI continue to evolve, mandating common governance standards across TR’s AI solutions.
Solution overview
TR’s Enterprise AI Platform was envisioned to provide simple and standardized services to different personas, offering capabilities for every stage of the ML lifecycle. TR has identified five major categories that modularize all TR’s requirements:
- Data service – To enable easy and secured access to enterprise data assets
- Experimentation workspace – To provide capabilities to experiment and train ML models
- Central model registry – An enterprise catalog for models built across different business units
- Model deployment service – To provide various inference deployment options following TR’s enterprise CI/CD practices
- Model monitoring services – To provide capabilities to monitor data and model bias and drifts
As shown in the following diagram, these microservices are built with a few key principles in mind:
- Remove the undifferentiated engineering effort from users
- Provide the required capabilities at the click of a button
- Secure and govern all capabilities as per TR’s enterprise standards
- Bring a single pane of glass for ML activities
TR’s AI Platform microservices are built with Amazon SageMaker as the core engine, AWS serverless components for workflows, and AWS DevOps services for CI/CD practices. SageMaker Studio is used for experimentation and training, and the SageMaker model registry is used to register models. The central model registry comprises both the SageMaker model registry and an Amazon DynamoDB table. SageMaker hosting services are used to deploy models, while SageMaker Model Monitor and SageMaker Clarify are used to monitor models for drift, bias, and explainability, along with custom metric calculations.
The following sections describe these services in detail.
Data service
A traditional ML project lifecycle starts with finding data. In general, data scientists spend 60% or more of their time finding the right data when they need it. Just like every organization, TR has multiple data stores that serve as a single point of truth for different data domains. TR identified two key enterprise data stores that provide data for most of its ML use cases: an object store and a relational data store. TR built an AI Platform data service to seamlessly provide access to both data stores from users’ experimentation workspaces and remove the burden of navigating complex processes to acquire data on their own. TR’s AI Platform follows all the compliance requirements and best practices defined by the Data and Model Governance team. This includes a mandatory Data Impact Assessment that helps ML practitioners understand and follow the ethical and appropriate use of data, with formal approval processes to ensure appropriate access to the data. Core to this service, as well as all platform services, is security and compliance according to the best practices determined by TR and the industry.
Amazon Simple Storage Service (Amazon S3) object storage acts as a content data lake. TR built processes to securely access data from the content data lake into users’ experimentation workspaces while maintaining the required authorization and auditability. Snowflake is used as the primary enterprise relational data store. Upon user request, and based on approval from the data owner, the AI Platform data service provides a snapshot of the data directly into the user’s experimentation workspace.
Accessing data from various sources is a technical problem that can be easily solved. The harder problem TR solved is building approval workflows that automate identifying the data owner, sending an access request, making sure the data owner is notified of the pending request, and, based on the approval status, taking action to provide the data to the requester. All the events throughout this process are tracked and logged for auditability and compliance.
As shown in the following diagram, TR uses AWS Step Functions to orchestrate the workflow and AWS Lambda to run the functionality. Amazon API Gateway is used to expose the functionality with an API endpoint to be consumed from their web portal.
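The integration can be as simple as an API Gateway-backed Lambda function that starts a Step Functions execution for each data access request. The following is a minimal sketch of that pattern; the state machine ARN and request fields are placeholders, not TR’s actual implementation.

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Hypothetical state machine ARN for the data access approval workflow
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:data-access-approval"


def lambda_handler(event, context):
    """Invoked by API Gateway when a user requests a dataset from the portal."""
    body = json.loads(event.get("body", "{}"))
    execution = sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=json.dumps({
            "datasetId": body.get("datasetId"),
            "requester": body.get("requester"),
            "workspaceId": body.get("workspaceId"),
        }),
    )
    # Return the execution ARN so the portal can poll the request status
    return {
        "statusCode": 202,
        "body": json.dumps({"executionArn": execution["executionArn"]}),
    }
```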
Model experimentation and development
An essential capability for standardizing the ML lifecycle is an environment that allows data scientists to experiment with different ML frameworks and data sizes. Enabling such a secure, compliant environment in the cloud within minutes relieves data scientists from the burden of handling cloud infrastructure, networking requirements, and security measures, so they can focus instead on the data science problem.
TR built an experimentation workspace that offers access to services such as AWS Glue, Amazon EMR, and SageMaker Studio to enable data processing and ML capabilities while adhering to enterprise cloud security standards and the required account isolation for every business unit. TR encountered the following challenges while implementing the solution:
- Orchestration early on wasn’t fully automated and involved several manual steps, and tracking down where problems occurred wasn’t easy. TR overcame this by orchestrating the workflows with Step Functions. With Step Functions, building complex workflows, managing state, and handling errors became much easier.
- Defining a proper AWS Identity and Access Management (IAM) role for the experimentation workspace was hard. To comply with TR’s internal security standards and least-privilege model, the workspace role was originally defined with inline policies. Consequently, the inline policy grew over time and became verbose, exceeding the policy size limit allowed for the IAM role. To mitigate this, TR switched to using more customer-managed policies and referencing them in the workspace role definition (see the sketch after this list).
- TR occasionally reached the default resource limits applied at the AWS account level. This caused occasional failures when launching SageMaker jobs (for example, training jobs) because the limit for the desired resource type had been reached. TR worked closely with the SageMaker service team on this issue, which was resolved after the AWS team launched SageMaker as a supported service in Service Quotas in June 2022.
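As referenced in the second item above, moving from inline to customer managed policies keeps the workspace role definition small. The following boto3 sketch illustrates that pattern; the policy content, bucket, and role names are hypothetical.

```python
import json
import boto3

iam = boto3.client("iam")

# Hypothetical policy granting a workspace role read access to a team's S3 prefix
s3_read_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-content-lake",
            "arn:aws:s3:::example-content-lake/team-a/*",
        ],
    }],
}

# Create the customer managed policy once ...
policy = iam.create_policy(
    PolicyName="workspace-team-a-s3-read",
    PolicyDocument=json.dumps(s3_read_policy),
)

# ... and reference it from the workspace role instead of growing an inline policy
iam.attach_role_policy(
    RoleName="ai-platform-workspace-role",  # hypothetical role name
    PolicyArn=policy["Policy"]["Arn"],
)
```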
Today, data scientists at TR can launch an ML project by creating an independent workspace and adding the required team members to collaborate. The virtually unlimited scale offered by SageMaker is at their fingertips through custom kernel images in varied sizes. SageMaker Studio quickly became a crucial component of TR’s AI Platform and has changed user behavior from constrained desktop applications to scalable, ephemeral, purpose-built engines. The following diagram illustrates this architecture.
Central model registry
The model registry provides a central repository for all of TR’s ML models, enables risk and health management of those models in a standardized manner across business functions, and streamlines the potential reuse of models. Therefore, the service needed to do the following:
- Provide the capability to register both new and legacy models, whether developed within or outside SageMaker
- Implement governance workflows, enabling data scientists, developers, and stakeholders to view and collectively manage the lifecycle of models
- Increase transparency and collaboration by creating a centralized view of all models across TR alongside metadata and health metrics
TR started the design with just the SageMaker model registry, but one of TR’s key requirements is the ability to register models created outside of SageMaker. TR evaluated different relational databases but chose DynamoDB because the metadata schema for models coming from legacy sources varies widely. TR also didn’t want to impose any additional work on users, so it implemented seamless automatic synchronization from the AI Platform workspace SageMaker registries to the central SageMaker registry using Amazon EventBridge rules and the required IAM roles. TR enhanced the central registry with DynamoDB to extend its capabilities to register legacy models that were created on users’ desktops.
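A minimal sketch of such a synchronization rule is shown below using boto3; the rule name, target Lambda ARN, and cross-account routing details are assumptions for illustration (in practice, events from workspace accounts may first be forwarded to a central event bus).

```python
import json
import boto3

events = boto3.client("events")

# Rule that fires whenever a model package changes state in a workspace
# account's SageMaker model registry
events.put_rule(
    Name="sync-workspace-model-registry",
    EventPattern=json.dumps({
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
    }),
    State="ENABLED",
)

# Route matching events to a Lambda function that copies the model package
# metadata into the central registry (SageMaker model registry + DynamoDB)
events.put_targets(
    Rule="sync-workspace-model-registry",
    Targets=[{
        "Id": "central-registry-sync",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:central-registry-sync",
    }],
)
```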
TR’s AI Platform central model registry is integrated into the AI Platform portal and provides a visual interface to search models, update model metadata, and understand model baseline metrics and periodic custom monitoring metrics. The following diagram illustrates this architecture.
Model deployment
TR identified two major patterns to automate deployment:
- Models developed using SageMaker, deployed through SageMaker batch transform jobs to get inferences on a preferred schedule
- Models developed outside SageMaker on local desktops using open-source libraries, deployed through the bring-your-own-container approach with SageMaker processing jobs running custom inference code, which is an efficient way to migrate those models without refactoring the code
With the AI Platform deployment service, TR users (data scientists and ML engineers) can identify a model from the catalog and deploy an inference job into their chosen AWS account by providing the required parameters through a UI-driven workflow.
TR automated this deployment using AWS DevOps services like AWS CodePipeline and AWS CodeBuild. TR uses Step Functions to orchestrate the workflow, from reading and preprocessing data to creating SageMaker inference jobs. TR deploys the required components as code using AWS CloudFormation templates. The following diagram illustrates this architecture.
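For the first deployment pattern, the Step Functions workflow ultimately creates a SageMaker batch transform job. A minimal boto3 sketch of that call is shown below; the job, model, instance, and S3 names are placeholders.

```python
import boto3

sm = boto3.client("sagemaker")

# A minimal batch transform job; names and S3 locations are placeholders
sm.create_transform_job(
    TransformJobName="example-model-batch-2023-01-15",
    ModelName="example-registered-model",  # a model already registered in SageMaker
    TransformInput={
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://example-bucket/inference/input/",
        }},
        "ContentType": "text/csv",
        "SplitType": "Line",
    },
    TransformOutput={"S3OutputPath": "s3://example-bucket/inference/output/"},
    TransformResources={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
)
```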
Model monitoring
The ML lifecycle is not complete without the ability to monitor models. TR’s enterprise governance team also mandates and encourages business teams to monitor their model performance over time to address any regulatory challenges. TR started by monitoring models and data for drift. TR used SageMaker Model Monitor to provide a data baseline and inference ground truth to periodically monitor how TR’s data and inferences are drifting. Along with the SageMaker model monitoring metrics, TR enhanced the monitoring capability by developing custom metrics specific to its models. This helps TR’s data scientists understand when to retrain their models.
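As an illustration of the baseline-plus-schedule pattern Model Monitor uses, the following SageMaker Python SDK sketch suggests a baseline from training data and schedules a daily data quality check. The role, S3 locations, and endpoint name are placeholders; monitoring batch inference (as TR does) would use a batch transform input instead of a real-time endpoint.

```python
from sagemaker.model_monitor import DefaultModelMonitor, CronExpressionGenerator
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/model-monitor-role",  # placeholder role
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Capture baseline statistics and constraints from the training dataset
monitor.suggest_baseline(
    baseline_dataset="s3://example-bucket/baseline/train.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://example-bucket/baseline/results",
)

# Compare captured inference data against the baseline on a daily schedule
monitor.create_monitoring_schedule(
    monitor_schedule_name="example-data-drift-schedule",
    endpoint_input="example-endpoint",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.daily(),
    output_s3_uri="s3://example-bucket/monitoring/results",
)
```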
Along with drift monitoring, TR also wants to understand bias in the models. The out-of-the-box capabilities of SageMaker Clarify are used to build TR’s bias service. TR monitors both data and model bias and makes those metrics available for their users through the AI Platform portal.
To help all teams adopt these enterprise standards, TR has made these services independent and readily available through the AI Platform portal. TR’s business teams can go into the portal, deploy a model monitoring job or bias monitoring job on their own, and run it on their preferred schedule. They’re notified of the status of the job and the metrics for every run.
TR used AWS services for CI/CD deployment, workflow orchestration, serverless frameworks, and API endpoints to build microservices that can be triggered independently, as shown in the following architecture.
Results and future improvements
TR’s AI Platform went live in Q3 2022 with all five major components: a data service, experimentation workspace, central model registry, model deployment, and model monitoring. TR conducted internal training sessions for its business units to onboard the platform and offered them self-guided training videos.
The AI Platform has provided capabilities to TR’s teams that never existed before; it has opened a wide range of possibilities for TR’s enterprise governance team to enhance compliance standards and centralize the registry, providing a single pane of glass view across all ML models within TR.
TR acknowledges that no product is at its best on initial release. All TR’s components are at different levels of maturity, and TR’s Enterprise AI Platform team is in a continuous enhancement phase to iteratively improve product features. TR’s current advancement pipeline includes adding additional SageMaker inference options like real-time, asynchronous, and multi-model endpoints. TR is also planning to add model explainability as a feature to its model monitoring service. TR plans to use the explainability capabilities of SageMaker Clarify to develop its internal explainability service.
Conclusion
TR can now process vast amounts of data securely and use advanced AWS capabilities to take an ML project from ideation to production in the span of weeks, compared to the months it took before. With the out-of-the-box capabilities of AWS services, teams within TR can register and monitor ML models for the first time ever, achieving compliance with their evolving model governance standards. TR has empowered data scientists and product teams to effectively unleash their creativity to solve their most complex problems.
To know more about TR’s Enterprise AI Platform on AWS, check out the AWS re:Invent 2022 session. If you’d like to learn how TR accelerated the use of machine learning using the AWS Data Lab program, refer to the case study.
About the Authors
Ramdev Wudali is a Data Architect, helping architect and build the AI/ML Platform to enable data scientists and researchers to develop machine learning solutions by focusing on the data science and not on the infrastructure needs. In his spare time, he loves to fold paper to create origami tessellations, and wearing irreverent T-shirts.
Kiran Mantripragada is the Senior Director of AI Platform at Thomson Reuters. The AI Platform team is responsible for enabling production-grade AI software applications and enabling the work of data scientists and machine learning researchers. With a passion for science, AI, and engineering, Kiran likes to bridge the gap between research and productization to bring the real innovation of AI to the final consumers.
Bhavana Chirumamilla is a Sr. Resident Architect at AWS. She is passionate about data and ML operations, and brings lots of enthusiasm to help enterprises build data and ML strategies. In her spare time, she enjoys time with her family traveling, hiking, gardening, and watching documentaries.
Srinivasa Shaik is a Solutions Architect at AWS based in Boston. He helps enterprise customers accelerate their journey to the cloud. He is passionate about containers and machine learning technologies. In his spare time, he enjoys spending time with his family, cooking, and traveling.
Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his PhD in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently, he helps customers in the financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.
Federated Learning on AWS with FedML: Health analytics without sharing sensitive data – Part 2
Analyzing real-world healthcare and life sciences (HCLS) data poses several practical challenges, such as distributed data silos, lack of sufficient data at a single site for rare events, regulatory guidelines that prohibit data sharing, infrastructure requirements, and the cost incurred in creating a centralized data repository. Because they’re in a highly regulated domain, HCLS partners and customers seek privacy-preserving mechanisms to manage and analyze large-scale, distributed, and sensitive data.
To mitigate these challenges, we propose a federated learning (FL) framework, based on open-source FedML on AWS, which enables analyzing sensitive HCLS data. It involves training a global machine learning (ML) model from distributed health data held locally at different sites. It doesn’t require moving or sharing data across sites or with a centralized server during the model training process.
Deploying an FL framework on the cloud has several challenges. Automating the client-server infrastructure to support multiple accounts or virtual private clouds (VPCs) requires VPC peering and efficient communication across VPCs and instances. In a production workload, a stable deployment pipeline is needed to seamlessly add and remove clients and update their configurations without much overhead. Furthermore, in a heterogeneous setup, clients may have varying requirements for compute, network, and storage. In this decentralized architecture, logging and debugging errors across clients can be difficult. Finally, determining the optimal approach to aggregate model parameters, maintain model performance, ensure data privacy, and improve communication efficiency is an arduous task. In this post, we address these challenges by providing a federated learning operations (FLOps) template that hosts an HCLS solution. The solution is agnostic to use cases, which means you can adapt it to your use cases by changing the model and data.
In this two-part series, we demonstrate how you can deploy a cloud-based FL framework on AWS. In the first post, we described FL concepts and the FedML framework. In this second post, we present a proof-of-concept healthcare and life sciences use case based on a real-world dataset, eICU. This dataset comprises a multi-center critical care database collected from over 200 hospitals, which makes it ideal for testing our FL experiments.
HCLS use case
For the purpose of demonstration, we built an FL model on a publicly available dataset to manage critically ill patients. We used the eICU Collaborative Research Database, a multi-center intensive care unit (ICU) database comprising 200,859 patient unit encounters for 139,367 unique patients admitted to one of 335 units at 208 hospitals located throughout the US between 2014 and 2015. Due to the underlying heterogeneity and distributed nature of the data, it provides an ideal real-world example for testing this FL framework. The dataset includes laboratory measurements, vital signs, care plan information, medications, patient history, admission diagnosis, time-stamped diagnoses from a structured problem list, and similarly chosen treatments. It is available as a set of CSV files, which can be loaded into any relational database system. The tables are de-identified to meet the regulatory requirements of the US Health Insurance Portability and Accountability Act (HIPAA). The data can be accessed via a PhysioNet repository, and details of the data access process can be found here [1].
The eICU data is ideal for developing ML algorithms, decision support tools, and advancing clinical research. For benchmark analysis, we considered the task of predicting the in-hospital mortality of patients [2]. We defined it as a binary classification task, where each data sample spans a 1-hour window. To create a cohort for this task, we selected patients with a hospital discharge status in their record and a length of stay of at least 48 hours, because we focus on predicting mortality during the first 24 and 48 hours. This created a cohort of 30,680 patients containing 1,164,966 records. We adopted the domain-specific data preprocessing and methods described in [3] for mortality prediction. This resulted in an aggregated dataset comprising several columns per patient per record. The following table shows a patient record in a tabular-style layout with time in columns (5 intervals over 48 hours) and physiological observations in rows. Each row represents a physiological variable, and each column represents its value recorded over a time window of 48 hours for that patient.
| Physiologic Parameter | Chart_Time_0 | Chart_Time_1 | Chart_Time_2 | Chart_Time_3 | Chart_Time_4 |
| --- | --- | --- | --- | --- | --- |
| Glasgow Coma Score Eyes | 4 | 4 | 4 | 4 | 4 |
| FiO2 | 15 | 15 | 15 | 15 | 15 |
| Glasgow Coma Score Total | 15 | 15 | 15 | 15 | 15 |
| Heart Rate | 101 | 100 | 98 | 99 | 94 |
| Invasive BP Diastolic | 73 | 68 | 60 | 64 | 61 |
| Invasive BP Systolic | 124 | 122 | 111 | 105 | 116 |
| Mean arterial pressure (mmHg) | 77 | 77 | 77 | 77 | 77 |
| Glasgow Coma Score Motor | 6 | 6 | 6 | 6 | 6 |
| O2 Saturation | 97 | 97 | 97 | 97 | 97 |
| Respiratory Rate | 19 | 19 | 19 | 19 | 19 |
| Temperature (C) | 36 | 36 | 36 | 36 | 36 |
| Glasgow Coma Score Verbal | 5 | 5 | 5 | 5 | 5 |
| admissionheight | 162 | 162 | 162 | 162 | 162 |
| admissionweight | 96 | 96 | 96 | 96 | 96 |
| age | 72 | 72 | 72 | 72 | 72 |
| apacheadmissiondx | 143 | 143 | 143 | 143 | 143 |
| ethnicity | 3 | 3 | 3 | 3 | 3 |
| gender | 1 | 1 | 1 | 1 | 1 |
| glucose | 128 | 128 | 128 | 128 | 128 |
| hospitaladmitoffset | -436 | -436 | -436 | -436 | -436 |
| hospitaldischargestatus | 0 | 0 | 0 | 0 | 0 |
| itemoffset | -6 | -1 | 0 | 1 | 2 |
| pH | 7 | 7 | 7 | 7 | 7 |
| patientunitstayid | 2918620 | 2918620 | 2918620 | 2918620 | 2918620 |
| unitdischargeoffset | 1466 | 1466 | 1466 | 1466 | 1466 |
| unitdischargestatus | 0 | 0 | 0 | 0 | 0 |
We used both numerical and categorical features and grouped all records of each patient, flattening them into a single-record time series. The seven categorical features (admission diagnosis, ethnicity, gender, Glasgow Coma Score Total, Glasgow Coma Score Eyes, Glasgow Coma Score Motor, and Glasgow Coma Score Verbal) contained 429 unique values and were converted into one-hot embeddings. To prevent data leakage across training node servers, we split the data by hospital IDs and kept all records of a hospital on a single node.
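The following pandas sketch illustrates this preprocessing under assumed column names and file paths; it is not the exact pipeline used in the experiments.

```python
import pandas as pd

# Illustrative preprocessing sketch; the file path and categorical column names
# are placeholders, not the exact eICU-derived schema used in the post.
df = pd.read_csv("eicu_flattened.csv")

categorical = [
    "apacheadmissiondx", "ethnicity", "gender",
    "gcs_total", "gcs_eyes", "gcs_motor", "gcs_verbal",
]

# One-hot encode the seven categorical features (429 unique values overall)
df = pd.get_dummies(df, columns=categorical)

# Split by hospital ID so that all records of a hospital stay on a single client
hospital_ids = sorted(df["hospitalid"].unique())
client_a_hospitals = set(hospital_ids[: len(hospital_ids) // 2])
client_a = df[df["hospitalid"].isin(client_a_hospitals)]
client_b = df[~df["hospitalid"].isin(client_a_hospitals)]
```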
Solution overview
The following diagram shows the architecture of multi-account deployment of FedML on AWS. This includes two clients (Participant A and Participant B) and a model aggregator.
The architecture consists of three separate Amazon Elastic Compute Cloud (Amazon EC2) instances, each running in its own AWS account. The first two instances are owned by the two clients, and the third instance is owned by the model aggregator. The accounts are connected via VPC peering to allow ML models and weights to be exchanged between the clients and the aggregator, and gRPC is used as the communication backend between the model aggregator and the clients. We tested a single-account distributed computing setup with one server and two client nodes. Each of these instances was created from a custom Amazon EC2 AMI with the FedML dependencies installed per the FedML.ai installation guide.
Set up VPC peering
After you launch the three instances in their respective AWS accounts, you establish VPC peering between the accounts via Amazon Virtual Private Cloud (Amazon VPC). To set up a VPC peering connection, first create a request to peer with another VPC. You can request a VPC peering connection with another VPC in your account, or with a VPC in a different AWS account. To activate the request, the owner of the VPC must accept the request. For the purpose of this demonstration, we set up the peering connection between VPCs in different accounts but the same Region. For other configurations of VPC peering, refer to Create a VPC peering connection.
Before you begin, make sure that you have the AWS account number and VPC ID of the VPC to peer with.
Request a VPC peering connection
To create the VPC peering connection, complete the following steps:
- On the Amazon VPC console, in the navigation pane, choose Peering connections.
- Choose Create peering connection.
- For Peering connection name tag, you can optionally name your VPC peering connection. Doing so creates a tag with a key of Name and a value that you specify. This tag is only visible to you; the owner of the peer VPC can create their own tags for the VPC peering connection.
- For VPC (Requester), choose the VPC in your account to create the peering connection.
- For Account, choose Another account.
- For Account ID, enter the AWS account ID of the owner of the accepter VPC.
- For VPC (Accepter), enter the VPC ID with which to create the VPC peering connection.
- In the confirmation dialog box, choose OK.
- Choose Create peering connection.
Accept a VPC peering connection
As mentioned earlier, the VPC peering connection needs to be accepted by the owner of the VPC the connection request has been sent to. Complete the following steps to accept the peering connection request:
- On the Amazon VPC console, use the Region selector to choose the Region of the accepter VPC.
- In the navigation pane, choose Peering connections.
- Select the pending VPC peering connection (the status is pending-acceptance), and on the Actions menu, choose Accept Request.
- In the confirmation dialog box, choose Yes, Accept.
- In the second confirmation dialog, choose Modify my route tables now to go directly to the route tables page, or choose Close to do this later.
Update route tables
To enable private IPv4 traffic between instances in peered VPCs, add a route to the route tables associated with the subnets for both instances. The route destination is the CIDR block (or portion of the CIDR block) of the peer VPC, and the target is the ID of the VPC peering connection. For more information, see Configure route tables.
Update your security groups to reference peer VPC groups
Update the inbound or outbound rules for your VPC security groups to reference security groups in the peered VPC. This allows traffic to flow across instances that are associated with the referenced security group in the peered VPC. For more details about setting up security groups, refer to Update your security groups to reference peer security groups.
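If you prefer to script the setup instead of using the console, the same steps can be performed with boto3, as in the following sketch; all account IDs, resource IDs, CIDR blocks, and the gRPC port are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Requester account: ask to peer with the accepter account's VPC
peering = ec2.create_vpc_peering_connection(
    VpcId="vpc-0aaa1111",        # requester VPC
    PeerVpcId="vpc-0bbb2222",    # accepter VPC
    PeerOwnerId="222233334444",  # accepter AWS account ID
)
pcx_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# Accepter account: accept the pending request (run with the accepter's credentials)
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

# Both sides: route traffic destined for the peer VPC CIDR through the peering connection
ec2.create_route(
    RouteTableId="rtb-0ccc3333",
    DestinationCidrBlock="10.1.0.0/16",  # peer VPC CIDR
    VpcPeeringConnectionId=pcx_id,
)

# Allow gRPC traffic from the peer VPC's security group
ec2.authorize_security_group_ingress(
    GroupId="sg-0ddd4444",
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 8890,  # placeholder gRPC port
        "ToPort": 8890,
        "UserIdGroupPairs": [{"GroupId": "sg-0eee5555", "UserId": "222233334444"}],
    }],
)
```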
Configure FedML
After you have the three EC2 instances running, connect to each of them and perform the following steps:
- Clone the FedML repository.
- Provide topology data about your network in the config file grpc_ipconfig.csv.
This file can be found at FedML/fedml_experiments/distributed/fedavg in the FedML repository. The file includes data about the server and clients and their designated node mapping, such as FL Server – Node 0, FL Client 1 – Node 1, and FL Client 2 – Node 2.
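For reference, grpc_ipconfig.csv maps each node ID to the private IP address of its EC2 instance. The values below are placeholders, and the column headers should match the template shipped with the repository:

```
receiver_id,ip
0,10.0.0.10
1,10.1.0.10
2,10.2.0.10
```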
- Define the GPU mapping config file.
This file can also be found at FedML/fedml_experiments/distributed/fedavg in the FedML repository. The file gpu_mapping.yaml contains configuration data mapping the client and server processes to the corresponding GPUs, as shown in the following snippet.
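The snippet in the original post is an image; the following is an illustrative reconstruction that mirrors the structure of the template in the repository. The mapping key, hostnames, and process counts are placeholders; each list entry is the number of processes scheduled on the corresponding GPU of that host.

```yaml
# Illustrative mapping only; hostnames and process counts are placeholders.
mapping_fedavg_eicu:
  fl-server:   [1]   # aggregator process on GPU 0 of the server host
  fl-client-1: [1]   # client 1 process on GPU 0 of its host
  fl-client-2: [1]   # client 2 process on GPU 0 of its host
```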
After you define these configurations, you’re ready to run the clients. Note that the clients must be run before kicking off the server. Before doing that, let’s set up the data loaders for the experiments.
Customize FedML for eICU
To customize the FedML repository for eICU dataset, make the following changes to the data and data loader.
Data
Add data to the pre-assigned data folder, as shown in the following screenshot. You can place the data in any folder of your choice, as long as the path is consistently referenced in the training script and access is enabled. To follow a real-world HCLS scenario, where local data isn’t shared across sites, split and sample the data so there’s no overlap of hospital IDs across the two clients. This ensures the data of a hospital is hosted on its own server. We enforced the same constraint when splitting the data into train and test sets within each client. Each of the train/test sets across the clients had a 1:10 ratio of positive to negative labels, with roughly 27,000 samples in training and 3,000 samples in test. We handle the data imbalance in model training with a weighted loss function.
Data loader
Each of the FedML clients loads the data and converts it into PyTorch tensors for efficient training on GPU. Extend the existing FedML nomenclature to add a folder for eICU data in the data_processing folder.
The following code snippet loads the data from the data source. It preprocesses the data and returns one item at a time through the __getitem__ function.
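The code itself is not reproduced in this excerpt; the following is a minimal PyTorch Dataset sketch of that pattern, with an assumed CSV layout and the in-hospital mortality label in the hospitaldischargestatus column.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset


class EICUDataset(Dataset):
    """Minimal sketch of an eICU dataset class; the file layout and column
    names are assumptions, not the exact implementation used in the post."""

    def __init__(self, csv_path):
        df = pd.read_csv(csv_path)
        # hospitaldischargestatus is used as the in-hospital mortality label
        self.labels = df["hospitaldischargestatus"].values.astype(np.float32)
        self.features = (
            df.drop(columns=["hospitaldischargestatus"]).values.astype(np.float32)
        )

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Return one (features, label) pair at a time
        return self.features[idx], self.labels[idx]
```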
Training ML models with a single data point at a time is tedious and time-consuming. Model training is typically done on a batch of data points at each client. To implement this, the data loader in the data_loader.py script converts NumPy arrays into Torch tensors, as shown in the following code snippet. Note that FedML provides dataset.py and data_loader.py scripts for both structured and unstructured data that you can use for data-specific alterations, as in any PyTorch project.
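A minimal sketch of that conversion and batching, with assumed array shapes and batch size, might look like the following.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def get_dataloader(features_np, labels_np, batch_size=64, shuffle=True):
    """Wrap NumPy arrays as Torch tensors and return a batched DataLoader."""
    x = torch.from_numpy(features_np).float()
    y = torch.from_numpy(labels_np).float()
    dataset = TensorDataset(x, y)
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)
```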
Import the data loader into the training script
After you create the data loader, import it into the FedML code for ML model training. Like any other dataset (for example, CIFAR-10 and CIFAR-100), load the eICU data in the main_fedavg.py script in the path FedML/fedml_experiments/distributed/fedavg/. Here, we used the federated averaging (fedavg) aggregation function. You can follow a similar method to set up the main file for any other aggregation function.
We call the data loader function for eICU data with the following code:
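The call itself is not included in this excerpt; the following sketch shows how such a call might look. The loader name load_partition_data_eicu, its module path, and its return signature mirror the pattern of the built-in FedML dataset loaders but are assumptions here.

```python
import logging

# Hypothetical loader module added under the data_processing folder
from data_processing.eicu.data_loader import load_partition_data_eicu


def load_eicu_dataset(args):
    """Load the partitioned eICU data in the shape main_fedavg.py expects."""
    logging.info("load_data. dataset_name = eicu")
    (train_data_num, test_data_num, train_data_global, test_data_global,
     train_data_local_num_dict, train_data_local_dict, test_data_local_dict,
     class_num) = load_partition_data_eicu(
        args.data_dir, args.client_num_in_total, args.batch_size)
    return [train_data_num, test_data_num, train_data_global, test_data_global,
            train_data_local_num_dict, train_data_local_dict,
            test_data_local_dict, class_num]
```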
Define the model
FedML supports several out-of-the-box deep learning algorithms for various data types, such as tabular, text, image, graph, and Internet of Things (IoT) data. Load the model specific to eICU, with input and output dimensions defined based on the dataset. For this proof-of-concept development, we used a logistic regression model to train and predict the mortality rate of patients with default configurations. The following code snippet shows the updates we made to the main_fedavg.py script. Note that you can also use custom PyTorch models with FedML and import them into the main_fedavg.py script.
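The updated code is not reproduced in this excerpt; the following is a minimal sketch of a logistic regression model in PyTorch with a sigmoid output for the binary mortality label. The input dimension is a placeholder for the size of the flattened eICU feature vector.

```python
import torch


class LogisticRegression(torch.nn.Module):
    """Sketch of a logistic regression model for binary mortality prediction."""

    def __init__(self, input_dim, output_dim=1):
        super().__init__()
        self.linear = torch.nn.Linear(input_dim, output_dim)

    def forward(self, x):
        # Sigmoid output gives the predicted probability of in-hospital mortality
        return torch.sigmoid(self.linear(x))


# Illustrative model selection; feature_dim is a placeholder
# model = LogisticRegression(input_dim=feature_dim, output_dim=1)
```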
Run and monitor FedML training on AWS
The following video shows the training process being initialized in each of the clients. After both the clients are listed for the server, create the server training process that performs federated aggregation of models.
To configure the FL server and clients, complete the following steps:
- Run Client 1 and Client 2.
To run a client, enter the following command with its corresponding node ID. For instance, to run Client 1 with node ID 1, run from the command line:
- After both the client instances are started, start the server instance using the same command and the appropriate node ID per your configuration in the grpc_ipconfig.csv file. You can see the model weights being passed to the server from the client instances.
- We train the FL model for 50 epochs. As you can see in the following video, the weights are transferred between nodes 0, 1, and 2, indicating the training is progressing as expected in a federated manner.
- Finally, monitor and track the FL model training progression across the different nodes in the cluster using the Weights & Biases (wandb) tool, as shown in the following screenshot. Follow the steps listed here to install wandb and set up monitoring for this solution.
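A minimal sketch of that tracking, with placeholder project, run, and metric names, might look like the following; the actual metrics logged depend on the aggregation code.

```python
import wandb

# Illustrative experiment tracking; project, run, and metric names are placeholders
wandb.init(project="fedml-eicu", name="fedavg-lr-50-rounds")

for round_idx in range(50):
    # ... one federated training round (local training + aggregation) ...
    wandb.log({
        "round": round_idx,
        "test_auc": 0.0,   # replace with the metric computed after aggregation
        "test_loss": 0.0,  # placeholder value
    })
```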
The following video captures all these steps to provide an end-to-end demonstration of FL on AWS using FedML:
Conclusion
In this post, we showed how you can deploy an FL framework, based on open-source FedML, on AWS. It allows you to train an ML model on distributed data, without the need to share or move it. We set up a multi-account architecture, where in a real-world scenario, hospitals or healthcare organizations can join the ecosystem to benefit from collaborative learning while maintaining data governance. We used the multi-hospital eICU dataset to test this deployment. This framework can also be applied to other use cases and domains. We will continue to extend this work by automating deployment through infrastructure as code (using AWS CloudFormation), further incorporating privacy-preserving mechanisms, and improving interpretability and fairness of the FL models.
Please review the presentation at re:MARS 2022 focused on “Managed Federated Learning on AWS: A case study for healthcare” for a detailed walkthrough of this solution.
References
[1] Pollard, Tom J., et al. “The eICU Collaborative Research Database, a freely available multi-center database for critical care research.” Scientific Data 5.1 (2018): 1-13.
[2] Yin, X., Zhu, Y., and Hu, J. “A comprehensive survey of privacy-preserving federated learning: A taxonomy, review, and future directions.” ACM Computing Surveys (CSUR) 54.6 (2021): 1-36.
[3] Sheikhalishahi, Seyedmostafa, Vevake Balaraman, and Venet Osmani. “Benchmarking machine learning models on multi-centre eICU critical care dataset.” PLOS ONE 15.7 (2020): e0235424.
About the Authors
Vidya Sagar Ravipati is a Manager at the Amazon ML Solutions Lab, where he leverages his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption. Previously, he was a Machine Learning Engineer in Connectivity Services at Amazon who helped to build personalization and predictive maintenance platforms.
Olivia Choudhury, PhD, is a Senior Partner Solutions Architect at AWS. She helps partners, in the Healthcare and Life Sciences domain, design, develop, and scale state-of-the-art solutions leveraging AWS. She has a background in genomics, healthcare analytics, federated learning, and privacy-preserving machine learning. Outside of work, she plays board games, paints landscapes, and collects manga.
Wajahat Aziz is a Principal Machine Learning and HPC Solutions Architect at AWS, where he focuses on helping healthcare and life sciences customers leverage AWS technologies for developing state-of-the-art ML and HPC solutions for a wide variety of use cases such as Drug Development, Clinical Trials, and Privacy Preserving Machine Learning. Outside of work, Wajahat likes to explore nature, hiking, and reading.
Divya Bhargavi is a Data Scientist and Media and Entertainment Vertical Lead at the Amazon ML Solutions Lab, where she solves high-value business problems for AWS customers using Machine Learning. She works on image/video understanding, knowledge graph recommendation systems, predictive advertising use cases.
Ujjwal Ratan is the leader for AI/ML and Data Science in the AWS Healthcare and Life Science Business Unit and is also a Principal AI/ML Solutions Architect. Over the years, Ujjwal has been a thought leader in the healthcare and life sciences industry, helping multiple Global Fortune 500 organizations achieve their innovation goals by adopting machine learning. His work involving the analysis of medical imaging, unstructured clinical text and genomics has helped AWS build products and services that provide highly personalized and precisely targeted diagnostics and therapeutics. In his free time, he enjoys listening to (and playing) music and taking unplanned road trips with his family.
Chaoyang He is Co-founder and CTO of FedML, Inc., a startup running for a community building open and collaborative AI from anywhere at any scale. His research focuses on distributed/federated machine learning algorithms, systems, and applications. He received his Ph.D. in Computer Science from the University of Southern California, Los Angeles, USA.
Salman Avestimehr is Co-founder and CEO of FedML, Inc., a startup running for a community building open and collaborative AI from anywhere at any scale. Salman Avestimehr is a world-renowned expert in federated learning with over 20 years of R&D leadership in both academia and industry. He is a Dean’s Professor and the inaugural director of the USC-Amazon Center on Trustworthy Machine Learning at the University of Southern California. He has also been an Amazon Scholar in Amazon. He is a United States Presidential award winner for his profound contributions in information technology, and a Fellow of IEEE.