Explain text classification model predictions using Amazon SageMaker Clarify

Model explainability refers to the process of relating the prediction of a machine learning (ML) model to the input feature values of an instance in humanly understandable terms. This field is often referred to as explainable artificial intelligence (XAI). Amazon SageMaker Clarify is a feature of Amazon SageMaker that enables data scientists and ML engineers to explain the predictions of their ML models. It uses model-agnostic methods like SHapley Additive exPlanations (SHAP) for feature attribution. Apart from supporting explanations for tabular data, Clarify also supports explainability for both computer vision (CV) and natural language processing (NLP) using the same SHAP algorithm.

In this post, we illustrate the use of Clarify for explaining NLP models. Specifically, we show how you can explain the predictions of a text classification model that has been trained using the SageMaker BlazingText algorithm. This helps you understand which parts or words of the text are most important for the predictions made by the model. Among other things, you can use these observations to improve processes such as data acquisition (to reduce bias in the dataset) and model validation (to ensure that models perform as intended), and to earn the trust of stakeholders when the model is deployed. This can be a key requirement in many application domains like sentiment analysis, legal reviews, medical diagnosis, and more.

We also provide a general design pattern that you can follow when using Clarify with any of the SageMaker algorithms.

Solution overview

SageMaker algorithms have fixed input and output data formats. For example, the BlazingText algorithm container accepts inputs in JSON format. But customers often require specific formats that are compatible with their data pipelines. We present a couple of options that you can follow to use Clarify.

Option A

In this option, we use the inference pipeline feature of SageMaker hosting. An inference pipeline is a SageMaker model that constitutes a sequence of containers that processes inference requests. The following diagram illustrates an example.

Clarify job invokes inference pipeline with one container handling the format of data and the other container holding the model.

You can use inference pipelines to deploy a combination of your own custom models and SageMaker built-in algorithms packaged in different containers. For more information, refer to Hosting models along with pre-processing logic as serial inference pipeline behind one endpoint. Because Clarify supports only CSV and JSON Lines as input, you need to complete the following steps:

  1. Create a model and a container to convert the data from CSV (or JSON Lines) to JSON.
  2. After the model training step with the BlazingText algorithm, directly deploy the model. This will deploy the model using the BlazingText container, which accepts JSON as input. When using a different algorithm, SageMaker creates the model using that algorithm’s container.
  3. Use the preceding two models to create a PipelineModel. This chains the two models in a linear sequence and creates a single model. For an example, refer to Inference pipeline with Scikit-learn and Linear Learner.

With this solution, we have successfully created a single model whose input is compatible with Clarify and can be used by it to generate explanations.
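The conversion in step 1 can be sketched as follows. This is a minimal illustration rather than code from this post: the function name `csv_to_blazingtext_json` is hypothetical, and the sketch assumes one text feature per CSV row and a BlazingText-style request body of the form `{"instances": [...]}`.

```python
import csv
import io
import json


def csv_to_blazingtext_json(csv_body: str) -> str:
    """Convert a CSV payload (one text per row) into a BlazingText-style JSON request.

    Assumption: each row holds exactly one text feature in its first column.
    """
    reader = csv.reader(io.StringIO(csv_body))
    # Skip empty rows so that trailing newlines do not produce empty instances.
    instances = [row[0] for row in reader if row]
    return json.dumps({"instances": instances})


if __name__ == "__main__":
    body = '"Wesebach is a river of Hesse, Germany."\nAnother abstract\n'
    print(csv_to_blazingtext_json(body))
```

Using the `csv` module instead of splitting on commas keeps quoted text that itself contains commas intact, which matters for free-form abstracts.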

Option B

This option demonstrates how you can integrate the use of different data formats between Clarify and SageMaker algorithms by bringing your own container for hosting the SageMaker model. The following diagram illustrates the architecture and the steps that are involved in the solution:

The steps are as follows:

  1. Use the BlazingText algorithm via the SageMaker Estimator to train a text classification model.
  2. After the model is trained, create a custom Docker container that can be used to create a SageMaker model and optionally deploy the model as a SageMaker model endpoint.
  3. Configure and create a Clarify job to use the hosting container for generating an explainability report.
  4. The custom container accepts the inference request as a CSV and enables Clarify to generate explanations.

Note that this solution demonstrates how to obtain offline explanations using Clarify for a BlazingText model. For more information about online explainability, refer to Online Explainability with SageMaker Clarify.

The rest of this post explains each of the steps in the second option.

Train a BlazingText model

We first train a text classification model using the BlazingText algorithm. In this example, we use the DBpedia Ontology dataset. DBpedia is a crowd-sourced initiative to extract structured content using information from various Wikimedia projects like Wikipedia. Specifically, we use the DBpedia ontology dataset as created by Zhang et al. It is constructed by selecting 14 non-overlapping classes from DBpedia 2014. The fields contain an abstract of a Wikipedia article and the corresponding class. The goal of a text classification model is to predict the class of an article given its abstract.

A detailed step-by-step process for training the model is available in the notebook Text Classification using SageMaker BlazingText. After you have trained the model, take note of the Amazon Simple Storage Service (Amazon S3) URI where the model artifacts are stored.

Deploy the trained BlazingText model using your own container on SageMaker

With Clarify, there are two options to provide the model information:

  • Create a SageMaker model without deploying it to an endpoint – When a SageMaker model is provided to Clarify, it creates an ephemeral endpoint using the model.
  • Create a SageMaker model and deploy it to an endpoint – When an endpoint is made available to Clarify, it uses the endpoint for obtaining explanations. This avoids the creation of an ephemeral endpoint and can reduce the runtime of a Clarify job.

In this post, we use the first option with Clarify. We use the SageMaker Python SDK for this purpose. For other options and more details, refer to Create your endpoint and deploy your model.

Bring your own container (BYOC)

We first build a custom Docker image that is used to create the SageMaker model. You can use the files and code in the source directory of our GitHub repository.

The Dockerfile describes the image we want to build. We start with a standard Ubuntu installation and then install Scikit-learn. We also clone fasttext and install the package. It’s used to load the BlazingText model for making predictions. Finally, we add the code that implements our algorithm in the form of the preceding files and set up the environment in the container. The entire Dockerfile is provided in our repository and you can use it as it is. Refer to Use Your Own Inference Code with Hosting Services for more details on how SageMaker interacts with your Docker container and its requirements.

Furthermore, predictor.py contains the code for loading the model and making the predictions. It accepts input data as a CSV, which makes it compatible with Clarify.
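As a rough illustration of what such a CSV-accepting handler does (this is not the repository's actual predictor.py), the following sketch parses CSV rows, scores each text, and returns one JSON object per line. The names `handle_csv_request` and `toy_predict` are hypothetical, the model is replaced by a stand-in function, and the `label` and `prob` keys are assumed to match the ModelPredictedLabelConfig used later in this post.

```python
import csv
import io
import json
from typing import Callable, List, Tuple

# Hypothetical model interface: takes a text, returns (label, probability).
PredictFn = Callable[[str], Tuple[str, float]]


def handle_csv_request(csv_body: str, predict: PredictFn) -> str:
    """Parse CSV rows, score each text, and return one JSON object per line."""
    lines: List[str] = []
    for row in csv.reader(io.StringIO(csv_body)):
        if not row:
            continue  # ignore blank lines
        label, prob = predict(row[0])
        # fastText-style labels carry a "__label__" prefix; strip it if present.
        if label.startswith("__label__"):
            label = label[len("__label__"):]
        lines.append(json.dumps({"label": label, "prob": prob}))
    return "\n".join(lines)


def toy_predict(text: str) -> Tuple[str, float]:
    """Stand-in for the real model, used only to make the sketch runnable."""
    if "river" in text.lower():
        return "__label__NaturalPlace", 0.9
    return "__label__Company", 0.6


if __name__ == "__main__":
    print(handle_csv_request("Wesebach is a river of Hesse Germany.\n", toy_predict))
```

Returning one JSON object per input row lines up with an `application/jsonlines` accept type, so each prediction can be matched back to the row that produced it.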

After you have the Dockerfile, build the Docker container and upload it to Amazon Elastic Container Registry (Amazon ECR). You can find the step-by-step process in the form of a shell script in our GitHub repository, which you can use to create and upload the Docker image to Amazon ECR.

Create the BlazingText model

The next step is to create a model object from the SageMaker Python SDK Model class that can be deployed to an HTTPS endpoint. We configure Clarify to use this model for generating explanations. For the code and other requirements for this step, refer to Deploy your trained SageMaker BlazingText Model using your own container in Amazon SageMaker.

Configure Clarify

Clarify NLP is compatible with regression and classification models. It helps you understand which parts of the input text influence the predictions of your model. Clarify supports 62 languages and can handle text with multiple languages. We use the SageMaker Python SDK to define the configurations that Clarify uses to create the explainability report.

First, we need to create the processor object and also specify the location of the input dataset that will be used for the predictions and the feature attribution:

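import sagemaker
from sagemaker import clarify

sagemaker_session = sagemaker.Session()
# IAM role assumed by the Clarify processing job
# (get_execution_role assumes this runs in a SageMaker notebook or Studio)
role = sagemaker.get_execution_role()

clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=sagemaker_session,
)
# Location of the CSV dataset to explain
file_path = "<location of the input dataset>"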

DataConfig

Here, you should configure the location of the input data, the feature column, and where you want the Clarify job to store the output. This is done by passing the relevant arguments while creating a DataConfig object:

explainability_output_path = "s3://{}/{}/clarify-text-explainability".format(
    sagemaker_session.default_bucket(), "explainability"
)

explainability_data_config = clarify.DataConfig(
    s3_data_input_path=file_path,
    s3_output_path=explainability_output_path,
    headers=["Review Text"],
    dataset_type="text/csv",
)

ModelConfig

With ModelConfig, you should specify information about your trained model. Here, we specify the name of the BlazingText SageMaker model that we created in a prior step and also set other parameters like the Amazon Elastic Compute Cloud (Amazon EC2) instance type and the format of the content:

model_config = clarify.ModelConfig(
    model_name=model_name,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    accept_type="application/jsonlines",
    content_type="text/csv",
    endpoint_name_prefix=None,
)

SHAPConfig

This is used to inform Clarify about how to obtain the feature attributions. TextConfig is used to specify the granularity of the text and the language. In our dataset, because we want to break down the input text into words and the language is English, we set these values to token and English, respectively. Depending on the nature of your dataset, you can set granularity to sentence or paragraph. The baseline is set to a special token. This means that Clarify will drop subsets of the input text and replace them with values from the baseline while obtaining predictions for computing the SHAP values. This is how it determines the effect of the tokens on the model’s predictions and in turn identifies their importance. The number of samples that are to be used in the Kernel SHAP algorithm is determined by the value of the num_samples argument. Higher values result in more robust feature attributions, but that can also increase the runtime of the job. Therefore, you need to make a trade-off between the two. See the following code:

shap_config = clarify.SHAPConfig(
    baseline=[["<UNK>"]],
    num_samples=1000,
    agg_method="mean_abs",
    save_local_shap_values=True,
    text_config=clarify.TextConfig(granularity="token", language="english"),
)

For more information, see Feature Attributions that Use Shapley Values and Amazon AI Fairness and Explainability Whitepaper.
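To make the baseline replacement concrete, the following sketch computes exact Shapley values for a short sentence by masking tokens with the `<UNK>` baseline and scoring every subset. It is an illustration under stated assumptions, not Clarify's implementation: Clarify uses Kernel SHAP, which approximates these values from a sample of subsets (controlled by num_samples), whereas exhaustive enumeration is only feasible for a handful of tokens. The functions `mask_tokens`, `exact_shapley`, and `toy_score` are hypothetical, and `toy_score` stands in for a real model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def mask_tokens(tokens: Sequence[str], keep: Sequence[int], baseline: str = "<UNK>") -> List[str]:
    """Keep tokens whose index is in `keep`; replace the rest with the baseline token."""
    keep_set = set(keep)
    return [tok if i in keep_set else baseline for i, tok in enumerate(tokens)]


def exact_shapley(tokens: Sequence[str], score: Callable[[List[str]], float]) -> List[float]:
    """Exact Shapley value of each token, enumerating every subset of the other tokens."""
    n = len(tokens)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = score(mask_tokens(tokens, subset + (i,)))
                without_i = score(mask_tokens(tokens, subset))
                phi += weight * (with_i - without_i)
        values.append(phi)
    return values


def toy_score(tokens: List[str]) -> float:
    """Toy 'probability of Natural Place', driven mostly by the word 'river'."""
    return 0.1 + 0.7 * ("river" in tokens) + 0.1 * ("Hesse" in tokens)


if __name__ == "__main__":
    sentence = ["Wesebach", "is", "a", "river", "of", "Hesse"]
    for token, value in zip(sentence, exact_shapley(sentence, toy_score)):
        print(f"{token:>10s} {value:+.3f}")
```

Because the toy score is additive, the token "river" receives an attribution of about 0.7 and "Hesse" about 0.1, and the attributions sum to the difference between the score of the full text and the score of the fully masked text.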

ModelPredictedLabelConfig

For Clarify to extract a predicted label or predicted scores or probabilities, this config object needs to be set. See the following code:

from sagemaker.clarify import ModelPredictedLabelConfig
modellabel_config = ModelPredictedLabelConfig(probability="prob", label="label")

For more details, refer to the documentation in the SDK.

Run a Clarify job

After you create the different configurations, you’re now ready to trigger the Clarify processing job. The processing job validates the input and parameters, creates the ephemeral endpoint, and computes local and global feature attributions using the SHAP algorithm. When that’s complete, it deletes the ephemeral endpoint and generates the output files. See the following code:

clarify_processor.run_explainability(
    data_config=explainability_data_config,
    model_config=model_config,
    explainability_config=shap_config,
    model_scores=modellabel_config,
)

The runtime of this step depends on the size of the dataset and the number of samples that are generated by SHAP.

Visualize the results

Finally, we visualize the results from the local feature attribution report that the Clarify processing job generated. The output is in JSON Lines format, and with some processing you can plot the scores for the tokens in the input text, as in the following example. Tokens with larger bars have more impact on the target label. Positive values are associated with higher predictions for the target variable, and negative values with lower predictions. In this example, the model makes a prediction for the input text “Wesebach is a river of Hesse Germany.” The predicted class is Natural Place, and the scores indicate that the model found the word “river” to be the most informative for this prediction. This is intuitive for a human, and by examining more samples you can determine whether the model is learning the right features and behaving as expected.
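The processing mentioned above might look like the following sketch, which ranks the tokens of one record by the magnitude of their attribution. The record layout (`explanations`, `attributions`, `attribution`, `description.partial_text`) is an assumption about the local SHAP output and should be checked against the file your job actually produces. The function `token_attributions` is hypothetical, and the numbers in the sample record are made up for illustration, not taken from a real run.

```python
import json
from typing import List, Tuple


def token_attributions(jsonl_line: str, class_index: int = 0) -> List[Tuple[str, float]]:
    """Return (token, attribution) pairs for one record, strongest first.

    Assumed layout: each token entry holds a list of attributions (one per class)
    and a description containing the token text.
    """
    record = json.loads(jsonl_line)
    pairs = []
    for feature in record["explanations"]:
        for item in feature["attributions"]:
            token = item["description"]["partial_text"]
            pairs.append((token, item["attribution"][class_index]))
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


if __name__ == "__main__":
    # Illustrative record with invented values.
    sample = json.dumps({
        "explanations": [{
            "feature_name": "Review Text",
            "data_type": "free_text",
            "attributions": [
                {"attribution": [0.02], "description": {"partial_text": "Wesebach", "start_idx": 0}},
                {"attribution": [0.45], "description": {"partial_text": "river", "start_idx": 14}},
                {"attribution": [-0.05], "description": {"partial_text": "Hesse", "start_idx": 23}},
            ],
        }]
    })
    for token, score in token_attributions(sample):
        print(f"{token:>10s} {score:+.2f}")
```

Sorting by absolute value keeps strongly negative tokens near the top, which is usually what you want when deciding which bars to plot.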

Conclusion

In this post, we explained how you can use Clarify to explain predictions from a text classification model that was trained using SageMaker BlazingText. Get started with explaining predictions from your text classification models using the sample notebook Text Explainability for SageMaker BlazingText.

We also discussed a more generic design pattern that you can use when using Clarify with SageMaker built-in algorithms. For more information, refer to What Is Fairness and Model Explainability for Machine Learning Predictions. We also encourage you to read the Amazon AI Fairness and Explainability Whitepaper, which provides an overview on the topic and discusses best practices and limitations.


About the Authors

Pinak Panigrahi works with customers to build machine learning driven solutions to solve strategic business problems on AWS. When not occupied with machine learning, he can be found taking a hike, reading a book or catching up with sports.

Dhawal Patel is a Principal Machine Learning Architect at AWS. He has worked with organizations ranging from large enterprises to mid-sized startups on problems related to distributed computing and artificial intelligence. He focuses on deep learning, including the NLP and computer vision domains. He helps customers achieve high-performance model inference on SageMaker.

Upscale images with Stable Diffusion in Amazon SageMaker JumpStart

In November 2022, we announced that AWS customers can generate images from text with Stable Diffusion models in Amazon SageMaker JumpStart. Today, we announce a new feature that lets you upscale images (resize images without losing quality) with Stable Diffusion models in JumpStart. An image that is low resolution, blurry, and pixelated can be converted into a high-resolution image that appears smoother, clearer, and more detailed. This process, called upscaling, can be applied to both real images and images generated by text-to-image Stable Diffusion models. This can be used to enhance image quality in various industries such as ecommerce and real estate, as well as for artists and photographers. Additionally, upscaling can improve the visual quality of low-resolution images when displayed on high-resolution screens.

Stable Diffusion uses an AI algorithm to upscale images, eliminating the need for manual work that may require manually filling gaps in an image. It has been trained on millions of images and can accurately predict high-resolution images, resulting in a significant increase in detail compared to traditional image upscalers. Additionally, unlike non-deep-learning techniques such as nearest neighbor, Stable Diffusion takes into account the context of the image, using a textual prompt to guide the upscaling process.

In this post, we provide an overview of how to deploy and run inference with the Stable Diffusion upscaler model in two ways: via JumpStart’s user interface (UI) in Amazon SageMaker Studio, and programmatically through JumpStart APIs available in the SageMaker Python SDK.

Solution overview

The following images show examples of upscaling performed by the model. On the left is the original low-resolution image enlarged to match the size of the image generated by the model. On the right is the image generated by the model.

The first generated image is the result of a low-resolution cat image and the prompt “a white cat.”

The second generated image is the result of a low-resolution butterfly image and the prompt “a butterfly on a green leaf.”

Running large models like Stable Diffusion requires custom inference scripts. You have to run end-to-end tests to make sure that the script, the model, and the desired instance work together efficiently. JumpStart simplifies this process by providing ready-to-use scripts that have been robustly tested. You can access these scripts with one click through the Studio UI or with very few lines of code through the JumpStart APIs.

The following sections provide an overview of how to deploy the model and run inference using either the Studio UI or the JumpStart APIs.

Note that by using this model, you agree to the CreativeML Open RAIL++-M License.

Access JumpStart through the Studio UI

In this section, we demonstrate how to find and deploy JumpStart models through the Studio UI. The following video shows how to find the pre-trained Stable Diffusion upscaler model on JumpStart and deploy it. The model page contains valuable information about the model and how to use it. For inference, we use the ml.p3.2xlarge instance type because it provides the GPU acceleration needed for low inference latency at a low price point. After you configure the SageMaker hosting instance, choose Deploy. It takes 5–10 minutes until the endpoint is up and running and ready to respond to inference requests.

Video: stable diffusion upscaling.mov

To accelerate the time to inference, JumpStart provides a sample notebook that shows how to run inference on the newly created endpoint. To access the notebook in Studio, choose Open Notebook in the Use Endpoint from Studio section of the model endpoint page.

Use JumpStart programmatically with the SageMaker SDK

You can use the JumpStart UI to deploy a pre-trained model interactively in just a few clicks. However, you can also use JumpStart models programmatically by using APIs that are integrated into the SageMaker Python SDK.

In this section, we choose an appropriate pre-trained model in JumpStart, deploy this model to a SageMaker endpoint, and run inference on the deployed endpoint, all using the SageMaker Python SDK. The following examples contain code snippets. For the full code with all of the steps in this demo, see the Introduction to JumpStart – Enhance image quality guided by prompt example notebook.

Deploy the pre-trained model

SageMaker utilizes Docker containers for various build and runtime tasks. JumpStart utilizes the SageMaker Deep Learning Containers (DLCs) that are framework-specific. We first fetch any additional packages, as well as scripts to handle training and inference for the selected task. Then the pre-trained model artifacts are separately fetched with model_uris, which provides flexibility to the platform. This allows multiple pre-trained models to be used with a single inference script. The following code illustrates this process:

from sagemaker import image_uris, model_uris, script_uris

model_id, model_version = "model-upscaling-stabilityai-stable-diffusion-x4-upscaler-fp16", "*"
# Instance type used for hosting (the same type chosen in the Studio UI section)
inference_instance_type = "ml.p3.2xlarge"

# Retrieve the inference docker container uri
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)
# Retrieve the inference script uri
deploy_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="inference")

# Retrieve the pre-trained model artifact uri
base_model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="inference")

Next, we provide those resources into a SageMaker model instance and deploy an endpoint:

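from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base

# aws_role is the ARN of the IAM execution role, for example from sagemaker.get_execution_role()
# Generate a unique name for the model and endpoint
endpoint_name = name_from_base("jumpstart-example-upscaler")

# Create the SageMaker model instance
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=base_model_uri,
    entry_point="inference.py",  # entry point file in source_dir and present in deploy_source_uri
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

# deploy the Model - note that we need to pass the Predictor class when we deploy the model through the Model class,
# in order to run inference through the SageMaker API
base_model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)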

After our model is deployed, we can get predictions from it in real time!

Input format

The endpoint accepts a low-resolution image as raw RGB values or a base64 encoded image. The inference handler decodes the image based on content_type:

  • For content_type = “application/json”, the input payload must be a JSON dictionary with the raw RGB values, a textual prompt, and other optional parameters
  • For content_type = “application/json;jpeg”, the input payload must be a JSON dictionary with the base64 encoded image, a textual prompt, and other optional parameters

Output format

The following code examples give you a glimpse of what the outputs look like. Similarly to the input format, the endpoint can respond with the raw RGB values of the image or a base64 encoded image. This can be specified by setting accept to one of the two values:

  • For accept = “application/json”, the endpoint returns a JSON dictionary with RGB values for the image
  • For accept = “application/json;jpeg”, the endpoint returns a JSON dictionary with the JPEG image as base64-encoded bytes

Note that sending or receiving the payload with the raw RGB values may hit default limits for the input payload and the response size. Therefore, we recommend using the base64 encoded image by setting content_type = “application/json;jpeg” and accept = “application/json;jpeg”.
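The size difference between the two formats can be estimated with a short sketch. The helper names `raw_rgb_payload` and `base64_payload` are hypothetical; the `prompt` and `image` keys follow the example request in this post. To stay free of image libraries, the sketch base64-encodes the raw pixel bytes. A real JPEG is compressed, so the actual gap is typically larger than what this comparison shows.

```python
import base64
import json


def raw_rgb_payload(pixels: bytes, width: int, height: int, prompt: str) -> str:
    """JSON payload carrying the image as nested lists of RGB integers."""
    rows = [
        [list(pixels[(y * width + x) * 3:(y * width + x) * 3 + 3]) for x in range(width)]
        for y in range(height)
    ]
    return json.dumps({"prompt": prompt, "image": rows})


def base64_payload(image_bytes: bytes, prompt: str) -> str:
    """JSON payload carrying the image bytes as a base64 string."""
    return json.dumps({"prompt": prompt, "image": base64.b64encode(image_bytes).decode()})


if __name__ == "__main__":
    width = height = 128
    # Deterministic synthetic pixel data standing in for a real image.
    pixels = bytes(i % 256 for i in range(width * height * 3))
    print("raw RGB JSON bytes:", len(raw_rgb_payload(pixels, width, height, "a cat")))
    print("base64 JSON bytes: ", len(base64_payload(pixels, "a cat")))
```

Each pixel channel costs several characters in the nested-list form (digits plus separators) but only about 1.33 characters in base64, which is why the raw format reaches payload limits sooner.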

The following code is an example inference request:

import base64

content_type = "application/json;jpeg"

# We recommend rescaling the low-resolution image so that both height and width are powers of 2.
# For example, with Pillow:
#   original_image = Image.open("low_res_image.jpg")
#   rescaled_image = original_image.resize((128, 128))
#   rescaled_image.save("rescaled_image.jpg")
with open(low_res_img_file_name, "rb") as f:
    low_res_image_bytes = f.read()

encoded_image = base64.b64encode(low_res_image_bytes).decode()

payload = {"prompt": "a cat", "image": encoded_image, "num_inference_steps": 50, "guidance_scale": 7.5}

accept = "application/json;jpeg"


def query(model_predictor, payload, content_type, accept):
    """Query the model predictor."""
    query_response = model_predictor.predict(
        payload,
        {
            "ContentType": content_type,
            "Accept": accept,
        },
    )
    return query_response

The endpoint response is a JSON object containing the generated images and the prompt:

import json


def parse_response(query_response):
    """Parse response and return the generated images and prompt."""
    response_dict = json.loads(query_response)
    return response_dict["generated_images"], response_dict["prompt"]


# base_model_predictor is the predictor returned by model.deploy() earlier
query_response = query(base_model_predictor, json.dumps(payload).encode("utf-8"), content_type, accept)
generated_images, prompt = parse_response(query_response)

Supported parameters

Stable Diffusion upscaling models support many parameters for image generation:

  • image – A low resolution image.
  • prompt – A prompt to guide the image generation. It can be a string or a list of strings.
  • num_inference_steps (optional) – The number of denoising steps during image generation. More steps lead to a higher-quality image. If specified, it must be a positive integer. Note that more inference steps will lead to a longer response time.
  • guidance_scale (optional) – A higher guidance scale results in an image more closely related to the prompt, at the expense of image quality. If specified, it must be a float. guidance_scale<=1 is ignored.
  • negative_prompt (optional) – This guides the image generation against this prompt. If specified, it must be a string or a list of strings and used with guidance_scale. If guidance_scale is disabled, this is also disabled. Moreover, if the prompt is a list of strings, then the negative_prompt must also be a list of strings.
  • seed (optional) – This fixes the randomized state for reproducibility. If specified, it must be an integer. Whenever you use the same prompt with the same seed, the resulting image will always be the same.
  • noise_level (optional) – This adds noise to latent vectors before upscaling. If specified, it must be an integer.
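The constraints in the preceding list can be checked on the client before a request is sent. The following is a minimal sketch with a hypothetical helper, `build_upscale_payload`; the checks mirror the list above, and the endpoint remains the authority on what it accepts.

```python
from typing import Any, Dict, List, Optional, Union

Prompt = Union[str, List[str]]


def build_upscale_payload(
    image: str,
    prompt: Prompt,
    num_inference_steps: Optional[int] = None,
    guidance_scale: Optional[float] = None,
    negative_prompt: Optional[Prompt] = None,
    seed: Optional[int] = None,
    noise_level: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a request payload, including only the optional parameters that were set."""
    payload: Dict[str, Any] = {"image": image, "prompt": prompt}
    if num_inference_steps is not None:
        if not isinstance(num_inference_steps, int) or num_inference_steps <= 0:
            raise ValueError("num_inference_steps must be a positive integer")
        payload["num_inference_steps"] = num_inference_steps
    if guidance_scale is not None:
        payload["guidance_scale"] = float(guidance_scale)
    if negative_prompt is not None:
        # A list of prompts requires a list of negative prompts.
        if isinstance(prompt, list) and not isinstance(negative_prompt, list):
            raise ValueError("negative_prompt must be a list when prompt is a list")
        payload["negative_prompt"] = negative_prompt
    for name, value in (("seed", seed), ("noise_level", noise_level)):
        if value is not None:
            if not isinstance(value, int):
                raise ValueError(f"{name} must be an integer")
            payload[name] = value
    return payload


if __name__ == "__main__":
    print(build_upscale_payload("<base64 image>", "a cat", num_inference_steps=50, guidance_scale=7.5))
```

Leaving unset parameters out of the payload, rather than sending null values, lets the endpoint apply its own defaults.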

You can recursively upscale an image by invoking the endpoint repeatedly to get higher and higher quality images.

Image size and instance types

Images generated by the model can be up to four times the size of the original low-resolution image. Furthermore, the model’s memory requirement (GPU memory) grows with the size of the generated image. Therefore, if you’re upscaling an already high-resolution image or are recursively upscaling images, select an instance type with a large GPU memory. For instance, ml.g5.2xlarge has more GPU memory than the ml.p3.2xlarge instance type we used earlier. For more information on different instance types, refer to Amazon EC2 Instance Types.
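A quick calculation shows how fast the output grows under recursive upscaling. This sketch assumes that each pass multiplies both width and height by 4 (an assumption based on the x4 upscaler model name). The helper names are hypothetical, and actual GPU memory use depends on the model implementation, not only on pixel count.

```python
from typing import Tuple


def upscaled_size(width: int, height: int, passes: int, factor: int = 4) -> Tuple[int, int]:
    """Image dimensions after a number of upscaling passes (assumed per-side factor)."""
    scale = factor ** passes
    return width * scale, height * scale


def pixel_growth(passes: int, factor: int = 4) -> int:
    """How many times more pixels the output has than the original."""
    return (factor ** passes) ** 2


if __name__ == "__main__":
    for n in range(1, 3):
        w, h = upscaled_size(128, 128, n)
        print(f"{n} pass(es): {w}x{h}, {pixel_growth(n)}x the pixels")
```

Under this assumption, a 128x128 input becomes 512x512 after one pass and 2048x2048 after two, which is 256 times the original pixel count.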

Upscaling images piece by piece

To decrease memory requirements when upscaling large images, you can break the image into smaller sections, known as tiles, and upscale each tile individually. After the tiles have been upscaled, they can be blended together to create the final image. This method requires adapting the prompt for each tile so the model can understand the content of the tile and avoid creating strange images. The style part of the prompt should remain consistent for all tiles to make blending easier. When using higher denoising settings, it’s important to be more specific in the prompt because the model has more freedom to adapt the image. This can be challenging when the tile contains only background or isn’t directly related to the main content of the picture.
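The tiling step can be sketched as a function that computes overlapping crop boxes. This is one possible layout, not a method prescribed by this post: the function `tile_boxes` and its `tile` and `overlap` parameters are hypothetical, and the overlap is there to leave room for blending neighboring tiles. Boxes use the (left, upper, right, lower) convention that Pillow's `Image.crop` expects.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower)


def tile_boxes(width: int, height: int, tile: int, overlap: int) -> List[Box]:
    """Return crop boxes that cover the whole image with overlapping square tiles."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    step = tile - overlap

    def starts(length: int) -> List[int]:
        # A single tile is enough when the image fits inside it.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


if __name__ == "__main__":
    for box in tile_boxes(256, 128, tile=128, overlap=32):
        print(box)
```

Placing the final tile flush with the image edge avoids a thin leftover strip, at the cost of a larger overlap between the last two tiles in a row or column.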

Limitations and bias

Even though Stable Diffusion has impressive performance in upscaling, it suffers from several limitations and biases. These include but are not limited to:

  • The model may not generate accurate faces or limbs because the training data doesn’t include sufficient images with these features
  • The model was trained on the LAION-5B dataset, which has adult content and may not be fit for product use without further considerations
  • The model may not work well with non-English languages because the model was trained on English language text
  • The model can’t generate good text within images

For more information on limitations and bias, refer to the Stable Diffusion upscaler model card.

Clean up

After you’re done running the notebook, make sure to delete all resources created in the process to ensure that the billing is stopped. The code to clean up the endpoint is available in the associated notebook.

Conclusion

In this post, we showed how to deploy a pre-trained Stable Diffusion upscaler model using JumpStart. We showed code snippets in this post—the full code with all of the steps in this demo is available in the Introduction to JumpStart – Enhance image quality guided by prompt example notebook. Try out the solution on your own and send us your comments.

To learn more about the model and how it works, see the following resources:

To learn more about JumpStart, check out the following blog posts:


About the Authors

Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He got his PhD from University of Illinois at Urbana-Champaign and was a Post Doctoral Researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design and has published papers in EMNLP, ICLR, COLT, FOCS, and SODA conferences.

Heiko Hotz is a Senior Solutions Architect for AI & Machine Learning with a special focus on Natural Language Processing (NLP), Large Language Models (LLMs), and Generative AI. Prior to this role, he was the Head of Data Science for Amazon’s EU Customer Service. Heiko helps our customers be successful in their AI/ML journey on AWS and has worked with organizations in many industries, including Insurance, Financial Services, Media and Entertainment, Healthcare, Utilities, and Manufacturing. In his spare time, Heiko travels as much as possible.

Cohere brings language AI to Amazon SageMaker

This is a guest post by Sudip Roy, Manager of Technical Staff at Cohere.

It’s an exciting day for the development community. Cohere’s state-of-the-art language AI is now available through Amazon SageMaker. This makes it easier for developers to deploy Cohere’s pre-trained generation language model to Amazon SageMaker, an end-to-end machine learning (ML) service. Developers, data scientists, and business analysts use Amazon SageMaker to build, train, and deploy ML models quickly and easily using its fully managed infrastructure, tools, and workflows.

At Cohere, the focus is on language. The company’s mission is to enable developers and businesses to add language AI to their technology stack and build game-changing applications with it. Cohere helps developers and businesses automate a wide range of tasks, such as copywriting, named entity recognition, paraphrasing, text summarization, and classification. The company builds and continually improves its general-purpose large language models (LLMs), making them accessible via a simple-to-use platform. Companies can use the models out of the box or tailor them to their particular needs using their own custom data.

Developers using SageMaker will have access to Cohere’s Medium generation language model. The Medium generation model excels at tasks that require fast responses, such as question answering, copywriting, or paraphrasing. The Medium model is deployed in containers that enable low-latency inference on a diverse set of hardware accelerators available on AWS, providing different cost and performance advantages for SageMaker customers.

“Amazon SageMaker provides the broadest and most comprehensive set of services that eliminate heavy lifting from each step of the machine learning process. We’re excited to offer Cohere’s general purpose large language model with Amazon SageMaker. Our joint customers can now leverage the broad range of Amazon SageMaker services and integrate Cohere’s model with their applications for accelerated time-to-value and faster innovation.”

-Rajneesh Singh, General Manager AI/ML at Amazon Web Services.

“As Cohere continues to push the boundaries of language AI, we are excited to join forces with Amazon SageMaker. This partnership will allow us to bring our advanced technology and innovative approach to an even wider audience, empowering developers and organizations around the world to harness the power of language AI and stay ahead of the curve in an increasingly competitive market.”

-Saurabh Baji, Senior Vice President of Engineering at Cohere.

The Cohere Medium generation language model, available through SageMaker, provides developers with three key benefits:

  • Build, iterate, and deploy quickly – Cohere empowers any developer (no NLP, ML, or AI expertise required) to quickly get access to a pre-trained, state-of-the-art generation model that understands context and semantics at unprecedented levels. This high-quality, large language model reduces the time-to-value for customers by providing an out-of-the-box solution for a wide range of language understanding tasks.
  • Private and secure – With SageMaker, customers can spin up containers serving Cohere’s models without having to worry about their data leaving these self-managed containers.
  • Speed and accuracy – Cohere’s Medium model offers customers a good balance across quality, cost, and latency. Developers can easily integrate the Cohere Generate endpoint into apps using a simple API and SDK.

Get started with Cohere in SageMaker

Developers can use the visual interface of the SageMaker JumpStart foundation models to test Cohere’s models without writing a single line of code. You can evaluate the model on your specific language understanding task and learn the basics of using generative language models. See Cohere’s documentation and blog for various tutorials and tips-and-tricks related to language modeling.

Deploy the SageMaker endpoint using a notebook

Cohere has packaged Medium models, along with an optimized, low-latency inference framework, in containers that can be deployed as SageMaker inference endpoints. Cohere’s containers can be deployed on a range of different instances (including ml.p3.2xlarge, ml.g5.xlarge, and ml.g5.2xlarge) that offer different cost/performance trade-offs. These containers are currently available in two Regions: us-east-1 and eu-west-1. Cohere intends to expand its offering in the near future, including adding to the number and size of models available, the set of supported tasks (such as the endpoints built on top of these models), the supported instances, and the available Regions.

To help developers get started quickly, Cohere has provided Jupyter notebooks that make it easy to deploy these containers and run inference on the deployed endpoints. With the preconfigured set of constants in the notebook, deploying the endpoint can be easily done with only a couple of lines of code as shown in the following example:

After the endpoint is deployed, users can use Cohere’s SDK to run inference. The SDK can be installed easily from PyPI as follows:

It can also be installed from the source code in Cohere’s public SDK GitHub repository.

After the endpoint is deployed, users can use the Cohere Generate endpoint to accomplish multiple generative tasks, such as text summarization, long-form content generation, entity extraction, or copywriting. The Jupyter notebook and GitHub repository include examples demonstrating some of these use cases.

Conclusion

The availability of Cohere natively on SageMaker via the AWS Marketplace represents a major milestone in the field of NLP. The Cohere model’s ability to generate high-quality, coherent text makes it a valuable tool for anyone working with text data.

If you’re interested in using Cohere for your own SageMaker projects, you can now access it on SageMaker JumpStart. Additionally, you can reference Cohere’s GitHub notebook for instructions on deploying the model and accessing it from the Cohere Generate endpoint.


About the authors

Sudip Roy is Manager of Technical Staff at Cohere, a provider of cutting-edge natural language processing (NLP) technology. Sudip is an accomplished researcher who has published and served on program committees for top conferences like NeurIPS, MLSys, OOPSLA, SIGMOD, VLDB, and SIGKDD, and his work has earned Outstanding Paper awards from SIGMOD and MLSys.

Karthik Bharathy is the product leader for the Amazon SageMaker team with over a decade of product management, product strategy, execution, and launch experience.

Karl Albertsen leads product, engineering, and science for Amazon SageMaker Algorithms and JumpStart, SageMaker’s machine learning hub. He is passionate about applying machine learning to unlock business value.


How CCC Intelligent Solutions created a custom approach for hosting complex AI models using Amazon SageMaker


This post is co-written by Christopher Diaz, Sam Kinard, Jaime Hidalgo, and Daniel Suarez from CCC Intelligent Solutions.

In this post, we discuss how CCC Intelligent Solutions (CCC) combined Amazon SageMaker with other AWS services to create a custom solution capable of hosting the types of complex artificial intelligence (AI) models envisioned. CCC is a leading software-as-a-service (SaaS) platform for the multi-trillion-dollar property and casualty insurance economy powering operations for insurers, repairers, automakers, part suppliers, lenders, and more. CCC cloud technology connects more than 30,000 businesses digitizing mission-critical workflows, commerce, and customer experiences. A trusted leader in AI, Internet of Things (IoT), customer experience, and network and workflow management, CCC delivers innovations that keep people’s lives moving forward when it matters most.

The challenge

CCC processes more than $1 trillion in claims transactions annually. As the company continues to integrate AI into its existing and new product catalog, it needs sophisticated approaches to train and deploy multi-modal machine learning (ML) ensemble models that solve complex business needs. These are a class of models that encapsulate proprietary algorithms and subject matter domain expertise that CCC has honed over the years. These models should be able to ingest new layers of nuanced data and customer rules to create single prediction outcomes. In this blog post, we will learn how CCC leveraged Amazon SageMaker hosting and other AWS services to deploy or host multiple multi-modal models in an ensemble inference pipeline.

As shown in the following diagram, an ensemble is a collection of two or more models that are orchestrated to run in a linear or nonlinear fashion to produce a single prediction. When stacked linearly, the individual models of an ensemble can be directly invoked for predictions and later consolidated for unification. At times, ensemble models can also be implemented as a serial inference pipeline.

For our use case, the ensemble pipeline is strictly nonlinear, as depicted in the following diagram. Nonlinear ensemble pipelines are, in graph terms, directed acyclic graphs (DAGs). For our use case, this DAG pipeline had both independent models that run in parallel (Services B, C) and other models that use predictions from previous steps (Service D).
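To make the DAG idea concrete, the following is a minimal sketch using only the Python standard library. The service names, the presence of an upstream Service A, and the run_service stand-in are assumptions for illustration; the post does not describe CCC's models at this level of detail. Services whose dependencies are satisfied run together in one wave, and a service such as D waits for the outputs of B and C.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each service maps to the services it depends on.
DEPENDENCIES = {
    "service_a": set(),
    "service_b": {"service_a"},
    "service_c": {"service_a"},
    "service_d": {"service_b", "service_c"},
}


def run_service(name, inputs):
    """Stand-in for a model call; returns a string describing its inputs."""
    return f"{name}({', '.join(sorted(inputs))})"


def run_ensemble(dependencies):
    """Run every service after its dependencies, in parallel where possible."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    outputs = {}
    waves = []  # records which services were runnable together
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            waves.append(ready)
            futures = {
                name: pool.submit(
                    run_service, name, [outputs[dep] for dep in dependencies[name]]
                )
                for name in ready
            }
            for name, future in futures.items():
                outputs[name] = future.result()
                sorter.done(name)
    return waves, outputs


waves, outputs = run_ensemble(DEPENDENCIES)
```

In this sketch, waves shows B and C grouped in the same wave, which is the parallelism the diagram describes.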

A practice that comes out of the research-driven culture at CCC is the continuous review of technologies that can be leveraged to bring more value to customers. As CCC faced this ensemble challenge, leadership launched a proof-of-concept (POC) initiative to thoroughly assess the offerings from AWS to discover, specifically, whether Amazon SageMaker and other AWS tools could manage the hosting of individual AI models in complex, nonlinear ensembles.

Ensemble explained: In this context, an ensemble is a group of two or more AI models that work together to produce one overall prediction.

Questions driving the research

Can Amazon SageMaker be used to host complex ensembles of AI models that work together to provide one overall prediction? If so, can SageMaker offer other benefits out of the box, such as increased automation, reliability, monitoring, automatic scaling, and cost-saving measures?

Finding alternative ways to deploy CCC’s AI models using the technological advancements from cloud providers will allow CCC to bring AI solutions to market faster than its competition. Additionally, having more than one deployment architecture provides flexibility when finding the balance between cost and performance based on business priorities.

Based on our requirements, we finalized the following list of features as a checklist for a production-grade deployment architecture:

  • Support for complex ensembles
  • Guaranteed uptime for all components
  • Customizable automatic scaling for deployed AI models
  • Preservation of AI model input and output
  • Usage metrics and logs for all components
  • Cost-saving mechanisms

With a majority of CCC’s AI solutions relying on computer vision models, a new architecture was required to support image and video files that continue to increase in resolution. There was a strong need to design and implement this architecture as an asynchronous model.

After cycles of research and initial benchmarking efforts, CCC determined SageMaker was a strong fit for a majority of their production requirements, especially the guaranteed uptime SageMaker provides for most of its inference components. Because Amazon SageMaker Asynchronous Inference endpoints save input and output in Amazon S3 by default, preserving the data generated by complex ensembles becomes simpler. Additionally, with each AI model hosted by its own endpoint, managing automatic scaling policies at the model or endpoint level becomes easier. A potential cost-saving benefit of this simpler management is that development teams can allocate more time to fine-tuning scaling policies to minimize over-provisioning of compute resources.

Having decided to proceed with using SageMaker as the pivotal component of the architecture, we also realized SageMaker can be part of an even larger architecture, supplemented with many other serverless AWS-managed services. This choice was needed to facilitate the higher-order orchestration and observability needs of this complex architecture.

Firstly, to remove payload size limitations and greatly reduce timeout risk during high-traffic scenarios, CCC implemented an architecture that runs predictions asynchronously using SageMaker Asynchronous Inference endpoints coupled with other AWS-managed services as the core building blocks. Additionally, the user interface for the system follows the fire-and-forget design pattern. In other words, once a user has uploaded their input to the system, nothing more needs to be done. They will be notified when the prediction is available. The figure below illustrates a high-level overview of our asynchronous event-driven architecture. In the upcoming section, let us do a deep dive into the execution flow of the designed architecture.

Step-by-step solution

Step 1

A client makes a request to the AWS API Gateway endpoint. The content of the request contains the name of the AI service from which they need a prediction and the desired method of notification.

This request is passed to a Lambda function called New Prediction, whose main tasks are to:

  • Check if the requested service by the client is available.
  • Assign a unique prediction ID to the request. This prediction ID can be used by the user to check the status of the prediction throughout the entire process.
  • Generate an Amazon S3 pre-signed URL that the user will need to use in the next step to upload the input content of the prediction request.
  • Create an entry in Amazon DynamoDB with the information of the received request.

The Lambda function will then return a response through the API Gateway endpoint with a message that includes the prediction ID assigned to the request and the Amazon S3 pre-signed URL.
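The four tasks of the New Prediction Lambda can be sketched roughly as follows. This is an illustration under stated assumptions, not CCC's implementation: the service names, bucket name, key layout, and DynamoDB attribute names are hypothetical. The S3 client and DynamoDB table are passed in as arguments; in a real Lambda they would typically be boto3.client("s3") and a boto3 DynamoDB Table resource, whose generate_presigned_url and put_item methods are the calls used here.

```python
import json
import time
import uuid

# Hypothetical values; the post does not name the real services or bucket.
AVAILABLE_SERVICES = {"damage-detection", "parts-ensemble"}
INPUT_BUCKET = "example-prediction-inputs"


def new_prediction(event, s3_client, table):
    """Validate the request, assign an ID, create an upload URL, record it."""
    request = json.loads(event["body"])
    service = request["service"]

    # 1. Check that the requested service is available.
    if service not in AVAILABLE_SERVICES:
        return {
            "statusCode": 404,
            "body": json.dumps({"error": f"unknown service: {service}"}),
        }

    # 2. Assign a unique prediction ID.
    prediction_id = str(uuid.uuid4())

    # 3. Generate a pre-signed URL the client uses to upload the input.
    upload_url = s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": INPUT_BUCKET, "Key": f"inputs/{prediction_id}"},
        ExpiresIn=900,  # seconds; an assumed value
    )

    # 4. Record the request so later steps can look it up by prediction ID.
    table.put_item(
        Item={
            "prediction_id": prediction_id,
            "service": service,
            "callback_mode": request["notify"],
            "status": "pending",
            "created_at": int(time.time()),
        }
    )

    return {
        "statusCode": 200,
        "body": json.dumps({"prediction_id": prediction_id, "upload_url": upload_url}),
    }
```

Passing the clients in as parameters keeps the handler logic testable without AWS credentials.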

Step 2

The client securely uploads the prediction input content to an S3 bucket using the pre-signed URL generated in the previous step. Input content depends on the AI service and can be composed of images, tabular data, or a combination of both.

Step 3

The S3 bucket is configured to trigger an event when the user uploads the input content. This notification is sent to an Amazon SQS queue and handled by a Lambda function called Process Input. The Process Input Lambda will obtain the information related to that prediction ID from DynamoDB to get the name of the service to which the request is to be made.

This service can either be a single AI model, in which case the Process Input Lambda will make a request to the SageMaker endpoint that hosts that model (Step 3-A), or it can be an ensemble AI service in which case the Process Input Lambda will make a request to the state machine of the step functions that hosts the ensemble logic (Step 3-B).

In either option (single AI model or ensemble AI service), when the final prediction is ready, it will be stored in the appropriate S3 bucket, and the caller will be notified via the method specified in Step 1 (more details about notifications in Step 4).
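As a rough sketch of the first part of Step 3, the following shows how a Lambda consuming the SQS queue might unpack the S3 upload notifications and choose between the two branches. The nesting of the S3 event inside the SQS message body follows the standard S3 event notification format; the service_type attribute and the routing labels are hypothetical names introduced for this example.

```python
import json
from urllib.parse import unquote_plus


def uploaded_objects(sqs_event):
    """Yield (bucket, key) for every S3 object referenced in an SQS batch."""
    for message in sqs_event["Records"]:
        s3_event = json.loads(message["body"])
        # Test events sent by S3 carry no "Records" key, so default to empty.
        for record in s3_event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            yield bucket, key


def route(service_record):
    """Pick Step 3-A or Step 3-B from a (hypothetical) service record."""
    if service_record["service_type"] == "ensemble":
        return "state_machine"  # Step 3-B
    return "endpoint"  # Step 3-A
```

The prediction ID would then be derived from the object key and used to look up the service record in DynamoDB.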

Step 3-A

If the prediction ID is associated with a single AI model, the Process Input Lambda will make a request to the SageMaker endpoint that serves the model. In this system, two types of SageMaker endpoints are supported:

  • Asynchronous: The Process Input Lambda makes the request to the SageMaker asynchronous endpoint. The immediate response includes the S3 location where SageMaker will save the prediction output. This request is asynchronous, following the fire-and-forget pattern, and does not block the execution flow of the Lambda function.
  • Synchronous: The Process Input Lambda makes the request to the SageMaker synchronous endpoint. Because the request is synchronous, Process Input waits for the response and, once it arrives, stores it in Amazon S3 in the same way a SageMaker asynchronous endpoint would.

In both cases (synchronous or asynchronous endpoints), the prediction is processed in an equivalent way, storing the output in an S3 bucket. When the asynchronous SageMaker endpoint completes a prediction, an Amazon SNS event is triggered. This behavior is also replicated for synchronous endpoints with additional logic in the Lambda function.
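The difference between the two endpoint types can be sketched as below. The runtime argument is assumed to be a boto3 "sagemaker-runtime" client (invoke_endpoint_async and invoke_endpoint are its methods), and s3_client a boto3 S3 client. The endpoint names, content type, and output locations are placeholders, not values from the post.

```python
def invoke_async(runtime, endpoint_name, input_s3_uri):
    """Fire-and-forget: SageMaker reads the input from S3 and writes the output to S3."""
    response = runtime.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=input_s3_uri,
        ContentType="application/json",
    )
    # The response arrives immediately and says where the output will land.
    return response["OutputLocation"]


def invoke_sync(runtime, s3_client, endpoint_name, payload, output_bucket, output_key):
    """Blocking call: wait for the prediction, then store it in S3 ourselves."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        Body=payload,
        ContentType="application/json",
    )
    prediction = response["Body"].read()
    s3_client.put_object(Bucket=output_bucket, Key=output_key, Body=prediction)
    return f"s3://{output_bucket}/{output_key}"
```

Both helpers return an S3 URI, which mirrors the point that downstream steps treat the two endpoint types the same way.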

Step 3-B

If the prediction ID is associated with an AI ensemble, the Process Input Lambda will make the request to the step function associated with that AI ensemble. As mentioned above, an AI ensemble is an architecture based on a group of AI models working together to generate a single overall prediction. The orchestration of an AI ensemble is done through a step function.

The step function has one step per AI service that comprises the ensemble. Each step invokes a Lambda function that prepares its corresponding AI service’s input using different combinations of the output content from the AI service calls of previous steps. It then makes a call to that AI service, which in this context can either be a single AI model or another AI ensemble.

The same Lambda function, called GetTransformCall, handles the intermediate predictions of an AI ensemble throughout the step function, but with different input parameters for each step. This input includes the name of the AI service to be called. It also includes the mapping definition used to construct the input for the specified AI service. The mapping uses a custom syntax that the Lambda can decode; in summary, it is a JSON dictionary whose values are replaced with content from the previous AI predictions. The Lambda downloads these previous predictions from Amazon S3.
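The post does not show the custom mapping syntax, so the following is only a guess at the general shape of such a transform. It assumes a hypothetical convention in which a string value starting with "$." is a dotted path into the previous predictions (already downloaded and keyed by service name), and every other value is passed through unchanged.

```python
def resolve(mapping, previous_outputs):
    """Replace "$.service.field" placeholders with values from earlier predictions."""
    if isinstance(mapping, dict):
        return {key: resolve(value, previous_outputs) for key, value in mapping.items()}
    if isinstance(mapping, list):
        return [resolve(value, previous_outputs) for value in mapping]
    if isinstance(mapping, str) and mapping.startswith("$."):
        value = previous_outputs
        for part in mapping[2:].split("."):
            value = value[part]  # raises KeyError if a prediction is missing
        return value
    return mapping  # literals are kept as-is


# Hypothetical outputs of two earlier services and a mapping for the next one.
previous = {
    "service_b": {"label": "bumper", "score": 0.93},
    "service_c": {"boxes": [[0, 0, 10, 10]]},
}
mapping = {
    "part": "$.service_b.label",
    "regions": "$.service_c.boxes",
    "threshold": 0.5,
}
next_input = resolve(mapping, previous)
```

Keeping the mapping as data means one Lambda can serve every step, with only its input parameters changing.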

In each step, the GetTransformCall Lambda reads from Amazon S3 the previous outputs that are needed to build the input of the specified AI service. It will then invoke the New Prediction Lambda code previously used in Step 1 and provide the service name, callback method (“step function”), and token needed for the callback in the request payload, which is then saved in DynamoDB as a new prediction record. The Lambda also stores the created input of that stage in an S3 bucket. Depending on whether that stage is a single AI model or an AI ensemble, the Lambda makes a request to a SageMaker endpoint or a different step function that manages an AI ensemble that is a dependency of the parent ensemble.

Once the request is made, the step function enters a pending state until it receives the callback token indicating it can move to the next stage. The action of sending a callback token is performed by a Lambda function called notifications (more details in Step 4) when the intermediate prediction is ready. This process is repeated for each stage defined in the step function until the final prediction is ready.
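A step that pauses until a callback token is returned corresponds to the Step Functions "wait for task token" integration pattern. The sketch below builds one such task state as a Python dictionary in Amazon States Language form. The Resource string and the "$$.Task.Token" context reference are standard Step Functions syntax; the function name, payload field names, and state names are hypothetical.

```python
import json


def callback_task_state(function_name, service_name, next_state=None):
    """Build a task state that pauses until the task token is sent back."""
    state = {
        "Type": "Task",
        # The .waitForTaskToken suffix makes the state wait for a callback.
        "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
        "Parameters": {
            "FunctionName": function_name,
            "Payload": {
                "service_name": service_name,
                # Injects the token that must accompany the callback.
                "task_token.$": "$$.Task.Token",
            },
        },
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


definition = {
    "StartAt": "CallServiceB",
    "States": {
        "CallServiceB": callback_task_state("GetTransformCall", "service_b", "CallServiceD"),
        "CallServiceD": callback_task_state("GetTransformCall", "service_d"),
    },
}
definition_json = json.dumps(definition, indent=2)
```

For brevity this chains two steps in sequence; parallel branches would need a Parallel state, which is not shown.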

Step 4

When a prediction is ready and stored in the S3 bucket, an SNS notification is triggered. This event can be triggered in different ways depending on the flow:

  1. Automatically when a SageMaker asynchronous endpoint completes a prediction.
  2. As the very last step of the step function.
  3. By Process Input or GetTransformCall Lambda when a synchronous SageMaker endpoint has returned a prediction.

For cases 2 and 3, we create an SNS message similar to the one that case 1 sends automatically.

A Lambda function called notifications is subscribed to this SNS topic. The notifications Lambda will get the information related to the prediction ID from DynamoDB, update the entry’s status to “completed” or “error,” and perform the necessary action depending on the callback mode saved in the database record.

If this prediction is an intermediate prediction of an AI ensemble, as described in step 3-B, the callback mode associated to this prediction will be “step function,” and the database record will have a callback token associated with the specific step in the step function. The notifications Lambda will make a call to the AWS Step Functions API using the method “SendTaskSuccess” or “SendTaskFailure.” This will allow the step function to continue to the next step or exit.
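The callback itself can be sketched as follows. The sfn_client argument is assumed to be a boto3 "stepfunctions" client, whose send_task_success and send_task_failure methods correspond to the API actions named above. The record attribute name and the error code are hypothetical.

```python
import json


def resume_step_function(sfn_client, record, succeeded, detail):
    """Return the callback token so the waiting step can continue or fail."""
    token = record["callback_token"]  # hypothetical attribute name
    if succeeded:
        # The output must be a JSON string; it becomes the step's result.
        sfn_client.send_task_success(taskToken=token, output=json.dumps(detail))
    else:
        sfn_client.send_task_failure(
            taskToken=token,
            error="PredictionFailed",  # assumed error code
            cause=str(detail),
        )
```

A caller might pass the S3 location of the intermediate prediction as detail, so that the next step knows where to read it.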

If the prediction is the final output of the step function and the callback mode is “Webhook” [or email, message brokers (Kafka), etc.], then the notifications Lambda will notify the client in the specified way. At any point, the user can request the status of their prediction. The request must include the prediction ID that was assigned in Step 1 and point to the correct URL within API Gateway to route the request to the Lambda function called results.

The results Lambda will make a request to DynamoDB, obtaining the status of the request and returning the information to the user. If the status of the prediction is error, then the relevant details on the failure will be included in the response. If the prediction status is success, an S3 pre-signed URL will be returned for the user to download the prediction content.
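A minimal sketch of the results lookup is shown below, assuming a boto3 DynamoDB Table resource and S3 client are passed in. The attribute names (status, error_detail, output_bucket, output_key) and the use of "completed" as the success status are assumptions made for this example.

```python
def get_result(prediction_id, table, s3_client):
    """Return the prediction status plus error details or a download URL."""
    item = table.get_item(Key={"prediction_id": prediction_id}).get("Item")
    if item is None:
        return {"prediction_id": prediction_id, "status": "not_found"}

    response = {"prediction_id": prediction_id, "status": item["status"]}
    if item["status"] == "error":
        response["error"] = item.get("error_detail", "unknown failure")
    elif item["status"] == "completed":
        # Pre-signed URL lets the user download the output without AWS credentials.
        response["download_url"] = s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": item["output_bucket"], "Key": item["output_key"]},
            ExpiresIn=900,  # seconds; an assumed value
        )
    return response
```

Any other status, such as a prediction still in progress, is returned without extra fields.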

Outcomes

Preliminary performance testing results are promising and support the case for CCC to extend the implementation of this new deployment architecture.

Notable observations:

  • Tests reveal strength in processing batch or concurrent requests with high throughput and a 0 percent failure rate during high traffic scenarios.
  • Message queues provide stability within the system during sudden influxes of requests until scaling triggers can provision additional compute resources. When increasing traffic by 3x, average request latency only increased by 5 percent.
  • The price of stability is increased latency due to the communication overhead between the various system components. When user traffic is above the baseline threshold, the added latency can be partially mitigated by providing more compute resources if performance is a higher priority over cost.
  • SageMaker’s asynchronous inference endpoints allow the instance count to be scaled to zero while keeping the endpoint active to receive requests. This functionality enables deployments to continue running without incurring compute costs and scale up from zero when needed in two scenarios: service deployments used in lower test environments and those that have minimal traffic without requiring immediate processing.
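Scaling an asynchronous endpoint down to zero is configured through Application Auto Scaling. The sketch below only builds the request arguments; they would be passed to the register_scalable_target and put_scaling_policy methods of a boto3 "application-autoscaling" client. The endpoint name, variant name, capacity limit, and backlog target are placeholder values, and the policy shown is one possible choice rather than CCC's configuration.

```python
def scale_to_zero_requests(endpoint_name, variant_name="AllTraffic",
                           max_instances=4, backlog_target=5.0):
    """Build Application Auto Scaling requests for an async endpoint."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    # MinCapacity=0 is what allows the instance count to drop to zero.
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": 0,
        "MaxCapacity": max_instances,
    }

    # Track the queue backlog per instance published by async endpoints.
    policy = {
        "PolicyName": f"{endpoint_name}-backlog-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": backlog_target,
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
        },
    }
    return target, policy


target, policy = scale_to_zero_requests("example-async-endpoint")
# Usage (requires AWS credentials):
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**target)
#   client.put_scaling_policy(**policy)
```

The AWS documentation also describes an additional scaling policy for waking an endpoint that currently has zero instances; that part is not shown here.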

Conclusion

As observed during the POC process, the innovative design jointly created by CCC and AWS provides a solid foundation for using Amazon SageMaker with other AWS managed services to host complex multi-modal AI ensembles and orchestrate inference pipelines effectively and seamlessly. By leveraging Amazon SageMaker’s out-of-the-box functionalities like Asynchronous Inference, CCC has more opportunities to focus on specialized business-critical tasks. In the spirit of CCC’s research-driven culture, this novel architecture will continue to evolve as CCC leads the way forward, alongside AWS, in unleashing powerful new AI solutions for clients.

For detailed steps on how to create, invoke, and monitor asynchronous inference endpoints, refer to the documentation, which also contains a sample notebook to help you get started. For pricing information, visit Amazon SageMaker Pricing.

For examples on using asynchronous inference with unstructured data such as computer vision and natural language processing (NLP), refer to Run computer vision inference on large videos with Amazon SageMaker asynchronous endpoints and Improve high-value research with Hugging Face and Amazon SageMaker asynchronous inference endpoints, respectively.


About the Authors

Christopher Diaz is a Lead R&D Engineer at CCC Intelligent Solutions. As a member of the R&D team, he has worked on a variety of projects ranging from ETL tooling, backend web development, collaborating with researchers to train AI models on distributed systems, and facilitating the delivery of new AI services between research and operations teams. His recent focus has been on researching cloud tooling solutions to enhance various aspects of the company’s AI model development lifecycle. In his spare time, he enjoys trying new restaurants in his hometown of Chicago and collecting as many LEGO sets as his home can fit. Christopher earned his Bachelor of Science in Computer Science from Northeastern Illinois University.

Emmy Award winner Sam Kinard is a Senior Manager of Software Engineering at CCC Intelligent Solutions. Based in Austin, Texas, he wrangles the AI Runtime Team, which is responsible for serving CCC’s AI products at high availability and large scale. In his spare time, Sam enjoys being sleep deprived because of his two wonderful children. Sam has a Bachelor of Science in Computer Science and a Bachelor of Science in Mathematics from the University of Texas at Austin.

Jaime Hidalgo is a Senior Systems Engineer at CCC Intelligent Solutions. Before joining the AI research team, he led the company’s global migration to Microservices Architecture, designing, building, and automating the infrastructure in AWS to support the deployment of cloud products and services. Currently, he builds and supports an on-premises data center cluster built for AI training and also designs and builds cloud solutions for the company’s future of AI research and deployment.

Daniel Suarez is a Data Science Engineer at CCC Intelligent Solutions. As a member of the AI Engineering team, he works on the automation and preparation of AI Models in the production, evaluation, and monitoring of metrics and other aspects of ML operations. Daniel received a Master’s in Computer Science from the Illinois Institute of Technology and a Master’s and Bachelor’s in Telecommunication Engineering from Universidad Politecnica de Madrid.

Arunprasath Shankar is a Senior AI/ML Specialist Solutions Architect with AWS, helping global customers scale their AI solutions effectively and efficiently in the cloud. In his spare time, Arun enjoys watching sci-fi movies and listening to classical music.

Justin McWhirter is a Solutions Architect Manager at AWS. He works with a team of amazing Solutions Architects who help customers have a positive experience while adopting the AWS platform. When not at work, Justin enjoys playing video games with his two boys, ice hockey, and off-roading in his Jeep.


Set up Amazon SageMaker Studio with Jupyter Lab 3 using the AWS CDK


Amazon SageMaker Studio is a fully integrated development environment (IDE) for machine learning (ML) partly based on JupyterLab 3. Studio provides a web-based interface to interactively perform ML development tasks required to prepare data and build, train, and deploy ML models. In Studio, you can load data, adjust ML models, move in between steps to adjust experiments, compare results, and deploy ML models for inference.

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to create AWS CloudFormation stacks through automatic CloudFormation template generation. A stack is a collection of AWS resources that can be programmatically updated, moved, or deleted. AWS CDK constructs are the building blocks of AWS CDK applications, representing the blueprint to define cloud architectures.

Setting up Studio with AWS CDK has become a streamlined process. The AWS CDK allows you to use native constructs to define and deploy Studio using infrastructure as code (IaC), including AWS Identity and Access Management (AWS IAM) permissions and desired cloud resource configurations, all in one place. This development approach can be used in combination with other common software engineering best practices such as automated code deployments, tests, and CI/CD pipelines. The AWS CDK reduces the time required to perform typical infrastructure deployment tasks while shrinking the surface area for human error through automation.

This post guides you through the steps to get started with setting up and deploying Studio to standardize ML model development and collaboration with fellow ML engineers and ML scientists. All examples in the post are written in the Python programming language. However, the AWS CDK offers built-in support for multiple other programming languages like JavaScript, Java and C#.

Prerequisites

To get started, the following prerequisites apply:

Clone the GitHub repository

First, let’s clone the GitHub repository.

When the repository is successfully pulled, you may inspect the cdk directory containing the following resources:

  • cdk – Contains the main cdk resources
  • app.py – Where the AWS CDK stack is defined
  • cdk.json – Contains metadata and feature flags

AWS CDK scripts

The two main files we want to look at in the cdk subdirectory are sagemaker_studio_construct.py and sagemaker_studio_stack.py. Let’s look at each file in more detail.

Studio construct file

The Studio construct is defined in the sagemaker_studio_construct.py file.

The Studio construct takes in the virtual private cloud (VPC), listed users, AWS Region, and underlying default instance type as parameters. This AWS CDK construct serves the following functions:

  • Creates the Studio domain (SageMakerStudioDomain)
  • Sets the IAM role sagemaker_studio_execution_role with AmazonSageMakerFullAccess permissions required to create resources. Permissions need to be scoped down further to follow the least privilege principle for improved security.
  • Sets Jupyter server app settings – takes in JUPYTER_SERVER_APP_IMAGE_NAME, defining the jupyter-server-3 container image to be used.
  • Sets kernel gateway app settings – takes in KERNEL_GATEWAY_APP_IMAGE_NAME, defining the datascience-2.0 container image to be used.
  • Creates a user profile for each listed user

The following code snippet shows the relevant Studio domain AWS CloudFormation resources defined in AWS CDK:

sagemaker_studio_domain = sagemaker.CfnDomain(
    self,
    "SageMakerStudioDomain",
    auth_mode="IAM",
    default_user_settings=sagemaker.CfnDomain.UserSettingsProperty(
        execution_role=self.sagemaker_studio_execution_role.role_arn,
        jupyter_server_app_settings=sagemaker.CfnDomain.JupyterServerAppSettingsProperty(
            default_resource_spec=sagemaker.CfnDomain.ResourceSpecProperty(
                instance_type="system",
                sage_maker_image_arn=get_sagemaker_image_arn(
                    JUPYTER_SERVER_APP_IMAGE_NAME, aws_region
                ),
            )
        ),
        kernel_gateway_app_settings=sagemaker.CfnDomain.KernelGatewayAppSettingsProperty(
            default_resource_spec=sagemaker.CfnDomain.ResourceSpecProperty(
                instance_type=default_instance_type,
                sage_maker_image_arn=get_sagemaker_image_arn(
                    KERNEL_GATEWAY_APP_IMAGE_NAME, aws_region
                ),
            ),
        ),
        security_groups=[vpc.vpc_default_security_group],
        sharing_settings=sagemaker.CfnDomain.SharingSettingsProperty(
            notebook_output_option="Disabled"
        ),
    ),
    domain_name="SageMakerStudioDomain",
    subnet_ids=private_subnets,
    vpc_id=vpc.vpc_id,
    app_network_access_type="VpcOnly",
)

The following code snippet shows the user profiles created from AWS CloudFormation resources:

for user_name in user_names:
    sagemaker.CfnUserProfile(
        self,
        "SageMakerStudioUserProfile_" + user_name,
        domain_id=sagemaker_studio_domain.attr_domain_id,
        user_profile_name=user_name,
    )

Studio stack file

After the construct has been defined, you can add it by creating an instance of the class and passing the required arguments inside of the stack. The stack creates the AWS CloudFormation resources as part of one coherent deployment. This means that if at least one cloud resource fails to be created, the CloudFormation stack rolls back any changes performed. The following code snippet shows the Studio construct instantiated inside the Studio stack:

class SagemakerStudioStack(Stack):
    def __init__(
        self,
        scope: Construct,
        construct_id: str,
        **kwargs,
    ) -> None:
        super().__init__(scope, construct_id, **kwargs)
        vpc = ec2.Vpc(self, "SageMakerStudioVpc")
        SageMakerStudio(self, "SageMakerStudio", vpc=vpc, aws_region=self.region)

Deploy the AWS CDK stack

To deploy your AWS CDK stack, run the following commands from the project’s root directory within your terminal window:

aws configure
pip3 install -r requirements.txt
cdk bootstrap --app "python3 -m cdk.app"
cdk deploy --app "python3 -m cdk.app"

Review the resources the AWS CDK creates in your AWS account and select yes when prompted to deploy the stack. Wait for your stack deployment to finish. This typically takes less than 5 minutes; however, adding more resources will prolong deployment time. You can also check the deployment status on the AWS CloudFormation console.

Stack creation in CloudFormation

When the stack has been successfully deployed, check its information by going to the Studio Control Panel. You should see the SageMaker Studio user profile you created.

Default user profile listed

If you redeploy the stack, the AWS CDK checks for changes and performs only the necessary cloud resource updates. For example, you can add users or change their permissions without having to recreate all of the defined cloud resources.

Cleanup

To delete a stack, complete the following steps:

  1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  2. Open the stack you want to delete.
  3. In the stack details pane, choose Delete.
  4. Choose Delete stack when prompted.

AWS CloudFormation will delete the resources created when the stack was deployed. This may take some time depending on the number of resources created.

If you encounter any issues going through these cleanup steps, you may need to manually delete the Studio domain first before repeating the steps in this section.

Conclusion

In this post, we showed how to use AWS cloud-native IaC resources to build an easily reusable template for Studio deployments. SageMaker Studio is a fully integrated web-based IDE that provides a visual interface for ML development tasks based on JupyterLab 3. With AWS CDK stacks, we were able to define constructs for building out cloud components that can be easily modified, edited, or deleted by making changes to the underlying CloudFormation stack.

For more information about Studio, see Amazon SageMaker Studio.


About the Authors

Cory Hairston is a Software Engineer at the Amazon ML Solutions Lab. He is ardent about learning new technologies and leveraging that information to build reusable software solutions. He is an avid power-lifter and spends his free time making digital art.

Marcelo Aberle is an ML Engineer in the AWS AI organization. He is leading MLOps efforts at the Amazon ML Solutions Lab, helping customers design and implement scalable ML systems. His mission is to guide customers on their enterprise ML journey and accelerate their ML path to production.

Yash Shah is a Science Manager in the Amazon ML Solutions Lab. He and his team of applied scientists and machine learning engineers work on a range of machine learning use cases from healthcare, sports, automotive and manufacturing.

Churn prediction using multimodality of text and tabular features with Amazon SageMaker Jumpstart

Amazon SageMaker JumpStart is the Machine Learning (ML) hub of SageMaker providing pre-trained, publicly available models for a wide range of problem types to help you get started with machine learning.

Understanding customer behavior is top of mind for every business today. Gaining insights into why and how customers buy can help grow revenue. Customer churn is a problem faced by a wide range of companies, from telecommunications to banking, where customers are typically lost to competitors. It’s in a company’s best interest to retain existing customers instead of acquiring new customers, because it usually costs significantly more to attract new customers. When trying to retain customers, companies often focus their efforts on customers who are more likely to leave. User behavior and customer support chat logs can contain valuable indicators on the likelihood of a customer ending the service. In this solution, we train and deploy a churn prediction model that uses a state-of-the-art natural language processing (NLP) model to find useful signals in text. In addition to textual inputs, this model uses traditional structured data inputs such as numerical and categorical fields.

Multimodality is a multi-disciplinary research field that addresses some of the original goals of artificial intelligence by integrating and modeling multiple modalities. This post aims to build a model that can process and relate information from multiple modalities such as tabular and textual features.

We show you how to train, deploy and use a churn prediction model that has processed numerical, categorical, and textual features to make its prediction. Although we dive deep into a churn prediction use case in this post, you can use this solution as a template to generalize fine-tuning pre-trained models with your own dataset, and subsequently run hyperparameter optimization (HPO) to improve accuracy. You can even replace the example dataset with your own and run it end to end to solve your own use cases. The solution outlined in the post is available on GitHub.

JumpStart solution templates

Amazon SageMaker JumpStart provides one-click, end-to-end solutions for many common ML use cases. Explore the following use cases for more information on available solution templates:

The JumpStart solution templates cover a variety of use cases, under each of which several different solution templates are offered (this Document Understanding solution is under the “Extract and analyze data from documents” use case).

Choose the solution template that best fits your use case from the JumpStart landing page. For more information on specific solutions under each use case and how to launch a JumpStart solution, see Solution Templates.

Solution overview

The following figure demonstrates how you can use this solution with Amazon SageMaker components. The SageMaker training jobs are used to train the various NLP models, and SageMaker endpoints are used to deploy the models in each stage. We use Amazon Simple Storage Service (Amazon S3) alongside SageMaker to store the training data and model artifacts, and Amazon CloudWatch to log training and endpoint outputs.

We approach solving the churn prediction problem with the following steps:

  1. Data exploration to prepare the data to be ML ready.
  2. Train a multimodal model with a Hugging Face sentence transformer and Scikit-learn random forest classifier.
  3. Further improve the model performance with HPO using SageMaker automatic model tuning.
  4. Train two AutoGluon multimodal models: an AutoGluon multimodal weighted/stacked ensemble model, and an AutoGluon multimodal fusion model.
  5. Evaluate and compare the model performances on the holdout test data.

Prerequisites

To try out the solution in your own account, make sure that you have the following in place:

  • An AWS account. If you don’t have an account, you can sign up for one.
  • The solution outlined in the post is part of SageMaker JumpStart. To run this JumpStart solution and have the infrastructure deploy to your AWS account, you must create an active Amazon SageMaker Studio instance (see Onboard to Amazon SageMaker Studio). When your Studio instance is ready, use the instructions in JumpStart to launch the solution.
  • When running this notebook on Studio, you should make sure the Python 3 (PyTorch 1.10 Python 3.8 CPU Optimized) image/kernel is used.

You can install the required packages as outlined in the solution to run this notebook.

Open the churn prediction use case

On the Studio console, choose Solutions, models, example notebooks under Quick start solutions in the navigation pane. Navigate to the Churn Prediction with Text solution in JumpStart.

Now we can take a closer look at some of the assets that are included in this solution.

Data exploration

First, let’s download the test, validation, and training datasets from the source S3 bucket and upload them to our S3 bucket. The following screenshot shows 10 observations of the training data.

Let’s begin exploring the training and validation datasets.

As you can see, we have different features such as CustServ Calls, Day Charge, and Day Calls that we use to predict the target column y (whether the customer left the service).

y is known as the target attribute: the attribute that we want the ML model to predict. Because the target attribute is binary, our model performs binary prediction, also known as binary classification.

There are 21 features, including the target variable. The training and validation datasets contain 43,000 and 5,000 examples, respectively.

The following screenshot shows the summary statistics of the training dataset.

We have explored the dataset and split it into training, validation, and test sets. The training and validation sets are used for training and HPO. The test set is used as the holdout set for model performance evaluation. We now carry out feature engineering steps and then fit the model.

Fit a multimodal model with a Hugging Face sentence transformer and Scikit-learn random forest classifier

The model training consists of two components: a feature engineering step that processes numerical, categorical, and text features, and a model fitting step that fits the transformed features into a Scikit-learn random forest classifier.

For the feature engineering, we complete the following steps:

  1. Fill in the missing values for numerical features.
  2. Encode categorical features into one-hot values, where the missing values are counted as one of the categories for each feature.
  3. Use a Hugging Face sentence transformer to encode the text feature into an X-dimensional dense vector, where the value of X depends on the particular sentence transformer.
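The three steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the solution's actual entry point: mean imputation, the `__missing__` category label, and the `toy_encoder` stand-in are all assumptions made for the example. In practice, the `encode_text` argument would be something like the `encode` method of a sentence-transformers model.

```python
from statistics import mean
from typing import Callable, Optional, Sequence

MISSING = "__missing__"  # hypothetical label for the extra "missing" category


def impute_mean(values: Sequence[Optional[float]]) -> list:
    """Replace None with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if observed else 0.0
    return [fill if v is None else float(v) for v in values]


def one_hot(values: Sequence[Optional[str]]) -> tuple:
    """One-hot encode a column, treating None as its own category."""
    filled = [MISSING if v is None else v for v in values]
    categories = sorted(set(filled))
    rows = [[int(v == c) for c in categories] for v in filled]
    return categories, rows


def build_features(numerical, categorical, texts, encode_text: Callable):
    """Concatenate imputed numbers, one-hot columns, and text vectors per row."""
    num_cols = [impute_mean(col) for col in numerical]
    cat_cols = [one_hot(col)[1] for col in categorical]
    features = []
    for i, text in enumerate(texts):
        row = [col[i] for col in num_cols]
        for col in cat_cols:
            row.extend(col[i])
        row.extend(encode_text(text))
        features.append(row)
    return features


def toy_encoder(text: str) -> list:
    """Stand-in for a sentence transformer: returns a 2-dimensional vector."""
    return [float(len(text)), float(text.count(" ") + 1)]


rows = build_features(
    numerical=[[1.0, None, 3.0]],
    categorical=[["A", None, "B"]],
    texts=["great service", "bad", "ok ok ok"],
    encode_text=toy_encoder,
)
print(rows[1])  # [2.0, 0, 0, 1, 3.0, 1.0]
```

The resulting rows are plain numeric vectors, which is the form a random forest classifier expects as input.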

We choose the top three most downloaded sentence transformer models and use them in the following model fitting and HPO. Specifically, we use all-MiniLM-L6-v2, multi-qa-mpnet-base-dot-v1, and paraphrase-MiniLM-L6-v2. For hyperparameters of the random forest classifier, refer to the GitHub repo.

The following figure depicts the model architecture diagram.

There are many hyperparameters you can tune, such as n-estimators, max-depth, and bootstrap. For more details, refer to the GitHub repo.

For demonstration purposes, we only use the numerical features CustServ Calls and Account Length, the categorical features plan and limit, and the text feature text to fit the model. Multiple feature names should be separated by a comma (,).

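from pathlib import Path
from sagemaker.pytorch import PyTorch
# config and utils are assumed to be the helper modules that ship with the solution
hyperparameters = {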
    "n-estimators": 50,
    "min-impurity-decrease": 0.0,
    "ccp-alpha": 0.0,   
    "sentence-transformer": "sentence-transformers/all-MiniLM-L6-v2",
    "criterion": "gini",
    "max-depth": 6,
    "boostrap": "True",
    "min-samples-split": 4,
    "min-samples-leaf": 1,
    "balanced-data": True,
    "numerical-feature-names": "CustServ Calls,Account Length",
    "categorical-feature-names": "plan,limit",
    "textual-feature-names": "text",
    "label-name": "y"
}
current_folder = utils.get_current_folder(globals())
estimator = PyTorch(
    framework_version='1.5.0',
    py_version='py3',
    entry_point='entry_point.py',
    source_dir=str(Path(current_folder, '../containers/huggingface_transformer_randomforest').resolve()),
    hyperparameters=hyperparameters,
    role=config.IAM_ROLE,
    instance_count=1,
    instance_type=config.TRAINING_INSTANCE_TYPE,
    output_path='s3://' + str(Path(config.S3_BUCKET, config.OUTPUTS_S3_PREFIX_RF)),
    code_location='s3://' + str(Path(config.S3_BUCKET, config.OUTPUTS_S3_PREFIX_RF)),
    base_job_name=config.SOLUTION_PREFIX,
    tags=[{'Key': config.TAG_KEY, 'Value': config.SOLUTION_PREFIX}],
    sagemaker_session=sagemaker_session,
    volume_size=30
)
estimator.fit({
    'train': 's3://' + str(Path(config.S3_BUCKET, config.DATASETS_S3_PREFIX, 'train.jsonl')),
    'validation': 's3://' + str(Path(config.S3_BUCKET, config.DATASETS_S3_PREFIX, 'validation.jsonl'))
})

We deploy the model after training is complete:

from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer
predictor = estimator.deploy(
    endpoint_name=endpoint_name,
    instance_type=config.HOSTING_INSTANCE_TYPE,
    initial_instance_count=1,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer()
)

When calling our new endpoint from the notebook, we use a SageMaker SDK Predictor. A Predictor is used to send data to an endpoint (as part of a request) and interpret the response. JSON is used as the format for both input data and output response because it’s a standard endpoint format and the endpoint response can contain nested data structures.

With our model successfully deployed and our predictor configured, we can try out the churn prediction model on an example input:

data = {
    "CustServ Calls": -20.0,
    "Account Length": 133.12,
    "plan": "D",
    "limit": "unlimited",
    "text": "Well, I've been dealing with TelCom for three months now, and I feel like they're very helpful and responsive to my issues, but for a month now, I've only had one technical support call and that was very long and involved. My phone number was wrong on both contracts, and they gave me a chance to work with TelCom customer service and it was extremely helpful, so I've decided to stick with it. But I would like to have more help in terms of technical support, I haven't had the kind of help with my phone line and I don't have the type of tech support I want. So I would like to negotiate a phone contract, maybe an upgrade from a Sprint plan, or maybe from a Verizon plan.\nTelCom Agent: Very good."
}
response = predictor.predict(data=[data])

The following code shows the response (probability of churn) from querying the endpoint:

20.09% probability of churn

Note that the probability returned by this model has not been calibrated. When the model gives a probability of churn of 20%, for example, this doesn’t necessarily mean that 20% of the customers who were assigned a 20% probability actually churned. Calibration is a useful property in certain circumstances, but isn’t required in cases where discrimination between cases of churn and non-churn is sufficient. CalibratedClassifierCV from Scikit-learn can be used to calibrate a model.
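One way to see whether predicted probabilities are calibrated is to bin them and compare the mean predicted probability in each bin with the observed churn rate. The following is a minimal sketch of such a reliability table; the function name, the equal-width binning, and the toy data are assumptions for illustration and are not part of the solution.

```python
def reliability_table(probs, labels, n_bins=5):
    """Compare mean predicted probability with observed churn rate per bin.

    Returns a list of (bin_index, count, mean_predicted, observed_rate)
    tuples for every non-empty equal-width bin over [0, 1].
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    table = []
    for idx, items in enumerate(bins):
        if not items:
            continue
        mean_pred = sum(p for p, _ in items) / len(items)
        observed = sum(y for _, y in items) / len(items)
        table.append((idx, len(items), mean_pred, observed))
    return table


# Toy data: five customers scored 0.25 (one churned), two scored 0.9 (both churned)
probs = [0.25] * 5 + [0.9] * 2
labels = [1, 0, 0, 0, 0, 1, 1]
for row in reliability_table(probs, labels):
    print(row)
```

For a well-calibrated model, the last two values in each row are close to each other. Large gaps suggest that a calibration step such as CalibratedClassifierCV may be worthwhile.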

Now we query the endpoint using the hold-out test data, which consists of 1,939 examples. The following table summarizes the evaluation results for our multimodal model with a Hugging Face sentence transformer and Scikit-learn random forest classifier.

| Metric | BERT + Random Forest |
| --- | --- |
| Accuracy | 0.77463 |
| ROC AUC | 0.75905 |
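For reference, the two metrics reported here can be computed from endpoint predictions roughly as follows. This is a dependency-free sketch; the 0.5 decision threshold is an assumption, and in a notebook you would more likely use the equivalent Scikit-learn functions.

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    preds = [int(p >= threshold) for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, probs):
    """Probability that a random churner is scored above a random non-churner.

    Ties count as half a win. This pairwise form is quadratic in the number
    of examples, which is acceptable for a test set of a few thousand rows.
    """
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(accuracy(labels, probs))  # 0.75
print(roc_auc(labels, probs))   # 0.75
```

Accuracy depends on the chosen threshold, whereas ROC AUC only depends on how the model ranks churners relative to non-churners.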

Model performance is dependent on hyperparameter configurations. Training a model with one set of hyperparameter configurations will not guarantee an optimal model. As a result, we run the HPO process in the following section to further improve model performance.

Fit a multimodal model with HPO

In this section, we further improve the model performance by adding HPO tuning with SageMaker automatic model tuning. SageMaker automatic model tuning, also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset using the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that result in a model that performs the best, as measured by a metric that you choose. The best model and its corresponding hyperparameters are selected on the validation data. Next, the best model is evaluated on the hold-out test data, which is the same test data we created in the previous section. Finally, we show that the performance of the model trained with HPO is significantly better than the one trained without HPO.

The following code defines the static hyperparameters that we don’t tune, and the dynamic hyperparameters that we want to tune along with their search ranges:

from sagemaker.tuner import ContinuousParameter, IntegerParameter, CategoricalParameter, HyperparameterTuner
hyperparameters = {
    "min-impurity-decrease": 0.0,
    "ccp-alpha": 0.0,
    "numerical-feature-names": "CustServ Calls,Account Length",
    "categorical-feature-names": "plan,limit",
    "textual-feature-names": "text",
    "label-name": "y"
}
hyperparameter_ranges = {
    "sentence-transformer": CategoricalParameter([
    "sentence-transformers/all-MiniLM-L6-v2", "sentence-transformers/multi-qa-mpnet-base-dot-v1", "sentence-transformers/paraphrase-MiniLM-L6-v2"]
    ),
    "criterion": CategoricalParameter(["gini", "entropy"]),
    "max-depth": CategoricalParameter([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, -1]),
    "boostrap": CategoricalParameter(["True", "False"]),
    "min-samples-split": IntegerParameter(2, 10),
    "min-samples-leaf": IntegerParameter(1, 5),
    "n-estimators": CategoricalParameter([100, 200, 400, 800, 1000]),
}
tuning_job_name = f"{config.SOLUTION_PREFIX}-hpo"
current_folder = utils.get_current_folder(globals())
estimator = PyTorch(
    framework_version='1.5.0',
    py_version='py3',
    entry_point='entry_point.py',
    source_dir=str(Path(current_folder, '../containers/huggingface_transformer_randomforest').resolve()),
    hyperparameters=hyperparameters,
    role=config.IAM_ROLE,
    instance_count=1,
    instance_type=config.TRAINING_INSTANCE_TYPE,
    output_path='s3://' + str(Path(config.S3_BUCKET, config.OUTPUTS_S3_PREFIX_RF)),
    code_location='s3://' + str(Path(config.S3_BUCKET, config.OUTPUTS_S3_PREFIX_RF)),
    tags=[{'Key': config.TAG_KEY, 'Value': config.SOLUTION_PREFIX}],
    sagemaker_session=sagemaker_session,
    volume_size=30
)

We define the objective metric name, metric definition (with regex pattern), and objective type for the tuning job.

First, we set the objective as the ROC AUC score on the validation data and define the metrics for the tuning job by specifying the objective metric name and a regular expression (regex). The regular expression is used to match the algorithm’s log output and capture the numeric values of metrics.

objective_metric_name = "roc auc"
metric_definitions = [{"Name": "roc auc", "Regex": r"roc auc score on validation data: ([0-9\.]+)"}]
objective_type = "Maximize"
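To illustrate how such a regex pulls a metric value out of the training log, the following sketch applies the same pattern to a made-up log snippet. The log lines and the helper name are hypothetical, and taking the last match is a choice made for this example only.

```python
import re

# Same pattern as in the metric definition, written as a raw string
METRIC_REGEX = r"roc auc score on validation data: ([0-9\.]+)"


def extract_metric(log_text, pattern=METRIC_REGEX):
    """Return the last value captured by the pattern, or None if absent."""
    matches = re.findall(pattern, log_text)
    return float(matches[-1]) if matches else None


# Hypothetical log output from the training script
sample_log = (
    "fitting random forest\n"
    "roc auc score on validation data: 0.8123\n"
)
print(extract_metric(sample_log))  # 0.8123
```

If the training script prints the metric in a different format, the regex has to be adjusted to match, otherwise the tuning job has no objective value to optimize.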

Next, we specify hyperparameter ranges to select the best hyperparameter values from. We set the total number of tuning jobs to 18 and run up to three of them in parallel, each on its own Amazon Elastic Compute Cloud (Amazon EC2) instance.

Finally, we instantiate a SageMaker Estimator object with those values, similar to what we did in the previous training step. Instead of calling the fit function of the Estimator object, we pass the Estimator object as a parameter to the HyperparameterTuner constructor and call its fit function to launch the tuning jobs:

tuner = HyperparameterTuner(
    estimator,
    objective_metric_name,
    hyperparameter_ranges,
    metric_definitions,
    max_jobs=18, # increasing the maximum number of jobs will likely improve performance
    max_parallel_jobs=3,
    objective_type=objective_type,
    base_tuning_job_name=tuning_job_name,
)
# Launch a SageMaker Tuning job to search for the best hyperparameters
tuner.fit(
    {'train': 's3://' + str(Path(config.S3_BUCKET, config.DATASETS_S3_PREFIX, 'train.jsonl')), 'validation': 's3://' + str(Path(config.S3_BUCKET, config.DATASETS_S3_PREFIX, 'validation.jsonl'))},
    logs=True
)

When the tuning job is complete, we can generate the summary table of all the tuning jobs.

After the tuning jobs are complete, we deploy the model that gives the best evaluation metric score on the validation dataset, perform inference on the same hold-out test dataset we did in the previous section, and compute evaluation metrics.

| Metric | BERT + Random Forest | BERT + Random Forest with HPO |
| --- | --- | --- |
| Accuracy | 0.77463 | 0.9278 |
| ROC AUC | 0.75905 | 0.79861 |

We can see running HPO with SageMaker automatic model tuning significantly improves the model performance.

In addition to HPO, model performance is also dependent on the algorithm. It’s important to train multiple state-of-the-art algorithms, compare their performance on the same hold-out test data, and pick the best one. Therefore, we train two more AutoGluon multimodal models in the following sections.

Fit an AutoGluon multimodal weighted/stacked ensemble model

There are two types of AutoGluon multimodality:

  • Train multiple tabular models as well as the TextPredictor model (utilizing the TextPredictor model inside of TabularPredictor), and then combine them via either a weighted ensemble or stacked ensemble, as explained in AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data
  • Fuse multiple neural network models directly and handle raw text (which are also capable of handling additional numerical and categorical columns)

We train a multimodal weighted or stacked ensemble model first in this section, and train a fusion neural network model in the next section.
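As a rough intuition for the weighted ensemble option, the final prediction is a weighted average of the probabilities produced by the individual models. The following sketch shows only that combination step. The model names and weights are hypothetical and fixed by hand here, whereas AutoGluon determines the weights during training.

```python
def weighted_ensemble(model_probs, weights):
    """Weighted average of per-model churn probabilities for each example.

    model_probs maps a model name to its list of predicted probabilities;
    weights maps the same names to non-negative weights.
    """
    if set(model_probs) != set(weights):
        raise ValueError("model_probs and weights must name the same models")
    total = sum(weights.values())
    n_examples = len(next(iter(model_probs.values())))
    return [
        sum(weights[name] * model_probs[name][i] for name in model_probs) / total
        for i in range(n_examples)
    ]


# Hypothetical outputs of one tabular model and one text model for two customers
model_probs = {"tabular": [0.2, 0.8], "text": [0.6, 0.4]}
weights = {"tabular": 3.0, "text": 1.0}
print(weighted_ensemble(model_probs, weights))
```

A stacked ensemble differs in that the base model outputs are fed as features into another model instead of being averaged.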

First, we retrieve the AutoGluon training image:

import boto3
from sagemaker import image_uris
from sagemaker.estimator import Estimator
train_image_uri = image_uris.retrieve(
    "autogluon",
    region=boto3.Session().region_name,
    version='0.5.2',
    py_version='py38',
    image_scope="training",
    instance_type=config.TRAINING_INSTANCE_TYPE,
)

Next, we pass in hyperparameters. Unlike existing AutoML frameworks that primarily focus on the model or hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers. Therefore, HPO is usually not required for AutoGluon ensemble models.

hyperparameters = {
    "numerical-feature-names": "CustServ Calls,Account Length",
    "categorical-feature-names": "plan,limit",
    "textual-feature-names": "text",
    "label-name": "y",
    "problem_type": "classification", # either classification or regression. For classification, we will identify binary or multiclass classification in the training script
    "eval_metric": "roc_auc",
    "presets": "medium_quality",
    "auto_stack": "False",
    "num_bag_folds": 0,
    "num_bag_sets": 1,
    "num_stack_levels": 0,
    "refit_full": "False",
    "set_best_to_refit_full": "False",
    "save_space": "True",
    "verbosity": 2,
    "pretrained-transformer": "google/electra-small-discriminator"
}

Finally, we create a SageMaker Estimator and call estimator.fit() to start a training job:

# Create SageMaker Estimator instance
training_job_name_ag = f"{config.SOLUTION_PREFIX}-ag"
tabular_estimator_ag = Estimator(
    role=config.IAM_ROLE,
    image_uri=train_image_uri,
    entry_point='train.py',
    source_dir=str(Path(current_folder, '../containers/autogluon_multimodal_ensemble').resolve()),
    instance_count=1,
    instance_type=config.TRAINING_INSTANCE_TYPE,
    max_run=360000,
    hyperparameters=hyperparameters,
    base_job_name=training_job_name_ag,
    output_path='s3://' + str(Path(config.S3_BUCKET, config.OUTPUTS_S3_PREFIX_AG_ENSEMBLE)),
    code_location='s3://' + str(Path(config.S3_BUCKET, config.OUTPUTS_S3_PREFIX_AG_ENSEMBLE)),
    tags=[{'Key': config.TAG_KEY, 'Value': config.SOLUTION_PREFIX}],
)
tabular_estimator_ag.fit(
    {
        'train': 's3://' + str(Path(config.S3_BUCKET, config.DATASETS_S3_PREFIX, 'train.jsonl')),
        'validation': 's3://' + str(Path(config.S3_BUCKET, config.DATASETS_S3_PREFIX, 'validation.jsonl'))
    }, logs=False
)

After training is complete, we retrieve the AutoGluon inference image and deploy the model:

# Retrieve the inference docker container uri
inference_image_uri = image_uris.retrieve(
    "autogluon",
    region=boto3.Session().region_name,
    version='0.5.2',
    py_version='py38',
    image_scope="inference",
    instance_type=config.HOSTING_INSTANCE_TYPE,
)
endpoint_name_ag = f"{config.SOLUTION_PREFIX}-ag-endpoint"
predictor_ag = tabular_estimator_ag.deploy(
    initial_instance_count=1,
    instance_type=config.HOSTING_INSTANCE_TYPE,
    entry_point="inference.py",
    image_uri=inference_image_uri,
    source_dir=str(Path(current_folder, '../containers/autogluon_multimodal_ensemble').resolve()),
    endpoint_name=endpoint_name_ag,
)

After we deploy the endpoint, we query it using the same test set and compute evaluation metrics. In the following table, we can see that the AutoGluon multimodal ensemble improves ROC AUC by about 0.03 (from 0.79861 to 0.82918) compared with the BERT sentence transformer and random forest with HPO.

| Metric | BERT + Random Forest | BERT + Random Forest with HPO | AutoGluon Multimodal Ensemble |
| --- | --- | --- | --- |
| Accuracy | 0.77463 | 0.9278 | 0.92625 |
| ROC AUC | 0.75905 | 0.79861 | 0.82918 |

Fit an AutoGluon multimodal fusion model

The following diagram illustrates the architecture of the model. For details, see AutoMM for Text + Tabular – Quick Start.

Internally, we use different networks to encode the text columns, categorical columns, and numerical columns. The features generated by individual networks are aggregated by a late-fusion aggregator. The aggregator can output either the logits or score predictions.
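The late-fusion idea can be sketched as follows: each modality is encoded separately, the resulting feature vectors are concatenated, and a final layer maps the fused vector to logits, which a softmax turns into scores. This is a deliberately simplified illustration. The single linear layer, the hand-set weights, and the tiny feature vectors are assumptions, and the actual AutoGluon aggregator is more elaborate.

```python
import math


def softmax(logits):
    """Convert logits to scores that sum to 1 (numerically stable form)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(text_feat, cat_feat, num_feat, weights, bias):
    """Concatenate per-modality features, then apply one linear layer.

    weights has one row per output class; each row has one entry per
    element of the fused feature vector.
    """
    fused = list(text_feat) + list(cat_feat) + list(num_feat)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return logits, softmax(logits)


# Hypothetical encoder outputs for one customer
text_feat, cat_feat, num_feat = [1.0, 0.0], [0.0], [2.0]
weights = [[1.0, 0.0, 0.0, 0.0],  # class 0: no churn
           [0.0, 0.0, 0.0, 1.0]]  # class 1: churn
bias = [0.0, 0.0]
logits, scores = late_fusion(text_feat, cat_feat, num_feat, weights, bias)
print(logits)  # [1.0, 2.0]
print(scores)
```

Because fusion happens after each modality has been encoded, the text backbone can be a pretrained NLP model while the categorical and numerical towers stay small.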

Here, we use the pretrained NLP backbone to extract the text features and then use two other towers to extract the feature from the categorical column and numerical column.

In addition, to deal with multiple text fields, we separate these fields with the [SEP] token and alternate 0s and 1s as the segment IDs, as shown in the following diagram.
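The following sketch shows how several text fields can be joined with [SEP] tokens while the segment IDs alternate between 0 and 1. Whitespace splitting stands in for the real subword tokenizer, special tokens such as a leading classification token are omitted, and assigning each [SEP] to the segment of the field before it is an assumption made for this example.

```python
def join_text_fields(fields, sep_token="[SEP]"):
    """Join tokenized text fields and build alternating segment IDs.

    Field 0 gets segment ID 0, field 1 gets 1, field 2 gets 0 again, and
    so on. Each separator shares the segment ID of the field it follows.
    """
    tokens, segment_ids = [], []
    for i, field in enumerate(fields):
        field_tokens = field.split() + [sep_token]
        tokens.extend(field_tokens)
        segment_ids.extend([i % 2] * len(field_tokens))
    return tokens, segment_ids


tokens, segment_ids = join_text_fields(
    ["slow internet", "call dropped twice", "ok"]
)
print(tokens)
# ['slow', 'internet', '[SEP]', 'call', 'dropped', 'twice', '[SEP]', 'ok', '[SEP]']
print(segment_ids)
# [0, 0, 0, 1, 1, 1, 1, 0, 0]
```

Alternating the IDs lets a backbone that was pretrained with only two segment types still distinguish neighboring fields.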

Similarly, we follow instructions in the previous section to train and deploy the AutoGluon multimodal fusion model:

# Create SageMaker Estimator instance
training_job_name_ag_fusion = f"{config.SOLUTION_PREFIX}-ag-fusion"
hyperparameters = {
    "numerical-feature-names": "CustServ Calls,Account Length",
    "categorical-feature-names": "plan,limit",
    "textual-feature-names": "text",
    "label-name": "y",
    "problem_type": "classification", # either classification or regression. For classification, we will identify binary or multiclass classification in the training script
    "eval_metric": "roc_auc",
    "verbosity": 2,
    "pretrained-transformer": "google/electra-small-discriminator",
}
tabular_estimator_ag_fusion = Estimator(
    role=config.IAM_ROLE,
    image_uri=train_image_uri,
    entry_point='train.py',
    source_dir=str(Path(current_folder, '../containers/autogluon_multimodal_fusion').resolve()),
    instance_count=1,
    instance_type=config.TRAINING_INSTANCE_TYPE,
    max_run=360000,
    hyperparameters=hyperparameters,
    base_job_name=training_job_name_ag_fusion,
    output_path='s3://' + str(Path(config.S3_BUCKET, config.OUTPUTS_S3_PREFIX_AG_FUSION)),
    code_location='s3://' + str(Path(config.S3_BUCKET, config.OUTPUTS_S3_PREFIX_AG_FUSION)),
    tags=[{'Key': config.TAG_KEY, 'Value': config.SOLUTION_PREFIX}],
)
tabular_estimator_ag_fusion.fit(
    {
        'train': 's3://' + str(Path(config.S3_BUCKET, config.DATASETS_S3_PREFIX, 'train.jsonl')),
        'validation': 's3://' + str(Path(config.S3_BUCKET, config.DATASETS_S3_PREFIX, 'validation.jsonl'))
    }, logs=False
)

The following table summarizes the evaluation results for the AutoGluon multimodal fusion model, along with those of the three models that we evaluated in the previous sections. The AutoGluon multimodal ensemble and multimodal fusion models achieve the best ROC AUC, while the random forest with HPO has a marginally higher accuracy.

| Metric | BERT + Random Forest | BERT + Random Forest with HPO | AutoGluon Multimodal Ensemble | AutoGluon Multimodal Fusion |
| --- | --- | --- | --- | --- |
| Accuracy | 0.77463 | 0.9278 | 0.92625 | 0.9247 |
| ROC AUC | 0.75905 | 0.79861 | 0.82918 | 0.81115 |

Note that the results and relative performance between these models depend on the dataset you use for training. These results are representative, and even though the tendency for certain algorithms to perform better is based on relevant factors, the balance in performance might change given a different data distribution. You can replace the example dataset with your own data to determine what model works best for you.
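When you rerun the comparison on your own data, a small helper such as the following can pick the best model per metric. The numbers are the ones reported in this post; the dictionary layout and function name are illustrative only.

```python
# Evaluation results on the hold-out test data, as reported in this post
results = {
    "BERT + Random Forest": {"accuracy": 0.77463, "roc_auc": 0.75905},
    "BERT + Random Forest with HPO": {"accuracy": 0.9278, "roc_auc": 0.79861},
    "AutoGluon Multimodal Ensemble": {"accuracy": 0.92625, "roc_auc": 0.82918},
    "AutoGluon Multimodal Fusion": {"accuracy": 0.9247, "roc_auc": 0.81115},
}


def best_model(results, metric):
    """Return the name of the model with the highest value for the metric."""
    return max(results, key=lambda name: results[name][metric])


for metric in ("accuracy", "roc_auc"):
    print(metric, "->", best_model(results, metric))
# accuracy -> BERT + Random Forest with HPO
# roc_auc -> AutoGluon Multimodal Ensemble
```

Note that the two metrics favor different models here, so it is worth deciding up front which metric matters most for your use case.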

Demo notebook

You can use the demo notebook to send example data to already-deployed model endpoints. The demo notebook quickly allows you to get hands-on experience by querying the example data. After you launch the Churn Prediction with Text solution, open the demo notebook by choosing Use Endpoint in Notebook.

Clean up

When you’ve finished with this solution, make sure that you delete all unwanted AWS resources by choosing Delete all resources.

Note that you need to manually delete any additional resources that you may have created in this notebook.

Conclusion

In this post, we showed how you can use SageMaker JumpStart to predict churn using multimodality of text and tabular features.

If you’re interested in learning more about customer churn models, check out the following posts:


About the Authors

Dr. Xin Huang is an Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A journal.

Rajakumar Sampathkumar is a Principal Technical Account Manager at AWS, providing customers guidance on business-technology alignment and supporting the reinvention of their cloud operation models and processes. He is passionate about cloud and machine learning. Raj is also a machine learning specialist and works with AWS customers to design, deploy, and manage their AWS workloads and architectures.

Leveraging artificial intelligence and machine learning at Parsons with AWS DeepRacer

This post is co-written with Jennifer Bergstrom, Sr. Technical Director, ParsonsX.

Parsons Corporation (NYSE:PSN) is a leading disruptive technology company in critical infrastructure, national defense, space, intelligence, and security markets providing solutions across the globe to help make the world safer, healthier, and more connected. Parsons provides services and capabilities across cybersecurity, missile defense, space ground station technology, transportation, environmental remediation, and water/wastewater treatment to name a few.

Parsons is a builder community and invests heavily in employee development programs and upskilling. With programs such as ParsonsX, Parsons’s digital transformation initiative, and The Guild, an employee-focused community, Parsons strives to be an employer of choice and engages its employees in career development programs year-round to create a workforce of the future.

In this post, we show you how Parsons is building its next generation workforce by using machine learning (ML) and artificial intelligence (AI) with AWS DeepRacer in a fun and collaborative way.

As Parsons’ footprint moves into the cloud, their leadership recognized the need for a change in culture and a fundamental requirement to educate their engineering task force on the new cloud operating model, tools, and technologies. Parsons is observing industry trends that make it imperative to incorporate AI and ML capabilities into the organization’s strategic and tactical decision-making processes. To best serve the customer’s needs, Parsons must upskill its workforce across the board in AI/ML tooling and how to scale it in an enterprise organization. Parsons is on a mission to make AI/ML a foundation of business across the company.

Parsons chose AWS DeepRacer because it’s a fun, interactive, and exciting challenge that appealed to a broad range of employees and didn’t mandate a significant level of expertise to compete. Parsons found that AWS has many dedicated AWS DeepRacer experts in the field who would help plan, set up, and run a series of AI/ML events and challenges. Parsons realized that the success of this event would be driven by the efficient mechanisms and processes the AWS DeepRacer community has in place.

Parsons’ goal was to upskill their employees in an enjoyable and competitive way, with virtual leagues among peer groups and an in-person event for the top racers. The education initiative in partnership with AWS consisted of four phases.

First, Parsons hosted a virtual live workshop with AWS experts in the AI/ML and DeepRacer community. The workshop taught the basics of reinforcement learning, reward functions, hyperparameter tuning, and accessing the AWS DeepRacer console to train and submit a model.

In the next phase, they hosted a virtual community league race for all participating Parsons employees. Models were optimized, submitted, and raced, and winners were announced at the end of racing. Participants in the virtual leagues included individual contributors and frontline managers from various job roles across Parsons, including civil engineers, bridge engineers, systems and software engineers, data analysts, project managers, and program managers. Joining them as participants in the league were business unit presidents, SVPs, VPs, senior directors, and directors.

In the third phase, an in-person league was held in Maryland. The top four participants from the virtual leagues saw their models loaded into and raced in physical AWS DeepRacer cars on a track built onsite. The top four competitors at this event included a market CTO, a project signaling engineer, a project engineer, and an engineer intern.

The fourth and final phase of the event had each of the four competitors provide a technical walkthrough of the techniques used to develop, train, and test their models. Through AWS DeepRacer, Parsons not only showcased the impact this event was able to make globally across all the divisions, but also that they were able to create a memorable experience for participants.

Over 500 employees from various business units and service organizations across Parsons worldwide registered right after the AWS DeepRacer challenge was announced internally. The AWS DeepRacer workshop saw unprecedented interest, with over 470 Parsons employees joining the initial workshop. The virtual workshop generated significant engagement: 245 active users developed over 1,500 models and spent over 500 hours training these models on the AWS DeepRacer console. The virtual league was a resounding success, with 185 racers from across the country participating and submitting 1,415 models into the competition!

The virtual AWS DeepRacer league at Parsons provided a fun and inviting environment with lots of iterations, learning, and experimentation. Parsons’ Market CTO, John Statuli, who was one of the top four contenders at the race said, “It was a lot of fun to participate at the AWS DeepRacer event. I have not done any programming in a long time, but the combination of the AWS DeepRacer virtual workshop and the AWS DeepRacer program provided an easy way by which I could participate and compete for the top spot.”

At the final race held in Maryland, Parsons broadcast a companywide virtual event that showcased a tough competition among its top four competitors from three different business units. Parsons’ top leadership joined the event, including CTO Rico Lorenzo, D&I CTO Ryan Gabrielle, President of Connected Communities Peter Torrellas, CDO Tim LaChapelle, and ParsonsX Sr. Director Jennifer Bergstrom. At the event, Parsons hosted a webinar with over 100 attendees and a winner’s walkthrough of their models.

With such an overwhelming response from employees across the globe and strong interest in AI/ML learning, Parsons is now planning several additional events to continue growing its employees’ knowledge base. To continue to upskill and educate its workforce, Parsons intends to run more AWS DeepRacer events and workshops focused on object avoidance, an Amazon SageMaker deep dive workshop, and an AWS DeepRacer head-to-head race. Parsons continues to engage with AWS on AI/ML services to build world-class solutions in the fields of critical infrastructure, national defense, space, and cybersecurity.

Whether your organization is new to machine learning or ready to build on existing skills, AWS DeepRacer can help you get there. To learn more, visit Getting Started with AWS DeepRacer.


About the Authors

Jenn Bergstrom is a Parsons Fellow and Senior Technical Director. She is passionate about innovative technological solutions and strategies and enjoys designing well-architected cloud solutions for programs across all of Parsons’s domains. When not driving innovation at Parsons, she loves exploring the world with her husband and daughters, and mentoring diverse individuals transitioning into the tech industry. You can reach her on LinkedIn.

Deval Parikh is a Sr. Enterprise Solutions Architect at Amazon Web Services. She is passionate about helping enterprises reimagine their businesses in the cloud by leading them with strategic architectural guidance and building prototypes as an AWS expert. She is an active board member of the Women at AWS affinity group, where she oversees university programs to educate students on cloud technology and careers. She is also an avid hiker and a painter of oil on canvas. You can see many of her paintings at www.devalparikh.com. You can reach her on LinkedIn.
