Build custom code libraries for your Amazon SageMaker Data Wrangler Flows using AWS CodeCommit

As organizations grow in size and scale, the complexities of running workloads increase, and the need to develop and operationalize processes and workflows becomes critical. Therefore, organizations have adopted technology best practices, including microservice architecture, MLOps, DevOps, and more, to improve delivery time, reduce defects, and increase employee productivity. This post introduces a best practice for managing custom code within your Amazon SageMaker Data Wrangler workflow.

Data Wrangler is a low-code tool that facilitates data analysis, preprocessing, and visualization. It contains over 300 built-in data transformation steps to aid with feature engineering, normalization, and cleansing to transform your data without having to write any code.

In addition to the built-in transforms, Data Wrangler contains a custom code editor that allows you to implement custom code written in Python, PySpark, or SparkSQL.

When using Data Wrangler custom transform steps to implement your custom functions, you need to implement best practices around developing and deploying code in Data Wrangler flows.

This post shows how you can use code stored in AWS CodeCommit in the Data Wrangler custom transform step. This provides you with additional benefits, including:

  • Improve productivity and collaboration across personnel and teams
  • Version your custom code
  • Modify your Data Wrangler custom transform step without having to log in to Amazon SageMaker Studio to use Data Wrangler
  • Reference parameter files in your custom transform step
  • Scan code in CodeCommit using Amazon CodeGuru or any third-party application for security vulnerabilities before using it in Data Wrangler flows

Solution overview

This post demonstrates how to build a Data Wrangler flow file with a custom transform step. Instead of hardcoding the custom function into your custom transform step, you pull a script containing the function from CodeCommit, load it, and call the loaded function in your custom transform step.

For this post, we use the bank-full.csv data from the University of California, Irvine Machine Learning Repository to demonstrate these functionalities. The data is related to the direct marketing campaigns of a banking institution. Often, more than one contact with the same client was required to assess if the product (a bank term deposit) would be subscribed (yes) or not subscribed (no).

The following diagram illustrates this solution.

The workflow is as follows:

  1. Create a Data Wrangler flow file and import the dataset from Amazon Simple Storage Service (Amazon S3).
  2. Create a series of Data Wrangler transformation steps:
    • A custom transform step to implement custom code stored in CodeCommit.
    • Two built-in transform steps.

We keep the transformation steps to a minimum so as not to detract from the aim of this post, which is focused on the custom transform step. For more information about available transformation steps and implementation, refer to Transform Data and the Data Wrangler blog.

  3. In the custom transform step, write code to pull the script and configuration file from CodeCommit, load the script as a Python module, and call a function in the script. The function takes a configuration file as an argument.
  4. Run a Data Wrangler job and set Amazon S3 as the destination.

Destination options also include Amazon SageMaker Feature Store.

Prerequisites

As a prerequisite, we set up the CodeCommit repository, Data Wrangler flow, and CodeCommit permissions.

Create a CodeCommit repository

For this post, we use an AWS CloudFormation template to set up a CodeCommit repository and copy the required files into this repository. Complete the following steps:

  1. Choose Launch Stack:

  2. Select the Region where you want to create the CodeCommit repository.
  3. Enter a name for Stack name.
  4. Enter a name for the repository to be created for RepoName.
  5. Choose Create stack.


AWS CloudFormation takes a few seconds to provision your CodeCommit repository. After the CREATE_COMPLETE status appears, navigate to the CodeCommit console to see your newly created repository.

Set up Data Wrangler

Download the bank.zip dataset from the University of California, Irvine Machine Learning Repository. Then, extract the contents of bank.zip and upload bank-full.csv to Amazon S3.

To create a Data Wrangler flow file and import the bank-full.csv dataset from Amazon S3, complete the following steps:

  1. Onboard to SageMaker Studio using the quick start for users new to Studio.
  2. Select your SageMaker domain and user profile and on the Launch menu, choose Studio.

  3. On the Studio console, on the File menu, choose New, then choose Data Wrangler Flow.
  4. Choose Amazon S3 for Data sources.
  5. Navigate to the S3 bucket containing the file and select the bank-full.csv file.

A Preview Error will be thrown.

  6. In the Details pane to the right, change the Delimiter to SEMICOLON.

A preview of the dataset will be displayed in the result window.

  7. In the Details pane, on the Sampling drop-down menu, choose None.

This is a relatively small dataset, so you don’t need to sample.

  8. Choose Import.

Configure CodeCommit permissions

You need to provide Studio with permission to access CodeCommit. We use a CloudFormation template to provision an AWS Identity and Access Management (IAM) policy that gives your Studio role permission to access CodeCommit. Complete the following steps:

  1. Choose Launch Stack:

  2. Select the Region you are working in.
  3. Enter a name for Stack name.
  4. Enter your Studio domain ID for SageMakerDomainID. The domain information is available on the SageMaker console Domains page, as shown in the following screenshot.

  5. Enter your Studio domain user profile name for SageMakerUserProfileName. You can view your user profile name by navigating into your Studio domain. If you have multiple user profiles in your Studio domain, enter the name for the user profile used to launch Studio.

  6. Select the acknowledgement box.

The IAM resources used by this CloudFormation template provide the minimum permissions to successfully create the IAM policy attached to your Studio role for CodeCommit access.

  7. Choose Create stack.

Transformation steps

Next, we add transformations to process the data.

Custom transform step

In this post, we calculate the Variance Inflation Factor (VIF) for each numerical feature and drop features that exceed a VIF threshold. We do this in the custom transform step because Data Wrangler doesn’t have a built-in transform for this task as of this writing.

However, we don’t hardcode this VIF function. Instead, we pull this function from the CodeCommit repository into the custom transform step. Then we run the function on the dataset.
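For reference, the function stored in CodeCommit might look something like the following sketch. This is only an illustrative assumption of what pyspark_transform.py could contain, with a compute_vif function that reads a threshold from the parameter file; the actual script in the repository may differ.

# Illustrative sketch of a compute_vif-style function (not the repository's actual code)
import numpy as np

def compute_vif(df, params):
    # Drop numerical columns whose variance inflation factor exceeds a threshold
    threshold = params.get("threshold", 1.2)

    # Restrict to numerical columns
    numeric_cols = [f.name for f in df.schema.fields
                    if f.dataType.typeName() in ("integer", "long", "float", "double")]
    pdf = df.select(numeric_cols).toPandas()

    # VIF_i is the i-th diagonal element of the inverse of the correlation matrix
    corr = np.corrcoef(pdf.values, rowvar=False)
    vif = np.diag(np.linalg.inv(corr))

    # Drop every column whose VIF is above the threshold and return the reduced DataFrame
    to_drop = [col for col, v in zip(numeric_cols, vif) if v > threshold]
    return df.drop(*to_drop)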

  1. On the Data Wrangler console, navigate to your data flow.
  2. Choose the plus sign next to Data types and choose Add transform.

  3. Choose + Add step.
  4. Choose Custom transform.
  5. Optionally, enter a name in the Name field.
  6. Choose Python (PySpark) on the drop-down menu.
  7. For Your custom transform, enter the following code (provide the name of the CodeCommit repository and Region where the repository is located):
# Table is available as variable `df`
import boto3
import os
import json
import numpy as np
from importlib import reload
import sys

# Initialize variables
repo_name = '<Enter Name of Repository>'  # Name of repository in CodeCommit
region = '<Name of region where repository is located>'  # Name of AWS Region
script_name = 'pyspark_transform.py'  # Name of script in CodeCommit
config_name = 'parameter.json'  # Name of configuration file in CodeCommit

# Creating directory to hold downloaded files
newpath=os.getcwd()+"/input/"
if not os.path.exists(newpath):
    os.makedirs(newpath)
module_path=os.getcwd()+'/input/'+script_name

# Downloading configuration file to memory
client=boto3.client('codecommit', region_name=region)
response = client.get_file(
    repositoryName=repo_name,
    filePath=config_name)
param_file=json.loads(response['fileContent'])

# Downloading script to directory
script = client.get_file(
   repositoryName=repo_name,
   filePath=script_name)
with open(module_path,'w') as f:
   f.write(script['fileContent'].decode())

# importing pyspark script as module
sys.path.append(os.getcwd()+"/input/")
import pyspark_transform
#reload(pyspark_transform)

# Executing custom function in pyspark script
df=pyspark_transform.compute_vif(df,param_file)

The code uses the AWS SDK for Python (Boto3) to access CodeCommit API functions. We use the get_file API function to pull files from the CodeCommit repository into the Data Wrangler environment.

  8. Choose Preview.

In the Output pane, a table is displayed showing the different numerical features and their corresponding VIF value. For this exercise, the VIF threshold value is set to 1.2. However, you can modify this threshold value in the parameter.json file found in your CodeCommit repository. You will notice that two columns have been dropped (pdays and previous), bringing the total column count to 15.

  9. Choose Add.

Encode categorical features

Some feature types are categorical variables that need to be transformed into numerical forms. Use the one-hot encode built-in transform to achieve this data transformation. Let’s create numerical features representing the unique value in each categorical feature in the dataset. Complete the following steps:

  1. Choose + Add step.
  2. Choose the Encode categorical transform.
  3. On the Transform drop-down menu, choose One-hot encode.
  4. For Input column, choose all categorical features, including poutcome, y, month, marital, contact, default, education, housing, job, and loan.
  5. For Output style, choose Columns.
  6. Choose Preview to preview the results.

One-hot encoding might take a while to generate results, given the number of features and unique values within each feature.

  7. Choose Add.

For each numerical feature created with one-hot encoding, the new column name is the categorical feature name followed by an underscore (_) and the unique categorical value within that feature.

Drop column

The y_yes feature is the target column for this exercise, so we drop the y_no feature.

  1. Choose + Add step.
  2. Choose Manage columns.
  3. Choose Drop column under Transform.
  4. Choose y_no under Columns to drop.
  5. Choose Preview, then choose Add.

Create a Data Wrangler job

Now that you have created all the transform steps, you can create a Data Wrangler job to process your input data and store the output in Amazon S3. Complete the following steps:

  1. Choose Data flow to go back to the Data Flow page.
  2. Choose the plus sign on the last tile of your flow visualization.
  3. Choose Add destination and choose Amazon S3.

  4. Enter the name of the output file for Dataset name.
  5. Choose Browse and choose the bucket destination for Amazon S3 location.
  6. Choose Add destination.
  7. Choose Create job.

  8. Change the Job name value as you see fit.
  9. Choose Next, 2. Configure job.
  10. Change Instance count to 1 to reduce the cost incurred, because we work with a relatively small dataset.
  11. Choose Create.

This will start an Amazon SageMaker Processing job to process your Data Wrangler flow file and store the output in the specified S3 bucket.

Automation

Now that you have created your Data Wrangler flow file, you can schedule your Data Wrangler jobs to automatically run at specific times and frequency. This is a feature that comes out of the box with Data Wrangler and simplifies the process of scheduling Data Wrangler jobs. Furthermore, CRON expressions are supported and provide additional customization and flexibility in scheduling your Data Wrangler jobs.

However, this post shows how you can automate the Data Wrangler job to run every time there is a modification to the files in the CodeCommit repository. This automation technique ensures that any changes to the custom code functions or changes to values in the configuration file in CodeCommit trigger a Data Wrangler job to reflect these changes immediately.

Therefore, you don’t have to manually start a Data Wrangler job to get the output data that reflects the changes you just made. With this automation, you can improve the agility and scale of your Data Wrangler workloads. To automate your Data Wrangler jobs, you configure the following:

  • Amazon SageMaker Pipelines – Pipelines helps you create machine learning (ML) workflows with an easy-to-use Python SDK, and you can visualize and manage your workflow using Studio
  • Amazon EventBridge – EventBridge facilitates connection to AWS services, software as a service (SaaS) applications, and custom applications as event producers to launch workflows.

Create a SageMaker pipeline

First, you need to create a SageMaker pipeline for your Data Wrangler job. Then complete the following steps to export your Data Wrangler flow to a SageMaker pipeline:

  1. Choose the plus sign on your last transform tile (the transform tile before the Destination tile).
  2. Choose Export to.
  3. Choose SageMaker Inference Pipeline (via Jupyter Notebook).

This creates a new Jupyter notebook prepopulated with code to create a SageMaker pipeline for your Data Wrangler job. Before running all the cells in the notebook, you may want to change certain variables.

  4. To add a training step to your pipeline, change the add_training_step variable to True.

Be aware that running a training job will incur additional costs on your account.

  5. Set the target_attribute_name variable to y_yes.

  6. To change the name of the pipeline, change the pipeline_name variable.

  7. Lastly, run the entire notebook by choosing Run and Run All Cells.

This creates a SageMaker pipeline and runs the Data Wrangler job.

  8. To view your pipeline, choose the home icon on the navigation pane and choose Pipelines.

You can see the new SageMaker pipeline created.

  9. Choose the newly created pipeline to see the run list.
  10. Note the name of the SageMaker pipeline, as you will use it later.
  11. Choose the first run and choose Graph to see a Directed Acyclic Graph (DAG) flow of your SageMaker pipeline.

As shown in the following screenshot, we didn’t add a training step to our pipeline. If you added a training step to your pipeline, it will display in your pipeline run Graph tab under DataWranglerProcessingStep.

Create an EventBridge rule

After successfully creating your SageMaker pipeline for the Data Wrangler job, you can move on to setting up an EventBridge rule. This rule listens to activities in your CodeCommit repository and triggers the run of the pipeline in the event of a modification to any file in the CodeCommit repository. We use a CloudFormation template to automate creating this EventBridge rule. Complete the following steps:

  1. Choose Launch Stack:

  2. Select the Region you are working in.
  3. Enter a name for Stack name.
  4. Enter a name for your EventBridge rule for EventRuleName.
  5. Enter the name of the pipeline you created for PipelineName.
  6. Enter the name of the CodeCommit repository you are working with for RepoName.
  7. Select the acknowledgement box.

The IAM resources that this CloudFormation template uses provide the minimum permissions to successfully create the EventBridge rule.

  8. Choose Create stack.

It takes a few minutes for the CloudFormation stack to be created. When the status changes to CREATE_COMPLETE, you can navigate to the EventBridge console to see the created rule.

Now that you have created this rule, any changes you make to the file in your CodeCommit repository will trigger the run of the SageMaker pipeline.
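For reference, the rule that the CloudFormation template provisions is roughly equivalent to the following Boto3 sketch. The rule name, ARNs, and role are placeholders, and the exact event pattern in the template may differ.

# Hedged sketch of an EventBridge rule that starts the SageMaker pipeline on CodeCommit changes
import json
import boto3

events = boto3.client('events')

rule_name = 'datawrangler-codecommit-trigger'                                      # placeholder
repo_arn = 'arn:aws:codecommit:us-east-1:111122223333:my-dw-repo'                  # placeholder
pipeline_arn = 'arn:aws:sagemaker:us-east-1:111122223333:pipeline/my-dw-pipeline'  # placeholder
target_role_arn = 'arn:aws:iam::111122223333:role/EventBridgeStartPipelineRole'    # placeholder

# Fire whenever a branch in the repository is created or updated
events.put_rule(
    Name=rule_name,
    EventPattern=json.dumps({
        'source': ['aws.codecommit'],
        'detail-type': ['CodeCommit Repository State Change'],
        'resources': [repo_arn],
        'detail': {'event': ['referenceCreated', 'referenceUpdated']},
    }),
    State='ENABLED',
)

# Point the rule at the SageMaker pipeline
events.put_targets(
    Rule=rule_name,
    Targets=[{
        'Id': 'StartDataWranglerPipeline',
        'Arn': pipeline_arn,
        'RoleArn': target_role_arn,
        'SageMakerPipelineParameters': {'PipelineParameterList': []},
    }],
)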

To test the pipeline, edit a file in your CodeCommit repository: modify the VIF threshold in your parameter.json file to a different number, then go to the SageMaker pipeline details page to see a new run of your pipeline.

In this new pipeline run, Data Wrangler drops numerical features that have a greater VIF value than the threshold you specified in your parameter.json file in CodeCommit.

You have successfully automated and decoupled your Data Wrangler job. Furthermore, you can add more steps to your SageMaker pipeline. You can also modify the custom scripts in CodeCommit to implement various functions in your Data Wrangler flow.

It’s also possible to store your scripts and files in Amazon S3 and download them into your Data Wrangler custom transform step as an alternative to CodeCommit. In addition, you ran your custom transform step using the Python (PySpark) framework. However, you can also use the Python (Pandas) framework for your custom transform step, which allows you to run custom Python scripts. To try this, change the framework in the custom transform step to Python (Pandas) and modify your custom transform step code to pull and run the Python script version stored in your CodeCommit repository. Note that the PySpark option in Data Wrangler provides better performance on large datasets than the Python (Pandas) option.

Clean up

After you’re done experimenting with this use case, clean up the resources you created to avoid incurring additional charges to your account:

  1. Stop the underlying instance used to create your Data Wrangler flow.
  2. Delete the resources created by the various CloudFormation templates.
  3. If a stack shows a DELETE_FAILED state when you delete it, delete the stack one more time to remove it successfully.

Summary

This post showed you how to decouple your Data Wrangler custom transform step by pulling scripts from CodeCommit. We also showed how to automate your Data Wrangler jobs using SageMaker Pipelines and EventBridge.

Now you can operationalize and scale your Data Wrangler jobs without modifying your Data Wrangler flow file. You can also scan your custom code in CodeCommit using CodeGuru or any third-party application for vulnerabilities before implementing it in Data Wrangler. To know more about end-to-end machine learning operations (MLOps) on AWS, visit Amazon SageMaker for MLOps.


About the Author

Uchenna Egbe is an Associate Solutions Architect at AWS. He spends his free time researching about herbs, teas, superfoods, and how to incorporate them into his daily diet.

Read More

Accelerate Amazon SageMaker inference with C6i Intel-based Amazon EC2 instances

This is a guest post co-written with Antony Vance from Intel.

Customers are always looking for ways to improve the performance and response times of their machine learning (ML) inference workloads without increasing the cost per transaction and without sacrificing the accuracy of the results. Running ML workloads on Amazon SageMaker using Amazon Elastic Compute Cloud (Amazon EC2) C6i instances with Intel’s INT8 inference deployment can help boost overall performance by up to four times per dollar spent while keeping the loss in inference accuracy under 1% compared to FP32 for certain ML workloads. When running models on embedded devices, where form factor and model size are important, quantization can help.

Quantization is a technique to reduce the computational and memory costs of running inference by representing the weights and activations with low-precision data types like 8-bit integer (INT8) instead of the usual 32-bit floating point (FP32). In the following example figure, we show INT8 inference performance in C6i for a BERT-base model.
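To build intuition for what quantization does, the following toy sketch (not the Intel Extension for PyTorch implementation) maps a handful of FP32 values to INT8 with a scale and zero point derived from the observed value range, then dequantizes them to show the approximation error.

# Toy affine quantization example: FP32 -> INT8 -> approximate FP32
import numpy as np

x = np.array([-1.7, -0.2, 0.0, 0.9, 2.3], dtype=np.float32)

# Derive scale and zero point from the observed range (what calibration estimates)
qmin, qmax = -128, 127
scale = (x.max() - x.min()) / (qmax - qmin)
zero_point = int(round(qmin - x.min() / scale))

x_int8 = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
x_dequant = (x_int8.astype(np.float32) - zero_point) * scale

print(x_int8)     # 8-bit representation
print(x_dequant)  # close to the original FP32 values, with small rounding error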

The BERT-base model was fine-tuned with SQuAD v1.1, with PyTorch (v1.11) as the ML framework, used with Intel® Extension for PyTorch. A batch size of 1 was used for the comparison. Higher batch sizes will yield a different cost per 1 million inferences.

In this post, we show you how to build and deploy INT8 inference with your own processing container for PyTorch. We use Intel extensions for PyTorch for an effective INT8 deployment workflow.

Overview of the technology

EC2 C6i instances are powered by third-generation Intel Xeon Scalable processors (also called Ice Lake) with an all-core turbo frequency of 3.5 GHz.

In the context of deep learning, the predominant numerical format used for research and deployment has so far been 32-bit floating point, or FP32. However, the need for reduced bandwidth and compute requirements of deep learning models has driven research into using lower-precision numerical formats. It has been demonstrated that weights and activations can be represented using 8-bit integers (or INT8) without incurring significant loss in accuracy.

EC2 C6i instances offer many new capabilities that result in performance improvements for AI and ML workloads. C6i instances provide performance advantages in FP32 and INT8 model deployments. FP32 inference is enabled with AVX-512 improvements, and INT8 inference is enabled by AVX-512 VNNI instructions.

C6i is now available on SageMaker endpoints, and developers should expect it to provide over two times better price-performance for INT8 inference compared to FP32 inference, and up to four times better performance compared with C5 instance FP32 inference. Refer to the appendix for instance details and benchmark data.

Deep learning deployment on the edge for real-time inference is key to many application areas. It significantly reduces the cost of communicating with the cloud in terms of network bandwidth, network latency, and power consumption. However, edge devices have limited memory, computing resources, and power. This means that a deep learning network must be optimized for embedded deployment. INT8 quantization has become a popular approach for such optimizations for ML frameworks like TensorFlow and PyTorch. SageMaker provides you with a bring your own container (BYOC) approach and integrated tools so that you can run quantization.

For more information, refer to Lower Numerical Precision Deep Learning Inference and Training.

Solution overview

The steps to implement the solution are as follows:

  1. Provision an EC2 C6i instance to quantize and create the ML model.
  2. Use the supplied Python scripts for quantization.
  3. Create a Docker image to deploy the model in SageMaker using the BYOC approach.
  4. Use an Amazon Simple Storage Service (Amazon S3) bucket to copy the model and code for SageMaker access.
  5. Use Amazon Elastic Container Registry (Amazon ECR) to host the Docker image.
  6. Use the AWS Command Line Interface (AWS CLI) to create an inference endpoint in SageMaker.
  7. Run the provided Python test scripts to invoke the SageMaker endpoint for both INT8 and FP32 versions.

This inference deployment setup uses a BERT-base model from the Hugging Face transformers repository (csarron/bert-base-uncased-squad-v1).

Prerequisites

The following are prerequisites for creating the deployment setup:

  • A Linux shell terminal with the AWS CLI installed
  • An AWS account with access to EC2 instance creation (C6i instance type)
  • SageMaker access to deploy a SageMaker model, endpoint configuration, endpoint
  • AWS Identity and Access Management (IAM) access to configure an IAM role and policy
  • Access to Amazon ECR
  • SageMaker access to create a notebook with instructions to launch an endpoint

Generate and deploy a quantized INT8 model on SageMaker

Launch an EC2 C6i instance to create your quantized model and push the model artifacts to Amazon S3. For endpoint deployment, create a custom container with PyTorch and Intel® Extension for PyTorch to deploy the optimized INT8 model. The container is pushed into Amazon ECR, and a C6i-based endpoint is created to serve FP32 and INT8 models.

The following diagram illustrates the high-level flow.

To access the code and documentation, refer to the GitHub repo.

Example use case

The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowd-workers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

The following example is a question answering algorithm using a BERT-base model. Given a document as an input, the model will answer simple questions based on the learning and contexts from the input document.

The following is an example input document:

The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometers (2,700,000 sq mi), of which 5,500,000 square kilometers (2,100,000 sq mi) are covered by the rainforest.

For the question “Which name is also used to describe the Amazon rainforest in English?” we get the answer:

also known in English as Amazonia or the Amazon Jungle

For the question “How many square kilometers of rainforest is covered in the basin?” we get the answer:

5,500,000 square kilometers (2,100,000 sq mi) are covered by the rainforest.
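To reproduce answers like these locally before deploying, a minimal sketch using the Hugging Face pipeline API with the csarron/bert-base-uncased-squad-v1 checkpoint (FP32, no quantization) looks like the following.

# Minimal FP32 question answering sketch with the Hugging Face pipeline API
from transformers import pipeline

qa = pipeline('question-answering', model='csarron/bert-base-uncased-squad-v1')

context = (
    'The Amazon rainforest, also known in English as Amazonia or the Amazon Jungle, '
    'is a moist broadleaf forest that covers most of the Amazon basin of South America. '
    'This basin encompasses 7,000,000 square kilometers (2,700,000 sq mi), of which '
    '5,500,000 square kilometers (2,100,000 sq mi) are covered by the rainforest.'
)

questions = [
    'Which name is also used to describe the Amazon rainforest in English?',
    'How many square kilometers of rainforest is covered in the basin?',
]

for question in questions:
    result = qa(question=question, context=context)
    print(question, '->', result['answer'])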

Quantizing the model in PyTorch

This section gives a quick overview of model quantization steps with PyTorch and Intel extensions.

The code snippets are derived from a SageMaker example.

Let’s go over the changes in detail for the function IPEX_quantize in the file quantize.py.

  1. Import Intel® Extension for PyTorch to help with quantization and optimization, and import torch for array manipulations:
import intel_extension_for_pytorch as ipex
import torch
  2. Apply model calibration for 100 iterations. In this case, you are calibrating the model with the SQuAD dataset:
model.eval()
conf = ipex.quantization.QuantConf(qscheme=torch.per_tensor_affine)
print("Doing calibration...")
for step, batch in enumerate(eval_dataloader):
    print("Calibration step-", step)
    with torch.no_grad():
        with ipex.quantization.calibrate(conf):
            model(**batch)
    if step == 100:
        break
  3. Prepare sample inputs:
jit_inputs = []
example_batch = next(iter(eval_dataloader))
for key in example_batch:
    example_tensor = torch.ones_like(example_batch[key])
    jit_inputs.append(example_tensor)
jit_inputs = tuple(jit_inputs)
  4. Convert the model to an INT8 model using the following configuration:
with torch.no_grad():
    model = ipex.quantization.convert(model, conf, jit_inputs)
  5. Run two iterations of forward pass to enable fusions:
with torch.no_grad():
    model(**example_batch)
    model(**example_batch)

  6. As a last step, save the TorchScript model:
model.save(os.path.join(args.model_path, "model_int8.pt"))

Clean up

Refer to the GitHub repo for steps to clean up the AWS resources created.

Conclusion

New EC2 C6i instances behind a SageMaker endpoint can accelerate inference deployment by up to 2.5 times with INT8 quantization. Quantizing the model in PyTorch is possible with a few APIs from Intel Extension for PyTorch. It’s recommended to quantize the model on C6i instances so that model accuracy is maintained in endpoint deployment. The SageMaker examples GitHub repo now provides an end-to-end deployment example pipeline for quantizing and hosting INT8 models.

We encourage you to create a new model or migrate an existing model using INT8 quantization using the EC2 C6i instance type and see the performance gains for yourself.


Notice and disclaimers

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (0BSD).


Appendix

New AWS instances in SageMaker with INT8 deployment support

The following table lists SageMaker instances with and without DL Boost support.

Instance Name | Xeon Gen/Codename | INT8 Enabled? | DL Boost Enabled?
ml.c5.xlarge – ml.c5.9xlarge | Skylake/1st | Yes | No
ml.c5.18xlarge | Skylake/1st | Yes | No
ml.c6i.xlarge – ml.c6i.32xlarge | Ice Lake/3rd | Yes | Yes

To summarize, INT8 enabled supports the INT8 data type and computation; DL Boost enabled supports Deep Learning Boost.

Benchmark data

The following table compares the cost and relative performance between c5 and c6 instances.

Latency and throughput were measured with 10,000 inference queries to SageMaker endpoints.

E2E Latency of Inference Endpoint and Cost Analysis
Instance | P50 (ms) | P90 (ms) | Queries/sec | $/1M Queries | Relative $/Performance
c5.2xlarge-FP32 | 76.6 | 125.3 | 11.5 | $10.2 | 1.0x
c6i.2xlarge-FP32 | 70 | 110.8 | 13 | $9.0 | 1.1x
c6i.2xlarge-INT8 | 35.7 | 48.9 | 25.56 | $4.5 | 2.3x

INT8 models are expected to provide 2–4 times practical performance improvement with less than 1% accuracy loss for most models. The preceding table includes overhead latency (network and demo application).

Accuracy for BERT-base model

The following table summarizes the accuracy for the INT8 model with the SQuAD v1.1 dataset.

Metric | FP32 | INT8
Exact Match | 85.8751 | 85.5061
F1 | 92.0807 | 91.8728

The GitHub repo comes with the scripts to check the accuracy of the SQuAD dataset. Refer to invoke-INT8.py and invoke-FP32.py scripts for testing.

Intel Extension for PyTorch

Intel® Extension for PyTorch* (an open-source project on GitHub) extends PyTorch with optimizations for extra performance boosts on Intel hardware. Most of the optimizations will eventually be included in stock PyTorch releases, and the intention of the extension is to deliver up-to-date features and optimizations for PyTorch on Intel hardware. Examples include AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX).

The following figure illustrates the Intel Extension for PyTorch architecture.

For more detailed user guidance (features, performance tuning, and more) for Intel® Extension for PyTorch, refer to Intel® Extension for PyTorch* user guidance.


About the Authors

Rohit Chowdhary is a Sr. Solutions Architect in the Strategic Accounts team at AWS.

Aniruddha Kappagantu is a Software Development Engineer in the AI Platforms team at AWS.

Antony Vance is an AI Architect at Intel with 19 years of experience in computer vision, machine learning, deep learning, embedded software, GPU, and FPGA.

Read More

Intelligently search your organization’s Microsoft Teams data source with the Amazon Kendra connector for Microsoft Teams

Organizations use messaging platforms like Microsoft Teams to bring the right people together to securely communicate with each other and collaborate to get work done. Microsoft Teams captures invaluable organizational knowledge in the form of the information that flows through it as users collaborate. However, making this knowledge easily and securely available to users can be challenging due to the fragmented nature of conversations across groups, channels, and chats within an organization. Additionally, the conversational nature of Microsoft Teams communication renders a traditional keyword-based approach to search ineffective when trying to find accurate answers to questions from the content and therefore requires intelligent search capabilities that have the ability to process natural language queries.

You can now use the Amazon Kendra connector for Microsoft Teams to index Microsoft Teams messages and documents, and search this content using intelligent search in Amazon Kendra, powered by machine learning (ML).

This post shows how to configure the Amazon Kendra connector for Microsoft Teams and take advantage of the service’s intelligent search capabilities. We use an example of an illustrative Microsoft Teams instance where users discuss technical topics related to AWS.

Solution overview

Microsoft Teams content for active organizations is dynamic in nature due to continuous collaboration. Microsoft Teams includes public channels where any user can participate, and private channels where only those users who are members of these channels can communicate with each other. Furthermore, individuals can directly communicate with one another in one-on-one and ad hoc groups. This communication is in the form of messages and threads of replies, with optional document attachments.

In our solution, we configure Microsoft Teams as a data source for an Amazon Kendra search index using the Amazon Kendra connector for Microsoft Teams. Based on the configuration, when the data source is synchronized, the connector crawls and indexes all the content from Microsoft Teams that was created on or before a specific date. The connector also indexes the Access Control List (ACL) information for each message and document. When access control or user context filtering is enabled, the search results of a query made by a user includes results only from those documents that the user is authorized to read.

The Amazon Kendra connector for Microsoft Teams can integrate with AWS IAM Identity Center (Successor to AWS Single Sign-On). You first must enable IAM Identity Center and create an organization to sync users and groups from your active directory. The connector will use the user name and group lookup for the user context of the search queries.

With Amazon Kendra Experience Builder, you can build and deploy a low-code, fully functional search application to search your Microsoft Teams data source.

Prerequisites

To try out the Amazon Kendra connector for Microsoft Teams using this post as a reference, you need the following:

Note that the Microsoft Graph API places throttling limits on the number of concurrent calls to a service to prevent overuse of resources.

Configure Microsoft Teams

The following screenshot shows our example Microsoft Teams instance with sample content and the PDF file AWS_Well-Architect_Framework.pdf that we will use for our Amazon Kendra search queries.

The following steps describe the configuration of a new Amazon Kendra connector application in the Azure portal. This will create a user OAuth token to be used in configuring the Amazon Kendra connector for Microsoft Teams.

  1. Log in to Azure Portal with your Microsoft credentials.
  2. Register an application with the Microsoft Identity platform.

  3. Next to Client credentials, choose Add a certificate or secret to add a new client secret.

  4. For Description, enter a description (for example, KendraConnectorSecret).
  5. For Expires, choose an expiry date (for example, 6 months).
  6. Choose Add.

  7. Save the secret ID and secret value to use later when creating an Amazon Kendra data source.

  8. Choose Add a permission.

  9. Choose Microsoft Graph to add all necessary Microsoft Graph permissions.

  10. Choose Application permissions.

The registered application should have the following API permissions to allow crawling all entities supported by the Amazon Kendra connector for Microsoft Teams:

  • ChannelMessage.Read.All
  • Chat.Read
  • Chat.Read.All
  • Chat.ReadBasic
  • Chat.ReadBasic.All
  • ChatMessage.Read.All
  • Directory.Read.All
  • Files.Read.All
  • Group.Read.All
  • Mail.Read
  • Mail.ReadBasic
  • User.Read
  • User.Read.All
  • TeamMember.Read.All

However, you can select a lesser scope based on the entities chosen to be crawled. The following lists are the minimum sets of permissions needed for each entity:

  • Channel Post:
    • ChannelMessage.Read.All
    • Group.Read.All
    • User.Read
    • User.Read.All
    • TeamMember.Read.All (user-group mapping for identity crawl)
  • Channel Attachment:
    • ChannelMessage.Read.All
    • Group.Read.All
    • User.Read
    • User.Read.All
    • TeamMember.Read.All (user-group mapping for identity crawl)
  • Channel Wiki:
    • Group.Read.All
    • User.Read
    • User.Read.All
    • TeamMember.Read.All (user-group mapping for identity crawl)
  • Chat Message:
    • Chat.Read.All
    • ChatMessage.Read.All
    • ChatMember.Read.All
    • User.Read
    • User.Read.All
    • Group.Read.All
    • TeamMember.Read.All (user-group mapping for identity crawl)
  • Meeting Chat:
    • Chat.Read.All
    • ChatMessage.Read.All
    • ChatMember.Read.All
    • User.Read
    • User.Read.All
    • Group.Read.All
    • TeamMember.Read.All (user-group mapping for identity crawl)
  • Chat Attachment:
    • Chat.Read.All
    • ChatMessage.Read.All
    • ChatMember.Read.All
    • User.Read
    • User.Read.All
    • Group.Read.All
    • Files.Read.All
    • TeamMember.Read.All (user-group mapping for identity crawl)
  • Meeting File:
    • Chat.Read.All
    • ChatMessage.Read.All
    • ChatMember.Read.All
    • User.Read
    • User.Read.All
    • Group.Read.All
    • Files.Read.All
    • TeamMember.Read.All (user-group mapping for identity crawl)
  • Calendar Meeting:
    • Calendars.Read
    • Group.Read.All
    • TeamMember.Read.All
    • User.Read
    • User.Read.All
    • TeamMember.Read.All (user-group mapping for identity crawl)
  • Meeting Notes:
    • Group.Read.All
    • User.Read
    • User.Read.All
    • Files.Read.All
    • TeamMember.Read.All (user-group mapping for identity crawl)
  11. Select your permissions and choose Add permissions.

Configure the data source using the Amazon Kendra connector for Microsoft Teams

To add a data source to your Amazon Kendra index using the Microsoft Teams connector, you can use an existing Amazon Kendra index, or create a new Amazon Kendra index. Then complete the steps in this section. For more information on this topic, refer to Microsoft Teams.

  1. On the Amazon Kendra console, open the index and choose Data sources in the navigation pane.
  2. Choose Add data source.
  3. Under Microsoft Teams connector, choose Add connector.

  4. In the Specify data source details section, enter the details of your data source and choose Next.
  5. In the Define access and security section, for Tenant ID, enter the Microsoft Teams tenant ID from the Microsoft account dashboard.
  6. Under Authentication, you can either choose Create to add a new secret with the client ID and client secret of the Microsoft Teams tenant, or use an existing AWS Secrets Manager secret that has the client ID and client secret of the Microsoft Teams tenant that you want the connector to access.
  7. Choose Save.

  8. Optionally, choose the appropriate payment model:
    • Model A payment models are restricted to licensing and payment models that require security compliance.
    • Model B payment models are suitable for licensing and payment models that don’t require security compliance.
    • Use Evaluation Mode (default) for limited usage evaluation purposes.
  9. For IAM role, you can choose Create a new role or choose an existing IAM role configured with appropriate IAM policies to access the Secrets Manager secret, Amazon Kendra index, and data source.
  10. Choose Next.

  11. In the Configure sync settings section, provide information regarding your sync scope.

  12. For Sync mode, choose your sync mode (for this post, select Full sync).

With the Full sync option, every time the sync runs, Amazon Kendra will crawl all documents and ingest each document even if ingested earlier. The full refresh enables you to reset your Amazon Kendra index without the need to delete and create a new data source. If you choose New or modified content sync or New, modified, or deleted content sync, every time the sync job runs, it will process only objects added, modified, or deleted since the last crawl. Incremental crawls can help reduce runtime and cost when used with datasets that append new objects to existing data sources on a regular basis.

  13. For Sync run schedule, choose Run on demand.
  14. Choose Next.

  15. In the Set field mappings section, you can optionally configure the field mappings, wherein Microsoft Teams field names may be mapped to a different Amazon Kendra attribute or facet.
  16. Choose Next.

  17. Review your settings and confirm to add the data source.
  18. After the data source is added, choose Data sources in the navigation pane, select the newly added data source, and choose Sync now to start data source synchronization with the Amazon Kendra index.

The sync process can take upwards of 30 minutes (depending on the amount of data to be crawled).

Now let’s enable access control for the Amazon Kendra index.

  1. In the navigation pane, choose your index.
  2. On the User access control tab, choose Edit settings and change the settings to look like the following screenshot.
  3. Choose Next, then choose Update.

Perform intelligent search with Amazon Kendra

Before you try searching on the Amazon Kendra console or using the API, make sure that the data source sync is complete. To check, view the data sources and verify if the last sync was successful.

Now we’re ready to search our index.

  1. On the Amazon Kendra console, navigate to the index and choose Search indexed content in the navigation pane.
  2. Let’s use the query “How do you detect security events” and not provide an access token.

Based on our access control settings, a valid access token is needed to access authenticated content; therefore, when we use this search query without setting any user name or group, no results are returned.

  3. Next, choose Apply token and set the user name to a user in the domain (for example, usertest4) that has access to the Microsoft Teams content.

In this example, the search will return a result from the PDF file uploaded in the Microsoft Teams chat message.

  4. Finally, choose Apply token and set the user name to a different user in the domain (for example, usertest) that has access to different Microsoft Teams content.

In this example, the search will return a different Microsoft Teams chat message.

This confirms that the ACLs ingested in Amazon Kendra by the connector for Microsoft Teams are being enforced in the search results based on the user name.
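You can verify the same behavior programmatically. The following Boto3 sketch runs the query with a user context so that the ingested ACLs are enforced; the index ID and user name are placeholders.

# Hedged sketch of querying the Amazon Kendra index with a user context
import boto3

kendra = boto3.client('kendra')

response = kendra.query(
    IndexId='0123abcd-0123-abcd-0123-0123456789ab',   # placeholder index ID
    QueryText='How do you detect security events',
    UserContext={'UserId': 'usertest4'},              # search as this user
)

# Print the type and title of each matching result
for item in response['ResultItems']:
    print(item['Type'], '-', item.get('DocumentTitle', {}).get('Text'))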

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Amazon Kendra connector for Microsoft Teams, delete that data source.

Conclusion

With the Amazon Kendra connector for Microsoft Teams, organizations can make invaluable information trapped in their Microsoft Teams instances available to their users securely using intelligent search powered by Amazon Kendra. Additionally, the connector provides facets for Microsoft Teams attributes such as channels, authors, and categories for the users to interactively refine the search results based on what they’re looking for.

To learn more about the Amazon Kendra connector for Microsoft Teams, refer to Microsoft Teams.

For more information on how you can create, modify, or delete metadata and content when ingesting your data from Microsoft Teams, refer to Customizing document metadata during the ingestion process and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.


About the Authors

Praveen Edem is a Senior Solutions Architect at Amazon Web Services. He works with major financial services customers, architecting and modernizing their critical large-scale applications while adopting AWS services. He has over 20 years of IT experience in application development and software architecture.

Gunwant Walbe is a Software Development Engineer at Amazon Web Services. He is an avid learner and keen to adopt new technologies. He develops complex business applications, and Java is his primary language of choice.

Read More

Bring legacy machine learning code into Amazon SageMaker using AWS Step Functions

Tens of thousands of AWS customers use AWS machine learning (ML) services to accelerate their ML development with fully managed infrastructure and tools. Customers who have been developing ML models on premises, such as on their local desktop, want to migrate their legacy ML models to the AWS Cloud to take full advantage of the most comprehensive set of ML services, infrastructure, and implementation resources available on AWS.

The term legacy code refers to code that was developed to be run manually on a local desktop, and isn’t built with cloud-ready SDKs such as the AWS SDK for Python (Boto3) or the Amazon SageMaker Python SDK. In other words, this legacy code isn’t optimized for cloud deployment. The best practice for migration is to refactor the legacy code using the Amazon SageMaker API or the SageMaker Python SDK. However, in some cases, organizations with a large number of legacy models may not have the time or resources to rewrite all those models.

In this post, we share a scalable and easy-to-implement approach to migrate legacy ML code to the AWS Cloud for inference using Amazon SageMaker and AWS Step Functions, with a minimum amount of code refactoring required. You can easily extend this solution to add more functionality. We demonstrate how two different personas, a data scientist and an MLOps engineer, can collaborate to lift and shift hundreds of legacy models.

Solution overview

In this framework, we run the legacy code in a container as a SageMaker Processing job. SageMaker runs the legacy script inside a processing container. The processing container image can either be a SageMaker built-in image or a custom image. The underlying infrastructure for a Processing job is fully managed by SageMaker. No change to the legacy code is required. Familiarity with creating SageMaker Processing jobs is all that is required.

We assume the involvement of two personas: a data scientist and an MLOps engineer. The data scientist is responsible for moving the code into SageMaker, either manually or by cloning it from a code repository such as AWS CodeCommit. Amazon SageMaker Studio provides an integrated development environment (IDE) for implementing various steps in the ML lifecycle, and the data scientist uses it to manually build a custom container that contains the necessary code artifacts for deployment. The container will be registered in a container registry such as Amazon Elastic Container Registry (Amazon ECR) for deployment purposes.

The MLOps engineer takes ownership of building a Step Functions workflow that we can reuse to deploy the custom container developed by the data scientist with the appropriate parameters. The Step Functions workflow can be as modular as needed to fit the use case, or it can consist of just one step to initiate a single process. To minimize the effort required to migrate the code, we have identified three modular components to build a fully functional deployment process:

  • Preprocessing
  • Inference
  • Postprocessing

The following diagram illustrates our solution architecture and workflow.

The following steps are involved in this solution:

  1. The data scientist persona uses Studio to import legacy code through cloning from a code repository, and then modularizing the code into separate components that follow the steps of the ML lifecycle (preprocessing, inference, and postprocessing).
  2. The data scientist uses Studio, and specifically the Studio Image Build CLI tool provided by SageMaker, to build a Docker image. This CLI tool allows the data scientist to build the image directly within Studio and automatically registers the image into Amazon ECR.
  3. The MLOps engineer uses the registered container image and creates a deployment for a specific use case using Step Functions. Step Functions is a serverless workflow service that can control SageMaker APIs directly through the use of the Amazon States Language.

SageMaker Processing job

Let’s understand how a SageMaker Processing job runs. The following diagram shows how SageMaker spins up a Processing job.

SageMaker takes your script, copies your data from Amazon Simple Storage Service (Amazon S3), and then pulls a processing container. The processing container image can either be a SageMaker built-in image or a custom image that you provide. The underlying infrastructure for a Processing job is fully managed by SageMaker. Cluster resources are provisioned for the duration of your job, and cleaned up when a job is complete. The output of the Processing job is stored in the S3 bucket you specified. To learn more about building your own container, refer to Build Your Own Processing Container (Advanced Scenario).

The SageMaker Processing job sets up your processing image using a Docker container entrypoint script. You can also provide your own custom entrypoint by using the ContainerEntrypoint and ContainerArguments parameters of the AppSpecification API. If you use your own custom entrypoint, you have the added flexibility to run it as a standalone script without rebuilding your images.

For this example, we construct a custom container and use a SageMaker Processing job for inference. Preprocessing and postprocessing jobs utilize the script mode with a pre-built scikit-learn container.

Prerequisites

To follow along this post, complete the following prerequisite steps:

  1. Create a Studio domain. For instructions, refer to Onboard to Amazon SageMaker Domain Using Quick setup.
  2. Create an S3 bucket.
  3. Clone the provided GitHub repo into Studio.

The GitHub repo is organized into different folders that correspond to various stages in the ML lifecycle, facilitating easy navigation and management.

Migrate the legacy code

In this step, we act as the data scientist responsible for migrating the legacy code.

We begin by opening the build_and_push.ipynb notebook.

The initial cell in the notebook guides you in installing the Studio Image Build CLI. This CLI simplifies the setup process by automatically creating a reusable build environment that you can interact with through high-level commands. With the CLI, building an image is as easy as telling it to build, and the result will be a link to the location of your image in Amazon ECR. This approach eliminates the need to manage the complex underlying workflow orchestrated by the CLI, streamlining the image building process.

Before we run the build command, it’s important to ensure that the role running the command has the necessary permissions, as specified in the CLI GitHub readme or related post. Failing to grant the required permissions can result in errors during the build process.

See the following code:

#Install sagemaker_studio_image_build utility
import sys
!{sys.executable} -m pip install sagemaker_studio_image_build

To streamline your legacy code, divide it into three distinct Python scripts named preprocessing.py, predict.py, and postprocessing.py. Adhere to best programming practices by converting the code into functions that are called from a main function. Ensure that all necessary libraries are imported and the requirements.txt file is updated to include any custom libraries.
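As an illustration, one of the three scripts (for example, preprocessing.py) might follow a structure like the sketch below. The function names, argument defaults, and paths under /opt/ml/processing are assumptions for this example, not the repository's actual code.

# Hypothetical skeleton for preprocessing.py: imports, functions, and a main entry point
import argparse
import pandas as pd


def load_data(path):
    # Read the raw input that the Processing job copies into the container
    return pd.read_csv(path)


def transform(df):
    # Place the legacy feature engineering logic here, unchanged
    return df.dropna()


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', default='/opt/ml/processing/input/data/raw.csv')
    parser.add_argument('--output', default='/opt/ml/processing/output/processed.csv')
    args = parser.parse_args()

    df = load_data(args.input)
    transform(df).to_csv(args.output, index=False)


if __name__ == '__main__':
    main()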

After you organize the code, package it along with the requirements file into a Docker container. You can easily build the container from within Studio using the following command:

sm-docker build .

By default, the image will be pushed to an ECR repository called sagemakerstudio with the tag latest. Additionally, the execution role of the Studio app will be utilized, along with the default SageMaker Python SDK S3 bucket. However, these settings can be easily altered using the appropriate CLI options. See the following code:

sm-docker build . --repository mynewrepo:1.0 --role SampleDockerBuildRole --bucket sagemaker-us-east-1-0123456789999 --vpc-id vpc-0c70e76ef1c603b94 --subnet-ids subnet-0d984f080338960bb,subnet-0ac3e96808c8092f2 --security-group-ids sg-0d31b4042f2902cd0

Now that the container has been built and registered in an ECR repository, it’s time to dive deeper into how we can use it to run predict.py. We also show you the process of using a pre-built scikit-learn container to run preprocessing.py and postprocessing.py.

Productionize the container

In this step, we act as the MLOps engineer who productionizes the container built in the previous step.

We use Step Functions to orchestrate the workflow. Step Functions allows for exceptional flexibility in integrating a diverse range of services into the workflow, accommodating any existing dependencies that may exist in the legacy system. This approach ensures that all necessary components are seamlessly integrated and run in the desired sequence, resulting in an efficient and effective workflow solution.

Step Functions can control certain AWS services directly from the Amazon States Language. To learn more about working with Step Functions and its integration with SageMaker, refer to Manage SageMaker with Step Functions. Using the Step Functions integration capability with SageMaker, we run the preprocessing and postprocessing scripts using a SageMaker Processing job in script mode and run inference as a SageMaker Processing job using a custom container. We do so using AWS SDK for Python (Boto3) CreateProcessingJob API calls.

Preprocessing

SageMaker offers several options for running custom code. If you only have a script without any custom dependencies, you can run the script as a Bring Your Own Script (BYOS). To do this, simply pass your script to the pre-built scikit-learn framework container and run a SageMaker Processing job in script mode using the ContainerArguments and ContainerEntrypoint parameters in the AppSpecification API. This is a straightforward and convenient method for running simple scripts.

Check out the “Preprocessing Script Mode” state configuration in the sample Step Functions workflow to understand how to configure the CreateProcessingJob API call to run a custom script.
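Outside of Step Functions, the equivalent Boto3 call looks roughly like the following sketch. The bucket names, image URI, role ARN, and job name are placeholders, and the sample workflow's actual state configuration may set additional fields.

# Hedged sketch of CreateProcessingJob in script mode with the pre-built scikit-learn image
import boto3

sm = boto3.client('sagemaker')

sm.create_processing_job(
    ProcessingJobName='legacy-preprocessing-demo',                       # placeholder
    RoleArn='arn:aws:iam::111122223333:role/SageMakerProcessingRole',    # placeholder
    AppSpecification={
        'ImageUri': '<pre-built scikit-learn image URI for your Region>',  # placeholder
        'ContainerEntrypoint': ['python3', '/opt/ml/processing/input/code/preprocessing.py'],
    },
    ProcessingInputs=[
        {   # The script itself, copied from Amazon S3 into the container
            'InputName': 'code',
            'S3Input': {
                'S3Uri': 's3://my-bucket/code/preprocessing.py',
                'LocalPath': '/opt/ml/processing/input/code',
                'S3DataType': 'S3Prefix',
                'S3InputMode': 'File',
            },
        },
        {   # The raw data to preprocess
            'InputName': 'data',
            'S3Input': {
                'S3Uri': 's3://my-bucket/input/',
                'LocalPath': '/opt/ml/processing/input/data',
                'S3DataType': 'S3Prefix',
                'S3InputMode': 'File',
            },
        },
    ],
    ProcessingOutputConfig={
        'Outputs': [{
            'OutputName': 'processed',
            'S3Output': {
                'S3Uri': 's3://my-bucket/preprocessed/',
                'LocalPath': '/opt/ml/processing/output',
                'S3UploadMode': 'EndOfJob',
            },
        }],
    },
    ProcessingResources={
        'ClusterConfig': {
            'InstanceCount': 1,
            'InstanceType': 'ml.m5.xlarge',
            'VolumeSizeInGB': 30,
        },
    },
)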

Inference

You can run a custom container using the Build Your Own Processing Container approach. The SageMaker Processing job operates with the /opt/ml local path, and you can specify your ProcessingInputs and their local path in the configuration. The Processing job then copies the artifacts to the local container and starts the job. After the job is complete, it copies the artifacts specified in the local path of the ProcessingOutputs to its specified external location.

Check out the “Inference Custom Container” state configuration in the sample Step Functions workflow to understand how to configure the CreateProcessingJob API call to run a custom container.

Postprocessing

You can run a postprocessing script just like a preprocessing script using the Step Functions CreateProcessingJob step. Running a postprocessing script allows you to perform custom processing tasks after the inference job is complete.

Create the Step Functions workflow

For quickly prototyping, we use the Step Functions Amazon States Language. You can edit the Step Functions definition directly by using the States Language. Refer to the sample Step Functions workflow.

You can create a new Step Functions state machine on the Step Functions console by selecting Write your workflow in code.

Step Functions can look at the resources you use and create a role. However, you may see the following message:

“Step Functions cannot generate an IAM policy if the RoleArn for SageMaker is from a Path. Hardcode the SageMaker RoleArn in your state machine definition, or choose an existing role with the proper permissions for Step Functions to call SageMaker.”

To address this, you must create an AWS Identity and Access Management (IAM) role for Step Functions. For instructions, refer to Creating an IAM role for your state machine. Then attach the following IAM policy to provide the required permissions for running the workflow:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:createProcessingJob",
                "sagemaker:ListTags",
                "sagemaker:AddTags"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": "sagemaker.amazonaws.com"
                }
            }
        }
    ]
}
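
With the role and policy in place, you can also create the state machine programmatically instead of through the console. The following is a minimal Boto3 sketch; the definition file name, state machine name, and role ARN are placeholders, and the definition itself is the sample Amazon States Language workflow.

import boto3

sfn = boto3.client("stepfunctions")

# Placeholder file containing the Amazon States Language definition
with open("legacy-migration-workflow.asl.json") as f:
    definition = f.read()

sfn.create_state_machine(
    name="legacy-ml-migration",
    definition=definition,
    # Role with the policy above, assumable by Step Functions
    roleArn="arn:aws:iam::111122223333:role/StepFunctionsSageMakerRole",
)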

The following figure illustrates the flow of data and container images into each step of the Step Functions workflow.

The following is a list of the minimum required parameters to initialize in Step Functions; you can also refer to the sample input parameters JSON (a sketch of passing these values when starting an execution follows the list):

  • input_uri – The S3 URI for the input files
  • output_uri – The S3 URI for the output files
  • code_uri – The S3 URI for script files
  • custom_image_uri – The container URI for the custom container you have built
  • scikit_image_uri – The container URI for the pre-built scikit-learn framework
  • role – The execution role to run the job
  • instance_type – The instance type you need to use to run the container
  • volume_size – The storage volume size you require for the container
  • max_runtime – The maximum runtime for the container, with a default value of 1 hour
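
These parameters are supplied as the execution input when you start the workflow. The following Boto3 sketch shows one way to do that; all values are placeholders and should be aligned with the sample input parameters JSON.

import json
import boto3

sfn = boto3.client("stepfunctions")

# Placeholder values: align these with the sample input parameters JSON
execution_input = {
    "input_uri": "s3://my-bucket/input/",
    "output_uri": "s3://my-bucket/output/",
    "code_uri": "s3://my-bucket/code/",
    "custom_image_uri": "<custom-inference-image-uri>",
    "scikit_image_uri": "<pre-built-scikit-learn-image-uri>",
    "role": "arn:aws:iam::111122223333:role/SageMakerProcessingRole",
    "instance_type": "ml.m5.xlarge",
    "volume_size": 30,
    "max_runtime": 3600,
}

sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:111122223333:stateMachine:legacy-ml-migration",
    input=json.dumps(execution_input),
)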

Run the workflow

We have broken down the legacy code into manageable parts: preprocessing, inference, and postprocessing. To support our inference needs, we built a custom container equipped with the necessary library dependencies. We use Step Functions, taking advantage of its ability to call the SageMaker API directly. We have shown two methods for running custom code using the SageMaker API: a SageMaker Processing job that uses a pre-built image and takes a custom script at runtime, and a SageMaker Processing job that uses a custom container, which is packaged with the necessary artifacts to run custom inference.

The following figure shows a run of the Step Functions workflow.

Summary

In this post, we discussed the process of migrating legacy ML Python code from local development environments and implementing a standardized MLOps procedure. With this approach, you can effortlessly transfer hundreds of models and incorporate your desired enterprise deployment practices. We presented two different methods for running custom code on SageMaker, and you can select the one that best suits your needs.

If you require a highly customizable solution, we recommend the custom container approach. If you have simple scripts and don’t need to build your own container, running your custom script on a pre-built image, as described in the preprocessing step earlier, may be more suitable. Furthermore, if required, you can apply this solution to containerize legacy model training and evaluation steps, just like the inference step is containerized in this post.


About the Authors

Bhavana Chirumamilla is a Senior Resident Architect at AWS with a strong passion for data and machine learning operations. She brings a wealth of experience and enthusiasm to help enterprises build effective data and ML strategies. In her spare time, Bhavana enjoys spending time with her family and engaging in various activities such as traveling, hiking, gardening, and watching documentaries.

Shyam Namavaram is a senior artificial intelligence (AI) and machine learning (ML) specialist solutions architect at Amazon Web Services (AWS). He passionately works with customers to accelerate their AI and ML adoption by providing technical guidance and helping them innovate and build secure cloud solutions on AWS. He specializes in AI and ML, containers, and analytics technologies. Outside of work, he loves playing sports and experiencing nature with trekking.

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his PhD in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently, he helps customers in the financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

Srinivasa Shaik is a Solutions Architect at AWS based in Boston. He helps enterprise customers accelerate their journey to the cloud. He is passionate about containers and machine learning technologies. In his spare time, he enjoys spending time with his family, cooking, and traveling.


Maximize performance and reduce your deep learning training cost with AWS Trainium and Amazon SageMaker


Today, tens of thousands of customers are building, training, and deploying machine learning (ML) models using Amazon SageMaker to power applications that have the potential to reinvent their businesses and customer experiences. These ML models have been increasing in size and complexity over the last few years, which has led to state-of-the-art accuracies across a range of tasks and has also pushed training times from days to weeks. As a result, customers must scale their models across hundreds to thousands of accelerators, which makes them more expensive to train.

SageMaker is a fully managed ML service that helps developers and data scientists easily build, train, and deploy ML models. SageMaker already provides the broadest and deepest choice of compute offerings featuring hardware accelerators for ML training, including G5 (Nvidia A10G) instances and P4d (Nvidia A100) instances.

Growing compute requirements call for faster and more cost-effective processing power. To further reduce model training times and enable ML practitioners to iterate faster, AWS has been innovating across chips, servers, and data center connectivity. The new Trn1 instances powered by AWS Trainium chips offer the best price-performance and the fastest ML model training on AWS, providing up to 50% lower cost to train deep learning models over comparable GPU-based instances without any drop in accuracy.

In this post, we show how you can maximize your performance and reduce cost using Trn1 instances with SageMaker.

Solution overview

SageMaker training jobs support ml.trn1 instances, powered by Trainium chips, which are purpose built for high-performance ML training applications in the cloud. You can use ml.trn1 instances on SageMaker to train natural language processing (NLP), computer vision, and recommender models across a broad set of applications, such as speech recognition, recommendation, fraud detection, image and video classification, and forecasting. The ml.trn1 instances feature up to 16 Trainium chips; Trainium is the second-generation ML chip built by AWS, after AWS Inferentia. ml.trn1 instances are the first Amazon Elastic Compute Cloud (Amazon EC2) instances with up to 800 Gbps of Elastic Fabric Adapter (EFA) network bandwidth. For efficient data and model parallelism, each ml.trn1.32xl instance has 512 GB of high-bandwidth memory, delivers up to 3.4 petaflops of FP16/BF16 compute power, and features NeuronLink, an intra-instance, high-bandwidth, nonblocking interconnect.

Trainium is available in two configurations and can be used in the US East (N. Virginia) and US West (Oregon) Regions.

The following table summarizes the features of the Trn1 instances.

Instance Size                  Trainium Accelerators   Accelerator Memory (GB)   vCPUs   Instance Memory (GiB)   Network Bandwidth (Gbps)   EFA and RDMA Support
trn1.2xlarge                   1                       32                        8       32                      Up to 12.5                 No
trn1.32xlarge                  16                      512                       128     512                     800                        Yes
trn1n.32xlarge (coming soon)   16                      512                       128     512                     1600                       Yes

Let’s walk through a simple example of how to use Trainium with SageMaker. We train a text classification model with SageMaker Training and PyTorch using the Hugging Face Transformers library.

We use the Amazon Reviews dataset, which consists of reviews from amazon.com. The data spans a period of 18 years, comprising approximately 35 million reviews up to March 2013. Reviews include product and user information, ratings, and a plaintext review. The following code is an example from the AmazonPolarity test set:

{
    'title': 'Great CD',
    'content': "My lovely Pat has one of the GREAT voices of her generation. I have listened to this CD for YEARS and I still LOVE IT. When I'm in a good mood it makes me feel better. A bad mood just evaporates like sugar in the rain. This CD just oozes LIFE. Vocals are jusat STUUNNING and lyrics just kill. One of life's hidden gems. This is a desert isle CD in my book. Why she never made it big is just beyond me. Everytime I play this, no matter black, white, young, old, male, female EVERYBODY says one thing \"Who was that singing ?\"",
    'label': 1
}

For this post, we only use the content and label fields. The content field is a free text review, and the label field is a binary value containing 1 or 0 for positive or negative reviews, respectively.

For our algorithm, we use BERT, a transformer model pre-trained on a large corpus of English data in a self-supervised fashion. This model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification, or question answering.
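
As a rough illustration of this setup, the following sketch loads the Amazon Polarity dataset from the Hugging Face Hub, instantiates a BERT classifier, and tokenizes the content field. The checkpoint name and sequence length here are illustrative choices; the notebook referenced later contains the exact preprocessing used in this post.

from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the Amazon Reviews polarity dataset from the Hugging Face Hub
dataset = load_dataset("amazon_polarity")

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tokenize the free-text "content" field; "label" is already 0 or 1
def tokenize(batch):
    return tokenizer(batch["content"], padding="max_length", truncation=True, max_length=128)

encoded_dataset = dataset.map(tokenize, batched=True)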

Implementation details

Let’s begin by taking a closer look at the different components involved in training the model:

  • AWS Trainium – At its core, each Trainium instance has Trainium devices built into it. Trn1.2xlarge has 1 Trainium device, and Trn1.32xlarge has 16 Trainium devices. Each Trainium device consists of compute (2 NeuronCore-v2), 32 GB of HBM device memory, and NeuronLink for fast inter-device communication. Each NeuronCore-v2 is a fully independent heterogeneous compute unit with separate engines (Tensor/Vector/Scalar/GPSIMD). The GPSIMD engines are fully programmable general-purpose processors that you can use to implement custom operators and run them directly on the NeuronCore engines.
  • Amazon SageMaker Training – SageMaker provides a fully managed training experience to easily train models without having to worry about infrastructure. When you use SageMaker Training, it runs everything needed for a training job, such as code, container, and data, in a compute infrastructure separate from the invocation environment. This allows us to run experiments in parallel and iterate fast. SageMaker provides a Python SDK to launch training jobs. The example in this post uses the SageMaker Python SDK to trigger the training job using Trainium.
  • AWS Neuron – Because the Trainium NeuronCore has its own compute engine, we need a mechanism to compile our training code. The AWS Neuron compiler takes code written in PyTorch/XLA and optimizes it to run on Neuron devices. The Neuron compiler is integrated as part of the Deep Learning Container we use for training our model.
  • PyTorch/XLA – This Python package uses the XLA deep learning compiler to connect the PyTorch deep learning framework and cloud accelerators like Trainium. Building a new PyTorch network or converting an existing one to run on XLA devices requires only a few lines of XLA-specific code. We will see for our use case what changes we need to make.
  • Distributed training – To run the training efficiently on multiple NeuronCores, we need a mechanism to distribute the training into available NeuronCores. SageMaker supports torchrun with Trainium instances, which can be used to run multiple processes equivalent to the number of NeuronCores in the cluster. This is done by passing the distribution parameter to the SageMaker estimator as follows, which starts a data parallel distributed training where the same model is loaded into different NeuronCores that process separate data batches:
distribution={"torch_distributed": {"enabled": True}}

Script changes needed to run on Trainium

Let’s look at the code changes needed to adapt a regular GPU-based PyTorch script to run on Trainium. At a high level, we need to make the following changes (a consolidated sketch follows the list):

  1. Replace GPU devices with PyTorch/XLA devices. Because we use the torch distributed launcher, we need to initialize training with XLA as the device as follows:
    device = "xla"
    torch.distributed.init_process_group(device)

  2. We use the PyTorch/XLA distributed backend to bridge the PyTorch distributed APIs to XLA communication semantics.
  3. We use PyTorch/XLA MpDeviceLoader for the data ingestion pipelines. MpDeviceLoader helps improve performance by overlapping three steps: tracing, compilation, and data batch loading to the device. We need to wrap the PyTorch dataloader with MpDeviceLoader as follows:
    train_device_loader = pl.MpDeviceLoader(train_loader, "xla")

  4. Run the optimization step using the XLA-provided API as shown in the following code. This consolidates the gradients between cores and issues the XLA device step computation.
    torch_xla.core.xla_model.optimizer_step(optimizer)

  5. Map CUDA APIs (if any) to generic PyTorch APIs.
  6. Replace CUDA fused optimizers (if any) with generic PyTorch alternatives.
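
Putting these changes together, the following is a minimal sketch of the training loop. It assumes model, train_loader, and optimizer are already defined as in a regular PyTorch script, and that the batches use input_ids, attention_mask, and labels columns; the notebook in the GitHub repo is the authoritative implementation.

import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.parallel_loader as pl
import torch_xla.distributed.xla_backend  # registers the "xla" distributed backend

# Assumes model, train_loader, and optimizer are defined as in a regular PyTorch script
torch.distributed.init_process_group("xla")
device = xm.xla_device()
model = model.to(device)

# MpDeviceLoader overlaps tracing, compilation, and batch loading to the device
train_device_loader = pl.MpDeviceLoader(train_loader, device)

model.train()
for batch in train_device_loader:
    optimizer.zero_grad()
    outputs = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=batch["labels"],
    )
    outputs.loss.backward()
    # Consolidates gradients across NeuronCores and issues the XLA device step
    xm.optimizer_step(optimizer)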

The entire example, which trains a text classification model using SageMaker and Trainium, is available in the following GitHub repo. The notebook file Fine tune Transformers for building classification models using SageMaker and Trainium.ipynb is the entrypoint and contains step-by-step instructions to run the training.

Benchmark tests

In the test, we ran two training jobs: one on ml.trn1.32xlarge, and one on ml.p4d.24xlarge with the same batch size, training data, and other hyperparameters. During the training jobs, we measured the billable time of the SageMaker training jobs and calculated the training cost by multiplying the billable time in hours by the price per hour for the instance type. We selected the best result for each instance type out of multiple job runs.

The following table summarizes our benchmark findings.

Model                      Instance Type      Price ($/node-hour)   Throughput (iterations/sec)   Validation Accuracy   Billable Time (sec)   Training Cost ($)
BERT base classification   ml.trn1.32xlarge   24.725                6.64                          0.984                 6033                  41.47
BERT base classification   ml.p4d.24xlarge    37.69                 5.44                          0.984                 6553                  68.6

The results showed that the Trainium instance costs less to train than the P4d instance while providing similar throughput and accuracy when training the same model with the same input data and training parameters. This means the Trainium instance delivers better price-performance than the GPU-based P4d instance. In this simple example, Trainium offers about 22% higher training throughput and roughly 40% lower training cost than the P4d instance.

Deploy the trained model

After we train the model, we can deploy it to various instance types, such as CPU, GPU, or AWS Inferentia. The key point to note is that the trained model isn’t dependent on specialized hardware for deployment and inference. SageMaker provides mechanisms to deploy a trained model using either real-time or batch mechanisms. The notebook example in the GitHub repo contains code to deploy the trained model as a real-time endpoint using an ml.c5.xlarge (CPU-based) instance.
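
As a brief sketch, assuming the fitted estimator from the training step and an appropriate inference handler packaged with the model, deploying to a CPU instance could look like the following.

# Assumes `estimator` is the fitted SageMaker estimator from the training step
# and that an inference handler (for example, a model_fn) is packaged with the model
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",  # CPU instance; the trained model has no accelerator dependency
)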

Conclusion

In this post, we looked at how to use Trainium and SageMaker to quickly set up and train a classification model that gives up to 50% cost savings without compromising on accuracy. You can use Trainium for a wide range of use cases that involve pre-training or fine-tuning Transformer-based models. For more information about support of various model architectures, refer to Model Architecture Fit Guidelines.


About the Authors

Arun Kumar Lokanatha is a Senior ML Solutions Architect with the Amazon SageMaker Service team. He focuses on helping customers build, train, and migrate ML production workloads to SageMaker at scale. He specializes in deep learning, especially in the areas of NLP and CV. Outside of work, he enjoys running and hiking.

Mark Yu is a Software Engineer in AWS SageMaker. He focuses on building large-scale distributed training systems, optimizing training performance, and developing high-performance ML training hardware, including SageMaker Trainium. Mark also has in-depth knowledge of machine learning infrastructure optimization. In his spare time, he enjoys hiking and running.

Omri Fuchs is a Software Development Manager at AWS SageMaker. He is the technical leader responsible for the SageMaker training job platform, focusing on optimizing SageMaker training performance and improving the training experience. He has a passion for cutting-edge ML and AI technology. In his spare time, he likes cycling and hiking.

Gal Oshri is a Senior Product Manager on the Amazon SageMaker team. He has 7 years of experience working on Machine Learning tools, frameworks, and services.
