Enable CI/CD of multi-Region Amazon SageMaker endpoints

Amazon SageMaker and SageMaker inference endpoints let you train and deploy your AI and machine learning (ML) workloads. With inference endpoints, you can deploy your models for real-time or batch inference. The endpoints support various types of ML models hosted using AWS Deep Learning Containers or your own containers with custom AI/ML algorithms. When you launch SageMaker inference endpoints with multiple instances, SageMaker distributes the instances across multiple Availability Zones (in a single Region) for high availability.

In some cases, however, you may need to deploy inference endpoints in multiple Regions to provide the lowest possible latency for customers in diverse geographical areas. Multi-Regional deployment of SageMaker endpoints and other related application and infrastructure components can also be part of a disaster recovery strategy for your mission-critical workloads, aimed at mitigating the risk of a Regional failure.

SageMaker Projects implements a set of pre-built MLOps templates that can help manage endpoint deployments. In this post, we show how you can extend an MLOps SageMaker Projects pipeline to enable multi-Regional deployment of your AI/ML inference endpoints.

Solution overview

SageMaker Projects deploys both training and deployment MLOps pipelines; you can use these to train a model and deploy it using an inference endpoint. To reduce the complexity and cost of a multi-Region solution, we assume that you train the model in a single Region and deploy inference endpoints in two or more Regions.

This post presents a solution that slightly modifies a SageMaker project template to support multi-Region deployment. To better illustrate the changes, the following figure displays both a standard MLOps pipeline created automatically by SageMaker (Steps 1-5) as well as changes required to extend it to a secondary Region (Steps 6-11).

The SageMaker Projects template automatically deploys a boilerplate MLOps solution, which includes the following components:

Amazon EventBridge monitors AWS CodeCommit repositories for changes and starts a run of AWS CodePipeline if a code commit is detected.
If there is a code change, AWS CodeBuild orchestrates the model training using SageMaker training jobs.
After the training job is complete, the SageMaker model registry registers and catalogs the trained model.
To prepare for the deployment stage, CodeBuild extends the default AWS CloudFormation template configuration files with parameters of an approved model from the model registry.
Finally, CodePipeline runs the CloudFormation templates to deploy the approved model to the staging and production inference endpoints.

The following additional steps modify the MLOps Projects template to enable the AI/ML model deployment in the secondary Region:

A replica of the Amazon Simple Storage Service (Amazon S3) bucket in the primary Region storing model artifacts is required in the secondary Region.
The CodePipeline template is extended with more stages to run a cross-Region deployment of the approved model.
As part of the cross-Region deployment process, the CodePipeline template uses a new CloudFormation template to deploy the inference endpoint in a secondary Region. The CloudFormation template deploys the model from the model artifacts from the S3 replica bucket created in Step 6.

Optionally (Steps 9-11), create resources in Amazon Route 53, Amazon API Gateway, and AWS Lambda to route application traffic to inference endpoints in the secondary Region.

Prerequisites

Create a SageMaker project in your primary Region (us-east-2 in this post). Complete the steps in Building, automating, managing, and scaling ML workflows using Amazon SageMaker Pipelines until the section Modifying the sample code for a custom use case.

Update your pipeline in CodePipeline

In this section, we discuss how to add manual CodePipeline approval and cross-Region model deployment stages to the existing pipeline that SageMaker created for you.

On the CodePipeline console in your primary Region, find and select the pipeline containing your project name and ending with deploy. This pipeline has already been created for you by SageMaker Projects. You modify this pipeline to add AI/ML endpoint deployment stages for the secondary Region.
Choose Edit.
Choose Add stage.
For Stage name, enter SecondaryRegionDeployment.
Choose Add stage.
In the SecondaryRegionDeployment stage, choose Add action group. In this action group, you add a manual approval step for model deployment in the secondary Region.
For Action name, enter ManualApprovaltoDeploytoSecondaryRegion.
For Action provider, choose Manual approval.
Leave all other settings at their defaults and choose Done.
In the SecondaryRegionDeployment stage, choose Add action group (after ManualApprovaltoDeploytoSecondaryRegion). In this action group, you add a cross-Region AWS CloudFormation deployment step. You specify the names of build artifacts that you create later in this post.
For Action name, enter DeploytoSecondaryRegion.
For Action provider, choose AWS CloudFormation.
For Region, enter your secondary Region name (for example, us-west-2).
For Input artifacts, enter BuildArtifact.
For Action mode, choose Create or update a stack.
For Stack name, enter DeploytoSecondaryRegion.
Under Template, for Artifact Name, select BuildArtifact.
Under Template, for File Name, enter template-export-secondary-region.yml.
Turn Use Configuration File on.
Under Template configuration, for Artifact Name, select BuildArtifact.
Under Template configuration, for File Name, enter secondary-region-config-export.json.
Under Capabilities, choose CAPABILITY_NAMED_IAM.
For Role, choose AmazonSageMakerServiceCatalogProductsUseRole created by SageMaker Projects.
Choose Done.
Choose Save.
If a Save pipeline changes dialog appears, choose Save again.
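
For readers who prefer to review the stage as data, the console steps above correspond roughly to the following stage definition, written here as a Python dictionary in the shape that the CodePipeline GetPipeline and UpdatePipeline APIs use. This is a minimal sketch rather than the definition SageMaker Projects generates: the helper name build_secondary_region_stage and the account ID in the role ARN are placeholders, and the console also handles details such as the artifact store for the secondary Region, which the sketch does not cover.

```python
import json


def build_secondary_region_stage(secondary_region, role_arn):
    """Return a stage definition mirroring the console steps (hypothetical helper)."""
    approval_action = {
        "name": "ManualApprovaltoDeploytoSecondaryRegion",
        "actionTypeId": {
            "category": "Approval",
            "owner": "AWS",
            "provider": "Manual",
            "version": "1",
        },
        "runOrder": 1,
    }
    deploy_action = {
        "name": "DeploytoSecondaryRegion",
        "actionTypeId": {
            "category": "Deploy",
            "owner": "AWS",
            "provider": "CloudFormation",
            "version": "1",
        },
        # Cross-Region action: runs in the secondary Region.
        "region": secondary_region,
        "inputArtifacts": [{"name": "BuildArtifact"}],
        "configuration": {
            "ActionMode": "CREATE_UPDATE",
            "StackName": "DeploytoSecondaryRegion",
            "TemplatePath": "BuildArtifact::template-export-secondary-region.yml",
            "TemplateConfiguration": "BuildArtifact::secondary-region-config-export.json",
            "Capabilities": "CAPABILITY_NAMED_IAM",
            "RoleArn": role_arn,
        },
        # Runs after the manual approval.
        "runOrder": 2,
    }
    return {
        "name": "SecondaryRegionDeployment",
        "actions": [approval_action, deploy_action],
    }


if __name__ == "__main__":
    # The account ID below is a placeholder, not a real account.
    stage = build_secondary_region_stage(
        "us-west-2",
        "arn:aws:iam::111122223333:role/AmazonSageMakerServiceCatalogProductsUseRole",
    )
    print(json.dumps(stage, indent=2))
```

Printing the dictionary makes it easy to compare against the output of get-pipeline after you save the changes in the console.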

Modify IAM role

We need to add additional permissions to the AWS Identity and Access Management (IAM) role AmazonSageMakerServiceCatalogProductsUseRole created by AWS Service Catalog to enable CodePipeline and S3 bucket access for cross-Region deployment.

On the IAM console, choose Roles in the navigation pane.
Search for and select AmazonSageMakerServiceCatalogProductsUseRole.
Choose the IAM policy under Policy name: AmazonSageMakerServiceCatalogProductsUseRole-XXXXXXXXX.
Choose Edit Policy and then JSON.
Modify the AWS CloudFormation permissions to allow CodePipeline to sync the S3 bucket in the secondary Region. You can replace the existing IAM policy with the updated one from the following GitHub repo (see lines 16-18, 198, and 213).
Choose Review policy.
Choose Save changes.

Add the deployment template for the secondary Region

To spin up an inference endpoint in the secondary Region, the SecondaryRegionDeployment stage needs a CloudFormation template (endpoint-config-template-secondary-region.yml) and a configuration file (secondary-region-config.json).

The CloudFormation template is configured entirely through parameters; you can further modify it to fit your needs. Similarly, you can use the config file to define the parameters for the endpoint launch configuration, such as the instance type and instance count:

{
  "Parameters": {
    "StageName": "secondary-prod",
    "EndpointInstanceCount": "1",
    "EndpointInstanceType": "ml.m5.large",
    "SamplingPercentage": "100",
    "EnableDataCapture": "true"
  }
}

To add these files to your project, download them from the provided links and upload them to Amazon SageMaker Studio in the primary Region. In Studio, choose File Browser and then the folder containing your project name and ending with modeldeploy.

Upload these files to the deployment repository’s root folder by choosing the upload icon. Make sure the files are located in the root folder as shown in the following screenshot.
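
Before uploading, a quick local check of secondary-region-config.json can catch structural problems, such as a missing closing brace or an out-of-range value, before they surface as a failed pipeline run. The following is a minimal sketch; the helper name validate_endpoint_config and the specific range checks are assumptions for illustration, not rules enforced by the CloudFormation template.

```python
import json

# Parameter names taken from the sample config file shown in this post.
REQUIRED_KEYS = {
    "StageName",
    "EndpointInstanceCount",
    "EndpointInstanceType",
    "SamplingPercentage",
    "EnableDataCapture",
}


def validate_endpoint_config(text):
    """Parse the config text and return its parameters, or raise ValueError."""
    # json.JSONDecodeError is a subclass of ValueError, so malformed JSON
    # (for example, an unbalanced brace) is reported the same way.
    config = json.loads(text)
    params = config.get("Parameters", {})

    missing = REQUIRED_KEYS - params.keys()
    if missing:
        raise ValueError(f"Missing parameters: {sorted(missing)}")
    if int(params["EndpointInstanceCount"]) < 1:
        raise ValueError("EndpointInstanceCount must be at least 1")
    if not 0 <= int(params["SamplingPercentage"]) <= 100:
        raise ValueError("SamplingPercentage must be between 0 and 100")
    if params["EnableDataCapture"] not in ("true", "false"):
        raise ValueError("EnableDataCapture must be 'true' or 'false'")
    if not params["EndpointInstanceType"].startswith("ml."):
        raise ValueError("EndpointInstanceType should look like 'ml.m5.large'")
    return params


SAMPLE_CONFIG = """
{
  "Parameters": {
    "StageName": "secondary-prod",
    "EndpointInstanceCount": "1",
    "EndpointInstanceType": "ml.m5.large",
    "SamplingPercentage": "100",
    "EnableDataCapture": "true"
  }
}
"""

if __name__ == "__main__":
    print(validate_endpoint_config(SAMPLE_CONFIG)["StageName"])
```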

Modify the build Python file

Next, we adjust the deployment build.py file so that it can deploy the SageMaker endpoint in the secondary Region. The updated file does the following:

Retrieve the location of model artifacts and Amazon Elastic Container Registry (Amazon ECR) URI for the model image in the secondary Region
Prepare a parameter file that is used to pass the model-specific arguments to the CloudFormation template that deploys the model in the secondary Region

You can download the updated build.py file and replace the existing one in your folder. In Studio, choose File Browser and then the folder containing your project name and ending with modeldeploy. Locate the build.py file and replace it with the one you downloaded.

The CloudFormation template uses the model artifacts stored in an S3 bucket and the Amazon ECR image path to deploy the inference endpoint in the secondary Region. This is different from the deployment from the model registry in the primary Region, because you don’t need to have a model registry in the secondary Region.
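
To make the idea concrete, the following sketch shows one way the two tasks above could be expressed in Python. It is not the downloadable build.py: the function names, the ModelDataUrl and ImageURI parameter names, and the example bucket and image values are all assumptions for illustration. In a real build, the image URI would typically come from a lookup such as sagemaker.image_uris.retrieve for the secondary Region.

```python
from urllib.parse import urlparse


def to_replica_url(model_data_url, replica_suffix="-replica"):
    """Point an s3:// URL at the replica bucket (bucket name plus suffix)."""
    parsed = urlparse(model_data_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URL: {model_data_url}")
    return f"s3://{parsed.netloc}{replica_suffix}{parsed.path}"


def extend_secondary_config(base_config, model_data_url, image_uri):
    """Return a copy of the config with model-specific parameters added.

    ModelDataUrl and ImageURI are hypothetical parameter names; use the
    names that your CloudFormation template actually declares.
    """
    params = dict(base_config.get("Parameters", {}))
    params["ModelDataUrl"] = to_replica_url(model_data_url)
    params["ImageURI"] = image_uri
    return {"Parameters": params}


if __name__ == "__main__":
    base = {"Parameters": {"StageName": "secondary-prod"}}
    # Placeholder bucket, key, and image values for illustration only.
    extended = extend_secondary_config(
        base,
        "s3://sagemaker-project-example/model/output/model.tar.gz",
        "111122223333.dkr.ecr.us-west-2.amazonaws.com/example-image:example-tag",
    )
    print(extended["Parameters"]["ModelDataUrl"])
```

Keeping the URL rewrite in its own function makes the naming convention for the replica bucket explicit and easy to change in one place.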

Modify the buildspec file

buildspec.yml contains instructions run by CodeBuild. We modify this file to do the following:

Install the SageMaker Python library needed to support the code run
Pass through the --secondary-region and model-specific parameters to build.py
Add the S3 bucket content sync from the primary to secondary Regions
Export the secondary Region CloudFormation template and associated parameter file as artifacts of the CodeBuild step

Open the buildspec.yml file from the model deploy folder and make the highlighted modifications as shown in the following screenshot.

Alternatively, you can download the following buildspec.yml file to replace the default file.
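
On the receiving side, build.py has to read the flags that buildspec.yml passes in. The following argparse sketch shows what that could look like. Only --secondary-region is named in this post; the other flag names and their defaults are hypothetical and may differ from the downloadable file.

```python
import argparse


def build_parser():
    """Create a parser for the secondary-Region options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Prepare deployment configuration for a secondary Region."
    )
    # Flag named in this post.
    parser.add_argument("--secondary-region", required=True)
    # Hypothetical flags for the model-specific parameters.
    parser.add_argument("--framework", default="xgboost")
    parser.add_argument("--model-version", default="1.0-1")
    parser.add_argument(
        "--export-secondary-region-config",
        default="secondary-region-config-export.json",
    )
    return parser


if __name__ == "__main__":
    # Parse an example argument list instead of sys.argv for demonstration.
    args = build_parser().parse_args(["--secondary-region", "us-west-2"])
    print(args.secondary_region, args.framework, args.model_version)
```

Note that argparse converts the hyphens in a flag name to underscores, so --secondary-region is read as args.secondary_region.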

Add CodeBuild environment variables

In this step, you add configuration parameters required for CodeBuild to create the model deployment configuration files in the secondary Region.

On the CodeBuild console in the primary Region, find the project containing your project name and ending with deploy. This project has already been created for you by SageMaker Projects.

Choose the project and on the Edit menu, choose Environment.

In the Advanced configuration section, deselect Allow AWS CodeBuild to modify this service role so it can be used with this build project.
Add the following environment variables, defining the names of the additional CloudFormation templates, secondary Region, and model-specific parameters:
1. EXPORT_TEMPLATE_NAME_SECONDARY_REGION – For Value, enter template-export-secondary-region.yml and for Type, choose PlainText.
2. EXPORT_TEMPLATE_SECONDARY_REGION_CONFIG – For Value, enter secondary-region-config-export.json and for Type, choose PlainText.
3. AWS_SECONDARY_REGION – For Value, enter us-west-2 and for Type, choose PlainText.
4. FRAMEWORK – For Value, enter xgboost (replace with your framework) and for Type, choose PlainText.
5. MODEL_VERSION – For Value, enter 1.0-1 (replace with your model version) and for Type, choose PlainText.
Copy the value of ARTIFACT_BUCKET into Notepad or another text editor. You need this value in the next step.
Choose Update environment.

For FRAMEWORK and MODEL_VERSION, use the values you specified for model training. For example, to find these values for the Abalone model used in the MLOps boilerplate deployment, open Studio and on the File Browser menu, open the folder with your project name and ending with modelbuild. Navigate to pipelines/abalone and open the pipeline.py file. Search for sagemaker.image_uris.retrieve and copy the relevant values.
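
Because a missing or misspelled variable only shows up when the build runs, it can help to fail early with a clear message. The following is a small sketch of how a build script might read these variables; the load_settings helper is hypothetical and is not part of the provided build.py.

```python
import os

# Variable names from the list above, plus ARTIFACT_BUCKET, which already
# exists in the CodeBuild project created by SageMaker Projects.
REQUIRED_VARIABLES = (
    "EXPORT_TEMPLATE_NAME_SECONDARY_REGION",
    "EXPORT_TEMPLATE_SECONDARY_REGION_CONFIG",
    "AWS_SECONDARY_REGION",
    "FRAMEWORK",
    "MODEL_VERSION",
    "ARTIFACT_BUCKET",
)


def load_settings(env=None):
    """Return the required variables as a dict, or raise KeyError listing gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARIABLES}


if __name__ == "__main__":
    # Example values from this post; the bucket name is a placeholder.
    example_env = {
        "EXPORT_TEMPLATE_NAME_SECONDARY_REGION": "template-export-secondary-region.yml",
        "EXPORT_TEMPLATE_SECONDARY_REGION_CONFIG": "secondary-region-config-export.json",
        "AWS_SECONDARY_REGION": "us-west-2",
        "FRAMEWORK": "xgboost",
        "MODEL_VERSION": "1.0-1",
        "ARTIFACT_BUCKET": "sagemaker-project-example",
    }
    print(load_settings(example_env)["AWS_SECONDARY_REGION"])
```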

Create an S3 replica bucket in the secondary Region

We need to create an S3 bucket to hold the model artifacts in the secondary Region. SageMaker uses this bucket to get the latest version of the model to spin up an inference endpoint. You only need to do this one time. CodeBuild automatically syncs the content of the bucket in the primary Region to the replica bucket with each pipeline run.

On the Amazon S3 console, choose Create bucket.
For Bucket name, enter the value of ARTIFACT_BUCKET copied in the previous step and append -replica to the end (for example, sagemaker-project-X-XXXXXXXX-replica).
For AWS Region, enter your secondary Region (us-west-2).
Leave all other values at their defaults and choose Create bucket.
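
If you would rather script this one-time step, the following sketch derives the replica bucket name and creates the bucket with boto3. The helper names are hypothetical, and the sketch assumes that boto3 is installed and that your credentials allow s3:CreateBucket. The name check reflects the S3 limit of 3 to 63 characters for bucket names.

```python
def replica_bucket_name(artifact_bucket, suffix="-replica"):
    """Append the replica suffix and check the S3 bucket name length limit."""
    name = f"{artifact_bucket}{suffix}"
    if not 3 <= len(name) <= 63:
        raise ValueError(f"Bucket name must be 3-63 characters: {name}")
    return name


def create_replica_bucket(artifact_bucket, region):
    """Create the replica bucket in the secondary Region (requires boto3)."""
    import boto3  # Imported here so the name helper works without boto3.

    name = replica_bucket_name(artifact_bucket)
    s3 = boto3.client("s3", region_name=region)
    # A LocationConstraint is needed for Regions other than us-east-1;
    # for us-east-1, omit CreateBucketConfiguration entirely.
    s3.create_bucket(
        Bucket=name,
        CreateBucketConfiguration={"LocationConstraint": region},
    )
    return name


if __name__ == "__main__":
    # Placeholder bucket name; only the name is computed here, no AWS call.
    print(replica_bucket_name("sagemaker-project-example"))
```

Calling create_replica_bucket with the copied ARTIFACT_BUCKET value and us-west-2 would then perform the same action as the console steps.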

Approve a model for deployment

The deployment stage of the pipeline requires an approved model to start. This is required for the deployment in the primary Region.

In Studio (primary Region), choose SageMaker resources in the navigation pane.
For Select the resource to view, choose Model registry.
Choose the model group name starting with your project name.
In the right pane, check the model version, stage, and status.
If the status shows pending, choose the model version and then choose Update status.
Change status to Approved, then choose Update status.
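
The same approval can be scripted against the SageMaker API. The following sketch finds the most recent model package that is pending manual approval in a model package group and marks it as Approved, using the list_model_packages and update_model_package operations. The function name and the group name are placeholders, and the client is passed in so that the logic can be exercised without AWS access.

```python
def approve_latest_pending(sm_client, model_package_group_name):
    """Approve the newest pending model package; return its ARN or None."""
    response = sm_client.list_model_packages(
        ModelPackageGroupName=model_package_group_name,
        ModelApprovalStatus="PendingManualApproval",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    packages = response["ModelPackageSummaryList"]
    if not packages:
        # Nothing is waiting for approval.
        return None

    arn = packages[0]["ModelPackageArn"]
    sm_client.update_model_package(
        ModelPackageArn=arn,
        ModelApprovalStatus="Approved",
    )
    return arn


# Example usage (assumes boto3 and credentials for the primary Region):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-east-2")
#   approve_latest_pending(client, "your-model-package-group-name")
```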

Deploy and verify the changes

All the changes required for multi-Region deployment of your SageMaker inference endpoint are now complete and you can start the deployment process.

In Studio, save all the files you edited, choose Git, and choose the repository containing your project name and ending with deploy.
Choose the plus sign to make changes.
Under Changed, add build.py and buildspec.yml.
Under Untracked, add endpoint-config-template-secondary-region.yml and secondary-region-config.json.
Enter a comment in the Summary field and choose Commit.
Push the changes to the repository by choosing Push.

Pushing these changes to the CodeCommit repository triggers a new pipeline run, because an EventBridge event monitors for pushed commits. After a few moments, you can monitor the run by navigating to the pipeline on the CodePipeline console.

Make sure to provide manual approval for deployment to production and the secondary Region.

You can verify that the secondary Region endpoint is created on the SageMaker console, by choosing Dashboard in the navigation pane and confirming the endpoint status in Recent activity.
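
As an alternative to checking the console, you can poll the endpoint status with the describe_endpoint operation. The following is a minimal sketch; the function name, the polling interval, and the endpoint name in the usage comment are assumptions, and the actual endpoint name depends on your CloudFormation template.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30,
                      max_polls=60, sleep=time.sleep):
    """Poll until the endpoint is InService; raise on failure or timeout."""
    for _ in range(max_polls):
        description = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = description.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        # Still creating or updating; wait before the next check.
        sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} was not InService in time")


# Example usage (assumes boto3 and credentials for the secondary Region):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-west-2")
#   wait_for_endpoint(client, "your-secondary-endpoint-name")
```

The sleep function is a parameter so that the loop can be tested without real delays.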

Add API Gateway and Route 53 (Optional)

You can optionally follow the instructions in Call an Amazon SageMaker model endpoint using Amazon API Gateway and AWS Lambda to expose the SageMaker inference endpoint in the secondary Region as an API using API Gateway and Lambda.

Clean up

To delete the SageMaker project, see Delete an MLOps Project using Amazon SageMaker Studio. To ensure the secondary inference endpoint is destroyed, go to the AWS CloudFormation console and delete the related stacks in your primary and secondary Regions; this destroys the SageMaker inference endpoints.
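
The stack deletion can also be scripted. The following sketch deletes the secondary Region stack created by the pipeline and waits for the deletion to finish, using the CloudFormation delete_stack operation and the stack_delete_complete waiter. The function name is hypothetical; DeploytoSecondaryRegion is the stack name entered earlier in this post. The stacks in the primary Region have names generated by SageMaker Projects, so they are not covered here.

```python
def delete_stack_and_wait(cfn_client, stack_name="DeploytoSecondaryRegion"):
    """Delete a CloudFormation stack and block until deletion completes."""
    cfn_client.delete_stack(StackName=stack_name)
    # The waiter polls the stack status until it reports deletion complete.
    waiter = cfn_client.get_waiter("stack_delete_complete")
    waiter.wait(StackName=stack_name)
    return stack_name


# Example usage (assumes boto3 and credentials for the secondary Region):
#   import boto3
#   client = boto3.client("cloudformation", region_name="us-west-2")
#   delete_stack_and_wait(client)
```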

Conclusion

In this post, we showed how an MLOps specialist can modify a preconfigured MLOps template for their own multi-Region deployment use case, such as deploying workloads in multiple geographies or as part of implementing a multi-Regional disaster recovery strategy. With this deployment approach, you don’t need to configure services in the secondary Region and can reuse the CodePipeline and CodeBuild setups in the primary Region for cross-Regional deployment. Additionally, you can save on costs by continuing the training of your models in the primary Region while utilizing SageMaker inference in multiple Regions to scale your AI/ML deployment globally.

Please let us know your feedback in the comments section.

About the Authors

Mehran Najafi, PhD, is a Senior Solutions Architect for AWS focused on AI/ML and SaaS solutions at scale.

Steven Alyekhin is a Senior Solutions Architect for AWS focused on MLOps at scale.
