Amazon SageMaker Domain supports SageMaker machine learning (ML) environments, including SageMaker Studio and SageMaker Canvas. SageMaker Studio is a fully integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models, improving data science team productivity by up to 10x. SageMaker Canvas expands access to machine learning by providing business analysts with a visual interface that allows them to generate accurate ML predictions on their own—without requiring any ML experience or having to write a single line of code.
HashiCorp Terraform is an infrastructure as code (IaC) tool that lets you organize your infrastructure in reusable code modules. AWS customers rely on IaC to design, develop, and manage their cloud infrastructure, such as SageMaker Domains. IaC ensures that customer infrastructure and services are consistent, scalable, and reproducible while following best practices in the area of development operations (DevOps). Using Terraform, you can develop and manage your SageMaker Domain and its supporting infrastructure in a consistent and repeatable manner.
In this post, we demonstrate a Terraform implementation that deploys a SageMaker Domain and the Amazon Virtual Private Cloud (Amazon VPC) it is associated with. The solution uses Terraform to create:
- A VPC with subnets, security groups, and VPC endpoints to support VPC only mode for the SageMaker Domain.
- A SageMaker Domain in VPC only mode with a user profile.
- An AWS Key Management Service (AWS KMS) key to encrypt the SageMaker Studio’s Amazon Elastic File System (Amazon EFS) volume.
- A Lifecycle Configuration attached to the SageMaker Domain to automatically shut down idle Studio notebook instances.
- A SageMaker Domain execution role and IAM policies to enable SageMaker Studio and Canvas functionalities.
The solution described in this post is available at this GitHub repo.
The following image shows SageMaker Domain in VPC only mode.
By launching SageMaker Domain in your VPC, you can control the data flow from your SageMaker Studio and Canvas environments. This allows you to restrict internet access, monitor and inspect traffic using standard AWS networking and security capabilities, and connect to other AWS resources through VPC endpoints.
VPC requirements to use VPC only mode
Creating a SageMaker Domain in VPC only mode requires a VPC with the following configurations:
- At least two private subnets, each in a different Availability Zone, to ensure high availability.
- Subnets with enough IP addresses for your users. We recommend between two and four IP addresses per user. The total IP address capacity for a Studio domain is the sum of the available IP addresses in each subnet provided when the domain is created.
- Set up one or more security groups with inbound and outbound rules that together allow the following traffic:
  - NFS traffic over TCP on port 2049 between the domain and the Amazon EFS volume.
  - TCP traffic within the security group. This is required for connectivity between the JupyterServer app and the KernelGateway apps. You must allow access to at least ports in the range 8192–65535.
- Create a gateway endpoint for Amazon Simple Storage Service (Amazon S3). SageMaker Studio needs to access Amazon S3 from your VPC using Gateway VPC endpoints. After you create the gateway endpoint, you need to add it as a target in your route table for traffic destined from your VPC to Amazon S3.
- Create interface VPC endpoints (AWS PrivateLink) to allow Studio to access the following services with the corresponding service names. You must also associate a security group for your VPC with these endpoints to allow all inbound traffic on port 443:
  - SageMaker API: com.amazonaws.region.sagemaker.api. This is required to communicate with the SageMaker API.
  - SageMaker runtime: com.amazonaws.region.sagemaker.runtime. This is required to run Studio notebooks and to train and host models.
  - SageMaker Feature Store: com.amazonaws.region.sagemaker.featurestore-runtime. This is required to use SageMaker Feature Store.
  - SageMaker Projects: com.amazonaws.region.servicecatalog. This is required to use SageMaker Projects.
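The requirements above can be sketched in Terraform roughly as follows. This is a minimal illustration, not the code from the post's repository: the variable names, resource names, and the open egress rule are assumptions, and the VPC, subnets, and route tables are assumed to exist already.

```hcl
# Hypothetical inputs; these names do not come from the post's repository.
variable "aws_region" { default = "us-east-1" }
variable "vpc_id" { type = string }
variable "vpc_cidr" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "private_route_table_ids" { type = list(string) }

# Security group for the Studio apps: NFS to EFS plus app-to-app TCP traffic.
resource "aws_security_group" "studio" {
  name_prefix = "sagemaker-studio-"
  vpc_id      = var.vpc_id

  ingress {
    description = "NFS between the domain and the EFS volume"
    from_port   = 2049
    to_port     = 2049
    protocol    = "tcp"
    self        = true
  }

  ingress {
    description = "JupyterServer to KernelGateway traffic"
    from_port   = 8192
    to_port     = 65535
    protocol    = "tcp"
    self        = true
  }

  egress {
    description = "All outbound traffic (an assumption; tighten as needed)"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Security group for the interface endpoints: HTTPS from inside the VPC.
resource "aws_security_group" "endpoints" {
  name_prefix = "sagemaker-vpce-"
  vpc_id      = var.vpc_id

  ingress {
    description = "HTTPS from the VPC"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
  }
}

# Gateway endpoint for Amazon S3, added to the private route tables.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.aws_region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}

# One interface endpoint per service name listed above.
resource "aws_vpc_endpoint" "studio" {
  for_each = toset([
    "sagemaker.api",
    "sagemaker.runtime",
    "sagemaker.featurestore-runtime",
    "servicecatalog",
  ])

  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.aws_region}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.endpoints.id]
  private_dns_enabled = true
}
```

Using for_each over the service suffixes keeps the endpoint definitions in one place, so adding a service is a one-line change.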
Additional VPC endpoints to use SageMaker Canvas
In addition to the previously mentioned VPC endpoints, to use SageMaker Canvas, you also need to create the following interface VPC endpoints:
- Amazon Forecast and Amazon Forecast Query: com.amazonaws.region.forecastquery. These are required to use Amazon Forecast.
- Amazon Rekognition: com.amazonaws.region.rekognition. This is required to use Amazon Rekognition.
- Amazon Textract: com.amazonaws.region.textract. This is required to use Amazon Textract.
- Amazon Comprehend: com.amazonaws.region.comprehend. This is required to use Amazon Comprehend.
- AWS Security Token Service (AWS STS): com.amazonaws.region.sts. This is required because SageMaker Canvas uses AWS STS to connect to data sources.
- Amazon Athena and AWS Glue: com.amazonaws.region.glue. This is required to connect to AWS Glue Data Catalog through Amazon Athena.
- Amazon Redshift: com.amazonaws.region.redshift-data. This is required to connect to the Amazon Redshift data source.
For the full list of VPC endpoints for each service you can use with SageMaker Canvas, see Configure Amazon SageMaker Canvas in a VPC without internet access.
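As a rough sketch, the Canvas endpoints can be declared with a single resource that iterates over the service suffixes. The variable and resource names here are hypothetical, and the list contains only the service names shown in this post; check the linked documentation for any others your use case needs.

```hcl
# Hypothetical inputs; these names do not come from the post's repository.
variable "aws_region" { default = "us-east-1" }
variable "vpc_id" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "endpoint_security_group_ids" { type = list(string) }

# One interface endpoint per Canvas-related service name listed above.
resource "aws_vpc_endpoint" "canvas" {
  for_each = toset([
    "forecastquery",
    "rekognition",
    "textract",
    "comprehend",
    "sts",
    "glue",
    "redshift-data",
  ])

  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.aws_region}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = var.endpoint_security_group_ids
  private_dns_enabled = true
}
```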
AWS KMS encryption for SageMaker Studio’s EFS volume
The first time a user on your team onboards to SageMaker Studio, SageMaker creates an EFS volume for the team. A home directory is created in the volume for each user who onboards to Studio as part of your team. Notebook files and data files are stored in these directories.
You can encrypt your SageMaker Studio’s EFS volume with a KMS key so your home directories’ data are encrypted at rest. This Terraform solution creates a KMS key and uses it to encrypt SageMaker Studio’s EFS volume.
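A minimal sketch of how the key and the domain fit together is shown below. The resource names, the domain name, and the input variables are assumptions for illustration; the repository's actual configuration may differ.

```hcl
# Hypothetical inputs; these names do not come from the post's repository.
variable "vpc_id" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "execution_role_arn" { type = string }

# Customer managed key used to encrypt the Studio EFS volume at rest.
resource "aws_kms_key" "studio_efs" {
  description         = "Encrypts the SageMaker Studio EFS volume"
  enable_key_rotation = true
}

# Domain in VPC only mode; kms_key_id applies to the EFS volume it creates.
resource "aws_sagemaker_domain" "studio" {
  domain_name             = "example-domain" # placeholder name
  auth_mode               = "IAM"
  vpc_id                  = var.vpc_id
  subnet_ids              = var.private_subnet_ids
  app_network_access_type = "VpcOnly"
  kms_key_id              = aws_kms_key.studio_efs.arn

  default_user_settings {
    execution_role = var.execution_role_arn
  }

  # "Retain" is the default. "Delete" (an assumption for this sketch)
  # removes the EFS volume together with the domain.
  retention_policy {
    home_efs_file_system = "Delete"
  }
}

# User profile; without user_settings it inherits the domain defaults.
resource "aws_sagemaker_user_profile" "default" {
  domain_id         = aws_sagemaker_domain.studio.id
  user_profile_name = "defaultuser"
}
```

The execution role also needs permission to use the key, which is why the solution attaches a customer managed IAM policy for it.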
SageMaker Domain Lifecycle Configuration to automatically shut down idle Studio notebooks
Lifecycle Configurations are shell scripts triggered by Amazon SageMaker Studio lifecycle events, such as starting a new Studio notebook. You can use Lifecycle Configurations to automate customization for your Studio environment.
This Terraform solution creates a SageMaker Lifecycle Configuration to detect and stop idle resources that incur costs within Studio using an auto-shutdown Jupyter extension. Under the hood, the following resources are created or configured to achieve the desired result:
- Create an S3 bucket and upload the latest version of the auto-shutdown extension, sagemaker_studio_autoshutdown-0.1.5.tar.gz. Later, the auto-shutdown script runs the s3 cp command to download the extension file from the S3 bucket each time the Jupyter Server starts. Please refer to the following GitHub repos for more information regarding the auto-shutdown extension and auto-shutdown script.
- Create an aws_sagemaker_studio_lifecycle_config resource "auto_shutdown". This resource encodes autoshutdown-script.sh in Base64 and creates a Lifecycle Configuration for the SageMaker Domain.
- For the SageMaker Domain default user settings, specify the Lifecycle Configuration ARN and set it as the default.
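The three steps above might look roughly like this in Terraform. This is a standalone sketch under stated assumptions: the local file paths, bucket prefix, domain name, and input variables are hypothetical, and the repository may set additional arguments (for example, an image ARN in default_resource_spec) that are omitted here.

```hcl
# Hypothetical inputs; these names do not come from the post's repository.
variable "vpc_id" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "execution_role_arn" { type = string }

# Step 1: bucket holding the auto-shutdown extension archive.
resource "aws_s3_bucket" "extension" {
  bucket_prefix = "sagemaker-autoshutdown-"
}

resource "aws_s3_object" "extension" {
  bucket = aws_s3_bucket.extension.id
  key    = "sagemaker_studio_autoshutdown-0.1.5.tar.gz"
  # Local path is an assumption; point it at wherever you saved the archive.
  source = "${path.module}/sagemaker_studio_autoshutdown-0.1.5.tar.gz"
}

# Step 2: Lifecycle Configuration built from the Base64-encoded script.
resource "aws_sagemaker_studio_lifecycle_config" "auto_shutdown" {
  studio_lifecycle_config_name     = "auto-shutdown"
  studio_lifecycle_config_app_type = "JupyterServer"
  # Script path is an assumption; filebase64 reads and encodes the file.
  studio_lifecycle_config_content = filebase64("${path.module}/autoshutdown-script.sh")
}

# Step 3: attach the configuration and make it the JupyterServer default.
resource "aws_sagemaker_domain" "studio" {
  domain_name             = "example-domain" # placeholder name
  auth_mode               = "IAM"
  vpc_id                  = var.vpc_id
  subnet_ids              = var.private_subnet_ids
  app_network_access_type = "VpcOnly"

  default_user_settings {
    execution_role = var.execution_role_arn

    jupyter_server_app_settings {
      default_resource_spec {
        lifecycle_config_arn = aws_sagemaker_studio_lifecycle_config.auto_shutdown.arn
      }
      lifecycle_config_arns = [aws_sagemaker_studio_lifecycle_config.auto_shutdown.arn]
    }
  }
}
```

Listing the ARN in lifecycle_config_arns makes the configuration available to the domain, while default_resource_spec selects it as the one that runs by default.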
SageMaker execution role IAM permissions
As a managed service, SageMaker performs operations on your behalf on the AWS hardware that is managed by SageMaker. SageMaker can perform only operations that the user permits.
A SageMaker user can grant these permissions with an IAM role (referred to as an execution role). When you create a SageMaker Studio domain, SageMaker allows you to create the execution role by default. You can restrict access to user profiles by changing the SageMaker user profile role. This Terraform solution attaches the following IAM policies to the SageMaker execution role:
- The SageMaker managed AmazonSageMakerFullAccess policy. This policy grants the execution role full access to use SageMaker Studio.
- A customer managed IAM policy to access the KMS key used to encrypt the SageMaker Studio’s EFS volume.
- The SageMaker managed AmazonSageMakerCanvasAIServicesAccess policies. These policies grant the execution role full access to use SageMaker Canvas.
- In order to enable time series analysis in SageMaker Canvas, you also need to add the IAM trust policy for Amazon Forecast.
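The role and its policies could be sketched as follows. The resource names, the kms_key_arn variable, and the specific KMS actions are assumptions for illustration; the managed policy ARNs are the ones named in the list above.

```hcl
# Hypothetical input: ARN of the key that encrypts the Studio EFS volume.
variable "kms_key_arn" { type = string }

# Trust policy: SageMaker assumes the role; Amazon Forecast is added
# to enable time series analysis in SageMaker Canvas.
data "aws_iam_policy_document" "assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["sagemaker.amazonaws.com", "forecast.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "sagemaker_execution" {
  name_prefix        = "sagemaker-domain-"
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}

# AWS managed policies named in this post.
resource "aws_iam_role_policy_attachment" "managed" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
    "arn:aws:iam::aws:policy/AmazonSageMakerCanvasAIServicesAccess",
  ])

  role       = aws_iam_role.sagemaker_execution.name
  policy_arn = each.value
}

# Customer managed policy for the KMS key (action list is an assumption).
data "aws_iam_policy_document" "kms" {
  statement {
    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:ReEncrypt*",
      "kms:GenerateDataKey*",
      "kms:DescribeKey",
      "kms:CreateGrant",
    ]
    resources = [var.kms_key_arn]
  }
}

resource "aws_iam_policy" "kms" {
  name_prefix = "sagemaker-kms-"
  policy      = data.aws_iam_policy_document.kms.json
}

resource "aws_iam_role_policy_attachment" "kms" {
  role       = aws_iam_role.sagemaker_execution.name
  policy_arn = aws_iam_policy.kms.arn
}
```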
In this section, we demonstrate how to deploy the Terraform solution. Before you deploy, make sure you have the following prerequisites:
- An AWS account
- An IAM user with administrative access
To give users following this guide a unified deployment experience, we demonstrate the deployment process with AWS CloudShell. Using CloudShell, a browser-based shell, you can quickly run scripts with the AWS Command Line Interface (AWS CLI), experiment with service APIs using the AWS CLI, and use other tools to increase your productivity.
To deploy the Terraform solution, complete the following steps:
CloudShell launch settings
- Sign in to the AWS Management Console and select the CloudShell service.
- In the navigation bar, in the Region selector, choose US East (N. Virginia).
Your browser will open the CloudShell terminal.
The next steps should be executed in a CloudShell terminal.
Check this HashiCorp guide for up-to-date instructions to install Terraform for Amazon Linux:
- Install yum-config-manager to manage your repositories.
- Use yum-config-manager to add the official HashiCorp Linux repository.
- Install Terraform from the new repository.
- Verify that the installation worked by listing Terraform’s available subcommands.
Clone the code repo
Perform the following steps in a CloudShell terminal.
- Clone the repo and navigate to the sagemaker-domain-vpconly-canvas-with-terraform folder:
- Download the auto-shutdown extension and place it in the
Deploy the Terraform solution
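In the CloudShell terminal, initialize the Terraform working directory with terraform init.
You should see a message confirming that Terraform has been successfully initialized.
Now you can run terraform plan to preview the resources that Terraform will create.
After you are satisfied with the resources the plan outlines, run terraform apply.
Enter "yes" when prompted to confirm the deployment.
If the deployment succeeds, you should see an "Apply complete!" message that summarizes the resources added.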
Accessing SageMaker Studio and Canvas
We now have a Studio domain associated with our VPC and a user profile in this domain.
To use the SageMaker Studio console, on the Studio Control Panel, locate your user name (it should be defaultuser) and choose Open Studio.
We made it! Now you can use your browser to connect to the SageMaker Studio environment. After a few minutes, Studio finishes creating your environment, and you’re greeted with the launcher screen.
To use the SageMaker Canvas console, on the Canvas Control Panel, locate your user name (it should be defaultuser) and choose Open Canvas.
Now you can use your browser to connect to the SageMaker Canvas environment. After a few minutes, Canvas finishes creating your environment, and you’re greeted with the launcher screen.
Feel free to explore the full functionality SageMaker Studio and Canvas have to offer! Please refer to the Conclusion section for additional workshops and tutorials you can use to learn more about SageMaker.
Run terraform destroy to clean up your resources.
Tip: If you set the Amazon EFS retention policy to "Retain" (the default), terraform destroy will run into issues because Terraform tries to delete the subnets and VPC while the EFS volume and its associated security groups (created by SageMaker) still exist. To fix this, first delete the EFS volume manually, and then delete the subnets and VPC manually in the AWS console.
The solution in this post lets you use Terraform to create a SageMaker Domain that supports ML environments, including SageMaker Studio and SageMaker Canvas. SageMaker Studio provides a fully managed IDE that removes the heavy lifting in the ML process. With SageMaker Canvas, business users can easily explore and build ML models to make accurate predictions without writing any code. With the ability to launch Studio and Canvas inside a VPC and the use of a KMS key to encrypt the EFS volume, customers can use SageMaker ML environments with enhanced security. The auto-shutdown Lifecycle Configuration helps customers save costs on idle Studio notebook instances.
Go test this solution and let us know what you think. For more information about how to use SageMaker Studio and SageMaker Canvas, see the following:
About the Author
Chen Yang is a Machine Learning Engineer at Amazon Web Services. She is part of the AWS Professional Services team, and has been focusing on building secure machine learning environments for customers. In her spare time, she enjoys running and hiking in the Pacific Northwest.