Build protein folding workflows to accelerate drug discovery on Amazon SageMaker

Drug development is a complex and long process that involves screening thousands of drug candidates and using computational or experimental methods to evaluate leads. According to McKinsey, a single drug can take 10 years and cost an average of $2.6 billion to go through disease target identification, drug screening, drug-target validation, and eventual commercial launch. Drug discovery is the research component of this pipeline that generates candidate drugs with the highest likelihood of being effective with the least harm to patients. Machine learning (ML) methods can help identify suitable compounds at each stage in the drug discovery process, resulting in more streamlined drug prioritization and testing, saving billions in drug development costs (for more information, refer to AI in biopharma research: A time to focus and scale).

Drug targets are typically biological entities called proteins, the building blocks of life. The 3D structure of a protein determines how it interacts with a drug compound. Understanding the protein's 3D structure can therefore significantly improve the drug development process, because it lets us screen for drug compounds that better fit the target protein structure. Protein structure prediction is also useful for understanding the diversity of proteins, so that we select only drugs that target specific proteins without affecting other proteins in the body (for more information, refer to Improving target assessment in biomedical research: the GOT-IT recommendations). Precise 3D structures of target proteins can enable drug design with higher specificity and a lower likelihood of cross-interactions with other proteins.

However, predicting how proteins fold into their 3D structure is a difficult problem, and traditional experimental methods such as X-ray crystallography and NMR spectroscopy can be time-consuming and expensive. Recent advances in deep learning methods for protein research have shown promise in using neural networks to predict protein folding with remarkable accuracy. Folding algorithms like AlphaFold2, ESMFold, OpenFold, and RoseTTAFold can be used to quickly build accurate models of protein structures. Unfortunately, these models are computationally expensive to run and the results can be cumbersome to compare at the scale of thousands of candidate protein structures. A scalable solution for using these various tools will allow researchers and commercial R&D teams to quickly incorporate the latest advances in protein structure prediction, manage their experimentation processes, and collaborate with research partners.

Amazon SageMaker is a fully managed service to prepare, build, train, and deploy high-quality ML models quickly by bringing together a broad set of capabilities purpose-built for ML. It offers a fully managed environment for ML, abstracting away the infrastructure, data management, and scalability requirements so you can focus on building, training, and testing your models.

In this post, we present a fully managed ML solution with SageMaker that simplifies the operation of protein folding structure prediction workflows. We first discuss the solution at the high level and its user experience. Next, we walk you through how to easily set up compute-optimized workflows of AlphaFold2 and OpenFold with SageMaker. Finally, we demonstrate how you can track and compare protein structure predictions as part of a typical analysis. The code for this solution is available in the following GitHub repository.

Solution overview

In this solution, scientists can interactively launch protein folding experiments, analyze the 3D structure, monitor the job progress, and track the experiments in Amazon SageMaker Studio.

The following screenshot shows a single run of a protein folding workflow with Amazon SageMaker Studio. It includes the visualization of the 3D structure in a notebook, run status of the SageMaker jobs in the workflow, and links to the input parameters and output data and logs.

The following diagram illustrates the high-level solution architecture.

To understand the architecture, we first define the key components of a protein folding experiment as follows:

  • FASTA target sequence file – The FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes.
  • Genetic databases – A genetic database is one or more sets of genetic data stored together with software to enable users to retrieve genetic data. Several genetic databases are required to run AlphaFold and OpenFold algorithms, such as BFD, MGnify, PDB70, PDB, PDB seqres, UniRef30 (FKA UniClust30), UniProt, and UniRef90.
  • Multiple sequence alignment (MSA) – A multiple sequence alignment arranges the primary sequences of related proteins to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. The input features for predictions include MSA data.
  • Protein structure prediction – The structure of input target sequences is predicted with folding algorithms like AlphaFold2 and OpenFold that use a multitrack transformer architecture trained on known protein templates.
  • Visualization and metrics – Visualize the 3D structure interactively with the py3Dmol library. You can use metrics to evaluate and compare structure predictions, most notably root-mean-square deviation (RMSD) and template modeling score (TM-score).
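The FASTA format described above is simple enough to read with the Python standard library. The following is a minimal sketch, not code from the solution repository: `parse_fasta` is a hypothetical helper that treats lines starting with `>` as headers and concatenates the (possibly wrapped) sequence lines that follow. It assumes a well-formed file and doesn't validate residue letters; the example sequence is made up.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its full sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)  # sequences may wrap across several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up example record with a sequence wrapped over two lines
example = """>example_protein
MKVLAA
GHTE
"""
print(parse_fasta(example))  # {'example_protein': 'MKVLAAGHTE'}
```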

The workflow contains the following steps:

  1. Scientists use the web-based SageMaker ML IDE to explore the code base, build protein sequence analysis workflows in SageMaker Studio notebooks, and run protein folding pipelines via the graphical user interface in SageMaker Studio or the SageMaker SDK.
  2. Genetic and structure databases required by AlphaFold and OpenFold are downloaded to an Amazon Simple Storage Service (Amazon S3) bucket prior to pipeline setup, using Amazon SageMaker Processing, an ephemeral compute feature for ML data processing. With SageMaker Processing, you can run a long-running job on appropriately sized compute without setting up a compute cluster or storage, and without having to shut the cluster down afterward. Data is automatically saved to a specified S3 bucket location.
  3. An Amazon FSx for Lustre file system is set up, with the data repository being the S3 bucket location where the databases are saved. FSx for Lustre can scale to hundreds of GB/s of throughput and millions of IOPS with low-latency file retrieval. When starting an estimator job, SageMaker mounts the FSx for Lustre file system to the instance file system, then starts the script.
  4. Amazon SageMaker Pipelines is used to orchestrate multiple runs of protein folding algorithms. SageMaker Pipelines offers a visual interface for interactive job submission, progress tracking, and repeatability.
  5. Within a pipeline, two computationally heavy protein folding algorithms—AlphaFold and OpenFold—are run with SageMaker estimators. This configuration supports mounting of an FSx for Lustre file system for high throughput database search in the algorithms. A single inference run is divided into two steps: an MSA construction step using an optimal CPU instance and a structure prediction step using a GPU instance. These substeps, like SageMaker Processing in Step 2, are ephemeral, on-demand, and fully managed. Job output such as MSA files, predicted pdb structure files, and other metadata files are saved in a specified S3 location. A pipeline can be designed to run one single protein folding algorithm or run both AlphaFold and OpenFold after a common MSA construction.
  6. Runs of the protein folding prediction are automatically tracked by Amazon SageMaker Experiments for further analysis and comparison. The job logs are kept in Amazon CloudWatch for monitoring.

Prerequisites

To follow this post and run this solution, you need to have completed several prerequisites. Refer to the GitHub repository for a detailed explanation of each step.

Run protein folding on SageMaker

We use the fully managed capabilities of SageMaker to run computationally heavy protein folding jobs without much infrastructure overhead. SageMaker uses container images to run custom scripts for generic data processing, training, and hosting. You can easily start an ephemeral job on-demand that runs a program with a container image with a couple of lines of the SageMaker SDK without self-managing any compute infrastructure. Specifically, the SageMaker estimator job provides flexibility when it comes to choice of container image, run script, and instance configuration, and supports a wide variety of storage options, including file systems such as FSx for Lustre. The following diagram illustrates this architecture.

Folding algorithms like AlphaFold and OpenFold use a multitrack transformer architecture trained on known protein templates to predict the structure of unknown peptide sequences. These predictions can be run on GPU instances to provide the best throughput and lowest latency. However, the input features for these predictions include MSA data, and MSA algorithms are CPU-dependent and can require several hours of processing time.

Running both the MSA and structure prediction steps in the same computing environment can be cost-inefficient because the expensive GPU resources remain idle while the MSA step runs. Therefore, we optimize the workflow into two steps. First, we run a SageMaker estimator job on a CPU instance specifically to compute MSA alignment given a particular FASTA input sequence and source genetic databases. Then we run a SageMaker estimator job on a GPU instance to predict the protein structure with a given input MSA alignment and a folding algorithm like AlphaFold or OpenFold.

Run MSA generation

For MSA computation, we include two custom scripts, run_create_alignment.sh and create_alignments.py, adapted from the existing AlphaFold prediction source run_alphafold.py. Note that these scripts may need to be updated if the source AlphaFold code changes. The custom scripts are provided to the SageMaker estimator via script mode. The key components of the container image, the script mode implementation, and the SageMaker estimator job setup are shared with the next step of running the folding algorithms, and are described further in the following section.

Run AlphaFold

We get started by running an AlphaFold structure prediction with a single protein sequence using SageMaker. Running an AlphaFold job involves three simple steps, as can be seen in 01-run_stepbystep.ipynb. First, we build a Docker container image based on AlphaFold’s Dockerfile so that we can also run AlphaFold in SageMaker. Second, we construct the script run_alphafold.sh that instructs how AlphaFold should be run. Third, we construct and run a SageMaker estimator with the script, the container, instance type, data, and configuration for the job.

Container image

The runtime requirement for a container image to run AlphaFold (OpenFold as well) in SageMaker can be greatly simplified with AlphaFold’s Dockerfile. We only need to add a handful of simple layers on top to install a SageMaker-specific Python library so that a SageMaker job can communicate with the container image. See the following code:

# In Dockerfile.alphafold
## SageMaker specific
RUN pip3 install sagemaker-training --upgrade --no-cache-dir
ENV PATH="/opt/ml/code:${PATH}"
# this environment variable is used by the SageMaker Estimator to determine our user code directory
ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/code

Input script

We then provide the script run_alphafold.sh, which runs run_alphafold.py from the AlphaFold repository, located in the container at /app/alphafold/run_alphafold.py. When this script runs, SageMaker populates the locations of the genetic databases and the input FASTA sequence as environment variables (SM_CHANNEL_GENETIC and SM_CHANNEL_FASTA, respectively). For more information, refer to Input Data Configuration.
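To illustrate what run_alphafold.sh has to do, the following Python sketch assembles a run_alphafold.py command line from the job's environment. It is a hypothetical helper, not the script in the repository. SM_CHANNEL_GENETIC, SM_CHANNEL_FASTA, and DB_PRESET come from this post, and SM_MODEL_DIR is SageMaker's standard model output directory (its contents are uploaded to output_path). The database subdirectory names are assumptions modeled on AlphaFold's default download layout; adjust them to match the contents of your FSx for Lustre file system. Only the monomer model preset and a subset of flags are shown.

```python
import os
from typing import Dict, List, Optional

# ASSUMPTION: layout of the genetic databases under the mounted channel directory,
# modeled on AlphaFold's default download layout. Adjust to your own data.
COMMON_DBS = {
    "uniref90_database_path": "uniref90/uniref90.fasta",
    "mgnify_database_path": "mgnify/mgy_clusters_2022_05.fa",
    "template_mmcif_dir": "pdb_mmcif/mmcif_files",
    "obsolete_pdbs_path": "pdb_mmcif/obsolete.dat",
    "pdb70_database_path": "pdb70/pdb70",  # used by the monomer model presets
}
FULL_DBS = {
    "bfd_database_path": "bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt",
    "uniref30_database_path": "uniref30/UniRef30_2021_03",
}
REDUCED_DBS = {
    "small_bfd_database_path": "small_bfd/bfd-first_non_consensus_sequences.fasta",
}


def build_alphafold_command(fasta_name: str, max_template_date: str,
                            env: Optional[Dict[str, str]] = None) -> List[str]:
    """Return an argv list for /app/alphafold/run_alphafold.py (monomer preset only)."""
    env = dict(os.environ) if env is None else env
    genetic_dir = env["SM_CHANNEL_GENETIC"]  # FSx for Lustre mount point
    fasta_dir = env["SM_CHANNEL_FASTA"]  # FASTA file downloaded from S3
    output_dir = env.get("SM_MODEL_DIR", "/opt/ml/model")  # uploaded to output_path
    db_preset = env.get("DB_PRESET", "full_dbs")

    db_paths = dict(COMMON_DBS)
    db_paths.update(FULL_DBS if db_preset == "full_dbs" else REDUCED_DBS)

    cmd = [
        "python3", "/app/alphafold/run_alphafold.py",
        f"--fasta_paths={os.path.join(fasta_dir, fasta_name)}",
        f"--data_dir={genetic_dir}",  # assumes model parameters live here too
        f"--output_dir={output_dir}",
        f"--db_preset={db_preset}",
        "--model_preset=monomer",
        f"--max_template_date={max_template_date}",
        "--use_gpu_relax=true",
    ]
    for flag, rel_path in sorted(db_paths.items()):
        cmd.append(f"--{flag}={os.path.join(genetic_dir, rel_path)}")
    return cmd


example_env = {
    "SM_CHANNEL_GENETIC": "/opt/ml/input/data/genetic",
    "SM_CHANNEL_FASTA": "/opt/ml/input/data/fasta",
    "DB_PRESET": "reduced_dbs",
}
print(" ".join(build_alphafold_command("T1030.fasta", "2020-05-14", example_env)))
```

Switching DB_PRESET between full_dbs and reduced_dbs swaps the BFD/UniRef30 flags for the smaller BFD database, which is why the sketch keeps the preset-specific paths in separate dictionaries.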

Estimator job

We next create a job using a SageMaker estimator with the following key input arguments, which instruct SageMaker to run a specific script using a specified container with the given instance type and count, your networking option of choice, and other parameters for the job. vpc_subnet_ids and security_group_ids instruct the job to run inside the VPC where the FSx for Lustre file system resides, so that the SageMaker job can mount and access the file system. The output path is the S3 bucket location to which SageMaker automatically uploads the final output of AlphaFold at the end of a successful job. Here we also set a parameter, DB_PRESET for example, which is passed in and accessed within run_alphafold.sh as an environment variable at runtime. See the following code:

from sagemaker.estimator import Estimator

# account, region, db_preset, default_bucket, and prefix are defined earlier in the notebook
alphafold_image_uri = f'{account}.dkr.ecr.{region}.amazonaws.com/sagemaker-studio-alphafold:v2.3.0'
instance_type = 'ml.g5.2xlarge'
instance_count = 1
vpc_subnet_ids = ['subnet-xxxxxxxxx']  # okay to use a default VPC
security_group_ids = ['sg-xxxxxxxxx']
env = {'DB_PRESET': db_preset}  # <full_dbs|reduced_dbs>
output_path = 's3://%s/%s/job-output/' % (default_bucket, prefix)

estimator_alphafold = Estimator(
    source_dir='src',  # directory containing run_alphafold.sh and other runtime files
    entry_point='run_alphafold.sh',  # our script that runs /app/alphafold/run_alphafold.py
    image_uri=alphafold_image_uri,  # container image to use
    instance_count=instance_count,  # number of instances for the job
    instance_type=instance_type,
    subnets=vpc_subnet_ids,
    security_group_ids=security_group_ids,
    environment=env,
    output_path=output_path,
    ...)

Finally, we gather the data and let the job know where they are. The fasta data channel is defined as an S3 data input that will be downloaded from an S3 location into the compute instance at the beginning of the job. This allows great flexibility to manage and specify the input sequence. On the other hand, the genetic data channel is defined as a FileSystemInput that will be mounted onto the instance at the beginning of the job. The use of an FSx for Lustre file system as a way to bring in close to 3 TB of data avoids repeatedly downloading data from an S3 bucket to a compute instance. We call the .fit method to kick off an AlphaFold job:

import sagemaker
from sagemaker.inputs import FileSystemInput

# sess (the SageMaker session) and prefix are defined earlier in the notebook
file_system_id = 'fs-xxxxxxxxx'
fsx_mount_id = 'xxxxxxxx'
file_system_directory_path = f'/{fsx_mount_id}/{prefix}/alphafold-genetic-db'  # should be the full prefix from the S3 data repository

file_system_access_mode = 'ro'  # Specify the access mode (read-only)
file_system_type = 'FSxLustre'  # Specify your file system type

genetic_db = FileSystemInput(
    file_system_id=file_system_id,
    file_system_type=file_system_type,
    directory_path=file_system_directory_path,
    file_system_access_mode=file_system_access_mode)

s3_fasta = sess.upload_data(path='sequence_input/T1030.fasta',  # FASTA location locally
                            key_prefix='alphafoldv2/sequence_input')  # S3 prefix. Bucket is the SageMaker default bucket
fasta = sagemaker.inputs.TrainingInput(s3_fasta,
                                       distribution='FullyReplicated',
                                       s3_data_type='S3Prefix',
                                       input_mode='File')
data_channels_alphafold = {'genetic': genetic_db, 'fasta': fasta}

estimator_alphafold.fit(inputs=data_channels_alphafold,
                        wait=False)  # wait=False returns control to the notebook; set to True to stream the logs as the job progresses

That's it. We just submitted a job to SageMaker to run AlphaFold. The output, including the .pdb prediction files, will be written to Amazon S3, and the job logs are available in Amazon CloudWatch.

Run OpenFold

Running OpenFold in SageMaker follows a similar pattern, as shown in the second half of 01-run_stepbystep.ipynb. First, we add a simple layer on top of OpenFold's Dockerfile that installs the SageMaker-specific library, making the container image SageMaker compatible. Second, we construct run_openfold.sh as the entry point for the SageMaker job. In run_openfold.sh, we run run_pretrained_openfold.py from OpenFold, which is available in the container image, using the same genetic databases we downloaded for AlphaFold and OpenFold's model weights (--openfold_checkpoint_path). In terms of input data locations, besides the genetic databases channel and the FASTA channel, we introduce a third channel, SM_CHANNEL_PARAM, so that we can flexibly pass in the model weights of choice from the estimator construct when we define and submit a job. With the SageMaker estimator, we can easily submit OpenFold jobs with a different entry_point, image_uri, environment, inputs, and other configurations using the same signature. For the data channels, we add a new channel, param, as an Amazon S3 input, alongside the same genetic databases from the FSx for Lustre file system and the FASTA file from Amazon S3. This, again, allows us to easily specify which model weights to use from the job construct. See the following code:

s3_param = sess.upload_data(path='openfold_params/finetuning_ptm_2.pt',
                            key_prefix=f'{prefix}/openfold_params')
param = sagemaker.inputs.TrainingInput(s3_param,
                                       distribution='FullyReplicated',
                                       s3_data_type='S3Prefix',
                                       input_mode='File')

data_channels_openfold = {'genetic': genetic_db, 'fasta': fasta, 'param': param}

estimator_openfold.fit(inputs=data_channels_openfold,
                       wait=False)
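Inside the container, each input channel surfaces as an environment variable named SM_CHANNEL_ followed by the upper-cased channel name, pointing at a directory under /opt/ml/input/data. The following Python sketch shows how an entry point could locate the checkpoint delivered through the param channel before passing it to --openfold_checkpoint_path. The helpers `channel_dir` and `find_checkpoint` are hypothetical and not part of run_openfold.sh; the sketch assumes the channel contains exactly one .pt file.

```python
import glob
import os
from typing import Dict, Optional


def channel_dir(name: str, env: Optional[Dict[str, str]] = None) -> str:
    """Resolve a SageMaker input channel directory, e.g. 'param' -> $SM_CHANNEL_PARAM."""
    env = dict(os.environ) if env is None else env
    default = os.path.join("/opt/ml/input/data", name)  # standard channel location
    return env.get(f"SM_CHANNEL_{name.upper()}", default)


def find_checkpoint(name: str = "param", env: Optional[Dict[str, str]] = None) -> str:
    """Return the single .pt file delivered through the given channel."""
    matches = sorted(glob.glob(os.path.join(channel_dir(name, env), "*.pt")))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one .pt file in channel {name!r}, found {len(matches)}")
    return matches[0]
```

Failing loudly when zero or several checkpoints are present avoids silently running OpenFold with the wrong weights when a different file is uploaded to the param prefix.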

To access the final output after the job completes, we run the following commands:

!aws s3 cp {estimator_openfold.model_data} openfold_output/model.tar.gz
!tar zxfv openfold_output/model.tar.gz -C openfold_output/
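If you prefer to stay in Python rather than shell commands, the extraction step can be done with the standard tarfile module. The following sketch assumes model.tar.gz has already been copied locally (for example, with the aws s3 cp command above); `extract_pdb_files` is a hypothetical helper that extracts the archive and returns every .pdb file it contained.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_pdb_files(archive: str, dest: str) -> List[Path]:
    """Extract a SageMaker model.tar.gz into dest and return the .pdb files found."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_path)  # only extract archives from sources you trust
    return sorted(dest_path.rglob("*.pdb"))


# Usage, mirroring the shell commands above:
# pdb_files = extract_pdb_files("openfold_output/model.tar.gz", "openfold_output/")
```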

Runtime performance

The following table shows that splitting the MSA alignment and folding algorithms into two jobs costs about 57% (AlphaFold) and 51% (OpenFold) of running everything in a single compute job, which corresponds to cost savings of roughly 43% and 49%, respectively. Splitting lets us right-size the compute for each job: ml.m5.4xlarge for MSA alignment and ml.g5.2xlarge for AlphaFold and OpenFold.

Job Details | Instance Type | Input FASTA Sequence | Runtime | Cost
MSA alignment + OpenFold | ml.g5.4xlarge | T1030 | 50 mins | $1.69
MSA alignment + AlphaFold | ml.g5.4xlarge | T1030 | 65 mins | $2.19
MSA alignment | ml.m5.4xlarge | T1030 | 46 mins | $0.71
OpenFold | ml.g5.2xlarge | T1030 | 6 mins | $0.15
AlphaFold | ml.g5.2xlarge | T1030 | 21 mins | $0.53
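The comparison can be checked directly against the Cost column of the table. The following short calculation uses only those figures: the split cost is the MSA job plus the folding job, compared with the single combined job on ml.g5.4xlarge.

```python
from typing import Dict, Tuple

# Costs in USD, copied from the runtime table (all runs use target T1030).
SINGLE_JOB = {"AlphaFold": 2.19, "OpenFold": 1.69}  # MSA + folding on one ml.g5.4xlarge
MSA_JOB = 0.71  # MSA alignment on ml.m5.4xlarge
FOLDING_JOB = {"AlphaFold": 0.53, "OpenFold": 0.15}  # folding on ml.g5.2xlarge


def split_costs() -> Dict[str, Tuple[float, float]]:
    """Return (split cost as a fraction of single-job cost, savings fraction) per algorithm."""
    result = {}
    for algo, combined in SINGLE_JOB.items():
        split = MSA_JOB + FOLDING_JOB[algo]
        result[algo] = (split / combined, 1 - split / combined)
    return result


for algo, (fraction, savings) in split_costs().items():
    print(f"{algo}: split jobs cost {fraction:.0%} of the single job, saving {savings:.0%}")
```

Running it prints 57% and 43% for AlphaFold, and 51% and 49% for OpenFold.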

Build a repeatable workflow using SageMaker Pipelines

With SageMaker Pipelines, we can create an ML workflow that takes care of managing data between steps, orchestrating their runs, and logging. SageMaker Pipelines also provides us a UI to visualize our pipeline and easily run our ML workflow.

A pipeline is created by combining a number of steps. In this pipeline, we combine three training steps, each of which requires a SageMaker estimator. The estimators defined in this notebook are very similar to those defined in 01-run_stepbystep.ipynb, except that we use Amazon S3 locations to point to our inputs and outputs. The dynamic variables allow SageMaker Pipelines to run steps one after another and also let the user retry failed steps. The following screenshot shows a Directed Acyclic Graph (DAG), which provides information on the requirements for and relationships between each step of our pipeline.
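To illustrate how the dependencies in a DAG translate into run order, the following sketch uses Python's standard graphlib module. It doesn't use SageMaker and isn't how SageMaker Pipelines schedules steps internally. It only models the idea that steps whose dependencies are all satisfied can run in parallel. The step names match the pipeline defined later in this post.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Each step maps to the set of steps it depends on, mirroring this post's pipeline.
DAG: Dict[str, Set[str]] = {
    "RunMSA": set(),
    "RunAlphaFold": {"RunMSA"},
    "RunOpenFold": {"RunMSA"},
}


def run_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into waves; steps within one wave have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(run_waves(DAG))  # [['RunMSA'], ['RunAlphaFold', 'RunOpenFold']]
```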

Dynamic variables

SageMaker Pipelines is capable of taking user inputs at the start of every pipeline run. We define the following dynamic variables, which we would like to change during each experiment:

  • FastaInputS3URI – Amazon S3 URI of the FASTA file uploaded via SDK, Boto3, or manually.
  • FastaFileName – Name of the FASTA file.
  • db_preset – Selection between full_dbs or reduced_dbs.
  • MaxTemplateDate – AlphaFold’s MSA step will search for the available templates before the date specified by this parameter.
  • ModelPreset – Select between AlphaFold models including monomer, monomer_casp14, monomer_ptm, and multimer.
  • NumMultimerPredictionsPerModel – Number of seeds to run per model when using multimer system.
  • InferenceInstanceType – Instance type to use for inference steps (both AlphaFold and OpenFold). The default value is ml.g5.2xlarge.
  • MSAInstanceType – Instance type to use for MSA step. The default value is ml.m5.4xlarge.

See the following code:

from sagemaker.workflow.parameters import ParameterString

fasta_file = ParameterString(name="FastaFileName")
fasta_input = ParameterString(name="FastaInputS3URI")
pipeline_db_preset = ParameterString(name="db_preset",
                                     default_value='full_dbs',
                                     enum_values=['full_dbs', 'reduced_dbs'])
max_template_date = ParameterString(name="MaxTemplateDate")
model_preset = ParameterString(name="ModelPreset")
num_multimer_predictions_per_model = ParameterString(name="NumMultimerPredictionsPerModel")
msa_instance_type = ParameterString(name="MSAInstanceType", default_value='ml.m5.4xlarge')
instance_type = ParameterString(name="InferenceInstanceType", default_value='ml.g5.2xlarge')
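Because these parameters are passed as plain strings, a typo may only surface after an execution has started. A client-side check like the following can catch obvious mistakes before calling pipeline.start. It is a hypothetical helper, not part of the SageMaker SDK or the solution repository; the allowed values mirror the dynamic variables listed above.

```python
from datetime import date
from typing import Dict, List

DB_PRESETS = {"full_dbs", "reduced_dbs"}
MODEL_PRESETS = {"monomer", "monomer_casp14", "monomer_ptm", "multimer"}


def validate_parameters(params: Dict[str, str]) -> List[str]:
    """Return a list of problems found in a PipelineParameters dict (empty if none)."""
    problems = []
    for key in ("FastaInputS3URI", "FastaFileName"):
        if not params.get(key):
            problems.append(f"{key} is required")
    if not params.get("FastaInputS3URI", "s3://").startswith("s3://"):
        problems.append("FastaInputS3URI must be an s3:// URI")
    if params.get("db_preset", "full_dbs") not in DB_PRESETS:
        problems.append("db_preset must be full_dbs or reduced_dbs")
    if "ModelPreset" in params and params["ModelPreset"] not in MODEL_PRESETS:
        problems.append(f"unknown ModelPreset {params['ModelPreset']!r}")
    if "MaxTemplateDate" in params:
        try:
            date.fromisoformat(params["MaxTemplateDate"])  # ISO date, e.g. 2020-05-14
        except ValueError:
            problems.append("MaxTemplateDate must be an ISO date such as 2020-05-14")
    n = params.get("NumMultimerPredictionsPerModel", "1")
    if not (n.isdigit() and int(n) > 0):
        problems.append("NumMultimerPredictionsPerModel must be a positive integer")
    return problems
```

For example, you could call `validate_parameters(PipelineParameters)` and only start the execution when it returns an empty list.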

A SageMaker pipeline is constructed by defining a series of steps and then chaining them together in a specific order where the output of a previous step becomes the input to the next step. Steps can be run in parallel and defined to have a dependency on a previous step. In this pipeline, we define an MSA step, which is the dependency for an AlphaFold inference step and OpenFold inference step that run in parallel. See the following code:

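from sagemaker.workflow.steps import TrainingStep

# pipeline_msa_args, pipeline_alphafold_default_args, and pipeline_openfold_args are
# the step arguments built from each estimator in the notebook
step_msa = TrainingStep(
    name="RunMSA",
    step_args=pipeline_msa_args,
)

step_alphafold = TrainingStep(
    name="RunAlphaFold",
    step_args=pipeline_alphafold_default_args,
)
step_alphafold.add_depends_on([step_msa])  # start only after the MSA step completes

step_openfold = TrainingStep(
    name="RunOpenFold",
    step_args=pipeline_openfold_args,
)
step_openfold.add_depends_on([step_msa])  # runs in parallel with RunAlphaFold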

To put all the steps together, we call the Pipeline class and provide a pipeline name, pipeline input variables, and the individual steps:

from sagemaker.workflow.pipeline import Pipeline

pipeline_name = "ProteinFoldWorkflow"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        fasta_input,
        fasta_file,
        pipeline_db_preset,
        max_template_date,
        model_preset,
        num_multimer_predictions_per_model,
        instance_type,
        msa_instance_type,
    ],
    steps=[step_msa, step_alphafold, step_openfold],
)

pipeline.upsert(role_arn=role,  # creates the pipeline, or updates it if it already exists
                description='Protein_Workflow_MSA')

Run the pipeline

In the last cell of the notebook 02-define_pipeline.ipynb, we show how to run a pipeline using the SageMaker SDK. The dynamic variables we described earlier are provided as follows:

!mkdir -p ./sequence_input/
!curl 'https://www.predictioncenter.org/casp14/target.cgi?target=T1030&view=sequence' > ./sequence_input/T1030.fasta
fasta_file_name = 'T1030.fasta'

pathName = f'./sequence_input/{fasta_file_name}'
s3_fasta=sess.upload_data(path=pathName,
key_prefix='alphafoldv2/sequence_input')

PipelineParameters={
'FastaInputS3URI':s3_fasta,
'db_preset': 'full_dbs',
'FastaFileName': fasta_file_name,
'MaxTemplateDate': '2020-05-14',
'ModelPreset': 'monomer',
'NumMultimerPredictionsPerModel': '5',
'InferenceInstanceType':'ml.g5.2xlarge',
'MSAInstanceType':'ml.m5.4xlarge'
}
execution = pipeline.start(execution_display_name='SDK-Executed',
                           execution_description='This pipeline was executed via SageMaker SDK',
                           parameters=PipelineParameters)

Track experiments and compare protein structures

For our experiment, we use an example protein sequence from the CASP14 competition, which provides an independent mechanism for assessing methods of protein structure modeling. The target T1030 is derived from the PDB 6POO protein and has 237 amino acids in its primary sequence. We run the SageMaker pipeline to predict the protein structure of this input sequence with both the OpenFold and AlphaFold algorithms.

When the pipeline is complete, we download the predicted .pdb files from each folding job and visualize the structure in the notebook using py3Dmol, as in the notebook 04-compare_alphafold_openfold.ipynb.
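Comparing structures numerically requires matching coordinates from each .pdb file. One common convention, assumed here, is to compare only the C-alpha (CA) atom of each residue. The following sketch reads CA coordinates using the fixed-column layout of PDB ATOM records. `read_ca_coords` is a hypothetical minimal parser, not a replacement for a full structure library; it assumes a single-model file and that both structures list residues in the same order.

```python
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]


def read_ca_coords(pdb_text: str, chain: Optional[str] = None) -> List[Coord]:
    """Extract C-alpha coordinates, in file order, from ATOM records of a PDB file."""
    coords: List[Coord] = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, headers, TER, END, and so on
        if line[12:16].strip() != "CA":  # atom name, columns 13-16
            continue
        if chain is not None and line[21] != chain:  # chain ID, column 22
            continue
        if line[16] not in (" ", "A"):  # alternate location, column 17
            continue  # keep only the first alternate location
        # x, y, z occupy columns 31-38, 39-46, and 47-54
        x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
        coords.append((x, y, z))
    return coords
```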

The following screenshot shows the prediction from the AlphaFold prediction job.

The predicted structure is compared against its known reference structure, PDB code 6POO, archived in RCSB. We evaluate the prediction against 6POO with three metrics: RMSD, RMSD with superposition, and template modeling score (TM-score), as described in Comparing structures.

Algorithm | Input Sequence | Compared With | RMSD (Å) | RMSD with Superposition (Å) | TM-score
AlphaFold | T1030 | 6POO | 247.26 | 3.87 | 0.3515
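The metrics in the table can be sketched as follows, assuming two equal-length lists of matched C-alpha coordinates in ångströms. `rmsd` computes the deviation without superposition, which is why structures in different coordinate frames can produce a large value like the one above. The superposed RMSD additionally requires an optimal rigid-body rotation (for example, the Kabsch algorithm), which isn't shown here. `tm_score` evaluates the TM-score formula, normalized by the reference length, for the coordinates as given. The full TM-score is the maximum of this quantity over all superpositions, so this sketch only scores one fixed superposition. Both function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between paired coordinates, without superposition."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def tm_score(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """TM-score of an already superposed model, normalized by the reference length.

    d0 = 1.24 * (L - 15) ** (1/3) - 1.8, floored at 0.5 for short chains.
    """
    n = len(reference)
    d0 = max(1.24 * (n - 15) ** (1 / 3) - 1.8, 0.5) if n > 15 else 0.5
    return sum(1 / (1 + (math.dist(p, q) / d0) ** 2) for p, q in zip(model, reference)) / n
```

TM-score ranges from 0 to 1 and, unlike RMSD, is dominated by well-aligned residues, which makes it less sensitive to a few badly placed loops.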

The folding algorithms are then compared against each other for multiple FASTA sequences: T1030, T1090, and T1076. New target sequences may not have an experimentally determined reference structure in the PDB, so it's useful to compare the variability between folding algorithms.

Algorithm | Input Sequence | Compared With | RMSD (Å) | RMSD with Superposition (Å) | TM-score
AlphaFold | T1030 | OpenFold | 73.21 | 24.8 | 0.0018
AlphaFold | T1076 | OpenFold | 38.71 | 28.87 | 0.0047
AlphaFold | T1090 | OpenFold | 30.03 | 20.45 | 0.005

The following screenshot shows the runs of ProteinFoldWorkflow for the three FASTA input sequences in SageMaker Pipelines:

We also log the metrics with SageMaker Experiments as new runs of the same experiment created by the pipeline:

from sagemaker.experiments.run import Run

metric_type = 'compare:'
experiment_name = 'proteinfoldworkflow'
# input_name_1, rmsd_cur_one, rmsd_fit_one, and tmscore_one are computed earlier in the notebook
with Run(experiment_name=experiment_name, run_name=input_name_1, sagemaker_session=sess) as run:
    run.log_metric(name=metric_type + "rmsd_cur", value=rmsd_cur_one, step=1)
    run.log_metric(name=metric_type + "rmsd_fit", value=rmsd_fit_one, step=1)
    run.log_metric(name=metric_type + "tm_score", value=tmscore_one, step=1)

We then analyze and visualize these runs on the Experiments page in SageMaker Studio.

The following chart depicts the RMSD value between AlphaFold and OpenFold for the three sequences: T1030, T1076, and T1090.

Conclusion

In this post, we described how you can use SageMaker Pipelines to set up and run protein folding workflows with two popular structure prediction algorithms: AlphaFold2 and OpenFold. We demonstrated a price-performant solution architecture of multiple jobs that separates the compute requirements for MSA generation from those for structure prediction. We also highlighted how you can visualize, evaluate, and compare predicted 3D protein structures in SageMaker Studio.

To get started with protein folding workflows on SageMaker, refer to the sample code in the GitHub repo.


About the authors

Michael Hsieh is a Principal AI/ML Specialist Solutions Architect. He works with HCLS customers to advance their ML journey with AWS technologies and his expertise in medical imaging. A Seattle transplant, he loves exploring the natural beauty the city has to offer, such as the hiking trails, scenic kayaking in South Lake Union (SLU), and the sunset at Shilshole Bay.

Shivam Patel is a Solutions Architect at AWS. He comes from a background in R&D and combines this with his business knowledge to solve complex problems faced by his customers. Shivam is most passionate about workloads in machine learning, robotics, IoT, and high-performance computing.

Hasan Poonawala is a Senior AI/ML Specialist Solutions Architect at AWS, where he helps customers design and deploy machine learning applications in production on AWS. He has over 12 years of work experience as a data scientist, machine learning practitioner, and software developer. In his spare time, Hasan loves to explore nature and spend time with friends and family.

Jasleen Grewal is a Senior Applied Scientist at Amazon Web Services, where she works with AWS customers to solve real world problems using machine learning, with special focus on precision medicine and genomics. She has a strong background in bioinformatics, oncology, and clinical genomics. She is passionate about using AI/ML and cloud services to improve patient care.

Is your model good? A deep dive into Amazon SageMaker Canvas advanced metrics

If you are a business analyst, understanding customer behavior is probably one of the most important things you care about. Understanding the reasons and mechanisms behind customer purchase decisions can facilitate revenue growth. However, the loss of customers (commonly referred to as customer churn) always poses a risk. Gaining insights into why customers leave can be just as crucial for sustaining profits and revenue.

Although machine learning (ML) can provide valuable insights, ML experts were needed to build customer churn prediction models until the introduction of Amazon SageMaker Canvas.

SageMaker Canvas is a low-code/no-code managed service that allows you to create ML models that can solve many business problems without writing a single line of code. It also enables you to evaluate the models using advanced metrics as if you were a data scientist.

In this post, we show how a business analyst can evaluate and understand a classification churn model created with SageMaker Canvas using the Advanced metrics tab. We explain the metrics and show techniques to deal with data to obtain better model performance.

Prerequisites

If you would like to implement all or some of the tasks described in this post, you need an AWS account with access to SageMaker Canvas. Refer to Predict customer churn with no-code machine learning using Amazon SageMaker Canvas to cover the basics around SageMaker Canvas, the churn model, and the dataset.

Introduction to model performance evaluation

As a general guideline, when you need to evaluate the performance of a model, you’re trying to measure how well the model will predict something when it sees new data. This prediction is called inference. You start by training the model using existing data, and then ask the model to predict the outcome on data that it has not already seen. How accurately the model predicts this outcome is what you look at to understand the model performance.

If the model hasn't seen the new data, how would anybody know if the prediction is good or bad? Well, the idea is to use historical data where the results are already known and compare these values to the model's predicted values. This is enabled by setting aside a portion of the historical training data so it can be compared with what the model predicts for those values.

In the example of customer churn (which is a categorical classification problem), you start with a historical dataset that describes customers with many attributes (one customer per record). One of the attributes, called Churn, can be True or False, describing whether the customer left the service. To evaluate model accuracy, we split this dataset, train the model using one part (the training dataset), and ask the model to predict the outcome (classify the customer as Churn or not) for the other part (the test dataset). We then compare the model's predictions to the ground truth contained in the test dataset.
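SageMaker Canvas performs this split for you. To make the idea concrete, here is a minimal Python sketch of a random hold-out split. The 80/20 ratio and fixed seed are illustrative choices, not a description of how Canvas splits data internally, and `train_test_split` here is a hypothetical stdlib helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle rows reproducibly and hold out a fraction of them as the test set."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


# Hypothetical churn records: every fifth customer churned
customers = [{"id": i, "Churn": i % 5 == 0} for i in range(100)]
train, test = train_test_split(customers)
print(len(train), len(test))  # 80 20
```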

Interpreting advanced metrics

In this section, we discuss the advanced metrics in SageMaker Canvas that can help you understand model performance.

Confusion matrix

SageMaker Canvas uses confusion matrices to help you visualize when a model generates predictions correctly. In a confusion matrix, your results are arranged to compare the predicted values against the actual historical (known) values. The following example explains how a confusion matrix works for a two-category prediction model that predicts positive and negative labels:

  • True positive – The model correctly predicted positive when the true label was positive
  • True negative – The model correctly predicted negative when the true label was negative
  • False positive – The model incorrectly predicted positive when the true label was negative
  • False negative – The model incorrectly predicted negative when the true label was positive

The following image is an example of a confusion matrix for two categories. In our churn model, the actual values come from the test dataset, and the predicted values come from asking our model.
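The four cells of a binary confusion matrix can be counted directly from paired actual and predicted labels. The following sketch treats True (churned) as the positive class; `confusion_counts` is a hypothetical helper and the labels are made up.

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, int]:
    """Count TP, TN, FP, and FN for a binary classifier (True = positive)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Keys are (actual, predicted) pairs
    cells = {(True, True): "TP", (False, False): "TN",
             (False, True): "FP", (True, False): "FN"}
    counts = Counter(cells[(a, p)] for a, p in zip(actual, predicted))
    return {key: counts.get(key, 0) for key in ("TP", "TN", "FP", "FN")}


actual = [True, True, False, False, False, True]
predicted = [True, False, False, True, False, True]
print(confusion_counts(actual, predicted))  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```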

Accuracy

Accuracy is the percentage of correct predictions out of all the rows or samples of the test set. It is the number of positive samples correctly predicted as True (true positives), plus the number of negative samples correctly predicted as False (true negatives), divided by the total number of samples in the test set.

It’s one of the most important metrics to understand because it tells you what percentage of predictions the model got right, but it can be misleading in some cases. For example:

  • Class imbalance – When the classes in your dataset are not evenly distributed (you have a disproportionate number of samples from one class and very few from the others), accuracy can be misleading. In such cases, even a model that simply predicts the majority class for every instance can achieve high accuracy.
  • Cost-sensitive classification – In some applications, the cost of misclassification can differ between classes. For example, if we were predicting whether a drug can aggravate a condition, a false negative (predicting that the drug won’t aggravate the condition when it actually does) can be more costly than a false positive (predicting that the drug will aggravate the condition when it actually doesn’t).
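The class imbalance pitfall is easy to reproduce. The following sketch uses hypothetical numbers (1,000 customers, of whom 50 churn) to show that a "model" that always predicts no churn still scores 95% accuracy while catching none of the churners.

```python
from typing import Sequence


def accuracy(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Fraction of predictions that match the ground truth: (TP + TN) / total."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for truth, guess in zip(actual, predicted) if truth == guess)
    return correct / len(actual)


# Hypothetical imbalanced test set: 50 churners (True) and 950 non-churners (False).
actual = [True] * 50 + [False] * 950
# A "model" that ignores its input and always predicts the majority class.
always_no = [False] * len(actual)

print(accuracy(actual, always_no))  # 0.95
churners_caught = sum(1 for t, p in zip(actual, always_no) if t and p)
print(churners_caught)  # 0
```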

Precision, recall, and F1 score

Precision is the fraction of true positives (TP) out of all the predicted positives (TP + FP). It measures the proportion of positive predictions that are actually correct.

Recall is the fraction of true positives (TP) out of all the actual positives (TP + FN). It measures the proportion of positive instances that were correctly predicted as positive by the model.

The F1 score combines precision and recall to provide a single score that balances the trade-off between them. It is defined as the harmonic mean of precision and recall:

F1 score = 2 * (precision * recall) / (precision + recall)

The F1 score ranges from 0–1, with a higher score indicating better performance. A perfect F1 score of 1 indicates that the model has achieved both perfect precision and perfect recall, and a score of 0 indicates that precision or recall is 0, meaning the model did not produce a single true positive.
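To make the formulas concrete, here is a small Python sketch that derives the three metrics from confusion-matrix counts. The counts are hypothetical, and the function returns 0.0 when a ratio is undefined, which is one common convention (libraries differ in how they handle this case).

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts.

    Undefined ratios (division by zero) are reported as 0.0, a common convention.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * (precision * recall) / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives.
p, r, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.667 0.727
```

Because F1 is a harmonic mean, it always lands at or below the midpoint of precision and recall, so a weak value in either one pulls the score down.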

The F1 score provides a balanced evaluation of the model’s performance. It considers precision and recall, providing a more informative evaluation metric that reflects the model’s ability to correctly classify positive instances and avoid false positives and false negatives.

F1 is especially relevant in areas such as medical diagnosis, fraud detection, and sentiment analysis. In medical diagnosis, accurately identifying the presence of a specific disease or condition is crucial, and false negatives or false positives can have significant consequences. The F1 score takes into account both precision (the ability to correctly identify positive cases) and recall (the ability to find all positive cases), providing a balanced evaluation of the model’s performance in detecting the disease. Similarly, in fraud detection, where actual fraud cases are rare compared to non-fraudulent cases (imbalanced classes), accuracy alone can be misleading because it is dominated by the many true negatives. The F1 score instead measures how well the model detects fraudulent cases while limiting false alarms, considering both precision and recall. And in sentiment analysis, if the dataset is imbalanced, accuracy may not reflect how well the model classifies instances of the positive sentiment class.

AUC (area under the curve)

The AUC metric evaluates the ability of a binary classification model to distinguish between positive and negative classes at all classification thresholds. A threshold is a value used by the model to make a decision between the two possible classes, converting the probability of a sample being part of a class into a binary decision. To calculate the AUC, the true positive rate (TPR) and false positive rate (FPR) are plotted across various threshold settings. The TPR measures the proportion of true positives out of all actual positives, while the FPR measures the proportion of false positives out of all actual negatives. The resulting curve, called the receiver operating characteristic (ROC) curve, provides a visual representation of the TPR and FPR at different threshold settings. The AUC value, which ranges from 0–1, represents the area under the ROC curve. Higher AUC values indicate better performance, with a perfect classifier achieving an AUC of 1.

The following plot shows the ROC curve, with TPR as the Y axis and FPR as the X axis. The closer the curve gets to the top left corner of the plot, the better the model does at classifying the data into categories.
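The following Python sketch shows how an ROC curve and its AUC can be computed by sweeping the threshold over every distinct score and integrating with the trapezoidal rule. It’s a from-scratch illustration with hypothetical labels and scores, not the implementation Canvas uses.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs, one per distinct threshold, starting at (0, 0)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive when the score is at or above the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # model's predicted probability of the positive class
print(auc(roc_points(labels, scores)))  # 0.75
```

A model that gives every sample the same score produces only the diagonal from (0, 0) to (1, 1), for an AUC of 0.5, which corresponds to random guessing.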

To make this concrete, consider a fraud detection model. These models are usually trained on imbalanced datasets, because almost all the transactions in the dataset are non-fraudulent and only a few are labeled as fraud. In this case, accuracy alone may not adequately capture the performance of the model, because it is heavily influenced by the abundance of non-fraudulent cases, which can lead to misleadingly high accuracy scores.

In this case, the AUC would be a better metric to assess model performance because it provides a comprehensive assessment of a model’s ability to distinguish between fraudulent and non-fraudulent transactions. It offers a more nuanced evaluation, taking into account the trade-off between true positive rate and false positive rate at various classification thresholds.

Just like the F1 score, it is particularly useful when the dataset is imbalanced. It measures the trade-off between TPR and FPR and shows how well the model can differentiate between the two classes regardless of their distribution. This means that even if one class is significantly smaller than the other, the ROC curve assesses the model’s performance in a balanced manner by considering both classes equally.

Additional key topics

Advanced metrics are not the only important tools available to you for evaluating and improving ML model performance. Data preparation, feature engineering, and feature impact analysis are techniques that are essential to model building. These activities play a crucial role in extracting meaningful insights from raw data and improving model performance, leading to more robust and insightful results.

Data preparation and feature engineering

Feature engineering is the process of selecting, transforming, and creating new variables (features) from raw data, and plays a key role in improving the performance of an ML model. Selecting the most relevant variables or features from the available data involves removing irrelevant or redundant features that do not contribute to the model’s predictive power. Transforming data features into a suitable format includes scaling, normalization, and handling missing values. And finally, creating new features from the existing data is done through mathematical transformations, combining or interacting different features, or creating new features from domain-specific knowledge.
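Two of the transformations mentioned above, handling missing values and scaling, can be sketched in plain Python. Mean imputation and min-max scaling are just one choice among several (median imputation or standardization are common alternatives), and the `day_mins` values are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


day_mins = [120.0, None, 180.0, 240.0]  # hypothetical column with one missing value
filled = impute_mean(day_mins)
print(filled)                 # [120.0, 180.0, 180.0, 240.0]
print(min_max_scale(filled))  # [0.0, 0.5, 0.5, 1.0]
```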

Feature importance analysis

SageMaker Canvas generates a feature importance analysis that explains the impact that each column in your dataset has on the model. When you generate predictions, you can see the column impact that identifies which columns have the most impact on each prediction. This gives you insight into which features deserve to be part of your final model and which ones should be discarded. Column impact is a percentage score that indicates how much weight a column has in making predictions in relation to the other columns. For a column impact of 25%, Canvas weighs the prediction as 25% for the column and 75% for the other columns.

Approaches to improve model accuracy

Although there are multiple methods to improve model accuracy, data scientists and ML practitioners usually follow one of the two approaches discussed in this section, using the tools and metrics described earlier.

Model-centric approach

In this approach, the data stays the same while the model is iteratively improved to meet the desired results. Tools used with this approach include:

  • Trying multiple relevant ML algorithms
  • Algorithm and hyperparameter tuning and optimization
  • Different model ensemble methods
  • Using pre-trained models (SageMaker provides various built-in or pre-trained models to help ML practitioners)
  • AutoML, which encompasses all of the above and is what SageMaker Canvas does behind the scenes (using Amazon SageMaker Autopilot)

Data-centric approach

In this approach, the focus is on data preparation, improving data quality, and iteratively modifying the data to improve performance:

  • Exploring statistics of the dataset used to train the model, also known as exploratory data analysis (EDA)
  • Improving data quality (data cleaning, missing values imputation, outlier detection and management)
  • Feature selection
  • Feature engineering
  • Data augmentation
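As an example of the first two activities, the following sketch computes basic summary statistics for a numeric column and flags outliers with the interquartile range (IQR) rule, a common heuristic that treats values more than 1.5 times the IQR beyond the quartiles as outliers. The data and function names are hypothetical, and `statistics.quantiles` requires Python 3.8 or later.

```python
from statistics import mean, median, quantiles, stdev
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Basic descriptive statistics used in exploratory data analysis."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation; needs at least 2 values
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical customer-service call counts with one suspicious entry.
service_calls = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(summarize(service_calls)["median"])  # 5
print(iqr_outliers(service_calls))         # [100]
```

Whether a flagged value should be removed, capped, or kept is a domain decision; the rule only points you at records worth inspecting.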

Improving model performance with Canvas

We begin with the data-centric approach. We use the model preview functionality to perform an initial EDA, which gives us a baseline. We then augment the data through data preparation and feature engineering to generate a new baseline, and finally get the best model with a model-centric approach using the standard build functionality.

We use the synthetic dataset from a telecommunications mobile phone carrier. This sample dataset contains 5,000 records, where each record uses 21 attributes to describe the customer profile. Refer to Predict customer churn with no-code machine learning using Amazon SageMaker Canvas for a full description.

Model preview in a data-centric approach

As a first step, we open the dataset, select the column to predict as Churn?, and generate a preview model by choosing Preview model.

The Preview model pane will show the progress until the preview model is ready.

When the model is ready, SageMaker Canvas generates a feature importance analysis.

Finally, when it’s complete, the pane shows a list of columns and their impact on the model. These are useful for understanding how relevant each feature is to our predictions. Column impact is a percentage score that indicates how much weight a column has in making predictions in relation to the other columns. In the following example, for the Night Calls column, SageMaker Canvas weights the prediction as 4.04% for the column and 95.96% for the other columns. The higher the value, the higher the impact.

As we can see, the preview model has a 95.6% accuracy. Let’s try to improve the model performance using a data-centric approach. We perform data preparation and use feature engineering techniques to improve performance.

As shown in the following screenshot, the Phone and State columns have much less impact on our prediction. We use this information as input for our next phase, data preparation.

SageMaker Canvas provides ML data transforms with which you can clean, transform, and prepare your data for model building. You can use these transforms on your datasets without any code, and they will be added to the model recipe, which is a record of the data preparation performed on your data before building the model.

Note that any data transforms you use only modify the input data when building a model and do not modify your dataset or original data source.

The following transforms are available in SageMaker Canvas for you to prepare your data for building:

  • Datetime extraction
  • Drop columns
  • Filter rows
  • Functions and operators
  • Manage rows
  • Rename columns
  • Remove rows
  • Replace values
  • Resample time series data

Let’s start by dropping the columns that we found have little impact on our prediction.

For example, in this dataset, the phone number is just the equivalent of an account number—it’s useless or even detrimental in predicting other accounts’ likelihood of churn. Likewise, the customer’s state doesn’t impact our model much. Let’s remove the Phone and State columns by unselecting those features under Column name.

Now, let’s perform some additional data transformation and feature engineering.

For example, we noticed in our previous analysis that the amount charged to customers has a direct impact on churn. Let’s therefore create a new column that computes the total charges to our customers by combining Charge, Mins, and Calls for Day, Eve, Night, and Intl. To do so, we use the custom formulas in SageMaker Canvas.

Choose Functions, and then add the following text to the formula text box:

(Day Calls*Day Charge*Day Mins)+(Eve Calls*Eve Charge*Eve Mins)+(Night Calls*Night Charge*Night Mins)+(Intl Calls*Intl Charge*Intl Mins)

Give the new column a name (for example, Total Charges), and choose Add after the preview has been generated. The model recipe should now look as shown in the following screenshot.
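If you want to reproduce this recipe outside Canvas, for example to check the new column in a notebook, the two steps (dropping Phone and State, then adding Total Charges with the formula above) can be sketched in Python. This assumes each record is a dictionary keyed by the dataset’s column names; the sample row is made up, and the code only mirrors the formula rather than calling any Canvas API.

```python
from typing import Any, Dict, List

DROP_COLUMNS = ("Phone", "State")  # low-impact columns removed in the recipe
PERIODS = ("Day", "Eve", "Night", "Intl")


def apply_recipe(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop low-impact columns and add Total Charges, mirroring the Canvas formula."""
    transformed = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
        # Same expression as the formula: sum over periods of Calls * Charge * Mins.
        new_row["Total Charges"] = sum(
            row[f"{p} Calls"] * row[f"{p} Charge"] * row[f"{p} Mins"] for p in PERIODS
        )
        transformed.append(new_row)
    return transformed


sample = {
    "Phone": "000-0000", "State": "XX", "Churn?": False,  # hypothetical values
    "Day Calls": 2, "Day Charge": 1.5, "Day Mins": 10,
    "Eve Calls": 1, "Eve Charge": 2.0, "Eve Mins": 3,
    "Night Calls": 0, "Night Charge": 1.0, "Night Mins": 5,
    "Intl Calls": 1, "Intl Charge": 1.0, "Intl Mins": 1,
}
print(apply_recipe([sample])[0]["Total Charges"])  # 37.0
```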

When this data preparation is complete, we train a new preview model to see if the model improved. Choose Preview model again, and the lower right pane will show the progress.

When training is finished, Canvas recomputes the predicted accuracy and creates a new column impact analysis.

When the whole process is complete, we see the same pane as before, but with the new preview model accuracy. Model accuracy increased by 0.4 percentage points (from 95.6% to 96%).

The numbers in the preceding images may differ from yours because ML introduces some stochasticity in the process of training models, which can lead to different results in different builds.

Model-centric approach to create the model

Canvas offers two options to build your models:

  • Standard build – Builds the best model from an optimized process that trades speed for better accuracy. It uses AutoML, which automates various ML tasks, including model selection, trying various algorithms relevant to your ML use case, hyperparameter tuning, and creating model explainability reports.
  • Quick build – Builds a simple model in a fraction of the time of a standard build, trading accuracy for speed. Quick build is useful when iterating, to understand more quickly how data changes affect your model accuracy.

Let’s continue using a standard build approach.

Standard build

As we saw before, the standard build builds the best model from an optimized process to maximize accuracy.

The build process for our churn model takes around 45 minutes. During this time, Canvas tests hundreds of candidate pipelines, selecting the best model. In the following screenshot, we can see the expected build time and progress.

With the standard build process, model accuracy improved to 96.903%, up from the 96% of our second preview model.

Explore advanced metrics

Let’s explore the model’s advanced metrics. On the Scoring tab, choose Advanced metrics.

This page shows the following confusion matrix together with the advanced metrics: accuracy, precision, recall, F1 score, and AUC.

Generate predictions

Now that the metrics look good, we can generate predictions on the Predict tab, either as a batch prediction or as a single (real-time) prediction.

We have two options:

  • Use this model to run batch or single predictions
  • Send the model to Amazon SageMaker Studio to share with data scientists

Clean up

To avoid incurring future session charges, log out of SageMaker Canvas.

Conclusion

SageMaker Canvas provides powerful tools that enable you to build and assess the accuracy of models, enhancing their performance without the need for coding or specialized data science and ML expertise. As we have seen in the example through the creation of a customer churn model, by combining these tools with both a data-centric and a model-centric approach using advanced metrics, business analysts can create and evaluate prediction models. With a visual interface, you’re also empowered to generate accurate ML predictions on your own. We encourage you to go through the references and see how many of these concepts might apply in other types of ML problems.

References


About the Authors

Marcos is an AWS Sr. Machine Learning Solutions Architect based in Florida, US. In that role, he is responsible for guiding and assisting US startup organizations in their strategy towards the cloud, providing guidance on how to address high-risk issues and optimize their machine learning workloads. He has more than 25 years of experience with technology, including cloud solution development, machine learning, software development, and data center infrastructure.

Indrajit is an AWS Enterprise Sr. Solutions Architect. In his role, he helps customers achieve their business outcomes through cloud adoption. He designs modern application architectures based on microservices, serverless, APIs, and event-driven patterns. He works with customers to realize their data analytics and machine learning goals through adoption of DataOps and MLOps practices and solutions. Indrajit speaks regularly at AWS public events like summits and ASEAN workshops, has published several AWS blog posts, and developed customer-facing technical workshops focused on data and machine learning on AWS.

Use Stable Diffusion XL with Amazon SageMaker JumpStart in Amazon SageMaker Studio

Today we are excited to announce that Stable Diffusion XL 1.0 (SDXL 1.0) is available for customers through Amazon SageMaker JumpStart. SDXL 1.0 is the latest image generation model from Stability AI. SDXL 1.0 enhancements include native 1024-pixel image generation at a variety of aspect ratios. It’s designed for professional use, and calibrated for high-resolution photorealistic images. SDXL 1.0 offers a variety of preset art styles ready to use in marketing, design, and image generation use cases across industries. You can easily try out these models and use them with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML.

In this post, we walk through how to use SDXL 1.0 models via SageMaker JumpStart.

What is Stable Diffusion XL 1.0 (SDXL 1.0)

SDXL 1.0 is the evolution of Stable Diffusion and the next frontier for generative AI for images. SDXL is capable of generating stunning images with complex concepts in various art styles, including photorealism, at quality levels that exceed the best image models available today. Like the original Stable Diffusion series, SDXL is highly customizable (in terms of parameters) and can be deployed on Amazon SageMaker instances.

The following image of a lion was generated with SDXL 1.0 from a simple prompt, which we explore later in this post.

The SDXL 1.0 model includes the following highlights:

  • Freedom of expression – Best-in-class photorealism, as well as an ability to generate high-quality art in virtually any art style. The model produces distinct images without imposing a particular feel of its own, ensuring freedom of style.
  • Artistic intelligence – Best-in-class ability to generate concepts that are notoriously difficult for image models to render, such as hands and text, or spatially arranged objects and people (for example, a red box on top of a blue box).
  • Simpler prompting – Unlike other generative image models, SDXL requires only a few words to create complex, detailed, and aesthetically pleasing images. No more need for paragraphs of qualifiers.
  • More accurate – Prompting in SDXL is not only simple, but more true to the intention of prompts. SDXL’s improved CLIP model understands text so effectively that concepts like “The Red Square” are understood to be different from “a red square.” This accuracy allows much more to be done to get the perfect image directly from text, even before using the more advanced features or fine-tuning that Stable Diffusion is famous for.

What is SageMaker JumpStart

With SageMaker JumpStart, ML practitioners can choose from a broad selection of state-of-the-art models for use cases such as content writing, image generation, code generation, question answering, copywriting, summarization, classification, information retrieval, and more. ML practitioners can deploy foundation models to dedicated SageMaker instances from a network isolated environment and customize models using SageMaker for model training and deployment. The SDXL model is discoverable today in Amazon SageMaker Studio and, as of this writing, is available in us-east-1, us-east-2, us-west-2, eu-west-1, ap-northeast-1, and ap-southeast-2 Regions.

Solution overview

In this post, we demonstrate how to deploy SDXL 1.0 to SageMaker and use it to generate images using both text-to-image and image-to-image prompts.

SageMaker Studio is a web-based integrated development environment (IDE) for ML that lets you build, train, debug, deploy, and monitor your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.

Once you are in the SageMaker Studio UI, access SageMaker JumpStart and search for Stable Diffusion XL. Choose the SDXL 1.0 model card, which opens an example notebook. You are responsible only for compute costs; there is no associated model cost. The closed weight SDXL 1.0 offers SageMaker-optimized scripts and a container with faster inference time, and can run on a smaller instance than the open weight SDXL 1.0. The example notebook walks you through the steps, and we also discuss how to discover and deploy the model later in this post.

In the following sections, we show how you can use SDXL 1.0 to create photorealistic images with shorter prompts and generate text within images. Stable Diffusion XL 1.0 offers enhanced image composition and face generation with stunning visuals and realistic aesthetics.

Stable Diffusion XL 1.0 parameters

The following are the parameters used by SDXL 1.0:

  • cfg_scale – How strictly the diffusion process adheres to the prompt text.
  • height and width – The height and width of the image in pixels.
  • steps – The number of diffusion steps to run.
  • seed – Random noise seed. If a seed is provided, the resulting generated image will be deterministic.
  • sampler – The sampler that the diffusion process uses to denoise the generation.
  • text_prompts – An array of text prompts to use for generation.
  • weight – The specific weight given to each prompt.

For more information, refer to the Stability AI’s text to image documentation.

The following code is a sample of the input data provided with the prompt:

{
  "cfg_scale": 7,
  "height": 1024,
  "width": 1024,
  "steps": 50,
  "seed": 42,
  "sampler": "K_DPMPP_2M",
  "text_prompts": [
    {
      "text": "A photograph of fresh pizza with basil and tomatoes, from a traditional oven",
      "weight": 1
    }
  ]
}
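The example notebook sends requests through a predictor object, but the same request body can also be built by hand. The following sketch assembles a JSON payload with the parameters above using Python’s standard json module, and shows how it might be sent with boto3’s `invoke_endpoint`. The helper names, the endpoint name, and the assumption that the endpoint accepts this JSON with an application/json content type are illustrative; check the example notebook for the exact request and response format.

```python
import json
from typing import Optional, Sequence, Tuple


def build_sdxl_payload(
    prompts: Sequence[Tuple[str, float]],
    cfg_scale: float = 7,
    height: int = 1024,
    width: int = 1024,
    steps: int = 50,
    seed: Optional[int] = None,
    sampler: Optional[str] = None,
) -> str:
    """Serialize generation parameters into a JSON request body (hypothetical helper)."""
    body = {
        "cfg_scale": cfg_scale,
        "height": height,
        "width": width,
        "steps": steps,
        # One entry per prompt; the weight controls how strongly each prompt contributes.
        "text_prompts": [{"text": text, "weight": weight} for text, weight in prompts],
    }
    if seed is not None:
        body["seed"] = seed  # a fixed seed makes the output reproducible
    if sampler is not None:
        body["sampler"] = sampler
    return json.dumps(body)


def invoke_sdxl(endpoint_name: str, payload: str) -> bytes:
    """Send the payload to a deployed endpoint (assumes AWS credentials are configured)."""
    import boto3  # imported here so building payloads doesn't require boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=payload
    )
    return response["Body"].read()  # raw response body; its format depends on the model


payload = build_sdxl_payload(
    [("A photograph of fresh pizza with basil and tomatoes, from a traditional oven", 1)],
    seed=42,
    sampler="K_DPMPP_2M",
)
# invoke_sdxl("my-sdxl-endpoint", payload)  # hypothetical endpoint name
```

Passing more than one (text, weight) pair produces a request with several weighted prompts, the same idea as the multi-prompting example in this post.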

All examples in this post are based on the sample notebook for Stable Diffusion XL 1.0, which can be found on Stability AI’s GitHub repo.

Generate images using SDXL 1.0

In the following examples, we focus on the capabilities of Stable Diffusion XL 1.0 models, including superior photorealism, enhanced image composition, and the ability to generate realistic faces. We also explore the improved visual aesthetics of the outputs. Additionally, we demonstrate the use of shorter prompts, enabling the creation of descriptive imagery with greater ease. Lastly, we illustrate how the text in images is now more legible, further enriching the overall quality of the generated content.

The following example uses a simple prompt to get a detailed image. From only a few words, the model created a complex, detailed, and aesthetically pleasing image that matches the prompt.

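# deployed_model, GenerationRequest, TextPrompt, and decode_and_show come from the
# sample notebook referenced above (the deployed predictor and its helper functions).
text = "photograph of latte art of a cat"

output = deployed_model.predict(GenerationRequest(
    text_prompts=[TextPrompt(text=text)],
    seed=5,
    height=640,
    width=1536,
    sampler="DDIM",
))
decode_and_show(output)  # decode the returned image and display it in the notebook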

Next, we show the use of the style_preset input parameter, which is only available on SDXL 1.0. Passing in a style_preset parameter guides the image generation model towards a particular style.

Some of the available style_preset values are enhance, anime, photographic, digital-art, comic-book, fantasy-art, line-art, analog-film, neon-punk, isometric, low-poly, origami, modeling-compound, cinematic, 3d-model, pixel-art, and tile-texture. This list of style presets is subject to change; refer to the latest release and documentation for updates.

For this example, we use a prompt to generate a teapot with a style_preset of origami. The model was able to generate a high-quality image in the provided art style.

output = deployed_model.predict(GenerationRequest(
    text_prompts=[TextPrompt(text="teapot")],
    style_preset="origami",
    seed=3,
    height=1024,
    width=1024,
))

Let’s try some more style presets with different prompts. The next example shows a style preset for portrait generation using style_preset="photographic" with the prompt “portrait of an old and tired lion real pose.”

text = "portrait of an old and tired lion real pose"

output = deployed_model.predict(GenerationRequest(
    text_prompts=[TextPrompt(text=text)],
    style_preset="photographic",
    seed=111,
    height=640,
    width=1536,
))

Now let’s try the same prompt (“portrait of an old and tired lion real pose”) with modeling-compound as the style preset. The output is a distinctly different image, without a particular feel imposed by the model.

Multi-prompting with SDXL 1.0

As we have seen, one of the core foundations of the model is the ability to generate images via prompting. SDXL 1.0 supports multi-prompting, with which you can mix concepts together by assigning each prompt a specific weight. The following generated image has a jungle background with tall bright green grass. It was generated using the following prompts; compare it to the single-prompt result from our earlier example.

text1 = "portrait of an old and tired lion real pose"
text2 = "jungle with tall bright green grass"

output = deployed_model.predict(GenerationRequest(
    text_prompts=[TextPrompt(text=text1),
                  TextPrompt(text=text2, weight=0.7)],
    style_preset="photographic",
    seed=111,
    height=640,
    width=1536,
))

Spatially aware generated images and negative prompts

Next, we look at poster design with a detailed prompt. As we saw earlier, multi-prompting allows you to combine concepts to create new and unique results.

In this example, the prompt is very detailed in terms of subject position, appearance, expectations, and surroundings. A negative prompt also steers the model away from images that are distorted or poorly rendered. The generated image shows spatially arranged objects and subjects.

text = "A cute fluffy white cat stands on its hind legs, peering curiously into an ornate golden mirror. But in the reflection, the cat sees not itself, but a mighty lion. The mirror illuminated with a soft glow against a pure white background."

negative_prompts = ['distorted cat features', 'distorted lion features', 'poorly rendered']

output = deployed_model.predict(GenerationRequest(
    text_prompts=[TextPrompt(text=text)],
    style_preset="enhance",
    seed=43,
    height=640,
    width=1536,
    steps=100,
    cfg_scale=7,
    negative_prompts=negative_prompts,
))

Let’s try another example, where we keep the same negative prompt but change the detailed prompt and style preset. As you can see, the generated image not only arranges objects spatially but also applies the new style preset, with attention to details like the ornate golden mirror and a reflection of the subject only.

text = "A cute fluffy white cat stands on its hind legs, peering curiously into an ornate golden mirror. In the reflection the cat sees itself."

negative_prompts = ['distorted cat features', 'distorted lion features', 'poorly rendered']

output = deployed_model.predict(GenerationRequest(
    text_prompts=[TextPrompt(text=text)],
    style_preset="neon-punk",
    seed=4343434,
    height=640,
    width=1536,
    steps=150,
    cfg_scale=7,
    negative_prompts=negative_prompts,
))

Face generation with SDXL 1.0

In this example, we show how SDXL 1.0 creates enhanced image composition and face generation with realistic features such as hands and fingers. The generated image shows a human figure with clearly raised hands. Note the details in the fingers and the pose; in AI-generated images, details like these have often come out amorphous.

text = "Photo of an old man with hands raised, real pose."

output = deployed_model.predict(GenerationRequest(
    text_prompts=[TextPrompt(text=text)],
    style_preset="photographic",
    seed=11111,
    height=640,
    width=1536,
    steps=100,
    cfg_scale=7,
))

Text generation using SDXL 1.0

SDXL is primed for complex image design workflows that include generating text within images. The following example prompt showcases this capability. Observe how clear the generated text is, and note the cinematic style preset.

text = "Write the following word: Dream"

output = deployed_model.predict(GenerationRequest(
    text_prompts=[TextPrompt(text=text)],
    style_preset="cinematic",
    seed=15,
    height=640,
    width=1536,
    sampler="DDIM",
    steps=32,
))

Discover SDXL 1.0 from SageMaker JumpStart

SageMaker JumpStart onboards and maintains foundation models for you to access, customize, and integrate into your ML lifecycles. Some models are open weight models, which allow you to access and modify the model weights and scripts; others are closed weight models, which don't allow this, to protect the model providers' IP. Closed weight models require you to subscribe to the model from the AWS Marketplace model detail page, and SDXL 1.0 is currently a closed weight model. In this section, we go over how to discover, subscribe to, and deploy a closed weight model from SageMaker Studio.

You can access SageMaker JumpStart by choosing JumpStart under Prebuilt and automated solutions on the SageMaker Studio Home page.

From the SageMaker JumpStart landing page, you can browse for solutions, models, notebooks, and other resources. The following screenshot shows an example of the landing page with solutions and foundation models listed.

Each model has a model card, as shown in the following screenshot, which contains the model name, whether it is fine-tunable, the provider name, and a short description of the model. You can find the Stable Diffusion XL 1.0 model in the Foundation Model: Image Generation carousel or search for it in the search box.

You can choose Stable Diffusion XL 1.0 to open an example notebook that walks you through how to use the SDXL 1.0 model. The example notebook opens in read-only mode; you need to choose Import notebook to run it.

After importing the notebook, you need to select the appropriate notebook environment (image, kernel, instance type, and so on) before running the code.

Deploy SDXL 1.0 from SageMaker JumpStart

In this section, we walk through how to subscribe and deploy the model.

  1. Open the model listing page in AWS Marketplace using the link available from the example notebook in SageMaker JumpStart.
  2. On the AWS Marketplace listing, choose Continue to subscribe.

If you don’t have the necessary permissions to view or subscribe to the model, reach out to your AWS administrator or procurement point of contact. Many enterprises may limit AWS Marketplace permissions to control the actions that someone can take in the AWS Marketplace Management Portal.

  3. Choose Continue to Subscribe.
  4. On the Subscribe to this software page, review the pricing details and End User License Agreement (EULA). If you agree, choose Accept offer.
  5. Choose Continue to configuration to start configuring your model.
  6. Choose a supported Region.

You will see a product ARN displayed. This is the model package ARN that you need to specify when creating a deployable model using Boto3.

  7. Copy the ARN corresponding to your Region and specify it in the corresponding notebook cell.

The ARN may already be available in the example notebook.

  8. Now you’re ready to start following the example notebook.

You can also continue from AWS Marketplace, but we recommend following the example notebook in SageMaker Studio to better understand how deployment works.
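The example notebook takes care of deployment, but it can help to see how the Region-specific ARN from step 7 is used. The following is a minimal sketch, assuming you call Boto3 directly: `create_model` with `PrimaryContainer={"ModelPackageName": ...}` and `EnableNetworkIsolation` are real Boto3 SageMaker parameters, while `MODEL_PACKAGE_ARNS`, `model_package_arn_for`, and `create_model_from_package` are hypothetical names, and the ARNs are placeholders you must replace with the values from your subscription.

```python
from typing import Any, Dict

# Hypothetical placeholder ARNs: replace with the ARNs shown on the AWS Marketplace
# configuration page (or in the example notebook) for your subscription.
MODEL_PACKAGE_ARNS: Dict[str, str] = {
    "us-east-1": "arn:aws:sagemaker:us-east-1:111122223333:model-package/example-sdxl-1-0",
    "us-west-2": "arn:aws:sagemaker:us-west-2:111122223333:model-package/example-sdxl-1-0",
}


def model_package_arn_for(region: str) -> str:
    """Look up the model package ARN for a Region, failing loudly if it isn't configured."""
    try:
        return MODEL_PACKAGE_ARNS[region]
    except KeyError:
        raise ValueError(f"No model package ARN configured for Region {region!r}") from None


def create_model_from_package(sm_client: Any, model_name: str, role_arn: str, region: str) -> Dict[str, Any]:
    """Create a SageMaker model that points at a subscribed AWS Marketplace model package."""
    return sm_client.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role_arn,
        PrimaryContainer={"ModelPackageName": model_package_arn_for(region)},
        # Marketplace model packages are typically run with network isolation enabled.
        EnableNetworkIsolation=True,
    )
```

With a real client, you would pass `boto3.client("sagemaker", region_name="us-east-1")` as `sm_client`, then create an endpoint configuration and endpoint for the new model as usual. Taking the client as a parameter keeps the helper easy to test without AWS credentials.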

Clean up

When you’ve finished working, you can delete the endpoint to release the Amazon Elastic Compute Cloud (Amazon EC2) instances associated with it and stop billing.

Get your list of SageMaker endpoints using the AWS CLI as follows:

!aws sagemaker list-endpoints

Then delete the endpoint:

deployed_model.sagemaker_session.delete_endpoint(endpoint_name)
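If you created several endpoints while experimenting, you can clean them up in one pass. The following is a minimal sketch, assuming your endpoint names share a common substring (such as "sdxl"): `list_endpoints` (with `NameContains` and `NextToken`) and `delete_endpoint` are real Boto3 SageMaker calls, while `list_endpoint_names` and `delete_endpoints` are hypothetical helper names. By default it only reports what it would delete.

```python
from typing import Any, List


def list_endpoint_names(sm_client: Any, name_contains: str = "") -> List[str]:
    """Return all endpoint names containing name_contains, following NextToken pagination."""
    names: List[str] = []
    kwargs = {"NameContains": name_contains} if name_contains else {}
    while True:
        page = sm_client.list_endpoints(**kwargs)
        names.extend(ep["EndpointName"] for ep in page.get("Endpoints", []))
        token = page.get("NextToken")
        if not token:
            return names
        kwargs["NextToken"] = token


def delete_endpoints(sm_client: Any, name_contains: str, dry_run: bool = True) -> List[str]:
    """Delete matching endpoints; with dry_run=True, only return the names that would be deleted."""
    names = list_endpoint_names(sm_client, name_contains)
    if not dry_run:
        for name in names:
            sm_client.delete_endpoint(EndpointName=name)
    return names
```

Call it with `boto3.client("sagemaker")`, review the dry-run output, and then pass `dry_run=False`. Deleting an endpoint doesn't delete its endpoint configuration or model; Boto3's `delete_endpoint_config` and `delete_model` handle those.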

Conclusion

In this post, we showed you how to get started with the new SDXL 1.0 model in SageMaker Studio. With this model, you can take advantage of the different features offered by SDXL to create realistic images. Because foundation models are pre-trained, they can also help lower training and infrastructure costs and enable customization for your use case.

About the authors

June Won is a product manager with SageMaker JumpStart. He focuses on making foundation models easily discoverable and usable to help customers build generative AI applications.

Mani Khanuja is an Artificial Intelligence and Machine Learning Specialist SA at Amazon Web Services (AWS). She helps customers use machine learning to solve their business challenges on AWS. She spends most of her time diving deep and teaching customers on AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. She is passionate about ML at the edge, so she has created her own lab with a self-driving kit and a prototype manufacturing production line, where she spends a lot of her free time.

Nitin Eusebius is a Sr. Enterprise Solutions Architect at AWS with experience in Software Engineering, Enterprise Architecture, and AI/ML. He works with customers to help them build well-architected applications on the AWS platform. He is passionate about solving technology challenges and helping customers with their cloud journey.

Suleman Patel is a Senior Solutions Architect at Amazon Web Services (AWS), with a special focus on Machine Learning and Modernization. Leveraging his expertise in both business and technology, Suleman helps customers design and build solutions that tackle real-world business problems. When he’s not immersed in his work, Suleman loves exploring the outdoors, taking road trips, and cooking up delicious dishes in the kitchen.

Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He received his PhD from the University of Illinois at Urbana-Champaign and was a Post Doctoral Researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design and has published papers in EMNLP, ICLR, COLT, FOCS, and SODA conferences.

Flag harmful language in spoken conversations with Amazon Transcribe Toxicity Detection

As online social activities such as social networking and online gaming grow, they are often riddled with hostile or aggressive behavior that can lead to unsolicited manifestations of hate speech, cyberbullying, or harassment. For example, many online gaming communities offer voice chat functionality to facilitate communication among their users. Although voice chat often supports friendly banter and trash talking, it can also lead to problems such as hate speech, cyberbullying, harassment, and scams. Flagging harmful language helps organizations keep conversations civil and maintain a safe and inclusive online environment for users to create, share, and participate freely. Today, many companies rely solely on human moderators to review toxic content. However, scaling human moderation to meet these needs at sufficient quality and speed is expensive. As a result, many organizations risk facing high user attrition rates, reputational damage, and regulatory fines. In addition, moderators are often psychologically harmed by reviewing toxic content.

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to their applications. Today, we are excited to announce Amazon Transcribe Toxicity Detection, a machine learning (ML)-powered capability that uses both audio and text-based cues to identify and classify voice-based toxic content across seven categories, including sexual harassment, hate speech, threats, abuse, profanity, insults, and graphic language. In addition to text, Toxicity Detection uses speech cues such as tone and pitch to home in on toxic intent in speech.

This is an improvement over standard content moderation systems, which are designed to focus only on specific terms without accounting for intent. Most enterprises have an SLA of 7–15 days to review content reported by users, because moderators must listen to lengthy audio files to evaluate if and when the conversation became toxic. With Amazon Transcribe Toxicity Detection, moderators review only the specific portions of the audio file flagged for toxic content rather than the entire file. This reduces the content human moderators must review by 95%, enabling customers to reduce their SLA to just a few hours and to proactively moderate more content beyond what users flag. It allows enterprises to automatically detect and moderate content at scale, provide a safe and inclusive online environment, and take action before toxic content causes user churn or reputational damage. The models used for toxic content detection are maintained by Amazon Transcribe and updated periodically to maintain accuracy and relevance.

In this post, you’ll learn how to:

  • Identify harmful content in speech with Amazon Transcribe Toxicity Detection
  • Use the Amazon Transcribe console for toxicity detection
  • Create a transcription job with toxicity detection using the AWS Command Line Interface (AWS CLI) and Python SDK
  • Use the Amazon Transcribe toxicity detection API response

Detect toxicity in audio chat with Amazon Transcribe Toxicity Detection

Amazon Transcribe now provides a simple, ML-based solution for flagging harmful language in spoken conversations. This feature is especially useful for social media, gaming, and general needs, eliminating the need for customers to provide their own data to train the ML model. Toxicity Detection classifies toxic audio content into the following seven categories and provides a confidence score (0–1) for each category:

  • Profanity – Speech that contains words, phrases, or acronyms that are impolite, vulgar, or offensive.
  • Hate speech – Speech that criticizes, insults, denounces, or dehumanizes a person or group on the basis of an identity (such as race, ethnicity, gender, religion, sexual orientation, ability, and national origin).
  • Sexual – Speech that indicates sexual interest, activity, or arousal using direct or indirect references to body parts, physical traits, or sex.
  • Insults – Speech that includes demeaning, humiliating, mocking, insulting, or belittling language. This type of language is also labeled as bullying.
  • Violence or threat – Speech that includes threats seeking to inflict pain, injury, or hostility toward a person or group.
  • Graphic – Speech that uses visually descriptive and unpleasantly vivid imagery. This type of language is often intentionally verbose to amplify a recipient’s discomfort.
  • Harassment or abusive – Speech intended to affect the psychological well-being of the recipient, including demeaning and objectifying terms.

You can access Toxicity Detection either via the Amazon Transcribe console or by calling the APIs directly using the AWS CLI or the AWS SDKs. On the Amazon Transcribe console, you can upload the audio files you want to test for toxicity and get results in just a few clicks. Amazon Transcribe will identify and categorize toxic content, such as harassment, hate speech, sexual content, violence, insults, and profanity. Amazon Transcribe also provides a confidence score for each category, providing valuable insights into the content’s toxicity level. Toxicity Detection is currently available in the standard Amazon Transcribe API for batch processing and supports US English.

Amazon Transcribe console walkthrough

To get started, sign in to the AWS Management Console and go to Amazon Transcribe. To create a new transcription job, you first need to upload your recorded files to an Amazon Simple Storage Service (Amazon S3) bucket. On the audio settings page, as shown in the following screenshot, enable Toxicity detection and proceed to create the new job. Amazon Transcribe processes the transcription job in the background; when the job finishes, its status changes to COMPLETED.

To review the results of a transcription job, choose the job from the job list to open it. Scroll down to the Transcription preview section to check results on the Toxicity tab. The UI shows color-coded transcription segments to indicate the level of toxicity, determined by the confidence score. To customize the display, you can use the toggle bars in the Filters pane. These bars allow you to adjust the thresholds and filter the toxicity categories accordingly.
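The Filters pane applies thresholds interactively. To apply a similar per-category filter to the downloaded JSON output (each segment has the text, toxicity, categories, start_time, and end_time fields described in the API response section of this post), you could use something like the following sketch. `filter_segments` is a hypothetical helper, not part of any AWS SDK, and the default threshold of 0.5 is an arbitrary assumption.

```python
from typing import Dict, List, Mapping, Optional


def filter_segments(
    segments: List[dict],
    min_toxicity: float = 0.5,
    category_thresholds: Optional[Mapping[str, float]] = None,
) -> List[dict]:
    """Keep segments whose overall toxicity, and at least one selected category, clear their thresholds."""
    kept = []
    for seg in segments:
        if seg["toxicity"] < min_toxicity:
            continue
        if category_thresholds:
            scores: Dict[str, float] = seg["categories"]
            # A segment passes if any selected category meets its own threshold;
            # categories missing from the output count as a score of 0.
            if not any(scores.get(cat, 0.0) >= t for cat, t in category_thresholds.items()):
                continue
        kept.append(seg)
    return kept
```

For example, `filter_segments(segments, 0.5, {"INSULT": 0.8, "HATE_SPEECH": 0.3})` keeps only segments that are toxic overall and score highly as insults or show any notable sign of hate speech.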

In the following screenshot, portions of the transcription text are covered because they contain sensitive or toxic information.

Transcription API with a toxicity detection request

In this section, we guide you through creating a transcription job with toxicity detection using programming interfaces. If the audio file is not already in an S3 bucket, upload it there so that Amazon Transcribe can access it. As when creating a transcription job on the console, you need to provide the following parameters when invoking the job:

  • TranscriptionJobName – Specify a unique job name.
  • MediaFileUri – Enter the URI location of the audio file on Amazon S3. Amazon Transcribe supports the following audio formats: MP3, MP4, WAV, FLAC, AMR, OGG, or WebM
  • LanguageCode – Set to en-US. As of this writing, Toxicity Detection only supports US English language.
  • ToxicityCategories – Pass the ALL value to include all supported toxicity detection categories.

The following is an example of starting a transcription job with toxicity detection enabled using Python (Boto3):

import time
import boto3

transcribe = boto3.client('transcribe', 'us-east-1')
job_name = "toxicity-detection-demo"
job_uri = "s3://my-bucket/my-folder/my-file.wav"
 
# start a transcription job
transcribe.start_transcription_job(
    TranscriptionJobName = job_name,
    Media = { 'MediaFileUri': job_uri },
    OutputBucketName = 'doc-example-bucket', 
    OutputKey = 'my-output-files/',
    LanguageCode = 'en-US',
    ToxicityDetection = [{'ToxicityCategories': ['ALL']}]
)

# wait for the transcription job to complete, polling every 5 seconds
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName = job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print("Not ready yet...")
    time.sleep(5)

# print the final job description once it has completed or failed
print(status)

You can invoke the same transcription job with toxicity detection using the following AWS CLI command:

aws transcribe start-transcription-job \
  --region us-east-1 \
  --transcription-job-name toxicity-detection-demo \
  --media MediaFileUri=s3://my-bucket/my-folder/my-file.wav \
  --output-bucket-name doc-example-bucket \
  --output-key my-output-files/ \
  --language-code en-US \
  --toxicity-detection ToxicityCategories=ALL

Transcription API with toxicity detection response

The Amazon Transcribe toxicity detection JSON output will include the transcription results in the results field. Enabling toxicity detection adds an extra field called toxicityDetection under the results field. toxicityDetection includes a list of transcribed items with the following parameters:

  • text – The raw transcribed text
  • toxicity – A confidence score of detection (a value between 0–1)
  • categories – A confidence score for each category of toxic speech
  • start_time – The start position of detection in the audio file (seconds)
  • end_time – The end position of detection in the audio file (seconds)

The following is a sample abbreviated toxicity detection response you can download from the console:

{
  "results":{
    "transcripts": [...],
    "items":[...],
    "toxicityDetection": [
      {
        "text": "A TOXIC TRANSCRIPTION SEGMENT GOES HERE.",
        "toxicity": 0.8419,
        "categories": {
          "PROFANITY": 0.7041,
          "HATE_SPEECH": 0.0163,
          "SEXUAL": 0.0097,
          "INSULT": 0.8532,
          "VIOLENCE_OR_THREAT": 0.0031,
          "GRAPHIC": 0.0017,
          "HARASSMENT_OR_ABUSE": 0.0497
        },
        "start_time": 16.298,
        "end_time": 20.35
      },
      ...
    ]
  },
  "status": "COMPLETED"
}
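Because each toxicity segment carries start_time and end_time, you can turn the output into a short list of time windows for a moderator to listen to, which is how reviewing only the flagged portions of a file works in practice. The following is a minimal sketch under stated assumptions: `review_windows` and `flagged_seconds` are hypothetical helpers, the 0.5 threshold and the 1-second gap for merging nearby segments are arbitrary choices, and the input is the JSON document shown above, read as a string.

```python
import json
from typing import List, Tuple


def review_windows(transcript_json: str, threshold: float = 0.5, gap: float = 1.0) -> List[Tuple[float, float]]:
    """Merge flagged toxicity segments that overlap or lie within `gap` seconds into review windows."""
    results = json.loads(transcript_json)["results"]
    spans = sorted(
        (float(seg["start_time"]), float(seg["end_time"]))
        for seg in results.get("toxicityDetection", [])
        if seg["toxicity"] >= threshold
    )
    merged: List[Tuple[float, float]] = []
    for start, end in spans:
        if merged and start - merged[-1][1] <= gap:
            # Overlapping or nearly adjacent: extend the previous window.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def flagged_seconds(windows: List[Tuple[float, float]]) -> float:
    """Total audio duration a moderator would need to review."""
    return sum(end - start for start, end in windows)
```

Comparing `flagged_seconds(...)` with the file's total duration gives a rough per-file estimate of how much listening time the flags save.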

Summary

In this post, we provided an overview of the new Amazon Transcribe Toxicity Detection feature. We also described how you can parse the toxicity detection JSON output. For more information, check out the Amazon Transcribe console and try out the Transcription API with Toxicity Detection.

Amazon Transcribe Toxicity Detection is now available in the following AWS Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Sydney), Europe (Ireland), and Europe (London). To learn more, visit Amazon Transcribe.

Learn more about content moderation on AWS and our content moderation ML use cases. Take the first step towards streamlining your content moderation operations with AWS.


About the author

Lana Zhang is a Senior Solutions Architect on the AWS WWSO AI Services team, specializing in AI and ML for content moderation, computer vision, and natural language processing. With her expertise, she is dedicated to promoting AWS AI/ML solutions and assisting customers in transforming their business solutions across diverse industries, including social media, gaming, e-commerce, and advertising & marketing.

Sumit Kumar is a Sr. Product Manager, Technical on the AWS AI Language Services team. He has 10 years of product management experience across a variety of domains and is passionate about AI/ML. Outside of work, Sumit loves to travel and enjoys playing cricket and lawn tennis.

Maximize Stable Diffusion performance and lower inference costs with AWS Inferentia2

Generative AI models have grown rapidly in recent months due to their impressive capabilities in creating realistic text, images, code, and audio. Among these models, Stable Diffusion models stand out for their unique strength in creating high-quality images based on text prompts. Stable Diffusion can generate a wide variety of high-quality images, including realistic portraits, landscapes, and even abstract art. And, like other generative AI models, Stable Diffusion models require powerful computing to provide low-latency inference.

In this post, we show how you can run Stable Diffusion models and achieve high performance at the lowest cost in Amazon Elastic Compute Cloud (Amazon EC2) using Amazon EC2 Inf2 instances powered by AWS Inferentia2. We look at the architecture of a Stable Diffusion model and walk through the steps of compiling a Stable Diffusion model using AWS Neuron and deploying it to an Inf2 instance. We also discuss the optimizations that the Neuron SDK automatically makes to improve performance. You can run both Stable Diffusion 2.1 and 1.5 versions on AWS Inferentia2 cost-effectively. Lastly, we show how you can deploy a Stable Diffusion model to an Inf2 instance with Amazon SageMaker.

The Stable Diffusion 2.1 model is 5 GB in 32-bit floating point (FP32) and 2.5 GB in bfloat16 (BF16). A single inf2.xlarge instance has one AWS Inferentia2 accelerator with 32 GB of HBM, so the Stable Diffusion 2.1 model fits on a single inf2.xlarge instance. Stable Diffusion is a text-to-image model that you can use to create images of different styles and content simply by providing a text prompt as an input. To learn more about the Stable Diffusion model architecture, refer to Create high-quality images with Stable Diffusion models and deploy them cost-efficiently with Amazon SageMaker.
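The FP32 and BF16 sizes follow directly from bytes per parameter: FP32 stores 4 bytes per weight and BF16 stores 2. The following back-of-the-envelope sketch backs out the implied parameter count from the 5 GB figure and compares the BF16 size with the accelerator's 32 GB of HBM. It assumes decimal gigabytes and treats the 5 GB as weights only, ignoring activations and runtime overhead, so it is a rough estimate rather than a measured footprint.

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}
GB = 10**9  # decimal gigabytes (an assumption about how the sizes above are stated)


def weight_size_gb(num_params: float, dtype: str) -> float:
    """Size of the raw weights in GB for a given parameter count and data type."""
    return num_params * BYTES_PER_PARAM[dtype] / GB


# Back out the approximate parameter count implied by the 5 GB FP32 figure.
approx_params = 5 * GB / BYTES_PER_PARAM["fp32"]  # 1.25e9 parameters
bf16_gb = weight_size_gb(approx_params, "bf16")   # 2.5 GB
hbm_gb = 32

print(f"~{approx_params / 1e9:.2f}B params, BF16 weights ~{bf16_gb:.1f} GB, "
      f"about {bf16_gb / hbm_gb:.0%} of {hbm_gb} GB HBM")
```

Halving the bytes per parameter halves the weight footprint, which is why the BF16 model leaves plenty of HBM headroom on a single accelerator.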

How the Neuron SDK optimizes Stable Diffusion performance

Before we can deploy the Stable Diffusion 2.1 model on AWS Inferentia2 instances, we need to compile the model components using the Neuron SDK. The Neuron SDK, which includes a deep learning compiler, runtime, and tools, compiles and automatically optimizes deep learning models so they run efficiently on Inf2 instances and extract the full performance of the AWS Inferentia2 accelerator. Examples for the Stable Diffusion 2.1 model are available in the GitHub repo. The notebook there presents an end-to-end example of how to compile a Stable Diffusion model, save the compiled Neuron models, and load them into the runtime for inference.

We use StableDiffusionPipeline from the Hugging Face diffusers library to load and compile the model. We then compile all the components of the model for Neuron using torch_neuronx.trace() and save the optimized model as TorchScript. Compilation processes can be quite memory-intensive, requiring a significant amount of RAM. To circumvent this, before tracing each model, we create a deepcopy of the part of the pipeline that’s being traced. Following this, we delete the pipeline object from memory using del pipe. This technique is particularly useful when compiling on instances with low RAM.

Additionally, we optimize the Stable Diffusion model itself. The UNet is the most computationally intensive part of inference. To produce a single image, the UNet operates on input tensors with a batch size of two and generates an output tensor that also has a batch size of two. The elements within these batches are entirely independent of each other, so we can minimize latency by running one batch element on each Neuron core. We compile the UNet for a batch size of one (by tracing it with single-batch input tensors), then use the torch_neuronx.DataParallel API to load this single-batch model onto each core. The result is a seamless two-batch module: we pass the UNet inputs with a batch size of two and get a two-batch output back, but internally the two single-batch models run on the two Neuron cores. This strategy improves resource utilization and reduces latency.
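The data-parallel pattern can be illustrated without Neuron hardware. In the diffusers StableDiffusionPipeline, the batch of two comes from classifier-free guidance: one unconditional and one text-conditioned copy of the input, whose outputs are then combined as uncond + scale × (cond − uncond). The following conceptual sketch splits such a batch across two workers and combines the results. It is not the torch_neuronx.DataParallel API: threads stand in for Neuron cores (pure-Python work won't run truly in parallel under the GIL), lists stand in for tensors, and `run_data_parallel`, `toy_unet`, and `guided_noise` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Sample = List[float]  # stand-in for one batch element (a tensor in the real pipeline)


def run_data_parallel(
    single_item_fn: Callable[[Sample], Sample],
    batch: Sequence[Sample],
    num_workers: int = 2,
) -> List[Sample]:
    """Run a batch-size-1 function over a batch, one element per worker, preserving order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Batch elements are independent, so each can be processed separately;
        # pool.map returns results in input order, like concatenating along the batch dim.
        return list(pool.map(single_item_fn, batch))


def toy_unet(sample: Sample) -> Sample:
    """Hypothetical stand-in for a UNet compiled with batch size 1."""
    return [2.0 * x for x in sample]


def guided_noise(uncond: Sample, cond: Sample, guidance_scale: float) -> Sample:
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Element 0 is the unconditional input, element 1 the text-conditioned input.
uncond_out, cond_out = run_data_parallel(toy_unet, [[0.0, 0.5], [0.5, 1.5]])
noise = guided_noise(uncond_out, cond_out, guidance_scale=7.5)
```

The caller sees a two-element batch in and a two-element batch out, while each element is handled by its own worker, which mirrors how the compiled UNet is exposed to the pipeline.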

Compile and deploy a Stable Diffusion model on an Inf2 EC2 instance

To compile and deploy the Stable Diffusion model on an Inf2 EC2 instance, sign in to the AWS Management Console and create an inf2.8xlarge instance. The inf2.8xlarge instance is required only for compiling the model, because compilation needs more host memory; the compiled Stable Diffusion model can be hosted on an inf2.xlarge instance. You can find the latest AMI with Neuron libraries using the following AWS Command Line Interface (AWS CLI) command:

aws ec2 describe-images --region us-east-1 --owners amazon \
  --filters 'Name=name,Values=Deep Learning AMI Neuron PyTorch 1.13.? (Amazon Linux 2) ????????' 'Name=state,Values=available' \
  --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' \
  --output text
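If you script your setup in Python instead, the same lookup maps onto Boto3's EC2 `describe_images` call. The following is a minimal sketch that mirrors the CLI query: it filters Amazon-owned, available images by name pattern and picks the newest by CreationDate. `latest_ami_id` is a hypothetical helper name; to find a different AMI variant, pass your own name pattern.

```python
from typing import Any, Optional

# Same wildcard name pattern as the AWS CLI command above.
NEURON_AMI_NAME = "Deep Learning AMI Neuron PyTorch 1.13.? (Amazon Linux 2) ????????"


def latest_ami_id(ec2_client: Any, name_pattern: str = NEURON_AMI_NAME) -> Optional[str]:
    """Return the newest available Amazon-owned AMI ID matching name_pattern, or None."""
    images = ec2_client.describe_images(
        Owners=["amazon"],
        Filters=[
            {"Name": "name", "Values": [name_pattern]},
            {"Name": "state", "Values": ["available"]},
        ],
    )["Images"]
    if not images:
        return None
    # CreationDate is an ISO 8601 timestamp string, so lexical order matches chronological order.
    return max(images, key=lambda img: img["CreationDate"])["ImageId"]
```

Usage: `latest_ami_id(boto3.client("ec2", region_name="us-east-1"))`. Returning `None` rather than raising makes it easy to fall back to a different name pattern.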

For this example, we created an EC2 instance using the Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04). You can then create a JupyterLab environment by connecting to the instance and running the following commands:

source /opt/aws_neuron_venv_pytorch/bin/activate
pip install jupyterlab
jupyter-lab

A notebook with all the steps for compiling and hosting the model is located on GitHub.

Let’s look at the compilation steps for one of the text encoder blocks. Other blocks that are part of the Stable Diffusion pipeline can be compiled similarly.

The first step is to load the pre-trained model from Hugging Face. The StableDiffusionPipeline.from_pretrained method loads the pre-trained model into our pipeline object, pipe. We then create a deepcopy of the text encoder from our pipeline, effectively cloning it. The del pipe command then deletes the original pipeline object, freeing the memory it consumed. Here, we load the model weights in BF16:

import copy

import torch
from diffusers import StableDiffusionPipeline

model_id = "stabilityai/stable-diffusion-2-1-base"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
# clone only the component being compiled, then free the rest of the pipeline
text_encoder = copy.deepcopy(pipe.text_encoder)
del pipe

Next, we wrap our text encoder with the NeuronTextEncoder wrapper (defined in the notebook). The text encoder module returns its output as a dict; the wrapper converts it to a list:

text_encoder = NeuronTextEncoder(text_encoder)

We initialize a PyTorch tensor, emb, with example input values. The emb tensor is used as example input for the torch_neuronx.trace function, which traces our text encoder and compiles it into a format optimized for Neuron. The directory path for the compiled model is constructed by joining COMPILER_WORKDIR_ROOT (defined earlier in the notebook) with the subdirectory text_encoder:

import os

import torch
import torch_neuronx

emb = torch.tensor([...])  # example input used for tracing; values elided here
text_encoder_neuron = torch_neuronx.trace(
    text_encoder.neuron_text_encoder,
    emb,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder'),
)

The compiled text encoder is saved using torch.jit.save. It’s stored under the file name model.pt in the text_encoder directory of our compiler’s workspace:

text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')
torch.jit.save(text_encoder_neuron, text_encoder_filename)

The notebook includes similar steps to compile other components of the model: UNet, VAE decoder, and VAE post_quant_conv. After you have compiled all the models, you can load and run the model following these steps:

  1. Define the paths for the compiled models.
  2. Load a pre-trained StableDiffusionPipeline model, with its configuration specified to use the bfloat16 data type.
  3. Load the UNet model onto two Neuron cores using the torch_neuronx.DataParallel API. This allows data parallel inference to be performed, which can significantly speed up model performance.
  4. Load the remaining parts of the model (text_encoder, decoder, and post_quant_conv) onto a single Neuron core.

You can then run the pipeline by providing input text as prompts. The following are some pictures generated by the model for the prompts:

  • Portrait of renaud sechan, pen and ink, intricate line drawings, by craig mullins, ruan jia, kentaro miura, greg rutkowski, loundraw

  • Portrait of old coal miner in 19th century, beautiful painting, with highly detailed face painting by greg rutkowski

  • A castle in the middle of a forest

Host Stable Diffusion 2.1 on AWS Inferentia2 and SageMaker

Hosting Stable Diffusion models with SageMaker also requires compilation with the Neuron SDK. You can complete the compilation ahead of time or during runtime using Large Model Inference (LMI) containers. Compilation ahead of time allows for faster model loading times and is the preferred option.

SageMaker LMI containers provide two ways to deploy the model:

  • A no-code option where we just provide a serving.properties file with the required configurations
  • Bring your own inference script

We look at both solutions and go over the configurations and the inference script (model.py). In this post, we demonstrate the deployment using a pre-compiled model stored in an Amazon Simple Storage Service (Amazon S3) bucket. You can use this pre-compiled model for your deployments.

Configure the model with a provided script

In this section, we show how to configure the LMI container to host the Stable Diffusion models. The SD2.1 notebook is available on GitHub. The first step is to create the model configuration package, using only the minimal configuration needed to host the model, with the following directory structure:

<config-root-directory> / 
    ├── serving.properties
    │   
    └── model.py [OPTIONAL]

Next, we create the serving.properties file with the following parameters:

%%writefile code_sd/serving.properties
engine=Python
option.entryPoint=djl_python.transformers_neuronx
option.use_stable_diffusion=True
option.model_id=s3url
option.tensor_parallel_degree=2
option.dtype=bf16

The parameters specify the following:

  • option.model_id – The S3 location of the compiled model weights. The LMI containers use s5cmd to download the model from this location.
  • option.entryPoint – To use the built-in handler, we specify the transformers_neuronx handler. If you have a custom inference script, you need to provide that instead.
  • option.dtype – The data type in which to load the weights. For this post, we use BF16, which halves the memory requirement relative to FP32 and lowers latency as a result.
  • option.tensor_parallel_degree – The number of Neuron cores to use for this model. Each AWS Inferentia2 accelerator has two Neuron cores, so a value of 2 means the model uses one accelerator (two cores). On instances with more than one accelerator, the remaining cores can host additional workers to increase the throughput of the endpoint.
  • engine – Set to Python to indicate that we don't use other engines, such as DeepSpeed or FasterTransformer, for this hosting.
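If you generate configurations for several models or instance types, it can be easier to build serving.properties from code. The following is a minimal sketch, assuming the key=value format shown above. `render_serving_properties`, `write_serving_properties`, and `max_workers` are hypothetical helpers, and the worker calculation simply applies the tensor_parallel_degree description (two Neuron cores per accelerator); it is not how the LMI container schedules workers.

```python
from pathlib import Path
from typing import Mapping

# Per the option.tensor_parallel_degree description: two Neuron cores per Inferentia2 accelerator.
NEURON_CORES_PER_ACCELERATOR = 2


def render_serving_properties(props: Mapping[str, object]) -> str:
    """Render a mapping as key=value lines in serving.properties format."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


def write_serving_properties(config_dir: str, props: Mapping[str, object]) -> Path:
    """Write serving.properties into config_dir, creating the directory if needed."""
    path = Path(config_dir) / "serving.properties"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_serving_properties(props))
    return path


def max_workers(num_accelerators: int, tensor_parallel_degree: int) -> int:
    """Number of model copies that fit if each copy uses tensor_parallel_degree Neuron cores."""
    total_cores = num_accelerators * NEURON_CORES_PER_ACCELERATOR
    if not 1 <= tensor_parallel_degree <= total_cores:
        raise ValueError("tensor_parallel_degree must be between 1 and the total number of Neuron cores")
    return total_cores // tensor_parallel_degree
```

An inf2.xlarge has one accelerator, so `max_workers(1, 2)` returns 1: with a tensor parallel degree of 2, there is room for only a single worker, and additional workers require an instance with more accelerators.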

Bring your own script

If you want to bring your own custom inference script, you need to remove option.entryPoint from serving.properties. In that case, the LMI container looks for a model.py file in the same location as serving.properties and uses it to run inference.

Create your own inference script (model.py)

Creating your own inference script is relatively straightforward with the LMI container. The container requires your model.py file to implement a handle function with the following signature, where Input and Output are the request and response classes from the container's djl_python package:

def handle(inputs: Input) -> Output

Let’s examine some of the critical areas of the attached notebook, which demonstrates the bring-your-own-script approach.

Replace the original cross-attention score computation with the optimized version:

# Replace the original cross-attention scoring with a custom version for better performance
# (CrossAttention and get_attention_scores are imported or defined earlier in the notebook)
CrossAttention.get_attention_scores = get_attention_scores

Next, define the paths to the compiled weights for each component:

import os

text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder.pt')
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder.pt')
unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet.pt')
post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv.pt')

These are the names of the compiled weight files we used when creating the compilations. Feel free to change the file names, but make sure your weight file names match what you specify here.

Then we load them using the Neuron SDK and assign them to the corresponding pipeline components. When loading the optimized UNet weights, note that we also specify the number of Neuron cores to load them onto. Here, we load them onto a single accelerator with two cores:

# Load the compiled UNet onto two Neuron cores.
pipe.unet = NeuronUNet(UNetWrap(pipe.unet))
logging.info("Loading model: unet:created")
device_ids = list(range(tensor_parallel_degree))
pipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)

# Load the other compiled models onto a single Neuron core.
# - load the text encoder
pipe.text_encoder = NeuronTextEncoder(pipe.text_encoder)
clip_compiled = torch.jit.load(text_encoder_filename)
pipe.text_encoder.neuron_text_encoder = clip_compiled
# - load the VAE decoder and post_quant_conv
pipe.vae.decoder = torch.jit.load(decoder_filename)
pipe.vae.post_quant_conv = torch.jit.load(post_quant_conv_filename)

Running the inference with a prompt invokes the pipe object to generate an image.

Create the SageMaker endpoint

We use Boto3 APIs to create a SageMaker endpoint. Complete the following steps:

  1. Create the tarball with just the serving.properties file and the optional model.py file, and upload it to Amazon S3.
  2. Create the model using the image container and the model tarball uploaded earlier.
  3. Create the endpoint config using the following key parameters:
    1. Use an ml.inf2.xlarge instance.
    2. Set ContainerStartupHealthCheckTimeoutInSeconds high enough (the following code uses 360) that the health check doesn't start before the model has finished loading.
    3. Set VolumeSizeInGB to a larger value so there is enough disk space to download and load the model weights.
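Step 1 can be done with the tar command, but the following Python sketch makes the "just serving.properties and the optional model.py" rule explicit. It assumes the configuration directory from earlier (code_sd) and keeps the files under that top-level folder, mirroring a `tar czvf model.tar.gz code_sd/` invocation; check the notebook if you need a different archive layout. `build_model_tarball` is a hypothetical helper; only the standard library tarfile and pathlib modules are used.

```python
import tarfile
from pathlib import Path
from typing import List


def build_model_tarball(config_dir: str, output_path: str) -> List[str]:
    """Package serving.properties (required) and model.py (optional) into a .tar.gz file."""
    src = Path(config_dir)
    if not (src / "serving.properties").is_file():
        raise FileNotFoundError(f"{src / 'serving.properties'} is required")
    names = ["serving.properties"]
    if (src / "model.py").is_file():
        names.append("model.py")
    # Keep files under the directory's own name, e.g. code_sd/serving.properties.
    arcnames = [f"{src.name}/{name}" for name in names]
    with tarfile.open(output_path, "w:gz") as tar:
        for name, arcname in zip(names, arcnames):
            tar.add(str(src / name), arcname=arcname)
    return arcnames
```

You can then upload the archive with Boto3, for example `boto3.client("s3").upload_file("model.tar.gz", bucket, key)`, and use the resulting S3 URI as the model artifact when creating the SageMaker model.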

Create a SageMaker model

After you create the model.tar.gz file and upload it to Amazon S3, create a SageMaker model from the LMI container and the model artifact from the previous step. SageMaker lets you customize the container by injecting environment variables; for this workflow, we leave everything at the defaults. See the following code:

inference_image_uri = (
    f"763104351884.dkr.ecr.{region}.amazonaws.com/djl-inference:0 djl-serving-inf2"
)

Create the model object, which defines the container that SageMaker loads onto the instance and uses for inference:

import boto3
from sagemaker.utils import name_from_base

boto3_sm_client = boto3.client("sagemaker")

model_name = name_from_base("inf2-sd")
create_model_response = boto3_sm_client.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    PrimaryContainer={"Image": inference_image_uri, "ModelDataUrl": s3_code_artifact},
)
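SageMaker lets you inject environment variables into the container. In the CreateModel API these go in the Environment map of PrimaryContainer, alongside Image and ModelDataUrl. The following minimal sketch builds that dictionary; the helper name and the EXAMPLE_FLAG variable in the usage note are hypothetical, not settings the LMI container is documented to read.

```python
def build_primary_container(image_uri, model_data_url, environment=None):
    """Build the PrimaryContainer argument for boto3's create_model.

    Hypothetical helper. Image and ModelDataUrl mirror the call above;
    Environment is only added when variables are supplied.
    """
    container = {"Image": image_uri, "ModelDataUrl": model_data_url}
    if environment:
        # The Environment map takes string keys and string values.
        container["Environment"] = {str(k): str(v) for k, v in environment.items()}
    return container
```

Passing `PrimaryContainer=build_primary_container(inference_image_uri, s3_code_artifact)` reproduces the call above; adding `environment={"EXAMPLE_FLAG": 1}` would attach `{"EXAMPLE_FLAG": "1"}` as the container's environment.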

Create a SageMaker endpoint

In this demo, we use an ml.inf2.xlarge instance. We need to set the VolumeSizeInGB parameter to provide the disk space needed to load the model and the weights. This parameter applies to instances that support Amazon Elastic Block Store (Amazon EBS) volume attachment. We can also set the model download timeout and the container startup health check timeout to higher values, which gives the container adequate time to pull the weights from Amazon S3 and load them into the AWS Inferentia2 accelerators. For more details, refer to CreateEndpointConfig.

endpoint_config_response = boto3_sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "variant1",
            "ModelName": model_name,
            "InstanceType": "ml.inf2.xlarge",
            "InitialInstanceCount": 1,
            "ContainerStartupHealthCheckTimeoutInSeconds": 360,
            "VolumeSizeInGB": 400,
        },
    ],
)
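The configuration above leaves the model download timeout at its default. As a sketch, the variant can be built by a small function that adds ModelDataDownloadTimeoutInSeconds only when you pass it; the function name and the 600-second value in the usage note are illustrative assumptions, not tuned recommendations.

```python
def build_production_variant(model_name,
                             instance_type="ml.inf2.xlarge",
                             volume_size_gb=400,
                             startup_timeout_s=360,
                             download_timeout_s=None):
    """Build one ProductionVariants entry for create_endpoint_config.

    Hypothetical helper; the defaults mirror the configuration above.
    """
    variant = {
        "VariantName": "variant1",
        "ModelName": model_name,
        "InstanceType": instance_type,
        "InitialInstanceCount": 1,
        # Time allowed for the container to start and pass its health check.
        "ContainerStartupHealthCheckTimeoutInSeconds": startup_timeout_s,
        # Attached EBS volume; must hold the downloaded model weights.
        "VolumeSizeInGB": volume_size_gb,
    }
    if download_timeout_s is not None:
        # Time allowed to download the model artifact from Amazon S3.
        variant["ModelDataDownloadTimeoutInSeconds"] = download_timeout_s
    return variant
```

For example, `ProductionVariants=[build_production_variant(model_name, download_timeout_s=600)]` keeps the settings above and also raises the download timeout.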

Lastly, we create a SageMaker endpoint:

create_endpoint_response = boto3_sm_client.create_endpoint(
    EndpointName=f"{endpoint_name}", EndpointConfigName=endpoint_config_name
)
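create_endpoint returns immediately while SageMaker provisions the instance and loads the weights, so the endpoint must reach InService before you invoke it. Boto3 provides a ready-made waiter for this (`boto3_sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)`). If you want explicit control over polling, here is a minimal sketch; the function name and default intervals are assumptions.

```python
import time


def wait_for_endpoint(describe_endpoint, endpoint_name,
                      poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll an endpoint until it is InService, failing fast on Failed.

    describe_endpoint is any callable with the shape of boto3's
    SageMaker describe_endpoint; sleep is injectable for testing.
    """
    waited = 0
    while True:
        resp = describe_endpoint(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = resp.get("FailureReason", "no reason given")
            raise RuntimeError(f"{endpoint_name} failed: {reason}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"{endpoint_name} still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Usage: `wait_for_endpoint(boto3_sm_client.describe_endpoint, endpoint_name)`.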

Invoke the model endpoint

This is a generative model, so we pass in the prompt that the model uses to generate the image. The payload is JSON:

import json

import boto3

boto3_sm_run_client = boto3.client("sagemaker-runtime")

response_model = boto3_sm_run_client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=json.dumps(
        {
            "prompt": "Mountain Landscape",
            "parameters": {},
        }
    ),
    ContentType="application/json",
)
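The request body above can be produced by a small helper that keeps the prompt and the parameters dictionary in the same JSON shape. This is a sketch: which keys inside parameters the endpoint honors depends on your serving code (num_inference_steps in the tests is a hypothetical example). Because the post does not show the response format, read_body only returns the raw bytes from the response's Body stream rather than guessing how to decode the image.

```python
import json


def build_sd_payload(prompt, **parameters):
    """Serialize a request body in the shape used above: prompt plus parameters."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    return json.dumps({"prompt": prompt, "parameters": parameters})


def read_body(response):
    """Read the raw bytes from an invoke_endpoint response's Body stream."""
    return response["Body"].read()
```

With these helpers, the call above becomes `invoke_endpoint(EndpointName=endpoint_name, Body=build_sd_payload("Mountain Landscape"), ContentType="application/json")`.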

Benchmarking the Stable Diffusion model on Inf2

We ran a few tests to benchmark the Stable Diffusion model with the BF16 data type on Inf2, and the resulting latencies rival or exceed those of some other accelerators for Stable Diffusion. Combined with the lower cost of AWS Inferentia2 chips, this makes Inf2 a compelling option for Stable Diffusion inference.

The following numbers are from Stable Diffusion models deployed on an inf2.xlarge instance. For more information about costs, refer to Amazon EC2 Inf2 Instances.

| Model | Resolution | Data type | Iterations | P95 latency (ms) | Inf2.xl On-Demand cost per hour | Inf2.xl cost per image |
|---|---|---|---|---|---|---|
| Stable Diffusion 1.5 | 512×512 | bf16 | 50 | 2,427.4 | $0.76 | $0.0005125 |
| Stable Diffusion 1.5 | 768×768 | bf16 | 50 | 8,235.9 | $0.76 | $0.0017387 |
| Stable Diffusion 1.5 | 512×512 | bf16 | 30 | 1,456.5 | $0.76 | $0.0003075 |
| Stable Diffusion 1.5 | 768×768 | bf16 | 30 | 4,941.6 | $0.76 | $0.0010432 |
| Stable Diffusion 2.1 | 512×512 | bf16 | 50 | 1,976.9 | $0.76 | $0.0004174 |
| Stable Diffusion 2.1 | 768×768 | bf16 | 50 | 6,836.3 | $0.76 | $0.0014432 |
| Stable Diffusion 2.1 | 512×512 | bf16 | 30 | 1,186.2 | $0.76 | $0.0002504 |
| Stable Diffusion 2.1 | 768×768 | bf16 | 30 | 4,101.8 | $0.76 | $0.0008659 |
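The cost-per-image column appears to follow from the other columns: P95 latency in seconds multiplied by the hourly price, divided by 3,600 seconds. For example, 2.4274 s × $0.76 / 3,600 s ≈ $0.0005125. The sketch below reproduces that arithmetic. It assumes one image is generated at a time on a fully utilized instance, so idle time, request overhead, and batching are ignored; using P95 rather than mean latency makes it a conservative estimate.

```python
SECONDS_PER_HOUR = 3600


def cost_per_image(latency_ms, hourly_price_usd):
    """Price of one image if the instance produces images back to back."""
    return (latency_ms / 1000) * hourly_price_usd / SECONDS_PER_HOUR


def images_per_hour(latency_ms):
    """Images one instance can produce per hour at this per-image latency."""
    return SECONDS_PER_HOUR / (latency_ms / 1000)


# Two rows from the table, at the $0.76 per hour On-Demand price.
for latency_ms in (2427.4, 1186.2):
    print(f"{latency_ms} ms: ${cost_per_image(latency_ms, 0.76):.7f} per image, "
          f"about {images_per_hour(latency_ms):.0f} images per hour")
```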

Conclusion

In this post, we dove deep into the compilation, optimization, and deployment of the Stable Diffusion 2.1 model using Inf2 instances. We also demonstrated deployment of Stable Diffusion models using SageMaker. Inf2 instances also deliver great price performance for Stable Diffusion 1.5. To learn more about why Inf2 instances are great for generative AI and large language models, refer to Amazon EC2 Inf2 Instances for Low-Cost, High-Performance Generative AI Inference are Now Generally Available. For performance details, refer to Inf2 Performance. Check out additional examples on the GitHub repo.

Special thanks to Matthew Mcclain, Beni Hegedus, Kamran Khan, Shruti Koparkar, and Qing Lan for reviewing and providing valuable inputs.


About the Authors

Vivek Gangasani is a Senior Machine Learning Solutions Architect at Amazon Web Services. He works with machine learning startups to build and deploy AI/ML applications on AWS. He is currently focused on delivering solutions for MLOps, ML inference, and low-code ML. He has worked on projects in different domains, including natural language processing and computer vision.

K.C. Tung is a Senior Solutions Architect at AWS Annapurna Labs. He specializes in training and deploying large deep learning models at scale in the cloud. He has a Ph.D. in molecular biophysics from the University of Texas Southwestern Medical Center in Dallas. He has spoken at AWS Summits and AWS re:Invent. Today he helps customers train and deploy large PyTorch and TensorFlow models in the AWS Cloud. He is the author of two books: Learn TensorFlow Enterprise and TensorFlow 2 Pocket Reference.

Rupinder Grewal is a Sr. AI/ML Specialist Solutions Architect with AWS. He currently focuses on model serving and MLOps on SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.


AWS offers new artificial intelligence, machine learning, and generative AI guides to plan your AI strategy


Breakthroughs in artificial intelligence (AI) and machine learning (ML) have been in the headlines for months, and for good reason. The emerging and evolving capabilities of this technology promise new business opportunities for customers across all sectors and industries. But the speed of this revolution has made it harder for organizations and consumers to assess what these breakthroughs mean for them specifically.

Over the years, AWS has invested in democratizing access to, and understanding of, AI, ML, and generative AI. Through announcements about the latest developments in generative AI and the establishment of a $100 million Generative AI Innovation Center program, Amazon Web Services (AWS) has been at the forefront of helping people understand the role these innovations can play in the lives of both individuals and organizations. To help you understand your options in relation to AI and ML, AWS has published two new guides: the AWS Cloud Adoption Framework for Artificial Intelligence, Machine Learning, and Generative AI and the Getting Started Resource Center machine learning decision guide.

AWS CAF for AI, ML, and Generative AI

The AWS Cloud Adoption Framework for Artificial Intelligence, Machine Learning, and Generative AI (CAF-AI) is designed to help you navigate your AI journey. It’s a mental model for organizations that strive to generate business value from AI/ML. Based on our own experience and that of our customers, this framework provides best practices for an AI transformation and for accelerating business outcomes through innovative use of AI on AWS.

Used by customers and partner teams, CAF-AI helps derive, prioritize, evolve, and communicate a strategy for AI transformation. The following figure shows how we simplify an AI journey through CAF-AI: by working backward from business outcomes (1) to the opportunities that AI, ML, and generative AI provide (2), across your transformation domains (3) and your foundational capabilities (4) through an iterative process (5) of assessing, deriving, and implementing action items for an AI strategy.

In CAF-AI, we describe the AI/ML journey you may experience as your organization’s AI and ML capabilities mature. To guide you, we zoom in on the foundational capabilities that we have observed helping organizations mature further in AI, and on how those capabilities evolve.

We also provide prescriptive guidance through an overview of the target state of these foundational capabilities and explain how to evolve them step by step to generate business value along the way. The following figure shows these foundational capabilities for cloud and AI/ML adoption. A capability is an organizational ability to use processes to deploy resources (such as people, technology, and other tangible or intangible assets) to achieve an outcome. Because the CAF-AI is a living index of knowledge, you can expect it to grow and change over time.

Designed as a starting and orientation point throughout a customer’s ML and AI journey, CAF-AI is intended to be a document that organizations can draw inspiration from as they shape their mid-term AI and ML agenda and try to understand the important topics and perspectives that influence it. Depending on where you are on your AI/ML journey, you might focus on a specific section and hone your skills there, or use the whole document to judge maturity and help direct near-term improvement areas.

AI/ML isn’t confined to a single business function or domain: it applies across all business functions and across every industry where you are looking for ways to reset the playing field in markets where AI/ML makes an economic difference. The AWS Cloud Adoption Framework for Artificial Intelligence, Machine Learning, and Generative AI is one of the many tools AWS provides to help you achieve this outcome. Because AI/ML opens up solutions to problems that have remained uneconomical to solve for decades (or were technically impossible to tackle without AI/ML), the resulting business outcomes can be profound.

The Getting Started Resource Center machine learning decision guide

AWS has always been about choice. As you ramp up your use of AI, it is paramount that you have the right support in choosing the best service, model, and infrastructure for your business needs. The Getting Started Resource Center machine learning decision guide is designed to provide you with a detailed overview of the AI and ML services offered by AWS, and provide structured guidance on how to choose the services that might be right for you and your use cases.

The decision guide can also help you articulate and consider the criteria that will inform your choices. For example, it describes the range of AWS ML services (see the following screenshot), each suited to a different level of management effort, depending on how much control and customization you need.

The guide also explains the unique capabilities of AWS services in realizing the power of foundation models and where you can make the most of this fast-evolving branch of machine learning.

It offers details on specific services, links to detailed, service-level technical guides, a comparison table that highlights the unique capabilities of key services, and criteria for selecting AI and ML services. It also provides a curated set of links to key resources that can help you get started in using AI, ML, and generative AI services on AWS.

If you want to understand the breadth of AI, ML, and generative AI offerings provided by AWS, this decision guide is a great place to start.

Conclusion

The Getting Started Resource Center machine learning decision guide, together with the AWS Cloud Adoption Framework for Artificial Intelligence, Machine Learning, and Generative AI, covers the technical and non-technical questions that we often hear. We hope you find these new resources useful and look forward to your feedback on them.


About the Authors

Caleb Wilkinson has more than a decade of experience building AI solutions. As a Senior Machine Learning Strategist at AWS, Caleb pioneers innovative applications of AI that push the boundaries of possibility and helps organizations benefit responsibly from artificial intelligence. He is the co-author of CAF-AI.

Alexander Wöhlke has a decade of experience in AI and ML. He is a Senior Machine Learning Strategist and Technical Product Manager at the AWS Generative AI Innovation Center. He works with large organizations on their AI strategy and helps them take calculated risks at the forefront of technological development. He is the co-author of CAF-AI.

Geof Wheelwright manages the AWS decision content team, which writes and develops the growing collection of decision guides on the AWS Getting Started Resource Center. His team created the Choosing an AWS machine learning decision guide. He has enjoyed working with AI and its ancestors since first being introduced to simple, text-based Apple II versions of ELIZA in the early 1980s.


New technical deep dive course: Generative AI Foundations on AWS


Generative AI Foundations on AWS is a new technical deep dive course that gives you the conceptual fundamentals, practical advice, and hands-on guidance to pre-train, fine-tune, and deploy state-of-the-art foundation models on AWS and beyond. Developed by Emily Webber, AWS generative AI worldwide foundations lead, this free hands-on course and its supporting GitHub source code launched on the AWS YouTube channel. If you are looking for a curated playlist of the top resources, concepts, and guidance to get up to speed on foundation models, especially those that unlock generative capabilities in your data science and machine learning projects, then look no further.

During this 8-hour deep dive, you will be introduced to the key techniques, services, and trends that will help you understand foundation models from the ground up. This means breaking down theory, mathematics, and abstract concepts combined with hands-on exercises to gain functional intuition for practical application. Throughout the course, we focus on a wide spectrum of progressively complex generative AI techniques, giving you a strong base to understand, design, and apply your own models for the best performance. We’ll start by recapping foundation models: where they come from, how they work, how they relate to generative AI, and what you can do to customize them. You’ll then learn about picking the right foundation model to suit your use case.

Once you’ve developed a strong contextual understanding of foundation models and how to use them, you’ll be introduced to the core subject of this course: pre-training new foundation models. You’ll learn why you’d want to do this as well as how and where it’s competitive. You’ll even learn how to use the scaling laws to pick the right model, dataset, and compute sizes. We’ll cover preparing training datasets at scale on AWS, including picking the right instances and storage techniques. We’ll cover fine-tuning your foundation models, evaluating recent techniques, and understanding how to run these with your scripts and models. We’ll dive into reinforcement learning with human feedback, exploring how to use it skillfully and at scale to truly maximize your foundation model performance.

Finally, you’ll learn how to apply theory to production by deploying your new foundation model on Amazon SageMaker, including across multiple GPUs and using top design patterns like retrieval augmented generation and chained dialogue. As an added bonus, we’ll walk you through a Stable Diffusion deep dive, prompt engineering best practices, standing up LangChain, and more.

More of a reader than a video consumer? You can check out my 15-chapter book “Pretrain Vision and Large Language Models in Python: End-to-end techniques for building and deploying foundation models on AWS,” which Packt published on May 31, 2023, and which is available now on Amazon. Want to jump right into the code? I’m with you: every video starts with a 45-minute overview of the key concepts and visuals, followed by a 15-minute walkthrough of the hands-on portion. All of the example notebooks and supporting code will ship in a public repository, which you can use to step through on your own. Feel free to reach out to me on Medium, LinkedIn, GitHub, or through your AWS teams. Learn more about generative AI on AWS.

Happy trails!

Course outline

1. Introduction to Foundation Models

  • What are large language models and how do they work?
  • Where do they come from?
  • What are other types of generative AI?
  • How do you customize a foundation model?
  • How do you evaluate a generative model?
  • Hands-on walk through: Foundation Models on SageMaker

Lesson 1 slides

Lesson 1 hands-on demo resources

2. Picking the right foundation model

  • Why starting with the right foundation model matters
  • Considering size
  • Considering accuracy
  • Considering ease of use
  • Considering licensing
  • Considering previous examples of this model working well in your industry
  • Considering external benchmarks

Lesson 2 slides

Lesson 2 hands-on demo resources

3. Using pretrained foundation models: prompt engineering and fine-tuning

  • The benefits of starting with a pre-trained foundation model
  • Prompt engineering:
    • Zero-shot
    • Single-shot
    • Few-shot
    • Summarization
    • Classification
    • Translation
  • Fine-tuning
    • Classic fine-tuning
    • Parameter efficient fine-tuning
    • Hugging Face’s new library
    • Hands-on walk through: prompt engineering and fine-tuning on SageMaker

Lesson 3 slides

Lesson 3 hands-on demo resources

4. Pretraining a new foundation model

  • Why would you want or need to create a new foundation model?
    • Comparing pretraining to fine-tuning
  • Preparing your dataset for pretraining
  • Distributed training on SageMaker: libraries, scripts, jobs, resources
  • Why and how to adapt a new script to SageMaker distributed training

Lesson 4 slides

Lesson 4 hands-on demo resources

5. Preparing data and training at scale

  • Options for prepping data at scale on AWS
  • Explain SageMaker job parallelism on CPU instances
  • Explain modes of sending data to SageMaker Training
  • Introduction to FSx for Lustre
  • Using FSx for Lustre at scale for SageMaker Training
  • Hands-on walk through: configuring Lustre for SageMaker Training

Lesson 5 slides

Lesson 5 hands-on demo resources

6. Reinforcement learning with human feedback

  • What is this technique and why do we care about it?
  • How it gets around problems with subjectivity and objectivity through ranking human preferences at scale
  • How does it work?
  • How to do this with SageMaker Ground Truth
  • Updated reward modeling
  • Hands-on walk through: RLHF on SageMaker

Lesson 6 slides

Lesson 6 hands-on demo resources

7. Deploying a foundation model

  • Why do we want to deploy models?
  • Different options for deploying FMs on AWS
  • How to optimize your model for deployment
  • Large model deployment container deep dive
  • Top configuration tips for deploying FMs on SageMaker
  • Prompt engineering tips for invoking foundation models
  • Using retrieval augmented generation to mitigate hallucinations
  • Hands-on walk through: Deploying an FM on SageMaker

Lesson 7 slides

Lesson 7 hands-on demo resources


About the author

Emily Webber joined AWS just after SageMaker launched, and has been trying to tell the world about it ever since! Outside of building new ML experiences for customers, Emily enjoys meditating and studying Tibetan Buddhism.


AWS Reaffirms its Commitment to Responsible Generative AI


As a pioneer in artificial intelligence and machine learning, AWS is committed to developing and deploying generative AI responsibly.

As one of the most transformational innovations of our time, generative AI continues to capture the world’s imagination, and we remain as committed as ever to harnessing it responsibly. With a team of dedicated responsible AI experts, complemented by our engineering and development organization, we continually test and assess our products and services to define, measure, and mitigate concerns about accuracy, fairness, intellectual property, appropriate use, toxicity, and privacy. And while we don’t have all of the answers today, we are working alongside others to develop new approaches and solutions to address these emerging challenges. We believe we can drive innovation in AI while continuing to implement the necessary safeguards to protect our customers and consumers.

At AWS, we know that generative AI technology and how it is used will continue to evolve, posing new challenges that will require additional attention and mitigation. That’s why Amazon is actively engaged with organizations and standards bodies focused on the responsible development of next-generation AI systems, including NIST, ISO, the Responsible AI Institute, and the Partnership on AI. In fact, last week at the White House, Amazon signed voluntary commitments to foster the safe, responsible, and effective development of AI technology. We are eager to share knowledge with policymakers, academics, and civil society, as we recognize that the unique challenges posed by generative AI will require ongoing collaboration.

This commitment is consistent with our approach to developing our own generative AI services, including building foundation models (FMs) with responsible AI in mind at each stage of our comprehensive development process. Throughout design, development, deployment, and operations, we consider a range of factors including 1/ accuracy, e.g., how closely a summary matches the underlying document or whether a biography is factually correct; 2/ fairness, e.g., whether outputs treat demographic groups similarly; 3/ intellectual property and copyright considerations; 4/ appropriate usage, e.g., filtering out user requests for legal advice, medical diagnoses, or illegal activities; 5/ toxicity, e.g., hate speech, profanity, and insults; and 6/ privacy, e.g., protecting personal information and customer prompts. We build solutions to address these issues into our processes for acquiring training data, into the FMs themselves, and into the technology that we use to pre-process user prompts and post-process outputs. For all our FMs, we invest actively to improve our features and to learn from customers as they experiment with new use cases.

For example, Amazon’s Titan FMs are built to detect and remove harmful content in the data that customers provide for customization, reject inappropriate content in the user input, and filter the model’s outputs containing inappropriate content (such as hate speech, profanity, and violence).

To help developers build applications responsibly, Amazon CodeWhisperer provides a reference tracker that displays the licensing information for a code recommendation and provides a link to the corresponding open-source repository when necessary. This makes it easier for developers to decide whether to use the code in their project and to make the relevant source code attributions as they see fit. In addition, Amazon CodeWhisperer filters out code recommendations that include toxic phrases and recommendations that indicate bias.

Through innovative services like these, we will continue to help our customers realize the benefits of generative AI, while collaborating across the public and private sectors to ensure we’re doing so responsibly. Together, we will build trust among customers and the broader public, as we harness this transformative new technology as a force for good.


About the Author

Peter Hallinan leads initiatives in the science and practice of Responsible AI at AWS AI, alongside a team of responsible AI experts. He has deep expertise in AI (PhD, Harvard) and entrepreneurship (Blindsight, sold to Amazon). His volunteer activities have included serving as a consulting professor at the Stanford University School of Medicine, and as the president of the American Chamber of Commerce in Madagascar. When possible, he’s off in the mountains with his children: skiing, climbing, hiking, and rafting.
