Label text for aspect-based sentiment analysis using SageMaker Ground Truth

The Amazon Machine Learning Solutions Lab (MLSL) recently created a tool for annotating text with named-entity recognition (NER) and relationship labels using Amazon SageMaker Ground Truth. Annotators use this tool to label text with named entities and link their relationships, thereby building a dataset for training state-of-the-art natural language processing (NLP) machine learning (ML) models. Most importantly, this is now publicly available to all AWS customers.

Customer Use Case: The customer in this use case is one of the world’s leading online travel platforms. Understanding what customers are saying about the company’s 28 million+ property listings on the platform is essential for maintaining a top-notch customer experience. Previously, the company could only use traditional sentiment analysis to interpret customer-generated reviews at scale. Looking to upgrade the specificity of these interpretations, it recently turned to the MLSL for help with building a custom annotated dataset for training an aspect-based sentiment analysis model.

Traditional sentiment analysis classifies a piece of text as positive, negative, or neutral, assigning a single sentiment to the whole text. This works to broadly understand if users are satisfied or unsatisfied with a particular experience. For example, with traditional sentiment analysis, the following text may be classified as “neutral”:

Our stay at the hotel was nice. The staff was friendly and the rooms were clean, but our beds were quite uncomfortable.

Aspect-based sentiment analysis offers a more nuanced understanding of content. Rather than taking a customer review as a whole and classifying it categorically, it can take sentiment from within a review and assign it to specific aspects. For example, customer reviews of a given hotel might praise the immaculate pool and fitness area, but give critical feedback on the restaurant and lounge.

The statement which would have been classified as “neutral” by traditional sentiment analysis will, with aspect-based sentiment analysis, become:

Our stay at the hotel was nice. The staff was friendly and the rooms were clean, but our beds were quite uncomfortable.

  • Hotel: Positive
  • Staff: Positive
  • Room: Positive
  • Beds: Negative

The customer sought to build a custom aspect-based sentiment analysis model that would tell them which specific parts of the guest experience (from a list of 50+ aspects) were positive, negative, or neutral.

Before the customer could build a training dataset for this model, they needed a way to annotate it. MLSL’s annotation tool provided the much-needed customized solution. Human review was performed on a large collection of hotel reviews. Then, annotators completed named-entity annotation on sentiment and guest-experience text spans and phrases before linking appropriate spans together.

The new aspect-based model lets the company personalize both accommodations and reviews for its customers. Highlighting the positive and negative aspects of each accommodation enables customers to choose their perfect match. In addition, different customers care about different aspects of the accommodation, and the new model opens up the opportunity to show the most relevant reviews to each one.

Labeling Requirements

Although Ground Truth provides a built-in NER text annotation capability, it doesn’t provide the ability to link entities together. With this in mind, the customer and MLSL worked out the following high-level requirements for a new named entity recognition text labeling tool that:

  • Accepts as input: text, entity labels, relationship labels, and classification labels.
  • Optionally accepts as input pre-annotated data with the preceding label and relationship annotations.
  • Presents the annotator with either unannotated or pre-annotated text.
  • Allows annotators to highlight and annotate arbitrary text with an entity label.
  • Allows annotators to create relationships between two entity annotations.
  • Allows annotators to easily navigate large numbers of entity labels.
  • Supports grouping entity labels into categories.
  • Allows overlapping relationships, which means that the same annotated text segment can be related to more than one other annotated text segment.
  • Allows overlapping entity label annotations, which means that two annotations can overlap the same piece of text. For example, the text “Seattle Space Needle” can have both the annotations “Seattle” → “locations”, and “Seattle Space Needle” → “attractions”.
  • Output format is compatible with input format, and it can be fed back into subsequent labeling tasks.
  • Supports UTF-8 encoded text containing emoji and other multi-byte characters.
  • Supports left-to-right languages.

Sample Annotation

Consider the following document:

We loved the location of this hotel! The rooftop lounge gave us the perfect view of space needle. It is also a short drive away from pike place market and the waterfront.
Food was only available via room service, which was a little disappointing but makes sense in this post-pandemic world.
Overall, a reasonably priced experience.

Loading this document into the new NER annotation tool presents a worker with the following interface:

Worker presented with an unannotated document

In this case, the worker’s job is to:

  • Label entities related to the property (location, price, food, etc.)
  • Label entities related to sentiment (positive, negative, or neutral)
  • Link property-related named entities to sentiment-related keywords to accurately capture the guest experience

Worker performing annotations

Annotation speed was an important consideration of the tool. Using a sequence of intuitive keyboard shortcuts and mouse gestures, annotators can drive the interface and:

  • Add and remove named entity annotations
  • Add relationships between named entities
  • Jump to the beginning and end of the document
  • Submit the document

Additionally, there is support for overlapping labels. For example, in the phrase “Seattle Space Needle”, “Seattle” is annotated both as a location by itself and as part of the attraction name.

The completed annotation provides a more complete, nuanced analysis of the data:

Completed document

Relationships can be configured at many levels: from entity categories to other entity categories (for example, from “food” to “sentiment”), or between individual entity types. Relationships are directed, so annotators can link an aspect like food to a sentiment, but not vice versa (unless explicitly enabled). When drawing relationships, the annotation tool automatically deduces the relationship label and direction.
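The automatic deduction of label and direction can be pictured with a short sketch. This is a hypothetical illustration, not the tool’s actual code: the function names are our own, and it assumes relationship rules shaped like the allowedRelationships entries described later under Input Document Format.

```python
def _matches(entity, labels, categories):
    """True if the entity's label or category is listed on this side of a rule."""
    return entity["label"] in labels or entity.get("labelCategory") in categories


def deduce_relationship(first, second, relationship_labels):
    """Return (label, source id, target id) for the first rule that permits a link.

    Both directions are tried, so the two entities can be selected in any order.
    Labels without allowedRelationships are skipped in this simplified sketch.
    """
    for rel in relationship_labels:
        for rule in rel.get("allowedRelationships", []):
            for source, target in ((first, second), (second, first)):
                source_ok = _matches(source,
                                     rule.get("sourceEntityLabels", []),
                                     rule.get("sourceEntityLabelCategories", []))
                target_ok = _matches(target,
                                     rule.get("targetEntityLabels", []),
                                     rule.get("targetEntityLabelCategories", []))
                if source_ok and target_ok:
                    return rel["name"], source["id"], target["id"]
    return None


# Hypothetical labels and annotations, for illustration only.
labels = [{
    "name": "describes",
    "allowedRelationships": [{
        "sourceEntityLabelCategories": ["aspect"],
        "targetEntityLabelCategories": ["sentiment"],
    }],
}]
food = {"id": "e1", "label": "food", "labelCategory": "aspect"}
negative = {"id": "e2", "label": "negative", "labelCategory": "sentiment"}

# Selecting the sentiment first still yields an aspect-to-sentiment link.
print(deduce_relationship(negative, food, labels))
```

Because the rule only allows “aspect” as a source, the sketch returns the link from food to negative regardless of the order in which the two spans are chosen.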

Configuring the NER Annotation Tool

In this section, we cover how to customize the NER annotation tool for customer-specific use cases. This includes configuring:

  • The input text to annotate
  • Entity labels
  • Relationship Labels
  • Classification Labels
  • Pre-annotated data
  • Worker instructions

We’ll cover the specifics of the input and output document formats, as well as provide some examples of each.

Input Document Format

The NER annotation tool expects the following JSON-formatted input document (fields with a question mark next to the name are optional).

{
  text: string;
  tokenRows?: string[][];
  documentId?: string;
  entityLabels?: {
    name: string;
    shortName?: string;
    category?: string;
    shortCategory?: string;
    color?: string;
  }[];
  classificationLabels?: string[];
  relationshipLabels?: {
    name: string;
    allowedRelationships?: {
      sourceEntityLabelCategories?: string[];
      targetEntityLabelCategories?: string[];
      sourceEntityLabels?: string[];
      targetEntityLabels?: string[];
    }[];
  }[];
  entityAnnotations?: {
    id: string;
    start: number;
    end: number;
    text: string;
    label: string;
    labelCategory?: string;
  }[];
  relationshipAnnotations?: {
    sourceEntityAnnotationId: string;
    targetEntityAnnotationId: string;
    label: string;
  }[];
  classificationAnnotations?: string[];
  meta?: {
    instructions?: string;
    disableSubmitConfirmation?: boolean;
    multiClassification: boolean;
  };
}

In a nutshell, the input format has these characteristics:

  • Either entityLabels or classificationLabels (or both) are required to annotate.
  • If entityLabels are given, then relationshipLabels can be added.
  • Relationships can be allowed between different entity/category labels or a mix of these.
  • The “source” of a relationship is the entity that the directed arrow starts with, while the “target” is where it’s heading.

Field | Type | Description
--- | --- | ---
text | string | Required. Input text for annotation.
tokenRows | string[][] | Optional. Custom tokenization of input text. Array of arrays of strings. Top level array represents each row of text (line breaks), and second level array represents tokens on each row. All characters/runes in the input text must be accounted for in tokenRows, including any white space.
documentId | string | Optional. Value for customers to keep track of the document being annotated.
entityLabels | object[] | Required if classificationLabels is blank. Array of entity labels.
entityLabels[].name | string | Required. Entity label display name.
entityLabels[].category | string | Optional. Entity label category name.
entityLabels[].shortName | string | Optional. Display this text over annotated entities rather than the full name.
entityLabels[].shortCategory | string | Optional. Display this text in the entity annotation select dropdown instead of the first four letters of the category name.
entityLabels[].color | string | Optional. Hex color code with “#” prefix. If blank, a color is automatically assigned to the entity label.
relationshipLabels | object[] | Optional. Array of relationship labels.
relationshipLabels[].name | string | Required. Relationship label display name.
relationshipLabels[].allowedRelationships | object[] | Optional. Array of values restricting what types of source and target entity labels this relationship can be assigned to. Each item in array is “OR’ed” together.
relationshipLabels[].allowedRelationships[].sourceEntityLabelCategories | string[] | Required to set either sourceEntityLabelCategories or sourceEntityLabels (or both). List of legal source entity label category types for this relationship.
relationshipLabels[].allowedRelationships[].targetEntityLabelCategories | string[] | Required to set either targetEntityLabelCategories or targetEntityLabels (or both). List of legal target entity label category types for this relationship.
relationshipLabels[].allowedRelationships[].sourceEntityLabels | string[] | Required to set either sourceEntityLabelCategories or sourceEntityLabels (or both). List of legal source entity label types for this relationship.
relationshipLabels[].allowedRelationships[].targetEntityLabels | string[] | Required to set either targetEntityLabelCategories or targetEntityLabels (or both). List of legal target entity label types for this relationship.
classificationLabels | string[] | Required if entityLabels is blank. List of document level classification labels.
entityAnnotations | object[] | Optional. Array of entity annotations to pre-annotate input text with.
entityAnnotations[].id | string | Required. Unique identifier for this entity annotation. Used to reference this entity in relationshipAnnotations.
entityAnnotations[].start | number | Required. Start rune offset of this entity annotation.
entityAnnotations[].end | number | Required. End rune offset of this entity annotation.
entityAnnotations[].text | string | Required. Text content between start and end rune offset.
entityAnnotations[].label | string | Required. Associated entity label name (from the names in entityLabels).
entityAnnotations[].labelCategory | string | Optional. Associated entity label category (from the categories in entityLabels).
relationshipAnnotations | object[] | Optional. Array of relationship annotations.
relationshipAnnotations[].sourceEntityAnnotationId | string | Required. Source entity annotation ID for this relationship.
relationshipAnnotations[].targetEntityAnnotationId | string | Required. Target entity annotation ID for this relationship.
relationshipAnnotations[].label | string | Required. Associated relationship label name.
classificationAnnotations | string[] | Optional. Array of classifications to pre-annotate the document with.
meta | object | Optional. Additional configuration parameters.
meta.instructions | string | Optional. Instructions for the labeling annotator in Markdown format.
meta.disableSubmitConfirmation | boolean | Optional. Set to true to disable submit confirmation modal.
meta.multiClassification | boolean | Optional. Set to true to enable multi-label mode for classificationLabels.
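
The rules listed for the input format can be checked before a document is sent for annotation. The following is a minimal sketch of such a check, under the assumption that documents are plain Python dictionaries; validate_input_document is a hypothetical helper of our own and is not part of the tool.

```python
def validate_input_document(doc):
    """Return a list of problems found in an input document (empty if none)."""
    problems = []
    if not isinstance(doc.get("text"), str):
        problems.append("text is required and must be a string")

    entity_labels = doc.get("entityLabels") or []
    if not entity_labels and not doc.get("classificationLabels"):
        problems.append("entityLabels or classificationLabels (or both) are required")
    if doc.get("relationshipLabels") and not entity_labels:
        problems.append("relationshipLabels need entityLabels")

    # Each allowedRelationships entry must name at least one source and one target.
    for rel in doc.get("relationshipLabels") or []:
        for rule in rel.get("allowedRelationships") or []:
            if not (rule.get("sourceEntityLabelCategories") or rule.get("sourceEntityLabels")):
                problems.append(f"{rel['name']}: rule has no source labels or categories")
            if not (rule.get("targetEntityLabelCategories") or rule.get("targetEntityLabels")):
                problems.append(f"{rel['name']}: rule has no target labels or categories")

    # Pre-annotations must refer to declared labels and existing annotation IDs.
    label_names = {label["name"] for label in entity_labels}
    annotation_ids = set()
    for ann in doc.get("entityAnnotations") or []:
        annotation_ids.add(ann["id"])
        if ann["label"] not in label_names:
            problems.append(f"annotation {ann['id']}: unknown entity label {ann['label']}")
    relationship_names = {rel["name"] for rel in doc.get("relationshipLabels") or []}
    for rel in doc.get("relationshipAnnotations") or []:
        if rel["label"] not in relationship_names:
            problems.append(f"unknown relationship label {rel['label']}")
        for key in ("sourceEntityAnnotationId", "targetEntityAnnotationId"):
            if rel[key] not in annotation_ids:
                problems.append(f"{key} {rel[key]} does not match any entity annotation")
    return problems
```

The sketch does not check rune offsets or tokenRows coverage; those depend on how the tool splits text into runes.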

The sample input documents listed under NER Tool Resources (review-01.json through review-04.json) give a better sense of this input format.

Documents that adhere to this schema are provided to Ground Truth as individual line items in an input manifest.

Output Document Format

The output format is designed to feed back easily into a new annotation task. Optional fields in the output document are set if they are also set in the input document. The only difference between the input and output formats is the meta object.

{
  text: string;
  tokenRows?: string[][];
  documentId?: string;
  entityLabels?: {
    name: string;
    shortName?: string;
    category?: string;
    shortCategory?: string;
    color?: string;
  }[];
  relationshipLabels: {
    name: string;
    allowedRelationships?: {
      sourceEntityLabelCategories?: string[];
      targetEntityLabelCategories?: string[];
      sourceEntityLabels?: string[];
      targetEntityLabels?: string[];
    }[];
  }[];
  classificationLabels?: string[];
  entityAnnotations?: {
    id: string;
    start: number;
    end: number;
    text: string;
    labelCategory?: string;
    label: string;
  }[];
  relationshipAnnotations?: {
    sourceEntityAnnotationId: string;
    targetEntityAnnotationId: string;
    label: string;
  }[];
  classificationAnnotations?: string[];
  meta: {
    instructions?: string;
    disableSubmitConfirmation?: boolean;
    multiClassification: boolean;
    runes: string[];
    rejected: boolean;
    rejectedReason: string;
  };
}

Field | Type | Description
--- | --- | ---
meta.rejected | boolean | Is set to true if the annotator rejected this document.
meta.rejectedReason | string | Annotator’s reason given for rejecting the document.
meta.runes | string[] | Array of runes accounting for all of the characters in the input text. Used to calculate entity annotation start and end offsets.

A sample annotated output document, review-01-output.json, is listed under NER Tool Resources.

Runes note:

A “rune” in this context is a single highlight-able character in text, including multi-byte characters such as emoji.

  • Because different programming languages represent multi-byte characters differently, using “Runes” to define every highlight-able character as a single atomic element means that we have an unambiguous way to describe any given text selection.
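  • For example, the Swedish flag emoji is made up of two Unicode code points. Python 3 counts code points, so it treats the flag as two characters, whereas JavaScript counts UTF-16 code units and treats the same emoji as four characters.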

To eliminate any ambiguity, we will treat the Swedish flag (and all other emoji and multi-byte characters) as a single atomic element.

  • Offset: Rune position relative to Input Text (starting with index 0)
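
The ambiguity that runes remove can be seen with a few lines of Python. This is an illustrative sketch only: the counts follow from the Unicode encoding of the flag emoji, while the runes list and the exclusive end offset are assumptions made for the example, not output from the tool.

```python
# Swedish flag: two regional-indicator code points (U+1F1F8, U+1F1EA).
flag = "\U0001F1F8\U0001F1EA"

code_points = len(flag)                           # what Python 3's len() counts
utf16_units = len(flag.encode("utf-16-le")) // 2  # what JavaScript's .length counts
utf8_bytes = len(flag.encode("utf-8"))            # what a byte-oriented count gives

print(code_points, utf16_units, utf8_bytes)  # prints: 2 4 8

# In a rune array the flag is a single element, so offsets are unambiguous.
runes = ["H", "e", "j", " ", flag, "!"]  # hypothetical meta.runes for a short text
start, end = 4, 5                        # assumption: the end offset is exclusive
selected = "".join(runes[start:end])
print(selected == flag)  # prints: True
```

The same selection would need different offsets in each of the three raw counts, which is why annotation offsets are expressed against the rune array instead.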

Performing NER Annotations with Ground Truth

As a fully managed data labeling service, Ground Truth builds training datasets for ML. For this use case, we use Ground Truth to send a collection of text documents to a pool of workers for annotation. Finally, we review for quality.

Ground Truth can be configured to build a data labeling job using the new NER tool as a custom template.

Specifically, we will:

  1. Create a private labeling workforce of workers to perform the annotation task
  2. Create a Ground Truth input manifest with the documents we want to annotate and then upload it to Amazon Simple Storage Service (Amazon S3)
  3. Create pre-labeling task and post-labeling task Lambda functions
  4. Create a Ground Truth labeling job using the custom NER template
  5. Annotate documents
  6. Review results

NER Tool Resources

A complete list of referenced resources and sample documents can be found in the following table:

Description | Filename
--- | ---
Production custom worker task template | worker-template.liquid.html
Sample Ground Truth Pre-Labeling Lambda |
Sample Ground Truth Post-Labeling Lambda |
Sample Input Document #1 (pre-labeled) | review-01.json
Sample Input Document #2 (pre-labeled) | review-02.json
Sample Input Document #3 (custom tokenization) | review-03.json
Sample Input Document #4 (document classification) | review-04.json
Sample Ground Truth Input Manifest | reviews.manifest
Output for Sample Input Document #1 | review-01-output.json

Labeling Workforce Creation

Ground Truth uses SageMaker labeling workforces to manage workers and distribute tasks. Create a private workforce, a worker team called ner-worker-team, and assign yourself to the team using the instructions found in Create a Private Workforce (Amazon SageMaker Console).

Once you’ve added yourself to a private workforce and confirmed your email, note the worker portal URL from the AWS Management Console:

  • Navigate to SageMaker
  • Navigate to Ground Truth → Labeling workforces
  • Select the Private tab
  • Note the Labeling portal sign-in URL

Log in to the worker portal to view and start work on labeling tasks.

Input Manifest

The Ground Truth input data manifest is a JSON-lines file where each line contains a single worker task. In our case, each line will contain a single JSON encoded Input Document containing the text that we want to annotate and the NER annotation schema.

Download a sample input manifest reviews.manifest from

Note: each row in the input manifest needs a top-level key source or source-ref. You can learn more in Use an Input Manifest File in the Amazon SageMaker Developer Guide.
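
If you want to build your own manifest instead of using the sample, a small script can write one task per line. The sketch below is an assumption-laden illustration: it JSON-encodes each input document into the source key as a string, which is one plausible layout, and the helper names are hypothetical. Compare against the sample reviews.manifest before relying on it.

```python
import json


def to_manifest_line(document):
    """Wrap one input document in a manifest row with a top-level "source" key."""
    # Assumption: the document is stored as a JSON-encoded string under "source".
    encoded = json.dumps(document, ensure_ascii=False)
    return json.dumps({"source": encoded}, ensure_ascii=False)


def write_manifest(documents, path):
    """Write a JSON-lines manifest with one worker task (document) per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for document in documents:
            handle.write(to_manifest_line(document) + "\n")


# Hypothetical document-classification task, for illustration only.
example = {"text": "Great pool", "classificationLabels": ["positive", "negative"]}
line = to_manifest_line(example)
```

json.dumps escapes any line breaks inside the text, so each task stays on a single line as the JSON-lines format requires.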

Upload Input Manifest to Amazon S3

Upload this input manifest to an S3 bucket using the AWS Management Console or from the command line, replacing your-bucket with an actual bucket name.

aws s3 cp reviews.manifest s3://your-bucket/ner-input/reviews.manifest

Download custom worker template

Download the NER tool custom worker template from by viewing the source and saving the contents locally, or from the command line:


Create pre-labeling task and post-labeling task Lambda functions

Download sample pre-labeling task Lambda function: from

Download sample post-labeling task Lambda function: from

  • Create pre-labeling task Lambda function from the AWS Management Console:
    • Navigate to Lambda
    • Select Create function
    • Specify Function name as smgt-ner-pre-labeling-task-lambda
    • Select Runtime → Python 3.6
    • Select Create function
    • In Function, paste the contents of
    • Select Deploy
  • Create post-labeling task Lambda function from the AWS Management Console:
    • Navigate to Lambda
    • Select Create function
    • Specify Function name as smgt-ner-post-labeling-task-lambda
    • Select Runtime → Python 3.6
    • Expand Change default execution role
    • Select Create a new role from AWS policy templates
    • Enter the Role name: smgt-ner-post-labeling-task-lambda-role
    • Select Create function
    • Select the Permissions tab
    • Select the Role name: smgt-ner-post-labeling-task-lambda-role to open the IAM console
    • Add two policies to the role
      • Select Attach policies
      • Attach the AmazonS3FullAccess policy
      • Select Add inline policy
      • Select the JSON tab
      • Paste in the following inline policy:
        {
            "Version": "2012-10-17",
            "Statement": {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": "arn:aws:iam::YOUR_ACCOUNT_NUMBER:role/service-role/AmazonSageMaker-ExecutionRole-*"
            }
        }

    • Navigate back to the smgt-ner-post-labeling-task-lambda Lambda function configuration page
    • Select the Configuration tab
    • In Function code →, paste the contents of
    • Select Deploy
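
For orientation, a pre-labeling task Lambda receives one manifest row and returns the task input for the worker template. The sketch below is not the sample function referenced above; it follows the general Ground Truth pre-annotation contract (a dataObject in the event, taskInput in the response), and the "document" key inside taskInput is a hypothetical name that would have to match whatever the worker template reads.

```python
import json


def lambda_handler(event, context):
    """Pre-labeling sketch: forward the manifest row's document to the worker UI."""
    data_object = event["dataObject"]
    source = data_object.get("source")
    if source is None:
        raise ValueError("expected a 'source' key in the manifest row")

    # Assumption: "source" holds the input document, either as a JSON-encoded
    # string or as an already parsed object.
    document = json.loads(source) if isinstance(source, str) else source

    return {
        "taskInput": {"document": document},  # "document" is a hypothetical key
        "isHumanAnnotationRequired": "true",
    }
```

Rows that use source-ref instead of source would need an extra step to read the referenced object from Amazon S3, which this sketch leaves out.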

Create a Ground Truth labeling job

From the AWS Management Console:

  • Navigate to the Amazon SageMaker service
  • Navigate to Ground Truth → Labeling jobs
  • Select Create labeling job
  • Specify a Job Name
  • Select Manual Data Setup
  • Specify the Input dataset location where you uploaded the input manifest earlier (e.g., s3://your-bucket/ner-input/reviews.manifest)
  • Specify the Output dataset location to point to a different folder in the same bucket (e.g., s3://your-bucket/ner-output/)
  • Specify an IAM Role by selecting Create new role
    • Allow this role to access any S3 bucket by selecting S3 buckets you specify → Any S3 bucket when creating the policy
    • In a new AWS Management Console window, open the IAM console and select Roles
    • Search for the name of the role that you just created (for example, AmazonSageMaker-ExecutionRole-20210301T154158)
    • Select the role name to open the role in the console
    • Attach the following policy and update the trust relationship:
      • Select Attach policies
      • Attach the AWSLambda_FullAccess policy to the role
      • Select Trust Relationships → Edit Trust Relationships
      • Edit the trust relationship JSON, replacing YOUR_ACCOUNT_NUMBER with your numerical AWS account number, to read:
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Service": ""
              },
              "Action": "sts:AssumeRole"
            },
            {
              "Effect": "Allow",
              "Principal": {
                "AWS": "arn:aws:iam::YOUR_ACCOUNT_NUMBER:role/service-role/smgt-ner-post-labeling-task-lambda-role"
              },
              "Action": "sts:AssumeRole"
            }
          ]
        }

      • Save the trust relationship
  • Return to the new Ground Truth job in the previous AWS Management Console window: under Task Category, select Custom
  • Select Next
  • Select Worker types: Private
  • Select the Private team: ner-worker-team that was created in the preceding section
  • In the Custom labeling task setup text area, clear the default content and paste in the content of the worker-template.liquid.html file obtained earlier
  • Specify the Pre-labeling task Lambda function with the previously created function: smgt-ner-pre-labeling-task-lambda
  • Specify the Post-labeling task Lambda function with the function created earlier: smgt-ner-post-labeling-task-lambda
  • Select Create

Annotate documents

Once the Ground Truth job is created, we can start annotating documents. Open the worker portal for the workforce created earlier (in the AWS Management Console, navigate to SageMaker → Ground Truth → Labeling workforces, select the Private tab, and open the Labeling portal sign-in URL).

Sign in and select the first labeling task in the table, and then select “Start working” to open the annotator. Perform your annotations and select submit on all three of the sample documents.

Review results

As Ground Truth annotators complete tasks, results will be available in the output S3 bucket:


Once all tasks for a labeling job are complete, the consolidated output is available in the output.manifest file located here:


This output manifest is a JSON-lines file with one annotated text document per line in the “Output Document Format” specified previously. This file is compatible with the “Input Document Format”, and it can be fed directly into a subsequent Ground Truth job for another round of annotation. Alternatively, it can be parsed and sent to an ML training job. Some scenarios where we might employ a second round of annotations are:

  • Breaking the annotation process into two steps where the first annotator identifies entity annotations and the second annotator draws relationships
  • Taking a sample of our output.manifest and sending it to a second, more experienced annotator for review as a quality control check
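
Parsing the output for a training job can be sketched in a few lines. The example below assumes, as described above, that each line of output.manifest is one document in the Output Document Format; the function names are our own, and the tuple layout is just one convenient shape for downstream training code.

```python
import json


def read_output_manifest(path):
    """Yield one annotated document per non-empty line of a JSON-lines manifest."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def linked_pairs(document):
    """Yield (source text, source label, target text, target label) per relationship.

    Documents that the annotator rejected (meta.rejected) produce no pairs.
    """
    if document.get("meta", {}).get("rejected"):
        return
    entities = {ann["id"]: ann for ann in document.get("entityAnnotations", [])}
    for rel in document.get("relationshipAnnotations", []):
        source = entities[rel["sourceEntityAnnotationId"]]
        target = entities[rel["targetEntityAnnotationId"]]
        yield source["text"], source["label"], target["text"], target["label"]


# Hypothetical annotated document, for illustration only.
annotated = {
    "text": "our beds were quite uncomfortable",
    "entityAnnotations": [
        {"id": "e1", "start": 4, "end": 8, "text": "beds", "label": "beds"},
        {"id": "e2", "start": 20, "end": 33, "text": "uncomfortable", "label": "negative"},
    ],
    "relationshipAnnotations": [
        {"sourceEntityAnnotationId": "e1", "targetEntityAnnotationId": "e2", "label": "describes"},
    ],
    "meta": {"rejected": False},
}
pairs = list(linked_pairs(annotated))
```

Each resulting tuple pairs an aspect span with the sentiment span it was linked to, which is the raw material for an aspect-based sentiment training set.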

Custom Ground Truth Annotation Templates

The NER annotation tool described in this document is implemented as a custom Ground Truth annotation template. AWS customers can build their own custom annotation interfaces using the instructions found here:


By working together, the customer and the Amazon MLSL were able to develop a powerful text annotation tool that is capable of creating complex named-entity recognition and relationship annotations.

We encourage AWS customers with an NER text annotation use case to try the tool described in this post. If you’d like help accelerating the use of ML in your products and services, please contact the Amazon Machine Learning Solutions Lab.

About the Authors

Dan Noble is a Software Development Engineer at Amazon where he helps build delightful user experiences. In his spare time, he enjoys reading, exercising, and having adventures with his family.

Pri Nonis is a Deep Learning Architect at the Amazon ML Solutions Lab, where he works with customers across various verticals, helping them accelerate their cloud migration journey and solve their ML problems using state-of-the-art solutions and technologies.

Niharika Jayanthi is a Front End Engineer at AWS, where she develops custom annotation solutions for Amazon SageMaker customers. Outside of work, she enjoys going to museums and working out.

Amit Beka is a Machine Learning Manager at, with over 15 years of experience in software development and machine learning. He is fascinated with people and languages, and how computers are still puzzled by both.

Optimize your inference jobs using dynamic batch inference with TorchServe on Amazon SageMaker

In deep learning, batch processing refers to feeding multiple inputs into a model. Although it’s essential during training, it can be very helpful to manage the cost and optimize throughput during inference time as well. Hardware accelerators are optimized for parallelism, and batching helps saturate the compute capacity and often leads to higher throughput.

Batching can be helpful in several scenarios during model deployment in production. Here we broadly categorize them into two use cases:

  • Real-time applications where several inference requests are received from different clients and are dynamically batched and fed to the serving model. Latency is usually important in these use cases.
  • Offline applications where several inputs or requests are batched on the client side and sent to the serving model. Higher throughput is often the objective for these use cases, which helps manage the cost. Example use cases include video analysis and model evaluation.

Amazon SageMaker provides two popular options for your inference jobs. For real-time applications, SageMaker Hosting uses TorchServe as the backend serving library that handles the dynamic batching of the received requests. For offline applications, you can use SageMaker batch transform jobs. In this post, we go through an example of each option to help you get started.

Because TorchServe is natively integrated with SageMaker via the SageMaker PyTorch inference toolkit, you can easily deploy a PyTorch model onto TorchServe using SageMaker Hosting. There may also be times when you need to customize your environment further using custom Docker images. In this post, we first show how to deploy a real-time endpoint using the native SageMaker PyTorch inference toolkit and configure the batch size to optimize throughput. In the second example, we demonstrate how to use a custom Docker image to set advanced TorchServe options that aren’t available as environment variables to optimize your batch inference job.

Best practices for batch inference

Batch processing can increase throughput and optimize your resources because it helps complete a larger number of inferences in a certain amount of time at the expense of latency. To optimize model deployment for higher throughput, the general guideline is to increase the batch size until throughput decreases. This most often suits offline applications, where several inputs are batched (such as video frames, images, or text) to get prediction outputs.

For real-time applications, latency is often a main concern. Larger batches raise throughput but also raise latency, so you may need to tune the batch size to meet your latency SLA. In terms of best practices on the cloud, the cost per a certain number of inferences is a helpful guideline in making an informed decision that meets your business needs. One contributing factor in managing the cost is choosing the right accelerator. For more information, see Choose the best AI accelerator and model compilation for computer vision inference with Amazon SageMaker.

TorchServe dynamic batching on SageMaker

TorchServe is the native PyTorch library for serving models in production at scale. It’s a joint development from Facebook and AWS. TorchServe allows you to monitor, add custom metrics, support multiple models, scale up and down the number of workers through secure management APIs, and provide inference and explanation endpoints.

To support batch processing, TorchServe provides a dynamic batching feature. It aggregates the received requests within a specified time frame, batches them together, and sends the batch for inference. The received requests are processed through the handlers in TorchServe. TorchServe has several default handlers, and you’re welcome to author a custom handler if your use case isn’t covered. When using a custom handler, make sure that the batch inference logic has been implemented in the handler. An example of a custom handler with batch inference support is available on GitHub.

You can configure dynamic batching using two settings, batch_size and max_batch_delay, either through environment variables in SageMaker or through the TorchServe configuration file (if using a custom container). TorchServe dispatches a batch as soon as either limit is reached, whichever comes first: the maximum batch size (batch_size) or the maximum time to wait for a batch of requests to fill (max_batch_delay).
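
How the two settings interact can be pictured with a small simulation. This is a simplified model written for illustration, not TorchServe’s implementation: it assumes the delay window starts when the first request of a batch arrives, and the function name and arrival times are made up.

```python
def simulate_dynamic_batching(arrival_ms, batch_size, max_batch_delay_ms):
    """Group request arrival times (in ms) into batches.

    A batch is dispatched when it reaches batch_size requests, or when
    max_batch_delay_ms has passed since its first request arrived.
    """
    batches = []
    current = []
    window_start = None
    for t in sorted(arrival_ms):
        # The delay expired before this request arrived: dispatch what we have.
        if current and t - window_start >= max_batch_delay_ms:
            batches.append(current)
            current = []
        if not current:
            window_start = t
        current.append(t)
        # The batch is full: dispatch immediately.
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


arrivals = [0, 10, 20, 30, 200, 210]

# A long delay lets the first batch fill up to batch_size.
print(simulate_dynamic_batching(arrivals, batch_size=4, max_batch_delay_ms=100))
# prints: [[0, 10, 20, 30], [200, 210]]

# A short delay dispatches smaller batches sooner.
print(simulate_dynamic_batching(arrivals, batch_size=4, max_batch_delay_ms=15))
# prints: [[0, 10], [20, 30], [200, 210]]
```

The second run shows the latency side of the trade-off: a shorter delay means requests wait less, at the cost of smaller batches and lower accelerator utilization.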

With the TorchServe integration with SageMaker, you can deploy PyTorch models natively on SageMaker by defining a SageMaker PyTorch model. You can add custom model loading, inference, and preprocessing and postprocessing logic in a script passed as an entry point to the SageMaker PyTorch model (see the following example code). Alternatively, you can use a custom container to deploy your models. For more information, see The SageMaker PyTorch Model Server.

You can set the batch size for PyTorch models on SageMaker through environment variables. If you choose to use a custom container, you can bundle the settings with your model when packaging it for TorchServe. The following code snippet shows an example of how to set the batch size using environment variables and how to deploy a PyTorch model on SageMaker:

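import sagemaker
from sagemaker.pytorch.model import PyTorchModel

# Assumption: variable names as read by the SageMaker PyTorch inference toolkit; values are illustrative.
env_variables_dict = {"SAGEMAKER_TS_BATCH_SIZE": "3", "SAGEMAKER_TS_MAX_BATCH_DELAY": "100"}

# model_artifact (S3 URI) and role (IAM role ARN) must be defined first; entry point and versions are placeholders.
pytorch_model = PyTorchModel(
    model_data=model_artifact, role=role, entry_point="inference.py",
    framework_version="1.9", py_version="py38", env=env_variables_dict,
)

predictor = pytorch_model.deploy(initial_instance_count=1, instance_type="ml.c5.2xlarge", serializer=sagemaker.serializers.JSONSerializer(), deserializer=sagemaker.deserializers.BytesDeserializer())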

In the code snippet, model_artifact refers to all the required files for loading back the trained model, which are archived in a .tar file and pushed into an Amazon Simple Storage Service (Amazon S3) bucket. The entry point script is similar to the TorchServe custom handler; it has several functions that you can override to accommodate the model initialization, preprocessing and postprocessing of received requests, and inference logic.

The following notebook shows a full example of deploying a Hugging Face BERT model.

If you need a custom container, you can build a custom container image and push it to an Amazon Elastic Container Registry (Amazon ECR) repository. The model artifact in this case can be a TorchServe .mar file that bundles the model artifacts along with the handler. We demonstrate this in the next section, where we use a SageMaker batch transform job.

SageMaker batch transform job

For offline use cases where requests are batched from a data source such as a dataset, SageMaker provides batch transform jobs. These jobs enable you to read data from an S3 bucket and write the results to a target S3 bucket. For more information, see Use Batch Transform to Get Inferences from Large Datasets. A full example of batch inference using batch transform jobs can be found in the following notebook, where we use a machine translation model from the FLORES competition. In this example, we show how to use a custom container to score our model using SageMaker. Using a custom inference container allows you to further customize your TorchServe configuration. In this example, we want to change the batch size and disable JSON decoding, which we can do through the TorchServe configuration file.

When using a custom handler for TorchServe, we need to make sure that the handler implements the batch inference logic. Each handler can have custom functions to perform preprocessing, inference, and postprocessing. An example of a custom handler with batch inference support is available on GitHub.
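
To make the batch requirement concrete, here is a minimal standalone sketch of the preprocess, inference, and postprocess split for a batch of requests. It deliberately does not inherit from TorchServe's base handler so that it runs without TorchServe installed; the class name, the "text" field, and the token-counting "model" are assumptions for illustration. The key points are that the handler receives a list with one entry per request, and must return one response per request in the same order.

```python
import json


class BatchTokenCountHandler:
    """Standalone sketch of a handler that processes a whole batch at once."""

    def preprocess(self, requests):
        # Each request carries its payload under "data" or "body".
        texts = []
        for request in requests:
            payload = request.get("data") or request.get("body")
            if isinstance(payload, (bytes, bytearray)):
                payload = payload.decode("utf-8")
            texts.append(json.loads(payload)["text"])  # "text" is an assumed field
        return texts

    def inference(self, texts):
        # Stand-in for a single batched forward pass over all inputs.
        return [len(text.split()) for text in texts]

    def postprocess(self, outputs):
        # Return exactly one response per request, in request order.
        return [{"num_tokens": n} for n in outputs]

    def handle(self, requests, context=None):
        return self.postprocess(self.inference(self.preprocess(requests)))


handler = BatchTokenCountHandler()
batch = [{"body": b'{"text": "hello world"}'}, {"data": '{"text": "one two three"}'}]
print(handler.handle(batch))
```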

We use our custom container to bundle the model artifacts with the handler as we do in TorchServe (making a .mar file). We also need an entry point to the Docker container that starts TorchServe with the batch size and JSON decoding set in the configuration file. We demonstrate this in the example notebook.

The SageMaker batch transform job requires access to the input files from an S3 bucket, where it divides the input files into mini batches and sends them for inference. Consider the following points when configuring the batch transformation job:

  • Place the input files (such as a dataset) in an S3 bucket and set it as a data source in the job settings.
  • Assign an S3 bucket in which to save the results of the batch transform job.
  • Set BatchStrategy to MultiRecord and SplitType to Line if you need the batch transform job to make mini batches from the input file. If it can’t automatically split the dataset into mini batches, you can divide it into mini batches by putting each batch in a separate input file, placed in the data source S3 bucket.
  • Make sure that the batch size fits into memory. SageMaker usually handles this automatically; however, when dividing batches manually, you need to tune the batch size based on the available memory.
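
When you divide the input manually, the splitting logic can be as simple as packing line-delimited records into payloads that stay under a size limit. The following is a minimal sketch of that idea, not SageMaker's own splitting code; the function name make_mini_batches and the byte limit are hypothetical.

```python
def make_mini_batches(lines, max_payload_bytes):
    """Pack newline-delimited records into payloads of at most max_payload_bytes."""
    batches, current, size = [], [], 0
    for line in lines:
        record = line.rstrip("\n")
        record_size = len(record.encode("utf-8")) + 1  # +1 for the newline separator
        if record_size > max_payload_bytes:
            raise ValueError("A single record exceeds the payload limit")
        # Start a new mini batch when adding this record would exceed the limit.
        if current and size + record_size > max_payload_bytes:
            batches.append("\n".join(current))
            current, size = [], 0
        current.append(record)
        size += record_size
    if current:
        batches.append("\n".join(current))
    return batches


records = ['{"a":1}', '{"a":2}', '{"a":3}']
for payload in make_mini_batches(records, max_payload_bytes=16):
    print(repr(payload))
```

Each returned payload can then be written to its own input file in the data source S3 bucket.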

The following code is an example for a batch transform job:

import time

s3_bucket_name = 'sagemaker-us-west-2-XXXXXXXX'
batch_input = f"s3://{s3_bucket_name}/folder/jobname_TorchServe_SageMaker/"
batch_output = f"s3://{s3_bucket_name}/folder/jobname_TorchServe_SageMaker_output/"

batch_job_name = 'job-batch' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

# model_name must refer to a model that already exists in SageMaker.
request = {
    "ModelClientConfig": {
        "InvocationsTimeoutInSeconds": 3600,
        "InvocationsMaxRetries": 1,
    },
    "TransformJobName": batch_job_name,
    "ModelName": model_name,
    "BatchStrategy": "MultiRecord",
    "TransformOutput": {"S3OutputPath": batch_output, "AssembleWith": "Line", "Accept": "application/json"},
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": batch_input}
        },
        "SplitType": "Line",
        "ContentType": "application/json",
    },
    "TransformResources": {"InstanceType": "ml.p2.xlarge", "InstanceCount": 1},
}

When we use the preceding settings and launch our transform job, it reads the input files from the source S3 bucket in batches and sends them for inference. The results are written back to the S3 bucket specified for the output.

The following code snippet shows how to create and launch a job using the preceding settings:


import time
import boto3

sm = boto3.client("sagemaker")
sm.create_transform_job(**request)

# Poll the job status until it completes or fails.
while True:
    response = sm.describe_transform_job(TransformJobName=batch_job_name)
    status = response["TransformJobStatus"]
    if status == "Completed":
        print("Transform job ended with status: " + status)
        break
    if status == "Failed":
        message = response["FailureReason"]
        print("Transform failed with the following error: {}".format(message))
        raise Exception("Transform job failed")
    print("Transform job is still in status: " + status)
    time.sleep(30)


In this post, we reviewed the two modes SageMaker offers for online and offline inference. The former uses dynamic batching provided in TorchServe to batch the requests from multiple clients. The latter uses a SageMaker transform job to batch the requests from input files in an S3 bucket and run inference.

We also showed how to serve models on SageMaker using native SageMaker PyTorch inference toolkit container images, and how to use custom containers for use cases that require advanced TorchServe configuration settings.

As TorchServe continues to evolve to address the needs of the PyTorch community, new features are integrated into SageMaker to provide performant ways for serving models in production. For more information, check out the TorchServe GitHub repo and the SageMaker examples.

About the Authors

Phi Nguyen is a solutions architect at AWS helping customers with their cloud journey with a special focus on data lakes, analytics, semantic technologies, and machine learning. In his spare time, you can find him biking to work, coaching his son’s soccer team, or enjoying nature walks with his family.

Nikhil Kulkarni is a software developer with AWS Machine Learning, focusing on making machine learning workloads more performant on the cloud and is a co-creator of AWS Deep Learning Containers for training and inference. He’s passionate about distributed Deep Learning Systems. Outside of work, he enjoys reading books, fiddling with the guitar and making pizza.

Hamid Shojanazeri is a Partner Engineer at PyTorch working on OSS high-performance model optimization and serving. Hamid holds a Ph.D. in computer vision and worked as a researcher in multimedia labs in Australia and Malaysia, and as an NLP lead. He likes to find simpler solutions to hard problems and is an art enthusiast in his spare time.

Geeta Chauhan leads AI Partner Engineering at Meta AI with expertise in building resilient, anti-fragile, large-scale distributed platforms for startups and Fortune 500s. Her team works with strategic partners, machine learning leaders across the industry, and all major cloud service providers to build and launch new AI product services and experiences, and to take PyTorch models from research to production. She is a winner of Women in IT – Silicon Valley – CTO of the year 2019, an ACM Distinguished Speaker, and a thought leader on topics including Ethics in AI, Deep Learning, Blockchain, and IoT. She is passionate about promoting the use of AI for Good.


Graph-based recommendation system with Neptune ML: An illustration on social network link prediction challenges


Recommendation systems are one of the most widely adopted machine learning (ML) technologies in real-world applications, ranging from social networks to ecommerce platforms. Users of many online systems rely on recommendation systems to make new friendships, discover new music according to suggested music lists, or even make ecommerce purchase decisions based on the recommended products. In social networks, one common use case is to recommend new friends to a user based on the user’s other connections. Users with common friends likely know each other, so a recommendation system should score them higher as suggested connections if they aren’t connected yet.

Social networks can naturally be expressed in a graph, where the nodes represent people, and the connections between people, such as friendship or co-workers, are represented by edges. The following illustrates one such social network. Let’s imagine that we have a social network with the members (nodes) Bill, Terry, Henry, Gary, and Alistair. Their relationships are represented by a link (edge), and each person’s interests, such as sports, arts, games, and comics, are represented by node properties.

The objective here is to predict if there is a potential missing link between members. For example, should we recommend a connection between Henry and Terry? Looking at the graph, we can see that they have two mutual friends, Gary and Alistair. Therefore, there is a good chance that Henry and Terry either already knew each other or may get to know each other soon. How about Henry and Bill? They don’t have any mutual friends, but they do have some weak connection through their friends’ connections. In addition, they both have similar interests in arts, comics, and games. Should we promote this connection? All of these questions and intuitions are the core logic of social network recommendation systems.

One possible way to do this is recommending relationships based on graph exploration. In graph query languages, such as Apache TinkerPop Gremlin, implementing rule sets such as counting common friends is relatively easy, and such a rule can be used to determine the link between Henry and Terry. However, these rule sets become very complicated when we want to account for other attributes such as node properties and connection strength. Let’s imagine a rule set to determine the link between Henry and Bill. This rule set must account for their common interests and their weak connections through certain paths in the graph. To increase robustness, we might also need to add a distance factor to favor strong connections and penalize the weak ones. Similarly, we would want a factor to favor common interests. Soon, the rule sets that can reveal complex hidden patterns will become impossible to enumerate.
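
To show how simple the common-friends rule is, and how little it captures, here is a minimal sketch in plain Python rather than Gremlin. The edge list is an assumption made for illustration: it is chosen to match the description above (Henry and Terry share Gary and Alistair; Henry and Bill share no friends), but it is not taken from the original figure.

```python
# Assumed friendships, stored as an undirected adjacency map.
friends = {
    "Henry": {"Gary", "Alistair"},
    "Gary": {"Henry", "Terry"},
    "Alistair": {"Henry", "Terry"},
    "Terry": {"Gary", "Alistair", "Bill"},
    "Bill": {"Terry"},
}


def common_friends(a, b):
    """Return the set of friends that a and b share."""
    return friends[a] & friends[b]


def rank_candidates(user):
    """Rank people who aren't yet friends with user by number of common friends."""
    candidates = [p for p in friends if p != user and p not in friends[user]]
    scored = [(p, len(common_friends(user, p))) for p in candidates]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


print(rank_candidates("Henry"))
```

Under this rule, Bill scores zero for Henry even though they share three interests, which is exactly the kind of signal that a hand-written rule set struggles to incorporate.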

ML technology lets us discover hidden patterns by learning algorithms. One example is XGBoost, which is widely used for classification or regression tasks. However, algorithms such as XGBoost use a conventional ML approach based on a tabular data format. These approaches aren’t optimized for graph data structures, and they require complex feature engineering to cope with these data patterns.

In the preceding social network example, the graph interaction information is critical to improving the recommendation accuracy. Graph Neural Network (GNN) is a deep learning (DL) framework that can be applied to graph data to perform edge-level, node-level, or graph-level prediction tasks. GNNs can leverage individual node characteristics as well as graph structure information when learning the graph representation and underlying patterns. Therefore, in recent years, GNN-based methods have set new standards on many recommender system benchmarks. See more detailed information in recent research papers: A Comprehensive Survey on Graph Neural Networks and Graph Learning based Recommender Systems: A Review.

The following is one famous example of such a use case. Researchers and engineers at Pinterest have trained Graph Convolutional Neural Networks for Web-Scale Recommender Systems, called PinSage, with three billion nodes representing pins and boards, and 18 billion edges. PinSage generates high-quality embeddings that represent pins (visual bookmarks to online content). These can be used for a wide range of downstream recommendation tasks, such as nearest-neighbor lookups in the learned embedding space for content discovery and recommendations.

In this post, we will walk you through how to use GNNs for recommendation use cases by casting this as a link prediction problem. We’ll also illustrate how Neptune ML can facilitate implementation. We will also provide sample code on GitHub to train your first GNN with Neptune ML, and make recommendation inferences on the demo graph through link prediction tasks.

Link prediction with Graph Neural Networks

Considering the previous social network example, we would like to recommend new friends to Henry. Both Terry and Bill would be good candidates. Terry has more common friends (Gary, Alistair) with Henry but no common interests, whereas Bill shares common interests (arts, comics, games) with Henry but no common friends. Which one would be a better recommendation? When framed as a link prediction problem, the task is to assign a score to any possible link between two nodes. The higher the link score, the more likely the recommendation is to result in a connection. By learning link structures already present in the graph, a link prediction model can generalize to new link predictions that ‘complete’ the graph.

The parameters of the function f that predicts the link score are learned during the training phase. Since the function f makes a prediction for any two nodes in the graph, the feature vectors associated with the nodes are essential to the learning process. To predict the link score between Henry and Bill, we have a set of raw data features (arts, comics, games) that can represent Henry and Bill. We transform these, along with the connections in the graph, using a GNN to form new representations known as node embeddings. We can also supplement or replace the initial raw features with vectors from an embedding lookup table that can be learned during the training process. Ideally, the embedded features for Henry and Bill should represent their interests as well as their topological information from the graph.

How GNNs work

A GNN transforms the initial node features to node embeddings by using a technique called message passing. The message passing process is illustrated in the following figure. In the beginning, the node attributes or features are converted into numerical attributes. In our case, we do one-hot encoding of the categorical features (Henry’s interests: arts, comics, games). Then, the first layer of the GNN aggregates all of the neighbors’ (Gary and Alistair) raw features (in black) to form a new set of features (in yellow). A common approach is to apply a linear transformation to all of the neighboring features, aggregate them through a normalized sum, and pass the result into a non-linear activation function, such as ReLU, to generate a new vector set. The following figure illustrates how message passing works for node Henry. However, the GNN message passing algorithm computes representations for all of the graph nodes. These are later used as the input features for the second layer.

The second layer of a GNN repeats the same process. It takes the previously computed feature (in yellow) from the first layer as input, aggregates all of Gary and Alistair’s neighbors’ new embedded features, and generates second layer feature vectors for Henry (in orange). As you can see, by repeating the message passing mechanism, we extended the feature aggregation to 2-hop neighbors. In our illustration, we limit ourselves to 2-hop neighbors, but extending into 3-hop neighbors can be done in the same way by adding another GNN layer.
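
The two rounds of message passing described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the model that Neptune ML or DGL trains: the neighbor lists and one-hot features are assumed toy values, the weight matrix is the identity so the arithmetic is easy to follow, and each layer uses only the neighbors' features (no self-connection).

```python
def matvec(weight, vec):
    """Multiply a weight matrix (list of rows) by a feature vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def gnn_layer(neighbors, features, weight):
    """One round of message passing: linear transform, normalized sum, ReLU."""
    new_features = {}
    for node, nbrs in neighbors.items():
        # Linear transformation of each neighbor's current features.
        messages = [matvec(weight, features[n]) for n in nbrs]
        # Normalized sum: average the messages over the neighbors.
        summed = [
            sum(m[i] for m in messages) / max(len(nbrs), 1)
            for i in range(len(weight))
        ]
        # Non-linear activation (ReLU).
        new_features[node] = [max(0.0, v) for v in summed]
    return new_features


# Assumed toy graph and one-hot interests over [sports, arts, games, comics].
neighbors = {
    "Henry": ["Gary", "Alistair"],
    "Gary": ["Henry", "Terry"],
    "Alistair": ["Henry", "Terry"],
    "Terry": ["Gary", "Alistair", "Bill"],
    "Bill": ["Terry"],
}
features = {
    "Henry": [0, 1, 1, 1],
    "Gary": [1, 0, 0, 0],
    "Alistair": [1, 0, 1, 0],
    "Terry": [1, 0, 0, 0],
    "Bill": [0, 1, 1, 1],
}
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

layer1 = gnn_layer(neighbors, features, identity)  # carries 1-hop information
layer2 = gnn_layer(neighbors, layer1, identity)    # carries 2-hop information
print(layer1["Henry"], layer2["Henry"])
```

After the first layer, Henry's vector depends only on Gary and Alistair. After the second layer, it also reflects Terry's features, because Terry is a neighbor of both Gary and Alistair. In a trained GNN, the weight matrices are learned rather than fixed.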

The final embeddings from Henry and Bill (in orange) are used for computing the score. During the training process, the link score is defined as 1 when the edge exists between the two nodes (positive sample), and as 0 when the edges between the two nodes don’t exist (negative sample). Then, the error or loss between the actual score and the prediction f(e1,e2) is back-propagated into previous layers to adjust the weights. Once the training is finished, we can rely on the embedded feature vectors for each node to compute their link scores with our function f.
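
As a concrete illustration of the scoring and loss described above, the following sketch assumes that f is the sigmoid of the dot product of the two embeddings and that the loss is binary cross-entropy. Both are common choices but are assumptions here; the scoring function used in practice may differ.

```python
import math


def link_score(e1, e2):
    """Score a candidate link as the sigmoid of the embeddings' dot product."""
    dot = sum(a * b for a, b in zip(e1, e2))
    return 1.0 / (1.0 + math.exp(-dot))


def link_loss(score, label):
    """Binary cross-entropy between the predicted score and the 0/1 label."""
    eps = 1e-12  # avoids log(0)
    return -(label * math.log(score + eps) + (1 - label) * math.log(1 - score + eps))


# Hypothetical embeddings for two nodes.
henry = [0.5, 0.5, 0.5, 0.5]
bill = [1.0, 0.0, 1.0, 0.5]

score = link_score(henry, bill)
print("score:", round(score, 3))
print("loss if the edge exists:", round(link_loss(score, 1), 3))
print("loss if the edge doesn't exist:", round(link_loss(score, 0), 3))
```

During training, the gradient of this loss is what gets back-propagated through the GNN layers to adjust their weights.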

In this example, we simplified the learning task on a homogeneous graph, where all of the nodes and edges are of the same type. For example, all of the nodes in the graph are the “People” type, and all of the edges are the “friends with” type. However, the learning algorithm also supports heterogeneous graphs with different node and edge types. We can extend the previous use case to recommend products to different users that share similar interactions and interests. See more details in this research paper: Modeling Relational Data with Graph Convolutional Networks.

At AWS re:Invent 2020, we introduced Amazon Neptune ML, which lets our customers train ML models on graph data, without necessarily having deep ML expertise. In this example, with the help of Neptune ML, we will show you how to build your own recommender system on graph data.

Train your Graph Convolution Network with Amazon Neptune ML

Neptune ML uses graph neural network technology to automatically create, train, and deploy ML models on your graph data. Neptune ML supports common graph prediction tasks, such as node classification and regression, edge classification and regression, and link prediction.

It is powered by:

  • Amazon Neptune: a fast, reliable, and fully managed graph database, which is optimized for storing billions of relationships and querying the graph with millisecond latency. Amazon Neptune supports three open standards for building graph applications: Apache TinkerPop Gremlin, RDF SPARQL, and openCypher. Learn more at Overview of Amazon Neptune Features.
  • Amazon SageMaker: a fully managed service that provides every developer and data scientist with the ability to prepare, build, train, and deploy ML models quickly.
  • Deep Graph Library (DGL): an open-source, high-performance, and scalable Python package for DL on graphs. It provides fast and memory-efficient message passing primitives for training Graph Neural Networks. Neptune ML uses DGL to automatically choose and train the best ML model for your workload. This enables you to make ML-based predictions on graph data in hours instead of weeks.

The easiest way to get started with Neptune ML is to use the AWS CloudFormation quickstart template. The template installs all of the necessary components, including a Neptune DB cluster, and sets up the network configurations, IAM roles, and associated SageMaker notebook instance with pre-populated notebook samples for Neptune ML.

The following figure illustrates different steps for Neptune ML to train a GNN-based recommendation system. Let’s zoom in on each step and explore what it involves:

  1. Data export configuration

The first step in our Neptune ML process is to export the graph data from the Neptune cluster. We must specify the parameters and model configuration for the data export task. We use the Neptune workbench for all of the configurations and commands. The workbench lets us work with the Neptune DB cluster using Jupyter notebooks hosted by Amazon SageMaker. In addition, it provides a number of magic commands in the notebooks that save a great deal of time and effort. Here is our example of export parameters:

export_params = {
    "command": "export-pg",
    "params": {
        "endpoint": neptune_host,
        "profile": "neptune_ml",
        "cloneCluster": False,
    },
    "outputS3Path": f"{s3_bucket_uri}/neptune-export",
    "additionalParams": {
        "neptune_ml": {
            "version": "v2.0",
            "targets": [
                {
                    "edge": ["User", "FRIEND", "User"],
                    "type": "link_prediction",
                }
            ],
            "features": [
                {
                    "node": "User",
                    "property": "interests",
                    "type": "category",
                    "separator": " ;",
                }
            ],
        }
    },
    "jobSize": "small",
}

In export_params, we must configure the basic setup, such as the Neptune cluster and the output Amazon Simple Storage Service (Amazon S3) path for exported data storage. The configuration specified in additionalParams defines the type of ML task to perform. In this example, the link prediction target is optionally restricted to a particular edge type (User-FRIEND-User). If no target type is specified, then Neptune ML assumes that the task is link prediction. The parameters also specify details about the data stored in our graph and how the ML model will interpret that data (we have “User” as the node, and “interests” as the node property).

To run each step in the ML building process, simply use Neptune workbench commands. The Neptune workbench contains a line magic and a cell magic that can save you a lot of time managing these steps. To run the data export, use the Neptune workbench command: %neptune_ml export start

Once the export job completes, we will have the Neptune graph exported into CSV format and stored in an S3 bucket. There will be two types of files: nodes.csv and edges.csv. A file named training-data-configuration.json will also be generated which has the configuration needed for Neptune ML to perform model training.

See Export data from Neptune for Neptune ML for more information.

  2. Data preprocessing

Neptune ML performs feature extraction and encoding as part of the data-processing steps. Common types of property pre-processing include: encoding categorical features through one-hot encoding, bucketing numerical features, or using word2vec to encode a string property or other free-form text property values.

In our example, we will simply use the property “interests”. Neptune ML encodes the values as multi-categorical. However, if a categorical value is complex (more than three words per node), then Neptune ML infers the property type to be text and uses the text_word2vec encoding.
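
To show what a multi-categorical encoding looks like, here is a minimal sketch that splits each property value on a separator and builds a multi-hot vector per node. It illustrates the idea only and is not Neptune ML's implementation; the function name multi_hot and the sample values are hypothetical.

```python
def multi_hot(values, separator=";"):
    """Encode separator-delimited category strings as multi-hot vectors."""
    # Build a sorted vocabulary of all categories seen across the values.
    vocab = sorted(
        {item.strip() for v in values for item in v.split(separator) if item.strip()}
    )
    index = {category: i for i, category in enumerate(vocab)}
    rows = []
    for v in values:
        row = [0] * len(vocab)
        for item in v.split(separator):
            item = item.strip()
            if item:
                row[index[item]] = 1
        rows.append(row)
    return vocab, rows


vocab, rows = multi_hot(["arts;comics;games", "sports", "arts;games"])
print(vocab)
for row in rows:
    print(row)
```

Each column corresponds to one interest, and a node with several interests has several columns set to 1.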

To run data preprocessing, use the following Neptune notebook magic command: %neptune_ml dataprocessing start

At the end of this step, a DGL graph is generated from the exported dataset for use by the model training step. Neptune ML automatically tunes the model with Hyperparameter Optimization Tuning jobs defined in training-data-configuration.json. We can download and modify this file to tune the model’s hyperparameters, such as batch-size, num-hidden, num-epochs, dropout, etc. Here is a sample configuration.json file.

See Processing the graph data exported from Neptune for training for more information.

  3. Model training

The next step is the automated training of the GNN model. The model training is done in two stages. The first stage uses a SageMaker Processing job to generate a model training strategy. This is a configuration set that specifies what type of model and model hyperparameter ranges will be used for the model training.

Then, a SageMaker hyperparameter tuning job will be launched. The SageMaker Hyperparameter Tuning Optimization job runs a pre-specified number of model training job trials on the processed data, tries different hyperparameter combinations according to the model-hpo-configuration.json file, and stores the model artifacts generated by the training in the output Amazon S3 location.

To start the training step, you can use the %neptune_ml training start command.

Once all of the training jobs are complete, the Hyperparameter tuning job will save the artifacts from the best performing model, which will be used for inference.

At the end of the training, Neptune ML will instruct SageMaker to save the trained model, the raw embeddings calculated for the nodes and edges, and the mapping information between the embeddings and node indices.

See Training a model using Neptune ML for more information.

  4. Create an inference endpoint in Amazon SageMaker

Now that the graph representation is learned, we can deploy the learned model behind an endpoint to perform inference requests. The model input will be the User for which we need to generate friends’ recommendations, along with the edge type, and the output will be the list of the likely recommended friends for that user.

To deploy the model to the SageMaker endpoint instance, use the %neptune_ml endpoint create command.

  5. Query the ML model using Gremlin

Once the endpoint is ready, we can use it for graph inference queries. Neptune ML supports graph inference queries in Gremlin or SPARQL. In our example, we can now check the friends recommendation with Neptune ML on User “Henry”. It requires nearly the same syntax to traverse the edge, and it lists the other Users that are connected to Henry through the FRIEND connection.

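g.with("Neptune#ml.endpoint", "<your-endpoint-name>").
    V().hasLabel('User').has('name', 'Henry').
    out('FRIEND').with("Neptune#ml.prediction").hasLabel('User').values('name')
1 Bill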

Neptune#ml.prediction returns the connection determined by Neptune ML predictions by using the model that we just trained on the social graph. Bill is returned, as we expected.

Here is another sample prediction query that is used to predict the top eight users that are most likely to connect with Henry:

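g.with("Neptune#ml.endpoint", "<your-endpoint-name>").
    with("Neptune#ml.limit",8).V().hasLabel('User').has('name', 'Henry').
    out('FRIEND').with("Neptune#ml.prediction").hasLabel('User').values('name')

1 Bill, 2 Colin, 3 Sarah, 4 Gordon, 5 Mary, 6 Josie, 7 Arnold, 8 Terry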

The results are ranked from the strongest connection to the weakest, and the links Henry-FRIEND-Colin and Henry-FRIEND-Terry are also proposed. These propositions come from graph-based ML, which can explore complex interaction patterns in the graph.

See Gremlin inference queries in Neptune ML for more information.

Model transform or retraining when graph data changes

Another question you might ask is: what if my social network changes, or if I want to make recommendations for newly added users? In these scenarios, where you have continuously changing graphs, you may need to update ML predictions with the newest graph data. The model artifacts generated by training are directly tied to the training graph. This means that the inference endpoint must be updated once the entities in the original training graph change.

However, you don’t need to retrain the whole model to make predictions on the updated graph. With an incremental model inference workflow, you only need to export the Neptune DB data, perform an incremental data preprocessing, run a model batch transform job, and then update the inference endpoint. The model-transform step takes the trained model from the main workflow and the results of the incremental data preprocessing step as inputs. Then it outputs a new model artifact to use for inference. This new model artifact is created from the up-to-date graph data.

The model-transform step deserves special attention. It can compute model artifacts on graph data that was not used for model training. The node embeddings are re-computed, and any existing node embeddings are overridden. Neptune ML applies the learned GNN encoder from the previously trained model to the new graph data nodes with their new features. Therefore, the new graph data must be processed using the same feature encodings, and it must adhere to the same graph schema as the original graph data. See more Neptune ML implementation details at Generating new model artifacts.

Moreover, you can retrain the whole model if the graph changes dramatically, or if the previously trained model can no longer accurately represent the underlying interactions. In this case, reusing the learned model parameters on a new graph cannot guarantee similar model performance, so you must retrain your model on the new graph. To accelerate the hyperparameter search, Neptune ML can leverage information from the previous model training task with a warm start: the results of previous training jobs are used to select good combinations of hyperparameters to search over in the new tuning job.

See workflows for handling evolving graph data for more details.


In this post, you have seen how Neptune ML and GNNs can help you make recommendations on graph data using a link prediction task by combining information from the complex interaction patterns in the graph.

Link prediction is one way of implementing a recommendation system on a graph. You can construct your recommender in many other ways. You can use the embeddings learned during link prediction training to cluster the nodes into different segments in an unsupervised manner, and recommend items to those belonging to the same segment. Furthermore, you can obtain the embeddings and feed them into a downstream similarity-based recommendation system as an input feature. This additional input feature also encodes the semantic information derived from the graph and can provide significant improvements to the overall precision of the system. Learn more about Amazon Neptune ML by visiting the website or feel free to ask questions in the comments!

About the Authors

Yanwei Cui, PhD, is a Machine Learning Specialist Solutions Architect at AWS. He started machine learning research at IRISA (Research Institute of Computer Science and Random Systems), and has several years of experience building artificial intelligence powered industrial applications in computer vision, natural language processing and online user behavior prediction. At AWS, he shares the domain expertise and helps customers to unlock business potentials, and to drive actionable outcomes with machine learning at scale. Outside of work, he enjoys reading and traveling.

Will Badr is a Principal AI/ML Specialist SA who works as part of the global Amazon Machine Learning team. Will is passionate about using technology in innovative ways to positively impact the community. In his spare time, he likes to go diving, play soccer and explore the Pacific Islands.


Secure access to Amazon SageMaker Studio with AWS SSO and a SAML application


Cloud security at AWS is the highest priority. Amazon SageMaker Studio offers various mechanisms to protect your data and code using integration with AWS security services like AWS Identity and Access Management (IAM), AWS Key Management Service (AWS KMS), or network isolation with Amazon Virtual Private Cloud (Amazon VPC).

Customers in highly regulated industries, like financial services, can set up Studio in VPC only mode to enable network isolation and disable internet access from Studio notebooks. You can use IAM integration with Studio to control which users have access to resources like Studio notebooks, the Studio IDE, or Amazon SageMaker training jobs.

A popular use case is to restrict access to the Studio IDE to only users from inside a specified network CIDR range or a designated VPC. You can achieve this by implementing IAM identity-based SageMaker policies and attaching those policies to the IAM users or groups that require those permissions. However, the SageMaker domain must be configured with IAM authentication mode, because the IAM identity-based policies aren’t supported in AWS Single Sign-On (SSO) authentication mode.

Many customers use AWS SSO to enable centralized workforce identity control and provide a consistent user sign-in experience. This post shows how to implement this use case while keeping AWS SSO capabilities to access Studio.

Solution overview

When you set up a SageMaker domain in VPC-only mode and specify the subnets and security groups, SageMaker creates elastic network interfaces (ENIs) that are associated with your security groups in the specified subnets. ENIs allow your training containers to connect to resources in your VPC.

In this mode, the direct internet access from notebooks is completely disabled, and all the traffic is routed through an ENI in your private VPC. This also includes traffic from Studio UI widgets and interfaces—such as experiment management, autopilot, and model monitor—to their respective backend SageMaker APIs. AWS recommends using VPC only mode to exercise fine-grained control on network access of Studio.

The first challenge is that even though Studio is deployed with no internet connectivity, Studio IDE can still be accessed from anywhere, assuming access to the AWS Management Console and Studio is granted to an IAM principal. This situation isn’t acceptable if you want to fully isolate Studio from a public network and contain all communication within a tightly controlled private VPC.

To address this challenge and disable any access to the Studio IDE except from a designated VPC or a CIDR range, you can use the CreatePresignedDomainUrl SageMaker API. The IAM role or user used to call this API defines the permissions to access Studio. Now you can use IAM identity-based policies to implement the desired access configuration. For example, to enable access only from a designated VPC, add the following condition to the IAM policy associated with the IAM principal that is used to generate a presigned domain URL:

"Condition": {
    "StringEquals": {
        "aws:SourceVpc": "vpc-111bbaaa"
    }
}

To enable access only from a designated VPC endpoint or endpoints, specify the following condition:

"Condition": {
                "ForAnyValue:StringEquals": {
                    "aws:sourceVpce": [

Use the following condition to restrict access from a designated CIDR range:

"Condition": {
                "IpAddress": {
                    "aws:SourceIp": [

The second challenge is that IAM-based access control works only when the SageMaker domain is configured in IAM authentication mode; you can’t use it when the SageMaker domain is deployed in AWS SSO mode. The next section shows how to address these challenges and implement IAM-based access control with AWS SSO access to Studio.

Architecture overview

Studio is published as a SAML application, which is assigned to a specific SageMaker Studio user profile. Users can conveniently access Studio directly from the AWS SSO portal, as shown in the following screenshot.

The solution integrates with a custom SAML 2.0 application as the mechanism to trigger the user authentication for Studio. It requires that the custom SAML application is configured with the Amazon API Gateway endpoint URL as its Assertion Consumer Service (ACS), and needs mapping attributes containing the AWS SSO user ID as well as the SageMaker domain ID.

The API Gateway endpoint calls an AWS Lambda function that parses the SAML response to extract the domain ID and user ID and uses them to generate a Studio presigned URL. The Lambda function then redirects the user via an HTTP 302 response to sign in to Studio.

An IAM policy controls the network environment that Studio users are allowed to log in from, which includes restricting conditions as described in the previous section. This IAM policy is attached to the Lambda function execution role. The IAM policy contains a permission to call the sagemaker:CreatePresignedDomainUrl API for a specific user profile only:

    "Version": "2012-10-17",
    "Statement": [
            "Action": [
            "Resource": "arn:aws:sagemaker: <Region>:<Account_id>
            "Effect": "Allow"
            "Condition": {
                "NotIpAddress": {
                    "aws:VpcSourceIp": ""
            "Action": [
            "Resource": "arn:aws:sagemaker: <Region>:<Account_id>
            "Effect": "Deny"

The following diagram shows the solution architecture.

The solution deploys a SageMaker domain into your private VPC and VPC endpoints to access Studio, SageMaker runtime, and the SageMaker API via a private connection without need for an internet gateway. The VPC endpoints are configured with private DNS enabled (PrivateDnsEnabled=True) to associate a private hosted zone with your VPC. This enables Studio to access the SageMaker API using the default public DNS name api.sagemaker.<Region> resolved to the private IP address of the endpoint rather than using the VPC endpoint URL.

You need to add VPC endpoints to your VPC if you want to access any other AWS services like Amazon Simple Storage Service (Amazon S3), Amazon Elastic Container Registry (Amazon ECR), AWS Security Token Service (AWS STS), AWS CloudFormation, or AWS CodeCommit.

You can fully control permissions used to generate the presigned URL and any other API calls with IAM policies attached to the Lambda function execution role or control access to any used AWS service via VPC endpoint policies. For examples of using IAM policies to control access to Studio and SageMaker API, refer to Control Access to the SageMaker API by Using Identity-based Policies.
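
As an illustration of the execution role policy described earlier in this section, the following Python sketch assembles an Allow statement for sagemaker:CreatePresignedDomainUrl on one user profile, plus a Deny statement that applies whenever the request does not originate from an authorized CIDR range. The Region, account ID, domain ID, user profile name, and CIDR range are hypothetical placeholders, and the helper name build_presigned_url_policy is an assumption for this example rather than part of the solution's code.

```python
import json


def build_presigned_url_policy(region, account_id, domain_id, user_profile, allowed_cidrs):
    """Return an IAM policy document (as a dict) scoped to one Studio user profile."""
    # ARN of a single Studio user profile; all values here are placeholders.
    resource = (
        f"arn:aws:sagemaker:{region}:{account_id}:"
        f"user-profile/{domain_id}/{user_profile}"
    )
    action = ["sagemaker:CreatePresignedDomainUrl"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Allow generating a presigned URL for this user profile only.
            {"Action": action, "Resource": resource, "Effect": "Allow"},
            # Explicitly deny the call when the VPC source IP is outside the allowed ranges.
            {
                "Condition": {"NotIpAddress": {"aws:VpcSourceIp": allowed_cidrs}},
                "Action": action,
                "Resource": resource,
                "Effect": "Deny",
            },
        ],
    }


# Hypothetical example values.
policy = build_presigned_url_policy(
    "eu-west-1", "111122223333", "d-example12345", "example-user", ["10.0.0.0/16"]
)
print(json.dumps(policy, indent=4))
```

Because an explicit Deny overrides an Allow in IAM policy evaluation, the second statement blocks the call from any other network even if another policy grants the action.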

Although the solution requires the Studio domain to be deployed in IAM mode, it does allow for AWS SSO to be used as the mechanism for end users to log in to Studio.

The following subsections contain detailed descriptions of the main solution components.

API Gateway

The API Gateway endpoint acts as the target for the application ACS URL configured in the custom SAML 2.0 application. The endpoint is private, and has a resource called /saml and a POST method with integration request configured as Lambda proxy. The solution uses a VPC endpoint with a configured com.amazonaws.<region>.execute-api DNS name to call this API endpoint from within the VPC.


A custom SAML 2.0 application is configured with the API Gateway endpoint URL https:/{ restapi-id} as its application ACS URL, and uses attribute mappings with the following requirements:

  • User identifier:
    • User attribute in the application – user name
    • Maps user attribute in AWS SSO – ${user:AD_GUID}
  • SageMaker domain ID identifier:
    • User attribute in the application – domain-id
    • Maps user attribute in AWS SSO – Domain ID for the Studio instance

The application implements the access control for an AWS SSO user by provisioning a Studio user profile with the name equal to the AWS SSO user ID.

Lambda function

The solution configures a Lambda function as an invocation point for the API Gateway /saml resource. The function parses the SAMLResponse sent by AWS SSO, extracts the domain-id and the user name, and calls the CreatePresignedDomainUrl SageMaker API to retrieve the Studio URL and token. It then redirects the user to log in using an HTTP 302 response. The Lambda function has a specific IAM policy attached to its execution role that allows the sagemaker:CreatePresignedDomainUrl action only when it’s requested from a specific network CIDR range using the VpcSourceIp condition.

The Lambda function doesn’t have any logic to validate the SAML response, for example to check a signature. However, because the API Gateway endpoint serving as the ACS is private or internal only, it’s not mandatory for this proof of concept environment.
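
The flow described in this section can be sketched roughly as follows. This is a minimal illustration, not the code from the solution's repository: the attribute names username and domain-id, the one-hour session duration, and the optional sagemaker_client parameter (added here so the handler can be exercised without AWS access) are assumptions. Like the function described above, the sketch does not validate the SAML signature.

```python
import base64
import urllib.parse
import xml.etree.ElementTree as ET

SAML_NS = {"saml2": "urn:oasis:names:tc:SAML:2.0:assertion"}
USER_ATTRIBUTE = "username"     # assumed attribute name in the SAML application
DOMAIN_ATTRIBUTE = "domain-id"  # assumed attribute name in the SAML application


def extract_saml_attributes(saml_response_b64):
    """Decode a base64 SAMLResponse and return {attribute name: first value}."""
    root = ET.fromstring(base64.b64decode(saml_response_b64))
    attributes = {}
    for attr in root.iterfind(".//saml2:Attribute", SAML_NS):
        value = attr.find("saml2:AttributeValue", SAML_NS)
        if value is not None and value.text:
            attributes[attr.get("Name")] = value.text.strip()
    return attributes


def lambda_handler(event, context, sagemaker_client=None):
    """Handle the ACS POST (Lambda proxy integration) and redirect to Studio."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    form = urllib.parse.parse_qs(body)  # the ACS POST body is form-encoded
    if "SAMLResponse" not in form:
        return {"statusCode": 400, "body": "Missing SAMLResponse"}

    # NOTE: no signature validation is performed in this sketch.
    attrs = extract_saml_attributes(form["SAMLResponse"][0])
    domain_id = attrs.get(DOMAIN_ATTRIBUTE)
    user_name = attrs.get(USER_ATTRIBUTE)
    if not domain_id or not user_name:
        return {"statusCode": 400, "body": "Missing SAML attributes"}

    if sagemaker_client is None:
        import boto3  # imported lazily so the parsing logic runs without boto3
        sagemaker_client = boto3.client("sagemaker")

    response = sagemaker_client.create_presigned_domain_url(
        DomainId=domain_id,
        UserProfileName=user_name,
        SessionExpirationDurationInSeconds=3600,  # assumed session length
    )
    # Redirect the browser to the presigned Studio URL.
    return {"statusCode": 302, "headers": {"Location": response["AuthorizedUrl"]}}
```

If the execution role policy denies the call (for example, because the request came from outside the authorized CIDR range), create_presigned_domain_url raises an error; a fuller implementation would catch it and return an explicit unauthorized response.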

Deploy the solution

The GitHub repository provides the full source code for the end-to-end solution.

To deploy the solution, you must have administrator (or power user) permissions for an AWS account, and you must install the AWS Command Line Interface (AWS CLI), the AWS SAM CLI, and Python 3.8 or later.

The solution supports deployment to three AWS Regions: eu-west-1, eu-central-1, and us-east-1. Make sure you select one of these Regions for deployment.

To start testing the solution, you must complete the following deployment steps from the solution’s GitHub README file:

  1. Set up AWS SSO if you don’t have it configured.
  2. Deploy the solution using the SAM application.
  3. Create a new custom SAML 2.0 application.

After you complete the deployment steps, you can proceed with the solution test.

Test the solution

The solution simulates two use cases to demonstrate the usage of AWS SSO and SageMaker identity-based policies:

  • Positive use case – A user accesses Studio from within a designated CIDR range through a VPC endpoint
  • Negative use case – A user accesses Studio from a public IP address

To test these use cases, the solution created three Amazon Elastic Compute Cloud (Amazon EC2) instances:

  • Private host – An EC2 Windows instance in a private subnet that is able to access Studio (your on-premises secured environment)
  • Bastion host – An EC2 Linux instance in the public subnet used to establish an SSH tunnel into the private host on the private network
  • Public host – An EC2 Windows instance in a public subnet to demonstrate that the user can’t access Studio from an unauthorized IP address

Test Studio access from an authorized network

Follow these steps to perform the test:

  1. To access the EC2 Windows instance on the private network, run the command provided as the value of the SAM output key TunnelCommand. Make sure that the private key of the key pair specified in the parameter is in the directory where the SSH tunnel command runs from. The command creates an SSH tunnel from the local computer on localhost:3389 to the EC2 Windows instance on the private network. See the following example code:
    ssh -i sso-username.pem -A -N -L localhost:3389: ec2-user@

  2. On your local desktop or notebook, open a new RDP connection (for example using Microsoft Remote Desktop) using localhost as the target remote host. This connection is tunneled via the bastion host to the private EC2 Windows instance. Use the user name Administrator and password from the stack output SageMakerWindowsPassword.
  3. Open the Firefox web browser from the remote desktop.
  4. Navigate and log in to the AWS SSO portal using the credentials associated with the user name that you specified as the ssoUserName parameter.
  5. Choose the SageMaker Secure Demo AWS SSO application from the AWS SSO portal.

You’re redirected to the Studio IDE in a new browser window.

Test Studio access from an unauthorized network

Now follow these steps to simulate access from an unauthorized network:

  1. Open a new RDP connection on the IP provided in the SageMakerWindowsPublicHost SAM output.
  2. Open the Firefox web browser from the remote desktop.
  3. Navigate and log in to the AWS SSO portal using the credentials associated with the user name that was specified as the ssoUserName parameter.
  4. Choose the SageMaker Secure Demo AWS SSO application from the AWS SSO portal.

This time you receive an unauthorized access message.

Clean up

To avoid charges, you must remove all solution-provisioned and manually created resources from your AWS account. Follow the instructions in the solution’s README file.


We demonstrated that by introducing a middleware authentication layer between the end user and Studio, we can control the environment from which a user is allowed to access Studio and explicitly block every other, unauthorized environment.

To further tighten security, you can add an IAM policy to a user role to prevent access to Studio from the console. If you use AWS Organizations, you can implement the following service control policy for the organizational units or accounts that need access to Studio:

  "Version": "2012-10-17",
  "Statement": [
      "Action": [
      "Resource": "*",
      "Effect": "Allow"
      "Condition": {
        "NotIpAddress": {
          "aws:VpcSourceIp": "<Authorized CIDR>"
      "Action": [
      "Resource": "*",
      "Effect": "Deny"

Although the solution described in this post uses API Gateway and Lambda, you can explore other ways such as an EC2 instance with an instance role using the same permission validation workflow as described or even an independent system to handle user authentication and authorization and generate a Studio presigned URL.

Further reading

Securing access to Studio is an active research topic, and there are other relevant posts on similar approaches. Refer to the following posts on the AWS Machine Learning Blog to learn more about other services and architectures you can use:

About the Authors

Jerome Bachelet is a Solutions Architect at Amazon Web Services. He thrives on helping customers get the most value out of AWS to achieve their business objectives. Jerome has over 10 years of experience working with data protection and data security solutions. Besides being in the cloud, Jerome enjoys travels and quality time with his wife and 2 daughters in the Geneva, Switzerland area.

Yevgeniy Ilyin is a Solutions Architect at AWS. He has over 20 years of experience working at all levels of software development and solutions architecture and has used programming languages from COBOL and Assembler to .NET, Java, and Python. He develops and codes cloud native solutions with a focus on big data, analytics, and data engineering.

Industrial automation at Tyson with computer vision, AWS Panorama, and Amazon SageMaker

This is the first in a two-part blog series on how Tyson Foods, Inc., is utilizing Amazon SageMaker and AWS Panorama to automate industrial processes at their meat packing plants by bringing the benefits of artificial intelligence applications at the edge. In part one, we discuss an inventory counting application for packaging lines. In part two, we discuss a vision-based anomaly detection solution at the edge for predictive maintenance of industrial equipment.

As one of the largest processors and marketers of chicken, beef, and pork in the world, Tyson Foods, Inc., is known for bringing innovative solutions to their production and packing plants. In Feb 2020, Tyson announced its plan to bring Computer Vision (CV) to its chicken plants and launched a pilot with AWS to pioneer efforts on inventory management. Tyson collaborated with Amazon ML Solutions Lab to create a state-of-the-art chicken tray counting CV solution that provides real-time insights into packed inventory levels. In this post, we provide an overview of the AWS architecture and a complete walkthrough of the solution to demonstrate the key components in the tray counting pipeline set up at Tyson’s plant. We will focus on the data collection and labeling, training, and deploying of CV models at the edge using Amazon SageMaker, Apache MXNet Gluon, and AWS Panorama.

Operational excellence is a key priority at Tyson Foods. Tyson employs strict quality assurance (QA) measures in their packaging lines, ensuring that only those packaged products that pass their quality control protocols are shipped to its customers. In order to meet customer demand and to stay ahead of any production issue, Tyson closely monitors packed chicken tray counts. However, current manual techniques to count chicken trays that pass QA are not accurate and do not present a clear picture of over/under production levels. Alternate strategies, such as monitoring hourly total weight of production per rack, do not provide immediate feedback to the plant employees. With a chicken processing capacity of 45,000,000 head per week, production accuracy and efficiency are critical to Tyson’s business. CV can be effectively used in such scenarios to accurately estimate the amount of chicken processed in real time, empowering employees to identify potential bottlenecks in packaging and production lines as they occur. This enables implementation of corrective measures and improves production efficiency.

Streaming on-premises video to the cloud and processing it there for CV applications requires high network bandwidth and provisioning of relevant infrastructure, which can be cost-prohibitive. AWS Panorama removes these requirements and enables Tyson to process video streams at the edge on the AWS Panorama Appliance. It reduces latency to and from the cloud and bandwidth costs, while providing an easy-to-use interface for managing devices and applications at the edge.

Object detection is one of the most commonly used CV algorithms that can localize the position of objects in images and videos. This technology is currently being used in various real-life applications such as pedestrian spotting in autonomous vehicles, detecting tumors in medical scans, and people counting systems to monitor footfall in retail spaces, amongst others. It is also crucial for inventory management use cases, such as meat tray counting for Tyson, where it helps reduce waste by creating a feedback loop with production processes, saves costs, and supports on-time delivery of customer shipments.

The following sections of this blog post outline how we use live-stream videos from one of the Tyson Foods plants to train an object detection model using Amazon SageMaker. We then deploy it at the edge with the AWS Panorama device.

AWS Panorama

AWS Panorama is a machine learning (ML) appliance that allows organizations to bring CV to on-premises cameras to make predictions locally with high accuracy and low latency. The AWS Panorama Appliance is a hardware device that allows you to run applications that use ML to collect data from video streams, output video with text and graphical overlays, and interact with other AWS services. The appliance can run multiple CV models against multiple video streams in parallel and output the results in real time. It is designed for use in commercial and industrial settings.

The AWS Panorama Appliance enables you to run self-contained CV applications at the edge, without sending images to the AWS Cloud. You can also use the AWS SDK on the AWS Panorama Appliance to integrate with other AWS services and use them to track data from the application over time. To build and deploy applications, you use the AWS Panorama Application CLI. The CLI is a command line tool that generates default application folders and configuration files, builds containers with Docker, and uploads assets.

AWS Panorama supports models built with Apache MXNet, DarkNet, GluonCV, Keras, ONNX, PyTorch, TensorFlow, and TensorFlow Lite. Refer to this blog post to learn more about building applications on AWS Panorama. During the deployment process AWS Panorama takes care of compiling the model specific to the edge platform through Amazon SageMaker Neo compilation. The inference results can be routed to AWS services such as Amazon S3, Amazon CloudWatch or integrated with on-premise line-of-business applications. The deployment logs are stored in Amazon CloudWatch.

To track any change in inference script logic or trained model, one can create a new version of the application. Application versions are immutable snapshots of an application’s configuration. AWS Panorama saves previous versions of your applications so that you can roll back updates that aren’t successful, or run different versions on different appliances.

For more information, refer to the AWS Panorama page. To learn more about building sample applications, refer to AWS Panorama Samples.


A plant employee continuously fills plastic bins with packed chicken trays and stacks them over time, as shown in the preceding figure. We want to be able to detect and count the total number of trays across all the bins stacked vertically.

A trained object detection model can predict bounding boxes of all the trays placed in a bin at every video frame. This can be used to gauge tray counts in a bin at a given instant. We also know that at any point in time, only one bin is being filled with packed trays; the tray counts continuously oscillate from high (during filling) to low (when a new bin obstructs the view of the filled bin).

With this knowledge, we adopt the following strategy to count total number of chicken trays:

  1. Maintain two different counters – local and global. Global counter maintains total trays binned and local counter stores maximum number of trays placed in a new bin.
  2. Update local counter as new trays are placed in the bin.
  3. Detect a new bin event in the following ways:
    1. The tray count in a given frame goes to zero. (or)
    2. The stream of tray numbers in the last n frames drops continuously.
  4. Once the new bin event is detected, add the local counter value to global counter.
  5. Reset local counter to zero.

We tested this algorithm on several hours of video and got consistent results.
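
The counting strategy above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Tyson's production code: the class name TrayCounter, the window size n = 3, and the interpretation of "drops continuously" as strictly decreasing counts are all choices made for this example. The sketch also adds one guard that the steps above do not mention: after a new bin event, it waits for the count to rise again before tracking the next bin, so the tail of a drop is not counted twice.

```python
from collections import deque


class TrayCounter:
    """Keeps a local (current bin) and a global (all bins) tray count."""

    def __init__(self, window=3):
        self.window = window                  # n: frames inspected for a continuous drop
        self.recent = deque(maxlen=window)    # last n per-frame tray counts
        self.local_count = 0                  # max trays seen in the current bin
        self.global_count = 0                 # trays in all completed bins
        self.awaiting_new_bin = False         # assumption: guard against double counting

    def _is_new_bin_event(self, count):
        if count == 0:                        # event (a): no trays visible
            return True
        if len(self.recent) < self.window:
            return False
        values = list(self.recent)            # event (b): strictly decreasing counts
        return all(a > b for a, b in zip(values, values[1:]))

    def update(self, count):
        """Feed the detected tray count for one frame; return the global count."""
        previous = self.recent[-1] if self.recent else None
        self.recent.append(count)
        if self.awaiting_new_bin:
            # Resume tracking only once the count starts rising again.
            if previous is not None and count > previous:
                self.awaiting_new_bin = False
                self.local_count = count
            return self.global_count
        if self.local_count > 0 and self._is_new_bin_event(count):
            self.global_count += self.local_count  # step 4: fold local into global
            self.local_count = 0                   # step 5: reset local counter
            self.awaiting_new_bin = True
        else:
            self.local_count = max(self.local_count, count)  # step 2
        return self.global_count


# Synthetic per-frame counts: a bin filled to 4 trays, then a bin filled to 3.
counter = TrayCounter(window=3)
for frame_count in [1, 2, 3, 4, 4, 3, 1, 0, 0, 1, 2, 3, 0]:
    counter.update(frame_count)
print(counter.global_count)  # 7
```

In practice the per-frame counts would come from the object detection model, and the window size would need tuning against detection noise.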

Training an object detection model with Amazon SageMaker

Dataset creation:

Capturing new images for labeling jobs

We collected image samples from the packaging line using the AWS Panorama Appliance. The script to process images and save them was packaged as an application and deployed on AWS Panorama. The application collects video frames from an on-premises camera set up near the packaging zone and saves them at 60-second intervals to an Amazon S3 bucket; this prevents capturing similar images in the video sequence that are a few seconds apart. We also mask out adjacent regions in the image that are not relevant for the use case.

We labeled the chicken trays with bounding boxes using Amazon SageMaker Ground Truth’s streaming labeling job. We also set up an Amazon S3 Event notification that publishes object-created events to an Amazon Simple Notification Service (SNS) topic, which acts as the input source for the labeling job. When the AWS Panorama application script saves an image to an S3 bucket, an event notification is published to the SNS topic, which then sends this image to the labeling job. As the annotators label every incoming image, Ground Truth saves the labels into a manifest file, which contains S3 path of the image as well as coordinates of chicken tray bounding boxes.
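
The S3-to-SNS wiring described above could be configured along the following lines with boto3. This is a sketch under assumptions: the topic ARN, the frames/ prefix, the .jpg suffix, and both helper names are hypothetical. Note that put_bucket_notification_configuration replaces the bucket's existing notification configuration, and the SNS topic's access policy must allow Amazon S3 to publish to it.

```python
def build_topic_notification(topic_arn, prefix="", suffix=".jpg"):
    """Build an S3 notification configuration that publishes object-created events to SNS."""
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})
    topic_config = {"TopicArn": topic_arn, "Events": ["s3:ObjectCreated:*"]}
    if rules:
        # Only notify for keys that match the prefix/suffix filters.
        topic_config["Filter"] = {"Key": {"FilterRules": rules}}
    return {"TopicConfigurations": [topic_config]}


def attach_notification(bucket, topic_arn, s3_client=None):
    """Apply the configuration to a bucket (overwrites existing notifications)."""
    if s3_client is None:
        import boto3  # imported lazily so the builder can be used without boto3
        s3_client = boto3.client("s3")
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_topic_notification(topic_arn, prefix="frames/"),
    )


# Hypothetical topic ARN, shown only to illustrate the resulting structure.
config = build_topic_notification(
    "arn:aws:sns:us-east-1:111122223333:example-labeling-input", prefix="frames/"
)
print(config)
```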

We perform several data augmentations (for example: random noise, random contrast and brightness, channel shuffle) on the labeled images to make the model robust to variations in real-life. The original and augmented images were combined to form a unified dataset.

Model Training:

Once the labeling job is completed, we manually trigger an AWS Lambda function. This Lambda function bundles images and their corresponding labels from the output manifest into an LST file. Our training and test files had images collected from different packaging lines to prevent any data leak in evaluation. The Lambda function then triggers an Amazon SageMaker training job.
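
To make the manifest-to-LST step more concrete, the following sketch converts one line of a Ground Truth bounding box output manifest into one LST record. It assumes the LST layout used in the GluonCV custom detection dataset tutorial (index, header values 4 and 5, image width and height, then class ID and normalized xmin, ymin, xmax, ymax per box, then the image path). The label attribute name tray-labels and the function name are hypothetical, and the post's actual Lambda function may differ.

```python
import json


def manifest_line_to_lst(line, index, label_attribute):
    """Convert one Ground Truth bounding box manifest line to a tab-separated LST record."""
    record = json.loads(line)
    label = record[label_attribute]
    size = label["image_size"][0]
    width, height = float(size["width"]), float(size["height"])

    # Header: index, A=4 (header length), B=5 (values per box), image width, image height.
    fields = [str(index), "4", "5", str(int(width)), str(int(height))]
    for box in label["annotations"]:
        # Ground Truth stores left/top/width/height in pixels; LST expects
        # corner coordinates normalized to [0, 1].
        xmin = box["left"] / width
        ymin = box["top"] / height
        xmax = (box["left"] + box["width"]) / width
        ymax = (box["top"] + box["height"]) / height
        fields.append(str(float(box["class_id"])))
        fields.extend(f"{value:.4f}" for value in (xmin, ymin, xmax, ymax))
    # Assumption: the image location is kept as-is; a real pipeline would
    # typically rewrite it to a path relative to the training image folder.
    fields.append(record["source-ref"])
    return "\t".join(fields)


# Hypothetical manifest line with a single tray annotation.
sample = json.dumps({
    "source-ref": "s3://example-bucket/frames/0001.jpg",
    "tray-labels": {
        "image_size": [{"width": 200, "height": 100, "depth": 3}],
        "annotations": [{"class_id": 0, "left": 20, "top": 10, "width": 100, "height": 50}],
    },
})
lst_record = manifest_line_to_lst(sample, 0, "tray-labels")
print(lst_record)
```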

We use SageMaker Script Mode, which allows you to bring your own training algorithms and directly train models while staying within the user-friendly confines of Amazon SageMaker. We train models such as SSD and YOLOv3 (for real-time inference latency) with various backbone network combinations from the GluonCV Model Zoo for object detection in script mode. Neural networks have the tendency to overfit training data, leading to poor out-of-sample results. GluonCV provides image normalization and image augmentations, such as randomized image flipping and cropping, to help reduce overfitting during training. The model training code is containerized and uses the Docker image in our Amazon Elastic Container Registry (Amazon ECR). The training job takes the S3 image folder and LST file paths as inputs and saves the best model artifact (.params and .json) to S3 upon completion.

Model Evaluation Pipeline

The top two models based on our test set were SSD-resnet50 and Yolov3-darknet53, with a mAP score of 0.91 each. We also performed real-world testing by deploying an inference application on the AWS Panorama device along with the trained model. The inference script saves the predictions and video frames to an Amazon S3 bucket. We created another SageMaker Ground Truth job for annotating ground truth and then performed additional quantitative model evaluation. The ground truth and predicted bounding box labels on images were saved in S3 for qualitative evaluation. The models were able to generalize on the real-world data and yielded consistent performance similar to that on our test set.

You can find full, end-to-end examples of creating custom training jobs, training state-of-the-art object detection models, implementing Hyperparameter Optimization (HPO), and model deployment on Amazon SageMaker on the AWS Labs GitHub repo.

Deploying meat-tray counting application

Production Architecture

Before deployment, we package all our assets – model, inference script, camera and global variable configuration into a single container as mentioned in this blog post. Our continuous integration and continuous deployment (CI/CD) pipeline updates any change in the inference script as a new application version. Once the new application version is published, we deploy it programmatically using boto3 SDK in Python.

Upon application deployment, AWS Panorama first creates an Amazon SageMaker Neo compilation job to compile the model for the AWS Panorama device. The inference application script imports the compiled model on the device and performs chicken-tray detection at every frame. In addition to SageMaker Neo compilation, we enabled post-training quantization by adding an os.environ['TVM_TENSORRT_USE_FP16'] = '1' flag in the script. This reduces the precision of the model weights from float32 to float16, halving the model size and improving latency without degradation in performance. The inference results are captured in AWS IoT SiteWise Monitor through MQTT messages from the AWS Panorama device via AWS IoT Core. The results are then pushed to Amazon S3 and visualized in Amazon QuickSight dashboards. The plant managers and employees can directly view these dashboards to understand the throughput of every packaging line in real time.


By combining AWS Cloud services such as Amazon SageMaker and Amazon S3 with edge services such as AWS Panorama, Tyson Foods, Inc., is applying artificial intelligence to automate human-intensive industrial processes such as inventory counting in its manufacturing plants. Real-time edge inference capabilities enable Tyson to identify over/under production and dynamically adjust their production flow to maximize efficiency. Furthermore, by owning the AWS Panorama device at the edge, Tyson is also able to save costs associated with expensive network bandwidth to transfer video files to the cloud and can now process all their video/image assets locally in their network.

This blog post provides you with an end-to-end edge application overview and reference architectures for developing a CV application with AWS Panorama. We discussed three aspects of building an edge CV application:

  1. Data: Data collection, processing and labeling using AWS Panorama and Amazon SageMaker Ground Truth.
  2. Model: Model training and evaluation using Amazon SageMaker and AWS Lambda
  3. Application Package: Bundling trained model, scripts and configuration files for AWS Panorama.

Stay tuned for part two of this series on how Tyson is using AWS Panorama for CV based predictive maintenance of industrial machines.

Click here to start your journey with AWS Panorama. To learn more about collaborating with ML Solutions Lab, see Amazon Machine Learning Solutions Lab.

About the Authors

Divya Bhargavi is a data scientist at the Amazon ML Solutions Lab where she works with customers across various verticals and applies creative problem solving to generate value for customers with state-of-the-art ML/AI solutions.

Dilip Subramaniam is a Senior Developer with the Emerging Technologies team at Tyson Foods. He is passionate about building large-scale distributed applications to solve business problems and simplify processes using his knowledge in Software Development, Machine Learning, and Big Data.

Develop an automatic review image inspection service with Amazon SageMaker

This is a guest post by Jihye Park, a Data Scientist at MUSINSA. 

MUSINSA is one of the largest online fashion platforms in South Korea, serving 8.4M customers and selling 6,000 fashion brands. Our monthly user traffic reaches 4M, and over 90% of our demographics consist of teens and young adults who are sensitive to fashion trends. MUSINSA is a trend-setting platform leader in the country, leading with massive amounts of data.

The MUSINSA Data Solution Team engages in everything related to data collected from the MUSINSA Store. We do full stack development from log collection to data modeling and model serving. We develop various data-based products, including the Live Product Recommendation Service on our app’s main page and the Keyword Highlighting Service that detects and highlights words such as ‘size’ or ‘satisfaction level’ from text reviews.

Challenges in the Automate Review Image Inspection Process

The quality and quantity of customer reviews are critical for ecommerce businesses, as customers make purchase decisions without seeing the products in person. We give credits to those who write image reviews on the products they purchased (that is, reviews with photos of the products or photos of them wearing/using the products) to enhance customer experience and increase the purchase conversion rate. To determine if the submitted photos meet our criteria for credits, all of the photos are inspected individually by humans. For example, our criteria state that a “Style Review” should contain photos featuring the whole body of a person wearing/using the product while a “Product Review” should provide a full shot of the product. The following images show examples of a Product Review and a Style Review. Uploaders’ consent has been granted for use of the photos.

Examples of Product Review

Examples of Style Review

Over 20,000 photos that require inspection are uploaded daily to the MUSINSA Store platform. The inspection process classifies images as ‘package’, ‘product’, ‘full-length’, or ‘half-length’. The image inspection process was completely manual, so it was extremely time-consuming, and classifications were often made differently by different individuals, even with the guidelines. Faced with this challenge, we used Amazon SageMaker to automate this task.

Amazon SageMaker is a fully managed service for building, training, and deploying machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows. It let us quickly implement the automated image inspection service with good results.

We will go into detail about how we addressed our problems using ML models and used Amazon SageMaker along the way.

Automation of the Review Image Inspection Process

The first step toward automating the Image Review Inspection process was to manually label images, thereby matching them to the appropriate categories and inspection criteria. For example, we classified images as a “full body shot,” “upper body shot,” “packaging shot,” “product shot,” etc. In the case of a Product Review, credits were given only for a product shot image. Likewise, in the case of a Style Review, credits were given for a full body shot.

As for image classification, we largely depended on a pre-trained convolutional neural network (CNN) model due to the sheer volume of input images required to train our model. While defining and categorizing meaningful features from images are both critical to training a model, an image can have a limitless number of features. Therefore, using the CNN model made the most sense, and we pre-trained our model with 10,000+ ImageNet datasets, then we used transfer learning. This meant that our model could be trained more effectively with our image labels later.

Image Collection with Amazon SageMaker Ground Truth

However, transfer learning had its own limitations, because a model must be newly trained on higher layers. This means that it constantly required input images. On the other hand, this method performed well and required fewer input images when trained on entire layers. It easily identified features from images from these layers because it had already been trained with a massive amount of data. At MUSINSA, our entire infrastructure runs on AWS, and we are storing customer-uploaded photos in Amazon Simple Storage Service (S3). We categorized these images into different folders based on the labels we defined, and we used Amazon SageMaker Ground Truth for the following reasons:

  1. More consistent results – In manual processes, a single inspector’s mistake could be fed into model training without any intervention. With SageMaker Ground Truth, we could have several inspectors review the same image and make sure that the inputs from the most trustworthy inspector were rated higher for image labeling, thus leading to more reliable results.
  2. Less manual work – SageMaker Ground Truth automated data labeling can be applied with a confidence score threshold so that any images that cannot be confidently machine-labelled are sent for human labeling. This ensures the best balance of cost and accuracy. More information is available in the Amazon SageMaker Ground Truth Developer Guide.
    Using this method, we reduced the number of manually classified images by 43%. The following table shows the number of images processed per iteration after we adopted Ground Truth (note that the training and validation data are accumulated data, while the other metrics are on a per-iteration basis).
    SageMaker Ground Truth Performance results
  3. Directly load results – When building models in SageMaker, we could load the resulting manifest files generated by SageMaker Ground Truth and use them for training.
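
As a rough illustration of the second and third points, the following sketch reads records in the JSON Lines layout that Ground Truth writes to its output manifest and separates confidently labeled images from those that would still need a human inspector. Ground Truth applies its own threshold internally during automated labeling, so this only mirrors the idea on the output side. The label attribute name `review-label`, the 0.9 threshold, and the sample records are assumptions made for the example, not values from our labeling job.

```python
import json

LABEL_ATTRIBUTE = "review-label"   # hypothetical label attribute name
CONFIDENCE_THRESHOLD = 0.9         # hypothetical threshold


def split_by_confidence(manifest_lines, threshold=CONFIDENCE_THRESHOLD):
    """Split manifest records into (accepted, needs_human_review)."""
    accepted, needs_review = [], []
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        metadata = record.get(f"{LABEL_ATTRIBUTE}-metadata", {})
        item = {
            "image": record["source-ref"],
            "label": metadata.get("class-name"),
            "confidence": metadata.get("confidence", 0.0),
        }
        if item["confidence"] >= threshold:
            accepted.append(item)
        else:
            needs_review.append(item)
    return accepted, needs_review


# Made-up records in the manifest layout (one JSON object per line).
sample_manifest = [
    json.dumps({"source-ref": "s3://example-bucket/a.jpg", "review-label": 0,
                "review-label-metadata": {"class-name": "full_body", "confidence": 0.97}}),
    json.dumps({"source-ref": "s3://example-bucket/b.jpg", "review-label": 2,
                "review-label-metadata": {"class-name": "package", "confidence": 0.55}}),
]
accepted, needs_review = split_by_confidence(sample_manifest)
print(len(accepted), "accepted;", len(needs_review), "sent for human review")
```

In a real job, the lines would be read from the manifest file that Ground Truth stores in Amazon S3 rather than from an in-memory list.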

In summary, categorizing 10,000 images required 22 inspectors five days and cost $980.

Development of Image Classification Model with Amazon SageMaker Studio

We needed to classify review images as full body shots, upper body shots, package shots, product shots, and products into applicable categories. To accomplish our goals, we considered two models: the ResNet-based SageMaker built-in model and the TensorFlow-based MobileNet. We tested both on the same test datasets and found that the SageMaker built-in model was more accurate, with a 0.98 F1 score vs 0.88 from the TensorFlow model. Therefore, we decided on the SageMaker built-in model.
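
For reference, the F1 score is the harmonic mean of precision and recall. The following sketch computes it for a single class from confusion counts. The counts are invented for illustration and are not our evaluation data, and a multi-class evaluation would additionally average the per-class scores in some way.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return the F1 score (harmonic mean of precision and recall)."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only.
score = f1_score(true_positives=90, false_positives=10, false_negatives=10)
print(round(score, 2))
```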

The SageMaker Studio-based model training process was as follows:

  1. Import labeled images from SageMaker Ground Truth
  2. Preprocess images – image resizing and augmenting
  3. Load the Amazon SageMaker built-in model as a Docker image
  4. Tune hyperparameters through grid search
  5. Apply transfer learning
  6. Re-tune parameters based on training metrics
  7. Save the model

SageMaker made it straightforward to train the model with just one click and without worrying about provisioning and managing a fleet of servers for training.

For hyperparameter tuning, we employed grid search to determine the optimal hyperparameter values, because the number of training layers (num_layers) and training cycles (epochs) during transfer learning affected our classification model accuracy.

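import pandas as pd
from sagemaker.analytics import TrainingJobAnalytics

epochs_list = [5, 10, 15]
num_layers_list = [18, 34, 50]
metric_df = pd.DataFrame()
# Assumption: `ic` is an already-configured sagemaker.estimator.Estimator for the
# built-in image classification algorithm (other required hyperparameters set),
# and `data_channels` holds its input channels; neither is shown in this excerpt.
for epochs in epochs_list:
    for num_layers in num_layers_list:
        # hyperparameter settings (the remaining settings are cut off in the original listing)
        ic.set_hyperparameters(num_layers=num_layers,
                               epochs=epochs,
                               image_shape="3,256,256")
        ic.fit(inputs=data_channels, logs=True)
        latest_job_name = ic.latest_training_job.job_name
        job_metric = TrainingJobAnalytics(training_job_name=latest_job_name).dataframe()
        job_metric['epochs'] = epochs
        job_metric['num_layers'] = num_layers
        metric_df = pd.concat([metric_df, job_metric])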
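
Once the grid search has run, the collected metrics can be reduced to the single best setting. The following is a minimal sketch that uses plain Python records in place of the pandas DataFrame. The `metric_name` and `value` fields follow the columns that TrainingJobAnalytics returns, and `validation:accuracy` is the metric that the built-in image classification algorithm reports, but the values themselves are invented for illustration.

```python
def best_setting(records, metric_name="validation:accuracy"):
    """Return ((epochs, num_layers), best_value) for the highest metric value."""
    best_key, best_value = None, float("-inf")
    for row in records:
        if row["metric_name"] != metric_name:
            continue
        if row["value"] > best_value:
            best_key = (row["epochs"], row["num_layers"])
            best_value = row["value"]
    return best_key, best_value


# Invented values; with pandas, rows like these could come from
# metric_df.to_dict("records").
records = [
    {"metric_name": "validation:accuracy", "value": 0.91, "epochs": 5, "num_layers": 18},
    {"metric_name": "validation:accuracy", "value": 0.96, "epochs": 10, "num_layers": 34},
    {"metric_name": "train:accuracy", "value": 0.99, "epochs": 15, "num_layers": 50},
]
setting, value = best_setting(records)
print(setting, value)
```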

Model Serving with SageMaker Batch Transform and Apache Airflow

The image classification model we built required ML workflows to determine if a review image was qualified for credits. We established workflows with the following four steps.

  1. Import review images and metadata that must be automatically reviewed
  2. Infer the labels of the images (inference)
  3. Determine if credits should be given based on the inferred labels
  4. Store the results table in the production database

We use Apache Airflow to manage data product workflows. It is a workflow scheduling and monitoring platform, originally developed at Airbnb, that is known for its simple and intuitive web UI graphs. It supports Amazon SageMaker, so code developed in SageMaker Studio can be migrated to Apache Airflow easily. There are two ways to run SageMaker jobs on Apache Airflow:

  1. Using Amazon SageMaker Operators
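  2. Using Python Operators: Write a Python function with the Amazon SageMaker Python SDK on Apache Airflow and import it as a callable parameter

import sagemaker
from airflow.operators.python import PythonOperator  # airflow.operators.python_operator on Airflow 1.x

def transform(dt, bucket, training_job, **kwargs):
    estimator = sagemaker.estimator.Estimator.attach(training_job)
    # The remaining arguments are cut off in the original listing; the instance type
    # and S3 locations below are placeholder assumptions.
    transformer = estimator.transformer(instance_count=1,
                                        instance_type="ml.m5.large",
                                        output_path=f"s3://{bucket}/output/{dt}")
    transformer.transform(f"s3://{bucket}/input/{dt}", content_type="application/x-image")

transform_op = PythonOperator(
        task_id="transform",  # assumed task name
        python_callable=transform,
        op_kwargs={"dt": dt,
                   "bucket": bucket,
                   "training_job": training_job})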

The second option let us keep the Python code that we already had in SageMaker Studio, and it didn’t require us to learn new syntax for Amazon SageMaker Operators.

However, we went through some trial and error, as it was our first time integrating Apache Airflow with Amazon SageMaker. The lessons we learned were:

  1. Boto3 update: Amazon SageMaker Python SDK version 2 required Boto3 1.14.12 or newer. Therefore, we needed to update the Boto3 version of our existing Apache Airflow environment, which was at 1.13.4.
  2. IAM Role and permission inheritance: AWS IAM roles used by Apache Airflow needed to inherit roles that could run Amazon SageMaker.
  3. Network configuration: To run SageMaker code from Apache Airflow, the SageMaker endpoints needed to be configured for network connections. The endpoints depend on the AWS Regions and services that you use. For more information, see the AWS website.
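
The first lesson can be turned into a small pre-flight check. The following sketch compares dotted version strings against the 1.14.12 minimum mentioned above. It assumes purely numeric version parts, and the helper names are ours for illustration. In a real Airflow environment you would pass in `boto3.__version__`.

```python
def version_tuple(version):
    """Convert a dotted version string such as '1.14.12' to a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum="1.14.12"):
    """Return True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("1.13.4"))    # the Boto3 version our Airflow environment had
print(meets_minimum("1.14.12"))   # the minimum the SDK required
```

Comparing tuples of integers rather than raw strings avoids the trap where "1.9.0" sorts after "1.14.12" alphabetically.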



By automating review image inspection processes, we gained the following business outcomes:

  1. Increased work efficiency – Currently, 76% of images in the categories where the service was applied are inspected automatically, with a 98% inspection accuracy.
  2. Consistency in giving credits – Credits are given based on clear criteria. However, there were occasions where credits were given differently for similar cases due to differences in inspectors’ judgments. The ML model applies the rules more consistently, which leads to higher consistency in applying our credit policies.
  3. Reduced human errors – Every human engagement carries a risk of human errors. For example, we had cases where Style Review criteria were used for Product Reviews. Our automatic inspection model dramatically reduced the risks of these human errors.

We gained the following benefits specifically by using Amazon SageMaker to automate the image inspection process:

  1. Established an environment where we can build and test models through modular processes – What we liked most about Amazon SageMaker is that it consists of modules. This lets us build and test services easily and quickly. We obviously needed some time to learn about Amazon SageMaker at first, but once learned, we could easily apply it in our operations. We believe that Amazon SageMaker is ideal for businesses requiring rapid service development, as in the case of the MUSINSA Store.
  2. Collect reliable input data with Amazon SageMaker Ground Truth – In ML, collecting input data is becoming more important than modeling itself. With the rapid advancement of ML, pre-trained models can perform much better than before, and without additional tuning. AutoML has also removed the need to write code for ML modeling. Therefore, the ability to collect quality input data is more important than ever, and using labeling services such as Amazon SageMaker Ground Truth is critical.


Going forward, we plan to automate not only model serving but also model training through automatic batches. We want our model to identify the optimal hyperparameters automatically when new labels or images are added. In addition, we will continue improving the performance of our model, namely recall and precision, based on the previously mentioned automated training method. We will increase our model coverage so that it can inspect more review images, reduce more costs, and achieve higher accuracy, which will all lead to higher customer satisfaction.

For more information about how to use Amazon SageMaker to solve your business problems using ML, visit the product webpage. And, as always, stay up to date with the latest AWS Machine Learning News here.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

About the Authors

Jihye Park is a Data Scientist at MUSINSA who is responsible for data analysis and modeling. She loves working with ubiquitous data such as ecommerce. Her main role is data modeling but she has interests in data engineering too.

Sungmin Kim is a Sr. Solutions Architect at Amazon Web Services. He works with startups to architect, design, automate, and build solutions on AWS for their business needs. He specializes in AI/ML and Analytics.

How ReliaQuest uses Amazon SageMaker to accelerate its AI innovation by 35x 

Cybersecurity continues to be a top concern for enterprises. Yet the constantly evolving threat landscape that they face makes it harder than ever to be confident in their cybersecurity protections.

To address this, ReliaQuest built GreyMatter, an Open XDR-as-a-Service platform that brings together telemetry from any security and business solution, whether on-premises or in one or multiple clouds, to unify detection, investigation, response, and resilience.

In 2021, ReliaQuest turned to AWS to help it enhance its artificial intelligence (AI) capabilities and build new features faster.

Using Amazon SageMaker, Amazon Elastic Container Registry (ECR), and AWS Step Functions, ReliaQuest reduced the time needed to deploy and test critical new AI capabilities for its GreyMatter platform from eighteen months to two weeks. This increased the speed of its AI innovation by 35x.

“This innovative architecture has dramatically decreased the time to value of ReliaQuest’s data science initiatives.

Now, we can truly focus on what’s most important – developing powerful solutions to further improve the security of our customer’s environments in an ever-changing threat landscape.”

Lauren Jenkins, Snr Product Manager, Data Science, ReliaQuest

Using AI to enhance the performance of human analysts

GreyMatter takes a fundamentally new approach to cybersecurity, pairing advanced software with a team of highly-trained security analysts to deliver drastically improved security effectiveness and efficiency.

Although ReliaQuest’s security analysts are some of the best-trained security talent in the industry, a single analyst may receive hundreds of new security incidents on any given day. These analysts must review each incident to determine the threat level and the optimal response method.

To streamline this process, and reduce time to resolution, ReliaQuest set out to develop an AI-driven recommendation system that automatically matches new security incidents to similar previous occurrences. This enhanced the speed with which human analysts can identify the incident type as well as the best next action.

Using Amazon SageMaker to put AI to work faster

ReliaQuest had developed an initial machine learning (ML) model, but it was missing the supporting infrastructure to utilize it.

To solve this, ReliaQuest’s Data Scientist, Mattie Langford, and ML Ops Engineer, Riley Rohloff, turned to Amazon SageMaker. SageMaker is an end-to-end ML platform that helps developers and data scientists quickly and easily build, train, and deploy ML models.

Amazon SageMaker accelerates the deployment of ML workloads by simplifying the ML build process. It provides a broad set of ML capabilities on top of fully-managed infrastructure. This removes the undifferentiated heavy lifting that too-often hinders ML development.

ReliaQuest chose SageMaker because of its built-in hosting feature, a key capability that enabled ReliaQuest to quickly deploy its initial pre-trained model onto fully-managed infrastructure.

ReliaQuest also used Amazon ECR, a fully managed container registry that makes it easy to store, manage, share, and deploy container images and artifacts anywhere, to store its pre-trained model images.

ReliaQuest chose Amazon ECR because of its native integration with Amazon SageMaker. This enabled it to serve custom model images for both training and predictions, the latter via a custom Flask application it had built.

Using Amazon SageMaker and Amazon ECR, a single ReliaQuest team developed, tested, and deployed its pre-trained model behind a managed endpoint quickly and efficiently, without needing to hand-off to or depend on other teams for support.

Using AWS Step Functions to automatically retrain and improve model performance

In addition, ReliaQuest was able to build an entire orchestration layer for their ML workflow using AWS Step Functions, a low-code visual workflow service that can orchestrate AWS services, automate business processes, and enable serverless applications.

ReliaQuest chose AWS Step Functions because of its deep functionality and integration with other AWS services. This enabled ReliaQuest to build a fully automated learning loop for its model, including:

  • a trigger that looked for updated data in an S3 bucket
  • a full retraining process that created a new training job with the updated data
  • a performance assessment of that training job
  • pre-defined accuracy thresholds to determine whether to update the deployed model through a new endpoint configuration.
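
The last step of such a loop comes down to a simple decision rule. The following is a minimal sketch under stated assumptions: the post does not describe ReliaQuest's actual thresholds or comparison logic, so the 0.90 minimum and the rule that a new model must also beat the deployed one are hypothetical. In Step Functions, a check like this would typically sit in a Lambda function whose result feeds a Choice state.

```python
def should_update_endpoint(new_accuracy, deployed_accuracy, minimum_accuracy=0.90):
    """Decide whether a retrained model should replace the deployed one.

    Hypothetical rule: the new model must clear a fixed accuracy floor
    and outperform the model that is currently deployed.
    """
    return new_accuracy >= minimum_accuracy and new_accuracy > deployed_accuracy


print(should_update_endpoint(new_accuracy=0.95, deployed_accuracy=0.93))
print(should_update_endpoint(new_accuracy=0.92, deployed_accuracy=0.94))
```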

Using AWS to increase innovation and reimagine cybersecurity protection

By combining Amazon SageMaker, Amazon ECR, and AWS Step Functions, ReliaQuest was able to reduce the time needed to deploy and test valuable new AI capabilities from eighteen months to two weeks, an acceleration of 35x in its new feature deployment.

Not only do these new capabilities continue to enhance GreyMatter’s continuous threat detection, threat hunting, and remediation capabilities for its customers, but they also give ReliaQuest a step-change improvement in its ability to test and deploy new capabilities in the future.

In the complex landscape of cybersecurity threats, ReliaQuest’s use of AI to enhance its human analysts will continue to improve their effectiveness. Furthermore, its accelerated innovation capabilities will enable it to continue helping its customers stay ahead of the rapidly evolving threats that they face.

Learn more about how you can accelerate your ability to innovate with AI by visiting Getting Started with Amazon SageMaker or reviewing the Amazon SageMaker Developer Resources today.

About the Author

Daniel Burke is the European lead for AI and ML in the Private Equity group at AWS. In this role, Daniel works directly with Private Equity funds and their portfolio companies to design and implement AI and ML solutions that accelerate innovation and generate additional enterprise value.

Blur faces in videos automatically with Amazon Rekognition Video

With the advent of artificial intelligence (AI) and machine learning (ML), customers and the general public have become increasingly aware of their privacy, as well as the value that it holds in today’s data-driven world. Enterprises are actively seeking out and marketing privacy-first solutions, especially in the Computer Vision (CV) domain. They need to reassure their customers that personal information such as faces is anonymized and generally kept safe.

Face blurring is one of the best-known practices when anonymizing both images and videos. It usually involves first detecting the face in an image/video, then applying a blob of pixels or other distortion effects on it. This workload can be considered a CV task. First, we analyze the pixels of the image/video until a face is recognized, then we extract the area where the face is in every frame, and finally we apply a mask on the previously found pixels. The first part of this can be achieved with ML and Deep Learning tools, such as Amazon Rekognition, while the second part is standard pixel manipulation.

In this post, we demonstrate how AWS Step Functions can be used to orchestrate AWS Lambda functions that call Amazon Rekognition Video to detect faces in videos, and use an open source CV and ML software library called OpenCV to blur them.

Solution overview

In our solution, AWS Step Functions, a low-code visual workflow service used to orchestrate AWS services, automate business processes, and build serverless applications, is used to orchestrate the calls and manage the flow of data between AWS Lambda functions. When an object is created in an Amazon Simple Storage Service (S3) bucket, for example by a video file upload, an ObjectCreated event is detected and a first Lambda function is triggered. This Lambda function makes an asynchronous call to the Amazon Rekognition Video face detection API and starts the execution of the AWS Step Functions workflow.

Inside the workflow, we use a Lambda function and a Wait state to poll until the Amazon Rekognition Video asynchronous analysis started earlier finishes. Afterward, another Lambda function retrieves the result of the completed process from Amazon Rekognition and passes it to another Lambda function that uses OpenCV to blur the detected faces. To easily use OpenCV with our Lambda function, we built a Docker image hosted on Amazon Elastic Container Registry (ECR) and deployed it on AWS Lambda using Container Image Support.

The architecture is entirely serverless, so we don’t need to provision, scale, or maintain our infrastructure. We also use Amazon Rekognition, a highly scalable and managed AWS AI service that requires no deep learning expertise.

Moreover, we have built our application with the AWS Cloud Development Kit (AWS CDK), an open-source software development framework. This lets us write Infrastructure as Code (IaC) using Python, thereby making the application easy to deploy, modify, and maintain.

Let’s look closer at the suggested architecture:

  1. The event flow starts at the moment of the video ingestion into Amazon S3. Amazon Rekognition Video supports MPEG-4 and MOV file formats, encoded using the H.264 codec.
  2. After the video file has been stored in Amazon S3, it automatically kicks off an event that triggers a Lambda function.
  3. The Lambda function uses the video’s attributes (name and location on Amazon S3) to start the face detection job on Amazon Rekognition through an API call.
  4. The same Lambda function then starts the Step Functions state machine, forwarding the video’s attributes and the Amazon Rekognition job ID.
  5. The Step Functions workflow starts with a Lambda function waiting for the Amazon Rekognition job to be finished. Once it’s done, another Lambda function gets the results from Amazon Rekognition.
  6. Finally, a Lambda function with Container Image Support fetches its Docker image, which includes OpenCV, from Amazon ECR, blurs the faces detected by Amazon Rekognition, and temporarily stores the output video locally.
  7. Then, the blurred video is put into the output S3 bucket and removed from local files.
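
Steps 2 to 4 can be sketched as a single function. This is an illustration rather than the code from the repository: the function name and the shape of the returned dictionary are assumptions, while the S3 event layout and the `start_face_detection` call follow the S3 notification format and the Amazon Rekognition API. The client is passed in so that the sketch runs without AWS credentials. In a Lambda function you would pass `boto3.client("rekognition")` and then start the state machine with the returned values.

```python
import json
from urllib.parse import unquote_plus


def start_face_detection_from_event(event, rekognition_client):
    """Start an Amazon Rekognition Video face detection job for the uploaded video."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # Object keys in S3 event notifications are URL-encoded.
    key = unquote_plus(s3_info["object"]["key"])
    response = rekognition_client.start_face_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # This dictionary is what would be forwarded as the state machine input.
    return {"bucket": bucket, "key": key, "job_id": response["JobId"]}


class FakeRekognition:
    """Stand-in client so the sketch runs locally."""

    def start_face_detection(self, Video):
        return {"JobId": "example-job-id"}


event = {"Records": [{"s3": {"bucket": {"name": "input-bucket"},
                             "object": {"key": "videos/my+clip.mp4"}}}]}
result = start_face_detection_from_event(event, FakeRekognition())
print(json.dumps(result))
```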

Providing a serverless function access to OpenCV is easier than ever with Container Image Support. Instead of uploading a code package to AWS Lambda, the function’s code resides in a Docker image that is hosted in Amazon Elastic Container Registry.

# Install the function's dependencies
# Copy file requirements.txt from your project folder and install
# the requirements in the app directory.
COPY requirements.txt  .
RUN  pip install -r requirements.txt
# Copy helper functions
# Copy handler function (from the local app directory)
# Overwrite the command by providing a different command directly in the template.
CMD ["app.lambda_function"]

If you want to build your own application using Amazon Rekognition face detection for videos and OpenCV to process videos with Python, consider the following:

  • Amazon Rekognition API responses for videos contain faces-detected timestamps in milliseconds
  • OpenCV works on frames and uses the video’s frame rate to combine frames into a video

Therefore, you must convert Amazon Rekognition information to make it usable with OpenCV. You may find our implementation in the apply_faces_to_video function, in /rekopoc-apply-faces-to-video-docker/
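
The conversion itself is mostly arithmetic. The following sketch is not the repository's apply_faces_to_video implementation. It only shows the two mappings involved: a millisecond timestamp to a frame index, and a ratio-based bounding box (the Left, Top, Width, and Height fields that Amazon Rekognition returns) to pixel coordinates. The function names, frame rate, and frame size are assumptions for the example.

```python
def timestamp_to_frame(timestamp_ms, frame_rate):
    """Map a detection timestamp in milliseconds to the nearest frame index."""
    return int(round(timestamp_ms / 1000.0 * frame_rate))


def box_to_pixels(bounding_box, frame_width, frame_height):
    """Convert a ratio-based bounding box to pixel values (x, y, width, height)."""
    # Ratios can fall slightly outside the frame, so clamp the origin at 0.
    x = max(0, int(bounding_box["Left"] * frame_width))
    y = max(0, int(bounding_box["Top"] * frame_height))
    w = int(bounding_box["Width"] * frame_width)
    h = int(bounding_box["Height"] * frame_height)
    return x, y, w, h


frame_index = timestamp_to_frame(2000, frame_rate=25)
pixels = box_to_pixels(
    {"Left": 0.25, "Top": 0.5, "Width": 0.125, "Height": 0.25},
    frame_width=1280, frame_height=720,
)
print(frame_index, pixels)
```

Because Amazon Rekognition Video does not necessarily return a detection for every frame, an implementation also has to decide which neighboring frames each detection applies to. That choice is left out of this sketch.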

Deploy the application

If you want to deploy the sample application to your own account, go to this GitHub repository. Clone it to your local environment (you can also use tools such as AWS Cloud9) and deploy it via cdk deploy. Find more details in the later section “Deploy the AWS CDK application”. First, let’s look at the repository project structure.

Project structure

This project contains source code and supporting files for a serverless application that you can deploy with the AWS CDK. It includes the following files and folders.

  • rekognition_video_face_blurring_cdk/ – CDK Python code for deploying the application.
  • rekopoc-apply-faces-to-video-docker/ – Code for Lambda function: uses OpenCV to blur faces per frame in video, uploads final result to output S3 bucket.
  • rekopoc-check-status/ – Code for Lambda function: Gets face detection results for the Amazon Rekognition Video analysis.
  • rekopoc-get-timestamps-faces/ – Code for Lambda function: Gets bounding boxes of detected faces and associated timestamps.
  • rekopoc-start-face-detect/ – Code for Lambda function: is triggered by an S3 event when a new .mp4 or .mov video file is uploaded, starts asynchronous detection of faces in a stored video, and starts the execution of AWS Step Functions’ State Machine.
  • requirements.txt – Required packages for deploying the AWS CDK application.

The application uses several AWS resources, including AWS Step Functions, Lambda functions, and S3 buckets. These resources are defined in the rekognition_video_face_blurring_cdk/ folder of this project. You can update the Python code to add AWS resources through the same deployment process that updates your application code. Depending on the size of the video that you want to anonymize, you might need to update the configuration of the Lambda functions and adjust their memory and timeout. You can provision a maximum of 10,240 MB (10 GB) of memory, and configure your AWS Lambda functions to run up to 15 minutes per execution.

Deploy the AWS CDK application

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to define your cloud application resources using familiar programming languages. This project uses the AWS CDK in Python.

To build and deploy your application for the first time, you must:

Step 1: Ensure you have Docker running.
You will need Docker running to build the image before pushing it to Amazon ECR.

Step 2: Configure your AWS credentials.
The easiest way to satisfy this requirement is to issue the following command in your shell:

aws configure

For additional guidance on how to set up your AWS CLI installation, follow the Quick configuration with aws configure from the AWS CLI user guide.

Step 3: Install the AWS CDK and the requirements.
Simply run the following in your shell:

npm install -g aws-cdk
pip install -r requirements.txt
  • The first command will install the AWS CDK Toolkit globally using Node Package Manager.
  • The second command will install all of the Python packages needed by the AWS CDK using pip package manager. This command should be issued from the root folder of the cloned GitHub repository.

Step 4: Bootstrap your AWS environment for the CDK and deploy the application.

cdk bootstrap
cdk deploy
  • The first command will provision initial resources that the AWS CDK needs to perform the deployment. These resources include an Amazon S3 bucket for storing files and IAM roles that grant permissions needed to perform deployments.
  • Finally, cdk deploy will deploy the stack.

Step 5: Test the application.
Upload a video to the input S3 bucket through the AWS Management Console, the AWS CLI, or the SDK, and find the result in the output bucket.


To delete the sample application that you created, use the AWS CDK:

cdk destroy


In this post, we showed you how to deploy a solution that automatically blurs faces in videos without provisioning or managing any servers. We used the Amazon Rekognition Video face detection feature and Container Image Support for AWS Lambda functions to easily work with OpenCV, and we orchestrated the whole workflow with AWS Step Functions. Finally, we built our solution with the AWS CDK to make it reusable and easier to deploy and adapt.

Next Steps

If you have feedback about this post, submit it in the Comments section below. For more information, visit the following links about the tools and services that we used and follow the code in GitHub. We look forward to your feedback and contributions!

About the Authors

Anastasia Pachni Tsitiridou is a Solutions Architect at AWS. She is based in Amsterdam and supports ISVs across the Benelux in their cloud journey. She studied Electrical and Computer Engineering before being introduced to Computer Vision. What she enjoys most nowadays is working at the intersection of CV and ML.

Olivier Sutter is a Solutions Architect in France. He is based in Paris and always sets his customers’ best interests as his top priority. With a strong academic background in applied mathematics, he started developing his AI/ML passion at university, and now thrives applying this knowledge on real-world use-cases with his customers.

Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML in his last years of university and has been in love with it ever since.

How Wix empowers customer care with AI capabilities using Amazon Transcribe

With over 200 million users worldwide, Wix is a leading cloud-based development platform for building fully personalized, high-quality websites. Wix makes it easy for anyone to create a beautiful and professional web presence. When Wix started, it was easy to understand user sentiment and identify product improvement opportunities because the user base was small. Such information could include the quality of support operations, product issues, and feature requests.

Thousands of Wix customer care experts support tens of thousands of calls a day in various languages from countries around the world. Wix previously used user surveys to measure user sentiment regarding the company brand, products, services, or interactions with customer care agents. At best, we managed to receive feedback on 12% of our calls. In addition, this process was manual and limited in coverage. We were losing sight of important information crucial to customer success. This is where machine learning (ML) can solve many of these challenges.

ML capabilities such as automatic speech recognition enable you to process 100% of your customer conversations and improve your ability to understand and serve your customers. With accurate call transcripts, you can unlock further insights such as sentiment, trending issues, and agent effectiveness at resolving calls. Sentiment analysis uses natural language processing (NLP), a subfield of ML, to determine whether text is positive, negative, or neutral. This helps agents and supervisors better understand and anticipate customer needs while enabling them to make informed decisions using actionable insights.

Wix wanted to expand visibility of customer conversation sentiment to 100% with the help of ML. In this post, we explain how Wix used Amazon Transcribe, a speech-to-text service that accurately redacts personally identifiable information (PII), on phone calls, alongside customer interactions from other channels, to develop a sentiment analysis system that can effectively determine how users feel throughout an interaction with customer care agents.

How we integrated Amazon Transcribe

Building a sentiment analysis service requires three main components:

  • A data store for audio calls and transcribed data. For our solution, we used Amazon Simple Storage Service (Amazon S3).
  • An automatic speech recognition ML model (Amazon Transcribe) for converting audio into text transcriptions.
  • A sentiment analysis ML model for predicting sentiment.

For transcription (speech to text), we evaluated three leading vendors. The predominant parameters were accuracy, ease of use, and features for the call center use case (such as PII redaction). We found Amazon Transcribe to be the leading solution. The following are some of the differentiating capabilities:

  • A custom vocabulary is a list of specific words that you want Amazon Transcribe to recognize in your audio input. These are generally domain-specific words, phrases, or proper nouns that Amazon Transcribe isn’t recognizing. Custom vocabularies worked well to capture Wix’s specific terminology and phrases, such as the company name. The following is an example of the vocabulary we used:
Phrase Sounds Like
Wix weeks
Wix picks weeks-dot-com wix-that-come
Wix Professionals Wix-affection-als

After you upload your custom vocabulary list, you can use it for a transcription job.

  • Channel identification, in which Amazon Transcribe takes an audio file or stream that has multiple channels, transcribes each channel, and distinguishes between two different speakers (such as the agent and caller) automatically.
  • Automatic redaction of PII data from output and a blocklist of words and phrases.
  • Custom language models allow you to submit training data (a corpus of text data) to train custom language models that target domain-specific use cases and improve transcription accuracy. For example, you can provide Amazon Transcribe with industry-specific terms or acronyms that it might not otherwise recognize.

Custom language models are more powerful than custom vocabularies, because they can utilize a larger corpus of data, allow for tuning data, and understand individual terms as well as context. Because of the additional data and training involved, custom language models can produce significant accuracy improvements. To supercharge your accuracy, you can combine custom vocabularies with custom language models.

With these customization features, we boosted the accuracy of Amazon Transcribe to specifically understand how users interact with Wix products and services. We first used a custom language model to produce transcriptions, then used custom vocabulary to replace words (as seen in the preceding examples). Then we trained with additional labeled data such as manually labeled transcriptions from real calls and knowledge base articles related to various vertical domains such as stores and payments.

Word error rate (WER) is the most common way to measure transcription accuracy. WER is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. After we completed our model training with the customization features mentioned, we increased the transcription accuracy (in US English) to 92% (8% WER).
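
To make the metric concrete, the following sketch computes WER as a word-level edit distance divided by the number of reference words. It is a plain illustration of the definition rather than the tooling we used, and it does no text normalization (casing, punctuation) beyond splitting on whitespace. The example sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


wer = word_error_rate("how do i connect my domain to wix",
                      "how do i connect my domain to weeks")
print(wer)
```

Here one of eight reference words is wrong, so the WER is 1/8. Because insertions also count as errors, WER can exceed 100% for very poor transcripts.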

92% is great, but we’re not done yet; we will continue to improve our transcription accuracy.

For sentiment analysis, we decided to develop a proprietary sentiment model that was tailored to identify sentiment regarding specific Wix features and data, and enabled custom integrations across various internal services.

Architecture overview

The following diagram illustrates our solution architecture and workflow.

We start the process by listening for events (via a webhook) of calls that are completed. For every incoming new call, we download the call in audio format (.mp3) and save it to Amazon S3 with call metadata such as user ID and job ID.

When the audio download is complete, we start an asynchronous Amazon Transcribe transcription job. We receive a response (JSON) that consists of a list where each transcribed word is defined as a row containing additional metadata. We can then aggregate sentences based on stopwords or gaps in given timestamps between words.
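
The aggregation step can be sketched as follows. The item layout (`type`, `start_time`, `end_time`, and `alternatives[0].content`) follows the Amazon Transcribe output format. The grouping rule is an assumption for illustration: a sentence ends at sentence-ending punctuation or when the pause between two words exceeds a hypothetical one-second gap. The helper names and sample items are also made up.

```python
def group_into_sentences(items, max_gap_seconds=1.0):
    """Group word-level transcript items into sentences.

    A sentence ends at ".", "?" or "!", or when the silence between two
    consecutive words is longer than max_gap_seconds.
    """
    sentences, words, last_end = [], [], None
    for item in items:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if words:
                words[-1] += content
                if content in {".", "?", "!"}:
                    sentences.append(" ".join(words))
                    words, last_end = [], None
            continue
        start = float(item["start_time"])
        if words and last_end is not None and start - last_end > max_gap_seconds:
            sentences.append(" ".join(words))
            words = []
        words.append(content)
        last_end = float(item["end_time"])
    if words:
        sentences.append(" ".join(words))
    return sentences


def word(text, start, end):
    """Build a pronunciation item in the transcript layout (times are strings)."""
    return {"type": "pronunciation", "start_time": str(start), "end_time": str(end),
            "alternatives": [{"content": text, "confidence": "1.0"}]}


def punct(text):
    """Build a punctuation item, which carries no timestamps."""
    return {"type": "punctuation", "alternatives": [{"content": text, "confidence": "0.0"}]}


items = [word("Hello", 0.0, 0.4), punct("."), word("My", 0.6, 0.8),
         word("site", 0.8, 1.2), word("is", 3.0, 3.1), word("down", 3.1, 3.5)]
sentences = group_into_sentences(items)
print(sentences)
```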

Amazon Transcribe can have a response time of 1–10 minutes for a call lasting 30–120 minutes. To tackle this issue, we built a service that manages the asynchronous jobs and maintains consistency and synchronization of predefined steps. For example, we define which steps in a job must be completed before others can start.

After the transcription is complete and returned from Amazon Transcribe, we save it to Amazon S3 for future use, and pass it on to our sentiment analysis model for processing. The response for sentiment ranges on a scale of 0–1 (0 being positive and 1 being negative). Finally, we save and log the results for future use.

Conclusion and next steps

Going forward, we want not only to better predict the sentiment of calls and chats, but to also understand and predict the root cause. This approach of combining predictive analytics with proactive care requires innovation and is yet to be tackled at scale.

With sentiment analysis, we can detect and trigger proactive care based on negative sentiment. We can also utilize the findings to improve visibility for our product managers on how our users feel about certain products and features, including negative trends related to specific releases.

Sentiment analysis is just one example of the many use cases that we can achieve with Amazon Transcribe.

In the future, we plan to use Amazon Transcribe to understand not just how users feel, but what topics they’re talking about. This can help us reach an even greater depth of what is needed to increase user success. For example, we can determine which products and features need urgent care, how to improve our customer care interactions across channels, and how to predict and prevent escalations from even happening.

We encourage you to try Amazon Transcribe and review the Developer Guide for more details.

About the Authors

Assaf Elovic is Head of R&D at Wix, leading all customer care engineering efforts in the fields of virtual assistants, predictive analysis, and proactive care. Prior to this role, Assaf was an entrepreneur specializing in conversational user interfaces and NLP. Assaf holds a B.Sc. in Computer Science and a B.A. in Economics from IDC Herzliya.

Mykhailo Ulianchenko is an Engineering Manager at Customer Care, Wix. His teams are responsible for delivering data-driven products that help Wix provide the best customer care service. Prior to the managerial position, Mykhailo worked as a software engineer in server, mobile, and front-end areas. He is a big fan of extreme sports and Brazilian jiu-jitsu.

Vitalii Kloz is a Software Engineer working on building flexible and resilient applications to automate data pipelines at Wix and, particularly, to enhance users’ experience with Wix Customer Care by providing data-driven insights using machine learning. Vitalii holds a B.Sc. in Computer Science from Kyiv University and is currently studying for an M.Sc.

Yaniv Vaknin is a Machine Learning Specialist at Amazon Web Services. Prior to AWS, Yaniv held leadership positions with AI startups and enterprises, including co-founder and CEO roles. Yaniv works with AWS customers to harness the power of machine learning to solve real-world tasks and derive value. In his spare time, Yaniv enjoys playing soccer with his boys.

Gili Nachum is a senior AI/ML Specialist Solutions Architect in the AWS EMEA ML team. Gili is passionate about ML and, in particular, the cost and performance challenges of training deep learning models. Previously, Gili was a software architect working on big data and search.


How to approach conversation design with Amazon Lex: Building and testing (Part 3)


In parts one and two of our guide to conversation design with Amazon Lex, we discussed how to gather requirements for your conversational AI application and draft conversational flows. In this post, we help you bring all the pieces together. You’ll learn how to draft an interaction model to deliver natural conversational experiences, and how to test and tune your application.

This is the final post in a series on conversational design.

Draft an interaction model

In the second post of this series, you identified some use cases that you wanted to automate and wrote sample interactions between a user and your application. In this post, we use these use cases to build an Amazon Lex framework, called an interaction model, but first, let’s review some important definitions.

An interaction model defines the tasks that your application can help a user accomplish in a structure that a bot can understand. In your interaction model, you define the five major components for Amazon Lex: intents, sample utterances, slot names, slot values, and slot synonyms. You can make an interaction model (also called a bot definition) in whatever format works best for you and your team, such as in a spreadsheet or directly on the Amazon Lex console.

We start by defining intents. An intent is a user’s goal—what is the user trying to accomplish when they interact with your bot? Continuing with the banking bot example, a user goal might be something like blocking a card or opening a bank account. For example, in the following dialog, the user’s intent is to set up a recurring payment.

| Turn | Participant | Dialog | Notes |
| --- | --- | --- | --- |
| 1 | Amazon Lex | Thanks for contacting Your Favorite Loan Company. What can I help you with today? | |
| 1 | Customer | I wanna set up a recurring payment. | |
| 2 | Amazon Lex | Your current balance is [$200.00]. That includes an overdue balance of [$100.00]. A payment of [$200.00] is due on October 15th. To set up a recurring payment, you’ll need the routing number and the account number. Do you have that information ready? | Can customers change their debit date? Or only the account? |
| 2 | Customer | No | |
| 3 | Amazon Lex | Okay. I can wait. Let me know when you have the information. | |
| 3 | Customer | I’m ready. | |
| 4 | Amazon Lex | To start, what’s the 9-digit routing number? | |
| 4 | Customer | 999-999-999 | |
| 5 | Amazon Lex | And what’s the account number? | |
| 5 | Customer | 1234 567 890 | |

Each intent should come directly from a use case that you previously established in the earlier steps. We go into more detail about how to define intents later in this post.

Continuing on, a prompt label is a value defined by the application’s designer that maps to something the bot says.

A sample utterance is something the user says to the bot that is defined in the interaction model to help the bot classify customer intent. For example, if you’re creating an intent for opening a bank account, you’d likely want to include utterances like “open an account,” “help with opening an account,” or “How can I open a bank account?” The idea behind sample utterances is that by defining a class of utterances with similar semantic content, the bot can use these to make an educated guess about what the user’s goal is. Even if you don’t define every possible utterance (and you shouldn’t), the bot can guess what the user is trying to do.

A slot is a piece of information that the user provides in order to accomplish their goal. For example, if a customer wants to open a bank account, we need to know the type of account. We can use a slot to collect those account types, and name it something that builders will understand, like AccountType. Slots can be either required or optional, depending on the use case. For example, you might need a required slot like BirthDate to authenticate your user, but collect an optional slot type of AccountType to disambiguate between the different accounts a user might have. Slot values are the pieces of information that you want the bot to recognize as a slot, like “checking” or “savings.” Synonyms are alternate ways of saying a slot value, like “ISA” or “deposit account.”

Finally, a slot corresponding utterance is an utterance that contains a slot value, but doesn’t contain an intent, such as “to my savings account” or “it’s for my savings account.” In these utterances, you can’t tell what the user is trying to accomplish without the context of the rest of the conversation, but they do contain valuable slot information that you need the bot to capture.

The bot also has some available actions, ElicitIntent and ElicitSlot, which mean that the bot is either trying to capture the user’s intent or the bot is looking to gather some slot information.

Now that you’ve defined the values for all these components and put all those pieces together, you’ve created the first draft of an interaction model. Here’s an example, complete with the bot’s available actions.
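As a rough illustration, an interaction model like the one described above could be captured in a small data structure. The following Python sketch is only one way you might organize a draft, not an Amazon Lex export format. The class names, the OpenAccount intent name, the choice to make AccountType required, the mapping of the synonyms to the savings value, and the next_action helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Slot:
    name: str
    required: bool
    values: Dict[str, List[str]]  # slot value -> list of synonyms


@dataclass
class Intent:
    name: str
    sample_utterances: List[str]
    slots: List[Slot] = field(default_factory=list)


def next_action(intent: Optional[Intent],
                filled_slots: Dict[str, str]) -> Optional[Tuple[str, Optional[str]]]:
    """Pick the bot's next action from what is known so far."""
    if intent is None:
        return ("ElicitIntent", None)  # still need to learn the user's goal
    for slot in intent.slots:
        if slot.required and slot.name not in filled_slots:
            return ("ElicitSlot", slot.name)  # ask for the missing information
    return None  # nothing left to elicit


# Hypothetical draft of one intent, using examples from this post.
open_account = Intent(
    name="OpenAccount",
    sample_utterances=[
        "open an account",
        "help with opening an account",
        "How can I open a bank account?",
    ],
    slots=[
        Slot(
            name="AccountType",
            required=True,
            values={"checking": [], "savings": ["ISA", "deposit account"]},
        )
    ],
)
```

A spreadsheet with the same columns (intent, sample utterances, slot names, slot values, synonyms) works just as well; the point is that every component is written down in one place.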

Turn user stories into intents and slots

From your user stories, you’ve identified the use cases that you want your application to be able to help your users fulfill, such as blocking a bank card due to fraud or opening a new credit card. Make a complete list of all use cases that you developed. Now, it’s time to work backwards from the use cases to create user intents.

Start by getting a group of people together from all different teams of your organization—business analysts, technical pros, and leadership team members should all be present. Ask each person to create a list of possible things that they might reasonably say to a human agent or to an AI application for help with their use case. For example, if your use case is to open an account, you might list things like “I’d like to open a bank account,” “Can you help me open a new account?” or “Opening a savings account.” Be flexible with what you write. Have each person write 10–20 utterances per use case. Keep in mind the variety available in human language:

  • Verb variation – Open, start, begin, get started, establish
  • Noun variation – Account, savings, first credit card, new customer
  • Phrase or full sentence – Open account, I’d like to open an account
  • Statements versus questions – I want to open an account, Can you help me with getting started with a new account?
  • Implicit understanding – I’m a new customer, Help for new customers
  • Tone (formal or informal) – I need some assistance with opening a new high-yield savings account, I wanna open a new card

Now, compare your lists with each other. Combine all the utterances into a team-wide list, and organize them with the most frequent utterances first. You can use these as a head start on your sample utterances. Try to classify each utterance into a single use case. You might think that this seems easy or obvious, since you just created these utterances directly from a list of use cases, but you might be surprised by how ambiguous human language can be.
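If your team collects its utterances in a shared file, a few lines of code can produce the frequency-ordered list. The following is a minimal sketch; the function name and the normalization rules (lowercasing, collapsing whitespace, and dropping trailing punctuation) are assumptions you may want to adjust.

```python
from collections import Counter


def rank_utterances(utterances):
    """Return (utterance, count) pairs, most frequent first."""
    def normalize(text):
        # Lowercase, collapse repeated whitespace, and drop trailing punctuation
        # so that near-identical entries are counted together.
        return " ".join(text.lower().split()).rstrip(".!?")

    return Counter(normalize(u) for u in utterances).most_common()


# Hypothetical team-wide list.
team_utterances = [
    "Open an account",
    "open an account.",
    "I want to open an account",
    "Open an account!",
]
ranked = rank_utterances(team_utterances)
```

The top of the ranked list is a reasonable starting point for sample utterances, while the long tail often surfaces the ambiguous cases worth discussing as a team.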

Now that you have your utterances and your use cases, decide which ones you want to turn into intents for your bot. Again, this requires input from your team to complete successfully, but here are some basic strategies. Turn each use case for which you created and classified utterances into an intent. If you find that you’re running into lots of ambiguity and having trouble classifying utterances, you should make a judgment call with your team about how you want to handle those tricky cases. You can merge use cases into a single intent if you find that there is too much similarity between the utterances, or you can split use cases up into more fine-grained intents if a single intent has too much variety in the utterances to classify them successfully.

Another strategy for dealing with these ambiguous utterances is to use slots. If you have an assortment of similarly defined intents, like OpenACreditCard and OpenADebitCard, you might find that utterances like “open a card” cause confusion in the model. After all, as a human being, it’s tough to decide just from that utterance whether the card is a credit or debit card without more information. You can use slots to help by defining them in the model as a required piece of information, so that the bot looks for the words “credit” or “debit” in the utterance. Then, if those slots aren’t filled, use that information to surface a disambiguation prompt like “Would you like to open a new credit card or a new debit card?” to help get the necessary information. You should keep a running list of utterances that are difficult to classify and use them for testing to see how users navigate these tricky situations.
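The disambiguation logic above can be sketched in a few lines. This is a simplified keyword match for illustration only, not how Amazon Lex resolves slots internally; the function name and the returned tuple shape are hypothetical.

```python
import re

CARD_TYPES = ("credit", "debit")
DISAMBIGUATION_PROMPT = "Would you like to open a new credit card or a new debit card?"


def resolve_card_type(utterance):
    """Return (card_type, prompt): exactly one of the two is None."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    matches = [card_type for card_type in CARD_TYPES if card_type in words]
    if len(matches) == 1:
        return matches[0], None  # slot filled, no prompt needed
    # Neither or both card types were mentioned, so ask the user to choose.
    return None, DISAMBIGUATION_PROMPT
```

For example, “I want to open a credit card” fills the slot with “credit”, while “open a card” returns the disambiguation prompt.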

Remember that design is an iterative process and that no single interaction model will be perfect on the first try. This is why we continue with the next steps of prototyping and testing in order to build a successful conversational application.

Prototype your design

Given the often ambiguous nature of designing a conversational AI system, prototyping your design is crucial. Prototyping is a great way to gather meaningful feedback from real users in realistic contexts. In a design prototype, you want to build a simple way to test your design and gather feedback, without investing too much time building the software, because the design isn’t even finalized yet.

Following our example from earlier, we can build out a simple prototype to evaluate our user experience and amend our design as needed. Let’s build a mini-prototype with two intents: ReportCreditCardFraudIntent and OpenANewCreditCardAccountIntent.

| ReportCreditCardFraudIntent | OpenANewCreditCardAccountIntent |
| --- | --- |
| Unknown charge on my account | Open a new account |
| I think someone stole my card | Help with opening a credit card |
| Credit card fraud department | Open a credit card account |
| Fraudulent charges on my account | I want to open a credit card |

Before we even build these intents on the Amazon Lex console, we can make a prototype to make sure that we’re covering the most common utterances that a user might say. One simple way to do this would be to engage a few potential end-users, provide them with a scenario like pretending their card was stolen, and have them provide a few utterances. You can match this against what you’ve outlined and collected with your team, and use this data to help enhance your design. You might find that users are very unlikely to just say “credit card” at the open menu, or you might find that it’s the most common utterance. Gathering information from a likely pool of users helps you understand your customers better to make your design more robust.

The preceding example is a quick way to test your initial designs without much code. Other examples for prototyping your design would be Wizard of Oz (where the designer plays the role of the bot opposite a user who doesn’t necessarily know they’re talking to a human) or visual prototypes to help visualize the best experience (like a video simulating a chat window).

Test and tune your bot

Now that you’ve gathered all the different elements of your design, and the experience has been built and integrated, you can start testing.

The first step is to test against the design documentation you’ve put together (the sample dialogs, conversation flows, and interaction model). Thoroughly test all the different intents, slots, slot values, paths, and error handling flows that you’ve designed, going step by step through each one. The following is an example list of things to test:

  • Intent classification – Is the bot correctly predicting the intent for all utterances?
  • Slot values – Is the bot correctly recognizing all the possible slot values? For example, if you’re using a slot with phone numbers over voice, does the bot recognize both “one zero zero” and “one hundred” as valid inputs?
  • Error handling – Are there places in the flow where you get stuck in a loop? Does the bot correctly recover if some kind of error occurs?
  • Prompts – Are the prompts eliciting the expected response? Is the wording clear and understandable for all users?

The following is a sample test plan for a call center bot that you can use to guide your own testing.

| Test ID | Scenario | Steps to test | Utterance | Successful? |
| --- | --- | --- | --- | --- |
| Sample_100 | You notice a fraudulent charge on your account | Call number | | yes |
| Sample_100 | | Say “credit card fraud” | Credit card fraud | yes |
| Sample_100 | | Say or enter date of birth when prompted | January 1, 1980 | yes |

After testing, you may find that your bot requires some tuning. Go through your interaction model and add in any commonly missed utterances, new intents or slots, or change the wording in problematic prompts that are losing users along the way. This is a great place to explore an automated testing framework to expedite the testing process, but manual testing offers different insights about the user experience that can help alert you to any usability defects before launch.
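If you do explore automated testing, the core of an intent classification check can be quite small. The following sketch runs a list of (utterance, expected intent) pairs through a classifier and reports the misses. The keyword_classifier function is a toy stand-in so that the example runs on its own, and the FallbackIntent label and the last test case are hypothetical; in practice you would replace the classifier with a call to your deployed bot.

```python
def run_intent_tests(classify, test_cases):
    """Return (number passed, list of failures) for the given test cases."""
    failures = []
    for utterance, expected_intent in test_cases:
        actual_intent = classify(utterance)
        if actual_intent != expected_intent:
            failures.append((utterance, expected_intent, actual_intent))
    return len(test_cases) - len(failures), failures


def keyword_classifier(utterance):
    """Toy stand-in for the bot under test (assumption, not a real Lex call)."""
    text = utterance.lower()
    if "fraud" in text or "stole" in text or "unknown charge" in text:
        return "ReportCreditCardFraudIntent"
    if "open" in text:
        return "OpenANewCreditCardAccountIntent"
    return "FallbackIntent"


test_cases = [
    ("Unknown charge on my account", "ReportCreditCardFraudIntent"),
    ("I think someone stole my card", "ReportCreditCardFraudIntent"),
    ("Help with opening a credit card", "OpenANewCreditCardAccountIntent"),
    ("I want to open a credit card", "OpenANewCreditCardAccountIntent"),
    ("My card is missing", "ReportCreditCardFraudIntent"),  # hypothetical tricky case
]
passed, failures = run_intent_tests(keyword_classifier, test_cases)
```

Each failure is a candidate for a new sample utterance, a reworded prompt, or a discussion about whether the intent boundaries need to change.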

Finally, you should also provide your users with a way to test what you’ve built against the business requirements that you defined in part one of this series. You need to make sure that before you launch your application to production it handles all customer requests and fulfills the business requirements that you received. Before beginning user testing, define the test plan with all stakeholders so it’s clear to everyone on the team how you define success. Make sure that at this point, you’ve developed your application in an environment that is as close as possible to the production environment, so that any feedback from this testing provides insight for production. Provide testers with the test plan and clearly document the results, so that it’s easy to use the data from testing to make decisions about how best to move forward.

After you’ve launched your application, the work isn’t done! Design is an iterative experience and continually requires fresh perspectives to improve. As part of the business requirements, you should define how you’ll monitor the health of the system in order to identify issues, such as missed utterances. For example, you might want to explore an analytics framework dashboard or a business intelligence dashboard to help spot gaps in utterance coverage or places where users exit early. Use this information to improve your interaction model, test the new design, and ultimately, tune your application.
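As a starting point before investing in a full dashboard, you could compute a missed utterance rate directly from conversation logs. The following sketch assumes a hypothetical log format in which each turn records the user utterance and the matched intent, with None when no intent was matched.

```python
from collections import Counter


def summarize_missed_utterances(turns, top_n=3):
    """Return (missed rate, most common missed utterances)."""
    missed = [turn["utterance"].lower().strip() for turn in turns if turn["intent"] is None]
    rate = len(missed) / len(turns) if turns else 0.0
    return rate, Counter(missed).most_common(top_n)


# Hypothetical log records.
logged_turns = [
    {"utterance": "Open a credit card", "intent": "OpenANewCreditCardAccountIntent"},
    {"utterance": "Close my account", "intent": None},
    {"utterance": "close my account", "intent": None},
    {"utterance": "Credit card fraud", "intent": "ReportCreditCardFraudIntent"},
]
missed_rate, top_missed = summarize_missed_utterances(logged_turns)
```

Utterances that show up repeatedly in the missed list point to gaps in coverage, either as missing sample utterances or as use cases the bot doesn’t support yet.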


In this series, we covered all the important basics for creating a great conversational experience using Amazon Lex. We encourage you to test and iterate through your design multiple times to ensure the best possible customer experience. Keeping these best practices in mind, we hope you explore all the different and creative ways that humans interface with the technology around us.

And remember that we at AWS Professional Services and our extensive AWS Partner Network are available to help you and your team through the process. Whether you’re only in need of consultation and advice, or whether you need full access to a designer, our goal is to help you achieve the best conversational interface for you and your customers.

About the Authors

Nancy Clarke is a Conversation Designer with the AWS Professional Services Natural Language AI team. When she’s not at her desk, you’ll find her gardening, hiking, or re-reading the Lord of the Rings for the billionth time.

Rosie Connolly is a Conversation Designer with the AWS Professional Services Natural Language AI team. A linguist by training, she has worked with language in some form for over 15 years. When she’s not working with customers, she enjoys running, reading, and dreaming of her future on American Ninja Warrior.

Claire Mitchell is a Design Strategy Lead with the AWS Professional Services Emerging Technologies Intelligence Practice–Solutions team. Occasionally she spends time exploring speculative design practices, textiles, and playing the drums.
