Automate a shared bikes and scooters classification model with Amazon SageMaker Autopilot

Amazon SageMaker Autopilot makes it possible for organizations to quickly build and deploy an end-to-end machine learning (ML) model and inference pipeline with just a few lines of code or even without any code at all with Amazon SageMaker Studio. Autopilot offloads the heavy lifting of configuring infrastructure and the time it takes to build an entire pipeline, including feature engineering, model selection, and hyperparameter tuning.

In this post, we show how to go from raw data to a robust and fully deployed inference pipeline with Autopilot.

Solution overview

We use Lyft’s public dataset on bikesharing for this simulation to predict whether or not a user participates in the Bike Share for All program. This is a simple binary classification problem.

We want to showcase how easy it is to build an automated and real-time inference pipeline to classify users based on their participation in the Bike Share for All program. To this end, we simulate an end-to-end data ingestion and inference pipeline for an imaginary bikeshare company operating in the San Francisco Bay Area.

The architecture is broken down into two parts: the ingestion pipeline and the inference pipeline.

We primarily focus on the ML pipeline in the first section of this post, and review the data ingestion pipeline in the second part.

Prerequisites

To follow along with this example, complete the following prerequisites:

  1. Create a new SageMaker notebook instance.
  2. Create an Amazon Kinesis Data Firehose delivery stream with an AWS Lambda transform function. For instructions, see Amazon Kinesis Firehose Data Transformation with AWS Lambda. This step is optional and only needed to simulate data streaming.

Data exploration

Let’s download and visualize the dataset, which is located in a public Amazon Simple Storage Service (Amazon S3) bucket and static website:

# The dataset is located in a public bucket and static s3 website.
# https://www.lyft.com/bikes/bay-wheels/system-data

import pandas as pd
import numpy as np
import os
from time import sleep

!wget -q -O '201907-baywheels-tripdata.zip' https://s3.amazonaws.com/baywheels-data/201907-baywheels-tripdata.csv.zip
!unzip -q -o 201907-baywheels-tripdata.zip
csv_files = [f for f in os.listdir('.') if f.endswith('.csv')]  # Confirm the CSV was extracted
data = pd.read_csv('201907-baywheels-tripdata.csv', low_memory=False)
data.head()

The following screenshot shows a subset of the data before transformation.

The last column of the data contains the target we want to predict, which is a binary variable taking either a Yes or No value, indicating whether the user participates in the Bike Share for All program.

Let’s take a look at the distribution of our target variable for any data imbalance.

# For plotting
%matplotlib inline
import matplotlib.pyplot as plt
#!pip install seaborn # If you need this library
import seaborn as sns
display(sns.countplot(x='bike_share_for_all_trip', data=data))

As shown in the graph above, the data is imbalanced, with fewer people participating in the program.

We need to balance the data to prevent an over-representation bias. This step is optional because Autopilot also offers an internal approach to handle class imbalance automatically, which defaults to an F1 score validation metric. Additionally, if you choose to balance the data yourself, you can use more advanced techniques for handling class imbalance, such as SMOTE or GAN.
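
If you prefer oversampling instead, the following is a minimal sketch assuming the imbalanced-learn package and numerically encoded features (neither is used elsewhere in this post):

# Hypothetical alternative: oversample the minority class with SMOTE.
# Assumes `pip install imbalanced-learn` and that X (features) and y (target)
# have already been numerically encoded.
from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X, y)
print(pd.Series(y_resampled).value_counts())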

For this post, we downsample the majority class (No) as our data balancing technique.

The following code enriches the data and downsamples the overrepresented class:

df = data.copy()
df.drop(columns=['rental_access_method'], inplace=True)

df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])

# Adding some day breakdown
df = df.assign(day_of_week=df.start_time.dt.dayofweek,
                            hour_of_day=df.start_time.dt.hour,
                            trip_month=df.start_time.dt.month)
# Breaking the day into 4 parts: morning, afternoon, evening, and night (the default)
conditions = [
    (df['hour_of_day'] >= 5) & (df['hour_of_day'] < 12),
    (df['hour_of_day'] >= 12) & (df['hour_of_day'] < 18),
    (df['hour_of_day'] >= 18) & (df['hour_of_day'] < 21),
]
choices = ['morning', 'afternoon', 'evening']
df['part_of_day'] = np.select(conditions, choices, default='night')
df.dropna(inplace=True)

# Downsampling the majority to rebalance the data
# We are getting about an even distribution
df.sort_values(by='bike_share_for_all_trip', inplace=True)
slice_point = int(df['bike_share_for_all_trip'].value_counts()['Yes'] * 2.1)
df = df[-slice_point:]
# The data is balanced now. Let's reshuffle the data
df = df.sample(frac=1).reset_index(drop=True)

We deliberately left our categorical features not encoded, including our binary target value. This is because Autopilot takes care of encoding and decoding the data for us as part of the automatic feature engineering and pipeline deployment, as we see in the next section.

The following screenshot shows a sample of our data.

The data in the following graphs otherwise looks normal, with a bimodal distribution showing peaks for the morning and afternoon rush hours, as you would expect. We also observe low activity on weekends and at night.

In the next section, we feed the data to Autopilot so that it can run an experiment for us.

Build a binary classification model

Autopilot requires that we specify the input and output destination buckets. It uses the input bucket to load the data and the output bucket to save the artifacts, such as feature engineering and the generated Jupyter notebooks. We retain 5% of the dataset to evaluate and validate the model’s performance after the training is complete and upload 95% of the dataset to the S3 input bucket. See the following code:

import sagemaker
import boto3

# Let's define our storage.
# We will use the default sagemaker bucket and will enforce encryption 
 
bucket = sagemaker.Session().default_bucket()  # SageMaker default bucket. 
#Encrypting the bucket
s3 = boto3.client('s3')
SSEConfig={
        'Rules': [
            {
                'ApplyServerSideEncryptionByDefault': {
                    'SSEAlgorithm': 'AES256',
                }
            },
        ]
    }
s3.put_bucket_encryption(Bucket=bucket, ServerSideEncryptionConfiguration=SSEConfig)

prefix = 'sagemaker-automl01'                  # prefix for the bucket
role = sagemaker.get_execution_role()          # IAM role object to use by SageMaker
sagemaker_session = sagemaker.Session()        # Sagemaker API
region = sagemaker_session.boto_region_name    # AWS Region

# Where we will load our data 
input_path = "s3://{}/{}/automl_bike_train_share-1".format(bucket, prefix) 
output_path = "s3://{}/{}/automl_bike_output_share-1".format(bucket, prefix)

# Splitting the data into train/test sets.
# We will use 95% of the data for training and the remainder for testing.
slice_point = int(df.shape[0] * 0.95) 
training_set = df[:slice_point] # 95%
testing_set = df[slice_point:]  # 5%

# Just making sure we have split it correctly
assert training_set.shape[0] + testing_set.shape[0] == df.shape[0]

# Let's save the data locally and upload it to our s3 data location
training_set.to_csv('bike_train.csv')
testing_set.to_csv('bike_test.csv', header=False)

# Uploading the training set to the input bucket
sagemaker.s3.S3Uploader.upload(local_path='bike_train.csv', desired_s3_uri=input_path)

After we upload the data to the input destination, it’s time to start Autopilot:

from sagemaker.automl.automl import AutoML
# You give your job a name and provide the s3 path where you uploaded the data
bike_automl_binary = AutoML(role=role, 
                         target_attribute_name='bike_share_for_all_trip', 
                         output_path=output_path,
                         max_candidates=30)
# Starting the training 
bike_automl_binary.fit(inputs=input_path, 
                       wait=False, logs=False)

All we need to start experimenting is to call the fit() method. Autopilot needs the input and output S3 location and the target attribute column as the required parameters. After feature processing, Autopilot calls SageMaker automatic model tuning to find the best version of a model by running many training jobs on your dataset. We added the optional max_candidates parameter to limit the number of candidates to 30, which is the number of training jobs that Autopilot launches with different combinations of algorithms and hyperparameters in order to find the best model. If you don’t specify this parameter, it defaults to 250.

We can observe the progress of Autopilot with the following code:

# Let's monitor the progress. This will take a while, so go grab some coffee.
from time import sleep

def check_job_status():
    return bike_automl_binary.describe_auto_ml_job()['AutoMLJobStatus']

def describe():
    return bike_automl_binary.describe_auto_ml_job()

while True:
    print(check_job_status(), describe()['AutoMLJobSecondaryStatus'], end='** ')
    if check_job_status() in ["Completed", "Failed"]:
        if "Failed" in check_job_status():
            print(describe()['FailureReason'])
        break
    sleep(20)

The training takes some time to complete. While it’s running, let’s look at the Autopilot workflow.

To find the best candidate, use the following code:

# Let's take a look at the best candidate selected by AutoPilot
from IPython.display import JSON
def jsonView(obj, rootName=None):
    return JSON(obj, root=rootName, expanded=True)

bestCandidate = bike_automl_binary.describe_auto_ml_job()['BestCandidate']
display(jsonView(bestCandidate['FinalAutoMLJobObjectiveMetric'], 'FinalAutoMLJobObjectiveMetric'))

The following screenshot shows our output.

Our model achieved a validation accuracy of 96%, so we’re going to deploy it. We could add a condition such that we only use the model if the accuracy is above a certain level.
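
For example, a simple guard before deployment could look like the following sketch; the 0.95 threshold is arbitrary and should be adapted to your business requirements:

# Optional guard: only deploy if the best candidate clears a minimum score
best_metric = bestCandidate['FinalAutoMLJobObjectiveMetric']
print(best_metric['MetricName'], best_metric['Value'])
assert best_metric['Value'] >= 0.95, 'Best candidate is below the deployment threshold'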

Inference pipeline

Before we deploy our model, let’s examine our best candidate and what’s happening in our pipeline. See the following code:

display(jsonView(bestCandidate['InferenceContainers'], 'InferenceContainers'))

The following diagram shows our output.

Autopilot has built the model and has packaged it in three different containers, each sequentially running a specific task: transform, predict, and reverse-transform. This multi-step inference is possible with a SageMaker inference pipeline.

A multi-step inference can also chain multiple inference models. For instance, one container can perform principal component analysis before passing the data to the XGBoost container.
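
If you were assembling such a chain yourself outside of Autopilot, the SageMaker Python SDK supports this pattern through a PipelineModel. The following is only an illustrative sketch; pca_model and xgb_model stand in for hypothetical, already-built sagemaker.model.Model objects:

# Illustrative sketch: chain two hypothetical models into one inference pipeline
from sagemaker.pipeline import PipelineModel

pipeline_model = PipelineModel(name='pca-then-xgboost',
                               role=role,
                               models=[pca_model, xgb_model])  # hypothetical Model objects
pipeline_model.deploy(initial_instance_count=1,
                      instance_type='ml.m5.xlarge',
                      endpoint_name='pca-xgboost-pipeline')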

Deploy the inference pipeline to an endpoint

The deployment process involves just a few lines of code:

# We chose to define an endpoint name.
from datetime import datetime as dt
today = str(dt.today())[:10]
endpoint_name='binary-bike-share-' + today
endpoint = bike_automl_binary.deploy(initial_instance_count=1,
                                  instance_type='ml.m5.xlarge',
                                  endpoint_name=endpoint_name,
                                  candidate=bestCandidate,
                                  wait=True)

Let’s configure our endpoint for prediction with a predictor:

from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer
csv_serializer = CSVSerializer()
csv_deserializer = CSVDeserializer()
# Initialize the predictor
predictor = sagemaker.predictor.Predictor(endpoint_name=endpoint_name, 
                                                  sagemaker_session=sagemaker.Session(),
                                                  serializer=csv_serializer,
                                                  deserializer=csv_deserializer
                                                  )

Now that we have our endpoint and predictor ready, it’s time to use the testing data we set aside and test the accuracy of our model. We start by defining a utility function that sends the data one line at a time to our inference endpoint and gets a prediction in return. Because we have an XGBoost model, we drop the target variable before sending the CSV line to the endpoint. Additionally, we removed the header from the testing CSV before looping through the file, which is another requirement for XGBoost on SageMaker. See the following code:

# The function takes 3 arguments: the file containing the test set,
# the predictor, and finally the number of lines to send for prediction.
# The function returns a DataFrame with two columns: Inferred and Observed.
def get_inference(file, predictor, n=1):
    inferred = []
    actual = []
    with open(file, 'r') as csv:
        for i in range(n):
            line = csv.readline().split(',')
            try:
                # Remove the target variable from the csv line before predicting
                observed = line.pop(14).strip('\n')
                actual.append(observed)
            except IndexError:
                pass
            obj = ','.join(line)

            predicted = predictor.predict(obj)[0][0]
            inferred.append(predicted)
    data = {'Inferred': pd.Series(inferred), 'Observed': pd.Series(actual)}
    return pd.DataFrame(data=data)
    
n = testing_set.shape[0] # The size of the testing data
inference_df = get_inference('bike_test.csv', predictor, n)

inference_df['Binary_Result'] = (inference_df['Observed'] == inference_df['Inferred'])
display(inference_df.head())

The following screenshot shows our output.

Now let’s calculate the accuracy of our model.

See the following code:

count_binary = inference_df['Binary_Result'].value_counts()
accuracy = count_binary[True]/n
print('Accuracy:', accuracy)

We get an accuracy of 92%. This is slightly lower than the 96% obtained during the validation step, but it’s still high enough. We don’t expect the accuracy to be exactly the same because the test is performed with a new dataset.
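
If you also want a per-class view (useful given the class imbalance we started with), pandas can build a quick confusion matrix from the same DataFrame:

# Cross-tabulate observed vs. inferred labels for a quick confusion matrix
confusion = pd.crosstab(inference_df['Observed'], inference_df['Inferred'])
display(confusion)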

Data ingestion

We downloaded the data directly and configured it for training. In real life, you may have to send the data directly from the edge device into the data lake and have SageMaker load it directly from the data lake into the notebook.

Kinesis Data Firehose is a good option and the most straightforward way to reliably load streaming data into data lakes, data stores, and analytics tools. It can capture, transform, and load streaming data into Amazon S3 and other AWS data stores.
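
The prerequisites point to a console-based walkthrough for creating the delivery stream; if you prefer to create it programmatically, a minimal boto3 sketch could look like the following (the stream name, ARNs, and Lambda function reference are placeholders rather than the resources used later in this post):

# Hypothetical sketch: create a delivery stream with a Lambda transform processor
import boto3

firehose = boto3.client('firehose')
firehose.create_delivery_stream(
    DeliveryStreamName='bike-share-delivery-stream',
    ExtendedS3DestinationConfiguration={
        'BucketARN': 'arn:aws:s3:::<your-data-lake-bucket>',
        'RoleARN': 'arn:aws:iam::<account-id>:role/<firehose-delivery-role>',
        'ProcessingConfiguration': {
            'Enabled': True,
            'Processors': [{
                'Type': 'Lambda',
                'Parameters': [{
                    'ParameterName': 'LambdaArn',
                    'ParameterValue': 'arn:aws:lambda:<region>:<account-id>:function:<transform-function>'
                }]
            }]
        }
    }
)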

For our use case, we create a Kinesis Data Firehose delivery stream with a Lambda transformation function to do some lightweight data cleaning as it traverses the stream. See the following code:

# Data processing libraries
import pandas as pd  # Data processing
import numpy as np
import base64
from io import StringIO


def lambda_handler(event, context):
    output = []
    print('Received', len(event['records']), 'Records')
    for record in event['records']:

        payload = base64.b64decode(record['data']).decode('utf-8')
        df = pd.read_csv(StringIO(payload), index_col=0)

        df.drop(columns=['rental_access_method'], inplace=True)

        df['start_time'] = pd.to_datetime(df['start_time'])
        df['end_time'] = pd.to_datetime(df['end_time'])

        # Adding some day breakdown
        df = df.assign(day_of_week=df.start_time.dt.dayofweek,
                                 hour_of_day=df.start_time.dt.hour,
                                 trip_month=df.start_time.dt.month)
        # Breaking the day into 4 parts: morning, afternoon, evening, and night (the default)
        conditions = [
            (df['hour_of_day'] >= 5) & (df['hour_of_day'] < 12),
            (df['hour_of_day'] >= 12) & (df['hour_of_day'] < 18),
            (df['hour_of_day'] >= 18) & (df['hour_of_day'] < 21),
        ]
        choices = ['morning', 'afternoon', 'evening']
        df['part_of_day'] = np.select(conditions, choices, default='night')
        df.dropna(inplace=True)

        # Downsampling the majority to rebalance the data
        # We are getting about an even distribution
        df.sort_values(by='bike_share_for_all_trip', inplace=True)
        slice_point = int(df['bike_share_for_all_trip'].value_counts()['Yes'] * 2.1)
        df = df[-slice_point:]
        # The data is balanced now. Let's reshuffle the data
        df = df.sample(frac=1).reset_index(drop=True)

        data = base64.b64encode(bytes(df.to_csv(), 'utf-8')).decode("utf-8")
        output_record = {
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': data

        }
        output.append(output_record)
    print('Returned', len(output), 'Records')
    print('Event', event)

    return {'records': output}

This Lambda function performs a light transformation of the data streamed from the devices into the data lake. It expects a CSV-formatted data file.

For the ingestion step, we download the data and simulate a data stream to Kinesis Data Firehose with a Lambda transform function and into our S3 data lake.

Let’s simulate streaming a few lines:

import random
import boto3

# Kinesis Data Firehose client used to send the records
client = boto3.client('firehose')

# Saving the data in one file.
file = '201907-baywheels-tripdata.csv'
data.to_csv(file)

# Stream the data 'n' lines at a time.
# Only run this for a minute and stop the cell
def streamer(file, n):
    with open(file, 'r') as csvfile:  
        header = next(csvfile)
        data = header
        counter = 0
        loop = True
        while loop == True:
            for i in range(n):
                line = csvfile.readline()
                data+=line
                # We reached the end of the csv file.
                if line == '':
                    loop = False
            counter+=n
            # Use your kinesis streaming name
            stream = client.put_record(DeliveryStreamName='firehose12-DeliveryStream-OJYW71BPYHF2', Record={"Data": bytes(data, 'utf-8')})
            data = header
            print( file, 'HTTPStatusCode: '+ str(stream['ResponseMetadata']['HTTPStatusCode']), 'csv_lines_sent: ' + str(counter), end=' -*- ')
            
            sleep(random.randrange(1, 3))
        return
# Streaming for 500 lines at a time. You can change this number up and down.
streamer(file, 500)

We can now load our data as a DataFrame because it’s streamed into the S3 data lake:

# Getting data from the s3 location where it was streamed.
from io import StringIO

STREAMED_DATA = 's3://firehose12-deliverybucket-11z0ya3patrod/firehose/2020'
csv_uri = sagemaker.s3.S3Downloader.list(STREAMED_DATA)
in_memory_string = [sagemaker.s3.S3Downloader.read_file(file) for file in csv_uri]
in_memory_csv = [pd.read_csv(StringIO(file), index_col=0) for file in in_memory_string]
# Concatenate the streamed chunks into a single DataFrame
streamed_df = pd.concat(in_memory_csv, ignore_index=True)
display(streamed_df.tail())

Clean up

It’s important to delete all the resources used in this exercise to minimize cost. The following code deletes the SageMaker inference endpoint we created as well the training and testing data we uploaded:

# Delete the SageMaker inference endpoint
predictor.delete_endpoint()

# Delete s3 data
s3 = boto3.resource('s3')
ml_bucket = sagemaker.Session().default_bucket()
delete_data = s3.Bucket(ml_bucket).objects.filter(Prefix=prefix).delete()

Conclusion

ML engineers, data scientists, and software developers can use Autopilot to build and deploy an inference pipeline with little to no ML programming experience. Autopilot saves time and resources, using data science and ML best practices. Large organizations can now shift engineering resources away from infrastructure configuration towards improving models and solving business use cases. Startups and smaller organizations can get started on machine learning with little to no ML expertise.

We recommend learning more about other important features SageMaker has to offer, such as the Amazon SageMaker Feature Store, which integrates with Amazon SageMaker Pipelines to create and reuse automated ML workflows and adds feature search and discovery. You can run multiple Autopilot simulations with different feature or target variants in your dataset. You could also approach this as a dynamic vehicle allocation problem in which your model tries to predict vehicle demand based on time (such as time of day or day of the week) or location, or a combination of both.


About the Authors

Doug Mbaya is a Senior Solution architect with a focus in data and analytics. Doug works closely with AWS partners, helping them integrate data and analytics solution in the cloud. Doug’s prior experience includes  supporting AWS customers in the ride sharing and food delivery segment.

Valerio Perrone is an Applied Science Manager working on Amazon SageMaker Automatic Model Tuning and Autopilot.

Read More

Apply profanity masking in Amazon Translate

Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. This post shows how you can mask profane words and phrases with a grawlix string (“?$#@$”).

Amazon Translate typically chooses clean words for your translation output. But in some situations, you want to prevent words that are commonly considered profane from appearing in the translated output. For example, when you’re translating video captions or subtitle content, or enabling in-game chat, and you want the translated content to be age appropriate and clear of any profanity, Amazon Translate allows you to mask the profane words and phrases using the profanity masking setting. You can apply profanity masking to both real-time translation and asynchronous batch processing in Amazon Translate. When profanity masking is enabled, the five-character sequence ?$#@$ is used to mask each profane word or phrase, regardless of the number of characters. Amazon Translate detects each profane word or phrase literally, not contextually.

Solution overview

To mask profane words and phrases in your translation output, enable the profanity setting (found under Additional settings on the Amazon Translate console) when you run translations, through both real-time and asynchronous batch processing requests. The following sections demonstrate profanity masking for real-time translation requests via the Amazon Translate console, the AWS Command Line Interface (AWS CLI), and the Amazon Translate SDK (Python Boto3).

Amazon Translate console

To demonstrate handling profanity with real-time translation, we use the following sample text in French to be translated into English:

Ne sois pas une garce

Complete the following steps on the Amazon Translate console:

  1. Choose French (fr) as the Source language.
  2. Choose English (en) as the Target Language.
  3. Enter the preceding example text in the Source Language text area.

The translated text appears under Target language. It contains a word that is considered profane in English.

  4. Expand Additional settings and enable Profanity.

The word is now replaced with the grawlix string ?$#@$.

AWS CLI

Calling the translate-text AWS CLI command with --settings Profanity=MASK masks profane words and phrases in your translated text.

The following AWS CLI commands are formatted for Unix, Linux, and macOS. For Windows, replace the backslash (\) Unix continuation character at the end of each line with a caret (^).

aws translate translate-text \
--text <<INPUT TEXT>> \
--source-language-code fr \
--target-language-code en \
--settings Profanity=MASK

You get a response like the following snippet:

{
    "TranslatedText": "<output text with ?$#@$>",
    "SourceLanguageCode": "fr",
    "TargetLanguageCode": "en",
    "AppliedSettings": {
        "Profanity": "MASK"
    }
}

Amazon Translate SDK (Python Boto3)

The following Python 3 code uses the real-time translation call with the profanity setting:

import boto3
import json

translate = boto3.client('translate')

SOURCE_TEXT = ("<Sample Input Text>")

OUTPUT_LANG_CODE = 'en'

result = translate.translate_text(
    Text=SOURCE_TEXT,
    SourceLanguageCode='auto',
    TargetLanguageCode=OUTPUT_LANG_CODE,
    Settings={'Profanity': 'MASK'}
)

print("Translated Text:{}".format(result['TranslatedText']))

Conclusion

You can use the profanity masking setting to mask words and phrases that are considered profane to keep your translated text clean and meet your business requirements. To learn more about all the ways you can customize your translations, refer to Customizing Your Translations using Amazon Translate.


About the Authors

Siva Rajamani is a Boston-based Enterprise Solutions Architect at AWS. He enjoys working closely with customers and supporting their digital transformation and AWS adoption journey. His core areas of focus are serverless, application integration, and security. Outside of work, he enjoys outdoors activities and watching documentaries.

Sudhanshu Malhotra is a Boston-based Enterprise Solutions Architect for AWS. He’s a technology enthusiast who enjoys helping customers find innovative solutions to complex business challenges. His core areas of focus are DevOps, machine learning, and security. When he’s not working with customers on their journey to the cloud, he enjoys reading, hiking, and exploring new cuisines.

Watson G. Srivathsan is the Sr. Product Manager for Amazon Translate, AWS’s natural language processing service. On weekends you will find him exploring the outdoors in the Pacific Northwest.

Read More

How Süddeutsche Zeitung optimized their audio narration process with Amazon Polly

This is a guest post by Jakob Kohl, a Software Developer at the Süddeutsche Zeitung. Süddeutsche Zeitung is one of the leading quality dailies in Germany when it comes to paid subscriptions and unique users. Its website, SZ.de, reaches more than 15 million monthly unique users as of October 2021.

Thanks to smart speakers and podcasts, the audio industry has experienced a real boom in recent years. At Süddeutsche Zeitung, we’re constantly looking for new ways to make our diverse journalism even more accessible. As pioneers in digital journalism, we want to open up more opportunities for Süddeutsche Zeitung readers to consume articles. We started looking for solutions that could provide high-quality audio narration for our articles. Our ultimate goal was to launch a “listen to the article” feature.

In this post, we share how we optimized our audio narration process with Amazon Polly, a service that turns text into lifelike speech using advanced deep learning technologies.

Why Amazon Polly?

We believe that Vicki, the German neural Amazon Polly voice, is currently the best German voice on the market. Amazon Polly also offers an impressive ability to switch between languages, correctly pronouncing, for example, English movie titles as well as personal names in different languages (for an example, listen to the article Schall und Wahn on our website).

A big part of our infrastructure already runs on AWS, so using Amazon Polly was a perfect fit. We can combine Amazon Polly with the following components:

  • An Amazon Simple Notification Service (Amazon SNS) topic to which we can subscribe for articles. The articles are sent to this topic by the CMS whenever they’re saved by an editor.
  • An Amazon CloudFront distribution with Lambda@Edge to paywall premium articles, which we can reuse for audio versions of articles.

The Amazon Polly API is easy to use and well documented. It took us less than a week to get our proof of concept to work.

The challenge

Hundreds of new articles are published every day on SZ.de. After initial publication, they might get updated several times for various reasons—new paragraphs are added in news-driven articles, typos are fixed, teasers are changed, or metadata is optimized for search engines.

Generating speech for the initial publication of an article is straightforward, because the whole text needs to be synthesized. But how can we quickly generate the audio for updated versions of articles without paying twice for the same content? Our biggest challenge was to prevent sending the whole text to Amazon Polly repeatedly for every single update.

Our technical solution

Every time an editor saves an article, the new version of the article is published to an SNS topic. An AWS Lambda function is subscribed to this topic and called for every new version of an article. This function runs the following steps:

  1. Check if the new version of the article has already been completely synthesized. If so, the function stops immediately (this may happen when only metadata is changed that doesn’t affect the audio).
  2. Convert the article into multiple SSML documents, roughly one for each text paragraph.
  3. For each SSML document, the function checks if it has already been synthesized to audio using calculated hashes. For example:
    1. If an article is saved for the first time, all SSML documents must be synthesized.
    2. If a typo has been fixed in a single paragraph, only the SSML document for this paragraph must be re-synthesized.
    3. If a new paragraph is added to the article, only the SSML document for this new paragraph must be synthesized.
  4. Send all not-yet-synthesized SSML documents separately to Amazon Polly.

These checks help optimize performance and reduce cost by preventing the synthesis of an entire article multiple times. We avoid incurring additional charges due to minor changes such as a title edit or metadata adjustments for SEO reasons.
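
The function’s source isn’t shown here, but the change-detection idea can be sketched roughly as follows; the helper name, the known_hashes store, and the Polly parameters are assumptions rather than the team’s actual implementation:

import hashlib
import boto3

polly = boto3.client('polly')

def synthesize_changed_paragraphs(article_id, ssml_documents, known_hashes):
    """Synthesize only the SSML documents whose hash isn't already known.

    ssml_documents: list of SSML strings, one per paragraph.
    known_hashes:   set of hashes already synthesized for this article.
    """
    tasks = []
    for index, ssml in enumerate(ssml_documents):
        digest = hashlib.sha256(ssml.encode('utf-8')).hexdigest()
        if digest in known_hashes:
            continue  # Paragraph unchanged, reuse the existing audio fragment
        task = polly.start_speech_synthesis_task(
            Engine='neural',
            VoiceId='Vicki',
            LanguageCode='de-DE',
            TextType='ssml',
            Text=ssml,
            OutputFormat='mp3',
            OutputS3BucketName='<audio-fragments-bucket>',  # placeholder
            OutputS3KeyPrefix=f'{article_id}/{index:04d}-{digest}'
        )
        tasks.append(task['SynthesisTask']['TaskId'])
    return tasks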

The following diagram illustrates the solution workflow.

After Amazon Polly synthesizes the SSML documents, the audio files are sent to an output bucket in Amazon Simple Storage Service (Amazon S3). A second Lambda function is listening for object creation on that bucket, waits for the completion of all audio fragments of an article, and merges them into a final audio file using FFmpeg from a Lambda layer. This final audio is sent to another S3 bucket, which is used as the origin in our CloudFront distribution. In CloudFront, we reuse an existing paywall for premium articles for the corresponding audio version.
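
A merge step along these lines could be sketched as follows; the fragment ordering, bucket names, and the FFmpeg binary path (/opt/bin/ffmpeg, a common location for a binary shipped in a Lambda layer) are assumptions:

import subprocess
import boto3

s3 = boto3.client('s3')

def merge_fragments(fragment_keys, source_bucket, target_bucket, target_key):
    """Download ordered MP3 fragments, concatenate them with FFmpeg, and upload the result."""
    local_files = []
    for i, key in enumerate(fragment_keys):
        path = f'/tmp/fragment_{i:04d}.mp3'
        s3.download_file(source_bucket, key, path)
        local_files.append(path)

    # Build a list file for FFmpeg's concat demuxer
    with open('/tmp/fragments.txt', 'w') as f:
        for path in local_files:
            f.write(f"file '{path}'\n")

    subprocess.run(
        ['/opt/bin/ffmpeg', '-y', '-f', 'concat', '-safe', '0',
         '-i', '/tmp/fragments.txt', '-c', 'copy', '/tmp/article.mp3'],
        check=True
    )
    s3.upload_file('/tmp/article.mp3', target_bucket, target_key)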

Based on our freemium model, we provide a shortened audio version of premium articles. Non-subscribers are able to listen to the first paragraph for free, but are required to purchase a subscription to access the full article.

Conclusion

Integration of Amazon Polly into our existing infrastructure was very straightforward. Our content requires minimal customization because we only include paragraphs and some additional breaks. The most challenging part was performance and cost optimization, which we achieved by splitting the article up into multiple SSML documents corresponding to paragraphs, checking for changes in each SSML document, and building the whole audio file by merging the fragments. With these optimizations, we are able to achieve the following:

  • Decrease the amount of synthesized characters by at least 50% by only synthesizing real changes.
  • Reduce the time it takes for a change in the article text to appear in the audio because there is less audio to synthesize.
  • Add arbitrary audio files between paragraphs without re-synthesizing the whole article. For example, we can include a sound file in the shortened audio version of a premium article to separate the first paragraph from the ensuing note that a subscription is needed to listen to the full version.

In the first month after the launch of the “listen to the article” feature in our SZ.de articles, we received a lot of positive user feedback. We were able to reach almost 30,000 users during the first 2 months after launch. From these users, approximately 200 converted into a paid subscription only from listening to the teaser of an article behind our paywall. The “listen to the article” feature isn’t behind our paywall, but users can only listen to premium articles fully if they have a subscription. Our website also offers free articles without a paywall. In the future, we will expand the feature to other SZ platforms, especially our mobile news apps.


About the Author

Jakob Kohl is a Software Developer at the Süddeutsche Zeitung, where he enjoys working with modern technologies on an agile website team. He is one of the main developers of the “listen to an SZ article” feature. In his leisure time, he likes building wooden furniture, where technical and visual design is as important as in web development.

Read More

Reduce costs and complexity of ML preprocessing with Amazon S3 Object Lambda

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. Often, customers have objects in S3 buckets that need further processing to be used effectively by consuming applications. Data engineers must support these application-specific data views with trade-offs between persisting derived copies or transforming data at the consumer level. Neither solution is ideal because it introduces operational complexity, causes data consistency challenges, and wastes more expensive computing resources.

These trade-offs broadly apply to many machine learning (ML) pipelines that train on unstructured data, such as audio, video, and free-form text, among other sources. In each example, the training job must download data from S3 buckets, prepare an application-specific view, and then use an AI algorithm. This post demonstrates a design pattern for reducing costs, complexity, and centrally managing this second step. It uses the concrete example of image processing, though the approach broadly applies to any workload. The economic benefits are also most pronounced when the transformation step doesn’t require GPU, but the AI algorithm does.

The proposed solution also centralizes data transformation code and enables just-in-time (JIT) transformation. Furthermore, the approach uses a serverless infrastructure to reduce operational overhead and undifferentiated heavy lifting.

Solution overview

When ML algorithms process unstructured data like images and video, they require various normalization tasks (such as grey-scaling and resizing). This step exists to accelerate model convergence, avoid overfitting, and improve prediction accuracy. You often perform these preprocessing steps on instances that later run the AI training. That approach creates inefficiencies, because those resources typically have more expensive processors (for example, GPUs) than these tasks require. Instead, our solution externalizes those operations across economic, horizontally scalable Amazon S3 Object Lambda functions.

This design pattern has three critical benefits. First, it centralizes the shared data transformation steps, such as image normalization and removing ML pipeline code duplication. Next, S3 Object Lambda functions avoid data consistency issues in derived data through JIT conversions. Third, the serverless infrastructure reduces operational overhead, increases access time, and limits costs to the per-millisecond time running your code.

An elegant solution exists in which you can centralize these data preprocessing and data conversion operations with S3 Object Lambda. S3 Object Lambda enables you to add code that modifies data from Amazon S3 before returning it to an application. The code runs within an AWS Lambda function, a serverless compute service. Lambda can instantly scale to tens of thousands of parallel runs while supporting dozens of programming languages and even custom containers. For more information, see Introducing Amazon S3 Object Lambda – Use Your Code to Process Data as It Is Being Retrieved from S3.

The following diagram illustrates the solution architecture.

Normalize datasets used to train machine learning model

In this solution, you have an S3 bucket that contains the raw images to be processed. Next, you create an S3 Access Point for these images. If you build multiple ML models, you can create separate S3 Access Points for each model. Alternatively, AWS Identity and Access Management (IAM) policies for access points support sharing reusable functions across ML pipelines. Then you attach a Lambda function that contains your preprocessing business logic to the S3 Access Point. When you retrieve the data through the resulting S3 Object Lambda Access Point, it performs the JIT data transformations. Finally, you update your ML model to use the new S3 Object Lambda Access Point to retrieve data from Amazon S3.

Create the normalization access point

This section walks through the steps to create the S3 Object Lambda access point.

Raw data is stored in an S3 bucket. To provide the user with the right set of permissions to access this data, while avoiding complex bucket policies that can cause unexpected impact to another application, you need to create S3 Access Points. S3 Access Points are unique host names that you can use to reach S3 buckets. With S3 Access Points, you can create individual access control policies for each access point to control access to shared datasets easily and securely.

  1. Create your access point.
  2. Create a Lambda function that performs the image resizing and conversion. See the following Python code:
import boto3
import cv2
import numpy as np
import requests
import io

def lambda_handler(event, context):
    print(event)
    object_get_context = event["getObjectContext"]
    request_route = object_get_context["outputRoute"]
    request_token = object_get_context["outputToken"]
    s3_url = object_get_context["inputS3Url"]

    # Get object from S3
    response = requests.get(s3_url)
    nparr = np.frombuffer(response.content, np.uint8)
    img = cv2.imdecode(nparr, flags=1)

    # Transform object
    new_shape=(256,256)
    resized = cv2.resize(img, new_shape, interpolation= cv2.INTER_AREA)
    gray_scaled = cv2.cvtColor(resized,cv2.COLOR_BGR2GRAY)

    # Transform object
    is_success, buffer = cv2.imencode(".jpg", gray_scaled)
    if not is_success:
       raise ValueError('Unable to imencode()')

    transformed_object = io.BytesIO(buffer).getvalue()

    # Write object back to S3 Object Lambda
    s3 = boto3.client('s3')
    s3.write_get_object_response(
        Body=transformed_object,
        RequestRoute=request_route,
        RequestToken=request_token)

    return {'status_code': 200}
  3. Create an Object Lambda access point using the supporting access point from Step 1.

The Lambda function uses the supporting access point to download the original objects.

  4. Update Amazon SageMaker to use the new S3 Object Lambda access point to retrieve data from Amazon S3. See the following bash code:
aws s3api get-object --bucket arn:aws:s3-object-lambda:us-west-2:12345678901:accesspoint/image-normalizer --key images/test.png test.png
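
From Python code (for example, inside a SageMaker training script), the equivalent retrieval with boto3 might look like the following sketch; the access point ARN and key mirror the CLI example above:

import boto3

s3 = boto3.client('s3')
# The Object Lambda access point ARN is passed where a bucket name would normally go
response = s3.get_object(
    Bucket='arn:aws:s3-object-lambda:us-west-2:12345678901:accesspoint/image-normalizer',
    Key='images/test.png'
)
normalized_image_bytes = response['Body'].read()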

Cost savings analysis

Traditionally, ML pipelines copy images and other files from Amazon S3 to SageMaker instances and then perform normalization. However, performing these transformations on training instances is inefficient. In contrast, Lambda functions scale horizontally to handle bursts and then elastically shrink, charging per millisecond only while the code is running. Many preprocessing steps don’t require GPUs and can even use ARM64. That creates an incentive to move that processing to more economical compute, such as Lambda functions powered by AWS Graviton2 processors.

Using an example from the Lambda pricing calculator, you can configure the function with 256 MB of memory and compare the costs for both x86 and Graviton (ARM64). We chose this size because it’s sufficient for many single-image data preparation tasks. Next, use the SageMaker pricing calculator to compute expenses for an ml.p2.xlarge instance. This is the smallest supported SageMaker training instance with GPU support. These results show up to 90% compute savings for operations that don’t use GPUs and can shift to Lambda. The following table summarizes these findings.

 

              Lambda with x86    Lambda with Graviton2 (ARM)    SageMaker ml.p2.xlarge
Memory (GB)   0.25               0.25                           61
CPU           —                  —                              4
GPU           —                  —                              1
Cost/hour     $0.061             $0.049                         $0.90

Conclusion

You can build modern applications to unlock insights into your data. These different applications have unique data view requirements, such as formatting and preprocessing actions. Addressing these use cases can result in data duplication, increased costs, and more complexity in maintaining consistency. This post offers a solution for efficiently handling these situations using S3 Object Lambda functions.

Not only does this remove the need for duplication, but it also forms a path to scale these actions across less expensive compute horizontally! Even optimizing the transformation code for the ml.p2.xlarge instance would still be significantly more costly because of the idle GPUs.

For more ideas on using serverless and ML, see Machine learning inference at scale using AWS serverless and Deploying machine learning models with serverless templates.


About the Authors

Nate Bachmeier is an AWS Senior Solutions Architect who nomadically explores New York, one cloud integration at a time. He specializes in migrating and modernizing customers’ workloads. Besides this, Nate is a full-time student and has two kids.

Marvin Fernandes is a Solutions Architect at AWS, based in the New York City area. He has over 20 years of experience building and running financial services applications. He is currently working with large enterprise customers to solve complex business problems by crafting scalable, flexible, and resilient cloud architectures.

Read More

Extract entities from insurance documents using Amazon Comprehend named entity recognition

Intelligent document processing (IDP) is a common use case for customers on AWS. You can utilize Amazon Comprehend and Amazon Textract for a variety of use cases ranging from document extraction to data classification and entity extraction. One specific industry that uses IDP is insurance, where it automates data extraction for common use cases such as claims intake, policy servicing, quoting, payments, and next best actions. However, in some cases, an office receives a document with complex, label-less information. This is normally difficult for optical character recognition (OCR) software to capture, and identifying relationships and key entities becomes a challenge. The solution often requires manual human entry to ensure high accuracy.

In this post, we demonstrate how you can use named entity recognition (NER) for documents in their native formats in Amazon Comprehend to address these challenges.

Solution overview

In an insurance scenario, an insurer might receive a demand letter from an attorney’s office. The demand letter includes information such as what law office is sending the letter, who their client is, and what actions are required to satisfy their requests, as shown in the following example:

Because of the varied locations that this information could be found in a demand letter, these documents are often forwarded to an individual adjuster, who takes the time to read through the letter to determine all the necessary information required to proceed with a claim. The document may have multiple names, addresses, and requests that each need to be classified. If the client is mixed up with the beneficiary, or the addresses are switched, delays could add up and negative consequences could impact the company and customers. Because there are often small differences between categories like addresses and names, the documents are often processed by humans rather than using an IDP approach.

The preceding example document has many instances of overlapping entity values (entities that share similar properties but aren’t related). Examples of this are the address of the law office vs. the address of the insurance company or the names of the different individuals (attorney name, beneficiary, policy holder). Additionally, there is positional information (where the entity is positioned within the document) that a traditional text-only algorithm might miss. Therefore, traditional recognition techniques may not meet requirements.

In this post, we use named entity recognition in Amazon Comprehend to solve these challenges. The benefit of using this method is that the custom entity recognition model uses both the natural language and positional information of the text to accurately extract custom entities that may otherwise be impacted when flattening a document, as demonstrated in our preceding example of overlapping entity values. For this post, we use an AWS artificially created dataset of legal requisition and demand letters for life insurance, but you can use this approach across any industry and document that may benefit from spatial data in custom NER training. The following diagram depicts the solution architecture:

We implement the solution with the following high-level steps:

  1. Clone the repository containing the sample dataset.
  2. Create an Amazon Simple Storage Service (Amazon S3) bucket.
  3. Create and train your custom entity recognition model.
  4. Use the model by running an asynchronous batch job.

Prerequisites

You need to complete the following prerequisites to use this solution:

  1. Install Python 3.8.x.
  2. Make sure you have pip installed.
  3. Install and configure the AWS Command Line Interface (AWS CLI).
  4. Configure your AWS credentials.

Annotate your documents

To train a custom entity recognition model that can be used on your PDF, Word, and plain text documents, you need to first annotate PDF documents using a custom Amazon SageMaker Ground Truth annotation template that is provided by Amazon Comprehend. For instructions, see Custom document annotation for extracting named entities in documents using Amazon Comprehend.

We recommend a minimum of 250 documents and 100 annotations per entity to ensure good quality predictions. With more training data, you’re more likely to produce a higher-quality model.

When you’ve finished annotating, you can train a custom entity recognition model and use it to extract custom entities from PDF, Word, and plain text documents for batch (asynchronous) processing.

For this post, we have already labeled our sample dataset, and you don’t have to annotate the documents provided. However, if you want to use your own documents or adjust the entities, you have to annotate the documents. For instructions, see Custom document annotation for extracting named entities in documents using Amazon Comprehend.

We extract the following entities (which are case sensitive):

  • Law Firm
  • Law Office Address
  • Insurance Company
  • Insurance Company Address
  • Policy Holder Name
  • Beneficiary Name
  • Policy Number
  • Payout
  • Required Action
  • Sender

The dataset provided is entirely artificially generated. Any mention of names, places, and incidents are either products of the author’s imagination or are used fictitiously. Any resemblance to actual events or locales or persons, living or dead, is entirely coincidental.

Clone the repository

Start by cloning the repository by running the following command:

git clone https://github.com/aws-samples/aws-legal-entity-extraction

The repository contains the following files:

aws-legal-entity-extraction
	/source
	/annotations
	output.manifest
	sample.pdf
	bucketnamechange.py

Create an S3 bucket

To create an S3 bucket to use for this example, complete the following steps:

  1. On the Amazon S3 console, choose Buckets in the navigation pane.
  2. Choose Create bucket.
  3. Note the name of the bucket you just created.

To reuse the annotations that we already made for the dataset, we have to modify the output.manifest file and reference the bucket we just created.

  4. Modify the file by running the following commands:
    cd aws-legal-entity-extraction
    python3 bucketnamechange.py
    Enter the name of your bucket: <Enter the name of the bucket you created>

When the script is finished running, you receive the following message:

The manifest file is updated with the correct bucket

We can now begin training our model.

Create and train the model

To start training your model, complete the following steps:

  1. On the Amazon S3 console, upload the /source folder, /annotations folder, output.manifest, and sample.pdf files.

Your bucket should look similar to the following screenshot.

  2. On the Amazon Comprehend console, under Customization in the navigation pane, choose Custom entity recognition.
  3. Choose Create new model.
  4. For Model name, enter a name.
  5. For Language, choose English.
  6. For Custom entity type, add the following case-sensitive entities:
    1. Law Firm
    2. Law Office Address
    3. Insurance Company
    4. Insurance Company Address
    5. Policy Holder Name
    6. Beneficiary Name
    7. Policy Number
    8. Payout
    9. Required Action
    10. Sender

  7. In Data specifications, for Data format, select Augmented manifest to reference the manifest we created when we annotated the documents.
  8. For Training model type, select PDF, Word Documents.

This specifies the type of documents you’re using for training and inference.

  9. For SageMaker Ground Truth augmented manifest file S3 location, enter the location of the output.manifest file in your S3 bucket.
  10. For S3 prefix for Annotation data files, enter the path to the annotations folder.
  11. For S3 prefix for Source documents, enter the path to the source folder.
  12. For Attribute names, enter legal-entity-label-job-labeling-job-20220104T172242.

The attribute name corresponds to the name of the labeling job you create for annotating the documents. For the pre-annotated documents, we use the name legal-entity-label-job-labeling-job-20220104T172242. If you choose to annotate your documents, substitute this value with the name of your annotation job.

  13. Create a new AWS Identity and Access Management (IAM) role and give it permissions to the bucket that contains all your data.
  14. Finish creating the model (select the Autosplit option for your data source to see similar metrics to those in the following screenshots).

Now your recognizer model is visible on the dashboard with the model training status and metrics.

The model may take several minutes to train.

The following screenshot shows your model metrics when the training is complete.

Use the custom entity recognition model

To use the custom entity recognition models trained on PDF documents, we create a batch job to process them asynchronously.

  1. On the Amazon Comprehend console, choose Analysis jobs.
  2. Choose Create job.
  3. Under Input data, enter the Amazon S3 location of the annotated PDF documents to process (for this post, the sample.pdf file).
  4. For Input format, select One document per file.
  5. Under Output Data, enter the Amazon S3 location where you want the results to be stored. For this post, we create a new folder called analysis-output in the S3 bucket containing all source PDF documents, annotated documents, and manifest.
  6. Use an IAM role with permissions to the sample.pdf folder.

You can use the role created earlier.

  7. Choose Create job.
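
If you prefer to start the same analysis programmatically, a boto3 sketch might look like the following; the recognizer ARN, role ARN, and S3 paths are placeholders:

import boto3

comprehend = boto3.client('comprehend')

# Start an asynchronous entities detection job with the custom recognizer
response = comprehend.start_entities_detection_job(
    JobName='legal-entity-batch-job',
    EntityRecognizerArn='arn:aws:comprehend:<region>:<account-id>:entity-recognizer/<recognizer-name>',
    LanguageCode='en',
    DataAccessRoleArn='arn:aws:iam::<account-id>:role/<comprehend-data-access-role>',
    InputDataConfig={
        'S3Uri': 's3://<your-bucket>/sample.pdf',
        'InputFormat': 'ONE_DOC_PER_FILE'
    },
    OutputDataConfig={
        'S3Uri': 's3://<your-bucket>/analysis-output/'
    }
)
print(response['JobId'], response['JobStatus'])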

This is an asynchronous job, so it may take a few minutes to complete processing. When the job is complete, you get a link to the output. When you open this output, you see a series of files as follows:

You can open the file sample.pdf.out in your preferred text editor. If you search for the Entities Block, you can find the entities identified in the document. The following table shows an example.

Type Text Score
Insurance Company Budget Mutual Insurance Company 0.999984086
Insurance Company Address 9876 Infinity Ave, Springfield, MI 65541 0.999982051
Law Firm Bill & Carr 0.99997298
Law Office Address 9241 13th Ave SW, Spokane, Washington (WA), 99217 0.999274625
Beneficiary Name Laura Mcdaniel 0.999972464
Policy Holder Name Keith Holt 0.999781546
Policy Number (#892877136) 0.999950143
Payout $15,000 0.999980728
Sender Angela Berry 0.999723455
Required Action We are requesting that you forward the full policy amount of Please forward an acknowledgement of our demand and please forward the umbrella policy information if one is applicable. Please send my secretary any information regarding liens on his policy. 0.999989449

Expand the solution

You can choose from a myriad of possibilities for what to do with the detected entities, such as the following:

  • Ingest them into a backend system of record
  • Create a searchable index based on the extracted entities
  • Enrich machine learning and analytics using extracted entity values as parameters for model training and inference
  • Configure back-office flows and triggers based on detected entity value (such as specific law firms or payout values)

The following diagram depicts these options:

Conclusion

Complex document types can often be impediments to full-scale IDP automation. In this post, we demonstrated how you can build and use custom NER models directly from PDF documents. This method is especially powerful for cases where positional information is pertinent (similar entity values and varied document formats). Although we demonstrated this solution by using legal requisition letters in insurance, you can extrapolate this use case across healthcare, manufacturing, retail, financial services, and many other industries.

To learn more about Amazon Comprehend, visit the Amazon Comprehend Developer Guide.


About the Authors

Raj Pathak is a Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Document Extraction, Contact Center Transformation and Computer Vision.

Enzo Staton is a Solutions Architect with a passion for working with companies to increase their cloud knowledge. He works closely with customers around the country as a trusted advisor and industry specialist.

Read More

Implement MLOps using AWS pre-trained AI Services with AWS Organizations

The AWS Machine Learning Operations (MLOps) framework is an iterative and repetitive process for evolving AI models over time. As in DevOps, practitioners gain efficiencies by promoting their artifacts through various environments (such as quality assurance, integration, and production) for quality control. In parallel, customers rapidly adopt multi-account strategies through AWS Organizations and AWS Control Tower to create secure, isolated environments. This combination can introduce challenges for implementing MLOps with AWS pre-trained AI Services, such as Amazon Rekognition Custom Labels. This post discusses design patterns for reducing that complexity while still maintaining security best practices.

Overview

Customers across every industry vertical recognize the value of operationalizing machine learning (ML) efficiently and reducing the time to deliver business value. Most AWS pre-trained AI Services address this situation through out-of-the-box capabilities for computer vision, translation, and fraud detection, among other common use cases. Many use cases require domain-specific predictions that go beyond generic answers. The AI Services can fine-tune the predictive model results using customer-labeled data for those scenarios.

Over time, the domain-specific vocabulary changes and evolves. For example, suppose a tool manufacturer creates a computer vision model to detect its products in images (such as hammers and screwdrivers). In a future release, the business adds support for wrenches and saws. These new labels necessitate code changes on the manufacturer’s websites and custom applications. Now, there are dependencies that both artifacts must release simultaneously.

The AWS MLOps framework addresses these release challenges through iterative and repetitive processes. Before reaching production end-users, the model artifacts must traverse various quality gates like application code. You typically implement those quality gates using multiple AWS accounts within an AWS organization. This approach gives the flexibility to centrally manage these application domains and enforce guardrails and business requirements. It’s becoming increasingly common to have tens or even hundreds of accounts within your organization. However, you must balance your workload isolation needs against the team size and complexity.

MLOps practitioners have standard procedures for promoting artifacts between accounts (such as QA to production). These patterns are straightforward to implement, relying on copying code and binary resources between Amazon Simple Storage Service (Amazon S3) buckets. However, AWS pre-trained AI Services don’t currently support copying the trained custom model across AWS accounts. Until such a mechanism exists, you need to retrain the models in each AWS account using the same dataset. This approach involves time and cost for retraining the model in a new account. This mechanism can be a viable option for some customers. However in this post, we demonstrate the means to define and evolve these custom models centrally while securely sharing them across an AWS organization’s accounts.

Solution overview

This post discusses design patterns for securely sharing AWS pre-trained AI Service domain-specific models. These services include Amazon Fraud Detector, Amazon Transcribe, and Amazon Rekognition, to name a few. Although these strategies are broadly applicable, we focus on Rekognition Custom Labels as a concrete example. We intentionally avoid diving too deep into Rekognition Custom Labels-specific nuances.

The architecture begins with a configured AWS Control Tower in the management account. AWS Control Tower provides the easiest way to set up and govern a secure, multi-account AWS environment. As shown in the following diagram, we use Account Factory in AWS Control Tower to create five AWS accounts:

  • CI/CD account for deployment orchestration (for example, with AWS CodeStar)
  • Production account for external end-users (for example, a public website)
  • Quality assurance account for internal development teams (such as preproduction)
  • ML account for custom models and supporting systems
  • AWS Lake House account holding proprietary customer data

This configuration might be too granular or coarse, depending on your regulatory requirements, industry, and size. Refer to Managing the multi-account environment using AWS Organizations and AWS Control Tower for more guidance.

Example AWS Organizations account configuration

Create the Rekognition Custom Labels model

The first step to creating a Rekognition Custom Labels model is choosing the AWS account to host it. You might begin your ML journey using a single ML account. This approach consolidates any tooling and procedures into one place. However, this centralization can cause bloat in the individual account and lead to monolithic environments. More mature enterprises segment this ML account by team or workload. Regardless of the granularity, the objective is the same: define models centrally and train them once.

This post demonstrates using a Rekognition Custom Labels model with a single ML account and a separate data lake account (see the following diagram). When the data resides in a different account, you must configure a resource policy to provide cross-account access to the S3 bucket objects. This procedure securely shares the bucket’s contents with the ML account. See the quick start samples for more information on creating an Amazon Rekognition domain-specific model.

Sharing data across accounts

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AWSRekognitionS3AclBucketRead20191011",
            "Effect": "Allow",
            "Principal": {
                "Service": "rekognition.amazonaws.com"
            },
            "Action": [
                "s3:GetBucketAcl",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::S3:"
        },
        {
            "Sid": "AWSRekognitionS3GetBucket20191011",
            "Effect": "Allow",
            "Principal": {
                "Service": "rekognition.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:GetObjectAcl",
                "s3:GetObjectVersion",
                "s3:GetObjectTagging"
            ],
            "Resource": "arn:aws:s3:::S3:/*"
        },
        {
            "Sid": "AWSRekognitionS3ACLBucketWrite20191011",
            "Effect": "Allow",
            "Principal": {
                "Service": "rekognition.amazonaws.com"
            },
            "Action": "s3:GetBucketAcl",
            "Resource": "arn:aws:s3:::S3:"
        },
        {
            "Sid": "AWSRekognitionS3PutObject20191011",
            "Effect": "Allow",
            "Principal": {
                "Service": "rekognition.amazonaws.com"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::S3:/*",
            "Condition": {
                "StringEquals": {
                    "s3:x-amz-acl": "bucket-owner-full-control"
                }
            }
        }
    ]
}
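
As a minimal sketch (not from the original post), you could attach a bucket policy like the preceding one from the data lake account with boto3. The bucket name (DATA_BUCKET_NAME_HERE, matching the placeholder in the policy above) and the policy file path are hypothetical values you would replace with your own.

import json
import boto3

# Hypothetical placeholders -- substitute your data lake bucket name and the
# path to a file containing the bucket policy shown above.
BUCKET_NAME = "DATA_BUCKET_NAME_HERE"
POLICY_FILE = "rekognition-bucket-policy.json"

s3 = boto3.client("s3")  # run with credentials for the data lake account

with open(POLICY_FILE) as f:
    bucket_policy = json.load(f)

# Attach the resource policy so Amazon Rekognition can read the training data
# and write its output to this bucket.
s3.put_bucket_policy(Bucket=BUCKET_NAME, Policy=json.dumps(bucket_policy))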

Enable cross-account access

After you build and deploy the model, the endpoint is only available within the ML account. Instead of sharing static access keys, delegate access to the production (or QA) account using AWS Identity and Access Management (IAM) roles. To create a cross-account role in the ML account, complete the following steps:

  1. On the Rekognition Custom Labels console, choose Projects and choose your project name.
  2. Choose Models and your model name.
  3. On the Use model tab, scroll down to the Use your model section.
  4. Copy the model Amazon Resource Name (ARN). It should be formatted as follows: arn:aws:rekognition:region-name:account-id:project/model-name/version/version-id/timestamp.
  5. Create a role with rekognition:DetectCustomLabels permissions on the model ARN and a trust policy allowing sts:AssumeRole from the production (or QA) account (for example, arn:aws:iam::PROD_ACCOUNT_ID_HERE:root), as shown in the sketch after this list.
  6. Optionally, attach additional policies for any workload-specific actions (such as accessing S3 buckets).
  7. Optionally, configure the condition element to enforce additional delegation requirements.
  8. Record the new role’s ARN to use in the next section.
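
The following boto3 sketch illustrates step 5; it is an assumption-laden example, not the exact setup from the post. The role name, account ID, and model ARN are placeholders you would replace with your own values.

import json
import boto3

# Hypothetical placeholders -- substitute your own account IDs and model ARN.
PROD_ACCOUNT_ID = "PROD_ACCOUNT_ID_HERE"
MODEL_VERSION_ARN = (
    "arn:aws:rekognition:region-name:ML_ACCOUNT_ID_HERE:"
    "project/model-name/version/version-id/timestamp"
)

iam = boto3.client("iam")  # run with credentials for the ML account

# Trust policy: allow principals in the production (or QA) account to assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{PROD_ACCOUNT_ID}:root"},
            "Action": "sts:AssumeRole",
        }
    ],
}

role = iam.create_role(
    RoleName="rekognition-cross-account-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Inline permissions policy: allow inference against this specific model version only.
inference_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "rekognition:DetectCustomLabels",
            "Resource": MODEL_VERSION_ARN,
        }
    ],
}

iam.put_role_policy(
    RoleName="rekognition-cross-account-role",
    PolicyName="detect-custom-labels",
    PolicyDocument=json.dumps(inference_policy),
)

print(role["Role"]["Arn"])  # record this ARN for the next section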

Invoke the endpoint

With the security policies in place, it’s time to test the configuration. A simple approach involves creating an Amazon Elastic Compute Cloud (Amazon EC2) instance and using the AWS Command Line Interface (AWS CLI). Invoke the endpoint with the following steps:

  1. In the production (or QA) account, create a role for Amazon EC2.
  2. Attach a policy allowing sts:AssumeRole on the ML account’s cross-account role ARN.
  3. Launch an Amazon Linux 2 instance with the role from the previous step.
  4. Wait for it to provision, then connect to the Linux instance using SSH.
  5. Invoke the command aws sts assume-role to switch to the cross-account role from the previous section.
  6. Start the model endpoint, if not already running, using the Rekognition console or the start-project-version AWS CLI command.
  7. Invoke the command aws rekognition detect-custom-labels to test the operation.

You can also perform this test using the AWS SDK and another compute resource (for example, AWS Lambda).
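
If you test with the AWS SDK instead of the AWS CLI, a minimal boto3 sketch of the same flow might look like the following. The role ARN, model version ARN, Region, and image location are placeholder assumptions, not values from the post.

import boto3

# Hypothetical placeholders -- substitute the role ARN recorded earlier, the model
# version ARN, the model's Region, and an image the role is allowed to read.
CROSS_ACCOUNT_ROLE_ARN = "arn:aws:iam::ML_ACCOUNT_ID_HERE:role/rekognition-cross-account-role"
MODEL_VERSION_ARN = (
    "arn:aws:rekognition:region-name:ML_ACCOUNT_ID_HERE:"
    "project/model-name/version/version-id/timestamp"
)

# Assume the cross-account role in the ML account to obtain temporary credentials.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn=CROSS_ACCOUNT_ROLE_ARN,
    RoleSessionName="cross-account-inference-test",
)["Credentials"]

rekognition = boto3.client(
    "rekognition",
    region_name="us-east-1",  # placeholder: use the Region hosting the model
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

# Invoke the shared model endpoint (it must already be in the RUNNING state).
response = rekognition.detect_custom_labels(
    ProjectVersionArn=MODEL_VERSION_ARN,
    Image={"S3Object": {"Bucket": "IMAGE_BUCKET_NAME_HERE", "Name": "test-image.jpg"}},
    MinConfidence=70,
)
print(response["CustomLabels"])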

Avoiding the public internet

In the previous section, the detect-custom-labels request uses the virtual private cloud’s (VPC) internet gateway and traverses the public internet. TLS/SSL encryption sufficiently secures the communication channel for many workloads. You can use AWS PrivateLink to enable connections between the VPC and supporting services without requiring an internet gateway, NAT device, VPN connection, transit gateway, or AWS Direct Connect connection. The detect-custom-labels request then never leaves the AWS network and is never exposed to the public internet. AWS PrivateLink supports all services used within this post. You can also use IAM conditions in the cross-account role policy to require that the pre-trained AI Services are reached only over private connectivity. This control adds another level of protection that prevents misconfigured clients from using the pre-trained AI Service’s internet-facing endpoint. For additional information, see Using Amazon Rekognition with Amazon VPC endpoints, AWS PrivateLink for Amazon S3, and Using AWS STS interface VPC endpoints.
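
As an illustration of this PrivateLink setup, the following boto3 sketch creates an interface VPC endpoint for Amazon Rekognition; the Region, VPC, subnet, and security group IDs are placeholder assumptions.

import boto3

# Hypothetical placeholders -- substitute your own Region, VPC, subnet, and security group.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.rekognition",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,  # lets clients keep using the standard Rekognition endpoint name
)
print(response["VpcEndpoint"]["VpcEndpointId"])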

The following diagram illustrates the VPC endpoint configuration between the production account, ML account, and QA account.

Using VPC Endpoints across accounts

Build a CI/CD pipeline for promoting models

To improve models, AWS recommends continuously adding training and test data to the Amazon Rekognition Custom Labels project dataset. After you add more data to a project, you can train a new model version with improved accuracy or changed labels.

In MLOps, model artifacts must be consistent across environments. To accomplish this with pre-trained AI Services, AWS recommends promoting the model by updating the code’s reference to the new model version’s ARN. This approach avoids retraining the domain-specific model in each environment (such as the QA and production accounts). Your applications can resolve the new model’s ARN as a runtime variable using AWS Systems Manager within a multi-account or multi-stage environment.

Three granularity levels limit access to the cross-account model: the account, the project, and the model version. Models are immutable and have a unique ARN that maps to a specific point-in-time training: arn:aws:rekognition:region:account:project/project_name/version/name/timestamp.

The following diagram illustrates the model rotation from QA to production.

Promoting the model version

In the preceding architecture, the production and QA applications make API calls to the v2 or v3 model endpoints through their respective VPC endpoints. Each application retrieves the model ARN from its configuration store (for example, AWS Systems Manager Parameter Store or AWS AppConfig). This process works with any number of environments, but we demonstrate only two accounts for simplicity. Optionally, removing superseded model versions prevents additional consumption of those resources.
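
A minimal sketch of this pattern with AWS Systems Manager Parameter Store might look like the following; the parameter name and model version ARN are hypothetical placeholders.

import boto3

# Hypothetical parameter name -- each environment (QA, production) stores the ARN of the
# model version it should call under its own configuration key.
PARAMETER_NAME = "/myapp/rekognition/model-version-arn"

ssm = boto3.client("ssm")

# The CI/CD pipeline writes the promoted model version's ARN ...
ssm.put_parameter(
    Name=PARAMETER_NAME,
    Value="arn:aws:rekognition:region-name:ML_ACCOUNT_ID_HERE:project/model-name/version/v2/timestamp",
    Type="String",
    Overwrite=True,
)

# ... and the application resolves it at runtime instead of hard-coding the ARN.
model_version_arn = ssm.get_parameter(Name=PARAMETER_NAME)["Parameter"]["Value"]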

The ML account has an IAM role for each environment (such as the production account) that requires access. As part of the deployment, the CI/CD pipeline alters the role’s inline policy to allow access to the appropriate model version.

Consider the scenario of promoting Model-v2 from the QA account to the production account. This process requires the following steps:

  1. On the Rekognition Custom Labels console, transition the Model-v2 endpoint into a running state.
  2. Grant the IAM cross-account role in the ML account access to the new version of Model-v2 (a sketch of this policy update follows these steps).

Note that the resource element supports wildcards in the ARN.

  3. Send a test invocation to Model-v2 from the production application using the delegation role.
  4. Optionally, remove the cross-account role’s access to Model-v1.
  5. Optionally, repeat steps 2–3 for each additional AWS account.
  6. Optionally, stop the Model-v1 endpoint to avoid incurring costs.
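
A sketch of the policy update in step 2, assuming the cross-account role created earlier, might look like the following. The role name, policy name, and model version ARN are placeholders; as noted above, the Resource element also supports wildcards.

import json
import boto3

# Hypothetical placeholders -- substitute the delegation role used earlier and the
# new model version ARN being promoted.
ROLE_NAME = "rekognition-cross-account-role"
NEW_MODEL_VERSION_ARN = (
    "arn:aws:rekognition:region-name:ML_ACCOUNT_ID_HERE:"
    "project/model-name/version/v2/timestamp"
)

iam = boto3.client("iam")  # run in the ML account, typically from the CI/CD pipeline

# Overwrite the inline policy so the delegation role can invoke the new version.
# Dropping the old version's ARN from the Resource list revokes access to it.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "rekognition:DetectCustomLabels",
            "Resource": [NEW_MODEL_VERSION_ARN],
        }
    ],
}

iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="detect-custom-labels",
    PolicyDocument=json.dumps(policy),
)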

Global policy propagation from the IAM control plane to the IAM data plane in every Region is an eventually consistent operation. This design can create slight delays for multi-Regional configurations.

Create guardrails through service control policies

Using cross-account roles creates a secure mechanism for sharing pre-trained managed AI resources. But what happens when that role’s policy is too permissive? You can mitigate these risks by using service control policies (SCPs) to set permission guardrails across accounts. Guardrails specify the maximum permissions available to an IAM identity. These capabilities can prevent a model consumer account from, for example, stopping the shared Amazon Rekognition endpoint. After you define the appropriate guardrails, you can attach the policies to organizational units within Organizations to manage them centrally across multiple accounts. For example, the following SCP denies creating, deleting, starting, or stopping Rekognition Custom Labels projects:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyModifyingRekognitionProjects",
      "Effect": "Deny",
      "Action": [
        "rekognition:CreateProject*",
        "rekognition:DeleteProject*",
        "rekognition:StartProject*",
        "rekognition:StopProject*"
      ],
      "Resource": [
        "arn:aws:rekognition:*:*:project/*"
      ]
    }
  ]
}

You can also configure detective controls to monitor these configurations and make sure they don’t drift out of compliance. AWS IAM Access Analyzer supports assessing policies across the organization and reporting unused permissions. Additionally, AWS Config enables assessing, auditing, and evaluating configurations of AWS resources. This capability supports standard security and compliance requirements, such as verifying and remediating an S3 bucket’s encryption settings.

Conclusion

You need out-of-the-box solutions to add ML capabilities like computer vision, translation, and fraud detection. You also need security boundaries that isolate your different environments for quality control, compliance, and regulatory purposes. AWS pre-trained AI services and AWS Control Tower deliver that functionality in a manner that is easily accessible and secure.

AWS pre-trained AI services don’t currently support copying the trained custom model across AWS accounts. Until such a mechanism exists, you need to retrain the models in each AWS account using the same dataset. This post demonstrates an alternative design approach using IAM cross-account policies to share model endpoints while maintaining robust security control. Furthermore, you can stop paying for redundant training jobs! For more information on cross-account policies, see IAM tutorial: Delegate access across AWS accounts using IAM roles.


About the Authors

Nate Bachmeier is an AWS Senior Solutions Architect who nomadically explores New York, one cloud integration at a time. He specializes in migrating and modernizing customers’ workloads. Besides this, Nate is a full-time student and has two kids.

Mario Bourgoin is a Senior Partner Solutions Architect for AWS, an AI/ML specialist, and the global tech lead for MLOps. He works with enterprise customers and partners deploying AI solutions in the cloud. He has more than 30 years of experience doing machine learning and AI at startups and in enterprises, starting with creating one of the first commercial machine learning systems for big data. Mario spends the balance of his time playing with his three Belgian Tervurens, cooking dinners for his family, and learning about mathematics and cosmology.

Tim Murphy is a Senior Solutions Architect for AWS, working with enterprise customers in various industries to build business-based solutions in the cloud. He has spent the last decade working with startups, non-profits, commercial enterprises, and government agencies, deploying infrastructure at scale. In his spare time, when he isn’t tinkering with technology, you’ll most likely find him in far-flung areas of the earth hiking mountains, surfing waves, or biking through a new city.
