Build a risk management machine learning workflow on Amazon SageMaker with no code

Since the global financial crisis, risk management has taken a major role in shaping decision-making for banks, including predicting loan status for potential customers. This is often a data-intensive exercise that requires machine learning (ML). However, not all organizations have the data science resources and expertise to build a risk management ML workflow.

Amazon SageMaker is a fully managed ML platform that allows data engineers and business analysts to quickly and easily build, train, and deploy ML models. Data engineers and business analysts can collaborate using the no-code/low-code capabilities of SageMaker. Data engineers can use Amazon SageMaker Data Wrangler to quickly aggregate and prepare data for model building without writing code. Then business analysts can use the visual point-and-click interface of Amazon SageMaker Canvas to generate accurate ML predictions on their own.

In this post, we show how simple it is for data engineers and business analysts to collaborate to build an ML workflow involving data preparation, model building, and inference without writing code.

Solution overview

Although ML development is a complex and iterative process, you can generalize an ML workflow into the data preparation, model development, and model deployment stages.

Data Wrangler and Canvas abstract the complexities of data preparation and model development, so you can focus on delivering value to your business by drawing insights from your data without being an expert in code development. The following architecture diagram highlights the components in a no-code/low-code solution.

Amazon Simple Storage Service (Amazon S3) acts as our data repository for raw data, engineered data, and model artifacts. You can also choose to import data from Amazon Redshift, Amazon Athena, Databricks, and Snowflake.

As data engineers, we then use Data Wrangler for exploratory data analysis and feature engineering. Although Canvas can run feature engineering tasks, feature engineering usually requires some statistical and domain knowledge to enrich a dataset into the right form for model development. Therefore, we give this responsibility to data engineers so they can transform data without writing code with Data Wrangler.

After data preparation, we pass model building responsibilities to business analysts, who can use Canvas to train a model without having to write any code.

Finally, we make single and batch predictions directly within Canvas from the resulting model without having to deploy model endpoints ourselves.

Dataset overview

We use SageMaker features to predict the status of a loan using a modified version of Lending Club’s publicly available loan analysis dataset. The dataset contains data for loans issued from 2007 through 2011. The columns describing the loan and the borrower are our features. The column loan_status is the target variable, which is what we’re trying to predict.

To demonstrate in Data Wrangler, we split the dataset into two CSV files: part one and part two. We’ve removed some columns from Lending Club’s original dataset to simplify the demo. Our dataset contains over 37,000 rows and 21 feature columns, as described in the following table.

Column name | Description
loan_status | Current status of the loan (target variable).
loan_amount | The listed amount of the loan applied for by the borrower. If the credit department reduces the loan amount, it’s reflected in this value.
funded_amount_by_investors | The total amount committed by investors for that loan at that time.
term | The number of payments on the loan. Values are in months and can be either 36 or 60.
interest_rate | Interest rate on the loan.
installment | The monthly payment owed by the borrower if the loan originates.
grade | LC assigned loan grade.
sub_grade | LC assigned loan subgrade.
employment_length | Employment length in years. Possible values are between 0–10, where 0 means less than one year and 10 means ten or more years.
home_ownership | The home ownership status provided by the borrower during registration. Our values are RENT, OWN, MORTGAGE, and OTHER.
annual_income | The self-reported annual income provided by the borrower during registration.
verification_status | Indicates if income was verified or not by the LC.
issued_amount | The month at which the loan was funded.
purpose | A category provided by the borrower for the loan request.
dti | A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.
earliest_credit_line | The month the borrower’s earliest reported credit line was opened.
inquiries_last_6_months | The number of inquiries in the past 6 months (excluding auto and mortgage inquiries).
open_credit_lines | The number of open credit lines in the borrower’s credit file.
derogatory_public_records | The number of derogatory public records.
revolving_line_utilization_rate | Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.
total_credit_lines | The total number of credit lines currently in the borrower’s credit file.

We use this dataset for our data preparation and model training.

Prerequisites

Complete the following prerequisite steps:

  1. Upload both loan files to an S3 bucket of your choice.
  2. Make sure you have the necessary permissions. For more information, refer to Get Started with Data Wrangler.
  3. Set up a SageMaker domain configured to use Data Wrangler. For instructions, refer to Onboard to Amazon SageMaker Domain.

Import the data

Create a new Data Wrangler data flow from the Amazon SageMaker Studio UI.

Import data from Amazon S3 by selecting the CSV files from the S3 bucket where you placed your dataset. After you import both files, you can see two separate workflows in the Data flow view.

You can choose several sampling options when importing your data in a Data Wrangler flow. Sampling can help when you have a dataset that is too large to prepare interactively, or when you want to preserve the proportion of rare events in your sampled dataset. Because our dataset is small, we don’t use sampling.

Prepare the data

For our use case, we have two datasets with a common column: id. As a first step in data preparation, we want to combine these files by joining them. For instructions, refer to Transform Data.

We use the Join data transformation step and use the Inner join type on the id column.

As a result of our join transformation, Data Wrangler creates two additional columns: id_0 and id_1. However, these columns are unnecessary for our model building purposes. We drop these redundant columns using the Manage columns transform step.
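Data Wrangler performs the join without code, but it can help to see what an inner join on id does. The following is a minimal sketch in plain Python, not what Data Wrangler runs internally. The sample rows and the inner_join() helper are made up for illustration, and the sketch assumes each id appears at most once in the second file. It keeps a single id column, which corresponds to the state of the data after the redundant key columns are dropped.

```python
def inner_join(left_rows, right_rows, key="id"):
    """Inner-join two lists of dict rows on a shared key column."""
    # Index the right-hand rows by key so each lookup is a dict access.
    right_by_key = {row[key]: row for row in right_rows}
    joined = []
    for left in left_rows:
        right = right_by_key.get(left[key])
        if right is None:
            continue  # inner join: keep only keys present in both inputs
        merged = dict(left)
        # Copy every right-hand column except the key, which we already have.
        merged.update({k: v for k, v in right.items() if k != key})
        joined.append(merged)
    return joined


# Made-up rows standing in for the two CSV files.
part_one = [{"id": 1, "loan_amount": 5000}, {"id": 2, "loan_amount": 2500}]
part_two = [{"id": 1, "loan_status": "Fully Paid"}, {"id": 3, "loan_status": "Charged Off"}]

print(inner_join(part_one, part_two))
```

Only the row with id 1 survives, because it is the only id present in both inputs.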


We’ve imported our datasets, joined them, and removed unnecessary columns. We’re now ready to enrich our data through feature engineering and prepare for model building.

Perform feature engineering

We used Data Wrangler for preparing data. You can also use the Data Quality and Insights Report feature within Data Wrangler to verify your data quality and detect abnormalities in your data. Data scientists often need to use these data insights to efficiently apply the right domain knowledge to engineering features. For this post, we assume we’ve completed these quality assessments and can move on to feature engineering.

In this step, we apply a few transformations to numeric, categorical, and text columns.

We first normalize the interest rate to scale its values to the range 0–1. We do this using the Process numeric transform to scale the interest_rate column with a min-max scaler. The purpose of normalization (or standardization) is to keep features that are measured on different scales from contributing unequally to the model learning process; without it, a feature with a large numeric range can dominate one with a small range. A transformation function like a min-max scaler puts the features on a comparable scale.
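For reference, min-max scaling maps each value x to (x - min) / (max - min). The following is a minimal sketch of that formula in plain Python. The min_max_scale() helper and the sample rates are illustrative assumptions, not part of Data Wrangler.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


interest_rates = [5.0, 10.0, 12.5, 25.0]  # made-up percentages
print(min_max_scale(interest_rates))  # [0.0, 0.25, 0.375, 1.0]
```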

To convert a categorical variable into a numeric value, we use one-hot encoding. We choose the Encode categorical transform, then choose One-hot encode. This process converts each categorical value into a new feature that holds a binary value of 1 or 0. As a simple example, if you had one column that held either a value of yes or no, one-hot encoding would convert that column to two columns: a Yes column and a No column. A yes value would have 1 in the Yes column and 0 in the No column. One-hot encoding makes our data more useful because most ML algorithms expect numeric inputs, which can improve the model’s predictive ability.
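The following sketch shows the same idea in plain Python. The one_hot_encode() helper and the generated column names (for example, home_ownership_RENT) are assumptions for illustration; Data Wrangler may name its output columns differently.

```python
def one_hot_encode(rows, column):
    """Replace `column` with one 0/1 column per distinct category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


rows = [
    {"home_ownership": "RENT"},
    {"home_ownership": "OWN"},
    {"home_ownership": "RENT"},
]
for encoded_row in one_hot_encode(rows, "home_ownership"):
    print(encoded_row)
```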

Finally, we featurize the employer_title column to transform its string values into a numerical vector. We apply the Count Vectorizer and a standard tokenizer within the Vectorize transform. Tokenization breaks down a sentence or series of text into words, whereas a vectorizer converts text data into a machine-readable form. These words are represented as vectors.
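To make the two terms concrete, the following is a minimal sketch of tokenization followed by count vectorization in plain Python. It is a simplified stand-in rather than the Data Wrangler implementation, and the sample job titles are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_vectorize(documents):
    """Return (vocabulary, vectors) with one count vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


titles = ["Senior Software Engineer", "Software Engineer", "Nurse"]
vocabulary, vectors = count_vectorize(titles)
print(vocabulary)
for vector in vectors:
    print(vector)
```

Each position in a vector counts how often the vocabulary word at that position appears in the corresponding title.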

With all feature engineering steps complete, we can export the data and output the results into our S3 bucket. Alternatively, you can export your flow as Python code or as a Jupyter notebook to create a pipeline with your flow using Amazon SageMaker Pipelines. Consider this when you want to run your feature engineering steps at scale or as part of an ML pipeline.

We can now use the Data Wrangler output file as our input for Canvas. We reference this as a dataset in Canvas to build our ML model.

In our case, we exported our prepared dataset to the default Studio bucket with an output prefix. We reference this dataset location when loading the data into Canvas for model building next.

Build and train your ML model with Canvas

On the SageMaker console, launch the Canvas application. To build an ML model from the prepared data in the previous section, we perform the following steps:

  1. Import the prepared dataset to Canvas from the S3 bucket.

We reference the same S3 path where we exported the Data Wrangler results from the previous section.

  2. Create a new model in Canvas and name it loan_prediction_model.
  3. Select the imported dataset and add it to the model object.

To have Canvas build a model, we must select the target column.

  4. Because our goal is to predict whether a borrower will repay a loan, we choose the loan_status column.

Canvas automatically identifies the type of ML problem statement. At the time of writing, Canvas supports regression, classification, and time series forecasting problems. You can specify the type of problem or have Canvas automatically infer the problem from your data.

  5. Choose your option to start the model building process: Quick build or Standard build.

The Quick build option uses your dataset to train a model within 2–15 minutes. This is useful when you’re experimenting with a new dataset to determine if the dataset you have will be sufficient to make predictions. We use this option for this post.

The Standard build option chooses accuracy over speed and uses approximately 250 model candidates to train the model. The process usually takes 1–2 hours.

After the model is built, you can review the results of the model. Canvas estimates that your model is able to predict the right outcome 82.9% of the time. Your own results may vary due to the variability in training models.

In addition, you can dive deep into a detailed analysis of the model to learn more about it.

Feature importance represents the estimated importance of each feature in predicting the target column. In this case, the credit line column has the most significant impact in predicting if a customer will pay back the loan amount, followed by interest rate and annual income.

The confusion matrix in the Advanced metrics section contains information for users who want a deeper understanding of their model performance.

Before you deploy your model for production workloads, use Canvas to test the model. Canvas manages our model endpoint and allows us to make predictions directly in the Canvas user interface.

  6. Choose Predict and review the findings on either the Batch prediction or Single prediction tab.

In the following example, we make a single prediction by modifying values to predict our target variable loan_status in real time.

We can also select a larger dataset and have Canvas generate batch predictions on our behalf.

Conclusion

End-to-end machine learning is complex and iterative, and often involves multiple personas, technologies, and processes. Data Wrangler and Canvas enable collaboration between teams without requiring these teams to write any code.

A data engineer can easily prepare data using Data Wrangler without writing any code and pass the prepared dataset to a business analyst. A business analyst can then easily build accurate ML models with just a few clicks using Canvas and get accurate predictions in real time or in batch.

Get started with Data Wrangler and Canvas without having to manage any infrastructure. You can set up Canvas quickly and immediately start creating ML models to support your business needs.


About the Authors

Peter Chung is a Solutions Architect for AWS, and is passionate about helping customers uncover insights from their data. He has been building solutions to help organizations make data-driven decisions in both the public and private sectors. He holds all AWS certifications as well as two GCP certifications.

 Meenakshisundaram Thandavarayan is a Senior AI/ML specialist with AWS. He helps hi-tech strategic accounts on their AI and ML journey. He is very passionate about data-driven AI.

Dan Ferguson is a Solutions Architect at AWS, based in New York, USA. As a machine learning services expert, Dan works to support customers on their journey to integrating ML workflows efficiently, effectively, and sustainably.


Use Amazon Lex to capture street addresses

Amazon Lex provides automatic speech recognition (ASR) and natural language understanding (NLU) technologies to transcribe user input, identify the nature of their request, and efficiently manage conversations. Lex lets you create sophisticated conversations, streamline your user experience to improve customer satisfaction (CSAT) scores, and increase containment in your contact centers.

Natural, effective customer interactions require that the Lex virtual agent accurately interprets the information provided by the customer. One scenario that can be particularly challenging is capturing a street address during a call. For example, consider a customer who has recently moved to a new city and calls in to update their street address for their wireless account. Even a single United States zip code can contain a wide range of street names. Getting the right address over the phone can be difficult, even for human agents.

In this post, we’ll demonstrate how you can use Amazon Lex and Amazon Location Service to provide an effective user experience for capturing a user’s address via voice or text.

Solution overview

For this example, we’ll use an Amazon Lex bot that provides self-service capabilities as part of an Amazon Connect contact flow. When users call in on their phone, they can ask to change their address, and the bot will ask them for their customer number and their new address. In many cases, the new address will be captured correctly on the first try. For more challenging addresses, the bot may ask them to restate their street name, spell their street name, or repeat their zip code or address number to capture the correct address.

Here’s a sample user interaction to model our Lex bot:

IVR: Hi, welcome to ACME bank customer service. How can I help? You can check account balances, order checks, or change your address.

User: I want to change my address.

IVR: Can you please tell me your customer number?

User: 123456.

IVR: Thanks. Please tell me your new zip code.

User: 32312.

IVR: OK, what’s your new street address?

User: 6800 Thomasville Road, Suite 1-oh-1.

IVR: Thank you. To make sure I get it right, can you tell me just the name of your street?

User: Thomasville Road.

IVR: OK, your new address is 6800 Thomasville Road, Suite 101, Tallahassee Florida 32312, USA. Is that right?

User: Yes.

IVR: OK, your address has been updated. Is there anything else I can help with?

User: No thanks.

IVR: Thank you for reaching out. Have a great day!

As an alternative approach, you can capture the whole address in a single turn, rather than asking for the zip code first:

IVR: Hi, welcome to ACME bank customer service. How can I help? You can check account balances, order checks, or change your address.

User: I want to update my address.

IVR: Can you please tell me your customer number?

User: 123456.

IVR: Thanks. Please tell me your new address, including the street, city, state, and zip code.

User: 6800 Thomasville Road, Suite 1-oh-1, Tallahassee Florida, 32312.

IVR: Thank you. To make sure I get it right, can you tell me just the name of your street?

User: Thomasville Road.

IVR: OK, your new address is 6800 Thomasville Road, Suite 101, Tallahassee Florida 32312, US. Is that right?

User: Yes.

IVR: OK, your address has been updated. Is there anything else I can help with?

User: No thanks.

IVR: Thank you for reaching out. Have a great day!

Solution architecture

We’ll use an Amazon Lex bot integrated with Amazon Connect in this solution. When the user calls in and provides their new address, Lex uses automatic speech recognition to transcribe their speech to text. Then, it uses an AWS Lambda fulfillment function to send the transcribed text to Amazon Location Service, which performs address lookup and returns a normalized address.

As part of the AWS CloudFormation stack, you can also create an optional Amazon CloudWatch Logs log group for capturing Lex conversation logs, which can be used to create a conversation analytics dashboard to visualize the results (see the post Building a business intelligence dashboard for your Amazon Lex bots for one way to do this).

How it works

This solution combines several techniques to create an effective user experience, including:

  • Amazon Lex automatic speech recognition technology to convert speech to text.
  • Integration with Amazon Location Service for address lookup and normalization.
  • Lex spelling styles, to implement a “say-spell” approach when voice inputs are not clear (for example, ask the user to say their street name, and then if necessary, to spell it).

The first step is to make sure that the required slots have been captured.

In the first code section that follows, we prompt the user for their zip code and street address using the Lex ElicitSlot dialog action. The elicit_slot_with_retries() function prompts the user based on a set of configurable prompts.

 
    # check for the ZipCode slot; if it is not available, elicit it
    zip_code = None
    zipCode = slot_values.get('ZipCode', None)
    if zipCode is not None:
        zip_code = zipCode['value'].get('interpretedValue', None)
    else:
        response = helpers.elicit_slot_with_retries(intent, activeContexts, sessionAttributes, 'ZipCode', requestAttributes)
        return response

    # check for the StreetAddress slot; if it is not available, elicit it
    street_address = None
    streetAddress = slot_values.get('StreetAddress', None)
    if streetAddress is not None:
        street_address = streetAddress['value'].get('interpretedValue', None)
    else:
        # give the caller extra time for this response
        sessionAttributes['x-amz-lex:audio:end-timeout-ms:' + intent_name + ':StreetAddress'] = 2000
        response = helpers.elicit_slot_with_retries(intent, activeContexts, sessionAttributes, 'StreetAddress', requestAttributes)
        return response

    # convert spoken numbers to digits and remember the caller's input
    street_address = parse_address.parse(street_address)
    sessionAttributes['inputAddress'] = street_address

The last section of code above uses a helper function parse_address.parse() that converts spoken numbers into digits (for example, it converts “sixty eight hundred” to “6800”).
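The post does not show the implementation of parse_address.parse(). As a rough illustration of the idea only, the following hypothetical sketch replaces runs of English number words with digits. The function names are assumptions. The sketch does not handle digit-by-digit readings such as “one oh one”, hyphenated words such as “sixty-eight”, or street names that contain number words, all of which a production helper would need to cover.

```python
UNITS = {
    "zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
NUMBER_WORDS = set(UNITS) | set(TENS) | {"hundred", "thousand"}


def number_words_to_int(words):
    """Convert words such as ['sixty', 'eight', 'hundred'] to 6800."""
    total, current = 0, 0
    for word in words:
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
    return total + current


def spoken_numbers_to_digits(text):
    """Replace each run of number words in `text` with its digits."""
    output, run = [], []
    for word in text.split():
        if word.lower() in NUMBER_WORDS:
            run.append(word.lower())
            continue
        if run:
            # A non-number word ends the current run of number words.
            output.append(str(number_words_to_int(run)))
            run = []
        output.append(word)
    if run:
        output.append(str(number_words_to_int(run)))
    return " ".join(output)


print(spoken_numbers_to_digits("sixty eight hundred Thomasville Road"))
```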

Then, we send the user’s utterance to Amazon Location Service and inspect the response. We discard any entries that don’t have a street, a street number, or have an incorrect zip code. In cases where we have to re-prompt for a street name or number, we also discard any previously suggested addresses.

    # validate the address using Amazon Location Service
    location_response = locationClient.search_place_index_for_text(IndexName='explore.place', Text=street_address)

    # inspect the response from Amazon Location Service
    if location_response.get('Results', None) is not None:
        for address in location_response['Results']:
            if address.get('Place', None) is not None:
                addressLabel = address['Place'].get('Label', None)
                addressNumber = address['Place'].get('AddressNumber', None)
                street = address['Place'].get('Street', None)
                postalCode = address['Place'].get('PostalCode', None)

                # discard entries without a street or a street number
                if street is None:
                    continue
                if addressNumber is None:
                    continue

                # discard entries whose postal code is missing or does not
                # start with the zip code the caller provided
                if zip_code is not None:
                    if postalCode is None or postalCode[:len(zip_code)] != zip_code:
                        continue

                # discard addresses that were already suggested to the caller
                prior_suggestions = helpers.get_all_values('suggested_address', sessionAttributes)
                if addressLabel in prior_suggestions:
                    continue

                # the first entry with a valid street that was not already tried is the next best guess
                resolvedAddress = addressLabel
                break
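The same filtering rules can be tried without calling AWS by pulling them into a standalone function. The following is a minimal sketch: the pick_best_address() name and the sample results are made up, and only the Place field names (Label, AddressNumber, Street, and PostalCode) come from the code above.

```python
def pick_best_address(results, zip_code=None, prior_suggestions=()):
    """Return the first usable address label, or None if nothing qualifies."""
    for result in results or []:
        place = result.get("Place")
        if place is None:
            continue
        # A usable candidate needs both a street and a street number.
        if place.get("Street") is None or place.get("AddressNumber") is None:
            continue
        if zip_code is not None:
            postal_code = place.get("PostalCode") or ""
            # Prefix match, so a ZIP+4 code still matches a 5-digit zip code.
            if not postal_code.startswith(zip_code):
                continue
        label = place.get("Label")
        if label in prior_suggestions:
            continue  # the caller already rejected this suggestion
        return label
    return None


# Made-up results shaped like the entries inspected above.
sample_results = [
    {"Place": {"Label": "Thomasville Rd, Tallahassee, FL, USA", "Street": "Thomasville Rd"}},
    {"Place": {"Label": "6800 Thomasville Rd, Tallahassee, FL, 32312, USA",
               "AddressNumber": "6800", "Street": "Thomasville Rd",
               "PostalCode": "32312-3816"}},
]

print(pick_best_address(sample_results, zip_code="32312"))
```

The first sample entry is skipped because it has no street number. Passing the returned label in prior_suggestions on a later call makes the function move on to the next candidate or return None.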

Once we have a resolved address, we confirm it with the user.

    if event.get('inputMode') == 'Speech':
        response_string = '<speak>OK, your new address is <say-as interpret-as="address">'
        response_string += resolvedAddress + '</say-as>. Is that right?</speak>'
        response_message = helpers.format_message_array(response_string, 'SSML')
    else:
        response_string = 'OK, your new address is ' + resolvedAddress + '. Is that right?'
        response_message = helpers.format_message_array(response_string, 'PlainText')
    intent['state'] = 'Fulfilled'
    response = helpers.confirm(intent, activeContexts, sessionAttributes, response_message, requestAttributes)
    return response
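One detail worth considering when you adapt this code: an address that contains a character such as & would produce invalid SSML if it were concatenated as-is. The following sketch is an optional hardening step and is not part of the sample solution. It escapes the address with the Python standard library before building the prompt. The build_confirmation() name is hypothetical.

```python
from xml.sax.saxutils import escape


def build_confirmation(resolved_address, input_mode):
    """Return (message_text, content_type) for the confirmation prompt."""
    if input_mode == "Speech":
        # Escape &, <, and > so the address cannot break the SSML markup.
        safe_address = escape(resolved_address)
        text = (
            "<speak>OK, your new address is "
            f'<say-as interpret-as="address">{safe_address}</say-as>. '
            "Is that right?</speak>"
        )
        return text, "SSML"
    text = f"OK, your new address is {resolved_address}. Is that right?"
    return text, "PlainText"


print(build_confirmation("1 A&B Street", "Speech"))
print(build_confirmation("1 A&B Street", "Text"))
```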

If we don’t get a resolved address back from the Amazon Location Service, or if the user says the address that we suggested wasn’t right, then we re-prompt for some additional information, and try again. The additional information slots include:

  • StreetName: slot type AMAZON.StreetName
  • SpelledStreetName: slot type AMAZON.AlphaNumeric (using Amazon Lex spelling styles)
  • StreetAddressNumber: slot type AMAZON.Number

The logic to re-prompt is controlled by the next_retry() function, which consults a list of actions to try:

RETRY_ACTIONS = [
    { "street_name": {
          "method": elicit_street_name,
          "style": None,
          "no-match": "Thank you. To make sure I get it right, can you tell me just the name of your street?",
          "incorrect": "Let's try again. Can you tell me just the name of your street?"
       }
    },
    { "street_name_spelled_by_letter": {
          "method": elicit_spelled_street, 
          "style": "SpellByLetter",
          "no-match": "Let's try a different way. Can you please spell just the name of your street?",
          "incorrect": "Let's try a different way. Can you please spell just the name of your street?"
       }
    },
    { "street_address_number": {
          "method": elicit_street_address_number, 
          "style": None,
          "no-match": "I didn't find a matching address. Can you please tell me your street address number?",
          "incorrect": "OK, let's try your street address number. Can you tell me that once more?"
       }
    },
    { "street_name_spelled_by_word": {
          "method": elicit_spelled_street, 
          "style": "SpellByWord",
          "no-match": "Let's try one last time. Please spell the name of your street. You can use words for letters, such as a as in apple, or b like bob.",
          "incorrect": "Let's try one last time. Please spell the name of your street. You can use words for letters, such as a as in apple, or b like bob."
       }
    },
    { "agent": {
          "method": route_to_agent, 
          "style": None,
          "no-match": "Sorry, I was not able to find a match for your address. Let me get you to an agent.",
          "incorrect": "Sorry, I was not able to find a match for your address. Let me get you to an agent."
       }
    }
]

The next_retry() function will try these actions in order. You can modify the sequence of prompts by changing the order in the RETRY_ACTIONS list. You can also configure different prompts for scenarios where Amazon Location Service doesn’t find a match, versus when the user says that the suggested address wasn’t correct. As you can see, we may ask the user to restate their street name, and failing that, to spell it using Amazon Lex spelling styles. We refer to this as a “say-spell” approach, and it’s similar to how a human agent would interact with a customer in this scenario.
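The post does not list next_retry() itself. The following is a hypothetical sketch of how such a function could walk the list in order. It uses a shortened copy of RETRY_ACTIONS without the method entries, and it assumes the current position is stored in the session attributes under a made-up key, retry_index.

```python
# Shortened stand-in for RETRY_ACTIONS: styles and prompts only.
RETRY_ACTIONS = [
    {"street_name": {
        "style": None,
        "no-match": "Can you tell me just the name of your street?",
        "incorrect": "Let's try again. Can you tell me just the name of your street?"}},
    {"street_name_spelled_by_letter": {
        "style": "SpellByLetter",
        "no-match": "Can you please spell just the name of your street?",
        "incorrect": "Can you please spell just the name of your street?"}},
    {"agent": {
        "style": None,
        "no-match": "Let me get you to an agent.",
        "incorrect": "Let me get you to an agent."}},
]


def next_retry(session_attributes, reason):
    """Return (action_name, style, prompt) for the next action to try.

    `reason` is "no-match" when the lookup found nothing, or "incorrect"
    when the caller rejected the suggested address.
    """
    index = int(session_attributes.get("retry_index", 0))
    # Stay on the last action (the agent hand-off) once the list is exhausted.
    index = min(index, len(RETRY_ACTIONS) - 1)
    name, config = next(iter(RETRY_ACTIONS[index].items()))
    # Session attribute values are strings, so store the counter as one.
    session_attributes["retry_index"] = str(index + 1)
    return name, config["style"], config[reason]


session = {}
print(next_retry(session, "no-match"))
print(next_retry(session, "incorrect"))
print(next_retry(session, "no-match"))
```

Reordering the entries in the list changes the order of the prompts without touching the function.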

To see this in action, you can deploy it in your AWS account.

Prerequisites

You can use the CloudFormation link that follows to deploy the solution in your own AWS account. Before deploying this solution, you should confirm that you have the following prerequisites:

  • An available AWS account where you can deploy the solution.
  • Access to the following AWS services:
    • Amazon Lex
    • AWS Lambda, for integration with Amazon Location Service
    • Amazon Location Service, for address lookup
    • AWS Identity and Access Management (IAM), for creating the necessary policies and roles
    • CloudWatch Logs, to create log groups for the Lambda function and optionally for capturing Lex conversation logs
    • CloudFormation to create the stack
  • An Amazon Connect instance (for instructions on setting one up, see Create an Amazon Connect instance).

The following AWS Regions support Amazon Lex, Amazon Connect, and Amazon Location Service: US East (N. Virginia), US West (Oregon), Europe (Frankfurt), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Tokyo).

Deploying the sample solution

Sign in to the AWS Management Console in your AWS account, and select the following link to deploy the sample solution:

This will create a new CloudFormation stack.

Enter a Stack name, such as lex-update-address-example. Enter the ARN (Amazon Resource Name) for the Amazon Connect instance that you’ll use for testing the solution. You can keep the default values for the other parameters, or change them to suit your needs. Choose Next, and add any tags that you may want for your stack (optional). Choose Next again, review the stack details, select the checkbox to acknowledge that IAM resources will be created, and then choose Create stack.

After a few minutes, your stack will be complete, and include the following resources:

  • A Lex bot, including a published version with an alias (Development-Alias)
  • A Lambda fulfillment function for the bot (BotHandler)
  • A CloudWatch Logs log group for Lex conversation logs
  • Required IAM roles
  • A custom resource that adds a sample contact flow to your Connect instance

At this point, you can try the example interaction above in the Lex V2 console. You should see the sample bot with the name that you specified in the CloudFormation template (e.g., update-address-bot).

Choose this bot, choose Bot versions in the left-side navigation panel, choose the Version 1 version, and then choose Intents in the left-side panel. You’ll see the list of intents, as well as a Test button.

To test, select the Test button, select Development-Alias, and then select Confirm to open the test window.

Try “I want to change my address” to get started. This will use the UpdateAddressZipFirst intent to capture an address, starting by asking for the zip code, and then asking for the street address.

You can also say “I want to update my address” to try the UpdateAddress intent, which captures an address all at once with a single utterance.

Testing with Amazon Connect

Now let’s try this with voice using a Connect instance. A sample contact flow was already configured in your Connect instance:

All you need to do is set up a phone number, and associate it with this contact flow. To do this, follow these steps:

  • Launch Amazon Connect in the AWS Console.
  • Open your Connect instance by selecting the Access URL, and logging in to the instance.
  • In Dashboard, select View phone numbers.
  • Select Claim a number, choose a country from the Country drop-down, and choose a number.
  • Enter a Description, such as “Example flow to update an address with Amazon Lex”, and select the contact flow that you just created.
  • Choose Save.

Now you’re ready to call in to your Connect instance to test your bot using voice. Just dial the number on your phone, and try some US addresses. To try the zip code first approach, say “change my address”. To try the change address in one turn approach, say “update my address”. You can also just say, “my new address is”, followed by a valid US address.

But wait… there’s more

Another challenging use case for voice scenarios is capturing a user’s email address. This is often needed for user verification purposes, or simply to let the user change their email address on file. Lex has built-in support for email addresses using the AMAZON.EmailAddress built-in slot type, which also supports Lex spelling styles.

Using a “say-spell” approach for capturing email addresses can be very effective, and since the approach is similar to the user experience in the street address capture scenarios that we described above, we’ve included it here. Give it a try!
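As a rough illustration, a Lambda function can ask Lex to elicit the email slot with a spelling style by setting slotElicitationStyle in the ElicitSlot dialog action. The following sketch only builds the response dictionary. The intent name, the slot name EmailAddress, and the prompt text are assumptions rather than names from the sample bot.

```python
def elicit_email_response(intent, session_attributes, style="SpellByLetter"):
    """Build a Lex V2 Lambda response that elicits an email address slot."""
    return {
        "sessionState": {
            "sessionAttributes": session_attributes,
            "dialogAction": {
                "type": "ElicitSlot",
                "slotToElicit": "EmailAddress",  # hypothetical slot name
                # Accepted styles: Default, SpellByLetter, SpellByWord.
                "slotElicitationStyle": style,
            },
            "intent": intent,
        },
        "messages": [
            {
                "contentType": "PlainText",
                "content": "Can you please spell your email address?",
            }
        ],
    }


# Hypothetical intent state, as it might arrive in the Lambda event.
intent = {"name": "UpdateEmail", "slots": {"EmailAddress": None}, "state": "InProgress"}
response = elicit_email_response(intent, {})
print(response["sessionState"]["dialogAction"])
```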

Clean up

You may want to clean up the resources created as part of the CloudFormation template when you’re done using the bot to avoid incurring ongoing charges. To do this, delete the CloudFormation Stack.

Conclusion

Amazon Lex offers powerful automated speech recognition and natural language understanding capabilities that can be used to capture the information needed from your users to provide automated, self-service functionality. Capturing a customer’s address via speech recognition can be challenging due to the range of names for streets, cities, and towns. However, you can easily integrate Amazon Lex with the Amazon Location Service to look up the correct address, based on the customer’s input. You can incorporate this technique in your own Lex conversation flows.


About the Author

Brian Yost is a Senior Technical Program Manager on the Amazon Lex team. In his spare time, he enjoys mountain biking, home brewing, and tinkering with technology.


Customize pronunciation using lexicons in Amazon Polly

Amazon Polly is a text-to-speech service that uses advanced deep learning technologies to synthesize natural-sounding human speech. It is used in a variety of use cases, such as contact center systems that deliver conversational user experiences with human-like voices for automated real-time status checks and automated account and billing inquiries, and by news agencies like The Washington Post to allow readers to listen to news articles.

As of today, Amazon Polly provides over 60 voices in 30+ language variants. Amazon Polly also uses context to pronounce certain words differently based upon the verb tense and other contextual information. For example, “read” in “I read a book” (past tense) and “I will read a book” (future tense) is pronounced differently.

However, in some situations you may want to customize the way Amazon Polly pronounces a word. For example, you may need to match the pronunciation with local dialect or vernacular. Names of things (e.g., Tomato can be pronounced as tom-ah-to or tom-ay-to), people, streets, or places are often pronounced in many different ways.

In this post, we demonstrate how you can leverage lexicons for creating custom pronunciations. You can apply lexicons for use cases such as publishing, education, or call centers.

Customize pronunciation using SSML tag

Let’s say you stream a popular podcast from Australia and you use the Amazon Polly Australian English (Olivia) voice to convert your script into human-like speech. In one of your scripts, you want to use words that are unknown to the Amazon Polly voice. For example, you want to send Mātariki (Māori New Year) greetings to your New Zealand listeners. For such scenarios, Amazon Polly supports phonetic pronunciation, which you can use to achieve a pronunciation that is close to the correct pronunciation in the foreign language.

You can use the <phoneme> Speech Synthesis Markup Language (SSML) tag to suggest a phonetic pronunciation in the ph attribute. Let’s look at how you can use the <phoneme> SSML tag.

First, log in to the AWS console and search for Amazon Polly in the search bar at the top. Select Amazon Polly and then choose the Try Polly button.

In the Amazon Polly console, select Australian English from the language dropdown, enter the following text in the Input text box, and then choose Listen to test the pronunciation.

I’m wishing you all a very Happy Mātariki.

Sample speech without applying phonetic pronunciation:

If you listen to the sample speech above, you’ll notice that the pronunciation of Mātariki, a word that is not part of Australian English, isn’t quite spot-on. Now, let’s look at how we can use phonetic pronunciation with the <phoneme> SSML tag in such scenarios to customize the speech produced by Amazon Polly.

To use SSML tags, turn on the SSML option in the Amazon Polly console. Then copy and paste the following SSML script, which contains the phonetic pronunciation for Mātariki inside the ph attribute of the <phoneme> tag.

<speak>
I’m wishing you all a very Happy
<phoneme alphabet="x-sampa" ph="mA:.tA:.ri.ki">Mātariki</phoneme>.
</speak>

With the <phoneme> tag, Amazon Polly uses the pronunciation specified by the ph attribute instead of the standard pronunciation associated by default with the language used by the selected voice.
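As a rough illustration, the same SSML can be assembled programmatically before it is sent to Amazon Polly. The following Python sketch uses only the standard library. The helper names phoneme_tag and build_ssml are hypothetical and are not part of any AWS SDK.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_tag(word, ph, alphabet="x-sampa"):
    """Wrap a word in a <phoneme> tag carrying its phonetic transcription."""
    # quoteattr adds the surrounding quotes and escapes special characters.
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(word)}</phoneme>"
    )


def build_ssml(before, word, ph, after=""):
    """Build a <speak> document with one custom pronunciation (hypothetical helper)."""
    return f"<speak>{escape(before)}{phoneme_tag(word, ph)}{escape(after)}</speak>"


ssml = build_ssml("I’m wishing you all a very Happy ", "Mātariki", "mA:.tA:.ri.ki", ".")
print(ssml)
```

You could then pass the resulting string to the SynthesizeSpeech API with the text type set to ssml, for example through the AWS SDK for Python.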

Sample speech after applying phonetic pronunciation:

If you listen to the sample sound, you’ll notice that we opted for a different pronunciation for some of the vowels (e.g., ā) to make Amazon Polly synthesize sounds that are closer to the correct pronunciation. You might now ask: how do I generate the phonetic transcription “mA:.tA:.ri.ki” for the word Mātariki?

You can create phonetic transcriptions by referring to the Phoneme and Viseme tables for the supported languages. In the example above we have used the phonemes for Australian English.

Amazon Polly supports two phonetic alphabets: IPA and X-SAMPA. The benefit of X-SAMPA is that it uses standard ASCII characters, so it is easier to type the phonetic transcription with a normal keyboard. You can use either IPA or X-SAMPA to generate your transcriptions, but make sure to stay consistent with your choice, especially when you use a lexicon file, which we cover in the next section.

Each phoneme in the phoneme table represents a speech sound. The bolded letters in the “Example” column of the Phoneme/Viseme table in the Australian English page linked above represent the part of the word the “Phoneme” corresponds to. For example, the phoneme /j/ represents the sound that an Australian English speaker makes when pronouncing the letter “y” in “yes.”

Customize pronunciation using lexicons

Phoneme tags are suitable for customizing isolated, one-off cases, but they don’t scale. If you process a large volume of text that is managed by different editors and reviewers, we recommend using lexicons. With lexicons, you can add custom pronunciations consistently and reduce the manual effort of inserting phoneme tags into the script.

A good practice is to test the custom pronunciation on the Amazon Polly console using the <phoneme> tag, and then create a library of customized pronunciations using lexicons. After the lexicon file is uploaded, Amazon Polly automatically applies the phonetic pronunciations specified in it, which eliminates the need to manually provide a <phoneme> tag.

Create a lexicon file

A lexicon file contains the mapping between words and their phonetic pronunciations. Pronunciation Lexicon Specification (PLS) is a W3C recommendation for specifying interoperable pronunciation information. The following is an example PLS document:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="x-sampa" xml:lang="en-AU">

  <lexeme>
    <grapheme>Matariki</grapheme>
    <grapheme>Mātariki</grapheme>
    <phoneme>mA:.tA:.ri.ki</phoneme>
  </lexeme>

  <lexeme>
    <grapheme>NZ</grapheme>
    <alias>New Zealand</alias>
  </lexeme>

</lexicon>

Make sure that you use the correct value for the xml:lang field. Use en-AU if you’re uploading the lexicon file to use with the Amazon Polly Australian English voice. For a complete list of supported languages, refer to Languages Supported by Amazon Polly.

To specify a custom pronunciation, add a <lexeme> element, which is a container for a lexical entry with one or more <grapheme> elements and one or more pronunciations provided inside <phoneme> elements.

The <grapheme> element contains the text describing the orthography of the <lexeme> element. You can use a <grapheme> element to specify the word whose pronunciation you want to customize. You can add multiple <grapheme> elements to specify all word variations, for example with or without macrons. The <grapheme> element is case-sensitive, and during speech synthesis Amazon Polly string matches the words inside the script that you’re converting to speech. If a match is found, it uses the <phoneme> element, which describes how the <lexeme> is pronounced, to generate the phonetic transcription.

You can also use <alias> for commonly used abbreviations. In the preceding example of a lexicon file, NZ is used as an alias for New Zealand. This means that whenever Amazon Polly comes across “NZ” (with matching case) in the body of the text, it’ll read those two letters as “New Zealand”.

For more information on lexicon file format, see Pronunciation Lexicon Specification (PLS) Version 1.0 on the W3C website.

You can save a lexicon file as a .pls or .xml file before uploading it to Amazon Polly.
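If you maintain many entries, you might generate the lexicon file from a simple data structure instead of editing XML by hand. The following is a minimal sketch using Python’s standard xml.etree.ElementTree module. The build_lexicon function and the shape of the entries list are assumptions made for illustration, and the optional xsi:schemaLocation attribute from the example above is omitted.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(entries, lang="en-AU", alphabet="x-sampa"):
    """Return a PLS document as a string (hypothetical helper)."""
    # Serialize PLS elements without a prefix, as in the example document.
    ET.register_namespace("", PLS_NS)
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    for entry in entries:
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        # One <grapheme> per spelling variation of the word.
        for spelling in entry["graphemes"]:
            ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = spelling
        if "phoneme" in entry:
            ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = entry["phoneme"]
        if "alias" in entry:
            ET.SubElement(lexeme, f"{{{PLS_NS}}}alias").text = entry["alias"]
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


ENTRIES = [
    {"graphemes": ["Matariki", "Mātariki"], "phoneme": "mA:.tA:.ri.ki"},
    {"graphemes": ["NZ"], "alias": "New Zealand"},
]
lexicon_xml = build_lexicon(ENTRIES)
print(lexicon_xml)
```

Writing lexicon_xml to a file with UTF-8 encoding gives you a document that you can keep under version control and upload as described in the next section.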

Upload and apply the lexicon file

Upload your lexicon file to Amazon Polly using the following instructions:

  1. On the Amazon Polly console, choose Lexicons in the navigation pane.
  2. Choose Upload lexicon.
  3. Enter a name for the lexicon and then choose a lexicon file.
  4. Choose the file to upload.
  5. Choose Upload lexicon.

If a lexicon by the same name (whether a .pls or .xml file) already exists, uploading the lexicon overwrites the existing lexicon.

Now you can apply the lexicon to customize pronunciation.

  1. Choose Text-to-Speech in the navigation pane.
  2. Expand Additional settings.
  3. Turn on Customize pronunciation.
  4. Choose the lexicon on the drop-down menu.

You can also choose Upload lexicon to upload a new lexicon file (or a new version).
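The console steps above also have API equivalents. The following Python sketch assumes you pass in an Amazon Polly client, for example one created with boto3.client("polly"), and calls its put_lexicon and synthesize_speech operations. The function names, the local name check (which reflects our understanding of the lexicon naming rules), and the choice of the neural engine for the Olivia voice are assumptions rather than part of the original walkthrough.

```python
import re

# Assumption: lexicon names are 1-20 alphanumeric characters.
LEXICON_NAME_PATTERN = re.compile(r"^[0-9A-Za-z]{1,20}$")


def upload_lexicon(polly, name, pls_content):
    """Upload (or overwrite) a lexicon using the given Polly client."""
    if not LEXICON_NAME_PATTERN.match(name):
        raise ValueError(f"Unexpected lexicon name: {name!r}")
    polly.put_lexicon(Name=name, Content=pls_content)


def synthesize_with_lexicon(polly, text, lexicon_name, voice_id="Olivia"):
    """Synthesize plain text with a lexicon applied and return the audio bytes."""
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        Engine="neural",  # assumption: the chosen voice uses the neural engine
        LexiconNames=[lexicon_name],
    )
    return response["AudioStream"].read()
```

Passing the client in as a parameter keeps the functions easy to test with a stand-in object before you run them against your AWS account.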

It’s a good practice to version control the lexicon file in a source code repository. Keeping the custom pronunciations in a lexicon file ensures that you can consistently refer to phonetic pronunciations for certain words across the organization. Also, keep in mind the pronunciation lexicon limits listed on the Quotas in Amazon Polly page.

Test the pronunciation after applying the lexicon

Let’s perform a quick test using “Wishing all my listeners in NZ, a very Happy Mātariki” as the input text.

We can compare the audio files before and after applying the lexicon.

Before applying the lexicon:

After applying the lexicon:

Conclusion

In this post, we discussed how you can customize pronunciations of commonly used acronyms or words not found in the selected language in Amazon Polly. You can use the <phoneme> SSML tag, which is great for one-off customizations or for testing. We recommend using lexicons to create a consistent set of pronunciations for frequently used words across your organization. This lets your content writers spend time on writing instead of the tedious task of repeatedly adding phonetic pronunciations to the script. You can try this in your AWS account on the Amazon Polly console.

About the Authors

Ratan Kumar is a Solutions Architect based out of Auckland, New Zealand. He works with large enterprise customers helping them design and build secure, cost-effective, and reliable internet scale applications using the AWS cloud. He is passionate about technology and likes sharing knowledge through blog posts and Twitch sessions.

Maciek Tegi is a Principal Audio Designer and a Product Manager for Polly Brand Voices. He has worked in a professional capacity in the tech industry, movies, commercials, and game localization. In 2013, he was the first audio engineer hired to the Alexa Text-to-Speech team. Maciek was involved in releasing 12 Alexa TTS voices across different countries, over 20 Polly voices, and 4 Alexa celebrity voices. Maciek is a triathlete and an avid acoustic guitar player.

Read More

Personalize your machine translation results by using fuzzy matching with Amazon Translate

A person’s vernacular is part of the characteristics that make them unique. There are often countless different ways to express one specific idea. When a firm communicates with their customers, it’s critical that the message is delivered in a way that best represents the information they’re trying to convey. This becomes even more important when it comes to professional language translation. Customers of translation systems and services expect accurate and highly customized outputs. To achieve this, they often reuse previous translation outputs—called translation memory (TM)—and compare them to new input text. In computer-assisted translation, this technique is known as fuzzy matching. The primary function of fuzzy matching is to assist the translator by speeding up the translation process. When an exact match can’t be found in the TM database for the text being translated, translation management systems (TMSs) often have the option to search for a match that is less than exact. Potential matches are provided to the translator as additional input for final translation. Translators who enhance their workflow with machine translation capabilities such as Amazon Translate often expect fuzzy matching data to be used as part of the automated translation solution.

In this post, you learn how to customize output from Amazon Translate according to translation memory fuzzy match quality scores.

Translation Quality Match

The XML Localization Interchange File Format (XLIFF) standard is often used as a data exchange format between TMSs and Amazon Translate. XLIFF files produced by TMSs include source and target text data along with match quality scores based on the available TM. These scores—usually expressed as a percentage—indicate how close the translation memory is to the text being translated.

Some customers with very strict requirements only want machine translation to be used when match quality scores are below a certain threshold. Beyond this threshold, they expect their own translation memory to take precedence. Translators often need to apply these preferences manually, either within their TMS or by altering the text data. This flow is illustrated in the following diagram. The machine translation system processes the translation data (text and fuzzy match scores), which is then reviewed and manually edited by translators based on their desired quality thresholds. Applying thresholds as part of the machine translation step allows you to remove these manual steps, which improves efficiency and optimizes cost.

Figure 1: Machine Translation Review Flow

The solution presented in this post allows you to enforce rules based on match quality score thresholds to drive whether a given input text should be machine translated by Amazon Translate or not. When not machine translated, the resulting text is left to the discretion of the translators reviewing the final output.

Solution Architecture

The solution architecture illustrated in Figure 2 leverages the following services:

  • Amazon Simple Storage Service – Amazon S3 buckets contain the following content:

    • Fuzzy match threshold configuration files
    • Source text to be translated
    • Amazon Translate input and output data locations
  • AWS Systems Manager – We use Parameter Store parameters to store match quality threshold configuration values
  • AWS Lambda – We use two Lambda functions:

    • One function preprocesses the quality match threshold configuration files and persists the data into Parameter Store
    • One function automatically creates the asynchronous translation jobs
  • Amazon Simple Queue Service – An Amazon SQS queue triggers the translation flow as a result of new files coming into the source bucket

Figure 2: Solution Architecture
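To make the queue-driven step of this architecture more concrete, the following Python sketch shows how a Lambda function could pull the bucket and object key out of an Amazon S3 notification delivered through Amazon SQS. It is not the code deployed by the solution. The function name and the filtering on the translate/ prefix and .xlf extension are assumptions based on the walkthrough later in this post.

```python
import json
from urllib.parse import unquote_plus


def extract_xliff_objects(sqs_event, prefix="translate/"):
    """Return (bucket, key) pairs for new .xlf files announced in an SQS event."""
    found = []
    for message in sqs_event.get("Records", []):
        # Each SQS message body carries the S3 notification as a JSON string.
        notification = json.loads(message["body"])
        for record in notification.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys in S3 notifications are URL-encoded.
            key = unquote_plus(record["s3"]["object"]["key"])
            if key.startswith(prefix) and key.endswith(".xlf"):
                found.append((bucket, key))
    return found
```

Each returned pair identifies one XLIFF file for which a translation job could then be created.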

You first set up quality thresholds for your translation jobs by editing a configuration file and uploading it into the fuzzy match threshold configuration S3 bucket. The following is a sample configuration in CSV format. We chose CSV for simplicity, although you can use any format. Each line represents a threshold to be applied to either a specific translation job or as a default value to any job.

default, 75
SourceMT-Test, 80

The specifications of the configuration file are as follows:

  • Column 1 should be populated with the name of the XLIFF file—without extension—provided to the Amazon Translate job as input data.
  • Column 2 should be populated with the quality match percentage threshold. For any score below this value, machine translation is used.
  • For all XLIFF files whose name doesn’t match any name listed in the configuration file, the default threshold is used—the line with the keyword default set in Column 1.
Figure 3: Auto-generated parameter in Systems Manager Parameter Store

When a new file is uploaded, Amazon S3 triggers the Lambda function in charge of processing the parameters. This function reads the threshold parameters and stores them in Parameter Store for future use. Using Parameter Store avoids performing redundant Amazon S3 GET requests each time a new translation job is initiated. The sample configuration file produces the parameter tags shown in Figure 3.
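The parsing and lookup rules for the configuration file can be sketched in a few lines of Python. This is an illustration only and not the Lambda function deployed by the solution. The function names are hypothetical, and the step that writes the values to Parameter Store is left out.

```python
import csv
import io
import os


def parse_thresholds(config_text):
    """Parse 'name, threshold' lines into a dict of name -> percentage."""
    thresholds = {}
    for row in csv.reader(io.StringIO(config_text), skipinitialspace=True):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        thresholds[row[0].strip()] = float(row[1].strip())
    return thresholds


def threshold_for(xliff_path, thresholds):
    """Look up the threshold for an XLIFF file, falling back to 'default'."""
    # Column 1 holds the file name without its extension.
    stem = os.path.splitext(os.path.basename(xliff_path))[0]
    return thresholds.get(stem, thresholds["default"])


SAMPLE_CONFIG = "default, 75\nSourceMT-Test, 80\n"
thresholds = parse_thresholds(SAMPLE_CONFIG)
print(threshold_for("translate/SourceMT-Test.xlf", thresholds))
print(threshold_for("translate/another-job.xlf", thresholds))
```

With the sample configuration, the first lookup resolves to the job-specific threshold and the second falls back to the default line.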

The job initialization Lambda function uses these parameters to preprocess the data prior to invoking Amazon Translate. We use an English-to-Spanish translation XLIFF input file, as shown in the following code. It contains the initial text to be translated, broken down into what is referred to as segments, represented in the source tags.

<group id="g8">
    <trans-unit id="t8" translate="yes">
        <source>Consent Form</source>
        <target state-qualifier="fuzzy-match"/>
        <alt-trans extype="fuzzy-match" match-quality="99%" >
            <source>CONSENT FORM</source>
            <target>FORMULARIO DE CONSENTIMIENTO</target>
        </alt-trans>
    </trans-unit>
</group>

<group id="g67">
    <trans-unit id="t110" translate="yes">
        <source>Screening Visit:</source>
        <target state-qualifier="fuzzy-match"/>
        <alt-trans extype="fuzzy-match" match-quality="50%">
            <source>Screening Visit</source>
            <target>Selección</target>
        </alt-trans>
    </trans-unit>
</group>

The source text has been pre-matched with the translation memory beforehand. The data contains potential translation alternatives—represented as <alt-trans> tags—alongside a match quality attribute, expressed as a percentage. The business rule is as follows:

  • Segments received with alternative translations and a match quality below the threshold are untouched or empty. This signals to Amazon Translate that they must be translated.
  • Segments received with alternative translations with a match quality above the threshold are pre-populated with the suggested target text. Amazon Translate skips those segments.

Let’s assume the quality match threshold configured for this job is 80%. The first segment with 99% match quality isn’t machine translated, whereas the second segment is, because its match quality is below the defined threshold. In this configuration, Amazon Translate produces the following output:

<group id="g8">
	<trans-unit id="t8" translate="yes">
		<source>Consent Form</source>
		<target state-qualifier="fuzzy-match" translate:match-quality="99%">FORMULARIO DE CONSENTIMIENTO</target>
		<alt-trans extype="fuzzy-match" match-quality="99%" >
			<source>CONSENT FORM</source>
			<target>FORMULARIO DE CONSENTIMIENTO</target>
		</alt-trans>
	</trans-unit>
</group>

<group id="g67">
	<trans-unit id="t110" translate="yes">
		<source>Screening Visit:</source>
		<target state-qualifier="fuzzy-match">Visita de selección</target>
		<alt-trans extype="fuzzy-match" match-quality="50%">
			<source>Screening Visit</source>
			<target>Selección</target>
		</alt-trans>
	</trans-unit>
</group>

In the second segment, Amazon Translate overwrites the target text initially suggested (Selección) with a higher quality translation: Visita de selección.
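The preprocessing rule described above can be sketched as follows. This Python example is a simplified illustration, not the solution’s Lambda code. It assumes XLIFF fragments without an XML namespace, wraps the segments in a <body> element so that the sample has a single root, and treats a score equal to the threshold as a match to keep. The function name apply_threshold is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<body>
  <trans-unit id="t8" translate="yes">
    <source>Consent Form</source>
    <target state-qualifier="fuzzy-match"/>
    <alt-trans extype="fuzzy-match" match-quality="99%">
      <source>CONSENT FORM</source>
      <target>FORMULARIO DE CONSENTIMIENTO</target>
    </alt-trans>
  </trans-unit>
  <trans-unit id="t110" translate="yes">
    <source>Screening Visit:</source>
    <target state-qualifier="fuzzy-match"/>
    <alt-trans extype="fuzzy-match" match-quality="50%">
      <source>Screening Visit</source>
      <target>Selección</target>
    </alt-trans>
  </trans-unit>
</body>"""


def apply_threshold(xliff_text, threshold):
    """Pre-populate targets whose fuzzy match score meets the threshold."""
    root = ET.fromstring(xliff_text)
    kept = []
    for unit in root.iter("trans-unit"):
        target = unit.find("target")    # direct child only
        alt = unit.find("alt-trans")
        if target is None or alt is None:
            continue
        quality = float(alt.get("match-quality", "0").rstrip("%"))
        suggestion = alt.find("target")
        if quality >= threshold and suggestion is not None:
            # Keep the translation memory suggestion for this segment.
            target.text = suggestion.text
            kept.append(unit.get("id"))
        # Otherwise the target stays empty, so the segment is machine translated.
    return ET.tostring(root, encoding="unicode"), kept


processed, kept_ids = apply_threshold(SAMPLE, threshold=80)
print(kept_ids)
```

With a threshold of 80, only segment t8 is pre-populated, and segment t110 keeps its empty target.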

One possible extension to this use case could be to reuse the translated output and create our own translation memory. Amazon Translate supports customization of machine translation using translation memory thanks to the parallel data feature. Text segments previously machine translated due to their initial low-quality score could then be reused in new translation projects.

In the following sections, we walk you through the process of deploying and testing this solution. You use AWS CloudFormation scripts and data samples to launch an asynchronous translation job personalized with a configurable quality match threshold.

Prerequisites

For this walkthrough, you must have an AWS account. If you don’t have an account yet, you can create and activate one.

Launch AWS CloudFormation stack

  1. Choose Launch Stack:
  2. For Stack name, enter a name.
  3. For ConfigBucketName, enter the S3 bucket containing the threshold configuration files.
  4. For ParameterStoreRoot, enter the root path of the parameters created by the parameters processing Lambda function.
  5. For QueueName, enter the SQS queue that you create to post new file notifications from the source bucket to the job initialization Lambda function. This is the function that reads the configuration file.
  6. For SourceBucketName, enter the S3 bucket containing the XLIFF files to be translated. If you prefer to use a preexisting bucket, you need to change the value of the CreateSourceBucket parameter to No.
  7. For WorkingBucketName, enter the S3 bucket Amazon Translate uses for input and output data.
  8. Choose Next.

    Figure 4: CloudFormation stack details

  9. Optionally on the Stack Options page, add key names and values for the tags you may want to assign to the resources about to be created.
  10. Choose Next.
  11. On the Review page, select I acknowledge that this template might cause AWS CloudFormation to create IAM resources.
  12. Review the other settings, then choose Create stack.

AWS CloudFormation takes several minutes to create the resources on your behalf. You can watch the progress on the Events tab on the AWS CloudFormation console. When the stack has been created, you can see a CREATE_COMPLETE message in the Status column on the Overview tab.

Test the solution

Let’s go through a simple example.

  1. Download the following sample data.
  2. Unzip the content.

There should be two files: an .xlf file in XLIFF format, and a threshold configuration file with .cfg as the extension. The following is an excerpt of the XLIFF file.

Figure 5: English to French sample file extract

  3. On the Amazon S3 console, upload the quality threshold configuration file into the configuration bucket you specified earlier.

The value set for test_En_to_Fr is 75%. You should be able to see the parameters on the Systems Manager console in the Parameter Store section.

  4. Still on the Amazon S3 console, upload the .xlf file into the S3 bucket you configured as source. Make sure the file is under a folder named translate (for example, <my_bucket>/translate/test_En_to_Fr.xlf).

This starts the translation flow.

  5. Open the Amazon Translate console.

A new job should appear with a status of In Progress.

Figure 6: In progress translation jobs on Amazon Translate console

  6. Once the job is complete, choose the job’s link and review the output.

All segments should have been translated. In the translated XLIFF file, look for segments with additional attributes named lscustom:match-quality, as shown in the following screenshot. These custom attributes identify segments where suggested translation was retained based on score.

Figure 7: Custom attributes identifying segments where suggested translation was retained based on score

These were derived from the translation memory according to the quality threshold. All other segments were machine translated.

You have now deployed and tested an automated asynchronous translation job assistant that enforces configurable translation memory match quality thresholds. Great job!

Cleanup

If you deployed the solution into your account, don’t forget to delete the CloudFormation stack to avoid any unexpected cost. You need to empty the S3 buckets manually beforehand.

Conclusion

In this post, you learned how to customize your Amazon Translate translation jobs based on standard XLIFF fuzzy matching quality metrics. With this solution, you can greatly reduce the manual labor involved in reviewing machine translated text while also optimizing your usage of Amazon Translate. You can also extend the solution with data ingestion automation and workflow orchestration capabilities, as described in Speed Up Translation Jobs with a Fully Automated Translation System Assistant.


About the Authors

Narcisse Zekpa is a Solutions Architect based in Boston. He helps customers in the Northeast U.S. accelerate their adoption of the AWS Cloud by providing architectural guidelines and designing innovative, scalable solutions. When Narcisse is not building, he enjoys spending time with his family, traveling, cooking, and playing basketball.

Dimitri Restaino is a Solutions Architect at AWS, based out of Brooklyn, New York. He works primarily with Healthcare and Financial Services companies in the Northeast, helping to design innovative and creative solutions to best serve their customers. Coming from a software development background, he is excited by the new possibilities that serverless technology can bring to the world. Outside of work, he loves to hike and explore the NYC food scene.

Read More

Enhance the caller experience with hints in Amazon Lex

We understand speech input better if we have some background on the topic of conversation. Consider a customer service agent at an auto parts wholesaler helping with orders. If the agent knows that the customer is looking for tires, they’re more likely to recognize responses (for example, “Michelin”) on the phone. Agents often pick up such clues or hints based on their domain knowledge and access to business intelligence dashboards. Amazon Lex now supports a hints capability to enhance the recognition of relevant phrases in a conversation. You can programmatically provide phrases as hints during a live interaction to influence the transcription of spoken input. Better recognition drives efficient conversations, reduces agent handling time, and ultimately increases customer satisfaction.

In this post, we review the runtime hints capability and use it to implement verification of callers based on their mother’s maiden name.

Overview of the runtime hints capability

You can provide a list of phrases or words to help your bot with the transcription of speech input. You can use these hints with built-in slot types such as first and last names, street names, city, state, and country. You can also configure these for your custom slot types.

You can use the capability to transcribe names that may be difficult to pronounce or understand. For example, in the following sample conversation, we use it to transcribe the name “Loreck.”

Conversation 1

IVR: Welcome to ACME bank. How can I help you today?

Caller: I want to check my account balance.

IVR: Sure. Which account should I pull up?

Caller: Checking

IVR: What is the account number?

Caller: 1111 2222 3333 4444

IVR: For verification purposes, what is your mother’s maiden name?

Caller: Loreck

IVR: Thank you. The balance on your checking account is 123 dollars.

Words provided as hints are preferred over other similar words. For example, in the second sample conversation, the runtime hint (“Smythe”) is selected over a more common transcription (“Smith”).

Conversation 2

IVR: Welcome to ACME bank. How can I help you today?

Caller: I want to check my account balance.

IVR: Sure. Which account should I pull up?

Caller: Checking

IVR: What is the account number?

Caller: 5555 6666 7777 8888

IVR: For verification purposes, what is your mother’s maiden name?

Caller: Smythe

IVR: Thank you. The balance on your checking account is 456 dollars.

If the name doesn’t match the runtime hint, you can fail the verification and route the call to an agent.

Conversation 3

IVR: Welcome to ACME bank. How can I help you today?

Caller: I want to check my account balance.

IVR: Sure. Which account should I pull up?

Caller: Savings

IVR: What is the account number?

Caller: 5555 6666 7777 8888

IVR: For verification purposes, what is your mother’s maiden name?

Caller: Jane

IVR: There is an issue with your account. For support, you will be forwarded to an agent.

Solution overview

Let’s review the overall architecture for the solution (see the following diagram):

  • We use an Amazon Lex bot integrated with an Amazon Connect contact flow to deliver the conversational experience.
  • We use a dialog codehook in the Amazon Lex bot to invoke an AWS Lambda function that provides the runtime hint at the previous turn of the conversation.
  • For the purposes of this post, the mother’s maiden name data used for authentication is stored in an Amazon DynamoDB table.
  • After the caller is authenticated, the control is passed to the bot to perform transactions (for example, check balance).

In addition to the Lambda function, you can also send runtime hints to Amazon Lex V2 using the PutSession, RecognizeText, RecognizeUtterance, or StartConversation operations. The runtime hints can be set at any point in the conversation and are persisted at every turn until cleared.
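As an illustration of what the Lambda function returns, the following Python sketch builds a response that asks Amazon Lex V2 to elicit a slot and attaches runtime hints for it. The layout (sessionState, runtimeHints, slotHints, runtimeHintValues) follows our understanding of the Lex V2 Lambda response format. The intent name CheckBalance and the slot name mothersMaidenName are hypothetical and may differ from the names used in the sample bot.

```python
def build_runtime_hints(intent_name, slot_name, phrases):
    """Build the runtimeHints structure for one slot of one intent."""
    return {
        "slotHints": {
            intent_name: {
                slot_name: {
                    "runtimeHintValues": [{"phrase": phrase} for phrase in phrases]
                }
            }
        }
    }


def elicit_slot_with_hints(event, slot_name, phrases):
    """Return a Lambda response that elicits a slot and sets hints for it."""
    session_state = dict(event.get("sessionState", {}))
    intent_name = session_state.get("intent", {}).get("name")
    session_state["dialogAction"] = {"type": "ElicitSlot", "slotToElicit": slot_name}
    session_state["runtimeHints"] = build_runtime_hints(intent_name, slot_name, phrases)
    return {"sessionState": session_state}


# Hypothetical incoming event from the bot (only the fields used here).
sample_event = {"sessionState": {"intent": {"name": "CheckBalance", "slots": {}}}}
response = elicit_slot_with_hints(sample_event, "mothersMaidenName", ["Loreck"])
print(response["sessionState"]["runtimeHints"])
```

In this sketch, the hint is the name stored for the account that the caller provided in the previous turn, so it is set just before the bot asks the verification question.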

Deploy the sample Amazon Lex bot

To create the sample bot and configure the runtime phrase hints, perform the following steps. This creates an Amazon Lex bot called BankingBot, and one slot type (accountNumber).

  1. Download the Amazon Lex bot.
  2. On the Amazon Lex console, choose Actions, Import.
  3. Choose the file BankingBot.zip that you downloaded, and choose Import.
  4. Choose the bot BankingBot on the Amazon Lex console.
  5. Choose the language English (GB).
  6. Choose Build.
  7. Download the supporting Lambda code.
  8. On the Lambda console, create a new function and select Author from scratch.
  9. For Function name, enter BankingBotEnglish.
  10. For Runtime, choose Python 3.8.
  11. Choose Create function.
  12. In the Code source section, open lambda_function.py and delete the existing code.
  13. Download the function code and open it in a text editor.
  14. Copy the code and enter it into the empty function code field.
  15. Choose Deploy.
  16. On the Amazon Lex console, select the bot BankingBot.
  17. Choose Deployment and then Aliases, then choose the alias TestBotAlias.
  18. On the Aliases page, choose Languages and choose English (GB).
  19. For Source, select the Lambda function BankingBotEnglish.
  20. For Lambda version or alias, enter $LATEST.
  21. On the DynamoDB console, choose Create table.
  22. Provide the name as customerDatabase.
  23. Provide the partition key as accountNumber.
  24. Add an item with accountNumber: “1111222233334444” and mothersMaidenName “Loreck”.
  25. Add an item with accountNumber: “5555666677778888” and mothersMaidenName “Smythe”.
  26. Make sure the Lambda function has permissions to read from the DynamoDB table customerDatabase.
  27. On the Amazon Connect console, choose Contact flows.
  28. In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flow.
  29. Download the contact flow to integrate with the Amazon Lex bot.
  30. Choose the contact flow to load it into the application.
  31. Make sure the right bot is configured in the “Get Customer Input” block.
  32. Choose a queue in the “Set working queue” block.
  33. Add a phone number to the contact flow.
  34. Test the IVR flow by calling in to the phone number.
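The verification logic behind the three sample conversations can be sketched as follows. This is not the downloadable function code. It is a minimal Python illustration that assumes a table object exposing get_item, such as boto3.resource("dynamodb").Table("customerDatabase"), and account numbers stored as strings of digits. The function names are hypothetical.

```python
def normalize_account_number(spoken):
    """Keep only the digits, so '1111 2222 3333 4444' matches the stored key."""
    return "".join(ch for ch in spoken if ch.isdigit())


def lookup_maiden_name(table, account_number):
    """Fetch the stored name for an account, or None if the account is unknown."""
    key = {"accountNumber": normalize_account_number(account_number)}
    item = table.get_item(Key=key).get("Item")
    return item.get("mothersMaidenName") if item else None


def verify_caller(table, account_number, transcribed_name):
    """Return True only when the transcribed name matches the stored name."""
    expected = lookup_maiden_name(table, account_number)
    if expected is None:
        return False
    return transcribed_name.strip().casefold() == expected.casefold()
```

The value returned by lookup_maiden_name is also what you would supply as the runtime hint, and a False result from verify_caller corresponds to routing the call to an agent, as in Conversation 3.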

Test the solution

You can now call in to the Amazon Connect phone number and interact with the bot.

Conclusion

Runtime hints allow you to influence the transcription of words or phrases dynamically in the conversation. You can use business logic to identify the hints as the conversation evolves. Better recognition of the user input allows you to deliver an enhanced experience. You can configure runtime hints via the Lex V2 SDK. The capability is available in all AWS Regions where Amazon Lex operates in the English (Australia), English (UK), and English (US) locales.

To learn more, refer to runtime hints.


About the Authors

Kai Loreck is a professional services Amazon Connect consultant. He works on designing and implementing scalable customer experience solutions. In his spare time, he can be found playing sports, snowboarding, or hiking in the mountains.

Anubhav Mishra is a Product Manager with AWS. He spends his time understanding customers and designing product experiences to address their business challenges.

Sravan Bodapati is an Applied Science Manager at AWS Lex. He focuses on building cutting edge Artificial Intelligence and Machine Learning solutions for AWS customers in the ASR and NLP space. In his spare time, he enjoys hiking, learning economics, watching TV shows, and spending time with his family.

Read More

Run automatic model tuning with Amazon SageMaker JumpStart

In December 2020, AWS announced the general availability of Amazon SageMaker JumpStart, a capability of Amazon SageMaker that helps you quickly and easily get started with machine learning (ML). In March 2022, we also announced the support for APIs in JumpStart. JumpStart provides one-click fine-tuning and deployment of a wide variety of pre-trained models across popular ML tasks, as well as a selection of end-to-end solutions that solve common business problems. These features remove the heavy lifting from each step of the ML process, making it simpler to develop high-quality models and reducing time to deployment.

In this post, we demonstrate how to run automatic model tuning with JumpStart.

SageMaker automatic model tuning

Traditionally, ML engineers use a trial-and-error method to find the right set of hyperparameters. Trial and error involves running multiple jobs sequentially or in parallel while provisioning the resources needed to run each experiment.

With SageMaker automatic model tuning, ML engineers and data scientists can offload the time-consuming task of optimizing their model and let SageMaker run the experimentation. SageMaker takes advantage of the elasticity of the AWS platform to efficiently and concurrently run multiple training simulations on a dataset and find the best hyperparameters for a model.

SageMaker automatic model tuning finds the best version of a model by running many training jobs on your dataset using the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that result in a model that performs the best, as measured by a metric that you choose.

Automatic model tuning uses either a Bayesian (default) or a random search strategy to find the best values for hyperparameters. Bayesian search treats hyperparameter tuning like a regression problem. When choosing the best hyperparameters for the next training job, it considers everything that it knows about the problem so far and allows the algorithm to exploit the best-known results.
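To make the idea of a search strategy concrete, the following is a minimal sketch of random search, the simpler of the two strategies. It is not how SageMaker implements tuning: the loss function is a hypothetical stand-in for a training job, and the names toy_validation_loss and random_search are assumptions made for illustration.

```python
import random


def toy_validation_loss(learning_rate, num_leaves):
    """Hypothetical stand-in for a training job; lower is better."""
    return (learning_rate - 0.1) ** 2 + ((num_leaves - 31) / 100) ** 2


def random_search(max_jobs, seed=0):
    """Try max_jobs random configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(max_jobs):
        config = {
            "learning_rate": rng.uniform(0.0001, 1.0),
            "num_leaves": rng.randint(10, 50),
        }
        loss = toy_validation_loss(**config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


best_config, best_loss = random_search(max_jobs=10)
print(best_config, best_loss)
```

Random search picks each configuration independently of earlier results. Bayesian search differs in that it uses the results of completed jobs to decide which configuration to try next.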

In this post, we use the default Bayesian search strategy to demonstrate the steps involved in running automatic model tuning with JumpStart using the LightGBM model.

JumpStart currently supports 10 example notebooks with automatic model tuning. It also supports four popular algorithms for tabular data modeling. The tasks and links to their sample notebooks are summarized in the following table.

Task | Pre-trained Models | Supports Custom Dataset | Frameworks Supported | Example Notebooks
--- | --- | --- | --- | ---
Image Classification | yes | yes | PyTorch, TensorFlow | Introduction to JumpStart – Image Classification
Object Detection | yes | yes | PyTorch, TensorFlow, MXNet | Introduction to JumpStart – Object Detection
Semantic Segmentation | yes | yes | MXNet | Introduction to JumpStart – Semantic Segmentation
Text Classification | yes | yes | TensorFlow | Introduction to JumpStart – Text Classification
Sentence Pair Classification | yes | yes | TensorFlow, Hugging Face | Introduction to JumpStart – Sentence Pair Classification
Question Answering | yes | yes | PyTorch | Introduction to JumpStart – Question Answering
Tabular Classification | yes | yes | LightGBM, CatBoost, XGBoost, Linear Learner | Introduction to JumpStart – Tabular Classification – LightGBM, CatBoost; Introduction to JumpStart – Tabular Classification – XGBoost, Linear Learner
Tabular Regression | yes | yes | LightGBM, CatBoost, XGBoost, Linear Learner | Introduction to JumpStart – Tabular Regression – LightGBM, CatBoost; Introduction to JumpStart – Tabular Regression – XGBoost, Linear Learner

Solution overview

This technical workflow gives an overview of the different Amazon SageMaker features and steps needed to automatically tune a JumpStart model.

In the following sections, we provide a step-by-step walkthrough of how to run automatic model tuning with JumpStart using the LightGBM algorithm. We provide an accompanying notebook for this walkthrough.

We walk through the following high-level steps:

  1. Retrieve the JumpStart pre-trained model and images container.
  2. Set static hyperparameters.
  3. Define the tunable hyperparameter ranges.
  4. Initialize the automatic model tuning.
  5. Run the tuning job.
  6. Deploy the best model to an endpoint.

Retrieve the JumpStart pre-trained model and images container

In this section, we choose the LightGBM classification model for fine-tuning and train it on an ml.m5.xlarge instance. We then retrieve the training Docker container, the training algorithm source, and the pre-trained model. See the following code:

from sagemaker import image_uris, model_uris, script_uris

# Assumed values for illustration; the accompanying notebook defines the
# JumpStart model ID, version, and scope used in this walkthrough
train_model_id, train_model_version, train_scope = "lightgbm-classification-model", "*", "training"
training_instance_type = "ml.m5.xlarge"

# Retrieve the docker image
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=train_model_id,
    model_version=train_model_version,
    image_scope=train_scope,
    instance_type=training_instance_type,
)
# Retrieve the training script
train_source_uri = script_uris.retrieve(
    model_id=train_model_id, model_version=train_model_version, script_scope=train_scope
)
# Retrieve the pre-trained model tarball to further fine-tune
train_model_uri = model_uris.retrieve(
    model_id=train_model_id, model_version=train_model_version, model_scope=train_scope
)

Set static hyperparameters

We now retrieve the default hyperparameters for this LightGBM model, as preconfigured by JumpStart. We also override the num_boost_round hyperparameter with a custom value.

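# Import under an alias so the result variable does not shadow the module
from sagemaker import hyperparameters as jumpstart_hyperparameters

# Retrieve the default hyperparameters for fine-tuning the model
hyperparameters = jumpstart_hyperparameters.retrieve_default(
    model_id=train_model_id, model_version=train_model_version
)
# [Optional] Override default hyperparameters with custom values
# ("500" is an illustrative value; hyperparameters are passed as strings)
hyperparameters["num_boost_round"] = "500"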

Define the tunable hyperparameter ranges

Next we define the hyperparameter ranges to be optimized by automatic model tuning. We define the hyperparameter name as expected by the model and then the ranges of values to be tried for this hyperparameter. Automatic model tuning draws samples (equal to the max_jobs parameter) from the space of hyperparameters, using a technique called Bayesian search. For each drawn hyperparameter sample, the tuner creates a training job to evaluate the model with that configuration. See the following code:

from sagemaker.tuner import ContinuousParameter, IntegerParameter

hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(1e-4, 1, scaling_type="Logarithmic"),
    "num_boost_round": IntegerParameter(2, 30),
    "early_stopping_rounds": IntegerParameter(2, 30),
    "num_leaves": IntegerParameter(10, 50),
    "feature_fraction": ContinuousParameter(0, 1),
    "bagging_fraction": ContinuousParameter(0, 1),
    "bagging_freq": IntegerParameter(1, 10),
    "max_depth": IntegerParameter(5, 30),
    "min_data_in_leaf": IntegerParameter(5, 50),
}
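The learning_rate range uses logarithmic scaling, which spreads the search evenly across orders of magnitude instead of concentrating it near the upper bound. The following sketch illustrates that idea with log-uniform sampling. It is not SageMaker's internal sampler, and sample_log_uniform is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_log_uniform(1e-4, 1.0, rng) for _ in range(10_000)]

# Count samples per decade: [1e-4, 1e-3), [1e-3, 1e-2), [1e-2, 1e-1), [1e-1, 1]
decade_counts = [0, 0, 0, 0]
for value in samples:
    decade = int(math.floor(math.log10(value))) + 4
    decade_counts[max(0, min(3, decade))] += 1
print(decade_counts)
```

Each decade receives roughly a quarter of the samples. With linear scaling over the same range, about 90% of the samples would fall between 0.1 and 1.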

Initialize the automatic model tuning

We start by creating an Estimator object with all the required assets that define the training job, such as the pre-trained model, the training image, and the training script. We then define a HyperparameterTuner object to interact with SageMaker hyperparameter tuning APIs.

The HyperparameterTuner accepts as parameters the Estimator object, the target metric used to decide the best set of hyperparameters, the total number of training jobs (max_jobs) to start for the hyperparameter tuning job, and the maximum number of training jobs to run in parallel (max_parallel_jobs). Training jobs are run with the LightGBM algorithm, and the hyperparameter values that yield the lowest multi_logloss metric are chosen. For more information about configuring automatic model tuning, see Best Practices for Hyperparameter Tuning.
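For readers unfamiliar with the objective metric, the following sketch computes multi-class log loss using its standard definition: the mean negative log of the probability the model assigns to each true class. The function name and the sample data are made up for illustration, and the training script's own implementation may differ in details such as clipping.

```python
import math


def multi_logloss(true_labels, predicted_probabilities):
    """Mean negative log of the probability assigned to each true class."""
    total = 0.0
    for label, probabilities in zip(true_labels, predicted_probabilities):
        # Clip to avoid log(0) when a model assigns zero probability
        p = min(max(probabilities[label], 1e-15), 1.0)
        total -= math.log(p)
    return total / len(true_labels)


labels = [0, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
print(round(multi_logloss(labels, probs), 4))
```

A model that assigns higher probability to the correct classes gets a lower loss, which is why the tuner is configured to minimize this metric.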

from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner

# Create SageMaker Estimator instance
tabular_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,
)

tuner = HyperparameterTuner(
    estimator=tabular_estimator,
    objective_metric_name="multi_logloss",
    hyperparameter_ranges=hyperparameter_ranges,
    # Raw string so the backslash in the regex is not treated as a string escape
    metric_definitions=[{"Name": "multi_logloss", "Regex": r"multi_logloss: ([0-9\.]+)"}],
    strategy="Bayesian",
    max_jobs=10,
    max_parallel_jobs=2,
    objective_type="Minimize",
    base_tuning_job_name=training_job_name,
)

In the preceding code, we tell the tuner to run at most 10 experiments (max_jobs) and at most two of them concurrently (max_parallel_jobs). Both of these parameters keep your cost and training time under control.
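As a rough illustration of how these two parameters bound time and cost, the following sketch estimates the number of sequential waves of jobs, the wall-clock time, and the total instance time. It assumes every job takes the same time, which real jobs will not, and both the tuning_budget helper and the 6-minute duration are hypothetical.

```python
import math


def tuning_budget(max_jobs, max_parallel_jobs, minutes_per_job):
    """Return (sequential waves, wall-clock minutes, total instance-minutes)."""
    waves = math.ceil(max_jobs / max_parallel_jobs)
    wall_clock_minutes = waves * minutes_per_job
    instance_minutes = max_jobs * minutes_per_job
    return waves, wall_clock_minutes, instance_minutes


print(tuning_budget(max_jobs=10, max_parallel_jobs=2, minutes_per_job=6))
```

Raising max_parallel_jobs shortens the wall-clock time but leaves the total instance time unchanged. It also means Bayesian search has fewer completed results to learn from when it picks each new configuration.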

Run the tuning job

To launch the SageMaker tuning job, we call the fit method of the hyperparameter tuner object and pass the Amazon Simple Storage Service (Amazon S3) path of the training data:

tuner.fit({"training": training_dataset_s3_path}, logs=True)

While automatic model tuning searches for the best hyperparameters, you can monitor its progress either on the SageMaker console or on Amazon CloudWatch. When training is complete, the best model’s fine-tuned artifacts are uploaded to the Amazon S3 output location specified in the training configuration.
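The objective metric that you monitor is extracted from the training logs with the regular expression given in metric_definitions. The following sketch applies the same pattern with Python's re module to show what it captures. The log lines are hypothetical, and taking the last match is an assumption about which value is of interest.

```python
import re

# Same pattern as the "Regex" entry in metric_definitions
METRIC_PATTERN = re.compile(r"multi_logloss: ([0-9\.]+)")


def last_metric_value(log_text):
    """Return the last multi_logloss value found in the log text, or None."""
    matches = METRIC_PATTERN.findall(log_text)
    return float(matches[-1]) if matches else None


# Hypothetical log lines; real training output may be formatted differently
sample_log = """
[1] valid_0's multi_logloss: 0.9123
[2] valid_0's multi_logloss: 0.7541
[3] valid_0's multi_logloss: 0.6987
"""
print(last_metric_value(sample_log))
```

If the training script changes how it prints the metric, the pattern stops matching and no objective value is reported, so it is worth testing the regex against a real log line before launching a tuning job.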

Deploy the best model to an endpoint

When the tuning job is complete, the best model has been selected and stored in Amazon S3. We now can deploy that model by calling the deploy method of the HyperparameterTuner object and passing the needed parameters, such as the number of instances to be used for the created endpoint, their type, the image to be deployed, and the script to run:

tuner.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    entry_point="inference.py",
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    endpoint_name=endpoint_name,
    enable_network_isolation=True
)

We can now test the created endpoint by making inference requests. You can follow the rest of the process in the accompanying notebook.
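As a hedged sketch of what an inference request might look like, the following helpers build a CSV payload and read class predictions from a JSON response. The "probabilities" response key and the text/csv content type are assumptions about this model's inference script, and the helper names are hypothetical. The invoke_endpoint call itself is the standard boto3 SageMaker runtime API.

```python
import json


def build_csv_payload(rows):
    """Encode feature rows as CSV text, one row per line, without a header."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)


def predicted_classes(response_body):
    """Pick the highest-probability class index for each row in the response."""
    # Assumes the response JSON has a "probabilities" list of per-class scores
    probabilities = json.loads(response_body)["probabilities"]
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def query_endpoint(endpoint_name, rows):
    """Send rows to the endpoint; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=build_csv_payload(rows).encode("utf-8"),
    )
    return predicted_classes(response["Body"].read())
```

Calling query_endpoint(endpoint_name, rows) with feature rows in the same column order as the training data would return one predicted class index per row, provided the response matches the assumed shape.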

Conclusion

With automatic model tuning in SageMaker, you can find the best version of your model by running training jobs on the provided dataset with one of the supported algorithms. Automatic model tuning allows you to reduce the time to tune a model by automatically searching for the best hyperparameter configuration within the hyperparameter ranges that you specify.

In this post, we showed the value of running automatic model tuning on a JumpStart pre-trained model using SageMaker APIs. We used the LightGBM algorithm and defined a maximum of 10 training jobs. We also provided links to example notebooks showcasing the ML frameworks that support JumpStart model optimization.

For more details on how to optimize a JumpStart model with automatic model tuning, refer to our example notebook.


About the Authors

Doug Mbaya is a Senior Partner Solutions Architect with a focus on data and analytics. Doug works closely with AWS partners, helping them integrate data and analytics solutions in the cloud.

Kruthi Jayasimha Rao is a Partner Solutions Architect in the Scale-PSA team. Kruthi conducts technical validations for Partners, enabling them to progress in the Partner Path.

Giannis Mitropoulos is a Software Development Engineer for SageMaker Automatic Model Tuning.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He is an active researcher in machine learning and statistical inference and has published many papers in NeurIPS, ICML, ICLR, JMLR, and ACL conferences.
