Set up a text summarization project with Hugging Face Transformers: Part 2

March 23, 2022

by Heiko Hotz Amazon AWS

This is the second post in a two-part series in which I propose a practical guide for organizations so you can assess the quality of text summarization models for your domain.

For an introduction to text summarization, an overview of this tutorial, and the steps to create a baseline for our project (also referred to as section 1), refer back to the first post.

This post is divided into three sections:

Section 2: Generate summaries with a zero-shot model
Section 3: Train a summarization model
Section 4: Evaluate the trained model

Section 2: Generate summaries with a zero-shot model

In this post, we use the concept of zero-shot learning (ZSL), which means we use a model that has been trained to summarize text but hasn’t seen any examples of the arXiv dataset. It’s a bit like trying to paint a portrait when all you have been doing in your life is landscape painting. You know how to paint, but you might not be too familiar with the intricacies of portrait painting.

For this section, we use the following notebook.

Why zero-shot learning?

ZSL has become popular over the past years because it allows you to use state-of-the-art NLP models with no training. And their performance is sometimes quite astonishing: the BigScience research workshop recently released its T0pp (pronounced “T Zero Plus Plus”) model, which has been trained specifically for researching zero-shot multitask learning. It can often outperform models six times larger on the BIG-bench benchmark, and can outperform GPT-3 (16 times larger) on several other NLP benchmarks.

Another benefit of ZSL is that it takes just two lines of code to use it. By trying it out, we create a second baseline, which we use to quantify the gain in model performance after we fine-tune the model on our dataset.

Set up a zero-shot learning pipeline

To use ZSL models, we can use Hugging Face’s Pipeline API. This API enables us to use a text summarization model with just two lines of code. It takes care of the main processing steps in an NLP model:

Preprocess the text into a format the model can understand.
Pass the preprocessed inputs to the model.
Postprocess the predictions of the model, so you can make sense of them.

It uses the summarization models that are already available on the Hugging Face model hub.

To use it, run the following code:

from transformers import pipeline

summarizer = pipeline("summarization")
print(summarizer(text))

That’s it! The code downloads a summarization model and creates summaries locally on your machine. If you’re wondering which model it uses, you can either look it up in the source code or use the following command:

print(summarizer.model.config.name_or_path)

When we run this command, we see that the default model for text summarization is called sshleifer/distilbart-cnn-12-6:

We can find the model card for this model on the Hugging Face website, where we can also see that the model has been trained on two datasets: the CNN Dailymail dataset and the Extreme Summarization (XSum) dataset. It’s worth noting that this model is not familiar with the arXiv dataset and has only learned to summarize texts that are similar to the ones it was trained on (mostly news articles). The numbers 12 and 6 in the model name refer to the number of encoder layers and decoder layers, respectively. Explaining what these are is outside the scope of this tutorial, but you can read more about it in the post Introducing BART by Sam Shleifer, who created the model.

We use the default model going forward, but I encourage you to try out different pre-trained models. All the models that are suitable for summarization can be found on the Hugging Face website. To use a different model, you can specify the model name when calling the Pipeline API:

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

Extractive vs. abstractive summarization

We haven’t spoken yet about two possible but different approaches to text summarization: extractive vs. abstractive. Extractive summarization is the strategy of concatenating extracts taken from a text into a summary, whereas abstractive summarization involves paraphrasing the corpus using novel sentences. Most of the summarization models are based on models that generate novel text (they’re natural language generation models, like, for example, GPT-3). This means that the summarization models also generate novel text, which makes them abstractive summarization models.

Generate zero-shot summaries

Now that we know how to use it, we want to use it on our test dataset—the same dataset we used in section 1 to create the baseline. We can do that with the following loop:

candidate_summaries = []

for i, text in enumerate(texts):
    if i % 100 == 0:
        print(i)
    candidate = summarizer(text, min_length=5, max_length=20)
    candidate_summaries.append(candidate[0]['summary_text'])

We use the min_length and max_length parameters to control the summary the model generates. Both are measured in tokens rather than words, so they only approximate the word count. In this example, we set min_length to 5 because we want the title to be roughly at least five words long. And by looking at the lengths of the reference summaries (the actual titles of the research papers), we estimate that 20 could be a reasonable value for max_length. But again, this is just a first attempt. When the project is in the experimentation phase, these two parameters can and should be changed to see if the model performance changes.
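One way to ground these two values in the data is to look at how long the reference titles actually are. The following is a minimal sketch of that idea, not the method used in the notebook: the helper name length_percentile and the sample titles are made up for illustration, and it counts whitespace-separated words, which only approximates the token counts the model works with.

```python
import math


def length_percentile(summaries, pct):
    """Return the word count at the given percentile (nearest-rank method)."""
    lengths = sorted(len(s.split()) for s in summaries)
    # Nearest rank: the smallest length that covers pct percent of the summaries.
    rank = max(1, math.ceil(pct / 100 * len(lengths)))
    return lengths[rank - 1]


# Hypothetical reference titles, used only for illustration.
sample_titles = [
    "Attention is all you need",
    "A survey of text summarization techniques",
    "On the use of arXiv as a dataset",
    "Deep residual learning for image recognition",
]

suggested_min = length_percentile(sample_titles, 5)
suggested_max = length_percentile(sample_titles, 95)
print(suggested_min, suggested_max)
```

Because a tokenizer usually splits text into more tokens than words, values derived this way are best treated as a lower bound and a starting point for experimentation.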

Additional parameters

If you’re already familiar with text generation, you might know there are many more parameters to influence the text a model generates, such as beam search, sampling, and temperature. These parameters give you more control over the text that is being generated, for example, to make the text more fluent and less repetitive. These techniques are not available in the Pipeline API; you can see in the source code that min_length and max_length are the only parameters that are considered. After we train and deploy our own model, however, we have access to those parameters. More on that in section 4 of this post.

Model evaluation

After we have generated the zero-shot summaries, we can use our ROUGE function again to compare the candidate summaries with the reference summaries:

from datasets import load_metric
metric = load_metric("rouge")

def calc_rouge_scores(candidates, references):
    result = metric.compute(predictions=candidates, references=references, use_stemmer=True)
    result = {key: round(value.mid.fmeasure * 100, 1) for key, value in result.items()}
    return result

Running this calculation on the summaries that were generated with the ZSL model gives us the following results:

When we compare those with our baseline, we see that this ZSL model is actually performing worse than our simple heuristic of just taking the first sentence. Again, this is not unexpected: although this model knows how to summarize news articles, it has never seen an example of summarizing the abstract of an academic research paper.

Baseline comparison

We have now created two baselines: one using a simple heuristic and one with a ZSL model. By comparing the ROUGE scores, we see that the simple heuristic currently outperforms the deep learning model.

In the next section, we take this same deep learning model and try to improve its performance. We do so by training it on the arXiv dataset (this step is also called fine-tuning). We take advantage of the fact that it already knows how to summarize text in general. We then show it lots of examples of our arXiv dataset. Deep learning models are exceptionally good at identifying patterns in datasets after they have been trained on them, so we expect the model to get better at this particular task.

Section 3: Train a summarization model

In this section, we train the model we used for zero-shot summaries in section 2 (sshleifer/distilbart-cnn-12-6) on our dataset. The idea is to teach the model what summaries for abstracts of research papers look like by showing it many examples. Over time the model should recognize the patterns in this dataset, which will allow it to create better summaries.

It’s worth noting once more that if you have labeled data, namely texts and corresponding summaries, you should use those to train a model. Only by doing so can the model learn the patterns of your specific dataset.

The complete code for the model training is in the following notebook.

Set up a training job

Because training a deep learning model would take a few weeks on a laptop, we use Amazon SageMaker training jobs instead. For more details, refer to Train a Model with Amazon SageMaker. In this post, I briefly highlight the advantage of using these training jobs, besides the fact that they allow us to use GPU compute instances.

Let’s assume we have a cluster of GPU instances we can use. In that case, we likely want to create a Docker image to run the training so that we can easily replicate the training environment on other machines. We then install the required packages and because we want to use several instances, we need to set up distributed training as well. When the training is complete, we want to quickly shut down these computers because they are costly.

All these steps are abstracted away from us when using training jobs. In fact, we can train a model in the same way as described by specifying the training parameters and then just calling one method. SageMaker takes care of the rest, including stopping the GPU instances when the training is complete so as not to incur any further costs.

In addition, Hugging Face and AWS announced a partnership earlier in 2022 that makes it even easier to train Hugging Face models on SageMaker. This functionality is available through the development of Hugging Face AWS Deep Learning Containers (DLCs). These containers include Hugging Face Transformers, Tokenizers and the Datasets library, which allows us to use these resources for training and inference jobs. For a list of the available DLC images, see available Hugging Face Deep Learning Containers Images. They are maintained and regularly updated with security patches. We can find many examples of how to train Hugging Face models with these DLCs and the Hugging Face Python SDK in the following GitHub repo.

We use one of those examples as a template because it does almost everything we need for our purpose: train a summarization model on a specific dataset in a distributed manner (using more than one GPU instance).

One thing, however, we have to account for is that this example uses a dataset directly from the Hugging Face dataset hub. Because we want to provide our own custom data, we need to amend the notebook slightly.

Pass data to the training job

To account for the fact that we bring our own dataset, we need to use channels. For more information, refer to How Amazon SageMaker Provides Training Information.

I personally find this term a bit confusing, so in my mind I always think mapping when I hear channels, because it helps me better visualize what happens. Let me explain: as we already learned, the training job spins up a cluster of Amazon Elastic Compute Cloud (Amazon EC2) instances and copies a Docker image onto them. However, our datasets are stored in Amazon Simple Storage Service (Amazon S3) and can’t be accessed by that Docker image. Instead, the training job needs to copy the data from Amazon S3 to a predefined local path inside that Docker image. To make that happen, we tell the training job where the data resides in Amazon S3 and where inside the Docker image the data should be copied to so that the training job can access it. In other words, we map the Amazon S3 location to the local path.

We set the local path in the hyperparameters section of the training job:

Then we tell the training job where the data resides in Amazon S3 when calling the fit() method, which starts the training:

Note that the folder name after /opt/ml/input/data matches the channel name (datasets). This enables the training job to copy the data from Amazon S3 to the local path.
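The mapping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the bucket path and the file names train.csv and val.csv are hypothetical, build_channel_config is a helper invented for this sketch, and train_file and validation_file are the argument names of the Transformers summarization training script. The one fixed piece is that SageMaker copies each channel to /opt/ml/input/data/<channel name>.

```python
import posixpath

CHANNEL_NAME = "datasets"          # channel name used in this post
LOCAL_ROOT = "/opt/ml/input/data"  # SageMaker copies every channel below this folder


def build_channel_config(s3_uri, channel=CHANNEL_NAME,
                         train_file="train.csv", validation_file="val.csv"):
    """Return (hyperparameters, fit_inputs) that point at the same data.

    The file names are assumptions for illustration only.
    """
    local_dir = posixpath.join(LOCAL_ROOT, channel)
    hyperparameters = {
        # Paths as seen from inside the training container.
        "train_file": posixpath.join(local_dir, train_file),
        "validation_file": posixpath.join(local_dir, validation_file),
    }
    # Channel name -> S3 location; this dict is what gets passed to fit().
    fit_inputs = {channel: s3_uri}
    return hyperparameters, fit_inputs


# Hypothetical S3 prefix.
hyperparameters, fit_inputs = build_channel_config("s3://my-bucket/summarization/data")
print(hyperparameters)
print(fit_inputs)
```

The estimator would then be created with hyperparameters=hyperparameters and the training started with estimator.fit(fit_inputs). Deriving both values from one channel name keeps the local path and the channel from drifting apart.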

Start the training

We’re now ready to start the training job. As mentioned before, we do so by calling the fit() method. The training job runs for about 40 minutes. You can follow the progress and see additional information on the SageMaker console.

When the training job is complete, it’s time to evaluate our newly trained model.

Section 4: Evaluate the trained model

Evaluating our trained model is very similar to what we did in section 2, where we evaluated the ZSL model. We call the model and generate candidate summaries and compare them to the reference summaries by calculating the ROUGE scores. But now, the model sits in Amazon S3 in a file called model.tar.gz (to find the exact location, you can check the training job on the console). So how do we access the model to generate summaries?

We have two options: deploy the model to a SageMaker endpoint or download it locally, similar to what we did in section 2 with the ZSL model. In this tutorial, I deploy the model to a SageMaker endpoint because it’s more convenient and by choosing a more powerful instance for the endpoint, we can shorten the inference time significantly. The GitHub repo contains a notebook that shows how to evaluate the model locally.

Deploy a model

It’s usually very easy to deploy a trained model on SageMaker (see again the following example on GitHub from Hugging Face). After the model has been trained, we can call estimator.deploy() and SageMaker does the rest for us in the background. Because in our tutorial we switch from one notebook to the next, we have to locate the training job and the associated model first, before we can deploy it:
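A minimal sketch of that lookup, assuming we know the name of the training job: the SageMaker DescribeTrainingJob API returns the S3 location of the model artifacts. The helper name get_model_data and the job name in the usage note are made up for illustration.

```python
def get_model_data(sagemaker_client, training_job_name):
    """Return the S3 URI of model.tar.gz for a completed training job.

    sagemaker_client is expected to behave like boto3.client("sagemaker").
    """
    response = sagemaker_client.describe_training_job(
        TrainingJobName=training_job_name
    )
    # The artifact location is only meaningful once the job has completed.
    return response["ModelArtifacts"]["S3ModelArtifacts"]
```

With boto3 installed and credentials configured, this could be called as model_data = get_model_data(boto3.client("sagemaker"), "my-training-job-name"), where the job name is a placeholder for the one shown on the SageMaker console.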

After we retrieve the model location, we can deploy it to a SageMaker endpoint:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# model_data: S3 location of model.tar.gz retrieved above; role: the SageMaker execution role
model_for_deployment = HuggingFaceModel(entry_point='inference.py',
                                        source_dir='inference_code',
                                        model_data=model_data,
                                        role=role,
                                        pytorch_version='1.7.1',
                                        py_version='py36',
                                        transformers_version='4.6.1',
                                        )

# Deploy to a single GPU instance and exchange JSON with the endpoint
predictor = model_for_deployment.deploy(initial_instance_count=1,
                                        instance_type='ml.g4dn.xlarge',
                                        serializer=sagemaker.serializers.JSONSerializer(),
                                        deserializer=sagemaker.deserializers.JSONDeserializer()
                                        )

Deployment on SageMaker is straightforward because it uses the SageMaker Hugging Face Inference Toolkit, an open-source library for serving Transformers models on SageMaker. We normally don’t even have to provide an inference script; the toolkit takes care of that. In that case, however, the toolkit utilizes the Pipeline API again, and as we discussed in section 2, the Pipeline API doesn’t allow us to use advanced text generation techniques such as beam search and sampling. To avoid this limitation, we provide our custom inference script.
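To give an idea of what such a script can look like, here is a minimal sketch. It is not the inference.py from the repo: it only assumes the request format used later in this post (an inputs string plus a parameters_list of generation settings) and relies on the toolkit's model_fn and predict_fn override hooks. Device placement and error handling are left out for brevity.

```python
def model_fn(model_dir):
    """Load the fine-tuned model and tokenizer from the unpacked model.tar.gz."""
    # Imported lazily so the module can be inspected without Transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    """Generate one list of summaries per entry in parameters_list."""
    model, tokenizer = model_and_tokenizer
    text = data["inputs"]
    # Fall back to the model's default generation settings if none are given.
    parameters_list = data.get("parameters_list") or [{}]

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    results = []
    for parameters in parameters_list:
        # Every key in parameters is forwarded to generate(), which is what
        # makes beam search, sampling, and so on available.
        output_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            **parameters,
        )
        results.append(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    return results
```

With this shape, the response is a list of lists: the first index selects the parameter set and the second selects the generated summary.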

First evaluation

For the first evaluation of our newly trained model, we use the same parameters as in section 2 with the zero-shot model to generate the candidate summaries. This allows us to make an apples-to-apples comparison:

candidate_summaries = []

for i, text in enumerate(texts):
    data = {"inputs":text, "parameters_list":[{"min_length": 5, "max_length": 20}]}
    candidate = predictor.predict(data)
    candidate_summaries.append(candidate[0][0])

We compare the summaries generated by the model with the reference summaries:

This is encouraging! Our first attempt to train the model, without any hyperparameter tuning, has improved the ROUGE scores significantly.

Second evaluation

Now it’s time to use some more advanced techniques such as beam search and sampling to play around with the model. For a detailed explanation of what each of these parameters does, refer to How to generate text: using different decoding methods for language generation with Transformers. Let’s try it with a semi-random set of values for some of these parameters:

candidate_summaries = []

for i, text in enumerate(texts):
    data = {"inputs":text,
            "parameters_list":[{"min_length": 5, "max_length": 20, "num_beams": 50, "top_p": 0.9, "do_sample": True}]}
    candidate = predictor.predict(data)
    candidate_summaries.append(candidate[0][0])

When running our model with these parameters, we get the following scores:

That didn’t work out quite as we hoped—the ROUGE scores have actually gone down slightly. However, don’t let this discourage you from trying out different values for these parameters. In fact, this is the point where we finish with the setup phase and transition into the experimentation phase of the project.
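For the experimentation phase, it can help to generate the request payloads for a whole grid of parameter values instead of editing the loop by hand. The following is a rough sketch of that idea, assuming the same payload format as in the loops above; the helper name generation_payloads and the grid values are arbitrary choices for illustration.

```python
from itertools import product


def generation_payloads(text, grid, fixed=None):
    """Yield one endpoint payload per combination of values in grid."""
    fixed = dict(fixed or {})
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        # Fixed settings first, then the combination under test.
        parameters = {**fixed, **dict(zip(names, values))}
        yield {"inputs": text, "parameters_list": [parameters]}


# Arbitrary example values; which ranges make sense depends on the dataset.
grid = {"num_beams": [1, 4, 8], "top_p": [0.8, 0.95]}
fixed = {"min_length": 5, "max_length": 20, "do_sample": True}

payloads = list(generation_payloads("some abstract", grid, fixed))
print(len(payloads))
```

Each payload can be sent with predictor.predict(payload), and the ROUGE scores can be stored alongside the parameters that produced them.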

Conclusion and next steps

We have concluded the setup for the experimentation phase. In this two-part series, we downloaded and prepared our data, created a baseline with a simple heuristic, created another baseline using zero-shot learning, and then trained our model and saw a significant increase in performance. Now it’s time to play around with every part we created in order to create even better summaries. Consider the following:

Preprocess the data properly – For example, remove stopwords and punctuation. Don’t underestimate this part—in many data science projects, data preprocessing is one of the most important aspects (if not the most important), and data scientists typically spend most of their time with this task.
Try out different models – In our tutorial, we used the standard model for summarization (sshleifer/distilbart-cnn-12-6), but many more models are available that you can use for this task. One of those might better fit your use case.
Perform hyperparameter tuning – When training the model, we used a certain set of hyperparameters (learning rate, number of epochs, and so on). These parameters aren’t set in stone—quite the opposite. You should change these parameters to understand how they affect your model performance.
Use different parameters for text generation – We already did one round of creating summaries with different parameters to utilize beam search and sampling. Try out different values and parameters. For more information, refer to How to generate text: using different decoding methods for language generation with Transformers.

I hope you made it to the end and found this tutorial useful.

About the Author

Heiko Hotz is a Senior Solutions Architect for AI & Machine Learning and leads the Natural Language Processing (NLP) community within AWS. Prior to this role, he was the Head of Data Science for Amazon’s EU Customer Service. Heiko helps our customers be successful in their AI/ML journey on AWS and has worked with organizations in many industries, including Insurance, Financial Services, Media and Entertainment, Healthcare, Utilities, and Manufacturing. In his spare time Heiko travels as much as possible.

Set up a text summarization project with Hugging Face Transformers: Part 1

March 23, 2022

by Heiko Hotz Amazon AWS

When OpenAI released the third generation of their machine learning (ML) model that specializes in text generation in July 2020, I knew something was different. This model struck a nerve like none that came before it. Suddenly I heard friends and colleagues, who might be interested in technology but usually don’t care much about the latest advancements in the AI/ML space, talk about it. Even the Guardian wrote an article about it. Or, to be precise, the model wrote the article and the Guardian edited and published it. There was no denying it – GPT-3 was a game changer.

After the model had been released, people immediately started to come up with potential applications for it. Within weeks, many impressive demos were created, which can be found on the GPT-3 website. One particular application that caught my eye was text summarization – the capability of a computer to read a given text and summarize its content. It’s one of the hardest tasks for a computer because it combines two fields within the field of natural language processing (NLP): reading comprehension and text generation. Which is why I was so impressed by the GPT-3 demos for text summarization.

You can give them a try on the Hugging Face Spaces website. My favorite one at the moment is an application that generates summaries of news articles with just the URL of the article as input.

In this two-part series, I propose a practical guide for organizations so you can assess the quality of text summarization models for your domain.

Tutorial overview

Many organizations I work with (charities, companies, NGOs) have huge amounts of texts they need to read and summarize – financial reports or news articles, scientific research papers, patent applications, legal contracts, and more. Naturally, these organizations are interested in automating these tasks with NLP technology. To demonstrate the art of the possible, I often use the text summarization demos, which almost never fail to impress.

But now what?

The challenge for these organizations is that they want to assess text summarization models based on summaries for many, many documents – not one at a time. They don’t want to hire an intern whose only job is to open the application, paste in a document, hit the Summarize button, wait for the output, assess whether the summary is good, and do that all over again for thousands of documents.

I wrote this tutorial with my past self from four weeks ago in mind – it’s the tutorial I wish I had back then when I started on this journey. In that sense, the target audience of this tutorial is someone who is familiar with AI/ML and has used Transformer models before, but is at the beginning of their text summarization journey and wants to dive deeper into it. Because it’s written by a “beginner” and for beginners, I want to stress the fact that this tutorial is a practical guide – not the practical guide. Please treat it as if George E.P. Box had said:

In terms of how much technical knowledge is required in this tutorial: It does involve some coding in Python, but most of the time we just use the code to call APIs, so no deep coding knowledge is required, either. It’s helpful to be familiar with certain concepts of ML, such as what it means to train and deploy a model, the concepts of training, validation, and test datasets, and so on. Also having dabbled with the Transformers library before might be useful, because we use this library extensively throughout this tutorial. I also include useful links for further reading for these concepts.

Because this tutorial is written by a beginner, I don’t expect NLP experts and advanced deep learning practitioners to get much out of this tutorial. At least not from a technical perspective – you might still enjoy the read, though, so please don’t leave just yet! But you will have to be patient with regard to my simplifications – I tried to live by the concept of making everything in this tutorial as simple as possible, but not simpler.

Structure of this tutorial

This series stretches over four sections split into two posts, in which we go through different stages of a text summarization project. In the first post (section 1), we start by introducing a metric for text summarization tasks – a measure of performance that allows us to assess whether a summary is good or bad. We also introduce the dataset we want to summarize and create a baseline using a no-ML model – we use a simple heuristic to generate a summary from a given text. Creating this baseline is a vitally important step in any ML project because it enables us to quantify how much progress we make by using AI going forward. It allows us to answer the question “Is it really worth investing in AI technology?”

In the second post, we use a model that already has been pre-trained to generate summaries (section 2). This is possible with a modern approach in ML called transfer learning. It’s another useful step because we basically take an off-the-shelf model and test it on our dataset. This allows us to create another baseline, which helps us see what happens when we actually train the model on our dataset. The approach is called zero-shot summarization, because the model has had zero exposure to our dataset.

After that, it’s time to use a pre-trained model and train it on our own dataset (section 3). This is also called fine-tuning. It enables the model to learn from the patterns and idiosyncrasies of our data and slowly adapt to it. After we train the model, we use it to create summaries (section 4).

To summarize:

Part 1:
- Section 1: Use a no-ML model to establish a baseline
Part 2:
- Section 2: Generate summaries with a zero-shot model
- Section 3: Train a summarization model
- Section 4: Evaluate the trained model

The entire code for this tutorial is available in the following GitHub repo.

What will we have achieved by the end of this tutorial?

By the end of this tutorial, we won’t have a text summarization model that can be used in production. We won’t even have a good summarization model (insert scream emoji here)!

What we will have instead is a starting point for the next phase of the project, which is the experimentation phase. This is where the “science” in data science comes in, because now it’s all about experimenting with different models and different settings to understand whether a good enough summarization model can be trained with the available training data.

And, to be completely transparent, there is a good chance that the conclusion will be that the technology is just not ripe yet and that the project will not be implemented. And you have to prepare your business stakeholders for that possibility. But that’s a topic for another post.

Section 1: Use a no-ML model to establish a baseline

This is the first section of our tutorial on setting up a text summarization project. In this section, we establish a baseline using a very simple model, without actually using ML. This is a very important step in any ML project, because it allows us to understand how much value ML adds over the time of the project and if it’s worth investing in it.

The code for the tutorial can be found in the following GitHub repo.

Data, data, data

Every ML project starts with data! If possible, we always should use data related to what we want to achieve with a text summarization project. For example, if our goal is to summarize patent applications, we should also use patent applications to train the model. A big caveat for an ML project is that the training data usually needs to be labeled. In the context of text summarization, that means we need to provide the text to be summarized as well as the summary (the label). Only by providing both can the model learn what a good summary looks like.

In this tutorial, we use a publicly available dataset, but the steps and code remain exactly the same if we use a custom or private dataset. And again, if you have an objective in mind for your text summarization model and have corresponding data, please use your data instead to get the most out of this.

The data we use is the arXiv dataset, which contains abstracts of arXiv papers as well as their titles. For our purpose, we use the abstract as the text we want to summarize and the title as the reference summary. All the steps of downloading and preprocessing the data are available in the following notebook. We require an AWS Identity and Access Management (IAM) role that permits loading data to and from Amazon Simple Storage Service (Amazon S3) in order to run this notebook successfully. The dataset was developed as part of the paper On the Use of ArXiv as a Dataset and is licensed under the Creative Commons CC0 1.0 Universal Public Domain Dedication.

The data is split into three datasets: training, validation, and test data. If you want to use your own data, make sure this is the case too. The following diagram illustrates how we use the different datasets.

Naturally, a common question at this point is: How much data do we need? As you can probably already guess, the answer is: it depends. It depends on how specialized the domain is (summarizing patent applications is quite different from summarizing news articles), how accurate the model needs to be to be useful, how much the training of the model should cost, and so on. We return to this question at a later point when we actually train the model, but the short of it is that we have to try out different dataset sizes when we’re in the experimentation phase of the project.

What makes a good model?

In many ML projects, it’s rather straightforward to measure a model’s performance. That’s because there is usually little ambiguity around whether the model’s result is correct. The labels in the dataset are often binary (True/False, Yes/No) or categorical. In any case, it’s easy in this scenario to compare the model’s output to the label and mark it as correct or incorrect.

When generating text, this becomes more challenging. The summaries (the labels) we provide in our dataset are only one way to summarize text. But there are many possibilities to summarize a given text. So, even if the model doesn’t match our label 1:1, the output might still be a valid and useful summary. So how do we compare the model’s summary with the one we provide? The metric that is used most often in text summarization to measure the quality of a model is the ROUGE score. To understand the mechanics of this metric, refer to The Ultimate Performance Metric in NLP. In summary, the ROUGE score measures the overlap of n-grams (contiguous sequence of n items) between the model’s summary (candidate summary) and the reference summary (the label we provide in our dataset). But, of course, this is not a perfect measure. To understand its limitations, check out To ROUGE or not to ROUGE?
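To make the idea of n-gram overlap concrete, here is a deliberately simplified sketch of ROUGE-N. It is not the implementation used in this tutorial: it lowercases and splits on whitespace, and it skips stemming and the bootstrap aggregation that the library performs, so its numbers won't match the library's exactly.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: precision, recall, and F-measure of n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: an n-gram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "fmeasure": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    fmeasure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


candidate = "the cat sat on the mat"
reference = "the cat lay on the mat"
print(rouge_n(candidate, reference, n=1))  # 5 of 6 unigrams overlap
print(rouge_n(candidate, reference, n=2))  # 3 of 5 bigrams overlap
```

The two sentences differ by a single word, yet the bigram score drops much further than the unigram score, which illustrates why the different ROUGE variants can rank the same summaries differently.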

So, how do we calculate the ROUGE score? There are quite a few Python packages out there to compute this metric. To ensure consistency, we should use the same method throughout our project. Because we will, at a later point in this tutorial, use a training script from the Transformers library instead of writing our own, we can just peek into the source code of the script and copy the code that computes the ROUGE score:

from datasets import load_metric
metric = load_metric("rouge")

def calc_rouge_scores(candidates, references):
    result = metric.compute(predictions=candidates, references=references, use_stemmer=True)
    result = {key: round(value.mid.fmeasure * 100, 1) for key, value in result.items()}
    return result

By using this method to compute the score, we ensure that we always compare apples to apples throughout the project.

This function computes several ROUGE scores: rouge1, rouge2, rougeL, and rougeLsum. The “sum” in rougeLsum refers to the fact that this metric is computed over a whole summary, whereas rougeL is computed as the average over individual sentences. So, which ROUGE score should we use for our project? Again, we have to try different approaches in the experimentation phase. For what it’s worth, the original ROUGE paper states that “ROUGE-2 and ROUGE-L worked well in single document summarization tasks” while “ROUGE-1 and ROUGE-L perform great in evaluating short summaries.”

Create the baseline

Next up we want to create the baseline by using a simple, no-ML model. What does that mean? In the field of text summarization, many studies use a very simple approach: they take the first n sentences of the text and declare it the candidate summary. They then compare the candidate summary with the reference summary and compute the ROUGE score. This is a simple yet powerful approach that we can implement in a few lines of code (the entire code for this part is in the following notebook):

import re

ref_summaries = list(df_test['summary'])

for i in range(3):
    candidate_summaries = list(df_test['text'].apply(lambda x: ' '.join(re.split(r'(?<=[.:;])\s', x)[:i+1])))
    print(f"First {i+1} sentences: Scores {calc_rouge_scores(candidate_summaries, ref_summaries)}")

We use the test dataset for this evaluation. This makes sense because after we train the model, we also use the same test dataset for the final evaluation. We also try different numbers for n: we start with only the first sentence as the candidate summary, then the first two sentences, and finally the first three sentences.
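The baseline can also be tried on a single string without a DataFrame. The following is a small sketch under a few assumptions: the helper name `lead_n` and the sample text are invented for illustration, and sentences are split after a period, colon, or semicolon that is followed by whitespace.

```python
import re


def lead_n(text, n):
    """Return the first n sentences of a text as the candidate summary.

    A sentence boundary is assumed after '.', ':' or ';' followed by whitespace.
    """
    sentences = re.split(r"(?<=[.:;])\s", text)
    return " ".join(sentences[:n])


sample = "We study summarization. Our method is simple: take the lead. It works well."
for n in range(1, 4):
    print(n, "->", lead_n(sample, n))
```

Note that this simple rule also splits after the colon, so the second “sentence” of the sample is “Our method is simple:”. That is good enough for a baseline, but it is worth keeping in mind when reading the scores.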

The following screenshot shows the results for our first model.

The ROUGE scores are highest with only the first sentence as the candidate summary. This means that taking more than one sentence makes the summary too verbose and leads to a lower score. We therefore use the scores for the one-sentence summaries as our baseline.

It’s important to note that, for such a simple approach, these numbers are actually quite good, especially for the rouge1 score. To put these numbers in context, we can refer to Pegasus Models, which shows the scores of a state-of-the-art model for different datasets.

Conclusion and what’s next

In Part 1 of our series, we introduced the dataset that we use throughout the summarization project as well as a metric to evaluate summaries. We then created a baseline with a simple, no-ML model.

In the next post, we use a zero-shot model: a model that has been trained for text summarization on public news articles. However, this model won’t be trained at all on our dataset (hence the name “zero-shot”).

I leave it to you as homework to guess how this zero-shot model will perform compared to our very simple baseline. On the one hand, it will be a much more sophisticated model (it’s actually a neural network). On the other hand, it has only been trained to summarize news articles, so it might struggle with the patterns that are inherent to the arXiv dataset.


Optimize customer engagement with reinforcement learning

March 23, 2022

by Taylor Names Amazon AWS

This is a guest post co-authored by Taylor Names, Staff Machine Learning Engineer, Dev Gupta, Machine Learning Manager, and Argie Angeleas, Senior Product Manager at Ibotta. Ibotta is an American technology company that enables users with its desktop and mobile apps to earn cash back on in-store, mobile app, and online purchases with receipt submission, linked retailer loyalty accounts, payments, and purchase verification.

Ibotta strives to recommend personalized promotions to better retain and engage its users. However, promotions and user preferences are constantly evolving. This ever-changing environment with many new users and new promotions is a typical cold start problem: there are not enough historical user and promotion interactions to draw any inferences from. Reinforcement learning (RL) is an area of machine learning (ML) concerned with how intelligent agents should take actions in an environment in order to maximize cumulative reward. RL focuses on finding a balance between exploring uncharted territory and exploiting current knowledge. Multi-armed bandit (MAB) is a classic reinforcement learning problem that exemplifies the exploration/exploitation tradeoff: maximizing reward in the short term (exploitation) while sacrificing the short-term reward for knowledge that can increase rewards in the long term (exploration). A MAB algorithm explores and exploits optimal recommendations for the user.

Ibotta collaborated with the Amazon Machine Learning Solutions Lab to use MAB algorithms to increase user engagement when the user and promotion information is highly dynamic.

We selected a contextual MAB algorithm because it’s effective in the following use cases:

Making personalized recommendations according to users’ state (context)
Dealing with cold start aspects such as new bonuses and new customers
Accommodating recommendations where users’ preferences change over time

Data

To increase bonus redemptions, Ibotta wants to send personalized bonuses to customers. Bonuses are Ibotta’s self-funded cash incentives, which serve as the actions of the contextual multi-armed bandit model.

The bandit model uses two sets of features:

Action features – These describe the actions, such as bonus type and average amount of the bonus
Customer features – These describe customers’ historical preferences and interactions, such as past weeks’ redemptions, clicks, and views

The contextual features are derived from historical customer journeys, which contained 26 weekly activity metrics generated from users’ interactions with the Ibotta app.

Contextual multi-armed bandit

Bandit is a framework for sequential decision-making in which the decision-maker sequentially chooses an action, potentially based on the current contextual information, and observes a reward signal.

We set up the contextual multi-armed bandit workflow on Amazon SageMaker using the built-in Vowpal Wabbit (VW) container. SageMaker helps data scientists and developers prepare, build, train, and deploy high-quality ML models quickly by bringing together a broad set of capabilities purpose-built for ML. The model training and testing are based on offline experimentation. The bandit learns user preferences based on their feedback from past interactions rather than a live environment. The algorithm can switch to production mode, where SageMaker remains as the supporting infrastructure.

To implement the exploration/exploitation strategy, we built the iterative training and deployment system that performs the following actions:

Recommends an action using the contextual bandit model based on user context
Captures the implicit feedback over time
Continuously trains the model with incremental interaction data

The workflow of the client application is as follows:

The client application picks a context, which is sent to the SageMaker endpoint to retrieve an action.
The SageMaker endpoint returns an action, associated bonus redemption probability, and event_id.
Because this simulator was generated using historical interactions, the model knows the true class for that context. If the agent selects an action with redemption, the reward is 1. Otherwise, the agent obtains a reward of 0.
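The loop of choosing an action, observing a 0/1 reward, and updating the model can be sketched in a few lines of plain Python. This is a deliberately simplified, non-contextual epsilon-greedy bandit, not the Vowpal Wabbit model used in this project: the function name `run_epsilon_greedy` and the redemption rates passed to it are invented for illustration, and user context is ignored entirely.

```python
import random


def run_epsilon_greedy(true_rates, epsilon=0.1, rounds=5000, seed=0):
    """Simulate an epsilon-greedy bandit with 0/1 rewards.

    true_rates holds a hypothetical redemption probability per action (arm).
    Returns the mean reward and the number of times each arm was chosen.
    """
    rng = random.Random(seed)
    num_arms = len(true_rates)
    pulls = [0] * num_arms        # how often each arm was recommended
    value = [0.0] * num_arms      # running estimate of each arm's redeem rate
    total_reward = 0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(num_arms)                       # explore
        else:
            arm = max(range(num_arms), key=lambda a: value[a])  # exploit
        # Reward is 1 if the recommended bonus is redeemed, otherwise 0.
        reward = 1 if rng.random() < true_rates[arm] else 0
        pulls[arm] += 1
        value[arm] += (reward - value[arm]) / pulls[arm]        # incremental mean
        total_reward += reward
    return total_reward / rounds, pulls


mean_reward, pulls = run_epsilon_greedy([0.1, 0.6, 0.3])
print(f"mean reward: {mean_reward:.3f}, pulls per arm: {pulls}")
```

A contextual bandit extends this idea by conditioning the choice of arm on the customer and action features described earlier, instead of keeping one global estimate per arm.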

In the case where historical data is available and is in the format of <state, action, action probability, reward>, Ibotta can warm start a live model by learning the policy offline. Otherwise, Ibotta can initiate a random policy for day 1 and start to learn a bandit policy from there.

The following is the code snippet to train the model:

from sagemaker.rl import RLEstimator

# role, s3_output_path, and instance_type are assumed to be defined earlier in the notebook

hyperparameters = {
    "exploration_policy": "egreedy",  # supports "egreedy", "bag", "cover"
    "epsilon": 0.01,                  # used if egreedy is the exploration policy
    "num_policies": 3,                # used if bag or cover is the exploration policy
    "num_arms": 9,
}

job_name_prefix = "ibotta-testbed-bandits-1"

vw_image_uri = "462105765813.dkr.ecr.us-east-1.amazonaws.com/sagemaker-rl-vw-container:vw-8.7.0-cpu"

# Train the estimator
rl_estimator = RLEstimator(entry_point="train-vw_new.py",
                           source_dir="src",
                           image_uri=vw_image_uri,
                           role=role,
                           output_path=s3_output_path,
                           base_job_name=job_name_prefix,
                           instance_type=instance_type,
                           instance_count=1,
                           hyperparameters=hyperparameters)

# "s3 bucket/ibotta.csv" is a placeholder for the S3 location of the training data
rl_estimator.fit("s3 bucket/ibotta.csv", wait=True)

Model performance

We randomly split the redeemed interactions as training data (10,000 interactions) and evaluation data (5,300 holdout interactions).

The evaluation metric is the mean reward, where a reward of 1 indicates the recommended action was redeemed, and 0 indicates the recommended action didn’t get redeemed.

We can determine the mean reward as follows:

Mean reward (redeem rate) = (# of recommended actions with redemption)/(total # of recommended actions)
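In code, this metric is simply the average of the 0/1 rewards. The following sketch uses a made-up list of ten rewards purely for illustration; the function name `mean_reward_rate` is hypothetical.

```python
def mean_reward_rate(rewards):
    """Fraction of recommended actions that were redeemed (rewards are 0 or 1)."""
    if not rewards:
        raise ValueError("at least one recommended action is required")
    return sum(rewards) / len(rewards)


# Hypothetical outcomes for ten recommendations: 1 = redeemed, 0 = not redeemed.
example_rewards = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
print(f"{mean_reward_rate(example_rewards):.2%}")
```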

The following table shows the mean reward result:

Mean Reward	Uniform Random Recommendation	Contextual MAB-based Recommendation
Train	11.44%	56.44%
Test	10.69%	59.09%

The following figure plots the incremental performance evaluation during training, where the x-axis is the number of records learned by the model and the y-axis is the incremental mean reward. The blue line indicates the multi-armed bandit; the orange line indicates random recommendations.

The graph shows that the predicted mean reward increases over the iterations, and the predicted action reward is significantly greater than the random assignment of actions.

We can use previously trained models as warm starts and batch retrain the model with new data. In this case, model performance already converged through initial training. No significant additional performance improvement was observed in new batch retraining, as shown in the following figure.

We also compared contextual bandit with uniformly random and posterior random (random recommendation using historical user preference distribution as warm start) policies. The results are listed and plotted as follows:

Bandit – 59.09% mean reward (training 56.44%)
Uniform random – 10.69% mean reward (training 11.44%)
Posterior probability random – 34.21% mean reward (training 34.82%)

The contextual multi-armed bandit algorithm outperformed the other two policies significantly.
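To put the gap into perspective, the ratio between the test-set mean rewards listed above can be computed directly. This is plain arithmetic on the reported numbers; the dictionary keys and the `lift` helper are names chosen for this example.

```python
# Test-set mean rewards (in percent) as reported above.
test_mean_reward = {
    "contextual_bandit": 59.09,
    "uniform_random": 10.69,
    "posterior_random": 34.21,
}


def lift(policy_reward, baseline_reward):
    """Ratio of a policy's mean reward to a baseline's mean reward."""
    return policy_reward / baseline_reward


bandit = test_mean_reward["contextual_bandit"]
for name in ("uniform_random", "posterior_random"):
    print(f"lift over {name}: {lift(bandit, test_mean_reward[name]):.2f}x")
```

On these numbers, the contextual bandit reaches roughly 5.5 times the mean reward of the uniform random policy and roughly 1.7 times that of the posterior random policy.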

Summary

The Amazon ML Solutions Lab collaborated with Ibotta to develop a contextual bandit reinforcement learning recommendation solution using a SageMaker RL container.

This solution demonstrated a steady incremental redemption rate lift over random (five-times lift) and non-contextual RL (two-times lift) recommendations based on an offline test. With this solution, Ibotta can establish a dynamic user-centric recommendation engine to optimize customer engagement. Compared to random recommendation, the solution improved recommendation accuracy (mean reward) from 11% to 59%, according to the offline test. Ibotta plans to integrate this solution into more personalization use cases.

“The Amazon ML Solutions Lab worked closely with Ibotta’s Machine Learning team to build a dynamic bonus recommendation engine to increase redemptions and optimize customer engagement. We created a recommendation engine leveraging reinforcement learning that learns and adapts to the ever-changing customer state and cold starts new bonuses automatically. Within 2 months, the ML Solutions Lab scientists developed a contextual multi-armed bandit reinforcement learning solution using a SageMaker RL container. The contextual RL solution showed a steady increase in redemption rates, achieving a five-times lift in bonus redemption rate over random recommendation, and a two-times lift over a non-contextual RL solution. The recommendation accuracy improved from 11% using random recommendation to 59% using the ML Solutions Lab solution. Given the effectiveness and flexibility of this solution, we plan to integrate this solution into more Ibotta personalization use cases to further our mission of making every purchase rewarding for our users.”

– Heather Shannon, Senior Vice President of Engineering & Data at Ibotta.

About the Authors

Taylor Names is a staff machine learning engineer at Ibotta, focusing on content personalization and real-time demand forecasting. Prior to joining Ibotta, Taylor led machine learning teams in the IoT and clean energy spaces.

Dev Gupta is an engineering manager at Ibotta Inc, where he leads the machine learning team. The ML team at Ibotta is tasked with providing high-quality ML software, such as recommenders, forecasters, and internal ML tools. Before joining Ibotta, Dev worked at Predikto Inc, a machine learning startup, and The Home Depot. He graduated from the University of Florida.

Argie Angeleas is a Senior Product Manager at Ibotta, where he leads the Machine Learning and Browser Extension squads. Before joining Ibotta, Argie worked as Director of Product at iReportsource. Argie obtained his PhD in Computer Science and Engineering from Wright State University.

Fang Wang is a Senior Research Scientist at the Amazon Machine Learning Solutions Lab, where she leads the Retail Vertical, working with AWS customers across various industries to solve their ML problems. Before joining AWS, Fang worked as Sr. Director of Data Science at Anthem, leading the medical claim processing AI platform. She obtained her master’s in Statistics from the University of Chicago.

Xin Chen is a senior manager at the Amazon Machine Learning Solutions Lab, where he leads the Central US, Greater China Region, LATAM, and Automotive Vertical. He helps AWS customers across different industries identify and build machine learning solutions to address their organization’s highest return-on-investment machine learning opportunities. Xin obtained his PhD in Computer Science and Engineering from the University of Notre Dame.

Raj Biswas is a Data Scientist at the Amazon Machine Learning Solutions Lab. He helps AWS customers develop ML-powered solutions across diverse industry verticals for their most pressing business challenges. Prior to joining AWS, he was a graduate student at Columbia University in Data Science.

Xinghua Liang is an Applied Scientist at the Amazon Machine Learning Solutions Lab, where he works with customers across various industries, including manufacturing and automotive, and helps them to accelerate their AI and cloud adoption. Xinghua obtained his PhD in Engineering from Carnegie Mellon University.

Yi Liu is an applied scientist with Amazon Customer Service. She is passionate about using the power of ML/AI to optimize user experience for Amazon customers and help AWS customers build scalable cloud solutions. Her science work in Amazon spans membership engagement, online recommendation system, and customer experience defect identification and resolution. Outside of work, Yi enjoys traveling and exploring nature with her dog.

Making DeepSpeed ZeRO run efficiently on more-affordable hardware

March 23, 2022

by admin Amazon AWS

Amazon researchers optimize the distributed-training tool to run efficiently on the Elastic Fabric Adapter network interface.

Expedite IVR development with industry grammars on Amazon Lex

March 23, 2022

by John Heater Amazon AWS

Amazon Lex is a service for building conversational interfaces into any application using voice and text. With Amazon Lex, you can easily build sophisticated, natural language, conversational bots (chatbots), virtual agents, and interactive voice response (IVR) systems. You can now use industry grammars to accelerate IVR development on Amazon Lex as part of your IVR migration effort. Industry grammars are a set of XML files made available as a grammar slot type. You can select from a range of pre-built industry grammars across domains, such as financial services, insurance, and telecom. In this post, we review the industry grammars for these industries and use them to create IVR experiences.

Financial services

You can use Amazon Lex in the financial services domain to automate customer service interactions such as credit card payments, mortgage loan applications, portfolio status, and account updates. During these interactions, the IVR flow needs to collect several details, including credit card number, mortgage loan ID, and portfolio details, to fulfill the user’s request. We use the financial services industry grammars in the following sample conversation:

Agent: Welcome to ACME bank. To get started, can I get your account ID?

User: Yes, it’s AB12345.

IVR: Got it. How can I help you?

User: I’d like to transfer funds to my savings account.

IVR: Sure. How much would you like to transfer?

User: $100

IVR: Great, thank you.

The following grammars are supported for financial services: account ID, credit card number, transfer amount, and different date formats such as expiration date (mm/yy) and payment date (mm/dd).

Let’s review the sample account ID grammar. You can refer to the other grammars in the documentation.

<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en-US" version="1.0"
         root="main"
         mode="voice"
         tag-format="semantics/1.0">


        <!-- Test Cases

        Grammar will support the following inputs:

            Scenario 1:
                Input: My account number is A B C 1 2 3 4
                Output: ABC1234

            Scenario 2:
                Input: My account number is 1 2 3 4 A B C
                Output: 1234ABC

            Scenario 3:
                Input: Hmm My account number is 1 2 3 4 A B C 1
                Output: 1234ABC1
        -->

        <rule id="main" scope="public">
            <tag>out=""</tag>
            <item><ruleref uri="#alphanumeric"/><tag>out += rules.alphanumeric.alphanum;</tag></item>
            <item repeat="0-1"><ruleref uri="#alphabets"/><tag>out += rules.alphabets.letters;</tag></item>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out += rules.digits.numbers</tag></item>
        </rule>

        <rule id="text">
            <item repeat="0-1"><ruleref uri="#hesitation"/></item>
            <one-of>
                <item repeat="0-1">account number is</item>
                <item repeat="0-1">Account Number</item>
                <item repeat="0-1">Here is my Account Number </item>
                <item repeat="0-1">Yes, It is</item>
                <item repeat="0-1">Yes It is</item>
                <item repeat="0-1">Yes It's</item>
                <item repeat="0-1">My account Id is</item>
                <item repeat="0-1">This is the account Id</item>
                <item repeat="0-1">account Id</item>
            </one-of>
        </rule>

        <rule id="hesitation">
          <one-of>
             <item>Hmm</item>
             <item>Mmm</item>
             <item>My</item>
          </one-of>
        </rule>

        <rule id="alphanumeric" scope="public">
            <tag>out.alphanum=""</tag>
            <item><ruleref uri="#alphabets"/><tag>out.alphanum += rules.alphabets.letters;</tag></item>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out.alphanum += rules.digits.numbers</tag></item>
        </rule>

        <rule id="alphabets">
            <item repeat="0-1"><ruleref uri="#text"/></item>
            <tag>out.letters=""</tag>
            <tag>out.firstOccurence=""</tag>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out.firstOccurence += rules.digits.numbers; out.letters += out.firstOccurence;</tag></item>
            <item repeat="1-">
                <one-of>
                    <item>A<tag>out.letters+='A';</tag></item>
                    <item>B<tag>out.letters+='B';</tag></item>
                    <item>C<tag>out.letters+='C';</tag></item>
                    <item>D<tag>out.letters+='D';</tag></item>
                    <item>E<tag>out.letters+='E';</tag></item>
                    <item>F<tag>out.letters+='F';</tag></item>
                    <item>G<tag>out.letters+='G';</tag></item>
                    <item>H<tag>out.letters+='H';</tag></item>
                    <item>I<tag>out.letters+='I';</tag></item>
                    <item>J<tag>out.letters+='J';</tag></item>
                    <item>K<tag>out.letters+='K';</tag></item>
                    <item>L<tag>out.letters+='L';</tag></item>
                    <item>M<tag>out.letters+='M';</tag></item>
                    <item>N<tag>out.letters+='N';</tag></item>
                    <item>O<tag>out.letters+='O';</tag></item>
                    <item>P<tag>out.letters+='P';</tag></item>
                    <item>Q<tag>out.letters+='Q';</tag></item>
                    <item>R<tag>out.letters+='R';</tag></item>
                    <item>S<tag>out.letters+='S';</tag></item>
                    <item>T<tag>out.letters+='T';</tag></item>
                    <item>U<tag>out.letters+='U';</tag></item>
                    <item>V<tag>out.letters+='V';</tag></item>
                    <item>W<tag>out.letters+='W';</tag></item>
                    <item>X<tag>out.letters+='X';</tag></item>
                    <item>Y<tag>out.letters+='Y';</tag></item>
                    <item>Z<tag>out.letters+='Z';</tag></item>
                </one-of>
            </item>
        </rule>

        <rule id="digits">
            <item repeat="0-1"><ruleref uri="#text"/></item>
            <tag>out.numbers=""</tag>
            <item repeat="1-10">
                <one-of>
                    <item>0<tag>out.numbers+=0;</tag></item>
                    <item>1<tag>out.numbers+=1;</tag></item>
                    <item>2<tag>out.numbers+=2;</tag></item>
                    <item>3<tag>out.numbers+=3;</tag></item>
                    <item>4<tag>out.numbers+=4;</tag></item>
                    <item>5<tag>out.numbers+=5;</tag></item>
                    <item>6<tag>out.numbers+=6;</tag></item>
                    <item>7<tag>out.numbers+=7;</tag></item>
                    <item>8<tag>out.numbers+=8;</tag></item>
                    <item>9<tag>out.numbers+=9;</tag></item>
                </one-of>
            </item>
        </rule>
</grammar>
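To get a feel for what the grammar returns, the following Python sketch approximates its effect on a transcribed utterance. It is not how Amazon Lex evaluates the grammar: the function name `normalize_account_id` is hypothetical, and the logic is a rough stand-in that keeps only single-character letters and digits and drops every longer word.

```python
def normalize_account_id(utterance):
    """Concatenate the single letters and digits spoken in an utterance.

    Multi-character words (carrier phrases such as "My account number is")
    are ignored, which loosely mimics the grammar's optional text rule.
    Caveat: single-letter words such as "I" or "a" would be kept as well.
    """
    parts = []
    for token in utterance.split():
        if len(token) == 1 and token.isalnum():
            parts.append(token.upper())
    return "".join(parts)


# The first two scenarios from the grammar's test-case comment.
print(normalize_account_id("My account number is A B C 1 2 3 4"))
print(normalize_account_id("My account number is 1 2 3 4 A B C"))
```

The real grammar is stricter than this sketch: it only accepts the carrier phrases and hesitations it lists, and it caps each run of digits at 10.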

Using the industry grammar for financial services

To create the sample bot and add the grammars, perform the following steps. This creates an Amazon Lex bot called Financialbot and adds the grammars for financial services, which we store in Amazon Simple Storage Service (Amazon S3):

1. Download the Amazon Lex bot definition.
2. On the Amazon Lex console, choose Actions and then choose Import.
3. Choose the Financialbot.zip file that you downloaded, and choose Import.
4. Copy the grammar XML files for financial services, listed in the preceding section.
5. On the Amazon S3 console, upload the XML files.
6. Navigate to the slot types on the Amazon Lex console and choose the accountID slot type so you can associate the fin_accountNumber.grxml file.
7. In the slot type, enter the Amazon S3 link for the XML file and the object key.
8. Choose Save slot type.

The AWS Identity and Access Management (IAM) role used to create the bot must have permission to read files from the S3 bucket.

9. Repeat steps 6–8 for the transferFunds slot type with fin_transferAmount.grxml.
10. After you save the grammars, choose Build.
11. Download the financial services contact flow to integrate it with the Amazon Lex bot via Amazon Connect.
12. On the Amazon Connect console, choose Contact flows.
13. In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flows.
14. Select the contact flow to load it into the application.
15. Test the IVR flow by calling in to the phone number.

Insurance

You can use Amazon Lex in the insurance domain to automate customer service interactions such as claims processing, policy management, and premium payments. During these interactions, the IVR flow needs to collect several details, including policy ID, license plate, and premium amount, to fulfill the policy holder’s request. We use the insurance industry grammars in the following sample conversation:

Agent: Welcome to ACME insurance company. To get started, can I get your policy ID?

Caller: Yes, it’s AB1234567.

IVR: Got it. How can I help you?

Caller: I’d like to file a claim.

IVR: Sure. Is this claim regarding your auto policy or home owners’ policy?

Caller: Auto

IVR: What’s the license plate on the vehicle?

Caller: ABCD1234

IVR: Thank you. And how much is the claim for?

Caller: $900

IVR: What was the date and time of the accident?

Caller: March 1st 2:30pm.

IVR: Thank you. I’ve got that started for you. Someone from our office should be in touch with you shortly. Your claim ID is 12345.

The following grammars are supported for the insurance domain: policy ID, driver’s license, social security number, license plate, claim number, and renewal date.

Let’s review the sample claimDateTime grammar. You can refer to the other grammars in the documentation.

<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en-US" version="1.0"
         root="main"
         mode="voice"
         tag-format="semantics/1.0">

         <!-- Test Cases

         Grammar will support the following inputs:

             Scenario 1:
                 Input: The accident occurred at july three at five am
                 Output:  july 3 5am

             Scenario 2:
                 Input: Damage was reported at july three at five am
                 Output:  july 3 5am

             Scenario 3:
                 Input: Schedule virtual inspection for july three at five am
                 Output:  july 3 5am
         -->

        <rule id="main" scope="public">
            <tag>out=""</tag>
            <item repeat="1-10">
                <item><ruleref uri="#months"/><tag>out = out + rules.months + " ";</tag></item>
                <one-of>
                    <item><ruleref uri="#digits"/><tag>out += rules.digits + " ";</tag></item>
                    <item><ruleref uri="#teens"/><tag>out += rules.teens+ " ";</tag></item>
                    <item><ruleref uri="#above_twenty"/><tag>out += rules.above_twenty+ " ";</tag></item>
                </one-of>
                <item><ruleref uri="#at"/><tag>out += rules.at.new;</tag></item>
                <item repeat="0-1"><ruleref uri="#mins"/><tag>out +=":" + rules.mins.min;</tag></item>
                <item><ruleref uri="#ampm"/><tag>out += rules.ampm;</tag></item>
            </item>
            <item repeat="0-1"><ruleref uri="#thanks"/></item>
        </rule>

        <rule id="text">
           <one-of>
             <item repeat="0-1">The accident occurred at</item>
             <item repeat="0-1">Time of accident is</item>
             <item repeat="0-1">Damage was reported at</item>
             <item repeat="0-1">Schedule virtual inspection for</item>
           </one-of>
        </rule>

        <rule id="thanks">
            <one-of>
               <item>Thanks</item>
               <item>I think</item>
            </one-of>
          </rule>

        <rule id="months">
           <item repeat="0-1"><ruleref uri="#text"/></item>
           <one-of>
             <item>january<tag>out="january";</tag></item>
             <item>february<tag>out="february";</tag></item>
             <item>march<tag>out="march";</tag></item>
             <item>april<tag>out="april";</tag></item>
             <item>may<tag>out="may";</tag></item>
             <item>june<tag>out="june";</tag></item>
             <item>july<tag>out="july";</tag></item>
             <item>august<tag>out="august";</tag></item>
             <item>september<tag>out="september";</tag></item>
             <item>october<tag>out="october";</tag></item>
             <item>november<tag>out="november";</tag></item>
             <item>december<tag>out="december";</tag></item>
             <item>jan<tag>out="january";</tag></item>
             <item>feb<tag>out="february";</tag></item>
             <item>aug<tag>out="august";</tag></item>
             <item>sept<tag>out="september";</tag></item>
             <item>oct<tag>out="october";</tag></item>
             <item>nov<tag>out="november";</tag></item>
             <item>dec<tag>out="december";</tag></item>
           </one-of>
       </rule>

        <rule id="digits">
            <one-of>
                <item>0<tag>out=0;</tag></item>
                <item>1<tag>out=1;</tag></item>
                <item>2<tag>out=2;</tag></item>
                <item>3<tag>out=3;</tag></item>
                <item>4<tag>out=4;</tag></item>
                <item>5<tag>out=5;</tag></item>
                <item>6<tag>out=6;</tag></item>
                <item>7<tag>out=7;</tag></item>
                <item>8<tag>out=8;</tag></item>
                <item>9<tag>out=9;</tag></item>
                <item>first<tag>out=1;</tag></item>
                <item>second<tag>out=2;</tag></item>
                <item>third<tag>out=3;</tag></item>
                <item>fourth<tag>out=4;</tag></item>
                <item>fifth<tag>out=5;</tag></item>
                <item>sixth<tag>out=6;</tag></item>
                <item>seventh<tag>out=7;</tag></item>
                <item>eighth<tag>out=8;</tag></item>
                <item>ninth<tag>out=9;</tag></item>
                <item>one<tag>out=1;</tag></item>
                <item>two<tag>out=2;</tag></item>
                <item>three<tag>out=3;</tag></item>
                <item>four<tag>out=4;</tag></item>
                <item>five<tag>out=5;</tag></item>
                <item>six<tag>out=6;</tag></item>
                <item>seven<tag>out=7;</tag></item>
                <item>eight<tag>out=8;</tag></item>
                <item>nine<tag>out=9;</tag></item>
            </one-of>
        </rule>


      <rule id="at">
        <tag>out.new=""</tag>
        <item>at</item>
        <one-of>
          <item repeat="0-1"><ruleref uri="#digits"/><tag>out.new+= rules.digits</tag></item>
          <item repeat="0-1"><ruleref uri="#teens"/><tag>out.new+= rules.teens</tag></item>
        </one-of>
      </rule>

      <rule id="mins">
        <tag>out.min=""</tag>
        <item repeat="0-1">:</item>
        <item repeat="0-1">and</item>
        <one-of>
          <item repeat="0-1"><ruleref uri="#digits"/><tag>out.min+= rules.digits</tag></item>
          <item repeat="0-1"><ruleref uri="#teens"/><tag>out.min+= rules.teens</tag></item>
          <item repeat="0-1"><ruleref uri="#above_twenty"/><tag>out.min+= rules.above_twenty</tag></item>
        </one-of>
      </rule>

      <rule id="ampm">
            <tag>out=""</tag>
            <one-of>
                <item>AM<tag>out="am";</tag></item>
                <item>PM<tag>out="pm";</tag></item>
                <item>am<tag>out="am";</tag></item>
                <item>pm<tag>out="pm";</tag></item>
            </one-of>
        </rule>


        <rule id="teens">
            <one-of>
                <item>ten<tag>out=10;</tag></item>
                <item>tenth<tag>out=10;</tag></item>
                <item>eleven<tag>out=11;</tag></item>
                <item>twelve<tag>out=12;</tag></item>
                <item>thirteen<tag>out=13;</tag></item>
                <item>fourteen<tag>out=14;</tag></item>
                <item>fifteen<tag>out=15;</tag></item>
                <item>sixteen<tag>out=16;</tag></item>
                <item>seventeen<tag>out=17;</tag></item>
                <item>eighteen<tag>out=18;</tag></item>
                <item>nineteen<tag>out=19;</tag></item>
                <item>tenth<tag>out=10;</tag></item>
                <item>eleventh<tag>out=11;</tag></item>
                <item>twelfth<tag>out=12;</tag></item>
                <item>thirteenth<tag>out=13;</tag></item>
                <item>fourteenth<tag>out=14;</tag></item>
                <item>fifteenth<tag>out=15;</tag></item>
                <item>sixteenth<tag>out=16;</tag></item>
                <item>seventeenth<tag>out=17;</tag></item>
                <item>eighteenth<tag>out=18;</tag></item>
                <item>nineteenth<tag>out=19;</tag></item>
            </one-of>
        </rule>

        <rule id="above_twenty">
            <one-of>
                <item>twenty<tag>out=20;</tag></item>
                <item>thirty<tag>out=30;</tag></item>
            </one-of>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out += rules.digits;</tag></item>
        </rule>
</grammar>

Using the industry grammar for insurance

To create the sample bot and add the grammars, perform the following steps. This creates an Amazon Lex bot called InsuranceBot and adds the grammars for the insurance domain:

1. Download the Amazon Lex bot definition.
2. On the Amazon Lex console, choose Actions, then choose Import.
3. Choose the InsuranceBot.zip file that you downloaded, and choose Import.
4. Copy the grammar XML files for insurance, listed in the preceding section.
5. On the Amazon S3 console, upload the XML files.
6. Navigate to the slot types on the Amazon Lex console and select the policyID slot type so you can associate the ins_policyNumber.grxml grammar file.
7. In the slot type, enter the Amazon S3 link for the XML file and the object key.
8. Choose Save slot type.

The IAM role used to create the bot must have permission to read files from the S3 bucket.

9. Repeat steps 6–8 for the licensePlate slot type (ins_NJ_licensePlateNumber.grxml) and dateTime slot type (ins_claimDateTime.grxml).
10. After you save the grammars, choose Build.
11. Download the insurance contact flow to integrate with the Amazon Lex bot.
12. On the Amazon Connect console, choose Contact flows.
13. In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flows.
14. Select the contact flow to load it into the application.
15. Test the IVR flow by calling in to the phone number.
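
If you prefer to script the slot type association rather than use the console, the request can be sketched as follows. This is a minimal sketch, not the procedure the post prescribes: the bot ID, bucket name, and object key are hypothetical placeholders, and the helper function `grammar_slot_type_request` is introduced only for illustration. The dictionary follows the shape of the Amazon Lex V2 CreateSlotType operation, where a grammar slot type points at an S3 object through `externalSourceSetting`.

```python
def grammar_slot_type_request(bot_id, slot_type_name, bucket, object_key,
                              bot_version="DRAFT", locale_id="en_US"):
    """Build keyword arguments for the Lex V2 CreateSlotType operation.

    The grammar file is referenced by S3 bucket name and object key,
    mirroring what the console asks for when you save the slot type.
    """
    return {
        "botId": bot_id,
        "botVersion": bot_version,
        "localeId": locale_id,
        "slotTypeName": slot_type_name,
        "externalSourceSetting": {
            "grammarSlotTypeSetting": {
                "source": {
                    "s3BucketName": bucket,
                    "s3ObjectKey": object_key,
                }
            }
        },
    }


# Hypothetical values: replace them with your own bot ID and S3 location.
request = grammar_slot_type_request(
    bot_id="EXAMPLEBOTID",
    slot_type_name="policyID",
    bucket="my-grammar-bucket",
    object_key="grammars/ins_policyNumber.grxml",
)
print(request)

# Sending the request needs boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("lexv2-models").create_slot_type(**request)
```

Keeping the request construction separate from the API call makes it easy to review the payload for each grammar file before anything is created in your account.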

Telecom

You can use Amazon Lex in the telecom domain to automate customer service interactions such as activating service, paying bills, and managing device installations. During these interactions, the IVR flow needs to collect several details, including SIM number, zip code, and the service start date, to fulfill the user’s request. We use the telecom industry grammars in the following sample conversation:

IVR: Welcome to ACME cellular. To get started, can I have the telephone number associated with your account?

User: Yes, it’s 123 456 7890.

IVR: Thanks. How can I help you?

User: I am calling to activate my service.

IVR: Sure. What’s the SIM number on the device?

User: 12345ABC

IVR: Ok. And can I have the zip code?

User: 12345

IVR: Great, thank you. The device has been activated.

The following grammars are supported for telecom: SIM number, device serial number, zip code, phone number, service start date, and ordinals.

Let’s review the sample SIM number grammar. You can refer to the other grammars in the documentation.

<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en-US" version="1.0"
         root="main"
         mode="voice"
         tag-format="semantics/1.0">


        <!-- Test Cases

        Grammar will support the following inputs:

            Scenario 1:
                Input: My SIM number is A B C 1 2 3 4
                Output: ABC1234

            Scenario 2:
                Input: My SIM number is 1 2 3 4 A B C
                Output: 1234ABC

            Scenario 3:
                Input: My SIM number is 1 2 3 4 A B C 1
                Output: 1234ABC1
        -->

        <rule id="main" scope="public">
            <tag>out=""</tag>
            <item><ruleref uri="#alphanumeric"/><tag>out += rules.alphanumeric.alphanum;</tag></item>
            <item repeat="0-1"><ruleref uri="#alphabets"/><tag>out += rules.alphabets.letters;</tag></item>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out += rules.digits.numbers</tag></item>
        </rule>

        <rule id="text">
            <item repeat="0-1"><ruleref uri="#hesitation"/></item>
            <one-of>
                <item repeat="0-1">My SIM number is</item>
                <item repeat="0-1">SIM number is</item>
            </one-of>
        </rule>

        <rule id="hesitation">
          <one-of>
             <item>Hmm</item>
             <item>Mmm</item>
             <item>My</item>
          </one-of>
        </rule>

        <rule id="alphanumeric" scope="public">
            <tag>out.alphanum=""</tag>
            <item><ruleref uri="#alphabets"/><tag>out.alphanum += rules.alphabets.letters;</tag></item>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out.alphanum += rules.digits.numbers</tag></item>
        </rule>

        <rule id="alphabets">
            <item repeat="0-1"><ruleref uri="#text"/></item>
            <tag>out.letters=""</tag>
            <tag>out.firstOccurence=""</tag>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out.firstOccurence += rules.digits.numbers; out.letters += out.firstOccurence;</tag></item>
            <item repeat="1-">
                <one-of>
                    <item>A<tag>out.letters+='A';</tag></item>
                    <item>B<tag>out.letters+='B';</tag></item>
                    <item>C<tag>out.letters+='C';</tag></item>
                    <item>D<tag>out.letters+='D';</tag></item>
                    <item>E<tag>out.letters+='E';</tag></item>
                    <item>F<tag>out.letters+='F';</tag></item>
                    <item>G<tag>out.letters+='G';</tag></item>
                    <item>H<tag>out.letters+='H';</tag></item>
                    <item>I<tag>out.letters+='I';</tag></item>
                    <item>J<tag>out.letters+='J';</tag></item>
                    <item>K<tag>out.letters+='K';</tag></item>
                    <item>L<tag>out.letters+='L';</tag></item>
                    <item>M<tag>out.letters+='M';</tag></item>
                    <item>N<tag>out.letters+='N';</tag></item>
                    <item>O<tag>out.letters+='O';</tag></item>
                    <item>P<tag>out.letters+='P';</tag></item>
                    <item>Q<tag>out.letters+='Q';</tag></item>
                    <item>R<tag>out.letters+='R';</tag></item>
                    <item>S<tag>out.letters+='S';</tag></item>
                    <item>T<tag>out.letters+='T';</tag></item>
                    <item>U<tag>out.letters+='U';</tag></item>
                    <item>V<tag>out.letters+='V';</tag></item>
                    <item>W<tag>out.letters+='W';</tag></item>
                    <item>X<tag>out.letters+='X';</tag></item>
                    <item>Y<tag>out.letters+='Y';</tag></item>
                    <item>Z<tag>out.letters+='Z';</tag></item>
                </one-of>
            </item>
        </rule>

        <rule id="digits">
            <item repeat="0-1"><ruleref uri="#text"/></item>
            <tag>out.numbers=""</tag>
            <item repeat="1-10">
                <one-of>
                    <item>0<tag>out.numbers+=0;</tag></item>
                    <item>1<tag>out.numbers+=1;</tag></item>
                    <item>2<tag>out.numbers+=2;</tag></item>
                    <item>3<tag>out.numbers+=3;</tag></item>
                    <item>4<tag>out.numbers+=4;</tag></item>
                    <item>5<tag>out.numbers+=5;</tag></item>
                    <item>6<tag>out.numbers+=6;</tag></item>
                    <item>7<tag>out.numbers+=7;</tag></item>
                    <item>8<tag>out.numbers+=8;</tag></item>
                    <item>9<tag>out.numbers+=9;</tag></item>
                </one-of>
            </item>
        </rule>
</grammar>
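
To make the expected outputs in the grammar's test cases concrete, the following Python sketch approximates the result of the semantic interpretation. It is only an illustration of the input-to-output mapping, not how Amazon Lex evaluates the grammar: the function `normalize_sim_number` is a hypothetical helper that drops the carrier phrase by keeping single-character alphanumeric tokens, whereas the real grammar also constrains the order and count of letters and digits.

```python
def normalize_sim_number(utterance):
    """Concatenate the spoken letters and digits of a SIM number.

    Words such as "My SIM number is" are longer than one character and
    are therefore ignored, loosely mirroring the optional text rule.
    """
    tokens = utterance.split()
    chars = [
        token.upper()
        for token in tokens
        if len(token) == 1 and token.isascii() and token.isalnum()
    ]
    return "".join(chars)


# The three scenarios listed in the grammar's test-case comment.
scenario_1 = normalize_sim_number("My SIM number is A B C 1 2 3 4")
scenario_2 = normalize_sim_number("My SIM number is 1 2 3 4 A B C")
scenario_3 = normalize_sim_number("My SIM number is 1 2 3 4 A B C 1")
print(scenario_1, scenario_2, scenario_3)
```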

Using the industry grammar for telecom

To create the sample bot and add the grammars, perform the following steps. This creates an Amazon Lex bot called TelecomBot and adds the grammars for telecom:

1. Download the Amazon Lex bot definition.
2. On the Amazon Lex console, choose Actions, then choose Import.
3. Choose the TelecomBot.zip file that you downloaded, and choose Import.
4. Copy the grammar XML files for the telecom domain, listed in the preceding section.
5. On the Amazon S3 console, upload the XML files.
6. Navigate to the slot types on the Amazon Lex console and select phoneNumber so you can associate the tel_phoneNumber.grxml grammar.
7. In the slot type, enter the Amazon S3 link for the XML file and the object key.
8. Choose Save slot type.

The IAM role used to create the bot must have permission to read files from the S3 bucket.

9. Repeat steps 6–8 for the slot types SIM number (tel_simNumber.grxml) and zipcode (tel_usZipcode.grxml).
10. After you save the grammars, choose Build.
11. Download the telecom contact flow to integrate with the Amazon Lex bot.
12. On the Amazon Connect console, choose Contact flows.
13. In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flows.
14. Select the contact flow to load it into the application.
15. Test the IVR flow by calling in to the phone number.
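
The note about read access to the S3 bucket can be sketched as an IAM policy document. This is an assumption about what "permission to read files" translates to, namely `s3:GetObject` on the grammar objects; the bucket name and prefix are hypothetical, and your setup may need additional permissions (for example, for an encrypted bucket).

```python
import json


def grammar_read_policy(bucket, prefix=""):
    """Return an IAM policy document that allows reading objects under a prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


# Hypothetical bucket and prefix holding the tel_*.grxml files.
policy = grammar_read_policy("my-grammar-bucket", "grammars/")
print(json.dumps(policy, indent=2))
```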

Test the solution

You can call in to the Amazon Connect phone number and interact with the bot. You can also test the solution directly on the Amazon Lex V2 console using voice or text.

Conclusion

Industry grammars provide a set of pre-built XML files that you can use to quickly create IVR flows. You can select grammars to enable customer service conversations for use cases across financial services, insurance, and telecom. The grammars are available as a grammar slot type and can be used in an Amazon Lex bot configuration. You can download the grammars and enable these via the Amazon Lex V2 console or SDK. The capability is available in all AWS Regions where Amazon Lex operates in the English (Australia), English (UK), and English (US) locales.

To learn more, refer to Using a custom grammar slot type.

About the Authors

John Heater has over 15 years of experience in AI and automation. As the SVP of the Contact Center Practice at NeuraFlash, he leads the implementation of the latest AI and automation techniques for a portfolio of products and customer solutions.

Sandeep Srinivasan is a Product Manager on the Amazon Lex team. As a keen observer of human behavior, he is passionate about customer experience. He spends his waking hours at the intersection of people, technology, and the future.

Easily migrate your IVR flows to Amazon Lex using the IVR migration tool

March 22, 2022

by John Heater Amazon AWS

This post was co-written by John Heater, SVP of the Contact Center Practice at NeuraFlash. NeuraFlash is an Advanced AWS Partner with over 40 collective years of experience in the voice and automation space. With a dedicated team of conversation designers, data engineers, and AWS developers, NeuraFlash helps customers take advantage of the power of Amazon Lex in their contact centers.

Amazon Lex provides automatic speech recognition and natural language understanding technologies so you can build sophisticated conversational experiences and create effective interactive voice response (IVR) flows. A native integration with Amazon Connect, AWS’s cloud-based contact center, enables the addition of a conversational interface to any call center application. You can design IVR experiences to identify user requests and fulfill these by running the appropriate business logic.

Today, NeuraFlash, an AWS APN partner, launched a migration tool on AWS Marketplace that helps you easily migrate your VoiceXML (VXML) IVR flows to Amazon Lex and Amazon Connect. The migration tool takes the VXML configuration and grammar XML files as input and provides an Amazon Lex bot definition. It also supports grammars and Amazon Connect contact flows so you can quickly get started with your IVR conversational experiences.

In this post, we cover the use of the IVR migration tool and review the resulting Amazon Lex bot definition and Amazon Connect contact flows.

Sample conversation overview

You can use the sample VXML and grammar files as input to try out the tool. The sample IVR supports the following conversation:

IVR: Welcome to ACME bank. For verification, can I get the last four digits of the SSN on the account?

Caller: Yes, it’s 1234.

IVR: Great. And the date of birth for the primary holder?

Caller: Jan 1st 2000.

IVR: Thank you. How can I help you today?

Caller: I’d like to make a payment.

IVR: Sure. What’s the credit card number?

Caller: 1234 5678 1234 5678

IVR: Got it. What’s the CVV?

Caller: 123

IVR: How about the expiration date?

Caller: Jan 2025.

IVR: Great. How much are we paying today?

Caller: $100

IVR: Thank you. Your payment of $100 on card ending in 5678 is processed. Anything else we can help you with?

Caller: No thanks.

IVR: Have a great day.

Migration tool overview

The following diagram illustrates the architecture of the migration tool.

You can access the migration tool in the AWS Marketplace. Follow the instructions to upload your VXML and grammar XML files.

The tool processes the input XML files to create an IVR flow. You can download the Amazon Connect contact flow, Amazon Lex bot definition, and supporting grammar files.

Migration methodology

The IVR migration tool analyzes the uploaded IVR application and generates an Amazon Lex bot, Amazon Connect flows, and SRGS grammar files. One bot is generated per VXML application (or VXML file). Each input state in the VXML file is mapped to a dialog prompt in the Amazon Lex bot. The corresponding grammar file for the input state is used to create a grammar slot. For the Amazon Connect flow, each VXML file maps to a node in the IVR flow. Within the flow, a GetCustomerInputBlock hands off the control to Amazon Lex to manage the dialog.

Let’s consider the following VXML content in the sample dialog for user verification. You can download the VerifyAccount VXML file.

<?xml version="1.0" encoding="UTF-8"?>

<vxml version="1.0" application="app_root.vxml">


<!--*** Verify user with SSN ***-->
<form id="Verify_SSN">
  <field name="Verify_SSN">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/last4ssn.grxml'"/>
      <prompt>
            <audio expr="'./prompts/Verify_SSN/Init.wav'">
                To verify your account, can I please have the last four digits of your social security number.
            </audio>
        </prompt>
<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="'./prompts/Verify_SSN/nm1.wav'">
         I'm sorry, I didn't understand. Please tell me the last four digits of your social security number.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="'./prompts/Verify_SSN/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter the last four digits of your social security number.  You can also say I don't know if you do not have it.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="'./prompts/Verify_SSN/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please tell me the last four digits of your social security number.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="'./prompts/Verify_SSN/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter the last four digits of your social security number.  You can also say I don't know if you do not have it.
  </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="Verify_SSN.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <elseif cond="Verify_SSN.option == 'dunno'" />
                <assign name="transfer_reason" expr="'no_ssn'"/>
                <goto next="transfer.vxml"/>
            <else/>
                <assign name="last4_ssn" expr="Verify_SSN.option"/>
                <goto next="#Verify_DOB"/>
            </if>
        </filled>
    </field>
</form>

<!--*** Verify user with date of birth ***-->
<form id="Verify_DOB">
  <field name="Verify_DOB">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/dateofbirth.grxml'"/>
      <prompt>
            <audio expr="'./prompts/Verify_DOB/Init.wav'">
                Thank you.  And can I also have your date of birth?
            </audio>
        </prompt>

<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="'./prompts/Verify_DOB/nm1.wav'">
         I'm sorry, I didn't understand. Please say your date of birth.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="'./prompts/Verify_DOB/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter your date of birth.  For example, you can say July twenty fifth nineteen eighty or enter zero seven two five one nine eight zero.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="'./prompts/Verify_DOB/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please say your date of birth.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="'./prompts/Verify_DOB/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter your date of birth.  For example, you can say July twenty fifth nineteen eighty or enter zero seven two five one nine eight zero.
   </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="Verify_DOB.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <else/>
                <assign name="date_of_birth" expr="Verify_DOB.option"/>
                <goto next="validate_authentication.vxml"/>
            </if>
        </filled>
    </field>
</form>


</vxml>

In addition to the preceding VXML file, we include the following SRGS grammars from the IVR application in the IVR migration tool:

last4SSN.grxml – Grammar to recognize the last four digits of the Social Security number
dateOfBirth.grxml – Grammar to recognize the date of birth

An Amazon Lex bot is created to verify the caller. The Verification bot has one intent (VerifyAccount).

The bot has two slots (SSN, DOB) that reference the grammar files for the SSN and date of birth grammars, respectively. You can download the last4SSN.grxml and dateOfBirth.grxml grammar files as output to create the custom slot types in Amazon Lex.
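
The mapping from VXML input states to slots can be illustrated with a short Python sketch. This is not the migration tool's implementation, which the post does not show; it only demonstrates the idea that each `<field>` yields a slot name, a grammar reference, and an initial prompt. The function `extract_slots` and the shape of its output are assumptions made for this example, and the embedded VXML is a trimmed copy of the Verify_SSN form.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the Verify_SSN form from the sample VXML file.
SAMPLE_VXML = """<vxml version="1.0">
  <form id="Verify_SSN">
    <field name="Verify_SSN">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/last4ssn.grxml'"/>
      <prompt>
        <audio expr="'./prompts/Verify_SSN/Init.wav'">
          To verify your account, can I please have the last four digits of your social security number.
        </audio>
      </prompt>
    </field>
  </form>
</vxml>"""


def extract_slots(vxml_text):
    """Collect one slot description per <field> element in a VXML document."""
    root = ET.fromstring(vxml_text)
    slots = []
    for field in root.iter("field"):
        grammar = field.find("grammar")
        audio = field.find("prompt/audio")
        grammar_file = None
        if grammar is not None and grammar.get("srcexpr"):
            # srcexpr holds a quoted ECMAScript string; strip the quotes.
            grammar_file = grammar.get("srcexpr").strip("'")
        prompt = None
        if audio is not None and audio.text:
            # Collapse the indentation whitespace around the prompt text.
            prompt = " ".join(audio.text.split())
        slots.append(
            {"slot_name": field.get("name"), "grammar_file": grammar_file, "prompt": prompt}
        )
    return slots


slots = extract_slots(SAMPLE_VXML)
print(slots)
```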

In another example of a payment flow, the IVR migration tool reads in the payment collection flows to generate an Amazon Lex bot that can handle payments. You can download the corresponding Payment VXML file and SRGS grammars.

<?xml version="1.0" encoding="UTF-8"?>

<vxml version="1.0" application="app_root.vxml">


<!--*** Collect the users credit card for payment ***-->
<form id="CreditCard_Collection">
  <field name="CreditCard_Collection">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/creditcard.grxml'"/>
      <prompt>
            <audio expr="'./prompts/CreditCard_Collection/Init.wav'">
                To start your payment, can I please have your credit card number.
            </audio>
        </prompt>
<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="'./prompts/CreditCard_Collection/nm1.wav'">
         I'm sorry, I didn't understand. Please tell me your credit card number.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="'./prompts/CreditCard_Collection/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter your credit card number.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="'./prompts/CreditCard_Collection/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please tell me your credit card number.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="'./prompts/CreditCard_Collection/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter your credit card number.
  </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
                <assign name="creditcard_number" expr="CreditCard_Collection.option"/>
                <goto next="#ExpirationDate_Collection"/>
        </filled>
    </field>
</form>

<!--*** Collect the credit card expiration date ***-->
<form id="ExpirationDate_Collection">
  <field name="ExpirationDate_Collection">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/creditcard_expiration.grxml'"/>
      <prompt>
            <audio expr="'./prompts/ExpirationDate_Collection/Init.wav'">
                Thank you.  Now please provide your credit card expiration date.
            </audio>
        </prompt>

<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="'./prompts/ExpirationDate_Collection/nm1.wav'">
         I'm sorry, I didn't understand. Please say the expiration date.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="'./prompts/ExpirationDate_Collection/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter your credit card expiration date.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="'./prompts/ExpirationDate_Collection/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please say the expiration date.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="'./prompts/ExpirationDate_Collection/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter your credit card expiration date.
   </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="ExpirationDate_Collection.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <else/>
                <assign name="creditcard_expiration" expr="ExpirationDate_Collection.option"/>
                <goto next="#CVV_Collection"/>
            </if>
        </filled>
    </field>
</form>

<!--*** Collect the credit card CVV number ***-->
<form id="CVV_Collection">
  <field name="CVV_Collection">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/creditcard_cvv.grxml'"/>
      <prompt>
            <audio expr="'./prompts/CVV_Collection/Init.wav'">
                Almost done.  Now please tell me the CVV code.
            </audio>
        </prompt>

<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="'./prompts/CVV_Collection/nm1.wav'">
         I'm sorry, I didn't understand. Please say the CVV on the credit card.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="'./prompts/CVV_Collection/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter the credit card CVV.  It can be found on the back of the card.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="'./prompts/CVV_Collection/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please say the CVV on the credit card.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="'./prompts/CVV_Collection/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter the credit card CVV.  It can be found on the back of the card.
   </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="CVV_Collection.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <else/>
                <assign name="creditcard_cvv" expr="CVV_Collection.option"/>
                <goto next="#PaymentAmount_Collection"/>
            </if>
        </filled>
    </field>
</form>

<!--*** Collect the payment amount ***-->
<form id="PaymentAmount_Collection">
  <field name="PaymentAmount_Collection">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/amount.grxml'"/>
      <prompt>
            <audio expr="'./prompts/PaymentAmount_Collection/Init.wav'">
                Finally, please tell me how much you will be paying.  You can also say full amount.
            </audio>
        </prompt>

<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="'./prompts/PaymentAmount_Collection/nm1.wav'">
         I'm sorry, I didn't understand. Please say the amount of your payment.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="'./prompts/PaymentAmount_Collection/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter your payment amount.  If you will be paying in full you can just say full amount.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="'./prompts/PaymentAmount_Collection/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please say the amount of your payment.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="'./prompts/PaymentAmount_Collection/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter your payment amount.  If you will be paying in full you can just say full amount.
   </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="PaymentAmount_Collection.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <elseif cond="PaymentAmount_Collection.option == 'full_amount'" />
                <assign name="creditcard_amount" expr="'full'"/>
                <goto next="processpayment.vxml"/>
            <else/>
                <assign name="creditcard_amount" expr="PaymentAmount_Collection.option"/>
                <goto next="processpayment.vxml"/>
            </if>
        </filled>
    </field>
</form>

</vxml>

In addition to the preceding VXML file, we include the following SRGS grammars from the IVR application in the IVR migration tool:

creditCard.grxml – Grammar to recognize the credit card number
creditCardExpiration.grxml – Grammar to recognize the expiration date of the credit card
creditCardCVV.grxml – Grammar to recognize the CVV on the credit card
paymentAmount.grxml – Grammar to recognize the payment amount

An Amazon Lex bot is created to collect the payment details. The Payment bot has one intent (MakePayment).

The bot has four slots (credit card number, expiration date, CVV, payment amount) that reference the corresponding grammar files. You can download the creditCard.grxml, creditCardExpiration.grxml, creditCardCVV.grxml, and paymentAmount.grxml grammar files as output to create the custom slot types in Amazon Lex.
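
The branching in the `<filled>` block of the PaymentAmount_Collection field is the kind of business logic that has to be preserved when the flow moves to Amazon Lex and Amazon Connect. As a rough sketch of the intended routing (the function name and return shape are illustrative assumptions, not output of the migration tool):

```python
def route_payment_amount(option):
    """Mirror the <filled> logic for the payment amount field.

    Returns the variables to assign and the next dialog to go to.
    """
    if option == "agent":
        # The caller asked for an agent: hand off to the transfer dialog.
        return {"transfer_reason": "agent"}, "transfer.vxml"
    if option == "full_amount":
        # The caller said "full amount": record it symbolically.
        return {"creditcard_amount": "full"}, "processpayment.vxml"
    # Otherwise the recognized value is the amount itself.
    return {"creditcard_amount": option}, "processpayment.vxml"


agent_route = route_payment_amount("agent")
full_route = route_payment_amount("full_amount")
amount_route = route_payment_amount("100")
print(agent_route, full_route, amount_route)
```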

Lastly, the migration tool provides the payment IVR contact flow to manage the end-to-end conversation.

Conclusion

Amazon Lex enables you to easily build sophisticated, natural language conversational experiences. The IVR migration tool allows you to easily migrate your VXML IVR flows to Amazon Lex. The tool provides the bot definitions and grammars in addition to the Amazon Connect contact flows. It enables you to migrate your IVR flows as is and get started on Amazon Lex, giving you the flexibility to build out the conversational experience at your own pace.

Use the migration tool on AWS Marketplace and migrate your IVR to Amazon Lex today.

How Amazon Search achieves low-latency, high-throughput T5 inference with NVIDIA Triton on AWS

March 22, 2022

by RJ Amazon AWS

Amazon Search’s vision is to enable customers to search effortlessly. Our spelling correction helps you find what you want even if you don’t know the exact spelling of the intended words. In the past, we used classical machine learning (ML) algorithms with manual feature engineering for spelling correction. To make the next generational leap in spelling correction performance, we are embracing a number of deep-learning approaches, including sequence-to-sequence models. Deep learning (DL) models are compute-intensive both in training and inference, and these costs have historically made DL models impractical in a production setting at Amazon’s scale. In this post, we present the results of an inference optimization experiment in which we overcome those obstacles and achieve a 534% inference speed-up for the popular Hugging Face T5 Transformer.

Challenge

The Text-to-Text Transfer Transformer (T5, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Raffel et al.) is a state-of-the-art natural language processing (NLP) model architecture. T5 is a promising architecture for spelling correction, and we found it to perform well in our experiments. T5 models are easy to research, develop, and train, thanks to open-source deep learning frameworks and ongoing academic and enterprise research.

However, it’s difficult to achieve production-grade, low-latency inference with a T5. For example, a single inference with a PyTorch T5 takes 45 milliseconds on one of the four NVIDIA V100 Tensor Core GPUs equipping an Amazon Elastic Compute Cloud (EC2) p3.8xlarge instance. (All inference numbers reported are for an input of 9 tokens and output of 11 tokens. The latency of T5 architectures is sensitive to both input and output lengths.)

Low-latency, cost-efficient T5 inference at scale is a known difficulty that has been reported by several AWS customers beyond Amazon Search, which boosts our motivation to contribute this post. To go from an offline, scientific achievement to a customer-facing production service, Amazon Search faces the following challenges:

Latency – How to realize T5 inference with a P99 latency of less than 50 milliseconds
Throughput – How to handle large-scale concurrent inference requests
Cost efficiency – How to keep costs under control
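
To make the latency target concrete, the following sketch shows one common way to compute a P99 latency from a set of measurements, using the nearest-rank method. The latency samples are made-up numbers for illustration, not measurements from Amazon Search, and `percentile` is a helper written for this example.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical per-request latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = [40.0] * 98 + [45.0, 120.0]

p99 = percentile(latencies_ms, 99)
meets_target = p99 < 50
print(f"P99 = {p99} ms, meets 50 ms target: {meets_target}")
```

A P99 target tolerates the single 120 ms outlier in this sample, whereas a target on the maximum latency would not, which is why tail percentiles are a common way to state serving requirements.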

In the rest of this post, we explain how the NVIDIA inference optimization stack—namely the NVIDIA TensorRT compiler and the open source NVIDIA Triton Inference Server—solves those challenges. Read NVIDIA’s press release to learn about the updates.

NVIDIA TensorRT: Reducing costs and latency with inference optimization

Deep learning frameworks make it convenient to iterate quickly on the science, and come with numerous functionalities for scientific modeling, data loading, and training optimization. However, most of those tools are suboptimal for inference, which only requires a minimal set of operators for matrix multiplication and activation functions. Therefore, significant gains can be realized by using a specialized, prediction-only application instead of running inference in the deep learning development framework.

NVIDIA TensorRT is an SDK for high-performance deep learning inference. TensorRT delivers both an optimized runtime, using low-level optimized kernels available on NVIDIA GPUs, and an inference-only model graph, which rearranges inference computation in an optimized order.

TensorRT applies the following optimizations to speed up inference:

Reduced Precision maximizes throughput with FP16 or INT8 by quantizing models while maintaining correctness.
Layer and Tensor Fusion optimizes use of GPU memory and bandwidth by fusing nodes in a kernel to avoid kernel launch latency.
Kernel Auto-Tuning selects best data layers and algorithms based on the target GPU platform and data kernel shapes.
Dynamic Tensor Memory minimizes memory footprint by freeing unnecessary memory consumption of intermediate results and reuses memory for tensors efficiently.
Multi-Stream Execution uses a scalable design to process multiple input streams in parallel with dedicated CUDA streams.
Time Fusion optimizes recurrent neural networks over time steps with dynamically generated kernels.
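
The trade-off behind reduced precision can be illustrated without a GPU. The sketch below uses only Python's standard library to round values to IEEE 754 half precision (FP16) and compare storage size and rounding error against FP32. It illustrates the number format only; it is not how TensorRT quantizes a model, and the sample weights are arbitrary.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Arbitrary example weights; real model weights would come from a checkpoint.
weights = [0.1, 0.25, 3.14159, 1000.123]
for weight in weights:
    rounded = to_fp16(weight)
    print(f"{weight} -> {rounded} (abs error {abs(weight - rounded):.6f})")

# Storage per value: FP16 needs half the bytes of FP32.
bytes_fp32 = struct.calcsize("<f")
bytes_fp16 = struct.calcsize("<e")
print(f"FP32: {bytes_fp32} bytes, FP16: {bytes_fp16} bytes")
```

Values that are exactly representable, such as 0.25, survive unchanged, while others pick up a small rounding error that grows with magnitude. This is why the accuracy of a reduced-precision model is typically validated before deployment.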

T5 uses transformer layers as building blocks for its architectures. The latest release, NVIDIA TensorRT 8.2, introduces new optimizations for the T5 and GPT-2 models for real-time inference. In the following table, we can see the speedup with TensorRT on some public T5 models running on Amazon EC2 G4dn instances, powered by NVIDIA T4 GPUs, and EC2 G5 instances, powered by NVIDIA A10G GPUs.

Model	Instance	PyTorch FP32 encoder (ms)	PyTorch FP32 decoder (ms)	PyTorch FP32 end to end (ms)	TensorRT 8.2 FP32 encoder (ms)	TensorRT 8.2 FP32 decoder (ms)	TensorRT 8.2 FP32 end to end (ms)	TensorRT 8.2 FP16 encoder (ms)	TensorRT 8.2 FP16 decoder (ms)	TensorRT 8.2 FP16 end to end (ms)	Speedup vs. the HF baseline, FP32 end to end	Speedup vs. the HF baseline, FP16 end to end
t5-small	g4dn.xlarge	5.98	9.74	30.71	1.28	2.25	7.54	0.93	1.59	5.91	407.40%	519.34%
t5-small	g5.xlarge	4.63	7.56	24.22	0.61	1.05	3.99	0.47	0.80	3.19	606.66%	760.01%
t5-base	g4dn.xlarge	11.61	19.05	78.44	3.18	5.45	19.59	3.15	2.96	13.76	400.48%	569.97%
t5-base	g5.xlarge	8.59	14.23	59.98	1.55	2.47	11.32	1.54	1.65	8.46	530.05%	709.20%

For more information about these optimizations and how to reproduce the results in the table, refer to Optimizing T5 and GPT-2 for Real-Time Inference with NVIDIA TensorRT.
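
As a quick sanity check on the table above, the speedup columns can be read as the baseline end-to-end latency divided by the TensorRT end-to-end latency, expressed as a percentage. That reading is our interpretation rather than something the table states, and the helper name below is hypothetical. The sketch uses the t5-small figures for g4dn.xlarge:

```python
def speedup_percent(baseline_ms: float, optimized_ms: float) -> float:
    """Return the baseline latency as a percentage of the optimized latency."""
    return baseline_ms / optimized_ms * 100


# End-to-end latencies for t5-small on g4dn.xlarge, copied from the table.
pytorch_fp32_ms = 30.71
tensorrt_fp32_ms = 7.54
tensorrt_fp16_ms = 5.91

fp32_speedup = speedup_percent(pytorch_fp32_ms, tensorrt_fp32_ms)
fp16_speedup = speedup_percent(pytorch_fp32_ms, tensorrt_fp16_ms)
print(f"FP32: {fp32_speedup:.1f}%  FP16: {fp16_speedup:.1f}%")
```

The results land close to the published 407.40% and 519.34%. The small differences are presumably because the published latencies are rounded to two decimals.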

It is important to note that compilation preserves model accuracy, because it operates on the inference environment and the computation scheduling, leaving the model science unaltered – unlike weight removal compression such as distillation or pruning. NVIDIA TensorRT lets you combine compilation with quantization for further gains. Quantization has two benefits on recent NVIDIA hardware: it reduces memory usage, and it enables the use of NVIDIA Tensor Cores, DL-specific units that run a fused matrix-multiply-add in mixed precision.

In the Amazon Search experiment with a Hugging Face T5 model, replacing PyTorch with TensorRT for model inference increases speed by 534%.

NVIDIA Triton: Low-latency, high-throughput inference serving

Modern model serving solutions can transform offline trained models into customer-facing ML-powered products. To maintain reasonable costs at such a scale, it’s important to keep serving overhead low (HTTP handling, preprocessing and postprocessing, CPU-GPU communication), and fully take advantage of the parallel processing ability of GPUs.

NVIDIA Triton is inference serving software that offers wide support for model runtimes (NVIDIA TensorRT, ONNX, PyTorch, and XGBoost, among others) and infrastructure backends, including GPUs, CPUs, and AWS Inferentia.

ML practitioners love Triton for multiple reasons. Its dynamic batching ability accumulates inference requests during a user-defined delay and up to a user-defined maximum batch size, so that GPU inference is batched, amortizing the CPU-GPU communication overhead. Note that dynamic batching happens server-side and within very short time frames, so the requesting client still has a synchronous, near-real-time invocation experience. Triton users also enjoy its concurrent model execution capacity. GPUs are powerful multitaskers that excel at executing compute-intensive workloads in parallel. Triton maximizes GPU utilization and throughput by using CUDA streams to run multiple model instances concurrently. These model instances can be different models from different frameworks for different use cases, or direct copies of the same model. This translates to a direct throughput improvement when you have enough idle GPU memory. Also, because Triton is not tied to a specific DL development framework, it allows scientists to fully express themselves in the tool of their choice.
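
The dynamic batching behavior described above can be pictured with a small simulation. This is a simplified model written for illustration, not Triton's scheduler or API: the function name, the arrival times, and the two limits are all hypothetical, and a real server dispatches on a timer rather than waiting for the next arrival.

```python
from typing import List


def form_batches(arrivals_us: List[int], max_batch_size: int,
                 max_delay_us: int) -> List[List[int]]:
    """Group sorted request arrival times (microseconds) into batches.

    A batch is closed when it is full, or when the next request arrives
    more than max_delay_us after the first request in the batch.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    for t in arrivals_us:
        batch_is_full = len(current) == max_batch_size
        waited_too_long = bool(current) and t - current[0] > max_delay_us
        if batch_is_full or waited_too_long:
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


# Six hypothetical requests, batched with a size limit of 3 and a 100 us window.
arrivals = [0, 40, 90, 130, 400, 420]
batches = form_batches(arrivals, max_batch_size=3, max_delay_us=100)
print(batches)
```

In this run, the first three requests share one GPU invocation, the fourth is dispatched alone because the window of the first batch has already closed, and the last two are grouped together. A larger delay window trades a little latency for fuller batches.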

With Triton on AWS, Amazon Search expects to better serve Amazon.com customers and meet latency requirements at low cost. The tight integration between the TensorRT runtime and the Triton server facilitates the development experience. Using AWS cloud infrastructure makes it possible to scale up or down in minutes based on throughput requirements, while keeping the bar high on reliability and security.

How AWS lowers the barrier to entry

While Amazon Search conducted this experiment on Amazon EC2 infrastructure, other AWS services exist to facilitate the development, training and hosting of state-of-the-art deep learning solutions.

For example, AWS and NVIDIA have collaborated to release a managed implementation of Triton Inference Server in Amazon SageMaker; for more information, see Deploy fast and scalable AI with NVIDIA Triton Inference Server in Amazon SageMaker. AWS also collaborated with Hugging Face to develop a managed, optimized integration between Amazon SageMaker and Hugging Face Transformers, the open-source framework from which the Amazon Search T5 model is derived; read more at https://aws.amazon.com/machine-learning/hugging-face/.

We encourage customers with latency-sensitive CPU and GPU deep learning serving applications to consider NVIDIA TensorRT and Triton on AWS. Let us know what you build!

Passionate about deep learning and building deep learning-based solutions for Amazon Search? Check out our careers page.

About the Authors

RJ is an engineer on the Search M5 team, leading the efforts to build large-scale deep learning systems for training and inference. Outside of work, he explores different cuisines and plays racquet sports.

Hemant Pugaliya is an Applied Scientist at Search M5. He works on applying the latest natural language processing and deep learning research to improve the customer experience on Amazon shopping worldwide. His research interests include natural language processing and large-scale machine learning systems. Outside of work, he enjoys hiking, cooking, and reading.

Andy Sun is a Software Engineer and Technical Lead for Search Spelling Correction. His research interests include optimizing deep learning inference latency and building rapid experimentation platforms. Outside of work, he enjoys filmmaking and acrobatics.

Le Cai is a Software Engineer at Amazon Search. He works on improving Search Spelling Correction performance to help customers with their shopping experience. He focuses on high-performance online inference and distributed training optimization for deep learning models. Outside of work, he enjoys skiing, hiking, and cycling.

Anthony Ko is currently working as a software engineer at Search M5 Palo Alto, CA. He works on building tools and products for model deployment and inference optimization. Outside of work, he enjoys cooking and playing racquet sports.

Olivier Cruchant is a Machine Learning Specialist Solutions Architect at AWS, based in France. Olivier helps AWS customers – from small startups to large enterprises – develop and deploy production-grade machine learning applications. In his spare time, he enjoys reading research papers and exploring the wilderness with friends and family.

Anish Mohan is a Machine Learning Architect at NVIDIA and the technical lead for ML and DL engagements with its customers in the greater Seattle region.

Jiahong Liu is a Solution Architect on the Cloud Service Provider team at NVIDIA. He assists clients in adopting machine learning and AI solutions that leverage NVIDIA accelerated computing to address their training and inference challenges. In his leisure time, he enjoys origami, DIY projects, and playing basketball.

Eliuth Triana is a Developer Relations Manager at NVIDIA. He connects Amazon and AWS product leaders, developers, and scientists with NVIDIA technologists and product leaders to accelerate Amazon ML/DL workloads, EC2 products, and AWS AI services. In addition, Eliuth is a passionate mountain biker, skier, and poker player.

Monitoring and rewarding honest bids to increase auction revenue

March 22, 2022

by admin Amazon AWS

Amazon Scholar Alexandre Belloni discusses the implications of auction design on digital goods.

New Amazon graduate research fellows announced at Carnegie Mellon

March 21, 2022

by admin Amazon AWS

Graduate Research Fellows Program, launched in 2021, supports research in automated reasoning, computer vision, robotics, language technology, machine learning, operations research, and data science.

Enable conversational chatbots for telephony using Amazon Lex and the Amazon Chime SDK

March 18, 2022

by Greg Herlein Amazon AWS

Conversational AI can deliver powerful, automated, interactive experiences through voice and text. Amazon Lex is a service that combines automatic speech recognition and natural language understanding technologies, so you can build these sophisticated conversational experiences. A common application of conversational AI is found in contact centers: self-service virtual agents. We’re excited to announce that you can now use Amazon Chime SDK Public Switched Telephone Network (PSTN) audio to enable conversational self-service applications to reduce call resolution times and automate informational responses.

The Amazon Chime SDK is a set of real-time communications components that developers can use to add audio, messaging, video, and screen-sharing to their web and mobile applications. Amazon Chime SDK PSTN audio integration with Amazon Lex enables builders to develop conversational interfaces for calls to or from the public telephone network. You can now build AI-powered self-service applications such as conversational interactive voice response systems (IVRs), virtual agents, and other telephony applications that use Session Initiation Protocol (SIP) for voice communications.

In addition, we have launched several new features. Amazon Voice Focus for PSTN provides deep learning-based noise suppression to reduce unwanted noise on calls. You can also now use machine learning (ML)-driven text-to-speech in your application through our native integration to Amazon Polly. All features are now directly integrated with Amazon Chime SDK PSTN audio.

In this post, we teach you how to build a conversational IVR system for a fictitious travel service that accepts reservations over the phone using Amazon Lex.

Solution overview

Amazon Chime SDK PSTN audio makes it easy for developers to build customized telephony applications using the agility and operational simplicity of serverless AWS Lambda functions.

For this solution, we use the following components:

Amazon Chime SDK PSTN audio
AWS Lambda
Amazon Lex
Amazon Polly

Amazon Lex natively integrates with Amazon Polly to provide text-to-speech capabilities. In this post, we also enable Amazon Voice Focus to reduce background noise on phone calls. In a previous post, we showed how to integrate with Amazon Lex v1 using the API interface. That is no longer required. The heavy lifting of working with Amazon Lex and Amazon Polly is now replaced by a few simple function calls.

The following diagram illustrates the high-level design of the Amazon Chime SDK Amazon Lex chatbot system.

To help you learn to build using the Amazon Chime SDK PSTN audio service, we have published a repository of source code and documentation explaining how that source code works. The source code is in a workshop format, with each example program building upon the previous lesson. The final lesson is how to build a complete Amazon Lex-driven chatbot over the phone. That is the lesson we focus on in this post.

As part of this solution, you create the following resources:

SIP media application – A managed object that specifies a Lambda function to invoke.
SIP rule – A managed object that specifies a phone number to trigger on and which SIP media application managed object to use to invoke a Lambda function.
Phone number – An Amazon Chime SDK PSTN phone number provisioned for receiving phone calls.
Lambda function – A function written in TypeScript that is integrated with the PSTN audio service. It receives invocations from the SIP media application and sends actions back that instruct the SIP media application to perform Amazon Polly and Amazon Lex tasks.

The demo code is deployed in two parts. The Amazon Lex chatbot example is one of a series of workshop examples that teach how to use Amazon Chime SDK PSTN audio. For this post, you complete the following high-level steps to deploy the chatbot:

Configure the Amazon Lex chatbot.
Clone the code from the GitHub repository.
Deploy the common resources for the workshop (including a phone number).
Deploy the Lambda function that connects Amazon Lex to the phone number.

We go through each step in detail.

Prerequisites

You must have the following prerequisites:

Node.js v12+ and npm installed
The AWS Command Line Interface (AWS CLI) installed
Node Version Manager (nvm) installed
The Node.js modules typescript and aws-sdk installed (using nvm)
AWS credentials configured for the account and Region that you use for this demo
Permissions to create Amazon Chime SIP media applications and phone numbers (make sure your service quota in us-east-1 or us-west-2 for phone numbers, voice connectors, SIP media applications, and SIP rules hasn’t been reached)
Deployment must be done in us-east-1 or us-west-2 to align with PSTN audio resources

For detailed installation instructions, including a script that can automate the installation and an AWS Cloud Development Kit (AWS CDK) project to easily create an Amazon Elastic Compute Cloud (Amazon EC2) development environment, see the workshop instructions.

Configure the Amazon Lex chatbot

You can build a complete conversational voice bot using Amazon Lex. In this example, you use the Amazon Lex console to build a bot. We skip the steps where you build the Lambda function for Amazon Lex. The focus here is how to connect Amazon Chime PSTN audio to Amazon Lex. For instructions on building custom Amazon Lex bots, refer to Amazon Lex: How It Works. In this example, we use the pre-built “book trip” example.

Create a bot

To create your chatbot, complete the following steps:

Open the Amazon Lex console. This must be in either us-east-1 or us-west-2, matching the Region where you deploy the Amazon Chime SDK resources using AWS CDK.

In the navigation pane, choose Bots.
Choose Create bot.
Select Start with an example.
For Bot name, enter a name (for example, BookTrip).
For Description, enter an optional description.
Under IAM permissions, select Create a role with basic Amazon Lex permissions.
Under Children’s Online Privacy Protection Act, select No.

This example doesn’t need that protection, but for your own bot creation you should select this option accordingly.

Under Idle session timeout, set Session timeout to 1 minute.
You can skip the Advanced settings section.
Choose Next.

For Select Language, choose your preferred language (for this post, we choose English (US)).
For Voice interaction, choose the voice you want to use.
You can enter a voice sample and choose Play to test the phrase and confirm the voice is to your liking.
Leave other settings at their default.
Choose Done.

In the Fulfilment section, enter the following text for On successful fulfilment:

Thank you!  We'll see you on {CheckInDate}.

Under Closing responses, enter the following text for Message:

Goodbye!

Choose Save intent.
Choose Build.

The build process takes a few moments to complete. When it’s finished, you can test the bot on the Amazon Lex console.

Create a version

You have now built the bot. Next, we create a version.

Navigate to the Versions page of your bot (under the bot name in the navigation pane).
Choose Create version.
Accept all the default values and choose Create.

Your new version is now listed on the Versions page.

Create an alias

Next, we create an alias.

In the navigation pane, choose Aliases.
Choose Create alias.
For Alias name, enter a name (for example, production).
Under Associate with a version, choose Version 1 on the drop-down menu.

If you had more than one version of the bot, you could choose the appropriate version here.

Choose Create.

The alias is now listed on the Aliases page.

On the Aliases page, choose the alias you just created.
Under Resource-based policy, choose Edit.
Add the following policy, which allows the Amazon Chime SDK PSTN audio to invoke Amazon Lex for you:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SMALexAccess",
      "Effect": "Allow",
      "Principal": {
        "Service": "voiceconnector.chime.amazonaws.com"
      },
      "Action": "lex:StartConversation",
      "Resource": "<Resource-ARN-for-the-Alias>",
      "Condition": {
        "StringEquals": {
          "AWS:SourceAccount": "<account-num>"
        },
        "ArnEquals": {
          "AWS:SourceArn": "arn:aws:voiceconnector:<region>:<account-num>:*"
        }
      }
    }
  ]
}

In the preceding code, provide the resource ARN (located directly above the text box), which is the ARN for the bot alias. Also provide your account number and specify the Region you’re deploying into (us-east-1 or us-west-2). That defines the ARN of the PSTN audio control plane in your account.

Choose Save to store the policy.
Choose Copy next to the resource ARN to use in a later step.
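
If you prefer to generate the policy rather than edit the placeholders by hand, the substitution can be sketched as follows. This is a convenience sketch, not part of the workshop: the function name is hypothetical, and the alias ARN and account number below are example values that you must replace with your own.

```python
import json


def build_alias_policy(alias_arn: str, account_id: str, region: str) -> dict:
    """Fill the placeholders of the resource-based policy shown above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SMALexAccess",
                "Effect": "Allow",
                "Principal": {"Service": "voiceconnector.chime.amazonaws.com"},
                "Action": "lex:StartConversation",
                "Resource": alias_arn,
                "Condition": {
                    "StringEquals": {"AWS:SourceAccount": account_id},
                    "ArnEquals": {
                        "AWS:SourceArn": f"arn:aws:voiceconnector:{region}:{account_id}:*"
                    },
                },
            }
        ],
    }


# Example values only; substitute your own account, Region, and alias ARN.
account_id = "111122223333"
region = "us-east-1"
alias_arn = f"arn:aws:lex:{region}:{account_id}:bot-alias/EXAMPLEBOT/EXAMPLEALIAS"

policy = build_alias_policy(alias_arn, account_id, region)
print(json.dumps(policy, indent=2))
```

You can paste the printed JSON into the Resource-based policy editor on the alias page.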

Congratulations! You have configured an Amazon Lex bot!

In a real chatbot application, you would almost certainly implement a Lambda function to process the intents. This demo program focuses on explaining how to connect to Amazon Chime SDK PSTN audio, so we don’t go into that level of detail. For more information, refer to Add the Lambda Function as a Code Hook.

Clone the GitHub repository

You can get the code for the entire workshop by cloning the repository:

git clone https://github.com/aws-samples/amazon-chime-sdk-pstn-audio-workshop
cd amazon-chime-sdk-pstn-audio-workshop

Deploy the common resources for the workshop

This workshop uses the AWS CDK to automate the deployment of all needed resources (except the Amazon Lex bot, which you already created). To deploy, run the following code from your terminal:

cdk bootstrap
yarn deploy

The AWS CDK deploys the resources. We do the bootstrap step to make sure that AWS CDK is properly initialized in the Region you’re deploying into. Note that these examples use AWS CDK version 2.

The repository has a series of lessons that are designed to explain how to develop PSTN audio applications. We recommend reviewing these documents to understand the basics using the first few sample programs. You can then review the Lambda sample program folder. Lastly, follow the steps to configure and then deploy your code. In the terminal, enter the following command:

cd lambdas/call-lex-bot

Configure your Lambda function to use the Amazon Lex bot ARN

Open the src/index.ts source code file for the Lambda function and edit the variable botAlias near the top of the file (provide the ARN you copied earlier):

const botAlias = "<Resource-ARN-for-the-Alias>";

You can now deploy the bot with yarn deploy and swap the new Lambda function into PSTN audio with yarn swap. You can also note the welcome text in the startBotConversationAction object:

const startBotConversationAction = {
  Type: "StartBotConversation",
  Parameters: {
    BotAliasArn: "none",
    LocaleId: "en_US",
    Configuration: {
      SessionState: {
        DialogAction: {
          Type: "ElicitIntent"
        }
      },
      WelcomeMessages: [
        {
          ContentType: "PlainText",
          Content: "Welcome to AWS Chime SDK Voice Service. Please say what you would like to do.  For example: I'd like to book a room, or, I'd like to rent a car."
        },
      ]
    }
  }
}

Amazon Lex starts the bot and uses Amazon Polly to read that text. This gives the caller a greeting, and tells them what they should do next.

How it works

The following example adds more actions to what we learned in the Call and Bridge Call lesson. The NEW_INBOUND_CALL event arrives and is processed the same way. We enable Amazon Voice Focus (which enhances the ability of Amazon Lex to understand words) and then immediately hand the incoming call off to the bot with a StartBotConversation action. An example of that action looks like the following object:

{
    "SchemaVersion": "1.0",
    "Actions": [
        {
            "Type": "Pause",
            "Parameters": {
                "DurationInMilliseconds": "1000"
            }
        },
        {
            "Type": "VoiceFocus",
            "Parameters": {
                "Enable": true,
                "CallId": "2947dfba-0748-46fc-abc5-a2c21c7569eb"
            }
        },
        {
            "Type": "StartBotConversation",
            "Parameters": {
                "BotAliasArn": "arn:aws:lex:us-east-1:<account-num>:bot-alias/RQXM74UXC7/ZYXLOINIJL",
                "LocaleId": "en_US",
                "Configuration": {
                    "SessionState": {
                        "DialogAction": {
                            "Type": "ElicitIntent"
                        }
                    },
                    "WelcomeMessages": [
                        {
                            "ContentType": "PlainText",
                            "Content": "Welcome to AWS Chime SDK Voice Service. Please say what you would like to do.  For example: I'd like to order flowers."
                        }
                    ]
                }
            }
        }
    ]
}

When the bot returns an ACTION_SUCCESSFUL event, the data collected by the Amazon Lex bot is included in the event, and your Lambda function can use that data if needed. However, a common practice for building Amazon Lex applications is to process the data in the function associated with the Amazon Lex bot. Examples of the event and the returned action are provided in the workshop documentation for this session.
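
The event flow described above can be sketched as a small dispatch function. The workshop's Lambda function is written in TypeScript; this Python version is only an illustration of the logic, and the function names are hypothetical. It assumes the invocation event carries an InvocationEventType field and that the caller's leg is the first entry under CallDetails.Participants; check both against the workshop documentation.

```python
from typing import Any, Dict, List

# Placeholder, as in the post: replace with the ARN of your bot alias.
BOT_ALIAS_ARN = "<Resource-ARN-for-the-Alias>"


def new_call_actions(call_id: str) -> List[Dict[str, Any]]:
    """Build the Pause, VoiceFocus, and StartBotConversation actions."""
    return [
        {"Type": "Pause", "Parameters": {"DurationInMilliseconds": "1000"}},
        {"Type": "VoiceFocus", "Parameters": {"Enable": True, "CallId": call_id}},
        {
            "Type": "StartBotConversation",
            "Parameters": {
                "BotAliasArn": BOT_ALIAS_ARN,
                "LocaleId": "en_US",
                "Configuration": {
                    "SessionState": {"DialogAction": {"Type": "ElicitIntent"}},
                    "WelcomeMessages": [
                        {
                            "ContentType": "PlainText",
                            "Content": "Welcome to AWS Chime SDK Voice Service. "
                                       "Please say what you would like to do.",
                        }
                    ],
                },
            },
        },
    ]


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Return actions for a new call; return no actions for other events."""
    actions: List[Dict[str, Any]] = []
    if event.get("InvocationEventType") == "NEW_INBOUND_CALL":
        # Assumption: the inbound leg is the first participant in the event.
        call_id = event["CallDetails"]["Participants"][0]["CallId"]
        actions = new_call_actions(call_id)
    # Placeholder: a real function would also react to ACTION_SUCCESSFUL here.
    return {"SchemaVersion": "1.0", "Actions": actions}


sample_event = {
    "InvocationEventType": "NEW_INBOUND_CALL",
    "CallDetails": {
        "Participants": [{"CallId": "2947dfba-0748-46fc-abc5-a2c21c7569eb"}]
    },
}
response = handler(sample_event)
print([action["Type"] for action in response["Actions"]])
```

Keeping the action construction in its own function makes it easy to compare the generated object with the example response shown earlier.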

Sequence diagram

The following diagram shows the sequence of calls made between PSTN audio and the Lambda function:

For a more detailed explanation of the operation, refer to the workshop documentation.

Clean up

To clean up the resources used in this demo and avoid incurring further charges, complete the following steps:

In the terminal, enter the following code:

yarn destroy

Return to the workshop folder (cd ../../) and enter the following code:

yarn destroy

The AWS CloudFormation stack created by the AWS CDK is destroyed, removing all the allocated resources.

Conclusion

In this post, you learned how to build a conversational interactive voice response (IVR) system using Amazon Lex and Amazon Chime SDK PSTN audio. You can use these techniques to build your own system to reduce call resolution times and automate informational responses on your customers' calls.

For more information, see the project GitHub repository and Using the Amazon Chime SDK PSTN Audio service.

About the Author

Greg Herlein has led software teams for over 25 years at large and small companies, including several startups. He is currently the Principal Evangelist for the Amazon Chime SDK service, where he is passionate about helping customers build advanced communications software.


Copyright © 2019-2025 Vedere AI. All Rights Reserved.