PyTorch Trace Analysis for the Masses

Introduction

We are excited to announce the public release of Holistic Trace Analysis (HTA), an open source performance analysis and visualization Python library for PyTorch users. HTA takes as input Kineto traces collected by the PyTorch profiler, which are complex and challenging to interpret, and up-levels the performance information contained in these traces. It was initially developed internally at Meta to understand and debug performance problems for large-scale distributed training jobs on GPUs. Since then, a multidisciplinary team has made a number of enhancements to HTA’s features and scaled them to support state-of-the-art ML workloads.

ML researchers and systems engineers often struggle to computationally scale up their models because they are not aware of the performance bottlenecks in their workloads. The resources requested for a job (e.g. GPUs, memory) are often misaligned with the resources actually required due to lack of visibility “under the hood”. To achieve the best performance from the hardware stack, it is imperative to understand the resource utilization and bottlenecks for distributed training workloads.

The initial HTA implementation was specifically targeted at Deep Learning Based Recommendation Models (DLRM). To make the features in HTA generic and applicable to use cases such as analyzing Vision and NLP models, we decided to refactor the HTA codebase and make the library available to the larger community. The new codebase implements several important ideas that lead to significant efficiency and performance improvements.

In this blog, we present several features implemented in the open source version of HTA, which can be used as a Python script as well as interactively in a Jupyter notebook. HTA provides the following features:

  1. Breakdown by Dimensions
    1. Temporal: Breakdown of GPU time in terms of time spent in computation, communication, memory events, and idle time on a single node and across all ranks.
    2. Idle Time: Breakdown of GPU idle time into waiting for the host, waiting for another kernel or attributed to an unknown cause.
    3. Kernel: Find kernels with the longest duration on each rank.
    4. Communication Computation Overlap: Calculate the percentage of time when communication overlaps computation.
  2. Statistical Analysis
    1. Kernel Duration Distribution: Distribution of average time taken by longest kernels across different ranks.
    2. CUDA Kernel Launch: Distributions of GPU kernels with very small duration, large duration, and excessive launch time.
    3. Augmented Counters (Memory bandwidth, Queue length): Augmented trace files which provide insights into memory copy bandwidth and number of outstanding operations on each CUDA stream.
  3. Patterns
    1. Frequent CUDA Kernels: Find the CUDA kernels most frequently launched by any given PyTorch or user defined operator.
  4. Trace Comparison
    1. Trace Diff: A trace comparison tool to identify and visualize the differences between traces.

The HTA source code is available to users via GitHub. In addition to the features mentioned above, users can request new features or build their own analyses using the core libraries and data structures provided in the codebase.

GPU Training Performance Debugging 101

To understand the GPU performance in distributed training jobs, we consider how the model operators interact with the GPU devices and how such interactions are reflected in certain measurable metrics.

At a high level, we can break down the GPU operations in a model execution into three broad categories, henceforth referred to as kernel types:

  1. Computation (COMP) – Compute kernels execute compiled routines for matrix multiplication and similar numeric calculations. They are responsible for all of the number-crunching necessary for model execution.
  2. Communication (COMM) – Communication kernels are routines which are responsible for exchanging and synchronizing data between different GPU devices in a distributed training job. The NVIDIA Collective Communication Library (NCCL) is a widely used communication library and all its kernels have the prefix “nccl”. Example NCCL kernels include NCCL_AllGather, NCCL_ReduceScatter, NCCL_AllReduce, etc.
  3. Memory (MEM) – Memory kernels manage the memory allocations/deallocations on the GPU devices and data movement between the memory space on the host and the GPUs. The memory kernels include Memcpy_H2D, Memcpy_D2H, Memcpy_D2D, Memset, etc. Here, H represents the Host and D represents the GPU Device. Thus, H2D, D2H, and D2D stand for Host to Device, Device to Host, and Device to Device, respectively.

Because a modern GPU such as the NVIDIA A100 is a massively parallel device capable of running multiple kernels simultaneously, it is possible to overlap the computation, communication, and memory kernels to reduce the model execution time. One common technique to achieve this overlap is to use multiple CUDA streams. A CUDA stream is a sequence of operations that execute on a GPU device in the order in which they are issued by the host code. Different CUDA streams can be interleaved and even run concurrently, thus achieving the effect of kernel overlap.
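
To make the stream mechanics concrete, the following PyTorch sketch issues a host-to-device copy on a side stream while a matrix multiplication runs on the default stream, so the Memcpy_H2D kernel can overlap the compute kernel. The tensor shapes and the synchronization pattern are illustrative assumptions rather than a snippet from any particular training job.

import torch

# Minimal sketch of kernel overlap with CUDA streams; shapes are illustrative.
copy_stream = torch.cuda.Stream()

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
host_batch = torch.randn(4096, 4096, pin_memory=True)

with torch.cuda.stream(copy_stream):
    # Memcpy_H2D kernel issued on the side stream
    device_batch = host_batch.to("cuda", non_blocking=True)

# COMP kernel issued on the default stream; it can overlap the copy above
c = a @ b

# Make the default stream wait for the copy before consuming its result
torch.cuda.current_stream().wait_stream(copy_stream)
d = c + device_batch
torch.cuda.synchronize()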

To help understand the above concepts, Figure 1 provides a timeline of the GPU kernels in a sample distributed training job on 8 GPUs for one iteration. In the figure below, each rank represents one GPU, and the kernels on each GPU run on 6 CUDA streams. In the right column of the figure, you can see the names of the GPU kernels used. In the middle of the figure, you can see the overlap between compute and communication kernels. This figure is created using the plot_timeline example notebook available in HTA.

Figure 1. An example of the execution timeline of GPU Kernels across multiple ranks

The performance of multi-GPU training jobs is affected by multiple factors. Among these factors, how a model execution creates and orchestrates the GPU kernels plays a critical role. HTA provides insights into how the model execution interacts with the GPU devices and highlights opportunities for performance improvement.

With the features we built in HTA, we aim to provide users with insights into “what is happening under the hood in distributed GPU training?” We briefly describe these features in the next few paragraphs.

Features in Holistic Trace Analysis

For most users, understanding the performance of GPU training jobs is nontrivial. Thus, we built this library to simplify the task of trace analysis and provide users with useful insights by examining the model execution traces. As a first step, we developed features that are important and generic enough that most users can benefit from this library.

Temporal Breakdown: We begin by asking whether the GPU is spending time on computation, communication, or memory events, or whether it is idle. To answer this question, the temporal breakdown feature presents a breakdown in terms of these categories. To achieve high training efficiency, the code should maximize the time used by computation kernels and minimize idle time and non-compute time (time used by communication or memory kernels). This is accomplished by executing computation kernels concurrently with communication or memory kernels. Note that, during concurrent execution of computation kernels with communication/memory kernels, the time spent by communication/memory kernels is accounted for under compute time.

Figure 2: Temporal Breakdown across 8 GPUs

Kernel Breakdown: It is natural to ask which kernels are taking the most time. The next feature breaks down the time spent within each kernel type (COMM, COMP, MEM) and sorts the kernels by duration. We present this information for each kernel type and for each rank as a pie chart. See Figure 3 below.

Figure 3: Pie chart of top computation and communication kernels

Kernel Duration Distribution: Subsequently, one can also ask: for any given kernel, what is the distribution of the time spent across the ranks? To answer this, HTA generates bar graphs for the average duration of a given kernel across all ranks. Additionally, the error bars in the bar graphs show the minimum and maximum amount of time taken by a given kernel on a given rank. Figure 4 below shows a discrepancy between the average duration on rank 0 as compared to other ranks. This anomalous behavior on rank 0 guides the user on where to look for possible bugs.

Figure 4: Average duration of NCCL AllReduce Kernel across 8 ranks

Communication Computation Overlap: In distributed training, a significant amount of time is spent in communication and synchronization events among multiple GPU devices. To achieve high GPU efficiency (i.e. TFLOPS/GPU), it is vital to keep the GPU doing actual computation work. In other words, a GPU should not be blocked waiting for data from other GPUs. One way to measure the extent to which computation is blocked by data dependencies is to calculate the communication computation overlap. Higher GPU efficiency is observed when communication events overlap computation events; a lack of overlap leaves the GPU idle, resulting in low efficiency. The communication computation overlap feature calculates the percentage of time communication and computation overlap in a job for each rank and generates a bar graph representation. See the figure below. More precisely, we measure the following ratio:

(time spent in computation while communicating) / (time spent in communication)

Figure 5: Communication computation overlap

Augmented Counters (Queue length, Memory bandwidth): To aid in debugging, HTA calculates the memory bandwidth statistics for D2H, H2D and D2D memory copy (memcpy) and memory set (memset) events. Additionally, HTA computes the number of outstanding CUDA operations on each CUDA stream, which we refer to as queue length. When the queue length on a stream is 1024 or larger, new events cannot be scheduled on that stream, and the CPU will stall until the outstanding GPU events have been processed. Additionally, HTA generates a new trace file containing tracks with the memory bandwidth and queue length time series. See Figure 6 below.

Figure 6: Memory Bandwidth and Queue Length

These primary features give us a peek into the system performance and help answer “what is happening in the system?”. As HTA evolves, we hope to address “why is X happening?” and also suggest possible solutions to overcome the bottlenecks.

Installation and Usage

Installation

To install HTA, please refer to the README. In brief, the user is required to clone the repo and install the necessary Python packages via pip.

Usage

This version of Holistic Trace Analysis is currently in beta and we recommend using HTA in a Jupyter notebook. A demo notebook is provided for your convenience. To get started, import the hta package in a Jupyter notebook, create a TraceAnalysis object and off we go in exactly two lines of code.

from hta.trace_analysis import TraceAnalysis
analyzer = TraceAnalysis(trace_dir="/trace/folder/path")
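
Once the analyzer is constructed, each feature described earlier is exposed as a method on the TraceAnalysis object. The calls below are a sketch whose method names mirror the feature names in this post; consult the HTA documentation for the exact APIs and arguments available in your installed version.

# Sketch only: method names follow the features described above and may differ
# slightly across HTA versions; see the documentation for exact signatures.

# Temporal breakdown of GPU time into compute, communication, memory, and idle
time_spent_df = analyzer.get_temporal_breakdown()

# Longest kernels on each rank, broken down by kernel type
kernel_type_df = analyzer.get_gpu_kernel_breakdown()

# Percentage of communication time that overlaps with computation, per rank
overlap_df = analyzer.get_comm_comp_overlap()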

Requirements

  • All trace files for a training or inference job must be stored in a unique folder.
  • Trace files must be in JSON or gzipped JSON format.

FAQ

Q. How can I install HTA?

Please see the README in the root directory of the repository.

Q. Is there any documentation on the features and API in HTA?

The documentation and detailed API are available here.

Q. Can you implement feature X?

Depending on how widely the feature is needed and the level of effort required to implement it, we would consider developing the feature. Please open a GitHub issue and tag it with the feature-request label.

Q. Can I modify the code?

Please do and send a PR along the way, if you think it would be useful for others.

Q. How can I collect traces in PyTorch?

Please refer to this tutorial.

Q. Can HTA be used at production scale?

Yes, please see a use case study here.

Best practices for creating Amazon Lex interaction models

Amazon Lex is an AWS service for building conversational interfaces into any application using voice and text, enabling businesses to add sophisticated, natural language chatbots across different channels. Amazon Lex uses machine learning (ML) to understand natural language (normal conversational text and speech). In this post, we go through a set of best practices for using ML to create a bot that will delight your customers by accurately understanding them. This allows your bot to have more natural conversations that don’t require the user to follow a set of strict instructions. Designing and building an intelligent conversational interface is very different than building a traditional application or website, and this post will help you develop some of the new skills required.

Let’s look at some of the terminology we use frequently in this post:

  • Utterance – The phrase the user says to your live bot.
  • Sample utterance – Some examples of what users might say. These are attached to intents and used to train the bot.
  • Intent – This represents what the user meant and should be clearly connected to a response or an action from the bot. For instance, an intent that responds to a user saying hello, or an intent that can respond and take action if a user wants to order a coffee. A bot has one or more intents that utterances can be mapped to.
  • Slot – A parameter that can capture specific types of information from the utterance (for example, the time of an appointment or the customer’s name). Slots are attached to intents.
  • Slot value – Either examples of what the slot should capture, or a specific list of values for a slot (for example, large, medium, and small as values for a slot for coffee sizes).

The following image shows how all these pieces fit together to make up your bot.

A diagram showing how an interaction with an Amazon Lex bot flows through automatic speech recognition, natural language understanding, fulfilment (including conversational user experience) and back to text to speech

Building a well-designed bot requires several different considerations. These include requirements gathering and discovery, conversational design, testing through automation and with users, and monitoring and optimizing your bot. Within the conversational design aspect, there are two main elements: the interaction model and the conversational or voice user experience (CUX/VUX). CUX and VUX encompass the personality of the bot, the types of responses, the flow of the conversation, variations for modality, and how the bot handles unexpected inputs or failures. The interaction model is the piece that can take what the user said (utterance) and map it to what they meant (intent). In this post, we only look at how to design and optimize your interaction model.

Because Amazon Lex uses machine learning, that puts the creator of the bot in the role of machine teacher. When we build a bot, we need to give it all the knowledge it needs about the types of conversations it will support. We do this both by how we configure the bot (intents and slots) and the training data we give it (sample utterances and slot values). The underlying service then enriches it with knowledge about language generally, enabling it to understand phrases beyond the exact data we have given it.
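
This split between configuration (intents and slots) and training data (sample utterances and slot values) is reflected directly in the Amazon Lex V2 model-building API. The boto3 sketch below creates a simple intent with a few sample utterances; the bot ID, version, locale, and intent name are placeholder values for illustration only, and slot definitions are omitted.

import boto3

# Placeholder identifiers for illustration; replace with your bot's values.
BOT_ID = "EXAMPLEBOTID"
BOT_VERSION = "DRAFT"
LOCALE_ID = "en_US"

lex = boto3.client("lexv2-models")

# Each sample utterance is one piece of training data attached to the intent.
response = lex.create_intent(
    botId=BOT_ID,
    botVersion=BOT_VERSION,
    localeId=LOCALE_ID,
    intentName="OrderCoffee",
    sampleUtterances=[
        {"utterance": "I'd like to order a coffee"},
        {"utterance": "Can I get a coffee"},
        {"utterance": "order coffee"},
    ],
)
print(response["intentId"])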

The best practices listed in the following sections can support you in building a bot that will give your customers a great user experience and work well for your use case.

Creating intents

Each intent is a concept you teach your bot to understand. For instance, it could be an intent that represents someone ordering a coffee, or someone greeting your bot. You need to make sure that you make it really clear and easy for the bot to recognize that a particular utterance should be matched to that intent.

Imagine if someone gave you a set of index cards with phrases on them, each sorted into piles, but with no other context or details. They then started to give you additional index cards with phrases and asked you to add them to the right pile, simply based on the phrases on the cards in each pile. If each pile represented a clear concept with similar phrasing, this would be easy. But if there were no clear topic in each, you would struggle to work out how to match them to a pile. You may even start to use other clues, like “these are all short sentences” or “only these have punctuation.”

Your bot uses similar techniques, but remember that although ML is smart, it’s not as smart as a human, and doesn’t have all the external knowledge and context a human has. If a human with no context of what your bot does might struggle to understand what was meant, your bot likely will too. The best practices in this section can help you create intents that will be recognizable and more likely to be matched with the desired utterance.

1. Each intent should represent a single concept

Each intent should represent one concept or idea, and not just a topic. It’s okay to have multiple intents that map to the same action or response if separating them gives each a clearer, cohesive concept. Let’s look at some dos and don’ts:

  • Don’t create generic intents that group multiple concepts together.

For example, the following intent combines phrases about a damaged product and more general complaint phrases:

DamageComplaint
I've received a damaged product
i received a damaged product
I'm really frustrated
Your company is terrible at deliveries
My product is broken
I got a damaged package
I'm going to return this order
I'll never buy from you again

The following intent is another example, which combines updating personal details with updating the mobile application:

UpdateNeeded
I need to update my address
Can I update the address you have for me
How do I update my telephone number
I can't get the update for the mobile app to work
Help me update my iphone app
How do I get the latest version of the mobile app

  • Do split up intents when they have very different meanings. For example, we can split up the UpdateNeeded intent from the previous example into two intents:

UpdatePersonalDetails
I need to update my address
Can I update the address you have for me
How do I update my telephone number

UpdateMobileApp
I can't get the update for the mobile app to work
Help me update my iphone app
How do I get the latest version of the mobile app

  • Do split up intents when they have the same action or response needed, but use very different phrasing. For example, the following two intents may have the same end result, but the first is directly telling us they need to tow their car, whereas the second is only indirectly hinting that they may need their car towed.

RoadsideAssistanceRequested
I need to tow my car

Can I get a tow truck
Can you send someone out to get my car

RoadsideAssistanceNeeded
I've had an accident

I hit an animal
My car broke down

2. Reduce overlap between intents

Let’s think about that stack of index cards again. If there were cards with the same (or very similar) phrases, it would be hard to know which stack to add a new card with that phrase onto. It’s the same in this case. We want really clear-cut sets of sample utterances in each intent. The following are a few strategies:

  • Don’t create intents with very similar phrasing that have similar meanings. For example, because Amazon Lex will generalize outside of the sample utterances, phrases that aren’t clearly one specific intent could get mismatched, for instance a customer saying “I’d like to book an appointment” when there are two appointment intents, like the following:

BookDoctorsAppointment
I’d like to book a doctors appointment

BookBloodLabAppointment
I’d like to book a lab appointment

  • Do use slots to combine intents that are on the same topic and have similar phrasing. For example, by combining the two intents in the previous example, we can more accurately capture any requests for an appointment, and then use a slot to determine the correct type of appointment:

BookAppointment
I’d like to book a {appointmentType} appointment

  • Don’t create intents where one intent is subset of another. For example, as your bot grows, it can be easy to start creating intents to capture more detailed information:

BookFlight
I'd like to book a flight
book me a round trip flight
i need to book flight one way

BookOneWayFlight
book me a one-way flight
I’d like to book a one way flight
i need to book flight one way please

  • Do use slots to capture different subsets of information within an intent. For example, instead of using different intents to capture the information on the type of flight, we can use a slot to capture this:

BookFlight
I'd like to book a flight
book me a {itineraryType} flight
i need to book flight {itineraryType}
I’d like to book a {itineraryType} flight

3. Have the right amount of data

In ML, training data is key. Hundreds or thousands of samples are often needed to get good results. You’ll be glad to hear that Amazon Lex doesn’t require a huge amount of data, and in fact you don’t want to have too many sample utterances in each intent, because they may start to diverge or add confusion. However, it is key that we provide enough sample utterances to create a clear pattern for the bot to learn from.

Consider the following:

  • Have at least 15 utterances per intent.
  • Add additional utterances incrementally (batches of 10–15) so you can test the performance in stages. A larger number of utterances is not necessarily better.
  • Review intents with a large number of utterances (over 100) to evaluate if you can either remove very similar utterances, or should split the intent into multiple intents.
  • Keep the number of utterances similar across intents. This allows recognition for each intent to be balanced, and avoids accidentally biasing your bot to certain intents.
  • Regularly review your intents based on learnings from your production bot, and continue to add and adjust the utterances. Designing and developing a bot is an iterative process that never stops.

4. Have diversity in your data

Amazon Lex is a conversational AI—its primary purpose is to chat with humans. Humans tend to have a large amount of variety in how they phrase things. When designing a bot, we want to make sure we’re capturing that range in our intent configuration. It’s important to re-evaluate and update your configuration and sample data on a regular basis, especially if you’re expanding or changing your user base over time. Consider the following recommendations:

  • Do have a diverse range of utterances in each intent. The following are examples of the types of diversity you should consider:
    • Utterance lengths – The following is an example of varying lengths:

BookFlight
book flight
I need to book a flight
I want to book a flight for my upcoming trip

    • Vocabulary – We need to align this with how our customers talk. You can capture this through user testing or by using the conversational logs from your bot. For example:

OrderFlowers
I want to buy flowers
Can I order flowers
I need to get flowers

    • Phrasing – We need a mix of utterances that represent the different ways our customers might phrase things. The following example shows utterances using “book” as a verb, “booking” as a noun, “flight booking” as a subject, and formal and informal language:

BookFlight
I need to book a flight
can you help with a flight booking
Flight booking is what I am looking for
please book me a flight
I'm gonna need a flight

    • Punctuation – We should include a range of common usage. We should also include non-grammatical usage if this is something a customer would use (especially when typing). See the following example:

OrderFlowers
I want to order flowers.
i wanted to get flowers!
Get me some flowers... please!!

    • Slot usage – Provide sample utterances that show both using and not using slots. Use different mixes of slots across those that include them. Make sure the slots have examples with different places they could appear in the utterance. For example:

CancelAppointment
Cancel appointment
Cancel my appointment with Dr. {DoctorLastName}
Cancel appointment on {AppointmentDate} with Dr. {DoctorLastName}
Cancel my appointment on {AppointmentDate}
Can you tell Dr. {DoctorLastName} to cancel my appointment
Please cancel my doctors appointment

  • Don’t keep adding utterances that are just small variances in phrasing. Amazon Lex is able to handle generalizing these for you. For example, you wouldn’t require each of these three variations as the differences are minor:

DamagedProductComplaint
I've received a damaged product
I received a damaged product
Received damaged product

  • Don’t add diversity to some intents but not to others. We need to be consistent with the forms of diversity we add. Remember the index cards from the beginning—when an utterance isn’t clear, the bot may start to use other clues, like sentence length or punctuation, to try to make a match. There are times you may want to use this to your advantage (for example, if you genuinely want to direct all one-word phrases to a particular intent), but it’s important you avoid doing this by accident.

Creating slots

We touched on some good practices involving slots in the previous section, but let’s look at some more specific best practices for slots.

5. Use short noun or adjective phrases for slots

Slots represent something that can be captured definitively as a parameter, like the size of the coffee you want to order, or the airport you’re flying to. Consider the following:

  • Use nouns or short adjectives for your slot values. Don’t use slots for things like carrier phrases (“how do I” or “what could I”) because this will reduce the ability of Amazon Lex to generalize your utterances. Try to keep slots for values you need to capture to fulfil your intent.
  • Keep slots generally to one or two words.

6. Prefer slots over explicit values

You can use slots to generalize the phrases you’re using, but we need to stick to the recommendations we just reviewed as well. To make our slot values as easy to identify as possible, we never use values included in the slot directly in sample utterances. Keep in mind the following tips:

  • Don’t explicitly include values that could be slots in the sample utterances. For example:

OrderFlowers
I want to buy roses
I want to buy lilies
I would love to order some orchids
I would love to order some roses

  • Do use slots to reduce repetition. For example:

OrderFlowers
I want to buy {flowers}
I would love to order some {flowers}

flowers
roses
lilies
orchids

  • Don’t mix slots and real values in the sample utterances. For example:

OrderFlowers
I want to buy {flowers}
I want to buy lilies
I would love to order some {flowers}

flowers
roses
lilies
orchids

  • Don’t have intents with only slots in the sample utterances if the slot types are AlphaNumeric, Number, Date, GRXML, are very broad custom slots, or include abbreviations. Instead, expand the sample utterances by adding conversational phrases that include the slot to the sample utterances.

7. Keep your slot values coherent

The bot has to decide whether to match a slot based only on what it can learn from the values we have entered. If there is a lot of similarity or overlap within slots in the same intent, this can cause challenges with the right slot being matched.

  • Don’t have slots with overlapping values in the same intent. Try to combine them instead. For example:

pets
cat
dog
goldfish

animals
horse
cat
dog

8. Consider how the words will be transcribed

Amazon Lex uses automated speech recognition (ASR) to transcribe speech. This means that all inputs to your Amazon Lex interaction model are processed as text, even when using a voice bot. We need to remember that a transcription may vary from how users might type the same thing. Consider the following:

  • Enter acronyms, or other words whose letters should be pronounced individually, as single letters separated by a period and a space. This will more closely match how it will be transcribed. For example:

A. T. M.
A. W. S.
P. A.

  • Review the audio and transcriptions on a regular basis, so you can adjust your sample utterances or slot types. To do this, turn on conversation logs, and enable both text and audio logs, whenever possible.

9. Use the right options available for your slots

Many different types of slots and options are available, and using the best options for each of our slots can help the recognition of those slot values. We always want to take the time to understand the options before deciding on how to design our slots:

  • Use the restrict option to limit slots to a closed set of values. You can define synonyms for each value. This could be, for instance, the menu items in your restaurant.
  • Use the expand option when you want to be able to identify more than just the sample values you provide (for example, Name).
  • Turn obfuscation on for slots that are collecting sensitive data to prevent the data from being logged.
  • Use runtime hints to improve slot recognition when you can narrow down the potential options at runtime. Choosing one slot might narrow down the options for another; for example, a particular type of furniture may not have all color options.
  • Use spelling styles to capture uncommon words or words with variations in spellings such as names.

10. Use custom vocabulary for specialist domains

In most cases, a custom vocabulary is not required, but can be helpful if your users will use specialist words not common in everyday language. In this case, adding one can be helpful in making sure that your transcriptions are accurate. Keep the following in mind:

  • Do use a custom vocabulary to add words that aren’t readily recognized by Amazon Lex in voice-based conversations. This improves the speech-to-text transcription and overall customer experience.
  • Don’t use short or common words like “on,” “it,” “to,” “yes,” or “no” in a custom vocabulary.
  • Do decide how much weight to give a word based on how often the word isn’t recognized in the transcription and how rare the word is in the input. Words that are difficult to pronounce require a higher weight. Use a representative test set to determine if a weight is appropriate. You can collect an audio test set by turning on audio logging in conversation logs.
  • Do use custom slot types for lists of catalog values or entities such as product names or mutual funds.

11. GRXML slots need a strict grammar

When migrating to Amazon Lex from a service that may already have grammars in place (such as traditional automatic speech recognition engines), it is possible to reuse GRXML grammars during the new bot design process. However, when creating a completely new Amazon Lex bot, we recommend first checking if other slot types might meet your needs before using GRXML. Consider the following:

  • Do use GRXML slots only for spoken input, and not text-based interactions.
  • Don’t add the carrier phrases for the GRXML slots in the GRXML file (grammar) itself.
  • Do put carrier phrases into the slot sample utterances, such as I live in {zipCode} or {zipCode} is my zip code.
  • Do author the grammar to only capture correct slot values. For example, to capture a five-digit US ZIP code, you should only accept values that are exactly five digits.

Summary

In this post, we walked through a set of best practices that should help you as you design and build your next bot. As you take away this information, it’s important to remember that best practices are always context dependent. These aren’t rules, but guidelines to help you build a high-performing chatbot. As you keep building and optimizing your own bots, you will find some of these are more important for your use case than others, and you might add your own additional best practices. As a bot creator, you have a lot of control over how you configure your Amazon Lex bot to get the best results for your use case, and these best practices should give you a great place to start.

We can summarize the best practices in this post as follows:

  • Keep each intent to a single clear concept with a coherent set of utterances
  • Use representative, balanced, and diverse sample utterance data
  • Use slots to make intents clearer and capture data
  • Keep each slot to a single topic with a clear set of values
  • Know and use the right type of slot for your use case

For more information on Amazon Lex, check out Getting started with Amazon Lex for documentation, tutorials, how-to videos, code samples, and SDKs.


About the Author

Gillian Armstrong is a Builder Solutions Architect. She is excited about how the Cloud is opening up opportunities for more people to use technology to solve problems, and especially excited about how cognitive technologies, like conversational AI, are allowing us to interact with computers in more human ways.

Power recommendations and search using an IMDb knowledge graph – Part 3

This three-part series demonstrates how to use graph neural networks (GNNs) and Amazon Neptune to generate movie recommendations using the IMDb and Box Office Mojo Movies/TV/OTT licensable data package, which provides a wide range of entertainment metadata, including over 1 billion user ratings; credits for more than 11 million cast and crew members; 9 million movie, TV, and entertainment titles; and global box office reporting data from more than 60 countries. Many AWS media and entertainment customers license IMDb data through AWS Data Exchange to improve content discovery and increase customer engagement and retention.

The following diagram illustrates the complete architecture implemented as part of this series.

In Part 1, we discussed the applications of GNNs and how to transform and prepare our IMDb data into a knowledge graph (KG). We downloaded the data from AWS Data Exchange and processed it in AWS Glue to generate KG files. The KG files were stored in Amazon Simple Storage Service (Amazon S3) and then loaded in Amazon Neptune.

In Part 2, we demonstrated how to use Amazon Neptune ML (in Amazon SageMaker) to train the KG and create KG embeddings.

In this post, we walk you through how to apply our trained KG embeddings in Amazon S3 to out-of-catalog search use cases using Amazon OpenSearch Service and AWS Lambda. You also deploy a local web app for an interactive search experience. All the resources used in this post can be created using a single AWS Cloud Development Kit (AWS CDK) command as described later in the post.

Background

Have you ever inadvertently searched for a content title that wasn’t available on a video streaming platform? If so, instead of facing a blank search results page, you likely found a list of movies in the same genre or with similar cast or crew members. That’s an out-of-catalog search experience!

Out-of-catalog search (OOC) is when you enter a search query that has no direct match in a catalog. This event frequently occurs in video streaming platforms that constantly purchase a variety of content from multiple vendors and production companies for a limited time. The absence of relevancy or mapping from a streaming company’s catalog to large knowledge bases of movies and shows can result in a sub-par search experience for customers that query OOC content, thereby lowering the interaction time with the platform. This mapping can be done by manually mapping frequent OOC queries to catalog content or can be automated using machine learning (ML).

In this post, we illustrate how to handle OOC by utilizing the power of the IMDb dataset (the premier source of global entertainment metadata) and knowledge graphs.

OpenSearch Service is a fully managed service that makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more. OpenSearch is an open source, distributed search and analytics suite derived from Elasticsearch. OpenSearch Service offers the latest versions of OpenSearch, support for 19 versions of Elasticsearch (1.5 to 7.10 versions), as well as visualization capabilities powered by OpenSearch Dashboards and Kibana (1.5 to 7.10 versions). OpenSearch Service currently has tens of thousands of active customers with hundreds of thousands of clusters under management processing trillions of requests per month. OpenSearch Service offers kNN search, which can enhance search in use cases such as product recommendations, fraud detection, and image, video, and some specific semantic scenarios like document and query similarity. For more information about the natural language understanding-powered search functionalities of OpenSearch Service, refer to Building an NLU-powered search application with Amazon SageMaker and the Amazon OpenSearch Service KNN feature.

Solution overview

In this post, we present a solution to handle OOC situations through knowledge graph-based embedding search using the k-nearest neighbor (kNN) search capabilities of OpenSearch Service. The key AWS services used to implement this solution are OpenSearch Service, SageMaker, Lambda, and Amazon S3.

Check out Part 1 and Part 2 of this series to learn more about creating knowledge graphs and GNN embedding using Amazon Neptune ML.

Our OOC solution assumes that you have a combined KG obtained by merging a streaming company KG and IMDb KG. This can be done through simple text processing techniques that match titles along with the title type (movie, series, documentary), cast, and crew. Additionally, this joint knowledge graph has to be trained to generate knowledge graph embeddings through the pipelines mentioned in Part 1 and Part 2. The following diagram illustrates a simplified view of the combined KG.

To demonstrate the OOC search functionality with a simple example, we split the IMDb knowledge graph into customer-catalog and out-of-customer-catalog. We mark the titles that contain “Toy Story” as an out-of-customer-catalog resource and the rest of the IMDb knowledge graph as customer catalog. In a scenario where the customer catalog is not enhanced or merged with external databases, a search for “toy story” would return any title that has the words “toy” or “story” in its metadata, with the OpenSearch text search. If the customer catalog was mapped to IMDb, it would be easier to glean that the query “toy story” doesn’t exist in the catalog and that the top matches in IMDb are “Toy Story,” “Toy Story 2,” “Toy Story 3,” “Toy Story 4,” and “Charlie: Toy Story” in decreasing order of relevance with text match. To get within-catalog results for each of these matches, we can retrieve the five closest movies in the customer catalog through kNN embedding similarity (based on the joint KG embeddings) in OpenSearch Service.

A typical OOC experience follows the flow illustrated in the following figure.

The following video shows the top five (number of hits) OOC results for the query “toy story” and relevant matches in the customer catalog (number of recommendations).

Here, the query is matched to the knowledge graph using text search in OpenSearch Service. We then map the embeddings of the text match to the customer catalog titles using the OpenSearch Service kNN index. Because the user query can’t be directly mapped to the knowledge graph entities, we use a two-step approach to first find title-based query similarities and then items similar to the title using knowledge graph embeddings. In the following sections, we walk through the process of setting up an OpenSearch Service cluster, creating and uploading knowledge graph indexes, and deploying the solution as a web application.

Prerequisites

To implement this solution, you should have an AWS account, familiarity with OpenSearch Service, SageMaker, Lambda, and AWS CloudFormation, and have completed the steps in Part 1 and Part 2 of this series.

Launch solution resources

The following architecture diagram shows the out-of-catalog workflow.

You will use the AWS Cloud Development Kit (CDK) to provision the resources required for the OOC search applications. The code to launch these resources performs the following operations:

  1. Creates a VPC for the resources.
  2. Creates an OpenSearch Service domain for the search application.
  3. Creates a Lambda function to process and load movie metadata and embeddings to OpenSearch Service indexes (**-LoadDataIntoOpenSearchLambda-**).
  4. Creates a Lambda function that takes as input the user query from a web app and returns relevant titles from OpenSearch (**-ReadFromOpenSearchLambda-**).
  5. Creates an API Gateway that adds an additional layer of security between the web app user interface and Lambda.

To get started, complete the following steps:

  1. Run the code and notebooks from Part 1 and Part 2.
  2. Navigate to the part3-out-of-catalog folder in the code repository.
  3. Launch the AWS CDK from the terminal with the command bash launch_stack.sh.
  4. Provide the two S3 file paths created in Part 2 as input:
    1. The S3 path to the movie embeddings CSV file.
    2. The S3 path to the movie node file.
  5. Wait until the script provisions all the required resources and finishes running.
  6. Copy the API Gateway URL that the AWS CDK script prints out and save it. (We use this for the Streamlit app later.)

Create an OpenSearch Service Domain

For illustration purposes, you create a search domain in a single Availability Zone on an r6g.large.search instance within a secure VPC and subnet. Note that the best practice would be to set up three Availability Zones with one primary and two replica instances.

Create an OpenSearch Service index and upload data

You use Lambda functions (created using the AWS CDK launch stack command) to create the OpenSearch Service indexes. To start the index creation, complete the following steps:

  1. On the Lambda console, open the LoadDataIntoOpenSearchLambda Lambda function.
  2. On the Test tab, choose Test to create and ingest data into the OpenSearch Service index.

The following code to this Lambda function can be found in part3-out-of-catalog/cdk/ooc/lambdas/LoadDataIntoOpenSearchLambda/lambda_handler.py:

embedding_file = os.environ.get("embeddings_file")
movie_node_file = os.environ.get("movie_node_file")
print("Merging files")
merged_df = merge_data(embedding_file, movie_node_file)
print("Embeddings and metadata files merged")

print("Initializing OpenSearch client")
ops = initialize_ops()
indices = ops.indices.get_alias().keys()
print("Current indices are :", indices)

# This will take 5 minutes
print("Creating knn index")
# Create the index using knn settings. Creating OOC text is not needed
create_index('ooc_knn',ops)
print("knn index created!")

print("Uploading the data for knn index")
response = ingest_data_into_ops(merged_df, ops, ops_index='ooc_knn', post_method=post_request_emb)
print(response)
print("Upload complete for knn index")

print("Uploading the data for fuzzy word search index")
response = ingest_data_into_ops(merged_df, ops, ops_index='ooc_text', post_method=post_request)
print("Upload complete for fuzzy word search index")
# Create the response and add some extra content to support CORS
response = {
    "statusCode": 200,
    "headers": {
        "Access-Control-Allow-Origin": '*'
    },
    "isBase64Encoded": False
}

The function performs the following tasks:

  1. Loads the IMDb KG movie node file that contains the movie metadata, and the associated embeddings file, from the S3 file paths that were passed to the stack creation script launch_stack.sh.
  2. Merges the two input files to create a single dataframe for index creation.
  3. Initializes the OpenSearch Service client using the Boto3 Python library.
  4. Creates two indexes for text (ooc_text) and kNN embedding search (ooc_knn) and bulk uploads data from the combined dataframe through the ingest_data_into_ops function.

This data ingestion process takes 5–10 minutes and can be monitored through the Amazon CloudWatch logs on the Monitoring tab of the Lambda function.

You create two indexes to enable text-based search and kNN embedding-based search. The text search maps the free-form query the user enters to the titles of the movies. The kNN embedding search finds the k closest movies to the best text match in the KG latent space to return as outputs.
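
The two-step lookup can be sketched with the opensearch-py client: a fuzzy text query against the ooc_text index finds the best title matches, and a kNN query against the ooc_knn index then returns the closest catalog titles for each match. The endpoint, authentication, and field names such as title and embedding are assumptions for illustration and should be adjusted to the actual index mappings in the repository.

from opensearchpy import OpenSearch

# Assumed mappings: a "title" text field in ooc_text and an "embedding"
# knn_vector field in ooc_knn; authentication is omitted for brevity.
client = OpenSearch(hosts=[{"host": "my-domain-endpoint", "port": 443}], use_ssl=True)

def out_of_catalog_search(query_text, num_hits=5, num_recs=5):
    # Step 1: fuzzy text match of the user query against external (IMDb) titles
    text_hits = client.search(
        index="ooc_text",
        body={
            "size": num_hits,
            "query": {"match": {"title": {"query": query_text, "fuzziness": "AUTO"}}},
        },
    )["hits"]["hits"]

    results = []
    for hit in text_hits:
        # Step 2: kNN lookup of the catalog titles closest to this match's embedding
        knn_hits = client.search(
            index="ooc_knn",
            body={
                "size": num_recs,
                "query": {"knn": {"embedding": {"vector": hit["_source"]["embedding"], "k": num_recs}}},
            },
        )["hits"]["hits"]
        results.append((hit["_source"]["title"], [h["_source"]["title"] for h in knn_hits]))
    return results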

Deploy the solution as a local web application

Now that you have a working text search and kNN index on OpenSearch Service, you’re ready to build a ML-powered web app.

We use the streamlit Python package to create a front-end illustration for this application. The IMDb-Knowledge-Graph-Blog/part3-out-of-catalog/run_imdb_demo.py Python file in our GitHub repo has the required code to launch a local web app to explore this capability.

To run the code, complete the following steps:

  1. Install the streamlit and aws_requests_auth Python packages in your local virtual Python environment using the following commands in your terminal:
pip install streamlit

pip install aws-requests-auth
  2. Replace the placeholder for the API Gateway URL in the code as follows with the one created by the AWS CDK:

api = '<ENTER URL OF THE API GATEWAY HERE>/opensearch-lambda?q={query_text}&numMovies={num_movies}&numRecs={num_recs}'

  3. Launch the web app with the command streamlit run run_imdb_demo.py from your terminal.

This script launches a Streamlit web app that can be accessed in your web browser. The URL of the web app can be retrieved from the script output, as shown in the following screenshot.

The app accepts new search strings, the number of hits, and the number of recommendations. The number of hits corresponds to how many matching OOC titles we should retrieve from the external (IMDb) catalog. The number of recommendations corresponds to how many nearest neighbors we should retrieve from the customer catalog based on kNN embedding search. See the following code:

search_text=st.sidebar.text_input("Please enter search text to find movies and recommendations")
num_movies= st.sidebar.slider('Number of search hits', min_value=0, max_value=5, value=1)
recs_per_movie= st.sidebar.slider('Number of recommendations per hit', min_value=0, max_value=10, value=5)
if st.sidebar.button('Find'):
    resp= get_movies()

This input (query, number of hits and recommendations) is passed to the **-ReadFromOpenSearchLambda-** Lambda function created by the AWS CDK through the API Gateway request. This is done in the following function:

def get_movies():
    result = requests.get(api.format(query_text=search_text, num_movies=num_movies, num_recs=recs_per_movie)).json()

The results returned by the Lambda function from OpenSearch Service are passed to API Gateway and displayed in the Streamlit app.

Clean up

You can delete all the resources created by the AWS CDK through the command npx cdk destroy --app “python3 appy.py” --all in the same instance (inside the cdk folder) that was used to launch the stack (see the following screenshot).

Conclusion

In this post, we showed you how to create a solution for OOC search using text and kNN-based search using SageMaker and OpenSearch Service. You used custom knowledge graph model embeddings to find nearest neighbors in your catalog to that of IMDb titles. You can now, for example, search for “The Rings of Power,” a fantasy series developed by Amazon Prime Video, on other streaming platforms and reason how they could have optimized the search result.

For more information about the code sample in this post, see the GitHub repo. To learn more about collaborating with the Amazon ML Solutions Lab to build similar state-of-the-art ML applications, see Amazon Machine Learning Solutions Lab. For more information on licensing IMDb datasets, visit developer.imdb.com.


About the Authors

Divya Bhargavi is a Data Scientist and Media and Entertainment Vertical Lead at the Amazon ML Solutions Lab, where she solves high-value business problems for AWS customers using machine learning. She works on image/video understanding, knowledge graph recommendation systems, and predictive advertising use cases.

Gaurav Rele is a Data Scientist at the Amazon ML Solution Lab, where he works with AWS customers across different verticals to accelerate their use of machine learning and AWS Cloud services to solve their business challenges.

Matthew Rhodes is a Data Scientist I working in the Amazon ML Solutions Lab. He specializes in building Machine Learning pipelines that involve concepts such as Natural Language Processing and Computer Vision.

Karan Sindwani is a Data Scientist at Amazon ML Solutions Lab, where he builds and deploys deep learning models. He specializes in the area of computer vision. In his spare time, he enjoys hiking.

Soji Adeshina is an Applied Scientist at AWS where he develops graph neural network-based models for machine learning on graphs tasks with applications to fraud & abuse, knowledge graphs, recommender systems, and life sciences. In his spare time, he enjoys reading and cooking.

Vidya Sagar Ravipati is a Manager at the Amazon ML Solutions Lab, where he leverages his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption.

AWS positioned in the Leaders category in the 2022 IDC MarketScape for APEJ AI Life-Cycle Software Tools and Platforms Vendor Assessment

The recently published IDC MarketScape: Asia/Pacific (Excluding Japan) AI Life-Cycle Software Tools and Platforms 2022 Vendor Assessment positions AWS in the Leaders category. This was the first and only APEJ-specific analyst evaluation focused on AI life-cycle software from IDC. The vendors evaluated for this MarketScape offer various software tools needed to support end-to-end machine learning (ML) model development, including data preparation, model building and training, model operation, evaluation, deployment, and monitoring. The tools are typically used by data scientists and ML developers from experimentation to production deployment of AI and ML solutions.

AI life-cycle tools are essential to productize AI/ML solutions. They go quite a few steps beyond AI/ML experimentation: to achieve deployment anywhere, performance at scale, cost optimization, and increasingly important, support systematic model risk management—explainability, robustness, drift, privacy protection, and more. Businesses need these tools to unlock the value of enterprise data assets at greater scale and faster speed.

Vendor Requirements for the IDC MarketScape

To be considered for the MarketScape, the vendor had to provide software products for various aspects of the end-to-end ML process under independent product stock-keeping units (SKUs) or as part of a general AI software platform. The products had to be based on the company’s own IP, and the products should have generated software license revenue or consumption-based software revenue for at least 12 months in APEJ as of March 2022. The company had to be among the top 15 vendors by the reported revenues of 2020–2021 in the APEJ region, according to IDC’s AI Software Tracker. AWS met the criteria and was evaluated by IDC along with eight other vendors.

The result of IDC’s comprehensive evaluation was published October 2022 in the IDC MarketScape: Asia/Pacific (Excluding Japan) AI Life-Cycle Software Tools and Platforms 2022 Vendor Assessment. AWS is positioned in the Leaders category based on current capabilities. The AWS strategy is to make continuous investments in AI/ML services to help customers innovate with AI and ML.

AWS position

“AWS is placed in the Leaders category in this exercise, receiving higher ratings in various assessment categories—the breadth of tooling services provided, options to lower cost for performance, quality of customer service and support, and pace of product innovation, to name a few.”

– Jessie Danqing Cai, Associate Research Director, Big Data & Analytics Practice, IDC Asia/Pacific.

The visual below is part of the MarketScape and shows the AWS position evaluated by capabilities and strategies.

The IDC MarketScape vendor analysis model is designed to provide an overview of the competitive fitness of ICT suppliers in a given market. The research methodology utilizes a rigorous scoring methodology based on both qualitative and quantitative criteria that results in a single graphical illustration of each vendor’s position within a given market. The Capabilities score measures vendor product, go-to-market, and business execution in the short term. The Strategy score measures alignment of vendor strategies with customer requirements in a 3–5-year time frame. Vendor market share is represented by the size of the icons.

Amazon SageMaker evaluated as part of the MarketScape

As part of the evaluation, IDC dove deep into Amazon SageMaker capabilities. SageMaker is a fully managed service to build, train, and deploy ML models for any use case with fully managed infrastructure, tools, and workflows. Since the launch of SageMaker in 2017, over 250 capabilities and features have been released.

ML practitioners such as data scientists, data engineers, business analysts, and MLOps professionals use SageMaker to break down barriers across each step of the ML workflow through their choice of integrated development environments (IDEs) or no-code interfaces. Starting with data preparation, SageMaker makes it easy to access, label, and process large amounts of structured data (tabular data) and unstructured data (photo, video, geospatial, and audio) for ML. After data is prepared, SageMaker offers fully managed notebooks for model building and reduces training time from hours to minutes with optimized infrastructure. SageMaker makes it easy to deploy ML models to make predictions at the best price-performance for any use case through a broad selection of ML infrastructure and model deployment options. Finally, the MLOps tools in SageMaker help you scale model deployment, reduce inference costs, manage models more effectively in production, and reduce operational burden.

The MarketScape calls out three strengths for AWS:

  • Functionality and offering – SageMaker provides a broad and deep set of tools for data preparation, model training, and deployment, including AWS-built silicon: AWS Inferentia for inference workloads and AWS Trainium for training workloads. SageMaker supports model explainability and bias detection through Amazon SageMaker Clarify.
  • Service delivery – SageMaker is natively available on AWS, the second largest public cloud platform in the APEJ region (based on IDC Public Cloud Services Tracker, IaaS+PaaS, 2021 data), with regions in Japan, Australia, New Zealand, Singapore, India, Indonesia, South Korea, and Greater China. Local zones are available to serve customers in ASEAN countries: Thailand, the Philippines, and Vietnam.
  • Growth opportunities – AWS actively contributes to open-source projects such as Gluon and engages with regional developer and student communities through many events, online courses, and Amazon SageMaker Studio Lab, a no-cost SageMaker notebook environment.

SageMaker launches at re:Invent 2022

SageMaker innovation continued at AWS re:Invent 2022, with eight new capabilities. The launches included three new capabilities for ML model governance. As the number of models and users within an organization increases, it becomes harder to set least-privilege access controls and establish governance processes to document model information (for example, input datasets, training environment information, model-use description, and risk rating). After models are deployed, customers also need to monitor for bias and feature drift to ensure they perform as expected. A new role manager, model cards, and model dashboard simplify access control and enhance transparency to support ML model governance.

There were also three launches related to Amazon SageMaker Studio notebooks. SageMaker Studio notebooks gives practitioners a fully managed notebook experience, from data exploration to deployment. As teams grow in size and complexity, dozens of practitioners may need to collaboratively develop models using notebooks. AWS continues to offer the best notebook experience for users, with the launch of three new features that help you coordinate and automate notebook code.

To support model deployment, new capabilities in SageMaker help you run shadow tests to evaluate a new ML model before production release by testing its performance against the currently deployed model. Shadow testing can help you catch potential configuration errors and performance issues before they impact end-users.
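For teams that want to script this, a minimal boto3 sketch of starting a shadow test is shown below. The endpoint, model, variant, role, and instance names are illustrative assumptions, not values from this post.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# All names, ARNs, and instance types below are illustrative placeholders.
sagemaker.create_inference_experiment(
    Name="churn-model-shadow-test",
    Type="ShadowMode",
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    EndpointName="churn-prod-endpoint",
    ModelVariants=[
        {
            "ModelName": "churn-model-v1",  # currently deployed model
            "VariantName": "production",
            "InfrastructureConfig": {
                "InfrastructureType": "RealTimeInference",
                "RealTimeInferenceConfig": {
                    "InstanceType": "ml.m5.xlarge",
                    "InstanceCount": 1,
                },
            },
        },
        {
            "ModelName": "churn-model-v2",  # candidate model under test
            "VariantName": "shadow",
            "InfrastructureConfig": {
                "InfrastructureType": "RealTimeInference",
                "RealTimeInferenceConfig": {
                    "InstanceType": "ml.m5.xlarge",
                    "InstanceCount": 1,
                },
            },
        },
    ],
    ShadowModeConfig={
        "SourceModelVariantName": "production",
        "ShadowModelVariants": [
            {"ShadowModelVariantName": "shadow", "SamplingPercentage": 50}
        ],
    },
)
```

In this setup, production traffic continues to be served by the source variant, while a configurable sample of requests is mirrored to the shadow variant so its responses can be compared before promotion.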

Finally, SageMaker launched support for geospatial ML, allowing data scientists and ML engineers to easily build, train, and deploy ML models using geospatial data. You can access geospatial data sources, purpose-built processing operations, pre-trained ML models, and built-in visualization tools to run geospatial ML faster and at scale.

Today, tens of thousands of customers use Amazon SageMaker to train models with billions of parameters and make over 1 trillion predictions per month. To learn more about SageMaker, visit the webpage and explore how fully managed infrastructure, tools, and workflows can help you accelerate ML model development.


About the author

Kimberly Madia is a Principal Product Marketing Manager with AWS Machine Learning. Her goal is to make it easy for customers to build, train, and deploy machine learning models using Amazon SageMaker. For fun outside work, Kimberly likes to cook, read, and run on the San Francisco Bay Trail.

Read More

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

This post is co-written by Hesham Fahim from Thomson Reuters.

Thomson Reuters (TR) is one of the world’s most trusted information organizations for businesses and professionals. It provides companies with the intelligence, technology, and human expertise they need to find trusted answers, enabling them to make better decisions more quickly. TR’s customers span across the financial, risk, legal, tax, accounting, and media markets.

Thomson Reuters provides market-leading products in the Tax, Legal, and News segments, which customers sign up for under a subscription licensing model. To enhance the experience for these customers, TR wanted to create a centralized recommendations platform that allows its sales team to suggest the most relevant subscription packages, raising awareness of products that could help customers serve their markets better through tailored product selections.

Prior to building this centralized platform, TR had a legacy rules-based engine to generate renewal recommendations. The rules in this engine were predefined and written in SQL, which, aside from being difficult to manage, struggled to cope with the proliferation of data from TR’s various integrated data sources. TR customer data changes faster than the business rules can evolve to reflect changing customer needs. The key requirement for TR’s new machine learning (ML)-based personalization engine was an accurate recommendation system that takes recent customer trends into account. The desired solution would have low operational overhead, accelerate the delivery of business goals, and provide a personalization engine that could be continually retrained on up-to-date data to handle changing consumer habits and new products.

Personalizing renewal recommendations based on which products would be valuable to TR’s customers was an important business challenge for the sales and marketing team. TR has a wealth of data that could be used for personalization, collected from customer interactions and stored within a centralized data warehouse. TR has been an early adopter of ML with Amazon SageMaker, and their maturity in the AI/ML domain meant they had already collated a significant amount of relevant data in the data warehouse with which the team could train a personalization model. TR has continued their AI/ML innovation and recently developed a revamped recommendation platform using Amazon Personalize, a fully managed ML service that uses user interactions and item data to generate recommendations for users. In this post, we explain how TR used Amazon Personalize to build a scalable, multi-tenanted recommender system that provides the best product subscription plans and associated pricing to their customers.

Solution architecture

The solution had to be designed considering TR’s core operations around understanding users through data; providing these users with personalized and relevant content from a large corpus of data was a mission-critical requirement. Having a well-designed recommendation system is key to getting quality recommendations that are customized to each user’s requirements.

The solution required collecting and preparing user behavior data, training an ML model using Amazon Personalize, generating personalized recommendations through the trained model, and driving marketing campaigns with the personalized recommendations.

TR wanted to take advantage of AWS managed services where possible to simplify operations and reduce undifferentiated heavy lifting. TR used AWS Glue DataBrew and AWS Batch jobs to perform the extract, transform, and load (ETL) jobs in the ML pipelines, and SageMaker along with Amazon Personalize to tailor the recommendations. From a training data volume and runtime perspective, the solution needed to be scalable to process millions of records within the time frame already committed to downstream consumers in TR’s business teams.

The following sections explain the components involved in the solution.

ML training pipeline

Interactions between users and content are collected in the form of clickstream data, which is generated as customers click on content. TR analyzes whether the content is within or beyond the customer’s subscription plan so that it can provide additional details about pricing and plan enrollment options. The user interaction data from various sources is persisted in their data warehouse.

The following diagram illustrates the ML training pipeline.
ML engine training pipeline
The pipeline starts with an AWS Batch job that extracts data from the data warehouse and transforms it to create the interactions, users, and items datasets.
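As an illustration of that transform step, a minimal pandas sketch that reshapes raw clickstream exports into the Amazon Personalize interactions schema (USER_ID, ITEM_ID, TIMESTAMP) might look like the following; the raw column names and S3 paths are assumptions.

```python
import pandas as pd

# Reshape raw clickstream exports into the minimal Amazon Personalize
# interactions schema (USER_ID, ITEM_ID, TIMESTAMP). The raw column names
# and S3 paths are assumptions; reading/writing S3 paths requires s3fs.
raw = pd.read_parquet("s3://tr-recs-raw/clickstream/")
interactions = pd.DataFrame(
    {
        "USER_ID": raw["customer_id"].astype(str),
        "ITEM_ID": raw["content_id"].astype(str),
        # Personalize expects Unix epoch seconds
        "TIMESTAMP": pd.to_datetime(raw["event_time"]).astype("int64") // 10**9,
        "EVENT_TYPE": "content-click",
    }
)
interactions.to_csv(
    "s3://tr-recs-curated/interactions/interactions.csv", index=False
)
```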

The following datasets are used to train the model:

  • Structured product data – Subscriptions, orders, product catalog, transactions, and customer details
  • Semi-structured behavior data – Users, usage, and interactions

This transformed data is stored in an Amazon Simple Storage Service (Amazon S3) bucket and imported into Amazon Personalize for ML training. Because TR wants to generate personalized recommendations for their users, they use the USER_PERSONALIZATION recipe to train ML models on their custom data, which is referred to as creating a solution version. After the solution version is created, it’s used to generate personalized recommendations for the users.
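In code, the import and training steps map onto a few Amazon Personalize API calls. The sketch below uses boto3 with placeholder ARNs, bucket names, and roles; all account-specific values are assumptions.

```python
import boto3

personalize = boto3.client("personalize")

# Import the curated interactions data from S3 into the dataset group
# (dataset ARN, bucket, and role are placeholders for illustration).
personalize.create_dataset_import_job(
    jobName="tr-interactions-import",
    datasetArn="arn:aws:personalize:us-east-1:123456789012:dataset/tr-recs/INTERACTIONS",
    dataSource={"dataLocation": "s3://tr-recs-curated/interactions/"},
    roleArn="arn:aws:iam::123456789012:role/PersonalizeS3AccessRole",
)

# Train with the User-Personalization recipe; each training run produces a
# new solution version that is later used for batch or real-time inference.
solution = personalize.create_solution(
    name="tr-subscription-recs",
    datasetGroupArn="arn:aws:personalize:us-east-1:123456789012:dataset-group/tr-recs",
    recipeArn="arn:aws:personalize:::recipe/aws-user-personalization",
)
solution_version = personalize.create_solution_version(
    solutionArn=solution["solutionArn"]
)
print(solution_version["solutionVersionArn"])
```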

The entire workflow is orchestrated using AWS Step Functions. The alerts and notifications are captured and published to Microsoft Teams using Amazon Simple Notification Service (Amazon SNS) and Amazon EventBridge.

Generating personalized recommendations pipeline: Batch inference

Customer requirements and preferences change very often, and the latest interactions captured in clickstream data serve as a key signal for understanding a customer’s changing preferences. To adapt to ever-changing customer preferences, TR generates personalized recommendations on a daily basis.

The following diagram illustrates the pipeline to generate personalized recommendations.
Pipeline to generate personalized recommendations in Batch
A DataBrew job extracts data from the TR data warehouse for the users who are eligible for recommendations at renewal, based on their current subscription plan and recent activity. The DataBrew visual data preparation tool makes it easy for TR data analysts and data scientists to clean and normalize data to prepare it for analytics and ML. The ability to choose from over 250 pre-built transformations within the visual data preparation tool to automate data preparation tasks, all without writing any code, was an important feature. The DataBrew job generates an incremental interactions dataset and the input for the batch recommendations job, and stores the output in an S3 bucket. The newly generated incremental dataset is imported into the interactions dataset. When the incremental dataset import job succeeds, an Amazon Personalize batch recommendations job is triggered with the input data, as sketched below. Amazon Personalize generates the latest recommendations for the users provided in the input data and stores them in a recommendations S3 bucket.
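A hedged sketch of triggering that batch recommendations job with boto3 follows; the solution version ARN, role, and S3 paths are placeholders rather than values from this post.

```python
import boto3

personalize = boto3.client("personalize")

# Triggered after the incremental dataset import job succeeds; ARNs and
# S3 paths below are placeholders.
personalize.create_batch_inference_job(
    jobName="tr-daily-batch-recs",
    solutionVersionArn=(
        "arn:aws:personalize:us-east-1:123456789012:"
        "solution/tr-subscription-recs/latest-version-id"
    ),
    roleArn="arn:aws:iam::123456789012:role/PersonalizeS3AccessRole",
    numResults=25,  # recommendations generated per user
    jobInput={
        # JSON Lines file of {"userId": "..."} records produced upstream
        "s3DataSource": {"path": "s3://tr-recs-curated/batch-input/users.json"}
    },
    jobOutput={
        "s3DataDestination": {"path": "s3://tr-recs-output/batch-recommendations/"}
    },
)
```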

Price optimization is the last step before the newly generated recommendations are ready to use. TR runs a price optimization job on the generated recommendations and uses SageMaker to run custom models on them as part of this final step. An AWS Glue job curates the output generated from Amazon Personalize and transforms it into the input format required by the SageMaker custom model. TR is able to take advantage of the breadth of services that AWS provides, using both Amazon Personalize and SageMaker in the recommendation platform to tailor recommendations based on the type of customer firm and end users.

The entire workflow is decoupled and orchestrated using Step Functions, which gives the flexibility of scaling the pipeline depending on the data processing requirements. The alerts and notifications are captured using Amazon SNS and EventBridge.

Driving email campaigns

The recommendations generated along with the pricing results are used to drive email campaigns to TR’s customers. An AWS Batch job is used to curate the recommendations for each customer and enrich it with the optimized pricing information. These recommendations are ingested into TR’s campaigning systems, which drive the following email campaigns:

  • Automated subscription renewal or upgrade campaigns with new products that might interest the customer
  • Mid-contract renewal campaigns with better offers and more relevant products and legal content materials

The information from this process is also replicated to the customer portal so customers reviewing their current subscription can see the new renewal recommendations. TR has seen a higher conversion rate from email campaigns, leading to increased sales orders, since implementing the new recommendation platform.

What’s next: Real-time recommendations pipeline

Customer requirements and shopping behaviors change in real time, and adapting recommendations to those changes is key to serving the right content. After the success of the batch recommendation system, TR is now planning to take the solution to the next level by implementing a real-time recommendations pipeline using Amazon Personalize.

The following diagram illustrates the architecture to provide real-time recommendations.
Real-time recommendations pipeline
The real-time integration starts with collecting live user engagement data and streaming it to Amazon Personalize. As users interact with TR’s applications, they generate clickstream events, which are published into Amazon Kinesis Data Streams. The events are then ingested into TR’s centralized streaming platform, which is built on top of Amazon Managed Streaming for Apache Kafka (Amazon MSK). Amazon MSK makes it easy to ingest and process streaming data in real time with fully managed Apache Kafka. In this architecture, Amazon MSK serves as the streaming platform and performs any data transformations required on the raw incoming clickstream events. An AWS Lambda function is then triggered to filter the events into a schema compatible with the Amazon Personalize dataset and push them to an Amazon Personalize event tracker using the PutEvents API. This allows Amazon Personalize to learn from users’ most recent behavior and include relevant items in recommendations.
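A minimal sketch of such a Lambda handler is shown below, assuming an MSK event source mapping and illustrative field names in the clickstream records; the tracking ID would come from the Amazon Personalize event tracker.

```python
import base64
import json
import os
from datetime import datetime, timezone

import boto3

personalize_events = boto3.client("personalize-events")
TRACKING_ID = os.environ["PERSONALIZE_TRACKING_ID"]  # from the event tracker


def handler(event, context):
    # MSK event source mappings deliver base64-encoded records grouped by
    # topic-partition; the field names inside each record are assumptions.
    for records in event.get("records", {}).values():
        for record in records:
            click = json.loads(base64.b64decode(record["value"]))
            personalize_events.put_events(
                trackingId=TRACKING_ID,
                userId=str(click["userId"]),
                sessionId=str(click["sessionId"]),
                eventList=[
                    {
                        "eventType": "content-click",
                        "itemId": str(click["itemId"]),
                        # Kafka record timestamps are in milliseconds
                        "sentAt": datetime.fromtimestamp(
                            record["timestamp"] / 1000, tz=timezone.utc
                        ),
                    }
                ],
            )
```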

TR’s web applications invoke an API deployed in Amazon API Gateway to get recommendations, which triggers a Lambda function that calls the Amazon Personalize GetRecommendations API. Amazon Personalize returns the latest set of personalized recommendations tailored to the user’s behavior, which are passed back to the web applications via Lambda and API Gateway.
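A simplified sketch of the Lambda function behind that API follows, assuming a deployed Amazon Personalize campaign and an illustrative query-string contract.

```python
import json
import os

import boto3

personalize_runtime = boto3.client("personalize-runtime")
CAMPAIGN_ARN = os.environ["PERSONALIZE_CAMPAIGN_ARN"]  # deployed campaign


def handler(event, context):
    # API Gateway proxy integration passes query parameters in the event.
    user_id = event["queryStringParameters"]["userId"]
    response = personalize_runtime.get_recommendations(
        campaignArn=CAMPAIGN_ARN,
        userId=user_id,
        numResults=10,
    )
    items = [
        {"itemId": item["itemId"], "score": item.get("score")}
        for item in response["itemList"]
    ]
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"userId": user_id, "recommendations": items}),
    }
```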

With this real-time architecture, TR can serve their customers with personalized recommendations curated to their most recent behavior and serve their needs better.

Conclusion

In this post, we showed you how TR used Amazon Personalize and other AWS services to implement a recommendation engine. Amazon Personalize enabled TR to accelerate the development and deployment of high-performance models to provide recommendations to their customers. TR is able to onboard a new suite of products within weeks now, compared to months earlier. With Amazon Personalize and SageMaker, TR is able to elevate the customer experience with better content subscription plans and prices for their customers.

If you enjoyed reading this blog and would like to learn more about Amazon Personalize and how it can help your organization build recommendation systems, please see the developer guide.


About the Authors

Hesham Fahim is a Lead Machine Learning Engineer and Personalization Engine Architect at Thomson Reuters. He has worked with organizations in academia and industry, ranging from large enterprises to mid-sized startups. With a focus on scalable deep learning architectures, he has experience in mobile robotics, biomedical image analysis, and recommender systems. Away from computers he enjoys astrophotography, reading, and long-distance biking.

Srinivasa Shaik is a Solutions Architect at AWS based in Boston. He helps enterprise customers accelerate their journey to the cloud. He is passionate about containers and machine learning technologies. In his spare time, he enjoys spending time with his family, cooking, and traveling.

Vamshi Krishna Enabothala is a Sr. Applied AI Specialist Architect at AWS. He works with customers from different sectors to accelerate high-impact data, analytics, and machine learning initiatives. He is passionate about recommendation systems, NLP, and computer vision areas in AI and ML. Outside of work, Vamshi is an RC enthusiast, building RC equipment (planes, cars, and drones), and also enjoys gardening.

Simone Zucchet is a Senior Solutions Architect at AWS. With over 6 years of experience as a Cloud Architect, Simone enjoys working on innovative projects that help transform the way organizations approach business problems. He helps support large enterprise customers at AWS and is part of the Machine Learning TFC. Outside of his professional life, he enjoys working on cars and photography.

Read More

Amazon’s papers at SLT

Quantization with self-adjustable centroids, contrastive predictive coding for transfer learning, teacher ensembles for differential privacy, and more — Amazon’s speech research features a battery of cutting-edge machine learning techniques.

Read More

Tipping Point: NVIDIA DRIVE Scales AI-Powered Transportation at CES 2023

Tipping Point: NVIDIA DRIVE Scales AI-Powered Transportation at CES 2023

Autonomous vehicle (AV) technology is heading to the mainstream.

The NVIDIA DRIVE ecosystem showcased significant milestones toward widespread intelligent transportation at CES. Growth is occurring in vehicle deployment plans as well as AI solutions integrating further into the car.

Foxconn joined the NVIDIA DRIVE ecosystem. The world’s largest technology manufacturer will produce electronic control units based on the NVIDIA DRIVE Orin systems-on-a-chip and build its electric vehicles using the NVIDIA DRIVE Hyperion platform.

The Polestar 3, which is powered by NVIDIA DRIVE Orin, made its U.S. debut, showcasing its new driver-monitoring system. Working with intelligent sensing company Smart Eye, the automaker is using AI to improve in-cabin safety and convenience.

Also appearing stateside for the first time was the Volvo EX90 fully electric SUV. Volvo Cars’ new flagship vehicle features centralized, software-defined compute powered by DRIVE Orin and NVIDIA DRIVE Xavier, and will begin deliveries in early 2024.

The Volvo EX90 was on display at the booth of automotive technology company Luminar (CES LVCC West Hall, booth 5324). Luminar, an NVIDIA DRIVE ecosystem member, is providing its lidar technology to enable next-generation safety and, in the future, highway autonomy.

The Volvo EX90.

Elsewhere on the show floor, NVIDIA DRIVE ecosystem members such as Aeva with Plus, Imagry, Infineon with Lucid, u-blox and Valeo showcased the latest innovations in intelligent transportation.

These announcements mark a shift in the autonomous vehicle industry, from early stages to global deployment.

Foxconn Enters the AV Arena

Building safe, intelligent vehicles with highly automated and fully autonomous driving capabilities is a massive endeavor.

NVIDIA DRIVE offers an open, AI-enabled AV development platform for the industry to build upon. By adding Foxconn as a tier-one platform scaling partner, NVIDIA can greatly extend its efforts to meet growing demand.

In addition, Foxconn’s selection of DRIVE Hyperion will speed time to market for its state-of-the-art EVs with autonomous driving capabilities and support its time-to-cost strategy.

The DRIVE Hyperion sensor suite is already qualified to ensure diverse, redundant real-time processing, which increases overall safety.

Inside AI

The industry is placing greater focus on interior safety and convenience features as AI takes over more driving tasks.

Driver monitoring is a key part of Polestar’s broader driver-understanding system, which includes features such as adaptive cruise control, lane-keep assist and pilot assist as standard. These coordinated systems run simultaneously on the centralized DRIVE Orin AI compute platform.

The recently launched Polestar 3, featuring Smart Eye driver monitoring software.

The Polestar 3, launched in October, features two closed-loop driver-monitoring cameras and software from Smart Eye (booth 6353). The system tracks the driver’s head, eye, and eyelid movements, and can trigger warning messages, sounds, or an emergency-stop function if a distracted, drowsy, or disconnected driver is detected.

End-to-End Innovation

The rest of the CES show floor was brimming with new vehicle technologies poised to deliver more convenient and safer transportation.

Lucid showcased its flagship sedan, the Air, in partner Infineon’s booth (3829), breaking down the technologies that make up the award-winning EV. At its core is the NVIDIA DRIVE centralized compute platform, which powers its software-defined DreamDrive advanced driver assistance system.

The award-winning Lucid Air electric sedan.

In addition to personal transportation, NVIDIA DRIVE is powering safer, more efficient public transit, as well as delivery and logistics.

Israeli startup Imagry (booth 5874), a developer of mapless autonomous driving solutions, announced that its DRIVE Orin-based platform will power two autonomous bus pilots in its home country in 2023. Lidar maker Aeva showcased the latest vehicle from autonomous trucking company Plus, built on DRIVE Orin.

AV sensing and localization technology also showed significant advances. Global tier-one supplier Valeo (booth CP-17) demonstrated how it’s using the high-fidelity NVIDIA DRIVE Sim platform to develop intelligent active lighting solutions for low-light conditions. U-blox (booth 10963), which specializes in global navigation satellite system (GNSS) solutions, showed the latest in AV localization, integrated into the NVIDIA DRIVE Hyperion architecture.

With every corner of the AV industry firing on all cylinders, CES 2023 is signaling the start to the widespread deployment of intelligent transportation.

Read More

GFN Thursday Brings RTX 4080 to the Cloud With GeForce NOW Ultimate Membership

GFN Thursday Brings RTX 4080 to the Cloud With GeForce NOW Ultimate Membership

GFN Thursday rings in the new year with a recap of the biggest cloud gaming news from CES 2023: the GeForce NOW Ultimate membership. Powered by the latest NVIDIA GPU technology, the new tier lets Ultimate members play their favorite PC games at performance levels never before available from the cloud.

Plus, with a new year comes new games. GeForce NOW brings 24 more titles to the cloud in January, starting with five this week.

Ultimate Performance, Now in the Cloud

Get ready for cloud gaming that’s “beyond fast.” GeForce NOW is bringing RTX 4080 performance to the cloud with the new high-performance Ultimate membership.

Supercomputer power streamed to you, fueling your every Victory Royale.

The GeForce NOW Ultimate membership raises the bar on cloud gaming, bringing it closer than ever to a local gaming experience. It’s powered by the NVIDIA Ada Lovelace architecture in upgraded GeForce NOW RTX 4080 SuperPODs.

GeForce NOW Ultimate members receive three major streaming upgrades. First, the new SuperPODs can render and stream at up to 240 frames per second for the lowest latency ever from the cloud. Paired with NVIDIA Reflex, members’ gameplay will feel almost indistinguishable from a local desktop PC.

Second, supported streaming resolutions get an upgrade: Ultimate members can play their favorite PC games at up to 4K 120 fps on the native PC and Mac apps.

And third, for the first time, cloud gamers can play at native ultrawide resolutions, a long-requested feature from the GeForce NOW community. Experience your favorite adventures like A Plague Tale: Requiem, The Witcher 3: Wild Hunt, Shadow of the Tomb Raider and more at up to 3840×1600 resolutions for a truly immersive experience.

The Ultimate upgrade also brings support for the latest NVIDIA RTX technologies, like full ray tracing and DLSS 3 — introduced with the GeForce RTX 40 Series launch. They deliver beautiful, cinematic-quality graphics and use AI to keep frame rates smooth in supported games.

With support for NVIDIA G-SYNC-enabled monitors, GeForce NOW will vary the streaming rate to the client for the first time, delivering smooth and instantaneous frame updates to client screens on Reflex-enabled games — further driving down total latency.

Ultimate members will also continue to enjoy longer streaming sessions, the fastest access to the highest-performance cloud gaming servers, and game settings that persist from session to session.

The Ultimate Library Keeps Growing

There are more than 1,500 games supported in the GeForce NOW library, with more than 400 titles joining last year. Members can stream mega-hits from top publishers like Electronic Arts and Ubisoft, popular PC indie titles like Valheim and Rust, and over 100 of the biggest free-to-play games like Fortnite and Genshin Impact.

The new year also brings some of the biggest upcoming PC game launches on the service, starting with Portal with RTX later this week. Relive the critically acclaimed, award-winning Portal reimagined with full ray tracing, plus higher frame rates from DLSS 3 for those streaming from an RTX 4080 SuperPOD in the cloud.

Full ray tracing transforms each scene of Portal with RTX, enabling light to bounce and be affected by each area’s new high-resolution, physically based textures and enhanced high-poly models. Every light is ray traced and casts shadows for a new sense of realism. Global illumination indirect lighting naturally illuminates and darkens rooms, volumetric ray-traced lighting scatters through fog and smoke, and shadows are pixel perfect.

This ray-traced reimagining of Valve’s classic game was built using a revolutionary modding tool called NVIDIA RTX Remix, which brings the test chambers of Portal’s “Aperture Science” to new life.

More big titles are on the way. As announced at CES 2023 this week, members can expect to see Atomic Heart, The Day Before and Party Animals join GeForce NOW when they release later this year. Stay tuned to future GFN Thursday updates for more details.

Upgrading Is Beyond Easy

Members can sign up today for the GeForce NOW Ultimate membership at $19.99 per month or $99.99 for six months.

Existing GeForce RTX 3080 members’ accounts have already been converted to Ultimate memberships at their current pricing, and will experience GeForce RTX 4080 performance as soon as it’s available in their regions. It’s the easiest upgrade to Ultimate performance, happening automatically.

The new GeForce RTX 4080-powered SuperPODs will be available in North America and Europe starting later this month, with continued rollout over the months to follow. Sign up today, as quantities are limited.

Upgrade today for the Ultimate cloud gaming experience.

 

Additionally, new AT&T Fiber customers, and new or existing AT&T 5G customers on an eligible 5G rate plan, can get a complimentary six-month Ultimate membership. Visit AT&T Gaming for more details.

Joining in January

Level up and learn new awesome abilities, unlock secret items and modes, summon powerful allies, and more in Scott Pilgrim vs. The World: The Game Complete Edition.

Here’s a look at the games joining the GeForce NOW library in January:

In addition, members can look for the following this week:

  • Scott Pilgrim vs. The World: The Game Complete Edition (New release on Steam, Jan. 5)
  • Carrier Command 2 (Steam)
  • Project Hospital (Steam)
  • Portal with RTX (Steam)
  • Severed Steel (Epic Games Store)

Let us know what you think of GeForce NOW Ultimate on Twitter or in the comments below.

Read More

Lights! Cameras! Atoms! Scientist Peers Into the Quantum Future

Lights! Cameras! Atoms! Scientist Peers Into the Quantum Future

Editor’s note: This is part of a series profiling people advancing science with high performance computing.

Ryan Coffee makes movies of molecules. Their impacts are huge.

The senior scientist at the SLAC National Accelerator Laboratory says these visualizations could unlock the secrets of photosynthesis. They’ve already shown how sunlight can cause skin cancer.

Long term, they may help chemists engineer life-saving drugs and batteries that let electric cars go farther on a charge.

To make films that inspire that kind of work, Coffee’s team needs high-performance computers, AI and an excellent projector.

A Brighter Light

The projector is called the Linac Coherent Light Source (LCLS). It uses a linear accelerator a kilometer long to pulse X-rays up to 120 times per second.

That’s good enough for a Hollywood flick, but not fast enough for Coffee’s movies.

“We need to see how electron clouds move like soap bubbles around molecules, how you can squeeze them in certain ways and energy comes out,” said Coffee, a specialist in the physics at the intersection of atoms, molecules and optics.

So, an upgrade next year will let the giant instrument take 100,000 frames per second. In two years, another enhancement, called LCLS II, will push that to a million frames a second.

Sorting the frames that flash by that fast — in random order — is a job for the combination of high performance computing (HPC) and AI.

AIs in the Audience

Coffee’s goal is to sit an AI model in front of the LCLS II. It will watch the ultrafast movies to learn an atomic dance no human eyes could follow.

The work will require inference on the fastest GPUs available running next to the instrument in Menlo Park, Calif. Meanwhile, data streaming off LCLS II will be used to constantly retrain the model on a bank of NVIDIA A100 Tensor Core GPUs at the Argonne National Laboratory outside Chicago.

It’s a textbook case for HPC at the edge, and one that’s increasingly common in an era of giant scientific instruments that peer up at stars and down into atoms.

A look at part of the LCLS instrument. (For more details, see this blog.)

So far, Coffee’s team has been able to retrain an autoencoder model every 10-20 minutes while it makes inferences 100,000 times a second.
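For readers who want a feel for that loop, here is a minimal PyTorch sketch of a periodically retrained autoencoder; the architecture, dimensions, and data are illustrative assumptions, not SLAC’s actual model.

```python
import torch
import torch.nn as nn


class FrameAutoencoder(nn.Module):
    """Toy dense autoencoder for flattened detector frames."""

    def __init__(self, n_features=1024, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, n_features),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


def retrain(model, batches, epochs=3, lr=1e-3):
    """One periodic retraining pass over a buffer of freshly streamed frames."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for batch in batches:  # batches: iterable of (N, n_features) tensors
            batch = batch.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(batch), batch)  # reconstruction error
            loss.backward()
            optimizer.step()
    return model.eval()  # hand back for low-latency inference


# Example: retrain on random batches standing in for streamed frames.
model = FrameAutoencoder()
fake_stream = [torch.randn(64, 1024) for _ in range(10)]
model = retrain(model, fake_stream)
```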

“We’re already in the realm of attosecond pulses where I can watch the electron bubbles slosh back and forth,” said Coffee, a core member of SLAC’s overall AI initiative.

A Broader AI Collaboration

The next step is even bigger.

Data from Coffee’s work on molecular movies will be securely shared with data from Argonne’s Advanced Photon Source, a kind of ultra-high-resolution still camera.

“We can use secure, federated machine learning to pull these two datasets together, creating a powerful, shared transformer model,” said Coffee, who’s collaborating with multiple organizations to make it happen.

Coffee in the ‘projection room’ where the light in his next molecular movies will first appear.

The transformer will let scientists generate synthetic data for many data-starved applications such as research on fusion reactors.

It’s an effort specific to science that parallels work in federated learning in healthcare. Both want to build powerful AI models for their fields while preserving data privacy and security.

“We know people get the best results from large language models trained on many languages,” he said. “So, we want to do that in science by taking diverse views of the same things to create better models.”

The Quantum Future

The atomic forces that Coffee studies may power tomorrow’s computers, the scientist explains.

“Imagine a stack of electron bubbles all in the same quantum state, so it’s a superconductor,” he said. “When I add one electron at the bottom, one pops to the top instantaneously because there’s no resistance.”

The concept, called entanglement in quantum computing, means two particles can switch states in lock step even if they’re on opposite sides of the planet.

That would give researchers like Coffee instant connections between powerful instruments like LCLS II and remote HPC centers training powerful AI models in real time.

Sounds like science fiction? Maybe not.

Coffee foresees a time when his experiments will outrun today’s computers, a time that will require alternative architectures and AIs. It’s the kind of big-picture thinking that excites him.

“I love the counterintuitiveness of quantum mechanics, especially when it has real, measurable results humans can apply — that’s the fun stuff.”

Read More