Building predictive disease models using Amazon SageMaker with Amazon HealthLake normalized data

In this post, we walk you through the steps to build machine learning (ML) models in Amazon SageMaker with data stored in Amazon HealthLake, using two example predictive disease models trained on the MIMIC-III dataset. This dataset was developed by the MIT Lab for Computational Physiology and consists of de-identified healthcare data associated with approximately 60,000 ICU admissions. It includes multiple attributes about the patients, such as their demographics, vital signs, and medications, along with their clinical notes. We first developed the models using structured data such as demographics, vital signs, and medications. We then augmented these models with additional data extracted and normalized from clinical notes to test and compare their performance. In both experiments, we found an improvement in model performance, whether the problem was modeled as supervised learning (classification) or unsupervised learning (clustering). We present our findings and the setup of the experiments in this post.

Why multiple modalities?

Modality can be defined as the classification of a single independent sensory input/output channel between a computer and a human. For example, we see objects and hear sounds through two separate senses; these can be considered two separate modalities. Datasets that represent multiple modalities are categorized as multi-modal datasets. For instance, images can carry tags that help search and organize them, and textual documents can include images to illustrate what they describe. When medical practitioners make clinical decisions, they usually base them on information gathered from a variety of healthcare data modalities. A physician looks at a patient's observations, their past history, their scans, and even the patient's physical characteristics during the visit to make a definitive diagnosis. ML models need to take this into account when trying to achieve real-world performance. The post Building a medical image search platform on AWS shows how you can combine features from medical images and their corresponding radiology reports to create a medical image search platform. The challenge with creating such models is the preprocessing of these multi-modal datasets and extracting appropriate features from them.

Amazon HealthLake makes it easier to train models on multi-modal data

Amazon HealthLake is a HIPAA eligible service that enables healthcare providers, health insurance companies, and pharmaceutical companies to store, transform, query, and analyze health data on the AWS Cloud at petabyte scale. As part of the transformation, Amazon HealthLake tags and indexes unstructured data using specialized ML models. These tags and indexes can be used to query and search as well as understand relationships in the data for analytics.

When you export data from Amazon HealthLake, it adds a resource called DocumentReference to the output. This resource consists of clinical entities (like medications, medical conditions, anatomy, and protected health information (PHI)), the RxNorm codes for medications, and the ICD-10 codes for medical conditions that are automatically derived from the unstructured notes about the patients. These are additional attributes about the patients that are embedded within the unstructured portions of their clinical records and would otherwise have been largely ignored for downstream analysis. Combining the structured data from the EHR with these attributes provides a more holistic picture of the patient and their conditions. To help determine the value of these attributes, we created a couple of experiments around clinical outcome prediction.

Architecture overview

The following diagram illustrates the architecture for our experiments.

You can export the normalized data to an Amazon Simple Storage Service (Amazon S3) bucket using the Export API. We then use AWS Glue to crawl and build a catalog of the data. This catalog is shared by Amazon Athena to run queries directly against the data exported from Amazon HealthLake. Athena also normalizes the JSON format files to rows and columns for easy querying. The DocumentReference resource JSON file is processed separately to extract the indexed data derived from the unstructured portions of the patient records. The file contains an extension tag that holds a hierarchical JSON output of patient attributes. There are multiple ways to process this file (like using Python-based JSON parsers or even string-based regex and pattern matching). For an example implementation, see the section Connecting Athena with HealthLake in the post Population health applications with Amazon HealthLake – Part 1: Analytics and monitoring using Amazon QuickSight.
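For example, the indexed codes can be pulled out of the exported DocumentReference file with a few lines of Python. The following is a rough sketch that walks the nested JSON of each resource and collects any coding whose system references ICD-10 or RxNorm; the file name and the exact extension layout are assumptions and may differ in your export.

import json

def collect_codes(node, icd10, rxnorm):
    # Recursively walk the resource and gather ICD-10 and RxNorm codings
    if isinstance(node, dict):
        system = str(node.get("system", "")).lower()
        if "icd-10" in system and "code" in node:
            icd10.append((node["code"], node.get("display", "")))
        elif "rxnorm" in system and "code" in node:
            rxnorm.append((node["code"], node.get("display", "")))
        for value in node.values():
            collect_codes(value, icd10, rxnorm)
    elif isinstance(node, list):
        for item in node:
            collect_codes(item, icd10, rxnorm)

icd10_codes, rxnorm_codes = [], []
with open("DocumentReference.ndjson") as f:  # exported newline-delimited JSON; name illustrative
    for line in f:
        collect_codes(json.loads(line), icd10_codes, rxnorm_codes)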

Example setup

Accessing the MIMIC-III dataset requires you to request access. As part of this post, we don’t distribute any data but instead provide the setup steps so you can replicate these experiments when you have access to MIMIC-III. We also publish our conclusions and findings from the results.

For the first experiment, we build a binary disease classification model to predict patients with congestive heart failure (CHF). We measure its performance using accuracy, ROC curves, and confusion matrices for both structured and unstructured patient records. For the second experiment, we cluster a cohort of patients into a fixed number of groups and visualize the cluster separation before and after the addition of the unstructured patient records. For both experiments, we build a baseline model and compare it with the multi-modal model, where we combine the existing structured data with additional features (ICD-10 codes and RxNorm codes) in our training set.

These experiments are not intended to produce a state-of-the-art model on real-world datasets; their purpose is to demonstrate how you can use features exported from Amazon HealthLake to train models on structured and unstructured patient records and improve your overall model performance.

Features and data normalization

We took a variety of features related to patient encounters to train our models. This included the patient demographics (gender, marital status), the clinical conditions, procedures, medications, and observations. Because each patient could have multiple encounters consisting of multiple observations, clinical conditions, procedures, and medications, we normalized the data and converted each of these features into a list. This allowed us to get a training set with all these features (as a list) for each patient.

Similarly, for the unstructured features that Amazon HealthLake converted into the DocumentReference resource, we extracted the ICD-10 codes and RxNorm codes (using the methods described in the architecture) and converted them into feature vectors.
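A rough sketch of this normalization with pandas might look like the following, assuming a hypothetical flattened encounter table with one row per patient encounter (the column names are stand-ins, not the actual export schema).

import pandas as pd

# encounters: a DataFrame with one row per (patient, encounter) record; column names are hypothetical
patient_features = (
    encounters.groupby("patient_id")
    .agg(
        gender=("gender", "first"),
        marital_status=("marital_status", "first"),
        conditions=("condition_code", lambda s: list(s.dropna().unique())),
        medications=("medication_code", lambda s: list(s.dropna().unique())),
        observations=("observation_code", lambda s: list(s.dropna().unique())),
    )
    .reset_index()
)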

Feature engineering and model

For the categorical attributes in our dataset, we used a label encoder to convert the attributes into a numerical representation. For all other list attributes, we used term frequency-inverse document frequency (TF-IDF) vectors. This high-dimensional dataset was then shuffled and divided into 80% train and 20% test sets for training and evaluation of the models, respectively. For training our model, we used the gradient boosting library XGBoost. We mostly kept the default hyperparameters and didn't perform any hyperparameter tuning, because our objective was only to train a baseline model with structured patient records and then show improvement on those results with the unstructured features. Adopting better hyperparameters or other feature engineering and modeling approaches can likely improve these results.
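The following is a minimal sketch of this feature engineering and training flow, assuming the patient_features DataFrame from the earlier sketch plus a hypothetical binary chf_label column; it is not the exact notebook code.

import numpy as np
import scipy.sparse as sp
import xgboost as xgb
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

def tfidf(list_column):
    # Treat each patient's list of codes as a pre-tokenized document
    return TfidfVectorizer(analyzer=lambda tokens: tokens).fit_transform(list_column)

gender = LabelEncoder().fit_transform(patient_features["gender"].astype(str))
marital = LabelEncoder().fit_transform(patient_features["marital_status"].astype(str))

X = sp.hstack([
    tfidf(patient_features["conditions"]),
    tfidf(patient_features["medications"]),
    tfidf(patient_features["observations"]),
    sp.csr_matrix(np.column_stack([gender, marital]).astype(float)),
]).tocsr()
y = patient_features["chf_label"].values  # hypothetical binary label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=42)
model = xgb.XGBClassifier()  # mostly default hyperparameters, as in the post
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))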

Example 1: Predicting patients with congestive heart failure

For the first experiment, we took 500 patients with a positive CHF diagnosis. For the negative class, we randomly selected 500 patients who didn't have a CHF diagnosis. From the positive class, we removed the clinical conditions that were directly related to CHF. For example, all the patients in the positive class were expected to have ICD-9 code 428, which stands for CHF. We filtered that out of the positive class to make sure the model doesn't simply memorize the diagnosis code for the clinical condition.
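A rough sketch of this cohort selection, with hypothetical table and column names standing in for the MIMIC-III tables:

# diagnoses: one row per (patient, ICD-9 code); patients: one row per patient (hypothetical)
chf_ids = diagnoses.loc[diagnoses["icd9_code"].astype(str).str.startswith("428"), "patient_id"].unique()

positives = patients[patients["patient_id"].isin(chf_ids)].sample(500, random_state=0)
negatives = patients[~patients["patient_id"].isin(chf_ids)].sample(500, random_state=0)

# Drop CHF-related codes from the positive class so the label isn't leaked into the features
diagnoses = diagnoses[~diagnoses["icd9_code"].astype(str).str.startswith("428")]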

Baseline model

Our baseline model had an accuracy of 85.8%. The following graph shows the ROC curve.

The following graph shows the confusion matrix.

Amazon HealthLake augmented model

Our Amazon HealthLake augmented model had an accuracy of 89.1%. The following graph shows the ROC curve.

The following graph shows the confusion matrix.

Adding the features extracted from Amazon HealthLake allowed us to improve the model accuracy from 85% to 89% and the AUC from 0.86 to 0.89. Comparing the confusion matrices for the two models, the false positives decreased from 20 to 13 and the false negatives decreased from 27 to 20.

Optimizing healthcare is about ensuring the patient is associated with their peers and the right cohort. As patient data is added or changes, it's important to continuously identify and reduce false negatives and false positives for overall improvement in the quality of care.

To better explain the performance improvements, we picked a patient from the false negative cohort in the first model who moved to true positive in the second model. We plotted a word cloud for the top medical conditions for this patient for the first and the second model, as shown in the following images.

There is a clear difference between the medical conditions of the patient before and after the addition of features from Amazon HealthLake. The word cloud for model 2 is richer, with more medical conditions indicative of CHF than the one for model 1. The data embedded within the unstructured notes for this patient, extracted by Amazon HealthLake, helped move this patient from the false negative category to a true positive.
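The word clouds themselves can be produced with the open-source wordcloud package. A minimal sketch, assuming a dictionary of condition frequencies for the selected patient:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

# condition_counts: {"pleural effusion": 4, "atrial fibrillation": 3, ...} (hypothetical)
cloud = WordCloud(width=600, height=400, background_color="white").generate_from_frequencies(condition_counts)
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()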

These numbers are based on experimental data from a small subset of MIMIC-III patients. In a real-world scenario with a higher volume of patients, these numbers may differ.

Example 2: Grouping patients diagnosed with sepsis

For the second experiment, we took 500 patients with a positive sepsis diagnosis. We grouped these patients on the basis of their structured clinical records using k-means clustering. To show that this is a repeatable pattern, we chose the same feature engineering techniques as described in experiment 1. We didn’t divide the data into training and testing datasets because we were implementing an unsupervised learning algorithm.

We first analyzed the optimal number of clusters of the grouping using the Elbow method and arrived at the curve shown in the following graph.

This allowed us to determine that six clusters were the optimal number in our patient grouping.
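A sketch of the elbow analysis, assuming X is the TF-IDF feature matrix for the 500 sepsis patients (built as in the feature engineering sketch earlier):

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

inertias = []
candidate_ks = range(2, 12)
for k in candidate_ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squares

plt.plot(list(candidate_ks), inertias, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("Inertia")
plt.show()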

Baseline model

We reduced the dimensionality of the input data to two components using principal component analysis (PCA) and plotted the following scatter plot.
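A sketch of the clustering and visualization step, again assuming the feature matrix X from before:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

labels = KMeans(n_clusters=6, n_init=10, random_state=42).fit_predict(X)

# PCA needs a dense array; X may be a sparse TF-IDF matrix
X_dense = X.toarray() if hasattr(X, "toarray") else np.asarray(X)
coords = PCA(n_components=2).fit_transform(X_dense)

plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10", s=15)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()

print(np.bincount(labels))  # patients per cluster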

The following were the counts of patients across each cluster:

Cluster 1: 44 patients
Cluster 2: 30 patients
Cluster 3: 109 patients
Cluster 4: 66 patients
Cluster 5: 106 patients
Cluster 6: 145 patients

We found that at least four of the six clusters had significant overlap between patients. This means the structured clinical features weren't enough to clearly divide the patients into six groups.

Enhanced model

For the enhanced model, we added the ICD-10 codes and their corresponding descriptions for each patient, as extracted by Amazon HealthLake. This time, we could see a clear separation of the patient groups.

We also saw a change in distribution across the six clusters:

Cluster 1: 54 patients
Cluster 2: 154 patients
Cluster 3: 64 patients
Cluster 4: 44 patients
Cluster 5: 109 patients
Cluster 6: 75 patients

As you can see, adding features from the unstructured data for the patients allowed us to improve the clustering model and clearly divide the patients into six clusters. We even saw some patients move across clusters, denoting that the model became better at recognizing those patients based on their unstructured clinical records.

Conclusion

In this post, we demonstrated how you can easily use SageMaker to build ML models on your data in Amazon HealthLake. We also demonstrated the advantages of augmenting structured data with features from unstructured clinical notes to improve the accuracy of disease prediction models. We hope this body of work provides you with examples of how to build ML models using SageMaker with your data stored and normalized in Amazon HealthLake, and how to improve model performance for clinical outcome predictions. To learn more about Amazon HealthLake, see the website and the technical documentation.


About the Authors

Ujjwal Ratan is a Principal Machine Learning Specialist on the Global Healthcare and Life Sciences team at Amazon Web Services. He works on the application of machine learning and deep learning to real-world industry problems like medical imaging, unstructured clinical text, genomics, precision medicine, clinical trials, and quality of care improvement. He has expertise in scaling machine learning/deep learning algorithms on the AWS Cloud for accelerated training and inference. In his free time, he enjoys listening to (and playing) music and taking unplanned road trips with his family.

 

Nihir Chadderwala is an AI/ML Solutions Architect on the Global Healthcare and Life Sciences team. His background is building big data and AI-powered solutions to customer problems in a variety of domains such as software, media, automotive, and healthcare. In his spare time, he enjoys playing tennis, and watching and reading about Cosmos.

 

 

Parminder Bhatia is a science leader in AWS Health AI, currently building deep learning algorithms for the clinical domain at scale. His expertise is in machine learning and large-scale text analysis techniques in low-resource settings, especially in biomedical, life sciences, and healthcare technologies. He enjoys playing soccer, water sports, and traveling with his family.

Read More

A Trusted Companion: AI Software Keeps Drivers Safe and Focused on the Road Ahead

Editor’s note: This is the latest post in our NVIDIA DRIVE Labs series, which takes an engineering-focused look at individual autonomous vehicle challenges and how NVIDIA DRIVE addresses them. Catch up on all of our automotive posts, here.

Even with advanced driver assistance systems automating more driving functions, human drivers must maintain their attention at the wheel and build trust in the AI system.

Traditional driver monitoring systems typically don’t understand subtle cues such as a driver’s cognitive state, behavior or other activity that indicates whether they’re ready to take over the driving controls.

NVIDIA DRIVE IX is an open, scalable cockpit software platform that provides AI functions to enable a full range of in-cabin experiences, including intelligent visualization with augmented reality and virtual reality, conversational AI and interior sensing.

Driver perception is a key aspect of the platform that enables the AV system to ensure a driver is alert and paying attention to the road. It also enables the AI system to perform cockpit functions that are more intuitive and intelligent.

In this DRIVE Labs episode, NVIDIA experts demonstrate how DRIVE IX perceives driver attention, activity, emotion, behavior, posture, speech, gesture and mood with a variety of detection capabilities.

A Multi-DNN Approach

Facial expressions are complex signals to interpret. A simple wrinkle of the brow or shift of the gaze can have a variety of meanings.

DRIVE IX uses multiple DNNs to recognize faces and decipher the expressions of vehicle occupants. The first DNN detects the face itself, while a second identifies fiducial points, or reference markings — such as eye location, nose, etc.

On top of these base networks, a variety of DNNs operate to determine whether a driver is paying attention or requires other actions from the AI system.

The GazeNet DNN tracks gazes by detecting the vector of the driver’s eyes and mapping it to the road to check if they’re able to see obstacles ahead. SleepNet monitors drowsiness, classifying whether eyes are open or closed, running through a state machine to determine levels of exhaustion. Finally, ActivityNet tracks driver activity such as phone usage, hands on/off the wheel and driver attention to road events. DRIVE IX can also detect whether the driver is properly sitting in their seat to focus on road events.
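As a purely illustrative sketch (not NVIDIA's implementation), per-frame eye-state classifications from a SleepNet-style network could feed a simple state machine that tracks the fraction of recent frames with closed eyes:

from collections import deque

class DrowsinessStateMachine:
    def __init__(self, window=90, drowsy_ratio=0.3, asleep_ratio=0.7):
        self.closed = deque(maxlen=window)  # rolling window of per-frame eye states
        self.drowsy_ratio = drowsy_ratio
        self.asleep_ratio = asleep_ratio

    def update(self, eyes_closed: bool) -> str:
        self.closed.append(1 if eyes_closed else 0)
        ratio = sum(self.closed) / len(self.closed)  # fraction of recent frames with eyes closed
        if ratio >= self.asleep_ratio:
            return "ASLEEP"
        if ratio >= self.drowsy_ratio:
            return "DROWSY"
        return "ALERT"

The thresholds and window length here are arbitrary placeholders; a production system would tune them and combine this signal with gaze and activity cues.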

In addition to driver focus, a separate DNN can determine a driver’s emotions — a key indicator of their ability to safely operate the vehicle. Taking in data from the base face-detect and fiducial-point networks, DRIVE IX can classify a driver’s state as happy, surprised, neutral, disgusted or angry.

It can also tell if the driver is squinting or screaming, indicating their level of visibility or alertness and state of mind.

 

A Customizable Solution

Vehicle manufacturers can leverage the driver monitoring capabilities in DRIVE IX to develop advanced AI-based driver understanding capabilities for personalizing the car cockpit.

The car can be programmed to alert a driver if their attention drifts from the road, or the cabin can adjust settings to soothe occupants if tensions are high.

And these capabilities extend well beyond driver monitoring. The aforementioned DNNs, together with gesture DNN and speech capabilities, enable multi-modal conversational AI offerings such as automatic speech recognition, natural language processing and speech synthesis.

These networks can be used for in-cabin personalization and virtual assistant applications. Additionally, the base facial recognition and facial key point models can be used for AI-based video conferencing platforms.

The driver monitoring capabilities of DRIVE IX help build trust between occupants and the AI system as automated driving technology develops, creating a safer, more enjoyable intelligent vehicle experience.

The post A Trusted Companion: AI Software Keeps Drivers Safe and Focused on the Road Ahead appeared first on The Official NVIDIA Blog.

Read More

ToTTo: A Controlled Table-to-Text Generation Dataset

Posted by Ankur Parikh and Xuezhi Wang, Research Scientists, Google Research

In the last few years, research in natural language generation, used for tasks like text summarization, has made tremendous progress. Yet, despite achieving high levels of fluency, neural systems can still be prone to hallucination (i.e., generating text that is understandable, but not faithful to the source), which can prohibit these systems from being used in many applications that require high degrees of accuracy. Consider an example from the Wikibio dataset, where the neural baseline model tasked with summarizing a Wikipedia infobox entry for Belgian football player Constant Vanden Stock summarizes incorrectly that he is an American figure skater.

While the process of assessing the faithfulness of generated text to the source content can be challenging, it is often easier when the source content is structured (e.g., in tabular format). Moreover, structured data can also test a model’s ability for reasoning and numerical inference. However, existing large scale structured datasets are often noisy (i.e., the reference sentence cannot be fully inferred from the tabular data), making them unreliable for the measurement of hallucination in model development.

In “ToTTo: A Controlled Table-To-Text Generation Dataset”, we present an open domain table-to-text generation dataset generated using a novel annotation process (via sentence revision) along with a controlled text generation task that can be used to assess model hallucination. ToTTo (shorthand for “Table-To-Text”) consists of 121,000 training examples, along with 7,500 examples each for development and test. Due to the accuracy of annotations, this dataset is suitable as a challenging benchmark for research in high precision text generation. The dataset and code are open-sourced on our GitHub repo.

Table-to-Text Generation
ToTTo introduces a controlled generation task in which a given Wikipedia table with a set of selected cells is used as the source material for the task of producing a single sentence description that summarizes the cell contents in the context of the table. The example below demonstrates some of the many challenges posed by the task, such as numerical reasoning, a large open-domain vocabulary, and varied table structure.

Example in the ToTTo dataset, where given the source table and set of highlighted cells (left), the goal is to generate a one sentence description, such as the “target sentence” (right). Note that generating the target sentence would require numerical inference (eleven NFL seasons) and understanding of the NFL domain.
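The released data is in JSON Lines format. A sketch of reading one example follows; the field names reflect the format documented in the GitHub repo, but treat them as approximate and verify against the repo.

import json

with open("totto_dev_data.jsonl") as f:
    example = json.loads(f.readline())

table = example["table"]                    # nested rows of cells with value/row_span/column_span
highlighted = example["highlighted_cells"]  # list of [row_index, column_index] pairs
page_title = example["table_page_title"]
section_title = example["table_section_title"]
target = example["sentence_annotations"][0]["final_sentence"]

print(page_title, "|", section_title)
print([table[r][c]["value"] for r, c in highlighted])
print(target)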

Annotation Process
Designing an annotation process to obtain natural but also clean target sentences from tabular data is a significant challenge. Many datasets like Wikibio and RotoWire pair naturally occurring text heuristically with tables, a noisy process that makes it difficult to disentangle whether hallucination is primarily caused by data noise or model shortcomings. On the other hand, one can elicit annotators to write sentence targets from scratch, which are faithful to the table, but the resulting targets often lack variety in terms of structure and style.

In contrast, ToTTo is constructed using a novel data annotation strategy in which annotators revise existing Wikipedia sentences in stages. This results in target sentences that are clean, as well as natural, containing interesting and varied linguistic properties. The data collection and annotation process begins by collecting tables from Wikipedia, where a given table is paired with a summary sentence collected from the supporting page context according to heuristics, such as word overlap between the page text and the table and hyperlinks referencing tabular data. This summary sentence may contain information not supported by the table and may contain pronouns with antecedents found in the table only, not the sentence itself.

The annotator then highlights the cells in the table that support the sentence and deletes phrases in the sentence that are not supported by the table. They also decontextualize the sentence so that it is standalone (e.g., with correct pronoun resolution) and correct grammar, where necessary.

We show that annotators obtain high agreement on the above task: 0.856 Fleiss Kappa for cell highlighting, and 67.0 BLEU for the final target sentence.

Dataset Analysis
We conducted a topic analysis on the ToTTo dataset over 44 categories and found that the Sports and Countries topics, each of which consists of a range of fine-grained topics, e.g., football/olympics for sports and population/buildings for countries, together comprise 56.4% of the dataset. The other 44% is composed of a much more broad set of topics, including Performing Arts, Transportation, and Entertainment.

Furthermore, we conducted a manual analysis of the different types of linguistic phenomena in the dataset over 100 randomly chosen examples. The table below summarizes the fraction of examples that require reference to the page and section titles, as well as some of the linguistic phenomena in the dataset that potentially pose new challenges to current systems.

Linguistic Phenomena Percentage
Require reference to page title 82%
Require reference to section title 19%
Require reference to table description 3%
Reasoning (logical, numerical, temporal etc.) 21%
Comparison across rows/columns/cells 13%
Require background information 12%

Baseline Results
We present some baseline results of three state-of-the-art models from the literature (BERT-to-BERT, Pointer Generator, and the Puduppully 2019 model) on two evaluation metrics, BLEU and PARENT. In addition to reporting the score on the overall test set, we also evaluate each model on a more challenging subset consisting of out-of-domain examples. As the table below shows, the BERT-to-BERT model performs best in terms of both BLEU and PARENT. Moreover, all models achieve considerably lower performance on the challenge set indicating the challenge of out-of-domain generalization.

Model | BLEU (overall) | PARENT (overall) | BLEU (challenge) | PARENT (challenge)
BERT-to-BERT | 43.9 | 52.6 | 34.8 | 46.7
Pointer Generator | 41.6 | 51.6 | 32.2 | 45.2
Puduppully et al. 2019 | 19.2 | 29.2 | 13.9 | 25.8

While automatic metrics can give some indication of performance, they are not currently sufficient for evaluating hallucination in text generation systems. To better understand hallucination, we manually evaluate the top performing baseline, to determine how faithful it is to the content in the source table, under the assumption that discrepancies indicate hallucination. To compute the “Expert” performance, for each example in our multi-reference test set, we held out one reference and asked annotators to compare it with the other references for faithfulness. As the results show, the top performing baseline appears to hallucinate information ~20% of the time.

Model | Faithfulness (overall) | Faithfulness (challenge)
Expert | 93.6 | 91.4
BERT-to-BERT | 76.2 | 74.2

Model Errors and Challenges
In the table below, we present a selection of the observed model errors to highlight some of the more challenging aspects of the ToTTo dataset. We find that state-of-the-art models struggle with hallucination, numerical reasoning, and rare topics, even when using cleaned references (errors in red). The last example shows that even when the model output is correct it is sometimes not as informative as the original reference which contains more reasoning about the table (shown in blue).

Reference: in the 1939 currie cup, western province lost to transvaal by 17–6 in cape town.
Model prediction: the first currie cup was played in 1939 in transvaal1 at newlands, with western province winning 17–6.

Reference: a second generation of microdrive was announced by ibm in 2000 with increased capacities at 512 mb and 1 gb.
Model prediction: there were 512 microdrive models in 2000: 1 gigabyte.

Reference: the 1956 grand prix motorcycle racing season consisted of six grand prix races in five classes: 500cc, 350cc, 250cc, 125cc and sidecars 500cc.
Model prediction: the 1956 grand prix motorcycle racing season consisted of eight grand prix races in five classes: 500cc, 350cc, 250cc, 125cc and sidecars 500cc.

Reference: in travis kelce's last collegiate season, he set personal career highs in receptions (45), receiving yards (722), yards per receptions (16.0) and receiving touchdowns (8).
Model prediction: travis kelce finished the 2012 season with 45 receptions for 722 yards (16.0 avg.) and eight touchdowns.

Conclusion
In this work, we presented ToTTo, a large, English table-to-text dataset that presents both a controlled generation task and a data annotation process based on iterative sentence revision. We also provided several state-of-the-art baselines, and demonstrated ToTTo could be a useful dataset for modeling research as well as for developing evaluation metrics that can better detect model improvements.

In addition to the proposed task, we hope our dataset can also be helpful for other tasks such as table understanding and sentence revision. ToTTo is available at our GitHub repo.

Acknowledgements
The authors wish to thank Ming-Wei Chang, Jonathan H. Clark, Kenton Lee, and Jennimaria Palomaki for their insightful discussions and support. Many thanks also to Ashwin Kakarla and his team for help with the annotations.

Read More

Electric Avenue: NVIDIA Engineer Revs Up Classic Car to Sport AI

Arman Toorians isn’t your average classic car restoration hobbyist.

The NVIDIA engineer recently transformed a 1974 Triumph TR6 roadster at his home workshop into an EV featuring AI.

Toorians built the vehicle to show a classic car can be recycled into an electric ride that taps NVIDIA Jetson AI for safety, security and vehicle management features.

He hopes the car blazes a path for others to explore electric vehicle conversions that pack AI.

“My objective is to encourage others to take on this task — it’s a lot of fun,” he said.

Where It Began

The story begins when Toorians purchased the Triumph in “junkyard condition” for $4,000 from a retired cop who gave up on ambitions of restoring the car.

The conversion cost $20,000 in parts. He acquired a motor from NetGain Motors, electrical components from EV West, Triumph parts from Moss Motors, and five Tesla batteries capable of 70 miles on a charge for his setup. NVIDIA supplied the Jetson Xavier NX developer kit.

To find spare time, the quadrilingual Persian-Armenian musician took a hiatus from playing flamenco and jazz guitar. Co-workers and friends cheered on his effort, he said, while his wife and son gave a hand and advice as needed.

Three years in the making, the car’s sophisticated build with AI may just set a new bar for the burgeoning classic-to-electric conversion space.

Jetson for Security 

The car is smart. And it may have as much in common with a Tesla as it does with future generations of EVs from automakers everywhere adopting AI.

That’s because he installed the NVIDIA Jetson Xavier NX developer kit in the little sports car’s trunk. The power-efficient Jetson edge AI platform provides compact supercomputing performance capable of handling multi-modal inference, devouring data from the car’s sensors and camera.

Toorians’ Triumph packs the compact Jetson Xavier NX in the trunk for AI.

“Jetson is a great platform that addresses many different markets and can also be very easily used to solve many problems in this DIY electric car market,” Toorians said.

Toorians, a director in the Jetson group, uses the Jetson Xavier NX for a growing list of cool features. For example, using a dash cam feed and the NVIDIA DeepStream SDK for intelligent video analytics, he’s using the Jetson to work on securing the car. Like the talking car KITT from the hit ‘80s TV series Knight Rider, the Triumph will be able to react with verbal warnings to anyone unauthorized to be in or around it.

If that doesn’t work to deter a threat to the vehicle, his plans are for Jetson to step it up a notch and activate alarms and email alerts.

Lighter, Faster, Cleaner

The sleek sports car has impressive specs. It’s now lighter, faster and cleaner on its energy usage than when it rolled out of the factory decades ago. It also doesn’t leave motor oil stains or the stench of gas and exhaust fumes in its wake.

In place of the greasy 350-pound six cylinder gas engine, Toorians installed a shiny 90-pound electrical motor. The new motor lives in the same spot under the hood and puts out 134 horsepower versus the stock engine’s 100 horses, providing peppy takeoffs.

The Triumph’s engine bay sports a lighter, cleaner and peppier motor.

He gets a lot of thumbs-up reactions to the car. But for him the big payoff is repurposing the older gasoline vehicle to electric to avoid environmental waste and exhaust pollutants.

“Electric cars help keep our air and environment clean and are the way toward a more sustainable future in transportation,” he said.

Jetson for Safety

Among its many tricks, the Triumph uses AI to recognize the driver so that only authorized users can start it — and only when seat belts are fastened by both the driver and passenger.

The dash camera running through Jetson — which can process more than 60 frames per second — is being used to generate lane departure warnings.

Front and rear lidar generate driver alerts if objects are in the Triumph’s path or too close. The front lidar and the dash cam can also be used to run the cruise control.

Jetson processes the front and rear battery temperatures and translates them for the analog temperature gauges. Jetson is also called on to read the battery capacity and display it on the analog fuel gauge, keeping the gauge stock. Another nice touch: the fuel cap now covers the charging port.

Touches like these — as well as keeping a functioning four-speed shifter — allowed the car to keep its original look.

“Rebuilding older generation cars with electric motors and computers helps us recycle good gasoline engine cars that otherwise would have to be destroyed,” he said.

DIY makers and businesses alike turn to NVIDIA Jetson for edge AI.

 

 

The post Electric Avenue: NVIDIA Engineer Revs Up Classic Car to Sport AI appeared first on The Official NVIDIA Blog.

Read More

Microsoft gives users control over their voice clips

Microsoft is rolling out updates to its user consent experience for voice data to give customers more meaningful control over whether their voice data is used to improve products, the company announced Friday. These updates let customers decide if people can listen to recordings of what they said while speaking to Microsoft products and services that use speech recognition technology.

If customers choose to opt in, people may review these voice clips to improve the performance of Microsoft’s artificial intelligence systems across a diversity of people, speaking styles, accents, dialects and acoustic environments. The goal is to make Microsoft’s speech recognition technologies more inclusive by making them easier and more natural to interact with, the company said.

Customers who do not choose to contribute their voice clips for review by people will still be able to use all of Microsoft’s voice-enabled products and services.

Voice clips are audio recordings of what users said when they used their voice to interact with voice-enabled products and services, such as dictating a translation request or a web search.

Microsoft removes certain personal information from voice clips as they are processed in the cloud, including Microsoft account identifiers and strings of letters or numbers that could be telephone numbers, Social Security numbers and email addresses.

The new settings for voice clips mean that customers must actively choose to allow people to listen to the recordings of what they said. If they do, Microsoft employees and people contracted to work for Microsoft may listen to these voice clips and manually transcribe what they hear as part of a process the company uses to improve AI systems.

“Their transcription is what we consider our ground truth of what was actually spoken inside that audio clip. We use that as a basis for comparison to identify where our AI needs improvement,” said Neeta Saran, a senior attorney at Microsoft in Redmond, Washington.

The more transcripts Microsoft has of how real people talk from contributed voice clips, the better these AI systems will perform.

While Microsoft employees and contractors will only listen to voice clips with user permission, the company may continue to access information associated with user voice activity, such as the transcriptions automatically generated during user interactions with speech recognition AI. The details of how that works are described in the terms of use for individual Microsoft products and services, the company said.

A graphic illustrates how the new settings for voice data will appear to users. Text boxes explain why Microsoft asks users to contribute voice clips, how user identity is protected and the people who use the contributed data.
Microsoft’s new settings for voice data will roll out to voice typing, an updated version of the Windows dictation experience. Graphic courtesy of Microsoft.

Meaningful consent

These new settings for voice clips are designed to give customers meaningful consent for people to listen to what they said while interacting with Microsoft products and services, including increased awareness of who their voice clips are being shared with and how they are being used.

“This new meaningful consent release is about making sure that we’re transparent with users about how we are using this audio data to improve our speech recognition technology,” Saran said.

Because Microsoft removes account identifiers from the voice clips as they are processed, they will no longer show up in the privacy dashboard of customers’ Microsoft accounts, the company said.

Microsoft does not use any human reviewers to listen to audio data collected from speech recognition features built into enterprise offerings, the company added.

Data retention and next steps

On Oct. 30, 2020, Microsoft stopped storing voice clips processed by its speech recognition technologies. Over the next few months, the company is rolling out the new settings for voice clips across products including Microsoft Translator, SwiftKey, Windows, Cortana, HoloLens, Mixed Reality and Skype voice translation.

If a customer chooses to let Microsoft employees or contractors listen to their voice recordings to improve AI technology, the company will retain all new audio data contributed for review for up to two years. If a contributed voice clip is sampled for transcription by people, the company may retain it for more than two years to continue training and improving the quality of speech recognition AI.

“The more diverse ground truth data that we are able to collect and use to update our speech models, the better and more inclusive our speech recognition technology is going to be for our users across many languages,” Saran said.

Related

Questions? Check out the FAQs to learn more about these changes and check out Microsoft’s privacy policies.

Learn more about Microsoft’s approach to responsible AI.

The post Microsoft gives users control over their voice clips appeared first on The AI Blog.

Read More

Amid CES, NVIDIA Packs Flying, Driving, Gaming Tech News into a Single Week

Flying, driving, gaming, racing… amid the first-ever virtual Consumer Electronics Show this week, NVIDIA-powered technologies spilled out in all directions.

In automotive, Chinese automakers SAIC and NIO announced they’ll use NVIDIA DRIVE in future vehicles.

In gaming, NVIDIA on Tuesday led off a slew of gaming announcements by revealing the affordable new RTX 3060 GPU and detailing the arrival of more than 70 laptops powered by 30 Series GPUs for gamers and creatives.

In robotics, the Skydio X2 drone has received the CES 2021 Best of Innovation Award for Drones and Unmanned Systems.

And in, well, a category all its own, the Indy Autonomous Challenge, unveiled Thursday, will pit college teams against each other in sleek, swift vehicles equipped with the ADLINK DLP-8000 robot controller powered by NVIDIA GPUs, competing for a $1.5 million prize.

This week’s announcements were just the latest examples of how NVIDIA is driving AI and innovation into every aspect of our lives.

Game On

Bringing more gaming capabilities to millions more gamers, NVIDIA on Tuesday announced more than 70 new laptops featuring GeForce RTX 30 Series Laptop GPUs and unveiled the NVIDIA GeForce RTX 3060 graphics card for desktops, priced at just $329.

All are powered by the award-winning NVIDIA Ampere GPU architecture, the second generation of RTX with enhanced Ray Tracing Cores, Tensor Cores, and new streaming multiprocessors.

NVIDIA also announced that Call of Duty: Warzone and Square Enix’s new IP, Outriders, along with Five Nights at Freddy’s: Security Breach and F.I.S.T.: Forged in Shadow Torch, will be adding RTX ray tracing and DLSS.

The games are just the latest to support the real-time ray tracing and AI-based DLSS (deep learning super sampling) technologies, known together as RTX, which NVIDIA introduced two years ago.

The announcements were among the highlights of a streamed presentation from Jeff Fisher, senior vice president of NVIDIA’s GeForce business.

Amid the unprecedented challenges of 2020, “millions of people tuned into gaming — to play, create and connect with one another,” Fisher said. “More than ever, gaming has become an integral part of our lives.”

Hitting the Road

In automotive, two Chinese automakers announced they’ll be relying on NVIDIA DRIVE technologies.

Just as CES was starting, electric car startup NIO announced a supercomputer to power its automated and autonomous driving features, with NVIDIA DRIVE Orin at its core.

The computer, known as Adam, achieves over 1,000 trillion operations per second of performance with the redundancy and diversity necessary for safe autonomous driving.

The Orin-powered supercomputer will debut in NIO’s flagship ET7 sedan, scheduled for production in 2022, and every NIO model to follow.

And on Thursday, SAIC, China’s largest automaker, announced it’s joining forces with online retail giant Alibaba to unveil a new premium EV brand, dubbed IM for “intelligence in motion.”

The long-range electric vehicles will feature AI capabilities powered by the high-performance, energy-efficient NVIDIA DRIVE Orin compute platform.

The news comes as EV startups in China have skyrocketed in popularity, with NVIDIA working with NIO along with Li Auto and Xpeng to bolster the growth of new-energy vehicles.

Taking to the Skies

Meanwhile, Skydio, the leading U.S. drone manufacturer and world leader in autonomous flight, today announced it received the CES 2021 Best of Innovation Award for Drones and Unmanned Systems for the Skydio X2.

Skydio’s new autonomous drone offers enterprise and public sector customers up to 35 minutes of autonomous flight time.

Packing six 4k cameras and powered by the NVIDIA Jetson TX2 mobile supercomputer, it’s built to offer situational awareness, asset inspection, and security patrol.

The post Amid CES, NVIDIA Packs Flying, Driving, Gaming Tech News into a Single Week appeared first on The Official NVIDIA Blog.

Read More

Recognizing Pose Similarity in Images and Videos

Posted by Jennifer J. Sun, Student Researcher and Ting Liu, Senior Software Engineer, Google Research

Everyday actions, such as jogging, reading a book, pouring water, or playing sports, can be viewed as a sequence of poses, consisting of the position and orientation of a person’s body. An understanding of poses from images and videos is a crucial step for enabling a range of applications, including augmented reality display, full-body gesture control, and physical exercise quantification. However, a 3-dimensional pose captured in two dimensions in images and videos appears different depending on the viewpoint of the camera. The ability to recognize similarity in 3D pose using only 2D information will help vision systems better understand the world.

In “View-Invariant Probabilistic Embedding for Human Pose” (Pr-VIPE), a spotlight paper at ECCV 2020, we present a new algorithm for human pose perception that recognizes similarity in human body poses across different camera views by mapping 2D body pose keypoints to a view-invariant embedding space. This ability enables tasks, such as pose retrieval, action recognition, action video synchronization, and more. Compared to existing models that directly map 2D pose keypoints to 3D pose keypoints, the Pr-VIPE embedding space is (1) view-invariant, (2) probabilistic in order to capture 2D input ambiguity, and (3) does not require camera parameters during training or inference. Trained with in-lab setting data, the model works on in-the-wild images out of the box, given a reasonably good 2D pose estimator (e.g., PersonLab, BlazePose, among others). The model is simple, results in compact embeddings, and can be trained (in ~1 day) using 15 CPUs. We have released the code on our GitHub repo.

Pr-VIPE can be directly applied to align videos from different views.

Pr-VIPE
The input to Pr-VIPE is a set of 2D keypoints, from any 2D pose estimator that produces a minimum of 13 body keypoints, and the output is the mean and variance of the pose embedding. The distances between embeddings of 2D poses correlate to their similarities in absolute 3D pose space. Our approach is based on two observations:

  • The same 3D pose may appear very different in 2D as the viewpoint changes.
  • The same 2D pose can be projected from different 3D poses.

The first observation motivates the need for view-invariance. To accomplish this, we define the matching probability, i.e., the likelihood that different 2D poses were projected from the same, or similar 3D poses. The matching probability predicted by Pr-VIPE for matching pose pairs should be higher than for non-matching pairs.

To address the second observation, Pr-VIPE utilizes a probabilistic embedding formulation. Because many 3D poses can project to the same or similar 2D poses, the model input exhibits an inherent ambiguity that is difficult to capture through a deterministic point-to-point mapping in embedding space. Therefore, we map a 2D pose through a probabilistic mapping to an embedding distribution, of which we use the variance to represent the uncertainty of the input 2D pose. As an example, in the figure below the third 2D view of the 3D pose on the left is similar to the first 2D view of a different 3D pose on the right, so we map them into a similar location in the embedding space with large variances.

Pr-VIPE enables vision systems to recognize 2D poses across views. We embed 2D poses using Pr-VIPE such that the embeddings are (1) view-invariant (2D projections of similar 3D poses are embedded close together) and (2) probabilistic. By embedding detected 2D poses, Pr-VIPE enables direct retrieval of pose images from different views, and can also be applied to action recognition and video alignment.

View-Invariance
During training, we use 2D poses from two sources: multi-view images and projections of groundtruth 3D poses. Triplets of 2D poses (anchor, positive, and negative) are selected from a batch, where the anchor and positive are two different projections of the same 3D pose, and the negative is a projection of a non-matching 3D pose. Pr-VIPE then estimates the matching probability of 2D pose pairs from their embeddings.
During training, we push the matching probability of positive pairs to be close to 1 with a positive pairwise loss in which we minimize the embedding distance between positive pairs, and the matching probability of negative pairs to be small by maximizing the ratio of the matching probabilities between positive and negative pairs with a triplet ratio loss.

Overview of the Pr-VIPE model. During training, we apply three losses (triplet ratio loss, positive pairwise loss, and a prior loss that applies a unit Gaussian prior to our embeddings). During inference, the model maps an input 2D pose to a probabilistic, view-invariant embedding.

Probabilistic Embedding
Pr-VIPE maps a 2D pose to a probabilistic embedding as a multivariate Gaussian distribution using a sampling-based approach for similarity score computation between two distributions. During training, we use a Gaussian prior loss to regularize the predicted distribution.
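A simplified NumPy sketch of these ideas (the sampling-based matching probability between two Gaussian embeddings, the triplet ratio loss, and the positive pairwise loss) is shown below. The paper's exact formulation, scaling parameters, and training details differ, so read this only as an illustration.

import numpy as np

rng = np.random.default_rng(0)

def sample_embeddings(mean, logvar, k=20):
    """Draw k samples from a diagonal Gaussian embedding."""
    std = np.exp(0.5 * logvar)
    return mean + std * rng.standard_normal((k, mean.shape[-1]))

def matching_probability(emb_a, emb_b, a=1.0, b=0.0, k=20):
    """Average a sigmoid of the scaled, shifted negative distance over sampled pairs."""
    za = sample_embeddings(*emb_a, k)
    zb = sample_embeddings(*emb_b, k)
    d = np.linalg.norm(za[:, None, :] - zb[None, :, :], axis=-1)
    return float(np.mean(1.0 / (1.0 + np.exp(a * d - b))))

def pr_vipe_losses(anchor, positive, negative, beta=2.0, eps=1e-8):
    """Simplified triplet ratio and positive pairwise losses on (mean, logvar) embeddings."""
    p_pos = matching_probability(anchor, positive)
    p_neg = matching_probability(anchor, negative)
    triplet_ratio_loss = max(0.0, np.log(beta) - (np.log(p_pos + eps) - np.log(p_neg + eps)))
    positive_pairwise_loss = -np.log(p_pos + eps)
    return triplet_ratio_loss, positive_pairwise_loss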

Evaluation
We propose a new cross-view pose retrieval benchmark to evaluate the view-invariance property of the embedding. Given a monocular pose image, cross-view retrieval aims to retrieve the same pose from different views without using camera parameters. The results demonstrate that Pr-VIPE retrieves poses more accurately across views compared to baseline methods in both evaluated datasets (Human3.6M, MPI-INF-3DHP).

Pr-VIPE retrieves poses across different views more accurately relative to the baseline method (3D pose estimation).

Common 3D pose estimation methods (such as the simple baseline used for comparison above, SemGCN, and EpipolarPose, amongst many others), predict 3D poses in camera coordinates, which are not directly view-invariant. Thus, rigid alignment between every query-index pair is required for retrieval using estimated 3D poses, which is computationally expensive due to the need for singular value decomposition (SVD). In contrast, Pr-VIPE embeddings can be directly used for distance computation in Euclidean space, without any post-processing.
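For example, retrieval with Pr-VIPE reduces to a nearest-neighbor search over embedding means; a minimal sketch:

import numpy as np

def retrieve(query_mean, index_means, top_k=5):
    # Plain Euclidean distance in embedding space; no camera parameters or SVD-based alignment needed
    distances = np.linalg.norm(index_means - query_mean[None, :], axis=1)
    return np.argsort(distances)[:top_k]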

Applications
View-invariant pose embedding can be applied to many image and video related tasks. Below, we show Pr-VIPE applied to cross-view retrieval on in-the-wild images without using camera parameters.


We can retrieve in-the-wild images from different views without using camera parameters by embedding the detected 2D pose using Pr-VIPE. Using the query image (top row), we search for a matching pose from a different camera view and we show the nearest neighbor retrieval (bottom row). This enables us to search for matching poses across camera views more easily.

The same Pr-VIPE model can also be used for video alignment. To do so, we stack Pr-VIPE embeddings within a small time window, and use the dynamic time warping (DTW) algorithm to align video pairs.

Manual video alignment is difficult and time-consuming. Here, Pr-VIPE is applied to automatically align videos of the same action repeated from different views.
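A minimal sketch of that alignment step, using a plain dynamic-programming DTW over per-frame embedding vectors (the actual setup stacks embeddings over a small time window, but the idea is the same):

import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of embedding vectors."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Nearest-neighbor action recognition over an index of labeled videos (hypothetical variables):
# predicted_label = labels[np.argmin([dtw_distance(query_embs, embs) for embs in index_embs])]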

The video alignment distance calculated via DTW can then be used for action recognition by classifying videos using nearest neighbor search. We evaluate the Pr-VIPE embedding using the Penn Action dataset and demonstrate that using the Pr-VIPE embedding without fine-tuning on the target dataset yields highly competitive recognition accuracy. In addition, we show that Pr-VIPE even achieves relatively accurate results using only videos from a single view in the index set.

Pr-VIPE recognizes action across views using pose inputs only, and is comparable to or better than methods using pose only or with additional context information (such as Iqbal et al., Liu and Yuan, Luvizon et al., and Du et al.). When action labels are only available for videos from a single view, Pr-VIPE (1-view only) can still achieve relatively accurate results.

Conclusion
We introduce the Pr-VIPE model for mapping 2D human poses to a view-invariant probabilistic embedding space, and show that the learned embeddings can be directly used for pose retrieval, action recognition, and video alignment. Our cross-view retrieval benchmark can be used to test the view-invariant property of other embeddings. We look forward to hearing about what you can do with pose embeddings!

Acknowledgments
Special thanks to Jiaping Zhao, Liang-Chieh Chen, Long Zhao (Rutgers University), Liangzhe Yuan, Yuxiao Wang, Florian Schroff, Hartwig Adam, and the Mobile Vision team for the wonderful collaboration and support.

Read More

Automating Amazon Personalize solution using the AWS Step Functions Data Science SDK

Machine learning (ML)-based recommender systems aren’t a new concept across organizations such as retail, media and entertainment, and education, but developing such a system can be a resource-intensive task—from data labelling, training and inference, to scaling. You also need to apply continuous integration, continuous deployment, and continuous training to your ML model, or MLOps. The MLOps model helps build code and integration across ML tools and frameworks. Moreover, building a recommender system requires managing and orchestrating multiple workflows, for example waiting for your training jobs to finish and trigger deployments.

In this post, we show you how to use Amazon Personalize to create a serverless recommender system for movie recommendations, with no ML experience required. Creating an Amazon Personalize solution workflow involves multiple steps, such as preparing and importing the data, choosing a recipe, creating a solution, and finally generating recommendations. End users or your data scientists can orchestrate these steps using AWS Lambda functions and AWS Step Functions. However, writing JSON code to manage such workflows can be complicated for your data scientists.

You can orchestrate an Amazon Personalize workflow using the AWS Step Functions Data Science SDK for Python using an Amazon SageMaker Jupyter notebook. The AWS Step Functions Data Science SDK is an open-source library that allows data scientists to easily create workflows that process and publish ML models using Amazon SageMaker Jupyter notebooks and Step Functions. You can create multi-step ML workflows in Python that orchestrate AWS infrastructure at scale, without having to provision and integrate the AWS services separately.

Solution and services overview

We orchestrated the Amazon Personalize training workflow steps, such as preparing and importing the data, choosing a recipe, and creating a solution, using AWS Step Functions and AWS Lambda functions to create the following end-to-end automated Amazon Personalize workflow:

We walk you through the following steps using the accompanying Amazon SageMaker Jupyter notebook. To create the preceding Amazon Personalize workflow, we follow these detailed steps:

  1. Complete the prerequisites of setting up permissions and preparing the dataset.
  2. Set up AWS Lambda functions and AWS Step Functions task states.
  3. Add wait conditions to each Step Functions task state.
  4. Add a control flow and link the states.
  5. Define and run the workflow by combining all the above steps (a brief sketch follows this list).
  6. Generate recommendations.
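As a preview of step 5, the task states created in the following sections can be chained into a single state machine with the Data Science SDK. The following is a rough sketch; the state names mirror the ones defined later in this post, and the execution role ARN is a placeholder.

from stepfunctions.steps import Chain
from stepfunctions.workflow import Workflow

workflow_definition = Chain([
    lambda_state_schema,         # defined later in this post
    lambda_state_createdataset,  # defined later in this post
    # remaining Amazon Personalize states (import job, solution, campaign) go here
])

workflow = Workflow(
    name="personalize-stepfunction-workflow",
    definition=workflow_definition,
    role="<STEP FUNCTIONS EXECUTION ROLE ARN>",
)
workflow.create()
execution = workflow.execute()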

Prerequisites

Before you build your workflow and generate recommendations, you must set up a notebook instance, AWS Identity and Access Management (IAM) roles, and create an Amazon Simple Storage Service (Amazon S3) bucket.

Creating a notebook instance in Amazon SageMaker

Create an Amazon SageMaker notebook instance by following the instructions here. When creating the IAM role for your notebook, make sure the role has the AmazonS3FullAccess and AmazonPersonalizeFullAccess policies so you can access the Amazon S3 and Amazon Personalize APIs through the SageMaker notebook.

  1. When the notebook is active, choose Open Jupyter.
  2. On the Jupyter dashboard, choose New.
  3. Choose Terminal.
  4. In the terminal, enter the following code:
    cd SageMaker 
    git clone https://github.com/aws-samples/personalize-data-science-sdk-workflow.git

  5. Open the notebook by choosing Personalize-Stepfunction-Workflow.ipynb in the root folder.

You’re now ready to run the following steps through the notebook cells. You can find the complete notebook for this solution in the GitHub repo.

Setting up the Step Functions execution role

You need a Step Functions execution role so that you can create and invoke workflows in Step Functions. For instructions, see Create a role for Step Functions or go through the steps in the notebook Create an execution role for Step Functions.

Attach an inline policy to the role you created for the notebook instance in the previous step. Enter the JSON policy by copying the policy from the notebook.

Preparing your dataset

To prepare your dataset, complete the following steps:

  1. Create an S3 bucket to store the training dataset and provide the bucket name and file name as ’movie-lens-100k.csv’ in the notebook step Setup S3 location and filename. See the following code:
    bucket = "<SAMPLE BUCKET NAME>" # replace with the name of your S3 bucket
    filename = "<SAMPLE FILE NAME>" # replace with a name that you want to save the dataset under

  2. Attach the policy to the S3 bucket by running the Attach policy to Amazon S3 bucket step in the notebook.
  3. Create an IAM role for Amazon Personalize by running the Create Personalize Role step in the notebook, which uses the Python Boto3 create_role API (a sketch of these two steps follows this list).
  4. To preprocess the dataset and upload it to Amazon S3, run the Data-Preparation step in the following notebook cell to download the movie-lens dataset and select the movies that have a rating above 2:
    !wget -N http://files.grouplens.org/datasets/movielens/ml-100k.zip
    !unzip -o ml-100k.zip
    data = pd.read_csv('./ml-100k/u.data', sep='\t', names=['USER_ID', 'ITEM_ID', 'RATING', 'TIMESTAMP'])
    pd.set_option('display.max_rows', 5)
    data

The following screenshot shows your output.

  5. Run the following code to filter the ratings and upload your data to the S3 bucket that you created earlier:
    data = data[data['RATING'] > 2] # keep only movies rated above 2
    data2 = data[['USER_ID', 'ITEM_ID', 'TIMESTAMP']] 
    data2.to_csv(filename, index=False)
    
    boto3.Session().resource('s3').Bucket(bucket).Object(filename).upload_file(filename)
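For reference, the Attach policy to Amazon S3 bucket and Create Personalize Role notebook steps mentioned above boil down to calls like the following sketch. The role name and policy details are illustrative assumptions, so rely on the notebook for the exact definitions:

import json

import boto3

iam = boto3.client("iam")
s3 = boto3.client("s3")

# Bucket policy that lets the Amazon Personalize service read the training data.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "personalize.amazonaws.com"},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(bucket_policy))

# IAM role that Amazon Personalize assumes when importing data from Amazon S3.
assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "personalize.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
create_role_response = iam.create_role(
    RoleName="PersonalizeStepFunctionRole",  # hypothetical role name
    AssumeRolePolicyDocument=json.dumps(assume_role_policy),
)
iam.attach_role_policy(
    RoleName="PersonalizeStepFunctionRole",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)
role_arn = create_role_response["Role"]["Arn"]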

Setting up Lambda functions and Step Functions task states

After you complete the prerequisites, complete the following steps:

  1. Set up Lambda functions on the Lambda console: You need a Lambda function for each Amazon Personalize task, such as creating a dataset, choosing a recipe, and creating a solution, using the Amazon Personalize AWS SDK for Python (Boto3). The Lambda GitHub repo provides the Python code for these functions; copy the code and create the Lambda functions by following the steps in Build a Lambda function on the Lambda console.

Note: While creating the execution role, make sure you attach the AmazonPersonalizeFullAccess and AWSLambdaBasicExecutionRole permission policies to the Lambda role.

  2. Orchestrate each of these Lambda functions as Step Functions task states: A Task state represents a single unit of work performed by a state machine. Step Functions can invoke Lambda functions directly from a Task state. For more information, see Creating a Step Functions State Machine That Uses Lambda. The following Step Functions workflow diagram shows how the Lambda functions we cover in this section are orchestrated:


We already created Lambda functions for each of the Amazon Personalize steps using the Amazon Personalize AWS SDK for Python (Boto3), with the Python code provided in the Lambda GitHub repo. We dive deep into each step in this section. Alternatively, you can build a Lambda function on the Lambda console.

Creating a schema

Before you add a dataset to Amazon Personalize, you must define a schema for that dataset. For more information, see Datasets and Schemas. First, we create a Step Functions task state that creates the schema, using the following code from the walkthrough notebook.

lambda_state_schema = LambdaStep(
    state_id="create schema",
    parameters={  
        "FunctionName": "stepfunction-create-schema", #replace with the name of the function you created
        "Payload": {  
           "input": "personalize-stepfunction-schema"
        }
    },
    result_path='$'    
)

This Step Functions task state invokes the Lambda function stepfunction-create-schema.py. The following Lambda code snippet uses the personalize.create_schema() Boto3 API to create the schema:

import json

import boto3

personalize = boto3.client('personalize')

# event and schema are provided inside the Lambda handler
create_schema_response = personalize.create_schema(
    name = event['input'],       # schema name passed in from the task state payload
    schema = json.dumps(schema)  # Avro JSON schema definition
)

This API creates an Amazon Personalize schema from the specified schema string. The schema you create must be in Avro JSON format.
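For the MovieLens interactions data prepared earlier, the schema passed to create_schema would look roughly like the following sketch, matching the USER_ID, ITEM_ID, and TIMESTAMP columns; check the notebook Lambda code for the exact definition:

schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}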

Creating a dataset group

After you create a schema, we similarly create a Step Functions task state for creating a dataset group. A dataset group contains related datasets that supply data for training a model. A dataset group can contain at most three datasets, one for each dataset type: Interactions, Items, and Users.

To train a model (create a solution), you need a dataset group that contains an Interactions dataset.

Run the notebook steps to create the task state with state_id="create dataset", as shown in the following code:

lambda_state_createdataset = LambdaStep(
    state_id="create dataset",
    parameters={
        "FunctionName": "stepfunctioncreatedataset", #replace with the name of the function you created
        "Payload": {
            "schemaArn.$": '$.schemaArn',
            "datasetGroupArn.$": '$.datasetGroupArn',
        }
    },
    result_path='$'
)

This Step Functions task state invokes the Lambda function stepfunctioncreatedatagroup.py. The following code snippet uses personalize.create_dataset() to create a dataset:

create_dataset_response = personalize.create_dataset(
    name = "personalize-stepfunction-dataset",
    datasetType = dataset_type,
    datasetGroupArn = event['datasetGroupArn'],
    schemaArn = event['schemaArn']
)

This API creates an empty dataset and adds it to the specified dataset group.

Creating a dataset

In this step, you create a Step Functions task state that creates an empty dataset and adds it to the specified dataset group.

Run through the notebook steps to create the dataset task state. You can review the underlying Lambda function stepfunctioncreatedataset.py; it uses the same personalize.create_dataset Boto3 API shown in the previous step.

Importing your data

When you complete Step 1: Creating a Dataset Group and Step 2: Creating a Dataset and a Schema, you’re ready to import your training data into Amazon Personalize. When you import data, you can choose to import records in bulk, import records individually, or both, depending on your business requirements and the amount of historical data you have collected. If you have a large amount of historical records, we recommend you import data in bulk and add data incrementally as necessary.

Run through the notebook steps to create a Step Functions task state to import the dataset, with state_id="create dataset import job". Let's review the code snippet for the underlying Lambda function stepfunction-createdatasetimportjob.py:

create_dataset_import_job_response = personalize.create_dataset_import_job(
    jobName = "stepfunction-dataset-import-job",
    datasetArn = datasetArn,
    dataSource = {
        "dataLocation": "s3://{}/{}".format(bucket, filename)
    },
    roleArn = roleArn
)

The preceding Lambda code uses the personalize.create_dataset_import_job() Boto3 API to import the dataset. This API creates a job that imports training data from your data source (an Amazon S3 bucket) into an Amazon Personalize dataset.
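The post doesn't list the status-check Lambda functions that the wait and choice states rely on, but they all follow the same pattern: describe the resource and return its current status. A minimal sketch for the dataset import job, with function and key names assumed, might look like the following; the solution version check would use personalize.describe_solution_version in the same way.

import boto3

personalize = boto3.client("personalize")

def lambda_handler(event, context):
    # Look up the import job created in the previous state and report its status.
    response = personalize.describe_dataset_import_job(
        datasetImportJobArn=event["datasetImportJobArn"]
    )
    return {
        "datasetImportJobArn": event["datasetImportJobArn"],
        # Typical values: CREATE PENDING, CREATE IN_PROGRESS, ACTIVE, CREATE FAILED
        "status": response["datasetImportJob"]["status"],
    }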

Creating a solution

After you prepare and import the data, you’re ready to create a solution. A solution refers to the combination of an Amazon Personalize recipe, customized parameters, and one or more solution versions (trained models). A recipe is an Amazon Personalize term specifying an appropriate algorithm to train for a given use case. After you create a solution with a solution version, you can create a campaign to deploy the solution version and get recommendations. We create Step Functions task states for choosing a recipe and creating a solution.

Choosing a recipe, configuring a solution, and creating a solution version

Run the notebook cell Choose a recipe to create a Step Functions task state with state_id="select recipe and create solution". This state represents the Lambda function stepfunction_select-recipe_create-solution.py. The following is the code snippet:

list_recipes_response = personalize.list_recipes()
#list_recipes_response
recipe_arn = "arn:aws:personalize:::recipe/aws-user-personalization" # aws-user-personalization selected for demo purposes

create_solution_response = personalize.create_solution(
    name = "stepfunction-solution",
    datasetGroupArn = event['dataset_group_arn'],
    recipeArn = recipe_arn
)

We use the personalize.list_recipes() Boto3 API to return a list of available recipes and the personalize.create_solution() Boto3 API to create the configuration for training a model. A trained model is known as a solution.

We use the algorithm aws-user-personalization for this post. For more information, see Choosing a Recipe and  Configuring a Solution.

After you choose a recipe and configure your solution, you’re ready to create a solution version, which refers to a trained ML model.

Run the notebook cell Create Solution Version to create a Step Functions task state with state_id="create solution version", which uses the underlying Lambda function stepfunction_create_solution_version.py. The following Lambda code snippet uses the personalize.create_solution_version() Boto3 API, which trains or retrains an active solution:

create_solution_version_response = personalize.create_solution_version(
    solutionArn = event['solution_arn']
)

A solution is created using the CreateSolution operation and must be in the ACTIVE state before you call CreateSolutionVersion. A new version of the solution is created every time you call this operation. For more information, see Creating a Solution Version.

Creating a campaign

A campaign is used to make recommendations for your users. You create a campaign by deploying a solution version.

Run the notebook cell Create Campaign to create a Step Functions task state with state_id="create campaign", which uses the underlying Lambda function stepfunction_getsolution_metric_create_campaign.py. This Lambda function uses the following APIs:

get_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn = event['solution_version_arn']
)

create_campaign_response = personalize.create_campaign(
    name = "stepfunction-campaign",
    solutionVersionArn = event['solution_version_arn'],
    minProvisionedTPS = 1
)

personalize.get_solution_metrics() gets the metrics for the specified solution version and personalize.create_campaign() creates a campaign by deploying a solution version.

In this section, we covered all the Step Functions task states and the Lambda functions needed to orchestrate the Amazon Personalize workflow. In the next section, we add wait states to these task states.

Adding a wait state to your steps

In this section, we add wait states because each step needs to wait for the previous step to finish before the next one runs. For example, you should create a dataset only after its dataset group has been created.

Run the notebook steps from Wait for Schema to be ready to Wait for Campaign to be ACTIVE to make sure these Step Functions or Lambda steps are waiting for each step to run before they trigger the next step.

The following code is an example of what a wait state looks like to wait for the schema to be created:

wait_state_schema = Wait(
    state_id="Wait for create schema - 5 secs",
    seconds=5
)

We added a wait time of 5 seconds for this state, which means the workflow waits 5 seconds before moving to the next state.

Adding a control flow and linking states by using the choice state

The AWS Step Functions Data Science SDK’s choice state supports branching logic based on the outputs from previous steps. You can create dynamic and complex workflows by adding this state.

After you define these steps, chain them together into a logical sequence. Run all the notebook steps to add choices while automating Amazon Personalize datasets, recipes, solutions, and the campaign.

The following code is an example of the choice state for the create_campaign workflow:

create_campaign_choice_state = Choice(
    state_id="Is the Campaign ready?"
)
create_campaign_choice_state.add_choice(
    rule=ChoiceRule.StringEquals(variable=lambda_state_campaign_status.output()['Payload']['status'], value='ACTIVE'),
    next_step=Succeed("CampaignCreatedSuccessfully")     
)
create_campaign_choice_state.add_choice(
    ChoiceRule.StringEquals(variable=lambda_state_campaign_status.output()['Payload']['status'], value='CREATE PENDING'),
    next_step=wait_state_campaign
)
create_campaign_choice_state.add_choice(
    ChoiceRule.StringEquals(variable=lambda_state_campaign_status.output()['Payload']['status'], value='CREATE IN_PROGRESS'),
    next_step=wait_state_campaign
)

create_campaign_choice_state.default_choice(next_step=Fail("CreateCampaignFailed"))
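These choice rules read $.Payload.status from the preceding status-check state, so that Lambda function must return the campaign status under a status key. A minimal sketch of what such a function might look like, following the same pattern as the import-job status check shown earlier (names assumed):

import boto3

personalize = boto3.client("personalize")

def lambda_handler(event, context):
    # Report the campaign status so the choice state can decide whether to
    # succeed, keep waiting, or fail.
    response = personalize.describe_campaign(campaignArn=event["campaignArn"])
    return {"status": response["campaign"]["status"]}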

Defining and running workflows

You create a workflow that runs a group of Lambda functions (steps) in a specific order; for example, one Lambda function's output passes to the next Lambda function's input. We're now ready to define the workflow definition for each step in Amazon Personalize:

  • Dataset workflow
  • Dataset import workflow
  • Recipe and solution workflow
  • Create campaign workflow
  • Main workflow to orchestrate the four preceding workflows for Amazon Personalize

After completing these steps, we can run our main workflow. To learn more about Step Functions workflows, refer to this link.

Dataset workflow

Run the following code to generate a workflow definition for dataset creation using the Lambda function state defined by Step Functions:

Dataset_workflow_definition=Chain([lambda_state_schema,
                                   wait_state_schema,
                                   lambda_state_datasetgroup,
                                   wait_state_datasetgroup,
                                   lambda_state_datasetgroupstatus
                                  ])
Dataset_workflow = Workflow(
    name="Dataset-workflow",
    definition=Dataset_workflow_definition,
    role=workflow_execution_role
)

The following screenshot shows the dataset workflow view.


Dataset import workflow

Run the following code to generate a workflow definition for the dataset import using the Lambda function states defined by Step Functions:

DatasetImport_workflow_definition=Chain([lambda_state_createdataset,
                                   wait_state_dataset,
                                   lambda_state_datasetimportjob,
                                   wait_state_datasetimportjob,
                                   lambda_state_datasetimportjob_status,
                                   datasetimportjob_choice_state
                                  ])

The following screenshot shows the dataset Import workflow view.


Recipe and solution workflow

Run the notebook to generate a workflow definition for the recipe and solution using the Lambda function states defined by Step Functions:

Create_receipe_sol_workflow_definition=Chain([lambda_state_select_receipe_create_solution,
                                   wait_state_receipe,
                                   lambda_create_solution_version,
                                   wait_state_solutionversion,
                                   lambda_state_solutionversion_status,
                                   solutionversion_choice_state
                                  ])

The following screenshot shows the create recipe workflow view.


Campaign workflow

Run the notebook to generate a workflow definition for the campaign using the Lambda function states defined by Step Functions:

Create_Campaign_workflow_definition=Chain([lambda_create_campaign,
                                   wait_state_campaign,
                                   lambda_state_campaign_status,
                                   wait_state_datasetimportjob,
                                   create_campaign_choice_state
                                  ])

The following screenshot shows the campaign workflow view.


Main workflow

Now we combine all four workflow definitions into a single main workflow definition that automates all the task states to create the dataset, recipe, solution, and campaign in Amazon Personalize, including the wait and choice states. Run the following notebook cell to generate the main workflow definition:

Main_workflow_definition=Chain([call_dataset_workflow_state,
                                call_datasetImport_workflow_state,
                                call_receipe_solution_workflow_state,
                                call_campaign_solution_workflow_state
                               ])
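Before the main workflow can be run, it presumably gets wrapped in a Workflow object and registered with Step Functions, following the same pattern as the dataset workflow shown earlier; the following is a sketch of what the corresponding notebook cells do (the workflow name here is illustrative):

Main_workflow = Workflow(
    name="Main-workflow",
    definition=Main_workflow_definition,
    role=workflow_execution_role
)

# Register the state machine in AWS Step Functions before executing it.
Main_workflow.create()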

Running the workflow

Run the following code to trigger an Amazon Personalize automated workflow to create a dataset, recipe, solution, and campaign using the underlying Lambda function and Step Functions states:

Main_workflow_execution = Main_workflow.execute()
Main_workflow_execution.render_progress()


 

The following screenshots show the progress of the dataset, dataset import, recipe and solution, and campaign workflows.

Alternatively, you can inspect the status of each step in progress on the AWS Step Functions console.

To see the detailed progress of each step, and whether it succeeded or failed, run the following code:

Main_workflow_execution.list_events(html=True)


Note: Wait until the preceding execution finishes before moving to the next step.

Generating recommendations

Now that you have a successful campaign, you can use it to generate a recommendation workflow. The following screenshot shows your recommendation workflow view.
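The recommendation workflow wraps a Lambda function that calls the Amazon Personalize runtime API. The post doesn't list its code, but a minimal sketch might look like the following, with key names chosen to match the output parsing shown later (the event fields are assumptions):

import boto3

personalize_runtime = boto3.client("personalize-runtime")

def lambda_handler(event, context):
    # Query the deployed campaign for recommendations for a given user.
    response = personalize_runtime.get_recommendations(
        campaignArn=event["campaignArn"],
        userId=str(event["userId"]),
    )
    # itemList is a list of {"itemId": ...} dicts; pass it through as the workflow output.
    return {"item_list": response["itemList"]}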


Running the recommendation workflow

Run the following code to trigger a recommendation workflow using the underlying Lambda function and Step Functions states:

recommendation_workflow_execution = recommendation_workflow.execute()
recommendation_workflow_execution.render_progress()


Use the following code to display the movie recommendations generated by the workflow:

item_list = recommendation_workflow_execution.get_output()['Payload']['item_list']

print("Recommendations:")
for item in item_list:
    item_id = int(item['itemId'])
    item_title = items.loc[items['ITEM_ID'] == item_id].values[0][-1]
    print(item_title)
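The items DataFrame used above isn't defined in the snippets shown here; it is presumably built from the MovieLens u.item file, roughly as in the following sketch (the column selection and names are assumptions that match the title lookup above):

import pandas as pd

# u.item is pipe-delimited; keep only the movie ID and title columns.
items = pd.read_csv('./ml-100k/u.item', sep='|', encoding='latin-1', header=None)
items = items[[0, 1]]
items.columns = ['ITEM_ID', 'TITLE']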

You can find the complete notebook for this solution in the GitHub repo.

Cleaning up

Make sure to clean up the Amazon Personalize and the state machines created in this post to avoid incurring any charges. On the Amazon Personalize console, delete these resources in the following order:

  1. Campaign
  2. Recipes
  3. Solutions
  4. Datasets
  5. Dataset groups

Summary

Amazon Personalize makes it easy for developers to add highly personalized recommendations to customers who use their applications. It uses the same machine learning (ML) technology used by Amazon.com for real-time personalized recommendations, with no ML expertise required. This post showed how to use the AWS Step Functions Data Science SDK to automate the process of orchestrating Amazon Personalize workflows, such as creating the dataset group, dataset, dataset import job, solution, solution version, campaign, and recommendations. This automation of creating a recommender with Step Functions can improve the CI/CD pipeline of the model training and deployment process.

For additional technical documentation and example notebooks related to the SDK, see Introducing the AWS Step Functions Data Science SDK for Amazon SageMaker.


About the Authors

Neel Sendas is a Senior Technical Account Manager at Amazon Web Services. Neel works with enterprise customers to design, deploy, and scale cloud applications to achieve their business goals. He has worked on various ML use cases, ranging from anomaly detection to predictive product quality for manufacturing and logistics optimization. When he is not helping customers, he dabbles in golf and salsa dancing.

 

Mona Mona is an AI/ML Specialist Solutions Architect based out of Arlington, VA. She works with the World Wide Public Sector team and helps customers adopt machine learning on a large scale. She is passionate about NLP and ML explain-ability areas in AI/ML.
