Automate feature engineering pipelines with Amazon SageMaker

The process of extracting, cleaning, manipulating, and encoding data from raw sources and preparing it to be consumed by machine learning (ML) algorithms is an important, expensive, and time-consuming part of data science. Managing these data pipelines for either training or inference is a challenge for data science teams, and it takes valuable time away from experimenting with new features or optimizing model performance with different algorithms or hyperparameter tuning.

Many ML use cases such as churn prediction, fraud detection, or predictive maintenance rely on models trained from historical datasets that build up over time. The set of feature engineering steps a data scientist defined and performed on historical data for one time period needs to be applied to any new data after that period, because models trained on historical features must make predictions on features derived from the new data. Instead of manually performing these feature transformations each time new data arrives, data scientists can create a data preprocessing pipeline that performs the desired set of feature engineering steps and runs automatically whenever new raw data is available. Decoupling the data engineering from the data science in this way can be a powerful time-saving practice when done well.

Workflow orchestration tools like AWS Step Functions or Apache Airflow are typically used by data engineering teams to build these kinds of extract, transform, and load (ETL) data pipelines. Although these tools offer comprehensive and scalable options to support many data transformation workloads, data scientists may prefer to use a toolset specific to ML workloads. Amazon SageMaker supports the end-to-end lifecycle for ML projects, including simplifying feature preparation with SageMaker Data Wrangler and feature storage and serving with SageMaker Feature Store.

In this post, we show you how a data scientist working on a new ML use case can use both Data Wrangler and Feature Store to create a set of feature transformations, perform them over a historical dataset, and then use SageMaker Pipelines to automatically transform and store features as new data arrives daily.

For more information about SageMaker Data Wrangler, Feature Store, and Pipelines, see the Amazon SageMaker documentation for each feature.

Overview of solution

The following diagram shows an example end-to-end process from receiving a raw dataset to using the transformed features for model training and predictions. This post describes how to set up your architecture such that each new dataset arriving in Amazon Simple Storage Service (Amazon S3) automatically triggers a pipeline that performs a set of predefined transformations with Data Wrangler and stores the resulting features in Feature Store. You can visit our code repo to try it out in your own account.

Before we set up the architecture for automating feature transformations, we first explore the historical dataset with Data Wrangler, define the set of transformations we want to apply, and store the features in Feature Store.

Dataset

To demonstrate feature pipeline automation, we use an example of preparing features for a flight delay prediction model. We use flight delay data from the US Department of Transportation’s Bureau of Transportation Statistics (BTS), which tracks the on-time performance of domestic US flights. After you try out the approach with this example, you can experiment with the same pattern on your own datasets.

Each record in the flight delay dataset contains information such as:

  • Flight date
  • Airline details
  • Origin and destination airport details
  • Scheduled and actual times for takeoff and landing
  • Delay details

Once the features have been transformed, we can use them to train a machine learning model to predict future flight delays.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account
  • An Amazon SageMaker Studio domain, because the walkthrough uses Data Wrangler and notebooks inside Studio

Upload the historical dataset to Amazon S3

Our code repo provides a link to download the raw flight delay dataset used in this example. The directory flight-delay-data contains two CSV files covering two time periods with the same columns. One file contains flight data from January 1, 2020, through March 30, 2020. The second file contains flight data for a single day: March 31, 2020. We use the first file for the initial feature transformations and the second file to test our feature pipeline automation. In this example, we store the raw dataset in the default S3 bucket associated with our Studio domain, but this isn’t required.
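If you prefer to script the upload instead of using the Amazon S3 console, a minimal sketch with the SageMaker Python SDK looks like the following (it assumes the two CSV files sit in a local flight-delay-data directory; the key prefix is an arbitrary choice):

import sagemaker

# Use the default bucket associated with the Studio domain (any bucket works).
session = sagemaker.Session()
bucket = session.default_bucket()

# Upload both CSV files under a common prefix, for example s3://<bucket>/flight-delay-data/
s3_uri = session.upload_data(
    path="flight-delay-data",        # local directory containing the two CSV files
    bucket=bucket,
    key_prefix="flight-delay-data",  # assumed prefix; adjust as needed
)
print(f"Raw data uploaded to {s3_uri}")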

Feature engineering with Data Wrangler

Whenever a data scientist starts working on a new ML use case, the first step is typically to explore and understand the available data. Data Wrangler provides a fast and easy way to visually inspect datasets and perform exploratory data analysis. In this post, we use Data Wrangler within the Studio IDE to analyze the airline dataset and create the transformations we later automate.

A typical model may have dozens or hundreds of features. To keep our example simple, we show how to create the following feature engineering steps using Data Wrangler:

  • One-hot encoding the airline carrier column
  • Adding a record identifier feature and an event timestamp feature, so that we can export to Feature Store
  • Adding a feature with the aggregate daily count of delays from each origin airport

Data Wrangler walkthrough

To start using Data Wrangler, complete the following steps:

  1. In a Studio domain, on the Launcher tab, choose New data flow.
  2. Import the flight delay dataset jan01_mar30_2020.csv from its location in Amazon S3.

Data Wrangler shows you a preview of the data before importing.

  3. Choose Import dataset.

You’re ready to begin exploring and feature engineering.

Because ML algorithms typically require all input features to be numeric for training and inference, it’s common to transform categorical features into a numerical representation. Here we use one-hot encoding for the airline carrier column, which transforms it into several binary columns, one for each airline carrier present in the data.

  1. Choose the + icon next to the dataset and choose Add Transform.
  2. For the field OP_UNIQUE_CARRIER, select one-hot encoding.
  3. Under Encode Categorical, for Output Style, choose Columns.
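For intuition, this one-hot encoding step is roughly equivalent to the following pandas sketch (Data Wrangler generates its own transform code, so this is purely illustrative; df is assumed to be the imported flight delay DataFrame):

import pandas as pd

# Expand the carrier column into one binary column per airline carrier,
# for example OP_UNIQUE_CARRIER_AA, OP_UNIQUE_CARRIER_DL, and so on.
df = pd.get_dummies(df, columns=["OP_UNIQUE_CARRIER"], prefix="OP_UNIQUE_CARRIER")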

Feature Store requires a unique RecordIdentifier field for each record ingested into the store, so we add a new column to our dataset, RECORD_ID, which is a concatenation of four fields: OP_CARRIER_FL_NUM, ORIGIN, DEP_TIME, and DEST. Feature Store also requires an EventTime feature for each record, so we append a timestamp to FL_DATE in a new column called EVENT_TIME. Here we use Data Wrangler’s custom transform option with Pandas:

# Concatenate four fields to form a unique record identifier for Feature Store.
df['RECORD_ID'] = df['OP_CARRIER_FL_NUM'].astype(str) + df['ORIGIN'] + df['DEP_TIME'].astype(str) + df['DEST']

# Derive the required event timestamp from the flight date.
df['EVENT_TIME'] = df['FL_DATE'].astype(str) + 'T00:00:00Z'

To predict delays for certain flights each day, it’s useful to create aggregated features based on the entities present in the data over different time windows. Providing an ML algorithm with these kinds of features can deliver a powerful signal over and above what contextual information is available for a single record in this raw dataset. Here, we calculate the number of delayed flights from each origin airport over the last day using Data Wrangler’s custom transform option with PySpark SQL:

SELECT *, SUM(ARR_DEL15) OVER w1 AS NUM_DELAYS_LAST_DAY
FROM df
WINDOW w1 AS (
    PARTITION BY ORIGIN
    ORDER BY CAST(EVENT_TIME AS timestamp)
    RANGE INTERVAL 1 DAY PRECEDING
)

In a real use case, we’d likely spend a lot of time at this stage exploring the data, defining transformations, and creating more features. After defining all of the transformations to perform over the dataset, you can export the resulting ML features to Feature Store.

  1. On the Export tab, choose </> under Steps. This displays a list of all the steps you have created.
  2. Choose the last step, then choose Export Step.
  3. On the Export Step drop-down menu, choose Feature Store.

SageMaker generates a Jupyter notebook for you and opens it in a new tab in Studio. This notebook contains everything needed to run the transformations over our historical dataset and ingest the resulting features into Feature Store.

Store features in Feature Store

Now that we’ve defined the set of transformations to apply to our dataset, we need to perform them over the set of historical records and store them in Feature Store, a purpose-built store for ML features, so that we can easily discover and reuse them without needing to reproduce the same transformations from the raw dataset as we have done here. For more information about the capabilities of Feature Store, see Understanding the key capabilities of Amazon SageMaker Feature Store.

Running all code cells in the notebook created in the earlier section completes the following:

  • Creates a feature group
  • Runs a SageMaker Processing job that uses our historical dataset and defined transformations from Data Wrangler as input
  • Ingests the newly transformed historical features into Feature Store
  1. Select the kernel Python 3 (Data Science) in the newly opened notebook tab.
  2. Read through and explore the Jupyter notebook.
  3. In the Create FeatureGroup section of the generated notebook, update the following fields for event time and record identifier with the column names we created in the previous Data Wrangler step (if using your own dataset, your names may differ):
record_identifier_name = "RECORD_ID"
event_time_feature_name = "EVENT_TIME"
  4. Choose Run and then choose Run All Cells.
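For reference, the feature group creation and ingestion that the generated notebook performs boils down to roughly the following sketch with the SageMaker Python SDK (the feature group name and S3 prefix are placeholders, and transformed_df stands in for the DataFrame produced by the Data Wrangler transformations; treat the generated notebook as the authoritative code):

import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes a Studio execution role with Feature Store permissions

# transformed_df: the pandas DataFrame produced by the Data Wrangler transformations (assumed to exist).
feature_group = FeatureGroup(name="flight-delay-features", sagemaker_session=session)
feature_group.load_feature_definitions(data_frame=transformed_df)

feature_group.create(
    s3_uri=f"s3://{session.default_bucket()}/flight-delay-feature-store",  # offline store location
    record_identifier_name="RECORD_ID",
    event_time_feature_name="EVENT_TIME",
    role_arn=role,
    enable_online_store=True,
)

# After the feature group becomes ACTIVE, ingest the historical features.
feature_group.ingest(data_frame=transformed_df, max_workers=4, wait=True)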

Automate data transformations for future datasets

After the Processing job is complete, we’re ready to move on to creating a pipeline that is automatically triggered when new data arrives in Amazon S3, which reproduces the same set of transformations on the new data and constantly refreshes the Feature Store, without any manual intervention needed.

  1. Open a new terminal in Studio and clone our repo by running git clone https://github.com/aws-samples/amazon-sagemaker-automated-feature-transformation.git
  2. Open the Jupyter notebook called automating-feature-transformation-pipeline.ipynb in a new tab.

This notebook walks through the process of creating a new pipeline that runs whenever any new data arrives in the designated S3 location.
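At its core, the pipeline is a single SageMaker Processing step wrapped in a SageMaker Pipeline definition. A heavily simplified sketch is shown below; the role, the Data Wrangler container image URI, and the input S3 location are placeholders, and the notebook in the repo contains the real definitions:

import sagemaker
from sagemaker.processing import Processor, ProcessingInput, ProcessingOutput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

role = sagemaker.get_execution_role()                  # or an explicit IAM role ARN
data_wrangler_image_uri = "<data-wrangler-image-uri>"  # placeholder; see the repo notebook
raw_data_s3_uri = "s3://<bucket>/flight-delay-data/"   # placeholder input location

# Placeholder processor: in the repo notebook this is the Data Wrangler processing
# container configured with the exported .flow file.
processor = Processor(
    role=role,
    image_uri=data_wrangler_image_uri,
    instance_count=1,
    instance_type="ml.m5.4xlarge",
)

transform_step = ProcessingStep(
    name="FeatureTransformStep",
    processor=processor,
    inputs=[ProcessingInput(source=raw_data_s3_uri, destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output")],
)

pipeline = Pipeline(name="feature-transformation-pipeline", steps=[transform_step])
pipeline.upsert(role_arn=role)  # create or update the pipeline definition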

  3. After running the code in that notebook, we upload one new day’s worth of flight delay data, mar31_2020.csv, to Amazon S3.

A run of our newly created pipeline is automatically triggered to create features from this data and ingest them into Feature Store. You can monitor progress and see past runs on the Pipelines tab in Studio.

Our example pipeline only has one step to perform feature transformations, but you can easily add subsequent steps like model training, deployment, or batch predictions if it fits your particular use case. For a more in-depth look at SageMaker Pipelines, see Building, automating, managing, and scaling ML workflows using Amazon SageMaker Pipelines.

We use an S3 event notification with an AWS Lambda function destination to trigger a run of the feature transformation pipeline. Alternatively, you can schedule pipeline runs with Amazon EventBridge, which lets you trigger pipelines in response to events such as training job or endpoint status changes, or run your feature pipeline on a specific schedule.
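The Lambda function behind the S3 event notification can be as small as the following sketch (the pipeline name is a placeholder, and the function’s execution role needs the sagemaker:StartPipelineExecution permission):

import json
import boto3

sagemaker_client = boto3.client("sagemaker")

def lambda_handler(event, context):
    # Triggered by an S3 ObjectCreated event; each record describes one new object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New object s3://{bucket}/{key}, starting the feature pipeline")

        sagemaker_client.start_pipeline_execution(
            PipelineName="feature-transformation-pipeline",  # placeholder name
        )

    return {"statusCode": 200, "body": json.dumps("Pipeline execution started")}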

Conclusion

In this post, we showed how you can use a combination of Data Wrangler, Feature Store, and Pipelines to transform data as it arrives in Amazon S3 and store the engineered features automatically into Feature Store. We hope you try this solution and let us know what you think. We’re always looking forward to your feedback, either through your usual AWS support contacts or on the SageMaker Discussion Forum.


About the Authors

Muhammad Khas is a Solutions Architect working in the Public Sector team at Amazon Web Services. He enjoys supporting customers in using artificial intelligence and machine learning to enhance their decision-making. Outside of work, Muhammad enjoys swimming and horseback riding.

 

 

Megan Leoni is an AI/ML Specialist Solutions Architect for AWS, helping customers across Europe, Middle East, and Africa design and implement ML solutions. Prior to joining AWS, Megan worked as a data scientist building and deploying real-time fraud detection models.

 

 

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Mark holds six AWS certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services.


Learn how the winner of the AWS DeepComposer Chartbusters Keep Calm and Model On challenge used Transformer algorithms to create music

AWS is excited to announce the winner of the AWS DeepComposer Chartbusters Keep Calm and Model On challenge, Nari Koizumi. AWS DeepComposer gives developers a creative way to get started with machine learning (ML) by creating an original piece of music in collaboration with artificial intelligence (AI). In June 2020, we launched Chartbusters, a global competition where developers use AWS DeepComposer to create original AI-generated compositions and compete to showcase their ML skills. The Keep Calm and Model On challenge, which ran from December 2020 to January 2021, challenged developers to use the newly launched Transformers algorithm to extend an input melody by up to 30 seconds and create new and interesting musical scores.

We interviewed Nari to learn more about his experience competing in the Keep Calm and Model On Chartbusters challenge, and asked him to tell us more about how he created his winning composition.

Learning about AWS DeepComposer

Nari currently works in the TV and media industry and describes himself as a creator. Before getting started with AWS DeepComposer, Nari had no prior ML experience.

“I have no educational background in machine learning, but I’m an artist and creator. I always look for artificial intelligence services for creative purposes. I’m working on a project, called Project 52, which is about making artwork everyday. I always set a theme each month, and this month’s theme was about composition and audio visualization.”

Nari discovered AWS DeepComposer when he was gathering ideas for his new project.

“I was searching one day for ‘AI composition music’, and that’s how I found out about AWS DeepComposer. I knew that AWS had many, many services and I was surprised that AWS was doing something with entertainment and AI.”

Nari at his work station.

Building in AWS DeepComposer

Nari saw AWS DeepComposer as an opportunity to see how he could combine his creative side with his interest in learning more about AI. To get started, Nari first played around in the AWS DeepComposer Music Studio and used the learning capsules provided to understand the generative AI models offered by AWS DeepComposer.

“I thought AWS DeepComposer was very easy to use and make music. I checked through all the learning capsules and pages to help get started.”

For the Keep Calm and Model On Chartbusters challenge, participants were challenged to use the newly launched Transformers algorithm, which can extend an input melody by up to 30 seconds. The Transformer is a state-of-the-art model that works with sequential data such as predicting stock prices, or natural language tasks such as translation. Learn more about the Transformer technique in the learning capsule provided on the AWS DeepComposer console.

“I used my keyboard and connected it to the Music Studio, and made a short melody and recorded in the Music Studio. What’s interesting is you can extend your own melody using Transformers and it will make a 30-second song from only 5 seconds of input. That was such an interesting moment for me; how I was able to input a short melody, and AI created the rest of the song.”

The Transformers feature used in Nari’s composition in the AWS DeepComposer Music Studio.

After playing around with his keyboard, Nari chose one of the input melodies. The Transformers model allows developers to experiment with parameters such as creative risk, track length, and note length.

“I chose one of the melodies provided, and then played around with a couple parameters. I made seven songs, and tweaked until I liked the final output. You can also export the MIDI file and continue to play around with parts of the song. That was a fun part, because I exported the file and continued to play with the melody to customize with other instruments. It was so much fun playing around and making different sounds.”

Nari composing his melody.

You can listen to Nari’s winning composition “P.S. No. 11 Ext.” on the AWS DeepComposer SoundCloud page. Check out Nari’s Instagram, where he created audio visualization to one of the tracks he created using AWS DeepComposer.

Conclusion

Nari found competing in the challenge to be a rewarding experience because he was able to go from no experience in ML to developing an understanding of generative AI in less than an hour.

“What’s great about AWS DeepComposer is it’s easy to use. I think AWS has so many services and many can be hard or intimidating to get started with for those who aren’t programmers. When I first found out about AWS DeepComposer, I knew it was exciting. But at the same time, I thought it was AWS and I’m not an engineer and I wasn’t sure if I had the knowledge to get started. But even the setup was super easy, and it took only 15 minutes to get started, so it was very easy to use.”

Nari is excited to see how AI will continue to transform the creative industry.

“Even though I’m not an engineer or programmer, I know that AI has huge potential for creative purposes. I think it’s getting more interesting in creating artwork with AI. There’s so much potential with AI not just within music, but also in the media world in general. It’s a pretty exciting future.”

By participating in the challenge, Nari hopes that he will inspire future participants to get started in ML.

“I’m on the creative side, so I hope I can be a good example that someone who’s not an engineer or programmer can create something with AWS DeepComposer. Try it out, and you can do it!”

Congratulations to Nari for his well-deserved win!

We hope Nari’s story inspired you to learn more about ML and AWS DeepComposer. Check out the new skill-based AWS DeepComposer Chartbusters challenge and start composing today.


About the Authors

Paloma Pineda is a Product Marketing Manager for AWS Artificial Intelligence Devices. She is passionate about the intersection of technology, art, and human centered design. Out of the office, Paloma enjoys photography, watching foreign films, and cooking French cuisine.


KELM: Integrating Knowledge Graphs with Language Model Pre-training Corpora

Posted by Siamak Shakeri, Staff Software Engineer and Oshin Agarwal, Research Intern, Google Research

Large pre-trained natural language processing (NLP) models, such as BERT, RoBERTa, GPT-3, T5 and REALM, leverage natural language corpora that are derived from the Web, are fine-tuned on task-specific data, and have made significant advances in various NLP tasks. However, natural language text alone represents a limited coverage of knowledge, and facts may be contained in wordy sentences in many different ways. Furthermore, the existence of non-factual information and toxic content in text can eventually cause biases in the resulting models.

Alternate sources of information are knowledge graphs (KGs), which consist of structured data. KGs are factual in nature because the information is usually extracted from more trusted sources, and post-processing filters and human editors ensure inappropriate and incorrect content is removed. Therefore, models that can incorporate them carry the advantages of improved factual accuracy and reduced toxicity. However, their different structural format makes it difficult to integrate them with the existing pre-training corpora in language models.

In “Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training” (KELM), accepted at NAACL 2021, we explore converting KGs to synthetic natural language sentences to augment existing pre-training corpora, enabling their integration into the pre-training of language models without architectural changes. To that end, we leverage the publicly available English Wikidata KG and convert it into natural language text in order to create a synthetic corpus. We then augment REALM, a retrieval-based language model, with the synthetic corpus as a method of integrating natural language corpora and KGs in pre-training. We have released this corpus publicly for the broader research community.

Converting KG to Natural Language Text
KGs consist of factual information represented explicitly in a structured format, generally in the form of [subject entity, relation, object entity] triples, e.g., [10×10 photobooks, inception, 2012]. A group of related triples is called an entity subgraph. An example of an entity subgraph that builds on the previous example of a triple is { [10×10 photobooks, instance of, Nonprofit Organization], [10×10 photobooks, inception, 2012] }, which is illustrated in the figure below. A KG can be viewed as interconnected entity subgraphs.

Converting subgraphs into natural language text is a standard task in NLP known as data-to-text generation. Although there have been significant advances on data-to-text-generation on benchmark datasets such as WebNLG, converting an entire KG into natural text has additional challenges. The entities and relations in large KGs are more vast and diverse than small benchmark datasets. Moreover, benchmark datasets consist of predefined subgraphs that can form fluent meaningful sentences. With an entire KG, such a segmentation into entity subgraphs needs to be created as well.

An example illustration of how the pipeline converts an entity subgraph (in bubbles) into synthetic natural sentences (far right).

In order to convert the Wikidata KG into synthetic natural sentences, we developed a verbalization pipeline named “Text from KG Generator” (TEKGEN), which is made up of the following components: a large training corpus of heuristically aligned Wikipedia text and Wikidata KG triples, a text-to-text generator (T5) to convert the KG triples to text, an entity subgraph creator for generating groups of triples to be verbalized together, and finally, a post-processing filter to remove low quality outputs. The result is a corpus containing the entire Wikidata KG as natural text, which we call the Knowledge-Enhanced Language Model (KELM) corpus. It consists of ~18M sentences spanning ~45M triples and ~1500 relations.

Converting a KG to natural language, which is then used for language model augmentation

Integrating Knowledge Graph and Natural Text for Language Model Pre-training
Our evaluation shows that KG verbalization is an effective method of integrating KGs with natural language text. We demonstrate this by augmenting the retrieval corpus of REALM, which includes only Wikipedia text.

To assess the effectiveness of verbalization, we augment the REALM retrieval corpus with the KELM corpus (i.e., “verbalized triples”) and compare its performance against augmentation with concatenated triples without verbalization. We measure the accuracy with each data augmentation technique on two popular open-domain question answering datasets: Natural Questions and Web Questions.

Augmenting REALM with even the concatenated triples improves accuracy, potentially adding information not expressed in text explicitly or at all. However, augmentation with verbalized triples allows for a smoother integration of the KG with the natural language text corpus, as demonstrated by the higher accuracy. We also observed the same trend on a knowledge probe called LAMA that queries the model using fill-in-the-blank questions.

Conclusion
With KELM, we provide a publicly-available corpus of a KG as natural text. We show that KG verbalization can be used to integrate KGs with natural text corpora to overcome their structural differences. This has real-world applications for knowledge-intensive tasks, such as question answering, where providing factual knowledge is essential. Moreover, such corpora can be applied in pre-training of large language models, and can potentially reduce toxicity and improve factuality. We hope that this work encourages further advances in integrating structured knowledge sources into pre-training of large language models.

Acknowledgements
This work has been a collaborative effort involving Oshin Agarwal, Heming Ge, Siamak Shakeri and Rami Al-Rfou. We thank William Woods, Jonni Kanerva, Tania Rojas-Esponda, Jianmo Ni, Aaron Cohen and Itai Rolnick for rating a sample of the synthetic corpus to evaluate its quality. We also thank Kelvin Guu for his valuable feedback on the paper.


GFN Thursday Plunges Into ‘Phantom Abyss,’ the New Adventure Announced by Devolver Digital

GFN Thursday returns with a brand new adventure, exploring the unknown in Phantom Abyss, announced just moments ago by Devolver Digital and Team WIBY. The game launches on PC this summer, and when it does, it’ll be streaming instantly to GeForce NOW members.

No GFN Thursday would be complete without new games. And this week is chock full of new-new ones. Five PC games launching this week are joining the GeForce NOW library — including Saints Row: The Third Remastered — plus twelve additional titles.

It’s a good week to have your head in the clouds.

Adventure From the Clouds

Devolver and Team WIBY have an exciting spin on dungeon-delving, and they’re bringing it to GeForce NOW members at release.

“Phantom Abyss and GeForce NOW are a perfect match,” said Josh Sanderson, lead programmer at Team WIBY. “We can’t wait for gamers to explore each dungeon, and now those adventurers can stream from the cloud even if their gaming PC isn’t up to par.”

Members who purchase the game on Steam will be able to play Phantom Abyss across nearly all of their devices, at legendary GeForce quality. Away from your rig but looking for a challenge? Stream to your Android mobile device, Chromebook or any other supported device. You can bring the action with you, wherever you are.

Read on to learn more about Devolver’s freshly announced Phantom Abyss.

You’ll face perils in each of Phantom Abyss’ tombs.

Temple of Zoom

Phantom Abyss is a massive asynchronous multiplayer game that casts players into procedurally generated temples and tasks them with retrieving the sacred relics hidden within deadly chambers.

You’ll need to dodge hidden traps, leap across chasms and defeat foes as you navigate each labyrinth to claim the relic at the end. Oh, and you only get one chance per temple. Luckily, you’re joined on your quest by the phantoms of adventurers who tried before you, and you can learn from their run to aid your own.

To survive, you’ll need to learn from those who came before you.

Explore the perilous halls and colossal rooms of each temple alongside the phantoms of fallen players that came before you. Use their successes and failures to your advantage to progress deeper than they ever could’ve hoped. Watch and learn from the mistakes of up to 50 phantoms, including your Steam friends who have attempted the same temple, and steal their whips as they fall.

Those stolen whips matter, as they carry minor blessings to aid you on your run. But beware: they’re also cursed, and balancing the banes and boons is key to reaching your ultimate prize.

The competition is fierce. Succeed where others have failed, and the treasures of each tomb will be yours.

As you progress on each run, you’ll recover keys from chests to unlock deeper, deadlier sections of the temple that house more coveted relics. The more difficult a relic is to obtain, the greater the reward.

And if you succeed, the legendary relic at the bottom of each temple will be yours, sealing the temple and cementing your legacy.

Exclusive Tips & Tricks

The folks at Devolver shared a few tips and tricks to help you start your adventure in Phantom Abyss.

Before each run you can select between two standard whips and one legendary one that can be acquired in exchange for tokens that you’ve found in previous runs. Select carefully though, as each whip has its own blessings and curses, so it’s important to find the whip that boosts your play style.

The phantoms aren’t just fun to watch meet their demise; they can also be a helpful guide on your run! Phantoms can set off traps for you, which can be advantageous but also unexpected, so stay on your toes. If a phantom dies in front of you, you can pick up its whip if you find that it’s more beneficial to you on the run.

The guardians are relentless so always keep them in mind the deeper you get into a temple — they tend to cause complete chaos when in the same room as you!

Each temple has different levels and as players move down they can choose to take a more common relic and secure their lesser success or choose a different door to venture further into the Caverns and then the Inferno for even more treasure and glory.

Remastered and Returning

The remastered version of Volition’s classic returns to GeForce NOW.

Saints Row: The Third Remastered is both joining and returning to GeForce NOW. The game launches on Steam later this week, and we’ll be working to bring it to GeForce NOW after it’s released. To coincide with the launch, the Epic Games Store version of the game will make its triumphant return to GeForce NOW.

The remastered edition includes enhanced graphics, plus all DLC from the original version: the three expansion mission packs and 30 pieces of DLC.

Get Your Game On

That’s only the beginning. GFN Thursday means more games, and this week’s list includes four more day-and-date releases. Members can look forward to the following this week:

  • Snowrunner (day-and-date release on Steam, May 17)
  • Siege Survival Gloria Victis (day-and-date release on Steam, May 18)
  • Just Die Already (day-and-date release on Steam, May 20)
  • 41 Hours (day-and-date release on Steam, May 21)
  • Saints Row: The Third Remastered (Steam, May 22 and Epic Games Store)
  • Bad North (Steam)
  • Beyond Good & Evil (Ubisoft Connect)
  • Chess Ultra (Steam)
  • Groove Coaster (Steam)
  • Hearts of Iron 2: Complete (Steam)
  • Monster Prom (Steam)
  • OneShot (Steam)
  • Outlast 2 (Steam)
  • Red Wings: Aces of the Sky (Steam)
  • Space Invaders Extreme (Steam)
  • Warlock: Master of the Arcane (Steam)
  • WRC 8 Fia World Rally Championship (Epic Games Store)

Ready to brave the abyss on GeForce NOW this summer? Join the conversation on Twitter or in the comments below.



11 ways we’re innovating with AI

AI is integral to so much of the work we do at Google. Fundamental advances in computing are helping us confront some of the greatest challenges of this century, like climate change. Meanwhile, AI is also powering updates across our products, including Search, Maps and Photos — demonstrating how machine learning can improve your life in both big and small ways. 

In case you missed it, here are some of the AI-powered updates we announced at Google I/O.


LaMDA is a breakthrough in natural language understanding for dialogue.

Human conversations are surprisingly complex. They’re grounded in concepts we’ve learned throughout our lives; are composed of responses that are both sensible and specific; and unfold in an open-ended manner. LaMDA — short for “Language Model for Dialogue Applications” — is a machine learning model designed for dialogue and built on Transformer, a neural network architecture that Google invented and open-sourced. We think that this early-stage research could unlock more natural ways of interacting with technology and entirely new categories of helpful applications. Learn more about LaMDA.


And MUM, our new AI language model, will eventually help make Google Search a lot smarter.

In 2019 we launched BERT, a Transformer AI model that can better understand the intent behind your Search queries. Multitask Unified Model (MUM), our latest milestone, is 1000x more powerful than BERT. It can learn across 75 languages at once (most AI models train on one language at a time), and it can understand information across text, images, video and more. We’re still in the early days of exploring MUM, but the goal is that one day you’ll be able to type a long, information-dense, and natural sounding query like “I’ve hiked Mt. Adams and now want to hike Mt. Fuji next fall, what should I do differently to prepare?” and more quickly find relevant information you need. Learn more about MUM.

 

Project Starline will help you feel like you’re there, together.

Imagine looking through a sort of magic window. And through that window, you see another person, life-size, and in three dimensions. You can talk naturally, gesture and make eye contact.

We brought in people to reconnect using Project Starline.

Project Starline is a technology project that combines advances in hardware and software to enable friends, family and co-workers to feel together, even when they’re cities (or countries) apart. To create this experience, we’re applying research in computer vision, machine learning, spatial audio and real-time compression. And we’ve developed a light field display system that creates a sense of volume and depth without needing additional glasses or headsets. It feels like someone is sitting just across from you, like they’re right there. Learn more about Project Starline.

Within a decade, we’ll build the world’s first useful, error-corrected quantum computer. And our new Quantum AI campus is where it’ll happen. 

Confronting many of the world’s greatest challenges, from climate change to the next pandemic, will require a new kind of computing. A useful, error-corrected quantum computer will allow us to mirror the complexity of nature, enabling us to develop new materials, better batteries, more effective medicines and more. Our new Quantum AI campus — home to research offices, a fabrication facility, and our first quantum data center — will help us build that computer before the end of the decade. Learn more about our work on the Quantum AI campus.


Maps will help reduce hard-braking moments while you drive.

Soon, Google Maps will use machine learning to reduce your chances of experiencing hard-braking moments — incidents where you slam hard on your brakes, caused by things like sudden traffic jams or confusion about which highway exit to take. 

When you get directions in Maps, we calculate your route based on a lot of factors, like how many lanes a road has or how direct the route is. With this update, we’ll also factor in the likelihood of hard-braking. Maps will identify the two fastest route options for you, and then we’ll automatically recommend the one with fewer hard-braking moments (as long as your ETA is roughly the same). We believe these changes have the potential to eliminate over 100 million hard-braking events in routes driven with Google Maps each year. Learn more about our updates to Maps.


Your Memories in Google Photos will become even more personalized.

With Memories, you can already look back on important photos from years past or highlights from the last week. Using machine learning, we’ll soon be able to identify the less-obvious patterns in your photos. Starting later this summer, when we find a set of three or more photos with similarities like shape or color, we’ll highlight these little patterns for you in your Memories. For example, Photos might identify a pattern of your family hanging out on the same couch over the years — something you wouldn’t have ever thought to search for, but that tells a meaningful story about your daily life. Learn more about our updates to Google Photos.


And Cinematic moments will bring your pictures to life.

When you’re trying to get the perfect photo, you usually take the same shot two or three (or 20) times. Using neural networks, we can take two nearly identical images and fill in the gaps by creating new frames in between. This creates vivid, moving images called Cinematic moments. 

Producing this effect from scratch would take professional animators hours, but with machine learning we can automatically generate these moments and bring them to your Recent Highlights. Best of all, you don’t need a specific phone; Cinematic moments will come to everyone across Android and iOS. Learn more about Cinematic moments in Google Photos.

Two very similar pictures of a child and their baby sibling get transformed into a moving image thanks to AI.

Cinematic moments bring your pictures to life, thanks to AI.

New features in Google Workspace help make collaboration more inclusive. 

In Google Workspace, assisted writing will suggest more inclusive language when applicable. For example, it may recommend that you use the word “chairperson” instead of “chairman” or “mail carrier” instead of “mailman.” It can also give you other stylistic suggestions to avoid passive voice and offensive language, which can speed up editing and help make your writing stronger. Learn more about our updates to Workspace.

Google Shopping shows you the best products for your particular needs, thanks to our Shopping Graph.

To help shoppers find what they’re looking for, we need to have a deep understanding of all the products that are available, based on information from images, videos, online reviews and even inventory in local stores. Enter the Shopping Graph: our AI-enhanced model tracks products, sellers, brands, reviews, product information and inventory data — as well as how all these attributes relate to one another. With people shopping across Google more than a billion times a day, the Shopping Graph makes those sessions more helpful by connecting people with over 24 billion listings from millions of merchants across the web. Learn how we’re working with merchants to give you more ways to shop.

A dermatology assist tool can help you figure out what’s going on with your skin.

Each year we see billions of Google Searches related to skin, nail and hair issues, but it can be difficult to describe what you’re seeing on your skin through words alone.

With our CE marked AI-powered dermatology assist tool, a web-based application that we aim to make available for early testing in the EU later this year, it’s easier to figure out what might be going on with your skin. Simply use your phone’s camera to take three images of the skin, hair or nail concern from different angles. You’ll then be asked questions about your skin type, how long you’ve had the issue and other symptoms that help the AI to narrow down the possibilities. The AI model analyzes all of this information and draws from its knowledge of 288 conditions to give you a list of possible conditions that you can then research further. It’s not meant to be a replacement for diagnosis, but rather a good place to start. Learn more about our AI-powered dermatology assist tool.

And AI could help improve screening for tuberculosis.

Tuberculosis (TB) is one of the leading causes of death worldwide, infecting 10 million people per year and disproportionately impacting people in low-to-middle-income countries. It’s also really tough to diagnose early because of how similar symptoms are to other respiratory diseases. Chest X-rays help with diagnosis, but experts aren’t always available to read the results. That’s why the World Health Organization (WHO) recently recommended using technology to help with screening and triaging for TB. Researchers at Google are exploring how AI can be used to identify potential TB patients for follow-up testing, hoping to catch the disease early and work to eradicate it. Learn more about our ongoing research into tuberculosis screening.


High Fidelity Pose Tracking with MediaPipe BlazePose and TensorFlow.js

Posted by Ivan Grishchenko, Valentin Bazarevsky and Na Li, Google Research

Today we’re excited to launch MediaPipe’s BlazePose in our new pose-detection API. BlazePose is a high-fidelity body pose model designed specifically to support challenging domains like yoga, fitness and dance. It can detect 33 keypoints, extending the 17-keypoint topology of the original PoseNet model we launched a couple of years ago. These additional keypoints provide vital information about face, hands, and feet location with scale and rotation. Together with our face and hand models they can be used to unlock various domain-specific applications like gesture control or sign language without special hardware. With today’s release we enable developers to use the same models on the web that are powering ML Kit Pose and MediaPipe Python, unlocking the same great performance across all devices.

The new TensorFlow.js pose-detection API supports two runtimes: TensorFlow.js and MediaPipe. TensorFlow.js provides the flexibility and wider adoption of JavaScript, optimized for several backends including WebGL (GPU), WASM (CPU), and Node. MediaPipe capitalizes on WASM with GPU accelerated processing and provides faster out-of-the-box inference speed. The MediaPipe runtime currently lacks Node and iOS Safari support, but we’ll be adding the support soon.

Try out the live demo!

BlazePose tracks 33 keypoints across a variety of complex poses, such as dancing, stretching, and exercising, in real time.

Installation

To use BlazePose with the new pose-detection API, you have to first decide whether to use the TensorFlow.js runtime or MediaPipe runtime. To understand the advantages of each runtime, check the performance and loading times section later in this document for further details.

For each runtime, you can use either script tag or NPM for installation.

Using TensorFlow.js runtime:

  1. Through script tag:
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgl"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/pose-detection"></script>
  2. Through NPM:
    yarn add @tensorflow/tfjs-core @tensorflow/tfjs-converter
    yarn add @tensorflow/tfjs-backend-webgl
    yarn add @tensorflow-models/pose-detection

Using MediaPipe runtime:

  1. Through script tag:
    <script src="https://cdn.jsdelivr.net/npm/@mediapipe/pose"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/pose-detection"></script>
  2. Through NPM:
    yarn add @mediapipe/pose
    yarn add @tensorflow-models/pose-detection

Try it yourself!

Once the package is installed, you only need to follow the few steps below to start using it. There are three variants of the model: lite, full, and heavy. The model accuracy increases from lite to heavy, while the inference speed decreases and memory footprint increases. The heavy variant is intended for applications that require high accuracy, while the lite variant is intended for latency-critical applications. The full variant is a balanced option, which is also the default option here.

Using TensorFlow.js runtime:

// Import TFJS runtime with side effects.
import '@tensorflow/tfjs-backend-webgl';
import * as poseDetection from '@tensorflow-models/pose-detection';

// Create a detector.
const detector = await poseDetection.createDetector(poseDetection.SupportedModels.BlazePose, {runtime: 'tfjs'});

Using MediaPipe runtime:

// Import MediaPipe runtime with side effects.
import '@mediapipe/pose';
import * as poseDetection from '@tensorflow-models/pose-detection';

// Create a detector.
const detector = await poseDetection.createDetector(poseDetection.SupportedModels.BlazePose, {runtime: 'mediapipe'});

You can also choose the lite or the heavy variant by setting the modelType field, as shown below:

// Create a detector.
const detector = await poseDetection.createDetector(poseDetection.SupportedModels.BlazePose, {runtime, modelType: 'lite'});

Once you have a detector, you can pass in a video stream to detect poses:

// Pass in a video stream to the model to detect poses.
const video = document.getElementById('video');
const poses = await detector.estimatePoses(video);

Each pose contains 33 keypoints, with absolute x, y coordinates, confidence score and name:

console.log(poses[0].keypoints);
// Outputs:
// [
// {x: 230, y: 220, score: 0.9, name: "nose"},
// {x: 212, y: 190, score: 0.8, name: "left_eye_inner"},
// ...
// ]

Refer to our ReadMe (TFJS runtime, MediaPipe runtime) for more details about the API.

As you begin to play and develop with BlazePose, we would appreciate your feedback and contributions. If you make something using this model, tag it with #MadeWithTFJS on social media so we can find your work, as we would love to see what you create.

Model deep dive

BlazePose provides real-time human body pose perception in the browser, working up to 4 meters from the camera.

We trained BlazePose specifically for highly demanded single-person use cases like yoga, fitness, and dance which require precise tracking of challenging postures, enabling the overlay of digital content and information on top of the physical world in augmented reality, gesture control, and quantifying physical exercises.

For pose estimation, we utilize our proven two-step detector-tracker ML pipeline. Using a detector, this pipeline first locates the pose region-of-interest (ROI) within the frame. The tracker subsequently predicts all 33 pose keypoints from this ROI. Note that for video use cases, the detector is run only on the first frame. For subsequent frames we derive the ROI from the previous frame’s pose keypoints as discussed below.

BlazePose architecture.

BlazePose’s topology contains 33 points, extending the 17-point COCO topology with additional points on the palms and feet. These extra points supply the scale and orientation information for limbs that COCO lacks, which is vital for practical applications like fitness, yoga and dance.

Since the BlazePose CVPR’2020 release, MediaPipe has been constantly improving the models’ quality to remain state-of-the-art on the web / edge for single person pose tracking. Besides running through the TensorFlow.js pose-detection API, BlazePose is also available on Android, iOS and Python via MediaPipe and ML Kit. For detailed information, read the Google AI Blog post and the model card.
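For comparison with the JavaScript API above, the MediaPipe Python solution exposes the same models in a few lines. The following is a minimal sketch that assumes the mediapipe and opencv-python packages are installed; the image path is a placeholder:

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# model_complexity selects the lite (0), full (1), or heavy (2) variant.
with mp_pose.Pose(static_image_mode=True, model_complexity=1) as pose:
    image = cv2.imread("pose.jpg")  # placeholder image path
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    if results.pose_landmarks:
        # 33 normalized landmarks, mirroring the keypoint topology described above.
        for landmark in results.pose_landmarks.landmark:
            print(landmark.x, landmark.y, landmark.visibility)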

BlazePose Browser Performance

TensorFlow.js continuously seeks opportunities to bring the latest and fastest runtime for browsers. To achieve the best performance for this BlazePose model, in addition to the TensorFlow.js runtime (w/ WebGL backend) we further integrated with the MediaPipe runtime via the MediaPipe JavaScript Solutions. The MediaPipe runtime leverages WASM to utilize state-of-the-art pipeline acceleration available across platforms, which also powers Google products such as Google Meet.

Inference speed:

To quantify the inference speed of BlazePose, we benchmark the model across multiple devices.

Runtime                    MacBook Pro 15” 2019         iPhone 11       Pixel 5          Desktop
                           Intel Core i9,               (FPS)           (FPS)            Intel i9-10900K,
                           AMD Radeon Pro Vega 20                                        Nvidia GTX 1070 GPU
                           (FPS)                                                         (FPS)

MediaPipe Runtime          92 | 81 | 38                 N/A             32 | 22 | N/A    160 | 140 | 98
(WASM with GPU accel.)

TensorFlow.js Runtime      48 | 53 | 28                 34 | 30 | N/A   13 | 11 | 5      44 | 40 | 30
(WebGL backend)

Inference speed of BlazePose across different devices and runtimes. The first number in each cell is for the lite model, the second for the full model, and the third for the heavy model. Certain model type and runtime combinations do not work at the time of this release, and we will be adding the support soon.

To see the model’s FPS on your device, try our demo. You can switch the model type and runtime live in the demo UI to see what works best for your device.

Loading times:

Bundle size can affect the initial page loading experience, such as Time-To-Interactive (TTI), UI rendering, and so on. We evaluate the pose-detection API with the two runtime options. The bundle size affects file fetching time and UI smoothness, because processing the code and loading it into memory competes with UI rendering on the CPU. It also affects when the model is available to make inferences.

There is a difference of how things are loaded between the two runtimes. For the MediaPipe runtime, only the @tensorflow-models/pose-detection and the @mediapipe/pose library are loaded at initial page download; the runtime and the model assets are loaded when the createDetector method is called. For the TF.js runtime with WebGL backend, the runtime is loaded at initial page download; only the model assets are loaded when the createDetector method is called. The TensorFlow.js package sizes can be further reduced with a custom bundle technique. Also, if your application is currently using TensorFlow.js, you don’t need to load those packages again, models will share the same TensorFlow.js runtime. Choose the runtime that best suits your latency and bundle size requirements. A summary of loading times and bundle sizes is provided below:

                                      Bundle Size              Average Loading Time
                                      (gzipped + minified)     (WiFi: download speed 100Mbps)

MediaPipe Runtime
    Initial Page Load                 22.1KB                   0.04s
    Initial Detector Creation:
        Runtime                       1.57MB
        Lite model                    10.6MB                   1.91s
        Full model                    14MB                     1.91s
        Heavy model                   34.9MB                   4.82s

TensorFlow.js Runtime
    Initial Page Load                 162.6KB                  0.07s
    Initial Detector Creation:
        Lite model                    10.4MB                   1.91s
        Full model                    13.8MB                   1.91s
        Heavy model                   34.7MB                   4.82s

Bundle size and loading time analysis for MediaPipe and TF.js runtime. The loading time is estimated based on a simulated WiFi network with 100Mbps download speed and includes time from request sent to content downloaded, see what is included in more detail here.

Looking ahead

In the future, we plan to extend TensorFlow.js pose-detection API with new features like BlazePose GHUM 3D pose. We also plan to speed up the TensorFlow.js WebGL backend to make model execution even faster. This will be achieved through repeated benchmarking and backend optimization, such as operator fusion. We will also bring Node.js support in the near future.

Acknowledgements

We would like to acknowledge our colleagues, who participated in creating BlazePose GHUM 3D: Eduard Gabriel Bazavan, Cristian Sminchisescu, Tyler Zhu, the other contributors to MediaPipe: Chuo-Ling Chang, Michael Hays and Ming Guang Yong, along with those involved with the TensorFlow.js pose-detection API: Ping Yu, Sandeep Gupta, Jason Mayes, and Masoud Charkhabi.


Maysam Moussalem teaches Googlers human-centered AI

Originally, Maysam Moussalem dreamed of being an architect. “When I was 10, I looked up to see the Art Nouveau dome over the Galeries Lafayette in Paris, and I knew I wanted to make things like that,” she says. “Growing up between Austin, Paris, Beirut and Istanbul just fed my love of architecture.” But she found herself often talking to her father, a computer science (CS) professor, about what she wanted in a career. “I always loved art and science and I wanted to explore the intersections between fields. CS felt broader to me, and so I ended up there.”

While in grad school for CS, her advisor encouraged her to apply for a National Science Foundation Graduate Research Fellowship. “Given my lack of publications at the time, I wasn’t sure I should apply,” Maysam remembers. “But my advisor gave me some of the best advice I’ve ever received: ‘If you try, you may not get it. But if you don’t try, you definitely won’t get it.’” Maysam received the scholarship, which supported her throughout grad school. “I’ll always be grateful for that advice.” 

Today, Maysam works in AI, in Google’s Machine Learning Education division and also as the co-author and editor-in-chief of the People + AI Research (PAIR) Guidebook. She’s hosting a session at Google I/O on “Building trusted AI products” as well, which you can view when it’s live at 9 am PT Thursday, May 20, as a part of Google Design’s I/O Agenda. We recently took some time to talk to Maysam about what landed her at Google, and her path toward responsible innovation.

How would you explain your job to someone who isn’t in tech?

I create different types of training, like workshops and labs for Googlers who work in machine learning and data science. I also help create guidebooks and courses that people who don’t work at Google use.

What’s something you didn’t realize would help you in your career one day?

I didn’t think that knowing seven languages would come in handy for my work here, but it did! When I was working on the externalization of the Machine Learning Crash Course, I was so happy to be able to review modules and glossary entries for the French translation!

How do you apply Google’s AI Principles in your work? 

I’m applying the AI Principles whenever I’m helping teams learn best practices for building user-centered products with AI. It’s so gratifying when someone who’s taken one of my classes tells me they had a great experience going through the training, they enjoyed learning something new and they feel ready to apply it in their work. Just like when I was an engineer, anytime someone told me the tool I’d worked on helped them do their job better and addressed their needs, it drove home the fourth AI principle: Being accountable to people. It’s so important to put people first in our work. 

This idea was really important when I was working on Google’s People + AI Research (PAIR) Guidebook. I love PAIR’s approach of putting humans at the center of product development. It’s really helpful when people in different roles come together and pool their skills to make better products. 

How did you go from being an engineer to doing what you’re doing now? 

At Google, it feels like I don’t have to choose between learning and working. There are tech talks every week, plus workshops and codelabs constantly. I’ve loved continuing to learn while working here.

Being raised by two professors also gave me a love of teaching. I wanted to share what I’d learned with others. My current role enables me to do this and use a wider range of my skills.

My background as an engineer gives me a strong understanding of how we build software at Google’s scale. This inspires me to think more about how to bring education into the engineering workflow, rather than forcing people to learn from a disconnected experience.

How can aspiring AI thinkers and future technologists prepare for a career in responsible innovation? 

Pick up and exercise a variety of skills! I’m a technical educator, but I’m always happy to pick up new skills that aren’t traditionally specific to my job. For example, I was thinking of a new platform to deliver internal data science training, and I learned how to create a prototype using UX tools so that I could illustrate my ideas really clearly in my proposal. I write, code, teach, design and I’m always interested in learning new techniques from my colleagues in other roles.

And spend time with your audience, the people who will be using your product or the coursework you’re creating or whatever it is you’re working on. When I was an engineer, I’d always look for opportunities to sit with, observe, and talk with the people who were using my team’s products. And I learned so much from this process.


Get Outta My Streams, Get Into My Car: Aston Martin Designs Immersive Extended Reality Experience for Customers

Legendary car manufacturer Aston Martin is using the latest virtual and mixed reality technologies to drive new experiences for customers and designers.

The company has worked with Lenovo to use VR and AR to deliver a unique experience that allowed customers to explore its first luxury SUV, the Aston Martin DBX, without physically being in dealerships or offices.

With the Lenovo ThinkStation P620 powered by NVIDIA RTX A6000 graphics, Aston Martin is able to serve up an immersive experience of the Aston Martin DBX. The stunning demo consists of over 10 million polygons, enabling users to view incredibly detailed, photorealistic visuals in virtual, augmented and mixed reality — collectively known as extended reality, or XR.

“It’s our partnership with Lenovo workstations — and in particular, ThinkStation P620 — which has enabled us to take this to the next level,” said Pete Freedman, vice president and chief marketing officer of Aston Martin Lagonda. “Our aim has always been to provide our customers with a truly immersive experience, one that feels like it brings them to the center of the automotive product, and we’ve only been able to do this with the NVIDIA RTX A6000.”

NVIDIA RTX Brings the XR Factor

Customers would typically visit Aston Martin dealerships, attend motor shows or tour their facilities in the U.K. to explore the latest car models. A team would walk them through the design and features in person.

But after everyone started working remotely, Aston Martin decided to take a fresh look at what’s truly possible and investigate options to take the experience directly to customers — virtually.

With the help of teams from Lenovo and Varjo, an XR headset maker, the automaker developed the demo that provides an immersive look at the new Aston Martin DBX using VR and XR.

The experience, which is rendered from the NVIDIA RTX-powered ThinkStation P620, allows virtual participants to enter the environment and see a pixel-perfect representation of the Aston Martin DBX. Customers with XR headsets can explore the virtual vehicle from anywhere in the world, and see details such as the stitching and lettering on the steering wheel, leather and chrome accents, and even the reflections within the paint.

The real-time reflections and illumination in the demo were enabled by Varjo’s pass-through mixed reality technology. The Varjo XR-3’s LiDAR with RGB Depth Fusion using NVIDIA’s Optical Flow gives users the perception that the car is in the room, seamlessly blending the real world and virtual car together.

With the NVIDIA RTX A6000, the immersive demo runs smoothly and efficiently, providing users with high-quality graphics and stunning detail.

“As you dial up the detail, you need high-end GPUs. You need large GPU frame buffers to build the most photorealistic experiences, and that’s exactly what the NVIDIA RTX A6000 delivers,” said Mike Leach, worldwide solution portfolio lead at Lenovo.

The NVIDIA RTX A6000 is based on the NVIDIA Ampere GPU architecture and delivers a 48GB frame buffer. This allows teams to create high-fidelity VR and AR experiences with consistent framerates.

Aston Martin will expand its use of VR and XR to enhance internal workflows, as well. With this new experience, the design teams can work in virtual environments and iterate more quickly earlier in the process, instead of creating costly models.

Watch Lenovo’s GTC session to hear more about Aston Martin’s story.

Learn more about NVIDIA RTX and how our latest technology is powering the most immersive environments across industries.

