Custom object detection in the browser using TensorFlow.js

A guest post by Hugo Zanini, Machine Learning Engineer

Object detection is the task of locating objects in an image and classifying every object of interest. In computer vision, this technique is used in applications such as picture retrieval, security cameras, and autonomous vehicles.

One of the most famous families of deep convolutional neural networks for object detection is YOLO (You Only Look Once).

In this post, we are going to develop an end-to-end solution using TensorFlow to train a custom object-detection model in Python, then put it into production, and run real-time inferences in the browser through TensorFlow.js.

This post is divided into four steps: preparing the data, training the model, converting the model to TensorFlow.js, and running real-time detections in the browser.

Object detection pipeline

Prepare the data

The first step to train a great model is to have good quality data. When developing this project, I did not find a suitable (and small enough) object detection dataset, so I decided to create my own.

I looked around and saw a Kangaroo sign that I have in my bedroom — a souvenir that I bought to remember my Aussie days. So I decided to build a Kangaroo detector.

To build my dataset, I downloaded 350 kangaroo images from an image search and labeled all of them by hand using the LabelImg application. As an image can contain more than one animal, the process resulted in 520 labeled kangaroos.

Labelling example

In this case, I chose just one class, but the software can be used to annotate multiple classes as well. It generates an XML file per image (Pascal VOC format) containing all the annotations and bounding boxes.

<annotation>
<folder>images</folder>
<filename>kangaroo-0.jpg</filename>
<path>/home/hugo/Documents/projects/tfjs/dataset/images/kangaroo-0.jpg</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>3872</width>
<height>2592</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>kangaroo</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>60</xmin>
<ymin>367</ymin>
<xmax>2872</xmax>
<ymax>2399</ymax>
</bndbox>
</object>
</annotation>

XML Annotation example

To facilitate the conversion to the TFRecord format (below), I then converted the XML annotations generated above into two CSV files with the data already split into train and test (80%-20%). These files have 9 columns (a minimal conversion sketch follows the list):

  • filename: Image name
  • width: Image width
  • height: Image height
  • class: Image class (kangaroo)
  • xmin: Minimum x coordinate of the bounding box
  • ymin: Minimum y coordinate of the bounding box
  • xmax: Maximum x coordinate of the bounding box
  • ymax: Maximum y coordinate of the bounding box
  • source: Image source
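
I used a small script for this conversion; below is a minimal sketch of the idea, assuming the Pascal VOC XML files live in dataset/annotations/ and writing a placeholder value for the source column (the scripts shipped with the dataset may differ in the details):

import csv
import glob
import random
import xml.etree.ElementTree as ET
from collections import defaultdict

# One Pascal VOC XML file per image (assumed folder layout)
rows_by_image = defaultdict(list)
for xml_path in glob.glob("dataset/annotations/*.xml"):
    root = ET.parse(xml_path).getroot()
    filename = root.find("filename").text
    size = root.find("size")
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows_by_image[filename].append([
            filename,
            int(size.find("width").text), int(size.find("height").text),
            obj.find("name").text,
            int(box.find("xmin").text), int(box.find("ymin").text),
            int(box.find("xmax").text), int(box.find("ymax").text),
            "image-search",  # placeholder source value
        ])

# Split by image (not by box) so boxes from one picture never end up in both sets
images = list(rows_by_image)
random.shuffle(images)
split = int(0.8 * len(images))
header = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax", "source"]
for csv_name, subset in [("train_labels.csv", images[:split]), ("test_labels.csv", images[split:])]:
    with open(csv_name, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for image in subset:
            writer.writerows(rows_by_image[image])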

Using LabelImg makes it easy to create your own dataset, but feel free to use my kangaroo dataset instead; I've uploaded it to Kaggle:

Kangaroo Dataset

Training the model

With a good dataset, it's time to think about the model. TensorFlow 2 provides an Object Detection API that makes it easy to construct, train, and deploy object detection models. In this project, we're going to use this API and train the model using a Google Colaboratory notebook. The remainder of this section explains how to set up the environment, select the model, and run the training. If you want to jump straight to the Colab Notebook, click here.

Setting up the environment

Create a new Google Colab notebook and select a GPU as hardware accelerator:

 Runtime > Change runtime type > Hardware accelerator: GPU 

Clone, install, and test the TensorFlow Object Detection API:
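
The gist with these commands is not embedded here, but the standard TF2 Object Detection API installation in a Colab cell looks roughly like this (a sketch following the official installation steps):

# Clone the TensorFlow models repository
!git clone --depth 1 https://github.com/tensorflow/models
%cd models/research
# Compile the protobuf definitions used by the API
!protoc object_detection/protos/*.proto --python_out=.
# Install the Object Detection API package for TF2
!cp object_detection/packages/tf2/setup.py .
!pip install .
# Smoke-test the installation
!python object_detection/builders/model_builder_tf2_test.py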

Getting and processing the data

As mentioned before, the model is going to be trained using the Kangaroo dataset on Kaggle. If you want to use it as well, it’s necessary to create a user, go into the account section of Kaggle, and get an API Token:

Getting an API Token

Then, you’re ready to download the data:

Now, it’s necessary to create a labelmap file to define the classes that are going to be used. Kangaroo is the only one, so right-click in the File section on Google Colab and create a New file named labelmap.pbtxt as follows:

 item {
     name: "kangaroo"
     id: 1
 }

The last step is to convert the data into a sequence of binary records so that it can be fed into TensorFlow's Object Detection API. To do so, transform the data into the TFRecord format using the generate_tf_records.py script available in the Kangaroo Dataset.
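
The exact invocation of the script is documented with the dataset. To give an idea of what it produces, here is a rough sketch of how a single CSV row could be packed into a tf.train.Example; the real generate_tf_records.py groups every box of an image into one record and adds a few more fields, but the feature keys follow the Object Detection API convention:

import tensorflow as tf

def bbox_row_to_example(row, encoded_jpg):
    # Helpers for the three protobuf feature types
    def _bytes(value): return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
    def _int64(value): return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
    def _floats(values): return tf.train.Feature(float_list=tf.train.FloatList(value=values))

    return tf.train.Example(features=tf.train.Features(feature={
        "image/encoded": _bytes(encoded_jpg),
        "image/format": _bytes(b"jpg"),
        "image/filename": _bytes(row["filename"].encode()),
        "image/width": _int64(row["width"]),
        "image/height": _int64(row["height"]),
        "image/object/class/text": _bytes(row["class"].encode()),
        "image/object/class/label": _int64(1),  # id from labelmap.pbtxt
        # Box coordinates are stored normalized to [0, 1]
        "image/object/bbox/xmin": _floats([row["xmin"] / row["width"]]),
        "image/object/bbox/xmax": _floats([row["xmax"] / row["width"]]),
        "image/object/bbox/ymin": _floats([row["ymin"] / row["height"]]),
        "image/object/bbox/ymax": _floats([row["ymax"] / row["height"]]),
    }))

# The values below come from the XML annotation example shown earlier
sample_row = {"filename": "kangaroo-0.jpg", "width": 3872, "height": 2592,
              "class": "kangaroo", "xmin": 60, "ymin": 367, "xmax": 2872, "ymax": 2399}
with open("dataset/images/kangaroo-0.jpg", "rb") as img:
    encoded_jpg = img.read()
with tf.io.TFRecordWriter("train.record") as writer:
    writer.write(bbox_row_to_example(sample_row, encoded_jpg).SerializeToString())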

Choosing the model

We’re ready to choose the model that’s going to be the Kangaroo Detector. TensorFlow 2 provides 40 pre-trained detection models on the COCO 2017 Dataset. This collection is the TensorFlow 2 Detection Model Zoo and can be accessed here.

Every model has a reported Speed, Mean Average Precision (mAP), and Output. Generally, a higher mAP implies a lower speed, but as this project is based on a one-class object detection problem, the fastest model (SSD MobileNet v2 320×320) should be enough.

Besides the Model Zoo, TensorFlow provides a Models Configs Repository as well. There, it’s possible to get the configuration file that has to be modified before the training. Let’s download the files:

Configure training

As mentioned before, the downloaded weights were pre-trained on the COCO 2017 Dataset, but the focus here is to train the model to recognize one class so these weights are going to be used only to initialize the network — this technique is known as transfer learning, and it’s commonly used to speed up the learning process.

From here, what has to be done is to set up the mobilenet_v2.config file and start the training. I highly recommend reading the MobileNetV2 paper (Sandler, Mark, et al. – 2018) to get the gist of the architecture.

Choosing the best hyperparameters is a task that requires some experimentation. As resources are limited in Google Colab, I am going to use the same batch size as the paper, set a number of steps to get a reasonably low loss, and leave all the other values as default. If you want to try something more sophisticated to find the hyperparameters, I recommend Keras Tuner, an easy-to-use framework that applies Bayesian Optimization, Hyperband, and Random Search algorithms.
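
A simple way to apply these choices in Colab is to rewrite the relevant fields of the downloaded config with regular expressions. The sketch below is only illustrative: the paths and the number of steps are assumptions, and the two tf_record_input_reader input_path entries still have to be pointed at train.record and test.record by hand (or with similarly targeted edits):

import re

config_path = "/content/mobilenet_v2.config"  # assumed location of the downloaded config
with open(config_path) as f:
    config = f.read()

config = re.sub(r"num_classes: \d+", "num_classes: 1", config)    # kangaroo only
config = re.sub(r"batch_size: \d+", "batch_size: 96", config)     # batch size used in the MobileNetV2 paper
config = re.sub(r"num_steps: \d+", "num_steps: 50000", config)    # example value; pick enough steps for a low loss
config = re.sub(r'fine_tune_checkpoint: ".*?"',
                'fine_tune_checkpoint: "/content/checkpoint/ckpt-0"', config)   # downloaded pre-trained weights
config = re.sub(r'fine_tune_checkpoint_type: ".*?"',
                'fine_tune_checkpoint_type: "detection"', config)               # transfer learning for detection
config = re.sub(r'label_map_path: ".*?"',
                'label_map_path: "/content/labelmap.pbtxt"', config)

with open(config_path, "w") as f:
    f.write(config)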

With the parameters set, start the training:
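
The training gist is not reproduced here, but with the TF2 Object Detection API the command typically looks like the following, assuming the config edited above and the /content/training directory that TensorBoard reads later:

%cd /content/models/research
!python object_detection/model_main_tf2.py \
    --pipeline_config_path=/content/mobilenet_v2.config \
    --model_dir=/content/training \
    --alsologtostderr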

To identify how well the training is going, we use the loss value. Loss is a number indicating how bad the model’s prediction was on the training samples. If the model’s prediction is perfect, the loss is zero; otherwise, the loss is greater. The goal of training a model is to find a set of weights and biases that have low loss, on average, across all examples (Descending into ML: Training and Loss | Machine Learning Crash Course).

From the logs, it’s possible to see a downward trend in the values so we say that “The model is converging”. In the next section, we’re going to plot these values for all training steps and the trend will be even clearer.

The model took around 4 hours to train (with the Colab GPU), but by setting different parameters, you can make the process faster or slower. Everything depends on the number of classes you are using and your Precision/Recall target. A highly accurate network that recognizes multiple classes will take more steps and require more detailed parameter tuning.

Validate the model

Now let’s evaluate the trained model using the test data:

The evaluation was done on 89 images and provides three metrics based on the COCO detection evaluation metrics: precision, recall, and loss.

The Recall measures how good the model is at hitting the positive class. That is, from the positive samples, how many did the algorithm get right?

Recall

Precision defines how much you can rely on the positive class prediction: From the samples that the model said were positive, how many actually are?

Precision

Setting a practical example: Imagine we have an image containing 10 kangaroos, our model returned 5 detections, being 3 real kangaroos (TP = 3, FN =7) and 2 wrong detections (FP = 2). In that case, we have a 30% recall (the model detected 3 out of 10 kangaroos in the image) and a 60% precision (from the 5 detections, 3 were correct).
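
Those two numbers follow directly from the definitions; a tiny Python check:

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# 10 kangaroos in the image, 5 detections, 3 of them correct
print(precision_recall(tp=3, fp=2, fn=7))  # (0.6, 0.3) -> 60% precision, 30% recall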

The precision and recall were divided by Intersection over Union (IoU) thresholds. The IoU is defined as the area of the intersection divided by the area of the union of a predicted bounding box and a ground-truth box (Zeng, N. – 2018):

Intersection over Union

For simplicity, it’s possible to consider that the IoU thresholds are used to determine whether a detection is a true positive(TP), a false positive(FP) or a false negative (FN). See an example below:

IoU threshold examples
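
For reference, the IoU of two axis-aligned boxes can be computed in a few lines. The boxes below use the same (xmin, ymin, xmax, ymax) convention as the Pascal VOC annotations; the second box is just a made-up prediction:

def iou(box_a, box_b):
    # Boxes as (xmin, ymin, xmax, ymax)
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as a true positive when its IoU with a ground-truth box exceeds the chosen threshold (e.g. 0.5)
print(iou((60, 367, 2872, 2399), (100, 400, 2800, 2350)) >= 0.5)  # True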

With these concepts in mind, we can analyze some of the metrics we got from the evaluation. From the TensorFlow 2 Detection Model Zoo, the SSD MobileNet v2 320×320 has an mAP of 0.202. Our model presented the following average precisions (AP) for different IoUs:

 
AP@[IoU=0.50:0.95 | area=all | maxDets=100] = 0.222
AP@[IoU=0.50 | area=all | maxDets=100] = 0.405
AP@[IoU=0.75 | area=all | maxDets=100] = 0.221

That’s pretty good! And we can compare the obtained APs with the SSD MobileNet v2 320×320 mAP as from the COCO Dataset documentation:

We make no distinction between AP and mAP (and likewise AR and mAR) and assume the difference is clear from context.

The Average Recall (AR) was split by the maximum number of detections per image (1, 10, 100). When we have just one kangaroo per image, the recall is around 30%, while when we have up to 100 kangaroos per image it is around 51%. These values are not that good, but they are reasonable for the kind of problem we're trying to solve.

 
(AR)@[ IoU=0.50:0.95 | area=all | maxDets= 1] = 0.293
(AR)@[ IoU=0.50:0.95 | area=all | maxDets= 10] = 0.414
(AR)@[ IoU=0.50:0.95 | area=all | maxDets=100] = 0.514

The loss analysis is very straightforward; we've got 4 values:

 
INFO:tensorflow: + Loss/localization_loss: 0.345804
INFO:tensorflow: + Loss/classification_loss: 1.496982
INFO:tensorflow: + Loss/regularization_loss: 0.130125
INFO:tensorflow: + Loss/total_loss: 1.972911

The localization loss computes the difference between the predicted bounding boxes and the labeled ones. The classification loss indicates whether the predicted class matches the labeled class. The regularization loss is generated by the network's regularization function and helps to drive the optimization algorithm in the right direction. The last term is the total loss, the sum of the three previous ones (0.345804 + 1.496982 + 0.130125 = 1.972911).

Tensorflow provides a tool to visualize all these metrics in an easy way. It’s called TensorBoard and can be initialized by the following command:

 
%load_ext tensorboard
%tensorboard --logdir '/content/training/'

A dashboard like the one below will be shown, and you can explore all training and evaluation metrics.

Tensorboard — Loss

On the IMAGES tab, it's possible to find side-by-side comparisons between the predictions and the ground truth, a very interesting resource to explore during the validation process as well.

Tensorboard — Testing images

Exporting the model

Now that the training is validated, it’s time to export the model. We’re going to convert the training checkpoints to a protobuf (pb) file. This file is going to have the graph definition and the weights of the model.
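
The export gist is not reproduced here, but the Object Detection API ships an exporter script for this, typically invoked like the following (the output path is an assumption consistent with the folder structure shown later):

!python object_detection/exporter_main_v2.py \
    --input_type image_tensor \
    --pipeline_config_path /content/mobilenet_v2.config \
    --trained_checkpoint_dir /content/training \
    --output_directory /content/inference-graph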

As we’re going to deploy the model using TensorFlow.js and Google Colab has a maximum lifetime limit of 12 hours, let’s download the trained weights and save them locally. When running the command files.download(‘/content/saved_model.zip”), the colab will prompt the file automatically.

If you want to check whether the model was saved properly, load and test it. I've created some functions to make this process easier, so feel free to clone the inferenceutils.py file from my GitHub to test some images.
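
If you'd rather do a quick manual check, loading the exported SavedModel and running one image through it is enough; a minimal sketch (the image path is an assumption):

import numpy as np
import tensorflow as tf
from PIL import Image

# Load the exported inference graph (the saved_model directory produced above)
detect_fn = tf.saved_model.load("inference-graph/saved_model")

image_np = np.array(Image.open("dataset/images/kangaroo-0.jpg"))
input_tensor = tf.convert_to_tensor(image_np)[tf.newaxis, ...]  # shape [1, H, W, 3], dtype uint8

detections = detect_fn(input_tensor)
boxes = detections["detection_boxes"][0].numpy()   # normalized [ymin, xmin, ymax, xmax]
scores = detections["detection_scores"][0].numpy()
for box, score in zip(boxes, scores):
    if score > 0.5:
        print(f"kangaroo @ {np.round(box, 2)} (score={score:.2f})")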

Everything is working well, so we’re ready to put the model in production.

Deploying the model

The model is going to be deployed in a way that anyone can open a PC or mobile camera and perform inferences in real time through a web browser. To do that, we're going to convert the saved model to the TensorFlow.js graph model format, load the model in a JavaScript application, and make everything available on Glitch.

Converting the model

At this point, you should have something similar to this structure saved locally:

 
├── inference-graph
│   └── saved_model
│       ├── assets
│       ├── saved_model.pb
│       └── variables
│           ├── variables.data-00000-of-00001
│           └── variables.index

Before we start, let’s create an isolated Python environment to work in an empty workspace and avoid any library conflict. Install virtualenv and then open a terminal in the inference-graph folder and create and activate a new virtual environment:

 
virtualenv -p python3 venv
source venv/bin/activate

Install the TensorFlow.js converter:

  pip install tensorflowjs[wizard]

Start the conversion wizard:

  tensorflowjs_wizard

Now, the tool will guide you through the conversion, providing explanations for each choice you need to make. The image below shows all the choices that were made to convert the model. Most of them are the standard ones, but options like the shard sizes and compression can be changed according to your needs.

To enable the browser to cache the weights automatically, it's recommended to split them into shard files of around 4MB. To guarantee that the conversion is going to work, don't skip the op validation either; not all TensorFlow operations are supported, so some models can be incompatible with TensorFlow.js. See this list for which ops are currently supported.

Model conversion using the TensorFlow.js converter (full resolution image here)

If everything worked well, you're going to have the model converted to the TensorFlow.js graph model format in the web_model directory. The folder contains a model.json file and a set of sharded weight files in a binary format. The model.json has both the model topology (aka "architecture" or "graph": a description of the layers and how they are connected) and a manifest of the weight files (Lin, Tsung-Yi, et al.).

  
└ web_model
├── group1-shard1of5.bin
├── group1-shard2of5.bin
├── group1-shard3of5.bin
├── group1-shard4of5.bin
├── group1-shard5of5.bin
└── model.json

Configuring the application

The model is ready to be loaded in javascript. I’ve created an application to perform inferences directly from the browser. Let’s clone the repository to figure out how to use the converted model in real-time. This is the project structure:

 
├── models
│ └── kangaroo-detector
│ ├── group1-shard1of5.bin
│ ├── group1-shard2of5.bin
│ ├── group1-shard3of5.bin
│ ├── group1-shard4of5.bin
│ ├── group1-shard5of5.bin
│ └── model.json
├── package.json
├── package-lock.json
├── public
│ └── index.html
├── README.MD
└── src
    ├── index.js
    └── styles.css

For the sake of simplicity, I already provide a converted kangaroo-detector model in the models folder. However, let’s put the web_model generated in the previous section in the models folder and test it.

The first thing to do is to define how the model is going to be loaded in the function load_model (lines 10–15 in the file src>index.js). There are two choices.

The first option is to create a local HTTP server that will make the model available at a URL, allowing it to be requested like a REST API. When loading the model, TensorFlow.js will make the following requests:

 
GET /model.json
GET /group1-shard1of5.bin
GET /group1-shard2of5.bin
GET /group1-shard3of5.bin
GET /group1-shard4of5.bin
GET /group1-shard5of5.bin

If you choose this option, define the load_model function as follows:

  async function load_model() {
    // It's possible to load the model locally or from a repo
    // You can choose whatever IP and PORT you want in "http://127.0.0.1:8080/model.json", just set it up in your HTTP server first
    const model = await loadGraphModel("http://127.0.0.1:8080/model.json");
    //const model = await loadGraphModel("https://raw.githubusercontent.com/hugozanini/TFJS-object-detection/master/models/web_model/model.json");
    return model;
  }

Then install the http-server:

  npm install http-server -g

Go to models > web_model and run the command below to make the model available at http://127.0.0.1:8080. This is a good choice when you want to keep the model weights in a safe place and control who can request inferences from it. The -c1 parameter is added to disable caching, and the --cors flag enables cross-origin resource sharing, allowing the hosted files to be used by client-side JavaScript for a given domain.

  http-server -c1 --cors .

Alternatively, you can upload the model files somewhere else. In my case, I chose my own GitHub repo and referenced the model.json URL in the load_model function:

 

async function load_model() {
    // It's possible to load the model locally or from a repo
    //const model = await loadGraphModel("http://127.0.0.1:8080/model.json");
    const model = await loadGraphModel("https://raw.githubusercontent.com/hugozanini/TFJS-object-detection/master/models/web_model/model.json");
    return model;
}

This is a good option because it gives more flexibility to the application and makes it easier to run on platforms such as Glitch.

Running locally

To run the app locally, install the required packages:

 npm install

Then start the application:

 npm start

The application is going to run at http://localhost:3000 and you should see something similar to this:

Application running locally

The model takes 1 to 2 seconds to load and, after that, you can show kangaroo images to the camera and the application will draw bounding boxes around them.

Publishing in Glitch

Glitch is a simple tool for creating web apps where we can upload code and make the application available to everyone on the web. Having uploaded the model files to a GitHub repo and referenced them in the load_model function, we can simply log in to Glitch, click New project > Import from GitHub, and select the app repository.

Wait a few minutes for the packages to install and your app will be available at a public URL. Click Show > In a new window and a tab will open. Copy this URL and paste it into any web browser (PC or mobile) and your object detector will be ready to run. See some examples in the video below:

Running the model on different devices

First, I did a test showing a kangaroo sign to verify the robustness of the application. It showed that the model focuses specifically on kangaroo features and did not latch onto irrelevant characteristics present in many of the images, such as pale colors or shrubs.

Then, I opened the app on my mobile and showed some images from the test set. The model runs smoothly and identifies most of the kangaroos. If you want to test my live app, it is available here (Glitch takes a few minutes to wake up).

Besides the accuracy, an interesting part of these experiments is the inference time: everything runs in real time in the browser via JavaScript. Object detection models that run in the browser using few computational resources are a must in many applications, especially in industry. Putting the machine learning model on the client side means lower costs and safer applications, as user privacy is preserved: there is no need to send the information to any server to perform the inference.

Next steps

Object detection in the browser can solve a lot of real-world problems, and I hope this article serves as a basis for new projects involving computer vision, Python, TensorFlow, and JavaScript.

As the next steps, I'd like to make more detailed training experiments. Due to the lack of resources, I could not try many different parameters, and I'm sure that there is a lot of room for improvement in the model.

I’m more focused on the models’ training, but I’d like to see a better user interface for the app. If someone is interested in contributing to the project, feel free to create a pull request in the project repo. It will be nice to make a more user-friendly application.

If you have any questions or suggestions, you can reach out to me on LinkedIn. Thanks for reading!

Read More

RxR: A Multilingual Benchmark for Navigation Instruction Following

Posted by Alexander Ku, Software Engineer and Peter Anderson, Research Scientist, Google Research

A core challenge in machine learning (ML) is to build agents that can navigate complex human environments in response to spoken or written commands. While today’s agents, including robots, can often navigate complicated environments, they cannot yet understand navigation goals expressed in natural language, such as, “Go past the brown double doors that are closed to your right and stand behind the chair at the head of the table.”

This challenge, referred to as vision-and-language navigation (VLN), demands a sophisticated understanding of spatial language. For example, the ability to identify the position "behind the chair at the head of the table" requires finding the table, identifying which part of the table is considered to be the "head", finding the chair closest to the head, identifying the area behind this chair, and so on. While people can follow these instructions easily, these challenges cannot be easily solved with current ML-based methods, requiring systems that can better connect language to the physical world it describes.

To help spur progress in this area, we are excited to introduce Room-Across-Room (RxR), a new dataset for VLN. Described in “Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding”, RxR is the first multilingual dataset for VLN, containing 126,069 human-annotated navigation instructions in three typologically diverse languages — English, Hindi and Telugu. Each instruction describes a path through a photorealistic simulator populated with indoor environments from the Matterport3D dataset, which includes 3D captures of homes, offices and public buildings. To track progress on VLN, we are also announcing the RxR Challenge, a competition that encourages the machine learning community to train and evaluate their own instruction following agents on RxR instructions.

Language Instruction
en-US Starting next to the long dining room table, turn so the table is to your right. Walk towards the glass double doors. When you reach the mat before the doors, turn immediately left and walk down the stairs. When you reach the bottom of the stairs, walk through the open doors to your left and continue through the art exhibit with the tub to your right hand side. Down the length of the table until you reach the small step at the end of the room before you reach the tub and stop.
   
hi-IN अभी हमारे बायीं ओर एक बड़ा मेज़ है कुछ कुर्सियाँ हैं और कुछ दीपक मेज़ के ऊपर रखे हैं। उलटी दिशा में घूम जाएँ और सिधा चलें। अभी हमारे दायीं ओर एक गोल मेज़ है वहां से सीधा बढ़ें और सामने एक शीशे का बंद दरवाज़ा है उससे पहले बायीं ओर एक सीढ़ी है उससे निचे उतरें। निचे उतरने के बाद दायीं ओर मुड़े और एक भूरे रंग के दरवाज़े से अंदर प्रवेश करें और सीधा चलें। अभी हमारे दायीं ओर एक बड़ा मेज़ है और दो कुर्सियां राखी हैं सीधा आगे बढ़ें। हमारे सामने एक पानी का कल है और सामने तीन कुर्सियां दिवार के पास रखी हैं यहीं पर ठहर जाएँ।
   
te-IN ఉన్న చోటు నుండి వెనకకు తిరిగి, నేరుగా వెళ్తే, మీ ముందర ఒక బల్ల ఉంటుంది. దాన్ని దాటుకొని ఎడమవైపుకి తిరిగితే, మీ ముందర మెట్లు ఉంటాయి. వాటిని పూర్తిగా దిగండి. ఇప్పుడు మీ ముందర రెండు తెరిచిన ద్వారాలు ఉంటాయి. ఎడమవైపు ఉన్న ద్వారం గుండా బయటకు వెళ్ళి, నేరుగా నడవండి. ఇప్పుడు మీ కుడివైపున పొడవైన బల్ల ఉంటుంది. దాన్ని దాటుకొని ముందరే ఉన్న మెట్ల వద్దకు వెళ్ళి ఆగండి.

Examples of English, Hindi and Telugu navigation instructions from the RxR dataset. Each navigation instruction describes the same path.

Pose Traces
In addition to navigation instructions and paths, RxR also includes a new, more detailed multimodal annotation called a pose trace. Inspired by the mouse traces captured in the Localized Narratives dataset, pose traces provide dense groundings between language, vision and movement in a rich 3D setting. To generate navigation instructions, we ask guide annotators to move along a path in the simulator while narrating the path based on the surroundings. The pose trace is a record of everything the guide sees along the path, time-aligned with the words in the navigation instructions. These traces are then paired with pose traces from follower annotators, who are tasked with following the intended path by listening to the guide’s audio, thereby validating the quality of the navigation instructions. Pose traces implicitly capture notions of landmark selection and visual saliency, and represent a play-by-play account of how to solve the navigation instruction generation task (for guides) and the navigation instruction following task (for followers).

Example English navigation instruction in the RxR dataset. Words in the instruction text (right) are color-coded to align with the pose trace (left) that illustrates the movements and visual percepts of the guide annotator as they move through the environment describing the path.
The same RxR example with words in the navigation instruction aligned to 360° images along the path. The parts of the scene the guide annotator observed are highlighted; parts of the scene ignored by the annotator are faded. Red and yellow boxes highlight some of the close alignments between the textual instructions and the annotator’s visual cues. The red cross indicates the next direction the annotator moved.

Scale
In total, RxR contains almost 10 million words, making it around 10 times larger than existing datasets, such as R2R and Touchdown/Retouchdown. This is important because, in comparison to tasks based on static image and text data, language tasks that require learning through movement or interaction with an environment typically suffer from a lack of large-scale training data. RxR also addresses known biases in the construction of the paths that have arisen in other datasets, such as R2R in which all paths have similar lengths and take the shortest route to the goal. In contrast, the paths in RxR are on average longer and less predictable, making them more challenging to follow and encouraging models trained on the dataset to place greater emphasis on the role of language in the task. The size, scope and detail of RxR will expand the frontier for research on grounded language learning while reducing the dominance of high resource languages such as English.

Left: RxR is an order of magnitude larger than similar existing datasets. Right: Compared to R2R, the paths in RxR are typically longer and less predictable, making them more challenging to follow.

Baselines
To better characterize and understand the RxR dataset, we trained a variety of agents on RxR using our open source framework VALAN, and language representations from the multilingual BERT model. We found that results were improved by including follower annotations as well as guide annotations during training, and that independently trained monolingual agents outperformed a single multilingual agent.

Conceptually, evaluation of these agents is straightforward — did the agent follow the intended path? Empirically, we measure the similarity between the path taken by the VLN agent and the reference path using NDTW, a normalized measure of path fidelity that ranges between 100 (perfect correspondence) and 0 (completely wrong). The average score for the follower annotators across all three languages is 79.5, due to natural variation between similar paths. In contrast, the best model (a composite of three independently trained monolingual agents, one for each language) achieved an NDTW score on the RxR test set of 41.5. While this is much better than random (15.4), it remains far below human performance. Although advances in language modeling continue to rapidly erode the headroom for improvement in text-only language understanding benchmarks such as GLUE and SuperGLUE, benchmarks like RxR that connect language to the physical world offer substantial room for improvement.
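
For intuition, a rough sketch of NDTW is below: compute the dynamic-time-warping distance between the agent path and the reference path, then squash it into a bounded score. This follows the nDTW definition from the DTW-based VLN evaluation work; the success-distance threshold and the exact variant used for RxR scoring are assumptions here:

import numpy as np

def dtw(path, ref):
    # Plain dynamic-time-warping distance between two sequences of points
    n, m = len(path), len(ref)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(np.asarray(path[i - 1]) - np.asarray(ref[j - 1]))
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def ndtw(path, ref, success_threshold=3.0):
    # Normalized to (0, 1]; multiply by 100 to match the scores quoted above
    return np.exp(-dtw(path, ref) / (len(ref) * success_threshold))

print(100 * ndtw([(0, 0), (1, 0), (2, 1)], [(0, 0), (1, 0), (2, 0)]))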

Results for our multilingual and monolingual instruction following agents on the RxR test-standard split. While performance is much better than a random walk, there remains considerable headroom to reach human performance on this task.

Competition
To encourage further research in this area, we are launching the RxR Challenge, an ongoing competition for the machine learning community to develop computational agents that can follow natural language navigation instructions. To take part, participants upload the navigation paths taken by their agent in response to the provided RxR test instructions. In the most difficult setting (reported here and in the paper), all the test environments are previously unseen. However, we also allow for settings in which the agent is either trained in or explores the test environments in advance. For more details and the latest results please visit the challenge website.

PanGEA
We are also releasing the custom web-based annotation tool that we developed to collect the RxR dataset. The Panoramic Graph Environment Annotation toolkit (PanGEA), is a lightweight and customizable codebase for collecting speech and text annotations in panoramic graph environments, such as Matterport3D and StreetLearn. It includes speech recording and virtual pose tracking, as well as tooling to align the resulting pose trace with a manual transcript. For more details please visit the PanGEA github page.

Acknowledgements
The authors would like to thank Roma Patel, Eugene Ie and Jason Baldridge for their contributions to this research. We would also like to thank all the annotators, Sneha Kudugunta for analyzing the Telugu annotations, and Igor Karpov, Ashwin Kakarla and Christina Liu for their tooling and annotation support for this project, Austin Waters and Su Wang for help with image features, and Daphne Luong for executive support for the data collection.

Read More

Take Note: Otter.ai CEO Sam Liang on Bringing Live Captions to a Meeting Near You

Sam Liang is making things easier for the creators of the NVIDIA AI Podcast — and just about every remote worker.

He’s the CEO and co-founder of Otter.ai, which uses AI to produce speech-to-text transcriptions in real time or from recording uploads. The platform has a range of capabilities, from differentiating between multiple people, to understanding accents, to parsing through various background noises.

And now, Otter.ai is making live captioning possible on a variety of platforms, including Zoom, Skype and Microsoft Teams. Even Liang’s conversation with AI Podcast host Noah Kravitz was captioned in real time over Skype.

This new capability has been enthusiastically received by remote workers — Liang says that Otter.ai has already transcribed tens of millions of meetings.

Liang envisions even more practical effects of Otter.ai’s live captions. The platform can already identify keywords. Soon he thinks it’ll be recognizing action items, helping manage agendas and providing notifications.

Key Points From This Episode:

  • Otter.ai was founded in 2016 and is Liang’s second startup, after Alohar, a company focused on mobile behavior services. Once Alohar was acquired, Liang reflected that he needed better tools to help transcribe and share meetings, inspiring him to found Otter.ai.
  • The company’s AI model was built from scratch. Although Siri and Alexa predate it, Otter.ai needed to comprehend multiple voices that could overlap and vary in accents — a different, more complex task than understanding and responding to just one voice.

Tweetables:

“Though it’s been growing steadily before COVID, people have been using Otter on their laptop or on iOS or Android devices … you can use it anywhere.” — Sam Liang [7:32]

“Otter is your new meeting assistant. People will have the peace of mind that they don’t have to write down everything themselves.” — Sam Liang [22:07]

You Might Also Like:

Hugging Face’s Sam Shleifer Talks Natural Language Processing

Research engineer Sam Shleifer talks about Hugging Face’s natural language processing technology, which is in use at over 1,000 companies, including Apple, Bing and Grammarly, across fields ranging from finance to medical technology.

Pod Squad: Descript Uses AI to Make Managing Podcasts Quicker, Easier

Serial entrepreneur Andrew Mason talks about his company, Descript Podcast Studio, which is using AI, NLP and automatic speech synthesis to make podcast editing easier and more collaborative.

How SoundHound Uses AI to Bring Voice and Music Recognition to Any Platform

SoundHound made its name as a music identification service. Since then, it’s leveraged its 10+ years in data analytics to create a voice recognition tool that companies can bake into any product. SoundHound VP of Product Marketing Mike Zagorsek speaks about how the company has grown into a significant player in voice-driven AI.

Tune in to the AI Podcast

Get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn. If your favorite isn’t listed here, drop us a note.

Make the AI Podcast Better

Have a few minutes to spare? Fill out this listener survey. Your answers will help us make a better podcast.

Read More

On the Road Again: GeForce NOW Alliance Expanding to Turkey, Saudi Arabia and Australia

Bringing more games to more gamers, our GeForce NOW game-streaming service is coming soon to Turkey, Saudi Arabia and Australia.

Turkcell, Zain KSA and Pentanet are the latest telcos to join the GeForce NOW Alliance.

By placing NVIDIA RTX Servers on the edge, GeForce NOW Alliance partners deliver even lower latency gaming experiences. And this gives partners an opportunity to show the value of their broadband and 5G infrastructure to customers.

Gamers get more games, in more places, faster.

Ericsson recently published their 2020 mobility report, noting, "5G population coverage is estimated to reach 15 percent, equivalent to over 1 billion people."

Meanwhile, DFC Intelligence reports, “there are now over 3 billion video game consumers around the world.”

Cloud gaming is at the intersection of these two trends.

GeForce NOW is the world’s leading cloud-gaming platform for PC gamers, offering hundreds of games from NVIDIA’s own data centers in North America and Europe.

The service brings real-time ray tracing to today’s biggest blockbusters to underpowered PCs, Macs, Chromebooks, Android and iOS devices.

It also offers an opportunity for the world’s leading telecommunications firms to deliver high-quality, low-latency PC gaming to nearly any device from the cloud. These partners form the GeForce NOW Alliance, a partnership of operators using RTX Servers and NVIDIA cloud-gaming software to expand and improve cloud gaming globally.

It’s our commitment to deliver the best cloud-gaming experience around the world.

GeForce NOW Alliance Grows, Again

The newest GeForce NOW Alliance members join a growing group of telecommunications experts that includes Softbank, KDDI, LG Uplus, Taiwan Mobile and GFN.RU.

Bringing cloud gaming to Turkey, in collaboration with their new gaming platform, Turkcell recently announced GeForce NOW Powered by GAMEPLUS. Gamers who want to get a leg up can visit the GAME+ page to pre-register.

Zain KSA, the leading 5G telecom operator in Saudi Arabia, is nearing a formal launch of its GeForce NOW service. The launch will bring GeForce NOW into the country for the first time.

Founded by a group of avid gamers, Pentanet has built Perth, Australia’s largest and fastest-growing fixed wireless network, delivering high-bandwidth internet services to the Perth metro area as well as fiber throughout Western Australia.

Australian gamers can look forward to PC gaming on nearly any device later this year.

Stay tuned as we work with additional partners to bring cloud gaming to new regions throughout 2021 and beyond.

Read More

Redacting PII from application log output with Amazon Comprehend

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning (ML) to find insights and relationships in text. The service can extract people, places, sentiments, and topics in unstructured data. You can now use Amazon Comprehend ML capabilities to detect and redact personally identifiable information (PII) in application logs, customer emails, support tickets, and more. No ML experience required. Redacting PII entities helps you protect privacy and comply with local laws and regulations.

Use case: Applications printing PII data in log output

Some applications print PII data in their log output inadvertently. In some cases, this may be due to developers forgetting to remove debug statements before deploying the application in production, and in other cases it may be due to legacy applications that are handed down and are difficult to update. PII can also get printed in stack traces. It’s generally a mistake to have PII present in such logs. Correlation IDs and primary keys are better identifiers than PII when debugging applications.

PII in application logs can quickly propagate to downstream systems, compounding security concerns. For example, it may get submitted to search and analytics systems where it's searchable and viewable by everyone. It may also be stored in object storage such as Amazon Simple Storage Service (Amazon S3) for analytics purposes. With the PII detection API of Amazon Comprehend, you can remove PII from application log output before such a log statement even gets printed.

In this post, I take the use case of a Java application that is generating log output with PII. The initial log output goes through filter-like processing that redacts PII before the log statement is output by the application. You can take a similar approach for other programming languages.

The application can be repackaged by changing its log format file, such as log4j.xml, and adding one Java class from this sample project, or adding this Java class as a dependency in the form of a .jar file.

The sample application is available in the following GitHub repo.

PII entity types

The following table lists some of the entity types Amazon Comprehend detects.

PII Entity Type: Description
EMAIL: An email address, such as marymajor@email.com.
NAME: An individual's name. This entity type does not include titles, such as Mr., Mrs., Miss, or Dr. Amazon Comprehend does not apply this entity type to names that are part of organizations or addresses. For example, Amazon Comprehend recognizes the "John Doe Organization" as an organization, and it recognizes "Jane Doe Street" as an address.
PHONE: A phone number. This entity type also includes fax and pager numbers.
SSN: A Social Security Number (SSN) is a 9-digit number that is issued to US citizens, permanent residents, and temporary working residents. Amazon Comprehend also recognizes Social Security Numbers when only the last 4 digits are present.

For the full list, see Detect Personally Identifiable Information (PII).

The API response from Amazon Comprehend includes the entity type, its begin offset, end offset, and a confidence score. For this post, we use all of them.

Application overview

Our example application is a very simple application that simulates opening a bank account for a user. In its current form, the log output looks like the following code. We can see this by making requests to the endpoint payment:

curl localhost:8080/payment
2020-09-29T10:29:04,115 INFO [http-nio-8080-exec-1] c.e.l.c.PaymentController: Processing user User(name=Terina, ssn=626031641, email=mel.swift@Taylor.com, description=Ea minima omnis autem illo.)
2020-09-29T10:29:04,711 INFO [http-nio-8080-exec-2] c.e.l.c.PaymentController: User Napoleon, SSN 366435036, opened an account
2020-09-29T10:29:05,253 INFO [http-nio-8080-exec-4] c.e.l.c.PaymentController: User Cristen, SSN 197961488, opened an account
2020-09-29T10:29:05,673 INFO [http-nio-8080-exec-5] c.e.l.c.PaymentController: Processing user User(name=Giuseppe, ssn=713425581, email=elijah.dach@Shawnna.com, description=Impedit asperiores in magnam exercitationem.)

The output prints Name, SSN, and Email. This PII data is being generated by the java-faker library, which is a Java port of the well-known Ruby gem. See the following code:

        <dependency>
            <groupId>com.github.javafaker</groupId>
            <artifactId>javafaker</artifactId>
            <version>1.0.2</version>
        </dependency>

Log4j 2

Log4j 2 is a common Java library used for logging. Appenders in Log4j are responsible for delivering log events to their destinations, which can be console, file, and more. Log4j also has a RewriteAppender that lets you rewrite the log message before it is output. RewriteAppender works in conjunction with a RewritePolicy that provides the implementation for changing the log output.

The sample application uses the following log4j.xml file for log configuration:

<?xml version="1.0" encoding="UTF-8"?>
<!-- status="trace" -->
<Configuration packages="com.example.logging">
    <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout
                    pattern="%style{%d{ISO8601}}{black} %highlight{%-5level }[%style{%t}{bright,blue}] %style{%C{1.}}{bright,yellow}: %msg%n%throwable" />
        </Console>
        <Rewrite name="Rewrite">
            <SensitiveDataPolicy
                    maskMode="MASK"
                    mask="*"
                    minScore="0.9"
                    entitiesToReplace="SSN,EMAIL"
            />
            <AppenderRef ref="Console" />
        </Rewrite>
    </Appenders>

    <Loggers>
        <!-- LOG everything at INFO level -->
        <Root level="info">
            <AppenderRef ref="Console" />
        </Root>
        <!-- LOG "com.example*" at DEBUG level -->
        <Logger name="com.example" level="debug" additivity="false">
            <AppenderRef ref="Rewrite" />
        </Logger>
    </Loggers>

</Configuration>

SensitiveDataPolicy

The Log4j RewritePolicy we created for this project is named SensitiveDataPolicy. It uses four parameters:

  • maskMode – This parameter has two modes:
    • REPLACE – The policy replaces discovered entities with their type names. For example, in case of social security numbers, the replaced string is [SSN].
    • MASK – The policy replaces the discovered entity with a string consisting of the character provided as a mask parameter.
  • mask – The character to use to replace the discovered entity with. Only relevant if maskMode is MASK.
  • minScore – The minimum confidence score acceptable to us.
  • entitiesToReplace – A comma-separated list of entity type names that we want to replace. For example, we’re choosing to replace social security number and email, so the string value we provide is SSN,EMAIL. Amazon Comprehend also detects NAME in our application, but it’s printed as is.

Choosing redaction vs. masking is a matter of preference. Redaction is usually preferred when the context needs to be preserved, such as in natural text, whereas masking is best for maintaining text length as well as structured data such as formatted files or key-value pairs.
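
Outside of the Java policy, the same REPLACE/MASK behavior can be sketched in a few lines of Python with boto3, using the offsets returned by the API. The entity choices, threshold, and mask character mirror the policy parameters above; this is an illustration, not the project's code:

import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

def scrub(text, entities_to_replace=("SSN", "EMAIL"), min_score=0.9, mask_mode="MASK", mask="*"):
    response = comprehend.detect_pii_entities(Text=text, LanguageCode="en")
    # Work backwards through the offsets so earlier ones stay valid after each substitution
    for entity in sorted(response["Entities"], key=lambda e: e["BeginOffset"], reverse=True):
        if entity["Type"] not in entities_to_replace or entity["Score"] < min_score:
            continue
        begin, end = entity["BeginOffset"], entity["EndOffset"]
        replacement = mask * (end - begin) if mask_mode == "MASK" else f"[{entity['Type']}]"
        text = text[:begin] + replacement + text[end:]
    return text

print(scrub("User Napoleon, SSN 366435036, opened an account"))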

Detecting PII is as simple as making an API call to Amazon Comprehend using the AWS SDK and providing the text to analyze:

        DetectPiiEntitiesRequest piiEntitiesRequest =
                DetectPiiEntitiesRequest.builder()
                        .languageCode("en")
                        .text(msg.getFormattedMessage())
                        .build();

        DetectPiiEntitiesResponse piiEntitiesResponse = comprehendClient.detectPiiEntities(piiEntitiesRequest);

Asynchronous logging

Because our policy makes synchronous calls to Amazon Comprehend for PII detection, we want this processing to happen asynchronously, outside of the customer request loop, to avoid introducing latency. For instructions, see Asynchronous Loggers for Low-Latency Logging. We add the Disruptor library to our classpath by adding it to pom.xml:

        <dependency>
            <groupId>com.lmax</groupId>
            <artifactId>disruptor</artifactId>
            <version>3.4.2</version>
        </dependency>

We also need to set a system property. After we package our application with mvn package, we can run it as in the following code:

java -Dlog4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -jar target/comprehend-logging.jar

Updated log output

The log output from this application now looks like the following. We can see that SSN and Email are being suppressed.

2020-09-29T12:52:30,423 INFO  [http-nio-8080-exec-6] ?: User Willa, SSN *********, opened an account
2020-09-29T12:52:30,824 INFO  [http-nio-8080-exec-8] ?: User Vania, SSN *********, opened an account
2020-09-29T12:52:31,245 INFO  [http-nio-8080-exec-9] ?: Processing user User(name=Laronda, ssn=*********, email=******************************, description=Doloremque culpa iure dolore omnis.)
2020-09-29T12:52:31,637 INFO  [http-nio-8080-exec-1] ?: Processing user User(name=Tommye, ssn=*********, email=*************************, description=Corporis sed tempore.)

Conclusion

We learned how to use Amazon Comprehend to redact sensitive data natively within next-generation applications. For information about applying it as a postprocessing technique for logs in storage, see Detecting and redacting PII using Amazon Comprehend. The API lets you have complete control over the entities that are important for your use case and lets you either mask or redact the information.

For more information about Amazon Comprehend availability and quotas, see Amazon Comprehend endpoints and quotas.


About the Author

Pradeep Singh is a Solutions Architect at Amazon Web Services. He helps AWS customers take advantage of AWS services to design scalable and secure applications. His expertise spans Application Architecture, Containers, Analytics and Machine Learning.

 

Read More

Building, automating, managing, and scaling ML workflows using Amazon SageMaker Pipelines

We recently announced Amazon SageMaker Pipelines, the first purpose-built, easy-to-use continuous integration and continuous delivery (CI/CD) service for machine learning (ML). SageMaker Pipelines is a native workflow orchestration tool for building ML pipelines that take advantage of direct Amazon SageMaker integration. Three components improve the operational resilience and reproducibility of your ML workflows: pipelines, model registry, and projects. These workflow automation components enable you to easily scale your ability to build, train, test, and deploy hundreds of models in production, iterate faster, reduce errors due to manual orchestration, and build repeatable mechanisms.

SageMaker projects introduce MLOps templates that automatically provision the underlying resources needed to enable CI/CD capabilities for your ML development lifecycle. You can use a number of built-in templates or create your own custom template. You can use SageMaker Pipelines independently to create automated workflows; however, when used in combination with SageMaker projects, the additional CI/CD capabilities are provided automatically. The following screenshot shows how the three components of SageMaker Pipelines can work together in an example SageMaker project.

This post focuses on using an MLOps template to bootstrap your ML project and establish a CI/CD pattern from sample code. We show how to use the built-in build, train, and deploy project template as a base for a customer churn classification example. This base template enables CI/CD for training ML models, registering model artifacts to the model registry, and automating model deployment with manual approval and automated testing.

MLOps template for building, training, and deploying models

We start by taking a detailed look at what AWS services are launched when this build, train, and deploy MLOps template is launched. Later, we discuss how to modify the skeleton for a custom use case.

To get started with SageMaker projects, you must first enable it on the Amazon SageMaker Studio console. This can be done for existing users or while creating new ones. For more information, see SageMaker Studio Permissions Required to Use Projects.

In SageMaker Studio, you can now choose the Projects menu on the Components and registries menu.

On the projects page, you can launch a preconfigured SageMaker MLOps template. For this post, we choose MLOps template for model building, training, and deployment.

Launching this template starts a model building pipeline by default, and while there is no cost for using SageMaker Pipelines itself, you will be charged for the services launched. Cost varies by Region. A single run of the model build pipeline in us-east-1 is estimated to cost less than $0.50. Models approved for deployment incur the cost of the SageMaker endpoints (test and production) for the Region using an ml.m5.large instance.

After the project is created from the MLOps template, the following architecture is deployed.

Included in the architecture are the following AWS services and resources:

  • The MLOps templates that are made available through SageMaker projects are provided via an AWS Service Catalog portfolio that automatically gets imported when a user enables projects on the Studio domain.
  • Two repositories are added to AWS CodeCommit:
    • The first repository provides scaffolding code to create a multi-step model building pipeline including the following steps: data processing, model training, model evaluation, and conditional model registration based on accuracy. As you can see in the pipeline.py file, this pipeline trains a linear regression model using the XGBoost algorithm on the well-known UCI Abalone dataset. This repository also includes a build specification file, used by AWS CodePipeline and AWS CodeBuild to run the pipeline automatically.
    • The second repository contains code and configuration files for model deployment, as well as test scripts required to pass the quality gate. This repo also uses CodePipeline and CodeBuild, which run an AWS CloudFormation template to create model endpoints for staging and production.
  • Two CodePipeline pipelines:
    • The ModelBuild pipeline automatically triggers and runs the pipeline from end to end whenever a new commit is made to the ModelBuild CodeCommit repository.
    • The ModelDeploy pipeline automatically triggers whenever a new model version is added to the model registry and the status is marked as Approved. Models that are registered with Pending or Rejected statuses aren’t deployed.
  • An Amazon Simple Storage Service (Amazon S3) bucket is created for output model artifacts generated from the pipeline.
  • SageMaker Pipelines uses the following resources:
    • This workflow contains the directed acyclic graph (DAG) that trains and evaluates our model. Each step in the pipeline keeps track of the lineage and intermediate steps can be cached for quickly re-running the pipeline. Outside of templates, you can also create pipelines using the SDK.
    • Within SageMaker Pipelines, the SageMaker model registry tracks the model versions and respective artifacts, including the lineage and metadata for how they were created. Different model versions are grouped together under a model group, and new models registered to the registry are automatically versioned. The model registry also provides an approval workflow for model versions and supports deployment of models in different accounts. You can also use the model registry through the boto3 package.
  • Two SageMaker endpoints:
    • After a model is approved in the registry, the artifact is automatically deployed to a staging endpoint followed by a manual approval step.
    • If approved, it’s deployed to a production endpoint in the same AWS account.

All SageMaker resources, such as training jobs, pipelines, models, and endpoints, as well as AWS resources listed in this post, are automatically tagged with the project name and a unique project ID tag.

Modifying the sample code for a custom use case

After your project has been created, the architecture described earlier is deployed and the visualization of the pipeline is available on the Pipelines drop-down menu within SageMaker Studio.

To modify the sample code from this launched template, we first need to clone the CodeCommit repositories to our local SageMaker Studio instance. From the list of projects, choose the one that was just created. On the Repositories tab, you can select the hyperlinks to locally clone the CodeCommit repos.

ModelBuild repo

The ModelBuild repository contains the code for preprocessing, training, and evaluating the model. The sample code trains and evaluates a model on the UCI Abalone dataset. We can modify these files to solve our own customer churn use case. See the following code:

|-- codebuild-buildspec.yml
|-- CONTRIBUTING.md
|-- pipelines
| |-- abalone
| | |-- evaluate.py
| | |-- __init__.py
| | |-- pipeline.py
| | |-- preprocess.py
| |-- get_pipeline_definition.py
| |-- __init__.py
| |-- run_pipeline.py
| |-- _utils.py
| |-- __version__.py
|-- README.md
|-- sagemaker-pipelines-project.ipynb
|-- setup.cfg
|-- setup.py
|-- tests
| -- test_pipelines.py
|-- tox.ini

We now need a dataset accessible to the project.

  1. Open a new SageMaker notebook inside Studio and run the following cells:
    !wget http://dataminingconsultant.com/DKD2e_data_sets.zip
    !unzip -o DKD2e_data_sets.zip
    !mv "Data sets" Datasets

    import os
    import boto3
    import sagemaker

    prefix = 'sagemaker/DEMO-xgboost-churn'
    region = boto3.Session().region_name
    default_bucket = sagemaker.session.Session().default_bucket()
    role = sagemaker.get_execution_role()

    # Upload the raw churn dataset to the default SageMaker bucket
    RawData = boto3.Session().resource('s3') \
        .Bucket(default_bucket).Object(os.path.join(prefix, 'data/RawData.csv')) \
        .upload_file('./Datasets/churn.txt')

    print(os.path.join("s3://", default_bucket, prefix, 'data/RawData.csv'))

  1. Rename the abalone directory to customer_churn. This requires us to modify the path inside codebuild-buildspec.yml as shown in the sample repository. See the following code:
    run-pipeline --module-name pipelines.customer_churn.pipeline 

  1. Replace the preprocess.py code with the customer churn preprocessing script found in the sample repository.
  2. Replace the pipeline.py code with the customer churn pipeline script found in the sample repository.
    1. Be sure to replace the “InputDataUrl” default parameter with the Amazon S3 URL obtained in step 1:
      input_data = ParameterString(
          name="InputDataUrl",
          default_value=f"s3://YOUR_BUCKET/RawData.csv",
      )

    2. Update the conditional step to evaluate the classification model:
      # Conditional step for evaluating model quality and branching execution
      cond_lte = ConditionGreaterThanOrEqualTo(
          left=JsonGet(step=step_eval, property_file=evaluation_report, json_path="binary_classification_metrics.accuracy.value"), right=0.8
      )

    One last thing to note is the default ModelApprovalStatus is set to PendingManualApproval. If our model has greater than 80% accuracy, it’s added to the model registry, but not deployed until manual approval is complete.

  1. Replace the evaluate.py code with the customer churn evaluation script found in the sample repository. One piece of the code we’d like to point out is that, because we’re evaluating a classification model, we need to update the metrics we’re evaluating and associating with trained models:
    report_dict = {
        "binary_classification_metrics": {
            "accuracy": {
                "value": acc,
                "standard_deviation": "NaN"
            },
            "auc": {
                "value": roc_auc,
                "standard_deviation": "NaN"
            },
        },
    }

    evaluation_output_path = '/opt/ml/processing/evaluation/evaluation.json'
    with open(evaluation_output_path, 'w') as f:
        f.write(json.dumps(report_dict))

The JSON structure of these metrics is required to match the format of sagemaker.model_metrics for complete integration with the model registry.
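For reference, the following is a minimal sketch of how such an evaluation report is typically attached to the registered model in pipeline.py. The S3 URI below is a placeholder; in the sample pipeline it is derived from the output of the evaluation step.

from sagemaker.model_metrics import MetricsSource, ModelMetrics

# Placeholder URI; in pipeline.py this points at the evaluation.json written by the evaluation step.
evaluation_s3_uri = "s3://YOUR_BUCKET/evaluation/evaluation.json"

model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri=evaluation_s3_uri,
        content_type="application/json",
    )
)
# model_metrics is then passed to the RegisterModel step via its model_metrics argument.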

ModelDeploy repo

The ModelDeploy repository contains the AWS CloudFormation buildspec for the deployment pipeline. We don’t make any modifications to this code because it’s sufficient for our customer churn use case. It’s worth noting that model tests can be added to this repo to gate model deployment. See the following code:

├── build.py
├── buildspec.yml
├── endpoint-config-template.yml
├── prod-config.json
├── README.md
├── staging-config.json
└── test
    ├── buildspec.yml
    └── test.py

Triggering a pipeline run

Committing these changes to the CodeCommit repository (easily done on the Studio source control tab) triggers a new pipeline run, because an Amazon EventBridge event monitors for commits. After a few moments, we can monitor the run by choosing the pipeline inside the SageMaker project.

The following screenshot shows our pipeline details.

Choosing the pipeline run displays the steps of the pipeline, which you can monitor.
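If you prefer to monitor runs programmatically, the following is a minimal sketch using the boto3 SageMaker client. The pipeline name is a placeholder, because SageMaker projects derive it from the project name and ID.

import boto3

sm = boto3.client("sagemaker")

# Placeholder pipeline name; look it up on the project's Pipelines tab.
executions = sm.list_pipeline_executions(PipelineName="<your-project-pipeline-name>")
latest = executions["PipelineExecutionSummaries"][0]

steps = sm.list_pipeline_execution_steps(PipelineExecutionArn=latest["PipelineExecutionArn"])
for step in steps["PipelineExecutionSteps"]:
    print(step["StepName"], step["StepStatus"])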

When the pipeline is complete, you can go to the Model groups tab inside the SageMaker project and inspect the metadata attached to the model artifacts.

If everything looks good, we can manually approve the model.

This approval triggers the ModelDeploy pipeline and exposes an endpoint for real-time inference.
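Although this approval is typically done on the Model groups tab, the following sketch shows a programmatic alternative with boto3; the model package group name is a placeholder that the project creates for you.

import boto3

sm = boto3.client("sagemaker")

# Placeholder group name; SageMaker projects create a model package group per project.
packages = sm.list_model_packages(
    ModelPackageGroupName="<your-model-package-group>",
    SortBy="CreationTime",
    SortOrder="Descending",
)
latest_arn = packages["ModelPackageSummaryList"][0]["ModelPackageArn"]

# Approving the latest package is what triggers the ModelDeploy pipeline.
sm.update_model_package(ModelPackageArn=latest_arn, ModelApprovalStatus="Approved")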

Conclusion

SageMaker Pipelines enables teams to apply CI/CD best practices within their ML workflows. In this post, we showed how a data scientist can modify a preconfigured MLOps template for their own modeling use case. Among the many benefits: changes to the source code can be tracked, associated metadata can be tied to trained models for deployment approval, and repeated pipeline steps can be cached for reuse. To learn more about SageMaker Pipelines, check out the website and the documentation. Try SageMaker Pipelines in your own workflows today.


About the Authors

Sean Morgan is an AI/ML Solutions Architect at AWS. He previously worked in the semiconductor industry, using computer vision to improve product yield. He later transitioned to a DoD research lab where he specialized in adversarial ML defense and network security. In his free time, Sean is an active open-source contributor and maintainer, and is the special interest group lead for TensorFlow Addons.

 

Hallie Weishahn is an AI/ML Specialist Solutions Architect at AWS, focused on leading global standards for MLOps. She previously worked as an ML Specialist at Google Cloud Platform. She works with product, engineering, and key customers to build repeatable architectures and drive product roadmaps. She provides guidance and hands-on work to advance and scale machine learning use cases and technologies. Troubleshooting top issues and evaluating existing architectures to enable integrations from PoC to a full deployment is her strong suit.

 

Shelbee Eigenbrode is an AI/ML Specialist Solutions Architect at AWS. Her current areas of depth include DevOps combined with ML/AI. She has been in technology for 23 years, spanning multiple roles and technologies. With over 35 patents granted across various technology domains, her passion for continuous innovation combined with a love of all things data turned her focus to the field of data science. Combining her backgrounds in data, DevOps, and machine learning, her current passion is helping customers to not only embrace data science but also ensure all models have a path to production by adopting MLOps practices. In her spare time, she enjoys reading and spending time with family, including her fur family (aka dogs), as well as friends.


Labeling mixed-source, industrial datasets with Amazon SageMaker Ground Truth

Prior to using any kind of supervised machine learning (ML) algorithm, data has to be labeled. Amazon SageMaker Ground Truth simplifies and accelerates this task. Ground Truth uses pre-defined templates to assign labels that classify the content of images or videos or verify existing labels. Ground Truth allows you to define workflows for labeling various kinds of data, such as text, video, or images, without writing any code. Although these templates are applicable to a wide range of use cases in which the data to be labeled is in a single format or from a single source, industrial workloads often require labeling data from different sources and in different formats. This post explores the use case of industrial welding data consisting of sensor readings and images to show how to implement customized, complex, mixed-source labeling workflows using Ground Truth.

For this post, you deploy an AWS CloudFormation template in your AWS account to provision the foundational resources to get started with implementing this labeling workflow. This gives you hands-on experience with the following topics:

  • Creating a private labeling workforce in Ground Truth
  • Creating a custom labeling job using the Ground Truth framework with the following components:
    • Designing a pre-labeling AWS Lambda function that pulls data from different sources and runs a format conversion where necessary
    • Implementing a customized labeling user interface in Ground Truth using crowd templates that dynamically loads the data generated by the pre-labeling Lambda function
    • Consolidating labels from multiple workers using a customized post-labeling Lambda function
  • Configuring a custom labeling job using Ground Truth with a customized interface for displaying multiple pieces of data that have to be labeled as a single item

Prior to diving deep into the implementation, I provide an introduction to the use case and show how the Ground Truth custom labeling framework eases the implementation of highly complex labeling workflows. To make full use of this post, you need an AWS account in which you can deploy CloudFormation templates. The total cost incurred on your account for following this post is under $1.

Labeling complex datasets for industrial welding quality control

Although the mechanisms discussed in this post are generally applicable to any labeling workflow with different data formats, I use data from a welding quality control use case. In this use case, the manufacturing company running the welding process wants to predict whether the welding result will be OK or whether anomalies occurred during the process. To implement this with a supervised ML model, you need labeled training data, such as datasets representing welding processes that are labeled to indicate whether the process was normal or not. We implement this labeling process (not the ML or modeling process) using Ground Truth, which allows welding experts to assess the result of a welding process and assign this result to a dataset consisting of images and sensor data.

The CloudFormation template creates an Amazon Simple Storage Service (Amazon S3) bucket in your AWS account that contains images (prefix images) and CSV files (prefix sensor_data). The images are pictures taken during an industrial welding process similar to the following, where a weld seam is applied onto a metal surface (for image source, see TIG Stainless Steel 304):

 

The CSV files contain sensor data representing current, electrode position, and voltage measured by sensors on the welding machine. For the full dataset, see the GitHub repo. A raw sample of this CSV data is as follows:

0|96.19|1023|420|4.5|4.5|1|8
0.1|96.13|894|424|4.5|4.5|1|8
0.2|96.06|884|425|4.5|4.5|1|8
0.3|96.05|884|426|4.5|4.5|1|8
0.4|96.12|887|426|4.5|4.5|1|8
0.5|96.17|902|426|4.5|4.5|2|8
0.6|95.82|974|426|4.5|4.5|2|8
0.7|95.45|1304|426|4.5|4.5|3|8
0.8|95.15|1410|428|4.5|4.5|3|8
0.9|94.96|1446|428|4.5|4.5|3|8
1|94.79|1464|428|4.5|4.5|3|8
...

The first column of the data is a timestamp in milliseconds normalized to the start of the welding process. Each row consists of various sensor values associated with the timestamp: the second column is the electrode position, the third is the current, and the fourth is the voltage (the other values are irrelevant here). For instance, the row with timestamp 1 has an electrode position of 94.79, a current of 1464, and a voltage of 428.

Because it’s difficult for humans to make assessments using the raw CSV data, I also show how to preprocess such data on the fly for labeling and turn it into more easily readable plots. This way, the welding experts can view the images and the plots to make their assessment about the welding process.
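To illustrate this preprocessing, the following is a minimal sketch that turns one of the CSV files into plots with pandas and matplotlib. The column names are assumptions based on the description above (the raw files have no header), and the deployed pre-labeling Lambda function performs the equivalent conversion.

import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render to files, no display needed
import matplotlib.pyplot as plt

# Assumed column names; only timestamp, electrode_pos, current, and voltage are used.
cols = ["timestamp", "electrode_pos", "current", "voltage", "c5", "c6", "c7", "c8"]
df = pd.read_csv("weld.1.csv", sep="|", header=None, names=cols)

for signal in ["electrode_pos", "current", "voltage"]:
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.plot(df["timestamp"], df[signal])
    ax.set_xlabel("timestamp")
    ax.set_ylabel(signal)
    fig.tight_layout()
    fig.savefig(f"weld.1.csv-{signal}.png")
    plt.close(fig)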

Deploying the CloudFormation template

To simplify the setup and configurations needed in the following, I created a CloudFormation template that deploys several foundations into your AWS account. To start this process, complete the following steps:

  1. Sign in to your AWS account.
  2. Choose one of the following links, depending on which AWS Region you’re using:
us-east-1
us-west-2
eu-west-1
eu-central-1
ap-northeast-1
ap-southeast-1
  3. Keep all the parameters as they are and select I acknowledge that AWS CloudFormation might create IAM resources with custom names and I acknowledge that AWS CloudFormation might require the following capability: CAPABILITY_AUTO_EXPAND.
  4. Choose Create stack to start the deployment.

The deployment takes about 3–5 minutes, during which time a bucket with data to label, some AWS Lambda functions, and an AWS Identity and Access Management (IAM) role are deployed. The process is complete when the status of the deployment switches to CREATE_COMPLETE.

The Outputs tab has additional information, such as the Amazon S3 path to the manifest file, which you use throughout this post. Therefore, it’s recommended to keep this browser tab open and follow the rest of the post in another tab.

Creating a Ground Truth labeling workforce

Ground Truth offers three options for defining workforces that complete the labeling: Amazon Mechanical Turk, vendor-specific workforces, and private workforces. In this section, we configure a private workforce because we want to complete the labeling ourselves. Create a private workforce with the following steps:

  1. On the Amazon SageMaker console, under Ground Truth, choose Labeling workforces.
  2. On the Private tab, choose Create private team.

  3. Enter a name for the labeling workforce. For our use case, I enter welding-experts.
  4. Select Invite new workers by email.
  5. Enter your e-mail address, an organization name, and a contact e-mail (which may be the same as the one you just entered).
  6. Choose Create private team.

The console confirms the creation of the labeling workforce at the top of the screen. When you refresh the page, the new workforce shows on the Private tab, under Private teams.

You also receive an e-mail with login instructions, including a temporary password and a link to open the login page.

  7. Choose the link and use your e-mail and temporary password to authenticate and change the password for the login.

It’s recommended to keep this browser tab open so you don’t have to log in again. This concludes all necessary steps to create your workforce.
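If you prefer to script this step, a private work team can also be created with the boto3 SageMaker client. The Cognito identifiers below are placeholders; the console wires up the underlying Cognito user pool for you when you follow the steps above.

import boto3

sm = boto3.client("sagemaker")

sm.create_workteam(
    WorkteamName="welding-experts",
    Description="Private team of welding experts",
    MemberDefinitions=[{
        "CognitoMemberDefinition": {
            "UserPool": "<cognito-user-pool-id>",      # placeholder
            "UserGroup": "<cognito-user-group-name>",  # placeholder
            "ClientId": "<cognito-app-client-id>",     # placeholder
        }
    }],
)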

Configuring a custom labeling job

In this section, we create a labeling job and use this job to explain the details and data flow of a custom labeling job.

  1. On the Amazon SageMaker console, under Ground Truth, choose Labeling jobs.
  2. Choose Create labeling job.

  3. Enter a name for your labeling job, such as WeldingLabelJob1.
  4. Choose Manual data setup.
  5. For Input dataset location, enter the ManifestS3Path value from the CloudFormation stack Outputs tab.
  6. For Output dataset location, enter the ProposedOutputPath value from the CloudFormation stack Outputs tab.
  7. For IAM role, choose Enter a custom IAM role ARN.
  8. Enter the SagemakerServiceRoleArn value from the CloudFormation stack Outputs tab.
  9. For the task type, choose Custom.
  10. Choose Next.

The IAM role is a customized role created by the CloudFormation template that allows Ground Truth to invoke Lambda functions and access Amazon S3.

  11. Choose to use a private labeling workforce.
  12. From the drop-down menu, choose the workforce welding-experts.
  13. For task timeout and task expiration time, 1 hour is sufficient.
  14. The number of workers per dataset object is 1.
  15. In the Lambda functions section, for Pre-labeling task Lambda function, choose the function that starts with PreLabelingLambda-.
  16. For Post-labeling task Lambda function, choose the function that starts with PostLabelingLambda-.
  17. Enter the following code into the templates section. This HTML code specifies the interface that the workers in the private labeling workforce see when labeling items. For our use case, the template displays four images, and the categories to classify welding results are as follows:
    <script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
    <crowd-form>
      <crowd-classifier
        name="WeldingClassification"
        categories="['Good Weld', 'Burn Through', 'Contamination', 'Lack of Fusion', 'Lack of Shielding Gas', 'High Travel Speed', 'Not sure']"
        header="Please classify the welding process."
      >
          <classification-target>
            <div>
              <h3>Welding Image</h3>
    	      	<p><strong>Welding Camera Image </strong>{{ task.input.image.title }}</p>
    	      	<p><a href="{{ task.input.image.file | grant_read_access }}" target="_blank">Download Image</a></p>
    	      	<p>
    	      		<img style="height: 30vh; margin-bottom: 10px" src="{{ task.input.image.file | grant_read_access }}"/>
    	      	</p>
    	    </div>
    	    <hr/>
            <div>
              <h3>Current Graph</h3>
    	      	<p><strong>Current Graph </strong>{{ task.input.current.title }}</p>
    	      	<p><a href="{{ task.input.current.file | grant_read_access }}" target="_blank">Download Current Plot</a></p>
    	      	<p>
    	      		<img style="height: 30vh; margin-bottom: 10px" src="{{ task.input.current.file | grant_read_access }}"/>
    	      	</p>
    	    </div>
            <hr/>
            <div>
              <h3>Electrode Position Graph</h3>
    	      	<p><strong>Electrode Position Graph </strong>{{ task.input.electrode.title }}</p>
    	      	<p><a href="{{ task.input.electrode.file | grant_read_access }}" target="_blank">Download Electrode Position Plot</a></p>
    	      	<p>
    	      		<img style="height: 30vh; margin-bottom: 10px" src="{{ task.input.electrode.file | grant_read_access }}"/>
    	      	</p>
    	    </div>
            <hr/>
            <div>
              <h3>Voltage Graph</h3>
    	      	<p><strong>Voltage Graph </strong>{{ task.input.voltage.title }}</p>
    	      	<p><a href="{{ task.input.voltage.file | grant_read_access }}" target="_blank">Download Voltage Plot</a></p>
    	      	<p>
    	      		<img style="height: 30vh; margin-bottom: 10px" src="{{ task.input.voltage.file | grant_read_access }}"/>
    	      	</p>
    	    </div>
          </classification-target>
    
    
    
        <full-instructions header="Classification Instructions">
          <p>Read the task carefully and inspect the image as well as the plots.</p>
          <p>
    		  The image is a picture taking during the welding process. The plots show the corresponding sensor data for
    		  the electrode position, the voltage and the current measured during the welding process.
    	  </p>
        </full-instructions>
    
        <short-instructions>
          <p>Read the task carefully and inspect the image as well as the plots</p>
        </short-instructions>
      </crowd-classifier>
    </crowd-form>
    

The wizard for creating the labeling job has a preview function in the section Custom labeling task setup, which you can use to check if all configurations work properly.

  18. To preview the interface, choose Preview.

This opens a new browser tab and shows a test version of the labeling interface, similar to the following screenshot.

  19. To create the labeling job, choose Create.

Ground Truth sets up the labeling job as specified, and the dashboard shows its status.
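You can also check the status of the job programmatically; a minimal sketch using boto3:

import boto3

sm = boto3.client("sagemaker")

# Uses the job name entered in the wizard above.
resp = sm.describe_labeling_job(LabelingJobName="WeldingLabelJob1")
print(resp["LabelingJobStatus"], resp["LabelCounters"])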

Assigning labels

To finalize the labeling job that you configured, you log in to the worker portal and assign labels to different data items consisting of images and data plots. The details on how the different components of the labeling job work together are explained in the next section.

  1. On the Amazon SageMaker console, under Ground Truth, choose Labeling workforces.
  2. On the Private tab, choose the link for Labeling portal sign-in URL.

When Ground Truth is finished preparing the labeling job, you can see it listed in the Jobs section. If it’s not showing up, wait a few minutes and refresh the tab.

  3. Choose Start working.

This launches the labeling UI, which allows you to assign labels to mixed datasets consisting of welding images and plots for current, electrode position, and voltage.

For this use case, you can assign one of seven different labels to a single dataset item. These classes and labels are defined in the HTML of the UI, but you can also insert them dynamically using the pre-labeling Lambda function (discussed in the next section). Because we don’t actually use the labeled data for ML purposes, you can assign the labels randomly to the five items that are displayed by Ground Truth for this labeling job.

After labeling all the items, the UI switches back to the list with available jobs. This concludes the section about configuring and launching the labeling job. In the next section, I explain the mechanics of a custom labeling job in detail and also dive deep into the different elements of the HTML interface.

Custom labeling deep dive

A custom labeling job combines the data to be labeled with three components to create a workflow that allows workers from the labeling workforce to assign labels to each item in the dataset:

  • Pre-labeling Lambda function – Generates the content to be displayed on the labeling interface using the manifest file specified during the configuration of the labeling job. For this use case, the function also converts the CSV files into human readable plots and stores these plots as images in the S3 bucket under the prefix plots.
  • Labeling interface – Uses the output of the pre-labeling function to generate a user interface. For this use case, the interface displays four images (the picture taken during the welding process and the three graphs for current, electrode position, and voltage) and a form that allows workers to classify the welding process.
  • Label consolidation Lambda function – Allows you to implement custom strategies to consolidate classifications of one or several workers into a single response. For our workforce, this is very simple because there is only a single worker whose labels are consolidated into a file, which is stored by Ground Truth into Amazon S3.

Before we analyze these three components, I provide insights into the structure of the manifest file, which describes the data sources for the labeling job.

Manifest and dataset files

The manifest file is a file conforming to the JSON lines format, in which each line represents one item to label. Ground Truth expects either a key source or source-ref in each line of the file. For this use case, I use source, and the mapped value must be a string representing an Amazon S3 path. For this post, we only label five items, and the JSON lines are similar to the following code:

{"source": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/dataset/dataset-1.json"}

For our use case with multiple input formats and files, each line in the manifest points to a dataset file that is also stored on Amazon S3. Our dataset is a JSON document, which contains references to the welding images and the CSV file with the sensor data:

{
  "sensor_data": {"s3Path": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/sensor_data/weld.1.csv"},
  "image": {"s3Path": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/images/weld.1.png"}
}
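The CloudFormation template already provides these files in the S3 bucket, but for reference, here is a minimal sketch of how such manifest and dataset files could be generated locally (the bucket name is just the example used throughout this post):

import json

bucket = "iiot-custom-label-blog-bucket-unn4d0l4j0"  # example bucket name

with open("dataset.manifest", "w") as manifest:
    for i in range(1, 6):
        dataset = {
            "sensor_data": {"s3Path": f"s3://{bucket}/sensor_data/weld.{i}.csv"},
            "image": {"s3Path": f"s3://{bucket}/images/weld.{i}.png"},
        }
        # One dataset descriptor file per item, referenced by one manifest line each.
        with open(f"dataset-{i}.json", "w") as f:
            json.dump(dataset, f)
        manifest.write(json.dumps({"source": f"s3://{bucket}/dataset/dataset-{i}.json"}) + "\n")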

Ground Truth takes each line of the manifest file and triggers the pre-labeling Lambda function, which we discuss next.

Pre-labeling Lambda function

A pre-labeling Lambda function creates a JSON object that is used to populate the item-specific portions of the labeling interface. For more information, see Processing with AWS Lambda.

Before Ground Truth displays an item for labeling to a worker, it runs the pre-labeling function and forwards the information in the manifest’s JSON line to the function. For our use case, the event passed to the function is as follows:

{
  "version": "2018-10-06", 
  "labelingJobArn": "arn:aws:sagemaker:eu-west-1:XXX:labeling-job/weldinglabeljob1",
  "dataObject": { 
    "source": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/dataset/dataset-1.json" 
  }
}

Although I omit the implementation details here (for those interested, the code is deployed with the CloudFormation template for review), the function for our labeling job uses this input to complete the following steps:

  1. Download the file referenced in the source field of the input (see the preceding code).
  2. Download the dataset file that is referenced in the source
  3. Download a CSV file containing the sensor data. The dataset file is expected to have a reference to this CSV file.
  4. Generate plots for current, electrode position, and voltage from the contents of the CSV file.
  5. Upload the plot files to Amazon S3.
  6. Generate a JSON object containing the references to the aforementioned plot files and the welding image referenced in the dataset file.

When these steps are complete, the function returns a JSON object with two parts:

  • taskInput – Fully customizable JSON object that contains information to be displayed on the labeling UI.
  • isHumanAnnotationRequired – A string representing a Boolean value (True or False), which you can use to exclude objects from being labeled by humans. I don’t use this flag for this use case because we want to label all the provided data items.

For more information, see Processing with AWS Lambda.

Because I want to show the welding images and the three graphs for current, electrode position, and voltage, the result of the Lambda function is as follows for the first dataset:

{
  "taskInput": { 
    "image": { 
      "file": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/images/weld.1.png", 
      "title": " from image at s3://iiot-custom-label-blog-bucket-unn4d0l4j0/images/weld.1.png"
    }, 
    "voltage": { 
      "file": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/plots/weld.1.csv-current.png", 
      "title": " from file at plots/weld.1.csv-current.png"
    },
    "electrode": { 
      "file": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/plots/weld.1.csv-electrode_pos.png", 
      "title": " from file at plots/weld.1.csv-electrode_pos.png" 
    }, 
    "current": { 
      "file": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/plots/weld.1.csv-voltage.png", 
      "title": " from file at plots/weld.1.csv-voltage.png" 
    } 
  }, 
  "isHumanAnnotationRequired": "true"
}

In the preceding code, the taskInput is fully customizable; the function returns the Amazon S3 paths to the images to display, and also a title, which has some non-functional text. Next, I show how to access these different parts of the taskInput JSON object when building the customized labeling UI displayed to workers by Ground Truth.
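Putting these pieces together, a pre-labeling handler could look roughly like the following sketch. It is not the function deployed by the CloudFormation template: it skips the plot generation and only constructs the plot S3 URIs by the naming convention shown above, and the titles are simplified.

import json
import boto3

s3 = boto3.client("s3")

def split_s3_uri(uri):
    # "s3://bucket/key" -> ("bucket", "key")
    bucket, _, key = uri.replace("s3://", "").partition("/")
    return bucket, key

def lambda_handler(event, context):
    # Each invocation receives one manifest line in event["dataObject"]["source"].
    bucket, key = split_s3_uri(event["dataObject"]["source"])
    dataset = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())

    # The deployed function also renders the three plots from the CSV file and uploads
    # them under the plots prefix; here we only build the resulting S3 URIs.
    csv_name = dataset["sensor_data"]["s3Path"].rsplit("/", 1)[-1]

    def plot_uri(signal):
        return f"s3://{bucket}/plots/{csv_name}-{signal}.png"

    return {
        "taskInput": {
            "image": {"file": dataset["image"]["s3Path"], "title": "welding camera image"},
            "current": {"file": plot_uri("current"), "title": "current plot"},
            "electrode": {"file": plot_uri("electrode_pos"), "title": "electrode position plot"},
            "voltage": {"file": plot_uri("voltage"), "title": "voltage plot"},
        },
        "isHumanAnnotationRequired": "true",
    }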

Labeling UI: Accessing taskInput content

Ground Truth uses the output of the Lambda function to fill in content into the HTML skeleton that is provided at the creation of the labeling job. In general, the contents of the taskInput output object are accessed using task.input in the HTML code.

For instance, to retrieve the Amazon S3 path where the welding image is stored from the output, you need to access the path taskInput/image/file. Because the taskInput object from the function output is mapped to task.input in the HTML, the corresponding reference to the welding image file is task.input.image.file. This reference is directly integrated into the HTML code of the labeling UI to display the welding image:

<img style="height: 30vh; margin-bottom: 10px" src="{{ task.input.image.file | grant_read_access }}"/>

The grant_read_access filter is needed for files in S3 buckets that aren’t publicly accessible. This makes sure that the URL passed to the browser contains a short-lived access token for the image and thereby avoids having to make resources publicly accessible for labeling jobs. This is often mandatory because the data to be labeled, such as machine data, is confidential. Because the pre-labeling function has also converted the CSV files into plots and images, their integration into the UI is analogous.

Label consolidation Lambda function

The second Lambda function that was configured for the custom labeling job runs when all workers have labeled an item or the time limit of the labeling job is reached. The key task of this function is to derive a single label from the responses of the workers. Additionally, the function can be used for any kind of further processing of the labeled data, such as storing it on Amazon S3 in a format ideally suited for the ML pipeline that you use.

Although there are different possible strategies to consolidate labels, I focus on the cornerstones of the implementation for such a function and show how they translate to our use case. The consolidation function is triggered by an event similar to the following JSON code:

{ 
  "version": "2018-10-06", 
  "labelingJobArn": "arn:aws:sagemaker:eu-west-1:261679111194:labeling-job/weldinglabeljob1", 
  "payload": { 
    "s3Uri": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/output/WeldingLabelJob1/annotations/consolidated-annotation/consolidation-request/iteration-1/2020-09-15_16:16:11.json" 
  }, 
  "labelAttributeName": "WeldingLabelJob1", 
  "roleArn": "arn:aws:iam::261679111194:role/AmazonSageMaker-Service-role-unn4d0l4j0", 
  "outputConfig": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/output/WeldingLabelJob1/annotations", 
  "maxHumanWorkersPerDataObject": 1 
}

The key item in this event is the payload, which contains an s3Uri pointing to a file stored on Amazon S3. This payload file contains the list of datasets that have been labeled and the labels assigned to them by workers. The following code is an example of such a list entry:

{ 
  "datasetObjectId": "4", 
  "dataObject": { 
    "s3Uri": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/dataset/dataset-5.json" 
  }, 
  "annotations": [ 
    { 
      "workerId": "private.eu-west-1.abd2ec3e354db315",
      "annotationData": { 
          "content":"{"WeldingClassification":{"label":"Not sure"}}"
      } 
    } 
  ] 
}

Along with an identifier that you could use to determine which worker labeled the item, each entry lists the labels that have been assigned to the dataset. In the case of multiple workers, there are multiple entries in annotations. Because I created a single worker that labeled all the items for this post, there is only a single entry. The file dataset-5.json has been labeled with Not sure for the classifier WeldingClassification.

The label consolidation function has to iterate over all list entries and determine for each dataset a label to use as the ground truth for supervised ML training. Ground Truth expects the function to return a list containing an entry for each dataset item with the following structure:

{ 
  "datasetObjectId": "4", 
  "consolidatedAnnotation": { 
    "content": { 
      "WeldingLabelJob1": {
         "WeldingClassification": "Not sure" 
      } 
    }
  } 
}

Each entry of the returned list must contain the datasetObjectId for the corresponding entry in the payload file and a JSON object consolidatedAnnotation, which contains an object content. Ground Truth expects content to contain a key that equals the name of the labeling job (for our use case, WeldingLabelJob1). For more information, see Processing with AWS Lambda.
You can change this behavior when you create the labeling job by selecting I want to specify a label attribute name different from the labeling job name and entering a label attribute name.

The content inside this key equaling the name of the labeling job is freely configurable and can be arbitrarily complex. For our use case, it’s enough to return the assigned label Not sure. If any of these formatting requirements are not met, Ground Truth assumes the labeling job didn’t run properly and failed.
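For illustration, the following is a minimal sketch of such a consolidation function. It assumes a single annotation per dataset object, as in our one-person workforce; a majority vote or similar strategy would replace the first-annotation lookup for larger teams.

import json
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Download the payload file that lists all labeled dataset objects.
    payload_uri = event["payload"]["s3Uri"]
    bucket, _, key = payload_uri.replace("s3://", "").partition("/")
    items = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())

    label_attribute = event["labelAttributeName"]  # for our use case, WeldingLabelJob1
    consolidated = []
    for item in items:
        # With a single worker we simply take the first annotation.
        annotation = json.loads(item["annotations"][0]["annotationData"]["content"])
        consolidated.append({
            "datasetObjectId": item["datasetObjectId"],
            "consolidatedAnnotation": {
                "content": {
                    label_attribute: {
                        "WeldingClassification": annotation["WeldingClassification"]["label"]
                    }
                }
            },
        })
    return consolidated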

Because the requirements are met and I specified output as the desired prefix during the creation of the labeling job, Ground Truth uploads the consolidated labels as a list of JSON entries into the bucket under the following prefix:

output/WeldingLabelJob1/annotations/consolidated-annotation/consolidation-response/iteration-1/

You can use such files for training ML algorithms in Amazon SageMaker or for further processing.
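As a small example of such further processing, the following sketch reads one consolidation-response file and prints the assigned label per dataset object. The bucket and file names are placeholders based on the example paths in this post.

import json
import boto3

s3 = boto3.client("s3")

bucket = "iiot-custom-label-blog-bucket-unn4d0l4j0"  # placeholder
key = ("output/WeldingLabelJob1/annotations/consolidated-annotation/"
       "consolidation-response/iteration-1/<timestamp>.json")  # placeholder file name

entries = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
for entry in entries:
    label = entry["consolidatedAnnotation"]["content"]["WeldingLabelJob1"]["WeldingClassification"]
    print(entry["datasetObjectId"], label)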

Cleaning up

To avoid incurring future charges, delete all resources created for this post.

  1. On the AWS CloudFormation console, choose Stacks.
  2. Select the stack iiot-custom-label-blog.
  3. Choose Delete.

This step removes all files and the S3 bucket from your account. The process takes about 3–5 minutes.

Conclusion

Supervised ML requires labeled data, and Ground Truth provides a platform for creating labeling workflows. This post showed how to build a complex industrial IoT labeling workflow in which data from multiple sources needs to be considered for labeling items. The post explained how to create a custom labeling job and provided details on the mechanisms Ground Truth requires to implement such a workflow. To get started with writing your own custom labeling job, refer to the custom labeling documentation page for Ground Truth, and consider re-deploying the CloudFormation template of this post to get a sample for the pre-labeling and consolidation Lambda functions. The blog post “Creating custom labeling jobs with AWS Lambda and Amazon SageMaker Ground Truth” provides further insights into building custom labeling jobs.


About the Author

As a Principal Prototyping Engagement Manager, Dr. Markus Bestehorn is responsible for building business-critical prototypes with AWS customers, and is a specialist for IoT and machine learning. His “career” started as a 7-year-old when he got his hands on a computer with two 5.25” floppy disks, no hard disk, and no mouse, on which he started writing BASIC, and later C as well as C++ programs. He holds a PhD in computer science and all currently available AWS certifications. When he’s not on the computer, he runs or climbs mountains.
