Accelerated inference on Arm microcontrollers with TensorFlow Lite for Microcontrollers and CMSIS-NN

A guest post by Fredrik Knutsson of Arm

arm logo

The MCU universe

Microcontrollers (MCUs) are the tiny computers that power our technological environment. There are over 30 billion of them manufactured every year, embedded in everything from household appliances to fitness trackers. If you’re in a house right now, there are dozens of microcontrollers all around you. If you drive a car, there are dozens riding with you on every drive. Using TensorFlow Lite for Microcontrollers (TFLM), developers can deploy TensorFlow models to many of these devices, enabling entirely new forms of on-device intelligence.

While ubiquitous, microcontrollers are designed to be inexpensive and energy efficient, which means they have small amounts of memory and limited processing power. A typical microcontroller might have a few hundred kilobytes of RAM, and a 32-bit processor running at less than 100 MHz. With advances in machine learning enabled by TFLM, it has become possible to run neural networks on these devices.

microcontroller

With minimal computational resources, it is important that microcontroller programs are optimized to run as efficiently as possible. This means making the most of the features of their microprocessor hardware, which requires carefully tuned application code.

Many of the microcontrollers used in popular products are built around Arm’s Cortex-M based processors, which are the industry leader in 32-bit microcontrollers, with more than 47 billion shipped. Arm’s open source CMSIS-NN library provides optimized implementations of common neural network functions that maximize performance on Cortex-M processors. This includes making use of DSP and M-Profile Vector Extension (MVE) instructions for hardware acceleration of operations such as matrix multiplication.

Benchmarks for key use cases

Arm’s engineers have worked closely with the TensorFlow team to develop optimized versions of the TensorFlow Lite kernels that use CMSIS-NN to deliver blazing fast performance on Arm Cortex-M cores. Developers using TensorFlow Lite can use these optimized kernels with no additional work, just by using the latest version of the library. Arm has made these optimizations in open source, and they are free and easy for developers to use today!

The following benchmarks show the performance uplift when using CMSIS-NN optimized kernels versus reference kernels for several key use cases featured in the TFLM example applications. The tests have been performed on an Arm Cortex-M4 based FPGA platform:

Table showing performance uplift when using CMSIS-NN kernels

The Arm Cortex-M4 processor supports DSP extensions, that enables the processor to execute DSP-like instructions for faster inference. To improve the inference performance even further, the new Arm Cortex-M55 processor supports MVE, also known as Helium technology.

Improving performance with CMSIS-NN

So far, the following optimized CMSIS-NN kernels have been integrated with TFLM:

 Table showing CMSIS-NN kernels integrated with TFLM

There will be regular updates to the CMSIS-NN library to expand the support of optimized kernels, where the key driver for improving support is that it should give a significant performance increase for a given use case. For discussion regarding kernel optimizations, a good starting point is to raise a ticket on the TensorFlow or CMSIS Github repository describing your use case.

Most of the optimizations are implemented specifically for 8-bit quantized (int8) operations, and this will be the focus of future improvements.

It’s easy to try the optimized kernels yourself by following the instructions that accompany the examples. For example, to build the person detection example for the SparkFun Edge with CMSIS-NN kernels, you can use the following command:

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=sparkfun_edge OPTIMIZED_KERNEL_DIR=cmsis_nn person_detection_int8_bin

The latest version of the TensorFlow Lite Arduino library includes the CMSIS-NN optimizations, and includes all of the example applications, which are compatible with the Cortex-M4 based Arduino Nano 33 BLE Sense.

Next leap in neural processing

Looking ahead into 2021 we can expect a dramatic increase in neural processing from the introduction of devices including a microNPU (Neural Processing Unit) working alongside a microcontroller. These microNPUs are designed to accelerate ML inference within the constraints of embedded and IoT devices, with devices using the Arm Cortex-M55 MCU coupled with the new Ethos-U55 microNPU delivering up to a 480x performance increase compared to previous microcontrollers.

This unprecedented level of ML processing capability within smaller, power constrained devices will unlock a huge amount of innovation across a range of applications, from smart homes and cities to industrial, retail, and healthcare. The potential for innovation within each of these different areas is huge, with hundreds of sub segments and thousands of potential applications that will make a real difference to people’s lives.

Read More

Join us at TensorFlow Everywhere

Posted by Biswajeet Mallik, Program Manager

Developer communities have played a strong part in TensorFlow’s success over the years. Our 70+ user groups, 170+ ML GDEs, and 12 special interest groups all play a critical role in education, advancing the art of machine learning, and supporting ML communities around the world.

To continue this momentum, we’re excited to announce a new event series: TensorFlow Everywhere, a series of global events led by TensorFlow and machine learning communities around the world.

Events planned by our community leads will be specific to each locale (with talks in local languages). These events are for everyone from machine learning and data science beginners to advanced developers; check with your event organizers for details on what will be presented. Be sure to join as members of the TensorFlow team may be dropping in virtually to say hi and answer questions.

And after the event, you can stay in touch with the organizing communities for future events if you enjoyed and benefited from these sessions.

Find an event and join the conversation on social #TFEverywhere2021.

Read More

Leveraging TensorFlow-TensorRT integration for Low latency Inference

Posted by Jonathan Dekhtiar (NVIDIA), Bixia Zheng (Google), Shashank Verma (NVIDIA), Chetan Tekur (NVIDIA)

TensorFlow-TensorRT (TF-TRT) is an integration of TensorFlow and TensorRT that leverages inference optimization on NVIDIA GPUs within the TensorFlow ecosystem. It provides a simple API that delivers substantial performance gains on NVIDIA GPUs with minimal effort. The integration allows for leveraging of the optimizations that are possible in TensorRT while providing a fallback to native TensorFlow when it encounters segments of the model that are not supported by TensorRT.

In our previous blog on TF-TRT integration, we covered the workflow for TensorFlow 1.13 and earlier releases. This blog will introduce TensorRT integration in TensorFlow 2.x, and demonstrate a sample workflow with the latest API. Even if you are new to this integration, this blog contains all the information you need to get started. Using the TensorRT integration has shown to improve performance by 2.4X compared to native TensorFlow inference on Nvidia T4 GPUs.

TF-TRT Integration

When TF-TRT is enabled, in the first step, the trained model is parsed in order to partition the graph into TensorRT-supported subgraphs and unsupported subgraphs. Then each TensorRT-supported subgraph is wrapped in a single special TensorFlow operation (TRTEngineOp). In the second step, for each TRTEngineOp node, an optimized TensorRT engine is built. The TensorRT-unsupported subgraphs remain untouched and are handled by the TensorFlow runtime. This is illustrated in Figure 1.

TF-TRT allows for leveraging TensorFlow’s flexibility while also taking advantage of the optimizations that can be applied to the TensorRT supported subgraphs. Only portions of the graph are optimized and executed with TensorRT, and TensorFlow executes the remaining graph.

In the inference example shown in Figure 1, TensorFlow executes the Reshape Op and the Cast Op. Then TensorFlow passes the execution of the TRTEngineOp_0, the pre-built TensorRT engine, to TensorRT runtime.

An example of graph partitioning and building TRT engine in TF-TRT
Figure 1: An example of graph partitioning and building TRT engine in TF-TRT

Workflow

In this section, we will take a look at the typical TF-TRT workflow using an example.

Workflow diagram when performing inference in TensorFlow only, and in TensorFlow-TensorRT using a converted SavedModel
Figure 2: Workflow diagram when performing inference in TensorFlow only, and in TensorFlow-TensorRT using a converted SavedModel

Figure 2 shows a standard inference workflow in native TensorFlow and contrasts it with the TF-TRT workflow. The SavedModel format contains all the information required to share or deploy a trained model. In native TensorFlow, the workflow typically involves loading the saved model and running inference using TensorFlow runtime. In TF-TRT, there are a few additional steps involved, including applying TensorRT optimizations to the TensorRT supported subgraphs of the model, and optionally pre-building the TensorRT engines.

First, we create an object to hold the conversion parameters, including a precision mode. The precision mode is used to indicate the minimum precision (for example FP32, FP16 or INT8) that TF-TRT can use to implement the TensorFlow operations. Then we create a converter object which takes the conversion parameters and input from a saved model. Note that in TensorFlow 2.x, TF-TRT only supports models saved in the TensorFlow SavedModel format.

Next, when we call the converter convert() method, TF-TRT will convert the graph by replacing TensorRT compatible portions of the graph with TRTEngineOps. For better performance at runtime, the converter build() method can be used for creating the TensorRT execution engine ahead of time. The build() method requires the input data shapes to be known before the optimized TensorRT execution engines are built. If input data shapes are not known then TensorRT execution engine can be built at runtime when the input data is available. The TensorRT execution engine should be built on a GPU of the same device type as the one on which inference will be executed as the building process is GPU specific. For example, an execution engine built for a Nvidia A100 GPU will not work on a Nvidia T4 GPU.

Finally, the TF-TRT converted model can be saved to disk by calling the save method. The code corresponding to the workflow steps mentioned in this section are shown in the codeblock below:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Conversion Parameters
conversion_params = trt.TrtConversionParams(
precision_mode=trt.TrtPrecisionMode.<FP32 or FP16>)

converter = trt.TrtGraphConverterV2(
input_saved_model_dir=input_saved_model_dir,
conversion_params=conversion_params)

# Converter method used to partition and optimize TensorRT compatible segments
converter.convert()

# Optionally, build TensorRT engines before deployment to save time at runtime
# Note that this is GPU specific, and as a rule of thumb, we recommend building at runtime
converter.build(input_fn=my_input_fn)

# Save the model to the disk
converter.save(output_saved_model_dir)

As can be seen from the code example above, the build() method requires an input function corresponding to the shape of the input data. An example of an input function is shown below:

# input_fn: a generator function that yields input data as a list or tuple,
# which will be used to execute the converted signature to generate TensorRT
# engines. Example:
def my_input_fn():
# Let's assume a network with 2 input tensors. We generate 3 sets
# of dummy input data:
input_shapes = [[(1, 16), (2, 16)], # min and max range for 1st input list
[(2, 32), (4, 32)], # min and max range for 2nd list of two tensors
[(4, 32), (8, 32)]] # 3rd input list
for shapes in input_shapes:
# return a list of input tensors
yield [np.zeros(x).astype(np.float32) for x in shapes]

Support for INT8

Compared to FP32 and FP16, INT8 requires additional calibration data to determine the best quantization thresholds. When the precision mode in the conversion parameter is INT8, we need to provide an input function to the convert() method call. This input function is similar to the input function provided to the build() method. In addition, the calibration data generated by the input function passed to the convert() method should generate data that are statistically similar to the actual data seen during inference.

from tensorflow.python.compiler.tensorrt import trt_convert as trt

conversion_params = trt.TrtConversionParams(
precision_mode=trt.TrtPrecisionMode.INT8)

converter = trt.TrtGraphConverterV2(
input_saved_model_dir=input_saved_model_dir,
conversion_params=conversion_params)

# requires some data for calibration
converter.convert(calibration_input_fn=my_input_fn)

# Optionally build TensorRT engines before deployment.
# Note that this is GPU specific, and as a rule of thumb we recommend building at runtime
converter.build(input_fn=my_input_fn)

converter.save(output_saved_model_dir)

Example: ResNet-50

The rest of this blog will show the workflow of taking a TensorFlow 2.x ResNet-50 model, training it, saving it, optimizing it with TF-TRT and finally deploying it for inference. We will also compare inference throughputs using TensorFlow native vs TF-TRT in three precision modes, FP32, FP16, and INT8.

Prerequisites for the example :


Training ResNet-50 using the TensorFlow 2.x container:

First, the latest release of the ResNet-50 model needs to be downloaded from the TensorFlow github repository:

# Adding the git remote and fetch the existing branches
$ git clone --depth 1 https://github.com/tensorflow/models.git .

# List the files and directories present in our working directory
$ ls -al

rwxrwxr-x user user 4 KiB Wed Sep 30 15:31:05 2020 ./
rwxrwxr-x user user 4 KiB Wed Sep 30 15:30:45 2020 ../
rw-rw-r-- user user 337 B Wed Sep 30 15:31:05 2020 AUTHORS
rw-rw-r-- user user 1015 B Wed Sep 30 15:31:05 2020 CODEOWNERS
rwxrwxr-x user user 4 KiB Wed Sep 30 15:31:05 2020 community/
rw-rw-r-- user user 390 B Wed Sep 30 15:31:05 2020 CONTRIBUTING.md
rwxrwxr-x user user 4 KiB Wed Sep 30 15:31:15 2020 .git/
rwxrwxr-x user user 4 KiB Wed Sep 30 15:31:05 2020 .github/
rw-rw-r-- user user 1 KiB Wed Sep 30 15:31:05 2020 .gitignore
rw-rw-r-- user user 1 KiB Wed Sep 30 15:31:05 2020 ISSUES.md
rw-rw-r-- user user 11 KiB Wed Sep 30 15:31:05 2020 LICENSE
rwxrwxr-x user user 4 KiB Wed Sep 30 15:31:05 2020 official/
rwxrwxr-x user user 4 KiB Wed Sep 30 15:31:05 2020 orbit/
rw-rw-r-- user user 3 KiB Wed Sep 30 15:31:05 2020 README.md
rwxrwxr-x user user 4 KiB Wed Sep 30 15:31:06 2020 research/

As noted in the earlier section, for this example we will be using the latest TensorFlow container available in the Docker repository. The user does not need any additional installation steps as TensorRT integration is already included in the container. The steps to pull the container and launch it are as follows:

$ docker pull tensorflow/tensorflow:latest-gpu

# Please ensure that the Nvidia Container Toolkit is installed before running the following command
$ docker run -it --rm
--gpus="all"
--shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864
--workdir /workspace/
-v "$(pwd):/workspace/"
-v "</path/to/save/data/>:/data/" # This is the path that will hold the training data
tensorflow/tensorflow:latest-gpu

From inside the container, we can then verify that we have access to the relevant files and the Nvidia GPU we would like to target:

# Let's first test that we can access the ResNet-50 code that we previously downloaded
$ ls -al
drwxrwxr-x 8 1000 1000 4096 Sep 30 22:31 .git
drwxrwxr-x 3 1000 1000 4096 Sep 30 22:31 .github
-rw-rw-r-- 1 1000 1000 1104 Sep 30 22:31 .gitignore
-rw-rw-r-- 1 1000 1000 337 Sep 30 22:31 AUTHORS
-rw-rw-r-- 1 1000 1000 1015 Sep 30 22:31 CODEOWNERS
-rw-rw-r-- 1 1000 1000 390 Sep 30 22:31 CONTRIBUTING.md
-rw-rw-r-- 1 1000 1000 1115 Sep 30 22:31 ISSUES.md
-rw-rw-r-- 1 1000 1000 11405 Sep 30 22:31 LICENSE
-rw-rw-r-- 1 1000 1000 3668 Sep 30 22:31 README.md
drwxrwxr-x 2 1000 1000 4096 Sep 30 22:31 community
drwxrwxr-x 12 1000 1000 4096 Sep 30 22:31 official
drwxrwxr-x 3 1000 1000 4096 Sep 30 22:31 orbit
drwxrwxr-x 23 1000 1000 4096 Sep 30 22:31 research

# Let's verify we can see our GPUs:
$ nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.XX.XX Driver Version: 450.XX.XX CUDA Version: 11.X |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:1A:00.0 Off | Off |
| 38% 52C P8 14W / 70W | 1MiB / 16127MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

We can now start training ResNet-50. To avoid spending hours training a deep learning model, this article will use the smaller MNIST dataset. However, the workflow will not change with a more state-of-the-art dataset like ImageNet.

# Install dependencies
$ pip install tensorflow_datasets tensorflow_model_optimization

# Download MNIST data and Train
$ python -m "official.vision.image_classification.mnist_main"
--model_dir=./checkpoints
--data_dir=/data
--train_epochs=10
--distribution_strategy=one_device
--num_gpus=1
--download

# Let’s verify that we have the trained model saved on our machine.
$ ls -al checkpoints/

-rw-r--r-- 1 root root 87 Sep 30 22:34 checkpoint
-rw-r--r-- 1 root root 6574829 Sep 30 22:34 model.ckpt-0001.data-00000-of-00001
-rw-r--r-- 1 root root 819 Sep 30 22:34 model.ckpt-0001.index
[...]
-rw-r--r-- 1 root root 6574829 Sep 30 22:34 model.ckpt-0010.data-00000-of-00001
-rw-r--r-- 1 root root 819 Sep 30 22:34 model.ckpt-0010.index
drwxr-xr-x 4 root root 4096 Sep 30 22:34 saved_model
drwxr-xr-x 3 root root 4096 Sep 30 22:34 train
drwxr-xr-x 2 root root 4096 Sep 30 22:34 validation


Obtaining a SavedModel to be used by TF-TRT

After training, Google’s ResNet-50 code exports the model in the SavedModel format at the following path: checkpoints/saved_model/.

The following sample code can be used as a reference in order to export your own trained model as a TensorFlow SavedModel.

import numpy as np

import tensorflow as tf
from tensorflow import keras

def get_model():
# Create a simple model.
inputs = keras.Input(shape=(32,))
outputs = keras.layers.Dense(1)(inputs)
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mean_squared_error")
return model

model = get_model()

# Train the model.
test_input = np.random.random((128, 32))
test_target = np.random.random((128, 1))
model.fit(test_input, test_target)

# Calling `save('my_model')` creates a SavedModel folder `my_model`.
model.save("my_model")

We can verify that the SavedModel generated by Google’s ResNet-50 script is readable and correct:

$ ls -al checkpoints/saved_model

drwxr-xr-x 2 root root 4096 Sep 30 22:49 assets
-rw-r--r-- 1 root root 118217 Sep 30 22:49 saved_model.pb
drwxr-xr-x 2 root root 4096 Sep 30 22:49 variables

$ saved_model_cli show --dir checkpoints/saved_model/ --tag_set serve --signature_def serving_default

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

The given SavedModel SignatureDef contains the following input(s):
inputs['input_1'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 28, 28, 1)
name: serving_default_input_1:0
The given SavedModel SignatureDef contains the following output(s):
outputs['dense_1'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 10)
name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict

Now that we have verified that our SavedModel has been properly saved, we can proceed with loading it with TF-TRT for inference.


Inference

ResNet-50 Inference using TF-TRT

In this section, we will go over the steps for deploying the saved ResNet-50 model on the NVIDIA GPU using TF-TRT. As previously described, we first convert a SavedModel into a TF-TRT model using the convert method and then load the model.

# Convert the SavedModel
converter = trt.TrtGraphConverterV2(input_saved_model_dir=path)
converter.convert()

# Save the converted model
converter.save(converted_model_path)

# Load converted model and infer
model = tf.saved_model.load(converted_model_path)
func = root.signatures['serving_default']
output = func(input_tensor)

For simplicity, we will use a script to perform inference (tf2_inference.py). We will download the script from github.com and put it in the working directory “/workspace/” of the same docker container as before. After this, we can execute the script:

$ wget https://raw.githubusercontent.com/tensorflow/tensorrt/master/tftrt/blog_posts/Leveraging%20TensorFlow-TensorRT%20integration%20for%20Low%20latency%20Inference/tf2_inference.py

$ ls
AUTHORS CONTRIBUTING.md LICENSE checkpoints data orbit tf2_inference.py
CODEOWNERS ISSUES.md README.md community official research

$ python tf2_inference.py --use_tftrt_model --precision fp16

=========================================
Inference using: TF-TRT …
Batch size: 512
Precision: fp16
=========================================

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
TrtConversionParams(rewriter_config_template=None, max_workspace_size_bytes=8589934592, precision_mode='FP16', minimum_segment_size=3, is_dynamic_op=True, maximum_cached_engines=100, use_calibration=True, max_batch_size=512, allow_build_at_runtime=True)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


Processing step: 0100 ...
Processing step: 0200 ...
[...]
Processing step: 9900 ...
Processing step: 10000 ...

Average step time: 2.1 msec
Average throughput: 244248 samples/sec

Similarly, we can run inference for INT8, and FP32

$ python tf2_inference.py --use_tftrt_model --precision int8

$ python tf2_inference.py --use_tftrt_model --precision fp32

Inference using native TensorFlow (GPU) FP32

You can also run the unmodified SavedModel without any TF-TRT acceleration.

$ python tf2_inference.py --use_native_tensorflow

=========================================
Inference using: Native TensorFlow …
Batch size: 512
=========================================

Processing step: 0100 ...
Processing step: 0200 ...
[...]
Processing step: 9900 ...
Processing step: 10000 ...

Average step time: 4.1 msec
Average throughput: 126328 samples/sec

This run was executed with a NVIDIA T4 GPU. The same workflow will work on any NVIDIA GPU.


Comparing Native Tensorflow 2.x performance vs TF-TRT for Inference

Making minimal code changes to take advantage of TF-TRT can result in a significant performance boost. For example, using the inference script in this blog, with a batch-size of 512 on an NVIDIA T4 GPU, we observe almost 2x speedup with TF-TRT FP16, and a 2.4x speedup with TF-TRT INT8 over native TensorFlow. The amount of speedup obtained may differ depending on various factors like the model used, the batch size, the size and format of images in the dataset, and any CPU bottlenecks.

In conclusion, in this blog we show the acceleration provided by TF-TRT. Additionally, with TF-TRT we can use the full TensorFlow Python API and interactive environments like Jupyter Notebooks or Google Colab.

Supported Operators

The TF-TRT user guide lists operators that are supported in TensorRT-compatible subgraphs. Operators outside this list will be executed by the native TensorFlow runtime.

We encourage you to try it yourself and if you encounter problems, please open an issue here.

Read More

Custom object detection in the browser using TensorFlow.js

A guest post by Hugo Zanini, Machine Learning Engineer

Object detection is the task of detecting where in an image an object is located and classifying every object of interest in a given image. In computer vision, this technique is used in applications such as picture retrieval, security cameras, and autonomous vehicles.

One of the most famous families of Deep Convolutional Neural Networks (DNN) for object detection is the YOLO (You Only Look Once).

In this post, we are going to develop an end-to-end solution using TensorFlow to train a custom object-detection model in Python, then put it into production, and run real-time inferences in the browser through TensorFlow.js.

This post is going to be divided into four steps, as follows:

Object detection pipeline

Prepare the data

The first step to train a great model is to have good quality data. When developing this project, I did not find a suitable (and small enough) object detection dataset, so I decided to create my own.

I looked around and saw a Kangaroo sign that I have in my bedroom — a souvenir that I bought to remember my Aussie days. So I decided to build a Kangaroo detector.

To build my dataset, I downloaded 350 kangaroo images from an image search for kangaroos and labeled all of them by hand using the LabelImg application. As we can have more than one animal per image, the process resulted in 520 labeled kangaroos.

Labelling example

In that case, I chose just one class, but the software can be used to annotate multiple classes as well. It’s going to generate an XML file per image (Pascal VOC format) that contains all annotations and bounding boxes.

<annotation>
<folder>images</folder>
<filename>kangaroo-0.jpg</filename>
<path>/home/hugo/Documents/projects/tfjs/dataset/images/kangaroo-0.jpg</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>3872</width>
<height>2592</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>kangaroo</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>60</xmin>
<ymin>367</ymin>
<xmax>2872</xmax>
<ymax>2399</ymax>
</bndbox>
</object>
</annotation>

XML Annotation example

To facilitate the conversion to TF.record format (below), I then converted the XML of the program above into two CSV files containing the data already split in train and test (80%-20%). These files have 9 columns:

  • filename: Image name
  • width: Image width
  • height: Image height
  • class: Image class (kangaroo)
  • xmin: Minimum bounding box x coordinate value
  • ymin: Minimum bounding box y coordinate value
  • xmax: Maximum value of the x coordinate of the bounding box
  • ymax: Maximum value of the y coordinate of the bounding box
  • source: Image source

Using LabelImg makes it easy to create your own dataset, but feel free to use my kangaroo dataset, I’ve uploaded it on Kaggle:

Kangaroo Dataset

Training the model

With a good dataset, it’s time to think about the model.TensorFlow 2 provides an Object Detection API that makes it easy to construct, train, and deploy object detection models. In this project, we’re going to use this API and train the model using a Google Colaboratory Notebook. The remainder of this section explains how to set up the environment, the model selection, and training. If you want to jump straight to the Colab Notebook, click here.

Setting up the environment

Create a new Google Colab notebook and select a GPU as hardware accelerator:

 Runtime > Change runtime type > Hardware accelerator: GPU 

Clone, install, and test the TensorFlow Object Detection API:

Getting and processing the data

As mentioned before, the model is going to be trained using the Kangaroo dataset on Kaggle. If you want to use it as well, it’s necessary to create a user, go into the account section of Kaggle, and get an API Token:

Getting an API Token

Then, you’re ready to download the data:

Now, it’s necessary to create a labelmap file to define the classes that are going to be used. Kangaroo is the only one, so right-click in the File section on Google Colab and create a New file named labelmap.pbtxt as follows:

 item {
name: "kangaroo"
id: 1
}

The last step is to convert the data into a sequence of binary records so that they can be fed into Tensorflow’s object detection API. To do so, transform the data into the TFRecord format using the generate_tf_records.py script available in the Kangaroo Dataset:

Choosing the model

We’re ready to choose the model that’s going to be the Kangaroo Detector. TensorFlow 2 provides 40 pre-trained detection models on the COCO 2017 Dataset. This collection is the TensorFlow 2 Detection Model Zoo and can be accessed here.

Every model has a Speed, Mean Average Precision(mAP) and Output. Generally, a higher mAP implies a lower speed, but as this project is based on a one-class object detection problem, the faster model (SSD MobileNet v2 320×320) should be enough.

Besides the Model Zoo, TensorFlow provides a Models Configs Repository as well. There, it’s possible to get the configuration file that has to be modified before the training. Let’s download the files:

Configure training

As mentioned before, the downloaded weights were pre-trained on the COCO 2017 Dataset, but the focus here is to train the model to recognize one class so these weights are going to be used only to initialize the network — this technique is known as transfer learning, and it’s commonly used to speed up the learning process.

From now, what has to be done is to set up the mobilenet_v2.config file, and start the training. I highly recommend reading the MobileNetV2 paper (Sandler, Mark, et al. – 2018) to get the gist of the architecture.

Choosing the best hyperparameters is a task that requires some experimentation. As the resources are limited in the Google Colab, I am going to use the same batch size as the paper, set a number of steps to get a reasonably low loss, and leave all the other values as default. If you want to try something more sophisticated to find the hyperparameters, I recommend Keras Tuner – an easy-to-use framework that applies Bayesian Optimization, Hyperband, and Random Search algorithms.

With the parameters set, start the training:

To identify how well the training is going, we use the loss value. Loss is a number indicating how bad the model’s prediction was on the training samples. If the model’s prediction is perfect, the loss is zero; otherwise, the loss is greater. The goal of training a model is to find a set of weights and biases that have low loss, on average, across all examples (Descending into ML: Training and Loss | Machine Learning Crash Course).

From the logs, it’s possible to see a downward trend in the values so we say that “The model is converging”. In the next section, we’re going to plot these values for all training steps and the trend will be even clearer.

The model took around 4h to train (with Colab GPU), but by setting different parameters, you can make the process faster or slower. Everything depends on the number of classes you are using and your Precision/Recall target. A highly accurate network that recognizes multiple classes will take more steps and require more detailed parameters tuning.

Validate the model

Now let’s evaluate the trained model using the test data:

The evaluation was done in 89 images and provides three metrics based on the COCO detection evaluation metrics: Precision, Recall and Loss.

The Recall measures how good the model is at hitting the positive class, That is, from the positive samples, how many did the algorithm get right?

Recall

Precision defines how much you can rely on the positive class prediction: From the samples that the model said were positive, how many actually are?

Precision

Setting a practical example: Imagine we have an image containing 10 kangaroos, our model returned 5 detections, being 3 real kangaroos (TP = 3, FN =7) and 2 wrong detections (FP = 2). In that case, we have a 30% recall (the model detected 3 out of 10 kangaroos in the image) and a 60% precision (from the 5 detections, 3 were correct).

The precision and recall were divided by Intersection over Union (IoU) thresholds. The IoU is defined as the area of the intersection divided by the area of the union of a predicted bounding box (B) to a ground-truth box (B)(Zeng, N. – 2018):

Intersection over Union

For simplicity, it’s possible to consider that the IoU thresholds are used to determine whether a detection is a true positive(TP), a false positive(FP) or a false negative (FN). See an example below:

IoU threshold examples

With these concepts in mind, we can analyze some of the metrics we got from the evaluation. From the TensorFlow 2 Detection Model Zoo, the SSD MobileNet v2 320×320 has an mAP of 0.202. Our model presented the following average precisions (AP) for different IoUs:

 
AP@[IoU=0.50:0.95 | area=all | maxDets=100] = 0.222
AP@[IoU=0.50 | area=all | maxDets=100] = 0.405
AP@[IoU=0.75 | area=all | maxDets=100] = 0.221

That’s pretty good! And we can compare the obtained APs with the SSD MobileNet v2 320×320 mAP as from the COCO Dataset documentation:

We make no distinction between AP and mAP (and likewise AR and mAR) and assume the difference is clear from context.

The Average Recall(AR) was split by the max number of detection per image (1, 10, 100). When we have just one kangaroo per image, the recall is around 30% while when we have up to 100 kangaroos it is around 51%. These values are not that good but are reasonable for the kind of problem we’re trying to solve.

 
(AR)@[ IoU=0.50:0.95 | area=all | maxDets= 1] = 0.293
(AR)@[ IoU=0.50:0.95 | area=all | maxDets= 10] = 0.414
(AR)@[ IoU=0.50:0.95 | area=all | maxDets=100] = 0.514

The Loss analysis is very straightforward, we’ve got 4 values:

 
INFO:tensorflow: + Loss/localization_loss: 0.345804
INFO:tensorflow: + Loss/classification_loss: 1.496982
INFO:tensorflow: + Loss/regularization_loss: 0.130125
INFO:tensorflow: + Loss/total_loss: 1.972911

The localization loss computes the difference between the predicted bounding boxes and the labeled ones. The classification loss indicates whether the bounding box class matches with the predicted class. The regularization loss is generated by the network’s regularization function and helps to drive the optimization algorithm in the right direction. The last term is the total loss and is the sum of three previous ones.

Tensorflow provides a tool to visualize all these metrics in an easy way. It’s called TensorBoard and can be initialized by the following command:

 
%load_ext tensorboard
%tensorboard --logdir '/content/training/'

This is going to be shown, and you can explore all training and evaluation metrics.

Tensorboard — Loss

In the tab IMAGES, it’s possible to find some comparisons between the predictions and the ground truth side by side. A very interesting resource to explore during the validation process as well.

Tensorboard — Testing images

Exporting the model

Now that the training is validated, it’s time to export the model. We’re going to convert the training checkpoints to a protobuf (pb) file. This file is going to have the graph definition and the weights of the model.

As we’re going to deploy the model using TensorFlow.js and Google Colab has a maximum lifetime limit of 12 hours, let’s download the trained weights and save them locally. When running the command files.download(‘/content/saved_model.zip”), the colab will prompt the file automatically.

If you want to check if the model was saved properly, load, and test it. I’ve created some functions to make this process easier so feel free to clone the inferenceutils.py file from my GitHub to test some images.

Everything is working well, so we’re ready to put the model in production.

Deploying the model

The model is going to be deployed in a way that anyone can open a PC or mobile camera and perform inferences in real-time through a web browser. To do that, we’re going to convert the saved model to the Tensorflow.js layers format, load the model in a javascript application and make everything available on Glitch.

Converting the model

At this point, you should have something similar to this structure saved locally:

 
├── inference-graph
│ ├── saved_model
│ │ ├── assets
│ │ ├── saved_model.pb
│ │ ├── variables
│ │ ├── variables.data-00000-of-00001
│ │ └── variables.index

Before we start, let’s create an isolated Python environment to work in an empty workspace and avoid any library conflict. Install virtualenv and then open a terminal in the inference-graph folder and create and activate a new virtual environment:

 
virtualenv -p python3 venv
source venv/bin/activate

Install the TensorFlow.js converter:

  pip install tensorflowjs[wizard]

Start the conversion wizard:

  tensorflowjs_wizard

Now, the tool will guide you through the conversion, providing explanations for each choice you need to make. The image below shows all the choices that were made to convert the model. Most of them are the standard ones, but options like the shard sizes and compression can be changed according to your needs.

To enable the browser to cache the weights automatically, it’s recommended to split them into shard files of around 4MB. To guarantee that the conversion is going to work, don’t skip the op validation as well, not all TensorFlow operations are supported so some models can be incompatible with TensorFlow.js — See this list for which ops are currently supported.

Model conversion using Tensorflow.js Converter (Full resolution image here

If everything worked well, you’re going to have the model converted to the Tensorflow.js layers format in the web_modeldirectory. The folder contains a model.json file and a set of sharded weights files in a binary format. The model.json has both the model topology (aka “architecture” or “graph”: a description of the layers and how they are connected) and a manifest of the weight files (Lin, Tsung-Yi, et al).

  
└ web_model
├── group1-shard1of5.bin
├── group1-shard2of5.bin
├── group1-shard3of5.bin
├── group1-shard4of5.bin
├── group1-shard5of5.bin
└── model.json

Configuring the application

The model is ready to be loaded in javascript. I’ve created an application to perform inferences directly from the browser. Let’s clone the repository to figure out how to use the converted model in real-time. This is the project structure:

 
├── models
│ └── kangaroo-detector
│ ├── group1-shard1of5.bin
│ ├── group1-shard2of5.bin
│ ├── group1-shard3of5.bin
│ ├── group1-shard4of5.bin
│ ├── group1-shard5of5.bin
│ └── model.json
├── package.json
├── package-lock.json
├── public
│ └── index.html
├── README.MD
└── src
├── index.js
└── styles.css

For the sake of simplicity, I already provide a converted kangaroo-detector model in the models folder. However, let’s put the web_model generated in the previous section in the models folder and test it.

The first thing to do is to define how the model is going to be loaded in the function load_model (lines 10–15 in the file src>index.js). There are two choices.

The first option is to create an HTTP server locally that will make the model available in a URL allowing requests and be treated as a REST API. When loading the model, TensorFlow.js will do the following requests:

 
GET /model.json
GET /group1-shard1of5.bin
GET /group1-shard2of5.bin
GET /group1-shard3of5.bin
GET /group1-shardo4f5.bin
GET /group1-shardo5f5.bin

If you choose this option, define the load_model function as follows:

  async function load_model() {
// It's possible to load the model locally or from a repo
// You can choose whatever IP and PORT you want in the "http://127.0.0.1:8080/model.json" just set it before in your https server
const model = await loadGraphModel("http://127.0.0.1:8080/model.json");
//const model = await loadGraphModel("https://raw.githubusercontent.com/hugozanini/TFJS-object-detection/master/models/web_model/model.json");
return model;
}

Then install the http-server:

  npm install http-server -g

Go to models > web_model and run the command below to make the model available at http://127.0.0.1:8080 . This a good choice when you want to keep the model weights in a safe place and control who can request inferences to it. The -c1 parameter is added to disable caching, and the –cors flag enables cross origin resource sharing allowing the hosted files to be used by the client side JavaScript for a given domain.

  http-server -c1 --cors .

Alternatively you can upload the model files somewhere, in my case, I chose my own Github repo and referenced to the model.json URL in the load_model function:

 

async function load_model() {
// It's possible to load the model locally or from a repo
//const model = await loadGraphModel("http://127.0.0.1:8080/model.json");
const model = await loadGraphModel("https://raw.githubusercontent.com/hugozanini/TFJS-object-detection/master/models/web_model/model.json");
return model;
}

This is a good option because it gives more flexibility to the application and makes it easier to run on some platform as Glitch.

Running locally

To run the app locally, install the required packages:

 npm install
 And start:
 npm start

The application is going to run at http://localhost:3000 and you should see something similar to this:

Application running locally

The model takes from 1 to 2 seconds to load and, after that, you can show kangaroos images to the camera and the application is going to draw bounding boxes around them.

Publishing in Glitch

Glitch is a simple tool for creating web apps where we can upload the code and make the application available for everyone on the web. Uploading the model files in a GitHub repo and referencing to them in the load_model function, we can simply log into Glitch, click on New project > Import from Github and select the app repository.

Wait some minutes to install the packages and your app will be available in a public URL. Click on Show > In a new window and a tab will be open. Copy this URL and past it in any web browser (PC or Mobile) and your object detection will be ready to run. See some examples in the video below:

Running the model on different devices

First, I did a test showing a kangaroo sign to verify the robustness of the application. It showed that the model is focusing specifically on the kangaroo features and did not specialize in irrelevant characteristics that were present in many images, such as pale colors or shrubs.

Then, I opened the app on my mobile and showed some images from the test set. The model runs smoothly and identifies most of the kangaroos. If you want to test my live app, it is available here (glitch takes some minutes to wake up).

Besides the accuracy, an interesting part of these experiments is the inference time — everything runs in real-time in the browser via JavaScript. Good object detection models running in the browser and using few computational resources is a must in many applications, mostly in industry. Putting the Machine Learning model on the client-side means cost reduction and safer applications as user privacy is preserved as there is no need to send the information to any server to perform the inference.

Next steps

Object detection in the browser can solve a lot of real-world problems and I hope this article will serve as a basis for new projects involving Computer Vision, Python, TensorFlow and Javascript.

As the next steps, I’d like to make more detailed training experiments. Due to the lack of resources, I could not try many different parameters and I’m sure that there is a lot of room for improvements in the model.

I’m more focused on the models’ training, but I’d like to see a better user interface for the app. If someone is interested in contributing to the project, feel free to create a pull request in the project repo. It will be nice to make a more user-friendly application.

If you have any questions or suggestions you can reach me out on Linkedin. Thanks for reading!

Read More

ML Metadata: Version Control for ML

Posted by Ben Mathes and Neoklis Polyzotis, on behalf of the TFX Team

When you write code, you need version control to keep track of it. What’s the ML equivalent of version control? If you’re building production ML systems, you need to be able to answer questions like these:

  • Which dataset was this model trained on?
  • What hyperparameters were used?
  • Which pipeline was used to create this model?
  • Which version of TensorFlow (and other libraries) were used to create this model?
  • What caused this model to fail?
  • What version of this model was last deployed?

Engineers at Google have learned, through years of hard-won experience, that this history and lineage of ML artifacts is far more complicated than a simple, linear log. You use Git (or similar) to track your code; you need something to track your models, datasets, and more. Git, for example, may simplify your life a lot, but under the hood there’s a graph of many things! The complexity of ML code and artifacts like models, datasets, and much more requires a similar approach.

That’s why we built Machine Learning Metadata (MLMD). It’s a library to track the full lineage of your entire ML workflow. Full lineage is all the steps from data ingestion, data preprocessing, validation, training, evaluation, deployment, and so on. MLMD is a standalone library, and also comes integrated in TensorFlow Extended. There’s also a demo notebook to see how you can integrate MLMD into your ML infrastructure today.

ML Metadata icon
Beyond versioning your model, ML Metadata captures the full lineage of the training process, including the dataset, hyperparameters, and software dependencies.

Here’s how MLMD can help you:

  • If you’re a ML Engineer: You can use MLMD to trace bad models back to their dataset, or trace from a bad dataset to the models you trained on it, and so on.
  • If you’re working in ML infrastructure: You can use MLMD to record the current state of your pipeline and enable event-based orchestration. You can also enable optimizations like skipping a step if the inputs and code are the same, memoizing steps in your pipelines. You can integrate MLMD into your training system so it automatically creates logs for querying later. We’ve found that this auto-logging of the full lineage as a side effect of training is the best way to use MLMD. Then you have the full history without extra effort.

MLMD is more than a TFX research project. It’s a key foundation to multiple internal MLOps solutions at Google. Furthermore, Google Cloud integrates tools like MLMD into its core MLOps platform:

The foundation of all these new services is our new ML Metadata Management service in AI Platform. This service lets AI teams track all the important artifacts and experiments they run, providing a curated ledger of actions and detailed model lineage. This will enable customers to determine model provenance for any model trained on AI Platform for debugging, audit, or collaboration. AI Platform Pipelines will automatically track artifacts and lineage and AI teams can also use the ML Metadata service directly for custom workloads, artifact and metadata tracking.

Want to know where your models come from? What training data was used? Did anyone else train a model on this dataset already, and was their performance better? Are there any tainted datasets we need to clean up after?

If you want to answer these questions for your users, check out MLMD on github, as a part of TensorFlow Extended, or in our demo notebook.

Read More

Join the TensorFlow Special Interest Groups (SIGs)

Join the TensorFlow Special Interest Groups (SIGs)

Posted by Joana Carrasqueira, TensorFlow Program Manager and Thea Lamkin, Open Source Program Manager, in collaboration with TensorFlow SIG Leads.

TensorFlow SIGs (Special Interest Groups) organize community contributions to key parts of the TensorFlow ecosystem, and enable community members to contribute and maintain new features in important areas.

SIG leads and members work together to build and support important TensorFlow use cases, and are a vital part of our open source community. It all started with the SIG Build, and we now have 13 Active SIGs, with more on the way.

In this article, you’ll learn about the SIGs that exist today, and how you can get involved. Many SIGs are led by members of the open source community, from industry collaborators to Machine Learning Google Developer Experts (ML GDEs). TensorFlow’s success is due in large part to the hard work and contributions of our vibrant community. We welcome contributors to join the SIGs working on the parts of TensorFlow’s ecosystem they are most excited to collaborate on. Here is an overview of the SIGs and their areas of focus, contributed by their leads:

SIG Addons

In a fast-moving field like Machine Learning, there are many new developments that cannot be integrated into core TensorFlow. SIG Addons was created to tackle this problem by maintaining a repository of bleeding edge contributions that conform to well-established API patterns, but implement new functionality not available in core TensorFlow and adopted some of the parts of tf.contrib.

To contribute to TensorFlow Addons, join the conversation at our monthly meeting.

SIG Build

Started as a forum for development topics like new architecture support and packaging improvements, SIG Build grew to a discussion center dedicated to building, testing, packaging, and distributing TensorFlow that bridges internal and external TensorFlow development. The goal of this group is to ensure TensorFlow is a good citizen in the wider OSS ecosystem (Python, C++, Linux, Windows, MacOS).

To contribute to TensorFlow Build, join the conversation at our monthly meeting.

SIG IO

SIG IO is a repository of dataset, streaming, and file systems extension support for TensorFlow. Recent accomplishments include the release of v.0.13.0 (with TF 2.2), added Video Studio Code tutorial, and added AVIF imagine file format support.

To contribute to TensorFlow IO, join the conversation at our monthly meeting.

SIG JVM

SIG JVM provides comprehensive support for building, training and serving TensorFlow models on top of Java Virtual Machine (JVM). This group focuses on using Java but also includes other popular JVM languages, like Kotlin and Scala. Some of the recent accomplishments include adding n-dimensional data access in native memory and the creation of a high-level API similar to Keras for building models.

To contribute to TensorFlow JVM, join the conversation at our monthly meeting.

SIG Keras

This group focuses on care and feeding of the tf.Keras API (new features, docs, guides), Keras Tuner, AutoKeras, and Keras applications.

To contribute to TensorFlow Keras, join the conversation at our bi-monthly meeting.

SIG Micro

SIG Micro is a discussion and collaboration group around running TensorFlow models on Microcontrontrollers, DSPs, and other highly resource constrained embedded devices.

To contribute to TensorFlow Micro, join the conversation at our monthly meeting.

SIG MLIR

The goal of this group is to foster an open discussion on high performance compilers and how optimization techniques can be applied to TensorFlow graphs. Ultimately this project aims to create a common intermediate representation that reduces the cost of new hardware and improves usability for existing TensorFlow users.

To contribute to TensorFlow MLIR, join the conversation at our monthly meeting.

SIG Networking

SIG Networking aims to add support for different network fabrics and protocols. The group evaluates proposals and designs in this area and maintains code in the tensorflow/networking repository. Join us, if you are interested in improving TensorFlow on different types of networks or underlying drivers and libraries!

To contribute to TensorFlow Networking, join the conversation at our monthly meeting.

SIG Reccomenders (New!)

SIG Recommenders was created to drive discussion and collaborations around using TensorFlow for large scale recommendation systems (Recommenders). We hope to encourage sharing of best practices in the industry, get consensus and product feedback to help evolve TensorFlow support for recommenders, and facilitate the contributions of RFCs and PRs in this domain.

To contribute to TensorFlow Recommenders, join the mailing list to get updates about our upcoming meetings.

SIG Rust

SIG Rust was created for users and contributors on the TensorFlow Rust binding project. It provides stable support for running models created in other languages, and can both train and evaluate.

To contribute to TensorFlow Rust, join the conversation at our monthly meeting.

SIG Swift

The purpose of SIG Swift is to host design reviews, discuss upcoming API changes, share project roadmap, and encourage collaboration in the Swift for TensorFlow (S4TF) open-source community.

To contribute to TensorFlow Swift, join the conversation at our monthly meeting.

SIG Tensorboard

SIG TensorBoard was created for discussion and collaboration around TensorBoard, the visualization tool for TensorFlow. The goal of this group is to engage the TensorBoard user and developer community and get feedback; encourage development of new TensorBoard plugins; promote collaboration ML via TensorBoard.dev; and encourage community improvements to TensorBoard.

To contribute to TensorFlow TensorBoard, join the conversation at our monthly meeting.

SIG TF.js (New!)

SIG TF.js was created to facilitate community-contributed components to tensorflow/tfjs (and potential community-maintained libraries). The core TensorFlow.js engineering team has been working on building the infrastructure and tooling to enable ML to run in JavaScript powered applications, and has an active contributor community of individual developers, GDEs, and enterprise users. We want to accelerate the community involvement in the project to help continue meet the needs and help drive new directions for the project.

To contribute to TensorFlow TF.js, join the conversation at our monthly meeting.

Thank you to our SIG Leads for their work and leadership:

Picture: 1st TensorFlow Contributor Summit, Santa Clara, 2019.
Picture: 1st TensorFlow Contributor Summit, Santa Clara, 2019.

Sean Morgan, Tzu-Wei Sung | SIG Addons

Jason Zaman, Austin Anderson | SIG Build

Yong Tang, Anthony Dmitriev, Derek Murray | SIG IO

Karl Lessard, Adam Pocock, Rajagopal Ananthanarayanan | SIG JVM

Francois Chollet | SIG Keras

Neil Tan, Pete Warden | SIG Micro

Tatiana Shpeisman, Pankaj Kanwar | SIG MLIR

Bairen Yi, Jeroen Bedorf | SIG Networking

Bo Liu, Haidong Rong, Yong Li, Wei Wei | SIG Recommenders

Adam Crume | SIG Rust

Ewa Matejska | SIG Swift

Mani Varadarajan, Gal Oshri | SIG TensorBoard

Sandeep Gupta, Ping Yu | SIG TF.js

Read More

InSpace: A new video conferencing platform that uses TensorFlow.js for toxicity filters in chat

InSpace: A new video conferencing platform that uses TensorFlow.js for toxicity filters in chat

A guest post by Narine Hall, Assistant Professor at Champlain College, CEO of InSpace

InSpace is a communication and virtual learning platform that gives people the ability to interact, collaborate, and educate in familiar physical ways, but in a virtual space. InSpace is built by educators for educators, putting education at the center of the platform.

  • InSpace is designed to mirror the fluid, personal, and interactive nature of a real classroom. It allows participants to break free of “Brady Bunch” boxes in existing conference solutions to create a fun, natural, and engaging environment that fosters interaction and collaboration.
  • Each person is represented in a video circle that can freely move around the space. When people are next to each other, they can hear and engage in conversation, and as they move away, the audio fades, allowing them to find new conversations.
  • As participants zoom out, they can see the entire space, which provides visual social cues. People can seamlessly switch from class discussions to private conversations or group/team-based work, similar to the format of a lab or classroom.
  • Teachers can speak to everyone when needed, move between individual students and groups for more private discussion, and place groups of students in audio-isolated rooms for collaboration while still belonging to one virtual space.

Being a collaboration platform, a fundamental InSpace feature is chat. From day one, we wanted to provide a mechanism to help warn users from sending and receiving toxic messages. For example, a teacher in a classroom setting with young people may want a way to prevent students from typing inappropriate comments. Or, a moderator of a large discussion may want a way to reduce inappropriate spam. Or individual users may want to filter them out on their own.

A simple way to identify toxic comments would be to check for the presence of a list of words, including profanity. Moving a step beyond this, we did not want to identify toxic messages just by words contained in the message, we also wanted to consider the context. So, we decided to use machine learning to accomplish that goal.

After some research, we found a pre-trained model for toxicity detection in TensorFlow.js that could be easily integrated into our platform. Importantly, this model runs entirely in the browser, which means that we can warn users against sending toxic comments without their messages ever being stored or processed by a server.

Performance wise, we found that running the toxicity process in a browser’s main thread would be detrimental to the user experience. We decided a good approach was to use the Web Workers API to separate message toxicity detection from the main application so the processes are independent and non-blocking.

GIF moderating toxic comments

Web Workers connect to the main application by sending and receiving messages, in which you can wrap your data. Whenever a user sends a message, it is automatically added to a so-called queue and is sent from the main application to the web worker. When the web worker receives the message from the main app, it starts classification of the message, and when the output is ready, it sends the results back to the main application. Based on the results from the web worker, the main app either sends the message to all participants or warns the user that it is toxic.

Chart showing how toxicity filter works

Below is the pseudocode for the main application, where we initialize the web worker by providing its path as an argument, then set the callback that will be called each time the worker sends a message, and also we declare the callback that will be called when the user submits a message.

// main application
// initializing the web worker
const toxicityFilter = new Worker('toxicity-filter.worker.js'));
// now we need to set the callback which will process the data from the worker
worker.onMessage = ({ data: { message, isToxic } }) => {
if (isToxic) {
markAsToxic(message);
} else {
sendToAll(message);
}
}

When the user sends the message, we pass it to the web worker:

onMessageSubmit = message => {
worker.postMessage(message);
addToQueue(message);
}

After the worker is initialized, it starts listening to the data messages from the main app, and handling them using the declared onmessage callback, which then sends a message back to the main app.

// toxicity-filter worker
// here we import dependencies
importScripts(
// the main library to run Tenser Flow in the browser
'https://cdn.jsdelivr.net/npm/@tensorflow/tfjs',
// trained models for toxicity detection
'https://cdn.jsdelivr.net/npm/@tensorflow-models/toxicity',
);
// threshold point for the decision
const threshold = 0.9;
// the main model promise which would be used to classify the message
const modelPromise = toxicity.load(threshold);
// registered callback to run when the main app sends the data message
onmessage = ({ data: message }) => {
modelPromise.then(model => {
model.classify([message.body]).then(predictions => {
// as we want to check the toxicity for all labels,
// `predictions` will contain the results for all 7 labels
// so we check, whether there is a match for any of them
const isToxic = predictions.some(prediction => prediction.results[0].match);
// here we send the data message back to the main app with the results
postMessage({ message, isToxic });
});
});
};

As you can see, the toxicity detector is straightforward to integrate the package with an app, and does not require significant changes to existing architecture. The main application only needs a small “connector,” and the logic of the filter is written in a separate file.

To learn more about InSpace visit https://inspace.chat.

Read More

How to generate super resolution images using TensorFlow Lite on Android

How to generate super resolution images using TensorFlow Lite on Android

Posted by Wei Wei, TensorFlow Developer Advocate

The task of recovering a high resolution (HR) image from its low resolution counterpart is commonly referred to as Single Image Super Resolution (SISR). While interpolation methods, such as bilinear or cubic interpolation, can be used to upsample low resolution images, the quality of resulting images is generally less appealing. Deep learning, especially Generative Adversarial Networks, has successfully been applied to generate more photo-realistic images, for example, SRGAN and ESRGAN. In this blog, we are going to use a pre-trained ESRGAN model from TensorFlow Hub and generate super resolution images using TensorFlow Lite in an Android app. The final app looks like below and the complete code has been released in TensorFlow examples repo for reference.

Screencap of TensorFlow Lite

First, we can conveniently load the ESRGAN model from TFHub and easily convert it to a TFLite model. Note that here we are using dynamic range quantization and fixing the input image dimensions to 50×50. The converted model has been uploaded to TFHub but we want to demonstrate how to do it just in case you want to convert it yourself (for example, try a different input size in your own app):

model = hub.load("https://tfhub.dev/captain-pool/esrgan-tf2/1")
concrete_func = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
concrete_func.inputs[0].set_shape([1, 50, 50, 3])
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Save the TF Lite model.
with tf.io.gfile.GFile('ESRGAN.tflite', 'wb') as f:
f.write(tflite_model)

esrgan_model_path = './ESRGAN.tflite'

You can also convert the model without hardcoding the input dimensions at conversion time and later resize the input tensor at runtime, as TFLite now supports inputs of dynamic shapes. Please refer to this example for more information.

Once the model is converted, we can quickly verify that the ESRGAN TFLite model does generate a much better image than bicubic interpolation. We also have another tutorial on ESRGAN if you want to better understand the model.

lr = cv2.imread(test_img_path)
lr = cv2.cvtColor(lr, cv2.COLOR_BGR2RGB)
lr = tf.expand_dims(lr, axis=0)
lr = tf.cast(lr, tf.float32)

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path=esrgan_model_path)
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run the model
interpreter.set_tensor(input_details[0]['index'], lr)
interpreter.invoke()

# Extract the output and postprocess it
output_data = interpreter.get_tensor(output_details[0]['index'])
sr = tf.squeeze(output_data, axis=0)
sr = tf.clip_by_value(sr, 0, 255)
sr = tf.round(sr)
sr = tf.cast(sr, tf.uint8)
ESRGAN model with low res and high res image
LR: low resolution input image cropped from an butterfly image in DIV2K dataset. ESRGAN (x4): super resolution output image generated using ESRGAN model with upscale_ratio=4. Bicubic: output image generated using bicubic interpolation. As can be seen here, bicubic interpolation-generated image is much blurrier than the ESRGAN-generated one. PSNR is also higher on the ESRGAN-generated image.

As you may already know, TensorFlow Lite is the official framework to run inference with TensorFlow models on edge devices and is deployed on more than 4 billions edge devices worldwide, supporting Android, iOS, Linux-based IoT devices and microcontrollers. You can use TFLite in Java, C/C++ or other languages to build Android apps. In this blog we will use TFLite C API since many developers have asked for such an example.

We distribute TFLite C header files and libraries in prebuilt AAR files (the core library and the GPU library). To set up the Android build correctly, the first thing we need to do is download the AAR files and extract the header files and shared libraries. Have a look at how this is done in the download.gradle file.

Since we are using Android NDK to build the app (NDK r20 is confirmed to work), we need to let Android Studio know how native files should be handled. This is done in CMakeList.txt file:

We included 3 sample images in the app, so a user may easily run the same model multiple times, which means we need to cache the interpreter for better efficiency. This is done by passing the interpreter pointer from C++ code to Java code, after the interpreter is successfully created:

extern "C" JNIEXPORT jlong JNICALL
Java_org_tensorflow_lite_examples_superresolution_MainActivity_initWithByteBufferFromJNI(JNIEnv *env, jobject thiz, jobject model_buffer, jboolean use_gpu) {
const void *model_data = static_cast<void *>(env->GetDirectBufferAddress(model_buffer));
jlong model_size_bytes = env->GetDirectBufferCapacity(model_buffer);
SuperResolution *super_resolution = new SuperResolution(model_data, static_cast<size_t>(model_size_bytes), use_gpu);
if (super_resolution->IsInterpreterCreated()) {
LOGI("Interpreter is created successfully");
return reinterpret_cast<jlong>(super_resolution);
} else {
delete super_resolution;
return 0;
}
}

Once the interpreter is created, running the model is fairly straightforward as we can follow the TFLite C API documentation. We first need to carefully extract RGB values from each pixel. Now we can run the interpreter:

// Feed input into model
status = TfLiteTensorCopyFromBuffer(input_tensor, input_buffer, kNumberOfInputPixels * kImageChannels * sizeof(float));
…...

// Run the interpreter
status = TfLiteInterpreterInvoke(interpreter_);
…...

// Extract the output tensor data
const TfLiteTensor* output_tensor = TfLiteInterpreterGetOutputTensor(interpreter_, 0);
float output_buffer[kNumberOfOutputPixels * kImageChannels];
status = TfLiteTensorCopyToBuffer(output_tensor, output_buffer, kNumberOfOutputPixels * kImageChannels * sizeof(float));
…...

With the models results at hand, we can pack the RGB values back into each pixel.

There you have it. A reference Android app using TFLite to generate super resolution images on device. More details can be found in the code repository. Hopefully this is useful as a reference to Android developers who are getting started to use TFLite in C/C++ to build amazing ML apps.

Feedback

We are looking forward to seeing what you have built with TensorFlow Lite, as well as hearing your feedback. Share your use cases with us directly or on Twitter with hashtags #TFLite and #PoweredByTF. To report bugs and issues, please reach out to us on GitHub.

Acknowledgements

The author would like to thank @captain__pool for uploading the ESRGAN model to TFHub, and Tian Lin and Jared Duke for the helpful feedback.

[1] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi. 2016. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.

[2] Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang. 2018. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks.

[3] Tensorflow 2.x based implementation of EDSR, WDSR and SRGAN for single image super-resolution

[4] @captain__pool’s ESGRAN code implementation

[5] Eirikur Agustsson, Radu Timofte. 2017. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study.

Read More

How Rasa Open Source Gained Layers of Flexibility with TensorFlow 2.x

How Rasa Open Source Gained Layers of Flexibility with TensorFlow 2.x

A guest post by Vincent D. Warmerdam and Vladimir Vlasov, Rasa

Rasa logo with bird

At Rasa, we are building infrastructure for conversational AI, used by developers to build chat- and voice-based assistants. Rasa Open Source, our cornerstone product offering, provides a framework for NLU (Natural Language Understanding) and dialogue management. On the NLU side we offer models that handle intent classification and entity detection using models built with Tensorflow 2.x.

In this article, we would like to discuss the benefits of migrating to the latest version of TensorFlow and also give insight into how some of the Rasa internals work.

A Typical Rasa Project Setup

When you’re building a virtual assistant with Rasa Open Source, you’ll usually begin by defining stories, which represent conversations users might have with your agent. These stories will serve as training data and you can configure them as yaml files. If we pretend that we’re making an assistant that allows you to buy pizzas online then we might have stories in our configuration that look like this:

yaml
version: "2.0"

stories:

- story: happy path
steps:
- intent: greet
- action: utter_greet
- intent: mood_great
- action: utter_happy

- story: purchase path
steps:
- intent: greet
- action: utter_greet
- intent: purchase
entities:
product: “pizza”
- action: confirm_purchase
- intent: affirm
- action: confirm_availability

These stories consist of intents and actions. Actions can be simple text replies, or they can trigger custom Python code (that checks a database, for instance). To define training data for each intent, you supply the assistant with example user messages, which might look something like:

yaml
version: "2.0"

nlu:
- intent: greet
examples: |
- hey
- hello
- hi
- hello there
- good morning

- intent: purchase
examples: |
- i’d like to buy a [veggie pizza](product) for [tomorrow](date_ref)
- i want to order a [pizza pepperoni](product)
- i’d want to buy a [pizza](product) and a [cola](product)
- ...

When you train an assistant using Rasa you’ll supply configuration files like those shown above. You can be very expressive in the types of conversations your agent can handle. Intents and actions are like lego bricks and can be combined expressively to cover many conversational paths. Once these files are defined they are combined to create a training dataset that the agent will learn from.

Rasa allows users to build custom machine learning pipelines to fit their datasets. That means you can incorporate your own (pre-trained) models for natural language understanding if you’d like. But Rasa also provides models, written in TensorFlow, that are specialized for these tasks.

Specific Model Requirements

You may have noticed that our examples include not just intents but also entities. When a user is interested in making a purchase, they (usually) also say what they’re interested in buying. This information needs to be detected when the user provides it. It’d be a bad experience if we needed to supply the user with a form to retrieve this information.

intent greet vs intent purchase

If you take a step back and think about what kind of model could work well here, you’ll soon recognize that it’s not a standard task. It’s not just that we have numerous labels at each utterance; we have multiple *types* of labels too. That means that we need models that have two outputs.

Types of labels texts to tokens

Rasa Open Source offers a model that can detect both intents and entities, called DIET. It uses a transformer architecture that allows the system to learn from the interaction between intents and entities. Because it needs to handle these two tasks at once, the typical machine learning pattern won’t work:

model.fit(X, y).predict(X)

You need a different abstraction.

Abstraction

This is where TensorFlow 2.x has made an improvement to the Rasa codebase. It is now much easier to customize TensorFlow classes. In particular, we’ve made a custom abstraction on top of Keras to suit our needs. One example of this is Rasa’s own internal `RasaModel.` We’ve added the base class’s signature below. The full implementation can be found here.

class RasaModel(tf.keras.models.Model):

def __init__(
self,
random_seed: Optional[int] = None,
tensorboard_log_dir: Optional[Text] = None,
tensorboard_log_level:Optional[Text] = "epoch",
**kwargs,
) -> None:
...

def fit(
self,
model_data: RasaModelData,
epochs: int,
batch_size: Union[List[int], int],
evaluate_on_num_examples: int,
evaluate_every_num_epochs: int,
batch_strategy: Text,
silent: bool = False,
eager: bool = False,
) -> None:
...

This object is customized to allow us to pass in our own `RasaModelData` object. The benefit is that we can keep all the existing features that the Keras model object offers while we can override a few specific methods to suit our needs. We can run the model with our preferred data format while maintaining manual control over “eager mode,” which helps us debug.

These Keras objects are now a central API in TensorFlow 2.x, which made it very easy for us to integrate and customize.

Training Loop

To give another impression of how the code became simpler, let’s look at the training loop inside the Rasa model.

Python Pseudo-Code for TensorFlow 1.8

We’ve got a part of the code used for our old training loop listed below (see here for the full implementation). Note that it is using `session.run` to calculate the loss as well as the accuracy.

def train_tf_dataset(
train_init_op: "tf.Operation",
eval_init_op: "tf.Operation",
batch_size_in: "tf.Tensor",
loss: "tf.Tensor",
acc: "tf.Tensor",
train_op: "tf.Tensor",
session: "tf.Session",
epochs: int,
batch_size: Union[List[int], int],
evaluate_on_num_examples: int,
evaluate_every_num_epochs: int,
)
session.run(tf.global_variables_initializer())
pbar = tqdm(range(epochs),desc="Epochs", disable=is_logging_disabled())

for ep in pbar:
ep_batch_size=linearly_increasing_batch_size(ep, batch_size, epochs)
session.run(train_init_op, feed_dict={batch_size_in: ep_batch_size})

ep_train_loss = 0
ep_train_acc = 0
batches_per_epoch = 0
while True:
try:
_, batch_train_loss, batch_train_acc = session.run(
[train_op, loss, acc])
batches_per_epoch += 1
ep_train_loss += batch_train_loss
ep_train_acc += batch_train_acc

except tf.errors.OutOfRangeError:
break

The train_tf_dataset function requires a lot of tensors as input. In TensorFlow 1.8, you need to keep track of these tensors because they contain all the operations you intend to run. In practice, this can lead to cumbersome code because it is hard to separate concerns.

Python Pseudo-Code for TensorFlow 2.x

In TensorFlow 2, all of this has been made much easier because of the Keras abstraction. We can inherit from a Keras class that allows us to compartmentalize the code much better. Here is the `train` method from Rasa’s DIET classifier (see here for the full implementation).

def train(
self,
training_data: TrainingData,
config: Optional[RasaNLUModelConfig] = None,
**kwargs: Any,
) -> None:
"""Train the embedding intent classifier on a data set."""

model_data = self.preprocess_train_data(training_data)

self.model = self.model_class()(
config=self.component_config,
)

self.model.fit(
model_data,
self.component_config[EPOCHS],
self.component_config[BATCH_SIZES],
self.component_config[EVAL_NUM_EXAMPLES],
self.component_config[EVAL_NUM_EPOCHS],
self.component_config[BATCH_STRATEGY],
)

The object-oriented style of programming from Keras allows us to customize more. We’re able to implement our own `self.model.fit` in such a way that we don’t need to worry about the `session` anymore. We don’t even need to keep track of the tensors because the Keras API abstracts everything away for you.

If you’re interested in the full code, you can find the old loop here and the new loop here.

An Extra Layer of Features

It’s not just the Keras models where we apply this abstraction; we’ve also developed some neural network layers using a similar technique.

We’ve implemented a few custom layers ourselves. For example, we’ve got a layer called `DenseWithSparseWeights.` It behaves just like a dense layer, but we drop many weights beforehand to make it more sparse. Again we only need to inherit from the right class (tf.keras.layers.Dense) to create it.

normal dense vs sparse dense model

We’ve grown so fond of customizing that we’ve even implemented a loss function as a layer. This made a lot of sense for us, considering that losses can get complex in NLP. Many NLP tasks will require you to sample such that you also have labels of negative examples during training. You may also need to mask tokens during the process. We’re also interested in recording the similarity loss as well as the label accuracy. By just making our own layer, we are building components for re-use, and it is easy to maintain as well.

custom layer

Lessons Learned

Discovering this opportunity for customization made a massive difference for Rasa. We like to design our algorithms to be flexible and applicable in many circumstances, and we were happy to learn that the underlying technology stack allowed us to do so. We do have some advice for folks who are working on their TensorFlow migration:

  1. Start by thinking about what “lego bricks” you need in your application. This mental design step will make it much easier to recognize how you can leverage existing Keras/TensorFlow objects for your use-case.
  2. It can be tempting to try to immerse yourself by going for a deep dive immediately. Instead, it may help to start from a working example and drill down from there. TensorFlow is not an average Python package, and the internals can get complex. The Python code that you interact with needs to interact with C++ to keep the tensor operations performant. Once the code works, you’re at a much better place to start tuning/optimizing all the new TensorFlow version’s performance features.

Read More

How The Trevor Project assesses LGBTQ youth suicide risk with TensorFlow

How The Trevor Project assesses LGBTQ youth suicide risk with TensorFlow

Posted by Wilson Lee (Machine Learning Engineering Manager at The Trevor Project), Dan Fichter (Head of AI & Engineering at The Trevor Project), Amber Zhang, and Nick Hamatake (Software Engineers at Google)

Introduction

The Trevor Project’s mission is to end suicide among LGBTQ youth. In addition to offering free crisis services through our original phone lifeline (started in 1998), we’ve since expanded to a digital platform, including SMS and web browser-based chat. Unfortunately, there are high-volume times when there are more youth reaching out on the digital platform than there are counselors, and youth have to wait for a counselor to become available. Ideally, youth would be connected with counselors based on their relative risk of attempting suicide, so that those who are at imminent risk of harm would be connected earlier.

As part of the Google AI Impact Challenge, Google.org provided us with a $1.5M grant, Cloud credits, and a Google.org Fellowship, a team of ML, product, and UX specialists who worked full-time pro bono with The Trevor Project for 6 months. The Googlers joined forces with The Trevor Project’s in-house engineering team to apply Natural Language Processing to the crisis contact intake process. As a result, Trevor is now able to connect youth with the help they need faster. And our work together is continuing with the support of a new $1.2M grant as well as a new cohort of Google.org Fellows, who are at The Trevor Project through December helping expand ML solutions.

ML Problem Framing

We framed the problem as a binary text classification problem. The inputs are answers to questions on the intake form that youth complete when they reach out:

  • Have you attempted suicide before? Yes / No
  • Do you have thoughts of suicide? Yes / No
  • How upset are you? [multiple choice]
  • What’s going on? [free text input]
AI Impact Challenge Product Demo

The output is a binary classification: whether to place the youth in the standard queue or a priority queue. As counselors become available, they connect with youth from the priority queue before youth from the standard queue.

Data

Once a youth connects with a counselor, the counselor performs a clinical risk assessment and records the result. The risk assessment result can be mapped to whether the youth should have been placed in the standard queue or the priority queue. The full transcript of the (digital) conversation is also logged, as are the answers to the intake questions. Thus, the dataset used for training consisted of a mixture of free-form text, binary / multiple-choice features, and human-provided labels.

Fortunately, there are relatively few youth classified as high-risk compared to standard-risk. This resulted in a significant class imbalance which had to be accounted for during training. Another major challenge was low signal-to-noise ratio in the dataset. Different youth could provide very similar responses on the intake form and then be given opposite classifications by counselors after completing in-depth conversations. Various methods of dealing with these issues are detailed later.

Because of the extremely sensitive nature of the dataset, special measures were taken to limit its storage, access, and processing. We automatically scrubbed and replaced data with personally-identifiable information (PII) such as names and locations with placeholder strings such as “[PERSON_NAME]” or “[LOCATION]”. This means the models were not trained using PII. Access to the scrubbed dataset was limited to the small group of people working on the project, and the data and model were kept strictly within Trevor’s systems and are not accessible to Google.

Metrics

For a binary classification task, we would usually optimize for metrics like precision and recall, or derived measures like F1 score or AUC. For crisis contact classification, however, the metric we need to optimize most for is how long a high-risk youth (one who should be classified into the priority queue) has to wait before connecting with a counselor. To estimate this, we built a queue simulation system that can predict average wait times given a historical snapshot of class balance, quantitative flow of contacts over time, number of counselors available, and the precision and recall of the prediction model.

The simulation was too slow to run during the update step of gradient descent, so we optimized first for proxy metrics such as precision at 80% recall, and precision at 90% recall. We then ran simulations at all points on the precision-recall curve of the resulting model to determine the optimal spot on the curve for minimizing wait time for high-risk youth.

It was also critical to quantify the fairness of the model with respect to the diverse range of demographic and intersectional groups that reach out to Trevor. For each finalized model, we computed false positive and false negative rates broken out by over 20 demographic categories, including intersectionality. We made sure that no demographic group was favored or disfavored by the model more often than the previous system.

Model Selection

We experimented with bi-LSTM and transformer-based models, as they have been shown to provide state-of-the-art results across a broad range of textual tasks. We tried embedding the textual inputs using Glove, Elmo, and Universal Sentence Encoder. For transformer-based models, we tried a single-layer transformer network and ALBERT (many transformer layers pre-trained with unlabeled text from the web).

We selected ALBERT for several reasons. It showed the best performance at the high-recall end of the curve where we were most interested. ALBERT allowed us not only to take advantage of massive amounts of pre-training, but also to leverage some of our own unlabeled data to do further pretraining (more on this later). Since ALBERT shares weights between its transformer layers, the model is cheaper to deploy (important for a non-profit organization) and less prone to overfitting (important given the noisiness of our data).

Training

We trained in a three-step process:

  1. Pre-training: ALBERT is already pre-trained with a large amount of data from the web. We simply loaded a pre-trained model using TF Hub.

    Instructions available here for loading a pre-trained model for text classification,.

  2. Further pre-training: Since ALBERT’s language model is based on generic Web data, we pre-trained it further using our own in-domain, unlabeled data. This included anonymized text from chat transcripts as well as from forum posts on TrevorSpace, The Trevor Project’s safe-space social networking site for LGBTQ youth. Although the unlabeled data is not labeled for suicide risk, it comes from real youth in our target demographics and is therefore linguistically closer to our labeled dataset than ALBERT’s generic web corpora are. We found that this increased model performance significantly.

    Instructions available here for checkpoint management strategies.

  3. Fine-tuning: We fine-tuned the model using our hand-labeled training data. We initially used ALBERT just to encode the textual response to “What’s going on” and used one-hot vectors to encode the responses to the binary and multiple-choice questions. We then tried converting everything to text and using ALBERT to encode everything. Specifically, instead of encoding the Yes / No answer to a question like “Do you have thoughts of suicide?” as a one-hot vector, we prepended something like “[ counselor] Do you have thoughts of suicide? [ youth] No” to the textual response to “What’s going on?” This yielded significant improvements in performance.

    Instructions available here for encoding with BERT tokenizer.

Optimization

We did some coarse parameter selection (learning rate and batch size) using manual trials. We also used Keras Tuner to refine the parameter space further. Because Keras Tuner is model-agnostic, we were able to use a similar tuning script for each of our model classes. For the LSTM-based models, we also used Keras Tuner to decide which kind of embeddings to use.

Normally, we would train with as large of a batch size as would fit on a GPU, but in this case we found better performance with fairly small batch sizes (~8 examples). We theorize that this is because the data has so much noise that it tends to regularize itself. This self-regularization effect is more pronounced in small batches.

Instructions available here here for setting up hyperparameter trials.

Conclusion

We trained a text-based model to prioritize at-risk youth seeking crisis services. The model outperformed a baseline classifier that only used responses from several multiple-choice intake questions as features. The NLP model was also shown to have less bias than the baseline model. Some of the highest-impact ingredients to the final model were 1) Using in-domain unlabeled data to further pretrain an off-the-shelf ALBERT model, 2) encoding multiple-choice responses as full text, which is in turn encoded by ALBERT, and 3) tuning hyperparameters using intuition about our specific dataset in addition to standard search methods.

Despite the success of the model, there are some limitations. The intake questions that produced our dataset were not extremely well-correlated with the results of the expert risk assessments that made up our training labels. This resulted in a low signal-to-noise ratio in our training dataset. More non-ML work could be done in the future to elicit more high-signal responses from youth in the intake process.

We’d like to acknowledge all of the teams and individuals who contributed to this project: Google.org and the Google.org Fellows, The Trevor Project’s entire engineering and data science team, as well as many hours of review and input from Trevor’s crisis service and clinical staff.

You can support our work by donating at TheTrevorProject.org/Donate. Your life-saving gift can help us expand our advocacy efforts, train a record number of crisis counselors, and provide all of our crisis services 24/7.

Read More