Recap of TensorFlow at Google I/O 2021

May 24, 2021

by TensorFlow Blog Tensorflow

Posted by the TensorFlow team

Thanks to everyone who joined our virtual I/O 2021 livestream! While we couldn’t meet in person, we hope we were able to make the event more accessible than ever. In this article, we’re recapping a few of the updates we shared during the keynote. You can watch the keynote below, and you can find recordings of every talk on the TensorFlow YouTube channel. Here’s a summary of a few announcements by product area (and there’s more in the videos, so be sure to check them out, too).

TensorFlow for Mobile and Web

The TensorFlow Lite runtime will be bundled with Google Play services

Let’s start with the announcement that the TensorFlow Lite runtime is going to be bundled with Google Play services, meaning you don’t need to distribute it with your app. This can greatly reduce your app’s bundle size. Now you can distribute your model without needing to worry about the runtime. You can sign up for an early access program today, and we expect a full rollout later this year.

You can now run TensorFlow Lite models on the web

All your TensorFlow Lite models can now directly be run on the web in the browser with the new TFLite Web APIs that are unified with TensorFlow.js. This task-based API supports running all TFLite Task Library models for image classification, objection detection, image segmentation, and many NLP problems. It also supports running arbitrary, custom TFLite models with easy, intuitive TensorFlow.js compatible APIs. With this option, you can unify your mobile and web ML development with a single stack.

A new On-Device Machine Learning site

We understand that the most effective developer path to reach Android, the Web and iOS isn’t always the most obvious. That’s why we created a new On-Device Machine Learning site to help you navigate your options, from turnkey to custom models, from cross platform mobile, to in-browser. It includes pathways to take you from an idea to a deployed app, with all the steps in between.

Performance profiling

When it comes to performance, we’re also working on additional tooling for Android developers. TensorFlow Lite includes built-in support for Systrace, integrating seamlessly with perfetto for Android 10.

And perf improvements aren’t limited to Android – for iOS developers TensorFlow Lite comes with built-in support for signpost-based profiling. When you build your app with the trace option enabled, you can run the Xcode profiler to see the signpost events, letting you dive deeper, and seeing all the way down to individual ops during execution.

TFX

TFX 1.0: Production ML at Enterprise-scale

Moving your ML models from prototype to production requires lots of infrastructure. Google created TFX because we needed a strong framework for our ML products and services, and then we open-sourced it so that others can use it too. It includes support for training models for mobile and web applications, as well as server-based applications.

After a successful beta with many partners, today we’re announcing TFX 1.0 — ready today for production ML at enterprise-scale. TFX includes all of the things an enterprise-ready framework needs, including enterprise-grade support, security patches, bug fixes, and guaranteed backward compatibility for the entire 1.X release cycle. It also includes strong support for running on Google Cloud and support for mobile, web, and NLP applications.

If you’re ready for production ML, TFX is ready for you. Visit the TFX site to learn more.

Responsible AI

We’re also sharing a number of new tools to help you keep Responsible AI top of mind in everything that you do when developing with ML.

Know Your Data

Know Your Data (KYD) is a new tool to help ML researchers and product teams understand rich datasets (images and text) with the goal of improving data and model quality, as well as surfacing and mitigating fairness and bias issues. Try the interactive demo at the link above to learn more.

People + AI Guidebook 2.0

As you create AI solutions, building with a people centric approach is a key to doing it responsibly, and we’re delighted to announce the People + AI Guidebook 2.0. This update is designed to help you put best practices and guidance for people-centric AI into practice with a lot of new resources including code, design patterns and much more!

Also check out our Responsible AI Toolkit to help you integrate Responsible AI practices into your ML workflow using TensorFlow.

Decision forests in Keras

New support for random forests and gradient boosted trees

There’s more to ML than neural networks. Starting with TensorFlow 2.5, you can easily train powerful decision forest models (including favorites like random forests and gradient boosted trees) using familiar Keras APIs. There’s support for many state-of-the-art algorithms for training, serving and interpreting models for classification, regression and ranking tasks. And you can serve your decision forests using TF Serving, just like any other model trained with TensorFlow. Check out the tutorials here, and the video from this session.

TensorFlow Lite for Microcontrollers

A new pre-flashed board, experiments, and a challenge

TensorFlow Lite for Microcontrollers is designed to help you run ML models on microcontrollers and other devices with only a few kilobytes of memory. You can now purchase pre-flashed Arduino boards that will connect via Bluetooth and your browser. And you can use these to try out new Experiments With Google that let you make gestures and even create your own classifiers and run custom TensorFlow models. If you’re interested in challenges, we’re also running a new TensorFlow Lite for Microcontrollers challenge, you can check it out here. And also be sure to check out the TinyML workshop video in the next steps below.

Google Cloud

Vertex AI: A new managed ML platform on Google Cloud

An ML model is only valuable if you can actually put it into production. And as you know, it can be challenging to productionize efficiently and at scale. That’s why Google Cloud is releasing Vertex AI, a new managed machine learning platform to help you accelerate experimentation and deployment of AI models. Vertex AI has tools that span every stage of the developer workflow, from data labeling, to working with notebooks and models, to prediction tools and continuous monitoring – all unified into one UI. While many of these offerings may be familiar to you, what really distinguishes Vertex AI is the introduction of new MLOps features. You can now manage your models with confidence using our MLOps tools such as Vertex Pipelines and Vertex Feature Store, to remove the complexity of robust self-service model maintenance and repeatability.

TensorFlow Cloud: Transition from local model building to distributed training on the Cloud

TensorFlow Cloud provides APIs that ease the transition from local model building and debugging to distributed training and hyperparameter tuning on Google Cloud. From inside a Colab or Kaggle Notebook or a local script file, you can send your model for tuning or training on Cloud directly, without needing to use the Cloud Console. We recently added a new site and new features, check it out if you’re interested in learning more.

Community

A new TensorFlow Forum

We created a new TensorFlow Forum for you to ask questions and connect with the community. It’s a place for developers, contributors, and users to engage with each other and the TensorFlow team. Create your account and join the conversation at discuss.tensorflow.org.

Find all the talks here

This is just a small part of what was shared at Google I/O 2021. You can find all of the TensorFlow sessions in this playlist, and for your convenience here are direct links to each of the sessions also:

To learn more about TensorFlow, you check out tensorflow.org, read other articles on the blog, follow us on social media, and subscribe to our YouTube Channel, or join a TensorFlow User Group near you.

High Fidelity Pose Tracking with MediaPipe BlazePose and TensorFlow.js

May 19, 2021

by TensorFlow Blog Tensorflow

Posted by Ivan Grishchenko, Valentin Bazarevsky and Na Li, Google Research

Today we’re excited to launch MediaPipe‘s BlazePose in our new pose-detection API. BlazePose is a high-fidelity body pose model designed specifically to support challenging domains like yoga, fitness and dance. It can detect 33 keypoints, extending the 17 keypoint topology of the original PoseNet model we launched a couple of years ago. These additional keypoints provide vital information about face, hands, and feet location with scale and rotation. Together with our face and hand models they can be used to unlock various domain-specific applications like gesture control or sign language without special hardware. With today’s release we enable developers to use the same models on the web that are powering MLKit Pose and MediaPipe Python unlocking the same great performance across all devices.

The new TensorFlow.js pose-detection API supports two runtimes: TensorFlow.js and MediaPipe. TensorFlow.js provides the flexibility and wider adoption of JavaScript, optimized for several backends including WebGL (GPU), WASM (CPU), and Node. MediaPipe capitalizes on WASM with GPU accelerated processing and provides faster out-of-the-box inference speed. The MediaPipe runtime currently lacks Node and iOS Safari support, but we’ll be adding the support soon.

Try out the live demo!

3 examples of BlazePose tracking dancing, stretching, and exercising

BlazePose can track 33 keypoints across a variety complex poses in real-time.

Installation

To use BlazePose with the new pose-detection API, you have to first decide whether to use the TensorFlow.js runtime or MediaPipe runtime. To understand the advantages of each runtime, check the performance and loading times section later in this document for further details.

For each runtime, you can use either script tag or NPM for installation.

Using TensorFlow.js runtime:

Through script tag:

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgl"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/pose-detection"></script>

Through NPM:

yarn add @tensorflow/tfjs-core, @tensorflow/tfjs-converter
yarn add @tensorflow/tfjs-backend-webgl
yarn add @tensorflow-models/pose-detection

Using MediaPipe runtime:

Through script tag:

<script src="https://cdn.jsdelivr.net/npm/@mediapipe/pose"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/pose-detection"></script>

Through NPM:

yarn add @mediapipe/pose
yarn add @tensorflow-models/pose-detection

Try it yourself!

Once the package is installed, you only need to follow the few steps below to start using it. There are three variants of the model: lite, full, and heavy. The model accuracy increases from lite to heavy, while the inference speed decreases and memory footprint increases. The heavy variant is intended for applications that require high accuracy, while the lite variant is intended for latency-critical applications. The full variant is a balanced option, which is also the default option here.

Using TensorFlow.js runtime:

// Import TFJS runtime with side effects.
import '@tensorflow/tfjs-backend-webgl';
import * as poseDetection from '@tensorflow-models/pose-detection';

// Create a detector.
const detector = await poseDetection.createDetector(poseDetection.SupportedModels.BlazePose, {runtime: 'tfjs'});

Using MediaPipe runtime:

// Import MediaPipe runtime with side effects.
import '@mediapipe/pose';
import * as poseDetection from '@tensorflow-models/pose-detection';

// Create a detector.
const detector = await poseDetection.createDetector(poseDetection.SupportedModels.BlazePose, {runtime: 'mediapipe'});

You can also choose the lite or the heavy variant by setting the modelType field, as shown below:

// Create a detector.
const detector = await poseDetection.createDetector(poseDetection.SupportedModels.BlazePose, {runtime, modelType:'lite'});

// Pass in a video stream to the model to detect poses.
const video = document.getElementById('video');
const poses = await detector.estimatePoses(video);

Each pose contains 33 keypoints, with absolute x, y coordinates, confidence score and name:

console.log(poses[0].keypoints);
// Outputs:
// [
//    {x: 230, y: 220, score: 0.9, name: "nose"},
//    {x: 212, y: 190, score: 0.8, name: "left_eye_inner"},
//    ...
// ]

Refer to our ReadMe (TFJS runtime, MediaPipe runtime) for more details about the API.

As you begin to play and develop with BlazePose, we would appreciate your feedback and contributions. If you make something using this model, tag it with #MadeWithTFJS on social media so we can find your work, as we would love to see what you create.

Model deep dive

BlazePose provides real-time human body pose perception in the browser, working up to 4 meters from the camera.

We trained BlazePose specifically for highly demanded single-person use cases like yoga, fitness, and dance which require precise tracking of challenging postures, enabling the overlay of digital content and information on top of the physical world in augmented reality, gesture control, and quantifying physical exercises.

For pose estimation, we utilize our proven two-step detector-tracker ML pipeline. Using a detector, this pipeline first locates the pose region-of-interest (ROI) within the frame. The tracker subsequently predicts all 33 pose keypoints from this ROI. Note that for video use cases, the detector is run only on the first frame. For subsequent frames we derive the ROI from the previous frame’s pose keypoints as discussed below.

BlazePose architecture.

BlazePose’s topology contains 33 points extending 17 points by COCO with additional points on palms and feet to provide lacking scale and orientation information for limbs, which is vital for practical applications like fitness, yoga and dance.

Since the BlazePose CVPR’2020 release, MediaPipe has been constantly improving the models’ quality to remain state-of-the-art on the web / edge for single person pose tracking. Besides running through the TensorFlow.js pose-detection API, BlazePose is also available on Android, iOS and Python via MediaPipe and ML Kit. For detailed information, read the Google AI Blog post and the model card.

BlazePose Browser Performance

TensorFlow.js continuously seeks opportunities to bring the latest and fastest runtime for browsers. To achieve the best performance for this BlazePose model, in addition to the TensorFlow.js runtime (w/ WebGL backend) we further integrated with the MediaPipe runtime via the MediaPipe JavaScript Solutions. The MediaPipe runtime leverages WASM to utilize state-of-the-art pipeline acceleration available across platforms, which also powers Google products such as Google Meet.

Inference speed:

To quantify the inference speed of BlazePose, we benchmark the model across multiple devices.

MacBook Pro 15” 2019.

Intel core i9.

AMD Radeon Pro Vega 20 Graphics.

(FPS)

iPhone 11

(FPS)

Pixel 5

(FPS)

Desktop

Intel i9-10900K. Nvidia GTX 1070 GPU.

(FPS)

MediaPipe Runtime

With WASM & GPU Accel.

92 | 81 | 38

N/A

32 | 22 | N/A

160 | 140 | 98

TensorFlow.js Runtime
With WebGL backend

48 | 53 | 28

34 | 30 | N/A

13 | 11 | 5

44 | 40 | 30

Inference speed of BlazePose across different devices and runtimes. The first number in each cell is for the lite model, and the second number is for the full model, the third number is for the heavy model. Certain model types and runtime do not work at time of this release, and we will be adding the support soon.

To see the model’s FPS on your device, try our demo. You can switch the model type and runtime live in the demo UI to see what works best for your device.

Loading times:

Bundle size can affect initial page loading experience, such as Time-To-Interactive (TTI), UI rendering, etc. We evaluate the pose-detection API and the two runtime options. The bundle size affects file fetching time and UI smoothness, because processing the code and loading them into memory will compete with UI rendering on CPU. It also affects when the model is available to make inference.

There is a difference of how things are loaded between the two runtimes. For the MediaPipe runtime, only the @tensorflow-models/pose-detection and the @mediapipe/pose library are loaded at initial page download; the runtime and the model assets are loaded when the createDetector method is called. For the TF.js runtime with WebGL backend, the runtime is loaded at initial page download; only the model assets are loaded when the createDetector method is called. The TensorFlow.js package sizes can be further reduced with a custom bundle technique. Also, if your application is currently using TensorFlow.js, you don’t need to load those packages again, models will share the same TensorFlow.js runtime. Choose the runtime that best suits your latency and bundle size requirements. A summary of loading times and bundle sizes is provided below:

	Bundle Size gzipped + minified	Average Loading Time WiFi: download speed 100Mbps
MediaPipe Runtime
Initial Page Load	22.1KB	0.04s
Initial Detector Creation:
Runtime	1.57MB
Lite model	10.6MB	1.91s
Full model	14MB	1.91s
Heavy model	34.9MB	4.82s
TensorFlow.js Runtime
Initial Page Load	162.6KB	0.07s
Initial Detector Creation:
Lite model	10.4MB	1.91s
Full model	13.8MB	1.91s
Heavy model	34.7MB	4.82s

Bundle size and loading time analysis for MediaPipe and TF.js runtime. The loading time is estimated based on a simulated WiFi network with 100Mbps download speed and includes time from request sent to content downloaded, see what is included in more detail here.

Looking ahead

In the future, we plan to extend TensorFlow.js pose-detection API with new features like BlazePose GHUM 3D pose. We also plan to speed up the TensorFlow.js WebGL backend to make model execution even faster. This will be achieved through repeated benchmarking and backend optimization, such as operator fusion. We will also bring Node.js support in the near future.

Acknowledgements

We would like to acknowledge our colleagues, who participated in creating BlazePose GHUM 3D: Eduard Gabriel Bazavan, Cristian Sminchisescu, Tyler Zhu, the other contributors to MediaPipe: Chuo-Ling Chang, Michael Hays and Ming Guang Yong, along with those involved with the TensorFlow.js pose-detection API: Ping Yu, Sandeep Gupta, Jason Mayes, and Masoud Charkhabi.

Speed-up your sites with web-page prefetching using Machine Learning

May 18, 2021

by TensorFlow Blog Tensorflow

Posted by Minko Gechev, David Zats, Na Li, Ping Yu, Anusha Ramesh, and Sandeep Gupta

Page load time is one of the most important determinants of user experience on a web site. Research shows that faster page load time directly leads to increased page views, conversion, and customer satisfaction. Retail superstore Newegg has seen a 50% increase in conversions after implementing web-page prefetching to optimize page load experience.

Using TensorFlow tooling, it is now possible to use machine learning to implement a powerful solution for your website to improve page load times. In this blog post, we show an end-to-end workflow for using your site’s navigation data from Google Analytics and training a custom machine learning model that can predict the user’s next actions. You can use these predictions in an Angular app to pre-fetch candidate pages and dramatically improve user experience on your web site. Fig. 1 illustrates this side-by-side with default page load experience with no optimization compared to the greatly improved page load times with machine learning based predictive prefetching implemented on the right. Both examples are running on an emulated slow 3G network.

Fig: Comparison of un-optimized and machine learning based page loading time in a sample web application

A high-level schematic of our solution is as follows:

Fig: Solution overview

We use Google Cloud services (BigQuery and Dataflow) to store and preprocess the site’s Google Analytics data, then train a custom model using TensorFlow Extended (TFX) to run our model training pipeline, produce a site-specific model, and then convert it into a web-deployable TensorFlow.js format. This client-side model will be loaded in a sample Angular web app for an e-store to demonstrate how to deploy the model in a web application. Let’s take a look at these components in more detail.

Data Preparation & Ingestion

Google Analytics stores each page visit as an event, providing key aspects such as the page name, visit time, and load time. This data contains everything we need to train our model. We need to:

Convert this data to training examples containing features and labels
Make it available to TFX for training.

We accomplish the first by leveraging existing support for exporting the Google Analytics data to a large-scale cloud data store called BigQuery. We accomplish the latter by creating an Apache Beam pipeline that:

Reads the data from BigQuery
Sorts and filters the events in a session
Walks through each session, creating examples that take properties of the current event as features and the page visit in the next event as the label
Stores these generated examples in Google Cloud Storage so that they can be used by TFX for training.

We run our Beam pipeline in Dataflow.

In the following table, each row represents a training example:

cur_page	session_index	label
page2	0	page3
page3	8	page1

While our training example only contains two training features (cur_page and session_index), additional features from Google Analytics can be easily added to create a richer dataset and used for training to create a more powerful model. To do so, extend the following code:

def ga_session_to_tensorflow_examples(session):
  examples = []

  for i in range(len(session)-1):
    features = {‘cur_page’: [session[i][‘page’][‘pagePath’]],
                ‘label’: [session[i+1][‘page’][‘pagePath’]],
                ‘session_index’: [i],
                # Add additional features here.
                …
               }
    examples.append(create_tensorflow_example(features))
  return examples

Model Training

Tensorflow Extended (TFX) is an end to end production scale ML platform and is used to automate the process of data validation, training at scale (using accelerators), evaluation & validation of the generated model.

To create a model within TFX, you must provide the preprocessing function and the run function. The preprocessing function defines the operations that should be performed on the data before it is passed to the main model. These include operations that involve a full pass over the data, such as vocab creation. The run function defines the main model and how it is to be trained.

Our example shows how to implement the preprocessing_fn and the run_fn to define and train a model for predicting the next page. And the TFX example pipelines demonstrate how to implement these functions for many other key use cases.

Creating a Web Deployable Model

After training our custom model, we want to deploy this model in our web application so it can be used to make live predictions when users visit our website. For this, we use TensorFlow.js, which is TensorFlow’s framework for running machine learning models directly in the browser client-side. By running this code in the browser client-side, we can reduce latency associated with server-side roundtrip traffic, reduce server-side costs, and also keep user’s data private by not having to send any session data to the server.

TFX employs the Model Rewriting Library to automate conversion between trained TensorFlow models and the TensorFlow.js format. As part of this library, we have implemented a TensorFlow.js rewriter. We simply invoke this rewriter within the run_fn to perform the desired conversion. Please see the example for more details.

Angular Application

Once we have the model we can use it within an Angular application. On each navigation, we will query the model and prefetch the resources associated with the pages that are likely to be visited in the future.

An alternative solution would be to prefetch the resources associated with all the possible future navigation paths, but this would have much higher bandwidth consumption. Using machine learning, we can predict only the pages, which are likely to be used next and reduce the number of false positives.

Depending on the specifics of the application we may want to prefetch different types of assets, for example: JavaScript, images, or data. For the purposes of this demonstration we’ll be prefetching images of products.

A challenge is how to implement the mechanism in a performant way without impacting the application load time or runtime performance. Techniques to mitigate the risks of performance regressions we can use are:

Load the model and TensorFlow.js lazily without blocking the initial page load time
Query the model off the main thread so we don’t drop frames in the main thread and achieve 60fps rendering experience

A web platform API that satisfies both of these constraints is the service worker. A service worker is a script that your browser runs in the background in a new thread, separate from a web page. It also allows you to plug into a request cycle and provides you with cache control.

When the user navigates across the application, we’ll post messages to the service worker with the pages they have visited. Based on the navigation history, the service worker will make predictions for future navigation and prefetch relevant product assets.

Let us look at a high-level overview of the individual moving parts.

From within the main file of our Angular application, we can load the service worker:

// main.ts

if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/prefetch.worker.js', { scope: '/' });
}

This snippet will download the prefetch.worker.js script and run it in the background. As the next step, we want to forward navigation events to it:

// app.component.ts

this.route.params.subscribe((routeParams) => {
  if (this._serviceWorker) {
    this._serviceWorker.postMessage({ page: routeParams.category });
  }
});

In the snippet above, we watch for changes of the parameters of the URL. On change, we forward the category of the page to the service worker.

In the implementation of the service worker we need to handle messages from the main thread, make predictions based on them, and prefetch the relevant information. On a high-level this looks as follows:

// prefetch.worker.js

addEventListener('message', ({ data }) => prefetch(data.page));

const prefetch = async (path) => {
  const predictions = await predict(path);
  const cache = await caches.open(ImageCache);

  predictions.forEach(async ([probability, category]) => {
    const products = (await getProductList(category)).map(getUrl);
    [...new Set(products)].forEach(url => {
      const request = new Request(url, {
        mode: 'no-cors',
      });
      fetch(request).then(response => cache.put(request, response));
    });
  });
};

Within the service worker we listen for messages from the main thread. When we receive a message we trigger the logic responsible for making predictions and prefetching data.

In the prefetch function we first predict, which are the pages the user could visit next. After that, we iterate over all the predictions and fetch the corresponding resources to improve the user experience in subsequent navigation.

For details you can follow the sample app in the TensorFlow.js examples repository.

Try it yourself

Check out the model training code sample which shows the TFX pipeline for training a page prefetching model as well as an Apache Beam pipeline that converts Google Analytics data to training examples, and the deployment sample showing how to deploy the TensorFlow.js model in a sample Angular app for client-side predictions.

Acknowledgements

This project wouldn’t have been possible without the incredible effort and support of Becky Chan, Deepak Aujla, Fei Dong, and Jason Mayes.

Next-Generation Pose Detection with MoveNet and TensorFlow.js

May 17, 2021

by TensorFlow Blog Tensorflow

Posted by Ronny Votel and Na Li, Google Research

Today we’re excited to launch our latest pose detection model, MoveNet, with our new pose-detection API in TensorFlow.js. MoveNet is an ultra fast and accurate model that detects 17 keypoints of a body. The model is offered on TF Hub with two variants, known as Lightning and Thunder. Lightning is intended for latency-critical applications, while Thunder is intended for applications that require high accuracy. Both models run faster than real time (30+ FPS) on most modern desktops, laptops, and phones, which proves crucial for live fitness, sports, and health applications. This is achieved by running the model completely client-side, in the browser using TensorFlow.js with no server calls needed after the initial page load and no dependencies to install.

Try out the live demo!

MoveNet can track keypoints through fast motions and atypical poses.

Human pose estimation has come a long way in the last five years, but surprisingly hasn’t surfaced in many applications just yet. This is because more focus has been placed on making pose models larger and more accurate, rather than doing the engineering work to make them fast and deployable everywhere. With MoveNet, our mission was to design and optimize a model that leverages the best aspects of state-of-the-art architectures, while keeping inference times as low as possible. The result is a model that can deliver accurate keypoints across a wide variety of poses, environments, and hardware setups.

Unlocking Live Health Applications with MoveNet

We teamed up with IncludeHealth, a digital health and performance company, to understand whether MoveNet can help unlock remote care for patients. IncludeHealth has developed an interactive web application that guides a patient through a variety of routines (using a phone, tablet, or laptop) from the comfort of their own home. The routines are digitally built and prescribed by physical therapists to test balance, strength, and range of motion.

The service requires web-based and locally run pose models for privacy that can deliver precise keypoints at high frame rates, which are then used to quantify and qualify human poses and movements. While a typical off-the-shelf detector is sufficient for easy movements such as shoulder abductions or full body squats, more complicated poses such as seated knee extensions or supine positions (laying down) cause grief for even state-of-the-art detectors trained on the wrong data.

Comparison of a traditional detector (top) vs MoveNet (bottom) on difficult poses.

We provided an early release of MoveNet to IncludeHealth, accessible through the new pose-detection API. This model is trained on fitness, dance, and yoga poses (see more details about the training dataset below). IncludeHealth integrated the model into their application and benchmarked MoveNet relative to other available pose detectors:

“The MoveNet model has infused a powerful combination of speed and accuracy needed to deliver prescriptive care. While other models trade one for the other, this unique balance has unlocked the next generation of care delivery. The Google team has been a fantastic collaborator in this pursuit.” – Ryan Eder, Founder & CEO at IncludeHealth.

As a next step, IncludeHealth is partnering with hospital systems, insurance plans, and the military to enable the extension of traditional care and training beyond brick and mortar.

IncludeHealth demo application running in browser that quantifies balance and motion using keypoint estimation powered by MoveNet and TensorFlow.js

Installation

There are two ways to use MoveNet with the new pose-detection api:

Through NPM:

import * as poseDetection from '@tensorflow-models/pose-detection';

Through script tag:

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgl"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/pose-detection"></script>

Try it yourself!

Once the package is installed, you only need to follow the few steps below to start using it:

// Create a detector.
const detector = await poseDetection.createDetector(poseDetection.SupportedModels.MoveNet);

The detector defaults to use the Lightning version; to choose the Thunder version, create the detector as below:

// Create a detector.
const detector = await poseDetection.createDetector(poseDetection.SupportedModels.MoveNet, {modelType: poseDetection.movenet.modelType.SINGLEPOSE_THUNDER});

// Pass in a video stream to the model to detect poses.
const video = document.getElementById('video');
const poses = await detector.estimatePoses(video);

Each pose contains 17 keypoints, with absolute x, y coordinates, confidence score and name:

console.log(poses[0].keypoints);
// Outputs:
// [
//    {x: 230, y: 220, score: 0.9, name: "nose"},
//    {x: 212, y: 190, score: 0.8, name: "left_eye"},
//    ...
// ]

Refer to our README for more details about the API.

As you begin to play and develop with MoveNet, we would appreciate your feedback and contributions. If you make something using this model, tag it with #MadeWithTFJS on social so we can find your work, as we would love to see what you create.

MoveNet Deep Dive

MoveNet Architecture

MoveNet is a bottom-up estimation model, using heatmaps to accurately localize human keypoints. The architecture consists of two components: a feature extractor and a set of prediction heads. The prediction scheme loosely follows CenterNet, with notable changes that improve both speed and accuracy. All models are trained using the TensorFlow Object Detection API.

The feature extractor in MoveNet is MobileNetV2 with an attached feature pyramid network (FPN), which allows for a high resolution (output stride 4), semantically rich feature map output. There are four prediction heads attached to the feature extractor, responsible for densely predicting a:

Person center heatmap: predicts the geometric center of person instances
Keypoint regression field: predicts full set of keypoints for a person, used for grouping keypoints into instances
Person keypoint heatmap: predicts the location of all keypoints, independent of person instances
2D per-keypoint offset field: predicts local offsets from each output feature map pixel to the precise sub-pixel location of each keypoint

MoveNet architecture

Although these predictions are computed in parallel, one can gain insight into the model’s operation by considering the following sequence of operations:

Step 1: The person center heatmap is used to identify the centers of all individuals in the frame, defined as the arithmetic mean of all keypoints belonging to a person. The location with the highest score (weighted by the inverse-distance from the frame center) is selected.

Step 2: An initial set of keypoints for the person is produced by slicing the keypoint regression output from the pixel corresponding to the object center. Since this is a center-out prediction – which must operate over different scales – the quality of regressed keypoints will not be very accurate.

Step 3: Each pixel in the keypoint heatmap is multiplied by a weight which is inversely proportional to the distance from the corresponding regressed keypoint. This ensures that we do not accept keypoints from background people, since they typically will not be in the proximity of regressed keypoints, and hence will have low resulting scores.

Step 4: The final set of keypoint predictions are selected by retrieving the coordinates of the maximum heatmap values in each keypoint channel. The local 2D offset predictions are then added to these coordinates to give refined estimates. See the figure below which illustrates these four steps.

MoveNet post-processing steps.

Training Datasets

MoveNet was trained on two datasets: COCO and an internal Google dataset called Active. While COCO is the standard benchmark dataset for detection – due to its scene and scale diversity – it is not suitable for fitness and dance applications, which exhibit challenging poses and significant motion blur. Active was produced by labeling keypoints (adopting COCO’s standard 17 body keypoints) on yoga, fitness, and dance videos from YouTube. No more than three frames are selected from each video for training, to promote diversity of scenes and individuals.

Evaluations on the Active validation dataset show a significant performance boost relative to identical architectures trained using only COCO. This isn’t surprising since COCO infrequently exhibits individuals with extreme poses (e.g. yoga, pushups, headstands, and more).

To learn more about the dataset and how MoveNet performs across different categories, please see the model card.

Images from Active keypoint dataset.

Optimization

While a lot of effort went into architecture design, post-processing logic, and data selection to make MoveNet a high-quality detector, an equal focus was given to inference speed. First, bottleneck layers from MobileNetV2 were selected for lateral connections in the FPN. Likewise, the number of convolution filters in each prediction head were slimmed down significantly to speed up execution on the output feature maps. Depthwise separable convolutions are used throughout the network, except in the first MobileNetV2 layer.

MoveNet was repeatedly profiled, uncovering and removing particularly slow ops. For example, we replaced tf.math.top_k with tf.math.argmax, since it executes significantly faster and is adequate for the single-person setting.

To ensure fast execution with TensorFlow.js, all model outputs were packed into a single output tensor, so that there is only one download from GPU to CPU.

Perhaps the most significant speedup is the use of 192×192 inputs to the model (256×256 for Thunder). To counteract the lower resolution, we apply intelligent cropping based on detections from the previous frame. This allows the model to devote its attention and resources to the main subject, and not the background.

Temporal Filtering

Operating on a high FPS camera stream provides the luxury of applying smoothing to keypoint estimates. Both Lightning and Thunder apply a robust, non-linear filter to the incoming stream of keypoint predictions. This filter is tuned to simultaneously suppress high-frequency noise (i.e. jitter) and outliers from the model, while also maintaining high-bandwidth throughput during quick motions. This leads to smooth keypoint visualizations with minimal lag in all circumstances.

MoveNet Browser Performance

To quantify the inference speed of MoveNet, the model was benchmarked across multiple devices. The model latency (expressed in FPS) was measured on GPU with WebGL, as well as WebAssembly (WASM), which is the typical backend for devices with lower-end or no GPUs.

MacBook Pro 15” 2019.

Intel core i9.

AMD Radeon Pro Vega 20 Graphics.

(FPS)

iPhone 12

(FPS)

Pixel 5

(FPS)

Desktop

Intel i9-10900K. Nvidia GTX 1070 GPU.

(FPS)

WebGL

104 | 77

51 | 43

34 | 12

87 | 82

WASM

with SIMD + Multithread

42 | 21

N/A

71 | 30

Inference speed of MoveNet across different devices and TF.js backends. The first number in each cell is for Lightning, and the second number is for Thunder.

TF.js continuously optimizes its backends to accelerate model execution across all supported devices. We applied several techniques here to help the models achieve this performance, such as implementing a packed WebGL kernel for the depthwise separable convolutions and improving GL scheduling for mobile Chrome.

To see the model’s FPS on your device, try our demo. You can switch the model type and backends live in the demo UI to see what works best for your device.

Looking Ahead

The next step is to extend Lightning and Thunder models to the multi-person domain, so that developers can support applications with multiple people in the camera field-of-view.

We also have plans to speed up the TensorFlow.js backends to make model execution even faster. This is achieved through repeated benchmarking and backend optimization.

Acknowledgements

We would like to acknowledge the other contributors to MoveNet: Yu-Hui Chen, Ard Oerlemans, Francois Belletti, Andrew Bunner, and Vijay Sundaram, along with those involved with the TensorFlow.js pose-detection API: Ping Yu, Sandeep Gupta, Jason Mayes, and Masoud Charkhabi.

Building a TinyML Application with TF Micro and SensiML

May 7, 2021

by TensorFlow Blog Tensorflow

A guest post by Chris Knorowski, SensiML CTO

TinyML reduces the complexity of adding AI to the edge, enabling new applications where streaming data back to the cloud is prohibitive. Some examples of applications that are making use of TinyML right now are :

Visual and audio wake words that trigger an action when a person is detected in an image or a keyword is spoken .
Predictive maintenance on industrial machines using sensors to continuously monitor for anomalous behavior.
Gesture and activity detection for medical, consumer, and agricultural devices, such as gait analysis, fall detection or animal health monitoring.

One common factor for all these applications is the low cost and power usage of the hardware they run on. Sure, we can detect audio and visual wake words or analyze sensor data for predictive maintenance on a desktop computer. But, for a lot of these applications to be viable, the hardware needs to be inexpensive and power efficient (so it can run on batteries for an extended time).

Fortunately, the hardware is now getting to the point where running real-time analytics is possible. It is crazy to think about, but the Arm Cortex-M4 processor can do more FFT’s per second than the Pentium 4 processor while using orders of magnitude less power. Similar gains in power/performance have been made in sensors and wireless communication. TinyML allows us to take advantage of these advances in hardware to create all sorts of novel applications that simply were not possible before.

At SensiML our goal is to empower developers to rapidly add AI to their own edge devices, allowing their applications to autonomously transform raw sensor data into meaningful insight. We have taken years of lessons learned in creating products that rely on edge optimized machine learning and distilled that knowledge into a single framework, the SensiML Analytics Toolkit, which provides an end-to-end development platform spanning data collection, labeling, algorithm development, firmware generation, and testing.

So what does it take to build a TinyML application?

Building a TinyML application touches on skill sets ranging from hardware engineering, embedded programming, software engineering, machine learning, data science and domain expertise about the application you are building. The steps required to build the application can be broken into four parts:

Collecting and annotating data
Applying signal preprocessing
Training a classification algorithm
Creating firmware optimized for the resource budget of an edge device

This tutorial will walk you through all the steps, and by the end of it you will have created an edge optimized TinyML application for the Arduino Nano 33 BLE Sense that is capable of recognizing different boxing punches in real-time using the Gyroscope and Accelerometer sensor data from the onboard IMU sensor.

Gesture recognition using TinyML. Male punching a punching bag

What you need to get started

We will use the SensiML Analytics Toolkit to handle collecting and annotating sensor data, creating a sensor preprocessing pipeline, and generating the firmware. We will use TensorFlow to train our machine learning model and TensorFlow Lite Micro for inferencing. Before you start, we recommend signing up for SensiML Community Edition to get access to the SensiML Analytics Toolkit.

The Software

We will use the SensiML Open Gateway, an open-source python application to stream data from edge devices.
We will use the SensiML Data Capture Lab (Windows 10) to record and label the sensor data.
We will use Google Colab to train our model using TensorFlow Lite for Microcontrollers
We will use the SensiML Analytics Studio for offline validation and code generation of the firmware
We will use Visual Studio Code with the Platform IO extension to flash the firmware.

The Hardware

Arduino Nano 33 BLE Sense
Adafruit Li-Ion Backpack Add-On (optional)
Lithium-Ion Polymer Battery ( 3.7v 100mAh)
Zebra Byte Case
Glove and Double Sided Tape

The Arduino Nano 33 BLE Sense has an Arm Cortex-M4 microcontroller running at 64 MHz with 1MB Flash memory and 256 KB of RAM. If you are used to working with cloud/mobile this may seem tiny, but many applications can run in such a resource-constrained environment.

The Nano 33 BLE Sense also has a variety of onboard sensors which can be used in your TinyML applications. For this tutorial, we are using the motion sensor which is a 9-axis IMU (accelerometer, gyroscope, magnetometer).

For wireless power, we used the Adafruit Li-Ion Battery Pack. If you do not have the battery pack, you can still walk through this tutorial using a suitably long micro USB cable to power the board. Though collecting gesture data is not quite as fun when you are wired. See the images below hooking up the battery to the Nano 33 BLE Sense.

Building Your Data Set

For every machine learning project, the quality of the final product depends on the quality of your data set. Time-series data, unlike image and audio, are typically unique to each application. Because of this, you often need to collect and annotate your datasets. The next part of this tutorial will walk you through how to connect to the Nano 33 BLE Sense to stream data wirelessly over BLE as well as label the data so it can be used to train a TensorFlow model.

For this project we are going to collect data for 5 different gestures as well as some data for negative cases which we will label as Unknown. The 5 boxing gestures we are going to collect data for are Jab, Overhand, Cross, Hook, and Uppercut.

We will also collect data on both the right and left glove. Giving us a total of 10 different classes. To simplify things we will build two separate models one for the right glove, and one for the left. This tutorial will focus on the left glove.

Streaming sensor data from the Nano 33 over BLE

The first challenge of a TinyML project is often to figure out how to get data off of the sensor. Depending on your needs you may choose Wi-Fi, BLE, Serial, or LoRaWAN. Alternatively, you may find storing data to an internal SD card and transferring the files after is the best way to collect data. For this tutorial, we will take advantage of the onboard BLE radio to stream sensor data from the Nano 33 BLE Sense.

We are going to use the SensiML Open Gateway running on our computer to retrieve the sensor data. To download and launch the gateway open a terminal and run the following commands:

git clone https://github.com/sensiml/open-gateway

cd open-gateway
pip3 install -r requirements.txt
python3 app.py

The gateway should now be running on your machine.

Next, we need to connect the gateway server to the Nano 33 BLE Sense. Make sure you have flashed the Data Collection Firmware to your Nano 33. This firmware implements the Simple Streaming Interface specification which creates two topics used for streaming data. The /config topic returns a JSON describing the sensor data and /stream topic streams raw sensor data as a byte array of Int16 values.

To configure the gateway to connect to your sensor:

Go to the gateway address in your browser (defaults to localhost:5555)
Click on the Home Tab
Set Device Mode: Data Capture
Set Connection Type: BLE
Click the Scan button, and select the device named Nano 33 DCL
Click the Connect to Device button

The gateway will pull the configuration from your device, and be ready to start forwarding sensor data. You can verify it is working by going to the Test Stream tab and clicking the Start Stream button.

Setting up the Data Capture Lab Project

Now that we can stream data, the next step is to record and label the boxing gestures. To do that we will use the SensiML Data Capture Lab. If you haven’t already done so, download and install the Data Capture Lab to record sensor data.

We have created a template project to get you started. The project is prepopulated with the gesture labels and metadata information, along with some pre-recorded example gestures files. To add this project to your account:

Download and unzip the Boxing Glove Gestures Demo Project
Open the Data Capture Lab
Click Upload Project
Click Browse which will open the file explorer window
Navigate to the Boxing Glove Gestures Demo folder you just unzipped and select the Boxing Glove Gestures Demo.dclproj file
Click Upload

Connecting to the Gateway

After uploading the project, you can start capturing sensor data. For this tutorial we will be streaming data to the Data Capture Lab from the gateway over TCP/IP. To connect to the Nano 33 BLE Sense from the Data Capture Lab through the gateway:

Open the Project Boxing Glove Gestures Demo
Click Switch Modes -> Capture Mode
Select Connection Method: Wi-Fi
Click the Find Devices button
Enter the IP Address of your gateway machine, and the port the server is running on (typically 127.0.0.1:5555)
Click Add Device
Select the newly added device
Click the Connect button

You should see sensor data streaming across the screen. If you are having trouble with this step, see the full documentation here for troubleshooting.

Capturing Boxing Gesture Sensor Data

The Data Capture Lab can also play videos that have been recorded alongside your sensor data. If you want to capture videos and sync them up with sensor data see the documentation here. This can be extremely helpful during the annotation phase to help interpret what is happening at a given point in the time-series sensor waveforms.

Now that data is streaming into the Data Capture Lab, we can begin capturing our gesture data set.

Select “Jab” from the Label dropdown in the Capture Properties screen. (this will be the name of the file)
Select the Metadata which captures the context (subject, glove, experience, etc.)
Then click the Begin Recording button to start recording the sensor data
Perform several “Jab” gestures
Click the Stop Recording button when you are finished

After you hit stop recording, the captured data will be saved locally and synced with the cloud project. You can view the file by going to the Project Explorer and double-clicking on the newly created file.

GIF showing male boxing while program collects data

The following video walks through capturing sensor data.

Annotating Sensor Data

To classify sensor data in real-time, you need to decide how much and which portion of the sensor stream to feed to the classifier. On edge devices, it gets even more difficult as you are limited to a small buffer of data due to the limited RAM. Identifying the right segmentation algorithm for an application can save on battery life by limiting the number of classifications performed as well as improving the accuracy by identifying the start and end of a gesture.

Segmentation algorithms work by taking the input from the sensor and buffering the data until they determine a new segment has been found. At that point, they pass the data buffer down to the result of the pipeline. The simplest segmentation algorithm is a sliding window, which continually feeds a set chunk of data to the classifier. However, there are many drawbacks to the sliding window for discrete gesture recognition, such as performing classifications when there are no events. This wastes battery and runs the risk of having events split across multiple windows which can lower accuracy.

Segmenting in the Data Capture Lab

We identify events in the Data Capture Lab by creating Segments around the events in your sensor data. Segments are displayed with a pair of blue and red lines when you open a file and define where an event is located.

The Data Capture Lab has two methods for labeling your events: Manual and Auto. In manual mode you can manually drag and drop a segment onto the graph to identify an event in your sensor data. Auto mode uses a segmentation algorithm to automatically detect events based on customizable parameters. For this tutorial, we are going to use a segmentation algorithm in Auto mode. The segmentation algorithms we use for determining events will also be compiled as part of the firmware so that the on-device model will be fed the same segments of data it was trained against.

We have already created a segmentation algorithm for this project based on the dataset we have collected so far. To perform automatic event detection on newly captured data file:

Select the file from the Project Explorer
Click on the Detect Segments button
The segmentation algorithm will be run against the capture and the segments it finds will be added to the file

Note: If the events are not matching the real segments in your file, you may need to adjust the parameters of the segmentation algorithm.

Labeling Events in the Data Capture Lab

Keep in mind that automatic event detection only detects that an event has occurred, it does not determine what type of event has occurred. For each event that was detected, you will need to apply a label to them. To do that:

Select one or more of the segments from the graph
Click the Edit button or (Ctrl+E)
Specify which label is associated with that event
Repeat steps 1-3 for all segments in the capture
Click Save

Building a TinyML Model

We are going to use Google Colab to train our machine learning model using the data we collected from the Nano 33 BLE Sense in the previous section. Colab provides a Jupyter notebook that allows us to run our TensorFlow training in a web browser. Open the Google Colab notebook and follow along to train your model.

Offline Model Validation

After saving the model, go to the Analytic Studio to perform offline validation. To test the model against any of the captured data files

Open the Boxing Glove Gestures Demo project in the Summary Tab
Go to Test Model Tab
Select your model from the Model Name dropdown
Select one or more of the capture files by clicking on them
Click the Compute Accuracy Button to classify the captures using the selected model

p>When you click the Compute Accuracy button, the segmentation algorithm, preprocessing steps, and TensorFlow model are compiled into a single Knowledge Pack. Then the classification results and accuracy for each of the captures you selected are computed using the compiled Knowledge Pack. Click the Results button for the individual capture to see the classifications for all of the detected events and how they compared with the ground truth labels.

Deploy and Test on the Nano 33 BLE Sense

Downloading the model as firmware

Now that you validated the model offline, it’s time to see how it performs at the edge. To do that we download and flash the model to the Nano 33 BLE Sense.

Go to the Download Model tab of the Analytics Studio
Select the HW Platform: Arduino CortexM4
Select Format: Library
Click the Download button
The compiled library file should download to your computer

Flashing the Firmware

After downloading the library, we will build and upload the firmware to the Nano 33 BLE Sense. For this step, you will need the Nano 33 Knowledge Pack Firmware. To compile the firmware, we are using Visual Studio Code with the Platform IO plugin. To compile your model and Flash the Nano 33 BLE Sense with this firmware:

Open your terminal and run

git clone https://github.com/sensiml/nano33_knowledge_pack/

Unzip the downloaded Knowledge Pack.
In the folder, you will find the following directories:
knowledgepack_project/

libsensiml/
Copy the files from libsensiml to nano33_knowledge_pack/lib/sensiml. You will overwrite the files included in the repository.
Copy the files from knowledgepack_project to nano33_knowledge_pack/src/
Switch to the Platform I/O extension tab in VS Code
Connect your Nano 33 BLE Sense to your computer using the micro USB cable.
Click Upload and Monitor under the nano33ble_with_tensorflow in the PlatformI/O tab.

When the device restarts, it will boot up and your model will be running automatically. The video below walks through these steps.

Viewing Classification Results

To see the classification results in real-time connect to your device over BLE using the Android TestApp or the SensiML Open Gateway. The device will show up with the name Nano33 SensiML KP when you scan for devices. We have trained two models, one for the left glove and one for the right glove. You can see a demo of both models running at the same time in the following video.

Conclusion

We hope this blog has given you the tools you need to start building an end-to-end TinyML application using TensorFlow Lite For Microcontrollers and the SensiML Analytics Toolkit. For more tutorials and examples of TinyML applications checkout the application examples in our documentation. Follow us on LinkedIn or get in touch with us, we love hearing about all of the amazing TinyML applications the community is working on!

Using TFX inference with Dataflow for large scale ML inference patterns

May 6, 2021

by TensorFlow Blog Tensorflow

Posted by Reza Rokni, Snr Staff Developer Advocate

In part I of this blog series we discussed best practices and patterns for efficiently deploying a machine learning model for inference with Google Cloud Dataflow. Amongst other techniques, it showed efficient batching of the inputs and the use of shared.py to make efficient use of a model.

In this post, we walk through the use of the RunInference API from tfx-bsl, a utility transform from TensorFlow Extended (TFX), which abstracts us away from manually implementing the patterns described in part I. You can use RunInference to simplify your pipelines and reduce technical debt when building production inference pipelines in batch or stream mode.

The following four patterns are covered:

Using RunInference to make ML prediction calls.
Post-processing RunInference results. Making predictions is often the first part of a multistep flow, in the business process. Here we will process the results into a form that can be used downstream.
Attaching a key. Along with the data that is passed to the model, there is often a need for an identifier — for example, an IOT device ID or a customer identifier — that is used later in the process even if it’s not used by the model itself. We show how this can be accomplished.
Inference with multiple models in the same pipeline.Often you may need to run multiple models within the same pipeline, be it in parallel or as a sequence of predict – process – predict calls. We walk through a simple example.

Creating a simple model

In order to illustrate these patterns, we’ll use a simple toy model that will let us concentrate on the data engineering needed for the input and output of the pipeline. This model will be trained to approximate multiplication by the number 5.

Please note the following code snippets can be run as cells within a notebook environment.

Step 1 – Set up libraries and imports

%pip install tfx_bsl==0.29.0 --quiet

import argparse

import tensorflow as tf
from tensorflow import keras
from tensorflow_serving.apis import prediction_log_pb2

import apache_beam as beam
import tfx_bsl
from tfx_bsl.public.beam import RunInference
from tfx_bsl.public import tfxio
from tfx_bsl.public.proto import model_spec_pb2

import numpy

from typing import Dict, Text, Any, Tuple, List

from apache_beam.options.pipeline_options import PipelineOptions

project = "<your project>"
bucket = "<your bucket>"

save_model_dir_multiply = f'gs://{bucket}/tfx-inference/model/multiply_five/v1/'
save_model_dir_multiply_ten = f'gs://{bucket}/tfx-inference/model/multiply_ten/v1/'

Step 2 – Create the example data

In this step we create a small dataset that includes a range of values from 0 to 99 and labels that correspond to each value multiplied by 5.

"""
Create our training data which represents the 5 times multiplication table for 0 to 99. x is the data and y the labels. 

x is a range of values from 0 to 99.
y is a list of 5x

value_to_predict includes a values outside of the training data
"""
x = numpy.arange(0, 100)
y = x * 5

Step 3 – Create a simple model, compile, and fit it

"""
Build a simple linear regression model.
Note the model has a shape of (1) for its input layer, it will expect a single int64 value.
"""
input_layer = keras.layers.Input(shape=(1), dtype=tf.float32, name='x')
output_layer= keras.layers.Dense(1)(input_layer)

model = keras.Model(input_layer, output_layer)
model.compile(optimizer=tf.optimizers.Adam(), loss='mean_absolute_error')
model.summary()

Let’s teach the model about multiplication by 5.

model.fit(x, y, epochs=2000)

Next, check how well the model performs using some test data.

value_to_predict = numpy.array([105, 108, 1000, 1013], dtype=numpy.float32)
model.predict(value_to_predict)

From the results below it looks like this simple model has learned its 5 times table close enough for our needs!

OUTPUT: 

array([[ 524.9939],
       [ 539.9937],
       [4999.935 ],
       [5064.934 ]], dtype=float32)

Step 4 – Convert the input to tf.example

In the model we just built, we made use of a simple list to generate the data and pass it to the model. In this next step we make the model more robust by using tf.example objects in the model training.

tf.example is a serializable dictionary (or mapping) from names to tensors, which ensures the model can still function even when new features are added to the base examples. Making use of tf.example also brings with it the benefit of having the data be portable across models in an efficient, serialized format.

To use tf.example for this example, we first need to create a helper class, ExampleProcessor, that is used to serialize the data points.

class ExampleProcessor:
  
   def create_example_with_label(self, feature: numpy.float32,
                            label: numpy.float32)-> tf.train.Example:
       return tf.train.Example(
           features=tf.train.Features(
                 feature={'x': self.create_feature(feature),
                          'y' : self.create_feature(label)
                 }))

   def create_example(self, feature: numpy.float32):
       return tf.train.Example(
           features=tf.train.Features(
                 feature={'x' : self.create_feature(feature)})
           )

   def create_feature(self, element: numpy.float32):
       return tf.train.Feature(float_list=tf.train.FloatList(value=[element]))

Using the ExampleProcess class, the in-memory list can now be moved to disk.

# Create our labeled example file for 5 times table

example_five_times_table = 'example_five_times_table.tfrecord'

with tf.io.TFRecordWriter(example_five_times_table) as writer:
 for i in zip(x, y):
   example = ExampleProcessor().create_example_with_label(
       feature=i[0], label=i[1])
   writer.write(example.SerializeToString())

# Create a file containing the values to predict

predict_values_five_times_table = 'predict_values_five_times_table.tfrecord'

with tf.io.TFRecordWriter(predict_values_five_times_table) as writer:
 for i in value_to_predict:
   example = ExampleProcessor().create_example(feature=i)
   writer.write(example.SerializeToString())

With the new examples stored in TFRecord files on disk, we can use the Dataset API to prepare the data so it is ready for consumption by the model.

RAW_DATA_TRAIN_SPEC = {
'x': tf.io.FixedLenFeature([], tf.float32),
'y': tf.io.FixedLenFeature([], tf.float32)
}

RAW_DATA_PREDICT_SPEC = {
'x': tf.io.FixedLenFeature([], tf.float32),
}

With the feature spec in place, we can train the model as before.

dataset = tf.data.TFRecordDataset(example_five_times_table) 
dataset = dataset.map(lambda e : tf.io.parse_example(e, RAW_DATA_TRAIN_SPEC)) 
dataset = dataset.map(lambda t : (t['x'], t['y'])) 
dataset = dataset.batch(100) 
dataset = dataset.repeat()
model.fit(dataset, epochs=500, steps_per_epoch=1)

Note that these steps would be done automatically for us if we had built the model using a TFX pipeline, rather than hand-crafting the model as we did here.

Step 5 – Save the model

Now that we have a model, we need to save it for use with the RunInference transform. RunInference accepts TensorFlow saved model pb files as part of its configuration. The saved model file must be stored in a location that can be accessed by the RunInference transform. In a notebook this can be the local file system; however, to run the pipeline on Dataflow, the file will need to be accessible by all the workers, so here we use a GCP bucket.

Note that the gs:// schema is directly supported by the tf.keras.models.save_model api.

tf.keras.models.save_model(model, save_model_dir_multiply)

During development it’s useful to be able to inspect the contents of the saved model file. For this, we use the saved_model_cli that comes with TensorFlow. You can run this command from a cell:

!saved_model_cli show --dir {save_model_dir_multiply} --all

Abbreviated output from the saved model file is shown below. Note the signature def 'serving_default', which accepts a tensor of float type. We will change this to accept another type in the next section.

OUTPUT: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['example'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: serving_default_example:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict

RunInference will pass a serialized tf.example to the model rather than a tensor of float type as seen in the current signature. To accomplish this we have one more step to prepare the model: creation of a specific signature.

Signatures are a powerful feature as they enable us to control how calling programs interact with the model. From the TensorFlow documentation:

“The optional signatures argument controls which methods in obj will be available to programs which consume SavedModels, for example, serving APIs. Python functions may be decorated with @tf.function(input_signature=…) and passed as signatures directly, or lazily with a call to get_concrete_function on the method decorated with @tf.function.”

In our case, the following code will create a signature that accepts a tf.string data type with a name of ‘examples’. This signature is then saved with the model, which replaces the previous saved model.

@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string , name='examples')])
def serve_tf_examples_fn(serialized_tf_examples):
 """Returns the output to be used in the serving signature."""
 features = tf.io.parse_example(serialized_tf_examples, RAW_DATA_PREDICT_SPEC)
 return model(features, training=False)

signature = {'serving_default': serve_tf_examples_fn}

tf.keras.models.save_model(model, save_model_dir_multiply, signatures=signature)

If you run the saved_model_cli command again, you will see that the input signature has changed to DT_STRING.

Pattern 1: RunInference for Predictions

Step 1 – Use RunInference within the pipeline

Now that the model is ready, the RunInference transform can be plugged into an Apache Beam pipeline. The pipeline below uses TFXIO TFExampleRecord, which it converts to a transform via RawRecordBeamSource(). The saved model location and signature are passed to the RunInference API as a SavedModelSpec configuration object.

pipeline = beam.Pipeline()

tfexample_beam_record = tfx_bsl.public.tfxio.TFExampleRecord(file_pattern=predict_values_five_times_table)

with pipeline as p:
   _ = (p | tfexample_beam_record.RawRecordBeamSource()
          | RunInference(
              model_spec_pb2.InferenceSpecType(
                  saved_model_spec=model_spec_pb2.SavedModelSpec(model_path=save_model_dir_multiply)))
          | beam.Map(print)
       )

Note:

You can perform two types of inference using RunInference:

In-process inference from a SavedModel instance. Used when the saved_model_spec field is set in inference_spec_type.
Remote inference by using a service endpoint. Used when the ai_platform_prediction_model_spec field is set in inference_spec_type.

Below is a snippet of the output. The values here are a little difficult to interpret as they are in their raw unprocessed format. In the next section the raw results are post-processed.

OUTPUT: 

predict_log {
  request { 
model_spec { signature_name: "serving_default" }
                inputs {
      key: "examples"
... 
       string_val: "n22n20n07example22053203n01i"
...
response {
    outputs {
      key: "output_0"
      value {
   ...
        float_val: 524.993896484375

Pattern 2: Post-processing RunInference results

The RunInference API returns a PredictionLog object, which contains the serialized input and the output from the call to the model. Having access to both the input and output enables you to create a simple tuple during post-processing for use downstream in the pipeline. Also worthy of note is that RunInference will consider the amenable-to-batching capability of the model (and does batch inference for performance purposes) transparently for you.

The PredictionProcessor beam.DoFn takes the output of RunInference and produces formatted text with the questions and answers as output. Of course in a production system, the output would more normally be a Tuple[input, output], or simply the output depending on the use case.

class PredictionProcessor(beam.DoFn):

   def process(
           self,
           element: prediction_log_pb2.PredictionLog):
       predict_log = element.predict_log
       input_value = tf.train.Example.FromString(predict_log.request.inputs['examples'].string_val[0])
       output_value = predict_log.response.outputs
       yield (f"input is [{input_value.features.feature['x'].float_list.value}] output is {output_value['output_0'].float_val}");

pipeline = beam.Pipeline()

tfexample_beam_record = tfx_bsl.public.tfxio.TFExampleRecord(file_pattern=predict_values_five_times_table)

with pipeline as p:
   _ = (p | tfexample_beam_record.RawRecordBeamSource()
          | RunInference(
              model_spec_pb2.InferenceSpecType(
                  saved_model_spec=model_spec_pb2.SavedModelSpec(model_path=save_model_dir_multiply)))
          | beam.ParDo(PredictionProcessor())
          | beam.Map(print)
       )

Now the output contains both the original input and the model’s output values.

OUTPUT: 

input is [[105.]] output is [523.6328735351562]
input is [[108.]] output is [538.5157470703125]
input is [[1000.]] output is [4963.6787109375]
input is [[1013.]] output is [5028.1708984375]

Pattern 3: Attaching a key

One useful pattern is the ability to pass information, often a unique identifier, with the input to the model and have access to this identifier from the output. For example, in an IOT use case you could associate a device id with the input data being passed into the model. Often this type of key is not useful for the model itself and thus should not be passed into the first layer.

RunInference takes care of this for us, by accepting a Tuple[key, value] and outputting Tuple[key, PredictLog]

Step 1 – Create a source with attached key

Since we need a key with the data that we are sending in for prediction, in this step we create a table in BigQuery, which has two columns: One holds the key and the second holds the test value.

CREATE OR REPLACE TABLE
  maths.maths_problems_1 ( key STRING OPTIONS(description="A unique key for the maths problem"),
    value FLOAT64 OPTIONS(description="Our maths problem" ) );
INSERT INTO
  maths.maths_problems_1
VALUES
  ( "first_question", 105.00),
  ( "second_question", 108.00),
  ( "third_question", 1000.00),
  ( "fourth_question", 1013.00)

Step 2 – Modify post processor and pipeline

In this step we:

Modify the pipeline to read from the new BigQuery source table
Add a map transform, which converts a table row into a Tuple[ bytes, Example]
Modify the post inference processor to output results along with the key

class PredictionWithKeyProcessor(beam.DoFn):

   def __init__(self):
       beam.DoFn.__init__(self)

   def process(
           self,
           element: Tuple[bytes, prediction_log_pb2.PredictionLog]):
       predict_log = element[1].predict_log
       input_value = tf.train.Example.FromString(predict_log.request.inputs['examples'].string_val[0])
       output_value = predict_log.response.outputs
       yield (f"key is {element[0]} input is {input_value.features.feature['x'].float_list.value} output is { output_value['output_0'].float_val[0]}" )

pipeline_options = PipelineOptions().from_dictionary({'temp_location':f'gs://{bucket}/tmp'})
pipeline = beam.Pipeline(options=pipeline_options)

with pipeline as p:
 _ = (p | beam.io.gcp.bigquery.ReadFromBigQuery(table=f'{project}:maths.maths_problems_1')
         | beam.Map(lambda x : (bytes(x['key'], 'utf-8'), ExampleProcessor().create_example(numpy.float32(x['value']))))
         | RunInference(
             model_spec_pb2.InferenceSpecType(
                 saved_model_spec=model_spec_pb2.SavedModelSpec(model_path=save_model_dir_multiply)))
         | beam.ParDo(PredictionWithKeyProcessor())
         | beam.Map(print)
     )

key is b'first_question' input is [105.] output is 524.0875854492188
key is b'second_question' input is [108.] output is 539.0093383789062
key is b'third_question' input is [1000.] output is 4975.75830078125
key is b'fourth_question' input is [1013.] output is 5040.41943359375

Pattern 4: Inference with multiple models in the same pipeline

In part I of the series, the “join results from multiple models” pattern covered the various branching techniques in Apache Beam that make it possible to run data through multiple models.

Those techniques are applicable to RunInference API, which can easily be used by multiple branches within a pipeline, with the same or different models. This is similar in function to cascade ensembling, although here the data flows through multiple models in a single Apache Beam DAG.

Inference with multiple models in parallel

In this example, the same data is run through two different models: the one that we’ve been using to multiply by 5 and a new model, which will learn to multiply by 10.

Example of data being ran through 2 different models

"""
Create multiply by 10 table.

x is a range of values from 0 to 100.
y is a list of x * 10

value_to_predict includes a values outside of the training data
"""
x = numpy.arange( 0, 1000)
y = x * 10

# Create our labeled example file for 10 times table

example_ten_times_table = 'example_ten_times_table.tfrecord'

with tf.io.TFRecordWriter( example_ten_times_table ) as writer:
 for i in zip(x, y):
   example = ExampleProcessor().create_example_with_label(
       feature=i[0], label=i[1])
   writer.write(example.SerializeToString())

dataset = tf.data.TFRecordDataset(example_ten_times_table) 
dataset = dataset.map(lambda e : tf.io.parse_example(e, RAW_DATA_TRAIN_SPEC)) 
dataset = dataset.map(lambda t : (t['x'], t['y'])) 
dataset = dataset.batch(100)
dataset = dataset.repeat() 

model.fit(dataset, epochs=500, steps_per_epoch=10, verbose=0)

tf.keras.models.save_model(model,
                           save_model_dir_multiply_ten,
                           signatures=signature)

Now that we have two models, we apply them to our source data.

pipeline_options = PipelineOptions().from_dictionary(
                                     {'temp_location':f'gs://{bucket}/tmp'})

pipeline = beam.Pipeline(options=pipeline_options)

with pipeline as p:
 questions = p | beam.io.gcp.bigquery.ReadFromBigQuery(
                                   table=f'{project}:maths.maths_problems_1')

 multiply_five = ( questions
             | "CreateMultiplyFiveTuple" >>
             beam.Map(lambda x : (bytes('{}{}'.format(x['key'],' * 5'),'utf-8'),
                                   ExampleProcessor().create_example(x['value'])))
            
             | "Multiply Five" >> RunInference(
                 model_spec_pb2.InferenceSpecType(
                 saved_model_spec=model_spec_pb2.SavedModelSpec(
                                           model_path=save_model_dir_multiply)))
     )
 multiply_ten = ( questions
         | "CreateMultiplyTenTuple" >>
         beam.Map(lambda x : (bytes('{}{}'.format(x['key'],'* 10'), 'utf-8'),
                              ExampleProcessor().create_example(x['value'])))
         | "Multiply Ten" >> RunInference(
             model_spec_pb2.InferenceSpecType(
             saved_model_spec=model_spec_pb2.SavedModelSpec(
                                         model_path=save_model_dir_multiply_ten)))
 )
 _ = ((multiply_five, multiply_ten) | beam.Flatten()
                                    | beam.ParDo(PredictionWithKeyProcessor())
                                    | beam.Map(print))

Output:

key is b'first_question * 5' input is [105.] output is 524.0875854492188
key is b'second_question * 5' input is [108.] output is 539.0093383789062
key is b'third_question * 5' input is [1000.] output is 4975.75830078125
key is b'fourth_question * 5' input is [1013.] output is 5040.41943359375
key is b'first_question* 10' input is [105.] output is 1054.333984375
key is b'second_question* 10' input is [108.] output is 1084.3131103515625
key is b'third_question* 10' input is [1000.] output is 9998.0908203125
key is b'fourth_question* 10' input is [1013.] output is 10128.0009765625

Inference with multiple models in sequence

In a sequential pattern, data is sent to one or more models in sequence, with the output from each model chaining to the next model.

sequential pattern sending data to one or more models in sequence, with the output from each model chaining to the next model.

Here are the steps:

Read the data from BigQuery
Map the data
RunInference with multiply by 5 model
Process the results
RunInference with multiply by 10 model
Process the results

pipeline_options = PipelineOptions().from_dictionary(
                                       {'temp_location':f'gs://{bucket}/tmp'})

pipeline = beam.Pipeline(options=pipeline_options)

def process_interim_inference(element : Tuple[
                                        bytes, prediction_log_pb2.PredictionLog
                                        ])-> Tuple[bytes, tf.train.Example]:
  
  key = '{} original input is {}'.format(
             element[0], str(tf.train.Example.FromString(
                 element[1].predict_log.request.inputs['examples'].string_val[0]
                 ).features.feature['x'].float_list.value[0]))
  
  value = ExampleProcessor().create_example(
              element[1].predict_log.response.outputs['output_0'].float_val[0])
  
  return (bytes(key,'utf-8'),value)

with pipeline as p:
  
 questions = p | beam.io.gcp.bigquery.ReadFromBigQuery(
                                   table=f'{project}:maths.maths_problems_1')

 multiply = ( questions
             | "CreateMultiplyTuple" >>
             beam.Map(lambda x : (bytes(x['key'],'utf-8'),
                                   ExampleProcessor().create_example(x['value'])))
             | "MultiplyFive" >> RunInference(
                 model_spec_pb2.InferenceSpecType(
                 saved_model_spec=model_spec_pb2.SavedModelSpec(
                                   model_path=save_model_dir_multiply)))
            
     )

 _ = ( multiply
         | "Extract result " >> 
         beam.Map(lambda x : process_interim_inference(x))
         | "MultiplyTen" >> RunInference(
             model_spec_pb2.InferenceSpecType(
             saved_model_spec=model_spec_pb2.SavedModelSpec(
                             model_path=save_model_dir_multiply_ten)))
         | beam.ParDo(PredictionWithKeyProcessor())
         | beam.Map(print)
 )

Output: 

key is b"b'first_question' original input is 105.0" input is [524.9771118164062] output is 5249.7822265625
key is b"b'second_question' original input is 108.0" input is [539.9765014648438] output is 5399.7763671875
key is b"b'third_question' original input is 1000.0" input is [4999.7841796875] output is 49997.9453125
key is b"b'forth_question' original input is 1013.0" input is [5064.78125] output is 50647.91796875

Running the pipeline on Dataflow

Until now the pipeline has been run locally, using the direct runner, which is implicitly used when running a pipeline with the default configuration. The same examples can be run using the production Dataflow runner by passing in configuration parameters including --runner. Details and an example can be found here.

Here is an example of the multimodel pipeline graph running on the Dataflow service:

With the Dataflow runner you also get access to pipeline monitoring as well as metrics that have been output from the RunInference transform. The following table shows some of these metrics from a much larger list available from the library.

Conclusion

In this blog, part II of our series, we explored the use of the tfx-bsl RunInference within some common scenarios, from standard inference, to post processing and the use of RunInference API in multiple locations in the pipeline.

To learn more, review the Dataflow and TFX documentation, you can also try out TFX with Google Cloud AI platform pipelines..

Acknowledgements

None of this would be possible without the hard work of many folks across both the Dataflow TFX and TF teams. From the TFX and TF team we would especially like to thank Konstantinos Katsiapis, Zohar Yahav, Vilobh Meshram, Jiayi Zhao, Zhitao Li, and Robert Crowe. From the Dataflow team I would like to thank Ahmet Altay for his support and input throughout.

Adaptive Framework for On-device Recommendation

April 29, 2021

by TensorFlow Blog Tensorflow

Posted by Ellie Zhou, Tian Lin, Shuangfeng Li and Sushant Prakash

Introduction & Motivation

We are excited to announce an adaptive framework to build on-device recommendation ML solutions with your own data and advanced user modeling architecture.

After the previously open-sourced on-device recommendation solution, we received a lot of interest from the community on introducing on-device recommender AI. Motivated and inspired by the feedback, we considered various use cases, and created a framework that could generate TensorFlow Lite recommendation models accommodating different kinds of data, features, and architectures to improve the previous models.

Benefits of this framework:

Flexible: The adaptive framework allows users to create a model in a configurable way.
Better model representation: To improve the previous model, our new recommendation models can utilize multiple kinds of features than a single feature.

Personalized recommendations play an increasingly important role in digital life nowadays. With more and more user actions being moved to edge devices, supporting recommenders on-device becomes an important direction. Compared with conventional pure server-based recommenders, the on-device solution has unique advantages, such as protecting users’ privacy, providing fast reaction to on-device user actions, leveraging lightweight TensorFlow Lite inference, and bypassing network dependency. We welcome you to try out this framework and create recommendation experience in your applications.

In this article, we will

Introduce the improved model architecture and framework adaptivity.
Walk you through how to utilize the framework step-by-step.
Provide insights based on research done with a public dataset.

Please find more details on TensorFlow website.

Model

A recommendation model typically predicts users’ future activities, based on users’ previous activities. Our framework supports models using context information to do the prediction, which can be described in the following architecture:

Figure 1: An illustration of the configurable recommendation model. Each module is created according to the user-defined configuration.

At the context side, representations of all user activities are aggregated by the encoder to generate the context embedding. We support three different types of encoders: 1) bag-of-words (a.k.a. BOW), 2) 1-D convolution (a.k.a. CNN), and 3) LSTM. At the label side, the label item as positive and all other items in the vocabulary as negatives will be encoded to vectors as well. Context and label embeddings are combined with a dot product and fed to the loss of softmax cross entropy.

Inside the framework, we encapsulate tf.keras layers for ContextEncoder, LabelEncoder and DotProductSimilarity as key components in RecommendationModel.

To model each user activity, we could use the ID of the activity item (called ID-based), or multiple features of the item (called feature-based), or a combination of both. The feature-based model utilizing multiple features to collectively encode users’ behavior. With our framework, you could create either ID-based or feature-based models in a configurable way.

Similar to the last version, a TensorFlow Lite model will be exported after training which can directly provide top-K predictions among the recommendation candidates.

Step-by-step

To demonstrate the new adaptive framework, we trained a on-device movie recommendation model with MovieLens dataset using multiple features, and integrated it in the demo app. (Both the model and the app are for demonstration purposes only.) The MovieLens 1M dataset contains ratings from 6039 users across 3951 movies, with each user rating only a small subset of movies.

Let’s look at how to use the framework step-by-step in this notebook.

(a) Environment preparation

git clone https://github.com/tensorflow/examples
cd examples/lite/examples/recommendation/ml/
pip install -r requirements.txt

(b) Prepare training data

Please prepare your training data reference to the movielens example generation file. Would like to note that TensorFlow Lite input features are expected to be FixedLenFeature, please pad or truncate your features, and set up feature lengths in input configuration. Feel free to use the following command to process the example dataset.

python -m data.example_generation_movielens 
 --data_dir=data/raw 
 --output_dir=data/examples 
 --min_timeline_length=3 
 --max_context_length=10 
 --max_context_movie_genre_length=32 
 --min_rating=2 
 --train_data_fraction=0.9 
 --build_vocabs=True

MovieLens data contains ratings.dat (columns: UserID, MovieID, Rating, Timestamp), and movies.dat (columns: MovieID, Title, Genres). The example generation script takes both files, only keep ratings higher than 2, form user movie interaction timelines, sample activities as labels and previous user activities as the context for prediction. Please find the generated tf.Example:

0 : {
  features: {
    feature: {
      key  : "context_movie_id"
      value: { int64_list: { value: [ 1124, 2240, 3251, ..., 1268 ] } }
    }
    feature: {
      key  : "context_movie_rating"
      value: { float_list: {value: [ 3.0, 3.0, 4.0, ..., 3.0 ] } }
    }
    feature: {
      key  : "context_movie_year"
      value: { int64_list: { value: [ 1981, 1980, 1985, ..., 1990 ] } }
    }
    feature: {
      key  : "context_movie_id"
      value: { int64_list: { value: [ 1124, 2240, 3251, ..., 1268 ] } }
    }
    feature: {
      key  : "context_movie_genre"
      value: { bytes_list: { value: [ "Drama", "Drama", "Mystery", ..., "UNK" ] } }
    }
    feature: {
      key  : "label_movie_id"
      value: { int64_list: { value: [ 3252 ] }  }
    }
  }
}

(c) Create input config

Once data prepared, please set up input configuration, e.g. this is one example configuration for movielens movie recommendation model.

activity_feature_groups {
  features {
    feature_name: "context_movie_id"
    feature_type: INT
    vocab_size: 3953
    embedding_dim: 8
    feature_length: 10
  }
  features {
    feature_name: "context_movie_rating"
    feature_type: FLOAT
    feature_length: 10
  }
  encoder_type: CNN
}
activity_feature_groups {
  features {
    feature_name: "context_movie_genre"
    feature_type: STRING
    vocab_name: "movie_genre_vocab.txt"
    vocab_size: 19
    embedding_dim: 4
    feature_length: 32
  }
  encoder_type: CNN
}
label_feature {
  feature_name: "label_movie_id"
  feature_type: INT
  vocab_size: 3953
  embedding_dim: 8
  feature_length: 1
}

(d) Train model

The model trainer will construct the recommendation model based on the input config, with a simple interface.

python -m model.recommendation_model_launcher -- 
 --training_data_filepattern "data/examples/train_movielens_1m.tfrecord" 
 --testing_data_filepattern "data/examples/test_movielens_1m.tfrecord"
 --model_dir "model/model_dir" 
 --vocab_dir "data/examples" 
 --input_config_file "configs/sample_input_config.pbtxt" 
 --batch_size 32 
 --learning_rate 0.01 
 --steps_per_epoch 2 
 --num_epochs 2 
 --num_eval_steps 2 
 --run_mode "train_and_eval" 
 --gradient_clip_norm 1.0 
 --num_predictions 10 
 --hidden_layer_dims "32,32" 
 --eval_top_k "1,5" 
 --conv_num_filter_ratios "2,4" 
 --conv_kernel_size 4 
 --lstm_num_units 16

Inside the recommendation model, core components are packaged up to keras layers (context_encoder.py, label_encoder.py and dotproduct_similarity.py), each of which could be utilized by itself. The following diagram illustrates the code structure:

Figure 2: An example of model architecture using context information to predict the next movie. The inputs are the history of (a) movie IDs, (b) ratings and (c) genres, which is specified by the config mentioned above.

With the framework, you can directly execute the model training launcher with command:

python -m model.recommendation_model_launcher 
 --input_config_file "configs/sample_input_config.pbtxt" 
 --vocab_dir "data/examples" 
 --run_mode "export" 
 --checkpoint_path "model/model_dir/ckpt-1000" 
 --num_predictions 10 
 --hidden_layer_dims "32,32" 
 --conv_num_filter_ratios "2,4" 
 --conv_kernel_size 4 
 --lstm_num_units 16

The inference code after exporting to TensorFlow Lite can be found in the notebook, and we refer readers to check out the details there.

Framework Adaptivity

Our framework provides a protobuf interface, through which feature groups, types and other information can be configured to build models accordingly. With the interface, you can configure:

Features
The framework generically categorizes features into 3 types: integer, string, and float. Embedding spaces will be created for both integer and string features, hence, embedding dimension, vocabulary name and size need to be specified. Float feature values will be directly used. Besides, for on-device models, we suggest to use fixed length features which can be configured directly.

message Feature {
  optional string feature_name = 1;

  // Supported feature types: STRING, INT, FLOAT.
  optional FeatureType feature_type = 2;

  optional string vocab_name = 3;

  optional int64 vocab_size = 4;

  optional int64 embedding_dim = 5;

  optional int64 feature_length = 6;
}

Feature groups
One feature for one user activity may have multiple values. For instance, one movie can belong to multiple categories, each movie will have multiple genre feature values. To handle the different feature shapes, we introduced the “feature group” to combine features as a group . The features with the same length can be put in the same feature group to be encoded together. Inside input config, you can set up global feature groups and activity feature groups.

message FeatureGroup {
  repeated Feature features = 1;

  // Supported encoder types: BOW, CNN, LSTM.
  optional EncoderType encoder_type = 2;
}

Input config
You can use the input config interface to set up all the features and feature groups together.

message InputConfig {
  repeated FeatureGroup global_feature_groups = 1;

  repeated FeatureGroup activity_feature_groups = 2;

  optional Feature label_feature = 3;
}

The input config is utilized by both input_pipeline.py and recommendation_model.py to process training data to tf.data.Dataset and construct the model accordingly. Inside ContexEncoder, FeatureGroupEncoders will be created for all feature groups, and used to compute feature group embeddings from input features. Concatenated feature group embeddings will be fed through top hidden layers to get the final context embedding. Worth noting that the final context embedding and label embedding dimensions should be equal.

Please check out the different model graphs produced with the different input configurations in the Appendix section.

Experiments and Analysis

We take this opportunity to analyze the performance of ID-based and feature-based models with various configurations, and provide some empirical results.

For the ID-based model, only movie_id is used as the input feature. And for the feature-based model, both movie_id and movie_genre features are used. Both types of models are experimented with 3 encoder types (BOW/CNN/LSTM) and 3 context history lengths (10/50/100).

Comparison between ID-based and Feature-based models. We compare them on BOW/CNN/LSTM encoders and context history lengths 10/50/100.

Since MovieLens dataset is an experimental dataset with ~4000 candidate movies and 19 movie genres, hence we scaled down embedding dimensions in the experiments to simulate the production scenario. For the above experiment result chart, ID embedding dimension is set to 8, and movie genre embedding dimension is set to 4. If we take the context10_cnn as an example, the feature-based model outperforms the ID-based model by 58.6%. Furthermore, the on average results show that feature-based models outperforms by 48.35%. Therefore, In this case, the feature-based model outperforms the ID-based model, because movie_genre feature introduces additional information to the model.

Besides, underlying features of candidate items mostly have a smaller vocabulary size, hence smaller embedding spaces as well. For instance,the movie genre vocabulary is much smaller than the movie ID vocabulary. In this case, utilizing underlying features could reduce the memory size of the model, which is more on-device friendly.

Acknowledgement

Special thanks to Cong Li, Josh Gordon, Khanh LeViet‎, Arun Venkatesan and Lawrence Chan for providing valuable suggestions to this work.

Reconstructing thousands of particles in one go at the CERN LHC with TensorFlow

April 22, 2021

by TensorFlow Blog Tensorflow

A guest post by Jan Kieseler from CERN, EP/CMG

Introduction

At large colliders such as the CERN LHC (Large Hadron Collider) high energetic particle beams collide and thereby create massive and possibly yet unknown particles from the collision energy following the well known equation E=mc². Most of these newly created particles are not stable and decay to more stable particles almost immediately. Detecting these decay products and measuring their properties precisely is the key to understanding what happened during the high energy collision, and will possibly shed light on big questions such as the origin of dark matter.

Detecting and measuring particles

For this purpose, the collision interaction points are surrounded by large detectors covering as much as possible in all possible directions and energies of the decay products. These detectors are further split into sub-detectors, each collecting complementary information. The innermost detector, closest to the interaction point, is the tracker consisting of multiple layers. Similar to a camera, each layer can detect the spatial position at which a charged particle passed through it, providing access to its trajectory. Combined with a strong magnetic field, this trajectory gives access to the particle charge and the particle momentum.

While the tracker is aimed at measuring the trajectories, only, while minimising any further interaction with and scattering of the particles, the next sub-detector layer is aimed at stopping them entirely. By stopping the particles completely, these calorimeters can extract the initial particle energy, and can also detect neutral particles. The only particles that pass through these calorimeters are muons, which are identified by additional muon chambers that constitute the outermost detector shell and use the same detection principles as the tracker.

Layout of the CMS detector, showing different particle species interacting with the sub-detectors (Image credit: CERN).

Combining the information from all these sub-detectors to reconstruct the final particles is a challenging task, not only because we want to achieve the best physics performance, but also in terms of computing resources and person-power available to develop and tune the reconstruction algorithms. In particular for the High-Luminosity LHC, the extension of the CERN LHC, aiming to collect unprecedented amounts of data, these algorithms need to perform well given a collision rate of 40MHz and up to 200 simultaneous interactions in each collision, which result in up to about a million signals from all sub-detectors.

Even with triggers, a fast, step-wise filtering of interesting events, in place, the total data collected to disk still comprises several petabytes, making efficient algorithms a must at all stages.

Classic reconstruction algorithms in high energy physics heavily rely on factorisation of individual steps and a lot of domain (physics) knowledge. While they perform reasonably well, the idealistic assumptions that are needed to develop these algorithms limit the performance, such that machine-learning approaches are often used to refine the classically reconstructed particles, and make their way into more and more reconstruction chains. The machine learning approaches benefit from a very precise simulation of all detector components and physics processes, valid over several orders of magnitude, such that large sets of labelled data can be produced very easily in a short amount of time. This led to a rise of neural network based identification and regression algorithms, and to the inclusion of TensorFlow as the standard inference engine in the software framework of the Compact Muon Solenoid (CMS) experiment.

Machine-learning reconstruction approaches also come with the advantage that by construction they have to be automatically optimizable and need a loss function to train that quantifies the final reconstruction target. In contrast, classic approaches are often optimised without the need to define such an inclusive quantitative metric, and parameters are tuned by hand, involving many experts, and taking a lot of person-power that could be spent on developing new algorithms instead of tuning existing ones. Therefore, moving to differentiable machine-learning algorithms such as deep neural networks in general can also help use the human resources more efficiently.

However, extending machine-learning based algorithms to the first step of reconstructing the particles from hits – instead of just refining already reconstructed particles – comes with two main challenges: the data structure and phrasing reconstruction as a minimisation problem.

The detector data is highly irregular, due to the inclusion of multiple sub-detectors, each with its own geometry. But even within a sub-detector, such as the tracker, the geometry is designed based on physics aspects with a fine resolution close to the interaction point and more coarse further away. Furthermore, the individual tracker layers are not densely packed, but have a considerable amount of space between them, and in each event only a small fraction of sensors are actually active, changing the number of inputs from event to event. Therefore, neural networks that require a regular grid, such as convolutional neural network architectures are – despite their good performance and highly optimised implementations – not applicable.

Graph neural networks can bridge this gap and, in principle, allow abstracting from the detector geometry. Recently, several graph neural network proposals from computer science have been studied in the context of refining already reconstructed particles in high energy physics. However, given the high input dimensionality of the data many of these proposals cannot be employed for reconstructing particles directly from hits, and custom solutions are needed. One example is GravNet that – by construction – reduces the resource requirements significantly while maintaining good physics performance by using sparse dynamic adjacency matrices and performing most operations without memory overhead.

This in particular becomes possible through TensorFlow which makes it easy to implement and load custom kernels into the graph and integrate custom analytic gradients for fused operations. Only the combination of these custom kernels and the network structure allows loading a full physics event into the GPU memory, training the network on it, and performing the inference.

GravNet layer architecture (from left to right): point features are projected into a feature space F_LR, and a low dimensional coordinate space S; k nearest neighbours are determined in S; mean and maximum of distance weighted neighbour features are accumulated; accumulated features are combined with original features.

Since many of the reconstruction tasks, even the refinement of already reconstructed particles, rely on an unknown number of inputs, the recent addition and support of ragged data structures in TensorFlow in principle opens up a lot of new possibilities. While the integration is not sufficient to build full neural network architectures, yet, a future full support of ragged data structures would be a significant step forward for integrating TensorFlow even deeper into the reconstruction algorithms and would make some custom kernels obsolete.

The second challenge when reconstructing particles directly from hits using deep neural networks is to train the network to predict an unknown number of particles from an unknown number of inputs. There is a plethora of algorithms and training methods for detecting dense objects in dense data, such as images, but while the requirement of the dense data can be loosened in some cases, most of these algorithms still rely on the objects being dense or having a clear boundary, making it possible to exploit anchor boxes or central points of the object. Particles in the detector however, often overlap to a large degree, and their sparsity does not allow defining central points nor clear boundaries. A solution to this problem is Object Condensation, where object properties are condensed in at least one representative condensation point per object that can be chosen freely by the network through a high confidence score.

To resolve ambiguities, the other points are clustered around the object they belong to using potential functions (illustrated above). However, these potentials scale with the confidence score in a tunable manner, such that the amount of segmentation the network should perform is adjustable up to the point where all points except the condensation points can be left free floating in case we are only interested in the final object properties.

Some parts of this algorithm are very similar to the method proposed in a previous paper, but the goal is entirely different. While the previous approach constitutes a very powerful segmentation algorithm moving pixels to cluster objects in an image towards a central point, the condensation points here directly carry object properties, and through the choice of the potential functions the clustering space can be completely detached from the input space. The latter has big implications for the applicability to the sparse detector data with overlapping particles, but also does not distinguish conceptually between “stuff” and “things”, providing a new perspective on one-shot panoptic segmentation.

But coming back to particle reconstruction, as shown in the corresponding paper, Object Condensation can outperform classic particle reconstruction algorithms even on simplified problems that are quite close to the idealistic assumptions of the classic algorithm. Therefore, it provides an alternative to classic reconstruction approaches directly from hits.

Based on this promising study, there is work ongoing to extend the approach to simulated events in the High Granularity Calorimeter, a planned new sub-detector of the CMS experiment at CERN, with about 2 million sensors, covering the particularly challenging forward region close to the incident beams, where most particles are produced. Compared to the published proof-of-concept, this realistic environment is much more challenging and requires the optimisation of the network structures and even more custom TensorFlow operations that can be found in the developing repository on github, using DeepJetCore as an interface to data formats commonly used in high-energy physics. Right now, there is a particular focus on implementing fast k-nearest-neighbour algorithms, a crucial building block for GravNet, that can handle the large input dimensionality, but also ragged implementations of other operations as well as implementations of the Object Condensation loss can be found there.

Conclusion

To conclude, the application of deep neural networks to reconstruction tasks is exhibiting a shift from refining classically reconstructed particles to reconstructing the particles and their properties directly, in an optimizable and highly parallelizable way to meet the person-power and computing challenges in the future. This development will give rise to more custom implementations, and will be using more of the bleeding edge features in TensorFlow and tf.keras such as ragged data structures, so that a closer contact between high-energy physics reconstruction developers and TensorFlow developers is foreseeable.

Acknowledgements

I would like to acknowledge the support of Thiru Palanisamy and Josh Gordon at Google for their help with the blog post collaboration and with providing active feedback.

How-to Write a Python Fuzzer for TensorFlow

April 1, 2021

by TensorFlow Blog Tensorflow

Posted by Laura Pak

Fuzz testing is a process of testing APIs with generated data. Fuzzing ensures that code will not break on the negative path, generating randomized inputs that try to cover every branch of code. A popular choice is to pair fuzzers with sanitizers, which are tools that check for illegal conditions and thus flag the bugs triggered by the fuzzers’ inputs.

In this way, fuzzing can find:

Buffer overflows
Memory leaks
Deadlocks
Infinite recursion
Round-trip consistency failures
Uncaught exceptions
And more.

The best way to fuzz to have your fuzz tests running continuously. The more a test runs, the more inputs can be generated and tested against. In this article, you’ll learn how to add a Python fuzzer to TensorFlow.

The technical how-to

TensorFlow Python fuzzers run via OSS-Fuzz, the continuous fuzzing service for open source projects.

For Python fuzzers, OSS-Fuzz uses Atheris, a coverage-guided Python fuzzing engine. Atheris is based on the fuzzing engine libFuzzer, and it can be used with the dynamic memory error detector Address Sanitizer or the fast undefined behavior detector, Undefined Behavior Sanitizer. Atheris dependencies will be pre-installed on OSS-Fuzz base Docker images.

Here is a barebones example of a Python fuzzer for TF. The runtime will call TestCode with different random data.

import sys
import atheris_no_libfuzzer as atheris

def TestCode(data):
  DoSomethingWith(data)

def main():
  atheris.Setup(sys.argv, TestCode, enable_python_coverage=True)
  atheris.Fuzz()

In the tensorflow repo, in the directory with the other fuzzers, add your own Python fuzzer like above. In TestCode, pick a TensorFlow API that you want to fuzz. In constant_fuzz.py, that API is tf.constant. That fuzzer simply passes data to the chosen API to see if it breaks. No need for code that catches the breakage; OSS-Fuzz will detect and report the bug.

Sometimes an API needs more structured data than just one input. TensorFlow has a Python class called FuzzingHelper that allows you to generate random int lists, a random bool, etc. See an example of its use in sparseCountSparseOutput_fuzz.py, a fuzzer that checks for uncaught exceptions in the API tf.raw_ops.SparseCountSparseOutput.

To build and run, your fuzzer needs a fuzzing target of type tf_py_fuzz_target, defined in tf_fuzzing.bzl. Here is an example fuzz target, with more examples here.

tf_py_fuzz_target(
    name = "fuzz_target_name",
    srcs = ["your_fuzzer.py"],
    tags = ["notap"],  # Important: include to run in OSS.
)

Testing your fuzzer with Docker

Make sure that your fuzzer builds in OSS-Fuzz with Docker.

First install Docker. In your terminal, run command docker image prune to remove any dangling images.

Clone oss-fuzz from Github. The project for a Python TF fuzzer, tensorflow-py, contains a build.sh file to be executed in the Docker container defined in the Dockerfile. Build.sh defines how to build binaries for fuzz targets in tensorflow-py. Specifically, it builds all the Python fuzzers found in $SRC/tensorflow/tensorflow, including your new fuzzer!

Inside oss-fuzz, run the following commands:

python infra/helper.py shell tensorflow
export FUZZING_LANGUAGE=python
compile

The command compile will run build.sh, which will attempt to build your new fuzzer.

The results

Once your fuzzer is up and running, you can search this dashboard for your fuzzer to see what vulnerabilities your fuzzer has uncovered.

Conclusion

Fuzzing is an exciting way to test software from the unhappy path. Whether you want to dabble in security or gain a deeper understanding of TensorFlow’s internals, we hope this post gives you a good place to start.

TensorFlow Quantum turns one year old

March 18, 2021

by TensorFlow Blog Tensorflow

Posted by Michael Broughton, Alan Ho, Masoud Mohseni

Last year we announced TensorFlow Quantum (TFQ) at the 2020 TensorFlow developer summit and on the Google AI Blog. Bringing all of the tools and features that TensorFlow has to offer to the world of quantum computing has led to some great research success stories. In this post, we would like to look back on what’s happened in the last year involving TensorFlow Quantum and how far it’s come. We also discuss the future of quantum computing and machine learning in TensorFlow Quantum.

Since the release of TensorFlow Quantum, we’ve been happy to see increasing use of the library in the academic world as well as inside Alphabet, in particular the Quantum AI team at Google. There have been many research articles published in the last year that made use of TensorFlow Quantum in quantum machine learning or hybrid quantum-classical models, including discriminative models and generative models. With the cross pollination of ideas between the two fields, we are also seeing advanced learning algorithms from classical machine learning being reimagined such as quantum reinforcement learning, layerwise, and neural architecture search. We leverage the scalability and tooling of TensorFlow to run numerical experiments with large numbers of qubits and gates to more faithfully discover algorithms that will be practical in the future.

Here are a few papers published using TFQ if you’d like to check them out:

Power of data in quantum machine learning (Google, Discrimitivate, 30 qubits)
Absence of Barren Plateaus in Quantum Convolutional Neural Networks (LANL, Discriminative, 26 qubits)
Quantum Hamiltonian-Based Models and the Variational Quantum Thermalizer Algorithm (Alphabet X, Generative, 16 qubits)
Entanglement Diagnostics for Efficient Quantum Computation (Princeton / Tel-Aviv U., Generative, 20 qubits)
Layerwise learning for quantum neural networks (VW / Google, Discriminative, 18 qubits)
Differentiable Quantum Architecture Search (Tencent, Neural Architecture Search, 8 qubits)

In our recent publication to quantify the computational advantage of quantum machine learning, experiments were conducted at PetaFLOP/s throughput scales, which is nothing new for classical machine learning, but represents a huge leap forward in the scale seen in quantum machine learning experiments before TensorFlow Quantum came along. We are very excited for the future that quantum computing and machine learning have together and we are happy to see TensorFlow Quantum having such a positive impact already.

The academic world isn’t the only place machine learning and quantum computing have been able to come together. Over the past year members of the TensorFlow Quantum team helped out in supporting the artistic works of Refik Anadol Studios’ “Quantum memories” piece. This combines the random circuit sample data from the 2019 beyond classical experiment and adoptions of StyleGAN to create some truly magnificent works of art

Quantum memories installation at the NGV (image used with permission).

Next steps

We will soon be releasing TensorFlow Quantum 0.5.0, with more support for distributed workloads as well as lots of new quantum centric features and some small performance boosts. Looking forward, we hope that these features will enable our users to continue to push the boundaries of complexity and scale in quantum computing and machine learning and eventually help lead to groundbreaking quantum computing experiments (not just simulations). Our ultimate goal when we released TensorFlow Quantum was to have it aid in the search for quantum advantage in the field of machine learning. In time, it is our hope to see the world reach that goal, with the help of the continued hard work and dedication of the QML research community. Quantum machine learning is still a very young field and there’s still a long way to go before this happens, but over the past year we’ve seen the community make amazing strides in many different areas and we can’t wait to see what you will accomplish in the years to come.

TensorFlow for Mobile and Web

TFX

Responsible AI

Decision forests in Keras

TensorFlow Lite for Microcontrollers

Google Cloud

Community

Find all the talks here

Installation

Try it yourself!

Model deep dive

BlazePose Browser Performance

Looking ahead

Acknowledgements

Data Preparation & Ingestion

Model Training

Creating a Web Deployable Model

Angular Application

Try it yourself

Acknowledgements

Unlocking Live Health Applications with MoveNet

Installation

Try it yourself!

MoveNet Deep Dive

MoveNet Architecture

Training Datasets

Optimization

Temporal Filtering

MoveNet Browser Performance

Looking Ahead

Acknowledgements

What you need to get started

The Software

The Hardware

Building Your Data Set

Streaming sensor data from the Nano 33 over BLE

Setting up the Data Capture Lab Project

Connecting to the Gateway

Capturing Boxing Gesture Sensor Data

Annotating Sensor Data

Segmenting in the Data Capture Lab

Labeling Events in the Data Capture Lab

Building a TinyML Model

Offline Model Validation

Deploy and Test on the Nano 33 BLE Sense

Downloading the model as firmware

Flashing the Firmware

Viewing Classification Results

Conclusion

Creating a simple model

Step 1 – Set up libraries and imports

Step 2 – Create the example data

Step 3 – Create a simple model, compile, and fit it

Step 4 – Convert the input to tf.example

Step 5 – Save the model

Pattern 1: RunInference for Predictions

Step 1 – Use RunInference within the pipeline

Pattern 2: Post-processing RunInference results

Pattern 3: Attaching a key

Step 1 – Create a source with attached key

Step 2 – Modify post processor and pipeline

Pattern 4: Inference with multiple models in the same pipeline

Inference with multiple models in parallel

Inference with multiple models in sequence

Running the pipeline on Dataflow

Conclusion

Acknowledgements

Introduction

Detecting and measuring particles

Conclusion

Acknowledgements

The technical how-to

Testing your fuzzer with Docker

The results

Conclusion

Next steps

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.