Galaxy Zoo: Classifying Galaxies with Crowdsourcing and Active Learning

A guest article by Mike Walmsley, University of Oxford

The way we do science is changing; there’s exponentially more data every day but around the same number of scientists. The traditional approach of collecting data samples, looking through them, and drawing some conclusions about each one is often inadequate.

One solution is to deploy algorithms to process the data automatically. Another solution is to deploy more eyeballs: recruit members of the public to join in and help. I work on the intersection between the two – combining crowdsourcing and machine learning to do better science than with either alone.

In this article, I want to share how I’ve been using crowdsourcing and machine learning to investigate how galaxies evolve by classifying millions of galaxy images. Along the way, I’ll share some techniques we use to train CNNs that make predictions with uncertainty. I’ll also explain how to use those predictions to do active learning: labelling only the data which would best help you improve your models.

Better Telescopes, Bigger Problems

Ever since Edwin Hubble's observations in the 1920s, astronomers have looked up at galaxies and tried to classify them into different types – smooth galaxies, spiral galaxies, and so on. But the number of galaxies kept on climbing. About 20 years ago, a grad student named Kevin Schawinski sat at his desk with a pile of 900,000 galaxy pictures, put his head in his hands and thought – "there has to be a better way" than classifying every one himself. He wasn't alone. To classify all 900,000 galaxies without sacrificing Kevin's sanity, a team of scientists (including Kevin) built Galaxy Zoo.

Galaxy Zoo is a website that asks members of the public to classify galaxies for us. We show you a galaxy, and we ask simple questions about what you can see, like – is the galaxy smooth, or featured? As you answer, we lead you down a decision tree where the questions depend on how you’ve previously responded.

The Galaxy Zoo UI. Check it out, and join in with the science, here.

Since launch, hundreds of thousands of volunteers have classified millions of galaxies – advancing our understanding of supermassive black holes, spiral arms, the births and deaths of stars, and much more. However, there’s a problem: humans don’t scale. Galaxy surveys keep getting bigger, but we will always have about the same number of volunteers. The latest space-based telescopes can image hundreds of millions of galaxies – far more than we could ever label with crowdsourcing alone.

To keep up, we used TensorFlow to build a galaxy classifier. Other researchers have used the responses we’ve collected to train convolutional neural networks (CNNs) – a type of deep learning model tailored for image recognition. However, traditional CNNs have a drawback; they don’t easily handle uncertainty.

Training a CNN to solve a regression problem by predicting a value for each label and minimising the mean squared error, as is common, implicitly assumes that all labels are equally uncertain – which is definitely not the case for Galaxy Zoo. Further, the CNN only gives a ‘best guess’ answer with no error bars – making it difficult to draw scientific conclusions.

In our paper, we use Bayesian CNNs for morphology classification. Bayesian CNNs provide two key improvements:

  1. They account for varying uncertainty when learning from volunteer responses.
  2. They predict full posteriors over the morphology of each galaxy.

Using our Bayesian CNN, we can learn from noisy labels and make reliable predictions (with error bars) for hundreds of millions of galaxies.

How Bayesian Convolutional Neural Networks Work

There are two key steps to creating our Bayesian CNNs.
1. Predict the parameters of a probability distribution, not the label itself
Training neural networks is much like any other fitting problem: you tweak the model to match the observations. If you are equally confident in all your collected labels, you can just minimise the difference (e.g. mean squared error) between your predictions and the observed values. However, for Galaxy Zoo, we are far more confident in some labels than in others.
If I observe that, for some galaxy, 30% of volunteers say “bar”, my confidence in that 30% depends heavily on how many people replied – was it 4 or 40? Instead, we predict the probability that a typical volunteer will say “Bar”, and minimise how surprised we should be given the total number of volunteers who replied.
This way, our model understands that errors on galaxies where many volunteers replied are worse than errors on galaxies where few volunteers replied – letting it learn from every galaxy.
In our case, we can model our surprise with the Binomial distribution by recognising that k “Bar” responses from N volunteers is much like k successes from N independent trials.

loss = tf.reduce_mean(binomial_loss(labels, scalar_predictions))

Here, `binomial_loss` calculates the surprise (negative log likelihood) of the observed labels given our model predictions. In TensorFlow, we can calculate this with:

def binomial_loss(observations, est_prob_success):
    one = tf.constant(1., dtype=tf.float32)
    # small constant to avoid calculating log(0)
    epsilon = tf.keras.backend.epsilon()

    # multiplication in tf requires floats
    k_successes = tf.cast(observations[:, 0], tf.float32)
    n_trials = tf.cast(observations[:, 1], tf.float32)

    # binomial negative log likelihood, dropping (fixed) combinatorial terms
    return -(k_successes * tf.math.log(est_prob_success + epsilon)
             + (n_trials - k_successes) * tf.math.log(one - est_prob_success + epsilon))

2. Use Dropout to Pretend to Train Many Networks
Our model now makes probabilistic predictions, but what if we had trained a different model? It would make slightly different probabilistic predictions. To be Bayesian, we need to marginalise over the possible models we might have trained. To do this, we use dropout.
At train time, dropout reduces overfitting by “approximately combining exponentially many different neural network architectures efficiently” (Srivastava 2014). This approximates the Bayesian approach of treating the network weights as random variables to be marginalised over. By also applying dropout at test time, we can exploit this idea of approximating many models to also make Bayesian predictions (Gal 2016).
Here’s a TF 2.0 example using the Subclassing API:

from tensorflow.keras import layers, Model

class SimpleClassifier(Model):

    def __init__(self):
        super(SimpleClassifier, self).__init__()
        self.conv1 = layers.Conv2D(32, 3, activation='relu')
        self.flatten = layers.Flatten()
        self.d1 = layers.Dense(128, activation='relu')
        self.dropout1 = layers.Dropout(rate=0.5)
        self.d2 = layers.Dense(2, activation='softmax')

    def call(self, x, training):
        x = self.conv1(x)
        x = self.flatten(x)
        x = self.d1(x)
        if training:  # dropout typically applied only at train time
            x = self.dropout1(x)
        return self.d2(x)

Switching on test-time dropout actually involves less code:

    def call(self, x):  # no 'training' argument required
        x = self.conv1(x)
        x = self.flatten(x)
        x = self.d1(x)
        x = self.dropout1(x)  # dropout always on
        return self.d2(x)

Below, you can see our Bayesian CNN in action. Each row is a galaxy (shown to the left). In the central column, our CNN makes a single probabilistic prediction (the probability that a typical volunteer would answer “Bar”). We can interpret that as a posterior for the probability that k of N volunteers would say “Bar” – shown in black. On the right, we marginalise over many CNNs using dropout. Each CNN posterior (grey) is different, but we can marginalise over them to get the posterior over many CNNs (green) – our Bayesian posterior.

Left: input images of galaxies, with or without a bar. Center: single probabilistic predictions (i.e. without dropout) for how many volunteers would say “Bar”. Right: many probabilistic predictions made with different dropout masks (grey), marginalised into our approximate Bayesian posterior (green).

The Bayesian posterior does an excellent job of quantifying whether each galaxy has a bar. Read more about it in the paper (and check out the code).
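To make that marginalisation concrete, here is a minimal sketch of how you might approximate the Bayesian posterior with Monte Carlo dropout. The `model` is assumed to be the test-time-dropout classifier above, returning the predicted probability that a volunteer says "Bar" for each image; the choice of 30 forward passes (and 40 volunteers) is arbitrary.

import numpy as np
from scipy.stats import binom

def bayesian_posterior(model, images, n_volunteers=40, n_samples=30):
    """Approximate the posterior over volunteer votes by marginalising over dropout masks."""
    votes = np.arange(n_volunteers + 1)
    # Each forward pass uses a fresh dropout mask, i.e. approximately a different CNN
    rho = np.stack([np.squeeze(model(images).numpy()) for _ in range(n_samples)])  # (n_samples, n_images)
    # Binomial posterior over k "Bar" responses for each sampled CNN (the grey curves)
    per_cnn = binom.pmf(votes[None, None, :], n_volunteers, rho[..., None])
    # Marginalise over the sampled CNNs (the green curve): our approximate Bayesian posterior
    return per_cnn.mean(axis=0)  # shape (n_images, n_volunteers + 1)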

Active Learning

Modern surveys will image hundreds of millions of galaxies – more than we can show to volunteers. Given that, which galaxies should we classify with volunteers, and which by our Bayesian CNN?
Ideally, we would only show volunteers the images that the model would find most informative. The model should be able to ask, "Hey, these galaxies would be really helpful to learn from; can you label them for me please?" Then the humans would label them and the model would retrain – this is active learning. In our experiments, applying active learning reduces the number of galaxies needed to reach a given performance level by 35-60%.
We can use our posteriors to work out which galaxies are most informative. Remember that we use dropout to approximate training many models (see above). We show in the paper that informative galaxies are galaxies where those models confidently disagree.
Why? We often hold our strongest opinions where we are least informed – and so do our CNNs (Hendrycks 2016). Without a basis in evidence, different CNNs will often disagree confidently.

Formally, informative galaxies are galaxies where each model is confident (the entropy H of each model's posterior, p(votes|weights), is low) but the average prediction over all the models is uncertain (the entropy of the posterior averaged over all models is high). This is only possible because we think about labels probabilistically and approximate training many models. For more, see Houlsby, N. (2014) and Gal 2017, or our code for an implementation.
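As a rough illustration (not the paper's exact implementation), here is how that acquisition score can be computed from the same Monte Carlo dropout samples as above, treating each question as a binary "Bar"/"not Bar" choice:

import numpy as np

def mutual_information(rho, eps=1e-12):
    """Score galaxies by how confidently the dropout-sampled CNNs disagree.

    rho: array of shape (n_samples, n_images), each row being one sampled CNN's
    predicted probability that a volunteer says "Bar".
    """
    def binary_entropy(p):
        return -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))

    # Entropy of the prediction averaged over all sampled CNNs (high when they disagree)
    predictive_entropy = binary_entropy(rho.mean(axis=0))
    # Average entropy of each individual CNN's prediction (low when each is confident)
    expected_entropy = binary_entropy(rho).mean(axis=0)
    return predictive_entropy - expected_entropy

# e.g. send the highest-scoring galaxies to volunteers, retrain, and repeat:
# priority = np.argsort(mutual_information(rho))[::-1]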
What galaxies are informative? Exactly the galaxies you would intuitively expect.

  • The model strongly prefers diverse featured galaxies over ellipticals (smooth ‘blobs’).
  • For identifying bars, the model prefers galaxies which are better resolved (lower redshift).

This selection is completely automatic. Indeed, I didn’t realise the lower redshift preference until I looked at the images!

Our active learning system selects galaxies on the left (featured and diverse) over those on the right (smooth ‘blobs’).

Active learning is picking galaxies to label right now on Galaxy Zoo – check it out here by selecting the ‘Enhanced’ workflow. I’m excited to see what science can be done as we move from classifying hundreds of thousands of galaxies to hundreds of millions.
If you’d like to know more or you have any questions, get in touch in the comments or on Twitter (@mike_w_ai, @chrislintott, @yaringal, @OATML_Oxford).
Cheers,
Mike
*Dropout is an imperfect approximation of a fully Bayesian approach: it is feasible for large vision models, but it may underestimate uncertainty. It's possible to make better approximations, especially for small models. Check out this post by the TensorFlow Probability team showing how to do this for one-dimensional regression.

BigTransfer (BiT): State-of-the-art transfer learning for computer vision

Posted by Jessica Yung and Joan Puigcerver

In this article, we’ll walk you through using BigTransfer (BiT), a set of pre-trained image models that can be transferred to obtain excellent performance on new datasets, even with only a few examples per class.

ImageNet-pretrained ResNet50s are a current industry standard for extracting representations of images. With our BigTransfer (BiT) paper, we share models that perform significantly better across many tasks, and transfer well even when using only a few images per dataset.

You can find BiT models pre-trained on ImageNet and ImageNet-21k on TensorFlow Hub as TensorFlow 2 SavedModels that you can easily use as Keras layers. There are a variety of sizes, ranging from a standard ResNet50 to a ResNet152x4 (152 layers deep, 4x wider than a typical ResNet50), for users with larger computational and memory budgets but higher accuracy requirements.

Figure 1: The x-axis shows the number of images used per class, ranging from 1 to the full dataset. On the plots on the left, the curve in blue above is our BiT-L model, whereas the curve below is a ResNet-50 pre-trained on ImageNet (ILSVRC-2012).

In this tutorial, we show how to load one of our BiT models and either (1) use it out-of-the-box or (2) fine-tune it to your target task for higher accuracy. Specifically, we demonstrate using a ResNet50 trained on ImageNet-21k.

What is Big Transfer (BiT)?

Before we get into the details of how to use the models, how did we train models that transfer well to many tasks?

Upstream training

The essence is in the name – we effectively train large architectures on large datasets. Before our paper, few papers had seen significant benefits from training on larger public datasets such as ImageNet-21k (14M images, 10x larger than the commonly-used ImageNet). The components we distilled for training models that transfer well are:

Big datasets
The best performance across our models increases as the dataset size increases.

Big architectures
We show that in order to make the most out of big datasets, one needs large enough architectures. For example, training a ResNet50 on JFT (which has 300M images) does not always improve performance relative to training the ResNet50 on ImageNet-21k (14.8M images), but we consistently see improvements when training larger models like a ResNet152x4 on JFT as opposed to ImageNet-21k (Figure 2 below).

Figure 2: The effect of larger upstream datasets (x-axis) and model size (bubble size/colour) on performance on downstream tasks (ILSVRC, Pets, and Flowers). Using larger datasets or larger models alone may hurt performance – both need to be increased in tandem.

Long pre-training time
We also show that it’s important to train for long enough when pre-training on larger datasets. It’s standard to train on ImageNet for 90 epochs, but if we train on a larger dataset such as ImageNet-21k for the same number of steps (and then fine-tune on ImageNet), the performance is worse than if we’d trained on ImageNet directly.

GroupNorm and Weight Standardisation
Finally, we use GroupNorm combined with Weight Standardisation instead of BatchNorm. Since our models are large, we can only fit a few images on each accelerator (e.g. GPU or TPU chip). However, BatchNorm performs worse when the number of images on each accelerator is too low. GroupNorm does not have this problem, but on its own it has not been shown to scale well to large total batch sizes. When we combine GroupNorm with Weight Standardisation, however, we see that it scales well to large batch sizes, even outperforming BatchNorm.

Downstream fine-tuning

Moreover, downstream fine-tuning is cheap in terms of data efficiency and compute – our models attain good performance with only a few examples per class on natural images. We also designed a hyperparameter configuration which we call ‘BiT-HyperRule’ that performs fairly well on many tasks without the need to do an expensive hyperparameter sweep.

BiT-HyperRule: our hyperparameter heuristic
As alluded to above, this is not a hyperparameter sweep – given a dataset, it specifies one set of hyperparameters that we’ve seen produce good results. You can often obtain better results by running a more expensive hyperparameter sweep, but BiT-HyperRule is an effective way of getting good initial results on your dataset.

In BiT-HyperRule, we use SGD with an initial learning rate of 0.003, momentum 0.9, and batch size 512. During fine-tuning, we decay the learning rate by a factor of 10 at 30%, 60% and 90% of the training steps.

As data preprocessing, we resize the image, take a random crop, and then do a random horizontal flip (details in Table 1). We do random crops and horizontal flips for all tasks except those where such actions destroy label semantics. For example, we don’t apply random crops to counting tasks, or random horizontal flips to tasks where we’re meant to predict the orientation of an object (Figure 3).

Table 1: Downstream resizing and random cropping details. If images are larger, we resize them to a larger fixed size to benefit from fine-tuning on higher resolution.
Figure 3: CLEVR count example: Here the task is to count the number of small cylinders or red objects in the image. We would not apply a random crop since that may crop out objects we would like to count, but we apply a random horizontal flip since that doesn't change the number of objects we care about in the image (and thus does not change the label). Image attribution: CLEVR count example by Johnson et al.
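As a rough sketch of that preprocessing for a single training example, something like the following would do; the resize and crop sizes below are placeholders rather than the values from Table 1, and the [0, 1] scaling matches what BiT expects.

import tensorflow as tf

def preprocess_train(image, label):
    image = tf.image.convert_image_dtype(image, tf.float32)  # scale pixel values to [0, 1]
    image = tf.image.resize(image, [512, 512])                # placeholder size; see Table 1
    image = tf.image.random_crop(image, [480, 480, 3])        # placeholder crop size
    image = tf.image.random_flip_left_right(image)            # skip for orientation-sensitive tasks
    return image, label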

We determine the schedule length and whether or not to use MixUp (Zhang et al., 2018, illustrated in Figure 4) according to the dataset size (Table 2).

Figure 4: MixUp takes pairs of examples and linearly combines the images and labels. These images are taken from the dataset tf_flowers.
Table 2: Details on downstream schedule length and when we use MixUp.
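To make the MixUp step concrete, here is a minimal sketch; the Beta(0.1, 0.1) mixing distribution is our assumption for illustration, and the labels are assumed to be one-hot encoded so they can be mixed linearly.

import numpy as np
import tensorflow as tf

def mixup(images, one_hot_labels, alpha=0.1):
    """Linearly combine random pairs of examples and their one-hot labels."""
    # assumes `images` is a float tensor (e.g. already scaled to [0, 1])
    lam = np.random.beta(alpha, alpha)  # mixing coefficient for this batch
    indices = tf.random.shuffle(tf.range(tf.shape(images)[0]))
    mixed_images = lam * images + (1.0 - lam) * tf.gather(images, indices)
    mixed_labels = lam * one_hot_labels + (1.0 - lam) * tf.gather(one_hot_labels, indices)
    return mixed_images, mixed_labels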

We determined these hyperparameter heuristics based on empirical results. We explain our method and describe our results in more detail in our paper and in our Google AI blog post.

Tutorial

Now let’s actually fine-tune one of these models! You can follow along by running the code in this colab.

1) Load the pre-trained BiT model

You can download one of our BiT models pre-trained on ImageNet-21k from TensorFlow Hub. The models are saved as SavedModels. Loading them is very simple:

import tensorflow_hub as hub
# Load model from TFHub into KerasLayer
model_url = "https://tfhub.dev/google/bit/m-r50x1/1"
module = hub.KerasLayer(model_url)

2) Use BiT out-of-the-box

If you don’t yet have labels for your images (or just want to have some fun), you may be interested in using the model out-of-the-box, i.e. without fine-tuning it. For this, we will use a model fine-tuned on ImageNet so it has the interpretable ImageNet label space of 1k classes. Many common objects are not covered, but it gives a reasonable idea of what is in the image.

# use the model (imagenet_module is a separate BiT model fine-tuned on ImageNet, loaded in the colab)
logits = imagenet_module(image)

Note that BiT models take inputs with values between 0 and 1.
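For example, preparing a single image before calling the module might look like the sketch below; the file name and the 224x224 resize are placeholder choices of ours, and `imagenet_module` is the ImageNet-fine-tuned BiT model loaded in the colab.

import tensorflow as tf

raw = tf.io.read_file('my_image.jpg')                    # placeholder file name
image = tf.image.decode_jpeg(raw, channels=3)
image = tf.image.convert_image_dtype(image, tf.float32)  # scales pixel values to [0, 1]
image = tf.image.resize(image, [224, 224])               # resize choice is ours, not required by BiT
image = tf.expand_dims(image, axis=0)                    # add a batch dimension

logits = imagenet_module(image)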

In the colab, you can load an image from an URL and see what the model predicts:

> show_preds(preds, image[0])
Image from PikRepo

Here the model pre-trained on ImageNet correctly classifies the photo as an elephant. It also rates an Indian elephant as more likely than an African elephant because of the size of its ears. In the colab, we also predict on an image from the dataset we're going to fine-tune on, tf_flowers, which has also been used in other tutorials. Note that the correct label 'tulip' is not a class in ImageNet, so the model cannot predict that at the moment – let's see what it tries to do instead:

The model predicts a reasonably similar-looking class, ‘bell pepper’.

3) Fine-tune BiT on your task

Now, we are going to fine-tune the BiT model so it performs better on a specific dataset. Here we are going to use Keras for simplicity, and we are going to fine-tune the model on a dataset of flowers (tf_flowers). We will use the model we loaded at the start (i.e. the one pre-trained on ImageNet-21k) so that it is less biased towards a narrow subset of classes.

There are two steps:

  1. Create a new model with a new final layer (called the ‘head’)
  2. Fine-tune this model using BiT-HyperRule, our hyperparameter heuristic. We described this in detail earlier in the ‘Downstream fine-tuning’ section of the post.

To create the new model, we:

  1. Cut off the BiT model’s original head. This leaves us with the “pre-logits” output.
    • We do not have to do this if we use the ‘feature extraction’ models, since for those models the head has already been cut off.
  2. Add a new head with the number of outputs equal to the number of classes of our new task. Note that it is important that we initialise the head to all zeroes.
import tensorflow as tf

class MyBiTModel(tf.keras.Model):
    """BiT with a new head."""

    def __init__(self, num_classes, module):
        super().__init__()

        self.num_classes = num_classes
        self.head = tf.keras.layers.Dense(num_classes, kernel_initializer='zeros')
        self.bit_model = module

    def call(self, images):
        # No need to cut the head off since we are using a feature extractor model
        bit_embedding = self.bit_model(images)
        return self.head(bit_embedding)

model = MyBiTModel(num_classes=5, module=module)

When we fine-tune the model, we use BiT-HyperRule, our heuristic for choosing hyperparameters for downstream fine-tuning which we described earlier. We also code our heuristic in full in the colab.

# Define optimiser and loss

# Decay learning rate by a factor of 10 at SCHEDULE_BOUNDARIES.
lr = 0.003
SCHEDULE_BOUNDARIES = [200, 300, 400, 500]
lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=SCHEDULE_BOUNDARIES,
    values=[lr, lr * 0.1, lr * 0.01, lr * 0.001, lr * 0.0001])
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)

To fine-tune the model, we use the simple Keras model.fit API:

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=['accuracy'])

# Fine-tune model
model.fit(
    pipeline_train,
    batch_size=512,
    steps_per_epoch=10,
    epochs=50,
    validation_data=pipeline_test)

We see that our model attains 95% validation accuracy within 20 steps, and attains over 98% validation accuracy after fine-tuning using BiT-HyperRule.
4) Save the fine-tuned model for later use
It is easy to save your model to use later on. You can then load your saved model in exactly the same way as we loaded the BiT models at the start.

# Save fine-tuned model as SavedModel
export_module_dir = '/tmp/my_saved_bit_model/'
tf.saved_model.save(model, export_module_dir)

# Load saved model
saved_module = hub.KerasLayer(export_module_dir, trainable=True)

Voila – we now have a model that predicts tulips as tulips and not bell peppers.

Summary

In this post, you learned about the key components you can use to train models that can transfer well to many different tasks. You also learned how to load one of our BiT models, fine-tune it on your target task and save the resulting model. Hope this helped and happy fine-tuning!
Acknowledgements
This blog post is based on work by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly and Neil Houlsby. We thank many members of Brain Research Zurich and the TensorFlow team for their feedback, especially Luiz Gustavo Martins, André Susano Pinto, Marcin Michalski, Josh Gordon, Martin Wicke, Daniel Keysers, Amélie Royer, Basil Mustafa, and Mario Lučić.

How Hugging Face achieved a 2x performance boost for Question Answering with DistilBERT in Node.js

A guest post by Hugging Face: Pierric Cistac, Software Engineer; Victor Sanh, Scientist; Anthony Moi, Technical Lead.

Hugging Face 🤗 is an AI startup with the goal of contributing to Natural Language Processing (NLP) by developing tools to improve collaboration in the community, and by being an active part of research efforts.

Because NLP is a difficult field, we believe that solving it is only possible if all actors share their research and results. That’s why we created 🤗 Transformers, a leading NLP library with more than 2M downloads and used by researchers and engineers across many companies. It allows the amazing international NLP community to quickly experiment, iterate, create and publish new models for a variety of tasks (text/token generation, text classification, question answering…) in a variety of languages (English of course, but also French, Italian, Spanish, German, Turkish, Swedish, Dutch, Arabic and many others!) More than 300 different models are available today through Transformers.

While Transformers is very handy for research, we are also working hard on the production aspects of NLP, looking at and implementing solutions that can ease its adoption everywhere. In this blog post, we’re going to showcase one of the paths we believe can help fulfill this goal: the use of “small”, yet performant models (such as DistilBERT), and frameworks targeting ecosystems different from Python such as Node via TensorFlow.js.

The need for small models: DistilBERT

One of the areas we're interested in is "low-resource" models that achieve close to state-of-the-art results while being a lot smaller and a lot faster to run. That's why we created DistilBERT, a distilled version of BERT: it has 40% fewer parameters and runs 60% faster, while preserving 97% of BERT's performance as measured on the GLUE language understanding benchmark.

NLP models through time, with their number of parameters

To create DistilBERT, we’ve been applying knowledge distillation to BERT (hence its name), a compression technique in which a small model is trained to reproduce the behavior of a larger model (or an ensemble of models), demonstrated by Hinton et al.

In the teacher-student training, we train a student network to mimic the full output distribution of the teacher network (its knowledge). Rather than training with a cross-entropy over the hard targets (one-hot encoding of the gold class), we transfer the knowledge from the teacher to the student with a cross-entropy over the soft targets (probabilities of the teacher). Our training loss thus becomes:

L_ce = − Σᵢ tᵢ · log(sᵢ), with tᵢ the probabilities obtained from the teacher logits t and sᵢ the probabilities obtained from the student logits s.
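Written here in TensorFlow for consistency with the other snippets in this post (the actual DistilBERT training code lives elsewhere), a minimal sketch of this soft-target cross-entropy might look like the following; the temperature used to soften the distributions, as in Hinton et al.'s setup, is illustrative.

import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened distribution and the student's."""
    soft_targets = tf.nn.softmax(teacher_logits / temperature)           # t_i
    student_log_probs = tf.nn.log_softmax(student_logits / temperature)  # log(s_i)
    return -tf.reduce_sum(soft_targets * student_log_probs, axis=-1)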

Our student is a small version of BERT in which we removed the token-type embeddings and the pooler (used for the next-sentence classification task). We kept the rest of the architecture identical while halving the number of layers, taking one layer out of every two and leveraging the common hidden size between student and teacher. We trained DistilBERT on very large batches using gradient accumulation (up to 4,000 examples per batch), with dynamic masking, and removed the next-sentence prediction objective.

With this, we were then able to fine-tune our model on the specific task of Question Answering. To do so, we used the BERT-cased model fine-tuned on SQuAD 1.1 as a teacher with a knowledge distillation loss. In other words, we distilled a question answering model into a language model previously pre-trained with knowledge distillation! That’s a lot of teachers and students: DistilBERT-cased was first taught by BERT-cased, and then “taught again” by the SQuAD-finetuned BERT-cased version in order to get the DistilBERT-cased-finetuned-squad model.

This results in very interesting performance given the size of the network: our DistilBERT-cased fine-tuned model reaches an F1 score of 87.1 on the dev set – less than 2 points behind the full BERT-cased fine-tuned model (88.7 F1)!

If you’re interested in learning more about the distillation process, you can read our dedicated blog post.

The need for a language-neutral format: SavedModel

Using the previous process, we end up with a 240MB Keras file (.h5) containing the weights of our DistilBERT-cased-squad model. In this format, the architecture of the model resides in an associated Python class. But our final goal is to be able to use this model in as many environments as possible (Node.js + TensorFlow.js for this blog post), and the TensorFlow SavedModel format is perfect for this: it’s a “serialized” format, meaning that all the information necessary to run the model is contained into the model files. It is also a language-neutral format, so we can use it in Python, but also in JS, C++, and Go.

To convert to SavedModel, we first need to construct a graph from the model code. In Python, we can use tf.function to do so:

import tensorflow as tf
from transformers import TFDistilBertForQuestionAnswering

distilbert = TFDistilBertForQuestionAnswering.from_pretrained('distilbert-base-cased-distilled-squad')
callable = tf.function(distilbert.call)

Here we passed our Keras model's call function to tf.function. What we get in return is a callable that we can in turn use to trace our call function with a specific signature and shapes, thanks to get_concrete_function:

concrete_function = callable.get_concrete_function(
    [tf.TensorSpec([None, 384], tf.int32, name="input_ids"),
     tf.TensorSpec([None, 384], tf.int32, name="attention_mask")])

By calling get_concrete_function, we trace-compile the TensorFlow operations of the model for an input signature composed of two Tensors of shape [None, 384], the first one being the input ids and the second one the attention mask.

Then we can finally save our model to the SavedModel format:

tf.saved_model.save(distilbert, 'distilbert_cased_savedmodel', signatures=concrete_function)

A conversion in 4 lines of code, thanks to TensorFlow! We can check that our resulting SavedModel contains the correct signature by using the saved_model_cli tool:

$ saved_model_cli show --dir distilbert_cased_savedmodel --tag_set serve --signature_def serving_default

Output:

The given SavedModel SignatureDef contains the following input(s):
  inputs['attention_mask'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 384)
      name: serving_default_attention_mask:0
  inputs['input_ids'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 384)
      name: serving_default_input_ids:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['output_0'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 384)
      name: StatefulPartitionedCall:0
  outputs['output_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 384)
      name: StatefulPartitionedCall:1
Method name is: tensorflow/serving/predict

Perfect! You can play with the conversion code yourself by opening this colab notebook. We are now ready to use our SavedModel with TensorFlow.js!

The need for ML in Node.js: TensorFlow.js

Here at Hugging Face we strongly believe that in order to reach its full adoption potential, NLP has to be accessible in other languages that are more widely used in production than Python, with APIs simple enough to be used by software engineers without a Ph.D. in Machine Learning; one of those languages is obviously Javascript.

Thanks to the API provided by TensorFlow.js, interacting with the SavedModel we created previously in Node.js is very straightforward. Here is a slightly simplified version of the Typescript code in our NPM Question Answering package:

const model = await tf.node.loadSavedModel(path); // Load the model located in path

const result = tf.tidy(() => {
  // ids and attentionMask are of type number[][]
  const inputTensor = tf.tensor(ids, undefined, "int32");
  const maskTensor = tf.tensor(attentionMask, undefined, "int32");

  // Run model inference
  return model.predict({
    // "input_ids" and "attention_mask" correspond to the names specified in the
    // signature passed to get_concrete_function during the model conversion
    "input_ids": inputTensor, "attention_mask": maskTensor
  }) as tf.NamedTensorMap;
});

// Extract the start and end logits from the tensors returned by model.predict
const [startLogits, endLogits] = await Promise.all([
  result["output_0"].squeeze().array() as Promise<number[]>,
  result["output_1"].squeeze().array() as Promise<number[]>
]);

tf.dispose(result); // Clean up memory used by the result tensor since we don't need it anymore

Note the use of the very helpful TensorFlow.js function tf.tidy, which takes care of automatically cleaning up intermediate tensors like inputTensor and maskTensor while returning the result of the model inference.

How do we know we need to use "output_0" and "output_1" to extract the start and end logits (beginning and end of the possible spans answering the question) from the result returned by the model? We just have to look at the output names indicated by the saved_model_cli command we ran previously after exporting to SavedModel.

The need for a fast and easy-to-use tokenizer: 🤗 Tokenizers

Our goal while building our Node.js library was to make the API as simple as possible. As we just saw, running model inference once we have our SavedModel is quite simple, thanks to TensorFlow.js. Now, the most difficult part is passing the data in the right format to the input ids and attention mask tensors. What we collect from a user is usually a string, but the tensors require arrays of numbers: we need to tokenize the user input.

Enter 🤗 Tokenizers: a performant library written in Rust that we’ve been working on at Hugging Face. It allows you to play with different tokenizers such as BertWordpiece very easily, and it works in Node.js too thanks to the provided bindings:

const tokenizer = await BertWordPieceTokenizer.fromOptions({
  vocabFile: vocabPath, lowercase: false
});

tokenizer.setPadding({ maxLength: 384 }); // 384 matches the shape of the signature input provided while exporting to SavedModel

// Here question and context are in their original string format
const encoding = await tokenizer.encode(question, context);
const { ids, attentionMask } = encoding;

That’s it! In just 4 lines of code, we are able to convert the user input to a format we can then use to feed our model with TensorFlow.js.

The Final Result: Powerful Question Answering in Node.js

Thanks to the powers of the SavedModel format, TensorFlow.js for inference, and Tokenizers for tokenization, we’ve reached our goal to offer a very simple, yet very powerful, public API in our NPM package:

import { QAClient } from "question-answering"; // If using Typescript or Babel
// const { QAClient } = require("question-answering"); // If using vanilla JS

const text = `
Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season.
The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.
As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.
`;

const question = "Who won the Super Bowl?";

const qaClient = await QAClient.fromOptions();
const answer = await qaClient.predict(question, text);

console.log(answer); // { text: 'Denver Broncos', score: 0.3 }

Powerful? Yes! Thanks to the native support for the SavedModel format in TensorFlow.js, we get very good performance: here is a benchmark comparing our Node.js package and our popular transformers Python library, running the same DistilBERT-cased-squad model. As you can see, we achieve a 2x speed gain! Who said Javascript was slow?

Short texts are texts between 500 and 1000 characters, long texts are between 4000 and 5000 characters. You can check the Node.js benchmark script here (the Python one is equivalent). Benchmark run on a standard 2019 MacBook Pro running on macOS 10.15.2.

It’s a very interesting time for NLP: big models such as GPT2 or T5 keep getting better and better, and research on how to “minify” those good but heavy and costly models is also getting more and more traction, with distillation being one technique among others. Adding to the equation tools that allow big developer communities to be part of the revolution (such as TensorFlow.js with the Javascript ecosystem), only makes the future of NLP more exciting and more production-ready than ever!

For further reading, feel free to check out our GitHub repositories: https://github.com/huggingface

TFRT: A new TensorFlow runtime

Posted by Eric Johnson, TFRT Product Manager and Mingsheng Hong, TFRT Tech Lead/Manager

TensorFlow aims to make it easy for you to build and deploy ML models across many different devices. Yet, what it means to “build and deploy ML models” is not static and continues to change with increased investment in the ML ecosystem.

At the top-half of the TensorFlow stack, innovation is leading to more complex models and deployment scenarios. Researchers are inventing new algorithms that require more compute, while application developers are enhancing their products with these new techniques across edge and server.

At the bottom-half of the stack, the tension from increasing compute needs and rising compute costs due to the ending of Moore’s law has sparked a proliferation of new hardware aimed at specific ML use cases. Traditional chip makers, startups, and software companies alike (including Google) have invested in specialized silicon.

The result is that the needs of the ML ecosystem are vastly different than they were 4 or 5 years ago when TensorFlow was first created. Of course, we've continued to iterate with the release of 2.x, but the current TensorFlow stack is optimized for graph execution, and incurs non-trivial overhead when dispatching a single op. A high-performance, low-level runtime is key to enabling the trends of today and empowering the innovations of tomorrow.

Enter TFRT, a new TensorFlow RunTime. It aims to provide a unified, extensible infrastructure layer with best-in-class performance across a wide variety of domain specific hardware. It provides efficient use of multithreaded host CPUs, supports fully asynchronous programming models, and focuses on low-level efficiency.

TFRT will benefit a broad range of users, including:

  • Researchers looking for faster iteration time and better error reporting when developing complex new models in eager mode.
  • Application developers looking for improved performance when training and serving models in production.
  • Hardware makers looking to integrate edge and datacenter devices into TensorFlow in a modular way.

What is TFRT?

TFRT is a new runtime that will replace the existing TensorFlow runtime. It is responsible for efficient execution of kernels – low-level device-specific primitives – on targeted hardware. It plays a critical part in both eager and graph execution, which is illustrated by this simplified diagram of the TensorFlow training stack:

TFRT’s role in graph and eager execution within the TensorFlow training stack

Note that everything in grey is part of TFRT. In eager execution, TensorFlow APIs call directly into the new runtime. In graph execution, your program’s computational graph is lowered to an optimized target-specific program and dispatched to TFRT. In both execution paths, the new runtime invokes a set of kernels that call into the underlying hardware devices to complete the model execution, as shown by the black arrows.

Key design points

Whereas the existing TensorFlow runtime was initially built for graph execution and training workloads, the new runtime will make eager execution and inference first-class citizens, while putting special emphasis on architecture extensibility and modularity. More specifically, TFRT has the following selected design highlights:

  • To achieve higher performance, TFRT has a lock-free graph executor that supports concurrent op execution with low synchronization overhead, and a thin eager op dispatch stack so that eager API calls will be asynchronous and more efficient.
  • To make extending the TF stack easier, we decoupled device runtimes from the host runtime, the core TFRT component that drives host CPU and I/O work.
  • To get consistent behavior, TFRT leverages common abstractions, such as shape functions and kernels, across both eager and graph.

The power of MLIR

TFRT is also tightly-integrated with MLIR. For example:

  • TFRT utilizes MLIR’s compiler infrastructure to generate an optimized, target-specific representation of your computational graph that the runtime executes.
  • TFRT uses MLIR’s extensible type system to support arbitrary C++ types in the runtime, which removes tensor-specific limitations.

Together, TFRT and MLIR will improve TensorFlow’s unification, flexibility, and extensibility.

Initial Results

Early performance results from the inference and serving use case are encouraging. As part of a benchmarking study for TensorFlow Dev Summit 2020, we integrated TFRT with TensorFlow Serving and measured the latency of sending requests to the model and getting prediction results back. We picked a common MLPerf model, ResNet-50, and chose a batch size of 1 and a data precision of FP16 to focus our study on runtime related op dispatch overhead. In comparing performance of GPU inference over TFRT to the current runtime, we saw an improvement of 28% in average inference time. These early results are strong validation for TFRT, and we expect it to provide a big boost to performance. We hope you are as excited as we are!

What’s next

TFRT is being integrated with TensorFlow, and will be enabled initially through an opt-in flag, giving the team time to fix any bugs and fine-tune performance. Eventually, it will become TensorFlow’s default runtime. Although it is still an early stage project, we have made the GitHub repository available to the community. We are limiting contributions to begin with, but encourage participation in the form of requirements and design discussions.

To learn more, please check out our Dev Summit 2020 presentation, where we first introduced TFRT to the world, and our MLIR Open Design Deep Dive presentation, where we provided a detailed overview of TFRT’s core components, low-level abstractions, and general design principles. And finally, if you want to keep up with all things TFRT, please join our new mailing list. Thanks!


AI for Medicine Specialization featuring TensorFlow

Posted by Laurence Moroney, AI Advocate

I'm excited to share an important aspect of the TensorFlow community: educators and domain experts teaching developers how to use machine learning technology to solve important problems in a variety of scenarios, including health care. To this end, deeplearning.ai and Coursera have launched an "AI for Medicine" specialization using TensorFlow.
Nothing excites our team more than seeing how others are using TensorFlow to solve real-world problems. With this three-course specialization, introduced by Andrew Ng and taught by Pranav Rajpurkar, we hope to widen access so that more people can understand the needs of medical machine learning problems.

Deeplearning.ai and Coursera have designed a specialization that is divided into three courses. The first, Machine Learning for Medical Diagnosis, will take you through some hypothetical machine learning scenarios for diagnosing medical issues. In the first week, you'll explore scenarios like detecting skin cancer, eye disease and histopathology. You'll get hands-on experience writing TensorFlow code that uses convolutional neural networks to examine images – which can be used, for example, to identify different conditions in an X-ray.

The course does require some knowledge of TensorFlow, using techniques such as convolutional neural networks, transfer learning, natural language processing, and more. I recommend that you take the TensorFlow: In Practice specialization to understand the coding skills behind it, and the Deep Learning Specialization to go deeper into how the underlying technology works. Another great resource to learn the techniques used in this course is the book “Hands on Machine Learning with SciKit-Learn, Keras and TensorFlow” by Aurelien Geron.

One of the things I really enjoyed about the course is the balance of medical terminology and using common machine learning techniques from TensorFlow, such as data augmentation, to improve your models. Note: all of the data used in the course is de-identified.

Exercises from Rajpurkar’s and Ng’s course: Using image augmentation to extend the effective size of a dataset.
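As a generic illustration of that kind of augmentation in Keras (not code from the course, and with arbitrary settings), you might write something like:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation settings; a real medical imaging pipeline would choose
# transformations that preserve diagnostic meaning.
datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    samplewise_center=True,
    samplewise_std_normalization=True)

train_generator = datagen.flow_from_directory(
    'train/',                 # hypothetical directory of labelled training images
    target_size=(320, 320),
    batch_size=32,
    class_mode='binary')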

The course continues with evaluation techniques, such as isolating the key metrics and understanding how to interpret confidence intervals accurately.

The first course ends with another deep dive into image processing, this time using segmentation on MRI images, and wraps up with a programming assignment on brain tumor auto-segmentation.

The second course in the specialization will be on Machine Learning for Medical Prognosis where you learn to build models to predict future patient health. You’ll learn techniques to extract data from reports such as a patient’s health metrics, history, and demographics to predict their risk of a major event such as a heart attack.

The third, and final, course will be on Machine Learning for Medical Treatment, where models may be used to assist in medical care by predicting the potential effect of a treatment on a patient. It will also cover machine learning for text, so that you can use NLP techniques to extract information from radiography reports – for example, to generate labels or to build the basis for a bot that answers medical questions.

In the words of Andrew Ng, “Even if your current work is not in medicine, I think you will find the application scenarios and the practice of these scenarios to be really useful, and maybe this specialization will inspire you to get more interested in medicine”.

The specialization is available at Coursera, and like all courses can be audited for free. You can learn more about deeplearning.ai at their website, and about TensorFlow at tensorflow.org.

What’s new in TensorFlow Lite from DevSummit 2020

Posted by Khanh LeViet, Developer Advocate on behalf of the TensorFlow Lite team
Edge devices, such as smartphones, have become more powerful each year and enable an increasing number of on-device machine learning use cases. TensorFlow Lite is the official framework for running TensorFlow model inference on edge devices. It runs on more than 4 billion active devices globally, on various platforms, including Android, iOS, and Linux-based IoT devices, and on bare metal microcontrollers.

We continue to push the limits of on-device machine learning with TensorFlow Lite by making it faster and easier to use. In this blog post, we highlight recent TensorFlow Lite features that were launched within the past six months, leading up to the TensorFlow Dev Summit in March 2020.

Pushing the limits of on-device machine learning

Enabling state-of-the-art models

Machine learning is a fast-moving field with new models that break the state-of-the-art records every few months. We put a lot of effort into making these state-of-the-art models run well on TensorFlow Lite. As examples, we now support EfficientNet-Lite (paper), a family of image classification models, MobileBERT (paper), and ALBERT-Lite (paper), a light-weight version of BERT (paper) that supports multiple NLP (natural language processing) tasks. Check out some of the performance benchmarks below.

EfficientNet-Lite

EfficientNet-Lite is a family of image classification models that achieve state-of-the-art accuracy with an order of magnitude fewer computations and parameters. The models are optimized for TensorFlow Lite with quantization, resulting in faster inference with negligible accuracy loss, and they can run on the CPU, GPU, or Edge TPU. Find out more in our blog post.

Figure: Integer-only quantized models running on Pixel 4 CPU with 4 threads.

MobileBERT and ALBERT-Lite

MobileBERT and ALBERT-Lite are the optimized versions of the popular BERT model that achieved state-of-the-art accuracy on a range of NLP tasks, including question and answer, natural language inference, and others. MobileBERT is about 4x faster and smaller than BERT and retains similar accuracy. Meanwhile, ALBERT-Lite is 6x smaller than BERT, but slower than MobileBERT.

Figure: Pixel 4, Float32 question & answer models, CPU 4 threads

New TensorFlow Lite converter

We launched a new converter that enables more models and improves the developer experience:

  • Enables conversion of new classes of models, including DeepSpeech V2, Mask R-CNN, Mobile BERT, MobileNetSSD, and many more
  • Adds support for functional control flow (enabled by default in TensorFlow 2.x)
  • Tracks original TensorFlow node names and Python code and exposes them during conversion if errors occur
  • Leverages MLIR, Google’s cutting edge compiler technology for ML, which makes it easier to troubleshoot conversion errors and extend to accommodate feature requests

The new converter is fully backward compatible and is enabled by default since TensorFlow 2.2, while the old converter is still available via a flag. See the documentation for more details.
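For reference, invoking the converter in TensorFlow 2.x looks like the minimal sketch below (the SavedModel path is a placeholder); the MLIR-based converter is used automatically.

import tensorflow as tf

# Convert a SavedModel (placeholder path) to TensorFlow Lite
converter = tf.lite.TFLiteConverter.from_saved_model('/tmp/my_saved_model')
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)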

Quantization-aware-training support for Keras

Quantization-aware training (QAT) enables you to train and deploy models with the performance and size benefits of quantization – it makes your model about 4x smaller and faster to run, while retaining accuracy. You can add QAT with one line of code.

import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([
...
])
# Quantize the entire model.
quantized_model = tfmot.quantization.keras.quantize_model(model)

# Continue with training as usual.
quantized_model.compile(...)
quantized_model.fit(...)

Here is how QAT stacks up against the original float model and post-training quantization. Find out more in our blog post.
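For comparison, post-training quantization needs no retraining at all; a minimal sketch, assuming you already have a trained Keras `model`:

import tensorflow as tf

# Post-training (dynamic range) quantization of an existing Keras model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()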

Faster inference across platforms

Better CPU performance

Improving CPU performance has been a major priority for the team and we’ve shipped a number of substantial CPU-related performance optimizations in recent months. As part of this effort, we developed an optimized matrix multiplication library (ruy), built from the ground up to deliver better performance on the classes of CPU hardware and models typically used in mobile environments. As of TensorFlow 1.15, this library is enabled by default for all ARM devices and has helped deliver latency improvements anywhere from 1.2x to 5x across an extremely broad range of models and use cases.

Pixel 4 – Single Threaded CPU, February 2020

We have some additional CPU optimizations scheduled to ship in the TensorFlow 2.3 release, including ~40% faster execution of models with post-training weight quantization, as well as a new highly optimized floating-point convolutional kernel library (XNNPACK) that delivers 20-50% faster execution across all of the key floating-point convolutional models supported by TensorFlow Lite.

Faster inference with new hardware accelerator delegates

TensorFlow Lite is truly cross-platform: you can train a model once and get optimal performance on every supported platform. In the past few months, we have added support for running inference on Qualcomm's Hexagon DSPs, Apple's Core ML, and Android GPUs with OpenCL.
Hexagon DSPs are microprocessors that can be found on millions of modern Android phones using Qualcomm Snapdragon SoCs. The new TensorFlow Lite Hexagon delegate leverages the DSP to achieve performance gains in the range of 3-25x for models like MobileNet and Inceptionv3 compared to CPU, while being more power efficient than both CPU and GPU. Learn more in our blog post.
Core ML is the machine learning framework available on Apple’s devices and provides the API to run ML models on Apple’s Neural Engine. The new TensorFlow Lite Core ML delegate allows running TensorFlow Lite models on Core ML and Neural Engine, if available, to achieve faster inference with better power consumption efficiency. On iPhone XS and newer devices, where Neural Engine is available, we have observed performance gains from 1.3x to 11x on various computer vision models. More details can be found in our blog post.
OpenCL is a framework for writing programs that execute across heterogeneous platforms. We recently added support for OpenCL to the TensorFlow Lite GPU delegate, achieving approximately 4-6x speed-up over CPU and approximately 2x speed-up over OpenGL on a variety of computer vision models. Here is a snapshot of the OpenCL backend performance on Pixel 4.

Android performance bottleneck profiler

TensorFlow Lite on Android supports instrumented logging of internal events, including ops invocations, that can be tracked by Android’s system tracing. The new profiling data allows you to identify performance bottlenecks.
Here are some examples of insights that you can get from the profiler and potential solutions to improve performance:

  • If the number of available CPU cores is smaller than the number of inference threads, then the CPU scheduling overhead can lead to subpar performance. You can reschedule other CPU intensive tasks in your application to avoid overlapping with your model inference or tweak the number of interpreter threads.
  • If the operators are not fully delegated, then some parts of the model graph are executed on the CPU rather than the expected hardware accelerator. You can substitute the unsupported operators with similar supported operators.

This feature is available now in TensorFlow Lite Android library nightly build. More details can be found here.

Make ML easier to use

Model creation with no ML expertise

TensorFlow Lite Model Maker enables you to adapt state-of-the-art machine learning models to your dataset with transfer learning. It shortens the learning curve for developers new to machine learning by wrapping the complex machine learning concepts with an intuitive API. For example, you can train a state-of-the-art image classification model with only four lines of code.

data = ImageClassifierDataLoader.from_folder('flower_photos/')
model = image_classifier.create(data)
loss, accuracy = model.evaluate()
model.export('flower_classifier.tflite', 'flower_label.txt', with_metadata=True)

Model Maker supports many state-of-the-art models that are available on TensorFlow Hub, including the EfficientNet-Lite models mentioned above. It currently supports image classification (tutorial) and text classification (tutorial) with more computer-vision and NLP use cases coming soon.

Model sharing made easy with metadata

Traditionally, running inference with TensorFlow Lite means working with the raw tensors. This presented two hurdles:

  1. The consumer of the TensorFlow Lite model will need to know exactly what the tensor shape means (e.g. 1 x 224 x 224 x 3). Is it a bitmap? If so, is it in red, blue, and green channels or some other scheme? This poses a problem if the team creating the model is not the same team consuming it.
  2. The need to use a lot of error-prone boilerplate code to convert from high-level data types, such as Bitmap to an RGB float array or a ByteArray, before it can be used.

To solve the first problem, we added support for model metadata to TensorFlow Lite, allowing model creators to describe the input and output of their model using typed objects. In addition to basic information, such as the size of the bitmap or color channels, we also included information, such as mean and standard deviation, to communicate to the model consumers, so that the appropriate normalization can be applied.
To solve the second problem of boilerplate code, we created the Android code generator, which reads the TensorFlow Lite metadata and creates the appropriate wrapper code to resize, normalize, and convert to and from ByteArray. This means you can now interact with the TensorFlow Lite model using high-level objects that you are familiar with:

// 1. Initializing the Model
MyClassifierModel myImageClassifier = new MyClassifierModel(activity);

// 2. Setting the input with a Bitmap called inputBitmap
MyClassifierModel.Inputs inputs = myImageClassifier.createInputs();
inputs.loadImage(inputBitmap);

// 3. Running the model
MyClassifierModel.Outputs outputs = myImageClassifier.run(inputs);

// 4. Retrieving the result
Map<String, Float> labeledProbability = outputs.getProbability();

This is currently an experimental feature and only supports image-based models. We added metadata support to most TensorFlow Lite vision models on TensorFlow Hub and the Image Classifier Model Maker. Going forward, the project is expanding in three ways:

  1. Support input types beyond images to enable more use-cases
  2. Build an Android Studio plugin that makes this even easier to use
  3. Add iOS support

More sample and learning materials

We launched two online courses on Coursera and Udacity to provide a structured learning path for TensorFlow Lite. Both courses are four weeks long and teach how to use TensorFlow Lite on Android, iOS, and IoT devices.
We released new sample apps demonstrating how to use pretrained models, including style transfer, question and answer and more. We love to see the engagement in the TensorFlow Lite community. Recently, one member collected pretrained models, samples, and tutorials created by the community and curated them on GitHub. Feel free to contribute!

Better support for microcontrollers

Official support for Arduino

TensorFlow Lite for Microcontrollers is now available as an official Arduino library, which makes it easy to deploy speech detection to an Arduino Nano in under 5 minutes.

More TensorFlow Lite for Microcontrollers optimizations

We are working with leading industry partners who are writing optimized implementations of TensorFlow Lite for Microcontrollers kernels for their hardware architectures. For example, Cadence announced their support for TensorFlow Lite for Microcontrollers on their Tensilica HiFi DSP family.

How Google is using TensorFlow Lite

TensorFlow Lite is used extensively within Google in many of our key products, including YouTube, Google Assistant, and Google Photos.
The Google Lens team shared how they migrated from a server-based model to a client-based on-device model to improve the user experience.

The Live Perception team showed how to build a machine learning pipeline to process live camera feed in real-time.

What’s next

We have new features and improvements coming in a few months:

  • XNNPACK integration for highly optimized floating-point model execution. This will significantly speed up CPU inference across platforms.
  • New state-of-the-art on-device models, an updated guide, and examples demonstrating more use cases, including how to use the native C/C++ APIs for inference on mobile.
  • Additional tools for trimming binary size based on the ops used in client models, reducing the size impact on client apps.
  • Enhancements to Model Maker to support more tasks, such as object detection and NLP. We are adding BERT support to enable new NLP tasks like question and answer, which will empower developers without ML expertise to build state-of-the-art NLP models through transfer learning.
  • Expansion of the metadata and codegen tools to support more use cases, including object detection and other NLP-related tasks, and better integration with Android Studio.

To see the longer-term TensorFlow Lite product roadmap, please check out our website.

Optimizing style transfer to run on mobile with TFLite


Posted by Khanh LeViet and Luiz Gustavo Martins, Developer Advocates

Neural style transfer is an optimization technique used to take two images, a content image (such as a building) and a style image (such as artwork by an iconic painter), and blend them together so the output image looks like the content image “painted” in the style of the reference image. Today, we are excited to share a pre-trained style transfer TensorFlow Lite model that is optimized for mobile, along with Android and iOS sample apps that use the model to stylize any image.

In this article, we will walk you through the journey of optimizing a large TensorFlow model for mobile deployment, and show how to use it efficiently in a mobile app with TensorFlow Lite. We hope that you can use our pre-trained style transfer model or leverage our insights for your own use cases.

Background

An example of style transfer

Style transfer was first published in A Neural Algorithm of Artistic Style. The original technique, however, was computationally expensive, and it could take several seconds to stylize an image even on high-end GPUs. Subsequent work by several authors (for example) showed how to speed up style transfer.

After evaluating several model architectures, we decided to start with a pre-trained arbitrary style transfer model from Magenta for our sample app. The model can take any content and style image as input, then use a feedforward neural network to generate a stylized output image. This model allows much faster style transfer than the technique in Gatys’s paper, but it is still quite large (44 MB) and slow (2340 ms on a Pixel 4 CPU). Therefore, we needed to optimize the model to make it suitable for use in mobile applications. This article shares our experience doing so, with resources you can take advantage of in your own work.

Optimizing the model architecture

The structure of our style transfer model

Magenta’s arbitrary style transfer model consists of two subnetworks:

  • Style prediction network: converts the style image to a style embedding vector.
  • Style transform network: applies the style embedding vector on the content image to generate a stylized image.

Magenta’s style prediction network has an InceptionV3 backbone, which we replaced with a MobileNetV2 backbone that is optimized for mobile. The style transform network consists of several convolution layers. We applied the width-multiplier idea from MobileNet, scaling down the number of output channels of all convolution layers by a factor of 4.
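As a rough illustration of the width-multiplier idea, here is a minimal sketch; the layer below is purely illustrative and not the actual Magenta architecture.

import tensorflow as tf

# Illustrative only: shrink a convolution's output channels by a fixed factor.
WIDTH_FACTOR = 4  # assumption: the same factor is applied to every convolution layer

def slim_conv2d(x, full_width_filters, kernel_size, strides=1):
    """A convolution whose channel count is scaled down by WIDTH_FACTOR."""
    return tf.keras.layers.Conv2D(
        filters=max(full_width_filters // WIDTH_FACTOR, 1),
        kernel_size=kernel_size,
        strides=strides,
        padding="same",
        activation="relu")(x)
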
Then, we had to decide how to train our model. We experimented with multiple options: training the mobile model from scratch, or distilling it from Magenta’s pre-trained model. We found that fixing the weights of MobileNetV2 while optimizing the other parameters from scratch gave the best result.
We were able to achieve a similar level of style and content loss, while significantly shrinking and speeding up the model.

* Benchmarked on Pixel 4 CPU using TensorFlow Lite with 2 threads, April 2020.
* See this paper for more details about the definition of the loss function used in this style transfer model.

Quantization

Once we had settled on the model architecture, we continued to shrink our mobile model further with quantization, using the TensorFlow Model Optimization Toolkit. This is an important technique that is applicable to most mobile deployments of TensorFlow models, as it can shrink the model size by up to 4x and speed up inference with an insignificant quality trade-off.
Among the quantization options that TensorFlow provides, we decided to use post-training integer quantization because it has the right balance of simplicity and model quality. We only needed to provide a small portion of our training dataset when converting the TensorFlow model to TensorFlow Lite.
After quantization, our model is more than an order of magnitude smaller and faster than the original model, while maintaining the same level of style and content loss.
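For reference, post-training integer quantization looks roughly like the sketch below for a single-input model; the SavedModel path and calibration_images are placeholders, and for our two networks the representative inputs would need to match each network’s own signature.

import tensorflow as tf

# A sketch of post-training integer quantization (placeholders: the SavedModel
# path and `calibration_images`, a small sample of the training data).
def representative_dataset():
    for image in calibration_images[:100]:
        yield [tf.expand_dims(image, 0)]   # one float32 input per call

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()

with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)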

* Benchmarked on Pixel 4 CPU using TensorFlow Lite with 2 threads, April 2020.

Deployment to mobile

We implemented an Android app to demonstrate how to use the style transfer model. The app takes a style image and a content image, and outputs an image that mixes the style and content of the input images.
We use the phone’s camera to capture content images with the Camera2 API, and provide a set of famous paintings to be used as style images. As mentioned above, there are two steps to apply a style to a content image. First, we extract the style as an array of floats using the style prediction network. Then we apply this style to the content image using the style transform network (see the sketch below).
In order to achieve the best performance on both CPU and GPU, we created two sets of TensorFlow Lite models, each optimized for its target chip. We use the int8 quantized model for CPU inference, and the float16 quantized model for GPU inference. The GPU generally achieves better performance than the CPU, but it currently only supports float models, which are larger than int8 quantized models. Here is how the int8 and float16 models perform.

* Benchmarked on Pixel 4 using TensorFlow Lite, April 2020.
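
In Python, the two-step flow looks roughly like the sketch below. The file names, style_image, and content_image are placeholders, and the input ordering of the transform network is an assumption, so check the model’s input details in practice. The Android sample does the equivalent in Kotlin.

import tensorflow as tf

# A sketch of the two-step style transfer flow with two TF Lite interpreters.
# `style_image` and `content_image` are preprocessed float32 arrays (placeholders).
predict = tf.lite.Interpreter(model_path="style_predict.tflite")
transform = tf.lite.Interpreter(model_path="style_transform.tflite")
predict.allocate_tensors()
transform.allocate_tensors()

# 1. Style image -> style embedding (an array of floats).
predict.set_tensor(predict.get_input_details()[0]["index"], style_image)
predict.invoke()
style_embedding = predict.get_tensor(predict.get_output_details()[0]["index"])

# 2. Content image + style embedding -> stylized image.
inputs = transform.get_input_details()
transform.set_tensor(inputs[0]["index"], content_image)   # assumption: content image first
transform.set_tensor(inputs[1]["index"], style_embedding)
transform.invoke()
stylized_image = transform.get_tensor(transform.get_output_details()[0]["index"])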

Another possible performance gain is to cache the results of the style prediction network if you only plan to support a fixed set of style images in your mobile app. This also makes your app smaller, as you do not need to include the style prediction network, which accounts for 91% of the total network size. This is the main reason why the process is split into two models instead of one.
The sample can be found on GitHub, and the main class applying the style is StyleTransferModelExecutor.
It is important not to run style transfer on the UI thread, as it is computationally expensive. We instead use the ViewModel class from AndroidX and a Coroutine to run it on a dedicated background thread and easily update the view. Also, when running a model with the GPU delegate, TF Lite interpreter initialization, GPU delegate initialization, and inference all have to run on the same thread.

Style transfer in production

The Google Arts & Culture app recently added Art Transfer, which uses TensorFlow Lite to run style transfer on-device. The model used is very similar to the one above, but prioritizes quality over speed and model size. Try it out if you are interested in seeing style transfer in production.

Your Turn

If you want to add style transfer to your own app, you can start by downloading the mobile sample. Both model versions, the float16 (predict network, transform network) and the int8 quantized version (predict network, transform network), are available on TensorFlow Hub. We can’t wait to see what you create! Don’t forget to share your creations with us.

Resources

Running machine learning models on-device has the benefit of keeping users’ data private while enabling features with low latency.
In this post, we have shown that directly converting a TensorFlow model to TensorFlow Lite might be just the first step. To achieve good performance, developers should optimize their model with quantization and find the right trade-off between model quality, model size, and inference time.
We used the resources below to create our model. They might also be applicable to your on-device machine learning use cases.

  • Magenta model repository
    Magenta is an open source project powered by TensorFlow. It uses machine learning to make music and art. There are many models that can be converted to TensorFlow Lite, including this style transfer model.
  • TensorFlow Model Optimization Toolkit
    Model Optimization Toolkit provides multiple methods to optimize your model, including quantization and pruning.
  • TensorFlow Lite delegates
    TensorFlow Lite can leverage many different types of hardware accelerator available on devices, including GPUs and DSPs, to speed up model inference.


Introducing the new TensorFlow Profiler


Posted by Anirudh Sriram, Technical Writer, and Gal Oshri, Product Manager

Performance is a key consideration of successful ML research and production solutions. Faster model training leads to faster iterations and reduced overhead. It is sometimes an essential requirement to make a particular ML solution feasible.

However, it is not always clear what should be optimized. Is there an issue with a specific operation (op), or the input pipeline?

To help answer this, we have developed an extensive set of tools for TensorFlow performance profiling. Beyond the ability to capture and investigate numerous aspects of a profile, the tools offer guidance on how to resolve performance bottlenecks (e.g. input-bound programs).

These tools are used by low-level experts improving TensorFlow’s infrastructure, as well as engineers in Google’s most popular products to optimize their model performance. We want to enable the broader community to take advantage of the tools used at Google for performance profiling. That is why we recently open sourced the new TensorFlow Profiler.

TensorFlow Profiler overview page

What is the TensorFlow Profiler?

The TensorFlow Profiler (or the Profiler) provides a set of tools that you can use to measure the training performance and resource consumption of your TensorFlow models. This new version of the Profiler is integrated into TensorBoard, and builds upon existing capabilities such as the Trace Viewer.

The Profiler has the following new profiling tools available:

  • Overview Page: Provides a top-level view of model performance and recommendations to optimize performance
  • Input Pipeline Analyzer: Analyzes your model’s data input pipeline for bottlenecks and recommends changes to improve performance
  • TensorFlow Stats: Displays performance statistics for every TensorFlow operation executed during the profiling session
  • GPU Kernel Stats: Displays performance statistics and the originating operation for every GPU accelerated kernel

Check out the Profiler guide in the TensorFlow documentation to learn more about these tools.

Getting started

The best way to get started with the Profiler is to follow the Colab tutorial here. We will cover a few of the important steps and insights in this blog post. First, we install the Profiler plugin for TensorBoard:

pip install -U tensorboard_plugin_profile

This adds the full Profiler capabilities to our TensorBoard installation. Next, we ensure that our model training captures a profile. In this case, we will use the TensorBoard callback in Keras:

tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=logs, profile_batch='500,510')

We can choose which batches to profile with the profile_batch parameter. This lets us choose the number of steps to capture (no more than 10 is recommended) and skip the first few batches to avoid inaccuracy due to initialization overhead. There are other methods for capturing a profile, described here. We now start TensorBoard with the following command:

tensorboard --logdir {log directory}    # in terminal
%tensorboard --logdir {log directory} # in Colab

After clicking on Profile, we see the overview page. This immediately gives us an indication of our program’s performance. Besides a useful summary, we see a recommendation telling us that our program is input-bound (meaning our accelerator is wasting time waiting for input). This is a really common problem.
By following the instructions in the tutorial, we can bring our average step time from ~30ms to ~3ms. That’s a 10x improvement! While this is a toy example, it is common to hear from engineers and researchers at Google that they managed to improve their performance by significant factors.
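As mentioned above, the Keras callback is not the only way to capture a profile; there is also a programmatic API. A minimal sketch, assuming you have your own train_step function and dataset (both placeholders here):

import tensorflow as tf

# Programmatic profile capture (a sketch; `train_step` and `dataset` are placeholders).
tf.profiler.experimental.start(logs)               # same log directory as above
for step, (x, y) in enumerate(dataset.take(10)):   # keep the capture short
    with tf.profiler.experimental.Trace('train', step_num=step):
        train_step(x, y)
tf.profiler.experimental.stop()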

Recommendations

Performance optimization is an iterative process and can sometimes be frustrating as it is tricky to pinpoint the exact location of the bottlenecks in your program. Not only can the Profiler tell you where your program has bottlenecks, it can often also tell you what you can do to resolve them and make your code execute faster. Following the recommendations provided can shorten the overall time taken to optimize your program.
When you open TensorBoard to view the profiling results, the Overview page provides code optimization recommendations below the Step time graph. One of the most common reasons for slow code execution is an improperly configured data input pipeline. Use the Input Pipeline Analyzer to identify and eliminate bottlenecks in your data input pipeline, and read the best practices section of the Profiler guide to learn about other strategies you can employ to get optimal performance.
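For example, a common fix for an input-bound program is to parallelize preprocessing and overlap the input pipeline with training. A generic sketch, where raw_dataset and preprocess are placeholders rather than code from the tutorial:

import tensorflow as tf

# A generic tf.data pattern for input-bound programs
# (placeholders: `raw_dataset`, `preprocess`).
AUTOTUNE = tf.data.experimental.AUTOTUNE

dataset = (raw_dataset
           .map(preprocess, num_parallel_calls=AUTOTUNE)
           .cache()
           .batch(64)
           .prefetch(AUTOTUNE))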

More resources

Check out these resources to learn more:

What’s next for the TensorFlow Profiler?

In addition to addressing feedback, we are expanding the profiler’s capabilities. A few areas we are currently working on:

  • Memory Profiler: View memory usage over time and the associated op/training step.
  • Keras Analysis: Enable linking the information in the profiler to Keras. This enables, for example, identifying which Keras layers correspond to the ops shown in the trace viewer.
  • Multiworker GPU Analysis: Enable profiling multiple GPU workers and aggregate the results. Analyze the hotspot and the communication across workers.

We are excited to continue bringing the tools used at Google to improve ML performance to the broader community. If there are specific capabilities that would help you the most, or to report a bug, feel free to open an issue here!


How TensorFlow Lite helps you from prototype to product


Posted by Khanh LeViet, Developer Advocate

TensorFlow Lite is the official framework for running inference with TensorFlow models on edge devices. TensorFlow Lite is deployed on more than 4 billion edge devices worldwide, supporting Android, iOS, Linux-based IoT devices, and microcontrollers.

Since its first launch in late 2017, we have been improving TensorFlow Lite to make it robust while keeping it easy to use for all developers – from machine learning experts to mobile developers who have just started learning about machine learning.

In this blog post, we will highlight recent launches that make it easier for you to go from prototyping an on-device use case to deploying it in production.
If you prefer a video format, check out this talk from TensorFlow DevSummit 2020.

Prototype: jump-start with state-of-the-art models

As machine learning is a fast-moving field, it is important to know what is possible with current technologies before investing resources into building a feature. We have a repository of pretrained models and sample applications that implement them, so that you can try out TensorFlow Lite models on real devices without writing any code. Then, you can quickly integrate the models into your application to prototype and test what your user experience will be like before spending time training your own model.

We have published several new pretrained models, including a question & answer model and a style transfer model.

We are also committed to bringing more state-of-the-art models from research teams to TensorFlow Lite. Recently, we enabled three new model architectures: EfficientNet-Lite (paper), MobileBERT (paper), and ALBERT-Lite (paper).

  • EfficientNet-Lite is a novel image classification model that achieves state-of-the-art accuracy with an order of magnitude fewer computations and parameters. It is optimized for TensorFlow Lite, supports quantization with negligible accuracy loss, and is fully supported by the GPU delegate for faster inference. Find out more in our blog post.
    Benchmark on Pixel 4 CPU, 4 Threads, March 2020
  • MobileBERT is an optimized version of the popular BERT (paper) model that achieved state-of-the-art accuracy on a range of NLP tasks, including question and answer, natural language inference and others. MobileBERT is about 4x faster and smaller than BERT but retains similar accuracy.
  • ALBERT is another lightweight version of BERT, optimized for model size while retaining the same accuracy. ALBERT-Lite is the TensorFlow Lite compatible version of ALBERT, which is 6x smaller than BERT, or 1.5x smaller than MobileBERT, while the latency is on par with BERT.
Benchmark on Pixel 4 CPU, 4 Threads, March 2020
Model hyper parameters: Sequence length 128, Vocab size 30K

Develop model: create models for your dataset without ML expertise

When bringing state-of-the-art research models to TensorFlow Lite, we also want to make it easier for you to customize these models to your own use cases. We are excited to announce TensorFlow Lite Model Maker, an easy-to-use tool that adapts state-of-the-art machine learning models to your dataset with transfer learning. It wraps complex machine learning concepts in an intuitive API, so that everyone can get started without any machine learning expertise. You can train a state-of-the-art image classification model with only four lines of code:

data = ImageClassifierDataLoader.from_folder('flower_photos/')
model = image_classifier.create(data)
loss, accuracy = model.evaluate()
model.export('flower_classifier.tflite', 'flower_label.txt', with_metadata=True)

Model Maker supports many state-of-the-art models that are available on TensorFlow Hub, including the EfficientNet-Lite models. If you want higher accuracy, you can switch to a different model architecture by changing just one line of code while keeping the rest of your training pipeline unchanged.

# EfficientNet-Lite2.
model = image_classifier.create(data, efficientnet_lite2_spec)

# ResNet 50.
model = image_classifier.create(data, resnet_50_spec)

Model Maker currently supports two use cases: image classification (tutorial) and text classification (tutorial), with more computer vision and NLP use cases coming soon.

Develop model: attach metadata for seamless model exchange

The TensorFlow Lite file format has always had the input/output tensor shape in its metadata. This works well when the model creator is also the app developer. However, as the on-device machine learning ecosystem grows, these tasks are increasingly performed by different teams within an organization, or even by different organizations. To facilitate these model knowledge exchanges, we have added new fields to the metadata. They fall into two broad categories:

  1. Machine-readable parameters – e.g. normalization parameters such as mean and standard deviation, category label files. These parameters can be read by other systems so wrapper code can be generated. You can see an example of this in the next section.
  2. Human-readable parameters – e.g. model description, model license. These can provide the app developer using the model with crucial information on how to use it correctly – are there strengths or weaknesses they should be aware of? Fields like the license can also be critical in deciding whether a model can be used. Having this attached to the model significantly lowers the barrier to adoption.

To supercharge this effort, models created by TensorFlow Lite Model Maker and image-related TensorFlow Lite models on TensorFlow Hub already have metadata attached to them. If you are creating your own model, you can attach metadata to make sharing models easier.

# Requires the tflite-support package.
from tflite_support import flatbuffers
from tflite_support import metadata as _metadata
from tflite_support import metadata_schema_py_generated as _metadata_fb

# Creates model info.
model_meta = _metadata_fb.ModelMetadataT()
model_meta.name = "MobileNetV1 image classifier"
model_meta.description = ("Identify the most prominent object in the "
                          "image from a set of 1,001 categories such as "
                          "trees, animals, food, vehicles, person etc.")
model_meta.version = "v1"
model_meta.author = "TensorFlow"
model_meta.license = ("Apache License. Version 2.0 "
                      "http://www.apache.org/licenses/LICENSE-2.0.")

# Describe input and output tensors
# ...

# Writing the metadata to your model
b = flatbuffers.Builder(0)
b.Finish(
    model_meta.Pack(b),
    _metadata.MetadataPopulator.METADATA_FILE_IDENTIFIER)
metadata_buf = b.Output()

populator = _metadata.MetadataPopulator.with_model_file(model_file)
populator.load_metadata_buffer(metadata_buf)
populator.load_associated_files(["your_path_to_label_file"])
populator.populate()

For a complete example of how we populate the metadata for MobileNet v1, please refer to this guide.

Develop app: automatically generate code from model

Instead of copying and pasting error-prone boilerplate code to transform typed objects such as Bitmap into the ByteArray that the TensorFlow Lite interpreter expects, you can let a code generator produce wrapper code that is ready for integration, using the machine-readable parts of the metadata.
You can use our first code generator, built for Android, to generate model wrappers. We are also working on integrating this tool into Android Studio.

Develop app: discover performance with the benchmark and profiling tools

Once a model is created, we would like to check how it performs on mobile devices. TensorFlow Lite provides benchmark tools to measure model performance. We have added support for running benchmarks with all runtime options, including running models on the GPU or other supported hardware accelerators, specifying the number of threads, and more. You can also get an inference latency breakdown down to the granularity of a single operation, to identify the most time-consuming operations and optimize your model inference.
After integrating a model into your application, you may encounter other performance issues, for which you may resort to platform-provided performance profiling tools. For example, on Android you can investigate performance issues with various tracing tools. We have launched a TensorFlow Lite performance tracing module on Android that helps you look into TensorFlow Lite internals. It is installed by default in our nightly release. With tracing, you can find out whether there is resource contention during inference. Please refer to our documentation to learn more about how to use the module in the context of the Android benchmark tool.
We will continue working on improving TensorFlow Lite performance tooling to make it more intuitive and more helpful for measuring and tuning TensorFlow Lite performance on various devices.
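As a quick sanity check before reaching for the full benchmark tools, you can also time inference directly from Python with the TensorFlow Lite interpreter. A rough sketch follows; the model path is a placeholder and the random input assumes a float model. For rigorous numbers, use the benchmark tools described above.

import time
import numpy as np
import tensorflow as tf

# Rough latency check with the Python TF Lite interpreter (placeholder model path).
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
dummy_input = np.random.rand(*input_detail["shape"]).astype(np.float32)

latencies = []
for _ in range(50):
    interpreter.set_tensor(input_detail["index"], dummy_input)
    start = time.perf_counter()
    interpreter.invoke()
    latencies.append((time.perf_counter() - start) * 1000)

print("median latency: %.1f ms" % np.median(latencies))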

Deploy: easily scale to multiple platforms

Nowadays, most applications need to support multiple platforms. That’s why we built TensorFlow Lite to work seamlessly across platforms: Android, iOS, Raspberry Pi, and other Linux-based IoT devices. All TensorFlow Lite models work out-of-the-box on any officially supported platform, so that you can focus on creating good models instead of worrying about how to adapt them to different platforms.
Each platform has its own hardware accelerators that can be used to speed up model inference. TensorFlow Lite already supports running models on NNAPI for Android and on the GPU for both iOS and Android. We are excited to add more hardware accelerators:

  • On Android, we have added support for the Qualcomm Hexagon DSP, which is available on millions of devices. This enables developers to leverage the DSP on older Android devices (below Android 8.1) where the Android NN API is unavailable.
  • On iOS, we have launched CoreML delegate to allow running TensorFlow Lite models on Apple’s Neural Engine.

In addition, we have continued to improve performance on existing supported platforms, as you can see from the graph below comparing performance between May 2019 and February 2020. You only need to upgrade to the latest version of the TensorFlow Lite library to benefit from these improvements.

Pixel 4 – Single Threaded CPU, February 2020

Future work

Over the coming months, we will work on supporting more use cases and improving the developer experience:

  • Continuously release up-to-date state-of-the-art on-device models, including better support for BERT-family models for NLP tasks and new vision models.
  • Publish new tutorials and examples demonstrating more use cases, including how to use C/C++ APIs for inference on mobile.
  • Enhance Model Maker to support more tasks including object detection and several NLP tasks. We will add BERT support for NLP tasks, such as question and answer. This will empower developers without machine learning expertise to build state-of-the-art NLP models through transfer learning.
  • Expand the metadata and codegen tools to support more use cases, including object detection and more NLP tasks.
  • Launch more platform integration for even easier end-to-end experience, including better integration with Android Studio and TensorFlow Hub.

Feedback

We are committed to continuing to improve TensorFlow Lite, and we look forward to seeing what you have built with it, as well as hearing your feedback. Share your use cases with us directly or on Twitter with the hashtags #TFLite and #PoweredByTF. To report bugs and issues, please reach out to us on GitHub.

Acknowledgements

Thanks to Amy Jang, Andrew Selle, Arno Eigenwillig, Arun Venkatesan, Cédric Deltheil, Chao Mei, Christiaan Prins, Denny Zhou, Denis Brulé, Elizabeth Kemp, Hoi Lam, Jared Duke, Jordan Grimstad, Juho Ha, Jungshik Jang, Justin Hong, Hongkun Yu, Karim Nosseir, Khanh LeViet, Lawrence Chan, Lei Yu, Lu Wang, Luiz Gustavo Martins, Maxime Brénon, Mia Roh, Mike Liang, Mingxing Tan, Renjie Liu, Sachin Joglekar, Sarah Sirajuddin, Sebastian Goodman, Shiyu Hu, Shuangfeng Li, Sijia Ma, Tei Jeong, Tian Lin, Tim Davis, Vojtech Bardiovsky, Wei Wei, Wouter van Oortmerssen, Xiaodan Song, Xunkai Zhang, YoungSeok Yoon, Yuqi Li, Yi Zhou, Zhenzhong Lan, Zhiqing Sun and more.

Introducing TensorFlow Videos for a Global Audience: Spanish


Posted by the TensorFlow Team

When the TensorFlow YouTube channel launched in 2018, we had a vision to inform and inspire developers around the world about what was possible with machine learning. With series like Coding TensorFlow showing how you can use it, and Made with TensorFlow sharing inspirational stories about what people have done with TensorFlow, and much more, the channel has grown greatly. But we learned an important lesson: machine learning is a global phenomenon, and to reach the world effectively, we should provide some of our best content in multiple languages, presented by native speakers. Check out the popular Zero to Hero series in Spanish!

Machine Learning with TensorFlow: Zero to Hero

It seems you can’t open a browser, newspaper, or book without seeing something related to machine learning or AI. There is a lot of information and a lot of hype. With that in mind, Laurence Moroney from the TensorFlow team wanted to produce a four-video series, from the developer’s perspective, about what machine learning really is. It is based on his popular Google I/O 2019 talk and is titled “Machine Learning: From Zero to Hero with TensorFlow”.

Here is the first video, where you will learn that machine learning represents a new paradigm in programming: instead of programming explicit rules in a language such as Java or C++, you build a system that is trained on data to infer the rules itself. But what does ML actually look like? Here you will see a basic Hello World example of how to build an ML model, introducing ideas we will apply in later episodes to a more interesting challenge: computer vision.

In the second video, you will learn about computer vision by teaching a computer to see and recognize different objects. You can also practice and try the example yourself here: https://goo.gle/34cHkDk

In the third video, we discuss convolutional neural networks and why they are so powerful in computer vision applications. A convolution is a filter that passes over an image, processes it, and identifies features that show similarity across the image. In this video you will see how they work, processing an image to see if you can extract features from it! You can also try a codelab here: http://bit.ly/2lGoC5f

Here is the fourth and final video, where you will learn how to build an image classifier for rock, paper, and scissors. In episode one, we talked about the game of rock, paper, scissors and discussed how difficult it could be to write code to detect and classify them. As the episodes progressed into machine learning, we learned how to build neural networks, from detecting patterns in raw pixels, to classifying them, to detecting features using convolutions. In this episode, we put into practice everything from the first three parts of the series. Colab notebook: http://bit.ly/2lXXdw5. Rock, paper, scissors dataset: http://bit.ly/2kbV92O

We hope you enjoy this series, and let us know if you would like to see more!