Galaxy Zoo: Classifying Galaxies with Crowdsourcing and Active Learning

A guest article by Mike Walmsley, University of Oxford

The way we do science is changing; there’s exponentially more data every day but around the same number of scientists. The traditional approach of collecting data samples, looking through them, and drawing some conclusions about each one is often inadequate.

One solution is to deploy algorithms to process the data automatically. Another solution is to deploy more eyeballs: recruit members of the public to join in and help. I work on the intersection between the two – combining crowdsourcing and machine learning to do better science than with either alone.

In this article, I want to share how I’ve been using crowdsourcing and machine learning to investigate how galaxies evolve by classifying millions of galaxy images. Along the way, I’ll share some techniques we use to train CNNs that make predictions with uncertainty. I’ll also explain how to use those predictions to do active learning: labelling only the data which would best help you improve your models.

Better Telescopes, Bigger Problems

Ever since Edwin Hubble's observations in the 1920s, astronomers have looked up at galaxies and tried to classify them into different types – smooth galaxies, spiral galaxies, and so on. But the number of galaxies kept on climbing. About 20 years ago, a grad student named Kevin Schawinski sat at his desk with a pile of 900,000 galaxy pictures, put his head in his hands and thought – "there has to be a better way" than classifying every one himself. He wasn't alone. To classify all 900,000 galaxies without sacrificing Kevin's sanity, a team of scientists (including Kevin) built Galaxy Zoo.

Galaxy Zoo is a website that asks members of the public to classify galaxies for us. We show you a galaxy, and we ask simple questions about what you can see, like – is the galaxy smooth, or featured? As you answer, we lead you down a decision tree where the questions depend on how you’ve previously responded.

The Galaxy Zoo UI. Check it out, and join in with the science, here.

Since launch, hundreds of thousands of volunteers have classified millions of galaxies – advancing our understanding of supermassive black holes, spiral arms, the births and deaths of stars, and much more. However, there’s a problem: humans don’t scale. Galaxy surveys keep getting bigger, but we will always have about the same number of volunteers. The latest space-based telescopes can image hundreds of millions of galaxies – far more than we could ever label with crowdsourcing alone.

To keep up, we used TensorFlow to build a galaxy classifier. Other researchers have used the responses we’ve collected to train convolutional neural networks (CNNs) – a type of deep learning model tailored for image recognition. However, traditional CNNs have a drawback; they don’t easily handle uncertainty.

Training a CNN to solve a regression problem by predicting a value for each label and minimising the mean squared error, as is common, implicitly assumes that all labels are equally uncertain – which is definitely not the case for Galaxy Zoo. Further, the CNN only gives a ‘best guess’ answer with no error bars – making it difficult to draw scientific conclusions.

In our paper, we use Bayesian CNNs for morphology classification. Bayesian CNNs provide two key improvements:

  1. They account for varying uncertainty when learning from volunteer responses.
  2. They predict full posteriors over the morphology of each galaxy.

Using our Bayesian CNN, we can learn from noisy labels and make reliable predictions (with error bars) for hundreds of millions of galaxies.

How Bayesian Convolutional Neural Networks Work

There are two key steps to creating our Bayesian CNNs.
1. Predict the parameters of a probability distribution, not the label itself
Training neural networks is much like any other fitting problem: you tweak the model to match the observations. If you are equally confident in all your collected labels, you can just minimise the difference (e.g. mean squared error) between your predictions and the observed values. However, for Galaxy Zoo, we are far more confident in some labels than in others.
If I observe that, for some galaxy, 30% of volunteers say “bar”, my confidence in that 30% depends heavily on how many people replied – was it 4 or 40? Instead, we predict the probability that a typical volunteer will say “Bar”, and minimise how surprised we should be given the total number of volunteers who replied.
This way, our model understands that errors on galaxies where many volunteers replied are worse than errors on galaxies where few volunteers replied – letting it learn from every galaxy.
In our case, we can model our surprise with the Binomial distribution by recognising that k “Bar” responses from N volunteers is much like k successes from N independent trials.

loss = tf.reduce_mean(binomial_loss(labels, scalar_predictions))

Here, `binomial_loss` calculates the surprise (negative log likelihood) of the observed labels given our model predictions. In TensorFlow, we can calculate this with:

def binomial_loss(observations, est_prob_success):
    one = tf.constant(1., dtype=tf.float32)
    # small constant to avoid calculating log(0)
    epsilon = tf.keras.backend.epsilon()

    # multiplication in tf requires floats
    k_successes = tf.cast(observations[:, 0], tf.float32)
    n_trials = tf.cast(observations[:, 1], tf.float32)

    # binomial negative log likelihood, dropping (fixed) combinatorial terms
    return -(k_successes * tf.math.log(est_prob_success + epsilon)
             + (n_trials - k_successes) * tf.math.log(one - est_prob_success + epsilon))

2. Use Dropout to Pretend to Train Many Networks
Our model now makes probabilistic predictions, but what if we had trained a different model? It would make slightly different probabilistic predictions. To be Bayesian, we need to marginalise over the possible models we might have trained. To do this, we use dropout.
At train time, dropout reduces overfitting by “approximately combining exponentially many different neural network architectures efficiently” (Srivastava 2014). This approximates the Bayesian approach of treating the network weights as random variables to be marginalised over. By also applying dropout at test time, we can exploit this idea of approximating many models to also make Bayesian predictions (Gal 2016).
Here’s a TF 2.0 example using the Subclassing API:

from tensorflow.keras import layers, Model

class SimpleClassifier(Model):

    def __init__(self):
        super(SimpleClassifier, self).__init__()
        self.conv1 = layers.Conv2D(32, 3, activation='relu')
        self.flatten = layers.Flatten()
        self.d1 = layers.Dense(128, activation='relu')
        self.dropout1 = layers.Dropout(rate=0.5)
        self.d2 = layers.Dense(2, activation='softmax')

    def call(self, x, training):
        x = self.conv1(x)
        x = self.flatten(x)
        x = self.d1(x)
        if training:  # dropout typically applied only at train time
            x = self.dropout1(x)
        return self.d2(x)

Switching on test-time dropout actually involves less code:

    def call(self, x):  # no 'training' argument required
        x = self.conv1(x)
        x = self.flatten(x)
        x = self.d1(x)
        x = self.dropout1(x)  # dropout always on
        return self.d2(x)

Below, you can see our Bayesian CNN in action. Each row is a galaxy (shown to the left). In the central column, our CNN makes a single probabilistic prediction (the probability that a typical volunteer would answer “Bar”). We can interpret that as a posterior for the probability that k of N volunteers would say “Bar” – shown in black. On the right, we marginalise over many CNNs using dropout. Each CNN posterior (grey) is different, but we can marginalise over them to get the posterior over many CNNs (green) – our Bayesian posterior.

Left: input images of galaxies, with or without a bar. Center: single probabilistic predictions (i.e. without dropout) for how many volunteers would say “Bar”. Right: many probabilistic predictions made with different dropout masks (grey), marginalised into our approximate Bayesian posterior (green).

The Bayesian posterior does an excellent job of quantifying whether each galaxy has a bar. Read more about it in the paper (and check out the code).
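To make that marginalisation concrete, here is a minimal sketch of how you might approximate the Bayesian posterior with Monte Carlo dropout. The `model` is assumed to be the test-time-dropout classifier above, returning the predicted probability that a volunteer says "Bar" for each image; the choice of 30 forward passes (and 40 volunteers) is arbitrary.

import numpy as np
from scipy.stats import binom

def bayesian_posterior(model, images, n_volunteers=40, n_samples=30):
    """Approximate the posterior over volunteer votes by marginalising over dropout masks."""
    votes = np.arange(n_volunteers + 1)
    # Each forward pass uses a fresh dropout mask, i.e. approximately a different CNN
    rho = np.stack([np.squeeze(model(images).numpy()) for _ in range(n_samples)])  # (n_samples, n_images)
    # Binomial posterior over k "Bar" responses for each sampled CNN (the grey curves)
    per_cnn = binom.pmf(votes[None, None, :], n_volunteers, rho[..., None])
    # Marginalise over the sampled CNNs (the green curve): our approximate Bayesian posterior
    return per_cnn.mean(axis=0)  # shape (n_images, n_volunteers + 1)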

Active Learning

Modern surveys will image hundreds of millions of galaxies – more than we can show to volunteers. Given that, which galaxies should we classify with volunteers, and which by our Bayesian CNN?
Ideally, we would only show volunteers the images that the model would find most informative. The model should be able to ask, "Hey, these galaxies would be really helpful to learn from; can you label them for me please?" Then the humans would label them and the model would retrain – this is active learning. In our experiments, applying active learning reduces the number of galaxies needed to reach a given performance level by 35-60%.
We can use our posteriors to work out which galaxies are most informative. Remember that we use dropout to approximate training many models (see above). We show in the paper that informative galaxies are galaxies where those models confidently disagree.
Why? We often hold our strongest opinions where we are least informed – and so do our CNNs (Hendrycks 2016). Without a basis in evidence, different CNNs will often disagree confidently.

Formally, informative galaxies are galaxies where each model is confident (the entropy H of each model's posterior, p(votes|weights), is low) but the average prediction over all the models is uncertain (the entropy of the posterior averaged over all models is high). This is only possible because we think about labels probabilistically and approximate training many models. For more, see Houlsby, N. (2014) and Gal 2017, or our code for an implementation.
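As a rough illustration (not the paper's exact implementation), here is how that acquisition score can be computed from the same Monte Carlo dropout samples as above, treating each question as a binary "Bar"/"not Bar" choice:

import numpy as np

def mutual_information(rho, eps=1e-12):
    """Score galaxies by how confidently the dropout-sampled CNNs disagree.

    rho: array of shape (n_samples, n_images), each row being one sampled CNN's
    predicted probability that a volunteer says "Bar".
    """
    def binary_entropy(p):
        return -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))

    # Entropy of the prediction averaged over all sampled CNNs (high when they disagree)
    predictive_entropy = binary_entropy(rho.mean(axis=0))
    # Average entropy of each individual CNN's prediction (low when each is confident)
    expected_entropy = binary_entropy(rho).mean(axis=0)
    return predictive_entropy - expected_entropy

# e.g. send the highest-scoring galaxies to volunteers, retrain, and repeat:
# priority = np.argsort(mutual_information(rho))[::-1]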
What galaxies are informative? Exactly the galaxies you would intuitively expect.

  • The model strongly prefers diverse featured galaxies over ellipticals (smooth ‘blobs’).
  • For identifying bars, the model prefers galaxies which are better resolved (lower redshift).

This selection is completely automatic. Indeed, I didn’t realise the lower redshift preference until I looked at the images!

Our active learning system selects galaxies on the left (featured and diverse) over those on the right (smooth ‘blobs’).

Active learning is picking galaxies to label right now on Galaxy Zoo – check it out here by selecting the ‘Enhanced’ workflow. I’m excited to see what science can be done as we move from classifying hundreds of thousands of galaxies to hundreds of millions.
If you’d like to know more or you have any questions, get in touch in the comments or on Twitter (@mike_w_ai, @chrislintott, @yaringal, @OATML_Oxford).
Cheers,
Mike
*Dropout is an imperfect approximation of a fully Bayesian approach: it is feasible for large vision models, but it may underestimate uncertainty. It's possible to make better approximations, especially for small models. Check out this post by the TensorFlow Probability team showing how to do this for one-dimensional regression.

BigTransfer (BiT): State-of-the-art transfer learning for computer vision

Posted by Jessica Yung and Joan Puigcerver

In this article, we’ll walk you through using BigTransfer (BiT), a set of pre-trained image models that can be transferred to obtain excellent performance on new datasets, even with only a few examples per class.

ImageNet-pretrained ResNet50s are a current industry standard for extracting representations of images. With our BigTransfer (BiT) paper, we share models that perform significantly better across many tasks, and transfer well even when using only a few images per dataset.

You can find BiT models pre-trained on ImageNet and ImageNet-21k on TensorFlow Hub as TensorFlow 2 SavedModels that you can easily use as Keras layers. There are a variety of sizes, ranging from a standard ResNet50 to a ResNet152x4 (152 layers deep, 4x wider than a typical ResNet50), for users with larger computational and memory budgets but higher accuracy requirements.

Figure 1: The x-axis shows the number of images used per class, ranging from 1 to the full dataset. On the plots on the left, the curve in blue above is our BiT-L model, whereas the curve below is a ResNet-50 pre-trained on ImageNet (ILSVRC-2012).

In this tutorial, we show how to load one of our BiT models and either (1) use it out-of-the-box or (2) fine-tune it to your target task for higher accuracy. Specifically, we demonstrate using a ResNet50 trained on ImageNet-21k.

What is Big Transfer (BiT)?

Before we get into the details of how to use the models, how did we train models that transfer well to many tasks?

Upstream training

The essence is in the name – we effectively train large architectures on large datasets. Before our paper, few papers had seen significant benefits from training on larger public datasets such as ImageNet-21k (14M images, 10x larger than the commonly-used ImageNet). The components we distilled for training models that transfer well are:

Big datasets
The best performance across our models increases as the dataset size increases.

Big architectures
We show that in order to make the most out of big datasets, one needs large enough architectures. For example, training a ResNet50 on JFT (which has 300M images) does not always improve performance relative to training the ResNet50 on ImageNet-21k (14.8M images), but we consistently see improvements when training larger models like a ResNet152x4 on JFT as opposed to ImageNet-21k (Figure 2 below).

Figure 2: The effect of larger upstream datasets (x-axis) and model size (bubble size/colour) on performance on downstream tasks (ILSVRC, Pets, and Flowers). Using larger datasets or larger models alone may hurt performance – both need to be increased in tandem.

Long pre-training time
We also show that it’s important to train for long enough when pre-training on larger datasets. It’s standard to train on ImageNet for 90 epochs, but if we train on a larger dataset such as ImageNet-21k for the same number of steps (and then fine-tune on ImageNet), the performance is worse than if we’d trained on ImageNet directly.

GroupNorm and Weight Standardisation
Finally, we use GroupNorm combined with Weight Standardisation instead of BatchNorm. Since our models are large, we can only fit a few images on each accelerator (e.g. GPU or TPU chip). However, BatchNorm performs worse when the number of images on each accelerator is too low. GroupNorm does not have this problem, but on its own it has not been shown to scale well to large total batch sizes. When we combine GroupNorm with Weight Standardisation, however, we see that it scales well to large batch sizes, even outperforming BatchNorm.

Downstream fine-tuning

Moreover, downstream fine-tuning is cheap in terms of data efficiency and compute – our models attain good performance with only a few examples per class on natural images. We also designed a hyperparameter configuration which we call ‘BiT-HyperRule’ that performs fairly well on many tasks without the need to do an expensive hyperparameter sweep.

BiT-HyperRule: our hyperparameter heuristic
As alluded to above, this is not a hyperparameter sweep – given a dataset, it specifies one set of hyperparameters that we’ve seen produce good results. You can often obtain better results by running a more expensive hyperparameter sweep, but BiT-HyperRule is an effective way of getting good initial results on your dataset.

In BiT-HyperRule, we use SGD with an initial learning rate of 0.003, momentum 0.9, and batch size 512. During fine-tuning, we decay the learning rate by a factor of 10 at 30%, 60% and 90% of the training steps.

As data preprocessing, we resize the image, take a random crop, and then do a random horizontal flip (details in Table 1). We do random crops and horizontal flips for all tasks except those where such actions destroy label semantics. For example, we don’t apply random crops to counting tasks, or random horizontal flips to tasks where we’re meant to predict the orientation of an object (Figure 3).

Table 1: Downstream resizing and random cropping details. If images are larger, we resize them to a larger fixed size to benefit from fine-tuning on higher resolution.
Figure 3: CLEVR count example: Here the task is to count the number of small cylinders or red objects in the image. We would not apply a random crop since that may crop out objects we would like to count, but we apply a random horizontal flip since that doesn't change the number of objects we care about in the image (and thus does not change the label). Image attribution: CLEVR count example by Johnson et al.
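As a rough sketch of that preprocessing for a single training example, something like the following would do; the resize and crop sizes below are placeholders rather than the values from Table 1, and the [0, 1] scaling matches what BiT expects.

import tensorflow as tf

def preprocess_train(image, label):
    image = tf.image.convert_image_dtype(image, tf.float32)  # scale pixel values to [0, 1]
    image = tf.image.resize(image, [512, 512])                # placeholder size; see Table 1
    image = tf.image.random_crop(image, [480, 480, 3])        # placeholder crop size
    image = tf.image.random_flip_left_right(image)            # skip for orientation-sensitive tasks
    return image, label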

We determine the schedule length and whether or not to use MixUp (Zhang et al., 2018, illustrated in Figure 4) according to the dataset size (Table 2).

Figure 4: MixUp takes pairs of examples and linearly combines the images and labels. These images are taken from the dataset tf_flowers.
Table 2: Details on downstream schedule length and when we use MixUp.
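To make the MixUp step concrete, here is a minimal sketch; the Beta(0.1, 0.1) mixing distribution is our assumption for illustration, and the labels are assumed to be one-hot encoded so they can be mixed linearly.

import numpy as np
import tensorflow as tf

def mixup(images, one_hot_labels, alpha=0.1):
    """Linearly combine random pairs of examples and their one-hot labels."""
    # assumes `images` is a float tensor (e.g. already scaled to [0, 1])
    lam = np.random.beta(alpha, alpha)  # mixing coefficient for this batch
    indices = tf.random.shuffle(tf.range(tf.shape(images)[0]))
    mixed_images = lam * images + (1.0 - lam) * tf.gather(images, indices)
    mixed_labels = lam * one_hot_labels + (1.0 - lam) * tf.gather(one_hot_labels, indices)
    return mixed_images, mixed_labels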

We determined these hyperparameter heuristics based on empirical results. We explain our method and describe our results in more detail in our paper and in our Google AI blog post.

Tutorial

Now let’s actually fine-tune one of these models! You can follow along by running the code in this colab.

1) Load the pre-trained BiT model

You can download one of our BiT models pre-trained on ImageNet-21k from TensorFlow Hub. The models are saved as SavedModels. Loading them is very simple:

import tensorflow_hub as hub
# Load model from TFHub into KerasLayer
model_url = "https://tfhub.dev/google/bit/m-r50x1/1"
module = hub.KerasLayer(model_url)

2) Use BiT out-of-the-box

If you don’t yet have labels for your images (or just want to have some fun), you may be interested in using the model out-of-the-box, i.e. without fine-tuning it. For this, we will use a model fine-tuned on ImageNet so it has the interpretable ImageNet label space of 1k classes. Many common objects are not covered, but it gives a reasonable idea of what is in the image.

# use the model (imagenet_module is a separate BiT model fine-tuned on ImageNet, loaded in the colab)
logits = imagenet_module(image)

Note that BiT models take inputs with values between 0 and 1.
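For example, preparing a single image before calling the module might look like the sketch below; the file name and the 224x224 resize are placeholder choices of ours, and `imagenet_module` is the ImageNet-fine-tuned BiT model loaded in the colab.

import tensorflow as tf

raw = tf.io.read_file('my_image.jpg')                    # placeholder file name
image = tf.image.decode_jpeg(raw, channels=3)
image = tf.image.convert_image_dtype(image, tf.float32)  # scales pixel values to [0, 1]
image = tf.image.resize(image, [224, 224])               # resize choice is ours, not required by BiT
image = tf.expand_dims(image, axis=0)                    # add a batch dimension

logits = imagenet_module(image)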

In the colab, you can load an image from an URL and see what the model predicts:

> show_preds(preds, image[0])
Image from PikRepo

Here the model pre-trained on ImageNet correctly classifies the photo as an elephant. It also rates an Indian elephant as more likely than an African elephant because of the size of its ears. In the colab, we also predict on an image from the dataset we're going to fine-tune on, tf_flowers, which has also been used in other tutorials. Note that the correct label 'tulip' is not a class in ImageNet, so the model cannot predict that at the moment – let's see what it tries to do instead:

The model predicts a reasonably similar-looking class, ‘bell pepper’.

3) Fine-tune BiT on your task

Now, we are going to fine-tune the BiT model so it performs better on a specific dataset. Here we are going to use Keras for simplicity, and we are going to fine-tune the model on a dataset of flowers (tf_flowers). We will use the model we loaded at the start (i.e. the one pre-trained on ImageNet-21k) so that it is less biased towards a narrow subset of classes.

There are two steps:

  1. Create a new model with a new final layer (called the ‘head’)
  2. Fine-tune this model using BiT-HyperRule, our hyperparameter heuristic. We described this in detail earlier in the ‘Downstream fine-tuning’ section of the post.

To create the new model, we:

  1. Cut off the BiT model’s original head. This leaves us with the “pre-logits” output.
    • We do not have to do this if we use the ‘feature extraction’ models, since for those models the head has already been cut off.
  2. Add a new head with the number of outputs equal to the number of classes of our new task. Note that it is important that we initialise the head to all zeroes.
import tensorflow as tf

class MyBiTModel(tf.keras.Model):
    """BiT with a new head."""

    def __init__(self, num_classes, module):
        super().__init__()

        self.num_classes = num_classes
        self.head = tf.keras.layers.Dense(num_classes, kernel_initializer='zeros')
        self.bit_model = module

    def call(self, images):
        # No need to cut the head off since we are using a feature extractor model
        bit_embedding = self.bit_model(images)
        return self.head(bit_embedding)

model = MyBiTModel(num_classes=5, module=module)

When we fine-tune the model, we use BiT-HyperRule, our heuristic for choosing hyperparameters for downstream fine-tuning which we described earlier. We also code our heuristic in full in the colab.

# Define optimiser and loss

# Decay learning rate by a factor of 10 at SCHEDULE_BOUNDARIES.
lr = 0.003
SCHEDULE_BOUNDARIES = [200, 300, 400, 500]
lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=SCHEDULE_BOUNDARIES,
    values=[lr, lr * 0.1, lr * 0.01, lr * 0.001, lr * 0.0001])
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)

To fine-tune the model, we use the simple Keras model.fit API:

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer,
              loss=loss_fn,
              metrics=['accuracy'])

# Fine-tune model
model.fit(
    pipeline_train,
    batch_size=512,
    steps_per_epoch=10,
    epochs=50,
    validation_data=pipeline_test)

We see that our model attains 95% validation accuracy within 20 steps, and attains over 98% validation accuracy after fine-tuning using BiT-HyperRule.
4) Save the fine-tuned model for later use
It is easy to save your model to use later on. You can then load your saved model in exactly the same way as we loaded the BiT models at the start.

# Save fine-tuned model as SavedModel
export_module_dir = '/tmp/my_saved_bit_model/'
tf.saved_model.save(model, export_module_dir)

# Load saved model
saved_module = hub.KerasLayer(export_module_dir, trainable=True)

Voila – we now have a model that predicts tulips as tulips and not bell peppers.

Summary

In this post, you learned about the key components you can use to train models that can transfer well to many different tasks. You also learned how to load one of our BiT models, fine-tune it on your target task and save the resulting model. Hope this helped and happy fine-tuning!
Acknowledgements
This blog post is based on work by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly and Neil Houlsby. We thank many members of Brain Research Zurich and the TensorFlow team for their feedback, especially Luiz Gustavo Martins, André Susano Pinto, Marcin Michalski, Josh Gordon, Martin Wicke, Daniel Keysers, Amélie Royer, Basil Mustafa, and Mario Lučić.

How Hugging Face achieved a 2x performance boost for Question Answering with DistilBERT in Node.js

A guest post by Hugging Face: Pierric Cistac, Software Engineer; Victor Sanh, Scientist; Anthony Moi, Technical Lead.

Hugging Face 🤗 is an AI startup with the goal of contributing to Natural Language Processing (NLP) by developing tools to improve collaboration in the community, and by being an active part of research efforts.

Because NLP is a difficult field, we believe that solving it is only possible if all actors share their research and results. That’s why we created 🤗 Transformers, a leading NLP library with more than 2M downloads and used by researchers and engineers across many companies. It allows the amazing international NLP community to quickly experiment, iterate, create and publish new models for a variety of tasks (text/token generation, text classification, question answering…) in a variety of languages (English of course, but also French, Italian, Spanish, German, Turkish, Swedish, Dutch, Arabic and many others!) More than 300 different models are available today through Transformers.

While Transformers is very handy for research, we are also working hard on the production aspects of NLP, looking at and implementing solutions that can ease its adoption everywhere. In this blog post, we’re going to showcase one of the paths we believe can help fulfill this goal: the use of “small”, yet performant models (such as DistilBERT), and frameworks targeting ecosystems different from Python such as Node via TensorFlow.js.

The need for small models: DistilBERT

One of the areas we're interested in is "low-resource" models that achieve close to state-of-the-art results while being a lot smaller and a lot faster to run. That's why we created DistilBERT, a distilled version of BERT: it has 40% fewer parameters and runs 60% faster, while preserving 97% of BERT's performance as measured on the GLUE language understanding benchmark.

NLP models through time, with their number of parameters

To create DistilBERT, we’ve been applying knowledge distillation to BERT (hence its name), a compression technique in which a small model is trained to reproduce the behavior of a larger model (or an ensemble of models), demonstrated by Hinton et al.

In the teacher-student training, we train a student network to mimic the full output distribution of the teacher network (its knowledge). Rather than training with a cross-entropy over the hard targets (one-hot encoding of the gold class), we transfer the knowledge from the teacher to the student with a cross-entropy over the soft targets (probabilities of the teacher). Our training loss thus becomes:

L_ce = − Σᵢ tᵢ · log(sᵢ), with tᵢ the probabilities obtained from the teacher logits t and sᵢ the probabilities obtained from the student logits s.
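Written here in TensorFlow for consistency with the other snippets in this post (the actual DistilBERT training code lives elsewhere), a minimal sketch of this soft-target cross-entropy might look like the following; the temperature used to soften the distributions, as in Hinton et al.'s setup, is illustrative.

import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened distribution and the student's."""
    soft_targets = tf.nn.softmax(teacher_logits / temperature)           # t_i
    student_log_probs = tf.nn.log_softmax(student_logits / temperature)  # log(s_i)
    return -tf.reduce_sum(soft_targets * student_log_probs, axis=-1)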

Our student is a small version of BERT in which we removed the token-type embeddings and the pooler (used for the next-sentence classification task). We kept the rest of the architecture identical while halving the number of layers, taking one layer out of every two and leveraging the common hidden size between student and teacher. We trained DistilBERT on very large batches using gradient accumulation (up to 4,000 examples per batch), with dynamic masking, and removed the next-sentence prediction objective.

With this, we were then able to fine-tune our model on the specific task of Question Answering. To do so, we used the BERT-cased model fine-tuned on SQuAD 1.1 as a teacher with a knowledge distillation loss. In other words, we distilled a question answering model into a language model previously pre-trained with knowledge distillation! That’s a lot of teachers and students: DistilBERT-cased was first taught by BERT-cased, and then “taught again” by the SQuAD-finetuned BERT-cased version in order to get the DistilBERT-cased-finetuned-squad model.

This results in very interesting performance given the size of the network: our DistilBERT-cased fine-tuned model reaches an F1 score of 87.1 on the dev set – less than 2 points behind the full BERT-cased fine-tuned model (88.7 F1)!

If you’re interested in learning more about the distillation process, you can read our dedicated blog post.

The need for a language-neutral format: SavedModel

Using the previous process, we end up with a 240MB Keras file (.h5) containing the weights of our DistilBERT-cased-squad model. In this format, the architecture of the model resides in an associated Python class. But our final goal is to be able to use this model in as many environments as possible (Node.js + TensorFlow.js for this blog post), and the TensorFlow SavedModel format is perfect for this: it’s a “serialized” format, meaning that all the information necessary to run the model is contained into the model files. It is also a language-neutral format, so we can use it in Python, but also in JS, C++, and Go.

To convert to SavedModel, we first need to construct a graph from the model code. In Python, we can use tf.function to do so:

import tensorflow as tf
from transformers import TFDistilBertForQuestionAnswering

distilbert = TFDistilBertForQuestionAnswering.from_pretrained('distilbert-base-cased-distilled-squad')
callable = tf.function(distilbert.call)

Here we passed our Keras model's call function to tf.function. What we get in return is a callable that we can in turn use to trace our call function with a specific signature and shapes, thanks to get_concrete_function:

concrete_function = callable.get_concrete_function(
    [tf.TensorSpec([None, 384], tf.int32, name="input_ids"),
     tf.TensorSpec([None, 384], tf.int32, name="attention_mask")])

By calling get_concrete_function, we trace-compile the TensorFlow operations of the model for an input signature composed of two Tensors of shape [None, 384], the first one being the input ids and the second one the attention mask.

Then we can finally save our model to the SavedModel format:

tf.saved_model.save(distilbert, 'distilbert_cased_savedmodel', signatures=concrete_function)

A conversion in 4 lines of code, thanks to TensorFlow! We can check that our resulting SavedModel contains the correct signature by using the saved_model_cli tool:

$ saved_model_cli show --dir distilbert_cased_savedmodel --tag_set serve --signature_def serving_default

Output:

The given SavedModel SignatureDef contains the following input(s):
  inputs['attention_mask'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 384)
      name: serving_default_attention_mask:0
  inputs['input_ids'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 384)
      name: serving_default_input_ids:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['output_0'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 384)
      name: StatefulPartitionedCall:0
  outputs['output_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 384)
      name: StatefulPartitionedCall:1
Method name is: tensorflow/serving/predict

Perfect! You can play with the conversion code yourself by opening this colab notebook. We are now ready to use our SavedModel with TensorFlow.js!

The need for ML in Node.js: TensorFlow.js

Here at Hugging Face we strongly believe that in order to reach its full adoption potential, NLP has to be accessible in other languages that are more widely used in production than Python, with APIs simple enough to be used by software engineers without a Ph.D. in Machine Learning; one of those languages is obviously Javascript.

Thanks to the API provided by TensorFlow.js, interacting with the SavedModel we created previously in Node.js is very straightforward. Here is a slightly simplified version of the Typescript code in our NPM Question Answering package:

const model = await tf.node.loadSavedModel(path); // Load the model located in path

const result = tf.tidy(() => {
  // ids and attentionMask are of type number[][]
  const inputTensor = tf.tensor(ids, undefined, "int32");
  const maskTensor = tf.tensor(attentionMask, undefined, "int32");

  // Run model inference
  return model.predict({
    // "input_ids" and "attention_mask" correspond to the names specified in the
    // signature passed to get_concrete_function during the model conversion
    "input_ids": inputTensor, "attention_mask": maskTensor
  }) as tf.NamedTensorMap;
});

// Extract the start and end logits from the tensors returned by model.predict
const [startLogits, endLogits] = await Promise.all([
  result["output_0"].squeeze().array() as Promise<number[]>,
  result["output_1"].squeeze().array() as Promise<number[]>
]);

tf.dispose(result); // Clean up memory used by the result tensor since we don't need it anymore

Note the use of the very helpful TensorFlow.js function tf.tidy, which takes care of automatically cleaning up intermediate tensors like inputTensor and maskTensor while returning the result of the model inference.

How do we know we need to use "output_0" and "output_1" to extract the start and end logits (beginning and end of the possible spans answering the question) from the result returned by the model? We just have to look at the output names indicated by the saved_model_cli command we ran previously after exporting to SavedModel.

The need for a fast and easy-to-use tokenizer: 🤗 Tokenizers

Our goal while building our Node.js library was to make the API as simple as possible. As we just saw, running model inference once we have our SavedModel is quite simple, thanks to TensorFlow.js. Now, the most difficult part is passing the data in the right format to the input ids and attention mask tensors. What we collect from a user is usually a string, but the tensors require arrays of numbers: we need to tokenize the user input.

Enter 🤗 Tokenizers: a performant library written in Rust that we’ve been working on at Hugging Face. It allows you to play with different tokenizers such as BertWordpiece very easily, and it works in Node.js too thanks to the provided bindings:

const tokenizer = await BertWordPieceTokenizer.fromOptions({
  vocabFile: vocabPath, lowercase: false
});

tokenizer.setPadding({ maxLength: 384 }); // 384 matches the shape of the signature input provided while exporting to SavedModel

// Here question and context are in their original string format
const encoding = await tokenizer.encode(question, context);
const { ids, attentionMask } = encoding;

That’s it! In just 4 lines of code, we are able to convert the user input to a format we can then use to feed our model with TensorFlow.js.

The Final Result: Powerful Question Answering in Node.js

Thanks to the powers of the SavedModel format, TensorFlow.js for inference, and Tokenizers for tokenization, we’ve reached our goal to offer a very simple, yet very powerful, public API in our NPM package:

import { QAClient } from "question-answering"; // If using Typescript or Babel
// const { QAClient } = require("question-answering"); // If using vanilla JS

const text = `
Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season.
The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.
As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.
`;

const question = "Who won the Super Bowl?";

const qaClient = await QAClient.fromOptions();
const answer = await qaClient.predict(question, text);

console.log(answer); // { text: 'Denver Broncos', score: 0.3 }

Powerful? Yes! Thanks to the native support for the SavedModel format in TensorFlow.js, we get very good performance: here is a benchmark comparing our Node.js package and our popular transformers Python library, running the same DistilBERT-cased-squad model. As you can see, we achieve a 2x speed gain! Who said Javascript was slow?

Short texts are texts between 500 and 1000 characters, long texts are between 4000 and 5000 characters. You can check the Node.js benchmark script here (the Python one is equivalent). Benchmark run on a standard 2019 MacBook Pro running on macOS 10.15.2.

It’s a very interesting time for NLP: big models such as GPT2 or T5 keep getting better and better, and research on how to “minify” those good but heavy and costly models is also getting more and more traction, with distillation being one technique among others. Adding to the equation tools that allow big developer communities to be part of the revolution (such as TensorFlow.js with the Javascript ecosystem), only makes the future of NLP more exciting and more production-ready than ever!

For further reading, feel free to check out our GitHub repositories: https://github.com/huggingface

TFRT: A new TensorFlow runtime

Posted by Eric Johnson, TFRT Product Manager and Mingsheng Hong, TFRT Tech Lead/Manager

TensorFlow aims to make it easy for you to build and deploy ML models across many different devices. Yet, what it means to “build and deploy ML models” is not static and continues to change with increased investment in the ML ecosystem.

At the top-half of the TensorFlow stack, innovation is leading to more complex models and deployment scenarios. Researchers are inventing new algorithms that require more compute, while application developers are enhancing their products with these new techniques across edge and server.

At the bottom-half of the stack, the tension from increasing compute needs and rising compute costs due to the ending of Moore’s law has sparked a proliferation of new hardware aimed at specific ML use cases. Traditional chip makers, startups, and software companies alike (including Google) have invested in specialized silicon.

The result is that the needs of the ML ecosystem are vastly different than they were 4 or 5 years ago when TensorFlow was first created. Of course, we've continued to iterate with the release of 2.x, but the current TensorFlow stack is optimized for graph execution, and incurs non-trivial overhead when dispatching a single op. A high-performance, low-level runtime is key to enabling the trends of today and empowering the innovations of tomorrow.

Enter TFRT, a new TensorFlow RunTime. It aims to provide a unified, extensible infrastructure layer with best-in-class performance across a wide variety of domain specific hardware. It provides efficient use of multithreaded host CPUs, supports fully asynchronous programming models, and focuses on low-level efficiency.

TFRT will benefit a broad range of users, including:

  • Researchers looking for faster iteration time and better error reporting when developing complex new models in eager mode.
  • Application developers looking for improved performance when training and serving models in production.
  • Hardware makers looking to integrate edge and datacenter devices into TensorFlow in a modular way.

What is TFRT?

TFRT is a new runtime that will replace the existing TensorFlow runtime. It is responsible for efficient execution of kernels – low-level device-specific primitives – on targeted hardware. It plays a critical part in both eager and graph execution, which is illustrated by this simplified diagram of the TensorFlow training stack:

TFRT’s role in graph and eager execution within the TensorFlow training stack

Note that everything in grey is part of TFRT. In eager execution, TensorFlow APIs call directly into the new runtime. In graph execution, your program’s computational graph is lowered to an optimized target-specific program and dispatched to TFRT. In both execution paths, the new runtime invokes a set of kernels that call into the underlying hardware devices to complete the model execution, as shown by the black arrows.

Key design points

Whereas the existing TensorFlow runtime was initially built for graph execution and training workloads, the new runtime will make eager execution and inference first-class citizens, while putting special emphasis on architecture extensibility and modularity. More specifically, TFRT has the following selected design highlights:

  • To achieve higher performance, TFRT has a lock-free graph executor that supports concurrent op execution with low synchronization overhead, and a thin eager op dispatch stack so that eager API calls will be asynchronous and more efficient.
  • To make extending the TF stack easier, we decoupled device runtimes from the host runtime, the core TFRT component that drives host CPU and I/O work.
  • To get consistent behavior, TFRT leverages common abstractions, such as shape functions and kernels, across both eager and graph.

The power of MLIR

TFRT is also tightly-integrated with MLIR. For example:

  • TFRT utilizes MLIR’s compiler infrastructure to generate an optimized, target-specific representation of your computational graph that the runtime executes.
  • TFRT uses MLIR’s extensible type system to support arbitrary C++ types in the runtime, which removes tensor-specific limitations.

Together, TFRT and MLIR will improve TensorFlow’s unification, flexibility, and extensibility.

Initial Results

Early performance results from the inference and serving use case are encouraging. As part of a benchmarking study for TensorFlow Dev Summit 2020, we integrated TFRT with TensorFlow Serving and measured the latency of sending requests to the model and getting prediction results back. We picked a common MLPerf model, ResNet-50, and chose a batch size of 1 and a data precision of FP16 to focus our study on runtime related op dispatch overhead. In comparing performance of GPU inference over TFRT to the current runtime, we saw an improvement of 28% in average inference time. These early results are strong validation for TFRT, and we expect it to provide a big boost to performance. We hope you are as excited as we are!

What’s next

TFRT is being integrated with TensorFlow, and will be enabled initially through an opt-in flag, giving the team time to fix any bugs and fine-tune performance. Eventually, it will become TensorFlow’s default runtime. Although it is still an early stage project, we have made the GitHub repository available to the community. We are limiting contributions to begin with, but encourage participation in the form of requirements and design discussions.

To learn more, please check out our Dev Summit 2020 presentation, where we first introduced TFRT to the world, and our MLIR Open Design Deep Dive presentation, where we provided a detailed overview of TFRT’s core components, low-level abstractions, and general design principles. And finally, if you want to keep up with all things TFRT, please join our new mailing list. Thanks!


AI for Medicine Specialization featuring TensorFlow

Posted by Laurence Moroney, AI Advocate

I'm excited to share an important aspect of the TensorFlow community: educators and domain experts teaching developers how to use machine learning technology to solve important problems in a variety of scenarios, including health care. To this end, deeplearning.ai and Coursera have launched an "AI for Medicine" specialization using TensorFlow.
Nothing excites our team more than seeing how others are using TensorFlow to solve real-world problems. With this three-course specialization, introduced by Andrew Ng and taught by Pranav Rajpurkar, we hope to widen access so that more people can understand the needs of medical machine learning problems.

Deeplearning.ai and Coursera have designed a specialization that is divided into three courses. The first, Machine Learning for Medical Diagnosis, will take you through some hypothetical machine learning scenarios for diagnosing medical issues. In the first week, you'll explore scenarios like detecting skin cancer, eye disease and histopathology. You'll get hands-on experience writing TensorFlow code that uses convolutional neural networks to examine images – which can be used, for example, to identify different conditions in an X-ray.

The course does require some knowledge of TensorFlow, using techniques such as convolutional neural networks, transfer learning, natural language processing, and more. I recommend that you take the TensorFlow: In Practice specialization to understand the coding skills behind it, and the Deep Learning Specialization to go deeper into how the underlying technology works. Another great resource to learn the techniques used in this course is the book “Hands on Machine Learning with SciKit-Learn, Keras and TensorFlow” by Aurelien Geron.

One of the things I really enjoyed about the course is the balance of medical terminology and using common machine learning techniques from TensorFlow, such as data augmentation, to improve your models. Note: all of the data used in the course is de-identified.

Exercises from Rajpurkar’s and Ng’s course: Using image augmentation to extend the effective size of a dataset.
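As a generic illustration of that kind of augmentation in Keras (not code from the course, and with arbitrary settings), you might write something like:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation settings; a real medical imaging pipeline would choose
# transformations that preserve diagnostic meaning.
datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    samplewise_center=True,
    samplewise_std_normalization=True)

train_generator = datagen.flow_from_directory(
    'train/',                 # hypothetical directory of labelled training images
    target_size=(320, 320),
    batch_size=32,
    class_mode='binary')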

The course continues with evaluation techniques, such as isolating the key metrics and understanding how to interpret confidence intervals accurately.

The first course ends with another deep dive into image processing, this time using segmentation on MRI images, and wraps up with a programming assignment on brain tumor auto-segmentation.

The second course in the specialization will be on Machine Learning for Medical Prognosis where you learn to build models to predict future patient health. You’ll learn techniques to extract data from reports such as a patient’s health metrics, history, and demographics to predict their risk of a major event such as a heart attack.

The third, and final, course will be on Machine Learning for Medical Treatment, where models may be used to assist in medical care by predicting the potential effect of a treatment on a patient. It will also cover machine learning for text, so that you can use NLP techniques to extract information from radiography reports – for example, to generate labels or to build the basis for a bot that answers medical questions.

In the words of Andrew Ng, “Even if your current work is not in medicine, I think you will find the application scenarios and the practice of these scenarios to be really useful, and maybe this specialization will inspire you to get more interested in medicine”.

The specialization is available at Coursera, and like all courses can be audited for free. You can learn more about deeplearning.ai at their website, and about TensorFlow at tensorflow.org.

What’s new in TensorFlow Lite from DevSummit 2020

Posted by Khanh LeViet, Developer Advocate on behalf of the TensorFlow Lite team
Edge devices, such as smartphones, have become more powerful each year and enable an increasing number of on-device machine learning use cases. TensorFlow Lite is the official framework for running TensorFlow model inference on edge devices. It runs on more than 4 billion active devices globally, on various platforms, including Android, iOS, and Linux-based IoT devices, and on bare metal microcontrollers.

We continue to push the limits of on-device machine learning with TensorFlow Lite by making it faster and easier to use. In this blog post, we highlight recent TensorFlow Lite features that were launched within the past six months, leading up to the TensorFlow Dev Summit in March 2020.

Pushing the limits of on-device machine learning

Enabling state-of-the-art models

Machine learning is a fast-moving field with new models that break the state-of-the-art records every few months. We put a lot of effort into making these state-of-the-art models run well on TensorFlow Lite. As examples, we now support EfficientNet-Lite (paper), a family of image classification models, MobileBERT (paper), and ALBERT-Lite (paper), a light-weight version of BERT (paper) that supports multiple NLP (natural language processing) tasks. Check out some of the performance benchmarks below.

EfficientNet-Lite

EfficientNet-Lite is a family of image classification models that achieve state-of-the-art accuracy with an order of magnitude fewer computations and parameters. The models are optimized for TensorFlow Lite with quantization, resulting in faster inference with negligible accuracy loss, and they can run on the CPU, GPU, or Edge TPU. Find out more in our blog post.

Figure: Integer-only quantized models running on Pixel 4 CPU with 4 threads.

MobileBERT and ALBERT-Lite

MobileBERT and ALBERT-Lite are the optimized versions of the popular BERT model that achieved state-of-the-art accuracy on a range of NLP tasks, including question and answer, natural language inference, and others. MobileBERT is about 4x faster and smaller than BERT and retains similar accuracy. Meanwhile, ALBERT-Lite is 6x smaller than BERT, but slower than MobileBERT.

Figure: Pixel 4, Float32 question & answer models, CPU 4 threads

New TensorFlow Lite converter

We launched a new converter that enables more models and improves the developer experience:

  • Enables conversion of new classes of models, including DeepSpeech V2, Mask R-CNN, Mobile BERT, MobileNetSSD, and many more
  • Adds support for functional control flow (enabled by default in TensorFlow 2.x)
  • Tracks original TensorFlow node names and Python code and exposes them during conversion if errors occur
  • Leverages MLIR, Google’s cutting edge compiler technology for ML, which makes it easier to troubleshoot conversion errors and extend to accommodate feature requests

The new converter is fully backward compatible and is enabled by default since TensorFlow 2.2, while the old converter is still available via a flag. See the documentation for more details.
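For reference, invoking the converter in TensorFlow 2.x looks like the minimal sketch below (the SavedModel path is a placeholder); the MLIR-based converter is used automatically.

import tensorflow as tf

# Convert a SavedModel (placeholder path) to TensorFlow Lite
converter = tf.lite.TFLiteConverter.from_saved_model('/tmp/my_saved_model')
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)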

Quantization-aware-training support for Keras

Quantization-aware training (QAT) enables you to train and deploy models with the performance and size benefits of quantization – it makes your model about 4x smaller and faster to run, while retaining accuracy. You can add QAT with one line of code.

import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([
...
])
# Quantize the entire model.
quantized_model = tfmot.quantization.keras.quantize_model(model)

# Continue with training as usual.
quantized_model.compile(...)
quantized_model.fit(...)

Here is how QAT stacks up against the original float model and post-training quantization. Find out more in our blog post.
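For comparison, post-training quantization needs no retraining at all; a minimal sketch, assuming you already have a trained Keras `model`:

import tensorflow as tf

# Post-training (dynamic range) quantization of an existing Keras model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()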

Faster inference across platforms

Better CPU performance

Improving CPU performance has been a major priority for the team and we’ve shipped a number of substantial CPU-related performance optimizations in recent months. As part of this effort, we developed an optimized matrix multiplication library (ruy), built from the ground up to deliver better performance on the classes of CPU hardware and models typically used in mobile environments. As of TensorFlow 1.15, this library is enabled by default for all ARM devices and has helped deliver latency improvements anywhere from 1.2x to 5x across an extremely broad range of models and use cases.

Pixel 4 – Single Threaded CPU, February 2020

We have some additional CPU optimizations scheduled to ship in the TensorFlow 2.3 release, including ~40% faster execution of models with post-training weight quantization, as well as a new highly optimized floating-point convolutional kernel library (XNNPACK) that delivers 20-50% faster execution across all of the key floating-point convolutional models supported by TensorFlow Lite.

Faster inference with new hardware accelerator delegates

TensorFlow Lite is truly cross-platform: you can train a model once and get optimal performance on every supported platform. In the past few months, we have added support for running inference on Qualcomm's Hexagon DSPs, Apple's Core ML, and Android GPUs with OpenCL.
Hexagon DSPs are microprocessors that can be found on millions of modern Android phones using Qualcomm Snapdragon SoCs. The new TensorFlow Lite Hexagon delegate leverages the DSP to achieve performance gains in the range of 3-25x for models like MobileNet and Inceptionv3 compared to CPU, while being more power efficient than both CPU and GPU. Learn more in our blog post.
Core ML is the machine learning framework available on Apple’s devices and provides the API to run ML models on Apple’s Neural Engine. The new TensorFlow Lite Core ML delegate allows running TensorFlow Lite models on Core ML and Neural Engine, if available, to achieve faster inference with better power consumption efficiency. On iPhone XS and newer devices, where Neural Engine is available, we have observed performance gains from 1.3x to 11x on various computer vision models. More details can be found in our blog post.
OpenCL is a framework for writing programs that execute across heterogeneous platforms. We recently added support for OpenCL to the TensorFlow Lite GPU delegate, achieving approximately 4-6x speed-up over CPU and approximately 2x speed-up over OpenGL on a variety of computer vision models. Here is a snapshot of the OpenCL backend performance on Pixel 4.

Android performance bottleneck profiler

TensorFlow Lite on Android supports instrumented logging of internal events, including ops invocations, that can be tracked by Android’s system tracing. The new profiling data allows you to identify performance bottlenecks.
Here are some examples of insights that you can get from the profiler and potential solutions to improve performance:

  • If the number of available CPU cores is smaller than the number of inference threads, then the CPU scheduling overhead can lead to subpar performance. You can reschedule other CPU intensive tasks in your application to avoid overlapping with your model inference or tweak the number of interpreter threads.
  • If the operators are not fully delegated, then some parts of the model graph are executed on the CPU rather than the expected hardware accelerator. You can substitute the unsupported operators with similar supported operators.

This feature is available now in TensorFlow Lite Android library nightly build. More details can be found here.

Make ML easier to use

Model creation with no ML expertise

TensorFlow Lite Model Maker enables you to adapt state-of-the-art machine learning models to your dataset with transfer learning. It shortens the learning curve for developers new to machine learning by wrapping the complex machine learning concepts with an intuitive API. For example, you can train a state-of-the-art image classification model with only four lines of code.

data = ImageClassifierDataLoader.from_folder('flower_photos/')
model = image_classifier.create(data)
loss, accuracy = model.evaluate()
model.export('flower_classifier.tflite', 'flower_label.txt', with_metadata=True)

Model Maker supports many state-of-the-art models that are available on TensorFlow Hub, including the EfficientNet-Lite models mentioned above. It currently supports image classification (tutorial) and text classification (tutorial) with more computer-vision and NLP use cases coming soon.

Model sharing made easy with metadata

Traditionally, running inference with TensorFlow Lite means working with the raw tensors. This presented two hurdles:

  1. The consumer of the TensorFlow Lite model will need to know exactly what the tensor shape means (e.g. 1 x 224 x 224 x 3). Is it a bitmap? If so, is it in red, blue, and green channels or some other scheme? This poses a problem if the team creating the model is not the same team consuming it.
  2. The need to use a lot of error-prone boilerplate code to convert from high-level data types, such as Bitmap to an RGB float array or a ByteArray, before it can be used.

To solve the first problem, we added support for model metadata to TensorFlow Lite, allowing model creators to describe the input and output of their model using typed objects. In addition to basic information, such as the size of the bitmap or color channels, we also included information, such as mean and standard deviation, to communicate to the model consumers, so that the appropriate normalization can be applied.
To solve the second problem of boilerplate code, we created the Android code generator, which reads the TensorFlow Lite metadata and creates the appropriate wrapper code to resize, normalize, and convert to and from ByteArray. This means you can now interact with the TensorFlow Lite model using high-level objects that you are familiar with:

// 1. Initializing the Model
MyClassifierModel myImageClassifier = new MyClassifierModel(activity);

// 2. Setting the input with a Bitmap called inputBitmap
MyClassifierModel.Inputs inputs = myImageClassifier.createInputs();
inputs.loadImage(inputBitmap);

// 3. Running the model
MyClassifierModel.Outputs outputs = myImageClassifier.run(inputs);

// 4. Retrieving the result
Map<String, Float> labeledProbability = outputs.getProbability();

This is currently an experimental feature and only supports image-based models. We added metadata support to most TensorFlow Lite vision models on TensorFlow Hub and the Image Classifier Model Maker. Going forward, the project is expanding in three ways:

  1. Support input types beyond images to enable more use-cases
  2. Build an Android Studio plugin that makes this even easier to use
  3. Add iOS support

More sample and learning materials

We launched two online courses on Coursera and Udacity to provide a structured learning path for TensorFlow Lite. Both courses are four weeks long and teach how to use TensorFlow Lite on Android, iOS, and IoT devices.
We released new sample apps demonstrating how to use pretrained models, including style transfer, question and answer and more. We love to see the engagement in the TensorFlow Lite community. Recently, one member collected pretrained models, samples, and tutorials created by the community and curated them on GitHub. Feel free to contribute!

Better support for microcontrollers

Official support for Arduino

TensorFlow Lite for Microcontrollers is now available as an official Arduino library, which makes it easy to deploy speech detection to an Arduino Nano in under 5 minutes.

More TensorFlow Lite for Microcontrollers optimizations

We are working with leading industry partners who are writing optimized implementations of TensorFlow Lite for Microcontrollers kernels for their hardware architectures. For example, Cadence announced their support for TensorFlow Lite for Microcontrollers on their Tensilica HiFi DSP family.

How Google is using TensorFlow Lite

TensorFlow Lite is used extensively within Google in many of our key products, including YouTube, Google Assistant, and Google Photos.
The Google Lens team shared how they migrated from a server-based model to a client-based on-device model to improve the user experience.

The Live Perception team showed how to build a machine learning pipeline to process live camera feed in real-time.

What’s next

We have new features and improvements coming in a few months:

  • XNNPACK integration for highly optimized floating-point model execution. This will significantly speed up CPU inference across platforms.
  • New state-of-the-art on-device models, an updated guide, and examples demonstrating more use cases, including how to use the native C/C++ APIs for inference on mobile.
  • Additional tools for trimming binary size based on the ops used in client models, reducing the size impact on client apps.
  • Enhancements to Model Maker to support more tasks, such as object detection and NLP. We are adding BERT support to enable new NLP tasks like question and answer, which will empower developers without ML expertise to build state-of-the-art NLP models through transfer learning.
  • Expansion of the metadata and codegen tools to support more use cases, including object detection and other NLP-related tasks, and better integration with Android Studio.

To see the longer-term TensorFlow Lite product roadmap, please check out our website.

Optimizing style transfer to run on mobile with TFLite


Posted by Khanh LeViet and Luiz Gustavo Martins, Developer Advocates

Neural style transfer is an optimization technique used to take two images, a content image (such as a building) and a style image (such as artwork by an iconic painter), and blend them together so the output image looks like the content image “painted” in the style of the reference image. Today, we are excited to share a pre-trained style transfer TensorFlow Lite model that is optimized for mobile, along with Android and iOS sample apps that use the model to stylize any image.

In this article, we will walk you through the journey of optimizing a large TensorFlow model for mobile deployment, and show how to use it efficiently in a mobile app with TensorFlow Lite. We hope that you can use our pre-trained style transfer model or leverage our insights for your own use cases.

Background

An example of style transfer

Style transfer was first published in A Neural Algorithm of Artistic Style. The original technique, however, was computationally expensive, and it could take several seconds to stylize an image even on high-end GPUs. Subsequent work by several authors (for example) showed how to speed up style transfer.

After evaluating several model architectures, we decided to start with a pre-trained arbitrary style transfer model from Magenta for our sample app. The model can take any content and style image as input, then use a feedforward neural network to generate a stylized output image. This model allows much faster style transfer than the technique in Gatys’s paper, but it is still quite large (44 MB) and slow (2340 ms on a Pixel 4 CPU). Therefore, we needed to optimize the model to make it suitable for use in mobile applications. This article shares our experience doing so, with resources you can take advantage of in your own work.

Optimizing the model architecture

The structure of our style transfer model

Magenta’s arbitrary style transfer model consists of two subnetworks:

  • Style prediction network: converts the style image to a style embedding vector.
  • Style transform network: applies the style embedding vector on the content image to generate a stylized image.

Magenta’s style prediction network has an InceptionV3 backbone, which we replaced with a MobileNetV2 backbone that is optimized for mobile. The style transform network consists of several convolution layers. We applied the width-multiplier idea from MobileNet, scaling down the number of output channels of all convolution layers by a factor of 4.
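As a rough illustration of the width-multiplier idea, here is a minimal sketch; the layer below is purely illustrative and not the actual Magenta architecture.

import tensorflow as tf

# Illustrative only: shrink a convolution's output channels by a fixed factor.
WIDTH_FACTOR = 4  # assumption: the same factor is applied to every convolution layer

def slim_conv2d(x, full_width_filters, kernel_size, strides=1):
    """A convolution whose channel count is scaled down by WIDTH_FACTOR."""
    return tf.keras.layers.Conv2D(
        filters=max(full_width_filters // WIDTH_FACTOR, 1),
        kernel_size=kernel_size,
        strides=strides,
        padding="same",
        activation="relu")(x)
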
Then, we had to decide how to train our model. We experimented with multiple options: training the mobile model from scratch, or distilling it from Magenta’s pre-trained model. We found that fixing the weights of MobileNetV2 while optimizing the other parameters from scratch gave the best result.
We were able to achieve a similar level of style and content loss, while significantly shrinking and speeding up the model.

* Benchmarked on Pixel 4 CPU using TensorFlow Lite with 2 threads, April 2020.
* See this paper for more details about the definition of the loss function used in this style transfer model.

Quantization

Once we had settled on the model architecture, we continued to shrink our mobile model further with quantization, using the TensorFlow Model Optimization Toolkit. This is an important technique that is applicable to most mobile deployments of TensorFlow models, as it can shrink the model size by up to 4x and speed up inference with an insignificant quality trade-off.
Among the quantization options that TensorFlow provides, we decided to use post-training integer quantization because it has the right balance of simplicity and model quality. We only needed to provide a small portion of our training dataset when converting the TensorFlow model to TensorFlow Lite.
After quantization, our model is more than an order of magnitude smaller and faster than the original model, while maintaining the same level of style and content loss.
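For reference, post-training integer quantization looks roughly like the sketch below for a single-input model; the SavedModel path and calibration_images are placeholders, and for our two networks the representative inputs would need to match each network’s own signature.

import tensorflow as tf

# A sketch of post-training integer quantization (placeholders: the SavedModel
# path and `calibration_images`, a small sample of the training data).
def representative_dataset():
    for image in calibration_images[:100]:
        yield [tf.expand_dims(image, 0)]   # one float32 input per call

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()

with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)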

* Benchmarked on Pixel 4 CPU using TensorFlow Lite with 2 threads, April 2020.

Deployment to mobile

We implemented an Android app to demonstrate how to use the style transfer model. The app takes a style image and a content image, and outputs an image that mixes the style and content of the input images.
We use the phone’s camera to capture content images with the Camera2 API, and provide a set of famous paintings to be used as style images. As mentioned above, there are two steps to apply a style to a content image. First, we extract the style as an array of floats using the style prediction network. Then we apply this style to the content image using the style transform network (see the sketch below).
In order to achieve the best performance on both CPU and GPU, we created two sets of TensorFlow Lite models, each optimized for its target chip. We use the int8 quantized model for CPU inference, and the float16 quantized model for GPU inference. The GPU generally achieves better performance than the CPU, but it currently only supports float models, which are larger than int8 quantized models. Here is how the int8 and float16 models perform.

* Benchmarked on Pixel 4 using TensorFlow Lite, April 2020.
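
In Python, the two-step flow looks roughly like the sketch below. The file names, style_image, and content_image are placeholders, and the input ordering of the transform network is an assumption, so check the model’s input details in practice. The Android sample does the equivalent in Kotlin.

import tensorflow as tf

# A sketch of the two-step style transfer flow with two TF Lite interpreters.
# `style_image` and `content_image` are preprocessed float32 arrays (placeholders).
predict = tf.lite.Interpreter(model_path="style_predict.tflite")
transform = tf.lite.Interpreter(model_path="style_transform.tflite")
predict.allocate_tensors()
transform.allocate_tensors()

# 1. Style image -> style embedding (an array of floats).
predict.set_tensor(predict.get_input_details()[0]["index"], style_image)
predict.invoke()
style_embedding = predict.get_tensor(predict.get_output_details()[0]["index"])

# 2. Content image + style embedding -> stylized image.
inputs = transform.get_input_details()
transform.set_tensor(inputs[0]["index"], content_image)   # assumption: content image first
transform.set_tensor(inputs[1]["index"], style_embedding)
transform.invoke()
stylized_image = transform.get_tensor(transform.get_output_details()[0]["index"])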

Another possible performance gain is to cache the results of the style prediction network if you only plan to support a fixed set of style images in your mobile app. This also makes your app smaller, as you do not need to include the style prediction network, which accounts for 91% of the total network size. This is the main reason why the process is split into two models instead of one.
The sample can be found on GitHub, and the main class applying the style is StyleTransferModelExecutor.
It is important not to run style transfer on the UI thread, as it is computationally expensive. We instead use the ViewModel class from AndroidX and a Coroutine to run it on a dedicated background thread and easily update the view. Also, when running a model with the GPU delegate, TF Lite interpreter initialization, GPU delegate initialization, and inference all have to run on the same thread.

Style transfer in production

The Google Arts & Culture app recently added Art Transfer, which uses TensorFlow Lite to run style transfer on-device. The model used is very similar to the one above, but prioritizes quality over speed and model size. Try it out if you are interested in seeing style transfer in production.

Your Turn

If you want to add style transfer to your own app, you can start by downloading the mobile sample. Both model versions, the float16 (predict network, transform network) and the int8 quantized version (predict network, transform network), are available on TensorFlow Hub. We can’t wait to see what you create! Don’t forget to share your creations with us.

Resources

Running machine learning models on-device has the benefit of keeping users’ data private while enabling features with low latency.
In this post, we have shown that directly converting a TensorFlow model to TensorFlow Lite might be just the first step. To achieve good performance, developers should optimize their model with quantization and find the right trade-off between model quality, model size, and inference time.
We used the resources below to create our model. They might also be applicable to your on-device machine learning use cases.

  • Magenta model repository
    Magenta is an open source project powered by TensorFlow. It uses machine learning to make music and art. There are many models that can be converted to TensorFlow Lite, including this style transfer model.
  • TensorFlow Model Optimization Toolkit
    Model Optimization Toolkit provides multiple methods to optimize your model, including quantization and pruning.
  • TensorFlow Lite delegates
    TensorFlow Lite can leverage many different types of hardware accelerator available on devices, including GPUs and DSPs, to speed up model inference.


Introducing the new TensorFlow Profiler


Posted by Anirudh Sriram, Technical Writer, and Gal Oshri, Product Manager

Performance is a key consideration of successful ML research and production solutions. Faster model training leads to faster iterations and reduced overhead. It is sometimes an essential requirement to make a particular ML solution feasible.

However, it is not always clear what should be optimized. Is there an issue with a specific operation (op), or the input pipeline?

To help answer this, we have developed an extensive set of tools for TensorFlow performance profiling. Beyond the ability to capture and investigate numerous aspects of a profile, the tools offer guidance on how to resolve performance bottlenecks (e.g. input-bound programs).

These tools are used by low-level experts improving TensorFlow’s infrastructure, as well as engineers in Google’s most popular products to optimize their model performance. We want to enable the broader community to take advantage of the tools used at Google for performance profiling. That is why we recently open sourced the new TensorFlow Profiler.

TensorFlow Profiler overview page

What is the TensorFlow Profiler?

The TensorFlow Profiler (or the Profiler) provides a set of tools that you can use to measure the training performance and resource consumption of your TensorFlow models. This new version of the Profiler is integrated into TensorBoard, and builds upon existing capabilities such as the Trace Viewer.

The Profiler has the following new profiling tools available:

  • Overview Page: Provides a top-level view of model performance and recommendations to optimize performance
  • Input Pipeline Analyzer: Analyzes your model’s data input pipeline for bottlenecks and recommends changes to improve performance
  • TensorFlow Stats: Displays performance statistics for every TensorFlow operation executed during the profiling session
  • GPU Kernel Stats: Displays performance statistics and the originating operation for every GPU accelerated kernel

Check out the Profiler guide in the TensorFlow documentation to learn more about these tools.

Getting started

The best way to get started with the Profiler is to follow the Colab tutorial here. We will cover a few of the important steps and insights in this blog post. First, we install the Profiler plugin for TensorBoard:

pip install -U tensorboard_plugin_profile

This adds the full Profiler capabilities to our TensorBoard installation. Next, we ensure that our model training captures a profile. In this case, we will use the TensorBoard callback in Keras:

tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=logs, profile_batch='500,510')

We can choose which batches to profile with the profile_batch parameter. This lets us choose the number of steps to capture (no more than 10 is recommended) and skip the first few batches to avoid inaccuracy due to initialization overhead. There are other methods for capturing a profile, described here. We now start TensorBoard with the following command:

tensorboard --logdir {log directory}    # in terminal
%tensorboard --logdir {log directory} # in Colab

After clicking on Profile, we see the overview page. This immediately gives us an indication of our program’s performance. Besides a useful summary, we see a recommendation telling us that our program is input-bound (meaning our accelerator is wasting time waiting for input). This is a really common problem.
By following the instructions in the tutorial, we can bring our average step time from ~30ms to ~3ms. That’s a 10x improvement! While this is a toy example, it is common to hear from engineers and researchers at Google that they managed to improve their performance by significant factors.
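As mentioned above, the Keras callback is not the only way to capture a profile; there is also a programmatic API. A minimal sketch, assuming you have your own train_step function and dataset (both placeholders here):

import tensorflow as tf

# Programmatic profile capture (a sketch; `train_step` and `dataset` are placeholders).
tf.profiler.experimental.start(logs)               # same log directory as above
for step, (x, y) in enumerate(dataset.take(10)):   # keep the capture short
    with tf.profiler.experimental.Trace('train', step_num=step):
        train_step(x, y)
tf.profiler.experimental.stop()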

Recommendations

Performance optimization is an iterative process and can sometimes be frustrating as it is tricky to pinpoint the exact location of the bottlenecks in your program. Not only can the Profiler tell you where your program has bottlenecks, it can often also tell you what you can do to resolve them and make your code execute faster. Following the recommendations provided can shorten the overall time taken to optimize your program.
When you open TensorBoard to view the profiling results, the Overview page provides code optimization recommendations below the Step time graph. One of the most common reasons for slow code execution is an improperly configured data input pipeline. Use the Input Pipeline Analyzer to identify and eliminate bottlenecks in your data input pipeline, and read the best practices section of the Profiler guide to learn about other strategies you can employ to get optimal performance.
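For example, a common fix for an input-bound program is to parallelize preprocessing and overlap the input pipeline with training. A generic sketch, where raw_dataset and preprocess are placeholders rather than code from the tutorial:

import tensorflow as tf

# A generic tf.data pattern for input-bound programs
# (placeholders: `raw_dataset`, `preprocess`).
AUTOTUNE = tf.data.experimental.AUTOTUNE

dataset = (raw_dataset
           .map(preprocess, num_parallel_calls=AUTOTUNE)
           .cache()
           .batch(64)
           .prefetch(AUTOTUNE))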

More resources

Check out these resources to learn more:

What’s next for the TensorFlow Profiler?

In addition to addressing feedback, we are expanding the profiler’s capabilities. A few areas we are currently working on:

  • Memory Profiler: View memory usage over time and the associated op/training step.
  • Keras Analysis: Enable linking the information in the profiler to Keras. This enables, for example, identifying which Keras layers correspond to the ops shown in the trace viewer.
  • Multiworker GPU Analysis: Enable profiling multiple GPU workers and aggregate the results. Analyze the hotspot and the communication across workers.

We are excited to continue bringing the tools used at Google to improve ML performance to the broader community. If there are specific capabilities that would help you the most, or to report a bug, feel free to open an issue here!


How TensorFlow Lite helps you from prototype to product


Posted by Khanh LeViet, Developer Advocate

TensorFlow Lite is the official framework for running inference with TensorFlow models on edge devices. TensorFlow Lite is deployed on more than 4 billion edge devices worldwide, supporting Android, iOS, Linux-based IoT devices, and microcontrollers.

Since its first launch in late 2017, we have been improving TensorFlow Lite to make it robust while keeping it easy to use for all developers – from machine learning experts to mobile developers who have just started learning about machine learning.

In this blog post, we will highlight recent launches that make it easier for you to go from prototyping an on-device use case to deploying it in production.
If you prefer a video format, check out this talk from TensorFlow DevSummit 2020.

Prototype: jump-start with state-of-the-art models

As machine learning is a fast-moving field, it is important to know what is possible with current technologies before investing resources into building a feature. We have a repository of pretrained models and sample applications that implement them, so that you can try out TensorFlow Lite models on real devices without writing any code. Then, you can quickly integrate the models into your application to prototype and test what your user experience will be like before spending time training your own model.

We have published several new pretrained models, including a question & answer model and a style transfer model.

We are also committed to bringing more state-of-the-art models from research teams to TensorFlow Lite. Recently, we enabled three new model architectures: EfficientNet-Lite (paper), MobileBERT (paper), and ALBERT-Lite (paper).

  • EfficientNet-Lite is a novel image classification model that achieves state-of-the-art accuracy with an order of magnitude fewer computations and parameters. It is optimized for TensorFlow Lite, supports quantization with negligible accuracy loss, and is fully supported by the GPU delegate for faster inference. Find out more in our blog post.
    Benchmark on Pixel 4 CPU, 4 Threads, March 2020
  • MobileBERT is an optimized version of the popular BERT (paper) model that achieved state-of-the-art accuracy on a range of NLP tasks, including question and answer, natural language inference and others. MobileBERT is about 4x faster and smaller than BERT but retains similar accuracy.
  • ALBERT is another lightweight version of BERT, optimized for model size while retaining the same accuracy. ALBERT-Lite is the TensorFlow Lite compatible version of ALBERT, which is 6x smaller than BERT, or 1.5x smaller than MobileBERT, while the latency is on par with BERT.
Benchmark on Pixel 4 CPU, 4 Threads, March 2020
Model hyper parameters: Sequence length 128, Vocab size 30K

Develop model: create models for your dataset without ML expertise

When bringing state-of-the-art research models to TensorFlow Lite, we also want to make it easier for you to customize these models to your own use cases. We are excited to announce TensorFlow Lite Model Maker, an easy-to-use tool that adapts state-of-the-art machine learning models to your dataset with transfer learning. It wraps complex machine learning concepts in an intuitive API, so that everyone can get started without any machine learning expertise. You can train a state-of-the-art image classification model with only four lines of code:

data = ImageClassifierDataLoader.from_folder('flower_photos/')
model = image_classifier.create(data)
loss, accuracy = model.evaluate()
model.export('flower_classifier.tflite', 'flower_label.txt', with_metadata=True)

Model Maker supports many state-of-the-art models that are available on TensorFlow Hub, including the EfficientNet-Lite models. If you want higher accuracy, you can switch to a different model architecture by changing just one line of code while keeping the rest of your training pipeline unchanged.

# EfficientNet-Lite2.
model = image_classifier.create(data, efficientnet_lite2_spec)

# ResNet 50.
model = image_classifier.create(data, resnet_50_spec)

Model Maker currently supports two use cases: image classification (tutorial) and text classification (tutorial), with more computer vision and NLP use cases coming soon.

Develop model: attach metadata for seamless model exchange

The TensorFlow Lite file format has always had the input/output tensor shape in its metadata. This works well when the model creator is also the app developer. However, as the on-device machine learning ecosystem grows, these tasks are increasingly performed by different teams within an organization, or even by different organizations. To facilitate these model knowledge exchanges, we have added new fields to the metadata. They fall into two broad categories:

  1. Machine-readable parameters – e.g. normalization parameters such as mean and standard deviation, category label files. These parameters can be read by other systems so wrapper code can be generated. You can see an example of this in the next section.
  2. Human-readable parameters – e.g. model description, model license. These can provide the app developer using the model with crucial information on how to use it correctly – are there strengths or weaknesses they should be aware of? Fields like the license can also be critical in deciding whether a model can be used. Having this attached to the model significantly lowers the barrier to adoption.

To supercharge this effort, models created by TensorFlow Lite Model Maker and image-related TensorFlow Lite models on TensorFlow Hub already have metadata attached to them. If you are creating your own model, you can attach metadata to make sharing models easier.

# Requires the tflite-support package.
from tflite_support import flatbuffers
from tflite_support import metadata as _metadata
from tflite_support import metadata_schema_py_generated as _metadata_fb

# Creates model info.
model_meta = _metadata_fb.ModelMetadataT()
model_meta.name = "MobileNetV1 image classifier"
model_meta.description = ("Identify the most prominent object in the "
                          "image from a set of 1,001 categories such as "
                          "trees, animals, food, vehicles, person etc.")
model_meta.version = "v1"
model_meta.author = "TensorFlow"
model_meta.license = ("Apache License. Version 2.0 "
                      "http://www.apache.org/licenses/LICENSE-2.0.")

# Describe input and output tensors
# ...

# Writing the metadata to your model
b = flatbuffers.Builder(0)
b.Finish(
    model_meta.Pack(b),
    _metadata.MetadataPopulator.METADATA_FILE_IDENTIFIER)
metadata_buf = b.Output()

populator = _metadata.MetadataPopulator.with_model_file(model_file)
populator.load_metadata_buffer(metadata_buf)
populator.load_associated_files(["your_path_to_label_file"])
populator.populate()

For a complete example of how we populate the metadata for MobileNet v1, please refer to this guide.

Develop app: automatically generate code from model

Instead of copying and pasting error-prone boilerplate code to transform typed objects such as Bitmap into the ByteArray that the TensorFlow Lite interpreter expects, you can let a code generator produce wrapper code that is ready for integration, using the machine-readable parts of the metadata.
You can use our first code generator, built for Android, to generate model wrappers. We are also working on integrating this tool into Android Studio.

Develop app: discover performance with the benchmark and profiling tools

Once a model is created, we would like to check how it performs on mobile devices. TensorFlow Lite provides benchmark tools to measure model performance. We have added support for running benchmarks with all runtime options, including running models on the GPU or other supported hardware accelerators, specifying the number of threads, and more. You can also get an inference latency breakdown down to the granularity of a single operation, to identify the most time-consuming operations and optimize your model inference.
After integrating a model into your application, you may encounter other performance issues, for which you may resort to platform-provided performance profiling tools. For example, on Android you can investigate performance issues with various tracing tools. We have launched a TensorFlow Lite performance tracing module on Android that helps you look into TensorFlow Lite internals. It is installed by default in our nightly release. With tracing, you can find out whether there is resource contention during inference. Please refer to our documentation to learn more about how to use the module in the context of the Android benchmark tool.
We will continue working on improving TensorFlow Lite performance tooling to make it more intuitive and more helpful for measuring and tuning TensorFlow Lite performance on various devices.
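As a quick sanity check before reaching for the full benchmark tools, you can also time inference directly from Python with the TensorFlow Lite interpreter. A rough sketch follows; the model path is a placeholder and the random input assumes a float model. For rigorous numbers, use the benchmark tools described above.

import time
import numpy as np
import tensorflow as tf

# Rough latency check with the Python TF Lite interpreter (placeholder model path).
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
dummy_input = np.random.rand(*input_detail["shape"]).astype(np.float32)

latencies = []
for _ in range(50):
    interpreter.set_tensor(input_detail["index"], dummy_input)
    start = time.perf_counter()
    interpreter.invoke()
    latencies.append((time.perf_counter() - start) * 1000)

print("median latency: %.1f ms" % np.median(latencies))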

Deploy: easily scale to multiple platforms

Nowadays, most applications need to support multiple platforms. That’s why we built TensorFlow Lite to work seamlessly across platforms: Android, iOS, Raspberry Pi, and other Linux-based IoT devices. All TensorFlow Lite models work out-of-the-box on any officially supported platform, so that you can focus on creating good models instead of worrying about how to adapt them to different platforms.
Each platform has its own hardware accelerators that can be used to speed up model inference. TensorFlow Lite already supports running models on NNAPI for Android and on the GPU for both iOS and Android. We are excited to add more hardware accelerators:

  • On Android, we have added support for the Qualcomm Hexagon DSP, which is available on millions of devices. This enables developers to leverage the DSP on older Android devices (below Android 8.1) where the Android NN API is unavailable.
  • On iOS, we have launched CoreML delegate to allow running TensorFlow Lite models on Apple’s Neural Engine.

In addition, we have continued to improve performance on existing supported platforms, as you can see from the graph below comparing performance between May 2019 and February 2020. You only need to upgrade to the latest version of the TensorFlow Lite library to benefit from these improvements.

Pixel 4 – Single Threaded CPU, February 2020

Future work

Over the coming months, we will work on supporting more use cases and improving the developer experience:

  • Continuously release up-to-date state-of-the-art on-device models, including better support for BERT-family models for NLP tasks and new vision models.
  • Publish new tutorials and examples demonstrating more use cases, including how to use C/C++ APIs for inference on mobile.
  • Enhance Model Maker to support more tasks including object detection and several NLP tasks. We will add BERT support for NLP tasks, such as question and answer. This will empower developers without machine learning expertise to build state-of-the-art NLP models through transfer learning.
  • Expand the metadata and codegen tools to support more use cases, including object detection and more NLP tasks.
  • Launch more platform integration for even easier end-to-end experience, including better integration with Android Studio and TensorFlow Hub.

Feedback

We are committed to continuing to improve TensorFlow Lite, and we look forward to seeing what you have built with it, as well as hearing your feedback. Share your use cases with us directly or on Twitter with the hashtags #TFLite and #PoweredByTF. To report bugs and issues, please reach out to us on GitHub.

Acknowledgements

Thanks to Amy Jang, Andrew Selle, Arno Eigenwillig, Arun Venkatesan, Cédric Deltheil, Chao Mei, Christiaan Prins, Denny Zhou, Denis Brulé, Elizabeth Kemp, Hoi Lam, Jared Duke, Jordan Grimstad, Juho Ha, Jungshik Jang, Justin Hong, Hongkun Yu, Karim Nosseir, Khanh LeViet, Lawrence Chan, Lei Yu, Lu Wang, Luiz Gustavo Martins, Maxime Brénon, Mia Roh, Mike Liang, Mingxing Tan, Renjie Liu, Sachin Joglekar, Sarah Sirajuddin, Sebastian Goodman, Shiyu Hu, Shuangfeng Li, Sijia Ma, Tei Jeong, Tian Lin, Tim Davis, Vojtech Bardiovsky, Wei Wei, Wouter van Oortmerssen, Xiaodan Song, Xunkai Zhang, YoungSeok Yoon, Yuqi Li, Yi Zhou, Zhenzhong Lan, Zhiqing Sun and more.

Introducing TensorFlow Videos for a Global Audience: Spanish


Posted by the TensorFlow Team

When the TensorFlow YouTube channel launched in 2018, we had a vision to inform and inspire developers around the world about what was possible with machine learning. With series like Coding TensorFlow showing how you can use it, and Made with TensorFlow sharing inspirational stories about what people have done with TensorFlow, and much more, the channel has grown greatly. But we learned an important lesson: machine learning is a global phenomenon, and to reach the world effectively, we should provide some of our best content in multiple languages, presented by native speakers. Check out the popular Zero to Hero series in Spanish!

Machine Learning with TensorFlow: Zero to Hero

It seems you can’t open a browser, newspaper, or book without seeing something related to machine learning or AI. There is a lot of information and a lot of hype. With that in mind, Laurence Moroney from the TensorFlow team wanted to produce a four-video series, from the developer’s perspective, about what machine learning really is. It is based on his popular Google I/O 2019 talk and is titled “Machine Learning: From Zero to Hero with TensorFlow”.

Here is the first video, where you will learn that machine learning represents a new paradigm in programming: instead of programming explicit rules in a language such as Java or C++, you build a system that is trained on data to infer the rules itself. But what does ML actually look like? Here you will see a basic Hello World example of how to build an ML model, introducing ideas we will apply in later episodes to a more interesting challenge: computer vision.

In the second video, you will learn about computer vision by teaching a computer to see and recognize different objects. You can also practice and try the example yourself here: https://goo.gle/34cHkDk

In the third video, we discuss convolutional neural networks and why they are so powerful in computer vision applications. A convolution is a filter that passes over an image, processes it, and identifies features that show similarity across the image. In this video you will see how they work, processing an image to see if you can extract features from it! You can also try a codelab here: http://bit.ly/2lGoC5f

Here is the fourth and final video, where you will learn how to build an image classifier for rock, paper, and scissors. In episode one, we talked about the game of rock, paper, scissors and discussed how difficult it could be to write code to detect and classify them. As the episodes progressed into machine learning, we learned how to build neural networks, from detecting patterns in raw pixels, to classifying them, to detecting features using convolutions. In this episode, we put into practice everything from the first three parts of the series. Colab notebook: http://bit.ly/2lXXdw5. Rock, paper, scissors dataset: http://bit.ly/2kbV92O

We hope you enjoy this series, and let us know if you would like to see more!