Fast Supernovae Detection using Neural Networks

A guest post by Rodrigo Carrasco-Davis & The ALeRCE Collaboration, Millennium Institute of Astrophysics, Chile

Introduction

Astronomy is the study of celestial objects, such as stars, galaxies or black holes. Studying celestial objects is a bit like having a natural physics laboratory – where the most extreme processes in nature occur – and most of these processes cannot be reproduced here on Earth. Observing these extreme events allows us to test and improve our understanding by comparing what we know about physics with what we actually observe in the universe.

There is a particular type of event, very interesting to astronomers, that occurs at the end of the life of massive stars. Stars form when hydrogen is pulled together by gravity, and when the density is high enough, the fusion of hydrogen atoms begins, generating light and creating elements such as helium, carbon, oxygen, neon, etc. The fusion process generates an outward pressure while gravity causes an inward pressure, keeping the star stable while it burns its fuel. This changes when the star tries to fuse iron atoms, which absorbs energy from the star instead of releasing it, causing the core of the star to collapse and a supernova explosion to occur.

Crab Nebula, remnant of a supernova. Space Telescope Science Institute/NASA/ESA/J. Hester/A. Loll (Arizona State University). This image is from hubblesite.org.

This process is very important for astronomers. Due to the extreme conditions during the explosion, astronomers can observe the synthesis of heavy elements, test the behavior of matter under intense pressure and temperature, and also observe the product of the explosion, which could be a neutron star or a black hole.

Supernovae can also be used as standard candles. A typical problem in astronomy is measuring distances to celestial objects. Because stars are very far from the Earth, it is difficult to know whether a star is faint and close to us, or far away and very bright. Most supernova explosions in the universe occur in a similar fashion; therefore, astronomers use supernovae to measure distances, which is important for cosmologists to study, for instance, the expansion of the universe and dark energy.

Even though supernova explosions are very bright (compared to the brightness of their own host galaxy), these events are hard to find due to their distance from Earth, their low occurrence rate (roughly one supernova per galaxy per century), and the transient nature of the explosion, which can last from a few days to a couple of weeks. Also, to obtain useful information from a supernova, it is necessary to perform follow-up, that is, to observe the supernova with an instrument called a spectrograph, measuring the energy emitted during the explosion at multiple frequencies. Early follow-up is desired because many of the interesting physical processes occur within a few hours of the beginning of the explosion. So how can we find these supernova explosions quickly, among all the other observed astronomical objects in the universe?

Astronomy Today

A few decades ago, astronomers had to choose specific objects in the sky and point a telescope at them to study them. Now, modern survey telescopes such as the Zwicky Transient Facility (ZTF), which is currently operating, or the upcoming Vera C. Rubin Observatory, take large images of the sky at a very high rate, observing the visible sky every three days and effectively creating a movie of the sky. Today, the ZTF telescope generates 1.4TB of data per night, identifying and sending information about interesting changing objects in the sky in real time.

When something changes its brightness, these telescopes are able to detect the change and generate an alert. These alerts are sent through a data stream, where each alert contains three cropped images of 63 by 63 pixels, called the science, reference and difference images. The science image is the most recent observation of that particular location. The reference (or template) image is usually taken at the beginning of the survey and is used for comparison against the science image. Everything that changed between the science and reference images should appear in the difference image, which is computed by subtracting the reference from the science image after some image processing. The ZTF telescope currently streams up to one million alerts per night, with roughly one hundred thousand on an average night. Suppose a human wanted to check each alert manually: at 3 seconds per alert, inspecting the alerts of a regular night would take approximately 3.5 days.

Science, reference and difference images, from left to right. Each alert contains these three images, plus extra important data such as observing conditions and information about the object. The fourth image is a colored version from PanSTARRS using the Aladin Sky Atlas. You can see the full evolution of the supernova's brightness in time in the ALeRCE frontend.

Organizing all the incoming alerts in the stream is a massive task. When a new alert arrives, the type of astronomical object that generated it is not necessarily known. Therefore, we need to check whether we already know this object from other observations (cross-match), figure out which kind of astronomical object generated the alert (classification), and lastly, organize the data and make it available to the community. This is the duty of astronomical broker systems such as ALeRCE, Lasair, and Antares.

Since these alerts are basically everything that changes in the sky, we should be able to find supernovae among all the alerts sent by the ZTF telescope. The problem is that other astronomical objects also produce alerts, such as stars that change their brightness (variable stars), active galactic nuclei (AGNs), asteroids, and errors in the measurement (bogus alerts). Fortunately, there are some distinguishable features in the science, reference and difference images that can help us identify which alerts come from supernovae and which come from other objects. We would like to effectively discriminate among these five classes of objects.

Five classes of astronomical objects that can be separated using only the first alert. These are five examples per class, with science, reference and difference image respectively.

In summary, active galactic nuclei tend to occur at the center of galaxies. Supernovae usually occur close to a host galaxy. Asteroids are observed near the solar system plane, and they do not appear in the reference image. Variable stars are found in images crowded with other stars, since they lie mostly within the Milky Way. Bogus alerts have different causes: bad pixels in the camera, bad subtractions when generating the difference image, cosmic rays (very bright, concentrated and sharp regions at the center of the alert image), etc. As I mentioned before, there is no way a human could possibly check every alert by hand, so we need an automatic way to classify them so astronomers can check the sources that are most likely to be a supernova.

Finding Supernovae using Neural Networks

Since we roughly understand the differences between images among the five mentioned classes, in principle we could compute specific features to correctly classify them. However, handcrafting features is usually very hard and takes long periods of trial and error. This is why we decided to train a convolutional neural network (CNN) to solve the classification problem (Carrasco-Davis et al. 2020). In this work, we used only the first alert to quickly find supernovae.

Our architecture provides rotational invariance by making 90° rotated copies of each image in the training set, and then applying average pooling to the dense representation of each rotated version of the image. Imposing rotational invariance in this problem is very helpful, since there is no particular orientation in which structures may appear in the images of the alert (Cabrera-Vives et al. 2017, E. Reyes et al. 2018). We also added part of the metadata contained in the alert, such as the position in sky coordinates, the distance to other known objects, and atmospheric condition metrics. After training the model using cross-entropy, the probabilities were highly concentrated around values of 0 or 1, even in cases where the classifier predicted the wrong class. This is not so convenient when an expert further filters supernova candidates after the model has made a prediction. Saturated values of 0 or 1 give no insight into the chance of a wrong classification, or into the model's second or third most likely class.

Therefore, in addition to the cross-entropy term in the loss function, we added an extra term that maximizes the entropy of the prediction, in order to spread the values of the output probabilities (Pereyra et al. 2017). This improves the granularity of the predictions, producing probabilities across the whole range from 0 to 1 instead of concentrated ones, and making the predictions much more interpretable when assisting the astronomer in choosing good supernova candidates to report for follow-up.
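
As a rough illustration of this idea (not the exact loss used in the paper), a cross-entropy loss with an entropy bonus can be written in a few lines of TensorFlow; the weight beta below is a hypothetical value that would need tuning:

import tensorflow as tf

def cross_entropy_with_entropy_bonus(y_true, y_pred, beta=0.5):
    # y_true: one-hot labels; y_pred: softmax probabilities.
    # beta is a hypothetical weight for the entropy bonus (Pereyra et al. 2017).
    eps = 1e-7
    ce = -tf.reduce_sum(y_true * tf.math.log(y_pred + eps), axis=-1)
    entropy = -tf.reduce_sum(y_pred * tf.math.log(y_pred + eps), axis=-1)
    # Subtracting the entropy term rewards less saturated (higher-entropy) outputs.
    return tf.reduce_mean(ce - beta * entropy)

On a Keras softmax classifier, such a function can be passed directly as the loss in model.compile, since the extra argument has a default value.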

Convolutional neural network with enhanced rotational invariance. Rotated copies for each input are created and fed to the same CNN architecture, to then apply average pooling in the dense layer before concatenating with the metadata. Finally, two other fully connected layers, and a softmax are applied to obtain the predictions.
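
To make the architecture concrete, here is a minimal Keras sketch of the same idea: rotated copies of the stamp share one CNN, their dense representations are average-pooled, and the result is concatenated with the metadata. The layer sizes and the number of metadata features (META_DIM) are placeholders, not the values used in the actual stamp classifier:

import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 5
IMG_SHAPE = (63, 63, 3)   # science, reference and difference stamps as channels
META_DIM = 20             # hypothetical number of metadata features

def build_stamp_classifier():
    image_in = layers.Input(shape=IMG_SHAPE)
    meta_in = layers.Input(shape=(META_DIM,))

    # Shared CNN applied to every rotated copy of the stamp.
    cnn = tf.keras.Sequential([
        layers.Conv2D(32, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation='relu'),
        layers.GlobalAveragePooling2D(),
        layers.Dense(64, activation='relu'),
    ])

    # 0, 90, 180 and 270 degree copies share weights; their dense
    # representations are then average-pooled.
    rotated = [layers.Lambda(lambda t, kk=k: tf.image.rot90(t, k=kk))(image_in)
               for k in range(4)]
    pooled = layers.Average()([cnn(r) for r in rotated])

    x = layers.Concatenate()([pooled, meta_in])
    x = layers.Dense(64, activation='relu')(x)
    x = layers.Dense(64, activation='relu')(x)
    out = layers.Dense(NUM_CLASSES, activation='softmax')(x)
    return tf.keras.Model([image_in, meta_in], out)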

We performed inference on 400,000 objects uniformly distributed in space over the full coverage of ZTF, as a sanity check of the model predictions. It turns out that each class predicted by the CNN is spatially distributed as expected given the nature of each astronomical object. For instance, AGNs and supernovae (SNe) are mostly found outside the Milky Way plane (extragalactic objects), since more distant objects are less likely to be visible through the Milky Way plane due to occlusion. The model correctly predicts fewer objects close to the Milky Way plane (galactic latitudes closer to 0). Variable stars are correctly found with higher density within the Galactic plane. Asteroids are found near the solar system plane, also called the ecliptic (marked as a yellow line), as expected, and bogus alerts are spread everywhere. Running inference on a large unlabeled set gave us very important clues regarding biases in our training set and helped us identify important metadata used by the CNN.

We found that the information within the images (science, reference and difference) is enough to obtain a good classification in the training set, but integrating the information from the metadata was critical to obtain the right spatial distribution of the predictions.

Spatial distribution of unlabeled set of astronomical objects. Each plot is in galactic coordinates. The galactic latitude is centered in the Milky Way, so latitudes closer to 0 are also closer to the Milky Way plane. The galactic longitude indicates which part of the disk we are seeing within the Milky Way plane. The yellow line represents the solar system plane (ecliptic).

Supernova Hunter

A vital part of this project is the web interface that allows astronomers to explore the candidates sorted by our neural network model's certainty of being a supernova. The Supernova Hunter is a visualization tool that shows important information about each alert so that astronomers can choose which objects to report as supernovae. It also has a button to report wrong classifications made by our model, so we can add these hand-labeled examples to the training set and later improve the model.

Supernova Hunter: user interface for the exploration of supernova candidates. It shows a list of alerts with a high probability of being a supernova. For each alert, the images of the alert, the position of the object, and the metadata are displayed on the web page.

Using the neural network classifier and the Supernova Hunter, we have been able to confirm 394 supernovae spectroscopically and report 3,060 supernova candidates to the Transient Name Server between June 26, 2019 and July 21, 2020, a rate of 9.2 supernova candidates reported per day. This discovery rate is drastically increasing the number of supernovae observed in the early stages of the explosion.

The Future

We are currently working on improving the classification performance of our model to produce better supernova candidates and require less expert assistance to report them. Ideally, we would like a system that is good enough to automatically report every possible supernova candidate with high confidence.

We would also like to extend our model so it can use more than a single stamp. We developed a neural network model that receives a sequence of images instead of a single stamp, so that every time a new image is available for a specific object, the model integrates the newly arriving information to improve the certainty of its prediction for each class.

Another key point of our effort is focused on finding rare objects using outlier detection techniques. This is a crucial task since these new telescopes will possibly reveal new kinds of astronomical objects due to the unprecedented sampling rate and the spatial depth of each observation.

We think this new way of analyzing massive amounts of astronomical data will be not only helpful but necessary. The organization, classification and redistribution of the data for the scientific community is an important part of doing science with astronomical data. This task requires expertise from different fields, such as computer science, astronomy, engineering and mathematics. The construction of new modern telescopes such as The Vera C. Rubin Observatory will drastically change the way astronomers study celestial objects, and as the ALeRCE broker we will be ready to make this possible. For more information, please visit our website, or take a look at our papers: the ALeRCE presentation paper, which describes the complete processing pipeline; the stamp classifier (the work described in this blog post); and the light curve classifier, which provides a more complex classification with a larger taxonomy of classes by using a time series called a light curve.

Announcing TensorFlow Lite Micro support on the ESP32

A guest article by Vikram Dattu, Aditya Patwardhan, Kedar Sovani of Espressif Systems

Introducing ESP32: The Wi-Fi MCU

We are glad to announce TensorFlow Lite Micro support for the ESP32 chipset.

The ESP32 is a Wi-Fi/BT/BLE enabled MCU (micro-controller) that is widely used by hobbyists and makers to build cool and interesting projects that sense or modify real-world data/objects, and is also commonly deployed in smart home appliances like light bulbs, switches, refrigerators, and air conditioners to provide connectivity.
The interesting part of the ESP32 is that it's a unique SoC that can be used all the way from quick prototypes to high-volume production. A wide community, numerous development kits, and a plethora of tutorials/SDKs make it a great vehicle for quick prototypes in almost any vertical you might be interested in. The all-in-one package (Wi-Fi/BT/MCU) and existing high-volume deployments in the field make it ideal for building end products.

ESP32 is already being used in a number of smart-home/connected-device projects with a variety of sensors and actuators connected to the microcontroller to sense the environment and act accordingly. With TensorFlow Lite for Microcontrollers executing on the ESP32, this opens up all kinds of use cases that are triggered by local inference. ESP32 has 2 CPU cores and a bunch of optimizations, making it easier to run heavy TF Micro workloads. The Wi-Fi backhaul helps to raise remote events and trigger actions based on the inferences made.

Person Detection or a Door-Bell Camera?

As an example, we have modified the person_detection example that you all might be familiar with to make it a smart door-bell camera. We use the ESP-EYE developer kit for this demonstration. Note that this example uses person detection (it detects when a face is in front of the camera), and not person identification (identifying who the person is).

The ESP-EYE dev-kit includes the ESP32 Wi-Fi/BT MCU coupled with a 2MP camera.

In Action

In our example, we will use this camera to observe and send out an email notification if we detect a person in the vicinity.

Building it for yourself

  1. Order the ESP-EYE: You can get the ESP-EYE Development Kit from your favourite distributor, or from here. You will need a USB to micro-USB cable for connecting this to your Windows/Linux/macOS host.
  2. Clone the repository: https://github.com/espressif/tensorflow/
  3. Setup your development host: Setup your development host with toolchains and utilities required to cross-build for ESP32. Follow the instructions of the ESP-IDF get started guide to set up the toolchain and the ESP-IDF itself.
  4. Generate the example: The example project can be generated with the following command:
    make -f tensorflow/lite/micro/tools/make/Makefile TARGET=esp generate_doorbell_camera_esp_project
  5. Build the example:

    a. Go to the example project directory

    cd tensorflow/lite/micro/tools/make/gen/esp_xtensa-esp32/prj/doorbell_camera/esp-idf

    b. Clone the esp32-camera component with following command:

    $ git clone https://github.com/espressif/esp32-camera components/esp32-camera

    c. Configure the camera and the email address:

    idf.py menuconfig

    d. Enter the Camera Pins configuration and SMTP Configuration menus to select the camera details, and also the email details.

    e. Build the example:

    idf.py build
  6. Flash and Run the program: Use the following command to flash and run the program:
    idf.py --port /dev/ttyUSB0 flash monitor
  7. Now, whenever a person’s face is detected, the program will send out an email to the configured email address.

What Next?

Now that you have tried the door bell camera example, you may try the other applications that are part of the TF Micro repository: hello_world and micro_speech.
ESP32 is pretty powerful for a microcontroller. Clocked at 240MHz, it can run the detection in well under 1 second using just a single core (roughly 700ms; additional optimizations are on the way to reduce this even further). This leaves the second core free for other tasks in your application.
The TinyML book is an excellent resource for a thorough understanding of TensorFlow Lite for Microcontrollers.

Introducing TF-Coder, a tool that writes tricky TensorFlow expressions for you!

Posted by Kensen Shi, Google Research

When manipulating tensors, one must keep track of multiple dimensions, tensor shape and DType compatibility, and of course mathematical correctness. Additionally, there are hundreds of TensorFlow operations, and finding the right ones to use can be a challenge.

Instead of coding your tensor manipulation directly, what if you could just demonstrate it through an illustrative example and get the corresponding code automatically? TensorFlow Coder (TF-Coder) makes this possible!

TF-Coder is a program synthesis tool that helps you write TensorFlow code. First, the tool asks for an input-output example of the desired tensor transformation. Then, it runs a combinatorial search to find TensorFlow expressions that perform that transformation. TF-Coder’s output is real TensorFlow code that you can include in your projects.

The following one-minute video introduces TF-Coder, and this Colab notebook allows you to use the TF-Coder tool for your own tensor manipulation problems.

In this blog post, we’ll illustrate various scenarios where TF-Coder can help you write TensorFlow code.

Programming in TensorFlow by example

Suppose you want to “add” an M-element vector with an N-element vector in a broadcasted way to produce an M x N matrix containing all pairwise sums. Instead of digging through TensorFlow documentation to figure out how to do this, you can instead provide an input-output example (using M = 3 and N = 4):

Input tensors, as a dict mapping input variable names to example tensor values:

inputs = {
    'rows': [10, 20, 30],
    'cols': [1, 2, 3, 4],
}

The desired output tensor, corresponding to the provided input tensors:

output = [[11, 12, 13, 14],
          [21, 22, 23, 24],
          [31, 32, 33, 34]]

Given this information (already entered into the TF-Coder Colab by default), the TF-Coder tool will find the appropriate TensorFlow code automatically in a fraction of a second:

tf.add(cols, tf.expand_dims(rows, 1))

The above problem was pretty simple just to illustrate the idea of programming by example. TF-Coder can be useful for harder problems as well, as we’ll see below.

TF-Coder helps you find the right function to use

Let’s suppose you are working with a numerical feature such as the price of an item. The prices in your dataset have a wide range, e.g., from under $10 to over $1000. If these prices are used directly as features, your model may overfit to specific prices in the training data, and it may also have difficulty with outlier prices during evaluation.

To deal with these issues, you may want to use bucketing to transform the numerical prices into categorical features. For example, using bucket boundaries of [10, 50, 100, 1000] means that prices under $10 should fall into bucket 0, prices between $10 and $50 fall into bucket 1, and so on.

After choosing bucket boundaries, how do you actually map the numerical prices to the bucket indices using TensorFlow? For example, given the following bucket boundaries and item prices:

# Input tensors
boundaries = [10, 50, 100, 1000]
prices = [15, 3, 50, 90, 100, 1001]

you want to compute the bucket number for each item:

# Output tensor
bucketed_prices = [1, 0, 2, 2, 3, 4]

Although TensorFlow comes with various bucketing operations, it may be tricky to figure out which specific operation does this exact kind of bucketing. Since TF-Coder can identify hundreds of Tensor operations by behavior, you can look up the correct operation by providing an input-output example:

# Input-output example
inputs = {
    'boundaries': [10, 50, 100, 1000],
    'prices': [15, 3, 50, 90, 100, 1001],
}
output = [1, 0, 2, 2, 3, 4]

Within seconds, TF-Coder outputs the following solution:

tf.searchsorted(boundaries, prices, side='right')

This gives us a useful hint, and the documentation for tf.searchsorted confirms that this code indeed performs the bucketing as desired.
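
As a quick sanity check, you can run the suggested expression yourself in TensorFlow 2 and verify that it reproduces the example output:

import tensorflow as tf

boundaries = tf.constant([10, 50, 100, 1000])
prices = tf.constant([15, 3, 50, 90, 100, 1001])

# side='right' sends values equal to a boundary into the next bucket.
print(tf.searchsorted(boundaries, prices, side='right').numpy())  # [1 0 2 2 3 4]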

TF-Coder helps you combine functions in clever ways

Now let’s consider another problem: compute a 0-1 tensor that identifies the maximum element of each row of the input tensor.

# Input tensor
scores = [[0.7, 0.2, 0.1],
          [0.4, 0.5, 0.1],
          [0.4, 0.4, 0.2],
          [0.3, 0.4, 0.3],
          [0.0, 0.0, 1.0]]

# Output tensor
top_scores = [[1, 0, 0],
              [0, 1, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]]

Note that if the same largest element appears multiple times within a row, such as in the third row of scores, then only the first such largest element should be marked, so that every row of top_scores has exactly one entry of 1.

Unlike in the last problem, there is no single TensorFlow function that performs this computation. If you search the documentation for “max”, you may find that tf.reduce_max, tf.argmax, and tf.maximum are relevant, but which one should you use? tf.reduce_max produces [0.7, 0.5, 0.4, 0.4, 1.0], tf.argmax produces [0, 1, 0, 1, 2], and tf.maximum isn’t right because it takes two arguments. None of these look close to our desired output.

TF-Coder can help solve tricky problems like this. You can write the problem in the form of an input-output example:

# Input-output example
inputs = {
    'scores': [[0.7, 0.2, 0.1],
               [0.4, 0.5, 0.1],
               [0.4, 0.4, 0.2],
               [0.3, 0.4, 0.3],
               [0.0, 0.0, 1.0]],
}
output = [[1, 0, 0],
          [0, 1, 0],
          [1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]

TF-Coder uses a combination of tf.one_hot and tf.argmax in a short solution to this problem:

tf.cast(tf.one_hot(tf.argmax(scores, axis=1), 3), tf.int32)

Through a detailed search over combinations of TensorFlow operations, TF-Coder often finds elegant solutions like this, which may simplify and speed up your TensorFlow programs.

TF-Coder helps you write correct code with less debugging

Consider normalizing lists of integer counts into probability distributions by dividing each row by the sum of that row. For instance:

# Input tensor
counts = [[0, 1, 0, 0],
          [0, 1, 1, 0],
          [1, 1, 1, 1]]

# Output tensor
normalized = [[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.25, 0.25, 0.25, 0.25]]

Even if you know relevant functions to use (tf.reduce_sum followed by tf.divide), writing the correct code is still nontrivial. A first attempt may look like this:

# First attempt
normalized = tf.divide(counts, tf.reduce_sum(counts, axis=1))

Is this right? There are many potential pitfalls to think about:

  • Is the summation axis correct, or should it be axis=0?
  • Are the shapes of counts and tf.reduce_sum(counts, axis=1) compatible for division, or do you need to reshape or transpose either of these?
  • counts and tf.reduce_sum(counts, axis=1) are both tf.int32 tensors. Can tf.int32 tensors be divided, or do you need to cast them to a float DType first?
  • Are the two arguments in the correct order, or should they be swapped?
  • Does the output have type tf.int32, tf.float32, or something else?
  • Is there a simpler or better way that was not considered?

You can give this task to TF-Coder with the following input-output example:

# Input-output example
inputs = {
    'counts': [[0, 1, 0, 0],
               [0, 1, 1, 0],
               [1, 1, 1, 1]],
}
output = [[0.0, 1.0, 0.0, 0.0],
          [0.0, 0.5, 0.5, 0.0],
          [0.25, 0.25, 0.25, 0.25]]

TF-Coder’s solution is:

tf.cast(tf.divide(counts, tf.expand_dims(tf.reduce_sum(counts, axis=1), axis=1)), tf.float32)

By using TF-Coder to solve this problem, the mental burden of the exercise is reduced. When TF-Coder produces the solution above, it is guaranteed that the code correctly produces the example output when run on the example input. TF-Coder’s solution will also avoid any unnecessary steps. Thus, you can quickly deduce the answers to most of the questions above: an extra tf.expand_dims step is needed to make the shapes compatible for division, and the result of tf.divide must be cast to tf.float32 (in fact tf.divide returns a tf.float64 tensor when dividing two tf.int32 tensors). In this way, TF-Coder helps you write simple and correct code without painful debugging cycles.
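
If you want to verify that last detail yourself, a short check in TensorFlow 2 shows the integer division producing a tf.float64 tensor, which is why the final cast is needed:

import tensorflow as tf

counts = tf.constant([[0, 1, 0, 0],
                      [0, 1, 1, 0],
                      [1, 1, 1, 1]])                       # tf.int32
row_sums = tf.expand_dims(tf.reduce_sum(counts, axis=1), axis=1)
print(tf.divide(counts, row_sums).dtype)                   # tf.float64, hence the cast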

Caveats

There are limitations to TF-Coder. It can currently find solutions involving 3-4 operations within a minute of searching, but solutions involving 6 or more operations are too complex to find in a reasonable amount of time. Furthermore, TF-Coder currently does not support complex or string tensors, or RaggedTensors. The full list of supported operations can be found in the Colab notebook.
In addition, TF-Coder only guarantees that its solutions work for the given input-output example. The tool searches for a simple TensorFlow expression that matches the provided input-output example, but sometimes this solution is too simple and doesn’t generalize in the intended way. It can be helpful to make the example as unambiguous as possible, which can often be achieved by adding more numbers to the input and output tensors. Please review TF-Coder’s solutions to ensure that they correctly implement the intended behavior.

Try TF-Coder yourself!

Be sure to give TF-Coder a try! Even experienced TensorFlow users at Google are learning new things with the help of TF-Coder.
You can access the tool using this Colab notebook — no download or installation is required. Follow this tutorial for a detailed walkthrough. You can also take a look at our code and documentation on GitHub and our research paper.

Note: in the Colab tool, we would like to log the problems given to TF-Coder and the resulting solutions, so that we can improve the tool and build a dataset that will accelerate program synthesis research in general, but this data collection is completely optional.

Introducing Danfo.js, a Pandas-like Library in JavaScript

A guest post by Rising Odegua, Independent Researcher; Stephen Oni, Data Science Nigeria

Danfo.js is an open-source JavaScript library that provides high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data. Danfo.js is heavily inspired by the Python Pandas library and provides a similar interface/API. This means that users who are familiar with the Pandas API and know JavaScript can easily pick it up.
One of the main goals of Danfo.js is to bring data processing, machine learning and AI tools to JavaScript developers. This is in line with our vision, and essentially the vision of the TensorFlow.js team, which is to bring ML to the web. Open-source libraries like Numpy and Pandas revolutionised the ease of manipulating data in Python, and lots of tools were built around them, driving the thriving ecosystem of ML in Python.

Danfo.js is built on TensorFlow.js. That is, just as Numpy powers Pandas' arithmetic operations, we leverage TensorFlow.js to power our low-level arithmetic operations.

Some of the main features of Danfo.js

Danfo.js is fast. It is built on TensorFlow.js, and supports tensors out of the box. This means you can load Tensors in Danfo and also convert Danfo data structure to Tensors. Leveraging these two libraries, you have a data processing library on one hand (Danfo.js), and a powerful ML library on the other hand (TensorFlow.js).

In the example below, we show you how to create a Danfo DataFrame from a tensor object:

const dfd = require("danfojs-node")
const tf = require("@tensorflow/tfjs-node")

let data = tf.tensor2d([[20,30,40], [23,90, 28]])
let df = new dfd.DataFrame(data)
let tf_tensor = df.tensor
console.log(tf_tensor);
tf_tensor.print()

Output:

Tensor {
kept: false,
isDisposedInternal: false,
shape: [ 2, 3 ],
dtype: 'float32',
size: 6,
strides: [ 3 ],
dataId: {},
id: 3,
rankType: '2'
}
Tensor
[[20, 30, 40],
[23, 90, 28]]

You can easily convert Arrays, JSONs, or Objects to DataFrame objects for manipulation.

JSON object to DataFrame:

const dfd = require("danfojs-node")
json_data = [{ A: 0.4612, B: 4.28283, C: -1.509, D: -1.1352 },
{ A: 0.5112, B: -0.22863, C: -3.39059, D: 1.1632 },
{ A: 0.6911, B: -0.82863, C: -1.5059, D: 2.1352 },
{ A: 0.4692, B: -1.28863, C: 4.5059, D: 4.1632 }]
df = new dfd.DataFrame(json_data)
df.print()

Output:

Object array with column labels to DataFrame:

const dfd = require("danfojs-node")
obj_data = {'A': ["A1", "A2", "A3", "A4"],
'B': ["bval1", "bval2", "bval3", "bval4"],
'C': [10, 20, 30, 40],
'D': [1.2, 3.45, 60.1, 45],
'E': ["test", "train", "test", "train"]
}
df = new dfd.DataFrame(obj_data)
df.print()

Output:

You can easily handle missing data (represented as NaN) in floating point as well as non-floating point data:

const dfd = require("danfojs-node")
let data = {"Name":["Apples", "Mango", "Banana", undefined],
"Count": [NaN, 5, NaN, 10],
"Price": [200, 300, 40, 250]}
let df = new dfd.DataFrame(data)
let df_filled = df.fillna({columns: ["Name", "Count"], values: ["Apples",
df["Count"].mean()]})
df_filled.print()

Output:

Intelligent label-based slicing, fancy indexing, and querying of large data sets:

const dfd = require("danfojs-node")
let data = { "Name": ["Apples", "Mango", "Banana", "Pear"] ,
"Count": [21, 5, 30, 10],
"Price": [200, 300, 40, 250] }

let df = new dfd.DataFrame(data)
let sub_df = df.loc({ rows: ["0:2"], columns: ["Name", "Price"] })
sub_df.print()

Output:

Robust IO tools for loading data from flat files (CSV and delimited), both in full and in chunks:

const dfd = require("danfojs-node")
//read the first 10000 rows
dfd.read_csv("file:///home/Desktop/bigdata.csv", chunk=10000)
.then(df => {
df.tail().print()
}).catch(err=>{
console.log(err);
})

Robust data preprocessing functions like OneHotEncoders, LabelEncoders, and scalers like StandardScaler and MinMaxScaler are supported on DataFrame and Series:

const dfd = require("danfojs-node")
let data = ["dog","cat","man","dog","cat","man","man","cat"]
let series = new dfd.Series(data)
let encode = new dfd.LabelEncoder()
encode.fit(series)
let sf_enc = encode.transform(series)
let new_sf = encode.transform(["dog","man"])

Output:

Interactive, flexible and intuitive API for plotting DataFrames and Series in the browser:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<script src="https://cdn.jsdelivr.net/npm/danfojs@0.1.1/dist/index.min.js"></script>
<title>Document</title>
</head>
<body>
<div id="plot_div"></div>
<script>
dfd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv")

.then(df => {
var layout = {
title: 'A financial charts',
xaxis: {title: 'Date'},
yaxis: {title: 'Count'}
}
new_df = df.set_index({ key: "Date" })
new_df.plot("plot_div").line({ columns: ["AAPL.Open", "AAPL.High"], layout: layout
})
}).catch(err => {
console.log(err);
})
</script>
</body>
</html>

Output:

Titanic Survival Prediction using Danfo.js and Tensorflow.js
Below we show a simple end-to-end classification task using Danfo.js and TensorFlow.js. We use Danfo for loading, manipulating and preprocessing the dataset, and then export the tensor objects.

const dfd = require("danfojs-node")
const tf = require("@tensorflow/tfjs-node")

async function load_process_data() {
let df = await dfd.read_csv("https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv")

// Feature engineering: extract all titles from the Name column
let title = df['Name'].apply((x) => { return x.split(".")[0] }).values
//replace in df
df.addColumn({ column: "Name", value: title })

// Label encode the Sex and Name features
let encoder = new dfd.LabelEncoder()
let cols = ["Sex", "Name"]
cols.forEach(col => {
encoder.fit(df[col])
enc_val = encoder.transform(df[col])
df.addColumn({ column: col, value: enc_val })
})

let Xtrain,ytrain;
Xtrain = df.iloc({ columns: [`1:`] })
ytrain = df['Survived']

// Scale the data with MinMaxScaler
let scaler = new dfd.MinMaxScaler()
scaler.fit(Xtrain)
Xtrain = scaler.transform(Xtrain)

return [Xtrain.tensor, ytrain.tensor] //return the data as tensors
}

Next, we create a simple neural network using TensorFlow.js.

function get_model() {
const model = tf.sequential();
model.add(tf.layers.dense({ inputShape: [7], units: 124, activation: 'relu', kernelInitializer: 'leCunNormal' }));
model.add(tf.layers.dense({ units: 64, activation: 'relu' }));
model.add(tf.layers.dense({ units: 32, activation: 'relu' }));
model.add(tf.layers.dense({ units: 1, activation: "sigmoid" }))
model.summary();
return model
}

Finally, we perform training, by first loading the model and the processed data as tensors. This can be fed directly to the neural network.

async function train() {
const model = await get_model()
const data = await load_process_data()
const Xtrain = data[0]
const ytrain = data[1]

model.compile({
optimizer: "rmsprop",
loss: 'binaryCrossentropy',
metrics: ['accuracy'],
});

console.log("Training started....")
await model.fit(Xtrain, ytrain,{
batchSize: 32,
epochs: 15,
validationSplit: 0.2,
callbacks:{
onEpochEnd: async(epoch, logs)=>{
console.log(`EPOCH (${epoch + 1}): Train Accuracy: ${(logs.acc * 100).toFixed(2)},
Val Accuracy: ${(logs.val_acc * 100).toFixed(2)}\n`);
}
}
});
};

train()

The reader will notice that the API of Danfo is very similar to Pandas, and a non-Javascript programmer can easily read and understand the code. You can find the full source code of the demo above here (https://gist.github.com/risenW/f54e4e5b6d92e7b1b9b1f30e884ca83c).

Closing Remarks

As web-based machine learning has matured, it is imperative to have efficient data science tools built specifically for it. Tools like Danfo.js will enable web-based applications to easily support ML features, thus opening the space to an ecosystem of exciting applications. TensorFlow.js started the revolution by providing ML capabilities available in Python, and we hope to see Danfo.js as an efficient partner in this journey. We can’t wait to see what Danfo.js grows into! Hopefully, it becomes indispensable to the web community as well.

  • Play with Danfo.js on CodePen
  • Link to the official getting started guide
  • Link to Github repository


Introducing TensorFlow Videos for a Global Audience: Japanese

Posted by the TensorFlow Team

When the TensorFlow YouTube channel launched in 2018, we had a vision to inform and inspire developers around the world about what was possible with Machine Learning. With series like Coding TensorFlow showing how you can use it, and Made with TensorFlow showing inspirational stories about what people have done with TensorFlow and much more, the channel has grown greatly. But we learned an important lesson: it’s a global phenomenon, and to reach the world effectively, we should provide content in multiple languages with native speakers presenting. Check out the popular Zero to Hero series in Japanese!

Machine Learning: Zero to Hero with TensorFlow

These days, whether you are browsing the internet, newspapers, or books, buzzwords like machine learning and AI are impossible to avoid. Because these topics come up in so many fields, a great deal of information is available. But what exactly is machine learning from a developer's point of view? To answer that question, Laurence Moroney of the TensorFlow team created a four-part video series, "Machine Learning: Zero to Hero with TensorFlow," based on his well-received talk at Google I/O 2019.

In part one, you will learn the difference between traditional programs, which follow explicit rules written in languages like Java or C++, and machine learning, a system that infers the rules themselves from data. To answer questions such as "what kind of code does machine learning actually consist of?", the video walks through the steps of building a machine learning model using a simple, concrete example. Several of the concepts introduced here are applied again in the computer vision video in part two.

Part two explains the basics of computer vision with machine learning, that is, letting a computer see and recognize different objects. You can also run the code yourself at this link: https://goo.gle/34cHkDk

Part three explains why convolutional neural networks perform so well in the field of computer vision. Passing an image through the filters used in convolutions captures features that reveal similarities between images. In the video, you can watch the process of actually applying filters to an image and extracting features.
You can review the content of the video at this link: http://bit.ly/2lGoC5f

Part four shows how to build a rock-paper-scissors classifier. Part one discussed how difficult it would be to write code that recognizes rock-paper-scissors hands, but by combining everything covered in the first three videos, you can build your own classifier simply by creating a neural network that finds patterns in the image pixels, classifies the images, and uses convolutions to extract features.

Colab notebook: http://bit.ly/2lXXdw5

Dataset: http://bit.ly/2kbV92O

We hope you enjoyed the video series! If you would like to see more, please let us know in the feedback!

Introducing Semantic Reactor: Explore NLP in Google Sheets

Posted by Dale Markowitz, Applied AI Engineer

Editor’s note: An earlier version of this article was published on Dale’s blog.

Machine learning can be tricky, so being able to prototype ML apps quickly is a boon. If you’re building a language-powered app — like a video game with characters players can talk to or a customer service bot — the Semantic Reactor is a tool that will help you do just that.

The Semantic Reactor is a new plugin for Google Sheets that lets you run natural language understanding (NLU) models (variations of the Universal Sentence Encoder) on your own data, right from a spreadsheet.
In this post, I'll show you how to work with the tool and the NLU models it uses, but first, how does NLP actually work? What's going on under the hood? (Want to skip straight to the tool? Scroll to the next section.)

Understanding Embeddings

What are Word Embeddings?

One simple (but powerful) technique for building natural-language-powered software is to use “embeddings.”

In machine learning, embeddings are a learned way of representing data in space (i.e. points plotted on an n-dimensional grid) such that the distances between points are meaningful. Word vectors are one popular example:
The picture above is a rough visual example of how words can be closer or further away from each other. Note that the words "Austin," "Texas," and "barbecue" have a close relationship with each other, as do "pet" and "dog," and "walk" and "run." Each word is represented by a set of coordinates (or a vector) and is placed on a graph where we can see relationships. For instance, we can see that the word "rat" is close to both "pet" and also "cat".

Where do these numbers come from? They’re learned by a machine learning model through many bits of conversational and language data. By showing all those examples, the model learns which words tend to occur in the same spots in sentences.

Consider these two sentences:

  • “My mother gave birth to a son.”
  • “My mother gave birth to a daughter.”

Because the words “daughter” and “son” are often used in similar contexts, the model will learn that they should be represented close to each other in space. Word embeddings are useful in natural language processing. They can be used to find synonyms (“semantic similarity”), to solve analogies, or as a preprocessing step for a more complicated model. You can quickly train your own basic word embeddings with TensorFlow here.
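
As a toy illustration (with made-up three-dimensional vectors rather than real learned embeddings), finding "synonyms" boils down to comparing vectors, for example with cosine similarity:

import numpy as np

# Toy 3-dimensional "word vectors" (made up for illustration only).
vectors = {
    "dog": np.array([0.9, 0.1, 0.0]),
    "pet": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.0, 0.1, 0.9]),
}

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(vectors["dog"], vectors["pet"]))  # high: related meaning
print(cosine_similarity(vectors["dog"], vectors["car"]))  # low: unrelated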

What are Sentence Embeddings?

It turns out that entire sentences (and even short paragraphs) can be effectively embedded in space too, using a type of model called a universal sentence encoder. Using sentence embeddings, we can figure out if two sentences are similar. This is useful, for example, if you’re building a chatbot and want to know if a question a user asked (i.e. “When will you wake me up?”) is semantically similar to a question you – the chatbot programmer – have anticipated and written a response to (“What time is my alarm?”).
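
If you want to experiment with sentence embeddings outside the Semantic Reactor, one option (shown here in Python with TensorFlow Hub; the JavaScript route is covered later in this post) is the publicly available Universal Sentence Encoder:

import numpy as np
import tensorflow_hub as hub

# Load the general-purpose Universal Sentence Encoder from TensorFlow Hub.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = ["When will you wake me up?", "What time is my alarm?"]
vectors = embed(sentences).numpy()

# The embeddings are approximately unit-length, so the inner product
# acts as a similarity score (closer to 1.0 means more similar).
print(np.inner(vectors[0], vectors[1]))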

Semantic Reactor: Prototype using NLP in a Google Sheet

Alright, now onto the fun part: Building things! There are three NLP models available in the Semantic Reactor:

  • Local – A small TensorFlow.js version of the Universal Sentence Encoder that can run entirely within a webpage.
  • Basic Online – A full sized, general-use version of the Universal Sentence Encoder.
  • Multilingual Online – A full-sized Universal Sentence Encoder model trained on question/answer pairs in 16 languages.

Each model offers two ranking methods:

  • Semantic Similarity: How similar are two blocks of text?

    Great for applications where you can anticipate what users might ask, like an FAQ bot. (Many customer service bots use semantic similarity to help deliver good answers to users.)

  • Input / Response: How good of a response is one block of text to another?

    Useful for when you have a large, and constantly changing, set of texts and you don’t know what users might ask. For instance, Talk to Books, a semantic search tool for a regularly updated collection of 100,000 books, uses input / response.

You can use the Semantic Reactor to test a response list against each model and ranking method. Sometimes it takes a good bit of experimenting before you get your response list and model selection to one you think will work for your application. The good news is that doing that work in a Google Sheet makes it fast and easy.
Once you have your response list, model selection and ranking method decided on, you can then begin writing code, and if you want to keep all operations within a website or on device (without requiring online API calls), you can use the newly updated TensorFlow.js model.
As mentioned, there are lots of great uses for NLU tech, and more interesting applications come out almost every day. Every digital assistant, customer service bot, and search engine is likely using some flavor of machine learning. Smart Reply and Smart Compose in Gmail are two well-used features that make good use of semantic tech.
However, it’s fun and helpful to play with the tech within applications where the quality demands aren’t so high, where failure is okay and even entertaining. To that end, we’ve used the same tech that’s within the Semantic Reactor to create a couple of example games. Semantris is a word association game that uses the input-response ranking method, and The Mystery of the Three Bots uses semantic similarity.
Playing those two games, and finding out where they work and where they don’t, might give you ideas on what experiences you might create.

Semantris, a word-association game powered by word embeddings.
The Mystery of the Three Bots is a simple game powered by NLU and available as open source code. (It’s also playable here.)

One of the coolest applications of this tech comes from Anna Kipnis, a former game designer at Double Fine who now works with Stadia. She used Semantic Reactor to prototype a video game world that infers how the environment should react to player inputs using ML. Check out our conversation here.
In Anna’s game, players interact with a virtual fox by asking any question they think of:

  • “Fox, can I have some coffee?”

Then, using Semantic ML, the game engine (or the utility system) considers all of the possible ways the game might respond:

  • “Fox turns on lights.“
  • “Fox turns on radio.“
  • “Fox move to you.“
  • “Fox brings you mug.“

Using a sentence encoder model, the game decides what the best response is and executes it (in this case, the best response is “Fox brings you a mug,” so the game animates the Fox bringing you a mug). If that sounds a little abstract, definitely watch the video linked above.
Let’s see how you might build something like Anna’s game with Semantic Reactor (for all the nitty gritties of the fox demo, check out her original post).
First, create a new Google sheet and write some sentences in the first column. I put these sentences in the first column of my Google sheet:

  • I grab a ball
  • I go to you
  • I play with a ball
  • I go to school.
  • I go to the mug.
  • I bring you the mug.
  • I turn on music.
  • I take a nap.
  • I go for a hike.
  • I tell you a secret.
  • I snuggle with you.
  • I ask for a belly rub.
  • I send a text.
  • I answer the phone.
  • I make a sandwich.
  • I drink some water.
  • I play a board game.
  • I do some coding.

You’ll have to use your imagination here and think of these “actions” that a potential character (e.g. a chatbot or an actor in a video game) might take.
Once you’ve applied for and been given access to Semantic Reactor, you’ll be able to enable it by clicking on “Add-ons -> Semantic Reactor -> Start”. Clicking “Start” will open a panel that allows you to type in an input and hit “React”.

When you hit “React”, Semantic Reactor uses a model to embed all of the responses you’ve written in that first column, calculate a score (how good a response is this sentence to the query?), and sort the results. For example, when my input was “I want some coffee,” the top ranked responses from my spreadsheet were, “I go to the mug” and “I bring you the mug.”

You’ll also notice that there are two different ways to rank sentences using this tool: “Input/Response” and “Semantic Similarity.” As the name implies, the former ranks sentences by how good they are as responses to the given query, whereas “Semantic Similarity” simply rates how similar the sentences are to the query.

From Spreadsheet to Code with TensorFlow.js

Underneath the hood, Semantic Reactor is powered by the open-source TensorFlow.js models found here.
Let’s take a look at how to use those models in JavaScript, so that you can convert your spreadsheet prototype into a working app.
1 – Create a new Node project and install the module:

npm init
npm install @tensorflow/tfjs @tensorflow-models/universal-sentence-encoder

2 – Create a new file (use_demo.js) and require the library:

require('@tensorflow/tfjs');
const encoder = require('@tensorflow-models/universal-sentence-encoder');

3 – Load the model:

const model = await encoder.loadQnA();

4 – Encode your sentences and query:

const input = {
queries: ["I want some coffee"],
responses: [
"I grab a ball",
"I go to you",
"I play with a ball",
"I go to school.",
"I go to the mug.",
"I bring you the mug."
]
};

const embeddings = await model.embed(input);

5 – Voila! You’ve transformed your responses and query into vectors. Unfortunately, vectors are just points in space. To rank the responses, you’ll want to compare those points (you can do this by computing the dot product of each query-response pair, which serves as a similarity score for these embeddings):

//zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
const zipWith =
    (f, xs, ys) => {
      const ny = ys.length;
      return (xs.length <= ny ? xs : xs.slice(0, ny))
          .map((x, i) => f(x, ys[i]));
    }

// Calculate the dot product of two vector arrays.
const dotProduct = (xs, ys) => {
  const sum = xs => xs ? xs.reduce((a, b) => a + b, 0) : undefined;

  return xs.length === ys.length ?
      sum(zipWith((a, b) => a * b, xs, ys))
      : undefined;
}

If you run this code, you should see output like:

 [
{ response: 'I grab a ball', score: 10.788130270345432 },
{ response: 'I go to you', score: 11.597091717283469 },
{ response: 'I play with a ball', score: 9.346379028479209 },
{ response: 'I go to school.', score: 10.130473646521292 },
{ response: 'I go to the mug.', score: 12.475453722603106 },
{ response: 'I bring you the mug.', score: 13.229019199245684 }
]

Check out the full code sample here.
And that’s it–that’s how you go from a Semantic ML spreadsheet to code fast!
An earlier version of this post was published at https://daleonai.com/semantic-ml.

Optimizing Peptides in TensorFlow 2

A guest post by Somesh Mohapatra, Rafael Gómez-Bombarelli of MIT

Introduction

A polymer is a material made up of long repeating chains of molecules, like plastic or rubber. Polymers are made up of subunits (monomers) that are chemically bound to one another. The chemical composition and arrangement of monomers dictate the properties of the polymer. A few examples of polymers in everyday use are water bottles, non-stick teflon coatings, and adhesives.

Figure 1. Conceptually, you can think of Peptimizer as generating a sequence of amino acids, then predicting a property of the peptide, then optimizing the sequence.

Peptides are short polymer chains made up of amino acids, analogous to words composed of letters. They are widely used for therapeutic applications, such as for the delivery of gene therapy by cell-penetrating peptides. Thanks to their modular chemistry amenable to automated synthesis and expansive design space, peptides are increasingly preferred over more conventional small molecule drugs, which are harder to synthesize. However, the vast sequence space (in terms of the amino acid arrangement) acts as an impediment to the design of functional peptides.

Synthetic accessibility, apart from functionality optimization, is a challenge. Peptides and other functional polymers with a precise arrangement of monomers are synthesized using methods such as flow chemistry. The synthesis involves monomer-by-monomer addition to a growing polymer chain. This process necessitates a high reaction yield for every step, thus making the accessibility of longer chains challenging.

Conventional approaches for optimization of functional polymers, such as peptides, in a lab environment involve the heuristic exploration of chemical space by trial-and-error. However, the number of possible polymers rises exponentially as m^n, where m is the number of possible monomers, and n is the polymer length.

As an alternative to doing an experiment in a lab, you can design functional polymers using machine learning. In our work on optimizing cell-penetrating activity and synthetic accessibility, we design peptides using Peptimizer, a machine learning framework based on TensorFlow. Conceptually, you can think of Peptimizer as generating a sequence of amino acids, then predicting a property of the peptide, then optimizing the sequence.

Peptimizer can be used for the optimization of functionality (other than cell-penetrating activity as well) and synthetic accessibility of polymers. We use topological representations of monomers (amino acids) and matrix representations of polymer chains (peptide sequences) to develop interpretable (attribute the gain in property to a specific monomer and/or chemical substructure) machine learning models. The choice of representation and model architecture enables inference of biochemical design principles, such as monomer composition, sequence length or net charge of polymer, by using gradient-based attribution methods.

Key challenges for applying machine learning to advance functional peptide design include limited dataset size (usually less than 100 data points), choosing effective representations, and the ability to explain and interpret models.

Here, we use a dataset of peptides received from our experimental collaborators to demonstrate the utility of the codebase.

Optimization of functionality

Based on our work on designing novel and highly efficient cell-penetrating peptides, we present a framework for the discovery of functional polymers (Figure 1). The framework consists of a recurrent neural network generator, convolutional neural network predictor, and genetic algorithm optimizer.

The generator is trained on a dataset of peptide sequences using Teacher Forcing, and enables sampling of novel sequences similar to the ones in the training dataset. The predictor is trained over matrix representations of sequences and experimentally determined biological activity. The optimizer is seeded with sequences sampled utilizing the generator. It optimizes by evaluating an objective function that involves the predicted activity and other parameters such as length and arginine content. The outcome is a list of optimized sequences with high predicted activity, which may be validated in wet-lab experiments.
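
The actual architectures live in the Peptimizer codebase; purely to illustrate the predictor idea, a minimal Keras model over a matrix representation of a peptide might look like the sketch below, where the alphabet size, maximum length and layer sizes are placeholders rather than the values used in our work:

import tensorflow as tf
from tensorflow.keras import layers

ALPHABET = 20   # hypothetical: number of amino acid monomers in the encoding
MAX_LEN = 30    # hypothetical: maximum peptide length after padding

def build_activity_predictor():
    # Input: one peptide as a matrix of per-monomer features (e.g. one-hot).
    seq_in = layers.Input(shape=(MAX_LEN, ALPHABET))
    x = layers.Conv1D(64, kernel_size=5, activation='relu', padding='same')(seq_in)
    x = layers.Conv1D(64, kernel_size=3, activation='relu', padding='same')(x)
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dense(32, activation='relu')(x)
    out = layers.Dense(1)(x)   # predicted activity (regression target)
    return tf.keras.Model(seq_in, out)

model = build_activity_predictor()
model.compile(optimizer='adam', loss='mse')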

Each of these components can be accessed from the tutorial notebook to train on a custom dataset. The scripts for the individual components have been designed in a modular fashion and can be modified with relative ease.

Optimization of synthetic accessibility

Apart from functionality optimization, Peptimizer allows for the optimization of synthetic accessibility of a wild-type sequence (Figure 2). The framework consists of a multi-modal convolutional neural network predictor and a brute force optimizer. The predictor is trained over experimental synthesis parameters such as pre-synthesized chain, incoming monomer, temperature, flow rate, and catalysts. The optimizer evaluates single-point mutants of the wild-type sequence for higher theoretical yield.

The choice of a brute force optimizer for optimization of synthetic accessibility is based on the linearly growing sequence space (m × n) for the variations of the wild-type sequence. This sequence space is relatively small in comparison to the exponentially growing sequence space (m^n) encountered in optimization of functionality.
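
To illustrate why this search is tractable, here is a sketch of brute-force enumeration of single-point mutants; the real optimizer also conditions on synthesis parameters such as temperature and flow rate, and predicted_yield below stands in for the trained predictor rather than any function in our codebase:

from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical monomers

def single_point_mutants(wild_type):
    # Enumerate all m x n single-point variants of a wild-type sequence.
    for pos, new_aa in product(range(len(wild_type)), AMINO_ACIDS):
        if new_aa != wild_type[pos]:
            yield wild_type[:pos] + new_aa + wild_type[pos + 1:]

def brute_force_optimize(wild_type, predicted_yield):
    # Rank variants by a user-supplied predicted_yield(sequence) function.
    candidates = [wild_type] + list(single_point_mutants(wild_type))
    return sorted(candidates, key=predicted_yield, reverse=True)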

This framework may be adapted for other stepwise chemical reaction platforms with in-line monitoring by specifying the different input and output variables and respective data types. It can be accessed using a tutorial notebook.

Figure 2. Outline of synthetic accessibility optimization.

Interpretability of models

A key feature of Peptimizer is the gradient-based attribution used to interpret model predictions (Figure 3). Taking the gradient of the predicted activity with respect to the input sequence representation, we visualize both positive and negative activations for each input feature. Fingerprint indices corresponding to substructures that contribute positively to the activity have higher activation in the heatmap. This activation heatmap is averaged along the topological fingerprint axis to find key substructures or chemical motifs that contribute positively/negatively to the predicted activity. Averaging over the monomer position axis, we obtain the relative contribution of each monomer to the predicted functionality of the polymer. These visualizations provide in-depth insight into sequence-activity relationships and add to the contemporary understanding of biochemical design principles.
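
In TensorFlow 2, this kind of gradient-based attribution can be sketched with a GradientTape; the encoding shape below matches the hypothetical predictor sketch above rather than Peptimizer's actual representation:

import tensorflow as tf

def attribution_map(model, sequence_matrix):
    # sequence_matrix: one peptide, shape (MAX_LEN, ALPHABET) -- hypothetical encoding.
    x = tf.convert_to_tensor(sequence_matrix, dtype=tf.float32)[tf.newaxis, ...]
    with tf.GradientTape() as tape:
        tape.watch(x)
        predicted_activity = model(x)
    grads = tape.gradient(predicted_activity, x)[0]   # (MAX_LEN, ALPHABET) heatmap
    per_monomer = tf.reduce_mean(grads, axis=1)       # contribution of each position
    per_feature = tf.reduce_mean(grads, axis=0)       # contribution of each feature
    return grads, per_monomer, per_feature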

Figure 3. (left) Positive gradient activation heatmap, and (right) activated chemical substructure, for functional peptide sequence.

Outlook

Optimization of functional polymers using Peptimizer can inform experimental strategies and lead to significant savings in terms of time and costs. We believe that the tutorial notebooks will help bench scientists in chemistry, materials science, and the broader field of sequence design to run machine learning models over custom datasets, such as Khazana. In addition, the attribution methods will provide insights into the high-dimensional sequence-activity relationships and elucidation of design principles.

Experimental collaboration

This work was done in collaboration with the lab of Bradley Pentelute (Department of Chemistry, MIT). The collaborators for the optimization of functionality and synthetic accessibility were Carly Schissel and Dr. Nina Hartrampf, respectively. We thank them for providing the dataset, experimental validation, and the discussion during the development of the models.

Acknowledgment

We would like to acknowledge the support of Thiru Palanisamy and Josh Gordon at Google for their help with the blog post collaboration and for providing active feedback.

Even Faster Mobile GPU Inference with OpenCL

Posted by Juhyun Lee and Raman Sarokin, Software Engineers

While the TensorFlow Lite (TFLite) GPU team continuously improves the existing OpenGL-based mobile GPU inference engine, we also keep investigating other technologies. One of those experiments turned out to be quite successful, and we are excited to announce the official launch of our OpenCL-based mobile GPU inference engine for Android, which offers up to ~2x speedup over the existing OpenGL backend on reasonably sized neural networks that have enough workload for the GPU.

Figure 1. Duo’s AR effects are powered by our OpenCL backend.

Improvements over the OpenGL Backend

Historically, OpenGL is an API designed for rendering graphics. Compute shaders were added with OpenGL ES 3.1, but backward-compatible API design decisions limited us from reaching the full potential of the GPU. OpenCL, on the other hand, was designed for computation with various accelerators from the beginning and is thus more relevant to our domain of mobile GPU inference. We therefore looked into an OpenCL-based inference engine, and it brings a number of features that let us optimize our mobile GPU inference engine.

Performance Profiling: Optimizing the OpenCL backend was much easier than the OpenGL one, because OpenCL offers good profiling features and Adreno supports them well. With these profiling APIs, we were able to measure the performance of each kernel dispatch very precisely.

Optimized Workgroup Sizes: We have observed that the performance of TFLite GPU on Qualcomm Adreno GPUs is very sensitive to workgroup sizes; picking the right workgroup size can boost performance, while picking the wrong one can degrade it by a similar amount. Unfortunately, picking the right workgroup size is not trivial for complex kernels with complicated memory access patterns. With the help of the aforementioned performance profiling features in OpenCL, we were able to implement an optimizer for workgroup sizes, which resulted in up to a 50% speedup over the average.

Native 16-bit Precision Floating Point (FP16): OpenCL supports FP16 natively and requires the accelerator to specify the data type’s availability. Because FP16 is part of the official spec, even some older GPUs, e.g. the Adreno 305 from 2012, can operate at their full capabilities. OpenGL, on the other hand, relies on hints which vendors can choose to ignore in their implementations, leading to no performance guarantees.

Constant Memory: OpenCL has a concept of constant memory. Qualcomm added a physical memory with properties that make it ideal for use with OpenCL’s constant memory. This turned out to be very efficient for certain special cases, e.g. very thin layers at the beginning or end of the neural network. OpenCL on Adreno is able to greatly outperform OpenGL by combining this physical constant memory with the aforementioned native FP16 support.

Performance Evaluation

Below, we show the performance of TFLite on the CPU (single-threaded on a big core), on the GPU using our existing OpenGL backend, and on the GPU using our new OpenCL backend. Figure 2 and Figure 3 depict the performance of the inference engine on select Android devices with OpenCL on a couple of well-known neural networks, MNASNet 1.3 and SSD MobileNet v3 (large), respectively. Each group of 3 bars should be read independently; it shows the relative speedup among the TFLite backends on a given device. Our new OpenCL backend is roughly twice as fast as the OpenGL backend, and it does particularly well on Adreno devices (annotated with SD), as we have tuned the workgroup sizes with Adreno’s performance profilers mentioned earlier. Comparing Figure 2 and Figure 3 also shows that OpenCL performs even better on larger networks.

Figure 2. Inference latency of MNASNet 1.3 on select Android devices with OpenCL.
Figure 3. Inference latency of SSD MobileNet v3 (large) on select Android devices with OpenCL.

Seamless Integration through the GPU Delegate

One major hurdle in employing the OpenCL inference engine is that OpenCL is not a part of the standard Android distribution. While major Android vendors include OpenCL as part of their system library, it is possible that OpenCL is not available for some users. For these devices, one needs to fall back to the OpenGL backend which is available on every Android device.

To make developers’ lives easier, we have added a couple of modifications to the TFLite GPU delegate. At runtime, we first check the availability of OpenCL. If it is available, we use the new OpenCL backend, as it is much faster than the OpenGL backend; if it is unavailable or cannot be loaded, we fall back to the existing OpenGL backend. In fact, the OpenCL backend has been in the TensorFlow repository since mid-2019 and is seamlessly integrated through the TFLite GPU delegate v2, so you might already be using it through the delegate’s fallback mechanism.

Acknowledgements

Andrei Kulik, Matthias Grundman, Jared Duke, Sarah Sirajuddin, and special thanks to Sachin Joglekar for his contributions to this blog post.

Creating Sounds Of India: An on device, AI powered, musical experience built with TensorFlow

Posted by Anusha Ramesh, Product Manager TFX, David Zats, Software Engineer TFX, Ping Yu, Software Engineer TensorFlow.js, Lamtharn (Hanoi) Hantrakul, AI Resident Magenta

Introduction

Sounds of India is a unique and fun interactive musical experience launching for India’s 74th Independence Day, inspired by Indian tradition and powered by machine learning. When users throughout India (and around the world) sing the Indian National Anthem into the microphone of their mobile devices, machine learning models transform their voices into a range of classical Indian musical instruments live in the browser. The entire process of creating this experience took only 12 weeks, showing how rapidly developers can take models from research to production at scale using the TensorFlow Ecosystem.

The Research: Magenta’s Differentiable Digital Signal Processing (DDSP)

Magenta is an open source research project within Google AI exploring the role of machine learning in the creative process. Differentiable Digital Signal Processing, or DDSP, is a new open source library fusing modern machine learning with interpretable signal processing. Instead of training a pure deep learning model like WaveNet to render waveforms sample-by-sample, we can train lightweight models that output time-varying control signals to differentiable DSP modules (hence the extra “D” in DDSP), which synthesize the final sound. Both recurrent and convolutional models incorporating DDSP in TensorFlow Keras layers can efficiently generate audio 1000 times faster than their larger autoregressive counterparts, with a 100x reduction in model parameters and training data requirements. One particularly fun application of DDSP is Tone Transfer, which transforms sounds into musical instruments. Try it by first training a DDSP model on 15 minutes of audio from a target saxophone. You can then sing a melody and the trained DDSP model will re-render it as a saxophone. For Sounds of India, we applied this technology to three classical Indian instruments: the Bansuri, the Shehnai, and the Sarangi.
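To make the idea of control signals driving differentiable DSP modules concrete, here is a small NumPy sketch of a harmonic synthesizer whose inputs are per-sample pitch and loudness curves, the kind of signals a lightweight network would predict. It is a conceptual illustration only and does not use the DDSP library's actual API.

import numpy as np

def harmonic_synth(f0_hz, amplitude, sample_rate=16000, n_harmonics=10):
    # Integrate the per-sample fundamental frequency to obtain phase.
    phase = 2.0 * np.pi * np.cumsum(f0_hz) / sample_rate
    audio = np.zeros_like(f0_hz)
    for k in range(1, n_harmonics + 1):
        audio += np.sin(k * phase) / k  # simple 1/k harmonic rolloff
    return amplitude * audio / n_harmonics

# Example: one second of a 220 Hz tone with a linear fade-in.
t = np.linspace(0.0, 1.0, 16000)
audio = harmonic_synth(np.full_like(t, 220.0), t)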

Train with TFX, deploy to the browser with TensorFlow.js

TFX

TensorFlow Extended (TFX) is an end-to-end platform for production ML, which includes preparing data, training, validating, and deploying models in production environments. TFX was used to train the models responsible for transforming the user’s voice to one of the instruments, and these models were then converted to TensorFlow.js for deployment on a standard Web browser.

Deploying to the browser provides a seamless experience for users to interact with the machine learning model: simply click a hyperlink and load the page just like any other website. No complicated installs are necessary. By executing client-side in the browser, we are able to perform inference right at the source of the sensor data, minimizing latency and reducing server costs associated with large graphics cards, CPU, and memory. Moreover, given that the application uses your voice as input, user privacy is quite important. Since the entire end-to-end experience happens client-side in the browser, absolutely no sensor or microphone data is sent to the server side.

Browser-based machine learning models are often optimized to be as small as possible to minimize bandwidth used. In this case, the ideal hyperparameters for each musical instrument can also vary drastically. We leveraged TFX to perform large-scale training and tuning over hundreds of models to determine the smallest size for each instrument. As a result, we were able to dramatically reduce their memory footprints. For example, the Bansuri instrument model had a reduction in its on-disk size of ~20x without a noticeable impact on sound quality.

TFX also empowered us to perform rapid iteration over different model architectures (GRU, CNN), different types of inputs (loudness, RMS energy), and varying musical instrument data sources. Each time, we were able to quickly and effectively run the TFX pipeline to produce a new model with the desired characteristics.

TensorFlow.js

Creating a TensorFlow.js DDSP model was uniquely challenging because of the need to hit tight performance and model quality targets. We needed the model to be highly efficient at performing tone transfer so that it could effectively run on mobile devices. At the same time, any degradation in model quality would quickly lead to audio distortions and a poor user experience.

We started by exploring a wide range of TensorFlow.js backends and model architectures. The WebGL backend is the most optimized, while the WebAssembly backend works well on low end phones. Given the computational requirements of DDSP, we settled on a Convnet-based DDSP model and leveraged the WebGL backend.

To minimize the model download time, we studied the topology of the model, and compressed a large set of constant tensors with Fill/ZeroLike ops, which reduced the size from 10MB to 300KB.

We also focused on three key areas to make the TensorFlow.js model ready for production scale deployment on devices: inference performance, memory footprint, and numerical stability.

Inference Performance Optimization
DDSP models contain both a neural network and a signal synthesizer. The synthesizer part has many signal processing ops that require large amounts of computation. To improve performance on mobile devices, we re-wrote several kernels with special WebGL shaders to fully utilize the GPU. For example, a parallel version of the cumulative summation op reduced inference time by 90%.
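The shader itself is WebGL-specific, but the underlying trick of a parallel (log-step) cumulative sum can be illustrated in NumPy: each pass adds a shifted copy of the array, so an n-element scan finishes in about log2(n) vectorized passes instead of n sequential additions. This is a conceptual sketch, not the production kernel.

import numpy as np

def parallel_cumsum(x):
    # Hillis-Steele style inclusive scan: O(log n) passes over the data.
    out = np.asarray(x, dtype=np.float32).copy()
    stride = 1
    while stride < len(out):
        shifted = np.concatenate([np.zeros(stride, dtype=out.dtype), out[:-stride]])
        out = out + shifted
        stride *= 2
    return out

assert np.allclose(parallel_cumsum([1, 2, 3, 4]), np.cumsum([1, 2, 3, 4]))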

Reduce memory footprint
Our goal is to be able to run the model on as many types of mobile devices as possible. Since many phones have very limited GPU memory, we need to make sure that the model has a minimal memory footprint. We achieve this by disposing of intermediate tensors and adding a new flag that allows early disposal of GPU textures. Through these approaches we were able to reduce the memory footprint by 60%.

Numerical stability
The DDSP model requires very high numerical precision in order to generate beautiful music. This is quite different from typical classification models, where a certain level of precision loss does not affect the final classification. The DDSP models used in this experience are generative models: any loss in precision or discontinuity in the audio output is easily picked up by our sensitive ears. We encountered numerical stability problems with float16 WebGL textures, so we rewrote some of the key ops to reduce overflow and underflow of the outputs. For example, in the cumulative summation op, we make sure accumulation is performed within the shader at full float precision and apply a modulo operation to avoid overflow before writing the output to a float16 texture.
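The wrap-around trick can also be illustrated in NumPy. Because the accumulated value is ultimately used as a periodic phase, it can be reduced modulo 2π at full precision before being stored in float16, which keeps its magnitude well below float16's maximum of roughly 65504. This is a conceptual sketch of the idea, not the shader code.

import numpy as np

# One minute of phase accumulation at 440 Hz easily exceeds the float16 range.
f0_hz, sample_rate, seconds = 440.0, 16000, 60
phase = 2.0 * np.pi * f0_hz * np.arange(seconds * sample_rate) / sample_rate

naive_fp16 = phase.astype(np.float16)                          # overflows to inf
wrapped_fp16 = np.mod(phase, 2.0 * np.pi).astype(np.float16)   # stays in [0, 2*pi)

print(np.isinf(naive_fp16).any())    # True: raw accumulation overflows
print(np.isinf(wrapped_fp16).any())  # False: wrapped values remain representable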

Try it yourself!

You can try out the experience on your mobile phone at g.co/SoundsofIndia – and please share your results with us if you wish. We would love to see what you create with your voice.

If you are excited about how machine learning can augment creativity and innovation, you can learn more about Magenta through the team’s blog and contribute to their open source GitHub, or check out #MadeWithTFJS for even more examples of browser-based machine learning from the TensorFlow.js community. If you are interested in training and deploying models at production scale using ML best practices, check out the TensorFlow Extended blog.

Acknowledgements

This project wouldn’t have been possible without the incredible effort of Miguel de Andrés-Clavera, Yiling Liu, Aditya Mirchandani, KC Chung, Alap Bharadwaj, Kiattiyot (Boon) Panichprecha, Pittayathorn (Kim) Nomrak, Phatchara (Lek) Pongsakorntorn, Nattadet Chinthanathatset, Hieu Dang, Ann Yuan, Sandeep Gupta, Chong Li, Edwin Toh, Jesse Engel and additional help from Michelle Carney, Nida Zada, Doug Eck, Hannes Widsomer and Greg Mikels. Huge thanks to Tris Warkentin and Mitch Trott for their tremendous support.

TensorFlow Model Optimization Toolkit — Weight Clustering API

A guest post by Mohamed Nour Abouelseoud, and Anton Kachatkou at Arm

We are excited to introduce a weight clustering API, proposed and contributed by Arm, to the TensorFlow Model Optimization Toolkit.

Weight clustering is a technique to reduce the storage and transfer size of your model by replacing many unique parameter values with a smaller number of unique values. This benefit applies to all deployments. Along with framework and hardware-specific support, such as in the Arm Ethos-N and Ethos-U machine learning processors, weight clustering can additionally improve memory footprint and inference speed.

This work is part of the toolkit’s roadmap to support the development of smaller and faster ML models. You can see previous posts on post-training quantization, quantization-aware training, and sparsity for more background on the toolkit and what it can do.

Arm and the TensorFlow team have been collaborating in this space to improve deployment to mobile and IoT devices.

What is weight clustering?

Increasingly, Deep Learning applications are moving into more resource-constrained environments, from smartphones to agricultural sensors and medical instruments. This shift into resource-constrained environments led to efforts for smaller and more efficient model architectures as well as increased emphasis on model optimization techniques such as pruning and quantization.

Weight clustering is an optimization algorithm to reduce the storage and network transfer size of your model. The idea in a nutshell is explained in the diagram below.

Here’s an explanation of the diagram. Imagine, for example, that a layer in your model contains a 4×4 matrix of weights (represented by the “weight matrix” above). Each weight is stored using a float32 value. When you save the model, you are storing 16 unique float32 values to disk.

Weight clustering reduces the size of your model by replacing similar weights in a layer with the same value. These values are found by running a clustering algorithm over the model’s trained weights. The user can specify the number of clusters (in this case, 4). This step is shown in “Get centroids” in the diagram above, and the 4 centroid values are shown in the “Centroid” table. Each centroid value has an index (0-3).

Next, each weight in the weight matrix is replaced with its centroid’s index. This step is shown in “Assign indices”. Now, instead of storing the original weight matrix, the weight clustering algorithm can store the modified matrix shown in “Pull indices” (containing the index of the centroid values), and the centroid values themselves.

In this case, we have reduced the size from 16 unique floats to 4 floats and 16 2-bit indices. The savings increase with larger matrix sizes.

Note that even if we still stored 16 floats, they now have just 4 distinct values. Common compression tools (like zip) can now take advantage of the redundancy in the data to achieve higher compression.
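The whole scheme fits in a few lines of NumPy. The sketch below clusters a 4×4 weight matrix into 4 linearly spaced centroids and stores only the centroid table plus per-weight indices; it is an illustration of the storage format, not the toolkit's implementation, which also handles gradient updates during fine-tuning.

import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4)).astype(np.float32)   # the "weight matrix"

n_clusters = 4
# Centroids linearly spaced between the min and max weight (cf. the LINEAR
# initialization used in the API example below); a real run would refine them.
centroids = np.linspace(weights.min(), weights.max(), n_clusters, dtype=np.float32)

# "Assign indices": map each weight to the index of its nearest centroid.
indices = np.abs(weights[..., None] - centroids).argmin(axis=-1).astype(np.uint8)

# What gets stored: 4 float32 centroids plus 16 small integer indices.
clustered_weights = centroids[indices]                 # values used at inference

Because clustered_weights now contains only four distinct values, a general-purpose compressor such as zip can shrink it far more than the original matrix.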

The technical implementation of clustering is derived from Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. See the paper for additional details on the gradient update and weight retrieval.

Clustering is available through a simple Keras API, in which any Keras model (or layer) can be wrapped and fine-tuned. See usage examples below.

Advantages of weight clustering

Weight clustering has an immediate advantage in reducing model storage and transfer size across serialization formats, as a model with shared parameters has a much higher compression rate than one without. This is similar to a sparse (pruned) model, except that the compression benefit is achieved through reducing the number of unique weights, while pruning achieves it through setting weights below a certain threshold to zero. Once a Keras model is clustered, the benefit of the reduced size is available by passing it through any common compression tool.

To further unlock the improvements in memory usage and speed at inference time associated with clustering, specialized run-time or compiler software and dedicated machine learning hardware is required. Examples include the Arm ML Ethos-N driver stack for the Ethos-N processor and the Ethos-U Vela compiler for the Ethos-U processor. Both examples currently require quantizing and converting optimized Keras models to TensorFlow Lite first.

Clustering can be done on its own or as part of a cascaded Deep Compression optimization pipeline to achieve further size reduction and faster inference.

Compression and accuracy results

Experiments were run on several popular models, demonstrating compression benefits of weight clustering. More aggressive optimizations can be applied, but at the cost of accuracy. Though the table below includes measurements for TensorFlow Lite models, similar benefits are observed for other serialization formats such as SavedModel.

The table below demonstrates how clustering was configured to achieve the results. Some models were more prone to accuracy degradation from aggressive clustering, in which case selective clustering was used on layers that are more robust to optimization.

Clustering a model

The clustering API is available in the TensorFlow Model Optimization Toolkit starting from release v0.4.0. To cluster a model, it needs to be fully trained first before passing it to the clustering API. A snippet of full model clustering is shown below.

import tensorflow_model_optimization as tfmot
cluster_weights = tfmot.clustering.keras.cluster_weights


pretrained_model = ...  # a fully trained tf.keras model

clustering_params = {
    'number_of_clusters': 32,
    'cluster_centroids_init': tfmot.clustering.keras.CentroidInitialization.LINEAR
}

clustered_model = cluster_weights(pretrained_model, **clustering_params)

# Fine-tune
clustered_model.fit(...)


# Prepare model for serving by removing training-only variables.
model_for_serving = tfmot.clustering.keras.strip_clustering(clustered_model)

...

To cluster only selected layers in a model, you can apply the same clustering wrapper to those layers when constructing the model.

clustered_model = tf.keras.Sequential([
    Dense(...),
    cluster_weights(Dense(...,
                          kernel_initializer=pretrained_weights,
                          bias_initializer=pretrained_bias),
                    **clustering_params),
    Dense(...)
])

When selectively clustering a layer, it still needs to have been fully trained; therefore, we use the layer’s kernel_initializer parameter to initialize its weights with the pretrained values. Using tf.keras.models.clone_model is another option.
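For an already trained model, the clone_model route looks roughly like the sketch below, following the pattern in the toolkit's comprehensive guide; the helper name is illustrative, and cluster_weights and clustering_params are the objects defined in the first snippet.

import tensorflow as tf

def apply_clustering_to_dense(layer):
    # Wrap only Dense layers with clustering; leave other layers untouched.
    if isinstance(layer, tf.keras.layers.Dense):
        return cluster_weights(layer, **clustering_params)
    return layer

clustered_model = tf.keras.models.clone_model(
    pretrained_model,
    clone_function=apply_clustering_to_dense,
)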

Documentation

To learn more about how to use the API, you can try this simple end-to-end clustering example colab to start. A more comprehensive guide with additional tips can be found here.

Acknowledgments

The feature and results presented in this post are the work of many people including the Arm ML Tooling team and our collaborators in Google’s TensorFlow Model Optimization Toolkit team.

From Arm – Anton Kachatkou, Aron Virginas-Tar, Ruomei Yan, Konstantin Sofeikov, Saoirse Stewart, Peng Sun, Elena Zhelezina, Gergely Nagy, Les Bell, Matteo Martincigh, Grant Watson, Diego Russo, Benjamin Klimczak, Thibaut Goetghebuer-Planchon.

From Google – Alan Chiao, Raziel Alvarez