Accelerating TensorFlow Performance on Mac

Posted by Pankaj Kanwar and Fred Alcober

Apple M1 logo

With TensorFlow 2, best-in-class training performance on a variety of platforms, devices, and hardware enables developers, engineers, and researchers to work on their preferred platform. TensorFlow users on Intel Macs or Macs powered by Apple’s new M1 chip can now take advantage of accelerated training using Apple’s Mac-optimized version of TensorFlow 2.4 and the new ML Compute framework. These improvements, combined with Apple developers’ ability to run TensorFlow on iOS through TensorFlow Lite, continue to showcase TensorFlow’s breadth and depth in supporting high-performance ML execution on Apple hardware.

Performance on the Mac with ML Compute

The Mac has long been a popular platform for developers, engineers, and researchers. Following Apple’s announcement last week of an updated lineup of Macs containing the new M1 chip, Apple’s Mac-optimized version of TensorFlow 2.4 leverages the full power of the Mac with a huge jump in performance.

ML Compute, Apple’s new framework that powers training for TensorFlow models right on the Mac, now lets you take advantage of accelerated CPU and GPU training on both M1- and Intel-powered Macs.

For example, the M1 chip contains a powerful new 8-core CPU and a GPU with up to 8 cores, both optimized for ML training tasks right on the Mac. In the graphs below, you can see how Mac-optimized TensorFlow 2.4 can deliver huge performance increases on both M1- and Intel-powered Macs with popular models.

Training impact on common models using ML Compute on M1- and Intel-powered 13-inch MacBook Pro, shown in seconds per batch (lower numbers indicate faster training).
Training impact on common models using ML Compute on the Intel-powered 2019 Mac Pro, shown in seconds per batch (lower numbers indicate faster training).

Getting Started with Mac-optimized TensorFlow

Users do not need to make any changes to their existing TensorFlow scripts to use ML Compute as a backend for TensorFlow and TensorFlow Addons.

To get started, visit Apple’s GitHub repo for instructions to download and install the Mac-optimized TensorFlow 2.4 fork.

In the near future, we’ll make it even easier for users to get these performance gains by integrating the forked version into the TensorFlow master branch.

You can learn more about the ML Compute framework on Apple’s Machine Learning website.

Footnotes:

  1. Testing conducted by Apple in October and November 2020 using a preproduction 13-inch MacBook Pro system with Apple M1 chip, 16GB of RAM, and 256GB SSD, as well as a production 1.7GHz quad-core Intel Core i7-based 13-inch MacBook Pro system with Intel Iris Plus Graphics 645, 16GB of RAM, and 2TB SSD. Tested with prerelease macOS Big Sur, TensorFlow 2.3, prerelease TensorFlow 2.4, ResNet50V2 with fine-tuning, CycleGAN, Style Transfer, MobileNetV3, and DenseNet121. Performance tests are conducted using specific computer systems and reflect the approximate performance of MacBook Pro.
  2. Testing conducted by Apple in October and November 2020 using a production 3.2GHz 16-core Intel Xeon W-based Mac Pro system with 32GB of RAM, AMD Radeon Pro Vega II Duo graphics with 64GB of HBM2, and 256GB SSD. Tested with prerelease macOS Big Sur, TensorFlow 2.3, prerelease TensorFlow 2.4, ResNet50V2 with fine-tuning, CycleGAN, Style Transfer, MobileNetV3, and DenseNet121. Performance tests are conducted using specific computer systems and reflect the approximate performance of Mac Pro.

Applying MinDiff to Improve Model Fairness

Posted by Summer Misherghi and Thomas Greenspan, Software Engineers, Google Research

Last December, we open-sourced Fairness Indicators, a platform that enables sliced evaluation of machine learning model performance. This type of responsible evaluation is a crucial first step toward avoiding bias, as it allows us to determine how our models are working for a wide variety of users. When we do find that our model underperforms on certain slices of our data, we need a strategy to mitigate this to avoid creating or reinforcing unfair bias, in line with Google’s AI Principles.

Today, we’re announcing MinDiff, a technique for addressing unfair bias in machine learning models. Given two slices of data, MinDiff works by penalizing your model for differences in the distributions of scores between the two sets. As the model trains, it will try to minimize the penalty by bringing the distributions closer together. MinDiff is the first in what will ultimately be a larger Model Remediation Library of techniques, each suitable for different use cases. To learn about the research and theory behind MinDiff, please see our post on the Google AI Blog.
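
MinDiff's `MMDLoss`, used later in this walkthrough, is based on maximum mean discrepancy (MMD), a measure of how far apart two distributions are. As a rough illustration of what "penalizing differences in score distributions" means, here is a minimal pure-Python sketch of a biased squared-MMD estimate between two sets of scores with a Gaussian kernel. It is not the library's implementation; the function names and the `kernel_length` value are our own hypothetical choices.

```python
import math
from typing import Sequence


def gaussian_kernel(x: float, y: float, kernel_length: float) -> float:
    """Gaussian (RBF) similarity between two scalar scores; 1.0 when x == y."""
    return math.exp(-((x - y) ** 2) / (2.0 * kernel_length ** 2))


def mean_kernel(a: Sequence[float], b: Sequence[float], kernel_length: float) -> float:
    """Average kernel value over every pair (x, y) with x from a and y from b."""
    total = sum(gaussian_kernel(x, y, kernel_length) for x in a for y in b)
    return total / (len(a) * len(b))


def mmd_squared(a: Sequence[float], b: Sequence[float], kernel_length: float = 0.1) -> float:
    """Biased estimate of squared MMD between two sets of model scores.

    It is 0.0 when both sets are identical and grows as their distributions move apart.
    """
    return (mean_kernel(a, a, kernel_length)
            + mean_kernel(b, b, kernel_length)
            - 2.0 * mean_kernel(a, b, kernel_length))


# Hypothetical scores for non-toxic comments from two slices of data.
sensitive_scores = [0.62, 0.55, 0.71, 0.48]
nonsensitive_scores = [0.21, 0.30, 0.18, 0.25]
print(mmd_squared(sensitive_scores, nonsensitive_scores))    # positive: distributions differ
print(mmd_squared(nonsensitive_scores, nonsensitive_scores))  # 0.0: identical sets
```

Used as a training penalty, a term like this is small only when the model scores both slices similarly, which is the pressure MinDiff applies.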

MinDiff Walkthrough

You can follow along and run the code yourself in this MinDiff notebook. In this walkthrough, we’ll emphasize important points in the notebook, while providing context on fairness evaluation and remediation.

In this example, we are training a text classifier to identify written content that could be considered “toxic.” For this task, our baseline model will be a simple Keras sequential model pre-trained on the Civil Comments dataset. Since this text classifier could be used to automatically moderate forums on the internet (for example, to flag potentially toxic comments), we want to ensure that it works well for everyone. You can read more about how fairness problems can arise in automated content moderation in this blog post.

To attempt to mitigate potential fairness concerns, we will:

  1. Evaluate our baseline model’s performance on text containing references to sensitive groups.
  2. Improve performance on any underperforming groups by training with MinDiff.
  3. Evaluate the new model’s performance on our chosen metric.

Our purpose is to demonstrate usage of the MinDiff technique for you with a minimal workflow, not to lay out a complete approach to fairness in machine learning. Our evaluation will only focus on one sensitive category and a single metric. We also don’t address potential shortcomings in the dataset, nor tune our configurations.

In a production setting, you would want to approach each of these with more rigor. For example:

  • Consider the application space and the potential societal impact of your model; what are the implications of different types of model errors?
  • Consider additional categories for which underperformance might have fairness implications. Do you have sufficient examples for groups in each category?
  • Consider any privacy implications to storing the sensitive categories.
  • Consider any metric for which poor performance could translate into harmful outcomes.
  • Conduct thorough evaluation for all relevant metrics on multiple sensitive categories.
  • Experiment with the configuration of MinDiff by tuning hyperparameters to get optimal performance.

For the purpose of this blog post, we’ll skip building and training our baseline model and jump right to evaluating its performance. We’ve used some utility functions to compute our metrics, and we’re ready to visualize evaluation results (see “Render Evaluation Results” in the notebook):

from tensorflow_model_analysis.addons.fairness.view import widget_view

widget_view.render_fairness_indicator(eval_result)

Let’s look at the evaluation results. Try selecting the metric false positive rate (FPR) with threshold 0.450. We can see that the model does not perform as well for some religious groups as for others, displaying a much higher FPR. Note the wide confidence intervals for some groups, which have too few examples. This makes it difficult to say with certainty that there is a significant difference in performance for these slices, and we may want to collect more examples to address this issue. We can, however, attempt to apply MinDiff for the two groups that we are confident are underperforming.

We’ve chosen to focus on FPR, because a higher FPR means that comments referencing these identity groups are more likely to be incorrectly flagged as toxic. This could lead to inequitable outcomes for users engaging in dialogue about religion, but note that disparities in other metrics can lead to other types of harm.
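
To make the metric concrete: FPR is the fraction of ground-truth negatives (non-toxic comments) that the model flags as toxic at a given threshold. The plain-Python sketch below computes it per slice. It is a simplified stand-in for what Fairness Indicators reports (no confidence intervals); the record layout, the function name, and the convention that a score at or above the threshold counts as a positive prediction are our assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (group, label, score), where label 1 = toxic, 0 = non-toxic.
Record = Tuple[str, int, float]


def fpr_by_group(records: Iterable[Record], threshold: float) -> Dict[str, float]:
    """False positive rate per group, FP / (FP + TN), computed over negatives only."""
    negatives = defaultdict(int)        # ground-truth negatives seen per group
    false_positives = defaultdict(int)  # negatives scored at or above the threshold
    for group, label, score in records:
        if label == 0:
            negatives[group] += 1
            if score >= threshold:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items()}


# Hypothetical examples; toxic examples never enter the FPR calculation.
records = [
    ("christian", 0, 0.20), ("christian", 0, 0.50), ("christian", 1, 0.90),
    ("muslim", 0, 0.47), ("muslim", 0, 0.60), ("jewish", 0, 0.30),
]
print(fpr_by_group(records, threshold=0.45))
# {'christian': 0.5, 'muslim': 1.0, 'jewish': 0.0}
```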

Now, we’ll try to improve the FPR for religious groups for which our model underperforms. We’ll attempt to do so using MinDiff, a remediation technique that seeks to balance error rates across slices of your data by penalizing disparities in performance during training. When we apply MinDiff, model performance may degrade slightly on other slices. As such, our goals with MinDiff will be to improve performance for underperforming groups, while sustaining strong performance for other groups and overall.

To use MinDiff, we create two additional data splits:

  • A split for non-toxic examples referencing minority groups: in our case, this includes comments that reference our underperforming identity terms. We exclude some of the groups because they have too few examples, which leads to higher uncertainty and wide confidence intervals.
  • A split for non-toxic examples referencing the majority group.

It’s important to have sufficient examples belonging to the underperforming classes. Based on your model architecture, data distribution, and MinDiff configuration, the amount of data needed can vary significantly. In past applications, we have seen MinDiff work well with at least 5,000 examples in each data split.

In our case, the two groups in the minority split contain 9,688 and 3,906 examples. Note the class imbalances in the dataset; in practice, this could be cause for concern, but we won’t address them in this notebook since our intention is just to demonstrate MinDiff.

We select only negative examples for these groups, so that MinDiff can optimize on getting these examples right. It may seem counterintuitive to carve out sets of ground truth negative examples if we’re primarily concerned with disparities in false positive rate, but remember that a false positive prediction is a ground truth negative example that’s incorrectly classified as positive, which is the issue we’re trying to address.

To prepare our data splits, we create masks for the sensitive & non-sensitive groups:

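# Comments that mention the underperforming identity terms (Jewish or Muslim).
minority_mask = data_train.religion.apply(
    lambda x: any(religion in x for religion in ('jewish', 'muslim')))
# Comments whose only religion label is Christian (the majority group).
majority_mask = data_train.religion.apply(
    lambda x: x == "['christian']")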

Next, we select negative examples, so MinDiff will be able to reduce FPR for sensitive groups:

import copy

# Ground-truth non-toxic examples.
true_negative_mask = data_train['toxicity'] == 0

data_train_main = copy.copy(data_train)
data_train_sensitive = (
    data_train[minority_mask & true_negative_mask])
data_train_nonsensitive = (
    data_train[majority_mask & true_negative_mask])

To start training with MinDiff, we need to convert our data to TensorFlow datasets (tf.data.Dataset objects; not shown here, see “Create MinDiff Datasets” in the notebook for details). Don’t forget to batch your data for training. In our case, we set the batch sizes to the same value as for the original dataset, but this is not a requirement and should be tuned in practice.

dataset_train_sensitive = dataset_train_sensitive.batch(BATCH_SIZE)
dataset_train_nonsensitive = (
    dataset_train_nonsensitive.batch(BATCH_SIZE))

Once we have prepared our three datasets, we merge them into one MinDiff dataset using a utility function provided in the library.

from tensorflow_model_remediation import min_diff as md

min_diff_dataset = md.keras.utils.pack_min_diff_data(
    dataset_train_main,
    dataset_train_sensitive,
    dataset_train_nonsensitive)

To train with MinDiff, simply take the original model and wrap it in a MinDiffModel with a corresponding `loss` and `loss_weight`. We use 1.5 as the `loss_weight` here, but this parameter needs to be tuned for your use case, since it depends on your model and product requirements. Experiment with different values to see how they affect the model: increasing the weight pushes the performance of the minority and majority groups closer together, but may come with more pronounced tradeoffs.
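
Conceptually, `MinDiffModel` adds the MinDiff loss, scaled by `loss_weight`, to the model's original loss during training. The plain-Python sketch below illustrates that weighted sum with binary cross-entropy as the primary loss and a made-up penalty value; it is not how `MinDiffModel` is implemented internally, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def binary_cross_entropy(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def total_loss(primary_loss: float, min_diff_penalty: float, loss_weight: float) -> float:
    """Task loss plus the MinDiff penalty scaled by loss_weight."""
    return primary_loss + loss_weight * min_diff_penalty


primary = binary_cross_entropy([0, 1, 0], [0.2, 0.8, 0.4])
penalty = 0.05  # hypothetical MinDiff penalty value for one batch
for weight in (0.5, 1.5, 3.0):
    combined = total_loss(primary, penalty, weight)
    print(f"loss_weight={weight}: total={combined:.4f}, "
          f"penalty share={weight * penalty / combined:.1%}")
```

As the weight grows, the penalty makes up a larger share of the objective, so training trades more of the primary loss for closer score distributions.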

As described above, we create the original model and wrap it in a MinDiffModel, passing in one of the MinDiff losses and a moderately high weight of 1.5.

original_model = ...  # Same structure as used for baseline model.

min_diff_loss = md.losses.MMDLoss()  # Maximum mean discrepancy (MMD) loss.
min_diff_weight = 1.5
min_diff_model = md.keras.MinDiffModel(
    original_model, min_diff_loss, min_diff_weight)

After wrapping the original model, we compile the model as usual. This means using the same loss as for the baseline model:

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
loss = tf.keras.losses.BinaryCrossentropy()
min_diff_model.compile(
    optimizer=optimizer, loss=loss, metrics=['accuracy'])

We fit the model on the MinDiff dataset and save the original model for evaluation (see the API documentation for details on why we don’t save the MinDiff model).

min_diff_model.fit(min_diff_dataset, epochs=20)

min_diff_model.save_original_model(
    min_diff_model_location, save_format='tf')

Finally, we evaluate the new results.

min_diff_eval_subdir = 'eval_results_min_diff'
min_diff_eval_result = util.get_eval_results(
    min_diff_model_location, base_dir, min_diff_eval_subdir,
    validate_tfrecord_file, slice_selection='religion')

To evaluate the new model correctly, we need to select a threshold the same way we did for the baseline model. In a production setting, this would mean ensuring that evaluation metrics meet launch standards. In our case, we will pick the threshold that results in an overall FPR similar to the baseline model’s. This threshold may differ from the one you selected for the baseline model. Try selecting false positive rate with threshold 0.400. (Note that the subgroups with very few examples have very wide confidence intervals and don’t have predictable results.)
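
One simple way to pick such a threshold is to scan a grid of candidate thresholds and keep the one whose overall FPR is closest to the baseline's. The sketch below does this in plain Python; the candidate grid, the tie-breaking rule, the function names, and the toy data are hypothetical choices, not what the notebook does.

```python
from typing import Sequence


def overall_fpr(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """FP / (FP + TN) over all ground-truth negatives; a score >= threshold counts as positive."""
    negative_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not negative_scores:
        raise ValueError("need at least one ground-truth negative")
    return sum(s >= threshold for s in negative_scores) / len(negative_scores)


def match_fpr_threshold(labels: Sequence[int], scores: Sequence[float],
                        target_fpr: float, candidates: Sequence[float]) -> float:
    """Pick the candidate threshold whose overall FPR is closest to target_fpr.

    Ties go to the higher (more conservative) threshold.
    """
    return min(candidates,
               key=lambda t: (abs(overall_fpr(labels, scores, t) - target_fpr), -t))


# Hypothetical labels (1 = toxic) and scores from the retrained model.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.10, 0.33, 0.42, 0.61, 0.70, 0.90]
candidates = [round(0.05 * i, 2) for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
threshold = match_fpr_threshold(labels, scores, target_fpr=0.25, candidates=candidates)
print(threshold, overall_fpr(labels, scores, threshold))
```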

widget_view.render_fairness_indicator(min_diff_eval_result)

Note: The scale of the y-axis has changed from .04 in the graph for the baseline model to .02 for our MinDiff model.

Reviewing these results, you may notice that the FPRs for our target groups have improved. The gap between our lowest-performing group and the majority group has narrowed from .024 to .006. Given the improvements we’ve observed and the continued strong performance for the majority group, we’ve satisfied both of our goals. Depending on the product, further improvements may be necessary, but this approach has gotten our model one step closer to performing equitably for all users.

MinDiff Chart

To get a better sense of scale, we superimposed the MinDiff model on top of the base model.

You can get started with MinDiff by visiting the MinDiff page on tensorflow.org. More information about the research behind MinDiff is available in our post on the Google AI Blog. You can also learn more about evaluating for fairness in this guide.

Acknowledgements

The MinDiff framework was developed in collaboration with Thomas Greenspan, Summer Misherghi, Sean O’Keefe, Christina Greer, Catherina Xu, Manasi Joshi, Dan Nanas, Nick Blumm, Jilin Chen, Zhe Zhao, James Chen, Maciej Kula, Lichan Hong, and Mahesh Sathiamoorthy. This research effort on ML Fairness in classification was jointly led by (in alphabetical order) Alex Beutel, Ed H. Chi, Flavien Prost, Hai Qian, Jilin Chen, Shuo Chen, and Tulsee Doshi. Further, this work was pursued in collaboration with Christine Luu, Jonathan Bischof, Pierre Kreitmann, and Qiuwen Chen.

Characterizing quantum advantage in machine learning by understanding the power of data

Posted by Hsin-Yuan Huang (Google/Caltech), Michael Broughton (Google), Jarrod R. McClean (Google), and Masoud Mohseni (Google)

Data drives machine learning. Large-scale research and production ML both depend on high-volume, high-quality sources of data, where more is often better. The use of training data has enabled machine learning algorithms to achieve better performance than traditional algorithms at recognizing photos, understanding human languages, and even tasks that have been championed as premier applications for quantum computers, such as predicting properties of novel molecules for drug discovery and the design of better catalysts, batteries, or OLEDs. Without data, these machine learning algorithms would be no more useful than traditional algorithms.

While existing machine learning models run on classical computers, quantum computers provide the potential to design quantum machine learning algorithms that achieve even better performance, especially for modelling quantum-mechanical systems like molecules, catalysts, or high-temperature superconductors. Because the quantum world allows the superposition of exponentially many states to evolve and interfere at the same time while classical computers struggle at such tasks, one would expect quantum computers to have an advantage in machine learning problems that have a quantum origin. The hope is that such quantum advantage (the advantage of using quantum computers instead of classical computers) extends to machine learning problems in the classical domain, like computer vision or natural language processing.

Learning from data obtained in nature (e.g., physical experiments) can enable classical computers to solve some problems that are hard for classical computers without the data. However, an even larger class of problems can be solved using quantum computers. If we assume nature can be simulated using quantum computers, then quantum computers gain no further computational power from data obtained in nature.

To understand the advantage of quantum computers in machine learning problems, the following big questions come to mind:

  1. If data comes from a quantum origin, such as from experiments developing new materials, are quantum models always better at predicting than classical models?
  2. Can we evaluate when there is a potential for a quantum computer to be helpful for modelling a given data set, from either a classical or quantum origin?
  3. Is it possible to find or construct a data set where quantum models have an advantage over their classical counterparts?

We addressed these questions and more in a recent paper by developing a mathematical framework to compare classical modelling approaches (neural networks, tree-based models, etc.) against quantum modelling approaches in order to understand potential advantages in making more accurate predictions. This framework is applicable both when the data comes from the classical world (MNIST, product reviews, etc.) and when the data comes from a quantum experiment (chemical reactions, quantum sensors, etc.).

Conventional wisdom might suggest that the use of data coming from quantum experiments that are hard to reproduce classically would imply the potential for a quantum advantage. However, we show that this is not always the case. It is perhaps no surprise to machine learning experts that with enough data, an arbitrary function can be learned. It turns out that this extends to learning functions that have a quantum origin, too. By taking data from physical experiments obtained in nature, such as experiments for exploring new catalysts, superconductors, or pharmaceuticals, classical ML models can achieve some degree of generalization beyond the training data. This allows classical algorithms with data to solve problems that would be hard to solve using classical algorithms without access to the data (rigorous justification of this claim is given in our paper).

We provide a method to quantitatively determine the number of samples required to make accurate predictions on datasets of quantum origin. Perhaps surprisingly, sometimes there is no great difference between the number of samples needed by classical and quantum models. This method also provides a constructive approach to generating datasets that are hard to learn with some classical models.

An empirical demonstration of prediction advantage using quantum models compared to the best among a list of common classical models under different amounts of training data N. The projected quantum kernel we introduce has a significant advantage over the best tested classical ML model.

We go on to use this framework to engineer datasets for which all attempted traditional machine learning methods fail, but quantum methods do not. An example of one such dataset is presented in the figure above. These trends are examined empirically in the largest gate-based quantum machine learning simulations to date, made possible with TensorFlow Quantum, an open source library for quantum machine learning. Carrying out this work at large quantum system sizes required a considerable amount of computing power. The combination of quantum circuit simulation (~300 TeraFLOP/s) and analysis code (~800 TeraFLOP/s) written using TensorFlow and TensorFlow Quantum easily reached throughputs as high as 1.1 PetaFLOP/s, a scale that is rarely seen in the crossover field of quantum computing and machine learning (though it is nothing new for classical ML, which has already reached the exaflop scale).

When preparing distributed workloads for quantum machine learning research in TensorFlow Quantum, the setup and infrastructure should feel very familiar to regular TensorFlow users. One way to handle distribution in TensorFlow is with the tf.distribute module, which allows users to configure device placement as well as machine configuration in multi-machine workloads. In this work, tf.distribute was used to distribute workloads across ~30 Google Cloud machines (some containing GPUs) managed by Kubernetes. The major stages of development were:

  1. Develop a functioning single-node prototype using TensorFlow and TensorFlow Quantum.
  2. Incorporate minimal code changes to use MultiWorkerMirroredStrategy in a manually configured two-node environment.
  3. Create a Docker image with the functioning code from step 2 and upload it to Google Container Registry.
  4. Using Google Kubernetes Engine, launch a job following the ecosystem templates provided here.
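
For step 2, MultiWorkerMirroredStrategy reads the cluster layout from the `TF_CONFIG` environment variable. The sketch below builds that variable for a manually configured two-node setup using only the standard library; the host names, port, helper name, and `build_model` function are hypothetical placeholders, and each node sets the same cluster with its own task index.

```python
import json
import os
from typing import List


def make_tf_config(workers: List[str], task_index: int) -> str:
    """Return a TF_CONFIG JSON string for a worker-only cluster.

    Every node gets the same "cluster" section; only "task.index" differs per node.
    """
    if not 0 <= task_index < len(workers):
        raise ValueError("task_index must identify one of the workers")
    return json.dumps({
        "cluster": {"worker": workers},
        "task": {"type": "worker", "index": task_index},
    })


# Hypothetical host:port addresses for the two manually configured nodes.
WORKERS = ["node-a.example.internal:12345", "node-b.example.internal:12345"]

# On the first node; the second node would pass task_index=1.
os.environ["TF_CONFIG"] = make_tf_config(WORKERS, task_index=0)

# The training script then creates the strategy after TF_CONFIG is set and
# builds the model inside its scope (TensorFlow 2.4+), for example:
#   strategy = tf.distribute.MultiWorkerMirroredStrategy()
#   with strategy.scope():
#       model = build_model()  # hypothetical model-construction function
```

Keeping the cluster list identical on every node and varying only the index is what lets the same container image from step 3 run unchanged on each machine.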

TensorFlow Quantum has a tutorial showcasing a variation of step 1, in which you can create a dataset (in simulation) that is out of reach for classical neural networks.

It is our hope that the framework and tools outlined here help to enable the TensorFlow community to explore precursors of datasets that require quantum computers to make accurate predictions. If you want to get started with quantum machine learning, check out the beginner tutorials on the TensorFlow Quantum website. If you are a quantum machine learning researcher, you can read our paper addressing the power of data in quantum machine learning for more detailed information. We are excited to see what other kinds of large scale QML experiments can be carried out by the community using TensorFlow Quantum. Only through expanding the community of researchers will machine learning with quantum computers reach its full potential.

TensorFlow Community Spotlight program update

Posted by Marcus Chang, TensorFlow Program Manager

In June we started the TensorFlow Community Spotlight Program to offer the developer community an opportunity to showcase their hard work and passion for ML and AI by submitting their TensorFlow projects for the chance to be featured and recognized on Twitter with the hashtag #TFCommunitySpotlight.

GIF of posture tracking tool in use
Olesya Chernyavskaya, a Community Spotlight winner, created a tool in TensorFlow to track posture and blur the screen if a person is sitting poorly.

Now a little over four months in, we’ve received many great submissions and it’s been amazing to see all of the creative uses of TensorFlow across Python, JavaScript, Android, iOS, and many other areas of TensorFlow.

We’d like to learn about your projects, too. You can share them with us using this form. Here are our previous Spotlight winners:

Pranav Natekar

Pranav used TensorFlow to create a tool that identifies patterns in Indian classical music to help students learn the tabla. Pranav’s GitHub → http://goo.gle/2Z5f7Op

Chart of Indian Classical music patterns


Olesya Chernyavskaya

Working from home and trying to improve your posture? Olesya created a tool in TensorFlow to track posture and blur the screen if a person is sitting poorly. Olesya’s GitHub → https://goo.gle/2CAHvz9

GIF of posture tracking tool in use

Javier Gamazo Tejero

Javier used TensorFlow to capture movement with a webcam and transfer it to Google Street View to give a virtual experience of walking through different cities. Javier’s GitHub → https://goo.gle/3hgkmBc

GIF of virtual walking through cities

Hugo Zanini

Hugo used TensorFlow.js to create real-time semantic segmentation in a browser. Hugo’s GitHub → https://goo.gle/310RDKc

GIF of real-time semantic segmentation

Yana Vasileva

Ambianic.ai created a fall detection surveillance system in which user data is never sent to any third-party cloud servers. Yana’s GitHub → goo.gle/2XvYY3q

GIF of fall detection surveillance system

Samarth Gulati and Praveen Sinha

These developers had artists upload an image as a texture for the TensorFlow facemesh 3D model and used CSS blend modes to create the illusion of face paint on the user’s face. Samarth’s GitHub → https://goo.gle/2Qe3Gyx

GIF of facemesh 3D model

Laetitia Hebert

Laetitia created a model system for understanding the genes, neurons, and behavior of the roundworm as it naturally moves through a variety of complex postures. Laetitia’s GitHub → https://goo.gle/2ZgLZ6L

GIF of Worm Pose

Henry Ruiz

Rigging.js is a React.js application that uses the facemesh TensorFlow.js model. Using a camera, it maps a person’s movements onto a 3D model. Henry’s GitHub → https://goo.gle/3iCXBZj

GIF of Rigging JS

DeepPavlov.ai

The DeepPavlov AI library solves numerous NLP and NLU problems in a short amount of time, either with pre-trained models or by training your own variations of the models. DeepPavlov’s GitHub → https://goo.gle/3jl967S

DeepPavlov AI library

Mayank Thakur

Mayank created a special hand gesture feature to complement traditional face recognition lock systems on mobile phones, helping to increase security. Mayank’s GitHub → https://goo.gle/3j7evyN

GIF of hand gesture software

Firiuza Shigapova

Using TensorFlow 2.x, Firiuza built a library for Graph Neural Networks containing GraphSage and GAT models for node and graph classification problems. Firiuza’s GitHub → https://goo.gle/3kFcmvz

GIF of Graph Neural Networks

Thank you for all the submissions so far, and congrats to the winners! We look forward to growing this community of Community Spotlight recipients, so be sure to submit your projects here.

New Coral APIs and tools for AI at the edge

Posted by Carlos Mendonça, Coral

Coral Fall 2020 image

Fall has finally arrived, and with it a new release of Coral’s C++ and Python APIs and tools, along with new models optimized for the Edge TPU and further support for TensorFlow 2.0-based workflows.

Coral is a complete toolkit to build products with local AI. Our on-device inferencing capabilities allow you to build products that are efficient, private, fast and offline with the help of TensorFlow Lite and the Edge TPU.

From the beginning, we’ve provided APIs in Python and C++ that enable developers to take advantage of the Edge TPU’s local inference speed. Offline processing for machine learning models allows for considerable savings on bandwidth and cloud compute costs, keeps data local, and preserves user privacy. More recently, we’ve been hard at work refactoring our APIs to make them more modular, reusable, and performant, while at the same time eliminating unnecessary API abstractions and surfacing more of the native TensorFlow Lite APIs that developers are familiar with.

So in our latest release, we’re now offering two separate reusable libraries, each built upon the powerful TensorFlow Lite APIs and each isolated in its own repository: libcoral for C++ and PyCoral for Python.

libcoral (C++)

Unlike some of our previous APIs, libcoral doesn’t hide tflite::Interpreter. Instead, we’re making this native TensorFlow Lite class a first-class component and offering some additional helper APIs that simplify some of your code when working with common models such as classification and detection.

With our new libcoral library, developers should typically follow the pattern below to perform an inference in C++:

  1. Create a tflite::Interpreter instance with the Edge TPU context and allocate memory.

    To simplify this step, libcoral provides the MakeEdgeTpuInterpreter() function:

     
    // Load the model
    auto model = coral::LoadModelOrDie(absl::GetFlag(FLAGS_model_path));

    // Get the Edge TPU context (only if the model contains Edge TPU custom ops)
    auto tpu_context = coral::ContainsEdgeTpuCustomOp(*model)
                           ? coral::GetEdgeTpuContextOrDie()
                           : nullptr;

    // Get the interpreter
    auto interpreter = coral::MakeEdgeTpuInterpreterOrDie(
        *model, tpu_context.get());
  2. Configure the interpreter’s input.
  3. Invoke the interpreter:

    interpreter->Invoke();

    As an alternative to Invoke(), you can achieve higher performance with the InvokeWithMemBuffer() and InvokeWithDmaBuffer() functions, which enable processing the input data without copying from another region of memory or from a DMA file descriptor, respectively.

  4. Process the interpreter’s output.

    To simplify this step, libcoral provides some adapters, requiring less code from you:

    auto result = coral::GetClassificationResults(
        *interpreter,
        /*threshold=*/0.0f,
        /*top_k=*/3);

The above is an example of the classification adapter, where developers can specify the minimum confidence threshold, as well as the maximum number of results to return. The API also features a detection adapter with its own result filtering parameters.
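
The filtering the classification adapter performs (a minimum confidence plus a cap on the number of results) can be sketched in plain Python as follows. This illustrates the idea only and is not libcoral's or PyCoral's implementation; the `ClassResult` type, the function name, the `>=` comparison, and the scores are our assumptions.

```python
import heapq
from typing import List, NamedTuple, Sequence


class ClassResult(NamedTuple):
    label_id: int  # index of the class in the model's output
    score: float   # confidence for that class


def top_classes(scores: Sequence[float], threshold: float, top_k: int) -> List[ClassResult]:
    """Drop classes scoring below threshold, then return up to top_k results, highest first."""
    kept = (ClassResult(i, s) for i, s in enumerate(scores) if s >= threshold)
    return heapq.nlargest(top_k, kept, key=lambda c: c.score)


# Hypothetical output scores for a 5-class model.
scores = [0.05, 0.70, 0.10, 0.15, 0.00]
for result in top_classes(scores, threshold=0.0, top_k=3):
    print(result.label_id, result.score)
```

Using a heap-based selection avoids sorting every class, which matters little for five classes but helps for models with thousands of labels.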

For a full view of the example application source code, see classify_image.cc on GitHub. For instructions on how to integrate libcoral into your application, refer to README.md on GitHub.

This new release also brings updates to on-device retraining with the decoupling of imprinting functions from inference on the updated ImprintingEngine. The new design makes the imprinting engine work with the tflite::Interpreter directly.

To easily address the Edge TPUs available on the host, libcoral supports labels such as "usb:0" or "pci:1". This should make it easier to manage resources on multi-Edge TPU systems.

Finally, we’ve made a number of performance improvements such as more efficient memory usage and memory-based instead of file-based abstractions. Also, the design of the API is more consistent by leveraging the Abseil library for error propagation, generic interfaces and other common patterns, which should provide a more consistent and stable developer experience.

PyCoral (Python)

The new PyCoral library (provided in a new pycoral Python module) follows some of the design patterns introduced with libcoral, and brings parity across our C++ and Python APIs. PyCoral implements the same imprinting decoupling design, model adapters for classification and detection, and the same label-based TPU addressing semantics.

In PyCoral, the “run inference” functionality is now entirely delegated to the native TensorFlow Lite library, as we’ve done away with the model “engines” that abstracted the TensorFlow Lite interpreter. This change allowed us to eliminate the code duplication introduced by the Coral-specific BasicEngine, ClassificationEngine, and DetectionEngine classes (those APIs, from the “Edge TPU Python library”, are now deprecated).

To perform an inference with PyCoral, we follow a similar pattern to that of libcoral:

  1. Create an interpreter:

       from pycoral.adapters import classify, common
       from pycoral.utils import edgetpu

       interpreter = edgetpu.make_interpreter(model_file)
       interpreter.allocate_tensors()

  2. Configure the interpreter’s input:

       common.set_input(interpreter, image)

  3. Invoke the interpreter:

       interpreter.invoke()

  4. Process the interpreter’s output:

       classes = classify.get_classes(interpreter, top_k=3)

For fully detailed example code, check out our documentation for Python.
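Step 4 hides a detail worth knowing. Models compiled for the Edge TPU are quantized, so the raw output tensor usually holds uint8 values together with a scale and zero point. The sketch below shows the standard affine dequantization and ranking in plain Python. It illustrates the idea and is not PyCoral's source. The tensor values, scale and zero point are made up.

```python
from typing import List, Sequence, Tuple


def dequantize(values: Sequence[int], scale: float,
               zero_point: int) -> List[float]:
    """Map quantized integers back to reals: real = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in values]


def top_scores(raw: Sequence[int], scale: float, zero_point: int,
               top_k: int) -> List[Tuple[int, float]]:
    """Return the top_k (label index, real-valued score) pairs."""
    scores = dequantize(raw, scale, zero_point)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]


# Hypothetical uint8 output tensor with scale 0.5 and zero point 10.
print(top_scores([10, 12, 8, 14], scale=0.5, zero_point=10, top_k=2))
```

When the scale is positive, dequantization is monotonic, so ranking the raw integers would give the same order. Converting to real values matters when you compare scores against a floating-point confidence threshold.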

Updates to the Coral model garden

With this release, we’re further expanding the Coral model garden with MobileDet. MobileDets are a family of lightweight, single-shot detectors built with the TensorFlow Object Detection API that achieve a state-of-the-art accuracy-latency tradeoff on Edge TPUs. MobileDet is a lower-latency detection model that offers better accuracy than the MobileNet family of models.

Check out the full collection of models available from Coral for the Edge TPU, including Classification, Detection, Segmentation and models specially prepared for on-device training.

Migrating our entire workflow and model collection to TensorFlow 2 is an ongoing effort. This release of the Coral machine learning API starts introducing support for TensorFlow 2-based workflows. For now, MobileNet v1 (ImageNet), MobileNet v2 (ImageNet), MobileNet v3 (ImageNet), ResNet50 v1 (ImageNet), and UNet MobileNet v2 (Oxford pets) all support training and conversion with TensorFlow 2.

Model Pipelining

Both libcoral and PyCoral have graduated the model pipelining functionality from Beta to General Availability. Model pipelining makes it possible for large models to be partitioned and distributed across multiple Edge TPUs to run them considerably faster.

Refer to the documentation for examples of the API in C++ and Python.

The partitioning of models is done with the Edge TPU Compiler, which uses a parameter-count algorithm that partitions the model into segments with similar parameter sizes. For cases where this algorithm doesn’t provide the throughput you need, this release introduces a new tool that supports a profiling-based algorithm. It divides the segments based on the latency observed by actually running the model multiple times, possibly resulting in a more balanced output.
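Both strategies can be viewed as a classic linear-partition problem. Given a per-layer cost (a parameter count for the compiler's default strategy, or a measured latency for the profiling-based one), split the layers into contiguous segments so that the most expensive segment is as cheap as possible. The sketch below is an illustration of that idea, not the Edge TPU Compiler's or profiling_partition's actual algorithm. The cost values are made up.

```python
from typing import List, Sequence


def _segments_needed(costs: Sequence[int], limit: int) -> int:
    """Greedily count contiguous segments whose sums stay within limit."""
    count, current = 1, 0
    for c in costs:
        if current + c > limit:
            count, current = count + 1, c
        else:
            current += c
    return count


def partition_layers(costs: Sequence[int],
                     num_segments: int) -> List[List[int]]:
    """Split layer indices into at most num_segments contiguous segments,
    minimizing the largest per-segment cost."""
    if not costs or num_segments < 1:
        raise ValueError("need at least one layer and one segment")
    # Binary search for the smallest feasible "largest segment" cost.
    lo, hi = max(costs), sum(costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if _segments_needed(costs, mid) <= num_segments:
            hi = mid
        else:
            lo = mid + 1
    # Rebuild the segments greedily using that optimal limit.
    segments: List[List[int]] = [[]]
    current = 0
    for i, c in enumerate(costs):
        if current + c > lo:
            segments.append([])
            current = 0
        segments[-1].append(i)
        current += c
    return segments


# Hypothetical per-layer costs (parameter counts or measured latencies).
print(partition_layers([4, 2, 2, 4, 4], num_segments=3))
```

A pipeline's throughput is bounded by its slowest stage, which is why the sketch minimizes the largest segment cost. It may return fewer segments than requested when adding more would not lower that maximum.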

The new profiling_partition tool can be used as follows:

    ./profiling_partition \
      --edgetpu_compiler_binary $PATH_TO_COMPILER \
      --model_path $PATH_TO_MODEL \
      --output_dir $OUT_DIR \
      --num_segments $NUM_SEGMENTS

Learn more

For more information about the Coral APIs mentioned above, see the following documentation:

Iris landmark tracking in the browser with MediaPipe and TensorFlow.js

Posted by Ann Yuan and Andrey Vakunov, Software Engineers at Google

Iris tracking enables a wide range of applications, such as hands-free interfaces for assistive technologies and understanding user behavior beyond clicks and gestures. Iris tracking is also a challenging computer vision problem. Eyes appear under variable light conditions, are often occluded by hair, and can be perceived as differently shaped depending on the head’s angle of rotation and the person’s expression. Existing solutions rely heavily on specialized hardware, often requiring a costly headset or a remote eye tracker system. These approaches are ill-suited for mobile devices with limited computing resources.

An example of eye re-coloring enabled.

In March we announced the release of a new package for detecting facial landmarks in the browser. Today, we’re excited to add iris tracking to this package through the TensorFlow.js face landmarks detection model. This work is made possible by the MediaPipe Iris model. We have deprecated the original facemesh model, and future updates will be made to the face landmarks detection model.

Note that iris tracking does not infer the location at which people are looking, nor does it provide any form of identity recognition. In our model’s documentation and the accompanying Model Card, we detail the model’s intended uses, limitations and fairness attributes (aligned with Google’s AI Principles).

The MediaPipe iris model is able to track landmarks for the iris and pupil using a single RGB camera, in real-time, without the need for specialized hardware. The model also returns landmarks for the eyelids and eyebrow regions, enabling detection of slight eye movements such as blinking. Try the model out yourself right now in your browser.

Introducing @tensorflow-models/face-landmarks-detection

Above left are predictions from @tensorflow-models/facemesh@0.0.4, above right are predictions from @tensorflow-models/face-landmarks-detection@0.0.1. Iris landmarks are in red.

Users familiar with our existing facemesh model will be able to upgrade to the new faceLandmarksDetection model with only a few code changes, detailed below. faceLandmarksDetection offers three major improvements over facemesh:

  1. Iris keypoints detection
  2. Improved eyelid contour detection
  3. Improved detection for rotated faces

These improvements are highlighted in the GIF above, which demonstrates how the landmarks returned by faceLandmarksDetection and facemesh differ for the same image sequence.

Installation

There are two ways to install the faceLandmarksDetection package:

  1. Through script tags:

       <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.6.0/dist/tf.js"></script>
       <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/face-landmarks-detection"></script>

  2. Through NPM (via the yarn package manager):

       $ yarn add @tensorflow-models/face-landmarks-detection@0.0.1
       $ yarn add @tensorflow/tfjs@2.6.0

Usage

Once the package is installed, you only need to load the model weights and then pass in an image to start detecting facial landmarks:

// If you are using NPM, first require the model. If you are using script tags, you can skip this step because `faceLandmarksDetection` will already be available in the global scope.
const faceLandmarksDetection = require('@tensorflow-models/face-landmarks-detection');

// Load the faceLandmarksDetection model assets.
const model = await faceLandmarksDetection.load(
    faceLandmarksDetection.SupportedPackages.mediapipeFacemesh);

// Pass in a video stream to the model to obtain an array of detected faces from the MediaPipe graph.
// For Node users, the `estimateFaces` API also accepts a `tf.Tensor3D`, or an ImageData object.
const video = document.querySelector("video");
const faces = await model.estimateFaces({ input: video });

The input to estimateFaces can be a video, a static image, a `tf.Tensor3D` or even an ImageData object for use in node.js pipelines. FaceLandmarksDetection then returns an array of prediction objects for the faces in the input, which include information about each face (e.g. a confidence score, and the locations of 478 landmarks within the face).

Here is a sample prediction object:

{
  faceInViewConfidence: 1,
  boundingBox: {
    topLeft: [232.28, 145.26],  // [x, y]
    bottomRight: [449.75, 308.36],
  },
  mesh: [
    [92.07, 119.49, -17.54],  // [x, y, z]
    [91.97, 102.52, -30.54],
    ...
  ],
  // x,y,z positions of each facial landmark within the input space.
  scaledMesh: [
    [322.32, 297.58, -17.54],
    [322.18, 263.95, -30.54]
  ],
  // Semantic groupings of x,y,z positions.
  annotations: {
    silhouette: [
      [326.19, 124.72, -3.82],
      [351.06, 126.30, -3.00],
      ...
    ],
    ...
  }
}
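Downstream code typically turns these coordinates into simple geometric quantities. The Python sketch below operates on a prediction object after it has been serialized to a dict (for example, via JSON). It estimates the face box size and an approximate iris center and radius. The annotation key names (such as `leftEyeIris`) and the five-points-per-iris layout are assumptions to check against the README, since they do not appear in the sample above. The helper names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Sequence[float]


def box_size(prediction: Dict) -> Tuple[float, float]:
    """Width and height of the face bounding box, in input pixels."""
    x0, y0 = prediction["boundingBox"]["topLeft"]
    x1, y1 = prediction["boundingBox"]["bottomRight"]
    return x1 - x0, y1 - y0


def iris_center_and_radius(
        points: Sequence[Point]) -> Tuple[Tuple[float, float], float]:
    """Approximate an iris as a circle from its [x, y, z] landmarks.

    Uses the 2D centroid as the center and the farthest landmark's distance
    from it as the radius; z is ignored.
    """
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    radius = max(math.hypot(p[0] - cx, p[1] - cy) for p in points)
    return (cx, cy), radius


# Toy data; real iris points would come from prediction["annotations"]
# under a key such as "leftEyeIris" (assumed, check the README).
toy_iris = [[5.0, 5.0, 0.0], [7.0, 5.0, 0.0], [3.0, 5.0, 0.0],
            [5.0, 7.0, 0.0], [5.0, 3.0, 0.0]]
print(iris_center_and_radius(toy_iris))
```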

Refer to our README for more details about the API.

Performance

FaceLandmarksDetection is a lightweight package containing only ~3MB of weights, making it ideally suited for real-time inference on a variety of mobile devices. When testing, note that TensorFlow.js also provides several different backends to choose from, including WebGL and WebAssembly (WASM) with XNNPACK for devices with lower-end GPUs. The table below shows how the package performs across a few different devices and TensorFlow.js backends:

Desktop:

Chart of desktop performance

Mobile:

All benchmarks were collected in the Chrome browser. See our earlier blog post for details on how to activate SIMD for the TF.js WebAssembly backend.

Looking ahead

Both the TensorFlow.js and MediaPipe teams plan to add depth estimation capabilities to our face landmark detection solutions using the improved iris coordinates. We strongly believe in sharing code that enables reproducible research and rapid experimentation, and are looking forward to seeing how the wider community makes use of the MediaPipe iris model.

Try the demo!

Use this link to try our new package in your web browser. We look forward to seeing how you use it in your apps.

More information

How CEVA uses TensorFlow Lite for Always-On Speech Recognition on the Edge

A guest article by Ido Gus of CEVA

CEVA is a leading licensor of wireless connectivity and smart sensing technologies. Our products help OEMs design power-efficient, intelligent and connected devices for a range of end markets, including mobile, consumer, automotive, robotics, industrial and IoT.

In this article, we’ll describe how we used TensorFlow Lite for Microcontrollers (TFLM) to deploy a speech recognition engine and frontend, called WhisPro, on a bare-metal development board based on our CEVA-BX DSP core. WhisPro detects always-on wake words and speech commands efficiently, on-device.

Figure 1 CEVA Multi-microphone DSP Development Board

About WhisPro

WhisPro is a speech recognition engine and frontend targeted to run on low-power, resource-constrained edge devices. It is designed to handle the entire data flow, from processing audio samples to detection.

WhisPro supports two use cases for edge devices:

  • Always-on wake word detection engine. In this use case, WhisPro’s role is to wake a device in sleep mode when a predefined phrase is detected.
  • Speech commands. In this use case, WhisPro’s role is to enable a voice-based interface. Users can control the device using their voice. Typical commands can be: volume up, volume down, play, stop, etc.

WhisPro enables a voice interface on any SoC that has a CEVA-BX DSP core integrated into it, lowering entry barriers for OEMs and ODMs interested in joining the voice interface revolution.

Our Motivation

Originally, WhisPro was implemented using an in-house neural network library called CEVA NN Lib. Although that implementation achieved excellent performance, the development process was quite involved. We realized that, if we ported the TFLM runtime library and optimized it for our target hardware, the entire model porting process would become transparent and more reliable (far fewer lines of code would need to be written, modified, and maintained).

Building TFLM for CEVA-BX DSP Family

The first thing we had to do was to figure out how to port TFLM to our own platform. We found this guide to porting TFLM to a new platform quite useful.
Following the guide, we:

  • Verified that the DebugLog() implementation is supported by our platform.
  • Created a TFLM runtime library project in CEVA’s Eclipse-based IDE:
    • Created a new CEVA-BX project in CEVA’s IDE
    • Added all the required source files to the project
  • Built the TFLM runtime library for the CEVA-BX core.
    This required the usual fiddling with compiler flags, include paths (not all required files are under the “micro” directory), the linker script, and so on.

Model Porting Process

Our starting point is a Keras implementation of our model. Let’s look at the steps we took to deploy our model on our bare-metal target hardware:

Converted the TensorFlow model to TensorFlow Lite using the built-in TensorFlow Lite converter:

```
import tensorflow as tf

# keras_model is the trained Keras model to be deployed.
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.experimental_new_converter = True
tflite_model = converter.convert()
with open("converted_to_tflite_model.tflite", "wb") as f:
    f.write(tflite_model)
```

Used quantization:

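```
# Apply these settings before calling converter.convert().
# Optimize.DEFAULT replaces the deprecated OPTIMIZE_FOR_SIZE alias.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# User-supplied generator that yields sample inputs for calibration.
converter.representative_dataset = representative_data_gen
```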
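The representative dataset lets the converter observe realistic activation ranges before it chooses int8 quantization parameters. Below is a simplified illustration of that calibration idea, using a plain min/max asymmetric scheme. It is not the TFLite converter's actual algorithm, and the sample values are made up. The code records the range seen across sample batches, derives a scale and zero point, and maps real values to int8 using real = scale * (q - zero_point).

```python
from typing import Iterable, Tuple


def observed_range(batches: Iterable[Iterable[float]]) -> Tuple[float, float]:
    """Min/max over all representative samples, widened to include 0.0."""
    lo, hi = 0.0, 0.0
    for batch in batches:
        for x in batch:
            lo, hi = min(lo, x), max(hi, x)
    return lo, hi


def int8_params(lo: float, hi: float) -> Tuple[float, int]:
    """Asymmetric int8 scale and zero point mapping [lo, hi] onto [-128, 127]."""
    scale = (hi - lo) / 255.0 or 1.0  # avoid a zero scale for all-zero data
    zero_point = round(-128 - lo / scale)
    return scale, max(-128, min(127, zero_point))


def quantize(x: float, scale: float, zero_point: int) -> int:
    """Map a real value to int8, clamping to the representable range."""
    return max(-128, min(127, round(x / scale) + zero_point))


# Hypothetical activations seen while iterating a representative dataset.
lo, hi = observed_range([[-1.0, 0.5], [3.0, 2.0]])
scale, zero_point = int8_params(lo, hi)
print(scale, zero_point, [quantize(x, scale, zero_point) for x in (-1.0, 0.0, 3.0)])
```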

Converted the TensorFlow Lite model into a C array for TFLM using xxd:

```
$ xxd -i model.tflite > model.cc
```
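`xxd -i` writes the file's bytes out as a C array plus a length variable, which the application can then pass to TFLM's `tflite::GetModel()`. The Python sketch below approximates that output format, with 12 bytes per row and an identifier derived from the file name. This can be handy on build hosts without xxd. It is an illustration, not a byte-for-byte reimplementation of xxd.

```python
import re


def to_c_array(data: bytes, filename: str) -> str:
    """Render bytes as C source in the style of `xxd -i`."""
    # Characters that are not valid in a C identifier become '_',
    # so "model.tflite" becomes "model_tflite".
    name = re.sub(r"[^0-9A-Za-z_]", "_", filename)
    lines = [f"unsigned char {name}[] = {{"]
    for start in range(0, len(data), 12):  # 12 bytes per row
        row = ", ".join(f"0x{b:02x}" for b in data[start:start + 12])
        is_last = start + 12 >= len(data)
        lines.append("  " + row + ("" if is_last else ","))
    lines.append("};")
    lines.append(f"unsigned int {name}_len = {len(data)};")
    return "\n".join(lines) + "\n"


# Tiny hypothetical payload; a real .tflite flatbuffer is much larger.
print(to_c_array(b"\x1c\x00\x00\x00TFL3", "model.tflite"), end="")
```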

Here we found that some of the model layers (for example, GRU) were not properly supported by TFLM at the time. It is reasonable to expect that, as TFLM continues to mature and Google and the TFLM community invest more in it, issues like this will become rarer.
In our case, though, we opted to re-implement the GRU layers in terms of Fully Connected layers, which was surprisingly easy.
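A GRU cell consists of matrix-vector products followed by element-wise gates, so each of its matrix multiplications can be expressed as a Fully Connected layer. The Python sketch below shows one GRU time step written that way. It illustrates the decomposition and is not CEVA's implementation.

The sketch uses the classic formulation, which applies the reset gate before the recurrent multiplication. Keras' TF2 default (`reset_after=True`) applies it afterwards, so converted weights would need to be mapped to match. The parameter names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]
Dense = Tuple[Matrix, Sequence[float]]  # (weights, bias) of one FC layer


def fully_connected(w: Matrix, b: Sequence[float], x: Sequence[float]) -> Vector:
    """y = W x + b, i.e. a Fully Connected layer with no activation."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(w, b)]


def _add(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x + y for x, y in zip(a, b)]


def _sigmoid(v: Sequence[float]) -> Vector:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def gru_step(x: Sequence[float], h: Sequence[float],
             p: Dict[str, Dense]) -> Vector:
    """One GRU time step expressed with six FC layers and element-wise ops.

    p holds hypothetical (W, b) pairs: "xz", "hz" (update gate),
    "xr", "hr" (reset gate) and "xn", "hn" (candidate state).
    """
    z = _sigmoid(_add(fully_connected(*p["xz"], x), fully_connected(*p["hz"], h)))
    r = _sigmoid(_add(fully_connected(*p["xr"], x), fully_connected(*p["hr"], h)))
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a) for a in
         _add(fully_connected(*p["xn"], x), fully_connected(*p["hn"], reset_h))]
    # Keras-style interpolation: h_new = z * h + (1 - z) * n.
    return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def gru(xs: Sequence[Sequence[float]], h0: Sequence[float],
        p: Dict[str, Dense]) -> Vector:
    """Run the cell over a sequence and return the final hidden state."""
    h = list(h0)
    for x in xs:
        h = gru_step(x, h, p)
    return h


def _zeros(rows: int, cols: int) -> Dense:
    return [[0.0] * cols for _ in range(rows)], [0.0] * rows


# Toy example: 3 inputs, 2 hidden units, all weights zero, so z = 0.5
# and n = 0 at every step, which simply halves the state.
params = {k: _zeros(2, 3 if k[0] == "x" else 2)
          for k in ("xz", "hz", "xr", "hr", "xn", "hn")}
print(gru([[1.0, 2.0, 3.0]] * 2, [1.0, -2.0], params))
```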

Integration

The next step was to integrate the TFLM runtime library and the converted model into our existing embedded C frontend, which handles audio preprocessing and feature extraction.

Even though our frontend was not written with TFLM in mind, it was modular enough to allow easy integration through a thin wrapper layer, as follows:

  1. Linked the TFLM runtime library into our embedded C application (WhisPro frontend)
  2. Implemented a wrapper-over-setup function for mapping the model into a usable data structure, allocating the interpreter and tensors
  3. Implemented a wrapper-over-execute function for mapping data passed from the WhisPro frontend into tflite tensors used by the actual execute function
  4. Replaced the call to the original model execute function with a call to the TFLM implementation

Process Visualization

The process we described involves two parties:

  • The microcontroller supplier (in this case, CEVA) is responsible for optimizing TFLM for its hardware architecture.
  • The microcontroller user (in this case, the CEVA WhisPro developer) is responsible for deploying a neural-network-based model, using an optimized TFLM runtime library, on the target microcontroller.

What’s Next

This work has proven the importance of the TFLM platform to us, and the significant value that supporting TFLM can add for our customers and partners by enabling easy neural network model deployment on edge devices. We are committed to further supporting TFLM on the CEVA-BX DSP family by:

  • Actively contributing to the TFLM project, with the goal of improving layer coverage and overall platform maturity.
  • Investing in TFLM operator optimization for execution on CEVA-BX cores, aiming for full coverage.

Final Thoughts

While the porting process had some bumps along the way, in the end it was a great success and took about 4-5 days of work. Implementing a model in C from scratch, and handcrafting model conversion scripts from Python to C, could take 2-3 weeks (and lots of debugging).

CEVA Technology Virtual Seminar

To learn more, you are welcome to watch CEVA’s virtual seminar – Wireless Audio session, covering TFLM, amongst other topics.

How NetEase Yanxuan uses TensorFlow for customer service chat bots

Posted by Liu Huiyun, a senior algorithm engineer at NetEase

With the development of natural language processing (NLP) technology, Intelligent customer service has become an important use case in the e-commerce field and has received more and more attention in recent years. During the purchasing process, users who encounter problems or have questions need to be transferred to a customer service system for consultation and support. If the customer service system can provide accurate and effective responses, this directly improves the user experience and has a positive impact on purchase conversion. For example:

  • In pre-sales scenarios, users may ask for more detailed information about the products or promotional activities that they are interested in.
  • In post-sales scenarios, users often have questions about returning and exchanging products, shipping fees, and logistics issues.

During its business operations, NetEase Yanxuan, a large e-commerce platform in China, produces and accumulates large volumes of information, such as product attributes, promotional activities, and aftersales policies. At the same time, the corresponding business logic is complicated. Intelligent customer service is an intelligent dialog system that leverages this information to automatically answer user questions or to help human customer service representatives do so.

However, the e-commerce field involves many detailed and complicated business aspects, and users may phrase their questions in many different ways, often colloquially. These characteristics require Intelligent customer service systems to have strong semantic understanding. To this end, we combined general customer service scenarios with Yanxuan’s businesses and designed a deep-learning-based system. The full Yanxuan Intelligent customer service framework is shown in Figure 1 and Figure 2.

  • As a user inputs a question, the input text and its contextual information are first sent to the intent recognition (IR) module.
  • The intent recognition module analyzes the user’s multi-layered intents and then distributes them to different sub-modules.
  • The sub-modules are responsible for more targeted business Q&A, and different sub-modules apply different technical solutions.

As you can see, deep learning algorithms are applied to different modules in the framework. Thanks to advanced NLP algorithms, we can extract more general and multi-granular semantic information from the user’s utterances.

Figure 3 shows the Xiaoxuan bot answering questions in a real dialog scenario. Next, I will introduce the different sub-modules that apply deep learning technology.

Figure 3. Online Conversation Example

Intent Recognition Module — Multilayer Classification Model

As the user inputs text, we use a multilayer classification intent recognition model built with TensorFlow to analyze the input text, its context, and the user’s historical behavior. We divide first-level intents into four main categories: pre-sales product questions, aftersales questions, casual chat, and others. When users ask common policy-related aftersales questions, the input is further classified into more detailed sub-level intents. Figure 4 shows the structure of the intent recognition process.

In essence, intent recognition can be viewed as a classification problem. When building the classification system, we used the Attention+BiLSTM (ABL) model structure as a preliminary baseline. In addition to the raw input text, we designed more features to feed into the deep model, such as n-grams and the positional encoding used in the Transformer model. Ultimately, these manually crafted features improved the model’s accuracy by three percentage points. We also fine-tuned a BERT model to train a classification model with less labeled data, and it performed as well as the ABL model. Pretrained models generalize better and can learn more semantic information from less labeled data; however, this approach requires more computing resources.
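In an ABL-style model, the attention component typically pools the BiLSTM's per-token hidden states into a single sentence vector using learned weights. The source does not specify which attention variant Yanxuan uses. The minimal pure-Python sketch below shows one common choice, additive attention pooling. W and v stand in for hypothetical learned parameters, and the hidden states are toy values.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: Sequence[Vector], w: Sequence[Vector],
                   v: Vector) -> Tuple[List[float], List[float]]:
    """Additive attention pooling over per-token hidden states.

    score_t = v . tanh(W h_t); weights = softmax(scores);
    sentence vector = sum_t weights_t * h_t.
    """
    scores = []
    for h in states:
        proj = [math.tanh(sum(wij * hj for wij, hj in zip(row, h))) for row in w]
        scores.append(sum(vi * pi for vi, pi in zip(v, proj)))
    weights = softmax(scores)
    pooled = [sum(a * h[d] for a, h in zip(weights, states))
              for d in range(len(states[0]))]
    return pooled, weights


# Hypothetical BiLSTM outputs for a 3-token input (forward and backward
# halves already concatenated) and hypothetical learned W and v.
states = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.4]]
W = [[1.0, -1.0], [0.5, 0.5]]
v = [2.0, 1.0]
sentence_vector, weights = attention_pool(states, W, v)
print(sentence_vector, weights)
```

The resulting fixed-size sentence vector is what a classification layer on top would consume.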

FAQ Module — Text Matching Model

Answering FAQs is a key function of Intelligent customer service systems. This module is composed of two components, recall and re-rank.

  • The recall stage adopts discrete searches at the word granularity as well as semantic searches based on dense sentence vectors.
  • The re-rank stage uses a text matching model built with TensorFlow to re-rank the recalled candidate Q&A pairs.
  • Finally, after filtering with a mix of strategies, the module returns the final answer.

In the automatic Q&A field, text matching algorithms are commonly applied to sentence similarity and natural language inference tasks. Starting from the basic Siamese-LSTM network, the structure of matching models has evolved through InferSent, Decomposable Attention, ESIM, and finally BERT. Generally speaking, matching algorithms fall into two categories: representation-based and interaction-based. Representation-based methods focus on encoding each sentence independently and ignore the interactive semantics between sentences, which interaction-based methods explicitly model.
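The difference between the two families is easiest to see with toy word vectors. In the sketch below, the embeddings are hypothetical; real systems use learned encoders such as LSTMs or BERT. The representation-based score pools each sentence into a single vector before comparing. The interaction-based score first builds a word-by-word similarity matrix, the structure that attention-based matchers such as ESIM refine.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_pool(words: Sequence[Vector]) -> List[float]:
    return [sum(w[d] for w in words) / len(words) for d in range(len(words[0]))]


def representation_score(s1: Sequence[Vector], s2: Sequence[Vector]) -> float:
    """Encode each sentence on its own (mean of word vectors), then compare."""
    return cosine(mean_pool(s1), mean_pool(s2))


def interaction_score(s1: Sequence[Vector], s2: Sequence[Vector]) -> float:
    """Build a word-by-word similarity matrix, keep each s1 word's best
    match in s2, and average those matches."""
    matrix = [[cosine(a, b) for b in s2] for a in s1]
    return sum(max(row) for row in matrix) / len(matrix)


# Hypothetical 2-d word vectors: sentence A has two unrelated words,
# sentence B contains only the first of them.
sent_a = [[1.0, 0.0], [0.0, 1.0]]
sent_b = [[1.0, 0.0]]
print(representation_score(sent_a, sent_b), interaction_score(sent_a, sent_b))
```

The interaction score exposes the unmatched second word explicitly, while mean pooling folds it into a single angle between two vectors.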

At the service layer, we adopt a variety of question matching solutions:

  1. Perform association matching between input question Q and answer A.
  2. Perform similarity matching between input question Q₁ and standard question Q₂.
  3. Perform similar question matching between input question Q and standard question Qs.

These three methods perform question relevance recall and Q&A association matching in different ways. In the match and rank stages, we can use flexible weighted discrimination.

We built a Siamese-LSTM model to use as our baseline model and then implemented the following model iteration solutions:

  • We replaced the LSTM units with Transformer encoders, and replaced the cosine distance module with sentence-pair vector features fed into an MLP layer.
  • We integrated an ESIM model with ELMo features.
  • We fine tuned the BERT model.

Tests showed that these iterations improved on the baseline. For example, the Transformer encoders showed better accuracy in tasks (1) and (3), increasing performance by nearly 5 percentage points.

In addition, we found that, without any additional feature construction or techniques, BERT could provide stable and outstanding matching performance. This is because, in the pretraining stage, BERT aims to predict whether a contextual relationship exists between two sentences, so it can learn the relationships between sentences. In addition, the self-attention mechanism is adept at capturing deep semantics and can obtain fine-grained matching results for a word in sentence A and any word in sentence B. This is crucial for text matching tasks.

KBQA Module — NER Module

In the product knowledge-base Q&A (KBQA) and shopping guide modules, we built a named-entity recognition (NER) model for the e-commerce field based on TensorFlow. The model can recognize product names, product attribute names, product attribute values, and other key product information in the questions asked by users, as shown in Figure 5. Then, entity names are sent to downstream modules, where Q&A knowledge graph techniques are used to generate a final answer.

Figure 5. E-commerce NER Example

Generally, NER models use a bidirectional LSTM with a Conditional Random Field (CRF) layer. The BiLSTM captures features from both the preceding and following context, fully extracting contextual information. The CRF layer models the transition probabilities between labels using local and global features of the current dialogue text, effectively mining the semantic information of the text. Yanxuan uses a BiLSTM-CRF model as a word-granularity baseline model, which serves the Intelligent customer service system. In later experiments, we tested BERT feature extraction and fine-tuned BERT models.
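At inference time, the CRF layer picks the jointly best tag sequence rather than the best tag for each token independently, which is how it rules out invalid label transitions. Below is a minimal sketch of that decoding step, using standard Viterbi decoding over a linear-chain CRF. The tags, emission scores and transition scores are made up for illustration. In this toy example, picking each token's highest emission independently would give O, I-PROD, I-PROD, which is an invalid sequence.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   tags: Sequence[str]) -> List[str]:
    """Highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (e.g. from the BiLSTM).
    transitions[i][j]: score of tag i being followed by tag j.
    """
    n = len(tags)
    score = list(emissions[0])  # best path score ending in each tag
    backpointers = []
    for t in range(1, len(emissions)):
        prev_best, new_score = [], []
        for j in range(n):
            i = max(range(n), key=lambda k: score[k] + transitions[k][j])
            prev_best.append(i)
            new_score.append(score[i] + transitions[i][j] + emissions[t][j])
        backpointers.append(prev_best)
        score = new_score
    # Follow the back-pointers from the best final tag.
    path = [max(range(n), key=lambda j: score[j])]
    for prev_best in reversed(backpointers):
        path.append(prev_best[path[-1]])
    return [tags[j] for j in reversed(path)]


# Toy NER example: "O" -> "I-PROD" is forbidden by a large negative
# transition score, so the decoder must open the entity with "B-PROD".
TAGS = ["O", "B-PROD", "I-PROD"]
TRANSITIONS = [[0.0, 0.0, -100.0],  # from O
               [0.0, 0.0, 0.0],     # from B-PROD
               [0.0, 0.0, 0.0]]     # from I-PROD
EMISSIONS = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.5], [0.0, 0.0, 2.0]]
print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```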

For BERT-based optimization, we tried using BERT to extract sentence vector features and incorporating them into the BiLSTM-CRF model. We also tried two BERT fine-tuning methods: prediction from the last-layer embedding, and a weighted combination of hidden-layer embeddings. On the test set, feature fusion performed best, with an F1 score as high as 0.92, followed by the multi-hidden-layer fusion method (0.90) and finally the single top-layer method (0.88). In terms of online inference time, feature fusion takes about 100ms, while the fine-tuned model takes about 10ms.

The performance results using Yanxuan’s dataset are shown in Table 1. These results tell us the following:

  • Feature extraction provides better performance than fine tuning. In addition to using BiLSTM for semantic and structure information extraction, by introducing BERT features into a feature extraction model, we obtain a wider range of semantic and structural representations. The performance boost obtained by adding additional parameters, as in feature extraction, is significantly higher than that of normal fine tuning.
  • Multilayer feature fusion provides better performance than high-level features. This is because, for sequence tagging tasks, we need to consider both the semantic representation and the fusion of other granular representations of the sentence, such as syntactic structure information.
  • In terms of response time, feature extraction, which adds additional parameters, is well-suited to offline systems, but cannot meet the needs of online systems. Fine-tuned models, however, can meet the timeliness requirements of online systems.

Casual Chat Module — Generative Model

A standalone customer service bot must be able to answer difficult questions from users. At the same time, it must also have the ability to chat casually so as to demonstrate both its humanity and intelligence.

To give our bot this capability, we built a casual chat module capable of handling routine chatting. This module includes two key models: retrieval-based QA and generative QA.

  • The retrieval-based QA model first recalls answers from a prepared corpus and then uses a text matching model to re-rank the answers.
  • The generative QA model uses the Transformer generative model trained using TensorFlow’s tensor2tensor to generate responses in an end-to-end (E2E) manner.

However, a purely E2E approach to response generation is difficult to control. Therefore, we decided to fuse the two models in our online system to ensure more reliable responses.

Model Deployment

Figure 6 shows an online service flow based on the BERT model. Thanks to the open-source TensorFlow versions of language models such as BERT, only a small number of labeled samples need to be used to build various text models that feature high accuracy. Then, we can use GPUs to accelerate computation in order to meet the QPS requirements of online services. Finally, we can quickly deploy and launch the model based on TensorFlow Serving (TFS). Therefore, it is the support provided by TensorFlow that allows us to deploy and iterate online services in a stable and efficient manner.

Figure 6. BERT-based Online Service Flow

Conclusion

As deep learning technology continues to develop, new models will make new breakthroughs in the NLP field. By continuing to apply academic advances in industry, we can achieve outstanding business results. However, this would not be possible without TensorFlow. In Yanxuan’s business scenarios, TensorFlow provides flexible and refined APIs that enable engineers to develop in an agile way and test new models, greatly facilitating algorithm model iteration.

Neural Structured Learning in TFX

Posted by Arjun Gopalan, Software Engineer, Google Research

Edited by Robert Crowe, TensorFlow Developer Advocate, Google Research

Introduction

Neural Structured Learning (NSL) is a framework in TensorFlow that can be used to train neural networks with structured signals. It handles structured input in two ways: (i) as an explicit graph, or (ii) as an implicit graph where neighbors are dynamically generated during model training. NSL with an explicit graph is typically used for Neural Graph Learning, while NSL with an implicit graph is typically used for Adversarial Learning. Both of these techniques are implemented as a form of regularization in the NSL framework. As a result, they only affect the training workflow, and the model serving workflow remains unchanged. In the rest of this post, we will mostly focus on how graph regularization can be implemented using the NSL framework in TFX.
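Graph regularization adds a term to the training loss that penalizes the distance between a sample's learned representation and those of its graph neighbors. The plain-Python sketch below shows that objective, assuming a weighted squared L2 distance. NSL makes the distance metric and the multiplier configurable, and the function name here is hypothetical. Because the extra term exists only in the training loss, inference is unaffected.

```python
from typing import Sequence

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def graph_regularized_loss(supervised_loss: float,
                           sample_embedding: Vector,
                           neighbor_embeddings: Sequence[Vector],
                           neighbor_weights: Sequence[float],
                           multiplier: float) -> float:
    """supervised_loss + multiplier * sum_j w_j * ||h(x) - h(x_j)||^2."""
    reg = sum(w * squared_l2(sample_embedding, n)
              for n, w in zip(neighbor_embeddings, neighbor_weights))
    return supervised_loss + multiplier * reg


# One sample with two graph neighbors: the first has an identical
# embedding (no penalty), the second differs by 1.0 in one dimension.
loss = graph_regularized_loss(0.5, [1.0, 0.0], [[1.0, 0.0], [0.0, 0.0]],
                              neighbor_weights=[1.0, 0.5], multiplier=0.1)
print(loss)
```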

The high-level workflow for building a graph-regularized model using NSL entails the following steps:

  1. Build a graph, if one is not available.
  2. Use the graph and the input example features to augment the training data.
  3. Use the augmented training data to apply graph regularization to a given model.

These steps don’t immediately map onto existing TFX pipeline components. However, TFX supports custom components which allow users to implement custom processing within their TFX pipelines. See this blog post for an introduction to custom components in TFX. So, to create a graph-regularized model in TFX incorporating the above steps, we will make use of additional custom TFX components.

To illustrate an example TFX pipeline with NSL, let’s consider the task of sentiment classification on the IMDB dataset. A colab-based tutorial demonstrating the use of NSL for this task with native TensorFlow is available here, which we will use as the basis for our TFX pipeline example.

Graph Regularization With Custom TFX Components

To build a graph-regularized NSL model in TFX for this task, we will define three custom components using the custom Python functions approach. Here is a TFX pipeline schematic for our example using these custom components. For brevity, we have skipped components that typically come after the Trainer component like the Evaluator, Pusher, etc.

Figure 1: Example TFX pipeline for text classification using graph regularization

In this figure, only the custom components (in pink) and the Graph-regularized Trainer component have NSL-related logic. It’s worth noting that the custom components shown here are only illustrative and it may be possible to build a functionally equivalent pipeline in other ways. We now describe each of the custom components in further detail and show code snippets for them.

IdentifyExamples

This custom component assigns each training example a unique ID, which is used to associate the example with its corresponding neighbors from the graph.

import os

import apache_beam as beam
import tensorflow as tf
from tfx.dsl.component.experimental.annotations import InputArtifact, OutputArtifact, Parameter
from tfx.dsl.component.experimental.decorators import component
from tfx.types.standard_artifacts import Examples

@component
def IdentifyExamples(
    orig_examples: InputArtifact[Examples],
    identified_examples: OutputArtifact[Examples],
    id_feature_name: Parameter[str],
    component_name: Parameter[str]
) -> None:

  # Compute the input and output URIs.
  ...

  # For each input split, update the TF.Examples to include a unique ID.
  with beam.Pipeline() as pipeline:
    (pipeline
     | 'ReadExamples' >> beam.io.ReadFromTFRecord(
         os.path.join(input_dir, '*'),
         coder=beam.coders.coders.ProtoCoder(tf.train.Example))
     | 'AddUniqueId' >> beam.Map(make_example_with_unique_id, id_feature_name)
     | 'WriteIdentifiedExamples' >> beam.io.WriteToTFRecord(
         file_path_prefix=os.path.join(output_dir, 'data_tfrecord'),
         coder=beam.coders.coders.ProtoCoder(tf.train.Example),
         file_name_suffix='.gz'))

  identified_examples.split_names = orig_examples.split_names
  return

The make_example_with_unique_id() function updates a given example to include an additional feature containing a unique ID.

SynthesizeGraph

As mentioned above, in the IMDB dataset, no explicit graph is given as an input. So, we will build one before we can demonstrate graph regularization. For this example, we will use a pre-trained text embedding model to convert raw text in the movie reviews to embeddings, and then use the resulting embeddings to build a graph.

The SynthesizeGraph custom component handles graph building for our example. Notice that it defines a new Artifact type called SynthesizedGraph, which will be the output of this custom component.

from tfx.dsl.component.experimental.annotations import InputArtifact, OutputArtifact, Parameter
from tfx.dsl.component.experimental.decorators import component
from tfx.types import artifact, artifact_utils, standard_artifacts
from tfx.types.standard_artifacts import Examples

# Custom Artifact type.
class SynthesizedGraph(artifact.Artifact):
  """Output artifact of the SynthesizeGraph component"""
  TYPE_NAME = 'SynthesizedGraphPath'
  PROPERTIES = {
      'span': standard_artifacts.SPAN_PROPERTY,
      'split_names': standard_artifacts.SPLIT_NAMES_PROPERTY,
  }

@component
def SynthesizeGraph(
    identified_examples: InputArtifact[Examples],
    synthesized_graph: OutputArtifact[SynthesizedGraph],
    similarity_threshold: Parameter[float],
    component_name: Parameter[str]
) -> None:

  # Compute the input and output URIs
  ...

  # We build a graph only based on the 'train' split which includes both
  # labeled and unlabeled examples.
  create_embeddings(train_input_examples_uri, output_graph_uri)
  build_graph(output_graph_uri, similarity_threshold)
  synthesized_graph.split_names = artifact_utils.encode_split_names(
      splits=['train'])
  return

The create_embeddings() function converts the text of each movie review into an embedding using a pre-trained model from TensorFlow Hub. The build_graph() function invokes the build_graph() API in NSL.
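Conceptually, graph building connects pairs of examples whose embeddings are similar enough. The brute-force sketch below compares every pair with cosine similarity and keeps an edge when the similarity is at or above similarity_threshold, emitting tab-separated source, target, and weight lines. It illustrates the idea under those assumptions; it is not NSL's build_graph() implementation, and the helper names are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_similarity_graph(
        embeddings: Dict[str, Sequence[float]],
        similarity_threshold: float) -> List[Tuple[str, str, float]]:
    """Returns (source_id, target_id, weight) edges for similar pairs.

    Every pair is compared, so the cost is quadratic in the number of examples.
    """
    edges = []
    for (id_a, emb_a), (id_b, emb_b) in itertools.combinations(
            sorted(embeddings.items()), 2):
        similarity = cosine_similarity(emb_a, emb_b)
        if similarity >= similarity_threshold:
            edges.append((id_a, id_b, similarity))
    return edges


def to_tsv(edges: List[Tuple[str, str, float]]) -> str:
    """Serializes edges as one tab-separated line per edge."""
    return ''.join(f'{src}\t{tgt}\t{w:.4f}\n' for src, tgt, w in edges)


embeddings = {'r1': [1.0, 0.0], 'r2': [0.9, 0.1], 'r3': [0.0, 1.0]}
edges = build_similarity_graph(embeddings, similarity_threshold=0.8)
print(to_tsv(edges), end='')
```

The all-pairs loop is fine for a toy input, but it grows quadratically with the number of reviews, so a production graph builder would likely need to narrow down candidate pairs first.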

GraphAugmentation

The purpose of this custom component is to combine the example features (text in the movie reviews) with the graph built from embeddings to produce an augmented training dataset. The resulting training examples will include features from their corresponding neighbors as well.

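import neural_structured_learning as nsl
from tfx.dsl.component.experimental.annotations import InputArtifact
from tfx.dsl.component.experimental.annotations import OutputArtifact
from tfx.dsl.component.experimental.annotations import Parameter
from tfx.dsl.component.experimental.decorators import component
from tfx.types.standard_artifacts import Examples

# SynthesizedGraph is the custom artifact type defined with SynthesizeGraph.


@component
def GraphAugmentation(
    identified_examples: InputArtifact[Examples],
    synthesized_graph: InputArtifact[SynthesizedGraph],
    augmented_examples: OutputArtifact[Examples],
    num_neighbors: Parameter[int],
    component_name: Parameter[str]
) -> None:

    # Compute the input and output URIs
    ...

    # Separate out the labeled and unlabeled examples from the 'train' split.
    train_path, unsup_path = split_train_and_unsup(train_input_uri)

    # Augment training data with neighbor features.
    nsl.tools.pack_nbrs(
        train_path, unsup_path, graph_path, output_path,
        add_undirected_edges=True, max_nbrs=num_neighbors)

    # Copy the 'test' examples from input to output without modification.
    ...

    augmented_examples.split_names = identified_examples.split_names
    return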

The split_train_and_unsup() function splits the input Examples into labeled and unlabeled examples, and the pack_nbrs() NSL API creates the augmented training dataset.
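To make the augmentation step concrete, here is a dependency-free sketch of neighbor packing. For each labeled example it looks up the highest-weight neighbors in the graph (adding reverse edges, as add_undirected_edges=True requests), then copies each neighbor's features under a per-neighbor prefix along with a weight feature. The NL_nbr_ prefix and _weight suffix follow NSL's default neighbor naming as we understand it; the dict-based examples and the pack_neighbors name are illustrative assumptions, and this is not NSL's pack_nbrs() implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Dict[str, list]  # hypothetical stand-in for tf.train.Example


def pack_neighbors(
        labeled: Dict[str, Example],
        unlabeled: Dict[str, Example],
        edges: List[Tuple[str, str, float]],
        max_nbrs: int,
        add_undirected_edges: bool = True) -> List[Example]:
    """Returns labeled examples augmented with up to max_nbrs neighbors each."""
    # Adjacency map: source id -> {target id: edge weight}.
    graph = defaultdict(dict)
    for src, tgt, weight in edges:
        graph[src][tgt] = weight
        if add_undirected_edges:
            graph[tgt].setdefault(src, weight)
    # Neighbors may come from either the labeled or the unlabeled pool.
    pool = {**unlabeled, **labeled}

    augmented = []
    for ex_id, example in labeled.items():
        packed = dict(example)
        # Strongest edges first; skip neighbor IDs with no known features.
        nbrs = sorted(graph[ex_id].items(), key=lambda kv: -kv[1])
        nbrs = [(n, w) for n, w in nbrs if n in pool][:max_nbrs]
        for i, (nbr_id, weight) in enumerate(nbrs):
            for name, values in pool[nbr_id].items():
                packed[f'NL_nbr_{i}_{name}'] = values
            packed[f'NL_nbr_{i}_weight'] = [weight]
        augmented.append(packed)
    return augmented


labeled = {'a': {'text': [b'great'], 'label': [1]}}
unlabeled = {'u1': {'text': [b'superb']}, 'u2': {'text': [b'fine']}}
edges = [('u1', 'a', 0.9), ('a', 'u2', 0.5)]
out = pack_neighbors(labeled, unlabeled, edges, max_nbrs=1)
```

With max_nbrs=1, review 'a' picks up only its strongest neighbor, 'u1', whose edge exists in the input only in the reverse direction; without add_undirected_edges it would receive 'u2' instead.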

Graph-regularized Trainer

Now that all of our custom components are implemented, the remaining NSL-specific addition to the TFX pipeline is in the Trainer component. Below is a simplified view of the graph-regularized Trainer component.

 
import neural_structured_learning as nsl
import tensorflow as tf

...

estimator = tf.estimator.Estimator(
    model_fn=feed_forward_model_fn, config=run_config, params=HPARAMS)

# Create a graph regularization config.
graph_reg_config = nsl.configs.make_graph_reg_config(
    max_neighbors=HPARAMS.num_neighbors,
    multiplier=HPARAMS.graph_regularization_multiplier,
    distance_type=HPARAMS.distance_type,
    sum_over_axis=-1)

# Invoke the Graph Regularization Estimator wrapper to incorporate
# graph-based regularization for training.
graph_nsl_estimator = nsl.estimator.add_graph_regularization(
    estimator,
    embedding_fn,
    optimizer_fn=optimizer_fn,
    graph_reg_config=graph_reg_config)

...

As you can see, once a base model has been created (in this case a feed-forward neural network), it’s straightforward to convert it to a graph-regularized model by invoking the NSL wrapper API.
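Conceptually, the wrapper adds a neighbor term to the base model's training loss: the distance between an example's embedding (what embedding_fn produces) and each neighbor's embedding, weighted by the edge weight, summed, and scaled by the multiplier. The sketch below computes that quantity for a single example in plain Python. Squared Euclidean distance stands in for whatever distance_type is configured, and NSL's exact distance definitions and batch reduction are not reproduced here; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def graph_regularized_loss(
        supervised_loss: float,
        sample_embedding: Sequence[float],
        neighbors: List[Tuple[Sequence[float], float]],
        multiplier: float) -> float:
    """supervised_loss + multiplier * sum_j w_j * d(h(x), h(x_j)).

    `neighbors` holds (neighbor_embedding, edge_weight) pairs.
    """
    graph_loss = sum(weight * squared_l2(sample_embedding, nbr_embedding)
                     for nbr_embedding, weight in neighbors)
    return supervised_loss + multiplier * graph_loss


loss = graph_regularized_loss(
    supervised_loss=0.5,
    sample_embedding=[1.0, 2.0],
    neighbors=[([1.0, 0.0], 0.9), ([0.0, 2.0], 0.5)],
    multiplier=0.1)
```

Because the penalty grows with distance, strongly weighted neighbors pull an example's representation toward theirs, which is how the synthesized review graph influences training.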

And that’s it! We now have all of the pieces required to build a graph-regularized NSL model in TFX. A Colab-based tutorial that demonstrates this example end-to-end in TFX is available here. Feel free to try it and customize it as you want!

Adversarial Learning

As mentioned in the introduction above, another aspect of Neural Structured Learning is adversarial learning: instead of using explicit neighbors from a graph for regularization, implicit neighbors are created dynamically and adversarially to confuse the model. Regularizing with these adversarial examples is an effective way to improve a model’s robustness. Adversarial learning with NSL integrates easily into a TFX pipeline: it does not require any custom components, and only the Trainer component needs to be updated to invoke the adversarial regularization wrapper API in NSL.
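The core idea fits in a few lines of plain Python. This sketch uses a hand-written logistic regression so that the gradient of the loss with respect to the input has a closed form, (p - y) * w. It builds an adversarial neighbor by stepping a fixed L2 distance along that gradient and adds the scaled loss on the neighbor to the clean loss. The model, step size, multiplier, and function names are illustrative assumptions; in a TFX Trainer you would instead configure nsl.configs.make_adv_reg_config() and wrap the estimator with nsl.estimator.add_adversarial_regularization(), which compute the gradients with TensorFlow.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def logistic_loss(w: Sequence[float], b: float, x: Sequence[float],
                  y: int) -> float:
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def adversarial_example(w: Sequence[float], b: float, x: Sequence[float],
                        y: int, step_size: float) -> List[float]:
    """Moves x a distance step_size along the L2-normalized input gradient."""
    grad = [(predict(w, b, x) - y) * wi for wi in w]  # d(loss)/dx
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(x)
    return [xi + step_size * g / norm for xi, g in zip(x, grad)]


def adversarially_regularized_loss(w, b, x, y, step_size: float,
                                   multiplier: float) -> float:
    x_adv = adversarial_example(w, b, x, y, step_size)
    return logistic_loss(w, b, x, y) + multiplier * logistic_loss(w, b, x_adv, y)


w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.5], 1
clean = logistic_loss(w, b, x, y)
x_adv = adversarial_example(w, b, x, y, step_size=0.1)
adv = logistic_loss(w, b, x_adv, y)
```

Because the perturbation follows the loss gradient, the adversarial neighbor has a higher loss than the clean input, and training on both discourages the model from relying on features that small input changes can flip.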

Summary

We have demonstrated how to build a graph-regularized model with NSL in TFX using custom components. It’s certainly possible to build graphs in other ways as well as structure the overall pipeline differently. We hope that this example provides a basis for your own NSL workflows.

Additional Links

For more information on NSL, check out the following resources:

Acknowledgements:

We’d like to thank the Neural Structured Learning and TFX teams at Google as well as Aurélien Geron for their support and contributions.


How TensorFlow docs uses Jupyter notebooks


Posted by Billy Lamberta, TensorFlow Team

Jupyter notebooks are an important part of our TensorFlow documentation infrastructure. With the JupyterCon 2020 conference underway, the TensorFlow docs team would like to share some tools we use to manage a large collection of Jupyter notebooks as a first-class documentation format published on tensorflow.org.

As the TensorFlow ecosystem has grown, the TensorFlow documentation has grown into a substantial software project in its own right. We publish ~270 notebook guides and tutorials on tensorflow.org—all tested and available in GitHub. We also publish an additional ~400 translated notebooks for many languages—all tested like their English counterpart. The tooling we’ve developed to work with Jupyter notebooks helps us manage all this content.

Graph showing Notebooks published

When we published our first notebook on tensorflow.org over two years ago for the 2018 TensorFlow Developer Summit, the community response was fantastic. Users love that they can immediately jump from webpage documentation to an interactive computing experience in Google Colab. This setup allows you to run—and experiment with—our guides and tutorials right in the browser, without installing any software on your machine. This tensorflow.org integration with Colab made it much easier to get started and changed how we could teach TensorFlow using Jupyter notebooks. Other machine learning projects soon followed. Notebooks can be loaded directly from GitHub into Google Colab with just the URL:

https://colab.research.google.com/github/<repo>/blob/<branch>/<path>/notebook.ipynb
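Because the Colab link is the GitHub blob path with a different prefix, a small helper can derive one from the other. This is an illustrative sketch: colab_url and colab_url_from_github are hypothetical names, and the notebook path in the example is used only for illustration.

```python
GITHUB_PREFIX = 'https://github.com/'
COLAB_PREFIX = 'https://colab.research.google.com/github/'


def colab_url(repo: str, branch: str, path: str) -> str:
    """Builds the Colab URL for a notebook stored in a GitHub repository."""
    return f'{COLAB_PREFIX}{repo}/blob/{branch}/{path.lstrip("/")}'


def colab_url_from_github(github_url: str) -> str:
    """Converts a github.com notebook URL into the matching Colab URL."""
    if not github_url.startswith(GITHUB_PREFIX):
        raise ValueError(f'not a GitHub URL: {github_url}')
    return COLAB_PREFIX + github_url[len(GITHUB_PREFIX):]


url = colab_url('tensorflow/docs', 'master',
                'site/en/tutorials/quickstart/beginner.ipynb')
print(url)
```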

For compute-intensive tasks, Colab provides TPUs and GPUs at no cost. TensorFlow documentation pages, such as this quickstart tutorial, have buttons that link both to the notebook source on GitHub and to a Colab session that loads it.

Better collaboration

Software documentation is a team effort, and notebooks are an expressive, education-focused format that allows engineers and writers to build up an interactive demonstration. Jupyter notebooks are JSON-formatted files that contain text cells and code cells, typically executed in sequential order from top to bottom. They are an excellent way to communicate programming ideas and, with some discipline, a way to share reproducible results.
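To make the file format concrete, here is a minimal notebook in the nbformat 4 JSON layout, with one markdown cell and two code cells, plus a toy runner that executes the code cells top to bottom in one shared namespace. The runner (a hypothetical helper) only illustrates the sequential model; a real Jupyter kernel also records outputs and execution counts.

```python
import json

NOTEBOOK_JSON = r'''
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Quickstart\n"]},
    {"cell_type": "code", "execution_count": null, "metadata": {},
     "outputs": [], "source": ["x = 21\n"]},
    {"cell_type": "code", "execution_count": null, "metadata": {},
     "outputs": [], "source": ["y = x * 2\n"]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 0
}
'''


def run_code_cells(notebook: dict) -> dict:
    """Executes code cells in order, sharing one namespace like a kernel."""
    namespace: dict = {}
    for cell in notebook['cells']:
        if cell['cell_type'] == 'code':
            exec(''.join(cell['source']), namespace)
    return namespace


notebook = json.loads(NOTEBOOK_JSON)
namespace = run_code_cells(notebook)
print(namespace['y'])  # 42
```

Because later cells depend on names defined earlier, running the cells out of order would fail, which is why top-to-bottom execution is the convention that makes a notebook reproducible.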

On the TensorFlow team, notebooks allow engineers, technical writers, and open source contributors to collaborate on the same document without the tension that exists between a separate code example and its published explanation. We write TensorFlow notebooks so that the documentation is the code—self-contained, easily shared, and tested.

Notebook translations with GitLocalize

Documentation needs to reach everyone around the world—something the TensorFlow team values. The TensorFlow community translation project has grown to 10 languages over the past two years. Translation sprints are a great way to engage with the community on open source documentation projects.

To make TensorFlow documentation accessible to even more developers, we worked with Alconost to add Jupyter notebook support to their GitLocalize translation tool. GitLocalize makes it easy to create translated notebooks and sync documentation updates from the source files. Open source contributors can submit pull requests and provide reviews using the TensorFlow GitLocalize project: gitlocalize.com/tensorflow/docs-l10n.

Jupyter notebook support in GitLocalize not only benefits TensorFlow, but is now available for all open source translation projects that use notebooks with GitHub.

TensorFlow docs notebook tools

Incorporating Jupyter notebooks into our docs infrastructure allows us to run and test all the published guides and tutorials to ensure everything on the site works for a new TensorFlow release—using stable or nightly packages.

Benefits aside, there are challenges with managing Jupyter notebooks as source code. To make pull requests and reviews easier for contributors and project maintainers, we created the TensorFlow docs notebook tools to automate common fixes and communicate issues to contributors with continuous integration (CI) tests. You can install the tensorflow-docs pip package directly from the tensorflow/docs GitHub repository:

$ python3 -m pip install -U git+https://github.com/tensorflow/docs

nbfmt

While the Jupyter notebook format is straightforward, notebook authoring environments often format the JSON inconsistently or embed their own metadata in the file. These unnecessary changes cause diff churn in pull requests and make content reviews difficult. The solution is to use an auto-formatter that outputs consistent notebook JSON.

nbfmt is a notebook formatter with a preference for the TensorFlow docs notebook style. It formats the JSON and strips unneeded metadata except for some Colab-specific fields used for our integration. To run:

$ python3 -m tensorflow_docs.tools.nbfmt [options] notebook.ipynb
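The core of such a formatter is small: parse the JSON, drop metadata that is not on an allowlist, and serialize with a fixed key order and indentation. This is a hypothetical sketch, not nbfmt's implementation; the KEEP_METADATA allowlist is an assumption, and nbfmt's actual rules are not reproduced here.

```python
import json
from typing import Iterable

# Hypothetical allowlist of metadata keys to keep.
KEEP_METADATA = ('colab', 'id', 'kernelspec')


def _filtered(metadata: dict, keep: set) -> dict:
    return {k: v for k, v in metadata.items() if k in keep}


def format_notebook(text: str, keep: Iterable[str] = KEEP_METADATA) -> str:
    """Returns the notebook JSON in one canonical form.

    Unlisted metadata keys are dropped at the notebook and cell level, and the
    JSON is written with sorted keys and fixed indentation, so formatting an
    already formatted notebook is a no-op.
    """
    keep_set = set(keep)
    nb = json.loads(text)
    nb['metadata'] = _filtered(nb.get('metadata', {}), keep_set)
    for cell in nb.get('cells', []):
        cell['metadata'] = _filtered(cell.get('metadata', {}), keep_set)
    return json.dumps(nb, indent=2, sort_keys=True, ensure_ascii=False) + '\n'
```

Idempotence is what makes a formatter useful in review: running it on an already formatted notebook changes nothing, so any remaining diff is a real content change.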

For TensorFlow docs projects, notebooks saved without output cells are executed and tested; notebooks saved with output cells are published as-is. We prefer to remove outputs to test our notebooks, but nbfmt can be used with either format.
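The publish-or-test rule above can be expressed as two small helpers: one detects whether any code cell has saved outputs, and the other clears them so the notebook is executed and tested instead. This is an illustrative sketch over parsed notebook JSON (nbformat 4 field names); has_outputs, clear_outputs, and docs_mode are hypothetical names, not part of the tensorflow-docs tools.

```python
import copy


def has_outputs(nb: dict) -> bool:
    """True if any code cell carries saved outputs."""
    return any(cell.get('outputs') for cell in nb.get('cells', [])
               if cell.get('cell_type') == 'code')


def clear_outputs(nb: dict) -> dict:
    """Returns a copy of the notebook with code-cell outputs removed."""
    cleared = copy.deepcopy(nb)
    for cell in cleared.get('cells', []):
        if cell.get('cell_type') == 'code':
            cell['outputs'] = []
            cell['execution_count'] = None
    return cleared


def docs_mode(nb: dict) -> str:
    """No saved outputs: execute and test. Saved outputs: publish as-is."""
    return 'publish-as-is' if has_outputs(nb) else 'execute-and-test'
```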

The --test flag is available for continuous integration tests. Instead of updating the notebook, it returns an error if the notebook is not formatted. We use this in a CI test for one of our GitHub Actions workflows. And with some further bot integration, formatting patches can be automatically applied to the contributor’s pull request.
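A check mode like --test amounts to formatting in memory and comparing the result with the file. In this hypothetical sketch, canonical() stands in for a real formatter (it only normalizes key order and indentation), and check_formatting() takes file contents keyed by name and returns a process exit code suitable for CI.

```python
import json
import sys
from typing import Dict


def canonical(text: str) -> str:
    # Stand-in for a full formatter: stable key order and indentation only.
    return json.dumps(json.loads(text), indent=2, sort_keys=True,
                      ensure_ascii=False) + '\n'


def check_formatting(files: Dict[str, str]) -> int:
    """Returns 0 if every file is already formatted, else 1. Never rewrites."""
    unformatted = [name for name, text in files.items()
                   if canonical(text) != text]
    for name in unformatted:
        print(f'{name}: not formatted', file=sys.stderr)
    return 1 if unformatted else 0
```

A CI step would call sys.exit(check_formatting(...)) so that any unformatted notebook fails the job while the file itself stays unmodified.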

nblint

The easiest way to scale reviews is to let the machine do it. Every project has recurring issues that pop up in reviews, and style questions are often best settled with a style guide (TensorFlow likes the Google developer docs style guide). For a large project, the more patterns you can catch and fix automatically, the more time you’ll have available for other goals.

nblint is a notebook linting tool that checks documentation style rules. We use it to catch common style and structural issues in TensorFlow notebooks:

$ python3 -m tensorflow_docs.tools.nblint [options] notebook.ipynb

Lints are assertions that test specific sections of the notebook. These lints are collected into style modules. nblint tests the google and tensorflow styles by default, and other style modules can be loaded at the command-line. Some styles require arguments that are also passed at the command-line, for example, setting a different repo when linting the TensorFlow translation notebooks:

$ python3 -m tensorflow_docs.tools.nblint \
    --styles=tensorflow,tensorflow_docs_l10n \
    --arg=repo:tensorflow/docs-l10n \
    notebook.ipynb
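The idea of lints collected into style modules can be sketched with a decorator-based registry. Everything here is hypothetical: the style name structure, the two lints, and the run_lints() helper are illustrations, not nblint's actual modules or API. Each lint receives the parsed notebook plus a dict of style arguments, mirroring how values such as repo are passed with --arg.

```python
from typing import Callable, Dict, List

# Hypothetical lint signature: (notebook, args) -> passed?
Lint = Callable[[dict, Dict[str, str]], bool]
STYLES: Dict[str, List[Lint]] = {}


def lint(style: str) -> Callable[[Lint], Lint]:
    """Decorator that registers a lint under a named style module."""
    def register(fn: Lint) -> Lint:
        STYLES.setdefault(style, []).append(fn)
        return fn
    return register


@lint('structure')
def has_title_cell(nb: dict, args: Dict[str, str]) -> bool:
    cells = nb.get('cells', [])
    return (bool(cells) and cells[0].get('cell_type') == 'markdown'
            and ''.join(cells[0].get('source', [])).startswith('# '))


@lint('structure')
def no_empty_code_cells(nb: dict, args: Dict[str, str]) -> bool:
    return all(''.join(c.get('source', [])).strip()
               for c in nb.get('cells', []) if c.get('cell_type') == 'code')


def run_lints(nb: dict, styles: List[str], args: Dict[str, str]) -> List[str]:
    """Runs every lint in the selected styles; returns names of failures."""
    return [f'{style}.{fn.__name__}'
            for style in styles
            for fn in STYLES.get(style, [])
            if not fn(nb, args)]
```

Keeping each lint as a small assertion over the notebook makes it cheap to add rules as review comments recur, and grouping them by style lets a project opt into only the rules that apply to it.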

Lint tests can have an associated fix that makes it easy to update notebooks to pass style checks automatically. Use the --fix argument to apply lint fixes that overwrite the notebook, for example:

$ python3 -m tensorflow_docs.tools.nblint --fix \
    --arg=repo:tensorflow/docs notebook.ipynb
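A lint and its fix can be paired as two functions: one asserts a rule, and the other rewrites the notebook so the assertion holds. This hypothetical sketch (not nblint's code) checks that every Colab link in markdown cells points at the repository passed as repo, and parses key:value arguments in the same shape as the --arg=repo:tensorflow/docs flag above.

```python
import re
from typing import Dict, List

# Matches ".../github/<owner>/<name>/" and captures the owner/name part.
COLAB_LINK = re.compile(
    r'(colab\.research\.google\.com/github/)([\w.-]+/[\w.-]+)/')


def parse_args(raw_args: List[str]) -> Dict[str, str]:
    """Turns ['repo:tensorflow/docs'] into {'repo': 'tensorflow/docs'}."""
    return dict(arg.split(':', 1) for arg in raw_args)


def colab_links_use_repo(nb: dict, args: Dict[str, str]) -> bool:
    """Lint: every Colab link in a markdown cell points at args['repo']."""
    return all(m.group(2) == args['repo']
               for cell in nb.get('cells', [])
               if cell.get('cell_type') == 'markdown'
               for line in cell.get('source', [])
               for m in COLAB_LINK.finditer(line))


def fix_colab_links(nb: dict, args: Dict[str, str]) -> None:
    """Fix: rewrites Colab links in markdown cells, in place, to args['repo']."""
    replacement = r'\g<1>' + args['repo'] + '/'
    for cell in nb.get('cells', []):
        if cell.get('cell_type') == 'markdown':
            cell['source'] = [COLAB_LINK.sub(replacement, line)
                              for line in cell.get('source', [])]


def lint_notebook(nb: dict, args: Dict[str, str], fix: bool) -> bool:
    """Returns True if the lint passes, applying the fix first when asked."""
    if fix and not colab_links_use_repo(nb, args):
        fix_colab_links(nb, args)
    return colab_links_use_repo(nb, args)
```

Re-running the lint after the fix confirms that the fix actually satisfies the rule, which keeps the two functions from drifting apart.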

Learn more

TensorFlow is a big fan of Project Jupyter and Jupyter notebooks. Along with Google Colab, notebooks changed how we teach TensorFlow and scale a large open source documentation project with tested guides, tutorials, and translations. We hope that sharing some of the tools will help other open source projects that want to use notebooks as documentation.

Read a TensorFlow tutorial and then run the notebook in Google Colab. To contribute to the TensorFlow documentation project, submit a pull request or a translation review to our GitLocalize project.

Special thanks to Mark Daoust, Wolff Dobson, Yash Katariya, the TensorFlow docs team, and all TensorFlow docs authors, reviewers, contributors, and supporters.
