TensorFlow Lite Micro with ML acceleration

Posted by Scott Main, Technical Writer, and the Coral team

In just a few years, ML models for mobile and embedded systems have come a very long way. With TensorFlow Lite (TFLite), you can now run sophisticated models that perform pose estimation and object segmentation, but these models still require a relatively powerful processor and a high-level OS in a mobile device or small computer like a Raspberry Pi. Alternatively, you can use TensorFlow Lite Micro (TFLM) on low-power microcontrollers (MCUs) to run simple models such as image and audio classification. However, the models for MCUs are much smaller, so they have limited capabilities and accuracy.

So there’s an opportunity cost when you must select between TFLM (low power but limited model performance) and regular TFLite (great model performance but higher power cost). Wouldn’t it be nice if you could get both on one board? Well, we’re happy to announce that the Coral Dev Board Micro is now available to provide exactly that.

A tiny board with big muscle

The Dev Board Micro is a microcontroller board (with a dual-core Cortex-M7 and Cortex-M4), so it’s small and power efficient, but it also includes the Coral Edge TPU™ on board, so it offers outstanding inferencing speeds for larger TFLite models. Plus, it has an on-board camera (324×324) and microphone. Naturally, there are plenty of GPIO pins and high-density connectors for add-on boards (such as our own Wireless Add-on and PoE Add-on).

against a nebulous bright white background, a hand holding up a chip board with the words 'Dev Board Micro' and the Coral Logo on it between the thumb and index finger

The Dev Board Micro executes your models using TFLM, which supports only a subset of operations in TFLite. Even if TFLM did support all the same ops, the MCU would still be much too slow for practical applications that use complex models such as for object detection and pose estimation. However, when you compile a TFLite model for the Edge TPU, all the MCU needs to do is set the model’s input, delegate the model ops to the Edge TPU, and then read the output.

As such, even though you’re still using the smaller TFLM interpreter, you can run sophisticated TFLite models that otherwise are not compatible with the TFLM interpreter, because they actually execute on the Edge TPU. For example, with the Dev Board Micro, you can run PoseNet for pose estimation, BodyPix for body segmentation, SSD MobileNet for object detection, and much more, at realtime speeds. For example:
Table showing the different models with corresponding inference time on Dev Board Micro with Edge TPU
Of course, running the Edge TPU demands more power, but the beauty of this board’s dual-core MCU is that you can run low-power apps on the M4 (which supports tiny TFLM models) and then activate the M7 and Edge TPU only as needed to run more sophisticated TFLite models.
To better understand how this board compares to our other Coral board, here’s a brief comparison of our different developer boards:
Table comparing the price (USD), size, processor, RAM, camera, microphone, Wi-Fi/Bluetooth, Ethernet, and operating system capabilities across the Dev Board Micro, Dev Board Mini, and Dev Board

Get started

We built a new platform for the Dev Board Micro based on FreeRTOS and included compatibility with the Arduino programming language. So you can build a C++ app with CMake and flash it to the board with our command line tools, or you can write and upload an Arduino sketch with the Arduino IDE. We call this new platform coralmicro and it’s fully open sourced on GitHub.

If you choose to code with FreeRTOS, coralmicro includes all the core FreeRTOS APIs you need to build multi-tasking apps on the MCU, plus custom coralmicro APIs for interacting with GPIOs, capturing photos, listening to audio, performing multi-core processing, and much more.

Because coralmicro uses TensorFlow Lite for Microcontrollers for inferencing, running a TensorFlow Lite model on the Dev Board Micro works almost exactly the way you expect, if you’ve used TensorFlow Lite on other platforms. One difference with TFLM, compared to TFLite, is that you need to specify the ops used by your model by adding them to the MicroMutableOpResolver. For example, if your model uses 2D convolution, then you need to call AddConv2D(). This way, you conserve memory by compiling only the op kernels you actually need to run your model on the MCU. However, if your model is compiled to run on the Edge TPU, then you also need to add the Edge TPU custom op, which accounts for all the ops that run on the Edge TPU. For example, when using SSD MobileNet for object detection on the Edge TPU, only the dequantize and post-processing ops run on the MCU, and the rest are delegated to the Edge TPU custom op, so the code to set up the MicroInterpreter looks like this:

auto tpu_context = coralmicro::EdgeTpuManager::GetSingleton()->OpenDevice();
if (!tpu_context) {
  printf("ERROR: Failed to get EdgeTpu context\r\n");
  vTaskSuspend(nullptr);
}

tflite::MicroErrorReporter error_reporter;
tflite::MicroMutableOpResolver<3> resolver;
resolver.AddDequantize();
resolver.AddDetectionPostprocess();
resolver.AddCustom(coralmicro::kCustomOp, coralmicro::RegisterCustomOp());

tflite::MicroInterpreter interpreter(tflite::GetModel(model.data()), resolver,
                                     tensor_arena, kTensorArenaSize,
                                     &error_reporter);

Notice that you also need to turn on the Edge TPU with OpenDevice(). Other than that and AddCustom(), the code to run an inference on the Dev Board Micro is pretty standard TensorFlow code. For more details, see our API reference for TFLM, and check out our code examples for FreeRTOS.

If you prefer to code with the Arduino IDE, we offer Arduino-style APIs for most of the same features available in FreeRTOS (multi-core processing is not available in Arduino). All you need to do is install the “Coral” boards package in the Arduino IDE’s Board Manager, select the Dev Board Micro board, and then you can browse all our examples for the Dev Board Micro in File > Examples.

You can learn more about the board and find a seller here, and start running the code examples by following our get started guide.

Using TensorFlow for Deep Learning on Video Data

Posted by Shilpa Kancharla

Video data contains a rich amount of information, and has a more complex and large structure than image data. Being able to classify videos in a memory-efficient way using deep learning can help us better understand the contents within the data. On tensorflow.org, we have published a series of tutorials on how to load, preprocess, and classify video data. Here are quick links to each of these tutorials:

  1. Load video data
  2. Video classification with a 3D convolutional neural network
  3. MoViNet for streaming action recognition
  4. Transfer learning for video classification with MoViNet
In this blog post, we go deeper into selected parts of these tutorials and show how you can incorporate them into your own models for processing video or other three-dimensional data (such as MRI scans) in a memory-efficient manner with TensorFlow, for example by leveraging Python generators and by resizing, or downsampling, the data.
Diagram showing a three-dimensional representation of video data with height, width, and number of frames (time)
Example of shape of video data, with the following dimensions:
number of frames (time) x height x width x channels.

FrameGenerator to load video data

From the Load video data tutorial, let’s take the opportunity to talk about the main workhorse of the majority of these tutorials: the FrameGenerator class. Through this class, we are able to yield the tensor representation of the video and the label, or class, of the video.

import random

# Note: frames_from_video_file is defined earlier in the "Load video data" tutorial.

class FrameGenerator:
  def __init__(self, path, n_frames, training = False):
    """ Returns a set of frames with their associated label.

      Args:
        path: Video file paths.
        n_frames: Number of frames.
        training: Boolean to determine if training dataset is being created.
    """
    self.path = path
    self.n_frames = n_frames
    self.training = training
    self.class_names = sorted(set(p.name for p in self.path.iterdir() if p.is_dir()))
    self.class_ids_for_name = dict((name, idx) for idx, name in enumerate(self.class_names))

  def get_files_and_class_names(self):
    video_paths = list(self.path.glob('*/*.avi'))
    classes = [p.parent.name for p in video_paths]
    return video_paths, classes

  def __call__(self):
    video_paths, classes = self.get_files_and_class_names()

    pairs = list(zip(video_paths, classes))

    if self.training:
      random.shuffle(pairs)

    for path, name in pairs:
      video_frames = frames_from_video_file(path, self.n_frames)
      label = self.class_ids_for_name[name] # Encode labels
      yield video_frames, label

Upon creating the generator class, we use the function from_generator() to feed in the data to our deep learning models. Specifically, the from_generator() API will create a dataset whose contents are generated by a generator. Using Python generators can be more memory-efficient than storing an entire sequence of data in memory. Consider creating a generator class similar to FrameGenerator and using the from_generator() API to load data into your TensorFlow and Keras models.

output_signature = (tf.TensorSpec(shape = (None, None, None, 3),
                                  dtype = tf.float32),
                    tf.TensorSpec(shape = (),
                                  dtype = tf.int16))

train_ds = tf.data.Dataset.from_generator(FrameGenerator(subset_paths['train'],
                                                          10,
                                                          training=True),
                                          output_signature = output_signature)
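
From here, train_ds behaves like any other tf.data.Dataset. As a small follow-up sketch (the shuffle buffer and batch size are illustrative choices, not values from the tutorial), you would typically shuffle, batch, and prefetch it before training:

# Illustrative input-pipeline settings; tune them for your data and hardware.
train_ds = train_ds.shuffle(64).batch(2).prefetch(tf.data.AUTOTUNE)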

einops library for resizing video data

For the second tutorial, on Video classification with a 3D convolutional neural network, let’s discuss the use of the einops library and how it can be incorporated into a Keras model backed by TensorFlow. This library is useful for performing flexible tensor operations and can be used with not only TensorFlow but also JAX. Specifically in this tutorial, we use it to help resize the data as it goes through the (2+1)D convolutional neural network we create. In the context of this second tutorial, we wanted to downsample the video data. Downsampling is particularly useful because it allows our model to examine specific parts of frames to detect patterns that may be specific to a certain feature in that video. Through downsampling, non-essential information can be discarded, allowing for dimensionality reduction and therefore faster processing.

We use the functions parse_shape() and rearrange() from the einops library. The parse_shape() function used here maps the names of the axes to their corresponding lengths. It will return a dictionary containing this information, called old_shape. Next, we use the rearrange() function that allows you to reorder the axes for multidimensional tensors. Pass in the tensor, alongside the names of the axes you are trying to rearrange.

The notation b t h w c -> (b t) h w c here means we want to squeeze together the batch size (denoted by b) and time (denoted by t) dimensions to pass this data into the Keras Resizing layer object. When we instantiate the ResizeVideo class, we pass in the height and width values that we want to resize the frame to. Once this resizing is complete, we use the rearrange() function again to unsqueeze (using the notation (b t) h w c -> b t h w c) the batch size and time dimensions.

import einops
from tensorflow import keras
from tensorflow.keras import layers

class ResizeVideo(keras.layers.Layer):
  def __init__(self, height, width):
    super().__init__()
    self.height = height
    self.width = width
    self.resizing_layer = layers.Resizing(self.height, self.width)

  def call(self, video):
    """
      Use the einops library to resize the tensor.

      Args:
        video: Tensor representation of the video, in the form of a set of frames.

      Return:
        A downsampled version of the video, resized to the new height and width.
    """
    # b stands for batch size, t stands for time, h stands for height,
    # w stands for width, and c stands for the number of channels.
    old_shape = einops.parse_shape(video, 'b t h w c')
    images = einops.rearrange(video, 'b t h w c -> (b t) h w c')
    images = self.resizing_layer(images)
    videos = einops.rearrange(
        images, '(b t) h w c -> b t h w c',
        t = old_shape['t'])
    return videos
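
As a quick sanity check, here is a hedged usage sketch of the layer (the tensor shape and target size are made up for illustration):

import tensorflow as tf

# A dummy batch of 1 video with 10 frames of 224x224 RGB pixels.
sample_video = tf.random.uniform((1, 10, 224, 224, 3))
resized = ResizeVideo(height=128, width=128)(sample_video)
print(resized.shape)  # Expected: (1, 10, 128, 128, 3)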

What’s next?

These are just a few ways you can leverage TensorFlow to work with video data in a memory-efficient manner, but such techniques aren’t just limited to video data. Medical data such as MRI scans or 3D image data also require efficient data loading and potential resizing of the shape of data. These techniques could prove useful when you are working with limited computational resources. We hope you find these tutorials helpful, and thank you for reading!

End-to-End Pipeline for Segmentation with TFX, Google Cloud, and Hugging Face

Posted by Chansung Park, Sayak Paul (ML and Cloud GDEs)

TensorFlow Extended (TFX) is a flexible framework allowing Machine Learning (ML) practitioners to iterate on production-grade ML workflows faster with reliability and resiliency. TFX’s power lies in its flexibility to run ML pipelines across different compatible orchestrators such as Kubeflow, Apache Airflow, Vertex AI Pipelines, etc., both locally and on the cloud.

In this blog post, we discuss the crucial details of building an end-to-end ML pipeline for semantic segmentation tasks with TFX and various Google Cloud services such as Dataflow, Vertex Pipelines, Vertex Training, and Vertex Endpoint. The pipeline also uses a custom TFX component, HFPusher, that is integrated with the Hugging Face 🤗 Hub. Finally, you will see how we implemented CI/CD into the mix by leveraging GitHub Actions.

Although we won’t go over all the bits of the pipeline, you can still find the code of the underlying project in this GitHub repository.

Architectural Overview

The system architecture of the project is divided into three main parts. The first part is all about the core TFX pipeline handling all the steps from data ingestion to model deployment. The second part concerns the integration between the pipeline and the external Hugging Face 🤗 Hub service. The last one is about automation and implementing CI/CD using GitHub Actions.

Flowchart showing overall system architecture from parametrized GitHub action to continuous deployment to within GCP Environment to external

Figure 1. Overall system architecture (original)

It is common to open pull requests when proposing new features or code refactorings in separate branches. When it comes to ML projects, these changes usually affect the model and/or the data. Besides running basic validation on the proposed changes (code quality, tests, etc.), we should also ensure, before merging, that the changes produce a model that is good enough to replace the currently deployed model (if the changes pertain to modeling). In this project, we developed a GitHub Action that is manually triggered on the merging branch with configurable parameters. This way, project stakeholders can validate performance-related changes and reliably ship them to production. In reality, there might be more critical measurements to make here, but we hope this GitHub Action proves to be a good starting point.

At the heart of any MLOps project, there is an ML pipeline. We built a simple yet complete ML pipeline with support for automatic data ingestion, data preprocessing, model training, model evaluation, and model deployment in TFX. The TFX pipeline could be run on a local environment, but we also ran it on the Vertex AI platform to replicate real-world production-grade environments.

Finally, the trained and qualified model from the ML pipeline is deployed to the Vertex AI Endpoint. The “blessed” model is also pushed to the Hugging Face Hub alongside an interactive demo via a custom HFPusher TFX component. Hugging Face Hub is a very popular place to store models and publish a fully working ML-powered interactive application for free. It is useful to showcase an application with the latest model to audit if it works as expected before going on a full production deployment.

Below, we discuss each of these components in a little more detail, discussing our design considerations and non-trivial technical aspects.

TFX Pipeline

The ML pipeline is written entirely in TFX, from data ingestion to model deployment. Specifically, we used standard TFX components such as ExampleGen, ImportSchemaGen, Transform, Trainer, Evaluator, and Pusher, along with the custom HFPusher component. Let’s briefly look at the roles of each component in the context of our project.

Flowchart showing overview of the TFX ML pipeline. Pipeline could be run on Local and Cloud(Vertex Pipeline) environment

Figure 2. Overview of the ML pipeline (original)

ExampleGen

In this project, we prepared the Pets dataset in TFRecord format with these scripts and stored the files in Google Cloud Storage (GCS). ExampleGen brings the data files from GCS, splits them into training and evaluation datasets according to glob patterns, and stores them as TFRecords in GCS. Note that ExampleGen can ingest different data formats, such as CSV, TFRecord, or Parquet, and it always generates datasets in a uniform TFRecord format, which lets us handle the data uniformly inside the entire TFX pipeline. Also note that since the Pets dataset is available from TF Datasets, you could use a custom TFDS ExampleGen for this task instead.
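
As a rough sketch of what this looks like in pipeline code (the bucket path and split patterns below are placeholders, not the project's actual configuration), ExampleGen for TFRecord inputs is typically an ImportExampleGen configured with split glob patterns:

from tfx import v1 as tfx
from tfx.proto import example_gen_pb2

# Hypothetical bucket and file patterns; replace them with your own.
input_config = example_gen_pb2.Input(splits=[
    example_gen_pb2.Input.Split(name='train', pattern='train-*.tfrecord'),
    example_gen_pb2.Input.Split(name='eval', pattern='eval-*.tfrecord'),
])

example_gen = tfx.components.ImportExampleGen(
    input_base='gs://<your-bucket>/pets-tfrecords',
    input_config=input_config,
)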

ExampleGen can be integrated with Dataflow out of the box. All you need to do to benefit from Dataflow is to call the with_beam_pipeline_args method with appropriate parameters such as machine type, disk size, the number of workers, and so on. For context, Dataflow is a managed service provided by Google Cloud that allows us to run Apache Beam pipelines efficiently in a fully distributed manner.
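
For example, a hedged sketch of delegating the ExampleGen work to Dataflow might look like the following; every flag value here is an illustrative placeholder:

# Hypothetical Dataflow settings; adjust project, region, and sizing to your setup.
example_gen = example_gen.with_beam_pipeline_args([
    '--runner=DataflowRunner',
    '--project=<gcp-project-id>',
    '--region=<gcp-region>',
    '--temp_location=gs://<your-bucket>/tmp',
    '--machine_type=n1-standard-4',
    '--disk_size_gb=100',
])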

ImportSchemaGen

ImportSchemaGen imports a Protocol Buffer Text Format file that was previously automatically inferred by SchemaGen. It can also be hand-tuned to define the structure of the output data from ExampleGen.

In our case, the prepared Pets dataset has two features – image and segmentation map (label), and the size of each feature is 128×128. Therefore, we could define a schema like the one below.

feature {
  name: "image"
  type: FLOAT

  float_domain {
    min: 0
    max: 255
  }

  shape {
    dim { size: 128 }
    dim { size: 128 }
    dim { size: 3 }
  }
}

feature {
  name: "label"
  type: FLOAT

  float_domain {
    min: 0
    max: 2
  }

  shape {
    dim { size: 128 }
    dim { size: 128 }
  }
}

Also note that in the float_domain section, we can set the value restrictions. In this project, the input data is standard RGB images, so each pixel value should be between 0 and 255. On the other hand, the pixel value of the label should be 0, 1, or 2, meaning outer, inner, and border of an object in an image, respectively.

Transform

With the help of ImportSchemaGen, the data arrives in Transform already validated and correctly shaped. Without ImportSchemaGen, we would have to write code to parse the TFRecords and shape each feature manually inside Transform. Therefore, the single line of code below is sufficient for data preprocessing, since the model in this project is built on top of MobileNetV2.

# IMAGE_KEY is "image", which matches the feature name in the ImportSchemaGen schema.
image_features = mobilenet_v2.preprocess_input(inputs[IMAGE_KEY])
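
For context, here is a hedged sketch of how that line might sit inside a Transform preprocessing_fn; the key names follow the schema above, but the project's actual module in the repository may differ (for example, it may rename transformed features):

from tensorflow.keras.applications import mobilenet_v2

# Feature keys as defined in the schema above.
IMAGE_KEY = 'image'
LABEL_KEY = 'label'

def preprocessing_fn(inputs):
    """tf.transform callback: scales images for MobileNetV2 and passes labels through."""
    outputs = {}
    outputs[IMAGE_KEY] = mobilenet_v2.preprocess_input(inputs[IMAGE_KEY])
    outputs[LABEL_KEY] = inputs[LABEL_KEY]
    return outputs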

Since data preprocessing is a CPU- and memory-intensive job, Transform can also be integrated with Dataflow. Just like in ExampleGen, the job can be seamlessly delegated to Dataflow by calling the with_beam_pipeline_args method.

Trainer

(Vertex) Trainer simply trains a model. We used a UNet architecture built on top of MobileNetV2 from the TensorFlow official tutorial. Since the model architecture is nothing new, let’s take a look at how it is modularized and some of the key pieces of code.

pipeline/

├─ …
├─ models/
    ├─ common.py
    ├─ hyperparams.py
    ├─ signatures.py
    ├─ train.py
    ├─ unet.py

You place your modeling code in a separate file, which is supplied as a parameter to the Trainer. In this case, that file is named train.py. When the Trainer component runs, it looks for an entry-point function named run_fn defined in train.py. The run_fn() function pulls in the training and evaluation datasets from the output of Transform, trains the UNet model (defined in unet.py), then saves the trained model with appropriate signatures. The training process simply follows the standard Keras way: model.compile(), model.fit().

The Trainer component can be integrated with Vertex AI Training out of the box, which is a managed service to train models in a distributed system. By specifying how you would want to configure the training server clusters in the custom_config parameter of the Trainer, the training job is handled by Vertex AI Training automatically.
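
As a hedged configuration sketch (the key names come from tfx.extensions.google_cloud_ai_platform in recent TFX releases; the machine spec, container image, and file path are placeholders, and transform is assumed to be the Transform component defined earlier):

from tfx import v1 as tfx

trainer = tfx.extensions.google_cloud_ai_platform.Trainer(
    module_file='pipeline/models/train.py',
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    train_args=tfx.proto.TrainArgs(num_steps=1000),
    eval_args=tfx.proto.EvalArgs(num_steps=100),
    custom_config={
        tfx.extensions.google_cloud_ai_platform.ENABLE_VERTEX_KEY: True,
        tfx.extensions.google_cloud_ai_platform.VERTEX_REGION_KEY: '<gcp-region>',
        tfx.extensions.google_cloud_ai_platform.TRAINING_ARGS_KEY: {
            'project': '<gcp-project-id>',
            'worker_pool_specs': [{
                'machine_spec': {'machine_type': 'n1-standard-8'},
                'replica_count': 1,
                'container_spec': {'image_uri': '<training-container-image-uri>'},
            }],
        },
    },
)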

It is also important to notice which signatures the model exports in TensorFlow. Consider the following code snippet that saves a trained model (of the tf.keras.Model instance) into a SavedModel resource.

model.save(
    fn_args.serving_model_dir,
    save_format="tf",
    signatures={
        "serving_default": model_exporter(model),
        "transform_features": transform_features_signature(
            model, tf_transform_output
        ),
        "from_examples": tf_examples_serving_signature(
            model, tf_transform_output
        ),
    },
)

The signatures are functions that define how to handle given input data. For example, we have defined three different signatures. While serving_default is used during serving time, the other two are used during the model evaluation time.

  • serving_default transforms a single or a batch of data points from user requests which is usually marshaled in JSON (base64 encoded) for HTTP or serialized Protocol Buffer messages for gRPC, then runs the model prediction on the data.
  • transform_features applies the transformation graph obtained from the Transform component to the data produced by ExampleGen. This function is used in the Evaluator component, so the raw evaluation inputs from ExampleGen can be transformed into features the model understands.
  • from_examples performs data transformation and model prediction in a sequential manner. How data transformation is done is identical to the process of the transform_features function.

Note that the transform_features and from_examples signatures are used internally in the Evaluator component. In the next section, we explain their connections.
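
To make this concrete, here is a hedged sketch of what a transform_features-style signature function can look like; the project's actual implementation lives in signatures.py and may differ in detail (tf_transform_output is assumed to be a tft.TFTransformOutput):

import tensorflow as tf

def transform_features_signature(model, tf_transform_output):
    """Returns a serving function that parses serialized tf.Examples and applies the
    Transform graph, so raw ExampleGen records become model-ready features."""
    model.tft_layer = tf_transform_output.transform_features_layer()

    @tf.function(input_signature=[
        tf.TensorSpec(shape=[None], dtype=tf.string, name='examples')
    ])
    def serve_tf_examples_fn(serialized_tf_examples):
        feature_spec = tf_transform_output.raw_feature_spec()
        parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
        return model.tft_layer(parsed_features)

    return serve_tf_examples_fn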

Evaluator

The performance of the trained model should be evaluated against certain criteria or metrics. Evaluator lets us define such metrics; it not only evaluates the trained model itself but also compares it to the last best model retrieved by Resolver. In other words, the trained model will be deployed only if it achieves performance above the baseline threshold and is better than the previously deployed model. The full configuration for this project can be found here.

EVAL_CONFIGS = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(
            signature_name="from_examples",
            preprocessing_function_names=["transform_features"],
        )
    ],
    ...
)

The reason we have two signatures, transform_features and from_examples, that perform the same data preprocessing is that they are used in different situations. Evaluator runs the evaluate() method on an existing model, while on the currently trained model it runs the function (signature) specified in signature_name. Therefore, we need not only a function that transforms a given sample (transform_features) but also one that transforms and evaluates it at the same time (from_examples).
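
The EvalConfig above is truncated. As a hedged illustration of the comparison logic described here (the metric class and thresholds are placeholders, not the project's actual settings), the criteria portion of an EvalConfig typically looks like this:

import tensorflow_model_analysis as tfma

metrics_specs = [
    tfma.MetricsSpec(metrics=[
        tfma.MetricConfig(
            class_name='SparseCategoricalAccuracy',
            threshold=tfma.MetricThreshold(
                # The candidate model must clear an absolute baseline...
                value_threshold=tfma.GenericValueThreshold(lower_bound={'value': 0.5}),
                # ...and also improve on the last blessed model found by Resolver.
                change_threshold=tfma.GenericChangeThreshold(
                    direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                    absolute={'value': 1e-4},
                ),
            ),
        )
    ])
]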

Pusher

When the trained model is evaluated as good enough to be deployed, (Vertex) Pusher pushes the model to the Model Registry in Vertex AI. It can also optionally create an Endpoint and deploy the model to that endpoint out of the box. You can pass a number of deployment-specific configurations to Pusher: machine type, GPU type, the number of GPUs, traffic splits, and so on.
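
A hedged configuration sketch follows; the key names come from tfx.extensions.google_cloud_ai_platform in recent TFX releases, and the region, serving container image, endpoint name, and machine type are placeholders:

from tfx import v1 as tfx

pusher = tfx.extensions.google_cloud_ai_platform.Pusher(
    model=trainer.outputs['model'],
    model_blessing=evaluator.outputs['blessing'],
    custom_config={
        tfx.extensions.google_cloud_ai_platform.ENABLE_VERTEX_KEY: True,
        tfx.extensions.google_cloud_ai_platform.VERTEX_REGION_KEY: '<gcp-region>',
        tfx.extensions.google_cloud_ai_platform.VERTEX_CONTAINER_IMAGE_URI_KEY:
            'us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest',
        tfx.extensions.google_cloud_ai_platform.SERVING_ARGS_KEY: {
            'project_id': '<gcp-project-id>',
            'endpoint_name': '<endpoint-name>',
            'machine_type': 'n1-standard-4',
        },
    },
)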

Integration with Hugging Face 🤗 Hub

Hugging Face Hub offers ML practitioners a powerful way to store and share models, datasets, and ML applications. Since it provides seamless support for storing model artifacts with automatic version control, we developed a custom TFX component named HFPusher that:

  • takes a model artifact (in the SavedModel format) and pushes that to the Hub in a separate branch for better segregation. The branch name is determined by time.time().
  • creates and pushes a model card that includes attributes of the model, enabling discovery of the models on the Hugging Face Hub platform.
  • hosts an application with the model using Hugging Face Spaces given an application template referencing the branch where the model artifact was pushed to.

You can use this component anywhere after the Trainer component, but it’s recommended to use it at the end of a TFX pipeline. The HFPusher component only requires a handful of arguments consisting of two TFX artifacts and four Hugging Face specific configurations:

  • Hugging Face user name
  • Hugging Face access token for creating and modifying repositories on the Hugging Face Hub, which is automatically injected with GitHub Action (see the next section)
  • Name of the repository to which the model artifacts will be pushed
  • Model artifact as an output of a previous component such as Trainer
  • Hugging Face Space specific configurations (optional)
    • Application template to host a Space application
    • Name of the repository to which the Space application will be pushed. It has the same name as the name of the model repository by default.
    • Space SDK. The default value is gradio, but it could be set to streamlit
  • Model blessing artifact as an output of a previous component such as Evaluator (optional)

The Hugging Face Hub is primarily based on Git and Git-LFS. The Hugging Face team provides an easy-to-use huggingface_hub API toolkit to interact with it. That is how it provides seamless support for version control, large file storage, and interaction.

In Figures 3 and 4, we show what the model repository and the application repository (which were automatically created from a TFX pipeline) look like on the Hugging Face Hub.

Screenshot showing model versioning in Hugging Face Model Hub
Figure 3. Model versioning in Hugging Face Model Hub (original)
Screenshot of a simple demo for semantic segmentation model trained on the PETS dataset
Figure 4. Automatically published application in Hugging Face Space Hub (original)

HFPusher has been contributed to the official tfx-addons package and will be available in tfx-addons version 0.4.0 and later.

Automation with GitHub Actions

In the DevOps world, we usually run a number of tests on the changes introduced to ensure they’re valid enough to hit production. If the tests pass, the changes are merged and a new deployment is shipped automatically.

For an ML codebase, the changes are usually related to either the data or the model at a broad level. Validating these changes is quite application-dependent, but there is still some common ground:

  • Do the changes introduced on the modeling side lead to better performance metrics?
  • Do the changes lead to faster training throughput?
  • Do the data-related changes reflect some distribution better?

We focused on the first point in this project. We designed a GitHub Action workflow that works as follows:

1. Google Cloud authentication and setup are done with the google-github-actions/auth and google-github-actions/setup-gcloud GitHub Actions when a credential (JSON) is provided. To use the appropriate credentials for the specified Google Cloud project ID, the workflow looks up the credentials in GitHub Action Secrets; each credential is stored under a name identical to its Google Cloud project ID.

2. Some of the sensitive information is injected with the envsubst command. In this project, a Hugging Face 🤗 access token must be provided to the HFPusher component so it can create and update repositories on the Hugging Face 🤗 Hub. The access token is stored in GitHub Action Secrets.

3. An environment variable enable_dataflow is set to “true” or “false” based on the specified parameter. By looking up the environment variable, the TFX pipeline conditionally defines dedicated parameters for Dataflow and passes them to ExampleGen and Transform components via with_beam_pipeline_args method.

4. The last part of the workflow compiles and runs the TFX pipeline on Vertex AI with the TFX CLI, as shown below. The tfx pipeline create command creates the pipeline and registers it on the local system; it is also capable of building and pushing a Docker image to Google Container Registry (GCR) based on a custom Dockerfile in the pipeline. Then the tfx run create command runs the pipeline on Vertex AI with the specified Google Cloud project ID and region.

tfx pipeline create \
  --pipeline-path kubeflow_runner.py \
  --engine vertex --build-image

tfx run create \
  --engine vertex \
  --pipeline-name PIPELINE_NAME \
  --project GCP_PROJECT_ID --region GCP_REGION

In this setup, we need to verify for each PR that the suggested modification works at both build and run time. Also, sometimes each collaborator wants to run the ML pipeline with their own Google Cloud account. Furthermore, it is helpful to be able to conditionally delegate some heavy jobs in the ML pipeline to more dedicated Google Cloud services.

Figure 5. GitHub Action for CI/CD of ML pipeline (original)

As you may notice from Figure 5, the GitHub Action runs the workflow based on five different parameters: branch, Google Cloud project ID, cloud region, the name of the TFX pipeline, and whether to enable the Dataflow integration.

Conclusion

In this post, we discussed how to build an end-to-end ML pipeline for semantic segmentation tasks. We leveraged TensorFlow, TFX, and Google Cloud services such as Dataflow and Vertex AI, GitHub Actions, and Hugging Face 🤗 Hub to develop a production-grade ML pipeline with external services along with semi-automatic CI/CD pipelines. We hope that you found this setup useful and reliable and that you will use this in your own ML pipeline projects.

As a future work, we will demonstrate a common MLOps scenario by extending this project. First, we’ll add more complexities to the data to simulate model performance degradation. Second, we’ll evaluate the currently deployed model to see if the model performance degradation actually happened. Last, we’ll verify the model performance is recovered after replacing the current model architecture with better ones such as DeepLabV3+ or SegFormer.

Acknowledgements

We are grateful to the ML Developer Programs team that provided Google Cloud credits to support our experiments. We thank Robert Crowe for providing us with helpful feedback and guidance. We also thank Merve Noyan who worked on integrating the model card utilities into the HFPusher component.

Optimizing TensorFlow for 4th Gen Intel Xeon Processors

Posted by Ashraf Bhuiyan, AG Ramesh from Intel, Penporn Koanantakool from Google

TensorFlow 2.9.1 was the first release to include, by default, optimizations driven by the Intel® oneAPI Deep Neural Network Library (oneDNN) for 3rd Gen Intel® Xeon® processors (Cascade Lake). Since then, Intel and Google have continued our collaboration to introduce new TensorFlow optimizations for the next generation of Intel Xeon processors.

These optimizations accelerate TensorFlow models using the new matrix-based instruction set, Intel® Advanced Matrix Extension (AMX). The Intel AMX instructions are designed to accelerate deep learning operations such as matrix multiplication and convolutions that use Google’s bfloat16 and 8-bit low-precision data types. Low-precision data types are widely used and provide significant improvement over the default 32-bit floating point format without significant loss in accuracy.

We are happy to announce that these features are now available as a preview in the nightly build of TensorFlow on GitHub, and also in the Intel-optimized build. TensorFlow developers can now use Intel AMX on 4th Gen Intel® Xeon® Scalable processors (formerly known as Sapphire Rapids) using the existing mixed precision support available in TensorFlow. We are excited by the results: several popular AI models run up to 19x faster by moving from 3rd Gen to 4th Gen Intel Xeon processors using Intel AMX.

Intel’s Advanced Matrix Extension (AMX) Accelerations in 4th Gen Intel Xeon Processor

The Intel® Advanced Matrix Extension (AMX) is an x86-based extension that introduces a new programming framework for dot products of two matrices. Intel AMX serves as an AI acceleration engine and builds on capabilities such as AVX-512 (for optimized vector operations) and Deep Learning Boost (through Vector Neural Network Instructions for optimized resource utilization/caching and for lower-precision AI optimizations) in previous generations of Intel Xeon processors.

Intel AMX introduces a new type of 2-dimensional register file, called “tiles”, and a set of 12 new x86 instructions to operate on the tiles. The new instruction TDPBF16PS performs a dot product of bfloat16 tiles, and TDPBSSD performs a dot product of signed 8-bit integer tiles. Other instructions cover tile configuration and data movement to the Intel AMX unit. Further details can be found in the document published by Intel.

How to take advantage of AMX optimizations on 4th Gen Intel Xeon

Intel AMX optimizations are included in the official TensorFlow nightly releases. The latest stable release, 2.11, includes preliminary support; full support will be available in a subsequent stable release.

Users running TensorFlow on 4th Gen Intel Xeon processors can take advantage of the optimizations with minimal changes:

a)    For bfloat16 mixed precision, developers can accelerate their models using the Keras mixed precision API, as explained here. You can easily invoke auto mixed precision by including these lines in your code; that’s it!

   

from tensorflow.keras import mixed_precision

policy = mixed_precision.Policy('mixed_bfloat16')
mixed_precision.set_global_policy(policy)

b)    Using Intel AMX with 8-bit quantized models requires the models to be quantized to use int8. Any existing standard models, for example RN50, BERT, SSD-RN34 that have been previously quantized with Intel Neural Compressor will run with no changes needed.

    Performance improvements

    The following charts show performance improvement on a 2-socket, 56-core 4th Gen Intel Xeon using Intel AMX low precision on various popular vision and language models, where the baseline is a 2-socket, 40-core 3rd Gen Intel Xeon with FP32 precision. We use Intel Optimization for TensorFlow* preview and the launch_benchmark script from Model Zoo for Intel® Architecture .

Bar chart showing comparison of speedup between 4th Gen Intel Xeon with AMX BF16 vs. 3rd Gen Intel Xeon with FP32 across mixed precision models

    Here in the chart, inference with mixed precision models on a 4th Gen Intel Xeon was 1.9x to 9.6x faster than FP32 models on a 3rd Gen Intel Xeon. (BS=x indicates a large batch size, depending on the model)

Bar chart showing comparison of speedup between 4th Gen Intel Xeon with AMX BF16 vs. 3rd Gen Intel Xeon with FP32 for training across mixed precision models

    Training models with auto-mixed-precision on a 4th Gen Intel Xeon was 2.3x to 5.5x faster than FP32 models on a 3rd Gen Intel Xeon.

Bar chart showing comparison of speedup between 4th Gen Intel Xeon with AMX Int8 vs. 3rd Gen Intel Xeon with FP32 across mixed precision models

    Similarly, quantized model inference on a 4th Gen Intel Xeon was 3.3x to 19x faster than FP32 precision on a 3rd Gen Intel Xeon.

    In addition to the above popular models, we have tested 100s of other models to ensure that the performance gain is observed across the board.

    Next Steps

    We are working to continuously tune and improve the Intel AMX optimizations in future releases of TensorFlow. We encourage users to optimize their AI models with Intel AMX on Intel 4th Gen processors to get a significant performance boost; not just for inference, but also for pre-training, fine tuning and transfer learning. We would like to hear from you, please provide feedback through the TensorFlow Github page or the oneAPI Deep Neural Network library GitHub page.

    Acknowledgements

The results presented in this blog are the work of many people, including the TensorFlow and oneDNN teams at Intel and our collaborators in Google’s TensorFlow team.

    From Intel: Md Faijul Amin, Mahmoud Abuzaina, Gauri Deshpande, Ashiq Imran, Kanvi Khanna, Geetanjali Krishna, Sachin Muradi, Srinivasan Narayanamoorthy, Bhavani Subramanian, Yimei Sun, Om Thakkar, Jojimon Varghese, Tatyana Primak, Shamima Najnin, Mona Minakshi, Haihao Shen, Shufan Wu, Feng Tian, Chandan Damannagari.

    From Google: Eugene Zhulenev, Antonio Sanchez, Emilio Cota.

    *For configuration details see www.intel.com/performanceindex


    Notices and Disclaimers:

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

    New State-of-the-Art Quantized Models Added in TF Model Garden

    Posted by Jaehong Kim, Fan Yang, Shixin Luo, and Jiyang Kang

    The TensorFlow Model Garden provides implementations of many state-of-the-art machine learning models for vision and natural language processing, and workflow tools to let you quickly configure and run those models on standard datasets. These models are implemented using modern best practices.

Previously, we announced quantization aware training (QAT) support for various on-device vision models using the TensorFlow Model Optimization Toolkit (TFMOT). In this post, we introduce new SOTA models optimized using QAT for object detection, semantic segmentation, and natural language processing.
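
For readers new to QAT, the general TFMOT workflow looks roughly like the sketch below. This is a generic illustration of the toolkit API, not the Model Garden's own QAT configuration, and the backbone and hyperparameters are placeholders.

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Wrap a Keras model with fake-quantization nodes so it learns to be robust to INT8.
base_model = tf.keras.applications.MobileNetV2(weights=None, input_shape=(224, 224, 3))
qat_model = tfmot.quantization.keras.quantize_model(base_model)

qat_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
# Train as usual with qat_model.fit(...), then convert with
# tf.lite.TFLiteConverter.from_keras_model(qat_model) to get the quantized TFLite model.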

    RetinaNet+MobileNetV2

    A new QAT supported object detection model has been added to the Model Garden. Specifically, we use a MobileNetV2 with 1x depth multiplier as the backbone and a lightweight RetinaNet as the decoder. MobileNetV2 is a widely used mobile model backbone and we have provided QAT support since our last release. RetinaNet is the SOTA one-stage detection framework used for detection tasks and we make it more efficient on mobile devices by using separable convolution and reducing the number of filters. We train the model from scratch without any pre-trained checkpoints. 

    Results show that with QAT, we can successfully preserve the model quality while reducing the latency significantly. In comparison, post-training quantization (PTQ) does not work out-of-the-box smoothly due to the complexity of the RetinaNet decoder, thus leading to low box average precision (AP).

    Table 1. Box AP and latency comparison of the RetinaNet models. Latency is measured on a Samsung Galaxy S21 using 1-thread CPU. FP32 refers to the unquantized floating point TFLite model. PTQ INT8 refers to full integer post-training quantization. QAT INT8 refers to the quantized QAT model.
The QAT support for object detection models is critical to many on-device use cases, such as product recognition using hand-held devices, enabling a more pleasant user journey.

    MOSAIC

    MOSAIC is a neural network architecture for efficient and accurate semantic image segmentation on mobile devices. With a simple asymmetric encoder-decoder structure which consists of an efficient multi-scale context encoder and a light-weight hybrid decoder to recover spatial details from aggregated information, MOSAIC achieves better balanced performance while considering accuracy and computational cost. MLCommons MLPerf adopted MOSAIC as the new industry standard model for mobile image segmentation benchmark.

    We have added QAT support for MOSAIC as part of the open source release. In Table 2, we provide the benchmark comparison between DeepLabv3+ and MOSAIC. We can clearly observe that MOSAIC achieves better performance (mIoU: mean intersection-over-union) with significantly lower latency. The negligible gap between QAT INT8 and FP32 also demonstrates the effectiveness of QAT. Please refer to the paper for more benchmark results.

    Table 2. mIoU and latency comparison of a MobileNet Multi-HW AVG + MOSAIC. Latency is measured on a Samsung Galaxy S21 using 1-thread CPU. FP32 refers to the unquantized floating point TFLite model. PTQ INT8 refers to full integer post-training quantization. QAT INT8 refers to the quantized QAT model.
    MOSAIC is designed using commonly supported neural operations, and can be easily deployed to diverse mobile hardware platforms for efficient and accurate semantic image segmentation.

    MobileBERT

    MobileBERT is a thin version of BERT_LARGE, while equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward networks. (code)

We applied QAT to the MobileBERT model to show that our QAT toolkit can be applied to Transformer-based models, which have become very popular these days.

    Table 3. F1 score and latency comparison of a MobileBERT. Latency is measured on a Samsung Galaxy S21 using 1-thread CPU. FP32 refers to the unquantized floating point TFLite model. PTQ INT8 refers to full integer post-training quantization. QAT INT8 refers to the quantized QAT model.
Applying QAT to MobileBERT enables mobile use cases for NLP models, such as next-word prediction or answer generation. This model was trained only on Q&A tasks, but the same approach can be leveraged for other on-device NLP tasks.

    Next steps

    In this post, we expanded the coverage of QAT support and introduced new state-of-the-art quantized models in Model Garden for object detection, semantic segmentation, and natural language processing. TensorFlow practitioners can easily utilize these SOTA quantized models for their problems achieving lower latency or smaller model size with minimal accuracy loss.

    To learn more about the Model Garden and its Model Optimization Toolkit support, check out the following blog posts:

Model Garden provides implementations of various vision and language models, and the pipeline to train models from scratch or from checkpoints. To get started with Model Garden, you can check out the examples in the Model Garden Official repository. Model libraries in this repository are optimized for fast performance and actively maintained by Google engineers. Simple Colab examples for training and inference using these models are also provided.

    Acknowledgements

    We would like to thank everyone who contributed to this work including the Model Garden team, Model Optimization team and Google Research team. Special thanks to Abdullah Rashwan, Yeqing Li, Hongkun Yu from the Model Garden team; Jaesung Chung from the Model Optimization team, Weijun Wang from the Google Research team.

    Women in Machine Learning Symposium 2022 – Event Recap

    Posted by Joana Carrasqueira, Developer Relations Program Manager

    Thank you to everyone who joined us at the second Women in Machine Learning Symposium!

Last year we founded the Women in Machine Learning program, with the goal of building an inclusive space for all intersections of diversity and giving a voice and platform to women passionate about ML. Hundreds joined to share tips and insights for careers in ML, learn how to get involved in the community, contribute to open source, and much more.

    This year, thousands of ML practitioners joined from all over the world. Everyone came together to learn the latest Machine Learning tools and techniques, get the scoop on the newest ML products from Google, and learn directly from several amazing women in the field.

    During the keynote we announced:

    • Simple ML for Sheets – Simple ML is an add-on, in beta, for Google Sheets from the TensorFlow team that helps make machine learning accessible to all. Anyone, even people without programming or ML expertise, can experiment and apply some of the power of machine learning to their data in Google Sheets with just a few clicks. Watch the demo here.
    • MediaPipe Previews – We invited developers to preview low-code APIs that provide solutions to common on-device ML challenges across vision, natural language and audio. We also opened MediaPipe Studio, a web-based interface that provides a new way to prototype and benchmark ML solutions.
    • TensorFlow Recommendation Systems Hub – We published a new dedicated page on TensorFlow.org where developers can find tools and guidance for building world-class recommendation systems with the TensorFlow ecosystem.
    • Upcoming Sign Language AI Kaggle Competition – Our first Sign Language AI Competition to help the partners of deaf children learn to sign launches soon. Sign up to get notified when it launches.

Below is a quick recap of the workshops from the event. Thanks again.

      Workshops:

      Introduction to Machine Learning

This session gives participants a hands-on overview of how to get started in ML, covering topics from an introduction to ML models to creating your first ML project. Learn how to use codelabs and leverage technical documentation to help you get started.

      Watch Now

      TensorFlow Lite in Android with Google Play Services

TensorFlow Lite is available in the Google Play services runtime for all Android devices running Play services. Learn how to run ML models without statically bundling TensorFlow Lite libraries into your app, which lets you reduce the size of your apps and gain improved performance from the latest stable version of the libraries.

      Watch Now

      Advanced On-Device ML Made Easy with MediaPipe

      Learn how MediaPipe can help you easily create custom cross-platform on-device ML solutions with low-code and no-code tools. In this session, you’ll see how to quickly try out on-device ML solutions on a web browser, then customize them in just a few lines of Python code, and easily deploy them across multiple platforms: web, Android and Python.

      Watch Now

      Generative Adversarial Networks (GANs) and Stable Diffusion

Stable Diffusion is a text-to-image model that allows many people to create amazing art within seconds. Using Keras, you can enter a short text description into the available Stable Diffusion models to generate such an image. During this session, you can learn how to generate your own custom images with a few lines of Python code.

      Watch Now

      What’s Next? 

      Subscribe to the TensorFlow channel on YouTube and check out the Women in Machine Learning Symposium 2022 playlist at your convenience!

      Introducing Simple ML for Sheets: A No-code Machine Learning Add-on for Google Sheets

Posted by Mathieu Guillame-Bert, Richard Stotz, Luiz Gustavo Martins, Ashley Oldacre, Jocelyn Becker, Glenn Cameron, and Jan Pfeifer

      Today at the Women in ML Symposium thousands of ML developers and enthusiasts gathered to learn about the latest ML products from Google. Advances in machine learning (ML) technology continue to power breakthroughs in many fields. From helping to protect the Great Barrier Reef to helping amputees reclaim mobility. However, such work often requires deep ML expertise, programming experience, and time.

      To make ML accessible beyond ML experts, we’ve been working on Simple ML for Sheets. Simple ML is an add-on, in beta, for Google Sheets from the TensorFlow team that helps make machine learning accessible to all. Anyone, even people without programming or ML expertise, can experiment and apply some of the power of machine learning to their data in Google Sheets with just a few clicks. From small business owners, scientists, and students to business analysts at large corporations, anyone familiar with Google Sheets can make valuable predictions automatically.

      For example, if you’re a car repair shop owner who keeps records of past repairs with data points like car make, repair type, and mileage, you can use Simple ML to predict the number of hours necessary to fix a car. Scientists can also benefit from ML in countless domains. For example, if you are studying molecular aging, you can predict a person’s age based on DNA methylation data. In either use case, these ML-powered predictions are at your fingertips in just a few clicks, all via the familiar Sheets interface you use every day.

      Simple ML works in three overall steps:

      1. Open your data in Google Sheets.
      2. Select and run the task that best describes what you want to do, like predicting missing values or spotting abnormal ones. Tasks are organized so you can use them even if you don’t know anything about Machine Learning.
      3. After a few seconds, once the model has made a prediction, you can explore using the result to improve your business decisions, automate tasks, or do any of the seemingly endless applications that ML enables. If you are new to ML, remember these are just statistical predictions, of course, and may be inaccurate.
      moving image showing user predicting missing penguin species with Simple ML for Sheets
      Predicting missing penguin species with Simple ML for Sheets

Even if you already know how to train and use machine learning models, Simple ML in Sheets can make your life even easier. For instance, training, evaluating, interpreting, and exporting a model to a notebook takes only 5 clicks and as little as 10 seconds. Since Simple ML in Sheets is based on the state-of-the-art ML technology that also powers TensorFlow Decision Forests, and comes pre-optimized, you might even get better models.

      Of course, succeeding with ML involves far more than training a model and making a prediction. If you are new to ML, you should begin with Google’s free machine learning courses, including problem framing.

Because Simple ML runs in your browser, your data stays right where you’re working: secure in your spreadsheet in Google Sheets. The models are automatically saved to Google Drive so you can easily share them with the rest of your team. And because Simple ML uses TensorFlow Decision Forests underneath, you can export models trained in Simple ML to the TensorFlow ecosystem!

      Want to try it? Follow the introduction tutorial to get started. Then, try the add-on on your own data! Feedback is welcomed. And as always, use AI responsibly.

      Women in ML Symposium (Dec 7, 2022): Building Better People with AI and ML

      Posted by the TensorFlow Team

      Join us tomorrow, Dec. 7, 2022, for Dr. Vivienne Ming’s session at the Women in Machine Learning Symposium at 10:25 AM PST. Register here.

      Dr. Vivienne Ming explores maximizing human capacity as a theoretical neuroscientist, delusional inventor, and demented author. Over her career she’s founded 6 startups, been chief scientist at 2 others, and launched the “mad science incubator”, Socos Labs, where she explores seemingly intractable problems—from a lone child’s disability to global economic inclusion—for free.


      A note from Dr. Vivienne Ming:

      I have the coolest job in the whole world. People bring me problems:

      • My daughter struggles with bipolar disorder, what can we do?
      • What is the biggest untracked driver of productivity in our company?
      • Our country’s standardized test scores go up every year; why are our citizens still underemployed?

      If I think my team and I can make a meaningful difference, I pay for everything, and if we come up with a solution, we give it away. It’s possibly the worst business idea ever, but I get to nerd out with machine learning, economic modeling, neurotechnologies, and any other science or technology just to help someone. For lack of a more grown up title I call this job professional mad scientist and I hope to do it for the rest of my life.

The path to this absurd career wound through academia, entrepreneurship, parenthood, and philanthropy. In fact, my very first machine learning project as an undergrad in 1999 (yes, we were partying like it was) concerned building a lie detection system for the CIA using face tracking and expression recognition. This was, to say the least, rather morally gray, but years later I used what I’d first learned on that project to build a “game” to reunite orphaned refugees with their extended family. Later still, I helped develop an expression recognition system on Google Glass for autistic children learning to read facial expressions.

As a grad student I told prospective advisors that I wanted to build cyborgs. Most (quite justifiably) thought I was crazy, but not all. At CMU I developed a convolutional generative model of hearing and used it to develop ML-driven improvements in cochlear implant design. Now I’m helping launch 3 separate startups mashing up ML and neurotech to augment creativity, treat Alzheimer’s, and prevent postpartum depression and other neglected hormonal health challenges.

      I’ve built ML systems to treat my son’s type 1 diabetes, predict manic episodes, and model causal effects in public policy questions (like, which policies improve job and patent creation by women entrepreneurs?). I’ve dragged you through all of the above absurd bragging not because I’m special but to explain why I do what I do. It is because none of this should have happened—no inventions invented, companies launched, or lives saved…mine least of all.

      Just a few years before that CIA face analysis project I was homeless. Despite having every advantage, despite all of the expectations of my family and school, I had simply given up on life. The years in between taught me the most important lesson I could ever learn, which had nothing to do with inverse Wishart Distributions or Variational Autoencoder. What I learned is that life is not about me. It’s not about my happiness and supposed brilliance. Life is about our opportunities to build something bigger than ourselves. I just happen to get to build using the most overhyped and yet underappreciated technology of our time.

There’s a paradox that comes from realizing that life isn’t about you: you finally get to be yourself. For me that meant becoming a better person, a person that just happened to be a woman. (Estrogen is the greatest drug ever invented—I highly recommend it!) It meant being willing to launch products not because I thought they’d pay my rent but because I believed they should happen no matter the cost. And every year my life got better…and the AI got cooler 🙂

      Machine learning is an astonishingly powerful tool, and I’m so lucky to have found it just at the dawn of my career. It is a tool that goes beyond my scifi dreams or nerd aesthetics. It’s a tool I use to help others do what I did for myself: build a better person. I’ve run ML models over trillions of data points from hundreds of millions of people and it all points at a simple truth: if you want an amazing life, you have to give it to someone else.

      So, build something amazing with your ML skills. Build something no one else in the world would build because no one else can see the world the same as you. And know that every life you touch with your AI superpower will go on to touch innumerable other lives.

      Read More

      Building best-in-class recommendation systems with the TensorFlow ecosystem

      Building best-in-class recommendation systems with the TensorFlow ecosystem

      Posted by Wei Wei, Developer Advocate

Recommendation systems, often called recommenders, are a type of machine learning system that gives users highly relevant suggestions based on their interests. From recommending movies or restaurants to highlighting news articles or entertaining videos, they help you surface compelling content from a large pool of candidates, which boosts the likelihood that users interact with your products or services, broadens the content they consume, and increases the time they spend in your app. To help developers better leverage our offerings in the TensorFlow ecosystem, today we are very excited to launch a new dedicated page that gathers all the tooling and learning resources for creating recommendation systems and provides a guided path for choosing the right products to build with.

      While it is relatively straightforward to follow the Wide & Deep Learning paper and build a simple recommender using the TensorFlow WideDeepModel API, modern large scale recommenders in production usually have strict latency requirements, and thus, are more sophisticated and require a lot more than just a single API or model. The generated recommendations from these recommenders are typically a result of a complex dance of many individual ML models and components seamlessly working together. Over the years Google has open sourced a suite of TensorFlow-based tools and frameworks, such as TensorFlow Recommenders, which powers all major YouTube and Google Play recommendation surfaces, to help developers create powerful in-house recommendation systems to better serve their users. These tools are based upon Google’s cutting-edge research, extensive engineering experience, and best practices in building large scale recommenders that power a number of Google apps with billions of users.

You can start with the elegant TensorFlow Recommenders library, deploy with TensorFlow Serving, and enhance your system with TensorFlow Ranking and Google’s ScaNN. If you encounter specific challenges such as large embedding tables or user privacy protection, you will find suitable solutions on the new recommendation systems page. And if you want to experiment with more advanced models such as graph neural networks or reinforcement learning, we have listed additional libraries for you as well.
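To give a taste of what that path looks like in practice, here is a minimal sketch of a two-tower retrieval model built with TensorFlow Recommenders. The class name, the vocabularies (user_ids, movie_titles), the candidate dataset (movies), and the feature keys are placeholders you would derive from your own data; they are not part of the page itself.

import tensorflow as tf
import tensorflow_recommenders as tfrs

class RetrievalModel(tfrs.Model):
    """Two-tower retrieval model: one tower embeds users, the other embeds items."""

    def __init__(self, user_ids, movie_titles, movies):
        super().__init__()
        # User tower: map raw string IDs to a 32-dimensional embedding.
        self.user_model = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=user_ids),
            tf.keras.layers.Embedding(len(user_ids) + 1, 32),
        ])
        # Item tower: same idea for the candidate titles.
        self.movie_model = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=movie_titles),
            tf.keras.layers.Embedding(len(movie_titles) + 1, 32),
        ])
        # Retrieval task with top-K metrics computed over the candidate corpus.
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=movies.batch(128).map(self.movie_model)
            )
        )

    def compute_loss(self, features, training=False):
        user_embeddings = self.user_model(features["user_id"])
        movie_embeddings = self.movie_model(features["movie_title"])
        return self.task(user_embeddings, movie_embeddings)

From a starting point like this you could export the trained towers, serve them with TensorFlow Serving, re-rank the retrieved candidates with TensorFlow Ranking, and swap the brute-force top-K lookup for ScaNN as your candidate corpus grows.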

This unified page is now the entry point to building recommendation systems with TensorFlow, and we will keep updating it as more tools and resources become available. We’d love to hear your feedback on this initiative – please don’t hesitate to reach out via the TensorFlow forum.

      Read More

      Unfolding the Universe using TensorFlow

      Unfolding the Universe using TensorFlow

      A guest post by Roberta Duarte, IAG/USP

      Astronomy is the science of trying to answer the Universe’s biggest mysteries. How did the Universe begin? How will it end? What is a black hole? What are galaxies and how did they form? Is life a common piece in the Universe’s puzzle? There are so many questions without answers. Machine learning can be a vital tool to answer those questions and help us unfold the Universe.

Astronomy is one of the oldest sciences. The reason is simple: we just have to look at the sky and start questioning what we are seeing, which is what astronomers have been doing for centuries. Galileo discovered a series of celestial objects after observing the sky through the lenses of the telescopes he built. A few years later, Isaac Newton used Galileo’s contributions to arrive at the Law of Universal Gravitation. With Newton’s results, we could better understand not only how the Sun affects Earth and the other planets, but also why we are bound to Earth’s surface. Centuries later, Edwin Hubble found that galaxies are moving away from us and that more distant galaxies are receding faster than closer ones. Hubble’s findings showed that the Universe is expanding; decades later, observations of distant supernovae revealed that this expansion is accelerating. These are a few examples of how studying the sky can give us answers about the Universe.

What all of these discoveries have in common is that they rest on data recorded from observations. The data can be a star’s luminosity, a planet’s position, or even a galaxy’s distance. As technology improves our observations, more data becomes available to help us understand the Universe around us. Recently, the most advanced space telescope yet, the James Webb Space Telescope (JWST), was launched to study the early Universe in the infrared. JWST is expected to transmit about 57.2 gigabytes of data per day, containing information about early galaxies, exoplanets, and the Universe’s structure.

While this is excellent news for astronomers, it also comes with a high cost – a high computational cost. In 2020, Nature published an article about big data and how Astronomy has entered an era of big data. JWST is one example of how powerful telescopes now produce huge amounts of data every day. The Vera Rubin Observatory is expected to collect 20 terabytes per night. Large Arrays collect petabytes of data every year, and next-generation Large Arrays will collect hundreds of petabytes per year. In 2019, several Astro White Papers were published outlining the goals and obstacles anticipated for the Astronomy field in the 2020s, and how Astronomy needs to change in order to be prepared for the huge volume of data expected during the decade. New methods are required, since traditional ones cannot cope with data at this scale; problems show up in storage, software, and processing.

The storage problem may have a solution in cloud computing (e.g., GCP), as noted by Nature. Processing, however, has no simple solution: the methods used to process and analyze the data need to change. It is important to note that Astronomy is a science built on finding patterns. Stars with the same redshift – an estimate of an object’s distance from us, obtained by measuring how far its light is shifted towards longer wavelengths – and a similar composition can be considered candidates for the same population. Galaxies with the same morphology, activity, or nuclear spectrum usually host black holes with similar behavior. We can even calculate the Universe’s expansion rate by studying the patterns in the spectra of different Type Ia supernovae. And what is the best tool we have for learning patterns from a lot of data? Machine learning.

Machine learning is a tool Astronomy can use to deal with the computational problems cited above. The data-driven approach offered by machine learning techniques can deliver analyses and results faster than traditional methods such as numerical simulations or MCMC – a statistical method for sampling from a probability distribution. In the past few years we have seen an interesting increase in the interaction between Astronomy and machine learning: mentions of the keyword “machine learning” in Astronomy papers increased fourfold from 2015 to 2020, while “deep learning” grew roughly threefold each year. Machine learning has been widely used to classify celestial objects and to predict spectra from given properties, and today we see a large range of applications, from discovering exoplanets and simulating the Universe’s cosmic web to searching for gravitational waves.

Since machine learning offers a data-driven approach, it can accelerate scientific research in the field. An interesting example is the research around black holes. Black holes have been a hot topic for the past few years, with amazing results and pictures from the Event Horizon Telescope (EHT). To understand a black hole, we need the help of computational tools. A black hole is a region of spacetime so extremely curved that nothing, not even light, can escape. When matter gets trapped in its gravitational field, it forms a disk called the accretion disk. The accretion disk’s dynamics are chaotic and turbulent, and to understand its physics we need to simulate complex fluid equations.
A common method to solve these equations and gain insight into black hole physics is numerical simulation. The environment around a black hole can be described by a set of conservation equations – usually mass conservation, energy conservation, and angular momentum conservation. These equations are solved with numerical and mathematical methods that advance each quantity iteratively, one time step at a time. The result is a set of dumps – or frames – containing the density, pressure, velocity field, and magnetic field at each (x, y, t) in the 2D case or (x, y, z, t) in the 3D case. However, numerical simulations are very time-consuming: even a simple hydrodynamical treatment of the flow around a black hole can take up to 7 days running on 400 CPU cores.
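To make the idea concrete, here is a rough sketch of the kind of conservation laws such solvers discretize, written as the compressible Euler equations with a gravitational source term (an illustrative form only, not the exact set used in the paper, which also tracks angular momentum and viscous heating):

\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{v}) = 0
\frac{\partial (\rho \mathbf{v})}{\partial t} + \nabla \cdot \left(\rho \mathbf{v} \otimes \mathbf{v} + P\,\mathbb{I}\right) = -\rho \nabla \Phi
\frac{\partial e}{\partial t} + \nabla \cdot \left[(e + P)\,\mathbf{v}\right] = -\rho\, \mathbf{v} \cdot \nabla \Phi

where \rho is the gas density, \mathbf{v} the velocity field, P the pressure, e the total energy density, and \Phi the black hole’s gravitational potential. Each dump is simply a snapshot of these fields on the simulation grid at one moment in time.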

If you start adding complexity – for example, the electromagnetism equations needed to capture the magnetic fields around a black hole, or the general relativity needed to realistically describe the spacetime there – the runtime increases significantly. We are slowly reaching a barrier in black hole physics where computational limitations make it harder and harder to realistically simulate a black hole.

      Black hole research

That is where my advisor, Rodrigo Nemmen, and I started thinking about a new method to accelerate black hole physics – in other words, a new method that could accelerate the numerical simulations we needed to study these extreme objects. From the beginning, machine learning seemed like the most promising approach for us: we had the data to feed a machine learning algorithm, and there were successful cases in the literature of simulating fluids with machine learning – but never around a black hole. It was worth giving it a shot. We began a collaboration with João Navarro from NVIDIA Brazil and started working on the problem. We carefully chose an architecture to base our own scheme on. Since we wanted a data-driven approach, we went with supervised learning – more specifically, deep learning, leveraging the strong performance of convolutional neural networks.

      How we built it

Everything was built using TensorFlow and Keras. We started with TensorFlow 1, since it was the version available at the time. Back then, Keras had not yet been merged into TensorFlow, but funny enough, during that period I attended the TensorFlow Roadshow 2019 in São Paulo, Brazil. It was at that event that I found out about TensorFlow and Keras joining forces in TensorFlow 2 to create the powerful framework we use today – I even took a picture of the announcement. It was also the first time I heard about the strategy scope introduced in TensorFlow 2; I did not know back then that I would be using that very function today.
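For readers unfamiliar with it, the strategy scope is part of TensorFlow’s tf.distribute API for spreading training across multiple GPUs. Below is a minimal sketch of how it is typically used; build_model and dataset are placeholders standing in for your own model-building code and tf.data pipeline, not pieces of our project.

import tensorflow as tf

# MirroredStrategy replicates the model on every visible GPU and averages
# the gradients across replicas after each step.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope (layers, optimizer slots) are mirrored.
    model = build_model()  # placeholder: returns a tf.keras.Model
    model.compile(optimizer="adam", loss="mse")

# Keras distributes the batches of `dataset` (a tf.data.Dataset) automatically.
model.fit(dataset, epochs=10)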
It took weeks to prepare the data and work out the best way to format it before we could feed it to the ConvNets. The data describe the density of a fluid around a black hole; in our case, the data come from sub-fed black holes, in other words, black holes with low accretion rates. Back in 2019, the simulations we used were the longest of their kind – 2D profiles with a hydrodynamical treatment. The process we went through is described in Duarte et al. 2022. We trained our ConvNet on inputs with 2D spatial dimensions plus a 1D temporal dimension. A cluster with two GPUs (NVIDIA G100 and NVIDIA P6000) was our main hardware for training the neural network.
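Concretely, the preparation boils down to slicing the stack of simulation frames into supervised (past frames → next frame) pairs. Here is a hedged sketch of that windowing step, assuming the dumps have already been loaded into a NumPy array of shape (n_frames, height, width); the function and variable names are illustrative, not taken from our codebase.

import numpy as np
import tensorflow as tf

def make_training_pairs(frames, n_input_frames=1):
    """Slice a (n_frames, H, W) density cube into (input, next-frame) pairs."""
    inputs, targets = [], []
    for t in range(len(frames) - n_input_frames):
        inputs.append(frames[t : t + n_input_frames])  # the past frame(s)
        targets.append(frames[t + n_input_frames])     # the frame to predict
    # Move the time axis into the channel dimension expected by 2D ConvNets.
    x = np.transpose(np.array(inputs), (0, 2, 3, 1))   # (N, H, W, n_input_frames)
    y = np.array(targets)[..., np.newaxis]             # (N, H, W, 1)
    return tf.data.Dataset.from_tensor_slices((x, y)).shuffle(1024).batch(8)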
After a few hours of training, our model was ready to simulate black holes. First, we tested its capacity by checking how well it could reproduce the remainder of a simulation it had partially seen during training. The video shows the target and the prediction for what we call the direct case: we feed a simulation frame to the model as input and analyze how well it predicts the next step.
But we also wanted to see how much of the physics the model could learn just by looking at other simulations, so we tested its capacity to simulate a never-seen system. During training, we hid one simulation from the model. After training, we fed it the initial conditions and a single frame to see how it would perform when simulating on its own. The results are great news: the model can simulate a system having learned the physics only from other ones. And the news gets better: we obtained a 32,000x speed-up compared to traditional methods.
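In that second test the model runs autoregressively: each predicted frame is fed back in as the input for the next step, so errors are never corrected by the ground truth. A minimal sketch of such a rollout is below, where model and initial_frames are placeholders for the trained network and the seed input.

import numpy as np

def rollout(model, initial_frames, n_steps):
    """initial_frames: array of shape (1, H, W, n_input_frames)."""
    current = initial_frames
    predictions = []
    for _ in range(n_steps):
        next_frame = model.predict(current, verbose=0)  # shape (1, H, W, 1)
        predictions.append(next_frame[0, ..., 0])
        # Drop the oldest input frame and append the newly predicted one.
        current = np.concatenate([current[..., 1:], next_frame], axis=-1)
    return np.stack(predictions)  # (n_steps, H, W)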

      Just out of curiosity, we tested a direct prediction from a system where the accretion flow around the black hole has high variability. It’s a really beautiful result to see how the model could follow the turbulent behavior of the accretion flow.

      If you are interested in more details and results, they are available at Duarte et al. 2022.

      This work demonstrates the power of using deep learning techniques in Astronomy to speed up scientific research. All the work was done using only TensorFlow tools to preprocess, train and predict. How great is that?

      Conclusion

As we discussed in this post, AI is already an essential part of Astronomy, and we can expect its role to keep growing. We have already seen that Astronomy can achieve big wins with the help of AI: it is a field with plenty of data and patterns, perfect for building and testing AI tools on real-world problems. There will come a day when AI is discovering and unfolding the Universe, and hopefully that day is soon!

      Read More