Fast Reduce and Mean in TensorFlow Lite

Posted by Alan Kelly, Software Engineer

We are happy to share that TensorFlow Lite version 2.10 has optimized Reduce (All, Any, Max, Min, Prod, Sum) and Mean operators. These common operators collapse one or more dimensions of a multi-dimensional tensor into a single value per output element. Sum, Product, Min, Max, logical And (All), logical Or (Any), and Mean variants of reduce are available. Reduce is now fast for all possible inputs.

Benchmark for Reduce Mean on Google Pixel 6 Pro Cortex A55 (small core). The input is a 4D tensor of shape [32, 256, 5, 128] reduced over axes [1, 3]; the output is a 2D tensor of shape [32, 5].

Benchmark for Reduce Prod on Google Pixel 6 Pro Cortex A55 (small core). The input is a 4D tensor of shape [32, 256, 5, 128] reduced over axes [1, 3]; the output is a 2D tensor of shape [32, 5].

Benchmark for Reduce Sum on Google Pixel 6 Pro Cortex A55 (small core). The input is a 4D tensor of shape [32, 256, 5, 128] reduced over axes [0, 2]; the output is a 2D tensor of shape [256, 128].


These speed-ups are available by default using the latest version of TFLite on all architectures.

How does this work?

To understand how these improvements were made, we need to look at the problem from a different perspective. Let’s take a 3D tensor of shape [3, 2, 5].

Let’s reduce this tensor over axes [0] using Reduce Max. This will give us an output tensor of shape [2, 5] as dimension 0 will be removed. Each element in the output tensor will contain the max of the three elements in the same position along dimension 0. So the first element will be max{0, 10, 20} = 20. This gives us the following output:

To simplify things, let’s reshape the original 3D tensor as a 2D tensor of shape [3, 10]. This is the exact same tensor, just visualized differently.

Reducing this over dimension 0 by taking the max of each column gives us:

Which we then reshape back to the expected output shape of [2, 5].

This demonstrates how simply changing how we visualize the tensor dramatically simplifies the implementation. In this case, dimensions 1 and 2 are adjacent and not being reduced over. This means that we can fold them into one larger dimension of size 2 x 5 = 10, transforming the 3D tensor into a 2D one. We can do the same to adjacent dimensions which are being reduced over.
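
To make this concrete, here is a minimal NumPy sketch of the same idea (an illustration only, not the TFLite implementation): adjacent dimensions that are all kept, or all reduced over, can be folded together before the reduction runs.

import numpy as np

# A 3D tensor of shape [3, 2, 5], filled like the example above.
x = np.arange(30).reshape(3, 2, 5)

# Reducing over axis 0: dimensions 1 and 2 are adjacent and not reduced,
# so fold them into one dimension of size 2 * 5 = 10.
folded = x.reshape(3, 10)

# The reduction is now a simple column-wise max over a 2D view...
reduced = folded.max(axis=0)

# ...which we reshape back to the expected output shape [2, 5].
out = reduced.reshape(2, 5)

assert np.array_equal(out, x.max(axis=0))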

Let’s take a look at all possible Reduce permutations for the same 3D tensor of shape [3, 2, 5].

Of all 8 permutations, only two 3D permutations remain after we re-visualize the input tensor. For any number of dimensions, there are only two possible reduction permutations: the rows or the columns. All other ones simplify to a lower dimension.

This is the trick to an efficient and simple reduction operator as we no longer need to calculate input and output tensor indices and our memory access patterns are much more cache friendly.
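
In other words, after folding, every reduction boils down to one of two simple loop patterns over a 2D view. The following NumPy sketch (again illustrative, not the actual TFLite kernel) shows both, and why the row-by-row pattern walks memory contiguously:

import numpy as np

x2d = np.arange(30, dtype=np.int32).reshape(3, 10)  # any reduce collapses to a 2D view

# Pattern 1: reduce the rows (outer axis) -> one result per column.
cols = x2d[0].copy()
for row in x2d[1:]:
    np.maximum(cols, row, out=cols)   # contiguous, cache-friendly accesses

# Pattern 2: reduce the columns (inner axis) -> one result per row.
rows = x2d.max(axis=1)

assert np.array_equal(cols, x2d.max(axis=0))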

This also allows the compiler to auto-vectorize the integer reductions. The compiler won’t auto-vectorize floats, as float addition is not associative. You can see the code which removes redundant axes here and the reduction code here.

Changing how we visualize tensors is a powerful code simplification and optimization technique which is used by many TensorFlow Lite operators.

Next steps

We are always working on adding new operators and speeding up existing ones. We’d love to hear about models of yours which have benefited from this work. Get in touch via the TensorFlow Forum. Thanks for reading!

Colab’s ‘Pay As You Go’ Offers More Access to Powerful NVIDIA Compute for Machine Learning

Posted by Chris Perry, Google Colab Product Lead

Google Colab is launching a new paid tier, Pay As You Go, giving anyone the option to purchase additional compute time in Colab with or without a paid subscription. This grants access to Colab’s powerful NVIDIA GPUs and gives you more control over your machine learning environment.

Colab is fully committed to supporting all of our users whether or not they pay for additional compute, and our free-of-charge tier stays in its current form. Today’s announcement reflects additions to paid options only.

Colab helps you accomplish more with machine learning

Google Colab is the easiest way to start machine learning. From the Colab notebooks powering TensorFlow’s tutorials and guides to DeepMind’s AlphaFold example, Colab is helping the world learn ML and share the results broadly, democratizing machine learning.

Colab Pay As You Go further expands the potential for using Colab. Pay As You Go allows anyone to purchase more compute time with Colab, regardless of whether or not they have a monthly subscription. Customers can use this feature to dramatically increase their usage allotments of Colab over what was possible before. Try it out at colab.research.google.com/signup.

Previously, Colab’s paid quota service throttled compute usage to smooth out quota exhaustion over the entire month of a subscription. This was to ensure that a paid user could access Colab compute throughout their month’s subscription: we didn’t want users to fully exhaust their quota on day one and spend the rest of the month frustrated by lack of access to runtimes. With Pay As You Go, we are relaxing this throttling for all paid users (it remains in place for users in our free-of-charge tier).

Paid users now have the flexibility to exhaust compute quota, measured in compute units, at whatever rate they choose. As compute units are exhausted, a user can choose to purchase more with Pay As You Go at their discretion. Once a user has exhausted their compute units their Colab usage quota will revert to our free of charge tier limits.

Increasing your power with NVIDIA GPUs

Paid Colab users can now choose between a standard or premium GPU in Colab, giving you the ability to upgrade your GPU when you need more power. Standard GPUs are typically NVIDIA T4 Tensor Core GPUs, while premium GPUs are typically NVIDIA V100 or A100 Tensor Core GPUs. Getting a specific GPU chip type assignment is not guaranteed and depends on a number of factors, including availability and your paid balance with Colab. If you want guaranteed access to a specific machine configuration, we recommend purchasing a VM on GCP Marketplace.

When you need more power, select premium GPU in your runtime settings: Runtime > Change runtime type > GPU class > Premium. Premium GPUs will deplete your paid balance in Colab faster than standard GPUs.
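
If you want to confirm which GPU a runtime actually received, a quick check from a notebook cell looks something like the sketch below (the exact device names you see will vary; this is an illustration, not part of the Colab product):

import tensorflow as tf

# List the accelerators visible to TensorFlow in this runtime.
gpus = tf.config.list_physical_devices('GPU')
print(gpus)

# Print the detailed device description (e.g. a T4, V100, or A100 name).
if gpus:
    details = tf.config.experimental.get_device_details(gpus[0])
    print(details.get('device_name', 'unknown'))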

Colab is the right choice for ML projects

Colab is the right choice for your machine learning project: TensorFlow and many excellent ML libraries come pre-installed, pre-warmed GPUs are a click away, and sharing your notebook with a collaborator is as easy as sharing a Google doc. Collaborators can access runtimes with GPU accelerators without need for payment. Pay As You Go makes Colab an even more useful product for any ML project you’re looking into.

Automated Deployment of TensorFlow Models with TensorFlow Serving and GitHub Actions

Posted by Chansung Park and Sayak Paul (ML-GDEs)


If you are an applications developer, or if your organization doesn’t have a dedicated ML Engineering team, it is common to deploy a machine learning model without worrying about the end-to-end machine learning pipeline or MLOps. TFX and TensorFlow Serving can help you create the heart of an MLOps infrastructure.

In this post, we will share how we serve a TensorFlow image classification model as RESTful and gRPC based services with TensorFlow Serving on a Kubernetes (k8s) cluster running on Google Kubernetes Engine (GKE) through a set of GitHub Actions workflows. 

Overview

In any GitHub project, you can make releases, with up to 2 GB of assets included in each release when using a free account. This is a good place to manage different versions of machine learning models for various reasons. One can also replace this with a more private component for managing model versions such as Google Cloud Storage buckets. For our purposes, the 2 GB space provided by GitHub Releases will be enough.

Figure 1. Three steps to deploy TF Serving on GKE (original).

The basic idea is to:

  1. Automatically detect a newly released version of a TensorFlow-based ML model in GitHub Releases
  2. Build a custom TensorFlow Serving Docker image containing the released ML model
  3. Deploy it on a k8s cluster running on GKE through a set of GitHub Actions.

The entire workflow can be logically divided into three subtasks, so it’s a good idea to write three separate composite GitHub Actions:

  • First subtask handles the environmental setup
    • GCP Authentication (GCP credentials are injected from the GitHub Action Secret)
    • Install gcloud CLI toolkit to access the GKE cluster for the third subtask
    • Authenticate Docker to push images to the Google Cloud Registry (GCR)
    • Connect to a designated GKE cluster for further accesses
  • Second subtask builds a custom TensorFlow Serving image
    • Download and extract your latest released SavedModel from your GitHub repository
    • Run the official or a custom built TensorFlow Serving docker image
    • Copy the extracted SavedModel into the running TensorFlow Serving docker container
    • Commit the changes of the running container and give the resulting image a new name tagged with the GCR hostname, the GCP project ID, and latest
    • Push the committed image to the GCR
  • Third subtask deploys the custom built TensorFlow Serving image to the GKE cluster
    • Download the Kustomize toolkit to handle overlay configurations
    • Pick one of the scenarios from the various experiments
    • Apply Deployment, Service, and ConfigMap according to the selected experiment to the currently connected GKE cluster
      • ConfigMap is used for batching-enabled scenarios to inject batching configurations dynamically into the Deployment.

There are a number of parameters that you can customize such as the GCP project ID, GKE cluster name, the repository where the ML model will be released, and so on. The full list of parameters can be found here. As noted above, the GCP credentials should be set as a GitHub Action Secret beforehand. If the entire workflow goes without any errors, you will see something similar to the output below.

NAME         TYPE            CLUSTER-IP      EXTERNAL-IP     PORT(S)                            AGE
tfs-server   LoadBalancer    xxxxxxxxxx      xxxxxxxxxx       8500:30869/TCP,8501:31469/TCP      23m

The combinations of the EXTERNAL-IP and the PORT(S) represent endpoints where external users can connect to the TensorFlow Serving pods in the k8s cluster. As you see, two ports are exposed, and 8500 and 8501 are for RESTful and gRPC services respectively. One thing to note is that we used LoadBalancer as the service type, but you may want to consider including Ingress controllers such as GKE Ingress for securing the k8s clusters with SSL/TLS and defining more flexible routing rules in production. You can check out the complete logs from the past runs.
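
As an illustration, a client could call the RESTful endpoint on port 8501 roughly as in the sketch below. The external IP, model name, and input shape are placeholders rather than values from our deployment, and the payload must match your SavedModel’s signature:

import json
import requests  # assumes the requests package is installed

EXTERNAL_IP = "xxx.xxx.xxx.xxx"   # placeholder for the LoadBalancer EXTERNAL-IP
MODEL_NAME = "my_model"           # placeholder for the served model name

url = f"http://{EXTERNAL_IP}:8501/v1/models/{MODEL_NAME}:predict"
payload = {"instances": [[0.0] * 224]}  # shape depends on your model's signature

response = requests.post(url, data=json.dumps(payload))
print(response.json())            # {"predictions": [...]} on success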

Build a Custom TensorFlow Serving Image within a GitHub Action

As described in the overview and the official document, a custom TensorFlow Serving Docker image can be built in five steps. We also provide a notebook for local testing of these steps. In this section, we show how to write a composite GitHub Action for this partial subtask of the whole workflow (note that .inputs, .env, and ${{ }} for the environment variables are omitted for brevity).

First, a model can be downloaded by an external robinraju/release-downloader GitHub Action with custom information about the URL of the GitHub repository and the filename in the list of assets from the latest release. The default filename is saved_model.tar.gz.

Second, the downloaded file should be decompressed to fetch the actual SavedModel that TensorFlow Serving can understand.

runs:
  using: "composite"
  steps:
    - name: Download the latest SavedModel release
      uses: robinraju/release-downloader@v1.3
      with:
        repository: $MODEL_RELEASE_REPO
        fileName: $MODEL_RELEASE_FILE
        latest: true

    - name: Extract the SavedModel
      run: |
        mkdir $MODEL_NAME
        tar -xvf $MODEL_RELEASE_FILE --strip-components=1 --directory $MODEL_NAME

    - name: Run the CPU Optimized TensorFlow Serving container
      run: |
        docker run -d --name serving_base $BASE_IMAGE_TAG

    - name: Copy the SavedModel to the running TensorFlow Serving container
      run: |
        docker cp $MODEL_NAME serving_base:/models/$MODEL_NAME

    - id: push-to-registry
      name: Commit and push the changed running TensorFlow Serving image
      run: |
        export NEW_IMAGE_NAME=tfserving-$MODEL_NAME:latest
        export NEW_IMAGE_TAG=gcr.io/$GCP_PROJECT_ID/$NEW_IMAGE_NAME
        echo "::set-output name=NEW_IMAGE_TAG::$(echo $NEW_IMAGE_TAG)"
        docker commit --change "ENV MODEL_NAME $MODEL_NAME" serving_base $NEW_IMAGE_TAG
        docker push $NEW_IMAGE_TAG

Third, we can modify a running TensorFlow Serving Docker container by placing a custom SavedModel inside. In order to do this, we need to run the base TensorFlow Serving container instantiated either from the official image or a custom-built image. We have used the CPU-optimized version as the base image by compiling from source, and it is publicly available here.

Fourth, the SavedModel should be copied to the /models directory inside the running TensorFlow Serving container. In the last step, we set the MODEL_NAME environment variable to let TensorFlow Serving know which model to expose as services, and commit the two changes that we made to the base image. Finally, the updated TensorFlow Serving Docker image can be pushed into the designated GCR.

Notes on the TensorFlow Serving Parameters

We consider three TensorFlow Serving specific parameters in this post: tensorflow_intra_op_parallelism, tensorflow_inter_op_parallelism, and the batching option. Here, we provide brief overviews of each of them.

Parallelism threads: tensorflow_intra_op_parallelism controls the number of threads to parallelize the execution of an individual operation. tensorflow_inter_op_parallelism controls the number of threads to parallelize the execution of multiple independent operations. To know more, refer to this resource.
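
The same two knobs also exist in regular TensorFlow Python, which can be a convenient way to build intuition for them before tuning the TensorFlow Serving flags. This sketch only illustrates the concept and is not part of the deployment described above:

import tensorflow as tf

# Threads used to parallelize a single op (e.g. one large matmul).
tf.config.threading.set_intra_op_parallelism_threads(8)

# Threads used to run independent ops in parallel.
tf.config.threading.set_inter_op_parallelism_threads(2)

print(tf.config.threading.get_intra_op_parallelism_threads())
print(tf.config.threading.get_inter_op_parallelism_threads())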

Batching: As mentioned above, we can allow TensorFlow Serving to batch requests by setting the enable_batching parameter to True. If we do so, we also need to define the batching configurations for TensorFlow in a separate file (passed via the batching_parameters_file argument). Please refer to this resource for more information about the options we can specify in that file.

Configuring TensorFlow Serving

Once you have a custom TensorFlow Serving Docker image, you can deploy it with the k8s resource objects: Deployment and ConfigMap as shown below. This section shows how to write ConfigMap to write batching configurations and Deployment to add TensorFlow Serving specific runtime options. We also show you how to mount the ConfigMap to inject batching configurations into TensorFlow Serving’s batching_parameters_file option.

apiVersion: apps/v1
kind: Deployment

    spec:
      containers:
      - image: gcr.io/gcp-ml-172005/tfs-resnet-cpu-opt:latest
        name: tfs-k8s
        imagePullPolicy: Always
        args: ["--tensorflow_inter_op_parallelism=2",
               "--tensorflow_intra_op_parallelism=8",
               "--enable_batching=true",
               "--batching_parameters_file=/etc/tfs-config/batching_config.txt"]
        ...
        volumeMounts:
          - mountPath: /etc/tfs-config/batching_config.txt
            subPath: batching_config.txt
            name: tfs-config

The URI of the custom built TensorFlow Serving Docker image can be specified in spec.containers.image, and the behavior of TensorFlow Serving can be customized by providing arguments in the spec.containers.args in the Deployment. This post shows how to configure three kinds of custom behavior: tensorflow_inter_op_parallelism, tensorflow_intra_op_parallelism, and enable_batching.

apiVersion: v1
kind: ConfigMap
metadata:
  name: tfs-config
data:
  batching_config.txt: |
    max_batch_size { value: 128 }
    batch_timeout_micros { value: 0 }
    max_enqueued_batches { value: 2 }
    num_batch_threads { value: 2 }

When enable_batching is set to true, we can further customize the batch inference by defining its specific batching-related configurations in a ConfigMap. Then, the ConfigMap can be mounted as a file with spec.containers.volumeMounts, and we can specify which file to look up for the batching_parameters_file argument in Deployment.

Kustomize to Manage Various Experiments

As you see, there are lots of parameters to determine the behavior of TensorFlow Serving, and the optimal values for them are usually found by running experiments. Indeed, we have experimented with various parameters within a number of different environmental setups: different numbers of nodes, different numbers of vCPU cores, and different RAM capacity.

├── base
|   ├── kustomization.yaml
|   ├── deployment.yaml
|   └── service.yaml
└── experiments
    ├── 2vCPU+4GB+inter_op2
    ...
    ├── 4vCPU+8GB+inter_op2
    ...
    ├── 8vCPU+64GB+inter_op2_w_batch
    |   ├── kustomization.yaml
    |   ├── deployment.yaml
    |   └── tfs-config.yaml
    ...

We used kustomize to manage the YAML files of various experiments. We keep common YAML files of Deployment and Service in the base directory while having specific YAML files for certain experimental environments and configurations under the experiments directory. With this and kustomize, the contents of the base YAML files could be easily overlaid with different numbers of replicas, different values of tensorflow_inter_op_parallelism, tensorflow_intra_op_parallelism, enable_batching, and batch configurations.

runs:
  using: "composite"
  steps:
    - name: Setup Kustomize
      ...

    - name: Deploy to GKE
      working-directory: .kube/
      run: |-
        ./kustomize build experiments/$TARGET_EXPERIMENT | kubectl apply -f -

You can simply select the experiment that you want to test, or that you think is optimal, by setting $TARGET_EXPERIMENT. For example, the best experiment that we found was “8vCPU+16GB+inter_op4”, which means each VM is configured with 8 vCPUs and 16GB of RAM while tensorflow_inter_op_parallelism is set to 4. Then the kustomize build command will provision the YAML files for the selected experiment for the k8s clusters.

Costs

We used the GCP cost estimator for this purpose. Pricing for each experiment configuration assumes the cluster is live for 24 hours per month (which was sufficient for our experiments).


Machine Configuration (E2 series)    Pricing (USD)
2vCPUs, 4GB RAM, 8 Nodes             11.15
4vCPUs, 8GB RAM, 4 Nodes             11.15
8vCPUs, 16GB RAM, 2 Nodes            11.15
8vCPUs, 64GB RAM, 2 Nodes            18.21

Conclusion

In this post, we discussed how to automatically deploy and experiment with an already trained model with various configurations. We leveraged TensorFlow Serving, Kubernetes, and GitHub Actions to streamline the deployment and experiments. We hope that you found this setup useful and reliable and that you will use this in your own model deployment projects.


Acknowledgements

We are grateful to the ML Developer Programs team that provided GCP credits for supporting our experiments. We also thank Hannes Hapke and Robert Crowe for providing us with helpful feedback and guidance.

Bridging communities: TensorFlow Federated (TFF) and OpenMined

Posted by Krzys Ostrowski (Research Scientist), Alex Ingerman (Product Manager), and Hardik Vala (Software Engineer)

Since the announcement of TensorFlow Federated (TFF) on this blog 3.5 years ago, a number of organizations have developed frameworks for Federated Learning (FL). While growing attention to privacy and investments in FL are a welcome trend, one challenge that arises is fragmentation of community and industry efforts, which leads to code duplication and reinvention. One way we can address this as a community is by investing in interoperability mechanisms that could enable our platforms and developers to work together and leverage each other’s strengths.

In this context, we’re excited to announce the collaboration between TFF and OpenMined – an OSS community dedicated to development of privacy-preserving technologies. OpenMined’s PySyft framework has attracted a vibrant community of hundreds of OSS contributors, and includes tools and APIs to facilitate containerized deployment and integrations with diverse data sources that complement the capabilities we offer in TFF.

OpenMined is joining Special Interest Group (SIG) Federated (see the charter, forum, meeting notes, and the Discord server) we’ve recently established to enable developers of TFF, together with a growing set of OSS and industry partners, to openly engage in conversations about how to jointly evolve the TFF ecosystem and grow the adoption of FL.

Introducing PySyTFF

To kick off the collaboration, we – the developers of TFF and OpenMined’s PySyft – decided to focus our initial efforts on building together a new platform, with the endearing name PySyTFF, that combines elements of TFF and PySyft to support what we believe will be an increasingly common scenario, illustrated below.

In this scenario, an owner of a sensitive dataset would like to invite researchers to experiment with training and evaluating ML models on their dataset to advance the current understanding of what model architectures, parameters, etc., work best, while protecting the data and adhering to policies that may govern its use. In practice, such scenarios often end up involving negotiating data usage contracts. On the one hand, these can be tedious to set up, and on the other hand, they largely rely on goodwill.

What we’d like instead is a platform that offers structural safeguards which limit the disclosure of sensitive information and ensure policy compliance by construction – this is our goal for PySyTFF.

As an aside, note that even though this blog post is about FL, we aren’t necessarily talking here about scenarios where data is physically siloed across physical locations – the data can also be hosted in a datacenter and logically siloed. More on this below.

Developer experience

The initial proof-of-concept implementation of PySyTFF offers an early glimpse of what the developer experience for the data scientist will look like. Note how we combine the advantages of both frameworks – e.g., TFF’s ability to define models in Keras, and PySyft’s access control mechanism and APIs for data access:


domain = sy.login(email="sam@stargate.net", password="changethis", port=8081)

model_fn = lambda: tf.keras.models.Sequential(...)

params = {
    'rounds': 10,
    'no_clients': 3,
    'noise_multiplier': 0.05,
    'clients_per_round': 2,
    'train_data_id': domain.datasets[0]['images'].id_at_location.to_string(),
    'label_data_id': domain.datasets[0]['labels'].id_at_location.to_string()
}

model, metrics = sy.tff.train_model(model_fn, params, domain, timeout=5000)

Here, the data scientist logs into a PySyft domain node – an infrastructure component provisioned by or on behalf of the data provider – and gains a limited, access control-guarded ability to enumerate the available resources and perform actions on them. This includes obtaining references to datasets managed by the node and their metadata (but not content), and issuing train_model calls. In these calls, the data scientist can supply a Keras model they wish to train, along with the various parameters that control the training process and affect the privacy guarantees of the computed result, such as the number of rounds or the amount of noise added in order to make the results of the model training more private. In return, the researcher may get computed outputs such as a set of evaluation metrics, or the trained model parameters.

Exactly which ranges of parameters supplied by the researcher are accepted by the platform, and what results the researcher can get back, will in general depend on the policies defined by the data owner. These might, for example, mandate the use of privacy-preserving algorithms and constrain the allowed privacy budget, which in turn may constrain parameters such as the number of training rounds, clients per round, or the noise multiplier. While PySyTFF does not yet offer policy engine integration at the current stage of development, this is an important part of the future development plans.

Under the hood

The domain node is a docker-based environment that bundles together a web-based frontend that you can securely log into, with a mechanism for authenticating and authorizing users, and a set of internal services that includes database connectivity, as illustrated below.

The train_model call in the code snippet above, perhaps embedded in the data scientist’s Python colab notebook, is implemented as a network request, carrying a serialized representation of the TensorFlow code of the model to train, along with the training parameters, and the references to the PySyft datasets to use for training and evaluation.

Inside the domain node, the call is relayed to a PySyTFF service, a new component introduced to the PySyft ecosystem to orchestrate the training process. This involves interacting with PySyft’s data backend to obtain handles to shards of user data, calling TFF APIs to construct TFF computations to run, and passing the constructed TFF computations and data handles to an embedded instance of TFF runtime that loads the data using the supplied handles and runs the FL algorithms.

FL on logically-siloed data

At this point, some of you may be wondering how exactly FL fits into the picture. After all, FL is mostly known as a technology that supports computations on data that’s distributed across a set of devices, or (in what’s called a cross-silo flavor of FL) a set of data centers owned by a group of institutions, yet here, we’re talking about a scenario where the data is already in the customer’s PySyft database.

To explain this, let’s pop up a level and consider the high level objective – to enable researchers to perform ML computations on sensitive data with platform-level, structural and formal privacy guarantees. In order to do so, the platform should ideally uphold formal privacy principles, such as data minimization (a guarantee on how the computation is executed and how sensitive data is handled), and anonymous aggregation (a guarantee on what is being computed and released).

Federated Learning is a great fit in this context because it structurally embodies these principles, and provides a framework for implementing algorithms that provably achieve user-level Differential Privacy (DP) – the current gold standard. The FL algorithms that enable us to achieve these guarantees can be used to process data in datacenter deployments, even in scenarios where – as is the case here with the PySyft database – all of that data resides in a single administrative domain.

To see this, just imagine that for each user in the database, we draw a virtual boundary around all their data, and think of it as a kind of virtual silo. We can treat such virtual silos of user data in the same way as how we treat “client” devices in a more traditional FL setting, and orchestrate FL algorithms to run across virtual silos as clients.

Thus, for example, when training an ML model, we’d repeatedly pick sets of users from the database, locally and independently train local model updates on their data – separately for each user, add clipping to each local update and noise for privacy, aggregate these local updates across users to produce an updated global model, and repeat this process for thousands of rounds until the ML model converges, as shown below.
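
Schematically, and ignoring all of the real-world details that TFF handles for you, one round of that process looks something like the sketch below. This is plain NumPy pseudocode under simplifying assumptions (a vector-valued model, a toy local training step), not the actual TFF algorithm code:

import numpy as np

def local_training_step(model, silo_data):
    # Placeholder for real local training: one gradient-like step on this user's data.
    return model - 0.1 * silo_data.mean(axis=0)

def train_round(global_model, user_silos, clients_per_round=2,
                clip_norm=1.0, noise_multiplier=0.05):
    """One schematic round of federated averaging over virtual user silos."""
    sampled = np.random.choice(len(user_silos), size=clients_per_round, replace=False)
    deltas = []
    for idx in sampled:
        # Train locally and independently on this user's data.
        local_model = local_training_step(global_model, user_silos[idx])
        delta = local_model - global_model
        # Clip each local update to bound any single user's contribution...
        delta *= min(1.0, clip_norm / (np.linalg.norm(delta) + 1e-12))
        deltas.append(delta)
    # ...aggregate across users, and add noise for privacy.
    mean_delta = np.mean(deltas, axis=0)
    noise = np.random.normal(0.0, noise_multiplier * clip_norm / clients_per_round,
                             size=mean_delta.shape)
    return global_model + mean_delta + noise

# Toy usage: 10 "virtual silos", each holding a few 4-dimensional examples.
silos = [np.random.randn(5, 4) for _ in range(10)]
model = np.zeros(4)
for _ in range(100):
    model = train_round(model, silos)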

Whereas the data may be only logically partitioned, following this approach enables us to achieve the very same types of formal guarantees, including provable user-level differential privacy, as those cited above – and indeed, TFF enables us to leverage the same FL algorithm implementation – literally the same TFF code – as that which powers Google’s mobile/IoT production deployments.

Collaborate with us!

As noted earlier, the initial version of PySyTFF is still missing a number of components – and this, dear reader, is where you come in. If the vision laid out above excites you, we – the TFF and PySyft teams – would love to work with you to evolve this platform together. In addition to policy engine integration, we plan to augment PySyTFF with the ability to spawn distributed instances of the TFF runtime on cloud or compute clusters to power very compute-intensive workloads, a system of charging for the use of resources, and to extend the scope of PySyTFF to include classical types of cross-silo FL deployments, to name just a few.

There are a great many ways to go about this – from joining the TFF and PySyft’s collaborative efforts and directly helping us build and deploy this platform, to helping design and build generic components and APIs that can enable TFF and PySyft/PyGrid to interoperate.

Ready to get started? You can visit the SIG Federated forum and join the Discord server, or you can reach out directly – see the contact info in the SIG charter, and the engagement channels created by the OpenMined’s PySyft team. We’re looking forward to hearing from you!

Acknowledgments

On behalf of the TFF team at Google, we’d like to thank our OpenMined partners Andrew Trask, Tudor Cebere, and Teo Milea for the productive collaboration leading up to this announcement.

Optimizing TF, XLA and JAX for LLM Training on NVIDIA GPUs

Posted by Douglas Yarrington (Google TPgM), James Rubin (Google PM), Neal Vaidya (NVIDIA TME), Jay Rodge (NVIDIA PMM)

Together, NVIDIA and Google are delighted to announce new milestones and plans to optimize TensorFlow and JAX for the Ampere and recently announced Hopper GPU architectures by leveraging the power of XLA: a performant, flexible and extensible ML compiler built by Google. We will deepen our ongoing collaboration with dedicated engineering teams focused on delivering improved performance in currently available A100 GPUs. NVIDIA and Google will also jointly support unique features in the recently announced H100 GPU, including the Transformer Engine with support for hardware-accelerated 8-bit floating-point (FP8) data types and the transformer library.

We are announcing improved performance in TensorFlow, new NVIDIA GPU-specific features in XLA and the first release of JAX for multi-node, multi-GPU training, which will significantly improve large language model (LLM) training. We expect the Hopper architecture to be especially popular for LLMs.

NVIDIA H100 Tensor Core GPU

XLA for GPU

Google delivers high performance with LLMs on NVIDIA GPUs because of a notable technology, XLA, which supports all leading ML frameworks, such as TensorFlow, JAX, and PyTorch. Over 90% of Google’s ML compilations, across research and production, happen on XLA. These span the gamut of ML use cases, from ultra-large scale model training at DeepMind and Google Research, to optimized deployments across our products, to edge inferencing at Waymo.

XLA’s deep feature set accelerates large language model performance and is solving most large model challenges seen in the industry today. For example, a feature unique to XLA, SPMD, automates most of the work needed to partition models across multiple cores and devices, making large model training significantly more scalable and performant. XLA can also automatically recognize and select the most optimal hand-written library implementation for your target backend, like cuDNN for CUDA chipsets. Otherwise, XLA can natively generate optimized code for performant execution.

We’ve been collaborating with NVIDIA on several exciting features and integrations that will further optimize LLMs for GPUs. We recently enabled collectives such as all-reduce to run in parallel with compute. This has resulted in a significant reduction in end-to-end latency for customers. Furthermore, we enabled support for bfloat16, which has resulted in compute gains of 4.5x over 32-bit floating point while retaining the same dynamic range of values.
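
For model code, the usual way to opt into bfloat16 compute in Keras is the mixed precision API. The following is a minimal sketch for illustration (it is unrelated to the internal benchmarks mentioned above):

import tensorflow as tf

# Run compute in bfloat16 while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy('mixed_bfloat16')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(32,)),
    tf.keras.layers.Dense(10),
])
print(model.layers[0].compute_dtype)   # bfloat16
print(model.layers[0].variable_dtype)  # float32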

Our joint efforts mean that XLA integrates even more deeply with NVIDIA’s AI tools and can better leverage NVIDIA’s suite of AI hardware optimized libraries. In Q1 2023, we will release an XLA-cuDNN Graph API integration, which provides customers with optimized fusion of convolution/matmul operations and multi-headed attention in transformers for improved use of memory and faster GPU kernel execution. As a result, overheads drop significantly and performance improves notably.

TensorFlow for GPU

TensorFlow recently released distributed tensors (or DTensors) to enable Tensor storage across devices like NVIDIA GPUs while allowing programs to manipulate them seamlessly. The goal of DTensor is to make parallelizing large-scale TensorFlow models across multiple devices easy, understandable, and fast. DTensors are a drop-in replacement for local TensorFlow tensors and scale well to large clusters. In addition, the DTensor project improves the underlying TensorFlow execution and communication primitives, and they are available for use today!
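
As a rough illustration of the DTensor API, here is a small sketch under the assumption of a handful of logical local CPU devices (so it runs anywhere), rather than a real multi-GPU setup:

import tensorflow as tf
from tensorflow.experimental import dtensor

# Split the local CPU into 4 logical devices so the example runs anywhere.
phys = tf.config.list_physical_devices('CPU')[0]
tf.config.set_logical_device_configuration(
    phys, [tf.config.LogicalDeviceConfiguration()] * 4)

# A 1-D mesh named "batch" over those devices.
mesh = dtensor.create_mesh([('batch', 4)], device_type='CPU')

# Shard the first axis of a tensor across the "batch" mesh dimension.
layout = dtensor.Layout(['batch', dtensor.UNSHARDED], mesh)
x = dtensor.call_with_layout(tf.zeros, layout, shape=(8, 16))
print(dtensor.fetch_layout(x))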

We are also collaborating with NVIDIA on several exciting new features in TensorFlow that leverage GPUs, including supporting the new FP8 datatype which should yield a significant improvement in training times for transformer models, when using the Hopper H100 GPU.

JAX for GPU

Google seeks to empower every developer with purpose-built tools for every step of the ML workflow. That includes TensorFlow for robust, production-ready models and JAX with highly optimized capabilities for cutting-edge research. We are pleased to announce the unique collaboration between NVIDIA and Google engineering teams to enhance TensorFlow and JAX for large deep-learning models, like LLMs. Both frameworks fully embrace NVIDIA A100 GPUs, and will support the recently-announced H100 GPUs in the future.

One of the key advantages of JAX is the ease of achieving superior hardware utilization with industry-leading FLOPs across the accelerators. Through our collaboration with NVIDIA, we are translating these advantages to GPU using some XLA compiler magic. Specifically, we are leveraging XLA for operator fusion, improving GSPMD for GPU to support generalized data and model parallelism and optimizing for cross-host NVLink.

Future Plans

NVIDIA and Google are pleased with all the progress shared in this post, and are excited to hear from community members about their experience using TensorFlow and JAX, by leveraging the power of XLA for Ampere (A100) and Hopper (H100) GPUs.

Check out the release notes for more information. To stay up to date, you can read the TensorFlow blog, follow twitter.com/tensorflow, or subscribe to youtube.com/tensorflow. If you’ve built something you’d like to share, please submit it for our Community Spotlight at goo.gle/TFCS. For feedback, please file an issue on GitHub or post to the TensorFlow Forum.

TensorFlow is also available in the NVIDIA GPU Cloud (NGC) as a docker container that contains a validated set of libraries that enable and optimize GPU performance, with a JAX NGC container coming later this year.

Thank you!

Contributors: Frederic Bastien (NVIDIA), Abhishek Ratna (Google), Sean Lee (NVIDIA), Nathan Luehr (NVIDIA), Ayan Moitra (NVIDIA), Yash Katariya (Google), Peter Hawkins (Google), Skye Wanderman-Milne (Google), David Majnemer (Google), Stephan Herhut (Google), George Karpanov (Google), Mahmoud Soliman (NVIDIA), Yuan Lin (NVIDIA), Vartika Singh (NVIDIA), Vinod Grover (NVIDIA), Pooya Jannaty (NVIDIA), Paresh Kharya (NVIDIA), Santosh Bhavani (NVIDIA)

September Machine Learning Updates

Posted by the TensorFlow team

On September 14, at the Google Developers Summit in Shanghai, China, members of Google’s open-source ML teams will be on stage to talk about updates to our growing ecosystem, and we’d love to share them here with you.

MediaPipe Studio

We recognize that creating and productionizing custom on-device ML solutions can be challenging, so we’re reinventing how you develop them by leveraging simple-to-use abstraction APIs and no-code GUIs. We’re excited to give you a sneak peek at MediaPipe Studio, our low-code and no-code solution that gets you from data to modeling to deployment on Android or iOS with native code integration libraries that make it easy to build ML-powered apps.

General Availability of TensorFlow Lite in Google Play Services

We recently launched the general availability of TensorFlow Lite in Google Play services. With this, the TensorFlow Lite runtime is automatically managed and updated by Google Play services, meaning you no longer need to ship it as part of your application. Your apps get smaller, and you get regular updates in the background, so your users will always have the latest version. This is nice for you as an app developer, because your user will get updates and bug fixes to the framework automatically, reducing the burden on you to provide them. And TensorFlow Lite in Google Play Services is production ready, already running over 100 billion daily inferences.

Tensor Projects

At Google, we are creating a world-class family of ML tools across all hardware and device types. Because we are committed to building tools that are fit for purpose, from cutting-edge research to tried-and-true planet-scale deployments, we are sharing our vision of an open ML ecosystem of the future: Tensor Projects.

Tensor Projects is an ecosystem of ML technologies and platforms that bring together Google’s ML tools, and organize efforts across our world-class engineering and research teams. It creates a space and a promise of continued innovation and support to enable researchers, developers, MLOps, and business teams to build responsible and cutting edge ML, from novel model development to scaled production ML in any data center or on any device.

These tools, like TensorFlow, Keras, JAX, and MediaPipe Studio, will work well independently, with each other, and/or with other industry-leading tools and standards. We want to give you full flexibility and choice to build powerful, performant infrastructure for all of your ML use cases. And it’s just the beginning. Tensor Projects will evolve and grow as ML continues to advance. Watch the summary video here:

Updates to Tensorflow.org

We have an updated experience on tensorflow.org for new or advanced users to easily find resources. You can quickly identify the right TensorFlow tool for your task, explore pre-built artifacts for faster model creation, find ideas and inspiration, get involved in the community, discover quick start guides for common scenarios and much more.

PyTorch Foundation

We believe in the power of choice for ML developers and continue to invest resources to make it easy to train, deploy and manage models. Our investment intends to bring machine learning to every developer’s toolbox and covers a broad spectrum of offerings: from TensorFlow and Keras, which provide free and open source offerings to millions of developers, allowing them to succeed with ML, and to JAX, which empowers researchers across Alphabet.

Additionally, in the spirit of openness, we support PyTorch developers with Cloud TPU using XLA. To continue to help all developers succeed with Google Cloud, and to better position Google to make meaningful contributions to the community, we’re delighted to announce our role as a founding member of the newly formed PyTorch Foundation. As a member of the board, we will deepen our open source investment to deliver on the Foundation’s mission to drive the adoption of AI and ML through open source platforms.

Thank you for reading! To stay up to date, you can read the TensorFlow blog, follow twitter.com/tensorflow, or subscribe to youtube.com/tensorflow.

Content moderation using machine learning: the server-side part

Posted by Jen Person, Senior Developer Relations Engineer, TensorFlow

Welcome to part 2 of my dual approach to content moderation! In this post, I show you how to implement content moderation using machine learning in a server-side environment. If you’d like to see how to implement this moderation client-side, check out part 1.

Remind me: what are we doing here again?

In short, anonymity can create some distance between people in a way that allows them to say things they wouldn’t say in person. That is to say, there are tons of trolls out there. And let’s be honest: we’ve all typed something online we wouldn’t actually say IRL at least once! Any website that takes public text input can benefit from some form of moderation. Client-side moderation has the benefit of instant feedback, but server-side moderation cannot be bypassed like client-side might, so I like to have both.

This project picks up where part 1 left off, but you can also start here with a fresh copy of the Firebase Text Moderation demo code. The website in the Firebase demo showcases content moderation through a basic guestbook using a server-side content moderation system implemented through a Realtime Database-triggered Cloud Function. This means that the guestbook data is stored in the Firebase Realtime Database, a NoSQL database. The Cloud Function is triggered whenever data is written to a certain area of the database. We can choose what code runs when that event is triggered. In our case, we will use the Text Toxicity Classifier model to determine if the text written to the database is inappropriate, and then remove it from the database if needed. With this model, you can evaluate text on different labels of unwanted content, including identity attacks, insults, and obscenity. You can try out the demo to see the classifier in action.

If you prefer to start at the end, you can follow along in a completed version of the project on GitHub.

Server-side moderation

The Firebase text moderation example I used as my starting point doesn’t include any machine learning. Instead, it checks for the presence of profanity from a list of words and then replaces them with asterisks using the bad-words npm package. I thought about blending this approach with machine learning (more on that later), but I decided to just wipe the slate clean and replace the code of the Cloud Function altogether. Start by navigating to the Cloud Functions folder of the Text Moderation example:

cd textmoderation/functions

Open index.js and delete its contents. In index.js, add the following code:

const functions = require('firebase-functions');
const toxicity = require('@tensorflow-models/toxicity');

exports.moderator = functions.database.ref('/messages/{messageId}').onCreate(async (snapshot, context) => {
  const message = snapshot.val();

  // Verify that the snapshot has a value
  if (!message) {
    return;
  }
  functions.logger.log('Retrieved message content: ', message);

  // Run moderation checks on the message and delete if needed.
  const moderateResult = await moderateMessage(message.text);
  functions.logger.log(
    'Message has been moderated. Does message violate rules? ',
    moderateResult
  );
});

This code runs any time a message is added to the database. It gets the text of the message, and then passes it to a function called moderateMessage. If you’re interested in learning more about Cloud Functions and the Realtime Database, then check out the Firebase documentation.

Add the Text Toxicity Classifier model

Depending on your development environment, you probably have some sort of error now since we haven’t actually written a function called moderateMessage yet. Let’s fix that. Below your Cloud Function trigger function, add the following code:

exports.moderator = functions.database.ref('/messages/{messageId}').onCreate(async (snapshot, context) => {
  // ...
  // Your other function code is here.
});

async function moderateMessage(message) {
  const threshold = 0.9;

  let model = await toxicity.load(threshold);

  const messages = [message];

  let predictions = await model.classify(messages);

  for (let item of predictions) {
    for (let i in item.results) {
      if (item.results[i].match === true) {
        return true;
      }
    }
  }
  return false;
}

This function does the following:

  1. Sets the threshold for the model to 0.9. The threshold of the model is the minimum prediction confidence you want to use to set the model’s predictions to true or false; that is, how confident the model is that the text does or does not contain the given type of toxic content. The scale for the threshold is 0 to 1.0. In this case, I set the threshold to 0.9, which means the model will predict true or false only if it is at least 90% confident in its findings.
  2. Loads the model, passing the threshold. Once loaded, the model is stored in the model variable.
  3. Puts the message into an array called messages, as an array is the object type that the classify function accepts.
  4. Calls classify on the messages array.
  5. Iterates through the prediction results. predictions is an array of objects, each representing a different language label. You may want to know about only specific labels rather than iterating through them all. For example, if your use case is a website for hosting the transcripts of rap battles, you probably don’t want to detect and remove insults.
  6. Checks if the content is a match for that label. If the match value is true, then the model has detected the given type of unwanted language. If the unwanted language is detected, the function returns true. There’s no need to keep checking the rest of the results, since the content has already been deemed inappropriate.
  7. If the function iterates through all the results and no label match is set to true, then the function returns false – meaning no undesirable language was found. The match label can also be null. In that case, its value isn’t true, so it’s considered acceptable language. I will talk more about the null option in a future post.
If you completed part 1 of this tutorial, then these steps probably sound familiar. The server-side code is very similar to the client-side code. This is one of the things that I like about TensorFlow.js: it’s often straightforward to transition code from the client to server and vice versa.

Complete the Cloud Functions code

Back in your Cloud Function, you now know that based on the code we wrote for moderateMessage, the value of moderateResult will be true or false: true if the message is considered toxic by the model, and false if it does not detect toxicity with certainty greater than 90%. Now add code to delete the message from the database if it is deemed toxic:

  // Run moderation checks on the message and delete if needed.
  const moderateResult = await moderateMessage(message.text);
  functions.logger.log(
    'Message has been moderated. Does message violate rules? ',
    moderateResult
  );

  if (moderateResult === true) {
    var modRef = snapshot.ref;
    try {
      await modRef.remove();
    } catch (error) {
      functions.logger.error('Remove failed: ' + error.message);
    }
  }

This code does the following:

  1. Checks if moderateResult is true, meaning that the message written to the guestbook is inappropriate.
  2. If the value is true, it removes the data from the database using the remove function from the Realtime Database SDK.
  3. Logs an error if one occurs.

Deploy the code

To deploy the Cloud Function, you can use the Firebase CLI. If you don’t have it, you can install it using the following npm command:

npm install -g firebase-tools

Once installed, use the following command to log in:

firebase login

Run this command to connect the app to your Firebase project:

firebase use --add

From here, you can select your project in the list, connect Firebase to an existing Google Cloud project, or create a new Firebase project.

Once the project is configured, use the following command to deploy your Cloud Function:

firebase deploy

Once deployment is complete, the logs include the link to your hosted guestbook. Write some guestbook entries. If you followed part 1 of the blog, you will need to either delete the moderation code from the website and deploy again, or manually add guestbook entries to the Realtime Database in the Firebase console.

You can view your Cloud Functions logs in the Firebase console.

Building on the example

I have a bunch of ideas for ways to build on this example. Here are just a few. Let me know which ideas you would like to see me build, and share your suggestions as well! The best ideas come from collaboration.

Get a queue

I mentioned that the “match” value of a language label can be true, false, or null without going into detail on the significance of the null value. If the label is null, then the model cannot determine if the language is toxic within the given threshold. One way to limit the number of null values is to lower this threshold. For example, if you change the threshold value to 0.8, then the model will label the match value as true if it is at least 80% certain that the text contains language that fits the label. My website example assigns labels of value null the same as those labeled false, allowing that text through the filter. But since the model isn’t sure if that text is appropriate, it’s probably a good idea to get some eyes on it. You could add these posts to a queue for review, and then approve or deny them as needed. I said “you” here, but I guess I mean “me”. If you think this would be an interesting use case to explore, let me know! I’m happy to write about it if it would be useful.

What’s in ‘store

The Firebase moderation sample that I used as the foundation of my project uses Realtime Database. I prefer to use Firestore because of its structure, scalability, and security. Firestore’s structure is well suited for implementing a queue because I could have a collection of posts to review within the collection of posts. If you’d like to see the website using Firestore, let me know.

Don’t just eliminate – moderate!

One of the things I like about the original Firebase moderation sample is that it sanitizes the text rather than just deleting the post. You could run text through the sanitizer before checking for toxic language through the text toxicity model. If the sanitized text is deemed appropriate, then it could overwrite the original text. If it still doesn’t meet the standards of decent discourse, then you could still delete it. This might save some posts from otherwise being deleted.

What’s in a name?

You’ve probably noticed that my moderation functionality doesn’t extend to the name field. This means that even a halfway-clever troll could easily get around the filter by cramming all of their expletives into that name field. That’s a good point and I trust that you will use some type of moderation on all fields that users interact with. Perhaps you use an authentication method to identify users so they aren’t provided a field for their name. Anyway, you get it: I didn’t add moderation to the name field, but in a production environment, you definitely want moderation on all fields.

Build a better fit

When you test out real-world text samples on your website, you might find that the text toxicity classifier model doesn’t quite fit your needs. Since each social space is unique, there will be specific language that you are looking to include and exclude. You can address these needs by training the model on new data that you provide.

If you enjoyed this article and would like to learn more about TensorFlow.js, then there are a ton of things you can do.

Announcing TensorFlow Official Build Collaborators

Posted by Rostam Dinyari, Nitin Srinivasan, Douglas Yarrington and Rishika Sinha of the TensorFlow team

Starting with TensorFlow 2.10, we are excited to announce our collaboration with Intel, AWS, ARM, and Linaro to develop official TensorFlow builds. This means that when you pip install TensorFlow on Windows Native and Linux Aarch64 hosts, you will receive a build of TensorFlow that has been reviewed and vetted by these platform experts. This happens transparently, and there are no changes to your workflow. We’ve updated the pip install scripts so it’s automatic for you.

Official builds are TensorFlow releases that follow the rigorous functional and performance testing standards that Google engineers and our collaborators publish with each release, and that align with our published support expectations under the SIG Build forum. Collaborators monitor the builds daily and publish artifacts to the community in coordination with the overall TensorFlow release schedule.

For the majority of use cases, there will be no changes to the behavior of pip install or pip uninstall TensorFlow. However, for Windows Native and Linux Aarch64 based systems an additional pip uninstall step may be needed. You can find details about install, uninstall and other best practices on tensorflow.org/install/pip.

Over time, we expect the number of collaborators to expand but for now we want to share with you the progress we have made together to release increasingly performant and robust builds for these important platforms. You can learn more about each of the collaborations below.

Intel Collaboration

We are pleased to share that Intel has joined the 3P Official Build program to take ownership over Windows Native CPU builds. This will include responsibility for managing both nightly and final production releases. We and Intel do not expect this to disrupt end user experiences; users simply install TensorFlow as usual and the Intel produced Python binary artifacts (wheel files) will be correctly installed.

AWS, ARM and Linaro Collaboration

We are especially pleased to announce the availability of official builds for ARM Aarch64, specifically tuned for AWS Graviton instances. Together, the experts at Linaro have supported Google, AWS and ARM to ensure a highly performant version of TensorFlow is available on the emerging class of Aarch64 devices.

Next steps

These changes should be transparent for most users. You can learn more at tensorflow.org/install.

Announcing TensorFlow Lite in Google Play Services General Availability

Posted by Bernhard Bauer and Terry Heo, Software Engineers, Google

Today we’re excited to announce that the Google Play services API for TensorFlow Lite is generally available on Android devices. We recommend this distribution as the path to adding custom machine learning to your apps. Last year, we launched a public beta of TensorFlow Lite in Google Play services at Google I/O. Since then, we’ve received lots of feedback and made improvements to the API. Most recently, we added the GPU delegate and Task Library support. Today we’re moving from beta to general availability on billions of Android devices globally.

TensorFlow Lite in Google Play services is already used by Google teams, including ML Kit, serving over a billion monthly active users and running more than 100 billion daily inferences.

TensorFlow Lite is an inference runtime optimized for mobile devices, and now that it’s part of Google Play services, it helps you deliver better ML experiences because it:

  • Reduces your app size by up to 5 MB compared to statically bundling TensorFlow Lite with your app
  • Uses the same API as available when bundling TF Lite into your app
  • Receives regular performance updates in the background so it’s always getting better automatically

Get started by learning how to add TensorFlow Lite in Google Play Services to your Android app.

What’s new in TensorFlow 2.10?

Posted by the TensorFlow Team

TensorFlow 2.10 has been released! Highlights of this release include user-friendly features in Keras to help you develop transformers, deterministic and stateless initializers, updates to the optimizers API, and new tools to help you load audio data. We’ve also made performance enhancements with oneDNN, expanded GPU support on Windows, and more. This release also marks TensorFlow Decision Forests 1.0! Read on to learn more.

Keras

Expanded, unified mask support for Keras attention layers

Starting from TensorFlow 2.10, mask handling for Keras attention layers, such as tf.keras.layers.Attention, tf.keras.layers.AdditiveAttention, and tf.keras.layers.MultiHeadAttention, has been expanded and unified. In particular, we’ve added two features:

Causal attention: All three layers now support a use_causal_mask argument to call (Attention and AdditiveAttention used to take a causal argument to __init__).

Implicit masking: Keras Attention, AdditiveAttention, and MultiHeadAttention layers now support implicit masking (set mask_zero=True in tf.keras.layers.Embedding).

Combined, this simplifies the implementation of any Transformer-style model since getting the masking right is often a tricky part.

A basic Transformer self-attention block can now be written as:

import tensorflow as tf

embedding = tf.keras.layers.Embedding(
    input_dim=10,
    output_dim=3,
    mask_zero=True)  # Infer a correct padding mask.

# Instantiate a Keras multi-head attention (MHA) layer,
# a layer normalization layer, and an `Add` layer object.
mha = tf.keras.layers.MultiHeadAttention(key_dim=4, num_heads=1)
layernorm = tf.keras.layers.LayerNormalization()
add = tf.keras.layers.Add()

# Test input.
x = tf.constant([[1, 2, 3, 4, 5, 0, 0, 0, 0],
                 [1, 2, 1, 0, 0, 0, 0, 0, 0]])
# The embedding layer sets the mask.
x = embedding(x)

# The MHA layer uses and propagates the mask.
a = mha(query=x, key=x, value=x, use_causal_mask=True)
x = add([x, a])  # The `Add` layer propagates the mask.
x = layernorm(x)

# The mask made it through all layers.
print(x._keras_mask)

And here’s the output: 

tf.Tensor(
[[ True  True  True  True  True False False False False]
 [ True  True  True False False False False False False]], shape=(2, 9), dtype=bool)

Try out the new Keras Optimizers API

In the previous release, TensorFlow 2.9, we published a new version of the Keras Optimizer API, in tf.keras.optimizers.experimental, which will replace the current tf.keras.optimizers namespace in TensorFlow 2.11. To prepare for the upcoming formal switch of the optimizer namespace to the new API, we’ve also exported all of the current Keras optimizers under tf.keras.optimizers.legacy in TensorFlow 2.10.

Most users won’t be affected by this change, but please check the API doc to see if any API used in your workflow has changed. If you decide to keep using the old optimizer, please explicitly switch your optimizer to the corresponding tf.keras.optimizers.legacy.Optimizer.
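For example, here is a minimal sketch of explicitly opting in to the legacy implementation; the model and hyperparameters are just placeholders:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])

# Keep the pre-2.11 optimizer implementation by using the legacy namespace.
model.compile(
    optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=1e-3),
    loss="mse")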

You can also find more details about new Keras Optimizers in this article.

Deterministic and Stateless Keras initializers

In TensorFlow 2.10, we’ve made Keras initializers (the tf.keras.initializers API) stateless and deterministic, built on top of stateless TF random ops. Starting in TensorFlow 2.10, both seeded and unseeded Keras initializers will always generate the same values every time they are called (for a given variable shape). The stateless initializer enables Keras to support new features such as multi-client model training with DTensor.

init = tf.keras.initializers.RandomNormal()
a = init((3, 2))
b = init((3, 2))
# a == b

init_2 = tf.keras.initializers.RandomNormal(seed=1)
c = init_2((3, 2))
d = init_2((3, 2))
# c == d
# a != c

init_3 = tf.keras.initializers.RandomNormal(seed=1)
e = init_3((3, 2))
# e == c

init_4 = tf.keras.initializers.RandomNormal()
f = init_4((3, 2))
# f != a

For unseeded initializers (seed=None), a random seed will be created and assigned at initializer creation (different initializer instances get different seeds). An unseeded initializer will raise a warning if it is reused (called) multiple times. This is because it would produce the same values each time, which may not be intended.

BackupAndRestore checkpoints with step level granularity

In the previous release, TensorFlow 2.9, the tf.keras.callbacks.BackupAndRestore Keras callback would back up the model and training state at epoch boundaries. In TensorFlow 2.10, the callback can also back up the model every N training steps. However, keep in mind that when BackupAndRestore is used with tf.distribute.MultiWorkerMirroredStrategy, the distributed dataset iterator state will be reinitialized and won’t be restored when restoring the model. More information and code examples can be found in the migrate the fault tolerance mechanism guide.
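A minimal sketch of step-level backups, assuming the frequency is controlled by the callback’s save_freq argument; the model, data and backup path below are placeholders:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")

# Back up the model and training state every 100 batches instead of only
# at epoch boundaries; save_freq also accepts 'epoch'.
backup = tf.keras.callbacks.BackupAndRestore(
    backup_dir="/tmp/backup", save_freq=100)

x = tf.random.normal((1024, 8))
y = tf.random.normal((1024, 1))
model.fit(x, y, epochs=3, callbacks=[backup])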

Easily generate an audio classification dataset from a directory of audio files

You can now use a new utility, tf.keras.utils.audio_dataset_from_directory, to easily generate audio classification datasets from directories of .wav files. Just sort your audio files into one directory per class, and a single line of code will get you a labeled tf.data.Dataset that you can pass to a Keras model. You can find an example here.
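A minimal sketch, assuming a hypothetical directory layout of data/audio/<class_name>/*.wav:

import tensorflow as tf

# Each subdirectory of data/audio is treated as one class label.
train_ds = tf.keras.utils.audio_dataset_from_directory(
    "data/audio",
    batch_size=32,
    output_sequence_length=16000)  # pad or trim every clip to a fixed length

for audio_batch, label_batch in train_ds.take(1):
    print(audio_batch.shape, label_batch.shape)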

The EinsumDense layer is no longer experimental

The einsum function is the swiss army knife of linear algebra. It can efficiently and explicitly describe a wide variety of operations. The tf.keras.layers.EinsumDense layer brings some of that power to Keras.

Operations like einsum, einops.rearrange, and the EinsumDense layer operate based on a string “equation” that describes the axes of the inputs and outputs. For EinsumDense, the equation lists the axes of the input argument, the axes of the weights, and the axes of the output. A basic Dense layer can be written as:

dense = keras.layers.Dense(units=10, activation='relu')

dense = keras.layers.EinsumDense('...i,ij->...j', output_shape=(10,), activation='relu')

Notes:

  • ...i – This only works on the last axis of the input; that axis is called i.
  • ij – The weights are a matrix with shape (i, j).
  • ...j – The result sums out the i axis and leaves j.

For example, here is a stack of 5 Dense layers with 10 units each:

dense = keras.layers.EinsumDense('...i,nij->...nj', output_shape=(5, 10))

Here is a stack of Dense layers, where each one operates on a different input vector:

dense = keras.layers.EinsumDense('...ni,nij->...nj', output_shape=(5, 10))

Here is a stack of Dense layers where each one operates on each input vector independently:

dense = keras.layers.EinsumDense('...ni,mij->...nmj', output_shape=(None, 5, 10))
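As a quick sanity check, here is a sketch (with illustrative shapes) of the first stacked variant above; the input size of 7 is an arbitrary choice:

import tensorflow as tf

# "Stack of 5 Dense layers with 10 units each": for an input whose last
# axis is 7, the layer builds a weight tensor of shape (5, 7, 10).
dense = tf.keras.layers.EinsumDense('...i,nij->...nj', output_shape=(5, 10))
x = tf.random.normal((2, 7))   # a batch of 2 input vectors of size 7
print(dense(x).shape)          # (2, 5, 10)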

Performance and collaborations

Improved aarch64 CPU performance: ACL/oneDNN integration

We have worked with Arm, AWS, and Linaro to integrate Compute Library for the Arm® Architecture (ACL) with TensorFlow through oneDNN to accelerate performance on aarch64 CPUs. Starting with TensorFlow 2.10, you can try these experimental optimizations by setting the environment variable TF_ENABLE_ONEDNN_OPTS=1 before running your TensorFlow program.
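For example, here is one way to set the flag from Python, assuming the variable is set before TensorFlow is imported so that it is picked up at start-up:

import os

# Enable the experimental oneDNN/ACL optimizations.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

import tensorflow as tf

print(tf.__version__)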

There may be slightly different numerical results due to different computation and floating-point round-off approaches. If this causes issues for you, turn the optimizations off by setting TF_ENABLE_ONEDNN_OPTS=0 before running your program.

To verify that the optimizations are on, look for a message beginning with “oneDNN custom operations are on” in your program log. We welcome feedback on GitHub and the TensorFlow Forum.

Expanded GPU support on Windows

TensorFlow can now leverage a wider range of GPUs on Windows through the TensorFlow-DirectML plug-in. To enable model training on DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm, install the plug-in alongside standard TensorFlow CPU packages on native Windows or WSL2. The preview package currently supports a limited number of basic machine learning models, with a goal to increase model coverage in the future. You can view the open-source code and leave feedback at the TensorFlow-DirectML GitHub repository.

New features in tf.data

Create tf.data Dataset from lists of elements

TensorFlow 2.10 introduces a convenient new experimental API, tf.data.experimental.from_list, which creates a tf.data.Dataset from a given list of elements. The returned dataset produces the items in the list one by one. The functionality is identical to tf.data.Dataset.from_tensor_slices when elements are scalars, but different when elements have structure.

Consider the following example:

dataset = tf.data.experimental.from_list([(1, 'a'), (2, 'b'), (3, 'c')])
list(dataset.as_numpy_iterator())
[(1, 'a'), (2, 'b'), (3, 'c')]

In contrast, to get the same output with `from_tensor_slices`, the data needs to be reorganized:

dataset = tf.data.Dataset.from_tensor_slices(([1, 2, 3], ['a', 'b', 'c']))
list(dataset.as_numpy_iterator())
[(1, 'a'), (2, 'b'), (3, 'c')]

Unlike the from_tensor_slices method, from_list supports non-rectangular input (achieving the same with from_tensor_slices requires the use of ragged tensors).
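For instance, a small sketch of non-rectangular input, assuming from_list can infer a common element spec for rows of different lengths:

import tensorflow as tf

# Rows of different lengths, no ragged tensors required.
dataset = tf.data.experimental.from_list([[1, 2, 3], [4, 5]])
for element in dataset:
    print(element.numpy())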

Sharing tf.data service with concurrent trainers

If you run multiple trainers concurrently using the same training data, it could save resources to cache the data in one tf.data service cluster and share the cluster with the trainers. For example, if you use Vizier to tune hyperparameters, the Vizier jobs can run concurrently and share one tf.data service cluster.

To enable this feature, each trainer needs to generate a unique trainer ID and pass that ID to tf.data.experimental.service.distribute. Once a job has consumed the data, the data remains in the cache and is re-used by jobs with different trainer_ids. Requests with the same trainer_id do not re-use data. For example:

dataset = expensive_computation()
dataset = dataset.apply(tf.data.experimental.service.distribute(
    processing_mode=tf.data.experimental.service.ShardingPolicy.OFF,
    service=FLAGS.tf_data_service_address,
    job_name="job",
    cross_trainer_cache=tf.data.experimental.service.CrossTrainerCache(
        trainer_id=trainer_id())))

The tf.data service uses a sliding-window cache to store shared data. When one trainer consumes data, the data remains in the cache. When other trainers need data, they can get it from the cache instead of repeating the expensive computation. The cache has a bounded size, so some workers may not read the full dataset. To ensure that all trainers get sufficient training data, we require the input dataset to be infinite. This can be achieved, for example, by repeating the dataset and performing random augmentation on the training instances.

TensorFlow Decision Forests 1.0

In conjunction with the release of TensorFlow 2.10, TensorFlow Decision Forests (TF-DF) reaches version 1.0. With this milestone we want to communicate more broadly that TensorFlow Decision Forests has become a more stable and mature library. We’ve improved our documentation and established more comprehensive testing to make sure that TF-DF is ready for professional environments.

The new release of TF-DF also offers a first look at the JavaScript and Go APIs for inference of TF-DF models. While these APIs are still in beta, we are actively looking for feedback on them. TF-DF 1.0 improves the performance of oblique splits. Oblique splits allow decision trees to express more complex patterns by conditioning on multiple features at the same time – learn more in our Decision Forests class on developers.google.com. Benchmarks and real-world observations show that oblique splits outperform classical axis-aligned splits on the majority of datasets. Finally, the new release includes our latest bug fixes.

Next steps

Check out the release notes for more information. To stay up to date, you can read the TensorFlow blog, follow twitter.com/tensorflow, or subscribe to youtube.com/tensorflow. If you’ve built something you’d like to share, please submit it for our Community Spotlight at goo.gle/TFCS. For feedback, please file an issue on GitHub or post to the TensorFlow Forum. Thank you!
