What’s new in TensorFlow 2.9?

Posted by Goldie Gadde and Douglas Yarrington for the TensorFlow team

TensorFlow 2.9 has been released! Highlights include performance improvements with oneDNN, and the release of DTensor, a new API for model distribution that can be used to seamlessly move from data parallelism to model parallelism

We’ve also made improvements to the core library, including Eigen and tf.function unification, deterministic behavior, and new support for Windows’ WSL2. Finally, we’re releasing new experimental APIs for tf.function retracing and Keras Optimizers. Let’s take a look at these new and improved features.

Improved CPU performance: oneDNN by default

We have worked with Intel to integrate the oneDNN performance library with TensorFlow to achieve top performance on Intel CPUs. Since TensorFlow 2.5, TensorFlow has had experimental support for oneDNN, which could provide up to a 4x performance improvement. In TensorFlow 2.9, we are turning on oneDNN optimizations by default on Linux x86 packages and for CPUs with neural-network-focused hardware features such as AVX512_VNNI, AVX512_BF16, AMX, and others, which are found on Intel Cascade Lake and newer CPUs.

Users running TensorFlow with oneDNN optimizations enabled might observe slightly different numerical results from when the optimizations are off. This is because floating-point round-off approaches and order differ, and can create slight errors. If this causes issues for you, turn the optimizations off by setting TF_ENABLE_ONEDNN_OPTS=0 before running your TensorFlow programs. To enable or re-enable them, set TF_ENABLE_ONEDNN_OPTS=1 before running your TensorFlow program. To verify that the optimizations are on, look for a message beginning with "oneDNN custom operations are on" in your program log. We welcome feedback on GitHub and the TensorFlow Forum.

Model parallelism with DTensor

DTensor is a new TensorFlow API for distributed model processing that allows models to seamlessly move from data parallelism to single program multiple data (SPMD) based model parallelism, including spatial partitioning. This means you have tools to easily train models where the model weights or inputs are so large they don’t fit on a single device. (If you are familiar with Mesh TensorFlow in TF1, DTensor serves a similar purpose.)

DTensor is designed with the following principles at its core:

  • A device-agnostic API: This allows the same model code to be used on CPU, GPU, or TPU, including models partitioned across device types.
  • Multi-client execution: Removes the coordinator and leaves each task to drive its locally attached devices, allowing scaling a model with no impact to startup time.
  • A global perspective vs. per-replica: Traditionally with TensorFlow, distributed model code is written around replicas, but with DTensor, model code is written from the global perspective and per replica code is generated and run by the DTensor runtime. Among other things, this means no uncertainty about whether batch normalization is happening at the global level or the per replica level.

We have developed several introductory tutorials on DTensor, from DTensor concepts to training DTensor ML models with Keras:

TraceType for tf.function

We have revamped the way tf.function retraces to make it simpler, predictable, and configurable.

All arguments of tf.function are assigned a tf.types.experimental.TraceType. Custom user classes can declare a TraceType using the Tracing Protocol (tf.types.experimental.SupportsTracingProtocol).

The TraceType system makes it easy to understand retracing rules. For example, subtyping rules indicate what type of arguments can be used with particular function traces. Subtyping also explains how different specific shapes are joined into a generic shape that is their supertype, to reduce the number of traces for a function.

To learn more, see the new APIs for tf.types.experimental.TraceType, tf.types.experimental.SupportsTracingProtocol, and the reduce_retracing parameter of tf.function.

Support for WSL2

The Windows Subsystem for Linux lets developers run a Linux environment directly on Windows, without the overhead of a traditional virtual machine or dual boot setup. TensorFlow now supports WSL2 out of the box, including GPU acceleration. Please see the documentation for more details about the requirements and how to install WSL2 on Windows.

Deterministic behavior

The API tf.config.experimental.enable_op_determinism makes TensorFlow ops deterministic.

Determinism means that if you run an op multiple times with the same inputs, the op returns the exact same outputs every time. This is useful for debugging models, and if you train your model from scratch several times with determinism, your model weights will be the same every time. Normally, many ops are non-deterministic due to the use of threads within ops which can add floating-point numbers in a nondeterministic order.

TensorFlow 2.8 introduced an API to make ops deterministic, and TensorFlow 2.9 improved determinism performance in tf.data in some cases. If you want your TensorFlow models to run deterministically, just add the following to the start of your program:

“`

tf.keras.utils.set_random_seed(1)

tf.config.experimental.enable_op_determinism()

“`

The first line sets the random seed for Python, NumPy, and TensorFlow, which is necessary for determinism. The second line makes each TensorFlow op deterministic. Note that determinism in general comes at the expense of lower performance and so your model may run slower when op determinism is enabled.

Optimized Training with Keras

In TensorFlow 2.9, we are releasing a new experimental version of the Keras Optimizer API, tf.keras.optimizers.experimental. The API provides a more unified and expanded catalog of built-in optimizers which can be more easily customized and extended.

In a future release, tf.keras.optimizers.experimental.Optimizer (and subclasses) will replace tf.keras.optimizers.Optimizer (and subclasses), which means that workflows using the legacy Keras optimizer will automatically switch to the new optimizer. The current (legacy) tf.keras.optimizers.* API will still be accessible via tf.keras.optimizers.legacy.*, such as tf.keras.optimizers.legacy.Adam.

Here are some highlights of the new optimizer class:

  • Incrementally faster training for some models.
  • Easier to write customized optimizers.
  • Built-in support for moving average of model weights (“Polyak averaging”).

For most users, you will need to take no action. But, if you have an advanced workflow falling into the following cases, please make corresponding changes:

Use Case 1: You implement a customized optimizer based on the Keras optimizer

For these works, please first check if it is possible to change your dependency to tf.keras.optimizers.experimental.Optimizer. If for any reason you decide to stay with the old optimizer (we discourage it), then you can change your optimizer to tf.keras.optimizers.legacy.Optimizer to avoid being automatically switched to the new optimizer in a later TensorFlow version.

Use Case 2: Your work depends on third-party Keras-based optimizers (such as tensorflow_addons)

Your work should run successfully as long as the library continues to support the specific optimizer. However, if the library maintainers fail to take actions to accommodate the Keras optimizer change, your work would error out. So please stay tuned with the third-party library’s announcement, and file a bug to Keras team if your work is broken due to optimizer malfunction.

Use Case 3: Your work is based on TF1

First of all, please try migrating to TF2. It is worth it, and may be easier than you think! If for any reason migration is not going to happen soon, then please replace your tf.keras.optimizers.XXX to tf.keras.optimizers.legacy.XXX to avoid being automatically switched to the new optimizer.

Use Case 4: Your work has customized gradient aggregation logic

Usually this means you are doing gradients aggregation outside the optimizer, and calling apply_gradients() with experimental_aggregate_gradients=False. We changed the argument name, so please change your optimizer to tf.keras.optimizers.experimental.Optimizer and set skip_gradients_aggregation=True. If it errors out after making this change, please file a bug to Keras team.

Use Case 5: Your work has direct calls to deprecated optimizer public APIs

Please check if your method call has a match here. change your optimizer to tf.keras.optimizers.experimental.Optimizer. If for any reason you want to keep using the old optimizer, change your optimizer to tf.keras.optimizers.legacy.Optimizer.

Next steps

Check out the release notes for more information. To stay up to date, you can read the TensorFlow blog, follow twitter.com/tensorflow, or subscribe to youtube.com/tensorflow. If you’ve built something you’d like to share, please submit it for our Community Spotlight at goo.gle/TFCS. For feedback, please file an issue on GitHub or post to the TensorFlow Forum. Thank you!

Read More