Identifying and eliminating bugs in learned predictive models

One in a series of posts explaining the theories underpinning our research. Bugs and software have gone hand in hand since the beginning of computer programming. Over time, software developers have established a set of best practices for testing and debugging before deployment, but these practices are not suited for modern deep learning systems. Today, the prevailing practice in machine learning is to train a system on a training data set, and then test it on another set. While this reveals the average-case performance of models, it is also crucial to ensure robustness, or acceptably high performance even in the worst case. In this article, we describe three approaches for rigorously identifying and eliminating bugs in learned predictive models: adversarial testing, robust learning, and formal verification.Machine learning systems are not robust by default. Even systems that outperform humans in a particular domain can fail at solving simple problems if subtle differences are introduced. For example, consider the problem of image perturbations: a neural network that can classify images better than a human can be easily fooled into believing that sloth is a race car if a small amount of carefully calculated noise is added to the input image.Read More

TF-Replicator: Distributed Machine Learning for Researchers

At DeepMind, the Research Platform Team builds infrastructure to empower and accelerate our AI research. Today, we are excited to share how we developed TF-Replicator, a software library that helps researchers deploy their TensorFlow models on GPUs and Cloud TPUs with minimal effort and no previous experience with distributed systems. TF-Replicators programming model has now been open sourced as part of TensorFlows tf.distribute.Strategy. This blog post gives an overview of the ideas and technical challenges underlying TF-Replicator. For a more comprehensive description, please read our arXiv paper.A recurring theme in recent AI breakthroughs – from AlphaFold to BigGAN to AlphaStar – is the need for effortless and reliable scalability. Increasing amounts of computational capacity allow researchers to train ever-larger neural networks with new capabilities. To address this, the Research Platform Team developed TF-Replicator, which allows researchers to target different hardware accelerators for Machine Learning, scale up workloads to many devices, and seamlessly switch between different types of accelerators.Read More