Going beyond average for reinforcement learning

Consider the commuter who toils backwards and forwards each day on a train. Most mornings, her train runs on time and she reaches her first meeting relaxed and ready. But she knows that once in a while the unexpected happens: a mechanical problem, a signal failure, or even just a particularly rainy day. Invariably these hiccups disrupt her pattern, leaving her late and flustered. Randomness is something we encounter every day, and it has a profound effect on how we experience the world. The same is true in reinforcement learning (RL) applications, systems that learn by trial and error and are motivated by rewards. Typically, an RL algorithm predicts the average reward it receives from multiple attempts at a task, and uses this prediction to decide how to act. But random perturbations in the environment can alter its behaviour by changing the exact amount of reward the system receives. In a new paper, we show it is possible to model not only the average but also the full variation of this reward, what we call the value distribution.

Agents that imagine and plan

Imagining the consequences of your actions before you take them is a powerful tool of human cognition. When placing a glass on the edge of a table, for example, we will likely pause to consider how stable it is and whether it might fall. On the basis of that imagined consequence we might readjust the glass to prevent it from falling and breaking. This form of deliberative reasoning is essentially imagination; it is a distinctly human ability and a crucial tool in our everyday lives.

Imagine this: Creating new visual concepts by recombining familiar ones

Around two and a half thousand years ago a Mesopotamian trader gathered some clay, wood and reeds and changed humanity forever. Over time, their abacus would allow traders to keep track of goods and reconcile their finances, allowing economics to flourish. But that moment of inspiration also shines a light on another astonishing human ability: our ability to recombine existing concepts and imagine something entirely new. The unknown inventor would have had to think of the problem they wanted to solve, the contraption they could build and the raw materials they could gather to create it. Clay could be moulded into a tablet, a stick could be used to scratch the columns and reeds could act as counters. Each component was familiar and distinct, but put together in this new way, they formed something revolutionary. This idea of compositionality is at the core of human abilities such as creativity, imagination and language-based communication. Equipped with just a small number of familiar conceptual building blocks, we are able to create a vast number of new ones on the fly.

Producing flexible behaviours in simulated environments

The agility and flexibility of a monkey swinging through the trees or a football player dodging opponents and scoring a goal can be breathtaking. Mastering this kind of sophisticated motor control is a hallmark of physical intelligence, and is a crucial part of AI research. True motor intelligence requires learning how to control and coordinate a flexible body to solve tasks in a range of complex environments. Existing attempts to control physically simulated humanoid bodies come from diverse fields, including computer animation and biomechanics. A trend has been to use hand-crafted objectives, sometimes with motion capture data, to produce specific behaviours. However, this may require considerable engineering effort, and can result in restricted behaviours or behaviours that may be difficult to repurpose for new tasks. In three new papers, we seek ways to produce flexible and natural behaviours that can be reused and adapted to solve tasks.

Independent Reviewers release first annual report on DeepMind Health

Today, a panel of Independent Reviewers has published its first annual report into DeepMind Health. As I wrote in the foreword to their report (written, I add, before I’d read it): “We chose people who had specific expertise but also reputations for integrity, who did not hold back, who could be angry and critical. That’s good for us and makes us better.” The panel is made up of experts in their fields who were given full access to our work to carry out their review – a very unusual process for a tech company, but one that we hope will significantly increase scrutiny of our work and ultimately help us get it right. We are grateful for their honesty, thoughtfulness, and the time they have spent on this complex task. You can read their full report here. As a result of this process, DeepMind Health has committed to a series of changes to our work and practices to try to set higher standards in our second year. We know we need to work harder to be responsive and accountable to the needs of a far greater cross-section of medicine and society. This includes significantly improving our work with patients and the public, and continuing on the path of greater engagement with Royal Colleges, professional bodies and many other groups in the NHS community.

The Information Commissioner, the Royal Free, and what we’ve learned

Today, dozens of people in UK hospitals will die preventably from conditions like sepsis and acute kidney injury (AKI) when their warning signs aren’t picked up and acted on in time. To help address this, we built the Streams app with clinicians at the Royal Free London NHS Foundation Trust, using mobile technology to automatically review test results for serious issues, starting with AKI. If one is found, Streams sends a secure smartphone alert to the right clinician, along with information about previous conditions so they can make an immediate diagnosis. We’re proud that, within a few weeks of Streams being deployed at the Royal Free, nurses said that it was saving them up to two hours each day, and we’ve already heard examples of patients with serious conditions being seen more quickly thanks to the instant alerts. Because Streams is designed to be ready for more advanced technology in the future, including AI-powered clinical alerts, we hope that it will help bring even more benefits to patients and clinicians in time. The Information Commissioner (ICO) has now concluded a year-long investigation that focused on the Royal Free’s clinical testing of Streams in late 2015 and 2016, which was intended to guarantee that the service could be deployed safely at the hospital.