Learning by playing

Getting children (and adults) to tidy up after themselves can be a challenge, but we face an even greater challenge trying to get our AI agents to do the same. Success depends on the mastery of several core visuo-motor skills: approaching an object, grasping and lifting it, opening a box and putting things inside it. To make matters more complicated, these skills must be applied in the right sequence.

Control tasks, like tidying up a table or stacking objects, require an agent to determine how, when and where to coordinate the nine joints of its simulated arms and fingers to move correctly and achieve its objective. The sheer number of possible combinations of movements at any given time, along with the need to carry out a long sequence of correct actions, constitutes a serious exploration problem, making this a particularly interesting area for reinforcement learning research.

Techniques like reward shaping, apprenticeship learning or learning from demonstrations can help with the exploration problem. However, these methods rely on a considerable amount of knowledge about the task; learning complex control tasks from scratch with minimal prior knowledge is still an open challenge.

Our new paper proposes a learning paradigm called Scheduled Auxiliary Control (SAC-X), which seeks to overcome this exploration issue.

Researching patient deterioration with the US Department of Veterans Affairs

We're excited to announce a medical research partnership with the US Department of Veterans Affairs (VA), one of the world's leading healthcare organisations, responsible for providing high-quality care to veterans and their families across the United States.

This project will see us analyse patterns from historical, depersonalised medical records to predict patient deterioration.

Patient deterioration is a significant global health problem that often has fatal consequences. Studies estimate that 11% of all in-hospital deaths are due to patient deterioration not being recognised early enough or acted on in the right way.

Alongside world-renowned clinicians and researchers at the VA, we are analysing patterns from approximately 700,000 historical, depersonalised medical records to determine whether machine learning can accurately identify the risk factors for patient deterioration and correctly predict its onset.

We're focusing on Acute Kidney Injury (AKI), one of the most common conditions associated with patient deterioration, and an area where DeepMind and the VA both have expertise. This is a complex challenge, because predicting AKI is far from easy. Not only is the onset of AKI sudden and often asymptomatic, but the risk factors associated with it are commonplace throughout hospitals.

Scalable agent architecture for distributed training

Deep Reinforcement Learning (DeepRL) has achieved remarkable success in a range of tasks, from continuous control problems in robotics to playing games like Go and Atari. The improvements seen in these domains have so far been limited to individual tasks, where a separate agent has been tuned and trained for each task.

In our most recent work, we explore the challenge of training a single agent on many tasks.

Today we are releasing DMLab-30, a set of new tasks that span a large variety of challenges in a visually unified environment with a common action space.

Training an agent to perform well on many tasks requires massive throughput and efficient use of every data point. To this end, we have developed a new, highly scalable agent architecture for distributed training called Importance Weighted Actor-Learner Architecture, which uses a new off-policy correction algorithm called V-trace.

DMLab-30

DMLab-30 is a collection of new levels designed using our open source RL environment DeepMind Lab. These environments enable any DeepRL researcher to test systems on a large spectrum of interesting tasks, either individually or in a multi-task setting.