Creating Interactive Agents with Imitation Learning

We show that imitation learning of human-human interactions in a simulated world, in conjunction with self-supervised learning, is sufficient to produce a multimodal interactive agent, which we call MIA, that successfully interacts with non-adversarial humans 75% of the time. We further identify architectural and algorithmic techniques that improve performance, such as hierarchical action selection.
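To make "hierarchical action selection" concrete, here is a minimal behavioural-cloning sketch: the policy first predicts a coarse action group, then the action within that group, and both heads are trained on human demonstrations. This is an illustrative assumption about the idea, not MIA's actual architecture; all module and variable names are hypothetical.

```python
# Minimal hierarchical behavioural-cloning sketch (hypothetical; not MIA's code).
# Assumes a dataset of (observation, group, action) tuples from human-human play.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalPolicy(nn.Module):
    """Two-level action selection: first pick an action group
    (e.g. move / grab / speak), then the action within that group."""
    def __init__(self, obs_dim, n_groups, n_actions_per_group):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.group_head = nn.Linear(256, n_groups)
        # The fine-grained head is conditioned on the chosen group.
        self.action_head = nn.Linear(256 + n_groups, n_actions_per_group)

    def forward(self, obs, group_onehot):
        h = self.encoder(obs)
        group_logits = self.group_head(h)
        action_logits = self.action_head(torch.cat([h, group_onehot], dim=-1))
        return group_logits, action_logits

def bc_loss(policy, obs, group_target, action_target, n_groups):
    # Hierarchical cross-entropy: log p(group) + log p(action | group),
    # with the action head conditioned on the *demonstrated* group.
    group_onehot = F.one_hot(group_target, n_groups).float()
    group_logits, action_logits = policy(obs, group_onehot)
    return (F.cross_entropy(group_logits, group_target)
            + F.cross_entropy(action_logits, action_target))

# Example usage (shapes only):
# policy = HierarchicalPolicy(obs_dim=64, n_groups=3, n_actions_per_group=8)
# loss = bc_loss(policy, obs_batch, group_batch, action_batch, n_groups=3)
```

One plausible reading of why such a hierarchy helps imitation is that conditioning the fine-grained head on the chosen group keeps each head's output space small and its supervision signal denser.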

Language modelling at scale: Gopher, ethical considerations, and retrieval

Language, and its role in demonstrating and facilitating comprehension – or intelligence – is a fundamental part of being human. It gives people the ability to communicate thoughts and concepts, express ideas, create memories, and build mutual understanding. These are foundational parts of social intelligence. It’s why our teams at DeepMind study aspects of language processing and communication, both in artificial agents and in humans.

On the Expressivity of Markov Reward

We frame the notion of task in three ways: as a set of acceptable policies, as a partial ordering over policies, or as a partial ordering over trajectories. Our main results prove that while reward can express many tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a reward function allowing an agent to optimize tasks of each of these three types, and that correctly determine when no such reward function exists.
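To make "capture" precise without reproducing the paper's full formalism, here is a sketch of the realisability criterion for the first task type, in assumed notation: a Markov reward function R expresses a task given as a set of acceptable policies Π_G when every acceptable policy strictly outperforms every unacceptable one in start-state value.

```latex
% Sketch of the realisability criterion (notation assumed, not the
% paper's exact formalism): R expresses the task given by \Pi_G iff
\forall \pi \in \Pi_G,\; \forall \pi' \notin \Pi_G:
\quad V^{\pi}_{R}(s_0) > V^{\pi'}_{R}(s_0),
\qquad\text{where}\qquad
V^{\pi}_{R}(s_0)
  = \mathbb{E}_{\pi}\!\Bigl[\,\textstyle\sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t) \Bigm|\, s_0 \Bigr].
```

Since each such constraint is linear in the entries of R, deciding whether a satisfying reward exists can be read as a linear-programming feasibility problem, which is one way to see why a polynomial-time construction is plausible.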

Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons

Our brain has an amazing ability to process visual information. We can take one glance at a complex scene and, within milliseconds, parse it into objects and their attributes, such as colour or size, then use this information to describe the scene in simple language. Underlying this seemingly effortless ability is a complex computation performed by our visual cortex, which takes millions of neural impulses transmitted from the retina and transforms them into a more meaningful form that can be mapped onto a simple language description. To fully understand how this process works in the brain, we need to figure out both how semantically meaningful information is represented in the firing of neurons at the end of the visual processing hierarchy, and how such a representation may be learnt from largely untaught experience.
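The question of how a disentangled representation might be learnt without labels is commonly addressed in this line of work with the β-VAE objective. The formulation below is that standard objective, stated for reference with illustrative notation rather than taken from the paper.

```latex
% Standard beta-VAE objective (illustrative notation). Encoder
% q_phi maps an image x to latents z; decoder p_theta reconstructs
% x from z; p(z) is a unit-Gaussian prior; beta > 1 strengthens the
% pressure towards independent, semantically disentangled latents.
\mathcal{L}(\theta, \phi; \mathbf{x})
  = \underbrace{\mathbb{E}_{q_{\phi}(\mathbf{z} \mid \mathbf{x})}
      \bigl[\log p_{\theta}(\mathbf{x} \mid \mathbf{z})\bigr]}_{\text{reconstruction}}
  \;-\; \beta \,
    \underbrace{D_{\mathrm{KL}}\bigl(q_{\phi}(\mathbf{z} \mid \mathbf{x})
      \,\|\, p(\mathbf{z})\bigr)}_{\text{prior regularisation}}
```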
