Human-centred mechanism design with Democratic AI

In our recent paper, published in Nature Human Behaviour, we provide a proof-of-concept demonstration that deep reinforcement learning (RL) can be used to find economic policies that people will vote for by majority in a simple game. The paper thus addresses a key challenge in AI research – how to train AI systems that align with human values.Read More

Human-centred mechanism design with Democratic AI

In our recent paper, published in Nature Human Behaviour, we provide a proof-of-concept demonstration that deep reinforcement learning (RL) can be used to find economic policies that people will vote for by majority in a simple game. The paper thus addresses a key challenge in AI research – how to train AI systems that align with human values.Read More

Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models

Controllable generative sequence models with the capability to extract and replicate the style of specific examples enable many applications, including narrating audiobooks in different voices, auto-completing and auto-correcting written handwriting, and generating missing training samples for downstream recognition tasks. However, under an unsupervised-style setting, typical training algorithms for controllable sequence generative models suffer from the training-inference mismatch, where the same sample is used as content and style input during training but unpaired samples are given during…Apple Machine Learning Research

Dynamic Memory for Interpretable Sequential Optimisation

Real-world applications of reinforcement learning for recommendation and experimentation faces a practical challenge: the relative reward of different bandit arms can evolve over the lifetime of the learning agent. To deal with these non-stationary cases, the agent must forget some historical knowledge, as it may no longer be relevant to minimise regret. We present a solution to handling non-stationarity that is suitable for deployment at scale, to provide business operators with automated adaptive optimisation. Our solution aims to provide interpretable learning that can be trusted by humans…Apple Machine Learning Research