GAUDI: A Neural Architect for Immersive 3D Scene Generation

We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera. We tackle this challenging problem with a scalable yet powerful approach, where we first optimize a latent representation that disentangles radiance fields and camera poses. This latent representation is then used to learn a generative model that enables both unconditional and conditional generation of 3D scenes. Our model generalizes previous works that focus on single objects by removing the assumption that the camera pose…Apple Machine Learning Research

Integrating Categorical Features in End-To-End ASR

All-neural, end-to-end ASR systems gained rapid interest from the speech recognition community. Such systems convert speech input to text units using a single trainable neural network model. E2E models require large amounts of paired speech text data that is expensive to obtain. The amount of data available varies across different languages and dialects. It is critical to make use of all these data so that both low resource languages and high resource languages can be improved. When we want to deploy an ASR system for a new application domain, the amount of domain specific training data is…Apple Machine Learning Research

Reachability Embeddings: Self-Supervised Representation Learning from Spatiotemporal Motion Trajectories for Multimodal Geospatial Computer Vision

Self-supervised representation learning techniques utilize large datasets without semantic annotations to learn meaningful, universal features that can be conveniently transferred to solve a wide variety of downstream supervised tasks. In this paper, we propose a self-supervised method for learning representations of geographic locations from unlabeled GPS trajectories to solve downstream geospatial computer vision tasks. Tiles resulting from a raster representation of the earth’s surface are modeled as nodes on a graph or pixels of an image. GPS trajectories are modeled as allowed Markovian…Apple Machine Learning Research

NeuMan: Neural Human Radiance Field from a Single Video

Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences. We propose a novel framework to reconstruct the human and the scene that can be rendered with novel human poses and views from just a single in-the-wild video. Given a video captured by a moving camera, we train two NeRF models: a human NeRF model and a scene NeRF model. To train these models, we rely on existing methods to estimate the rough geometry of the human and the scene. Those rough geometry estimates allow us to create a warping field from the observation space to the canonical…Apple Machine Learning Research

ARtonomous: Introducing Middle School Students to Reinforcement Learning Through Virtual Robotics

Typical educational robotics approaches rely on imperative programming for robot navigation. However, with the increasing presence of AI in everyday life, these approaches miss an opportunity to introduce machine learning (ML) techniques grounded in an authentic and engaging learning context. Furthermore, the needs for costly specialized equipment and ample physical space are barriers that limit access to robotics experiences for all learners. We propose ARtonomous, a relatively low-cost, virtual alternative to physical, programming-only robotics kits. With ARtonomous, students employ…Apple Machine Learning Research

Optimal Algorithms for Mean Estimation under Local Differential Privacy

We study the problem of mean estimation of -bounded vectors under the constraint of local differential privacy. While the literature has a variety of algorithms that achieve the asymptotically optimal rates for this problem, the performance of these algorithms in practice can vary significantly due to varying (and often large) hidden constants. In this work, we investigate the question of designing the protocol with the smallest variance. We show that PrivUnit (Bhowmick et al. 2018) with optimized parameters achieves the optimal variance among a large family of locally private randomizers. To…Apple Machine Learning Research

Private Frequency Estimation via Projective Geometry

In this work, we propose a new algorithm ProjectiveGeometryResponse (PGR) for locally differentially private (LDP) frequency estimation. For a universe size of and with users, our -LDP algorithm has communication cost bits in the private coin setting and in the public coin setting, and has computation cost for the server to approximately reconstruct the frequency histogram, while achieving the state-of-the-art privacy-utility tradeoff. In many parameter settings used in practice this is a significant improvement over the $ O(n+k^2)$ computation cost that is achieved by the recent…Apple Machine Learning Research

Position Prediction as an Effective Pre-training Strategy

Transformers have gained increasing popularity in a wide range of applications, including Natural Language Processing (NLP), Computer Vision and Speech Recognition, because of their powerful representational capacity. However, harnessing this representational capacity effectively requires a large amount of data, strong regularization, or both, to mitigate overfitting. Recently, the power of the Transformer has been unlocked by self-supervised pretraining strategies based on masked autoencoderswhich rely on reconstructing masked inputs, directly, or contrastively from unmasked content. This…Apple Machine Learning Research

Self-Conditioning Pre-Trained Language Models

In this paper we aim to investigate the mechanisms that guide text generation with pre-trained Transformer-based Language Models (TLMs). Grounded on the Product of Experts formulation by Hinton (1999), we describe a generative mechanism that exploits expert units which naturally exist in TLMs. Such units are responsible for detecting concepts in the input and conditioning text generation on such concepts. We describe how to identify expert units and how to activate them during inference in order to induce any desired concept in the generated output. We find that the activation of a…Apple Machine Learning Research