As statistical analyses become more central to science, industry, and society, there is a growing need to ensure the correctness of their results. Approximate correctness can be verified by replicating the entire analysis, but can we verify without replication? Building on a recent line of work, we study proof systems that allow a probabilistic verifier to ascertain that the results of an analysis are approximately correct, while drawing fewer samples and using fewer computational resources than would be needed to replicate the analysis. We focus on distribution testing problems: verifying that an…
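As a much-simplified illustration of why verification can be cheaper than replication, the sketch below checks a claimed mean statistic against a modest number of fresh samples. The function name and the Hoeffding-style sample budget are assumptions for exposition; this is not the interactive proof-system machinery the abstract refers to.

```python
import numpy as np

def verify_claimed_mean(sample_fn, claimed_mean, eps=0.05, delta=0.01):
    """Toy verifier for a claimed mean of a distribution over [0, 1].

    Hoeffding's inequality says m = ln(2/delta) / (2*eps^2) samples suffice
    for the empirical mean to land within eps of the true mean with
    probability 1 - delta. So a correct claim passes w.h.p., while a claim
    off by more than 2*eps fails w.h.p. -- often far fewer samples than
    the original analysis used at higher precision.
    """
    m = int(np.ceil(np.log(2.0 / delta) / (2.0 * eps**2)))
    samples = np.array([sample_fn() for _ in range(m)])
    return abs(samples.mean() - claimed_mean) <= eps

rng = np.random.default_rng(0)
print(verify_claimed_mean(lambda: rng.uniform(), claimed_mean=0.5))  # True w.h.p.
```

The toy check captures only the sample-count savings from lower precision; the paper's contribution concerns proof systems in which a prover's extra information lets the verifier save resources even at matching precision.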
An LLM-Based Approach to Review Summarization on the App Store
Ratings and reviews are an invaluable resource for users exploring an app on the App Store, providing insights into how others have experienced the app. With review summaries now available in iOS 18.4, users can quickly get a high-level overview of what other users think about an app, while still having the option to dive into individual reviews for more detail. This feature is powered by a novel, multi-step LLM-based system that periodically summarizes user reviews.
Our goal in producing review summaries is to ensure they are inclusive, balanced, and accurately reflect the user’s voice. To…
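The excerpt does not spell out the system's actual steps, so the sketch below is a hypothetical two-step pipeline of the general shape described (extract per-review insights, then fuse them into one balanced summary), with `call_llm` standing in for any LLM invocation; none of these names come from Apple's system.

```python
from typing import Callable, List

def summarize_reviews(reviews: List[str], call_llm: Callable[[str], str]) -> str:
    """Hypothetical multi-step review summarization (not Apple's pipeline).

    Step 1: extract one short insight per review.
    Step 2: fuse the insights into a single balanced summary.
    """
    insights = [
        call_llm(f"Extract the single main point of this app review:\n{r}")
        for r in reviews
    ]
    joined = "\n".join(f"- {i}" for i in insights)
    return call_llm(
        "Write a 2-3 sentence, balanced summary of what users think of the "
        f"app, covering both praise and criticism:\n{joined}"
    )
```

Splitting extraction from fusion is a common design choice for this kind of task: it keeps each prompt short and makes the intermediate insights auditable.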
Apple Machine Learning Research at ICLR 2025
Apple researchers are advancing machine learning (ML) and AI through fundamental research that improves the world’s understanding of this technology and helps to redefine what is possible with it. To support the broader research community and help accelerate progress in this field, we share much of our research through publications, open source resources, and engagement at conferences.
This week, the Thirteenth International Conference on Learning Representations (ICLR) will be held in Singapore. ICLR brings together leading experts on deep learning and the application of representation…
FastVLM: Efficient Vision Encoding for Vision Language Models
Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (VLMs), particularly in text-rich image understanding tasks. However, popular visual encoders such as ViTs become inefficient at high resolutions due to the large number of tokens and high encoding latency. At different operational resolutions, the vision encoder of a VLM can be optimized along two axes: reducing encoding latency and minimizing the number of visual tokens passed to the LLM, thereby lowering overall latency. Based on a comprehensive efficiency analysis of the interplay…
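A back-of-the-envelope cost model makes the two optimization axes concrete: time to first token splits into encoder latency plus LLM prefill cost, which grows with the number of visual tokens. The function and numbers below are illustrative assumptions, not measurements from the paper.

```python
def vlm_time_to_first_token(encoder_latency_ms: float,
                            num_visual_tokens: int,
                            llm_prefill_ms_per_token: float) -> float:
    """Assumed cost model: total latency = vision encoding time plus
    LLM prefill time, which scales with the visual token count."""
    return encoder_latency_ms + num_visual_tokens * llm_prefill_ms_per_token

# A high-resolution ViT: moderate encode time but many tokens to prefill.
print(vlm_time_to_first_token(300.0, 2048, 0.25))  # 812.0 ms
# An encoder that emits far fewer tokens at some extra encode cost.
print(vlm_time_to_first_token(350.0, 256, 0.25))   # 414.0 ms
```

Under this model, shrinking the token count can dominate the tradeoff even when the encoder itself gets somewhat slower, which is why both axes must be tuned jointly.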
Disentangled Representation Learning with the Gromov-Monge Gap
Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability, or fairness. Although remarkably challenging to solve in theory, disentanglement is often achieved in practice through prior matching. Furthermore, recent works have shown that prior matching approaches can be enhanced by leveraging geometrical considerations, e.g., by learning representations that preserve geometric features of the data, such as distances or angles between points. However, matching the prior…
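As a rough sketch of the geometric idea, the regularizer below penalizes distortion of pairwise distances between a data batch and its encodings, pushing the encoder toward an isometry. It is a simplified stand-in for illustration, not the Gromov-Monge gap itself.

```python
import torch

def distance_distortion(x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Simplified geometry-preservation penalty (an assumed stand-in, not
    the actual Gromov-Monge gap): compare pairwise distances in data space
    with pairwise distances among the encodings.

    x: (n, d_x) data batch; z: (n, d_z) corresponding encodings.
    """
    dx = torch.cdist(x, x)  # (n, n) pairwise distances in data space
    dz = torch.cdist(z, z)  # (n, n) pairwise distances in latent space
    return ((dx - dz) ** 2).mean()
```

In practice such a term would be added to the prior-matching objective with a small weight, trading off how faithfully geometry is preserved against how closely the encodings match the prior.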
ACM Human-Computer Interaction Conference (CHI) 2025
Apple is presenting new research at the ACM annual conference on Human-Computer Interaction (CHI), which takes place in person in Yokohama, Japan, from April 26 to May 1. We are proud to again sponsor the conference, which brings together the scientific and industrial research communities focused on interactive technology. Below is an overview of Apple’s participation at CHI 2025.
Schedule
Stop by the Apple booth (304 & 305) in the Yokohama PACIFICO during exhibition hours. All times are listed in GMT+9 (Japan Time):
Tuesday, April 29: 10:00 – 17:00
Wednesday, April 30: 10:00 –…
Scaling Laws for Native Multimodal Models
Building general-purpose models that can effectively perceive the world through multimodal signals has been a long-standing goal. Current approaches involve integrating separately pre-trained components, such as connecting vision encoders to LLMs and continuing multimodal training. While such approaches exhibit remarkable sample efficiency, it remains an open question whether such late-fusion architectures are inherently superior. In this work, we revisit the architectural design of native multimodal models (NMMs) – those trained from the ground up on all modalities – and conduct an extensive…
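Scaling-law studies of this kind typically fit a saturating power law in parameter count N and training tokens D, then compare architectures by extrapolating each fitted law. The functional form below is the standard Chinchilla-style one, and the coefficients are placeholders for illustration, not the paper's estimates.

```python
def scaling_law(N, D, E, a, alpha, b, beta):
    """Chinchilla-style loss law L(N, D) = E + a*N^-alpha + b*D^-beta.
    (Assumed standard form; the paper's laws for native multimodal
    models may differ, and these coefficients are placeholders.)"""
    return E + a * N**(-alpha) + b * D**(-beta)

# With coefficients fitted separately per architecture family, one can
# compare families by evaluating each law at a larger target budget.
print(scaling_law(N=1e10, D=2e11, E=1.8, a=400.0, alpha=0.3, b=400.0, beta=0.3))
```

The irreducible term E and the two exponents are what such comparisons hinge on: two architectures with similar loss at small scale can diverge sharply at the extrapolated budget.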
Step-by-Step Diffusion: An Elementary Tutorial
We present an accessible first course on the mathematics of diffusion models and flow matching for machine learning. We aim to teach diffusion as simply as possible, with minimal mathematical and machine learning prerequisites, but enough technical detail to reason about its correctness. Unlike most tutorials on this subject, we take neither a Variational Autoencoder (VAE) nor a Stochastic Differential Equation (SDE) approach. In fact, for the core ideas we will not need any SDEs, Evidence Lower Bounds (ELBOs), Langevin dynamics, or even the notion of a score. The reader need only be…
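In the flow-matching view the tutorial covers, a training step really can be written in a few lines: interpolate between a data point and pure noise, then regress the model onto the velocity of that straight path. The sketch below is a generic version with assumed conventions (data at t = 0, noise at t = 1, a `model` taking `(x, t)`), not code from the tutorial.

```python
import torch

def flow_matching_loss(model, x0: torch.Tensor) -> torch.Tensor:
    """Minimal flow-matching training step (generic sketch).

    Draw a noise endpoint x1, form x_t = (1-t)*x0 + t*x1, and regress the
    model's predicted velocity at (x_t, t) onto the path's constant
    velocity x1 - x0. No SDEs, ELBOs, or scores required.
    """
    x1 = torch.randn_like(x0)                             # pure-noise endpoint
    t = torch.rand(x0.shape[0], *([1] * (x0.dim() - 1)))  # per-sample time in [0, 1]
    xt = (1 - t) * x0 + t * x1                            # point on the straight path
    target_velocity = x1 - x0
    return ((model(xt, t) - target_velocity) ** 2).mean()
```

Sampling then amounts to integrating the learned velocity field from noise back to data, e.g. with a simple Euler loop over t.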
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
Diffusion models have become the dominant approach for visual generation. They are trained by denoising a Markovian process which gradually adds noise to the input. We argue that the Markovian property limits the model’s ability to fully utilize the generation trajectory, leading to inefficiencies during training and inference. In this paper, we propose DART, a transformer-based model that unifies autoregressive (AR) and diffusion within a non-Markovian framework. DART iteratively denoises image patches spatially and spectrally using an AR model that has the same architecture as standard…
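The non-Markovian idea can be sketched as a generation loop whose denoiser sees the whole trajectory of earlier iterates rather than only the most recent one, which is exactly what an autoregressive transformer's attention naturally supports. The code below is a toy illustration with an assumed `model` interface, not the DART architecture.

```python
import torch

def non_markovian_generate(model, steps: int, shape) -> torch.Tensor:
    """Toy non-Markovian denoising loop (illustrative sketch, not DART).

    A Markovian sampler would call model(trajectory[-1]); here each
    refinement instead conditions on the ENTIRE history of iterates.
    """
    trajectory = [torch.randn(shape)]       # start from pure noise
    for _ in range(steps):
        context = torch.stack(trajectory)   # full history: (k, *shape)
        trajectory.append(model(context))   # predict a cleaner image
    return trajectory[-1]
```

Because the full history is in context, information discarded by one denoising step can still inform later ones, which is the inefficiency of the Markovian setup that the abstract argues against.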
International Conference on Learning Representations (ICLR) 2025