Effectively representing 3D scenes for Multimodal Large Language Models (MLLMs) is crucial yet challenging. Existing approaches commonly rely only on 2D image features and use varied tokenization approaches. This work presents a rigorous study of 3D token structures, systematically comparing video-based and point-based representations while maintaining consistent model backbones and parameters. We propose a novel approach that enriches visual tokens by incorporating 3D point cloud features from a Sonata-pretrained Point Transformer V3 encoder. Our experiments demonstrate that merging explicit…
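To make the token-enrichment idea concrete, here is a minimal sketch of one plausible fusion scheme: pooling per-point 3D features into the 2D image patches they project onto, then adding a projection of the pooled features to the visual tokens. All shapes, names, and the mean-pooling rule are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the paper's code): fuse 3D point features into 2D
# visual tokens by pooling points that project into each image patch.
import torch

def fuse_point_features(visual_tokens, point_feats, patch_ids, d_model):
    """visual_tokens: (N_patches, d_model) tokens from the 2D image encoder.
    point_feats:   (N_points, d_point) features from a 3D point encoder
                   (e.g., a Sonata-pretrained Point Transformer V3).
    patch_ids:     (N_points,) index of the patch each 3D point projects to.
    """
    n_patches, _ = visual_tokens.shape
    d_point = point_feats.shape[1]
    # Mean-pool point features per patch via scatter-add.
    pooled = torch.zeros(n_patches, d_point)
    counts = torch.zeros(n_patches, 1)
    pooled.index_add_(0, patch_ids, point_feats)
    counts.index_add_(0, patch_ids, torch.ones(len(patch_ids), 1))
    pooled = pooled / counts.clamp(min=1)
    # Project to the token width and add to the 2D tokens
    # (randomly initialized here; learned in practice).
    proj = torch.nn.Linear(d_point, d_model)
    return visual_tokens + proj(pooled)

tokens = fuse_point_features(torch.randn(196, 768), torch.randn(5000, 512),
                             torch.randint(0, 196, (5000,)), 768)
print(tokens.shape)  # torch.Size([196, 768])
```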
Apple Machine Learning Research at ICML 2025
Apple researchers advance AI and machine learning through fundamental research, and to support the broader research community and help accelerate progress in the field, we share much of this work through publications and engagement at conferences. Next week, the International Conference on Machine Learning (ICML) will be held in Vancouver, Canada, and Apple is proud to once again participate in this important event for the research community and to be an industry sponsor.
At the main conference and associated workshops, Apple researchers will present new research across a number of topics in AI…
Faster Rates for Private Adversarial Bandits
We design new differentially private algorithms for the problems of adversarial bandits and bandits with expert advice. For adversarial bandits, we give a simple and efficient conversion of any non-private bandit algorithm to a private bandit algorithm. Instantiating our conversion with existing non-private bandit algorithms gives a regret upper bound of $O\left(\sqrt{KT}/\sqrt{\varepsilon}\right)$, improving upon the existing upper bound $O\left(\sqrt{KT\log(KT)}/\varepsilon\right)$ in all privacy regimes. In particular, our algorithms…
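For intuition, the sketch below shows one classic (and looser) way to privatize a bandit algorithm: running EXP3 on importance-weighted loss estimates perturbed with Laplace noise. It is not the paper's conversion, and the learning rate, noise scale, and constants are illustrative only.

```python
# A generic baseline sketch (NOT the paper's conversion): EXP3 where the
# importance-weighted loss estimate is perturbed with Laplace noise before
# the exponential-weights update. The paper's private conversion and its
# regret bound are different and tighter.
import numpy as np

def private_exp3(losses, eps=1.0, lr=0.05, rng=np.random.default_rng(0)):
    """losses: (T, K) adversarial loss matrix with entries in [0, 1]."""
    T, K = losses.shape
    w = np.zeros(K)                                  # log-weights
    total = 0.0
    for t in range(T):
        p = np.exp(w - w.max()); p /= p.sum()
        arm = rng.choice(K, p=p)
        total += losses[t, arm]
        est = np.zeros(K)
        est[arm] = losses[t, arm] / p[arm]           # importance weighting
        est[arm] += rng.laplace(scale=1.0 / eps)     # privatize the estimate
        w -= lr * est
    return total

rng = np.random.default_rng(1)
print(private_exp3(rng.uniform(size=(2000, 5))))    # cumulative loss incurred
```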
Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions
Wearable devices record physiological and behavioral signals that can improve health predictions. While foundation models are increasingly used for such predictions, they have been primarily applied to low-level sensor data, despite behavioral data often being more informative due to their alignment with physiologically relevant timescales and quantities. We develop foundation models of such behavioral signals using over 2.5B hours of wearable data from 162K individuals, systematically optimizing architectures and tokenization strategies for this unique dataset. Evaluated on 57 health-related…
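As a concrete (and deliberately simple) example of one tokenization strategy for behavioral signals, the sketch below bins each channel of a multivariate time series into per-channel quantile buckets; the paper compares strategies systematically, and this baseline is only illustrative, not their method.

```python
# A minimal sketch of one simple tokenization strategy for behavioral time
# series: per-channel quantile binning into a small discrete vocabulary.
import numpy as np

def quantile_tokenize(x, n_bins=16):
    """x: (T, C) behavioral aggregates (e.g., hourly step count, heart rate).
    Returns integer tokens of shape (T, C), one vocabulary per channel."""
    # Interior quantile edges per channel: shape (n_bins - 1, C).
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1], axis=0)
    tokens = np.stack([np.digitize(x[:, c], edges[:, c])
                       for c in range(x.shape[1])], axis=1)
    return tokens  # values in [0, n_bins - 1]

rng = np.random.default_rng(0)
series = rng.gamma(2.0, 50.0, size=(24 * 7, 3))   # one week of hourly data
print(quantile_tokenize(series)[:3])
```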
Addressing Misspecification in Simulation-based Inference through Data-driven Calibration
Driven by steady progress in deep generative modeling, simulation-based inference (SBI) has emerged as the workhorse for inferring the parameters of stochastic simulators. However, recent work has demonstrated that model misspecification can compromise the reliability of SBI, preventing its adoption in important applications where only misspecified simulators are available. This work introduces robust posterior estimation (RoPE), a framework that overcomes model misspecification with a small real-world calibration set of ground-truth parameter measurements. We formalize the misspecification…
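One way a small calibration set can be used, sketched below under strong assumptions (and not RoPE itself), is to fit an affine map from real observations back to simulator outputs, so that a posterior trained on simulations can be queried on corrected real data.

```python
# A deliberately simple sketch (not RoPE): use calibration pairs to fit an
# affine correction from real observations to simulator outputs, then apply a
# simulation-trained posterior to corrected real data. All data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.3], [0.0, 1.0]])
theta = rng.normal(size=(64, 2))                    # calibration ground truth
x_sim = theta @ mix                                 # well-specified simulator
x_real = 1.5 * x_sim + 0.7 + 0.05 * rng.normal(size=(64, 2))  # shifted sensor

# Least-squares fit of x_real -> x_sim (affine, per output dimension).
A = np.concatenate([x_real, np.ones((64, 1))], axis=1)
W, *_ = np.linalg.lstsq(A, x_sim, rcond=None)

x_new = 1.5 * (rng.normal(size=(5, 2)) @ mix) + 0.7
x_corrected = np.concatenate([x_new, np.ones((5, 1))], axis=1) @ W
print(x_corrected.round(2))   # now in simulator space; feed to the posterior
```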
Target Concrete Score Matching: A Holistic Framework for Discrete Diffusion
Discrete diffusion is a promising framework for modeling and generating discrete data. In this work, we present Target Concrete Score Matching (TCSM), a novel and versatile objective for training and fine-tuning discrete diffusion models. TCSM provides a general framework with broad applicability. It supports pre-training discrete diffusion models directly from data samples, and many existing discrete diffusion approaches naturally emerge as special cases of our more general TCSM framework. Furthermore, the same TCSM objective extends to post-training of discrete diffusion models, including…
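For background, one common form of the concrete score over a discrete space collects probability ratios to neighboring states, and a generic matching objective fits a model to those ratios; the display below follows that textbook form (the weighting $w$ and the notation are assumptions, not necessarily TCSM's exact objective).

```latex
% One common form of the concrete score and a generic matching loss
% (notation and weighting are illustrative, not necessarily TCSM's).
\[
  c_p(x)_y \;=\; \frac{p(y)}{p(x)}, \qquad y \in N(x),
\]
\[
  \mathcal{L}(\theta) \;=\; \mathbb{E}_{x \sim p}
  \sum_{y \in N(x)} w(x, y)\,\bigl(s_\theta(x)_y - c_p(x)_y\bigr)^2 .
\]
```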
Shielded Diffusion: Generating Novel and Diverse Images using Sparse Repellency
Diffusion models are generating ever more realistic images. Yet, when generating images repeatedly with the same prompt, practitioners often obtain slight variations of the same, highly likely mode. As a result, most models fail to reflect the inherent diversity seen in data, which hinders their relevance to creative tasks or ability to power world models. This work proposes a highly effective and general method to repel generated images away from a reference set of images. This is achieved by introducing data-driven repellence terms within diffusions dynamically, throughout their…
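The toy sketch below illustrates the flavor of such a repellence term inside a diffusion-like sampler on 2D points: the term only activates when the trajectory comes within a radius of a reference point, pushing the sample away. The radius, strength, schedule, and the toy score function are all illustrative assumptions, not the paper's formulation.

```python
# A toy sketch of a data-driven repellence term inside a diffusion-like
# sampler (2D points, hand-written score); not the paper's method.
import numpy as np

def sample_with_repellency(refs, steps=200, radius=0.8, strength=2.0,
                           rng=np.random.default_rng(0)):
    x = rng.normal(size=2) * 3.0
    for _ in range(steps):
        drift = -x                       # toy score pulling toward the mode at 0
        for r in refs:                   # sparse repellence: only fires in-radius
            d = x - r
            dist = np.linalg.norm(d)
            if dist < radius:
                drift += strength * d / (dist + 1e-8)
        dt = 1.0 / steps
        x = x + drift * dt + np.sqrt(dt) * rng.normal(size=2)
    return x

refs = [np.zeros(2)]                     # a previously generated sample at the mode
print(sample_with_repellency(refs))      # lands away from the reference
```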
A Variational Framework for Improving Naturalness in Generative Spoken Language Models
The success of large language models in text processing has inspired their adaptation to speech modeling. However, since speech is continuous and complex, it is often discretized for autoregressive modeling. Speech tokens derived from self-supervised models (known as semantic tokens) typically focus on the linguistic aspects of speech but neglect prosodic information. As a result, models trained on these tokens can generate speech with reduced naturalness. Existing approaches try to fix this by adding pitch features to the semantic tokens. However, pitch alone cannot fully represent the range…
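To ground the idea of conditioning a speech LM on prosody beyond discrete tokens, here is a minimal sketch of a variational prosody encoder over a pitch contour whose latent is added to semantic-token embeddings. The architecture, dimensions, and names are assumptions for illustration, not the paper's model.

```python
# A minimal sketch (assumptions throughout, not the paper's model): encode a
# pitch contour into a variational prosody latent and add it to semantic-token
# embeddings so the LM conditions on prosody beyond the discrete tokens.
import torch, torch.nn as nn

class ProsodyVAE(nn.Module):
    def __init__(self, d_latent=16, d_model=256):
        super().__init__()
        self.enc = nn.GRU(1, 64, batch_first=True)
        self.to_mu = nn.Linear(64, d_latent)
        self.to_logvar = nn.Linear(64, d_latent)
        self.proj = nn.Linear(d_latent, d_model)

    def forward(self, f0, token_emb):
        """f0: (B, T, 1) pitch contour; token_emb: (B, L, d_model)."""
        _, h = self.enc(f0)                                     # h: (1, B, 64)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterize
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(-1).mean()
        return token_emb + self.proj(z)[:, None, :], kl         # broadcast over L

model = ProsodyVAE()
out, kl = model(torch.randn(2, 120, 1), torch.randn(2, 50, 256))
print(out.shape, float(kl))
```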
CommVQ: Commutative Vector Quantization for KV Cache Compression
Large Language Models (LLMs) are increasingly used in applications requiring long context lengths, but the key-value (KV) cache often becomes a memory bottleneck on GPUs as context lengths grow. To address this, we propose Commutative Vector Quantization (CommVQ) to significantly reduce memory usage for long-context LLM inference. First, we leverage additive quantization by introducing a lightweight encoder and codebook to compress the KV cache, which can then be decoded with a simple matrix multiplication. Second, to tackle the high computational costs during decoding, we design the…
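The sketch below illustrates the additive-quantization idea in its simplest (greedy residual) form: a KV vector is approximated as a sum of codewords from several codebooks, so decoding reduces to summing indexed codewords, equivalently a matrix multiply with concatenated one-hot codes. Sizes are illustrative, and this is not CommVQ itself (no commutativity structure, no learned encoder).

```python
# A hedged sketch of additive (here: greedy residual) quantization of a KV
# vector: codes index M codebooks and decoding is a sum of codewords.
import numpy as np

rng = np.random.default_rng(0)
M, K, D = 4, 256, 128                      # codebooks, codewords each, head dim
books = rng.normal(size=(M, K, D))

def encode(v):
    codes, resid = [], v.copy()
    for m in range(M):                     # greedy residual assignment
        idx = np.argmin(((resid - books[m]) ** 2).sum(-1))
        codes.append(idx)
        resid -= books[m, idx]
    return np.array(codes)

def decode(codes):
    return sum(books[m, codes[m]] for m in range(M))   # additive decode

v = rng.normal(size=D)
codes = encode(v)                          # M bytes instead of D floats
print(codes, np.linalg.norm(v - decode(codes)) / np.linalg.norm(v))
```

With learned (rather than random) codebooks, the relative reconstruction error printed above drops substantially, which is the point of training a compression codebook for the KV cache.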
Self-reflective Uncertainties: Do LLMs Know Their Internal Answer Distribution?
This paper was accepted at the Workshop on Reliable and Responsible Foundation Models (RRFMs) Workshop at ICML 2025.
Uncertainty quantification plays a pivotal role when bringing large language models (LLMs) to end-users. Its primary goal is that an LLM should indicate when it is unsure about an answer it gives. While this has been conveyed with numerical certainty scores in the past, we propose to use the rich output space of LLMs, the space of all possible strings, to give a string that describes the uncertainty. In particular, we seek a string that describes the distribution of LLM answers…
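A minimal sketch of the underlying idea (with a stubbed model, not the paper's method): sample the LLM several times, summarize the empirical answer distribution, and emit a verbal uncertainty string alongside the top answer.

```python
# Stubbed sketch: sample an "LLM" repeatedly and describe the empirical
# answer distribution as a string (not the paper's actual procedure).
from collections import Counter
import random

def sample_llm(prompt, rng):               # stand-in for a real LLM call
    return rng.choices(["Paris", "Lyon"], weights=[0.8, 0.2])[0]

def answer_with_uncertainty(prompt, n=20, seed=0):
    rng = random.Random(seed)
    counts = Counter(sample_llm(prompt, rng) for _ in range(n))
    top, _ = counts.most_common(1)[0]
    parts = ", ".join(f"'{a}' in {c}/{n} samples"
                      for a, c in counts.most_common())
    return f"{top} (answer distribution: {parts})"

print(answer_with_uncertainty("Capital of France?"))
```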