Apple – Page 12 – Vedere AI

Scaling Laws for Native Multimodal Models

April 16, 2025

by Apple

Building general-purpose models that can effectively perceive the world through multimodal signals has been a long-standing goal. Current approaches involve integrating separately pre-trained components, such as connecting vision encoders to LLMs and continuing multimodal training. While such approaches exhibit remarkable sample efficiency, it remains an open question whether such late-fusion architectures are inherently superior. In this work, we revisit the architectural design of native multimodal models (NMMs) – those trained from the ground up on all modalities – and conduct an extensive…
Source: Apple Machine Learning Research

Step-by-Step Diffusion: An Elementary Tutorial

April 16, 2025

by Apple

We present an accessible first course on the mathematics of diffusion models and flow matching for machine learning. We aim to teach diffusion as simply as possible, with minimal mathematical and machine learning prerequisites, but enough technical detail to reason about its correctness. Unlike most tutorials on this subject, we take neither a Variational Autoencoder (VAE) nor a Stochastic Differential Equations (SDE) approach. In fact, for the core ideas we will not need any SDEs, Evidence Lower Bounds (ELBOs), Langevin dynamics, or even the notion of a score. The reader need only be…
Source: Apple Machine Learning Research

TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization

April 15, 2025

by Apple

Direct Preference Optimization (DPO) has been widely adopted for preference alignment of Large Language Models (LLMs) due to its simplicity and effectiveness. However, DPO is derived as a bandit problem in which the whole response is treated as a single arm, ignoring the importance differences between tokens, which may affect optimization efficiency and make it difficult to achieve optimal results. In this work, we propose that the optimal data for DPO has equal expected rewards for each token in winning and losing responses, as there is no difference in token importance. However, since the…
Source: Apple Machine Learning Research

EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing

April 15, 2025

by Apple

Diffusion transformers have been widely adopted for text-to-image synthesis. While scaling these models up to billions of parameters shows promise, the effectiveness of scaling beyond current sizes remains underexplored and challenging. By explicitly exploiting the computational heterogeneity of image generations, we develop a new family of Mixture-of-Experts (MoE) models (EC-DIT) for diffusion transformers with expert-choice routing. EC-DIT learns to adaptively optimize the compute allocated to understand the input texts and generate the respective image patches, enabling heterogeneous…
Source: Apple Machine Learning Research

CoMotion: Concurrent Multi-Person 3D Motion

April 15, 2025

by Apple

We introduce an approach for detecting and tracking detailed 3D poses of multiple people from a single monocular camera stream. Our system maintains temporally coherent predictions in crowded scenes filled with difficult poses and occlusions. Our model performs both strong per-frame detection and a learned pose update to track people from frame to frame. Rather than match detections across time, poses are updated directly from a new input image, which enables online tracking through occlusion. We train on numerous image and video datasets leveraging pseudo-labeled annotations to produce a…
Source: Apple Machine Learning Research

Understanding Aggregate Trends for Apple Intelligence Using Differential Privacy

April 14, 2025

by Apple

At Apple, we believe privacy is a fundamental human right. And we believe in giving our users a great experience while protecting their privacy. For years, we’ve used techniques like differential privacy as part of our opt-in device analytics program. This lets us gain insights into how our products are used, so we can improve them, while protecting user privacy by preventing Apple from seeing individual-level data from those users.
This same need to understand usage while protecting privacy is also present in Apple Intelligence. One of our principles is that Apple does not use our users’…
Source: Apple Machine Learning Research

FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations

April 14, 2025

by Apple

This paper was accepted at the Workshop on Foundation Models in the Wild at ICLR 2025.
Visual understanding is inherently contextual – what we focus on in an image depends on the task at hand. For instance, given an image of a person holding a bouquet of flowers, we may focus on either the person, such as their clothing, or the type of flowers, depending on the context of interest. Yet, most existing image encoding paradigms represent an image as a fixed, generic feature vector, overlooking the potential needs of prioritizing varying visual information for different downstream use cases. In…
Source: Apple Machine Learning Research

MM-Ego: Towards Building Egocentric Multimodal LLMs

April 11, 2025

by Apple

This research aims to comprehensively explore building a multimodal foundation model for egocentric video understanding. To achieve this goal, we work on three fronts. First, as there is a lack of QA data for egocentric video understanding, we automatically generate 7M high-quality QA samples for egocentric videos ranging from 30 seconds to one hour long in Ego4D based on human-annotated data. This is one of the largest egocentric QA datasets. Second, we contribute a challenging egocentric QA benchmark with 629 videos and 7,026 questions to evaluate the models’ ability in recognizing and…
Source: Apple Machine Learning Research

Language Models Know More Than They Show: Exploring Hallucinations From the Model’s Viewpoint

April 11, 2025

by Apple

Large language models (LLMs) often produce errors, including factual inaccuracies, biases, and reasoning failures, collectively referred to as “hallucinations”. Recent studies have demonstrated that LLMs’ internal states encode information regarding the truthfulness of their outputs, and that this information can be utilized to detect errors. In this work, we show that the internal representations of LLMs encode much more information about truthfulness than previously recognized. We first discover that the truthfulness information is concentrated in specific tokens, and leveraging this…
Source: Apple Machine Learning Research

Controlling Language and Diffusion Models by Transporting Activations

April 10, 2025

by Apple

Large generative models are becoming increasingly capable and more widely deployed to power production applications, but getting these models to produce exactly what’s desired can still be challenging. Fine-grained control over these models’ outputs is important to meet user expectations and to mitigate potential misuses, ensuring the models’ reliability and safety. To address these issues, Apple machine learning researchers have developed a new technique that is modality-agnostic and provides fine-grained control over the model’s behavior with negligible computational overhead, while…
Source: Apple Machine Learning Research
