Apple – Page 7 – Vedere AI

CoMotion: Concurrent Multi-Person 3D Motion

April 15, 2025

by Apple

We introduce an approach for detecting and tracking detailed 3D poses of multiple people from a single monocular camera stream. Our system maintains temporally coherent predictions in crowded scenes filled with difficult poses and occlusions. Our model performs both strong per-frame detection and a learned pose update to track people from frame to frame. Rather than match detections across time, poses are updated directly from a new input image, which enables online tracking through occlusion. We train on numerous image and video datasets leveraging pseudo-labeled annotations to produce a…Apple Machine Learning Research

TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization

April 15, 2025

by Apple

Direct Preference Optimization (DPO) has been widely adopted for preference alignment of Large Language Models (LLMs) due to its simplicity and effectiveness. However, DPO is derived as a bandit problem in which the whole response is treated as a single arm, ignoring the importance differences between tokens, which may affect optimization efficiency and make it difficult to achieve optimal results. In this work, we propose that the optimal data for DPO has equal expected rewards for each token in winning and losing responses, as there is no difference in token importance. However, since the…Apple Machine Learning Research
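To make the "whole response as a single arm" point concrete, here is a minimal sketch of the standard sequence-level DPO loss for one preference pair. It is not the TIS-DPO method from the paper, whose token weighting is not described in this excerpt. The function name `dpo_loss`, its arguments, and the default `beta` are illustrative assumptions.

```python
import math


def dpo_loss(policy_logps_w, ref_logps_w, policy_logps_l, ref_logps_l, beta=0.1):
    """Standard sequence-level DPO loss for a single preference pair.

    Each argument is a list of per-token log-probabilities for the winning (w)
    or losing (l) response under the policy or the frozen reference model.
    All names here are hypothetical and chosen for illustration only.
    """
    # Summing over tokens collapses each response into one log-ratio,
    # so every token contributes with the same weight.
    log_ratio_w = sum(policy_logps_w) - sum(ref_logps_w)
    log_ratio_l = sum(policy_logps_l) - sum(ref_logps_l)
    logits = beta * (log_ratio_w - log_ratio_l)

    # -log(sigmoid(logits)) written as a numerically stable softplus(-logits).
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and the reference agree on both responses, the logit is zero and the loss equals log 2, roughly 0.693. The per-token sums are where a token-level weighting scheme would have to intervene.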

EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing

April 15, 2025

by Apple

Diffusion transformers have been widely adopted for text-to-image synthesis. While scaling these models up to billions of parameters shows promise, the effectiveness of scaling beyond current sizes remains underexplored and challenging. By explicitly exploiting the computational heterogeneity of image generations, we develop a new family of Mixture-of-Experts (MoE) models (EC-DIT) for diffusion transformers with expert-choice routing. EC-DIT learns to adaptively optimize the compute allocated to understand the input texts and generate the respective image patches, enabling heterogeneous…Apple Machine Learning Research
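As a rough illustration of expert-choice routing in general, the sketch below lets each expert pick its highest-scoring tokens instead of letting each token pick experts. It is not EC-DIT's actual architecture, which this excerpt does not detail. The raw score matrix, the `capacity` parameter, and both function names are assumptions made for the example.

```python
def expert_choice_route(scores, capacity):
    """Let each expert select its top-`capacity` tokens.

    scores[t][e] is a hypothetical affinity of token t for expert e
    (in a real model this would come from a learned router).
    Returns one sorted list of token indices per expert.
    """
    num_tokens = len(scores)
    num_experts = len(scores[0])
    assignment = []
    for e in range(num_experts):
        # Rank all tokens by their affinity for this expert, highest first.
        ranked = sorted(range(num_tokens), key=lambda t: scores[t][e], reverse=True)
        assignment.append(sorted(ranked[:capacity]))
    return assignment


def experts_per_token(assignment, num_tokens):
    """Count how many experts processed each token."""
    counts = [0] * num_tokens
    for chosen in assignment:
        for t in chosen:
            counts[t] += 1
    return counts


scores = [
    [0.9, 0.8],  # token 0: strong affinity for both experts
    [0.1, 0.7],  # token 1
    [0.5, 0.2],  # token 2
    [0.3, 0.1],  # token 3: weak affinity for both experts
]
routing = expert_choice_route(scores, capacity=2)
compute = experts_per_token(routing, num_tokens=len(scores))
```

Because the experts do the choosing, every expert handles exactly `capacity` tokens, while individual tokens can receive more, less, or no expert compute. That uneven allocation is the kind of heterogeneity the abstract refers to.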

Understanding Aggregate Trends for Apple Intelligence Using Differential Privacy

April 14, 2025

by Apple

At Apple, we believe privacy is a fundamental human right. And we believe in giving our users a great experience while protecting their privacy. For years, we’ve used techniques like differential privacy as part of our opt-in device analytics program. This lets us gain insights into how our products are used, so we can improve them, while protecting user privacy by preventing Apple from seeing individual-level data from those users.
This same need to understand usage while protecting privacy is also present in Apple Intelligence. One of our principles is that Apple does not use our users’…Apple Machine Learning Research

FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations

April 14, 2025

by Apple

This paper was accepted at the Workshop on Foundation Models in the Wild at ICLR 2025.
Visual understanding is inherently contextual – what we focus on in an image depends on the task at hand. For instance, given an image of a person holding a bouquet of flowers, we may focus on either the person such as their clothing, or the type of flowers, depending on the context of interest. Yet, most existing image encoding paradigms represent an image as a fixed, generic feature vector, overlooking the potential needs of prioritizing varying visual information for different downstream use cases. In…Apple Machine Learning Research

MM-Ego: Towards Building Egocentric Multimodal LLMs

April 11, 2025

by Apple

This research aims to comprehensively explore building a multimodal foundation model for egocentric video understanding. To achieve this goal, we work on three fronts. First, as there is a lack of QA data for egocentric video understanding, we automatically generate 7M high-quality QA samples for egocentric videos ranging from 30 seconds to one hour long in Ego4D based on human-annotated data. This is one of the largest egocentric QA datasets. Second, we contribute a challenging egocentric QA benchmark with 629 videos and 7,026 questions to evaluate the models’ ability in recognizing and…Apple Machine Learning Research

Language Models Know More Than They Show: Exploring Hallucinations From the Model’s Viewpoint

April 11, 2025

by Apple

Large language models (LLMs) often produce errors, including factual inaccuracies, biases, and reasoning failures, collectively referred to as “hallucinations”. Recent studies have demonstrated that LLMs’ internal states encode information regarding the truthfulness of their outputs, and that this information can be utilized to detect errors. In this work, we show that the internal representations of LLMs encode much more information about truthfulness than previously recognized. We first discover that the truthfulness information is concentrated in specific tokens, and leveraging this…Apple Machine Learning Research

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

April 10, 2025

by Apple

Building a generalist model for user interface (UI) understanding is challenging due to various foundational issues, such as platform diversity, resolution variation, and data limitation. In this paper, we introduce Ferret-UI 2, a multimodal large language model (MLLM) designed for universal UI understanding across a wide range of platforms, including iPhone, Android, iPad, Webpage, and AppleTV. Building on the foundation of Ferret-UI, Ferret-UI 2 introduces three key innovations: support for multiple platform types, high-resolution perception through adaptive scaling, and advanced task…Apple Machine Learning Research

Controlling Language and Diffusion Models by Transporting Activations

April 10, 2025

by Apple

Large generative models are becoming increasingly capable and more widely deployed to power production applications, but getting these models to produce exactly what’s desired can still be challenging. Fine-grained control over these models’ outputs is important to meet user expectations and to mitigate potential misuses, ensuring the models’ reliability and safety. To address these issues, Apple machine learning researchers have developed a new technique that is modality-agnostic and provides fine-grained control over the model’s behavior with negligible computational overhead, while…Apple Machine Learning Research

Adaptive Batch Size for Privately Finding Second-order Stationary Points

April 10, 2025

by Apple

There is a gap between finding a first-order stationary point (FOSP) and a second-order stationary point (SOSP) under differential privacy constraints, and it remains unclear whether privately finding an SOSP is more challenging than finding an FOSP. Specifically, Ganesh et al. (2023) claimed that an $\alpha$-SOSP can be found with $\alpha=\tilde{O}\left(\frac{1}{n^{1/3}}+\left(\frac{\sqrt{d}}{n\epsilon}\right)^{3/7}\right)$, where $n$ is the dataset size, $d$ is the dimension, and $\epsilon$ is the differential privacy parameter.
However, a recent analysis revealed an issue…Apple Machine Learning Research

Copyright © 2019-2025 Vedere AI. All Rights Reserved.