Apple – Page 8 – Vedere AI

Simple ReFlow: Improved Techniques for Fast Flow Models

April 10, 2025

by Apple

Diffusion and flow-matching models achieve remarkable generative performance but at the cost of many sampling steps, this slows inference and limits applicability to time-critical tasks. The ReFlow procedure can accelerate sampling by straightening generation trajectories. However, ReFlow is an iterative procedure, typically requiring training on simulated data, and results in reduced sample quality. To mitigate sample deterioration, we examine the design space of ReFlow and highlight potential pitfalls in prior heuristic practices. We then propose seven improvements for training dynamics…Apple Machine Learning Research

Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling

April 10, 2025

by Apple

Specialist language models (LMs) focus on a specific task or domain on which they often outperform generalist LMs of the same size. However, the specialist data needed to pretrain these models is only available in limited amount for most tasks. In this work, we build specialist models from large generalist training sets instead. We adjust the training distribution of the generalist data with guidance from the limited domain-specific data. We explore several approaches, with clustered importance sampling standing out. This method clusters the generalist dataset and samples from these clusters…Apple Machine Learning Research

The AdEMAMix Optimizer: Better, Faster, Older

April 10, 2025

by Apple

Momentum based optimizers are central to a wide range of machine learning applications. These typically rely on an Exponential Moving Average (EMA) of gradients, which decays exponentially the present contribution of older gradients. This accounts for gradients being local linear approximations which lose their relevance as the iterate moves along the loss landscape. This work questions the use of a single EMA to accumulate past gradients and empirically demonstrates how this choice can be sub-optimal: a single EMA cannot simultaneously give a high weight to the immediate past, and a…Apple Machine Learning Research

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

April 10, 2025

by Apple

Building a generalist model for user interface (UI) understanding is challenging due to various foundational issues, such as platform diversity, resolution variation, and data limitation. In this paper, we introduce Ferret-UI 2, a multimodal large language model (MLLM) designed for universal UI understanding across a wide range of platforms, including iPhone, Android, iPad, Webpage, and AppleTV. Building on the foundation of Ferret-UI, Ferret-UI 2 introduces three key innovations: support for multiple platform types, high-resolution perception through adaptive scaling, and advanced task…Apple Machine Learning Research

Controlling Language and Diffusion Models by Transporting Activations

April 10, 2025

by Apple

Large generative models are becoming increasingly capable and more widely deployed to power production applications, but getting these models to produce exactly what’s desired can still be challenging. Fine-grained control over these models’ outputs is important to meet user expectations and to mitigate potential misuses, ensuring the models’ reliability and safety. To address these issues, Apple machine learning researchers have developed a new technique that is modality-agnostic and provides fine-grained control over the model’s behavior with negligible computational overhead, while…Apple Machine Learning Research

Adaptive Batch Size for Privately Finding Second-order Stationary Points

April 10, 2025

by Apple

There is a gap between finding a first-order stationary point (FOSP) and a second-order stationary point (SOSP) under differential privacy constraints, and it remains unclear whether privately finding an SOSP is more challenging than finding an FOSP. Specifically, Ganesh et al. (2023) claimed that an αalphaα-SOSP can be found with α=O~(1n1/3+(dnϵ)3/7)alpha=tilde{O}(frac{1}{n^{1/3}}+(frac{sqrt{d}}{nepsilon})^{3/7})α=O~(n1/31+(nϵd)3/7), where nnn is the dataset size, ddd is the dimension, and ϵepsilonϵ is the differential privacy parameter.
However, a recent analysis revealed an issue…Apple Machine Learning Research

TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining

April 9, 2025

by Apple

This paper was accepted at the Scalable Continual Learning for Lifelong Foundation Models (SCLLFM) Workshop at NeurIPS 2024.
Large Language Models (LLMs) trained on historical web data inevitably become outdated. We investigate evaluation strategies and update methods for LLMs as new data becomes available. We introduce a web-scale dataset for time-continual pretraining of LLMs derived from 114 dumps of Common Crawl (CC) – orders of magnitude larger than previous continual language modeling benchmarks. We also design time-stratified evaluations across both general CC data and specific domains…Apple Machine Learning Research

Do LLMs Estimate Uncertainty Well in Instruction-Following?

April 8, 2025

by Apple

Large language models (LLMs) could be valuable personal AI agents across various domains, provided they can precisely follow user instructions. However, recent studies have shown significant limitations in LLMs’ instruction-following capabilities, raising concerns about their reliability in high-stakes applications. Accurately estimating LLMs’ uncertainty in adhering to instructions is critical to mitigating deployment risks. We present, to our knowledge, the first systematic evaluation of uncertainty estimation abilities of LLMs in the context of instruction-following. Our study identifies…Apple Machine Learning Research

Revisit Large-Scale Image–Caption Data in Pre-training Multimodal Foundation Models

April 8, 2025

by Apple

Recent advancements in multimodal models highlight the value of rewritten captions for improving performance, yet key challenges remain. Notably, the role of synthetic captions and their interaction with original web-crawled AltTexts in pre-training is still unclear. Additionally, different multimodal foundation models may have distinct preferences for specific caption formats while the efforts of studying the optimal captions for each foundation model remain limited. In this work, we introduce a novel, controllable, and scalable captioning pipeline that generates diverse caption formats…Apple Machine Learning Research

Apple Workshop on Natural Language Understanding 2024

April 7, 2025

by Apple

Progress in natural language processing enables more intuitive ways of interacting with technology. For example, many of Apple’s products and services, including Siri and search, use natural language understanding and generation to enable a fluent and seamless interface experience for users. Natural language is a rapidly moving area of machine learning research, and includes work on large-scale data curation across multiple languages, novel architectures and algorithms, and new evaluation regimes, all of which involve important issues of privacy and security, as well as of performance and…Apple Machine Learning Research

Vedere AI

Posts in category: Apple

Simple ReFlow: Improved Techniques for Fast Flow Models

Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling

The AdEMAMix Optimizer: Better, Faster, Older

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Controlling Language and Diffusion Models by Transporting Activations

Adaptive Batch Size for Privately Finding Second-order Stationary Points

TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining

Do LLMs Estimate Uncertainty Well in Instruction-Following?

Revisit Large-Scale Image–Caption Data in Pre-training Multimodal Foundation Models

Apple Workshop on Natural Language Understanding 2024

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.