The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes before providing answers. While these models demonstrate improved performance on reasoning benchmarks, their fundamental capabilities, scaling properties, and limitations remain insufficiently understood. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from data contamination and does not provide insights into the reasoning traces’…Apple Machine Learning Research

Improve Vision Language Model Chain-of-thought Reasoning

Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness. However, current training recipes often rely on datasets dominated by short annotations with minimal rationales. In this work, we show that training VLMs on short answers leads to poor generalization on reasoning tasks that require more detailed explanations. To address this limitation, we propose a two-stage post-training strategy that extends the usage of short answer data for enhanced CoT reasoning. First, we augment short answers with CoT reasoning generated by…Apple Machine Learning Research
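
The abstract is cut off before naming the model that generates the rationales, so the sketch below keeps that as an explicit placeholder. It shows only the shape of the stage-one augmentation: expand a short VQA annotation into a CoT training target that ends in the known answer. The prompt wording and the teacher_generate stub are assumptions for illustration, not the paper's recipe.

def teacher_generate(prompt: str) -> str:
    """Placeholder for a call to whatever stronger model writes the rationales
    (the abstract truncates before naming it)."""
    raise NotImplementedError("plug in a model call here")

def augment_with_cot(question: str, short_answer: str) -> dict:
    """Hedged sketch of stage 1: turn a short annotation into a CoT example
    whose rationale is constrained to end in the known short answer."""
    prompt = (
        f"Question: {question}\n"
        f"Known answer: {short_answer}\n"
        "Explain step by step how to reach this answer from the image."
    )
    rationale = teacher_generate(prompt)
    return {"question": question,
            "target": f"{rationale}\nFinal answer: {short_answer}"}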

Voice Quality Dimensions as Interpretable Primitives for Speaking Style for Atypical Speech and Affect

Perceptual voice quality dimensions describe key characteristics of atypical speech and other speech modulations. Here we develop and evaluate voice quality models for seven voice and speech dimensions (intelligibility, imprecise consonants, harsh voice, naturalness, monoloudness, monopitch, and breathiness). Probes were trained on the public Speech Accessibility Project (SAP) dataset with 11,184 samples from 434 speakers, using embeddings from frozen pre-trained models as features. We found that our probes had both strong performance and strong generalization across speech elicitation…Apple Machine Learning Research
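
A minimal sketch of the probing setup described above, assuming you already have one embedding per utterance from a frozen pre-trained speech model and a label for one perceptual dimension. The random stand-in data, the probe type, and the hyperparameters are illustrative choices, not the paper's.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-ins for frozen-model embeddings and labels for one dimension
# (e.g., breathiness); real features/labels would come from SAP data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))    # one embedding per utterance
y = rng.integers(0, 2, size=1000)   # binary label for the dimension

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # backbone stays frozen
print(f"held-out accuracy: {probe.score(X_te, y_te):.3f}")

One probe per dimension keeps the setup interpretable: all representational work stays in the frozen model, and the probe only tests whether the dimension is linearly recoverable.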

Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Input Representations Matter

Cross-lingual transfer is a popular approach to increase the amount of training data for NLP tasks in a low-resource context. However, the best strategy to decide which cross-lingual data to include is unclear. Prior research often focuses on a small set of languages from a few language families or a single task. It is still an open question how these findings extend to a wider variety of languages and tasks. In this work, we contribute to this question by analyzing cross-lingual transfer for 263 languages from a wide variety of language families. Moreover, we include three popular NLP tasks…Apple Machine Learning Research
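
To make the selection problem concrete, here is a toy sketch of ranking candidate transfer languages by similarity over typological feature vectors. The languages and vectors are fabricated for the example and stand in for whatever linguistic-similarity measure one actually adopts.

import numpy as np

# Fabricated typological feature vectors; in practice these would come
# from a linguistic feature database, not be hand-written.
features = {
    "swahili": np.array([1.0, 0.0, 1.0, 0.0]),
    "zulu":    np.array([1.0, 0.0, 1.0, 1.0]),
    "turkish": np.array([0.0, 1.0, 1.0, 0.0]),
    "finnish": np.array([0.0, 1.0, 0.0, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

target = "swahili"
ranked = sorted((lang for lang in features if lang != target),
                key=lambda lang: cosine(features[target], features[lang]),
                reverse=True)
print(ranked)  # candidate transfer languages, most similar first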

Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection

Identifying mistakes (i.e., miscues) made while reading aloud is commonly approached post-hoc by comparing automatic speech recognition (ASR) transcriptions to the target reading text. However, post-hoc methods perform poorly when ASR inaccurately transcribes verbatim speech. To improve on current methods for reading error annotation, we propose a novel end-to-end architecture that incorporates the target reading text via prompting and is trained for both improved verbatim transcription and direct miscue detection. Our contributions include: first, demonstrating that…Apple Machine Learning Research
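
For contrast with the proposed end-to-end architecture, here is a minimal sketch of the post-hoc baseline the abstract describes: align an ASR transcript to the target reading text and flag mismatches as candidate miscues. The word-level SequenceMatcher alignment is my choice for illustration, not the paper's method.

from difflib import SequenceMatcher

def posthoc_miscues(target_text: str, asr_transcript: str):
    """Flag insertions/deletions/substitutions between what should have
    been read (target) and what ASR heard, as candidate miscues."""
    ref = target_text.lower().split()
    hyp = asr_transcript.lower().split()
    ops = SequenceMatcher(a=ref, b=hyp).get_opcodes()
    return [(tag, ref[i1:i2], hyp[j1:j2])
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]

# Example: the reader drops "red" and misreads "cap" as "cat".
print(posthoc_miscues("the red cap sat there", "the cat sat there"))

The baseline's weakness is visible in the sketch: if the ASR itself mis-transcribes the verbatim (disfluent) speech, the alignment reports spurious miscues, which is the failure mode the prompted end-to-end model targets.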

Distillation Scaling Laws

We propose a distillation scaling law that estimates distilled model performance based on a compute budget and its allocation between the student and teacher. Our findings mitigate the risks associated with large-scale distillation by enabling compute-optimal allocation for both the teacher and student to maximize student performance. We provide compute-optimal distillation recipes for two key scenarios: when a teacher already exists, and when a teacher needs training. In settings involving many students or an existing teacher, distillation outperforms supervised learning up to a compute level…Apple Machine Learning Research
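
This excerpt does not give the law's functional form, so the following is purely an illustrative sketch: a Chinchilla-style law predicting student loss from student size and distillation token budget, swept over allocations of a fixed compute budget. The coefficients are borrowed from published pretraining fits as stand-ins; the paper's actual distillation law also conditions on the teacher and will differ.

# Hypothetical functional form and stand-in coefficients, illustration only.
def student_loss(n_student: float, d_distill: float,
                 e=1.69, a=406.4, alpha=0.34, b=410.7, beta=0.28) -> float:
    """Toy scaling law: irreducible loss plus terms that shrink with
    student parameters (n_student) and distillation tokens (d_distill)."""
    return e + a / n_student**alpha + b / d_distill**beta

# Sweep allocations of a fixed budget using the common C ~= 6*N*D
# approximation to see where the compute-optimal student sits.
C = 1e21
for n in (1e8, 1e9, 3e9, 1e10):
    d = C / (6 * n)
    print(f"N={n:.0e}  D={d:.0e}  loss={student_loss(n, d):.3f}")

Even with made-up constants, the sweep shows the qualitative point: for a fixed budget, loss is minimized at an intermediate model size rather than at either extreme, which is the kind of allocation question the paper's recipes answer.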

SpeakStream: Streaming Text-to-Speech with Interleaved Data

With the increasing integration of speech front-ends and large language models (LLMs), there is a need to explore architectures that integrate these modalities. While end-to-end models have been explored extensively, cascaded models that stream outputs from LLMs to TTS remain oddly under-explored, even though they are potentially much simpler. Using traditional text-to-speech systems to convert LLM outputs to audio, however, poses a technical problem: they need entire utterances to generate stylistic audio. In this paper we present a ‘streaming’ TTS that can generate audio from…Apple Machine Learning Research
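
A minimal sketch of the cascaded setup being contrasted here: consume tokens as the LLM streams them, cut at phrase boundaries, and hand each chunk to a TTS call so audio starts before the full reply exists. The boundary heuristic and the tts_synthesize stub are assumptions for illustration, not SpeakStream's architecture.

from typing import Iterable, Iterator

def tts_synthesize(text: str) -> bytes:
    """Placeholder for any incremental TTS call (an assumption, not the paper's)."""
    return f"<audio for {text!r}>".encode()

def stream_llm_to_tts(tokens: Iterable[str]) -> Iterator[bytes]:
    """Cascade an LLM token stream into TTS without waiting for the
    whole utterance: buffer tokens, flush at naive phrase boundaries."""
    buffer = []
    for tok in tokens:
        buffer.append(tok)
        if tok.endswith((".", ",", "?", "!")):  # naive boundary heuristic
            yield tts_synthesize(" ".join(buffer))
            buffer = []
    if buffer:  # flush any trailing partial phrase
        yield tts_synthesize(" ".join(buffer))

for chunk in stream_llm_to_tts(["Sure,", "here", "is", "the", "answer."]):
    print(chunk)

The sketch also exposes the quality problem the abstract raises: each chunk is synthesized without the full utterance, so a conventional TTS loses the prosodic context it needs for stylistic audio.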

World-Consistent Video Diffusion With Explicit 3D Modeling

As diffusion models come to dominate visual content generation, efforts have been made to adapt these models for multi-view image generation to create 3D content. Traditionally, these methods implicitly learn 3D consistency by generating only RGB frames, which can lead to artifacts and inefficiencies in training. In contrast, we propose generating Normalized Coordinate Space (NCS) frames alongside RGB frames. NCS frames capture each pixel’s global coordinate, providing strong pixel correspondence and explicit supervision for 3D consistency. Additionally, by jointly estimating RGB and NCS frames…Apple Machine Learning Research
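
As a rough sketch of what an NCS frame encodes (the exact construction isn't given in this excerpt): unproject each pixel to a global 3D point using depth and camera parameters, then normalize by scene bounds so all views share one coordinate space. The pinhole unprojection below is standard; treating it as the paper's precise recipe would be an assumption.

import numpy as np

def ncs_frame(depth, K, cam_to_world, scene_min, scene_max):
    """Map each pixel to its global 3D point, scaled to [0,1]^3 by the
    scene bounds, yielding an HxWx3 coordinate image with explicit pixel
    correspondence across views. Illustrative, not the paper's exact recipe."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)
    rays = pix @ np.linalg.inv(K).T                    # camera-space directions
    cam_pts = rays * depth[..., None]                  # scale by per-pixel depth
    cam_h = np.concatenate([cam_pts, np.ones((h, w, 1))], axis=-1)
    world = (cam_h @ cam_to_world.T)[..., :3]          # into the global frame
    return (world - scene_min) / (scene_max - scene_min)  # normalize to [0,1]

Because two views of the same surface point get the same NCS value, matching coordinates across generated frames gives the explicit 3D-consistency supervision that RGB-only generation only learns implicitly.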

Interleaved Reasoning for Large Language Models via Reinforcement Learning

Long chain-of-thought (CoT) significantly enhances the reasoning capabilities of large language models (LLMs). However, the extensive reasoning traces lead to inefficiencies and an increased time-to-first-token (TTFT). We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions. We observe that models inherently possess the ability to perform interleaved reasoning, which can be further enhanced through RL. We introduce a simple yet effective rule-based reward to incentivize correct intermediate steps…Apple Machine Learning Research
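
The excerpt stops before specifying the reward, so the following is only a hedged guess at the shape of a rule-based scheme for multi-hop questions: partial credit when an intermediate answer emitted mid-reasoning matches a known sub-answer, plus credit for the final answer. The weights and exact-match rule are invented for the example.

def rule_based_reward(emitted_answers, gold_intermediate, gold_final,
                      step_weight=0.3, final_weight=0.7):
    """Toy reward for interleaved thinking/answering on a multi-hop
    question: fraction of correct intermediate answers plus a bonus for
    the final one. Structure and weights are illustrative assumptions."""
    *steps, final = emitted_answers
    hits = sum(s.strip().lower() == g.strip().lower()
               for s, g in zip(steps, gold_intermediate))
    step_score = hits / max(len(gold_intermediate), 1)
    final_score = float(final.strip().lower() == gold_final.strip().lower())
    return step_weight * step_score + final_weight * final_score

# Two-hop example: the model answers each hop as it reasons, then concludes.
print(rule_based_reward(["Paris", "Seine"], ["Paris"], "Seine"))  # 1.0

Emitting intermediate answers as they are resolved is also what improves time-to-first-token: the first useful output arrives before the full reasoning trace is complete.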