SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking

In-context learning with Large Language Models (LLMs) has emerged as a promising avenue of research in Dialog State Tracking (DST). However, the best-performing in-context learning methods involve retrieving and adding similar examples to the prompt, requiring access to labeled training data. Procuring such training data for a wide range of domains and applications is time-consuming, expensive, and, at times, infeasible. While zero-shot learning requires no training data, it significantly lags behind the few-shot setup. Thus, ‘Can we efficiently generate synthetic data for any dialogue schema…Apple Machine Learning Research

Keyframer: Empowering Animation Design using Large Language Models

Large language models (LLMs) have the potential to impact a wide range of creative domains, as exemplified in popular text-to-image generators like DALL·E and Midjourney. However, the application of LLMs to motion-based visual design has not yet been explored and presents novel challenges, such as how users might effectively describe motion in natural language. Further, many existing generative design tools lack support for iterative refinement of designs beyond prompt engineering. In this paper, we present Keyframer, a design tool that leverages the code generation capabilities of LLMs to…

Resource-constrained Stereo Singing Voice Cancellation

We study the problem of stereo singing voice cancellation, a subtask of music source separation, whose goal is to estimate an instrumental background from a stereo mix. We explore how to achieve performance similar to large state-of-the-art source separation networks starting from a small, efficient model for real-time speech separation. Such a model is useful when memory and compute are limited and singing voice processing has to run with limited look-ahead. In practice, this is realised by adapting an existing mono model to handle stereo input. Improvements in quality are obtained by tuning…

Efficient ConvBN Blocks for Transfer Learning and Beyond

Convolution-BatchNorm (ConvBN) blocks are integral components in various computer vision tasks and other domains. A ConvBN block can operate in three modes: Train, Eval, and Deploy. While the Train mode is indispensable for training models from scratch, the Eval mode is suitable for transfer learning and beyond, and the Deploy mode is designed for the deployment of models. This paper focuses on the trade-off between stability and efficiency in ConvBN blocks: Deploy mode is efficient but suffers from training instability; Eval mode is widely used in transfer learning but lacks efficiency. To…
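
As a rough illustration of why a Deploy-style block can be cheaper than an Eval-style one, the sketch below folds fixed BatchNorm statistics into the weights of a 1x1 convolution and checks that both paths give the same output. It assumes that Deploy mode amounts to this kind of folding, which the excerpt above does not spell out; the function names and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def conv1x1(weights, bias, x):
    """Apply a 1x1 convolution at one spatial position (a per-pixel linear map)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def batchnorm_eval(y, gamma, beta, mean, var, eps=1e-5):
    """Normalize each channel with fixed running statistics (Eval-style)."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fold_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the BatchNorm affine map into the convolution (assumed Deploy-style)."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    folded_w = [[s * w for w in row] for row, s in zip(weights, scale)]
    folded_b = [s * (b - m) + bt
                for s, b, m, bt in zip(scale, bias, mean, beta)]
    return folded_w, folded_b


# Hypothetical parameters: 2 input channels, 2 output channels.
weights = [[0.5, -1.0], [2.0, 0.25]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [1.0, 4.0]
x = [1.0, 2.0]

# Eval-style: convolution followed by a separate normalization step.
eval_out = batchnorm_eval(conv1x1(weights, bias, x), gamma, beta, mean, var)

# Deploy-style: a single convolution with pre-folded parameters.
folded_w, folded_b = fold_bn(weights, bias, gamma, beta, mean, var)
deploy_out = conv1x1(folded_w, folded_b, x)

# The two paths agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(eval_out, deploy_out))
```

The folded path performs one linear map per position instead of a linear map plus a normalization, which is the efficiency gap this toy example is meant to show. It says nothing about the training-instability side of the trade-off.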

Scalable Pre-training of Large Autoregressive Image Models

This paper introduces AIM, a collection of vision models pre-trained with an autoregressive objective. These models are inspired by their textual counterparts, i.e., Large Language Models (LLMs), and exhibit similar scaling properties. Specifically, we highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data, (2) the value of the objective function correlates with the performance of the model on downstream tasks. We illustrate the practical implication of these findings by pre-training a 7 billion parameter AIM on 2…

User-level Differentially Private Stochastic Convex Optimization: Efficient Algorithms with Optimal Rates

We study differentially private stochastic convex optimization (DP-SCO) under user-level privacy where each user may hold multiple data items. Existing work for user-level DP-SCO either requires super-polynomial runtime or requires a number of users that grows polynomially with the dimensionality of the problem. We develop new algorithms for user-level DP-SCO that obtain optimal rates, run in polynomial time, and require a number of users that grows logarithmically in the dimension. Moreover, our algorithms are the first to obtain optimal rates for non-smooth functions in polynomial time. These…

Acoustic Model Fusion for End-to-end Speech Recognition

Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted its accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven to be beneficial. However, the application of LM fusion presents certain drawbacks, such as its inability to address the domain…

Investigating Salient Representations and Label Variance Modeling in Dimensional Speech Emotion Analysis

Representations from models such as Bidirectional Encoder Representations from Transformers (BERT) and Hidden units BERT (HuBERT) have helped to achieve state-of-the-art performance in dimensional speech emotion recognition. Both HuBERT and BERT models generate fairly high-dimensional representations, and such models were not trained with the emotion recognition task in mind. Such high-dimensional representations result in speech emotion models with a large number of parameters, which increases both memory and computational cost. In this work, we investigate the selection of representations…

Co-ML: Collaborative Machine Learning Model Building for Developing Dataset Design Practices

Machine learning (ML) models are fundamentally shaped by data, and building inclusive ML systems requires significant considerations around how to design representative datasets. Yet, few novice-oriented ML modeling tools are designed to foster hands-on learning of dataset design practices, including how to design for data diversity and inspect for data quality.
To this end, we outline a set of four data design practices (DDPs) for designing inclusive ML models and share how we designed a tablet-based application called Co-ML to foster the learning of DDPs through a collaborative ML model…

Hybrid Model Learning for Cardiovascular Biomarkers Inference

This paper was accepted at the workshop Deep Generative Models for Health at NeurIPS 2023.
Cardiovascular diseases (CVDs) are a major global health concern, making the longitudinal monitoring of cardiovascular biomarkers vital for early diagnosis and intervention. A core challenge is the inference of cardiac pulse parameters from pulse waves, especially when acquired from wearable sensors at peripheral body locations. Traditional machine learning (ML) approaches face hurdles in this context due to the scarcity of labeled data, primarily sourced from clinical settings. Simultaneously, physical…