Apple – Page 41 – Vedere AI

Multichannel Voice Trigger Detection Based on Transform-average-concatenate

February 26, 2024

by Apple

This paper was accepted at the workshop HSCMA at ICASSP 2024.
Voice triggering (VT) enables users to activate their devices by just speaking a trigger phrase. A front-end system is typically used to perform speech enhancement and/or separation, and produces multiple enhanced and/or separated signals. Since conventional VT systems take only single-channel audio as input, channel selection is performed. A drawback of this approach is that unselected channels are discarded, even if the discarded channels could contain useful information for VT. In this work, we propose multichannel acoustic…Apple Machine Learning Research

SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking

February 26, 2024

by Apple

In-context learning with Large Language Models (LLMs) has emerged as a promising avenue of research in Dialog State Tracking (DST). However, the best-performing in-context learning methods involve retrieving and adding similar examples to the prompt, requiring access to labeled training data. Procuring such training data for a wide range of domains and applications is time-consuming, expensive, and, at times, infeasible. While zero-shot learning requires no training data, it significantly lags behind the few-shot setup. Thus, ‘Can we efficiently generate synthetic data for any dialogue schema…Apple Machine Learning Research
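As a rough illustration of the retrieval-based few-shot setup the excerpt describes (similar labeled examples are retrieved and added to the prompt), here is a minimal sketch. It is not the SynthDST method. The Jaccard word-overlap similarity, the toy example pool, the slot names, and the prompt layout are all assumptions made for illustration only.

```python
def tokenize(text):
    """Lowercase whitespace tokenization into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two utterances (an assumed stand-in
    for whatever retriever a real system would use)."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query, pool, k=2):
    """Return the k labeled examples most similar to the query."""
    ranked = sorted(pool, key=lambda ex: jaccard(query, ex["utterance"]), reverse=True)
    return ranked[:k]


def build_prompt(query, examples):
    """Place the retrieved examples before the query (hypothetical layout)."""
    lines = []
    for ex in examples:
        lines.append(f"User: {ex['utterance']}")
        lines.append(f"State: {ex['state']}")
    lines.append(f"User: {query}")
    lines.append("State:")
    return "\n".join(lines)


# Hypothetical labeled pool; utterances and slot names are invented.
pool = [
    {"utterance": "book a table for two at an italian restaurant",
     "state": "restaurant-food=italian; restaurant-people=2"},
    {"utterance": "i need a taxi to the airport",
     "state": "taxi-destination=airport"},
    {"utterance": "find me a cheap hotel in the north",
     "state": "hotel-price=cheap; hotel-area=north"},
]

query = "book a table for four at a chinese restaurant"
retrieved = retrieve(query, pool, k=2)
prompt = build_prompt(query, retrieved)
print(prompt)
```

The labeled pool is the part that needs training data, which is the cost the excerpt points to.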

Keyframer: Empowering Animation Design using Large Language Models

February 20, 2024

by Apple

Large language models (LLMs) have the potential to impact a wide range of creative domains, as exemplified in popular text-to-image generators like DALL·E and Midjourney. However, the application of LLMs to motion-based visual design has not yet been explored and presents novel challenges such as how users might effectively describe motion in natural language. Further, many existing generative design tools lack support for iterative refinement of designs beyond prompt engineering. In this paper, we present Keyframer, a design tool that leverages the code generation capabilities of LLMs to…Apple Machine Learning Research

Resource-constrained Stereo Singing Voice Cancellation

February 13, 2024

by Apple

We study the problem of stereo singing voice cancellation, a subtask of music source separation, whose goal is to estimate an instrumental background from a stereo mix. We explore how to achieve performance similar to large state-of-the-art source separation networks starting from a small, efficient model for real-time speech separation. Such a model is useful when memory and compute are limited and singing voice processing has to run with limited look-ahead. In practice, this is realised by adapting an existing mono model to handle stereo input. Improvements in quality are obtained by tuning…Apple Machine Learning Research

Efficient ConvBN Blocks for Transfer Learning and Beyond

February 12, 2024

by Apple

Convolution-BatchNorm (ConvBN) blocks are integral components in various computer vision tasks and other domains. A ConvBN block can operate in three modes: Train, Eval, and Deploy. While the Train mode is indispensable for training models from scratch, the Eval mode is suitable for transfer learning and beyond, and the Deploy mode is designed for the deployment of models. This paper focuses on the trade-off between stability and efficiency in ConvBN blocks: Deploy mode is efficient but suffers from training instability; Eval mode is widely used in transfer learning but lacks efficiency. To…Apple Machine Learning Research
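To make the Eval and Deploy modes concrete, the sketch below shows the standard way a BatchNorm layer with fixed running statistics can be folded into the preceding convolution. It illustrates the general folding identity, not the method the paper proposes. A 1x1 convolution on a single pixel is assumed so that the convolution reduces to a matrix-vector product, and all numeric values are made up.

```python
import math


def conv1x1(x, weight, bias):
    """1x1 convolution at one pixel: x has C_in values, weight is C_out x C_in."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def batchnorm_eval(y, gamma, beta, mean, var, eps=1e-5):
    """BatchNorm using fixed running statistics (Eval-mode behaviour)."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the BatchNorm affine map into the convolution (Deploy-mode behaviour)."""
    scale = [g / math.sqrt(s + eps) for g, s in zip(gamma, var)]
    fused_w = [[w * sc for w in row] for row, sc in zip(weight, scale)]
    fused_b = [(b - m) * sc + bt for b, m, sc, bt in zip(bias, mean, scale, beta)]
    return fused_w, fused_b


# Toy parameters (assumed values): 2 input channels, 2 output channels.
weight = [[0.5, -1.0], [2.0, 0.25]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [1.0, 4.0]
x = [1.0, 2.0]

# Eval mode: convolution followed by a separate normalization step.
eval_out = batchnorm_eval(conv1x1(x, weight, bias), gamma, beta, mean, var)

# Deploy mode: a single convolution with pre-folded weights and bias.
fused_w, fused_b = fold_bn(weight, bias, gamma, beta, mean, var)
deploy_out = conv1x1(x, fused_w, fused_b)

print(eval_out, deploy_out)
```

Both paths compute the same function when the statistics are frozen. The difference is that Deploy mode runs one operation instead of two, which is where the efficiency gain comes from.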

Scalable Pre-training of Large Autoregressive Image Models

February 1, 2024

by Apple

This paper introduces AIM, a collection of vision models pre-trained with an autoregressive objective. These models are inspired by their textual counterparts, i.e., Large Language Models (LLMs), and exhibit similar scaling properties. Specifically, we highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data, (2) the value of the objective function correlates with the performance of the model on downstream tasks. We illustrate the practical implication of these findings by pre-training a 7 billion parameter AIM on 2…Apple Machine Learning Research

Acoustic Model Fusion for End-to-end Speech Recognition

January 29, 2024

by Apple

Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted its accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven to be beneficial. However, the application of LM fusion presents certain drawbacks, such as its inability to address the domain…Apple Machine Learning Research

Investigating Salient Representations and Label Variance Modeling in Dimensional Speech Emotion Analysis

January 29, 2024

by Apple

Representations from models such as Bidirectional Encoder Representations from Transformers (BERT) and Hidden units BERT (HuBERT) have helped to achieve state-of-the-art performance in dimensional speech emotion recognition. Both HuBERT and BERT models generate fairly large dimensional representations, and such models were not trained with the emotion recognition task in mind. Such large dimensional representations result in speech emotion models with large parameter size, resulting in both memory and computational cost complexities. In this work, we investigate the selection of representations…Apple Machine Learning Research

Co-ML: Collaborative Machine Learning Model Building for Developing Dataset Design Practices

January 29, 2024

by Apple

Machine learning (ML) models are fundamentally shaped by data, and building inclusive ML systems requires significant considerations around how to design representative datasets. Yet, few novice-oriented ML modeling tools are designed to foster hands-on learning of dataset design practices, including how to design for data diversity and inspect for data quality.
To this end, we outline a set of four data design practices (DDPs) for designing inclusive ML models and share how we designed a tablet-based application called Co-ML to foster the learning of DDPs through a collaborative ML model…Apple Machine Learning Research

User-level Differentially Private Stochastic Convex Optimization: Efficient Algorithms with Optimal Rates

January 29, 2024

by Apple

We study differentially private stochastic convex optimization (DP-SCO) under user-level privacy where each user may hold multiple data items. Existing work for user-level DP-SCO either requires super-polynomial runtime or requires a number of users that grows polynomially with the dimensionality of the problem. We develop new algorithms for user-level DP-SCO that obtain optimal rates, run in polynomial time, and require a number of users that grows logarithmically in the dimension. Moreover, our algorithms are the first to obtain optimal rates for non-smooth functions in polynomial time. These…Apple Machine Learning Research

