Apple – Page 35 – Vedere AI

CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement

May 30, 2024

by Apple

Contrastive language-image pretraining (CLIP) is a standard method for training vision-language models. While CLIP is scalable, promptable, and robust to distribution shifts on image classification tasks, it lacks object localization capabilities. This paper studies the following question: Can we augment CLIP training with task-specific vision models from model zoos to improve its visual representations? Towards this end, we leverage open-source task-specific vision models to generate pseudo-labels for an uncurated and noisy image-text dataset. Subsequently, we train CLIP models on these…

Apple Machine Learning Research

Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching

May 29, 2024

by Apple

This paper was accepted at the Image Matching: Local Features & Beyond workshop at CVPR 2024.
Identifying robust and accurate correspondences across images is a fundamental problem in computer vision that enables various downstream tasks. Recent semi-dense matching methods emphasize the effectiveness of fusing relevant cross-view information through Transformer. In this paper, we propose several improvements upon this paradigm. Firstly, we introduce affine-based local attention to model cross-view deformations. Secondly, we present selective fusion to merge local and global messages from…

Apple Machine Learning Research

KPConvX: Modernizing Kernel Point Convolution with Kernel Attention

May 28, 2024

by Apple

In the field of deep point cloud understanding, KPConv is a unique architecture that uses kernel points to locate convolutional weights in space, instead of relying on Multi-Layer Perceptron (MLP) encodings. While it initially achieved success, it has since been surpassed by recent MLP networks that employ updated designs and training strategies. Building upon the kernel point principle, we present two novel designs: KPConvD (depthwise KPConv), a lighter design that enables the use of deeper architectures, and KPConvX, an innovative design that scales the depthwise convolutional weights of…

Apple Machine Learning Research
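The kernel point principle mentioned in the excerpt can be illustrated with a small sketch. It follows the original KPConv formulation as commonly described (each kernel point carries its own weight matrix, and a neighbor's contribution falls off linearly with its distance to that kernel point). It does not reproduce KPConvD or KPConvX, whose details are cut off in the excerpt. The function name `kpconv_point` and all parameter names are hypothetical, chosen only for illustration.

```python
import math


def kpconv_point(center, neighbors, features, kernel_points, weights, sigma):
    """Kernel point convolution output for a single query point (sketch).

    center:        (x, y, z) position of the query point
    neighbors:     list of (x, y, z) neighbor positions
    features:      list of input feature vectors, one per neighbor (length C_in)
    kernel_points: list of (x, y, z) kernel point offsets relative to center
    weights:       one C_in x C_out matrix (nested lists) per kernel point
    sigma:         influence distance of each kernel point (assumed > 0)
    """
    c_out = len(weights[0][0])
    out = [0.0] * c_out
    for pos, feat in zip(neighbors, features):
        # Neighbor position expressed relative to the query point.
        rel = tuple(p - c for p, c in zip(pos, center))
        for kp, w in zip(kernel_points, weights):
            # Linear influence: 1 at the kernel point, 0 at distance >= sigma.
            h = max(0.0, 1.0 - math.dist(rel, kp) / sigma)
            if h == 0.0:
                continue
            # Accumulate h * (feat @ w) into the output vector.
            for j in range(c_out):
                out[j] += h * sum(feat[i] * w[i][j] for i in range(len(feat)))
    return out


# Two kernel points, one input channel, one output channel.
kernel_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
weights = [[[1.0]], [[2.0]]]
neighbors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
features = [[1.0], [1.0]]
result = kpconv_point((0.0, 0.0, 0.0), neighbors, features,
                      kernel_points, weights, sigma=1.0)
```

In this toy setup each neighbor sits exactly on one kernel point, so each feature is transformed only by that kernel point's weight. This is the sense in which kernel points "locate convolutional weights in space".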

Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications

May 28, 2024

by Apple

We consider the task of animating 3D facial geometry from speech signal. Existing works are primarily deterministic, focusing on learning a one-to-one mapping from speech signal to 3D face meshes on small datasets with limited speakers. While these models can achieve high-quality lip articulation for speakers in the training set, they are unable to capture the full and diverse distribution of 3D facial motions that accompany speech in the real world. Importantly, the relationship between speech and facial motion is one-to-many, containing both inter-speaker and intra-speaker variations and…

Apple Machine Learning Research

ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models

May 28, 2024

by Apple

Modern diffusion-based image generative models have made significant progress and have become a promising way to enrich training data for the object detection task. However, the generation quality and the controllability for complex scenes containing multi-class objects and dense objects with occlusions remain limited. This paper presents ODGEN, a novel method to generate high-quality images conditioned on bounding boxes, thereby facilitating data synthesis for object detection. Given a domain-specific object detection dataset, we first fine-tune a pre-trained diffusion model on both cropped foreground…

Apple Machine Learning Research

Efficient Diffusion Models without Attention

May 28, 2024

by Apple

Transformers have demonstrated impressive performance on class-conditional ImageNet benchmarks, achieving state-of-the-art FID scores. However, their computational complexity increases with transformer depth/width or the number of input tokens and requires patchy approximation to operate on even latent input sequences. In this paper, we address these issues by presenting a novel approach to enhance the efficiency and scalability of image generation models, incorporating state space models (SSMs) as the core component and deviating from the widely adopted transformer-based and U-Net…

Apple Machine Learning Research

Swallowing the Bitter Pill: Simplified Scalable Conformer Generation

May 24, 2024

by Apple

We present a novel way to predict molecular conformers through a simple formulation that sidesteps many of the heuristics of prior works and achieves state-of-the-art results by using the advantages of scale. By training a diffusion generative model directly on 3D atomic positions without making assumptions about the explicit structure of molecules (e.g., modeling torsional angles), we are able to radically simplify structure learning and make it trivial to scale up the model sizes. This model, called Molecular Conformer Fields (MCF), works by parameterizing conformer structures as functions…

Apple Machine Learning Research

On Efficient and Statistical Quality Estimation for Data Annotation

May 22, 2024

by Apple

Annotated data is an essential ingredient to train, evaluate, compare and productionalize machine learning models. It is therefore imperative that annotations are of high quality. For their creation, good quality management and thereby reliable quality estimates are needed. Then, if quality is insufficient during the annotation process, rectifying measures can be taken to improve it. For instance, project managers can use quality estimates to improve annotation guidelines, retrain annotators or catch as many errors as possible before release.
Quality estimation is often performed by having…

Apple Machine Learning Research

Automatic Creative Selection with Cross-Modal Matching

May 20, 2024

by Apple

Application developers advertise their Apps by creating product pages with App images, and bidding on search terms. It is then crucial for App images to be highly relevant to the search terms. Solutions to this problem require an image-text matching model to predict the quality of the match between the chosen image and the search terms. In this work, we present a novel approach to matching an App image to search terms based on fine-tuning a pre-trained LXMERT model. We show that compared to the CLIP model and a baseline using a Transformer model for search terms, and a ResNet model for…

Apple Machine Learning Research

ContextQ: Generated Questions to Support Meaningful Parent-Child Dialogue While Co-Reading

May 20, 2024

by Apple

Much of early literacy education happens at home with caretakers reading books to young children. Prior research demonstrates how having dialogue with children during co-reading can develop critical reading readiness skills, but most adult readers are unsure if and how to lead effective conversations. We present ContextQ, a tablet-based reading application to unobtrusively present auto-generated dialogic questions to caretakers to support this dialogic reading practice. An ablation study demonstrates how our method of encoding educator expertise into the question generation pipeline can…

Apple Machine Learning Research
