Apple – Page 31 – Vedere AI

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

May 7, 2024

by Apple

This paper has been accepted at the Data Problems for Foundation Models workshop at ICLR 2024.
Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased. Current scaling laws show that learning from such data requires an abundance of both compute and data, which grows with the size of the model being trained. This is infeasible both because of the large compute costs and duration associated with pre-training, and the impending scarcity of high-quality data on the web. In this work, we propose Web Rephrase Augmented Pre-training…Apple Machine Learning Research

ACM Human-Computer Interaction conference (CHI) 2024

May 6, 2024

by Apple

Apple Machine Learning Research

International Conference on Learning Representations (ICLR) 2024

May 6, 2024

by Apple

Apple Machine Learning Research

Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models

May 6, 2024

by Apple

Vision Foundation Models (VFMs) pretrained on massive datasets exhibit impressive performance on various downstream tasks, especially with limited labeled target data. However, due to their high inference compute cost, these models cannot be deployed for many real-world applications. Motivated by this, we ask the following important question, “How can we leverage the knowledge from a large VFM to train a small task-specific model for a new target task with limited labeled training data?”, and propose a simple task-oriented knowledge transfer approach as a highly effective solution to this…Apple Machine Learning Research

Conformal Prediction via Regression-as-Classification

May 3, 2024

by Apple

Conformal prediction (CP) for regression can be challenging, especially when the output distribution is heteroscedastic, multimodal, or skewed. Some of the issues can be addressed by estimating a distribution over the output, but in reality, such approaches can be sensitive to estimation error and yield unstable intervals. Here, we circumvent the challenges by converting regression to a classification problem and then use CP for classification to obtain CP sets for regression. To preserve the ordering of the continuous-output space, we design a new loss function and make necessary…Apple Machine Learning Research
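
The regression-as-classification idea in this excerpt can be illustrated with a minimal sketch. It is not the paper's method: the excerpt is truncated before the loss function is described, so the code below uses plain split conformal prediction for classification over a binned output range. The function `toy_bin_probs` is a hypothetical stand-in for a trained classifier, and the bin edges, noise level, and `alpha` are assumptions chosen for illustration.

```python
import math
import random


def bin_index(y, edges):
    """Index of the bin containing y; values outside the range are clipped."""
    for k in range(len(edges) - 2):
        if y < edges[k + 1]:
            return k
    return len(edges) - 2


def toy_bin_probs(x, centers):
    """Hypothetical stand-in for a trained classifier: softmax over bins."""
    logits = [-(c - x) ** 2 for c in centers]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conformal_threshold(scores, alpha):
    """Split-conformal quantile with the finite-sample (n + 1) correction."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return float("inf") if rank > n else sorted(scores)[rank - 1]


def prediction_bins(probs, qhat):
    """Keep every bin whose nonconformity score 1 - p is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= qhat]


random.seed(0)
edges = [float(e) for e in range(11)]    # assumed output range [0, 10], 10 bins
centers = [e + 0.5 for e in edges[:-1]]


def sample():
    """Assumed toy data: y equals x plus Gaussian noise."""
    x = random.uniform(0.0, 10.0)
    return x, x + random.gauss(0.0, 0.5)


alpha = 0.1                              # target miscoverage rate (assumption)
calibration = [sample() for _ in range(500)]
scores = [
    1.0 - toy_bin_probs(x, centers)[bin_index(y, edges)]
    for x, y in calibration
]
qhat = conformal_threshold(scores, alpha)

# Empirical coverage on fresh points drawn from the same distribution.
test_points = [sample() for _ in range(2000)]
covered = sum(
    bin_index(y, edges) in prediction_bins(toy_bin_probs(x, centers), qhat)
    for x, y in test_points
)
coverage = covered / len(test_points)

# Map the selected bins back to intervals on the continuous output.
x_new = 4.2
intervals = [
    (edges[k], edges[k + 1])
    for k in prediction_bins(toy_bin_probs(x_new, centers), qhat)
]
```

The prediction set is a union of bin intervals, so it can in principle be non-contiguous, which is one way a classification view can represent multimodal outputs.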

Pseudo-Generalized Dynamic View Synthesis from a Video

May 2, 2024

by Apple

Rendering scenes observed in a monocular video from novel viewpoints is a challenging problem. For static scenes the community has studied both scene-specific optimization techniques, which optimize on every test scene, and generalized techniques, which only run a deep net forward pass on a test scene. In contrast, for dynamic scenes, scene-specific optimization techniques exist, but, to our best knowledge, there is currently no generalized method for dynamic novel view synthesis from a given monocular video. To explore whether generalized dynamic novel view synthesis from monocular…Apple Machine Learning Research

ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models

May 2, 2024

by Apple

Large Language Models (LLMs) with billions of parameters have drastically transformed AI applications. However, their demanding computation during inference has raised significant challenges for deployment on resource-constrained devices. Despite recent trends favoring alternative activation functions such as GELU or SiLU, known for increased computation, this study strongly advocates for reinstating ReLU activation in LLMs. We demonstrate that using the ReLU activation function has a negligible impact on convergence and performance while significantly reducing computation and weight transfer…Apple Machine Learning Research
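
A small sketch may help show why ReLU's exact zeros matter for computation and weight transfer. It is only an illustration under assumptions, not the paper's implementation: the pre-activations are random Gaussian values (so roughly half are negative by construction), and `sparse_project` is a hypothetical helper that skips weight rows whose activation is exactly zero.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def zero_fraction(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0.0) / len(values)


def sparse_project(acts, weight_rows):
    """Compute sum_i acts[i] * weight_rows[i], skipping zero activations.

    Returns the output vector and the number of weight rows actually read.
    """
    out = [0.0] * len(weight_rows[0])
    rows_read = 0
    for a, row in zip(acts, weight_rows):
        if a == 0.0:
            continue  # this row never needs to be loaded or multiplied
        rows_read += 1
        for j, w in enumerate(row):
            out[j] += a * w
    return out, rows_read


random.seed(0)
pre_acts = [random.gauss(0.0, 1.0) for _ in range(1000)]  # toy pre-activations
relu_acts = [relu(x) for x in pre_acts]
gelu_acts = [gelu(x) for x in pre_acts]

relu_sparsity = zero_fraction(relu_acts)  # about half, given symmetric inputs
gelu_sparsity = zero_fraction(gelu_acts)  # small negatives, not exact zeros

# Toy projection: one weight row per activation, 4 output dimensions.
weights = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in pre_acts]
sparse_out, rows_read = sparse_project(relu_acts, weights)
dense_out = [
    sum(a * row[j] for a, row in zip(relu_acts, weights)) for j in range(4)
]
```

The sparse and dense results agree, but the sparse path reads only the rows paired with nonzero activations. The sparsity level here reflects the toy input distribution and says nothing about the levels reported for real models.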

Guiding Instruction-based Image Editing via Multimodal Large Language Models

May 2, 2024

by Apple

Instruction-based image editing improves the controllability and flexibility of image manipulation via natural commands without elaborate descriptions or regional masks. However, human instructions are sometimes too brief for current methods to capture and follow. Multimodal large language models (MLLMs) show promising capabilities in cross-modal understanding and visual-aware response generation via LMs. We investigate how MLLMs facilitate edit instructions and present MLLM-Guided Image Editing (MGIE). MGIE learns to derive expressive instructions and provides explicit guidance. The editing…Apple Machine Learning Research

MOFI: Learning Image Representation from Noisy Entity Annotated Images

May 1, 2024

by Apple

In this paper, we introduce a novel approach to automatically assign entity labels to images from existing noisy image-text pairs. The approach employs a named entity recognition model to extract entities from text, and uses a CLIP model to select the right entities as the labels of the paired image. The approach is simple, and can be readily scaled up to billions of image-text pairs mined from the web, through which we have successfully created a dataset with 2 million distinct entities. We study new training approaches on the collected new dataset with large scale entity labels…Apple Machine Learning Research
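
The entity-selection step described in this excerpt can be sketched as a similarity ranking. This is an assumption-laden illustration, not the paper's pipeline: the embeddings below are made-up three-dimensional vectors standing in for CLIP image and text embeddings, the candidate names are hypothetical outputs of a named entity recognition step, and picking the single highest cosine similarity is one plausible selection rule.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_entity(image_emb, entity_embs):
    """Return the candidate entity whose embedding best matches the image."""
    return max(entity_embs, key=lambda name: cosine(image_emb, entity_embs[name]))


# Hypothetical embeddings; a real system would obtain these from a CLIP model.
image_emb = [0.9, 0.1, 0.0]
entity_embs = {
    "Golden Gate Bridge": [1.0, 0.0, 0.0],
    "San Francisco": [0.5, 0.5, 0.7],
}

label = select_entity(image_emb, entity_embs)
```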

Large Language Models as Generalizable Policies for Embodied Tasks

May 1, 2024

by Apple

We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment. Using reinforcement learning, we train LLaRP to see and act solely through environmental interactions. We show that LLaRP is robust to complex paraphrasings of task instructions and can generalize to new tasks that require novel optimal behavior. In…Apple Machine Learning Research



Copyright © 2019-2025 Vedere AI. All Rights Reserved.