Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging

Large-scale models are routinely trained on a mixture of different data sources.
Different data mixtures can yield very different downstream performance.
We propose a novel architecture that can instantiate one model for each data mixture without having to re-train the model.
Our architecture consists of a bank of expert weights, which are linearly combined to instantiate one model.
We learn the linear combination coefficients as a function of the histogram that specifies the desired data mixture.
To train this architecture, we sample a random histogram, instantiate the corresponding model, and backpropagate through one batch of data…
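
Below is a minimal PyTorch sketch of this idea, not the authors' implementation. It uses a tiny two-layer MLP whose parameters live in per-expert banks, a small coefficient network that maps a data-mixture histogram to combination coefficients, and a training step that samples a random histogram, instantiates the model, and backpropagates through one batch. All names and dimensions here (`SoupOfExperts`, `coef_net`, `n_experts`, `n_domains`, the MLP shapes) are illustrative assumptions.

```python
# Minimal sketch of a soup-of-experts: a bank of expert weights is linearly
# combined into one model, with coefficients predicted from a mixture histogram.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoupOfExperts(nn.Module):
    """Bank of expert weights, linearly combined to instantiate one model."""

    def __init__(self, in_dim, hidden_dim, out_dim, n_domains, n_experts):
        super().__init__()
        # Expert banks: each parameter gets a leading dimension of size n_experts.
        self.w1 = nn.Parameter(torch.randn(n_experts, hidden_dim, in_dim) * 0.02)
        self.b1 = nn.Parameter(torch.zeros(n_experts, hidden_dim))
        self.w2 = nn.Parameter(torch.randn(n_experts, out_dim, hidden_dim) * 0.02)
        self.b2 = nn.Parameter(torch.zeros(n_experts, out_dim))
        # Coefficient network: maps a data-mixture histogram to expert coefficients.
        self.coef_net = nn.Sequential(
            nn.Linear(n_domains, 64), nn.ReLU(), nn.Linear(64, n_experts)
        )

    def instantiate(self, histogram):
        """Combine the expert banks with coefficients predicted from the histogram."""
        alpha = self.coef_net(histogram)  # shape: (n_experts,)
        combine = lambda bank: torch.einsum("e,e...->...", alpha, bank)
        return combine(self.w1), combine(self.b1), combine(self.w2), combine(self.b2)

    def forward(self, x, histogram):
        w1, b1, w2, b2 = self.instantiate(histogram)
        h = F.relu(F.linear(x, w1, b1))
        return F.linear(h, w2, b2)


def train_step(model, optimizer, batch_x, batch_y, n_domains):
    """One step: sample a random histogram, instantiate, backprop through one batch.

    In the setup described above the batch would be drawn according to the sampled
    histogram; here the batch is passed in for brevity.
    """
    histogram = torch.distributions.Dirichlet(torch.ones(n_domains)).sample()
    logits = model(batch_x, histogram)
    loss = F.cross_entropy(logits, batch_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At deployment time, under these assumptions, a specialist model for any target mixture is obtained by calling `instantiate` once with that mixture's histogram, with no further training.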