We used AI to help airlines choose routes that are less likely to cause contrails, minimizing the environmental impact of flights.
Multimodal medical AI
Medicine is an inherently multimodal discipline. When providing care, clinicians routinely interpret data from a wide range of modalities, including medical images, clinical notes, lab tests, electronic health records, genomics, and more. Over the last decade or so, AI systems have achieved expert-level performance on specific tasks within specific modalities — some AI systems process CT scans, while others analyze high-magnification pathology slides, and still others hunt for rare genetic variations. The inputs to these systems tend to be complex data such as images, and they typically provide structured outputs, whether in the form of discrete grades or dense image segmentation masks. In parallel, the capabilities of large language models (LLMs) have advanced to the point that they have demonstrated comprehension and expertise in medical knowledge, both interpreting and responding in plain language. But how do we bring these capabilities together to build medical AI systems that can leverage information from all these sources?
In today’s blog post, we outline a spectrum of approaches to bringing multimodal capabilities to LLMs and share some exciting results on the tractability of building multimodal medical LLMs, as described in three recent research papers. The papers, in turn, outline how to introduce de novo modalities to an LLM, how to graft a state-of-the-art medical imaging foundation model onto a conversational LLM, and first steps towards building a truly generalist multimodal medical AI system. If successfully matured, multimodal medical LLMs might serve as the basis of new assistive technologies spanning professional medicine, medical research, and consumer applications. As with our prior work, we emphasize the need for careful evaluation of these technologies in collaboration with the medical community and healthcare ecosystem.
A spectrum of approaches
Several methods for building multimodal LLMs have been proposed in recent months [1, 2, 3], and no doubt new methods will continue to emerge for some time. For the purpose of understanding the opportunities to bring new modalities to medical AI systems, we’ll consider three broadly defined approaches: tool use, model grafting, and generalist systems.
Tool use
In the tool use approach, one central medical LLM outsources analysis of data in various modalities to a set of software subsystems independently optimized for those tasks: the tools. The classic example of tool use is teaching an LLM to use a calculator rather than do arithmetic on its own. In the medical space, a medical LLM faced with a chest X-ray could forward that image to a radiology AI system and integrate its response. This could be accomplished via application programming interfaces (APIs) offered by subsystems, or more fancifully, by two medical AI systems with different specializations engaging in a conversation.
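As a concrete (and deliberately simplified) sketch, the tool-use pattern reduces to a dispatch table: a central orchestrator routes each modality to a registered subsystem. Every tool name, function, and return value below is a hypothetical stand-in, not a real medical AI API.

```python
# Hypothetical sketch of the tool-use pattern. The "tools" here are
# trivial stand-ins for independently validated subsystems that a
# central medical LLM could call over an API.

def radiology_tool(image_bytes):
    # Stand-in for a dedicated chest X-ray AI service.
    return {"finding": "placeholder report", "confidence": 0.9}

def lab_tool(panel):
    # Stand-in for a structured lab-value interpreter: flags any
    # analyte whose (toy, unitless) value exceeds 1.0.
    return {"flagged": [name for name, value in panel.items() if value > 1.0]}

TOOLS = {"chest_xray": radiology_tool, "lab_panel": lab_tool}

def answer_query(modality, payload):
    """In a real system the LLM itself would decide which tool to call;
    here dispatch is a simple lookup on the declared modality."""
    if modality not in TOOLS:
        return {"error": f"no tool registered for {modality}"}
    return TOOLS[modality](payload)

result = answer_query("lab_panel", {"troponin": 1.4, "creatinine": 0.8})
# → {"flagged": ["troponin"]}
```

Because each tool is an ordinary callable behind a stable interface, subsystems can be swapped or independently revalidated without retraining anything else — the flexibility benefit described above.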
This approach has some important benefits. It allows maximum flexibility and independence between subsystems, enabling health systems to mix and match products from different tech providers based on the validated performance characteristics of each subsystem. Moreover, human-readable communication channels between subsystems maximize auditability and debuggability. That said, getting the communication right between independent subsystems can be tricky: a narrow interface can limit information transfer, and any interface carries a risk of miscommunication and information loss.
Model grafting
A more integrated approach would be to take a neural network specialized for each relevant domain, and adapt it to plug directly into the LLM — grafting the visual model onto the core reasoning agent. In contrast to tool use where the specific tool(s) used are determined by the LLM, in model grafting the researchers may choose to use, refine, or develop specific models during development. In two recent papers from Google Research, we show that this is in fact feasible. Neural LLMs typically process text by first mapping words into a vector embedding space. Both papers build on the idea of mapping data from a new modality into the input word embedding space already familiar to the LLM. The first paper, “Multimodal LLMs for health grounded in individual-specific data”, shows that asthma risk prediction in the UK Biobank can be improved if we first train a neural network classifier to interpret spirograms (a modality used to assess breathing ability) and then adapt the output of that network to serve as input into the LLM.
The second paper, “ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders”, takes this same tack, but applies it to full-scale image encoder models in radiology. Starting with a foundation model for understanding chest X-rays, already shown to be a good basis for building a variety of classifiers in this modality, this paper describes training a lightweight medical information adapter that re-expresses the top layer output of the foundation model as a series of tokens in the LLM’s input embedding space. Despite fine-tuning neither the visual encoder nor the language model, the resulting system displays capabilities it wasn’t trained for, including semantic search and visual question answering.
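A toy version of this shared idea — a small trained adapter re-expressing a frozen encoder's output as vectors in the LLM's embedding space — might look as follows. The dimensions, weights, and one-matrix-per-token linear adapter are illustrative assumptions; the adapters in both papers are learned networks trained on real data.

```python
# Toy "medical information adapter": a linear map from a frozen
# encoder's feature vector to k soft tokens in the LLM's input
# embedding space. All dimensions and weights are illustrative.

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def adapter(encoder_features, token_weights):
    """One weight matrix per output token: each maps the feature
    vector (feature_dim) into the embedding space (embed_dim)."""
    return [matvec(w, encoder_features) for w in token_weights]

feature_dim, embed_dim, k = 4, 3, 2
encoder_out = [0.5, -1.0, 0.25, 2.0]   # frozen encoder's top-layer output
token_weights = [[[0.1] * feature_dim for _ in range(embed_dim)]
                 for _ in range(k)]
soft_tokens = adapter(encoder_out, token_weights)
# soft_tokens is a length-2 sequence of 3-dimensional vectors that can
# be concatenated with ordinary word embeddings and fed to the frozen LLM.
```

Only the adapter's weights would be trained; the encoder and LLM on either side stay frozen, which is what keeps the compute cost of grafting modest.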
Our approach to grafting a model works by training a medical information adapter that maps the output of an existing or refined image encoder into an LLM-understandable form.
Model grafting has a number of advantages. It uses relatively modest computational resources to train the adapter layers but allows the LLM to build on existing highly-optimized and validated models in each data domain. The modularization of the problem into encoder, adapter, and LLM components can also facilitate testing and debugging of individual software components when developing and deploying such a system. The corresponding disadvantages are that the communication between the specialist encoder and the LLM is no longer human readable (being a series of high dimensional vectors), and the grafting procedure requires building a new adapter for not just every domain-specific encoder, but also every revision of each of those encoders.
Generalist systems
The most radical approach to multimodal medical AI is to build one integrated, fully generalist system natively capable of absorbing information from all sources. In our third paper in this area, “Towards Generalist Biomedical AI”, rather than having separate encoders and adapters for each data modality, we build on PaLM-E, a recently published multimodal model that is itself a combination of a single LLM (PaLM) and a single vision encoder (ViT). In this setup, text and tabular data modalities are covered by the LLM text encoder, but now all other data are treated as an image and fed to the vision encoder.
Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same model weights.
We specialize PaLM-E to the medical domain by fine-tuning the complete set of model parameters on medical datasets described in the paper. The resulting generalist medical AI system is a multimodal version of Med-PaLM that we call Med-PaLM M. The flexible multimodal sequence-to-sequence architecture allows us to interleave various types of multimodal biomedical information in a single interaction. To the best of our knowledge, it is the first demonstration of a single unified model that can interpret multimodal biomedical data and handle a diverse range of tasks using the same set of model weights across all tasks (detailed evaluations in the paper).
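Abstractly, the input side of such a system can be pictured as one flat embedding sequence that interleaves text tokens with vision-encoder outputs. The sketch below is a hypothetical illustration of that interleaving only; the stubbed encoders, dimensions, and function names are not the actual PaLM-E or Med-PaLM M implementation.

```python
# Hypothetical sketch of building one interleaved multimodal input
# sequence. embed_text / embed_image are stubs, not real encoders.

EMBED_DIM = 4

def embed_text(tokens):
    # Stub for the LLM's token-embedding lookup.
    return [[float(len(t))] * EMBED_DIM for t in tokens]

def embed_image(pixels, num_patches=3):
    # Stub for the shared vision encoder (a ViT in PaLM-E): any
    # non-text modality becomes a fixed number of patch embeddings.
    mean = sum(pixels) / len(pixels)
    return [[mean] * EMBED_DIM for _ in range(num_patches)]

def build_sequence(segments):
    """Interleave ("text", tokens) and ("image", pixels) segments
    into one flat sequence of embedding vectors."""
    seq = []
    for kind, payload in segments:
        seq.extend(embed_text(payload) if kind == "text" else embed_image(payload))
    return seq

seq = build_sequence([
    ("text", ["question:", "any", "abnormality?"]),
    ("image", [0.2, 0.7, 0.1]),   # stand-in pixel values
    ("text", ["answer:"]),
])
# 3 text tokens + 3 image patches + 1 text token = 7 embedding vectors
```

The point of the sketch is that, unlike grafting, there is a single vision encoder and no per-modality adapter: every non-text input takes the same path into the sequence.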
This generalist-system approach to multimodality is both the most ambitious and simultaneously most elegant of the approaches we describe. In principle, this direct approach maximizes flexibility and information transfer between modalities. With no APIs to maintain compatibility across and no proliferation of adapter layers, the generalist approach has arguably the simplest design. But that same elegance is also the source of some of its disadvantages. Computational costs are often higher, and with a unitary vision encoder serving a wide range of modalities, domain specialization or system debuggability could suffer.
The reality of multimodal medical AI
To make the most of AI in medicine, we’ll need to combine the strength of expert systems trained with predictive AI with the flexibility made possible through generative AI. Which approach (or combination of approaches) will be most useful in the field depends on a multitude of as-yet unassessed factors. Is the flexibility and simplicity of a generalist model more valuable than the modularity of model grafting or tool use? Which approach gives the highest quality results for a specific real-world use case? Is the preferred approach different for supporting medical research or medical education vs. augmenting medical practice? Answering these questions will require ongoing rigorous empirical research and continued direct collaboration with healthcare providers, medical institutions, government entities, and healthcare industry partners broadly. We look forward to finding the answers together.
A new fund for women creating AI startups in Asia Pacific
The Google for Startups Women Founders Fund will provide equity-free funding and Google support to women-led AI startups in India, Japan and Korea.
Lab Sessions: A new series of experimental AI collaborations
Lab Sessions is a series of experimental AI collaborations with everyone from artists to academics, scientists to students.
Speaking robot: Our new AI model translates vision and language into robotic actions
Google DeepMind introduces a new vision-language-action model for improving robotics.
Offering free AI training for everyone in the UK
Jonny Cottom knows the juggle involved with running a start-up single-handedly. Since launching BreakBottle, an eco-friendly water bottle brand, last year, he has been s…
3 emerging practices for responsible generative AI
For our midyear update, we’d like to share three of our best practices based on this guidance and what we’ve done in our pre-launch design, reviews and development of ge…
In search of a generalizable method for source-free domain adaptation
Deep learning has recently made tremendous progress in a wide range of problems and applications, but models often fail unpredictably when deployed in unseen domains or distributions. Source-free domain adaptation (SFDA) is an area of research that aims to design methods for adapting a pre-trained model (trained on a “source domain”) to a new “target domain”, using only unlabeled data from the latter.
Designing adaptation methods for deep models is an important area of research. While the increasing scale of models and training datasets has been a key ingredient to their success, a negative consequence of this trend is that training such models is increasingly computationally expensive, out of reach for certain practitioners and also harmful for the environment. One avenue to mitigate this issue is through designing techniques that can leverage and reuse already trained models for tackling new tasks or generalizing to new domains. Indeed, adapting models to new tasks is widely studied under the umbrella of transfer learning.
SFDA is a particularly practical area of this research because several real-world applications where adaptation is desired suffer from the unavailability of labeled examples from the target domain. In fact, SFDA is enjoying increasing attention [1, 2, 3, 4]. However, although motivated by ambitious goals, most SFDA research is grounded in a very narrow framework, considering only simple distribution shifts in image classification tasks.
In a significant departure from that trend, we turn our attention to the field of bioacoustics, where naturally-occurring distribution shifts are ubiquitous, often characterized by insufficient target labeled data, and represent an obstacle for practitioners. Studying SFDA in this application can, therefore, not only inform the academic community about the generalizability of existing methods and identify open research directions, but can also directly benefit practitioners in the field and aid in addressing one of the biggest challenges of our century: biodiversity preservation.
In this post, we announce “In Search for a Generalizable Method for Source-Free Domain Adaptation”, appearing at ICML 2023. We show that state-of-the-art SFDA methods can underperform or even collapse when confronted with realistic distribution shifts in bioacoustics. Furthermore, existing methods perform differently relative to each other than observed in vision benchmarks, and surprisingly, sometimes perform worse than no adaptation at all. We also propose NOTELA, a new simple method that outperforms existing methods on these shifts while exhibiting strong performance on a range of vision datasets. Overall, we conclude that evaluating SFDA methods (only) on the commonly-used datasets and distribution shifts leaves us with a myopic view of their relative performance and generalizability. To live up to their promise, SFDA methods need to be tested on a wider range of distribution shifts, and we advocate for considering naturally-occurring ones that can benefit high-impact applications.
Distribution shifts in bioacoustics
Naturally-occurring distribution shifts are ubiquitous in bioacoustics. The largest labeled dataset for bird songs is Xeno-Canto (XC), a collection of user-contributed recordings of wild birds from across the world. Recordings in XC are “focal”: they target an individual captured in natural conditions, where the song of the identified bird is at the foreground. For continuous monitoring and tracking purposes, though, practitioners are often more interested in identifying birds in passive recordings (“soundscapes”), obtained through omnidirectional microphones. This is a well-documented problem that recent work shows is very challenging. Inspired by this realistic application, we study SFDA in bioacoustics using a bird species classifier that was pre-trained on XC as the source model, and several “soundscapes” coming from different geographical locations — Sierra Nevada (S. Nevada); Powdermill Nature Reserve, Pennsylvania, USA; Hawai’i; Caples Watershed, California, USA; Sapsucker Woods, New York, USA (SSW); and Colombia — as our target domains.
This shift from the focalized to the passive domain is substantial: the recordings in the latter often feature much lower signal-to-noise ratio, several birds vocalizing at once, and significant distractors and environmental noise, like rain or wind. In addition, different soundscapes originate from different geographical locations, inducing extreme label shifts since a very small portion of the species in XC will appear in a given location. Moreover, as is common in real-world data, both the source and target domains are significantly class imbalanced, because some species are significantly more common than others. In addition, we consider a multi-label classification problem since there may be several birds identified within each recording, a significant departure from the standard single-label image classification scenario where SFDA is typically studied.
[Figure: audio files (top) and spectrogram images (bottom) comparing the focal domain (left) with the soundscape domain1 (right).]
Illustration of the distribution shift from the focal domain (left) to the soundscape domain (right), in terms of the audio files (top) and spectrogram images (bottom) of a representative recording from each dataset. Note that in the second audio clip, the bird song is very faint; a common property in soundscape recordings where bird calls aren’t at the “foreground”. Credits: Left: XC recording by Sue Riffe (CC-BY-NC license). Right: Excerpt from a recording made available by Kahl, Charif, & Klinck (2022), “A collection of fully-annotated soundscape recordings from the Northeastern United States” [link] from the SSW soundscape dataset (CC-BY license).
State-of-the-art SFDA models perform poorly on bioacoustics shifts
As a starting point, we benchmark six state-of-the-art SFDA methods on our bioacoustics benchmark, and compare them to the non-adapted baseline (the source model). Our findings are surprising: without exception, existing methods are unable to consistently outperform the source model on all target domains. In fact, they often underperform it significantly.
As an example, Tent, a recent method, aims to make models produce confident predictions for each example by reducing the uncertainty of the model’s output probabilities. While Tent performs well in various tasks, it doesn’t work effectively for our bioacoustics task. In the single-label scenario, minimizing entropy forces the model to choose a single class for each example confidently. However, in our multi-label scenario, there’s no constraint that any class must be selected as present. Combined with significant distribution shifts, this can cause the model to collapse, leading to zero probabilities for all classes. Other benchmarked methods, such as SHOT, AdaBN, NRC, DUST and Pseudo-Labelling, which are strong baselines on standard SFDA benchmarks, also struggle with this bioacoustics task.
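The collapse mechanism can be seen in a toy model. Below, gradient descent is run on the entropy of independent sigmoid outputs — a deliberately simplified stand-in for Tent, which in reality updates normalization parameters of a deep network. Because the entropy gradient amplifies whatever the logit already leans toward, a model that starts unconfident about every class (all logits negative) is driven to predict that nothing is present.

```python
# Toy illustration (not Tent itself): gradient descent on the entropy
# of independent sigmoid outputs, as in a multi-label classifier.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def entropy_step(z, lr):
    # For p = sigmoid(z), dH/dz = -z * p * (1 - p), so a descent step
    # is z <- z + lr * z * p * (1 - p): the logit's magnitude grows,
    # whichever side of zero it started on.
    p = sigmoid(z)
    return z + lr * z * p * (1 - p)

# A model unconfident about every class: all logits negative.
logits = [-0.5, -1.0, -2.0]
for _ in range(200):
    logits = [entropy_step(z, lr=5.0) for z in logits]

probs = [sigmoid(z) for z in logits]
# Every class probability has collapsed toward 0 ("no birds present"),
# even though nothing forced the model to pick any class at all.
```

In the single-label softmax case the same objective must keep the class probabilities summing to one, so this particular all-zeros collapse cannot occur.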
Evolution of the test mean average precision (mAP), a standard metric for multi-label classification, throughout the adaptation procedure on the six soundscape datasets. We benchmark our proposed NOTELA and Dropout Student (see below), as well as SHOT, AdaBN, Tent, NRC, DUST and Pseudo-Labelling. Aside from NOTELA, all other methods fail to consistently improve the source model.
Introducing NOisy student TEacher with Laplacian Adjustment (NOTELA)
Nonetheless, a surprisingly positive result stands out: the less celebrated Noisy Student principle appears promising. This unsupervised approach encourages the model to reconstruct its own predictions on some target dataset, but under the application of random noise. While noise may be introduced through various channels, we strive for simplicity and use model dropout as the only noise source: we therefore refer to this approach as Dropout Student (DS). In a nutshell, it encourages the model to limit the influence of individual neurons (or filters) when making predictions on a specific target dataset.
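A minimal sketch of this idea, under simplifying assumptions (a linear single-output model and one fixed dropout mask, rather than random masks inside a deep network):

```python
# Toy sketch of Dropout Student (DS). The model's own no-noise
# prediction is the target ("teacher"); a dropout-noised copy of the
# model ("student") is trained to reproduce it.

def forward(weights, x, mask=None):
    if mask is None:
        mask = [1.0] * len(weights)
    return sum(m * w * xi for m, w, xi in zip(mask, weights, x))

def dropout_student_step(weights, x, mask, lr=0.05):
    teacher = forward(weights, x)         # pseudo-target, no noise
    student = forward(weights, x, mask)   # prediction with dropout
    err = student - teacher
    # Gradient step on 0.5 * err**2; only surviving weights move.
    return [w - lr * err * m * xi for w, m, xi in zip(weights, mask, x)]

# One unit dominates the prediction; when dropout removes it, the
# remaining units are pushed to compensate, which over repeated steps
# limits the influence of any individual unit.
weights = [3.0, 0.0, 0.0]
new_w = dropout_student_step(weights, [1.0, 1.0, 1.0], mask=[0.0, 1.0, 1.0])
# new_w ≈ [3.0, 0.15, 0.15]
```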
DS, while effective, faces a model collapse issue on various target domains. We hypothesize this happens because the source model initially lacks confidence in those target domains. We propose improving DS stability by using the feature space directly as an auxiliary source of truth. NOTELA does this by encouraging similar pseudo-labels for nearby points in the feature space, inspired by NRC’s method and Laplacian regularization. This simple approach is visualized below, and consistently and significantly outperforms the source model in both audio and visual tasks.
[Figure: visualization of NOTELA’s approach of encouraging similar pseudo-labels for nearby points in the feature space.]
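In a simplified form, the Laplacian adjustment blends each example's predicted probabilities with the average over its nearest neighbors in feature space; the blending weight and plain k-NN averaging below are illustrative stand-ins for the paper's actual formulation.

```python
# Simplified Laplacian adjustment in the spirit of NOTELA: pseudo-labels
# are a blend of the model's own probabilities and the average over each
# example's nearest neighbors in feature space. alpha and the plain
# neighbor-averaging are illustrative choices, not the paper's exact rule.

def laplacian_adjusted_labels(probs, neighbors, alpha=0.5):
    """probs[i]: per-class probabilities for example i.
    neighbors[i]: indices of example i's nearest feature-space neighbors."""
    adjusted = []
    for i, p in enumerate(probs):
        nbr_avg = [sum(probs[j][c] for j in neighbors[i]) / len(neighbors[i])
                   for c in range(len(p))]
        adjusted.append([(1 - alpha) * pc + alpha * nc
                         for pc, nc in zip(p, nbr_avg)])
    return adjusted

# Two mutually nearest examples: the outlier prediction on example 1
# is pulled toward its neighbor's confident prediction for class 0.
probs = [[0.9, 0.1], [0.2, 0.6]]
neighbors = [[1], [0]]
adjusted = laplacian_adjusted_labels(probs, neighbors)
# adjusted[1][0] ≈ 0.5 * 0.2 + 0.5 * 0.9 = 0.55
```

Because the neighbor term anchors each pseudo-label to the feature space, an unconfident model is less able to drift toward the degenerate all-zeros solution that destabilizes Dropout Student.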
Conclusion
The standard artificial image classification benchmarks have inadvertently limited our understanding of the true generalizability and robustness of SFDA methods. We advocate for broadening the scope and adopting a new assessment framework that incorporates naturally-occurring distribution shifts from bioacoustics. We also hope that NOTELA serves as a robust baseline to facilitate research in that direction. NOTELA’s strong performance perhaps points to two factors that can lead to developing more generalizable models: first, developing methods with an eye towards harder problems, and second, favoring simple modeling principles. However, there is still future work to be done to pinpoint and understand existing methods’ failure modes on harder problems. We believe that our research represents a significant step in this direction, serving as a foundation for designing SFDA methods with greater generalizability.
Acknowledgements
One of the authors of this post, Eleni Triantafillou, is now at Google DeepMind. We are posting this blog post on behalf of the authors of the NOTELA paper: Malik Boudiaf, Tom Denton, Bart van Merriënboer, Vincent Dumoulin*, Eleni Triantafillou* (where * denotes equal contribution). We thank our co-authors for the hard work on this paper and the rest of the Perch team for their support and feedback.
1Note that in this audio clip, the bird song is very faint; a common property in soundscape recordings where bird calls aren’t at the “foreground”. ↩
A new partnership to promote responsible AI
Today, Google, Microsoft, OpenAI and Anthropic published a joint announcement establishing the Frontier Model Forum.
Google at ICML 2023
Groups across Google actively pursue research in the field of machine learning (ML), ranging from theory to application. We build ML systems to solve deep scientific and engineering challenges in areas of language, music, visual processing, algorithm development, and more. We aim to build a more collaborative ecosystem with the broader ML research community through open-sourcing tools and datasets, publishing our work, and actively participating in conferences.
Google is proud to be a Diamond Sponsor of the 40th International Conference on Machine Learning (ICML 2023), a premier annual conference, which is being held this week in Honolulu, Hawaii. As a leader in ML research, Google has a strong presence at this year’s conference with over 120 accepted papers and active involvement in a number of workshops and tutorials. Google is also proud to be a Platinum Sponsor for both the LatinX in AI and Women in Machine Learning workshops. We look forward to sharing some of our extensive ML research and expanding our partnership with the broader ML research community.
Registered for ICML 2023? We hope you’ll visit the Google booth to learn more about the exciting work, creativity, and fun that goes into solving a portion of the field’s most interesting challenges. Visit the @GoogleAI Twitter account to find out about Google booth activities (e.g., demos and Q&A sessions). See Google DeepMind’s blog to learn about their technical participation at ICML 2023.
Take a look below to learn more about the Google research being presented at ICML 2023 (Google affiliations in bold).
Board and Organizing Committee
Board Members include: Corinna Cortes, Hugo Larochelle
Tutorial Chairs include: Hanie Sedghi
Google Research booth activities
Presenters: Bryan Perozzi, Anton Tsitsulin, Brandon Mayer
Title: Unsupervised Graph Embedding @ Google (paper, EXPO workshop)
Tuesday, July 25th at 10:30 AM HST
Presenters: Zheng Xu
Title: Federated Learning of Gboard Language Models with Differential Privacy (paper 1, paper 2, blog post)
Tuesday, July 25th at 3:30 PM HST
Presenters: Thomas Kipf
Title: Self-supervised scene understanding (paper 1, paper 2)
Wednesday, July 26th at 10:30 AM HST
Presenters: Johannes von Oswald, Max Vladymyrov
Title: Transformers learn in-context by gradient descent (paper)
Wednesday, July 26th at 3:30 PM HST
Accepted papers
Scaling Vision Transformers to 22 Billion Parameters (see blog post)
Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby
Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan, Matan Kalman, Yossi Matias
Best of Both Worlds Policy Optimization
Christoph Dann, Chen-Yu Wei, Julian Zimmert
Inflow, Outflow, and Reciprocity in Machine Learning
Mukund Sundararajan, Walid Krichene
Transformers Learn In-Context by Gradient Descent
Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov
Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models
Luke Vilnis, Yury Zemlyanskiy, Patrick Murray*, Alexandre Passos*, Sumit Sanghai
Differentially Private Hierarchical Clustering with Provable Approximation Guarantees (see blog post)
Jacob Imola*, Alessandro Epasto, Mohammad Mahdian, Vincent Cohen-Addad, Vahab Mirrokni
Multi-Epoch Matrix Factorization Mechanisms for Private Machine Learning
Christopher A. Choquette-Choo, H. Brendan McMahan, Keith Rush, Abhradeep Thakurta
Random Classification Noise Does Not Defeat All Convex Potential Boosters Irrespective of Model Choice
Yishay Mansour, Richard Nock, Robert Williamson
Simplex Random Features
Isaac Reid, Krzysztof Choromanski, Valerii Likhosherstov, Adrian Weller
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova
Mu2SLAM: Multitask, Multilingual Speech and Language Models
Yong Cheng, Yu Zhang, Melvin Johnson, Wolfgang Macherey, Ankur Bapna
Robust Budget Pacing with a Single Sample
Santiago Balseiro, Rachitesh Kumar*, Vahab Mirrokni, Balasubramanian Sivan, Di Wang
A Statistical Perspective on Retrieval-Based Models
Soumya Basu, Ankit Singh Rawat, Manzil Zaheer
Approximately Optimal Core Shapes for Tensor Decompositions
Mehrdad Ghadiri, Matthew Fahrbach, Gang Fu, Vahab Mirrokni
Efficient List-Decodable Regression Using Batches
Abhimanyu Das, Ayush Jain*, Weihao Kong, Rajat Sen
Efficient Training of Language Models Using Few-Shot Learning
Sashank J. Reddi, Sobhan Miryoosefi, Stefani Karp, Shankar Krishnan, Satyen Kale, Seungyeon Kim, Sanjiv Kumar
Fully Dynamic Submodular Maximization Over Matroids
Paul Duetting, Federico Fusco, Silvio Lattanzi, Ashkan Norouzi-Fard, Morteza Zadimoghaddam
GFlowNet-EM for Learning Compositional Latent Variable Models
Edward J Hu, Nikolay Malkin, Moksh Jain, Katie Everett, Alexandros Graikos, Yoshua Bengio
Improved Online Learning Algorithms for CTR Prediction in Ad Auctions
Zhe Feng, Christopher Liaw, Zixin Zhou
Large Language Models Struggle to Learn Long-Tail Knowledge
Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, Colin Raffel
Multi-channel Autobidding with Budget and ROI Constraints
Yuan Deng, Negin Golrezaei, Patrick Jaillet, Jason Cheuk Nam Liang, Vahab Mirrokni
Multi-layer Neural Networks as Trainable Ladders of Hilbert Spaces
Zhengdao Chen
On User-Level Private Convex Optimization
Badih Ghazi, Pritish Kamath, Ravi Kumar, Raghu Meka, Pasin Manurangsi, Chiyuan Zhang
PAC Generalization via Invariant Representations
Advait U Parulekar, Karthikeyan Shanmugam, Sanjay Shakkottai
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Menard, Mohammad Gheshlaghi Azar, Remi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvari, Wataru Kumagai, Yutaka Matsuo
Speeding Up Bellman Ford via Minimum Violation Permutations
Silvio Lattanzi, Ola Svensson, Sergei Vassilvitskii
Statistical Indistinguishability of Learning Algorithms
Alkis Kalavasis, Amin Karbasi, Shay Moran, Grigoris Velegkas
Test-Time Adaptation with Slot-Centric Models
Mihir Prabhudesai, Anirudh Goyal, Sujoy Paul, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gaurav Aggarwal, Thomas Kipf, Deepak Pathak, Katerina Fragkiadaki
Algorithms for Bounding Contribution for Histogram Estimation Under User-Level Privacy
Yuhan Liu*, Ananda Theertha Suresh, Wennan Zhu, Peter Kairouz, Marco Gruteser
Bandit Online Linear Optimization with Hints and Queries
Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit
CLUTR: Curriculum Learning via Unsupervised Task Representation Learning
Abdus Salam Azad, Izzeddin Gur, Jasper Emhoff, Nathaniel Alexis, Aleksandra Faust, Pieter Abbeel, Ion Stoica
CSP: Self-Supervised Contrastive Spatial Pre-training for Geospatial-Visual Representations
Gengchen Mai, Ni Lao, Yutong He, Jiaming Song, Stefano Ermon
Ewald-Based Long-Range Message Passing for Molecular Graphs
Arthur Kosmala, Johannes Gasteiger, Nicholas Gao, Stephan Günnemann
Fast (1+ε)-Approximation Algorithms for Binary Matrix Factorization
Ameya Velingker, Maximilian Vötsch, David Woodruff, Samson Zhou
Federated Linear Contextual Bandits with User-Level Differential Privacy
Ruiquan Huang, Huanyu Zhang, Luca Melis, Milan Shen, Meisam Hejazinia, Jing Yang
Investigating the Role of Model-Based Learning in Exploration and Transfer
Jacob C Walker, Eszter Vértes, Yazhe Li, Gabriel Dulac-Arnold, Ankesh Anand, Theophane Weber, Jessica B Hamrick
Label Differential Privacy and Private Training Data Release
Robert Busa-Fekete, Andres Munoz, Umar Syed, Sergei Vassilvitskii
Lifelong Language Pretraining with Distribution-Specialized Experts
Wuyang Chen*, Yanqi Zhou, Nan Du, Yanping Huang, James Laudon, Zhifeng Chen, Claire Cui
Multi-User Reinforcement Learning with Low Rank Rewards
Dheeraj Mysore Nagaraj, Suhas S Kowshik, Naman Agarwal, Praneeth Netrapalli, Prateek Jain
Multi-View Masked World Models for Visual Robotic Manipulation
Younggyo Seo, Junsu Kim, Stephen James, Kimin Lee, Jinwoo Shin, Pieter Abbeel
PaLM-E: An Embodied Multimodal Language Model (see blog post)
Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence
Private Federated Learning with Autotuned Compression
Enayat Ullah*, Christopher A. Choquette-Choo, Peter Kairouz, Sewoong Oh
Refined Regret for Adversarial MDPs with Linear Function Approximation
Yan Dai, Haipeng Luo, Chen-Yu Wei, Julian Zimmert
Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
Justin Cui, Ruoche Wan, Si Si, Cho-Jui Hsieh
SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance
Amit Attia, Tomer Koren
The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney
Unveiling The Mask of Position-Information Pattern Through the Mist of Image Features
Chieh Hubert Lin, Hung-Yu Tseng, Hsin-Ying Lee, Maneesh Kumar Singh, Ming-Hsuan Yang
User-Level Private Stochastic Convex Optimization with Optimal Rates
Raef Bassily, Ziteng Sun
A Simple Zero-Shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models
James Urquhart Allingham*, Jie Ren, Michael W Dusenberry, Xiuye Gu, Yin Cui, Dustin Tran, Jeremiah Zhe Liu, Balaji Lakshminarayanan
Can Large Language Models Reason About Program Invariants?
Kexin Pei, David Bieber, Kensen Shi, Charles Sutton, Pengcheng Yin
Concurrent Shuffle Differential Privacy Under Continual Observation
Jay Tenenbaum, Haim Kaplan, Yishay Mansour, Uri Stemmer
Constant Matters: Fine-Grained Error Bound on Differentially Private Continual Observation
Hendrik Fichtenberger, Monika Henzinger, Jalaj Upadhyay
Cross-Entropy Loss Functions: Theoretical Analysis and Applications
Anqi Mao, Mehryar Mohri, Yutao Zhong
Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation
Orin Levy, Alon Cohen, Asaf Cassel, Yishay Mansour
Fairness in Streaming Submodular Maximization Over a Matroid Constraint
Marwa El Halabi, Federico Fusco, Ashkan Norouzi-Fard, Jakab Tardos, Jakub Tarnawski
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning (see blog post)
Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V Le, Barret Zoph, Jason Wei, Adam Roberts
Graph Reinforcement Learning for Network Control via Bi-level Optimization
Daniele Gammelli, James Harrison, Kaidi Yang, Marco Pavone, Filipe Rodrigues, Francisco C. Pereira
Learning-Augmented Private Algorithms for Multiple Quantile Release
Mikhail Khodak*, Kareem Amin, Travis Dick, Sergei Vassilvitskii
LegendreTron: Uprising Proper Multiclass Loss Learning
Kevin H Lam, Christian Walder, Spiridon Penev, Richard Nock
Measuring the Impact of Programming Language Distribution
Gabriel Orlanski*, Kefan Xiao, Xavier Garcia, Jeffrey Hui, Joshua Howland, Jonathan Malmaud, Jacob Austin, Rishabh Singh, Michele Catasta*
Multi-task Differential Privacy Under Distribution Skew
Walid Krichene, Prateek Jain, Shuang Song, Mukund Sundararajan, Abhradeep Thakurta, Li Zhang
Muse: Text-to-Image Generation via Masked Generative Transformers
Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, José Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan
On the Convergence of Federated Averaging with Cyclic Client Participation
Yae Jee Cho, Pranay Sharma, Gauri Joshi, Zheng Xu, Satyen Kale, Tong Zhang
Optimal Stochastic Non-smooth Non-convex Optimization Through Online-to-Non-convex Conversion
Ashok Cutkosky, Harsh Mehta, Francesco Orabona
Out-of-Domain Robustness via Targeted Augmentations
Irena Gao, Shiori Sagawa, Pang Wei Koh, Tatsunori Hashimoto, Percy Liang
Polynomial Time and Private Learning of Unbounded Gaussian Mixture Models
Jamil Arbas, Hassan Ashtiani, Christopher Liaw
Pre-computed Memory or On-the-Fly Encoding? A Hybrid Approach to Retrieval Augmentation Makes the Most of Your Compute
Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Joshua Ainslie, Sumit Sanghai, Fei Sha, William W. Cohen
Scalable Adaptive Computation for Iterative Generation
Allan Jabri*, David J. Fleet, Ting Chen
Scaling Spherical CNNs
Carlos Esteves, Jean-Jacques Slotine, Ameesh Makadia
STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition
Yucheng Lu, Shivani Agrawal, Suvinay Subramanian, Oleg Rybakov, Christopher De Sa, Amir Yazdanbakhsh
Stratified Adversarial Robustness with Rejection
Jiefeng Chen, Jayaram Raghuram, Jihye Choi, Xi Wu, Yingyu Liang, Somesh Jha
When Does Privileged Information Explain Away Label Noise?
Guillermo Ortiz-Jimenez*, Mark Collier, Anant Nawalgaria, Alexander D’Amour, Jesse Berent, Rodolphe Jenatton, Effrosyni Kokiopoulou
Adaptive Computation with Elastic Input Sequence
Fuzhao Xue*, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You
Can Neural Network Memorization Be Localized?
Pratyush Maini, Michael C. Mozer, Hanie Sedghi, Zachary C. Lipton, J. Zico Kolter, Chiyuan Zhang
Controllability-Aware Unsupervised Skill Discovery
Seohong Park, Kimin Lee, Youngwoon Lee, Pieter Abbeel
Efficient Learning of Mesh-Based Physical Simulation with Bi-Stride Multi-Scale Graph Neural Network
Yadi Cao, Menglei Chai, Minchen Li, Chenfanfu Jiang
Federated Heavy Hitter Recovery Under Linear Sketching
Adria Gascon, Peter Kairouz, Ziteng Sun, Ananda Theertha Suresh
Graph Generative Model for Benchmarking Graph Neural Networks
Minji Yoon, Yue Wu, John Palowitch, Bryan Perozzi, Russ Salakhutdinov
H-Consistency Bounds for Pairwise Misranking Loss Surrogates
Anqi Mao, Mehryar Mohri, Yutao Zhong
Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation
Uri Sherman, Tomer Koren, Yishay Mansour
Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames
Ondrej Biza*, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gamaleldin Fathy Elsayed, Aravindh Mahendran, Thomas Kipf
Multi-task Off-Policy Learning from Bandit Feedback
Joey Hong, Branislav Kveton, Manzil Zaheer, Sumeet Katariya, Mohammad Ghavamzadeh
Optimal No-Regret Learning for One-Sided Lipschitz Functions
Paul Duetting, Guru Guruganesh, Jon Schneider, Joshua Ruizhi Wang
Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games
Batuhan Yardim, Semih Cayci, Matthieu Geist, Niao He
Regret Minimization and Convergence to Equilibria in General-Sum Markov Games
Liad Erez, Tal Lancewicki, Uri Sherman, Tomer Koren, Yishay Mansour
Reinforcement Learning Can Be More Efficient with Multiple Rewards
Christoph Dann, Yishay Mansour, Mehryar Mohri
Reinforcement Learning with History-Dependent Dynamic Contexts
Guy Tennenholtz, Nadav Merlis, Lior Shani, Martin Mladenov, Craig Boutilier
User-Defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems
Marc Anton Finzi*, Anudhyan Boral, Andrew Gordon Wilson, Fei Sha, Leonardo Zepeda-Nunez
Discrete Key-Value Bottleneck
Frederik Träuble, Anirudh Goyal, Nasim Rahaman, Michael Curtis Mozer, Kenji Kawaguchi, Yoshua Bengio, Bernhard Schölkopf
DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm
Lisang Ding, Kexin Jin, Bicheng Ying, Kun Yuan, Wotao Yin
Exphormer: Sparse Transformers for Graphs
Hamed Shirzad, Ameya Velingker, Balaji Venkatachalam, Danica J. Sutherland, Ali Kemal Sinop
Fast, Differentiable and Sparse Top-k: A Convex Analysis Perspective
Michael Eli Sander*, Joan Puigcerver, Josip Djolonga, Gabriel Peyré, Mathieu Blondel
Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation
Aditya Mate, Bryan Wilder, Aparna Taneja, Milind Tambe
In Search for a Generalizable Method for Source Free Domain Adaptation
Malik Boudiaf*, Tom Denton, Bart van Merrienboer, Vincent Dumoulin, Eleni Triantafillou
Learning Rate Schedules in the Presence of Distribution Shift
Matthew Fahrbach, Adel Javanmard, Vahab Mirrokni, Pratik Worah
Not All Semantics Are Created Equal: Contrastive Self-Supervised Learning with Automatic Temperature Individualization
Zi-Hao Qiu, Quanqi Hu, Zhuoning Yuan, Denny Zhou, Lijun Zhang, Tianbao Yang
On the Relationship Between Explanation and Prediction: A Causal View
Amir-Hossein Karimi*, Krikamol Muandet, Simon Kornblith, Bernhard Schölkopf, Been Kim
On the Role of Attention in Prompt-Tuning
Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis
PLay: Parametrically Conditioned Layout Generation Using Latent Diffusion
Chin-Yi Cheng, Forrest Huang, Gang Li, Yang Li
The Power of Learned Locally Linear Models for Nonlinear Policy Optimization
Daniel Pfrommer, Max Simchowitz, Tyler Westenbroek, Nikolai Matni, Stephen Tu
Relevant Walk Search for Explaining Graph Neural Networks
Ping Xiong, Thomas Schnake, Michael Gastegger, Grégoire Montavon, Klaus-Robert Müller, Shinichi Nakajima
Repository-Level Prompt Generation for Large Language Models of Code
Disha Shrivastava, Hugo Larochelle, Daniel Tarlow
Robust and Private Stochastic Linear Bandits
Vasileios Charisopoulos*, Hossein Esfandiari, Vahab Mirrokni
Simple Diffusion: End-to-End Diffusion for High Resolution Images
Emiel Hoogeboom, Jonathan Heek, Tim Salimans
Tied-Augment: Controlling Representation Similarity Improves Data Augmentation
Emirhan Kurtulus, Zichao Li, Yann Dauphin, Ekin D. Cubuk
Why Is Public Pre-Training Necessary for Private Model Training?
Arun Ganesh, Mahdi Haghifam*, Milad Nasr, Sewoong Oh, Thomas Steinke, Om Thakkar, Abhradeep Guha Thakurta, Lun Wang
A Connection Between One-Step RL and Critic Regularization in Reinforcement Learning
Benjamin Eysenbach, Matthieu Geist, Sergey Levine, Ruslan Salakhutdinov
Beyond Uniform Lipschitz Condition in Differentially Private Optimization
Rudrajit Das*, Satyen Kale, Zheng Xu, Tong Zhang, Sujay Sanghavi
Efficient Graph Field Integrators Meet Point Clouds
Krzysztof Choromanski, Arijit Sehanobish, Han Lin, Yunfan Zhao, Eli Berger, Tetiana Parshakova, Alvin Pan, David Watkins, Tianyi Zhang, Valerii Likhosherstov, Somnath Basu Roy Chowdhury, Avinava Dubey, Deepali Jain, Tamas Sarlos, Snigdha Chaturvedi, Adrian Weller
Fast as CHITA: Neural Network Pruning with Combinatorial Optimization
Riade Benbaki, Wenyu Chen, Xiang Meng, Hussein Hazimeh, Natalia Ponomareva, Zhe Zhao, Rahul Mazumder
Jump-Start Reinforcement Learning (see blog post)
Ikechukwu Uchendu*, Ted Xiao, Yao Lu, Banghua Zhu, Mengyuan Yan, Joséphine Simon, Matthew Bennice, Chuyuan Fu, Cong Ma, Jiantao Jiao, Sergey Levine, Karol Hausman
Learning in POMDPs is Sample-Efficient with Hindsight Observability
Jonathan Lee, Alekh Agarwal, Christoph Dann, Tong Zhang
Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single
Paul Vicol
Masked Trajectory Models for Prediction, Representation, and Control
Philipp Wu, Arjun Majumdar, Kevin Stone, Yixin Lin, Igor Mordatch, Pieter Abbeel, Aravind Rajeswaran
Overcoming Simplicity Bias in Deep Networks Using a Feature Sieve
Rishabh Tiwari, Pradeep Shenoy
Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions
Boxiang Lyu, Zhe Feng, Zachary Robertson, Sanmi Koyejo
Predictive Flows for Faster Ford-Fulkerson
Sami Davies, Benjamin Moseley, Sergei Vassilvitskii, Yuyan Wang
Scaling Laws for Multilingual Neural Machine Translation
Patrick Fernandes, Behrooz Ghorbani, Xavier Garcia, Markus Freitag, Orhan Firat
Sequential Monte Carlo Learning for Time Series Structure Discovery
Feras Saad, Brian Patton, Matthew Douglas Hoffman, Rif A. Saurous, Vikash Mansinghka
Stochastic Gradient Succeeds for Bandits
Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvari, Dale Schuurmans
Subset-Based Instance Optimality in Private Estimation
Travis Dick, Alex Kulesza, Ziteng Sun, Ananda Theertha Suresh
The Unreasonable Effectiveness of Few-Shot Learning for Machine Translation
Xavier Garcia, Yamini Bansal, Colin Cherry, George Foster, Maxim Krikun, Melvin Johnson, Orhan Firat
Tutorials
Self-Supervised Learning in Vision: from Research Advances to Best Practices
Xinlei Chen, Ishan Misra, Randall Balestriero, Mathilde Caron, Christoph Feichtenhofer, Mark Ibrahim
How to DP-fy ML: A Practical Tutorial to Machine Learning with Differential Privacy (see blog post)
Sergei Vassilvitskii, Natalia Ponomareva, Zheng Xu
Recent Advances in the Generalization Theory of Neural Networks
Tengyu Ma, Alex Damian
EXPO Day workshops
Graph Neural Networks in Tensorflow: A Practical Guide
Workshop Organizers include: Bryan Perozzi, Anton Tsitsulin, Brandon Mayer, Jonathan Halcrow
Google sponsored affinity workshops
LatinX in AI (LAXAI)
Platinum Sponsor
Keynote Speaker: Monica Ribero
Panelist: Yao Qin
Women in Machine Learning (WiML)
Platinum Sponsor
Panelist: Yao Qin
Workshops
Federated Learning and Analytics in Practice: Algorithms, Systems, Applications, and Opportunities
Organizer: Peter Kairouz, Zheng Xu
Speaker: Brendan McMahan
Interpretable Machine Learning in Healthcare (IMLH)
Organizer: Ramin Zabih
Knowledge and Logical Reasoning in the Era of Data-Driven Learning
Organizer: Beliz Günel
The Many Facets of Preference-Based Learning (MFPL)
Organizer: Robert Busa-Fekete, Mohammad Ghavamzadeh
The Synergy of Scientific and Machine Learning Modelling (SynS & ML)
Speaker: Sercan Arik
Theory of Mind in Communicating Agents
Organizer: Pei Zhou
Artificial Intelligence & Human Computer Interaction
Organizer: Yang Li, Forrest Huang
Data-Centric Machine Learning Research (DMLR)
Organizer: Alicia Parrish, Najoung Kim
Speaker: Peter Mattson
Neural Compression: from Information Theory to Applications
Speaker: Johannes Ballé
Panelist: George Toderici
Organizer: Ahmad Beirami
Spurious Correlations, Invariance and Stability (SCIS)
Organizer: Amir Feder
* Work done while at Google