Google at ICLR 2022

The 10th International Conference on Learning Representations (ICLR 2022) kicks off this week, bringing together researchers, entrepreneurs, engineers and students alike to discuss and explore the rapidly advancing field of deep learning. Entirely virtual this year, ICLR 2022 offers conference and workshop tracks that present some of the latest research in deep learning and its applications to areas ranging from computer vision, speech recognition and text understanding to robotics, computational biology, and more.

As a Platinum Sponsor of ICLR 2022 and Champion DEI Action Fund contributor, Google will have a robust presence with nearly 100 accepted publications and extensive participation on organizing committees and in workshops. If you have registered for ICLR 2022, we hope you’ll watch our talks and learn about the work done at Google to address complex problems that affect billions of people. Here you can learn more about the research we will be presenting as well as our general involvement at ICLR 2022 (those with Google affiliations in bold).

Senior Area Chairs:
Includes: Been Kim, Dale Schuurmans, Sergey Levine

Area Chairs:
Includes: Adam White, Aditya Menon, Aleksandra Faust, Amin Karbasi, Amir Globerson, Andrew Dai, Balaji Lakshminarayanan, Behnam Neyshabur, Ben Poole, Bhuwan Dhingra, Bo Dai, Boqing Gong, Cristian Sminchisescu, David Ha, David Woodruff, Denny Zhou, Dipanjan Das, Dumitru Erhan, Dustin Tran, Emma Strubell, Eunsol Choi, George Dahl, George Tucker, Hanie Sedghi, Heinrich Jiang, Hossein Mobahi, Hugo Larochelle, Izhak Shafran, Jasper Snoek, Jean-Philippe Vert, Jeffrey Pennington, Justin Gilmer, Karol Hausman, Kevin Swersky, Krzysztof Choromanski, Mathieu Blondel, Matt Kusner, Michael Ryoo, Ming-Hsuan Yang, Minmin Chen, Mirella Lapata, Mohammad Ghavamzadeh, Mohammad Norouzi, Naman Agarwal, Nicholas Carlini, Olivier Bachem, Piyush Rai, Prateek Jain, Quentin Berthet, Richard Nock, Rose Yu, Sewoong Oh, Silvio Lattanzi, Slav Petrov, Srinadh Bhojanapalli, Tim Salimans, Ting Chen, Tong Zhang, Vikas Sindhwani, Weiran Wang, William Cohen, Xiaoming Liu

Workflow Chairs:
Includes: Yaguang Li

Diversity, Equity & Inclusion Chairs:
Includes: Rosanne Liu

Invited Talks
Beyond Interpretability: Developing a Language to Shape Our Relationships with AI
Google Speaker: Been Kim

Do You See What I See? Large-Scale Learning from Multimodal Videos
Google Speaker: Cordelia Schmid

Publications
Hyperparameter Tuning with Renyi Differential Privacy – 2022 Outstanding Paper Award
Nicolas Papernot, Thomas Steinke

MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling
Yusong Wu, Ethan Manilow, Yi Deng, Rigel Swavely, Kyle Kastner, Tim Cooijmans, Aaron Courville, Cheng-Zhi Anna Huang, Jesse Engel

The Information Geometry of Unsupervised Reinforcement Learning
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Learning Strides in Convolutional Neural Networks – 2022 Outstanding Paper Award
Rachid Riad*, Olivier Teboul, David Grangier, Neil Zeghidour

Poisoning and Backdooring Contrastive Learning
Nicholas Carlini, Andreas Terzis

Coordination Among Neural Modules Through a Shared Global Workspace
Anirudh Goyal, Aniket Didolkar, Alex Lamb, Kartikeya Badola, Nan Rosemary Ke, Nasim Rahaman, Jonathan Binas, Charles Blundell, Michael Mozer, Yoshua Bengio

Fine-Tuned Language Models Are Zero-Shot Learners (see the blog post)
Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le

Large Language Models Can Be Strong Differentially Private Learners
Xuechen Li, Florian Tramèr, Percy Liang, Tatsunori Hashimoto

Progressive Distillation for Fast Sampling of Diffusion Models
Tim Salimans, Jonathan Ho

Exploring the Limits of Large Scale Pre-training
Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi

Scarf: Self-Supervised Contrastive Learning Using Random Feature Corruption
Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler

Scalable Sampling for Nonsymmetric Determinantal Point Processes
Insu Han, Mike Gartrell, Jennifer Gillenwater, Elvis Dohmatob, Amin Karbasi

When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Xiangning Chen, Cho-Jui Hsieh, Boqing Gong

ViTGAN: Training GANs with Vision Transformers
Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu

Generalized Decision Transformer for Offline Hindsight Information Matching
Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu

The MultiBERTs: BERT Reproductions for Robustness Analysis
Thibault Sellam, Steve Yadlowsky, Ian Tenney, Jason Wei, Naomi Saphra, Alexander D’Amour, Tal Linzen, Jasmijn Bastings, Iulia Turc, Jacob Eisenstein, Dipanjan Das, Ellie Pavlick

Scaling Laws for Neural Machine Translation
Behrooz Ghorbani, Orhan Firat, Markus Freitag, Ankur Bapna, Maxim Krikun, Xavier Garcia, Ciprian Chelba, Colin Cherry

Interpretable Unsupervised Diversity Denoising and Artefact Removal
Mangal Prakash, Mauricio Delbracio, Peyman Milanfar, Florian Jug

Understanding Latent Correlation-Based Multiview Learning and Self-Supervision: An Identifiability Perspective
Qi Lyu, Xiao Fu, Weiran Wang, Songtao Lu

Memorizing Transformers
Yuhuai Wu, Markus N. Rabe, DeLesley Hutchins, Christian Szegedy

Churn Reduction via Distillation
Heinrich Jiang, Harikrishna Narasimhan, Dara Bahri, Andrew Cotter, Afshin Rostamizadeh

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization
Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine

Path Auxiliary Proposal for MCMC in Discrete Space
Haoran Sun, Hanjun Dai, Wei Xia, Arun Ramamurthy

On the Relation Between Statistical Learning and Perceptual Distances
Alexander Hepburn, Valero Laparra, Raul Santos-Rodriguez, Johannes Ballé, Jesús Malo

Possibility Before Utility: Learning And Using Hierarchical Affordances
Robby Costales, Shariq Iqbal, Fei Sha

MT3: Multi-Task Multitrack Music Transcription
Josh Gardner*, Ian Simon, Ethan Manilow*, Curtis Hawthorne, Jesse Engel

Bayesian Neural Network Priors Revisited
Vincent Fortuin, Adrià Garriga-Alonso, Sebastian W. Ober, Florian Wenzel, Gunnar Rätsch, Richard E. Turner, Mark van der Wilk, Laurence Aitchison

GradMax: Growing Neural Networks using Gradient Information
Utku Evci, Bart van Merrienboer, Thomas Unterthiner, Fabian Pedregosa, Max Vladymyrov

Scene Transformer: A Unified Architecture for Predicting Future Trajectories of Multiple Agents
Jiquan Ngiam, Benjamin Caine, Vijay Vasudevan, Zhengdong Zhang, Hao-Tien Lewis Chiang, Jeffrey Ling, Rebecca Roelofs, Alex Bewley, Chenxi Liu, Ashish Venugopal, David Weiss, Ben Sapp, Zhifeng Chen, Jonathon Shlens

The Role of Pretrained Representations for the OOD Generalization of RL Agents
Frederik Träuble, Andrea Dittadi, Manuel Wüthrich, Felix Widmaier, Peter Gehler, Ole Winther, Francesco Locatello, Olivier Bachem, Bernhard Schölkopf, Stefan Bauer

Autoregressive Diffusion Models
Emiel Hoogeboom, Alexey A. Gritsenko, Jasmijn Bastings, Ben Poole, Rianne van den Berg, Tim Salimans

The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks
Rahim Entezari, Hanie Sedghi, Olga Saukh, Behnam Neyshabur

DISSECT: Disentangled Simultaneous Explanations via Concept Traversals
Asma Ghandeharioun, Been Kim, Chun-Liang Li, Brendan Jou, Brian Eoff, Rosalind W. Picard

Anisotropic Random Feature Regression in High Dimensions
Gabriel C. Mel, Jeffrey Pennington

Open-Vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu, Tsung-Yi Lin*, Weicheng Kuo, Yin Cui

MCMC Should Mix: Learning Energy-Based Model with Flow-Based Backbone
Erik Nijkamp*, Ruiqi Gao, Pavel Sountsov, Srinivas Vasudevan, Bo Pang, Song-Chun Zhu, Ying Nian Wu

Effect of Scale on Catastrophic Forgetting in Neural Networks
Vinay Ramasesh, Aitor Lewkowycz, Ethan Dyer

Incremental False Negative Detection for Contrastive Learning
Tsai-Shien Chen, Wei-Chih Hung, Hung-Yu Tseng, Shao-Yi Chien, Ming-Hsuan Yang

Towards Evaluating the Robustness of Neural Networks Learned by Transduction
Jiefeng Chen, Xi Wu, Yang Guo, Yingyu Liang, Somesh Jha

What Do We Mean by Generalization in Federated Learning?
Honglin Yuan*, Warren Morningstar, Lin Ning, Karan Singhal

ViDT: An Efficient and Effective Fully Transformer-Based Object Detector
Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

Measuring CLEVRness: Black-Box Testing of Visual Reasoning Models
Spyridon Mouselinos, Henryk Michalewski, Mateusz Malinowski

Wisdom of Committees: An Overlooked Approach To Faster and More Accurate Models (see the blog post)
Xiaofang Wang, Dan Kondratyuk, Eric Christiansen, Kris M. Kitani, Yair Alon (prev. Movshovitz-Attias), Elad Eban

Leveraging Unlabeled Data to Predict Out-of-Distribution Performance
Saurabh Garg*, Sivaraman Balakrishnan, Zachary C. Lipton, Behnam Neyshabur, Hanie Sedghi

Data-Driven Offline Optimization for Architecting Hardware Accelerators (see the blog post)
Aviral Kumar, Amir Yazdanbakhsh, Milad Hashemi, Kevin Swersky, Sergey Levine

Diurnal or Nocturnal? Federated Learning of Multi-branch Networks from Periodically Shifting Distributions
Chen Zhu*, Zheng Xu, Mingqing Chen, Jakub Konecny, Andrew Hard, Tom Goldstein

Policy Gradients Incorporating the Future
David Venuto, Elaine Lau, Doina Precup, Ofir Nachum

Discrete Representations Strengthen Vision Transformer Robustness
Chengzhi Mao*, Lu Jiang, Mostafa Dehghani, Carl Vondrick, Rahul Sukthankar, Irfan Essa

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (see the blog post)
Zirui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao

Neural Stochastic Dual Dynamic Programming
Hanjun Dai, Yuan Xue, Zia Syed, Dale Schuurmans, Bo Dai

PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions
Zhaoqi Leng, Mingxing Tan, Chenxi Liu, Ekin Dogus Cubuk, Xiaojie Shi, Shuyang Cheng, Dragomir Anguelov

Information Prioritization Through Empowerment in Visual Model-Based RL
Homanga Bharadhwaj*, Mohammad Babaeizadeh, Dumitru Erhan, Sergey Levine

Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning
Dhruv Shah, Peng Xu, Yao Lu, Ted Xiao, Alexander Toshev, Sergey Levine, Brian Ichter

Understanding and Leveraging Overparameterization in Recursive Value Estimation
Chenjun Xiao, Bo Dai, Jincheng Mei, Oscar Ramirez, Ramki Gummadi, Chris Harris, Dale Schuurmans

The Efficiency Misnomer
Mostafa Dehghani, Anurag Arnab, Lucas Beyer, Ashish Vaswani, Yi Tay

On the Role of Population Heterogeneity in Emergent Communication
Mathieu Rita, Florian Strub, Jean-Bastien Grill, Olivier Pietquin, Emmanuel Dupoux

No One Representation to Rule Them All: Overlapping Features of Training Methods
Raphael Gontijo-Lopes, Yann Dauphin, Ekin D. Cubuk

Data Poisoning Won’t Save You From Facial Recognition
Evani Radiya-Dixit, Sanghyun Hong, Nicholas Carlini, Florian Tramèr

AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation
David Berthelot, Rebecca Roelofs, Kihyuk Sohn, Nicholas Carlini, Alex Kurakin

Maximum Entropy RL (Provably) Solves Some Robust RL Problems
Benjamin Eysenbach, Sergey Levine

Auto-scaling Vision Transformers Without Training
Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou

Optimizing Few-Step Diffusion Samplers by Gradient Descent
Daniel Watson, William Chan, Jonathan Ho, Mohammad Norouzi

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler

Fortuitous Forgetting in Connectionist Networks
Hattie Zhou, Ankit Vani, Hugo Larochelle, Aaron Courville

Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent
Oliver Bryniarski, Nabeel Hingun, Pedro Pachuca, Vincent Wang, Nicholas Carlini

Benchmarking the Spectrum of Agent Capabilities
Danijar Hafner

Charformer: Fast Character Transformers via Gradient-Based Subword Tokenization
Yi Tay, Vinh Q. Tran, Sebastian Ruder, Jai Gupta, Hyung Won Chung, Dara Bahri, Zhen Qin, Simon Baumgartner, Cong Yu, Donald Metzler

Mention Memory: Incorporating Textual Knowledge into Transformers Through Entity Mention Attention
Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Fei Sha, William Cohen

Eigencurve: Optimal Learning Rate Schedule for SGD on Quadratic Objectives with Skewed Hessian Spectrums
Rui Pan, Haishan Ye, Tong Zhang

Scale Efficiently: Insights from Pre-training and Fine-Tuning Transformers
Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

Omni-Scale CNNs: A Simple and Effective Kernel Size Configuration for Time Series Classification
Wensi Tang, Guodong Long, Lu Liu, Tianyi Zhou, Michael Blumenstein, Jing Jiang

Embedded-Model Flows: Combining the Inductive Biases of Model-Free Deep Learning and Explicit Probabilistic Modeling
Gianluigi Silvestri, Emily Fertig, Dave Moore, Luca Ambrogioni

Post Hoc Explanations May be Ineffective for Detecting Unknown Spurious Correlation
Julius Adebayo, Michael Muelly, Hal Abelson, Been Kim

Axiomatic Explanations for Visual Search, Retrieval, and Similarity Learning
Mark Hamilton, Scott Lundberg, Stephanie Fu, Lei Zhang, William T. Freeman

Pix2seq: A Language Modeling Framework for Object Detection (see the blog post)
Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey Hinton

Mirror Descent Policy Optimization
Manan Tomar, Lior Shani, Yonathan Efroni, Mohammad Ghavamzadeh

CodeTrek: Flexible Modeling of Code Using an Extensible Relational Representation
Pardis Pashakhanloo, Aaditya Naik, Yuepeng Wang, Hanjun Dai, Petros Maniatis, Mayur Naik

Conditional Object-Centric Learning From Video
Thomas Kipf, Gamaleldin F. Elsayed, Aravindh Mahendran, Austin Stone, Sara Sabour, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, Klaus Greff

A Loss Curvature Perspective on Training Instabilities of Deep Learning Models
Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David Cardoze, George E. Dahl, Zack Nado, Orhan Firat

Autonomous Reinforcement Learning: Formalism and Benchmarking
Archit Sharma, Kelvin Xu, Nikhil Sardana, Abhishek Gupta, Karol Hausman, Sergey Levine, Chelsea Finn

TRAIL: Near-Optimal Imitation Learning with Suboptimal Data
Mengjiao Yang, Sergey Levine, Ofir Nachum

Minimax Optimization With Smooth Algorithmic Adversaries
Tanner Fiez, Lillian J. Ratliff, Chi Jin, Praneeth Netrapalli

Unsupervised Semantic Segmentation by Distilling Feature Correspondences
Mark Hamilton, Zhoutong Zhang, Bharath Hariharan, Noah Snavely, William T. Freeman

InfinityGAN: Towards Infinite-Pixel Image Synthesis
Chieh Hubert Lin, Hsin-Ying Lee, Yen-Chi Cheng, Sergey Tulyakov, Ming-Hsuan Yang

Shuffle Private Stochastic Convex Optimization
Albert Cheu, Matthew Joseph, Jieming Mao, Binghui Peng

Hybrid Random Features
Krzysztof Choromanski, Haoxian Chen, Han Lin, Yuanzhe Ma, Arijit Sehanobish, Deepali Jain, Michael S Ryoo, Jake Varley, Andy Zeng, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller

Vector-Quantized Image Modeling With Improved VQGAN
Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu

On the Benefits of Maximum Likelihood Estimation for Regression and Forecasting
Pranjal Awasthi, Abhimanyu Das, Rajat Sen, Ananda Theertha Suresh

Surrogate Gap Minimization Improves Sharpness-Aware Training
Juntang Zhuang*, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha C. Dvornek, Sekhar Tatikonda, James S. Duncan, Ting Liu

Online Target Q-learning With Reverse Experience Replay: Efficiently Finding the Optimal Policy for Linear MDPs
Naman Agarwal, Prateek Jain, Dheeraj Nagaraj, Praneeth Netrapalli, Syomantak Chaudhuri

CrossBeam: Learning to Search in Bottom-Up Program Synthesis
Kensen Shi, Hanjun Dai, Kevin Ellis, Charles Sutton

Workshops
Workshop on the Elements of Reasoning: Objects, Structure, and Causality (OSC)
Organizers include: Klaus Greff, Thomas Kipf

Workshop on Agent Learning in Open-Endedness
Organizers include: Krishna Srinivasan
Speakers include: Natasha Jaques, Danijar Hafner

Wiki-M3L: Wikipedia and Multi-modal & Multi-lingual Research
Organizers include: Klaus Greff, Thomas Kipf
Speakers include: Jason Baldridge, Tom Duerig

Setting Up ML Evaluation Standards to Accelerate Progress
Organizers include: Rishabh Agarwal
Speakers and Panelists include: Katherine Heller, Sara Hooker, Corinna Cortes

From Cells to Societies: Collective Learning Across Scales
Organizers include: Mark Sandler, Max Vladymyrov
Speakers include: Blaise Aguera y Arcas, Alexander Mordvintsev, Michael Mozer

Emergent Communication: New Frontiers
Speakers include: Natasha Jaques

Deep Learning for Code
Organizers include: Jonathan Herzig

GroundedML: Anchoring Machine Learning in Classical Algorithmic Theory
Speakers include: Gintare Karolina Dziugaite

Generalizable Policy Learning in the Physical World
Speakers and Panelists include: Mrinal Kalakrishnan

CoSubmitting Summer (CSS) Workshop
Organizers include: Rosanne Liu



*Work done while at Google.  

Read More

How Nordic Aviation Capital uses Amazon Rekognition to streamline operations and save up to EUR200,000 annually

Nordic Aviation Capital (NAC) is the industry’s leading regional aircraft lessor, serving almost 70 airlines in approximately 45 countries worldwide.

In 2021, NAC turned to AWS to help it use artificial intelligence (AI) to further improve its leasing operations and reduce its reliance on manual labor.

With Amazon Rekognition Custom Labels, NAC built an AI solution that could automatically scan aircraft maintenance records and identify, based on their visual layouts, the specific documents requiring further review. This reduced their reliance on external contractors to do this work, improving speed and saving an estimated EUR200,000 annually in costs.

In this post, we share how NAC uses Amazon Rekognition to streamline their operations.

“Amazon Rekognition Custom Labels has given us superpowers when it comes to improving our aircraft maintenance reviews. We’re both impressed and excited by the opportunities this opens up for our team and the value it can help us create for our customers.”

-Mads Krog-Jensen, Senior Vice President of IT, NAC

Automating the document review process with AI

A key part of NAC’s leasing process is the validation of each of its leased aircraft’s maintenance history to determine the safety and operability of its component parts.

This process requires NAC’s maintenance technicians to validate a variety of key forms within the maintenance package: the collection of documents containing the maintenance history of each of the aircraft’s key parts.

These maintenance packages are extensive and unstructured, often amounting to as many as 10,000 pages, and containing various types and formats of documents that can vary widely based on the age and maintenance history of the aircraft.

The task of finding these specific forms was long and menial, generally performed by external contractors, who could take as long as a week to review each maintenance package and identify any essential forms requiring further review. This created a key process bottleneck that added cost and time to NAC’s leasing process.

To streamline this process, NAC set out to develop an AI-driven document review workflow that could automate this manual process by scanning entire maintenance packages to accurately identify and return only those documents that required further review by NAC specialists.

Building a custom computer vision solution with Amazon Rekognition

To solve this, NAC’s Director of Software Engineering, Martin Høst Normark, turned to Rekognition Custom Labels, a fully-managed computer vision service that helps developers quickly and easily train and deploy custom computer vision models tailored to any use case.

Rekognition Custom Labels accelerates the development of custom computer vision models by building on the capabilities of Amazon Rekognition and simplifying the key steps of the computer vision development process, such as image labeling, data inspection, and algorithm selection and deployment. Rekognition Custom Labels allows you to build custom computer vision models for image classification and object detection tasks. You can navigate through the image labeling process from within the Rekognition Custom Labels console or use Amazon SageMaker Ground Truth to allow for image labeling at scale. Rekognition Custom Labels automatically inspects the data, selects the right model framework and algorithm, optimizes the hyperparameters, and trains the model. When you’re satisfied with the model accuracy, you can host the trained model with just one click.

NAC chose Amazon Rekognition because it significantly reduced the undifferentiated heavy lifting of training and deploying a custom computer vision model. For example, instead of requiring thousands of labeled training images to get started, as is the case with most custom computer vision models, NAC was able to get started with just a few hundred examples of the types of documents it needed to identify. These images, together with an equal number of negative examples chosen at random, were then loaded into an Amazon Simple Storage Service (Amazon S3) bucket to be used for model training. This also enabled NAC to use Rekognition Custom Labels’ automatic labeling service, which could infer the labels of the two types of documents based solely on their S3 folder names.

From there, NAC was able to start training its model in just a few clicks, at which point Rekognition Custom Labels took care of loading and inspecting the training data, selecting the correct machine learning algorithm, training and testing the model, and reporting its performance metrics.
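As a rough illustration, kicking off such a training run from code with the AWS SDK for Python (boto3) might look like the sketch below. The project name, bucket, and manifest path are hypothetical; NAC’s actual setup has not been published.

    import boto3

    rekognition = boto3.client("rekognition")

    # Create a Custom Labels project (the name is hypothetical).
    project = rekognition.create_project(ProjectName="maintenance-doc-classifier")

    # Train a model version on labeled images stored in Amazon S3.
    # (In the console, labels can also be inferred from S3 folder names, as
    # described above; the API takes a Ground Truth manifest file.)
    rekognition.create_project_version(
        ProjectArn=project["ProjectArn"],
        VersionName="v1",
        OutputConfig={"S3Bucket": "nac-training-data", "S3KeyPrefix": "output/"},
        TrainingData={"Assets": [{"GroundTruthManifest": {"S3Object": {
            "Bucket": "nac-training-data", "Name": "train/manifest.json"}}}]},
        TestingData={"AutoCreate": True},  # let the service split off a test set
    )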

In order for the solution to deliver real business value, NAC identified a minimum performance baseline of 75% recall for its computer vision model, meaning that the solution had to be able to capture at least 75% of all relevant documents in any given maintenance package to warrant being used in production.
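Recall here is simply the fraction of relevant documents the model catches. As a quick sanity check of the metric (with made-up labels, not NAC’s data), scikit-learn computes it directly:

    from sklearn.metrics import recall_score

    # 1 = page requires further review, 0 = irrelevant page (hypothetical labels).
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

    # 4 of the 5 relevant pages were caught, so recall is 0.8.
    print(recall_score(y_true, y_pred))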

Using Rekognition Custom Labels and training on only those initial images, NAC was able to produce an initial model within its first week of development that delivered a recall of 98%, beating its performance baseline by 23 percentage points.

NAC then spent an additional week inspecting the types of pages causing classification errors and added more of those challenging examples to its S3 bucket to retrain its model. This step further improved recall to above 99%, far exceeding its production performance requirements.
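At inference time, a hosted model can then be queried page by page with the detect_custom_labels API. The sketch below assumes a hypothetical model ARN, bucket, and label name:

    import boto3

    rekognition = boto3.client("rekognition")

    # Hypothetical ARN of the trained and started model version.
    MODEL_ARN = "arn:aws:rekognition:eu-west-1:123456789012:project/maintenance-doc-classifier/version/v2/1"

    def requires_review(bucket, key, min_confidence=75.0):
        """Return True if a scanned page matches a form type needing review."""
        response = rekognition.detect_custom_labels(
            ProjectVersionArn=MODEL_ARN,
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MinConfidence=min_confidence,
        )
        return any(label["Name"] == "needs-review"  # hypothetical label name
                   for label in response["CustomLabels"])

    # e.g. requires_review("nac-maintenance-packages", "msn-123/page-0042.png")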

Improving operational efficiency and increasing innovation with AWS

With Rekognition Custom Labels, NAC was able to build, in just two weeks, a production-ready custom computer vision solution that could identify and return relevant documents with higher than 99% recall, reducing to a matter of minutes a process that previously took manual reviewers about a week to complete.

This success has enabled NAC to move this solution to production, removing key process bottlenecks in its aircraft maintenance review processes to improve efficiency, reduce reliance on external contractors, and continue to deliver on its 30-year history of technical and commercial innovation in the regional aircraft industry.

Conclusion

Rekognition Custom Labels can help you develop custom computer vision models with ease by simplifying key steps such as image labeling, data inspection, and algorithm selection and deployment.

Learn more about how you can build custom computer vision models tailored to your specific use case by visiting Getting Started with Amazon Rekognition Custom Labels or reviewing the Amazon Rekognition Custom Labels Guide.


About the Author

Daniel Burke is the European lead for AI and ML in the Private Equity group at AWS. Daniel works directly with Private Equity funds and their portfolio companies, helping them accelerate their AI and ML adoption, improve innovation, and increase enterprise value.

Read More

PPE: A fast and provably efficient RL algorithm for exogenous noise


Picture a person walking in a park by a pond. The surrounding environment contains a number of moving objects that change the quality of the environment: clouds moving to hide the sun, altering the quality of light; ducks gliding across the pond, causing its surface to ripple; people walking along a path, their images reflecting on the water. If we’re creating an AI model for navigating to a given goal, for example, a robot navigating to a specific location in a park to deliver a package, we want this model to recognize the robot and any obstacle in its way, but not the changes in its surrounding environment that occur independently of the agent, which we define as exogenous noise.

Although reinforcement learning (RL) has proven to be a successful paradigm for training AI models in navigation tasks, often used in gaming, existing RL methods are not yet robust enough to handle exogenous noise. While they may be able to heuristically solve certain problems, such as helping a robot navigate to a specific destination in a particular environment, there is no guarantee that they can solve problems in environments they have not seen.

In this post, we introduce Path Predictive Elimination (PPE), the first RL algorithm that can solve the problem of exogenous noise with a mathematical guarantee. Specifically, for any problem that satisfies certain assumptions, the algorithm succeeds in solving the problem using a small number of episodes. We discuss this algorithm in detail in our paper, “Provable RL with Exogenous Distractors via Multistep Inverse Dynamics.”

Figure 1: A robot walking in a park to a specific destination. The environment has many sources of exogenous noise, such as people walking in the background as their reflections appear on the water and ducks gliding along the surface of the pond.

Real-world RL and exogenous noise

To understand how PPE works, it’s important to first discuss how a real-world RL agent (the decision-maker) operates. An agent has an action space of \(A\) actions and receives information about the world in the form of an observation. In our example, the robot is the agent, and its action space contains four actions: a step forward, backward, left, or right.

After an agent takes a single action, it gets a new observation—that is, it receives more information about its environment—along with a reward. If the robot observes the park through a camera, the observation takes the form of an image. When an agent has a task to solve, such as reaching a specific destination, it must take a sequence of actions, each resulting in a reward. Its goal is to maximize the sum of rewards. When the robot takes a step forward, the camera generates a new observation of the park, and it receives a reward for this action. It may get a reward of 1 for the first action that takes it toward its goal and 0 otherwise. 
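To make the loop concrete, here is a minimal sketch of the agent-environment interaction just described, with a toy stand-in for the park. The environment, its four actions, and the noisy observation are all illustrative assumptions; this is not PPE itself.

    import random

    class ParkEnv:
        """Toy 1-D stand-in for the park: actions are forward, backward,
        left, right (left/right are no-ops here). A real observation would
        be a camera image; we return the position plus exogenous noise."""

        def __init__(self, goal=5):
            self.goal, self.pos = goal, 0

        def reset(self):
            self.pos = 0
            return (self.pos, random.random())

        def step(self, action):
            if action == 0:
                self.pos += 1
            elif action == 1:
                self.pos -= 1
            done = self.pos == self.goal
            reward = 1.0 if done else 0.0
            # The second component changes independently of the agent.
            return (self.pos, random.random()), reward, done

    env, total_reward = ParkEnv(), 0.0
    obs, done = env.reset(), False
    for t in range(1000):                      # cap the episode length
        action = random.randrange(4)           # a real agent would be smarter
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    print("return:", total_reward)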

Key challenges in real-world RL include how to handle complex observations and very large observation spaces. In our example, the robot in the park will have to work with an image that contains relevant information, such as the position of the destination, but this information is not directly accessible due to the exogenous noise and camera-generated image noise in the observation.

An image can be in a 500 x 500 x 3 pixel space, where each pixel takes 255 values. This gives \(255^{500 \times 500 \times 3}\) possible images, an extremely large number of possibilities. However, the environment is much simpler to describe than this number suggests: the observation in an RL environment is generated from a much more compact but hidden endogenous state. In our park example, the endogenous state contains the position of the agent, the destination, and any obstacles around the agent.

In our paper, we assume that the endogenous state dynamics are near-deterministic. That is, taking a fixed action in an endogenous state always leads to the same next endogenous state in most cases. We also require that it is possible to extract the endogenous state from an observation. However, we make no assumptions about dynamics of exogenous noise or how observations are generated.

Most existing RL algorithms are either unable to solve problems containing complex observations or lack a mathematical guarantee for working on new, untried problems. This guarantee is desirable because the cost of failure in the real world can be potentially high. Many existing algorithms require an impractically large amount of data to succeed, requiring the agent to perform a large number of actions before it solves the task.

PPE takes an approach called hidden state decoding, where the agent learns a type of ML model called a decoder to extract the hidden endogenous state from an observation. It does this in a self-supervised manner, meaning it does not require a human to provide it with labels. For example, PPE can learn a decoder to extract the robot and any obstacle’s position in the park. PPE is the first provable algorithm that can extract the endogenous state and use it to perform RL efficiently.

Path Predictive Elimination: An RL algorithm that is robust to exogenous noise

PPE is simple to implement and fast to run. It works by learning a small set of paths that can take the agent to all possible endogenous states. The agent could in principle consider all possible paths of length \(h\), enabling it to visit every endogenous state. However, since there are \(A^h\) possible paths of length \(h\), the number of paths quickly overwhelms the agent as \(h\) increases. The more paths the agent has to work with, the more data it needs to solve a given task. Ideally, if there are \(S\) endogenous states, we need just \(S\) paths, with exactly one path going to each endogenous state. PPE works by eliminating redundant paths that visit the same endogenous state, which it detects by solving a novel self-supervised classification task.

PPE is similar in structure to the breadth-first search algorithm in that it runs a for-loop where, in iteration \(h\) of the loop, the agent learns to visit all endogenous states that can be reached by taking \(h\) actions. At the start of iteration \(h\), the agent maintains a list of paths of length \(h\). This list has a path to visit every endogenous state that’s reachable after taking \(h\) actions. However, it may also contain redundant paths, i.e., multiple paths that reach the same endogenous state. When this list is simply all paths of length 1, it corresponds to every action in the agent’s action space.

The top of Figure 2 shows the agent’s initial list of paths, which contains at least three paths: \(\pi_1\), \(\pi_2\), and \(\pi_3\). The first two paths reach the same destination, denoted by the endogenous state \(s_1\). In contrast, the last path, \(\pi_3\), reaches a different endogenous state, \(s_2\). Figure 2 shows a sampled observation (or image) for each endogenous state.

Because PPE wants to learn a small set of paths to visit all endogenous states, it seeks to eliminate redundant paths by collecting a dataset of observations coupled with the path that was followed to observe them. In Figure 2, both \(\pi_1\) and \(\pi_2\) reach the same endogenous state, so one of them can be eliminated. PPE does this by randomly selecting a path from its list, following it to the end, and saving the last observation. For example, the dataset can contain a tuple \((\pi_1, x)\), where \(\pi_1\) is the path in our list and \(x\) is the image at the top right of Figure 2. PPE collects a dataset of many such tuples.
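In code, that collection step is a short loop. The sketch below assumes a hypothetical rollout(path) helper that resets the environment, executes the path, and returns the final observation:

    import random

    def collect_elimination_dataset(paths, rollout, num_samples=10000):
        """Pair final observations with the index of the path that produced them.

        `paths` is the agent's current list of length-h paths; `rollout(path)`
        (assumed) executes one from the start state and returns the last
        observation. Each entry is a tuple like (index of pi_1, x)."""
        dataset = []
        for _ in range(num_samples):
            idx = random.randrange(len(paths))   # pick a path at random
            dataset.append((idx, rollout(paths[idx])))
        return dataset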

Figure 2: Execution of the PPE algorithm at a given for-loop iteration. At each iteration, PPE starts with a list of paths to visit endogenous states and then eliminates redundant paths, i.e., those that visit an endogenous state that can also be reached by an existing path. Here the redundant path \(\pi_2\) is eliminated because it reaches an endogenous state that can also be reached by the existing path \(\pi_1\).

PPE then solves a multiclass classification problem to predict the index of the path from the last observation, where the index is computed with respect to the original list. This classification problem can be solved with any appropriate model class, such as deep neural networks, using PyTorch, TensorFlow, or a library of your choice. If two different paths, \(\pi_1\) and \(\pi_2\), reach the same endogenous state, the learned classifier cannot deterministically predict which path was used to visit observations from this state; that is, it predicts a high probability for both paths given an observation from this endogenous state. PPE uses this confusion signal to eliminate one of the two paths, since both reach the same endogenous state. As a byproduct of solving the classification problem, PPE also learns a decoder, which maps an observation to the index of the leftover path with the highest probability under the learned classifier.
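A hedged sketch of this step in PyTorch follows: train a path-index classifier on the collected tuples, then drop any path whose observations the classifier attributes, with comparable probability, to a path already kept. The two-layer network and the confusion threshold are illustrative choices, not the paper’s exact recipe.

    import torch
    import torch.nn as nn

    def eliminate_redundant_paths(dataset, obs_dim, num_paths,
                                  epochs=100, threshold=0.3):
        # Multiclass classifier from observation to path index.
        decoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                nn.Linear(128, num_paths))
        opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
        X = torch.stack([torch.as_tensor(obs, dtype=torch.float32)
                         for _, obs in dataset])
        y = torch.tensor([idx for idx, _ in dataset])

        for _ in range(epochs):                  # standard supervised training
            opt.zero_grad()
            nn.functional.cross_entropy(decoder(X), y).backward()
            opt.step()

        # Keep path i only if no already-kept path j is confused with it:
        # high probability on j for observations gathered under path i means
        # both paths reach the same endogenous state, so path i is redundant.
        keep = []
        with torch.no_grad():
            probs = decoder(X).softmax(dim=-1)
        for i in range(num_paths):
            mean_p = probs[y == i].mean(dim=0)
            if not any(mean_p[j] > threshold for j in keep):
                keep.append(i)
        return keep, decoder

The surviving indices in keep define the non-redundant paths carried into the next iteration, and decoder is the learned map from observations to path indices.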

At the end of iteration \(h\) of the for-loop, PPE has found a list of leftover paths that includes a unique path for every endogenous state reachable after taking \(h\) actions. It then expands these leftover paths to create the list for the next iteration: for every leftover path, PPE creates \(A\) new paths by appending each action to the end of the path, as in the sketch below. The for-loop then continues with the next iteration.
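Representing a path as a list of actions (an assumption of this sketch), the expansion step is a one-liner:

    def expand(leftover_paths, num_actions):
        # Append every action to every surviving path for the next iteration.
        return [path + [a] for path in leftover_paths for a in range(num_actions)]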

Note that the above steps of PPE can be computed even in the absence of rewards. The output of these steps, namely the decoder and the learned leftover paths, can be cached and used to optimize any reward functions provided later. We discuss various strategies to optimize any given reward function in our paper, including both model-free and model-based approaches.

Proof, experiment, and code

The paper also provides a mathematical proof that PPE efficiently solves a large class of RL problems. Using a small amount of data, it can accurately explore, find a policy that achieves the maximum sum of rewards, recover a decoder that maps observations to their hidden endogenous states, and recover the dynamics of the endogenous state, all with high probability. We describe various experiments where PPE successfully performs these tasks in line with its mathematical guarantee and outperforms various prior methods.

This is illustrated in Figure 3. It depicts a visual grid-world where the agent’s goal is to navigate to the slice of pizza on the other side of the pond, which is populated by two ducks that move independently of the agent’s actions and are the source of exogenous noise. The endogenous state consists of the position of the agent. The figure shows what PPE is expected to do in this task. It gradually learns longer paths that reach various endogenous states in the environment. It also learns a decoder and uses it to extract the dynamics of the latent endogenous state, shown on the right.

Figure 3: The area on the left shows a visual grid-world navigation task where an agent is trying to reach a slice of pizza. The motion of the ducks is a source of exogenous noise. PPE allows the agent to learn a small set of paths to visit every endogenous state. On the right, PPE also learns a decoder and uses it to extract the dynamics of the latent endogenous state. The circles denote an endogenous state and the arrows denote possible ways to navigate from one endogenous state to another.

The road ahead

While PPE is the first RL algorithm that offers a mathematical guarantee in the presence of exogenous noise, there is still work to do before we can solve every RL problem that includes exogenous noise. Some of the unanswered questions that we are pursuing include:

  1. How can we eliminate the assumption that PPE makes, that latent endogenous state dynamics are near-deterministic?
  2. Can we extend PPE to work in nonepisodic settings, where the agent generates a single long episode?
  3. How does PPE perform on real-world problems?
  4. Can we make PPE a truly online algorithm, eliminating the need to collect large datasets before it improves?

RL algorithms hold great promise for improving applications in a diverse range of fields, from robotics, gaming, and software debugging, to healthcare. However, exogenous noise presents a serious challenge in unlocking the full potential of RL agents in the real world. We’re hopeful that PPE will motivate further research in RL in the presence of exogenous noise.

The post PPE: A fast and provably efficient RL algorithm for exogenous noise appeared first on Microsoft Research.

Read More

Let Me Shoyu How It’s Done: Creating the NVIDIA Omniverse Ramen Shop

When brainstorming a scene to best showcase the groundbreaking capabilities of the Omniverse platform, some NVIDIA artists turned to a cherished memory: enjoying ramen together in a mom-and-pop shop down a side street in Tokyo.

Simmering pots of noodles, steaming dumplings, buzzing kitchen appliances, warm ambient lighting and glistening black leather stools. These were all simulated in a true-to-reality virtual world by nearly two dozen NVIDIA artists and freelancers across the globe using NVIDIA Omniverse, a 3D design collaboration and world simulation platform.

The final scene — consisting of over 22 million triangles, 350 unique textured models and 3,000 4K-resolution texture maps — welcomes viewers into a virtual ramen shop featured in last month’s GTC keynote address by NVIDIA founder and CEO Jensen Huang.

The mouth-watering demo was created to highlight the NVIDIA RTX-powered real-time rendering and physics simulation capabilities of Omniverse, which scales performance and speed when running on multiple GPUs.

It’s a feast for the eyes, as all of the demo’s parts are physically accurate and photorealistic, from the kitchen appliances and order slips; to the shoyu ramen and chashu pork; to the stains on the pots and pans.

“Our team members were hungry just looking at the renders,” said Andrew Averkin, senior art manager and lead environment artist at NVIDIA, in a GTC session offering a behind-the-scenes look at the making of the Omniverse ramen shop.

The session — presented by Averkin and Gabriele Leone, senior art director at NVIDIA — is now available on demand.

Gathering the Ingredients for Reference

The team’s first step was to gather the artistic ingredients: visual references on which to base the 3D models and props for the scene.

An NVIDIA artist traveled to a real ramen restaurant in Tokyo and collected over 2,000 high-resolution reference images and videos, each capturing aspects from the kitchen’s distinct areas for cooking, cleaning, food preparation and storage.

Then, props artists modeled and textured 3D assets for all of the shop’s items, from the stoves and fridges to gas pipes and power plugs. Even the nutrition labels on bottled drinks and the buttons for the ticket machine from which visitors order meals were precisely replicated.

Drinks in a fridge at the virtual ramen shop, made using Omniverse Create, Adobe Substance 3D Painter, Autodesk 3ds Max, Blender, Maxon Cinema 4D, and RizomUV.

In just two months, NVIDIA artists across the world modeled 350 unique props for the scene, using a range of design software including Autodesk Maya, Autodesk 3ds Max, Blender, Maxon Cinema 4D and Pixologic ZBrush. Omniverse Connectors and Pixar’s Universal Scene Description (USD) format enabled the models to be seamlessly brought into the Omniverse Create app.
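For readers unfamiliar with USD, stages are also scriptable from Python through Pixar’s pxr module. A minimal sketch of referencing an exported prop into a shared scene, with hypothetical file names and paths, looks like this:

    from pxr import Usd, UsdGeom

    # Create a stage for the shared scene (file name hypothetical).
    stage = Usd.Stage.CreateNew("ramen_shop.usda")
    UsdGeom.Xform.Define(stage, "/RamenShop")

    # Reference a prop authored in another app and exported to USD.
    stove = stage.DefinePrim("/RamenShop/Stove")
    stove.GetReferences().AddReference("./props/stove.usd")

    # Position the prop within the scene.
    UsdGeom.XformCommonAPI(stove).SetTranslate((1.5, 0.0, -2.0))

    stage.GetRootLayer().Save()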

“The best way to think about Omniverse Create is to consider it a world-building tool,” Leone said. “It works with Omniverse Connectors, which allow artists to use whichever third-party apps they’re familiar with and connect their work seamlessly in Omniverse — taking creativity and experimentation to new levels.”

Adding Lighting and Texture Garnishes 

Artists then used Adobe Substance Painter to texture the materials. To make the props look used on a daily basis, the team whipped up details like dents on wooden counters, stickers peeling off appliances and sauce stains on pots.

“Some of our artists went as far as cooking some of the recipes themselves and taking references of their own pots to get a good idea of how sauce or burn stains might accumulate,” Averkin said.

Omniverse’s simulation capabilities enable light to reflect off of glass and other materials with true-to-reality physical accuracy. Plus, real-time photorealistic lighting rendered in 4K resolution created an orange warmth inside the cozy virtual shop, contrasting the rainy atmosphere that can be seen through the windows.

Artists used Omniverse Flow, a fluid simulation Omniverse Extension for smoke and fire, to bring the restaurant’s burning stoves and steaming plates to life. SideFX Houdini software helped to animate the boiling water, which was eventually brought into the virtual kitchen using an Omniverse Connector.

Broth boils in the virtual kitchen using visual effects offered by Houdini software.

And Omniverse Create’s camera animation feature allowed the artists to capture the final path-traced scene in real time, exactly as observed through the viewport.

Photorealistic lighting illuminates the virtual ramen shop, enabled by NVIDIA RTX-based ray tracing and path tracing.

Learn more about Omniverse by watching additional GTC sessions on demand — featuring visionaries from the Omniverse team, Adobe, Autodesk, Epic Games, Pixar, Unity and Walt Disney Studios.

Join in on the Creation

Creators across the world can download NVIDIA Omniverse for free, and enterprise teams can use the platform for their 3D projects.

Join the #MadeInMachinima contest, running through June 27, for a chance to win the latest NVIDIA Studio laptop.

Connect your workflows to Omniverse with software from Adobe, Autodesk, Epic Games, Maxon, Reallusion and more.

Follow Omniverse on Instagram, Twitter, YouTube and Medium for additional resources and inspiration. Check out the Omniverse forums and join our Discord Server to chat with the community.

The post Let Me Shoyu How It’s Done: Creating the NVIDIA Omniverse Ramen Shop appeared first on NVIDIA Blog.

Read More

Should I Use Offline RL or Imitation Learning?


Figure 1: Summary of our recommendations for when a practitioner should use BC and various imitation-learning-style methods, and when they should use offline RL approaches.

Offline reinforcement learning allows learning policies from previously collected data, which has profound implications for applying RL in domains where running trial-and-error learning is impractical or dangerous, such as safety-critical settings like autonomous driving or medical treatment planning. In such scenarios, online exploration is simply too risky, but offline RL methods can learn effective policies from logged data collected by humans or heuristically designed controllers. Prior learning-based control methods have also approached learning from existing data as imitation learning: if the data is generally “good enough,” simply copying the behavior in the data can lead to good results, and if it’s not good enough, then filtering or reweighting the data and then copying can work well. Several recent works suggest that this is a viable alternative to modern offline RL methods.
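To ground the distinction, here is what the simplest of those imitation approaches, filtered BC, amounts to in a sketch: keep only the highest-return trajectories, then run supervised learning on their (state, action) pairs. The network, keep fraction, and data layout are illustrative assumptions, not the post’s prescription.

    import numpy as np
    import torch
    import torch.nn as nn

    def filtered_bc(trajectories, state_dim, num_actions,
                    keep_frac=0.1, epochs=50):
        """Behavior cloning on the top keep_frac of trajectories by return.

        Assumes each trajectory is a dict with "states" (T x state_dim),
        "actions" (T,) and a scalar "return"."""
        ranked = sorted(trajectories, key=lambda t: t["return"], reverse=True)
        kept = ranked[:max(1, int(keep_frac * len(ranked)))]

        states = torch.as_tensor(np.concatenate([t["states"] for t in kept]),
                                 dtype=torch.float32)
        actions = torch.as_tensor(np.concatenate([t["actions"] for t in kept]))

        policy = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                               nn.Linear(256, num_actions))
        opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
        for _ in range(epochs):              # plain supervised learning
            opt.zero_grad()
            nn.functional.cross_entropy(policy(states), actions).backward()
            opt.step()
        return policy

Offline RL methods, by contrast, also use the reward signal, which lets them stitch together good behavior from suboptimal data rather than merely copying it.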

This brings about several questions: when should we use offline RL? Are there fundamental limitations to methods that rely on some form of imitation (BC, conditional BC, filtered BC) that offline RL addresses? While it might be clear that offline RL should enjoy a large advantage over imitation learning when learning from diverse datasets that contain a lot of suboptimal behavior, we will also discuss how even cases that might seem BC-friendly can still allow offline RL to attain significantly better results. Our goal is to help explain when and why you should use each method and provide guidance to practitioners on the benefits of each approach. Figure 1 concisely summarizes our findings and we will discuss each component.

Stellar Weather: Researchers Describe the Skies of Exoplanets

A paper released today describes in the greatest detail to date the atmospheres on distant planets.

Seeking the origins of what’s in and beyond the Milky Way, researchers surveyed 25 exoplanets, bodies that orbit stars far beyond our solar system. Specifically, they studied hot Jupiters, the largest and thus easiest to detect exoplanets, many sweltering at temperatures over 3,000 degrees Fahrenheit.

Their analysis of these torrid atmospheres used high performance computing with NVIDIA GPUs to advance understanding of all planets, including our own.

Hot Jupiters Shine New Lights

Hot Jupiters “offer an incredible opportunity to study physics in environmental conditions nearly impossible to reproduce on Earth,” said Quentin Changeat, lead author of the paper and a research fellow at University College London (UCL).

By analyzing trends across a large group of exoplanets, they shine new light on big questions.

“This work can help make better models of how the Earth and other planets came to be,” said Ahmed F. Al-Refaie, a co-author of the paper and head of numerical methods at the UCL Centre for Space Exochemistry Data.

Parsing Hubble’s Big Data

They used the most data ever employed in a survey of exoplanets — 1,000 hours of archival observations, mainly from the Hubble Space Telescope.

The hardest and, for Changeat, the most fascinating part of the process was determining what small set of models to run in a consistent way against data from all 25 exoplanets to get the most reliable and revealing results.

“There was an amazing period of exploration — I was finding all kinds of sometimes weird solutions — but it was really fast to get the answers using NVIDIA GPUs,” he said.

Millions of Calculations

Their overall results required heady math. Each of about 20 models had to run 250,000 times for each of the 25 exoplanets, on the order of 125 million model evaluations in total.

They used the Wilkes3 supercomputer at the University of Cambridge, which packs 320 NVIDIA A100 Tensor Core GPUs on an NVIDIA Quantum InfiniBand network.

“I expected the A100s might be double the performance of V100s and P100s I used previously, but honestly it was like an order of magnitude difference,” said Al-Refaie.

Orders of Magnitude Gains

A single A100 GPU gave a 200x performance boost compared to a CPU.

Packing 32 processes on each GPU, the team got the equivalent of a 6,400x speedup compared to a CPU. Each Wilkes3 node, with its four A100s, delivered the equivalent of up to 25,600 CPU cores, he said.

The speedups are high because their application is amazingly parallel: it simulates on GPUs how hundreds of thousands of light wavelengths would travel through an exoplanet’s atmosphere.

On A100s, their models complete in minutes work that would require weeks on CPUs.

The GPUs ran the complex physics models so fast that their bottleneck became a CPU-based system handling a much simpler task of determining statistically where to explore next.

“It was a little funny, and somewhat astounding, that simulating the atmosphere was not the hard part — that gave us an ability to really see what was in the data,” he said.

A Wealth of Software

Al-Refaie employed CUDA profilers to optimize jobs, PyCUDA to optimize the team’s code, and cuBLAS to speed up some math routines.
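As a flavor of what that stack looks like in practice, here is a generic PyCUDA sketch (not the team’s radiative-transfer code) that evaluates an exponential attenuation kernel over many wavelengths in parallel:

    import numpy as np
    import pycuda.autoinit                  # set up a CUDA context
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void attenuate(float *out, const float *tau, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = expf(-tau[i]);  // Beer-Lambert-style attenuation
    }
    """)
    attenuate = mod.get_function("attenuate")

    tau = np.random.rand(200000).astype(np.float32)  # one optical depth per wavelength
    out = np.empty_like(tau)

    attenuate(drv.Out(out), drv.In(tau), np.int32(tau.size),
              block=(256, 1, 1), grid=((tau.size + 255) // 256, 1))

    assert np.allclose(out, np.exp(-tau), atol=1e-5)

Each thread handles one wavelength independently, which is exactly the kind of embarrassingly parallel workload the article describes.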

“With all the NVIDIA software available, there’s a wealth of things you can exploit, so the team is starting to spit out papers quickly now because we have the right tools,” he said.

They will need all the help they can get, as the work is poised to get much more challenging.

Getting a Better Telescope

The James Webb Space Telescope comes online in June. Unlike Hubble and all previous instruments, it’s specifically geared to observe exoplanets.

The team is already developing ways to work at higher resolutions to accommodate the expected data. For example, instead of using one-dimensional models, they will use two- or three-dimensional ones and account for more parameters like changes over time.

“If a planet has a storm, for example, we may not be able to see it with current data, but with the next generation data, we think we will,” said Changeat.

Exploring HPC+AI

The rising tide of data opens a door to apply deep learning, something the group’s AI experts are exploring.

It’s an exciting time, said Changeat, who’s joining the Space Telescope Science Institute in Baltimore as an ESA fellow to work directly with experts and engineers there.

“It’s really fun working with experts from many fields. We had space observers, data analysts, machine-learning and software experts on this team — that’s what made this paper possible,” Changeat said.

Learn more about the paper here.

Image at top courtesy of ESA/Hubble, N. Bartmann

The post Stellar Weather: Researchers Describe the Skies of Exoplanets appeared first on NVIDIA Blog.

Read More

Stanford AI Lab Papers and Talks at ICLR 2022

The International Conference on Learning Representations (ICLR) 2022 is being hosted virtually from April 25th – April 29th. We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to papers, videos and blogs below. Feel free to reach out to the contact authors directly to learn more about the work that’s happening at Stanford!

List of Accepted Papers

Autonomous Reinforcement Learning: Formalism and Benchmarking


Authors: Archit Sharma*, Kelvin Xu*, Nikhil Sardana, Abhishek Gupta, Karol Hausman, Sergey Levine, Chelsea Finn

Contact: architsh@stanford.edu

Links: Paper | Website

Keywords: reinforcement learning, continual learning, reset-free reinforcement learning

MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts


Authors: Weixin Liang, James Zou

Contact: wxliang@stanford.edu

Links: Paper | Video | Website

Keywords: benchmark dataset, distribution shift, out-of-domain generalization

An Explanation of In-context Learning as Implicit Bayesian Inference


Authors: Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma

Contact: xie@cs.stanford.edu

Links: Paper | Video

Keywords: gpt-3, in-context learning, pretraining, few-shot learning

GreaseLM: Graph REASoning Enhanced Language Models for Question Answering


Authors: Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning, Jure Leskovec

Contact: xikunz2@cs.stanford.edu

Award nominations: Spotlight

Links: Paper | Website

Keywords: knowledge graph, question answering, language model, commonsense reasoning, graph neural networks, biomedical qa

Fast Model Editing at Scale


Authors: Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning

Contact: eric.mitchell@cs.stanford.edu

Links: Paper | Website

Keywords: model editing; meta-learning; language models; continual learning; temporal generalization

Vision-Based Manipulators Need to Also See from Their Hands


Authors: Kyle Hsu, Moo Jin Kim, Rafael Rafailov, Jiajun Wu, Chelsea Finn

Contact: kylehsu@cs.stanford.edu

Award nominations: Oral Presentation

Links: Paper | Website

Keywords: reinforcement learning, observation space, out-of-distribution generalization, visuomotor control, robotics, manipulation

IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes


Authors: Qi Li*, Kaichun Mo*, Yanchao Yang, Hang Zhao, Leonidas J. Guibas

Contact: kaichun@cs.stanford.edu

Links: Paper

Keywords: embodied ai, 3d scene graph, interactive perception

VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects


Authors: Ruihai Wu*, Yan Zhao*, Kaichun Mo*, Zizheng Guo, Yian Wang, Tianhao Wu, Qingnan Fan, Xuelin Chen, Leonidas J. Guibas, Hao Dong

Contact: kaichun@cs.stanford.edu

Links: Paper | Video | Website

Keywords: visual affordance learning, robotic manipulation, 3d perception, interactive perception

Language modeling via stochastic processes


Authors: Rose E Wang

Contact: rewang@stanford.edu

Award nominations: Oral Presentation

Links: Paper | Video | Website

Keywords: contrastive learning, language modeling, stochastic processes

MetaMorph: Learning Universal Controllers with Transformers


Authors: Agrim Gupta, Linxi Fan, Surya Ganguli, Li Fei-Fei

Contact: agrim@stanford.edu

Links: Paper | Video | Website

Keywords: rl, modular robots, transformers

Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution


Authors: Ananya Kumar

Contact: ananya@cs.stanford.edu

Award nominations: Oral Presentation

Links: Paper

Keywords: fine-tuning theory, transfer learning theory, fine-tuning, distribution shift, implicit regularization

An Experimental Design Perspective on Model-Based Reinforcement Learning


Authors: Viraj Mehta, Biswajit Paria, Jeff Schneider, Stefano Ermon, Willie Neiswanger

Contact: virajm@cs.cmu.edu, neiswanger@cs.stanford.edu

Links: Paper

Keywords: reinforcement learning, model-based reinforcement learning, mbrl, bayesian optimal experimental design, boed, bax

Domino: Discovering Systematic Errors with Cross-Modal Embeddings


Authors: Sabri Eyuboglu*, Maya Varma*, Khaled Saab*, Jean-Benoit Delbrouck, Christopher Lee-Messer, Jared Dunnmon, James Zou, Christopher Ré

Contact: {eyuboglu,mvarma2,ksaab}@stanford.edu

Award nominations: Oral Presentation

Links: Paper | Blog Post | Website

Keywords: robustness, subgroup analysis, error analysis, multimodal, slice discovery

Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models


Authors: Tri Dao, Beidi Chen, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher Ré

Contact: trid@stanford.edu

Award nominations: Spotlight

Links: Paper | Blog Post

Keywords: sparse training, butterfly matrices

Hindsight: Posterior-guided training of retrievers for improved open-ended generation


Authors: Ashwin Paranjape, Omar Khattab, Christopher Potts, Matei Zaharia, Christopher D Manning

Contact: ashwinp@cs.stanford.edu

Links: Paper

Keywords: retrieval, generation, retrieval-augmented generation, open-ended generation, informative conversations, free-form qa, posterior distribution, elbo

Unsupervised Discovery of Object Radiance Fields


Authors: Hong-Xing Yu, Leonidas J. Guibas, Jiajun Wu

Contact: koven@cs.stanford.edu

Links: Paper | Video | Website

Keywords: object-centric representation, unsupervised, 3d object discovery

Efficiently Modeling Long Sequences with Structured State Spaces


Authors: Albert Gu, Karan Goel, Christopher Ré

Contact: albertgu@stanford.edu

Award nominations: Outstanding Paper Honorable Mention

Links: Paper | Blog Post | Video

Keywords: hippo

How many degrees of freedom do we need to train deep networks: a loss landscape perspective


Authors: Brett W. Larsen, Stanislav Fort, Nic Becker, Surya Ganguli

Contact: bwlarsen@stanford.edu

Links: Paper

Keywords: loss landscape, high-dimensional geometry, random hyperplanes, optimization

How did the Model Change? Efficiently Assessing Machine Learning API Shifts


Authors: Lingjiao Chen, Matei Zaharia, James Zou

Contact: lingjiao@stanford.edu

Links: Paper | Website

Keywords: mlaas, performance shifts, ml systems


We look forward to seeing you at ICLR 2022!

Read More