Machine Learning-based Damage Assessment for Disaster Relief

Posted by Joseph Xu, Senior Software Engineer and Pranav Khaitan, Engineering Lead, Google Research

Natural disasters, such as earthquakes, hurricanes, and floods, affect large areas and millions of people, but responding to such disasters is a massive logistical challenge. Crisis responders, including governments, NGOs, and UN organizations, need fast access to comprehensive and accurate assessments in the aftermath of disasters to plan how best to allocate limited resources. To this end, very high resolution (VHR) satellite imagery, with up to 0.3 meter resolution, is becoming an increasingly important tool for crisis response, giving responders an unprecedented breadth of visual information about how terrain, infrastructure, and populations are changed by disasters.

However, intensive manual labor is still required to extract operationally relevant information (collapsed buildings, cracks in bridges, where people have set up temporary shelters) from the raw satellite imagery. As an example, for the 2010 Haiti earthquake, analysts manually examined over 90,000 buildings in the Port-au-Prince area alone, rating the damage each one incurred on a five-point scale. Many of these manual analyses take teams of experts many weeks to complete, whereas they are most needed within 48-72 hours after the disaster, when the most urgent decisions are made.

To help mitigate the impact of such disasters, we present “Building Damage Detection in Satellite Imagery Using Convolutional Neural Networks”, which details a machine learning (ML) approach to automatically process satellite data to generate building damage assessments. We developed this work in partnership with the United Nations World Food Programme (WFP) Innovation Accelerator, and we believe it has the potential to drastically reduce the time and effort required for crisis workers to produce damage assessment reports. In turn, this would reduce the turnaround times needed to deliver timely disaster aid to the most severely affected areas, while increasing the overall coverage of such critical services.

The Approach
The automatic damage assessment process is split into two steps: building detection and damage classification. In the building detection step, our approach uses an object detection model to draw bounding boxes around each building in the image. We then extract pre-disaster and post-disaster images centered on each detected building and use a classification model to determine whether the building is damaged.

The classification model is a convolutional neural network that takes as input two 161 pixel x 161 pixel RGB images, each corresponding to a 50 m x 50 m ground footprint centered on a given building. One image is from before the disaster event, and the other is from after it. The model analyzes differences between the two images and outputs a score from 0.0 to 1.0, where 0.0 means the building was not damaged and 1.0 means the building was damaged.
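
As a rough illustration of the patch geometry described above, the sketch below computes the ground distance covered by one pixel (50 m / 161 pixels, roughly 0.31 m, in line with the 0.3 meter imagery resolution) and crops matching before and after patches around a building center. It is a minimal sketch, not the authors' code: the function names, the nested-list image representation, and the assumption that both images are already aligned to the same pixel grid are all hypothetical.

```python
PATCH_PIXELS = 161        # patch side length in pixels, from the post
FOOTPRINT_METERS = 50.0   # ground footprint per side in meters, from the post


def meters_per_pixel(patch_pixels=PATCH_PIXELS, footprint_m=FOOTPRINT_METERS):
    """Ground distance covered by one pixel of a patch."""
    return footprint_m / patch_pixels


def crop_window(center_row, center_col, size=PATCH_PIXELS):
    """Return (row_start, row_end, col_start, col_end), end-exclusive.

    Assumes `size` is odd so the building center sits on the middle pixel.
    """
    half = size // 2
    return (center_row - half, center_row + half + 1,
            center_col - half, center_col + half + 1)


def crop(image, window):
    """Cut a patch out of an image stored as a list of pixel rows."""
    r0, r1, c0, c1 = window
    if r0 < 0 or c0 < 0 or r1 > len(image) or c1 > len(image[0]):
        raise ValueError("patch extends past the image border")
    return [row[c0:c1] for row in image[r0:r1]]


def make_example_pair(pre_image, post_image, center_row, center_col):
    """Crop the same window from the pre- and post-disaster images.

    Assumption: both images share one pixel grid (already co-registered).
    """
    window = crop_window(center_row, center_col)
    return crop(pre_image, window), crop(post_image, window)
```

A real pipeline would also have to handle buildings near the image border (for example by padding) rather than rejecting them as this sketch does.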

Because the before and after images are taken on different dates, at different times of day, and in some cases by different satellites altogether, a host of problems can arise. For example, the brightness, contrast, color saturation, and lighting conditions of the images may differ significantly, and the pixels in the two images may be misaligned.

To correct for differences in color and illumination, we use histogram equalization to normalize the colors in the before and after images. We also make the model more robust to insignificant color differences by using standard data augmentation techniques, such as randomly perturbing the contrast and saturation of the images, during training.
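
To make the two preprocessing ideas above concrete, here is a minimal sketch of histogram equalization and of a contrast perturbation of the kind used for augmentation. It assumes a single 8-bit channel stored as a flat list of integers; the post does not say how the channels are handled or which perturbation ranges are used, so the function names and the `max_delta` parameter are hypothetical.

```python
import random


def equalize_histogram(pixels, levels=256):
    """Spread intensities so their cumulative distribution is roughly linear."""
    if not pixels:
        return []
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative count of pixels at or below each intensity level.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return list(pixels)  # every pixel has the same value; nothing to spread
    scale = (levels - 1) / (total - cdf_min)
    lookup = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [lookup[p] for p in pixels]


def adjust_contrast(pixels, factor, levels=256):
    """Scale each pixel's distance from the mean intensity by `factor`."""
    mean = sum(pixels) / len(pixels)
    return [min(levels - 1, max(0, round(mean + factor * (p - mean))))
            for p in pixels]


def random_contrast(pixels, rng, max_delta=0.2):
    """Training-time augmentation: pick a contrast factor near 1.0 at random."""
    factor = 1.0 + rng.uniform(-max_delta, max_delta)
    return adjust_contrast(pixels, factor)
```

For example, `random_contrast(channel, random.Random(0))` returns a slightly lower- or higher-contrast copy of `channel`; applying a fresh perturbation at every training step is what discourages the model from keying on small color differences.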

Training Data
One of the main challenges of this work is assembling a training data set. Data availability in this application is inherently limited because there are only a handful of disasters that have high resolution satellite images and an even smaller number that have existing damage assessments. For labels, we use publicly available damage assessments manually generated by humanitarian organizations operating in this space, such as UNOSAT and REACH. We obtain the original satellite images on which the manual assessments are performed and then use Google Earth Engine to spatially join the damage assessment labels with the satellite images in order to produce the final training examples. All images used to train the model were sourced from commercially available sources.
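
The spatial join itself is done in Google Earth Engine, whose API is not shown here. Purely to illustrate the idea, the following sketch matches point labels to the image whose bounding box contains them and converts each label's longitude and latitude to a pixel position. The dictionary fields, the north-up linear georeferencing, and the function names are assumptions made for this example.

```python
def find_containing_image(lon, lat, images):
    """Return the first image whose geographic bounds contain the point."""
    for image in images:
        if (image["west"] <= lon <= image["east"]
                and image["south"] <= lat <= image["north"]):
            return image
    return None


def to_pixel(lon, lat, image):
    """Map lon/lat to (row, col), assuming a north-up, linearly referenced image."""
    col = int((lon - image["west"]) / (image["east"] - image["west"]) * image["width"])
    row = int((image["north"] - lat) / (image["north"] - image["south"]) * image["height"])
    # Points exactly on the east or south edge map to the last pixel.
    return min(row, image["height"] - 1), min(col, image["width"] - 1)


def join_labels_to_images(labels, images):
    """Attach each damage label to an image and a pixel location."""
    examples = []
    for label in labels:
        image = find_containing_image(label["lon"], label["lat"], images)
        if image is None:
            continue  # label lies outside every image, so no example is built
        row, col = to_pixel(label["lon"], label["lat"], image)
        examples.append({"image_id": image["id"], "row": row, "col": col,
                         "damaged": label["damaged"]})
    return examples
```

Each resulting record identifies where to crop a training patch and which label to attach to it.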

Examples of individual image patches that capture before and after images of damaged and undamaged buildings from different disasters.

Results
We evaluated this technology for three major past earthquakes: the 2010 earthquake in Haiti (magnitude 7.0), the 2017 event in Mexico City (magnitude 7.1), and the series of earthquakes occurring in Indonesia in 2018 (magnitudes 5.9 – 7.5). For each event, we trained the model on buildings in one part of the region affected by the quake and tested it on buildings in another part of the region. We used human expert damage assessments performed by UNOSAT and REACH as the ground truth for evaluation. We measure the model’s quality using both true accuracy (compared to expert assessment) and the area under the ROC curve (AUROC), which captures the trade-off between the model’s true positive and false positive rates of detection, and is a common way to measure quality when the number of positive and negative examples in the test dataset is imbalanced. An AUROC value of 0.5 means that the model’s predictions are random, while a value of 1.0 means the model is perfectly accurate. According to crisis responder feedback, 70% accuracy is the threshold needed for making high-level decisions in the first 72 hours after the disaster.
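
For readers unfamiliar with the two metrics, the sketch below computes both from a list of expert labels (1 = damaged, 0 = not damaged) and model scores. It uses the pairwise interpretation of AUROC: the probability that a randomly chosen damaged building receives a higher score than a randomly chosen undamaged one, with ties counted as half. The function names and the toy data in the usage note are illustrative and are not taken from the paper.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of buildings whose thresholded score matches the expert label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

With the toy inputs `labels = [1, 1, 0, 0, 0]` and `scores = [0.9, 0.4, 0.6, 0.2, 0.1]`, accuracy at a 0.5 threshold is 3/5 and AUROC is 5/6. The pairwise loop is quadratic in the number of examples, which is fine for illustration; a sort-based implementation would be preferable for large test sets.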

Event                          Accuracy    Area under the ROC curve
2010 Haiti earthquake          77%         0.83
2017 Mexico City earthquake    71%         0.79
2018 Indonesia earthquake      78%         0.86

Evaluation of model predictions against human expert assessments (higher is better).

Example model predictions from the 2010 Haiti earthquake. Prediction values closer to 1.0 mean the model is more confident that the building is damaged; values closer to 0.0 mean it is more confident that the building is not damaged. A threshold value of 0.5 is typically used to distinguish between damaged/undamaged predictions, but this can be tuned to make the predictions more or less sensitive.
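
The effect of tuning the threshold can be seen by computing the true positive and false positive rates at several cut-offs. The sketch below is illustrative only: it assumes a score at or above the threshold counts as a "damaged" prediction, and the function name is hypothetical.

```python
def detection_rates(labels, scores, threshold):
    """Return (true_positive_rate, false_positive_rate) at a given threshold.

    labels: 1 for damaged, 0 for undamaged (expert assessment).
    scores: model outputs in [0.0, 1.0].
    """
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted_damaged = s >= threshold
        if y == 1 and predicted_damaged:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_damaged:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr
```

Lowering the threshold makes the predictions more sensitive (more damaged buildings are caught, usually at the cost of more false alarms), while raising it does the opposite.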

Future Work
While the current model works reasonably well when trained and tested on buildings from the same regions (e.g., same city or country), the ultimate goal is to have a model that can accurately assess building damage for disasters that happen anywhere in the world, and not just those that look similar to the ones the model has been trained on. This is challenging because the variety of the available training data for past disasters is inherently limited to a handful of events that occurred in a few geographic locations. Generalizing to future disasters that will likely occur in new locations is therefore still a challenge for our model and is the focus of ongoing work. We envision a system that can be interactively trained, validated, and deployed by expert analysts so that important aid distribution decisions are always verified by experienced crisis responders. Our hope is that this technology can help communities get the aid they need promptly, when the need is most critical.

Acknowledgements
This post reflects the work of our co-authors Wenhan Lu and Zebo Li. We would also like to thank Maolin Zuo for his contributions to the project. In tackling this problem, we have had a very productive partnership with the United Nations World Food Programme (WFP) Innovation Accelerator, an organization that identifies, funds, and supports startups and innovative projects to disrupt world hunger.

A competition to identify bird calls using machine learning

Do you hear the birds chirping outside your window? There are more than 10,000 bird species in the world, and they can be found in nearly every environment, from untouched rainforests to suburbs and cities. Birds play an essential role in nature. They are high up in the food chain and integrate changes occurring at low levels. As such, birds are excellent indicators of deteriorating habitat quality and environmental pollution. However, it’s often easier to hear birds than see them. With proper sound detection and classification, researchers could automatically infer factors about an area’s quality of life based on a changing bird population.

There are already many projects underway to extensively monitor birds by recording natural soundscapes over long periods. However, the analysis of these datasets is often done manually and is painstakingly slow, and the results are incomplete. Data science may be able to assist, so researchers have turned to large crowdsourced databases of vocal recordings of birds to train AI models.

To fully take advantage of these extensive and information-rich sound archives, researchers need good machine listeners to reliably extract as much information as possible to aid data-driven conservation.

In partnership with the Cornell Lab of Ornithology, Google’s bioacoustics team, part of our AI for Social Good initiative, is announcing a competition to use machine learning to identify bird calls. In this competition, data scientists will identify a wide variety of bird vocalizations in soundscape recordings. Training audio comes from the Xeno-Canto project, a crowd-sourced collection of thousands of hours of bird sounds from around the world. We’re offering $25,000 in prizes for the best entries, and hosting the competition on Kaggle, the world’s largest data science competition community with more than 4 million members from 194 countries. The competition kicks off today and will last until September 2; check out the competition page for more details.

If successful, winners of this competition will help researchers better understand changes in habitat quality, levels of pollution, and the effectiveness of restoration efforts. The eventual conservation outcomes could greatly improve the quality of life for many living organisms—birds and human beings included.

Attribution for image at the top of the post: Red-winged Blackbird © Drew Weber / Macaulay Library at the Cornell Lab of Ornithology (ML227768151)

How The Trevor Project is using AI to help prevent suicide

Suicide disproportionately affects LGBTQ+ youth. In the U.S. alone, more than 1.8 million LGBTQ+ youth between the ages of 13 and 24 seriously consider suicide or experience a significant crisis each year. Additionally, LGBTQ+ youth are over four times more likely to attempt suicide than their peers, while up to 50 percent of all trans people have made a suicide attempt—most before the age of 25. Black LGBTQ+ young people are even more impacted as they hold multiple marginalized identities, and research shows that Black youth ages five to 12 are dying by suicide at roughly twice the rate of their white peers. 

To support this particularly vulnerable and diverse community, The Trevor Project takes an intersectional approach to crisis intervention and suicide prevention. The organization offers free and confidential crisis services 24/7 via phone, chat, and text. In this time of emotional stress, isolation and civil unrest, these services offer much needed support to LGBTQ+ youth experiencing fear, hopelessness, confusion, and race-based trauma. Sadly, the volume of callers sometimes outnumbers the available crisis counselors who are trained to assist. With support from Google.org, The Trevor Project is incorporating artificial intelligence into its crisis services to connect more people to the resources they need.

Last year, Google.org provided The Trevor Project with $1.5 million and 11 Googlers from the Google.org Fellowship, a pro-bono program that matches teams of Googlers with Google.org grantees and civic entities for up to six months to work full-time on technical projects. Google.org Fellows assisted The Trevor Project in building an artificial intelligence system that could identify and prioritize high-risk contacts while simultaneously reaching more people. 

Here’s how it works. When someone first contacts The Trevor Project, they’re asked a few intake questions like: “What’s going on?” After that, they talk to a crisis counselor who assesses their risk using a clinical assessment model. Looking at anonymized historical data, the team used natural language processing (NLP) to train the system to learn which types of responses on the intake form were most likely linked to a particular diagnosis risk level. While some specific words or phrases are known to correlate with high risk, the NLP model interprets the entire sentence to determine risk level. Now if a person is identified as a high or imminent risk based on their initial intake questions, they are automatically placed in a priority queue and connected to a counselor sooner. 
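
The queueing behavior described above can be pictured with a small sketch. It is not The Trevor Project's system: the risk level is assumed to come from an upstream classifier that is not shown, the "standard" level and all names are hypothetical, and a single ordered queue is used here for simplicity. Contacts at a higher risk level are served first, and contacts at the same level are served in arrival order.

```python
import heapq
import itertools

# Lower number means served sooner. "imminent" and "high" appear in the post;
# "standard" is a hypothetical label for everyone else.
PRIORITY = {"imminent": 0, "high": 1, "standard": 2}


class TriageQueue:
    """Orders waiting contacts by risk level, then by arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that keeps FIFO order

    def add(self, contact_id, risk_level):
        entry = (PRIORITY[risk_level], next(self._arrival), contact_id)
        heapq.heappush(self._heap, entry)

    def next_contact(self):
        """Remove and return the contact that should be connected next."""
        _, _, contact_id = heapq.heappop(self._heap)
        return contact_id

    def __len__(self):
        return len(self._heap)
```

In any real deployment the ordering would remain advisory, with counselors making the clinical assessment as the post describes.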

To help accelerate this work, Google.org has committed an additional $1.2 million in grant funding and is planning to engage a new cohort of Google.org Fellows set to start in July to expand Trevor’s application of NLP to new contexts. This will include developing a conversation simulator to enhance and scale Trevor’s virtual counselor training program, and automating the moderation of TrevorSpace, the organization’s affirming international online community, to flag and address unsafe content. At the same time, Google.org is partnering with Google’s LGBTQ+ employee groups to build a pool of volunteer digital crisis counselors to help respond to Trevor’s increased need for crisis services due to COVID-19 impacts. More than fifty Googlers have signed up already. 

The Trevor Project is the world’s largest suicide prevention and crisis intervention organization for LGBTQ+ youth. We’re honored to support their critical mission and stand with LGBTQ+ people of color, trans and non-binary communities, LGBTQ+ families, and so many more.

Google at CVPR 2020

Posted by Emily Knapp, Program Manager and Benjamin Hütteroth, Program Specialist

This week marks the start of the fully virtual 2020 Conference on Computer Vision and Pattern Recognition (CVPR 2020), the premier annual computer vision event consisting of the main conference, workshops and tutorials. As a leader in computer vision research and a Supporter Level Virtual Sponsor, Google will have a strong presence at CVPR 2020, with nearly 70 publications accepted, along with the organization of, and participation in, multiple workshops/tutorials.

If you are participating in CVPR this year, please visit our virtual booth to learn about what Google is actively pursuing for the next generation of intelligent systems that utilize the latest machine learning techniques applied to various areas of machine perception.

You can also learn more about our research being presented at CVPR 2020 in the list below (Google affiliations are bolded).

Organizing Committee

General Chairs: Terry Boult, Gerard Medioni, Ramin Zabih
Program Chairs: Ce Liu, Greg Mori, Kate Saenko, Silvio Savarese
Workshop Chairs: Tal Hassner, Tali Dekel
Website Chairs: Tianfan Xue, Tian Lan
Technical Chair: Daniel Vlasic
Area Chairs include: Alexander Toshev, Alexey Dosovitskiy, Boqing Gong, Caroline Pantofaru, Chen Sun, Deqing Sun, Dilip Krishnan, Feng Yang, Liang-Chieh Chen, Michael Rubinstein, Rodrigo Benenson, Timnit Gebru, Thomas Funkhouser, Varun Jampani, Vittorio Ferrari, William Freeman

Oral Presentations

Evolving Losses for Unsupervised Video Representation Learning
AJ Piergiovanni, Anelia Angelova, Michael Ryoo

CvxNet: Learnable Convex Decomposition
Boyang Deng, Kyle Genova, Soroosh Yazdani, Sofien Bouaziz, Geoffrey Hinton, Andrea Tagliasacchi

Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise
Xuanqing Liu, Tesi Xiao, Si Si, Qin Cao, Sanjiv Kumar, Cho-Jui Hsieh

Scalability in Perception for Autonomous Driving: Waymo Open Dataset
Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurélien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Sheng Zhao, Shuyang Chen, Yu Zhang, Jon Shlens, Zhifeng Chen, Dragomir Anguelov

Deep Implicit Volume Compression
Saurabh Singh, Danhang Tang, Cem Keskin, Philip Chou, Christian Haene, Mingsong Dou, Sean Fanello, Jonathan Taylor, Andrea Tagliasacchi, Philip Davidson, Yinda Zhang, Onur Guleryuz, Shahram Izadi, Sofien Bouaziz

Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model
Dongdong Wang, Yandong Li, Liqiang Wang, Boqing Gong

Google Landmarks Dataset v2 – A Large-Scale Benchmark for Instance-Level Recognition and Retrieval (see the blog post)
Tobias Weyand, Andre Araujo, Jack Sim, Bingyi Cao

CycleISP: Real Image Restoration via Improved Data Synthesis
Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

Dynamic Graph Message Passing Networks
Li Zhang, Dan Xu, Anurag Arnab, Philip Torr

Local Deep Implicit Functions for 3D Shape
Kyle Genova, Forrester Cole, Avneesh Sud, Aaron Sarna, Thomas Funkhouser

GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models
Hongyi Xu, Eduard Gabriel Bazavan, Andrei Zanfir, William Freeman, Rahul Sukthankar, Cristian Sminchisescu

Search to Distill: Pearls are Everywhere but not the Eyes
Yu Liu, Xuhui Jia, Mingxing Tan, Raviteja Vemulapalli, Yukun Zhu, Bradley Green, Xiaogang Wang

Semantic Pyramid for Image Generation
Assaf Shocher, Yossi Gandelsman, Inbar Mosseri, Michal Yarom, Michal Irani, William Freeman, Tali Dekel

Flow Contrastive Estimation of Energy-Based Models
Ruiqi Gao, Erik Nijkamp, Diederik Kingma, Zhen Xu, Andrew Dai, Ying Nian Wu

Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from A Domain Adaptation Perspective
Muhammad Abdullah Jamal, Matthew Brown, Ming-Hsuan Yang, Liqiang Wang, Boqing Gong

Category-Level Articulated Object Pose Estimation
Xiaolong Li, He Wang, Li Yi, Leonidas Guibas, Amos Abbott, Shuran Song

AdaCoSeg: Adaptive Shape Co-Segmentation with Group Consistency Loss
Chenyang Zhu, Kai Xu, Siddhartha Chaudhuri, Li Yi, Leonidas Guibas, Hao Zhang

SpeedNet: Learning the Speediness in Videos
Sagie Benaim, Ariel Ephrat, Oran Lang, Inbar Mosseri, William Freeman, Michael Rubinstein, Michal Irani, Tali Dekel

BSP-Net: Generating Compact Meshes via Binary Space Partitioning
Zhiqin Chen, Andrea Tagliasacchi, Hao Zhang

SAPIEN: A SimulAted Part-based Interactive ENvironment
Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, Li Yi, Angel Chang, Leonidas Guibas, Hao Su

SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving
Zhenpei Yang, Yuning Chai, Dragomir Anguelov, Yin Zhou, Pei Sun, Dumitru Erhan, Sean Rafferty, Henrik Kretzschmar

Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks
Saurabh Singh, Shankar Krishnan

RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real
Kanishka Rao, Chris Harris, Alex Irpan, Sergey Levine, Julian Ibarz, Mohi Khansari

Open Compound Domain Adaptation
Ziwei Liu, Zhongqi Miao, Xingang Pan, Xiaohang Zhan, Dahua Lin, Stella X. Yu, Boqing Gong

Posters
Single-view view synthesis with multiplane images
Richard Tucker, Noah Snavely

Adversarial Examples Improve Image Recognition
Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le

Adversarial Texture Optimization from RGB-D Scans
Jingwei Huang, Justus Thies, Angela Dai, Abhijit Kundu, Chiyu “Max” Jiang, Leonidas Guibas, Matthias Niessner, Thomas Funkhouser

Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline
Yu-Lun Liu, Wei-Sheng Lai, Yu-Sheng Chen, Yi-Lung Kao, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

Collaborative Distillation for Ultra-Resolution Universal Style Transfer
Huan Wang, Yijun Li, Yuehai Wang, Haoji Hu, Ming-Hsuan Yang

Learning to Autofocus
Charles Herrmann, Richard Strong Bowen, Neal Wadhwa, Rahul Garg, Qiurui He, Jonathan T. Barron, Ramin Zabih

Multi-Scale Boosted Dehazing Network with Dense Feature Fusion
Hang Dong, Jinshan Pan, Lei Xiang, Zhe Hu, Xinyi Zhang, Fei Wang, Ming-Hsuan Yang

Composing Good Shots by Exploiting Mutual Relations
Debang Li, Junge Zhang, Kaiqi Huang, Ming-Hsuan Yang

PatchVAE: Learning Local Latent Codes for Recognition
Kamal Gupta, Saurabh Singh, Abhinav Shrivastava

Neural Voxel Renderer: Learning an Accurate and Controllable Rendering Tool
Konstantinos Rematas, Vittorio Ferrari

Local Implicit Grid Representations for 3D Scenes
Chiyu “Max” Jiang, Avneesh Sud, Ameesh Makadia, Jingwei Huang, Matthias Niessner, Thomas Funkhouser

Large Scale Video Representation Learning via Relational Graph Clustering
Hyodong Lee, Joonseok Lee, Joe Yue-Hei Ng, Apostol (Paul) Natsev

Deep Homography Estimation for Dynamic Scenes
Hoang Le, Feng Liu, Shu Zhang, Aseem Agarwala

C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds
Albert Pumarola, Stefan Popov, Francesc Moreno-Noguer, Vittorio Ferrari

Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination
Pratul Srinivasan, Ben Mildenhall, Matthew Tancik, Jonathan T. Barron, Richard Tucker, Noah Snavely

Scale-space flow for end-to-end optimized video compression
Eirikur Agustsson, David Minnen, Nick Johnston, Johannes Ballé, Sung Jin Hwang, George Toderici

StructEdit: Learning Structural Shape Variations
Kaichun Mo, Paul Guerrero, Li Yi, Hao Su, Peter Wonka, Niloy Mitra, Leonidas Guibas

3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation
Francis Engelmann, Martin Bokeloh, Alireza Fathi, Bastian Leibe, Matthias Niessner

Sequential mastery of multiple tasks: Networks naturally learn to learn and forget to forget
Guy Davidson, Michael C. Mozer

Distilling Effective Supervision from Severe Label Noise
Zizhao Zhang, Han Zhang, Sercan Ö. Arik, Honglak Lee, Tomas Pfister

ViewAL: Active Learning With Viewpoint Entropy for Semantic Segmentation
Yawar Siddiqui, Julien Valentin, Matthias Niessner

Attribution in Scale and Space
Shawn Xu, Subhashini Venugopalan, Mukund Sundararajan

Weakly-Supervised Semantic Segmentation via Sub-category Exploration
Yu-Ting Chang, Qiaosong Wang, Wei-Chih Hung, Robinson Piramuthu, Yi-Hsuan Tsai, Ming-Hsuan Yang

Speech2Action: Cross-modal Supervision for Action Recognition
Arsha Nagrani, Chen Sun, David Ross, Rahul Sukthankar, Cordelia Schmid, Andrew Zisserman

Counting Out Time: Class Agnostic Video Repetition Counting in the Wild
Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman

The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction
Junwei Liang, Lu Jiang, Kevin Murphy, Ting Yu, Alexander Hauptmann

Self-training with Noisy Student improves ImageNet classification
Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le

EfficientDet: Scalable and Efficient Object Detection (see the blog post)
Mingxing Tan, Ruoming Pang, Quoc Le

ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning
Weiwei Sun, Wei Jiang, Eduard Trulls, Andrea Tagliasacchi, Kwang Moo Yi

VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Cordelia Schmid, Congcong Li

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization
Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc Le, Xiaodan Song

KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects
Xingyu Liu, Rico Jonschkowski, Anelia Angelova, Kurt Konolige

Structured Multi-Hashing for Model Compression
Elad Eban, Yair Movshovitz-Attias, Hao Wu, Mark Sandler, Andrew Poon, Yerlan Idelbayev, Miguel A. Carreira-Perpinan

DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes
Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Tom Funkhouser, Caroline Pantofaru, David Ross, Larry Davis, Alireza Fathi

Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation
Bowen Cheng, Maxwell Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, Liang-Chieh Chen

Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection
Sara Beery, Guanhang Wu, Vivek Rathod, Ronny Votel, Jonathan Huang

Distortion Agnostic Deep Watermarking
Xiyang Luo, Ruohan Zhan, Huiwen Chang, Feng Yang, Peyman Milanfar

Can weight sharing outperform random architecture search? An investigation with TuNAS
Gabriel Bender, Hanxiao Liu, Bo Chen, Grace Chu, Shuyang Cheng, Pieter-Jan Kindermans, Quoc Le

GIFnets: Differentiable GIF Encoding Framework
Innfarn Yoo, Xiyang Luo, Yilin Wang, Feng Yang, Peyman Milanfar

Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models
Giannis Daras, Augustus Odena, Han Zhang, Alex Dimakis

Fast Sparse ConvNets
Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan

RetinaTrack: Online Single Stage Joint Detection and Tracking
Zhichao Lu, Vivek Rathod, Ronny Votel, Jonathan Huang

Learning to See Through Obstructions
Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, Jia-Bin Huang

Self-Supervised Learning of Video-Induced Visual Invariances
Michael Tschannen, Josip Djolonga, Marvin Ritter, Aravindh Mahendran, Neil Houlsby, Sylvain Gelly, Mario Lucic

Workshops

3rd Workshop and Challenge on Learned Image Compression
Organizers include: George Toderici, Eirikur Agustsson, Lucas Theis, Johannes Ballé, Nick Johnston

CLVISION 1st Workshop on Continual Learning in Computer Vision
Organizers include: Zhiyuan (Brett) Chen, Marc Pickett

Embodied AI
Organizers include: Alexander Toshev, Jie Tan, Aleksandra Faust, Anelia Angelova

The 1st International Workshop and Prize Challenge on Agriculture-Vision: Challenges & Opportunities for Computer Vision in Agriculture
Organizers include: Zhen Li, Jim Yuan

New Trends in Image Restoration and Enhancement workshop and challenges on image and video restoration and enhancement (NTIRE)
Talk: “Sky Optimization: Semantically aware image processing of skies in low-light photography”
Orly Liba, Longqi Cai, Yun-Ta Tsai, Elad Eban, Yair Movshovitz-Attias, Yael Pritch, Huizhong Chen, Jonathan Barron

The End-of-End-to-End: A Video Understanding Pentathlon
Organizers include: Rahul Sukthankar

4th Workshop on Media Forensics
Organizers include: Christoph Bregler

4th Workshop on Visual Understanding by Learning from Web Data
Organizers include: Jesse Berent, Rahul Sukthankar

AI for Content Creation
Organizers include: Deqing Sun, Lu Jiang, Weilong Yang

Fourth Workshop on Computer Vision for AR/VR
Organizers include: Sofien Bouaziz

Low-Power Computer Vision Competition (LPCVC)
Organizers include: Bo Chen, Andrew Howard, Jaeyoun Kim

Sight and Sound
Organizers include: William Freeman

Workshop on Efficient Deep Learning for Computer Vision
Organizers include: Pete Warden

Extreme classification in computer vision
Organizers include: Ramin Zabih, Zhen Li

Image Matching: Local Features and Beyond (see the blog post)
Organizers include: Eduard Trulls

The DAVIS Challenge on Video Object Segmentation
Organizers include: Alberto Montes, Jordi Pont-Tuset, Kevis-Kokitsi Maninis

2nd Workshop on Precognition: Seeing through the Future
Organizers include: Utsav Prabhu

Computational Cameras and Displays (CCD)
Talk: Orly Liba

2nd Workshop on Learning from Unlabeled Videos (LUV)
Organizers include: Honglak Lee, Rahul Sukthankar

7th Workshop on Fine Grained Visual Categorization (FGVC7) (see the blog post)
Organizers include: Christine Kaeser-Chen, Serge Belongie

Language & Vision with applications to Video Understanding
Organizers include: Lu Jiang

Neural Architecture Search and Beyond for Representation Learning
Organizers include: Barret Zoph

Tutorials

Disentangled 3D Representations for Relightable Performance Capture of Humans
Organizers include: Sean Fanello, Christoph Rhemann, Jonathan Taylor, Sofien Bouaziz, Adarsh Kowdle, Rohit Pandey, Sergio Orts-Escolano, Paul Debevec, Shahram Izadi

Learning Representations via Graph-Structured Networks
Organizers include: Chen Sun, Ming-Hsuan Yang

Novel View Synthesis: From Depth-Based Warping to Multi-Plane Images and Beyond
Organizers include: Varun Jampani

How to Write a Good Review
Talks by: Vittorio Ferrari, Bill Freeman, Jordi Pont-Tuset

Neural Rendering
Organizers include: Ricardo Martin-Brualla, Rohit K. Pandey, Sean Fanello, Maneesh Agrawala, Dan B. Goldman

Fairness Accountability Transparency and Ethics and Computer Vision
Organizers: Timnit Gebru, Emily Denton

Stanford AI Lab Papers and Talks at CVPR 2020

The Conference on Computer Vision and Pattern Recognition (CVPR) 2020 is being hosted virtually from June 14th – June 19th. We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to papers, videos and blogs below. Feel free to reach out to the contact authors directly to learn more about the work that’s happening at Stanford!

List of Accepted Papers

Action Genome: Actions as Compositions of Spatio-temporal Scene Graphs


Authors: Jingwei Ji, Ranjay Krishna, Li Fei-Fei, Juan Carlos Niebles

Contact: jingweij@cs.stanford.edu

Links: Paper

Keywords: action recognition, scene graph, video understanding, relationships, composition, action, activity, video


AdaCoSeg: Adaptive Shape Co-Segmentation with Group Consistency Loss


Authors: Chenyang Zhu, Kai Xu, Siddhartha Chaudhuri, Li Yi, Leonidas J. Guibas, Hao Zhang

Contact: guibas@cs.stanford.edu

Links: Paper

Keywords: shape segmentation, consistency


Adversarial Texture Optimization from RGB-D Scans


Authors: Jingwei Huang, Justus Thies, Angela Dai, Abhijit Kundu, Chiyu Jiang, Leonidas Guibas, Matthias Nießner, Thomas Funkhouser

Contact: jingweih@stanford.edu

Links: Paper | Video

Keywords: texture, adversarial


Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data


Authors: Henry M. Clever, Zackory Erickson, Ari Kapusta, Greg Turk, C. Karen Liu, Charlie C. Kemp

Contact: karenliu@cs.stanford.edu

Links: Paper | Video

Keywords: human pose estimation


Category-Level Articulated Object Pose Estimation


Authors: Xiaolong Li, He Wang, Li Yi, Leonidas Guibas, A. Lynn Abbott, Shuran Song

Contact: hewang@stanford.edu

Award nominations: Oral presentation

Links: Paper | Video

Keywords: category level pose estimation, articulated object, 3d vision, point cloud, object part, object joint, segmentation, kinematic constraints


Few-Shot Video Classification via Temporal Alignment


Authors: Kaidi Cao, Jingwei Ji, Zhangjie Cao, Chien-Yi Chang, Juan Carlos Niebles

Contact: kaidicao@cs.stanford.edu

Links: Paper | Video

Keywords: video classification, few-shot learning, action recognition, temporal alignment


ImVoteNet: Boosting 3D Object Detection in Point Clouds With Image Votes


Authors: Charles R. Qi, Xinlei Chen, Or Litany, Leonidas J. Guibas

Contact: or.litany@gmail.com

Links: Paper

Keywords: 3d object detection, rgb-d, voting, point clouds, multi-modality, fusion, deep learning, object recognition.


Learning multiview 3D point cloud registration


Authors: Zan Gojcic, Caifa Zhou, Jan D. Wegner, Leonidas J. Guibas, Tolga Birdal

Contact: tbirdal@stanford.edu

Links: Paper | Video

Keywords: registration, multiview, 3d reconstruction, point clouds, global alignment, synchronization, 3d, local features, end to end, 3d matching


Robust Learning Through Cross-Task Consistency


Authors: Amir R. Zamir, Alexander Sax, Nikhil Cheerla, Rohan Suri, Zhangjie Cao, Jitendra Malik, Leonidas J. Guibas

Contact: guibas@cs.stanford.edu

Links: Paper | Video

Keywords: multi-task learning, transfer learning, cycle consistency


SAPIEN: A SimulAted Part-based Interactive ENvironment


Authors: Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, Li Yi, Angel X. Chang, Leonidas J. Guibas, Hao Su

Contact: kaichunm@stanford.edu

Award nominations: Oral presentation

Links: Paper | Video

Keywords: robotic simulator, 3d shape parts, robotic manipulation, 3d vision and robotics


Spatio-Temporal Graph for Video Captioning with Knowledge Distillation


Authors: Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien Gaidon, Ehsan Adeli, Juan Carlos Niebles

Contact: bxpan@stanford.edu

Links: Paper | Video

Keywords: video captioning, spatio-temporal graph, knowledge distillation, video understanding, vision and language


StructEdit: Learning Structural Shape Variations


Authors: Kaichun Mo, Paul Guerrero, Li Yi, Hao Su, Peter Wonka, Niloy Mitra, Leonidas J. Guibas

Contact: kaichunm@stanford.edu

Links: Paper

Keywords: shape editing, shape structure, 3d vision and graphics


Synchronizing Probability Measures on Rotations via Optimal Transport


Authors: Tolga Birdal, Michael Arbel, Umut Şimşekli, Leonidas Guibas

Contact: tbirdal@stanford.edu

Links: Paper | Video

Keywords: synchronization, optimal transport, rotation averaging, slam, sfm, probability measure, riemannian, gradient descent, pose estimation


Unsupervised Learning From Video With Deep Neural Embeddings


Authors: Chengxu Zhuang, Tianwei She, Alex Andonian, Max Sobol Mark, Daniel Yamins

Contact: chengxuz@stanford.edu

Links: Paper

Keywords: unsupervised learning, self-supervised learning, video learning, contrastive learning, deep neural networks, action recognition, object recognition, two-pathway models


We look forward to seeing you at CVPR!

dm_control: Software and Tasks for Continuous Control

The dm_control software package is a collection of Python libraries and task suites for reinforcement learning agents in an articulated-body simulation. A MuJoCo wrapper provides convenient bindings to functions and data structures. The PyMJCF and Composer libraries enable procedural model manipulation and task authoring.

Open Compound Domain Adaptation

The World is Continuously Varying

Imagine we want to train a self-driving car in New York so that we can take it
all the way to Seattle without tediously driving it for over 48 hours. We hope
our car can handle all kinds of environments on the trip and send us safely to
the destination. We know that road conditions and views can be very different.
It is intuitive to simply collect road data of this trip, let the car learn
from every possible condition, and hope it becomes the perfect self-driving car
for our New York to Seattle trip. It needs to understand the traffic and
skyscrapers in big cities like New York and Chicago, more unpredictable weather
in Seattle, mountains and forests in Montana, and all kinds of country views,
farmlands, animals, etc. However, how much data is enough? How many cities
should we collect data from? How many weather conditions should we consider? We
never know, and these questions never stop.




Figure 1: Domain boundaries are rarely clear. Therefore, it is hard to set up
definite domain descriptions for all possible domains.

Extracting Structured Data from Templatic Documents

Posted by Sandeep Tata, Software Engineer, Google Research

Templatic documents, such as receipts, bills, insurance quotes, and others, are extremely common and critical in a diverse range of business workflows. Currently, processing these documents is largely a manual effort, and automated systems that do exist are based on brittle and error-prone heuristics. Consider a document type like invoices, which can be laid out in thousands of different ways — invoices from different companies, or even different departments within the same company, may have slightly different formatting. However, there is a common understanding of the structured information that an invoice should contain, such as an invoice number, an invoice date, the amount due, the pay-by date, and the list of items for which the invoice was sent. A system that can automatically extract all this data has the potential to dramatically improve the efficiency of many business workflows by avoiding error-prone, manual work.

In “Representation Learning for Information Extraction from Form-like Documents”, accepted to ACL 2020, we present an approach to automatically extract structured data from templatic documents. In contrast to previous work on extraction from plain-text documents, we propose an approach that uses knowledge of target field types to identify candidate fields. These are then scored using a neural network that learns a dense representation of each candidate using the words in its neighborhood. Experiments on two corpora (invoices and receipts) show that we’re able to generalize well to unseen layouts.

Why Is This Hard?
The challenge in this information extraction problem arises because it straddles the natural language processing (NLP) and computer vision worlds. Unlike classic NLP tasks, such documents do not contain “natural language” as might be found in regular sentences and paragraphs, but instead resemble forms. Data is often presented in tables, but in addition many documents have multiple pages, frequently with a varying number of sections, and have a variety of layout and formatting clues to organize the information. An understanding of the two-dimensional layout of text on the page is key to understanding such documents. On the other hand, treating this purely as an image segmentation problem makes it difficult to take advantage of the semantics of the text.

Solution Overview
Our approach to this problem allows developers to train and deploy an extraction system for a given domain (like invoices) using two inputs — a target schema (i.e., a list of fields to extract and their corresponding types) and a small collection of documents labeled with the ground truth for use as a training set. Supported field types include basics, such as dates, integers, alphanumeric codes, currency amounts, phone-numbers, and URLs. We also take advantage of entity types commonly detected by the Google Knowledge Graph, such as addresses, names of companies, etc.

The input document is first run through an Optical Character Recognition (OCR) service to extract the text and layout information, which allows this to work with native digital documents, such as PDFs, and document images (e.g., scanned documents). We then run a candidate generator that identifies spans of text in the OCR output that might correspond to an instance of a given field. The candidate generator utilizes pre-existing libraries associated with each field type (date, number, phone-number, etc.), which avoids the need to write new code for each candidate generator. Each of these candidates is then scored using a trained neural network (the “scorer”, described below) to estimate the likelihood that it is indeed a value one might extract for that field. Finally, an assigner module matches the scored candidates to the target fields. By default, the assigner simply chooses the highest scoring candidate for the field, but additional domain-specific constraints can be incorporated, such as requiring that the invoice date field is chronologically before the payment date field.

The processing steps in the extraction system using a toy schema with two fields on an input invoice document. Blue boxes show the candidates for the invoice_date field and gold boxes for the amount_due field.
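The assigner step described above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the production system: the `Candidate` class, the `assign` function, the `invoice_before_due` constraint, and all scores are hypothetical, and `due_date` stands in for the payment date mentioned in the text. Maximizing the summed score under a constraint is also an assumption; without a constraint it reduces to picking the highest-scoring candidate for each field.

```python
from dataclasses import dataclass
from datetime import date
from itertools import product


@dataclass(frozen=True)
class Candidate:
    field: str    # target field this candidate was generated for
    value: date   # parsed value of the text span
    score: float  # scorer output in [0, 1]


def assign(candidates, constraint=None):
    """Pick one candidate per field, maximizing the summed score.

    With no constraint this is the same as taking the top-scoring
    candidate for each field independently. The search is exhaustive,
    which is fine for a toy example but not for large candidate sets.
    """
    by_field = {}
    for c in candidates:
        by_field.setdefault(c.field, []).append(c)
    fields = sorted(by_field)

    best, best_total = None, float("-inf")
    for combo in product(*(by_field[f] for f in fields)):
        assignment = dict(zip(fields, combo))
        if constraint is not None and not constraint(assignment):
            continue  # skip combinations that violate the domain rule
        total = sum(c.score for c in combo)
        if total > best_total:
            best, best_total = assignment, total
    return best


def invoice_before_due(assignment):
    # Hypothetical domain constraint: the invoice date precedes the due date.
    return assignment["invoice_date"].value < assignment["due_date"].value


# Made-up candidates and scores for a two-field toy schema.
candidates = [
    Candidate("invoice_date", date(2020, 3, 20), 0.90),
    Candidate("invoice_date", date(2020, 3, 1), 0.60),
    Candidate("due_date", date(2020, 3, 15), 0.80),
    Candidate("due_date", date(2020, 4, 1), 0.75),
]

greedy = assign(candidates)
constrained = assign(candidates, invoice_before_due)
```

In this toy input the default assignment pairs an invoice date with an earlier due date, while the constrained call falls back to the next-best due date that keeps the two dates in chronological order.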

Scorer
The scorer is a neural model that is trained as a binary classifier. It takes as input the target field from the schema along with the extraction candidate and produces a prediction score between 0 and 1. The target label for a candidate is determined by whether the candidate matches the ground truth for that document and field. The model learns how to represent each field and each candidate in a vector space in which the nearer a field and candidate are in the vector space, the more likely it is that the candidate is the true extraction value for that field and document.

Candidate Representation
A candidate is represented by the tokens in its neighborhood along with the relative position of each token on the page with respect to the centroid of the bounding box identified for the candidate. Using the invoice_date field as an example, phrases in the neighborhood like “Invoice Date” or “Inv Date” might indicate to the scorer that this is a likely candidate, while phrases like “Delivery Date” would indicate that this is likely not the invoice_date. We do not include the value of the candidate in its representation in order to avoid overfitting to values that happen to be present in a small training data set (e.g., “2019” for the invoice date, if the training corpus happened to include only invoices from that year).

A small snippet of an invoice. The green box shows a candidate for the invoice_date field, and the red box is a token in the neighborhood along with the arrow representing the relative position. Each of the other tokens (‘number’, ‘date’, ‘page’, ‘of’, etc., along with the other occurrences of ‘invoice’) is part of the neighborhood for the invoice candidate.
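To make the candidate representation concrete, here is a minimal sketch of how neighboring tokens and their relative positions might be collected. The `Token` class, the `neighborhood` function, the normalized page coordinates, and the window sizes `max_dx` and `max_dy` are all assumptions made for illustration; the text above does not specify how the neighborhood is bounded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    # Bounding box in normalized page coordinates (0..1), origin at top-left.
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def centroid(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def neighborhood(candidate, tokens, max_dx=0.5, max_dy=0.1):
    """Return (text, dx, dy) for every token close to the candidate.

    dx and dy are offsets from the candidate's centroid to the token's
    centroid. The window sizes are illustrative assumptions.
    """
    cx, cy = candidate.centroid()
    neighbors = []
    for tok in tokens:
        if tok is candidate:
            continue  # the candidate's own value is deliberately left out
        tx, ty = tok.centroid()
        dx, dy = tx - cx, ty - cy
        if abs(dx) <= max_dx and abs(dy) <= max_dy:
            neighbors.append((tok.text.lower(), round(dx, 3), round(dy, 3)))
    return neighbors


# Made-up layout: a date value with its label to the left, and an
# unrelated token much further down the page.
candidate = Token("3/20/2020", 0.60, 0.10, 0.72, 0.12)
tokens = [
    Token("Invoice", 0.40, 0.10, 0.48, 0.12),
    Token("Date", 0.49, 0.10, 0.54, 0.12),
    candidate,
    Token("Total", 0.40, 0.80, 0.46, 0.82),
]
neighbors = neighborhood(candidate, tokens)
```

Here the label tokens to the left of the value are kept with negative horizontal offsets, the distant token is dropped, and the candidate's own text never enters the representation.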

Model Architecture
The figure below shows the general structure of the network. In order to construct the candidate encoding (i), each token in the neighborhood is embedded using a word embedding table (a). The relative position of each neighbor (b) is embedded using two fully connected ReLU layers that capture fine-grained non-linearities. The text and position embeddings for each neighbor are concatenated to form a neighbor encoding (d). A self attention mechanism is used to incorporate the neighborhood context for each neighbor (e), which is combined into a neighborhood encoding (f) using max-pooling. The absolute position of the candidate on the page (g) is embedded in a manner similar to the positional embedding for a neighbor, and concatenated with the neighborhood encoding for the candidate encoding (i). The final scoring layer computes the cosine similarity between the field embedding (k) and the candidate encoding (i) and then rescales it to be between 0 and 1.
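The last two stages of this architecture, max-pooling the neighbor encodings and scoring by cosine similarity, can be illustrated with plain Python lists. This is a simplified sketch: the embedding and self-attention layers are omitted, the vectors are made-up numbers rather than learned values, and the `(cos + 1) / 2` rescaling is one plausible choice that we assume here, since the text only says the similarity is rescaled to lie between 0 and 1.

```python
import math


def max_pool(vectors):
    """Element-wise maximum over a list of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def score(field_embedding, neighbor_encodings, candidate_position_embedding):
    """Score a candidate against a field, returning a value in [0, 1]."""
    # Pool all neighbor encodings into a single neighborhood encoding.
    neighborhood_encoding = max_pool(neighbor_encodings)
    # Concatenate with the embedding of the candidate's absolute position.
    candidate_encoding = neighborhood_encoding + candidate_position_embedding
    cos = cosine_similarity(field_embedding, candidate_encoding)
    # Assumed rescaling from [-1, 1] to [0, 1].
    return (cos + 1.0) / 2.0


# Made-up encodings for two neighbors and a one-dimensional position embedding.
neighbor_encodings = [[0.2, 0.9, 0.1], [0.7, 0.1, 0.3]]
position_embedding = [0.5]

aligned = score([0.7, 0.9, 0.3, 0.5], neighbor_encodings, position_embedding)
opposed = score([-0.7, -0.9, -0.3, -0.5], neighbor_encodings, position_embedding)
```

A field embedding that points in the same direction as the candidate encoding scores near 1, and one that points in the opposite direction scores near 0.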

Results
For training and validation, we used an internal dataset of invoices with a large variety of layouts. In order to test the ability of the model to generalize to unseen layouts, we used a test-set of invoices with layouts that were disjoint from the training and validation set. We report the F1 score of the extractions from this system on a few key fields below (higher is better):

Field               F1 Score
amount_due          0.801
delivery_date       0.667
due_date            0.861
invoice_date        0.940
invoice_id          0.949
purchase_order      0.896
total_amount        0.858
total_tax_amount    0.839

As you can see from the table above, the model does well on most fields. However, there’s room for improvement for fields like delivery_date. Additional investigation revealed that this field was present in a very small subset of the examples in our training data. We expect that gathering additional training data will help us improve on it.
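For readers less familiar with the metric, F1 is the harmonic mean of precision and recall. The sketch below uses the standard definition; the function name and the counts are made up for illustration and are not taken from our evaluation.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall for one field."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0  # nothing extracted, or nothing to extract
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 correct extractions, 10 wrong ones, 30 missed values.
example = f1_score(true_positives=80, false_positives=10, false_negatives=30)
```

A field that appears in only a few training documents tends to have many missed values, which lowers recall and therefore F1.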

What’s next?
Google Cloud recently announced an invoice parsing service as part of the Document AI product. The service uses the methods described above, along with other recent research breakthroughs like BERT, to extract more than a dozen key fields from invoices. You can upload an invoice at the demo page and see this technology in action!

For a given document type, we expect to be able to build an extraction system given a modest-sized labeled corpus. There are several follow-ons we are currently pursuing, including improving data efficiency and accurately handling nested and repeated fields, as well as fields for which it is difficult to define a good candidate generator.

Acknowledgements
This work was a collaboration between Google Research and several engineers in Google Cloud. I’d like to thank Navneet Potti, James Wendt, Marc Najork, Qi Zhao, and Ivan Kuznetsov in Google Research as well as Lauro Costa, Evan Huang, Will Lu, Lukas Rutishauser, Mu Wang, and Yang Xu on the Cloud AI team for their support. Finally, I’d like to thank our research interns Bodhisattwa Majumder and Beliz Gunel for their tireless experimentation on dozens of ideas.

Unlocking the "Chemome" with DNA-Encoded Chemistry and Machine Learning

Posted by Patrick Riley, Principal Engineer, Accelerated Science Team, Google Research

Much of the development of therapeutics for human disease is built around understanding and modulating the function of proteins, which are the main workhorses of many biological activities. Small molecule drugs such as ibuprofen often work by inhibiting or promoting the function of proteins or their interactions with other biomolecules. Developing useful “virtual screening” methods, in which potential small molecules are evaluated computationally rather than in a lab, has long been an area of research. However, the persistent challenge is to build a method that works well enough across a wide range of chemical space to be useful for finding small molecules with physically verified useful interaction with a protein of interest, i.e., “hits”.

In “Machine learning on DNA-encoded libraries: A new paradigm for hit-finding”, recently published in the Journal of Medicinal Chemistry, we worked in collaboration with X-Chem Pharmaceuticals to demonstrate an effective new method for finding biologically active molecules using a combination of physical screening with DNA-encoded small molecule libraries and virtual screening using a graph convolutional neural network (GCNN). This research has led to the creation of the Chemome initiative, a cooperative project between our Accelerated Science team and ZebiAI that will enable the discovery of many more small molecule chemical probes for biological research.

Background on Chemical Probes
Making sense of the biological networks that support life and produce disease is an immensely complex task. One approach to study these processes is using chemical probes, small molecules that aren’t necessarily useful as drugs, but that selectively inhibit or promote the function of specific proteins. When you have a biological system to study (such as cancer cells growing in a dish), you can add the chemical probe at a specific time and observe how the biological system responds differently when the targeted protein has increased or decreased activity. But, despite how useful chemical probes are for this kind of basic biomedical research, only 4% of human proteins have a known chemical probe available.

The process of finding chemical probes begins similarly to the earliest stages of small molecule drug discovery. Given a protein target of interest, the space of small molecules is scanned to find “hit” molecules that can be further tested. Robot-assisted high-throughput screening, in which hundreds of thousands or even millions of molecules are physically tested, is a cornerstone of modern drug research. However, the number of small molecules you can easily purchase (1.2×10⁹) is much larger than that, and is in turn much smaller than the number of small drug-like molecules (estimates range from 10²⁰ to 10⁶⁰). “Virtual screening” could potentially search this vast space of synthesizable molecules quickly and efficiently, and greatly speed up the discovery of therapeutic compounds.

DNA-Encoded Small Molecule Library Screening
The physical part of the screening process uses DNA-encoded small molecule libraries (DELs), which contain many distinct small molecules in one pool, each of which is attached to a fragment of DNA serving as a unique barcode for that molecule. While this basic technique has been around for several decades, the quality of the library and screening process is key to producing meaningful results.

DELs are a very clever idea to solve a biochemical challenge, which is how to collect small molecules into one place with an easy way to identify each. The key is to use DNA as a barcode to identify each molecule, similar to Nobel Prize-winning phage display technology. First, one generates many chemical fragments, each with a unique DNA barcode attached, along with a common chemical handle (the NH2 in this case). The results are then pooled and split into separate reactions where a set of distinct chemical fragments with another common chemical handle (e.g., OH) is added. The chemical fragments from the two steps react and fuse together at the common chemical handles. The DNA fragments are also connected to build one continuous barcode for each molecule. The net result is that by performing 2N operations, one gets N² unique molecules, each of which is identified by its own unique DNA barcode. By using more fragments or more cycles, it’s relatively easy to make libraries with millions or even billions of distinct molecules.

An overview of the process of creating a DNA encoded small molecule library. First, DNA “barcodes” (represented here with numbered helices) are attached to small chemical fragments (the blue shapes) which expose a common chemical “handle” (e.g. the NH2 shown here). When mixed with other chemical fragments (the orange shapes) each of which has another exposed chemical “handle” (the OH) with attached DNA fragments, reactions merge the sets of chemical and DNA fragments, resulting in a voluminous library of small molecules of interest, each with a unique DNA “barcode”.
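The combinatorics of this two-cycle scheme are easy to check in code. The sketch below is purely illustrative: the `build_library` function, the barcode strings, and the fragment names are hypothetical stand-ins, and real barcodes are DNA sequences rather than short labels.

```python
from itertools import product


def build_library(first_fragments, second_fragments):
    """Combine two barcoded fragment sets into a two-cycle library.

    Each argument maps a barcode to a fragment name. Every fragment from
    the first cycle is fused with every fragment from the second, and the
    two barcodes are joined into one continuous barcode.
    """
    library = {}
    pairs = product(first_fragments.items(), second_fragments.items())
    for (code_a, frag_a), (code_b, frag_b) in pairs:
        library[code_a + code_b] = (frag_a, frag_b)
    return library


n = 4  # fragments per cycle, kept tiny for illustration
first = {f"A{i}": f"amine_{i}" for i in range(n)}
second = {f"B{j}": f"alcohol_{j}" for j in range(n)}
library = build_library(first, second)

fragments_prepared = len(first) + len(second)  # 2N fragment preparations
molecules_made = len(library)                  # N squared distinct molecules
```

With N fragments per cycle the library grows quadratically while the preparation effort grows only linearly, and adding a third cycle would make the growth cubic.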

Once the library has been generated, it can be used to find the small molecules that bind to the protein of interest by mixing the DEL together with the protein and washing away the small molecules that do not attach. Sequencing the remaining DNA barcodes produces millions of individual reads of DNA fragments, which can then be carefully processed to estimate which of the billions of molecules in the original DEL interact with the protein.

Machine Learning on DEL Data
Given the physical screening data returned for a particular protein, we build an ML model to predict whether an arbitrarily chosen small molecule will bind to that protein. The physical screening with the DEL provides positive and negative examples for an ML classifier. To simplify slightly, the small molecules that remain at the end of the screening process are positive examples and everything else is a negative example. We use a graph convolutional neural network, which is a type of neural network specially designed for small graph-like inputs, such as the small molecules in which we are interested.
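The simplified labeling rule above can be sketched as follows. This is a deliberately naive illustration, not the processing used in the paper: the `label_library` function, the `min_reads` threshold, and the barcode strings are assumptions, and real pipelines have to account for sequencing noise and other effects that are ignored here.

```python
from collections import Counter


def label_library(library_barcodes, reads, min_reads=2):
    """Label each library molecule as a positive or negative example.

    A molecule is positive if its barcode shows up in at least
    `min_reads` sequencing reads after the wash step (an assumed,
    simplistic cutoff). Reads that match no library barcode are ignored.
    """
    counts = Counter(reads)
    return {barcode: counts[barcode] >= min_reads for barcode in library_barcodes}


# Made-up data: four library members and five sequencing reads.
library_barcodes = ["A0B0", "A0B1", "A1B0", "A1B1"]
reads = ["A0B1", "A0B1", "A1B0", "A0B1", "XXXX"]

labels = label_library(library_barcodes, reads)
positives = [barcode for barcode, is_hit in labels.items() if is_hit]
```

The resulting labels, paired with the molecular structure behind each barcode, are the kind of training examples a classifier would consume.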

Results
We physically screened three diverse proteins using DELs: sEH (a hydrolase), ERα (a nuclear receptor), and c-KIT (a kinase). Using the DEL-trained models, we virtually screened large make-on-demand libraries from Mcule and an internal molecule library at X-Chem to identify a diverse set of molecules predicted to show affinity with each target. We compared the results of the GCNN models to a random forest (RF) model, a common method for virtual screening that uses standard chemical fingerprints, which we use as a baseline. We find that the GCNN model significantly outperforms the RF model in discovering more potent candidates.

Fraction of molecules (“hit rates”) from those tested showing various levels of activity, comparing predictions from two different machine learned models (a GCNN and random forests, RF) on three distinct protein targets. The color scale on the right uses a common metric IC50 for representing the potency of a molecule. nM means “nanomolar” and µM means “micromolar”. Smaller values / darker colors are generally better molecules. Note that typical virtual screening approaches not built with DEL data normally only reach a few percent on this scale.
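A hit rate of this kind is simply the fraction of tested molecules whose measured IC50 falls at or below a potency threshold. The sketch below shows the arithmetic with made-up measurements; the `hit_rate` function, the threshold values, and the use of `None` for molecules with no measurable activity are assumptions for illustration and do not reproduce the figure.

```python
def hit_rate(ic50_nm_values, threshold_nm):
    """Fraction of tested molecules with IC50 at or below the threshold.

    Values are in nanomolar. None marks a molecule that was tested but
    showed no measurable activity; it still counts as tested.
    """
    tested = len(ic50_nm_values)
    if tested == 0:
        return 0.0
    hits = sum(1 for v in ic50_nm_values if v is not None and v <= threshold_nm)
    return hits / tested


# Made-up IC50 measurements (nM) for eight tested molecules.
measured = [8.0, 450.0, 2500.0, None, 90000.0, None, 12000.0, None]

# Illustrative potency thresholds, expressed in nM (1 µM = 1000 nM).
thresholds_nm = {"10 nM": 10.0, "1 µM": 1_000.0, "30 µM": 30_000.0}
rates = {label: hit_rate(measured, t) for label, t in thresholds_nm.items()}
```

Because a molecule that passes a strict threshold also passes every looser one, the rates are cumulative and can only grow as the threshold is relaxed.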

Importantly, unlike many other uses of virtual screening, the process to select the molecules to test was automated or easily automatable given the results of the model, and we did not rely on review and selection of the most promising molecules by a trained chemist. In addition, we tested almost 2000 molecules across the three targets, the largest published prospective study of virtual screening of which we are aware. While providing high confidence on the hit rates above, this also allows one to carefully examine the diversity of hits and the usefulness of the model for molecules near and far from the training set.

The Chemome Initiative
ZebiAI Therapeutics was founded based on the results of this research and has partnered with our team and X-Chem Pharmaceuticals to apply these techniques to efficiently deliver new chemical probes to the research community for human proteins of interest, an effort called the Chemome Initiative.

As part of the Chemome Initiative, ZebiAI will work with researchers to identify proteins of interest and source screening data, which our team will use to build machine learning models and make predictions on commercially available libraries of small molecules. ZebiAI will provide the predicted molecules to researchers for activity testing and will collaborate with researchers to advance some programs through discovery. Participation in the program requires that the validated hits be published within a reasonable time frame so that the whole community can benefit. While more validation must be done to make the hit molecules useful as chemical probes, especially for specifically targeting the protein of interest and the ability to function correctly in common assays, having potent hits is a big step forward in the process.

We’re excited to be a part of the Chemome Initiative enabled by the effective ML techniques described here and look forward to its discovery of many new chemical probes. We expect the Chemome will spur significant new biological discoveries and ultimately accelerate new therapeutic discovery for the world.

Acknowledgements
This work represents a multi-year effort between the Accelerated Science Team and X-Chem Pharmaceuticals with many people involved. This project would not have worked without the combined diverse skills of biologists, chemists, and ML researchers. We should especially acknowledge Eric Sigel (of X-Chem, now at ZebiAI) and Kevin McCloskey (of Google), the first authors on the paper, and Steve Kearnes (of Google) for core modelling ideas and technical work.