Reducing the Computational Cost of Deep Reinforcement Learning Research

Posted by Pablo Samuel Castro, Staff Software Engineer, Google Research

It is widely accepted that the enormous growth of deep reinforcement learning research, which combines traditional reinforcement learning with deep neural networks, began with the publication of the seminal DQN algorithm. The DQN paper demonstrated the potential of this combination, showing that it could produce agents capable of playing a number of Atari 2600 games very effectively. Since then, several approaches have built on and improved the original DQN. The popular Rainbow algorithm combined a number of these advances to achieve state-of-the-art performance on the ALE benchmark. This advance, however, came at a very high computational cost, which has the unfortunate side effect of widening the gap between those with ample access to computational resources and those without.

In “Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research”, to be presented at ICML 2021, we revisit this algorithm on a set of small- and medium-sized tasks. We first discuss the computational cost associated with the Rainbow algorithm. We explore how the same conclusions regarding the benefits of combining the various algorithmic components can be reached with smaller-scale experiments, and further generalize that idea to how research done on a smaller computational budget can provide valuable scientific insights.

The Cost of Rainbow
A major reason for the computational cost of Rainbow is that academic publishing standards often require evaluating new algorithms on large benchmarks like ALE, which consists of 57 Atari 2600 games that reinforcement learning agents are trained to play. Training a model on a typical game takes roughly five days using a Tesla P100 GPU, and establishing meaningful confidence bounds typically requires at least five independent runs. Thus, training Rainbow on the full suite of 57 games with convincing empirical performance statistics required around 34,200 GPU hours (or 1425 GPU days). Such experiments are only feasible if one is able to train on multiple GPUs in parallel, which can be prohibitive for smaller research groups.
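The arithmetic behind that figure is simple enough to check in a few lines, using the per-game cost and run counts quoted above:

```python
# Back-of-the-envelope cost of benchmarking Rainbow on the full ALE suite.
games = 57          # Atari 2600 games in the ALE benchmark
days_per_run = 5    # approximate training time per game on a Tesla P100
runs_per_game = 5   # independent runs for meaningful confidence bounds

gpu_days = games * days_per_run * runs_per_game
gpu_hours = gpu_days * 24
print(f"{gpu_days} GPU days = {gpu_hours:,} GPU hours")
# -> 1425 GPU days = 34,200 GPU hours
```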

Revisiting Rainbow
As in the original Rainbow paper, we evaluate the effect of adding the following components to the original DQN algorithm: double Q-learning, prioritized experience replay, dueling networks, multi-step learning, distributional RL, and noisy nets.
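To make the scale of this study concrete, here is a minimal sketch of the ablation grid it implies. The flag names and the enumeration helper are hypothetical, for illustration only, not the paper's actual code:

```python
# Hypothetical flags for the six components evaluated on top of DQN.
COMPONENTS = [
    "double_q", "prioritized_replay", "dueling",
    "multi_step", "distributional", "noisy_nets",
]

def ablation_grid():
    """Enumerate the two ablations: DQN plus one component at a time,
    and Rainbow minus one component at a time."""
    for c in COMPONENTS:
        yield f"dqn+{c}", {k: (k == c) for k in COMPONENTS}
    for c in COMPONENTS:
        yield f"rainbow-{c}", {k: (k != c) for k in COMPONENTS}

for name, flags in ablation_grid():
    enabled = [k for k, v in flags.items() if v]
    print(f"{name}: {enabled}")
```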

We evaluate on a set of four classic control environments, on which an agent can be fully trained in 10-20 minutes (compared to five days for ALE games):

Upper left: In CartPole, the task is to balance a pole on a cart that the agent can move left and right. Upper right: In Acrobot, there are two arms and two joints, where the agent applies force to the joint between the two arms in order to raise the lower arm above a threshold. Lower left: In LunarLander, the agent is meant to land the spaceship between the two flags. Lower right: In MountainCar, the agent must build up momentum between two hills to drive to the top of the rightmost hill.
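All four tasks are available off the shelf in OpenAI Gym (LunarLander additionally requires the Box2D extra). A minimal random-policy loop, using the Gym API as of this writing, illustrates how cheap these environments are to interact with:

```python
import gym

# The four tasks used in the paper, by Gym environment ID.
ENV_IDS = ["CartPole-v1", "Acrobot-v1", "LunarLander-v2", "MountainCar-v0"]

env = gym.make("CartPole-v1")
obs = env.reset()
done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()  # a trained agent would choose here
    obs, reward, done, info = env.step(action)
    episode_return += reward
print(f"random-policy return: {episode_return}")
```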

We investigated the effect of both independently adding each component to DQN and removing each from the full Rainbow algorithm. As in the original Rainbow paper, we find that, in aggregate, adding each of these components does improve learning over the base DQN. However, we also found some important differences. For example, distributional RL, commonly thought to be a positive addition on its own, does not always yield improvements by itself. Indeed, in contrast to the ALE results in the Rainbow paper, in the classic control environments distributional RL only yields an improvement when combined with another component.

Each plot shows the training progress when adding the various components to DQN. The x-axis is training steps, the y-axis is performance (higher is better).
Each plot shows the training progress when removing the various components from Rainbow. The x-axis is training steps, the y-axis is performance (higher is better).

We also re-ran the Rainbow experiments on the MinAtar environment, which consists of a set of five miniaturized Atari games, and found qualitatively similar results. The MinAtar games are roughly 10 times faster to train than the regular Atari 2600 games on which the original Rainbow algorithm was evaluated, but still share interesting aspects with them, such as game dynamics and pixel-based inputs to the agent. As such, they provide a challenging mid-level benchmark, in between the classic control environments and the full Atari 2600 games.

When viewed in aggregate, we find our results to be consistent with those of the original Rainbow paper — the impact resulting from each algorithmic component can vary from environment to environment. If we were to suggest a single agent that balances the tradeoffs of the different algorithmic components, our version of Rainbow would likely be consistent with the original, in that combining all components produces a better overall agent. However, there are important details in the variations of the different algorithmic components that merit a more thorough investigation.

Beyond the Rainbow
When DQN was introduced, it made use of the Huber loss and the RMSProp optimizer. It has been common practice for researchers to keep these same choices when building on DQN, as most of their effort is spent on other algorithmic design decisions. In the spirit of reassessing these assumptions, we revisited the loss function and optimizer used by DQN on the lower-cost classic control and MinAtar environments. We ran some initial experiments using the Adam optimizer, which has lately been the most popular optimizer choice, combined with a simpler loss function, the mean-squared error loss (MSE). Since the selection of optimizer and loss function is often overlooked when developing a new algorithm, we were surprised to observe a dramatic improvement on all the classic control and MinAtar environments.
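To make the comparison concrete, here is a minimal PyTorch sketch of the swap being tested. Only the loss function and optimizer change relative to the original DQN recipe; the tiny network, learning rates, and batch shapes are placeholders, not the paper's settings:

```python
import torch
import torch.nn as nn

q_network = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

# Original DQN recipe: RMSProp optimizer + Huber loss.
optimizer = torch.optim.RMSprop(q_network.parameters(), lr=2.5e-4)
loss_fn = nn.SmoothL1Loss()  # the Huber loss

# The combination examined here: Adam optimizer + mean-squared error.
optimizer = torch.optim.Adam(q_network.parameters(), lr=6.25e-5)
loss_fn = nn.MSELoss()

def dqn_update(states, actions, td_targets):
    """One gradient step on the standard DQN temporal-difference objective."""
    q_values = q_network(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = loss_fn(q_values, td_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy batch, just to show the call signature.
states = torch.randn(32, 4)
actions = torch.randint(0, 2, (32,))
td_targets = torch.randn(32)
print(dqn_update(states, actions, td_targets))
```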

We thus decided to evaluate all four ways of combining the two optimizers (RMSProp and Adam) with the two losses (Huber and MSE) on the full ALE suite (60 Atari 2600 games). We found that Adam+MSE is a superior combination to RMSProp+Huber.

Measuring the improvement Adam+MSE gives over the default DQN settings (RMSProp + Huber); higher is better.

Additionally, when comparing the various optimizer-loss combinations, we find that when using RMSProp, the Huber loss tends to perform better than MSE (illustrated by the gap between the solid and dotted orange lines).

Normalized scores aggregated over all 60 Atari 2600 games, comparing the different optimizer-loss combinations.

Conclusion
On a limited computational budget we were able to reproduce, at a high level, the findings of the Rainbow paper and uncover new and interesting phenomena. Evidently it is much easier to revisit something than to discover it in the first place. Our intent with this work, however, was to argue for the relevance and significance of empirical research on small- and medium-scale environments. We believe that these less computationally intensive environments lend themselves well to a more critical and thorough analysis of the performance, behaviors, and intricacies of new algorithms.

We are by no means calling for less emphasis to be placed on large-scale benchmarks. We are simply urging researchers to consider smaller-scale environments as a valuable tool in their investigations, and reviewers to avoid dismissing empirical work that focuses on smaller-scale environments. By doing so, in addition to reducing the environmental impact of our experiments, we will both get a clearer picture of the research landscape and reduce the barriers for researchers from diverse and often under-resourced communities, which can only help make our community and scientific advances stronger.

Acknowledgments
Thank you to Johan, the first author of this paper, for his hard work and persistence in seeing this through! We would also like to thank Marlos C. Machado, Sara Hooker, Matthieu Geist, Nino Vieillard, Hado van Hasselt, Eleni Triantafillou, and Brian Tanner for their insightful comments on this work.


The TensorFlow Developer Certificate turns 1!

Posted by Alina Shinkarsky and Jocelyn Becker on behalf of the TensorFlow Team

The TensorFlow Developer Certificate exam is celebrating its first anniversary with a big milestone: more than 3,000 people have passed the exam! Successful test takers received the official TensorFlow Developer Certificate and badge. They also had the opportunity to showcase their skill set on social networks such as LinkedIn and the TensorFlow Certificate Network, where recruiters are able to seek out entry-level TensorFlow developers.

The TF Certificate program bridges the gap between companies' demand for data-knowledgeable, production-ML-capable engineers and the students and developers around the world interested in getting a job in ML.

The goal of this program is to provide developers around the world with the opportunity to demonstrate their skills in ML in an increasingly AI-driven global job market. This is a foundational certificate for students, developers, and data scientists who want to demonstrate practical machine learning skills through building and training basic models using TensorFlow.

We’ve followed up with folks who have taken the exam to understand the impact on their professional lives.

Fran shared, “Lost my job due to COVID 1 month before taking the exam, hired by Google in August, I think this cert helped a lot for my resume to be selected for interviews :)”

Fran is now a Conversational AI engineer, and has been working at Google for over 6 months.

Photo of a Googler wearing a Noogler hat.

Tia shared, “I was a stay-at-home mom when I started to learn Machine Learning at Google Bangkit Academy back in 2020. Bangkit supported us to take TF Certification and with this certificate, I was able to get back to work after years of hiatus. My current role is Machine Learning Curriculum Developer in Dicoding, an online technology education platform in Indonesia.”

Check out the short video below to hear Tia’s story.

Are you interested in earning the TensorFlow Developer Certificate? Learn more about the TensorFlow Developer Certificate on our website, including information on exam criteria, exam cost, and a stipend application to ensure taking this certificate exam is accessible, regardless of income.

If you’ve taken the exam and have feedback or would like to share your story, we would love to hear from you!

We look forward to growing this community of TensorFlow Developer Certificate recipients, and are immensely thankful for the continued contributions of our open source community!

Note: Participating in the program and/or obtaining this certificate are not endorsements of a participant’s employability nor guarantee of future work performance.


AutoX Unveils Full Self-Driving System Powered by NVIDIA DRIVE

Your robotaxi is arriving soon.

Self-driving startup AutoX last week took the wraps off its “Gen5” self-driving system. The autonomous driving platform, which is specifically designed for robotaxis, uses NVIDIA DRIVE automotive-grade GPUs to reach up to 2,200 trillion operations per second (TOPS) of AI compute performance.

In January, AutoX launched a commercial robotaxi system in Shenzhen, China, becoming one of the first autonomous driving companies in the world to provide full self-driving mobility services with no safety driver behind the wheel. The Gen5 system is the next step in its global rollout of safer, more efficient autonomous transportation.

“Safety is key. We need higher processing performance for safe and scalable robotaxi operations,” said Jianxiong Xiao, founder and CEO at AutoX. “With NVIDIA DRIVE, we now have power for more redundancy in a form factor that is automotive grade and more compact.”

Zero Blind Spots

In developing safe self-driving technology, AutoX aims to solve the toughest environments first, specifically high-traffic urban areas.

At the Gen5 Release Event, the company livestreamed its fully driverless robotaxi transporting a passenger through challenging narrow streets in China, called the “Urban Village.”

Safely navigating such chaotic streets requires sensors that can detect obstacles and other road users with the highest levels of accuracy. The Gen5 system relies on 28 automotive-grade camera sensors generating more than 200 million pixels per frame, covering 360 degrees around the car. (For comparison, a single high-definition video frame contains about 2 million pixels.)

In addition to cameras, the robotaxi system includes six high-resolution lidar sensors that produce 15 million points per second, as well as surround 4D radar.

At the center of the Gen5 system are two NVIDIA Ampere architecture GPUs that deliver 900 TOPS each for a true Level 4 autonomous production platform. With this unprecedented level of AI compute at the core, Gen5 has enough performance to power ultra-complex self-driving DNNs while maintaining compute headroom for more advanced upgrades.

This capability makes it possible for the vehicles to react in real time to high-traffic situations, like dozens of motorcycles and scooters cutting in or riding the opposite way at the same time, and to continually improve, learning how to manage new scenarios as they arise.

More Stops Added

The Shenzhen fully driverless robotaxi service is just the first stop in AutoX’s roadmap to deploy a global driverless vehicle platform.

With a population of more than 12 million people and ranking in the top 50 of global cities with the heaviest traffic, Shenzhen provides an ideal setting for developing a scalable robotaxi model.

The startup plans to roll out thousands of autonomous vehicles powered by the Gen5 system over the next couple of years and expand to multiple cities around the world. AutoX is working with partners such as Stellantis and Honda to integrate its technology into a variety of vehicle platforms.

By leveraging the open, scalable NVIDIA DRIVE platform for each of these use cases, the opportunities for the road ahead are limitless.



Reasoning with Language Models and Knowledge Graphs for Question Answering

Question Answering with Knowledge

From search engines to personal assistants, we use question-answering systems every day. When we ask a question (“Where was the painter of the Mona Lisa born?”), the system needs to gather background knowledge (“The Mona Lisa was painted by Leonardo da Vinci”, “Leonardo da Vinci was born in Italy”) and reason over it to produce the answer (“Italy”).
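As a toy illustration of the two-hop lookup this involves, consider a miniature knowledge base of (subject, relation, object) triples; the triples and helper below are illustrative only:

```python
# A tiny knowledge base as (subject, relation, object) triples.
TRIPLES = [
    ("Mona Lisa", "painted_by", "Leonardo da Vinci"),
    ("Leonardo da Vinci", "born_in", "Italy"),
]

def lookup(subject, relation):
    """Return the object of the first matching triple, or None."""
    for s, r, o in TRIPLES:
        if s == subject and r == relation:
            return o
    return None

# "Where was the painter of the Mona Lisa born?" as a two-hop query.
painter = lookup("Mona Lisa", "painted_by")  # -> "Leonardo da Vinci"
print(lookup(painter, "born_in"))            # -> "Italy"
```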

Knowledge sources
In recent AI research, such background knowledge is commonly available in the forms of knowledge graphs (KGs) and language models (LMs) pre-trained on a large set of documents. In KGs, entities are represented as nodes and relations between them as edges, e.g. [Leonardo da Vinci — born in — Italy]. Examples of KGs include Freebase (general-purpose facts) [1], ConceptNet (commonsense) [2], and UMLS (biomedical facts) [3]. Examples of pre-trained LMs include BERT (trained on Wikipedia articles and 10,000 books) [4], RoBERTa (extending BERT) [5], BioBERT (trained on biomedical publications) [6], and GPT-3 (the largest public LM to date) [7].

The two knowledge sources have complementary strengths. LMs can be pre-trained on any unstructured text and thus have broad coverage of knowledge. KGs, on the other hand, are more structured and help with logical reasoning by providing paths between entities. KGs also include knowledge that may not be commonly stated in text: for instance, people rarely state obvious facts like "people breathe" or compositional sentences like "The birthplace of the painter of the Mona Lisa is Italy".

In our recent work [8], published at NAACL 2021, we study how to effectively combine both sources of knowledge, LMs and KGs, to perform question answering.

Problem setup and Challenges
We consider the question answering setup illustrated in the figure below, where, given a question and answer choices if any (combined, we call them the QA context), the system predicts an answer. Using LMs and KGs for question answering presents two challenges. Given a QA context (purple box in the figure), the system needs to first identify informative knowledge from a large KG (green box), and then capture the nuance of the QA context and the structure of the KG to jointly reason over them.

In existing systems that use LMs and KGs, such as RelationNet [9], KagNet [10] and MHGRN [11], the extracted KG subgraphs tended to be noisy, and the interactions between the QA context and KG were not modeled. In this work, we introduce promising solutions to these two challenges: i) KG relevance scoring, where we estimate the relevance of KG nodes conditioned on the QA context, and ii) the joint graph, where we connect the QA context and KG as a joint graph to model their interactions.

Approach

We design an end-to-end question answering model that uses a pre-trained LM and KG. First, as commonly done in existing systems, we use an LM to obtain a vector representation for the QA context, and retrieve a KG subgraph by entity linking. Then, in order to identify informative knowledge from the KG, we estimate the relevance of KG nodes conditioned on the QA context (see the “KG Relevance Scoring” section below). Next, to jointly reason with the QA context and KG, we connect them as a joint graph and update their representations (see the “Joint Reasoning” section below). Finally, we combine the representations of the QA context and KG to predict the answer.
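Schematically, the forward pass looks like the sketch below. Every helper here is a stand-in stub so the example runs end to end; the real model replaces them with a pre-trained LM, an entity linker, KG retrieval, LM-based relevance scoring, and GAT layers:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# --- Stubs standing in for the real components (all hypothetical). ---
def lm_encode(text):                     # real system: pooled LM output
    return rng.standard_normal(DIM)

def link_entities(text):                 # real system: entity linking
    return ["revolving door", "security", "bank"]

def retrieve_subgraph(entities):         # real system: k-hop KG retrieval
    return entities + ["place", "holiday"]

def relevance_score(context_vec, node):  # real system: LM-based scoring
    return float(rng.random())

def gnn_reason(node_vecs):               # real system: GAT message passing
    return node_vecs.mean(axis=0)

def qa_gnn_score(question, choice):
    """Schematic forward pass for one (question, answer choice) pair."""
    qa_context = f"{question} {choice}"
    z_context = lm_encode(qa_context)                        # 1. encode context
    nodes = retrieve_subgraph(link_entities(qa_context))     # 2. retrieve KG
    scores = [relevance_score(z_context, n) for n in nodes]  # 3. score nodes
    node_vecs = np.stack([lm_encode(n) * s for n, s in zip(nodes, scores)])
    working_graph = np.vstack([z_context, node_vecs])        # 4. joint graph
    z_graph = gnn_reason(working_graph)
    return float(np.dot(z_context, z_graph))                 # 5. combine

print(qa_gnn_score("A revolving door is a security measure at what?", "bank"))
```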

KG Relevance Scoring
Real-world KGs are huge, with millions of entities. How can we effectively extract a KG subgraph that is most relevant to the given question? Let’s consider an example question in the figure: “A revolving door is convenient for two direction travel, but also serves as a security measure at what?”. Common methods for extracting a KG subgraph link entities in the QA context such as “travel”, “door”, “security” and “bank” (topic entities; blue and red nodes in the figure left) and retrieve their 1- or 2-hop neighbors from the KG (gray nodes in the figure left). However, this may introduce many entity nodes that are semantically irrelevant to the QA context, especially when the number of hops or entities in the QA context increases. In this example, 1-hop neighbors may include nodes like “holiday”, “riverbank”, “human” and “place”, but they are off-topic or too generic.
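One simple way to realize the relevance scoring described above is to score each candidate node by the likelihood a pre-trained LM assigns to the node's name appended to the QA context. The sketch below is in the spirit of our approach rather than the paper's exact scoring function; it uses the small GPT-2 model from the transformers library:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def relevance_score(qa_context: str, node_name: str) -> float:
    """Score a KG node by the LM likelihood of its name given the context."""
    ids = tokenizer.encode(qa_context + " " + node_name, return_tensors="pt")
    with torch.no_grad():
        # With labels=ids the model returns the mean token NLL as `loss`.
        nll = model(ids, labels=ids).loss
    return -nll.item()  # higher = more plausible in this context

context = ("A revolving door is convenient for two direction travel, "
           "but also serves as a security measure at what?")
for node in ["bank", "riverbank", "holiday", "place"]:
    print(node, round(relevance_score(context, node), 3))
```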

Joint Reasoning
Now we have the QA context and the retrieved KG ready. How can we jointly reason over them to obtain the answer? To create a joint reasoning space, we explicitly connect them in a graph, where we view the QA context as a node (purple node in the figure) and connect it to each topic entity in the KG (blue and red nodes in the figure). As this joint graph intuitively provides a working memory for reasoning, we call it the working graph. Each node in the working graph is associated with one of four types: purple is the QA context node, blue is an entity in the question, orange is an entity in the answer choices, and gray is any other entity. The representation of each node is initialized as the LM representation of the QA context (for the QA context node) or entity name (for KG nodes). The working graph essentially unifies the two modalities, text and KG, into one graph.
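A toy construction of such a working graph, using networkx (the entities and edges are illustrative):

```python
import networkx as nx

G = nx.Graph()

# The QA context itself becomes a node in the working graph (purple).
G.add_node("QA_context", node_type="context")

# Topic entities from the question (blue) and the answer choices (orange),
# plus another retrieved KG entity (gray).
G.add_node("revolving door", node_type="question_entity")
G.add_node("security", node_type="question_entity")
G.add_node("bank", node_type="answer_entity")
G.add_node("place", node_type="other_entity")

# Connect the QA context node to each topic entity...
for entity in ["revolving door", "security", "bank"]:
    G.add_edge("QA_context", entity)

# ...and keep the KG's own edges between entities.
G.add_edge("security", "bank")
G.add_edge("bank", "place")

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
```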

To reason on the working graph, we mutually update the representations of the QA context node and the KG via graph attention networks (GAT). The basic idea of GAT is to update the representation of each node by letting neighboring nodes send message vectors to each other over multiple layers. Concretely, in our model, we update the representation of each node t by the rule shown in the figure on the right, where m is the message vector from the neighbor node s and α is the attention weight between the current node t and neighbor node s. For more details about GAT, we refer readers to [12]. Below are examples of what the message passing can look like, where a thicker edge indicates a higher attention weight.
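For concreteness, here is a minimal sketch of that attention-weighted message passing: a generic single-layer GAT-style update over a dense adjacency matrix, not the paper's exact architecture (which also conditions messages on node types and relations):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionMessagePassing(nn.Module):
    """One GAT-style layer: each node aggregates messages from its
    neighbors, weighted by learned attention scores."""

    def __init__(self, dim: int):
        super().__init__()
        self.msg = nn.Linear(dim, dim)    # message vector m for each neighbor s
        self.att = nn.Linear(2 * dim, 1)  # attention logit for a pair (t, s)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: [n, dim] node representations; adj: [n, n] 0/1 adjacency matrix.
        n = h.size(0)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1),   # current node t
             h.unsqueeze(0).expand(n, n, -1)],  # neighbor node s
            dim=-1)
        logits = self.att(pairs).squeeze(-1)           # [n, n]
        logits = logits.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(logits, dim=-1)          # attention weights α
        messages = self.msg(h)                         # [n, dim]
        return F.relu(h + alpha @ messages)            # residual node update

layer = AttentionMessagePassing(dim=16)
h = torch.randn(5, 16)      # 5 nodes: QA context + 4 KG entities
adj = torch.ones(5, 5)      # fully connected toy working graph
print(layer(h, adj).shape)  # torch.Size([5, 16])
```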

Let’s use our question answering model!

We apply and evaluate our question answering model (we call QA-GNN) on two QA benchmarks that require reasoning with knowledge:

  • CommonsenseQA [13]: contains questions that test commonsense knowledge (e.g. “What do people typically do while playing guitar?”)
  • OpenBookQA [14]: contains questions that test elementary science knowledge (e.g. “Which of the following objects would let the most heat travel through?”)

For our LM component, we use RoBERTa, which was pre-trained on Wikipedia articles, books and other popular web documents. For our KG component, we use ConceptNet, which contains a million entities and covers commonsense facts such as [round brush — used for — painting].

QA-GNN improves on existing methods of using LMs and KGs for question answering
We compare with a baseline that uses only the LM (RoBERTa) without the KG, and with existing LM+KG models (RelationNet, KagNet and MHGRN). The main innovations in QA-GNN are that we perform KG relevance scoring with respect to the question and that we mutually update the text and KG representations on the joint graph, whereas existing methods combined text and KG representations only at later stages. We find that these two techniques improve question answering accuracy, e.g. 71%→73% on CommonsenseQA and 67%→70% on OpenBookQA (figure below).

Case studies: When is KG helpful and when is LM?
Let’s look at several question-answering examples from the CommonsenseQA benchmark and see when and how the KG component or the LM component of our model is helpful. In each figure below, blue nodes are entities in the question, and red nodes are answer choices, where the bolded entity is the correct answer and the entity marked (P) is our model’s prediction. As shown in the next two figures, we find that the KG component is especially useful when the KG provides concrete facts (e.g. [postpone — antonym — hasten] in the first figure) or paths (e.g. [chicken egg — egg — chicken — barn] in the second figure) that help answer the question.

On the other hand, we find that the LM component is especially helpful when the question requires language nuance and commonsense that are not available in the KG. For instance, in the next two figures, if we simply follow the paths in the KG, we may reach answers like “night sky” or “water” in the first and second questions respectively. While they are not completely wrong answers, “universe” and “soup” are better collocations.

Conclusion

In this work, we studied how to combine two sources of background knowledge (pre-trained LM and KG) to do better in question answering. To solve this problem, we introduced a new model QA-GNN, which has two innovations:

  1. KG relevance scoring: We use a pre-trained LM to score KG nodes conditioned on a question. This is a general framework to weight information on KGs.
  2. Joint reasoning over text and KGs: We connect the QA context and KG to form a joint graph, and mutually update their representations via an LM and a graph neural network.

Through case studies we also identified the complementary strengths of pre-trained LMs and KGs as knowledge sources.

You can check out our full paper here and our source code/data on GitHub. If you have questions, please feel free to email us.

Acknowledgments

This blog post is based on the paper:

QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering. Michihiro Yasunaga, Hongyu Ren, Antoine Bosselut, Percy Liang, and Jure Leskovec. NAACL 2021.

Many thanks to my collaborators and advisors, Hongyu Ren, Antoine Bosselut, Percy Liang and Jure Leskovec for their help. Many thanks to Megha Srivastava and Sidd Karamcheti for edits on this blog post.

  1. Freebase: a collaboratively created graph database for structuring human knowledge. Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008.

  2. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. Robyn Speer, Joshua Chin, and Catherine Havasi. 2017.

  3. The Unified Medical Language System (UMLS): integrating biomedical terminology. Olivier Bodenreider. 2004.

  4. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019.

  5. RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019.

  6. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2019.

  7. Language Models are Few-Shot Learners. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020.

  8. QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering. Michihiro Yasunaga, Hongyu Ren, Antoine Bosselut, Percy Liang, and Jure Leskovec. 2021.

  9. A Simple Neural Network Module for Relational Reasoning. Adam Santoro, David Raposo, David G. Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Timothy Lillicrap. 2017.

  10. KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning. Bill Yuchen Lin, Xinyue Chen, Jamin Chen, and Xiang Ren. 2019.

  11. Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering. Yanlin Feng, Xinyue Chen, Bill Yuchen Lin, Peifeng Wang, Jun Yan, and Xiang Ren. 2020.

  12. Graph Attention Networks. Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018.

  13. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. 2019.

  14. Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering. Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. 2018.


Launching the AI Academy for small newsrooms

As people searched for the latest information on COVID-19 last year, including school reopenings and travel restrictions, the BBC recognized they needed to find new ways of bringing their journalism to their audiences. They released a new online tool, the BBC Corona Bot, which uses artificial intelligence to draw on BBC News’ explanatory journalism. It responds with an answer to a reader’s specific question where possible, or points to health authorities’ websites when appropriate. AI technology allowed BBC News to reach new audiences and drive more traffic to their stories and explainers. 

This is one example of how AI can help newsrooms. AI can help build new audiences and automate tasks, freeing up journalists' time for the more creative aspects of news production while leaving tedious and repetitive tasks to machines. However, newsrooms around the world have told researchers they worry that access to AI technology is unequal: they fear big publishers will likely benefit most from artificial intelligence, while smaller news organizations could get left behind.

To help bridge this gap, the Google News Initiative is partnering with Polis, the London School of Economics and Political Science’s journalism think tank, to launch a training academy for 20 media professionals to learn how AI can be used to support their journalism. 

The AI Academy for Small Newsrooms is a six-week long, free online program taught by industry-leading journalists and researchers who work at the intersection of journalism and AI. It will start in September 2021 and will welcome journalists and developers from small news organizations in the Europe, Middle East, and Africa (EMEA) region.

By the end of the course, participants will have a practical understanding of the challenges and opportunities of AI technologies. They will learn examples of how to use AI to automate repetitive tasks, such as interview transcription or image search, as well as how to optimize newsroom processes by getting insights on what content is most engaging.

Other newsrooms in the region are already using AI technology: Schibsted, a Nordic news outlet, developed an innovative model to reduce gender bias in news coverage, while in Spain, El País uses an AI-based tool to moderate toxic comments.

Most importantly, participants will create action plans to guide the development of AI projects within their news organizations. JournalismAI will share these plans openly to help other publishers around the world.

This pilot program, which we plan to launch in other regions in 2022, is part of a broader training effort over the last three years by JournalismAI, a partnership between the GNI and Polis to foster AI literacy in newsrooms globally. More than 110,000 participants have already taken the online training modules available on the Google News Initiative Training Center.

This year, JournalismAI will also create an AI Journalism Starter Pack to make the information about AI in journalism more accessible to small and local publishers. It will include examples of AI tools that can solve small and local publishers’ basic needs such as tagging or transcribing.

Find more detailed information on the AI Academy for Small Newsrooms and how to apply on the JournalismAI website. The deadline for applications is 11:59 PM GMT on August 1, 2021.
