Multi-Task Robotic Reinforcement Learning at Scale

Posted by Karol Hausman, Senior Research Scientist and Yevgen Chebotar, Research Scientist, Robotics at Google

For general-purpose robots to be most useful, they would need to be able to perform a range of tasks, such as cleaning, maintenance and delivery. But training even a single task (e.g., grasping) using offline reinforcement learning (RL), a trial-and-error learning method in which the agent trains on previously collected data, can take thousands of robot-hours, in addition to the significant engineering needed to enable autonomous operation of a large-scale robotic system. Thus, the computational costs of building general-purpose everyday robots using current robot learning methods become prohibitive as the number of tasks grows.

Multi-task data collection across multiple robots where different robots collect data for different tasks.

In other large-scale machine learning domains, such as natural language processing and computer vision, a number of strategies have been applied to amortize the effort of learning over multiple skills. For example, pre-training on large natural language datasets can enable few- or zero-shot learning of multiple tasks, such as question answering and sentiment analysis. However, because robots collect their own data, robotic skill learning presents a unique set of opportunities and challenges. Automating this process is a large engineering endeavour, and effectively reusing past robotic data collected by different robots remains an open problem.

Today we present two new advances for robotic RL at scale, MT-Opt, a new multi-task RL system for automated data collection and multi-task RL training, and Actionable Models, which leverages the acquired data for goal-conditioned RL. MT-Opt introduces a scalable data-collection mechanism that is used to collect over 800,000 episodes of various tasks on real robots and demonstrates a successful application of multi-task RL that yields ~3x average improvement over baseline. Additionally, it enables robots to master new tasks quickly through use of its extensive multi-task dataset (new task fine-tuning in <1 day of data collection). Actionable Models enables learning in the absence of specific tasks and rewards by training an implicit model of the world that is also an actionable robotic policy. This drastically increases the number of tasks the robot can perform (via visual goal specification) and enables more efficient learning of downstream tasks.

Large-Scale Multi-Task Data Collection System
The cornerstone for both MT-Opt and Actionable Models is the volume and quality of training data. To collect diverse, multi-task data at scale, users need a way to specify tasks, decide for which tasks to collect the data, and finally, manage and balance the resulting dataset. To that end, we first create a scalable and intuitive multi-task success detector using data from all of the chosen tasks. The multi-task success detector is trained using supervised learning to detect the outcome of a given task, and it allows users to quickly define new tasks and their rewards. When this success detector is being applied to collect data, it is periodically updated to accommodate distribution shifts caused by various real-world factors, such as varying lighting conditions, changing background surroundings, and novel states that the robots discover.

Second, we simultaneously collect data for multiple distinct tasks across multiple robots by using solutions to easier tasks to effectively bootstrap learning of more complex tasks. This allows training of a policy for the harder tasks and improves the data collected for them. As such, the amount of per-task data and the number of successful episodes for each task grows over time. To further improve the performance, we focus data collection on underperforming tasks, rather than collecting data uniformly across tasks.

This system collected 9600 robot hours of data (from 57 continuous data collection days on seven robots). However, while this data collection strategy was effective at collecting data for a large number of tasks, the success rate and data volume were imbalanced between tasks.

Learning with MT-Opt
We address the data collection imbalance by transferring data across tasks and re-balancing the per-task data. The robots generate episodes that are labelled as success or failure for each task and are then copied and shared across other tasks. The balanced batch of episodes is then sent to our multi-task RL training pipeline to train the MT-Opt policy.

Data sharing and task re-balancing strategy used by MT-Opt. The robots generate episodes which then get labelled as success or failure for the current task and are then shared across other tasks.
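The sharing and re-balancing step can be sketched roughly as follows. This is a minimal illustration, not the MT-Opt pipeline: `share_episodes`, `balanced_batch`, the `success_detector` callback and the episode format are all hypothetical names, and the sketch assumes every episode is shared with every task and that each task contributes an equal number of successes and failures to a batch.

```python
import random

def share_episodes(episodes, tasks, success_detector):
    """Relabel every episode for every task (assumption: share with all tasks)."""
    pools = {task: {"success": [], "failure": []} for task in tasks}
    for episode in episodes:
        for task in tasks:
            label = "success" if success_detector(episode, task) else "failure"
            pools[task][label].append(episode)
    return pools

def balanced_batch(pools, per_bucket, rng):
    """Sample the same number of episodes from each non-empty (task, label) bucket."""
    batch = []
    for task, buckets in pools.items():
        for label, bucket in buckets.items():
            if bucket:
                # Sample with replacement so that rare buckets are not under-represented.
                for episode in rng.choices(bucket, k=per_bucket):
                    batch.append((task, label, episode))
    return batch

# Hypothetical episodes: each records which tasks it happened to accomplish.
episodes = [{"achieved": {"lift"}}, {"achieved": {"lift", "place"}}, {"achieved": set()}]
detector = lambda episode, task: task in episode["achieved"]
pools = share_episodes(episodes, ["lift", "place"], detector)
batch = balanced_batch(pools, per_bucket=2, rng=random.Random(0))
```

Sampling with replacement is one simple way to keep rare successes visible in every batch; other weighting schemes would serve the same purpose.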

MT-Opt uses Q-learning, a popular RL method that learns a function that estimates the future sum of rewards, called the Q-function. The learned policy then picks the action that maximizes this learned Q-function. For multi-task policy training, we specify the task as an extra input to a large Q-learning network (inspired by our previous work on large-scale single-task learning with QT-Opt) and then train all of the tasks simultaneously with offline RL using the entire multi-task dataset. In this way, MT-Opt is able to train on a wide variety of skills that include picking specific objects, placing them into various fixtures, aligning items on a rack, rearranging and covering objects with towels, etc.
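To make the idea of a task-conditioned Q-function concrete, here is a toy tabular sketch. It is not the MT-Opt network, which is a large neural network over images and continuous actions; the table layout, the function names and the tiny dataset are assumptions chosen only to show the task acting as an extra input and the greedy policy maximizing the learned Q-values.

```python
import numpy as np

def train_multitask_q(transitions, num_tasks, num_states, num_actions,
                      gamma=0.9, lr=0.5, epochs=50):
    """Offline Q-learning over a fixed dataset; the task index is an extra input."""
    q = np.zeros((num_tasks, num_states, num_actions))
    for _ in range(epochs):
        for task, s, a, r, s_next, done in transitions:
            # Bellman target: reward plus discounted best future value for the same task.
            target = r if done else r + gamma * q[task, s_next].max()
            q[task, s, a] += lr * (target - q[task, s, a])
    return q

def greedy_action(q, task, state):
    """The policy picks the action that maximizes the learned Q-function."""
    return int(np.argmax(q[task, state]))

# Hypothetical dataset: (task, state, action, reward, next_state, done).
# The same state calls for a different action depending on the task.
dataset = [
    (0, 0, 0, 1.0, 1, True), (0, 0, 1, 0.0, 1, True),
    (1, 0, 0, 0.0, 1, True), (1, 0, 1, 1.0, 1, True),
]
q_table = train_multitask_q(dataset, num_tasks=2, num_states=2, num_actions=2)
```

Because all tasks are trained from one dataset into one function, the same state can map to different actions depending only on the task input.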

Compared to single-task baselines, MT-Opt performs similarly on the tasks that have the most data and significantly improves performance on underrepresented tasks. So, for a generic lifting task, which has the most supporting data, MT-Opt achieved an 89% success rate (compared to 88% for QT-Opt) and achieved a 50% average success rate across rare tasks, compared to 1% with a single-task QT-Opt baseline and 18% using a naïve, multi-task QT-Opt baseline. MT-Opt not only enables zero-shot generalization to new but similar tasks, but can also be fine-tuned quickly (in about 1 day of data collection on seven robots) to new, previously unseen tasks. For example, when applied to an unseen towel-covering task, the system achieved a zero-shot success rate of 92% for towel-picking and 79% for object-covering, which wasn’t present in the original dataset.

Example tasks that MT-Opt is able to learn, such as instance and indiscriminate grasping, chasing, placing, aligning and rearranging.


Towel-covering task that was not present in the original dataset. We fine-tune MT-Opt on this novel task in 1 day to achieve a high (>90%) success rate.

Learning with Actionable Models
While supplying a rigid definition of tasks facilitates autonomous data collection for MT-Opt, it limits the number of learnable behaviors to a fixed set. To enable learning a wider range of tasks from the same data, we use goal-conditioned learning, i.e., learning to reach given goal configurations of a scene in front of the robot, which we specify with goal images. In contrast to explicit model-based methods that learn predictive models of future world observations, or approaches that employ online data collection, this approach learns goal-conditioned policies via offline model-free RL.

To learn to reach any goal state, we perform hindsight relabeling of all trajectories and sub-sequences in our collected dataset and train a goal-conditioned Q-function in a fully offline manner (in contrast to learning online using a fixed set of success examples as in recursive classification). One challenge in this setting is the distributional shift caused by learning only from “positive” hindsight relabeled examples. We address this by employing a conservative strategy to minimize Q-values of unseen actions using artificial negative actions. Furthermore, to enable reaching temporally extended goals, we introduce a technique for chaining goals across multiple episodes.

Actionable Models relabel sub-sequences with all intermediate goals and regularize Q-values with artificial negative actions.
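A rough sketch of hindsight relabeling over all sub-sequences, with artificial negative actions, might look like the following. The data layout and function names are hypothetical, and the negatives here are drawn uniformly from a small discrete action set purely for illustration; the source does not specify how negative actions are sampled.

```python
import random

def hindsight_relabel(states, actions):
    """Treat every later state of a trajectory as a goal for every earlier step."""
    assert len(states) == len(actions) + 1
    examples = []
    for goal_idx in range(1, len(states)):
        goal = states[goal_idx]
        for k in range(goal_idx):
            reached = (k + 1 == goal_idx)  # this step lands exactly on the goal
            examples.append({
                "state": states[k], "action": actions[k],
                "next_state": states[k + 1], "goal": goal,
                "reward": 1.0 if reached else 0.0, "done": reached,
            })
    return examples

def artificial_negatives(examples, action_space, rng):
    """Pair each relabeled state and goal with an unseen action and a zero Q-target."""
    negatives = []
    for ex in examples:
        unseen = [a for a in action_space if a != ex["action"]]
        if unseen:
            negatives.append({"state": ex["state"], "action": rng.choice(unseen),
                              "goal": ex["goal"], "target_q": 0.0})
    return negatives

positives = hindsight_relabel(["s0", "s1", "s2", "s3"], ["a", "b", "c"])
negatives = artificial_negatives(positives, ["a", "b", "c", "d"], random.Random(0))
```

A trajectory with T steps yields T(T+1)/2 relabeled transitions, T of which carry a positive reward, which is why the negatives are needed to keep the Q-function from overestimating unseen actions.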

Training with Actionable Models allows the system to learn a large repertoire of visually indicated skills, such as object grasping, container placing and object rearrangement. The model is also able to generalize to novel objects and visual objectives not seen in the training data, which demonstrates its ability to learn general functional knowledge about the world. We also show that downstream reinforcement learning tasks can be learned more efficiently by either fine-tuning a pre-trained goal-conditioned model or through a goal-reaching auxiliary objective during training.

Example tasks (specified by goal-images) that our Actionable Model is able to learn.

The results of both MT-Opt and Actionable Models indicate that it is possible to collect and then learn many distinct tasks from large diverse real-robot datasets within a single model, effectively amortizing the cost of learning across many skills. We see this as an important step towards general robot learning systems that can be further scaled up to perform many useful services and serve as a starting point for learning downstream tasks.

This post is based on two papers, “MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale” and “Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills,” with additional information and videos on the project websites for MT-Opt and Actionable Models.

This research was conducted by Dmitry Kalashnikov, Jake Varley, Yevgen Chebotar, Ben Swanson, Rico Jonschkowski, Chelsea Finn, Sergey Levine, Yao Lu, Alex Irpan, Ben Eysenbach, Ryan Julian and Ted Xiao. We’d like to give special thanks to Josh Weaver, Noah Brown, Khem Holden, Linda Luu and Brandon Kinman for their robot operation support; Anthony Brohan for help with distributed learning and testing infrastructure; Tom Small for help with videos and project media; Julian Ibarz, Kanishka Rao, Vikas Sindhwani and Vincent Vanhoucke for their support; Tuna Toksoz and Garrett Peake for improving the bin reset mechanisms; Satoshi Kataoka, Michael Ahn, and Ken Oslund for help with the underlying control stack, and the rest of the Robotics at Google team for their overall support and encouragement. All the above contributions were incredibly enabling for this research.

NVIDIA Unveils 50+ New, Updated AI Tools and Trainings for Developers

To help developers hone their craft, NVIDIA this week introduced more than 50 new and updated tools and training materials for data scientists, researchers, students and developers of all kinds.

The offerings range from software development kits for conversational AI and ray tracing, to hands-on courses from the NVIDIA Deep Learning Institute.

They’re available to all members of the NVIDIA Developer Program, a free-to-join global community of over 2.5 million technology innovators who are revolutionizing industries through accelerated computing.

Training for Success

Learning new and advanced software development skills is vital to staying ahead in a competitive job market. DLI offers a comprehensive learning experience on a wide range of important topics in AI, data science and accelerated computing. Courses include hands-on exercises and are available in both self-paced and instructor-led formats.

The five courses cover topics such as deep learning, data science, autonomous driving and conversational AI. All include hands-on exercises that accelerate learning and mastery of the material. DLI workshops are led by NVIDIA-certified instructors and include access to fully configured GPU-accelerated servers in the cloud for each participant.

New self-paced courses, which are available now:

New full-day, instructor-led workshops for live virtual classroom delivery (coming soon):

These instructor-led workshops will be available to enterprise customers and the general public. DLI recently launched public workshops for its popular instructor-led courses, increasing accessibility to individual developers, data scientists, researchers and students.

To extend training further, DLI is releasing a new book, “Learning Deep Learning,” that provides a complete guide to deep learning theory and practical applications. Authored by NVIDIA Engineer Magnus Ekman, it explores how deep neural networks are applied to solve complex and challenging problems. Pre-orders are available now through Amazon.

New and Accelerated SDKs, Plus Updated Technical Tools

SDKs are a key component that can make or break an application’s performance. Dozens of new and updated kits for high performance computing, computer vision, data science, conversational AI, recommender systems and real-time graphics are available so developers can meet virtually any challenge. Updated tools are also in place to help developers accelerate application development.

Updated tools available now:

  • NGC is a GPU-optimized hub for AI and HPC software with a catalog of hundreds of SDKs, AI, ML and HPC containers, pre-trained models and Helm charts that simplify and accelerate workflows from end to end. Pre-trained models help developers jump-start their AI projects for a variety of use cases, including computer vision and speech.

New SDK (coming soon):

  • TAO (Train, Adapt, Optimize) is a GUI-based, workflow-driven framework that simplifies and accelerates the creation of enterprise AI applications and services. Enterprises can fine-tune pre-trained models using transfer learning or federated learning to produce domain specific models in hours rather than months, eliminating the need for large training runs and deep AI expertise. Learn more about TAO.

New and updated SDKs and frameworks available now:

  • Jarvis, a fully accelerated application framework for building multimodal conversational AI services. It includes state-of-the-art models pre-trained for thousands of hours on NVIDIA DGX systems, the Transfer Learning Toolkit for adapting those models to domains with zero coding, and optimized end-to-end speech, vision and language pipelines that run in real time. Learn more.
  • Maxine, a GPU-accelerated SDK with state-of-the-art AI features for developers to build virtual collaboration and content creation applications such as video conferencing and live streaming. Maxine’s AI SDKs — video effects, audio effects and augmented reality — are highly optimized and include modular features that can be chained into end-to-end pipelines to deliver the highest performance possible on GPUs, both on PCs and in data centers. Learn more.
  • Merlin, an application framework, currently in open beta, enables the development of deep learning recommender systems — from data preprocessing to model training and inference — all accelerated on NVIDIA GPUs. Read more about Merlin.
  • DeepStream, an AI streaming analytics toolkit for building high-performance, low-latency, complex video analytics apps and services.
  • Triton Inference Server, which lets teams deploy trained AI models from any framework, from local storage or cloud platform on any GPU- or CPU-based infrastructure.
  • TensorRT, for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT 8 is 2x faster for transformer-based models and includes new techniques to achieve accuracy similar to FP32 while using high-performance INT8 precision.
  • RTX technology, which helps developers harness and bring realism to their games:
    • DLSS is a deep learning neural network that helps graphics developers boost frame rates and generates beautiful, sharp images for their projects. It includes performance headroom to maximize ray-tracing settings and increase output resolution. Unity has announced that DLSS will be natively supported in Unity Engine 2021.2.
    • RTX Direct Illumination (RTXDI) makes it possible to render, in real time, scenes with millions of dynamic lights without worrying about performance or resource constraints.
    • RTX Global Illumination (RTXGI) leverages the power of ray tracing to scalably compute multi-bounce indirect lighting without bake times, light leaks or expensive per-frame costs.
    • Real-Time Denoisers (NRD) is a spatio-temporal API-agnostic denoising library that’s designed to work with low ray-per-pixel signals.

Joining the NVIDIA Developer Program is easy. Check it out today.

The post NVIDIA Unveils 50+ New, Updated AI Tools and Trainings for Developers appeared first on The Official NVIDIA Blog.

The Importance of Hyperparameter Optimization for Model-based Reinforcement Learning

Model-based reinforcement learning (MBRL) is a variant of the iterative
learning framework, reinforcement learning, that includes a structured
component of the system that is solely optimized to model the environment
dynamics. Learning a model is broadly motivated from biology, optimal control,
and more – it is grounded in natural human intuition of planning before acting. This intuitive
grounding, however, results in a more complicated learning process. In this
post, we discuss how model-based reinforcement learning is more susceptible to
parameter tuning and how AutoML can help in finding very well performing
parameter settings and schedules. Below, left is the expected behavior of an
agent maximizing velocity on a “Half Cheetah” robotic task, and to the right is
what our paper with hyperparameter tuning finds.

Cost-sensitive exploration in multi-armed bandits: Application to SMS routing

What the research is:

Many businesses, including Facebook, send SMS messages to their users for phone number verification, two-factor authentication, and notifications. In order to deliver these SMS messages to their users, companies generally leverage aggregators (e.g., Twilio) that have deals with operators throughout the world. These aggregators are responsible for delivering the SMS message to users and offer different quality and cost attributes. Quality, in this context, is the likelihood that a message will be successfully delivered to a user. A key decision faced by the business is identifying the best aggregator to route these messages. However, a significant challenge here is the nonstationarity in aggregators’ quality, which varies substantially over time. This necessitates a balanced exploration-exploitation approach, where we need to learn the best aggregator at any given time and maximize the number of messages we route through them. Multi-armed bandits (MAB) are a natural framework to formulate this problem. However, existing multi-armed bandit literature largely focuses on optimizing a single objective function and cannot be easily generalized to the setting where we have multiple metrics, like quality and cost.

Motivated by the above problem, in this paper, we propose a novel variant of the MAB problem, which factors costs associated with playing an arm and introduces new metrics that uniquely capture the features of multiple real-world applications. We argue about the hardness of this problem by establishing fundamental limits on the performance of any online algorithm. We further show that naive generalization of existing algorithms performs poorly from a theoretical and empirical standpoint. Lastly, we propose a simple algorithm that balances two asymmetric objectives and achieves near-optimal performance.

How it works:

In the traditional (stochastic) multi-armed bandit problem, the learning agent has access to a set of K actions (arms) with unknown but fixed reward distributions and has to repeatedly select an arm to maximize the cumulative reward. Here, the challenge is developing a policy that balances the tension between acquiring information about actions with few historical observations and exploiting the most rewarding arm based on existing information. The regret metric is typically used to measure the effectiveness of any such adaptive policy. Briefly, regret measures the cumulative difference between the expected reward from the best action, had we known the true reward distributions, and the expected reward from the action preferred by the policy. Existing literature has extensively studied this setting, leading to simple but extremely effective algorithms like Upper Confidence Bound (UCB) and Thompson Sampling (TS), which have been further generalized and applied to a wide range of application domains. The central focus of these algorithms is incentivizing sufficient exploration of actions that appear promising. In particular, these approaches ensure that the best action always has a chance to get out of situations where the expected reward function is underestimated due to unfavorable randomness. However, there is also a fundamental limitation on how well an online learning algorithm can perform in general settings. In situations where the number of decision epochs is small and/or there are many actions with reward distributions similar to the optimal action, it becomes hard to effectively learn the optimal action. Mathematically speaking, in the traditional problem, it has been established that any online learning algorithm must incur a regret of Ω(√(KT)), where K is the number of arms and T is the number of decision epochs.
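For readers less familiar with the classical setting, the sketch below simulates the standard UCB1 rule on Bernoulli arms and tracks cumulative (pseudo-)regret. The arm means, horizon and function name are illustrative assumptions and are not taken from the paper.

```python
import math
import random

def ucb1(means, horizon, rng):
    """Run UCB1 on Bernoulli arms; return pull counts and cumulative pseudo-regret."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    best_mean = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize the estimates
        else:
            # Optimism: empirical mean plus a bonus that shrinks as an arm is pulled more.
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - means[arm]  # expected loss versus always playing the best arm
    return counts, regret

pull_counts, total_regret = ucb1([0.2, 0.8], horizon=2000, rng=random.Random(0))
```

The exploration bonus is what gives an underestimated arm the chance to recover, as described above.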

In this work, we generalize the MAB framework to the multiobjective setting. In particular, we consider the problem setting where the learning agent has to balance both the traditional exploration-exploitation trade-offs and the trade-offs associated with multiple objectives, for example reward and cost. Keeping the SMS application in mind, to manage costs, we allow the agent to be agnostic between actions whose expected reward (quality) is greater than a 1−α fraction of the highest expected reward (quality). We refer to α as the subsidy factor and assume it is a known parameter specified by the problem domain. The agent’s objective is to explore various actions and identify the cheapest arm among these high-quality arms as frequently as possible. To measure the performance of any policy, we define two notions of regret, quality regret and cost regret. Quality regret is defined as the cumulative difference between α-adjusted expected reward from the highest quality action and the expected reward from the action preferred by the policy. Similarly, cost regret is defined as the cumulative difference between the cost of the action preferred by our policy and the cost of the cheapest feasible arm, had the qualities and costs been known beforehand.
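The two regret notions can be illustrated with a small helper that takes the true qualities and costs as given. The function name and the example numbers are hypothetical. Clipping each per-round difference at zero is an assumption of this sketch (the text above only says "cumulative difference"), as is using "at least" rather than "greater than" for the feasibility threshold.

```python
def cost_subsidy_regrets(qualities, costs, alpha, chosen_arms):
    """Return (feasible arms, quality regret, cost regret) for a sequence of choices."""
    threshold = (1.0 - alpha) * max(qualities)  # alpha-adjusted best quality
    feasible = [i for i, q in enumerate(qualities) if q >= threshold]
    cheapest_cost = min(costs[i] for i in feasible)
    # Assumption: per-round differences are clipped at zero before summing.
    quality_regret = sum(max(0.0, threshold - qualities[a]) for a in chosen_arms)
    cost_regret = sum(max(0.0, costs[a] - cheapest_cost) for a in chosen_arms)
    return feasible, quality_regret, cost_regret

# Hypothetical aggregators: arm 1 is nearly as good as arm 0 and much cheaper.
feasible_arms, q_regret, c_regret = cost_subsidy_regrets(
    qualities=[0.9, 0.85, 0.5], costs=[1.0, 0.4, 0.1], alpha=0.1, chosen_arms=[0, 1, 2])
```

In this example, playing arm 0 incurs only cost regret, playing arm 2 incurs only quality regret, and playing arm 1 incurs neither.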

For this problem, we show that naively extending existing algorithms, like TS, will perform poorly on the cost regret. In particular, we consider the variant of TS, where we form the feasible set based on the corresponding estimates on quality and pick the cheapest feasible action. For this variant, we can show that TS performs arbitrarily poorly (i.e., incurs linear regret). This is primarily because existing algorithms are designed to incentivize exploration of promising actions and can end up incurring large cost regret in settings where there are two actions with similar rewards but vastly different costs. We then establish a fundamental lower bound of Ω(K^(1/3) T^(2/3)) on the performance of any online learning algorithm for this problem, highlighting the hardness of our problem in comparison to the classical MAB problem. Building on these insights, we develop a simple algorithm based on the explore-then-commit idea, which balances the tension between two asymmetric objectives and achieves near-optimal performance up to logarithmic factors. We also demonstrate the superior performance of our algorithm through numerical simulations.
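The explore-then-commit idea can be sketched as below. This is a simplified illustration rather than the paper's algorithm: it assumes costs are known, explores every arm a fixed number of times, and commits using plain empirical means without the confidence bounds a careful analysis would require. All names and parameter values are hypothetical.

```python
import random

def explore_then_commit(means, costs, alpha, horizon, pulls_per_arm, rng):
    """Explore each arm equally, then commit to the cheapest arm that looks feasible."""
    k = len(means)
    if k * pulls_per_arm > horizon:
        raise ValueError("exploration budget exceeds the horizon")
    sums = [0.0] * k
    choices = []
    # Exploration phase: pull every arm the same number of times.
    for arm in range(k):
        for _ in range(pulls_per_arm):
            sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
            choices.append(arm)
    estimates = [s / pulls_per_arm for s in sums]
    # Commit phase: cheapest arm whose estimated quality clears the subsidized threshold.
    threshold = (1.0 - alpha) * max(estimates)
    feasible = [i for i in range(k) if estimates[i] >= threshold]
    committed = min(feasible, key=lambda i: costs[i])
    choices.extend([committed] * (horizon - len(choices)))
    return committed, choices

committed_arm, history = explore_then_commit(
    means=[0.9, 0.88, 0.3], costs=[1.0, 0.4, 0.1], alpha=0.2,
    horizon=2000, pulls_per_arm=200, rng=random.Random(0))
```

The length of the exploration phase controls the trade-off: too short and the feasible set is misjudged, too long and regret accumulates on expensive or low-quality arms.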

Why it matters:

Multi-armed bandits (MAB) are the most popular paradigm to balance the exploration-exploitation trade-off that is intrinsic to online decision making under uncertainty. They are applied to a wide range of application domains, including drug trials and online experiments, to ensure that the maximal number of samples is offered to the most promising candidate. Similar trade-offs are typical in recommendation systems, where the possible options to recommend and user preferences are constantly evolving. Facebook has also leveraged the MAB framework to improve various products, including identifying the ideal video bandwidth allocation for users and the best aggregator for sending authentication messages. Though one can model this in the traditional MAB framework by considering cost subtracted from the reward as the modified objective, such a modification is not always meaningful, particularly in settings where the reward and cost associated with an action represent different quantities (for example, quality and cost of an aggregator). In such problems, it is natural for the learning agent to optimize for both metrics, typically avoiding incurring exorbitant costs for a marginal increase in cumulative reward. To the best of our knowledge, this paper takes a first step in generalizing the multi-armed bandit framework to problems with two metrics and presents both fundamental theoretical performance limits and easy-to-implement algorithms to balance the multiobjective trade-offs.

Lastly, we perform extensive simulations to understand various regimes of the problem parameters and compare different algorithms. More specifically, we consider scenarios where naive generalizations of UCB and TS, which have been adapted in real-life implementations, perform well and settings where they perform poorly, which is of interest to practitioners.

Read the full paper:

Multi-armed bandits with cost subsidy

The post Cost-sensitive exploration in multi-armed bandits: Application to SMS routing appeared first on Facebook Research.

AI and 5G to Fuel Next Wave of IoT Services, Says GTC Panel of Telecom Experts

The rollout of 5G for edge AI services promises to fuel a magic carpet ride into the future for everything from autonomous vehicles, to supply chains and education.

That was a key takeaway from a panel of five 5G experts speaking at NVIDIA’s GPU Technology Conference this week.

With speed boosts up to 10x that of 4G, 5G will offer game-changing features to cellular networks, such as low latency, improved reliability and built-in security. It will also radically improve AI services, such as online gaming, those provided by AVs, and robots used for logistics. In addition, AI on 5G could help deliver services like online learning and micro banking to remote regions of underdeveloped parts of the world today.

Executives from Verizon, Wind River, Mavenir, Google and NVIDIA shared their views on the wide-ranging impact 5G will have on edge AI services. And if just half of their predictions appear within the next decade, the future promises exciting times.

Enhance Human Experience

The next generation of applications is going to enhance the human experience and create new opportunities, said Ganesh Harinath, VP and CTO of 5G MEC and AI platforms at Verizon. But he said the networking requirements for the future call for edge computing.

“The inferencing aspect of machine learning has to be moved closer and closer to where the signals are generated,” said Harinath.

Propel Digital World

Nermin Mohamed, head of telco solutions at embedded systems software provider Wind River, said that 5G, AI and edge computing are “the three magic words that will propel the digital connected world.”

She said that companies are looking at 5G as an accelerator for their revenue and that the rollout of 5G grew four times faster than 4G over the past 18 months.

Bridge Digital Divide

The availability of 5G will usher in digital services to remote places, bridging the digital divide, said Pardeep Kohli, president and CEO of telecom software company Mavenir.

With 5G “you can have low latency and a good experience where this type of connectivity can be used for having an education” where it might otherwise not be available, said Kohli.

Reshape Telecom, Edge

Open ecosystems are key to encouraging developers to build applications, said Shailesh Shukla, vice president and general manager for Networking and Telecom at Google Cloud.

“With the advent of 5G and AI, there is an opportunity now to reshape the broader telecom infrastructure and the edge industry by doing something very similar to what was done with Google and Android,” Shukla said.

‘Heady Mix Ahead’

A lot of the applications — autonomous vehicles, augmented and virtual reality — have been restrained by network limitations, said Ronnie Vasishta, NVIDIA senior vice president for Telecoms. NVIDIA has been investing in GPU and DPU platforms for accelerated compute to support the ecosystem of edge AI applications and telecom partners, he said.

“Sometimes we underestimate the impact that 5G will have on our lives,” he said. “We’re really in for a heady mix ahead of us with the combination of AI and 5G.”

The panel discussion, “Is AI at the Edge the Killer App for 5G?,” is available for replay. 

The post AI and 5G to Fuel Next Wave of IoT Services, Says GTC Panel of Telecom Experts appeared first on The Official NVIDIA Blog.

EV Technology Goes into Hyperdrive with Mercedes-Benz EQS

Mercedes-Benz is calling on its long heritage of luxury to accelerate electric vehicle technology with the new EQS sedan.

The premium automaker lifted the wraps off the long-awaited flagship EV during a digital event today. The focal point of the revolutionary vehicle is the MBUX Hyperscreen, a truly intuitive and personalized AI cockpit, powered by NVIDIA.

The EQS is the first Mercedes-Benz to feature the “one bow” design, resembling a high-speed bullet train to increase efficiency as well as provide a quiet, comfortable interior experience.

The cabin is further transformed by the MBUX Hyperscreen — a single, 55-inch surface extending from the cockpit to the passenger seat. It delivers both safety and convenience by displaying all necessary functions at once.

Like the MBUX system recently unveiled with the new Mercedes-Benz S-Class, this extended-screen system runs on the high-performance, energy-efficient NVIDIA DRIVE platform for instantaneous AI processing and sharp graphics.

“The EQS is high tech in a true luxury shell,” said Ola Källenius, chairman of the Mercedes-Benz Board of Management.

With NVIDIA’s high-performance, energy-efficient compute, Mercedes-Benz was able to consolidate the varied and distributed cockpit components into one AI platform — with three separate screens under one glass surface — to simplify the architecture while creating more space to add new features.

Intelligence in Many Flavors

The MBUX Hyperscreen makes it easy to focus on the road ahead, yet delivers beautiful graphics for when attention to driving isn’t necessary.

Leveraging a “zero layer” design concept, the display features 90 percent of functions drivers and passengers need right on the surface, reducing the driver’s reliance on buttons or voice commands. An augmented reality heads-up display provides clear, 3D, turn-by-turn navigation, keeping drivers focused.

The deep neural networks powering the system process datasets such as vehicle position, cabin temperature and time of day to prioritize certain features — like entertainment or points of interest recommendations — while always keeping navigation at the center of the display.

The EQS will be capable of level 3 automated driving with Mercedes-Benz DRIVE PILOT. For times when the driver’s attention doesn’t need to be on the road, the MBUX Hyperscreen provides crystal-clear graphics as well as an intelligent voice assistant for the utmost convenience.

The map feature allows drivers to view their route in 3D, down to the tiniest detail. It can also factor battery capacity, weather conditions and topography into route planning, suggesting charging points along the way if needed. Front-seat passengers also get a dedicated screen for entertainment and ride information that doesn’t interfere with the driver’s display. It also enables the front seat passenger to share content with others in the car.

“The MBUX Hyperscreen surprises with intelligence in many flavors,” said Sajjad Khan, executive vice president at Mercedes-Benz.

And with high-performance NVIDIA compute at MBUX Hyperscreen’s core, users can seamlessly experience these flavors, toggling between features without experiencing any lag or delay.

Ahead of the Curve

Equipped with the most powerful battery in the industry, providing an estimated 478 miles of range and 516 horsepower, the EQS was designed to lead its class in every metric.

The sedan’s sleek design optimizes aerodynamics for lightning-quick acceleration — it can bolt from 0 to 60 mph in 4 seconds — while maintaining battery efficiency and reducing cabin noise.

Taking cues from its internal combustion engine sibling, the Mercedes-Benz S-Class, the EQS boasts the largest interior of any electric sedan on the market. The vehicle can recognize the driver using either facial recognition or a fingerprint scanner, and adjust seating and climate settings to personal preferences. It also features customizable ambient lighting for whatever the mood.

The EQS is slated to arrive at U.S. dealerships this summer, ushering in a new generation of intelligent, electric luxury vehicles.

The post EV Technology Goes into Hyperdrive with Mercedes-Benz EQS appeared first on The Official NVIDIA Blog.


Counterfactual predictions under runtime confounding

Figure 1. Due to feasibility or ethical requirements, a prediction model may only access a subset of the confounding factors that affect both the decision and outcome. We propose a procedure for learning valid counterfactual predictions in this setting.

In machine learning, we often want to predict the likelihood of an outcome if we take a proposed decision or action. A healthcare setting, for instance, may require predicting whether a patient will be re-admitted to the hospital if the patient receives a particular treatment. In the child welfare setting, a social worker needs to assess the likelihood of adverse outcomes if the agency offers family services. In such settings, algorithmic predictions can be used to help decision-makers. Since the prediction target depends on a particular decision (e.g., the particular medical treatment, or offering family services), we refer to these predictions as counterfactual.

In general, for valid counterfactual inference, we need to measure all factors that affect both the decision and the outcome of interest. However, we may not want to use all such factors in our prediction model. Some factors such as race or gender may be too sensitive to use for prediction. Some factors may be too complex to use when model interpretability is desired, or some factors may be difficult to measure at prediction time.

Child welfare example: The child welfare screening task requires a social worker to decide which calls to the child welfare hotline should be investigated. In jurisdictions such as Allegheny County, the social worker makes their decision based on allegations in the call and historical information about individuals associated with the call, such as their prior child welfare interaction and criminal justice history. Both the call allegations and historical information may contain factors that affect both the decision and future child outcomes, but the child welfare agency may be unable to parse and preprocess call information in real-time for use in a prediction system. The social worker would still benefit from a prediction that summarizes the risk based on historical information. Therefore, the goal is a prediction based on a subset of the confounding factors.

Figure 2. Algorithmic predictions can help child welfare hotline screeners decide which cases to investigate. However, these predictions cannot access allegations in the call because of limitations in real-time processing.

Healthcare example: Healthcare providers may make decisions based on the patient’s history as well as lab results and diagnostic tests, but the patient’s health record may not be in a form that can be easily input to a prediction algorithm.

Figure 3. Predictions used to inform medical treatment decisions may not have access to all confounding factors.

How can we make counterfactual predictions using only a subset of confounding factors?

We propose a method for using offline data to build a prediction model that only requires access to the available subset of confounders at prediction time. Offline data is an important part of the solution because if we know nothing about the unmeasured confounders, then in general we cannot make progress. Fortunately, in our settings of interest, it is often possible to obtain an offline dataset that contains measurements of the full set of confounders as well as the outcome of interest and historical decision.

What is “runtime confounding?”

Runtime confounding occurs when all confounding factors are recorded in the training data, but the prediction model cannot use all confounding factors as features due to sensitivity, interpretability, or feasibility requirements. As examples,

  • It may not be possible to measure factors efficiently enough for use in the prediction model but it is possible to measure factors offline with sufficient processing time. Child welfare agencies typically do record call allegations for offline processing.
  • It may be undesirable to use some factors that are too sensitive or too complex for use in a prediction model.

Formally, let \(V \in \mathbb{R}^{d_v}\) denote the vector of factors available for prediction and \(Z \in \mathbb{R}^{d_z}\) denote the vector of confounding factors unavailable for prediction (but available in the training data). Given \(V\), our goal is to predict an outcome under a proposed decision; we wish to predict the potential outcome \(Y^{A=a}\) that we would observe under decision \(a\).

Prediction target: $$\nu(v) := \mathbb{E}[Y^{A=a} \mid V = v] .$$ In order to estimate this hypothetical counterfactual quantity, we need assumptions that enable us to identify this quantity with observable data. We require three assumptions that are standard in causal inference:

Assumption 1: The decision assigned to one unit does not affect the potential outcomes of another unit.
Assumption 2: All units have some non-zero probability of receiving decision \(a\) (the decision of interest for prediction).
Assumption 3: \(V, Z\) describe all factors that jointly affect the decision and outcome.

These assumptions enable us to identify our target estimand as $$\nu(v) = \mathbb{E}[ \mathbb{E}[Y \mid A = a, V = v, Z = z] \mid V = v].$$

This suggests that we can estimate an outcome model \(\mu(v,z) := \mathbb{E}[Y \mid A = a, V = v, Z = z]\) and then regress the outcome model estimates on \(V\).
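
For readers who want the intermediate steps, the identification result can be sketched as a short chain of equalities. This is the standard textbook argument rather than something quoted from the post, and the step labels are our reading of where each of the three assumptions enters.

```latex
\begin{aligned}
\nu(v) &= \mathbb{E}[Y^{A=a} \mid V = v] \\
&= \mathbb{E}\big[\, \mathbb{E}[Y^{A=a} \mid V = v, Z] \mid V = v \,\big]
  && \text{(iterated expectation)} \\
&= \mathbb{E}\big[\, \mathbb{E}[Y^{A=a} \mid A = a, V = v, Z] \mid V = v \,\big]
  && \text{(Assumptions 2 and 3)} \\
&= \mathbb{E}\big[\, \mathbb{E}[Y \mid A = a, V = v, Z] \mid V = v \,\big]
  && \text{(}Y = Y^{A=a}\text{ when } A = a\text{; relies on Assumption 1)}
\end{aligned}
```

The last line contains only observable quantities, which is why the inner expectation can be learned from the offline data.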

The simple plug-in (PL) approach:

  1. Estimate the outcome model \(\mu(v,z)\) by regressing \(Y \sim V, Z \mid A = a\). Use this model to construct pseudo-outcomes \(\hat{\mu}(V,Z)\) for each case in our training data.
  2. Regress \(\hat{\mu}(V,Z) \sim V\) to yield a prediction model that only requires knowledge of \(V\).
Figure 4. The Plug-in (PL) learning procedure. The full set of confounders (\(V, Z\)) is used to build an outcome model. The output of the outcome model and the available predictors \(V\) are used to build a prediction model.
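
The two PL stages can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes linear regressions for both stages, and the helper names (`fit_linear`, `plug_in`) and the synthetic data-generating process are hypothetical choices made for the example. It also fits a regression of \(Y\) on \(V\) among treated cases only, to show the bias that ignoring \(Z\) introduces.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a predict function."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def plug_in(V, Z, A, Y, a=1):
    """Two-stage plug-in (PL) learner; the returned predictor needs only V."""
    VZ = np.column_stack([V, Z])
    # Stage 1: outcome model mu(v, z), fit on cases that received decision a.
    mu_hat = fit_linear(VZ[A == a], Y[A == a])
    # Pseudo-outcomes for every training case, whether or not it received a.
    pseudo = mu_hat(VZ)
    # Stage 2: regress the pseudo-outcomes on V alone.
    return fit_linear(V, pseudo)

# Hypothetical synthetic data: Z drives both the decision and the outcome.
rng = np.random.default_rng(0)
n = 20000
V = rng.normal(size=(n, 1))
Z = rng.normal(size=(n, 1))
A = (rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * Z[:, 0]))).astype(int)
Y = V[:, 0] + 2.0 * Z[:, 0] + rng.normal(size=n)  # here nu(v) = v exactly

nu_pl = plug_in(V, Z, A, Y)
nu_naive = fit_linear(V[A == 1], Y[A == 1])  # ignores Z, so it is confounded

grid = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
err_pl = np.mean((nu_pl(grid) - grid[:, 0]) ** 2)
err_naive = np.mean((nu_naive(grid) - grid[:, 0]) ** 2)
print(f"PL error: {err_pl:.4f}, regression ignoring Z: {err_naive:.4f}")
```

Because the outcome model here is correctly specified, the PL predictor recovers the target closely, while the regression that conditions only on \(V\) and the decision inherits the confounding through \(Z\).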

How does the PL approach perform?

  • Yields valid counterfactual predictions under our three causal assumptions.
  • Not optimal: Consider the setting in which \(d_z \gg d_v\), for instance, in the child welfare setting where \(Z\) corresponds to the natural language in the hotline call. The PL approach requires us to efficiently estimate a more challenging high-dimensional target \(\mathbb{E}[Y \mid A = a, V = v, Z = z]\) when our target is a lower-dimensional quantity \(\nu(V)\).

We can better take advantage of the lower-dimensional structure of our target estimand using doubly-robust techniques, which are popular in causal inference because they give us two chances to get our estimation right.

Our proposed doubly-robust (DR) approach

In addition to estimating the outcome model like the PL approach, a doubly-robust approach also estimates a decision model \(\pi(v,z) := \mathbb{E}[\mathbb{I}\{A=a\} \mid V = v, Z = z]\), which is known as the propensity model in causal inference. This is particularly helpful in settings where it is easier to estimate the decision model than the outcome model.

We propose a doubly-robust (DR) approach that also involves two stages:

  1. Regress \(Y \sim V, Z \mid A = a\) to yield outcome model \(\hat{\mu}(v,z)\). Regress \(\mathbb{I}\{A=a\} \sim V, Z\) to yield decision model \(\hat{\pi}(v,z)\).
  2. Regress $$\frac{\mathbb{I}\{A=a\}}{\hat{\pi}(V,Z)}(Y - \hat{\mu}(V,Z)) + \hat{\mu}(V,Z) \sim V.$$
Figure 5. Our proposed doubly-robust (DR) learning procedure. The full set of confounders (\(V, Z\)) is used to build an outcome model and a decision model. The output of the outcome and decision models and the available predictors \(V\) are used to build a prediction model.
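
The DR procedure can be sketched the same way. The code below is an illustrative assumption-laden version, not the paper's code: it uses a linear outcome model, a logistic decision model fit by Newton's method, and clips the estimated propensities for numerical stability (the clipping threshold is our choice). For brevity it fits all models on the same sample, so it omits sample-splitting. All helper names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a predict function."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def fit_logistic(X, y, iters=25):
    """Logistic regression via Newton's method; returns a probability function."""
    X1 = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X1 @ w))
        grad = X1.T @ (p - y)
        hess = (X1 * (p * (1.0 - p))[:, None]).T @ X1 + 1e-8 * np.eye(len(w))
        w = w - np.linalg.solve(hess, grad)
    return lambda Xn: 1.0 / (1.0 + np.exp(
        -np.column_stack([np.ones(len(Xn)), Xn]) @ w))

def doubly_robust(V, Z, A, Y, a=1, clip=0.01):
    """Two-stage DR learner; the returned predictor needs only V."""
    ind = (A == a).astype(float)
    VZ = np.column_stack([V, Z])
    # Stage 1: outcome model on cases with decision a, decision model on all.
    mu = fit_linear(VZ[A == a], Y[A == a])(VZ)
    pi = np.clip(fit_logistic(VZ, ind)(VZ), clip, 1.0)  # clipping: our choice
    # Stage 2: regress the bias-corrected pseudo-outcome on V alone.
    pseudo = ind / pi * (Y - mu) + mu
    return fit_linear(V, pseudo)

# Hypothetical synthetic data: Z drives both the decision and the outcome.
rng = np.random.default_rng(0)
n = 20000
V = rng.normal(size=(n, 1))
Z = rng.normal(size=(n, 1))
A = (rng.random(n) < 1.0 / (1.0 + np.exp(-Z[:, 0]))).astype(int)
Y = V[:, 0] + 2.0 * Z[:, 0] + rng.normal(size=n)  # here nu(v) = v exactly

nu_dr = doubly_robust(V, Z, A, Y)
grid = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
err_dr = np.mean((nu_dr(grid) - grid[:, 0]) ** 2)
print(f"DR error on the grid: {err_dr:.4f}")
```

The only difference from the plug-in sketch is the pseudo-outcome: the inverse-propensity-weighted residual corrects errors in the outcome model on the cases where the decision of interest was actually taken.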

When does the DR approach perform well?

  • When we can build either a very good outcome model or a very good decision model
  • If both the decision model and outcome model are somewhat good

The DR approach can achieve oracle optimality; that is, it achieves the same regression error (up to constants) as an oracle with access to the true potential outcomes \(Y^a\).

We can see this by bounding the error of our method \(\hat{\nu}\) with the sum of the oracle error and a product of error terms on the outcome and decision models:

$$\begin{aligned}
\mathbb{E}[(\hat{\nu}(v) - \nu(v))^2] \lesssim\; & \mathbb{E}[(\tilde{\nu}(v) - \nu(v))^2] \; + \\
& \mathbb{E}[(\hat{\pi}(V,Z) - \pi(V,Z))^2 \mid V = v] \, \mathbb{E}[(\hat{\mu}(V,Z) - \mu(V,Z))^2 \mid V = v].
\end{aligned}$$

where \(\tilde{\nu}(v)\) denotes the function we would get in our second-stage estimation if we had oracle access to \(Y^a\).

So as long as we can estimate the outcome and decision models such that their product of errors is smaller than the oracle error, then the DR approach is oracle-efficient. This result holds for any regression method, assuming that we have used sample-splitting to learn \(\hat{\nu}\), \(\hat{\mu}\), and \(\hat{\pi}\).

While the DR approach has this desirable theoretical guarantee, in practice it is possible that the PL approach may perform better depending on the dimensionality of the problem.

How do I know which method I should use?

To determine which method will work best in a given setting, we provide an evaluation procedure that can be applied to any prediction method to estimate its mean-squared error. Under our three causal assumptions, the prediction error of a model \(\hat{\nu}\) is identified as

$$\mathbb{E}[(Y^a - \hat{\nu}(V))^2] = \mathbb{E}[\mathbb{E}[(Y - \hat{\nu}(V))^2 \mid V, Z, A = a]].$$

Defining the error regression \(\eta(v,z) = \mathbb{E}[(Y - \hat{\nu}(V))^2 \mid V = v, Z = z, A = a]\), we propose the following doubly-robust estimator for the MSE on a validation sample of \(n\) cases:

$$\frac{1}{n} \sum_{i=1}^n \left[ \frac{\mathbb{I}\{A_i = a\}}{\hat{\pi}(V_i, Z_i)} \left( (Y_i - \hat{\nu}(V_i))^2 - \hat{\eta}(V_i, Z_i) \right) + \hat{\eta}(V_i, Z_i) \right] .$$

Under mild assumptions, this estimator is \(\sqrt{n}\)-consistent, enabling us to get error estimates with confidence intervals.
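
As a rough sketch of how the MSE estimator above might be computed on a validation sample, the function below averages the per-case terms and attaches a normal-approximation 95% interval. The function name `dr_mse`, the interval construction from the sample standard deviation, and the synthetic check (which plugs in the true decision model and error regression instead of estimated ones) are all assumptions made for illustration.

```python
import numpy as np

def dr_mse(Y, A, nu_pred, pi_hat, eta_hat, a=1):
    """Doubly-robust MSE estimate with a normal-approximation 95% interval."""
    ind = (A == a).astype(float)
    sq_err = (Y - nu_pred) ** 2
    # Per-case term: weighted correction of the error regression plus eta_hat.
    terms = ind / pi_hat * (sq_err - eta_hat) + eta_hat
    est = terms.mean()
    half = 1.96 * terms.std(ddof=1) / np.sqrt(len(terms))
    return est, (est - half, est + half)

# Hypothetical validation sample with known nuisance functions.
rng = np.random.default_rng(0)
n = 50000
V = rng.normal(size=n)
Z = rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-Z))          # true decision model
A = (rng.random(n) < pi).astype(int)
Y = V + Z + rng.normal(size=n)         # outcome under decision a
nu_pred = V                            # the predictor being evaluated
eta = Z ** 2 + 1.0                     # true error regression for this setup

mse, ci = dr_mse(Y, A, nu_pred, pi, eta)
# With a deliberately wrong error regression (all zeros) but the correct
# decision model, the estimate should still land near the same value.
mse_wrong_eta, _ = dr_mse(Y, A, nu_pred, pi, np.zeros(n))
print(f"DR MSE estimate: {mse:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

In this setup the true counterfactual MSE of the predictor is 2 (the variance of \(Z\) plus the noise variance), so both estimates should fall close to that value.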

DR achieves lowest MSE in synthetic experiments

We perform simulations on synthetic data to show how the level of confounding and dimensionalities of \(V\) and \(Z\) determine which method performs best. Synthetic experiments enable us to evaluate the methods on the ground-truth counterfactual outcomes. We compare the PL and DR approaches to a biased single-stage approach that estimates \(\mathbb{E}[Y \mid V, A = a]\), which we refer to as the treatment-conditional regression (TCR) approach.

MSE of the plug-in (PL), doubly-robust (DR), and treatment-conditional regression (TCR) approaches to counterfactual prediction under runtime confounding as we vary the level of confounding (\(k_z\)) in the left-hand panel and as we vary \(d_v\), the dimensionality of our predictors \(V\), in the right-hand panel.

In the left-hand panel above, we compare the methods as we vary the amount of confounding. When there is no confounding (\(k_z = 0\)), the TCR approach performs best as expected. Under no confounding, the TCR approach is no longer biased and efficiently estimates the target of interest in one stage. However, as we increase the level of confounding, the TCR performance degrades faster than the PL and DR methods. The DR method performs best under any non-zero level of confounding.

The right-hand panel compares the methods as we vary the dimensionality of our predictors. We hold the total dimensionality of \((V, Z)\) fixed at \(500\) (so \(d_z = 500 - d_v\)). The DR approach performs best across the board, and the TCR approach performs well when the dimensionality is low because TCR avoids the high-dimensional second stage regression. However, this advantage disappears as \(d_v\) increases. The gap between the PL and DR methods is largest for low \(d_v\) because the DR method is able to take advantage of the lower dimensional target. At high \(d_v\) the PL error approaches the DR error.

DR is comparable to PL in a real-world task

We compare the methods on a real-world child welfare screening task where the goal is to predict the likelihood that a case will require services under the decision “screened in for investigation” using historical information as predictors and controlling for confounders that are sensitive (race) and hard to process (the allegations in the call). Our dataset consists of over 30,000 calls to the child welfare hotline in Allegheny County, PA. We evaluate the methods using our proposed real-world evaluation procedure since we do not have access to the ground-truth outcomes for cases that were not screened in for investigation.

Child welfare screening task: estimated MSE. The PL and DR methods achieve lower MSE than the TCR approach. Parentheses denote 95% confidence intervals.

We find that the DR and PL approaches perform comparably on this task, both outperforming the TCR method.


  • Runtime confounding arises when it is undesirable or impermissible to use some confounding factors in the prediction model.
  • We propose a generic procedure to build counterfactual predictions when the factors are available in offline training data.
  • In theory, our approach is provably efficient in the oracle sense.
  • In practice, we recommend building the DR, PL, and TCR approaches and using our proposed evaluation scheme to choose the best performing model.
  • Our full paper is available in the Proceedings of NeurIPS 2020.


Knight Rider Rides a GAN: Bringing KITT to Life with AI, NVIDIA Omniverse

Fasten your seatbelts. NVIDIA Research is revving up a new deep learning engine that creates 3D object models from standard 2D images — and can bring iconic cars like the Knight Rider’s AI-powered KITT to life — in NVIDIA Omniverse.

Developed by the NVIDIA AI Research Lab in Toronto, the GANverse3D application inflates flat images into realistic 3D models that can be visualized and controlled in virtual environments. This capability could help architects, creators, game developers and designers easily add new objects to their mockups without needing expertise in 3D modeling, or a large budget to spend on renderings.

A single photo of a car, for example, could be turned into a 3D model that can drive around a virtual scene, complete with realistic headlights, tail lights and blinkers.

To generate a dataset for training, the researchers harnessed a generative adversarial network, or GAN, to synthesize images depicting the same object from multiple viewpoints — like a photographer who walks around a parked vehicle, taking shots from different angles. These multi-view images were plugged into a rendering framework for inverse graphics, the process of inferring 3D mesh models from 2D images.

Once trained on multi-view images, GANverse3D needs only a single 2D image to predict a 3D mesh model. This model can be used with a 3D neural renderer that gives developers control to customize objects and swap out backgrounds.

When imported as an extension in the NVIDIA Omniverse platform and run on NVIDIA RTX GPUs, GANverse3D can be used to recreate any 2D image into 3D — like the beloved crime-fighting car KITT, from the popular 1980s Knight Rider TV show.

Previous models for inverse graphics have relied on 3D shapes as training data.

Instead, with no aid from 3D assets, “We turned a GAN model into a very efficient data generator so we can create 3D objects from any 2D image on the web,” said Wenzheng Chen, research scientist at NVIDIA and lead author on the project.

“Because we trained on real images instead of the typical pipeline, which relies on synthetic data, the AI model generalizes better to real-world applications,” said NVIDIA researcher Jun Gao, an author on the project.

The research behind GANverse3D will be presented at two upcoming conferences: the International Conference on Learning Representations in May, and the Conference on Computer Vision and Pattern Recognition, in June.

From Flat Tire to Racing KITT 

Creators in gaming, architecture and design rely on virtual environments like the NVIDIA Omniverse simulation and collaboration platform to test out new ideas and visualize prototypes before creating their final products. With Omniverse Connectors, developers can use their preferred 3D applications in Omniverse to simulate complex virtual worlds with real-time ray tracing.

But not every creator has the time and resources to create 3D models of every object they sketch. The cost of capturing the number of multi-view images necessary to render a showroom’s worth of cars, or a street’s worth of buildings, can be prohibitive.

That’s where a trained GANverse3D application can be used to convert standard images of a car, a building or even a horse into a 3D figure that can be customized and animated in Omniverse.

To recreate KITT, the researchers simply fed the trained model an image of the car, letting GANverse3D predict a corresponding 3D textured mesh, as well as different parts of the vehicle such as wheels and headlights. They then used NVIDIA Omniverse Kit and NVIDIA PhysX tools to convert the predicted texture into high-quality materials that give KITT a more realistic look and feel, and placed it in a dynamic driving sequence.

“Omniverse allows researchers to bring exciting, cutting-edge research directly to creators and end users,” said Jean-Francois Lafleche, deep learning engineer at NVIDIA. “Offering GANverse3D as an extension in Omniverse will help artists create richer virtual worlds for game development, city planning or even training new machine learning models.”

GANs Power a Dimensional Shift

Because real-world datasets that capture the same object from different angles are rare, most AI tools that convert images from 2D to 3D are trained using synthetic 3D datasets like ShapeNet.

To obtain multi-view images from real-world data — like images of cars available publicly on the web — the NVIDIA researchers instead turned to a GAN model, manipulating its neural network layers to turn it into a data generator.

The team found that opening the first four layers of the neural network and freezing the remaining 12 caused the GAN to render images of the same object from different viewpoints.

Keeping the first four layers frozen and the other 12 layers variable caused the neural network to generate different images from the same viewpoint. By manually assigning standard viewpoints, with vehicles pictured at a specific elevation and camera distance, the researchers could rapidly generate a multi-view dataset from individual 2D images.

The final model, trained on 55,000 car images generated by the GAN, outperformed an inverse graphics network trained on the popular Pascal3D dataset.

Read the full ICLR paper, authored by Wenzheng Chen, fellow NVIDIA researchers Jun Gao and Huan Ling, Sanja Fidler, director of NVIDIA’s Toronto research lab, University of Waterloo student Yuxuan Zhang, Stanford student Yinan Zhang and MIT professor Antonio Torralba. Additional collaborators on the CVPR paper include Jean-Francois Lafleche, NVIDIA researcher Kangxue Yin and Adela Barriuso.

The NVIDIA Research team consists of more than 200 scientists around the globe, focusing on areas such as AI, computer vision, self-driving cars, robotics and graphics. Learn more about the company’s latest research and industry breakthroughs in NVIDIA CEO Jensen Huang’s keynote address at this week’s GPU Technology Conference.

GTC registration is free, and open through April 23. Attendees will have access to on-demand content through May 11.

Knight Rider content courtesy of Universal Studios Licensing LLC. 

The post Knight Rider Rides a GAN: Bringing KITT to Life with AI, NVIDIA Omniverse appeared first on The Official NVIDIA Blog.


An overview of the ML models introduced in TorchVision v0.9

TorchVision v0.9 has been released and it is packed with numerous new Machine Learning models and features, speed improvements and bug fixes. In this blog post, we provide a quick overview of the newly introduced ML models and discuss their key features and characteristics.

Classification


  • MobileNetV3 Large & Small: These two classification models are optimized for mobile use-cases and are used as backbones on other Computer Vision tasks. The implementation of the new MobileNetV3 architecture supports the Large & Small variants and the depth multiplier parameter as described in the original paper. We offer pre-trained weights on ImageNet for both Large and Small networks with depth multiplier 1.0 and resolution 224×224. Our previous training recipes have been updated and can be used to easily train the models from scratch (shoutout to Ross Wightman for inspiring some of our training configuration). The Large variant offers competitive accuracy compared to ResNet50 while being over 6x faster on CPU, meaning that it is a good candidate for applications where speed is important. For applications where speed is critical, one can sacrifice further accuracy for speed and use the Small variant, which is 15x faster than ResNet50.

  • Quantized MobileNetV3 Large: The quantized version of MobileNetV3 Large reduces the number of parameters by 45% and it is roughly 2.5x faster than the non-quantized version while remaining competitive in terms of accuracy. It was fitted on ImageNet using Quantization Aware Training by iterating on the non-quantized version and it can be trained from scratch using the existing reference scripts.


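import torchvision
# img is assumed to be a float tensor batch of shape (N, 3, H, W), normalized with ImageNet statistics
model = torchvision.models.mobilenet_v3_large(pretrained=True)
# model = torchvision.models.mobilenet_v3_small(pretrained=True)
# model = torchvision.models.quantization.mobilenet_v3_large(pretrained=True)
model.eval()  # switch to inference mode
predictions = model(img)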

Object Detection

  • Faster R-CNN MobileNetV3-Large FPN: Combining the MobileNetV3 Large backbone with a Faster R-CNN detector and a Feature Pyramid Network leads to a highly accurate and fast object detector. The pre-trained weights are fitted on COCO 2017 using the provided reference scripts and the model is 5x faster on CPU than the equivalent ResNet50 detector while remaining competitive in terms of accuracy.
  • Faster R-CNN MobileNetV3-Large 320 FPN: This is an iteration of the previous model that uses reduced resolution (min_size=320 pixels) and sacrifices accuracy for speed. It is 25x faster on CPU than the equivalent ResNet50 detector and thus it is good for real mobile use-cases.


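import torchvision
model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
# model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=True)
model.eval()  # in inference mode the model returns one dict of boxes, labels and scores per image
predictions = model(img)  # img is assumed to be a list of 3xHxW float tensors with values in [0, 1]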

Semantic Segmentation

  • DeepLabV3 with Dilated MobileNetV3 Large Backbone: A dilated version of the MobileNetV3 Large backbone combined with DeepLabV3 helps us build a highly accurate and fast semantic segmentation model. The pre-trained weights are fitted on COCO 2017 using our standard training recipes. The final model has the same accuracy as the FCN ResNet50 but it is 8.5x faster on CPU, making it an excellent replacement for the majority of applications.
  • Lite R-ASPP with Dilated MobileNetV3 Large Backbone: We introduce the implementation of a new segmentation head called Lite R-ASPP and combine it with the dilated MobileNetV3 Large backbone to build a very fast segmentation model. The new model sacrifices some accuracy to achieve a 15x speed improvement compared to the previously most lightweight segmentation model, which was the FCN ResNet50.


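import torchvision
model = torchvision.models.segmentation.deeplabv3_mobilenet_v3_large(pretrained=True)
# model = torchvision.models.segmentation.lraspp_mobilenet_v3_large(pretrained=True)
model.eval()  # switch to inference mode
predictions = model(img)  # img: normalized (N, 3, H, W) batch; predictions["out"] holds per-pixel class scores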

In the near future we plan to publish an article that covers the details of how the above models were trained and discusses their tradeoffs and design choices. Until then we encourage you to try out the new models and provide your feedback.
