Inside Uber ATG’s Data Mining Operation: Identifying Real Road Scenarios at Scale for Machine Learning

How did the pedestrian cross the road?

Contrary to popular belief, sometimes the answer isn’t as simple as “to get to the other side.” To bring safe, reliable self-driving vehicles (SDVs) to the streets at Uber Advanced Technologies Group (ATG)…

Giving soft robots feeling

One of the hottest topics in robotics is the field of soft robots, which uses squishy and flexible materials rather than traditional rigid ones. But soft robots have been limited by their lack of good sensing. A good robotic gripper needs to feel what it is touching (tactile sensing), and it needs to sense the positions of its fingers (proprioception). Such sensing has been missing from most soft robots.

In a new pair of papers, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) came up with new tools to let robots better perceive what they’re interacting with: the ability to see and classify items, and a softer, delicate touch. 

“We wish to enable seeing the world by feeling the world. Soft robot hands have sensorized skins that allow them to pick up a range of objects, from delicate, such as potato chips, to heavy, such as milk bottles,” says CSAIL Director Daniela Rus, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science and the deputy dean of research for the MIT Stephen A. Schwarzman College of Computing. 

One paper builds on last year’s research from MIT and Harvard University, where a team developed a soft and strong robotic gripper in the form of a cone-shaped origami structure. It collapses in on objects much like a Venus flytrap, picking up items as much as 100 times its weight.

To get that newfound versatility and adaptability even closer to that of a human hand, a new team came up with a sensible addition: tactile sensors, made from latex “bladders” (balloons) connected to pressure transducers. The new sensors let the gripper not only pick up objects as delicate as potato chips, but also classify them — letting the robot better understand what it’s picking up, while still exhibiting that light touch.

When classifying objects, the sensors correctly identified 10 objects with over 90 percent accuracy, even when an object slipped out of its grip.

“Unlike many other soft tactile sensors, ours can be rapidly fabricated, retrofitted into grippers, and show sensitivity and reliability,” says MIT postdoc Josie Hughes, the lead author on a new paper about the sensors. “We hope they provide a new method of soft sensing that can be applied to a wide range of different applications in manufacturing settings, like packing and lifting.” 

In a second paper, a group of researchers created a soft robotic finger called “GelFlex” that uses embedded cameras and deep learning to enable high-resolution tactile sensing and “proprioception” (awareness of positions and movements of the body). 

The gripper, which looks much like a two-finger cup gripper you might see at a soda station, uses a tendon-driven mechanism to actuate the fingers. When tested on metal objects of various shapes, the system had over 96 percent recognition accuracy. 

“Our soft finger can provide high accuracy on proprioception and accurately predict grasped objects, and also withstand considerable impact without harming the interacted environment and itself,” says Yu She, lead author on a new paper on GelFlex. “By constraining soft fingers with a flexible exoskeleton, and performing high-resolution sensing with embedded cameras, we open up a large range of capabilities for soft manipulators.” 

Magic ball senses 

The magic ball gripper is made from a soft origami structure, encased by a soft balloon. When a vacuum is applied to the balloon, the origami structure closes around the object, and the gripper conforms to the object’s shape.

While this motion lets the gripper grasp a much wider range of objects than ever before, such as soup cans, hammers, wine glasses, drones, and even a single broccoli floret, the greater intricacies of delicacy and understanding were still out of reach — until they added the sensors.  

When the sensors experience force or strain, the pressure inside the bladders changes, and the team can measure that pressure change to tell when, and how firmly, the gripper is touching something.
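
To make the principle concrete, here is a minimal sketch — not the authors’ code — of how bladder-and-transducer sensing can be used: a pressure rise above a calibrated baseline signals contact, and the vector of pressure changes across the bladders acts as a signature that a simple nearest-neighbor classifier can match against known objects. The read function, thresholds, and recorded signatures below are illustrative stand-ins.

```python
# Illustrative sketch, not the authors' implementation: contact detection and
# object classification from balloon-bladder pressure readings. The simulated
# read_pressures() stands in for real transducer I/O; values are made up.
import random

NUM_BLADDERS = 4
CONTACT_THRESHOLD = 0.5  # kPa rise over baseline that we treat as contact

def read_pressures():
    # Stand-in for hardware I/O: ambient pressure plus sensor noise (kPa).
    return [101.3 + random.gauss(0.0, 0.05) for _ in range(NUM_BLADDERS)]

def pressure_signature(baseline, reading):
    # The per-bladder pressure change is the "feel" of the grasped object.
    return [r - b for r, b in zip(reading, baseline)]

def classify(signature, known_objects):
    # Nearest-neighbor match against previously recorded signatures.
    def dist(sig):
        return sum((a - b) ** 2 for a, b in zip(signature, sig))
    return min(known_objects, key=lambda name: dist(known_objects[name]))

baseline = read_pressures()  # calibrate with the gripper open, no object
sig = pressure_signature(baseline, read_pressures())
if max(sig) > CONTACT_THRESHOLD:
    known = {"chip": [0.6, 0.5, 0.6, 0.5], "bottle": [2.0, 1.8, 2.1, 1.9]}
    print("grasped:", classify(sig, known))
```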

In addition to the latex sensors, the team also developed an algorithm that uses their feedback to give the gripper a human-like duality of being both strong and precise — 80 percent of the tested objects were successfully grasped without damage.

The team tested the gripper-sensors on a variety of household items, ranging from heavy bottles to small, delicate objects, including cans, apples, a toothbrush, a water bottle, and a bag of cookies. 

Going forward, the team hopes to make the methodology scalable, using computational design and reconstruction methods to improve the resolution and coverage using this new sensor technology. Eventually, they imagine using the new sensors to create a fluidic sensing skin that shows scalability and sensitivity. 

Hughes co-wrote the new paper with Rus; they will present it virtually at the 2020 International Conference on Robotics and Automation.

GelFlex

In the second paper, a CSAIL team looked at giving a soft robotic gripper more nuanced, human-like senses. Soft fingers allow a wide range of deformations, but to be used in a controlled way there must be rich tactile and proprioceptive sensing. The team used embedded cameras with wide-angle “fisheye” lenses that capture the finger’s deformations in great detail.

To create GelFlex, the team used silicone material to fabricate the soft and transparent finger, and put one camera near the fingertip and the other in the middle of the finger. Then, they painted reflective ink on the front and side surfaces of the finger, and added LED lights on the back. This allows the internal fisheye cameras to observe the state of the front and side surfaces of the finger.

The team trained neural networks to extract key information from the internal cameras for feedback. One neural net was trained to predict the bending angle of GelFlex, and the other was trained to estimate the shape and size of the objects being grabbed. The gripper could then pick up a variety of items such as a Rubik’s cube, a DVD case, or a block of aluminum. 
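
As a rough illustration of what such a proprioception network might look like, here is a minimal sketch — not the released GelFlex code. The image size and layer widths are assumptions; the object-recognition network would look similar, with a classification head instead of a single regression output.

```python
# Illustrative sketch, not the GelFlex release: a small convolutional network
# that regresses the finger's bending angle from one internal camera frame,
# in the spirit of the proprioception network described above.
import torch
import torch.nn as nn

class BendAngleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)  # one output: the bending angle

    def forward(self, frame):
        return self.head(self.features(frame).flatten(1))

net = BendAngleNet()
frame = torch.rand(1, 3, 96, 96)  # one resized internal-camera image
print(net(frame))                 # predicted angle for this (random) frame
```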

During testing, the average positional error while gripping was less than 0.77 millimeter, which is better than that of a human finger. In a second set of tests, the gripper was challenged with grasping and recognizing cylinders and boxes of various sizes. Out of 80 trials, only three were classified incorrectly. 

In the future, the team hopes to improve the proprioception and tactile sensing algorithms, and utilize vision-based sensors to estimate more complex finger configurations, such as twisting or lateral bending, which are challenging for common sensors, but should be attainable with embedded cameras.

Yu She co-wrote the GelFlex paper with MIT graduate student Sandra Q. Liu, Peiyu Yu of Tsinghua University, and MIT Professor Edward Adelson. They will present the paper virtually at the 2020 International Conference on Robotics and Automation.

SAIL and Stanford Robotics at ICRA 2020

The International Conference on Robotics and Automation (ICRA) 2020 is being hosted virtually from May 31 to June 4.
We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to papers, videos and blogs below. Feel free to reach out to the contact authors directly to learn more about the work that’s happening at Stanford!

List of Accepted Papers

Design of a Roller-Based Dexterous Hand for Object Grasping and Within-Hand Manipulation


Authors: Shenli Yuan, Austin D. Epps, Jerome B. Nowak, J. Kenneth Salisbury

Contact: shenliy@stanford.edu

Award nominations: Best Paper, Best Student Paper, Best Paper Award in Robot Manipulation, Best Paper in Mechanisms and Design

Links: Paper | Video

Keywords: dexterous manipulation, grasping, grippers and other end-effectors


Distributed Multi-Target Tracking for Autonomous Vehicle Fleets


Authors: Ola Shorinwa, Javier Yu, Trevor Halsted, Alex Koufos, and Mac Schwager

Contact: shorinwa@stanford.edu

Award nominations: Best Paper

Links: Paper | Video

Keywords: multi-target tracking, distributed estimation, multi-robot systems


Efficient Large-Scale Multi-Drone Delivery Using Transit Networks


Authors: Shushman Choudhury, Kiril Solovey, Mykel J. Kochenderfer, Marco Pavone

Contact: shushman@stanford.edu

Award nominations: Best Multi-Robot Systems Paper

Links: Paper | Video

Keywords: multi-robot, optimization, task allocation, route planning


Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly


Authors: Kevin Zakka, Andy Zeng, Johnny Lee, Shuran Song

Contact: zakka@cs.stanford.edu

Award nominations: Best Automation Paper

Links: Paper | Blog Post | Video

Keywords: perception for grasping, assembly, robotics


Human Interface for Teleoperated Object Manipulation with a Soft Growing Robot


Authors: Fabio Stroppa, Ming Luo, Kyle Yoshida, Margaret M. Coad, Laura H. Blumenschein, and Allison M. Okamura

Contact: fstroppa@stanford.edu

Award nominations: Best Human-Robot Interaction Paper

Links: Paper | Video

Keywords: soft robot, growing robot, manipulation, interface, teleoperation


6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints

Authors: Chen Wang, Roberto Martín-Martín, Danfei Xu, Jun Lv, Cewu Lu, Li Fei-Fei, Silvio Savarese, Yuke Zhu

Contact: chenwj@stanford.edu

Links: Paper | Blog Post | Video

Keywords: category-level 6d object pose tracking, unsupervised 3d keypoints


A Stretchable Capacitive Sensory Skin for Exploring Cluttered Environments

Authors: Alexander Gruebele, Jean-Philippe Roberge, Andrew Zerbe, Wilson Ruotolo, Tae Myung Huh, Mark R. Cutkosky

Contact: agruebe2@stanford.edu

Links: Paper

Keywords: robot sensing systems, skin, wires, capacitance, grasping


Accurate Vision-based Manipulation through Contact Reasoning

Authors: Alina Kloss, Maria Bauza, Jiajun Wu, Joshua B. Tenenbaum, Alberto Rodriguez, Jeannette Bohg

Contact: alina.kloss@tue.mpg.de

Links: Paper | Video

Keywords: manipulation planning, contact modeling, perception for grasping and manipulation


Assistive Gym: A Physics Simulation Framework for Assistive Robotics

Authors: Zackory Erickson, Vamsee Gangaram, Ariel Kapusta, C. Karen Liu, and Charles C. Kemp

Contact: karenliu@cs.stanford.edu

Links: Paper

Keywords: assistive robotics; physics simulation; reinforcement learning; physical human robot interaction


Controlling Assistive Robots with Learned Latent Actions

Authors: Dylan P. Losey, Krishnan Srinivasan, Ajay Mandlekar, Animesh Garg, Dorsa Sadigh

Contact: dlosey@stanford.edu

Links: Paper | Blog Post | Video

Keywords: human-robot interaction, assistive control


Distal Hyperextension is Handy: High Range of Motion in Cluttered Environments

Authors: Wilson Ruotolo, Rachel Thomasson, Joel Herrera, Alex Gruebele, Mark R. Cutkosky

Contact: wruotolo@stanford.edu

Links: Paper | Video

Keywords: dexterous manipulation, grippers and other end-effectors, multifingered hands


Dynamically Reconfigurable Discrete Distributed Stiffness for Inflated Beam Robots

Authors: Brian H. Do, Valory Banashek, Allison M. Okamura

Contact: brianhdo@stanford.edu

Links: Paper | Video

Keywords: soft robot materials and design; mechanism design; compliant joint/mechanism


Dynamically Reconfigurable Tactile Sensor for Robotic Manipulation

Authors: Tae Myung Huh, Hojung Choi, Simone Willcox, Stephanie Moon, Mark R. Cutkosky

Contact: taemyung@stanford.edu

Links: Paper | Video

Keywords: robot sensing systems, electrodes, force, force measurement, capacitance


Enhancing Game-Theoretic Autonomous Car Racing Using Control Barrier Functions

Authors: Gennaro Notomista, Mingyu Wang, Mac Schwager, Magnus Egerstedt

Contact: mingyuw@stanford.edu

Links: Paper

Keywords: autonomous driving


Evaluation of Non-Collocated Force Feedback Driven by Signal-Independent Noise

Authors: Zonghe Chua, Allison Okamura, Darrel Deo

Contact: chuazh@stanford.edu

Links: Paper | Video

Keywords: haptics and haptic interfaces; prosthetics and exoskeletons; brain-machine interface


From Planes to Corners: Multi-Purpose Primitive Detection in Unorganized 3D Point Clouds

Authors: Christiane Sommer, Yumin Sun, Leonidas Guibas, Daniel Cremers, Tolga Birdal

Contact: tbirdal@stanford.edu

Links: Paper | Code | Video

Keywords: plane detection, corner detection, orthogonal, 3d geometry, computer vision, point pair, slam


Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning

Authors: Michelle A. Lee, Carlos Florensa, Jonathan Tremblay, Nathan Ratliff, Animesh Garg, Fabio Ramos, Dieter Fox

Contact: mishlee@stanford.edu

Links: Paper | Video

Keywords: deep learning in robotics and automation, perception for grasping and manipulation, learning and adaptive systems


IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data

Authors: Ajay Mandlekar, Fabio Ramos, Byron Boots, Silvio Savarese, Li Fei-Fei, Animesh Garg, Dieter Fox

Contact: amandlek@stanford.edu

Links: Paper | Video

Keywords: imitation learning, reinforcement learning, robotics


Interactive Gibson Benchmark: A Benchmark for Interactive Navigation in Cluttered Environments

Authors: Fei Xia, William B. Shen, Chengshu Li, Priya Kasimbeg, Micael Tchapmi, Alexander Toshev, Roberto Martín-Martín, Silvio Savarese

Contact: feixia@stanford.edu

Links: Paper | Video

Keywords: visual navigation, deep learning in robotics, mobile manipulation


KETO: Learning Keypoint Representations for Tool Manipulation

Authors: Zengyi Qin, Kuan Fang, Yuke Zhu, Li Fei-Fei, Silvio Savarese

Contact: qinzy@cs.stanford.edu

Links: Paper | Blog Post | Video

Keywords: manipulation, representation, keypoint, interaction, self-supervised learning


Learning Hierarchical Control for Robust In-Hand Manipulation

Authors: Tingguang Li, Krishnan Srinivasan, Max Qing-Hu Meng, Wenzhen Yuan, Jeannette Bohg

Contact: tgli@link.cuhk.edu.hk

Links: Paper | Blog Post | Video

Keywords: in-hand manipulation, robotics, reinforcement learning, hierarchical


Learning Task-Oriented Grasping from Human Activity Datasets

Authors: Mia Kokic, Danica Kragic, Jeannette Bohg

Contact: mkokic@kth.se

Links: Paper

Keywords: perception, grasping


Learning a Control Policy for Fall Prevention on an Assistive Walking Device

Authors: Visak CV Kumar, Sehoon Ha, Gregory Sawicki, C. Karen Liu

Contact: karenliu@cs.stanford.edu

Links: Paper

Keywords: assistive robotics; human motion modeling; physical human robot interaction; reinforcement learning


Learning an Action-Conditional Model for Haptic Texture Generation

Authors: Negin Heravi, Wenzhen Yuan, Allison M. Okamura, Jeannette Bohg

Contact: nheravi@stanford.edu

Links: Paper | Blog Post | Video

Keywords: haptics and haptic interfaces


Learning to Collaborate from Simulation for Robot-Assisted Dressing

Authors: Alexander Clegg, Zackory Erickson, Patrick Grady, Greg Turk, Charles C. Kemp, C. Karen Liu

Contact: karenliu@cs.stanford.edu

Links: Paper

Keywords: assistive robotics; physical human robot interaction; reinforcement learning; physics simulation; cloth manipulation


Learning to Scaffold the Development of Robotic Manipulation Skills

Authors: Lin Shao, Toki Migimatsu, Jeannette Bohg

Contact: lins2@stanford.edu

Links: Paper | Video

Keywords: learning and adaptive systems, deep learning in robotics and automation, intelligent and flexible manufacturing


Map-Predictive Motion Planning in Unknown Environments

Authors: Amine Elhafsi, Boris Ivanovic, Lucas Janson, Marco Pavone

Contact: amine@stanford.edu

Links: Paper

Keywords: motion planning, deep learning, robotics


Motion Reasoning for Goal-Based Imitation Learning

Authors: De-An Huang, Yu-Wei Chao, Chris Paxton, Xinke Deng, Li Fei-Fei, Juan Carlos Niebles, Animesh Garg, Dieter Fox

Contact: dahuang@stanford.edu

Links: Paper | Video

Keywords: imitation learning, goal inference


Object-Centric Task and Motion Planning in Dynamic Environments

Authors: Toki Migimatsu, Jeannette Bohg

Contact: takatoki@cs.stanford.edu

Links: Paper | Blog Post | Video

Keywords: control of systems integrating logic, dynamics, and constraints


Optimal Sequential Task Assignment and Path Finding for Multi-Agent Robotic Assembly Planning

Authors: Kyle Brown, Oriana Peltzer, Martin Sehr, Mac Schwager, Mykel Kochenderfer

Contact: kjbrown7@stanford.edu

Links: Paper | Video

Keywords: multi robot systems, multi agent path finding, mixed integer programming, automated manufacturing, sequential task assignment


Refined Analysis of Asymptotically-Optimal Kinodynamic Planning in the State-Cost Space

Authors: Michal Kleinbort, Edgar Granados, Kiril Solovey, Riccardo Bonalli, Kostas E. Bekris, Dan Halperin

Contact: kirilsol@stanford.edu

Links: Paper

Keywords: motion planning, sampling-based planning, rrt, optimality


Retraction of Soft Growing Robots without Buckling

Authors: Margaret M. Coad, Rachel P. Thomasson, Laura H. Blumenschein, Nathan S. Usevitch, Elliot W. Hawkes, and Allison M. Okamura

Contact: mmcoad@stanford.edu

Links: Paper | Video

Keywords: soft robot materials and design; modeling, control, and learning for soft robots


Revisiting the Asymptotic Optimality of RRT*

Authors: Kiril Solovey, Lucas Janson, Edward Schmerling, Emilio Frazzoli, and Marco Pavone

Contact: kirilsol@stanford.edu

Links: Paper | Video

Keywords: motion planning, rapidly-exploring random trees, rrt*, sampling-based planning


Sample Complexity of Probabilistic Roadmaps via Epsilon Nets

Authors: Matthew Tsao, Kiril Solovey, Marco Pavone

Contact: mwtsao@stanford.edu

Links: Paper | Video

Keywords: motion planning, sampling-based planning, probabilistic roadmaps, epsilon nets


Self-Supervised Learning of State Estimation for Manipulating Deformable Linear Objects

Authors: Mengyuan Yan, Yilin Zhu, Ning Jin, Jeannette Bohg

Contact: myyan92@gmail.com, bohg@stanford.edu

Links: Paper | Video

Keywords: self-supervision, deformable objects


Spatial Scheduling of Informative Meetings for Multi-Agent Persistent Coverage

Authors: Ravi Haksar, Sebastian Trimpe, Mac Schwager

Contact: rhaksar@stanford.edu

Links: Paper | Video

Keywords: distributed systems, multi-robot systems, multi-robot path planning


Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction

Authors: Bingbin Liu, Ehsan Adeli, Zhangjie Cao, Kuan-Hui Lee, Abhijeet Shenoi, Adrien Gaidon, Juan Carlos Niebles

Contact: eadeli@stanford.edu

Links: Paper | Video

Keywords: spatiotemporal graphs, forecasting, graph neural networks, autonomous driving


TRASS: Time Reversal as Self-Supervision

Authors: Suraj Nair, Mohammad Babaeizadeh, Chelsea Finn, Sergey Levine, Vikash Kumar

Contact: surajn@stanford.edu

Links: Paper | Blog Post | Video

Keywords: visual planning; reinforcement learning; self-supervision


UniGrasp: Learning a Unified Model to Grasp with Multifingered Robotic Hands

Authors: Lin Shao, Fabio Ferreira, Mikael Jorda, Varun Nambiar, Jianlan Luo, Eugen Solowjow, Juan Aparicio Ojea, Oussama Khatib, Jeannette Bohg

Contact: lins2@stanford.edu

Links: Paper | Video

Keywords: deep learning in robotics and automation; grasping; multifingered hands


Vine Robots: Design, Teleoperation, and Deployment for Navigation and Exploration

Authors: Margaret M. Coad, Laura H. Blumenschein, Sadie Cutler, Javier A. Reyna Zepeda, Nicholas D. Naclerio, Haitham El-Hussieny, Usman Mehmood, Jee-Hwan Ryu, Elliot W. Hawkes, and Allison M. Okamura

Contact: mmcoad@stanford.edu

Links: Paper | Video

Keywords: soft robot applications; field robots


We look forward to seeing you at ICRA!

DADS: Unsupervised Reinforcement Learning for Skill Discovery

Posted by Archit Sharma, AI Resident, Google Research

Recent research has demonstrated that supervised reinforcement learning (RL) is capable of going beyond simulation scenarios to synthesize complex behaviors in the real world, such as grasping arbitrary objects or learning agile locomotion. However, the limitations of teaching an agent to perform complex behaviors using well-designed, task-specific reward functions are also becoming apparent. Designing reward functions can require significant engineering effort, which becomes untenable for a large number of tasks. For many practical scenarios, designing a reward function can be complicated, for example, requiring additional instrumentation for the environment (e.g., sensors to detect the orientation of doors) or manual labeling of “goal” states. Considering that the ability to generate complex behaviors is limited by this form of reward engineering, unsupervised learning presents itself as an interesting direction for RL.

In supervised RL, the extrinsic reward function from the environment guides the agent towards the desired behaviors, reinforcing the actions that bring about the desired changes in the environment. With unsupervised RL, the agent uses an intrinsic reward function (such as curiosity to try different things in the environment) to generate its own training signals to acquire a broad set of task-agnostic behaviors. Intrinsic reward functions bypass the problem of engineering extrinsic reward functions, while being generic and broadly applicable to several agents and problems without any additional design. While much research has recently focused on different approaches to unsupervised reinforcement learning, it is still a severely under-constrained problem — without the guidance of rewards from the environment, it can be hard to learn behaviors that will be useful. Are there meaningful properties of the agent-environment interaction that can help discover better behaviors (“skills”) for the agents?

In this post, we present two recent publications that develop novel unsupervised RL methods for skill discovery. In “Dynamics-Aware Unsupervised Discovery of Skills” (DADS), we introduce the notion of “predictability” to the optimization objective for unsupervised learning. In this work we posit that a fundamental attribute of skills is that they bring about a predictable change in the environment. We capture this idea in our unsupervised skill discovery algorithm, and show applicability in a broad range of simulated robotic setups. In our follow-up work “Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning”, we improve the sample-efficiency of DADS to demonstrate that unsupervised skill discovery is feasible in the real world.

The behavior on the left is random and unpredictable, while the behavior on the right demonstrates systematic motion with predictable changes in the environment. Our goal is to learn potentially useful behaviors such as those on the right, without engineered reward functions.

Overview of DADS
DADS designs an intrinsic reward function that encourages discovery of “predictable” and “diverse” skills. The intrinsic reward function is high if (a) the changes in the environment are different for different skills (encouraging diversity) and (b) changes in the environment for a given skill are predictable (predictability). Since DADS does not obtain any rewards from the environment, optimizing the skills to be diverse enables the agent to capture as many potentially useful behaviors as possible.

In order to determine if a skill is predictable, we train another neural network, called the skill-dynamics network, to predict the changes in the environment state when given the current state and the skill being executed. The better the skill-dynamics network can predict the change of state in the environment, the more “predictable” the skill is. The intrinsic reward defined by DADS can be maximized using any conventional reinforcement learning algorithm.

An overview of DADS.
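
The objective above can be sketched in code. The following is a simplified illustration rather than the paper’s implementation: skill-dynamics is modeled as a unit-variance Gaussian over state changes, and the diversity term is approximated by comparing against a handful of skills resampled from the prior. All architecture and sizing choices here are assumptions for illustration.

```python
# Simplified illustration of a DADS-style intrinsic reward (not the paper's
# code). The reward is high when the executed skill predicts the transition
# well (predictability) and better than other sampled skills (diversity).
import torch
import torch.nn as nn

class SkillDynamics(nn.Module):
    """Predicts the change in environment state from (state, skill)."""
    def __init__(self, state_dim, skill_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + skill_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),  # mean of the predicted state delta
        )

    def log_prob(self, state, skill, next_state):
        mean = self.net(torch.cat([state, skill], dim=-1))
        return torch.distributions.Normal(mean, 1.0).log_prob(
            next_state - state).sum(-1)

def intrinsic_reward(dynamics, state, skill, next_state, num_alt=16):
    log_p = dynamics.log_prob(state, skill, next_state)  # predictability
    alt_skills = torch.randn(num_alt, skill.shape[-1])   # skills from prior
    alt_log_p = torch.stack([
        dynamics.log_prob(state, z.expand_as(skill), next_state)
        for z in alt_skills
    ])
    # Diversity: down-weight transitions that other skills explain just as well.
    return log_p - (torch.logsumexp(alt_log_p, dim=0)
                    - torch.log(torch.tensor(float(num_alt))))

dyn = SkillDynamics(state_dim=4, skill_dim=2)
s, z, s_next = torch.rand(8, 4), torch.randn(8, 2), torch.rand(8, 4)
print(intrinsic_reward(dyn, s, z, s_next).shape)  # torch.Size([8])
```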

The algorithm enables several different agents to discover predictable skills purely from reward-free interaction with the environment. DADS, unlike prior work, can scale to high-dimensional continuous control environments such as Humanoid, a simulated bipedal robot. Since DADS is environment agnostic, it can be applied to both locomotion and manipulation oriented environments. We show some of the skills discovered by different continuous control agents.

Ant discovers galloping (top left) and skipping (bottom left), Humanoid discovers different locomotive gaits (middle, sped up 2x), and D’Claw from ROBEL (right) discovers different ways to rotate an object, all using DADS. More sample videos are available here.

Model-Based Control Using Skill-Dynamics
Not only does DADS enable the discovery of predictable and potentially useful skills, it allows for an efficient approach to apply the learned skills to downstream tasks. We can leverage the learned skill-dynamics to predict the state-transitions for each skill. The predicted state-transitions can be chained together to simulate the complete trajectory of states for any learned skill without executing it in the environment. Therefore, we can simulate the trajectory for different skills and choose the skill which gets the highest reward for the given task. The model-based planning approach described here can be very sample-efficient as no additional training is required for the skills. This is a significant step up from the prior approaches, which require additional training on the environment to combine the learned skills.
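
That planning loop can be sketched as follows. This is an illustration of the idea rather than the paper’s planner, and `predict_next` stands in for the mean prediction of a learned skill-dynamics model; the toy model and reward below are assumptions.

```python
# Illustrative planning loop (not the paper's planner): roll each candidate
# skill forward through learned skill-dynamics, score the imagined trajectory
# with the task reward, and execute the best skill -- no extra environment
# interaction or skill retraining needed.
import torch

def plan_skill(predict_next, candidate_skills, state, task_reward, horizon=10):
    """predict_next(state, skill) -> predicted next state (learned model)."""
    best_skill, best_return = None, float("-inf")
    for skill in candidate_skills:
        s, total = state, 0.0
        for _ in range(horizon):
            s = predict_next(s, skill)  # imagined transition, no env step
            total += task_reward(s)
        if total > best_return:
            best_skill, best_return = skill, total
    return best_skill

# Toy usage: pick the skill whose imagined rollout drifts toward a 2-D goal.
goal = torch.tensor([5.0, 0.0])
predict_next = lambda s, z: s + 0.1 * z            # stand-in learned model
task_reward = lambda s: -float(torch.norm(s - goal))
skills = [torch.randn(2) for _ in range(32)]
print(plan_skill(predict_next, skills, torch.zeros(2), task_reward))
```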

Using the skills discovered by the agents, we can traverse an arbitrary sequence of checkpoints without any additional training. The plot on the right follows the agent’s traversal from one checkpoint to another.

Real-World Results
Demonstrations of unsupervised skill learning in robotics have been fairly limited, with results largely restricted to simulation environments. In “Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning”, we develop a sample-efficient version of our earlier algorithm, called off-DADS, through algorithmic and systematic improvements in an off-policy learning setup. Off-policy learning enables the use of data collected from different policies to improve the current policy. In particular, reusing previously collected data can dramatically improve the sample-efficiency of reinforcement learning algorithms. Leveraging this improvement, we train D’Kitty (a quadruped from ROBEL) in the real world, starting from a random policy initialization, without any rewards from the environment or hand-crafted exploration strategies. We observe the emergence of complex behaviors with diverse gaits and directions by optimizing the intrinsic reward defined by DADS.
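
The off-policy ingredient can be pictured with a replay buffer: transitions gathered by earlier versions of the policy are stored and resampled for many gradient updates instead of being discarded after a single on-policy step. A minimal sketch, illustrative rather than the off-DADS implementation:

```python
# Minimal sketch of the off-policy ingredient (illustrative, not off-DADS
# itself): transitions collected by older policies are kept and resampled.
import random
from collections import deque

replay_buffer = deque(maxlen=100_000)

def store(state, skill, action, next_state):
    replay_buffer.append((state, skill, action, next_state))

def sample_batch(batch_size=256):
    # Each stored transition can contribute to many gradient updates, which
    # is what drives the sample-efficiency gains needed on real robots.
    return random.sample(replay_buffer, min(batch_size, len(replay_buffer)))
```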

Using off-DADS, we train D’Kitty from ROBEL to acquire diverse locomotion behaviors, which can then be used for goal-navigation through model-based control.

Future Work
We have contributed a novel unsupervised skill discovery algorithm with broad applicability that is feasible to execute in the real world. This work provides a foundation for future work in which robots can solve a broad range of tasks with minimal human effort. One possibility is to study the relationship between the state representation and the skills discovered by DADS, in order to learn a state representation that encourages the discovery of skills for a known distribution of downstream tasks. Another interesting direction comes from the formulation of skill-dynamics, which separates high-level planning from low-level control; studying its general applicability to reinforcement learning problems is a promising avenue.

Acknowledgements
We would like to thank our coauthors, Michael Ahn, Sergey Levine, Vikash Kumar, Shixiang Gu and Karol Hausman. We would also like to acknowledge the support and feedback provided by various members of the Google Brain team and the Robotics at Google team.

Responding to the European Commission’s AI white paper

In January, our CEO Sundar Pichai visited Brussels to talk about artificial intelligence and how Google could help people and businesses succeed in the digital age through partnership. Much has changed since then due to COVID-19, but one thing hasn’t—our commitment to the potential of partnership with Europe on AI, especially to tackle the pandemic and help people and the economy recover. 

As part of that effort, earlier today we filed our response to the European Commission’s Consultation on Artificial Intelligence, giving our feedback on the Commission’s initial proposal for how to regulate and accelerate the adoption of AI.

Excellence, skills, trust

Our filing applauds the Commission’s focus on building out the European “ecosystem of excellence.” European universities already boast renowned leaders in dozens of areas of AI research—Google partners with some of them via our machine learning research hubs in Zurich, Amsterdam, Berlin, Paris and London—and many of their students go on to make important contributions to European businesses.  

We support the Commission’s plans to help businesses develop the AI skills they need to thrive in the new digital economy. Next month, we’ll contribute to those efforts by extending our machine learning check-up tool to 11 European countries to help small businesses implement AI and grow their businesses. Google Cloud already works closely with scores of businesses across Europe to help them innovate using AI.  

We also support the Commission’s goal of building a framework for AI innovation that will create trust and guide ethical development and use of this widely applicable technology. We appreciate the Commission’s proportionate, risk-based approach. It’s important that AI applications in sensitive fields—such as medicine or transportation—are held to the appropriate standards. 

Based on our experience working with AI, we also offered a couple of suggestions for making future regulation more effective. We want to be a helpful and engaged partner to policymakers, and we have provided more details in our formal response to the consultation.

Definition of high-risk AI applications

AI has a broad range of current and future applications, including some that involve significant benefits and risks.  We think any future regulation would benefit from a more carefully nuanced definition of “high-risk” applications of AI. We agree that some uses warrant extra scrutiny and safeguards to address genuine and complex challenges around safety, fairness, explainability, accountability, and human interactions. 

Assessment of AI applications

When thinking about how to assess high-risk AI applications, it’s important to strike a balance. While AI won’t always be perfect, it has great potential to help us improve over the performance of existing systems and processes. But the development process for AI must give people confidence that the AI system they’re using is reliable and safe. That’s especially true for applications like new medical diagnostic techniques, which potentially allow skilled medical practitioners to offer more accurate diagnoses, earlier interventions, and better patient outcomes. But the requirements need to be proportionate to the risk, and shouldn’t unduly limit innovation, adoption, and impact. 

This is not an easy needle to thread. The Commission’s proposal suggests “ex ante” assessment of AI applications (i.e., upfront assessment, based on forecasted rather than actual use cases). Our contribution recommends expanding established due diligence and regulatory review processes to include the assessment of AI applications. This would avoid unnecessary duplication of effort and likely speed up implementation.

For the (probably) rare instances when high-risk applications of AI are not obviously covered by existing regulations, we would encourage clear guidance on the “due diligence” criteria companies should use in their development processes. This would enable robust upfront self-assessment and documentation of any risks and their mitigations, and could also include further scrutiny after launch.

This approach would give European citizens confidence about the trustworthiness of AI applications, while also fostering innovation across the region. And it would encourage companies—especially smaller ones—to launch a range of valuable new services. 

Principles and process

Responsible development of AI presents new challenges and critical questions for all of us. In 2018 we published our own AI Principles to help guide our ethical development and use of AI, and also established internal review processes to help us avoid bias, test rigorously for safety, and design with privacy top of mind. Our principles also specify areas where we will not design or deploy AI, such as to support mass surveillance or violate human rights. Look out for an update on our work around these principles in the coming weeks.

AI is an important part of Google’s business and our aspirations for the future. We share a common goal with policymakers—a desire to build trust in AI through responsible innovation and thoughtful regulation, so that European citizens can safely enjoy the full social and economic benefits of AI. We hope that our contribution to the consultation is useful, and we look forward to participating in the discussion in coming months.
