Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning

Archit Sharma^§, Ahmed M. Ahmed^§, Rehaan Ahmad, Chelsea Finn

Stanford University

^§ Authors with substantial contributions to real-robot experimentation

Overview

Summary. In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on. An aspirational goal is to construct self-improving robots: robots that can learn and improve on their own, from autonomous interaction with minimal human supervision or oversight. Such robots could collect and train on much larger datasets, and thus learn more robust and performant policies. MEDAL++ is an autonomous reinforcement learning algorithm that trains a forward policy to perform the task and a backward policy to undo it, returning the environment toward states visited by an expert. Starting with a small set of demonstrations collected by an expert, the forward and backward policies interact with the environment in a cyclic fashion, switching control after a fixed number of steps. Chaining the forward and backward policies allows the robot to self-improve while minimizing the need for humans to reset the environment after every episode. Importantly, MEDAL++ learns end-to-end from high-dimensional visual inputs and learns the reward function from the expert demonstrations, bypassing the need for reward engineering. In contrast to prior work, this allows MEDAL++ to be applied in the real world, where it improves the success rate by 30-70% over behavior cloning policies. Overall, MEDAL++ takes a step towards simple and general self-improving robotic systems.
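The cyclic interaction scheme described above, where the forward and backward policies alternate control after a fixed number of steps with no human reset in between, can be sketched in a few lines of Python. This is a minimal, hypothetical illustration and not the MEDAL++ implementation: `ToyEnv`, `autonomous_rollout`, the `switch_every` parameter and the hand-coded policies are assumptions introduced here. Learning, image observations and the learned reward are all omitted, so the sketch shows only how control is handed back and forth.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int
Policy = Callable[[State], int]
# (name of active policy, state, action, next state)
Transition = Tuple[str, State, int, State]


@dataclass
class ToyEnv:
    """Hypothetical 1-D environment: the task is to move from position 0 to `goal`."""
    goal: int = 3
    state: State = 0

    def step(self, action: int) -> State:
        # Move by `action`, clamping the position to the range [0, goal].
        self.state = max(0, min(self.goal, self.state + action))
        return self.state


def autonomous_rollout(env: ToyEnv, forward: Policy, backward: Policy,
                       switch_every: int, total_steps: int) -> List[Transition]:
    """Alternate control between the forward and backward policies every
    `switch_every` steps, never resetting the environment.

    In a full system, each policy would be updated from the transitions it
    collected; that learning step is intentionally left out of this sketch.
    """
    policies = {"forward": forward, "backward": backward}
    active = "forward"
    log: List[Transition] = []
    for t in range(total_steps):
        # Hand control to the other policy after a fixed number of steps.
        if t > 0 and t % switch_every == 0:
            active = "backward" if active == "forward" else "forward"
        s = env.state
        a = policies[active](s)
        s_next = env.step(a)
        log.append((active, s, a, s_next))
    return log


if __name__ == "__main__":
    env = ToyEnv(goal=3)
    log = autonomous_rollout(env, forward=lambda s: 1, backward=lambda s: -1,
                             switch_every=3, total_steps=12)
    for entry in log:
        print(entry)
```

With these toy policies, the forward policy drives the state from 0 to the goal, and the backward policy then returns it to 0, acting as a learned substitute for a human reset so that the next forward attempt can begin immediately.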

This website features autonomous training and evaluation videos of MEDAL++ on three manipulation tasks using the Franka Panda arm: cloth hanging, peg insertion and bowl covering.