Overview

Typical episodic reinforcement learning training.

Autonomous reinforcement learning training.

Embodied agents such as humans and robots are situated in a continual, non-episodic world. Reinforcement learning (RL) promises a framework for enabling artificial agents to learn autonomously with minimal human intervention. However, the current episodic RL routine takes the following form:

Sample an initial state s.
Let the agent run from s for a short period of time (typically 100 to 1000 steps).
Update the agent (policy, model, etc.).
Repeat until convergence.
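The four steps above can be sketched as a loop. This is a minimal illustration, not code from the EARL repository: ToyChainEnv, its sample_initial_state method, and episodic_training are hypothetical names, and a fixed episode budget stands in for a convergence test.

```python
class ToyChainEnv:
    """Hypothetical 1-D chain (states 0..n-1, reward at n-1); stands in for a real task."""

    def __init__(self, n=10):
        self.n = n
        self.state = 0

    def sample_initial_state(self):
        # The extrinsic reset that episodic RL assumes is always available.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); reward 1.0 only at the last state.
        self.state = min(max(self.state + action, 0), self.n - 1)
        reward = 1.0 if self.state == self.n - 1 else 0.0
        return self.state, reward


def episodic_training(env, policy, update, num_episodes, horizon):
    """Episodic RL loop; returns how many resets it required."""
    resets = 0
    for _ in range(num_episodes):           # 4. repeat (fixed budget instead of a convergence test)
        s = env.sample_initial_state()      # 1. sample an initial state s
        resets += 1
        trajectory = []
        for _ in range(horizon):            # 2. run from s for a short period
            a = policy(s)
            s_next, r = env.step(a)
            trajectory.append((s, a, r, s_next))
            s = s_next
        update(trajectory)                  # 3. update the agent
    return resets
```

Each episode calls sample_initial_state once, so the number of extrinsic resets grows linearly with the amount of training.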

This routine relies on the existence of episodes, an assumption that breaks the autonomy of the learning system: episodes cannot be realized without extrinsic interventions that reset the environment after every episode of interaction. The figures captioned "Typical episodic reinforcement learning training" and "Autonomous reinforcement learning training" illustrate the algorithmic difference.

Environment for Autonomous RL (EARL): The goal of our proposed framework, Autonomous RL (ARL), and its accompanying benchmark, EARL, is to encourage research on algorithms for the continual, non-episodic world, moving toward truly autonomous embodied agents. At a high level, algorithms are evaluated on EARL under the following conditions:

Sample an initial state s.
At each step, with probability 1−ε, the agent continues from its current state according to the environment dynamics.
At each step, with probability ε, an extrinsic intervention resets the environment to a newly sampled initial state.
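The evaluation protocol above can be simulated in a few lines. This is a minimal sketch, not code from the EARL repository: the toy dynamics chain_step and the helper autonomous_rollout are hypothetical, and the sketch assumes interventions are drawn independently at every step, with an intervention replacing that step's transition.

```python
import random


def chain_step(s, a, n=10):
    """Hypothetical toy dynamics: move left/right on a chain, reward 1.0 at the end."""
    s_next = min(max(s + a, 0), n - 1)
    return s_next, (1.0 if s_next == n - 1 else 0.0)


def autonomous_rollout(step_fn, sample_initial_state, policy, num_steps, epsilon, seed=0):
    """Simulate the protocol for num_steps steps.

    At every step, with probability epsilon an extrinsic intervention resets the
    environment to a new initial state; otherwise the agent acts and the
    environment follows its dynamics. Returns (total_reward, interventions).
    """
    rng = random.Random(seed)
    s = sample_initial_state()                 # sample an initial state s once
    total_reward, interventions = 0.0, 0
    for _ in range(num_steps):
        if rng.random() < epsilon:             # probability epsilon: intervention
            s = sample_initial_state()
            interventions += 1
        else:                                  # probability 1 - epsilon: follow the dynamics
            s, r = step_fn(s, policy(s))
            total_reward += r
    return total_reward, interventions
```

For example, autonomous_rollout(chain_step, lambda: 0, lambda s: 1, num_steps=100_000, epsilon=1e-3) runs the agent with roughly one reset per thousand steps. With epsilon=0 the agent is never reset, and with epsilon=1 it never gets to act; EARL sits close to the first extreme.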

In the EARL environments, ε is very low (10⁻⁵ to 10⁻⁶), so agents operate autonomously for hundreds of thousands of steps or more between interventions. EARL provides a diverse set of environments for evaluating autonomous learning algorithms, and offers two evaluation modes:

Deployment Evaluation: The learned policy is intermittently evaluated from the initial state, representing the performance of the policy if it were deployed. Algorithms are evaluated based on how quickly the deployed performance of the policy improves.
Continuing Evaluation: Algorithms are evaluated on the rate at which they accumulate reward over their lifetime.
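The length of autonomous operation follows directly from the per-step intervention probability ε. Assuming interventions occur independently at each step with probability ε, the number of steps T up to and including the first intervention is geometrically distributed:

```latex
\Pr(T = k) = (1-\varepsilon)^{k-1}\,\varepsilon, \qquad
\mathbb{E}[T] = \sum_{k=1}^{\infty} k\,(1-\varepsilon)^{k-1}\,\varepsilon = \frac{1}{\varepsilon}

\varepsilon = 10^{-5} \;\Rightarrow\; \mathbb{E}[T] = 10^{5}, \qquad
\varepsilon = 10^{-6} \;\Rightarrow\; \mathbb{E}[T] = 10^{6}
```

So with the EARL values of ε, an agent runs for 100,000 to 1,000,000 steps between interventions on average.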

As motivation, consider training a robot to clean a kitchen. We might want to evaluate how well the robot keeps the kitchen clean over its entire lifetime, as well as how it performs if it is deployed for a specific task, like turning off the stove. Both objectives are important but may not always align, so EARL reports both values for all environments.
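To make the two quantities concrete, the sketch below computes one plausible summary for each mode from logged data. These are simplified stand-ins chosen for illustration, not necessarily the exact metrics defined in the EARL paper: deployment_score averages the deployed policy's return over evenly spaced checkpoints, and continuing_score is the average reward per step over the agent's lifetime. Both function names and the example numbers are hypothetical.

```python
from statistics import mean


def deployment_score(checkpoint_returns):
    """Mean return of the deployed policy, evaluated from the initial state at
    evenly spaced training checkpoints. This is a rectangle-rule area under the
    learning curve normalized by its length, so earlier improvement scores higher."""
    return mean(checkpoint_returns)


def continuing_score(lifetime_rewards):
    """Average reward per step collected during the agent's reset-free lifetime."""
    return mean(lifetime_rewards)


# Two hypothetical learners whose deployed policies reach the same final return.
fast = [0.2, 0.8, 1.0, 1.0]
slow = [0.0, 0.2, 0.6, 1.0]
print(deployment_score(fast), deployment_score(slow))  # the faster learner scores higher
```

Because the two scores summarize different logs (periodic evaluations of the deployed policy versus the training stream itself), they can rank algorithms differently.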

The EARL benchmark consists of 6 simulated environments, ranging from the aforementioned kitchen environment to dexterous hand manipulation and legged locomotion. Learn more about them on our environments page.

Get Started

First, clone the repository and add it to your PYTHONPATH:

        
      git clone https://github.com/architsharma97/earl_benchmark.git

To import the environments, run:

        
      import earl_benchmark

      # Options are tabletop_manipulation, sawyer_door, sawyer_peg, kitchen, minitaur
      env_loader = earl_benchmark.EARLEnvs('tabletop_manipulation')
      train_env, eval_env = env_loader.get_envs()
      initial_states = env_loader.get_initial_states()
      goal_states = env_loader.get_goal_states()
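Once the environments are loaded, a reset-free training loop with periodic deployment evaluation might look roughly like the following. This is a minimal sketch, not part of the EARL API: it assumes train_env and eval_env follow the classic Gym interface (reset() returning an observation, step() returning (obs, reward, done, info)), and the names evaluate, train_with_deployment_eval, and StubChainEnv are hypothetical. The stub environment exists only so the sketch runs without the benchmark installed.

```python
def evaluate(eval_env, policy, episodes, horizon):
    """Deployment evaluation: average return over episodes started from the initial state."""
    total = 0.0
    for _ in range(episodes):
        obs = eval_env.reset()
        for _ in range(horizon):
            obs, reward, done, _ = eval_env.step(policy(obs))
            total += reward
            if done:
                break
    return total / episodes


def train_with_deployment_eval(train_env, eval_env, policy, update, total_steps,
                               eval_every, eval_episodes=5, eval_horizon=200):
    """Hypothetical reset-free loop; returns (deployment curve, average lifetime reward)."""
    eval_returns = []
    lifetime_reward = 0.0
    obs = train_env.reset()                    # reset once at the start of the lifetime
    for t in range(1, total_steps + 1):
        action = policy(obs)
        next_obs, reward, done, _ = train_env.step(action)
        update(obs, action, reward, next_obs)  # learn online, with no episode boundaries
        lifetime_reward += reward
        obs = next_obs
        if done:
            # Assumed to happen only when the environment itself intervenes.
            obs = train_env.reset()
        if t % eval_every == 0:
            eval_returns.append(evaluate(eval_env, policy, eval_episodes, eval_horizon))
    return eval_returns, lifetime_reward / total_steps


class StubChainEnv:
    """Hypothetical stand-in exposing the assumed Gym-style interface."""

    def __init__(self, n=5):
        self.n = n
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = min(max(self.state + action, 0), self.n - 1)
        reward = 1.0 if self.state == self.n - 1 else 0.0
        return self.state, reward, False, {}
```

With the real benchmark, the stubs would be replaced by the train_env and eval_env returned by env_loader.get_envs(), and policy and update by an actual agent.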

Authors

Archit Sharma, Stanford University
Kelvin Xu, UC Berkeley
Nikhil Sardana, Stanford University
Abhishek Gupta, UC Berkeley
Karol Hausman, Google & Stanford University
Sergey Levine, UC Berkeley
Chelsea Finn, Stanford University

If you use this benchmark, please cite our paper:

          
  @article{sharma2021Autonomous,
    title={Autonomous Reinforcement Learning: Benchmarking and Formalism},
    author={Archit Sharma and Kelvin Xu and Nikhil Sardana and Abhishek Gupta and Karol Hausman and Sergey Levine and Chelsea Finn},
    journal={arXiv preprint arXiv:2112.09605},
    year={2021}
  }