Figure: typical episodic reinforcement learning training vs. autonomous reinforcement learning training.

Embodied agents such as humans and robots are situated in a continual, non-episodic world. Reinforcement learning (RL) promises a framework for enabling artificial agents to learn autonomously with minimal human intervention. However, the standard episodic RL training routine takes the following form:

  1. Sample an initial state s.
  2. Let the agent run from s for a short period of time (typically 100 to 1000 steps).
  3. Update the agent (policy, model, etc.).
  4. Repeat until convergence.
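As a concrete, minimal sketch, the episodic routine above might look like the following. The `ToyEnv` environment and the random placeholder policy are illustrations of our own, not part of any benchmark:

```python
import random

class ToyEnv:
    """Minimal stand-in environment: the agent tries to drive the state to 0."""
    def reset(self):
        self.state = random.randint(1, 10)   # sample an initial state s
        return self.state

    def step(self, action):
        self.state = max(0, self.state - action)
        reward = 1.0 if self.state == 0 else 0.0
        done = self.state == 0
        return self.state, reward, done

def episodic_training(env, num_episodes=100, max_steps=100):
    returns = []
    for _ in range(num_episodes):
        s = env.reset()                       # 1. sample an initial state s
        total = 0.0
        for _ in range(max_steps):            # 2. run for a short period of time
            action = random.choice([0, 1])    #    (placeholder policy)
            s, r, done = env.step(action)
            total += r
            if done:
                break
        # 3. update the agent here (policy/model update omitted in this sketch)
        returns.append(total)                 # 4. repeat until convergence
    return returns
```

Note that every episode begins with a call to `env.reset()` — this is exactly the extrinsic intervention that the autonomous setting removes.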

This algorithmic routine relies on the existence of episodes, an assumption that breaks the autonomy of the learning system: it cannot be realized without extrinsic interventions that reset the environment after every interaction. The figures above give an overview of the algorithmic difference.

Environments for Autonomous RL (EARL): The goal of our proposed Autonomous RL (ARL) framework and the accompanying EARL benchmark is to encourage research on algorithms for the continual, non-episodic world, moving toward truly autonomous embodied agents. At a high level, algorithms are evaluated on EARL under the following conditions:

  1. Sample an initial state s.
  2. At every step, with probability 1-ε, the agent continues from s according to the environment dynamics.
  3. With probability ε, an extrinsic intervention resets the environment to a newly sampled initial state.
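Evaluation under these conditions amounts to one long, mostly uninterrupted stream of interaction. A minimal sketch follows; `CounterEnv` and the random policy are illustrative stand-ins of our own, not EARL's API:

```python
import random

class CounterEnv:
    """Toy non-episodic environment: the state is an integer the agent nudges."""
    def reset(self):
        self.state = random.randint(-5, 5)   # newly sampled initial state
        return self.state

    def step(self, action):
        self.state += action
        return self.state

def autonomous_interaction(env, num_steps=100_000, reset_prob=1e-5):
    """Run one long stream of interaction with only rare extrinsic resets."""
    interventions = 0
    s = env.reset()                          # sample an initial state s once
    for _ in range(num_steps):
        if random.random() < reset_prob:     # with probability ε: extrinsic reset
            s = env.reset()
            interventions += 1
        else:                                # with probability 1-ε: keep running
            s = env.step(random.choice([-1, 1]))
    return interventions
```

With ε = 10⁻⁵, resets arrive roughly 1/ε = 100,000 steps apart in expectation, which is what forces the agent to operate autonomously for long stretches.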

In the EARL environments, ε is very low (10⁻⁵ to 10⁻⁶), so agents operate autonomously for several hundred thousand steps between interventions. EARL provides a diverse set of environments for evaluating autonomous learning algorithms, offering two modes of evaluation:

As motivation, consider training a robot to clean a kitchen. We might want to evaluate how well the robot keeps the kitchen clean over its entire lifetime, as well as how it performs when deployed for a specific cleaning task, such as turning off the stove. Both objectives are important but do not always align, so EARL reports both values for all environments.
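Schematically, these two quantities might be computed as below. The function names and the toy dynamics are our own illustration, not EARL's interface:

```python
def continuing_performance(reward_stream):
    """Lifetime view: average reward over the agent's entire interaction stream,
    interventions and exploration included."""
    return sum(reward_stream) / len(reward_stream)

def deployed_performance(policy, env_step, initial_states, horizon=100):
    """Deployment view: average fixed-horizon return of the current policy
    when rolled out from the designated initial states."""
    returns = []
    for s in initial_states:
        total = 0.0
        for _ in range(horizon):
            s, r = env_step(s, policy(s))
            total += r
        returns.append(total)
    return sum(returns) / len(returns)
```

A policy that explores aggressively may score well on the deployment metric while accruing little reward over its lifetime, which is why the two numbers can diverge.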

The EARL benchmark consists of six simulated environments, ranging from the aforementioned kitchen environment to dexterous hand manipulation and legged locomotion. Learn more about them on our environments page →

Get Started

First, clone the repository and add it to your PYTHONPATH:

      git clone https://github.com/architsharma97/earl_benchmark.git

To import the environments, run:

      import earl_benchmark

      # Options are tabletop_manipulation, sawyer_door, sawyer_peg, kitchen, minitaur
      env_loader = earl_benchmark.EARLEnvs('tabletop_manipulation')
      train_env, eval_env = env_loader.get_envs()
      initial_states = env_loader.get_initial_states()
      goal_states = env_loader.get_goal_states()


If you use this benchmark, please cite our paper:

    @article{sharma2021autonomous,
      title={Autonomous Reinforcement Learning: Benchmarking and Formalism},
      author={Archit Sharma and Kelvin Xu and Nikhil Sardana and Abhishek Gupta and Karol Hausman and Sergey Levine and Chelsea Finn},
      journal={arXiv preprint arXiv:2112.09605},
      year={2021}
    }