Representative Autonomous Settings. We include a broad array of tasks that reflect the types of autonomous learning scenarios agents may encounter in the real world. These include problems in manipulation and locomotion, as well as tasks involving multiple object interactions for which it would be challenging to instrument resets. We also ensure that both the continuing and deployment evaluation protocols of ARL are realistic and representative.
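The continuing evaluation protocol mentioned above can be sketched as a single uninterrupted interaction stream with no external resets, scored by average reward over the stream. The code below is a minimal illustration, not the benchmark's actual implementation; `env_step` and `policy` are hypothetical stand-ins for an environment transition function and an agent.

```python
import numpy as np

def continuing_rollout(env_step, policy, initial_obs, num_steps):
    """Run one uninterrupted interaction stream, as in a continuing
    (reset-free) evaluation: the agent acts for num_steps transitions
    and we record the reward at every step.

    env_step(obs, action) -> (next_obs, reward) and policy(obs) -> action
    are hypothetical callables, not part of any specific library.
    """
    obs = initial_obs
    rewards = []
    for _ in range(num_steps):
        action = policy(obs)
        obs, reward = env_step(obs, action)
        rewards.append(reward)
    # Continuing-evaluation metric: average reward over the whole stream,
    # which penalizes time spent far from task-relevant states.
    return float(np.mean(rewards))
```

Because the stream is never cut into episodes, any time the agent spends stuck in hard-to-leave states directly lowers this metric.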

Directed Exploration. In the autonomous setting, it may be necessary to practice a task multiple times from different initial states. This gives rise to the need for agents to learn rich reset behaviors. For example, a robot learning to interact with multiple objects in a kitchen must learn to implicitly or explicitly compose different reset behaviors.


The Tabletop-Organization environment consists of a gripper agent, modeled as a point mass, which can grasp objects that are close to it. The agent's goal is to bring a mug to four different locations, each designated by a goal coaster. The reward is a sparse indicator of whether the mug is placed at the goal location.
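A sparse indicator reward of this kind can be written as a distance threshold check. The snippet below is an illustrative sketch; the 0.05 tolerance is an assumed value, not the benchmark's actual threshold.

```python
import numpy as np

def sparse_indicator_reward(object_pos, goal_pos, threshold=0.05):
    """Return 1.0 when the object is within `threshold` of the goal,
    0.0 otherwise. The threshold is an illustrative choice; the
    benchmark's real tolerance may differ.
    """
    dist = np.linalg.norm(np.asarray(object_pos) - np.asarray(goal_pos))
    return 1.0 if dist < threshold else 0.0
```

The same pattern covers the other sparse rewards in this section, e.g. thresholding a door angle or a peg-to-goal distance instead of an object position.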


The Sawyer-Door task, from the MetaWorld benchmark, consists of a Sawyer robot arm whose goal is to close the door whenever it is in an open position. The task reward is a sparse indicator function based on the angle of the door. Repeatedly practicing this task implicitly requires the agent to learn to open the door.


The Sawyer-Peg task, also from MetaWorld, consists of a Sawyer robot tasked with inserting a peg into a designated goal location. The task reward is a sparse indicator of whether the peg is inserted at the goal location.


The Franka-Kitchen domain (Gupta et al., 2019) requires a 9-DoF robot, situated in a kitchen environment, to solve tasks consisting of compound object interactions. The environment contains a microwave, a hinged cabinet, a burner, and a slide cabinet. One example task is to open the microwave and the hinged cabinet and turn on the burner.

This domain presents a number of distinct challenges for ARL. First, the compound nature of each task results in a challenging long-horizon problem, which introduces exploration and credit assignment challenges. Second, while generalization is important for solving the environment, combining reset behaviors is equally important given the compositional nature of the task.
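The idea of combining reset behaviors compositionally can be illustrated by sequencing per-object reset skills: each object that is out of its initial configuration gets its own reset behavior applied. This is a toy sketch under assumed names (the skill dictionary and state encoding are hypothetical), not the benchmark's reset mechanism.

```python
def compose_resets(object_states, reset_skills, initial_states):
    """Sequence per-object reset skills: for each object whose state
    differs from its initial configuration, apply that object's reset
    skill. Returns the order in which objects were reset.

    object_states / initial_states map object names to states;
    reset_skills maps object names to hypothetical skill callables.
    """
    order = []
    for name, state in object_states.items():
        if state != initial_states[name]:
            object_states[name] = reset_skills[name](state)
            order.append(name)
    return order
```

In practice the agent would have to discover such a decomposition implicitly, which is part of what makes the kitchen domain hard.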


The DHand-Lightbulb environment (Gupta et al., 2021) consists of a 22-DoF four-fingered hand mounted on a 6-DoF Sawyer robot arm. The task in this domain is for the robot to pick up a lightbulb and bring it to a specific location. The high-dimensional action space makes the task extremely challenging.


The Minitaur-Pen task (Coumans and Bai, 2016) consists of an 8-DoF Minitaur robot confined to a pen environment. The goal of the agent is to navigate to a set of goal locations in the pen. The task is designed to mimic the setup of leaving a robot to learn to navigate within an enclosed setting in an autonomous fashion.