Autonomous Reinforcement Learning: Formalism and Benchmarking
Authors: Archit Sharma, Kelvin Xu, Nikhil Sardana, Abhishek Gupta, Karol Hausman, Sergey Levine, Chelsea Finn
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 BENCHMARKING AND ANALYSIS The aim of this section is to understand the challenges in autonomous reinforcement learning and to evaluate the performance and shortcomings of current autonomous RL algorithms. In Section 6.1, we first evaluate standard episodic RL algorithms in ARL settings as they are required to operate with increased autonomy, underscoring the need for a greater focus on autonomy in RL algorithms. We then evaluate prior autonomous learning algorithms on EARL in Section 6.2. While these algorithms do improve upon episodic RL methods, they fail to make progress on more challenging tasks compared to methods provided with oracle resets, leaving a large gap for improvement. Lastly, in Section 6.3, we investigate the learning of existing algorithms, providing a hypothesis for their inadequate performance. (A hedged sketch of this reset-free protocol follows the table.) |
| Researcher Affiliation | Collaboration | 1 Stanford University; 2 University of California, Berkeley; 3 MIT; 4 Google Brain |
| Pseudocode | No | The paper describes methods and experiments but does not include any explicit pseudocode blocks or sections labeled 'Algorithm'. |
| Open Source Code | Yes | Code and related information for EARL can be found at architsharma97.github.io/earl_benchmark/ |
| Open Datasets | Yes | Our benchmarks combine components from previously proposed environments (Coumans & Bai, 2016; Gupta et al., 2019; Yu et al., 2020; Gupta et al., 2021; Sharma et al., 2021), but reformulate the learning tasks to reflect ARL constraints, such as the absence of explicitly available resets. |
| Dataset Splits | No | The paper describes training and testing protocols but does not specify explicit validation dataset splits (e.g., percentages or sample counts for a validation set). |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory) used to conduct the experiments. |
| Software Dependencies | No | The paper mentions 'soft actor-critic (SAC)' as the base algorithm and 'PyBullet' for one environment, but does not specify version numbers for any software, libraries, or dependencies used in the experiments. |
| Experiment Setup | Yes | Table 3: Shared algorithm parameters. Table 4: Environment-specific parameters, including the training horizon (i.e. how frequently an intervention is provided), the evaluation horizon, and the replay buffer capacity. (A hedged configuration sketch follows the table.) |
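To make the Benchmarking and Analysis excerpt above concrete: in the ARL setting the agent trains in a single long-lived stream in which resets arrive only as infrequent interventions, while the deployed policy is periodically evaluated from the initial state distribution. Below is a minimal sketch of that loop, assuming a gym-style `reset`/`step` interface; `DummyEnv`, `random_policy`, and every numeric constant are illustrative placeholders, not the paper's environments, its SAC agent, or the values in Tables 3 and 4.

```python
import collections

import numpy as np


class DummyEnv:
    """Stand-in for an EARL-style environment (hypothetical, gym-like API)."""

    def __init__(self, obs_dim=4):
        self.obs_dim = obs_dim

    def reset(self):
        # In ARL, reset() models an external intervention, not a free operation.
        return np.zeros(self.obs_dim)

    def step(self, action):
        obs = np.random.randn(self.obs_dim)
        reward = float(-np.linalg.norm(obs))
        done = False  # non-episodic: the environment never terminates on its own
        return obs, reward, done, {}


def random_policy(obs):
    # Placeholder for a learned policy such as SAC.
    return np.random.uniform(-1.0, 1.0, size=2)


# Placeholder horizons in the spirit of Table 4 (NOT the paper's numbers).
TRAIN_HORIZON = 10_000     # steps between interventions (oracle resets)
EVAL_HORIZON = 200         # length of each evaluation rollout
REPLAY_CAPACITY = 100_000  # replay buffer size

env, eval_env = DummyEnv(), DummyEnv()
replay = collections.deque(maxlen=REPLAY_CAPACITY)

obs = env.reset()
for step in range(1, 50_001):
    action = random_policy(obs)  # a real agent would also update from the replay buffer
    next_obs, reward, done, _ = env.step(action)
    replay.append((obs, action, reward, next_obs))
    obs = next_obs

    # Interventions are infrequent: a reset arrives only every TRAIN_HORIZON steps.
    if step % TRAIN_HORIZON == 0:
        obs = env.reset()

    # Deployed-policy evaluation: roll out from the initial state distribution.
    if step % 5_000 == 0:
        eval_obs, ret = eval_env.reset(), 0.0
        for _ in range(EVAL_HORIZON):
            eval_obs, r, _, _ = eval_env.step(random_policy(eval_obs))
            ret += r
        print(f"step {step}: eval return {ret:.1f}")
```

The key structural difference from episodic RL is that `env.reset()` appears only on the intervention schedule, not at the end of every episode; everything else is standard off-policy bookkeeping.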
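The Experiment Setup row points to Table 3 (parameters shared across environments) and Table 4 (environment-specific parameters). One plausible way to organize such a configuration is sketched below; the environment id `tabletop_manipulation` and all values are assumptions for illustration, not numbers reported in the paper.

```python
# Hypothetical configuration layout mirroring the paper's two parameter tables.
# All values are placeholders, NOT those reported in Tables 3 and 4.
SHARED = {                      # Table 3: shared algorithm parameters (e.g. for SAC)
    "algorithm": "SAC",
    "discount": 0.99,
    "batch_size": 256,
}

PER_ENV = {                     # Table 4: environment-specific parameters
    "tabletop_manipulation": {  # assumed environment id
        "train_horizon": 200_000,     # steps between provided interventions
        "eval_horizon": 200,          # evaluation rollout length
        "replay_capacity": 1_000_000,
    },
}

# Merge shared defaults with one environment's overrides.
config = {**SHARED, **PER_ENV["tabletop_manipulation"]}
print(config)
```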