Autonomous Reinforcement Learning: Formalism and Benchmarking
Authors: Archit Sharma, Kelvin Xu, Nikhil Sardana, Abhishek Gupta, Karol Hausman, Sergey Levine, Chelsea Finn
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 BENCHMARKING AND ANALYSIS The aim of this section is to understand the challenges in autonomous reinforcement learning and to evaluate the performance and shortcomings of current autonomous RL algorithms. In Section 6.1, we first evaluate standard episodic RL algorithms in ARL settings as they are required to operate with increased autonomy, underscoring the need for a greater focus on autonomy in RL algorithms. We then evaluate prior autonomous learning algorithms on EARL in Section 6.2. While these algorithms do improve upon episodic RL methods, they fail to make progress on more challenging tasks compared to methods provided with oracle resets, leaving a large gap for improvement. Lastly, in Section 6.3, we investigate the learning of existing algorithms, providing a hypothesis for their inadequate performance. (A hedged sketch of this reset-free protocol follows the table.) |
| Researcher Affiliation | Collaboration | 1 Stanford University; 2 University of California, Berkeley; 3 MIT; 4 Google Brain |
| Pseudocode | No | The paper describes methods and experiments but does not include any explicit pseudocode blocks or sections labeled 'Algorithm'. |
| Open Source Code | Yes | Code and related information for EARL can be found at architsharma97.github.io/earl_benchmark/ |
| Open Datasets | Yes | Our benchmarks combine components from previously proposed environments (Coumans & Bai, 2016; Gupta et al., 2019; Yu et al., 2020; Gupta et al., 2021; Sharma et al., 2021), but reformulate the learning tasks to reflect ARL constraints, such as the absence of explicitly available resets. |
| Dataset Splits | No | The paper describes training and testing protocols but does not specify explicit validation dataset splits (e.g., percentages or sample counts for a validation set). |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory) used to conduct the experiments. |
| Software Dependencies | No | The paper mentions 'soft actor-critic (SAC)' as the base algorithm and 'PyBullet' for one environment, but does not specify version numbers for any software, libraries, or dependencies used in the experiments. |
| Experiment Setup | Yes | Table 3: Shared algorithm parameters. Table 4: Environment-specific parameters, including the training horizon (i.e. how frequently an intervention is provided), the evaluation horizon, and the replay buffer capacity. (A hedged configuration sketch follows the table.) |
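To make the Benchmarking and Analysis excerpt above concrete: in the ARL setting the agent trains in a single long-lived stream in which resets arrive only as infrequent interventions, while the deployed policy is periodically evaluated from the initial state distribution. Below is a minimal sketch of that loop, assuming a gym-style `reset`/`step` interface; `DummyEnv`, `random_policy`, and every numeric constant are illustrative placeholders, not the paper's environments, its SAC agent, or the values in Tables 3 and 4.

```python
import collections

import numpy as np


class DummyEnv:
    """Stand-in for an EARL-style environment (hypothetical, gym-like API)."""

    def __init__(self, obs_dim=4):
        self.obs_dim = obs_dim

    def reset(self):
        # In ARL, reset() models an external intervention, not a free operation.
        return np.zeros(self.obs_dim)

    def step(self, action):
        obs = np.random.randn(self.obs_dim)
        reward = float(-np.linalg.norm(obs))
        done = False  # non-episodic: the environment never terminates on its own
        return obs, reward, done, {}


def random_policy(obs):
    # Placeholder for a learned policy such as SAC.
    return np.random.uniform(-1.0, 1.0, size=2)


# Placeholder horizons in the spirit of Table 4 (NOT the paper's numbers).
TRAIN_HORIZON = 10_000     # steps between interventions (oracle resets)
EVAL_HORIZON = 200         # length of each evaluation rollout
REPLAY_CAPACITY = 100_000  # replay buffer size

env, eval_env = DummyEnv(), DummyEnv()
replay = collections.deque(maxlen=REPLAY_CAPACITY)

obs = env.reset()
for step in range(1, 50_001):
    action = random_policy(obs)  # a real agent would also update from the replay buffer
    next_obs, reward, done, _ = env.step(action)
    replay.append((obs, action, reward, next_obs))
    obs = next_obs

    # Interventions are infrequent: a reset arrives only every TRAIN_HORIZON steps.
    if step % TRAIN_HORIZON == 0:
        obs = env.reset()

    # Deployed-policy evaluation: roll out from the initial state distribution.
    if step % 5_000 == 0:
        eval_obs, ret = eval_env.reset(), 0.0
        for _ in range(EVAL_HORIZON):
            eval_obs, r, _, _ = eval_env.step(random_policy(eval_obs))
            ret += r
        print(f"step {step}: eval return {ret:.1f}")
```

The key structural difference from episodic RL is that `env.reset()` appears only on the intervention schedule, not at the end of every episode; everything else is standard off-policy bookkeeping.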
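The Experiment Setup row points to Table 3 (parameters shared across environments) and Table 4 (environment-specific parameters). One plausible way to organize such a configuration is sketched below; the environment id `tabletop_manipulation` and all values are assumptions for illustration, not numbers reported in the paper.

```python
# Hypothetical configuration layout mirroring the paper's two parameter tables.
# All values are placeholders, NOT those reported in Tables 3 and 4.
SHARED = {                      # Table 3: shared algorithm parameters (e.g. for SAC)
    "algorithm": "SAC",
    "discount": 0.99,
    "batch_size": 256,
}

PER_ENV = {                     # Table 4: environment-specific parameters
    "tabletop_manipulation": {  # assumed environment id
        "train_horizon": 200_000,     # steps between provided interventions
        "eval_horizon": 200,          # evaluation rollout length
        "replay_capacity": 1_000_000,
    },
}

# Merge shared defaults with one environment's overrides.
config = {**SHARED, **PER_ENV["tabletop_manipulation"]}
print(config)
```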