Optimistic Initialization for Exploration in Continuous Control

Authors: Sam Lobel, Omer Gottesman, Cameron Allen, Akhil Bagaria, George Konidaris

AAAI 2022, pp. 7612-7619 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate these approaches on a variety of hard exploration problems in continuous control, where our method outperforms existing exploration techniques. We empirically investigate our method's behavior on a variety of challenging sparse-reward continuous control problems, demonstrating state-of-the-art performance on a maze navigation domain and improved sample efficiency compared with exploratory baselines on sparse-reward tasks in the DeepMind Control Suite (Tassa et al. 2018).
Researcher Affiliation | Academia | Brown University; samuel_lobel@brown.edu, omer_gottesman@brown.edu, csal@brown.edu, akhil_bagaria@brown.edu, gdk@cs.brown.edu
Pseudocode | Yes | Algorithm 1: Iterative Covering Set Creation (a generic sketch of this style of construction appears after the table).
Open Source Code | Yes | All code used to generate results is included as supplementary material.
Open Datasets | Yes | Point Maze (Trott et al. 2019) is a challenging continuous control problem with sparse rewards... We test our method on modified versions of four sparse-reward tasks from the DeepMind Control Suite (Tassa et al. 2018): Pendulum, Hopper Stand, Acrobot, and Ball in Cup (Figure 6). A loading example for the standard versions of these tasks appears after the table.
Dataset Splits | No | The paper discusses training over a certain number of episodes (e.g., 'over 2000 episodes'), but it does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts for each split).
Hardware Specification | No | The paper states the work 'was conducted using computational resources and services at the Center for Computation and Visualization, Brown University.' This gives a general location but lacks specific hardware details such as GPU/CPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions integrating into an 'RBFDQN (Asadi et al. 2021) base agent' but does not provide version numbers for any software components (e.g., Python, PyTorch, TensorFlow, or other libraries/solvers).
Experiment Setup | No | The paper states 'Details on architectures, training procedures, resource usage and shaping functions are included in Appendix D.' While such details may exist in the appendix, the main text does not contain specific experimental setup details, such as concrete hyperparameter values or training configurations, as required by the question.
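
The Pseudocode row names Algorithm 1, "Iterative Covering Set Creation." The listing below is a minimal, hypothetical sketch of a generic greedy covering-set construction of that flavor; the candidate sampling, Euclidean distance, and `radius` threshold are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def iterative_covering_set(candidates, radius):
    """Greedy covering-set construction (hypothetical sketch).

    A candidate state is added to the set whenever it lies farther than
    `radius` (Euclidean distance) from every state already in the set,
    so every candidate ends up within `radius` of some set member.
    """
    cover = []
    for state in candidates:
        if all(np.linalg.norm(state - member) > radius for member in cover):
            cover.append(state)
    return np.array(cover)

# Usage: cover 1,000 random 2-D states with a radius of 0.1.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(1000, 2))
cover = iterative_covering_set(states, radius=0.1)
print(f"{len(cover)} covering states selected")
```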
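
The Open Datasets row lists four tasks from the DeepMind Control Suite. For reference, the standard (unmodified) counterparts of those tasks can be loaded with the `dm_control` package as sketched below; the paper evaluates modified sparse-reward variants, so this shows only baseline environment loading, not the authors' exact setup.

```python
from dm_control import suite  # pip install dm_control

# Standard Control Suite counterparts of the four tasks named in the row;
# the paper uses modified sparse-reward versions of these.
tasks = [
    ("pendulum", "swingup"),
    ("hopper", "stand"),
    ("acrobot", "swingup"),
    ("ball_in_cup", "catch"),
]

for domain, task in tasks:
    env = suite.load(domain_name=domain, task_name=task)
    timestep = env.reset()
    print(domain, task, list(timestep.observation.keys()))
```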