Optimistic Initialization for Exploration in Continuous Control

Authors: Sam Lobel, Omer Gottesman, Cameron Allen, Akhil Bagaria, George Konidaris

AAAI 2022, pp. 7612-7619 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate these approaches on a variety of hard exploration problems in continuous control, where our method outperforms existing exploration techniques. We empirically investigate our method's behavior on a variety of challenging sparse-reward continuous control problems, demonstrating state-of-the-art performance on a maze navigation domain and improved sample efficiency compared with exploratory baselines on sparse-reward tasks in the DeepMind Control Suite (Tassa et al. 2018).
Researcher Affiliation | Academia | Brown University; samuel_lobel@brown.edu, omer_gottesman@brown.edu, csal@brown.edu, akhil_bagaria@brown.edu, gdk@cs.brown.edu
Pseudocode | Yes | Algorithm 1: Iterative Covering Set Creation (a generic sketch of this style of construction appears after the table).
Open Source Code | Yes | All code used to generate results is included as supplementary material.
Open Datasets | Yes | Point Maze (Trott et al. 2019) is a challenging continuous control problem with sparse rewards... We test our method on modified versions of four sparse-reward tasks from the DeepMind Control Suite (Tassa et al. 2018): Pendulum, Hopper Stand, Acrobot, and Ball in Cup (Figure 6). A loading example for the standard versions of these tasks appears after the table.
Dataset Splits | No | The paper discusses training over a certain number of episodes (e.g., 'over 2000 episodes'), but it does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts for each split).
Hardware Specification | No | The paper states the work 'was conducted using computational resources and services at the Center for Computation and Visualization, Brown University.' This gives a general location but lacks specific hardware details such as GPU/CPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions integrating into an 'RBFDQN (Asadi et al. 2021) base agent' but does not provide version numbers for any software components (e.g., Python, PyTorch, TensorFlow, or other libraries/solvers).
Experiment Setup | No | The paper states 'Details on architectures, training procedures, resource usage and shaping functions are included in Appendix D.' While such details may exist in the appendix, the main text does not contain specific experimental setup details, such as concrete hyperparameter values or training configurations, as required by the question.
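
The Pseudocode row names Algorithm 1, "Iterative Covering Set Creation." The listing below is a minimal, hypothetical sketch of a generic greedy covering-set construction of that flavor; the candidate sampling, Euclidean distance, and `radius` threshold are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def iterative_covering_set(candidates, radius):
    """Greedy covering-set construction (hypothetical sketch).

    A candidate state is added to the set whenever it lies farther than
    `radius` (Euclidean distance) from every state already in the set,
    so every candidate ends up within `radius` of some set member.
    """
    cover = []
    for state in candidates:
        if all(np.linalg.norm(state - member) > radius for member in cover):
            cover.append(state)
    return np.array(cover)

# Usage: cover 1,000 random 2-D states with a radius of 0.1.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(1000, 2))
cover = iterative_covering_set(states, radius=0.1)
print(f"{len(cover)} covering states selected")
```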
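
The Open Datasets row lists four tasks from the DeepMind Control Suite. For reference, the standard (unmodified) counterparts of those tasks can be loaded with the `dm_control` package as sketched below; the paper evaluates modified sparse-reward variants, so this shows only baseline environment loading, not the authors' exact setup.

```python
from dm_control import suite  # pip install dm_control

# Standard Control Suite counterparts of the four tasks named in the row;
# the paper uses modified sparse-reward versions of these.
tasks = [
    ("pendulum", "swingup"),
    ("hopper", "stand"),
    ("acrobot", "swingup"),
    ("ball_in_cup", "catch"),
]

for domain, task in tasks:
    env = suite.load(domain_name=domain, task_name=task)
    timestep = env.reset()
    print(domain, task, list(timestep.observation.keys()))
```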