Optimistic Initialization for Exploration in Continuous Control
Authors: Sam Lobel, Omer Gottesman, Cameron Allen, Akhil Bagaria, George Konidaris
AAAI 2022, pp. 7612-7619
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate these approaches on a variety of hard exploration problems in continuous control, where our method outperforms existing exploration techniques. We empirically investigate our method's behavior on a variety of challenging sparse reward continuous control problems, demonstrating state-of-the-art performance on a maze navigation domain and improved sample-efficiency compared with exploratory baselines on sparse-reward tasks in the Deep Mind Control Suite (Tassa et al. 2018). |
| Researcher Affiliation | Academia | Brown University; samuel_lobel@brown.edu, omer_gottesman@brown.edu, csal@brown.edu, akhil_bagaria@brown.edu, gdk@cs.brown.edu |
| Pseudocode | Yes | Algorithm 1 Iterative Covering Set Creation |
| Open Source Code | Yes | All code used to generate results is included as supplementary material. |
| Open Datasets | Yes | Point Maze (Trott et al. 2019) is a challenging continuous control problem with sparse rewards... We test our method on modified versions of four sparse-reward tasks from the Deep Mind control suite (Tassa et al. 2018): Pendulum, Hopper Stand, Acrobot, and Ball in Cup (Figure 6). |
| Dataset Splits | No | The paper discusses training over a certain number of episodes (e.g., 'over 2000 episodes'), but it does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | No | The paper states 'was conducted using computational resources and services at the Center for Computation and Visualization, Brown University.' This provides a general location but lacks specific hardware details such as GPU/CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions integrating into an 'RBFDQN (Asadi et al. 2021) base agent' but does not provide specific version numbers for any software components (e.g., Python, PyTorch, TensorFlow, or other libraries/solvers). |
| Experiment Setup | No | The paper states 'Details on architectures, training procedures, resource usage and shaping functions are included in Appendix D.' While such details may exist in the appendix, the main text itself does not contain specific experimental setup details, such as concrete hyperparameter values or training configurations, as required by the question. |
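
The Pseudocode row above cites "Algorithm 1 Iterative Covering Set Creation", but the paper's listing is not reproduced in this report. As a rough, hypothetical illustration only: one common way to build a covering set over visited states is a greedy pass that keeps a state whenever it lies farther than a chosen radius from every state already kept. The function name `iterative_covering_set`, the `radius` parameter, and the use of Euclidean distance below are assumptions made for this sketch, not details taken from the paper or its supplementary code.

```python
import numpy as np

def iterative_covering_set(states, radius):
    """Greedily build a covering set: every input state ends up within
    `radius` of at least one member of the returned set.

    Hypothetical sketch of a covering-set construction; the paper's
    Algorithm 1 may differ in its candidate ordering, distance metric,
    and update schedule.
    """
    cover = []
    for s in states:
        # Keep s only if no existing member already covers it.
        if all(np.linalg.norm(s - c) > radius for c in cover):
            cover.append(s)
    return cover

# Example usage with random 2-D states (e.g., positions in a point maze).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visited = rng.uniform(-1.0, 1.0, size=(500, 2))
    cover = iterative_covering_set(visited, radius=0.25)
    print(f"{len(cover)} covering states for {len(visited)} visited states")
```

In this toy run, shrinking `radius` grows the covering set and raises the resolution at which visitation is tracked; the actual trade-off used in the paper is not specified in the excerpts quoted above.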