Benchmarking Deep Reinforcement Learning for Continuous Control
Authors: Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. |
| Researcher Affiliation | Collaboration | University of California, Berkeley, Department of Electrical Engineering and Computer Sciences; Ghent University - iMinds, Department of Information Technology; OpenAI |
| Pseudocode | No | The paper describes algorithms but does not provide pseudocode or algorithm blocks. |
| Open Source Code | Yes | Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers. |
| Open Datasets | Yes | The benchmark and reference implementations are available at https://github.com/rllab/rllab, allowing for the development, implementation, and evaluation of new algorithms and tasks. |
| Dataset Splits | No | The paper mentions 'Hyperparameter Tuning' and selecting tasks for grid search but does not specify dataset splits (e.g., train/validation/test splits) for reproduction. |
| Hardware Specification | No | The paper does not specify the hardware used for experiments. |
| Software Dependencies | No | The paper mentions using Box2D (Catto, 2011) and MuJoCo (Todorov et al., 2012) physics simulators but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In this section, we elaborate on the experimental setup used to generate the results. For the DDPG algorithm, we used the hyperparameters reported in Lillicrap et al. (2015). For the other algorithms, we follow the approach in Mnih et al. (2015), and we select two tasks in each category, on which a grid search of hyperparameters is performed. Each choice of hyperparameters is executed under five random seeds. |
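
The tuning procedure quoted in the Experiment Setup row (a grid search over hyperparameters on a few selected tasks, with each configuration run under five random seeds) can be illustrated with a minimal sketch. This is not the rllab implementation. The names `grid_search` and `toy_trial`, the task labels, the grid values, and the choice to rank configurations by mean score are all assumptions made for illustration; the quoted text does not state the selection criterion.

```python
import itertools
import statistics


def grid_search(run_trial, grid, tasks, seeds=(0, 1, 2, 3, 4)):
    """Try every hyperparameter combination on every task and seed.

    Ranking by the mean score over all (task, seed) runs is an assumption
    of this sketch, not a criterion taken from the quoted text.
    """
    names = sorted(grid)
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        # One run per (task, seed) pair; five seeds by default.
        scores = [run_trial(task, config, seed) for task in tasks for seed in seeds]
        score = statistics.mean(scores)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_trial(task, config, seed):
    """Hypothetical stand-in for training an agent and returning its score.

    It ignores the task and seed and simply peaks at step_size=0.01 and
    discount=0.99, so the search has a known answer.
    """
    return -abs(config["step_size"] - 0.01) - abs(config["discount"] - 0.99)


# Hypothetical grid and task labels, chosen only for the example.
grid = {"step_size": [0.1, 0.01, 0.001], "discount": [0.95, 0.99]}
tasks = ["task_a", "task_b"]
best_config, best_score = grid_search(toy_trial, grid, tasks)
print(best_config, best_score)
```

In a real experiment, `run_trial` would train the algorithm on the given task with the given seed and return a performance measure such as average return; the seed loop is what makes each hyperparameter choice comparable across random initializations.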