Structured Control Nets for Deep Reinforcement Learning
Authors: Mario Srouji, Jian Zhang, Ruslan Salakhutdinov
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validated our hypothesis with competitive results on simulations from OpenAI MuJoCo, Roboschool, Atari, and a custom urban driving environment, with various ablation and generalization tests, trained with multiple black-box and policy gradient training methods. |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213. ²Apple Inc., 1 Infinite Loop, Cupertino, CA 95014. |
| Pseudocode | No | The paper describes the architecture and experimental procedures in text but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured, code-like steps. |
| Open Source Code | No | The paper states, 'For PPO and ACKTR, we use the same hyper-parameters and algorithm implementation from OpenAI Baselines (Dhariwal et al., 2017).' This refers to a third-party codebase used by the authors, not their own implementation code for the Structured Control Net (SCN). The paper does not provide any link or explicit statement about the availability of the SCN's source code. |
| Open Datasets | Yes | We conduct experiments on several benchmarks, shown in Figure 2, including OpenAI MuJoCo v1 (Todorov et al., 2012), OpenAI Roboschool v1 (OpenAI, 2017), and Atari Games (Bellemare et al., 2013). |
| Dataset Splits | No | The paper mentions training networks for '2M timesteps' and 'averaged over 5 training runs with random seeds from 1 to 5', but it does not specify any explicit train/validation/test dataset splits or cross-validation methodology. |
| Hardware Specification | Yes | For our ES implementation, we use an efficient shared-memory implementation on a single machine with 48 cores. |
| Software Dependencies | No | The paper states, 'For PPO and ACKTR, we use the same hyper-parameters and algorithm implementation from OpenAI Baselines (Dhariwal et al., 2017).' While it names a software project, it does not provide specific version numbers for OpenAI Baselines or any other software components (e.g., Python, TensorFlow, or PyTorch libraries) used in the experiments. |
| Experiment Setup | Yes | For our ES implementation, we use an efficient shared-memory implementation on a single machine with 48 cores. We set the noise standard deviation and learning rate as 0.1 and 0.01, respectively, and the number of workers to 30. [...] For each experiment, we trained each network for 2M timesteps and averaged over 5 training runs with random seeds from 1 to 5 to obtain each learning curve. (A hedged sketch of this ES loop follows the table.) |
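
The quoted setup fixes the ES hyper-parameters (noise standard deviation 0.1, learning rate 0.01, 30 workers) and the evaluation protocol (5 runs with seeds 1 to 5), but not the surrounding training loop. The snippet below is a minimal sketch of how those values slot into a vanilla evolution-strategies update; the `episode_return` objective and the 8-parameter policy are hypothetical stand-ins for the paper's SCN rollouts in MuJoCo/Roboschool, and a real run would continue until roughly 2M environment timesteps rather than a fixed iteration count.

```python
import numpy as np

SIGMA = 0.1          # noise standard deviation (from the paper)
ALPHA = 0.01         # learning rate (from the paper)
N_WORKERS = 30       # number of perturbation workers (from the paper)
SEEDS = range(1, 6)  # random seeds 1..5, one per training run

def episode_return(theta):
    """Hypothetical stand-in for a rollout: a real run would execute the
    SCN policy in the simulator and return the episode reward."""
    return -np.sum(theta ** 2)  # toy objective, maximized at theta = 0

def es_update(theta, rng):
    """One vanilla ES step: perturb, evaluate, and follow the
    score-function gradient estimate."""
    eps = rng.standard_normal((N_WORKERS, theta.size))  # one perturbation per worker
    rewards = np.array([episode_return(theta + SIGMA * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # normalize returns
    grad = eps.T @ rewards / (N_WORKERS * SIGMA)        # gradient estimate
    return theta + ALPHA * grad

curves = []
for seed in SEEDS:
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(8)    # toy 8-parameter "policy"
    history = []
    for _ in range(200):              # a real run trains to ~2M env timesteps
        theta = es_update(theta, rng)
        history.append(episode_return(theta))
    curves.append(history)

mean_curve = np.mean(curves, axis=0)  # learning curve averaged over the 5 seeds
print(f"final averaged return: {mean_curve[-1]:.4f}")
```

In the paper's shared-memory setup, the per-worker evaluations inside `es_update` would run in parallel across the 48 cores rather than in this serial list comprehension.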