Structured Control Nets for Deep Reinforcement Learning

Authors: Mario Srouji, Jian Zhang, Ruslan Salakhutdinov

ICML 2018

Reproducibility assessment. Each entry below lists the variable, the assessed result, and the LLM's supporting response.

Research Type: Experimental. We validated our hypothesis with competitive results on simulations from OpenAI MuJoCo, Roboschool, Atari, and a custom urban driving environment, with various ablation and generalization tests, trained with multiple black-box and policy gradient training methods.

Researcher Affiliation: Collaboration. (1) Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213; (2) Apple Inc., 1 Infinite Loop, Cupertino, CA 95014.

Pseudocode: No. The paper describes the architecture and experimental procedures in text, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured, code-like steps.
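Since the paper provides no pseudocode, the following is a minimal sketch of the architecture as the paper's text describes it: the action is the sum of a linear control module and a nonlinear (MLP) control module acting on the same observation. The class name, hidden size, and activation below are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of a Structured Control Net (SCN): the policy output is the
# additive combination of a linear control term and a nonlinear MLP term.
# Hidden size and Tanh activation are assumptions for illustration.
import torch
import torch.nn as nn

class StructuredControlNet(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 16):
        super().__init__()
        # Linear control module: u_linear = K @ s + b
        self.linear = nn.Linear(obs_dim, act_dim)
        # Nonlinear control module: a small MLP over the same observation
        self.nonlinear = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # The final action adds the two control terms
        return self.linear(obs) + self.nonlinear(obs)
```

Because only the summed action is exposed to the optimizer, such a network can be trained like any monolithic policy, which is consistent with the paper's use of both black-box (ES) and policy gradient (PPO, ACKTR) training methods.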
Open Source Code: No. The paper states, 'For PPO and ACKTR, we use the same hyper-parameters and algorithm implementation from OpenAI Baselines (Dhariwal et al., 2017).' This refers to a third-party codebase used by the authors, not their own implementation of the Structured Control Net (SCN); the paper provides no link or explicit statement about the availability of the SCN source code.

Open Datasets: Yes. We conduct experiments on several benchmarks, shown in Figure 2, including OpenAI MuJoCo v1 (Todorov et al., 2012), OpenAI Roboschool v1 (OpenAI, 2017), and Atari Games (Bellemare et al., 2013).
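All three benchmark suites are loadable through the OpenAI Gym API of that era; a minimal sketch, assuming v1-era environment IDs and the pre-0.26 reset/step interface (the specific ID below is our assumption, not taken from the paper):

```python
# Hedged sketch: instantiating the cited benchmarks via the v1-era Gym API.
# Requires the old `gym` package plus mujoco-py / roboschool / atari-py.
import gym
# import roboschool  # importing registers the Roboschool* environment IDs

env = gym.make("Hopper-v1")   # e.g., a MuJoCo v1 locomotion task
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```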
Dataset Splits: No. The paper mentions training networks for '2M timesteps' and 'averaged over 5 training runs with random seeds from 1 to 5', but it does not specify any explicit train/validation/test dataset splits or a cross-validation methodology.

Hardware Specification: Yes. For our ES implementation, we use an efficient shared-memory implementation on a single machine with 48 cores.

Software Dependencies: No. The paper states, 'For PPO and ACKTR, we use the same hyper-parameters and algorithm implementation from OpenAI Baselines (Dhariwal et al., 2017).' While it names a software project, it does not provide specific version numbers for OpenAI Baselines or for any other software components (e.g., Python, TensorFlow, or PyTorch libraries) used in the experiments.

Experiment Setup: Yes. For our ES implementation, we use an efficient shared-memory implementation on a single machine with 48 cores. We set the noise standard deviation and learning rate as 0.1 and 0.01, respectively, and the number of workers to 30. [...] For each experiment, we trained each network for 2M timesteps and averaged over 5 training runs with random seeds from 1 to 5 to obtain each learning curve.
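For context on the quoted setup, here is a minimal sketch of a Salimans-style Evolution Strategies update using the reported hyper-parameters (noise standard deviation 0.1, learning rate 0.01, 30 workers); `evaluate_return` is a hypothetical stand-in for one policy rollout, and refinements such as mirrored sampling or rank normalization are omitted:

```python
# Hedged sketch of an Evolution Strategies update with the paper's reported
# hyper-parameters. evaluate_return() is a hypothetical rollout function.
import numpy as np

SIGMA = 0.1       # noise standard deviation (reported)
ALPHA = 0.01      # learning rate (reported)
N_WORKERS = 30    # number of perturbations per update (reported)

def es_step(theta, evaluate_return, rng):
    # One Gaussian perturbation per worker
    eps = rng.standard_normal((N_WORKERS, theta.size))
    returns = np.array([evaluate_return(theta + SIGMA * e) for e in eps])
    # Normalize returns so the update is invariant to reward scale
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Gradient estimate: reward-weighted average of the noise directions
    grad = eps.T @ returns / (N_WORKERS * SIGMA)
    return theta + ALPHA * grad
```

Repeating such a training run for 2M timesteps with seeds 1 through 5 (e.g., `np.random.default_rng(seed)`) and averaging the resulting learning curves would match the evaluation protocol quoted above.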