NerveNet: Learning Structured Policy with Graph Neural Networks
Authors: Tingwu Wang, Renjie Liao, Jimmy Ba, Sanja Fidler
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we first show that our NerveNet is comparable to state-of-the-art methods on standard MuJoCo environments. We further propose our customized reinforcement learning environments for benchmarking two types of structure transfer learning tasks, i.e., size and disability transfer, as well as multi-task learning. We demonstrate that policies learned by NerveNet are significantly more transferable and generalizable than policies learned by other models and are able to transfer even in a zero-shot setting. |
| Researcher Affiliation | Academia | Tingwu Wang, Renjie Liao, Jimmy Ba & Sanja Fidler, Department of Computer Science, University of Toronto, Vector Institute. {tingwuwang,rjliao}@cs.toronto.edu, jimmy@psi.toronto.edu, fidler@cs.toronto.edu |
| Pseudocode | No | The paper describes the architecture and various components of NerveNet using mathematical equations and textual explanations (e.g., in Sections 2.2.1, 2.2.2, 2.2.3), but it does not include a distinct pseudocode block or a section explicitly labeled 'Algorithm' or 'Pseudocode'. A hedged sketch of the propagation step those equations describe appears below the table. |
| Open Source Code | Yes | The demo and code for this project are released under the project page at http://www.cs.toronto.edu/~tingwuwang/nervenet.html. |
| Open Datasets | Yes | We first evaluate our NerveNet on standard RL benchmarks such as the OpenAI Gym (Brockman et al., 2016), which stem from MuJoCo. |
| Dataset Splits | No | The paper describes training procedures and parameters, such as 'We set the maximum number of training steps to be 1 million for all environments', and discusses validation of its model variants (Section 4.6), but it does not provide explicit train/validation/test splits, such as percentages or sample counts for each split. |
| Hardware Specification | No | The paper discusses the use of the MuJoCo engine for simulations but does not specify any hardware details such as CPU or GPU models, memory, or specific cloud computing resources used to run the experiments. |
| Software Dependencies | No | The paper mentions software environments and frameworks such as 'OpenAI Gym (Brockman et al., 2016)', 'MuJoCo (Todorov et al., 2012)', and 'proximal policy optimization (PPO) by Schulman et al. (2017)'. However, it does not provide specific version numbers for these or any other software components, which is necessary for reproducibility; a sketch for recording such versions appears below the table. |
| Experiment Setup | Yes | We do grid search to find the best hyperparameters and leave the details in the Appendix 6.3. As the randomness might have a big impact on the performance, for each environment, we run 3 experiments with different random seeds and plot the average curves and the standard deviations. In Appendix 6.1: 'Value Discount Factor γ .99, GAE λ .95, PPO Clip Value 0.2, Starting Learning Rate 3e-4, Gradient Clip Value 5.0, Target KL 0.01'. Appendix 6.3: 'Network Shape [64, 64], [128, 128], [256, 256]', 'Number of Iterations Per Update 10, 20'. These settings are collected into a config sketch below the table. |
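
Since the paper presents NerveNet's propagation model only as equations (Sections 2.2.1–2.2.3), the following is a minimal, hedged sketch of one message-passing step over the agent's morphology graph. The function name, the tanh nonlinearities, and the single weight matrices (standing in for the paper's MLP message function and GRU update) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def propagate(h, edges, W_msg, W_upd):
    """One NerveNet-style propagation step (sketch, not the authors' code).

    h      : (num_nodes, d) array of node hidden states
    edges  : list of (sender, receiver) index pairs in the morphology graph
    W_msg  : (d, d) message weights (stand-in for the paper's message MLP)
    W_upd  : (2*d, d) update weights (stand-in for the paper's GRU update)
    """
    # 1. Each node computes an outgoing message from its current state.
    msgs = np.tanh(h @ W_msg)
    # 2. Each node sums the messages arriving from its in-neighbours.
    agg = np.zeros_like(h)
    for sender, receiver in edges:
        agg[receiver] += msgs[sender]
    # 3. Each node updates its state from its old state and aggregated message.
    return np.tanh(np.concatenate([h, agg], axis=1) @ W_upd)

# Toy morphology: a torso (node 0) connected to two legs (nodes 1 and 2).
rng = np.random.default_rng(0)
d = 8
h = rng.normal(size=(3, d))
edges = [(0, 1), (1, 0), (0, 2), (2, 0)]
h = propagate(h, edges, rng.normal(size=(d, d)), rng.normal(size=(2 * d, d)))
```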
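Because the paper names Gym, MuJoCo, and PPO without pinning versions, a reproduction should at least record the versions it actually ran with. Below is a minimal sketch using setuptools' pkg_resources; the package list is an assumption about what a NerveNet reproduction would install, not something the paper specifies.

```python
import pkg_resources  # ships with setuptools

# Packages a reproduction would plausibly depend on (assumed, not from the paper).
for pkg in ("gym", "mujoco-py", "tensorflow", "numpy"):
    try:
        print(pkg, pkg_resources.get_distribution(pkg).version)
    except pkg_resources.DistributionNotFound:
        print(pkg, "not installed")
```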
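Finally, the hyperparameters quoted from Appendices 6.1 and 6.3 can be collected into a single configuration, which makes a reproduction attempt easier to audit. Only the values come from the paper; the key names are illustrative.

```python
# PPO settings reported in Appendix 6.1 (values from the paper; key names assumed).
ppo_config = {
    "gamma": 0.99,           # value discount factor
    "gae_lambda": 0.95,      # GAE lambda
    "clip_ratio": 0.2,       # PPO clip value
    "learning_rate": 3e-4,   # starting learning rate
    "grad_clip": 5.0,        # gradient clip value
    "target_kl": 0.01,       # target KL divergence
    "max_steps": 1_000_000,  # maximum training steps per environment
    "num_seeds": 3,          # independent runs per environment
}

# Grid-searched choices reported in Appendix 6.3.
search_space = {
    "network_shape": [[64, 64], [128, 128], [256, 256]],
    "iterations_per_update": [10, 20],
}
```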