NerveNet: Learning Structured Policy with Graph Neural Networks
Authors: Tingwu Wang, Renjie Liao, Jimmy Ba, Sanja Fidler
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we first show that our NerveNet is comparable to state-of-the-art methods on standard MuJoCo environments. We further propose our customized reinforcement learning environments for benchmarking two types of structure transfer learning tasks, i.e., size and disability transfer, as well as multi-task learning. We demonstrate that policies learned by NerveNet are significantly more transferable and generalizable than policies learned by other models and are able to transfer even in a zero-shot setting. |
| Researcher Affiliation | Academia | Tingwu Wang, Renjie Liao, Jimmy Ba & Sanja Fidler, Department of Computer Science, University of Toronto, Vector Institute. {tingwuwang,rjliao}@cs.toronto.edu, jimmy@psi.toronto.edu, fidler@cs.toronto.edu |
| Pseudocode | No | The paper describes the architecture and various components of NerveNet using mathematical equations and textual explanations (e.g., in Sections 2.2.1, 2.2.2, 2.2.3), but it does not include a distinct pseudocode block or a section explicitly labeled 'Algorithm' or 'Pseudocode'. A hedged sketch of the propagation step those equations describe appears below the table. |
| Open Source Code | Yes | The demo and code for this project are released under the project page at http://www.cs.toronto.edu/~tingwuwang/nervenet.html. |
| Open Datasets | Yes | We first evaluate our NerveNet on standard RL benchmarks such as the OpenAI Gym (Brockman et al., 2016), which stem from MuJoCo. |
| Dataset Splits | No | The paper describes training procedures and parameters, such as 'We set the maximum number of training steps to be 1 million for all environments', and discusses validation of its model variants (Section 4.6), but it does not provide explicit train/validation/test splits, such as percentages or sample counts for each split. |
| Hardware Specification | No | The paper discusses the use of the MuJoCo engine for simulations but does not specify any hardware details such as CPU or GPU models, memory, or specific cloud computing resources used to run the experiments. |
| Software Dependencies | No | The paper mentions software environments and frameworks such as 'OpenAI Gym (Brockman et al., 2016)', 'MuJoCo (Todorov et al., 2012)', and 'proximal policy optimization (PPO) by Schulman et al. (2017)'. However, it does not provide specific version numbers for these or any other software components, which is necessary for reproducibility; a sketch for recording such versions appears below the table. |
| Experiment Setup | Yes | We do grid search to find the best hyperparameters and leave the details in the Appendix 6.3. As the randomness might have a big impact on the performance, for each environment, we run 3 experiments with different random seeds and plot the average curves and the standard deviations. In Appendix 6.1: 'Value Discount Factor γ .99, GAE λ .95, PPO Clip Value 0.2, Starting Learning Rate 3e-4, Gradient Clip Value 5.0, Target KL 0.01'. Appendix 6.3: 'Network Shape [64, 64], [128, 128], [256, 256]', 'Number of Iterations Per Update 10, 20'. These settings are collected into a config sketch below the table. |
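
Since the paper presents NerveNet's propagation model only as equations (Sections 2.2.1–2.2.3), the following is a minimal, hedged sketch of one message-passing step over the agent's morphology graph. The function name, the tanh nonlinearities, and the single weight matrices (standing in for the paper's MLP message function and GRU update) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def propagate(h, edges, W_msg, W_upd):
    """One NerveNet-style propagation step (sketch, not the authors' code).

    h      : (num_nodes, d) array of node hidden states
    edges  : list of (sender, receiver) index pairs in the morphology graph
    W_msg  : (d, d) message weights (stand-in for the paper's message MLP)
    W_upd  : (2*d, d) update weights (stand-in for the paper's GRU update)
    """
    # 1. Each node computes an outgoing message from its current state.
    msgs = np.tanh(h @ W_msg)
    # 2. Each node sums the messages arriving from its in-neighbours.
    agg = np.zeros_like(h)
    for sender, receiver in edges:
        agg[receiver] += msgs[sender]
    # 3. Each node updates its state from its old state and aggregated message.
    return np.tanh(np.concatenate([h, agg], axis=1) @ W_upd)

# Toy morphology: a torso (node 0) connected to two legs (nodes 1 and 2).
rng = np.random.default_rng(0)
d = 8
h = rng.normal(size=(3, d))
edges = [(0, 1), (1, 0), (0, 2), (2, 0)]
h = propagate(h, edges, rng.normal(size=(d, d)), rng.normal(size=(2 * d, d)))
```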
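Because the paper names Gym, MuJoCo, and PPO without pinning versions, a reproduction should at least record the versions it actually ran with. Below is a minimal sketch using setuptools' pkg_resources; the package list is an assumption about what a NerveNet reproduction would install, not something the paper specifies.

```python
import pkg_resources  # ships with setuptools

# Packages a reproduction would plausibly depend on (assumed, not from the paper).
for pkg in ("gym", "mujoco-py", "tensorflow", "numpy"):
    try:
        print(pkg, pkg_resources.get_distribution(pkg).version)
    except pkg_resources.DistributionNotFound:
        print(pkg, "not installed")
```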
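Finally, the hyperparameters quoted from Appendices 6.1 and 6.3 can be collected into a single configuration, which makes a reproduction attempt easier to audit. Only the values come from the paper; the key names are illustrative.

```python
# PPO settings reported in Appendix 6.1 (values from the paper; key names assumed).
ppo_config = {
    "gamma": 0.99,           # value discount factor
    "gae_lambda": 0.95,      # GAE lambda
    "clip_ratio": 0.2,       # PPO clip value
    "learning_rate": 3e-4,   # starting learning rate
    "grad_clip": 5.0,        # gradient clip value
    "target_kl": 0.01,       # target KL divergence
    "max_steps": 1_000_000,  # maximum training steps per environment
    "num_seeds": 3,          # independent runs per environment
}

# Grid-searched choices reported in Appendix 6.3.
search_space = {
    "network_shape": [[64, 64], [128, 128], [256, 256]],
    "iterations_per_update": [10, 20],
}
```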