Snowflake: Scaling GNNs to high-dimensional continuous control via parameter freezing
Authors: Charlie Blake, Vitaly Kurin, Maximilian Igl, Shimon Whiteson
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we show that applying SNOWFLAKE to NERVENET dramatically improves asymptotic performance and sample complexity on such tasks. We also demonstrate that a policy trained using SNOWFLAKE exhibits improved zero-shot transfer compared to regular NERVENET or MLPs on high-dimensional tasks. Figure 2: Comparison of the scaling of NERVENET relative to an MLP-based policy. Figure 6: Comparison of the performance of SNOWFLAKE training, regular NERVENET and the MLP-based policy. |
| Researcher Affiliation | Academia | Charlie Blake (University of Oxford, thecharlieblake@gmail.com); Vitaly Kurin (University of Oxford, vitaly.kurin@cs.ox.ac.uk); Maximilian Igl (University of Oxford, maximilian.igl@gmail.com); Shimon Whiteson (University of Oxford, shimon.whiteson@cs.ox.ac.uk) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper uses well-known MuJoCo and Gym environments (e.g., Centipede-n agents), but it does not provide explicit access information (link, DOI, or formal citation) for these environments in the way one would for a traditional dataset. |
| Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits. It discusses training steps, batch sizes, and evaluation on different agent sizes, but not formal dataset splits. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used for running its experiments, beyond acknowledging a grant from NVIDIA. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies. It mentions PPO, the Adam optimizer, Gym, and MuJoCo, but without versions. |
| Experiment Setup | Yes | NERVENET assumes an MDP where the state s can be factored into input labels V, which are fed to the GNN to generate output labels: V′ = NERVENET(G, V). These are then used to parameterise a normal distribution defining the stochastic policy: π(a\|s) = N(V′, diag(σ²)), where the standard deviation σ is a separate vector of parameters learned during training. The policy is trained using PPO, with parameter updates computed via the Adam optimisation algorithm [25]. Figure 3: Final performance of NERVENET on Centipede-20 after ten million timesteps, across a range of clipping hyperparameter values. We use mostly the same experimental setup as Wang et al. [55], with details of any differences and our choice of hyperparameters outlined in Appendix A.2. |
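
The Experiment Setup row describes the NerveNet-style Gaussian policy and Snowflake's parameter freezing only in prose. The sketch below is a minimal, hypothetical illustration rather than the authors' implementation: per-node observations pass through shared GNN weights, the resulting output labels V′ parameterise a diagonal Gaussian π(a|s) = N(V′, diag(σ²)) with a separately learned log-σ vector, and freezing is approximated by disabling gradients on a chosen parameter subset. The class and function names (`NerveNetStylePolicy`, `freeze_subset`), the single round of message passing, and the particular modules chosen for freezing are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' code) of a NerveNet-style policy with
# Snowflake-like parameter freezing. Names, the one-round propagation, and the
# choice of frozen modules are illustrative assumptions.
import torch
import torch.nn as nn
from torch.distributions import Normal


class NerveNetStylePolicy(nn.Module):
    def __init__(self, node_obs_dim: int, num_nodes: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Linear(node_obs_dim, hidden_dim)   # shared across nodes
        self.message = nn.Linear(hidden_dim, hidden_dim)      # shared message function
        self.update = nn.GRUCell(hidden_dim, hidden_dim)      # shared node-update function
        self.decoder = nn.Linear(hidden_dim, 1)               # one action per node
        self.log_std = nn.Parameter(torch.zeros(num_nodes))   # separate learned std vector

    def forward(self, node_obs: torch.Tensor, adj: torch.Tensor) -> Normal:
        # node_obs: (num_nodes, node_obs_dim); adj: (num_nodes, num_nodes) 0/1 adjacency
        h = torch.tanh(self.encoder(node_obs))
        msgs = adj @ self.message(h)        # aggregate messages from neighbouring nodes
        h = self.update(msgs, h)            # one simplified round of propagation
        mean = self.decoder(h).squeeze(-1)  # output labels V' become the action means
        std = self.log_std.exp()
        return Normal(mean, std)            # pi(a|s) = N(V', diag(sigma^2))


def freeze_subset(policy: NerveNetStylePolicy) -> None:
    """Illustrative 'parameter freezing': stop gradients through the shared
    propagation weights while the rest of the policy keeps training."""
    for module in (policy.message, policy.update):
        for p in module.parameters():
            p.requires_grad_(False)
```

In a PPO training loop under this sketch, only the unfrozen parameters would be passed to Adam, e.g. `torch.optim.Adam([p for p in policy.parameters() if p.requires_grad], lr=3e-4)`; which parameters Snowflake actually freezes, and the remaining hyperparameters, are specified in the paper and its Appendix A.2.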