Snowflake: Scaling GNNs to high-dimensional continuous control via parameter freezing

Authors: Charles Blake, Vitaly Kurin, Maximilian Igl, Shimon Whiteson

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, we show that applying SNOWFLAKE to NERVENET dramatically improves asymptotic performance and sample complexity on such tasks. We also demonstrate that a policy trained using SNOWFLAKE exhibits improved zero-shot transfer compared to regular NERVENET or MLPs on high-dimensional tasks. Figure 2: Comparison of the scaling of NERVENET relative to an MLP-based policy. Figure 6: Comparison of the performance of SNOWFLAKE training, regular NERVENET and the MLP-based policy.
Researcher Affiliation | Academia | Charlie Blake, University of Oxford, thecharlieblake@gmail.com; Vitaly Kurin, University of Oxford, vitaly.kurin@cs.ox.ac.uk; Maximilian Igl, University of Oxford, maximilian.igl@gmail.com; Shimon Whiteson, University of Oxford, shimon.whiteson@cs.ox.ac.uk
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper uses MuJoCo and Gym environments (e.g., Centipede-n agents), which are well known, but it does not provide explicit access information (link, DOI, or formal citation) for these environments as a data source that needs to be accessed like a traditional dataset.
Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits. It discusses training steps, batch sizes, and evaluation on different agent sizes, but not formal dataset splits.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used for running its experiments, beyond acknowledging a grant from NVIDIA.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies. It mentions PPO, the Adam optimizer, Gym, and MuJoCo, but without versions.
Experiment Setup | Yes | NERVENET assumes an MDP where the state s can be factored into input labels V, which are fed to the GNN to generate output labels: V′ = NERVENET(G, V). These are then used to parameterise a normal distribution defining the stochastic policy: π(a|s) = N(V′, diag(σ²)), where the standard deviation σ is a separate vector of parameters learned during training. The policy is trained using PPO, with parameter updates computed via the Adam optimisation algorithm [25]. Figure 3: Final performance of NERVENET on Centipede-20 after ten million timesteps, across a range of clipping hyperparameter values. We use mostly the same experimental setup as Wang et al. [55], with details of any differences and our choice of hyperparameters outlined in Appendix A.2.
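The quoted setup describes a diagonal-Gaussian policy head: the GNN's per-joint outputs V′ serve as the mean, and the standard deviation σ is a separately learned parameter vector. The following is a minimal stdlib-only sketch of that sampling and log-probability computation (not the authors' code; the function name and the use of a log-σ parameterisation are illustrative assumptions):

```python
import math
import random

def gaussian_policy_sample(mean, log_std, rng=random):
    """Sample a ~ N(mean, diag(exp(log_std)^2)) and return (action, log_prob).

    `mean` stands in for the GNN output labels V' (one entry per joint);
    `log_std` is the separately learned standard-deviation parameter vector,
    stored in log space so the std stays positive during optimisation.
    """
    action, log_prob = [], 0.0
    for m, ls in zip(mean, log_std):
        std = math.exp(ls)
        a = rng.gauss(m, std)
        action.append(a)
        # Per-dimension log-density of the diagonal Gaussian:
        # log N(a | m, std^2) = -0.5*log(2*pi) - log(std) - (a - m)^2 / (2*std^2)
        log_prob += -0.5 * math.log(2 * math.pi) - ls - 0.5 * ((a - m) / std) ** 2
    return action, log_prob
```

In a PPO update, the per-action log-probabilities returned here would feed the clipped probability-ratio objective; the log-σ parameterisation is a common convention rather than something the paper specifies.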