Snowflake: Scaling GNNs to high-dimensional continuous control via parameter freezing

Authors: Charles Blake, Vitaly Kurin, Maximilian Igl, Shimon Whiteson

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, we show that applying SNOWFLAKE to NERVENET dramatically improves asymptotic performance and sample complexity on such tasks. We also demonstrate that a policy trained using SNOWFLAKE exhibits improved zero-shot transfer compared to regular NERVENET or MLPs on high-dimensional tasks. Figure 2: Comparison of the scaling of NERVENET relative to an MLP-based policy. Figure 6: Comparison of the performance of SNOWFLAKE training, regular NERVENET and the MLP-based policy.
Researcher Affiliation | Academia | Charlie Blake, University of Oxford, thecharlieblake@gmail.com; Vitaly Kurin, University of Oxford, vitaly.kurin@cs.ox.ac.uk; Maximilian Igl, University of Oxford, maximilian.igl@gmail.com; Shimon Whiteson, University of Oxford, shimon.whiteson@cs.ox.ac.uk
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper uses MuJoCo and Gym environments (e.g., Centipede-n agents), which are well known, but it does not provide explicit access information (link, DOI, or formal citation) for these environments as a data source that needs to be accessed like a traditional dataset.
Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits. It discusses training steps, batch sizes, and evaluation on different agent sizes, but not formal dataset splits.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used for running its experiments, beyond acknowledging a grant from NVIDIA.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies. It mentions PPO, the Adam optimizer, Gym, and MuJoCo, but without versions.
Experiment Setup | Yes | NERVENET assumes an MDP where the state s can be factored into input labels V, which are fed to the GNN to generate output labels: V′ = NERVENET(G, V). These are then used to parameterise a normal distribution defining the stochastic policy: π(a|s) = N(V′, diag(σ²)), where the standard deviation σ is a separate vector of parameters learned during training. The policy is trained using PPO, with parameter updates computed via the Adam optimisation algorithm [25]. Figure 3: Final performance of NERVENET on Centipede-20 after ten million timesteps, across a range of clipping hyperparameter values. We use mostly the same experimental setup as Wang et al. [55], with details of any differences and our choice of hyperparameters outlined in Appendix A.2.
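The quoted setup describes a diagonal-Gaussian policy head: the GNN's per-joint outputs V′ serve as the mean, and the standard deviation σ is a separately learned parameter vector. The following is a minimal stdlib-only sketch of that sampling and log-probability computation (not the authors' code; the function name and the use of a log-σ parameterisation are illustrative assumptions):

```python
import math
import random

def gaussian_policy_sample(mean, log_std, rng=random):
    """Sample a ~ N(mean, diag(exp(log_std)^2)) and return (action, log_prob).

    `mean` stands in for the GNN output labels V' (one entry per joint);
    `log_std` is the separately learned standard-deviation parameter vector,
    stored in log space so the std stays positive during optimisation.
    """
    action, log_prob = [], 0.0
    for m, ls in zip(mean, log_std):
        std = math.exp(ls)
        a = rng.gauss(m, std)
        action.append(a)
        # Per-dimension log-density of the diagonal Gaussian:
        # log N(a | m, std^2) = -0.5*log(2*pi) - log(std) - (a - m)^2 / (2*std^2)
        log_prob += -0.5 * math.log(2 * math.pi) - ls - 0.5 * ((a - m) / std) ** 2
    return action, log_prob
```

In a PPO update, the per-action log-probabilities returned here would feed the clipped probability-ratio objective; the log-σ parameterisation is a common convention rather than something the paper specifies.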