Towards Training Billion Parameter Graph Neural Networks for Atomic Simulations

Authors: Anuroop Sriram, Abhishek Das, Brandon M. Wood, Siddharth Goyal, C. Lawrence Zitnick

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate our method by scaling up the number of parameters of the recently proposed DimeNet++ and GemNet models by over an order of magnitude. On the large-scale Open Catalyst 2020 (OC20) dataset, these graph-parallelized models lead to relative improvements of 1) 15% on the force MAE metric for the S2EF task and 2) 21% on the AFbT metric for the IS2RS task, establishing new state-of-the-art results.
Researcher Affiliation | Collaboration | Meta FAIR; National Energy Research Scientific Computing Center (NERSC); {anuroops,abhshkdz,bmwood,sidgoyal,zitnick}@fb.com
Pseudocode | No | No section or figure explicitly labeled "Pseudocode" or "Algorithm" was found, nor were any structured code-like blocks present.
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the source code for the methodology described in the paper, nor does it provide a direct link to such a repository. The links provided are for the Open Catalyst leaderboard and a discussion forum, which do not contain the specific code for their Graph Parallelism method.
Open Datasets | Yes | We benchmark our approach by scaling up two recent GNN architectures, DimeNet++ (Klicpera et al., 2020a) and GemNet-T (Klicpera et al., 2021), on the Open Catalyst (OC20) dataset (Chanussot* et al., 2021). The OC20 dataset, aimed at discovering new catalyst materials for renewable energy storage, consists of 134M training examples spanning a wide range of adsorbates and catalyst materials.
Dataset Splits | No | The paper mentions "validation error plateaus", implying a validation set was used, and states the OC20 dataset consists of "134M training examples". However, it does not explicitly provide the specific percentages, sample counts, or methodology for creating the training, validation, and test splits within the paper itself.
Hardware Specification | Yes | The model was trained with an effective batch size of 128 on 256 Volta 32GB GPUs with a combination of data parallel and graph parallel training... Finally, figure 2b shows the raw performance of running these models on V100 GPUs... We estimate that training our GemNet-XL model with Tesla V100 32GB GPUs on cloud resources in the US... (See the process-group sketch after the table.)
Software Dependencies | No | The paper mentions the use of the AdamW optimizer but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers, nor any programming language versions.
Experiment Setup | Yes | Our DimeNet++ model consists of B = 4 interaction blocks, with a hidden dimension of H = 2048, an output block dimension of D = 1536, and an intermediate triplet dimension of T = 256... The model was trained with the AdamW optimizer... starting with an initial learning rate of 10^-4, that was multiplied by 0.8 whenever the validation error plateaus. The model was trained with an effective batch size of 128... Our GemNet model consists of B = 6 interaction blocks, with an edge embedding size of E = 1536, a triplet embedding size of T = 384, and an embedding dimension of the bilinear layer of B = 192... We followed the same training procedure as with DimeNet++, except for a starting learning rate of 2 × 10^-4. (See the configuration sketch after the table.)
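
The Hardware Specification row describes combining data-parallel and graph-parallel training across 256 GPUs. Below is a minimal sketch of how such a layout could be expressed with torch.distributed process groups, assuming (for illustration only) a graph-parallel group size of 4; the group size and helper name are assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code): partition the training ranks into
# graph-parallel groups (GPUs that jointly hold one large graph) and
# data-parallel groups (GPUs that hold the same graph shard index).
# GRAPH_PARALLEL_SIZE = 4 is an illustrative assumption.
import torch.distributed as dist

GRAPH_PARALLEL_SIZE = 4

def init_parallel_groups():
    # Expects the usual torchrun/SLURM environment variables (MASTER_ADDR, etc.).
    dist.init_process_group(backend="nccl")
    world_size = dist.get_world_size()  # e.g., 256
    rank = dist.get_rank()
    assert world_size % GRAPH_PARALLEL_SIZE == 0

    graph_group, data_group = None, None

    # Consecutive ranks [0..3], [4..7], ... each form one graph-parallel group.
    for start in range(0, world_size, GRAPH_PARALLEL_SIZE):
        ranks = list(range(start, start + GRAPH_PARALLEL_SIZE))
        group = dist.new_group(ranks=ranks)  # every rank must take part in creation
        if rank in ranks:
            graph_group = group

    # Ranks with the same position inside their graph-parallel group form a
    # data-parallel group; gradients are all-reduced within this group.
    for offset in range(GRAPH_PARALLEL_SIZE):
        ranks = list(range(offset, world_size, GRAPH_PARALLEL_SIZE))
        group = dist.new_group(ranks=ranks)
        if rank in ranks:
            data_group = group

    return graph_group, data_group
```

In practice each graph-parallel group would also need collectives (e.g., gathering edge messages) inside every GNN layer; that logic is specific to the authors' implementation and is not sketched here.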
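
The Experiment Setup row can be read as a training configuration. The sketch below restates the reported hyperparameters in PyTorch, using AdamW and a plateau-based learning-rate scheduler; the config dictionaries, the default weight decay, and the plateau patience are illustrative assumptions, since the quoted text does not specify them.

```python
# Hypothetical restatement of the reported hyperparameters; the config dicts
# and the plateau patience are assumptions, not values from the paper.
import torch

DIMENET_PP_CONFIG = dict(num_blocks=4, hidden_dim=2048, out_block_dim=1536, triplet_dim=256)
GEMNET_CONFIG = dict(num_blocks=6, edge_emb_dim=1536, triplet_emb_dim=384, bilinear_dim=192)

def make_optimizer_and_scheduler(model, initial_lr):
    # AdamW with the paper's reported initial learning rate.
    optimizer = torch.optim.AdamW(model.parameters(), lr=initial_lr)
    # Multiply the learning rate by 0.8 whenever the validation error plateaus.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.8, patience=3  # patience is assumed
    )
    return optimizer, scheduler

# The DimeNet++ model used an initial LR of 1e-4 and the GemNet model 2e-4,
# with an effective batch size of 128 spread over the data-parallel replicas.
# optimizer, scheduler = make_optimizer_and_scheduler(model, initial_lr=1e-4)
# ... after each validation pass: scheduler.step(val_force_mae)
```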