Towards Training Billion Parameter Graph Neural Networks for Atomic Simulations

Authors: Anuroop Sriram, Abhishek Das, Brandon M. Wood, Siddharth Goyal, C. Lawrence Zitnick

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate our method by scaling up the number of parameters of the recently proposed DimeNet++ and GemNet models by over an order of magnitude. On the large-scale Open Catalyst 2020 (OC20) dataset, these graph-parallelized models lead to relative improvements of 1) 15% on the force MAE metric for the S2EF task and 2) 21% on the AFbT metric for the IS2RS task, establishing new state-of-the-art results.
Researcher Affiliation | Collaboration | Meta FAIR; National Energy Research Scientific Computing Center (NERSC); {anuroops,abhshkdz,bmwood,sidgoyal,zitnick}@fb.com
Pseudocode | No | No section or figure explicitly labeled "Pseudocode" or "Algorithm" was found, nor were any structured code-like blocks present.
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the source code for the methodology described in the paper, nor does it provide a direct link to such a repository. The links provided are for the Open Catalyst leaderboard and a discussion forum, which do not contain the specific code for their Graph Parallelism method.
Open Datasets | Yes | We benchmark our approach by scaling up two recent GNN architectures, DimeNet++ (Klicpera et al., 2020a) and GemNet-T (Klicpera et al., 2021), on the Open Catalyst (OC20) dataset (Chanussot* et al., 2021). The OC20 dataset, aimed at discovering new catalyst materials for renewable energy storage, consists of 134M training examples spanning a wide range of adsorbates and catalyst materials.
Dataset Splits | No | The paper mentions "validation error plateaus", implying a validation set was used, and states the OC20 dataset consists of "134M training examples". However, it does not explicitly provide the specific percentages, sample counts, or methodology for creating the training, validation, and test splits within the paper itself.
Hardware Specification | Yes | The model was trained with an effective batch size of 128 on 256 Volta 32GB GPUs with a combination of data parallel and graph parallel training... Finally, figure 2b shows the raw performance of running these models on V100 GPUs... We estimate that training our GemNet-XL model with Tesla V100 32GB GPUs on cloud resources in the US... (See the process-group sketch after the table.)
Software Dependencies | No | The paper mentions the use of the AdamW optimizer but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers, nor any programming language versions.
Experiment Setup | Yes | Our DimeNet++ model consists of B = 4 interaction blocks, with a hidden dimension of H = 2048, an output block dimension of D = 1536, and an intermediate triplet dimension of T = 256... The model was trained with the AdamW optimizer... starting with an initial learning rate of 10^-4, that was multiplied by 0.8 whenever the validation error plateaus. The model was trained with an effective batch size of 128... Our GemNet model consists of B = 6 interaction blocks, with an edge embedding size of E = 1536, a triplet embedding size of T = 384, and an embedding dimension of the bilinear layer of B = 192... We followed the same training procedure as with DimeNet++, except for a starting learning rate of 2 × 10^-4. (See the configuration sketch after the table.)
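
The Hardware Specification row describes combining data-parallel and graph-parallel training across 256 GPUs. Below is a minimal sketch of how such a layout could be expressed with torch.distributed process groups, assuming (for illustration only) a graph-parallel group size of 4; the group size and helper name are assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code): partition the training ranks into
# graph-parallel groups (GPUs that jointly hold one large graph) and
# data-parallel groups (GPUs that hold the same graph shard index).
# GRAPH_PARALLEL_SIZE = 4 is an illustrative assumption.
import torch.distributed as dist

GRAPH_PARALLEL_SIZE = 4

def init_parallel_groups():
    # Expects the usual torchrun/SLURM environment variables (MASTER_ADDR, etc.).
    dist.init_process_group(backend="nccl")
    world_size = dist.get_world_size()  # e.g., 256
    rank = dist.get_rank()
    assert world_size % GRAPH_PARALLEL_SIZE == 0

    graph_group, data_group = None, None

    # Consecutive ranks [0..3], [4..7], ... each form one graph-parallel group.
    for start in range(0, world_size, GRAPH_PARALLEL_SIZE):
        ranks = list(range(start, start + GRAPH_PARALLEL_SIZE))
        group = dist.new_group(ranks=ranks)  # every rank must take part in creation
        if rank in ranks:
            graph_group = group

    # Ranks with the same position inside their graph-parallel group form a
    # data-parallel group; gradients are all-reduced within this group.
    for offset in range(GRAPH_PARALLEL_SIZE):
        ranks = list(range(offset, world_size, GRAPH_PARALLEL_SIZE))
        group = dist.new_group(ranks=ranks)
        if rank in ranks:
            data_group = group

    return graph_group, data_group
```

In practice each graph-parallel group would also need collectives (e.g., gathering edge messages) inside every GNN layer; that logic is specific to the authors' implementation and is not sketched here.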
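
The Experiment Setup row can be read as a training configuration. The sketch below restates the reported hyperparameters in PyTorch, using AdamW and a plateau-based learning-rate scheduler; the config dictionaries, the default weight decay, and the plateau patience are illustrative assumptions, since the quoted text does not specify them.

```python
# Hypothetical restatement of the reported hyperparameters; the config dicts
# and the plateau patience are assumptions, not values from the paper.
import torch

DIMENET_PP_CONFIG = dict(num_blocks=4, hidden_dim=2048, out_block_dim=1536, triplet_dim=256)
GEMNET_CONFIG = dict(num_blocks=6, edge_emb_dim=1536, triplet_emb_dim=384, bilinear_dim=192)

def make_optimizer_and_scheduler(model, initial_lr):
    # AdamW with the paper's reported initial learning rate.
    optimizer = torch.optim.AdamW(model.parameters(), lr=initial_lr)
    # Multiply the learning rate by 0.8 whenever the validation error plateaus.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.8, patience=3  # patience is assumed
    )
    return optimizer, scheduler

# The DimeNet++ model used an initial LR of 1e-4 and the GemNet model 2e-4,
# with an effective batch size of 128 spread over the data-parallel replicas.
# optimizer, scheduler = make_optimizer_and_scheduler(model, initial_lr=1e-4)
# ... after each validation pass: scheduler.step(val_force_mae)
```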