Training Graph Neural Networks with 1000 Layers

Authors: Guohao Li, Matthias Müller, Bernard Ghanem, Vladlen Koltun

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our models RevGNN-Deep (1001 layers with 80 channels each) and RevGNN-Wide (448 layers with 224 channels each) were both trained on a single commodity GPU and achieve an ROC-AUC of 87.74 ± 0.13 and 88.24 ± 0.15 on the ogbn-proteins dataset. (A two-group reversible block in this spirit is sketched after the table.)
Researcher Affiliation | Collaboration | Intel Labs; King Abdullah University of Science and Technology.
Pseudocode | No | The paper describes methods in text and equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We release our implementation, which supports PyTorch Geometric (Fey & Lenssen, 2019) and the Deep Graph Library (Wang et al., 2019a).
Open Datasets | Yes | We conduct experiments on several datasets from the Open Graph Benchmark (OGB) (Hu et al., 2020). ... ogbn-proteins dataset from the Open Graph Benchmark (OGB) (Hu et al., 2020).
Dataset Splits | Yes | We use mini-batch training with random partitioning where graphs are split into 10 parts during training and 5 parts during testing (Li et al., 2020). The data splits and evaluation metrics on all datasets follow the OGB evaluation protocol. (This random-partition scheme is sketched after the table.)
Hardware Specification | Yes | Our models RevGNN-Deep... and RevGNN-Wide... were both trained on a single commodity GPU... RevGNN-Deep and RevGNN-Wide take 13.5 days and 17.1 days, respectively, to train for 2000 epochs on a single NVIDIA V100. We perform inference on an NVIDIA RTX A6000 (48 GB).
Software Dependencies | No | The implementation of all the reversible models is based on PyTorch (Paszke et al., 2019) and supports both the PyTorch Geometric (PyG) (Fey & Lenssen, 2019) and Deep Graph Library (DGL) (Wang et al., 2019a) frameworks. (Frameworks are named, but the quoted text gives no version information.)
Experiment Setup | Yes | We use the same GNN operator (Li et al., 2020), hyper-parameters (e.g. learning rate, dropout rate, training epoch, etc.), and optimizers to make the comparison as fair as possible. ... RevGNN-Wide uses a larger dropout rate of 0.2 to prevent overfitting. ... ϵ is set to 10⁻⁶·BD and 2·10⁻¹⁰·BD for the forward pass and the backward pass, respectively... The iteration thresholds in the forward pass and the backward pass are set to the same value. (A reading of the ϵ stopping tolerance is sketched after the table.)
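The constant-memory result quoted in the Research Type row rests on grouped reversible connections: a layer's inputs can be reconstructed exactly from its outputs, so activations need not be cached during training. Below is a minimal two-group sketch of that idea, assuming PyTorch Geometric's GCNConv as a stand-in for the paper's GNN operator; the paper generalizes the coupling to C groups and uses its own operator, and the class name here is illustrative, not the released implementation.

```python
import torch
from torch_geometric.nn import GCNConv  # stand-in for the paper's GNN operator


class ReversibleBlock(torch.nn.Module):
    """Two-group reversible coupling: y1 = x1 + F(x2), y2 = x2 + G(y1).

    The inputs are exactly recoverable from the outputs, so intermediate
    activations need not be stored during training. `channels` must be even.
    """

    def __init__(self, channels):
        super().__init__()
        self.F = GCNConv(channels // 2, channels // 2)
        self.G = GCNConv(channels // 2, channels // 2)

    def forward(self, x, edge_index):
        x1, x2 = torch.chunk(x, 2, dim=-1)
        y1 = x1 + self.F(x2, edge_index)
        y2 = x2 + self.G(y1, edge_index)
        return torch.cat([y1, y2], dim=-1)

    @torch.no_grad()
    def inverse(self, y, edge_index):
        # Exact reconstruction of the inputs from the outputs, running the
        # coupling in reverse order.
        y1, y2 = torch.chunk(y, 2, dim=-1)
        x2 = y2 - self.G(y1, edge_index)
        x1 = y1 - self.F(x2, edge_index)
        return torch.cat([x1, x2], dim=-1)
```

During the backward pass, a reversible training framework calls `inverse` to rebuild each layer's input from its output before computing gradients, which is what makes memory consumption independent of depth.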
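The Dataset Splits row quotes mini-batch training with random partitioning (10 parts during training, 5 during testing). Below is a minimal sketch of such a scheme, assuming PyTorch Geometric's subgraph utility; the function name and batching details are illustrative, not the paper's code.

```python
import torch
from torch_geometric.utils import subgraph


def random_partition(num_nodes, edge_index, num_parts):
    """Shuffle nodes, split them into num_parts groups, and yield the induced
    subgraph of each group as one mini-batch."""
    perm = torch.randperm(num_nodes)
    for nodes in perm.chunk(num_parts):
        # Keep only edges whose endpoints both fall in this partition,
        # relabeled to local indices 0..len(nodes)-1.
        sub_edges, _ = subgraph(nodes, edge_index,
                                relabel_nodes=True, num_nodes=num_nodes)
        yield nodes, sub_edges


# Usage sketch: features and labels are indexed with the global node ids.
# for nodes, sub_edges in random_partition(data.num_nodes, data.edge_index, 10):
#     out = model(data.x[nodes], sub_edges)
```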
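The ϵ values quoted in the Experiment Setup row are stopping tolerances for the fixed-point solver used by the weight-tied equilibrium baselines. Reading BD as the total number of hidden entries (batch size B times width D) is an assumption; the sketch below also uses naive iteration where the paper's baselines use Broyden's method, and all names are hypothetical.

```python
import torch


def solve_fixed_point(f, z0, eps_scale=1e-6, max_iter=50):
    """Iterate z <- f(z) until the update norm falls below a size-scaled
    tolerance, mirroring eps = 1e-6 * BD (forward) / 2e-10 * BD (backward)."""
    tol = eps_scale * z0.numel()  # assumed meaning of BD: total hidden entries
    z = z0
    for _ in range(max_iter):  # the quoted "iteration threshold"
        z_next = f(z)
        if torch.norm(z_next - z) < tol:  # converged within tolerance
            return z_next
        z = z_next
    return z
```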