Training Graph Neural Networks with 1000 Layers
Authors: Guohao Li, Matthias Müller, Bernard Ghanem, Vladlen Koltun
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our models RevGNN-Deep (1001 layers with 80 channels each) and RevGNN-Wide (448 layers with 224 channels each) were both trained on a single commodity GPU and achieve an ROC-AUC of 87.74 ± 0.13 and 88.24 ± 0.15 on the ogbn-proteins dataset. |
| Researcher Affiliation | Collaboration | 1 Intel Labs, 2 King Abdullah University of Science and Technology. |
| Pseudocode | No | The paper describes methods in text and equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our implementation, which supports PyTorch Geometric (Fey & Lenssen, 2019) and the Deep Graph Library (Wang et al., 2019a). |
| Open Datasets | Yes | We conduct experiments on several datasets from the Open Graph Benchmark (OGB) (Hu et al., 2020). ...ogbn-proteins dataset from the Open Graph Benchmark (OGB) (Hu et al., 2020). |
| Dataset Splits | Yes | We use mini-batch training with random partitioning where graphs are split into 10 parts during training and 5 parts during testing (Li et al., 2020). The data splits and evaluation metrics on all datasets follow the OGB evaluation protocol. |
| Hardware Specification | Yes | Our models RevGNN-Deep... and RevGNN-Wide... were both trained on a single commodity GPU... RevGNN-Deep and RevGNN-Wide take 13.5 days and 17.1 days, respectively, to train for 2000 epochs on a single NVIDIA V100. We perform the inference on an NVIDIA RTX A6000 (48GB). |
| Software Dependencies | No | The implementation of all the reversible models is based on PyTorch (Paszke et al., 2019) and supports both the PyTorch Geometric (PyG) (Fey & Lenssen, 2019) and Deep Graph Library (DGL) (Wang et al., 2019a) frameworks. |
| Experiment Setup | Yes | We use the same GNN operator (Li et al., 2020), hyper-parameters (e.g. learning rate, dropout rate, training epoch, etc.), and optimizers to make the comparison as fair as possible. ... RevGNN-Wide uses a larger dropout rate of 0.2 to prevent overfitting. ... ϵ is set to 10⁻⁶·BD and 2·10⁻¹⁰·BD for the forward pass and the backward pass, respectively... The iteration thresholds in the forward pass and the backward pass are set to the same value. |
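The reversible architecture referenced in the Research Type and Experiment Setup rows is what makes 1000-layer training on a single GPU possible: layer inputs are reconstructed from layer outputs during the backward pass instead of being cached. The snippet below is a minimal sketch of that two-group reversible residual pattern in plain PyTorch, not the released implementation; `SimpleGNNLayer` is a hypothetical stand-in for the paper's GEN operator, and the dense normalized adjacency is only for self-contained illustration.

```python
import torch
import torch.nn as nn


class SimpleGNNLayer(nn.Module):
    """Hypothetical GNN op: normalized dense aggregation + linear + ReLU
    (a stand-in for the GEN operator used in the paper)."""

    def __init__(self, channels):
        super().__init__()
        self.lin = nn.Linear(channels, channels)

    def forward(self, x, adj_norm):
        return torch.relu(self.lin(adj_norm @ x))


class ReversibleGNNBlock(nn.Module):
    """Two-group reversible block: Y1 = X1 + F(X2), Y2 = X2 + G(Y1).
    Inputs can be reconstructed from outputs, so per-layer activations
    need not be stored for backpropagation."""

    def __init__(self, channels):
        super().__init__()
        self.f = SimpleGNNLayer(channels // 2)
        self.g = SimpleGNNLayer(channels // 2)

    def forward(self, x, adj_norm):
        x1, x2 = torch.chunk(x, 2, dim=-1)
        y1 = x1 + self.f(x2, adj_norm)
        y2 = x2 + self.g(y1, adj_norm)
        return torch.cat([y1, y2], dim=-1)

    @torch.no_grad()
    def inverse(self, y, adj_norm):
        # Invert the block: X2 = Y2 - G(Y1), then X1 = Y1 - F(X2).
        y1, y2 = torch.chunk(y, 2, dim=-1)
        x2 = y2 - self.g(y1, adj_norm)
        x1 = y1 - self.f(x2, adj_norm)
        return torch.cat([x1, x2], dim=-1)


if __name__ == "__main__":
    n, channels = 6, 8
    adj = (torch.rand(n, n) > 0.5).float()
    adj = ((adj + adj.t()) > 0).float()          # toy symmetric adjacency
    adj.fill_diagonal_(1.0)                      # add self-loops
    deg_inv_sqrt = adj.sum(-1).pow(-0.5)
    adj_norm = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]

    block = ReversibleGNNBlock(channels)
    x = torch.randn(n, channels)
    y = block(x, adj_norm)
    x_rec = block.inverse(y, adj_norm)
    print(torch.allclose(x, x_rec, atol=1e-5))   # inputs recovered up to float error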
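```

The Dataset Splits row mentions mini-batch training by random graph partitioning (10 parts during training, 5 during testing). The sketch below shows one plausible way to implement that step in plain PyTorch, assuming the common `2 x num_edges` edge_index layout; the helper name `random_partition` and the relabeling details are our assumptions, not the released code.

```python
import torch


def random_partition(num_nodes, edge_index, num_parts):
    """Yield (node_ids, relabeled_edge_index) for each random node partition."""
    perm = torch.randperm(num_nodes)
    for node_ids in torch.chunk(perm, num_parts):
        # Keep only edges whose endpoints both fall inside this partition.
        in_part = torch.zeros(num_nodes, dtype=torch.bool)
        in_part[node_ids] = True
        mask = in_part[edge_index[0]] & in_part[edge_index[1]]
        sub_edges = edge_index[:, mask]
        # Relabel global node ids to local 0..len(node_ids)-1 ids.
        local_id = torch.full((num_nodes,), -1, dtype=torch.long)
        local_id[node_ids] = torch.arange(node_ids.numel())
        yield node_ids, local_id[sub_edges]


if __name__ == "__main__":
    num_nodes = 100
    edge_index = torch.randint(0, num_nodes, (2, 500))   # toy random graph
    for node_ids, sub_edge_index in random_partition(num_nodes, edge_index, num_parts=10):
        # Here one would slice node features/labels by `node_ids` and run the
        # model on the induced subgraph described by `sub_edge_index`.
        pass
```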
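For reference, the setup details quoted in the Hardware Specification and Experiment Setup rows can be collected into a plain dictionary; only values stated above are included (other hyper-parameters are described as matching DeeperGCN (Li et al., 2020)), and the key names are ours.

```python
# Restatement of the quoted setup; key names are illustrative, not from the paper.
revgnn_setup = {
    "RevGNN-Deep": {"layers": 1001, "hidden_channels": 80},
    "RevGNN-Wide": {"layers": 448, "hidden_channels": 224, "dropout": 0.2},
    "epochs": 2000,
    "train_partitions": 10,
    "test_partitions": 5,
    # Stopping tolerances quoted for the forward / backward passes.
    "eps_forward": "1e-6 * BD",
    "eps_backward": "2e-10 * BD",
}
```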