On the Bottleneck of Graph Neural Networks and its Practical Implications

Authors: Uri Alon, Eran Yahav

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that the bottleneck hinders popular GNNs from fitting long-range signals in the training data; we further show that GNNs that absorb incoming edges equally, such as GCN and GIN, are more susceptible to over-squashing than GAT and GGNN; finally, we show that prior work, which extensively tuned GNN models of long-range problems, suffers from over-squashing, and that breaking the bottleneck improves their state-of-the-art results without any tuning or additional weights. Our code is available at https://github.com/tech-srl/bottleneck/. (A sketch of the bottleneck-breaking modification appears after this table.)
Researcher Affiliation | Academia | Uri Alon & Eran Yahav, Technion, Israel, {urialon,yahave}@cs.technion.ac.il
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/tech-srl/bottleneck/.
Open Datasets | Yes | Data: The QM9 dataset (Ramakrishnan et al., 2014; Gilmer et al., 2017; Wu et al., 2018) contains ~130,000 graphs with ~18 nodes. [...] The NCI1 dataset (Wale et al., 2008) contains 4110 graphs with ~30 nodes on average, and its task is to predict whether a biochemical compound contains anti-lung-cancer activity. ENZYMES (Borgwardt et al., 2005) contains 600 graphs with ~36 nodes on average, and its task is to classify an enzyme to one out of six classes. [...] VARMISUSE (Allamanis et al., 2018) is a node-prediction problem that depends on long-range information in computer programs.
Dataset Splits | Yes | "We used the same splits and their best-found configurations" (for QM9, referring to Brockschmidt, 2020) and "We used the same 10-folds and split as Errica et al. (2020)" (for NCI1/ENZYMES). Table 8 also explicitly shows the Training/Validation/Test splits.
Hardware Specification | No | The paper does not specify the hardware used for its experiments (e.g., GPU/CPU models, processor speeds, or memory amounts).
Software Dependencies | No | The paper mentions 'PyTorch Geometric' and cites its paper but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | Our training configuration and hyperparameter ranges are detailed in Appendix A. ... We used the Adam optimizer with a learning rate of 10^-3, decayed by 0.5 after every 1000 epochs without an increase in training accuracy, and stopped training after 2000 epochs of no training accuracy improvement. (from Section 4.1). For QM9, it states: We re-trained each modified model for each target property using the same code, configuration, and training scheme as Brockschmidt (2020), training each model five times (using different random seeds) for each target property task. (A sketch of this training scheme appears below, after the architecture sketch.)
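
Architecture sketch. The paper breaks the bottleneck by making the last GNN layer fully adjacent, i.e. letting every pair of nodes exchange messages in that final layer while reusing the existing layer weights (hence "no additional weights"). The snippet below is a minimal illustration of that idea in PyTorch Geometric, not the authors' released code; the two-layer depth, hidden size, choice of GCNConv, and single-graph (unbatched) setting are assumptions made for illustration.

```python
# Minimal sketch of a "fully-adjacent last layer" -- an illustration of the
# bottleneck-breaking idea, not the code from github.com/tech-srl/bottleneck.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv


class GCNPlusFA(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, out_dim)  # same layer type, no extra weights

    def forward(self, x, edge_index):
        # Ordinary message passing over the input graph's edges.
        x = F.relu(self.conv1(x, edge_index))
        # Last layer: message passing over a fully-adjacent edge set, so
        # long-range information no longer has to squeeze through a bottleneck.
        # (Assumes a single graph; a batched version would restrict the full
        # edge set to node pairs within the same graph.)
        n = x.size(0)
        idx = torch.arange(n, device=x.device)
        full_edge_index = torch.cartesian_prod(idx, idx).t()
        mask = full_edge_index[0] != full_edge_index[1]  # drop explicit self-loops
        return self.conv2(x, full_edge_index[:, mask])
```

The same edge-set swap could be applied to GAT, GIN, or GGNN layers; only the adjacency used by the final layer changes.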
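Training-scheme sketch. The quoted setup (Adam at 10^-3, halving the learning rate after 1000 epochs without a training-accuracy increase, stopping after 2000 such epochs) corresponds to a plateau-based schedule. The loop below is a hedged sketch of that schedule, assuming the document's description; `train_one_epoch` and `compute_train_accuracy` are hypothetical caller-supplied callbacks, and the exact bookkeeping in the authors' code may differ.

```python
import torch


def train(model, train_one_epoch, compute_train_accuracy, max_epochs=1_000_000):
    """Sketch of the quoted training scheme; the two callbacks are hypothetical."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Halve the learning rate after 1000 epochs without a training-accuracy increase.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.5, patience=1000)

    best_acc, epochs_since_improvement = 0.0, 0
    for _ in range(max_epochs):
        train_one_epoch(model, optimizer)
        acc = compute_train_accuracy(model)
        scheduler.step(acc)                   # plateau-based decay on training accuracy
        if acc > best_acc:
            best_acc, epochs_since_improvement = acc, 0
        else:
            epochs_since_improvement += 1
        if epochs_since_improvement >= 2000:  # stop after 2000 epochs of no improvement
            break
    return model
```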