How Graph Neural Networks Learn: Lessons from Training Dynamics

Authors: Chenxiao Yang, Qitian Wu, David Wipf, Ruoyu Sun, Junchi Yan

ICML 2024

Each reproducibility variable is listed below with its result and the LLM response (supporting excerpts from the paper).
Research Type: Experimental. Evidence: "We provide theoretical explanations for the emergence of this phenomenon in the overparameterized regime and empirically validate it on real-world GNNs." and "Empirical Verification. To further verify the theory, we numerically study the evolution of real-world GNNs during the GD-based training process on synthetic and real-world datasets. We found that their NTKs indeed align with the message passing matrix used in the forward pass."
Researcher Affiliation: Collaboration. Evidence: "(1) School of Artificial Intelligence & Department of Computer Science and Engineering & MoE Lab of AI, Shanghai Jiao Tong University; (2) Amazon Web Services; (3) School of Data Science, The Chinese University of Hong Kong, Shenzhen; (4) Shenzhen International Center for Industrial and Applied Mathematics, Shenzhen Research Institute of Big Data."
Pseudocode: Yes. Evidence: "Algorithm 1: Basic version of residual propagation." and "Algorithm 2: Generalized residual propagation with kernel functions."
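
The basic algorithm lends itself to a compact sketch. Below is a minimal NumPy reconstruction of what Algorithm 1 describes (propagating training-set residuals through a power of the graph's message passing matrix); the function name, the dense matrix_power propagation, the zero initialization, and the default hyperparameters are illustrative assumptions, not the authors' exact implementation.

    import numpy as np

    def residual_propagation(A, Y, train_mask, K=2, eta=0.1, steps=100):
        # A: (n, n) normalized adjacency; Y: (n, c) one-hot labels;
        # train_mask: boolean (n,) marking training nodes.
        AK = np.linalg.matrix_power(A, K)   # K-step message passing matrix
        F = np.zeros_like(Y, dtype=float)   # predictions start at zero
        for _ in range(steps):
            R = F - Y                       # residuals between predictions and labels
            R[~train_mask] = 0.0            # only training residuals are propagated
            F = F - eta * (AK @ R)          # gradient-descent-style residual update
        return F

Per its title, Algorithm 2 generalizes this scheme with kernel functions; the same loop would apply with the A^K matrix swapped for a kernel matrix.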
Open Source Code: Yes. Evidence: "Our codes are available at https://github.com/chr26195/ResidualPropagation."
Open Datasets: Yes. Evidence: "Arxiv, Proteins and Products (Hu et al., 2020) are three relatively large datasets containing 169343, 132534 and 2449029 nodes respectively." and "We compare RP with some standard GNN architectures (Linear GNN, Wu et al. (2019), and GCN, Kipf & Welling (2017)) on a diverse set of 15 datasets, including three challenging OGB (Hu et al., 2020) datasets Arxiv, Proteins, Products with up to millions of nodes and edges."
Dataset Splits: Yes. Evidence: "We follow the original splitting of (Hu et al., 2020) for evaluation." and "For Cora, Citeseer, Pubmed, we follow the public split, while for other datasets, we randomly split them into training/validation/testing sets based on ratio 8/1/1."
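
For the datasets without a public split, the quoted 8/1/1 random split could be reproduced along the following lines (a sketch; the function name and seed handling are assumptions, since the exact splitting code lives in the released repository):

    import numpy as np

    def random_811_split(num_nodes, seed=0):
        # Shuffle node indices, then carve out 80/10/10 train/val/test.
        rng = np.random.default_rng(seed)
        perm = rng.permutation(num_nodes)
        n_train = int(0.8 * num_nodes)
        n_val = int(0.1 * num_nodes)
        return (perm[:n_train],                 # training indices
                perm[n_train:n_train + n_val],  # validation indices
                perm[n_train + n_val:])         # test indices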
Hardware Specification: Yes. Evidence: "All experiments are conducted on Quadro RTX 8000 with 48GB memory."
Software Dependencies: No. The paper mentions using standard GNN architectures (e.g., GCN) and refers to the OGB leaderboard for implementation details, but it does not explicitly list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup: Yes. Evidence: "For hyperparameter search of RP, we adopt grid search for the RP algorithm with the step size η from {0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1}, the power K ranging from 1 to 10." and "The optimization algorithm is gradient descent with learning rates 1e-2, 3e-4, 5e-5 for Cora, Texas, and Synthetic respectively, momentum 0.9, and weight decay 5e-4. The loss function is the standard cross-entropy loss for multi-class classification."
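
The reported grid translates directly into a search loop. A minimal sketch, reusing the residual_propagation stub above; the accuracy helper, val_mask, and the in-scope A, Y, and train_mask are hypothetical stand-ins for the paper's actual evaluation code:

    from itertools import product

    def accuracy(F, Y, mask):
        # Fraction of masked nodes whose predicted class matches the label.
        return float((F[mask].argmax(1) == Y[mask].argmax(1)).mean())

    best_score, best_cfg = -1.0, None
    for eta, K in product([0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1], range(1, 11)):
        F = residual_propagation(A, Y, train_mask, K=K, eta=eta)
        score = accuracy(F, Y, val_mask)
        if score > best_score:
            best_score, best_cfg = score, (eta, K)
    print(f"best (eta, K): {best_cfg}, val accuracy: {best_score:.3f}")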