Towards Deep Attention in Graph Neural Networks: Problems and Remedies
Authors: Soo Yong Lee, Fanchen Bu, Jaemin Yoo, Kijung Shin
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "On 9 out of 12 node classification benchmarks, AERO-GNN outperforms the baseline GNNs, highlighting the advantages of deep graph attention. Our code is available at https://github.com/syleeheal/AERO-GNN." and "In this section, we conduct experiments to demonstrate the empirical strengths of AERO-GNN and elaborate on the theoretical analyses." |
| Researcher Affiliation | Academia | Kim Jaechul Graduate School of Artificial Intelligence, KAIST, Daejeon, Republic of Korea; School of Electrical Engineering, KAIST, Daejeon, Republic of Korea; Heinz College of Information Systems and Public Policy, Carnegie Mellon University, Pittsburgh, PA, USA. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks with explicit labels like 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | Our code is available at https://github.com/syleeheal/AERO-GNN. |
| Open Datasets | Yes | We use 12 node classification benchmark datasets, among which 6 are homophilic and 6 are heterophilic (McPherson et al., 2001; Pei et al., 2020; Lim et al., 2021a). In all the experiments, we use the publicly available train-validation-test splits, unless otherwise specified. |
| Dataset Splits | Yes | In all the experiments, we use the publicly available train-validation-test splits, unless otherwise specified. Table 6 lists the splits as train/val/test percentages (e.g., 48/32/20, 2.5/2.5/95, 5.0/15/50, 0.3/2.5/5.0, 5.2/18/37, 3.6/15/30). Since the Computers and Photo datasets do not have publicly available splits, we follow the methodology of prior works (Chien et al., 2021) and use a random (2.5%, 2.5%, 95%) split for each trial. (A minimal per-trial split sketch appears below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch Geometric' but does not provide specific version numbers for it or any other software dependencies like Python or CUDA. |
| Experiment Setup | Yes | The Adam optimizer (Kingma & Ba, 2015) is used to train the models, and the best parameters are selected based on early stopping. In measuring model performance (Section 5.2), we use 100 predetermined random seeds and report the mean ± standard deviation (SD) of classification accuracy over 100 trials. For all the methods, we set the learning rate as 0.01 for sparse-labeled training and 0.005 for dense-labeled training. For models with separate decay weights for parameters in propagation and feature transformation layers, we use WD_prop and WD_ft to denote each. (Training and evaluation sketches follow the table.) |
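For the Computers and Photo datasets, the (2.5%, 2.5%, 95%) split is redrawn for each trial. The following is a minimal sketch of such a per-trial node split, assuming PyTorch; the function name and seeding scheme are illustrative assumptions, not taken from the paper's code.

```python
import torch

def random_node_split(num_nodes, train_frac=0.025, val_frac=0.025, seed=0):
    """Boolean train/val/test masks for one trial; remaining nodes go to test."""
    gen = torch.Generator().manual_seed(seed)  # hypothetical per-trial seed
    perm = torch.randperm(num_nodes, generator=gen)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    train_mask = torch.zeros(num_nodes, dtype=torch.bool)
    val_mask = torch.zeros(num_nodes, dtype=torch.bool)
    test_mask = torch.zeros(num_nodes, dtype=torch.bool)
    train_mask[perm[:n_train]] = True
    val_mask[perm[n_train:n_train + n_val]] = True
    test_mask[perm[n_train + n_val:]] = True  # 95% of nodes
    return train_mask, val_mask, test_mask
```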
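The experiment-setup row describes Adam with early stopping and separate decay weights (WD_prop, WD_ft) for propagation and feature-transformation parameters. A minimal sketch of that training loop, assuming PyTorch: `prop_parameters()` and `ft_parameters()` are hypothetical accessors for the two parameter groups, and the patience value is an assumption, since the paper excerpt does not specify one.

```python
import copy
import torch

def make_optimizer(model, lr, wd_prop, wd_ft):
    # Separate weight decay for propagation vs. feature-transformation layers,
    # mirroring the WD_prop / WD_ft distinction; the grouping accessors are
    # hypothetical and depend on the actual model implementation.
    return torch.optim.Adam(
        [
            {"params": model.prop_parameters(), "weight_decay": wd_prop},
            {"params": model.ft_parameters(), "weight_decay": wd_ft},
        ],
        lr=lr,  # 0.01 (sparse-labeled) or 0.005 (dense-labeled) per the paper
    )

def train(model, optimizer, data, max_epochs=2000, patience=100):
    """Train with early stopping on validation loss; restore the best weights."""
    loss_fn = torch.nn.CrossEntropyLoss()
    best_val, best_state, wait = float("inf"), None, 0
    for _ in range(max_epochs):
        model.train()
        optimizer.zero_grad()
        out = model(data.x, data.edge_index)
        loss_fn(out[data.train_mask], data.y[data.train_mask]).backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(
                model(data.x, data.edge_index)[data.val_mask],
                data.y[data.val_mask],
            ).item()
        if val_loss < best_val:
            best_val, best_state, wait = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            wait += 1
            if wait >= patience:
                break
    model.load_state_dict(best_state)
    return model
```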
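The paper reports mean ± SD of classification accuracy over 100 predetermined random seeds. A self-contained sketch of that aggregation step follows; `build_and_train` is a hypothetical callable that trains a fresh model for one seed, and `range(num_seeds)` stands in for the paper's actual predetermined seed list.

```python
import statistics
import torch

def accuracy(logits, labels):
    return (logits.argmax(dim=-1) == labels).float().mean().item()

def run_trials(build_and_train, data, num_seeds=100):
    # Evaluate over seeds and report mean ± SD of test accuracy, as in the
    # paper's protocol; the concrete seed values here are illustrative.
    accs = []
    for seed in range(num_seeds):
        torch.manual_seed(seed)
        model = build_and_train(data)
        model.eval()
        with torch.no_grad():
            logits = model(data.x, data.edge_index)
        accs.append(accuracy(logits[data.test_mask], data.y[data.test_mask]))
    return statistics.mean(accs), statistics.stdev(accs)
```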