Towards Deep Attention in Graph Neural Networks: Problems and Remedies
Authors: Soo Yong Lee, Fanchen Bu, Jaemin Yoo, Kijung Shin
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "On 9 out of 12 node classification benchmarks, AERO-GNN outperforms the baseline GNNs, highlighting the advantages of deep graph attention. Our code is available at https://github.com/syleeheal/AERO-GNN." and "In this section, we conduct experiments to demonstrate the empirical strengths of AERO-GNN and elaborate on the theoretical analyses." |
| Researcher Affiliation | Academia | Kim Jaechul Graduate School of Artificial Intelligence, KAIST, Daejeon, Republic of Korea; School of Electrical Engineering, KAIST, Daejeon, Republic of Korea; Heinz College of Information Systems and Public Policy, Carnegie Mellon University, Pittsburgh, PA, USA. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks with explicit labels like 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | Our code is available at https://github.com/syleeheal/AERO-GNN. |
| Open Datasets | Yes | We use 12 node classification benchmark datasets, among which 6 are homophilic and 6 are heterophilic (McPherson et al., 2001; Pei et al., 2020; Lim et al., 2021a). In all the experiments, we use the publicly available train-validation-test splits, unless otherwise specified. |
| Dataset Splits | Yes | In all the experiments, we use the publicly available train-validation-test splits, unless otherwise specified. Table 6 lists the splits as train/val/test percentages (e.g., 48/32/20, 2.5/2.5/95, 5.0/15/50, 0.3/2.5/5.0, 5.2/18/37, 3.6/15/30). Since the Computers and Photo datasets do not have publicly available splits, we follow the methodology of prior works (Chien et al., 2021) and use a random (2.5%, 2.5%, 95%) split for each trial. (A minimal per-trial split sketch appears below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch Geometric' but does not provide specific version numbers for it or any other software dependencies like Python or CUDA. |
| Experiment Setup | Yes | The Adam optimizer (Kingma & Ba, 2015) is used to train the models, and the best parameters are selected based on early stopping. In measuring model performance (Section 5.2), we use 100 predetermined random seeds and report the mean ± standard deviation (SD) of classification accuracy over 100 trials. For all the methods, we set the learning rate as 0.01 for sparse-labeled training and 0.005 for dense-labeled training. For models with separate decay weights for parameters in propagation and feature transformation layers, we use WD_prop and WD_ft to denote each. (Training and evaluation sketches follow the table.) |
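For the Computers and Photo datasets, the (2.5%, 2.5%, 95%) split is redrawn for each trial. The following is a minimal sketch of such a per-trial node split, assuming PyTorch; the function name and seeding scheme are illustrative assumptions, not taken from the paper's code.

```python
import torch

def random_node_split(num_nodes, train_frac=0.025, val_frac=0.025, seed=0):
    """Boolean train/val/test masks for one trial; remaining nodes go to test."""
    gen = torch.Generator().manual_seed(seed)  # hypothetical per-trial seed
    perm = torch.randperm(num_nodes, generator=gen)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    train_mask = torch.zeros(num_nodes, dtype=torch.bool)
    val_mask = torch.zeros(num_nodes, dtype=torch.bool)
    test_mask = torch.zeros(num_nodes, dtype=torch.bool)
    train_mask[perm[:n_train]] = True
    val_mask[perm[n_train:n_train + n_val]] = True
    test_mask[perm[n_train + n_val:]] = True  # 95% of nodes
    return train_mask, val_mask, test_mask
```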
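The experiment-setup row describes Adam with early stopping and separate decay weights (WD_prop, WD_ft) for propagation and feature-transformation parameters. A minimal sketch of that training loop, assuming PyTorch: `prop_parameters()` and `ft_parameters()` are hypothetical accessors for the two parameter groups, and the patience value is an assumption, since the paper excerpt does not specify one.

```python
import copy
import torch

def make_optimizer(model, lr, wd_prop, wd_ft):
    # Separate weight decay for propagation vs. feature-transformation layers,
    # mirroring the WD_prop / WD_ft distinction; the grouping accessors are
    # hypothetical and depend on the actual model implementation.
    return torch.optim.Adam(
        [
            {"params": model.prop_parameters(), "weight_decay": wd_prop},
            {"params": model.ft_parameters(), "weight_decay": wd_ft},
        ],
        lr=lr,  # 0.01 (sparse-labeled) or 0.005 (dense-labeled) per the paper
    )

def train(model, optimizer, data, max_epochs=2000, patience=100):
    """Train with early stopping on validation loss; restore the best weights."""
    loss_fn = torch.nn.CrossEntropyLoss()
    best_val, best_state, wait = float("inf"), None, 0
    for _ in range(max_epochs):
        model.train()
        optimizer.zero_grad()
        out = model(data.x, data.edge_index)
        loss_fn(out[data.train_mask], data.y[data.train_mask]).backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(
                model(data.x, data.edge_index)[data.val_mask],
                data.y[data.val_mask],
            ).item()
        if val_loss < best_val:
            best_val, best_state, wait = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            wait += 1
            if wait >= patience:
                break
    model.load_state_dict(best_state)
    return model
```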
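The paper reports mean ± SD of classification accuracy over 100 predetermined random seeds. A self-contained sketch of that aggregation step follows; `build_and_train` is a hypothetical callable that trains a fresh model for one seed, and `range(num_seeds)` stands in for the paper's actual predetermined seed list.

```python
import statistics
import torch

def accuracy(logits, labels):
    return (logits.argmax(dim=-1) == labels).float().mean().item()

def run_trials(build_and_train, data, num_seeds=100):
    # Evaluate over seeds and report mean ± SD of test accuracy, as in the
    # paper's protocol; the concrete seed values here are illustrative.
    accs = []
    for seed in range(num_seeds):
        torch.manual_seed(seed)
        model = build_and_train(data)
        model.eval()
        with torch.no_grad():
            logits = model(data.x, data.edge_index)
        accs.append(accuracy(logits[data.test_mask], data.y[data.test_mask]))
    return statistics.mean(accs), statistics.stdev(accs)
```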