How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision

Authors: Dongkwan Kim, Alice Oh

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on 17 real-world datasets demonstrate that our recipe generalizes across 15 of them, and models designed by the recipe show improved performance over baselines.
Researcher Affiliation | Academia | Dongkwan Kim & Alice Oh, KAIST, Republic of Korea (dongkwan.kim@kaist.ac.kr, alice.oh@kaist.edu)
Pseudocode | No | The paper describes the model architecture and equations but does not include a clearly labeled pseudocode block or algorithm.
Open Source Code | Yes | We make our code available for future research (https://github.com/dongkwan-kim/SuperGAT).
Open Datasets | Yes | We use a total of 17 real-world datasets (Cora, CiteSeer, PubMed, Cora-ML, Cora-Full, DBLP, ogbn-arxiv, CS, Physics, Photo, Computers, Wiki-CS, Four-Univ, Chameleon, Crocodile, Flickr, and PPI) in diverse domains... See Appendix A.1 for detailed descriptions, splits, statistics (including degree and homophily), and references. We follow the train/validation/test split of previous work (Kipf & Welling, 2017).
Dataset Splits | Yes | We follow the train/validation/test split of previous work (Kipf & Welling, 2017). We use 20 samples per class for training, 500 samples for validation, and 1,000 samples for testing.
Hardware Specification | Yes | To demonstrate our model's efficiency, we measure the mean wall-clock time of the entire training process over three runs using a single GPU (GeForce GTX 1080 Ti). (A timing sketch follows the table.)
Software Dependencies | No | The paper states: "All models are implemented in PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey & Lenssen, 2019)." While it cites the papers, it does not provide specific version numbers for these software components (e.g., PyTorch 1.x) within the text.
Experiment Setup | Yes | All parameters are initialized by Glorot initialization (Glorot & Bengio, 2010) and optimized by Adam (Kingma & Ba, 2014). We apply L2 regularization, dropout (Srivastava et al., 2014) to features and attention coefficients, and early stopping on validation loss and accuracy. We use ELU (Clevert et al., 2016) as the non-linear activation ρ. Unless specified, we employ a two-layer SuperGAT with F = 8 features and K = 8 attention heads (64 features in total). For real-world datasets, we tune two hyperparameters (mixing coefficients λ2 and λE) by Bayesian optimization for the mean performance over 3 random seeds. We choose the negative sampling ratio pn from {0.3, 0.5, 0.7, 0.9} and the edge sampling ratio pe from {0.6, 0.8, 1.0}. We fix the dropout probability to 0.0 for PPI, 0.2 for ogbn-arxiv, and 0.6 for all others. We set the learning rate to 0.05 (ogbn-arxiv), 0.01 (PubMed, PPI, Wiki-CS, Photo, Computers, CS, Physics, Crocodile, Cora-Full, DBLP), 0.005 (Cora, CiteSeer, Cora-ML, Chameleon), and 0.001 (Four-Univ). (A configuration sketch follows the table.)
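
To make the Experiment Setup and Dataset Splits rows concrete, below is a minimal sketch of such a configuration in PyTorch Geometric. It uses the library's standard GATConv layer rather than the paper's SuperGAT layer (so the self-supervised edge loss and its hyperparameters λE, pn, pe are omitted); the hidden size (F = 8), heads (K = 8), dropout (0.6), and learning rate (0.005) follow the quoted Cora settings, while the weight decay, epoch count, and the absence of early stopping are assumed placeholders not given in the quoted text.

```python
# Minimal sketch (not the authors' code): two-layer GAT-style model with
# F = 8 features per head and K = 8 heads (64 features total), ELU activation,
# dropout on input features and attention coefficients, Glorot-initialized
# weights (PyG's default for GATConv), and Adam with L2 regularization.
# The SuperGAT self-supervised edge loss is omitted; see the authors' repo
# (https://github.com/dongkwan-kim/SuperGAT) for the full model.
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GATConv

# Planetoid's public split matches the quoted 20-per-class / 500 / 1000 setup.
dataset = Planetoid(root="data/Cora", name="Cora")
data = dataset[0]

class TwoLayerGAT(torch.nn.Module):
    def __init__(self, in_dim, num_classes, hidden=8, heads=8, dropout=0.6):
        super().__init__()
        self.dropout = dropout
        self.conv1 = GATConv(in_dim, hidden, heads=heads, dropout=dropout)
        self.conv2 = GATConv(hidden * heads, num_classes, heads=1,
                             concat=False, dropout=dropout)

    def forward(self, x, edge_index):
        x = F.dropout(x, p=self.dropout, training=self.training)
        x = F.elu(self.conv1(x, edge_index))
        x = F.dropout(x, p=self.dropout, training=self.training)
        return self.conv2(x, edge_index)

model = TwoLayerGAT(dataset.num_features, dataset.num_classes)
# lr follows the quoted Cora setting; weight_decay is an assumed placeholder,
# since the quoted text mentions L2 regularization but not its coefficient.
optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=5e-4)

for epoch in range(200):  # epoch count is a placeholder; the paper uses early stopping
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```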
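The Hardware Specification row reports the mean wall-clock time of the entire training process over three runs on a single GPU. A minimal sketch of that measurement protocol is below; `run_training` is a hypothetical placeholder for a complete training run (e.g., the loop sketched above), not a function from the authors' code.

```python
# Sketch of the timing protocol: mean wall-clock time of full training,
# averaged over three runs.
import time
import statistics

def mean_training_wall_clock(run_training, n_runs=3):
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_training()  # one complete training run, end to end
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)
```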