How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision
Authors: Dongkwan Kim, Alice Oh
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on 17 real-world datasets demonstrate that our recipe generalizes across 15 of them, and the models designed by our recipe show improved performance over baselines. |
| Researcher Affiliation | Academia | Dongkwan Kim & Alice Oh, KAIST, Republic of Korea. dongkwan.kim@kaist.ac.kr, alice.oh@kaist.edu |
| Pseudocode | No | The paper describes the model architecture and equations but does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | Yes | We make our code available for future research (https://github.com/dongkwan-kim/SuperGAT). |
| Open Datasets | Yes | We use a total of 17 real-world datasets (Cora, CiteSeer, PubMed, Cora-ML, Cora-Full, DBLP, ogbn-arxiv, CS, Physics, Photo, Computers, Wiki-CS, Four-Univ, Chameleon, Crocodile, Flickr, and PPI) in diverse domains... See Appendix A.1 for detailed descriptions, splits, statistics (including degree and homophily), and references. We follow the train/validation/test split of previous work (Kipf & Welling, 2017). |
| Dataset Splits | Yes | We follow the train/validation/test split of previous work (Kipf & Welling, 2017). We use 20 samples per class for training, 500 samples for validation, and 1000 samples for testing (a minimal split-checking sketch follows the table). |
| Hardware Specification | Yes | To demonstrate our model's efficiency, we measure the mean wall-clock time of the entire training process over three runs using a single GPU (GeForce GTX 1080Ti). |
| Software Dependencies | No | The paper states: "All models are implemented in PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey & Lenssen, 2019)." While it cites the papers, it does not provide specific version numbers for these software components (e.g., PyTorch 1.x) within the text. |
| Experiment Setup | Yes | All parameters are initialized by Glorot initialization (Glorot & Bengio, 2010) and optimized by Adam (Kingma & Ba, 2014). We apply L2 regularization, dropout (Srivastava et al., 2014) to features and attention coefficients, and early stopping on validation loss and accuracy. We use ELU (Clevert et al., 2016) as the non-linear activation ρ. Unless specified, we employ a two-layer SuperGAT with F = 8 features and K = 8 attention heads (64 features in total). For real-world datasets, we tune two hyperparameters (mixing coefficients λ2 and λE) by Bayesian optimization for the mean performance of 3 random seeds. We choose the negative sampling ratio pn from {0.3, 0.5, 0.7, 0.9} and the edge sampling ratio pe from {0.6, 0.8, 1.0}. We fix the dropout probability to 0.0 for PPI, 0.2 for ogbn-arxiv, and 0.6 for all others. We set the learning rate to 0.05 (ogbn-arxiv), 0.01 (PubMed, PPI, Wiki-CS, Photo, Computers, CS, Physics, Crocodile, Cora-Full, DBLP), 0.005 (Cora, CiteSeer, Cora-ML, Chameleon), and 0.001 (Four-Univ). A hedged training-setup sketch follows the table. |
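
For the Dataset Splits row, here is a minimal sketch (not the authors' code) of how the reported Kipf & Welling (2017) split can be checked with PyTorch Geometric's `Planetoid` loader; the `data/Cora` root path is an arbitrary choice for illustration:

```python
from torch_geometric.datasets import Planetoid

# Planetoid ships the standard Kipf & Welling (2017) split used by the paper:
# 20 labeled nodes per class for training, 500 nodes for validation, 1000 for test.
dataset = Planetoid(root='data/Cora', name='Cora')
data = dataset[0]

# For Cora (7 classes), this prints 140 500 1000.
print(int(data.train_mask.sum()), int(data.val_mask.sum()), int(data.test_mask.sum()))
```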
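The Experiment Setup row can similarly be approximated with a plain two-layer GAT in PyTorch Geometric. This is a hedged sketch, not SuperGAT itself: the paper's self-supervised edge losses (mixing coefficients λ2 and λE, negative sampling ratio pn, edge sampling ratio pe) and early stopping are omitted, and the `weight_decay` value is an assumed stand-in for the unreported L2 coefficient.

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GATConv

dataset = Planetoid(root='data/Cora', name='Cora')
data = dataset[0]

class TwoLayerGAT(torch.nn.Module):
    """Two GAT-style layers with K = 8 heads and F = 8 features per head
    (64 features in total), ELU activation, and dropout 0.6, matching the
    reported configuration. GATConv uses Glorot initialization by default."""
    def __init__(self, in_dim, num_classes, hidden=8, heads=8, dropout=0.6):
        super().__init__()
        self.dropout = dropout
        self.conv1 = GATConv(in_dim, hidden, heads=heads, dropout=dropout)
        self.conv2 = GATConv(hidden * heads, num_classes, heads=1,
                             concat=False, dropout=dropout)

    def forward(self, x, edge_index):
        x = F.dropout(x, p=self.dropout, training=self.training)
        x = F.elu(self.conv1(x, edge_index))  # ELU as the non-linear activation
        x = F.dropout(x, p=self.dropout, training=self.training)
        return self.conv2(x, edge_index)

model = TwoLayerGAT(dataset.num_features, dataset.num_classes)
# lr=0.005 is the value reported for Cora; weight_decay=5e-4 is an assumption,
# not a value given in the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=5e-4)

model.train()
for epoch in range(200):  # fixed epoch count; the paper uses early stopping instead
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```

The sketch only mirrors the architectural and optimizer settings quoted in the table; reproducing the paper's results would additionally require the SuperGAT attention variants and the self-supervised link-prediction loss from the authors' repository.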