Tailoring Self-Attention for Graph via Rooted Subtrees
Authors: Siyuan Huang, Yunchong Song, Jiayue Zhou, Zhouhan Lin
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive evaluations on ten node classification datasets demonstrate that STA-based models outperform existing graph transformers and mainstream GNNs. The code is available at https://github.com/LUMIA-Group/SubTree-Attention. |
| Researcher Affiliation | Academia | Siyuan Huang, Yunchong Song, Jiayue Zhou, Zhouhan Lin (Shanghai Jiaotong University); siyuan_huang_sjtu@outlook.com, ycsong@sjtu.edu.cn, lin.zhouhan@gmail.com |
| Pseudocode | No | The paper does not contain explicitly labeled "Pseudocode" or "Algorithm" blocks. It describes procedures using mathematical equations and textual explanations. |
| Open Source Code | Yes | The code is available at https://github.com/LUMIA-Group/SubTree-Attention. |
| Open Datasets | Yes | Comprehensive evaluations on ten node classification datasets demonstrate that STA-based models outperform existing graph transformers and mainstream GNNs. The detailed information for each dataset is presented in Table 3. |
| Dataset Splits | Yes | For Cora, Citeseer, Deezer and Actor, we apply the same random splits with train/valid/test ratios of 50%/25%/25% as [43]. For Pubmed, Corafull, Computer, Photo, CS and Physics, we apply the same random splits with train/valid/test ratios of 60%/20%/20% as [6]. (See the split sketch after the table.) |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA RTX4090 with 24 GB memory. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and implies the use of standard deep learning frameworks, but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For the model configuration of STAGNN, we fix the number of hidden channels at 64. We use grid search for hyper-parameter settings: the learning rate is searched within {0.001, 0.01}, dropout probability within {0.0, 0.2, 0.4, 0.6}, weight decay within {0.0001, 0.0005, 0.001, 0.005}, height of the rooted subtree K within {3, 5, 10}, and number of attention heads within {1, 2, 4, 6, 8}. (The full grid is enumerated in the sketch after the table.) |
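
For concreteness, here is a minimal sketch of the random node splits described in the Dataset Splits row, assuming a simple seeded permutation over node indices. The paper reuses the exact splits of [43] and [6], so the seed and masks below are illustrative only, not the authors' splits:

```python
# Illustrative random node split with the ratios reported in the paper:
# 50%/25%/25% for Cora/Citeseer/Deezer/Actor, 60%/20%/20% for the rest.
import numpy as np

def random_split(num_nodes: int, train_ratio: float, valid_ratio: float, seed: int = 0):
    """Return boolean train/valid/test masks over num_nodes nodes."""
    rng = np.random.default_rng(seed)          # seed is illustrative, not from the paper
    perm = rng.permutation(num_nodes)          # shuffle node indices
    n_train = int(train_ratio * num_nodes)
    n_valid = int(valid_ratio * num_nodes)
    train_mask = np.zeros(num_nodes, dtype=bool)
    valid_mask = np.zeros(num_nodes, dtype=bool)
    test_mask = np.zeros(num_nodes, dtype=bool)
    train_mask[perm[:n_train]] = True
    valid_mask[perm[n_train:n_train + n_valid]] = True
    test_mask[perm[n_train + n_valid:]] = True  # remainder goes to test
    return train_mask, valid_mask, test_mask

# Cora-style 50/25/25 split (2708 is Cora's node count):
train, valid, test = random_split(num_nodes=2708, train_ratio=0.50, valid_ratio=0.25)
```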
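
The Experiment Setup row fully specifies the hyper-parameter grid. The sketch below enumerates that grid; the STAGNN training/evaluation loop itself is not shown, and the key names are illustrative rather than taken from the released code:

```python
# Enumerate the hyper-parameter grid reported in the paper.
from itertools import product

grid = {
    "lr": [0.001, 0.01],
    "dropout": [0.0, 0.2, 0.4, 0.6],
    "weight_decay": [0.0001, 0.0005, 0.001, 0.005],
    "subtree_height_K": [3, 5, 10],   # height of the rooted subtree
    "num_heads": [1, 2, 4, 6, 8],
    "hidden_channels": [64],          # fixed at 64 per the paper
}

# Cartesian product over all searched values; zip over a dict iterates its keys.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 2 * 4 * 4 * 3 * 5 * 1 = 480 candidate settings
# Each config would then be passed to the (not shown) STAGNN training loop,
# keeping the setting with the best validation accuracy.
```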