Tailoring Self-Attention for Graph via Rooted Subtrees

Authors: Siyuan Huang, Yunchong Song, Jiayue Zhou, Zhouhan Lin

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive evaluations on ten node classification datasets demonstrate that STA-based models outperform existing graph transformers and mainstream GNNs. The code is available at https://github.com/LUMIA-Group/SubTree-Attention.
Researcher Affiliation | Academia | Siyuan Huang, Yunchong Song, Jiayue Zhou, Zhouhan Lin (Shanghai Jiao Tong University; siyuan_huang_sjtu@outlook.com, ycsong@sjtu.edu.cn, lin.zhouhan@gmail.com)
Pseudocode | No | The paper contains no explicitly labeled "Pseudocode" or "Algorithm" blocks; procedures are described through mathematical equations and textual explanations.
Open Source Code | Yes | The code is available at https://github.com/LUMIA-Group/SubTree-Attention.
Open Datasets | Yes | Comprehensive evaluations on ten node classification datasets demonstrate that STA-based models outperform existing graph transformers and mainstream GNNs. The detailed information for each dataset is presented in Table 3.
Dataset Splits | Yes | For Cora, Citeseer, Deezer and Actor, we apply the same random splits with train/valid/test ratios of 50%/25%/25% as [43]. For Pubmed, Corafull, Computer, Photo, CS and Physics, we apply the same random splits with train/valid/test ratios of 60%/20%/20% as [6].
Hardware Specification | Yes | All experiments are conducted on an NVIDIA RTX4090 with 24 GB memory.
Software Dependencies | No | The paper mentions the Adam optimizer and implies the use of standard deep learning frameworks, but provides no version numbers for software dependencies (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | For the model configuration of STAGNN, we fix the number of hidden channels at 64. We use grid search for hyper-parameter settings. The learning rate is searched within {0.001, 0.01}, dropout probability within {0.0, 0.2, 0.4, 0.6}, weight decay within {0.0001, 0.0005, 0.001, 0.005}, height of the rooted subtree K within {3, 5, 10}, and number of attention heads within {1, 2, 4, 6, 8}.
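The random-split protocol quoted in the Dataset Splits row (50%/25%/25% for Cora, Citeseer, Deezer and Actor; 60%/20%/20% for the others) can be sketched as follows. This is a minimal illustration of seeded random node partitioning, not the authors' code; the function name and seeding scheme are assumptions.

```python
import random

def random_split(num_nodes, train_ratio, valid_ratio, seed=0):
    """Randomly partition node indices into train/valid/test sets.

    Sketch of the split protocol reported in the paper; the remaining
    nodes after the train and valid portions form the test set.
    """
    idx = list(range(num_nodes))
    random.Random(seed).shuffle(idx)  # seeded shuffle for reproducibility
    n_train = int(num_nodes * train_ratio)
    n_valid = int(num_nodes * valid_ratio)
    return (idx[:n_train],
            idx[n_train:n_train + n_valid],
            idx[n_train + n_valid:])

# e.g. a 50%/25%/25% split over the 2708 Cora nodes
train_idx, valid_idx, test_idx = random_split(2708, 0.5, 0.25)
```

In practice a different seed would be drawn per run, so that reported numbers average over several random splits.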
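The grid search described in the Experiment Setup row enumerates every combination of the listed hyper-parameter values with the hidden channels fixed at 64. A minimal sketch of that enumeration, assuming the search simply takes the Cartesian product of the grid (the dictionary keys and helper name are illustrative, and the training/evaluation loop per configuration is omitted):

```python
from itertools import product

# Hyper-parameter grid reported for STAGNN (hidden channels fixed at 64)
GRID = {
    "lr": [0.001, 0.01],
    "dropout": [0.0, 0.2, 0.4, 0.6],
    "weight_decay": [0.0001, 0.0005, 0.001, 0.005],
    "K": [3, 5, 10],            # height of the rooted subtree
    "num_heads": [1, 2, 4, 6, 8],
}

def configs(grid):
    """Yield every hyper-parameter combination in the grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values), hidden_channels=64)

# 2 * 4 * 4 * 3 * 5 = 480 candidate configurations
all_configs = list(configs(GRID))
```

Each configuration would then be trained with Adam and scored on the validation split, keeping the best-performing setting per dataset.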