Training Graph Transformers via Curriculum-Enhanced Attention Distillation

Authors: Yisong Huang, Jin Li, Xinlong Chen, Yang-Geng Fu

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our method outperforms many state-of-the-art methods on seven public graph benchmarks, proving its effectiveness. We validate the effectiveness of our proposed method on seven graph benchmark datasets. Our method consistently outperforms existing GTs and GNNs, demonstrating enhanced performance and improved generalization capability.
Researcher Affiliation | Academia | Yisong Huang (1), Jin Li (1,2), Xinlong Chen (1) & Yang-Geng Fu (1); (1) College of Computer and Data Science, Fuzhou University, Fuzhou, China; (2) AI Thrust, Information Hub, HKUST (Guangzhou), Guangzhou, China
Pseudocode | No | The paper contains mathematical equations but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not include an explicit statement about releasing code or a link to a code repository.
Open Datasets | Yes | Datasets. We evaluate our method on seven benchmark datasets, including citation network datasets Cora, Citeseer, and Pubmed (Sen et al., 2008); the Actor co-occurrence network dataset (Chien et al., 2021); and the WebKB datasets (Pei et al., 2020) including Cornell, Texas, and Wisconsin.
Dataset Splits | Yes | We apply the standard splits for the citation network datasets, as in the previous work (Kipf & Welling, 2017). For the remaining datasets, we set the train-validation-test split as 48%/32%/20%.
Hardware Specification | Yes | All experiments are conducted on one GeForce RTX 4090 GPU.
Software Dependencies | No | The paper mentions 'Python and PyTorch and use Adam as the optimizer' but does not specify version numbers for these software components.
Experiment Setup | Yes | We maintain fixed values for certain hyperparameters: the pre-training epochs of the teacher model are set to 200, the training epochs of the student model to 500, and the weight decay to 5e-4. We conduct hyperparameter tuning for other important hyperparameters on each dataset using grid search. The hyperparameter ranges are presented in Table 6. We provide the specific configurations of hyperparameters on each dataset in Table 7.
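
For context, the fixed parts of the reported setup (48%/32%/20% node splits for the non-citation datasets, Adam with weight decay 5e-4, 200 teacher pre-training epochs, 500 student training epochs) can be summarized in a short PyTorch-style sketch. This is a reconstruction under assumptions, not the authors' code (none is released); the helper names `random_split` and `make_optimizer` are hypothetical, and the distillation objective itself is not reproduced here.

```python
# Sketch of the reported experimental configuration only; the teacher/student
# models and the curriculum-enhanced attention distillation loss are NOT
# reproduced here (the paper releases no code).
import torch

def random_split(num_nodes, train_frac=0.48, val_frac=0.32, seed=0):
    # 48% / 32% / 20% train/val/test node split used for the non-citation datasets.
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_nodes, generator=g)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

# Hyperparameters reported as fixed; the rest are grid-searched per dataset
# (Tables 6 and 7 of the paper).
TEACHER_PRETRAIN_EPOCHS = 200
STUDENT_TRAIN_EPOCHS = 500
WEIGHT_DECAY = 5e-4

def make_optimizer(model, lr):
    # Adam with weight decay 5e-4, as stated; the learning rate is grid-searched.
    return torch.optim.Adam(model.parameters(), lr=lr, weight_decay=WEIGHT_DECAY)

# Example usage with a placeholder model and an arbitrary graph size.
train_idx, val_idx, test_idx = random_split(num_nodes=1000)
optimizer = make_optimizer(torch.nn.Linear(16, 5), lr=0.01)
```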