Training Graph Transformers via Curriculum-Enhanced Attention Distillation
Authors: Yisong Huang, Jin Li, Xinlong Chen, Yang-Geng Fu
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our method outperforms many state-of-the-art methods on seven public graph benchmarks, proving its effectiveness. We validate the effectiveness of our proposed method on seven graph benchmark datasets. Our method consistently outperforms existing GTs and GNNs, demonstrating enhanced performance and improved generalization capability. |
| Researcher Affiliation | Academia | Yisong Huang¹, Jin Li¹,², Xinlong Chen¹ & Yang-Geng Fu¹. ¹College of Computer and Data Science, Fuzhou University, Fuzhou, China; ²AI Thrust, Information Hub, HKUST (Guangzhou), Guangzhou, China |
| Pseudocode | No | The paper contains mathematical equations but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not include an explicit statement about releasing code or a link to a code repository. |
| Open Datasets | Yes | Datasets. We evaluate our method on seven benchmark datasets, including citation network datasets Cora, Citeseer, and Pubmed (Sen et al., 2008); Actor co-occurrence network dataset (Chien et al., 2021); and Web KB datasets (Pei et al., 2020) including Cornell, Texas, and Wisconsin. |
| Dataset Splits | Yes | We apply the standard splits for the citation network datasets, as in the previous work (Kipf & Welling, 2017). For the remaining datasets, we set the train-validation-test split as 48%/32%/20%. (See the data-loading and split sketch below the table.) |
| Hardware Specification | Yes | All experiments are conducted on one GeForce RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions 'Python and PyTorch and use Adam as the optimizer' but does not specify version numbers for these software components. |
| Experiment Setup | Yes | We maintain fixed values for certain hyperparameters: the pre-training epochs of the teacher model are set to 200, the training epochs of the student model to 500, and the weight decay to 5e-4. We conduct hyperparameter tuning for other important hyperparameters on each dataset using grid search. The hyperparameter ranges are presented in Table 6. We provide the specific configurations of hyperparameters on each dataset in Table 7. (See the configuration sketch below the table.) |
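
The Open Datasets and Dataset Splits rows describe seven public benchmarks, with the standard Planetoid splits for the citation networks and a 48%/32%/20% split for the rest. The sketch below shows one way this loading and splitting could be reproduced; PyTorch Geometric, the `load_dataset` and `random_split` helpers, and the random-permutation split are assumptions made here for illustration, not the authors' released code.

```python
# Hypothetical data-loading and splitting sketch (the paper does not name a
# graph library; PyTorch Geometric is assumed here purely for illustration).
import torch
from torch_geometric.datasets import Planetoid, Actor, WebKB

def load_dataset(name: str, root: str = "data"):
    """Load one of the seven benchmarks listed in the Open Datasets row."""
    if name.lower() in {"cora", "citeseer", "pubmed"}:
        return Planetoid(root, name)       # citation networks, standard splits
    if name.lower() == "actor":
        return Actor(f"{root}/Actor")      # actor co-occurrence network
    if name.lower() in {"cornell", "texas", "wisconsin"}:
        return WebKB(root, name)           # Web KB datasets
    raise ValueError(f"Unknown dataset: {name}")

def random_split(num_nodes: int, train: float = 0.48, val: float = 0.32, seed: int = 0):
    """48%/32%/20% train/validation/test node masks (random split assumed)."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_nodes, generator=g)
    n_train, n_val = int(train * num_nodes), int(val * num_nodes)
    masks = [torch.zeros(num_nodes, dtype=torch.bool) for _ in range(3)]
    masks[0][perm[:n_train]] = True                  # train mask
    masks[1][perm[n_train:n_train + n_val]] = True   # validation mask
    masks[2][perm[n_train + n_val:]] = True          # test mask
    return masks

data = load_dataset("Texas")[0]
data.train_mask, data.val_mask, data.test_mask = random_split(data.num_nodes)
```

For the citation networks, the `train_mask`/`val_mask`/`test_mask` attributes shipped with the Planetoid loader would be used directly instead of `random_split`, matching the standard splits of Kipf & Welling (2017).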
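
The Experiment Setup row fixes the teacher pre-training epochs (200), the student training epochs (500), and the weight decay (5e-4), with the remaining hyperparameters tuned per dataset by grid search. A minimal configuration sketch follows; the grid ranges are placeholders, since Table 6 of the paper is not reproduced here, and the helper names are hypothetical.

```python
# Hypothetical training-configuration sketch based on the Experiment Setup row.
# The fixed values (200 teacher epochs, 500 student epochs, weight decay 5e-4,
# Adam optimizer) come from the paper; the grid ranges below are placeholders.
from itertools import product
import torch

FIXED = {
    "teacher_epochs": 200,   # pre-training epochs of the teacher model
    "student_epochs": 500,   # training epochs of the student model
    "weight_decay": 5e-4,    # weight decay used with the Adam optimizer
}

# Placeholder grid; substitute the actual ranges from Table 6 of the paper.
GRID = {
    "lr": [1e-3, 5e-3, 1e-2],
    "hidden_dim": [64, 128, 256],
    "dropout": [0.2, 0.5],
}

def iter_configs():
    """Yield one config dict per grid-search combination, merged with FIXED."""
    keys = list(GRID)
    for values in product(*(GRID[k] for k in keys)):
        yield {**FIXED, **dict(zip(keys, values))}

def make_optimizer(model: torch.nn.Module, cfg: dict) -> torch.optim.Optimizer:
    """Build the Adam optimizer with the learning rate and weight decay from cfg."""
    return torch.optim.Adam(model.parameters(),
                            lr=cfg["lr"],
                            weight_decay=cfg["weight_decay"])
```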