Graph Propagation Transformer for Graph Representation Learning

Authors: Zhe Chen, Hao Tan, Tao Wang, Tianrun Shen, Tong Lu, Qiuying Peng, Cheng Cheng, Yue Qi

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify the performance of GPTrans in a wide range of graph learning experiments on several benchmark datasets. These results show that our method outperforms many state-of-the-art transformer-based graph models.
Researcher Affiliation | Collaboration | 1) State Key Lab for Novel Software Technology, Nanjing University; 2) OPPO Research Institute
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code will be released at https://github.com/czczup/GPTrans.
Open Datasets | Yes | We verify the following graph-level tasks: (1) PCQM4M [Hu et al., 2021]... (2) PCQM4Mv2 [Hu et al., 2021]... (3) MolHIV [Hu et al., 2020]... (4) MolPCBA [Hu et al., 2020]... (5) ZINC [Dwivedi et al., 2020]
Dataset Splits | Yes | For the ZINC dataset, we train our GPTrans-Nano model from scratch. It has 10,000 train, 1,000 validation, and 1,000 test graphs. For the PATTERN and CLUSTER datasets, we train our GPTrans-Nano up to 1000 epochs with a batch size of 256. Specifically, PATTERN has 10,000 training, 2,000 validation, and 2,000 test graphs, and CLUSTER contains 10,000 training, 1,000 validation, and 1,000 test graphs. The TSP dataset has 10,000 training, 1,000 validation, and 1,000 test graphs.
Hardware Specification | Yes | Training time is measured on 8 A100 GPUs with half-precision training, and the inference throughput is tested on a single A100 GPU.
Software Dependencies | Yes | These experiments are conducted with PyTorch 1.12 and CUDA 11.3.
Experiment Setup | Yes | For the large-scale PCQM4M and PCQM4Mv2 datasets, we use AdamW [Loshchilov and Hutter, 2018] with an initial learning rate of 1e-3 as the optimizer. Following common practice, we adopt a cosine decay learning rate scheduler with a 20-epoch warmup. All models are trained for 300 epochs with a total batch size of 1024. The dimension of each head is set to 10 for our nano model, and 32 for others. Following common practice, the expansion ratio of the FFN module is α = 1 for all model variants. The architecture hyper-parameters of these five models are as follows: GPTrans-Nano: d1 = 80, d2 = 40, layer number = 12; GPTrans-Tiny: d1 = 256, d2 = 32, layer number = 12; GPTrans-Small: d1 = 384, d2 = 48, layer number = 12; GPTrans-Base: d1 = 608, d2 = 76, layer number = 18; GPTrans-Large: d1 = 736, d2 = 92, layer number = 24.
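
The Open Datasets and Dataset Splits rows above name OGB and benchmarking-GNN datasets that ship with fixed official splits. The following is a minimal sketch, assuming the standard `ogb` and `torch_geometric` loaders, of how those datasets and splits can be retrieved; the specific calls and directory names are our illustration, not code from the GPTrans repository.

```python
# Hedged sketch: load a few of the cited benchmarks and their official splits.
# Note: instantiating these datasets downloads and preprocesses them on first use.
from ogb.lsc import PygPCQM4Mv2Dataset                  # PCQM4Mv2 (OGB-LSC)
from ogb.graphproppred import PygGraphPropPredDataset   # MolHIV / MolPCBA
from torch_geometric.datasets import ZINC               # ZINC 12K subset

# PCQM4Mv2: large-scale molecular graphs with official split indices.
pcqm = PygPCQM4Mv2Dataset(root="data/pcqm4m-v2")
pcqm_split = pcqm.get_idx_split()        # dict with 'train', 'valid', ... index tensors

# MolHIV: binary graph classification with the scaffold split provided by OGB.
molhiv = PygGraphPropPredDataset(name="ogbg-molhiv", root="data/molhiv")
molhiv_split = molhiv.get_idx_split()

# ZINC subset: 10,000 train / 1,000 validation / 1,000 test graphs, as quoted above.
zinc_train = ZINC(root="data/zinc", subset=True, split="train")
zinc_val = ZINC(root="data/zinc", subset=True, split="val")
zinc_test = ZINC(root="data/zinc", subset=True, split="test")
```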
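
The Experiment Setup row quotes AdamW with an initial learning rate of 1e-3, a cosine decay schedule with a 20-epoch warmup, 300 training epochs, and five architecture variants. Below is a minimal PyTorch sketch of that optimizer/scheduler configuration; the `lr_lambda` warmup implementation and the stand-in model are assumptions for illustration, not the authors' released training code.

```python
import math
import torch

# Variant hyper-parameters as listed in the Experiment Setup row: (d1, d2, layer number).
GPTRANS_VARIANTS = {
    "nano":  (80,  40, 12),
    "tiny":  (256, 32, 12),
    "small": (384, 48, 12),
    "base":  (608, 76, 18),
    "large": (736, 92, 24),
}

EPOCHS, WARMUP_EPOCHS, BASE_LR = 300, 20, 1e-3

# Stand-in module; the real GPTrans architecture is not reproduced here.
model = torch.nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR)

def lr_lambda(epoch: int) -> float:
    """Linear warmup for the first 20 epochs, then cosine decay over the remaining epochs."""
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```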
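
The Hardware and Software rows report half-precision training on A100 GPUs with PyTorch 1.12 and CUDA 11.3. The sketch below shows a generic mixed-precision training step using `torch.cuda.amp` as it existed in PyTorch 1.12; the dummy model, data, and L1 loss are placeholders rather than the paper's pipeline.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(128, 1).to(device)   # stand-in for a GPTrans variant
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 128, device=device)      # placeholder batch
y = torch.randn(32, 1, device=device)

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass in half precision where supported.
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = torch.nn.functional.l1_loss(model(x), y)
    scaler.scale(loss).backward()            # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
```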