Graph Propagation Transformer for Graph Representation Learning

Authors: Zhe Chen, Hao Tan, Tao Wang, Tianrun Shen, Tong Lu, Qiuying Peng, Cheng Cheng, Yue Qi

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify the performance of GPTrans in a wide range of graph learning experiments on several benchmark datasets. These results show that our method outperforms many state-of-the-art transformer-based graph models.
Researcher Affiliation | Collaboration | 1) State Key Lab for Novel Software Technology, Nanjing University; 2) OPPO Research Institute
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code will be released at https://github.com/czczup/GPTrans.
Open Datasets | Yes | We verify the following graph-level tasks: (1) PCQM4M [Hu et al., 2021]... (2) PCQM4Mv2 [Hu et al., 2021]... (3) MolHIV [Hu et al., 2020]... (4) MolPCBA [Hu et al., 2020]... (5) ZINC [Dwivedi et al., 2020]
Dataset Splits | Yes | For the ZINC dataset, we train our GPTrans-Nano model from scratch. It has 10,000 train, 1,000 validation, and 1,000 test graphs. For the PATTERN and CLUSTER datasets, we train our GPTrans-Nano up to 1000 epochs with a batch size of 256. Specifically, PATTERN has 10,000 training, 2,000 validation, and 2,000 test graphs, and CLUSTER contains 10,000 training, 1,000 validation, and 1,000 test graphs. The TSP dataset has 10,000 training, 1,000 validation, and 1,000 test graphs.
Hardware Specification | Yes | Training time is measured on 8 A100 GPUs with half-precision training, and the inference throughput is tested on a single A100 GPU.
Software Dependencies | Yes | These experiments are conducted with PyTorch 1.12 and CUDA 11.3.
Experiment Setup | Yes | For the large-scale PCQM4M and PCQM4Mv2 datasets, we use AdamW [Loshchilov and Hutter, 2018] with an initial learning rate of 1e-3 as the optimizer. Following common practice, we adopt a cosine decay learning rate scheduler with a 20-epoch warmup. All models are trained for 300 epochs with a total batch size of 1024. The dimension of each head is set to 10 for our nano model, and 32 for others. Following common practice, the expansion ratio of the FFN module is α = 1 for all model variants. The architecture hyper-parameters of these five models are as follows: GPTrans-Nano: d1 = 80, d2 = 40, layer number = 12; GPTrans-Tiny: d1 = 256, d2 = 32, layer number = 12; GPTrans-Small: d1 = 384, d2 = 48, layer number = 12; GPTrans-Base: d1 = 608, d2 = 76, layer number = 18; GPTrans-Large: d1 = 736, d2 = 92, layer number = 24.
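
The Open Datasets and Dataset Splits rows above name OGB and benchmarking-GNN datasets that ship with fixed official splits. The following is a minimal sketch, assuming the standard `ogb` and `torch_geometric` loaders, of how those datasets and splits can be retrieved; the specific calls and directory names are our illustration, not code from the GPTrans repository.

```python
# Hedged sketch: load a few of the cited benchmarks and their official splits.
# Note: instantiating these datasets downloads and preprocesses them on first use.
from ogb.lsc import PygPCQM4Mv2Dataset                  # PCQM4Mv2 (OGB-LSC)
from ogb.graphproppred import PygGraphPropPredDataset   # MolHIV / MolPCBA
from torch_geometric.datasets import ZINC               # ZINC 12K subset

# PCQM4Mv2: large-scale molecular graphs with official split indices.
pcqm = PygPCQM4Mv2Dataset(root="data/pcqm4m-v2")
pcqm_split = pcqm.get_idx_split()        # dict with 'train', 'valid', ... index tensors

# MolHIV: binary graph classification with the scaffold split provided by OGB.
molhiv = PygGraphPropPredDataset(name="ogbg-molhiv", root="data/molhiv")
molhiv_split = molhiv.get_idx_split()

# ZINC subset: 10,000 train / 1,000 validation / 1,000 test graphs, as quoted above.
zinc_train = ZINC(root="data/zinc", subset=True, split="train")
zinc_val = ZINC(root="data/zinc", subset=True, split="val")
zinc_test = ZINC(root="data/zinc", subset=True, split="test")
```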
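
The Experiment Setup row quotes AdamW with an initial learning rate of 1e-3, a cosine decay schedule with a 20-epoch warmup, 300 training epochs, and five architecture variants. Below is a minimal PyTorch sketch of that optimizer/scheduler configuration; the `lr_lambda` warmup implementation and the stand-in model are assumptions for illustration, not the authors' released training code.

```python
import math
import torch

# Variant hyper-parameters as listed in the Experiment Setup row: (d1, d2, layer number).
GPTRANS_VARIANTS = {
    "nano":  (80,  40, 12),
    "tiny":  (256, 32, 12),
    "small": (384, 48, 12),
    "base":  (608, 76, 18),
    "large": (736, 92, 24),
}

EPOCHS, WARMUP_EPOCHS, BASE_LR = 300, 20, 1e-3

# Stand-in module; the real GPTrans architecture is not reproduced here.
model = torch.nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR)

def lr_lambda(epoch: int) -> float:
    """Linear warmup for the first 20 epochs, then cosine decay over the remaining epochs."""
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```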
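
The Hardware and Software rows report half-precision training on A100 GPUs with PyTorch 1.12 and CUDA 11.3. The sketch below shows a generic mixed-precision training step using `torch.cuda.amp` as it existed in PyTorch 1.12; the dummy model, data, and L1 loss are placeholders rather than the paper's pipeline.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(128, 1).to(device)   # stand-in for a GPTrans variant
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 128, device=device)      # placeholder batch
y = torch.randn(32, 1, device=device)

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass in half precision where supported.
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = torch.nn.functional.l1_loss(model(x), y)
    scaler.scale(loss).backward()            # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()
```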