Do Transformers Really Perform Badly for Graph Representation?

Authors: Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first conduct experiments on the recent OGB-LSC [21] quantum chemistry regression (i.e., PCQM4M-LSC) challenge, which is currently the biggest graph-level prediction dataset and contains more than 3.8M graphs in total. Then, we report the results on the other three popular tasks: ogbg-molhiv, ogbg-molpcba and ZINC, which come from the OGB [22] and benchmarking-GNN [14] leaderboards.
Researcher Affiliation | Collaboration | Dalian University of Technology; Princeton University; Peking University; Microsoft Research Asia
Pseudocode | No | No structured pseudocode or algorithm blocks were found. Equations (8) and (9) describe the Graphormer layer mathematically, but they are not presented as pseudocode (a code sketch of this layer is given after the table).
Open Source Code | Yes | The code and models of Graphormer will be made publicly available at https://github.com/Microsoft/Graphormer.
Open Datasets | Yes | We first conduct experiments on the recent OGB-LSC [21] quantum chemistry regression (i.e., PCQM4M-LSC) challenge, which is currently the biggest graph-level prediction dataset and contains more than 3.8M graphs in total. Then, we report the results on the other three popular tasks: ogbg-molhiv, ogbg-molpcba and ZINC, which come from the OGB [22] and benchmarking-GNN [14] leaderboards.
Dataset Splits | No | A detailed description of datasets and training strategies is deferred to Appendix B. The main text mentions 'validate MAE' in tables, but does not provide specific split percentages or sample counts for the validation set within the provided text (a split-loading sketch is given after the table).
Hardware Specification | Yes | All models are trained on 8 NVIDIA V100 GPUs for about 2 days.
Software Dependencies | No | The paper mentions 'AdamW as the optimizer' but does not provide specific software library names with version numbers (e.g., PyTorch, TensorFlow, or scikit-learn versions) required for reproduction.
Experiment Setup | Yes | We primarily report results on two model sizes: Graphormer (L = 12, d = 768), and a smaller one, Graphormer-SMALL (L = 6, d = 512). Both the number of attention heads in the attention module and the dimensionality of edge features d_E are set to 32. We use AdamW as the optimizer, and set the hyper-parameter ϵ to 1e-8 and (β1, β2) to (0.99, 0.999). The peak learning rate is set to 2e-4 (3e-4 for Graphormer-SMALL) with a 60k-step warm-up stage followed by a linear decay learning rate scheduler. The total training steps are 1M. The batch size is set to 1024.
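
Although the paper contains no pseudocode, the Equations (8) and (9) cited in the Pseudocode row describe a standard pre-LayerNorm Transformer block (self-attention and feed-forward sub-layers, each wrapped in LayerNorm and a residual connection). Below is a minimal PyTorch sketch of such a block; it deliberately omits Graphormer's centrality, spatial, and edge encodings, and all module names and default dimensions are illustrative choices, not taken from the released code.

```python
import torch
import torch.nn as nn

class PreLNEncoderLayer(nn.Module):
    """Pre-LayerNorm Transformer block in the spirit of Eq. (8)-(9):
        h'    = MHA(LN(h)) + h
        h_out = FFN(LN(h')) + h'
    Graphormer's attention biases (spatial/edge encodings) are omitted in this sketch.
    """

    def __init__(self, d_model=768, n_heads=32, d_ff=3072, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.dropout = nn.Dropout(dropout)

    def forward(self, h, key_padding_mask=None):
        # Eq. (8): self-attention sub-layer with residual connection
        x = self.ln1(h)
        attn_out, _ = self.attn(x, x, x, key_padding_mask=key_padding_mask)
        h = h + self.dropout(attn_out)
        # Eq. (9): feed-forward sub-layer with residual connection
        h = h + self.dropout(self.ffn(self.ln2(h)))
        return h
```

Stacking L = 12 such layers with d_model = 768 (or L = 6 with d_model = 512) mirrors the two model sizes listed in the Experiment Setup row.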
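
For the Dataset Splits row: the official train/validation/test indices for PCQM4M-LSC are distributed with the OGB-LSC package itself, so a reproduction can recover the split sizes directly even though the paper only details them in Appendix B. The sketch below assumes the ogb library with its PyTorch-Geometric loader (PygPCQM4MDataset) and a local dataset/ root; neither is prescribed by the paper.

```python
# Minimal sketch, assuming the `ogb` package (and its RDKit dependency) is installed.
from ogb.lsc import PygPCQM4MDataset  # PyTorch-Geometric loader for PCQM4M-LSC

dataset = PygPCQM4MDataset(root="dataset/")  # downloads and processes ~3.8M molecular graphs on first use
split_idx = dataset.get_idx_split()          # dictionary of index tensors for the official splits

for name, idx in split_idx.items():
    print(f"{name}: {len(idx)} graphs")
```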
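
The Experiment Setup row fully specifies the optimizer and learning-rate schedule, so it translates directly into configuration code. A minimal PyTorch sketch, assuming a generic model and using torch.optim.lr_scheduler.LambdaLR for the 60k-step warm-up followed by linear decay over the 1M total steps (the scheduler class and the decay-to-zero endpoint are our assumptions; the paper only names 'a linear decay learning rate scheduler'):

```python
import torch

# Values quoted from the Experiment Setup row (Graphormer; the SMALL model uses a 3e-4 peak LR).
PEAK_LR = 2e-4
WARMUP_STEPS = 60_000
TOTAL_STEPS = 1_000_000

def build_optimizer_and_scheduler(model: torch.nn.Module):
    # Weight decay is not given in the quoted text, so PyTorch's AdamW default is left in place.
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=PEAK_LR, betas=(0.99, 0.999), eps=1e-8
    )

    def lr_lambda(step: int) -> float:
        # Linear warm-up to the peak learning rate, then linear decay to zero at TOTAL_STEPS.
        if step < WARMUP_STEPS:
            return step / max(1, WARMUP_STEPS)
        return max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```

Calling scheduler.step() once per optimization step makes the learning rate peak at step 60k, matching the warm-up stage described in the table.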