Do Transformers Really Perform Badly for Graph Representation?
Authors: Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, Tie-Yan Liu
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first conduct experiments on the recent OGB-LSC [21] quantum chemistry regression (i.e., PCQM4M-LSC) challenge, which is currently the biggest graph-level prediction dataset and contains more than 3.8M graphs in total. Then, we report the results on the other three popular tasks: ogbg-molhiv, ogbg-molpcba and ZINC, which come from the OGB [22] and benchmarking-GNN [14] leaderboards. |
| Researcher Affiliation | Collaboration | ¹Dalian University of Technology, ²Princeton University, ³Peking University, ⁴Microsoft Research Asia |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. Equations (8) and (9) describe the Graphormer layer mathematically but are not presented as pseudocode (a minimal sketch of such a layer is given after the table). |
| Open Source Code | Yes | The code and models of Graphormer will be made publicly available at https://github.com/Microsoft/Graphormer. |
| Open Datasets | Yes | We first conduct experiments on the recent OGB-LSC [21] quantum chemistry regression (i.e., PCQM4M-LSC) challenge, which is currently the biggest graph-level prediction dataset and contains more than 3.8M graphs in total. Then, we report the results on the other three popular tasks: ogbg-molhiv, ogbg-molpcba and ZINC, which come from the OGB [22] and benchmarking-GNN [14] leaderboards. |
| Dataset Splits | No | A detailed description of datasets and training strategies could be found in Appendix B. The main text mentions 'validate MAE' in tables, but does not provide specific split percentages or sample counts for the validation set within the provided text. |
| Hardware Specification | Yes | All models are trained on 8 NVIDIA V100 GPUs for about 2 days. |
| Software Dependencies | No | The paper mentions 'AdamW as the optimizer' but does not provide specific software library names with version numbers (e.g., PyTorch, TensorFlow, or scikit-learn versions) required for reproduction. |
| Experiment Setup | Yes | We primarily report results on two model sizes: Graphormer (L = 12, d = 768), and a smaller one Graphormer SMALL (L = 6, d = 512). Both the number of attention heads in the attention module and the dimensionality of edge features d_E are set to 32. We use AdamW as the optimizer, and set the hyper-parameter ϵ to 1e-8 and (β1, β2) to (0.99, 0.999). The peak learning rate is set to 2e-4 (3e-4 for Graphormer SMALL) with a 60k-step warm-up stage followed by a linear decay learning rate scheduler. The total training steps are 1M. The batch size is set to 1024. (See the optimizer/scheduler sketch after the table.) |
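
Since the paper reports no pseudocode, the following is a minimal PyTorch sketch of a pre-LayerNorm Transformer block in the spirit of Equations (8) and (9), i.e. `h' = MHA(LN(h)) + h` followed by `h = FFN(LN(h')) + h'`. Graphormer's structural encodings (centrality, spatial, and edge encodings) are abstracted into an optional additive attention bias; the `PreLNTransformerLayer` class, its interface, and the FFN width are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a pre-LayerNorm Transformer block matching the form of
# Eq. (8)-(9): h' = MHA(LN(h)) + h;  h = FFN(LN(h')) + h'.
# Graphormer's structural encodings are abstracted into an optional additive
# attention bias -- an assumption about the interface, not the released code.
from typing import Optional

import torch
import torch.nn as nn


class PreLNTransformerLayer(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 32, d_ff: int = 768):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # d_ff is an assumption; the quoted setup only specifies d and n_heads.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, h: torch.Tensor, attn_bias: Optional[torch.Tensor] = None):
        # h: (batch, num_nodes, d_model)
        # attn_bias: float mask added to attention logits, or None
        x = self.ln1(h)
        x, _ = self.attn(x, x, x, attn_mask=attn_bias, need_weights=False)
        h = h + x                      # residual around attention, Eq. (8)
        h = h + self.ffn(self.ln2(h))  # residual around FFN, Eq. (9)
        return h
```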
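
The reported experiment setup also maps onto a standard optimizer/scheduler configuration. The sketch below wires AdamW with the stated hyper-parameters (eps = 1e-8, betas = (0.99, 0.999), peak learning rate 2e-4) to a 60k-step linear warm-up followed by linear decay over 1M total steps. The `lr_lambda` helper, the placeholder model, and the decay-to-zero endpoint are assumptions; the paper only states a linear decay scheduler after the warm-up.

```python
# Hedged sketch of the reported optimization setup: AdamW with eps=1e-8 and
# betas=(0.99, 0.999), peak LR 2e-4, 60k-step linear warm-up, then linear
# decay over 1M total steps. Batch size in the paper's setup is 1024.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

PEAK_LR, WARMUP_STEPS, TOTAL_STEPS = 2e-4, 60_000, 1_000_000

# Placeholder module standing in for the actual Graphormer model (assumption).
model = torch.nn.Linear(768, 768)
optimizer = AdamW(model.parameters(), lr=PEAK_LR, betas=(0.99, 0.999), eps=1e-8)


def lr_lambda(step: int) -> float:
    # Linear warm-up to the peak LR, then linear decay; decay to zero at the
    # final step is an assumption, since the paper only says "linear decay".
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    return max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))


scheduler = LambdaLR(optimizer, lr_lambda)

# Per training step: loss.backward(); optimizer.step(); scheduler.step();
# optimizer.zero_grad()
```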