Are More Layers Beneficial to Graph Transformers?

Authors: Haiteng Zhao, Shuming Ma, Dongdong Zhang, Zhi-Hong Deng, Furu Wei

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our method unblocks the depth limitation of graph transformers and results in state-of-the-art performance across various graph benchmarks with deeper models.
Researcher Affiliation | Collaboration | Haiteng Zhao (Peking University), Shuming Ma (Microsoft Research), Dongdong Zhang (Microsoft Research), Zhi-Hong Deng (Peking University), Furu Wei (Microsoft Research)
Pseudocode | No | The paper describes methods in text and mathematical equations but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Codes are available at https://github.com/zhao-ht/DeepGraph.
Open Datasets | Yes | Our method is validated on the tasks of graph property prediction and node classification, specifically including PCQM4M-LSC (Hu et al., 2020), ZINC (Dwivedi et al., 2020), CLUSTER (Dwivedi et al., 2020), and PATTERN (Dwivedi et al., 2020), widely used in graph transformer studies.
Dataset Splits | No | The paper mentions using standard datasets but does not explicitly provide details about the train, validation, or test splits (e.g., percentages, sample counts, or explicit references to predefined splits).
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using the "Adam optimizer" and the "Python package graph-tool" but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We implement DeepGraph with 12, 24, and 48 layers. The hidden dimension is 80 for ZINC and PATTERN, 48 for CLUSTER, and 768 for PCQM4M-LSC. The training uses the Adam optimizer, with warm-up and decaying learning rates. Reported results are averaged over 4 seeds.
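The Experiment Setup row reports concrete hyperparameters (layer counts, per-dataset hidden dimensions, Adam with a warm-up and decaying learning rate, averaging over 4 seeds), so a small configuration sketch may help readers attempting to reproduce it. This is a minimal sketch assuming a generic PyTorch-style training loop; the dictionary layout, the warmup_then_decay helper, and the specific warm-up length, peak learning rate, and decay shape are hypothetical illustrations, not values taken from the paper or the DeepGraph repository.

```python
import torch

# Hyperparameters as reported in the Experiment Setup row above.
DEPTHS = [12, 24, 48]                 # number of transformer layers
HIDDEN_DIM = {
    "ZINC": 80,
    "PATTERN": 80,
    "CLUSTER": 48,
    "PCQM4M-LSC": 768,
}
NUM_SEEDS = 4                         # results averaged over 4 seeds


def warmup_then_decay(step, warmup_steps=4000, peak_lr=2e-4):
    """Hypothetical schedule: linear warm-up followed by inverse-sqrt decay.
    The paper only states "warm-up and decaying learning rates"; the exact
    shape, step count, and peak value here are placeholders."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (warmup_steps / step) ** 0.5


# Placeholder model standing in for a 12-layer graph transformer on ZINC;
# the real architecture is defined in the authors' repository.
model = torch.nn.Linear(HIDDEN_DIM["ZINC"], HIDDEN_DIM["ZINC"])
optimizer = torch.optim.Adam(model.parameters(), lr=warmup_then_decay(1))

for step in range(1, 101):            # toy training-loop skeleton
    for group in optimizer.param_groups:
        group["lr"] = warmup_then_decay(step)
    # ... forward pass, loss, backward pass, optimizer.step() would go here ...
```

The schedule is applied by overwriting the optimizer's learning rate each step, which keeps the sketch self-contained without assuming any particular scheduler class.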