Are More Layers Beneficial to Graph Transformers?
Authors: Haiteng Zhao, Shuming Ma, Dongdong Zhang, Zhi-Hong Deng, Furu Wei
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our method unblocks the depth limitation of graph transformers and results in state-of-the-art performance across various graph benchmarks with deeper models. |
| Researcher Affiliation | Collaboration | Haiteng Zhao¹, Shuming Ma², Dongdong Zhang², Zhi-Hong Deng¹, Furu Wei²; ¹Peking University, ²Microsoft Research |
| Pseudocode | No | The paper describes methods in text and mathematical equations but does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/zhao-ht/DeepGraph. |
| Open Datasets | Yes | Our method is validated on the tasks of the graph property prediction and node classification, specifically including PCQM4M-LSC (Hu et al., 2020), ZINC (Dwivedi et al., 2020), CLUSTER (Dwivedi et al., 2020) and PATTERN (Dwivedi et al., 2020), widely used in graph transformer studies. |
| Dataset Splits | No | The paper mentions using standard datasets but does not explicitly provide details about the train, validation, or test splits (e.g., percentages, sample counts, or explicit references to predefined splits). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using the "Adam optimizer" and the "Python package graph-tool" but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We implement DeepGraph with 12, 24, and 48 layers. The hidden dimension is 80 for ZINC and PATTERN, 48 for CLUSTER, and 768 for PCQM4M-LSC. The training uses Adam optimizer, with warm-up and decaying learning rates. Reported results are averaged over 4 seeds. |
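
The Experiment Setup row can be read as a small configuration plus a learning-rate schedule. The sketch below is only an illustration of that reading: the layer counts and hidden dimensions come from the row above, while the schedule shape (linear warm-up followed by linear decay), the function names `lr_at_step` and `average_over_seeds`, and the seed values are assumptions, since the summary does not specify them.

```python
from statistics import mean

# Values taken from the Experiment Setup row.
NUM_LAYERS = (12, 24, 48)
HIDDEN_DIM = {"ZINC": 80, "PATTERN": 80, "CLUSTER": 48, "PCQM4M-LSC": 768}


def lr_at_step(step, peak_lr, warmup_steps, total_steps):
    """Learning rate at a given step.

    Assumed shape: linear warm-up to peak_lr, then linear decay to zero.
    The summary only says "warm-up and decaying learning rates".
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)


def average_over_seeds(run_fn, seeds=(0, 1, 2, 3)):
    """Average a metric over 4 runs; the seed values here are hypothetical."""
    return mean(run_fn(seed) for seed in seeds)
```

Here `run_fn` stands in for a hypothetical function that trains one model with a given seed and returns its test metric. The peak learning rate, warm-up length, and total step count are left as arguments because the summary does not report them.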