Graph Inductive Biases in Transformers without Message Passing
Authors: Liheng Ma, Chen Lin, Derek Lim, Adriana Romero-Soriano, Puneet K. Dokania, Mark Coates, Philip Torr, Ser-Nam Lim
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | GRIT achieves state-of-the-art empirical performance across a variety of graph learning benchmarks, both small and large-scale, showing the power that Graph Transformers without message-passing can deliver. Along with theoretical justification, we provide ample empirical evidence to demonstrate the effectiveness of our design choices. |
| Researcher Affiliation | Collaboration | 1McGill University, 2Department of Engineering Science, University of Oxford, 3CSAIL, Massachusetts Institute of Technology, 4Meta AI, 5Mila - Quebec AI Institute, 6Canada CIFAR AI Chair, 7Five AI, 8International Laboratory on Learning Systems (ILLS). |
| Pseudocode | No | The paper includes a visualization of the architecture (Figure 4) and mathematical equations, but no pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and models are publicly available at https://github.com/LiamMa/GRIT. |
| Open Datasets | Yes | We evaluate our proposed method on five benchmarks from the Benchmarking GNNs work (Dwivedi et al., 2022a) and two benchmarks from the recently developed Long-Range Graph Benchmark (Dwivedi et al., 2022b). In addition, we also conduct experiments on the larger datasets ZINC-full (~250,000 graphs) (Irwin et al., 2012) and PCQM4Mv2 (~3,700,000 graphs) (Hu et al., 2021). |
| Dataset Splits | Yes | Our experiments are conducted on the standard train/validation/test splits of the evaluated benchmarks. For each dataset, we execute 4 runs with different random seeds (0,1,2,3) and report the mean performance and standard deviation. |
| Hardware Specification | Yes | The timing is conducted on a single NVIDIA V100 GPU and 20 threads of an Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz. |
| Software Dependencies | No | The paper mentions using specific datasets from other works and refers to various GNN models, but does not list specific versions for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The final hyperparameters are presented in Tables 9 and 10. These tables specify values for '# Transformer Layers', 'Hidden dim', '# Heads', 'Dropout', 'Attention dropout', 'Batch size', 'Learning Rate', '# Epochs', '# Warmup epochs', and 'Weight decay' for various datasets. |
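
The "Open Datasets" row lists the public benchmarks used. The paper does not prescribe specific loaders, but the released GRIT code builds on a PyTorch Geometric-based pipeline, so the following sketch shows one common, assumed way to obtain two of those benchmarks (ZINC and PCQM4Mv2) together with their standard splits.

```python
# Assumed setup: PyTorch Geometric for ZINC, OGB for PCQM4Mv2.
# These loaders are not mandated by the paper; they are one standard way
# to fetch the benchmarks and their official train/val/test splits.
from torch_geometric.datasets import ZINC
from ogb.lsc import PygPCQM4Mv2Dataset

# ZINC: subset=True loads the 12k-graph benchmark version,
# subset=False loads the ~250,000-graph "ZINC-full" dataset.
zinc_train = ZINC(root="data/ZINC", subset=True, split="train")
zinc_val = ZINC(root="data/ZINC", subset=True, split="val")
zinc_test = ZINC(root="data/ZINC", subset=True, split="test")

# PCQM4Mv2 (~3.7M molecular graphs) ships with an official split dictionary.
pcqm = PygPCQM4Mv2Dataset(root="data/PCQM4Mv2")
split_idx = pcqm.get_idx_split()  # keys: 'train', 'valid', 'test-dev', 'test-challenge'

print(len(zinc_train), len(pcqm[split_idx["train"]]))
```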
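The "Dataset Splits" row describes the evaluation protocol: four runs with seeds 0-3 on each benchmark's standard split, reported as mean and standard deviation. The minimal sketch below illustrates that aggregation; `run_experiment` is a hypothetical stand-in for the paper's training/evaluation pipeline, not a function from the GRIT repository.

```python
import random
import statistics

def run_experiment(seed: int) -> float:
    # Hypothetical placeholder for training GRIT with a given seed on the
    # standard split and returning the test metric; a dummy value is used
    # here so the sketch runs end to end.
    random.seed(seed)
    return 0.05 + random.uniform(-0.005, 0.005)

seeds = [0, 1, 2, 3]  # the random seeds reported in the paper
scores = [run_experiment(s) for s in seeds]
print(f"mean = {statistics.mean(scores):.4f} +/- {statistics.stdev(scores):.4f}")
```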
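The "Experiment Setup" row names the hyperparameter fields reported per dataset in Tables 9 and 10. The sketch below only mirrors those field names in a configuration object; the values shown are illustrative placeholders, not the settings from the paper's tables.

```python
from dataclasses import dataclass

@dataclass
class GRITConfig:
    # Field names follow the hyperparameters listed in Tables 9 and 10.
    num_transformer_layers: int   # "# Transformer Layers"
    hidden_dim: int               # "Hidden dim"
    num_heads: int                # "# Heads"
    dropout: float                # "Dropout"
    attention_dropout: float      # "Attention dropout"
    batch_size: int               # "Batch size"
    learning_rate: float          # "Learning Rate"
    num_epochs: int               # "# Epochs"
    num_warmup_epochs: int        # "# Warmup epochs"
    weight_decay: float           # "Weight decay"

# Illustrative placeholder values only; consult Tables 9 and 10 of the paper
# for the actual per-dataset settings.
example_cfg = GRITConfig(
    num_transformer_layers=10,
    hidden_dim=64,
    num_heads=8,
    dropout=0.0,
    attention_dropout=0.2,
    batch_size=32,
    learning_rate=1e-3,
    num_epochs=2000,
    num_warmup_epochs=50,
    weight_decay=1e-5,
)
print(example_cfg)
```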