Towards Principled Graph Transformers
Authors: Luis Müller, Daniel Kusuma, Blai Bonet, Christopher Morris
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that the Edge Transformer surpasses other theoretically aligned architectures regarding predictive performance and is competitive with state-of-the-art models on algorithmic reasoning and molecular regression tasks while not relying on positional or structural encodings. Our code is available at https://github.com/luis-mueller/towards-principled-gts. |
| Researcher Affiliation | Academia | Luis Müller (RWTH Aachen University, luis.mueller@cs.rwth-aachen.de); Daniel Kusuma (RWTH Aachen University); Blai Bonet (Universitat Pompeu Fabra); Christopher Morris (RWTH Aachen University) |
| Pseudocode | Yes | Algorithm 1: Comparison between standard attention and triangular attention in PyTorch-like pseudo-code (see the sketch after this table). |
| Open Source Code | Yes | Our code is available at https://github.com/luis-mueller/towards-principled-gts. |
| Open Datasets | Yes | ZINC (12K), ALCHEMY (12K), and ZINC-FULL are available at https://pyg.org under an MIT license. PCQM4MV2 is available at https://ogb.stanford.edu/docs/lsc/pcqm4mv2/ under a CC BY 4.0 license. The CLRS benchmark is available at https://github.com/google-deepmind/clrs under an Apache 2.0 license. The BREC benchmark is available at https://github.com/GraphPKU/BREC under an MIT license. |
| Dataset Splits | Yes | For ZINC (12K), ZINC-FULL, PCQM4MV2, CLRS, and BREC, we follow the standard train/validation/test splits. |
| Hardware Specification | Yes | All experiments were performed on a mix of A10, L40, and A100 NVIDIA GPUs. For each run, we used at most 8 CPU cores and 64 GB of RAM, with the exception of PCQM4MV2 and ZINC-FULL, which were trained on 4 L40 GPUs with 16 CPU cores and 256 GB RAM. |
| Software Dependencies | No | The paper mentions software like 'PyTorch', 'Jax', and 'Triton [45]' but does not provide specific version numbers for these components, which are necessary for a reproducible description of ancillary software. |
| Experiment Setup | Yes | Table 6: Hyperparameters of the Edge Transformer across all datasets. The table lists, per dataset: learning rate, gradient clip norm, batch size, optimizer, number of layers, hidden dimension, number of heads, activation, pooling, RRWP dimension, weight decay, dropout, attention dropout, number of steps, number of warm-up steps, number of epochs, number of warm-up epochs, and number of RRWP steps. |
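For reference, the pseudocode row above contrasts standard attention with the Edge Transformer's triangular attention. Below is a minimal, single-head PyTorch sketch of that contrast, written from the standard formulation of triangular (edge) attention; the function names and weight matrices (`wq`, `wk`, `wv1`, `wv2`) are illustrative placeholders and not taken from the paper's Algorithm 1 or its released code.

```python
import torch
import torch.nn.functional as F

def standard_attention(x, wq, wk, wv):
    # x: [n, d] token states; single head, no output projection.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.transpose(-1, -2) / k.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def triangular_attention(x, wq, wk, wv1, wv2):
    # x: [n, n, d] states for all node pairs (edges); single head.
    q, k = x @ wq, x @ wk
    v1, v2 = x @ wv1, x @ wv2
    # Attention over the intermediate node j: logits[i, l, j] = <q[i, j], k[j, l]>.
    logits = torch.einsum('ijd,jld->ilj', q, k) / k.shape[-1] ** 0.5
    a = F.softmax(logits, dim=-1)
    # Values combine the two incident edges (i, j) and (j, l) elementwise.
    v = torch.einsum('ijd,jld->iljd', v1, v2)
    # Aggregate over the intermediate node to update the (i, l) pair state.
    return torch.einsum('ilj,iljd->ild', a, v)
```

The key difference the sketch highlights: standard attention mixes `n` token states, whereas triangular attention updates each of the `n x n` pair states by attending over an intermediate node, giving the cubic-cost mechanism the paper aligns with 3-WL expressivity.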