Towards Principled Graph Transformers

Authors: Luis Müller, Daniel Kusuma, Blai Bonet, Christopher Morris

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that the Edge Transformer surpasses other theoretically aligned architectures regarding predictive performance and is competitive with state-of-the-art models on algorithmic reasoning and molecular regression tasks while not relying on positional or structural encodings. Our code is available at https://github.com/luis-mueller/towards-principled-gts.
Researcher Affiliation | Academia | Luis Müller (RWTH Aachen University, luis.mueller@cs.rwth-aachen.de), Daniel Kusuma (RWTH Aachen University), Blai Bonet (Universitat Pompeu Fabra), Christopher Morris (RWTH Aachen University)
Pseudocode | Yes | Algorithm 1: Comparison between standard attention and triangular attention in PyTorch-like pseudo-code. (A hedged sketch of this comparison follows the table.)
Open Source Code | Yes | Our code is available at https://github.com/luis-mueller/towards-principled-gts.
Open Datasets | Yes | ZINC (12K), ALCHEMY (12K), and ZINC-FULL are available at https://pyg.org under an MIT license. PCQM4MV2 is available at https://ogb.stanford.edu/docs/lsc/pcqm4mv2/ under a CC BY 4.0 license. The CLRS benchmark is available at https://github.com/google-deepmind/clrs under an Apache 2.0 license. The BREC benchmark is available at https://github.com/GraphPKU/BREC under an MIT license.
Dataset Splits | Yes | For ZINC (12K), ZINC-FULL, PCQM4MV2, CLRS, and BREC, we follow the standard train/validation/test splits.
Hardware Specification | Yes | All experiments were performed on a mix of NVIDIA A10, L40, and A100 GPUs. For each run, we used at most 8 CPU cores and 64 GB of RAM, with the exception of PCQM4MV2 and ZINC-FULL, which were trained on 4 L40 GPUs with 16 CPU cores and 256 GB of RAM.
Software Dependencies | No | The paper mentions software such as PyTorch, JAX, and Triton [45] but does not provide specific version numbers for these components, which are necessary for a reproducible description of the ancillary software.
Experiment Setup | Yes | Table 6: Hyperparameters of the Edge Transformer across all datasets. The table lists, per dataset, the learning rate, gradient clipping norm, batch size, optimizer, number of layers, hidden dimension, number of heads, activation, pooling, RRWP dimension, weight decay, dropout, attention dropout, number of steps, number of warm-up steps, number of epochs, number of warm-up epochs, and number of RRWP steps. (An illustrative configuration skeleton with these fields also follows the table.)
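
The paper's Algorithm 1 contrasts standard self-attention with the Edge Transformer's triangular attention in PyTorch-like pseudo-code; the exact listing is in the paper. Below is a minimal, single-head sketch of that contrast. The function names, the single-head simplification, and the particular value parameterization (two projections combined multiplicatively over the intermediate node) are assumptions made here for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F


def standard_attention(x, w_q, w_k, w_v):
    """Standard self-attention over node features x of shape [n, d]."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-1, -2) / k.shape[-1] ** 0.5   # [n, n]
    return F.softmax(scores, dim=-1) @ v                     # [n, d]


def triangular_attention(x, w_q, w_k, w_v1, w_v2):
    """Triangular attention over pair features x of shape [n, n, d].

    The pair (i, j) attends over all intermediate nodes l, combining
    information from the pairs (i, l) and (l, j).
    """
    q, k = x @ w_q, x @ w_k
    v1, v2 = x @ w_v1, x @ w_v2
    # Attention score of pair (i, j) for intermediate node l:
    # dot product between q of (i, l) and k of (l, j).
    scores = torch.einsum("ild,ljd->ijl", q, k) / k.shape[-1] ** 0.5  # [n, n, n]
    a = F.softmax(scores, dim=-1)
    # Weighted sum over l of an elementwise combination of (i, l) and (l, j).
    return torch.einsum("ijl,ild,ljd->ijd", a, v1, v2)                # [n, n, d]


# Illustrative usage with random inputs.
n, d = 8, 16
pairs = torch.randn(n, n, d)
w_q, w_k, w_v1, w_v2 = (torch.randn(d, d) for _ in range(4))
out = triangular_attention(pairs, w_q, w_k, w_v1, w_v2)  # shape [8, 8, 16]
```

In this form the layer maintains O(n²) pair states and the attention itself costs O(n³) time per layer, which reflects the cubic scaling of triangular attention discussed for the Edge Transformer.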
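
For the experiment setup, Table 6 of the paper gives the concrete hyperparameter values per dataset. As an illustration only, a configuration skeleton covering the fields named in that table might look as follows; the field names mirror Table 6, the values are deliberately left unset because the per-dataset values are in the paper, and the dataclass itself is an assumption made here, not the authors' configuration code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EdgeTransformerConfig:
    """Hypothetical container for the hyperparameter fields listed in Table 6."""
    # Optimization
    learning_rate: Optional[float] = None
    grad_clip_norm: Optional[float] = None
    batch_size: Optional[int] = None
    optimizer: Optional[str] = None
    weight_decay: Optional[float] = None
    # Architecture
    num_layers: Optional[int] = None
    hidden_dim: Optional[int] = None
    num_heads: Optional[int] = None
    activation: Optional[str] = None
    pooling: Optional[str] = None
    rrwp_dim: Optional[int] = None
    num_rrwp_steps: Optional[int] = None
    # Regularization
    dropout: Optional[float] = None
    attention_dropout: Optional[float] = None
    # Training schedule (steps or epochs, depending on the dataset)
    num_steps: Optional[int] = None
    num_warmup_steps: Optional[int] = None
    num_epochs: Optional[int] = None
    num_warmup_epochs: Optional[int] = None
```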