Aligning Transformers with Weisfeiler-Leman

Authors: Luis Müller, Christopher Morris

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our transformers on the large-scale PCQM4Mv2 dataset, showing competitive predictive performance with the state-of-the-art and demonstrating strong downstream performance when fine-tuning them on small-scale molecular datasets.
Researcher Affiliation | Academia | Department of Computer Science, RWTH Aachen University, Germany. Correspondence to: Luis Müller <luis.mueller@cs.rwth-aachen.de>.
Pseudocode | No | The paper describes algorithms mathematically (e.g., in Appendix D for Transformers and Appendix E for Weisfeiler-Leman algorithms) but does not provide explicit pseudocode blocks labeled as 'Algorithm' or 'Pseudocode'.
Open Source Code | Yes | The source code for all experiments is available at https://github.com/luis-mueller/wl-transformers.
Open Datasets | Yes | For pre-training, we train on PCQM4Mv2, one of the largest molecular regression datasets available (Hu et al., 2021).
Dataset Splits | No | The paper mentions using a validation set ('Validation MAE' in Table 2) and states, 'For model evaluation, we use the code provided by Hu et al. (2021), available at https://github.com/snap-stanford/ogb.' This implies the standard OGB splits are used, but specific percentages or sample counts for train/validation/test are not explicitly given in the paper for all datasets (see the split-loading sketch after this table).
Hardware Specification | Yes | For pre-training...on two A100 NVIDIA GPUs; ...on a single A10 NVIDIA GPU with 24GB RAM.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Table 6: Hyper-parameters for (2, 1)-GT pre-training on PCQM4Mv2: learning rate 2e-4; weight decay 0.1; attention dropout 0.1; post-attention dropout 0.1; batch size 256; # gradient steps 2M; # warmup steps 60K; precision bfloat16 (see the config sketch below).
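
Since the paper defers to the OGB code for evaluation, the snippet below is a minimal sketch (not the authors' code) showing how the standard PCQM4Mv2 split indices and the official MAE evaluator are obtained with the ogb package; the zero predictions are placeholders for actual model outputs.

```python
# Minimal sketch: standard PCQM4Mv2 splits and MAE evaluation via the OGB
# package referenced in the paper (https://github.com/snap-stanford/ogb).
import torch
from ogb.lsc import PCQM4Mv2Dataset, PCQM4Mv2Evaluator

# only_smiles=True loads SMILES strings and labels without the rdkit-based
# graph conversion; the package downloads the data on first use.
dataset = PCQM4Mv2Dataset(root="data", only_smiles=True)

split_idx = dataset.get_idx_split()  # 'train', 'valid', 'test-dev', 'test-challenge'
print({name: len(idx) for name, idx in split_idx.items()})  # sample count per split

# Labels are public for the validation split; test labels are withheld.
y_true = torch.tensor([dataset[int(i)][1] for i in split_idx["valid"]])
y_pred = torch.zeros_like(y_true)  # placeholder predictions, not model outputs

evaluator = PCQM4Mv2Evaluator()
print(evaluator.eval({"y_pred": y_pred, "y_true": y_true}))  # -> {'mae': ...}
```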
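
For reference, the Table 6 hyper-parameters can be collected in a small config object. This is an illustrative sketch only: the field names are assumptions, and the released code at https://github.com/luis-mueller/wl-transformers may organize its configuration differently.

```python
# Illustrative config sketch of the Table 6 pre-training hyper-parameters for
# the (2, 1)-GT on PCQM4Mv2; field names are assumptions, not the authors' own.
from dataclasses import dataclass

@dataclass(frozen=True)
class PretrainConfig:
    learning_rate: float = 2e-4          # Learning rate
    weight_decay: float = 0.1            # Weight decay
    attention_dropout: float = 0.1       # Attention dropout
    post_attention_dropout: float = 0.1  # Post-attention dropout
    batch_size: int = 256                # Batch size
    gradient_steps: int = 2_000_000      # 2M gradient steps
    warmup_steps: int = 60_000           # 60K warmup steps
    precision: str = "bfloat16"          # Mixed-precision setting

config = PretrainConfig()
print(config)
```

A frozen dataclass keeps the reported values in one place and raises an error if they are accidentally modified during a run.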