Aligning Transformers with Weisfeiler-Leman
Authors: Luis Müller, Christopher Morris
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our transformers on the large-scale PCQM4Mv2 dataset, showing competitive predictive performance with the state-of-the-art and demonstrating strong downstream performance when fine-tuning them on small-scale molecular datasets. |
| Researcher Affiliation | Academia | Department of Computer Science, RWTH Aachen University, Germany. Correspondence to: Luis Müller <luis.mueller@cs.rwth-aachen.de>. |
| Pseudocode | No | The paper describes algorithms mathematically (e.g., in Appendix D for transformers and Appendix E for Weisfeiler-Leman algorithms) but does not provide explicit pseudocode blocks labeled 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | The source code for all experiments is available at https://github.com/luis-mueller/wl-transformers. |
| Open Datasets | Yes | For pre-training, we train on PCQM4MV2, one of the largest molecular regression datasets available (Hu et al., 2021). |
| Dataset Splits | No | The paper mentions a validation set ('Validation MAE' in Table 2) and states, 'For model evaluation, we use the code provided by Hu et al. (2021), available at https://github.com/snap-stanford/ogb.' This implies the standard OGB splits are used, but the paper does not explicitly give split percentages or sample counts for all datasets (see the split-loading sketch below the table). |
| Hardware Specification | Yes | For pre-training...on two A100 NVIDIA GPUs;...on a single A10 NVIDIA GPU with 24 GB RAM. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 6: Hyper-parameters for (2, 1)-GT pre-training on PCQM4MV2: learning rate 2e-4; weight decay 0.1; attention dropout 0.1; post-attention dropout 0.1; batch size 256; 2M gradient steps; 60K warmup steps; bfloat16 precision (see the training-setup sketch below the table). |
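For reference, the standard OGB-LSC splits cited in the Dataset Splits row can be loaded directly with the `ogb` package the paper points to. The sketch below prints the official split indices and runs the official MAE evaluator that produces the 'Validation MAE' metric; the `data` directory name and the zero-filled labels/predictions are placeholders, not values from the paper.

```python
import torch
from ogb.lsc import PygPCQM4Mv2Dataset, PCQM4Mv2Evaluator

# Download PCQM4Mv2 and read the official split indices
# ('train', 'valid', 'test-dev', 'test-challenge').
dataset = PygPCQM4Mv2Dataset(root="data")  # 'data' is an arbitrary local path
split_idx = dataset.get_idx_split()
print({name: len(idx) for name, idx in split_idx.items()})

# The official evaluator computes the MAE reported as 'Validation MAE'.
evaluator = PCQM4Mv2Evaluator()
y_true = torch.zeros(len(split_idx["valid"]))  # placeholder labels
y_pred = torch.zeros(len(split_idx["valid"]))  # placeholder predictions
print(evaluator.eval({"y_true": y_true, "y_pred": y_pred}))  # {'mae': ...}
```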
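And a minimal training-setup sketch matching the Table 6 hyper-parameters. Table 6 lists the warmup step count but not the optimizer or post-warmup schedule, so AdamW and a linear warmup followed by a constant rate are assumptions here; the linear layer is a stand-in for the actual (2, 1)-GT from the authors' repository, and the attention/post-attention dropout values live inside that model, so they do not appear below.

```python
import torch

model = torch.nn.Linear(16, 1)  # stand-in for the (2, 1)-GT model

# Learning rate 2e-4 and weight decay 0.1 per Table 6; AdamW is an assumption.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=0.1)

WARMUP_STEPS = 60_000    # '# warmup steps 60K'
TOTAL_STEPS = 2_000_000  # '# gradient steps 2M'
BATCH_SIZE = 256         # 'Batch size 256'

def lr_lambda(step: int) -> float:
    # Linear warmup to the peak rate, then constant; the post-warmup
    # schedule is not specified in Table 6 and is assumed here.
    return min(1.0, (step + 1) / WARMUP_STEPS)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(TOTAL_STEPS):
    x = torch.randn(BATCH_SIZE, 16)  # placeholder batch
    # bfloat16 autocast per the 'precision bfloat16' entry.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
    break  # single illustrative step
```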