DRew: Dynamically Rewired Message Passing with Delay
Authors: Benjamin Gutteridge, Xiaowen Dong, Michael M. Bronstein, Francesco Di Giovanni
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we focus on two strengths of our model. First, we validate performance in comparison with benchmark models, including vanilla and multi-hop MPNNs and graph Transformers, over five real-world tasks spanning graph-, node- and edge-level tasks. Second, we validate the robustness of νDRew for long-range-dependent tasks and increased-depth architectures, using a synthetic task and a real-world molecular dataset. |
| Researcher Affiliation | Academia | ¹Department of Engineering Science, University of Oxford; ²Department of Computer Science, University of Oxford; ³Department of Computer Science and Technology, University of Cambridge; ⁴Faculty of Informatics, University of Lugano. |
| Pseudocode | No | The paper describes methods using equations and prose, but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | All code to reproduce experimental results is available at https://github.com/BenGutteridge/DRew. |
| Open Datasets | Yes | The Long Range Graph Benchmark (LRGB; Dwivedi et al. (2022)) is a set of GNN benchmarks involving long-range interactions. We provide experiments for four datasets from this benchmark, spanning the full range of tasks associated with GNNs: graph classification (Peptides-func), graph regression (Peptides-struct), link prediction (PCQM-Contact) and node classification (PascalVOC-SP)." and "QM9 (Ramakrishnan et al., 2014) is a molecular multi-task graph regression benchmark dataset of ∼130,000 graphs with ∼18 nodes each and a maximum graph diameter of 10. |
| Dataset Splits | Yes | We use an 80:10:10 split for train, test and validation. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | All experiments were run on server nodes using a single GPU. A mixture of P100, V100, A100, Titan V and RTX GPUs were used, as well as a mixture of Broadwell, Haswell and Cascade Lake CPUs. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and `torch_geometric.transforms` but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | All experiments are averaged over three runs and were allowed to train for 300 epochs or until convergence. Classical MPNN and graph Transformer results are reproduced from Dwivedi et al. (2022), except GraphGPS, which is reproduced from Rampášek et al. (2022). DRew-MPNN, DIGL and MixHop-GCN models were trained using similar hyperparameterisations to their classical MPNN counterparts (see Appendix B). Some models include positional encoding (PE), either Laplacian (LapPE; Dwivedi et al. (2020)) or Random Walk (RWSE; Dwivedi et al. (2021)), as this improves performance and is necessary to induce a notion of locality in Transformers. We provide the performance of the best-case νDRew model with respect to ν ∈ {1, ∞} and network depth L for both the PE and non-PE cases. Hyperparameters and other experimental details are available in Appendix B. As in Dwivedi et al. (2022), we use a fixed 500k parameter budget. (A hedged parameter-budget sketch also follows the table.) |
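As a concrete illustration of the 80:10:10 split quoted in the Dataset Splits row, here is a minimal sketch using PyTorch's `random_split`. The helper name, the fixed seed, and the use of a uniformly random split are assumptions made for illustration, not code taken from the DRew repository.

```python
import torch
from torch.utils.data import random_split

def make_splits(dataset, seed=0):
    """Hypothetical 80:10:10 train/val/test split (helper and seed are assumptions)."""
    n = len(dataset)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    n_test = n - n_train - n_val  # remainder, so the three parts cover the whole dataset
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)
```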
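The fixed 500k parameter budget mentioned in the Experiment Setup row can be enforced with a simple count over trainable parameters. A minimal sketch, assuming a standard `nn.Module`; the helper name and assertion style are ours, not the authors'.

```python
import torch.nn as nn

def check_param_budget(model: nn.Module, budget: int = 500_000) -> int:
    """Count trainable parameters and fail loudly if the budget is exceeded."""
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    assert n_params <= budget, f"{n_params:,} trainable parameters exceed the {budget:,} budget"
    return n_params
```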