Graph Positional and Structural Encoder
Authors: Semih Cantürk, Renming Liu, Olivier Lapointe-Gagné, Vincent Létourneau, Guy Wolf, Dominique Beaini, Ladislav Rampášek
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that across a wide range of benchmarks, GPSE-enhanced models can significantly outperform those that employ explicitly computed PSEs, and at least match their performance in others. Our results pave the way for the development of foundational pre-trained graph encoders for extracting positional and structural information, and highlight their potential as a more powerful and efficient alternative to explicitly computed PSEs and existing self-supervised pre-training approaches. |
| Researcher Affiliation | Collaboration | (1) DIRO, Université de Montréal, Montréal, Canada; (2) Mila - Quebec AI Institute, Montréal, Canada; (3) Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, United States; (4) Valence Labs, Montréal, Canada; (5) Isomorphic Labs, London, UK. |
| Pseudocode | No | The paper provides mathematical formulations of its architecture and loss functions, and describes its components verbally and diagrammatically (Figure 1), but it does not include formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | Our framework and pre-trained models are publicly available at https://github.com/G-Taxonomy-Workgroup/GPSE. For convenience, GPSE has also been integrated into the PyG library to facilitate downstream applications. |
| Open Datasets | Yes | Training dataset PCQM4Mv2 (Hu et al., 2021) is a typical choice of pre-training dataset for molecular tasks. ... we train GPSE with MolPCBA (Hu et al., 2020a) with 323,555 unique molecular graphs and an average number of 25 nodes. ... MolHIV & MolPCBA (Hu et al., 2020a) (MIT License) are molecular property prediction datasets derived from the MoleculeNet benchmarks (Wu et al., 2018). |
| Dataset Splits | Yes | We randomly select 5% validation and 5% testing data fixed across runs, and use the remaining data for training GPSE. (A minimal sketch of such a split is given below the table.) |
| Hardware Specification | Yes | All experiments are run using Tesla V100 GPUs (32GB), with varying numbers of CPUs from 4 to 8 and up to 48GB of memory (except for two cases: (i) 80GB of memory is needed when performing downstream evaluation on MolPCBA, and (ii) 128GB is needed when pre-training GPSE on the ChEMBL dataset). |
| Software Dependencies | No | Our codebase is based on GraphGPS (Rampášek et al., 2022), which uses PyG and its GraphGym module (Fey & Lenssen, 2019; You et al., 2020a). The paper mentions these software packages but does not specify their version numbers. |
| Experiment Setup | Yes | Details about hyperparameters can be found in Table B.4. ... For completeness, we list all hyperparameters for our main benchmarking studies in Tables B.1 and B.2. ... Table B.1. GPS+GPSE hyperparameters for molecular property prediction benchmarks ... Table B.2. GPS+GPSE hyperparameters for transferability benchmarks ... Table B.3. Downstream MPNN hyperparameters for node-level benchmarks. ... Table B.4. GPSE processing encoder hyperparameters for MoleculeNet small benchmarks and synthetic WL graph benchmarks. |
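The Dataset Splits row reports that 5% of the pre-training data is held out for validation and 5% for testing, with the split fixed across runs. Below is a minimal sketch of one way to realize such a fixed random split; the function name, seed value, and use of NumPy are illustrative assumptions and are not taken from the GPSE codebase.

```python
# Sketch of a 90/5/5 random split held fixed across runs via a fixed seed.
# Assumption: a simple NumPy index permutation; not the authors' implementation.
import numpy as np

def fixed_random_split(num_graphs: int, val_frac: float = 0.05,
                       test_frac: float = 0.05, seed: int = 0):
    """Return (train_idx, val_idx, test_idx) index arrays over a graph dataset."""
    rng = np.random.default_rng(seed)      # fixed seed -> identical split every run
    perm = rng.permutation(num_graphs)     # random ordering of all graph indices
    n_val = int(num_graphs * val_frac)
    n_test = int(num_graphs * test_frac)
    val_idx = perm[:n_val]
    test_idx = perm[n_val:n_val + n_test]
    train_idx = perm[n_val + n_test:]      # remaining ~90% used for training
    return train_idx, val_idx, test_idx

# Example with the 323,555 MolPCBA graphs quoted in the Open Datasets row.
train_idx, val_idx, test_idx = fixed_random_split(323_555)
print(len(train_idx), len(val_idx), len(test_idx))  # 291201 16177 16177
```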