Improving Self-supervised Molecular Representation Learning using Persistent Homology

Authors: Yuankai Luo, Lei Shi, Veronika Thost

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We rigorously evaluate our approach for molecular property prediction and demonstrate its particular features in improving the embedding space: after SSL, the representations are better and offer considerably more predictive power than the baselines over different probing tasks; our loss increases baseline performance, sometimes largely; and we often obtain substantial improvements over very small datasets, a common scenario in practice.
Researcher Affiliation | Collaboration | Yuankai Luo (Beihang University, luoyk@buaa.edu.cn); Lei Shi (Beihang University, leishi@buaa.edu.cn); Veronika Thost (MIT-IBM Watson AI Lab, IBM Research, veronika.thost@ibm.com)
Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the paper.
Open Source Code | Yes | Our implementation is available at https://github.com/LUOyk1999/Molecular-homology.
Open Datasets | Yes | For pre-training, we considered the most common dataset following [Hu* et al., 2020]: 2 million unlabeled molecules sampled from the ZINC15 database [Sterling and Irwin, 2015]. For downstream evaluation, we focus on the MoleculeNet benchmark [Wu et al., 2018a] here; the appendix contains experiments on several other datasets.
Dataset Splits | Yes | Scaffold split [Ramsundar et al., 2019] is used to split graphs into train/val/test sets of 80%/10%/10%, which mimics real-world use cases. (A scaffold-split sketch is given below the table.)
Hardware Specification | Yes | The experiments are conducted with two RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using "Graph Isomorphism Network (GIN)" but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or other libraries). (An illustrative GIN encoder sketch is given below the table.)
Experiment Setup | Yes | During the pre-training stage, GNNs are pre-trained for 100 epochs with a batch size of 256 and a learning rate of 0.001. During the fine-tuning stage, we train for 100 epochs with a batch size of 32 and a dropout rate of 0.5, and report the test performance using ROC-AUC at the best validation epoch. (An illustrative fine-tuning sketch is given below the table.)
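
To make the scaffold-split protocol in the Dataset Splits row concrete, below is a minimal sketch of a deterministic Bemis-Murcko scaffold split. It assumes RDKit is available; the function name `scaffold_split` and the greedy largest-group-first assignment are illustrative choices, not taken from the authors' repository.

```python
# Minimal scaffold-split sketch (80/10/10), assuming RDKit is installed.
# Names and the greedy assignment strategy are illustrative, not from the paper's code.
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold


def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    # Group molecule indices by their Bemis-Murcko scaffold SMILES.
    scaffold_to_indices = defaultdict(list)
    for idx, smiles in enumerate(smiles_list):
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(smiles=smiles, includeChirality=False)
        scaffold_to_indices[scaffold].append(idx)

    # Assign whole scaffold groups (largest first) to train/valid/test,
    # so no scaffold is shared across splits.
    groups = sorted(scaffold_to_indices.values(), key=len, reverse=True)
    n = len(smiles_list)
    train_cutoff, valid_cutoff = frac_train * n, (frac_train + frac_valid) * n
    train, valid, test = [], [], []
    for group in groups:
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```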
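
The Software Dependencies row notes that the paper uses a Graph Isomorphism Network (GIN) encoder without pinning library versions. The following sketch shows what such an encoder looks like in PyTorch Geometric; the layer count, hidden size, dropout, and pooling choice are assumptions for illustration, not the authors' exact configuration.

```python
# Illustrative GIN graph encoder in PyTorch Geometric (hyperparameters are assumptions).
import torch
from torch import nn
from torch_geometric.nn import GINConv, global_mean_pool


class GINEncoder(nn.Module):
    def __init__(self, in_dim, hidden_dim=300, num_layers=5, dropout=0.5):
        super().__init__()
        self.convs = nn.ModuleList()
        for layer in range(num_layers):
            mlp = nn.Sequential(
                nn.Linear(in_dim if layer == 0 else hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
            )
            self.convs.append(GINConv(mlp))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, edge_index, batch):
        # Message passing over node features, then mean pooling to a graph embedding.
        for conv in self.convs:
            x = self.dropout(torch.relu(conv(x, edge_index)))
        return global_mean_pool(x, batch)  # one embedding per molecule graph
```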
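
Finally, a hedged sketch of the fine-tuning protocol from the Experiment Setup row (100 epochs, batch size 32, test ROC-AUC reported at the best validation epoch). The data loaders, the model producing one logit per graph, and the single-task binary setup are placeholders, and the fine-tuning learning rate is an assumed value; this is not the authors' training script.

```python
# Hedged fine-tuning sketch: train 100 epochs, report test ROC-AUC at the best validation epoch.
import numpy as np
import torch
from sklearn.metrics import roc_auc_score


def evaluate(model, loader, device):
    model.eval()
    ys, preds = [], []
    with torch.no_grad():
        for batch in loader:  # PyTorch Geometric DataLoader batches assumed
            batch = batch.to(device)
            logits = model(batch.x, batch.edge_index, batch.batch).view(-1)
            ys.append(batch.y.view(-1).cpu().numpy())
            preds.append(torch.sigmoid(logits).cpu().numpy())
    return roc_auc_score(np.concatenate(ys), np.concatenate(preds))


def finetune(model, train_loader, valid_loader, test_loader, device, epochs=100, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # lr is an assumption
    criterion = torch.nn.BCEWithLogitsLoss()
    best_valid, test_at_best = 0.0, 0.0
    for epoch in range(epochs):
        model.train()
        for batch in train_loader:
            batch = batch.to(device)
            optimizer.zero_grad()
            logits = model(batch.x, batch.edge_index, batch.batch).view(-1)
            loss = criterion(logits, batch.y.float().view(-1))
            loss.backward()
            optimizer.step()
        valid_auc = evaluate(model, valid_loader, device)
        if valid_auc > best_valid:  # keep the test score from the best validation epoch
            best_valid, test_at_best = valid_auc, evaluate(model, test_loader, device)
    return test_at_best
```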