Improving Self-supervised Molecular Representation Learning using Persistent Homology
Authors: Yuankai Luo, Lei Shi, Veronika Thost
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We rigorously evaluate our approach for molecular property prediction and demonstrate its particular features in improving the embedding space: after SSL, the representations are better and offer considerably more predictive power than the baselines over different probing tasks; our loss increases baseline performance, sometimes considerably; and we often obtain substantial improvements on very small datasets, a common scenario in practice. |
| Researcher Affiliation | Collaboration | Yuankai Luo (Beihang University, luoyk@buaa.edu.cn); Lei Shi (Beihang University, leishi@buaa.edu.cn); Veronika Thost (MIT-IBM Watson AI Lab, IBM Research, veronika.thost@ibm.com) |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | Yes | Our implementation is available at https://github.com/LUOyk1999/Molecular-homology. |
| Open Datasets | Yes | For pre-training, we considered the most common dataset following [Hu* et al., 2020]: 2 million unlabeled molecules sampled from the ZINC15 database [Sterling and Irwin, 2015]. For downstream evaluation, we focus on the MoleculeNet benchmark [Wu et al., 2018a]; the appendix contains experiments on several other datasets. (A minimal data-loading sketch follows the table.) |
| Dataset Splits | Yes | Finally, scaffold split [Ramsundar et al., 2019] is used to split graphs into train/val/test sets as 80%/10%/10%, which mimics real-world use cases. (A scaffold-split sketch follows the table.) |
| Hardware Specification | Yes | The experiments are conducted with two RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using "Graph Isomorphism Network (GIN)" but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or other libraries). |
| Experiment Setup | Yes | During the pre-training stage, GNNs are pre-trained for 100 epochs with a batch size of 256 and a learning rate of 0.001. During the fine-tuning stage, we train for 100 epochs with a batch size of 32 and a dropout rate of 0.5, and report the test performance using ROC-AUC at the best validation epoch. (A fine-tuning sketch follows the table.) |
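
The MoleculeNet tasks referenced above ship with standard loaders. Below is a minimal sketch using PyTorch Geometric's built-in `MoleculeNet` dataset class; this loader is an assumption made here for illustration, and the authors' repository may use its own preprocessing pipeline instead.

```python
# Minimal sketch: loading one MoleculeNet classification task with
# PyTorch Geometric's built-in loader (illustrative; the paper's repo
# may preprocess molecules differently).
from torch_geometric.datasets import MoleculeNet

# BBBP is one of the MoleculeNet binary-classification tasks.
dataset = MoleculeNet(root="data/moleculenet", name="BBBP")
print(len(dataset))  # number of molecular graphs
print(dataset[0])    # a single graph with node/edge features and label y
```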
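
A scaffold split assigns all molecules sharing a Bemis-Murcko scaffold to the same fold, so the test scaffolds are unseen during training; this is what "mimics real-world use cases" refers to. The sketch below follows the widely used RDKit-based recipe (as in DeepChem [Ramsundar et al., 2019]); it illustrates the idea and is not the authors' exact implementation.

```python
# Illustrative 80%/10%/10% scaffold split using RDKit (not the authors' code).
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    # Group molecule indices by their Bemis-Murcko scaffold SMILES.
    scaffolds = defaultdict(list)
    for idx, smiles in enumerate(smiles_list):
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(
            smiles=smiles, includeChirality=True)
        scaffolds[scaffold].append(idx)

    # Fill train first with the largest scaffold groups, so the rarest
    # (most "novel") scaffolds end up in validation and test.
    groups = sorted(scaffolds.values(), key=len, reverse=True)

    n = len(smiles_list)
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n
    train_idx, valid_idx, test_idx = [], [], []
    for group in groups:
        if len(train_idx) + len(group) <= train_cutoff:
            train_idx.extend(group)
        elif len(train_idx) + len(valid_idx) + len(group) <= valid_cutoff:
            valid_idx.extend(group)
        else:
            test_idx.extend(group)
    return train_idx, valid_idx, test_idx
```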
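
Putting the reported fine-tuning hyperparameters together (100 epochs, batch size 32, dropout 0.5, test ROC-AUC at the best validation epoch), a hedged sketch of the protocol is shown below. `model` and the data loaders are placeholders, single-task binary classification is assumed for simplicity, and the fine-tuning learning rate (not stated in the quote) is assumed to match the pre-training value of 0.001.

```python
# Hedged sketch of the fine-tuning protocol with the reported hyperparameters.
# `model`, `train_loader`, `valid_loader`, and `test_loader` are placeholders;
# a single-task binary classification head is assumed.
import torch
from sklearn.metrics import roc_auc_score

def evaluate(model, loader, device):
    model.eval()
    ys, ps = [], []
    with torch.no_grad():
        for batch in loader:
            batch = batch.to(device)
            ps.append(torch.sigmoid(model(batch)).cpu())
            ys.append(batch.y.cpu())
    return roc_auc_score(torch.cat(ys).numpy(), torch.cat(ps).numpy())

def finetune(model, train_loader, valid_loader, test_loader, device):
    # Learning rate assumed to match pre-training (0.001); batch size 32 and
    # dropout 0.5 are set when building the loaders and the model.
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    criterion = torch.nn.BCEWithLogitsLoss()
    best_valid_auc, test_auc_at_best = 0.0, 0.0
    for epoch in range(100):  # 100 fine-tuning epochs, as reported
        model.train()
        for batch in train_loader:
            batch = batch.to(device)
            loss = criterion(model(batch), batch.y.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        valid_auc = evaluate(model, valid_loader, device)
        if valid_auc > best_valid_auc:
            # Report the test ROC-AUC at the best validation epoch.
            best_valid_auc = valid_auc
            test_auc_at_best = evaluate(model, test_loader, device)
    return test_auc_at_best
```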