Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Improving Self-supervised Molecular Representation Learning using Persistent Homology
Authors: Yuankai Luo, Lei Shi, Veronika Thost
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We rigorously evaluate our approach for molecular property prediction and demonstrate its particular features in improving the embedding space: after SSL, the representations are better and offer considerably more predictive power than the baselines over different probing tasks; our loss increases baseline performance, sometimes largely; and we often obtain substantial improvements over very small datasets, a common scenario in practice. |
| Researcher Affiliation | Collaboration | Yuankai Luo, Beihang University; Lei Shi, Beihang University; Veronika Thost, MIT-IBM Watson AI Lab, IBM Research |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the paper. |
| Open Source Code | Yes | Our implementation is available at https://github.com/LUOyk1999/Molecular-homology. |
| Open Datasets | Yes | For pre-training, we considered the most common dataset following [Hu* et al., 2020], 2 million unlabeled molecules sampled from the ZINC15 database [Sterling and Irwin, 2015]. For downstream evaluation, we focus on the MoleculeNet benchmark [Wu et al., 2018a] here, the appendix contains experiments on several other datasets. |
| Dataset Splits | Yes | Finally, scaffold-split [Ramsundar et al., 2019] is used to split graphs into train/val/test sets as 80%/10%/10%, which mimics real-world use cases. |
| Hardware Specification | Yes | The experiments are conducted with two RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using "Graph Isomorphism Network (GIN)" but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or other libraries). |
| Experiment Setup | Yes | During the pre-training stage, GNNs are pre-trained for 100 epochs with batch-size as 256 and the learning rate as 0.001. During the fine-tuning stage, we train for 100 epochs with batch-size as 32, dropout rate as 0.5, and report the test performance using ROC-AUC at the best validation epoch. |
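
The Dataset Splits row cites an 80%/10%/10% scaffold split. As a rough illustration only, the sketch below shows one common deterministic way to build such a split: molecules are grouped by scaffold, and whole groups are assigned to train, validation, or test so that no scaffold appears in more than one set. This is not the authors' implementation. The function name `scaffold_split` is hypothetical, and the scaffold strings are assumed to be precomputed elsewhere (for example, Murcko scaffold SMILES); that step is not shown.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Split molecule indices into train/valid/test by scaffold group.

    `scaffolds` holds one scaffold string per molecule. Computing the
    scaffolds themselves is assumed to happen elsewhere.
    """
    # Group molecule indices by their scaffold string.
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest groups first; ties broken by first member index so the
    # result is deterministic.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n

    train, valid, test = [], [], []
    for group in ordered:
        # A whole group goes to exactly one split, so scaffolds never
        # leak between train, validation, and test.
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


if __name__ == "__main__":
    # Toy scaffold labels standing in for real scaffold SMILES.
    demo = ["A"] * 12 + ["B"] * 3 + ["C"] * 2 + ["D"] * 2 + ["E"]
    train_idx, valid_idx, test_idx = scaffold_split(demo)
    print(len(train_idx), len(valid_idx), len(test_idx))
```

Because whole scaffold groups are assigned together, the resulting set sizes only approximate the requested fractions, which is an expected property of this kind of split.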