Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Motif-based Graph Self-Supervised Learning for Molecular Property Prediction

Authors: Zaixi Zhang, Qi Liu, Hao Wang, Chengqiang Lu, Chee-Kong Lee

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on various downstream benchmark tasks show that our methods outperform all state-of-the-art baselines.
Researcher Affiliation | Collaboration | Zaixi Zhang1, Qi Liu1, Hao Wang1, Chengqiang Lu1, Chee-Kong Lee2. 1: Anhui Province Key Lab of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China. 2: Tencent America.
Pseudocode | Yes | The pseudo codes of the training process is included in the Appendix.
Open Source Code | Yes | The implementation is publicly available at https://github.com/zaixizhang/MGSSL.
Open Datasets | Yes | we use 250k unlabeled molecules sampled from the ZINC15 database [38] for self-supervised pre-training tasks. As for the downstream finetune tasks, we consider 8 binary classification benchmark datasets contained in MoleculeNet [45].
Dataset Splits | Yes | The split for train/validation/test sets is 80% : 10% : 10%.
Hardware Specification | Yes | All experiments are conducted on Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using "the open-source package RDKit [22]" but does not specify a version number for it or any other software dependencies.
Experiment Setup | Yes | In the process of pre-training, GNNs are pre-trained for 100 epochs with Adam optimizer and learning rate 0.001. In the finetuning stage, we train for 100 epochs and report the testing score with the best cross-validation performance. The hidden dimension is set to 300 and the batch size is set to 32 for pre-training and finetuning.
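
For readers attempting a reproduction, the values quoted in the Dataset Splits and Experiment Setup rows can be gathered into a single configuration object. The sketch below is illustrative only and is not taken from the MGSSL repository: the names `ExperimentSetup` and `split_sizes` are hypothetical, and the size-based split is an assumption, since the quoted text gives only the 80% : 10% : 10% ratio and not how molecules are assigned to each set. The finetuning learning rate is not quoted above, so it is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hyperparameters as quoted in the Experiment Setup row (hypothetical container)."""
    pretrain_epochs: int = 100
    finetune_epochs: int = 100
    pretrain_optimizer: str = "Adam"
    pretrain_learning_rate: float = 0.001
    hidden_dim: int = 300
    batch_size: int = 32  # used for both pre-training and finetuning
    split_percent: tuple = (80, 10, 10)  # train / validation / test


def split_sizes(n_molecules, percent=(80, 10, 10)):
    """Return (train, validation, test) sizes for the given percentages.

    Integer arithmetic avoids floating-point rounding; any remainder
    goes to the test set so the three sizes always sum to n_molecules.
    """
    n_train = n_molecules * percent[0] // 100
    n_valid = n_molecules * percent[1] // 100
    n_test = n_molecules - n_train - n_valid
    return n_train, n_valid, n_test


setup = ExperimentSetup()
train, valid, test = split_sizes(1000, setup.split_percent)
print(train, valid, test)
```

Because the Software Dependencies row reports that no package versions are given, a reproduction would also need to record the RDKit and deep learning framework versions actually used.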