S-MolSearch: 3D Semi-supervised Contrastive Learning for Bioactive Molecule Search

Authors: Gengmo Zhou, Zhen Wang, Feng Yu, Guolin Ke, Zhewei Wei, Zhifeng Gao

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirically, S-MolSearch demonstrates superior performance on the widely used benchmarks LIT-PCBA and DUD-E, surpassing both structure-based and ligand-based virtual screening methods on AUROC, BEDROC, and EF (metric sketch below). |
| Researcher Affiliation | Collaboration | Gengmo Zhou (1,2), Zhen Wang (2), Feng Yu (2), Guolin Ke (2), Zhewei Wei (1), Zhifeng Gao (2); (1) Renmin University of China, (2) DP Technology. {zgm2015, zhewei}@ruc.edu.cn, {wangz, yufeng, kegl, gaozf}@dp.tech |
| Pseudocode | No | The paper describes the methodology through textual descriptions and mathematical equations (e.g., Equations 1-3, 5, 6, 8-10, 12, 13), but provides no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states only that "the code, model, and data are made publicly available upon acceptance"; no repository link is provided. |
| Open Datasets | Yes | The labeled data comes from ChEMBL [28], an open-access database containing extensive information on bioactive compounds with drug-like properties. The widely used virtual screening benchmarks DUD-E [22] and LIT-PCBA [13] are chosen to evaluate the performance of S-MolSearch. |
| Dataset Splits | Yes | For data splitting in both settings, 70% of the active molecules from each target in DUD-E are randomly selected as the training set for few-shot learning, while the remaining 30% and all inactive molecules serve as test data. The validation set is randomly sampled from the labeled and unlabeled datasets, and checkpoints are selected based on the loss (split sketch below). |
| Hardware Specification | Yes | The batch size is 128, and training is conducted on 4 NVIDIA V100 32 GB GPUs. |
| Software Dependencies | No | The paper describes the training configuration (Adam optimizer at a learning rate of 0.001, batch size 128, Uni-Mol backbone parameters) but does not specify software versions or library dependencies. |
| Experiment Setup | Yes | S-MolSearch is trained with the Adam optimizer at a learning rate of 0.001 and a batch size of 128 on 4 NVIDIA V100 32 GB GPUs. The backbone model uses the same parameters as Uni-Mol, retaining Uni-Mol's atom masking and coordinate-noise augmentation with a masking ratio of 0.15 and noise drawn from a uniform distribution between -1 and 1 Å (training sketch below). |
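
The Research Type row cites AUROC, BEDROC, and enrichment factor (EF) as the headline metrics. Below is a minimal sketch of how these are commonly computed for a ranked screening run, using scikit-learn for AUROC and RDKit's scoring helpers for BEDROC and EF. The toy labels/scores and the early-recognition parameter alpha = 80.5 are illustrative assumptions, not values confirmed by the paper.

```python
# Sketch: AUROC, BEDROC, and EF for a ranked virtual-screening run.
import numpy as np
from sklearn.metrics import roc_auc_score
from rdkit.ML.Scoring.Scoring import CalcBEDROC, CalcEnrichment

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)           # 1 = active, 0 = decoy (toy data)
scores = labels * 0.5 + rng.normal(size=1000)    # stand-in for model similarity scores

auroc = roc_auc_score(labels, scores)

# RDKit expects rows sorted by score, best first; column 1 holds the activity label.
ranked = sorted(zip(scores, labels), key=lambda r: r[0], reverse=True)
bedroc = CalcBEDROC(ranked, 1, 80.5)                  # alpha = 80.5 is a common choice,
                                                      # assumed here, not confirmed by the paper
ef = CalcEnrichment(ranked, 1, [0.005, 0.01, 0.05])   # EF at 0.5%, 1%, and 5% of the ranking

print(f"AUROC={auroc:.3f}  BEDROC={bedroc:.3f}  EF@0.5/1/5%={ef}")
```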
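The Dataset Splits row describes a per-target split: 70% of each DUD-E target's actives are sampled for few-shot training, and the remaining 30% plus all inactives are held out for testing. A minimal sketch of that procedure, assuming plain Python lists of molecule IDs in place of the paper's actual data pipeline:

```python
# Sketch: per-target 70/30 active split, all inactives reserved for testing.
import random

def split_target(actives, inactives, train_frac=0.7, seed=42):
    """Split one target's molecules into few-shot train and test sets."""
    rng = random.Random(seed)
    actives = list(actives)
    rng.shuffle(actives)
    n_train = int(len(actives) * train_frac)
    train = actives[:n_train]                    # 70% of actives
    test = actives[n_train:] + list(inactives)   # remaining 30% + all inactives
    return train, test

# Example with placeholder molecule IDs for a single target:
train, test = split_target([f"act_{i}" for i in range(10)],
                           [f"dec_{i}" for i in range(50)])
print(len(train), len(test))  # 7 and 53
```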
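The Experiment Setup row fixes the quoted hyperparameters: Adam at a learning rate of 0.001, batch size 128, a Uni-Mol-style masking ratio of 0.15, and coordinate noise drawn uniformly from [-1, 1] Å. The PyTorch sketch below wires these together; the linear "backbone", random batch, and mean-output loss are placeholders, not the paper's Uni-Mol encoder or contrastive objective.

```python
# Sketch: training configuration with Uni-Mol-style masking and coordinate noise.
import torch

MASK_RATIO = 0.15   # fraction of atoms masked, as quoted from the paper
NOISE_AMP = 1.0     # coordinate noise ~ U(-1, 1) Å

def augment(atom_types, coords, mask_token=0):
    """Mask a random subset of atoms and perturb their coordinates."""
    mask = torch.rand(atom_types.shape) < MASK_RATIO
    atom_types = atom_types.masked_fill(mask, mask_token)     # mask_token is a placeholder ID
    noise = (torch.rand_like(coords) * 2 - 1) * NOISE_AMP     # uniform in [-1, 1] Å
    coords = coords + noise * mask.unsqueeze(-1).float()      # perturb masked atoms only
    return atom_types, coords

model = torch.nn.Linear(3, 128)  # stand-in for the Uni-Mol backbone
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative step on a random batch of 128 "molecules" x 32 atoms:
atoms = torch.randint(1, 30, (128, 32))
coords = torch.randn(128, 32, 3)
atoms, coords = augment(atoms, coords)
loss = model(coords).mean()      # placeholder loss, not the paper's objective
optimizer.zero_grad()
loss.backward()
optimizer.step()
```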