Instructor-inspired Machine Learning for Robust Molecular Property Prediction

Authors: Fang Wu, Shuting Jin, Siyuan Li, Stan Z. Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrated the high accuracy of InstructMol on several real-world molecular datasets and out-of-distribution (OOD) benchmarks. We carry out a wide range of experiments in all contexts. Section 5.1 shows the benefits of InstructMol in predicting molecular properties compared with various SSL algorithms. Section 5.2 verifies the superiority of InstructMol in lowering the predictive error over existing OOD generalization algorithms.
Researcher Affiliation | Academia | Fang Wu^1, Shuting Jin^2, Siyuan Li^3, Stan Z. Li^3; 1: Computer Science Department, Stanford University; 2: School of Computer Science and Technology, Wuhan University of Science and Technology; 3: School of Engineering, Westlake University
Pseudocode | Yes | Algorithm 1: InstructMol Algorithm
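The paper's Algorithm 1 is not reproduced in this report, but the general instructor-guided pseudo-labeling pattern it refers to can be sketched as follows. This is a minimal illustration only: `target_predict`, `instructor_confidence`, and `update` are hypothetical placeholder callables, not names taken from the paper.

```python
def ssl_round(labeled, unlabeled, target_predict, instructor_confidence, update):
    """One illustrative SSL round: pseudo-label the unlabeled data,
    weight each pseudo-label by the instructor's confidence, then run
    a weighted update of the target model.

    All three callables are hypothetical stand-ins for the real models.
    """
    # Real labels enter the batch with full weight.
    batch = [(x, y, 1.0) for x, y in labeled]
    for x in unlabeled:
        y_hat = target_predict(x)                 # target model's pseudo-label
        w = instructor_confidence(x, y_hat)       # instructor weight in [0, 1]
        batch.append((x, y_hat, w))
    return update(batch)                          # weighted training step
```

The key idea carried over from the paper's framing is that the instructor model, rather than a fixed confidence threshold, decides how much each pseudo-labeled example contributes to the loss.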
Open Source Code | No | Regarding the code for reproducing the results, we are very pleased to release it once our paper is accepted by the conference.
Open Datasets | Yes | We use the ZINC15 [80] database to collect unlabeled molecular data... For MoleculeNet, we utilize the 1M molecules as the unlabeled dataset.
Dataset Splits | Yes | In our experiment, we follow the previous work GEM [24] and Uni-Mol [76] and adopt scaffold splitting to divide each dataset into training, validation, and test sets with a ratio of 80%, 10%, and 10% (i.e., 8:1:1).
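The scaffold split described above can be sketched in plain Python. In practice the scaffold key would come from RDKit's Bemis-Murcko scaffold utilities; here a caller-supplied `scaffold_of` function stands in for that, and the "largest groups fill the training set first" order is a common convention for scaffold splitting, not a detail quoted from the paper.

```python
from collections import defaultdict

def scaffold_split(smiles, scaffold_of, frac=(0.8, 0.1, 0.1)):
    """Group molecules by scaffold, then assign whole groups to
    train/valid/test so that no scaffold is shared across sets.

    `scaffold_of` maps a SMILES string to its scaffold key
    (e.g. RDKit's Murcko scaffold SMILES in a real pipeline).
    Returns three lists of indices into `smiles`.
    """
    groups = defaultdict(list)
    for i, s in enumerate(smiles):
        groups[scaffold_of(s)].append(i)

    # Common convention: largest scaffold groups go to training first.
    ordered = sorted(groups.values(), key=len, reverse=True)

    n = len(smiles)
    train_cap, valid_cap = frac[0] * n, frac[1] * n
    train, valid, test = [], [], []
    for g in ordered:
        if len(train) + len(g) <= train_cap:
            train.extend(g)
        elif len(valid) + len(g) <= valid_cap:
            valid.extend(g)
        else:
            test.extend(g)
    return train, valid, test
```

Because whole scaffold groups are assigned to a single set, the resulting ratios only approximate 8:1:1, which is the expected behavior of scaffold splitting.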
Hardware Specification | Yes | In our experiments for molecular property prediction, we utilize 4 A100 GPUs and an Adam optimizer [84]... As mentioned in the Appendix, all experiments were run on 4 A100 GPUs with 80 GB of memory each.
Software Dependencies | No | The unlabeled SMILES are converted by RDKit [82] into 2D graphs. Semi-GAN is modified from https://github.com/opetrova/SemiSupervisedPytorchGAN; the π-model is adapted from a simple TensorFlow-based version at https://github.com/geosada/PI; UPS is directly modified from its official GitHub repository at https://github.com/nayeemrizve/ups. (No specific version numbers are provided for the Adam optimizer, RDKit, PyTorch, or TensorFlow.)
Experiment Setup | Yes | In our experiments for molecular property prediction, we utilize 4 A100 GPUs and an Adam optimizer [84] with a weight decay of 1e-16 for all GNN models... A ReduceLROnPlateau scheduler automatically adjusts the learning rate with a patience of 10 epochs. Before the SSL stage, we first pretrain the target molecular model via supervised learning for 100 epochs and then pretrain the instructor model for 50 epochs, using early stopping with a patience of 5 epochs. Table 4 lists the hyperparameter setup for InstructMol in molecular property prediction.
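The early-stopping mechanism in the setup above amounts to a small piece of state tracked across epochs, which can be sketched as follows. This is an illustrative stand-in only; the actual experiments pair it with PyTorch's ReduceLROnPlateau scheduler, which is not re-implemented here.

```python
class EarlyStopping:
    """Halt training when the validation loss has not improved for
    `patience` consecutive epochs (the paper uses patience = 5 for
    pretraining the instructor model)."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch; returns False when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0          # improvement resets the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs < self.patience
```

A typical usage pattern is `stopper = EarlyStopping(patience=5)` inside the epoch loop, breaking out of the loop as soon as `stopper.step(val_loss)` returns False.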