Instructor-inspired Machine Learning for Robust Molecular Property Prediction
Authors: Fang Wu, Shuting Jin, Siyuan Li, Stan Z. Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrated the high accuracy of InstructMol on several real-world molecular datasets and out-of-distribution (OOD) benchmarks. We carry out a wide scope of experiments in all contexts. Section 5.1 shows the benefits of InstructMol in predicting molecular properties compared with various SSL algorithms. Section 5.2 verifies the superiority of InstructMol in lowering the predictive error over existing OOD generalization algorithms. |
| Researcher Affiliation | Academia | Fang Wu¹, Shuting Jin², Siyuan Li³, Stan Z. Li³ (¹Computer Science Department, Stanford University; ²School of Computer Science and Technology, Wuhan University of Science and Technology; ³School of Engineering, Westlake University) |
| Pseudocode | Yes | Algorithm 1: InstructMol Algorithm |
| Open Source Code | No | Regarding the code for reproducing the results, we are very pleased to release it once our paper is accepted by the conference. |
| Open Datasets | Yes | We use the ZINC15 [80] database to collect unlabeled molecular data... For MoleculeNet, we utilize 1M molecules as the unlabeled dataset. |
| Dataset Splits | Yes | In our experiment, we follow the previous work GEM [24] and Uni-Mol [76] and adopt the scaffold splitting to divide different datasets into training, validation, and test sets with a ratio of 80%, 10%, and 10%. |
| Hardware Specification | Yes | In our experiments for molecular property prediction, we utilize 4 A100 GPUs and an Adam Optimizer [84]... As mentioned in the Appendix, all experiments were implemented on 4 A100 GPUs with 80 GB of memory each. |
| Software Dependencies | No | In our experiments for molecular property prediction, we utilize 4 A100 GPUs and an Adam Optimizer [84]... Those unlabeled SMILES are then converted by RDKit [82] into 2D graphs. Semi-GAN is modified from https://github.com/opetrova/SemiSupervisedPytorchGAN. π-model is transformed from a simple TensorFlow-based version at https://github.com/geosada/PI. UPS is directly modified from its official GitHub at https://github.com/nayeemrizve/ups. (No specific version numbers are provided for the Adam optimizer, RDKit, PyTorch, or TensorFlow.) |
| Experiment Setup | Yes | In our experiments for molecular property prediction, we utilize 4 A100 GPUs and an Adam Optimizer [84] with a weight decay of 1e-16 for all GNN models... A ReduceLROnPlateau scheduler is employed to automatically adjust the learning rate with a patience of 10 epochs. Before the SSL stage, we first pretrain the target molecular model via supervised learning for 100 epochs and then pretrain the instructor model for 50 epochs, where an early stopping mechanism is utilized with a patience of 5 epochs. Table 4: Hyperparameters setup for InstructMol in molecular property prediction. (The table lists specific values for various hyperparameters.) |
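The scaffold split quoted above (80%/10%/10%, following GEM and Uni-Mol) keeps all molecules that share a scaffold in the same partition. The sketch below illustrates the idea in plain Python under stated assumptions: `scaffold_key` is a hypothetical placeholder (the paper's pipeline derives scaffolds via RDKit), and the largest-group-first greedy assignment follows common MoleculeNet practice rather than the authors' exact code.

```python
from collections import defaultdict

def scaffold_split(mols, scaffold_key, frac_train=0.8, frac_valid=0.1):
    """Group molecules by scaffold, then greedily fill train/valid/test.

    Molecules sharing a scaffold always land in the same partition,
    so the test set contains structurally novel scaffolds.
    """
    groups = defaultdict(list)
    for i, mol in enumerate(mols):
        groups[scaffold_key(mol)].append(i)

    # Assign the largest scaffold groups first (a common heuristic).
    ordered = sorted(groups.values(), key=len, reverse=True)

    n = len(mols)
    n_train, n_valid = int(frac_train * n), int(frac_valid * n)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= n_train:
            train.extend(group)
        elif len(valid) + len(group) <= n_valid:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```

With a real pipeline, `scaffold_key` would map a SMILES string to its Bemis-Murcko scaffold; here any deterministic grouping function works for demonstration.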
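The pretraining schedule in the setup quote mentions an early stopping mechanism with a patience of 5 epochs. That mechanism amounts to a small bookkeeping loop; this is a generic self-contained sketch of the technique, not the authors' implementation.

```python
class EarlyStopping:
    """Signal a stop when the validation loss fails to improve
    for `patience` consecutive epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        # Any improvement resets the counter; otherwise accumulate.
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True means: stop training
```

In the paper's schedule this would wrap the instructor model's 50-epoch pretraining loop, calling `step` once per epoch and breaking when it returns `True`.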