Exploring Chemical Space with Score-based Out-of-distribution Generation
Authors: Seul Lee, Jaehyeong Jo, Sung Ju Hwang
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool. |
| Researcher Affiliation | Academia | KAIST, Seoul, South Korea. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/SeulLee05/MOOD. |
| Open Datasets | Yes | We measure the OOD-ness of the generated molecules with respect to the training dataset, ZINC250k (Irwin et al., 2012), using the following metrics. We additionally conduct the novel molecule generation task on the QM9 (Ramakrishnan et al., 2014) dataset. |
| Dataset Splits | No | The paper mentions a "train/test split" for the ZINC250k dataset but does not explicitly describe a separate validation split for its own model training or evaluation. The tuning of λ is described as taking place in the sampling phase, not the training phase. |
| Hardware Specification | Yes | We conduct all the experiments on TITAN RTX, GeForce RTX 2080 Ti, or GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using "QuickVina 2" and the "RDKit (Landrum et al., 2016) library" but does not specify exact version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We set the number of linear layers as 20 with residual paths, the hidden dimension as 512, the type of SDEs as VPSDE with β_min = 0.01 and β_max = 0.05, the number of training epochs as 5000, the batch size as 2048, and use an Adam optimizer (Kingma & Ba, 2014). Regarding the hyperparameters of the property prediction network, we set the number of the GNN operations L as 3 with the hidden dimension of 16. The number of linear layers in MLP_s and MLP_t are both 1, and the number of linear layers in the final MLP is 2. |
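The quoted experiment setup can be collected into a small configuration sketch. The snippet below is a minimal, hypothetical Python summary of the reported hyperparameters, not code from the MOOD repository; the names `SCORE_NET_CONFIG`, `PROPERTY_NET_CONFIG`, and `vpsde_beta` are illustrative, and `vpsde_beta` assumes the standard linear VPSDE noise schedule (Song et al., 2021) evaluated with the paper's reported β_min and β_max.

```python
# Hypothetical configuration sketch assembling the hyperparameters quoted above.
# Names are illustrative only; see https://github.com/SeulLee05/MOOD for the
# authors' actual configuration files.

# Score network / training hyperparameters as reported in the paper.
SCORE_NET_CONFIG = {
    "num_linear_layers": 20,   # with residual paths
    "hidden_dim": 512,
    "sde_type": "VPSDE",
    "beta_min": 0.01,
    "beta_max": 0.05,
    "epochs": 5000,
    "batch_size": 2048,
    "optimizer": "Adam",       # Kingma & Ba, 2014
}

# Property prediction network hyperparameters as reported in the paper.
PROPERTY_NET_CONFIG = {
    "num_gnn_layers": 3,       # L = 3
    "hidden_dim": 16,
    "mlp_s_layers": 1,
    "mlp_t_layers": 1,
    "final_mlp_layers": 2,
}


def vpsde_beta(t: float, beta_min: float = 0.01, beta_max: float = 0.05) -> float:
    """Standard linear VPSDE noise schedule, beta(t) = beta_min + t * (beta_max - beta_min),
    with t in [0, 1]; shown here only to make the reported beta range concrete."""
    return beta_min + t * (beta_max - beta_min)
```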