Exploring Chemical Space with Score-based Out-of-distribution Generation

Authors: Seul Lee, Jaehyeong Jo, Sung Ju Hwang

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool."
Researcher Affiliation | Academia | "KAIST, Seoul, South Korea."
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code is available at https://github.com/SeulLee05/MOOD."
Open Datasets | Yes | "We measure the OOD-ness of the generated molecules with respect to the training dataset, ZINC250k (Irwin et al., 2012), using the following metrics. We additionally conduct the novel molecule generation task on the QM9 (Ramakrishnan et al., 2014) dataset."
Dataset Splits | No | The paper mentions a "train/test split" for the ZINC250k dataset, but does not explicitly describe a separate validation split for its own model training or evaluation. The tuning of λ is described as occurring in the sampling phase, not the training phase.
Hardware Specification | Yes | "We conduct all the experiments on TITAN RTX, GeForce RTX 2080 Ti, or GeForce RTX 3090 GPUs."
Software Dependencies | No | The paper mentions using "QuickVina 2" and the "RDKit (Landrum et al., 2016) library" but does not specify exact version numbers for these or other software dependencies.
Experiment Setup | Yes | "We set the number of linear layers as 20 with residual paths, the hidden dimension as 512, the type of SDEs as VPSDE with βmin = 0.01 and βmax = 0.05, the number of training epochs as 5000, the batch size as 2048, and use an Adam optimizer (Kingma & Ba, 2014). Regarding the hyperparameters of the property prediction network, we set the number of the GNN operations L as 3 with the hidden dimension of 16. The number of linear layers in MLPs and MLPt are both 1, and the number of linear layers in the final MLPs is 2."
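The Experiment Setup row reports a VPSDE with βmin = 0.01 and βmax = 0.05. As a hedged illustration only (not the authors' code; function names here are our own), the standard VPSDE linear noise schedule and its perturbation-kernel coefficients with those reported values would look like:

```python
import math

# Beta range reported in the paper's experiment setup (VPSDE).
BETA_MIN, BETA_MAX = 0.01, 0.05

def beta(t):
    """Linear VPSDE noise schedule beta(t) for t in [0, 1]."""
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def marginal_coeffs(t):
    """Mean scale and std of the VPSDE perturbation kernel p_t(x | x0).

    mean_scale = exp(-1/2 * integral_0^t beta(s) ds); for a variance-
    preserving SDE, mean_scale**2 + std**2 = 1 at every t.
    """
    log_mean = -0.25 * t**2 * (BETA_MAX - BETA_MIN) - 0.5 * t * BETA_MIN
    mean_scale = math.exp(log_mean)
    std = math.sqrt(1.0 - mean_scale**2)
    return mean_scale, std

# With this small beta range, the terminal (t = 1) noise level stays
# modest: mean_scale(1) = exp(-0.015), far from a standard normal.
```

Note that such a small βmax keeps the diffusion close to the data manifold even at t = 1, which is consistent with the paper training on a narrow noise range rather than a full data-to-noise schedule.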