Guiding Deep Molecular Optimization with Genetic Exploration

Authors: Sungsoo Ahn, Junsu Kim, Hankook Lee, Jinwoo Shin

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that GEGL significantly improves over state-of-the-art methods. We extensively evaluate our method on four experiments: (a) optimization of the penalized octanol-water partition coefficient, (b) optimization of the penalized octanol-water partition coefficient under similarity constraints, (c) the GuacaMol benchmark [31], consisting of 20 de novo molecular design tasks, and (d) the GuacaMol benchmark evaluated under the post-hoc filtering procedure [32].
Researcher Affiliation | Academia | Sungsoo Ahn, Junsu Kim, Hankook Lee, Jinwoo Shin. Korea Advanced Institute of Science and Technology (KAIST). {sungsoo.ahn, junsu.kim, hankook.lee, jinwoos}@kaist.ac.kr
Pseudocode | Yes | We provide an illustration and a detailed description of our algorithm in Figure 1 and Algorithm 1, respectively.
Open Source Code | Yes | Our training code is available at https://github.com/sungsoo-ahn/genetic-expert-guided-learning.
Open Datasets | Yes | Following prior works [12, 9], we pretrain the apprentice policy on the ZINC dataset [57]. For the experiments, we initialize the apprentice policy using the weights provided by Brown et al. [31], which were pretrained on the ChEMBL dataset [62].
Dataset Splits | No | The paper names the datasets used and refers to evaluations on the GuacaMol benchmark, but it does not provide the dataset split information (exact percentages, sample counts, or splitting methodology) needed to reproduce the training/validation/test partitioning for its own experiments.
Hardware Specification | Yes | All experiments were run on a single GPU (NVIDIA RTX 2080Ti) and eight instances of a virtual CPU system (Intel Xeon E5-2630 v4).
Software Dependencies | No | The paper mentions the Adam optimizer and LSTM networks but does not list the ancillary software dependencies, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9), needed to replicate the experiments.
Experiment Setup | Yes | To implement GEGL, we use a priority queue of fixed size K = 1024. At each step, we sample 8192 molecules from the apprentice and the expert policies to update the respective priority queues. The Adam optimizer [52] with a learning rate of 0.001 was used to optimize the neural network with mini-batches of size 256. Gradients were clipped to a norm of 1.0. The apprentice policy is a three-layer LSTM with a hidden state of 1024 dimensions and dropout probability of 0.2.
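The fixed-size priority queue described in the experiment setup (a buffer that retains only the K highest-scoring molecules) can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the authors' implementation; the class and method names (`MaxRewardPriorityQueue`, `add`, `top`) are hypothetical, and deduplication by SMILES string is an assumed detail.

```python
import heapq


class MaxRewardPriorityQueue:
    """Keep only the K highest-scoring molecules seen so far.

    Hypothetical sketch of a fixed-size priority queue (K = 1024 in the
    paper's setup). Internally a min-heap of (score, smiles) pairs: the
    root is the worst retained molecule, so a newcomer only enters the
    buffer if it beats that worst score.
    """

    def __init__(self, max_size=1024):
        self.max_size = max_size
        self._heap = []      # min-heap; root holds the lowest kept score
        self._seen = set()   # SMILES strings currently in the buffer

    def add(self, smiles, score):
        """Insert a molecule, evicting the worst one if the queue is full."""
        if smiles in self._seen:
            return
        if len(self._heap) < self.max_size:
            self._seen.add(smiles)
            heapq.heappush(self._heap, (score, smiles))
        elif score > self._heap[0][0]:
            self._seen.add(smiles)
            _, evicted = heapq.heapreplace(self._heap, (score, smiles))
            self._seen.discard(evicted)

    def top(self):
        """Return the retained (score, smiles) pairs, best first."""
        return sorted(self._heap, reverse=True)
```

Usage: with `max_size=3`, adding four scored molecules evicts the lowest-scoring one, so `top()` returns the three best in descending order of score.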