Molecule Optimization by Explainable Evolution

Authors: Binghong Chen, Tianzhe Wang, Chengtao Li, Hanjun Dai, Le Song

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We test our approach against various baselines on a real-world multi-property optimization task where each method is given the same number of queries to the property oracle. We show that our evolution-by-explanation algorithm is 79% better than the best baseline in terms of a generic metric combining aspects such as success rate, novelty, and diversity."
Researcher Affiliation | Collaboration | Binghong Chen (1), Tianzhe Wang (1,5), Chengtao Li (2), Hanjun Dai (3), Le Song (4); (1) Georgia Institute of Technology, (2) Galixir, (3) Google Research, Brain Team, (4) Mohamed bin Zayed University of AI, (5) Shanghai Qi Zhi Institute
Pseudocode | Yes | Algorithm 1: Molecule Optimization by Explainable Evolution (MolEvol); Algorithm 2: Explain_φ(g)
Open Source Code | Yes | Source code at https://github.com/binghong-ml/MolEvol.
Open Datasets | Yes | "We first pretrain the graph completion model pθ(g|s) on a dataset constructed from ChEMBL (Gaulton et al., 2017), which contains over 1.4M drug-like molecules."
Dataset Splits | No | The paper mentions pretraining on ChEMBL and evaluating on generated molecules, but it does not provide specific train/validation/test splits (as percentages or counts) needed to reproduce the data partitioning.
Hardware Specification | Yes | "Each algorithm is allowed to query f-scores no more than 5M times and to run no more than 1 day on a Ubuntu 16.04.6 LTS server with 1 Nvidia RTX 2080 Ti GPU, and 20 Intel(R) Xeon(R) E5-2678 2.50GHz CPUs."
Software Dependencies | No | The paper mentions specific software components such as GCN and GraphRNN, but it does not provide version numbers for any of the software dependencies used in the experiments.
Experiment Setup | Yes | "In our experiment, MolEvol is run for 10 rounds. Within each round, 200 rationales are added to the support set during the explainable local search stage. ... In the molecule completion stage, the parameter θ is updated with gradient descent for 1 epoch using a total number of 20000 (s, g) pairs with a minibatch size of 10 and a learning rate of 1e-3."
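The completion-stage numbers in the last row can be made concrete with a minimal fine-tuning loop. This is a hedged sketch: `minibatches`, `finetune_one_epoch`, and `sgd_update` are hypothetical stand-ins for illustration, not the paper's actual graph completion model pθ(g|s); only the hyperparameter values are taken from the quoted setup.

```python
# Hyperparameters quoted from the paper's experiment setup.
ROUNDS = 10              # MolEvol outer rounds
PAIRS_PER_EPOCH = 20000  # (s, g) pairs used per fine-tuning epoch
BATCH_SIZE = 10
LEARNING_RATE = 1e-3


def minibatches(pairs, batch_size):
    """Yield consecutive minibatches from a list of (s, g) pairs."""
    for i in range(0, len(pairs), batch_size):
        yield pairs[i:i + batch_size]


def finetune_one_epoch(pairs, sgd_update):
    """Run one epoch of gradient descent; sgd_update is a hypothetical step."""
    updates = 0
    for batch in minibatches(pairs, BATCH_SIZE):
        sgd_update(batch, LEARNING_RATE)
        updates += 1
    return updates


# With 20000 pairs and a minibatch size of 10, one epoch performs 2000 updates.
dummy_pairs = [(None, None)] * PAIRS_PER_EPOCH  # stand-ins for real (s, g) pairs
n = finetune_one_epoch(dummy_pairs, lambda batch, lr: None)
print(n)  # 2000
```

The sketch only counts gradient steps; a real run would replace the no-op `sgd_update` with a backward pass through the graph completion network.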
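The "generic metric combining aspects such as success rate, novelty, and diversity" quoted in the Research Type row is not spelled out in this table. As a hedged illustration only (an assumption, not the paper's exact formula), composite metrics of this kind are often formed as the product of their component aspects, so that a method must score well on all of them simultaneously:

```python
def composite_score(success_rate: float, novelty: float, diversity: float) -> float:
    """Hypothetical composite metric: the product of three aspects in [0, 1].

    Illustrative assumption only; the paper's actual combined metric may
    weight or aggregate these aspects differently.
    """
    for value in (success_rate, novelty, diversity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("each aspect should lie in [0, 1]")
    return success_rate * novelty * diversity


# A method scoring 0.8 on every aspect gets 0.8 ** 3 overall.
print(round(composite_score(0.8, 0.8, 0.8), 3))  # 0.512
```

A multiplicative combination penalizes any single weak aspect heavily, e.g. perfect success with zero diversity still scores 0.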