Retrieval-based Controllable Molecule Generation

Authors: Zichao Wang, Weili Nie, Zhuoran Qiao, Chaowei Xiao, Richard Baraniuk, Anima Anandkumar

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On various tasks ranging from simple design criteria to a challenging real-world scenario for designing lead compounds that bind to the SARS-CoV-2 main protease, we demonstrate our approach extrapolates well beyond the retrieval database, and achieves better performance and wider applicability than previous methods.
Researcher Affiliation | Collaboration | Zichao Wang (Rice University, jzwang@rice.edu); Weili Nie (NVIDIA, wnie@nvidia.com); Zhuoran Qiao (Caltech, zqiao@caltech.edu); Chaowei Xiao (NVIDIA, ASU, chaoweix@nvidia.com); Richard G. Baraniuk (Rice University, richb@rice.edu); Anima Anandkumar (NVIDIA, Caltech, aanandkumar@nvidia.edu)
Pseudocode | Yes | Algorithm 1: Exemplar molecule retriever
Open Source Code | Yes | The source code is available at https://github.com/NVlabs/RetMol.
Open Datasets | Yes | The training dataset uses either ZINC250k (Irwin and Shoichet, 2004) (for the experiments in Section 3.1) or ChEMBL (Gaulton et al., 2016).
Dataset Splits | Yes | For the ZINC250k dataset, we follow the train/validation/test splits in (Jin et al., 2019) and train on the train split.
Hardware Specification | Yes | Training is distributed over four V100 NVIDIA GPUs, each with 16 GB memory... Inference uses a single V100 NVIDIA GPU with 16 GB memory... NVIDIA Quadro RTX 8000.
Software Dependencies | No | The paper mentions software components such as Megatron, DeepSpeed, Apex, RDKit, AutoDock-GPU, and the AutoDock software suite, but it does not give version numbers for any of these dependencies.
Experiment Setup | Yes | Training is distributed over four V100 NVIDIA GPUs, each with 16 GB memory, with a batch size of 256 samples on each GPU, for 50k iterations. The total training time is approximately 2 hours... for each input molecule, we set the maximum number of iterations to 1000 and sample 50 molecules at each iteration... we run the optimization for 80 iterations and sample 100 molecules at each iteration...
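The Experiment Setup row packs several compute budgets into a few numbers. Below is a minimal Python sketch of that arithmetic. It assumes plain data-parallel training, where the global batch equals the number of GPUs times the per-GPU batch and there is no gradient accumulation. It also treats the two quoted inference settings as independent runs. The names `TrainingSetup` and `generation_budget` are hypothetical illustrations and do not come from the RetMol code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Training settings quoted in the Experiment Setup row (hypothetical helper)."""
    num_gpus: int = 4
    per_gpu_batch: int = 256
    iterations: int = 50_000
    approx_hours: float = 2.0

    def global_batch(self) -> int:
        # Assumption: pure data parallelism, no gradient accumulation.
        return self.num_gpus * self.per_gpu_batch

    def samples_seen(self) -> int:
        # Total training samples processed over all iterations.
        return self.global_batch() * self.iterations

    def approx_iters_per_second(self) -> float:
        # Rough throughput implied by the "approximately 2 hours" figure.
        return self.iterations / (self.approx_hours * 3600)


def generation_budget(iterations: int, samples_per_iteration: int) -> int:
    """Upper bound on molecules sampled in one optimization run (hypothetical helper)."""
    return iterations * samples_per_iteration


train = TrainingSetup()
print(train.global_batch())                       # 1024
print(train.samples_seen())                       # 51200000
print(round(train.approx_iters_per_second(), 2))  # 6.94

# First quoted setting: at most 1000 iterations x 50 samples per input molecule.
print(generation_budget(1000, 50))  # 50000
# Second quoted setting: 80 iterations x 100 samples.
print(generation_budget(80, 100))   # 8000
```

Under these assumptions, one training run processes about 51.2 million samples at roughly 7 iterations per second. The first inference setting can sample up to 50,000 candidates per input molecule, compared with 8,000 molecules per run in the second setting.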