Retrieval-based Controllable Molecule Generation
Authors: Zichao Wang, Weili Nie, Zhuoran Qiao, Chaowei Xiao, Richard Baraniuk, Anima Anandkumar
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On various tasks ranging from simple design criteria to a challenging real-world scenario for designing lead compounds that bind to the SARS-CoV-2 main protease, we demonstrate our approach extrapolates well beyond the retrieval database, and achieves better performance and wider applicability than previous methods. |
| Researcher Affiliation | Collaboration | Zichao Wang (Rice University, jzwang@rice.edu); Weili Nie (NVIDIA, wnie@nvidia.com); Zhuoran Qiao (Caltech, zqiao@caltech.edu); Chaowei Xiao (NVIDIA, ASU, chaoweix@nvidia.com); Richard G. Baraniuk (Rice University, richb@rice.edu); Anima Anandkumar (NVIDIA, Caltech, aanandkumar@nvidia.edu) |
| Pseudocode | Yes | Algorithm 1: Exemplar molecule retriever |
| Open Source Code | Yes | The source code is available at https://github.com/NVlabs/RetMol. |
| Open Datasets | Yes | The training dataset uses either ZINC250k (Irwin and Shoichet, 2004) (for the experiments in Section 3.1) or ChEMBL (Gaulton et al., 2016). |
| Dataset Splits | Yes | For the ZINC250k dataset, we follow the train/validation/test splits in (Jin et al., 2019) and train on the train split. |
| Hardware Specification | Yes | Training is distributed over four V100 NVIDIA GPUs, each with 16 GB memory... Inference uses a single V100 NVIDIA GPU with 16 GB memory... NVIDIA Quadro RTX 8000. |
| Software Dependencies | No | The paper mentions software components such as Megatron, DeepSpeed, Apex, RDKit, AutoDock-GPU, and the AutoDock software suite, but it does not give version numbers for any of them. |
| Experiment Setup | Yes | Training is distributed over four V100 NVIDIA GPUs, each with 16GB memory, with a batch size of 256 samples on each GPU, for 50k iterations. The total training time is approximately 2 hours... for each input molecule, we set the maximum number of iterations to 1000 and sample 50 molecules at each iteration... we run the optimization for 80 iterations and sample 100 molecules at each iteration... |
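
The figures quoted in the Experiment Setup row imply a few totals that are easy to check by hand. The sketch below works through that arithmetic. It assumes plain data-parallel training in which each GPU processes a distinct batch, which the quoted text does not state explicitly, and the function names are illustrative rather than taken from the paper or its code. The candidate counts are upper bounds, since the quoted text gives a maximum number of iterations and does not say whether optimization can stop early.

```python
# Back-of-the-envelope totals derived from the quoted experiment setup.
# Assumption: data-parallel training, so per-GPU batches add up.
# All names here are hypothetical helpers, not part of the RetMol code.

def global_batch_size(num_gpus: int, batch_per_gpu: int) -> int:
    """Samples processed per training iteration across all GPUs."""
    return num_gpus * batch_per_gpu


def samples_seen(num_gpus: int, batch_per_gpu: int, iterations: int) -> int:
    """Total training samples processed over the whole run."""
    return global_batch_size(num_gpus, batch_per_gpu) * iterations


def max_candidates(max_iterations: int, samples_per_iteration: int) -> int:
    """Upper bound on molecules sampled for one input molecule."""
    return max_iterations * samples_per_iteration


if __name__ == "__main__":
    # Training: four GPUs, 256 samples per GPU, 50k iterations.
    print("global batch size:", global_batch_size(4, 256))
    print("training samples seen:", samples_seen(4, 256, 50_000))
    # First quoted inference setting: up to 1000 iterations, 50 samples each.
    print("max candidates (setting 1):", max_candidates(1000, 50))
    # Second quoted inference setting: 80 iterations, 100 samples each.
    print("max candidates (setting 2):", max_candidates(80, 100))
```

Under these assumptions, each training iteration covers 1024 samples, and the second inference setting samples at most 8000 molecules per input.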