Reinforced Genetic Algorithm for Structure-based Drug Design
Authors: Tianfan Fu, Wenhao Gao, Connor Coley, Jimeng Sun
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct thorough empirical studies on optimizing binding affinity to various disease targets and show that RGA outperforms the baselines in terms of docking scores and is more robust to random initializations. |
| Researcher Affiliation | Academia | Tianfan Fu¹, Wenhao Gao², Connor W. Coley²,³, Jimeng Sun⁴,⁵. ¹Department of Computational Science and Engineering, Georgia Institute of Technology; ²Department of Chemical Engineering, Massachusetts Institute of Technology; ³Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology; ⁴Department of Computer Science, University of Illinois at Urbana-Champaign; ⁵Carle Illinois College of Medicine, University of Illinois at Urbana-Champaign |
| Pseudocode | No | The paper describes the RGA process and provides a pipeline illustration in Figure 1, but it does not include pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | The code is available at https://github.com/futianfan/reinforced-genetic-algorithm. |
| Open Datasets | Yes | Dataset: we randomly select molecules from the ZINC [56] database (around 250 thousand drug-like molecules) as the 0-th generation of the genetic algorithms (RGA, Autogrow 4.0, GA+D). ZINC also serves as the training data for pretraining the model in JTVAE, REINVENT, Rationale RL, etc. We adopt the CrossDocked2020 [57] dataset, which contains around 22 million ligand-protein complexes, as the training data for pretraining the policy neural networks, as mentioned in Section 3.3. More descriptions are available in the Appendix. |
| Dataset Splits | No | The paper mentions using training data for pretraining and notes that "data splits" are covered in the supplementary materials, but it does not provide explicit train/validation/test dataset splits in the main text. |
| Hardware Specification | No | The paper mentions that the "total amount of compute and the type of resources used" are detailed in the supplementary materials, but no specific hardware specifications (e.g., GPU/CPU models) are provided in the main text. |
| Software Dependencies | No | The paper mentions using "AutoDock Vina [52]" and refers to a "practical molecular optimization benchmark [15]" with associated software, but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | No | The paper states that "implementation details, dataset description & processing, hyperparameter tuning" are included in the Appendix, indicating that these specific experimental setup details are not in the main text. |