Drug Discovery with Dynamic Goal-aware Fragments
Authors: Seul Lee, Seanie Lee, Kenji Kawaguchi, Sung Ju Hwang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate that GEAM effectively discovers drug candidates through the generative cycle of the three modules in various drug discovery tasks. Our code is available at https://github.com/SeulLee05/GEAM. The experimental results show that GEAM significantly outperforms existing state-of-the-art methods, demonstrating its effectiveness in addressing real-world drug discovery problems. |
| Researcher Affiliation | Collaboration | ¹KAIST, ²National University of Singapore, ³DeepAuto.ai |
| Pseudocode | Yes | The single generation cycle of GEAM is described in Algorithm 1 in Sec. A. |
| Open Source Code | Yes | Our code is available at https://github.com/SeulLee05/GEAM. |
| Open Datasets | Yes | We used ZINC250k (Irwin et al., 2012) to train FGIB to predict Y and extract initial fragments. |
| Dataset Splits | Yes | Following Yang et al. (2021), Lee et al. (2023b) and Gao et al. (2022), we used the ZINC250k (Irwin et al., 2012) dataset with the same train/test split used by Kusner et al. (2017) in all the experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or specific cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as RDKit (Landrum et al., 2016) and QuickVina 2 (Alhossary et al., 2015) but does not provide version numbers for these or any other software components, which are required for reproducibility. |
| Experiment Setup | Yes | Regarding the architecture of FGIB, we set the number of message passing steps in the MPNN to 3 and the number of layers of the MLP to 2. FGIB was trained for 10 epochs in each of the tasks with a learning rate of 1e-3 and β of 1e-5. The initial vocabulary size was set to K = 300. Regarding the dynamic vocabulary update, the maximum vocabulary update in a single cycle was set to 50 and the maximum vocabulary size was set to L = 1,000. We set the termination number of atoms in the SAC to n_SAC = 40, so that an episode ends when the size of the current molecule exceeds 40. The population size of the GA was set to P = 100 and the mutation rate was set to 0.1. The minimum number of atoms of generated molecules was set to 15. |
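
The values in the Experiment Setup row can be collected into a single configuration object, which may be convenient when attempting a reproduction. The following is a minimal sketch, not the authors' code: the class name `GEAMConfig`, all field names, and the two helper functions are hypothetical, and only the numeric values come from the row above. Treating the minimum-size rule as "at least 15 atoms" is an assumption about how the threshold is applied.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class GEAMConfig:
    """Hyperparameters reported in the Experiment Setup row.

    Field names are illustrative (hypothetical); values are from the text.
    """
    mpnn_message_passing_steps: int = 3
    mlp_layers: int = 2
    fgib_epochs: int = 10
    fgib_learning_rate: float = 1e-3
    fgib_beta: float = 1e-5
    initial_vocab_size: int = 300          # K
    max_vocab_update_per_cycle: int = 50
    max_vocab_size: int = 1000             # L
    sac_termination_atoms: int = 40        # n_SAC
    ga_population_size: int = 100          # P
    ga_mutation_rate: float = 0.1
    min_generated_atoms: int = 15


def episode_done(num_atoms: int, cfg: GEAMConfig) -> bool:
    """An episode ends when the current molecule exceeds n_SAC atoms."""
    return num_atoms > cfg.sac_termination_atoms


def passes_size_filter(num_atoms: int, cfg: GEAMConfig) -> bool:
    """Assumed reading of the minimum-size rule: keep molecules with >= 15 atoms."""
    return num_atoms >= cfg.min_generated_atoms


if __name__ == "__main__":
    cfg = GEAMConfig()
    # Print each reported setting on its own line.
    for name, value in asdict(cfg).items():
        print(f"{name} = {value}")
```

Freezing the dataclass keeps the reported settings from being changed accidentally during a run; a variant for a specific task could be created with `dataclasses.replace(cfg, ...)`.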