Drug Discovery with Dynamic Goal-aware Fragments

Authors: Seul Lee, Seanie Lee, Kenji Kawaguchi, Sung Ju Hwang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally demonstrate that GEAM effectively discovers drug candidates through the generative cycle of the three modules in various drug discovery tasks. Our code is available at https://github.com/SeulLee05/GEAM. The experimental results show that GEAM significantly outperforms existing state-of-the-art methods, demonstrating its effectiveness in addressing real-world drug discovery problems.
Researcher Affiliation | Collaboration | ¹KAIST, ²National University of Singapore, ³DeepAuto.ai.
Pseudocode | Yes | The single generation cycle of GEAM is described in Algorithm 1 in Sec. A.
Open Source Code | Yes | Our code is available at https://github.com/SeulLee05/GEAM.
Open Datasets | Yes | We used ZINC250k (Irwin et al., 2012) to train FGIB to predict Y and extract initial fragments.
Dataset Splits | Yes | Following Yang et al. (2021), Lee et al. (2023b) and Gao et al. (2022), we used the ZINC250k (Irwin et al., 2012) dataset with the same train/test split used by Kusner et al. (2017) in all the experiments.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory amounts, or specific cloud instance types used for running the experiments.
Software Dependencies | No | The paper mentions software such as RDKit (Landrum et al., 2016) and QuickVina 2 (Alhossary et al., 2015) but does not provide version numbers for these or any other software components, which are required for reproducibility.
Experiment Setup | Yes | Regarding the architecture of FGIB, we set the number of message passing steps in the MPNN to 3 and the number of layers of the MLP to 2. FGIB was trained for 10 epochs in each of the tasks with a learning rate of 1e-3 and β of 1e-5. The initial vocabulary size was set to K = 300. Regarding the dynamic vocabulary update, the maximum vocabulary update in a single cycle was set to 50 and the maximum vocabulary size was set to L = 1,000. We set the termination number of atoms in the SAC to n_SAC = 40, so that an episode ends when the size of the current molecule exceeds 40. The population size of the GA was set to P = 100 and the mutation rate was set to 0.1. The minimum number of atoms of generated molecules was set to 15.
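
For readers who want the reported settings in one place, the following is a minimal sketch that collects the values quoted in the Experiment Setup row into a Python dataclass. It is not taken from the GEAM repository: the class name, field names, and both helper functions are hypothetical. In particular, the vocabulary update rule shown here (admit at most 50 new fragments per cycle by score, then keep the best L = 1,000 overall) is only an assumption about how the two reported caps might interact, and the score dictionaries stand in for whatever fragment scoring FGIB provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeamSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    mpnn_message_passing_steps: int = 3
    mlp_layers: int = 2
    fgib_epochs: int = 10
    fgib_learning_rate: float = 1e-3
    fgib_beta: float = 1e-5
    initial_vocab_size: int = 300       # K
    max_vocab_update_per_cycle: int = 50
    max_vocab_size: int = 1000          # L
    sac_termination_atoms: int = 40     # n_SAC
    ga_population_size: int = 100       # P
    ga_mutation_rate: float = 0.1
    min_generated_atoms: int = 15


def update_vocabulary(vocab, candidates, setup):
    """Assumed update rule, not the authors' implementation.

    vocab and candidates map a fragment string to a score. At most
    max_vocab_update_per_cycle unseen fragments are admitted (highest
    score first), then the merged vocabulary is truncated to the
    max_vocab_size best-scoring fragments.
    """
    unseen = [(frag, s) for frag, s in candidates.items() if frag not in vocab]
    unseen.sort(key=lambda item: item[1], reverse=True)
    admitted = unseen[: setup.max_vocab_update_per_cycle]

    merged = dict(vocab)
    merged.update(admitted)

    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[: setup.max_vocab_size])


def episode_done(num_atoms, setup):
    """An episode ends once the molecule size exceeds n_SAC."""
    return num_atoms > setup.sac_termination_atoms


if __name__ == "__main__":
    setup = GeamSetup()
    vocab = {f"old{i}": 1.0 for i in range(990)}
    candidates = {f"new{i}": 2.0 + i for i in range(80)}
    updated = update_vocabulary(vocab, candidates, setup)
    print(len(updated), episode_done(41, setup))
```

Freezing the dataclass keeps the reported values from being changed accidentally during a run; a different setting can be expressed by constructing a new instance, for example GeamSetup(max_vocab_size=500).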