Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Generation
Authors: Soojung Yang, Doyeong Hwang, Seul Lee, Seongok Ryu, Sung Ju Hwang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model produces molecules of higher quality compared to existing methods while achieving state-of-the-art performance on two of three targets in terms of the docking scores of the generated molecules. We further show with ablation studies that our method, predictive error-PER (FREED(PE)), significantly improves the model performance. Sections like '4 Results and Analysis', '4.2 Quantitative performance benchmark', and '4.3 Ablation studies: explorative algorithms' describe empirical evaluations, comparisons, and performance metrics. |
| Researcher Affiliation | Collaboration | Soojung Yang (AITRICS, soojungy@mit.edu); Doyeong Hwang (AITRICS, desertbeagle11@gmail.com); Seul Lee (KAIST, ellenlee7890@gmail.com); Seongok Ryu (AITRICS, seongokryu@galux.co.kr); Sung Ju Hwang (AITRICS, KAIST, sjhwang82@kaist.ac.kr) |
| Pseudocode | No | The paper describes the generation method and policy network using text and diagrams (Figure 2) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured code-like procedures. |
| Open Source Code | Yes | Our code is released at https://github.com/AITRICS/FREED. |
| Open Datasets | Yes | We trained the models for three carefully chosen protein targets fa7, parp1, and 5ht1b. [...] For the experiments in this section, we use the small fragment library that includes 66 pharmacochemically acceptable fragments. [...] HierVAE showed high quality scores, as the HierVAE fragment library itself had very few problematic substructures (See Appendix A.5 for details). [...] We also plot known 'Active' and 'Inactive' molecules from DUD-E (fa7, parp1) or ChEMBL (5ht1b) datasets for comparison. |
| Dataset Splits | No | The paper mentions using specific protein targets and fragments but does not explicitly provide details about training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or cloud computing resources used for running the experiments. |
| Software Dependencies | No | The paper mentions 'RDKit [35] ECFP (Extended Connectivity Fingerprint)' but does not provide specific version numbers for RDKit or any other software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | Our model is designed to finish the episodes after four steps (in de novo cases) or two steps (in scaffold-based cases). [...] For every metric, we repeated every experiment five times with five different random seeds and reported the mean and the standard deviation of the scores. Also, we calculated the scores when 3,000 molecules were generated and used to update the model during training. [...] We trained the models for three carefully chosen protein targets fa7, parp1, and 5ht1b. [...] For the experiments in this section, we use the small fragment library that includes 66 pharmacochemically acceptable fragments. |
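
The Experiment Setup row states that every experiment was repeated five times with different random seeds and reported as a mean and standard deviation. A minimal sketch of that aggregation step follows. The function name `summarize_seeds` and the per-seed values are hypothetical placeholders, not results or code from the paper, and the use of the sample (n-1) standard deviation is an assumption, since the quoted text does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize_seeds(scores_by_seed):
    """Return (mean, sample standard deviation) of one metric across seeds.

    scores_by_seed maps a random seed to the metric value obtained with it.
    Assumption: sample (n-1) standard deviation; the paper may use another.
    """
    values = list(scores_by_seed.values())
    if len(values) < 2:
        raise ValueError("need at least two seeds to estimate a standard deviation")
    return mean(values), stdev(values)


# Hypothetical metric values for five seeds (placeholders, not paper results).
example_scores = {0: 0.50, 1: 0.60, 2: 0.70, 3: 0.80, 4: 0.90}

avg, spread = summarize_seeds(example_scores)
print(f"{avg:.3f} +/- {spread:.3f}")
```

Swapping `stdev` for `statistics.pstdev` would give the population standard deviation instead, which is slightly smaller for the same five values.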