Molecule Generation with Fragment Retrieval Augmentation
Authors: Seul Lee, Karsten Kreis, Srimukh Veccham, Meng Liu, Danny Reidenbach, Saee Paliwal, Arash Vahdat, Weili Nie
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate f-RAG on molecule generation tasks that simulate various real-world drug discovery problems. We first conduct experiments on the practical molecular optimization (PMO) benchmark [10] in Section 4.1. We then conduct experiments to generate novel molecules that have high binding affinity, drug-likeness, and synthesizability in Section 4.2. We further perform analyses in Section 4.3. |
| Researcher Affiliation | Collaboration | KAIST, NVIDIA |
| Pseudocode | Yes | We summarize the generation process of f-RAG in Algorithm 1 in Section C. |
| Open Source Code | Yes | The code to reproduce the results in our paper is provided as the supplementary material. (from NeurIPS Paper Checklist, Section 4 & 5) |
| Open Datasets | Yes | We used ZINC250k [13] with the same train/test split used by Kusner et al. [22] to train the fragment injection module and construct the initial fragment vocabulary. |
| Dataset Splits | Yes | We used ZINC250k [13] with the same train/test split used by Kusner et al. [22] to train the fragment injection module and construct the initial fragment vocabulary. |
| Hardware Specification | Yes | We trained the fragment injection module using one GeForce RTX 3090 GPU. The training took less than 4 hours. We generated molecules using one Titan XP (12GB), GeForce RTX 2080 Ti (11GB), or GeForce RTX 3090 GPU (24GB). |
| Software Dependencies | No | The paper mentions software components such as 'Hugging Face Transformer library [44]', 'QuickVina 2 [2]', and 'RDKit [23]', but it does not give version numbers for them, which reproducibility requires. |
| Experiment Setup | Yes | The fragment injection module was trained to 8 epochs with a learning rate of $1 \times 10^{-4}$ using the AdamW optimizer [29]. ... We set the size of the fragment vocabulary to $N_{\text{frag}} = 50$ and the size of the molecule population to $N_{\text{mol}} = 50$. ... We set the mutation rate of the GA to 0.1. We set the number of SAFE-GPT generation and number of GA generation in one cycle to $G_{\text{SAFE-GPT}} = 10$ and $G_{\text{GA}} = 10$, respectively. |
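
The hyperparameters quoted in the Experiment Setup row can be collected in one place for a reproduction attempt. The following is a minimal sketch, not the authors' code: the class name `FRagSetup` and all field names are hypothetical, and only the numeric values and the optimizer name come from the row above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FRagSetup:
    """Hypothetical container; the values are those quoted in the table."""

    train_epochs: int = 8           # epochs for the fragment injection module
    learning_rate: float = 1e-4     # learning rate used with AdamW
    optimizer: str = "AdamW"        # optimizer name as reported
    n_frag: int = 50                # fragment vocabulary size (N_frag)
    n_mol: int = 50                 # molecule population size (N_mol)
    ga_mutation_rate: float = 0.1   # mutation rate of the GA
    g_safe_gpt: int = 10            # SAFE-GPT generations per cycle
    g_ga: int = 10                  # GA generations per cycle

    def __post_init__(self) -> None:
        # Basic sanity checks (an assumption of this sketch, not from the paper).
        if not 0.0 <= self.ga_mutation_rate <= 1.0:
            raise ValueError("ga_mutation_rate must lie in [0, 1]")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        counts = (self.train_epochs, self.n_frag, self.n_mol,
                  self.g_safe_gpt, self.g_ga)
        if min(counts) <= 0:
            raise ValueError("all count parameters must be positive")


setup = FRagSetup()
print(setup)
```

Because the paper does not report library versions, a reproduction would also need to record the versions of the Hugging Face, QuickVina 2, and RDKit installations it uses alongside these values.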