Molecule Generation with Fragment Retrieval Augmentation

Authors: Seul Lee, Karsten Kreis, Srimukh Veccham, Meng Liu, Danny Reidenbach, Saee Paliwal, Arash Vahdat, Weili Nie

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate f-RAG on molecule generation tasks that simulate various real-world drug discovery problems. We first conduct experiments on the practical molecular optimization (PMO) benchmark [10] in Section 4.1. We then conduct experiments to generate novel molecules that have high binding affinity, drug-likeness, and synthesizability in Section 4.2. We further perform analyses in Section 4.3.
Researcher Affiliation | Collaboration | ¹KAIST, ²NVIDIA
Pseudocode | Yes | We summarize the generation process of f-RAG in Algorithm 1 in Section C.
Open Source Code | Yes | The code to reproduce the results in our paper is provided as the supplementary material. (from NeurIPS Paper Checklist, Section 4 & 5)
Open Datasets | Yes | We used ZINC250k [13] with the same train/test split used by Kusner et al. [22] to train the fragment injection module and construct the initial fragment vocabulary.
Dataset Splits | Yes | We used ZINC250k [13] with the same train/test split used by Kusner et al. [22] to train the fragment injection module and construct the initial fragment vocabulary.
Hardware Specification | Yes | We trained the fragment injection module using one GeForce RTX 3090 GPU. The training took less than 4 hours. We generated molecules using one Titan XP (12GB), GeForce RTX 2080 Ti (11GB), or GeForce RTX 3090 GPU (24GB).
Software Dependencies | No | The paper mentions software components like 'Hugging Face Transformer library [44]', 'QuickVina 2 [2]', and 'RDKit [23]', but it does not explicitly provide specific version numbers for these components, which is required for reproducibility.
Experiment Setup | Yes | The fragment injection module was trained to 8 epochs with a learning rate of 1 × 10^-4 using the AdamW optimizer [29]. ... We set the size of the fragment vocabulary to N_frag = 50 and the size of the molecule population to N_mol = 50. ... We set the mutation rate of the GA to 0.1. We set the number of SAFE-GPT generation and number of GA generation in one cycle to G_SAFE-GPT = 10 and G_GA = 10, respectively.
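
The hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration object. The following is a minimal sketch, not the authors' code: the class name FRAGConfig, its field names, and the helper cycle_schedule are hypothetical, and the ordering inside a cycle (SAFE-GPT generations first, then GA generations) is an assumption made for illustration, since the excerpt states only the counts per cycle.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class FRAGConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    epochs: int = 8                 # training epochs of the fragment injection module
    learning_rate: float = 1e-4     # used with the AdamW optimizer
    optimizer: str = "AdamW"
    n_frag: int = 50                # fragment vocabulary size (N_frag)
    n_mol: int = 50                 # molecule population size (N_mol)
    ga_mutation_rate: float = 0.1   # mutation rate of the GA
    g_safe_gpt: int = 10            # SAFE-GPT generations per cycle
    g_ga: int = 10                  # GA generations per cycle


def cycle_schedule(cfg: FRAGConfig, n_cycles: int) -> Iterator[Tuple[int, str]]:
    """Yield (cycle index, generator name) pairs.

    Assumption: within a cycle, SAFE-GPT generations run before GA generations.
    """
    for cycle in range(n_cycles):
        for _ in range(cfg.g_safe_gpt):
            yield cycle, "SAFE-GPT"
        for _ in range(cfg.g_ga):
            yield cycle, "GA"


cfg = FRAGConfig()
schedule = list(cycle_schedule(cfg, n_cycles=2))
print(len(schedule))  # 2 cycles x (10 + 10) generations = 40
```

Freezing the dataclass keeps the quoted values from being changed by accident during a run; any deviation from the reported setup then has to be made explicitly by constructing a new FRAGConfig.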
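
Since the Software Dependencies row notes that no version numbers are given, one way to close that gap when rerunning the experiments is to record the installed versions alongside the results. This is a sketch under assumptions: the distribution names transformers and rdkit are guesses at the relevant PyPI packages (the source names only the libraries), QuickVina 2 is a standalone program and is not covered here, and installed_versions is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


# Assumed PyPI distribution names; adjust to the environment actually used.
report = installed_versions(["transformers", "rdkit"])
for name, ver in report.items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Saving this report with each set of results gives later readers the pinned versions that the paper itself does not state.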