reproducibilityindex.ai

Molecule Generation with Fragment Retrieval Augmentation

Authors: Seul Lee, Karsten Kreis, Srimukh Veccham, Meng Liu, Danny Reidenbach, Saee Paliwal, Arash Vahdat, Weili Nie

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate f-RAG on molecule generation tasks that simulate various real-world drug discovery problems. We first conduct experiments on the practical molecular optimization (PMO) benchmark [10] in Section 4.1. We then conduct experiments to generate novel molecules that have high binding affinity, drug-likeness, and synthesizability in Section 4.2. We further perform analyses in Section 4.3.
Researcher Affiliation	Collaboration	1KAIST 2NVIDIA
Pseudocode	Yes	We summarize the generation process of f-RAG in Algorithm 1 in Section C.
Open Source Code	Yes	The code to reproduce the results in our paper is provided as the supplementary material. (from NeurIPS Paper Checklist, Section 4 & 5)
Open Datasets	Yes	We used ZINC250k [13] with the same train/test split used by Kusner et al. [22] to train the fragment injection module and construct the initial fragment vocabulary.
Dataset Splits	Yes	We used ZINC250k [13] with the same train/test split used by Kusner et al. [22] to train the fragment injection module and construct the initial fragment vocabulary.
Hardware Specification	Yes	We trained the fragment injection module using one GeForce RTX 3090 GPU. The training took less than 4 hours. We generated molecules using one TITAN Xp (12GB), GeForce RTX 2080 Ti (11GB), or GeForce RTX 3090 GPU (24GB).
Software Dependencies	No	The paper mentions software components such as 'Hugging Face Transformer library [44]', 'QuickVina 2 [2]', and 'RDKit [23]', but it does not give version numbers for any of them, which reproducibility requires.
Experiment Setup	Yes	The fragment injection module was trained to 8 epochs with a learning rate of 1 × 10⁻⁴ using the AdamW optimizer [29]. ... We set the size of the fragment vocabulary to N_frag = 50 and the size of the molecule population to N_mol = 50. ... We set the mutation rate of the GA to 0.1. We set the number of SAFE-GPT generation and number of GA generation in one cycle to G_SAFE-GPT = 10 and G_GA = 10, respectively.
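
The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object, which makes them easier to log alongside a rerun. The following is a minimal sketch, not the authors' code: the class name `FRAGSetup`, every field name, and the helper `generations_per_cycle` are hypothetical, and only the numeric values and the optimizer name are taken from the row above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FRAGSetup:
    """Values quoted in the Experiment Setup row; all names here are hypothetical."""

    train_epochs: int = 8            # epochs for the fragment injection module
    learning_rate: float = 1e-4      # 1 x 10^-4
    optimizer: str = "AdamW"
    n_frag: int = 50                 # N_frag: fragment vocabulary size
    n_mol: int = 50                  # N_mol: molecule population size
    ga_mutation_rate: float = 0.1    # mutation rate of the GA
    g_safe_gpt: int = 10             # G_SAFE-GPT: SAFE-GPT generations per cycle
    g_ga: int = 10                   # G_GA: GA generations per cycle

    def generations_per_cycle(self) -> int:
        # Assumption: one cycle runs all SAFE-GPT generations plus all GA generations.
        return self.g_safe_gpt + self.g_ga


if __name__ == "__main__":
    setup = FRAGSetup()
    # Print every quoted value so it can be stored with the experiment logs.
    for name, value in asdict(setup).items():
        print(f"{name} = {value}")
    print("generations per cycle:", setup.generations_per_cycle())
```

Because the Software Dependencies row reports no version numbers, a configuration like this covers only the quoted hyperparameters; library versions would have to be recorded separately.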