Domain-Agnostic Molecular Generation with Chemical Feedback

Authors: Yin Fang, Ningyu Zhang, Zhuo Chen, Lingbing Guo, Xiaohui Fan, Huajun Chen

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on well-known benchmarks underscore MOLGEN's optimization capabilities in properties such as penalized logP, QED, and molecular docking. Additional analyses confirm its proficiency in accurately capturing molecule distributions, discerning intricate structural patterns, and efficiently exploring the chemical space."
Researcher Affiliation | Collaboration | College of Computer Science and Technology, Zhejiang University; ZJU-Ant Group Joint Research Center for Knowledge Graphs, Zhejiang University; ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University
Pseudocode | No | The paper describes methods verbally and with diagrams (Figure 2), but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code is available at https://github.com/zjunlp/MolGen. ... We have made MOLGEN accessible via Hugging Face in support of the broader scientific community... We make our pre-trained model, code, and data publicly available, in the hope that our work will foster future research in the field." (A loading sketch follows this table.)
Open Datasets | Yes | "In the first stage of pre-training, we randomly select over 100 million unlabelled molecules from the publicly available ZINC-15 dataset (Sterling & Irwin, 2015)... For the natural product dataset, we sourced 30,926 compounds from the Natural Product Activity & Species Source Database (NPASS) (Zhao et al., 2023)."
Dataset Splits | Yes | "Out of these, we arbitrarily chose 30,126 molecules for training and reserved 800 molecules for testing, utilizing the same sets for all ensuing molecule generation tasks." (A split sketch follows this table.)
Hardware Specification | Yes | "MOLGEN is implemented using PyTorch and trained on 6 Nvidia V100 GPUs."
Software Dependencies | No | "MOLGEN is implemented using PyTorch and trained on 6 Nvidia V100 GPUs." While PyTorch is mentioned, no version number is given, nor are other software dependencies listed with versions.
Experiment Setup | Yes | Appendix Table 2 (Hyper-parameter settings) lists maximum sequence length {55, 148, 436}, learning rate {1e-5, 3e-5, 1e-4}, batch size {8, 32, 64, 200, 256}, weight of rank loss α {1, 3, 5}, and prefix length 5. ... "MOLGEN is trained using the AdamW optimizer with a batch size of 200 for the MOSES dataset and 32 for the natural product dataset on 6 Nvidia V100 GPUs for 100 epochs. A linear warm-up of 20,000 steps was also employed." (An optimizer/warm-up sketch follows this table.)
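
The Open Source Code row notes that MOLGEN is distributed via Hugging Face. Below is a minimal sketch of loading the released checkpoint with the transformers library; the model identifier "zjunlp/MolGen-large" and the SELFIES example string are assumptions drawn from the project repository, not verbatim from the paper.

```python
# Minimal sketch of loading the released MOLGEN checkpoint from Hugging Face.
# The model id "zjunlp/MolGen-large" is an assumption taken from the project
# repository (https://github.com/zjunlp/MolGen); verify it before use.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("zjunlp/MolGen-large")
model = AutoModelForSeq2SeqLM.from_pretrained("zjunlp/MolGen-large")

# MOLGEN operates on SELFIES rather than SMILES; the string below is benzene
# written in SELFIES, used here purely as an illustrative input.
inputs = tokenizer("[C][=C][C][=C][C][=C][Ring1][=Branch1]", return_tensors="pt")
outputs = model.generate(**inputs, do_sample=True, max_length=55, num_return_sequences=4)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```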
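The Dataset Splits row quotes a 30,126/800 train/test partition of the 30,926 NPASS compounds. The sketch below reproduces such a random split; the file name npass_selfies.txt and the seed are placeholders, since the paper does not specify them.

```python
import random

# Hypothetical input file: one SELFIES string per line for the 30,926 NPASS compounds.
with open("npass_selfies.txt") as f:
    molecules = [line.strip() for line in f if line.strip()]

random.seed(42)  # seed is an assumption; the paper does not report one
random.shuffle(molecules)

test_set = molecules[:800]    # 800 held-out molecules
train_set = molecules[800:]   # remaining 30,126 molecules for training
```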
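The Experiment Setup row lists AdamW with a 20,000-step linear warm-up, batch size 200 (MOSES) or 32 (natural products), and learning rates in {1e-5, 3e-5, 1e-4}. The sketch below wires those numbers into a standard PyTorch/transformers training setup; the choice of 3e-5 and the derived total step count are illustrative assumptions.

```python
import torch
from transformers import get_linear_schedule_with_warmup

# `model` and `train_set` are assumed to come from the sketches above.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)  # lr chosen from {1e-5, 3e-5, 1e-4}

num_epochs = 100
batch_size = 32                                  # 32 for the natural product dataset (200 for MOSES)
steps_per_epoch = len(train_set) // batch_size
total_steps = num_epochs * steps_per_epoch

scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=20_000,                     # linear warm-up of 20,000 steps
    num_training_steps=total_steps,
)
```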