Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Molecule Generation by Principal Subgraph Mining and Assembling
Authors: Xiangzhe Kong, Wenbing Huang, Zhixing Tan, Yang Liu
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the ZINC250K [16] and QM9 [6, 37] datasets. Results demonstrate that our PS-VAE outperforms state-of-the-art models on distribution learning, (constrained) property optimization as well as GuacaMol goal-directed benchmarks [7]. |
| Researcher Affiliation | Academia | Xiangzhe Kong (1), Wenbing Huang (4, 5), Zhixing Tan (1), Yang Liu (1, 2, 3). (1) Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua University; (2) Institute for AIR, Tsinghua University; (3) Beijing Academy of Artificial Intelligence; (4) Gaoling School of Artificial Intelligence, Renmin University of China; (5) Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Principal Subgraph Extraction |
| Open Source Code | Yes | Codes for our PS-VAE are available at https://github.com/THUNLP-MT/PS-VAE. |
| Open Datasets | Yes | We use the ZINC250K [16] dataset for training, which contains 250,000 drug-like molecules up to 38 atoms. For GuacaMol benchmark, we add extra results on the QM9 [6, 37] dataset, which has 133,014 molecules up to 23 atoms. |
| Dataset Splits | No | The paper states that it uses ZINC250K for training and QM9 for the GuacaMol benchmarks, but the main text does not give explicit training/validation/test splits as percentages or sample counts. |
| Hardware Specification | No | The paper states that hardware specifications are in Appendix F, which is not provided in the given text. |
| Software Dependencies | No | The paper mentions software components like GNN, MLP, GRU, but does not provide specific version numbers for these or other software dependencies in the main text. It refers to Appendix G for more details, which is not provided. |
| Experiment Setup | Yes | PS-VAE is trained for 6 epochs with a batch size of 32 and a learning rate of 0.001. We set α = 0.1 and initialize β = 0. We adopt a warm-up method that increases β by 0.002 every 1000 steps to a maximum of 0.01. |
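
The β warm-up quoted in the Experiment Setup row can be sketched as a small step-wise schedule. This is a minimal illustration, not the authors' implementation: the function name `kl_weight` and its parameter names are hypothetical, and it assumes that β stays constant within each 1000-step interval and jumps by 0.002 at each interval boundary (the quoted text does not say whether the increase is step-wise or interpolated). How α and β enter the loss is not stated in the quoted text, so the sketch covers only the schedule.

```python
def kl_weight(step, beta_init=0.0, increment=0.002, interval=1000, beta_max=0.01):
    """Return the warm-up weight beta at a given training step.

    Assumption: beta is piecewise constant and increases by `increment`
    once every `interval` steps until it reaches `beta_max`.
    The default values are the ones quoted in the Experiment Setup row.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    n_increments = step // interval  # number of completed intervals
    return min(beta_init + n_increments * increment, beta_max)


if __name__ == "__main__":
    # Print beta at a few training steps to inspect the schedule.
    for step in (0, 999, 1000, 2500, 5000, 20000):
        print(step, kl_weight(step))
```

Under these assumptions β reaches its cap of 0.01 after five increments, that is, at step 5000, and stays there for the rest of training.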