Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
De Novo Molecular Generation via Connection-aware Motif Mining
Authors: Zijie Geng, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Jie Wang, Yongdong Zhang, Feng Wu, Tie-Yan Liu
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our method on distribution-learning benchmarks (i.e., generating novel molecules to resemble the distribution of a given training set) and goal-directed benchmarks (i.e., generating molecules with target properties), and achieve significant improvements over previous fragment-based baselines. |
| Researcher Affiliation | Collaboration | Zijie Geng¹, Shufang Xie², Yingce Xia³, Lijun Wu³, Tao Qin³, Jie Wang¹˒⁴, Yongdong Zhang¹, Feng Wu¹, Tie-Yan Liu³; ¹ University of Science and Technology of China (EMAIL, EMAIL); ² Gaoling School of Artificial Intelligence, Renmin University of China (EMAIL); ³ Microsoft Research AI4Science (EMAIL); ⁴ Institute of Artificial Intelligence, Hefei Comprehensive National Science Center |
| Pseudocode | Yes | Algorithm 1: Connection-aware Motif Mining; Algorithm 2: Generating a molecule |
| Open Source Code | Yes | The code of MiCaM is available at https://github.com/MIRALab-USTC/AI4Sci-MiCaM. |
| Open Datasets | Yes | We evaluate our method on three datasets: QM9 (Ruddigkeit et al., 2012), ZINC (Irwin et al., 2012), and GuacaMol (a post-processed ChEMBL (Mendez et al., 2019) dataset proposed by Brown et al. (2019)). |
| Dataset Splits | No | βprior and βprop are hyperparameters to be determined according to validation performances. |
| Hardware Specification | Yes | We measure the training and sampling speed on a single GeForce RTX 3090. |
| Software Dependencies | No | We employ GINE (Hu et al., 2019) as the GNN structures... The target values are computed using the RDKit library. |
| Experiment Setup | Yes | For QM9, we use a short warm-up (3,000 steps), and use a long sigmoid schedule (400,000 steps) (Bowman et al., 2015) to let βprior to reach 0.4. ... a small βprop (about 0.3) is beneficial. |
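
The Experiment Setup excerpt describes a warm-up followed by a sigmoid schedule for βprior on QM9. The sketch below illustrates one way such a schedule could look; it is not taken from the paper or its repository. Only the three numbers (3,000 warm-up steps, 400,000 schedule steps, final value 0.4) come from the quoted text. The function name `beta_prior_schedule`, the `steepness` parameter, the choice to hold βprior at zero during warm-up, and the rescaling of the logistic curve are all assumptions made for illustration.

```python
import math


def beta_prior_schedule(step, warmup_steps=3_000, anneal_steps=400_000,
                        beta_max=0.4, steepness=10.0):
    """Hypothetical KL-weight schedule: zero in warm-up, then a sigmoid ramp.

    The step counts and beta_max follow the quoted QM9 setup; `steepness`
    and the zero-weight warm-up are assumptions, not values from the paper.
    """
    if step < warmup_steps:
        return 0.0  # assumed: no prior term during warm-up

    # Fraction of the annealing phase completed, clipped to [0, 1].
    progress = min((step - warmup_steps) / anneal_steps, 1.0)

    # Logistic curve centered on the midpoint of the annealing phase.
    raw = 1.0 / (1.0 + math.exp(-steepness * (progress - 0.5)))

    # Rescale so the ramp starts at exactly 0 and ends at exactly beta_max.
    low = 1.0 / (1.0 + math.exp(steepness * 0.5))
    high = 1.0 / (1.0 + math.exp(-steepness * 0.5))
    return beta_max * (raw - low) / (high - low)


if __name__ == "__main__":
    for s in (0, 3_000, 103_000, 203_000, 303_000, 403_000):
        print(s, round(beta_prior_schedule(s), 4))
```

In a training loop, the returned value would multiply the prior (KL) term of the loss at each step. The authors' actual schedule may differ in shape and in how the warm-up is handled, so the released code remains the reference.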