Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision
Authors: Kangjie Zheng, Siyue Liang, Junwei Yang, Bin Feng, Zequn Liu, Wei Ju, Zhiping Xiao, Ming Zhang
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that SMI-EDITOR achieves state-of-the-art performance across multiple downstream molecular tasks, even outperforming several 3D molecular representation models. In this section, we first evaluate the performance of SMI-EDITOR on molecular property prediction tasks and compare it with baseline models (see Section 4.2). The results show that SMI-EDITOR outperforms both the MLM and 3D molecular models, achieving state-of-the-art performance. To further validate the model design and pre-training framework, we conduct ablation studies on training signals and editing operations (see Section 4.3). In addition, analytical experiments confirm that SMI-EDITOR has a stronger ability to capture the semantics of molecular substructures than MLMs. |
| Researcher Affiliation | Academia | 1 School of Computer Science, Peking University. 2 National Key Laboratory for Multimedia Information Processing, Peking University. 3 Peking University-Anker Embodied AI Lab, Peking University. 4 International Digital Economy Academy (IDEA), Shenzhen, China. 5 College of Computer Science, Sichuan University, Chengdu, China. 6 Computer Science and Engineering Department, University of Washington, U.S.A. |
| Pseudocode | No | The paper describes the model architecture and training objective using descriptive text and mathematical formulas but does not include a distinct block explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Code is released at https://github.com/zhengkangjie/smi-editor |
| Open Datasets | Yes | For pre-training, we use the large-scale molecular dataset provided by Zhou et al. (2023), which includes SMILES information for 19 million molecules. For fine-tuning, we employ the widely-recognized MoleculeNet benchmark (Wu et al., 2018) (see Appendix I for more details). |
| Dataset Splits | Yes | We follow the same data split as used by Zhou et al. (2023) and tokenize SMILES sequences with the regular expression from Schwaller et al. (2018). For all seven tasks, we take the normalized SMILES information as model input and fine-tune on each task separately. We use ROC-AUC as the evaluation metric, and the results are summarized in Table 1. ... MoleculeNet (Wu et al., 2018) benchmark, focusing on the molecular property prediction task. ... In this section, we provide a detailed summary of the statistics and fundamental characteristics of the MoleculeNet datasets we use in Table 7. This table offers information about the dataset sizes, task types, and compositions, providing readers with essential background information to better understand the experimental setup and subsequent analysis. ... Table 7: Summary information of the MoleculeNet benchmark datasets. ... Molecules (train/valid/test) ... 902/113/113 ... Using a 5-fold setup, we evaluated SMI-EDITOR's performance on the training sets of BACE, BBBP, SIDER, Tox21, ToxCast, ClinTox and MUV. The results are shown in Table 17. These results demonstrate that SMI-EDITOR exhibits strong performance and stability across downstream tasks. Each dataset was evenly divided into five parts. In each run, one part was selected as the validation set, while the remaining four parts were used as the training set. The model was trained and evaluated on the validation set. This process was repeated five times to complete all runs. |
| Hardware Specification | Yes | We implement the SMI-EDITOR model using the Fairseq library and train SMI-EDITOR on four RTX3090 GPUs for about 1 day. |
| Software Dependencies | No | We implement the SMI-EDITOR model using the Fairseq library and train SMI-EDITOR on four RTX3090 GPUs for about 1 day. ... https://fairseq.readthedocs.io/en/latest/. The paper mentions the Fairseq library but does not provide a specific version number. |
| Experiment Setup | Yes | We use a Transformer block with a hidden size of 768 and 12 attention heads, comprising 12 layers in the SMILES encoder, which contains a total of 86.3 million trainable parameters. During pre-training, the fragment drop ratio is set to 0.15. For downstream tasks, we use the same fine-tuning dataset established by Uni-Mol. (cf. Appendix G for more details about hyper-parameter configuration.) ... Table 5: SMI-EDITOR hyper-parameters for pre-training. Learning rate: 5e-4; LR scheduler: polynomial decay; Warmup updates: 10K; Max updates: 120K; Max tokens: 64k; FFN dropout: 0.1; Attention dropout: 0.1; Activation dropout: 0; Num of layers: 12; Num of attention heads: 12; Encoder embedding dim: 768; Encoder FFN dim: 3072; Adam (β1, β2): (0.9, 0.98); Fragment drop ratio: 0.15; Vocabulary size: 369; Activation function: GELU; Weight decay: 0.0; Clip norm: 1.0 |
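The Dataset Splits row notes that SMILES sequences are tokenized with the regular expression from Schwaller et al. (2018). A minimal sketch in that style is below; the pattern is the widely circulated Molecular Transformer regex, which is an assumption here, so the exact pattern SMI-EDITOR uses should be checked against the released code at https://github.com/zhengkangjie/smi-editor.

```python
import re

# Widely circulated SMILES tokenization regex in the style of
# Schwaller et al. (2018). Assumption: verify against the SMI-EDITOR
# repository for the exact pattern used in the paper.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\."
    r"|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into chemically meaningful tokens
    (bracket atoms, two-letter halogens, ring digits, bonds, ...)."""
    tokens = SMILES_REGEX.findall(smiles)
    # Sanity check: tokenization must be lossless.
    assert "".join(tokens) == smiles, "unrecognized characters in SMILES"
    return tokens

print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```

Note that the regex keeps bracket atoms such as `[C@H]` as single tokens, which is what makes fragment-level supervision over substructures feasible.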
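The 5-fold protocol quoted under Dataset Splits (each dataset divided evenly into five parts, one part serving as the validation set per run) can be sketched as follows. The function name `five_fold_splits` is illustrative, not taken from the paper's code.

```python
# Sketch of the 5-fold evaluation protocol described in the report:
# split the dataset evenly into five parts; in each run one part is the
# validation set and the remaining four form the training set.

def five_fold_splits(n_samples, n_folds=5):
    """Yield (train_indices, valid_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        # Spread any remainder over the first folds so sizes differ by <= 1.
        end = start + fold_size + (1 if k < remainder else 0)
        folds.append(indices[start:end])
        start = end
    for k in range(n_folds):
        valid = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, valid

# Toy check: every sample lands in exactly one validation fold and
# train/valid never overlap.
for train, valid in five_fold_splits(11):
    assert not set(train) & set(valid)
    assert sorted(train + valid) == list(range(11))
```

In practice one would shuffle the indices with a fixed seed before folding; the paper does not specify the shuffling procedure, so it is omitted here.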
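Since the paper trains with Fairseq, the Table 5 hyper-parameters map naturally onto a `fairseq-train` invocation. The sketch below is an assumption, not the authors' command: the data path is elided, and `--task` / `--arch` values are placeholders for the custom task and architecture registered by the released code; only the numeric flags mirror Table 5.

```shell
# Hypothetical fairseq-train invocation reconstructing Table 5's
# hyper-parameters. The data path, --task, and --arch are placeholders;
# consult https://github.com/zhengkangjie/smi-editor for the real ones.
# Fairseq exposes a single --dropout flag, used here for the FFN dropout.
fairseq-train /path/to/preprocessed-smiles \
  --task <custom-edit-task> --arch <custom-smi-editor-arch> \
  --optimizer adam --adam-betas '(0.9, 0.98)' \
  --lr 5e-4 --lr-scheduler polynomial_decay \
  --warmup-updates 10000 --max-update 120000 \
  --max-tokens 64000 \
  --encoder-layers 12 --encoder-attention-heads 12 \
  --encoder-embed-dim 768 --encoder-ffn-embed-dim 3072 \
  --dropout 0.1 --attention-dropout 0.1 --activation-dropout 0.0 \
  --activation-fn gelu --weight-decay 0.0 --clip-norm 1.0
```

The fragment drop ratio (0.15) has no standard Fairseq flag and would be an option of the custom task.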