Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
RFL: Simplifying Chemical Structure Recognition with Ring-Free Language
Authors: Qikai Chang, Mingjun Chen, Changpeng Pi, Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Jun Du, Baocai Yin, Jinshui Hu
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the proposed RFL and MSD can be applied to various mainstream methods, achieving superior performance compared to state-of-the-art approaches in both printed and handwritten scenarios. ... We validate our method on the handwritten dataset EDU-CHEMC (Hu et al. 2023) and printed dataset Mini-CASIA-CSDB (Ding et al. 2022). ... Comprehensive experiments show that our method surpasses the state-of-the-art methods with different baselines on both printed and handwritten scenarios. |
| Researcher Affiliation | Collaboration | ¹NERC-SLIP, University of Science and Technology of China; ²iFLYTEK Research |
| Pseudocode | No | The paper describes the RFL and MSD methods with equations and figures (e.g., Figures 2 and 3, which illustrate the process and architecture), but it does not contain explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/JingMog/RFL-MSD |
| Open Datasets | Yes | EDU-CHEMC (Hu et al. 2023) contains 48,998 training samples and 2,992 testing samples of handwritten molecular structure images collected from various educational scenarios in the real world. ... Mini-CASIA-CSDB (Ding et al. 2022) contains 89,023 training samples and 8,287 testing samples of printed molecular structure images collected from the chemical database ChEMBL (Gaulton et al. 2017). |
| Dataset Splits | Yes | EDU-CHEMC (Hu et al. 2023) contains 48,998 training samples and 2,992 testing samples... Mini-CASIA-CSDB (Ding et al. 2022) contains 89,023 training samples and 8,287 testing samples... The dataset is divided into five levels based on structural complexity, with each level containing a similar number of samples, as shown in Figure 5. |
| Hardware Specification | Yes | All experiments are conducted on 4 NVIDIA Tesla V100 GPUs with 32GB RAM |
| Software Dependencies | No | The whole framework is implemented using PyTorch. (No version number is provided for PyTorch or other libraries.) |
| Experiment Setup | Yes | The growth rate and depth in each dense block are set to 24 and 32. The Molecular Skeleton Decoder (MSD) employs a GRU (Cho et al. 2014) with a hidden state dimension of 256. The embedding dimension is 256, and a dropout rate of 0.15 is applied. ... In our experiments, we set λ1 = λ2 = 1. The Adam optimizer (Kingma and Ba 2014) is used with an initial learning rate of 2 × 10⁻⁴, and the parameters are set as β1 = 0.9, β2 = 0.999, ε = 10⁻⁸. The learning rate adjustment strategy employs MultiStepLR with a decay factor γ = 0.5. All experiments are conducted on 4 NVIDIA Tesla V100 GPUs with 32GB RAM, using a batch size of 8 for the EDU-CHEMC dataset and 32 for the Mini-CASIA-CSDB dataset. The number of training epochs is set to 50. |
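The Experiment Setup row reports most of the hyperparameters needed to configure training. The snippet below is a minimal, hypothetical sketch, not code from the authors' repository: the names `RFLTrainingConfig` and `multistep_lr` are illustrative, and the MultiStepLR milestone epochs are an assumption, because the excerpt gives only the decay factor γ = 0.5. The schedule function reproduces the closed-form rule of PyTorch's `MultiStepLR` (one multiplication by γ per milestone reached), assuming the scheduler is stepped once per epoch.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RFLTrainingConfig:
    """Hyperparameters as reported in the Experiment Setup row.

    Every value comes from the excerpt except lr_milestones (HYPOTHETICAL).
    """
    densenet_growth_rate: int = 24     # growth rate of each dense block
    densenet_block_depth: int = 32     # depth of each dense block
    gru_hidden_dim: int = 256          # MSD GRU hidden state size
    embedding_dim: int = 256
    dropout: float = 0.15
    lambda1: float = 1.0               # loss weights, λ1 = λ2 = 1
    lambda2: float = 1.0
    learning_rate: float = 2e-4        # Adam initial learning rate
    adam_betas: Tuple[float, float] = (0.9, 0.999)
    adam_eps: float = 1e-8
    lr_decay_gamma: float = 0.5        # MultiStepLR decay factor γ
    epochs: int = 50
    batch_size: Dict[str, int] = field(
        default_factory=lambda: {"EDU-CHEMC": 8, "Mini-CASIA-CSDB": 32}
    )
    # HYPOTHETICAL: the excerpt does not state the MultiStepLR milestones.
    lr_milestones: List[int] = field(default_factory=lambda: [20, 35])


def multistep_lr(cfg: RFLTrainingConfig, epoch: int) -> float:
    """Learning rate at a given epoch under a MultiStepLR-style schedule.

    The base rate is multiplied by gamma once for every milestone that is
    less than or equal to the current epoch.
    """
    n_decays = bisect_right(sorted(cfg.lr_milestones), epoch)
    return cfg.learning_rate * cfg.lr_decay_gamma ** n_decays


if __name__ == "__main__":
    cfg = RFLTrainingConfig()
    for epoch in (0, 19, 20, 35, cfg.epochs - 1):
        print(f"epoch {epoch:2d}: lr = {multistep_lr(cfg, epoch):.1e}")
```

In an actual PyTorch training script, the same values would be passed to `torch.optim.Adam(params, lr=2e-4, betas=(0.9, 0.999), eps=1e-8)` and `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=..., gamma=0.5)`; the milestones would still have to be taken from the released code rather than guessed.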