Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
RetroOOD: Understanding Out-of-Distribution Generalization in Retrosynthesis Prediction
Authors: Yemin Yu, Luotian Yuan, Ying Wei, Hanyu Gao, Fei Wu, Zhihua Wang, Xinhai Ye
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To this end, we first formally sort out two types of distribution shifts in retrosynthesis prediction and construct two groups of benchmark datasets. Next, through comprehensive experiments, we systematically compare state-of-the-art retrosynthesis prediction models on the two groups of benchmarks, revealing the limitations of previous in-distribution evaluation and re-examining the advantages of each model. |
| Researcher Affiliation | Academia | Yemin Yu1,5*, Luotian Yuan2*, Ying Wei3, Hanyu Gao4, Fei Wu2,5, Zhihua Wang5, Xinhai Ye5; 1City University of Hong Kong; 2Zhejiang University; 3Nanyang Technological University; 4Hong Kong University of Science and Technology; 5Shanghai Institute for Advanced Study of Zhejiang University |
| Pseudocode | Yes | The complete algorithm is listed as Alg. 1 in the Appendix. |
| Open Source Code | No | The paper does not provide an explicit statement or a link to the open-source code for the methodology it describes. |
| Open Datasets | Yes | on the benchmark USPTO50K dataset (Schneider, Stiefl, and Landrum 2016) |
| Dataset Splits | No | The paper mentions "train-test data split" and discusses N/N as sample sizes for train/test data, but it does not explicitly specify the proportions or existence of a validation set, nor does it provide concrete percentages or sample counts for the splits needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running its experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, specific libraries) that would be needed to replicate the experiments. |
| Experiment Setup | No | The paper mentions that "All baseline models are re-trained on each of the four OOD datasets separately for evaluation" and "The complete architecture details of the EBM and the ablation study on the different settings of n are elaborated in the Appendix," implying that specific hyperparameter values or detailed training configurations are not present in the main text. |