Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

KeeA*: Epistemic Exploratory A* Search via Knowledge Calibration

Authors: Dengwei Zhao, Shikui Tu, Yanan Sun, Lei Xu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, empirical results on retrosynthetic planning and logic synthesis demonstrate superior performance of KeeA* compared to state-of-the-art heuristic search algorithms. Experiments are conducted on two real-world applications: retrosynthetic planning in organic chemistry and logic synthesis in VLSI design.
Researcher Affiliation	Academia	Dengwei Zhao¹, Shikui Tu¹, Yanan Sun², Lei Xu¹,³; ¹School of Computer Science, Shanghai Jiao Tong University; ²School of Integrated Circuits, Shanghai Jiao Tong University; ³Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ); EMAIL
Pseudocode	Yes	Algorithmic details are provided in Algorithm 1. Algorithm 1: KeeA* search algorithm
Open Source Code	Yes	The source code is publicly available at https://github.com/CMACH508/KeeA.
Open Datasets	Yes	Experiments are conducted on the widely used USPTO benchmark, which comprises 190 target molecules [4], and additional 4719 molecules collected from logP [5], logS [54], Toxicity LD50 [49], Ames [17], BBBP [29], and ClinTox [15] dataset.
Dataset Splits	Yes	Experiments are conducted on the widely used USPTO benchmark, which comprises 190 target molecules [4], and additional 4719 molecules collected from logP [5], logS [54], Toxicity LD50 [49], Ames [17], BBBP [29], and ClinTox [15] dataset. Molecules from the eMolecules database are used as the set of commercially available building blocks. The synthesis outcomes of 4,909 molecules across seven datasets are used for hypothesis testing. 12 MCNC benchmark circuits {C1–C12} [55] are used for evaluation.
Hardware Specification	Yes	All experiments are conducted on NVIDIA Tesla V100 GPUs and an Intel(R) Xeon(R) Gold 6238R CPU.
Software Dependencies	No	The paper mentions 'ABC synthesis tool [2]' and 'BERT model [13]' but does not provide specific version numbers for these software components, nor for any other libraries or programming languages.
Experiment Setup	Yes	All search algorithms are constrained to a maximum of 500 calls or 10 minutes of wall-clock time, following prior works [4, 23]. The candidate size is fixed at N_c = 50, and the number of clusters is K = 5, which are consistent with SeeA*. The hyperparameters α and β in KeeA* are set as 0.5 and 0.8, respectively. For the logic synthesis task, an And-Inverter Graph (AIG) is optimized to minimize the area-delay product (ADP) via a sequence of functionality-preserving transformations. 7 legal transformations are allowed, and the action sequence is constrained to be 10 steps. The candidate size is fixed at N_c = 10, with the number of clusters K = 5. Five nodes are sampled from each cluster for scouting. Hyperparameters α and β are set to 0.5 and 0.8, respectively.
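
For readers who want to re-run the experiments, the settings quoted in the Experiment Setup row can be collected into a small configuration sketch. This is an illustration, not code from the KeeA* repository: the class names, field names, and the `budget_exhausted` helper are all hypothetical. It also assumes that "500 calls or 10 minutes" means the search stops when either limit is reached, and that this budget applies to the retrosynthesis task, where the row states it.

```python
from dataclasses import dataclass
import time
from typing import Optional


@dataclass(frozen=True)
class RetrosynthesisConfig:
    """Values quoted for retrosynthetic planning; field names are hypothetical."""
    candidate_size: int = 50      # N_c
    num_clusters: int = 5         # K
    alpha: float = 0.5
    beta: float = 0.8
    max_calls: int = 500          # call budget per search
    max_seconds: float = 600.0    # 10 minutes of wall-clock time


@dataclass(frozen=True)
class LogicSynthesisConfig:
    """Values quoted for logic synthesis; field names are hypothetical."""
    candidate_size: int = 10      # N_c
    num_clusters: int = 5         # K
    nodes_per_cluster: int = 5    # nodes sampled from each cluster for scouting
    alpha: float = 0.5
    beta: float = 0.8
    num_transformations: int = 7  # legal AIG transformations
    max_steps: int = 10           # length of the action sequence


def budget_exhausted(cfg: RetrosynthesisConfig, calls_used: int,
                     started_at: float, now: Optional[float] = None) -> bool:
    """Assumed stopping rule: stop once either the call or the time limit is hit."""
    now = time.monotonic() if now is None else now
    return calls_used >= cfg.max_calls or (now - started_at) >= cfg.max_seconds
```

A search loop would call `budget_exhausted(cfg, calls_used, started_at)` before each expansion, with `started_at` taken from `time.monotonic()` at the start of the search.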
