Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
KeeA*: Epistemic Exploratory A* Search via Knowledge Calibration
Authors: Dengwei Zhao, Shikui Tu, Yanan Sun, Lei Xu
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, empirical results on retrosynthetic planning and logic synthesis demonstrate superior performance of KeeA* compared to state-of-the-art heuristic search algorithms. Experiments are conducted on two real-world applications: retrosynthetic planning in organic chemistry and logic synthesis in VLSI design. |
| Researcher Affiliation | Academia | Dengwei Zhao¹, Shikui Tu¹, Yanan Sun², Lei Xu¹,³. ¹School of Computer Science, Shanghai Jiao Tong University; ²School of Integrated Circuits, Shanghai Jiao Tong University; ³Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ) |
| Pseudocode | Yes | Algorithmic details are provided in Algorithm 1. Algorithm 1: KeeA* search algorithm |
| Open Source Code | Yes | The source code is publicly available at https://github.com/CMACH508/KeeA. |
| Open Datasets | Yes | Experiments are conducted on the widely used USPTO benchmark, which comprises 190 target molecules [4], and additional 4719 molecules collected from the logP [5], logS [54], Toxicity LD50 [49], Ames [17], BBBP [29], and ClinTox [15] datasets. |
| Dataset Splits | Yes | Experiments are conducted on the widely used USPTO benchmark, which comprises 190 target molecules [4], and additional 4719 molecules collected from the logP [5], logS [54], Toxicity LD50 [49], Ames [17], BBBP [29], and ClinTox [15] datasets. Molecules from the eMolecules database are used as the set of commercially available building blocks. The synthesis outcomes of 4,909 molecules across seven datasets are used for hypothesis testing. 12 MCNC benchmark circuits {C1–C12} [55] are used for evaluation. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA Tesla V100 GPUs and an Intel(R) Xeon(R) Gold 6238R CPU. |
| Software Dependencies | No | The paper mentions 'ABC synthesis tool [2]' and 'BERT model [13]' but does not provide specific version numbers for these software components, nor for any other libraries or programming languages. |
| Experiment Setup | Yes | All search algorithms are constrained to a maximum of 500 calls or 10 minutes of wall-clock time, following prior works [4, 23]. The candidate size is fixed at N_c = 50, and the number of clusters is K = 5, which are consistent with SeeA*. The hyperparameters α and β in KeeA* are set as 0.5 and 0.8, respectively. For the logic synthesis task, an And-Inverter Graph (AIG) is optimized to minimize the area-delay product (ADP) via a sequence of functionality-preserving transformations. 7 legal transformations are allowed, and the action sequence is constrained to be 10 steps. The candidate size is fixed at N_c = 10, with the number of clusters K = 5. Five nodes are sampled from each cluster for scouting. Hyperparameters α and β are set to 0.5 and 0.8, respectively. |
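
The settings quoted in the Experiment Setup row can be collected into a small configuration sketch, which may help when attempting to reproduce the runs. This is only an illustration: the names `SearchSetup`, `within_budget`, and `area_delay_product` are hypothetical and do not come from the paper or its repository, and the budget check assumes the search stops as soon as either the call limit or the time limit is reached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchSetup:
    """Hypothetical container for the hyperparameters reported in the table."""
    candidate_size: int  # N_c
    num_clusters: int    # K
    alpha: float
    beta: float


# Values as reported for the two tasks.
RETROSYNTHESIS = SearchSetup(candidate_size=50, num_clusters=5, alpha=0.5, beta=0.8)
LOGIC_SYNTHESIS = SearchSetup(candidate_size=10, num_clusters=5, alpha=0.5, beta=0.8)

# Search budget: at most 500 calls or 10 minutes of wall-clock time.
MAX_CALLS = 500
MAX_SECONDS = 10 * 60

# Logic synthesis constraints reported in the table.
NUM_LEGAL_TRANSFORMATIONS = 7
MAX_ACTION_STEPS = 10


def within_budget(calls: int, elapsed_seconds: float) -> bool:
    """Assumed stopping rule: continue only while both limits are unmet."""
    return calls < MAX_CALLS and elapsed_seconds < MAX_SECONDS


def area_delay_product(area: float, delay: float) -> float:
    """ADP is the product of circuit area and delay (lower is better)."""
    return area * delay


# Consistency check on the dataset sizes: 190 USPTO targets plus 4719
# additional molecules give the 4,909 molecules used for hypothesis testing.
USPTO_TARGETS = 190
ADDITIONAL_MOLECULES = 4719
TOTAL_MOLECULES = USPTO_TARGETS + ADDITIONAL_MOLECULES
```

Keeping the two task setups as separate frozen instances makes the only reported difference between them, the candidate size, easy to see at a glance.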