Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Retrosynthesis Planning via Worst-path Policy Optimisation in Tree-structured MDPs

Authors: Mianchu Wang, Giovanni Montana

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we aim to answer the following questions: (1) What are the advantages of our proposed method compared to the SOTA algorithms? (2) How does each component of the method contribute to the performance? Additionally, we examine the real-world feasibility of the proposed synthetic routes and present illustrative examples in Appendix A. 5.1 Benchmark Results: The proposed method is trained by creating synthetic routes for nearly 300k molecules in the USPTO50k dataset with commercially available building blocks from the eMolecules dataset. We evaluate performance on three benchmarks of increasing difficulty: Retro*-190 [1], ChEMBL-1000 [10, 35], and GDB17-1000 [10, 18], where the suffix indicates the dataset size. We compare against established retrosynthesis planning methods: MCTS, Retro* [1], Self-improve [9], PDVN [10], and GraphRetro [31]. Additionally, we include two single-step methods, MEGAN [19] and Graph2Edits [38], combined with Retro* as the search algorithm, and two recently proposed methods, DreamRetroer [36] and RetroCaptioner [11]. Baseline results are produced from their official implementations or the Syntheseus project [13]. Success Rate. Success rate measures the percentage of target molecules that can be successfully decomposed into building blocks. In Table 1, we firstly compare methods under different model-call budgets: 100, 200, and 500. Our proposed InterRetro significantly outperforms SOTA algorithms across all three test sets.
Researcher Affiliation	Academia	Mianchu Wang, Giovanni Montana; University of Warwick; The Alan Turing Institute; EMAIL
Pseudocode	Yes	Algorithm 1 Interactive retrosynthesis planning (InterRetro). Input: pre-trained one-step policy πθ, value function Vϕ, training set D, replay buffer B. def EXPLORE(πθ, m): 1: tree ← Tree(root = m) 2: q ← {m} 3: step ← 0 4: while q ≠ ∅ and step < max_steps do 5: s ← q.pop() 6: a, Sr ← πθ.get_reactants(s) 7: tree.add_branch(s, a, Sr) 8: 9: # Add non-building blocks 10: q ← q ∪ { s ∈ Sr | s ∉ Sbb } 11: 12: step ← step + 1 13: end while 14: return tree def INTERRETRO(πθ, m): 1: for i = 1, ..., I do 2: while D is not empty do 3: m ← D.pop() 4: tree ← EXPLORE(πθ, m) 5: brs ← {} 6: for each subtree τ ∈ tree do 7: if τ is successful then 8: brs ← brs ∪ ALLBRANCHES(τ) 9: end if 10: end for 11: B.append(brs) 12: branches ← B.sample() 13: Vϕ.update(branches) (Eq. 15) 14: πθ.update(Vϕ, branches) (Eq. 16) 15: end while 16: end for
Open Source Code	Yes	The code has been open-sourced. Footnote 1: GitHub repository: https://github.com/MianchuWang/InterRetro.
Open Datasets	Yes	The proposed method is trained by creating synthetic routes for nearly 300k molecules in the USPTO50k dataset with commercially available building blocks from the eMolecules dataset. We evaluate performance on three benchmarks of increasing difficulty: Retro*-190 [1], ChEMBL-1000 [10, 35], and GDB17-1000 [10, 18], where the suffix indicates the dataset size. Footnote 2: eMolecules: https://downloads.emolecules.com/free/.
Dataset Splits	Yes	The proposed method is trained by creating synthetic routes for nearly 300k molecules in the USPTO50k dataset with commercially available building blocks from the eMolecules dataset. We evaluate performance on three benchmarks of increasing difficulty: Retro*-190 [1], ChEMBL-1000 [10, 35], and GDB17-1000 [10, 18], where the suffix indicates the dataset size. We compare against established retrosynthesis planning methods: MCTS, Retro* [1], Self-improve [9], PDVN [10], and GraphRetro [31]. Additionally, we include two single-step methods, MEGAN [19] and Graph2Edits [38], combined with Retro* as the search algorithm, and two recently proposed methods, DreamRetroer [36] and RetroCaptioner [11]. Baseline results are produced from their official implementations or the Syntheseus project [13]. ... Sample Efficiency. Sample efficiency refers to the amount of training data required to learn an effective policy. We evaluate the policy using subsets of the training data at 1%, 2%, 5%, 10%, and 100%, corresponding to approximately 3k, 6k, 15k, 30k, and 300k molecules.
Hardware Specification	Yes	Our models are trained on a single NVIDIA RTX A5000 GPU and, without pre-training the single-step model, require approximately 48 hours to fully converge.
Software Dependencies	No	The paper does not explicitly state specific version numbers for software dependencies like Python, PyTorch, or CUDA.
Experiment Setup	Yes	For training, we run 6 parallel exploration processes and collect 36 synthetic trees per iteration. The networks are updated 5 times per iteration using data from a compact replay buffer of maximal 20,000 branches, chosen to reduce CPU memory usage and maintain close alignment between the data-collecting policy π_i and the updated policy π_{i+1}. Our models are trained on a single NVIDIA RTX A5000 GPU and, without pre-training the single-step model, require approximately 48 hours to fully converge.
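
Illustrative sketch (not the authors' code): the EXPLORE loop quoted in the Pseudocode row and the success-rate metric quoted in the Research Type row can be approximated in a few lines of Python. Everything named here is an assumption made for illustration: molecules are plain strings, `toy_policy` is a hypothetical stand-in for the learned one-step policy πθ, the tree is a dictionary rather than the paper's Tree object, the queue is processed first-in first-out (the pseudocode does not specify an order), and `max_steps` stands in for the model-call budget.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Hypothetical types: a one-step policy maps a molecule to (reaction, reactants).
Policy = Callable[[str], Tuple[str, List[str]]]
Tree = Dict[str, Tuple[str, List[str]]]


def explore(policy: Policy, target: str, building_blocks: Set[str],
            max_steps: int = 100) -> Tree:
    """Expand a synthesis tree from `target`, loosely following EXPLORE."""
    tree: Tree = {}
    queue = deque([target])
    step = 0
    while queue and step < max_steps:
        s = queue.popleft()  # FIFO order is an assumption of this sketch
        reaction, reactants = policy(s)
        tree[s] = (reaction, reactants)
        # Only reactants that are not building blocks need further expansion.
        queue.extend(r for r in reactants if r not in building_blocks)
        step += 1
    return tree


def is_solved(tree: Tree, mol: str, building_blocks: Set[str],
              path: FrozenSet[str] = frozenset()) -> bool:
    """True if `mol` decomposes entirely into building blocks within `tree`."""
    if mol in building_blocks:
        return True
    if mol not in tree or mol in path:  # unexpanded node or a cycle
        return False
    reactants = tree[mol][1]
    # An empty reactant list means the policy proposed no reaction.
    return bool(reactants) and all(
        is_solved(tree, r, building_blocks, path | {mol}) for r in reactants
    )


def success_rate(policy: Policy, targets: List[str],
                 building_blocks: Set[str], max_steps: int = 100) -> float:
    """Fraction of targets that are decomposed into building blocks."""
    solved = sum(
        is_solved(explore(policy, t, building_blocks, max_steps), t, building_blocks)
        for t in targets
    )
    return solved / len(targets)


def toy_policy(mol: str) -> Tuple[str, List[str]]:
    """Hypothetical policy: split a string 'molecule' into two halves."""
    if len(mol) < 2:
        return ("no_reaction", [])
    mid = len(mol) // 2
    return ("split", [mol[:mid], mol[mid:]])


if __name__ == "__main__":
    blocks = {"A", "B", "C", "D"}
    print(success_rate(toy_policy, ["ABCD", "ABX"], blocks))
```

In this toy setting "ABCD" decomposes fully into the building blocks A, B, C, and D, while "ABX" fails because "X" is neither a building block nor splittable. Lowering `max_steps` imitates a tighter budget: a tree that is cut off before all non-building-block leaves are expanded counts as unsolved.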