Retro-fallback: retrosynthetic planning in an uncertain world

Authors: Austin Tripp, Krzysztof Maziarz, Sarah Lewis, Marwin Segler, José Miguel Hernández-Lobato

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using in-silico benchmarks, we demonstrate that retro-fallback generally produces better sets of synthesis plans than the popular MCTS and retro* algorithms.
Researcher Affiliation | Collaboration | (1) University of Cambridge; (2) Microsoft Research AI4Science
Pseudocode | Yes | Algorithm 1: Retro-fallback algorithm (see Section 4.2)
Open Source Code | Yes | Code to reproduce all experiments is available at: https://github.com/AustinT/retro-fallback-iclr24.
Open Datasets | Yes | We have based our experiment design on the USPTO benchmark from Chen et al. (2020)... We downloaded data from eMolecules downloads page, specifically their orderable molecules and building blocks with quotes... We tested all algorithms on the set of 190 hard molecules from Chen et al. (2020)... We also performed experiments on a set of 1000 randomly selected molecules from the GuacaMol test set (Brown et al., 2019).
Dataset Splits | No | The paper does not explicitly describe train/validation/test splits. It refers only to the '190 test molecules' and the 'GuacaMol test set' from existing benchmarks, without detailing its own partitioning into training, validation, and test sets.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for the experiments.
Software Dependencies | No | The paper lists several software libraries and frameworks: pytorch (Paszke et al., 2019), rdkit (Landrum et al., 2023), rdchiral (Coley et al., 2019a), networkx (Hagberg et al., 2008), numpy (Harris et al., 2020), scipy (Virtanen et al., 2020), and scikit-learn (Pedregosa et al., 2011). However, only rdkit is given a specific version (2022.09.4); the others are cited without explicit version numbers.
Experiment Setup | Yes | Retro-fallback was run with k = 256 samples from ξ_f, ξ_b. A graph-structured AND/OR search graph was used (which may contain cycles). s(·), ψ(·), and ρ(·) were solved by iterating the recursive equations (including around cycles) until convergence (if this did not occur we reset all values to 0 and resumed iteration). All other algorithms were configured to maximize SSP, as described in Appendix E.2. In particular, this means: Breadth-first search was run with no modifications, using the implementation from syntheseus. retro* was run using log E_f[f(r)] as the reaction cost and log E_b[b(m)]. MCTS was run using σ(T; ξ_f, ξ_b) as the reward for finding synthesis plan T (empirically estimated from a finite number of samples). To allow the algorithm to best make use of its budget of reaction model calls, we only expanded nodes after they were visited 10 times. The marginal feasibility value of each reaction was used as the policy in the upper-confidence bound. We used an exploration constant of c = 0.01 to avoid wasting reaction model calls on exploration, and only gave non-zero rewards for up to 100 visits to the same synthesis plan to avoid endlessly re-visiting the same solutions.
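
The experiment setup describes solving s(·) by iterating recursive equations on a cyclic AND/OR graph until convergence. The sketch below illustrates that idea on a toy graph. It is not the authors' implementation (that is in the linked repository). It assumes, since this summary does not spell it out, that the recursion is s(m) = max(b(m), max over reactions r producing m of s(r)) and s(r) = f(r) times the product of s over the reactants of r, evaluated independently for each of the k samples. The function name, argument names, and toy data are hypothetical, and k = 4 is used instead of 256 to keep the example readable.

```python
import numpy as np

def success_samples(producers, reactants, f, b, max_sweeps=1000):
    """Iterate the (assumed) success recursion to a fixed point.

    producers: molecule -> list of reactions that produce it
    reactants: reaction -> list of reactant molecules
    f: reaction -> (k,) array of 0/1 feasibility samples
    b: molecule -> (k,) array of 0/1 buyability samples
    Returns molecule -> (k,) array of per-sample success values.
    """
    k = len(next(iter(b.values())))
    # Start every value at 0 so that a cycle cannot "support itself".
    s_mol = {m: np.zeros(k) for m in b}
    s_rxn = {r: np.zeros(k) for r in f}
    for _ in range(max_sweeps):
        changed = False
        # AND nodes: a reaction succeeds if it is feasible and all reactants succeed.
        for r in f:
            new = f[r].astype(float)
            for m in reactants[r]:
                new = new * s_mol[m]
            if not np.array_equal(new, s_rxn[r]):
                s_rxn[r], changed = new, True
        # OR nodes: a molecule succeeds if it is buyable or any producing reaction succeeds.
        for m in b:
            new = b[m].astype(float)
            for r in producers.get(m, []):
                new = np.maximum(new, s_rxn[r])
            if not np.array_equal(new, s_mol[m]):
                s_mol[m], changed = new, True
        if not changed:
            return s_mol
    raise RuntimeError("success values did not converge")

# Toy graph (hypothetical): T <- A + B (r1), T <- C (r2), A <- T (r3, forms a cycle).
producers = {"T": ["r1", "r2"], "A": ["r3"]}
reactants = {"r1": ["A", "B"], "r2": ["C"], "r3": ["T"]}
f = {"r1": np.array([1, 1, 1, 0]), "r2": np.array([1, 1, 1, 1]), "r3": np.array([1, 1, 1, 1])}
b = {"T": np.array([0, 0, 0, 0]), "A": np.array([1, 0, 1, 1]),
     "B": np.array([1, 1, 0, 1]), "C": np.array([0, 0, 0, 1])}

s = success_samples(producers, reactants, f, b)
ssp = float(s["T"].mean())  # empirical estimate: fraction of samples in which T is solved
```

In the second sample, A is not buyable and can only be made from T, which in turn needs A, so the cycle contributes nothing and the target stays unsolved in that sample. Starting from all zeros is the same state that the reset rule in the setup description falls back to.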