Double-Ended Synthesis Planning with Goal-Constrained Bidirectional Search

Authors: Kevin Yu, Jihye Roh, Ziang Li, Wenhao Gao, Runzhong Wang, Connor Coley

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the utility of DESP in improving solve rates and reducing the number of search expansions by biasing synthesis planning towards expert goals on multiple new benchmarks.
Researcher Affiliation | Academia | Kevin Yu (MIT, kyu3@mit.edu); Jihye Roh (MIT, jroh99@mit.edu); Ziang Li (Georgia Tech, ziang@gatech.edu); Wenhao Gao (MIT, whgao@mit.edu); Runzhong Wang (MIT, runzhong@mit.edu); Connor W. Coley (MIT, ccoley@mit.edu)
Pseudocode | Yes | Algorithm 1: FORWARD_EXPAND(m1, m2, G_F, N, K)
    Inputs: m1 (molecule selected for expansion), m2 (molecule to condition expansion on), G_F (bottom-up search graph), N (num. templates to propose), K (num. building blocks to search)
    t ← TOP_N(σ(MLP_t(z_m(m1) ⊕ z_m(m2))))    /* get top-N forward templates */
    for i ← 1 to N do
        if t[i] is unimolecular then
            p ← t[i](m1)    /* apply fwd. template to m1 */
            G_F.ADD_RXN({m1}, p, t[i])    /* add reaction and product to G_F */
        else    /* t[i] is bimolecular */
            b ← KNN_B(MLP_b(z_m(m1) ⊕ z_m(m2) ⊕ z_t(t[i])))    /* get K nearest BBs by cosine sim. */
            for j ← 1 to K do
                G_F.ADD_RXN({m1, b[j]}, t[i](m1, b[j]), t[i])    /* apply t[i] */
            end
        end
    end
Open Source Code | Yes | Relevant code with documentation can be found at https://github.com/coleygroup/desp.
Open Datasets | Yes | In this work, we use the public USPTO-Full dataset [53, 54] of approximately 1 million deduplicated reactions. The dataset is filtered and processed (details in Section A.3), and a template set T_USPTO is extracted with RDChiral [55]. The dataset is randomly divided into training and validation splits with a 90:10 ratio. From the training split R_USPTO we construct the graph G_USPTO. We also create and release two additional benchmark sets, Pistachio Reachable and Pistachio Hard; details of their construction are provided in Section A.6.
Dataset Splits | Yes | The dataset is randomly divided into training and validation splits with a 90:10 ratio.
Hardware Specification | Yes | All experiments were performed on a 32-core AMD Threadripper PRO 5975WX processor with a single NVIDIA RTX 4090 GPU.
Software Dependencies | No | The paper mentions specific software such as the Python library Faiss and the RDKit implementation of the Morgan fingerprint [63], but does not specify version numbers for these or for other critical components such as Python or the deep learning framework (e.g., PyTorch or TensorFlow).
Experiment Setup | Yes | For all methods, we enforce a maximum molecule depth of 11 and a maximum of 500 total expansions (retro or forward), and apply 50 retro templates per expansion. For DESP, we additionally enforce a maximum molecule depth of 6 for the bottom-up search, apply 25 forward templates per expansion, and use the top 2 building blocks found in the k-NN search. Due to the asymmetry of the bidirectional search, we also introduce a hyperparameter λ, the number of times we repeat a select, expand, and update cycle for G_R before performing one cycle for G_F. For all experiments, we set λ = 2. Details and tabular summaries of the evaluations performed and hyperparameters chosen are provided in Section A.7.
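The control flow of the FORWARD_EXPAND pseudocode in the table above can be sketched in Python. Every class and helper name here (Template, SearchGraph, rank_templates, knn_bbs) is a hypothetical stand-in for the paper's components, not the released DESP API; the real system scores templates with an MLP and retrieves building blocks via k-NN search, which are abstracted behind callbacks here.

```python
# Illustrative sketch of Algorithm 1 (FORWARD_EXPAND); all names are
# hypothetical stand-ins, not the authors' actual API.

class Template:
    """A forward reaction template with a callable application rule."""
    def __init__(self, name, unimolecular, apply_fn):
        self.name = name
        self.unimolecular = unimolecular
        self._apply = apply_fn

    def apply(self, *mols):
        return self._apply(*mols)

class SearchGraph:
    """Minimal stand-in for the bottom-up search graph G_F."""
    def __init__(self):
        self.reactions = []

    def add_rxn(self, reactants, product, template):
        self.reactions.append((frozenset(reactants), product, template.name))

def forward_expand(m1, m2, graph_f, n, k, rank_templates, knn_bbs):
    """Expand molecule m1, conditioned on m2, into the bottom-up graph."""
    for t in rank_templates(m1, m2)[:n]:               # top-N forward templates
        if t.unimolecular:
            graph_f.add_rxn({m1}, t.apply(m1), t)      # add reaction and product
        else:                                          # bimolecular template
            for b in knn_bbs(m1, m2, t)[:k]:           # K nearest building blocks
                graph_f.add_rxn({m1, b}, t.apply(m1, b), t)
```

For example, with one unimolecular and one bimolecular template proposed and k = 2 building blocks per bimolecular template, a single call adds three reactions to the graph.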
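The 90:10 random train/validation split noted in the Dataset Splits row is a standard operation; a generic sketch (not the authors' preprocessing code, and with an illustrative fixed seed) looks like this:

```python
import random

def split_dataset(reactions, train_frac=0.9, seed=0):
    """Randomly shuffle and split a reaction list into train/validation sets."""
    rng = random.Random(seed)        # fixed seed so the split is reproducible
    shuffled = list(reactions)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

Splitting 1,000 items with the default fraction yields 900 training and 100 validation examples, with no overlap.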
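The λ alternation described in the Experiment Setup row (λ select/expand/update cycles on the retrosynthetic graph G_R for each cycle on the forward graph G_F, within the 500-expansion budget) can be sketched as a simple scheduling loop. The callbacks `cycle_retro` and `cycle_forward` are hypothetical placeholders for the paper's actual search-cycle routines.

```python
# Hedged sketch of the asymmetric bidirectional schedule: `lam` retro cycles
# on G_R per one forward cycle on G_F, until the expansion budget is spent.
# cycle_retro / cycle_forward are hypothetical stand-in callbacks.

def bidirectional_search(cycle_retro, cycle_forward, lam=2, max_expansions=500):
    """Alternate lam retro cycles with one forward cycle under a budget."""
    expansions = 0
    while expansions < max_expansions:
        for _ in range(lam):                 # lam cycles on G_R ...
            if expansions >= max_expansions:
                break
            cycle_retro()                    # select, expand, update on G_R
            expansions += 1
        if expansions >= max_expansions:
            break
        cycle_forward()                      # ... then one cycle on G_F
        expansions += 1
```

With λ = 2, roughly two-thirds of the expansion budget is spent on the retrosynthetic graph and one-third on the forward graph.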