Double-Ended Synthesis Planning with Goal-Constrained Bidirectional Search
Authors: Kevin Yu, Jihye Roh, Ziang Li, Wenhao Gao, Runzhong Wang, Connor Coley
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the utility of DESP in improving solve rates and reducing the number of search expansions by biasing synthesis planning towards expert goals on multiple new benchmarks. |
| Researcher Affiliation | Academia | Kevin Yu (MIT, kyu3@mit.edu); Jihye Roh (MIT, jroh99@mit.edu); Ziang Li (Georgia Tech, ziang@gatech.edu); Wenhao Gao (MIT, whgao@mit.edu); Runzhong Wang (MIT, runzhong@mit.edu); Connor W. Coley (MIT, ccoley@mit.edu) |
| Pseudocode | Yes | Algorithm 1: FORWARD_EXPAND(m1, m2, G_F, N, K). Inputs: m1: molecule selected for expansion; m2: molecule to condition the expansion on; G_F: bottom-up search graph; N: num. templates to propose; K: num. building blocks to search. t ← TOP_N(σ(MLP_t(z_m(m1) ⊕ z_m(m2)))) /* get top N forward templates */; for i ← 1 to N do: if t[i] is unimolecular then p ← t[i](m1) /* apply fwd. template to m1 */; G_F.ADD_RXN({m1}, p, t[i]) /* add reaction and product to G_F */; else /* t[i] is bimolecular */: b ← KNN_B(MLP_b(z_m(m1) ⊕ z_m(m2) ⊕ z_t(t[i]))) /* get K nearest BBs by cosine sim. */; for j ← 1 to K: G_F.ADD_RXN({m1, b[j]}, t[i](m1, b[j]), t[i]) /* apply t[i] */ |
| Open Source Code | Yes | Relevant code with documentation can be found at https://github.com/coleygroup/desp. |
| Open Datasets | Yes | In this work, we use the public USPTO-Full dataset [53, 54] of approximately 1 million deduplicated reactions. The dataset is filtered and processed (details in Section A.3), and a template set T_USPTO is extracted with RDChiral [55]. The dataset is randomly divided into training and validation splits with ratio 90:10. From the training split R_USPTO we construct the graph G_USPTO. We also create and release two additional benchmark sets, which we call Pistachio Reachable and Pistachio Hard. Details of their construction are provided in Section A.6. |
| Dataset Splits | Yes | The dataset is randomly divided into training and validation splits with ratio 90:10. |
| Hardware Specification | Yes | All experiments were performed on a 32-core AMD Threadripper Pro 5975WX processor and with a single NVIDIA RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions specific software like the "Python library Faiss" and "RDKit implementation of the Morgan Fingerprint [63]" but does not specify their version numbers or the versions of other critical software components like Python or deep learning frameworks (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | For all methods, we enforce a maximum molecule depth of 11, a maximum of 500 total expansions (retro or forward), and apply 50 retro templates per expansion. For DESP, we also enforce a maximum molecule depth of 6 for the bottom-up search, apply 25 forward templates per expansion, and use the top 2 building blocks found in the k-NN search. Due to the asymmetry of the bidirectional search, we also introduce a hyperparameter λ, the number of times we repeat a select, expand, and update cycle for G_R before performing one cycle for G_F. For all experiments, we set λ = 2. Details and tabular summaries of the evaluations performed and hyperparameters chosen are provided in Section A.7. |
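The FORWARD_EXPAND pseudocode quoted in the table can be sketched in Python as follows. This is a minimal illustration only: `rank_templates`, `knn_building_blocks`, the `Template` interface, and the graph's `add_rxn` method are hypothetical stand-ins for the paper's neural template ranker, Faiss k-NN lookup, and search-graph code, not the authors' implementation.

```python
def forward_expand(m1, m2, graph, rank_templates, knn_building_blocks, n=25, k=2):
    """Expand molecule m1 in the bottom-up graph, conditioned on goal m2.

    rank_templates(m1, m2) is assumed to return forward templates sorted by
    score; knn_building_blocks(m1, m2, t) is assumed to return candidate
    building blocks sorted by similarity. Both are hypothetical helpers.
    """
    templates = rank_templates(m1, m2)[:n]          # top-N forward templates
    for t in templates:
        if t.is_unimolecular:
            product = t.apply(m1)                   # apply fwd. template to m1
            graph.add_rxn({m1}, product, t)         # record reaction + product
        else:                                       # bimolecular template
            for b in knn_building_blocks(m1, m2, t)[:k]:  # K nearest BBs
                graph.add_rxn({m1, b}, t.apply(m1, b), t)
```

The paper's hyperparameters (25 forward templates per expansion, top 2 building blocks) correspond to the defaults `n=25, k=2` here.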
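The λ hyperparameter in the experiment setup controls how the two searches are interleaved: λ select/expand/update cycles on the retrosynthetic graph G_R for every one cycle on the forward graph G_F, under a total expansion budget. A hedged sketch of that schedule, with the cycle functions as hypothetical placeholders (the real loop would also stop once the two searches meet):

```python
def bidirectional_schedule(retro_cycle, forward_cycle, max_expansions=500, lam=2):
    """Interleave retro and forward search cycles at a lam:1 ratio.

    retro_cycle / forward_cycle are hypothetical callables standing in for
    one select-expand-update cycle on G_R / G_F. Returns the number of
    expansions performed before exhausting the budget.
    """
    expansions = 0
    while expansions < max_expansions:
        for _ in range(lam):                 # lam retro cycles on G_R ...
            if expansions >= max_expansions:
                return expansions
            retro_cycle()
            expansions += 1
        forward_cycle()                      # ... then one forward cycle on G_F
        expansions += 1
    return expansions
```

With the paper's settings (max 500 total expansions, λ = 2), this schedule spends two thirds of the budget on retrosynthetic expansions.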