Retroformer: Pushing the Limits of End-to-end Retrosynthesis Transformer

Authors: Yue Wan, Chang-Yu Hsieh, Ben Liao, Shengyu Zhang

Venue: ICML 2022

Reproducibility assessment. Each variable is listed below with its result and the LLM response that supports it.

Research Type: Experimental
LLM Response: Experiments show that our model can improve over the vanilla Transformer by 12.5% and 14.4% top-10 accuracy in the reaction class known and unknown settings, respectively. It reaches the new state-of-the-art accuracy for template-free methods and is competitive against both template-based and semi-template-based methods. It also enjoys better molecule and reaction validity compared to strong baseline models.

Researcher Affiliation: Industry
LLM Response: Tencent Quantum Laboratory, Shenzhen, China. Correspondence to: Chang-Yu Hsieh <kimhsieh@tencent.com>, Shengyu Zhang <shengyzhang@tencent.com>.

Pseudocode: Yes
LLM Response: Algorithm 1 SMILES Graph Construction; Algorithm 2 SMILES Token Alignment Computation; Algorithm 3 Reaction Center Subgraph Search
(An illustrative SMILES-to-graph sketch is given after this assessment.)

Open Source Code: Yes
LLM Response: Our code is available at https://github.com/yuewan2/Retroformer.

Open Datasets: Yes
LLM Response: We use the conventional retrosynthesis benchmark dataset USPTO-50K (Schneider et al., 2016) to evaluate our method.

Dataset Splits: Yes
LLM Response: We use the conventional retrosynthesis benchmark dataset USPTO-50K (Schneider et al., 2016) to evaluate our method. It contains 50016 atom-mapped reactions that are grouped into 10 reaction classes. We use the same data split as (Coley et al., 2017).

Hardware Specification: Yes
LLM Response: Retroformer-base is trained on 1 NVIDIA Tesla V100 GPU for 24 hours.

Software Dependencies: No
LLM Response: The paper mentions using the Adam optimizer and being built on top of the vanilla Transformer, and it trains a vanilla retrosynthesis Transformer from scratch using OpenNMT (Klein et al., 2017), but it does not specify software dependencies such as Python, PyTorch/TensorFlow versions, or other library versions for its own implementation of Retroformer.

Experiment Setup: Yes
LLM Response: The model is trained using the Adam optimizer (Kingma & Ba, 2017) with a fixed learning rate of 1e-4, and a dropout rate of 0.3. The embedding dimension is set to 256, and the total amount of heads is set to 8. We split the heads by half for global and local heads.
(The reported hyperparameters are sketched in code at the end of this assessment.)
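
The Pseudocode entry lists Algorithm 1, "SMILES Graph Construction", whose details are not reproduced here. The snippet below is a minimal illustrative sketch, assuming RDKit is installed, of how a SMILES string can be converted into the kind of atom/bond graph such a procedure produces; the function name smiles_to_graph and the feature choices are hypothetical and are not taken from the Retroformer codebase.

```python
# Minimal sketch (not the paper's Algorithm 1): turn a SMILES string into an
# atom/bond graph with RDKit. Names and feature choices are illustrative only.
from rdkit import Chem

def smiles_to_graph(smiles: str):
    """Return atom symbols (nodes) and an undirected edge list with bond types."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    atoms = [atom.GetSymbol() for atom in mol.GetAtoms()]
    edges = [
        (bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), str(bond.GetBondType()))
        for bond in mol.GetBonds()
    ]
    return atoms, edges

if __name__ == "__main__":
    atoms, edges = smiles_to_graph("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
    print(atoms)   # element symbols, one per atom
    print(edges)   # (begin_idx, end_idx, bond_type) triples
```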
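
The Experiment Setup entry reports a fixed Adam learning rate of 1e-4, dropout 0.3, embedding dimension 256, and 8 attention heads. As a rough illustration only, the sketch below applies those reported values to a stock PyTorch nn.Transformer; this is not the authors' model, which splits its heads into global and local (graph-aware) attention, a mechanism the standard module does not provide.

```python
# Illustrative only: the hyperparameters reported for Retroformer-base applied to
# a stock PyTorch Transformer as a stand-in for the authors' architecture.
import torch
from torch import nn

model = nn.Transformer(
    d_model=256,       # embedding dimension reported in the paper
    nhead=8,           # 8 attention heads (paper: half global, half local)
    dropout=0.3,       # reported dropout rate
    batch_first=True,
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # fixed learning rate 1e-4
```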