Learning Chemical Rules of Retrosynthesis with Pre-training

Authors: Yinjie Jiang, Ying Wei, Fei Wu, Zhengxing Huang, Kun Kuang, Zhihua Wang

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our empirical evaluation, the proposed pre-training solution substantially improves the single-step retrosynthesis accuracies in three downstream datasets.
Researcher Affiliation | Academia | 1 Zhejiang University; 2 City University of Hong Kong; 3 Shanghai Institute for Advanced Study of Zhejiang University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a direct link to a code repository for the described methodology.
Open Datasets | Yes | We pre-train our model on Pistachio (Mayfield, Lowe, and Sayle 2017), which is automatically extracted from U.S., European and WIPO patents, including 13.3 million reactions. [...] We conduct experiments on USPTO-50K (Schneider, Stiefl, and Landrum 2016; Coley et al. 2017; Liu et al. 2017) and USPTO-full (Dai et al. 2019; Yan et al. 2020).
Dataset Splits | Yes | We split the remaining data into a training set with 3.74M reactions and a validation set with 0.2M reactions. [...] We split USPTO-50K into a training set with 40012 reactions, a validation set with 5000 reactions and a test set with 4997 reactions randomly. (See the split sketch after the table.)
Hardware Specification | Yes | The pre-training process runs on 8 NVIDIA A100 GPU cards for 740K steps and the batch size is 14000 tokens.
Software Dependencies | No | The paper mentions the RDKit toolkit (Landrum 2021) and the Adam optimizer (Kingma and Ba 2015), but it gives no version numbers for these dependencies or for the underlying language and libraries such as Python or PyTorch. (See the RDKit sketch after the table.)
Experiment Setup | Yes | We fine-tune on USPTO-50K with reaction type known and unknown for 100 epochs with a learning rate of 5 × 10^-4. [...] In the encoder, we use cross-entropy loss on masked tokens. In addition, we use label-smoothed cross-entropy loss with a label-smoothing factor of 0.1 in the auto-regression and molecule recovery tasks of the decoder. In contrastive classification, the projection layer has 2048 hidden units, and we regularize the contrastive loss by a weight of 0.1. We use the Adam optimizer (Kingma and Ba 2015) and vary the learning rate with the Noam schedule (Vaswani et al. 2017) with 8000 warm-up steps. (See the optimization sketch after the table.)
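
The sketches below are not taken from the paper; they only illustrate the quoted settings under stated assumptions.

For the Dataset Splits row, here is a minimal sketch of the random 40012 / 5000 / 4997 split of USPTO-50K, assuming the reactions sit one SMILES per line in a hypothetical file uspto_50k.txt; the random seed is an assumption, since the quoted text does not report one.

```python
import random

# Hypothetical input: one reaction SMILES per line in "uspto_50k.txt" (50009 reactions total).
with open("uspto_50k.txt") as f:
    reactions = [line.strip() for line in f if line.strip()]

random.seed(42)  # assumed seed; the paper excerpt does not report one
random.shuffle(reactions)

train = reactions[:40012]       # 40012 training reactions
valid = reactions[40012:45012]  # 5000 validation reactions
test = reactions[45012:]        # remaining 4997 test reactions

for name, split in (("train", train), ("valid", valid), ("test", test)):
    with open(f"uspto_50k_{name}.txt", "w") as f:
        f.write("\n".join(split))
```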
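
For the Software Dependencies row, which notes the RDKit toolkit, here is a minimal sketch of SMILES canonicalization with RDKit (Chem.MolFromSmiles and Chem.MolToSmiles are standard RDKit calls); the canonicalize helper and the example input are illustrative, not taken from the paper.

```python
from rdkit import Chem

def canonicalize(smiles):
    """Return the RDKit-canonical form of a SMILES string, or None if it fails to parse."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None

print(canonicalize("C(C)O"))  # prints the canonical form "CCO"
```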
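
For the Experiment Setup row, here is a minimal PyTorch sketch of the quoted optimization recipe: Adam, a Noam-style inverse-square-root schedule with 8000 warm-up steps, and label-smoothed cross-entropy with a factor of 0.1. The model, hidden size, and dummy batch are placeholders; only the schedule shape, warm-up count, smoothing factor, and the 0.1 contrastive-loss weight mentioned in the comment come from the quoted text.

```python
import torch
from torch import nn

D_MODEL = 512        # placeholder hidden size; not stated in the quoted text
WARMUP_STEPS = 8000  # quoted warm-up steps

def noam_factor(step: int) -> float:
    """Noam schedule (Vaswani et al. 2017): linear warm-up, then inverse-sqrt decay."""
    step = max(step, 1)
    return D_MODEL ** -0.5 * min(step ** -0.5, step * WARMUP_STEPS ** -1.5)

model = nn.Linear(D_MODEL, 100)  # stand-in for the Transformer decoder head
optimizer = torch.optim.Adam(model.parameters(), lr=1.0)  # base lr of 1.0 is scaled by noam_factor
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_factor)

# Label-smoothed cross-entropy with the quoted smoothing factor of 0.1.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = model(torch.randn(32, D_MODEL))                # dummy batch
loss = criterion(logits, torch.randint(0, 100, (32,)))  # dummy targets
# The quoted contrastive loss would be added here with a weight of 0.1,
# e.g. total = loss + 0.1 * contrastive_loss (not implemented in this sketch).
loss.backward()
optimizer.step()
scheduler.step()
```

Setting the optimizer's base learning rate to 1.0 lets the LambdaLR multiplier carry the full Noam value; this mirrors common Transformer training setups but is an implementation choice, not something stated in the quoted text.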