Learning Chemical Rules of Retrosynthesis with Pre-training
Authors: Yinjie Jiang, Ying Wei, Fei Wu, Zhengxing Huang, Kun Kuang, Zhihua Wang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our empirical evaluation, the proposed pre-training solution substantially improves the single-step retrosynthesis accuracies in three downstream datasets. |
| Researcher Affiliation | Academia | Zhejiang University; City University of Hong Kong; Shanghai Institute for Advanced Study of Zhejiang University |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a direct link to a code repository for the described methodology. |
| Open Datasets | Yes | We pre-train our model on Pistachio (Mayfield, Lowe, and Sayle 2017), which is automatically extracted from U.S., European and WIPO patents, including 13.3 million reactions. [...] We conduct experiments on USPTO-50K (Schneider, Stiefl, and Landrum 2016; Coley et al. 2017; Liu et al. 2017) and USPTO-full (Dai et al. 2019; Yan et al. 2020). |
| Dataset Splits | Yes | We split the remaining data into a training set with 3.74M reactions and a validation set with 0.2M reactions. [...] We split USPTO-50K into a training set with 40012 reactions, a validation set with 5000 reactions and a test set with 4997 reactions randomly. (See the split sketch below the table.) |
| Hardware Specification | Yes | The pre-training process runs on 8 NVIDIA A100 GPU cards for 740K steps and the batch size is 14000 tokens. |
| Software Dependencies | No | The paper mentions 'RDKit toolkit (Landrum 2021)' and 'Adam optimizer (Kingma and Ba 2015)' but does not provide specific version numbers for software dependencies like RDKit, or for programming languages/libraries like Python or PyTorch. |
| Experiment Setup | Yes | We fine-tune on USPTO-50K with reaction type known and unknown for 100 epochs with a learning rate of 5 × 10⁻⁴. [...] In the encoder, we use cross-entropy loss on masked tokens. Besides, we use label-smoothed cross-entropy loss with a label-smoothing factor of 0.1 in auto-regression and molecule recovery task of the decoder. In contrastive classification, the projection layer has 2048 hidden units, and we regularize the contrastive loss by a weight of 0.1. We use Adam optimizer (Kingma and Ba 2015) and vary the learning rate with Noam (Vaswani et al. 2017) schedule with 8000 warm-up steps. (See the optimization sketch below the table.) |
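
The "Dataset Splits" row quotes a random 40012/5000/4997 partition of USPTO-50K. The snippet below is a minimal sketch of such a random split, assuming the dataset is held as a list of reaction SMILES strings; the seed and the function name are illustrative and not taken from the paper.

```python
# Hypothetical sketch of the random USPTO-50K split quoted in the
# "Dataset Splits" row (40012 train / 5000 validation / 4997 test).
# The seed and list-of-SMILES input format are assumptions.
import random

TRAIN_SIZE, VALID_SIZE, TEST_SIZE = 40012, 5000, 4997

def split_uspto50k(reactions, seed=0):
    """Shuffle the reaction SMILES and cut them into train/valid/test."""
    assert len(reactions) == TRAIN_SIZE + VALID_SIZE + TEST_SIZE
    shuffled = list(reactions)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:TRAIN_SIZE]
    valid = shuffled[TRAIN_SIZE:TRAIN_SIZE + VALID_SIZE]
    test = shuffled[TRAIN_SIZE + VALID_SIZE:]
    return train, valid, test
```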
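
The "Experiment Setup" row lists Adam with a Noam warm-up schedule (8000 steps), label-smoothed cross-entropy with a factor of 0.1, and a contrastive loss regularized by a weight of 0.1. The PyTorch sketch below shows one way these pieces could be wired together; the model width, Adam betas, padding index, and the `training_step` wiring are assumptions rather than the authors' released code.

```python
# Minimal PyTorch sketch of the quoted optimization setup: Adam + Noam
# warm-up (8000 steps), label-smoothed cross-entropy (0.1), and a
# contrastive term weighted by 0.1.  D_MODEL and the Adam betas are
# assumed values, not reported in the paper.
import torch
import torch.nn as nn

D_MODEL = 512      # assumed Transformer width
WARMUP = 8000      # warm-up steps quoted in the paper

def noam_scale(step, d_model=D_MODEL, warmup=WARMUP):
    """Noam learning-rate factor (Vaswani et al. 2017)."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

model = nn.Transformer(d_model=D_MODEL)  # stand-in for the pre-trained model
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98))
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, noam_scale)

# Label-smoothed cross-entropy for the decoder's auto-regression and
# molecule-recovery objectives (padding index assumed to be 0).
seq_criterion = nn.CrossEntropyLoss(label_smoothing=0.1, ignore_index=0)

def training_step(logits, targets, contrastive_loss):
    """One optimization step combining sequence and contrastive losses."""
    loss = seq_criterion(logits.view(-1, logits.size(-1)), targets.view(-1))
    loss = loss + 0.1 * contrastive_loss   # contrastive weight from the paper
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    return loss.item()
```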