A Graph to Graphs Framework for Retrosynthesis Prediction

Authors: Chence Shi, Minkai Xu, Hongyu Guo, Ming Zhang, Jian Tang

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that G2Gs significantly outperforms existing template-free approaches by up to 63% in terms of the top-1 accuracy and achieves a performance close to that of state-of-the-art template-based approaches, but does not require domain knowledge and is much more scalable."
Researcher Affiliation | Academia | (1) Department of Computer Science, School of EECS, Peking University; (2) Shanghai Jiao Tong University; (3) National Research Council Canada; (4) Montréal Institute for Learning Algorithms (MILA); (5) Canadian Institute for Advanced Research (CIFAR); (6) HEC Montréal. Correspondence to: Chence Shi <chenceshi@pku.edu.cn>, Jian Tang <jian.tang@hec.ca>.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include an unambiguous statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "We evaluate our approach on the widely used benchmark data set USPTO-50k, which contains 50k atom-mapped reactions with 10 reaction types."
Dataset Splits | Yes | "Following (Liu et al., 2017), we randomly select 80% of the reactions as training set and divide the rest into validation and test sets with equal size."
Hardware Specification | Yes | "We train our G2Gs for 100 epochs with a batch size of 128 and a learning rate of 0.0001 with Adam (Kingma & Ba, 2014) optimizer on a single GTX 1080Ti GPU card."
Software Dependencies | No | "G2Gs is implemented in PyTorch (Paszke et al., 2017). We use the open-source chemical software RDKit (Landrum, 2016) to preprocess molecules for the training and generate canonical SMILES strings for the evaluation." Specific version numbers for PyTorch and RDKit are not provided.
Experiment Setup | Yes | "The R-GCN in G2Gs is implemented with 4 layers and the embedding size is set as 512 for both modules. We use latent codes of dimension |z| = 10. We train our G2Gs for 100 epochs with a batch size of 128 and a learning rate of 0.0001 with Adam (Kingma & Ba, 2014) optimizer on a single GTX 1080Ti GPU card. The λ is set as 20 for reaction center identification module, and the beam size is 10 during inference. The maximal number of transformation steps is set as 20."
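
The quoted protocol in the Dataset Splits row is concrete enough to sketch in code. Below is a minimal Python sketch of the 80%/10%/10% random split; the function name and the fixed seed are our assumptions (the excerpt does not mention a seed).

```python
import random

def split_uspto50k(reactions, seed=0):
    """Randomly split reactions 80/10/10 into train/valid/test,
    following the protocol of Liu et al. (2017) quoted above.
    The seed is an assumption; the excerpt does not report one."""
    reactions = list(reactions)
    random.Random(seed).shuffle(reactions)
    n_train = int(0.8 * len(reactions))
    n_valid = (len(reactions) - n_train) // 2
    train = reactions[:n_train]
    valid = reactions[n_train:n_train + n_valid]
    test = reactions[n_train + n_valid:]
    return train, valid, test
```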
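The Software Dependencies row notes that RDKit generates canonical SMILES for the evaluation. A plausible matching check is sketched below; the helper names are ours, and both the exact-string-match criterion and the clearing of atom-map numbers (the data set is atom-mapped) are assumptions rather than details stated in the excerpts.

```python
from rdkit import Chem

def canonicalize(smiles):
    """Return the RDKit canonical SMILES, or None if parsing fails.
    Atom-map numbers are cleared first (an assumption on our part),
    since USPTO-50k reactions are atom-mapped."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    for atom in mol.GetAtoms():
        atom.SetAtomMapNum(0)
    return Chem.MolToSmiles(mol, canonical=True)

def smiles_match(predicted, ground_truth):
    """Assumed evaluation criterion: exact match of canonical forms."""
    cand, ref = canonicalize(predicted), canonicalize(ground_truth)
    return cand is not None and cand == ref
```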
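Finally, the hyperparameters quoted under Hardware Specification and Experiment Setup can be collected in one place. The dataclass below is a sketch that only records values stated in the paper; the class and field names are ours, since no official code is available.

```python
from dataclasses import dataclass

@dataclass
class G2GsConfig:
    """Hyperparameters as reported in the paper; names are ours."""
    rgcn_layers: int = 4              # R-GCN depth
    embedding_size: int = 512         # shared by both modules
    latent_dim: int = 10              # |z|
    center_loss_lambda: float = 20.0  # weight for reaction center identification
    epochs: int = 100
    batch_size: int = 128
    learning_rate: float = 1e-4       # Adam (Kingma & Ba, 2014)
    beam_size: int = 10               # inference
    max_transform_steps: int = 20
```

Given a model, the reported optimizer would then be torch.optim.Adam(model.parameters(), lr=G2GsConfig().learning_rate), trained for 100 epochs at batch size 128 on a single GTX 1080Ti.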