A Graph to Graphs Framework for Retrosynthesis Prediction
Authors: Chence Shi, Minkai Xu, Hongyu Guo, Ming Zhang, Jian Tang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that G2Gs significantly outperforms existing template-free approaches by up to 63% in terms of the top-1 accuracy and achieves a performance close to that of state-of-the-art template-based approaches, but does not require domain knowledge and is much more scalable. |
| Researcher Affiliation | Academia | 1Department of Computer Science, School of EECS, Peking University 2Shanghai Jiao Tong University 3National Research Council Canada 4Montréal Institute for Learning Algorithms (MILA) 5Canadian Institute for Advanced Research (CIFAR) 6HEC Montréal. Correspondence to: Chence Shi <chenceshi@pku.edu.cn>, Jian Tang <jian.tang@hec.ca>. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include an unambiguous statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluate our approach on the widely used benchmark data set USPTO-50k, which contains 50k atom-mapped reactions with 10 reaction types. |
| Dataset Splits | Yes | Following (Liu et al., 2017), we randomly select 80% of the reactions as training set and divide the rest into validation and test sets with equal size. |
| Hardware Specification | Yes | We train our G2Gs for 100 epochs with a batch size of 128 and a learning rate of 0.0001 with Adam (Kingma & Ba, 2014) optimizer on a single GTX 1080Ti GPU card. |
| Software Dependencies | No | G2Gs is implemented in PyTorch (Paszke et al., 2017). We use the open-source chemical software RDKit (Landrum, 2016) to preprocess molecules for the training and generate canonical SMILES strings for the evaluation. Specific version numbers for PyTorch and RDKit are not provided. |
| Experiment Setup | Yes | The R-GCN in G2Gs is implemented with 4 layers and the embedding size is set as 512 for both modules. We use latent codes of dimension |z| = 10. We train our G2Gs for 100 epochs with a batch size of 128 and a learning rate of 0.0001 with Adam (Kingma & Ba, 2014) optimizer on a single GTX 1080Ti GPU card. The λ is set as 20 for reaction center identification module, and the beam size is 10 during inference. The maximal number of transformation steps is set as 20. |
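The dataset-splits row quotes a random 80%/10%/10% train/validation/test split following Liu et al. (2017). A minimal sketch of that protocol, assuming a plain list of reactions and a fixed seed (the function name and seed are illustrative, not from the paper):

```python
import random

def split_reactions(reactions, seed=0):
    """Randomly split reactions 80/10/10 into train/validation/test,
    mirroring the protocol quoted in the Dataset Splits row
    (80% training; remainder halved into validation and test)."""
    rng = random.Random(seed)
    shuffled = list(reactions)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.8 * n)
    n_valid = (n - n_train) // 2
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# For a USPTO-50k-sized set of 50,000 reactions this yields
# 40,000 / 5,000 / 5,000 reactions.
train, valid, test = split_reactions(range(50000))
```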
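The experiment-setup row collects all reported hyperparameters in one quote. Gathering them into a config sketch makes the reported values easy to check at a glance; the dictionary keys below are illustrative names, not the authors' actual configuration schema:

```python
# Hyperparameters as reported in the paper's Experiment Setup row.
# Key names are hypothetical; only the values come from the paper.
G2GS_CONFIG = {
    "rgcn_layers": 4,            # R-GCN depth (both modules)
    "embedding_size": 512,       # embedding size for both modules
    "latent_dim": 10,            # |z|, dimension of the latent code
    "epochs": 100,
    "batch_size": 128,
    "learning_rate": 1e-4,       # Adam (Kingma & Ba, 2014)
    "lambda_rc": 20,             # weight for reaction center identification
    "beam_size": 10,             # beam search width at inference
    "max_transform_steps": 20,   # maximal number of transformation steps
}
```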