RetCL: A Selection-based Approach for Retrosynthesis via Contrastive Learning
Authors: Hankook Lee, Sungsoo Ahn, Seung-Woo Seo, You Young Song, Eunho Yang, Sung Ju Hwang, Jinwoo Shin
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the benefits of the proposed selection-based approach. For example, when all 671k reactants in the USPTO database are given as candidates, our RetCL achieves top-1 exact match accuracy of 71.3% for the USPTO-50k benchmark, while a recent transformer-based approach achieves 59.6%. We also demonstrate that RetCL generalizes well to unseen templates in various settings, in contrast to template-based approaches. |
| Researcher Affiliation | Collaboration | (1) Korea Advanced Institute of Science and Technology, (2) Mohamed bin Zayed University of Artificial Intelligence, (3) Standigm, (4) Samsung Electronics, (5) AITRICS |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'The supplementary material is available at arXiv:2105.00795.', but does not explicitly state that the source code for the methodology is provided within this material. |
| Open Datasets | Yes | We mainly evaluate our framework on USPTO-50k, which is a standard benchmark for the task of retrosynthesis. It contains 50k reactions of 10 reaction types derived from the US patent literature, and we divide it into training/validation/test splits following [Coley et al., 2017]. To apply our framework, we choose the candidate set of commercially available molecules C as all the reactants in the entire USPTO database, as [Guo et al., 2020] did. This results in a candidate set of size 671,518. |
| Dataset Splits | Yes | We mainly evaluate our framework on USPTO-50k, which is a standard benchmark for the task of retrosynthesis. It contains 50k reactions of 10 reaction types derived from the US patent literature, and we divide it into training/validation/test splits following [Coley et al., 2017]. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other library versions). |
| Experiment Setup | Yes | Hyperparameters. We use a single shared 5-layer structure2vec [Dai et al., 2016; Dai et al., 2019] architecture and three separate 2-layer residual blocks with an embedding size of 256. To obtain graph-level embedding vectors, we use sum pooling over mean pooling since it captures the size information of molecules. For contrastive learning, we use a temperature of τ = 0.1 and K = 4 nearest neighbors for hard negative mining. |
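
The quoted setup specifies a contrastive objective with temperature τ = 0.1, K = 4 hard negatives, and 256-dimensional graph embeddings. Below is a minimal, hypothetical sketch of an InfoNCE-style loss with hard negative mining that matches those numbers; it is not the authors' implementation, and the function names, tensor shapes, and the choice not to exclude the positive from the mined negatives are assumptions made for illustration.

```python
# Hypothetical sketch of an InfoNCE-style contrastive loss with hard negative
# mining, using the temperature (tau = 0.1) and K = 4 quoted above. This is an
# illustrative assumption, not the authors' implementation.
import torch
import torch.nn.functional as F

TAU = 0.1  # softmax temperature from the quoted hyperparameters
K = 4      # number of hard (most similar) negatives mined per query

def selection_loss(query_emb, pos_emb, cand_embs):
    """Score candidates against a query embedding and apply a contrastive loss.

    query_emb: (d,) embedding of the query molecule (e.g., the product)
    pos_emb:   (d,) embedding of the ground-truth reactant
    cand_embs: (N, d) embeddings of candidate reactants used as negatives
    """
    # Cosine similarity between the query and every candidate.
    q = F.normalize(query_emb, dim=-1)
    c = F.normalize(cand_embs, dim=-1)
    p = F.normalize(pos_emb, dim=-1)

    sims = c @ q               # (N,) similarity to each candidate
    pos_sim = torch.dot(p, q)  # similarity to the ground-truth reactant

    # Hard negative mining: keep only the K candidates most similar to the
    # query. (For brevity this sketch does not remove the positive from the
    # candidate pool before mining.)
    hard_neg_sims, _ = sims.topk(K)

    # Temperature-scaled softmax over {positive, K hard negatives}; the
    # positive sits at index 0, so the target label is 0.
    logits = torch.cat([pos_sim.unsqueeze(0), hard_neg_sims]) / TAU
    labels = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(logits.unsqueeze(0), labels)

# Toy usage with random 256-dimensional embeddings (the quoted embedding size).
if __name__ == "__main__":
    d, n_candidates = 256, 1000
    loss = selection_loss(torch.randn(d), torch.randn(d),
                          torch.randn(n_candidates, d))
    print(loss.item())
```

In this sketch, `query_emb` and `cand_embs` stand in for the graph-level vectors that the quoted setup obtains by sum pooling node embeddings; sum pooling is preferred over mean pooling because the summed vector retains information about molecule size.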