GTA: Graph Truncated Attention for Retrosynthesis
Authors: Seung-Woo Seo, You Young Song, June Yong Yang, Seohui Bae, Hankook Lee, Jinwoo Shin, Sung Ju Hwang, Eunho Yang
AAAI 2021, pp. 531-539
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model achieves new state-of-the-art records, i.e., exact match top-1 and top-10 accuracies of 51.1% and 81.6% on the USPTO-50k benchmark dataset, respectively, and 46.0% and 70.0% on the USPTO-full dataset, respectively, both without any reaction class information. In this section, we provide experimental justifications for our statements. First, as stated in our contributions, we show that even the naive Transformer is capable of achieving state-of-the-art performance simply by tuning hyperparameters. Second, we demonstrate the even higher performance of the vanilla Transformer equipped with our graph-truncated attention. |
| Researcher Affiliation | Collaboration | 1 Samsung Advanced Institute of Technology (SAIT), Samsung Electronics 2 Korea Advanced Institute of Science and Technology (KAIST) |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | In addition, our implementation, data, and pretrained weight details can be found in Supplementary. |
| Open Datasets | Yes | We use the open-source reaction databases from U.S. patents, USPTO-full and USPTO-50k, as benchmarks in this study, which were used in previous studies. Detailed information on each dataset and the differences between them is well summarized in (Thakkar et al. 2020). USPTO-full contains the reactions in USPTO patents from 1976 to 2016, curated by (Lowe 2012, 2017)... |
| Dataset Splits | Yes | We follow the data splitting strategy of (Dai et al. 2019), which randomly divides the data into train/valid/test sets of 80%/10%/10%. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | Adam (Kingma and Ba 2015) optimization with noam decay (Vaswani et al. 2017a) and learning rate scheduling with 8000 warm-up steps on a single Nvidia Tesla V100 GPU takes approximately 7 hours, 18 hours, and 15 days of training time for the USPTO-50k plain, 2P2R s, and USPTO-full datasets, respectively. (A sketch of the noam schedule follows the table.) |
| Software Dependencies | No | GTA is built upon the work of (Chen et al. 2019), which is based on Open Neural Machine Translation (ONMT) (Klein et al. 2017, 2018) and PyTorch (Paszke et al. 2017). We also used RDKit (Landrum et al. 2006) for extracting the distance matrix, atom-mapping, and SMILES pre- and post-processing. (An RDKit distance-matrix sketch follows the table.) |
| Experiment Setup | Yes | GTA implements the Transformer architecture with 6 and 10 layers in both encoder and decoder for the USPTO-50k and USPTO-full datasets, respectively. The embedding size is set to 256, the number of heads is fixed to 8, and the dropout probability to 0.3. We train our model with early stopping: training is halted after 40 validations without improvement in validation loss and accuracy, with validation every 1000 (for USPTO-50k) and 10000 (for USPTO-full) steps and a batch size of at most 4096 tokens. Relative positional encoding (Shaw, Uszkoreit, and Vaswani 2018) is used with a maximum relative distance of 4. Adam (Kingma and Ba 2015) optimization with noam decay (Vaswani et al. 2017a) and learning rate scheduling with 8000 warm-up steps on a single Nvidia Tesla V100 GPU takes approximately 7 hours, 18 hours, and 15 days of training time for the USPTO-50k plain, 2P2R s, and USPTO-full datasets, respectively. (A rough PyTorch sizing sketch follows the table.) |
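For the Dataset Splits row, the 80%/10%/10% random split following (Dai et al. 2019) can be reproduced roughly as below. This is a minimal sketch: the fixed seed and the function and file names are illustrative assumptions, not details taken from the paper or its released code.

```python
import random

def split_reactions(reactions, seed=42, ratios=(0.8, 0.1, 0.1)):
    """Randomly split a list of reaction SMILES into train/valid/test.

    The 80/10/10 ratio follows the paper; the seed is an arbitrary choice
    made here only so the illustration is repeatable.
    """
    assert abs(sum(ratios) - 1.0) < 1e-9
    rng = random.Random(seed)
    shuffled = reactions[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(ratios[0] * n)
    n_valid = int(ratios[1] * n)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

# Hypothetical usage with one reaction SMILES per line:
# train, valid, test = split_reactions(open("uspto_50k.txt").read().splitlines())
```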
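For the Hardware Specification row, "noam" decay is the inverse-square-root warm-up schedule of Vaswani et al. (2017a). With the reported embedding size of 256 and 8000 warm-up steps, a hedged PyTorch sketch looks as follows; the scaling factor and the stand-in model are assumptions, since this section does not state them.

```python
import torch

D_MODEL = 256   # embedding size reported in the paper
WARMUP = 8000   # warm-up steps reported in the paper
FACTOR = 2.0    # assumption: scaling factor is not stated in this section

def noam_lr(step, d_model=D_MODEL, warmup=WARMUP, factor=FACTOR):
    """Noam schedule: lr = factor * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)  # avoid division by zero at step 0
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

model = torch.nn.Linear(D_MODEL, D_MODEL)   # stand-in for the actual Transformer
optimizer = torch.optim.Adam(model.parameters(), lr=1.0)  # base lr 1.0: lambda sets the rate
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)

for step in range(1, 5):   # placeholder loop; a real loop computes loss and calls backward() first
    optimizer.step()
    scheduler.step()
```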
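For the Software Dependencies row, the distance matrix extracted with RDKit is the topological (shortest-path) atom-to-atom distance on the molecular graph, which is what graph-truncated attention masks against. The sketch below is an illustration under assumptions: the helper name and the single distance threshold are ours, and the paper's actual per-head masking may differ.

```python
from rdkit import Chem

def graph_truncation_mask(smiles, max_distance=1):
    """Return a boolean matrix where entry (i, j) is True if atoms i and j
    are within `max_distance` bonds of each other on the molecular graph.

    `max_distance` is an illustrative choice, not a value from the paper.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    # Topological (shortest-path) distances between every pair of atoms.
    dist = Chem.GetDistanceMatrix(mol)
    return dist <= max_distance

mask = graph_truncation_mask("CCO", max_distance=1)
print(mask.astype(int))  # 3x3 matrix over ethanol's heavy atoms
```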
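For the Experiment Setup row, the reported USPTO-50k sizes (6 encoder and 6 decoder layers, embedding size 256, 8 heads, dropout 0.3) can be instantiated with a stock PyTorch Transformer purely as a sizing sanity check. This stand-in omits the paper's graph-truncated attention and the relative positional encoding with maximum relative distance 4, and the feed-forward width is an assumption not reported in this section.

```python
import torch.nn as nn

# Rough stand-in matching the reported USPTO-50k sizes; it does NOT include
# the paper's graph-truncated attention masks or relative positional encoding.
model = nn.Transformer(
    d_model=256,            # embedding size from the paper
    nhead=8,                # number of attention heads from the paper
    num_encoder_layers=6,   # encoder depth for USPTO-50k (10 for USPTO-full)
    num_decoder_layers=6,   # decoder depth for USPTO-50k (10 for USPTO-full)
    dim_feedforward=2048,   # assumption: not stated in this section
    dropout=0.3,            # dropout probability from the paper
)
print(sum(p.numel() for p in model.parameters()))  # parameter count of the stand-in
```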