Decoupled Attention Network for Text Recognition
Authors: Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, Canjie Luo, Xiaoxue Chen, Yaqiang Wu, Qianying Wang, Mingxiang Cai
AAAI 2020, pp. 12216–12224
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that DAN achieves state-of-the-art performance on multiple text recognition tasks, including offline handwritten text recognition and regular/irregular scene text recognition. |
| Researcher Affiliation | Collaboration | Tianwei Wang¹, Yuanzhi Zhu¹, Lianwen Jin¹, Canjie Luo¹, Xiaoxue Chen¹, Yaqiang Wu², Qianying Wang², Mingxiang Cai². ¹School of Electronic and Information Engineering, South China University of Technology; ²Lenovo Research |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Codes will be released." Repository: https://github.com/Wang-Tianwei/Decoupled-attention-network |
| Open Datasets | Yes | Two public handwritten datasets are used to evaluate the effectiveness of DAN, including IAM (Marti and Bunke 2002) and RIMES (Grosicki et al. 2009). ... Two types of datasets are used for scene text recognition: regular scene text datasets, including IIIT5K-Words (Mishra, Alahari, and Jawahar 2012), Street View Text (Wang, Babenko, and Belongie 2011), ICDAR 2003 (Lucas et al. 2003) and ICDAR 2013 (Karatzas et al. 2013); and irregular scene text datasets, including SVT-Perspective (Neumann and Matas 2012), CUTE80 (Risnumawan et al. 2014) and ICDAR 2015 (Karatzas et al. 2015). |
| Dataset Splits | Yes | The IAM dataset... contains 747 documents (6,482 lines) in the training set, 116 documents (976 lines) in the validation set and 336 documents (2,915 lines) in the test set. The RIMES dataset... There are 1,500 paragraphs (11,333 lines) in the training set, and 100 paragraphs (778 lines) in the testing set. |
| Hardware Specification | Yes | The time/iter means forward time per iteration on TITAN X GPU. |
| Software Dependencies | No | The paper mentions tools and optimizers (e.g., ADADELTA, an open-source data-augmentation toolkit), but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | The height of the input image is normalized as 192 and the width is calculated with the original aspect ratio (up to 2048). ... max T is set to 150... All the layers of CAM except the last one are set as 128 channels... The height of the input image is set to 32 and the width is calculated with the original aspect ratio (up to 128). max T is set as 25; L is set as 8; and all the layers of CAM except the last one are set as 64. We use the bi-directional decoder proposed in (Shi et al. 2018) for final prediction. With ADADELTA (Zeiler 2012) optimization method, the learning rate is set as 1.0 and reduced to 0.1 after the third epoch. |
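The setup row above fixes a few concrete numbers: input height normalized to 192 (or 32 for scene text) with width following the original aspect ratio up to a cap, and an ADADELTA learning rate of 1.0 reduced to 0.1 after the third epoch. A minimal sketch of that preprocessing and schedule, with helper names of our own choosing (the paper does not specify whether epochs are 1-indexed; we assume they are):

```python
def normalized_size(orig_h, orig_w, target_h=192, max_w=2048):
    """Scale height to target_h; width follows the original aspect ratio, capped at max_w.

    Defaults match the quoted handwritten-text setup; pass target_h=32,
    max_w=128 for the scene-text setup.
    """
    w = round(orig_w * target_h / orig_h)
    return target_h, min(w, max_w)


def adadelta_lr(epoch):
    """Learning rate per the quoted schedule: 1.0, reduced to 0.1 after the third epoch.

    Assumes 1-indexed epochs, so epochs 1-3 train at 1.0 (our assumption).
    """
    return 1.0 if epoch <= 3 else 0.1


# Example: a 96x512 line image scales to 192x1024; a very wide 192x4096
# image is capped at the 2048-pixel width limit.
print(normalized_size(96, 512))    # (192, 1024)
print(normalized_size(192, 4096))  # (192, 2048)
print(adadelta_lr(3), adadelta_lr(4))  # 1.0 0.1
```

This is only an illustration of the stated hyperparameters, not code from the released repository.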