Decoupled Attention Network for Text Recognition
Authors: Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, Canjie Luo, Xiaoxue Chen, Yaqiang Wu, Qianying Wang, Mingxiang Cai
AAAI 2020, pp. 12216–12224
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that DAN achieves state-of-the-art performance on multiple text recognition tasks, including offline handwritten text recognition and regular/irregular scene text recognition. |
| Researcher Affiliation | Collaboration | Tianwei Wang¹, Yuanzhi Zhu¹, Lianwen Jin¹, Canjie Luo¹, Xiaoxue Chen¹, Yaqiang Wu², Qianying Wang², Mingxiang Cai². ¹School of Electronic and Information Engineering, South China University of Technology; ²Lenovo Research |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Codes will be released." Repository: https://github.com/Wang-Tianwei/Decoupled-attention-network |
| Open Datasets | Yes | Two public handwritten datasets are used to evaluate the effectiveness of DAN, including IAM (Marti and Bunke 2002) and RIMES (Grosicki et al. 2009). ... Two types of datasets are used for scene text recognition: regular scene text datasets, including IIIT5K-Words (Mishra, Alahari, and Jawahar 2012), Street View Text (Wang, Babenko, and Belongie 2011), ICDAR 2003 (Lucas et al. 2003) and ICDAR 2013 (Karatzas et al. 2013); and irregular scene text datasets, including SVT-Perspective (Neumann and Matas 2012), CUTE80 (Risnumawan et al. 2014) and ICDAR 2015 (Karatzas et al. 2015). |
| Dataset Splits | Yes | The IAM dataset... contains 747 documents (6,482 lines) in the training set, 116 documents (976 lines) in the validation set and 336 documents (2,915 lines) in the test set. The RIMES dataset... There are 1,500 paragraphs (11,333 lines) in the training set, and 100 paragraphs (778 lines) in the testing set. |
| Hardware Specification | Yes | The time/iter means forward time per iteration on TITAN X GPU. |
| Software Dependencies | No | The paper mentions tools and optimizers (e.g., ADADELTA, an open-source data-augmentation toolkit), but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | The height of the input image is normalized as 192 and the width is calculated with the original aspect ratio (up to 2048). ... max T is set to 150... All the layers of CAM except the last one are set as 128 channels... The height of the input image is set to 32 and the width is calculated with the original aspect ratio (up to 128). max T is set as 25; L is set as 8; and all the layers of CAM except the last one are set as 64. We use the bi-directional decoder proposed in (Shi et al. 2018) for final prediction. With ADADELTA (Zeiler 2012) optimization method, the learning rate is set as 1.0 and reduced to 0.1 after the third epoch. |
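The setup row above fixes a few concrete numbers: input height normalized to 192 (or 32 for scene text) with width following the original aspect ratio up to a cap, and an ADADELTA learning rate of 1.0 reduced to 0.1 after the third epoch. A minimal sketch of that preprocessing and schedule, with helper names of our own choosing (the paper does not specify whether epochs are 1-indexed; we assume they are):

```python
def normalized_size(orig_h, orig_w, target_h=192, max_w=2048):
    """Scale height to target_h; width follows the original aspect ratio, capped at max_w.

    Defaults match the quoted handwritten-text setup; pass target_h=32,
    max_w=128 for the scene-text setup.
    """
    w = round(orig_w * target_h / orig_h)
    return target_h, min(w, max_w)


def adadelta_lr(epoch):
    """Learning rate per the quoted schedule: 1.0, reduced to 0.1 after the third epoch.

    Assumes 1-indexed epochs, so epochs 1-3 train at 1.0 (our assumption).
    """
    return 1.0 if epoch <= 3 else 0.1


# Example: a 96x512 line image scales to 192x1024; a very wide 192x4096
# image is capped at the 2048-pixel width limit.
print(normalized_size(96, 512))    # (192, 1024)
print(normalized_size(192, 4096))  # (192, 2048)
print(adadelta_lr(3), adadelta_lr(4))  # 1.0 0.1
```

This is only an illustration of the stated hyperparameters, not code from the released repository.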