Learning to Read Irregular Text with Attention Mechanisms

Authors: Xiao Yang, Dafang He, Zihan Zhou, Daniel Kifer, C. Lee Giles

IJCAI 2017

Reproducibility Variables (each entry lists the assessed Result, followed by the supporting LLM Response):
Research Type: Experimental
  LLM Response: "Our model outperforms previous work on two irregular-text datasets: SVT-Perspective and CUTE80, and is also highly-competitive on several regular-text datasets containing primarily horizontal and frontal text." (Section 5, Experiments: "We first conduct ablation experiments to carefully investigate the effectiveness of the model components. After that, we evaluate our model on a number of standard benchmark datasets for scene text recognition, and report word prediction accuracy.")
Researcher Affiliation: Academia
  LLM Response: Xiao Yang, Dafang He, Zihan Zhou, Daniel Kifer, C. Lee Giles; The Pennsylvania State University, University Park, PA 16802, USA; {xuy111, duh188}@psu.edu, zzhou@ist.psu.edu, dkifer@cse.psu.edu, giles@ist.psu.edu
Pseudocode: No
  LLM Response: The paper describes the model architecture and components but does not include any pseudocode or algorithm blocks.
Open Source Code: No
  LLM Response: The paper states that a generated dataset "will be made public", but there is no explicit statement about releasing the source code for the method.
Open Datasets: Yes
  LLM Response: SVT-Perspective [Quy Phan et al., 2013], CUTE80 [Risnumawan et al., 2014], ICDAR03 [Lucas et al., 2003], SVT [Wang et al., 2011], IIIT5K [Mishra et al., 2012]. Also: "Following a similar method, we generate a large-scale synthetic dataset containing perspectively distorted and curved text... Such dataset will be made public to support future research for irregular text reading."
Dataset Splits: No
  LLM Response: The paper mentions a validation set in Figure 5 but does not provide specific details on the dataset split percentages or sample counts for training, validation, or testing.
Hardware Specification: No
  LLM Response: The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies: No
  LLM Response: The paper mentions the use of AdaDelta [Zeiler, 2012] but does not provide version numbers for software dependencies or other libraries used in the implementation.
Experiment Setup: Yes
  LLM Response: "The hyper-parameters λ1 and λ2 in our training objective L are set to 10 at the beginning and decrease throughout training. To approximate WD, we project the 2D attention weights along 4 directions: 0° (horizontal), 90° (vertical), 45° and -45°. Beam Search with a window size of 3 is used for decoding. The proposed model is trained in an end-to-end manner using stochastic gradient descent. We adopt AdaDelta [Zeiler, 2012] to automatically adjust the learning rate."
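The four-direction projection quoted above is concrete enough to illustrate in code. The sketch below is a minimal NumPy illustration, not the authors' implementation: the function name `project_attention`, the use of diagonal sums for the ±45° profiles, and the sign convention for the two diagonals are all assumptions.

```python
import numpy as np

def project_attention(att):
    """Reduce a 2D attention map to 1D profiles along four directions:
    0 deg (horizontal), 90 deg (vertical), and the two diagonals (+/-45 deg).
    Hypothetical sketch; the paper gives no reference implementation.
    """
    h, w = att.shape
    p_0 = att.sum(axis=0)    # collapse rows: profile across columns (horizontal)
    p_90 = att.sum(axis=1)   # collapse columns: profile across rows (vertical)
    # One bin per diagonal; offsets cover all h + w - 1 diagonals of the map.
    offsets = range(-(h - 1), w)
    p_45 = np.array([att.diagonal(k).sum() for k in offsets])
    # Flipping the rows turns anti-diagonals into diagonals.
    p_m45 = np.array([att[::-1].diagonal(k).sum() for k in offsets])
    return p_0, p_90, p_45, p_m45

# Usage: a normalized attention map over an 8 x 32 feature grid.
att = np.random.rand(8, 32)
att /= att.sum()  # attention weights sum to 1
for profile in project_attention(att):
    assert np.isclose(profile.sum(), 1.0)  # each projection preserves total mass
```

Each profile sums to the same total mass as the 2D map, so the four projections remain valid probability distributions over positions.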