Learning to Read Irregular Text with Attention Mechanisms
Authors: Xiao Yang, Dafang He, Zihan Zhou, Daniel Kifer, C. Lee Giles
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model outperforms previous work on two irregular-text datasets: SVT-Perspective and CUTE80, and is also highly competitive on several regular-text datasets containing primarily horizontal and frontal text. ... We first conduct ablation experiments to carefully investigate the effectiveness of the model components. After that, we evaluate our model on a number of standard benchmark datasets for scene text recognition, and report word prediction accuracy. |
| Researcher Affiliation | Academia | Xiao Yang, Dafang He, Zihan Zhou, Daniel Kifer, C. Lee Giles The Pennsylvania State University, University Park, PA 16802, USA {xuy111, duh188}@psu.edu, zzhou@ist.psu.edu, dkifer@cse.psu.edu, giles@ist.psu.edu |
| Pseudocode | No | The paper describes the model architecture and components but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions that a generated dataset 'will be made public', but there is no explicit statement about releasing the source code for the methodology. |
| Open Datasets | Yes | SVT-Perspective [Quy Phan et al., 2013], CUTE80 [Risnumawan et al., 2014], ICDAR03 [Lucas et al., 2003], SVT [Wang et al., 2011], IIIT5K [Mishra et al., 2012]. Following a similar method, we generate a large-scale synthetic dataset containing perspectively distorted and curved text... Such a dataset will be made public to support future research for irregular text reading. |
| Dataset Splits | No | The paper mentions 'validation set' in Figure 5 but does not provide specific details on the dataset split percentages or sample counts for training, validation, or testing. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of 'Ada Delta [Zeiler, 2012]' but does not provide specific version numbers for software dependencies or other libraries used in the implementation. |
| Experiment Setup | Yes | The hyperparameters λ1 and λ2 in our training objective L are set to 10 at the beginning and decrease throughout training. To approximate WD, we project the 2D attention weights along 4 directions: 0° (horizontal), 90° (vertical), 45° and -45°. Beam search with a window size of 3 is used for decoding. The proposed model is trained in an end-to-end manner using stochastic gradient descent. We adopt AdaDelta [Zeiler, 2012] to automatically adjust the learning rate. |
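The beam-search decoding noted in the Experiment Setup row (window size 3) can be sketched generically. This is a minimal, hedged illustration, not the authors' implementation: `step_probs` is a hypothetical stand-in for the model's per-step character distribution, and the vocabulary and probabilities below are toy values.

```python
import math

def beam_search(step_probs, vocab, max_len, beam_width=3, eos="</s>"):
    """Keep the beam_width highest log-probability partial sequences."""
    beams = [([], 0.0)]  # (token list, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:
                candidates.append((seq, score))  # finished beam carries over
                continue
            probs = step_probs(seq)  # hypothetical model call
            for tok in vocab:
                candidates.append((seq + [tok], score + math.log(probs[tok])))
        # prune to the top beam_width hypotheses by cumulative log-probability
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]

# Toy usage: a fixed distribution that prefers "a", then end-of-sequence.
vocab = ["a", "b", "</s>"]
def toy_probs(seq):
    if not seq:
        return {"a": 0.6, "b": 0.3, "</s>": 0.1}
    return {"a": 0.2, "b": 0.2, "</s>": 0.6}

best_seq, best_score = beam_search(toy_probs, vocab, max_len=4)
```

With these toy probabilities the highest-scoring hypothesis is `["a", "</s>"]`, since 0.6 × 0.6 beats any longer continuation.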
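The Experiment Setup row also mentions projecting the 2D attention weights along four directions (0°, 90°, 45°, -45°) to approximate WD. A minimal sketch of such directional projections, assuming the attention map is a plain 2D grid of weights (the paper does not specify an implementation):

```python
def project(att, direction):
    """Sum the weights of a 2D attention map along lines of one direction.

    att: list of rows (each a list of floats); direction: 0, 90, 45, or -45.
    """
    h, w = len(att), len(att[0])
    if direction == 0:    # horizontal profile: sum each column
        return [sum(att[i][j] for i in range(h)) for j in range(w)]
    if direction == 90:   # vertical profile: sum each row
        return [sum(row) for row in att]
    if direction == 45:   # anti-diagonals: cells with i + j constant
        out = [0.0] * (h + w - 1)
        for i in range(h):
            for j in range(w):
                out[i + j] += att[i][j]
        return out
    if direction == -45:  # diagonals: cells with i - j constant
        out = [0.0] * (h + w - 1)
        for i in range(h):
            for j in range(w):
                out[i - j + w - 1] += att[i][j]
        return out
    raise ValueError("unsupported direction")

# Toy 3x3 attention map (hypothetical values): an identity-like pattern.
att = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
```

For this toy map, the 0° and 90° profiles are both flat (`[1, 1, 1]`), while the -45° profile concentrates all mass on the main diagonal, which is the kind of directional spread such projections make visible.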