Reading Scene Text in Deep Convolutional Sequences
Authors: Pan He, Weilin Huang, Yu Qiao, Chen Change Loy, Xiaoou Tang
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | It achieves impressive results on several benchmarks, advancing the state of the art substantially. The experiments were conducted on three standard benchmarks for cropped word image recognition: the Street View Text (SVT) (Wang, Babenko, and Belongie 2011), ICDAR 2003 (IC03) (Lucas et al. 2003), and IIIT 5K-word (IIIT5K) (Mishra, Alahari, and Jawahar 2012). The recognition results by the DTRN are presented in Fig. 5, including both correct and incorrect recognitions. The results on the three benchmarks are compared with the state of the art in Table 1. |
| Researcher Affiliation | Academia | ¹Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; ²Department of Information Engineering, The Chinese University of Hong Kong |
| Pseudocode | No | The paper provides descriptions of the model architecture and mathematical formulations, but no structured pseudocode or algorithm blocks are present. |
| Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code for their described methodology or a link to a code repository. |
| Open Datasets | Yes | Our CNN model is trained on about 1.8 × 10⁵ character images cropped from the training sets of a number of benchmarks by (Jaderberg, Vedaldi, and Zisserman 2014). The RNN is trained on about 3000 word images (all of their characters are included in the previously used 1.8 × 10⁵ character images), taken from the training sets of the three benchmarks used below. The experiments were conducted on three standard benchmarks for cropped word image recognition: the Street View Text (SVT) (Wang, Babenko, and Belongie 2011), ICDAR 2003 (IC03) (Lucas et al. 2003), and IIIT 5K-word (IIIT5K) (Mishra, Alahari, and Jawahar 2012). |
| Dataset Splits | No | The paper mentions training and testing datasets, for example, 'The IIIT5K is comprised of 5000 cropped word images from both scene and born-digital images. The dataset is split into subsets of 2000 and 3000 images for training and test.' However, it does not explicitly specify a separate validation set or its split details. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU/CPU models, memory, or specific computing environments used for running the experiments. |
| Software Dependencies | No | The paper mentions using a Maxout CNN, a bidirectional LSTM, and CTC, but does not provide specific version numbers for any software, libraries, or frameworks used. (See the hedged model sketch after this table.) |
| Experiment Setup | Yes | The recurrent model is trained with steepest descent. The parameters are updated per training sequence by using a learning rate of 10⁻⁴ and a momentum of 0.9. Our CNN model is trained on 36-class case-insensitive character images. (See the hedged training-loop sketch after this table.) |
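
The Software Dependencies row notes that the paper names a Maxout CNN, a bidirectional LSTM, and CTC without citing any concrete framework. The sketch below is a minimal, hypothetical rendering of the bidirectional-LSTM-plus-CTC stage in PyTorch; the framework choice, layer sizes, and class name are assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class CharSequenceRecognizer(nn.Module):
    """Illustrative sequence labeller over per-column CNN features (assumed names/sizes)."""
    def __init__(self, feat_dim=128, hidden=256, num_classes=37):
        # 37 = 36 case-insensitive alphanumeric classes + 1 CTC blank.
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, num_classes)

    def forward(self, feats):
        # feats: (batch, seq_len, feat_dim) sliding-window CNN features.
        out, _ = self.rnn(feats)
        return self.proj(out).log_softmax(dim=-1)

# CTC aligns per-frame predictions with unsegmented character label sequences.
ctc = nn.CTCLoss(blank=36, zero_infinity=True)
```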
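The Experiment Setup row reports steepest descent with per-sequence parameter updates, a learning rate of 10⁻⁴, and momentum 0.9. A minimal training-loop sketch under those settings follows; only the optimizer hyperparameters come from the paper, while `train_loader`, the tensor shapes, and the model from the previous sketch are hypothetical scaffolding.

```python
model = CharSequenceRecognizer()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

# Iterating one word image at a time mirrors the paper's "updated per
# training sequence" description; the loader itself is hypothetical.
for feats, targets, input_lens, target_lens in train_loader:
    optimizer.zero_grad()
    log_probs = model(feats).permute(1, 0, 2)  # CTCLoss expects (T, N, C)
    loss = ctc(log_probs, targets, input_lens, target_lens)
    loss.backward()
    optimizer.step()
```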