Gated Recurrent Convolution Neural Network for OCR

Authors: Jianfeng Wang, Xiaolin Hu

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that the proposed model outperforms existing methods on several benchmark datasets, including IIIT-5K, Street View Text (SVT) and ICDAR. ... The proposed method outperforms most existing models for both constrained and unconstrained text recognition.
Researcher Affiliation | Academia | Jianfeng Wang, Beijing University of Posts and Telecommunications, Beijing 100876, China (jianfengwang1991@gmail.com); Xiaolin Hu, Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Center for Brain-Inspired Computing Research (CBICR), Tsinghua University, Beijing 100084, China (xlhu@tsinghua.edu.cn)
Pseudocode | No | No pseudocode or algorithm blocks found.
Open Source Code | Yes | The code and pre-trained model will be released at https://github.com/Jianfeng1991/GRCNN-for-OCR.
Open Datasets | Yes | ICDAR2003 [24] contains 251 scene images with 860 cropped word images. ... IIIT5K has 3000 cropped testing word images and 2000 cropped training images collected from the Internet [31]. ... Street View Text (SVT) has 647 cropped word images from Google Street View [36]. ... Synth90k contains around 7 million training images, 800k validation images and 900k test images [15].
Dataset Splits | Yes | The validation set of Synth90k is used for model selection.
Hardware Specification | No | No specific hardware details (GPU/CPU models, processors, memory) are mentioned for the experiments.
Software Dependencies | No | The ADADELTA method [41] is used for training with the parameter ρ = 0.9; no software packages or versions are specified.
Experiment Setup | Yes | The input is a gray-scale image resized to 100 × 32. Before being fed to the network, the pixel values are rescaled to the range (-1, 1). The final output of the feature extractor is a feature sequence of 26 frames. The recurrent layer is a bidirectional LSTM with 512 units and no dropout. The ADADELTA method [41] is used for training with ρ = 0.9. The batch size is set to 192 and training is stopped after 300k iterations.
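The Experiment Setup row describes a concrete input pipeline (resize to 100 × 32, rescale pixels to (-1, 1)). A minimal sketch of that preprocessing is below; the nearest-neighbour resize and the assumption that raw pixels lie in [0, 255] are mine, since the excerpt states only the target size and the output range.

```python
def preprocess(gray_img):
    """Resize a grayscale image (a list of rows of ints in [0, 255]) to
    100 x 32 (width x height) and rescale pixel values to [-1, 1].

    Nearest-neighbour sampling is an assumption; the paper excerpt does
    not state which interpolation method is used.
    """
    target_w, target_h = 100, 32
    h, w = len(gray_img), len(gray_img[0])
    out = []
    for i in range(target_h):
        src_row = gray_img[i * h // target_h]  # nearest source row
        # Map each sampled pixel from [0, 255] to [-1, 1].
        out.append([src_row[j * w // target_w] / 127.5 - 1.0
                    for j in range(target_w)])
    return out

# Example: a 48 x 160 dummy image becomes a 32-row x 100-column input.
img = [[(r * c) % 256 for c in range(160)] for r in range(48)]
x = preprocess(img)
print(len(x), len(x[0]))  # 32 100
```

The resulting 100 × 32 input would then feed the convolutional feature extractor, whose 26-frame output sequence is consumed by the bidirectional LSTM described in the same row.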