Gated Recurrent Convolution Neural Network for OCR
Authors: Jianfeng Wang, Xiaolin Hu
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that the proposed model outperforms existing methods on several benchmark datasets including the IIIT-5K, Street View Text (SVT) and ICDAR. ... The proposed method outperforms most existing models for both constrained and unconstrained text recognition. |
| Researcher Affiliation | Academia | Jianfeng Wang, Beijing University of Posts and Telecommunications, Beijing 100876, China (jianfengwang1991@gmail.com); Xiaolin Hu, Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Center for Brain-Inspired Computing Research (CBICR), Tsinghua University, Beijing 100084, China (xlhu@tsinghua.edu.cn) |
| Pseudocode | No | No pseudocode or algorithm blocks found. (A hedged sketch of the proposed GRCL recurrence is given after the table.) |
| Open Source Code | Yes | The code and pre-trained model will be released at https://github.com/Jianfeng1991/GRCNN-for-OCR. |
| Open Datasets | Yes | ICDAR2003: ICDAR2003 [24] contains 251 scene images, with 860 cropped word images. ... IIIT5K: This dataset has 3000 cropped testing word images and 2000 cropped training images collected from the Internet [31]. ... Street View Text (SVT): This dataset has 647 cropped word images from Google Street View [36]. ... Synth90k: This dataset contains around 7 million training images, 800k validation images and 900k test images [15]. |
| Dataset Splits | Yes | The validation set of Synth90k is used for model selection. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, processors, memory) mentioned for experiments. |
| Software Dependencies | No | No software frameworks, libraries, or version numbers are specified; the paper only states that the ADADELTA method [41] is used for training with the parameter ρ = 0.9. |
| Experiment Setup | Yes | The input is a gray-scale image resized to 100 × 32. Before being fed to the network, pixel values are rescaled to the range (-1, 1). The final output of the feature extractor is a feature sequence of 26 frames. The recurrent layer is a bidirectional LSTM with 512 units and no dropout. The ADADELTA method [41] is used for training with the parameter ρ = 0.9. The batch size is set to 192 and training is stopped after 300k iterations. (A configuration sketch follows the table.) |
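Since the paper itself contains no pseudocode, here is a minimal PyTorch sketch of the gated recurrent convolution layer (GRCL) recurrence the paper proposes. The channel count, kernel sizes, default of three iterations, and the per-use-site BatchNorm arrangement are our assumptions for illustration, not values confirmed by the paper.

```python
import torch
import torch.nn as nn

class GRCL(nn.Module):
    """Sketch of one Gated Recurrent Convolution Layer (GRCL).

    Implements the recurrence described in the paper:
        G(t) = sigmoid(BN(w_g^f * u) + BN(w_g^r * x(t-1)))
        x(t) = relu(BN(w^f * u) + BN(BN(w^r * x(t-1)) . G(t)))
    where u is the fixed feed-forward input, x(t) the recurrent state,
    * denotes convolution, and . element-wise multiplication.
    """

    def __init__(self, channels: int, iterations: int = 3):
        super().__init__()
        self.iterations = iterations
        # 3x3 convs on the state path, 1x1 convs on the gate path
        # (the exact kernel sizes are an assumption in this sketch).
        self.wf = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.wr = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.wgf = nn.Conv2d(channels, channels, 1, bias=False)
        self.wgr = nn.Conv2d(channels, channels, 1, bias=False)
        # A separate BatchNorm for every use site at every iteration,
        # so normalization statistics are not shared across time steps.
        self.bns = nn.ModuleList(
            nn.BatchNorm2d(channels) for _ in range(5 * iterations + 1)
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        bn = iter(self.bns)
        x = torch.relu(next(bn)(self.wf(u)))        # x(0) = relu(BN(w^f * u))
        for _ in range(self.iterations):
            g = torch.sigmoid(next(bn)(self.wgf(u)) + next(bn)(self.wgr(x)))
            ff = next(bn)(self.wf(u))               # BN(w^f * u)
            rec = next(bn)(self.wr(x))              # BN(w^r * x(t-1))
            x = torch.relu(ff + next(bn)(rec * g))  # gate modulates the context
        return x
```

A quick shape check such as `GRCL(64)(torch.randn(2, 64, 8, 25)).shape` should return `torch.Size([2, 64, 8, 25])`, since the layer preserves spatial resolution; note how the feed-forward input `u` is re-injected at every iteration while the gate decides how much recurrent context to mix in.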
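The experiment-setup row translates directly into a small configuration sketch. Everything below mirrors only what the table reports (gray-scale 100 × 32 input rescaled to (-1, 1), a 512-unit bidirectional LSTM without dropout, ADADELTA with ρ = 0.9, batch size 192, 300k iterations); the torchvision transforms, the LSTM input size, and the placeholder model are assumptions for illustration.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms

# Input pipeline as reported: gray-scale, resized to 100 x 32,
# pixel values rescaled to the range (-1, 1).
preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((32, 100)),                  # torchvision takes (height, width)
    transforms.ToTensor(),                         # pixels -> [0, 1]
    transforms.Normalize(mean=[0.5], std=[0.5]),   # [0, 1] -> (-1, 1)
])

# Recurrent layer as reported: bidirectional LSTM, 512 units, no dropout.
# input_size=512 is an assumption; the paper only fixes the hidden size.
recurrent = nn.LSTM(input_size=512, hidden_size=512, bidirectional=True)

# Training hyper-parameters as reported. The Conv2d is a stand-in for the
# full GRCNN feature extractor, which the table does not fully specify.
model = nn.Conv2d(1, 64, kernel_size=3, padding=1)
optimizer = optim.Adadelta(model.parameters(), rho=0.9)
batch_size = 192
max_iterations = 300_000                           # training stops after 300k iterations
```

With this preprocessing, a 100 × 32 input is consistent with the reported feature sequence of 26 frames after the convolutional feature extractor downsamples the width.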