SqueezedText: A Real-Time Scene Text Recognition by Binary Convolutional Encoder-Decoder Network
Authors: Zichuan Liu, Yixing Li, Fengbo Ren, Wang Ling Goh, Hao Yu
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By training with over 1,000,000 synthetic scene text images, the B-CEDNet achieves a recall rate of 0.86, precision of 0.88 and F-score of 0.87 on ICDAR-03 and ICDAR-13. With the correction and classification by Bi-RNN, the proposed real-time scene text recognition achieves state-of-the-art accuracy while consuming less than 1 ms of inference run-time. |
| Researcher Affiliation | Academia | Nanyang Technological University, Singapore; Arizona State University, USA; and Southern University of Science and Technology, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides architectural diagrams and mathematical equations. |
| Open Source Code | No | The paper does not provide any concrete access (e.g., specific repository link, explicit release statement) to the source code for the methodology described. |
| Open Datasets | Yes | To achieve generality of trained model, it usually needs a large amount of labeled data for training. However, the existing datasets are limited to word-level annotation (Veit et al. 2016) or cannot provide enough pixel-wise labeled data (Karatzas et al. 2013). Therefore, we create a text rendering engine that generates texts with different fonts, gray levels and projective distortions. The labeled image has the same size as the corresponding text image and provides a pixel-wise labeling over the category space. This dataset contains over 1,000,000 synthesized text images. Some examples are shown in Fig. 4. [...] Four popular benchmarks for scene text recognition are used for performance evaluation: ICDAR-2003 (IC03), ICDAR-2013 (IC13), IIIT 5k-word (IIIT5k) and Synth90k. |
| Dataset Splits | No | The paper mentions training data, but does not explicitly provide details for a validation split (e.g., percentages, sample counts, or specific strategies like k-fold cross-validation). |
| Hardware Specification | Yes | The experiments are carried out on Dell Precision T7500 server with Intel Xeon 5600 processor, 64GB memory and NVIDIA TITAN X GPU. |
| Software Dependencies | Yes | Both the B-CEDNet model and the Bi-RNN model are built based on TensorFlow v0.9 (Abadi et al. 2016). |
| Experiment Setup | Yes | Both networks are trained using Adam optimizer with learning rate of 0.0005, default decay rates β1 = 0.9 and β2 = 0.999, and a batch size of 20. The B-CEDNet is trained for up to 50 epochs and the bidirectional RNN is trained for 40 epochs before the convergence is observed. |
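
The figures quoted in the table can be sanity-checked with a few lines of arithmetic. The sketch below is not the authors' code: `TrainConfig`, `steps_per_epoch`, and `f_score` are hypothetical names introduced for illustration. It assumes the reported F-score is the standard F1 (harmonic mean of precision and recall) and rounds the training set to exactly 1,000,000 images, whereas the paper says "over 1,000,000".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as quoted in the Experiment Setup row (hypothetical container)."""
    learning_rate: float = 0.0005
    beta1: float = 0.9
    beta2: float = 0.999
    batch_size: int = 20
    epochs: int = 50


def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Optimizer steps in one pass over the data, counting a final partial batch."""
    return -(-num_images // batch_size)  # ceiling division


def f_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall (assumed definition)."""
    return 2 * precision * recall / (precision + recall)


# "Up to 50 epochs" for the B-CEDNet, 40 epochs for the bidirectional RNN.
bcednet = TrainConfig(epochs=50)
birnn = TrainConfig(epochs=40)

# Assumption: exactly 1,000,000 training images (the paper gives only a lower bound).
NUM_IMAGES = 1_000_000

# Upper bound on B-CEDNet optimizer steps under that assumption.
bcednet_steps = steps_per_epoch(NUM_IMAGES, bcednet.batch_size) * bcednet.epochs

# Check that the reported precision and recall agree with the reported F-score.
reported_f = f_score(precision=0.88, recall=0.86)

print(f"B-CEDNet steps (upper bound): {bcednet_steps:,}")
print(f"F-score from P=0.88, R=0.86: {reported_f:.2f}")
```

Under these assumptions the precision and recall round to the reported F-score of 0.87, and a batch size of 20 implies 50,000 steps per epoch for the B-CEDNet.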