SqueezedText: A Real-Time Scene Text Recognition by Binary Convolutional Encoder-Decoder Network

Authors: Zichuan Liu, Yixing Li, Fengbo Ren, Wang Ling Goh, Hao Yu

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | By training with over 1,000,000 synthetic scene text images, the B-CEDNet achieves a recall rate of 0.86, precision of 0.88 and F-score of 0.87 on ICDAR-03 and ICDAR-13. With the correction and classification by Bi-RNN, the proposed real-time scene text recognition achieves state-of-the-art accuracy while consuming less than 1 ms of inference run-time.
Researcher Affiliation | Academia | Nanyang Technological University, Singapore (1); Arizona State University, USA (2); and Southern University of Science and Technology, China (3)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides architectural diagrams and mathematical equations.
Open Source Code | No | The paper does not provide any concrete access (e.g., specific repository link, explicit release statement) to the source code for the methodology described.
Open Datasets | Yes | To achieve generality of trained model, it usually needs a large amount of labeled data for training. However, the existing datasets are limited to word-level annotation (Veit et al. 2016) or cannot provide enough pixel-wise labeled data (Karatzas et al. 2013). Therefore, we create a text rendering engine that generates texts with different fonts, gray levels and projective distortions. The labeled image has the same size as the corresponding text image and provides a pixel-wise labeling over the category space. This dataset contains over 1,000,000 synthesized text images. Some examples are shown in Fig. 4. [...] Four popular benchmarks for scene text recognition are used for performance evaluation, ICDAR-2003 (IC03), ICDAR-2013 (IC13), IIIT 5k-word (IIIT5k) and Synth90k.
Dataset Splits | No | The paper mentions training data, but does not explicitly provide details for a validation split (e.g., percentages, sample counts, or specific strategies like k-fold cross-validation).
Hardware Specification | Yes | The experiments are carried out on Dell Precision T7500 server with Intel Xeon 5600 processor, 64GB memory and NVIDIA TITAN X GPU.
Software Dependencies | Yes | Both the B-CEDNet model and the Bi-RNN model are built based on TensorFlow v0.9 (Abadi et al. 2016).
Experiment Setup | Yes | Both networks are trained using Adam optimizer with a learning rate of 0.0005, default decay rates β1 = 0.9 and β2 = 0.999, and a batch size of 20. The B-CEDNet is trained for up to 50 epochs and the bidirectional RNN is trained for 40 epochs before convergence is observed.
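
As a quick consistency check on the Research Type row, the reported precision (0.88) and recall (0.86) can be combined into an F-score. The sketch below assumes the paper's F-score is the balanced F1 (the harmonic mean of precision and recall); the summary above does not state this explicitly, and the function name `f_score` is only illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values quoted in the Research Type row.
reported_precision = 0.88
reported_recall = 0.86

f1 = f_score(reported_precision, reported_recall)
print(round(f1, 2))  # 0.87
```

Under that assumption the three reported numbers agree with each other to two decimal places.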
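
To make the Experiment Setup row concrete, here is a minimal sketch of a single Adam update using the reported learning rate and decay rates. It is not the authors' training code: it is a plain-Python, scalar illustration of the standard Adam rule, and `EPSILON` and the toy objective are assumptions that do not come from the source.

```python
import math

# Hyperparameters as reported in the Experiment Setup row.
# (The reported batch size of 20 is not used in this scalar toy example.)
LEARNING_RATE = 0.0005
BETA1 = 0.9
BETA2 = 0.999
EPSILON = 1e-8  # assumption: not stated in the source; a commonly used value


def adam_step(param, grad, m, v, t):
    """Apply one Adam update to a scalar parameter at step t (t >= 1)."""
    m = BETA1 * m + (1 - BETA1) * grad          # first-moment estimate
    v = BETA2 * v + (1 - BETA2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - BETA1 ** t)                # bias correction
    v_hat = v / (1 - BETA2 ** t)
    param = param - LEARNING_RATE * m_hat / (math.sqrt(v_hat) + EPSILON)
    return param, m, v


# Toy usage (hypothetical objective): minimize f(x) = x^2, whose gradient is 2x.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    x, m, v = adam_step(x, 2 * x, m, v, t)
```

With these settings each step moves the parameter by roughly the learning rate, so after 100 steps `x` has moved only slightly below its starting value of 1.0.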