reproducibilityindex.ai

SqueezedText: A Real-Time Scene Text Recognition by Binary Convolutional Encoder-Decoder Network

Authors: Zichuan Liu, Yixing Li, Fengbo Ren, Wang Ling Goh, Hao Yu

AAAI 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	By training with over 1,000,000 synthetic scene text images, the B-CEDNet achieves a recall rate of 0.86, precision of 0.88 and F-score of 0.87 on ICDAR-03 and ICDAR-13. With the correction and classification by Bi-RNN, the proposed real-time scene text recognition achieves state-of-the-art accuracy while only consumes less than 1-ms inference run-time.
Researcher Affiliation	Academia	Nanyang Technological University, Singapore; Arizona State University, USA; and Southern University of Science and Technology, China
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks. It provides architectural diagrams and mathematical equations.
Open Source Code	No	The paper does not provide any concrete access (e.g., specific repository link, explicit release statement) to the source code for the methodology described.
Open Datasets	Yes	To achieve generality of trained model, it usually needs a large amount of labeled data for training. However, the existing datasets are limited to word-level annotation (Veit et al. 2016) or cannot provide enough pixel-wise labeled data (Karatzas et al. 2013). Therefore, we create a text rendering engine that generates texts with different fonts, gray levels and projective distortions. The labeled image has the same size with the corresponding text image and provides a pixel-wise labeling over the category space. This dataset contains over 1,000,000 synthesized text images. Some examples are shown in Fig. 4. [...] Four popular benchmarks for scene text recognition are used for performance evaluation, ICDAR-2003 (IC03), ICDAR-2013 (IC13), IIIT 5k-word (IIIT5k) and Synth90k.
Dataset Splits	No	The paper mentions training data, but does not explicitly provide details for a validation split (e.g., percentages, sample counts, or specific strategies like k-fold cross-validation).
Hardware Specification	Yes	The experiments are carried out on Dell Precision T7500 server with Intel Xeon 5600 processor, 64GB memory and NVIDIA TITAN X GPU.
Software Dependencies	Yes	Both the B-CEDNet model and the Bi-RNN model are built based on Tensorflow 0.9v (Abadi et al. 2016).
Experiment Setup	Yes	Both networks are trained using Adam optimizer with learning rate of 0.0005, default decay rates β1 = 0.9 and β2 = 0.999, and a batch size of 20. The B-CEDNet is trained for up to 50 epochs and the bidirectional RNN is trained for 40 epochs before the convergence is observed.
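
For readers attempting a reproduction, the reported training settings and headline metrics can be collected in one place and sanity-checked. The sketch below is illustrative only: the dictionary layout, key names, and helper functions are assumptions and do not come from the paper. It also assumes that the reported F-score is the standard harmonic mean of precision and recall, and it uses 1,000,000 as a lower bound for the synthetic dataset size, since the paper only states "over 1,000,000".

```python
# Hyperparameters as reported in the Experiment Setup row. The dictionary
# layout and key names are illustrative assumptions, not from the paper.
REPORTED_SETUP = {
    "optimizer": "adam",
    "learning_rate": 0.0005,
    "beta1": 0.9,
    "beta2": 0.999,
    "batch_size": 20,
    "max_epochs": {"b_cednet": 50, "bi_rnn": 40},
}


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Optimizer steps per epoch, counting a final partial batch."""
    return -(-num_samples // batch_size)  # ceiling division


if __name__ == "__main__":
    # Reported precision 0.88 and recall 0.86 should give roughly 0.87.
    f = f_score(precision=0.88, recall=0.86)
    print(f"F-score from reported precision/recall: {f:.2f}")

    # Assumed lower bound of 1,000,000 training images at batch size 20.
    print(steps_per_epoch(1_000_000, REPORTED_SETUP["batch_size"]))
```

Under these assumptions the computed F-score rounds to 0.87, which matches the value quoted in the Research Type row, and one epoch over at least 1,000,000 images at batch size 20 takes at least 50,000 optimizer steps.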