Scene Text Recognition from Two-Dimensional Perspective

Authors: Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai (pp. 8714-8721)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that the proposed algorithm outperforms previous methods on both regular and irregular text datasets. Moreover, it is proven to be more robust to imprecise localizations in the text detection phase, which are very common in practice.
Researcher Affiliation | Collaboration | (1) Huazhong University of Science and Technology, (2) Megvii (Face++)
Pseudocode | No | The paper describes formulas and steps for label generation and loss-function calculation but does not provide a formal pseudocode block or algorithm.
Open Source Code | No | The paper provides no statement or link indicating that the code for its method is open-sourced.
Open Datasets | Yes | SynthText is a synthetic text dataset proposed in (Gupta, Vedaldi, and Zisserman 2016). It contains 800,000 training images aimed at text detection; cropping them by their word bounding boxes yields about 7 million images for text recognition, each with character-level annotations.
Dataset Splits | No | The paper mentions training data and test datasets but does not explicitly describe a validation split for hyperparameter tuning or early stopping during training.
Hardware Specification | Yes | We test our method with a single Titan Xp GPU.
Software Dependencies | No | The paper mentions using Adam (Kingma and Ba 2014) as the optimizer but does not specify versions of other key software components, such as the deep learning framework (e.g., TensorFlow, PyTorch) or the programming language.
Experiment Setup | Yes | Input images are randomly resized to 32 × 128, 48 × 192, or 64 × 256. Data augmentation is applied during training, including random rotation (angle sampled from [-15°, 15°]), hue, brightness, contrast, and blur. Training uses Adam (Kingma and Ba 2014) with an initial learning rate of 10^-4, decayed to 10^-5 at epoch 3 and 10^-6 at epoch 4; the model is trained for about 5 epochs in total.
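The training schedule quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the authors' (unreleased) code: the function names, the use of a step schedule, and the resolution-sampling helper are assumptions based only on the quoted description.

```python
import random

# The three training resolutions (H, W) quoted in the paper.
INPUT_SIZES = [(32, 128), (48, 192), (64, 256)]

def sample_input_size(rng=random):
    """Randomly pick one of the three training resolutions per batch (assumed granularity)."""
    return rng.choice(INPUT_SIZES)

def lr_at_epoch(epoch):
    """Step schedule quoted in the paper: 1e-4 initially, 1e-5 from epoch 3, 1e-6 from epoch 4."""
    if epoch >= 4:
        return 1e-6
    if epoch >= 3:
        return 1e-5
    return 1e-4
```

In a framework such as PyTorch, the same decay could be expressed with `torch.optim.Adam` plus `MultiStepLR(milestones=[3, 4], gamma=0.1)`, but the paper does not name the framework it used.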