Scene Text Recognition from Two-Dimensional Perspective

Authors: Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai (pp. 8714-8721)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that the proposed algorithm outperforms previous methods on both regular and irregular text datasets. Moreover, it is proven to be more robust to imprecise localizations in the text detection phase, which are very common in practice.
Researcher Affiliation | Collaboration | (1) Huazhong University of Science and Technology, (2) Megvii (Face++)
Pseudocode | No | The paper describes formulas and steps for label generation and loss-function calculation but does not provide a formal pseudocode block or algorithm.
Open Source Code | No | The paper provides no statement or link indicating that the code for its method is open-sourced.
Open Datasets | Yes | SynthText is a synthetic text dataset proposed in (Gupta, Vedaldi, and Zisserman 2016). It contains 800,000 training images aimed at text detection; cropping them by their word bounding boxes yields about 7 million images for text recognition, each with character-level annotations.
Dataset Splits | No | The paper mentions training data and test datasets but does not explicitly describe a validation split for hyperparameter tuning or early stopping during training.
Hardware Specification | Yes | We test our method with a single Titan Xp GPU.
Software Dependencies | No | The paper mentions using Adam (Kingma and Ba 2014) as the optimizer but does not specify versions of other key software components, such as the deep learning framework (e.g., TensorFlow, PyTorch) or the programming language.
Experiment Setup | Yes | Input images are randomly resized to 32 × 128, 48 × 192, or 64 × 256. Data augmentation is applied during training, including random rotation (angle sampled from [-15°, 15°]), hue, brightness, contrast, and blur. Training uses Adam (Kingma and Ba 2014) with an initial learning rate of 10^-4, decayed to 10^-5 at epoch 3 and 10^-6 at epoch 4; the model is trained for about 5 epochs in total.
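The training schedule quoted in the Experiment Setup row can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the authors' (unreleased) code: the function names, the use of a step schedule, and the resolution-sampling helper are assumptions based only on the quoted description.

```python
import random

# The three training resolutions (H, W) quoted in the paper.
INPUT_SIZES = [(32, 128), (48, 192), (64, 256)]

def sample_input_size(rng=random):
    """Randomly pick one of the three training resolutions per batch (assumed granularity)."""
    return rng.choice(INPUT_SIZES)

def lr_at_epoch(epoch):
    """Step schedule quoted in the paper: 1e-4 initially, 1e-5 from epoch 3, 1e-6 from epoch 4."""
    if epoch >= 4:
        return 1e-6
    if epoch >= 3:
        return 1e-5
    return 1e-4
```

In a framework such as PyTorch, the same decay could be expressed with `torch.optim.Adam` plus `MultiStepLR(milestones=[3, 4], gamma=0.1)`, but the paper does not name the framework it used.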