Orientation-Independent Chinese Text Recognition in Scene Images

Authors: Haiyang Yu, Xiaocong Wang, Bin Li, Xiangyang Xue

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct experiments on a scene dataset for benchmarking Chinese text recognition, and the results demonstrate that the proposed method can indeed improve performance through disentangling content and orientation information. To further validate the effectiveness of our method, we additionally collect a Vertical Chinese Text Recognition (VCTR) dataset. The experimental results show that the proposed method achieves 45.63% improvement on VCTR when introducing CIRN to the baseline model."
Researcher Affiliation | Academia | Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University; {hyyu20, xcwang20, libin, xyxue}@fudan.edu.cn
Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the paper.
Open Source Code | Yes | "The code of our method and VCTR dataset are available at GitHub." https://github.com/FudanVI/FudanOCR/orientation-independent-CTR
Open Datasets | Yes | "The scene dataset is collected by [Chen et al., 2021c] and derives from six existing datasets, including RCTW [Shi et al., 2017], ReCTS [Zhang et al., 2019], LSVT [Sun et al., 2019], ArT [Chng et al., 2019], and CTW [Yuan et al., 2019]. ... To validate the effectiveness of our method in tackling vertical text images, we collect a Vertical Chinese Text Recognition (VCTR) dataset from PosterErase [Jiang et al., 2022]."
Dataset Splits | Yes | "This dataset contains 509,164 samples for training, 63,645 for validation, and 63,646 for test."
Hardware Specification | Yes | "Our method is implemented with PyTorch, and all experiments are conducted on an NVIDIA RTX 2080Ti GPU with 11GB memory."
Software Dependencies | No | The paper mentions PyTorch but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | "Our method is implemented with PyTorch, and all experiments are conducted on an NVIDIA RTX 2080Ti GPU with 11GB memory. The AdaDelta [Zeiler, 2012] optimizer is adopted to train our model with an initial learning rate 1.0, and the hyperparameters ρ and weight decay are set to 0.9 and 10⁻⁴, respectively. The batch size is set to 64. For fair comparison with the previous method [Chen et al., 2021c], the input text images are resized into 32×256. For vertical text images, we follow [Li et al., 2019] to rotate them by 90 degrees anti-clockwise. However, different from the rule in [Li et al., 2019] that regards the samples with height larger than width as vertical ones, we assume that the samples with height larger than 1.5× the width are vertical text images."
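The reported setup (AdaDelta with lr 1.0, ρ 0.9, weight decay 10⁻⁴, batch size 64, 32×256 input, and the height > 1.5 × width verticality rule with a 90° anti-clockwise rotation) can be sketched in PyTorch as below. This is a minimal reconstruction from the quoted hyperparameters only; the `model` is a hypothetical placeholder, since the actual baseline-plus-CIRN architecture is defined in the authors' repository, not here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def is_vertical(height: int, width: int) -> bool:
    # Paper's rule: samples with height > 1.5 * width are vertical text,
    # stricter than the height > width criterion of Li et al. [2019].
    return height > 1.5 * width


def preprocess(image: torch.Tensor) -> torch.Tensor:
    # image: (C, H, W). Vertical images are rotated 90 degrees
    # anti-clockwise before resizing, as described in the paper.
    _, h, w = image.shape
    if is_vertical(h, w):
        image = torch.rot90(image, k=1, dims=(1, 2))
    # Resize to 32x256 (height x width) for fair comparison with
    # Chen et al. [2021c].
    return F.interpolate(image.unsqueeze(0), size=(32, 256),
                         mode="bilinear", align_corners=False).squeeze(0)


# Placeholder model (hypothetical); stands in for the baseline + CIRN.
model = nn.Linear(32 * 256 * 3, 10)

# AdaDelta with the reported hyperparameters:
# lr 1.0, rho 0.9, weight decay 1e-4; batch size 64.
optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0,
                                 rho=0.9, weight_decay=1e-4)
batch_size = 64
```

Note that `torch.optim.Adadelta` exposes ρ directly as its `rho` argument, so the quoted hyperparameters map one-to-one onto the optimizer's constructor.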