Orientation-Independent Chinese Text Recognition in Scene Images

Authors: Haiyang Yu, Xiaocong Wang, Bin Li, Xiangyang Xue

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct experiments on a scene dataset for benchmarking Chinese text recognition, and the results demonstrate that the proposed method can indeed improve performance through disentangling content and orientation information. To further validate the effectiveness of our method, we additionally collect a Vertical Chinese Text Recognition (VCTR) dataset. The experimental results show that the proposed method achieves 45.63% improvement on VCTR when introducing CIRN to the baseline model."
Researcher Affiliation | Academia | Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University; {hyyu20, xcwang20, libin, xyxue}@fudan.edu.cn
Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the paper.
Open Source Code | Yes | "The code of our method and VCTR dataset are available at GitHub." https://github.com/FudanVI/FudanOCR/orientation-independent-CTR
Open Datasets | Yes | "The scene dataset is collected by [Chen et al., 2021c] and derives from six existing datasets, including RCTW [Shi et al., 2017], ReCTS [Zhang et al., 2019], LSVT [Sun et al., 2019], ArT [Chng et al., 2019], and CTW [Yuan et al., 2019]. ... To validate the effectiveness of our method in tackling vertical text images, we collect a Vertical Chinese Text Recognition (VCTR) dataset from PosterErase [Jiang et al., 2022]."
Dataset Splits | Yes | "This dataset contains 509,164 samples for training, 63,645 for validation, and 63,646 for test."
Hardware Specification | Yes | "Our method is implemented with PyTorch, and all experiments are conducted on an NVIDIA RTX 2080Ti GPU with 11GB memory."
Software Dependencies | No | The paper mentions PyTorch but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | "Our method is implemented with PyTorch, and all experiments are conducted on an NVIDIA RTX 2080Ti GPU with 11GB memory. The AdaDelta [Zeiler, 2012] optimizer is adopted to train our model with an initial learning rate 1.0, and the hyperparameters ρ and weight decay are set to 0.9 and 10⁻⁴, respectively. The batch size is set to 64. For fair comparison with the previous method [Chen et al., 2021c], the input text images are resized into 32×256. For vertical text images, we follow [Li et al., 2019] to rotate them by 90 degrees anti-clockwise. However, different from the rule in [Li et al., 2019] that regards the samples with height larger than width as vertical ones, we assume that the samples with height larger than 1.5× the width are vertical text images."
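The reported setup (AdaDelta with lr 1.0, ρ 0.9, weight decay 10⁻⁴, batch size 64, 32×256 input, and the height > 1.5 × width verticality rule with a 90° anti-clockwise rotation) can be sketched in PyTorch as below. This is a minimal reconstruction from the quoted hyperparameters only; the `model` is a hypothetical placeholder, since the actual baseline-plus-CIRN architecture is defined in the authors' repository, not here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def is_vertical(height: int, width: int) -> bool:
    # Paper's rule: samples with height > 1.5 * width are vertical text,
    # stricter than the height > width criterion of Li et al. [2019].
    return height > 1.5 * width


def preprocess(image: torch.Tensor) -> torch.Tensor:
    # image: (C, H, W). Vertical images are rotated 90 degrees
    # anti-clockwise before resizing, as described in the paper.
    _, h, w = image.shape
    if is_vertical(h, w):
        image = torch.rot90(image, k=1, dims=(1, 2))
    # Resize to 32x256 (height x width) for fair comparison with
    # Chen et al. [2021c].
    return F.interpolate(image.unsqueeze(0), size=(32, 256),
                         mode="bilinear", align_corners=False).squeeze(0)


# Placeholder model (hypothetical); stands in for the baseline + CIRN.
model = nn.Linear(32 * 256 * 3, 10)

# AdaDelta with the reported hyperparameters:
# lr 1.0, rho 0.9, weight decay 1e-4; batch size 64.
optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0,
                                 rho=0.9, weight_decay=1e-4)
batch_size = 64
```

Note that `torch.optim.Adadelta` exposes ρ directly as its `rho` argument, so the quoted hyperparameters map one-to-one onto the optimizer's constructor.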