CentripetalText: An Efficient Text Instance Representation for Scene Text Detection

Authors: Tao Sheng, Jie Chen, Zhouhui Lian

NeurIPS 2021

Reproducibility Variable — Result — LLM Response
Research Type — Experimental — "To validate the effectiveness of our method, we conduct experiments on several commonly used scene text benchmarks, including both curved and multi-oriented text datasets. For the task of scene text detection, our approach achieves superior or competitive performance compared to other existing methods, e.g., F-measure of 86.3% at 40.0 FPS on Total-Text, F-measure of 86.1% at 34.8 FPS on MSRA-TD500, etc."
Researcher Affiliation — Academia — "Tao Sheng, Jie Chen, Zhouhui Lian; Wangxuan Institute of Computer Technology, Peking University, Beijing, China; {shengtao, jiechen01, lianzhouhui}@pku.edu.cn"
Pseudocode — No — The paper describes the model and inference steps but does not include structured pseudocode or an algorithm block.
Open Source Code — Yes — "The source code is available at https://github.com/shengtao96/CentripetalText"
Open Datasets — Yes — "SynthText [6] is a synthetic dataset, consisting of more than 800,000 synthetic images. Total-Text [2] is a curved text dataset including 1,255 training images and 300 testing images. CTW1500 [47] is another curved text dataset including 1,000 training images and 500 testing images. MSRA-TD500 [45] is a multi-oriented text dataset which contains 300 training images and 200 testing images with text-line level annotation."
Dataset Splits — No — "Total-Text [2] is a curved text dataset including 1,255 training images and 300 testing images. CTW1500 [47] is another curved text dataset including 1,000 training images and 500 testing images. MSRA-TD500 [45] is a multi-oriented text dataset which contains 300 training images and 200 testing images with text-line level annotation. Due to its small scale, we follow the previous works [51, 23] to include 400 extra training images from HUST-TR400 [44]." The paper specifies train/test splits but does not explicitly provide a separate validation split with counts or percentages.
Hardware Specification — Yes — "All those models are tested with a batch size of 1 on a GTX 1080Ti GPU without bells and whistles."
Software Dependencies — No — The paper mentions software components such as ResNet, the Adam optimizer, and the OpenCV library, but does not specify their version numbers.
Experiment Setup — Yes — "All models are optimized by the Adam optimizer with a batch size of 16 on 4 GPUs. We train our model under two training strategies: (1) learning from scratch; (2) fine-tuning models pre-trained on the SynthText dataset. Under either training strategy, we pre-train models on SynthText for 50k iterations with a fixed learning rate of 1×10⁻³, and train models on real datasets for 36k iterations with the poly learning rate strategy [50], where power is set to 0.9 and the initial learning rate is 1×10⁻³. In addition, we set the negative-positive ratio of OHEM to 3, and the shrinking rate of the text kernel to 0.7."
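Two of the quantities quoted above can be made concrete in code for anyone checking reproducibility: the F-measure (the harmonic mean of detection precision and recall) and the poly learning-rate decay, lr = base_lr × (1 − iter/max_iters)^power with power 0.9. The sketch below is ours, not taken from the paper's released code; the function names and the example precision/recall pair are illustrative assumptions.

```python
def f_measure(precision: float, recall: float) -> float:
    """F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def poly_lr(base_lr: float, iteration: int, max_iters: int, power: float = 0.9) -> float:
    """Poly learning-rate decay: base_lr * (1 - iteration/max_iters) ** power."""
    return base_lr * (1.0 - iteration / max_iters) ** power


# Illustrative precision/recall values, NOT numbers reported in the paper:
print(round(f_measure(0.88, 0.85), 4))  # 0.8647

# Settings quoted in the setup: initial lr 1e-3, 36k iterations, power 0.9.
print(poly_lr(1e-3, 0, 36_000))       # full base rate at the first iteration
print(poly_lr(1e-3, 18_000, 36_000))  # decayed to 0.5**0.9 of the base rate
```

The schedule decays smoothly to zero over the 36k real-data iterations, which matches the quoted "poly learning rate strategy" with power 0.9.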