CentripetalText: An Efficient Text Instance Representation for Scene Text Detection
Authors: Tao Sheng, Jie Chen, Zhouhui Lian
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the effectiveness of our method, we conduct experiments on several commonly used scene text benchmarks, including both curved and multi-oriented text datasets. For the task of scene text detection, our approach achieves superior or competitive performance compared to other existing methods, e.g., F-measure of 86.3% at 40.0 FPS on Total-Text, F-measure of 86.1% at 34.8 FPS on MSRA-TD500, etc. |
| Researcher Affiliation | Academia | Tao Sheng, Jie Chen, Zhouhui Lian Wangxuan Institute of Computer Technology Peking University, Beijing, China {shengtao, jiechen01, lianzhouhui}@pku.edu.cn |
| Pseudocode | No | The paper describes the model and inference steps but does not include structured pseudocode or an algorithm block. |
| Open Source Code | Yes | The source code is available at https://github.com/shengtao96/CentripetalText |
| Open Datasets | Yes | SynthText [6] is a synthetic dataset, consisting of more than 800,000 synthetic images. Total-Text [2] is a curved text dataset including 1,255 training images and 300 testing images. CTW1500 [47] is another curved text dataset including 1,000 training images and 500 testing images. MSRA-TD500 [45] is a multi-oriented text dataset which contains 300 training images and 200 testing images with text-line level annotation. |
| Dataset Splits | No | Total-Text [2] is a curved text dataset including 1,255 training images and 300 testing images. CTW1500 [47] is another curved text dataset including 1,000 training images and 500 testing images. MSRA-TD500 [45] is a multi-oriented text dataset which contains 300 training images and 200 testing images with text-line level annotation. Due to its small scale, we follow the previous works [51, 23] to include 400 extra training images from HUST-TR400 [44]. The paper specifies train/test splits but does not explicitly provide a separate validation dataset split with counts or percentages. |
| Hardware Specification | Yes | All those models are tested with a batch size of 1 on a GTX 1080Ti GPU without bells and whistles. |
| Software Dependencies | No | The paper mentions software components like 'ResNet', 'Adam optimizer', and 'OpenCV library' but does not specify their version numbers. |
| Experiment Setup | Yes | All models are optimized by the Adam optimizer with a batch size of 16 on 4 GPUs. We train our model under two training strategies: (1) learning from scratch; (2) fine-tuning models pre-trained on the SynthText dataset. Under either training strategy, we pre-train models on SynthText for 50k iterations with a fixed learning rate of 1×10⁻³, and train models on real datasets for 36k iterations with the poly learning rate strategy [50], where power is set to 0.9 and the initial learning rate is 1×10⁻³. In addition, we set the negative-positive ratio of OHEM to 3, and the shrinking rate of the text kernel to 0.7. |
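The poly learning-rate strategy quoted in the Experiment Setup row can be sketched as a simple decay function. This is a minimal illustration of the standard "poly" schedule with the paper's reported hyperparameters (initial learning rate 1×10⁻³, 36k iterations, power 0.9); the function name and structure are illustrative, not taken from the authors' released code.

```python
def poly_lr(initial_lr, iteration, max_iterations, power=0.9):
    """Poly decay: lr = initial_lr * (1 - iteration / max_iterations) ** power."""
    return initial_lr * (1 - iteration / max_iterations) ** power

# Schedule described in the table: initial lr 1e-3, 36k iterations, power 0.9.
lr_start = poly_lr(1e-3, 0, 36000)      # full initial rate at iteration 0
lr_mid = poly_lr(1e-3, 18000, 36000)    # decayed rate at the halfway point
lr_end = poly_lr(1e-3, 36000, 36000)    # decays to zero at the final iteration
```

In frameworks such as PyTorch, this decay is typically applied per iteration by rescaling each parameter group's learning rate, rather than per epoch.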