Scene Text Detection with Supervised Pyramid Context Network

Authors: Enze Xie, Yuhang Zang, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li (pp. 9038-9045)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on standard datasets demonstrate that our SPCNET clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 92.1% on ICDAR2013, 87.2% on ICDAR2015, 74.1% on ICDAR2017 MLT and 82.9% on Total-Text.
Researcher Affiliation | Collaboration | Enze Xie,1,3 Yuhang Zang,2,3 Shuai Shao,3 Gang Yu,3 Cong Yao,3 Guangyao Li1. Affiliations: 1 Department of Computer Science and Technology, Tongji University; 2 School of Information and Software Engineering, University of Electronic Science and Technology of China; 3 Megvii (Face++) Technology Inc.
Pseudocode | No | The paper presents architectural diagrams and mathematical equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any explicit statement about releasing open-source code or provide a link to a code repository.
Open Datasets | Yes | SynthText (Gupta, Vedaldi, and Zisserman 2016) is a synthetically generated dataset of 800,000 synthetic images; the authors use its word-level labels to pre-train the model. ICDAR2017 MLT (Nayef et al. 2017) is a dataset that focuses on multi-oriented, multi-script, and multi-lingual scene text. ... ICDAR2015 (Karatzas et al. 2015) is a dataset proposed for incidental scene text detection. ... ICDAR2013 (Karatzas et al. 2013) is a dataset that targets horizontal text in the scene. ... Total-Text (Ch'ng and Chan 2017) is a newly released benchmark for curved text detection.
Dataset Splits | Yes | ICDAR2017 MLT (Nayef et al. 2017) is a dataset that focuses on multi-oriented, multi-script, and multi-lingual scene text. It consists of 7,200 training images, 1,800 validation images, and 9,000 test images. We use both the training set and the validation set to train our model.
Hardware Specification | No | The paper states 'The network only takes 6h and 1h to complete training when use 8 GPUs' but does not provide specific GPU models, CPU types, or other detailed hardware specifications.
Software Dependencies | No | The paper mentions software components such as 'ResNet50', 'Mask R-CNN', 'Adam as optimizer', and 'OpenCV' but does not provide specific version numbers for any of them.
Experiment Setup | Yes | We use Adam as optimizer with batch size 16, momentum 0.9 and weight decay 1e-4 in training. ... The initial learning rate is 2 * 10^-3 for all experiments. ... The aspect ratios of anchors are set to 1/5, 1/2, 1, 2, 5 for all experiments.
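The anchor aspect ratios reported above (1/5, 1/2, 1, 2, 5) can be turned into concrete anchor shapes the way standard RPN-style detectors such as Mask R-CNN do: each ratio r fixes h/w while the anchor area stays constant. The sketch below is illustrative only and is not the authors' code; `base_size` is a hypothetical anchor scale (real implementations use several scales per feature level).

```python
import math

def make_anchors(base_size, ratios=(1/5, 1/2, 1, 2, 5)):
    """Return (w, h) pairs with area base_size**2 and h/w == ratio.

    Area-preserving construction: w = s / sqrt(r), h = s * sqrt(r),
    so that w * h == s**2 for every ratio r.
    """
    anchors = []
    for r in ratios:
        w = base_size / math.sqrt(r)
        h = base_size * math.sqrt(r)
        anchors.append((w, h))
    return anchors

# Wide ratios (r < 1) suit horizontal text lines; tall ratios (r > 1)
# cover vertical text, which is common in the multi-lingual MLT data.
print(make_anchors(32))
```

The extreme ratios 1/5 and 5 extend the default Mask R-CNN set (1/2, 1, 2), which is a common adaptation for scene text since text instances are often much longer in one dimension than ordinary objects.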