Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting

Authors: Liang Qiao, Sanli Tang, Zhanzhan Cheng, Yunlu Xu, Yi Niu, Shiliang Pu, Fei Wu (pp. 11899-11907)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that our method achieves competitive performance on two standard text benchmarks, i.e., ICDAR 2013 and ICDAR 2015, and also obviously outperforms existing methods on irregular text benchmarks SCUT-CTW1500 and Total-Text."
Researcher Affiliation | Collaboration | Liang Qiao (1), Sanli Tang (1), Zhanzhan Cheng (2,1), Yunlu Xu (1), Yi Niu (1), Shiliang Pu (1), Fei Wu (2); 1: Hikvision Research Institute, China; 2: Zhejiang University, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | "The code will be published soon."
Open Datasets | Yes | The datasets used in this work are listed as follows: SynthText 800k (Gupta, Vedaldi, and Zisserman 2016) contains 800k synthetic images... ICDAR 2013 (Karatzas et al. 2013)... ICDAR 2015 (Karatzas et al. 2015)... Total-Text (Ch'ng and Chan 2017)... SCUT-CTW1500 (Liu et al. 2019a)...
Dataset Splits | No | The paper mentions 'validation' as part of the overall schema but does not explicitly provide training/validation/test splits (percentages, counts, or references to predefined splits) in the text.
Hardware Specification | Yes | "All experiments are implemented in Caffe with 8 32GB-Tesla-V100 GPUs."
Software Dependencies | No | The paper mentions 'Caffe' but does not provide specific version numbers for software components or libraries.
Experiment Setup | Yes | Training details: The networks are trained by SGD with batch size 8, momentum 0.9, and weight decay 5×10^-4. The detection and recognition parts are separately pretrained on SynthText for 5 epochs with initial learning rate 2×10^-3. The whole network is then jointly fine-tuned, using the soft loss-weight strategy mentioned previously, on each dataset for another 80 epochs with initial learning rate 1×10^-3, divided by 10 every 20 epochs. Online hard example mining (OHEM) (Shrivastava, Gupta, and Girshick 2016) is also applied to balance foreground and background samples. Data augmentation: 1) randomly scaling the longer side of input images to a length in [720, 1600]; 2) randomly rotating the images by a degree in [-15°, 15°]; 3) applying random brightness, jitter, and contrast to input images.
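The fine-tuning schedule reported above (initial learning rate 1×10^-3, divided by 10 every 20 epochs over 80 epochs) can be sketched in a few lines. This is an illustrative pure-Python helper, not the authors' code — the original experiments were run in Caffe, and the function name and constants here are hypothetical:

```python
# Step-decay learning-rate schedule as described in the paper's
# fine-tuning setup. All names are illustrative (the paper used Caffe).
INITIAL_LR = 1e-3    # initial fine-tuning learning rate
DECAY_FACTOR = 10    # learning rate is divided by 10 ...
STEP_EPOCHS = 20     # ... every 20 epochs
TOTAL_EPOCHS = 80    # total fine-tuning epochs

def lr_at_epoch(epoch: int) -> float:
    """Learning rate in effect at a given 0-indexed fine-tuning epoch."""
    if not 0 <= epoch < TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in [0, {TOTAL_EPOCHS})")
    return INITIAL_LR / (DECAY_FACTOR ** (epoch // STEP_EPOCHS))

# The schedule steps through 1e-3, 1e-4, 1e-5, and 1e-6.
schedule = {e: lr_at_epoch(e) for e in (0, 20, 40, 60)}
```

Under this reading, the learning rate drops three times during fine-tuning, ending around 1×10^-6 for the final 20 epochs.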