DSRN: A Deep Scale Relationship Network for Scene Text Detection

Authors: Yuxin Wang, Hongtao Xie, Zilong Fu, Yongdong Zhang

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On standard datasets including ICDAR2015 and MSRA-TD500, the proposed algorithm achieves state-of-the-art performance with impressive speed (8.8 FPS on ICDAR2015 and 13.3 FPS on MSRA-TD500).
Researcher Affiliation | Academia | Yuxin Wang, Hongtao Xie, Zilong Fu and Yongdong Zhang, School of Information Science and Technology, University of Science and Technology of China. {wangyx58, Jerome F}@mail.ustc.edu.cn, {htxie, zhyd73}@ustc.edu.cn
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not include any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | ICDAR2015: a dataset for incidental scene text detection, proposed in Challenge 4 of the ICDAR 2015 Robust Reading Competition [Karatzas et al., 2015]. MSRA-TD500: a dataset proposed in [Yao et al., 2012] for detecting arbitrary-oriented and multi-lingual long text lines. HUST [Yao et al., 2014]: a dataset containing 400 images of Arabic numerals and English letters in different fonts, with text-line-level labels.
Dataset Splits | Yes | ICDAR2015 includes 1000 training images and 500 test images, annotated with the 4 vertices of a word-level quadrangle. MSRA-TD500 contains 300 images for training and 200 for testing. Since the training set is too small to learn a deep network, 400 images from HUST [Yao et al., 2014] are also used in the training stage.
Hardware Specification | Yes | Our proposed network is trained end-to-end on an NVIDIA TITAN X GPU using the ADAM [Kingma and Ba, 2014] optimizer.
Software Dependencies | No | The paper mentions the ADAM optimizer and a ResNet50 backbone, but does not specify version numbers for any software dependencies such as the programming language, framework (e.g., PyTorch, TensorFlow), or libraries.
Experiment Setup | Yes | We perform data augmentation by randomly cropping each image and resizing it to 512×512 for training. We update the learning rate with a multi-step strategy: the initial learning rate is 1e-3 and decays by 0.94 every 10k steps. We set the batch size to 14 and train until convergence. In the test stage, NMS is applied to reduce redundant results. We set θ and β to 1.7 and 0.3, respectively, in the training stage and resize images to 768×1280 in the test stage.
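
The Experiment Setup row pins down a reproducible optimization schedule. Since the DSRN code is not released, the following is only a minimal PyTorch-style sketch of that schedule: the model, labels, and loss below are dummy placeholders, while the Adam optimizer, initial learning rate 1e-3, decay factor 0.94 every 10k steps, batch size 14, and 512×512 input size come from the paper.

```python
import torch
import torch.nn as nn

# Stand-in model: the actual DSRN architecture (ResNet50-based) is not
# released, so a tiny dummy network is used purely to make the schedule run.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 1))

# Reported settings: ADAM optimizer, initial lr 1e-3, decayed by 0.94
# every 10k steps, batch size 14, random crops resized to 512x512.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10_000, gamma=0.94)

for step in range(30_000):  # the paper trains "until convergence"
    images = torch.randn(14, 3, 512, 512)   # stands in for augmented crops
    targets = torch.randn(14, 1, 512, 512)  # stands in for score-map labels
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(images), targets)  # placeholder loss
    loss.backward()
    optimizer.step()
    scheduler.step()  # per-iteration step, so lr *= 0.94 every 10k iterations
```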
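
The test stage relies on NMS to prune redundant detections, but the excerpt does not state which variant is used (scene-text detectors often prefer locality-aware or quadrangle NMS). For illustration only, here is a generic axis-aligned NMS in NumPy; the IoU threshold is an assumed parameter, not a value from the paper.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.3):
    """Generic axis-aligned NMS over boxes given as (x1, y1, x2, y2) rows.

    Illustrative baseline only; DSRN's exact post-processing variant is
    not specified in the paper excerpt quoted above.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # process highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the current box with all remaining candidates.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop candidates that overlap the kept box too strongly.
        order = order[1:][iou <= iou_thresh]
    return keep
```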