DSRN: A Deep Scale Relationship Network for Scene Text Detection
Authors: Yuxin Wang, Hongtao Xie, Zilong Fu, Yongdong Zhang
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On standard datasets including ICDAR2015 and MSRA-TD500, the proposed algorithm achieves state-of-the-art performance with impressive speed (8.8 FPS on ICDAR2015 and 13.3 FPS on MSRA-TD500). |
| Researcher Affiliation | Academia | Yuxin Wang, Hongtao Xie, Zilong Fu and Yongdong Zhang, School of Information Science and Technology, University of Science and Technology of China. {wangyx58, JeromeF}@mail.ustc.edu.cn, {htxie, zhyd73}@ustc.edu.cn |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not include any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | ICDAR2015. This is a dataset for incidental scene text detection proposed in Challenge 4 of the ICDAR 2015 Robust Reading Competition [Karatzas et al., 2015]. MSRA-TD500. This is a dataset proposed in [Yao et al., 2012] for detecting arbitrarily oriented and multi-lingual long text lines. HUST. [Yao et al., 2014] is a dataset of 400 images, consisting of Arabic numerals and English letters in different fonts, with text-line-level labels. |
| Dataset Splits | Yes | ICDAR2015. It includes 1000 training images and 500 test images, annotated with the 4 vertices of a word-level quadrangle. MSRA-TD500. It contains 300 images for training and 200 images for testing. Since the training set is too small to learn a deep network, we also use 400 images from HUST [Yao et al., 2014] in the training stage. |
| Hardware Specification | Yes | Our proposed network is trained end-to-end on an NVIDIA TITAN X GPU using the ADAM [Kingma and Ba, 2014] optimizer. |
| Software Dependencies | No | The paper mentions using ADAM optimizer and ResNet50 as a basic network, but does not specify version numbers for any software dependencies like programming languages, frameworks (e.g., PyTorch, TensorFlow), or libraries. |
| Experiment Setup | Yes | We perform data augmentation by randomly cropping each image and resizing it to 512×512 for training. We update the learning rate with a multi-step strategy: the initial learning rate is 1e-3 and decays by a factor of 0.94 every 10k steps. We set the batch size to 14 and train until convergence. In the test stage, NMS is applied to remove redundant results. We set θ and β to 1.7 and 0.3 respectively in the training stage and resize images to 768×1280 in the test stage. (Hedged sketches of the learning-rate schedule and the NMS step follow this table.) |
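
The learning-rate schedule in the Experiment Setup row can be pinned down from the reported numbers alone. Below is a minimal Python sketch; since the paper names no framework (see the Software Dependencies row), the function and its argument names are our own illustration, built only from the stated values (initial rate 1e-3, multiplied by 0.94 every 10k steps).

```python
def learning_rate(step, initial_lr=1e-3, decay_rate=0.94, decay_steps=10_000):
    """Return the learning rate at a given training step.

    Sketch of the multi-step schedule reported for DSRN: the rate starts
    at 1e-3 and is multiplied by 0.94 once every 10k steps. The function
    name and signature are assumptions; only the constants come from the paper.
    """
    return initial_lr * (decay_rate ** (step // decay_steps))

# Example: after 35k steps the rate has decayed three times.
assert abs(learning_rate(35_000) - 1e-3 * 0.94**3) < 1e-12
```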
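The test stage applies NMS to reduce redundant detections, but the paper gives no overlap threshold and its detections are word-level quadrangles. The sketch below is therefore generic greedy NMS over axis-aligned boxes, not the authors' exact post-processing; the `(x1, y1, x2, y2)` box layout and `iou_threshold=0.5` are assumptions for illustration.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over axis-aligned boxes (x1, y1, x2, y2).

    Repeatedly keeps the highest-scoring remaining box and drops every other
    box whose IoU with it exceeds `iou_threshold` (threshold is assumed;
    the paper does not report one).
    """
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the top box with every box still in the queue.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]
    return keep

# Usage: the near-duplicate of the first box is suppressed, the distant one kept.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # -> [0, 2]
```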