TextFuseNet: Scene Text Detection with Richer Fused Features
Authors: Jian Ye, Zhe Chen, Juhua Liu, Bo Du
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several datasets show that the proposed TextFuseNet achieves state-of-the-art performance. Specifically, we achieve an F-measure of 94.3% on ICDAR2013, 92.1% on ICDAR2015, 87.1% on Total-Text and 86.6% on CTW-1500, respectively. |
| Researcher Affiliation | Academia | Jian Ye (1), Zhe Chen (2), Juhua Liu (3) and Bo Du (1). (1) School of Computer Science, Institute of Artificial Intelligence, and National Engineering Research Center for Multimedia Software, Wuhan University, China; (2) UBTECH Sydney AI Centre, School of Computer Science, Faculty of Engineering, The University of Sydney, Australia; (3) School of Printing and Packaging, and Institute of Artificial Intelligence, Wuhan University, China. Emails: {leaf-yej, liujuhua, dubo}@whu.edu.cn, zhe.chen1@sydney.edu.au |
| Pseudocode | No | The paper describes the methodology using textual descriptions and architectural diagrams (Figure 2, Figure 3) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We implemented our framework based on the Maskrcnn-benchmark', indicating use of an existing open-source framework, but it does not explicitly state that the source code for TextFuseNet is released, nor does it provide a direct link to it. |
| Open Datasets | Yes | SynthText is a synthetically generated dataset that is usually used for pre-training text detection models. ICDAR2013 is a typical horizontal text dataset proposed in Challenge 2 of the ICDAR 2013 Robust Reading Competition. ICDAR2015 is a multi-oriented text dataset proposed in Challenge 4 of the ICDAR 2015 Robust Reading Competition. Total-Text is a comprehensive arbitrary-shape text dataset for scene text reading. CTW-1500 also focuses on arbitrary-shape text reading. |
| Dataset Splits | No | The paper specifies training and test image counts for datasets like ICDAR2013 (229 training, 233 test), ICDAR2015 (1000 training, 500 test), Total-Text (1255 training, 300 test), and CTW-1500 (1000 training, 500 test), but does not explicitly mention or detail a separate validation set split. |
| Hardware Specification | Yes | We implemented our framework based on the Maskrcnn-benchmark, and all experiments are conducted on a high-performance server with NVidia Tesla V100 (16G) GPUs. |
| Software Dependencies | No | The paper states 'We implemented our framework based on the Maskrcnn-benchmark' but does not provide specific version numbers for this framework or any other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The weight decay is set to 0.0001, momentum is set to 0.9, and batch size is set to 8. In the pre-training stage, the model is trained on SynthText for 20 epochs; the learning rate is set to 0.01 for the first 10 epochs and divided by 10 for the last 10 epochs. In the fine-tuning stage, the training iterations on every dataset are set to 20K; the learning rate is set to 0.005 for the first 10K iterations and divided by 10 for the remaining iterations. |
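For readers re-implementing the reported setup, the Experiment Setup row can be sketched as step-decay learning-rate schedules. This is a minimal illustration: the function names and the assumption that SGD is the optimizer are not stated in the report; only the numeric values (weight decay, momentum, batch size, learning rates, and schedule lengths) come from the paper.

```python
def pretrain_lr(epoch: int) -> float:
    """Pre-training on SynthText (20 epochs total, per the paper):
    0.01 for the first 10 epochs, divided by 10 for the last 10."""
    return 0.01 if epoch < 10 else 0.001


def finetune_lr(iteration: int) -> float:
    """Fine-tuning (20K iterations total, per the paper):
    0.005 for the first 10K iterations, divided by 10 afterwards."""
    return 0.005 if iteration < 10_000 else 0.0005


# Remaining reported settings. Grouping them under an SGD-style config
# is an assumption; the paper reports only the values themselves.
OPTIMIZER_CFG = {
    "weight_decay": 0.0001,
    "momentum": 0.9,
    "batch_size": 8,
}
```

In a maskrcnn-benchmark-style setup these values would typically be expressed in the YAML solver config rather than in code; the functions above just make the two-stage schedule explicit.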