Feature Enhancement Network: A Refined Scene Text Detector
Authors: Sheng Zhang, Yuliang Liu, Lianwen Jin, Canjie Luo
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on ICDAR 2011 and 2013 robust text detection benchmarks demonstrate that our method can achieve state-of-the-art results, outperforming all reported methods in terms of F-measure. |
| Researcher Affiliation | Academia | Sheng Zhang, Yuliang Liu, Lianwen Jin, Canjie Luo; School of Electronic and Information Engineering, South China University of Technology; zsscut90@gmail.com, liu.yuliang@mail.scut.edu.cn, {lianwen.jin, canjie.luo}@gmail.com |
| Pseudocode | No | The paper includes architectural diagrams (e.g., Figure 1) and describes algorithmic steps in prose, but it does not contain formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | To prove the effectiveness of our approach, we have tested it on two challenging benchmark datasets, i.e. ICDAR 2011 (Shahab, Shafait, and Dengel 2011) and ICDAR 2013 (Karatzas et al. 2013) robust text detection datasets. ... we also gather about 4000 real scene images for training our network. |
| Dataset Splits | No | The paper mentions training and testing on ICDAR 2011 and 2013 datasets, but does not explicitly describe a validation set or specific train/validation/test splits with percentages or sample counts. |
| Hardware Specification | Yes | All the experiments are carried out on a PC with one Titan X GPU. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific libraries with their versions). |
| Experiment Setup | Yes | During the training procedure, we choose similar multi-task loss functions for both the text region proposal and text detection refinement stages, i.e. $L(s, c, b, g) = \frac{1}{N}\left(L_{cls}(s, c) + \lambda L_{loc}(b, g)\right)$, where $N$ is the number of anchors or proposals that match ground-truth boxes, and $\lambda$ ($\lambda = 1$) is a balance factor that weighs the importance of the two losses... Basically, the batch size of input images in the R-FCN (Dai et al. 2016) framework is only one... Our approach and the original R-FCN are both trained and tested with short side 720, except for the multi-scale test. |
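The quoted setup centers on a two-term objective: a classification loss and a box-regression loss, weighted by $\lambda$ ($\lambda = 1$) and normalized by the number $N$ of anchors or proposals matched to ground truth. The sketch below is a minimal PyTorch rendering of that formula, not the authors' code: it assumes R-FCN conventions (cross-entropy for $L_{cls}$, smooth-L1 over matched anchors for $L_{loc}$), and all function and argument names are illustrative.

```python
import torch
import torch.nn.functional as F

def multitask_loss(cls_logits: torch.Tensor,   # (A, 2) text/non-text scores
                   cls_targets: torch.Tensor,  # (A,) labels: 1 = text, 0 = background
                   loc_preds: torch.Tensor,    # (A, 4) predicted box offsets
                   loc_targets: torch.Tensor,  # (A, 4) ground-truth regression targets
                   matched_mask: torch.Tensor, # (A,) bool, True if matched to a GT box
                   lam: float = 1.0) -> torch.Tensor:
    """L(s, c, b, g) = (1/N) * (L_cls(s, c) + lam * L_loc(b, g))."""
    # N: number of anchors/proposals matched to ground-truth boxes (the paper's N)
    n = matched_mask.sum().clamp(min=1).float()

    # Classification term over all sampled anchors/proposals
    l_cls = F.cross_entropy(cls_logits, cls_targets, reduction="sum")

    # Localization term only over matched (positive) anchors, as in R-FCN
    l_loc = F.smooth_l1_loss(loc_preds[matched_mask],
                             loc_targets[matched_mask],
                             reduction="sum")

    # lam is the balance factor; the paper sets it to 1
    return (l_cls + lam * l_loc) / n
```

Summing both terms and then dividing once by $N$ mirrors the single $\frac{1}{N}$ normalization in the quoted formula; how each term is sampled and normalized beyond that is not specified in the excerpt, so the R-FCN-style choices above are assumptions.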