Feature Enhancement Network: A Refined Scene Text Detector

Authors: Sheng Zhang, Yuliang Liu, Lianwen Jin, Canjie Luo

Venue: AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on ICDAR 2011 and 2013 robust text detection benchmarks demonstrate that our method can achieve state-of-the-art results, outperforming all reported methods in terms of F-measure.
Researcher Affiliation | Academia | Sheng Zhang, Yuliang Liu, Lianwen Jin, Canjie Luo; School of Electronic and Information Engineering, South China University of Technology; zsscut90@gmail.com, liu.yuliang@mail.scut.edu.cn, {lianwen.jin, canjie.luo}@gmail.com
Pseudocode | No | The paper includes architectural diagrams (e.g., Figure 1) and describes algorithmic steps in prose, but it does not contain formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | To prove the effectiveness of our approach, we have tested it on two challenging benchmark datasets, i.e. ICDAR 2011 (Shahab, Shafait, and Dengel 2011) and ICDAR 2013 (Karatzas et al. 2013) robust text detection datasets. ... we also gather about 4000 real scene images for training our network.
Dataset Splits | No | The paper mentions training and testing on the ICDAR 2011 and 2013 datasets, but does not explicitly describe a validation set or specific train/validation/test splits with percentages or sample counts.
Hardware Specification | Yes | All the experiments are carried out on a PC with one Titan X GPU.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific libraries with their versions).
Experiment Setup | Yes | During the training procedure, we choose similar multitask loss functions for both the text region proposal and text detection refinement stages, i.e. $L(s, c, b, g) = \frac{1}{N}\left(L_{cls}(s, c) + \lambda L_{loc}(b, g)\right)$, where $N$ is the number of anchors or proposals that match ground-truth boxes, and $\lambda$ ($\lambda = 1$) is a balance factor that weighs the importance of the two losses. ... Basically, the batch size of input images in the R-FCN (Dai et al. 2016) framework is only one. ... Our approach and the original R-FCN are both trained and tested with short side 720, except for the multi-scale test.
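
For readers attempting to reproduce the quoted training objective, below is a minimal sketch of that multitask loss. Since the paper releases no code, the function name, tensor shapes, and the choice of smooth L1 for $L_{loc}$ (the standard choice in the R-FCN/Fast R-CNN family) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def multitask_loss(cls_scores, cls_targets, box_preds, box_targets,
                   matched_mask, lam=1.0):
    """Sketch of L(s, c, b, g) = (1/N) * (L_cls(s, c) + lam * L_loc(b, g)).

    Assumed (hypothetical) shapes:
      cls_scores:   (A, num_classes) predicted class logits s
      cls_targets:  (A,) ground-truth labels c (0 = background)
      box_preds:    (A, 4) predicted box offsets b
      box_targets:  (A, 4) box regression targets g
      matched_mask: (A,) bool, True for anchors/proposals matched to a
                    ground-truth box; N is the number of such matches.
    """
    # N: number of matched anchors/proposals; clamp avoids division by zero.
    n = matched_mask.sum().clamp(min=1).float()

    # Classification loss L_cls over all sampled anchors/proposals.
    l_cls = F.cross_entropy(cls_scores, cls_targets, reduction="sum")

    # Localization loss L_loc only on matched (positive) anchors/proposals.
    l_loc = F.smooth_l1_loss(box_preds[matched_mask],
                             box_targets[matched_mask], reduction="sum")

    # Balance factor lambda = 1 per the quoted setup.
    return (l_cls + lam * l_loc) / n
```

The same form of loss would apply at both stages described in the quote (text region proposal and text detection refinement), with $N$ counting matched anchors in the first stage and matched proposals in the second.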