Scene Text Detection in Video by Learning Locally and Globally

Authors: Shu Tian, Wei-Yi Pei, Ze-Yu Zuo, Xu-Cheng Yin

IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Moreover, our proposed techniques are extensively evaluated on several public scene video text databases, and are much better than the state-of-the-art methods. Experimental results show that our approach significantly outperforms the state-of-the-art methods on all datasets.
Researcher Affiliation | Academia | Shu Tian, Wei-Yi Pei, Ze-Yu Zuo, and Xu-Cheng Yin, Department of Computer Science, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China. Corresponding author: xuchengyin@ustb.edu.cn
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | Moreover, our proposed system is verified on a variety of public scene text video databases, i.e., the Minetto [Minetto et al., 2011] and ICDAR 15 datasets [Karatzas et al., 2015a]. To evaluate our tracking based detection approach, a public video dataset with a variety of scene videos is first used in our experiments (footnote 2: http://www.liv.ic.unicamp.br/~minetto/datasets/text/VIDEOS/). Moreover, we also perform experiments of our method on the recent challenging dataset of the ICDAR 2015 Robust Reading Competition Challenge 3 (Text Detection and Recognition in Scene Videos) (footnote 3: http://rrc.cvc.uab.es/?ch=3&com=introduction).
Dataset Splits | No | The MSRA-TD500 dataset is a multi-orientation dataset with 500 images, where 300 images are for training and the rest are for testing. This dataset includes a training set of 25 videos (13450 frames in total) and a test set of 24 videos (14374 frames in total). While training and test splits are mentioned, a validation set is not explicitly specified or quantified, nor are full split details provided for all datasets. (A hedged sanity-check sketch for the quoted video split sizes follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using a "CNN-based word recognition technique" but does not specify any software libraries, frameworks, or solvers with version numbers.
Experiment Setup | No | The paper describes the overall system and some algorithmic details, but it does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific training configurations.
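
The split sizes quoted in the Dataset Splits row can serve as a quick sanity check when reproducing the experiments. The sketch below is not from the paper: the directory layout (data/icdar2015_text_in_videos/<split>/<video_id>/*.jpg), the helper names, and the frame extension are assumptions for illustration; only the video and frame counts come from the evidence quoted above.

from dataclasses import dataclass
from pathlib import Path


@dataclass
class VideoSplit:
    name: str        # split name, e.g. "train" or "test"
    num_videos: int  # number of videos reported for this split
    num_frames: int  # number of frames reported for this split


# Counts quoted in the "Dataset Splits" row (ICDAR 2015 Challenge 3, Text in Videos).
ICDAR15_VIDEO_SPLITS = [
    VideoSplit("train", num_videos=25, num_frames=13450),
    VideoSplit("test", num_videos=24, num_frames=14374),
]


def check_split(root: Path, split: VideoSplit, frame_ext: str = ".jpg") -> bool:
    """Compare on-disk video/frame counts against the reported split sizes.

    Assumes a hypothetical layout: <root>/<split.name>/<video_id>/<frame>.jpg
    """
    split_dir = root / split.name
    if not split_dir.is_dir():
        return False
    video_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
    frame_count = sum(len(list(d.glob(f"*{frame_ext}"))) for d in video_dirs)
    return len(video_dirs) == split.num_videos and frame_count == split.num_frames


if __name__ == "__main__":
    root = Path("data/icdar2015_text_in_videos")  # hypothetical local path
    for split in ICDAR15_VIDEO_SPLITS:
        ok = check_split(root, split)
        print(f"{split.name}: matches reported counts = {ok}")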