Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning

Authors: Xinyan Zu, Haiyang Yu, Bin Li, Xiangyang Xue

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The experimental results on multiple VTS benchmarks demonstrate that the proposed VLSpotter outperforms the existing state-of-the-art methods in end-to-end video text spotting." "Extensive experiments of text spotting, tracking and detection are conducted on three VTS benchmarks, ICDAR2013 Video [Karatzas et al., 2013], ICDAR2015 Video [Karatzas et al., 2015], and BOVText [Wu et al., 2021], to evaluate the effectiveness of the proposed VLSpotter."
Researcher Affiliation | Academia | "Xinyan Zu, Haiyang Yu, Bin Li, Xiangyang Xue. Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University. {xyzu20, hyyu20, libin, xyxue}@fudan.edu.cn"
Pseudocode | No | The paper describes its methods in prose and with a system diagram (Figure 2) but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code of VLSpotter is available at GitHub (https://github.com/FudanVI/FudanOCR/VLSpotter)."
Open Datasets | Yes | "We conduct experiments on three commonly-used datasets: ICDAR2013 Video [Karatzas et al., 2013], ICDAR2015 Video [Karatzas et al., 2015], and BOVText [Wu et al., 2021]."
Dataset Splits | Yes | "ICDAR2013 Video ... 13 videos are used for training and 15 videos for testing. ... ICDAR2015 Video ... 25 videos for training and 24 videos for testing. ... BOVText ... 1,328,575 frames from 1,541 videos are used for training and 429,023 frames from 480 videos are used for testing." (See the split summary after the table.)
Hardware Specification | Yes | "All experiments are conducted on a single RTX 3090 GPU with 24GB memory."
Software Dependencies | No | The paper states "The proposed VLSpotter is implemented with PyTorch." but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | "Based on the empirical experiments, we set the hyper-parameters ε, σ, ϕ, λ1, λ2 to 1, 0.5, 3, 1, 1, respectively. We use the Adadelta optimizer with an initial learning rate 0.1, which further shrinks every 200 epochs." (See the training-setup sketch after the table.)
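
For quick reference, the quoted train/test splits can be collected in a small Python dict. Only the counts come from the paper; the dict structure and the name VTS_SPLITS are ours for illustration.

    # Hypothetical summary of the splits quoted in the "Dataset Splits" row.
    VTS_SPLITS = {
        "ICDAR2013 Video": {"train_videos": 13, "test_videos": 15},
        "ICDAR2015 Video": {"train_videos": 25, "test_videos": 24},
        "BOVText": {
            "train_videos": 1_541, "train_frames": 1_328_575,
            "test_videos": 480, "test_frames": 429_023,
        },
    }

    for name, split in VTS_SPLITS.items():
        print(f"{name}: {split}")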
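
The quoted training setup maps onto standard PyTorch components. Below is a minimal sketch, assuming Adadelta with an initial learning rate of 0.1 and a StepLR schedule that steps every 200 epochs; the decay factor gamma=0.1 and the placeholder model are assumptions, since the paper only says the learning rate "shrinks every 200 epochs" and defines VLSpotter itself in the linked repository.

    import torch
    from torch import nn
    from torch.optim import Adadelta
    from torch.optim.lr_scheduler import StepLR

    # Placeholder model; the real VLSpotter architecture is in the authors' repo.
    model = nn.Linear(10, 10)

    # Adadelta with an initial learning rate of 0.1, as quoted from the paper.
    optimizer = Adadelta(model.parameters(), lr=0.1)

    # "shrinks every 200 epochs": the decay factor is not given, so
    # gamma=0.1 here is an assumption.
    scheduler = StepLR(optimizer, step_size=200, gamma=0.1)

    for epoch in range(600):
        # ... forward pass, loss weighted by lambda1 = lambda2 = 1, backward ...
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()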