Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning

Authors: Xinyan Zu, Haiyang Yu, Bin Li, Xiangyang Xue

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The experimental results on multiple VTS benchmarks demonstrate that the proposed VLSpotter outperforms the existing state-of-the-art methods in end-to-end video text spotting." "Extensive experiments of text spotting, tracking and detection are conducted on three VTS benchmarks, ICDAR2013 Video [Karatzas et al., 2013], ICDAR2015 Video [Karatzas et al., 2015], and BOVText [Wu et al., 2021], to evaluate the effectiveness of the proposed VLSpotter."
Researcher Affiliation | Academia | Xinyan Zu, Haiyang Yu, Bin Li, Xiangyang Xue; Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University
Pseudocode | No | The paper describes its methods in prose and with a system diagram (Figure 2) but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code of VLSpotter is available at GitHub: https://github.com/FudanVI/FudanOCR/VLSpotter"
Open Datasets | Yes | "We conduct experiments on three commonly-used datasets: ICDAR2013 Video [Karatzas et al., 2013], ICDAR2015 Video [Karatzas et al., 2015], and BOVText [Wu et al., 2021]."
Dataset Splits | Yes | "ICDAR2013 Video ... 13 videos are used for training and 15 videos for testing. ... ICDAR2015 Video ... 25 videos for training and 24 videos for testing. ... BOVText ... 1,328,575 frames from 1,541 videos are used for training and 429,023 frames from 480 videos are used for testing."
Hardware Specification | Yes | "All experiments are conducted on a single RTX 3090 GPU with 24GB memory."
Software Dependencies | No | The paper states "The proposed VLSpotter is implemented with PyTorch." but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | "Based on the empirical experiments, we set the hyper-parameters ε, σ, ϕ, λ1, λ2 to 1, 0.5, 3, 1, 1, respectively. We use the Adadelta optimizer with an initial learning rate 0.1, which further shrinks every 200 epochs."