Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning
Authors: Xinyan Zu, Haiyang Yu, Bin Li, Xiangyang Xue
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results on multiple VTS benchmarks demonstrate that the proposed VLSpotter outperforms the existing state-of-the-art methods in end-to-end video text spotting. Extensive experiments of text spotting, tracking and detection are conducted on three VTS benchmarks, ICDAR2013 Video [Karatzas et al., 2013], ICDAR2015 Video [Karatzas et al., 2015], and BOVText [Wu et al., 2021], to evaluate the effectiveness of the proposed VLSpotter. |
| Researcher Affiliation | Academia | Xinyan Zu, Haiyang Yu, Bin Li, Xiangyang Xue; Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University; {xyzu20, hyyu20, libin, xyxue}@fudan.edu.cn |
| Pseudocode | No | The paper describes its methods in prose and with a system diagram (Figure 2) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code of VLSpotter is available at GitHub: https://github.com/FudanVI/FudanOCR/VLSpotter |
| Open Datasets | Yes | We conduct experiments on three commonly-used datasets: ICDAR2013 Video [Karatzas et al., 2013], ICDAR2015 Video [Karatzas et al., 2015], and BOVText [Wu et al., 2021]. |
| Dataset Splits | Yes | ICDAR2013 Video ... 13 videos are used for training and 15 videos for testing. ... ICDAR2015 Video ... 25 videos for training and 24 videos for testing. ... BOVText ... 1,328,575 frames from 1,541 videos are used for training and 429,023 frames from 480 videos are used for testing. |
| Hardware Specification | Yes | All experiments are conducted on a single RTX 3090 GPU with 24GB memory. |
| Software Dependencies | No | The paper states 'The proposed VLSpotter is implemented with PyTorch.' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Based on the empirical experiments, we set the hyper-parameters ε, σ, ϕ, λ1, λ2 to 1, 0.5, 3, 1, 1, respectively. We use the Adadelta optimizer with an initial learning rate 0.1, which further shrinks every 200 epochs. |
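The training recipe quoted above (Adadelta optimizer, initial learning rate 0.1, shrinking every 200 epochs) can be sketched as a simple step-decay schedule. Note that the quoted excerpt does not state the shrink factor, so the `gamma=0.1` below is an assumption, not a value from the paper.

```python
def scheduled_lr(epoch: int, base_lr: float = 0.1, step: int = 200, gamma: float = 0.1) -> float:
    """Step-decay schedule: the learning rate shrinks by `gamma` every `step` epochs.

    `base_lr=0.1` and `step=200` come from the paper's experiment setup;
    `gamma=0.1` is an assumed decay factor (the paper only says the rate
    "shrinks every 200 epochs" without giving the factor).
    """
    return base_lr * gamma ** (epoch // step)

# In PyTorch this would correspond to (a sketch, not the authors' code):
#   optimizer = torch.optim.Adadelta(model.parameters(), lr=0.1)
#   scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.1)
```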