Exploring Stroke-Level Modifications for Scene Text Editing

Authors: Yadong Qu, Qingfeng Tan, Hongtao Xie, Jianjun Xu, YuXin Wang, Yongdong Zhang

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments demonstrate that our MOSTEL outperforms previous methods both qualitatively and quantitatively." [...] "The datasets used for training and evaluation are introduced as follows." [...] "Table 1: Quantitative results on Tamper-Syn2k and Tamper-Scene."
Researcher Affiliation | Academia | ¹ University of Science and Technology of China; ² Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
Pseudocode | No | The paper describes the proposed network and modules but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Datasets and code will be available at https://github.com/qqqyd/MOSTEL."
Open Datasets | Yes | "Real Data. We use MLT-2017 to train MOSTEL on real-world scene text images, including 34,625 images. [...] In addition, two new STE datasets named Tamper-Syn2k and Tamper-Scene are proposed to evaluate the performance of scene text editors. As far as we know, they are the first publicly available STE evaluation datasets, which will significantly facilitate a fair comparison of STE methods and promote the development of the STE task."
Dataset Splits | No | The paper specifies the training data (150k synthetic images, MLT-2017) and the evaluation datasets (Tamper-Syn2k, Tamper-Scene) but does not define a separate validation split of the training data for hyperparameter tuning.
Hardware Specification | Yes | "MOSTEL only needs to be trained 3 days using 1 NVIDIA 2080Ti GPU."
Software Dependencies | No | The paper mentions a pre-trained text recognizer (Baek et al. 2019) but gives no version numbers for key software dependencies such as the programming language (e.g., Python), the deep learning framework (e.g., PyTorch, TensorFlow), or CUDA.
Experiment Setup | Yes | "Input images are resized to 256 × 64. We adopt Adam optimizer with β1 = 0.9 and β2 = 0.999, and learning rate is set to 5 × 10⁻⁵. We totally train 300k iterations with a batch size of 16, consisting of 14 labeled synthetic image pairs and 2 unannotated real scene text images. Style Augmentation in Pre-Transformation includes random rotation with an angle from −15 to 15 and random flipping with a probability of 0.5 during the training stage." (See the training sketch below the table.)
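
The Experiment Setup row is specific enough to reconstruct the training configuration in code. The sketch below is a minimal, hypothetical PyTorch/torchvision rendering of it, not the authors' released implementation: the paper does not name its framework, and the network, data, and loss here are random placeholders. Only the input size, augmentation parameters, optimizer settings, batch composition, and iteration count are taken from the paper.

```python
# Hypothetical PyTorch reconstruction of the reported training setup.
# Only the numeric hyperparameters come from the paper; every name below
# (the stand-in model, the batch generator) is a placeholder.
from itertools import islice

import torch
import torch.nn as nn
import torchvision.transforms as T

# Inputs are resized to 256 x 64; torchvision's Resize takes (height, width).
# In a real pipeline this would be applied to raw images before batching.
resize = T.Resize((64, 256))

# Style Augmentation in Pre-Transformation: random rotation in [-15, 15]
# degrees and random horizontal flip with probability 0.5 (training only).
style_aug = T.Compose([
    T.RandomRotation(degrees=15),
    T.RandomHorizontalFlip(p=0.5),
])

model = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # stand-in for MOSTEL

# Adam with beta1 = 0.9, beta2 = 0.999 and learning rate 5e-5.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5, betas=(0.9, 0.999))

def mixed_batches():
    """Yield one training batch per iteration: 14 labeled synthetic image
    pairs plus 2 unannotated real images (batch size 16). Random tensors
    stand in for the synthetic data and the MLT-2017 real images."""
    while True:
        synthetic = torch.randn(14, 3, 64, 256)
        real = torch.randn(2, 3, 64, 256)
        yield style_aug(synthetic), real

# The paper trains 300k iterations; trimmed to 100 here for illustration.
for synthetic, real in islice(mixed_batches(), 100):
    output = model(torch.cat([synthetic, real]))  # placeholder forward pass
    loss = output.pow(2).mean()                   # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The 14:2 split inside each batch reflects the semi-supervised setup quoted above: fully labeled synthetic pairs trained jointly with unlabeled real scene text images.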