Exploring Stroke-Level Modifications for Scene Text Editing
Authors: Yadong Qu, Qingfeng Tan, Hongtao Xie, Jianjun Xu, Yuxin Wang, Yongdong Zhang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our MOSTEL outperforms previous methods both qualitatively and quantitatively. [...] The datasets used for training and evaluation are introduced as follows. [...] Table 1: Quantitative results on Tamper-Syn2k and Tamper-Scene. |
| Researcher Affiliation | Academia | 1 University of Science and Technology of China; 2 Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China |
| Pseudocode | No | The paper describes the proposed network and modules but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Datasets and code will be available at https://github.com/qqqyd/MOSTEL. |
| Open Datasets | Yes | Real Data. We use MLT-2017 to train MOSTEL on real-world scene text images, including 34,625 images. [...] In addition, two new STE datasets named Tamper-Syn2k and Tamper-Scene are proposed to evaluate the performance of scene text editors. As far as we know, they are the first publicly available STE evaluation datasets, which will significantly facilitate a fair comparison of STE methods and promote the development of the STE task. |
| Dataset Splits | No | The paper specifies training data (150k synthetic images, MLT-2017) and evaluation datasets (Tamper-Syn2k, Tamper-Scene) but does not explicitly define a separate validation dataset split from the training data for hyperparameter tuning. |
| Hardware Specification | Yes | MOSTEL only needs to be trained for 3 days using a single NVIDIA 2080Ti GPU. |
| Software Dependencies | No | The paper mentions using a pre-trained text recognizer (Baek et al. 2019) but does not provide specific version numbers for other key software dependencies like programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or CUDA versions used for the main model training. |
| Experiment Setup | Yes | Input images are resized to 256 × 64. We adopt the Adam optimizer with β1 = 0.9 and β2 = 0.999, and the learning rate is set to 5 × 10⁻⁵. We train for 300k iterations in total with a batch size of 16, consisting of 14 labeled synthetic image pairs and 2 unannotated real scene text images. Style Augmentation in Pre-Transformation includes random rotation with an angle from −15° to 15° and random flipping with a probability of 0.5 during the training stage. |
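
For readers reproducing the reported setup, the following is a minimal PyTorch/torchvision sketch of the hyperparameters quoted above. The model stand-in and the transform pipeline are illustrative placeholders, not the authors' implementation (the actual MOSTEL architecture and training code are at https://github.com/qqqyd/MOSTEL):

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Reported training schedule and batch composition.
TOTAL_ITERS = 300_000
BATCH_SYNTH = 14   # labeled synthetic image pairs per batch
BATCH_REAL = 2     # unannotated real scene text images per batch (16 total)

# Input resize (256 wide x 64 high) plus the reported Style Augmentation:
# random rotation in [-15, 15] degrees and random flip with p = 0.5.
train_transform = transforms.Compose([
    transforms.Resize((64, 256)),          # torchvision expects (height, width)
    transforms.RandomRotation(degrees=15), # samples uniformly from [-15, +15]
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])

# Placeholder network standing in for MOSTEL, so the optimizer line runs.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)

# Adam with the reported betas and learning rate of 5 x 10^-5.
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=5e-5,
    betas=(0.9, 0.999),
)
```

Note the per-batch mix (14 synthetic pairs plus 2 real images) implies two data sources sampled jointly each iteration; how the real, unannotated images enter the loss is part of the semi-supervised scheme described in the paper, not shown here.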