Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement

Authors: Hang Guo, Tao Dai, Guanghao Meng, Shu-Tao Xia

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on TextZoom and four scene text recognition benchmarks demonstrate the superiority of our method over other state-of-the-art methods.
Researcher Affiliation | Collaboration | Hang Guo (1), Tao Dai (2), Guanghao Meng (1,3), Shu-Tao Xia (1,3); (1) Tsinghua Shenzhen International Graduate School, Tsinghua University; (2) College of Computer Science and Software Engineering, Shenzhen University; (3) Peng Cheng Laboratory, Shenzhen, China
Pseudocode | No | The paper describes its methods in detail using natural language and diagrams (Figures 2 and 3) but does not include any formal pseudocode blocks or algorithms.
Open Source Code | Yes | Code is available at https://github.com/csguoh/LEMMA.
Open Datasets | Yes | TextZoom [Wang et al., 2020] is widely used in STISR works. This dataset is derived from two single-image super-resolution datasets, RealSR [Cai et al., 2019] and SR-RAW [Zhang et al., 2019]. The images are captured by digital cameras in real-world scenes. In total, TextZoom contains 17367 LR-HR pairs for training and 4373 pairs for testing. (A data-loading sketch follows the table.)
Dataset Splits | No | The paper explicitly states the sizes of the training and testing sets ('17367 LR-HR pairs for training and 4373 pairs for testing') but does not specify a separate validation split or its size.
Hardware Specification | No | The paper specifies training details such as batch size, epochs, and learning rates, but gives no information about the hardware (e.g., GPU model, CPU, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using 'ABINet [Fang et al., 2021] as the attention-based text recognizer' and 'Adam [Kingma and Ba, 2014] for optimization', but does not specify version numbers for these components or for any other libraries or frameworks.
Experiment Setup | Yes | We train our model with batch size 64 for 500 epochs, using Adam [Kingma and Ba, 2014] for optimization. The learning rate is set to 1e-3 for the super-resolution network and 1e-4 for fine-tuning ABINet, both decayed by a factor of 0.5 after 400 epochs. We adopt the hyperparameters for Ltxt given in [Chen et al., 2021], namely λ1 = 10, λ2 = 0.0005. For the other hyperparameters, we use α1 = 0.5, α2 = 0.01; see the supplementary material for details. (A configuration sketch follows the table.)
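
TextZoom is typically distributed as LMDB archives. Below is a minimal sketch of reading its LR-HR pairs; the key layout (num-samples, image_lr-%09d, image_hr-%09d, label-%09d) is an assumption carried over from common STISR code releases, not something stated in the paper, so verify it against the LEMMA repository before relying on it.

```python
# Minimal sketch: reading TextZoom-style LR-HR pairs from an LMDB archive.
# ASSUMPTION: per-sample keys are 'image_lr-%09d', 'image_hr-%09d', and
# 'label-%09d' (1-indexed), plus a global 'num-samples' entry, following
# common STISR code releases; check against the LEMMA repository.
import io

import lmdb
from PIL import Image
from torch.utils.data import Dataset


class TextZoomPairs(Dataset):
    def __init__(self, lmdb_root: str):
        self.env = lmdb.open(lmdb_root, readonly=True, lock=False, readahead=False)
        with self.env.begin(write=False) as txn:
            self.num_samples = int(txn.get(b"num-samples"))  # 17367 train / 4373 test

    def __len__(self) -> int:
        return self.num_samples

    def __getitem__(self, idx: int):
        key_idx = idx + 1  # keys are 1-indexed in this layout
        with self.env.begin(write=False) as txn:
            lr = Image.open(io.BytesIO(txn.get(b"image_lr-%09d" % key_idx))).convert("RGB")
            hr = Image.open(io.BytesIO(txn.get(b"image_hr-%09d" % key_idx))).convert("RGB")
            label = txn.get(b"label-%09d" % key_idx).decode()
        return lr, hr, label
```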
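
To make the quoted setup concrete, here is a hedged PyTorch sketch of the optimizers and learning-rate schedule. The placeholder modules stand in for the super-resolution network and ABINet, and the pairing of α1/α2 with specific loss terms is an assumption (the paper defers those details to its supplementary material); only the numeric values come from the passage above.

```python
# Hedged sketch of the reported optimization setup (PyTorch assumed).
import torch
from torch import nn

sr_model = nn.Conv2d(3, 3, 3, padding=1)  # placeholder for the SR network
recognizer = nn.Linear(64, 37)            # placeholder for fine-tuned ABINet

opt_sr = torch.optim.Adam(sr_model.parameters(), lr=1e-3)
opt_rec = torch.optim.Adam(recognizer.parameters(), lr=1e-4)

# Both learning rates decay once, by a factor of 0.5, after epoch 400 of 500.
sched_sr = torch.optim.lr_scheduler.MultiStepLR(opt_sr, milestones=[400], gamma=0.5)
sched_rec = torch.optim.lr_scheduler.MultiStepLR(opt_rec, milestones=[400], gamma=0.5)

BATCH_SIZE, EPOCHS = 64, 500
lambda1, lambda2 = 10.0, 5e-4  # weights on L_txt, following [Chen et al., 2021]
alpha1, alpha2 = 0.5, 0.01     # ASSUMPTION: weights on the remaining loss terms

for epoch in range(EPOCHS):
    # ... one training pass over the 17367 TextZoom pairs would go here ...
    sched_sr.step()
    sched_rec.step()
```

A single MultiStepLR milestone at epoch 400 reproduces the stated one-time decay for both optimizers.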