Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement
Authors: Hang Guo, Tao Dai, Guanghao Meng, Shu-Tao Xia
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on TextZoom and four scene text recognition benchmarks demonstrate the superiority of our method over other state-of-the-art methods. |
| Researcher Affiliation | Collaboration | Hang Guo1, Tao Dai2, Guanghao Meng1,3, Shu-Tao Xia1,3; 1Tsinghua Shenzhen International Graduate School, Tsinghua University; 2College of Computer Science and Software Engineering, Shenzhen University; 3Peng Cheng Laboratory, Shenzhen, China |
| Pseudocode | No | The paper describes its methods in detail using natural language and diagrams (Figure 2, 3) but does not include any formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | Code is available at https://github.com/csguoh/LEMMA. |
| Open Datasets | Yes | Scene Text Image Super-resolution Dataset: TextZoom [Wang et al., 2020] is widely used in STISR works. This dataset is derived from two single image super-resolution datasets, RealSR [Cai et al., 2019] and SR-RAW [Zhang et al., 2019]. The images are captured by digital cameras in real-world scenes. In total, TextZoom contains 17367 LR-HR pairs for training and 4373 pairs for testing. |
| Dataset Splits | No | The paper explicitly states the size of the training and testing sets ('17367 LR-HR pairs for training and 4373 pairs for testing') but does not specify a separate validation split or its size. |
| Hardware Specification | No | The paper specifies training details like batch size, epochs, and learning rates, but it does not provide any specific information about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'ABINet [Fang et al., 2021] as the attention-based text recognizer' and 'Adam [Kingma and Ba, 2014] for optimization', but it does not specify version numbers for these software components or any other libraries/frameworks used. |
| Experiment Setup | Yes | We train our model with batch size 64 for 500 epochs using Adam [Kingma and Ba, 2014] for optimization. The learning rate is set to 1e-3 for the super-resolution network and 1e-4 for fine-tuning ABINet; both are decayed by a factor of 0.5 after 400 epochs. We follow the hyperparameters for the text-focus loss L_txt given in [Chen et al., 2021], namely λ1 = 10, λ2 = 0.0005. For the other hyperparameters, we use α1 = 0.5, α2 = 0.01; see the supplementary material for details. A configuration sketch of these settings follows the table. |
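
The Experiment Setup row above fully specifies the optimizer, learning rates, decay schedule, and loss weights, so a minimal configuration sketch is given below for readers checking reproducibility. It assumes PyTorch; `sr_model` and `abinet` are dummy stand-ins for the authors' actual networks (released at https://github.com/csguoh/LEMMA), and the loss composition is only indicated in comments rather than implemented, since the paper, not this sketch, defines it.

```python
# Minimal sketch of the reported training setup (PyTorch assumed).
# `sr_model` and `abinet` are placeholder modules, not the authors' code.
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

BATCH_SIZE = 64                    # reported batch size
NUM_EPOCHS = 500                   # reported number of training epochs
LAMBDA_1, LAMBDA_2 = 10, 0.0005    # text-focus loss weights (from [Chen et al., 2021]);
ALPHA_1, ALPHA_2 = 0.5, 0.01       # other loss weights reported in the paper.
                                   # Kept for reference only; the losses themselves
                                   # are not implemented in this skeleton.

# Dummy stand-ins so the sketch runs; replace with the real LEMMA modules.
sr_model = nn.Conv2d(3, 3, kernel_size=3, padding=1)
abinet = nn.Linear(128, 37)

# Adam with 1e-3 for the super-resolution branch and 1e-4 for fine-tuning ABINet.
optimizer = Adam([
    {"params": sr_model.parameters(), "lr": 1e-3},
    {"params": abinet.parameters(), "lr": 1e-4},
])

# Both learning rates are decayed by a factor of 0.5 after 400 epochs.
scheduler = MultiStepLR(optimizer, milestones=[400], gamma=0.5)

for epoch in range(NUM_EPOCHS):
    # ... one pass over the 17367 TextZoom training pairs would go here,
    # combining pixel, location-enhancement, and text-focus loss terms ...
    optimizer.step()
    scheduler.step()
```

Since the paper reports the decay as a single drop after epoch 400, `MultiStepLR` with one milestone is used here; any equivalent step-decay schedule would match the stated settings.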