C3-STISR: Scene Text Image Super-resolution with Triple Clues
Authors: Minyi Zhao, Miao Wang, Fan Bai, Bingjia Li, Jie Wang, Shuigeng Zhou
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | From the abstract: 'Extensive experiments on TextZoom show that C3-STISR outperforms the SOTA methods in fidelity and recognition performance.' From Section 4, Performance Evaluation: 'In this section, we first introduce the dataset and metrics used in the experiments and the implementation details. Then we compare our method with the state-of-the-art approaches. Finally, we conduct extensive ablation studies to validate the design of our method.' |
| Researcher Affiliation | Collaboration | Minyi Zhao¹, Miao Wang², Fan Bai¹, Bingjia Li¹, Jie Wang², Shuigeng Zhou¹. ¹Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China; ²ByteDance, China |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code is available in https://github.com/zhaominyiz/C3-STISR. |
| Open Datasets | Yes | The TextZoom [Wang et al., 2020] dataset consists of 21,740 LR-HR text image pairs collected by lens zooming of the camera in real-world scenarios. |
| Dataset Splits | No | The paper states: 'The training set has 17,367 pairs, while the test set is divided into three settings based on the camera focal length, namely easy (1,619 samples), medium (1,411 samples) and hard (1,343 samples).' It does not explicitly mention a separate validation set split or its size. |
| Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA Tesla V100 GPUs with 32GB memory. |
| Software Dependencies | Yes | Our model is implemented in PyTorch 1.8. |
| Experiment Setup | Yes | The model is trained using Adam [Kingma and Ba, 2014] optimizer with a learning rate of 0.001. The batch size is set to 48. The hyper-parameters in our method are set as follows: λ1 = 10, λ2 = 0.0005, k1 = 1.0, k2 = 1.0, α1 = 20, α2 = 20, α3 = 1, α4 = 0.2, C = 32, which are recommended in [Chen et al., 2021a; Ma et al., 2021]. |
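
The values quoted in the Dataset Splits and Experiment Setup rows can be collected into a small configuration record for anyone attempting a reproduction. The following is a minimal sketch using only the Python standard library. The class name, field names, and the `split_summary` helper are illustrative assumptions and are not taken from the C3-STISR repository; only the numeric values come from the table above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table; field names are hypothetical."""
    optimizer: str = "Adam"
    learning_rate: float = 1e-3
    batch_size: int = 48
    lambda1: float = 10.0
    lambda2: float = 0.0005
    k1: float = 1.0
    k2: float = 1.0
    alpha1: float = 20.0
    alpha2: float = 20.0
    alpha3: float = 1.0
    alpha4: float = 0.2
    C: int = 32


# Split sizes as quoted in the Dataset Splits row (keys are hypothetical).
TEXTZOOM_SPLITS = {
    "train": 17367,
    "test_easy": 1619,
    "test_medium": 1411,
    "test_hard": 1343,
}
# Total number of LR-HR pairs as quoted in the Open Datasets row.
TEXTZOOM_TOTAL_PAIRS = 21740


def split_summary(splits, total):
    """Compare the reported split sizes with the reported dataset total."""
    covered = sum(splits.values())
    test = sum(n for name, n in splits.items() if name.startswith("test"))
    return {"covered": covered, "test": test, "unassigned": total - covered}


setup = ReportedSetup()
summary = split_summary(TEXTZOOM_SPLITS, TEXTZOOM_TOTAL_PAIRS)
print(asdict(setup))
print(summary)
```

Under these figures the training set and the three test subsets together account for all 21,740 pairs, which is consistent with the Dataset Splits row: no pairs are left over for a separately reported validation set.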