Improving Scene Text Image Super-resolution via Dual Prior Modulation Network
Authors: Shipeng Zhu, Zuoyan Zhao, Pengfei Fang, Hui Xue
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate that our method improves the image quality and boosts the performance of downstream tasks over five typical approaches on the benchmark. Substantial visualizations and ablation studies demonstrate the advantages of the proposed DPMN. |
| Researcher Affiliation | Academia | ¹School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; ²MOE Key Laboratory of Computer Network and Information Integration (Southeast University), China. {shipengzhu, zuoyanzhao, fangpengfei, hxue}@seu.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/jdfxzzy/DPMN. |
| Open Datasets | Yes | The STISR benchmark TextZoom (Wang et al. 2020) is collected in real-world scenarios. It consists of 17,367 LR-HR image pairs for training and 4,373 pairs for testing. |
| Dataset Splits | No | The paper specifies training and testing splits, but does not explicitly mention a separate validation set or its size. |
| Hardware Specification | Yes | all the experiments are conducted on one RTX 3090 GPU. |
| Software Dependencies | Yes | We implement our model with PyTorch 1.10 deep learning library (Paszke et al. 2019) |
| Experiment Setup | Yes | The learning rate is set to 0.001, and the size of the mini-batch is 48. We empirically observe that the loss function is insensitive to the parameter λ, and we set all the λ to 1. In the inference phase, the output fusion ratio α is selected based on the pre-trained baselines. The size of original SR results and modulated images is 32 × 128. We apply the pre-trained VisionLANs (Wang et al. 2021b) in PGRMs as the text prior generators. In terms of the network architecture, the number of PGRMs in each branch, N, is set to 3. In the ViT block, the window numbers of the DW-MCA and the DSW-MCA are 2, 4, and 8 with patch size 2, while the head number of the MCA is set to 6. |
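The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch, which makes the reported values easy to check at a glance. All field and function names below are illustrative assumptions for a reproduction attempt; only the values come from the paper.

```python
# Hypothetical configuration collecting the hyperparameters reported in the
# paper. Field names are our own; values are as quoted in the review above.
DPMN_CONFIG = {
    "learning_rate": 1e-3,            # optimizer learning rate
    "batch_size": 48,                 # mini-batch size
    "loss_lambda": 1.0,               # all loss weights λ set to 1 (reported insensitive)
    "image_size": (32, 128),          # height x width of SR results and modulated images
    "num_pgrm_per_branch": 3,         # N, number of PGRMs in each branch
    "vit_window_numbers": (2, 4, 8),  # window numbers for DW-MCA / DSW-MCA
    "vit_patch_size": 2,              # patch size in the ViT block
    "mca_heads": 6,                   # head number of the MCA
    "text_prior_generator": "VisionLAN",  # pre-trained recognizer used in PGRMs
    "gpu": "RTX 3090",                # hardware reported for all experiments
}

def describe(cfg: dict) -> str:
    """Return a short human-readable summary of the training setup."""
    h, w = cfg["image_size"]
    return (f"lr={cfg['learning_rate']}, batch={cfg['batch_size']}, "
            f"images {h}x{w}, N={cfg['num_pgrm_per_branch']}")

print(describe(DPMN_CONFIG))
```

Such a flat dictionary is enough here because every reported hyperparameter is a scalar or small tuple; a reproduction would pass it to the model and training-loop constructors.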