Improving Scene Text Image Super-resolution via Dual Prior Modulation Network
Authors: Shipeng Zhu, Zuoyan Zhao, Pengfei Fang, Hui Xue
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate that our method improves the image quality and boosts the performance of downstream tasks over five typical approaches on the benchmark. Substantial visualizations and ablation studies demonstrate the advantages of the proposed DPMN. |
| Researcher Affiliation | Academia | ¹School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; ²MOE Key Laboratory of Computer Network and Information Integration (Southeast University), China. {shipengzhu, zuoyanzhao, fangpengfei, hxue}@seu.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/jdfxzzy/DPMN. |
| Open Datasets | Yes | The STISR benchmark TextZoom (Wang et al. 2020) is collected in real-world scenarios. It consists of 17,367 LR-HR image pairs for training and 4,373 pairs for testing. |
| Dataset Splits | No | The paper specifies training and testing splits, but does not explicitly mention a separate validation set or its size. |
| Hardware Specification | Yes | all the experiments are conducted on one RTX 3090 GPU. |
| Software Dependencies | Yes | We implement our model with PyTorch 1.10 deep learning library (Paszke et al. 2019) |
| Experiment Setup | Yes | The learning rate is set to 0.001, and the size of the mini-batch is 48. We empirically observe that the loss function is insensitive to the parameter λ, and we set all the λ to 1. In the inference phase, the output fusion ratio α is selected based on the pre-trained baselines. The size of original SR results and modulated images is 32 × 128. We apply the pre-trained VisionLANs (Wang et al. 2021b) in PGRMs as the text prior generators. In terms of the network architecture, the number of PGRMs in each branch, N, is set to 3. In the ViT block, the window numbers of the DW-MCA and the DSW-MCA are 2, 4, and 8 with patch size 2, while the head number of the MCA is set to 6. |
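The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch, which makes the reported values easy to check at a glance. All field and function names below are illustrative assumptions for a reproduction attempt; only the values come from the paper.

```python
# Hypothetical configuration collecting the hyperparameters reported in the
# paper. Field names are our own; values are as quoted in the review above.
DPMN_CONFIG = {
    "learning_rate": 1e-3,            # optimizer learning rate
    "batch_size": 48,                 # mini-batch size
    "loss_lambda": 1.0,               # all loss weights λ set to 1 (reported insensitive)
    "image_size": (32, 128),          # height x width of SR results and modulated images
    "num_pgrm_per_branch": 3,         # N, number of PGRMs in each branch
    "vit_window_numbers": (2, 4, 8),  # window numbers for DW-MCA / DSW-MCA
    "vit_patch_size": 2,              # patch size in the ViT block
    "mca_heads": 6,                   # head number of the MCA
    "text_prior_generator": "VisionLAN",  # pre-trained recognizer used in PGRMs
    "gpu": "RTX 3090",                # hardware reported for all experiments
}

def describe(cfg: dict) -> str:
    """Return a short human-readable summary of the training setup."""
    h, w = cfg["image_size"]
    return (f"lr={cfg['learning_rate']}, batch={cfg['batch_size']}, "
            f"images {h}x{w}, N={cfg['num_pgrm_per_branch']}")

print(describe(DPMN_CONFIG))
```

Such a flat dictionary is enough here because every reported hyperparameter is a scalar or small tuple; a reproduction would pass it to the model and training-loop constructors.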