Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Text-Aware Real-World Image Super-Resolution via Diffusion Model with Joint Segmentation Decoders
Authors: Qiming Hu, Linlong Fan, Yiyan Luo, Yuhang Yu, Xiaojie Guo, Qingnan Fan
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our approach substantially enhances text legibility in super-resolved images, achieving state-of-the-art performance across multiple evaluation metrics and exhibiting strong generalization to real-world scenarios. Our code is available at here. |
| Researcher Affiliation | Collaboration | Qiming Hu¹,², Linlong Fan², Yiyan Luo², Yuhang Yu², Xiaojie Guo¹, Qingnan Fan²; ¹College of Intelligence and Computing, Tianjin University; ²vivo Mobile Communication Co. Ltd |
| Pseudocode | No | The paper describes the methodology in Section 3, including equations and architectural diagrams (Figure 2), but does not present a structured pseudocode or algorithm block. |
| Open Source Code | Yes | Our code is available at here. |
| Open Datasets | Yes | Existing text segmentation datasets, such as TextSeg [37] and BTS [38], ... General-purpose super-resolution datasets such as DIV2K [1], Flicker2K [27], and LSDIR [13] ... Scene text image super-resolution datasets like Real-CE [18] typically provide both low-resolution and high-resolution image pairs in text-rich scenarios... We then apply the trained model to infer text segmentation maps for the CTR [3] dataset... |
| Dataset Splits | Yes | The FTSR dataset contains a total of 50,000 triplets (x_L, x_H, s), where the first 45,000 triplets are allocated for training, and the remaining 5,000 are used for testing. Since some image pairs in the Real-CE dataset are misaligned [46], we manually filtered out such samples, resulting in 337 training pairs and 189 testing pairs. |
| Hardware Specification | Yes | Training is conducted on four H20 GPUs with a per-GPU batch size of 1 for 200,000 iterations. |
| Software Dependencies | No | We train the model using the PyTorch framework with the AdamW optimizer and a fixed learning rate of 5 × 10⁻⁵. |
| Experiment Setup | Yes | The diffusion time step t is fixed as 200. We train the model using the PyTorch framework with the AdamW optimizer and a fixed learning rate of 5 × 10⁻⁵. Training is conducted on four H20 GPUs with a per-GPU batch size of 1 for 200,000 iterations. |
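
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The sketch below is illustrative only and is not taken from the authors' code: the class name `TrainConfig` and its field names are hypothetical, and the derived quantities assume plain data-parallel training with no gradient accumulation, which the excerpt does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the values quoted in the table above."""

    diffusion_timestep: int = 200   # "The diffusion time step t is fixed as 200"
    optimizer: str = "AdamW"        # optimizer named in the excerpt
    learning_rate: float = 5e-5     # fixed learning rate
    num_gpus: int = 4               # "four H20 GPUs"
    per_gpu_batch_size: int = 1     # per-GPU batch size
    iterations: int = 200_000       # training iterations

    @property
    def effective_batch_size(self) -> int:
        # Assumption: data parallelism without gradient accumulation.
        return self.num_gpus * self.per_gpu_batch_size

    @property
    def samples_seen(self) -> int:
        # Total training samples processed under the same assumption.
        return self.effective_batch_size * self.iterations


cfg = TrainConfig()
print(cfg.effective_batch_size)
print(cfg.samples_seen)
```

Under these assumptions the effective batch size is 4 and training processes 800,000 samples in total. If the authors used gradient accumulation or a different parallelism scheme, both figures would change.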