Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

4KAgent: Agentic Any Image to 4K Super-Resolution

Authors: Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang, Yide Zhang, Gengchen Mai, Lihong Wang, James Y Zou, Xiaoyu Wang, Ming-Hsuan Yang, Zhengzhong Tu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We rigorously evaluate our 4KAgent across 11 distinct task categories encompassing a total of 26 diverse benchmarks, setting new state-of-the-art on a broad spectrum of imaging domains. Our evaluations cover natural images, portrait photos, AI-generated content, satellite imagery, fluorescence microscopy, and medical imaging like fundoscopy, ultrasound, and X-ray, demonstrating superior performance in terms of both perceptual (e.g., NIQE, MUSIQ) and fidelity (e.g., PSNR) metrics.
Researcher Affiliation	Collaboration	1Texas A&M University 2Stanford University 3Snap Inc. 4CU Boulder 5UT Austin 6California Institute of Technology 7Topaz Labs 8UC Merced
Pseudocode	No	The paper describes the system architecture and workflow in detail with figures (e.g., Figure 2: 4KAgent system overview) and descriptive text, but it does not contain any explicit section or block labeled "Pseudocode" or "Algorithm" with a structured, code-like format for any part of its methodology.
Open Source Code	Yes	We release all the code, models, and results at: https://4kagent.github.io.
Open Datasets	Yes	To evaluate 4K super-resolution performances, we build the DIV4K-50 dataset as a challenging test set to upscale a low-quality (LQ) image in 256 × 256 resolution with multiple degradations to a high-quality (HQ) 4K image in 4096 × 4096 resolution. ... The summary of datasets used in experiments is shown in Tab. 6, which can be classified as natural degraded images (C, D), AI-generated images (E), and scientific images (F).
Dataset Splits	Yes	The full dataset contains 9,937 patches for each cell extracted from scanning confocal volumes, with tiles 9, 10, 14, 20 used as the test set. ... The final bc SR dataset consists of 1,200 unique images, which were split into a 1,000-image training set and a 200-image test set. ... Chest X-ray 2017 is a dataset of 5,856 pediatric images from Guangzhou Women and Children's Medical Centre, split into 5,232 images for training and 624 images for testing. ... DRIVE [176] consists of 40 color fundus images from a diabetic retinopathy screening program in the Netherlands collected by a Canon CR5 non-mydriatic 3CCD camera. The dataset is equally divided into 20 for training and 20 for testing.
Hardware Specification	Yes	Most of our experiments are conducted using two NVIDIA RTX 4090 GPUs.
Software Dependencies	No	The paper mentions specific models like Llama-3.2-Vision (11B), GPT-4, and Qwen2.5-VL for the Perception Agent, and lists various state-of-the-art methods integrated into the restoration toolbox (e.g., MAXIM, MPRNet, NAFNet, Restormer, X-Restormer, DiffBIR, GFPGAN). However, it does not specify software environment dependencies such as Python versions, deep learning framework versions (e.g., PyTorch, TensorFlow), or CUDA versions.
Experiment Setup	Yes	Hyperparameters in 4KAgent mainly reside in the Restoration Agent, including the weights used to compute the quality scores $Q_s$ and $Q_s^f$ in execution, as well as the quality threshold $\eta$ for the rollback mechanism. Specifically, we set $w_{\mathrm{NIQE}} = 1.0$, $w_{\mathrm{MUSIQ}} = 0.01$, $w_{\mathrm{MANIQA}} = 1.0$, $w_{\mathrm{CLIPIQA}} = 1.0$ for $Q_s$, $w_{\mathrm{IP}} = 0.001$, $w_{\mathrm{IQA}} = 1.0$ for $Q_s^f$, and $\eta = 0.5$ for rollback.
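
For readers who want to carry the quoted Experiment Setup values into their own scripts, the following is a minimal sketch of how they might be recorded and applied. The weights and the threshold are taken from the row above. Everything else is an assumption made for illustration: the names `WEIGHTS_QS`, `WEIGHTS_QSF`, `weighted_score`, and `should_rollback` are hypothetical, the plain weighted sum is not confirmed by the quoted text (in particular, the quote does not say how a lower-is-better metric such as NIQE is normalized), and the rule that rollback triggers when a score falls below the threshold is a guess at the mechanism rather than the paper's definition.

```python
# Weights and threshold as quoted in the Experiment Setup row.
WEIGHTS_QS = {"NIQE": 1.0, "MUSIQ": 0.01, "MANIQA": 1.0, "CLIPIQA": 1.0}
WEIGHTS_QSF = {"IP": 0.001, "IQA": 1.0}  # weights quoted for the second quality score
ETA = 0.5  # quality threshold quoted for the rollback mechanism


def weighted_score(metrics, weights):
    """Combine metric values into one score.

    ASSUMPTION: a plain weighted sum over already-normalized metric values.
    The paper's actual formula is not reproduced in the quoted text.
    """
    missing = sorted(set(weights) - set(metrics))
    if missing:
        raise KeyError(f"missing metric values: {missing}")
    return sum(weights[name] * metrics[name] for name in weights)


def should_rollback(score, eta=ETA):
    """ASSUMPTION: a step is rolled back when its score falls below eta."""
    return score < eta


if __name__ == "__main__":
    # Hypothetical metric values, chosen only to exercise the functions.
    example = {"NIQE": 0.2, "MUSIQ": 50.0, "MANIQA": 0.4, "CLIPIQA": 0.6}
    score = weighted_score(example, WEIGHTS_QS)
    print(f"score={score:.3f}, rollback={should_rollback(score)}")
```

Keeping the weights in dictionaries keyed by metric name makes it easy to compare a local configuration against the values reported in the paper, whatever scoring formula the released code turns out to use.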