Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

4KAgent: Agentic Any Image to 4K Super-Resolution

Authors: Yushen Zuo, Qi Zheng, Mingyang Wu, Xinrui Jiang, Renjie Li, Jian Wang, Yide Zhang, Gengchen Mai, Lihong Wang, James Y Zou, Xiaoyu Wang, Ming-Hsuan Yang, Zhengzhong Tu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We rigorously evaluate our 4KAgent across 11 distinct task categories encompassing a total of 26 diverse benchmarks, setting new state-of-the-art on a broad spectrum of imaging domains. Our evaluations cover natural images, portrait photos, AI-generated content, satellite imagery, fluorescence microscopy, and medical imaging like fundoscopy, ultrasound, and X-ray, demonstrating superior performance in terms of both perceptual (e.g., NIQE, MUSIQ) and fidelity (e.g., PSNR) metrics.
Researcher Affiliation Collaboration 1 Texas A&M University, 2 Stanford University, 3 Snap Inc., 4 CU Boulder, 5 UT Austin, 6 California Institute of Technology, 7 Topaz Labs, 8 UC Merced
Pseudocode No The paper describes the system architecture and workflow in detail with figures (e.g., Figure 2: 4KAgent system overview) and descriptive text, but it does not contain any explicit section or block labeled "Pseudocode" or "Algorithm" with a structured, code-like format for any part of its methodology.
Open Source Code Yes We release all the code, models, and results at: https://4kagent.github.io.
Open Datasets Yes To evaluate 4K super-resolution performances, we build the DIV4K-50 dataset as a challenging test set to upscale a low-quality (LQ) image in 256 × 256 resolution with multiple degradations to a high-quality (HQ) 4K image in 4096 × 4096 resolution. ... The summary of datasets used in experiments is shown in Tab. 6, which can be classified as natural degraded images (C, D), AI-generated images (E), and scientific images (F).
Dataset Splits Yes The full dataset contains 9,937 patches for each cell extracted from scanning confocal volumes, with tiles 9, 10, 14, 20 used as the test set. ... The final bc SR dataset consists of 1,200 unique images, which were split into a 1,000-image training set and a 200-image test set. ... Chest X-ray 2017 is a dataset of 5,856 pediatric images from Guangzhou Women and Children's Medical Centre, split into 5,232 images for training and 624 images for testing. ... DRIVE [176] consists of 40 color fundus images from a diabetic retinopathy screening program in the Netherlands collected by a Canon CR5 non-mydriatic 3CCD camera. The dataset is equally divided into 20 for training and 20 for testing.
Hardware Specification Yes Most of our experiments are conducted using two NVIDIA RTX 4090 GPUs.
Software Dependencies No The paper mentions specific models like Llama-3.2-Vision (11B), GPT-4, and Qwen2.5-VL for the Perception Agent, and lists various state-of-the-art methods integrated into the restoration toolbox (e.g., MAXIM, MPRNet, NAFNet, Restormer, X-Restormer, DiffBIR, GFPGAN). However, it does not specify software environment dependencies such as Python versions, deep learning framework versions (e.g., PyTorch, TensorFlow), or CUDA versions.
Experiment Setup Yes Hyperparameters in 4KAgent mainly reside in the Restoration Agent, including the weights used to compute the quality scores Q_s and Q_fs in execution, as well as the quality threshold η for the rollback mechanism. Specifically, we set w_NIQE = 1.0, w_MUSIQ = 0.01, w_MANIQA = 1.0, w_CLIPIQA = 1.0 for Q_s, w_IP = 0.001, w_IQA = 1.0 for Q_fs, and η = 0.5 for rollback.
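
The hyperparameters quoted in the Experiment Setup row can be written down as a small configuration sketch. This is a minimal illustration, not the authors' implementation: the excerpt gives only the weights and the threshold, so the plain weighted sum, the assumption that every input metric is already oriented so that higher is better (NIQE is normally lower-is-better and would need a conversion that the excerpt does not specify), and the rule that a rollback triggers when the score falls below η are all assumptions. The names `weighted_score` and `should_rollback` are hypothetical.

```python
# Weights and threshold exactly as quoted in the Experiment Setup row.
QS_WEIGHTS = {"NIQE": 1.0, "MUSIQ": 0.01, "MANIQA": 1.0, "CLIPIQA": 1.0}
QFS_WEIGHTS = {"IP": 0.001, "IQA": 1.0}
ETA = 0.5  # quality threshold for the rollback mechanism


def weighted_score(metrics: dict, weights: dict) -> float:
    """Combine metric values into one score with a plain weighted sum.

    ASSUMPTION: the paper's exact formula is not in the excerpt. Inputs are
    taken to be oriented so that higher means better quality.
    """
    missing = sorted(set(weights) - set(metrics))
    if missing:
        raise KeyError(f"missing metric values: {missing}")
    return sum(weights[name] * metrics[name] for name in weights)


def should_rollback(score: float, eta: float = ETA) -> bool:
    """ASSUMPTION: a step is rolled back when its score is strictly below eta."""
    return score < eta


if __name__ == "__main__":
    # Made-up metric values, for illustration only.
    example = {"NIQE": 0.6, "MUSIQ": 55.0, "MANIQA": 0.4, "CLIPIQA": 0.7}
    q_s = weighted_score(example, QS_WEIGHTS)
    print(f"Q_s = {q_s:.3f}, rollback = {should_rollback(q_s)}")
```

The small MUSIQ weight (0.01) is consistent with that metric being reported on a roughly 0 to 100 scale while the others sit near 0 to 1, though the excerpt does not state this reasoning.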