Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

InstructRestore: Region-Customized Image Restoration with Human Instructions

Authors: Shuaizheng Liu, Jianqi Ma, Lingchen Sun, Xiangtao Kong, Lei Zhang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results demonstrate that our proposed InstructRestore approach enables effective human-instructed image restoration, including restoration with controllable bokeh blur effects and region-specific restoration with continuous intensity control. Our work advances the investigation of interactive image restoration and enhancement techniques. Data, code, and models are publicly available at https://github.com/shuaizhengliu/InstructRestore.git. ... With this engine and careful data screening, we construct a comprehensive dataset comprising 536,945 triplets to support the training and evaluation of this task.
Researcher Affiliation	Collaboration	Shuaizheng Liu 1,2; Jianqi Ma 1; Lingchen Sun 1,2; Xiangtao Kong 1,2; Lei Zhang 1,2. 1: The Hong Kong Polytechnic University; 2: OPPO Research Institute.
Pseudocode	No	The paper describes methods and processes using mathematical equations and textual descriptions, but it does not include a clearly labeled pseudocode block or algorithm section.
Open Source Code	Yes	Data, code, and models are publicly available at https://github.com/shuaizhengliu/InstructRestore.git.
Open Datasets	Yes	Utilizing Semantic-SAM [18] and Osprey [55] models, we obtain masks and initial descriptions from a set of selected high-quality images. We then use large language models (LLMs), more specifically Qwen [48], to iteratively parse and refine these descriptions, formatting them to meet the instructional requirements of IR tasks. Finally, we build a dataset of 536,945 triplets, covering diverse scenes such as plants, buildings, animals, etc. ... Data, code, and models are publicly available at https://github.com/shuaizhengliu/InstructRestore.git.
Dataset Splits	No	The paper describes the creation of a dataset (Tri-IR) with 536,945 triplets and mentions specific test datasets (Instruct100Set with 100 images and a bokeh testset with 70 images). It details training for '120K iterations' on a 'general degradation dataset' and '14k iterations' on a 'bokeh dataset' with 'sampling probability is set to 25% for the general degradation dataset and 75% for the bokeh dataset'. However, it does not explicitly provide the specific training/validation/test splits (e.g., exact percentages or sample counts) for any of these datasets in a way that would allow direct reproduction of the data partitioning.
Hardware Specification	Yes	The training is conducted on two A100 GPUs with a batch size of 64 and an initial learning rate of 5e-5.
Software Dependencies	No	The paper mentions several models and frameworks such as SD2.1, Real-ESRGAN, Semantic-SAM, Osprey, Qwen, and ControlNet. While these are specific tools, the paper does not list general software dependencies such as Python, PyTorch, or CUDA with specific version numbers, which a reproducible software description requires.
Experiment Setup	Yes	Our method is built on SD2.1 [33]. Training data is generated by the data generation engine described in Section 3. The LQ images are obtained by the Real-ESRGAN [39] degradation pipeline. ... Our model is first trained on the general degradation dataset for 120K iterations, guided by the instruction template "make the {region caption} clear". The training continues by combining the bokeh dataset with the general degradation dataset for 14k iterations. During this stage, the sampling probability is set to 25% for the general degradation dataset and 75% for the bokeh dataset, which is paired with the instruction template "make the {region caption} clear and keep other parts bokeh blur." The training is conducted on two A100 GPUs with a batch size of 64 and an initial learning rate of 5e-5. AdamW is adopted as the optimizer for network training.
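
The second training stage quoted in the Experiment Setup row (25% general degradation data, 75% bokeh data, each paired with its own instruction template) can be illustrated with a small sketch. This is not the authors' code. The function names, the `region_caption` placeholder name, and the choice to draw the source dataset independently per sample are assumptions made for illustration; the paper excerpt does not say whether mixing happens per sample or per batch.

```python
import random
from typing import List

# Instruction templates as quoted in the Experiment Setup row.
# The placeholder name `region_caption` is an illustrative assumption.
GENERAL_TEMPLATE = "make the {region_caption} clear"
BOKEH_TEMPLATE = "make the {region_caption} clear and keep other parts bokeh blur."

# Stage-two sampling probabilities as quoted in the Experiment Setup row.
STAGE2_WEIGHTS = {"general": 0.25, "bokeh": 0.75}


def build_instruction(region_caption: str, source: str) -> str:
    """Fill the template that matches the dataset a sample came from."""
    template = BOKEH_TEMPLATE if source == "bokeh" else GENERAL_TEMPLATE
    return template.format(region_caption=region_caption)


def sample_sources(count: int, seed: int = 0) -> List[str]:
    """Draw `count` dataset names with the stage-two probabilities.

    Assumption: each sample is drawn independently of the others.
    """
    rng = random.Random(seed)
    names = list(STAGE2_WEIGHTS)
    weights = [STAGE2_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=count)


if __name__ == "__main__":
    # One batch of 64, matching the batch size quoted above.
    batch = sample_sources(64, seed=0)
    print(batch.count("bokeh"), "bokeh samples out of", len(batch))
    print(build_instruction("red flower", batch[0]))
```

Over many draws the share of bokeh samples approaches 75%, while any single batch of 64 will vary around that value.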