Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

AutoEdit: Automatic Hyperparameter Tuning for Image Editing

Authors: Chau Pham, Quan Dao, Mahesh Bhosale, Yunjie Tian, Dimitris Metaxas, David Doermann

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments demonstrate significant reduction in search time and computational overhead compared to existing brute-force approaches, advancing the practical deployment of a diffusion-based image editing framework in the real world.
Researcher Affiliation	Academia	¹University at Buffalo ²Rutgers University
Pseudocode	Yes	Algorithm 1: AutoEdit Inference; Algorithm 2: PPO Training of AutoEdit
Open Source Code	Yes	Codes can be found at https://github.com/chaupham1709/AutoEdit.git.
Open Datasets	Yes	Our training data originates from the EditBench collection [22], with task selection strictly aligned to PieBench [16] benchmark objectives. ... For evaluation, we adopt the PieBench dataset [16], which contains 700 samples across diverse editing scenarios, ensuring comprehensive capability assessment.
Dataset Splits	Yes	Our training data originates from the EditBench collection [22], with task selection strictly aligned to PieBench [16] benchmark objectives. ... For evaluation, we adopt the PieBench dataset [16], which contains 700 samples across diverse editing scenarios, ensuring comprehensive capability assessment.
Hardware Specification	Yes	All experiments were carried out on a 2 RTX A6000 GPU server.
Software Dependencies	No	The paper mentions several tools and models like 'Adam', 'ChatGPT', 'SAM', 'CLIP', and 'Qwen VL-2.5-7B' but does not provide specific version numbers for general software dependencies like programming languages or libraries (e.g., Python, PyTorch).
Experiment Setup	Yes	We choose denoising step T = 50 by default. During Phase-1 training, we implement parameter initialization policies based on different editing method requirements... Both policy and value models are optimized using Adam [18] with a fixed learning rate of 5 × 10⁻⁵. For the reward function configuration, we set the coefficients α = 30 and β = 30 as default values unless otherwise specified. Consistent with prior works [24, 30], the KL divergence coefficient remains γ = 0.02 across all experiments.
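
For readers who want the quoted Experiment Setup values in one place, the sketch below collects them in a small Python configuration object. It is only an illustration: the class and field names are hypothetical and do not come from the paper or its repository, and the learning rate assumes the quoted value reads as 5 × 10⁻⁵.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoEditSetup:
    """Hypothetical container for the hyperparameters quoted above.

    Field names are illustrative assumptions; only the values are
    taken from the quoted Experiment Setup text.
    """

    denoising_steps: int = 50      # T = 50 by default
    learning_rate: float = 5e-5    # fixed Adam learning rate (assumed reading: 5 x 10^-5)
    reward_alpha: float = 30.0     # reward coefficient alpha
    reward_beta: float = 30.0      # reward coefficient beta
    kl_coefficient: float = 0.02   # KL divergence coefficient gamma


# Default instance holding the reported values.
setup = AutoEditSetup()
```

Freezing the dataclass keeps the reported defaults from being changed by accident; a variation can be created with `dataclasses.replace(setup, denoising_steps=...)`.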