Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

AutoEdit: Automatic Hyperparameter Tuning for Image Editing

Authors: Chau Pham, Quan Dao, Mahesh Bhosale, Yunjie Tian, Dimitris Metaxas, David Doermann

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate significant reduction in search time and computational overhead compared to existing brute-force approaches, advancing the practical deployment of a diffusion-based image editing framework in the real world.
Researcher Affiliation | Academia | 1 University at Buffalo; 2 Rutgers University
Pseudocode | Yes | Algorithm 1: AutoEdit Inference; Algorithm 2: PPO Training of AutoEdit
Open Source Code | Yes | Codes can be found at https://github.com/chaupham1709/AutoEdit.git.
Open Datasets | Yes | Our training data originates from the Edit Bench collection [22], with task selection strictly aligned to Pie Bench [16] benchmark objectives. ... For evaluation, we adopt the Pie Bench dataset [16], which contains 700 samples across diverse editing scenarios, ensuring comprehensive capability assessment.
Dataset Splits | Yes | Our training data originates from the Edit Bench collection [22], with task selection strictly aligned to Pie Bench [16] benchmark objectives. ... For evaluation, we adopt the Pie Bench dataset [16], which contains 700 samples across diverse editing scenarios, ensuring comprehensive capability assessment.
Hardware Specification | Yes | All experiments were carried out on a 2× RTX A6000 GPU server.
Software Dependencies | No | The paper mentions several tools and models like 'Adam', 'ChatGPT', 'SAM', 'CLIP', and 'Qwen VL-2.5-7B' but does not provide specific version numbers for general software dependencies like programming languages or libraries (e.g., Python, PyTorch).
Experiment Setup | Yes | We choose denoising step T = 50 by default. During Phase-1 training, we implement parameter initialization policies based on different editing method requirements... Both policy and value models are optimized using Adam [18] with a fixed learning rate of 5 × 10⁻⁵. For the reward function configuration, we set the coefficients α = 30 and β = 30 as default values unless otherwise specified. Consistent with prior works [24, 30], the KL divergence coefficient remains γ = 0.02 across all experiments.
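
For anyone attempting to reproduce the reported setup, the hyperparameters quoted in the Experiment Setup row can be gathered in one place. The following is a minimal sketch, not the authors' code: the class name `AutoEditSetup` and every field name are hypothetical, and only the numeric values are taken from the quoted text. The learning rate is read as 5 × 10⁻⁵, on the assumption that the multiplication sign and exponent were lost in extraction.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AutoEditSetup:
    """Hypothetical container for the hyperparameters quoted in the report.

    Field names are illustrative; values come from the Experiment Setup row.
    """

    denoising_steps: int = 50      # T = 50 by default
    learning_rate: float = 5e-5    # fixed Adam learning rate (assumed 5 x 10^-5)
    reward_alpha: float = 30.0     # reward coefficient alpha
    reward_beta: float = 30.0      # reward coefficient beta
    kl_coefficient: float = 0.02   # KL divergence coefficient gamma


# Instantiate with the reported defaults and show them as a plain dict.
setup = AutoEditSetup()
print(asdict(setup))
```

Freezing the dataclass keeps the reported defaults from being modified by accident; an ablation would construct a new instance with the changed field instead.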