Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
AutoEdit: Automatic Hyperparameter Tuning for Image Editing
Authors: Chau Pham, Quan Dao, Mahesh Bhosale, Yunjie Tian, Dimitris Metaxas, David Doermann
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate significant reduction in search time and computational overhead compared to existing brute-force approaches, advancing the practical deployment of a diffusion-based image editing framework in the real world. |
| Researcher Affiliation | Academia | 1University at Buffalo 2Rutgers University |
| Pseudocode | Yes | Algorithm 1: AutoEdit Inference; Algorithm 2: PPO Training of AutoEdit |
| Open Source Code | Yes | Codes can be found at https://github.com/chaupham1709/AutoEdit.git. |
| Open Datasets | Yes | Our training data originates from the EditBench collection [22], with task selection strictly aligned to PieBench [16] benchmark objectives. ... For evaluation, we adopt the PieBench dataset [16], which contains 700 samples across diverse editing scenarios, ensuring comprehensive capability assessment. |
| Dataset Splits | Yes | Our training data originates from the EditBench collection [22], with task selection strictly aligned to PieBench [16] benchmark objectives. ... For evaluation, we adopt the PieBench dataset [16], which contains 700 samples across diverse editing scenarios, ensuring comprehensive capability assessment. |
| Hardware Specification | Yes | All experiments were carried out on a 2 RTX A6000 GPU server. |
| Software Dependencies | No | The paper mentions several tools and models like 'Adam', 'ChatGPT', 'SAM', 'CLIP', and 'QwenVL-2.5-7B' but does not provide specific version numbers for general software dependencies like programming languages or libraries (e.g., Python, PyTorch). |
| Experiment Setup | Yes | We choose denoising step T = 50 by default. During Phase-1 training, we implement parameter initialization policies based on different editing method requirements... Both policy and value models are optimized using Adam [18] with a fixed learning rate of 5 × 10⁻⁵. For the reward function configuration, we set the coefficients α = 30 and β = 30 as default values unless otherwise specified. Consistent with prior works [24, 30], the KL divergence coefficient remains γ = 0.02 across all experiments. |
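
For readers who want to carry the reported defaults into their own scripts, the Experiment Setup row can be collected into a small configuration object. This is only a sketch: the class name `AutoEditSetup` and all field names are hypothetical and do not come from the paper or its repository, and the learning rate is taken as 5e-5, which is an interpretation of the value quoted in that row. The form of the reward function is not given here, so the sketch stores the coefficients without using them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoEditSetup:
    """Hypothetical container for the defaults quoted in the Experiment Setup row."""

    denoising_steps: int = 50      # T, the default number of denoising steps
    learning_rate: float = 5e-5    # fixed Adam learning rate (assumed reading: 5 x 10^-5)
    reward_alpha: float = 30.0     # reward coefficient alpha
    reward_beta: float = 30.0      # reward coefficient beta
    kl_coefficient: float = 0.02   # KL divergence coefficient gamma


# Instantiate with the reported defaults; fields can be overridden per experiment.
setup = AutoEditSetup()
```

Individual fields can be overridden at construction time, for example `AutoEditSetup(reward_alpha=10.0)`, to mirror the "unless otherwise specified" wording in the quoted setup.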