Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Fast constrained sampling in pre-trained diffusion models
Authors: Alexandros Graikos, Nebojsa Jojic, Dimitris Samaras
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate, we generate images under both linear and non-linear constraints. We first show that our approach matches the results of state-of-the-art methods on free-form inpainting and 8× super-resolution at a fraction of the inference time. We then demonstrate how existing methods fail at inpainting large regions, while our algorithm obtains results closer to a fully fine-tuned diffusion model on inpainting. Finally, we show how we can apply our algorithm to non-linear constraints and perform style-guided and mask-guided generation, where the proposed method consistently generates images that satisfy the constraints better than existing approaches. ... [Section 4.1, Linear Constraints] We first verify our algorithm by generating images under linear constraints, which has been the main application of many previous algorithms [5, 28, 6]. We follow the evaluation setting of Saharia et al. [29] and test our method on ImageNet [8], using the first 1000 images from the 10k validation set of the ctest10k split. For evaluation, we measure the PSNR, LPIPS [46], and FID [14] between the real and generated images. We use Stable Diffusion 1.4, which is pre-trained on the LAION [31] text-image pair dataset. Experiments were run on an NVIDIA RTX A5000 24GB GPU. ... Table 1: Quantitative evaluation (PSNR, LPIPS, FID) on free-form inpainting. Table 2: Quantitative evaluation (PSNR, LPIPS, FID) on 8× super-resolution. Table 3: Quantitative evaluation on large-area (box) inpainting. Table 4: Quantitative evaluation of style generation. |
| Researcher Affiliation | Collaboration | Alexandros Graikos (Stony Brook University, Stony Brook, NY); Nebojsa Jojic (Microsoft Research, Redmond, WA); Dimitris Samaras (Stony Brook University, Stony Brook, NY) |
| Pseudocode | Yes | Algorithm 1 The proposed algorithm for sampling under linear and non-linear constraints. ... Algorithm 2 Pseudo-algorithm for sampling using a solver and a pre-trained diffusion model. ... Algorithm 3 Pseudo-algorithm for constrained sampling with a solver, a pre-trained diffusion model and using the proposed algorithm. |
| Open Source Code | Yes | An implementation is provided at this GitHub repository. |
| Open Datasets | Yes | The models (Stable Diffusion) and datasets (ImageNet, WikiArt, PartiPrompts, FFHQ) used are all publicly available. Furthermore, we include the code to reproduce the algorithm in the supplementary material. |
| Dataset Splits | Yes | We follow the evaluation setting of Saharia et al. [29] and test our method on ImageNet [8], using the first 1000 images from the 10k validation set of the ctest10k split. ... We use 1000 random pairs of reference style images from WikiArt [37] and prompts from PartiPrompts [43]. ... Using 100 images from the FFHQ [17] validation set, we run both our method and MPGD [13], which is the fastest baseline that works very well with dense constraints, i.e., constraints that are applied to all pixels. |
| Hardware Specification | Yes | Experiments were run on an NVIDIA RTX A5000 24GB GPU. |
| Software Dependencies | No | We use Stable Diffusion 1.4, which is pre-trained on the LAION [31] text-image pair dataset. ... with text prompts generated from the downsampled images using Qwen-2.5 [2] as the VLM. ... We use the CLIP ViT-B/16 model for guiding the style of the image and evaluating. We also repeat the experiment using the OpenCLIP ViT-B/32 model [4] for guidance. ... In the case of PyTorch, which is what we use for running our experiments, forward-mode differentiation is not directly implemented for many of the custom layers of the Stable Diffusion model. |
| Experiment Setup | Yes | For inpainting, we set the number of optimization steps K = 5, over which we linearly decrease the learning rate λ from 0.5 to 0.1. For super-resolution, we use K = 10 and a constant λ = 0.1. For both degradations, we also include additive white Gaussian noise with σ_y = 0.05, use 20 DDIM [33] steps, and normalize the computed gradient e_t by its norm. ... We perform K = 5 gradient updates for every denoising step, using a learning rate λ that decreases linearly from 0.5 to 0.1, and classifier-free guidance [15] with w = 2 and w = 5 for the denoiser. |
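Of the three metrics listed under Research Type, PSNR is simple enough to state directly. LPIPS and FID depend on pretrained networks and are not reproduced here. The sketch below uses NumPy and assumes both images are float arrays on a common scale, whose width is passed as `data_range`. The `psnr` name and its default range of 1.0 are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np


def psnr(reference: np.ndarray, estimate: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    # Compute the mean squared error in float64 to avoid integer overflow or precision loss.
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

For images scaled to [-1, 1], pass `data_range=2.0` so the peak term matches the value range.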
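The Dataset Splits row describes two selection rules:

- a deterministic prefix (the first 1000 images of the ctest10k split);
- 1000 random pairs of WikiArt style images and PartiPrompts prompts.

The quoted text does not state the random seed or whether pairs are drawn with replacement. The sketch below therefore assumes independent draws with replacement and a fixed seed. `first_n` and `random_pairs` are hypothetical helper names.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def first_n(items: Sequence[T], n: int = 1000) -> List[T]:
    """Deterministic subset: the first n items in their listed order."""
    return list(items[:n])


def random_pairs(
    styles: Sequence[T], prompts: Sequence[U], n_pairs: int = 1000, seed: int = 0
) -> List[Tuple[T, U]]:
    """Draw n_pairs (style, prompt) pairs independently, with replacement (an assumption)."""
    rng = random.Random(seed)  # fixed seed so the same pairs are drawn on every run
    return [(rng.choice(styles), rng.choice(prompts)) for _ in range(n_pairs)]
```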
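The Experiment Setup row states that both linear degradations include additive white Gaussian noise with σ_y = 0.05. The forward measurement is y = A(x) + n, with n ~ N(0, σ_y² I). The NumPy sketch below rests on these illustrative assumptions:

- inpainting is elementwise masking;
- 8× super-resolution is 8×8 average pooling, which may differ from the resizing kernel used by Saharia et al. or by the paper;
- images are H×W×C float arrays.

The function names are hypothetical.

```python
from typing import Callable, Optional

import numpy as np


def inpaint_operator(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep pixels where mask == 1 and zero out the rest (mask broadcasts over channels)."""
    return x * mask


def downsample_avg(x: np.ndarray, factor: int = 8) -> np.ndarray:
    """Average-pool an H x W x C image by `factor` in each spatial dimension."""
    h, w, c = x.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by the downsampling factor")
    # Split each spatial axis into (blocks, factor) and average within each block.
    return x.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))


def measure(
    x: np.ndarray,
    operator: Callable[[np.ndarray], np.ndarray],
    sigma_y: float = 0.05,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Noisy linear measurement y = A(x) + sigma_y * n, with n ~ N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    clean = operator(x)
    return clean + sigma_y * rng.standard_normal(clean.shape)
```

For example, `measure(x, downsample_avg)` produces a noisy 8× low-resolution observation. `measure(x, lambda im: inpaint_operator(im, mask))` produces a noisy masked one.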
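The Experiment Setup row specifies K inner gradient updates per denoising step. The learning rate λ either decays linearly from 0.5 to 0.1 (K = 5, inpainting) or stays at 0.1 (K = 10, super-resolution), and the gradient is normalized. The sketch below shows only that schedule and a normalized update, applied to a toy masked least-squares loss. It is not the paper's Algorithm 1, which operates on a diffusion model during sampling. It makes two assumptions:

- the schedule includes both endpoints;
- the gradient is normalized by its L2 norm, because the norm type cannot be recovered from the quoted text.

The helper names are hypothetical.

```python
from typing import Callable

import numpy as np


def linear_lr_schedule(k_steps: int, lr_start: float, lr_end: float) -> np.ndarray:
    """K learning rates decreasing linearly from lr_start to lr_end (endpoints included: an assumption)."""
    if k_steps == 1:
        return np.array([lr_start], dtype=np.float64)
    return np.linspace(lr_start, lr_end, k_steps)


def normalized_gradient_steps(
    x: np.ndarray,
    grad_fn: Callable[[np.ndarray], np.ndarray],
    lrs: np.ndarray,
    eps: float = 1e-12,
) -> np.ndarray:
    """One descent step per learning rate, along the gradient divided by its L2 norm (assumed norm)."""
    x = np.array(x, dtype=np.float64)
    for lr in lrs:
        g = grad_fn(x)
        # eps guards against division by zero when the gradient vanishes.
        x = x - lr * g / (np.linalg.norm(g) + eps)
    return x


# Toy usage: pull the observed (masked) entries of x toward a target y.
if __name__ == "__main__":
    y = np.ones(4)
    m = np.array([1.0, 1.0, 0.0, 0.0])  # only the first two entries are constrained
    grad = lambda x: m * (x - y)          # gradient of 0.5 * ||m * (x - y)||^2
    print(normalized_gradient_steps(np.zeros(4), grad, linear_lr_schedule(5, 0.5, 0.1)))
```

Normalizing the gradient makes each update's length equal to λ regardless of the loss scale, so the schedule directly sets the step size.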
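The Experiment Setup row also sets classifier-free guidance [15] to w = 2 or w = 5. A common way to combine the conditional and unconditional noise predictions, used for example by Stable Diffusion pipelines, is ε̂ = ε_uncond + w (ε_cond − ε_uncond). The original CFG formulation places (1 + w) in that position instead. Which convention the reported w values follow is an assumption here. `cfg_combine` is a hypothetical name.

```python
import numpy as np


def cfg_combine(eps_uncond: np.ndarray, eps_cond: np.ndarray, w: float) -> np.ndarray:
    """Guided noise prediction: eps_uncond + w * (eps_cond - eps_uncond).

    w = 1 returns the conditional prediction; w > 1 extrapolates past it.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)
```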