Diffusion Self-Guidance for Controllable Image Generation
Authors: Dave Epstein, Allan Jabri, Ben Poole, Alexei Efros, Aleksander Holynski
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | All experiments were performed on Imagen [32], producing 1024x1024 samples. |
| Researcher Affiliation | Collaboration | Dave Epstein¹·², Allan Jabri¹, Ben Poole², Alexei A. Efros¹, Aleksander Holynski¹·² (¹UC Berkeley, ²Google Research) |
| Pseudocode | No | The paper contains mathematical equations and descriptive text for its method, but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper links a project page with 'results and an interactive demo' (https://dave.ml/selfguidance) but does not state that source code for the described method is released there or elsewhere. |
| Open Datasets | No | The paper's experiments run on the pre-trained Imagen model [32]; it provides no access information (link, DOI, or explicit citation with authors/year) for any dataset used in the experiments, nor for the 'real images' mentioned. |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits for the experiments conducted using self-guidance. Mentions of training pertain to the underlying diffusion model used (Imagen), not the data splits for their own evaluation. |
| Hardware Specification | No | The paper states 'All experiments were performed on Imagen [32]' but does not provide specific hardware details such as GPU models, CPU types, or memory used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or CUDA versions) required to replicate the experiments. |
| Experiment Setup | Yes | We apply our self-guidance term following best practices for classifier-free guidance on Imagen [32]. Specifically, where N is the number of DDPM steps, we take the first 3N/16 steps with self-guidance and the last N/32 without; the remaining 25N/32 steps alternate between using self-guidance and not using it. We use N = 1024 steps. Our method works with 256 and 512 steps as well, though self-guidance weights occasionally require adjustment. We set v = 7500 in Eqn. 4 as an overall scale for gradients of the functions g defined below; we find that the magnitude of per-pixel gradients is quite small (often in the range of 10⁻⁷ to 10⁻⁶), so such a high weight is needed to induce changes. |
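
The sampling schedule and gradient scale quoted in the Experiment Setup row are concrete enough to sketch. Since the paper releases no code, the following is a minimal, illustrative Python sketch, not the authors' implementation: `eps_uncond`, `eps_cond`, `grad_g`, and `cfg_scale` are hypothetical placeholders, and the combination assumes the standard classifier-free-guidance form (the paper's Eqn. 4 may include additional noise-level weighting that this sketch omits).

```python
import numpy as np

def self_guidance_mask(n_steps: int = 1024) -> np.ndarray:
    """Boolean mask over DDPM steps: True means apply self-guidance.

    Per the paper's reported schedule: the first 3N/16 steps use
    self-guidance, the last N/32 do not, and the middle 25N/32 steps
    alternate between the two.
    """
    mask = np.zeros(n_steps, dtype=bool)
    head = 3 * n_steps // 16      # 192 steps when N = 1024
    tail = n_steps // 32          # 32 steps when N = 1024
    mask[:head] = True
    mask[head:n_steps - tail:2] = True  # alternate on/off in the middle
    return mask

def guided_eps(eps_uncond, eps_cond, grad_g, cfg_scale: float, v: float = 7500.0):
    """Combine classifier-free guidance with a scaled self-guidance gradient.

    `grad_g` stands in for the gradient of a guidance energy g with
    respect to the noisy latent; the paper notes its per-pixel magnitude
    is tiny (roughly 1e-7 to 1e-6), hence the large scale v = 7500.
    """
    cfg = eps_uncond + cfg_scale * (eps_cond - eps_uncond)
    return cfg + v * grad_g
```

In an actual pipeline, `grad_g` would come from differentiating an energy defined on the diffusion model's internal attention maps and activations, which this sketch leaves abstract.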