Generalization in diffusion models arises from geometry-adaptive harmonic representations
Authors: Zahra Kadkhodaie, Florentin Guth, Eero P Simoncelli, Stéphane Mallat
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We trained denoisers on subsets of the (downsampled) CelebA dataset (Liu et al., 2015) of size N = 10^0, 10^1, 10^2, 10^3, 10^4, 10^5. We used a UNet architecture (Ronneberger et al., 2015)... Results are shown in Figure 1. When N = 1, the denoiser essentially memorizes the single training image, leading to a high test error. Increasing N substantially increases the performance on the test set while worsening performance on the training set, as the network transitions from memorization to generalization. At N = 10^5, empirical test and train error are matched for all noise levels. |
| Researcher Affiliation | Academia | Zahra Kadkhodaie Ctr. for Data Science, New York University zk388@nyu.edu Florentin Guth Ctr. for Data Science, New York University Flatiron Institute, Simons Foundation florentin.guth@nyu.edu Eero P. Simoncelli New York University Flatiron Institute, Simons Foundation eero.simoncelli@nyu.edu Stéphane Mallat Collège de France Flatiron Institute, Simons Foundation stephane.mallat@ens.fr |
| Pseudocode | Yes | Algorithm 1 Sampling via ascent of the log-likelihood gradient from a denoiser residual |
| Open Source Code | Yes | Source code: https://github.com/LabForComputationalVision/memorization_generalization_in_diffusion_models |
| Open Datasets | Yes | We trained denoisers on subsets of the (downsampled) CelebA dataset (Liu et al., 2015) of size N = 10^0, 10^1, 10^2, 10^3, 10^4, 10^5. ... For experiments shown in Figures 9 and 10, we use images drawn from the LSUN bedroom dataset (Yu et al., 2015) downsampled to 80 × 80 resolution. ... For experiments shown in Figure 11 we use the CelebA-HQ dataset (Karras et al., 2018) downsampled to 40 × 40 resolution. |
| Dataset Splits | No | The paper discusses training and test data performance (e.g., "At N = 10^5, empirical test and train error are matched"), but it does not explicitly mention a separate 'validation' dataset or its split details. |
| Hardware Specification | No | The paper acknowledges computing resources from the Flatiron Institute and NYU but does not specify any particular hardware components like GPU models, CPU types, or memory used for experiments. |
| Software Dependencies | No | The paper mentions using specific network architectures like UNet and BF-CNN, but it does not list any specific software libraries, frameworks, or operating systems with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Training is carried out on batches of size 512, for 1000 epochs. ... We chose h = 0.01, β = 0.1, σ_0 = 1, and σ_L = 0.05. |
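The pseudocode row refers to the paper's Algorithm 1, which samples by stochastic ascent of the log-likelihood gradient: the denoiser residual f(y) − y is proportional to the score of the noisy density (Miyasawa/Tweedie), so repeatedly taking a partial denoising step and re-injecting scaled noise draws a sample from the prior implicit in the denoiser. Below is a minimal sketch under stated assumptions: the `denoiser` callable stands in for the trained UNet (not provided here), and the parameter names (h, β, σ_0, σ_L) follow the values quoted in the Experiment Setup row; the step-size schedule is a simplified reading of the algorithm, not the authors' exact implementation.

```python
import numpy as np

def sample(denoiser, shape, h=0.01, beta=0.1, sigma_0=1.0, sigma_L=0.05, rng=None):
    """Sample via ascent of the log-likelihood gradient from a denoiser residual.

    `denoiser` is assumed to be a blind denoiser f(y) trained over a range of
    noise levels; the residual f(y) - y is proportional to the score, and its
    RMS magnitude serves as an estimate of the current effective noise level.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = sigma_0 * rng.standard_normal(shape)   # start from pure noise
    sigma = sigma_0
    while sigma > sigma_L:
        d = denoiser(y) - y                            # residual ~ sigma^2 * score
        sigma = np.linalg.norm(d) / np.sqrt(d.size)    # effective noise level
        # Injected-noise amplitude chosen so the effective noise shrinks by
        # a factor (1 - beta*h) per step (requires beta <= 1).
        gamma = np.sqrt((1 - beta * h) ** 2 - (1 - h) ** 2) * sigma
        y = y + h * d + gamma * rng.standard_normal(shape)
    return y
```

As a toy check, a linear shrinkage `lambda y: 0.9 * y` (the optimal denoiser for a zero-mean Gaussian prior at a particular noise level, used here purely as a placeholder) drives the iterate toward small amplitude and terminates once the residual-based noise estimate falls below σ_L.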