Generalization in diffusion models arises from geometry-adaptive harmonic representations

Authors: Zahra Kadkhodaie, Florentin Guth, Eero P. Simoncelli, Stéphane Mallat

ICLR 2024

Reproducibility assessment: each variable lists the result, followed by the supporting LLM response (excerpt or explanation).
Research Type: Experimental
"We trained denoisers on subsets of the (downsampled) CelebA dataset (Liu et al., 2015) of size N = 10^0, 10^1, 10^2, 10^3, 10^4, 10^5. We used a UNet architecture (Ronneberger et al., 2015)... Results are shown in Figure 1. When N = 1, the denoiser essentially memorizes the single training image, leading to a high test error. Increasing N substantially increases the performance on the test set while worsening performance on the training set, as the network transitions from memorization to generalization. At N = 10^5, empirical test and train error are matched for all noise levels."
Researcher Affiliation: Academia
- Zahra Kadkhodaie, Ctr. for Data Science, New York University (zk388@nyu.edu)
- Florentin Guth, Ctr. for Data Science, New York University; Flatiron Institute, Simons Foundation (florentin.guth@nyu.edu)
- Eero P. Simoncelli, New York University; Flatiron Institute, Simons Foundation (eero.simoncelli@nyu.edu)
- Stéphane Mallat, Collège de France; Flatiron Institute, Simons Foundation (stephane.mallat@ens.fr)
Pseudocode: Yes
Algorithm 1: "Sampling via ascent of the log-likelihood gradient from a denoiser residual"
Open Source Code: Yes
Source code: https://github.com/LabForComputationalVision/memorization_generalization_in_diffusion_models
Open Datasets: Yes
"We trained denoisers on subsets of the (downsampled) CelebA dataset (Liu et al., 2015) of size N = 10^0, 10^1, 10^2, 10^3, 10^4, 10^5. ... For experiments shown in Figures 9 and 10, we use images drawn from the LSUN bedroom dataset (Yu et al., 2015) downsampled to 80 × 80 resolution. ... For experiments shown in Figure 11 we use the CelebA-HQ dataset (Karras et al., 2018) downsampled to 40 × 40 resolution."
Dataset Splits: No
The paper discusses training and test performance (e.g., "At N = 10^5, empirical test and train error are matched"), but it does not explicitly mention a separate validation dataset or its split details.
Hardware Specification: No
The paper acknowledges computing resources from the Flatiron Institute and NYU but does not specify particular hardware components such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies: No
The paper mentions specific network architectures such as UNet and BF-CNN, but it does not list any software libraries, frameworks, or operating systems with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup: Yes
"Training is carried out on batches of size 512, for 1000 epochs. ... We chose h = 0.01, β = 0.1, σ0 = 1, and σ = 0.05."
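The sampling procedure named in the Pseudocode row (ascent of the log-likelihood gradient from a denoiser residual) can be sketched in a few lines, using the parameters reported above (h = 0.01, β = 0.1, σ0 = 1, σ = 0.05). This is a minimal illustration following the update rule of Kadkhodaie & Simoncelli (2021) that the paper builds on, not the authors' exact implementation: the initialization around pixel value 0.5 is an assumption for images in [0, 1], and `toy_denoiser` is a placeholder standing in for the trained UNet/BF-CNN.

```python
import numpy as np

def sample(f, shape, h=0.01, beta=0.1, sigma_0=1.0, sigma_final=0.05,
           max_iters=10_000, rng=None):
    """Draw a sample by iteratively ascending the log-likelihood
    gradient implied by the denoiser residual f(y) - y."""
    rng = np.random.default_rng(rng)
    n = np.prod(shape)
    # Assumed initialization: mid-gray image plus noise of std sigma_0.
    y = 0.5 + sigma_0 * rng.standard_normal(shape)
    for _ in range(max_iters):
        d = f(y) - y                              # residual ~ sigma^2 * score
        sigma = np.linalg.norm(d) / np.sqrt(n)    # effective noise level
        if sigma <= sigma_final:
            break                                  # converged to target sigma
        # Injected-noise amplitude controlled by beta (beta = 1: no noise).
        gamma = np.sqrt(max((1 - beta * h) ** 2 - (1 - h) ** 2, 0.0)) * sigma
        y = y + h * d + gamma * rng.standard_normal(shape)
    return y

# Toy "denoiser" for illustration only: shrinks each pixel halfway
# toward mid-gray, so the residual points toward a smooth fixed point.
toy_denoiser = lambda y: 0.5 + 0.5 * (y - 0.5)
x = sample(toy_denoiser, (8, 8), rng=0)
```

With a real trained denoiser in place of `toy_denoiser`, the loop anneals the effective noise level from roughly σ0 down to the target σ, which matches the role of the four reported hyperparameters.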