Learning multi-scale local conditional probability models of images
Authors: Zahra Kadkhodaie, Florentin Guth, Stéphane Mallat, Eero P. Simoncelli
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test this model on a dataset of face images, which are highly non-stationary and contain large-scale geometric structures. Remarkably, denoising, super-resolution, and image synthesis results all demonstrate that these structures can be captured with significantly smaller conditioning neighborhoods than required by a Markov model implemented in the pixel domain. Our results show that score estimation for large complex images can be reduced to low-dimensional Markov conditional models across scales, alleviating the curse of dimensionality. Using a coarse-to-fine anti-diffusion strategy for drawing samples from the posterior (Kadkhodaie & Simoncelli, 2021), we evaluate the model on denoising, super-resolution, and synthesis, and show that locality and stationarity assumptions hold for conditional RF sizes as small as 9x9 without harming performance. We now evaluate our Markov wavelet conditional model on a denoising task. We use the CelebA dataset (Liu et al., 2015) at 160x160 resolution. Figure 3 shows that the multi-scale denoiser based on a conditional wavelet Markov model outperforms a conventional denoiser that implements a Markov probability model in the pixel domain. |
| Researcher Affiliation | Academia | Zahra Kadkhodaie, CDS, New York University (zk388@nyu.edu); Florentin Guth, DI, ENS, CNRS, PSL University (florentin.guth@ens.fr); Stéphane Mallat, Collège de France and Flatiron Institute, Simons Foundation (stephane.mallat@ens.fr); Eero P. Simoncelli, CNS, Courant, and CDS, New York University, and Flatiron Institute, Simons Foundation (eero.simoncelli@nyu.edu) |
| Pseudocode | Yes | Algorithm 1: Sampling via ascent of the log-likelihood gradient from a denoiser residual; Algorithm 2: Wavelet Conditional Synthesis (illustrative sketches of both follow the table). |
| Open Source Code | Yes | A software implementation is available at https://github.com/LabForComputationalVision/local-probability-models-of-images |
| Open Datasets | Yes | We use the CelebA dataset (Liu et al., 2015) at 160x160 resolution. Train and test images are from the CelebA-HQ dataset (Karras et al., 2018) and of size 320x320. |
| Dataset Splits | Yes | For experiments shown in Figure 3 and Figure 4, we use 202,499 training and 100 test images of resolution 160x160 from the CelebA dataset (Liu et al., 2015). For experiments shown in Figure 5, Figure 7 and Figure 6, we use 29,900 train and 100 test images, drawn from the CelebA-HQ dataset (Karras et al., 2018) at 320x320 resolution. |
| Hardware Specification | No | The paper mentions 'computing resources of the Flatiron Institute' in the acknowledgments but does not specify any particular GPU/CPU models or other hardware details used for the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | All networks contain 21 convolutional layers with no subsampling, each consisting of 64 channels. Each layer, except for the first and the last, is followed by a ReLU non-linearity and bias-free batch-normalization. All convolutional kernels in the low-pass CNN are of size 3x3, resulting in a 43x43 RF size and 665,856 parameters in total. Convolutional kernels in the cCNNs are adjusted to achieve different RF sizes. For example, a 13x13 RF arises from choosing 3x3 kernels in every 4th layer and 1x1 (i.e., pointwise linear combinations across all channels) for the rest, resulting in a total of 214,144 parameters. We follow the training procedure described in (Mohan* et al., 2020), minimizing the mean squared error in denoising images corrupted by i.i.d. Gaussian noise with standard deviations drawn from the range [0, 1] (relative to image intensity range [0, 1]). Training is carried out on batches of size 512. For the examples in Figure 5, Figure 7 and Figure 6, we chose h = 0.01, σ0 = 1, β = 0.1 and σ = 0.01. A sketch of this architecture appears after the table. |
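
For context, here is a minimal PyTorch sketch of the sampler named in the Pseudocode row (Algorithm 1, following the coarse-to-fine procedure of Kadkhodaie & Simoncelli, 2021). The function and argument names (`sample_from_denoiser`, `denoiser`, `sigma_final`) are illustrative rather than taken from the paper's released code; the default parameter values match those quoted in the Experiment Setup row.

```python
import torch

def sample_from_denoiser(denoiser, x_shape, h=0.01, beta=0.1,
                         sigma_0=1.0, sigma_final=0.01):
    """Sketch of Algorithm 1 (Kadkhodaie & Simoncelli, 2021): stochastic
    ascent of the log-likelihood gradient estimated from a blind denoiser's
    residual. The residual f(x) - x is proportional to the score of the
    noisy-image density, so each step moves toward higher probability while
    gamma re-injects a controlled amount of noise."""
    x = 0.5 + sigma_0 * torch.randn(x_shape)       # start from pure noise
    sigma = sigma_0
    while sigma > sigma_final:
        with torch.no_grad():
            d = denoiser(x) - x                    # residual ~ sigma^2 * score
        sigma = d.pow(2).mean().sqrt().item()      # effective noise level
        gamma = max((1 - beta * h) ** 2 - (1 - h) ** 2, 0.0) ** 0.5 * sigma
        x = x + h * d + gamma * torch.randn_like(x)
    return x
```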
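
Algorithm 2 composes this sampler across scales. The sketch below is one plausible reading of the quoted description, under stated assumptions: `cond_denoisers[j]` is a hypothetical conditional denoiser that denoises the three wavelet detail bands given the clean low-pass band at that scale, and `inverse_wavelet_step` is a hypothetical helper standing in for one step of the inverse orthogonal wavelet transform; neither name comes from the paper.

```python
def wavelet_conditional_synthesis(low_denoiser, cond_denoisers, coarse_shape):
    """Sketch of Algorithm 2 (Wavelet Conditional Synthesis): sample the
    coarsest low-pass band with the global low-pass model, then proceed
    coarse-to-fine, sampling the wavelet detail bands at each scale
    conditioned on the low-pass band and inverting one wavelet step."""
    x_low = sample_from_denoiser(low_denoiser, coarse_shape)
    for cond_denoiser in cond_denoisers:           # coarsest to finest scale
        # Wrap the conditional denoiser so it denoises only the details,
        # with the current low-pass band held fixed as conditioning input.
        denoise_details = lambda d: cond_denoiser(d, x_low)
        n, c, hgt, wid = x_low.shape
        details = sample_from_denoiser(denoise_details, (n, 3 * c, hgt, wid))
        x_low = inverse_wavelet_step(x_low, details)  # hypothetical helper
    return x_low
```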
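
Finally, the architecture described in the Experiment Setup row can be sketched in PyTorch as follows. This is a minimal reconstruction from the quoted description, not the released code: `make_bf_cnn` is a hypothetical name, and the placement of ReLU and normalization on all but the first and last layers follows the quoted text (see Mohan et al., 2020 for the bias-free design).

```python
import torch.nn as nn

def make_bf_cnn(depth=21, channels=64, conv3x3_every=1, in_ch=1, out_ch=1):
    """Sketch of the bias-free CNNs described above: 21 conv layers, 64
    channels, no subsampling, no additive biases. The receptive field is set
    by where 3x3 kernels go: conv3x3_every=1 puts 3x3 kernels in every layer
    (43x43 RF, the low-pass CNN), while conv3x3_every=4 uses 3x3 in every
    4th layer and 1x1 elsewhere, giving six 3x3 layers and a 13x13 RF
    (RF = 2 * number_of_3x3_layers + 1)."""
    layers = []
    for i in range(depth):
        k = 3 if i % conv3x3_every == 0 else 1
        cin = in_ch if i == 0 else channels
        cout = out_ch if i == depth - 1 else channels
        layers.append(nn.Conv2d(cin, cout, k, padding=k // 2, bias=False))
        if 0 < i < depth - 1:  # all but the first and last layers, per the text
            layers.append(nn.ReLU(inplace=True))
            # affine=False as a stand-in for the paper's bias-free batch-norm
            layers.append(nn.BatchNorm2d(cout, affine=False))
    return nn.Sequential(*layers)
```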