Density Modeling of Images using a Generalized Normalization Transformation
Authors: Johannes Ballé, Valero Laparra, Eero Simoncelli
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We optimize the parameters of the full transformation (linear transform, exponents, weights, constant) over a database of natural images, directly minimizing the negentropy of the responses. The optimized transformation substantially Gaussianizes the data, achieving a significantly smaller mutual information between transformed components than alternative methods including ICA and radial Gaussianization. We show that samples of this model are visually similar to samples of natural image patches. We demonstrate the use of the model as a prior probability density that can be used to remove additive noise. Finally, we show that the transformation can be cascaded, with each layer optimized using the same Gaussianization objective, thus offering an unsupervised method of optimizing a deep network architecture. |
| Researcher Affiliation | Academia | Johannes Ballé, Valero Laparra & Eero P. Simoncelli Center for Neural Science New York University New York, NY 10004, USA {johannes.balle,valero,eero.simoncelli}@nyu.edu EPS is also affiliated with the Courant Institute of Mathematical Sciences at NYU; VL is also affiliated with the University of València, Spain. |
| Pseudocode | No | The paper describes the transformation and inversion process mathematically and textually, but it does not include a pseudocode block or a clearly labeled algorithm. |
| Open Source Code | No | The paper does not provide any statement about releasing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | First, we computed the responses of an oriented filter (specifically, we used a subband of the steerable pyramid (Simoncelli & Freeman, 1995)) to images taken from the van Hateren dataset (van Hateren & van der Schaaf, 1998)... We also examined model behavior when applied to vectorized 16×16 blocks of pixels drawn from the Kodak set (downloaded from http://www.cipr.rpi.edu/resource/stills/kodak.html)... To further assess how our model compares to existing work, we trained the model on image patches of 8×8 pixels from the BSDS300 dataset which had the patch mean removed (see Theis & Bethge, 2015, left column of table 1). |
| Dataset Splits | No | The paper mentions "cross-validated average log likelihood" for the BSDS300 dataset, implying that validation was part of the experimental process, but it does not provide specific details on the dataset splits (e.g., percentages, number of folds, or specific counts for train/validation/test). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using "the stochastic optimization algorithm ADAM (Kingma & Ba, 2014)", but it does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other libraries). |
| Experiment Setup | Yes | We used the stochastic optimization algorithm ADAM to facilitate the optimization (Kingma & Ba, 2014) and somewhat reduced the complexity of the model by forcing α to be constant along its columns (i.e., α_ij = α_j). We constructed a two-stage model, trained greedily layer-by-layer, consisting of the transformations CICA → GDN → CICA → GDN. The first CICA instance implements a complete, invertible linear transformation with a set of 256 convolutional filters of support 48×48, with each filter response subsampled by a factor of 16 (both horizontally and vertically). The output thus consists of 256 reduced-resolution feature maps. The first GDN operation then acts on the 256-vectors of responses at a given spatial location across all maps. Thus, the responses of the first CICA → GDN stage are Gaussianized across maps, but not across spatial locations. The second-stage CICA instance is applied to vectors of first-stage responses across all maps within a 9×9 spatial neighborhood, thus seeking new non-Gaussian directions across spatial locations and across maps. |
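The GDN nonlinearity referenced in the setup row normalizes each response by a pooled, exponentiated combination of all responses at that spatial location. A minimal NumPy sketch of the forward transform is below; the function name and all parameter values are illustrative assumptions, not the paper's trained parameters, and α is held constant along columns (α_ij = α_j) as in the simplification quoted above:

```python
import numpy as np

def gdn(x, beta, gamma, alpha, epsilon):
    """Sketch of generalized divisive normalization (GDN):
        z_i = x_i / (beta_i + sum_j gamma_ij * |x_j|**alpha_j) ** epsilon_i
    x       : (N,) responses at one spatial location across N feature maps
    beta    : (N,) additive constants
    gamma   : (N, N) pooling weights
    alpha   : (N,) exponents, shared across rows (alpha_ij = alpha_j)
    epsilon : (N,) output exponents
    """
    denom = beta + gamma @ (np.abs(x) ** alpha)  # pooled activity per channel
    return x / denom ** epsilon

# Toy usage with hypothetical parameters (diagonal gamma, alpha = 2, eps = 0.5
# reduces GDN to per-channel divisive normalization by sqrt(beta + gamma*x^2)).
x = np.array([0.5, -1.0, 2.0])
z = gdn(x,
        beta=np.full(3, 0.1),
        gamma=np.eye(3) * 0.5,
        alpha=np.full(3, 2.0),
        epsilon=np.full(3, 0.5))
```

In the paper's cascade, this operation is applied after each CICA linear stage, with all parameters fit jointly by minimizing the negentropy objective via ADAM.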