PixelCNN Models with Auxiliary Variables for Natural Image Modeling
Authors: Alexander Kolesnikov, Christoph H. Lampert
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate benefits of the proposed models, in particular showing that they produce much more realistic-looking image samples than previous state-of-the-art probabilistic models. In Section 4 we experimentally study the proposed Grayscale PixelCNN and Pyramid PixelCNN models on the natural image modeling task and report quantitative and qualitative evaluation results. |
| Researcher Affiliation | Academia | Alexander Kolesnikov and Christoph H. Lampert, IST Austria, Klosterneuburg, Austria. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | No | No concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper was found. |
| Open Datasets | Yes | We evaluate the modeling performance of a Grayscale PixelCNN on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009). We rely on the aligned&cropped CelebA dataset (Liu et al., 2015) that contains approximately 200,000 images of size 218×178. |
| Dataset Splits | No | The paper specifies training and test splits (e.g., 'training set with 50,000 images and a test set with 10,000 images' for CIFAR-10, and 'random 95% subset of all images as training set and the remaining images as a test set' for CelebA), but does not explicitly describe a separate validation split (see the split sketch below the table). |
| Hardware Specification | Yes | Concretely, on an NVidia Titan X GPU, our Pyramid PixelCNN without caching optimizations requires approximately 0.004 seconds on average to generate one image pixel, while a PixelCNN++, even with recently suggested caching optimizations, requires roughly 0.05 seconds for the same task. |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014)' for optimization and the 'PixelCNN++ architecture' but does not provide specific version numbers for any software dependencies such as programming languages or libraries. |
| Experiment Setup | Yes | In the Adam optimizer we use an initial learning rate of 0.001, a batch size of 64 images and an exponential learning rate decay of 0.99999 that is applied after each iteration. We train the grayscale model $\hat{p}_\theta(\hat{X})$ for 30 epochs and the conditional model $p_\theta(X \mid \hat{X})$ for 200 epochs. In the Adam optimizer we use an initial learning rate of 0.001, a batch size of 16 and a learning rate decay of 0.999995. We train the model for 60 epochs. For the embedding $f_w(\hat{X})$ we use a PixelCNN++ architecture with 15 residual blocks, with a downsampling layer after residual block number 3 and upsampling layers after residual blocks number 9 and 12. For all convolutional layers we set the number of filters to 100. (A hedged configuration sketch follows the table.) |
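
The CelebA partition quoted under Dataset Splits is a plain random 95%/5% split. A minimal sketch of how such a split could be reproduced is below; the paper does not state a random seed or file layout, so `seed` and the list-of-paths interface are assumptions.

```python
import numpy as np

def random_split(paths, train_fraction=0.95, seed=0):
    """Randomly partition image paths into train/test subsets.

    The 95% train fraction matches the paper's description; the seed
    is an assumption, since the paper does not say how the random
    subset was drawn.
    """
    rng = np.random.default_rng(seed)
    paths = np.asarray(paths)
    perm = rng.permutation(len(paths))
    n_train = int(train_fraction * len(paths))
    return paths[perm[:n_train]], paths[perm[n_train:]]
```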
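
To make the Experiment Setup row concrete, here is a hedged sketch of the quoted optimizer settings in PyTorch. The paper releases no code and does not name a framework, so PyTorch, the stand-in model, and the placeholder loss are all assumptions; only the numeric values (learning rate 0.001, batch size 64, per-iteration exponential decay 0.99999, and the 15-block embedding layout) come from the paper.

```python
import torch
import torch.nn as nn

# Stand-in module; the real Grayscale PixelCNN is not reproduced here.
# Only the optimizer settings below mirror the quoted setup.
model = nn.Conv2d(1, 100, kernel_size=3, padding=1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr 0.001, per the paper
# Exponential learning rate decay of 0.99999, applied after every iteration.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99999)

batch = torch.randn(64, 1, 32, 32)     # batch size 64; CIFAR-10-sized inputs
for step in range(3):                  # illustrative steps; the paper trains 30 epochs
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()  # placeholder loss, not the paper's likelihood
    loss.backward()
    optimizer.step()
    scheduler.step()                   # decay after each iteration, not each epoch

# Layout of the Pyramid PixelCNN embedding f_w(X-hat), as quoted:
# 15 residual blocks, downsampling after block 3, upsampling after
# blocks 9 and 12, 100 filters in every convolutional layer.
embedding_layout = {
    "residual_blocks": 15,
    "filters": 100,
    "downsample_after_block": [3],
    "upsample_after_blocks": [9, 12],
}
```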