PixelSNAIL: An Improved Autoregressive Generative Model

Authors: Xi Chen, Nikhil Mishra, Mostafa Rohaninejad, Pieter Abbeel

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we describe the resulting model and present state-of-the-art log-likelihood results on heavily benchmarked datasets: CIFAR-10 (2.85 bits per dim), 32×32 ImageNet (3.80 bits per dim) and 64×64 ImageNet (3.52 bits per dim)."
Researcher Affiliation | Collaboration | 1 covariant.ai; 2 UC Berkeley, EECS Dept.
Pseudocode | No | Figure 4 shows diagrams of the Residual Block and Attention Block components, but these are flowcharts/schematics rather than structured text-based pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code will be made available, and can be found at: https://github.com/neocxi/pixelsnail-public."
Open Datasets | Yes | CIFAR-10, 32×32 ImageNet and 64×64 ImageNet
Dataset Splits | No | The paper mentions using Polyak averaging over training parameters and specifies dataset properties and mixture components, but does not explicitly describe the methodology for creating or using validation splits.
Hardware Specification | No | "Due to computational limits, we can only train these models on 4 GPUs but are able to outperform the previous state-of-the-art model that was trained on 32 GPUs (van den Oord et al., 2016b)."
Software Dependencies | No | The paper mentions techniques like 'Polyak averaging', 'discretized mixture of logistics' and 'Weight Normalization', but does not specify software dependencies (e.g., libraries, frameworks) with version numbers.
Experiment Setup | Yes | "For both datasets, we used residual blocks with 256 filters and 4 repeats, and attention blocks with key size 16 and value size 128. In the CIFAR-10 model only, we applied dropout of 0.5 after the first convolution in every residual block, to prevent overfitting. We used an exponential moving average weight of 0.9995 for CIFAR-10 and 0.9997 for ImageNet. As the output distribution, we use the discretized mixture of logistics introduced by Salimans et al. (2017), with 10 mixture components for CIFAR-10 and 32 for ImageNet. We used 12 blocks for both datasets."
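
To make the Experiment Setup row easier to scan, here is a minimal sketch (not the authors' released code) that collects the quoted hyperparameters into a single Python configuration; the dictionary key names are hypothetical, and only the values come from the paper.

    # Hypothetical summary of the hyperparameters quoted in the Experiment Setup row.
    # Key names are illustrative; values are taken from the paper's description.
    PIXELSNAIL_HYPERPARAMS = {
        "cifar10": {
            "num_blocks": 12,
            "residual_filters": 256,
            "residual_repeats": 4,
            "attention_key_size": 16,
            "attention_value_size": 128,
            "dropout": 0.5,                     # after the first conv in every residual block
            "ema_decay": 0.9995,                # Polyak averaging weight
            "logistic_mixture_components": 10,  # discretized mixture of logistics output
        },
        "imagenet": {                           # 32x32 and 64x64 ImageNet models
            "num_blocks": 12,
            "residual_filters": 256,
            "residual_repeats": 4,
            "attention_key_size": 16,
            "attention_value_size": 128,
            "dropout": 0.0,                     # dropout was applied only to the CIFAR-10 model
            "ema_decay": 0.9997,
            "logistic_mixture_components": 32,
        },
    }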
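
The 'Polyak averaging' referenced in the Software Dependencies and Dataset Splits rows is an exponential moving average of the training parameters, with the decay values quoted above. A minimal sketch, assuming parameters are held in a plain dictionary of arrays or floats rather than any particular framework:

    def update_ema(ema_params, params, decay=0.9995):
        # One averaging step; decay is 0.9995 (CIFAR-10) or 0.9997 (ImageNet) per the paper.
        for name, value in params.items():
            ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
        return ema_params

The averaged parameters, rather than the raw training parameters, would then be the ones used when reporting test log-likelihoods.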