PixelSNAIL: An Improved Autoregressive Generative Model
Authors: Xi Chen, Nikhil Mishra, Mostafa Rohaninejad, Pieter Abbeel
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we describe the resulting model and present state-of-the-art log-likelihood results on heavily benchmarked datasets: CIFAR-10 (2.85 bits per dim), 32×32 ImageNet (3.80 bits per dim) and 64×64 ImageNet (3.52 bits per dim). |
| Researcher Affiliation | Collaboration | 1 covariant.ai, 2 UC Berkeley, EECS Dept. |
| Pseudocode | No | Figure 4 shows diagrams of the Residual Block and Attention Block components, but these are flowcharts/schematics rather than structured text-based pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code will be made available, and can be found at: https://github.com/neocxi/pixelsnail-public. |
| Open Datasets | Yes | CIFAR-10, 32×32 ImageNet, and 64×64 ImageNet |
| Dataset Splits | No | The paper mentions using Polyak averaging over training parameters and specifies dataset properties and mixture components but does not explicitly describe the methodology for creating or using validation splits. |
| Hardware Specification | No | Due to computational limits, we can only train these models on 4 GPUs but are able to outperform the previous state-of-the-art model that was trained on 32 GPUs (van den Oord et al., 2016b). |
| Software Dependencies | No | The paper mentions techniques like 'Polyak averaging' and 'discretized mixture of logistics' and 'Weight Normalization' but does not specify software dependencies (e.g., libraries, frameworks) with version numbers. |
| Experiment Setup | Yes | For both datasets, we used residual blocks with 256 filters and 4 repeats, and attention blocks with key size 16 and value size 128. In the CIFAR-10 model only, we applied dropout of 0.5 after the first convolution in every residual block, to prevent overfitting. We used an exponential moving average weight of 0.9995 for CIFAR-10 and 0.9997 for ImageNet. As the output distribution, we use the discretized mixture of logistics introduced by Salimans et al. (2017), with 10 mixture components for CIFAR-10 and 32 for ImageNet. We used 12 blocks for both datasets. (Hedged sketches of these components appear below the table.) |
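
The quoted setup gives block-level hyperparameters (256-filter residual blocks, dropout 0.5 on CIFAR-10, attention with key size 16 and value size 128) but not the exact layer ordering, which the paper conveys only through schematic diagrams. Below is a minimal PyTorch sketch of what such a residual block and a single-head causal attention block might look like; the module names, the gated activation, and the masking details are assumptions for illustration, not the authors' reference implementation (that code is linked in the Open Source Code row).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedResidualBlock(nn.Module):
    """Residual block sketch: two convs with a gated activation.

    Filter count (256) and dropout (0.5, CIFAR-10 only) follow the quoted
    setup; the layer ordering is an assumption.
    """

    def __init__(self, channels=256, dropout=0.5):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.dropout = nn.Dropout(dropout)
        self.conv2 = nn.Conv2d(channels, 2 * channels, kernel_size=3, padding=1)

    def forward(self, x):
        h = F.elu(self.conv1(F.elu(x)))
        h = self.dropout(h)
        a, b = self.conv2(h).chunk(2, dim=1)  # gated activation: a * sigmoid(b)
        return x + a * torch.sigmoid(b)


class AttentionBlock(nn.Module):
    """Single-head causal self-attention over spatial positions.

    Key size 16 and value size 128 follow the quoted setup; the 1x1
    projections and the mask construction are assumptions.
    """

    def __init__(self, channels=256, key_size=16, value_size=128):
        super().__init__()
        self.to_q = nn.Conv2d(channels, key_size, 1)
        self.to_k = nn.Conv2d(channels, key_size, 1)
        self.to_v = nn.Conv2d(channels, value_size, 1)
        self.proj = nn.Conv2d(value_size, channels, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        q = self.to_q(x).flatten(2).transpose(1, 2)  # (B, HW, key_size)
        k = self.to_k(x).flatten(2).transpose(1, 2)
        v = self.to_v(x).flatten(2).transpose(1, 2)  # (B, HW, value_size)
        logits = q @ k.transpose(1, 2) / (q.shape[-1] ** 0.5)
        # Causal mask: each pixel attends only to itself and earlier pixels
        # in raster-scan order (the original model masks strictly earlier ones).
        mask = torch.ones(h * w, h * w, device=x.device).tril()
        logits = logits.masked_fill(mask == 0, float("-inf"))
        out = torch.softmax(logits, dim=-1) @ v
        out = out.transpose(1, 2).reshape(b, -1, h, w)
        return x + self.proj(out)
```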
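
The quoted exponential moving average weights (0.9995 for CIFAR-10, 0.9997 for ImageNet) refer to Polyak averaging of the training parameters. A generic sketch of that averaging step, assuming a standard PyTorch training loop, is given below; the `EMA` helper class is hypothetical and not taken from the paper's code.

```python
import copy

import torch


class EMA:
    """Polyak / exponential moving average of model parameters (sketch)."""

    def __init__(self, model, decay=0.9995):
        self.decay = decay
        # Frozen copy whose weights track the running average; used at eval time.
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # shadow <- decay * shadow + (1 - decay) * current weights
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
```

Usage sketch: call `ema.update(model)` after every optimizer step, then report bits per dim with `ema.shadow` rather than the raw model.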