Distribution Augmentation for Generative Modeling
Authors: Heewoo Jun, Rewon Child, Mark Chen, John Schulman, Aditya Ramesh, Alec Radford, Ilya Sutskever
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this is a more effective regularizer than standard methods, and use it to train a 152M parameter autoregressive model on CIFAR-10 to 2.56 bits per dim (relative to the state-of-the-art 2.80). Samples from this model attain FID 12.75 and IS 8.40, outperforming the majority of GANs. (See the DistAug and bits-per-dim sketches after the table.) |
| Researcher Affiliation | Industry | OpenAI, San Francisco, California, USA. Correspondence to: Heewoo Jun <heewoo@openai.com>, Rewon Child <rewon@openai.com>. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our model weights and code at https://github.com/openai/distribution_augmentation. |
| Open Datasets | Yes | In this section, we primarily study an autoregressive model (the Sparse Transformer (Child et al., 2019)) and its performance on the natural image benchmark datasets CIFAR-10 and ImageNet-64. |
| Dataset Splits | Yes | CIFAR-10 validation bits per dim of different augmentation strategies across model sizes (in millions of parameters). Baseline and horizontal flipping do not use DistAug. (Figure 3a) |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper mentions using existing codebases like those from Salimans et al. (2017) and Ho et al. (2019), but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Detailed hyperparameter settings for experiments are available in the Supplementary Material. (Section 4) For CIFAR-10, the 58M and 152M models use the same hyperparameters as in Child et al. (2019), except they are trained with a learning rate of 0.00015 for 1000 to 1500 epochs with a cosine decay (Radford et al., 2018) over 10000 epochs. Batch size for all CIFAR-10 experiments was 16. (A.1) (See the learning-rate schedule sketch after the table.) |
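
For context on the augmentation strategy referenced in the Research Type and Dataset Splits rows, the sketch below illustrates the general idea of distribution augmentation: transformed copies of the training data are added to the modeled distribution, and the model is conditioned on which transformation was applied, so sampling can later be restricted to the identity transform. The names (`AUGMENTATIONS`, `distaug_batch`) and the PyTorch framing are illustrative assumptions; the released repository linked above is the authoritative implementation.

```python
import torch

# Hypothetical set of invertible transformations; the paper studies
# e.g. horizontal flips, rotations, and transpositions. The index of the
# chosen transformation serves as the conditioning label for the model.
AUGMENTATIONS = [
    lambda x: x,                                    # identity
    lambda x: torch.flip(x, dims=[-1]),             # horizontal flip
    lambda x: torch.rot90(x, k=1, dims=[-2, -1]),   # 90-degree rotation
]

def distaug_batch(images):
    """Apply a randomly chosen transformation to each image and return
    the transformed batch plus the transformation indices used as
    conditioning labels. `images` is a (B, C, H, W) tensor."""
    labels = torch.randint(len(AUGMENTATIONS), (images.shape[0],))
    augmented = torch.stack(
        [AUGMENTATIONS[int(t)](img) for img, t in zip(images, labels)]
    )
    return augmented, labels

# Training step (sketch): the autoregressive model receives the
# transformation label as an extra conditioning input, so at sampling
# time one can condition on the identity transform to draw samples from
# the original data distribution.
# loss = model.nll(augmented, cond=labels).mean()
```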
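As a sanity check on the bits-per-dim numbers quoted in the Research Type row, the helper below applies the standard conversion from negative log-likelihood to bits per dimension; the function name and the example NLL value are illustrative, not taken from the released code.

```python
import math

def bits_per_dim(nll_nats_per_image, num_dims=32 * 32 * 3):
    """Convert an average negative log-likelihood in nats per image into
    bits per dimension. For CIFAR-10, num_dims = 32 * 32 * 3 = 3072."""
    return nll_nats_per_image / (num_dims * math.log(2))

# Example: an NLL of roughly 5450 nats per CIFAR-10 image corresponds to
# about 2.56 bits per dim.
# print(bits_per_dim(5450.0))  # ~2.56
```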
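The A.1 settings in the Experiment Setup row amount to a conventional cosine learning-rate schedule. The sketch below assumes PyTorch's Adam and CosineAnnealingLR; the stand-in model, the optimizer choice, and the step accounting are assumptions for illustration, not a reconstruction of the released training code.

```python
import torch

# Minimal sketch of the reported CIFAR-10 schedule: base learning rate
# 0.00015 with a cosine decay whose period spans 10000 epochs, even
# though training runs for only 1000 to 1500 epochs (so only the early
# part of the cosine is traversed).
model = torch.nn.Linear(3072, 3072)      # stand-in for the Sparse Transformer
steps_per_epoch = 50000 // 16            # CIFAR-10 train set, batch size 16
optimizer = torch.optim.Adam(model.parameters(), lr=1.5e-4)  # optimizer choice assumed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=10000 * steps_per_epoch  # cosine period of 10000 epochs
)
# Each training iteration would call optimizer.step() followed by
# scheduler.step(); training stops after 1000 to 1500 epochs.
```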