Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Distribution Augmentation for Generative Modeling

Authors: Heewoo Jun, Rewon Child, Mark Chen, John Schulman, Aditya Ramesh, Alec Radford, Ilya Sutskever

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate this is a more effective regularizer than standard methods, and use it to train a 152M parameter autoregressive model on CIFAR-10 to 2.56 bits per dim (relative to the state-of-the-art 2.80). Samples from this model attain FID 12.75 and IS 8.40, outperforming the majority of GANs.
Researcher Affiliation | Industry | OpenAI, San Francisco, California, USA. Correspondence to: Heewoo Jun <EMAIL>, Rewon Child <EMAIL>.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | We release our model weights and code at https://github.com/openai/distribution_augmentation.
Open Datasets | Yes | In this section, we primarily study an autoregressive model (the Sparse Transformer (Child et al., 2019)) and its performance on the natural image benchmark datasets CIFAR-10 and ImageNet-64.
Dataset Splits | Yes | CIFAR-10 validation bits per dim of different augmentation strategies across model sizes (in millions of parameters). Baseline and horizontal flipping do not use DistAug. (Figure 3a)
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU models, CPU types, memory).
Software Dependencies | No | The paper mentions using existing codebases, such as those from Salimans et al. (2017) and Ho et al. (2019), but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Detailed hyperparameter settings for experiments are available in the Supplementary Material. (Section 4) For CIFAR-10, 58M and 152M models use the same hyperparameters as the ones in (Child et al., 2019) except they are trained with a learning rate of 0.00015 for 1000-1500 epochs with a cosine decay (Radford et al., 2018) over 10000 epochs. Batch size for all CIFAR-10 experiments was 16. (A.1)
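The bits-per-dim figure quoted in the Research Type row is the standard density-modeling metric: the model's negative log-likelihood, converted from nats to bits and averaged over the image's dimensions (32 x 32 x 3 = 3072 for CIFAR-10). A minimal sketch of that conversion; the NLL value below is hypothetical, chosen only so the result matches the reported 2.56 bits/dim:

```python
import math

def bits_per_dim(nll_nats_per_image: float, num_dims: int) -> float:
    """Convert a negative log-likelihood in nats per image to bits per dimension."""
    return nll_nats_per_image / (num_dims * math.log(2))

# CIFAR-10 images: 32x32 pixels, 3 color channels.
dims = 32 * 32 * 3  # 3072 dimensions

# Hypothetical per-image NLL, constructed to correspond to 2.56 bits/dim.
nll = 2.56 * dims * math.log(2)

print(round(bits_per_dim(nll, dims), 2))  # 2.56
```

Lower is better: 2.56 bits/dim means the model compresses each pixel channel to about 2.56 bits on average, versus the 2.80 state-of-the-art baseline the quote compares against.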