Generating images with sparse representations

Authors: Charlie Nash, Jacob Menick, Sander Dieleman, Peter Battaglia

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On a range of image datasets, we demonstrate that our approach can generate high quality, diverse images, with sample metric scores competitive with state of the art methods. We additionally show that simple modifications to our method yield effective image colorization and super-resolution models.
Researcher Affiliation | Industry | DeepMind, London, United Kingdom. Correspondence to: Charlie Nash <charlienash@google.com>.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code or links to a code repository for the methodology described.
Open Datasets | Yes | For all FID scores we follow Brock et al. (2019) and compare 50k model samples to features computed on the entire training set. LSUN datasets (Yu et al., 2015), Flickr-Faces-HQ (FFHQ, Karras et al. (2019)), as well as class-conditional ImageNet (Russakovsky et al., 2015). We train a model on Open Images V4 (Kuznetsova et al., 2018). (a) Plant leaves (Chouhan et al., 2019) (b) Diabetic retinopathy (Kaggle & EyePACS, 2015) (c) CLEVR (Johnson et al., 2017). A minimal sketch of the quoted FID protocol appears after this table.
Dataset Splits | No | The paper mentions using a "held-out set" for overfitting detection and refers to a "Val. set" in Table 2 for evaluation, but it does not provide specific split percentages or sample counts for training, validation, and test sets. It notes only that FID scores are computed against the entire training set and that upsampled validation images are compared to the original validation set.
Hardware Specification | Yes | For our class-conditional ImageNet model (738M parameters) sampling on a TPUv3 (Google, 2018) takes 20-30 minutes for a batch of 24 samples, resulting in sparse DCT sequences of length 20-40k.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | We use a target chunk size of 896 in all our experiments, and add an overlap of size 128 into the input chunk, so that the first predictions in the target chunk can attend directly into their preceding elements. We use quality-parameterized quantization matrices for 8x8 blocks as defined by the Independent JPEG Group in all our experiments. Sketches of the chunking scheme and the IJG quantization scaling follow this table.
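
As context for the FID protocol quoted in the Open Datasets row, here is a minimal sketch of the Fréchet Inception Distance between two feature sets. It assumes Inception features have already been extracted from real and generated images (e.g., 50k samples each, per the quoted protocol); the function name and the feature-extraction step are illustrative, not from the paper.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    """FID between two sets of Inception features, each of shape (N, D)."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)
    # Matrix square root of the product of the two covariances.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerics
    diff = mu1 - mu2
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```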
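
The chunked sequence processing described in the Experiment Setup row can be illustrated directly. This is a minimal sketch assuming a flat list of sequence elements; the paper's actual pipeline operates on sparse DCT sequences and is not specified at this level of detail.

```python
TARGET_CHUNK = 896  # target chunk size from the quoted setup
OVERLAP = 128       # overlap added into the input chunk

def make_chunks(sequence):
    """Split a sequence into (input, target) chunk pairs.

    Each input chunk carries an extra OVERLAP elements from the end of the
    previous chunk, so the first predictions in the target chunk can attend
    directly into their preceding elements. The first chunk has no preceding
    context, so its input and target regions coincide.
    """
    chunks = []
    for start in range(0, len(sequence), TARGET_CHUNK):
        target = sequence[start:start + TARGET_CHUNK]
        inputs = sequence[max(0, start - OVERLAP):start + TARGET_CHUNK]
        chunks.append((inputs, target))
    return chunks
```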
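
The quality-parameterized quantization matrices follow the standard Independent JPEG Group convention: an 8x8 base table is scaled by a factor derived from a quality setting in [1, 100]. The sketch below uses the standard JPEG luminance base table as an assumption; the quoted text does not say which base table or per-channel variants the paper uses.

```python
import numpy as np

# Standard JPEG luminance base quantization table (Annex K); assumed here.
BASE_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def ijg_quant_matrix(quality, base=BASE_LUMA):
    """Scale an 8x8 base table by an IJG quality factor in [1, 100]."""
    quality = max(1, min(100, quality))
    scale = 5000 / quality if quality < 50 else 200 - 2 * quality
    table = np.floor((base * scale + 50) / 100)
    return np.clip(table, 1, 255).astype(int)
```

At quality 50 the scale factor is 100, so the base table is returned essentially unchanged; at quality 100 every entry clamps to 1, i.e., near-lossless quantization.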