Generating images with sparse representations
Authors: Charlie Nash, Jacob Menick, Sander Dieleman, Peter Battaglia
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On a range of image datasets, we demonstrate that our approach can generate high quality, diverse images, with sample metric scores competitive with state of the art methods. We additionally show that simple modifications to our method yield effective image colorization and super-resolution models. |
| Researcher Affiliation | Industry | DeepMind, London, United Kingdom. Correspondence to: Charlie Nash <charlienash@google.com>. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code or links to a code repository for the methodology described. |
| Open Datasets | Yes | For all FID scores we follow Brock et al. (2019) and compare 50k model samples to features computed on the entire training set. LSUN datasets (Yu et al., 2015), Flickr-Faces-HQ (FFHQ, Karras et al. (2019)) as well as class-conditional ImageNet (Russakovsky et al., 2015). We train a model on Open Images V4 (Kuznetsova et al., 2018). (a) Plant leaves (Chouhan et al., 2019) (b) Diabetic retinopathy (Kaggle & EyePACS, 2015) (c) CLEVR (Johnson et al., 2017). A minimal FID sketch follows the table. |
| Dataset Splits | No | The paper mentions using a "held-out set" for overfitting detection and refers to "Val. set" in Table 2 for evaluation. However, it does not provide specific split percentages or sample counts for training, validation, and test sets. It only mentions that FID scores are computed against the "entire training set" and compares upsampled validation images to the original validation set. |
| Hardware Specification | Yes | For our class-conditional ImageNet model (738M parameters) sampling on a TPUv3 (Google, 2018) takes 20-30 minutes for a batch of 24 samples, resulting in sparse DCT sequences of length 20-40k. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We use a target chunk size of 896 in all our experiments, and add an overlap of size 128 into the input chunk, so that the first predictions in the target chunk can attend directly into their preceding elements. We use quality-parameterized quantization matrices for 8x8 blocks as defined by the Independent JPEG Group in all our experiments. Sketches of the chunking scheme and the quality-scaled quantization matrices follow the table. |
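
The FID protocol quoted in the Open Datasets row (50k model samples compared against features computed on the entire training set) reduces to a closed-form distance between two Gaussians fit to the feature sets. Below is a minimal sketch, assuming Inception features have already been extracted into NumPy arrays; the function name and array shapes are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_samples, feats_train):
    """Frechet Inception Distance between two feature sets.

    Each argument is an (N, D) array of Inception activations,
    e.g. 50k sample features vs. full-training-set features.
    """
    mu_s, mu_t = feats_samples.mean(axis=0), feats_train.mean(axis=0)
    cov_s = np.cov(feats_samples, rowvar=False)
    cov_t = np.cov(feats_train, rowvar=False)
    # Matrix square root of the covariance product; tiny imaginary
    # components from numerical error are discarded.
    covmean = sqrtm(cov_s @ cov_t)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_s - mu_t
    return float(diff @ diff + np.trace(cov_s + cov_t - 2.0 * covmean))
```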
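The chunking scheme in the Experiment Setup row (target chunks of size 896, with a 128-element overlap prepended to the input chunk so the first target predictions can attend to their predecessors) can be sketched as follows. This is a hypothetical helper, assuming the sparse DCT sequence is a flat Python list; the paper does not give its implementation.

```python
def overlapping_chunks(seq, target_len=896, overlap=128):
    """Split a sequence into (input, target) chunk pairs.

    Each input chunk is the target chunk plus up to `overlap`
    preceding elements, so the first predictions in each target
    chunk can attend directly to their preceding elements.
    """
    pairs = []
    for start in range(0, len(seq), target_len):
        ctx_start = max(0, start - overlap)
        inputs = seq[ctx_start:start + target_len]
        targets = seq[start:start + target_len]
        pairs.append((inputs, targets))
    return pairs
```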
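The quality-parameterized quantization matrices mentioned in the same row follow the Independent JPEG Group convention: a base 8x8 table is scaled by a quality factor and clamped. The sketch below uses the standard IJG scaling rule with the JPEG Annex K luminance table as the base; whether the paper uses exactly this base table is an assumption.

```python
import numpy as np

# Standard JPEG luminance quantization table (ITU-T T.81, Annex K).
BASE_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quality_quant_matrix(quality, base=BASE_LUMA):
    """IJG-style quality scaling of an 8x8 quantization table."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    q = (base * scale + 50) // 100
    return np.clip(q, 1, 255).astype(np.int32)
```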