Illiterate DALL-E Learns to Compose

Authors: Gautam Singh, Fei Deng, Sungjin Ahn

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we show that this simple and easy-to-implement architecture not requiring a text prompt achieves significant improvement in in-distribution and out-of-distribution (zero-shot) image generation and qualitatively comparable or better slot-attention structure than the models based on mixture decoders. https://sites.google.com/view/slate-autoencoder
Researcher Affiliation | Academia | Gautam Singh¹, Fei Deng¹ & Sungjin Ahn²; ¹Rutgers University, ²KAIST
Pseudocode | No | No explicit pseudocode or algorithm blocks were found.
Open Source Code | Yes | The implementation is available at https://github.com/singhgautam/slate.
Open Datasets | Yes | We evaluate the models on 7 datasets which contain composable objects: 3D Shapes (Burgess & Kim, 2018), CLEVR-Mirror which we develop from the CLEVR (Johnson et al., 2017) dataset by adding a mirror in the scene, Shapestacks (Groth et al., 2018), Bitmoji (Graux, 2021), Textured MNIST, CLEVRTex (Karazija et al., 2021) and CelebA.
Dataset Splits | Yes | If the validation loss does not decrease for 8 consecutive epochs, we reduce the learning rate by a factor of 1/2.
Hardware Specification | No | The paper mentions "GPU Usage" and "Days" for training cost in Tables 7 and 8 (e.g., "GPU Usage 8GB", "GPU Usage 64GB"), but does not specify exact GPU models (e.g., NVIDIA A100, Tesla V100) or other hardware components like CPU or memory details for the machines used.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) were mentioned.
Experiment Setup | Yes | We report the hyperparameters and the computational resources required for training our model in Table 8. ... Table 7: Hyperparameters used for our model and computation requirements for 3D Shapes, CLEVR-Mirror, Shapestacks and Bitmoji. ... Table 8: Hyperparameters used for our model and computation requirements for Textured-MNIST, CelebA and CLEVRTex.
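
The learning-rate rule quoted under Dataset Splits (halve the rate when the validation loss has not decreased for 8 consecutive epochs) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `PlateauHalver`, the starting rate of 1.0, and the choice to measure "decrease" against the best loss seen so far (rather than against the previous epoch) are all assumptions made for the example.

```python
class PlateauHalver:
    """Hypothetical helper: halve the learning rate when validation loss stalls."""

    def __init__(self, lr, patience=8, factor=0.5):
        self.lr = lr                # current learning rate
        self.patience = patience    # epochs without improvement before reducing
        self.factor = factor        # multiplicative reduction (1/2 in the quoted rule)
        self.best = float("inf")    # best validation loss seen so far
        self.stale_epochs = 0       # consecutive epochs without a new best

    def step(self, val_loss):
        """Record one epoch's validation loss and return the (possibly reduced) rate."""
        if val_loss < self.best:
            # Assumption: "decrease" means strictly below the best loss so far.
            self.best = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.lr *= self.factor
                self.stale_epochs = 0  # start counting again after a reduction
        return self.lr


# Placeholder starting rate; the paper's actual values are in its Tables 7 and 8.
scheduler = PlateauHalver(lr=1.0)
scheduler.step(2.0)                 # first epoch sets the best loss
for _ in range(8):                  # eight epochs with no improvement
    current_lr = scheduler.step(2.0)
```

After the loop, `current_lr` has been halved once. Whether the counter should reset after each reduction, as it does here, is not stated in the quoted sentence.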