Illiterate DALL-E Learns to Compose
Authors: Gautam Singh, Fei Deng, Sungjin Ahn
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we show that this simple and easy-to-implement architecture not requiring a text prompt achieves significant improvement in in-distribution and out-of-distribution (zero-shot) image generation and qualitatively comparable or better slot-attention structure than the models based on mixture decoders. Project page: https://sites.google.com/view/slate-autoencoder |
| Researcher Affiliation | Academia | Gautam Singh¹, Fei Deng¹ & Sungjin Ahn² (¹Rutgers University, ²KAIST) |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | The implementation is available at https://github.com/singhgautam/slate. |
| Open Datasets | Yes | We evaluate the models on 7 datasets which contain composable objects: 3D Shapes (Burgess & Kim, 2018), CLEVR-Mirror which we develop from the CLEVR (Johnson et al., 2017) dataset by adding a mirror in the scene, Shapestacks (Groth et al., 2018), Bitmoji (Graux, 2021), Textured MNIST, CLEVRTex (Karazija et al., 2021) and CelebA. |
| Dataset Splits | Yes | The cited evidence only implies a held-out validation set, via the learning-rate schedule: "If the validation loss does not decrease for 8 consecutive epochs, we reduce the learning rate by a factor of 1/2." |
| Hardware Specification | No | The paper reports "GPU Usage" and "Days" as training cost in Tables 7 and 8 (e.g., "GPU Usage 8GB", "GPU Usage 64GB"), but does not specify the GPU models (e.g., NVIDIA A100, Tesla V100), CPUs, or system memory of the machines used. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) were mentioned. |
| Experiment Setup | Yes | We report the hyperparameters and the computational resources required for training our model in Table 8. ... Table 7: Hyperparameters used for our model and computation requirements for 3D Shapes, CLEVR-Mirror, Shapestacks and Bitmoji. ... Table 8: Hyperparameters used for our model and computation requirements for Textured-MNIST, CelebA and CLEVRTex. |
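The learning-rate schedule quoted under Dataset Splits ("If the validation loss does not decrease for 8 consecutive epochs, we reduce the learning rate by a factor of 1/2") can be sketched as a small plateau scheduler. This is a minimal illustration, not code from the SLATE repository: the class name `PlateauHalving`, the initial rate of 1e-4, the strict "loss must go down" test, and the convention that the rate is halved on the 8th consecutive non-improving epoch are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class PlateauHalving:
    """Hypothetical helper (not from the SLATE code): halve the learning
    rate after `patience` consecutive epochs without a decrease in
    validation loss."""
    lr: float
    patience: int = 8
    factor: float = 0.5
    best: float = math.inf
    bad_epochs: int = 0

    def step(self, val_loss: float) -> float:
        """Call once per epoch with the validation loss; returns the new rate."""
        if val_loss < self.best:
            # Validation loss decreased: remember it and reset the counter.
            self.best = val_loss
            self.bad_epochs = 0
        else:
            # No decrease this epoch.
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                # `patience` consecutive non-improving epochs: halve the rate
                # and start counting again.
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


if __name__ == "__main__":
    sched = PlateauHalving(lr=1e-4)  # initial rate is an assumption
    # One improving epoch, 8 flat epochs, one improvement, 8 flat epochs.
    losses = [1.0] + [1.0] * 8 + [0.9] + [0.95] * 8
    for loss in losses:
        sched.step(loss)
    print(sched.lr)  # halved twice
```

In PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=8)` behaves similarly, but it reduces the rate only when the count of non-improving epochs exceeds `patience` (that is, on the 9th such epoch), and by default it counts an epoch as improving only if the loss drops by a small relative threshold (`threshold=1e-4`). The paper does not say which convention was used.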