Generating Images from Captions with Attention
Authors: Elman Mansimov, Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | After training on Microsoft COCO, we compare our model with several baseline generative models on image generation and retrieval tasks. We demonstrate that our model produces higher quality samples than other approaches and generates images with novel scene compositions corresponding to previously unseen captions in the dataset. |
| Researcher Affiliation | Academia | Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba & Ruslan Salakhutdinov Department of Computer Science University of Toronto Toronto, Ontario, Canada {emansim,eparisotto,rsalakhu}@cs.toronto.edu, jimmy@psi.utoronto.ca |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It presents mathematical equations for its model. |
| Open Source Code | Yes | The code is available at https://github.com/emansim/text2image. |
| Open Datasets | Yes | Microsoft COCO (Lin et al., 2014) is a large dataset containing 82,783 images, each annotated with at least 5 captions. |
| Dataset Splits | No | The paper states 'Table 3 shows the estimated variational lower bounds on the average train/validation/test log-probabilities,' indicating that these splits were used. However, it does not provide specific percentages or counts for the Microsoft COCO splits, which would be necessary to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper thanks the 'developers of Theano (Bastien et al., 2012)' in the acknowledgements, indicating Theano was used. However, it does not specify a version number for Theano or for any other software dependency. |
| Experiment Setup | Yes | Training details, hyperparameter settings, and the overall model architecture are specified in Appendix B. Each parameter in alignDRAW was initialized by sampling from a Gaussian distribution with mean 0 and standard deviation 0.01. The model was trained using RMSprop with an initial learning rate of 0.001. For the Microsoft COCO task, the model was trained for 18 epochs, with the learning rate reduced to 0.0001 after 11 epochs. |
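The training recipe in the last row (Gaussian initialization with std 0.01, RMSprop at learning rate 0.001, 18 epochs with a drop to 0.0001 after epoch 11) can be sketched as a minimal script. This is an illustrative reconstruction, not the paper's code: the toy quadratic loss and the RMSprop `decay`/`eps` values are assumptions, since the paper does not report them.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(shape, std=0.01):
    """Each parameter sampled from N(0, 0.01), as stated in the paper."""
    return rng.normal(0.0, std, size=shape)

def learning_rate(epoch):
    """Initial lr 0.001, reduced to 0.0001 after 11 epochs (COCO schedule)."""
    return 0.001 if epoch < 11 else 0.0001

def rmsprop_step(w, grad, cache, lr, decay=0.9, eps=1e-8):
    """One RMSprop update. decay/eps are common defaults, not from the paper."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Toy quadratic objective 0.5 * ||w||^2, used only to exercise the schedule.
w = init_params((4,))
cache = np.zeros_like(w)
for epoch in range(18):      # 18 epochs, as reported for Microsoft COCO
    lr = learning_rate(epoch)
    grad = w                 # gradient of 0.5 * ||w||^2
    w, cache = rmsprop_step(w, grad, cache, lr)

print(float(np.linalg.norm(w)))
```

The schedule function makes the reported learning-rate drop explicit, which is the only part of the optimizer configuration the paper states precisely.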