Generating Images from Captions with Attention
Authors: Elman Mansimov, Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov
ICLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | After training on Microsoft COCO, we compare our model with several baseline generative models on image generation and retrieval tasks. We demonstrate that our model produces higher quality samples than other approaches and generates images with novel scene compositions corresponding to previously unseen captions in the dataset. |
| Researcher Affiliation | Academia | Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba & Ruslan Salakhutdinov Department of Computer Science University of Toronto Toronto, Ontario, Canada {emansim,eparisotto,rsalakhu}@cs.toronto.edu, jimmy@psi.utoronto.ca |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It presents mathematical equations for its model. |
| Open Source Code | Yes | The code is available at https://github.com/emansim/text2image. |
| Open Datasets | Yes | Microsoft COCO (Lin et al., 2014) is a large dataset containing 82,783 images, each annotated with at least 5 captions. |
| Dataset Splits | No | The paper states 'Table 3 shows the estimated variational lower bounds on the average train/validation/test log-probabilities,' indicating that these splits were used. However, it does not provide specific percentages or counts for the Microsoft COCO splits, which would be necessary to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper thanks the 'developers of Theano (Bastien et al., 2012)' in the acknowledgements, indicating Theano was used. However, it does not specify a version number for Theano or for any other software dependency. |
| Experiment Setup | Yes | Training details, hyperparameter settings, and the overall model architecture are specified in Appendix B. Each parameter in alignDRAW was initialized by sampling from a Gaussian distribution with mean 0 and standard deviation 0.01. The model was trained using RMSprop with an initial learning rate of 0.001. For the Microsoft COCO task, the model was trained for 18 epochs, with the learning rate reduced to 0.0001 after 11 epochs. |
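The training recipe in the last row (Gaussian initialization with std 0.01, RMSprop at learning rate 0.001, 18 epochs with a drop to 0.0001 after epoch 11) can be sketched as a minimal script. This is an illustrative reconstruction, not the paper's code: the toy quadratic loss and the RMSprop `decay`/`eps` values are assumptions, since the paper does not report them.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(shape, std=0.01):
    """Each parameter sampled from N(0, 0.01), as stated in the paper."""
    return rng.normal(0.0, std, size=shape)

def learning_rate(epoch):
    """Initial lr 0.001, reduced to 0.0001 after 11 epochs (COCO schedule)."""
    return 0.001 if epoch < 11 else 0.0001

def rmsprop_step(w, grad, cache, lr, decay=0.9, eps=1e-8):
    """One RMSprop update. decay/eps are common defaults, not from the paper."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Toy quadratic objective 0.5 * ||w||^2, used only to exercise the schedule.
w = init_params((4,))
cache = np.zeros_like(w)
for epoch in range(18):      # 18 epochs, as reported for Microsoft COCO
    lr = learning_rate(epoch)
    grad = w                 # gradient of 0.5 * ||w||^2
    w, cache = rmsprop_step(w, grad, cache, lr)

print(float(np.linalg.norm(w)))
```

The schedule function makes the reported learning-rate drop explicit, which is the only part of the optimizer configuration the paper states precisely.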