Generative Adversarial Text to Image Synthesis

Authors: Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the capability of our model to generate plausible images of birds and flowers from detailed text descriptions. We compare the GAN baseline, our GAN-CLS with image-text matching discriminator (subsection 4.2), GAN-INT learned with text manifold interpolation (subsection 4.3), and GAN-INT-CLS, which combines both. Results on CUB can be seen in Figure 3, and results on the Oxford-102 Flowers dataset in Figure 4. To quantify the degree of disentangling on CUB, we set up two prediction tasks with noise z as the input: pose verification and background color verification. We present results in Figure 5.
Researcher Affiliation | Academia | 1) University of Michigan, Ann Arbor, MI, USA (UMICH.EDU); 2) Max Planck Institute for Informatics, Saarbrücken, Germany (MPI-INF.MPG.DE)
Pseudocode | Yes | Algorithm 1: GAN-CLS training algorithm with step size α, using minibatch SGD for simplicity. (A hedged training-step sketch follows the table.)
Open Source Code | No | Our implementation was built on top of dcgan.torch (https://github.com/soumith/dcgan.torch). The provided link is to a general DCGAN framework, not specific code for the methodology or modifications presented in this paper.
Open Datasets | Yes | We mainly use the Caltech-UCSD Birds (CUB) dataset and the Oxford-102 Flowers dataset, along with five text descriptions per image that we collected, as our evaluation setting. CUB has 11,788 images of birds belonging to one of 200 different categories. The Oxford-102 contains 8,189 images of flowers from 102 different categories.
Dataset Splits | Yes | As in Akata et al. (2015) and Reed et al. (2016), we split these into class-disjoint training and test sets. CUB has 150 train+val classes and 50 test classes, while Oxford-102 has 82 train+val and 20 test classes. (A class-disjoint split sketch follows the table.)
Hardware Specification | No | The paper does not specify the hardware used (e.g., GPU models, CPU types, or memory specifications) for running the experiments.
Software Dependencies | No | Our implementation was built on top of dcgan.torch. This names a software framework but does not provide specific version numbers for it or for other key dependencies (e.g., Torch, CUDA, or cuDNN versions).
Experiment Setup | Yes | The training image size was set to 64 × 64 × 3. The text encoder produced 1,024-dimensional embeddings that were projected to 128 dimensions in both the generator and discriminator. We used the same base learning rate of 0.0002, and used the ADAM solver (Ba & Kingma, 2015) with momentum 0.5. The generator noise was sampled from a 100-dimensional unit normal distribution. We used a minibatch size of 64 and trained for 600 epochs. (A configuration sketch follows the table.)
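
To make the Pseudocode row concrete: Algorithm 1 scores three discriminator pairings per step, namely {real image, matching text}, {real image, mismatching text}, and {fake image, matching text}. Below is a minimal PyTorch-style sketch of one such GAN-CLS step, offered as an illustration rather than the authors' Torch implementation; the module names (G, D, text_encoder), the optimizer-based updates, and the use of binary cross-entropy in place of the paper's explicit log terms are assumptions.

```python
import torch
import torch.nn.functional as F

def gan_cls_step(G, D, text_encoder, opt_G, opt_D,
                 real_images, matching_text, mismatching_text, z_dim=100):
    """One GAN-CLS update, following the structure of Algorithm 1.

    Assumes G(z, h) returns an image and D(image, h) returns a probability
    that the pair is (real image, matching text). Illustrative only.
    """
    batch_size = real_images.size(0)

    # Encode matching and mismatching text descriptions.
    h = text_encoder(matching_text)         # phi(t)
    h_hat = text_encoder(mismatching_text)  # phi(t_hat)

    # Draw noise and generate a fake image conditioned on the matching text.
    z = torch.randn(batch_size, z_dim, device=real_images.device)
    fake_images = G(z, h)

    # Discriminator scores for the three GAN-CLS pairings.
    s_r = D(real_images, h)           # real image, right text
    s_w = D(real_images, h_hat)       # real image, wrong text
    s_f = D(fake_images.detach(), h)  # fake image, right text

    ones, zeros = torch.ones_like(s_r), torch.zeros_like(s_r)

    # Discriminator loss: the two "wrong" pairings are averaged, mirroring
    # the paper's (log(1 - s_w) + log(1 - s_f)) / 2 term.
    loss_D = F.binary_cross_entropy(s_r, ones) + \
             0.5 * (F.binary_cross_entropy(s_w, zeros) +
                    F.binary_cross_entropy(s_f, zeros))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator loss: fool the discriminator on the (fake image, right text) pair.
    loss_G = F.binary_cross_entropy(D(fake_images, h), ones)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()

    return loss_D.item(), loss_G.item()
```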
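
The Dataset Splits row describes class-disjoint splits: no bird or flower category appears in both training and test. The actual splits follow Akata et al. (2015) rather than a fresh random draw, so the sketch below only illustrates the class-disjoint property with the reported class counts (CUB: 150 train+val / 50 test; Oxford-102: 82 / 20); the helper name and seed are assumptions.

```python
import random

def class_disjoint_split(class_ids, num_train_val, seed=0):
    """Split category labels (not individual images) into disjoint
    train+val and test sets. Illustrative only; the paper reuses the
    fixed splits of Akata et al. (2015)."""
    rng = random.Random(seed)
    shuffled = list(class_ids)
    rng.shuffle(shuffled)
    train_val = set(shuffled[:num_train_val])
    test = set(shuffled[num_train_val:])
    return train_val, test

# CUB: 200 bird categories -> 150 train+val classes, 50 test classes.
cub_train_val, cub_test = class_disjoint_split(range(1, 201), num_train_val=150)
# Oxford-102: 102 flower categories -> 82 train+val classes, 20 test classes.
flowers_train_val, flowers_test = class_disjoint_split(range(1, 103), num_train_val=82)
assert cub_train_val.isdisjoint(cub_test) and flowers_train_val.isdisjoint(flowers_test)
```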
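
The Experiment Setup row maps directly onto a DCGAN-style training configuration. The sketch below wires the reported values into Adam optimizers; the placeholder linear modules, the placement of the text-projection layer, and the second Adam beta (0.999) are assumptions, since the paper states only the learning rate and the 0.5 momentum term.

```python
import torch
import torch.nn as nn

# Values reported in the paper's experiment setup.
IMAGE_SIZE = 64            # training images are 64 x 64 x 3
TEXT_EMBED_DIM = 1024      # text encoder embedding size
PROJECTED_TEXT_DIM = 128   # text projection used in both G and D
Z_DIM = 100                # generator noise dimension
BATCH_SIZE = 64
EPOCHS = 600
LEARNING_RATE = 2e-4
ADAM_BETA1 = 0.5           # "momentum 0.5" for the ADAM solver

# Placeholder modules stand in for the real DCGAN-style G and D;
# beta2 = 0.999 is an assumed default, not stated in the paper.
G = nn.Linear(Z_DIM + PROJECTED_TEXT_DIM, IMAGE_SIZE * IMAGE_SIZE * 3)
D = nn.Linear(IMAGE_SIZE * IMAGE_SIZE * 3 + PROJECTED_TEXT_DIM, 1)
text_projection = nn.Linear(TEXT_EMBED_DIM, PROJECTED_TEXT_DIM)

opt_G = torch.optim.Adam(G.parameters(), lr=LEARNING_RATE, betas=(ADAM_BETA1, 0.999))
opt_D = torch.optim.Adam(
    list(D.parameters()) + list(text_projection.parameters()),
    lr=LEARNING_RATE, betas=(ADAM_BETA1, 0.999),
)
noise = torch.randn(BATCH_SIZE, Z_DIM)  # 100-dimensional unit normal noise per sample
```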