Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling
Authors: Hao Zhang, Bo Chen, Long Tian, Zhengjue Wang, Mingyuan Zhou
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For joint image-text learning, following previous work, we evaluate the proposed VHE-StackGAN++ and VHE-raster-scan-GAN on three datasets: CUB (Wah et al., 2011), Flower (Nilsback & Zisserman, 2008), and COCO (Lin et al., 2014), as described in Appendix F. Besides the usual text-to-image generation task, due to the distinct bidirectional inference capability of the proposed models, we can perform a rich set of additional tasks such as image-to-text, image-to-image, and noise-to-image-text-pair generations. Due to space constraints, we present below some representative results, and defer additional ones to the Appendix. We provide the details of our experimental settings in Appendix F. PyTorch code is provided at https://github.com/BoChenGroup/VHE-GAN. |
| Researcher Affiliation | Academia | Hao Zhang, Bo Chen, Long Tian, Zhengjue Wang (National Laboratory of Radar Signal Processing, Xidian University, Xi'an, China; zhanghao xidian@163.com, bchen@mail.xidian.edu.cn, tianlong xidian@163.com, zhengjuewang@163.com); Mingyuan Zhou (McCombs School of Business, The University of Texas at Austin, Austin, TX 78712, USA; mingyuan.zhou@mccombs.utexas.edu) |
| Pseudocode | Yes | Algorithm 1 Hybrid TLASGR-MCMC/VHE inference algorithm for VHE-raster-scan-GAN. |
| Open Source Code | Yes | PyTorch code is provided at https://github.com/BoChenGroup/VHE-GAN. |
| Open Datasets | Yes | For joint image-text learning, following previous work, we evaluate the proposed VHE-Stack GAN++ and VHE-raster-scan-GAN on three datasets: CUB (Wah et al., 2011), Flower (Nilsback & Zisserman, 2008), and COCO (Lin et al., 2014), as described in Appendix F. |
| Dataset Splits | Yes | For CUB, there are two split settings: a hard one and an easy one. The hard one ensures that bird subspecies belonging to the same super-category fall entirely within either the training split or the test split, with no overlap, referred to as CUB-hard (CUB-H in our manuscript). A recently used split setting (Qiao et al., 2016; Akata et al., 2015) is the super-category split, where for each super-category all subspecies except one, which is left as unseen, are used for training, referred to as CUB-easy (CUB-E in our manuscript). For CUB-H, there are 150 species containing 9410 samples for training and 50 species containing 2378 samples for testing. For CUB-E, there are 150 species containing 8855 samples for training and 50 species containing 2933 samples for testing. We use both splits for text-based ZSL, and only CUB-E for all the other experiments, as is usual. For text-based ZSL, we follow the same protocol as Elhoseiny et al. (2017a) to split the data. Specifically, five random splits are performed, in each of which 4/5 of the classes are treated as seen classes for training and 1/5 of the classes as unseen classes for testing. For other experiments, we follow Zhang et al. (2017b) to split the data. |
| Hardware Specification | Yes | We train VHE-raster-scan-GAN on four Nvidia GeForce RTX 2080 Ti GPUs. The experiments are performed with mini-batch size 32 and about 30.2 GB of GPU memory. |
| Software Dependencies | No | The paper mentions "Py Torch code is provided" but does not specify a version number for PyTorch or any other software libraries or dependencies with specific version numbers. |
| Experiment Setup | Yes | We run 600 epochs to train the models on CUB and Flower, taking about 797 seconds per epoch for CUB-E and 713 seconds per epoch for Flower. We run 100 epochs to train the models on COCO, taking about 6315 seconds per epoch. We use the Adam optimizer (Kingma & Ba, 2014) with learning rate 2×10⁻⁴, β1 = 0.5, and β2 = 0.999 to optimize the parameters of the GAN generator and discriminator, and use Adam with learning rate 10⁻⁴, β1 = 0.9, and β2 = 0.999 to optimize the VHE parameters. The hyper-parameters used to update the topics Φ with TLASGR-MCMC are the same as those in Cong et al. (2017). |
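The five-fold ZSL split protocol quoted in the Dataset Splits row (five random splits, each holding out 1/5 of the classes as unseen) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `zsl_splits` and the fixed seed are assumptions made for the example.

```python
import random

def zsl_splits(classes, n_splits=5, seed=0):
    """Hypothetical sketch of the text-based ZSL protocol: for each of
    n_splits random splits, 4/5 of the classes are seen (training) and
    the remaining 1/5 are unseen (testing)."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        shuffled = list(classes)
        rng.shuffle(shuffled)
        cut = len(shuffled) * 4 // 5
        splits.append((shuffled[:cut], shuffled[cut:]))  # (seen, unseen)
    return splits

# With the 200 CUB classes, each split has 160 seen and 40 unseen classes.
seen, unseen = zsl_splits(range(200))[0]
```

Note that the seen/unseen partition is re-drawn independently for each split, matching the "five random splits" wording rather than a fixed 5-fold cross-validation.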
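The optimizer settings quoted in the Experiment Setup row map directly onto two `torch.optim.Adam` instances. The sketch below shows only that wiring; the three `nn.Linear` modules are hypothetical placeholders standing in for the actual generator, discriminator, and VHE networks defined in the authors' repository.

```python
import torch
from torch import nn

# Placeholder modules; the real architectures live in the VHE-GAN repository.
generator = nn.Linear(100, 64)
discriminator = nn.Linear(64, 1)
vhe = nn.Linear(64, 32)

# GAN generator/discriminator: Adam with lr = 2e-4, (beta1, beta2) = (0.5, 0.999).
gan_opt = torch.optim.Adam(
    list(generator.parameters()) + list(discriminator.parameters()),
    lr=2e-4, betas=(0.5, 0.999),
)

# VHE parameters: Adam with lr = 1e-4, (beta1, beta2) = (0.9, 0.999).
vhe_opt = torch.optim.Adam(vhe.parameters(), lr=1e-4, betas=(0.9, 0.999))
```

The lower β1 = 0.5 on the GAN side is a common stabilization choice for adversarial training, while the VHE side keeps the Adam default of 0.9.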