Variational Autoencoder for Deep Learning of Images, Labels and Captions

Authors: Yunchen Pu, Zhe Gan, Ricardo Henao, Xin Yuan, Chunyuan Li, Andrew Stevens, Lawrence Carin

NeurIPS 2016

Reproducibility assessment: each variable is listed with its result and the supporting LLM response.
Research Type: Experimental
  'We first present image classification results on MNIST, CIFAR-10 & -100 [23], Caltech 101 [24] & 256 [25], and ImageNet 2012 datasets. For Caltech 101 and Caltech 256, we use 30 and 60 images per class for training, respectively. The predictions are based on averaging the decision values of Ns = 50 collected samples from the approximate posterior distribution over the latent variables, qφ(s|X). As a reference for computational cost, our model takes about 5 days to train on ImageNet.'
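A minimal sketch of the prediction rule quoted above: decision values are averaged over Ns = 50 samples drawn from the approximate posterior qφ(s|X) before taking the arg-max. The `encoder` and `classifier` callables are illustrative placeholders, not the authors' Theano implementation.

```python
import numpy as np

def predict(encoder, classifier, X, n_samples=50):
    """Average per-class decision values over posterior samples of the latent code."""
    decision_values = []
    for _ in range(n_samples):
        s = encoder.sample(X)                   # draw s ~ q_phi(s | X)
        decision_values.append(classifier(s))   # per-class decision values for this sample
    avg = np.mean(decision_values, axis=0)      # Monte-Carlo average over the Ns samples
    return np.argmax(avg, axis=-1)              # predicted class labels
```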
Researcher Affiliation: Collaboration
  Department of Electrical and Computer Engineering, Duke University ({yp42, zg27, r.henao, cl319, ajs104, lcarin}@duke.edu); Nokia Bell Labs, Murray Hill (xyuan@bell-labs.com)
Pseudocode: No
  The paper does not contain structured pseudocode or algorithm blocks; it describes the methods using mathematical equations and textual explanations.
Open Source Code: No
  The paper states 'All the experiments of our models are implemented in Theano [22] using a NVIDIA GeForce GTX TITAN X GPU with 12GB memory.' but provides no link to, or explicit statement about, the availability of the source code for the models.
Open Datasets: Yes
  'We first present image classification results on MNIST, CIFAR-10 & -100 [23], Caltech 101 [24] & 256 [25], and ImageNet 2012 datasets.' 'We present image captioning results on three benchmark datasets: Flickr8k [29], Flickr30k [30] and Microsoft (MS) COCO [31].'
Dataset Splits: Yes
  'We randomly split the 60,000 training samples into a 50,000-sample training set and a 10,000-sample validation set (used to evaluate early stopping).' 'We use 1000 images for validation, 1000 for test and the rest for training on Flickr8k and Flickr30k. For MS COCO, 5000 images are used for both validation and testing.'
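For the MNIST split described above, a minimal sketch (assuming `train_images` and `train_labels` already hold the full 60,000-sample training set; the random seed is an arbitrary illustrative choice, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(seed=0)                 # seed chosen for illustration only
perm = rng.permutation(60_000)                      # shuffle the 60,000 training indices
train_idx, val_idx = perm[:50_000], perm[50_000:]

X_train, y_train = train_images[train_idx], train_labels[train_idx]   # 50,000 samples for training
X_val, y_val = train_images[val_idx], train_labels[val_idx]           # 10,000 samples for validation / early stopping
```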
Hardware Specification: Yes
  'All the experiments of our models are implemented in Theano [22] using a NVIDIA GeForce GTX TITAN X GPU with 12GB memory.'
Software Dependencies: No
  The paper mentions 'All the experiments of our models are implemented in Theano [22]' and 'The Adam algorithm [20] with learning rate 0.0002 is utilized', but it does not specify version numbers for Theano or any other software libraries, which would be necessary for reproducibility.
Experiment Setup: Yes
  'The Adam algorithm [20] with learning rate 0.0002 is utilized for optimization of the variational learning expressions in Section 4. We use mini-batches of size 64. Gradients are clipped if the norm of the parameter vector exceeds 5, as suggested in [21].'
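The reported optimization settings (Adam with learning rate 0.0002, mini-batches of size 64, norm-based gradient clipping at 5) map onto a standard training loop. The original code was written in Theano; the sketch below uses PyTorch purely for illustration, and `model`, `loss_fn`, and `dataset` are assumed placeholders rather than the authors' objects.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)            # learning rate 0.0002
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

for X, y in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)                                       # variational objective (Section 4 of the paper)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # clip when the norm exceeds 5
    optimizer.step()
```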