VAE Learning via Stein Variational Gradient Descent
Authors: Yunchen Pu, Zhe Gan, Ricardo Henao, Chunyuan Li, Shaobo Han, Lawrence Carin
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Excellent performance is demonstrated across multiple unsupervised and semi-supervised problems, including semi-supervised analysis of the ImageNet data, demonstrating the scalability of the model to large datasets. |
| Researcher Affiliation | Academia | Yunchen Pu, Zhe Gan, Ricardo Henao, Chunyuan Li, Shaobo Han, Lawrence Carin; Department of Electrical and Computer Engineering, Duke University; {yp42, zg27, r.henao, cl319, shaobo.han, lcarin}@duke.edu |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We consider five benchmark datasets: MNIST and four text corpora: 20 Newsgroups (20News), New York Times (NYT), Science and RCV1-v2 (RCV2). For MNIST, we used the standard split of 50K training, 10K validation and 10K test examples. ... ImageNet 2012: We consider scalability of our model to large datasets. We split the 1.3 million training images into an unlabeled and labeled set... |
| Dataset Splits | Yes | For MNIST, we used the standard split of 50K training, 10K validation and 10K test examples. The data are partitioned into 10,314 training, 1,000 validation and 7,531 test documents [20 Newsgroups]. The latter three text corpora consist of 133K, 166K and 794K documents. These three datasets are split into 1K validation, 10K testing and the rest for training. (A split sketch appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components and techniques like Adam optimizer, dropout, softplus activation, and leaky rectified activation, but does not specify version numbers for any libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | We set M = 100 and k = 50, and use minibatches of size 64 for all experiments, unless otherwise specified. The samples of θ and z, and parameters of the recognition model, η, are optimized via Adam [9] with learning rate 0.0002. We do not perform any dataset-specific tuning or regularization other than dropout [32] and early stopping on validation sets. (A minimal configuration sketch appears after the table.) |
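
The text-corpus splits quoted in the Dataset Splits row (1K validation, 10K testing, and the rest for training) are simple to reconstruct. The sketch below is a hypothetical illustration only: the paper releases no code, and the function name, random seed, and use of a random permutation are assumptions rather than details from the paper.

```python
import numpy as np

def split_corpus(num_documents, n_val=1000, n_test=10000, seed=0):
    """Split a corpus into validation, test and training indices using the
    sizes quoted in the paper: 1K validation, 10K test, rest for training.
    The random permutation and seed are assumptions, not from the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_documents)
    val_idx = idx[:n_val]
    test_idx = idx[n_val:n_val + n_test]
    train_idx = idx[n_val + n_test:]
    return train_idx, val_idx, test_idx

# Hypothetical usage on a 133K-document corpus such as NYT:
train_idx, val_idx, test_idx = split_corpus(133_000)
```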
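
Likewise, the optimizer settings in the Experiment Setup row (Adam with learning rate 0.0002, minibatches of size 64, dropout inside the networks, early stopping on the validation set) map onto a standard training loop. The sketch below assumes PyTorch; the model, epoch budget, and early-stopping patience are placeholders, and the paper's Stein variational updates for the samples of θ and z are not reproduced here.

```python
from torch import optim

# Values quoted in the paper.
BATCH_SIZE = 64
LEARNING_RATE = 2e-4
# Assumptions: the paper states neither an epoch budget nor a patience value.
MAX_EPOCHS = 200
PATIENCE = 10

def train(model, train_loader, val_loader, evaluate):
    """Generic Adam training loop with early stopping on a validation set.
    `model` is assumed to return its training objective when called on a
    minibatch; `evaluate` returns a validation loss. Dropout is assumed to
    be applied inside `model` itself."""
    optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
    best_val, stale_epochs = float("inf"), 0
    for epoch in range(MAX_EPOCHS):
        model.train()
        for x, _ in train_loader:          # loader built with batch_size=64
            optimizer.zero_grad()
            loss = model(x)                # placeholder training objective
            loss.backward()
            optimizer.step()
        val_loss = evaluate(model, val_loader)
        if val_loss < best_val:
            best_val, stale_epochs = val_loss, 0
        else:
            stale_epochs += 1
            if stale_epochs >= PATIENCE:   # early stopping on validation loss
                break
    return model
```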