Variational Autoencoder with Implicit Optimal Priors

Authors: Hiroshi Takahashi, Tomoharu Iwata, Yuki Yamanaka, Masanori Yamada, Satoshi Yagi

AAAI 2019, pp. 5066-5073

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on various datasets show that the VAE with this implicit optimal prior achieves high density estimation performance.
Researcher Affiliation | Industry | Hiroshi Takahashi,¹ Tomoharu Iwata,² Yuki Yamanaka,³ Masanori Yamada,³ Satoshi Yagi¹ (¹NTT Software Innovation Center, ²NTT Communication Science Laboratories, ³NTT Secure Platform Laboratories); {takahashi.hiroshi, iwata.tomoharu, yamanaka.yuki, yamada.m, yagi.satoshi}@lab.ntt.co.jp
Pseudocode | Yes | Algorithm 1 shows the pseudo code of the optimization procedure of this model, where K is the minibatch size of SGD. (A hedged sketch of this loop is given after the table.)
Open Source Code | No | The paper does not provide concrete access to its source code, nor does it explicitly state that its code is being released.
Open Datasets | Yes | We used five datasets: One Hot (Mescheder, Nowozin, and Geiger 2017), MNIST (Salakhutdinov and Murray 2008), OMNIGLOT (Burda, Grosse, and Salakhutdinov 2015), Frey Faces, and Histopathology (Tomczak and Welling 2016).
Dataset Splits | Yes | Table 1 of the paper gives the dimension and train/validation/test sizes of each dataset; for example, MNIST has dimension 784 with 50,000 training, 10,000 validation, and 10,000 test examples.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions 'Adam (Kingma and Ba 2014)' as an optimizer, but does not provide specific ancillary software details with version numbers for libraries or full software environments.
Experiment Setup | Yes | We used two-layer neural networks (500 hidden units per layer) for the encoder, the decoder, and the density ratio estimator. We trained all methods using Adam (Kingma and Ba 2014) with a mini-batch size of 100 and a learning rate in {10^-4, 10^-3}. We set the maximum number of epochs to 1,000 and used early stopping (Goodfellow, Bengio, and Courville 2016) on the basis of validation data. We set the sample size of the reparameterization trick to L = 1. In addition, we used warm-up (Bowman et al. 2015) for the first 100 epochs of Adam. With our approach, we used dropout (Srivastava et al. 2014) when training the density ratio estimator, since it is prone to over-fitting; the keep probability was set to 50%. We updated the density ratio estimator's parameters ψ for 10 epochs for every single epoch of updates to the VAE parameters θ and φ. We set the sample size of the Monte Carlo approximation in Eq. (20) to M = N. (An epoch-level sketch of this schedule is given below, after the optimization sketch.)
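
The Pseudocode and Experiment Setup rows describe an alternating optimization: a density ratio estimator is fitted by logistic regression to distinguish encoder samples from standard-normal samples, and its logit output corrects the standard KL term so that the prior implicitly becomes the aggregated posterior. Below is a minimal PyTorch-style sketch of the minibatch-level updates, reconstructed from the table above; the framework, module layout, latent dimensionality, and all names are assumptions, not the authors' released code (which, per the Open Source Code row, is not available).

```python
# Hedged sketch of the alternating updates behind Algorithm 1, assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

D_X, D_Z, D_H = 784, 8, 500  # data dim, assumed latent dim, 500 hidden units per the paper

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(D_X, D_H), nn.ReLU(),
                                 nn.Linear(D_H, D_H), nn.ReLU())
        self.mu, self.logvar = nn.Linear(D_H, D_Z), nn.Linear(D_H, D_Z)
    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

encoder = Encoder()
decoder = nn.Sequential(nn.Linear(D_Z, D_H), nn.ReLU(),
                        nn.Linear(D_H, D_H), nn.ReLU(),
                        nn.Linear(D_H, D_X))           # Bernoulli logits over pixels
ratio_net = nn.Sequential(nn.Linear(D_Z, D_H), nn.ReLU(),
                          nn.Dropout(0.5),             # 50% dropout per the setup row
                          nn.Linear(D_H, D_H), nn.ReLU(),
                          nn.Dropout(0.5),
                          nn.Linear(D_H, 1))           # T_psi(z), approximates log q(z)/p(z)

opt_vae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_ratio = torch.optim.Adam(ratio_net.parameters(), lr=1e-3)

def reparameterize(mu, logvar):
    # single-sample reparameterization trick (L = 1)
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

def ratio_step(x):
    # logistic-regression density-ratio estimation:
    # encoder samples labeled 1, standard-normal samples labeled 0
    with torch.no_grad():
        mu, logvar = encoder(x)
        z_q = reparameterize(mu, logvar)
    z_p = torch.randn_like(z_q)
    logits = torch.cat([ratio_net(z_q), ratio_net(z_p)])
    labels = torch.cat([torch.ones(len(z_q), 1), torch.zeros(len(z_p), 1)])
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    opt_ratio.zero_grad(); loss.backward(); opt_ratio.step()

def vae_step(x, beta=1.0):
    # standard ELBO terms plus the density-ratio correction E_q[T_psi(z)];
    # opt_vae never touches ratio_net's parameters, so only theta and phi move here
    mu, logvar = encoder(x)
    z = reparameterize(mu, logvar)
    recon = F.binary_cross_entropy_with_logits(decoder(z), x, reduction="sum") / len(x)
    kl_std = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    correction = ratio_net(z).mean()
    loss = recon + beta * (kl_std - correction)  # beta ramps 0 -> 1 during warm-up
    opt_vae.zero_grad(); loss.backward(); opt_vae.step()

if __name__ == "__main__":
    x = torch.rand(100, D_X).round()   # stand-in binary minibatch of size 100
    for _ in range(10):                # ratio estimator updated more often than the VAE
        ratio_step(x)
    vae_step(x, beta=0.1)
```

The design rests on the standard density-ratio-trick argument: at the optimum of the logistic-regression loss, the classifier's logit approximates log q_φ(z)/p(z), so its mean over encoder samples can stand in for the otherwise intractable correction that turns the N(0, I) prior into the aggregated posterior.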
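
For the epoch-level schedule in the Experiment Setup row (warm-up over the first 100 epochs, ten density-ratio epochs per VAE epoch, early stopping on validation data), a hedged continuation of the sketch above might look as follows. The toy loaders, the evaluate helper, and the patience value are placeholders that the paper does not specify.

```python
# Continues the previous sketch (reuses encoder, decoder, reparameterize,
# ratio_step, vae_step, D_X). All loaders and the patience value are assumptions.
import torch
import torch.nn.functional as F

train_loader = [torch.rand(100, D_X).round() for _ in range(5)]   # toy stand-ins
valid_loader = [torch.rand(100, D_X).round() for _ in range(2)]

def evaluate(loader):
    # placeholder validation score; the paper monitors held-out performance
    with torch.no_grad():
        return -sum(F.binary_cross_entropy_with_logits(
            decoder(reparameterize(*encoder(x))), x, reduction="sum").item()
            for x in loader)

MAX_EPOCHS, WARMUP_EPOCHS, PATIENCE = 1000, 100, 20  # patience is an assumption

best_valid, epochs_since_best = float("-inf"), 0
for epoch in range(MAX_EPOCHS):
    beta = min(1.0, (epoch + 1) / WARMUP_EPOCHS)      # linear warm-up of the KL weight
    for _ in range(10):                               # psi: 10 epochs per VAE epoch
        for x in train_loader:
            ratio_step(x)
    for x in train_loader:                            # theta, phi: one epoch
        vae_step(x, beta=beta)
    valid_score = evaluate(valid_loader)
    if valid_score > best_valid:
        best_valid, epochs_since_best = valid_score, 0
    else:
        epochs_since_best += 1
        if epochs_since_best >= PATIENCE:             # early stopping on validation data
            break
```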