MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders

Authors: Xuezhe Ma, Chunting Zhou, Eduard Hovy

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on three image benchmark datasets demonstrate that, when equipped with powerful decoders, our model performs well both on density estimation and representation learning.
Researcher Affiliation | Academia | Xuezhe Ma, Chunting Zhou & Eduard Hovy, Carnegie Mellon University, {xuezhem, ctzhou, ehovy}@cs.cmu.edu
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide explicit links to open-source code for the described methodology, nor does it state that code is made available.
Open Datasets | Yes | We evaluate MAE on two binary image datasets that are commonly used for evaluating deep generative models: MNIST (LeCun et al., 1998) and OMNIGLOT (Lake et al., 2013; Burda et al., 2015), both in the dynamically binarized version (Burda et al., 2015). ... In addition to binary image datasets, we also applied MAE to the CIFAR-10 dataset (Krizhevsky & Hinton, 2009) of natural images.
Dataset Splits | No | The paper does not explicitly provide dataset split details (e.g., percentages or counts for training, validation, and test sets) beyond naming the datasets used.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using the "Adam optimizer (JLB, 2015)" but does not specify version numbers for any software libraries, frameworks, or programming languages used.
Experiment Setup | Yes | For hyperparameters η and γ, we explored a few configurations: η is selected from [0.5, 1.0, 2.0], and γ from [0.1, 0.5, 1.0]. ... In terms of training, we use the Adam optimizer (JLB, 2015) with learning rate 0.001, instead of the Adamax used in Chen et al. (2017a). 0.01 nats/data-dim free bits was used in all the experiments. In order to get a relatively accurate approximation of L_diverse and L_smooth, we used a much larger batch size of 100 in our experiments. Polyak averaging (Polyak & Juditsky, 1992) was used to compute the final parameters, with α = 0.999.
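The Open Datasets row above cites dynamically binarized MNIST and OMNIGLOT (Burda et al., 2015). A minimal sketch of that preprocessing, assuming PyTorch/torchvision; the `DynamicBinarize` transform name is illustrative, not from the paper:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class DynamicBinarize:
    """Dynamic binarization (Burda et al., 2015): each time an image is
    loaded, every pixel is resampled as a Bernoulli draw whose probability
    is the grayscale intensity, so no single fixed binarization is memorized."""

    def __call__(self, img: torch.Tensor) -> torch.Tensor:
        return torch.bernoulli(img)

transform = transforms.Compose([
    transforms.ToTensor(),   # maps pixel values into [0, 1]
    DynamicBinarize(),       # fresh Bernoulli sample on every access
])

train_set = datasets.MNIST("data/", train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=100, shuffle=True)  # batch size 100, as reported
```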
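The Experiment Setup row reports Adam with learning rate 0.001, batch size 100, 0.01 nats/dim free bits, and Polyak averaging with α = 0.999. A minimal sketch of that configuration, assuming PyTorch; the model, the loss terms, and the exact placement of the free-bits clamp are placeholders, not the authors' released code (none is linked):

```python
import torch
from torch.optim import Adam
from torch.optim.swa_utils import AveragedModel

# Placeholder model; the paper's encoder/decoder architectures are not reproduced here.
model = torch.nn.Sequential(
    torch.nn.Linear(784, 400), torch.nn.ReLU(), torch.nn.Linear(400, 64)
)

# Adam with learning rate 0.001, as reported.
optimizer = Adam(model.parameters(), lr=1e-3)

# Polyak averaging with alpha = 0.999: an exponential moving average of the parameters.
ema_model = AveragedModel(
    model, avg_fn=lambda avg, cur, num_averaged: 0.999 * avg + (1 - 0.999) * cur
)

FREE_BITS = 0.01  # nats per dimension, as reported ("0.01 nats/data-dim free bits")

def free_bits_kl(kl_per_dim: torch.Tensor) -> torch.Tensor:
    """One common free-bits variant: clamp each dimension's KL term from
    below so the optimizer gains nothing by pushing it under the budget."""
    return torch.clamp(kl_per_dim, min=FREE_BITS).sum(dim=-1)

# Sketch of one training step (loss terms are placeholders for the paper's objective):
#   loss = nll + free_bits_kl(kl_per_dim).mean() + eta * l_diverse + gamma * l_smooth
#   with eta searched over {0.5, 1.0, 2.0} and gamma over {0.1, 0.5, 1.0}
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
#   ema_model.update_parameters(model)   # apply Polyak averaging after each step
```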