MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders
Authors: Xuezhe Ma, Chunting Zhou, Eduard Hovy
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three image benchmark datasets demonstrate that, when equipped with powerful decoders, our model performs well both on density estimation and representation learning. |
| Researcher Affiliation | Academia | Xuezhe Ma, Chunting Zhou & Eduard Hovy Carnegie Mellon University {xuezhem, ctzhou, ehovy}@cs.cmu.edu |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide explicit links to open-source code for the described methodology or state that code is made available. |
| Open Datasets | Yes | We evaluate MAE on two binary image datasets that are commonly used for evaluating deep generative models: MNIST (LeCun et al., 1998) and OMNIGLOT (Lake et al., 2013; Burda et al., 2015), both in their dynamically binarized versions (Burda et al., 2015). ... In addition to the binary image datasets, we also applied MAE to the CIFAR-10 dataset (Krizhevsky & Hinton, 2009) of natural images. (A dynamic-binarization sketch follows the table.) |
| Dataset Splits | No | The paper does not explicitly provide details on dataset splits (e.g., specific percentages or counts for training, validation, and test sets), beyond mentioning the datasets used. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "Adam optimizer (JLB, 2015)" but does not specify version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | For hyperparameters η and γ, we explored a few configurations: η is selected from [0.5, 1.0, 2.0], and γ from [0.1, 0.5, 1.0]. ... In terms of training, we use the Adam optimizer (JLB, 2015) with learning rate 0.001, instead of the Adamax used in Chen et al. (2017a). 0.01 nats/data-dim of free bits was used in all the experiments. In order to get a relatively accurate approximation of L_diverse and L_smooth, we used a much larger batch size of 100 in our experiments. Polyak averaging (Polyak & Juditsky, 1992) was used to compute the final parameters, with α = 0.999. |
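
The Open Datasets row notes that MNIST and OMNIGLOT are used in their dynamically binarized form (Burda et al., 2015). Below is a minimal sketch of that preprocessing step, assuming PyTorch tensors with pixel intensities in [0, 1]; the function name is illustrative and not taken from the paper.

```python
import torch

def dynamic_binarize(images: torch.Tensor) -> torch.Tensor:
    """Dynamic binarization (Burda et al., 2015): treat each grayscale pixel
    intensity in [0, 1] as a Bernoulli probability and resample a binary
    image every time a batch is drawn, so the binarization changes across
    epochs rather than being fixed once."""
    return torch.bernoulli(images)
```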
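
The Experiment Setup row pins down the optimizer, batch size, free-bits value, and Polyak averaging, but no code is released. The sketch below shows one way that configuration could be wired up in PyTorch; the stand-in network and the comments about the loss are assumptions, not the authors' implementation.

```python
import copy
import torch
from torch import nn

# Stand-in network: the actual MAE encoder/decoder is not public, so a small
# placeholder module is used here only to make the setup concrete.
model = nn.Sequential(nn.Linear(784, 400), nn.ReLU(), nn.Linear(400, 64))

# Adam with learning rate 0.001, as stated in the Experiment Setup row.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Polyak averaging (Polyak & Juditsky, 1992) with alpha = 0.999: keep an
# exponential moving average of the parameters and report results with it.
avg_model = copy.deepcopy(model)
ALPHA = 0.999

def update_polyak_average(avg_module: nn.Module, live_module: nn.Module,
                          alpha: float = ALPHA) -> None:
    with torch.no_grad():
        for p_avg, p in zip(avg_module.parameters(), live_module.parameters()):
            p_avg.mul_(alpha).add_(p, alpha=1.0 - alpha)

# Per the paper: batches of 100 dynamically binarized images, and the KL term
# is clipped at 0.01 nats per data dimension ("free bits"). The loss itself
# depends on the MAE objective and is omitted here; after each
# optimizer.step(), call update_polyak_average(avg_model, model).
```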