The Usual Suspects? Reassessing Blame for VAE Posterior Collapse

Authors: Bin Dai, Ziyu Wang, David Wipf

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section we empirically demonstrate the existence of bad AE local minima with high reconstruction errors at increasing depth, as well as the association between these bad minima and imminent VAE posterior collapse. For this purpose, we first train fully connected AE and VAE models with 1, 2, 4, 6, 8 and 10 hidden layers on the Fashion MNIST dataset (Xiao et al., 2017). Each hidden layer is 512-dimensional and followed by ReLU activations (see the supplementary file for further details). The reconstruction error is shown in Figure 1 (top) (see supplementary for repeated trials and error bars, as well as complementary FID scores)."
Researcher Affiliation | Collaboration | Samsung Research China-Beijing; Tsinghua University; AWS Shanghai AI Lab.
Pseudocode | No | The paper presents mathematical formulations and theoretical proofs for VAE models, but it includes no pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no statement about making its source code publicly available and provides no link to a code repository.
Open Datasets | Yes | "For this purpose, we first train fully connected AE and VAE models with 1, 2, 4, 6, 8 and 10 hidden layers on the Fashion MNIST dataset (Xiao et al., 2017)." "We next train AE and VAE models using a more complex convolutional network on Cifar100 data (Krizhevsky & Hinton, 2009)."
Dataset Splits | No | The paper mentions training on Fashion MNIST and Cifar100, but it does not specify exact training/validation/test split percentages, absolute sample counts, or predefined splits with citations in the main text.
Hardware Specification | No | The paper describes the experimental setup and results for training deep learning models but gives no details about the hardware used, such as GPU or CPU models or cloud computing specifications.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions) that would be needed to replicate the experiments.
Experiment Setup | Yes | "For this purpose, we first train fully connected AE and VAE models with 1, 2, 4, 6, 8 and 10 hidden layers on the Fashion MNIST dataset (Xiao et al., 2017). Each hidden layer is 512-dimensional and followed by ReLU activations." "At each spatial scale, we use 1 to 5 convolution layers followed by ReLU activations."
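The fully connected setup quoted above (a variable number of 512-unit ReLU hidden layers on 784-dimensional Fashion MNIST inputs) can be sketched as below. This is a minimal NumPy sketch, not the authors' implementation: the latent dimension (64), the He-style random initialization, and the symmetric encoder/decoder layout are assumptions, since the paper defers exact architectural details to its supplementary file.

```python
import numpy as np

def make_mlp_ae(input_dim=784, hidden_dim=512, n_hidden=2, latent_dim=64, seed=0):
    """Randomly initialize a fully connected autoencoder with n_hidden
    ReLU layers of hidden_dim units on each side of a latent bottleneck.
    latent_dim and the init scheme are assumptions, not from the paper."""
    rng = np.random.default_rng(seed)
    dims = ([input_dim] + [hidden_dim] * n_hidden + [latent_dim]
            + [hidden_dim] * n_hidden + [input_dim])
    # He-style scaling for ReLU layers (assumed)
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def forward(params, x):
    """Forward pass: ReLU after every layer except the final reconstruction."""
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

# Reconstruct a dummy batch of 8 flattened 28x28 images.
x = np.random.default_rng(1).standard_normal((8, 784))
x_hat = forward(make_mlp_ae(n_hidden=2), x)
```

Varying `n_hidden` over 1, 2, 4, 6, 8 and 10 reproduces the depth sweep the quoted text describes; the paper's experiments train these models and compare reconstruction error against the matched VAE variants.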