Posterior Collapse of a Linear Latent Variable Model
Authors: Zihao Wang, Liu Ziyin
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section empirically examines our theoretical claims for linear models and demonstrates that our key theoretical insights generalize well to nonlinear models and natural data. We illustrate our results on both synthetic data and natural data. For synthetic data, we sample input data x from a multivariate normal distribution N(0, A), and the target data y = Mx is obtained by a linear transformation. Specifically, we choose d₀ = d₂ = 5. As an example of natural data, we also experiment with the standard MNIST data. The model is optimized by Adam with a learning rate of 10⁻³. The results are reported after convergence. For MNIST, the learning rate is 10⁻⁴. (A hedged data-generation sketch follows this table.) |
| Researcher Affiliation | Academia | Zihao Wang, Department of CSE, HKUST; Liu Ziyin, Department of Physics, The University of Tokyo |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] The experiments are only for demonstration and are straightforward to reproduce following the theory. |
| Open Datasets | Yes | As an example of natural data, we also experiment with the standard MNIST data. |
| Dataset Splits | No | The paper states it uses 'synthetic data' and 'standard MNIST data' but does not specify any training, validation, or test splits (e.g., 80/10/10 split, or specific sample counts for each). |
| Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] They are done on a single 3080Ti GPU. |
| Software Dependencies | No | The paper mentions that the model is optimized by 'Adam', which implies some deep learning framework, but it does not name any software libraries or frameworks with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | Yes | For non-linear VAE models, we consider two-layer fully connected neural networks for the encoder and decoder with both ReLU and Tanh activation functions and with hidden dimension dₕ. For the synthetic dataset dₕ = 8, and dₕ = 2048 for real-world data. The model is optimized by Adam with a learning rate of 10⁻³. The results are reported after convergence. For MNIST, the learning rate is 10⁻⁴. (A hedged model sketch follows this table.) |
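
For concreteness, below is a minimal sketch of the synthetic data generation quoted in the Research Type row: x sampled from N(0, A) with d₀ = d₂ = 5 and targets y = Mx. The paper does not specify the covariance A, the linear map M, the sample count, or the framework, so the random choices, `n_samples`, and the use of PyTorch below are all illustrative assumptions.

```python
import torch

torch.manual_seed(0)  # seed only for reproducibility of this illustrative sketch

# Dimensions quoted in the paper: d0 = d2 = 5.
d0, d2 = 5, 5
n_samples = 10_000  # assumed; the paper does not state a sample count

# Assumed covariance A: a random symmetric positive-definite matrix.
B = torch.randn(d0, d0)
A = B @ B.T + 1e-3 * torch.eye(d0)

# Sample x ~ N(0, A) via the Cholesky factor L of A (so L @ L.T == A).
L = torch.linalg.cholesky(A)
x = torch.randn(n_samples, d0) @ L.T

# Targets y = M x for an assumed linear map M (unspecified in the paper).
M = torch.randn(d2, d0)
y = x @ M.T
```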
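
The Experiment Setup row describes the nonlinear models only in prose, so here is a minimal sketch of a two-layer fully connected VAE matching that description: encoder and decoder are two-layer MLPs with ReLU or Tanh activations and hidden dimension dₕ, trained with Adam at the quoted learning rates. The class name `TwoLayerVAE`, the latent dimension `d_latent`, and the PyTorch implementation are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class TwoLayerVAE(nn.Module):
    """Sketch of the two-layer fully connected VAE described in the paper
    (ReLU or Tanh activations; d_h = 8 for synthetic data, 2048 for MNIST).
    The latent dimension is an assumption; the quoted text does not fix it."""

    def __init__(self, d_in, d_out, d_latent, d_h, act=nn.ReLU):
        super().__init__()
        # Encoder outputs the mean and log-variance of q(z|x).
        self.encoder = nn.Sequential(
            nn.Linear(d_in, d_h), act(), nn.Linear(d_h, 2 * d_latent)
        )
        self.decoder = nn.Sequential(
            nn.Linear(d_latent, d_h), act(), nn.Linear(d_h, d_out)
        )

    def forward(self, x):
        mu, log_var = self.encoder(x).chunk(2, dim=-1)
        # Reparameterization trick: z = mu + sigma * eps.
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        return self.decoder(z), mu, log_var

# Optimizer settings quoted in the paper: Adam, lr = 1e-3 (synthetic), 1e-4 (MNIST).
model = TwoLayerVAE(d_in=5, d_out=5, d_latent=2, d_h=8, act=nn.Tanh)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

Swapping `act=nn.Tanh` for `nn.ReLU` and `d_h=8` for `d_h=2048` covers the two configurations the paper reports (synthetic vs. MNIST).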