On the Value of Infinite Gradients in Variational Autoencoder Models
Authors: Bin Dai, Li Wenliang, David Wipf
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results are displayed in Figure 4(a), where as expected the reconstruction errors are nearly identical, but the learnable γ case leads to much lower MMD values, indicative of a better local solution with reduced under-regularization. We also plot the evolution of the gradient magnitudes ‖dL(θ,φ)/dz‖₂ in Figure 4(b) (other gradients are similar). When γ is learned, the gradient increases slowly; however, with γ fixed to a small value, there is a large gradient right from the start, since γ is small while the reconstruction error is still high. This contributes to a worse final solution per the results in Figure 4(a). (Minimal illustrative sketches of the learnable-γ loss and of an MMD diagnostic follow this table.) |
| Researcher Affiliation | Collaboration | Bin Dai, Institute for Advanced Study, Tsinghua University (daib09physics@hotmail.com); Li K. Wenliang, Gatsby Computational Neuroscience Unit, University College London (kevinli@gatsby.ucl.ac.uk); David Wipf, Shanghai AI Research Lab, Amazon Web Services (davidwipf@gmail.com) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to code repositories. |
| Open Datasets | Yes | Additionally, in the supplementary we demonstrate that indeed, if the inlier data (in this case Fashion-MNIST samples) come from a low-dimensional manifold, outlier points (MNIST samples) can be reliably differentiated... To this effect, we first train a VAE model on CelebA data [Liu et al., 2015] and learn an appropriate small value of γ. |
| Dataset Splits | No | The paper mentions using CelebA data for training but does not specify the exact percentages or counts for training, validation, or test splits. It does not provide sufficient detail to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper states that 'network and training details' are in the supplementary, but it does not provide these details (e.g., learning rate, batch size, optimizer) in the main text. |
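
The Research Type row above quotes the paper's comparison between a learnable decoder variance γ and a small fixed γ. To make the quoted gradient behavior concrete, the following minimal PyTorch sketch writes out a Gaussian-decoder negative ELBO in which γ is either trainable or clamped to a small constant. This is an illustration under assumptions, not the authors' implementation: the class name, network sizes, initialization, and default values are all hypothetical, since the paper defers network and training details to its supplementary.

```python
# Minimal sketch (not the authors' code): Gaussian-decoder VAE loss with either a
# learnable or a fixed decoder variance gamma. Architecture and names are assumptions.
import torch
import torch.nn as nn

class GaussianVAE(nn.Module):
    def __init__(self, x_dim, z_dim, h_dim=256, learn_gamma=True, fixed_gamma=1e-3):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, 2 * z_dim))      # -> (mu_z, logvar_z)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))          # -> mean of p(x|z)
        if learn_gamma:
            # Trainable scalar gamma: starts moderate and shrinks as reconstruction
            # improves, so the reconstruction gradient grows only slowly.
            self.log_gamma = nn.Parameter(torch.zeros(1))
        else:
            # Gamma clamped to a small constant from the start: the 1/gamma factor
            # below makes the reconstruction gradient large immediately.
            self.register_buffer("log_gamma", torch.log(torch.tensor([fixed_gamma])))

    def loss(self, x):
        mu_z, logvar_z = self.enc(x).chunk(2, dim=-1)
        z = mu_z + torch.randn_like(mu_z) * (0.5 * logvar_z).exp()  # reparameterization
        x_hat = self.dec(z)
        gamma = self.log_gamma.exp()
        d = x.shape[-1]
        # Negative ELBO for p(x|z) = N(x_hat, gamma I), dropping the 2*pi constant:
        #   ||x - x_hat||^2 / (2 gamma) + (d/2) log gamma + KL(q(z|x) || N(0, I))
        rec = ((x - x_hat) ** 2).sum(-1) / (2 * gamma) + 0.5 * d * self.log_gamma
        kl = 0.5 * (mu_z ** 2 + logvar_z.exp() - logvar_z - 1).sum(-1)
        return (rec + kl).mean()
```

With `learn_gamma=False` and a small `fixed_gamma`, the 1/(2γ) factor in `rec` is large from the very first step while the reconstruction error is still high, which matches the large initial gradient magnitudes the quoted passage attributes to the fixed-γ case in Figure 4(b).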
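The same row also reports MMD values as the measure of under-regularization in Figure 4(a). The main text does not say how the MMD was computed, so the sketch below shows only one common choice: a (biased, V-statistic) squared-MMD estimate between encoded latent codes and samples from the N(0, I) prior using an RBF kernel. The kernel family and bandwidth are assumptions, not details from the paper.

```python
# Hedged sketch of an MMD diagnostic between aggregate-posterior samples and the prior.
import torch

def rbf_mmd2(z_q: torch.Tensor, z_p: torch.Tensor, bandwidth: float = 1.0) -> torch.Tensor:
    """Biased (V-statistic) estimate of MMD^2 between samples z_q ~ q(z) and z_p ~ p(z)."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)              # pairwise squared Euclidean distances
        return torch.exp(-d2 / (2 * bandwidth ** 2))
    return kernel(z_q, z_q).mean() + kernel(z_p, z_p).mean() - 2 * kernel(z_q, z_p).mean()

# Example usage: z_q could be encoder means for a batch of data, z_p prior samples.
# z_p = torch.randn_like(z_q); mmd2 = rbf_mmd2(z_q, z_p)
```

A lower value indicates that the aggregate posterior is closer to the N(0, I) prior, which is how the quoted passage interprets the lower MMD of the learnable-γ model as reduced under-regularization.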