Variational Bayesian dropout: pitfalls and fixes
Authors: Jiri Hron, Alex Matthews, Zoubin Ghahramani
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We show that the proposed framework suffers from several issues; from undefined or pathological behaviour of the true posterior related to use of improper priors, to an ill-defined variational objective due to singularity of the approximating distribution relative to the true posterior. Our analysis of the improper log uniform prior used in variational Gaussian dropout suggests the pathologies are generally irredeemable, and that the algorithm still works only because the variational formulation annuls some of the pathologies. To address the singularity issue, we proffer Quasi-KL (QKL) divergence, a new approximate inference objective for approximation of high-dimensional distributions. We show that motivations for variational Bernoulli dropout based on discretisation and noise have QKL as a limit. Properties of QKL are studied both theoretically and on a simple practical example which shows that the QKL-optimal approximation of a full rank Gaussian with a degenerate one naturally leads to the Principal Component Analysis solution. |
| Researcher Affiliation | Collaboration | (1) Department of Engineering, University of Cambridge, Cambridge, United Kingdom; (2) Uber AI Labs, San Francisco, California, USA. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper does not conduct experiments on datasets; it is a theoretical analysis. Therefore, no access information for a publicly available dataset is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve dataset splits for empirical evaluation. Therefore, no specific dataset split information is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe experiments that would require specific hardware specifications. |
| Software Dependencies | No | The paper mentions 'Johnson, S. G. Faddeeva Package' as an example for numerical implementations but does not specify any software dependencies with version numbers required to reproduce the paper's theoretical contributions or analyses. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or system-level training settings. |
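
The Research Type entry above says the QKL-optimal approximation of a full rank Gaussian by a degenerate one leads to the Principal Component Analysis solution. The paper provides no code, so the following is only a minimal sketch of what a PCA-style degenerate approximation looks like. It does not compute QKL. The function name `pca_degenerate_gaussian`, the example covariance, and the choice to keep the original eigenvalues on the retained subspace are all assumptions made for illustration.

```python
import numpy as np

def pca_degenerate_gaussian(cov, k):
    """Illustrative helper (hypothetical name): build a rank-k covariance
    from the top-k principal directions of a full rank covariance."""
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Indices of the k largest eigenvalues, largest first
    order = np.argsort(eigvals)[::-1][:k]
    top_vals = eigvals[order]
    top_vecs = eigvecs[:, order]
    # Assumption: keep the original variances along the retained directions
    low_rank_cov = (top_vecs * top_vals) @ top_vecs.T
    return top_vals, top_vecs, low_rank_cov

# Example full rank covariance (synthetic, not taken from the paper)
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
full_cov = A @ A.T + 0.1 * np.eye(5)

top_vals, top_vecs, low_rank_cov = pca_degenerate_gaussian(full_cov, k=2)
print("rank of approximation:", np.linalg.matrix_rank(low_rank_cov))
```

The resulting covariance is singular by construction, so a Gaussian with this covariance has no density with respect to the Lebesgue measure on the full space. This is the kind of singularity the summary describes as making the standard variational objective ill-defined.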