Variational Bayesian dropout: pitfalls and fixes

Authors: Jiri Hron, Alex Matthews, Zoubin Ghahramani

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We show that the proposed framework suffers from several issues; from undefined or pathological behaviour of the true posterior related to use of improper priors, to an ill-defined variational objective due to singularity of the approximating distribution relative to the true posterior. Our analysis of the improper log uniform prior used in variational Gaussian dropout suggests the pathologies are generally irredeemable, and that the algorithm still works only because the variational formulation annuls some of the pathologies. To address the singularity issue, we proffer Quasi-KL (QKL) divergence, a new approximate inference objective for approximation of high-dimensional distributions. We show that motivations for variational Bernoulli dropout based on discretisation and noise have QKL as a limit. Properties of QKL are studied both theoretically and on a simple practical example which shows that the QKL-optimal approximation of a full rank Gaussian with a degenerate one naturally leads to the Principal Component Analysis solution.
Researcher Affiliation | Collaboration | (1) Department of Engineering, University of Cambridge, Cambridge, United Kingdom; (2) Uber AI Labs, San Francisco, California, USA.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper does not conduct experiments on datasets; it is a theoretical analysis. Therefore, no access information for a publicly available dataset is provided.
Dataset Splits | No | The paper is theoretical and does not involve dataset splits for empirical evaluation. Therefore, no specific dataset split information is provided.
Hardware Specification | No | The paper is theoretical and does not describe experiments that would require specific hardware specifications.
Software Dependencies | No | The paper mentions 'Johnson, S. G. Faddeeva Package' as an example for numerical implementations but does not specify any software dependencies with version numbers required to reproduce the paper's theoretical contributions or analyses.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or system-level training settings.
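The Research Type entry summarises the claim that the QKL-optimal degenerate approximation of a full-rank Gaussian recovers the PCA solution. Below is a minimal two-dimensional numerical sketch of that claim. It is not the authors' code. It assumes QKL(Q‖P) is evaluated as E_q[log q(x) - log p(x)], with q's density taken relative to Lebesgue measure on its one-dimensional support line, and that both distributions are zero-mean. The helper names (qkl_rank1_objective, best_direction, pca_direction, angle_gap) are hypothetical. Under these assumptions, optimising q's variance along a unit direction u gives s = 1 / (uᵀΣ⁻¹u). What remains of the objective is 0.5 · log(uᵀΣ⁻¹u) plus a constant, so the best support line should coincide with the leading eigenvector of Σ.

```python
import math

def qkl_rank1_objective(sigma, theta):
    """Assumed QKL(Q || P), up to an additive constant, for P = N(0, sigma) in 2-D
    and Q the best zero-mean Gaussian on the line spanned by u = (cos theta, sin theta).

    Assumption: q's density is w.r.t. 1-D Lebesgue measure on that line, so
    QKL = E_q[log q - log p]. The optimal variance s = 1 / (u^T sigma^{-1} u)
    leaves 0.5 * log(u^T sigma^{-1} u) + const.
    """
    (a, b), (_, c) = sigma
    det = a * c - b * b
    # Inverse of the symmetric 2x2 matrix [[a, b], [b, c]].
    inv = ((c / det, -b / det), (-b / det, a / det))
    u = (math.cos(theta), math.sin(theta))
    quad = sum(u[i] * inv[i][j] * u[j] for i in range(2) for j in range(2))
    return 0.5 * math.log(quad)

def best_direction(sigma, steps=10_000):
    # Brute-force search over line directions theta in [0, pi).
    thetas = (math.pi * i / steps for i in range(steps))
    return min(thetas, key=lambda t: qkl_rank1_objective(sigma, t))

def pca_direction(sigma):
    # Angle in [0, pi) of the leading eigenvector (first principal component)
    # of a symmetric 2x2 matrix.
    (a, b), (_, c) = sigma
    return 0.5 * math.atan2(2 * b, a - c) % math.pi

def angle_gap(t1, t2):
    # Distance between two line directions, treating theta and theta + pi as equal.
    d = abs(t1 - t2) % math.pi
    return min(d, math.pi - d)

sigma = ((3.0, 1.2), (1.2, 1.0))  # illustrative covariance, not from the paper
print(angle_gap(best_direction(sigma), pca_direction(sigma)) < 1e-3)  # prints True
```

The grid search avoids any linear-algebra dependency. In higher dimensions the same argument suggests minimising log det(UᵀΣ⁻¹U) over orthonormal U, which is minimised by the top eigenvectors of Σ.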