f-Divergence Variational Inference

Authors: Neng Wan, Dapeng Li, Naira Hovakimyan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirical examples, including variational autoencoders and Bayesian neural networks, are provided to demonstrate the effectiveness and the wide applicability of f-VI."
Researcher Affiliation | Collaboration | Neng Wan (nengwan2@illinois.edu) and Naira Hovakimyan (nhovakim@illinois.edu), University of Illinois at Urbana-Champaign, Urbana, IL 61801; Dapeng Li (dapeng.ustc@gmail.com), Anker Innovations, Shenzhen, China
Pseudocode | No | "A reference black-box f-VI algorithm and the optimization schemes for a few concrete divergences are given in the SM. ... A reference mean-field VI algorithm along with a concrete realization example under KL divergence is provided in the SM." (Pseudocode appears only in the supplementary material, not in the main text.)
Open Source Code | No | The paper provides no link to source code, and the main text does not state that the code is publicly released.
Open Datasets | Yes | "The linear regression is performed with twelve datasets from the UCI Machine Learning Repository [36]. ... Bayesian VAE for image reconstruction and generation on the datasets of Caltech 101 Silhouettes [37], Frey Face [38], MNIST [39], and Omniglot [40]."
Dataset Splits | Yes | "Each dataset is randomly split into 90%/10% for training and testing." (A minimal split sketch follows the table.)
Hardware Specification | No | The paper does not provide hardware details (e.g., GPU/CPU models, memory) used for its experiments.
Software Dependencies | No | "Adam optimizer with recommended parameters in [35] is employed for stochastic optimization, if not specified." (No software libraries or version numbers are specified.)
Experiment Setup | Yes | "Adam optimizer with recommended parameters in [35] is employed for stochastic optimization, if not specified. ... The IW-reparameterization gradient (14) with L = 3 and K = 1000 is adopted for the training on a dataset of 500 observations. ... The IW-reparameterization gradient with L = 5, K = 50 and mini-batch size of 32 is employed for training. After 20 trials with 500 training epochs in each trial ... The reparameterization gradient with K = 3, L = 1 is used for training. After 20 trials with 200 training epochs in each trial ..." (A training-step sketch illustrating the roles of L and K follows the table.)
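For the 90%/10% splits quoted in the Dataset Splits row, a minimal NumPy sketch is given below. The helper name `split_90_10` and the fixed seed are illustrative assumptions; the paper does not publish its splitting code.

```python
import numpy as np

def split_90_10(X, y, seed=0):
    """Random 90%/10% train/test split (hypothetical helper; the
    paper's exact splitting procedure is not provided)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))     # shuffle example indices
    cut = int(0.9 * len(X))           # 90% boundary
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]
```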
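To make the quoted sample sizes concrete, here is a minimal PyTorch sketch of one training step with an importance-weighted reparameterization gradient, showing where L (outer Monte Carlo averages) and K (importance samples) enter. The callables `log_joint` and `guide` are assumptions, and the objective shown is the standard importance-weighted bound for the KL case, not the paper's general f-divergence surrogate (its Eq. (14) is not reproduced in this report).

```python
import math
import torch

def iw_reparam_step(log_joint, guide, x, optimizer, L=5, K=50):
    """One stochastic optimization step with an importance-weighted
    reparameterization estimator. `log_joint(x, z)` returns log p(x, z);
    `guide(x)` draws a reparameterized z ~ q(z|x) and returns
    (z, log q(z|x)). Both are hypothetical stand-ins."""
    optimizer.zero_grad()
    loss = 0.0
    for _ in range(L):                        # L outer Monte Carlo averages
        log_w = []
        for _ in range(K):                    # K importance samples
            z, log_q = guide(x)               # z ~ q(z|x), reparameterized
            log_w.append(log_joint(x, z) - log_q)
        log_w = torch.stack(log_w)            # log importance weights, shape (K,)
        # Importance-weighted bound (KL case): log (1/K) sum_k w_k
        loss = loss - (torch.logsumexp(log_w, dim=0) - math.log(K))
    (loss / L).backward()                     # average over the L estimates
    optimizer.step()
```

With Adam at the defaults recommended in [35] (learning rate 1e-3, betas (0.9, 0.999)), the optimizer would simply be `torch.optim.Adam(model_parameters)`, matching the "recommended parameters" quoted above.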