Understanding Stochastic Natural Gradient Variational Inference

Authors: Kaiwen Wu, Jacob R. Gardner

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "This section presents supporting numerical simulations on datasets from the UCI repository (Bike and Mushroom) and MNIST (Kelly et al., 2017; LeCun et al., 1998) [...] Figure 1 presents Bayesian linear regression on the Bike dataset (n = 17,389)... Figure 2 shows Bayesian logistic regression on the Mushroom dataset (n = 8,124) and MNIST..."
Researcher Affiliation | Academia | Kaiwen Wu (1), Jacob R. Gardner (1); (1) Department of Computer and Information Science, University of Pennsylvania, Philadelphia, United States.
Pseudocode | Yes | Algorithm 1: Auto Differentiation Stochastic Gradient (a hedged sketch follows the table).
Open Source Code | No | The paper does not provide any explicit statement or link about releasing open-source code for the described methodology.
Open Datasets | Yes | "datasets from the UCI repository (Bike and Mushroom) and MNIST (Kelly et al., 2017; LeCun et al., 1998)"
Dataset Splits | No | The paper does not specify explicit training/validation/test splits (e.g., percentages or counts).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions torch and torch.distributions in Algorithm 1, but does not give version numbers for these or other software dependencies.
Experiment Setup | Yes | "SGD uses a step size schedule γ_t = 1 / (10^5 + t)... Stochastic NGD uses a step size schedule γ_t = 2 / (2 + t)... The (negative) ELBO is optimized by SGD and stochastic NGD with a mini-batch size of 1000." and "On Mushroom, the step size of SGD is tuned from {10^-3, 10^-4, 10^-5, 10^-6}, while the step size of NGD is tuned from {5×10^-1, 10^-1, 10^-2, 10^-3}. ... We use 10 samples from the variational distribution to estimate the stochastic gradient in every iteration." (The quoted schedules are restated as code after the table.)
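
The paper's Algorithm 1 is an auto-differentiation stochastic gradient routine that references torch and torch.distributions (see the Software Dependencies row). Its exact listing is not quoted here, so the following is only a minimal sketch of the general pattern under stated assumptions: a full-covariance Gaussian variational family parameterized by a mean m and a Cholesky factor L, reparameterized sampling, and a user-supplied log_joint callable. The function and variable names are illustrative, not the paper's.

```python
import torch
from torch.distributions import MultivariateNormal

def negative_elbo_grad(log_joint, m, L, num_samples=10):
    """Monte Carlo estimate of the negative ELBO and its gradients via autodiff.

    log_joint   : callable mapping weight samples of shape (S, d) to
                  log p(y, w) values of shape (S,)  (assumed interface)
    m           : variational mean (d,), requires_grad=True
    L           : lower-triangular Cholesky factor (d, d) of the variational
                  covariance with positive diagonal, requires_grad=True
    num_samples : Monte Carlo samples per iteration (the paper uses 10)
    """
    q = MultivariateNormal(m, scale_tril=L)
    w = q.rsample((num_samples,))               # reparameterized draws keep the graph
    loss = -log_joint(w).mean() - q.entropy()   # -ELBO = -E_q[log p(y, w)] - H[q]
    loss.backward()                             # gradients land in m.grad and L.grad
    return loss.detach()

if __name__ == "__main__":
    # Toy check: standard normal "log joint" with no data term (illustrative only).
    d = 3
    m = torch.zeros(d, requires_grad=True)
    L = torch.eye(d, requires_grad=True)
    log_prior = lambda w: -0.5 * (w ** 2).sum(dim=-1)
    print(negative_elbo_grad(log_prior, m, L))
```

For the Bayesian linear or logistic regression experiments described above, log_joint would presumably combine a mini-batch log-likelihood (rescaled by n divided by the batch size of 1000) with the log prior, though the paper's listing is not reproduced here.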
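The step-size schedules quoted in the Experiment Setup row are explicit formulas, so they can be restated directly. This sketch encodes only the quoted schedules; the demonstration loop is illustrative.

```python
def sgd_step_size(t: int) -> float:
    """SGD schedule quoted from the paper: gamma_t = 1 / (10^5 + t)."""
    return 1.0 / (1e5 + t)

def ngd_step_size(t: int) -> float:
    """Stochastic NGD schedule quoted from the paper: gamma_t = 2 / (2 + t)."""
    return 2.0 / (2.0 + t)

if __name__ == "__main__":
    # Both schedules decay as O(1/t); NGD starts at 1.0, SGD at 1e-5.
    for t in (0, 10, 1_000, 100_000):
        print(f"t={t:>6}  sgd={sgd_step_size(t):.2e}  ngd={ngd_step_size(t):.2e}")
```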