Understanding Stochastic Natural Gradient Variational Inference
Authors: Kaiwen Wu, Jacob R. Gardner
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section presents supporting numerical simulations on datasets from the UCI repository (Bike and Mushroom) and MNIST (Kelly et al., 2017; LeCun et al., 1998) [...] Figure 1 presents Bayesian linear regression on the Bike dataset (n = 17,389)... Figure 2 shows Bayesian logistic regression on the Mushroom dataset (n = 8124) and MNIST... |
| Researcher Affiliation | Academia | Kaiwen Wu and Jacob R. Gardner, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, United States. |
| Pseudocode | Yes | Algorithm 1: Auto-Differentiation Stochastic Gradient (see the sketch after this table) |
| Open Source Code | No | The paper does not provide any explicit statements or links about releasing open-source code for the methodology described. |
| Open Datasets | Yes | datasets from the UCI repository (Bike and Mushroom) and MNIST (Kelly et al., 2017; LeCun et al., 1998) |
| Dataset Splits | No | The paper does not specify explicit training/validation/test dataset splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'torch' and 'torch.distributions' in Algorithm 1, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | SGD uses a step size schedule γ_t = 1/(10^5 + t)... Stochastic NGD uses a step size schedule γ_t = 2/(2 + t)... The (negative) ELBO is optimized by SGD and stochastic NGD with a mini-batch size of 1000. On Mushroom, the step size of SGD is tuned from {10^-3, 10^-4, 10^-5, 10^-6}, while the step size of NGD is tuned from {5·10^-1, 10^-1, 10^-2, 10^-3}. ... We use 10 samples from the variational distribution to estimate the stochastic gradient in every iteration. (Both schedules are written out below.) |
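
The Pseudocode row refers to the paper's Algorithm 1, an auto-differentiation stochastic gradient of the (negative) ELBO built on torch and torch.distributions. Below is a minimal sketch of what such an estimator can look like, assuming a mean-field Gaussian variational family, a Bayesian logistic regression likelihood, and the 10 posterior samples per iteration quoted in the Experiment Setup row; the model, data, and dimensions are illustrative stand-ins, not the paper's actual code.

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)
d = 5                                      # latent dimension (illustrative)
X = torch.randn(100, d)                    # synthetic mini-batch of features
y = torch.randint(0, 2, (100,)).float()    # synthetic binary labels

# Variational parameters of q(w) = N(mu, diag(sigma^2)).
mu = torch.zeros(d, requires_grad=True)
log_sigma = torch.zeros(d, requires_grad=True)

def negative_elbo(num_samples: int = 10) -> torch.Tensor:
    """Monte Carlo estimate of the negative ELBO via reparameterized samples."""
    q = Normal(mu, log_sigma.exp())
    w = q.rsample((num_samples,))          # rsample keeps the graph for autodiff
    logits = X @ w.T                       # shape: (batch, num_samples)
    targets = y.unsqueeze(1).expand_as(logits)
    # Mini-batch log likelihood, averaged over the posterior samples.
    log_lik = -torch.nn.functional.binary_cross_entropy_with_logits(
        logits, targets, reduction="none"
    ).sum(dim=0).mean()
    prior = Normal(torch.zeros(d), torch.ones(d))
    kl = kl_divergence(q, prior).sum()     # KL(q || p) for a standard normal prior
    return kl - log_lik

loss = negative_elbo()
loss.backward()                            # autodiff yields the stochastic gradient
print(mu.grad.shape, log_sigma.grad.shape)
```

In a full pipeline the mini-batch log likelihood would typically be rescaled by n / batch_size so the estimator is unbiased for the full-data ELBO; that scaling is omitted here for brevity.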
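For the Experiment Setup row, the two quoted step-size schedules are easy to compare numerically. The sketch below writes them out as plain Python; only the formulas γ_t = 1/(10^5 + t) and γ_t = 2/(2 + t) come from the paper, and the printed iterates are just for inspection.

```python
def sgd_step_size(t: int) -> float:
    """SGD schedule from the paper: gamma_t = 1 / (10^5 + t)."""
    return 1.0 / (1e5 + t)

def ngd_step_size(t: int) -> float:
    """Stochastic NGD schedule from the paper: gamma_t = 2 / (2 + t)."""
    return 2.0 / (2 + t)

for t in (0, 1, 10, 100, 1000):
    print(f"t={t:5d}  SGD: {sgd_step_size(t):.2e}  NGD: {ngd_step_size(t):.2e}")
```

Note how differently the two are scaled: the NGD schedule starts at 1 while the SGD schedule starts at 10^-5, reflecting the different scales on which the two updates operate.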