Local Expectation Gradients for Black Box Variational Inference

Authors: Michalis K. Titsias, Miguel Lázaro-Gredilla

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we apply local expectation gradients (LeGrad) to two different types of stochastic variational inference problems and compare it against the standard stochastic gradient based on the log-derivative trick (LdGrad), which also incorporates variance reduction, as well as the reparametrization-based gradient (ReGrad) given by eq. (6). In Section 4.1, we consider a two-class classification problem using two digits from the MNIST database and we approximate a Bayesian logistic regression model using stochastic variational inference. Then, in Section 4.2 we consider sigmoid belief networks [11] and we fit them to the binarized version of the MNIST digits. (A hedged sketch of the LdGrad and ReGrad estimators appears after the table.)
Researcher Affiliation | Collaboration | Michalis K. Titsias, Athens University of Economics and Business, mtitsias@aueb.gr; Miguel Lázaro-Gredilla, Vicarious, miguel@vicarious.com
Pseudocode | Yes | Algorithm 1 (Stochastic variational inference using local expectation gradients). Input: f(x), q_v(x). Initialize v^(0), t = 0. Repeat: set t = t + 1; draw pivot sample x^(t) ~ q_{v^(t-1)}(x); for i = 1 to n: dv_i = E_{q(x_i | mb_i^(t))}[ f(x^(t)_{\i}, x_i) ∇_{v_i} log q_{v_i}(x_i | pa_i^(t)) ], v_i = v_i + η_t dv_i; until convergence criterion is met. (A runnable Python sketch of this loop, for a fully factorized Bernoulli q, is given after the table.)
Open Source Code | No | The paper does not provide a concrete link to source code or explicitly state that the code is publicly available.
Open Datasets | Yes | In Section 4.1, we consider a two-class classification problem using two digits from the MNIST database and we approximate a Bayesian logistic regression model using stochastic variational inference.
Dataset Splits | No | The paper mentions using a "subset of the MNIST dataset that includes all 12660 training examples" and "5 × 10^4 training examples of the binarized MNIST", but does not specify exact training, validation, and test splits (e.g., percentages or counts).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU specifications, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | To obtain the local expectation gradient for each (μ_i, ℓ_i) we need to apply 1-D numerical integration. We used a quadrature rule with K = 5 nodes, so that LeGrad used S = 785 × 5 function evaluations per gradient estimation. For LdGrad we also set the number of samples to S = 785 × 5, so that LeGrad and LdGrad match exactly in the number of function evaluations and roughly in computational cost. When using the ReGrad approach based on (6) we construct the stochastic gradient using K = 5 target function gradient samples. (A hedged quadrature sketch of this per-factor computation is given after the table.)
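
For reference, here is a minimal sketch of the two baseline estimators the paper compares against: the log-derivative-trick estimator (LdGrad) with a simple mean baseline for variance reduction, and the reparametrization estimator (ReGrad). It assumes a fully factorized Gaussian q(x) with parameters (μ, ℓ) and a generic log-target f(x); the function names and the specific baseline are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ldgrad(f, mu, ell, S, rng):
    """Log-derivative-trick (score-function) estimator for a diagonal Gaussian
    q(x) = N(x; mu, exp(ell)^2), with a mean baseline as a simple variance
    reduction (an illustrative choice; the paper's exact scheme may differ)."""
    sigma = np.exp(ell)
    x = mu + sigma * rng.standard_normal((S, mu.size))   # S samples from q
    fx = np.array([f(xi) for xi in x])                    # S target evaluations
    score_mu = (x - mu) / sigma**2                        # d log q / d mu
    score_ell = (x - mu)**2 / sigma**2 - 1.0              # d log q / d ell
    w = fx - fx.mean()                                    # baseline-corrected weights
    return (w[:, None] * score_mu).mean(0), (w[:, None] * score_ell).mean(0)

def regrad(grad_f, mu, ell, K, rng):
    """Reparametrization estimator: x = mu + exp(ell) * eps with eps ~ N(0, I),
    so the gradient of E_q[f(x)] flows through grad_f at the sampled points."""
    sigma = np.exp(ell)
    g_mu, g_ell = np.zeros_like(mu), np.zeros_like(ell)
    for eps in rng.standard_normal((K, mu.size)):
        gx = grad_f(mu + sigma * eps)                     # gradient of f at the sample
        g_mu += gx / K
        g_ell += gx * sigma * eps / K
    return g_mu, g_ell
```

Note that LdGrad only needs evaluations of f, while ReGrad needs its gradient; this is why the paper matches LeGrad and LdGrad in number of function evaluations rather than matching all three.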
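
The Algorithm 1 loop quoted above can be made concrete for the simplest case, a fully factorized Bernoulli q(x) = ∏_i Bernoulli(x_i; sigmoid(v_i)), where each local expectation over x_i is an exact sum over {0, 1} and the Markov blanket and parent sets are trivial. The following is a hedged sketch under that assumption, not the paper's code; f is the log of the unnormalized target.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def local_expectation_step(f, v, eta, rng):
    """One sweep of local expectation gradients over all variational
    parameters v, for a fully factorized Bernoulli q_v(x)."""
    n = v.size
    p = sigmoid(v)
    x = (rng.random(n) < p).astype(float)          # pivot sample x ~ q_v(x)
    for i in range(n):
        dv_i = 0.0
        for xi in (0.0, 1.0):                      # exact 1-D expectation over x_i
            x_trial = x.copy()
            x_trial[i] = xi
            q_i = p[i] if xi == 1.0 else 1.0 - p[i]
            dlogq = xi - p[i]                      # d log Bernoulli(x_i; sigmoid(v_i)) / d v_i
            dv_i += q_i * f(x_trial) * dlogq
        v[i] += eta * dv_i                         # stochastic ascent update
    return v
```

For richer posteriors (e.g. the sigmoid belief networks of Section 4.2) the inner sum runs over q(x_i | mb_i^(t)) and the weight is ∇_{v_i} log q_{v_i}(x_i | pa_i^(t)), exactly as written in Algorithm 1.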
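
For the Gaussian factors (μ_i, ℓ_i) of the Section 4.1 experiment, the 1-D integration in each local expectation can be approximated with a K = 5 node quadrature rule. The sketch below uses Gauss-Hermite quadrature, which is a natural choice but an assumption here, since the paper only states that a 5-node rule was used; with 785 parameters this yields the quoted 785 × 5 function evaluations per full gradient.

```python
import numpy as np

def legrad_gaussian_factor(f, x, i, mu_i, ell_i, K=5):
    """Local expectation gradient for the i-th Gaussian factor
    q(x_i) = N(mu_i, exp(ell_i)^2), holding the pivot sample x fixed in all
    other coordinates; f is the log of the unnormalized target."""
    nodes, weights = np.polynomial.hermite.hermgauss(K)      # Gauss-Hermite rule
    sigma_i = np.exp(ell_i)
    d_mu, d_ell = 0.0, 0.0
    for t_k, w_k in zip(nodes, weights):
        x_trial = x.copy()
        x_trial[i] = mu_i + np.sqrt(2.0) * sigma_i * t_k      # quadrature node in x_i
        fx = f(x_trial)                                       # one function evaluation
        z = (x_trial[i] - mu_i) / sigma_i                     # standardized node
        d_mu += (w_k / np.sqrt(np.pi)) * fx * z / sigma_i     # E_q[f * dlogq/dmu_i]
        d_ell += (w_k / np.sqrt(np.pi)) * fx * (z * z - 1.0)  # E_q[f * dlogq/dell_i]
    return d_mu, d_ell
```

Calling this once per parameter i gives the K function evaluations per coordinate described in the Experiment Setup row, and the returned pair (d_mu, d_ell) is used in the same stochastic ascent update as in Algorithm 1.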