Local Expectation Gradients for Black Box Variational Inference

Authors: Michalis K. Titsias, Miguel Lázaro-Gredilla

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we apply local expectation gradients (LeGrad) to two different types of stochastic variational inference problems and compare it against the standard stochastic gradient based on the log-derivative trick (LdGrad), which also incorporates variance reduction, as well as the reparametrization-based gradient (ReGrad) given by eq. (6). In Section 4.1, we consider a two-class classification problem using two digits from the MNIST database and we approximate a Bayesian logistic regression model using stochastic variational inference. Then, in Section 4.2 we consider sigmoid belief networks [11] and we fit them to the binarized version of the MNIST digits. (A hedged sketch of the LdGrad and ReGrad estimators appears after the table.)
Researcher Affiliation | Collaboration | Michalis K. Titsias, Athens University of Economics and Business, mtitsias@aueb.gr; Miguel Lázaro-Gredilla, Vicarious, miguel@vicarious.com
Pseudocode | Yes | Algorithm 1 (Stochastic variational inference using local expectation gradients). Input: f(x), q_v(x). Initialize v^(0), t = 0. Repeat: set t = t + 1; draw pivot sample x^(t) ~ q_{v^(t-1)}(x); for i = 1 to n: dv_i = E_{q(x_i | mb_i^(t))}[ f(x^(t)_{\i}, x_i) ∇_{v_i} log q_{v_i}(x_i | pa_i^(t)) ], v_i = v_i + η_t dv_i; until convergence criterion is met. (A runnable Python sketch of this loop, for a fully factorized Bernoulli q, is given after the table.)
Open Source Code | No | The paper does not provide a concrete link to source code or explicitly state that the code is publicly available.
Open Datasets | Yes | In Section 4.1, we consider a two-class classification problem using two digits from the MNIST database and we approximate a Bayesian logistic regression model using stochastic variational inference.
Dataset Splits | No | The paper mentions using a "subset of the MNIST dataset that includes all 12660 training examples" and "5 × 10^4 training examples of the binarized MNIST", but does not specify exact training, validation, and test splits (e.g., percentages or counts).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU specifications, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | To obtain the local expectation gradient for each (μ_i, ℓ_i) we need to apply 1-D numerical integration. We used a quadrature rule with K = 5 nodes, so that LeGrad used S = 785 × 5 function evaluations per gradient estimation. For LdGrad we also set the number of samples to S = 785 × 5, so that LeGrad and LdGrad match exactly in the number of function evaluations and roughly in computational cost. When using the ReGrad approach based on (6) we construct the stochastic gradient using K = 5 target function gradient samples. (A hedged quadrature sketch of this per-factor computation is given after the table.)
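
For reference, here is a minimal sketch of the two baseline estimators the paper compares against: the log-derivative-trick estimator (LdGrad) with a simple mean baseline for variance reduction, and the reparametrization estimator (ReGrad). It assumes a fully factorized Gaussian q(x) with parameters (μ, ℓ) and a generic log-target f(x); the function names and the specific baseline are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ldgrad(f, mu, ell, S, rng):
    """Log-derivative-trick (score-function) estimator for a diagonal Gaussian
    q(x) = N(x; mu, exp(ell)^2), with a mean baseline as a simple variance
    reduction (an illustrative choice; the paper's exact scheme may differ)."""
    sigma = np.exp(ell)
    x = mu + sigma * rng.standard_normal((S, mu.size))   # S samples from q
    fx = np.array([f(xi) for xi in x])                    # S target evaluations
    score_mu = (x - mu) / sigma**2                        # d log q / d mu
    score_ell = (x - mu)**2 / sigma**2 - 1.0              # d log q / d ell
    w = fx - fx.mean()                                    # baseline-corrected weights
    return (w[:, None] * score_mu).mean(0), (w[:, None] * score_ell).mean(0)

def regrad(grad_f, mu, ell, K, rng):
    """Reparametrization estimator: x = mu + exp(ell) * eps with eps ~ N(0, I),
    so the gradient of E_q[f(x)] flows through grad_f at the sampled points."""
    sigma = np.exp(ell)
    g_mu, g_ell = np.zeros_like(mu), np.zeros_like(ell)
    for eps in rng.standard_normal((K, mu.size)):
        gx = grad_f(mu + sigma * eps)                     # gradient of f at the sample
        g_mu += gx / K
        g_ell += gx * sigma * eps / K
    return g_mu, g_ell
```

Note that LdGrad only needs evaluations of f, while ReGrad needs its gradient; this is why the paper matches LeGrad and LdGrad in number of function evaluations rather than matching all three.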
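
The Algorithm 1 loop quoted above can be made concrete for the simplest case, a fully factorized Bernoulli q(x) = ∏_i Bernoulli(x_i; sigmoid(v_i)), where each local expectation over x_i is an exact sum over {0, 1} and the Markov blanket and parent sets are trivial. The following is a hedged sketch under that assumption, not the paper's code; f is the log of the unnormalized target.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def local_expectation_step(f, v, eta, rng):
    """One sweep of local expectation gradients over all variational
    parameters v, for a fully factorized Bernoulli q_v(x)."""
    n = v.size
    p = sigmoid(v)
    x = (rng.random(n) < p).astype(float)          # pivot sample x ~ q_v(x)
    for i in range(n):
        dv_i = 0.0
        for xi in (0.0, 1.0):                      # exact 1-D expectation over x_i
            x_trial = x.copy()
            x_trial[i] = xi
            q_i = p[i] if xi == 1.0 else 1.0 - p[i]
            dlogq = xi - p[i]                      # d log Bernoulli(x_i; sigmoid(v_i)) / d v_i
            dv_i += q_i * f(x_trial) * dlogq
        v[i] += eta * dv_i                         # stochastic ascent update
    return v
```

For richer posteriors (e.g. the sigmoid belief networks of Section 4.2) the inner sum runs over q(x_i | mb_i^(t)) and the weight is ∇_{v_i} log q_{v_i}(x_i | pa_i^(t)), exactly as written in Algorithm 1.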
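
For the Gaussian factors (μ_i, ℓ_i) of the Section 4.1 experiment, the 1-D integration in each local expectation can be approximated with a K = 5 node quadrature rule. The sketch below uses Gauss-Hermite quadrature, which is a natural choice but an assumption here, since the paper only states that a 5-node rule was used; with 785 parameters this yields the quoted 785 × 5 function evaluations per full gradient.

```python
import numpy as np

def legrad_gaussian_factor(f, x, i, mu_i, ell_i, K=5):
    """Local expectation gradient for the i-th Gaussian factor
    q(x_i) = N(mu_i, exp(ell_i)^2), holding the pivot sample x fixed in all
    other coordinates; f is the log of the unnormalized target."""
    nodes, weights = np.polynomial.hermite.hermgauss(K)      # Gauss-Hermite rule
    sigma_i = np.exp(ell_i)
    d_mu, d_ell = 0.0, 0.0
    for t_k, w_k in zip(nodes, weights):
        x_trial = x.copy()
        x_trial[i] = mu_i + np.sqrt(2.0) * sigma_i * t_k      # quadrature node in x_i
        fx = f(x_trial)                                       # one function evaluation
        z = (x_trial[i] - mu_i) / sigma_i                     # standardized node
        d_mu += (w_k / np.sqrt(np.pi)) * fx * z / sigma_i     # E_q[f * dlogq/dmu_i]
        d_ell += (w_k / np.sqrt(np.pi)) * fx * (z * z - 1.0)  # E_q[f * dlogq/dell_i]
    return d_mu, d_ell
```

Calling this once per parameter i gives the K function evaluations per coordinate described in the Experiment Setup row, and the returned pair (d_mu, d_ell) is used in the same stochastic ascent update as in Algorithm 1.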