Stochastic Expectation Propagation

Authors: Yingzhen Li, José Miguel Hernández-Lobato, Richard E. Turner

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on a number of canonical learning problems using synthetic and real-world datasets indicate that SEP performs almost as well as full EP, but reduces the memory consumption by a factor of N."
Researcher Affiliation | Academia | Yingzhen Li, University of Cambridge, Cambridge, CB2 1PZ, UK (yl494@cam.ac.uk); José Miguel Hernández-Lobato, Harvard University, Cambridge, MA 02138, USA (jmh@seas.harvard.edu); Richard E. Turner, University of Cambridge, Cambridge, CB2 1PZ, UK (ret26@cam.ac.uk)
Pseudocode | Yes | Algorithm 1 (EP), Algorithm 2 (ADF), Algorithm 3 (SEP); a sketch of the SEP update appears after this table.
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "To verify whether these conclusions about the granularity of the approximation hold in real datasets, we sampled N = 1,000 datapoints for each of the digits in MNIST and performed odd-vs-even classification." "Finally, we tested SEP's performance on six small binary classification datasets from the UCI machine learning repository."
Dataset Splits | No | The paper discusses 'training data' and 'test set' usage and presents test results, but it does not explicitly specify the proportions or methodology of the train/validation/test splits (e.g., an 80/10/10 split or a cross-validation setup).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., programming language, library, or solver versions).
Experiment Setup | Yes | "The model comprised a probit likelihood function P(y_n = 1 | θ) = Φ(θ^T x_n) and a Gaussian prior over the hyper-plane parameter p(θ) = N(θ; 0, γI)." "We considered neural networks with 50 hidden units (except for Year and Protein which we used 100)." "We ran the tests with damping and stopped learning after convergence (by monitoring the updates of approximating factors)."
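For context on the Pseudocode and Experiment Setup rows above, the sketch below shows how a single-site-factor SEP loop could look for the Bayesian probit classification model quoted in the Experiment Setup row. It is a minimal illustrative sketch, not the authors' implementation: the function and parameter names (sep_probit, gamma, damping, n_sweeps) are hypothetical, the moment-matching expressions are the standard EP formulas for a probit likelihood with a Gaussian cavity, and the update f <- f^(1 - 1/N) * f_n^(1/N) of the single average site factor follows the SEP idea the paper describes (here additionally damped).

```python
import numpy as np
from scipy.stats import norm


def sep_probit(X, y, gamma=1.0, n_sweeps=50, damping=0.9, seed=0):
    """Sketch of Stochastic EP for Bayesian probit classification.

    Prior:      p(theta) = N(theta; 0, gamma * I)
    Likelihood: P(y_n = 1 | theta) = Phi(theta^T x_n), with labels y_n in {-1, +1}.

    A single average site factor f(theta), stored in natural parameters
    (r_f, Lam_f), replaces the N per-datapoint factors of full EP, and the
    global approximation is q(theta) ∝ p0(theta) * f(theta)^N.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    Lam0 = np.eye(D) / gamma                       # prior precision
    r0 = np.zeros(D)                               # prior precision-times-mean
    Lam_f, r_f = np.zeros((D, D)), np.zeros(D)     # average site factor (starts flat)

    for _ in range(n_sweeps):
        for n in rng.permutation(N):
            # Cavity q_{-1} = q / f in natural parameters.
            Lam_q, r_q = Lam0 + N * Lam_f, r0 + N * r_f
            Lam_c, r_c = Lam_q - Lam_f, r_q - r_f
            V = np.linalg.inv(Lam_c)
            m = V @ r_c

            # Moment-match the tilted distribution q_{-1}(theta) * Phi(y_n theta^T x_n).
            # The likelihood depends on theta only through a = x_n^T theta, so match
            # the 1-D moments of a and map back (standard EP probit formulas).
            x, yn = X[n], y[n]
            mu_a, s2_a = x @ m, x @ V @ x
            z = yn * mu_a / np.sqrt(1.0 + s2_a)
            ratio = norm.pdf(z) / norm.cdf(z)
            mu_hat = mu_a + s2_a * yn * ratio / np.sqrt(1.0 + s2_a)
            s2_hat = s2_a - s2_a**2 * ratio * (z + ratio) / (1.0 + s2_a)
            Vx = V @ x
            m_new = m + Vx * (mu_hat - mu_a) / s2_a
            V_new = V + np.outer(Vx, Vx) * (s2_hat - s2_a) / s2_a**2

            # Implicit per-point factor f_n = q_new / q_{-1}; fold it into the
            # average factor with a damped SEP step of size damping / N.
            Lam_new = np.linalg.inv(V_new)
            r_new = Lam_new @ m_new
            dLam, dr = Lam_new - Lam_c, r_new - r_c
            step = damping / N
            Lam_f = (1.0 - step) * Lam_f + step * dLam
            r_f = (1.0 - step) * r_f + step * dr

    Lam_q, r_q = Lam0 + N * Lam_f, r0 + N * r_f
    V_q = np.linalg.inv(Lam_q)
    return V_q @ r_q, V_q                          # posterior mean and covariance
```

A call such as theta_mean, theta_cov = sep_probit(X, y) (with labels in {-1, +1}) would return the Gaussian approximate posterior. The memory saving highlighted in the paper is visible here: only one (r_f, Lam_f) pair is stored, rather than one per datapoint as in full EP.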