Nonparametric learning from Bayesian models with randomized objective functions

Authors: Simon Lyddon, Stephen Walker, Chris C. Holmes

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the approach on a number of examples including VB classifiers and Bayesian random forests. ... 3 Illustrations ... Figure 1 shows a mean-field normal approximation ... We generated 100 observations ... We demonstrate this in practice through a VB logistic regression model fit to the Statlog German Credit dataset ... See Fig. 3 for boxplots of the test accuracy over 100 repetitions.
Researcher Affiliation | Academia | Simon Lyddon, Department of Statistics, University of Oxford, Oxford, UK (lyddon@stats.ox.ac.uk); Stephen Walker, Department of Mathematics, University of Texas at Austin, Austin, TX (s.g.walker@math.utexas.edu); Chris Holmes, Department of Statistics, University of Oxford, Oxford, UK (cholmes@stats.ox.ac.uk)
Pseudocode | Yes | Algorithm 1: The Posterior Bootstrap
  Data: dataset x_{1:n} = (x_1, ..., x_n); parameter of interest α_0 = α(F_0) = argmax_α ∫ u(x, α) dF_0(x); mixing posterior π(γ | x_{1:n}); concentration parameter c; centering model f_γ(x); number of centering-model samples T.
  begin
    for i = 1, ..., B do
      Draw centering-model parameter γ^(i) ~ π(γ | x_{1:n});
      Draw posterior pseudo-sample x^(i)_{(n+1):(n+T)} iid ~ f_{γ^(i)};
      Generate weights (w^(i)_1, ..., w^(i)_n, w^(i)_{n+1}, ..., w^(i)_{n+T}) ~ Dirichlet(1, ..., 1, c/T, ..., c/T);
      Compute parameter update α^(i) = argmax_α [ Σ_{j=1}^{n} w^(i)_j u(x_j, α) + Σ_{j=1}^{T} w^(i)_{n+j} u(x^(i)_{n+j}, α) ]
    end
    Return NP posterior sample {α^(i)}_{i=1}^{B}.
  end
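The Posterior Bootstrap of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustrative instance, not the authors' implementation: the helper names (`sample_gamma`, `sample_pseudo`, `weighted_argmax`) are assumptions, and the toy setting takes α to be a mean with squared-error utility u(x, α) = −(x − α)², for which the weighted argmax has the closed form of a weighted average.

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_bootstrap(x, sample_gamma, sample_pseudo, weighted_argmax,
                        c=1.0, T=100, B=1000):
    """Posterior Bootstrap sketch: returns B NP-posterior samples of alpha.

    sample_gamma()        -- one draw of the centering-model parameter from
                             the mixing posterior pi(gamma | x_{1:n})
    sample_pseudo(g, T)   -- T iid pseudo-observations from f_gamma
    weighted_argmax(z, w) -- solves argmax_alpha sum_j w_j u(z_j, alpha)
    """
    n = len(x)
    alphas = np.empty(B)
    for i in range(B):
        g = sample_gamma()                        # gamma^(i) ~ pi(gamma | x)
        pseudo = sample_pseudo(g, T)              # x^(i)_{(n+1):(n+T)} ~ f_gamma
        # Dirichlet(1, ..., 1, c/T, ..., c/T) weights over data + pseudo-data
        w = rng.dirichlet(np.concatenate([np.ones(n), np.full(T, c / T)]))
        z = np.concatenate([x, pseudo])
        alphas[i] = weighted_argmax(z, w)         # randomized objective maximizer
    return alphas

# Toy instance: conjugate normal mixing posterior for the centering mean
# (prior N(0, 100), observation sd 1); all numerical choices are illustrative.
x = rng.normal(1.0, 1.0, size=100)
post_var = 1.0 / (len(x) + 1e-2)
post_mean = post_var * x.sum()
samples = posterior_bootstrap(
    x,
    sample_gamma=lambda: rng.normal(post_mean, np.sqrt(post_var)),
    sample_pseudo=lambda g, T: rng.normal(g, 1.0, size=T),
    weighted_argmax=lambda z, w: np.dot(w, z),    # closed form for squared loss
    c=1.0, T=100, B=500,
)
print(samples.mean())  # concentrates near the sample mean
```

Note the Dirichlet weight vector places total expected mass n/(n+c) on the observed data, so as c → 0 the procedure recovers the Bayesian bootstrap; a non-trivial utility would replace the closed-form `weighted_argmax` with a numerical optimizer.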
Open Source Code | No | The paper contains no statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | Yes | Statlog German Credit dataset, containing 1000 observations and 25 covariates (including intercept), from the UCI ML repository [10], with preprocessing via [11]. ... [10] Dheeru Dua and Efi Karra Taniskidou. UCI Machine Learning Repository, 2017.
Dataset Splits | No | The paper uses synthetic data and the Statlog German Credit dataset, and for the BRF experiments mentions "training and test datasets of equal size", but it does not give specific train/validation/test percentages, sample counts, or references to predefined standard splits needed for reproducibility.
Hardware Specification | Yes | The runtime to generate 1 million samples by MCMC ... was 33 minutes, compared to 21 seconds with NPL, using an m5.24xlarge AWS instance with 96 vCPUs; a speed-up of 95 times.
Software Dependencies | No | The paper mentions using "automatic differentiation variational inference (ADVI) in Stan [17]" but does not give version numbers for Stan or any other software dependency.
Experiment Setup | Yes | We generated 100 observations from a bivariate normal distribution, centered at (1, 2), with dimension-wise variances both equal to 1 and correlation equal to 0.9, and independent normal priors on each dimension, both centered at 0 with variance 10^2. Each posterior contour plotted is based on 10,000 posterior samples. ... An independent normal prior with variance 100 was assigned to each covariate, and 1000 posterior samples were generated for each method. ... setting c equal to the number of observations in the prior dataset ... over 100 repetitions.
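The synthetic-data setup quoted above can be reproduced in a few lines. This is a sketch under stated assumptions: the mean vector is read as (1, 2) from the garbled "( 1 2)" in the extracted text, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 draws from a bivariate normal: unit variances, correlation 0.9,
# mean vector assumed to be (1, 2) per the experiment description.
mean = np.array([1.0, 2.0])
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
x = rng.multivariate_normal(mean, cov, size=100)

# Independent N(0, 10^2) priors on each dimension of the mean parameter.
prior_var = 10.0 ** 2

print(x.shape)                                   # (100, 2)
print(np.corrcoef(x[:, 0], x[:, 1])[0, 1])       # empirical correlation near 0.9
```

Each posterior method in the paper is then run on `x` to produce the 10,000 samples behind the plotted contours; that step depends on the chosen posterior and is not shown here.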