Scalable Spike-and-Slab

Authors: Niloy Biswas, Lester Mackey, Xiao-Li Meng

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply S3 on synthetic and real-world datasets, demonstrating orders of magnitude speed-ups over existing exact samplers and significant gains in inferential quality over approximate samplers with comparable cost.
Researcher Affiliation | Collaboration | ¹Department of Statistics, Harvard University; ²Microsoft Research New England.
Pseudocode | Yes | Algorithm 1: An Ω(n²p) sampler of (2) (Bhattacharya et al., 2016); Algorithm 2: Bayesian linear regression with S3; Algorithm 3: Bayesian logistic & probit regression with S3. (A minimal sketch of the Algorithm 1 sampler appears after this table.)
Open Source Code | Yes | The open-source packages ScaleSpikeSlab in R and Python (www.github.com/niloyb/ScaleSpikeSlab) implement our methods and recreate the experiments in this paper.
Open Datasets | Yes | We first consider the Gordon microarray dataset (Gordon et al., 2002)... The Malware detection dataset from the UCI machine learning repository (Dua & Graff, 2017)... The Maize GWAS dataset has n = 2266 observations... (Romay et al., 2013; Liu et al., 2016; Zeng & Zhou, 2017).
Dataset Splits | Yes | Figures 8-14 (Right) show the 10-fold cross-validation average root-mean-square error (RMSE) against the total time elapsed to run one S3 and one SOTA chain. To compute this evaluation, we partition the observed dataset into 10 folds uniformly at random and, for each fold k, run a chain conditioned on all data outside of fold k and evaluate its performance on the held-out data in the k-th fold. (A sketch of this evaluation loop appears after this table.)
Hardware Specification | Yes | All timings were obtained using a single core of an Apple M1 chip on a MacBook Air 2020 laptop with 16 GB RAM.
Software Dependencies | No | The paper mentions 'R and Python' and specific packages such as the 'skinnybasad' R package and the 'mcmcse' package, but does not provide version numbers for these software components, which are required for reproducibility.
Experiment Setup | Yes | For each synthetic dataset, we run S3 for logistic and probit regression and the Skinny Gibbs sampler for 1000 iterations and run the SOTA sampler for 100 iterations... For each synthetic dataset, we implement S3 for logistic and probit regression and the Skinny Gibbs sampler for 5000 iterations with a burn-in of 1000 iterations... we choose τ₀² = 1/n, τ₁² = max{p^2.1/(100n), 1}, and q such that P(Σ_{j=1}^p 1{z_j = 1} > K) = 0.1 for K = max{10, log n}. (This hyperparameter calibration is sketched in code after this table.)
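To make the Pseudocode row concrete, here is a minimal sketch of the exact Gaussian sampler named in Algorithm 1 (Bhattacharya et al., 2016), which draws from N(μ, Σ) with Σ = (Φ'Φ + D⁻¹)⁻¹ and μ = ΣΦ'α in O(n²p) time. This is an illustrative NumPy transcription under our reading of the reference, not the authors' released code; the function name and argument layout are our own.

import numpy as np

def sample_gaussian_fast(Phi, alpha, d, rng):
    """Exact draw from N(mu, Sigma), where Sigma = (Phi' Phi + D^-1)^-1,
    mu = Sigma Phi' alpha, and D = diag(d) (Bhattacharya et al., 2016).
    Cost is O(n^2 p), dominated by forming the n-by-n system matrix,
    so it scales linearly in the number of covariates p."""
    n, p = Phi.shape
    u = np.sqrt(d) * rng.standard_normal(p)     # u ~ N(0, D)
    delta = rng.standard_normal(n)              # delta ~ N(0, I_n)
    v = Phi @ u + delta                         # v ~ N(0, Phi D Phi' + I_n)
    M = Phi @ (d[:, None] * Phi.T) + np.eye(n)  # n-by-n system matrix
    w = np.linalg.solve(M, alpha - v)           # (Phi D Phi' + I_n) w = alpha - v
    return u + d * (Phi.T @ w)                  # theta ~ N(mu, Sigma)

In spike-and-slab linear regression this draw occurs once per Gibbs sweep, with Φ = X/σ, α = y/σ, and d_j equal to the slab variance τ₁² or the spike variance τ₀² according to the current indicator z_j; as we read the paper, S3's speed-ups come from updating the n-by-n system across sweeps rather than re-forming it, since few indicators typically change state between iterations.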
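The Dataset Splits row describes a standard 10-fold cross-validation protocol. Below is a minimal sketch of that evaluation loop; fit_and_predict is a hypothetical stand-in for running one chain on the training folds and returning posterior predictive means on the held-out fold.

import numpy as np

def cv_rmse(X, y, fit_and_predict, n_folds=10, seed=0):
    """Average held-out RMSE over a uniformly random partition of the
    rows into n_folds folds: for each fold k, fit on all data outside
    fold k and score predictions on the data inside fold k."""
    rng = np.random.default_rng(seed)
    fold_of = rng.permutation(len(y)) % n_folds   # random fold label per row
    rmses = []
    for k in range(n_folds):
        train, test = fold_of != k, fold_of == k
        y_hat = fit_and_predict(X[train], y[train], X[test])
        rmses.append(np.sqrt(np.mean((y[test] - y_hat) ** 2)))
    return float(np.mean(rmses))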
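The Experiment Setup row fixes the prior hyperparameters as functions of (n, p). The sketch below reproduces that calibration under the assumption, which matches the independent Bernoulli(q) prior on the indicators, that Σ_j 1{z_j = 1} is Binomial(p, q); the implicit equation for q is then solved with a SciPy root-finder. This is our reading of the setup, not code from the paper.

import numpy as np
from scipy.optimize import brentq
from scipy.stats import binom

def s3_prior_hyperparameters(n, p):
    """Hyperparameters from the experiment setup: tau0^2 = 1/n,
    tau1^2 = max(p^2.1 / (100 n), 1), and the prior inclusion
    probability q solving P(Binomial(p, q) > K) = 0.1 with
    K = max(10, log n)."""
    tau0_sq = 1.0 / n
    tau1_sq = max(p ** 2.1 / (100 * n), 1.0)
    K = max(10.0, np.log(n))
    # P(Binomial(p, q) > K) increases monotonically in q, so a
    # bracketed scalar root-find pins down q.
    q = brentq(lambda q: binom.sf(K, p, q) - 0.1, 1e-12, 1.0 - 1e-12)
    return tau0_sq, tau1_sq, K, q

# Example with illustrative dimensions (not from the paper):
# print(s3_prior_hyperparameters(1000, 5000))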