Scalable Spike-and-Slab
Authors: Niloy Biswas, Lester Mackey, Xiao-Li Meng
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply S3 on synthetic and real-world datasets, demonstrating orders of magnitude speed-ups over existing exact samplers and significant gains in inferential quality over approximate samplers with comparable cost. |
| Researcher Affiliation | Collaboration | ¹Department of Statistics, Harvard University; ²Microsoft Research New England. |
| Pseudocode | Yes | Algorithm 1: An Ω(n²p) sampler of (2) (Bhattacharya et al., 2016); Algorithm 2: Bayesian linear regression with S3; Algorithm 3: Bayesian logistic & probit regression with S3. (A NumPy sketch of Algorithm 1 follows the table.) |
| Open Source Code | Yes | The open-source packages ScaleSpikeSlab in R and Python (www.github.com/niloyb/ScaleSpikeSlab) implement our methods and recreate the experiments in this paper. |
| Open Datasets | Yes | We first consider the Gordon microarray dataset (Gordon et al., 2002)... The Malware detection dataset from the UCI machine learning repository (Dua & Graff, 2017)... The Maize GWAS dataset has n = 2266 observations... (Romay et al., 2013; Liu et al., 2016; Zeng & Zhou, 2017). |
| Dataset Splits | Yes | Figures 8–14 (Right) show the 10-fold cross-validation average root-mean-square error (RMSE) against the total time elapsed to run one S3 and one SOTA chain. To compute this evaluation, we partition the observed dataset into 10 folds uniformly at random and, for each fold k, run a chain conditioned on all data outside of fold k and evaluate its performance on the held-out data in the k-th fold. (A sketch of this protocol appears below the table.) |
| Hardware Specification | Yes | All timings were obtained using a single core of an Apple M1 chip on a MacBook Air 2020 laptop with 16 GB RAM. |
| Software Dependencies | No | The paper mentions 'R and Python' and specific packages such as the 'skinnybasad' R package and the 'mcmcse' package, but it does not provide version numbers for these software components, which reproducibility requires. (A generic version-capture snippet follows below the table.) |
| Experiment Setup | Yes | For each synthetic dataset, we run S3 for logistic and probit regression and the Skinny Gibbs sampler for 1000 iterations, run the SOTA sampler for 100 iterations... For each synthetic dataset, we implement S3 for logistic and probit regression and the Skinny Gibbs sampler for 5000 iterations with a burn-in of 1000 iterations... we choose $\tau_0^2 = 1/n$, $\tau_1^2 = \max\{p^{2.1}/(100n), 1\}$, and $q$ such that $P(\sum_{j=1}^{p} \mathbb{I}\{z_j = 1\} > K) = 0.1$ for $K = \max\{10, \log n\}$. (The hyperparameter choices are worked through in the last sketch below.) |
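
The Ω(n²p) sampler cited in the Pseudocode row is the exact Gaussian-posterior scheme of Bhattacharya et al. (2016). Below is a minimal NumPy sketch of that scheme for orientation; the variable names (`X`, `D_diag`, `alpha`) and the absence of input validation are simplifications here, not the paper's implementation.

```python
import numpy as np

def sample_gaussian_posterior(X, D_diag, alpha, rng):
    """Draw one sample from N(Sigma @ X.T @ alpha, Sigma), where
    Sigma = inv(X.T @ X + diag(1 / D_diag)), following the
    O(n^2 p) scheme of Bhattacharya et al. (2016)."""
    n, p = X.shape
    # Step 1: sample u ~ N(0, D) and delta ~ N(0, I_n).
    u = np.sqrt(D_diag) * rng.standard_normal(p)
    delta = rng.standard_normal(n)
    # Step 2: set v = X u + delta.
    v = X @ u + delta
    # Step 3: solve (X D X^T + I_n) w = alpha - v; forming the
    # n x n matrix X D X^T costs O(n^2 p), the dominant term.
    M = X @ (D_diag[:, None] * X.T) + np.eye(n)
    w = np.linalg.solve(M, alpha - v)
    # Step 4: return u + D X^T w, a draw from the target Gaussian.
    return u + D_diag * (X.T @ w)
```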
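The 10-fold evaluation in the Dataset Splits row follows a standard recipe. The sketch below assumes a generic `fit_and_predict` callable standing in for "run a chain conditioned on the training folds and predict on the held-out fold"; that callable and the function name are illustrative, not part of the paper's code.

```python
import numpy as np

def ten_fold_rmse(X, y, fit_and_predict, rng):
    """10-fold CV: partition the data uniformly at random, fit on
    9 folds, score RMSE on the held-out fold, average over folds."""
    n = X.shape[0]
    folds = rng.permutation(n) % 10  # uniform-at-random fold labels 0..9
    errors = []
    for k in range(10):
        train, test = folds != k, folds == k
        y_hat = fit_and_predict(X[train], y[train], X[test])
        errors.append(np.sqrt(np.mean((y[test] - y_hat) ** 2)))
    return float(np.mean(errors))
```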
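Since the Software Dependencies row flags missing version numbers, one lightweight way to record them alongside experiment outputs is shown below; the listed packages are examples, not the paper's actual dependencies.

```python
import importlib.metadata as md
import sys

# Record the interpreter and package versions used in a run.
print("python", sys.version.split()[0])
for pkg in ("numpy", "scipy"):
    print(pkg, md.version(pkg))
```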
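The hyperparameter defaults quoted in the Experiment Setup row can be computed directly. This sketch assumes, as is standard for spike-and-slab priors, that the inclusion indicators $z_j$ are iid Bernoulli($q$), so the count $\sum_j \mathbb{I}\{z_j = 1\}$ is Binomial($p$, $q$) and $q$ can be found by root-finding; `default_hyperparameters` is a hypothetical helper name, not the packages' API.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import binom

def default_hyperparameters(n, p):
    """Reconstruct the quoted choices: tau0^2 = 1/n,
    tau1^2 = max(p^2.1 / (100 n), 1), K = max(10, log n), and q set
    so that P(Binomial(p, q) > K) = 0.1."""
    tau0_sq = 1.0 / n
    tau1_sq = max(p ** 2.1 / (100 * n), 1.0)
    K = max(10.0, np.log(n))
    # scipy's binom.sf(k, n, p) is P(X > k) for X ~ Binomial(n, p);
    # here the trial count is p and the success probability is q.
    q = brentq(lambda q: binom.sf(K, p, q) - 0.1, 1e-10, 1 - 1e-10)
    return tau0_sq, tau1_sq, q, K

print(default_hyperparameters(n=100, p=1000))
```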