Scalable Spike-and-Slab

Authors: Niloy Biswas, Lester Mackey, Xiao-Li Meng

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply S3 on synthetic and real-world datasets, demonstrating orders of magnitude speed-ups over existing exact samplers and significant gains in inferential quality over approximate samplers with comparable cost.
Researcher Affiliation | Collaboration | ¹Department of Statistics, Harvard University; ²Microsoft Research New England.
Pseudocode | Yes | Algorithm 1: An Ω(n²p) sampler of (2) (Bhattacharya et al., 2016); Algorithm 2: Bayesian linear regression with S3; Algorithm 3: Bayesian logistic & probit regression with S3. (A minimal sketch of the Algorithm 1 sampler appears after this table.)
Open Source Code | Yes | The open-source packages ScaleSpikeSlab in R and Python (www.github.com/niloyb/ScaleSpikeSlab) implement our methods and recreate the experiments in this paper.
Open Datasets | Yes | We first consider the Gordon microarray dataset (Gordon et al., 2002)... The Malware detection dataset from the UCI machine learning repository (Dua & Graff, 2017)... The Maize GWAS dataset has n = 2266 observations... (Romay et al., 2013; Liu et al., 2016; Zeng & Zhou, 2017).
Dataset Splits | Yes | Figures 8-14 (Right) show the 10-fold cross-validation average root-mean-square error (RMSE) against the total time elapsed to run one S3 and one SOTA chain. To compute this evaluation, we partition the observed dataset into 10 folds uniformly at random and, for each fold k, run a chain conditioned on all data outside of fold k and evaluate its performance on the held-out data in the k-th fold. (A sketch of this evaluation loop appears after this table.)
Hardware Specification | Yes | All timings were obtained using a single core of an Apple M1 chip on a MacBook Air 2020 laptop with 16 GB RAM.
Software Dependencies | No | The paper mentions 'R and Python' and specific packages such as the 'skinnybasad' R package and the 'mcmcse' package, but does not provide version numbers for these software components, which are required for reproducibility.
Experiment Setup | Yes | For each synthetic dataset, we run S3 for logistic and probit regression and the Skinny Gibbs sampler for 1000 iterations and run the SOTA sampler for 100 iterations... For each synthetic dataset, we implement S3 for logistic and probit regression and the Skinny Gibbs sampler for 5000 iterations with a burn-in of 1000 iterations... we choose τ₀² = 1/n, τ₁² = max{p^2.1/(100n), 1}, and q such that P(Σ_{j=1}^p 1{z_j = 1} > K) = 0.1 for K = max{10, log n}. (This hyperparameter calibration is sketched in code after this table.)
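To make the Pseudocode row concrete, here is a minimal sketch of the exact Gaussian sampler named in Algorithm 1 (Bhattacharya et al., 2016), which draws from N(μ, Σ) with Σ = (Φ'Φ + D⁻¹)⁻¹ and μ = ΣΦ'α in O(n²p) time. This is an illustrative NumPy transcription under our reading of the reference, not the authors' released code; the function name and argument layout are our own.

import numpy as np

def sample_gaussian_fast(Phi, alpha, d, rng):
    """Exact draw from N(mu, Sigma), where Sigma = (Phi' Phi + D^-1)^-1,
    mu = Sigma Phi' alpha, and D = diag(d) (Bhattacharya et al., 2016).
    Cost is O(n^2 p), dominated by forming the n-by-n system matrix,
    so it scales linearly in the number of covariates p."""
    n, p = Phi.shape
    u = np.sqrt(d) * rng.standard_normal(p)     # u ~ N(0, D)
    delta = rng.standard_normal(n)              # delta ~ N(0, I_n)
    v = Phi @ u + delta                         # v ~ N(0, Phi D Phi' + I_n)
    M = Phi @ (d[:, None] * Phi.T) + np.eye(n)  # n-by-n system matrix
    w = np.linalg.solve(M, alpha - v)           # (Phi D Phi' + I_n) w = alpha - v
    return u + d * (Phi.T @ w)                  # theta ~ N(mu, Sigma)

In spike-and-slab linear regression this draw occurs once per Gibbs sweep, with Φ = X/σ, α = y/σ, and d_j equal to the slab variance τ₁² or the spike variance τ₀² according to the current indicator z_j; as we read the paper, S3's speed-ups come from updating the n-by-n system across sweeps rather than re-forming it, since few indicators typically change state between iterations.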
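The Dataset Splits row describes a standard 10-fold cross-validation protocol. Below is a minimal sketch of that evaluation loop; fit_and_predict is a hypothetical stand-in for running one chain on the training folds and returning posterior predictive means on the held-out fold.

import numpy as np

def cv_rmse(X, y, fit_and_predict, n_folds=10, seed=0):
    """Average held-out RMSE over a uniformly random partition of the
    rows into n_folds folds: for each fold k, fit on all data outside
    fold k and score predictions on the data inside fold k."""
    rng = np.random.default_rng(seed)
    fold_of = rng.permutation(len(y)) % n_folds   # random fold label per row
    rmses = []
    for k in range(n_folds):
        train, test = fold_of != k, fold_of == k
        y_hat = fit_and_predict(X[train], y[train], X[test])
        rmses.append(np.sqrt(np.mean((y[test] - y_hat) ** 2)))
    return float(np.mean(rmses))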
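The Experiment Setup row fixes the prior hyperparameters as functions of (n, p). The sketch below reproduces that calibration under the assumption, which matches the independent Bernoulli(q) prior on the indicators, that Σ_j 1{z_j = 1} is Binomial(p, q); the implicit equation for q is then solved with a SciPy root-finder. This is our reading of the setup, not code from the paper.

import numpy as np
from scipy.optimize import brentq
from scipy.stats import binom

def s3_prior_hyperparameters(n, p):
    """Hyperparameters from the experiment setup: tau0^2 = 1/n,
    tau1^2 = max(p^2.1 / (100 n), 1), and the prior inclusion
    probability q solving P(Binomial(p, q) > K) = 0.1 with
    K = max(10, log n)."""
    tau0_sq = 1.0 / n
    tau1_sq = max(p ** 2.1 / (100 * n), 1.0)
    K = max(10.0, np.log(n))
    # P(Binomial(p, q) > K) increases monotonically in q, so a
    # bracketed scalar root-find pins down q.
    q = brentq(lambda q: binom.sf(K, p, q) - 0.1, 1e-12, 1.0 - 1e-12)
    return tau0_sq, tau1_sq, K, q

# Example with illustrative dimensions (not from the paper):
# print(s3_prior_hyperparameters(1000, 5000))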