Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Spike and slab variational Bayes for high dimensional logistic regression
Authors: Kolyan Ray, Botond Szabó, Gabriel Clara
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We confirm the improved performance of our VB algorithm over common sparse VB approaches in a numerical study. We verify in a numerical study that the empirical performance of the proposed method reflects these theoretical guarantees. |
| Researcher Affiliation | Academia | Kolyan Ray Department of Mathematics Imperial College London EMAIL Botond Szabó Department of Mathematics Vrije Universiteit Amsterdam EMAIL Gabriel Clara Department of Mathematics Vrije Universiteit Amsterdam EMAIL |
| Pseudocode | Yes | Algorithm 1: Modified CAVI for variational Bayes with Laplace slabs |
| Open Source Code | Yes | We are currently working on a more efficient implementation as an R-package sparsevb [15] that should improve the run-time. [15] CLARA, G., SZABO, B., AND RAY, K. sparsevb: spike and slab variational Bayes for linear and logistic regression, 2020. R package version 1.0. |
| Open Datasets | No | The paper describes synthetic data generation, e.g., "We take $n = 250$, $p = 500$ and $X$ to be a standard Gaussian design matrix, i.e. $X_{ij} \overset{\text{iid}}{\sim} N(0, 1)$, and set the true signal $\theta_0 = (2, 2, 0, \dots, 0)^T$ to be $s = 2$ sparse." It does not provide access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions "n training examples" but does not specify any training, validation, or test dataset splits (e.g., percentages, counts, or a specific splitting methodology) needed for reproduction. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments were found in the paper. |
| Software Dependencies | No | The paper mentions "C++ using the Rcpp interface and used the Armadillo linear algebra library and ensmallen optimization library" but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We take $n = 250$, $p = 500$ and $X$ to be a standard Gaussian design matrix, i.e. $X_{ij} \overset{\text{iid}}{\sim} N(0, 1)$, and set the true signal $\theta_0 = (2, 2, 0, \dots, 0)^T$ to be $s = 2$ sparse. We ran the experiment 200 times for each method and report the means and standard deviations of the following performance measures:... |
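
The synthetic setup quoted in the Experiment Setup row can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes the binary responses are drawn from a standard logistic model with success probability given by the sigmoid of the linear predictor (suggested by the paper's title, but not stated in the excerpt above), and the function names `simulate_logistic_data` and `summarize` are hypothetical.

```python
import math
import random
import statistics


def simulate_logistic_data(n=250, p=500, signal=2.0, s=2, seed=0):
    """Draw one synthetic data set: Gaussian design, s-sparse signal, binary y.

    Assumption (not stated in the quoted excerpt): y_i ~ Bernoulli(sigmoid(x_i . theta0)).
    """
    rng = random.Random(seed)
    # True signal theta0 = (2, 2, 0, ..., 0): only the first s entries are nonzero.
    theta0 = [signal] * s + [0.0] * (p - s)
    # Standard Gaussian design matrix: every entry is an independent N(0, 1) draw.
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = []
    for row in X:
        # Entries of theta0 beyond index s are zero, so they drop out of the sum.
        eta = sum(row[j] * theta0[j] for j in range(s))
        prob = 1.0 / (1.0 + math.exp(-eta))
        y.append(1 if rng.random() < prob else 0)
    return X, y, theta0


def summarize(values):
    """Return the mean and sample standard deviation of a performance measure."""
    return statistics.mean(values), statistics.stdev(values)
```

To mirror the reported protocol, one would call `simulate_logistic_data` with 200 different seeds, fit each method on every draw, and pass the resulting per-run measures to `summarize`. The fitting step is deliberately left out here because the excerpt does not specify it.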