LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations

Authors: Brian Trippe, Jonathan Huggins, Raj Agrawal, Tamara Broderick

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets.
Researcher Affiliation | Academia | (1) Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA; (2) Department of Biostatistics, Harvard, Cambridge, MA.
Pseudocode | Yes | Algorithm 1: LR-Laplace for Bayesian inference in GLMs with low-rank data approximations and zero-mean prior, with computation costs. See Appendix H for the general algorithm.
Open Source Code | No | The paper does not state that its source code is available and does not link to a code repository.
Open Datasets | Yes | The first is the UCI Farm-Ads dataset, which consists of N = 4,143 online advertisements for animal-related topics together with binary labels indicating whether the content provider approved of the ad; there are D = 54,877 bag-of-words features per ad (Dheeru & Karra Taniskidou, 2017). As a second real dataset we evaluated our approach on the Reuters RCV1 text categorization test collection (Amini et al., 2009; Chang & Lin, 2011). RCV1 consists of D = 47,236 bag-of-words features for N = 20,241 English documents grouped into two different categories.
Dataset Splits | No | The paper names the datasets it uses but does not give percentages or sample counts for training, validation, or test splits.
Hardware Specification | No | The paper does not give hardware details such as GPU/CPU models, memory, or cloud instance specifications for the experiments.
Software Dependencies | No | The paper mentions the software tools 'Stan' and 'PyStan' but does not give version numbers for these or any other software dependencies, which are needed for reproducibility.
Experiment Setup | Yes | For synthetic data experiments, we considered logistic regression with covariates of dimension D = 250 and D = 500. In each replicate, we generated the latent parameter from an isotropic Gaussian prior, β ~ N(0, I_D), correlated covariates from a multivariate Gaussian, and responses from the logistic regression likelihood (see Appendix A.1 for details)... As a practical rule of thumb, we recommend setting M to be as large as is allowable for the given application without the resulting inference becoming too slow. For our experiments with LR-Laplace, this limit was M ≈ 20,000.
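
Since no source code is released, the synthetic setup described under Experiment Setup can be approximated with a short NumPy sketch. This is an illustration under stated assumptions, not the authors' code: the function names make_synthetic_logistic and project_low_rank, the sample size n, and the AR(1)-style covariance with parameter rho are all hypothetical (the actual covariance is specified in the paper's Appendix A.1, which is not reproduced here). The second function shows one common way to form a rank-M data approximation, projecting the covariates onto the top M right singular vectors of X.

```python
import numpy as np


def make_synthetic_logistic(n=1000, d=250, rho=0.5, seed=0):
    """Draw one synthetic logistic-regression replicate.

    Assumptions (not from the paper): n, rho, and the AR(1)-style
    covariance cov[i, j] = rho ** |i - j| are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    # Latent parameter from an isotropic Gaussian prior, beta ~ N(0, I_D).
    beta = rng.standard_normal(d)
    # Correlated covariates from a zero-mean multivariate Gaussian.
    idx = np.arange(d)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    chol = np.linalg.cholesky(cov)
    X = rng.standard_normal((n, d)) @ chol.T
    # Responses from the logistic likelihood; the tanh form of the
    # sigmoid avoids overflow for large |logits|.
    logits = X @ beta
    p = 0.5 * (1.0 + np.tanh(0.5 * logits))
    y = rng.binomial(1, p)
    return X, y, beta


def project_low_rank(X, m):
    """Project X onto its top-m right singular vectors.

    Returns the N x m projected data X @ U and the D x m basis U.
    An exact SVD is used for simplicity; for large D a truncated or
    randomized SVD would be the practical choice.
    """
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    U = vt[:m].T
    return X @ U, U


if __name__ == "__main__":
    X, y, beta = make_synthetic_logistic(n=500, d=250)
    XU, U = project_low_rank(X, m=20)
    print(X.shape, XU.shape, y.mean())
```

Inference would then be run on the N x M matrix X @ U in place of the N x D matrix X, which is where the computational savings of a low-rank approximation come from; larger M trades speed for a closer approximation, consistent with the rule of thumb quoted above.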