LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations
Authors: Brian Trippe, Jonathan Huggins, Raj Agrawal, Tamara Broderick
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets. |
| Researcher Affiliation | Academia | (1) Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA; (2) Department of Biostatistics, Harvard, Cambridge, MA. |
| Pseudocode | Yes | Algorithm 1: LR-Laplace for Bayesian inference in GLMs with low-rank data approximations and a zero-mean prior, annotated with computation costs. See Appendix H for the general algorithm. |
| Open Source Code | No | The paper does not contain any statement about making its source code available or provide a link to a code repository. |
| Open Datasets | Yes | The first is the UCI Farm-Ads dataset, which consists of N = 4,143 online advertisements for animal-related topics together with binary labels indicating whether the content provider approved of the ad; there are D = 54,877 bag-of-words features per ad (Dheeru & Karra Taniskidou, 2017). As a second real dataset we evaluated our approach on the Reuters RCV1 text categorization test collection (Amini et al., 2009; Chang & Lin, 2011). RCV1 consists of D = 47,236 bag-of-words features for N = 20,241 English documents grouped into two different categories. |
| Dataset Splits | No | The paper mentions datasets used but does not provide specific percentages or sample counts for training, validation, or test splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance specifications used for running experiments. |
| Software Dependencies | No | The paper mentions software tools like 'Stan' and 'PyStan' but does not provide specific version numbers for these or other software dependencies, which are necessary for a reproducible description. |
| Experiment Setup | Yes | For synthetic data experiments, we considered logistic regression with covariates of dimension D = 250 and D = 500. In each replicate, we generated the latent parameter from an isotropic Gaussian prior, β ∼ N(0, I_D), correlated covariates from a multivariate Gaussian, and responses from the logistic regression likelihood (see Appendix A.1 for details)... As a practical rule of thumb, we recommend setting M to be as large as is allowable for the given application without the resulting inference becoming too slow. For our experiments with LR-Laplace, this limit was M = 20,000. |
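The synthetic setup quoted above can be illustrated with a short sketch: draw β from an isotropic Gaussian prior, simulate correlated covariates and logistic responses, form a rank-M approximation of the design matrix from its top right singular vectors, and run a Laplace (Newton) approximation in the M-dimensional projected space. This is an illustrative reconstruction under assumed toy sizes (N = 200, D = 50, M = 10) and an assumed AR(1)-style covariance, not the authors' code or the full LR-Laplace algorithm of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, M = 200, 50, 10  # toy sizes for illustration; the paper uses D = 250/500

# Correlated Gaussian covariates (AR(1)-style covariance; an assumed choice).
rho = 0.5
Sigma = rho ** np.abs(np.subtract.outer(np.arange(D), np.arange(D)))
X = rng.multivariate_normal(np.zeros(D), Sigma, size=N)

# Latent parameter from the isotropic Gaussian prior beta ~ N(0, I_D),
# responses from the logistic regression likelihood.
beta_true = rng.standard_normal(D)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

# Rank-M data approximation: project covariates onto the top-M right
# singular vectors of X, so downstream inference costs scale with M, not D.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:M].T          # D x M basis of the retained subspace
Z = X @ V             # N x M projected covariates

# Laplace approximation in the subspace: Newton iterations for the MAP of
# gamma with prior gamma ~ N(0, I_M) (strictly concave, so Newton converges).
gamma = np.zeros(M)
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-Z @ gamma))
    grad = Z.T @ (y - p) - gamma                  # log-posterior gradient
    W = p * (1.0 - p)
    H = Z.T @ (Z * W[:, None]) + np.eye(M)        # negative Hessian
    step = np.linalg.solve(H, grad)
    gamma = gamma + step
    if np.linalg.norm(step) < 1e-8:
        break

# Map the subspace MAP back to the original parameter space.
beta_mean = V @ gamma
p = 1.0 / (1.0 + np.exp(-Z @ gamma))
print("MAP gradient norm:", np.linalg.norm(Z.T @ (y - p) - gamma))
```

The final Gaussian approximation over β would combine the subspace Laplace covariance H⁻¹ (rotated by V) with the untouched prior on the orthogonal complement; the sketch stops at the MAP to stay self-contained.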