PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference

Authors: Jonathan Huggins, Ryan P. Adams, Tamara Broderick

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our approach empirically in the case of logistic regression using a quadratic approximation and show competitive performance with stochastic gradient descent, MCMC, and the Laplace approximation in terms of speed and multiple measures of accuracy, including on an advertising data set with 40 million data points and 20,000 covariates. We demonstrate empirically that PASS-GLM can be scaled with almost no loss of efficiency to multi-core architectures. We show on a number of real-world datasets, including a large, high-dimensional advertising dataset (40 million examples with 20,000 dimensions), that PASS-GLM provides an attractive trade-off between computation and accuracy.
Researcher Affiliation | Collaboration | Jonathan H. Huggins, CSAIL, MIT, jhuggins@mit.edu; Ryan P. Adams, Google Brain and Princeton, rpa@princeton.edu; Tamara Broderick, CSAIL, MIT, tbroderick@csail.mit.edu
Pseudocode | Yes | Algorithm 1: PASS-GLM inference. (An illustrative sketch of the degree-2 case appears after this table.)
Open Source Code | Yes | Code is available at https://bitbucket.org/jhhuggins/pass-glm.
Open Datasets | No | The CHEMREACT dataset consists of N = 26,733 chemicals, each with d = 100 properties. The WEBSPAM corpus consists of N = 350,000 web pages, with covariates given by the d = 127 features that each appear in at least 25 documents. The cover type (COVTYPE) dataset consists of N = 581,012 cartographic observations with d = 54 features. The CODRNA dataset consists of N = 488,565 examples with d = 8 RNA-related features. ... on a subset of 40 million data points from the Criteo terabyte ad click prediction dataset (CRITEO). No specific access links or formal citations for these datasets are provided within the paper.
Dataset Splits | No | The paper discusses the datasets used for evaluation but does not specify explicit train/validation/test splits or how held-out data were selected.
Hardware Specification | No | We demonstrate experimentally that PASS-GLM can be scaled with almost no loss of efficiency to multi-core architectures. To validate the efficiency of distributed computation with PASS-LR2, we compared running times on 6M examples with dimensionality reduced to 1,000 when using 1–22 cores. No specific hardware models (GPU/CPU) or detailed specifications are mentioned.
Software Dependencies | No | We used Ray to implement the distributed version of PASS-LR2 [28]. No specific version numbers for Ray or other software dependencies are provided. (A distributed-aggregation sketch appears after this table.)
Experiment Setup | No | SGD was run for between 1 and 20 epochs. The true posterior was estimated by running three chains of adaptive MALA for 50,000 iterations, which produced Gelman-Rubin statistics well below 1.1 for all datasets. This gives some detail for the comparison methods, but not the hyperparameters or general training setup (learning rates, batch sizes, optimizers, etc.) needed to reproduce the proposed PASS-GLM.
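
To make the quadratic-approximation and pseudocode rows above concrete, below is a minimal sketch of the degree-2 idea behind PASS-LR2. It assumes labels y_n in {-1, +1}, inner products y_n x_n·theta confined to an interval [-R, R], and a N(0, prior_var * I) prior; the function names and the default R = 4 are illustrative assumptions, not taken from the authors' released code.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Minimal sketch of the degree-2 (PASS-LR2) case of PASS-GLM, under the
# assumptions stated above; names are illustrative, not the authors' code.

def chebyshev_quadratic_coeffs(R=4.0, n_nodes=50):
    """Fit phi(s) = -log(1 + exp(-s)) with a degree-2 polynomial in the
    Chebyshev basis on [-R, R]; return monomial coefficients (b0, b1, b2)
    so that phi(s) ~= b0 + b1*s + b2*s**2 on that interval."""
    u = np.cos(np.pi * (np.arange(n_nodes) + 0.5) / n_nodes)  # Chebyshev nodes in [-1, 1]
    s = R * u
    phi = -np.logaddexp(0.0, -s)               # numerically stable -log(1 + e^{-s})
    c_cheb = C.chebfit(u, phi, deg=2)          # fit in the rescaled variable u = s / R
    p = C.cheb2poly(c_cheb)                    # monomial coefficients in u
    return p[0], p[1] / R, p[2] / R**2         # rescale back to the variable s

def sufficient_statistics(X, y):
    """Single pass over the data. With y_n in {-1, +1}, the quadratic surrogate
    only needs t1 = sum_n y_n x_n and t2 = sum_n x_n x_n^T (since y_n^2 = 1)."""
    return X.T @ y, X.T @ X, X.shape[0]

def approx_log_likelihood(theta, t1, t2, N, coeffs):
    """Surrogate log-likelihood from the statistics alone:
    N*b0 + b1 * (t1 . theta) + b2 * theta^T t2 theta."""
    b0, b1, b2 = coeffs
    return N * b0 + b1 * (t1 @ theta) + b2 * (theta @ t2 @ theta)

def gaussian_approx_posterior(t1, t2, coeffs, prior_var=1.0):
    """With a N(0, prior_var * I) prior, the quadratic surrogate yields a
    Gaussian approximate posterior (b2 < 0, so the precision is positive
    definite): precision = -2*b2*t2 + I/prior_var, mean = precision^{-1} b1 t1."""
    _, b1, b2 = coeffs
    precision = -2.0 * b2 * t2 + np.eye(t2.shape[0]) / prior_var
    mean = np.linalg.solve(precision, b1 * t1)
    return mean, np.linalg.inv(precision)
```

Only sufficient_statistics touches the raw data; everything downstream works from the d-vector t1 and the d x d matrix t2, which is what makes a single streaming pass (or a sum over shards, as sketched next) enough.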
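
The multi-core claim quoted under Hardware Specification, and the Ray dependency noted under Software Dependencies, rest on the statistics being plain sums over data points: each worker can reduce its own shard and the per-shard results combine by addition. The following is an illustrative map-reduce sketch using Ray's task API; the shard format and function names are assumptions, not the authors' implementation.

```python
import numpy as np
import ray

# Illustrative map-reduce sketch: shards is assumed to be an iterable of
# (X_shard, y_shard) pairs already resident on the driver or object store.

ray.init()

@ray.remote
def shard_statistics(X_shard, y_shard):
    """Reduce one shard of the data to its degree-2 statistics."""
    return X_shard.T @ y_shard, X_shard.T @ X_shard, X_shard.shape[0]

def distributed_statistics(shards):
    """Launch one task per shard and add the per-shard results; addition is
    the only combination step the statistics require."""
    futures = [shard_statistics.remote(X, y) for X, y in shards]
    t1 = t2 = None
    N = 0
    for s1, s2, n in ray.get(futures):
        t1 = s1 if t1 is None else t1 + s1
        t2 = s2 if t2 is None else t2 + s2
        N += n
    return t1, t2, N
```

Because only the small (t1, t2, N) summaries travel between workers and the driver, communication does not grow with the number of data points, which is the property behind the "almost no loss of efficiency" multi-core scaling quoted above.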