Coresets for Scalable Bayesian Logistic Regression
Authors: Jonathan Huggins, Trevor Campbell, Tamara Broderick
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the performance of the logistic regression coreset algorithm on a number of synthetic and real-world datasets. Experiments on a variety of synthetic and real-world datasets validate our approach and demonstrate robustness to the choice of algorithm hyperparameters. |
| Researcher Affiliation | Academia | Jonathan H. Huggins Trevor Campbell Tamara Broderick Computer Science and Artificial Intelligence Laboratory, MIT {jhuggins@, tdjc@, tbroderick@csail.}mit.edu |
| Pseudocode | Yes | Algorithm 1 Construction of logistic regression coreset (a hedged sketch of the sampling-and-reweighting step appears after this table) |
| Open Source Code | Yes | Code to recreate all of our experiments is available at https://bitbucket.org/jhhuggins/lrcoresets. |
| Open Datasets | Yes | The CHEMREACT dataset consists of N = 26,733 chemicals... The WEBSPAM corpus consists of N = 350,000 web pages... The cover type (COVTYPE) dataset consists of N = 581,012 cartographic observations... (Synthetic data generation refers to Scott et al. [21]) |
| Dataset Splits | No | The paper specifies test set sizes (e.g., '10^3 additional data points were generated for testing' for synthetic data, and '2,500 (resp. 50,000 and 29,000) data points of the CHEMREACT (resp. WEBSPAM and COVTYPE) dataset were held out for testing' for real data), but does not explicitly provide information about a separate validation dataset split. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper states 'We implemented the logistic regression coreset algorithm in Python' but does not provide specific version numbers for Python or any other key software dependencies. |
| Experiment Setup | Yes | We ran adaptive MALA for 100,000 iterations on the full dataset and each subsampled dataset. For the synthetic datasets... we used k = 4 while for the real-world datasets... we used k = 6. We used a heuristic to choose R as large as was feasible... Our experiments used a = 3. (A single-step MALA sketch follows the table.) |
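
The paper's Algorithm 1 constructs the coreset by importance sampling: each point is drawn with probability proportional to an upper bound on its sensitivity (the paper derives these bounds from a k-clustering of the data), and sampled points are reweighted so the coreset log-likelihood is unbiased. The sketch below illustrates only that generic sampling-and-reweighting step; the function name `build_coreset` and its interface are our own, and the paper's cluster-based sensitivity bound is not reproduced here (the bounds `sigma` are assumed to be given).

```python
import numpy as np

def build_coreset(X, y, sigma, M, rng=None):
    """Importance-sampling coreset construction (illustrative sketch).

    X     : (N, D) covariates
    y     : (N,) labels in {-1, +1}
    sigma : (N,) upper bounds on per-point sensitivities
            (in the paper these come from a k-clustering of the data)
    M     : number of multinomial draws (coreset size budget)

    Returns the selected points, their labels, and their weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = sigma / sigma.sum()               # sampling probabilities m_n
    counts = rng.multinomial(M, p)        # (K_1, ..., K_N), summing to M
    idx = np.nonzero(counts)[0]           # points drawn at least once
    weights = counts[idx] / (M * p[idx])  # weights that keep the estimate unbiased
    return X[idx], y[idx], weights
```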
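
The experiment-setup row states that posterior inference used adaptive MALA for 100,000 iterations on the full and subsampled data. As a rough illustration, the sketch below shows a weighted logistic-regression log-posterior (the form a coreset would plug into) and a single fixed-step MALA update. Names, the isotropic Gaussian prior, and its variance are placeholders of ours; the paper's step-size adaptation is omitted.

```python
import numpy as np

def weighted_logistic_logpost_and_grad(theta, X, y, w, prior_var=4.0):
    """Log-posterior and gradient for weighted logistic regression.

    X : (M, D) coreset covariates, y : (M,) labels in {-1, +1},
    w : (M,) coreset weights. prior_var is a placeholder, not the paper's value.
    """
    z = y * (X @ theta)
    logpost = -(w * np.logaddexp(0.0, -z)).sum() - 0.5 * theta @ theta / prior_var
    # d/dtheta of -w*log(1 + e^{-z}) is w * sigmoid(-z) * y * x
    grad = X.T @ (w * y / (1.0 + np.exp(z))) - theta / prior_var
    return logpost, grad

def mala_step(theta, logpost_and_grad, eps, rng):
    """One Metropolis-adjusted Langevin step with a fixed step size eps
    (the paper adapts the step size during sampling; adaptation is omitted)."""
    lp, g = logpost_and_grad(theta)
    prop = theta + 0.5 * eps**2 * g + eps * rng.standard_normal(theta.shape)
    lp_p, g_p = logpost_and_grad(prop)
    # log proposal densities q(prop | theta) and q(theta | prop)
    fwd = -np.sum((prop - theta - 0.5 * eps**2 * g) ** 2) / (2.0 * eps**2)
    rev = -np.sum((theta - prop - 0.5 * eps**2 * g_p) ** 2) / (2.0 * eps**2)
    accept = np.log(rng.uniform()) < lp_p - lp + rev - fwd
    return (prop, True) if accept else (theta, False)
```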