Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Coresets for Scalable Bayesian Logistic Regression
Authors: Jonathan Huggins, Trevor Campbell, Tamara Broderick
NeurIPS 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the performance of the logistic regression coreset algorithm on a number of synthetic and real-world datasets. Experiments on a variety of synthetic and real-world datasets validate our approach and demonstrate robustness to the choice of algorithm hyperparameters. |
| Researcher Affiliation | Academia | Jonathan H. Huggins Trevor Campbell Tamara Broderick Computer Science and Artificial Intelligence Laboratory, MIT {jhuggins@, tdjc@, tbroderick@csail.}mit.edu |
| Pseudocode | Yes | Algorithm 1 Construction of logistic regression coreset |
| Open Source Code | Yes | Code to recreate all of our experiments is available at https://bitbucket.org/jhhuggins/lrcoresets. |
| Open Datasets | Yes | The CHEMREACT dataset consists of N = 26,733 chemicals... The WEBSPAM corpus consists of N = 350,000 web pages... The cover type (COVTYPE) dataset consists of N = 581,012 cartographic observations... (Synthetic data generation refers to Scott et al. [21]) |
| Dataset Splits | No | The paper specifies test set sizes (e.g., '10^3 additional data points were generated for testing' for synthetic data, and '2,500 (resp. 50,000 and 29,000) data points of the CHEMREACT (resp. WEBSPAM and COVTYPE) dataset were held out for testing' for real data), but does not explicitly provide information about a separate validation dataset split. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper states 'We implemented the logistic regression coreset algorithm in Python' but does not provide specific version numbers for Python or any other key software dependencies. |
| Experiment Setup | Yes | We ran adaptive MALA for 100,000 iterations on the full dataset and each subsampled dataset. For the synthetic datasets... we used k = 4 while for the real-world datasets... we used k = 6. We used a heuristic to choose R as large as was feasible... Our experiments used a = 3. |
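
The Experiment Setup row mentions running adaptive MALA (the Metropolis-adjusted Langevin algorithm) on the full and subsampled datasets. As a rough illustration of what such a sampler does, the sketch below runs plain MALA with a fixed step size on a Bayesian logistic regression posterior. It is not the authors' code and it omits the adaptation step. The function names, the standard Gaussian prior, the `step` value, the labels in {-1, +1}, and the per-point weights `w` (included so a weighted subsample could be passed in) are all assumptions made for this example.

```python
import math
import random

def log_post_and_grad(theta, X, y, w, prior_var=1.0):
    """Weighted logistic-regression log posterior and gradient (labels in {-1, +1}).

    Assumes an isotropic Gaussian prior with variance prior_var (illustrative choice).
    """
    d = len(theta)
    logp = -0.5 * sum(t * t for t in theta) / prior_var
    grad = [-t / prior_var for t in theta]
    for x_n, y_n, w_n in zip(X, y, w):
        m = y_n * sum(a * b for a, b in zip(x_n, theta))
        # log(1 + exp(-m)), computed without overflow
        logp -= w_n * (max(-m, 0.0) + math.log1p(math.exp(-abs(m))))
        # sigmoid(-m) written with tanh for numerical stability
        sig_neg = 0.5 * (1.0 - math.tanh(0.5 * m))
        for j in range(d):
            grad[j] += w_n * sig_neg * y_n * x_n[j]
    return logp, grad

def mala(X, y, w, n_iters=1000, step=0.05, seed=0):
    """Fixed-step MALA; returns the chain and the acceptance rate."""
    rng = random.Random(seed)
    theta = [0.0] * len(X[0])
    logp, grad = log_post_and_grad(theta, X, y, w)
    samples, accepted = [], 0
    for _ in range(n_iters):
        # Langevin proposal: gradient drift plus Gaussian noise with variance `step`
        mean_fwd = [t + 0.5 * step * g for t, g in zip(theta, grad)]
        prop = [m + math.sqrt(step) * rng.gauss(0.0, 1.0) for m in mean_fwd]
        logp_new, grad_new = log_post_and_grad(prop, X, y, w)
        mean_rev = [p + 0.5 * step * g for p, g in zip(prop, grad_new)]
        # Metropolis-Hastings correction for the asymmetric proposal
        log_q_fwd = -sum((p - m) ** 2 for p, m in zip(prop, mean_fwd)) / (2 * step)
        log_q_rev = -sum((t - m) ** 2 for t, m in zip(theta, mean_rev)) / (2 * step)
        log_alpha = logp_new - logp + log_q_rev - log_q_fwd
        if rng.random() < math.exp(min(0.0, log_alpha)):
            theta, logp, grad = prop, logp_new, grad_new
            accepted += 1
        samples.append(list(theta))
    return samples, accepted / n_iters
```

Calling `mala(X, y, w, n_iters=100000)` with `w` set to all ones would correspond to sampling from the full-data posterior under these assumptions. An adaptive variant would additionally tune `step` during the run, which this sketch does not attempt.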