Fast Bayesian Coresets via Subsampling and Quasi-Newton Refinement
Authors: Cian Naik, Judith Rousseau, Trevor Campbell
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that our method provides significant improvements in coreset quality against alternatives with comparable construction times, with far less storage cost and user input required. |
| Researcher Affiliation | Academia | Cian Naik, Department of Statistics, University of Oxford, cian.naik@stats.ox.ac.uk; Judith Rousseau, Department of Statistics, University of Oxford, judith.rousseau@stats.ox.ac.uk; Trevor Campbell, Department of Statistics, University of British Columbia, trevor@stat.ubc.ca |
| Pseudocode | Yes | The pseudocode for the quasi-Newton coreset construction method is shown in Algorithm 1. ... Algorithm 1 QNC (QUASI-NEWTON CORESET) |
| Open Source Code | Yes | Experiments were performed on a machine with a 2.6GHz 6-Core Intel Core i7 processor, and 16GB memory; code is available at https://github.com/trevorcampbell/quasi-newton-coresets-experiments. |
| Open Datasets | Yes | The dataset we study is a flight delays dataset, with N = 100,000 and D = 13... This dataset was constructed by merging airport on-time data from the US Bureau of Transportation Statistics https://www.transtats.bts.gov/DL_SelectFields.asp?gnoyr_VQ=FGJ with historical weather records from https://wunderground.com. |
| Dataset Splits | No | The paper does not provide specific training, validation, and test dataset splits (e.g., percentages or sample counts). It mentions '10 random trials' for experiments but this refers to repeated runs, not data partitioning. While data binarization is mentioned, the exact splits for training/validation are not specified. |
| Hardware Specification | Yes | Experiments were performed on a machine with a 2.6GHz 6-Core Intel Core i7 processor, and 16GB memory. |
| Software Dependencies | No | The paper mentions software used, specifically 'STAN [31]', but does not provide specific version numbers for STAN or any other libraries, which is necessary for reproducibility. |
| Experiment Setup | Yes | In each case, we use S = 500 Monte Carlo samples during coreset construction. ... We set the regularization parameter $\tau$ by examining the condition number of $\hat{G}_k + \tau I$ and keeping it below a reasonable value. We can tune $\gamma_k$ using a line search method. ... Thus, we only tune $\gamma_k$ for $k \le K_{\text{tune}}$, and leave it as a constant thereafter. |
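
The Experiment Setup row says the regularization parameter τ is chosen by keeping the condition number of the regularized matrix "below a reasonable value", without giving a threshold or a search procedure. The following is a minimal sketch of one way this could be done, not the authors' implementation: the function name `choose_tau`, the threshold `max_cond`, the starting value `tau0`, and the geometric growth factor are all assumptions introduced for illustration.

```python
import numpy as np


def choose_tau(G_hat, max_cond=1e6, tau0=1e-8, growth=10.0, max_iter=60):
    """Pick a regularization strength tau for G_hat + tau * I.

    Hypothetical helper: walks a geometric grid tau0, tau0 * growth, ...
    and returns the first tau for which the 2-norm condition number of
    G_hat + tau * I is at most max_cond. The threshold and the grid are
    illustrative assumptions, not values taken from the paper.
    """
    G_hat = np.asarray(G_hat, dtype=float)
    identity = np.eye(G_hat.shape[0])
    tau = tau0
    for _ in range(max_iter):
        if np.linalg.cond(G_hat + tau * identity) <= max_cond:
            return tau
        tau *= growth
    raise RuntimeError("no tau on the grid met the condition-number bound")


if __name__ == "__main__":
    # A rank-deficient positive semidefinite matrix: 3 samples in 5 dimensions,
    # so the unregularized matrix is singular.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(3, 5))
    G_hat = X.T @ X / X.shape[0]

    tau = choose_tau(G_hat, max_cond=1e4)
    print("tau:", tau)
    print("condition number:", np.linalg.cond(G_hat + tau * np.eye(5)))
```

A geometric grid is used here only because the condition number of a positive semidefinite matrix plus τI decreases monotonically in τ, so the first grid point that satisfies the bound is also the smallest one on the grid.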