Fast Bayesian Coresets via Subsampling and Quasi-Newton Refinement

Authors: Cian Naik, Judith Rousseau, Trevor Campbell

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that our method provides significant improvements in coreset quality against alternatives with comparable construction times, with far less storage cost and user input required.
Researcher Affiliation | Academia | Cian Naik, Department of Statistics, University of Oxford (cian.naik@stats.ox.ac.uk); Judith Rousseau, Department of Statistics, University of Oxford (judith.rousseau@stats.ox.ac.uk); Trevor Campbell, Department of Statistics, University of British Columbia (trevor@stat.ubc.ca)
Pseudocode | Yes | The pseudocode for the quasi-Newton coreset construction method is shown in Algorithm 1. ... Algorithm 1: QNC (Quasi-Newton Coreset)
Open Source Code | Yes | Experiments were performed on a machine with a 2.6GHz 6-Core Intel Core i7 processor, and 16GB memory; code is available at https://github.com/trevorcampbell/quasi-newton-coresets-experiments.
Open Datasets | Yes | The dataset we study is a flight delays dataset, with N = 100,000 and D = 13 ... This dataset was constructed by merging airport on-time data from the US Bureau of Transportation Statistics (https://www.transtats.bts.gov/DL_SelectFields.asp?gnoyr_VQ=FGJ) with historical weather records from https://wunderground.com.
Dataset Splits | No | The paper does not provide specific training, validation, and test splits (e.g., percentages or sample counts). It mentions '10 random trials' for experiments, but this refers to repeated runs, not data partitioning. Although data binarization is mentioned, the exact training/validation splits are not specified.
Hardware Specification | Yes | Experiments were performed on a machine with a 2.6GHz 6-Core Intel Core i7 processor, and 16GB memory;
Software Dependencies | No | The paper names the software it uses, specifically 'STAN [31]', but does not give version numbers for Stan or any other library, which are needed for reproducibility.
Experiment Setup | Yes | In each case, we use S = 500 Monte Carlo samples during coreset construction. ... We set the regularization parameter $\tau$ by examining the condition number of $\hat{G}_k + \tau I$ and keeping it below a reasonable value. We can tune $\gamma_k$ using a line search method. ... Thus, we only tune $\gamma_k$ for $k \le K_{\text{tune}}$, and leave it as a constant thereafter.
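The Pseudocode and Experiment Setup entries refer to Algorithm 1 and to the tuning of τ and γ_k without reproducing them. The following is a minimal Python sketch of how such a refinement loop could look. It is not the authors' Algorithm 1 or code from the linked repository. It assumes the standard Monte Carlo KL-gradient estimate from earlier Bayesian coreset work: with samples θ_s drawn from the current coreset posterior, ĝ is the sample covariance between each coreset log-likelihood ℓ_m(θ_s) and the residual ℓ(θ_s)ᵀw − L(θ_s), and Ĝ_k is the sample covariance of ℓ(θ_s). The helper names (solve, choose_tau, qn_direction, run_schedule), the closed-form rule for τ, the γ grid, and the clipping of weights at zero are all hypothetical illustrations.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def choose_tau(eig_min: float, eig_max: float, kappa_max: float) -> float:
    """Smallest tau >= 0 keeping cond(G_hat + tau*I) <= kappa_max (hypothetical rule).

    For symmetric PSD G_hat, cond(G_hat + tau*I) = (eig_max + tau) / (eig_min + tau),
    which equals kappa_max at tau = (eig_max - kappa_max*eig_min) / (kappa_max - 1).
    """
    if kappa_max <= 1.0:
        raise ValueError("kappa_max must be greater than 1")
    return max(0.0, (eig_max - kappa_max * eig_min) / (kappa_max - 1.0))


def qn_direction(loglik: Matrix, full_loglik: Vector, w: Vector, tau: float) -> Vector:
    """Return (G_hat + tau*I)^{-1} g_hat from S samples theta_s of the coreset posterior.

    loglik[s][m] is the log-likelihood of coreset point m at theta_s, and
    full_loglik[s] is an (estimated) full-data log-likelihood at theta_s.
    """
    S, M = len(loglik), len(w)
    mean_l = [sum(row[m] for row in loglik) / S for m in range(M)]
    resid = [sum(row[m] * w[m] for m in range(M)) - f for row, f in zip(loglik, full_loglik)]
    mean_r = sum(resid) / S
    lc = [[row[m] - mean_l[m] for m in range(M)] for row in loglik]
    rc = [r - mean_r for r in resid]
    # g_hat[m]: sample covariance of l_m with the residual l^T w - L.
    g = [sum(lc[s][m] * rc[s] for s in range(S)) / S for m in range(M)]
    # G_hat: sample covariance of the coreset log-likelihood vector, plus tau*I.
    G = [[sum(lc[s][i] * lc[s][j] for s in range(S)) / S + (tau if i == j else 0.0)
          for j in range(M)] for i in range(M)]
    return solve(G, g)


def run_schedule(direction: Callable[[Vector], Vector], loss: Callable[[Vector], float],
                 w0: Vector, n_iters: int, k_tune: int,
                 grid: Sequence[float] = (1.0, 0.5, 0.25, 0.125)) -> Tuple[Vector, List[float]]:
    """Iterate w <- max(w - gamma_k * d, 0); grid-search gamma_k only while k <= k_tune."""
    def step(w: Vector, d: Vector, gamma: float) -> Vector:
        # Clipping to nonnegative weights is an assumption of this sketch.
        return [max(wi - gamma * di, 0.0) for wi, di in zip(w, d)]

    w, gamma, gammas = list(w0), grid[0], []
    for k in range(1, n_iters + 1):
        d = direction(w)
        if k <= k_tune:
            # Line search over the grid; after k_tune the last gamma is frozen.
            gamma = min(grid, key=lambda g: loss(step(w, d, g)))
        w = step(w, d, gamma)
        gammas.append(gamma)
    return w, gammas
```

As a sanity check, with fixed samples ℓ = [[1, 0], [2, 1], [3, 0], [4, 2]] and an exactly linear full-data log-likelihood L = 3ℓ_1 + 0.5ℓ_2, one full step (γ = 1, τ = 0) from w = (1, 1) lands on w = (3, 0.5) up to rounding. In real use the samples would be redrawn from the coreset posterior (for example by MCMC in Stan) at every iteration and the loss would be an estimate of the KL objective; the sketch leaves both to the caller. Freezing γ after K_tune iterations avoids paying for repeated line-search evaluations once the step size has settled.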