Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Fast Bayesian Coresets via Subsampling and Quasi-Newton Refinement

Authors: Cian Naik, Judith Rousseau, Trevor Campbell

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments demonstrate that our method provides significant improvements in coreset quality against alternatives with comparable construction times, with far less storage cost and user input required.
Researcher Affiliation	Academia	Cian Naik Department of Statistics University of Oxford EMAIL Judith Rousseau Department of Statistics University of Oxford EMAIL Trevor Campbell Department of Statistics University of British Columbia EMAIL
Pseudocode	Yes	The pseudocode for the quasi-Newton coreset construction method is shown in Algorithm 1. ... Algorithm 1 QNC (QUASI-NEWTON CORESET)
Open Source Code	Yes	Experiments were performed on a machine with a 2.6GHz 6-Core Intel Core i7 processor, and 16GB memory; code is available at https://github.com/trevorcampbell/quasi-newton-coresets-experiments.
Open Datasets	Yes	The dataset we study is a flight delays dataset, with N = 100,000 and D = 13... This dataset was constructed by merging airport on-time data from the US Bureau of Transportation Statistics https://www.transtats.bts.gov/DL_SelectFields.asp?gnoyr_VQ=FGJ with historical weather records from https://wunderground.com.
Dataset Splits	No	The paper does not provide specific training, validation, and test dataset splits (e.g., percentages or sample counts). It mentions '10 random trials' for experiments but this refers to repeated runs, not data partitioning. While data binarization is mentioned, the exact splits for training/validation are not specified.
Hardware Specification	Yes	Experiments were performed on a machine with a 2.6GHz 6-Core Intel Core i7 processor, and 16GB memory;
Software Dependencies	No	The paper mentions software used, specifically 'STAN [31]', but does not provide specific version numbers for STAN or any other libraries, which is necessary for reproducibility.
Experiment Setup	Yes	In each case, we use S = 500 Monte Carlo samples during coreset construction. ... We set the regularization parameter τ by examining the condition number of Ĝk + τI and keeping it below a reasonable value. We can tune γk using a line search method. ... Thus, we only tune γk for k ≤ Ktune, and leave it as a constant thereafter.
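
The Experiment Setup row says that τ is chosen by keeping the condition number of the regularized matrix Ĝk + τI below a reasonable value, but the quoted text gives neither a threshold nor a selection rule. The following is a minimal Python sketch of one way this could be done, not the authors' implementation. It assumes Ĝk is symmetric positive semidefinite and nonzero; the function name choose_tau and the threshold kappa_max are hypothetical and are not taken from the paper or its code repository.

```python
import numpy as np


def choose_tau(G, kappa_max=1e4):
    """Return the smallest tau >= 0 such that cond(G + tau * I) <= kappa_max.

    Assumes G is symmetric positive semidefinite and nonzero, and
    kappa_max > 1. Both the name and the default threshold are
    illustrative assumptions, not values from the paper.
    """
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(G)
    lam_min, lam_max = eigvals[0], eigvals[-1]
    # Adding tau * I shifts every eigenvalue by tau, so the condition number is
    # (lam_max + tau) / (lam_min + tau). Solving for the bound kappa_max gives
    # tau >= (lam_max - kappa_max * lam_min) / (kappa_max - 1).
    tau = (lam_max - kappa_max * lam_min) / (kappa_max - 1.0)
    # A negative value means G already satisfies the bound without regularization.
    return max(float(tau), 0.0)


# Usage: a rank-deficient 5 x 5 matrix, which is singular without regularization.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
G = A @ A.T
tau = choose_tau(G, kappa_max=1e3)
regularized = G + tau * np.eye(G.shape[0])
print(tau, np.linalg.cond(regularized))
```

The closed form avoids a search over τ, at the cost of an eigendecomposition of the matrix. A simpler alternative under the same assumptions would be to increase τ geometrically until the condition number falls below the threshold.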