Testing Calibration in Nearly-Linear Time

Authors: Lunjia Hu, Arun Jambulapati, Kevin Tian, Chutong Yang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we present experiments showing the testing problem we define faithfully captures standard notions of calibration, and that our algorithms scale efficiently to accommodate large sample sizes."
Researcher Affiliation | Academia | Lunjia Hu (Harvard University, lunjia@alumni.stanford.edu); Arun Jambulapati (University of Michigan, jmblpati@gmail.com); Kevin Tian (University of Texas at Austin, kjtian@cs.utexas.edu); Chutong Yang (University of Texas at Austin, cyang98@utexas.edu)
Pseudocode | Yes | Algorithm 1: Apply(g, ℓ, r, τ)
Open Source Code | Yes | "Our code is included in the supplementary material."
Open Datasets | Yes | "We trained a DenseNet40 model [HLvdMW17] on the CIFAR-100 dataset [Kri09]."
Dataset Splits | No | The paper mentions synthetic datasets, CIFAR-100, and drawing samples, but does not specify explicit train/validation/test splits, exact percentages, or sample counts for these splits.
Hardware Specification | Yes | The experiments in the first and third parts of this section are run on a 2018 laptop with a 2.2 GHz 6-core Intel Core i7 processor. The experiments in the second part are run on a cluster with 2x AMD EPYC 7763 64-core processors and a single NVIDIA A100 PCIe 40GB GPU.
Software Dependencies | No | The paper mentions using 'a linear program solver from CVXPY [DB16, AVDB18]', 'a commercial minimum-cost flow solver from Gurobi Optimization [Opt23]', and 'the PyPy package [PyP19]', but does not specify version numbers for these software dependencies (a minimal usage sketch follows this table).
Experiment Setup | No | The paper describes training a DenseNet40 model and learning postprocessing functions, but does not explicitly provide specific hyperparameter values such as learning rate, batch size, or detailed optimizer settings in the main text.
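
For reference on the Software Dependencies row, the snippet below is a minimal, hypothetical sketch of calling CVXPY's linear program interface, which is the kind of dependency the paper reports using. The problem data are placeholders and do not reproduce the paper's actual calibration-testing LP; no particular solver version is assumed.

```python
import cvxpy as cp
import numpy as np

# Hypothetical LP data; the paper's calibration-testing formulation is not reproduced here.
rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m) + 5.0
c = rng.standard_normal(n)

# Minimize c^T x subject to A x <= b, x >= 0.
x = cp.Variable(n, nonneg=True)
prob = cp.Problem(cp.Minimize(c @ x), [A @ x <= b])
prob.solve()  # CVXPY picks a default LP solver unless one is passed explicitly

print("status:", prob.status)
print("optimal value:", prob.value)
# Logging the library version would address the missing dependency-version information noted above.
print("cvxpy version:", cp.__version__)
```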