Testing Calibration in Nearly-Linear Time

Authors: Lunjia Hu, Arun Jambulapati, Kevin Tian, Chutong Yang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we present experiments showing the testing problem we define faithfully captures standard notions of calibration, and that our algorithms scale efficiently to accommodate large sample sizes."
Researcher Affiliation | Academia | Lunjia Hu (Harvard University, lunjia@alumni.stanford.edu); Arun Jambulapati (University of Michigan, jmblpati@gmail.com); Kevin Tian (University of Texas at Austin, kjtian@cs.utexas.edu); Chutong Yang (University of Texas at Austin, cyang98@utexas.edu)
Pseudocode | Yes | Algorithm 1: Apply(g, ℓ, r, τ)
Open Source Code | Yes | "Our code is included in the supplementary material."
Open Datasets | Yes | "We trained a DenseNet40 model [HLvdMW17] on the CIFAR-100 dataset [Kri09]."
Dataset Splits | No | The paper mentions synthetic datasets, CIFAR-100, and drawing samples, but does not specify explicit train/validation/test splits, exact percentages, or sample counts for these splits.
Hardware Specification | Yes | The experiments in the first and third parts of this section are run on a 2018 laptop with a 2.2 GHz 6-core Intel Core i7 processor. The experiments in the second part are run on a cluster with 2x AMD EPYC 7763 64-core processors and a single NVIDIA A100 PCIe 40GB GPU.
Software Dependencies | No | The paper mentions using 'a linear program solver from CVXPY [DB16, AVDB18]', 'a commercial minimum-cost flow solver from Gurobi Optimization [Opt23]', and 'the PyPy package [PyP19]', but does not specify version numbers for these software dependencies (a minimal usage sketch follows this table).
Experiment Setup | No | The paper describes training a DenseNet40 model and learning postprocessing functions, but does not explicitly provide specific hyperparameter values such as learning rate, batch size, or detailed optimizer settings in the main text.
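
For reference on the Software Dependencies row, the snippet below is a minimal, hypothetical sketch of calling CVXPY's linear program interface, which is the kind of dependency the paper reports using. The problem data are placeholders and do not reproduce the paper's actual calibration-testing LP; no particular solver version is assumed.

```python
import cvxpy as cp
import numpy as np

# Hypothetical LP data; the paper's calibration-testing formulation is not reproduced here.
rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m) + 5.0
c = rng.standard_normal(n)

# Minimize c^T x subject to A x <= b, x >= 0.
x = cp.Variable(n, nonneg=True)
prob = cp.Problem(cp.Minimize(c @ x), [A @ x <= b])
prob.solve()  # CVXPY picks a default LP solver unless one is passed explicitly

print("status:", prob.status)
print("optimal value:", prob.value)
# Logging the library version would address the missing dependency-version information noted above.
print("cvxpy version:", cp.__version__)
```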