Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Procrastinating with Confidence: Near-Optimal, Anytime, Adaptive Algorithm Configuration

Authors: Robert Kleinberg, Kevin Leyton-Brown, Brendan Lucier, Devon Graham

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show empirically both that such settings arise frequently in practice and that the anytime property is useful for finding good configurations quickly. [Section 5: Experimental Results] We experiment with SPC on the benchmark set of runtimes generated by Weisz et al. (2018b) for testing LEAPSANDBOUNDS.
Researcher Affiliation	Collaboration	Robert Kleinberg, Department of Computer Science, Cornell University (EMAIL); Kevin Leyton-Brown, Department of Computer Science, University of British Columbia (EMAIL); Brendan Lucier, Microsoft Research (EMAIL); Devon Graham, Department of Computer Science, University of British Columbia (EMAIL)
Pseudocode	Yes	Algorithm 1: Structured Procrastination w/ Confidence
Open Source Code	Yes	Code to reproduce experiments is available at https://github.com/drgrhm/alg_config
Open Datasets	Yes	We experiment with SPC on the benchmark set of runtimes generated by Weisz et al. (2018b) for testing LEAPSANDBOUNDS. This data consists of pre-computed runtimes for 972 configurations of the minisat (Sorensson & Een, 2005) SAT solver on 20118 SAT instances generated using CNFuzzDD (http://fmv.jku.at/cnfuzzdd/).
Dataset Splits	No	The paper uses a benchmark set of pre-computed runtimes but does not specify any explicit training, validation, or test dataset splits.
Hardware Specification	No	The paper mentions 'CPU time in days' for experimental runtime but does not provide specific hardware details such as CPU/GPU models, memory, or other system specifications.
Software Dependencies	No	The paper mentions the 'minisat' SAT solver used to generate the dataset but does not list specific software dependencies with version numbers required to replicate the experiments.
Experiment Setup	No	The paper describes the benchmark data and comparisons made, but does not provide specific hyperparameters or system-level training settings for SPC within its experimental setup.
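The table refers to Algorithm 1 (Structured Procrastination with Confidence) and to a benchmark of pre-computed minisat runtimes. Because the runtimes are pre-computed, a configurator can be evaluated by table lookup instead of launching the solver. The Python sketch below illustrates that idea together with a captime-doubling retry queue in the style of structured procrastination. It is a simplified, hypothetical sketch, not the authors' code (which is in the repository listed above): the names `run_capped` and `procrastination_sketch`, the starting captime `theta0`, the plain-mean selection rule (standing in for SPC's confidence bounds), the rule for the anytime answer, and the synthetic runtime matrix are all assumptions.

```python
import random
from collections import deque


def run_capped(runtimes, config, instance, cap):
    """Simulate one capped solver run by looking up a pre-computed runtime.

    Returns (time_charged, finished). A run longer than `cap` is charged the
    full captime and reported as a timeout, as a real capped run would be.
    """
    r = runtimes[config][instance]
    return (r, True) if r <= cap else (cap, False)


def procrastination_sketch(runtimes, budget, theta0=1.0):
    """Hypothetical, simplified structured-procrastination loop (NOT Algorithm 1).

    runtimes[i][j] is the pre-computed runtime of configuration i on instance j.
    SPC's confidence bounds are replaced here by a plain mean of per-instance
    lower bounds; this is an illustrative assumption.
    """
    n_configs, n_instances = len(runtimes), len(runtimes[0])
    queues = [deque() for _ in range(n_configs)]  # pending (instance, captime) pairs
    lower = [{} for _ in range(n_configs)]        # instance -> runtime lower bound
    fresh = [0] * n_configs                       # next unseen instance per config
    spent = 0.0

    def estimate(i):
        # Optimistic: a config with no observations scores 0, so it is tried early.
        return sum(lower[i].values()) / len(lower[i]) if lower[i] else 0.0

    # The last run may overshoot `budget` by at most one captime.
    while spent < budget:
        i = min(range(n_configs), key=estimate)  # most promising config
        if not queues[i]:
            if fresh[i] == n_instances:
                # Every run of config i has finished, so its estimate is its exact
                # mean, and it is still no worse than any other optimistic estimate.
                break
            queues[i].append((fresh[i], theta0))
            fresh[i] += 1
        j, cap = queues[i].popleft()
        t, finished = run_capped(runtimes, i, j, cap)
        spent += t
        lower[i][j] = t  # exact runtime if finished, otherwise a lower bound
        if not finished:
            queues[i].append((j, 2 * cap))  # procrastinate: retry later, doubled cap

    # Anytime answer (assumed rule): the configuration observed on the most instances.
    return max(range(n_configs), key=lambda i: len(lower[i]))


if __name__ == "__main__":
    # Synthetic stand-in for a runtime matrix; NOT the Weisz et al. benchmark data.
    rng = random.Random(0)
    scales = [rng.uniform(0.5, 5.0) for _ in range(20)]
    runtimes = [[rng.expovariate(1.0 / s) for _ in range(200)] for s in scales]
    best = procrastination_sketch(runtimes, budget=500.0)
    print("chosen config:", best, "true mean runtime scale:", round(scales[best], 2))
```

Re-queuing a timed-out run with a doubled captime roughly limits how much time a poor configuration can absorb early on, while a configuration that keeps looking promising accumulates more and longer runs. The actual SPC selection rule, its confidence bounds, and its guarantees are those given in the paper and its code, not the simplifications used here.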