reproducibilityindex.ai

Formalizing Preferences Over Runtime Distributions

Authors: Devon R. Graham, Kevin Leyton-Brown, Tim Roughgarden

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	"This paper aims to lay theoretical foundations for such choices by formalizing preferences over runtime distributions. ... Finally, in Section 5 we present some real-world examples where the choice of utility function really is important and changes our conclusions about which algorithm is considered best." And later, in Section 5: "Algorithm Configuration. We considered a dataset due to Weisz et al. (2018) which evaluated 972 randomly-sampled configurations of the minisat (Sorensson & Een, 2005) SAT solver... Our results (Figure 3) show that these differences were significant in practice: we often lost a substantial fraction of the available utility when we optimized for the wrong utility function. International SAT Competition. Figure 4 shows the ranking of the Parallel Track of the 2021 International SAT Competition."
Researcher Affiliation	Collaboration	¹Department of Computer Science, University of British Columbia, Vancouver, BC; ²Department of Computer Science, Columbia University, New York, New York; ³a16z crypto. Correspondence to: Devon R. Graham <drgraham@cs.ubc.ca>.
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Code to reproduce all figures can be found at https://github.com/drgrhm/formalizing-preferences
Open Datasets	Yes	We considered a dataset due to Weisz et al. (2018) which evaluated 972 randomly-sampled configurations of the minisat (Sorensson & Een, 2005) SAT solver on 20118 instances generated by CNFuzz DD.
Dataset Splits	No	The paper mentions evaluating configurations on '20118 instances generated by CNFuzz DD' but does not specify any training, validation, or test splits for these instances.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies	No	The paper mentions the 'minisat SAT solver' and 'CNFuzz DD' but does not provide specific version numbers for these or any other software dependencies used in the experiments.
Experiment Setup	No	The paper mentions evaluating 'randomly-sampled configurations' and analyzing results from the SAT Competition, but it does not provide specific experimental setup details such as hyperparameter values, training configurations, or system-level settings used for its own analysis.