Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Balancing Performance and Costs in Best Arm Identification

Authors: Michael Harding, Kirthevasan Kandasamy

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then demonstrate the performance of DBCARE on a number of simulated models, comparing to fixed budget and confidence algorithms to show the shortfalls of existing BAI paradigms on this problem.
Researcher Affiliation | Academia | Michael O. Harding Department of Statistics University of Wisconsin-Madison EMAIL Kirthevasan Kandasamy Department of Computer Science University of Wisconsin-Madison EMAIL
Pseudocode | Yes | Algorithm 1 Dynamically Budgeted Cost-Adapted Risk-minimizing Elimination
Open Source Code | No | Answer: [No] Justification: We do not provide access to the data and code.
Open Datasets | Yes | We present the results of a real data experiment on a drug discovery dataset. For this experiment, we take the results from Table 2 of Genovese et al. [19]
Dataset Splits | No | Results are averaged across 10^5 runs each with different random seeds.
Hardware Specification | Yes | All experiments were performed using a 3.7GHz AMD Ryzen 9 5900X 12-Core processor with 24 GB of memory.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We study the performance across a range of suboptimality gaps for Gaussian and Bernoulli rewards in the two-arm setting using the cost c = 10^-4. In the Gaussian setting, the arms have variance σ^2 = 1 with means ε/2, for ε ∈ [0.05, 2]; for Bernoulli arms, the means are 0.5 ± ε/2, for ε ∈ [0.01, 0.95]. Results are averaged across 10^5 runs each with different random seeds. We compare to Sequential Halving for fixed budget and elimination procedures using the optimized stopping rules of [30] for fixed confidence. We use budgets T = 10 and T = 500 and confidences of δ = 0.1 and δ = 0.01 for comparison against relatively low and high confidence/budget choices.
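The fixed-budget baseline in the setup above can be illustrated with a short simulation. Because the paper releases no code, the following is only a minimal sketch and not the authors' implementation. Several things in it are assumptions: the function names `sequential_halving` and `estimate_risk`, the reading of the Gaussian means as symmetric about zero (+ε/2 and -ε/2), and the risk measure, taken here to be the misidentification rate plus the cost c times the number of pulls. The run count is also far smaller than the 10^5 runs reported.

```python
import math
import random


def sequential_halving(sample, n_arms, budget):
    """Return the arm recommended by Sequential Halving.

    `sample(arm)` is a hypothetical callback that draws one reward.
    """
    survivors = list(range(n_arms))
    rounds = max(1, math.ceil(math.log2(n_arms)))
    for _ in range(rounds):
        # Split this round's share of the budget evenly across survivors.
        pulls = max(1, budget // (len(survivors) * rounds))
        means = {a: sum(sample(a) for _ in range(pulls)) / pulls
                 for a in survivors}
        # Keep the better half, ranked by empirical mean.
        survivors.sort(key=lambda a: means[a], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


def estimate_risk(means, budget, cost, runs, seed=0, bernoulli=False):
    """Monte Carlo estimate of (error rate + cost * budget).

    This risk definition is an assumption made for illustration.
    """
    rng = random.Random(seed)
    best = max(range(len(means)), key=lambda a: means[a])

    def sample(arm):
        if bernoulli:
            return 1.0 if rng.random() < means[arm] else 0.0
        return rng.gauss(means[arm], 1.0)  # variance 1, as in the setup

    errors = sum(
        sequential_halving(sample, len(means), budget) != best
        for _ in range(runs)
    )
    return errors / runs + cost * budget


if __name__ == "__main__":
    eps, cost = 0.5, 1e-4
    for budget in (10, 500):
        gauss = estimate_risk([eps / 2, -eps / 2], budget, cost, runs=2000)
        bern = estimate_risk([0.5 + eps / 2, 0.5 - eps / 2], budget, cost,
                             runs=2000, bernoulli=True)
        print(f"T={budget}: Gaussian risk={gauss:.4f}, Bernoulli risk={bern:.4f}")
```

With two arms, Sequential Halving reduces to pulling each arm T/2 times and recommending the one with the higher empirical mean. Under the assumed risk, the fixed cost term c·T is paid regardless of how easy the instance is, which is the trade-off the comparison against low and high budgets is meant to expose.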