Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Optimal Batched Best Arm Identification
Authors: Tianyuan Jin, Yu Yang, Jing Tang, Xiaokui Xiao, Pan Xu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also conduct numerical experiments to compare our proposed algorithms with the optimal sequential algorithm Track-and-Stop [17], and the batched algorithm Top-k δ-Elimination [22] on various problem instances. |
| Researcher Affiliation | Academia | Tianyuan Jin¹, Yu Yang³, Jing Tang², Xiaokui Xiao¹, Pan Xu³; ¹National University of Singapore, ²The Hong Kong University of Science and Technology (Guangzhou), ³Duke University |
| Pseudocode | Yes | Algorithm 1: Three-Batch Best Arm Identification (Tri-BBAI) |
| Open Source Code | Yes | The implementation of this work can be found at https://github.com/panxulab/Optimal-Batched-Best-Arm-Identification |
| Open Datasets | No | For all experiments in this section, we set the number of arms n = 10, where each arm has Bernoulli reward distribution with mean µᵢ for i ∈ [10]. More specifically, the mean rewards are generated by the following two cases. Uniform: The best arm has µ₁ = 0.5, and the mean rewards of the rest of the arms follow uniform distribution over [0.2, 0.4], i.e., µᵢ is uniformly generated from [0.2, 0.4] for i ∈ [n] ∖ {1}. Normal: The best arm has µ₁ = 0.6, and the mean rewards of the rest of the arms are first generated from normal distribution N(0.2, 0.2) and then projected to the interval [0, 0.4]. |
| Dataset Splits | No | The paper describes how the reward distributions for the bandit arms are generated for experiments but does not specify any explicit training, validation, or test dataset splits (e.g., 80/10/10 split or specific sample counts). |
| Hardware Specification | Yes | We perform all computations in Python on R9 5900HX for all our experiments. |
| Software Dependencies | No | The paper mentions performing computations 'in Python' but does not specify the Python version or any other software dependencies with their respective version numbers (e.g., specific libraries or frameworks). |
| Experiment Setup | Yes | The hyperparameters of all methods are chosen as follows... For Tri-BBAI and Opt-BBAI, we set α = 1.0017, and ϵ = 0.01. We use the same β(t) function for Chernoff's stopping condition as in Track-and-Stop. Moreover, for the lengths of the batches, we set L₁, L₂ and L₃ to be the value calculated by Theorem 3.1. |
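
The synthetic problem instances quoted in the Open Datasets row can be regenerated along the following lines. This is a minimal sketch, not the authors' code (their implementation is in the repository linked above). The function names, the fixed seed, and the reading of the second argument of N(0.2, 0.2) as a variance are all assumptions made here for illustration; the quoted text does not say whether 0.2 is a variance or a standard deviation, so `std` is left as a parameter.

```python
import math
import random


def uniform_instance(n=10, rng=None):
    """'Uniform' case: best arm mean 0.5, remaining n-1 means uniform on [0.2, 0.4]."""
    if rng is None:
        rng = random.Random()
    return [0.5] + [rng.uniform(0.2, 0.4) for _ in range(n - 1)]


def normal_instance(n=10, std=math.sqrt(0.2), rng=None):
    """'Normal' case: best arm mean 0.6, remaining means drawn from a normal
    distribution centred at 0.2 and then projected (clipped) onto [0, 0.4].

    ASSUMPTION: N(0.2, 0.2) is read as mean 0.2 and variance 0.2, hence the
    default std = sqrt(0.2). Pass std=0.2 for the other reading.
    """
    if rng is None:
        rng = random.Random()
    others = [min(max(rng.gauss(0.2, std), 0.0), 0.4) for _ in range(n - 1)]
    return [0.6] + others


def pull(mean, rng):
    """Draw one Bernoulli reward: 1 with probability `mean`, otherwise 0."""
    return 1 if rng.random() < mean else 0


# Hypothetical usage: the seed is arbitrary and only makes the run repeatable.
rng = random.Random(0)
uniform_means = uniform_instance(rng=rng)
normal_means = normal_instance(rng=rng)
sample_rewards = [pull(uniform_means[0], rng) for _ in range(5)]
```

Because the instances are drawn at random, exact means differ between seeds; recording the seed alongside the results is one way to make a particular run repeatable.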