Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Batch Ensemble for Variance Dependent Regret in Stochastic Bandits

Authors: Asaf Cassel, Orin Levy, Yishay Mansour

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We complement our theoretical results by demonstrating the effectiveness of our algorithm on synthetic benchmarks. In this section, we compare the performance of our ensemble method to other provable methods such as UCB, UCB-V, UCB-KL, TS, and MARS, on synthetic Bernoulli, Gaussian, and beta MAB environments. In our experiments, Ensemble Adaptive refers to running Algorithm 1 with an any-time (adaptive) batch size that scales as log t, and Ensemble Adaptive Efficient refers to a variant that has a better running time. The performance criterion is the averaged pseudo-regret across all simulations, and we plot its 95% confidence set using the normalized empirical variance.
Researcher Affiliation Collaboration 1: Blavatnik School of Computer Science, Tel Aviv University; 2: Google Research, Tel-Aviv. EMAIL, EMAIL, EMAIL
Pseudocode Yes Algorithm 1: Batch Ensemble for MAB. 1: input: number of batches l_t for all t ≥ 1. 2: initialize: pull counts n_{0,a} = 0 for all a ∈ [K]. 3: for time step t = 1, 2, . . . do 4: calculate μ̂_{n_{t−1,a},a} as in Eq. (2) with l_t batches and choose a_t ∈ arg min_{a∈[K]} μ̂_{n_{t−1,a},a}. (3) 5: observe ℓ_{(n_{t,a}),a} and update n_{t,a} = n_{t−1,a} + 1{a_t = a}. 6: end for
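The pseudocode above can be sketched in Python. Note that Eq. (2) (the batch-ensemble estimator) is not reproduced in this excerpt, so the combination rule used here — splitting an arm's observed losses into batches and taking the minimum of the batch means as an optimistic loss estimate — is an assumption, as is the Bernoulli-loss environment used for illustration; the paper's actual estimator may differ.

```python
import numpy as np

def batch_mean_estimate(losses, num_batches, rng):
    """Hypothetical stand-in for Eq. (2): split an arm's observed losses into
    `num_batches` random batches and take the minimum of the batch means as an
    optimistic (low) loss estimate. The paper's actual rule may differ."""
    if not losses:
        return -np.inf  # unpulled arms are tried first
    batches = np.array_split(rng.permutation(losses), num_batches)
    return min(b.mean() for b in batches if b.size > 0)

def batch_ensemble_mab(arm_means, T, num_batches, seed=0):
    """Sketch of Algorithm 1 with a fixed batch count on Bernoulli-loss arms."""
    rng = np.random.default_rng(seed)
    K = len(arm_means)
    history = [[] for _ in range(K)]                   # step 2: initialize per-arm records
    pulls = np.zeros(K, dtype=int)
    for t in range(T):                                 # step 3: loop over time steps
        est = [batch_mean_estimate(history[a], num_batches, rng) for a in range(K)]
        a_t = int(np.argmin(est))                      # step 4: arm minimizing the estimate
        loss = float(rng.random() < arm_means[a_t])    # step 5: observe loss, update counts
        history[a_t].append(loss)
        pulls[a_t] += 1
    return pulls

# Example: two Bernoulli-loss arms; the lower-loss arm should be pulled most.
pulls = batch_ensemble_mab([0.05, 0.6], T=400, num_batches=5)
```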
Open Source Code Yes Code is available at https://gist.github.com/asafcassel/17ef6b4cd849fb6f0be4180aac7ed646
Open Datasets No No specific public datasets are mentioned. The paper states: "on synthetic Bernoulli, Gaussian, and beta MAB environments." It describes the generation of these synthetic environments but does not provide concrete access information (link, DOI, repository, or citation) for specific datasets.
Dataset Splits No The paper does not provide specific training/validation/test dataset splits. It describes generating synthetic data for various test cases and running simulations, e.g., "In all test cases, we run 100 simulations each with T = 2000 steps," but this is not a traditional dataset split.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies No The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions).
Experiment Setup Yes Suppose we run Algorithm 1 with a fixed number of batches l = (7/2) log(2T/δ). In all test cases, we run 100 simulations, each with T = 2000 steps. The performance criterion is the averaged pseudo-regret across all simulations, and we plot its 95% confidence set using the normalized empirical variance.
Test case 1: Bernoulli arms with a clear, low-variance best arm. The arm expectations tested are 0.001, 0.15, 0.2, 0.25, and 0.3.
Test case 2: Bernoulli arms with high expected loss and low variance. Here we examine performance in a high-expected-loss scenario where the means are 0.9, 0.91, 0.92, . . . , 0.99.
Test case 3: Bernoulli arms with random means. In each of the 100 simulations, we sampled 10 means uniformly from the interval [0, 1].
Test case 4: Gaussian arms with random means and variance 1. We consider 10 Gaussian arms with randomly chosen means in [0, 1] and standard deviation σ = 1.
Test case 5: Beta-distributed arms with random means. Finally, we simulated 100 runs of MAB with 10 Beta arms that have randomly chosen means in [0, 1] and α in [0, 5].
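The protocol above can be sketched as follows. The failure probability δ is not stated in this excerpt, so δ = 0.05 is an assumption; the arm-selection policy below is a uniform-random placeholder (the paper's algorithms are not reproduced here); and `beta_second_shape` is a hypothetical helper derived from the standard Beta mean identity, not something the paper specifies.

```python
import numpy as np

T, N_SIMS, DELTA = 2000, 100, 0.05  # DELTA is an assumption; the excerpt omits its value
# Fixed batch count from the setup: l = ceil((7/2) * log(2T / delta)).
L_FIXED = int(np.ceil(3.5 * np.log(2 * T / DELTA)))

def pseudo_regret(chosen_means, best_mean):
    """Cumulative pseudo-regret of one run: running sum of mean-loss gaps."""
    return np.cumsum(np.asarray(chosen_means) - best_mean)

def summarize(regret_runs):
    """Average across simulations with a 95% band from the normalized
    empirical variance (half-width 1.96 * std / sqrt(#sims))."""
    avg = regret_runs.mean(axis=0)
    half = 1.96 * regret_runs.std(axis=0, ddof=1) / np.sqrt(regret_runs.shape[0])
    return avg, avg - half, avg + half

def beta_second_shape(mu, alpha):
    """Hypothetical test-case-5 helper: for a Beta(alpha, beta) arm with
    target mean mu, the identity mu = alpha / (alpha + beta) gives beta."""
    return alpha * (1.0 - mu) / mu

# Illustration on test case 1's means with a uniform-random placeholder policy.
rng = np.random.default_rng(0)
means = np.array([0.001, 0.15, 0.2, 0.25, 0.3])
runs = np.stack([
    pseudo_regret(means[rng.integers(len(means), size=T)], means.min())
    for _ in range(N_SIMS)
])
avg, lo, hi = summarize(runs)  # averaged pseudo-regret curve with its 95% band
```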