A One-Size-Fits-All Solution to Conservative Bandit Problems

Authors: Yihan Du, Siwei Wang, Longbo Huang
Pages: 7254-7261

AAAI 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct extensive experiments for the considered problems. The results match our theoretical bounds and demonstrate that our algorithms achieve the performance superiority compared to existing algorithms.
Researcher Affiliation Academia Yihan Du¹, Siwei Wang¹, Longbo Huang¹; ¹Tsinghua University
Pseudocode Yes Algorithm 1: General Solution to Conservative Bandits (GenCB). Algorithm 2: MV-CUCB.
Open Source Code No The paper does not provide a direct link to the source code or explicitly state that the code for the described methodology is open-source. The URL in the reference section points to the paper's arXiv preprint.
Open Datasets No In all experiments, we assume the rewards to take i.i.d. Bernoulli values. For CMAB, we set K ∈ {24, 72, 144}, α ∈ {0.05, 0.1, 0.15}, µ0 = 0.7 and µ1, ..., µK as an arithmetic sequence from 0.8 to 0.2. For CLB and CCCB, we set d ∈ {5, 7, 9}, α ∈ {0.01, 0.02, 0.03}, K = 2d and f(A, w*) = Σ_{e ∈ A} w*_e. For MV-CBP, we use the same parameter settings as CMAB and additionally set ρ ∈ {10, 30, 60}. This indicates that the data is simulated, not from a publicly available dataset with concrete access information.
Dataset Splits No The paper describes how synthetic data is generated for simulations, but does not mention specific training, validation, or testing splits of any fixed dataset.
Hardware Specification No The paper does not provide any specific details about the hardware used for running the experiments.
Software Dependencies No The paper mentions various algorithms (e.g., UCB, LinUCB, C2UCB) that are integrated into GenCB, but it does not specify any software dependencies (e.g., programming languages, libraries, frameworks) with version numbers.
Experiment Setup Yes In all experiments, we assume the rewards to take i.i.d. Bernoulli values. For CMAB, we set K ∈ {24, 72, 144}, α ∈ {0.05, 0.1, 0.15}, µ0 = 0.7 and µ1, ..., µK as an arithmetic sequence from 0.8 to 0.2. For CLB and CCCB, we set d ∈ {5, 7, 9}, α ∈ {0.01, 0.02, 0.03}, K = 2d and f(A, w*) = Σ_{e ∈ A} w*_e. For MV-CBP, we use the same parameter settings as CMAB and additionally set ρ ∈ {10, 30, 60}. For each algorithm, we perform 50 independent runs and present the average (middle curve), maximum (upper curve) and minimum (bottom curve) cumulative regrets across runs.
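
Since no code is released, the following is only a minimal sketch of how the CMAB simulation protocol described above might be set up: Bernoulli rewards, arm means as an arithmetic sequence from 0.8 to 0.2, and the average, maximum and minimum cumulative regret over 50 independent runs. Several pieces are assumptions and not taken from the paper: the learner is plain UCB1 used as a stand-in (it is not GenCB and ignores the default arm µ0 and the conservative constraint), the horizon of 2000 rounds is hypothetical, regret is computed as pseudo-regret against the best mean, and all function names are illustrative.

```python
import numpy as np


def make_cmab_means(K):
    """Arm means as an arithmetic sequence from 0.8 down to 0.2."""
    return np.linspace(0.8, 0.2, K)


def run_ucb1(means, horizon, rng):
    """Plain UCB1 as a stand-in learner (NOT the paper's GenCB).

    Returns the cumulative pseudo-regret after each round.
    """
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    best = means.max()
    regret = np.empty(horizon)
    total = 0.0
    for t in range(horizon):
        if t < K:
            arm = t  # initialization: pull each arm once
        else:
            ucb = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(ucb))
        reward = float(rng.random() < means[arm])  # i.i.d. Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        total += best - means[arm]
        regret[t] = total
    return regret


def summarize_runs(K=24, horizon=2000, n_runs=50, seed=0):
    """Average, maximum and minimum cumulative regret across independent runs."""
    means = make_cmab_means(K)
    curves = np.stack([
        run_ucb1(means, horizon, np.random.default_rng(seed + r))
        for r in range(n_runs)
    ])
    return curves.mean(axis=0), curves.max(axis=0), curves.min(axis=0)


if __name__ == "__main__":
    avg, upper, lower = summarize_runs()
    print(f"final regret: avg={avg[-1]:.1f}, max={upper[-1]:.1f}, min={lower[-1]:.1f}")
```

The three returned arrays correspond to the middle, upper and bottom curves mentioned in the setup; swapping in a conservative learner would only require replacing `run_ucb1` with a function of the same signature.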