A One-Size-Fits-All Solution to Conservative Bandit Problems
Authors: Yihan Du, Siwei Wang, Longbo Huang. Pages 7254-7261
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments for the considered problems. The results match our theoretical bounds and demonstrate that our algorithms achieve the performance superiority compared to existing algorithms. |
| Researcher Affiliation | Academia | Yihan Du¹, Siwei Wang¹, Longbo Huang¹; ¹Tsinghua University |
| Pseudocode | Yes | Algorithm 1: General Solution to Conservative Bandits (GenCB). Algorithm 2: MV-CUCB. |
| Open Source Code | No | The paper does not provide a direct link to the source code or explicitly state that the code for the described methodology is open-source. The URL in the reference section points to the paper's arXiv preprint. |
| Open Datasets | No | In all experiments, we assume the rewards to take i.i.d. Bernoulli values. For CMAB, we set $K \in \{24, 72, 144\}$, $\alpha \in \{0.05, 0.1, 0.15\}$, $\mu_0 = 0.7$ and $\mu_1, \dots, \mu_K$ as an arithmetic sequence from 0.8 to 0.2. For CLB and CCCB, we set $d \in \{5, 7, 9\}$, $\alpha \in \{0.01, 0.02, 0.03\}$, $K = 2d$ and $f(A, w) = \sum_{e \in A} w_e$. For MV-CBP, we use the same parameter settings as CMAB and additionally set $\rho \in \{10, 30, 60\}$. This indicates that the data is simulated, not from a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes how synthetic data is generated for simulations, but does not mention specific training, validation, or testing splits of any fixed dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions various algorithms (e.g., UCB, LinUCB, C2UCB) that are integrated into GenCB, but it does not specify any software dependencies (e.g., programming languages, libraries, frameworks) with version numbers. |
| Experiment Setup | Yes | In all experiments, we assume the rewards to take i.i.d. Bernoulli values. For CMAB, we set $K \in \{24, 72, 144\}$, $\alpha \in \{0.05, 0.1, 0.15\}$, $\mu_0 = 0.7$ and $\mu_1, \dots, \mu_K$ as an arithmetic sequence from 0.8 to 0.2. For CLB and CCCB, we set $d \in \{5, 7, 9\}$, $\alpha \in \{0.01, 0.02, 0.03\}$, $K = 2d$ and $f(A, w) = \sum_{e \in A} w_e$. For MV-CBP, we use the same parameter settings as CMAB and additionally set $\rho \in \{10, 30, 60\}$. For each algorithm, we perform 50 independent runs and present the average (middle curve), maximum (upper curve) and minimum (bottom curve) cumulative regrets across runs. |
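
Since no source code is released, the CMAB setup quoted in the table can only be approximated. The following is a minimal sketch, not the authors' implementation: it builds the arithmetic sequence of means from 0.8 to 0.2, draws Bernoulli rewards, and reports the average, maximum and minimum cumulative regret over independent runs. Plain UCB1 is used as a stand-in policy instead of GenCB, the conservative constraint (and hence µ0 and α) is not modeled, and the horizon, the seeds and all function names are assumptions made for illustration.

```python
import math
import random


def arithmetic_means(k, start=0.8, stop=0.2):
    """Expected rewards mu_1..mu_K as an arithmetic sequence from start to stop."""
    if k == 1:
        return [start]
    step = (stop - start) / (k - 1)
    return [start + i * step for i in range(k)]


def ucb1_run(means, horizon, rng):
    """Cumulative pseudo-regret curve of plain UCB1 (stand-in, not GenCB)."""
    k = len(means)
    best = max(means)
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0
    curve = []
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play every arm once before using the index
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # i.i.d. Bernoulli reward with the chosen arm's mean
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
        curve.append(regret)
    return curve


def summarize(curves):
    """Per-round average, maximum and minimum cumulative regret across runs."""
    columns = list(zip(*curves))
    avg = [sum(col) / len(col) for col in columns]
    upper = [max(col) for col in columns]
    lower = [min(col) for col in columns]
    return avg, upper, lower


def experiment(k=24, horizon=2000, runs=50, seed=0):
    """Run `runs` independent simulations; the horizon is an assumed value."""
    means = arithmetic_means(k)
    curves = [
        ucb1_run(means, horizon, random.Random(seed + r)) for r in range(runs)
    ]
    return summarize(curves)


if __name__ == "__main__":
    avg, upper, lower = experiment()
    print(f"final regret: avg={avg[-1]:.1f} max={upper[-1]:.1f} min={lower[-1]:.1f}")
```

Calling `experiment(k=72)` or `experiment(k=144)` covers the other reported values of K. Reproducing the paper's curves would additionally require the conservative algorithms themselves, which this sketch does not attempt.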