Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback
Authors: Siwei Wang, Haoyun Wang, Longbo Huang10210-10217
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also conduct simulations based on real-world dataset. The results show that our algorithms outperform existing benchmarks. |
| Researcher Affiliation | Academia | Siwei Wang1, Haoyun Wang2, Longbo Huang1 1Tsinghua University 2Georgia Institute of Technology |
| Pseudocode | Yes | Algorithm 1 Adaptive Round-Size UCB (ARS-UCB); Algorithm 2 Adaptive Round-Size EXP3 (ARS-EXP3) |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-sourcing the code for their algorithms or methods. It references an arXiv preprint of the paper itself. |
| Open Datasets | Yes | Here we use two datasets in Kaggle: the Outbrain Click Prediction (Outbrain) dataset4, and the Coupon Purchase Prediction (Coupon) dataset5. 4https://www.kaggle.com/c/outbrain-click-prediction 5https://www.kaggle.com/c/coupon-purchase-prediction |
| Dataset Splits | No | The paper mentions using datasets for experiments but does not provide specific train/validation/test splits (e.g., percentages or sample counts) or references to predefined splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies or their version numbers used in the experiments. |
| Experiment Setup | Yes | Theorem 1. Suppose α > 4 and the function f satisfies (i) f is increasing, and (ii) k0 such that k > k0, F(k) f(k + 1). For example, f(k) = ckβ satisfies the two properties with integers c ≥ 1 and β ≥ 1. Another example is f(k) = 2k+c with c ≥ 0 (here we need to set f(1) = 22+c specifically). As for α, when the delay measures d1, d2 are large and T is small, a larger α can reduce the regret. On the other hand, when d1, d2 are small but T is large, a smaller α behaves better. Set γ = min{1, r(e − 1)((β+1)T )−1 β+1 } and g(k) = kβ. In particular, if β = 1/2, Algorithm 2 achieves a regret upper bound O((d + (N log N)1/2)T2/3). From these results, we can see that the combination of α = 4 and f(k) = k2 behaves better in most of these problem instances, and leads to both small regrets and small variances. |