Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

Authors: Siwei Wang, Haoyun Wang, Longbo Huang10210-10217

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We also conduct simulations based on real-world dataset. The results show that our algorithms outperform existing benchmarks.
Researcher Affiliation Academia Siwei Wang1, Haoyun Wang2, Longbo Huang1 1Tsinghua University 2Georgia Institute of Technology
Pseudocode Yes Algorithm 1 Adaptive Round-Size UCB (ARS-UCB); Algorithm 2 Adaptive Round-Size EXP3 (ARS-EXP3)
Open Source Code No The paper does not provide any explicit statement or link for open-sourcing the code for their algorithms or methods. It references an arXiv preprint of the paper itself.
Open Datasets Yes Here we use two datasets in Kaggle: the Outbrain Click Prediction (Outbrain) dataset4, and the Coupon Purchase Prediction (Coupon) dataset5. 4https://www.kaggle.com/c/outbrain-click-prediction 5https://www.kaggle.com/c/coupon-purchase-prediction
Dataset Splits No The paper mentions using datasets for experiments but does not provide specific train/validation/test splits (e.g., percentages or sample counts) or references to predefined splits.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, or memory) used for running the experiments.
Software Dependencies No The paper does not list any specific software dependencies or their version numbers used in the experiments.
Experiment Setup Yes Theorem 1. Suppose α > 4 and the function f satisfies (i) f is increasing, and (ii) k0 such that k > k0, F(k) f(k + 1). For example, f(k) = ckβ satisfies the two properties with integers c ≥ 1 and β ≥ 1. Another example is f(k) = 2k+c with c ≥ 0 (here we need to set f(1) = 22+c specifically). As for α, when the delay measures d1, d2 are large and T is small, a larger α can reduce the regret. On the other hand, when d1, d2 are small but T is large, a smaller α behaves better. Set γ = min{1, r(e − 1)((β+1)T )−1 β+1 } and g(k) = kβ. In particular, if β = 1/2, Algorithm 2 achieves a regret upper bound O((d + (N log N)1/2)T2/3). From these results, we can see that the combination of α = 4 and f(k) = k2 behaves better in most of these problem instances, and leads to both small regrets and small variances.