reproducibilityindex.ai

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

Authors: Siwei Wang, Haoyun Wang, Longbo Huang10210-10217

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We also conduct simulations based on real-world dataset. The results show that our algorithms outperform existing benchmarks.
Researcher Affiliation	Academia	Siwei Wang1, Haoyun Wang2, Longbo Huang1 1Tsinghua University 2Georgia Institute of Technology
Pseudocode	Yes	Algorithm 1 Adaptive Round-Size UCB (ARS-UCB); Algorithm 2 Adaptive Round-Size EXP3 (ARS-EXP3)
Open Source Code	No	The paper does not provide any explicit statement or link for open-sourcing the code for their algorithms or methods. It references an arXiv preprint of the paper itself.
Open Datasets	Yes	Here we use two datasets in Kaggle: the Outbrain Click Prediction (Outbrain) dataset4, and the Coupon Purchase Prediction (Coupon) dataset5. 4https://www.kaggle.com/c/outbrain-click-prediction 5https://www.kaggle.com/c/coupon-purchase-prediction
Dataset Splits	No	The paper mentions using datasets for experiments but does not provide specific train/validation/test splits (e.g., percentages or sample counts) or references to predefined splits.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, or memory) used for running the experiments.
Software Dependencies	No	The paper does not list any specific software dependencies or their version numbers used in the experiments.
Experiment Setup	Yes	Theorem 1. Suppose α > 4 and the function f satisﬁes (i) f is increasing, and (ii) k0 such that k > k0, F(k) f(k + 1). For example, f(k) = ckβ satisﬁes the two properties with integers c ≥ 1 and β ≥ 1. Another example is f(k) = 2k+c with c ≥ 0 (here we need to set f(1) = 22+c speciﬁcally). As for α, when the delay measures d1, d2 are large and T is small, a larger α can reduce the regret. On the other hand, when d1, d2 are small but T is large, a smaller α behaves better. Set γ = min{1, r(e − 1)((β+1)T )−1 β+1 } and g(k) = kβ. In particular, if β = 1/2, Algorithm 2 achieves a regret upper bound O((d + (N log N)1/2)T2/3). From these results, we can see that the combination of α = 4 and f(k) = k2 behaves better in most of these problem instances, and leads to both small regrets and small variances.