Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits

Authors: Huasen Wu, Xueying Guo, Xin Liu

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we propose and study opportunistic bandits, a new variant of bandits where the regret of pulling a suboptimal arm varies under different environmental conditions, such as network load or produce price. When the load/price is low, so is the cost/regret of pulling a suboptimal arm (e.g., trying a suboptimal network configuration). Therefore, intuitively, we could explore more when the load/price is low and exploit more when the load/price is high. Inspired by this intuition, we propose an Adaptive Upper-Confidence-Bound (AdaUCB) algorithm to adaptively balance the exploration-exploitation tradeoff for opportunistic bandits. We prove that AdaUCB achieves O(log T) regret with a smaller coefficient than the traditional UCB algorithm. Furthermore, AdaUCB achieves O(1) regret with respect to T if the exploration cost is zero when the load level is below a certain threshold. Last, based on both synthetic data and real-world traces, experimental results show that AdaUCB significantly outperforms other bandit algorithms, such as UCB and TS (Thompson Sampling), under large load/price fluctuations.
Researcher Affiliation | Collaboration | ¹Twitter Inc., San Francisco, California, USA; ²University of California, Davis, California, USA.
Pseudocode | Yes | Algorithm 1: AdaUCB (a runnable sketch of the algorithm appears after this table).
Open Source Code | No | The paper does not provide any specific link or statement about releasing its own source code for the methodology described.
Open Datasets | Yes | We use experiment data from Speedometer (https://storage.cloud.google.com/speedometer) and another anonymous operator to conduct the evaluation.
Dataset Splits | No | The paper mentions using synthetic and real-world traces for evaluation but does not specify any training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper does not list specific software dependencies with version numbers required for reproducibility.
Experiment Setup | Yes | In both AdaUCB and UCB(α), we set α = 0.51, which is close to 1/2 and performs better than a larger α. We consider a 5-armed bandit with Bernoulli rewards, where the expected reward vector is [0.05, 0.1, 0.15, 0.2, 0.25]. ... we choose ℓ(−) and ℓ(+) such that P{L_t ≤ ℓ(−)} = P{L_t ≥ ℓ(+)} = 0.05. (A simulation sketch of this setup appears after the table.)
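
To make the adaptive index concrete: the abstract and Algorithm 1 describe AdaUCB as balancing exploration against the current load, exploring more when the load is low. Below is a minimal Python sketch under our reading that the standard UCB exploration bonus is scaled by a normalized, inverted load factor that is 1 at or below ℓ(−) and 0 at or above ℓ(+); the exact index and tie-breaking should be checked against Algorithm 1 in the paper. The function names `normalized_inverted_load` and `ada_ucb_arm` are ours, for illustration only.

```python
import numpy as np

def normalized_inverted_load(load, l_minus, l_plus):
    """Map the raw load to [0, 1]: 1 at/below l_minus (explore freely),
    0 at/above l_plus (pure exploitation), linear in between.
    This normalization is our reading of the paper, not a verbatim copy."""
    return float(np.clip((l_plus - load) / (l_plus - l_minus), 0.0, 1.0))

def ada_ucb_arm(counts, means, t, load, l_minus, l_plus, alpha=0.51):
    """Select an arm via a load-scaled UCB index (hedged AdaUCB-style sketch).

    counts: number of pulls per arm so far
    means:  empirical mean reward per arm
    alpha:  exploration coefficient (the paper's experiments use 0.51)
    """
    counts = np.asarray(counts, dtype=float)
    means = np.asarray(means, dtype=float)
    # Pull every arm once before trusting the index.
    if (counts == 0).any():
        return int(np.argmax(counts == 0))
    lt = normalized_inverted_load(load, l_minus, l_plus)
    bonus = np.sqrt(alpha * lt * np.log(t) / counts)
    return int(np.argmax(means + bonus))
```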
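
The experiment-setup row can likewise be exercised end to end. The sketch below reuses `ada_ucb_arm` from the previous block and adds two assumptions of ours: loads are i.i.d. Uniform(0, 1) (the paper also evaluates other load processes and real traces), and regret is weighted by the load, in keeping with the opportunistic-bandit setting. Under the uniform load model, the 5% and 95% quantiles give P{L_t ≤ ℓ(−)} = P{L_t ≥ ℓ(+)} = 0.05, matching the thresholds quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.05, 0.1, 0.15, 0.2, 0.25])  # Bernoulli arms, from the paper
T, K = 100_000, len(true_means)
# Hypothetical load model: i.i.d. Uniform(0, 1). The 5%/95% quantiles give
# P{L_t <= l_minus} = P{L_t >= l_plus} = 0.05, as in the quoted setup.
l_minus, l_plus = 0.05, 0.95

counts = np.zeros(K)
sums = np.zeros(K)
regret = 0.0
for t in range(1, T + 1):
    load = rng.uniform()
    means = np.divide(sums, counts, out=np.zeros(K), where=counts > 0)
    arm = ada_ucb_arm(counts, means, t, load, l_minus, l_plus, alpha=0.51)
    reward = rng.binomial(1, true_means[arm])
    counts[arm] += 1
    sums[arm] += reward
    # Load-weighted regret, reflecting the opportunistic-bandit cost model.
    regret += load * (true_means.max() - true_means[arm])
print(f"cumulative load-weighted regret after {T} rounds: {regret:.1f}")
```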