Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits
Authors: Huasen Wu, Xueying Guo, Xin Liu
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose and study opportunistic bandits, a new variant of bandits where the regret of pulling a suboptimal arm varies under different environmental conditions, such as network load or produce price. When the load/price is low, so is the cost/regret of pulling a suboptimal arm (e.g., trying a suboptimal network configuration). Therefore, intuitively, we could explore more when the load/price is low and exploit more when the load/price is high. Inspired by this intuition, we propose an Adaptive Upper-Confidence-Bound (AdaUCB) algorithm to adaptively balance the exploration-exploitation tradeoff for opportunistic bandits. We prove that AdaUCB achieves O(log T) regret with a smaller coefficient than the traditional UCB algorithm. Furthermore, AdaUCB achieves O(1) regret with respect to T if the exploration cost is zero when the load level is below a certain threshold. Last, based on both synthetic data and real-world traces, experimental results show that AdaUCB significantly outperforms other bandit algorithms, such as UCB and TS (Thompson Sampling), under large load/price fluctuations. |
| Researcher Affiliation | Collaboration | Twitter Inc., San Francisco, California, USA; University of California, Davis, California, USA. |
| Pseudocode | Yes | Algorithm 1 AdaUCB |
| Open Source Code | No | The paper does not provide any specific link or statement about releasing its own source code for the methodology described. |
| Open Datasets | Yes | We use experiment data from Speedometer (Speedometer, https://storage.cloud.google.com/speedometer) and another anonymous operator to conduct the evaluation. |
| Dataset Splits | No | The paper mentions using synthetic and real-world traces for evaluation but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers required for reproducibility. |
| Experiment Setup | Yes | In both AdaUCB and UCB(α), we set α = 0.51, which is close to 1/2 and performs better than a larger α. We consider a 5-armed bandit with Bernoulli rewards, where the expected reward vector is [0.05, 0.1, 0.15, 0.2, 0.25]. ... we choose ℓ(−) and ℓ(+) such that P{L_t ≤ ℓ(−)} = P{L_t ≥ ℓ(+)} = 0.05. |
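The opportunistic idea described in the abstract (explore more when the load is low, exploit more when it is high) can be illustrated with a small simulation on the paper's reported setup: a 5-armed Bernoulli bandit with means [0.05, 0.1, 0.15, 0.2, 0.25] and α = 0.51. The sketch below is an assumption-laden simplification, not the paper's exact AdaUCB rule: the load process `L_t` is hypothetical (uniform on [0, 1]), and the adaptation simply scales the UCB exploration bonus by `1 − L_t`, whereas the paper defines its index via thresholds ℓ(−) and ℓ(+) on the load.

```python
import numpy as np

def opportunistic_ucb(T=10000, alpha=0.51,
                      mus=(0.05, 0.1, 0.15, 0.2, 0.25), seed=0):
    """Simplified opportunistic-UCB sketch (NOT the paper's AdaUCB):
    the exploration bonus is scaled by (1 - load), so the algorithm
    explores when the hypothetical load L_t is low and exploits
    (acts greedily) when it is high. Returns per-arm pull counts."""
    rng = np.random.default_rng(seed)
    K = len(mus)
    counts = np.zeros(K, dtype=int)   # number of pulls per arm
    means = np.zeros(K)               # empirical mean reward per arm
    for t in range(T):
        load = rng.random()           # hypothetical load L_t in [0, 1]
        if t < K:
            a = t                     # initialization: pull each arm once
        else:
            # standard UCB(alpha) bonus, damped when the load is high
            bonus = np.sqrt(alpha * np.log(t + 1) / counts)
            a = int(np.argmax(means + (1.0 - load) * bonus))
        r = float(rng.random() < mus[a])          # Bernoulli reward
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]    # incremental mean update
    return counts
```

Under this sketch, the suboptimal arms (gaps 0.05–0.2 from the best arm) are abandoned once their damped bonus falls below the gap, so the best arm (mean 0.25) accumulates the large majority of the pulls, mirroring the O(log T) suboptimal-pull behavior the paper proves for UCB-style indices.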