Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits
Authors: Huasen Wu, Xueying Guo, Xin Liu
ICML 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose and study opportunistic bandits a new variant of bandits where the regret of pulling a suboptimal arm varies under different environmental conditions, such as network load or produce price. When the load/price is low, so is the cost/regret of pulling a suboptimal arm (e.g., trying a suboptimal network configuration). Therefore, intuitively, we could explore more when the load/price is low and exploit more when the load/price is high. Inspired by this intuition, we propose an Adaptive Upper-Confidence-Bound (Ada UCB) algorithm to adaptively balance the exploration-exploitation tradeoff for opportunistic bandits. We prove that Ada UCB achieves O(log T) regret with a smaller coefficient than the traditional UCB algorithm. Furthermore, Ada UCB achieves O(1) regret with respect to T if the exploration cost is zero when the load level is below a certain threshold. Last, based on both synthetic data and real-world traces, experimental results show that Ada UCB significantly outperforms other bandit algorithms, such as UCB and TS (Thompson Sampling), under large load/price fluctuations. |
| Researcher Affiliation | Collaboration | 1Twitter Inc., San Francisco, California, USA; 2University of California, Davis, California, USA. |
| Pseudocode | Yes | Algorithm 1 Ada UCB |
| Open Source Code | No | The paper does not provide any specific link or statement about releasing its own source code for the methodology described. |
| Open Datasets | Yes | We use experiment data from Speedometer (Speedometer, https://storage.cloud.google.com/speedometer) and another anonymous operator to conduct the evaluation. |
| Dataset Splits | No | The paper mentions using synthetic and real-world traces for evaluation but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers required for reproducibility. |
| Experiment Setup | Yes | In both Ada UCB and UCB(α), we set α as α = 0.51, which is close to 1/2 and performs better than a larger α. We consider a 5-armed bandit with Bernoulli rewards, where the expected reward vector is [0.05, 0.1, 0.15, 0.2, 0.25]. ... we choose l( ) and l(+) such that P{Lt l( )} = P{Lt l(+)} = 0.05. |