Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
Authors: Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experiments We conducted a variety of synthetic experiments to support our theoretical findings. We provide additional experiments in the full version of the paper (Lancewicki et al., 2021). Fixed delays. In Fig. 1 we show the effect of different fixed delays on UCB and SE. We ran both algorithms with a confidence radius λt(i) = p2/nt(i), for K = 20 arms, each with Bernoulli rewards with mean uniform in [0.25, 0.75], under various fixed delays. Top plots show cumulative regret until T = 2 104. Bottom plot shows regret over increasing delays for T = 2 105. The results are averaged over 100 runs and intervals in both plots are 4 times the standard error. |
| Researcher Affiliation | Collaboration | 1Blavatnik School of Computer Science, Tel Aviv University, Israel 2Google Research, Tel Aviv. |
| Pseudocode | Yes | Protocol 1 MAB with stochastic delays... Procedure 2 Update-Parameters... Algorithm 3 UCB with Delays... Algorithm 4 Successive Elimination with Delays... Algorithm 5 Phased Successive Elimination (PSE)... Algorithm 6 Optimistic-Pessimistic Successive Elimination |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodologies, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper describes synthetic experiments where data is generated (e.g., 'Bernoulli rewards with mean uniform in [0.25, 0.75]', 'Pareto distribution'). It does not use or reference any specific publicly available or open datasets with concrete access information (link, DOI, citation). |
| Dataset Splits | No | The paper conducts synthetic experiments where data is generated based on specified distributions and parameters. It does not mention or specify any explicit training, validation, or test dataset splits in the traditional sense, as the evaluation is based on simulated runs over a total number of rounds (T). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not provide specific software dependencies or their version numbers (e.g., programming languages, libraries, frameworks, or solvers) that were used to conduct the experiments. |
| Experiment Setup | Yes | Fixed delays. In Fig. 1 we show the effect of different fixed delays on UCB and SE. We ran both algorithms with a confidence radius λt(i) = p2/nt(i), for K = 20 arms, each with Bernoulli rewards with mean uniform in [0.25, 0.75], under various fixed delays... α-Pareto delays. We reproduce an experiment done by Gael et al. (2020)... For T = 3000 rounds and K = 2 arms, we ran sub-optimality gaps [0.04, 0.6]. The expected rewards are µ1 = 0.4 and µ2 = 0.4 + . The delay is sampled from Pareto distribution with α1 = 1 for arm 1 and α2 = 0.2 for arm 2... Packet-loss... We ran the algorithms for T = 2 104 rounds and K = 10 arms with randomized values of sub-optimality gaps between [0.15, 0.25]. The probability to observe the best arm is 0.1, and 1 for the sub-optimal arms... Reward-dependent case... We set T = 6 104 and K = 3 arms with random sub-optimality gaps of [0.15, 0.25]. The delay is biased with fixed delay of 5,000 rounds for reward 1 of the best arm and reward 0 of the sub-optimal arms. |