Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards
Authors: Aadirupa Saha, Pierre Gaillard, Michal Valko
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we present the empirical evaluation of our proposed algorithms (Sec. 3 and 4) comparing their performances with the two existing sleeping bandit algorithms that apply to our problem setting, i.e. for adversarial losses and stochastic availabilities. |
| Researcher Affiliation | Collaboration | ¹Indian Institute of Science, Bangalore, India. ²Sierra Team, Inria, Paris, France. ³DeepMind, Paris, France. |
| Pseudocode | Yes | Algorithm 1 Sleeping-EXP3 |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link or explicit statement of code release) for its source code. |
| Open Datasets | No | The paper describes generating data for its experiments (e.g., 'We consider K = 20 and generate the probabilities of item availabilities {a_i}_{i ∈ [K]} independently and uniformly at random from the interval [0.3, 0.9].'), but it neither uses a publicly available dataset nor provides access information for the generated data. |
| Dataset Splits | No | The paper mentions 'T = 5000 time steps' for the experimental runs but does not specify any explicit training, validation, or test dataset splits or splitting methodology. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory, cloud resources) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | The algorithm parameters η, λ_t, δ are set as defined in Thm. 7. In all cases, we report the cumulative regret of the algorithms for T = 5000 time steps, each averaged over 50 runs. We consider K = 20. |
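
Since neither code nor data is released, the synthetic availability process quoted in the table could be regenerated along the following lines. This is only a minimal sketch, not the authors' implementation: the function names, the seed, and the choice to draw each item's availability independently at every round are assumptions made for illustration, and the loss sequences and the algorithms themselves are not covered.

```python
import random

K = 20        # number of items (from the reported setup)
T = 5000      # time steps per run (from the reported setup)
N_RUNS = 50   # number of runs averaged over (from the reported setup)


def sample_availability_probs(rng, k=K, low=0.3, high=0.9):
    """Draw a_i independently and uniformly at random from [low, high]."""
    return [rng.uniform(low, high) for _ in range(k)]


def sample_action_set(rng, probs):
    """Return the indices of the items available in one round.

    Assumption: item i is available with probability a_i,
    independently of the other items and of previous rounds.
    """
    return [i for i, a_i in enumerate(probs) if rng.random() < a_i]


def simulate_availabilities(seed, horizon=T):
    """Generate one run: the probabilities a_i and the per-round action sets."""
    rng = random.Random(seed)  # hypothetical seed, used only for repeatability
    probs = sample_availability_probs(rng)
    action_sets = [sample_action_set(rng, probs) for _ in range(horizon)]
    return probs, action_sets


probs, action_sets = simulate_availabilities(seed=0)
```

Calling `simulate_availabilities` with `N_RUNS` different seeds would give one availability sequence per run; how the paper seeds or shares randomness across runs is not stated, so that part is left open.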