Perturbed-History Exploration in Stochastic Multi-Armed Bandits
Authors: Branislav Kveton, Csaba Szepesvári, Mohammad Ghavamzadeh, Craig Boutilier
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically evaluate PHE and show that it is competitive with state-of-the-art baselines. |
| Researcher Affiliation | Collaboration | Branislav Kveton¹, Csaba Szepesvári²,³, Mohammad Ghavamzadeh⁴ and Craig Boutilier¹; ¹Google Research, ²DeepMind, ³University of Alberta, ⁴Facebook AI Research |
| Pseudocode | Yes | Algorithm 1 Perturbed-history exploration in a multi-armed bandit with [0, 1] rewards |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | No | The paper describes generating '100 randomly chosen problems in each class' (Bernoulli and Beta bandit problems) and references 'Kveton et al. [2019b]' for the problem classes, but does not provide access information for a specific, pre-existing public dataset. |
| Dataset Splits | No | The paper describes an online learning problem (multi-armed bandits) and measures regret over 'n' rounds, which means there is no traditional train/validation/test dataset split. |
| Hardware Specification | No | The paper provides run times in its experimental section but does not specify any hardware details such as GPU/CPU models or other system specifications. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for any libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | We experiment with three settings of perturbation scales a in PHE: 2.1, 1.1, and 0.5. ... We experiment with 100 randomly chosen problems in each class. Each problem has K = 10 arms and the mean rewards of these arms are chosen uniformly at random from interval [0.25, 0.75]. The horizon is n = 10000 rounds. |
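
The experiment setup in the last row can be sketched in code. The following is a minimal, unofficial sketch of one Bernoulli run and is not the authors' implementation, since the paper releases no code. It assumes that PHE scores each arm by adding ⌈a·s⌉ i.i.d. Bernoulli(1/2) pseudo-rewards to the arm's s observed rewards and averaging over all of them; the table above does not spell this rule out. The names `sample_means` and `run_phe`, and the seeds, are hypothetical.

```python
import math
import random


def sample_means(K=10, low=0.25, high=0.75, seed=0):
    """Draw K mean rewards uniformly at random from [low, high]."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(K)]


def run_phe(means, a, n, seed=0):
    """Run a PHE-style policy on a Bernoulli bandit for n rounds.

    Returns (cumulative expected regret, pull counts per arm).
    ASSUMPTION: the perturbation is ceil(a * s) Bernoulli(1/2) pseudo-rewards.
    """
    rng = random.Random(seed)
    K = len(means)
    pulls = [0] * K          # number of times each arm was pulled
    reward_sum = [0.0] * K   # cumulative observed reward of each arm
    best = max(means)
    regret = 0.0
    for _ in range(n):
        values = []
        for i in range(K):
            s = pulls[i]
            if s == 0:
                # Unpulled arms get an infinite score, so each arm is tried once.
                values.append(math.inf)
                continue
            m = math.ceil(a * s)  # number of pseudo-rewards for this arm
            # The sum of m fair coin flips equals the popcount of m random bits.
            pseudo = bin(rng.getrandbits(m)).count("1")
            # Mean of observed rewards and pseudo-rewards combined.
            values.append((reward_sum[i] + pseudo) / (s + m))
        arm = max(range(K), key=values.__getitem__)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        reward_sum[arm] += reward
        regret += best - means[arm]
    return regret, pulls
```

Under these assumptions, `run_phe(sample_means(10), a=2.1, n=10000)` mirrors one run on a single Bernoulli problem from the setup; repeating it over 100 sampled problems and over a in {2.1, 1.1, 0.5} would approximate the Bernoulli part of the described experiment. For integer a, the denominator `s + m` equals (a + 1)·s.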