Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards
Authors: Kyungjae Lee, Hongjun Yang, Sungbin Lim, Songhwai Oh
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In simulation, the proposed estimator shows favorable performance compared to existing robust estimators for various p values and, for MAB problems, the proposed perturbation strategy outperforms existing exploration methods. |
| Researcher Affiliation | Academia | Kyungjae Lee, Department of Electrical and Computer Engineering, Seoul National University, kyungjae.lee@rllab.snu.ac.kr; Hongjun Yang, Artificial Intelligence Graduate School, UNIST, hj42@unist.ac.kr; Sungbin Lim, Artificial Intelligence Graduate School, UNIST, sungbin@unist.ac.kr; Songhwai Oh, Department of Electrical and Computer Engineering, Seoul National University, songhwai@snu.ac.kr |
| Pseudocode | Yes | Algorithm 1: Adaptively Perturbed Exploration with a p-robust estimator (APE²) |
| Open Source Code | No | The paper does not provide an explicit statement or a link for open-source code availability. |
| Open Datasets | No | The paper describes synthesizing data for its experiments: 'To make a heavy-tailed noise, we employ a Pareto random variable zt with parameters αϵ and λϵ. Then, a noise is defined as ϵt := zt − E[zt] to make the mean of the noise zero.' It does not refer to a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper mentions synthesizing data for experiments and uses terms like 'averaged error over 60 runs' or 'averaged error over 40 runs' but does not specify training, validation, or test dataset splits in terms of percentages or counts, nor does it refer to predefined splits from known datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For APE², we employ the optimal hyperparameter of perturbations shown in Table 1. Note that GEV with ζ = 0 is a Gumbel distribution and Gamma with α = 1 (or Weibull with k = 1) is an Exponential distribution, and λ of Gumbel and Exponential is set to be one. Thus, we compare four perturbations: Gumbel, Exponential, Pareto, and Fréchet. For APE² and DSEE, the best hyperparameter is found by using a grid search. For robust UCB, since the original robust UCB consistently shows poor performance, we modify the confidence bound by multiplying a scale parameter c and optimize c using a grid search. All hyperparameters can be found in the supplementary material. |
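The paper's synthetic-noise construction (a Pareto random variable recentered to zero mean) can be sketched as follows. This is a minimal illustration, not the paper's code: the classical Pareto parameterization, the default α and λ values, and the sample size are assumptions, since the paper's exact αϵ and λϵ are in its supplementary material.

```python
import numpy as np

rng = np.random.default_rng(0)

def heavy_tailed_noise(alpha=2.1, lam=1.0, size=10_000):
    """Zero-mean heavy-tailed noise from a Pareto variable.

    z follows a classical Pareto(alpha, lam) law (support [lam, inf),
    mean alpha*lam/(alpha-1) for alpha > 1); the noise is
    eps := z - E[z], so E[eps] = 0 while the heavy right tail remains.
    Parameter values here are illustrative assumptions.
    """
    # np.random pareto() samples the Lomax form; shifting by 1 and
    # scaling by lam recovers the classical Pareto distribution.
    z = lam * (rng.pareto(alpha, size) + 1.0)
    mean_z = alpha * lam / (alpha - 1.0)  # analytic mean, needs alpha > 1
    return z - mean_z

eps = heavy_tailed_noise()
```

Centering with the analytic mean (rather than the empirical one) keeps the noise exactly zero-mean in expectation, which matches the paper's stated goal.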
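The setup compares four perturbation families: Gumbel, Exponential, Pareto, and Fréchet. A generic perturbed-index selection in that spirit can be sketched as below. This is not the paper's exact APE² update rule: the samplers' shape/scale parameters and the beta / sqrt(counts) scaling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Samplers for the four perturbation families compared in the paper.
# Shape/scale values are illustrative, not the tuned hyperparameters.
PERTURBATIONS = {
    "gumbel":      lambda size: rng.gumbel(0.0, 1.0, size),
    "exponential": lambda size: rng.exponential(1.0, size),
    "pareto":      lambda size: rng.pareto(2.0, size),
    # If W ~ Weibull(k), then 1/W ~ Frechet(k).
    "frechet":     lambda size: rng.weibull(2.0, size) ** -1.0,
}

def perturbed_arm(estimates, counts, family="gumbel", beta=1.0):
    """Pick the arm maximizing estimate + scaled random perturbation.

    A follow-the-perturbed-leader-style selection; the beta/sqrt(counts)
    scaling is an assumed choice, not the paper's rule.
    """
    noise = PERTURBATIONS[family](len(estimates))
    return int(np.argmax(estimates + beta * noise / np.sqrt(counts)))

arm = perturbed_arm(np.array([0.1, 0.5, 0.3]), np.array([10, 10, 10]))
```

With beta = 0 the perturbation vanishes and the rule reduces to greedy selection of the best empirical estimate; larger beta injects more exploration, with the tail of the chosen family controlling how aggressive it is.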