Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards
Authors: Kyungjae Lee, Hongjun Yang, Sungbin Lim, Songhwai Oh
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In simulation, the proposed estimator shows favorable performance compared to existing robust estimators for various p values and, for MAB problems, the proposed perturbation strategy outperforms existing exploration methods. |
| Researcher Affiliation | Academia | Kyungjae Lee, Department of Electrical and Computer Engineering, Seoul National University; Hongjun Yang, Artificial Intelligence Graduate School, UNIST; Sungbin Lim, Artificial Intelligence Graduate School, UNIST; Songhwai Oh, Department of Electrical and Computer Engineering, Seoul National University |
| Pseudocode | Yes | Algorithm 1 Adaptively Perturbed Exploration with a p-robust estimator (APE2) |
| Open Source Code | No | The paper does not provide an explicit statement or a link for open-source code availability. |
| Open Datasets | No | The paper describes synthesizing data for its experiments: 'To make a heavy-tailed noise, we employ a Pareto random variable zt with parameters αϵ and λϵ. Then, a noise is defined as ϵt := zt − E[zt] to make the mean of the noise zero.' It does not refer to a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper mentions synthesizing data for experiments and uses terms like 'averaged error over 60 runs' or 'averaged error over 40 runs' but does not specify training, validation, or test dataset splits in terms of percentages or counts, nor does it refer to predefined splits from known datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For APE2, we employ the optimal hyperparameters of perturbations shown in Table 1. Note that GEV with ζ = 0 is a Gumbel distribution, Gamma with α = 1 (or Weibull with k = 1) is an Exponential distribution, and λ of the Gumbel and Exponential distributions is set to one. Thus, we compare four perturbations: Gumbel, Exponential, Pareto, and Fréchet. For APE2 and DSEE, the best hyperparameter is found by grid search. For robust UCB, since the original robust UCB consistently shows poor performance, we modify the confidence bound by multiplying it by a scale parameter c and optimize c by grid search. All hyperparameters can be found in the supplementary material. |
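The heavy-tailed noise construction quoted in the Open Datasets row can be sketched as follows. This is a minimal illustration, not the paper's code; the Pareto parameters `alpha` and `lam` are placeholders, not the αϵ and λϵ values used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def pareto_noise(n, alpha=3.0, lam=1.0):
    """Zero-mean heavy-tailed noise: eps_t = z_t - E[z_t], where
    z_t ~ Pareto(shape=alpha, scale=lam) and E[z_t] = alpha*lam/(alpha-1)."""
    # numpy's pareto() samples the Lomax form; adding 1 and scaling by lam
    # yields the classical Pareto distribution supported on [lam, inf).
    z = lam * (1.0 + rng.pareto(alpha, size=n))
    return z - alpha * lam / (alpha - 1.0)  # subtract the mean (finite for alpha > 1)

eps = pareto_noise(100_000)
```

Centering makes the noise zero-mean while preserving the heavy right tail, which is exactly what makes the naive empirical-mean reward estimate unreliable and motivates the paper's robust estimators.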
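The Experiment Setup row compares four perturbation families for APE2: Gumbel, Exponential, Pareto, and Fréchet. A minimal sampler for these families, under the assumption of standard parameterizations, might look like this; the `shape` value is a placeholder, not a tuned hyperparameter from the paper's Table 1.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_perturbation(kind, size, shape=2.0):
    """Draw i.i.d. perturbations from one of the four families; 'shape'
    only affects the Pareto and Frechet tails."""
    if kind == "gumbel":        # GEV with zeta = 0, lambda = 1
        return rng.gumbel(loc=0.0, scale=1.0, size=size)
    if kind == "exponential":   # Gamma with alpha = 1 (Weibull with k = 1)
        return rng.exponential(scale=1.0, size=size)
    if kind == "pareto":
        return 1.0 + rng.pareto(shape, size=size)  # classical Pareto on [1, inf)
    if kind == "frechet":
        # inverse-transform sampling: X = (-ln U)^(-1/shape) is standard Frechet
        u = rng.uniform(size=size)
        return (-np.log(u)) ** (-1.0 / shape)
    raise ValueError(f"unknown perturbation: {kind}")
```

In a perturbation-based exploration scheme, each arm's estimated mean would be perturbed by one such draw per round before taking the argmax, so the tail of the chosen family controls how aggressively the algorithm explores.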