Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards
Authors: Kyungjae Lee, Hongjun Yang, Sungbin Lim, Songhwai Oh
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In simulation, the proposed estimator shows favorable performance compared to existing robust estimators for various p values and, for MAB problems, the proposed perturbation strategy outperforms existing exploration methods. |
| Researcher Affiliation | Academia | Kyungjae Lee, Department of Electrical and Computer Engineering, Seoul National University, kyungjae.lee@rllab.snu.ac.kr; Hongjun Yang, Artificial Intelligence Graduate School, UNIST, hj42@unist.ac.kr; Sungbin Lim, Artificial Intelligence Graduate School, UNIST, sungbin@unist.ac.kr; Songhwai Oh, Department of Electrical and Computer Engineering, Seoul National University, songhwai@snu.ac.kr |
| Pseudocode | Yes | Algorithm 1: Adaptively Perturbed Exploration with a p-robust estimator (APE²) |
| Open Source Code | No | The paper does not provide an explicit statement or a link for open-source code availability. |
| Open Datasets | No | The paper describes synthesizing data for its experiments: 'To make a heavy-tailed noise, we employ a Pareto random variable zt with parameters αϵ and λϵ. Then, a noise is defined as ϵt := zt − E[zt] to make the mean of the noise zero.' It does not refer to a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper mentions synthesizing data for experiments and uses terms like 'averaged error over 60 runs' or 'averaged error over 40 runs' but does not specify training, validation, or test dataset splits in terms of percentages or counts, nor does it refer to predefined splits from known datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For APE², we employ the optimal hyperparameter of perturbations shown in Table 1. Note that GEV with ζ = 0 is a Gumbel distribution and Gamma with α = 1 (or Weibull with k = 1) is an Exponential distribution, and λ of Gumbel and Exponential is set to be one. Thus, we compare four perturbations: Gumbel, Exponential, Pareto, and Fréchet. For APE² and DSEE, the best hyperparameter is found by using a grid search. For robust UCB, since the original robust UCB consistently shows poor performance, we modify the confidence bound by multiplying a scale parameter c and optimize c using a grid search. All hyperparameters can be found in the supplementary material. |
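The paper's synthetic-noise construction (a Pareto random variable recentered to zero mean) can be sketched as follows. This is a minimal illustration, not the paper's code: the classical Pareto parameterization, the default α and λ values, and the sample size are assumptions, since the paper's exact αϵ and λϵ are in its supplementary material.

```python
import numpy as np

rng = np.random.default_rng(0)

def heavy_tailed_noise(alpha=2.1, lam=1.0, size=10_000):
    """Zero-mean heavy-tailed noise from a Pareto variable.

    z follows a classical Pareto(alpha, lam) law (support [lam, inf),
    mean alpha*lam/(alpha-1) for alpha > 1); the noise is
    eps := z - E[z], so E[eps] = 0 while the heavy right tail remains.
    Parameter values here are illustrative assumptions.
    """
    # np.random pareto() samples the Lomax form; shifting by 1 and
    # scaling by lam recovers the classical Pareto distribution.
    z = lam * (rng.pareto(alpha, size) + 1.0)
    mean_z = alpha * lam / (alpha - 1.0)  # analytic mean, needs alpha > 1
    return z - mean_z

eps = heavy_tailed_noise()
```

Centering with the analytic mean (rather than the empirical one) keeps the noise exactly zero-mean in expectation, which matches the paper's stated goal.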
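The setup compares four perturbation families: Gumbel, Exponential, Pareto, and Fréchet. A generic perturbed-index selection in that spirit can be sketched as below. This is not the paper's exact APE² update rule: the samplers' shape/scale parameters and the beta / sqrt(counts) scaling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Samplers for the four perturbation families compared in the paper.
# Shape/scale values are illustrative, not the tuned hyperparameters.
PERTURBATIONS = {
    "gumbel":      lambda size: rng.gumbel(0.0, 1.0, size),
    "exponential": lambda size: rng.exponential(1.0, size),
    "pareto":      lambda size: rng.pareto(2.0, size),
    # If W ~ Weibull(k), then 1/W ~ Frechet(k).
    "frechet":     lambda size: rng.weibull(2.0, size) ** -1.0,
}

def perturbed_arm(estimates, counts, family="gumbel", beta=1.0):
    """Pick the arm maximizing estimate + scaled random perturbation.

    A follow-the-perturbed-leader-style selection; the beta/sqrt(counts)
    scaling is an assumed choice, not the paper's rule.
    """
    noise = PERTURBATIONS[family](len(estimates))
    return int(np.argmax(estimates + beta * noise / np.sqrt(counts)))

arm = perturbed_arm(np.array([0.1, 0.5, 0.3]), np.array([10, 10, 10]))
```

With beta = 0 the perturbation vanishes and the rule reduces to greedy selection of the best empirical estimate; larger beta injects more exploration, with the tail of the chosen family controlling how aggressive it is.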