Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs
Authors: Bo Xue, Guanghui Wang, Yimu Wang, Lijun Zhang
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct numerical experiments to demonstrate the effectiveness of our algorithms and the empirical results strongly support our theoretical guarantees. |
| Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China |
| Pseudocode | Yes | Algorithm 1 Basic algorithm through Median of Means (BMM); Algorithm 2 Basic algorithm through Truncation (BTC); Algorithm 3 Master Algorithm (SupBMM and SupBTC). (A minimal median-of-means sketch follows the table.) |
| Open Source Code | No | The paper does not provide any statement about making its source code available or include a link to a code repository. |
| Open Datasets | No | The paper describes generating synthetic data for experiments ('Each element of the vector xt,a is sampled from the uniform distribution of [0, 1]') and adding different types of noise, but it does not use a publicly available or open dataset with access information. |
| Dataset Splits | No | The paper describes its experimental setup including noise types and parameter settings, but it does not specify any training/validation/test dataset splits or cross-validation methodology. |
| Hardware Specification | No | The paper does not specify any hardware used for running the experiments (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers for reproducing the experiments. |
| Experiment Setup | Yes | All algorithms' parameters are set to ε = 1 and δ = 0.01. We adopt MoM and CRT of Medina and Yang [2016] and MENU and TOFU of Shao et al. [2018] as baselines for comparison. Let the feature dimension d = 10, the number of arms K = 20, and θ∗ = (1/√d)·𝟏 ∈ ℝ^d, where 𝟏 is the all-ones vector, so that ‖θ∗‖ = 1. Each element of the vector x_{t,a} is sampled from the uniform distribution on [0, 1], and the vector is then normalized to a unit vector (‖x_{t,a}‖ = 1). According to the linear bandit model, the observed payoff is r_{t,a} = x_{t,a}^⊤ θ∗ + η_t, where η_t is generated from one of the following two noises: (i) Student's t-noise... (ii) Pareto noise... We run 10 independent repetitions for each algorithm and display the average cumulative regret as it evolves over time. (A minimal sketch of this setup follows the table.) |
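
The quoted setup cell is concrete enough to sketch the synthetic environment in Python. Everything below follows the quoted text (d = 10, K = 20, θ∗ = 𝟏/√d, uniform-then-normalized features, payoff x_{t,a}^⊤θ∗ + η_t); the horizon T, the Student's t degrees of freedom, the Pareto tail index, and the placeholder arm-selection rule are assumptions, since the quote elides them and the paper's algorithms choose arms via robust upper confidence bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

d, K, T = 10, 20, 10_000               # d and K from the paper; horizon T assumed
theta_star = np.ones(d) / np.sqrt(d)   # all-ones direction, so ||theta*|| = 1

def sample_arms():
    """Draw K feature vectors with entries ~ Uniform[0, 1], then normalize each."""
    X = rng.uniform(0.0, 1.0, size=(K, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def student_t_noise(nu=3.0):
    """Heavy-tailed Student's t noise; the degrees of freedom nu is an assumption."""
    return rng.standard_t(nu)

def pareto_noise(alpha=2.5):
    """Centered Pareto noise; the tail index alpha is an assumption."""
    return (rng.pareto(alpha) + 1.0) - alpha / (alpha - 1.0)

def play_round(noise):
    """One round: sample arms, pick one (placeholder policy), observe payoff."""
    X = sample_arms()
    a = rng.integers(K)                # stand-in for the paper's UCB-style choice
    payoff = X[a] @ theta_star + noise()
    regret = np.max(X @ theta_star) - X[a] @ theta_star
    return regret, payoff

cum_regret = np.cumsum([play_round(student_t_noise)[0] for _ in range(T)])
```

Averaging `cum_regret` over 10 independent runs and plotting it against t reproduces the kind of regret-versus-time curves the paper reports.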
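
The BMM pseudocode named in the table is built on the median-of-means estimator, which is what lets the algorithm tolerate payoffs with only finite (1 + ε)-th moments. A minimal sketch of that primitive, assuming the block count k is supplied by the caller (the paper derives it from the confidence level δ):

```python
import numpy as np

def median_of_means(samples: np.ndarray, k: int) -> float:
    """Split samples into k blocks, average each block, return the median.

    Unlike the plain empirical mean, this estimate concentrates even when
    the payoff distribution only has finite (1 + eps)-th moments.
    """
    blocks = np.array_split(samples, k)
    return float(np.median([b.mean() for b in blocks]))

# Example: robust mean of heavy-tailed Pareto payoffs (tail index assumed).
rng = np.random.default_rng(1)
payoffs = rng.pareto(2.5, size=1000) + 1.0
print(median_of_means(payoffs, k=9))
```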