No-Regret Algorithms for Heavy-Tailed Linear Bandits
Authors: Andres Munoz Medina, Scott Yang
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also present empirical results showing that our algorithms achieve a better performance than the current state of the art for bounded noise when the L∞ bound on the noise is large yet the (1+ε)-th moment of the noise is small. (Section 6, Experiments) We now present empirical results showing that the truncation algorithm benefits from a better regret than the vanilla linear bandit algorithm of (Abbasi-Yadkori et al., 2011). |
| Researcher Affiliation | Collaboration | Andres Munoz Medina (AMMEDINA@GOOGLE.COM), Google Research, 111 8th Av, New York, NY 10011; Scott Yang (YANGS@CIMS.NYU.EDU), Courant Institute, 251 Mercer Street, New York, NY 10012 |
| Pseudocode | Yes | Algorithm 1 Confidence Region, Algorithm 2 Estimate by Truncation, Algorithm 3 Mini-Batch Confidence Region, Algorithm 4 Median of Means (MoM); minimal sketches of the two underlying estimators appear below the table |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or include a link to a code repository. |
| Open Datasets | No | The paper describes generating synthetic data for experiments ('Our experimental setup is as follows: we let d = 50 and µ = (1/√n)·1 ∈ R^n...'), but it does not use a publicly available or open dataset with access information (link, DOI, citation). |
| Dataset Splits | No | The paper does not specify training, validation, or test dataset splits. It describes a simulation setting with T=10^6 iterations and 20 replicas, but no explicit data partitioning. |
| Hardware Specification | No | The paper describes the parameters of the experimental setup and data generation, but it does not provide any specific hardware details such as CPU/GPU models, memory, or cloud resources used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific programming languages, libraries, or frameworks and their versions). |
| Experiment Setup | Yes | Our experimental setup is as follows: we let d = 50 and µ = (1/√n)·1 ∈ R^n, where 1 is a vector with all entries set to 1. For every x ∈ B_1 the reward function is given by x ↦ µ^T x + η, where η is a random variable taking values −γ with probability 1 − γ^2 and 1/γ with probability γ^2, where γ = 1/√(40T). Figure 1(a) shows the mean regret over 20 replicas of the same experiment, ... for T = 10^6. |
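To make the quoted setup concrete, the reward-noise distribution can be simulated directly. The sketch below assumes the values reconstructed above (noise −γ with probability 1 − γ^2, 1/γ with probability γ^2, γ = 1/√(40T)); the variable names and the moment exponent `eps` are ours, not the paper's.

```python
import numpy as np

T = 10**6                      # horizon from the quoted setup
gamma = 1.0 / np.sqrt(40 * T)  # gamma = 1/sqrt(40T), as quoted above
eps = 0.5                      # illustrative moment exponent (our choice)

def sample_noise(rng, size):
    """Two-point heavy-tailed noise, as reconstructed from the setup row:
    -gamma with probability 1 - gamma^2, 1/gamma with probability gamma^2."""
    tails = rng.random(size) < gamma**2
    return np.where(tails, 1.0 / gamma, -gamma)

rng = np.random.default_rng(0)
print("sample draws:", sample_noise(rng, 5))

# Exact moments of the two-point law show why it is "heavy-tailed" in the
# paper's sense: nearly zero mean and a small (1+eps)-th moment, but a
# large almost-sure bound.
mean = (1 - gamma**2) * (-gamma) + gamma**2 * (1.0 / gamma)  # = gamma**3
abs_moment = (1 - gamma**2) * gamma**(1 + eps) + gamma**2 * gamma**-(1 + eps)
print(f"mean = {mean:.3e}")                      # essentially zero
print(f"(1+eps)-th moment = {abs_moment:.3e}")   # small (~1.3e-2)
print(f"a.s. bound 1/gamma = {1.0 / gamma:.3e}") # large (~6.3e3)
```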
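The truncation and median-of-means constructions named in the Pseudocode row are, at their core, robust mean estimators. Below is a minimal scalar sketch of both ideas, not the paper's full confidence-region algorithms; `threshold` and `num_blocks` are illustrative parameters.

```python
import numpy as np

def truncated_mean(samples, threshold):
    """Estimate a mean by zeroing out observations larger than `threshold`
    in magnitude, so rare huge rewards cannot dominate the average."""
    x = np.asarray(samples, dtype=float)
    return np.where(np.abs(x) <= threshold, x, 0.0).mean()

def median_of_means(samples, num_blocks):
    """Estimate a mean by averaging within blocks and taking the median of
    the block means, which is robust to a few outlier-heavy blocks."""
    x = np.asarray(samples, dtype=float)
    return float(np.median([b.mean() for b in np.array_split(x, num_blocks)]))
```

Both constructions concentrate around the true mean under only weak moment assumptions, which is the kind of robustness the paper's bandit algorithms build their confidence regions on.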