No-Regret Algorithms for Heavy-Tailed Linear Bandits

Authors: Andres Munoz Medina, Scott Yang

ICML 2016

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
    "We also present empirical results showing that our algorithms achieve a better performance than the current state of the art for bounded noise when the L∞ bound on the noise is large yet the 1+ε moment of the noise is small." Section 6 (Experiments): "We now present empirical results showing that the truncation algorithm benefits from a better regret than the vanilla linear bandit algorithm of (Abbasi-Yadkori et al., 2011)."
Researcher Affiliation: Collaboration
    Andres Munoz Medina (AMMEDINA@GOOGLE.COM), Google Research, 111 8th Av, New York, NY 10011
    Scott Yang (YANGS@CIMS.NYU.EDU), Courant Institute, 251 Mercer Street, New York, NY 10012
Pseudocode: Yes
    Algorithm 1 (Confidence Region), Algorithm 2 (Estimate by Truncation), Algorithm 3 (Mini-Batch Confidence Region), Algorithm 4 (Median of Means, MoM). (An illustrative sketch of the truncation and median-of-means estimators follows after these entries.)
Open Source Code: No
    The paper does not provide any explicit statement about releasing source code or include a link to a code repository.
Open Datasets: No
    The paper describes generating synthetic data for experiments ("Our experimental setup is as follows: we let d = 50 and µ = (1/√n)·1 ∈ ℝⁿ ..."), but it does not use a publicly available or open dataset with access information (link, DOI, citation).
Dataset Splits: No
    The paper does not specify training, validation, or test dataset splits. It describes a simulation setting with T = 10^6 iterations and 20 replicas, but no explicit data partitioning.
Hardware Specification: No
    The paper describes the parameters of the experimental setup and data generation, but it does not provide any specific hardware details such as CPU/GPU models, memory, or cloud resources used for running the experiments.
Software Dependencies: No
    The paper does not specify any software dependencies with version numbers (e.g., specific programming languages, libraries, or frameworks and their versions).
Experiment Setup: Yes
    "Our experimental setup is as follows: we let d = 50 and µ = (1/√n)·1 ∈ ℝⁿ, where 1 is a vector with all entries set to 1. For every x ∈ B₁ the reward function is given by x ↦ µ⊤x + η, where η is a random variable taking values −γ with probability 1 − γ² and 1/γ with probability γ², where γ = 1/√(40T). Figure 1(a) shows the mean regret over 20 replicas of the same experiment, ... for T = 10^6." (A simulation sketch of this setup follows below.)
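
The pseudocode listed above centers on two robust-estimation ideas: truncating large observations (Algorithm 2, "Estimate by Truncation") and taking a median of batch means (Algorithm 4, "MoM"). Below is a minimal Python sketch of those two estimation ideas only; it is not the paper's algorithms, which maintain confidence regions over µ. The threshold b and batch count k are illustrative parameters, not values taken from the paper.

```python
import numpy as np

def truncated_mean(samples, b):
    """Truncation idea (cf. Algorithm 2): zero out samples whose
    magnitude exceeds a threshold b, then average. This bounds the
    influence of heavy-tailed outliers at the cost of a small bias."""
    s = np.asarray(samples, dtype=float)
    return float(np.where(np.abs(s) <= b, s, 0.0).mean())

def median_of_means(samples, k):
    """Median-of-means idea (cf. Algorithm 4): split the samples into
    k batches, average each batch, and return the median of the batch
    means. The median discards the few batches corrupted by
    heavy-tailed outliers."""
    s = np.asarray(samples, dtype=float)
    return float(np.median([batch.mean() for batch in np.array_split(s, k)]))
```

On heavy-tailed samples with only a small 1+ε moment, both estimators concentrate around the true mean far more tightly than the plain empirical mean, which is the property the paper's regret analysis exploits.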
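The quoted experiment can also be simulated directly. The sketch below is a reconstruction from the garbled quote, not the authors' code: the noise values −γ and 1/γ (the extracted text drops signs and fraction bars), the identification n = d, and the noise symbol η are all assumptions made while repairing the extraction.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 10**6                      # horizon from the quoted setup
d = 50                         # dimension from the quoted setup
gamma = 1.0 / np.sqrt(40 * T)  # gamma = 1/sqrt(40T)

# Assumes n = d: mu = (1/sqrt(n)) * 1, the scaled all-ones vector.
mu = np.ones(d) / np.sqrt(d)

def noise(size=None):
    """Two-point noise, reconstructed from the quote (the signs are an
    assumption): -gamma with probability 1 - gamma**2 and 1/gamma with
    probability gamma**2. Its mean is gamma**3, essentially zero, while
    its largest value 1/gamma is huge."""
    u = rng.random(size)
    return np.where(u < gamma**2, 1.0 / gamma, -gamma)

def reward(x):
    """Linear reward x -> mu.T x + eta for an action x in the unit ball."""
    return float(mu @ x + noise())
```

With these values the noise has second moment 1 + γ² − γ⁴ ≈ 1 but an L∞ bound of 1/γ = √(40T), which is the large-bound, small-moment regime the abstract targets.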