No-Regret Algorithms for Heavy-Tailed Linear Bandits

Authors: Andres Munoz Medina, Scott Yang

ICML 2016

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
    "We also present empirical results showing that our algorithms achieve a better performance than the current state of the art for bounded noise when the L∞ bound on the noise is large yet the 1+ε moment of the noise is small." Section 6 (Experiments): "We now present empirical results showing that the truncation algorithm benefits from a better regret than the vanilla linear bandit algorithm of (Abbasi-Yadkori et al., 2011)."
Researcher Affiliation: Collaboration
    Andres Munoz Medina (AMMEDINA@GOOGLE.COM), Google Research, 111 8th Av, New York, NY 10011
    Scott Yang (YANGS@CIMS.NYU.EDU), Courant Institute, 251 Mercer Street, New York, NY 10012
Pseudocode: Yes
    Algorithm 1 (Confidence Region), Algorithm 2 (Estimate by Truncation), Algorithm 3 (Mini-Batch Confidence Region), Algorithm 4 (Median of Means, MoM). (An illustrative sketch of the truncation and median-of-means estimators follows after these entries.)
Open Source Code: No
    The paper does not provide any explicit statement about releasing source code or include a link to a code repository.
Open Datasets: No
    The paper describes generating synthetic data for experiments ("Our experimental setup is as follows: we let d = 50 and µ = (1/√n)·1 ∈ ℝⁿ ..."), but it does not use a publicly available or open dataset with access information (link, DOI, citation).
Dataset Splits: No
    The paper does not specify training, validation, or test dataset splits. It describes a simulation setting with T = 10^6 iterations and 20 replicas, but no explicit data partitioning.
Hardware Specification: No
    The paper describes the parameters of the experimental setup and data generation, but it does not provide any specific hardware details such as CPU/GPU models, memory, or cloud resources used for running the experiments.
Software Dependencies: No
    The paper does not specify any software dependencies with version numbers (e.g., specific programming languages, libraries, or frameworks and their versions).
Experiment Setup: Yes
    "Our experimental setup is as follows: we let d = 50 and µ = (1/√n)·1 ∈ ℝⁿ, where 1 is a vector with all entries set to 1. For every x ∈ B₁ the reward function is given by x ↦ µ⊤x + η, where η is a random variable taking values −γ with probability 1 − γ² and 1/γ with probability γ², where γ = 1/√(40T). Figure 1(a) shows the mean regret over 20 replicas of the same experiment, ... for T = 10^6." (A simulation sketch of this setup follows below.)
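
The pseudocode listed above centers on two robust-estimation ideas: truncating large observations (Algorithm 2, "Estimate by Truncation") and taking a median of batch means (Algorithm 4, "MoM"). Below is a minimal Python sketch of those two estimation ideas only; it is not the paper's algorithms, which maintain confidence regions over µ. The threshold b and batch count k are illustrative parameters, not values taken from the paper.

```python
import numpy as np

def truncated_mean(samples, b):
    """Truncation idea (cf. Algorithm 2): zero out samples whose
    magnitude exceeds a threshold b, then average. This bounds the
    influence of heavy-tailed outliers at the cost of a small bias."""
    s = np.asarray(samples, dtype=float)
    return float(np.where(np.abs(s) <= b, s, 0.0).mean())

def median_of_means(samples, k):
    """Median-of-means idea (cf. Algorithm 4): split the samples into
    k batches, average each batch, and return the median of the batch
    means. The median discards the few batches corrupted by
    heavy-tailed outliers."""
    s = np.asarray(samples, dtype=float)
    return float(np.median([batch.mean() for batch in np.array_split(s, k)]))
```

On heavy-tailed samples with only a small 1+ε moment, both estimators concentrate around the true mean far more tightly than the plain empirical mean, which is the property the paper's regret analysis exploits.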
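The quoted experiment can also be simulated directly. The sketch below is a reconstruction from the garbled quote, not the authors' code: the noise values −γ and 1/γ (the extracted text drops signs and fraction bars), the identification n = d, and the noise symbol η are all assumptions made while repairing the extraction.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 10**6                      # horizon from the quoted setup
d = 50                         # dimension from the quoted setup
gamma = 1.0 / np.sqrt(40 * T)  # gamma = 1/sqrt(40T)

# Assumes n = d: mu = (1/sqrt(n)) * 1, the scaled all-ones vector.
mu = np.ones(d) / np.sqrt(d)

def noise(size=None):
    """Two-point noise, reconstructed from the quote (the signs are an
    assumption): -gamma with probability 1 - gamma**2 and 1/gamma with
    probability gamma**2. Its mean is gamma**3, essentially zero, while
    its largest value 1/gamma is huge."""
    u = rng.random(size)
    return np.where(u < gamma**2, 1.0 / gamma, -gamma)

def reward(x):
    """Linear reward x -> mu.T x + eta for an action x in the unit ball."""
    return float(mu @ x + noise())
```

With these values the noise has second moment 1 + γ² − γ⁴ ≈ 1 but an L∞ bound of 1/γ = √(40T), which is the large-bound, small-moment regime the abstract targets.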