Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards

Authors: Bo Xue, Yimu Wang, Yuanyu Wan, Jinfeng Yi, Lijun Zhang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experimental results confirm the merits of our algorithms. This section demonstrates the improvement of our algorithms by numerical experiments.
Researcher Affiliation | Collaboration | 1 Department of Computer Science, City University of Hong Kong, Hong Kong, China; 2 The City University of Hong Kong Shenzhen Research Institute, Shenzhen, China; 3 Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada; 4 School of Software Technology, Zhejiang University, Ningbo, China; 5 JD AI Research, Beijing, China; 6 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; 7 Peng Cheng Laboratory, Shenzhen, China
Pseudocode | Yes | Algorithm 1: Confidence Region with Truncated Mean (CRTM)
Open Source Code | No | The paper does not explicitly state that the code for the described methods is open-sourced, nor does it link to a code repository.
Open Datasets | No | The paper describes synthetic data generation from specific distributions ("Student's t noise" and "Pareto noise") and parameters, but does not reference or provide access information for a pre-existing, publicly available dataset.
Dataset Splits | No | The paper studies a sequential decision-making (bandit) problem in which data is generated online, so it contains no training/validation/test splits in the conventional sense of static datasets.
Hardware Specification | Yes | All algorithms are implemented using PyCharm 2022 and tested on a laptop with a 2.5 GHz CPU and 32 GB of memory.
Software Dependencies | Yes | All algorithms are implemented using PyCharm 2022.
Experiment Setup | Yes | All algorithms are configured with ε = 1, δ = 0.01, and T = 10^6. The number of arms is K = 20 and the feature dimension is d = 10. We run 10 repetitions for each algorithm and report the average regret over time.
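The table notes that the experiments use synthetic heavy-tailed noise (Student's t and Pareto) with K = 20 arms and feature dimension d = 10, and that Algorithm 1 (CRTM) builds its confidence region from a truncated mean. The sketch below illustrates these ingredients under illustrative assumptions: a plain linear reward model (the paper treats generalized linear models), NumPy-based noise generation, and hypothetical names such as `truncated_mean`; it is not the authors' implementation.

```python
import numpy as np

# Sketch of the synthetic setup: K = 20 arms, d = 10 features.
# The paper runs T = 10^6 rounds; here we only draw one batch of rewards.
rng = np.random.default_rng(0)
K, d = 20, 10

# Random unit-norm arm features and an unknown unit-norm parameter
# (an assumed linear reward model for illustration).
X = rng.normal(size=(K, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)

def student_t_noise(n, df=3):
    # Heavy-tailed noise: Student's t with few degrees of freedom has
    # finite (1 + eps)-th moments only for small eps.
    return rng.standard_t(df, size=n)

def pareto_noise(n, shape=2.0):
    # NumPy's pareto draws from a Lomax distribution with mean
    # 1 / (shape - 1) for shape > 1; subtract it so the noise is centered.
    return rng.pareto(shape, size=n) - 1.0 / (shape - 1.0)

def truncated_mean(rewards, b):
    # The truncation trick behind CRTM-style estimators: zero out
    # observations whose magnitude exceeds the threshold b, then average,
    # so rare extreme rewards cannot dominate the estimate.
    r = np.asarray(rewards, dtype=float)
    return float(np.mean(np.where(np.abs(r) <= b, r, 0.0)))

# Rewards for one arm under Student's t noise; a handful of extreme
# draws barely move the truncated estimate.
rewards = X[0] @ theta + student_t_noise(1000)
print(truncated_mean(rewards, b=10.0))
```

The threshold `b` is chosen here for illustration only; in the paper it is set from the confidence level δ and the moment parameter ε to obtain the stated regret guarantees.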