Are sample means in multi-armed bandits positively or negatively biased?

Authors: Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide examples of optimistic rules of each type, demonstrate that simulations confirm our theoretical predictions, and pose some natural but hard open problems. [...] In Section 4, we demonstrate the correctness of our theoretical predictions through simulations in a variety of practical situations. [...] 4 Numerical experiments
Researcher Affiliation | Academia | Department of Statistics and Data Science^1, Machine Learning Department^2, Carnegie Mellon University; {shinjaehyeok, aramdas, arinaldo}@cmu.edu
Pseudocode | No | The paper describes algorithms like lil'UCB using textual explanations and mathematical formulas (Section 4.3), but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement about releasing source code for the methodology described, nor does it provide any links to a code repository.
Open Datasets | No | The paper conducts simulations using 'unit-variance Gaussian arms' (e.g., Section 4.1), which are distributions defined for the purpose of the experiment rather than external, publicly available datasets for which access information would be provided.
Dataset Splits | No | The paper describes simulation setups such as using 'three unit-variance Gaussian arms' and repeating trials (Section 4.1), but it does not specify any training, validation, or test dataset splits, as the experiments involve simulations rather than traditional model training on split datasets.
Hardware Specification | No | The paper does not provide any specific hardware details, such as GPU or CPU models or memory specifications, used for running the experiments.
Software Dependencies | No | The paper mentions specific algorithms and references related works but does not provide details about the programming languages, software libraries, or their specific version numbers used for the simulations or implementations.
Experiment Setup | Yes | To demonstrate this, we conduct a simulation study in which we have three unit-variance Gaussian arms with µ_1 = 1, µ_2 = 2 and µ_3 = 3. After sampling once from each arm, greedy, UCB and Thompson sampling are used to continue sampling until T = 200. We repeat the whole process from scratch 10^4 times for each algorithm to get an accurate estimate for the bias. [...] We choose M = 200, w = 10 and α = 0.1. As before, we repeat each experiment 10^4 times for each setting. [...] We set 3 unit-variance Gaussian arms with means (µ_1, µ_2, µ_3) = (g, 0, −g) for each gap parameter g = 1, 3, 5. We conduct 10^4 trials of the lil'UCB algorithm with a valid choice of parameters described in Jamieson et al. [2014, Section 5].
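
Since the paper releases no code, the first simulation quoted in the Experiment Setup row can only be approximated. The following is a minimal Python sketch of that setup (three unit-variance Gaussian arms with means 1, 2, 3, one initial draw per arm, sampling until T = 200, repeated over many trials). The function names `run_bandit` and `estimate_bias` are hypothetical, and the UCB bonus and the Thompson sampling posterior are assumptions, since the quoted text does not specify them; this is not the authors' implementation.

```python
import numpy as np


def run_bandit(policy, means, horizon, rng):
    """Run one trajectory; return (final sample means, pull counts) per arm."""
    k = len(means)
    counts = np.ones(k)
    sums = rng.normal(means, 1.0)  # one initial draw from each arm
    for t in range(k, horizon):  # t = number of samples drawn so far
        avg = sums / counts
        if policy == "greedy":
            index = avg
        elif policy == "ucb":
            # Assumption: UCB1-style bonus; the paper's exact index may differ.
            index = avg + np.sqrt(2.0 * np.log(t) / counts)
        elif policy == "thompson":
            # Assumption: flat prior, so the posterior is N(avg, 1 / count).
            index = rng.normal(avg, 1.0 / np.sqrt(counts))
        else:
            raise ValueError(f"unknown policy: {policy}")
        arm = int(np.argmax(index))
        sums[arm] += rng.normal(means[arm], 1.0)
        counts[arm] += 1
    return sums / counts, counts


def estimate_bias(policy, means=(1.0, 2.0, 3.0), horizon=200, trials=1000, seed=0):
    """Monte Carlo estimate of E[sample mean] - true mean for each arm."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    finals = np.array(
        [run_bandit(policy, means, horizon, rng)[0] for _ in range(trials)]
    )
    return finals.mean(axis=0) - means


if __name__ == "__main__":
    for policy in ("greedy", "ucb", "thompson"):
        print(policy, estimate_bias(policy))
```

Calling `estimate_bias(policy, trials=10**4)` matches the number of repetitions quoted above; the default of 1000 trials only keeps the run short. The returned vector holds one bias estimate per arm and can be compared against the paper's theoretical predictions.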