The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation

Authors: Zhe Feng, David Parkes, Haifeng Xu

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we provide the results of simulations to validate our theoretical results. ... We run each bandit algorithm for T = 10^4 rounds, and this forms one trial. We repeat for 100 trials, and report the average results over these trials.
Researcher Affiliation | Academia | Harvard University, Cambridge, Massachusetts, USA; University of Virginia, Charlottesville, Virginia, USA.
Pseudocode | No | The paper describes the bandit algorithms (UCB, ε-Greedy, and Thompson Sampling) textually and with mathematical formulas, but does not provide structured pseudocode or algorithm blocks. (A hedged baseline sketch of the simulation appears after this table.)
Open Source Code | No | The paper does not provide any statement or link indicating that open-source code for the described methodology is available.
Open Datasets | No | The paper uses synthetic data generated from normal distributions with specified means and standard deviation (e.g., 'reward distributions N(µ1, σ^2), N(µ2, σ^2) and N(µ3, σ^2), respectively. We fix µ1 = 5, µ2 = 8, µ3 = 10, and σ = 1'), but does not provide concrete access information to a publicly available or open dataset.
Dataset Splits | No | The paper describes a simulation setup for bandit algorithms with synthetic reward distributions over a fixed number of rounds and trials, but it does not specify any training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | Throughout the simulations, we fix µ1 = 5, µ2 = 8, µ3 = 10, and σ = 1. All the arms use the LSI strategy. We run each bandit algorithm for T = 10^4 rounds, and this forms one trial. We repeat for 100 trials, and report the average results over these trials. In the ε-Greedy algorithm, we set ε_t = min{1, 4/t}.
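Because the paper ships neither pseudocode nor code, the following minimal Python sketch makes the quoted setup concrete: three Gaussian arms with means (5, 8, 10) and σ = 1, T = 10^4 rounds per trial, and 100 trials. It is a baseline sketch under stated assumptions, not the authors' implementation: the ε-Greedy schedule min{1, 4/t} is reconstructed from a garbled formula in the PDF, the UCB index shown is standard UCB1 rather than whatever variant the paper analyzes, Thompson Sampling is omitted for brevity, and the arms' strategic (LSI) manipulation, which is the paper's actual subject, is not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)

MU = np.array([5.0, 8.0, 10.0])  # arm means quoted in the paper: mu1=5, mu2=8, mu3=10
SIGMA = 1.0                      # reward standard deviation quoted in the paper
T = 10_000                       # rounds per trial (T = 10^4)
TRIALS = 100                     # trials to average over

def pull(arm):
    # Plain stochastic reward N(mu_arm, sigma^2); the arms' strategic
    # LSI manipulation studied in the paper is NOT modeled here.
    return rng.normal(MU[arm], SIGMA)

def eps_greedy_trial():
    # epsilon-Greedy with the (reconstructed) schedule eps_t = min{1, 4/t}.
    k = len(MU)
    counts, means, regret = np.zeros(k), np.zeros(k), 0.0
    for t in range(1, T + 1):
        if rng.random() < min(1.0, 4.0 / t):
            arm = int(rng.integers(k))       # explore a uniformly random arm
        else:
            arm = int(np.argmax(means))      # exploit the empirical best arm
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        regret += MU.max() - MU[arm]                  # pseudo-regret
    return regret

def ucb1_trial():
    # Standard UCB1 index (Auer et al. style); the paper's exact index may differ.
    k = len(MU)
    counts, means, regret = np.zeros(k), np.zeros(k), 0.0
    for t in range(1, T + 1):
        if t <= k:
            arm = t - 1                      # initialization: play each arm once
        else:
            arm = int(np.argmax(means + np.sqrt(2.0 * np.log(t) / counts)))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
        regret += MU.max() - MU[arm]
    return regret

for name, trial in [("eps-Greedy", eps_greedy_trial), ("UCB1", ucb1_trial)]:
    avg = np.mean([trial() for _ in range(TRIALS)])
    print(f"{name}: average pseudo-regret over {TRIALS} trials = {avg:.1f}")
```

With these parameters both algorithms should concentrate on the best arm (µ3 = 10) well before T = 10^4 rounds; the sketch exists only to pin down the quoted experimental parameters, not to reproduce the paper's robustness results under manipulation.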