The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation
Authors: Zhe Feng, David Parkes, Haifeng Xu
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide the results of simulations to validate our theoretical results. ... We run each bandit algorithm for T = 10⁴ rounds, and this forms one trial. We repeat for 100 trials, and report the average results over these trials. |
| Researcher Affiliation | Academia | 1Harvard University, Cambridge, Massachusetts, USA 2University of Virginia, Charlottesville, Virginia, USA. |
| Pseudocode | No | The paper describes the bandit algorithms (UCB, ε-Greedy, and Thompson Sampling) textually and with mathematical formulas, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that open-source code for the described methodology is available. |
| Open Datasets | No | The paper uses synthetic data generated from normal distributions with specified means and standard deviation (e.g., 'reward distributions N(µ₁, σ²), N(µ₂, σ²) and N(µ₃, σ²), respectively. We fix µ₁ = 5, µ₂ = 8, µ₃ = 10, and σ = 1'), but does not provide concrete access information to a publicly available or open dataset. |
| Dataset Splits | No | The paper describes a simulation setup for bandit algorithms with synthetic reward distributions over a fixed number of rounds and trials, but it does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | Throughout the simulations, we fix µ₁ = 5, µ₂ = 8, µ₃ = 10, and σ = 1. All the arms use the LSI strategy. We run each bandit algorithm for T = 10⁴ rounds, and this forms one trial. We repeat for 100 trials, and report the average results over these trials. In the ε-Greedy algorithm, we set εₜ = min{1, 4/t}. |
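
Since the paper releases no code, the sketch below is a minimal reconstruction of the simulation setup quoted in the table: three arms with rewards N(µᵢ, σ²), µ = (5, 8, 10), σ = 1, T = 10⁴ rounds per trial, averaged over 100 trials, with ε-Greedy using εₜ = min{1, 4/t}. It is an assumption-laden reproduction aid, not the authors' implementation: the arms' LSI manipulation strategy, the UCB and Thompson Sampling runs, and the random seed are omitted or chosen arbitrarily here.

```python
import numpy as np

# Hedged sketch of the paper's simulation environment (no strategic/LSI
# manipulation is modeled; only the unmanipulated epsilon-greedy baseline).
MU = np.array([5.0, 8.0, 10.0])   # arm means mu_1, mu_2, mu_3 from the paper
SIGMA = 1.0                        # reward standard deviation sigma = 1
T = 10_000                         # T = 10^4 rounds per trial
N_TRIALS = 100                     # trials to average over
rng = np.random.default_rng(0)     # hypothetical seed; not specified in the paper


def epsilon_greedy_trial() -> np.ndarray:
    """Run one epsilon-greedy trial and return the per-round expected regret."""
    n_arms = len(MU)
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    regret = np.zeros(T)
    for t in range(1, T + 1):
        eps = min(1.0, 4.0 / t)          # schedule quoted in the table: eps_t = min{1, 4/t}
        if rng.random() < eps:
            arm = int(rng.integers(n_arms))  # explore uniformly at random
        else:
            arm = int(np.argmax(means))      # exploit the current empirical means
        reward = rng.normal(MU[arm], SIGMA)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        regret[t - 1] = MU.max() - MU[arm]                  # expected per-round regret
    return regret


# Average the cumulative regret over 100 trials, as the paper reports trial averages.
avg_cum_regret = np.mean(
    [np.cumsum(epsilon_greedy_trial()) for _ in range(N_TRIALS)], axis=0
)
print(f"Average cumulative regret after T={T} rounds: {avg_cum_regret[-1]:.1f}")
```

Swapping in UCB or Thompson Sampling only requires replacing the arm-selection rule inside the loop; the environment, horizon, and trial-averaging stay as above.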