Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Thompson Sampling on Symmetric Alpha-Stable Bandits
Authors: Abhimanyu Dubey, Alex 'Sandy' Pentland
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove finite-time regret bounds for both algorithms, and demonstrate through a series of experiments the stronger performance of Thompson Sampling in this setting. |
| Researcher Affiliation | Academia | Abhimanyu Dubey and Alex Sandy Pentland Massachusetts Institute of Technology EMAIL |
| Pseudocode | Yes | Algorithm 1 Chambers-Mallows-Stuck Generation, Algorithm 2 α-Thompson Sampling, Algorithm 3 Robust α-Thompson Sampling |
| Open Source Code | No | The paper mentions a |
| Open Datasets | No | The paper conducts simulations and generates data for its experiments, rather than using a publicly available dataset with a specific link or citation. |
| Dataset Splits | No | The paper conducts simulations over a fixed number of iterations (T=5000 or T=15K) and evaluates regret over time, but it does not specify explicit training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running experiments. It vaguely mentions |
| Software Dependencies | No | The paper does not provide specific software names with version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | We run 100 MAB experiments each for all 5 benchmarks for α = 1.8 and α = 1.3, and K = 50 arms, and for each arm, the mean is drawn from [0, 2000] randomly for each experiment, and σ = 2500. Each experiment is run for T = 5000 iterations, and we report the regret averaged over time, i.e. R(t)/t at any time t. Also, |
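
The setup quoted in the Experiment Setup row (K = 50 arms, means in [0, 2000], σ = 2500, T = 5000, regret reported as R(t)/t) can be illustrated with a minimal sketch. This is not the authors' code. It uses the standard Chambers-Mallows-Stuck formula for the symmetric case, assumes the arm means are drawn uniformly, and plugs in a hypothetical `UniformRandomPolicy` as a stand-in where the paper's α-Thompson Sampling variants would go. All function and class names here are illustrative.

```python
import math
import random


def sample_symmetric_alpha_stable(alpha, mu=0.0, sigma=1.0, rng=random):
    """Draw one symmetric alpha-stable sample via Chambers-Mallows-Stuck."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    v = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                    # unit-rate exponential
    while w == 0.0:                             # guard against division by zero
        w = rng.expovariate(1.0)
    if alpha == 1.0:
        x = math.tan(v)                         # Cauchy special case
    else:
        x = (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
             * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))
    return mu + sigma * x                       # shift by location, scale by sigma


class UniformRandomPolicy:
    """Hypothetical placeholder policy: picks arms uniformly, ignores feedback."""

    def __init__(self, k):
        self.k = k

    def select(self, rng):
        return rng.randrange(self.k)

    def update(self, arm, reward):
        pass  # a learning policy would update its posterior here


def average_regret_curve(policy, alpha, k=50, horizon=5000, sigma=2500.0, seed=0):
    """Run one bandit experiment and return R(t)/t for t = 1..horizon."""
    rng = random.Random(seed)
    # Assumption: "drawn from [0, 2000] randomly" is read as uniform.
    means = [rng.uniform(0.0, 2000.0) for _ in range(k)]
    best = max(means)
    cumulative, curve = 0.0, []
    for t in range(1, horizon + 1):
        arm = policy.select(rng)
        reward = sample_symmetric_alpha_stable(alpha, means[arm], sigma, rng)
        policy.update(arm, reward)
        cumulative += best - means[arm]         # pseudo-regret uses true means
        curve.append(cumulative / t)
    return curve


if __name__ == "__main__":
    curve = average_regret_curve(UniformRandomPolicy(50), alpha=1.8)
    print(f"time-averaged regret after {len(curve)} rounds: {curve[-1]:.1f}")
```

Averaging such curves over 100 seeds for each of α = 1.8 and α = 1.3 would mirror the quoted protocol, under the assumptions stated above.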