Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems

Authors: Young Hun Jung, Ambuj Tewari

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also present empirical results that support our theoretical findings."
Researcher Affiliation | Academia | Young Hun Jung, Department of Statistics, University of Michigan, yhjung@umich.edu; Ambuj Tewari, Department of Statistics, University of Michigan, tewaria@umich.edu
Pseudocode | Yes | "Algorithm 1 Thompson sampling in restless bandits"
Open Source Code | Yes | "Our code is available at https://github.com/yhjung88/ThompsonSamplinginRestlessBandits"
Open Datasets | No | The paper uses Monte Carlo simulation with a uniform prior over a finite parameter support and does not refer to a specific, named public dataset with access information. It discusses the Gilbert-Elliott channel model studied by Liu and Zhao [2010], but this is a model for simulation rather than a dataset one could access.
Dataset Splits | No | The paper does not mention training/validation/test splits for any dataset; it relies on simulations rather than pre-existing datasets with defined splits.
Hardware Specification | No | The paper does not specify any hardware used for running experiments.
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers.
Experiment Setup | Yes | "We fix L = 50 and m = 30. We use Monte Carlo simulation with size 100 or greater to approximate expectations. As each arm has two parameters, there are 2K parameters. For these, we set the prior distribution to be uniform over a finite support {0.1, 0.2, ..., 0.9}."
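
The Pseudocode row refers to Algorithm 1, Thompson sampling in restless bandits, whose episodic structure is: draw a parameter vector from the current posterior, compute a policy for the drawn model, follow that policy for the L steps of the episode, then update the posterior with the observed trajectory. The sketch below is only a schematic reading of that loop, not the authors' implementation; `env`, `posterior`, and `planner` are hypothetical interfaces, and the planner that maps a sampled model to a (near-)optimal policy is deliberately left abstract, since that is the problem-specific component.

```python
def episodic_thompson_sampling(env, posterior, planner, num_episodes, episode_len, rng):
    """Schematic episodic Thompson sampling loop (Algorithm 1-style sketch).

    env       -- restless-bandit simulator (assumed interface): reset() -> obs,
                 step(action) -> (obs, reward)
    posterior -- belief over model parameters (assumed interface): sample(rng),
                 update(trajectory)
    planner   -- maps a sampled parameter vector to a policy: policy(obs) -> action
    """
    rewards = []
    for _ in range(num_episodes):
        params = posterior.sample(rng)      # 1. draw one model from the posterior
        policy = planner(params)            # 2. plan against the sampled model
        obs = env.reset()
        trajectory = []
        for _ in range(episode_len):        # 3. act for one episode of length L
            action = policy(obs)
            obs, reward = env.step(action)
            trajectory.append((action, obs, reward))
            rewards.append(reward)
        posterior.update(trajectory)        # 4. fold the episode back into the posterior
    return rewards
```

Keeping the planner as a separate subroutine matches the usual Thompson-sampling pattern of treating planning for a known model as its own step, distinct from posterior sampling and updating.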
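
The Experiment Setup and Open Datasets rows describe simulated two-state Gilbert-Elliott arms with a uniform prior over the finite support {0.1, 0.2, ..., 0.9} for each arm's two parameters. A minimal sketch of the posterior component for one such arm is given below, assuming the two parameters are p01 (bad-to-good) and p11 (good-to-good) transition probabilities and that an arm's state is observed only when it is pulled; the class and variable names are illustrative and not taken from the authors' repository.

```python
import itertools
import numpy as np

GRID = np.linspace(0.1, 0.9, 9)  # finite support {0.1, 0.2, ..., 0.9}

def transition_matrix(p01, p11):
    """Two-state Gilbert-Elliott chain; state 1 = good, state 0 = bad."""
    return np.array([[1.0 - p01, p01],
                     [1.0 - p11, p11]])

class GridPosterior:
    """Exact posterior over the (p01, p11) grid for a single restless arm.

    Because the arm keeps evolving while unobserved, two consecutive
    observations may be separated by `gap` hidden transitions, so the
    likelihood uses the gap-step transition probability of each candidate chain.
    """
    def __init__(self):
        self.support = list(itertools.product(GRID, GRID))  # 81 hypotheses
        self.log_post = np.zeros(len(self.support))         # uniform prior

    def update(self, prev_state, new_state, gap):
        for i, (p01, p11) in enumerate(self.support):
            P = np.linalg.matrix_power(transition_matrix(p01, p11), gap)
            self.log_post[i] += np.log(P[prev_state, new_state])

    def sample(self, rng):
        """Draw one (p01, p11) pair for a Thompson-sampling episode."""
        w = np.exp(self.log_post - self.log_post.max())
        return self.support[rng.choice(len(self.support), p=w / w.sum())]


# Toy usage: the arm was seen in the good state twice, three steps apart.
rng = np.random.default_rng(0)
post = GridPosterior()
post.update(prev_state=1, new_state=1, gap=3)
print(post.sample(rng))
```

With two parameters per arm there are 2K in total, so one such posterior per arm, sampled independently at the start of each episode, would yield the parameter vector handed to the planner.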