Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems
Authors: Young Hun Jung, Ambuj Tewari
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also present empirical results that support our theoretical findings. |
| Researcher Affiliation | Academia | Young Hun Jung Department of Statistics University of Michigan yhjung@umich.edu Ambuj Tewari Department of Statistics University of Michigan tewaria@umich.edu |
| Pseudocode | Yes | Algorithm 1 Thompson sampling in restless bandits |
| Open Source Code | Yes | Our code is available at https://github.com/yhjung88/ThompsonSamplinginRestlessBandits |
| Open Datasets | No | The paper uses Monte Carlo simulation with a uniform prior distribution over a finite parameter support; it does not refer to a specific, named public dataset with access information. It discusses the Gilbert-Elliott channel model studied by Liu and Zhao [2010], but this is a model for simulation, not a dataset with access details. |
| Dataset Splits | No | The paper does not explicitly mention training/test/validation splits for any dataset, as it uses simulations rather than pre-existing datasets with defined splits. |
| Hardware Specification | No | The paper does not specify any hardware used for running experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers. |
| Experiment Setup | Yes | We fix L = 50 and m = 30. We use Monte Carlo simulation with size 100 or greater to approximate expectations. As each arm has two parameters, there are 2K parameters. For these, we set the prior distribution to be uniform over a finite support {0.1, 0.2, ..., 0.9}. |
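The setup in the last row (per-arm two-state transition parameters with a uniform prior over the finite grid {0.1, ..., 0.9}) can be sketched in code. The following is a simplified illustration, not the authors' exact episodic algorithm: it tracks a discrete posterior over each arm's Gilbert-Elliott parameters (p01 = P(bad→good), p11 = P(good→good)), samples parameters Thompson-style at each episode, and plays the arm whose sampled stationary good-state probability is highest. The `stationary_good` index and the assumption that the pulled arm's state is observed at every step of an episode are simplifications introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
GRID = np.arange(1, 10) / 10.0  # finite support {0.1, 0.2, ..., 0.9}

class ArmPosterior:
    """Discrete posterior over an arm's two parameters
    (p01 = P(bad->good), p11 = P(good->good)), uniform prior on GRID x GRID."""
    def __init__(self):
        self.logw = np.zeros((GRID.size, GRID.size))  # log-weights, uniform prior

    def update(self, prev_state, next_state):
        # Bayes update from one observed transition of this arm.
        p01, p11 = GRID[:, None], GRID[None, :]
        p_up = p11 if prev_state == 1 else p01        # P(next = good | prev)
        lik = p_up if next_state == 1 else 1.0 - p_up
        self.logw = self.logw + np.log(lik)           # broadcasts to (9, 9)

    def sample(self):
        # Draw one (p01, p11) grid point from the normalized posterior.
        w = np.exp(self.logw - self.logw.max())
        w /= w.sum()
        i, j = np.unravel_index(rng.choice(w.size, p=w.ravel()), w.shape)
        return GRID[i], GRID[j]

def stationary_good(p01, p11):
    """Long-run probability that a Gilbert-Elliott channel is in the good state."""
    return p01 / (1.0 + p01 - p11)

def run(true_params, episodes=20, L=50):
    """Simplified episodic Thompson sampling: one parameter draw per episode,
    then the selected arm is pulled for all L rounds of that episode."""
    K = len(true_params)
    posts = [ArmPosterior() for _ in range(K)]
    states = [int(rng.integers(2)) for _ in range(K)]
    reward = 0
    for _ in range(episodes):
        sampled = [p.sample() for p in posts]
        arm = int(np.argmax([stationary_good(*s) for s in sampled]))
        for _ in range(L):
            # All arms evolve (restless); only the chosen arm is observed.
            new_states = []
            for k, (p01, p11) in enumerate(true_params):
                p_up = p11 if states[k] == 1 else p01
                new_states.append(int(rng.random() < p_up))
            posts[arm].update(states[arm], new_states[arm])
            states = new_states
            reward += states[arm]  # reward 1 when the pulled arm is good
    return reward, posts
```

The log-weight representation keeps the repeated Bayes updates numerically stable; the posterior is only normalized at sampling time.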