Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning
Authors: Ahmadreza Moradipari, Mohammad Pedramfar, Modjtaba Shokrian Zini, Vaneet Aggarwal
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we prove the first Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. |
| Researcher Affiliation | Collaboration | Toyota Motor North America, Info Tech Labs, Mountain View, CA, USA (ahmadreza.moradipari@toyota.com); Purdue University, West Lafayette, IN, USA (mpedramf@purdue.edu); modjtaba.shokrianzini@gmail.com; Purdue University, West Lafayette, IN, USA (vaneet@purdue.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it explicitly state that code is being released or made available. |
| Open Datasets | No | The paper does not provide concrete access information for a publicly available or open dataset, as it is a theoretical paper and does not report on experiments. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce data partitioning, as it is a theoretical paper and does not report on experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running experiments, as it is a theoretical paper and does not report on experimental setups. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate experiments, as it is a theoretical paper. |
| Experiment Setup | No | The paper does not contain specific experimental setup details (concrete hyperparameter values, training configurations, or system-level settings) in the main text, as it is a theoretical paper and does not describe experiments. |