Provably adaptive reinforcement learning in metric spaces
Authors: Tongyi Cao, Akshay Krishnamurthy
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We provide a refined analysis of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the zooming dimension of the instance. This parameter, which originates in the bandit literature, captures the size of the subsets of near-optimal actions and is always smaller than the covering dimension used in previous analyses. As such, our results are the first provably adaptive guarantees for reinforcement learning in metric spaces. (A standard form of the zooming dimension definition is recalled below the table.) |
| Researcher Affiliation | Collaboration | Tongyi Cao, University of Massachusetts Amherst (tcao@cs.umass.edu); Akshay Krishnamurthy, Microsoft Research NYC (akshaykr@microsoft.com) |
| Pseudocode | Yes | Algorithm 1: Adaptive Q-learning (an illustrative sketch of this style of algorithm appears after the table). |
| Open Source Code | No | The paper does not provide access to source code for the described methodology. |
| Open Datasets | No | The paper describes a theoretical analysis of an algorithm in a reinforcement learning setting but does not mention the use of any specific, publicly available datasets for training or experimentation. |
| Dataset Splits | No | The paper describes a theoretical analysis and does not involve experimental validation with dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup, so no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers for reproducibility. |
| Experiment Setup | No | The paper describes a theoretical analysis and does not include details about an experimental setup, such as hyperparameters or system-level training settings. |
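
For context on the "zooming dimension" mentioned in the Research Type row, the snippet below recalls the standard form of the definition from the Lipschitz bandit literature, informally adapted to the episodic setting. The constants c and C and the step-indexed near-optimal sets are written here for illustration and may differ from the paper's formal definition.

```latex
% Standard form of the zooming dimension from the Lipschitz bandit literature,
% adapted informally to the episodic setting; the constants c, C are illustrative.
\[
  Z_h^{r} = \bigl\{ (x,a) \in \mathcal{X} \times \mathcal{A} :
      V_h^\star(x) - Q_h^\star(x,a) \le c\, r \bigr\}
\]
\[
  d_z = \inf \Bigl\{ d \ge 0 :
      N_r\bigl(Z_h^{r}\bigr) \le C\, r^{-d} \ \text{for all } r \in (0,1] \Bigr\}
\]
% Here $N_r(\cdot)$ is the $r$-covering number under the metric on
% $\mathcal{X} \times \mathcal{A}$; since $Z_h^{r}$ is a subset of the whole
% space, $d_z$ never exceeds the covering dimension.
```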
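Since only pseudocode (Algorithm 1) is available, the following is a minimal Python sketch of an adaptive Q-learning loop with adaptive discretization in the spirit of Sinclair, Banerjee & Yu (2019), on a one-dimensional state-action space [0,1] x [0,1]. The environment interface (`env_reset`, `env_step`), the bonus constant, the learning rate, and the splitting threshold `n >= (1/radius)^2` are simplifying assumptions, not the authors' exact specification.

```python
import math
import random

# Illustrative sketch of adaptive Q-learning with adaptive discretization,
# in the spirit of the algorithm analyzed in this paper (Sinclair, Banerjee & Yu, 2019).
# The environment interface, bonus constant, learning rate, and splitting rule
# below are simplifying assumptions, not the authors' exact specification.


class Ball:
    """A region of the [0,1] x [0,1] state-action space kept by the adaptive partition."""

    def __init__(self, center, radius, H):
        self.center = center      # (state, action) center of the ball
        self.radius = radius
        self.q = float(H)         # optimistic initial Q-value
        self.n = 0                # visit count
        self.children = []

    def contains_state(self, s):
        return abs(s - self.center[0]) <= self.radius


def adaptive_q_learning(env_step, env_reset, H, K, lipschitz=1.0, c_bonus=1.0):
    """Run K episodes of horizon H; return the learned per-step partitions."""
    partitions = [[Ball((0.5, 0.5), 0.5, H)] for _ in range(H)]

    def relevant_balls(h, s):
        # Leaves of the step-h partition tree whose state slice contains s.
        out, stack = [], [b for b in partitions[h] if b.contains_state(s)]
        while stack:
            b = stack.pop()
            kids = [c for c in b.children if c.contains_state(s)]
            if kids:
                stack.extend(kids)
            else:
                out.append(b)
        return out

    def value(h, s):
        if h >= H:
            return 0.0
        return min(H, max(b.q for b in relevant_balls(h, s)))

    for _ in range(K):
        s = env_reset()
        for h in range(H):
            ball = max(relevant_balls(h, s), key=lambda b: b.q)  # optimism: pick the best ball
            a = ball.center[1]                                   # play the ball's action
            s_next, r = env_step(s, a, h)

            ball.n += 1
            t = ball.n
            alpha = (H + 1) / (H + t)                            # step size from the tabular analysis
            bonus = c_bonus * math.sqrt(H ** 3 / t) + lipschitz * ball.radius
            ball.q = (1 - alpha) * ball.q + alpha * (r + value(h + 1, s_next) + bonus)

            # Refine the partition only where the algorithm keeps playing: split a ball
            # once its visit count is large relative to its radius.
            if not ball.children and ball.n >= (1.0 / ball.radius) ** 2:
                half = ball.radius / 2
                for ds in (-half, half):
                    for da in (-half, half):
                        child = Ball((ball.center[0] + ds, ball.center[1] + da), half, H)
                        child.q = ball.q
                        ball.children.append(child)
            s = s_next
    return partitions


# Toy usage: reward is highest when the action is close to the current state.
def env_reset():
    return random.random()


def env_step(s, a, h):
    reward = max(0.0, 1.0 - abs(s - a))
    s_next = min(1.0, max(0.0, s + random.uniform(-0.1, 0.1)))
    return s_next, reward


partitions = adaptive_q_learning(env_step, env_reset, H=5, K=200)
```

The adaptive ingredient is that balls are subdivided only where the algorithm keeps visiting, so the discretization concentrates on the near-optimal region rather than covering the whole space; this is the behavior that the paper's zooming-dimension analysis captures.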