Provably adaptive reinforcement learning in metric spaces

Authors: Tongyi Cao, Akshay Krishnamurthy

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We provide a refined analysis of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the zooming dimension of the instance. This parameter, which originates in the bandit literature, captures the size of the subsets of near-optimal actions and is always smaller than the covering dimension used in previous analyses. As such, our results are the first provably adaptive guarantees for reinforcement learning in metric spaces.
Researcher Affiliation | Collaboration | Tongyi Cao, University of Massachusetts Amherst, tcao@cs.umass.edu; Akshay Krishnamurthy, Microsoft Research NYC, akshaykr@microsoft.com
Pseudocode | Yes | Algorithm 1: Adaptive Q-learning (an illustrative sketch of this adaptive-discretization scheme appears after this table).
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | No | The paper presents a theoretical analysis of an algorithm in a reinforcement learning setting and does not mention the use of any specific, publicly available datasets for training or experimentation.
Dataset Splits | No | The paper presents a theoretical analysis and does not involve experimental validation with dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup, so no hardware specifications are provided.
Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers for reproducibility.
Experiment Setup | No | The paper presents a theoretical analysis and does not include details of an experimental setup, such as hyperparameters or system-level training settings.
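For reference, the Pseudocode row refers to Algorithm 1 of Sinclair, Banerjee, and Yu (2019), which this paper re-analyzes rather than modifies. The following is a minimal, hypothetical Python sketch of the adaptive-discretization idea behind that algorithm, on a one-dimensional state and action space in [0, 1]; the class names, learning rate, bonus term, splitting threshold, and toy environment are illustrative assumptions and do not reproduce the authors' exact specification or constants.

```python
import math
import random

# Minimal, hypothetical sketch of adaptive discretization in the spirit of
# Adaptive Q-learning (Sinclair, Banerjee, and Yu, 2019), the algorithm this
# paper re-analyzes. State and action spaces are both [0, 1] under the
# absolute-value metric; the constants, bonus term, and splitting rule below
# are illustrative assumptions, not the authors' exact specification.


class Ball:
    """A dyadic cell of the (state, action) space carrying its own Q estimate."""

    def __init__(self, center, radius, H):
        self.center = center    # (state, action) center of the cell
        self.radius = radius    # half-width of the cell in each coordinate
        self.q = float(H)       # optimistic initialization
        self.n = 0              # visit count
        self.children = []      # filled in once the cell is split


class AdaptiveQAgent:
    def __init__(self, H):
        self.H = H
        # one root cell covering [0, 1] x [0, 1] for every step h
        self.roots = [Ball((0.5, 0.5), 0.5, H) for _ in range(H)]

    def _relevant_leaves(self, ball, s):
        """Leaves of the partition tree whose state slice contains s."""
        if not ball.children:
            return [ball]
        leaves = []
        for child in ball.children:
            if abs(s - child.center[0]) <= child.radius:
                leaves.extend(self._relevant_leaves(child, s))
        return leaves

    def select(self, h, s):
        """Choose the relevant cell with the largest (optimistic) Q value."""
        return max(self._relevant_leaves(self.roots[h], s), key=lambda b: b.q)

    def value(self, h, s):
        """Optimistic value estimate used as the backup target."""
        if h >= self.H:
            return 0.0
        leaves = self._relevant_leaves(self.roots[h], s)
        return min(float(self.H), max(b.q for b in leaves))

    def update(self, ball, reward, next_value):
        """One optimistic Q-learning update on the chosen cell, then maybe split."""
        ball.n += 1
        t, H = ball.n, self.H
        alpha = (H + 1) / (H + t)                        # standard Q-learning rate
        bonus = math.sqrt(H ** 3 / t) + 2 * ball.radius  # confidence + metric slack
        ball.q = (1 - alpha) * ball.q + alpha * (reward + next_value + bonus)
        # refine the partition once a cell has been visited often for its size
        if not ball.children and ball.n >= (1.0 / ball.radius) ** 2:
            self._split(ball)

    def _split(self, ball):
        """Replace a cell by four half-radius children that inherit its estimate."""
        cs, ca = ball.center
        r = ball.radius / 2
        for ds in (-r, r):
            for da in (-r, r):
                child = Ball((cs + ds, ca + da), r, self.H)
                child.q = ball.q
                child.n = ball.n
                ball.children.append(child)


if __name__ == "__main__":
    # Toy episodic run: a 1-D random walk whose reward favors actions near 0.7.
    H, K = 5, 200
    agent = AdaptiveQAgent(H)
    for _ in range(K):
        s = random.random()
        for h in range(H):
            ball = agent.select(h, s)
            a = ball.center[1]                     # play the cell's center action
            reward = max(0.0, 1.0 - abs(a - 0.7))  # hypothetical reward signal
            s_next = min(1.0, max(0.0, s + random.uniform(-0.1, 0.1)))
            agent.update(ball, reward, agent.value(h + 1, s_next))
            s = s_next
```

The design point the sketch tries to convey is that the partition is refined only where the learner actually spends time, which is why, as the paper argues, the regret can be controlled by the zooming dimension of the near-optimal region rather than the covering dimension of the whole space.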