Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Provably adaptive reinforcement learning in metric spaces

Authors: Tongyi Cao, Akshay Krishnamurthy

NeurIPS 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "We provide a refined analysis of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the zooming dimension of the instance. This parameter, which originates in the bandit literature, captures the size of the subsets of near-optimal actions and is always smaller than the covering dimension used in previous analyses. As such, our results are the first provably adaptive guarantees for reinforcement learning in metric spaces."
Researcher Affiliation | Collaboration | Tongyi Cao (University of Massachusetts Amherst, EMAIL); Akshay Krishnamurthy (Microsoft Research NYC, EMAIL)
Pseudocode | Yes | Algorithm 1: Adaptive Q-learning
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | No | The paper presents a theoretical analysis of a reinforcement-learning algorithm and does not mention the use of any specific, publicly available datasets for training or experimentation.
Dataset Splits | No | The paper presents a theoretical analysis and does not involve experimental validation with dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe an experimental setup, so no hardware specifications are provided.
Software Dependencies | No | The paper is theoretical and does not list software dependencies with version numbers for reproducibility.
Experiment Setup | No | The paper presents a theoretical analysis and does not include details of an experimental setup, such as hyperparameters or system-level training settings.
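The paper's Algorithm 1 (Adaptive Q-learning) builds on adaptive discretization: the action space is covered by balls that are split more finely where the algorithm accumulates visits, so refinement concentrates on near-optimal regions. The sketch below is a loose illustration of that idea in a one-armed (bandit-style) setting only, not the authors' algorithm: the splitting threshold, the optimism bonus, the reward function, and all constants are hypothetical choices made for the example.

```python
import random


class Ball:
    """One ball in an adaptive partition of the action space [0, 1]."""

    def __init__(self, center, radius):
        self.center = center
        self.radius = radius
        self.visits = 0
        self.q = 1.0  # optimistic initial value estimate


def select_ball(balls):
    """Pick the ball with the largest upper-confidence value (q + radius bonus)."""
    return max(balls, key=lambda b: b.q + b.radius)


def update(ball, reward, balls):
    """Running-average value update; once a ball's visit count exceeds
    1/radius^2 (a hypothetical threshold), replace it with two children of
    half the radius, refining the partition locally."""
    ball.visits += 1
    ball.q += (reward - ball.q) / ball.visits
    if ball.visits >= 1.0 / ball.radius ** 2 and ball.radius > 1 / 64:
        balls.remove(ball)
        r = ball.radius / 2
        for c in (ball.center - r, ball.center + r):
            child = Ball(c, r)
            child.q = ball.q  # children inherit the parent's estimate
            balls.append(child)


def run(steps=2000, seed=0):
    rng = random.Random(seed)
    balls = [Ball(0.5, 0.5)]  # a single ball initially covers [0, 1]
    total = 0.0
    for _ in range(steps):
        b = select_ball(balls)
        # Hypothetical noisy reward, peaked at action a = 0.3.
        reward = max(0.0, 1.0 - abs(b.center - 0.3)) + rng.gauss(0.0, 0.1)
        total += reward
        update(b, reward, balls)
    return balls, total
```

After a run, the partition contains more (smaller) balls than it started with, with the finest resolution drawn toward the high-reward region — a toy analogue of the zooming-dimension behavior the paper analyzes.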