Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning

Authors: Yingjie Fei, Ruitu Xu

ICML 2022

Reproducibility (variable, result, LLM response)

Research Type: Theoretical. In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to the underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds.

Researcher Affiliation: Collaboration. (1) Bloomberg, New York, USA; (2) Department of Statistics and Data Science, Yale University, USA.

Pseudocode: Yes. Algorithm 1 (RSVI2); Algorithm 2 (RSQ2).

Open Source Code: No. The paper does not provide any explicit statement or link indicating that its source code is publicly available.

Open Datasets: No. The paper develops a theoretical framework for reinforcement learning and does not mention the use of any public or open dataset for training or evaluation.

Dataset Splits: No. The paper is theoretical and does not involve empirical data or dataset splits for training or validation.

Hardware Specification: No. The paper is theoretical and does not describe any hardware used for experiments.

Software Dependencies: No. The paper is theoretical and does not mention any software dependencies or version numbers required to run experiments.

Experiment Setup: No. The paper is theoretical: it defines algorithms but provides no experimental setup details such as hyperparameter values or training configurations.
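For readers unfamiliar with the risk measure named in the abstract: the entropic risk measure of a random return X with risk parameter β ≠ 0 is (1/β) log E[exp(βX)]; β < 0 gives risk-averse behavior, β > 0 risk-seeking, and the expectation is recovered as β → 0. A minimal numerical sketch of this definition (not the paper's RSVI2/RSQ2 algorithms; the sample returns below are made up for illustration):

```python
import numpy as np

def entropic_risk(returns, beta):
    """Entropic risk measure: (1/beta) * log E[exp(beta * X)],
    estimated from samples by replacing E with the empirical mean."""
    returns = np.asarray(returns, dtype=float)
    return np.log(np.mean(np.exp(beta * returns))) / beta

# Hypothetical sample of episode returns, for illustration only.
returns = np.array([0.0, 1.0, 2.0])
mean = returns.mean()                          # risk-neutral value
risk_averse = entropic_risk(returns, -2.0)     # beta < 0: penalizes variability
risk_seeking = entropic_risk(returns, 2.0)     # beta > 0: rewards variability
```

By Jensen's inequality, the risk-averse value lies below the mean and the risk-seeking value above it, which is the structure the paper's cascaded gaps are built around.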