Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning
Authors: Yingjie Fei, Ruitu Xu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to the underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds. |
| Researcher Affiliation | Collaboration | (1) Bloomberg, New York, USA; (2) Department of Statistics and Data Science, Yale University, USA. |
| Pseudocode | Yes | Algorithm 1 RSVI2; Algorithm 2 RSQ2 |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that its source code is publicly available. |
| Open Datasets | No | The paper describes a theoretical framework for reinforcement learning and does not mention the use of any specific public or open dataset for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical data or dataset splits for validation or training. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies or version numbers for running experiments. |
| Experiment Setup | No | The paper is theoretical and defines algorithms but does not provide experimental setup details like hyperparameter values or training configurations. |
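The paper's analysis is built on the entropic risk measure. For intuition, the standard form of that measure, $\rho_\beta(X) = \frac{1}{\beta}\log \mathbb{E}[e^{\beta X}]$, can be sketched for a discrete distribution as below; this is an illustrative helper, not code from the paper, and the function name and example values are our own.

```python
import math

def entropic_risk(rewards, probs, beta):
    """Entropic risk measure: (1/beta) * log E[exp(beta * X)].

    beta > 0 is risk-seeking, beta < 0 is risk-averse; as beta -> 0
    the measure recovers the plain expectation E[X].
    """
    if beta == 0:
        return sum(p * x for p, x in zip(probs, rewards))
    # Moment-generating function of X at beta, then a log-scale normalization.
    mgf = sum(p * math.exp(beta * x) for p, x in zip(probs, rewards))
    return math.log(mgf) / beta

# Hypothetical example: a fair coin paying reward 0 or 1.
rewards, probs = [0.0, 1.0], [0.5, 0.5]
risk_neutral = entropic_risk(rewards, probs, 0.0)    # expectation, 0.5
risk_averse = entropic_risk(rewards, probs, -2.0)    # below 0.5
risk_seeking = entropic_risk(rewards, probs, 2.0)    # above 0.5
```

The ordering risk-averse < risk-neutral < risk-seeking follows from Jensen's inequality and matches the role the risk parameter plays in the paper's regret bounds.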