reproducibilityindex.ai

Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning

Authors: Yingjie Fei, Ruitu Xu

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to the underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds.
Researcher Affiliation	Collaboration	Bloomberg, New York, USA; Department of Statistics and Data Science, Yale University, USA.
Pseudocode	Yes	Algorithm 1 RSVI2; Algorithm 2 RSQ2
Open Source Code	No	The paper does not provide any explicit statement or link indicating that its source code is publicly available.
Open Datasets	No	The paper describes a theoretical framework for reinforcement learning and does not mention the use of any specific public or open dataset for training or evaluation.
Dataset Splits	No	The paper is theoretical and does not involve empirical data or dataset splits for validation or training.
Hardware Specification	No	The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies	No	The paper is theoretical and does not mention any specific software dependencies or version numbers for running experiments.
Experiment Setup	No	The paper is theoretical and defines algorithms but does not provide experimental setup details like hyperparameter values or training configurations.
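The abstract builds on the entropic risk measure without defining it. The sketch below is a minimal illustration under stated assumptions. It uses the standard form (1/β) log E[exp(β X)] for a discrete random variable X, with the common sign convention: β > 0 is risk-seeking, β < 0 is risk-averse, and β → 0 recovers the plain expectation. The function name `entropic_risk` is hypothetical and does not come from the paper. The sketch does not reproduce the paper's episodic MDP setup, its cascaded gaps, or the RSVI2/RSQ2 algorithms.

```python
import math


def entropic_risk(values, probs, beta):
    """Hypothetical helper: entropic risk (1/beta) * log E[exp(beta * X)]
    of a discrete random variable X taking `values` with `probs`.

    Assumed convention: beta > 0 is risk-seeking, beta < 0 is risk-averse,
    and beta == 0 is treated as the limiting case E[X].
    """
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if beta == 0:
        # Limit beta -> 0: the entropic risk reduces to the expectation.
        return sum(p * v for v, p in zip(values, probs))
    # Compute log E[exp(beta * X)] with the log-sum-exp trick for stability,
    # skipping zero-probability outcomes (log(0) is undefined).
    terms = [math.log(p) + beta * v for v, p in zip(values, probs) if p > 0]
    m = max(terms)
    log_mgf = m + math.log(sum(math.exp(t - m) for t in terms))
    return log_mgf / beta
```

As a usage example, take a fair coin that pays 0 or 1. Its expectation is 0.5. `entropic_risk([0, 1], [0.5, 0.5], -1.0)` is about 0.38, below the mean (risk-averse). With `beta=1.0`, the value is log((1 + e)/2), about 0.62, above the mean (risk-seeking).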