Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning
Authors: Yingjie Fei, Ruitu Xu
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement learning based on the entropic risk measure. We propose a novel definition of sub-optimality gaps, which we call cascaded gaps, and we discuss their key components that adapt to the underlying structures of the problem. Based on the cascaded gaps, we derive non-asymptotic and logarithmic regret bounds for two model-free algorithms under episodic Markov decision processes. We show that, in appropriate settings, these bounds feature exponential improvement over existing ones that are independent of gaps. We also prove gap-dependent lower bounds, which certify the near optimality of the upper bounds. |
| Researcher Affiliation | Collaboration | (1) Bloomberg, New York, USA; (2) Department of Statistics and Data Science, Yale University, USA. |
| Pseudocode | Yes | Algorithm 1 RSVI2; Algorithm 2 RSQ2 |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that its source code is publicly available. |
| Open Datasets | No | The paper describes a theoretical framework for reinforcement learning and does not mention the use of any specific public or open dataset for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical data or dataset splits for validation or training. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies or version numbers for running experiments. |
| Experiment Setup | No | The paper is theoretical and defines algorithms but does not provide experimental setup details like hyperparameter values or training configurations. |