Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

Authors: Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We propose two provably efficient model-free algorithms, Risk-Sensitive Value Iteration (RSVI) and Risk-Sensitive Q-learning (RSQ). These algorithms implement a form of risk-sensitive optimism in the face of uncertainty, which adapts to both risk-seeking and risk-averse modes of exploration. We prove that RSVI attains an Õ(λ(|β|H²)·√(H³S²AT)) regret, while RSQ attains an Õ(λ(|β|H²)·√(H⁴SAT)) regret, where λ(u) = (e³ᵘ − 1)/u for u > 0. ... On the flip side, we establish a regret lower bound showing that the exponential dependence on |β| and H is unavoidable for any algorithm with an Õ(√T) regret (even when the risk objective is on the same scale as the original reward), thus certifying the near-optimality of the proposed algorithms.
Researcher Affiliation | Academia | Northwestern University (zhaoranwang@gmail.com); Princeton University (zy6@princeton.edu); Cornell University (yf275@cornell.edu, {yudong.chen, qiaomin.xie}@cornell.edu)
Pseudocode | Yes | Algorithm 1 RSVI. Input: number of episodes K ∈ Z₍>0₎, confidence level δ ∈ (0, 1], and risk parameter β ≠ 0
Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not involve empirical training on a dataset.
Dataset Splits | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not involve empirical validation on a dataset, and therefore no dataset splits are provided.
Hardware Specification | No | This is a theoretical paper that focuses on algorithm design and analysis, and therefore, it does not describe any specific hardware used for running experiments.
Software Dependencies | No | This is a theoretical paper focused on algorithms and proofs; it does not describe specific software dependencies with version numbers used for implementation or experiments.
Experiment Setup | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not describe an empirical experimental setup with hyperparameters or training configurations.
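The regret bounds quoted above share the prefactor λ(|β|H²), where λ(u) = (e³ᵘ − 1)/u for u > 0, which makes the exponential dependence on the risk parameter β and horizon H concrete. A minimal sketch of that growth (the helper name `lam` and the horizon H = 5 are illustrative assumptions, not values from the paper):

```python
import math

def lam(u: float) -> float:
    """The paper's prefactor lambda(u) = (e^{3u} - 1)/u for u > 0,
    extended by continuity to lam(0) = 3 (the limit as u -> 0)."""
    if u == 0:
        return 3.0
    return (math.exp(3 * u) - 1) / u

# How the regret prefactor lambda(|beta| * H^2) grows with the
# risk parameter, for an assumed horizon H = 5:
H = 5
for beta in (0.0, 0.01, 0.1, 0.5):
    print(f"beta = {beta}: prefactor = {lam(abs(beta) * H**2):.2f}")
```

For β = 0 the factor degenerates to a constant (the risk-neutral regime), while even moderate |β| inflates it exponentially through the e³ᵘ term, which is exactly the dependence the paper's lower bound shows is unavoidable.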
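The Algorithm 1 input line quoted in the Pseudocode row belongs to RSVI, which targets the entropic (exponential-utility) risk objective. As a hedged illustration only: the sketch below runs the underlying risk-sensitive Bellman recursion by backward induction on a known tabular model; it is not the paper's model-free RSVI, which learns from sampled episodes with optimism bonuses. All function names, array shapes, and the planning (known-model) setting are assumptions for the sketch.

```python
import numpy as np

def risk_sensitive_value_iteration(P, R, H, beta):
    """Backward induction for the entropic objective
        V_h(s) = max_a [ R[s,a] + (1/beta) * log E_{s'~P(.|s,a)} exp(beta * V_{h+1}(s')) ],
    with beta > 0 risk-seeking and beta < 0 risk-averse.

    P: (S, A, S) transition tensor, rows summing to 1 over the last axis.
    R: (S, A) reward table.  H: horizon.  beta: nonzero risk parameter.
    Returns V of shape (H+1, S) with the terminal convention V[H] = 0.
    """
    S, A, _ = P.shape
    V = np.zeros((H + 1, S))
    for h in range(H - 1, -1, -1):
        # Certainty equivalent of the next-step value under each (s, a):
        # (1/beta) * log sum_{s'} P[s,a,s'] * exp(beta * V_{h+1}(s'))
        cert_equiv = (1.0 / beta) * np.log(P @ np.exp(beta * V[h + 1]))  # (S, A)
        V[h] = (R + cert_equiv).max(axis=1)
    return V
```

By Jensen's inequality the certainty equivalent is at least the mean for β > 0 and at most the mean for β < 0, so the risk-seeking value dominates the risk-neutral one, which in turn dominates the risk-averse one; this is the asymmetry the paper's "risk-sensitive optimism" has to account for during exploration.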