Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

Authors: Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We propose two provably efficient model-free algorithms, Risk-Sensitive Value Iteration (RSVI) and Risk-Sensitive Q-learning (RSQ). These algorithms implement a form of risk-sensitive optimism in the face of uncertainty, which adapts to both risk-seeking and risk-averse modes of exploration. We prove that RSVI attains an Õ(λ(|β|H²)·√(H³S²AT)) regret, while RSQ attains an Õ(λ(|β|H²)·√(H⁴SAT)) regret, where λ(u) = (e³ᵘ − 1)/u for u > 0. ... On the flip side, we establish a regret lower bound showing that the exponential dependence on |β| and H is unavoidable for any algorithm with an Õ(√T) regret (even when the risk objective is on the same scale as the original reward), thus certifying the near-optimality of the proposed algorithms.
Researcher Affiliation | Academia | Northwestern University (zhaoranwang@gmail.com); Princeton University (zy6@princeton.edu); Cornell University (yf275@cornell.edu, {yudong.chen, qiaomin.xie}@cornell.edu)
Pseudocode | Yes | Algorithm 1 RSVI. Input: number of episodes K ∈ Z₍>0₎, confidence level δ ∈ (0, 1], and risk parameter β ≠ 0
Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not involve empirical training on a dataset.
Dataset Splits | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not involve empirical validation on a dataset, and therefore no dataset splits are provided.
Hardware Specification | No | This is a theoretical paper that focuses on algorithm design and analysis, and therefore, it does not describe any specific hardware used for running experiments.
Software Dependencies | No | This is a theoretical paper focused on algorithms and proofs; it does not describe specific software dependencies with version numbers used for implementation or experiments.
Experiment Setup | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not describe an empirical experimental setup with hyperparameters or training configurations.
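The regret bounds quoted above share the prefactor λ(|β|H²), where λ(u) = (e³ᵘ − 1)/u for u > 0, which makes the exponential dependence on the risk parameter β and horizon H concrete. A minimal sketch of that growth (the helper name `lam` and the horizon H = 5 are illustrative assumptions, not values from the paper):

```python
import math

def lam(u: float) -> float:
    """The paper's prefactor lambda(u) = (e^{3u} - 1)/u for u > 0,
    extended by continuity to lam(0) = 3 (the limit as u -> 0)."""
    if u == 0:
        return 3.0
    return (math.exp(3 * u) - 1) / u

# How the regret prefactor lambda(|beta| * H^2) grows with the
# risk parameter, for an assumed horizon H = 5:
H = 5
for beta in (0.0, 0.01, 0.1, 0.5):
    print(f"beta = {beta}: prefactor = {lam(abs(beta) * H**2):.2f}")
```

For β = 0 the factor degenerates to a constant (the risk-neutral regime), while even moderate |β| inflates it exponentially through the e³ᵘ term, which is exactly the dependence the paper's lower bound shows is unavoidable.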
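The Algorithm 1 input line quoted in the Pseudocode row belongs to RSVI, which targets the entropic (exponential-utility) risk objective. As a hedged illustration only: the sketch below runs the underlying risk-sensitive Bellman recursion by backward induction on a known tabular model; it is not the paper's model-free RSVI, which learns from sampled episodes with optimism bonuses. All function names, array shapes, and the planning (known-model) setting are assumptions for the sketch.

```python
import numpy as np

def risk_sensitive_value_iteration(P, R, H, beta):
    """Backward induction for the entropic objective
        V_h(s) = max_a [ R[s,a] + (1/beta) * log E_{s'~P(.|s,a)} exp(beta * V_{h+1}(s')) ],
    with beta > 0 risk-seeking and beta < 0 risk-averse.

    P: (S, A, S) transition tensor, rows summing to 1 over the last axis.
    R: (S, A) reward table.  H: horizon.  beta: nonzero risk parameter.
    Returns V of shape (H+1, S) with the terminal convention V[H] = 0.
    """
    S, A, _ = P.shape
    V = np.zeros((H + 1, S))
    for h in range(H - 1, -1, -1):
        # Certainty equivalent of the next-step value under each (s, a):
        # (1/beta) * log sum_{s'} P[s,a,s'] * exp(beta * V_{h+1}(s'))
        cert_equiv = (1.0 / beta) * np.log(P @ np.exp(beta * V[h + 1]))  # (S, A)
        V[h] = (R + cert_equiv).max(axis=1)
    return V
```

By Jensen's inequality the certainty equivalent is at least the mean for β > 0 and at most the mean for β < 0, so the risk-seeking value dominates the risk-neutral one, which in turn dominates the risk-averse one; this is the asymmetry the paper's "risk-sensitive optimism" has to account for during exploration.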