Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret
Authors: Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose two provably efficient model-free algorithms, Risk-Sensitive Value Iteration (RSVI) and Risk-Sensitive Q-learning (RSQ). These algorithms implement a form of risk-sensitive optimism in the face of uncertainty, which adapts to both risk-seeking and risk-averse modes of exploration. We prove that RSVI attains an Õ(λ(|β|H²) · √(H³S²AT)) regret, while RSQ attains an Õ(λ(|β|H²) · √(H⁴SAT)) regret, where λ(u) = (e^{3u} − 1)/u for u > 0. ... On the flip side, we establish a regret lower bound showing that the exponential dependence on |β| and H is unavoidable for any algorithm with an Õ(√T) regret (even when the risk objective is on the same scale as the original reward), thus certifying the near-optimality of the proposed algorithms. (See the regret-scaling sketch after this table.) |
| Researcher Affiliation | Academia | Cornell University (yf275@cornell.edu; {yudong.chen, qiaomin.xie}@cornell.edu), Princeton University (zy6@princeton.edu), Northwestern University (zhaoranwang@gmail.com) |
| Pseudocode | Yes | Algorithm 1 RSVI. Input: number of episodes K ∈ ℤ>0, confidence level δ ∈ (0, 1], and risk parameter β ≠ 0. (See the value-iteration sketch after this table.) |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not involve empirical training on a dataset. |
| Dataset Splits | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not involve empirical validation on a dataset, and therefore no dataset splits are provided. |
| Hardware Specification | No | This is a theoretical paper that focuses on algorithm design and analysis, and therefore, it does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | This is a theoretical paper focused on algorithms and proofs; it does not describe specific software dependencies with version numbers used for implementation or experiments. |
| Experiment Setup | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not describe an empirical experimental setup with hyperparameters or training configurations. |
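
The regret bounds quoted in the Research Type row are stated only up to logarithmic factors. Below is a minimal Python sketch of the λ(u) factor and the two upper-bound scalings, assuming a plain reading of the abstract's formulas; the function names and the example parameter values are illustrative, not from the paper.

```python
import math

def risk_factor(u: float) -> float:
    """lambda(u) = (exp(3u) - 1) / u for u > 0, as defined in the abstract."""
    assert u > 0
    return (math.exp(3 * u) - 1) / u

def rsvi_regret_order(beta: float, H: int, S: int, A: int, T: int) -> float:
    """Order of the RSVI upper bound, O~(lambda(|beta| H^2) * sqrt(H^3 S^2 A T)),
    with log factors dropped (illustrative only)."""
    return risk_factor(abs(beta) * H**2) * math.sqrt(H**3 * S**2 * A * T)

def rsq_regret_order(beta: float, H: int, S: int, A: int, T: int) -> float:
    """Order of the RSQ upper bound, O~(lambda(|beta| H^2) * sqrt(H^4 S A T)),
    with log factors dropped (illustrative only)."""
    return risk_factor(abs(beta) * H**2) * math.sqrt(H**4 * S * A * T)

# Example: a mildly risk-averse setting (beta < 0 enters only through |beta|).
print(rsvi_regret_order(beta=-0.01, H=5, S=10, A=4, T=10_000))
print(rsq_regret_order(beta=-0.01, H=5, S=10, A=4, T=10_000))
```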
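For the Pseudocode row, the sketch below shows the known-model (planning) counterpart of the risk-sensitive Bellman backup that RSVI builds on, i.e. the entropic certainty-equivalent update V_h(s) = max_a { r_h(s,a) + (1/β) · log E_{s'}[exp(β · V_{h+1}(s'))] }. This is not the paper's Algorithm 1, which is model-free and adds an optimism bonus; the function name, array layout, and toy MDP are assumptions made for illustration.

```python
import numpy as np

def entropic_value_iteration(P, R, beta):
    """Backward induction for the entropic-risk (exponential-utility) objective.

    P: transition kernel, shape (H, S, A, S); R: rewards, shape (H, S, A);
    beta != 0 is the risk parameter (beta > 0 risk-seeking, beta < 0 risk-averse).
    Returns risk-sensitive values V (shape (H+1, S)) and a greedy policy pi.
    """
    H, S, A, _ = P.shape
    V = np.zeros((H + 1, S))
    pi = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):
        # Entropic certainty equivalent of the next-step value:
        # (1/beta) * log E_{s' ~ P_h(.|s,a)}[exp(beta * V_{h+1}(s'))], shape (S, A).
        cert_equiv = np.log(P[h] @ np.exp(beta * V[h + 1])) / beta
        Q = R[h] + cert_equiv
        V[h] = Q.max(axis=1)
        pi[h] = Q.argmax(axis=1)
    return V, pi

# Tiny random MDP just to exercise the routine (values are illustrative).
rng = np.random.default_rng(0)
H, S, A = 3, 4, 2
P = rng.dirichlet(np.ones(S), size=(H, S, A))   # shape (H, S, A, S)
R = rng.uniform(0.0, 1.0, size=(H, S, A))
V, pi = entropic_value_iteration(P, R, beta=-0.5)
print(V[0])
```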