Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret
Authors: Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose two provably efficient model-free algorithms, Risk-Sensitive Value Iteration (RSVI) and Risk-Sensitive Q-learning (RSQ). These algorithms implement a form of risk-sensitive optimism in the face of uncertainty, which adapts to both risk-seeking and risk-averse modes of exploration. We prove that RSVI attains an $\tilde{O}\big(\lambda(\lvert\beta\rvert H^2) \cdot \sqrt{H^3 S^2 A T}\big)$ regret, while RSQ attains an $\tilde{O}\big(\lambda(\lvert\beta\rvert H^2) \cdot \sqrt{H^4 S A T}\big)$ regret, where $\lambda(u) = (e^{3u} - 1)/u$ for $u > 0$. ... On the flip side, we establish a regret lower bound showing that the exponential dependence on $\lvert\beta\rvert$ and $H$ is unavoidable for any algorithm with an $\tilde{O}(\sqrt{T})$ regret (even when the risk objective is on the same scale as the original reward), thus certifying the near-optimality of the proposed algorithms. |
| Researcher Affiliation | Academia | 1 Northwestern University; 2 Princeton University; 3 Cornell University |
| Pseudocode | Yes | Algorithm 1 RSVI. Input: number of episodes $K \in \mathbb{Z}_{>0}$, confidence level $\delta \in (0, 1]$, and risk parameter $\beta \neq 0$ |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not involve empirical training on a dataset. |
| Dataset Splits | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not involve empirical validation on a dataset, and therefore no dataset splits are provided. |
| Hardware Specification | No | This is a theoretical paper that focuses on algorithm design and analysis, and therefore, it does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | This is a theoretical paper focused on algorithms and proofs; it does not describe specific software dependencies with version numbers used for implementation or experiments. |
| Experiment Setup | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not describe an empirical experimental setup with hyperparameters or training configurations. |
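
The Research Type row quotes the factor λ(u) = (e^{3u} − 1)/u, evaluated at u = |β|H², as the source of the exponential dependence on the risk parameter and horizon. The minimal sketch below only evaluates that formula numerically to show how quickly it grows. The function name `risk_factor` and the sample values of β and H are illustrative assumptions, not taken from the paper, and nothing here implements RSVI or RSQ.

```python
import math


def risk_factor(u: float) -> float:
    """Evaluate lambda(u) = (exp(3u) - 1) / u for u > 0.

    Hypothetical helper name; only the formula comes from the quoted abstract.
    """
    if u <= 0:
        raise ValueError("lambda(u) is defined here only for u > 0")
    # expm1 computes exp(x) - 1 accurately when x is close to zero.
    return math.expm1(3.0 * u) / u


if __name__ == "__main__":
    # Illustrative (assumed) settings: the factor is evaluated at u = |beta| * H^2.
    for beta, horizon in [(0.001, 5), (0.01, 5), (0.1, 5), (-0.1, 10)]:
        u = abs(beta) * horizon**2
        print(f"beta={beta:+.3f}  H={horizon:2d}  u={u:.3f}  lambda(u)={risk_factor(u):.4g}")
```

As u approaches 0 the factor tends to 3, so for small |β|H² it contributes only a constant, while for larger values it grows exponentially, which matches the lower-bound remark quoted in the same row.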