Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

Authors: Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We propose two provably efficient model-free algorithms, Risk-Sensitive Value Iteration (RSVI) and Risk-Sensitive Q-learning (RSQ). These algorithms implement a form of risk-sensitive optimism in the face of uncertainty, which adapts to both risk-seeking and risk-averse modes of exploration. We prove that RSVI attains an $\tilde{O}\big(\lambda(|\beta| H^2) \cdot \sqrt{H^3 S^2 A T}\big)$ regret, while RSQ attains an $\tilde{O}\big(\lambda(|\beta| H^2) \cdot \sqrt{H^4 S A T}\big)$ regret, where $\lambda(u) = (e^{3u} - 1)/u$ for $u > 0$. ... On the flip side, we establish a regret lower bound showing that the exponential dependence on $|\beta|$ and $H$ is unavoidable for any algorithm with an $\tilde{O}(\sqrt{T})$ regret (even when the risk objective is on the same scale as the original reward), thus certifying the near-optimality of the proposed algorithms.
Researcher Affiliation | Academia | 1 Northwestern University; EMAIL, EMAIL 2 Princeton University; EMAIL 3 Cornell University; EMAIL
Pseudocode | Yes | Algorithm 1 RSVI. Input: number of episodes $K \in \mathbb{Z}_{>0}$, confidence level $\delta \in (0, 1]$, and risk parameter $\beta \neq 0$
Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not involve empirical training on a dataset.
Dataset Splits | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not involve empirical validation on a dataset, and therefore no dataset splits are provided.
Hardware Specification | No | This is a theoretical paper that focuses on algorithm design and analysis, and therefore it does not describe any specific hardware used for running experiments.
Software Dependencies | No | This is a theoretical paper focused on algorithms and proofs; it does not describe specific software dependencies with version numbers used for implementation or experiments.
Experiment Setup | No | This is a theoretical paper that presents algorithms and their regret analysis; it does not describe an empirical experimental setup with hyperparameters or training configurations.
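
The factor λ(u) = (e^{3u} − 1)/u quoted in the Research Type row is what makes the regret bounds exponential in |β| and H. The following is a minimal illustrative sketch, not code from the paper: the function name `lam` and the sample values of β and H are assumptions chosen only to show how quickly the factor grows when evaluated at u = |β|H².

```python
import math


def lam(u: float) -> float:
    """Evaluate lambda(u) = (e^{3u} - 1) / u for u > 0.

    Hypothetical helper name; the formula is the one quoted in the
    Research Type row above.
    """
    if u <= 0:
        raise ValueError("lambda(u) is only defined here for u > 0")
    # math.expm1(x) computes e^x - 1 accurately even when x is tiny.
    return math.expm1(3.0 * u) / u


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the paper).
    for beta, horizon in [(0.01, 5), (0.1, 5), (-0.1, 10), (0.5, 10)]:
        u = abs(beta) * horizon ** 2
        print(f"beta={beta:+.2f}  H={horizon:2d}  u={u:6.2f}  lambda(u)={lam(u):.4g}")
```

For small u the factor stays close to 3, its limit as u → 0, so a nearly risk-neutral setting pays only a constant; as |β|H² grows, the factor grows exponentially, which is the dependence the lower bound describes as unavoidable.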