Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach

Authors: Yingjie Fei, Zhuoran Yang, Zhaoran Wang

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study function approximation for episodic reinforcement learning with the entropic risk measure. We first propose an algorithm with linear function approximation. Compared to existing algorithms, which suffer from improper regularization and regression biases, this algorithm features debiasing transformations in its backward induction and regression procedures. We further propose an algorithm with general function approximation, which is shown to perform implicit debiasing transformations. We prove that both algorithms achieve sublinear regret and demonstrate a tradeoff between generality and efficiency. Our analysis provides a unified framework for function approximation in risk-sensitive reinforcement learning, which leads to the first sublinear regret bounds in this setting.
Researcher Affiliation | Academia | (1) Northwestern University, Evanston, Illinois, USA; (2) Princeton University, Princeton, New Jersey, USA.
Pseudocode | Yes | Algorithm 1 (Meta RSVI), Algorithm 2 (RSVI-L), Algorithm 3 (RSVI-G)
Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a code repository.
Open Datasets | No | The paper is theoretical, focusing on algorithm design and regret bounds for reinforcement learning. It defines an episodic MDP and makes assumptions about function approximation, but it does not specify or use any publicly available dataset for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not conduct experiments on datasets, and thus does not specify training, validation, or test splits.
Hardware Specification | No | The paper is theoretical and does not report experiments that would require specific hardware. No hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and focuses on algorithm design and analysis. It does not mention any software dependencies with version numbers required for implementation or experimentation.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or system-level training settings.
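
For context on the objective named in the abstract and pseudocode rows above, the sketch below computes the entropic risk measure of sampled returns and runs an exponential-utility backward induction on a small tabular MDP. It is a minimal illustrative toy under assumed names and a tabular setting, not the paper's RSVI-L or RSVI-G algorithms, which use linear or general function approximation together with the debiasing transformations described in the abstract.

```python
# Minimal sketch (not the paper's RSVI-L/RSVI-G): the entropic risk measure
# and an exponential-utility backward induction on an assumed tabular MDP.
import numpy as np

def entropic_risk(returns, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)] of sampled returns.

    beta < 0 is risk-averse, beta > 0 is risk-seeking; as beta -> 0 the
    measure approaches the ordinary expectation.
    """
    x = beta * np.asarray(returns, dtype=float)
    m = x.max()  # log-sum-exp shift for numerical stability
    return (m + np.log(np.mean(np.exp(x - m)))) / beta

def risk_sensitive_backward_induction(P, R, beta):
    """Finite-horizon tabular backup under the exponential Bellman relation
    exp(beta * Q_h(s, a)) = E_{s'}[exp(beta * (R_h(s, a) + V_{h+1}(s')))].

    P: transitions of shape (H, S, A, S); R: rewards of shape (H, S, A).
    Returns V of shape (H + 1, S) with V[H] = 0.
    """
    H, S, A, _ = P.shape
    V = np.zeros((H + 1, S))
    for h in reversed(range(H)):
        exp_next = P[h] @ np.exp(beta * V[h + 1])  # E[exp(beta * V_{h+1})], shape (S, A)
        Q = R[h] + np.log(exp_next) / beta         # entropic-risk Q-values
        V[h] = Q.max(axis=1)                       # greedy over actions
    return V

# Example: a risk-averse agent values a spread of returns below their mean.
print(entropic_risk([1.0, 2.0, 10.0], beta=-1.0))

# Example: a 2-state, 2-action, horizon-3 MDP with random dynamics.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(2), size=(3, 2, 2))  # (H, S, A, S)
R = rng.uniform(size=(3, 2, 2))                # (H, S, A)
print(risk_sensitive_backward_induction(P, R, beta=-0.5)[0])
```

As beta approaches 0 the backup reduces to standard value iteration, which makes a quick sanity check; the rapid growth of exp(beta * return) for large |beta| or long horizons also hints, loosely, at why regressing on exponentiated values is delicate, which is the kind of bias the paper's debiasing transformations are designed to address.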