Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach

Authors: Yingjie Fei, Zhuoran Yang, Zhaoran Wang

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study function approximation for episodic reinforcement learning with the entropic risk measure. We first propose an algorithm with linear function approximation. Compared to existing algorithms, which suffer from improper regularization and regression biases, this algorithm features debiasing transformations in its backward induction and regression procedures. We further propose an algorithm with general function approximation, which is shown to perform implicit debiasing transformations. We prove that both algorithms achieve sublinear regret and demonstrate a tradeoff between generality and efficiency. Our analysis provides a unified framework for function approximation in risk-sensitive reinforcement learning, which leads to the first sublinear regret bounds in this setting.
Researcher Affiliation | Academia | (1) Northwestern University, Evanston, Illinois, USA; (2) Princeton University, Princeton, New Jersey, USA.
Pseudocode | Yes | Algorithm 1 (Meta RSVI), Algorithm 2 (RSVI-L), Algorithm 3 (RSVI-G)
Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a code repository.
Open Datasets | No | The paper is theoretical, focusing on algorithm design and regret bounds for reinforcement learning. It defines an episodic MDP and makes assumptions about function approximation, but it does not specify or use any publicly available dataset for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not conduct experiments on datasets, and thus does not specify training, validation, or test splits.
Hardware Specification | No | The paper is theoretical and does not report experiments that would require specific hardware. No hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and focuses on algorithm design and analysis. It does not mention any software dependencies with version numbers required for implementation or experimentation.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or system-level training settings.
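
For context on the objective named in the abstract and pseudocode rows above, the sketch below computes the entropic risk measure of sampled returns and runs an exponential-utility backward induction on a small tabular MDP. It is a minimal illustrative toy under assumed names and a tabular setting, not the paper's RSVI-L or RSVI-G algorithms, which use linear or general function approximation together with the debiasing transformations described in the abstract.

```python
# Minimal sketch (not the paper's RSVI-L/RSVI-G): the entropic risk measure
# and an exponential-utility backward induction on an assumed tabular MDP.
import numpy as np

def entropic_risk(returns, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)] of sampled returns.

    beta < 0 is risk-averse, beta > 0 is risk-seeking; as beta -> 0 the
    measure approaches the ordinary expectation.
    """
    x = beta * np.asarray(returns, dtype=float)
    m = x.max()  # log-sum-exp shift for numerical stability
    return (m + np.log(np.mean(np.exp(x - m)))) / beta

def risk_sensitive_backward_induction(P, R, beta):
    """Finite-horizon tabular backup under the exponential Bellman relation
    exp(beta * Q_h(s, a)) = E_{s'}[exp(beta * (R_h(s, a) + V_{h+1}(s')))].

    P: transitions of shape (H, S, A, S); R: rewards of shape (H, S, A).
    Returns V of shape (H + 1, S) with V[H] = 0.
    """
    H, S, A, _ = P.shape
    V = np.zeros((H + 1, S))
    for h in reversed(range(H)):
        exp_next = P[h] @ np.exp(beta * V[h + 1])  # E[exp(beta * V_{h+1})], shape (S, A)
        Q = R[h] + np.log(exp_next) / beta         # entropic-risk Q-values
        V[h] = Q.max(axis=1)                       # greedy over actions
    return V

# Example: a risk-averse agent values a spread of returns below their mean.
print(entropic_risk([1.0, 2.0, 10.0], beta=-1.0))

# Example: a 2-state, 2-action, horizon-3 MDP with random dynamics.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(2), size=(3, 2, 2))  # (H, S, A, S)
R = rng.uniform(size=(3, 2, 2))                # (H, S, A)
print(risk_sensitive_backward_induction(P, R, beta=-0.5)[0])
```

As beta approaches 0 the backup reduces to standard value iteration, which makes a quick sanity check; the rapid growth of exp(beta * return) for large |beta| or long horizons also hints, loosely, at why regressing on exponentiated values is delicate, which is the kind of bias the paper's debiasing transformations are designed to address.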