Regret Bounds for Risk-Sensitive Reinforcement Learning

Authors: Osbert Bastani, Yecheng Jason Ma, Estelle Shen, Wanqiao Xu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We prove the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives including the popular CVaR objective. Our theory is based on a novel characterization of the CVaR objective as well as a novel optimistic MDP construction. Figure 1: Results on the frozen lake environment. Left: Regret of our algorithm vs. UCBVI (with expected return) and a greedy exploration strategy. Right: Regret of our algorithm across different α values. We show mean and standard deviation across five random seeds. (An illustrative CVaR estimator sketch follows the table.)
Researcher Affiliation | Academia | Osbert Bastani, University of Pennsylvania, obastani@seas.upenn.edu; Yecheng Jason Ma, University of Pennsylvania, jasonyma@seas.upenn.edu; Estelle Shen, University of Pennsylvania, pixna@sas.upenn.edu; Wanqiao Xu, Stanford University, wanqiaox@stanford.edu
Pseudocode | Yes | Algorithm 1: Upper Confidence Bound Algorithm (a generic UCB value-iteration sketch follows the table)
Open Source Code | No | No statement or link regarding the availability of source code for the described methodology is provided in the paper.
Open Datasets | No | We consider a classic frozen lake problem with a finite horizon... The paper does not provide concrete access information (link, DOI, or citation with authors/year) for a publicly available dataset specifically used for the frozen lake environment setup.
Dataset Splits | No | The paper operates in an episodic reinforcement learning setting and discusses the number of episodes (K), but does not specify dataset splits (e.g., train/validation/test percentages or counts) as typically found in supervised learning.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, cloud instances) used for running experiments are mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup | Yes | We consider a classic frozen lake problem with a finite horizon. The agent moves to a block next to its current state at each timestep t and has a slipping probability of 0.1 in its moving direction if the next state is an ice block... We use a map with four paths of the same lengths that have different rewards at the end and different levels of risk of falling into holes. We consider α ∈ {0.40, 0.33, 0.25, 0.01}. (A toy frozen-lake environment sketch follows the table.)
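
Since the paper centers on the CVaR objective, the following is a minimal, illustrative sketch of an empirical CVaR estimator over sampled episode returns (the mean of the worst α-fraction of returns). It is a hypothetical helper for intuition only and is not the paper's characterization or its optimistic MDP construction.

```python
import numpy as np

def empirical_cvar(returns, alpha):
    """Estimate CVaR_alpha as the mean of the worst alpha-fraction of returns.

    Illustrative only: the paper analyzes CVaR through a novel
    characterization and an optimistic MDP construction, not through
    Monte Carlo estimation of sampled returns.
    """
    sorted_returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(sorted_returns))))
    return sorted_returns[:k].mean()

# Example at the alpha levels used in the experiments
rng = np.random.default_rng(0)
returns = rng.uniform(0.0, 1.0, size=1000)
for alpha in (0.40, 0.33, 0.25, 0.01):
    print(f"CVaR_{alpha}: {empirical_cvar(returns, alpha):.3f}")
```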
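
The paper's Algorithm 1 is an upper confidence bound algorithm; its risk-sensitive construction is not reproduced here. For orientation, below is a generic, assumed UCBVI-style backward induction with count-based bonuses. The function name, data layout, and bonus form are all assumptions, included only to show the general shape of optimistic value iteration.

```python
import numpy as np

def optimistic_value_iteration(counts, reward_sum, trans_counts, horizon, c=1.0):
    """Generic UCB-style backward induction for a finite-horizon tabular MDP.

    counts[s, a]       : visit count of state-action pair (s, a)
    reward_sum[s, a]   : sum of observed rewards at (s, a)
    trans_counts[s, a] : vector of observed next-state counts at (s, a)

    Sketch only; the paper's CVaR-specific optimism is not shown.
    """
    n_states, n_actions = counts.shape
    V = np.zeros((horizon + 1, n_states))
    Q = np.zeros((horizon, n_states, n_actions))
    for h in range(horizon - 1, -1, -1):
        for s in range(n_states):
            for a in range(n_actions):
                n = max(1, int(counts[s, a]))
                r_hat = reward_sum[s, a] / n
                p_hat = trans_counts[s, a] / n
                bonus = c * horizon * np.sqrt(1.0 / n)  # assumed bonus form
                Q[h, s, a] = min(horizon, r_hat + p_hat @ V[h + 1] + bonus)
            V[h, s] = Q[h, s].max()
    return Q  # act greedily with respect to Q[h] at step h
```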
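
Finally, since no code is released, a toy frozen-lake-style environment with the stated 0.1 slip probability can be sketched as below. The grid layout, slip dynamics, and reward values are placeholders; the paper's actual map has four equal-length paths with different terminal rewards and different risks of falling into holes, which is not reproduced here.

```python
import numpy as np

class ToyFrozenLake:
    """Minimal frozen-lake-style gridworld; a placeholder, not the paper's map.

    'S' start, 'F' frozen ice (slippery), 'H' hole (terminal, reward 0),
    'G' goal (terminal, reward 1).
    """
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

    def __init__(self, grid=("SFFF", "FHFH", "FFFH", "HFFG"), slip=0.1, seed=0):
        self.grid = [list(row) for row in grid]
        self.slip = slip
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        dr, dc = self.MOVES[action]
        # Assumed slip model: with probability `slip`, move in a random
        # direction instead of the chosen one (the paper's dynamics may differ).
        if self.rng.random() < self.slip:
            dr, dc = self.MOVES[int(self.rng.integers(4))]
        r = min(max(self.pos[0] + dr, 0), len(self.grid) - 1)
        c = min(max(self.pos[1] + dc, 0), len(self.grid[0]) - 1)
        self.pos = (r, c)
        cell = self.grid[r][c]
        done = cell in ("H", "G")
        reward = 1.0 if cell == "G" else 0.0
        return self.pos, reward, done
```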