reproducibilityindex.ai

A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints

Authors: Krishna C. Kalagarla, Rahul Jain, Pierluigi Nuzzo8030-8037

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We propose an online algorithm which leverages the linear programming formulation of repeated optimistic planning for ﬁnite-horizon CMDP to provide a probably approximately correctness (PAC) guarantee on the number of episodes needed to ensure a near optimal policy, i.e., with resulting objective value close to that of the optimal value and satisfying the constraints within low tolerance, with high probability. The number of episodes needed is shown to have linear dependence on the sizes of the state and action spaces and quadratic dependence on the time horizon and an upper bound on the number of possible successor states for a state-action pair. Therefore, if the upper bound on the number of possible successor states is much smaller than the size of the state space, the number of needed episodes becomes linear in the sizes of the state and action spaces and quadratic in the time horizon. In this paper, we present one of the ﬁrst online algorithms with PAC guarantees for episodic constrained MDPs with unknown transition probabilities.
Researcher Affiliation	Academia	Krishna C. Kalagarla, Rahul Jain, Pierluigi Nuzzo Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles {kalagarl,rahul.jain,nuzzo}@usc.edu
Pseudocode	Yes	Algorithm 1 UC-CFH: Upper-Conﬁdence Constrained Fixed-Horizon Episodic Reinforcement Learning Algorithm
Open Source Code	No	The paper does not provide any statement or link indicating the availability of open-source code for the described methodology.
Open Datasets	No	This paper is theoretical and does not use or refer to any specific datasets for training or evaluation.
Dataset Splits	No	This paper is theoretical and does not involve data splits for training, validation, or testing.
Hardware Specification	No	This paper is theoretical and does not report on experimental setup or hardware specifications.
Software Dependencies	No	The paper describes theoretical algorithms and does not specify software dependencies with version numbers.
Experiment Setup	No	This paper is theoretical and does not report on experimental setup details or hyperparameters.