Near-Optimal Sample Complexity Bounds for Constrained MDPs

Authors: Sharan Vaswani, Lin F. Yang, Csaba Szepesvári

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We resolve this question by providing minimax upper and lower bounds on the sample complexity for learning near-optimal policies in a discounted CMDP with access to a generative model (simulator). In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint. For (i), we prove that our algorithm returns an ε-optimal policy... Finally, we prove a matching lower-bound for the strict feasibility setting, thus obtaining the first near minimax optimal bounds for discounted CMDPs.
Researcher Affiliation | Collaboration | Sharan Vaswani (Simon Fraser University, vaswani.sharan@gmail.com); Lin F. Yang (University of California, Los Angeles, linyang@ee.ucla.edu); Csaba Szepesvári (Amii, University of Alberta, and DeepMind, szepesva@ualberta.ca)
Pseudocode | Yes | Algorithm 1: Model-based algorithm for CMDPs with generative model (a hedged implementation sketch of this general approach appears after the table).
Open Source Code | No | The paper is theoretical and focuses on mathematical bounds and proofs. It does not mention providing access to source code for the methodology described. The author checklist indicates 'N/A' for experiment-related questions, including code availability.
Open Datasets | No | The paper assumes access to a 'generative model (simulator)' to obtain samples from a distribution P(·|s, a), which is a theoretical construct for sampling. It does not mention or provide access information for any specific public or open dataset used for training in an empirical study.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets. Therefore, it does not specify any training/validation/test dataset splits. The author checklist item for training details (3b) is marked N/A.
Hardware Specification | No | The paper is theoretical and does not report on empirical experiments. Consequently, there are no specifications of hardware used for running experiments. The author checklist item regarding compute resources (3d) is marked N/A.
Software Dependencies | No | The paper is theoretical and describes algorithms and proofs, not empirical experiments that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe empirical experiments. Therefore, it does not provide specific experimental setup details such as hyperparameters or system-level training settings. The author checklist item for training details (3b) is marked N/A.
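Since no code accompanies the paper, the following is a minimal, illustrative Python sketch of the general approach the abstract describes: query the generative model to build an empirical transition model, then solve the empirical constrained MDP. It is not the authors' Algorithm 1 and is not taken from the paper; the function names, the per-(s, a) sampling budget, and the linear program over discounted occupancy measures are assumptions made purely for illustration.

import numpy as np
from scipy.optimize import linprog


def estimate_model(sampler, n_states, n_actions, samples_per_pair):
    # Query the generative model samples_per_pair times for every (s, a) pair
    # and return the empirical transition probabilities p_hat[s, a, s'].
    # `sampler(s, a)` is a hypothetical simulator call returning a next state.
    p_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(samples_per_pair):
                p_hat[s, a, sampler(s, a)] += 1.0
    return p_hat / samples_per_pair


def solve_cmdp_lp(p, reward, cost, budget, gamma, rho):
    # Solve the (empirical) discounted CMDP via a linear program over
    # occupancy measures mu(s, a):
    #   maximize   sum_{s,a} mu(s,a) * reward(s,a)
    #   subject to sum_{s,a} mu(s,a) * cost(s,a) >= budget
    #              sum_a mu(s,a) - gamma * sum_{s',a'} p[s',a',s] * mu(s',a') = rho(s)
    #              mu >= 0
    n_states, n_actions = reward.shape
    n_vars = n_states * n_actions

    # linprog minimizes, so negate the reward objective.
    c_obj = -reward.reshape(n_vars)

    # One Bellman-flow equality constraint per state.
    a_eq = np.zeros((n_states, n_vars))
    for s in range(n_states):
        for sp in range(n_states):
            for a in range(n_actions):
                a_eq[s, sp * n_actions + a] -= gamma * p[sp, a, s]
        for a in range(n_actions):
            a_eq[s, s * n_actions + a] += 1.0

    # Constraint value >= budget, written in <= form as -cost . mu <= -budget.
    a_ub = -cost.reshape(1, n_vars)
    b_ub = np.array([-budget])

    res = linprog(c_obj, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=rho,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError("empirical CMDP is infeasible or the LP solver failed")

    # Normalize the occupancy measure over actions to recover a stationary policy.
    mu = res.x.reshape(n_states, n_actions)
    policy = mu / np.maximum(mu.sum(axis=1, keepdims=True), 1e-12)
    return policy, -res.fun  # policy and its reward value under the empirical model

A caller would supply a `sampler` wrapping the simulator, per-(s, a) reward and constraint-cost arrays, a constraint budget, a discount factor, and an initial state distribution `rho`. The returned policy is optimal only for the empirical model and does not by itself carry the paper's relaxed- or strict-feasibility sample complexity guarantees.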