Near-Optimal Sample Complexity Bounds for Constrained MDPs

Authors: Sharan Vaswani, Lin F. Yang, Csaba Szepesvári

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We resolve this question by providing minimax upper and lower bounds on the sample complexity for learning near-optimal policies in a discounted CMDP with access to a generative model (simulator). In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint. For (i), we prove that our algorithm returns an ε-optimal policy... Finally, we prove a matching lower-bound for the strict feasibility setting, thus obtaining the first near minimax optimal bounds for discounted CMDPs.
Researcher Affiliation | Collaboration | Sharan Vaswani (Simon Fraser University, vaswani.sharan@gmail.com); Lin F. Yang (University of California, Los Angeles, linyang@ee.ucla.edu); Csaba Szepesvári (Amii, University of Alberta, and DeepMind, szepesva@ualberta.ca)
Pseudocode | Yes | Algorithm 1: Model-based algorithm for CMDPs with generative model (a hedged implementation sketch of this general approach appears after the table).
Open Source Code | No | The paper is theoretical and focuses on mathematical bounds and proofs. It does not mention providing access to source code for the methodology described. The author checklist indicates 'N/A' for experiment-related questions, including code availability.
Open Datasets | No | The paper assumes access to a 'generative model (simulator)' to obtain samples from a distribution P(·|s, a), which is a theoretical construct for sampling. It does not mention or provide access information for any specific public or open dataset used for training in an empirical study.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets. Therefore, it does not specify any training/validation/test dataset splits. The author checklist item for training details (3b) is marked N/A.
Hardware Specification | No | The paper is theoretical and does not report on empirical experiments. Consequently, there are no specifications of hardware used for running experiments. The author checklist item regarding compute resources (3d) is marked N/A.
Software Dependencies | No | The paper is theoretical and describes algorithms and proofs, not empirical experiments that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe empirical experiments. Therefore, it does not provide specific experimental setup details such as hyperparameters or system-level training settings. The author checklist item for training details (3b) is marked N/A.
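Since no code accompanies the paper, the following is a minimal, illustrative Python sketch of the general approach the abstract describes: query the generative model to build an empirical transition model, then solve the empirical constrained MDP. It is not the authors' Algorithm 1 and is not taken from the paper; the function names, the per-(s, a) sampling budget, and the linear program over discounted occupancy measures are assumptions made purely for illustration.

import numpy as np
from scipy.optimize import linprog


def estimate_model(sampler, n_states, n_actions, samples_per_pair):
    # Query the generative model samples_per_pair times for every (s, a) pair
    # and return the empirical transition probabilities p_hat[s, a, s'].
    # `sampler(s, a)` is a hypothetical simulator call returning a next state.
    p_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(samples_per_pair):
                p_hat[s, a, sampler(s, a)] += 1.0
    return p_hat / samples_per_pair


def solve_cmdp_lp(p, reward, cost, budget, gamma, rho):
    # Solve the (empirical) discounted CMDP via a linear program over
    # occupancy measures mu(s, a):
    #   maximize   sum_{s,a} mu(s,a) * reward(s,a)
    #   subject to sum_{s,a} mu(s,a) * cost(s,a) >= budget
    #              sum_a mu(s,a) - gamma * sum_{s',a'} p[s',a',s] * mu(s',a') = rho(s)
    #              mu >= 0
    n_states, n_actions = reward.shape
    n_vars = n_states * n_actions

    # linprog minimizes, so negate the reward objective.
    c_obj = -reward.reshape(n_vars)

    # One Bellman-flow equality constraint per state.
    a_eq = np.zeros((n_states, n_vars))
    for s in range(n_states):
        for sp in range(n_states):
            for a in range(n_actions):
                a_eq[s, sp * n_actions + a] -= gamma * p[sp, a, s]
        for a in range(n_actions):
            a_eq[s, s * n_actions + a] += 1.0

    # Constraint value >= budget, written in <= form as -cost . mu <= -budget.
    a_ub = -cost.reshape(1, n_vars)
    b_ub = np.array([-budget])

    res = linprog(c_obj, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=rho,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError("empirical CMDP is infeasible or the LP solver failed")

    # Normalize the occupancy measure over actions to recover a stationary policy.
    mu = res.x.reshape(n_states, n_actions)
    policy = mu / np.maximum(mu.sum(axis=1, keepdims=True), 1e-12)
    return policy, -res.fun  # policy and its reward value under the empirical model

A caller would supply a `sampler` wrapping the simulator, per-(s, a) reward and constraint-cost arrays, a constraint budget, a discount factor, and an initial state distribution `rho`. The returned policy is optimal only for the empirical model and does not by itself carry the paper's relaxed- or strict-feasibility sample complexity guarantees.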