Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes
Authors: Dongsheng Ding, Kaiqing Zhang, Tamer Basar, Mihailo Jovanovic
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also provide computational results to demonstrate merits of our approach. To verify our convergence theory, we provide computational results by simulating the algorithm (8) and its sample-based version: Algorithm 2, for a finite CMDP with random initializations. |
| Researcher Affiliation | Academia | Dongsheng Ding, ECE, University of Southern California, dongshed@usc.edu; Kaiqing Zhang, ECE and CSL, University of Illinois at Urbana-Champaign, kzhang66@illinois.edu; Tamer Başar, ECE and CSL, University of Illinois at Urbana-Champaign, basar1@illinois.edu; Mihailo R. Jovanović, ECE, University of Southern California, mihailo@usc.edu |
| Pseudocode | Yes | We display our algorithm as Algorithm 1 in Appendix F. We describe our algorithm as Algorithm 2 in Appendix H and show its sample complexity. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | No | The paper uses randomly generated CMDPs for its computational results and does not specify training, validation, or test dataset splits. To verify our convergence theory, we provide computational results by simulating the algorithm (8) and its sample-based version: Algorithm 2, for a finite CMDP with random initializations. |
| Dataset Splits | No | The paper uses randomly generated CMDPs for its computational results and does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not specify version numbers for any software dependencies or libraries used in the implementation or experiments. |
| Experiment Setup | Yes | Figure 1 caption: "In this experiment, we have randomly generated a CMDP with \|S\| = 20, \|A\| = 10, γ = 0.8, and b = 3, and chosen: η1 = η2 = 1." Figure 2 caption: "In this experiment, we have randomly generated a CMDP with \|S\| = 20, \|A\| = 10, γ = 0.8, and b = 3, and chosen: η1 = η2 = 1 for Algorithm 2, and γ = 1, K = 100, and L = 10 for the dual Descent." |
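
Since no source code is released, the following is only a minimal sketch of how a random CMDP matching the reported sizes (|S| = 20, |A| = 10, γ = 0.8, b = 3) might be set up. The sampling distributions, the uniform initial distribution `rho`, the constraint form V_g ≥ b, and the helper names `random_cmdp` and `policy_value` are all assumptions for illustration; the paper's captions do not state how the CMDP was generated. The sketch evaluates a uniform policy exactly and does not implement the paper's Algorithm 1 or Algorithm 2.

```python
import numpy as np

def random_cmdp(n_states=20, n_actions=10, seed=0):
    """Sample a random finite CMDP (assumed: uniform[0, 1) entries)."""
    rng = np.random.default_rng(seed)
    # Transition kernel P[s, a, s'], normalized so each (s, a) row sums to 1.
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)
    r = rng.random((n_states, n_actions))  # reward function (assumed range)
    g = rng.random((n_states, n_actions))  # utility function (assumed range)
    rho = np.full(n_states, 1.0 / n_states)  # assumed uniform initial distribution
    return P, r, g, rho

def policy_value(P, payoff, policy, rho, gamma):
    """Exact discounted value rho^T (I - gamma * P_pi)^{-1} payoff_pi."""
    P_pi = np.einsum("sa,sat->st", policy, P)  # state-to-state kernel under policy
    payoff_pi = (policy * payoff).sum(axis=1)  # expected one-step payoff per state
    v = np.linalg.solve(np.eye(len(rho)) - gamma * P_pi, payoff_pi)
    return float(rho @ v)

GAMMA, B = 0.8, 3.0  # discount factor and threshold from the figure captions
P, r, g, rho = random_cmdp()
uniform = np.full((20, 10), 1.0 / 10)  # uniform policy pi(a | s)
v_r = policy_value(P, r, uniform, rho, GAMMA)
v_g = policy_value(P, g, uniform, rho, GAMMA)
print(f"V_r = {v_r:.3f}, V_g = {v_g:.3f}, constraint V_g >= b satisfied: {v_g >= B}")
```

With payoffs in [0, 1) and γ = 0.8, every value is bounded by 1 / (1 − γ) = 5, so a threshold of b = 3 is attainable in principle under these assumed ranges.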