Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs
Authors: Dongsheng Ding, Chen-Yu Wei, Kaiqing Zhang, Alejandro Ribeiro
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further validate the merits and the effectiveness of our methods in computational experiments. ... We further exhibit the merits and the effectiveness of our methods in experiments. ... 5 Computational Experiment |
| Researcher Affiliation | Academia | Dongsheng Ding University of Pennsylvania dongshed@seas.upenn.edu Chen-Yu Wei University of Virginia chenyu.wei@virginia.edu Kaiqing Zhang University of Maryland, College Park kaiqing@umd.edu Alejandro Ribeiro University of Pennsylvania aribeiro@seas.upenn.edu |
| Pseudocode | Yes | Algorithm 1 Sample-based inexact RPG-PD algorithm with log-linear policy parametrization ... Algorithm 2 Unbiased estimate Q ... Algorithm 3 Unbiased estimate V |
| Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | Our experiment is a tabular constrained MDP with a randomly generated transition kernel, a discount factor γ = 0.9, uniform rewards r ∈ [0, 1] and utilities g ∈ [−1, 1], and a uniform initial state distribution ρ. |
| Dataset Splits | No | The paper does not provide specific details on train/validation/test dataset splits. It describes generating a synthetic MDP environment but not data partitioning for machine learning models. |
| Hardware Specification | Yes | All the experiments were conducted on an Apple MacBook Pro 2017 laptop equipped with a 2.3 GHz Dual-Core Intel Core i5 in Jupyter Notebook. |
| Software Dependencies | No | The paper mentions 'Jupyter Notebook' but does not provide specific version numbers for it or any other software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | In this experiment, we use the same stepsize η = 0.1 for all methods, the regularization parameter τ = 0.08 for RPG-PD, and the uniform initial distribution ρ. |
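The Open Datasets and Experiment Setup rows pin down the synthetic environment precisely. Below is a minimal NumPy sketch of that generation; the state/action counts and the random seed are assumptions, since the quoted text does not report them.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption; the paper reports none

S, A = 10, 5   # state/action counts are assumptions; the paper only says "tabular"
gamma = 0.9    # discount factor, as reported

# Randomly generated transition kernel: P[s, a] is a distribution over next states.
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)

# Uniform rewards r ∈ [0, 1] and utilities g ∈ [−1, 1], as reported.
r = rng.uniform(0.0, 1.0, size=(S, A))
g = rng.uniform(-1.0, 1.0, size=(S, A))

rho = np.full(S, 1.0 / S)  # uniform initial state distribution
```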
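The Pseudocode row points to the paper's sample-based Algorithms 1–3, and the Experiment Setup row gives η = 0.1 and τ = 0.08 for RPG-PD. Continuing from the environment sketch above, here is a minimal exact-evaluation illustration of a regularized primal-dual update in the spirit of RPG-PD. It is not the paper's sample-based, log-linear Algorithm 1: policy evaluation is done exactly, the entropy term is omitted from the evaluated Q-values, and the constraint threshold `b`, dual cap `Lam`, horizon, and initialization are hypothetical values chosen for illustration.

```python
def evaluate(P, c, pi, gamma):
    """Exact tabular policy evaluation: V^pi and Q^pi for a payoff table c (S x A)."""
    S, A = c.shape
    P_pi = np.einsum('sa,sap->sp', pi, P)   # state-to-state kernel under pi
    c_pi = (pi * c).sum(axis=1)             # expected one-step payoff per state
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
    return V, c + gamma * P @ V             # Q(s, a) = c(s, a) + gamma * E[V(s')]

eta, tau = 0.1, 0.08            # stepsize and regularization parameter, as reported
b, Lam = 0.0, 10.0              # constraint threshold and dual cap: hypothetical
pi = np.full((S, A), 1.0 / A)   # uniform initial policy (an assumption)
lam = 0.0

for t in range(500):
    _, Q_r = evaluate(P, r, pi, gamma)
    V_g, Q_g = evaluate(P, g, pi, gamma)

    # Primal step: entropy-regularized multiplicative-weights update on r + lam * g.
    logits = (1.0 - eta * tau) * np.log(pi) + eta * (Q_r + lam * Q_g)
    logits -= logits.max(axis=1, keepdims=True)   # for numerical stability
    pi = np.exp(logits)
    pi /= pi.sum(axis=1, keepdims=True)

    # Dual step: regularized projected gradient descent on the multiplier,
    # tightening lam when the utility constraint V_g(rho) >= b is violated.
    lam = float(np.clip((1.0 - eta * tau) * lam - eta * (rho @ V_g - b), 0.0, Lam))
```

The (1 − ητ) damping on both the log-policy and the multiplier is what the regularization parameter τ buys: it makes the regularized Lagrangian strongly convex-concave, which is the mechanism behind the last-iterate convergence the title refers to.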