Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs

Authors: Dongsheng Ding, Chen-Yu Wei, Kaiqing Zhang, Alejandro Ribeiro

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We further validate the merits and the effectiveness of our methods in computational experiments." ... "We further exhibit the merits and the effectiveness of our methods in experiments." ... (Section 5: Computational Experiments)
Researcher Affiliation | Academia | Dongsheng Ding (University of Pennsylvania), Chen-Yu Wei (University of Virginia), Kaiqing Zhang (University of Maryland, College Park), Alejandro Ribeiro (University of Pennsylvania)
Pseudocode | Yes | "Algorithm 1: Sample-based inexact RPG-PD algorithm with log-linear policy parametrization" ... "Algorithm 2: Unbiased estimate Q" ... "Algorithm 3: Unbiased estimate V"
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | "Our experiment is a tabular constrained MDP with a randomly generated transition kernel, a discount factor γ = 0.9, uniform rewards r ∈ [0, 1] and utilities g ∈ [−1, 1], and a uniform initial state distribution ρ."
Dataset Splits | No | The paper does not provide train/validation/test dataset splits. It describes generating a synthetic MDP environment, not partitioning data for machine learning models.
Hardware Specification | Yes | "All the experiments were conducted on an Apple MacBook Pro 2017 laptop equipped with a 2.3 GHz Dual-Core Intel Core i5 in Jupyter Notebook."
Software Dependencies | No | The paper mentions Jupyter Notebook but does not give version numbers for it or any other software dependency, which reproducibility requires.
Experiment Setup | Yes | "In this experiment, we use the same stepsize η = 0.1 for all methods, the regularization parameter τ = 0.08 for RPG-PD, and the uniform initial distribution ρ."
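Since no code is released, the experiment described above can be approximated from the quoted details alone. The sketch below builds a random tabular constrained MDP with γ = 0.9, rewards in [0, 1], utilities in [−1, 1], and a uniform ρ, then performs one illustrative regularized primal-dual step with η = 0.1 and τ = 0.08. The state/action counts, the constraint threshold `b`, and all function names are assumptions for illustration; this is not the authors' implementation.

```python
# Hedged sketch of the paper's tabular constrained-MDP experiment.
# Assumed (not stated in the report): S=5 states, A=3 actions, threshold b=0,
# and the exact form of the regularized primal-dual update.
import numpy as np

rng = np.random.default_rng(0)
S, A = 5, 3                      # assumed problem size
gamma, eta, tau = 0.9, 0.1, 0.08 # values quoted in the report

# Randomly generated transition kernel: P[s, a] is a distribution over s'.
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)

r = rng.uniform(0.0, 1.0, (S, A))    # uniform rewards r in [0, 1]
g = rng.uniform(-1.0, 1.0, (S, A))   # uniform utilities g in [-1, 1]
rho = np.full(S, 1.0 / S)            # uniform initial state distribution

def q_values(pi, f):
    """Exact Q^pi for a payoff table f, via the Bellman linear system."""
    P_pi = np.einsum('sap,sa->sp', P, pi)   # policy-induced dynamics
    f_pi = (pi * f).sum(axis=1)             # expected one-step payoff
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, f_pi)
    return f + gamma * P @ V                # Q[s, a] = f + gamma * E[V(s')]

# One regularized primal-dual step (RPG-PD-style, illustrative only).
pi = np.full((S, A), 1.0 / A)    # start from the uniform policy
lam = 0.0                        # dual variable for the constraint V_g >= b
b = 0.0                          # assumed utility threshold

Q_L = q_values(pi, r + lam * g)  # Q-values of the Lagrangian payoff
# Entropy-regularized multiplicative-weights (softmax) policy update.
logits = np.log(pi) + eta * (Q_L - tau * np.log(pi))
pi = np.exp(logits - logits.max(axis=1, keepdims=True))
pi /= pi.sum(axis=1, keepdims=True)
# Projected, regularized dual step on the constraint value.
V_g = (pi * q_values(pi, g)).sum(axis=1) @ rho
lam = max(0.0, lam - eta * (V_g - b + tau * lam))
```

Solving the Bellman system exactly is feasible here because the MDP is tabular; the paper's Algorithms 2 and 3 instead use sampled, unbiased estimates of Q and V, which this sketch does not reproduce.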