reproducibilityindex.ai

Policy-Based Primal-Dual Methods for Convex Constrained Markov Decision Processes

Authors: Donghao Ying, Mengzi Amy Guo, Yuhao Ding, Javad Lavaei, Zuo-Jun Shen

AAAI 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we validate algorithm (11) in a feasibility constrained MDP problem (cf. Example 2.2). The experiment is performed on the single-agent version of the OpenAI Particle environment (Lowe et al. 2017) as illustrated in Figure 1a.
Researcher Affiliation	Academia	UC Berkeley, Department of Industrial Engineering and Operations Research {donghaoy, mengzi_guo, yuhao_ding, lavaei, maxshen}@berkeley.edu
Pseudocode	Yes	In Appendix A.1, we provide a sample-based pseudocode for algorithm (11).
Open Source Code	No	No explicit statement about providing open-source code or a link to a repository was found.
Open Datasets	Yes	The experiment is performed on the single-agent version of the OpenAI Particle environment (Lowe et al. 2017) as illustrated in Figure 1a.
Dataset Splits	No	The paper describes the environment and task, but does not provide specific details on training, validation, or test dataset splits or percentages.
Hardware Specification	No	No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running experiments were mentioned.
Software Dependencies	No	The paper mentions 'a two-layer fully-connected neural network' and a 'REINFORCE-based method', but does not provide version numbers for any software dependencies or libraries.
Experiment Setup	Yes	The policy is parameterized by a two-layer fully-connected neural network with 64 neurons in each layer and ReLU activations. We estimate the policy gradient ∇θL(θ, µ) through the REINFORCE-based method (Zhang et al. 2021) with n = 10 and K = 25 (see Algorithm 1 in Appendix A.1). The feasibility constraint has a threshold of d0 = 0.2.
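The experiment setup above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (the entry reports no released code). It shows a policy with two fully connected hidden layers of 64 ReLU units, a vanilla REINFORCE gradient estimate averaged over n = 10 rollouts of K = 25 steps, and a projected dual-ascent step against the threshold d0 = 0.2. Everything else is an assumption: the discrete action space, the sizes OBS_DIM and N_ACTIONS, the discount gamma, the dual step size eta, the reading of n as "number of rollouts" and K as "rollout length", and the helper names (init_params, policy, grad_log_prob, reinforce_gradient, dual_step), which are all hypothetical. The `reward` argument only stands in for whatever per-step signal the paper's estimator uses for ∇θL(θ, µ). The estimator of Zhang et al. (2021) and the exact form of algorithm (11) may differ.

```python
import numpy as np

# Hypothetical sizes: the entry does not report the particle environment's
# observation or action dimensions, and a discrete action space is assumed.
OBS_DIM, N_ACTIONS, HIDDEN = 4, 5, 64


def init_params(rng):
    """Two fully connected hidden layers of 64 units plus a linear logit layer."""
    sizes = [OBS_DIM, HIDDEN, HIDDEN, N_ACTIONS]
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, k)), np.zeros(k))
        for m, k in zip(sizes[:-1], sizes[1:])
    ]


def policy(params, obs):
    """Softmax probabilities pi(.|obs) plus the activations reused by backprop."""
    (W1, b1), (W2, b2), (W3, b3) = params
    z1 = obs @ W1 + b1
    h1 = np.maximum(z1, 0.0)  # ReLU
    z2 = h1 @ W2 + b2
    h2 = np.maximum(z2, 0.0)  # ReLU
    logits = h2 @ W3 + b3
    p = np.exp(logits - logits.max())  # numerically stable softmax
    p /= p.sum()
    return p, (obs, z1, h1, z2, h2)


def grad_log_prob(params, p, cache, action):
    """Gradient of log pi(action|obs) w.r.t. every (W, b), by manual backprop."""
    W2, W3 = params[1][0], params[2][0]
    obs, z1, h1, z2, h2 = cache
    g3 = -p  # d log softmax(logits)[action] / d logits = onehot(action) - p
    g3[action] += 1.0
    g2 = (g3 @ W3.T) * (z2 > 0)  # back through the layer-2 ReLU
    g1 = (g2 @ W2.T) * (z1 > 0)  # back through the layer-1 ReLU
    return [(np.outer(obs, g1), g1), (np.outer(h1, g2), g2), (np.outer(h2, g3), g3)]


def reinforce_gradient(params, reset, step, reward, rng, n=10, K=25, gamma=0.99):
    """Vanilla REINFORCE: mean over n rollouts of K steps of
    (sum_k grad log pi(a_k|s_k)) * (sum_k gamma^k r_k).

    Assumption: `reward` stands in for the per-step signal being differentiated."""
    est = [(np.zeros_like(W), np.zeros_like(b)) for W, b in params]
    for _ in range(n):
        obs = reset(rng)
        score = [(np.zeros_like(W), np.zeros_like(b)) for W, b in params]
        ret = 0.0
        for k in range(K):
            p, cache = policy(params, obs)
            a = rng.choice(N_ACTIONS, p=p)  # sample an action from pi(.|obs)
            for (sW, sb), (gW, gb) in zip(score, grad_log_prob(params, p, cache, a)):
                sW += gW  # accumulate the score function in place
                sb += gb
            ret += gamma**k * reward(obs, a)
            obs = step(obs, a, rng)
        for (eW, eb), (sW, sb) in zip(est, score):
            eW += ret * sW / n
            eb += ret * sb / n
    return est


def dual_step(mu, g_value, d0=0.2, eta=0.1):
    """Projected dual ascent on mu >= 0 for an assumed constraint g <= d0."""
    return max(0.0, mu + eta * (g_value - d0))
```

In a training loop one would alternate a primal step along reinforce_gradient (with a reward built from the current µ) and a dual_step on µ. How the paper forms that reward from its convex objective and feasibility constraint is not described in this entry, so it is left as a parameter.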