Policy-Based Primal-Dual Methods for Convex Constrained Markov Decision Processes

Authors: Donghao Ying, Mengzi Amy Guo, Yuhao Ding, Javad Lavaei, Zuo-Jun Shen

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we validate algorithm (11) on a feasibility-constrained MDP problem (cf. Example 2.2). The experiment is performed on the single-agent version of the OpenAI Particle environment (Lowe et al. 2017), as illustrated in Figure 1a.
Researcher Affiliation | Academia | UC Berkeley, Department of Industrial Engineering and Operations Research; {donghaoy, mengzi_guo, yuhao_ding, lavaei, maxshen}@berkeley.edu
Pseudocode | Yes | In Appendix A.1, we provide a sample-based pseudocode for algorithm (11).
Open Source Code | No | No explicit statement about providing open-source code or a link to a repository was found.
Open Datasets | Yes | The experiment is performed on the single-agent version of the OpenAI Particle environment (Lowe et al. 2017), as illustrated in Figure 1a.
Dataset Splits | No | The paper describes the environment and task, but does not provide specific details on training, validation, or test dataset splits or percentages.
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running the experiments were mentioned.
Software Dependencies | No | The paper mentions 'a two-layer fully-connected neural network' and a 'REINFORCE-based method', but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | The policy is parameterized by a two-layer fully-connected neural network with 64 neurons in each layer and ReLU activations. We estimate the policy gradient ∇θL(θ, µ) through the REINFORCE-based method (Zhang et al. 2021) with n = 10 and K = 25 (see Algorithm 1 in Appendix A.1). The feasibility constraint has a threshold of d0 = 0.2.
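
The Experiment Setup row fixes the stated architecture and estimator parameters but not the training code. The sketch below is one plausible reading of that setup: a two-layer, 64-unit ReLU policy network and a plain REINFORCE-style estimate of the Lagrangian gradient ∇θL(θ, µ) with constraint threshold d0 = 0.2. Everything beyond those stated numbers (PyTorch, discrete actions, observation/action dimensions, the trajectory format) is an assumption for illustration; it is not the paper's Algorithm 1 or the exact estimator of Zhang et al. (2021) with n = 10 and K = 25.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the quoted setup: a two-layer fully-connected policy
# with 64 units per layer and ReLU activations.  Observation/action dimensions
# and the discrete-action assumption are illustrative, not taken from the paper.
class PolicyNet(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        # Return a categorical action distribution for a single observation.
        return torch.distributions.Categorical(logits=self.net(obs))


def lagrangian_policy_gradient(policy, trajectories, mu, d0=0.2):
    """Plain REINFORCE-style estimate of the ascent direction grad_theta L(theta, mu).

    `trajectories` is assumed to be a list of episodes, each a list of
    (obs, action, reward, utility) tuples with tensor observations/actions.
    The paper's estimator and its general convex objective are more involved;
    this is only a simplified illustration of a Lagrangian policy gradient.
    """
    surrogate = torch.zeros(())
    for episode in trajectories:
        ret_r = sum(r for (_, _, r, _) in episode)   # cumulative reward return
        ret_g = sum(g for (_, _, _, g) in episode)   # cumulative constraint utility
        log_prob = sum(policy(o).log_prob(a) for (o, a, _, _) in episode)
        # Lagrangian return: reward plus dual-weighted constraint slack (threshold d0).
        surrogate = surrogate + log_prob * (ret_r + mu * (ret_g - d0))
    surrogate = surrogate / len(trajectories)
    return torch.autograd.grad(surrogate, list(policy.parameters()))
```

A primal-dual loop would ascend the policy parameters along this estimate and then adjust the dual variable µ using the estimated constraint slack; the precise dual update and sampling scheme are those of the paper's Appendix A.1 and are not reproduced here.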