Policy-Based Primal-Dual Methods for Convex Constrained Markov Decision Processes
Authors: Donghao Ying, Mengzi Amy Guo, Yuhao Ding, Javad Lavaei, Zuo-Jun Shen
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we validate algorithm (11) in a feasibility constrained MDP problem (cf. Example 2.2). The experiment is performed on the single agent version of OpenAI Particle environment (Lowe et al. 2017) as illustrated in Figure 1a. |
| Researcher Affiliation | Academia | UC Berkeley, Department of Industrial Engineering and Operations Research {donghaoy, mengzi_guo, yuhao_ding, lavaei, maxshen}@berkeley.edu |
| Pseudocode | Yes | In Appendix A.1, we provide a sample-based pseudocode for algorithm (11). |
| Open Source Code | No | No explicit statement about providing open-source code or a link to a repository was found. |
| Open Datasets | Yes | The experiment is performed on the single agent version of OpenAI Particle environment (Lowe et al. 2017) as illustrated in Figure 1a. |
| Dataset Splits | No | The paper describes the environment and task, but does not provide specific details on training, validation, or test dataset splits or percentages. |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running experiments were mentioned. |
| Software Dependencies | No | The paper mentions 'a two-layer fully-connected neural network' and 'REINFORCE-based method', but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | The policy is parameterized by a two-layer fully-connected neural network with 64 neurons in each layer and ReLU activations. We estimate the policy gradient ∇_θ L(θ, µ) through the REINFORCE-based method (Zhang et al. 2021) with n = 10 and K = 25 (see Algorithm 1 in Appendix A.1). The feasibility constraint has a threshold of d0 = 0.2. |
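
The policy parameterization in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code: it only assumes that "two-layer" means two hidden layers of 64 ReLU units feeding a softmax over discrete actions. The observation size, the number of actions, the initialization scheme, and all function names (`init_policy`, `action_probs`, and so on) are hypothetical choices made for illustration.

```python
import math
import random

HIDDEN = 64  # from the setup: 64 neurons in each of the two layers


def init_layer(n_in, n_out, rng):
    """Create one fully-connected layer (assumed uniform initialization)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(layer, x):
    """Apply an affine map W x + b."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    """Numerically stable softmax."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_policy(obs_dim, n_actions, seed=0):
    """Two hidden layers of 64 units plus an output layer (assumed layout)."""
    rng = random.Random(seed)
    return [
        init_layer(obs_dim, HIDDEN, rng),
        init_layer(HIDDEN, HIDDEN, rng),
        init_layer(HIDDEN, n_actions, rng),
    ]


def action_probs(policy, obs):
    """Forward pass: observation -> probability of each discrete action."""
    h = relu(linear(policy[0], obs))
    h = relu(linear(policy[1], h))
    return softmax(linear(policy[2], h))


# Hypothetical sizes: a 4-dimensional observation and 5 discrete actions.
policy = init_policy(obs_dim=4, n_actions=5)
probs = action_probs(policy, [0.1, -0.2, 0.3, 0.0])
```

The sketch covers only the forward pass. The REINFORCE-based gradient estimate with n = 10 and K = 25 follows Algorithm 1 in Appendix A.1 of the paper and is not reproduced here.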