Projection-Based Constrained Policy Optimization
Authors: Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, Peter J. Ramadge
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results over several control tasks demonstrate that PCPO achieves superior performance, averaging more than 3.5 times less constraint violation and around 15% higher reward compared to state-of-the-art methods. |
| Researcher Affiliation | Collaboration | Tsung-Yen Yang (Princeton University, ty3@princeton.edu); Justinian Rosca (Siemens Corporation, Corporate Technology, justinian.rosca@siemens.com); Karthik Narasimhan (Princeton University, karthikn@princeton.edu); Peter J. Ramadge (Princeton University, ramadge@princeton.edu) |
| Pseudocode | Yes | Algorithm 1 Projection-Based Constrained Policy Optimization (PCPO) |
| Open Source Code | Yes | For code see the project website: https://sites.google.com/view/iclr2020-pcpo |
| Open Datasets | Yes | The first two tasks Gather and Circle are Mujoco environments with safety constraints introduced by Achiam et al. (2017) and two traffic management tasks with fairness constraints introduced by Vinitsky et al. (2018). |
| Dataset Splits | No | The paper reports batch sizes and rollout lengths, but, as is usual for on-policy RL where data are generated by environment rollouts, it does not specify training/validation/test splits. |
| Hardware Specification | No | No specific hardware (e.g., GPU models, CPU types, or cloud instance names) used for experiments is mentioned in the paper. |
| Software Dependencies | No | The experiments are implemented in rllab (Duan et al., 2016), a tool for developing and evaluating RL algorithms. No specific version numbers for rllab or other software dependencies are provided. |
| Experiment Setup | Yes | The hyperparameters of each task for all algorithms are as follows (PC: point circle, PG: point gather, AC: ant circle, AG: ant gather, Gr: grid, and BN: bottleneck tasks): |

| Parameter | PC | PG | AC | AG | Gr | BN |
|---|---|---|---|---|---|---|
| Discount factor γ | 0.995 | 0.995 | 0.995 | 0.995 | 0.999 | 0.999 |
| Step size δ | 10⁻⁴ | 10⁻⁴ | 10⁻⁴ | 10⁻⁴ | 10⁻⁴ | 10⁻⁴ |
| λ_R^GAE | 0.95 | 0.95 | 0.95 | 0.95 | 0.97 | 0.97 |
| λ_C^GAE | 1.0 | 1.0 | 0.5 | 0.5 | 0.5 | 1.0 |
| Batch size | 50,000 | 50,000 | 100,000 | 100,000 | 10,000 | 25,000 |
| Rollout length | 50 | 15 | 500 | 500 | 400 | 500 |
| Cost constraint threshold h | 5 | 0.1 | 10 | 0.2 | 0 | 0 |
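The Pseudocode row refers to Algorithm 1 (PCPO), which first takes a trust-region reward-improvement step and then projects the result back onto the linearized cost-constraint set. The following is a minimal sketch of that two-step update in closed form, not the authors' code. It assumes a small, dense Fisher matrix `H` solved directly with `np.linalg.solve` (practical implementations typically use conjugate gradient with Fisher-vector products). It also takes `b = J_C(π_k) − h` as the current constraint violation, and the function name `pcpo_update` is hypothetical. The `projection` argument switches between a KL projection (metric L = H) and an L2 projection (L = I).

```python
import numpy as np


def pcpo_update(theta, g, a, b, H, delta, projection="kl"):
    """Sketch of one PCPO update: reward step, then constraint projection.

    theta: current parameters theta_k
    g:     gradient of the reward objective at theta_k
    a:     gradient of the cost objective at theta_k
    b:     J_C(pi_k) - h, current constraint violation (positive = violated)
    H:     Fisher information matrix (Hessian of the KL divergence) at theta_k
    delta: trust-region step size
    projection: "kl" uses metric L = H, "l2" uses L = I (assumed interface)
    """
    H_inv_g = np.linalg.solve(H, g)
    # Step 1: natural-gradient reward step, scaled so 0.5 * s^T H s == delta.
    step_size = np.sqrt(2.0 * delta / (g @ H_inv_g))
    theta_half = theta + step_size * H_inv_g

    # Step 2: project theta_half onto {theta : a^T (theta - theta_k) + b <= 0}
    # by minimizing 0.5 * (theta - theta_half)^T L (theta - theta_half).
    L = H if projection == "kl" else np.eye(len(theta))
    L_inv_a = np.linalg.solve(L, a)
    violation = a @ (theta_half - theta) + b
    # The correction is zero when the intermediate point already satisfies
    # the linearized constraint.
    correction = max(0.0, violation / (a @ L_inv_a))
    return theta_half - correction * L_inv_a


# Usage with the step size delta = 1e-4 from the hyperparameter table.
theta = np.zeros(2)
g = np.array([1.0, 0.0])
a = np.array([1.0, 1.0])
new_theta = pcpo_update(theta, g, a, b=0.0, H=np.eye(2), delta=1e-4)
```

With `b = 0` (cost exactly at the threshold), the reward step alone would increase the linearized cost, so the projection returns a point on the constraint boundary. With a sufficiently negative `b`, the projection is inactive and the update reduces to the plain trust-region step.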
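The hyperparameter table lists two separate GAE parameters, λ_R^GAE and λ_C^GAE. Assuming these are the λ values of the generalized advantage estimators for the reward and for the cost, respectively (as the subscripts suggest), the following sketch computes GAE for one finite rollout. The same function is then called once with rewards and once with costs. The function name `gae` and its argument layout are hypothetical.

```python
import numpy as np


def gae(signals, values, gamma, lam):
    """Generalized advantage estimation for one finite rollout (sketch).

    signals: per-step rewards or costs, length T
    values:  value estimates V(s_0), ..., V(s_T), length T + 1; the last
             entry bootstraps the state reached after the final step
    gamma:   discount factor
    lam:     GAE lambda (e.g. lambda_R^GAE for rewards, lambda_C^GAE for costs)
    """
    T = len(signals)
    adv = np.zeros(T)
    running = 0.0
    # Accumulate discounted TD residuals backwards in time.
    for t in reversed(range(T)):
        td = signals[t] + gamma * values[t + 1] - values[t]
        running = td + gamma * lam * running
        adv[t] = running
    return adv
```

For point circle, for example, one would call `gae(rewards, reward_values, 0.995, 0.95)` for the reward advantages and `gae(costs, cost_values, 0.995, 1.0)` for the cost advantages, following the γ, λ_R^GAE, and λ_C^GAE entries in the table. Setting λ = 1 recovers discounted Monte Carlo returns minus the baseline, and λ = 0 recovers one-step TD residuals.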