Projection-Based Constrained Policy Optimization

Authors: Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, Peter J. Ramadge

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results over several control tasks demonstrate that PCPO achieves superior performance, averaging more than 3.5 times less constraint violation and around 15% higher reward compared to state-of-the-art methods.
Researcher Affiliation | Collaboration | Tsung-Yen Yang (Princeton University, ty3@princeton.edu); Justinian Rosca (Siemens Corporation, Corporate Technology, justinian.rosca@siemens.com); Karthik Narasimhan (Princeton University, karthikn@princeton.edu); Peter J. Ramadge (Princeton University, ramadge@princeton.edu)
Pseudocode | Yes | Algorithm 1 Projection-Based Constrained Policy Optimization (PCPO)
Open Source Code | Yes | For code see the project website: https://sites.google.com/view/iclr2020-pcpo
Open Datasets | Yes | The first two tasks, Gather and Circle, are MuJoCo environments with safety constraints introduced by Achiam et al. (2017), and the other two are traffic management tasks with fairness constraints introduced by Vinitsky et al. (2018).
Dataset Splits | No | The paper mentions batch sizes and rollout lengths but does not explicitly provide training/validation/test dataset splits in the conventional sense for reproducibility.
Hardware Specification | No | No specific hardware (e.g., GPU models, CPU types, or cloud instance names) used for experiments is mentioned in the paper.
Software Dependencies | No | The experiments are implemented in rllab (Duan et al., 2016), a tool for developing and evaluating RL algorithms. No specific version numbers for rllab or other software dependencies are provided.
Experiment Setup | Yes | The hyperparameters of each task for all algorithms are as follows (PC: point circle, PG: point gather, AC: ant circle, AG: ant gather, Gr: grid, and BN: bottleneck tasks):

Parameter                      PC       PG       AC        AG        Gr       BN
discount factor γ              0.995    0.995    0.995     0.995     0.999    0.999
step size δ                    10^-4    10^-4    10^-4     10^-4     10^-4    10^-4
λ^GAE_R                        0.95     0.95     0.95      0.95      0.97     0.97
λ^GAE_C                        1.0      1.0      0.5       0.5       0.5      1.0
Batch size                     50,000   50,000   100,000   100,000   10,000   25,000
Rollout length                 50       15       500       500       400      500
Cost constraint threshold h    5        0.1      10        0.2       0        0
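
For anyone re-running these experiments, the listed hyperparameters can be captured in a small configuration table. The sketch below is illustrative only and is not taken from the authors' code: the names TaskConfig, CONFIGS, and gae_advantages are hypothetical, the step size is entered as 1e-4 on the reading that the listed value denotes 10^-4, and the R and C suffixes on the GAE parameter are assumed to stand for the reward and cost advantage estimators. The helper shows the standard generalized advantage estimation recursion, which is where γ and λ would typically enter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical container for one column of the hyperparameter table."""
    discount: float           # discount factor gamma
    step_size: float          # step size delta (assumed to be 10^-4)
    gae_lambda_reward: float  # GAE lambda, R row (assumed: reward advantage)
    gae_lambda_cost: float    # GAE lambda, C row (assumed: cost advantage)
    batch_size: int
    rollout_length: int
    cost_threshold: float     # cost constraint threshold h


# Values transcribed from the table; keys are the task abbreviations.
CONFIGS = {
    "PC": TaskConfig(0.995, 1e-4, 0.95, 1.0, 50_000, 50, 5.0),
    "PG": TaskConfig(0.995, 1e-4, 0.95, 1.0, 50_000, 15, 0.1),
    "AC": TaskConfig(0.995, 1e-4, 0.95, 0.5, 100_000, 500, 10.0),
    "AG": TaskConfig(0.995, 1e-4, 0.95, 0.5, 100_000, 500, 0.2),
    "Gr": TaskConfig(0.999, 1e-4, 0.97, 0.5, 10_000, 400, 0.0),
    "BN": TaskConfig(0.999, 1e-4, 0.97, 1.0, 25_000, 500, 0.0),
}


def gae_advantages(signal, values, gamma, lam):
    """Generalized advantage estimation over a single rollout.

    signal: per-step rewards (or per-step costs) of length T.
    values: value estimates of length T + 1; the last entry is the
            bootstrap value for the state after the final step.
    """
    advantages = [0.0] * len(signal)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum after it.
    for t in reversed(range(len(signal))):
        td_error = signal[t] + gamma * values[t + 1] - values[t]
        running = td_error + gamma * lam * running
        advantages[t] = running
    return advantages
```

A possible usage, again as a sketch: look up a task with cfg = CONFIGS["AC"], then call gae_advantages(rewards, reward_values, cfg.discount, cfg.gae_lambda_reward) for the reward signal and the same function with cfg.gae_lambda_cost for the cost signal.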